what is assembly language?
a low level language where each assembly code instruction generally corresponds to one machine code instruction
the instruction set is dependent on hardware
each type of processor has a different instruction set and different assembly code
what is an assembler?
before an assembly language program can be executed, it must be translated into machine code
assembler takes each assembly code instruction and converts it into machine code
translates source code into object code
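a minimal sketch of the one-to-one translation an assembler performs. the instruction set here (LDA, ADD, STA, HLT with a 4-bit opcode and 4-bit operand) is invented for illustration and does not match any real processor:

```python
# Hypothetical 8-bit instruction format: high 4 bits = opcode, low 4 bits = operand.
OPCODES = {"HLT": 0x0, "LDA": 0x1, "ADD": 0x2, "STA": 0x3}

def assemble(source):
    """Translate each assembly line into exactly one machine code byte."""
    machine_code = []
    for line in source.strip().splitlines():
        parts = line.split()
        mnemonic = parts[0]
        operand = int(parts[1]) if len(parts) > 1 else 0
        machine_code.append((OPCODES[mnemonic] << 4) | operand)
    return bytes(machine_code)

program = """
LDA 5
ADD 6
STA 7
HLT
"""
object_code = assemble(program)
print(object_code.hex())  # 15263700
```

four lines of source produce four bytes of object code, one per instruction.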
what is a compiler?
translates a whole program in a high level language into executable machine code (object code)
compiler scans through source code several times, performing different checks and building up tables of information required to produce final object code
the object code produced is hardware specific, so different hardware needs different compilers
object code can be saved and run whenever needed without compiler’s presence
what is an interpreter?
translates and executes high level code directly, without producing a saved object program
interpreter translates the code one line at a time and runs each line as soon as it has been translated
languages that are typically interpreted include JavaScript and PHP
what is bytecode?
an intermediate representation, used by languages that combine compiling and interpreting: the source code is first compiled into bytecode rather than machine code
it is then executed by a bytecode interpreter
the bytecode may be compiled once or compiled each time a change in source code is detected before execution
Python can also be compiled into Java bytecode (for example by the Jython implementation) and then run on the Java Virtual Machine
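the compile-then-interpret split can be seen in the standard CPython implementation of Python, using the built-in `compile` and `exec` functions and the `dis` module. the variable names are just for illustration, and the disassembly listing differs between Python versions:

```python
import dis

source = "total = 2 + 3 * 4"

# Compile step: source code -> code object holding bytecode.
code_object = compile(source, "<example>", "exec")
print(type(code_object.co_code))  # the raw bytecode is stored as bytes

# Human-readable listing of the bytecode (output varies by Python version).
dis.dis(code_object)

# Interpret step: the Python virtual machine executes the bytecode.
namespace = {}
exec(code_object, namespace)
print(namespace["total"])  # 14
```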
what are two advantages of bytecode?
bytecode helps to achieve platform independence
eg any computer that can run Java has a Java Virtual Machine, which masks inherent differences between different architectures and OS
the JVM converts bytecode into machine code for that particular computer
therefore the same bytecode can be run on many different hardware platforms, because each platform's JVM handles the processor differences
acts as an extra security layer between computer and program
if you download an untrusted program, it is run by the Java bytecode interpreter rather than directly on the processor, which guards against malicious programs
advantages and disadvantages of compilers
advantages
object code can be saved on disk and run whenever required without recompiling
object code executes faster than interpreted code because an interpreter must translate each line every time it is encountered, for example on every pass through a loop
object code can be distributed or executed without compiler present
object code is more secure because it is harder to read by outsiders
appropriate when a program is run frequently with few changes
appropriate when object code needs to be distributed to external users, as the source code is not present so can’t be copied/amended
disadvantages
object code is not platform independent, so the program must be recompiled for each type of hardware
each time an error is discovered and corrected, the whole program must be recompiled
what are the stages of compilation?
lexical analysis, syntax analysis, code generation and optimisation
what happens in lexical analysis?
all comments and unnecessary spaces are removed
simple error checking, eg illegal identifiers and assigning illegal values to constants
all keywords, constants and identifiers are replaced by tokens representing their function in the program
eg, numbers are converted to their run-time representation, identifiers are replaced by pointers to their addresses in the symbol table, keywords are replaced by item codes
creates entries in the symbol table with identifiers and their run-time addresses so it can replace them by tokens in the source code
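a rough sketch of these lexical analysis steps. the keyword set, token names and the use of a simple index in place of a real run-time address are all assumptions made for illustration:

```python
import re

KEYWORDS = {"if", "then", "while", "print"}  # hypothetical keyword set

TOKEN_PATTERN = re.compile(r"""
    (?P<COMMENT>\#.*)            |
    (?P<NUMBER>\d+(?:\.\d+)?)    |
    (?P<IDENT>[A-Za-z_]\w*)      |
    (?P<OP>[=+\-*/()<>])         |
    (?P<SPACE>\s+)               |
    (?P<ERROR>.)
""", re.VERBOSE)

def lex(source):
    symbol_table = {}  # identifier -> index (stands in for a run-time address)
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("COMMENT", "SPACE"):
            continue  # comments and unnecessary spaces are removed
        if kind == "ERROR":
            raise SyntaxError(f"illegal character {text!r}")  # simple error check
        if kind == "NUMBER":
            # numbers are converted to their run-time representation
            tokens.append(("NUM", float(text) if "." in text else int(text)))
        elif kind == "IDENT" and text in KEYWORDS:
            tokens.append(("KEYWORD", text))
        elif kind == "IDENT":
            # identifiers are replaced by a pointer into the symbol table
            index = symbol_table.setdefault(text, len(symbol_table))
            tokens.append(("ID", index))
        else:
            tokens.append(("OP", text))
    return tokens, symbol_table

tokens, table = lex("count = count + 10  # add ten")
print(tokens)  # [('ID', 0), ('OP', '='), ('ID', 0), ('OP', '+'), ('NUM', 10)]
print(table)   # {'count': 0}
```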
what is the symbol table?
contains entries for every keyword and identifier in the program
entries contain name, kind, data type, run-time address/value, and pointer to accessing information
lexical analyser puts identifier and run-time address in, and syntax analyser puts in kind and data type
the symbol table must be organised so entries can be found as quickly as possible, because it is accessed constantly and this affects the overall speed of compilation
the symbol table is often structured as a hash table where keyword/identifier is hashed to produce array subscript
collisions (synonyms) are inevitable, so a synonym is stored in the next available free space
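a minimal sketch of a hashed symbol table that stores a colliding entry in the next free space (linear probing). the class name, table size and hash function are assumptions chosen to make a collision easy to see, not how any particular compiler does it:

```python
class SymbolTable:
    def __init__(self, size=11):
        self.slots = [None] * size

    def _hash(self, name):
        # Deliberately simple hash: sum of character codes modulo table size.
        return sum(ord(ch) for ch in name) % len(self.slots)

    def insert(self, name, info):
        index = self._hash(name)
        for _ in range(len(self.slots)):
            if self.slots[index] is None or self.slots[index][0] == name:
                self.slots[index] = (name, info)
                return index
            index = (index + 1) % len(self.slots)  # collision: try next slot
        raise RuntimeError("symbol table full")

    def lookup(self, name):
        index = self._hash(name)
        for _ in range(len(self.slots)):
            entry = self.slots[index]
            if entry is None:
                return None  # reached an empty slot, so the name is absent
            if entry[0] == name:
                return entry[1]
            index = (index + 1) % len(self.slots)
        return None

table = SymbolTable()
first = table.insert("ab", {"kind": "variable", "type": "integer"})
second = table.insert("ba", {"kind": "variable", "type": "real"})  # same hash as "ab"
print(first, second)               # 8 9
print(table.lookup("ba")["type"])  # real
```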
what is syntax and semantic analysis?
syntax analysis
the stream of tokens from the lexer is split into phrases
each phrase is parsed (checking the phrase against a set of language rules to determine whether it is a valid sentence)
stacks are used to check that brackets are correctly paired
expressions are converted into a form from which machine code can be more easily generated
semantic analysis
it is possible to create a program which has correct syntax but isn’t a correct program
semantic analysis checks the meaning of the code
eg using an identifier that wasn’t previously declared, assigning a real value to an integer variable
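the stack-based bracket check mentioned under syntax analysis can be sketched as follows. the function name is made up for this example, and a real parser would check far more than brackets:

```python
PAIRS = {")": "(", "]": "[", "}": "{"}

def brackets_balanced(text):
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            stack.append(ch)  # opening bracket: remember it
        elif ch in PAIRS:
            # closing bracket: must match the most recent opening bracket
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack  # anything left on the stack was never closed

print(brackets_balanced("(a + b[2]) * {c}"))  # True
print(brackets_balanced("(a + b]"))           # False
print(brackets_balanced("((a)"))              # False
```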
what is code generation?
once the program has been checked, the compiler generates machine code
this could happen in several passes over the code because code optimisation also has to happen
what is code optimisation?
aims to reduce execution time of object program, making it more efficient
detects redundant instructions
replaces inefficient code with code that achieves the same effect but is more efficient
disadvantages
increases compilation time
sometimes produces unexpected results (for example, if a section of code was intentionally written to take a long time, the optimiser may remove the delay)
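one common optimisation is constant folding, sketched below together with removal of a redundant multiply-by-one. the nested-tuple expression format ("op", left, right) is an assumption made for this example and is not how real compilers store code:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Simplify an expression tree written as nested ("op", left, right) tuples."""
    if not isinstance(node, tuple):
        return node  # a number or a variable name: nothing to do
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)  # both operands known at compile time
    if op == "*" and 1 in (left, right):
        return right if left == 1 else left  # x * 1 is a redundant instruction
    return (op, left, right)

print(fold(("+", "x", ("*", 2, ("+", 3, 4)))))  # ('+', 'x', 14)
print(fold(("*", ("-", 5, 4), "y")))            # y
```

the optimised tree gives the same result at run time but with fewer operations left to execute.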
what is the linker?
once a program has been compiled, any separately compiled subroutines must be linked into the object code
could be input/output routines, subroutines from the language’s libraries, or subroutines written by the programmer
the linker puts the appropriate memory addresses into all the external call and return instructions so that the modules are correctly linked
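a simplified sketch of what the linker does with those addresses. the module format (a dictionary with "exports" and "code"), the instruction names and the one-address-per-instruction layout are all hypothetical:

```python
def link(modules):
    # Pass 1: place modules one after another and record where each
    # exported subroutine ends up in the combined program.
    addresses, base = {}, 0
    for module in modules:
        for symbol, offset in module["exports"].items():
            addresses[symbol] = base + offset
        base += len(module["code"])

    # Pass 2: replace each symbolic CALL target with its resolved address.
    program = []
    for module in modules:
        for opcode, operand in module["code"]:
            if opcode == "CALL":
                if operand not in addresses:
                    raise NameError(f"unresolved external symbol: {operand}")
                operand = addresses[operand]
            program.append((opcode, operand))
    return program

main = {"exports": {"main": 0},
        "code": [("CALL", "print_total"), ("HALT", None)]}
library = {"exports": {"print_total": 0},
           "code": [("OUT", None), ("RET", None)]}

executable = link([main, library])
print(executable)
# [('CALL', 2), ('HALT', None), ('OUT', None), ('RET', None)]
```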
what is the loader?
copies the program and any linked subroutines into main memory to run, provided programmer hasn’t used absolute addresses and object code is in relocatable format
when the executable code was created, it may have been assumed that the program would be loaded at memory address 0
the loader needs to relocate some memory addresses in the program because that part of memory may already be in use
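relocation can be sketched as adding the load address to every address that was generated relative to 0. the object code format and the list of relocatable positions are assumptions for illustration only:

```python
def load(object_code, relocation_offsets, base_address):
    """Return a copy of the code with each relocatable address shifted by base_address."""
    loaded = list(object_code)
    for offset in relocation_offsets:
        opcode, address = loaded[offset]
        loaded[offset] = (opcode, address + base_address)
    return loaded

# Object code produced on the assumption that it loads at address 0.
object_code = [("LOAD", 3), ("JUMP", 0), ("HALT", None), ("DATA", 42)]
relocation_offsets = [0, 1]  # positions of instructions that hold addresses

in_memory = load(object_code, relocation_offsets, base_address=1000)
print(in_memory)
# [('LOAD', 1003), ('JUMP', 1000), ('HALT', None), ('DATA', 42)]
```

the data value 42 is left unchanged because it is not an address, which is why the loader needs to know which positions are relocatable.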
what are libraries?
software libraries contain pre-written, pre-compiled programs which can be loaded and run when required
most compiled languages have their own built-in libraries of pre-written functions that can be invoked from the user’s program
programmers can also write their own libraries
for example, library routines can generate random numbers or provide a GUI
library routines have already been tested, so they are generally reliable and save the programmer time
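as a small illustration, here are two routines from Python's standard library being called instead of being written by hand (Python is used only as a convenient example; the notes above apply equally to compiled languages):

```python
import math
import random

random.seed(1)               # fixed seed so the sequence is repeatable
roll = random.randint(1, 6)  # pre-written random number routine
root = math.sqrt(16)         # pre-written maths routine

print(roll)
print(root)  # 4.0
```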