Syntax
the external representation of a program: its concrete textual form, as opposed to its meaning
whether syntax is right or wrong: given some text, is it well formed? i.e. looking only at the actual text, does it adhere to the rules so it can be correctly parsed and compiled
Semantics
Denoting the meaning of the program, given that the source code is well-formed
This often depends on the context of the program, because operators can have different meanings depending on context.
Phases of a compiler
lexer (text to tokens)
turns raw text into tokens: reads the input character by character and groups characters into chunks it recognizes
discards whitespace; recognizes identifiers, keywords, constants, and symbols
parser (tokens to parse trees)
checks if tokens follow the GRAMMAR of the LANGUAGE (syntax rules)
then builds a parse tree structure
semantic analyzer (from parse tree to abstract syntax tree)
figures out the meaning of the code: resolves names, checks types, and builds the abstract syntax tree
Intermediate code generation:
translates the program into an intermediate representation: machine-independent code sitting between the source language and the target machine code
optimization: MACHINE INDEPENDENT
this eliminates redundancies in the code (e.g. dead code, repeated subexpressions) without assuming anything about the hardware
Target code generation
optimization (machine dependent)
instruction scheduling: may reorder instructions in the target machine code
register allocation
machine dependent or independent compiler steps
Machine independent means the step has no specific hardware requirements to work
dependent means the step depends on hardware, like: number of registers, instruction set
Grammar defined as a 4-tuple
Terminal symbols:
alphabet of the language
symbols appearing in the final strings that CANNOT be expanded further
in compilers these are the TOKENS
Non-terminal symbols
abstract syntactical categories
used to define the structure
can be expanded using rules
Example: Exp , Term, Factor…
Distinguished non terminal:
the ROOT symbol: the starting point of the grammar, e.g. S = Expr
Rewrite rules:
the rules that define how the non-terminals are expanded
e.g. Expr can be expanded as Expr + Term, and a Term can be expanded as Term * Factor
Then we also define the empty string as epsilon (ε), which maps to no terminal or non-terminal symbol
Define Language
A set of sentences containing only terminal symbols that can be generated by applying the rewriting rules starting from the root symbol
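The definitions above can be sketched in code: the Expr/Term/Factor rules from the example as a table, plus a helper function (my own, for illustration) that applies rewrite rules to the leftmost occurrence of a non-terminal until only terminal symbols remain:

```python
# Toy grammar from the notes (Expr, Term, Factor); the rule table
# and the derivation steps below are illustrative.
rules = {
    "Expr":   [["Expr", "+", "Term"], ["Term"]],
    "Term":   [["Term", "*", "Factor"], ["Factor"]],
    "Factor": [["id"]],
}

def derive(steps, start="Expr"):
    """Apply (non-terminal, rule-index) choices to the leftmost
    occurrence of each non-terminal, returning the final sentence."""
    sentential = [start]
    for nt, idx in steps:
        i = sentential.index(nt)            # leftmost occurrence
        sentential[i:i + 1] = rules[nt][idx]
    return sentential

# Expr => Expr + Term => Term + Term => Factor + Term
#      => id + Term => id + Factor => id + id
sentence = derive([("Expr", 0), ("Expr", 1), ("Term", 1),
                   ("Factor", 0), ("Term", 1), ("Factor", 0)])
```

The sentence `id + id` contains only terminals, so it belongs to the language generated by this grammar.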
BNF vs EBNF
BNF: bare-bones notation: only alternation and sequencing, no abbreviations
Example of alternation and sequencing:
Sym ::= letter | digit
Id ::= letter Sym
EBNF : extended BNF
adds shorthand, for example to show repetition:
Id ::= letter {Sym}
this means Sym can be repeated zero or more times
EBNF and BNF code to describe grammar
EBNF ADDS SHORTCUTS:
{} repeat zero or more times
* zero or more repetitions of the element directly before it
+ one or more repetitions of the element directly before it
[] optional: zero or one occurrence
Num ::= Digit+[.Digit*]
If these metasymbols ([, ], +, *, …) also appear in the language as terminal symbols, we need a convention to tell metasymbols and terminals apart (e.g. quoting the terminals)
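As a sketch, the Num rule above maps directly onto a regular expression, with \d playing the role of Digit:

```python
import re

# Num ::= Digit+ [ "." Digit* ]  expressed as a regex: one or more
# digits, optionally followed by a dot and zero or more digits.
NUM = re.compile(r"\d+(\.\d*)?\Z")

def is_num(s):
    return NUM.match(s) is not None
```

Note that the rule admits a trailing dot ("3.") because Digit* allows an empty sequence, but rejects ".5" because Digit+ requires at least one digit before the dot.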
Grammar for floating point
a floating point number is defined in terms of digits, and digits in terms of single digits:
Floating ::= Digits | Digits . Digits
Digits ::= Digit | Digit Digits
Digit ::= 0 | 1 | 2 | … | 9
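A recognizer can follow this grammar almost literally; the function names below are made up for illustration:

```python
def is_digits(s):
    # Digits ::= Digit | Digit Digits  (i.e. one or more digits)
    return len(s) > 0 and all(c in "0123456789" for c in s)

def is_floating(s):
    # Floating ::= Digits | Digits . Digits
    if is_digits(s):
        return True
    head, dot, tail = s.partition(".")
    return dot == "." and is_digits(head) and is_digits(tail)
```

Unlike the EBNF Num rule above, this grammar requires digits on both sides of the dot, so "3." is rejected here.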
when is grammar ambiguous
if the parse tree of some sentence is not unique
e.g. if the precedence of + and * is not established, we get multiple parse trees for the same sentence
precedence indicates
grouping not order of evaluation
how can you specify precedence
writing precedence rules directly into the grammar: higher-precedence rules end up deeper in the parse tree than other rules
write an ambiguous grammar and specify the precedence separately
associativity
when operators are at the same level of precedence, associativity dictates how we group
e.g. 5-2-3 gives a different result depending on the grouping, because - is not associative
if we do left associativity : (5-2)-3
if we do right associativity : 5-(2-3)
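Both ideas can be sketched in one place: a minimal, hypothetical recursive-descent evaluator where the rule hierarchy encodes precedence (Term sits deeper than Expr) and the while-loops give the operators left associativity:

```python
import re

def evaluate(src):
    """Tiny expression evaluator; * binds tighter than + and -,
    and all operators associate to the left."""
    tokens = re.findall(r"\d+|[+\-*()]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        if peek() == "(":
            advance()
            val = expr()
            advance()            # consume the closing ")"
            return val
        return int(advance())

    def term():                  # * : higher precedence, parsed deeper
        val = factor()
        while peek() == "*":
            advance()
            val = val * factor()
        return val

    def expr():                  # + and - : lower precedence
        val = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            val = val + rhs if op == "+" else val - rhs
        return val

    return expr()
```

Because expr loops left to right, "5-2-3" groups as (5-2)-3 and gives 0, and because term is called from expr, * lands deeper in the tree: "2+3*4" gives 14.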
solutions to the dangling if statement
Pascal rule: the else attaches to the nearest open if statement
grammatically: have different productions for unbalanced and balanced if statements
add a grammatical symbol (e.g. end if) to mark the end of an if statement
scanners
read input and extract tokens from it
identifiers
constants
keywords
symbols like ( , ?, ! etc
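A minimal sketch of such a scanner, with made-up token categories and patterns:

```python
import re

# Illustrative token classes; KEYWORD is listed before IDENT so that
# reserved words are not classified as identifiers.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("KEYWORD", r"\b(?:if|else|while)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("SYMBOL",  r"[()+\-*/=;]"),
    ("SKIP",    r"\s+"),
]

def scan(text):
    """Read the input left to right, grouping characters into tokens
    and discarding whitespace."""
    regex = "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC)
    tokens = []
    for m in re.finditer(regex, text):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

For example, scan("if x1 = 42;") yields a keyword, an identifier, two symbols, and a constant, with the whitespace dropped.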
parsers
accept a grammar and tokens as input and attempt to construct a parse tree
LL parsing: top-down / depth first, meaning we begin at the start symbol, predict the next rewrite rule, and recurse
what rule next?
LR: bottom-up: reduces sequences of input tokens already seen to the left-hand-side non-terminals whose right-hand sides they match
Usually faster, and handles more grammars
reconstructs the derivation from the input
reduces the input back to the start symbol
left recursion: (issue w LL )
if there exists a non-terminal A such that A ⇒+ Aα for some α
because the parser works top-down from left to right, it keeps re-expanding A without ever consuming an input token: infinite recursion
standard fix: rewrite A ::= Aα | β as A ::= β A', A' ::= α A' | ε
binding
the association of a name to something
binding time: the time at which the association is made
static vs dynamic binding
static binding :The association between a name and its type/function is made at compile time.
e.g. in C++ a local variable of a base-class type holds the object directly; assigning a derived object to it causes object slicing, since the binding to the base type was already made at compile time
the assignment succeeds, but it is not the expected reassignment: only the base part is copied, because the derived part doesn't fit in the allocated memory
dynamic binding: The association between a name and its function is made at runtime.
Pointers and references allow dynamic binding because they store a memory address to the full object, allowing flexible assignment at runtime.
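In Python every method call is dynamically bound, so the idea is easy to sketch (the class names are illustrative):

```python
# Which speak() runs is decided by the runtime type of the object,
# not by any declared type of the name referring to it.
class Animal:
    def speak(self):
        return "..."

class Dog(Animal):
    def speak(self):
        return "woof"

def call_speak(a: Animal) -> str:
    # The annotation only documents intent; the dispatch happens
    # at runtime based on the object actually passed in.
    return a.speak()
```

Passing a Dog through a name "typed" as Animal still calls Dog's method, which is exactly what pointers and references buy you in C++.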
lifetimes of different var types
static objects: lifetime of the program
static variables
global variables
stack objects: lifetime of the activation record (stack frame)
local variables
Heap obj:
arbitrary lifetime
these are dynamically allocated memory, so unless explicitly freed, they remain allocated
static vs dynamic scoping
Static scoping: the binding of a name is determined by the lexical structure of the program text
basically start at the scope you are currently in and trace outward through the enclosing scopes to find the declaration
Dynamic scoping: the binding of a name is given by the most recent declaration encountered at runtime (searching back through the call stack)
static scoping in nested classes
you can see outward from parent scopes but not inward in child scopes
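A quick sketch of static scoping with nested functions (the names are illustrative):

```python
# The x inside inner() is resolved by tracing outward through the
# enclosing scopes in the program text, not through the call stack.
x = "global"

def outer():
    x = "outer"
    def inner():
        return x          # found in the enclosing scope of outer()
    return inner()
```

inner() sees outer's x, not the global one; under dynamic scoping the lookup would instead walk the chain of callers at runtime.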
static, heap and stack memory
static memory is allocated at compile time. This is for strings, constants and static vars.
stacks: allocated in frames as we run subprograms.
LIFO
holds local variables, temporaries, and the return address
Heap :
allocated from main memory according to allocation policy
overloading
methods and operators have several meanings depending on the context (the operand types)
coercion changes the type of a value; overloading means the same name yields different operations
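A sketch of the idea using operator overloading (the Vec class is made up):

```python
# The same symbol + denotes different operations depending on the
# operand types: no type is changed, only the meaning of the name.
class Vec:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __add__(self, other):
        # + on two Vecs means component-wise addition
        return Vec(self.x + other.x, self.y + other.y)

int_sum = 1 + 2                  # numeric addition
str_sum = "foo" + "bar"          # string concatenation
vec_sum = Vec(1, 2) + Vec(3, 4)  # user-defined meaning
```

Contrast with coercion: in 1 + 2.0 the int is converted to a float first; here the operands keep their types and the operator itself is resolved differently.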
control structure
any mechanism departing from the default straight line execution
if statements / case statements
iteration (loops)
calls / returns / exceptions / continue
goto is another:
goto means we have raw branches instead of structured if statements
these branches can be conditional or unconditional
how to implement loops, if, and case statements
by increment, decrement, and branch-on-zero instructions
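This can be sketched with a toy instruction list: a countdown loop expressed using only decrement and branch-on-zero, run on a hypothetical mini-machine:

```python
# A while-style countdown with no structured loop: just a decrement,
# a conditional branch (BRZ), and an unconditional branch (JMP).
def countdown(n):
    program = [
        ("BRZ", 3),    # 0: if acc == 0, jump to HALT
        ("DEC",),      # 1: acc -= 1  (the loop body)
        ("JMP", 0),    # 2: branch back to the test
        ("HALT",),     # 3:
    ]
    acc, pc, steps = n, 0, 0
    while True:
        op = program[pc]
        if op[0] == "BRZ":
            pc = op[1] if acc == 0 else pc + 1
        elif op[0] == "DEC":
            acc -= 1
            steps += 1            # count loop-body executions
            pc += 1
        elif op[0] == "JMP":
            pc = op[1]
        else:                     # HALT
            return steps
```

The body executes n times: this is the branch-based shape a compiler lowers a while loop into.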
short circuit
If the left operand alone already determines the result of the whole operation, we don't need to evaluate the rest.
like
C1 && C2: if C1 is false we short-circuit and the whole expression is false; C2 is never evaluated
C1 || C2: if C1 is true we short-circuit and the whole expression is true; C2 is never evaluated
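A sketch that records which operands actually get evaluated:

```python
# cond() logs its name when called, so the log shows which operands
# the short-circuit operators actually evaluated.
log = []

def cond(name, value):
    log.append(name)          # record evaluation of this operand
    return value

and_result = cond("C1", False) and cond("C2", True)
and_log = list(log)           # C2 never ran: C1 alone decided the result

log.clear()
or_result = cond("C1", True) or cond("C2", False)
or_log = list(log)            # C2 never ran here either
```

In both cases the log contains only "C1": the right operand is skipped once the left one fixes the outcome.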