Lec 2 Compiler
Compiler Design Lecture Notes
Page 2: Phases of Compilers
Lexical Analysis (Scanner)
Syntax Analysis Phase
Global Optimization
Code Generation
Local Optimization
Page 3: Outline of Finite State Machine
Finite State Machine
Regular Expression
Implementation with Finite State Machines
Page 5: The Role of a Lexical Analyzer
Goal: Partition the input string into substrings, where each substring is a token
Example: if (i == j) Z = 0; else Z = 1;
To the lexical analyzer, this input is just a string of characters
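As a rough illustration of this partitioning, the sketch below splits the example statement into token substrings. The names `TOKEN_PATTERN` and `split_into_tokens` are illustrative and not part of the lecture, and the pattern only covers the kinds of characters that appear in the example.

```python
import re

# Assumed token shapes: identifiers/keywords, integers, "==", or any other
# single non-space character. Whitespace matches nothing and is skipped.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|==|\S")

def split_into_tokens(source):
    """Return the token substrings of source in order of appearance."""
    return TOKEN_PATTERN.findall(source)

tokens = split_into_tokens("if (i == j) Z = 0; else Z = 1;")
print(tokens)
# ['if', '(', 'i', '==', 'j', ')', 'Z', '=', '0', ';', 'else', 'Z', '=', '1', ';']
```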
Page 6: Finite State Machine and Tokens
Before lexical analysis, we cover the concepts of finite state machines and regular expressions
Tokens are substrings of the input string
In English: noun, verb, adjective
In a programming language: Identifier, Keyword, Operator, Special Character
Page 8: Symbols and Finite State Machine
A symbol is an abstract entity with no meaning by itself
Examples of symbols: Letters, Digits, Special Characters
Page 9: Alphabet and Finite State Machine
An alphabet is a non-empty finite set of symbols
Examples of alphabets: {a, b, c}, {a, b, ..., z}, {0, 1}
Page 12: Language and Finite State Machine
A language is a set of strings of symbols from an alphabet
Examples of languages: Set of palindromes, Set of strings with a pattern
Page 13: Definition of Finite State Machine
A Finite State Machine is a mathematical model of computation
It is defined as a 5-tuple denoted by M=(Q, Σ, δ, q0, F)
Q: Finite, non-empty set of states
Σ: Input Alphabet
q0: Initial State
F: Set of Final or Accepting States
δ: Transition function (mapping function), δ: Q × Σ → Q
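To make the 5-tuple concrete, here is a minimal sketch that stores each component as a plain Python value. The machine itself (binary strings that end in 0) is an assumption chosen for illustration and does not come from the notes.

```python
# M = (Q, Sigma, delta, q0, F) for an assumed example machine that accepts
# binary strings ending in 0.
Q = {"A", "B"}                      # finite, non-empty set of states
SIGMA = {"0", "1"}                  # input alphabet
DELTA = {                           # transition function: Q x Sigma -> Q
    ("A", "0"): "B", ("A", "1"): "A",
    ("B", "0"): "B", ("B", "1"): "A",
}
Q0 = "A"                            # initial state
F = {"B"}                           # accepting states

def accepts(string):
    """Run the machine on string; True if it stops in an accepting state."""
    state = Q0
    for symbol in string:
        if symbol not in SIGMA:
            return False            # symbol outside the alphabet
        state = DELTA[(state, symbol)]
    return state in F

print(accepts("0110"))  # True
print(accepts("01"))    # False
```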
Page 14: Representation of Finite State Machine
A finite state machine can be represented by a transition diagram or a transition table
Transition diagram: states are represented by circles, the transition function by arcs labeled with input symbols
Transition table: rows indicate states, columns indicate input symbols
Page 21: Regular Expression
A regular expression is another method for specifying a language, using a pattern that describes its strings
It is built from three operations: union, concatenation, and Kleene star
Page 22: Operations in Regular Expressions
Union (+): The set of strings that belong to either of two sets
Concatenation: Each string of the first set followed by each string of the second set
Kleene star (*): Zero or more concatenations of strings from a set
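The three operations can be tried out on small finite sets of strings. This is only a sketch: the function names are illustrative, and because a true Kleene star is an infinite set, `kleene_star` takes an assumed bound `max_repeats` on the number of concatenations.

```python
from itertools import product

def union(a, b):
    """Strings that are in either set."""
    return a | b

def concatenation(a, b):
    """Every string of a followed by every string of b."""
    return {x + y for x, y in product(a, b)}

def kleene_star(a, max_repeats):
    """Strings formed by 0..max_repeats concatenations of strings from a."""
    result = {""}                   # zero concatenations: the empty string
    layer = {""}
    for _ in range(max_repeats):
        layer = concatenation(layer, a)
        result |= layer
    return result

print(sorted(union({"a", "b"}, {"c"})))          # ['a', 'b', 'c']
print(sorted(concatenation({"a", "b"}, {"c"})))  # ['ac', 'bc']
print(sorted(kleene_star({"ab"}, 2)))            # ['', 'ab', 'abab']
```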
Page 23: Examples of Regular Expressions
Examples of regular expressions and their corresponding languages
Page 24: More Examples of Regular Expressions
Examples of regular expressions and their corresponding languages
Page 25: Examples of Strings in Regular Expressions
Examples of strings that belong to the languages defined by regular expressions
Page 26: Regular Expressions Example
Language 1: Strings containing an odd number of zeros.
Regular expression: 1*01*(01*01*)*
Language 2: Strings containing three consecutive ones.
Regular expression: (0+1)*111(0+1)*
Language 3: Strings containing exactly three zeros.
Regular expression: 1*01*01*01*
Language 4: Strings that begin with 1 and end with 0.
Regular expression: 1(0+1)*0
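One way to sanity-check expressions like these is to run them through Python's `re` module. The sketch assumes the four expressions are read with explicit Kleene stars, as 1*01*(01*01*)*, (0+1)*111(0+1)*, 1*01*01*01* and 1(0+1)*0, and it writes union as `|`, since Python uses that symbol where the notes use `+`. The dictionary keys and the helper name are illustrative.

```python
import re

PATTERNS = {
    "odd number of zeros": r"1*01*(01*01*)*",
    "three consecutive ones": r"(0|1)*111(0|1)*",
    "exactly three zeros": r"1*01*01*01*",
    "begins with 1, ends with 0": r"1(0|1)*0",
}

def in_language(name, string):
    """True if the whole string matches the named regular expression."""
    return re.fullmatch(PATTERNS[name], string) is not None

print(in_language("odd number of zeros", "10100"))       # True  (three zeros)
print(in_language("odd number of zeros", "1001"))        # False (two zeros)
print(in_language("three consecutive ones", "0111"))     # True
print(in_language("exactly three zeros", "1010101"))     # True
print(in_language("begins with 1, ends with 0", "110"))  # True
```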
Page 27: Regular Expressions Example
Language L1 is the set of strings over {0, 1} with an even number of ones (even parity).
Which of these strings belong to L1? a) 0101 b) 110211 c) 000 d) 010011 e) ε
Answer: a) 0101 c) 000 e) ε
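The answer can be checked with a two-state machine that flips between "even" and "odd" on every 1. This is a sketch, assuming the alphabet is {0, 1}, so a string containing any other symbol (such as the 2 in candidate b) is rejected outright. The function name `in_L1` is illustrative.

```python
def in_L1(string):
    """True if string is over {0, 1} and contains an even number of ones."""
    state = "even"
    for symbol in string:
        if symbol == "1":
            state = "odd" if state == "even" else "even"
        elif symbol != "0":
            return False            # symbol outside the alphabet {0, 1}
    return state == "even"

candidates = {"a": "0101", "b": "110211", "c": "000", "d": "010011", "e": ""}
members = [label for label, s in candidates.items() if in_L1(s)]
print(members)  # ['a', 'c', 'e']
```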
Page 28: Regular Expressions Example
Language L2 is the set of strings with an equal number of a's, b's, and c's.
Which of these strings belong to L2? a) bca b) accbab c) ε d) aaa e) aabbcc
Answer: a) bca b) accbab c) ε e) aabbcc
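Membership in L2 can be checked by counting symbols directly. The sketch assumes the alphabet is {a, b, c}, and the name `in_L2` is illustrative.

```python
def in_L2(string):
    """True if string is over {a, b, c} with equally many a's, b's and c's."""
    if any(ch not in "abc" for ch in string):
        return False
    return string.count("a") == string.count("b") == string.count("c")

candidates = {"a": "bca", "b": "accbab", "c": "", "d": "aaa", "e": "aabbcc"}
members = [label for label, s in candidates.items() if in_L2(s)]
print(members)  # ['a', 'b', 'c', 'e']
```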
Page 29: Regular Expressions Example
Which of these strings are in the language specified by the finite state machine?
Candidates: a) abab b) bbb c) aaab d) aaa e) ε
Answer: a) abab b) bbb c) aaab e) ε
Page 30: Regular Expressions Example
Construct finite state machines for the following regular expressions:
(a+b)*c
(aa)*(bb)*c
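One possible pair of machines for these two expressions is sketched below as transition dictionaries. The state numbering is an assumption, other correct machines exist, and a missing entry stands for a transition to a rejecting dead state.

```python
def run(delta, start, accepting, string):
    """Simulate a machine; a missing transition means the string is rejected."""
    state = start
    for symbol in string:
        state = delta.get((state, symbol))
        if state is None:
            return False
    return state in accepting

# (a+b)*c : loop on a or b in state 0, then a single c leads to state 1.
DELTA_1 = {(0, "a"): 0, (0, "b"): 0, (0, "c"): 1}

# (aa)*(bb)*c : states 0/1 track pairs of a's, states 2/3 track pairs of b's,
# and state 4 is reached by the final c.
DELTA_2 = {
    (0, "a"): 1, (1, "a"): 0,
    (0, "b"): 2, (2, "b"): 3, (3, "b"): 2,
    (0, "c"): 4, (3, "c"): 4,
}

def matches_1(s):
    return run(DELTA_1, 0, {1}, s)

def matches_2(s):
    return run(DELTA_2, 0, {4}, s)

print(matches_1("abbac"))   # True
print(matches_2("aabbc"))   # True
print(matches_2("abc"))     # False (odd number of a's)
```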
Page 31: Lexical Analysis Example
Java source input example with word boundaries and types.
Output of the lexical analysis phase is a stream of tokens.
Each token consists of two parts: a class indicating the kind of token, and a value indicating which member of that class it is.
Page 32: Lexical Analysis Example
Show word boundaries and token classes for Java input strings.
Lexical analysis phase does not check for proper syntax.
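The two ideas above, tokens as (class, value) pairs and a lexical phase that does not check syntax, can be sketched with a small regex-based scanner. The class names, the keyword subset and the `Token` type are assumptions for illustration and do not reproduce the Java example from the slides. Characters that match no pattern are silently skipped in this sketch.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    token_class: str   # the kind of token
    value: str         # which member of that class

KEYWORDS = {"if", "else", "while", "int"}   # small illustrative subset

TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r"==|[=+\-*/<>]"),
    ("special", r"[(){};]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return the stream of tokens found in source."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "identifier" and text in KEYWORDS:
            kind = "keyword"
        tokens.append(Token(kind, text))
    return tokens

for token in tokenize("if (i == j) Z = 0;"):
    print(token.token_class, token.value)

# An ill-formed statement still tokenizes without complaint: syntax errors
# are left for the syntax analysis phase to detect.
print(tokenize(") else ("))
```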
Page 33: Lexical Analysis Examples of Finite State Machines
Finite state machines for lexical analysis.
Machine accepts keywords: if, int, import, for, float.
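A machine for this keyword set can be built mechanically by letting words with a common prefix share states. The sketch below is one assumed construction (a trie), not necessarily the machine drawn on the slide, and the function names are illustrative.

```python
KEYWORDS = ["if", "int", "import", "for", "float"]

def build_machine(words):
    """Build a transition function in which shared prefixes share states."""
    delta = {}
    accepting = set()
    next_state = 1                  # state 0 is the start state
    for word in words:
        state = 0
        for ch in word:
            if (state, ch) not in delta:
                delta[(state, ch)] = next_state
                next_state += 1
            state = delta[(state, ch)]
        accepting.add(state)        # the state reached at the end of a keyword
    return delta, accepting

DELTA, ACCEPTING = build_machine(KEYWORDS)

def is_keyword(string):
    state = 0
    for ch in string:
        state = DELTA.get((state, ch))
        if state is None:           # no arc for this character: reject
            return False
    return state in ACCEPTING

print([w for w in ["if", "in", "int", "imp", "float", "floats"] if is_keyword(w)])
# ['if', 'int', 'float']
```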
Page 34: Implementation with Finite State Machines
A finite state machine can be implemented using a two-dimensional array.
The array has a row for each state and a column for each input symbol; each entry holds the next state.
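A minimal sketch of the array implementation follows, using a nested list with one row per state and one column per input symbol. The example machine (binary strings with an odd number of zeros) and the names `TABLE` and `SYMBOL_COLUMN` are assumptions for illustration.

```python
SYMBOL_COLUMN = {"0": 0, "1": 1}    # maps each input symbol to a column index

TABLE = [
    # on '0'  on '1'
    [1,       0],   # state 0: even number of zeros so far
    [0,       1],   # state 1: odd number of zeros so far
]
START = 0
ACCEPTING = {1}

def accepts(string):
    state = START
    for symbol in string:
        column = SYMBOL_COLUMN.get(symbol)
        if column is None:
            return False            # symbol outside the input alphabet
        state = TABLE[state][column]    # one array lookup per input symbol
    return state in ACCEPTING

print(accepts("10100"))  # True  (three zeros)
print(accepts("1001"))   # False (two zeros)
```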
Page 36: Actions for Finite State Machines
Finite state machines can be used for more than recognizing words.
Actions can be associated with each state transition.
Page 37: Example of FSM with Action
Design a finite state machine to read numeric strings and convert them to an appropriate internal numeric format.
Attach method calls to the transitions: digits(), decimals(), minus(), and expDigits().
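The notes name the four actions but not what they do, so the sketch below fills them in under stated assumptions: numbers look like `12.5e-1`, `minus()` records a negative exponent sign, and the result is mantissa × 10^(exponent − decimal places). `exp_digits` is the Python spelling of expDigits(). The state numbering and the `NumberReader` class are hypothetical.

```python
from fractions import Fraction

class NumberReader:
    def __init__(self):
        self.mantissa = 0    # all digits before 'e', read as one integer
        self.places = 0      # number of digits after the decimal point
        self.exp_sign = 1
        self.exponent = 0

    # Actions attached to transitions (assumed behavior).
    def digits(self, ch):
        self.mantissa = self.mantissa * 10 + int(ch)

    def decimals(self, ch):
        self.digits(ch)
        self.places += 1

    def minus(self, ch):
        self.exp_sign = -1

    def exp_digits(self, ch):
        self.exponent = self.exponent * 10 + int(ch)

    def value(self):
        power = self.exp_sign * self.exponent - self.places
        return self.mantissa * Fraction(10) ** power   # exact arithmetic

# (state, symbol class) -> (next state, action or None)
TRANSITIONS = {
    (0, "digit"): (1, NumberReader.digits),
    (1, "digit"): (1, NumberReader.digits),
    (1, "."): (2, None),
    (1, "e"): (3, None),
    (2, "digit"): (2, NumberReader.decimals),
    (2, "e"): (3, None),
    (3, "-"): (4, NumberReader.minus),
    (3, "digit"): (5, NumberReader.exp_digits),
    (4, "digit"): (5, NumberReader.exp_digits),
    (5, "digit"): (5, NumberReader.exp_digits),
}
ACCEPTING = {1, 2, 5}

def read_number(text):
    """Return the value of text as a Fraction, or None if it is rejected."""
    reader = NumberReader()
    state = 0
    for ch in text:
        symbol_class = "digit" if ch in "0123456789" else ch
        step = TRANSITIONS.get((state, symbol_class))
        if step is None:
            return None
        state, action = step
        if action is not None:
            action(reader, ch)      # run the action tied to this transition
    return reader.value() if state in ACCEPTING else None

print(read_number("12.5e-1"))  # 5/4
print(read_number("3e2"))      # 300
print(read_number("1e-"))      # None
```

Using `Fraction` keeps the result exact; a production scanner would more likely convert to a floating-point value at the end.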
Page 39: How to implement Lexical Tables?
The lexical analysis phase creates tables that the later phases of the compiler rely on.
Tables can include a symbol table, table of numeric constants, string constants, and statement labels.
Implementation techniques: Sequential Search, Binary Search Tree, Hash Table.
Page 40: Sequential Search
The table is organized as an array or linked list.
Building a table of n words takes O(n^2) time, because each insertion first searches the existing entries one by one.
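The quadratic cost can be seen by counting comparisons while a table is built with sequential search. This is a minimal sketch, and `build_table` is an illustrative name.

```python
def build_table(words):
    """Insert words into a list, searching it sequentially before each insert.

    Returns the table and the total number of comparisons made.
    """
    table = []
    comparisons = 0
    for word in words:
        found = False
        for entry in table:
            comparisons += 1
            if entry == word:
                found = True
                break
        if not found:
            table.append(word)
    return table, comparisons

table, comparisons = build_table(["if", "x", "y", "while"])
print(table, comparisons)   # ['if', 'x', 'y', 'while'] 6
```

For n distinct words the count is 0 + 1 + ... + (n - 1) = n(n - 1)/2, which grows as n^2.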
Page 41: Binary Search Tree
The table is organized as a binary tree.
Building a table of n words takes O(n log n) time in the best case, when the tree stays balanced, and O(n^2) in the worst case, for example when the words arrive in sorted order.
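The gap between the best and worst case shows up when the same words are inserted in different orders. The sketch below counts the nodes visited during insertion; the `Node` class and function names are illustrative.

```python
class Node:
    def __init__(self, word):
        self.word = word
        self.left = None
        self.right = None

def insert(root, word):
    """Insert word if absent; return (root, number of nodes visited)."""
    if root is None:
        return Node(word), 0
    node, visited = root, 0
    while True:
        visited += 1
        if word == node.word:
            return root, visited            # already in the table
        side = "left" if word < node.word else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(word))
            return root, visited
        node = child

def build_tree(words):
    """Build a tree from words; return (root, total nodes visited)."""
    root, total = None, 0
    for word in words:
        root, visited = insert(root, word)
        total += visited
    return root, total

# Same seven words, two insertion orders.
print(build_tree(list("dbfaceg"))[1])   # 10 (balanced tree)
print(build_tree(list("abcdefg"))[1])   # 21 (sorted input, tree is a chain)
```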
Page 43: Hash Table
Selection of a good hash function is critical for the efficiency of this method.
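The effect of the hash function can be sketched with a small chained hash table that takes the function as a parameter. Both hash functions and the bucket count are assumptions for illustration: `poly_hash` spreads words across buckets, while `constant_hash` sends every word to the same bucket, so the table degenerates into a sequential search.

```python
def poly_hash(word):
    """Illustrative hash: a polynomial in the character codes."""
    h = 0
    for ch in word:
        h = h * 31 + ord(ch)
    return h

def constant_hash(word):
    """A deliberately poor hash: every word collides."""
    return 0

class HashTable:
    def __init__(self, hash_function, num_buckets=8):
        self.hash_function = hash_function
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, word):
        return self.buckets[self.hash_function(word) % len(self.buckets)]

    def insert(self, word):
        bucket = self._bucket(word)
        if word not in bucket:      # search only within one bucket
            bucket.append(word)

    def contains(self, word):
        return word in self._bucket(word)

    def longest_chain(self):
        return max(len(bucket) for bucket in self.buckets)

words = ["i", "j", "Z", "count", "total", "index"]
good, bad = HashTable(poly_hash), HashTable(constant_hash)
for w in words:
    good.insert(w)
    bad.insert(w)

print(good.longest_chain(), bad.longest_chain())
```

With `constant_hash` the longest chain equals the number of words, which is the sequential-search behavior a good hash function is meant to avoid.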