Syntax and Semantics Notes
Introduction
- Syntax: The form/structure of expressions, statements, and program units.
- Semantics: The meaning of expressions, statements, and program units.
- Syntax and semantics provide a language’s definition.
Terminology
- Sentence: A string of characters over an alphabet.
- Language: A set of sentences.
- Lexeme: Lowest-level syntactic unit (e.g.,
*
, total
, if
). - Token: A category of lexemes (e.g., identifier, int_literal).
- A token is defined by a pattern and a lexeme is an instance of a token.
Common Tokens
- Identifiers, keywords, separators, operators, literals.
- Recognizers: Decide whether input strings belong to the language (e.g., syntax analysis).
- Generators: Generate sentences of a language (e.g., code generation).
Context-Free Grammars (CFG) and BNF
- Context-Free Grammars: Language generators.
- Backus-Naur Form (BNF): Equivalent to context-free grammars used to describe syntax.
BNF Fundamentals
- Abstractions (nonterminal symbols) represent syntactic structures.
- Terminals are lexemes or tokens.
- Production Rules: define how non-terminals can be expanded; LHS (nonterminal), RHS (string of terminals/nonterminals).
- Grammar: a finite non-empty set of rules.
- Start symbol: a special element of the nonterminals.
BNF Rules
- Abstractions can have multiple RHS.
- Abstractions can use recursion (e.g.,
<ident_list> → <ident> | <ident>, <ident_list>
). - Derivation: repeated application of rules, starting with the start symbol.
Derivations
- Sentential form: A string of symbols in a derivation.
- Sentence: A sentential form with only terminal symbols.
- Leftmost derivation: Replaced non-terminal is always the leftmost.
- Rightmost derivation: Replaced non-terminal is always the rightmost.
Parse Tree
- Hierarchical structure showing the derivation process.
Ambiguity in Grammars
- A grammar is ambiguous if it generates a sentential form with two or more distinct parse trees.
Removing Ambiguity in Grammars
- Step I: Specify precedence using different non-terminals for each level.
- Step II: Specify associativity using recursion on either the left or right non-terminal.
Extended BNF
- Optional parts
([])
, choices (|)
, repetitions ({})
.
Syntax Analysis
- Lexical analyzer (low-level): finite automaton based on regular grammar.
- Syntax analyzer/parser (high-level): push-down automaton based on CFG/BNF.
Reasons to Separate Lexical and Syntax Analysis
- Simplicity, efficiency, portability.
Lexical Analyzer
- Pattern matcher for character strings, identifies lexemes.
Lexical Analyzer Functions
- Extract lexemes and produce tokens, skip comments.
- Insert lexemes into symbol table.
- Detect syntactic errors.
Building a Lexical Analyzer
- Use a software tool.
- Design a state diagram and write a program.
- Design a state diagram and hand-construct a table-driven implementation.
State Diagram Design
- Combine transitions and use character classes.
- Reserved words and identifiers can be recognized together.
Useful Subprograms (Functions) in the Lexical Analyzer
Parsing Problem
- Find all syntax errors and produce diagnostic messages.
- Produce the parse tree or a trace.
- Top-down: Begins at the root, leftmost derivation, traces the tree in preorder.
- Bottom-up: Begins at the leaves, reverse of a rightmost derivation.
Notation Conventions
- Terminals: lowercase letters at the beginning of the alphabet.
- Nonterminals: uppercase letters at the beginning of the alphabet.
Top-Down Parsers
- Choose the correct A-rule to get the next sentential form.
- Common algorithms: recursive descent, table-driven (LL algorithm: Left-to-right, Leftmost derivation).
Bottom-Up Parsers
- Identify the substring corresponding to the right-hand side of a grammar rule.
- Common algorithms: LR family (Left-to-right, Rightmost derivation).
Complexity of Parsing
- Complex parsers are O(n^3), compilers use linear parsers O(n).
Recursive-Descent Parsing
- Subprogram for each nonterminal in the grammar.
- EBNF is suited, minimizes nonterminals.
- Process: compare terminal symbols, invoke parsing subprograms for each nonterminal.
The LL Grammar Class Problems
- Left Recursion: Cannot be the basis for a top-down parser. Solution: Remove left recursion.
- Inability to Determine the Correct RHS : using left factoring eliminate common prefixes.
Bottom-up Parsing
- Generates the reverse of a rightmost derivation. It find the specific RHS, handle, in the right sentential form.
Bottom-up Parsing – Shift-Reduce Algorithms
- Shift and reduce are the two most common actions they specify
-Shift: moving the next input token to the top of the parse stack
-Reduce: replacing the handle on the top of the parse stack with its corresponding LHS
Bottom-up Parsing – LR Parser
Three key advantages of LR parsers:
They can be built for all programming languages
They can detect syntax errors as soon as it is possible
The LR, which most left recursive.
Static Semantics
- Rules that constrain syntactically correct programs (e.g., type checking).
- Verified at compile time.
- Described using Attribute Grammars.
Attribute Grammars (AGs)
- Additions to CFGs to carry semantic info on parse tree nodes.
Attribute Grammar: Definition
- For each grammar symbol X, there is a set A(X) of attribute values.
- Each rule has functions defining attributes and predicates checking consistency.
## Dynamic Semantics - Several needs for a methodology and notation for semantics:
-Programmers need to know what statements mean
-Compiler writers must know exactly what language constructs do
- Correctness proofs would be possible
-Compiler generators would be possible
## Three formal methods to describe Dynamic Semantics
-Operational Semantics
-Denotational Semantics
-Axiomatic Semantics
Operational Semantics
- Describe the meaning by specifying the effects of running it on a machine.
Denotational Semantics
Based on recursive function theory.
-Process of building a denotational specification for a language:
-Define a mathematical object for each language entity
-Define a function that maps instances of the language entities
Onto instances of the corresponding mathematical objects
The meaning of language constructs are defined by the
values of the program's variables and how they change over time.
Axiomatic Semantics
- Based on formal logic (predicate calculus).
- Original purpose: formal program verification.
{P} statement {Q}, P: Precondition; Q: Postcondition,
a = b / 2 – 1 {a < 10} {b < 22} a = b / 2 – 1 {a < 10}