PRELIM EXAM
Syntax and Semantics
Introduction
Syntax
Form and structure of the program code
Semantics
Meaning of the program code
meaning of expressions, statements, control structures
Syntax and semantics provide a language’s definition
Users of a language definition
Implementers
Programmers (the users of the language)
General Problem of Describing Syntax: Terminology
A sentence is a string of characters over some alphabet
A language is a set of sentences
A lexeme is the lowest level of syntactic unit of a language
*, sum, begin
A token is a category of lexemes
identifiers
Formal Definition of languages
Generators (Grammars)
A device that generates sentences of a languages
One can determine if the syntax of a particular sentence is syntactically correct by comparing it to the structure of the generator
Recognizers (Parser)
A recognition device reads input strings over the alphabet of the language and decides whether the input strings belong to the language
syntax analysis part of a compiler
BNF / Context- Free Grammars
Context-Free Grammars (CFG)
Developed by Noam Chomsky in the mid-1950s
Language generators, meant to describe the syntax of natural languages
Define a class of languages called context-free languages
Backus-Naur Form (1959)
Invented by John Backus to describe Algol 58
BNF and CFG represent equal concepts
BNF - CFG Fundamentals
In BNF, abstractions are used to represent classes of syntactic structures-they act like syntactic variables (also called nonterminal symbols, or just nonterminals)
Terminals are lexemes or tokens
A rule has a left-hand side (LHS), which is a nonterminal, and a right-hand side (RHS), which is a string of terminals and/or nonterminals
BNF / CFG Grammar Rules
An abstraction (or nonterminal symbol) can have more than one RHS
<stmt> -> <single_stmt> | begin <stmt_list> end
BNF - CFG Fundamentals
Nonterminals are often enclosed in angle brackets
Example of BNF rules:
<ident_list> → identifier | identifier , <ident_list> <if_stmt> → if <logic_expr> then <stmt>
Grammar: a finite non-empty set of rules
A start symbol is a special element of the nonterminals of a grammar
CFG Formal Definition
A context-free grammar G is defined by the 4-tuple G = (V, Σ, R, S) where
V is a finite set; each element v in V is called a non-terminal character or a variable. Each variable represents a different type of phrase or clause in the sentence
Σ is a finite set of terminals, disjoint from V, which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar G
R is a finite relation from V to (V + Σ)*, where the asterisk represents the Kleene star operation. The members of R, are called the (rewrite) rules or productions of the grammar. (also commonly symbolized by a P)
S is the start variable (or the start symbol), used to represent the whole sentence (or program). It must be an element of V
Example: Describing List
Sytanctic lists are described using recursion
<ident_list> → ident
| ident ',' <ident_list>
Derivation
A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
An example Grammar
<program> → <stmts>
<stmts> → <stmt> | <stmt> ';' <stmts>
<stmt> → <var> '=' <expr>
<var> → 'a' | 'b' | 'c' | 'd'
<expr> → <term> '+' <term> | <term> '-' <term>
<term> → <var> | 'const'
An Example Derivation
<program> => <stmts> => <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const
Derivations - Some Notions
Every string of symbols in a derivation is a sentential form
A sentence is a sentential form that has only terminal symbols
A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded. Accordingly a rightmost derivation for the rightmost nonterminal
A derivation may be neither leftmost nor rightmost
Parse Tree
A parse tree represents a hierarchical representation of a derivation by means of a tree
Example:
Ambiguity in Grammars
A grammar is ambiguous if and only if it generates a sentential form (the language generated by the grammar comprises at least one word) that has two or more distinct parse trees
Example for Ambiguous Grammar
<expr> → <expr> <op> <expr> | const
<op> → ‘+' | '-'
word: const – const + const
Unambiguous Grammar
Grammar on the example before in ambiguous form:
We get some form of operator precedence
<expr> → <expr> - <term> | <term> <term> → <term> + const | const
Associativity of Operators
Operator associativity can also be indicated by a grammar
Example:
<expr> -> <expr> + <expr> | const
(ambiguous)
<expr> -> <expr> + const | const
(unambiguous)
Extended BNF
Optional Parts are placed in brackets []
<proc_call> -> ident [(<expr_list>)]
Alternative parts of RHS are placed inside parentheses and separated via vertical bars
<term> -> <term> (+|-) const
Repetitions (0 or more) are placed inside braces {}
<ident> -> letter {letter|digit}
BNF and EBNF Examples
BNF
<expr> → <expr> + <term> | <expr> - <term> | <term> <term> → <term> * <factor> | <term> / <factor> | <factor>
EBNF
<expr> → <term> {(+ | -) <term>} <term> → <factor> {(* | /) <factor>}
Variations in EBNF
Alternative RHS are put on separate lines
Use of colon instead of →
Use of opt for optional parts
Use of oneof for choices
Static Semantics
Nothing to do with meaning
Context-free grammars (CFGs) cannot describe all of the syntax of programming languages
Categories of constructs that are “trouble-maker”
Still Context-free, but cumbersome (e.g., types of operands in expressions)
Non-context-free (e.g., variables must be declared before they are used)
Attribute Grammars
Attribute grammars (AGs) have additions to CFGs to carry some semantic info on parse tree nodes
Primary values of AGs:
Static semantics specification for static semantics checks
Attribute Grammars - Definition
Def: An attribute grammar is a context-free grammar with the following additions:
For each grammar symbol x there is a set A(x) of attribute values
Each rule has a set of functions that define certain attributes of the nonterminals in the rule
Each rule has a (possibly empty) set of predicates to check for attribute consistency
Let $X_0 -> X_1...X_n$ be a rule
Functions of the form $S(X_0) = f(A(X_1), ..., A(X_n))$ define synthesized attributes
Initially, there are intrinsic attributes on the attributes on the leaves
Attribute Grammars: An Example
Syntax
<assign> -> <var> = <expr> <expr> -> <var> + <var> | <var> <var> -> id
Attributes
actual_type: synthesized with the non-terminals <var> and <expr>
expected_type: inherited with the non-terminal <expr>