programming class ch3
Introduction
Chapter 3 focuses on describing syntax and semantics in programming languages.
Two complementary facets of a language definition:
Syntax – the formal structure of expressions, statements, program units.
Semantics – the meaning associated with those structures.
Accurate definitions are vital for:
Language designers (to reuse or extend ideas).
Implementers (compiler/interpreter writers).
Programmers (end-users who must understand what programs do).
Syntax vs. Semantics
Syntax ≠ semantics; form versus meaning.
Static semantics: compile-time rules not captured purely by context-free grammars (CFGs).
Dynamic semantics: run-time meaning of constructs.
Basic Terminology
Sentence: a string of characters over an alphabet.
Language: a set of sentences.
Lexeme: the lowest-level syntactic unit (e.g. * , sum, begin).
Token: a category/class of lexemes (e.g. identifier, integer-literal).
Formal Definitions of Languages
Two complementary approaches:
Recognizers: algorithms/devices that accept valid sentences and reject invalid ones (e.g. the parser in a compiler).
Generators: mechanisms that can systematically produce every valid sentence; comparing a candidate sentence against the generator’s patterns shows whether it is syntactically valid.
Context-Free Grammars (CFGs) & BNF
Invented by Noam Chomsky (1950s) to model natural languages; adopted for programming languages.
Backus–Naur Form (BNF) (1959, John Backus) is notation equivalent to CFGs.
BNF Fundamentals
Uses abstractions (nonterminals) to represent classes of syntactic structures.
Terminals = lexemes/tokens.
Rules (productions) have the form \text{LHS} \rightarrow \text{RHS} where LHS is a nonterminal.
Nonterminals are often shown in ⟨angle brackets⟩.
A grammar is a finite, non-empty set of rules plus a distinguished start symbol.
Nonterminals may have multiple alternative RHSs (use |).
Lists & Recursion
Repetitive constructs are expressed recursively.
Example: \langle ident_list \rangle \rightarrow ident \; | \; ident , \langle ident_list \rangle
Derivation: repeated rule application beginning with the start symbol and terminating in a sentence (all terminals).
Example Grammar & Derivation
Grammar:
\langle program \rangle \rightarrow \langle stmts \rangle
\langle stmts \rangle \rightarrow \langle stmt \rangle \; | \; \langle stmt \rangle ; \langle stmts \rangle
\langle stmt \rangle \rightarrow \langle var \rangle = \langle expr \rangle
\langle var \rangle \rightarrow a \; | \; b \; | \; c \; | \; d
\langle expr \rangle \rightarrow \langle term \rangle + \langle term \rangle \; | \; \langle term \rangle - \langle term \rangle
\langle term \rangle \rightarrow \langle var \rangle \; | \; const
Leftmost derivation for “a = b + const” shown step-by-step.
Sentential form: any intermediate string in a derivation.
Parse tree: hierarchical visualization of a derivation.
Ambiguity
A grammar is ambiguous iff it generates at least one sentential form with ≥2 distinct parse trees.
Classic ambiguous arithmetic grammar:
\langle expr \rangle \rightarrow \langle expr \rangle \langle op \rangle \langle expr \rangle \; | \; const
\langle op \rangle \rightarrow / \; | \; -
Ambiguity removed by encoding precedence/associativity into separate nonterminals, e.g.
\langle expr \rangle \rightarrow \langle expr \rangle - \langle term \rangle \; | \; \langle term \rangle
\langle term \rangle \rightarrow \langle term \rangle / const \; | \; const
Operator Associativity Example
Ambiguous: \langle expr \rangle \rightarrow \langle expr \rangle + \langle expr \rangle \; | \; const
Left-associative, unambiguous version: \langle expr \rangle \rightarrow \langle expr \rangle + const \; | \; const
The “Dangling Else” (selector) problem
Naïve Java if-then-else grammar is ambiguous.
Solution partitions statements:
\langle stmt \rangle \rightarrow \langle matched \rangle \; | \; \langle unmatched \rangle
\langle matched \rangle \rightarrow if(\langle logic_expr \rangle)\; \langle stmt \rangle \; else \; \langle matched \rangle \; | \; \text{non-if stmt}
\langle unmatched \rangle \rightarrow if(\langle logic_expr \rangle)\; \langle stmt \rangle \; | \; if(\langle logic_expr \rangle)\; \langle matched \rangle \; else \; \langle unmatched \rangle
Full Unambiguous Expression Grammar (Example 3.4)
\langle assign \rangle \rightarrow \langle id \rangle = \langle expr \rangle
\langle id \rangle \rightarrow A \; | \; B \; | \; C
\langle expr \rangle \rightarrow \langle expr \rangle + \langle term \rangle \; | \; \langle term \rangle
\langle term \rangle \rightarrow \langle term \rangle * \langle factor \rangle \; | \; \langle factor \rangle
\langle factor \rangle \rightarrow ( \langle expr \rangle ) \; | \; \langle id \rangle
Extended BNF (EBNF)
Adds metasymbols for conciseness:
Optional: [ … ]
Repetition (0 +): { … }
Grouped alternatives: (A | B).
Example conversion of above arithmetic grammar:
\langle expr \rangle \rightarrow \langle term \rangle \; {\; (+ | -) \; \langle term \rangle\;}
\langle term \rangle \rightarrow \langle factor \rangle \; {\; (* | /) \; \langle factor \rangle\;}
Recent stylistic innovations: separate lines for alternatives, ':' instead of $\Rightarrow$, keywords like “opt” and “oneof”.
Practice Exercises (from slides)
Write BNF for Java Boolean expressions (operators &&, ||, !).
Convert Example 3.4 to EBNF.
Static Semantics & Attribute Grammars
Pure CFGs can’t express all static constraints (e.g., type compatibility, identifier declaration before use).
Attribute Grammar (AG) enriches CFG with attributes, functions, predicates.
Grammar Definition Recap
A grammar G = (S,N,T,P) where:
N: nonterminals, T: terminals, P: productions, S: start symbol.
Attribute Grammar Components
For each symbol x, attribute set A(x).
Two kinds of attributes:
Synthesized: computed from children up to parent, e.g. S(X0)=f(A(X1),\dots,A(X_n)).
Inherited: passed from parent/siblings downwards, e.g. I(Xj)=f(A(X0),\dots,A(X_n)).
Rules may contain predicates to ensure consistency (type checks, etc.).
Example Attribute Grammar
Syntax (simplified):
\langle assign \rangle \rightarrow \langle var \rangle = \langle expr \rangle
\langle expr \rangle \rightarrow \langle var \rangle + \langle var \rangle \; | \; \langle var \rangle
\langle var \rangle \rightarrow A \; | \; B \; | \; C
Attributes:
actual_type (synthesized) for \<var>, \<expr>.
expected_type (inherited) for \<expr>.
Sample semantic rules:
Production: \langle expr \rangle \rightarrow \langle var \rangle1 + \langle var \rangle2
\langle expr \rangle.actual_type \leftarrow \langle var \rangle_1.actual_type
Predicate: \langle var \rangle1.actual_type == \langle var \rangle2.actual_type
\langle expr \rangle.expected_type == \langle expr \rangle.actual_type
Production: \langle var \rangle \rightarrow id
\langle var \rangle.actual_type \leftarrow lookup(\langle var \rangle.string)
Attribute evaluation strategies:
All inherited ⇒ top-down decoration.
All synthesized ⇒ bottom-up decoration.
Mixed ⇒ combination order needed.
Dynamic Semantics (Meaning of Programs)
Motivation for Formal Semantics
Programmers need precise definitions.
Compiler writers must implement correctly.
Facilitates correctness proofs, compiler generators, ambiguity detection.
No single universally adopted formalism; three main families: operational, denotational, axiomatic.
Operational Semantics
Describes meaning by executing statements on an abstract (often virtual) machine; state changes => meaning.
Pure hardware or software interpreters are impractical/too concrete; instead, build:
Translator from source to idealized machine code.
Simulator of the idealized machine.
Two granularities:
Natural operational semantics (big-step) – directly maps whole constructs to final states.
Structural operational semantics (small-step) – models computation as sequences of elementary steps.
Pros: intuitive for manuals/teaching. Cons: cumbersome for rigorous formal proofs (e.g., IBM VDL for PL/I).
Denotational Semantics
Developed by Scott & Strachey (1970); uses mathematical (often recursive) functions.
For each syntactic entity, define a corresponding mathematical denotation.
Program meaning ≡ mathematical function on program state.
Program State & VARMAP
s = {\langle i1,v1 \rangle,\dots,\langle in,vn \rangle} (mapping identifiers to values).
VARMAP(ij, s) = vj retrieves a variable’s value.
Decimal Numbers
Grammar:
\langle dec_num \rangle \rightarrow '0'|\dots|'9' \; | \; \langle dec_num \rangle \; ('0'|\dots|'9')
Mapping function M_{dec}:
M{dec}('0') = 0, \; M{dec}('1') = 1, \dots
M{dec}(\langle dec_num \rangle 'k') = 10 \cdot M{dec}(\langle dec_num \rangle) + k where k\in{0..9}.
Expression Meaning
Domain: \mathbb Z \cup {error}.
Recursive definition M_e(\langle expr \rangle, s):
Constant ⇒ M_{dec}.
Variable ⇒ error if undefined else VARMAP.
Binary + or * expressions ⇒ evaluate operands then apply operator; propagate error if any.
Assignment Semantics
Ma(x := E, s) = \begin{cases} error & \text{if } Me(E,s)=error\
s' & \text{otherwise}
\end{cases}s' identical to s except x now maps to M_e(E,s).
While-Loop Semantics
Ml(while\ B\ do\ L, s) = \begin{cases} error & \text{if } Mb(B,s)=undef\
s & \text{if } Mb(B,s)=false\ error & \text{if } M{sl}(L,s)=error\
Ml(while\ B\ do\ L, M{sl}(L,s)) & \text{otherwise}
\end{cases}Converts iteration into recursion for mathematical clarity.
Evaluation of Denotational Semantics
Enables formal proofs of correctness and compiler generation.
Highly rigorous, but complex for everyday language users.
Axiomatic Semantics
Based on predicate calculus; created for program verification.
Uses assertions about program states.
Assertions & Weakest Preconditions
Notation: {P}\; statement \;{Q} where P precondition, Q postcondition.
Weakest precondition (WP): least restrictive P guaranteeing Q.
Example: statement a = b + 1; desired postcondition {a > 1}.
One valid precondition: b > 10.
WP: b > 0.
Proof Rules (selected)
Assignment Axiom: {Q[x \leftarrow E]}\; x = E; \; {Q}
Consequence Rule: from {P'}S{Q'}, P \implies P', Q' \implies Q infer {P}S{Q}.
Sequence: if {P1}S1{P2} and {P2}S2{P3} then {P1}S1;S2{P3}.
Selection (if-then-else):
From {P \land B}S1{Q} and {P \land \lnot B}S2{Q} infer {P}if\ B\ then\ S1\ else\ S2{Q}.
While Loop (pretest): choose loop invariant I.
Requirements:
P \Rightarrow I (initially true).
{I \land B} S {I} (body preserves invariant).
I \land \lnot B \Rightarrow Q (termination yields postcondition).
Loop terminates (sometimes proven via a variant function).
Loop Invariants
Must be weak enough to hold before first iteration, yet strong enough (with exit condition) to ensure postcondition.
Evaluation
Powerful for proofs & reasoning; challenging to define full rule sets; less directly useful to casual programmers.
Comparing Semantic Approaches
Operational: state changes given by algorithms (executable descriptions).
Denotational: state changes given by mathematical functions (high abstraction).
Axiomatic: focuses on logical assertions and correctness reasoning, independent of execution details.
Summary / Key Takeaways
BNF/CFG are standard meta-languages for syntax.
EBNF adds syntactic sugar for clarity.
Ambiguity must be resolved to guarantee unique parse trees (precise meaning).
Attribute Grammars extend CFGs to specify some static semantics (types, declarations).
Dynamic semantics can be described via three major formal methods:
Operational
Denotational
Axiomatic
Each method trades off between rigor, abstraction, and practicality for designers, implementers, and users.