programming class ch3

Introduction

Chapter 3 focuses on describing syntax and semantics in programming languages.
Two complementary facets of a language definition:
- Syntax – the formal structure of expressions, statements, program units.
- Semantics – the meaning associated with those structures.
Accurate definitions are vital for:
- Language designers (to reuse or extend ideas).
- Implementers (compiler/interpreter writers).
- Programmers (end-users who must understand what programs do).

Syntax vs. Semantics

Syntax ≠ semantics; form versus meaning.
Static semantics: compile-time rules not captured purely by context-free grammars (CFGs).
Dynamic semantics: run-time meaning of constructs.

Basic Terminology

Sentence: a string of characters over an alphabet.
Language: a set of sentences.
Lexeme: the lowest-level syntactic unit (e.g. * , sum, begin).
Token: a category/class of lexemes (e.g. identifier, integer-literal).

Formal Definitions of Languages

Two complementary approaches:
- Recognizers: algorithms/devices that accept valid sentences and reject invalid ones (e.g. the parser in a compiler).
- Generators: mechanisms that can systematically produce every valid sentence; comparing a candidate sentence against the generator’s patterns shows whether it is syntactically valid.

Context-Free Grammars (CFGs) & BNF

Invented by Noam Chomsky (1950s) to model natural languages; adopted for programming languages.
Backus–Naur Form (BNF) (1959, John Backus) is notation equivalent to CFGs.

BNF Fundamentals

Uses abstractions (nonterminals) to represent classes of syntactic structures.
Terminals = lexemes/tokens.
Rules (productions) have the form \text{LHS} \rightarrow \text{RHS} where LHS is a nonterminal.
Nonterminals are often shown in ⟨angle brackets⟩.
A grammar is a finite, non-empty set of rules plus a distinguished start symbol.
Nonterminals may have multiple alternative RHSs (use |).

Lists & Recursion

Repetitive constructs are expressed recursively.
- Example: \langle ident_list \rangle \rightarrow ident \; | \; ident , \langle ident_list \rangle
Derivation: repeated rule application beginning with the start symbol and terminating in a sentence (all terminals).

Example Grammar & Derivation

Grammar:
- \langle program \rangle \rightarrow \langle stmts \rangle
- \langle stmts \rangle \rightarrow \langle stmt \rangle \; | \; \langle stmt \rangle ; \langle stmts \rangle
- \langle stmt \rangle \rightarrow \langle var \rangle = \langle expr \rangle
- \langle var \rangle \rightarrow a \; | \; b \; | \; c \; | \; d
- \langle expr \rangle \rightarrow \langle term \rangle + \langle term \rangle \; | \; \langle term \rangle - \langle term \rangle
- \langle term \rangle \rightarrow \langle var \rangle \; | \; const
Leftmost derivation for “a = b + const” shown step-by-step.
Sentential form: any intermediate string in a derivation.
Parse tree: hierarchical visualization of a derivation.

Ambiguity

A grammar is ambiguous iff it generates at least one sentential form with ≥2 distinct parse trees.
Classic ambiguous arithmetic grammar:
- \langle expr \rangle \rightarrow \langle expr \rangle \langle op \rangle \langle expr \rangle \; | \; const
- \langle op \rangle \rightarrow / \; | \; -
Ambiguity removed by encoding precedence/associativity into separate nonterminals, e.g.
- \langle expr \rangle \rightarrow \langle expr \rangle - \langle term \rangle \; | \; \langle term \rangle
- \langle term \rangle \rightarrow \langle term \rangle / const \; | \; const

Operator Associativity Example

Ambiguous: \langle expr \rangle \rightarrow \langle expr \rangle + \langle expr \rangle \; | \; const
Left-associative, unambiguous version: \langle expr \rangle \rightarrow \langle expr \rangle + const \; | \; const

The “Dangling Else” (selector) problem

Naïve Java if-then-else grammar is ambiguous.
Solution partitions statements:
- \langle stmt \rangle \rightarrow \langle matched \rangle \; | \; \langle unmatched \rangle
- \langle matched \rangle \rightarrow if(\langle logic_expr \rangle)\; \langle stmt \rangle \; else \; \langle matched \rangle \; | \; \text{non-if stmt}
- \langle unmatched \rangle \rightarrow if(\langle logic_expr \rangle)\; \langle stmt \rangle \; | \; if(\langle logic_expr \rangle)\; \langle matched \rangle \; else \; \langle unmatched \rangle

Full Unambiguous Expression Grammar (Example 3.4)

\langle assign \rangle \rightarrow \langle id \rangle = \langle expr \rangle
\langle id \rangle \rightarrow A \; | \; B \; | \; C
\langle expr \rangle \rightarrow \langle expr \rangle + \langle term \rangle \; | \; \langle term \rangle
\langle term \rangle \rightarrow \langle term \rangle * \langle factor \rangle \; | \; \langle factor \rangle
\langle factor \rangle \rightarrow ( \langle expr \rangle ) \; | \; \langle id \rangle

Extended BNF (EBNF)

Adds metasymbols for conciseness:
- Optional: [ … ]
- Repetition (0 +): { … }
- Grouped alternatives: (A | B).
Example conversion of above arithmetic grammar:
- \langle expr \rangle \rightarrow \langle term \rangle \; {\; (+ | -) \; \langle term \rangle\;}
- \langle term \rangle \rightarrow \langle factor \rangle \; {\; (* | /) \; \langle factor \rangle\;}
Recent stylistic innovations: separate lines for alternatives, ':' instead of $\Rightarrow$, keywords like “opt” and “oneof”.

Practice Exercises (from slides)

Write BNF for Java Boolean expressions (operators &&, ||, !).
Convert Example 3.4 to EBNF.

Static Semantics & Attribute Grammars

Pure CFGs can’t express all static constraints (e.g., type compatibility, identifier declaration before use).
Attribute Grammar (AG) enriches CFG with attributes, functions, predicates.

Grammar Definition Recap

A grammar G = (S,N,T,P) where:
- N: nonterminals, T: terminals, P: productions, S: start symbol.

Attribute Grammar Components

For each symbol x, attribute set A(x).
Two kinds of attributes:
- Synthesized: computed from children up to parent, e.g. S(X0)=f(A(X1),\dots,A(X_n)).
- Inherited: passed from parent/siblings downwards, e.g. I(Xj)=f(A(X0),\dots,A(X_n)).
Rules may contain predicates to ensure consistency (type checks, etc.).

Example Attribute Grammar

Syntax (simplified):
- \langle assign \rangle \rightarrow \langle var \rangle = \langle expr \rangle
- \langle expr \rangle \rightarrow \langle var \rangle + \langle var \rangle \; | \; \langle var \rangle
- \langle var \rangle \rightarrow A \; | \; B \; | \; C
Attributes:
- actual_type (synthesized) for \<var>, \<expr>.
- expected_type (inherited) for \<expr>.
Sample semantic rules:
- Production: \langle expr \rangle \rightarrow \langle var \rangle1 + \langle var \rangle2
- \langle expr \rangle.actual_type \leftarrow \langle var \rangle_1.actual_type
- Predicate: \langle var \rangle1.actual_type == \langle var \rangle2.actual_type
- \langle expr \rangle.expected_type == \langle expr \rangle.actual_type
- Production: \langle var \rangle \rightarrow id
- \langle var \rangle.actual_type \leftarrow lookup(\langle var \rangle.string)
Attribute evaluation strategies:
- All inherited ⇒ top-down decoration.
- All synthesized ⇒ bottom-up decoration.
- Mixed ⇒ combination order needed.

Dynamic Semantics (Meaning of Programs)

Motivation for Formal Semantics

Programmers need precise definitions.
Compiler writers must implement correctly.
Facilitates correctness proofs, compiler generators, ambiguity detection.
No single universally adopted formalism; three main families: operational, denotational, axiomatic.

Operational Semantics

Describes meaning by executing statements on an abstract (often virtual) machine; state changes => meaning.
Pure hardware or software interpreters are impractical/too concrete; instead, build:
1. Translator from source to idealized machine code.
2. Simulator of the idealized machine.
Two granularities:
- Natural operational semantics (big-step) – directly maps whole constructs to final states.
- Structural operational semantics (small-step) – models computation as sequences of elementary steps.
Pros: intuitive for manuals/teaching. Cons: cumbersome for rigorous formal proofs (e.g., IBM VDL for PL/I).

Denotational Semantics

Developed by Scott & Strachey (1970); uses mathematical (often recursive) functions.
For each syntactic entity, define a corresponding mathematical denotation.
Program meaning ≡ mathematical function on program state.

Program State & VARMAP

s = {\langle i1,v1 \rangle,\dots,\langle in,vn \rangle} (mapping identifiers to values).
VARMAP(ij, s) = vj retrieves a variable’s value.

Decimal Numbers

Grammar:
- \langle dec_num \rangle \rightarrow '0'|\dots|'9' \; | \; \langle dec_num \rangle \; ('0'|\dots|'9')
Mapping function M_{dec}:
- M{dec}('0') = 0, \; M{dec}('1') = 1, \dots
- M{dec}(\langle dec_num \rangle 'k') = 10 \cdot M{dec}(\langle dec_num \rangle) + k where k\in{0..9}.

Expression Meaning

Domain: \mathbb Z \cup {error}.
Recursive definition M_e(\langle expr \rangle, s):
- Constant ⇒ M_{dec}.
- Variable ⇒ error if undefined else VARMAP.
- Binary + or * expressions ⇒ evaluate operands then apply operator; propagate error if any.

Assignment Semantics

Ma(x := E, s) = \begin{cases} error & \text{if } Me(E,s)=error\
s' & \text{otherwise}
\end{cases}
s' identical to s except x now maps to M_e(E,s).

While-Loop Semantics

Ml(while\ B\ do\ L, s) = \begin{cases} error & \text{if } Mb(B,s)=undef\
s & \text{if } Mb(B,s)=false\ error & \text{if } M{sl}(L,s)=error\
Ml(while\ B\ do\ L, M{sl}(L,s)) & \text{otherwise}
\end{cases}
Converts iteration into recursion for mathematical clarity.

Evaluation of Denotational Semantics

Enables formal proofs of correctness and compiler generation.
Highly rigorous, but complex for everyday language users.

Axiomatic Semantics

Based on predicate calculus; created for program verification.
Uses assertions about program states.

Assertions & Weakest Preconditions

Notation: {P}\; statement \;{Q} where P precondition, Q postcondition.
Weakest precondition (WP): least restrictive P guaranteeing Q.
- Example: statement a = b + 1; desired postcondition {a > 1}.
- One valid precondition: b > 10.
- WP: b > 0.

Proof Rules (selected)

Assignment Axiom: {Q[x \leftarrow E]}\; x = E; \; {Q}
Consequence Rule: from {P'}S{Q'}, P \implies P', Q' \implies Q infer {P}S{Q}.
Sequence: if {P1}S1{P2} and {P2}S2{P3} then {P1}S1;S2{P3}.
Selection (if-then-else):
- From {P \land B}S1{Q} and {P \land \lnot B}S2{Q} infer {P}if\ B\ then\ S1\ else\ S2{Q}.
While Loop (pretest): choose loop invariant I.
- Requirements:
1. P \Rightarrow I (initially true).
2. {I \land B} S {I} (body preserves invariant).
3. I \land \lnot B \Rightarrow Q (termination yields postcondition).
4. Loop terminates (sometimes proven via a variant function).

Loop Invariants

Must be weak enough to hold before first iteration, yet strong enough (with exit condition) to ensure postcondition.

Evaluation

Powerful for proofs & reasoning; challenging to define full rule sets; less directly useful to casual programmers.

Comparing Semantic Approaches

Operational: state changes given by algorithms (executable descriptions).
Denotational: state changes given by mathematical functions (high abstraction).
Axiomatic: focuses on logical assertions and correctness reasoning, independent of execution details.

Summary / Key Takeaways

BNF/CFG are standard meta-languages for syntax.
EBNF adds syntactic sugar for clarity.
Ambiguity must be resolved to guarantee unique parse trees (precise meaning).
Attribute Grammars extend CFGs to specify some static semantics (types, declarations).
Dynamic semantics can be described via three major formal methods:
1. Operational
2. Denotational
3. Axiomatic
Each method trades off between rigor, abstraction, and practicality for designers, implementers, and users.