JP

programming class ch3

Introduction

  • Chapter 3 focuses on describing syntax and semantics in programming languages.

  • Two complementary facets of a language definition:

    • Syntax – the formal structure of expressions, statements, program units.

    • Semantics – the meaning associated with those structures.

  • Accurate definitions are vital for:

    • Language designers (to reuse or extend ideas).

    • Implementers (compiler/interpreter writers).

    • Programmers (end-users who must understand what programs do).

Syntax vs. Semantics

  • Syntax ≠ semantics; form versus meaning.

  • Static semantics: compile-time rules not captured purely by context-free grammars (CFGs).

  • Dynamic semantics: run-time meaning of constructs.

Basic Terminology

  • Sentence: a string of characters over an alphabet.

  • Language: a set of sentences.

  • Lexeme: the lowest-level syntactic unit (e.g. * , sum, begin).

  • Token: a category/class of lexemes (e.g. identifier, integer-literal).

Formal Definitions of Languages

  • Two complementary approaches:

    • Recognizers: algorithms/devices that accept valid sentences and reject invalid ones (e.g. the parser in a compiler).

    • Generators: mechanisms that can systematically produce every valid sentence; comparing a candidate sentence against the generator’s patterns shows whether it is syntactically valid.

Context-Free Grammars (CFGs) & BNF

  • Invented by Noam Chomsky (1950s) to model natural languages; adopted for programming languages.

  • Backus–Naur Form (BNF) (1959, John Backus) is notation equivalent to CFGs.

BNF Fundamentals

  • Uses abstractions (nonterminals) to represent classes of syntactic structures.

  • Terminals = lexemes/tokens.

  • Rules (productions) have the form \text{LHS} \rightarrow \text{RHS} where LHS is a nonterminal.

  • Nonterminals are often shown in ⟨angle brackets⟩.

  • A grammar is a finite, non-empty set of rules plus a distinguished start symbol.

  • Nonterminals may have multiple alternative RHSs (use |).

Lists & Recursion

  • Repetitive constructs are expressed recursively.

    • Example: \langle ident_list \rangle \rightarrow ident \; | \; ident , \langle ident_list \rangle

  • Derivation: repeated rule application beginning with the start symbol and terminating in a sentence (all terminals).

Example Grammar & Derivation

  • Grammar:

    • \langle program \rangle \rightarrow \langle stmts \rangle

    • \langle stmts \rangle \rightarrow \langle stmt \rangle \; | \; \langle stmt \rangle ; \langle stmts \rangle

    • \langle stmt \rangle \rightarrow \langle var \rangle = \langle expr \rangle

    • \langle var \rangle \rightarrow a \; | \; b \; | \; c \; | \; d

    • \langle expr \rangle \rightarrow \langle term \rangle + \langle term \rangle \; | \; \langle term \rangle - \langle term \rangle

    • \langle term \rangle \rightarrow \langle var \rangle \; | \; const

  • Leftmost derivation for “a = b + const” shown step-by-step.

  • Sentential form: any intermediate string in a derivation.

  • Parse tree: hierarchical visualization of a derivation.

Ambiguity

  • A grammar is ambiguous iff it generates at least one sentential form with ≥2 distinct parse trees.

  • Classic ambiguous arithmetic grammar:

    • \langle expr \rangle \rightarrow \langle expr \rangle \langle op \rangle \langle expr \rangle \; | \; const

    • \langle op \rangle \rightarrow / \; | \; -

  • Ambiguity removed by encoding precedence/associativity into separate nonterminals, e.g.

    • \langle expr \rangle \rightarrow \langle expr \rangle - \langle term \rangle \; | \; \langle term \rangle

    • \langle term \rangle \rightarrow \langle term \rangle / const \; | \; const

Operator Associativity Example
  • Ambiguous: \langle expr \rangle \rightarrow \langle expr \rangle + \langle expr \rangle \; | \; const

  • Left-associative, unambiguous version: \langle expr \rangle \rightarrow \langle expr \rangle + const \; | \; const

The “Dangling Else” (selector) problem
  • Naïve Java if-then-else grammar is ambiguous.

  • Solution partitions statements:

    • \langle stmt \rangle \rightarrow \langle matched \rangle \; | \; \langle unmatched \rangle

    • \langle matched \rangle \rightarrow if(\langle logic_expr \rangle)\; \langle stmt \rangle \; else \; \langle matched \rangle \; | \; \text{non-if stmt}

    • \langle unmatched \rangle \rightarrow if(\langle logic_expr \rangle)\; \langle stmt \rangle \; | \; if(\langle logic_expr \rangle)\; \langle matched \rangle \; else \; \langle unmatched \rangle

Full Unambiguous Expression Grammar (Example 3.4)
  • \langle assign \rangle \rightarrow \langle id \rangle = \langle expr \rangle

  • \langle id \rangle \rightarrow A \; | \; B \; | \; C

  • \langle expr \rangle \rightarrow \langle expr \rangle + \langle term \rangle \; | \; \langle term \rangle

  • \langle term \rangle \rightarrow \langle term \rangle * \langle factor \rangle \; | \; \langle factor \rangle

  • \langle factor \rangle \rightarrow ( \langle expr \rangle ) \; | \; \langle id \rangle

Extended BNF (EBNF)

  • Adds metasymbols for conciseness:

    • Optional: [ … ]

    • Repetition (0 +): { … }

    • Grouped alternatives: (A | B).

  • Example conversion of above arithmetic grammar:

    • \langle expr \rangle \rightarrow \langle term \rangle \; {\; (+ | -) \; \langle term \rangle\;}

    • \langle term \rangle \rightarrow \langle factor \rangle \; {\; (* | /) \; \langle factor \rangle\;}

  • Recent stylistic innovations: separate lines for alternatives, ':' instead of $\Rightarrow$, keywords like “opt” and “oneof”.

Practice Exercises (from slides)

  • Write BNF for Java Boolean expressions (operators &&, ||, !).

  • Convert Example 3.4 to EBNF.

Static Semantics & Attribute Grammars

  • Pure CFGs can’t express all static constraints (e.g., type compatibility, identifier declaration before use).

  • Attribute Grammar (AG) enriches CFG with attributes, functions, predicates.

Grammar Definition Recap

  • A grammar G = (S,N,T,P) where:

    • N: nonterminals, T: terminals, P: productions, S: start symbol.

Attribute Grammar Components

  • For each symbol x, attribute set A(x).

  • Two kinds of attributes:

    • Synthesized: computed from children up to parent, e.g. S(X0)=f(A(X1),\dots,A(X_n)).

    • Inherited: passed from parent/siblings downwards, e.g. I(Xj)=f(A(X0),\dots,A(X_n)).

  • Rules may contain predicates to ensure consistency (type checks, etc.).

Example Attribute Grammar

  • Syntax (simplified):

    • \langle assign \rangle \rightarrow \langle var \rangle = \langle expr \rangle

    • \langle expr \rangle \rightarrow \langle var \rangle + \langle var \rangle \; | \; \langle var \rangle

    • \langle var \rangle \rightarrow A \; | \; B \; | \; C

  • Attributes:

    • actual_type (synthesized) for \<var>, \<expr>.

    • expected_type (inherited) for \<expr>.

  • Sample semantic rules:

    • Production: \langle expr \rangle \rightarrow \langle var \rangle1 + \langle var \rangle2

    • \langle expr \rangle.actual_type \leftarrow \langle var \rangle_1.actual_type

    • Predicate: \langle var \rangle1.actual_type == \langle var \rangle2.actual_type

    • \langle expr \rangle.expected_type == \langle expr \rangle.actual_type

    • Production: \langle var \rangle \rightarrow id

    • \langle var \rangle.actual_type \leftarrow lookup(\langle var \rangle.string)

  • Attribute evaluation strategies:

    • All inherited ⇒ top-down decoration.

    • All synthesized ⇒ bottom-up decoration.

    • Mixed ⇒ combination order needed.

Dynamic Semantics (Meaning of Programs)

Motivation for Formal Semantics

  • Programmers need precise definitions.

  • Compiler writers must implement correctly.

  • Facilitates correctness proofs, compiler generators, ambiguity detection.

  • No single universally adopted formalism; three main families: operational, denotational, axiomatic.

Operational Semantics

  • Describes meaning by executing statements on an abstract (often virtual) machine; state changes => meaning.

  • Pure hardware or software interpreters are impractical/too concrete; instead, build:

    1. Translator from source to idealized machine code.

    2. Simulator of the idealized machine.

  • Two granularities:

    • Natural operational semantics (big-step) – directly maps whole constructs to final states.

    • Structural operational semantics (small-step) – models computation as sequences of elementary steps.

  • Pros: intuitive for manuals/teaching. Cons: cumbersome for rigorous formal proofs (e.g., IBM VDL for PL/I).

Denotational Semantics

  • Developed by Scott & Strachey (1970); uses mathematical (often recursive) functions.

  • For each syntactic entity, define a corresponding mathematical denotation.

  • Program meaning ≡ mathematical function on program state.

Program State & VARMAP
  • s = {\langle i1,v1 \rangle,\dots,\langle in,vn \rangle} (mapping identifiers to values).

  • VARMAP(ij, s) = vj retrieves a variable’s value.

Decimal Numbers
  • Grammar:

    • \langle dec_num \rangle \rightarrow '0'|\dots|'9' \; | \; \langle dec_num \rangle \; ('0'|\dots|'9')

  • Mapping function M_{dec}:

    • M{dec}('0') = 0, \; M{dec}('1') = 1, \dots

    • M{dec}(\langle dec_num \rangle 'k') = 10 \cdot M{dec}(\langle dec_num \rangle) + k where k\in{0..9}.

Expression Meaning
  • Domain: \mathbb Z \cup {error}.

  • Recursive definition M_e(\langle expr \rangle, s):

    • Constant ⇒ M_{dec}.

    • Variable ⇒ error if undefined else VARMAP.

    • Binary + or * expressions ⇒ evaluate operands then apply operator; propagate error if any.

Assignment Semantics
  • Ma(x := E, s) = \begin{cases} error & \text{if } Me(E,s)=error\
    s' & \text{otherwise}
    \end{cases}

  • s' identical to s except x now maps to M_e(E,s).

While-Loop Semantics
  • Ml(while\ B\ do\ L, s) = \begin{cases} error & \text{if } Mb(B,s)=undef\
    s & \text{if } Mb(B,s)=false\ error & \text{if } M{sl}(L,s)=error\
    Ml(while\ B\ do\ L, M{sl}(L,s)) & \text{otherwise}
    \end{cases}

  • Converts iteration into recursion for mathematical clarity.

Evaluation of Denotational Semantics
  • Enables formal proofs of correctness and compiler generation.

  • Highly rigorous, but complex for everyday language users.

Axiomatic Semantics

  • Based on predicate calculus; created for program verification.

  • Uses assertions about program states.

Assertions & Weakest Preconditions
  • Notation: {P}\; statement \;{Q} where P precondition, Q postcondition.

  • Weakest precondition (WP): least restrictive P guaranteeing Q.

    • Example: statement a = b + 1; desired postcondition {a > 1}.

    • One valid precondition: b > 10.

    • WP: b > 0.

Proof Rules (selected)
  • Assignment Axiom: {Q[x \leftarrow E]}\; x = E; \; {Q}

  • Consequence Rule: from {P'}S{Q'}, P \implies P', Q' \implies Q infer {P}S{Q}.

  • Sequence: if {P1}S1{P2} and {P2}S2{P3} then {P1}S1;S2{P3}.

  • Selection (if-then-else):

    • From {P \land B}S1{Q} and {P \land \lnot B}S2{Q} infer {P}if\ B\ then\ S1\ else\ S2{Q}.

  • While Loop (pretest): choose loop invariant I.

    • Requirements:

    1. P \Rightarrow I (initially true).

    2. {I \land B} S {I} (body preserves invariant).

    3. I \land \lnot B \Rightarrow Q (termination yields postcondition).

    4. Loop terminates (sometimes proven via a variant function).

Loop Invariants
  • Must be weak enough to hold before first iteration, yet strong enough (with exit condition) to ensure postcondition.

Evaluation
  • Powerful for proofs & reasoning; challenging to define full rule sets; less directly useful to casual programmers.

Comparing Semantic Approaches

  • Operational: state changes given by algorithms (executable descriptions).

  • Denotational: state changes given by mathematical functions (high abstraction).

  • Axiomatic: focuses on logical assertions and correctness reasoning, independent of execution details.

Summary / Key Takeaways

  • BNF/CFG are standard meta-languages for syntax.

  • EBNF adds syntactic sugar for clarity.

  • Ambiguity must be resolved to guarantee unique parse trees (precise meaning).

  • Attribute Grammars extend CFGs to specify some static semantics (types, declarations).

  • Dynamic semantics can be described via three major formal methods:

    1. Operational

    2. Denotational

    3. Axiomatic

  • Each method trades off between rigor, abstraction, and practicality for designers, implementers, and users.