Chapter 3: Describing Syntax and Semantics
- Source: Concepts of Programming Languages, Robert W. Sebesta
- ISBN: 0-321-49362-1 12/E
Chapter Topics
- Introduction
- The General Problem of Describing Syntax
- Formal Methods of Describing Syntax
- Attribute Grammars
- Describing the Meanings of Programs: Dynamic Semantics
Introduction
- Syntax: Refers to the form or structure of expressions, statements, and program units.
- Semantics: Refers to the meanings of expressions, statements, and program units.
- Both syntax and semantics are essential in defining a programming language and are important for:
- Users of a language definition
- Language designers
- Implementers
- Programmers (end users of the language)
The General Problem of Describing Syntax: Terminology
- Sentence: A string of characters over some alphabet.
- Language: A set of sentences.
- Lexeme: The lowest level syntactic unit (e.g., symbols like *, sum, begin).
- Token: A category of lexemes (e.g., identifiers)
- Recognizers: Devices that read input strings and decide if they belong to the language (example: syntax analysis in compilers).
- Generators: Devices that produce sentences of a language, allowing verification of syntax correctness by comparison to the generator’s structure.
BNF and Context-Free Grammars
- Context-Free Grammars: Introduced by Noam Chomsky in the 1950s to describe natural languages and are used as language generators.
- Backus-Naur Form (BNF): Developed by John Backus to describe Algol 58's syntax; is equivalent to context-free grammars.
BNF Fundamentals
- Abstractions in BNF represent classes of syntactic structures (nonterminal symbols) and are often enclosed in angle brackets.
- Terminals: These are lexemes or tokens.
- A rule consists of a left-hand side (LHS) nonterminal and a right-hand side (RHS) of terminals and/or nonterminals.
BNF Rules
- Nonterminal examples include:
<ident_list> → identifier | identifier, <ident_list>
<if_stmt> → if <logic_expr> then <stmt>
- A grammar is defined as a finite non-empty set of rules, with a special starting symbol known as a start symbol
Describing Lists
- Lists are defined using recursion:
<ident_list> → ident | ident, <ident_list>
- A derivation is the repeated application of rules, starting with the start symbol leading to a sentence composed entirely of terminal symbols.
An Example Grammar
<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
An Example Derivation
- Derivation example demonstrating syntax:
<program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const
Derivations
- All strings in a derivation are called sentential forms.
- A sentence is a sentential form composed only of terminal symbols.
- In a leftmost derivation, the leftmost nonterminal is expanded first. A derivation may not follow this specific order.
Parse Tree
- A parse tree visually represents a derivation, displaying the hierarchical structure of the syntactic elements…
<program>
├── <stmts>
│ ├── <stmt>
│ │ ├── const
│ ├── <var>
│ ├── =
│ └── <expr>
Ambiguity in Grammars
- A grammar is ambiguous if it can generate a sentential form represented by two or more distinct parse trees.
Example of Ambiguous Expression Grammar
- Example grammar indicating ambiguity:
<expr> → <expr> <op> <expr> | const
<op> → / | -
Unambiguous Expression Grammar
- By using parse trees to indicate operator precedence, ambiguity can be prevented:
<expr> → <expr> - <term> | <term>
<term> → <term> / const | const
Operator Associativity
- Associativity can be indicated in a grammar:
- Example of ambiguous association:
<expr> -> <expr> + <expr> | const
- Example of unambiguous association:
<expr> -> <expr> + const | const
Extended BNF
- Optional parts: Denoted with brackets [ ]
<proc_call> -> ident [(<expr_list>)]
- Alternatives: Placed inside parentheses and separated by vertical bars.
<term> → <term> (+|-) const
- Repetitions: Indicated with braces { }
BNF and EBNF
<expr> → <expr> + <term> | <expr> - <term> | <term>
<term> → <term> * <factor> | <term> / <factor> | <factor>
<expr> → <term> {(+ | -) <term>}
Recent Variations in EBNF
- Variations include placing alternative RHSs on separate lines, using a colon instead of =>, utilizing keywords such as "opt" for optional parts and "oneof" for selections.
Static Semantics
- Static semantics focus on constraints that are not related to meaning, indicating context-free grammars cannot cover all constructs.
- Issues include operand types, and the necessity for variables to be declared prior to use.
Attribute Grammars
- Attribute Grammars (AGs) enhance context-free grammars with semantic information on parse tree nodes, crucial for static semantic consistency in compilers.
Definition of Attribute Grammars
- An attribute grammar consists of a context-free grammar ( G = (S, N, T, P) ) plus additional elements:
- Each grammar symbol has associated attribute values.
- Each rule includes functions defining attributes and predicates for checking consistency.
Example of Attribute Grammar
<assign> -> <var> = <expr>
<expr> -> <var> + <var> | <var></var>
- Definitions of actualtype (synthesized) and expectedtype (inherited).
Computation of Attribute Values
- There are methods to compute attributes based on inherited and synthesized values; these may require a combination of top-down and bottom-up approaches.
Semantics Overview
- Semantics lack a universally accepted notation, yet they are crucial for:
- Clarifying meanings to programmers
- Enabling compilers to properly implement constructs
- Supporting correctness proofs.
Operational Semantics
- Define the meaning of a program through its execution on a machine, tracking changes in state (memory, registers).
Issues with Operational Semantics
- Hardware or software interpreters may prove expensive and difficult to understand
- A better solution involves building a complete computer simulation.
Evaluation of Operational Semantics
- Useful for informal purposes, but can be complex for formal definitions (e.g. VDL).
Denotational Semantics
- Developed around recursive function theory as the most abstract semantics description method.
- Involves mapping language constructs to mathematical object instances and states.
Program State in Denotational Semantics
- The state signifies the values of current variables set as pairs of names and values.
Expressions
- Define a semantics mapping for expressions that may include decimal numbers, variables, or binary expressions.
Assignment Statements Semantics
- Maps the assignments to the resulting state sets, incorporating error checks.
Logical Pretest Loops Semantics
- Maps state sets during logical loops, checking preconditions and the evaluation of body states.
Evaluation of Denotational Semantics
- Useful in correctness proofs and language design contexts, though complex for general language users.
Axiomatic Semantics
- Based on formal logic, this method provides frameworks for program verification using assertions.
Pre- and Post-conditions
- Assertions specify relationships before and after statements are executed, with weakest preconditions guiding the most unrestricted conditions.
Program Proof Process
- The aimed result for correctness involves evaluating backwards from the desired outcome through program statements.
Axiomatic Semantics Characteristics
- Assertions ensure loop invariants are maintained before and throughout the execution of statements, ensuring correctness.
Evaluation of Axiomatic Semantics
- While effective for correctness proofs, developing comprehensive inference rules for all statements in a language is complex and less usable for everyday language activities.
Denotation vs Operational Semantics
- Differences lie in execution (operational coded algorithms) versus mathematically defined state change functions within denotational frameworks.
Summary
- BNF and context-free grammars offer suitable descriptions for programming language syntax.
- Attribute grammars bridge syntax and semantics effectively.
- Three primary methods of semantic description include operational, axiomatic, and denotational.