## Chapter 3: Describing Syntax and Semantics
- **Source**: Concepts of Programming Languages, Robert W. Sebesta
- **ISBN**: 0-321-49362-1 12/E
## Chapter Topics
- Introduction
- The General Problem of Describing Syntax
- Formal Methods of Describing Syntax
- Attribute Grammars
- Describing the Meanings of Programs: Dynamic Semantics
## Introduction
- **Syntax**: Refers to the form or structure of expressions, statements, and program units.
- **Semantics**: Refers to the meanings of expressions, statements, and program units.
- Both syntax and semantics are essential in defining a programming language and are important for:
- Users of a language definition
- Language designers
- Implementers
- Programmers (end users of the language)
## The General Problem of Describing Syntax: Terminology
- **Sentence**: A string of characters over some alphabet.
- **Language**: A set of sentences.
- **Lexeme**: The lowest level syntactic unit (e.g., symbols like *, sum, begin).
- **Token**: A category of lexemes (e.g., identifiers)
## Formal Definition of Languages
- **Recognizers**: Devices that read input strings and decide if they belong to the language (example: syntax analysis in compilers).
- **Generators**: Devices that produce sentences of a language, allowing verification of syntax correctness by comparison to the generator’s structure.
## BNF and Context-Free Grammars
- **Context-Free Grammars**: Introduced by Noam Chomsky in the 1950s to describe natural languages and are used as language generators.
- **Backus-Naur Form (BNF)**: Developed by John Backus to describe Algol 58's syntax; is equivalent to context-free grammars.
## BNF Fundamentals
- Abstractions in BNF represent classes of syntactic structures (nonterminal symbols) and are often enclosed in angle brackets.
- **Terminals**: These are lexemes or tokens.
- A **rule** consists of a left-hand side (LHS) nonterminal and a right-hand side (RHS) of terminals and/or nonterminals.
## BNF Rules
- Nonterminal examples include:
- `
→ identifier | identifier, `
- ` → if then `
- A **grammar** is defined as a finite non-empty set of rules, with a special starting symbol known as a start symbol
## Describing Lists
- Lists are defined using recursion:
- ` → ident | ident, `
- A **derivation** is the repeated application of rules, starting with the start symbol leading to a sentence composed entirely of terminal symbols.
## An Example Grammar
- Sample grammar:
```
→
→ | ;
→ =
→ a | b | c | d
→ + | -
→ | const
```
## An Example Derivation
- Derivation example demonstrating syntax:
```
=> => => = => a = => a = + => a = + => a = b + => a = b + const
```
## Derivations
- All strings in a derivation are called **sentential forms**.
- A **sentence** is a sentential form composed only of terminal symbols.
- In a **leftmost derivation**, the leftmost nonterminal is expanded first. A derivation may not follow this specific order.
## Parse Tree
- A **parse tree** visually represents a derivation, displaying the hierarchical structure of the syntactic elements...
```
├──
│ ├──
│ │ ├── const
│ ├──
│ ├── =
│ └──
```
## Ambiguity in Grammars
- A grammar is **ambiguous** if it can generate a sentential form represented by two or more distinct parse trees.
## Example of Ambiguous Expression Grammar
- Example grammar indicating ambiguity:
```
→ | const
→ / | -
```
## Unambiguous Expression Grammar
- By using parse trees to indicate operator precedence, ambiguity can be prevented:
```
→ - |
→ / const | const
```
## Operator Associativity
- **Associativity** can be indicated in a grammar:
- Example of ambiguous association:
```
-> + | const
```
- Example of unambiguous association:
```
-> + const | const
```
## Extended BNF
- **Optional parts**: Denoted with brackets [ ]
```
-> ident [()]
```
- **Alternatives**: Placed inside parentheses and separated by vertical bars.
```
→ (+|-) const
```
- **Repetitions**: Indicated with braces { }
## BNF and EBNF
- BNF Example:
```
→ + | - |
→ * | / |
```
- EBNF Example:
```
→ {(+ | -) }
```
## Recent Variations in EBNF
- Variations include placing alternative RHSs on separate lines, using a colon instead of =>, utilizing keywords such as "opt" for optional parts and "oneof" for selections.
## Static Semantics
- Static semantics focus on constraints that are not related to meaning, indicating context-free grammars cannot cover all constructs.
- Issues include operand types, and the necessity for variables to be declared prior to use.
## Attribute Grammars
- **Attribute Grammars (AGs)** enhance context-free grammars with semantic information on parse tree nodes, crucial for static semantic consistency in compilers.
## Definition of Attribute Grammars
- An attribute grammar consists of a context-free grammar \( G = (S, N, T, P) \) plus additional elements:
- Each grammar symbol has associated attribute values.
- Each rule includes functions defining attributes and predicates for checking consistency.
## Example of Attribute Grammar
- Grammar for assignment:
```
-> =
-> + |
```
- Definitions of **actual_type** (synthesized) and **expected_type** (inherited).
## Computation of Attribute Values
- There are methods to compute attributes based on inherited and synthesized values; these may require a combination of top-down and bottom-up approaches.
## Semantics Overview
- Semantics lack a universally accepted notation, yet they are crucial for:
- Clarifying meanings to programmers
- Enabling compilers to properly implement constructs
- Supporting correctness proofs.
## Operational Semantics
- Define the meaning of a program through its execution on a machine, tracking changes in state (memory, registers).
## Issues with Operational Semantics
- Hardware or software interpreters may prove expensive and difficult to understand
- A better solution involves building a complete computer simulation.
## Evaluation of Operational Semantics
- Useful for informal purposes, but can be complex for formal definitions (e.g. VDL).
## Denotational Semantics
- Developed around recursive function theory as the most abstract semantics description method.
- Involves mapping language constructs to mathematical object instances and states.
## Program State in Denotational Semantics
- The state signifies the values of current variables set as pairs of names and values.
## Expressions
- Define a semantics mapping for expressions that may include decimal numbers, variables, or binary expressions.
## Assignment Statements Semantics
- Maps the assignments to the resulting state sets, incorporating error checks.
## Logical Pretest Loops Semantics
- Maps state sets during logical loops, checking preconditions and the evaluation of body states.
## Evaluation of Denotational Semantics
- Useful in correctness proofs and language design contexts, though complex for general language users.
## Axiomatic Semantics
- Based on formal logic, this method provides frameworks for program verification using assertions.
## Pre- and Post-conditions
- Assertions specify relationships before and after statements are executed, with weakest preconditions guiding the most unrestricted conditions.
## Program Proof Process
- The aimed result for correctness involves evaluating backwards from the desired outcome through program statements.
## Axiomatic Semantics Characteristics
- Assertions ensure loop invariants are maintained before and throughout the execution of statements, ensuring correctness.
## Evaluation of Axiomatic Semantics
- While effective for correctness proofs, developing comprehensive inference rules for all statements in a language is complex and less usable for everyday language activities.
## Denotation vs Operational Semantics
- Differences lie in execution (operational coded algorithms) versus mathematically defined state change functions within denotational frameworks.
## Summary
- BNF and context-free grammars offer suitable descriptions for programming language syntax.
- Attribute grammars bridge syntax and semantics effectively.
- Three primary methods of semantic description include operational, axiomatic, and denotational.