Chapter 3

Chapter 3: Describing Syntax and Semantics

  • Source: Concepts of Programming Languages, Robert W. Sebesta
  • ISBN: 0-321-49362-1 12/E

Chapter Topics

  • Introduction
  • The General Problem of Describing Syntax
  • Formal Methods of Describing Syntax
  • Attribute Grammars
  • Describing the Meanings of Programs: Dynamic Semantics

Introduction

  • Syntax: Refers to the form or structure of expressions, statements, and program units.
  • Semantics: Refers to the meanings of expressions, statements, and program units.
  • Both syntax and semantics are essential in defining a programming language and are important for:
    • Users of a language definition
    • Language designers
    • Implementers
    • Programmers (end users of the language)

The General Problem of Describing Syntax: Terminology

  • Sentence: A string of characters over some alphabet.
  • Language: A set of sentences.
  • Lexeme: The lowest level syntactic unit (e.g., symbols like *, sum, begin).
  • Token: A category of lexemes (e.g., identifiers)

Formal Definition of Languages

  • Recognizers: Devices that read input strings and decide if they belong to the language (example: syntax analysis in compilers).
  • Generators: Devices that produce sentences of a language, allowing verification of syntax correctness by comparison to the generator’s structure.

BNF and Context-Free Grammars

  • Context-Free Grammars: Introduced by Noam Chomsky in the 1950s to describe natural languages and are used as language generators.
  • Backus-Naur Form (BNF): Developed by John Backus to describe Algol 58's syntax; is equivalent to context-free grammars.

BNF Fundamentals

  • Abstractions in BNF represent classes of syntactic structures (nonterminal symbols) and are often enclosed in angle brackets.
  • Terminals: These are lexemes or tokens.
  • A rule consists of a left-hand side (LHS) nonterminal and a right-hand side (RHS) of terminals and/or nonterminals.

BNF Rules

  • Nonterminal examples include:
    • <ident_list> → identifier | identifier, <ident_list>
    • <if_stmt> → if <logic_expr> then <stmt>
  • A grammar is defined as a finite non-empty set of rules, with a special starting symbol known as a start symbol

Describing Lists

  • Lists are defined using recursion:
    • <ident_list> → ident | ident, <ident_list>
  • A derivation is the repeated application of rules, starting with the start symbol leading to a sentence composed entirely of terminal symbols.

An Example Grammar

  • Sample grammar:
<program> → <stmts> 
<stmts> → <stmt> | <stmt> ; <stmts> 
<stmt> → <var> = <expr> 
<var> → a | b | c | d 
<expr> → <term> + <term> | <term> - <term> 
<term> → <var> | const 

An Example Derivation

  • Derivation example demonstrating syntax:
<program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const

Derivations

  • All strings in a derivation are called sentential forms.
  • A sentence is a sentential form composed only of terminal symbols.
  • In a leftmost derivation, the leftmost nonterminal is expanded first. A derivation may not follow this specific order.

Parse Tree

  • A parse tree visually represents a derivation, displaying the hierarchical structure of the syntactic elements…
<program> 
├── <stmts> 
│   ├── <stmt> 
│   │   ├── const 
│   ├── <var> 
│   ├── =  
│   └── <expr> 

Ambiguity in Grammars

  • A grammar is ambiguous if it can generate a sentential form represented by two or more distinct parse trees.

Example of Ambiguous Expression Grammar

  • Example grammar indicating ambiguity:
<expr> → <expr> <op> <expr> | const 
<op> → / | -

Unambiguous Expression Grammar

  • By using parse trees to indicate operator precedence, ambiguity can be prevented:
<expr> → <expr> - <term> | <term> 
<term> → <term> / const | const

Operator Associativity

  • Associativity can be indicated in a grammar:
  • Example of ambiguous association:
<expr> -> <expr> + <expr> | const
  • Example of unambiguous association:
<expr> -> <expr> + const | const 

Extended BNF

  • Optional parts: Denoted with brackets [ ]
<proc_call> -> ident [(<expr_list>)]
  • Alternatives: Placed inside parentheses and separated by vertical bars.
<term> → <term> (+|-) const
  • Repetitions: Indicated with braces { }

BNF and EBNF

  • BNF Example:
<expr> → <expr> + <term> | <expr> - <term> | <term> 
<term> → <term> * <factor> | <term> / <factor> | <factor>
  • EBNF Example:
<expr> → <term> {(+ | -) <term>} 

Recent Variations in EBNF

  • Variations include placing alternative RHSs on separate lines, using a colon instead of =>, utilizing keywords such as "opt" for optional parts and "oneof" for selections.

Static Semantics

  • Static semantics focus on constraints that are not related to meaning, indicating context-free grammars cannot cover all constructs.
  • Issues include operand types, and the necessity for variables to be declared prior to use.

Attribute Grammars

  • Attribute Grammars (AGs) enhance context-free grammars with semantic information on parse tree nodes, crucial for static semantic consistency in compilers.

Definition of Attribute Grammars

  • An attribute grammar consists of a context-free grammar ( G = (S, N, T, P) ) plus additional elements:
    • Each grammar symbol has associated attribute values.
    • Each rule includes functions defining attributes and predicates for checking consistency.

Example of Attribute Grammar

  • Grammar for assignment:
<assign> -> <var> = <expr>  
<expr> -> <var> + <var> | <var></var>  
  • Definitions of actualtype (synthesized) and expectedtype (inherited).

Computation of Attribute Values

  • There are methods to compute attributes based on inherited and synthesized values; these may require a combination of top-down and bottom-up approaches.

Semantics Overview

  • Semantics lack a universally accepted notation, yet they are crucial for:
    • Clarifying meanings to programmers
    • Enabling compilers to properly implement constructs
    • Supporting correctness proofs.

Operational Semantics

  • Define the meaning of a program through its execution on a machine, tracking changes in state (memory, registers).

Issues with Operational Semantics

  • Hardware or software interpreters may prove expensive and difficult to understand
  • A better solution involves building a complete computer simulation.

Evaluation of Operational Semantics

  • Useful for informal purposes, but can be complex for formal definitions (e.g. VDL).

Denotational Semantics

  • Developed around recursive function theory as the most abstract semantics description method.
  • Involves mapping language constructs to mathematical object instances and states.

Program State in Denotational Semantics

  • The state signifies the values of current variables set as pairs of names and values.

Expressions

  • Define a semantics mapping for expressions that may include decimal numbers, variables, or binary expressions.

Assignment Statements Semantics

  • Maps the assignments to the resulting state sets, incorporating error checks.

Logical Pretest Loops Semantics

  • Maps state sets during logical loops, checking preconditions and the evaluation of body states.

Evaluation of Denotational Semantics

  • Useful in correctness proofs and language design contexts, though complex for general language users.

Axiomatic Semantics

  • Based on formal logic, this method provides frameworks for program verification using assertions.

Pre- and Post-conditions

  • Assertions specify relationships before and after statements are executed, with weakest preconditions guiding the most unrestricted conditions.

Program Proof Process

  • The aimed result for correctness involves evaluating backwards from the desired outcome through program statements.

Axiomatic Semantics Characteristics

  • Assertions ensure loop invariants are maintained before and throughout the execution of statements, ensuring correctness.

Evaluation of Axiomatic Semantics

  • While effective for correctness proofs, developing comprehensive inference rules for all statements in a language is complex and less usable for everyday language activities.

Denotation vs Operational Semantics

  • Differences lie in execution (operational coded algorithms) versus mathematically defined state change functions within denotational frameworks.

Summary

  • BNF and context-free grammars offer suitable descriptions for programming language syntax.
  • Attribute grammars bridge syntax and semantics effectively.
  • Three primary methods of semantic description include operational, axiomatic, and denotational.