PRELIM EXAM

Syntax and Semantics

Introduction

  • Syntax

    • Form and structure of the program code

  • Semantics

    • Meaning of the program code

      • meaning of expressions, statements, control structures

  • Syntax and semantics provide a language’s definition

    • Users of a language definition

      • Implementers

      • Programmers (the users of the language)

General Problem of Describing Syntax: Terminology

  • A sentence is a string of characters over some alphabet

  • A language is a set of sentences

  • A lexeme is the lowest level of syntactic unit of a language

    • *, sum, begin

  • A token is a category of lexemes

    • identifiers

Formal Definition of languages

  • Generators (Grammars)

    • A device that generates sentences of a languages

    • One can determine if the syntax of a particular sentence is syntactically correct by comparing it to the structure of the generator

  • Recognizers (Parser)

    • A recognition device reads input strings over the alphabet of the language and decides whether the input strings belong to the language

      • syntax analysis part of a compiler

BNF / Context- Free Grammars

  • Context-Free Grammars (CFG)

    • Developed by Noam Chomsky in the mid-1950s

    • Language generators, meant to describe the syntax of natural languages

    • Define a class of languages called context-free languages

  • Backus-Naur Form (1959)

    • Invented by John Backus to describe Algol 58

  • BNF and CFG represent equal concepts

BNF - CFG Fundamentals

  • In BNF, abstractions are used to represent classes of syntactic structures-they act like syntactic variables (also called nonterminal symbols, or just nonterminals)

  • Terminals are lexemes or tokens

  • A rule has a left-hand side (LHS), which is a nonterminal, and a right-hand side (RHS), which is a string of terminals and/or nonterminals

BNF / CFG Grammar Rules

  • An abstraction (or nonterminal symbol) can have more than one RHS

<stmt> -> <single_stmt> | begin <stmt_list> end

BNF - CFG Fundamentals

  • Nonterminals are often enclosed in angle brackets

    • Example of BNF rules:

    <ident_list> → identifier | identifier , <ident_list>
    <if_stmt> → if <logic_expr> then <stmt>
  • Grammar: a finite non-empty set of rules

  • A start symbol is a special element of the nonterminals of a grammar

CFG Formal Definition

  • A context-free grammar G is defined by the 4-tuple G = (V, Σ, R, S) where

    1. V is a finite set; each element v in V is called a non-terminal character or a variable. Each variable represents a different type of phrase or clause in the sentence

    2. Σ is a finite set of terminals, disjoint from V, which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar G

    3. R is a finite relation from V to (V + Σ)*, where the asterisk represents the Kleene star operation. The members of R, are called the (rewrite) rules or productions of the grammar. (also commonly symbolized by a P)

    4. S is the start variable (or the start symbol), used to represent the whole sentence (or program). It must be an element of V

Example: Describing List

  • Sytanctic lists are described using recursion

<ident_list> → ident
| ident ',' <ident_list>

Derivation

  • A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)

An example Grammar

<program> → <stmts>
<stmts> → <stmt> | <stmt> ';' <stmts>
<stmt> → <var> '=' <expr>
<var> → 'a' | 'b' | 'c' | 'd'
<expr> → <term> '+' <term> | <term> '-' <term>
<term> → <var> | 'const'

An Example Derivation

<program> => <stmts> => <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const

Derivations - Some Notions

  • Every string of symbols in a derivation is a sentential form

  • A sentence is a sentential form that has only terminal symbols

  • A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded. Accordingly a rightmost derivation for the rightmost nonterminal

    • A derivation may be neither leftmost nor rightmost

Parse Tree

  • A parse tree represents a hierarchical representation of a derivation by means of a tree

  • Example:

Ambiguity in Grammars

  • A grammar is ambiguous if and only if it generates a sentential form (the language generated by the grammar comprises at least one word) that has two or more distinct parse trees

Example for Ambiguous Grammar

<expr> → <expr> <op> <expr> | const
<op> → ‘+' | '-'

word: const – const + const

Unambiguous Grammar

  • Grammar on the example before in ambiguous form:

    • We get some form of operator precedence

    <expr> → <expr> - <term> | <term>
    <term> → <term> + const | const

Associativity of Operators

  • Operator associativity can also be indicated by a grammar

  • Example:

<expr> -> <expr> + <expr> | const
(ambiguous)
<expr> -> <expr> + const | const
(unambiguous)

Extended BNF

  • Optional Parts are placed in brackets []

<proc_call> -> ident [(<expr_list>)]
  • Alternative parts of RHS are placed inside parentheses and separated via vertical bars

<term> -> <term> (+|-) const
  • Repetitions (0 or more) are placed inside braces {}

<ident> -> letter {letter|digit}

BNF and EBNF Examples

  • BNF

    <expr> → <expr> + <term>
    | <expr> - <term>
    | <term>
    <term> → <term> * <factor>
    | <term> / <factor>
    | <factor>
  • EBNF

    <expr> → <term> {(+ | -) <term>}
    <term> → <factor> {(* | /) <factor>}

Variations in EBNF

  • Alternative RHS are put on separate lines

  • Use of colon instead of →

  • Use of opt for optional parts

  • Use of oneof for choices

Static Semantics

  • Nothing to do with meaning

  • Context-free grammars (CFGs) cannot describe all of the syntax of programming languages

  • Categories of constructs that are “trouble-maker”

    • Still Context-free, but cumbersome (e.g., types of operands in expressions)

    • Non-context-free (e.g., variables must be declared before they are used)

Attribute Grammars

  • Attribute grammars (AGs) have additions to CFGs to carry some semantic info on parse tree nodes

  • Primary values of AGs:

    • Static semantics specification for static semantics checks

Attribute Grammars - Definition

  • Def: An attribute grammar is a context-free grammar with the following additions:

    • For each grammar symbol x there is a set A(x) of attribute values

    • Each rule has a set of functions that define certain attributes of the nonterminals in the rule

    • Each rule has a (possibly empty) set of predicates to check for attribute consistency

  • Let $X_0 -> X_1...X_n$ be a rule

  • Functions of the form $S(X_0) = f(A(X_1), ..., A(X_n))$ define synthesized attributes

  • Initially, there are intrinsic attributes on the attributes on the leaves

Attribute Grammars: An Example

  • Syntax

    <assign> -> <var> = <expr>
    <expr> -> <var> + <var> | <var>
    <var> -> id
  • Attributes

    • actual_type: synthesized with the non-terminals <var> and <expr>

    • expected_type: inherited with the non-terminal <expr>

robot