Programming Languages - Syntax

Programming Languages - Syntax Notes

Objectives

  • Understand the lexical structure of programming languages
  • Understand context-free grammars and Backus-Naur Forms (BNFs)
  • Become familiar with parse trees and abstract syntax trees
  • Understand ambiguity, associativity, and precedence
  • Learn to use Extended Backus-Naur Forms (EBNFs) and syntax diagrams
  • Become familiar with parsing techniques and tools
  • Understand lexics vs. syntax vs. semantics
  • Build a syntax analyzer for TinyAda

Introduction

  • Syntax: The structure of a programming language.
  • Developed from the ideas of Noam Chomsky in the 1950s, who introduced context-free grammars.
  • Backus-Naur Forms (BNFs): A notation developed by John Backus and Peter Naur for describing grammars, first used for Algol60.
  • Understanding BNF is crucial for modern computer scientists when interpreting language syntax.

Variations of BNF

  • BNF: Basic form with simple notation.
  • EBNF: Extended BNF which introduces additional notation for clarity.
  • Syntax Diagrams: Visual representation of grammatical rules.

Lexical Structure of Programming Languages

  • Lexical Structure: Refers to the structure of tokens or words in a programming language.
  • Tokens Classification:
    • Reserved words (keywords)
    • Literals or constants
    • Special symbols
    • Identifiers (e.g., variable names)
  • Scanning Phase: Converts character sequences into tokens.
  • Parsing Phase: Processes tokens into a structured syntactic representation.

Lexics vs. Syntax vs. Semantics

  • Lexics: Deals with the symbols (alphabet), words (combinations of symbols), and means of token recognition.
  • Syntax: Concerns valid sequences of tokens based on language rules.
  • Semantics: Defines the expected behavior of syntactically correct strings.
  • Example highlighting the contrast: "Colorless green ideas sleep furiously" has syntax but no meaning.

Phases of Compilation

  1. Character stream: Input from source code.
  2. Token stream: Collection of tokens.
  3. Parse tree & Abstract syntax tree (AST): Internal representations used by the compiler.
  4. Assembly or machine language generation: Final output for execution.

Context-Free Grammars and BNFs

  • Context-Free Grammar: Set of rules defining a language's syntax.
  • Derivation Process: Constructing sentences by replacing symbols according to grammar rules.
  • Ambiguity: Occurs when a grammar allows multiple parse trees for a single string.
  • Nonterminals vs. Terminals: Nonterminals are symbols that can be replaced, terminals are actual tokens.

Parse Trees and Abstract Syntax Trees

  • Parse Trees: Graphical representations showing how a string is derived from the grammar rules.
  • Abstract Syntax Trees: Simplified trees that remove redundant parts of the parse tree, focusing on necessary structural elements.
  • Nodes and Leaves: Nonterminals are nodes with children, while terminals are leaf nodes.

Ambiguity, Associativity, and Precedence

  • Associativity: Rule that determines how operators of the same precedence are grouped.
  • Ambiguous Grammars: Need to be modified to eliminate ambiguity – typically through establishing precedence and associativity rules.
  • Leftmost Derivation: A method to systematically replace the leftmost nonterminal to resolve ambiguities.
    • Example: Left-recursive vs right-recursive definitions impact associativity.

EBNFs and Syntax Diagrams

  • EBNF: Introduces useful notations for optional elements and repetition using {} and [] respectively.
  • Syntax Diagrams: Visual tools that represent grammatical rules through shapes for terminals and nonterminals, illustrating sequence and repetition.

Parsing Techniques and Tools

  • Parsing: The process of analyzing a string based on a grammar.
  • Bottom-up parsers: Construct parse trees from leaves upward.
  • Top-down parsers: Start with the top-most element and work downwards.
  • Parser Generators: Software that automates the creation of parsers based on given grammar specifications.
  • Predictive Parsers: Make parsing decisions based on lookahead tokens to ensure correct derivation.

Lexics vs. Syntax vs. Semantics Discussion

  • Formatting aspects, such as whitespace and specific language rules, dictate how tokens are recognized and are distinct from semantics.
  • Some semantic properties are context-sensitive and cannot be represented by context-free rules alone, indicating the need for careful language design.

Case Study: Building a Syntax Analyzer for TinyAda

  • TinyAda: A simplified language to demonstrate syntax features of larger languages.
  • Includes declarations, statements, and expressions, allowing for nested constructs.
  • The parsing shell ensures token types adhere to expected language constructs, serving as a foundation for further semantic analysis.