1/99
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
|---|
No study sessions yet.
Syntax
The structure of a language; defines how symbols can be combined to form valid programs.
Noam Chomsky
Developed the idea of context-free grammars in 1950, which describe the structure of languages.
John Backus and Peter Naur
Developed the notational system known as Backus-Naur Form (BNF) to describe programming language syntax, first used for Algol60.
Backus-Naur Form (BNF)
A notation system for describing context-free grammars, using "→" for rules and "|" for choices.
Context-Free Grammar
A grammar made up of rules where a single nonterminal on the left side can be replaced by a sequence of terminals or nonterminals on the right.
Extended Backus-Naur Form (EBNF)
A variation of BNF adding curly braces {} for repetition and square brackets [] for optional parts.
Syntax Diagram
A visual representation of grammar rules using shapes and arrows to show sequence, repetition, and choice.
Lexical Structure
The structure of the tokens (words) in a programming language; handled during the scanning phase.
Token
A basic unit in source code such as a keyword, identifier, literal, or operator.
Scanning Phase
The compilation phase where a translator collects sequences of characters and forms them into tokens.
Parsing Phase
The phase where the translator processes tokens to determine syntactic structure.
Semantics
The meaning of syntactically correct statements; syntax defines structure while semantics defines meaning.
Reserved Words (Keywords)
Predefined tokens that cannot be redefined in a program.
Literals or Constants
Tokens representing fixed values such as numbers or characters.
Special Symbols
Characters like ";", "<=", or "+" that have syntactic meaning.
Identifiers
Names given to variables, functions, etc. defined by the programmer.
Predefined Identifiers
Identifiers with initial meanings in the language that can be redefined by the programmer.
Principle of Longest Substring
The scanner should recognize the longest possible valid token sequence.
Token Delimiters (Whitespace)
Formatting like spaces or newlines that separate tokens.
Free-Format Language
A language where indentation or spacing does not affect structure, only token separation.
Fixed-Format Language
A language where tokens must appear in specific positions (e.g., Fortran).
Regular Expression
A formal pattern describing sets of strings used to define token structures.
Concatenation
Sequentially combining items in a regular expression.
Repetition (*)
Indicates zero or more repetitions of the preceding element in a regular expression.
Choice (|)
Indicates an alternative between items in a regular expression.
Character Range ([a-z])
Specifies a range of allowed characters.
Optional (?)
Marks an item as optional in a regular expression.
Period (.)
Represents any single character in a regular expression.
Context-Free Grammar Rule
A rule describing how a nonterminal can be replaced with other symbols.
Metasymbols
Symbols used in grammar notation like "→" and "|".
Derivation
The process of generating a string by applying grammar rules starting from the start symbol.
Nonterminal
A symbol that can be replaced by other symbols through grammar rules.
Terminal
A basic symbol that cannot be replaced further; corresponds to tokens.
Production
Another name for a grammar rule.
Start Symbol
The top-level nonterminal representing an entire language structure.
Language of the Grammar
The set of all strings derivable from the start symbol following the grammar rules.
Recursion in Grammar
A rule that refers to itself, allowing repeated structures.
Parse Tree
A graphical depiction of how a string is derived from the start symbol using grammar rules.
Abstract Syntax Tree (AST)
A simplified version of a parse tree that omits unnecessary syntactic details.
Syntax-Directed Semantics
Associating meaning with syntactic structures during compilation.
Concrete Syntax
The exact textual form of a program's syntax.
Ambiguous Grammar
A grammar where more than one parse tree can represent the same string.
Leftmost Derivation
A derivation where the leftmost nonterminal is always expanded first.
Rightmost Derivation
A derivation where the rightmost nonterminal is always expanded first.
Precedence
Determines which operations bind more tightly when parentheses are omitted.
Associativity
Determines the order of evaluation for operations of the same precedence (left or right).
Ambiguity Resolution
The process of rewriting or constraining grammar to remove ambiguity.
Left-Recursive Rule
A grammar rule where the leftmost symbol in the right-hand side is the same as the left-hand side.
Right-Recursive Rule
A grammar rule where the recursive call appears on the right end of the production.
Term Rule (Precedence Cascade)
A new grammar rule written to handle operator precedence.
EBNF Curly Braces {}
Indicate zero or more repetitions of a part of a rule (loop).
EBNF Square Brackets []
Indicate optional parts of a rule.
Syntax Diagram Symbols
Circles/ovals for terminals, rectangles for nonterminals, arrows for sequence.
Repetition in Syntax Diagrams
Represented by loops connecting back to a previous element.
Parser
A program or component that analyzes token sequences according to grammar rules.
Recognizer
A parser that accepts or rejects input strings based on whether they follow grammar rules.
Bottom-Up Parser
Builds parse trees from the leaves (tokens) up to the root (start symbol).
Shift-Reduce Parser
A bottom-up parser that shifts tokens onto a stack and reduces them to nonterminals.
Top-Down Parser
Expands nonterminals to match tokens, constructing derivations directly from the start symbol.
Parser Generator
A tool that automatically builds parsers from grammar specifications (e.g., YACC).
Recursive-Descent Parser
A top-down parser implemented with mutually recursive procedures representing grammar rules.
Left Recursion Problem
A left-recursive grammar can cause infinite recursion in a recursive-descent parser.
Left Recursion Removal
Done by rewriting rules or using EBNF loops {}.
Left Factoring
Transforming grammar rules to remove common prefixes so a parser can decide which rule to apply.
Single-Symbol Lookahead
Using one token of lookahead to decide parsing actions.
Predictive Parser
A top-down parser that makes decisions based only on a single lookahead token.
LL(1) Grammar
A grammar suitable for predictive parsing with one token of lookahead.
YACC
A parser generator that produces bottom-up parsers in C using grammar definitions.
Bison
A free version of YACC.
yyparse
The parsing function generated by YACC.
yylex
The scanner function called by the parser to get tokens.
Lexical Conventions
Rules for tokenizing input, such as whitespace or identifier syntax.
Lexics
The study of the lexical (token-level) structure of programming languages.
Syntax
The rules governing structure and form of code.
Semantics
The meaning and behavior of syntactically correct code.
Lexics vs Syntax
Lexics deals with tokens, syntax deals with structure.
Syntax vs Semantics
Syntax defines form; semantics defines meaning.
Context-Sensitive Rules
Rules depending on context, such as variable declarations before use.
Semantic Properties
Aspects that cannot be described with context-free grammar (e.g., type checking).
Reserved Word
A keyword that cannot be used as an identifier.
Predefined Identifier
A name that is defined by default but can be redefined.
TinyAda
A simplified programming language used to illustrate syntax analysis concepts.
Parsing Shell
A tool or framework that applies grammar rules to check token correctness.
Syntax Analyzer
The component that checks the syntactic correctness of a program using grammar rules.
Semantic Analysis
The stage after syntax analysis that checks for meaning and type correctness.
Ambiguity Testing
Searching for multiple leftmost derivations of the same string to detect ambiguity.
Grammar Language Definition
The total set of legal strings derivable from the grammar.
Syntax-Directed Translation
The process of generating intermediate representations during parsing.
Free-Format Example
C, Python (spacing irrelevant except indentation in Python).
Fixed-Format Example
Early Fortran (specific column-based code structure).
Principle of Longest Match
The scanner always matches the longest valid token possible.
Regular Expression Example
[0-9]+ defines an integer constant.
EBNF Loop Meaning
Represents repeated elements in grammar without left recursion.
Predictive Parsing Condition
No two right-hand sides of a rule can start with the same token.
Lookahead Limitation
If grammar choices share prefixes, more lookahead or refactoring is needed.
Syntax and Semantics Interdependence
Occurs when meaning helps resolve ambiguous syntax.
Compiler Phases
Scanning, Parsing, Semantic Analysis, Code Generation, Optimization.
Syntactic Correctness
The program follows all grammar rules.
Semantic Correctness
The program makes logical sense beyond syntax.
Parsing Tree Node Labels
Nonterminals for internal nodes, terminals for leaves.