COMP-2140: Computer Languages, Grammars, and Translators - In-Depth Notes
Formal Definition of Language
- A language is defined as a set of strings.
- Example:
- English Language: {“the brown dog likes a good car”, …}
- Java Language: {program | program written in Java}
- HTML Language: {document | document written in HTML}
How to Define a Language
- Direct enumeration of all possible expressions in a language can be impractical.
- A language can be defined through:
- A set of words (lexicon)
- A set of rules (grammar)
Example of Defining English
- Vocabulary:
- Words: {a, the, brown, friendly, good, book, refrigerator, dog, car, sings, eats, likes}
- Rules of formation:
- A Sentence structure can be defined as:
Derivation of a Sentence
- The process of generating a sentence from the grammatical rules can be shown step-by-step:
- For example, the sentence "the brown dog likes a good car" can be derived as follows:
- Apply rules to break down each component step by step until reaching the complete sentence.
Parse Tree of a Sentence
- The derivation process can be visualized using a parse tree.
- For instance, for the sentence "the brown dog likes a good car":
- Root Node: SENTENCE
- Children Nodes: SUBJECT, VERB, OBJECT
- Further breakdown:
- SUBJECT: ARTICLE (the), ADJECTIVE (brown), NOUN (dog)
- VERB: (likes)
- OBJECT: ARTICLE (a), ADJECTIVE (good), NOUN (car)
Types of Parsers
Top-Down Parsing:
- Starts from the root and expands downwards.
- Repeatedly rewrites the start symbol, pursuing the leftmost derivation of the input string.
- Easier to implement but can be less efficient.
Bottom-Up Parsing:
- Constructs the parse tree from the leaves up to the root.
- Starts with the input tokens and combines them to form interior nodes.
- Finds a rightmost derivation and accepts when the start symbol is reached.
- More prevalent in practical applications.
Terminals and Non-terminals
- Symbols in Grammar:
- Terminals: Actual words (e.g., dog, likes).
- Non-terminals: Categories like SUBJECT, VERB.
Formal Definition of Grammar
- A grammar is a 4-tuple: G = (Σ, N, P, S)
- Σ: Finite set of terminal symbols
- N: Finite set of nonterminal symbols
- P: Set of production rules
- S: Start symbol
- Example for English Grammar:
Defining an Infinite Language
- Infinite languages can be defined using recursive rules. An example is where a SUBJECT/OBJECT can include an indefinite number of adjectives.
- Rule example:
- Example Sentence: "the good brown dog likes a good friendly book"
Chomsky Hierarchy
- A classification of grammars based on their generative power and form of production rules.
- Levels:
- Level 3: Regular grammar
- Level 2: Context-free grammar
- Level 1: Context-sensitive grammar
- Level 0: Unrestricted grammar
Context-Sensitive Grammar
- Defined as AαB → AβB, where the substitution of α depends on the context provided by surrounding symbols.
Takeaways
- Understanding language and grammars is crucial in developing parsers and compilers.
- Recognizing grammars and their types forms a foundational knowledge for computer language processing.
- Explore top-down vs bottom-up parsing methods depending on the intended application.