COMP-2140: Computer Languages, Grammars, and Translators - In-Depth Notes

Formal Definition of Language

  • A language is defined as a set of strings.
    • Example:
    • English Language: {“the brown dog likes a good car”, …}
    • Java Language: {program | program written in Java}
    • HTML Language: {document | document written in HTML}

How to Define a Language

  • Direct enumeration of all possible expressions in a language can be impractical.
  • A language can be defined through:
    • A set of words (lexicon)
    • A set of rules (grammar)
Example of Defining English
  • Vocabulary:
    • Words: {a, the, brown, friendly, good, book, refrigerator, dog, car, sings, eats, likes}
  • Rules of formation:
    • A Sentence structure can be defined as:
    • extSENTENCEextSUBJECTextVERBextOBJECText{SENTENCE} → ext{SUBJECT} ext{VERB} ext{OBJECT}
    • extSUBJECTextARTICLEextADJECTIVEextNOUNext{SUBJECT} → ext{ARTICLE} ext{ADJECTIVE} ext{NOUN}
    • extOBJECTextARTICLEextADJECTIVEextNOUNext{OBJECT} → ext{ARTICLE} ext{ADJECTIVE} ext{NOUN}
    • extARTICLEatheextEMPTYext{ARTICLE} → a | the | ext{EMPTY}
    • extADJECTIVEbrownfriendlygoodextEMPTYext{ADJECTIVE} → brown | friendly | good | ext{EMPTY}
    • extNOUNbookrefrigeratordogcarext{NOUN} → book | refrigerator | dog | car
    • extVERBsingseatslikesext{VERB} → sings | eats | likes

Derivation of a Sentence

  • The process of generating a sentence from the grammatical rules can be shown step-by-step:
  • For example, the sentence "the brown dog likes a good car" can be derived as follows:
    • extSENTENCEextSUBJECTextVERBextOBJECText{SENTENCE} → ext{SUBJECT} ext{VERB} ext{OBJECT}
    • Apply rules to break down each component step by step until reaching the complete sentence.

Parse Tree of a Sentence

  • The derivation process can be visualized using a parse tree.
    • For instance, for the sentence "the brown dog likes a good car":
    • Root Node: SENTENCE
    • Children Nodes: SUBJECT, VERB, OBJECT
    • Further breakdown:
      • SUBJECT: ARTICLE (the), ADJECTIVE (brown), NOUN (dog)
      • VERB: (likes)
      • OBJECT: ARTICLE (a), ADJECTIVE (good), NOUN (car)

Types of Parsers

  • Top-Down Parsing:

    • Starts from the root and expands downwards.
    • Repeatedly rewrites the start symbol, pursuing the leftmost derivation of the input string.
    • Easier to implement but can be less efficient.
  • Bottom-Up Parsing:

    • Constructs the parse tree from the leaves up to the root.
    • Starts with the input tokens and combines them to form interior nodes.
    • Finds a rightmost derivation and accepts when the start symbol is reached.
    • More prevalent in practical applications.

Terminals and Non-terminals

  • Symbols in Grammar:
    • Terminals: Actual words (e.g., dog, likes).
    • Non-terminals: Categories like SUBJECT, VERB.

Formal Definition of Grammar

  • A grammar is a 4-tuple: G = (Σ, N, P, S)
    • Σ: Finite set of terminal symbols
    • N: Finite set of nonterminal symbols
    • P: Set of production rules
    • S: Start symbol
    • Example for English Grammar:
    • Σ=exta,the,brown,friendly,good,book,refrigerator,dog,car,sings,eats,likesΣ = ext{a, the, brown, friendly, good, book, refrigerator, dog, car, sings, eats, likes}
    • N=extSENTENCE,SUBJECT,VERB,NOUN,OBJECT,ADJECTIVE,ARTICLEN = ext{SENTENCE, SUBJECT, VERB, NOUN, OBJECT, ADJECTIVE, ARTICLE}
    • P=extRules1to7P = ext{Rules 1 to 7}
    • S=extSENTENCES = ext{SENTENCE}

Defining an Infinite Language

  • Infinite languages can be defined using recursive rules. An example is where a SUBJECT/OBJECT can include an indefinite number of adjectives.
  • Rule example:
    • extSUBJECTextARTICLEextADJECTIVESextNOUNext{SUBJECT} → ext{ARTICLE} ext{ADJECTIVES} ext{NOUN}
    • extADJECTIVESextADJECTIVEextADJECTIVESADJECTIVEext{ADJECTIVES} → ext{ADJECTIVE} | ext{ADJECTIVES ADJECTIVE}
    • Example Sentence: "the good brown dog likes a good friendly book"

Chomsky Hierarchy

  • A classification of grammars based on their generative power and form of production rules.
    • Levels:
    • Level 3: Regular grammar
    • Level 2: Context-free grammar
    • Level 1: Context-sensitive grammar
    • Level 0: Unrestricted grammar

Context-Sensitive Grammar

  • Defined as AαB → AβB, where the substitution of α depends on the context provided by surrounding symbols.

Takeaways

  • Understanding language and grammars is crucial in developing parsers and compilers.
  • Recognizing grammars and their types forms a foundational knowledge for computer language processing.
  • Explore top-down vs bottom-up parsing methods depending on the intended application.