Week 3 Notes: Phrase Structure Grammars (PSG) and Grammar Building

  • Objective of week 3: move from individual rules to a full phrase structure grammar that describes a language and can generate its sentences. We combine a lexicon with a set of phrase structure rules to define a language. A grammar should be descriptive (reflect observations) and generative (produce grammatical sentences and rule out ungrammatical ones).

  • Key idea: a phrase structure rule defines a constituent (a phrase) and specifies its internal structure, including required heads and allowed modifiers.

  • Recap of the core syntax idea from last week:

    • Every phrase has a head (XP consists of at least one head X).

    • Optional material can appear around the head; order is fixed left-to-right (Y before X, Z after X).

    • Notation example: a generic XP rule where XP has a mandatory head X, an optional constituent Y before the head, and zero-or-more Z-phrases after the head.

Formal representation of a general phrase structure rule

  • The generic pattern described in the lecture, to capture the idea that a phrase consists of a head, optional pre-head material, and optional post-head material:

  • In formal notation (using X, Y, Z as placeholders for categories):

$$XP \rightarrow (Y)\; X\; Z^*$$

  • Explanations:

    • X is the obligatory head (no parentheses around X because it must appear).

    • (Y) denotes that Y is optional (there may be zero or one Y before the head).

    • Z^* denotes zero or more Z-phrases after the head (the plus-sign in the lecture corresponds to one-or-more in some contexts; here we use * to denote zero-or-more).

    • The elements are read left-to-right: Y (if present), then X, then any number of Z-phrases after X.

  • Notes on terminology:

    • Heads are obligatory within an XP; the head determines the category of the entire XP.

    • The left-to-right order is a written representation of the observed order in language data (spoken order is the underlying metaphor).

    • In many datasets, you can choose different but equivalent ways to express optionality; multiple grammars can fit the same data, so you compare rules based on simplicity, generalization, and predictive power.
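The generic rule can be read as a pattern over sequences of category labels. The sketch below is only an illustration of that reading: the helper name `make_xp_matcher` and the label `ZP` for a Z-phrase are assumptions made for this example, and a regular expression is used purely as a convenient way to express "optional, then obligatory, then zero or more".

```python
import re

def make_xp_matcher(head, pre=None, post=None):
    """Build a checker for XP -> (pre) head post*, over lists of category labels."""
    parts = []
    if pre:
        parts.append(rf"(?:{re.escape(pre)} )?")    # (Y): zero or one before the head
    parts.append(re.escape(head))                    # X: the obligatory head
    if post:
        parts.append(rf"(?: {re.escape(post)})*")   # Z*: zero or more after the head
    pattern = re.compile("".join(parts))
    return lambda cats: pattern.fullmatch(" ".join(cats)) is not None

matches_xp = make_xp_matcher(head="X", pre="Y", post="ZP")

print(matches_xp(["X"]))                   # head alone
print(matches_xp(["Y", "X", "ZP", "ZP"]))  # optional Y, head, two Z-phrases
print(matches_xp(["Y", "ZP"]))             # no head
print(matches_xp(["X", "Y"]))              # Y on the wrong side of the head
```

The first two sequences satisfy the rule and the last two do not, because the head is missing in one and the linear order is violated in the other.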

The two core components of a phrase structure grammar

  • Lexicon: a list of words with their lexical categories (e.g., noun, verb, determiner, adjective, adverb, proper noun) and, optionally, semantic notes.

  • Phrase structure rules (PSRs): rules that generate permissible sequences of constituents (e.g., S -> NP VP, NP -> Det N, VP -> V NP, etc.).

  • A PSG is the combination of (i) a lexicon and (ii) a set of PSRs. This grammar defines a language as all and only the sentences that can be generated from those rules and words.

Toy dataset: seven grammatical sentences and one ungrammatical sentence

  • The instructor walks through a small, self-contained dataset to illustrate how to observe data, identify word categories, and build a grammar that captures observed patterns.

  • Observations from the data (high level):

    • Some words share the same distribution and thus belong to the same category (example: child and boy behave like nouns).

    • A determiner is obligatory with common nouns in many contexts (the child, the boy, the tall child, etc.).

    • The word tall can appear between the determiner and the noun, indicating that tall behaves like an adjective (a separate category).

    • Proper nouns (e.g., a name like Tarly) behave differently from common nouns: they do not follow a determiner the way child does in the tall child.

    • A verb follows the subject NP; it can stand alone (V) or be followed by another NP (V NP), which shows that a transitive structure is possible.

    • An adverb (e.g., yesterday) can appear at the beginning or end of a sentence.

    • Some sentences involve a single noun phrase as an NP and a verb (e.g., “the child danced”) while others use a verb plus a following NP (e.g., “the child faced the boy”).

  • Consequence: as you observe distributional patterns, you assign lexical categories and then propose PSRs that capture those patterns.
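As a rough illustration of distributional analysis, the sketch below records the immediate left and right neighbours of every word and compares words by the contexts they share. The corpus is a hypothetical stand-in (the lecture's seven sentences are not reproduced in these notes), and comparing only adjacent neighbours is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical stand-in corpus, built from the words discussed in these notes.
corpus = [
    "the child danced",
    "the boy danced",
    "the tall child danced",
    "Tarly danced",
    "the child faced the boy",
    "yesterday the child danced",
    "the boy danced yesterday",
]

def contexts(sentences):
    """Map each word to the set of (left neighbour, right neighbour) pairs it occurs in."""
    table = defaultdict(set)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]   # mark sentence edges
        for i in range(1, len(words) - 1):
            table[words[i]].add((words[i - 1], words[i + 1]))
    return table

table = contexts(corpus)
shared = table["child"] & table["boy"]
print("child/boy shared contexts:", shared)
print("Tarly/child shared contexts:", table["Tarly"] & table["child"])
```

In this stand-in corpus, child and boy share a context (after the, before danced), while Tarly shares none with child, which is the kind of evidence that motivates separate N and PN categories.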

Example: building a small descriptive grammar for the toy language

  • Lexicon (initial, English-based example):

    • Determiner: the

    • Nouns (common): child, boy

    • Adjective: tall

    • Adverb: yesterday

    • Proper noun: Tarly (example of a name)

    • Verbs (base forms): dance, face (as stems)

  • Observed patterns leading to categories:

    • child, boy: Nouns (N)

    • the: Determiner (Det)

    • tall: Adjective (Adj)

    • yesterday: Adverb (Adv)

    • Tarly: Proper noun (PN)

  • Lexicon-derived insights:

    • Determiner is obligatory with common nouns in the observed data (the child, the boy, the tall child).

    • A noun phrase (NP) can be Det (+ Adj) + N, or a PN (proper noun) on its own.

    • The word tall occurs between Det and N, so Det Adj N is a plausible NP structure; Det N is also possible when there is no adjective.

    • PN cannot appear with a determiner (e.g., the Tarly danced is ungrammatical in the dataset), suggesting PN forms its own NP type.

Provisional phrase structure rules (PSRs) to capture the toy data

  • NP (noun phrase) rules:

    • NP → PN

    • NP → Det N

    • NP → Det Adj N

  • Sentence structure rule (S):

    • S → NP VP | AdvP NP VP | NP VP AdvP

    • This captures sentences with no adverb (e.g., the child danced) as well as the observed possibility of a sentence starting or ending with an adverb.

  • Verb phrase (VP) rules:

    • VP → V NP | V

    • This captures intransitive and transitive verbs (V with or without a following NP).

    • V can be expanded by a word-formation rule if morphology is used (see Morphology below).

  • Verb (V) rule via word-formation (morphology example):

    • Verb → VerbStem Suffix

    • VerbStem ∈ { dance, face }

    • Suffix ∈ { -ed, -s }

    • Optionality note: A morph rule could also allow Verb → VerbStem (i.e., no suffix) if the data support zero suffix for present tense; otherwise, the default is to require a suffix as shown.

  • Adverb phrase (AdvP) rule:

    • AdvP → Adv

    • For simplicity in this toy dataset, AdvP is the single adverb (yesterday).

  • Optional adverb position as part of S:

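    • If AdvP never contains more than a single adverb, the adverb alternatives can equivalently be written with Adv directly: S → Adv NP VP | NP VP Adv. Because the adverb is optional, a plain S → NP VP alternative is still needed for sentences without one.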

  • Notes on predictivity and constraints:

    • These PSRs are not the only possible set you could write; several grammars can account for the same data. Model choice depends on how well the grammar predicts unobserved data and how concise it is.

    • Generative vs constraint-based approaches: a purely generative grammar may generate sentences that aren’t observed; a constraint-based approach might prune those later. The course emphasizes starting from a robust generative system and then considering constraints as needed.

  • Important caveats:

    • The grammar should capture the observed seven grammatical sentences and exclude the single ungrammatical one.

    • Morphology, when introduced, can create predictions (e.g., faces) that must be checked against evidence.
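The provisional grammar is small and non-recursive, so every sentence it defines can be listed. The sketch below encodes one possible version as plain Python data. It makes several assumptions: inflected verb forms are listed directly in the lexicon, the S rule includes a bare NP VP alternative so that adverb-less sentences are covered, and the names `LEXICON`, `RULES`, and `expand` are chosen only for this example.

```python
from itertools import product

# Lexicon: category -> words. Verb forms are listed whole; morphology is left out here.
LEXICON = {
    "Det": ["the"],
    "N": ["child", "boy"],
    "Adj": ["tall"],
    "Adv": ["yesterday"],
    "PN": ["Tarly"],
    "V": ["danced", "faced"],
}

# PSRs: category -> list of alternative right-hand sides.
# Assumption: S also allows a bare "NP VP" so adverb-less sentences are generated.
RULES = {
    "S": [["NP", "VP"], ["AdvP", "NP", "VP"], ["NP", "VP", "AdvP"]],
    "NP": [["PN"], ["Det", "N"], ["Det", "Adj", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "AdvP": [["Adv"]],
}

def expand(symbol):
    """Return every word sequence (as a tuple) that `symbol` can derive."""
    if symbol in LEXICON:
        return [(word,) for word in LEXICON[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Combine every choice for each daughter, left to right.
        for combo in product(*(expand(child) for child in rhs)):
            results.append(sum(combo, ()))   # concatenate the daughters' word tuples
    return results

language = {" ".join(words) for words in expand("S")}

print("the child danced" in language)           # attested pattern
print("the child faced the boy" in language)    # attested pattern
print("the Tarly danced" in language)           # the ungrammatical case
print("the child faced yesterday" in language)  # unattested prediction to check
```

Exhaustive listing only works because no rule here is recursive; a grammar with recursion would need a depth limit or a recognizer instead.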

How to evaluate a PSG (descriptive and predictive)

  • Descriptive criterion: the rules must clearly and accurately represent the language data you observed.

  • Generative criterion: the rules should generate all grammatical sentences in the data (and only grammatical ones, if possible).

  • Practical evaluation steps:

    • Manually test whether the rules generate the observed sentences.

    • Check for predicted but unattested sentences (e.g., the child faced yesterday) and assess whether those predictions are plausible or require more data.

    • Compare competing grammars: prefer the simpler or more explanatory grammar if both generate the observed data (Occam’s razor, general scientific heuristics).

    • If two grammars generate the same set of sentences, prefer the shorter/economical one (fewer rules).

    • For more rigorous evaluation, test multiple grammars against a larger dataset and compare predictive power and conciseness.
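The evaluation steps above can be sketched as a small comparison of two candidate grammars. Everything concrete here is an assumption for illustration: the sentence lists are a hypothetical stand-in for the lecture data, grammar B is an invented competitor that treats Tarly as an ordinary noun, and rule count is used as a crude proxy for simplicity.

```python
from itertools import product

def generate(rules, lexicon, symbol="S"):
    """All word strings derivable from `symbol` (the grammar must be non-recursive)."""
    if symbol in lexicon:
        return set(lexicon[symbol])
    out = set()
    for rhs in rules[symbol]:
        for combo in product(*(generate(rules, lexicon, s) for s in rhs)):
            out.add(" ".join(combo))
    return out

def evaluate(rules, lexicon, grammatical, ungrammatical):
    """Report coverage, overgeneration on known bad data, and grammar size."""
    language = generate(rules, lexicon)
    return {
        "missed": sorted(s for s in grammatical if s not in language),
        "wrongly_generated": sorted(s for s in ungrammatical if s in language),
        "unattested": len(language - set(grammatical)),
        "rule_count": sum(len(alts) for alts in rules.values()),
    }

# Hypothetical stand-in data.
GRAMMATICAL = ["the child danced", "the boy danced", "the tall child danced",
               "Tarly danced", "the child faced the boy",
               "yesterday the child danced", "the boy danced yesterday"]
UNGRAMMATICAL = ["the Tarly danced"]

SHARED = {"S": [["NP", "VP"], ["AdvP", "NP", "VP"], ["NP", "VP", "AdvP"]],
          "VP": [["V", "NP"], ["V"]],
          "AdvP": [["Adv"]]}
BASE = {"Det": ["the"], "Adj": ["tall"], "Adv": ["yesterday"], "V": ["danced", "faced"]}

# Grammar A keeps PN separate; grammar B (invented competitor) lists Tarly as an N.
grammar_a = ({**SHARED, "NP": [["PN"], ["Det", "N"], ["Det", "Adj", "N"]]},
             {**BASE, "N": ["child", "boy"], "PN": ["Tarly"]})
grammar_b = ({**SHARED, "NP": [["Det", "N"], ["Det", "Adj", "N"]]},
             {**BASE, "N": ["child", "boy", "Tarly"]})

report_a = evaluate(*grammar_a, GRAMMATICAL, UNGRAMMATICAL)
report_b = evaluate(*grammar_b, GRAMMATICAL, UNGRAMMATICAL)
print("A:", report_a)
print("B:", report_b)
```

Grammar B has one rule fewer, but it misses Tarly danced and generates the Tarly danced, so the shorter grammar loses here: simplicity only decides between grammars that fit the data equally well.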

Trees, dominance, and constituency (conceptual tools)

  • Phrase structure trees represent the hierarchical organization captured by rules.

  • Key terms:

    • Node: a label in the tree (e.g., NP, Det, N, Adj, VP, S).

    • Mother: a node that immediately dominates its daughter node(s) (e.g., NP is the mother of Det, Adj, and N in NP → Det Adj N).

    • Daughter/Sister: a daughter is a node immediately dominated by its mother; nodes that share the same mother are sisters.

    • Root: The topmost node of the tree (e.g., S or a top-level phrase).

    • Leaves/terminal nodes: The actual words in the sentence (e.g., the, tall, child).

    • Linear precedence: The left-to-right order of nodes in the tree corresponds to the surface order in the sentence.

  • Important relational facts:

    • Transitivity of dominance: If A dominates B and B dominates C, then A dominates C as well.

    • If a constituent modifies a head, it should be a sister to that head (e.g., in the tall child, tall modifies child, so Adj is a sister of N under NP).

  • Practical takeaway: trees demonstrate the constituent structure and help visualize how a rule applies to form a grammatical sentence.
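The tree vocabulary can be made concrete with a small data structure. The sketch below represents a tree as nested tuples, which is just one convenient encoding chosen for this example; the function names are likewise assumptions.

```python
# A tree is (label, child, child, ...); a leaf is a plain string (a word).
tree = ("S",
        ("NP", ("Det", "the"), ("Adj", "tall"), ("N", "child")),
        ("VP", ("V", "danced")))

def leaves(node):
    """Terminal nodes in left-to-right order (linear precedence)."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]

def daughters(node):
    """Labels of the nodes immediately dominated by `node`; these are sisters of each other."""
    return [child if isinstance(child, str) else child[0] for child in node[1:]]

def dominates(node, label):
    """True if a node labelled `label` lies anywhere below `node` (dominance is transitive)."""
    if isinstance(node, str):
        return False
    for child in node[1:]:
        if isinstance(child, str):
            if child == label:
                return True
        elif child[0] == label or dominates(child, label):
            return True
    return False

np_node, vp_node = tree[1], tree[2]
print(leaves(tree))               # the words in surface order
print(daughters(np_node))         # Det, Adj, N are sisters under NP
print(dominates(tree, "Adj"))     # S dominates Adj via NP
print(dominates(vp_node, "Adj"))  # VP does not dominate Adj
```

Here S dominates Adj only through NP, which is the transitivity fact stated above, and the sister relation falls out of sharing a mother.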

Single-word phrases and constituency tests

  • You can have a phrase that consists of a single word (e.g., a single noun like cats, a single verb like ran, or a single adverb).

  • Reason for treating single words as phrases:

    • It preserves constituency economy: you can treat a single word as a phrase and apply the same coordination/substitution tests.

    • It allows rules like S → NP VP to stay simple, while also permitting a single-word NP or VP when appropriate.

  • Evidence for single-word phrases: coordination and substitution tests show that single words can behave like larger phrases of the same type (e.g., in I like cats, cats can be replaced by the pronoun them to give I like them; coordination requires constituents of the same category).

Limitations of pure phrase structure grammars and extensions

  • PSGs capture constituency (the grouping of words) but not full functional or dependency information (who does what to whom).

  • Examples where PSGs fall short:

    • Transitivity and argument structure (e.g., who is the subject and who is the object).

    • Long-distance dependencies and subject-verb agreement across clauses (e.g., number or case concordance across embedded clauses).

  • Possible extensions to PSGs:

    • Subcategorization: classify verbs by their argument structure (intransitive, transitive, ditransitive) and encode it in the lexicon or with more detailed PSRs.

    • Functional information: annotation to capture who is doing what to whom, possibly with feature structures (e.g., subject vs object, number/gender agreement).

    • Post-generation constraints: a secondary process that filters sentences to remove ungrammatical outputs or ensure certain constraints (e.g., only transitive verbs take objects).

    • Morphology and word formation: stems + suffixes (e.g., dance + -ed → danced, face + -s → faces) to capture tense and agreement morphology.

  • The course emphasizes starting with a robust generative framework and then considering annotations, subcategorization, and morphology to capture richer linguistic phenomena.
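Subcategorization used as a post-generation filter might look like the sketch below. The frames assigned to danced and faced are assumptions based on the example sentences in these notes, and the names `SUBCAT` and `vp_is_licensed` are hypothetical.

```python
# Assumed frames: each verb lists the complement sequences it allows.
SUBCAT = {
    "danced": [()],        # intransitive: no object
    "faced": [("NP",)],    # transitive: exactly one NP object
}

def vp_is_licensed(verb, complements):
    """Keep a VP only if its complements match one of the verb's frames."""
    return tuple(complements) in SUBCAT[verb]

# Candidate VPs as the plain rules VP -> V NP | V would produce them.
candidates = [("danced", []), ("danced", ["NP"]), ("faced", ["NP"]), ("faced", [])]
licensed = [(verb, comps) for verb, comps in candidates if vp_is_licensed(verb, comps)]
print(licensed)
```

With these assumed frames, the filter keeps danced with no object and faced with one NP object, and discards the other two candidates that the unconstrained VP rules would generate.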

Morphology and word formation (when evidence arises)

  • If data show systematic morphological distinctions (e.g., tense or number), you can introduce word-formation rules to capture it.

  • An example of a simple word-formation rule:

$$Verb \rightarrow VerbStem\; Suffix$$

  • Where VerbStem ∈ { dance, face } and Suffix ∈ { -ed, -s }.

  • Optionality note: Sometimes a suffix may be optional depending on the data; using the same parenthesis notation as for optional constituents, you can encode this as Verb → VerbStem (Suffix).

  • Consequences of adding morphology:

    • You generate new words (e.g., faces) that you must test against the data.

    • You add predictive power (e.g., you predict possible forms like faces) and you must verify whether those forms occur in the language being described.

    • You extend the lexicon and add a word-formation layer to your grammar. This turns the PSG into a more complete grammar with three components: Lexicon, Word-formation rules, and Phrase structure rules.

  • Practical modeling note: You should only add morphological structure if there is observable evidence; otherwise, keep it simple.
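A word-formation rule of this kind can be sketched in a few lines. The spelling adjustment (dropping a stem-final e before a vowel-initial suffix) is an assumption added so that dance + -ed comes out as danced, and the attested set simply collects the verb forms mentioned in these notes.

```python
STEMS = ["dance", "face"]
SUFFIXES = ["-ed", "-s"]

def attach(stem, suffix):
    """Join stem and suffix; drop a stem-final 'e' before a vowel-initial suffix (spelling assumption)."""
    ending = suffix.lstrip("-")
    if stem.endswith("e") and ending[0] in "aeiou":
        return stem[:-1] + ending
    return stem + ending

# Verb -> VerbStem Suffix: every stem combines with every suffix.
forms = {(stem, suffix): attach(stem, suffix) for stem in STEMS for suffix in SUFFIXES}

# Forms that appear in example sentences in these notes.
attested = {"danced", "faced", "dances"}
to_check = sorted(set(forms.values()) - attested)

print(forms)
print("predicted but unattested:", to_check)
```

The rule produces four forms, of which faces is the one prediction not yet supported by an example sentence, so it is the form to verify against further data.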

How to revise a grammar when new data arrive

  • When new sentences appear (e.g., the boy dances is observed), you revise the grammar accordingly. Steps:

    • Add new lexical entries if necessary (e.g., dances as a verb form derived from dance).

    • Introduce word-formation rules to capture the new morphological patterns (e.g., VerbStem + Suffix → Verb).

    • Update PSRs to accommodate new patterns (e.g., a V that can be followed optionally by an NP, or a word formation that enables new surface forms).

    • Reassess the grammar’s predictions: does it still generate all grammatical sentences and exclude ungrammatical ones?

  • The example in the lecture shows that adding a suffix rule may predict forms like faces; you then check whether those forms exist in the language or dataset and adjust accordingly.

Philosophical and practical implications

  • Philosophical questions raised:

    • How much structure do we need to describe language? A PSG captures constituency well but may miss functional/semantic relations; do we need more powerful representations?

    • How do we balance data-driven observations with theoretical constraints (e.g., universal grammar, cognitive plausibility) when building grammars?

  • Practical implications for linguistics and NLP:

    • PSGs provide a clear, testable framework for describing sentence structure and building grammars for languages (including artificial or minority languages).

    • The step-by-step process (observe data, assign categories, propose rules, test predictions) mirrors scientific methodology and can guide data-driven grammar engineering.

    • The limitations of PSGs motivate incorporation of morphology, syntax-semantics interfaces, and dependency-based analyses to handle real-world language phenomena.

Key takeaways and workflow you should remember

  • A PSG is a language description built from two pieces: a lexicon and a set of PSRs.

  • A general PSR pattern you’ll use often:

$$XP \rightarrow (Y)\; X\; Z^*$$

  • Concrete toy dataset rules example (one possible set):

    • NP → PN | Det N | Det Adj N

    • AdvP → Adv

    • VP → V NP | V

    • S → NP VP | AdvP NP VP | NP VP AdvP

    • Verb formation (morphology): Verb → VerbStem Suffix, with VerbStem ∈ { dance, face } and Suffix ∈ { -ed, -s }

  • Constraints and testing:

    • The grammar should generate all grammatical sentences in your corpus and exclude ungrammatical ones.

    • You can introduce constraints or subcategorization to capture more complex data (transitives, ditransitives, subject-verb agreement).

    • There can be multiple grammars that fit the same data; prefer simpler and more predictive grammars.

  • Beyond PSGs: you’ll encounter the need for functional/dependency information, longer-distance dependencies, and a more sophisticated treatment of morphology to handle real languages.

Quick recap of the terminology you should know

  • Phrase structure rule (PSR): A rule that defines how a phrase is built from its parts.

  • Lexicon: The inventory of words with their categories and basic meanings.

  • Node/Mother/Sister/Root/Leaves: Terms used to describe trees that visualize structure.

  • Dominance: A node A dominates node B if B is a descendant of A in the tree.

  • Linear precedence: The ordering of words within a sentence as read left-to-right in the tree.

  • NP, VP, S, Det, N, Adj, Adv, PN: Standard labels for constituents and parts of speech.

  • Morphology/Word formation: Rules that create new word forms from stems (e.g., dance + -ed → danced).

Practical tips for your upcoming assignment

  • Start by listing all words in the dataset and classifying them by distribution (distributional analysis) to assign lexical categories.

  • Build a minimal but consistent lexicon that reflects the observations (obligatory determiners with common nouns, adjective placement, PN behavior).

  • Draft PSRs that capture the observed data, then test against all sentences to ensure coverage and non-coverage of ungrammatical forms.

  • Consider possible morphologies if the data hint at tense, number, or other inflectional distinctions.

  • Be explicit about the reasoning behind each rule: Why does NP have a Det Adj N option? Why can PN stand alone as NP? Why is AdvP optional at sentence-level?

  • If you introduce alternative grammars, evaluate them on simplicity and predictive power, not just data fit.
