Week 3 Notes: Phrase Structure Grammars (PSG) and Grammar Building

Objective of week 3: move from rules to a full phrase structure grammar that describes a language and can generate its sentences. We combine a lexicon with a set of phrase structure rules to define a language. A grammar should be descriptive (reflect observations) and generative (produce grammatical sentences and rule out ungrammatical ones).
Key idea: a phrase structure rule defines a constituent (a phrase) and specifies its internal structure, including required heads and allowed modifiers.
Recap of the core syntax idea from last week:
- Every phrase has a head (XP consists of at least one head X).
- Optional material can appear around the head; order is fixed left-to-right (Y before X, Z after X).
- Notation example: a generic XP rule where XP has a mandatory head X, an optional constituent Y before the head, and zero-or-more Z-phrases after the head.

Formal representation of a general phrase structure rule

The lecture describes a generic pattern to capture the idea that a phrase consists of a head, optional pre-head material, and optional post-head material.
In formal notation (using X, Y, Z as placeholders for categories):

$XP \rightarrow (Y)\; X\; Z^*$

Explanations:
- X is the obligatory head (no parentheses around X because it must appear).
- (Y) denotes that Y is optional (there may be zero or one Y before the head).
- Z^* denotes zero or more Z-phrases after the head. The lecture also used a plus sign: in standard notation Z^+ means one or more, while Z^* means zero or more, which is what we use here.
- The elements are read left-to-right: Y (if present), then X, then any number of Z-phrases after X.
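To make the schema concrete, it helps to list the right-hand sides it licenses. The Python sketch below is illustrative only: `expand_schema` and `max_post` are hypothetical names, and because Z^* allows unboundedly many Z-phrases, the cap on repetitions is an assumption added purely so the enumeration terminates.

```python
def expand_schema(head, pre, post, max_post=2):
    """List the right-hand sides licensed by XP -> (Y) X Z*.

    Z* allows any number of Z-phrases, so max_post caps the repetitions
    purely to make the enumeration finite (an illustrative assumption).
    """
    expansions = []
    for before in ([], [pre]):          # (Y): zero or one Y before the head
        for n in range(max_post + 1):   # Z*: zero or more Z-phrases after the head
            expansions.append(before + [head] + [post] * n)
    return expansions


for rhs in expand_schema("X", "Y", "Z"):
    print("XP ->", " ".join(rhs))
# XP -> X
# XP -> X Z
# XP -> X Z Z
# XP -> Y X
# XP -> Y X Z
# XP -> Y X Z Z
```

Every expansion contains the head X exactly once, which is what "the head is obligatory" means in practice.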
Notes on terminology:
- Heads are obligatory within an XP; the head determines the category of the entire XP.
- The left-to-right order on the page represents the temporal order in which words are spoken, as observed in the language data.
- In many datasets, you can choose different but equivalent ways to express optionality; multiple grammars can fit the same data, so you compare rules based on simplicity, generalization, and predictive power.

The two core components of a phrase structure grammar

Lexicon: a list of words with their lexical categories (e.g., noun, verb, determiner, adjective, adverb, proper noun) and, optionally, semantic notes.
Phrase structure rules (PSRs): rules that generate permissible sequences of constituents (e.g., S -> NP VP, NP -> Det N, VP -> V NP, etc.).
A PSG is the combination of (i) a lexicon and (ii) a set of PSRs. This grammar defines a language as all and only the sentences that can be generated from those rules and words.
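The "lexicon + PSRs defines all and only these sentences" idea can be sketched directly in code. This is a minimal Python illustration, not a standard implementation: the tiny lexicon and rules are drawn from words in these notes, and `LEXICON`, `RULES`, and `generate` are hypothetical names. The generator only works for non-recursive grammars, where the language is finite.

```python
from itertools import product

# Lexicon: lexical category -> words.
LEXICON = {
    "Det": ["the"],
    "N": ["child", "boy"],
    "V": ["faced"],
}
# Phrase structure rules: phrasal category -> list of possible right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
}


def generate(category):
    """Return every word sequence the category can produce (non-recursive grammars only)."""
    if category in LEXICON:
        return [[word] for word in LEXICON[category]]
    results = []
    for rhs in RULES[category]:
        # Combine one expansion of each daughter, in left-to-right order.
        for parts in product(*(generate(daughter) for daughter in rhs)):
            results.append([word for part in parts for word in part])
    return results


# The language: all and only the strings derivable from S.
language = {" ".join(words) for words in generate("S")}
print(sorted(language))
```

Here "the child faced the boy" is in the language, while "child faced the boy" is not, because no rule lets a noun appear without a determiner.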

Toy dataset: seven grammatical sentences and one ungrammatical sentence

The instructor walks through a small, self-contained dataset to illustrate how to observe data, identify word categories, and build a grammar that captures observed patterns.
Observations from the data (high level):
- Some words share the same distribution and thus belong to the same category (example: child and boy behave like nouns).
- A determiner is obligatory with common nouns in the observed data (the child, the boy, the tall child).
- The word tall can appear between the determiner and the noun, indicating that tall belongs to a separate category: adjective.
- Proper nouns (e.g., the name Tarly) behave differently from common nouns: they cannot follow a determiner (the Tarly danced is the ungrammatical sentence in the dataset), whereas common nouns require one (the tall child).
- A verb follows the subject NP and can itself be followed by a second NP (V NP), showing that transitive structures are possible.
- An adverb (e.g., yesterday) can appear at the beginning or end of a sentence.
- Some sentences consist of an NP followed by a verb alone (e.g., “the child danced”), while others have a verb followed by an NP (e.g., “the child faced the boy”).
Consequence: as you observe distributional patterns, you assign lexical categories and then propose PSRs that capture those patterns.
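Distributional analysis can be approximated mechanically by recording, for each word, the frames (left neighbour, right neighbour) it occurs in. The sketch below is a simplification under stated assumptions: the five sentences are illustrative, assembled from words in these notes, and are not the actual seven-sentence dataset; `#` is an assumed sentence-boundary marker; and `word_frames` is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative sentences built from the notes' vocabulary (not the real dataset).
SENTENCES = [
    "the child danced",
    "the boy danced",
    "the child faced the boy",
    "Tarly faced the child",
    "yesterday the tall child danced",
]


def word_frames(sentences):
    """Map each word to the set of (left, right) neighbour frames it appears in."""
    table = defaultdict(set)
    for sentence in sentences:
        words = ["#"] + sentence.split() + ["#"]   # '#' marks a sentence boundary
        for i in range(1, len(words) - 1):
            table[words[i]].add((words[i - 1], words[i + 1]))
    return table


FRAMES = word_frames(SENTENCES)
print(sorted(FRAMES["child"] & FRAMES["boy"]))   # shared frames suggest one category
print(sorted(FRAMES["Tarly"]))                   # never preceded by 'the'
```

Shared frames such as ("the", "danced") are evidence that child and boy belong to the same category (N), while Tarly never appears after the, which matches its separate PN behaviour. Real distributional analysis weighs many more contexts than immediate neighbours.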

Example: building a small descriptive grammar for the toy language

Lexicon (initial, English-based example):
- Determiner: the
- Nouns (common): child, boy
- Adjective: tall
- Adverb: yesterday
- Proper noun: Tarly (example of a name)
- Verbs (base forms): dance, face (as stems)
Observed patterns leading to categories:
- child, boy: Nouns (N)
- the: Determiner (Det)
- tall: Adjective (Adj)
- yesterday: Adverb (Adv)
- Tarly: Proper noun (PN)
Lexicon-derived insights:
- A determiner is obligatory with common nouns in the observed data (the child, the boy, the tall child).
- A noun phrase (NP) can be Det (Adj) N, i.e., Det N or Det Adj N, or a PN (proper noun) on its own.
- The word tall occurs between Det and N, so Det Adj N is a plausible NP structure; Det N is also possible when there is no adjective.
- PN cannot appear with a determiner (e.g., the Tarly danced is ungrammatical in the dataset), suggesting PN forms its own NP type.

Provisional phrase structure rules (PSRs) to capture the toy data

NP (noun phrase) rules:
- NP → PN
- NP → Det N
- NP → Det Adj N
Sentence structure rule (S):
- S → AdvP NP VP | NP VP AdvP | NP VP
- This captures the observed possibility of a sentence starting or ending with an adverb; the NP VP option keeps the adverb optional, since sentences such as the child danced contain none.
Verb phrase (VP) rules:
- VP → V NP | V
- This captures both intransitive and transitive verbs (V with or without a following NP).
- V can be expanded by a word-formation rule if morphology is used (see Morphology below).
Verb (V) rule via word-formation (morphology example):
- Verb → VerbStem Suffix
- VerbStem ∈ { dance, face }
- Suffix ∈ { -ed, -s }
- Optionality note: A word-formation rule could also allow Verb → VerbStem (i.e., no suffix) if the data support a zero suffix for present tense; otherwise, the default is to require a suffix as shown.
Adverb phrase (AdvP) rule:
- AdvP → Adv
- For simplicity in this toy dataset, AdvP consists of a single adverb (yesterday).
Optional adverb position as part of S:
- S → Adv NP VP | NP VP Adv | NP VP (equivalently, with an AdvP node: S → AdvP NP VP | NP VP AdvP | NP VP)
Notes on predictivity and constraints:
- These PSRs are not the only possible set you could write; several grammars can account for the same data. Model choice depends on how well the grammar predicts unobserved data and how concise it is.
- Generative vs constraint-based approaches: a purely generative grammar may generate sentences that aren’t observed; a constraint-based approach might prune those later. The course emphasizes starting from a robust generative system and then considering constraints as needed.
Important caveats:
- The grammar should generate the seven observed grammatical sentences and exclude the single ungrammatical one.
- Morphology, when introduced, can create predictions (e.g., faces) that must be checked against evidence.
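Because none of these rules is recursive, the toy language is finite and can be enumerated by brute force to see exactly what the grammar predicts. This Python sketch assumes the rules as listed above, including S → NP VP so that the adverb is optional, with V filled by the word-formation rule. The `attach` helper and its English e-deletion spelling rule (dance + -ed → danced rather than danceed) are assumptions not stated in the notes.

```python
from itertools import product


def attach(stem, suffix):
    """Join stem + suffix; assumed spelling rule: stem-final 'e' merges with '-ed'."""
    if stem.endswith("e") and suffix.startswith("e"):
        stem = stem[:-1]
    return stem + suffix


VERB_STEMS = ["dance", "face"]
SUFFIXES = ["ed", "s"]

LEXICON = {
    "Det": ["the"],
    "N": ["child", "boy"],
    "Adj": ["tall"],
    "Adv": ["yesterday"],
    "PN": ["Tarly"],
    # V is filled by the word-formation rule Verb -> VerbStem Suffix.
    "V": [attach(stem, sfx) for stem in VERB_STEMS for sfx in SUFFIXES],
}
RULES = {
    "S": [["AdvP", "NP", "VP"], ["NP", "VP", "AdvP"], ["NP", "VP"]],
    "NP": [["PN"], ["Det", "N"], ["Det", "Adj", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "AdvP": [["Adv"]],
}


def generate(category):
    """Enumerate all word sequences for a category (the grammar is non-recursive)."""
    if category in LEXICON:
        return [[word] for word in LEXICON[category]]
    results = []
    for rhs in RULES[category]:
        for parts in product(*(generate(daughter) for daughter in rhs)):
            results.append([word for part in parts for word in part])
    return results


LANGUAGE = {" ".join(words) for words in generate("S")}
print(len(LANGUAGE))  # 360
```

A few dozen rules and words already predict 360 sentences, most of them unattested (for example, the child faced yesterday), which is why each prediction must be checked against data.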

How to evaluate a PSG (descriptive and predictive)

Descriptive criterion: the rules must clearly and accurately represent the language data you observed.
Generative criterion: the rules should generate all grammatical sentences in the data (and only grammatical ones, if possible).
Practical evaluation steps:
- Manually test whether the rules generate the observed sentences.
- Check for predicted but unattested sentences (e.g., the child faced yesterday) and assess whether those predictions are plausible or require more data.
- Compare competing grammars: prefer the simpler or more explanatory grammar if both generate the observed data (Occam’s razor, general scientific heuristics).
- If two grammars generate the same set of sentences, prefer the shorter/economical one (fewer rules).
- For more rigorous evaluation, test multiple grammars against a larger dataset and compare predictive power and conciseness.
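The manual evaluation steps can be automated once the grammar's language is enumerable. In this Python sketch, the observed sentences are hypothetical stand-ins (not the course dataset), the verbs are listed as whole words to keep it short, and `evaluate` is a hypothetical helper that reports undergeneration (missed grammatical sentences), overgeneration (generated ungrammatical sentences), and unattested predictions.

```python
from itertools import product

LEXICON = {"Det": ["the"], "N": ["child", "boy"], "Adj": ["tall"],
           "Adv": ["yesterday"], "PN": ["Tarly"], "V": ["danced", "faced"]}
RULES = {"S": [["AdvP", "NP", "VP"], ["NP", "VP", "AdvP"], ["NP", "VP"]],
         "NP": [["PN"], ["Det", "N"], ["Det", "Adj", "N"]],
         "VP": [["V", "NP"], ["V"]],
         "AdvP": [["Adv"]]}


def generate(category):
    if category in LEXICON:
        return [[word] for word in LEXICON[category]]
    out = []
    for rhs in RULES[category]:
        for parts in product(*(generate(d) for d in rhs)):
            out.append([w for part in parts for w in part])
    return out


def evaluate(language, grammatical, ungrammatical):
    """Compare a generated language with observed judgements."""
    return {
        "missed": [s for s in grammatical if s not in language],
        "overgenerated": [s for s in ungrammatical if s in language],
        "unattested": sorted(language - set(grammatical)),
    }


# Hypothetical observations, for illustration only.
GRAMMATICAL = ["the child danced", "the child faced the boy",
               "yesterday Tarly danced", "the tall boy faced Tarly yesterday"]
UNGRAMMATICAL = ["the Tarly danced"]

LANGUAGE = {" ".join(ws) for ws in generate("S")}
REPORT = evaluate(LANGUAGE, GRAMMATICAL, UNGRAMMATICAL)
print(REPORT["missed"], REPORT["overgenerated"], len(REPORT["unattested"]))
```

Empty "missed" and "overgenerated" lists mean the grammar is adequate for the observations; the long "unattested" list is the set of predictions that more data must confirm or refute. Running `evaluate` on competing grammars, alongside a count of their rules, supports the simplicity comparison described above.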

Trees, dominance, and constituency (conceptual tools)

Phrase structure trees represent the hierarchical organization captured by rules.
Key terms:
- Node: a label in the tree (e.g., NP, Det, N, Adj, VP, S).
- Mother: a node that immediately dominates other nodes (e.g., NP is the mother of Det, Adj, and N in NP → Det Adj N).
- Daughter/Sister: the nodes a mother immediately dominates are its daughters; daughters of the same mother are sisters.
- Root: The topmost node of the tree (e.g., S or a top-level phrase).
- Leaves/terminal nodes: The actual words in the sentence (e.g., the, tall, child).
- Linear precedence: The left-to-right order of nodes in the tree corresponds to the surface order in the sentence.
Important relational facts:
- Transitivity of dominance: If A dominates B and B dominates C, then A dominates C as well.
- If a constituent modifies a head, it should be a sister to that head (e.g., tall modifies child, so Adj and N are sisters under NP; likewise, in very yellow, very is a sister to the adjective it modifies inside the AdjP).
Practical takeaway: trees demonstrate the constituent structure and help visualize how a rule applies to form a grammatical sentence.
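The tree relations above can be computed from a simple encoding. In this Python sketch (one possible encoding, not a standard library), a tree is a `(label, children)` tuple, leaves are word strings, and each node is identified by its path: the sequence of child indices from the root. With that choice, dominance is just "is a proper prefix of", which makes its transitivity immediate. The helper names are hypothetical.

```python
# Tree for "the tall child danced": S -> NP VP, NP -> Det Adj N, VP -> V.
TREE = ("S", [
    ("NP", [("Det", ["the"]), ("Adj", ["tall"]), ("N", ["child"])]),
    ("VP", [("V", ["danced"])]),
])


def nodes(tree, path=()):
    """Yield (path, label) for every node, including leaves (words)."""
    if isinstance(tree, str):          # a leaf / terminal node
        yield path, tree
        return
    label, children = tree
    yield path, label
    for i, child in enumerate(children):
        yield from nodes(child, path + (i,))


def dominates(p, q):
    """p dominates q iff p's path is a proper prefix of q's path."""
    return len(p) < len(q) and q[:len(p)] == p


def mother(p):
    """The node that immediately dominates p (undefined for the root)."""
    return p[:-1]


def sisters(p, q):
    """Distinct non-root nodes with the same mother."""
    return p != q and len(p) == len(q) > 0 and mother(p) == mother(q)


def leaves(tree):
    """Terminal words in left-to-right (linear precedence) order."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]


LABELS = dict(nodes(TREE))
print(LABELS[(0, 1)], sisters((0, 1), (0, 2)), leaves(TREE))
```

Here Adj (path `(0, 1)`) and N (path `(0, 2)`) are sisters under NP, and S (the empty path, the root) dominates every other node.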

Single-word phrases and constituency tests

You can have a phrase that consists of a single word (e.g., a single noun like cats, a single verb like ran, or a single adverb).
Reason for treating single words as phrases:
- It keeps the grammar economical: a single word treated as a phrase can undergo the same coordination and substitution tests as a multi-word phrase.
- It allows rules like S → NP VP to stay simple, while also permitting a single-word NP or VP when appropriate.
Evidence for single-word phrases: coordination and substitution tests show that single words behave like larger phrases of the same type. For example, cats in I like cats can be replaced by a pronoun (I like them), just as a multi-word NP can; and because coordination requires constituents of the same category, cats can be coordinated with a full NP.

Limitations of pure phrase structure grammars and extensions

PSGs capture constituency (the grouping of words) but not full functional or dependency information (who does what to whom).
Examples where PSGs fall short:
- Transitivity and argument structure (e.g., who is the subject and who is the object).
- Long-distance dependencies and subject-verb agreement across clauses (e.g., number or case concordance across embedded clauses).
Possible extensions to PSGs:
- Subcategorization: classify verbs by their argument structure (intransitive, transitive, ditransitive) and encode it in the lexicon or with more detailed PSRs.
- Functional information: annotation to capture who is doing what to whom, possibly with feature structures (e.g., subject vs object, number/gender agreement).
- Post-generation constraints: a secondary process that filters sentences to remove ungrammatical outputs or ensure certain constraints (e.g., only transitive verbs take objects).
- Morphology and word formation: stems + suffixes (e.g., dance + -ed → danced, face + -s → faces) to capture tense and agreement morphology.
The course emphasizes starting with a robust generative framework and then considering annotations, subcategorization, and morphology to capture richer linguistic phenomena.
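Subcategorization and post-generation filtering can reach the same result by different routes. The Python sketch below assumes, for illustration only, that danced takes no object and faced takes exactly one; the `OBJECTS` table is a hypothetical lexical feature, and the NP list is trimmed for brevity. It compares the plain rule VP → V NP | V with (1) generation that consults the lexicon and (2) free generation followed by a filter.

```python
VERBS = ["danced", "faced"]
NPS = ["Tarly", "the child", "the boy"]
# Hypothetical subcategorization feature: number of NP objects each verb takes.
OBJECTS = {"danced": 0, "faced": 1}

# Plain PSR VP -> V | V NP: every verb with and without an object.
unconstrained = [(v, None) for v in VERBS] + [(v, np) for v in VERBS for np in NPS]

# Option 1: encode subcategorization in the lexicon and consult it while generating.
subcategorized = ([(v, None) for v in VERBS if OBJECTS[v] == 0]
                  + [(v, np) for v in VERBS if OBJECTS[v] == 1 for np in NPS])

# Option 2: generate freely, then filter out VPs that violate the verb's frame.
filtered = [(v, np) for v, np in unconstrained
            if (np is not None) == (OBJECTS[v] == 1)]

print(len(unconstrained), len(subcategorized), len(filtered))
```

The unconstrained rule produces 8 VPs, including faced with no object and danced with one; both constrained approaches keep only the 4 VPs that respect each verb's frame.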

Morphology and word formation (when evidence arises)

If data show systematic morphological distinctions (e.g., tense or number), you can introduce word-formation rules to capture them.
An example of a simple word-formation rule:

$Verb \rightarrow VerbStem\;Suffix$

Where VerbStem ∈ { dance, face } and Suffix ∈ { -ed, -s }.
Optionality note: Sometimes a suffix may be optional, depending on the data; you can encode this with the same parenthesis notation used for optional (Y) in the general rule: Verb → VerbStem (Suffix).
Consequences of adding morphology:
- You generate new words (e.g., faces) and so gain predictive power, but you must verify whether those predicted forms actually occur in the language being described.
- You extend the lexicon and add a word-formation layer to your grammar. This turns the PSG into a more complete grammar with three components: Lexicon, Word-formation rules, and Phrase structure rules.
Practical modeling note: You should only add morphological structure if there is observable evidence; otherwise, keep it simple.
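A word-formation layer can be sketched as a small function over stems and suffixes. In this Python illustration, `attach` and `verb_forms` are hypothetical names; the rule that a stem-final e merges with the e of -ed (dance + -ed → danced) is an assumed English spelling rule, not part of the notes; and the `ATTESTED` set is a hypothetical stand-in for the forms seen in the data.

```python
VERB_STEMS = ["dance", "face"]
SUFFIXES = ["-ed", "-s"]


def attach(stem, suffix):
    """Concatenate stem and suffix; assumed spelling rule: a stem-final 'e'
    merges with the 'e' of '-ed' (dance + -ed -> danced, not danceed)."""
    ending = suffix.lstrip("-")
    if stem.endswith("e") and ending.startswith("e"):
        stem = stem[:-1]
    return stem + ending


def verb_forms(optional_suffix=False):
    """Verb -> VerbStem Suffix, or Verb -> VerbStem (Suffix) if optional_suffix."""
    forms = [attach(stem, sfx) for stem in VERB_STEMS for sfx in SUFFIXES]
    if optional_suffix:           # the bare stem also counts as a verb
        forms += VERB_STEMS
    return forms


ATTESTED = {"danced", "faced"}    # hypothetical: forms actually observed
predicted = [form for form in verb_forms() if form not in ATTESTED]
print(verb_forms())               # ['danced', 'dances', 'faced', 'faces']
print(predicted)                  # ['dances', 'faces']
```

The `predicted` list is exactly the set of forms the new rule commits you to, and it is the set you would go back to the data to check.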

How to revise a grammar when new data arrive

When new sentences appear (e.g., the boy dances is observed), you revise the grammar accordingly. Steps:
- Add new lexical entries if necessary (e.g., dances as a verb form derived from dance).
- Introduce word-formation rules to capture the new morphological patterns (e.g., Verb → VerbStem Suffix).
- Update PSRs to accommodate new patterns (e.g., a V that can be followed optionally by an NP, or a word formation that enables new surface forms).
- Reassess the grammar’s predictions: does it still generate all grammatical sentences and exclude ungrammatical ones?
The example in the lecture shows that adding a suffix rule may predict forms like faces; you then check whether those forms exist in the language or dataset and adjust accordingly.
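The revision loop can be sketched by comparing the language before and after adding the suffix rule. To stay short, this Python example uses a simplified subset of the toy grammar (S → NP VP, NP → Det N, VP → V | V NP, no adjectives, proper nouns, or adverbs); the `attach` helper and its e-deletion spelling rule are assumptions, and `language` is a hypothetical helper name.

```python
NPS = ["the child", "the boy"]


def attach(stem, suffix):
    """Assumed spelling rule: stem-final 'e' merges with the 'e' of 'ed'."""
    if stem.endswith("e") and suffix.startswith("e"):
        stem = stem[:-1]
    return stem + suffix


def language(verbs):
    """Sentences from S -> NP VP, NP -> Det N, VP -> V | V NP (simplified subset)."""
    vps = list(verbs) + [f"{v} {np}" for v in verbs for np in NPS]
    return {f"{np} {vp}" for np in NPS for vp in vps}


old = language(["danced", "faced"])      # verbs listed whole in the lexicon
new = language([attach(s, x) for s in ["dance", "face"] for x in ["ed", "s"]])

print("the boy dances" in old, "the boy dances" in new)   # False True
print(old <= new)                                         # True: nothing lost
print(sorted(new - old))                                  # new predictions to check
```

The revised grammar covers the new observation and keeps everything the old one generated, but it also predicts sentences such as the child faces the boy, which must then be checked against the language.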

Philosophical and practical implications

Philosophical questions raised:
- How much structure do we need to describe language? A PSG captures constituency well but may miss functional/semantic relations; do we need more powerful representations?
- How do we balance data-driven observations with theoretical constraints (e.g., universal grammar, cognitive plausibility) when building grammars?
Practical implications for linguistics and NLP:
- PSGs provide a clear, testable framework for describing sentence structure and building grammars for languages (including artificial or minority languages).
- The step-by-step process (observe data, assign categories, propose rules, test predictions) mirrors scientific methodology and can guide data-driven grammar engineering.
- The limitations of PSGs motivate incorporation of morphology, syntax-semantics interfaces, and dependency-based analyses to handle real-world language phenomena.

Key takeaways and workflow you should remember

A PSG is a language description built from two pieces: a lexicon and a set of PSRs.
A general PSR pattern you’ll use often:

$XP \rightarrow (Y)\; X\; Z^*$

Concrete toy dataset rules example (one possible set):
- NP → PN | Det N | Det Adj N
- AdvP → Adv
- VP → V NP | V
- S → AdvP NP VP | NP VP AdvP | NP VP
- Verb formation (morphology): Verb → VerbStem Suffix, with VerbStem ∈ { dance, face } and Suffix ∈ { -ed, -s }
Constraints and testing:
- The grammar should generate all grammatical sentences in your corpus and exclude ungrammatical ones.
- You can introduce constraints or subcategorization to capture more complex data (transitives, ditransitives, subject-verb agreement).
- There can be multiple grammars that fit the same data; prefer simpler and more predictive grammars.
Beyond PSGs: you’ll encounter the need for functional/dependency information, longer-distance dependencies, and a more sophisticated treatment of morphology to handle real languages.

Quick recap of the terminology you should know

Phrase structure rule (PSR): A rule that defines how a phrase is built from its parts.
Lexicon: The inventory of words with their categories and basic meanings.
Node/Mother/Sister/Root/Leaves: Terms used to describe trees that visualize structure.
Dominance: A node A dominates node B if B is a descendant of A in the tree.
Linear precedence: The ordering of words within a sentence as read left-to-right in the tree.
NP, VP, S, Det, N, Adj, Adv, PN: Standard labels for constituents and parts of speech.
Morphology/Word formation: Rules that create new word forms from stems (e.g., dance + -ed → danced).

Practical tips for your upcoming assignment

Start by listing all words in the dataset and classifying them by distribution (distributional analysis) to assign lexical categories.
Build a minimal but consistent lexicon that reflects the observations (obligatory determiners with common nouns, adjective placement, PN behavior).
Draft PSRs that capture the observed data, then test them against all sentences to ensure they generate every grammatical sentence and none of the ungrammatical ones.
Consider possible morphologies if the data hint at tense, number, or other inflectional distinctions.
Be explicit about the reasoning behind each rule: Why does NP have a Det Adj N option? Why can PN stand alone as NP? Why is AdvP optional at sentence-level?
If you introduce alternative grammars, evaluate them on simplicity and predictive power, not just data fit.

If you have questions about how to apply any of these rules to a new dataset (especially non-English data or data with richer morphology), we can walk through a concrete example step by step.