Dependency & Constituency Parsing: Detailed Lecture Notes

Session Overview

  • Informal start: participants greet each other, wait ~5 min for late-comers.

  • Goal of the class: deepen understanding of parsing in Natural Language Processing (NLP), starting with dependency parsing and moving to statistical constituency parsing.


Dependency Parsing Fundamentals

  • Focus sentence family: “Vineet will join the board as a non-executive director on 29 June.”

    • will → modal auxiliary; encodes futurity; attaches to main verb as aux/aux-mod.

    • join → lexical (main) verb; becomes root of dependency tree.

    • Dependency parser asks: “What arguments does join need?”

    1. Subject (nsubj) → “Vineet/Lincoln” (NP on the left).

    2. Object (dobj/obj) → “the board” (NP on the right).

    3. Optional modifiers (adjuncts):
      • Role: “as a non-executive director” (prep asobl:arg).
      • Time: “on 29 June” (obl:tmod).
      • Place: “at the head office” (obl:lmod).

  • Key property: a verb like join minimally requires two noun phrases; time/place are optional.


Dependency Trees and Non-Projectivity

  • Parser output visualised as arcs from head → dependent.

  • Projective tree: drawing arcs above sentence, none cross.

  • Non-projective structures (crossing arcs) appear in free-word-order languages (and occasionally English via topicalisation). Important for algorithm choice.


Mandatory vs Optional Arguments (Semantic Valency)

  • Subject & direct object = core arguments (valency of verb).

  • Time & place = adjuncts (can be absent without ungrammaticality).

  • Parsing task: detect which dependents are obligatory to satisfy subcategorisation frame.


Transition-Based Parsing (Stack/Buffer Model)

  • Inspired by LR (shift-reduce) parsing in compilers.

  • Data structures:

    1. Stack (S=s<em>1,,s</em>k)(S=s<em>1,\dots,s</em>k) – partially processed words (top = $s_k$).

    2. Buffer (B=b<em>1,,b</em>m)(B=b<em>1,\dots,b</em>m) – remaining input.

    3. Arc set (A)(A) – constructed dependency edges.

  • Oracle decides next action by consulting current configuration $\langle S,B,A \rangle$.

  • Core actions (variants exist):
    SHIFT\text{SHIFT} – move $b1$ to stack. • LEFT dep\text{LEFT dep} – pop two, add arc $b1 \leftarrow sk$. • RIGHT dep\text{RIGHT dep} – pop two, add arc $sk \rightarrow b_1$.
    REDUCE\text{REDUCE} – pop when head is already found.

  • Greedy: applies first legal transition; only builds projective trees.

  • Complexity: O(n)O(n) transitions for sentence of length nn.


Graph-Based Parsing

  • Builds fully connected directed graph (each word candidate head of every other word).

  • Assigns scores w(ij)w(i \rightarrow j) using scoring model (usually neural or MST).

  • Chooses maximum spanning tree (MST) subject to single-head constraint.

  • Handles non-projective structures; cost: O(n2)O(n^2) (edge scoring) to O(n3)O(n^3) (MST inference).


Comparison & Computational Implications

  • Transition-based: fast, local, suitable for real-time; struggles with long-distance non-projective arcs; susceptible to error-propagation.

  • Graph-based: globally optimal, handles crossings; heavier computation & memory.

  • Choice depends on language typology, latency requirements, and hardware.


Probabilistic Context-Free Grammars (PCFGs)

  • Bridge from dependency to constituency world.

  • A CFG rule AαA \rightarrow \alpha augmented with probability P(Aα)P(A \rightarrow \alpha) such that αP(Aα)=1\sum_{\alpha} P(A \rightarrow \alpha)=1.

  • Example:
    NPDet  NominalP=0.7NP \rightarrow Det\;Nominal \quad P=0.7
    NPProperNounP=0.3NP \rightarrow ProperNoun \quad P=0.3

  • Allows ranking of multiple parse trees; highest probability chosen (analogous to most likely hidden path in HMMs).


Constituency Parsing & Distributional Evidence

  • Hypothesis: strings that occur in the same syntactic environments form constituents.

  • Distribution test: slot “____ (verb)”
    • “Harry the Horse” attracts/loves/sits – valid.
    • “Three parties from Brooklyn” attracts/loves/sits – valid.
    • “High-class spots such as Mindy’s” attracts? (ungrammatical) → not NP in subject position.

  • Constituency parser learns which noun-phrase structures licence which environments, often via PCFG probabilities.


Context-Free Grammar Refresher

  • Grammar G=(N,Σ,R,S)G=(N,\Sigma,R,S)
    NN – non-terminals (e.g., S,NP,VPS, NP, VP).
    Σ\Sigma – terminals (lexical words: the, flight, join …).
    RR – production rules, one non-terminal on LHS.
    SS – start symbol (usually Sentence).

  • Sample airline fragment:

    1. SNP  VPS \rightarrow NP\;VP

    2. VPV  NPV  NP  PPVP \rightarrow V\;NP \mid V\;NP\;PP

    3. NPDet  NominalProperNounNP \rightarrow Det\;Nominal \mid ProperNoun

    4. NominalNN  NominalNominal \rightarrow N \mid N\;Nominal

    5. PPPrep  NPPP \rightarrow Prep\;NP

  • Parse tree for “a flight” (top-down derivation):
    NPDet  Nominala  Nominala  Na  flightNP \Rightarrow Det\;Nominal \Rightarrow a\;Nominal \Rightarrow a\;N \Rightarrow a\;flight


Top-Down vs Bottom-Up Parsing Strategies

  • Top-Down (recursive-descent): expand from SS, predict categories, match words.
    • Needs backtracking when wrong rule chosen.

  • Bottom-Up (shift-reduce / CYK): begin with words, repeatedly combine into higher phrases until SS formed.
    • Avoids prediction but may build constituents that never appear in final tree; dynamic programming remedies this.


CKY Chart Parsing & Dynamic Programming

  • Works on Chomsky Normal Form CFG (rules ABCA \rightarrow BC or AaA \rightarrow a).

  • Builds triangular table T[i,j]T[i,j] storing non-terminals that span words $i \dots j$ and their best probabilities.

  • Recurrence (probabilistic):
    P(A,i,j)=maxk,B,C[P(ABC)×P(B,i,k)×P(C,k,j)]P(A, i,j) = \max_{k, B,C} \big[ P(A \rightarrow BC) \times P(B,i,k) \times P(C,k,j) \big]

  • Analogy to Viterbi: each cell encodes highest-prob path to that span; yields most likely parse in O(n3G)O(n^3|G|).


Parts-of-Speech (POS) Tagging as Pre-processing

  • Accurate POS tags critical because grammar rules reference POS categories.

  • Earlier lecture: HMM and Viterbi for sequence labelling.

  • Ambiguity example: leave (N or V) – syntactic context + probability decides.


Historical & Alternative Formalisms

  • Augmented Transition Networks (ATN): finite-state chassis with registers & tests; probabilities can augment arcs.

  • Conceptual Dependency (CD) graphs: semantic roles (actor, object, instrument) → early forerunner of dependency graphs.

  • Finite-State Models & Logic Programming also mentioned as complementary parsing paradigms.


Practical / Ethical / Real-World Implications

  • Computational trade-offs influence parser deployed in real systems (voice assistants vs offline analysis).

  • Over-reliance on greedy decisions can encode bias present in training corpora; probabilistic and neural scorers must be audited.

  • Parsers power downstream tasks: information extraction, question answering, code generation – erroneous parses propagate.


Key Formulas & Notation Recap

  • PCFG constraint: αP(Aα)=1\sum_{\alpha} P(A \rightarrow \alpha)=1.

  • Transition parser complexity: Θ(n)\Theta(n) operations; graph-based MST ≈ Θ(n2)\Theta(n^2) scoring + Θ(n2)\Theta(n^2) inference.

  • CKY dynamic rule given above; total complexity O(n3)O(n^3).


Recap & Study Tips

  • Understand difference between dependency (head–dependent) and constituency (phrase structure) representations.

  • Memorise transition actions and when non-projectivity forces shift from transition-based to graph-based.

  • Be able to convert simple English sentences into CFG derivations and dependency trees.

  • Practise CKY table filling on 5-6 word sentences; track probabilities.

  • Relate compiler LR-parsing knowledge to NLP shift-reduce strategies.

  • Keep POS tagging accuracy in mind; errors at tag level cascade to parse level.

End of lecture – 5-minute break announced (03:04 → 03:09).