Dependency & Constituency Parsing: Detailed Lecture Notes
Session Overview
Informal start: participants greet each other, wait ~5 min for late-comers.
Goal of the class: deepen understanding of parsing in Natural Language Processing (NLP), starting with dependency parsing and moving to statistical constituency parsing.
Dependency Parsing Fundamentals
Focus sentence family: “Vineet will join the board as a non-executive director on 29 June.”
will → modal auxiliary; encodes futurity; attaches to main verb as aux/aux-mod.
join → lexical (main) verb; becomes root of dependency tree.
Dependency parser asks: “What arguments does join need?”
Subject (nsubj) → “Vineet/Lincoln” (NP on the left).
Object (dobj/obj) → “the board” (NP on the right).
Optional modifiers (adjuncts):
• Role: “as a non-executive director” (prep as → obl:arg).
• Time: “on 29 June” (obl:tmod).
• Place: “at the head office” (obl:lmod).
Key property: a verb like join minimally requires two noun phrases; time/place are optional.
Dependency Trees and Non-Projectivity
Parser output visualised as arcs from head → dependent.
Projective tree: drawing arcs above sentence, none cross.
Non-projective structures (crossing arcs) appear in free-word-order languages (and occasionally English via topicalisation). Important for algorithm choice.
Mandatory vs Optional Arguments (Semantic Valency)
Subject & direct object = core arguments (valency of verb).
Time & place = adjuncts (can be absent without ungrammaticality).
Parsing task: detect which dependents are obligatory to satisfy subcategorisation frame.
Transition-Based Parsing (Stack/Buffer Model)
Inspired by LR (shift-reduce) parsing in compilers.
Data structures:
Stack – partially processed words (top = $s_k$).
Buffer – remaining input.
Arc set – constructed dependency edges.
Oracle decides next action by consulting current configuration $\langle S,B,A \rangle$.
Core actions (variants exist):
• – move $b1$ to stack. • – pop two, add arc $b1 \leftarrow sk$. • – pop two, add arc $sk \rightarrow b_1$.
• – pop when head is already found.Greedy: applies first legal transition; only builds projective trees.
Complexity: transitions for sentence of length .
Graph-Based Parsing
Builds fully connected directed graph (each word candidate head of every other word).
Assigns scores using scoring model (usually neural or MST).
Chooses maximum spanning tree (MST) subject to single-head constraint.
Handles non-projective structures; cost: (edge scoring) to (MST inference).
Comparison & Computational Implications
Transition-based: fast, local, suitable for real-time; struggles with long-distance non-projective arcs; susceptible to error-propagation.
Graph-based: globally optimal, handles crossings; heavier computation & memory.
Choice depends on language typology, latency requirements, and hardware.
Probabilistic Context-Free Grammars (PCFGs)
Bridge from dependency to constituency world.
A CFG rule augmented with probability such that .
Example:
Allows ranking of multiple parse trees; highest probability chosen (analogous to most likely hidden path in HMMs).
Constituency Parsing & Distributional Evidence
Hypothesis: strings that occur in the same syntactic environments form constituents.
Distribution test: slot “____ (verb)”
• “Harry the Horse” attracts/loves/sits – valid.
• “Three parties from Brooklyn” attracts/loves/sits – valid.
• “High-class spots such as Mindy’s” attracts? (ungrammatical) → not NP in subject position.Constituency parser learns which noun-phrase structures licence which environments, often via PCFG probabilities.
Context-Free Grammar Refresher
Grammar
• – non-terminals (e.g., ).
• – terminals (lexical words: the, flight, join …).
• – production rules, one non-terminal on LHS.
• – start symbol (usually Sentence).Sample airline fragment:
Parse tree for “a flight” (top-down derivation):
Top-Down vs Bottom-Up Parsing Strategies
Top-Down (recursive-descent): expand from , predict categories, match words.
• Needs backtracking when wrong rule chosen.Bottom-Up (shift-reduce / CYK): begin with words, repeatedly combine into higher phrases until formed.
• Avoids prediction but may build constituents that never appear in final tree; dynamic programming remedies this.
CKY Chart Parsing & Dynamic Programming
Works on Chomsky Normal Form CFG (rules or ).
Builds triangular table storing non-terminals that span words $i \dots j$ and their best probabilities.
Recurrence (probabilistic):
Analogy to Viterbi: each cell encodes highest-prob path to that span; yields most likely parse in .
Parts-of-Speech (POS) Tagging as Pre-processing
Accurate POS tags critical because grammar rules reference POS categories.
Earlier lecture: HMM and Viterbi for sequence labelling.
Ambiguity example: leave (N or V) – syntactic context + probability decides.
Historical & Alternative Formalisms
Augmented Transition Networks (ATN): finite-state chassis with registers & tests; probabilities can augment arcs.
Conceptual Dependency (CD) graphs: semantic roles (actor, object, instrument) → early forerunner of dependency graphs.
Finite-State Models & Logic Programming also mentioned as complementary parsing paradigms.
Practical / Ethical / Real-World Implications
Computational trade-offs influence parser deployed in real systems (voice assistants vs offline analysis).
Over-reliance on greedy decisions can encode bias present in training corpora; probabilistic and neural scorers must be audited.
Parsers power downstream tasks: information extraction, question answering, code generation – erroneous parses propagate.
Key Formulas & Notation Recap
PCFG constraint: .
Transition parser complexity: operations; graph-based MST ≈ scoring + inference.
CKY dynamic rule given above; total complexity .
Recap & Study Tips
Understand difference between dependency (head–dependent) and constituency (phrase structure) representations.
Memorise transition actions and when non-projectivity forces shift from transition-based to graph-based.
Be able to convert simple English sentences into CFG derivations and dependency trees.
Practise CKY table filling on 5-6 word sentences; track probabilities.
Relate compiler LR-parsing knowledge to NLP shift-reduce strategies.
Keep POS tagging accuracy in mind; errors at tag level cascade to parse level.
End of lecture – 5-minute break announced (03:04 → 03:09).