Dependency Parsing – Transition-Based vs. Graph-Based
Dependency Parsing Approaches
• Two mainstream algorithms convert a sentence into a dependency tree:
– Transition-based parsing (stack/LR style, greedy, efficient, only guaranteed for projective trees = trees whose arcs never cross).
– Graph-based parsing (global search for the highest-scoring tree; can handle non-projectivity but is costlier to implement/run).
• Choice affects coverage (projective vs. non-projective languages), speed, memory & code complexity.
Transition-Based Parsing (Stack, Greedy)
• Conceptually identical to LR shift–reduce parsing in compilers.
• Parser state («configuration») = ⟨Stack, Input Buffer, Set of built dependencies⟩.
• Two data alphabets:
– Input buffer alphabet = surface words.
– Stack alphabet = symbolic summaries; top-of-stack symbol represents entire stack to avoid repeated popping (stack is destructive memory).
• Parsing loop (left-to-right, single pass):
1. Inspect top two stack items & next buffer word.
2. Use an oracle (in practice, a trained classifier) to decide one of three transitions.
3. Execute transition, update configuration.
4. Repeat until stack = {ROOT} and buffer empty.
• Greedy aspect: one transition is chosen and applied at each step; the parser never backtracks, so an early wrong decision cannot be repaired. Separately, the arc-standard system can only produce projective trees, so sentences whose correct tree is non-projective cannot be parsed correctly regardless of the oracle.
Transition Operations: Shift, Left-Arc, Right-Arc
• Shift (SH): move first word from buffer to top of stack; no dependency edge added.
• Left-Arc (LA):
– Preconditions: stack has at least two items.
– Action: assert dependency «stack[0] ───▶ stack[1]» (the item beneath depends on the top; the top is the head).
– Pop LOWER of the two (L removes Lower).
• Right-Arc (RA):
– Preconditions: stack has at least two items.
– Action: assert dependency «stack[1] ───▶ stack[0]» (the top depends on the second-top; the second-top is the head).
– Pop TOP item (R removes Top).
• Oracle learns when each transition is correct.
Worked Example: "Book me the morning flight"
| Step | Stack (top→left) | Buffer | Action | New Edge |
|---|---|---|---|---|
| 0 | ROOT | book me the morning flight | SH | – |
| 1 | book ROOT | me the morning flight | SH | – |
| 2 | me book ROOT | the morning flight | RA | book ▶ me (OBJ) |
| 3 | book ROOT | the morning flight | SH | – |
| 4 | the book ROOT | morning flight | SH | – |
| 5 | morning the book ROOT | flight | SH | – |
| 6 | flight morning the book ROOT | – | LA | flight ▶ morning (NMOD) |
| 7 | flight the book ROOT | – | LA | flight ▶ the (DET) |
| 8 | flight book ROOT | – | RA | book ▶ flight (DOBJ) |
| 9 | book ROOT | – | RA | ROOT ▶ book (ROOT) |
| 10 | ROOT | – | parsing complete | – |

Dependency graph: ROOT→book; book→{me, flight}; flight→{the, morning}.
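The worked example can be replayed with a few lines of Python. This is a minimal sketch of the arc-standard transition system in which the scripted transition sequence stands in for a learned oracle; the function and variable names are illustrative, not from any particular library.

```python
# A minimal arc-standard transition system, replaying the worked example
# with a scripted transition sequence in place of a learned oracle.

def parse(words, transitions):
    """Apply SH / LA / RA transitions; return the set of (head, dependent) arcs."""
    stack = ["ROOT"]
    buffer = list(words)
    arcs = set()
    for t in transitions:
        if t == "SH":                      # move next buffer word onto the stack
            stack.append(buffer.pop(0))
        elif t == "LA":                    # top is head of the item beneath it
            dep = stack.pop(-2)
            arcs.add((stack[-1], dep))
        elif t == "RA":                    # second item is head of the top
            dep = stack.pop()
            arcs.add((stack[-1], dep))
    assert stack == ["ROOT"] and not buffer  # terminal configuration reached
    return arcs

words = ["book", "me", "the", "morning", "flight"]
seq = ["SH", "SH", "RA", "SH", "SH", "SH", "LA", "LA", "RA", "RA"]
print(sorted(parse(words, seq)))
```

The returned arcs reproduce the dependency graph above: ROOT→book, book→{me, flight}, flight→{the, morning}.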
Training a Transition Parser (Oracle & Supervision)
• Treebanks give only final correct trees, not intermediate (config, transition) pairs.
• Solution: simulate the parser on each gold tree; at every configuration consult an oracle that outputs the transition(s) guaranteed to reach the gold tree (a dynamic oracle additionally handles configurations off the gold path); collect (feature-vector, transition) pairs.
• Learn classifier (Perceptron, SVM, Neural net) that predicts transition from features (stack top/below, buffer head, POS tags, etc.).
Graph-Based Parsing (MST Search)
• Parser treats every possible ordered pair (h, d) as a directed edge candidate with score s(h, d).
• The best tree maximises the total edge score:
T̂ = argmax_T Σ_{(h,d)∈T} s(h, d).
• Finding T̂ is equivalent to a Maximum Spanning Tree (MST) problem on a directed graph whose nodes are the words plus ROOT.
• Non-projectivity naturally allowed because MST need not obey planar ordering.
Chu-Liu / Edmonds Algorithm (Directed MST)
1. For each vertex, choose the highest-scoring incoming edge; put all such edges in a set F.
2. If F is acyclic ⇒ it is the MST; return.
3. Else contract each cycle into a super-node and adjust the score of every edge (h, d) entering the cycle: s′(h, d) = s(h, d) − s(h_c, d), where h_c is d's head inside the cycle.
4. Recurse on the contracted graph to obtain the MST over super-nodes.
5. Expand the cycles, keeping every cycle edge except the one displaced by the chosen entering edge.
• Runs in O(n²) time with an efficient implementation (Tarjan); practical for typical sentence lengths (< 100 words).
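The contract-recurse-expand loop can be sketched recursively over dense score dictionaries. This is an illustrative sketch, not an optimised O(n²) implementation; all helper names are made up for this example, and it assumes every node appears as a key of `score` and ROOT (node 0) has no incoming edges.

```python
# Chu-Liu / Edmonds for a directed maximum spanning tree (maximisation form).
# score[h][d] = score of edge h -> d; node 0 is ROOT and has no incoming edges.

def chu_liu_edmonds(score):
    """Return head[d] = h for every node d != 0."""
    nodes = list(score.keys())
    # Step 1: best incoming edge for every non-root node.
    best = {d: max((h for h in nodes if h != d and d in score[h]),
                   key=lambda h: score[h][d])
            for d in nodes if d != 0}
    cycle = _find_cycle(best)
    if cycle is None:                       # Step 2: acyclic -> done
        return best
    # Step 3: contract the cycle into a fresh super-node c, adjusting scores.
    c, cyc = max(nodes) + 1, set(cycle)
    new_score = {h: {} for h in nodes if h not in cyc}
    new_score[c] = {}
    enter, leave = {}, {}
    for h in nodes:
        for d, s in score[h].items():
            if h in cyc and d not in cyc:        # edge leaving the cycle
                if d not in new_score[c] or s > new_score[c][d]:
                    new_score[c][d], leave[d] = s, (h, d)
            elif h not in cyc and d in cyc:      # edge entering the cycle
                adj = s - score[best[d]][d]      # cost of breaking d's cycle edge
                if c not in new_score[h] or adj > new_score[h][c]:
                    new_score[h][c], enter[h] = adj, (h, d)
            elif h not in cyc and d not in cyc:
                new_score[h][d] = s
    # Step 4: recurse, then Step 5: expand the cycle.
    sub = chu_liu_edmonds(new_score)
    heads = {d: h for d, h in sub.items() if d != c and h != c}
    for d, h in sub.items():
        if h == c:                               # arc out of the super-node
            heads[d] = leave[d][0]
    h_star, d_star = enter[sub[c]]               # chosen arc into the cycle
    heads[d_star] = h_star
    for d in cyc:
        if d != d_star:
            heads[d] = best[d]                   # keep the remaining cycle arcs
    return heads

def _find_cycle(best):
    """Find a cycle in the chosen-edge graph, if any."""
    for start in best:
        seen, v = set(), start
        while v in best and v not in seen:
            seen.add(v); v = best[v]
        if v in seen:                            # walked back into the path
            cycle, u = [v], best[v]
            while u != v:
                cycle.append(u); u = best[u]
            return cycle
    return None
```

For instance, with scores {0→1: 5, 0→2: 1, 1→2: 4, 2→1: 6} the greedy step picks the 1↔2 cycle, which contraction resolves in favour of the tree 0→1→2 (total score 9).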
Edge Scoring & Feature Design
• Linear model: s(h, d) = w · f(h, d), where f extracts features of the candidate edge.
• Typical feature templates:
– Word forms, lemmas, POS of head & dependent.
– Direction & distance (tokens) between head/dependent.
– Surrounding context words/POS tags (±1, ±2).
– Dependency label under consideration.
– Pre-trained word embeddings (concatenate or average) ⇒ captures semantics.
• Weights trained with structured perceptron, MIRA, or CRF-like objectives using gold trees.
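A toy version of the linear scorer with sparse indicator features; the feature templates and weights below are illustrative examples, not a fixed standard set.

```python
# Toy linear edge scorer: s(h, d) = w . f(h, d) with sparse indicator features.

def features(sent, h, d):
    """sent: list of (word, pos); h, d: token indices (h = -1 for ROOT)."""
    hw, hp = sent[h] if h >= 0 else ("ROOT", "ROOT")
    dw, dp = sent[d]
    direction = "L" if h > d else "R"           # which side the dependent is on
    dist = min(abs(h - d), 5)                   # bucketed head-dependent distance
    return {
        f"hw={hw}": 1, f"dw={dw}": 1,           # word forms of head / dependent
        f"hp={hp}|dp={dp}": 1,                  # POS-pair template
        f"dir={direction}|dist={dist}": 1,      # direction + distance template
    }

def score(w, sent, h, d):
    return sum(w.get(k, 0.0) * v for k, v in features(sent, h, d).items())

sent = [("book", "VB"), ("the", "DT"), ("flight", "NN")]
w = {"hp=VB|dp=NN": 2.0, "dir=R|dist=2": 0.5}   # toy learned weights
print(score(w, sent, 0, 2))                      # 2.5
```

Training fills in `w`: the structured perceptron, for example, parses each sentence with the current weights, then adds the gold tree's feature vector and subtracts the predicted tree's.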
Evaluation Metrics (UAS & LAS)
• Unlabeled Attachment Score (UAS): % of words whose predicted head = gold head (label ignored).
• Labeled Attachment Score (LAS): % of words whose predicted head AND dependency label both correct.
• Example «Book the flight through Houston»
– Gold: book(ROOT)▶flight(DOBJ); flight▶Houston(NMOD).
– System: book▶flight(xcomp); flight▶Houston(NMOD).
– Counting only the two attachments shown: UAS = 2/2 = 100% (both heads right); LAS = 1/2 = 50% (label xcomp wrong).
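Both metrics take only a few lines; this sketch scores exactly the (head, label) pairs it is given.

```python
# Minimal UAS / LAS computation over parallel (head, label) predictions.

def attachment_scores(gold, pred):
    """gold, pred: one (head, label) pair per scored word."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n   # heads only
    las = sum(g == p for g, p in zip(gold, pred)) / n         # head AND label
    return uas, las

# The two attachments shown for "Book the flight through Houston":
gold = [("book", "DOBJ"), ("flight", "NMOD")]
pred = [("book", "xcomp"), ("flight", "NMOD")]
print(attachment_scores(gold, pred))   # (1.0, 0.5)
```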
Building Dependency Trees in Practice
• Three practical routes:
1. Transition parser with greedy or beam search; fastest (≈linear) but projective.
2. Graph/MST parser; roughly O(n²)–O(n³) depending on features and implementation; handles non-projective trees.
3. Convert phrase-structure (PCFG) parse to dependency tree by head-percolation rules.
• Hybrid systems combine neural encoders (Bi-LSTM, Transformer) to produce edge scores fed to MST or transition classifier.
• Real-world relevance: used in machine translation, information extraction, conversational agents, and even code-generation where NL utterances resemble compiler input.