Dependency Parsing – Transition-Based vs. Graph-Based

Dependency Parsing Approaches

• Two mainstream algorithms convert a sentence into a dependency tree:
– Transition-based parsing (stack/LR style, greedy, efficient, only guaranteed for projective trees = trees whose arcs never cross).
– Graph-based parsing (global search for the highest-scoring tree; can handle non-projectivity but is costlier to implement/run).
• Choice affects coverage (projective vs. non-projective languages), speed, memory & code complexity.

Transition-Based Parsing (Stack, Greedy)

• Conceptually similar to LR shift–reduce parsing in compilers.
• Parser state («configuration») = ⟨Stack, Input buffer, Set of built dependencies⟩.
• Two data alphabets:
– Input buffer alphabet = surface words.
– Stack alphabet = symbolic summaries; top-of-stack symbol represents entire stack to avoid repeated popping (stack is destructive memory).
• Parsing loop (left-to-right, single pass):
1. Inspect top two stack items & next buffer word.
2. Use an oracle (rule classifier) to decide one of three transitions.
3. Execute transition, update configuration.
4. Repeat until stack = {ROOT} and buffer empty.
• Greedy aspect: the transition chosen by the oracle is applied at once and the parser never backtracks → it fails on non-projective sentences, whose correct tree would require postponing a reduction.

Transition Operations: Shift, Left-Arc, Right-Arc

• Shift (SH): move first word from buffer to top of stack; no dependency edge added.
• Left-Arc (LA):
– Precondition: stack holds at least two items (and the second item is not ROOT).
– Action: assert dependency «stack[0] ──▶ stack[1]» (the word beneath the top depends on the top; arrows point head → dependent).
– Pop the LOWER of the two (L removes the Lower).
• Right-Arc (RA):
– Precondition: stack holds at least two items.
– Action: assert dependency «stack[1] ──▶ stack[0]» (the top depends on the second-top).
– Pop the TOP item (R removes the Top).
• Oracle learns when each transition is correct.

Worked Example: "Book me the morning flight"

(Edges are written head ▶ dependent.)

| Step | Stack (top first)            | Buffer                     | Action | New Edge                |
|------|------------------------------|----------------------------|--------|-------------------------|
| 0    | ROOT                         | book me the morning flight | SH     |                         |
| 1    | book ROOT                    | me the morning flight      | SH     |                         |
| 2    | me book ROOT                 | the morning flight         | RA     | book ▶ me (OBJ)         |
| 3    | book ROOT                    | the morning flight         | SH     |                         |
| 4    | the book ROOT                | morning flight             | SH     |                         |
| 5    | morning the book ROOT        | flight                     | SH     |                         |
| 6    | flight morning the book ROOT | (empty)                    | LA     | flight ▶ morning (NMOD) |
| 7    | flight the book ROOT         | (empty)                    | LA     | flight ▶ the (DET)      |
| 8    | flight book ROOT             | (empty)                    | RA     | book ▶ flight (DOBJ)    |
| 9    | book ROOT                    | (empty)                    | RA     | ROOT ▶ book (ROOT)      |
| 10   | ROOT                         | (empty)                    | parsing complete |               |

Dependency graph: ROOT→book; book→{me, flight}; flight→{the, morning}.
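The transition sequence above can be replayed mechanically. A minimal sketch follows; the function names and the (stack, buffer, arcs) representation are illustrative choices, not taken from any particular parser implementation:

```python
def shift(stack, buf, arcs):
    # SH: move the first buffer word onto the stack; no edge added.
    return stack + [buf[0]], buf[1:], arcs

def left_arc(stack, buf, arcs, label):
    # LA: the top of the stack heads the word beneath it; pop the lower word.
    head, dep = stack[-1], stack[-2]
    return stack[:-2] + [head], buf, arcs | {(head, dep, label)}

def right_arc(stack, buf, arcs, label):
    # RA: the word beneath the top heads the top; pop the top word.
    head, dep = stack[-2], stack[-1]
    return stack[:-1], buf, arcs | {(head, dep, label)}

# The ten actions from the worked example, in order.
actions = [(shift, None), (shift, None), (right_arc, "OBJ"),
           (shift, None), (shift, None), (shift, None),
           (left_arc, "NMOD"), (left_arc, "DET"),
           (right_arc, "DOBJ"), (right_arc, "ROOT")]

stack, buf, arcs = ["ROOT"], "book me the morning flight".split(), set()
for op, label in actions:
    args = (stack, buf, arcs) if label is None else (stack, buf, arcs, label)
    stack, buf, arcs = op(*args)

print(stack, buf)    # ['ROOT'] []  -> parsing complete
print(sorted(arcs))  # the five edges of the dependency graph
```

Running it ends with the stack reduced to ROOT, an empty buffer, and exactly the five edges listed above.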

Training a Transition Parser (Oracle & Supervision)

• Treebanks give only final correct trees, not intermediate (config, transition) pairs.
• Solution: simulate the parser on each gold tree; at every configuration consult an oracle that outputs the transition(s) guaranteed to still reach the gold tree (a static oracle suffices on gold configurations; a dynamic oracle also covers off-path ones); collect (configuration, transition) pairs.
• Learn classifier (Perceptron, SVM, Neural net) that predicts transition from features (stack top/below, buffer head, POS tags, etc.).
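The simulation step can be sketched as follows, assuming arc-standard transitions and a static oracle consulted only on gold configurations; the toy feature tuple and function names are illustrative assumptions:

```python
def oracle(stack, buf, arcs, heads):
    # heads[d] = gold head index of word d; index 0 is ROOT
    if len(stack) >= 2:
        s1, s2 = stack[-1], stack[-2]          # top, second-top
        deps = lambda w: [d for d, h in heads.items() if h == w]
        done = lambda w: all((w, d) in arcs for d in deps(w))
        if s2 != 0 and heads[s2] == s1 and done(s2):
            return "LA"                        # s2 is a finished dependent of s1
        if heads[s1] == s2 and done(s1):
            return "RA"                        # s1 is a finished dependent of s2
    return "SH"

def training_pairs(words, heads):
    stack, buf, arcs, pairs = [0], list(range(1, len(words))), set(), []
    while buf or len(stack) > 1:
        act = oracle(stack, buf, arcs, heads)
        feats = (words[stack[-1]],                            # stack top
                 words[stack[-2]] if len(stack) > 1 else None,  # below top
                 words[buf[0]] if buf else None)              # buffer head
        pairs.append((feats, act))
        if act == "SH":
            stack.append(buf.pop(0))
        elif act == "LA":
            arcs.add((stack[-1], stack[-2])); del stack[-2]
        else:
            arcs.add((stack[-2], stack[-1])); stack.pop()
    return pairs

words = ["ROOT", "book", "me", "the", "morning", "flight"]
heads = {1: 0, 2: 1, 3: 5, 4: 5, 5: 1}   # gold tree from the worked example
seq = [act for _, act in training_pairs(words, heads)]
print(seq)  # ['SH', 'SH', 'RA', 'SH', 'SH', 'SH', 'LA', 'LA', 'RA', 'RA']
```

On the example sentence the oracle reproduces exactly the ten transitions from the table, and each (feature, transition) pair becomes one training instance for the classifier.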

Graph-Based Parsing (MST Search)

• Parser treats every possible ordered pair (h, d) as a directed edge candidate with score s(h, d).
• The best tree T* maximises the total edge score:
  T* = arg max_{T ∈ 𝒯(S)} Σ_{(h,d) ∈ T} s(h, d)
• Finding T* is equivalent to a Maximum Spanning Tree (MST) problem on a directed graph whose nodes are the words + ROOT.
• Non-projectivity naturally allowed because MST need not obey planar ordering.
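For toy sentence lengths the arg-max can be checked by brute force: enumerate every head assignment, keep only valid trees, and take the highest total score. A sketch, with a made-up score table for the demo:

```python
import itertools

def reaches_root(heads):
    # Valid tree: following head pointers from every word reaches ROOT (0)
    # without revisiting a node (i.e. no cycles).
    for d in heads:
        seen, cur = set(), d
        while cur != 0:
            if cur in seen:
                return False
            seen.add(cur)
            cur = heads[cur]
    return True

def best_tree(score, n):
    # score: dict (head, dependent) -> weight; words are 1..n, ROOT is 0.
    best, best_heads = float("-inf"), None
    for choice in itertools.product(range(n + 1), repeat=n):
        heads = {d: choice[d - 1] for d in range(1, n + 1)}
        if any(h == d for d, h in heads.items()) or not reaches_root(heads):
            continue
        total = sum(score.get((h, d), float("-inf")) for d, h in heads.items())
        if total > best:
            best, best_heads = total, heads
    return best_heads, best

score = {(0, 1): 5, (0, 2): 1, (1, 2): 10, (2, 1): 9}
print(best_tree(score, 2))  # ({1: 0, 2: 1}, 15)
```

This is exponential in sentence length, which is exactly why the MST reduction below matters: it finds the same maximiser in polynomial time.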

Chu-Liu / Edmonds Algorithm (Directed MST)

  1. For each vertex v ≠ ROOT, choose the highest-scoring incoming edge; put all such edges in set F.

  2. If F is acyclic ⇒ it is the MST; return.

  3. Else contract each cycle into a super-node and adjust the weights of edges entering the cycle: w′(u,v) = w(u,v) − w(bestIn(v)).

  4. Recurse on the contracted graph to obtain the MST of super-nodes.

  5. Expand cycles, keeping each cycle's best-in edges except the one displaced by the chosen entering edge.

• Runs in O(EV) time in this naive form; practical for sentence lengths (< 100 words).
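The five steps can be sketched as a compact recursive implementation, written for clarity rather than speed; the graph encoding (a dict of (head, dependent) → score) is an assumption for the demo:

```python
def find_cycle(best_in, root):
    # Follow best-in pointers from each node; a revisited node lies on a cycle.
    for start in best_in:
        seen, cur = set(), start
        while cur != root and cur not in seen:
            seen.add(cur)
            cur = best_in[cur]
        if cur != root:
            cycle, v = {cur}, best_in[cur]
            while v != cur:
                cycle.add(v)
                v = best_in[v]
            return cycle
    return None

def cle(nodes, score, root=0):
    # Step 1: greedily pick the best incoming edge for every non-root node.
    best_in = {d: max((h for h in nodes if h != d and (h, d) in score),
                      key=lambda h: score[(h, d)])
               for d in nodes if d != root}
    cycle = find_cycle(best_in, root)
    if cycle is None:                         # Step 2: acyclic -> done
        return best_in
    # Step 3: contract the cycle into super-node c, reweighting entering edges.
    c = max(nodes) + 1
    new_nodes = [n for n in nodes if n not in cycle] + [c]
    new_score, entry, leave = {}, {}, {}
    for (h, d), w in score.items():
        if h in cycle and d in cycle:
            continue
        if d in cycle:                        # edge entering the cycle
            adj = w - score[(best_in[d], d)]
            if new_score.get((h, c), float("-inf")) < adj:
                new_score[(h, c)], entry[h] = adj, d
        elif h in cycle:                      # edge leaving the cycle
            if new_score.get((c, d), float("-inf")) < w:
                new_score[(c, d)], leave[d] = w, h
        else:
            new_score[(h, d)] = w
    # Step 4: recurse on the contracted graph.
    sub = cle(new_nodes, new_score, root)
    # Step 5: expand; keep cycle edges except the one the entry edge displaces.
    heads = {d: (leave[d] if h == c else h) for d, h in sub.items() if d != c}
    u = sub[c]                                # external head chosen for c
    for v in cycle:
        heads[v] = best_in[v] if v != entry[u] else u
    return heads

# Greedy best-in picks the 1<->2 cycle here; contraction resolves it.
score = {(0, 1): 5, (0, 2): 1, (1, 2): 10, (2, 1): 9}
print(cle([0, 1, 2], score))  # {1: 0, 2: 1}
```

In the demo, greedy selection picks edges (2,1) and (1,2), forming a cycle; after contraction the recursion chooses to enter the cycle at word 1, yielding the tree ROOT → 1 → 2 with total score 15.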

Edge Scoring & Feature Design

• Linear model: s(h, d) = wᵀ φ(h, d).
• Typical feature templates:
– Word forms, lemmas, POS of head & dependent.
– Direction & distance (tokens) between head/dependent.
– Surrounding context words/POS tags (±1, ±2).
– Dependency label under consideration.
– Pre-trained word embeddings (concatenate or average) ⇒ captures semantics.
• Weights w trained with structured perceptron, MIRA, or CRF-like objectives using gold trees.
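A sparse-feature version of the templates above might look like this; the template names and distance bucketing are illustrative assumptions, not a standard feature set:

```python
def phi(words, tags, h, d):
    # words/tags are indexed so that position 0 is ROOT.
    return {
        f"hw={words[h]}": 1.0, f"dw={words[d]}": 1.0,  # word forms
        f"hp={tags[h]}": 1.0, f"dp={tags[d]}": 1.0,    # POS tags
        f"hp|dp={tags[h]}|{tags[d]}": 1.0,             # head/dependent POS pair
        f"dir={'R' if h < d else 'L'}": 1.0,           # attachment direction
        f"dist={min(abs(h - d), 5)}": 1.0,             # bucketed token distance
    }

def s(w, feats):
    # Linear model: s(h, d) = w . phi(h, d), as a sparse dot product.
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

words = ["<ROOT>", "book", "flight"]
tags = ["<ROOT>", "VB", "NN"]
w = {"hp=VB": 1.0, "dir=R": 0.5}   # toy weights
print(s(w, phi(words, tags, 1, 2)))  # 1.5
```

Dense embedding features would replace the indicator dict with real-valued vectors, but the scoring interface stays the same.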

Evaluation Metrics (UAS & LAS)

• Unlabeled Attachment Score (UAS): % of words whose predicted head = gold head (label ignored).
• Labeled Attachment Score (LAS): % of words whose predicted head AND dependency label both correct.
• Example «Book the flight through Houston»
– Gold: book(ROOT)▶flight(DOBJ); flight▶Houston(NMOD).
– System: book▶flight(xcomp); flight▶Houston(NMOD).
– UAS = 3/3 = 1.0 (all heads right); LAS = 2/3 ≈ 0.67 (label xcomp wrong).
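The two metrics reduce to a few lines of counting; the dict layout below is an assumption for the demo:

```python
def attachment_scores(gold, pred):
    # gold/pred map each word to its (head, label) pair.
    n = len(gold)
    uas = sum(pred[w][0] == hl[0] for w, hl in gold.items()) / n  # head only
    las = sum(pred[w] == hl for w, hl in gold.items()) / n        # head + label
    return uas, las

gold = {"book": ("ROOT", "root"), "flight": ("book", "dobj"),
        "Houston": ("flight", "nmod")}
pred = {"book": ("ROOT", "root"), "flight": ("book", "xcomp"),
        "Houston": ("flight", "nmod")}
print(attachment_scores(gold, pred))  # UAS = 1.0, LAS = 2/3
```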

Building Dependency Trees in Practice

• Three practical routes:
1. Transition parser with greedy or beam search; fastest (≈ linear time) but limited to projective trees.
2. Graph-based/MST parser; O(n²) or O(n³) depending on features; handles non-projective trees.
3. Convert a phrase-structure (PCFG) parse to a dependency tree via head-percolation rules.
• Hybrid systems combine neural encoders (Bi-LSTM, Transformer) to produce edge scores fed to MST or transition classifier.
• Real-world relevance: used in machine translation, information extraction, conversational agents, and even code-generation where NL utterances resemble compiler input.