JB

Dependency Syntax Notes

Syntax

  • Syntax is the study of sentence structure, answering "Who does what to whom?"
  • Several theories exist and share many core ideas: Government and Binding (GB), the Minimalist Program (MP), Head-driven Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), Categorial Grammar, and Dependency Grammar.

Why Syntax Matters

  • Theoretical syntacticians focus on grammaticality, i.e. which sentences are well-formed.
  • Grammaticality also matters for NLP applications such as text generation and grammar checking.
  • Parsing provides scaffolding for semantic analysis, aiding opinion mining, information extraction, and machine translation.

Basic Principles of Syntax

  • Form vs. Function
    • Syntactic form is described with parts of speech and phrase categories (NP, VP).
    • Syntactic function describes roles in a sentence (Subject, Object, Adverbial).
  • Constituents
    • Words are organized into groupings (constituents) that function as a single unit.
    • Constituent status is established through linguistic tests of constituency (e.g. substitution, movement, coordination).
  • Phrase Structure Grammar (PSG)
    • Captures constituent status and ordering using context-free grammar.
    • Example rules: S → NP VP, NP → D N, VP → V NP
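As a minimal sketch, the three rules above can drive a CKY recognizer, which decides whether a word sequence can be derived from S. The lexicon (the, my, dog, homework, ate) is a hypothetical toy assumption added for illustration; CKY requires binary rules plus word-level lexical rules, a form this grammar already has.

```python
# Rules from the notes, written as (parent, left child, right child)
RULES = [("S", "NP", "VP"), ("NP", "D", "N"), ("VP", "V", "NP")]

# Hypothetical toy lexicon (not part of the notes): word -> possible categories
LEXICON = {"the": {"D"}, "my": {"D"}, "dog": {"N"}, "homework": {"N"}, "ate": {"V"}}

def cky_recognize(words, start="S"):
    """Return True if `start` derives the whole word sequence under RULES and LEXICON."""
    n = len(words)
    # chart[i][j] = set of categories that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(word.lower(), ()))
    for length in range(2, n + 1):              # span length
        for i in range(n - length + 1):         # span start
            j = i + length
            for k in range(i + 1, j):           # split point
                for parent, left, right in RULES:
                    if left in chart[i][k] and right in chart[k][j]:
                        chart[i][j].add(parent)
    return n > 0 and start in chart[0][n]

print(cky_recognize("The dog ate my homework".split()))   # True
print(cky_recognize("dog The ate my homework".split()))   # False
```

The chart records which categories span each substring, so a sentence is accepted exactly when S spans the whole input.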

Dependency Grammar (DG)

  • An alternative to phrase structure.
  • Syntactic functions are central.
  • Syntactic structure consists of lexical items linked by binary asymmetric dependencies.
  • Dependency-based parsing attracts increasing interest in NLP.
  • Dependency analyses are useful in relation extraction, question answering, and sentiment analysis.

Constituency vs. Relations

  • DG is based on relationships between words (dependency relations).
  • PSG is based on groupings or constituents.

Simple Relation Example

  • In the sentence "The dog ate my homework", relations include:
    • ate →subj The dog
    • ate →obj my homework
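A minimal sketch of a word-level version of this example, assuming each word has exactly one head (0 marks the root): "dog" heads "The dog" and "homework" heads "my homework", so each phrase above is recovered as the subtree under its head word. The head-array encoding and the det/root labels are illustrative assumptions; subj and obj follow the example.

```python
# Word-level analysis of "The dog ate my homework".
# Positions are 1-based; head 0 marks the root. The "det"/"root" labels are
# illustrative assumptions; "subj" and "obj" follow the example above.
WORDS = ["The", "dog", "ate", "my", "homework"]
HEADS = [2, 3, 0, 5, 3]            # HEADS[i - 1] is the head of word i
LABELS = ["det", "subj", "root", "det", "obj"]

def dependents(head):
    """Positions whose head is `head`, in sentence order."""
    return [i for i in range(1, len(WORDS) + 1) if HEADS[i - 1] == head]

def subtree(node):
    """Positions dominated by `node` (including itself), in sentence order."""
    nodes = [node]
    for d in dependents(node):
        nodes.extend(subtree(d))
    return sorted(nodes)

def relations_of(head):
    """(label, phrase) pairs for each dependent of `head`; the phrase is the dependent's subtree."""
    return [
        (LABELS[d - 1], " ".join(WORDS[i - 1] for i in subtree(d)))
        for d in dependents(head)
    ]

root = HEADS.index(0) + 1          # position of "ate"
for label, phrase in relations_of(root):
    print(f"{WORDS[root - 1]} -{label}-> {phrase}")
```

This prints "ate -subj-> The dog" and "ate -obj-> my homework", matching the relations listed above.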

Comparison

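  • Dependency structures explicitly represent head-dependent relations (directed arcs), functional categories (arc labels), and possibly parts of speech.
  • Phrase structures explicitly represent phrases (nonterminal nodes), structural categories (nonterminal labels), and possibly grammatical functions.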

Criteria for Heads and Dependents

  For a head H and a dependent D in a construction C:
  1. H determines the syntactic category of C; H can replace C.
  2. H determines the semantic category of C; D specifies H.
  3. H is obligatory; D may be optional.
  4. The form of D depends on H (agreement or government).
  5. The linear position of D is specified with reference to H.

Some Tricky Cases

  • Complex verb groups
  • Subordinate clauses
  • Coordination
  • Prepositional phrases
  • Punctuation

Dependency Graphs

  • A dependency graph is a directed graph G = (V, E) with:
    • A set V of nodes.
    • A set E of arcs (edges) from heads to dependents.
    • Labels: nodes carry word forms and arcs carry dependency types.
  • Notation: i → j ≡ (i, j) ∈ E; i →∗ j denotes the reflexive, transitive closure of →.
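One way to encode G, as a sketch: arcs are stored in a dictionary from (head, dependent) pairs to labels, so i → j is a membership test and i →∗ j is a reachability search. The class name DependencyGraph and the artificial root node 0 are assumptions made for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DependencyGraph:
    """G = (V, E) with labeled arcs. Node 0 is an artificial root (an assumption
    of this sketch); nodes 1..n are the words of the sentence."""
    words: list                                  # words[i] is the word form labelling node i
    arcs: dict = field(default_factory=dict)     # (head, dependent) -> dependency type

    def add_arc(self, i, j, label):
        self.arcs[(i, j)] = label

    def has_arc(self, i, j):
        # i -> j  iff  (i, j) is in E
        return (i, j) in self.arcs

    def reaches(self, i, j):
        # i ->* j: reflexive, transitive closure of ->, via depth-first search
        stack, seen = [i], set()
        while stack:
            k = stack.pop()
            if k == j:
                return True
            if k not in seen:
                seen.add(k)
                stack.extend(d for (h, d) in self.arcs if h == k)
        return False

g = DependencyGraph(["ROOT", "The", "dog", "ate", "my", "homework"])
for head, dep, label in [(0, 3, "root"), (3, 2, "subj"), (2, 1, "det"),
                         (3, 5, "obj"), (5, 4, "det")]:
    g.add_arc(head, dep, label)

print(g.has_arc(3, 2), g.reaches(0, 4), g.reaches(2, 5))   # True True False
```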

Formal Properties of Dependency Graphs

  • antisymmetric: if A → B, then B ↛ A
  • antireflexive: if A → B, then B ≠ A
  • antitransitive: if A → B and B → C, then A ↛ C
  • labeled: every arc → carries a dependency label r
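These properties can be checked directly on a set of arcs. In this sketch, arcs are hypothetical (head, label, dependent) triples over token positions, and the function names are illustrative.

```python
def unlabeled(arcs):
    """Drop labels: {(head, label, dependent)} -> {(head, dependent)}."""
    return {(h, d) for h, _, d in arcs}

def is_antisymmetric(arcs):
    # if A -> B, then not B -> A
    pairs = unlabeled(arcs)
    return all((b, a) not in pairs for a, b in pairs)

def is_antireflexive(arcs):
    # if A -> B, then B != A
    return all(a != b for a, b in unlabeled(arcs))

def is_antitransitive(arcs):
    # if A -> B and B -> C, then not A -> C
    pairs = unlabeled(arcs)
    return all((a, c) not in pairs
               for a, b in pairs for b2, c in pairs if b == b2)

def is_labeled(arcs):
    # every arc carries a non-empty label r
    return all(isinstance(r, str) and r != "" for _, r, _ in arcs)

# "The dog ate my homework" with positions 1..5 (labels are illustrative)
ARCS = {(3, "subj", 2), (2, "det", 1), (3, "obj", 5), (5, "det", 4)}
print(is_antisymmetric(ARCS), is_antireflexive(ARCS),
      is_antitransitive(ARCS), is_labeled(ARCS))   # True True True True
```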

Formal Conditions on Dependency Graphs

  • G is (weakly) connected: For every pair of nodes i and j, there is a path between them when arc directions are ignored.
  • G is acyclic: If i → j then not j →∗ i.
  • G obeys the single-head constraint: If i → j, then not k → j, for any k ≠ i.
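A sketch of the three conditions as checks over a node collection and a set of (head, dependent) pairs. Connectivity is tested in the usual graph-theoretic sense (a path between every pair of nodes once directions are ignored), and acyclicity uses Kahn's topological sort; both are standard choices for this illustration rather than anything prescribed in these notes.

```python
from collections import defaultdict

def is_weakly_connected(nodes, arcs):
    """Every node is reachable from every other when arc directions are ignored."""
    nodes = set(nodes)
    if not nodes:
        return True
    neighbours = defaultdict(set)
    for h, d in arcs:
        neighbours[h].add(d)
        neighbours[d].add(h)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for m in neighbours[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen == nodes

def is_acyclic(nodes, arcs):
    """No i -> j with j ->* i: Kahn's algorithm removes every node iff there is no cycle."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for h, d in arcs:
        children[h].append(d)
        indegree[d] += 1
    queue = [n for n, deg in indegree.items() if deg == 0]
    removed = 0
    while queue:
        n = queue.pop()
        removed += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return removed == len(indegree)

def is_single_headed(arcs):
    """Single-head constraint: no dependent appears with two different heads."""
    dependents = [d for _, d in arcs]
    return len(dependents) == len(set(dependents))

# "The dog ate my homework" with an artificial root 0 (an assumption of this sketch)
TREE = {(0, 3), (3, 2), (2, 1), (3, 5), (5, 4)}
print(is_weakly_connected(range(6), TREE), is_acyclic(range(6), TREE),
      is_single_headed(TREE))   # True True True
```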

Projectivity

  • A graph is projective if, whenever i → j, i →∗ k holds for every k such that i < k < j or j < k < i.
  • Non-projective structures are needed to represent long-distance dependencies and free word order.
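The projectivity condition translates almost word for word into a check, sketched here with arcs as (head, dependent) pairs over token positions and 0 as an assumed artificial root. This direct version is meant for illustration; it is far from the most efficient way to test large treebanks.

```python
def reaches(arcs, i, k):
    """i ->* k: reflexive, transitive closure of the arc relation (depth-first search)."""
    stack, seen = [i], set()
    while stack:
        n = stack.pop()
        if n == k:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(d for h, d in arcs if h == n)
    return False

def is_projective(arcs):
    """For every arc i -> j, the head i must reach every k strictly between i and j."""
    for i, j in arcs:
        lo, hi = min(i, j), max(i, j)
        if not all(reaches(arcs, i, k) for k in range(lo + 1, hi)):
            return False
    return True

# Projective: "The dog ate my homework" with root 0
TREE = {(0, 3), (3, 2), (2, 1), (3, 5), (5, 4)}
# Non-projective: arc 1 -> 3 spans position 2, which node 1 does not dominate
CROSSING = {(0, 2), (2, 1), (1, 3), (2, 4)}
print(is_projective(TREE), is_projective(CROSSING))   # True False
```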

Treebanks

  • Collections of sentences manually annotated with syntactic analysis.
  • Used to train data-driven NLP tools.
  • Examples: Penn Treebank, Prague Dependency Treebank, NEGRA/TüBa-D/Z, Penn Chinese Treebank, Norwegian Dependency Treebank, Universal Dependencies.

Norwegian Dependency Treebank (NDT)

  • Completed in 2014 by Språkbanken at the National Library of Norway.
  • Approximately 600,000 tokens of Bokmål and Nynorsk text.
  • Enables training of taggers and parsers for Norwegian.
  • Converted to Universal Dependencies.

Universal Dependencies

  • Harmonized dependency treebanks for more than 100 languages.
  • Norwegian models available in spaCy and Stanza.

CoNLL-U format

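  • A standard format for dependency treebanks, used by Universal Dependencies: one word per line with ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), blank lines between sentences, and comment lines starting with #.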
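A minimal reader for this format, as a sketch: it splits sentences on blank lines, skips # comment lines, and keeps only basic word lines, dropping multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1). It assumes every kept line has an integer HEAD. The sample sentence and its annotation are constructed for illustration, using UD relation names (nsubj, obj, nmod:poss); real projects would more likely rely on an existing reader such as the conllu package on PyPI.

```python
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]

def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts keyed by FIELDS.
    Assumes every kept word line has an integer HEAD (true of basic UD trees)."""
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():                    # blank line ends a sentence
            if tokens:
                sentences.append(tokens)
                tokens = []
            continue
        if line.startswith("#"):                # sentence-level comments, e.g. "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 tab-separated fields: {line!r}")
        if "-" in cols[0] or "." in cols[0]:    # multiword-token range or empty node
            continue
        token = dict(zip(FIELDS, cols))
        token["id"], token["head"] = int(token["id"]), int(token["head"])
        tokens.append(token)
    if tokens:                                  # input may not end with a blank line
        sentences.append(tokens)
    return sentences

# Illustrative sample, constructed for this sketch (XPOS, FEATS, DEPS, MISC left as "_")
SAMPLE = "\n".join([
    "# text = The dog ate my homework",
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tate\teat\tVERB\t_\t_\t0\troot\t_\t_",
    "4\tmy\tmy\tPRON\t_\t_\t5\tnmod:poss\t_\t_",
    "5\thomework\thomework\tNOUN\t_\t_\t3\tobj\t_\t_",
    "",
])

for tok in parse_conllu(SAMPLE)[0]:
    print(tok["id"], tok["form"], tok["head"], tok["deprel"])
```

The HEAD column (0 for the root) is exactly the head array used by dependency parsers, so a parsed sentence can be fed straight into checks such as single-headedness or projectivity.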