2024-09-23 lecture 12

CS 5134/6034 Natural Language Processing, Fall 2024
Lecture 12: Sequence Labeling

Shallow Parsing

Definition

Shallow parsing (also known as partial parsing or syntactic chunking) generates fragments representing local syntactic constituents rather than complete parse trees.

Key Constituents Identified

Traditional shallow parsers focus on:
- Noun Phrases (NP)
- Verb Phrases (VP)
- Prepositional Phrases (PP)

Mechanisms

Finite state machines are commonly used for recognizing these constituents by applying simple grammar rules and heuristics.
Shallow parsers can be rule-based, or trained as sequence labelers on data annotated with BIO (Begin/Inside/Outside) tags.
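As a rough illustration of the finite-state idea, the sketch below recognizes noun phrases by running a regular expression over a string of part-of-speech tags. The pattern (optional determiner, any adjectives, one or more nouns), the Penn-Treebank-style tag names, and the function name np_chunks are all assumptions made for this example, not rules given in the lecture.

```python
import re

def np_chunks(tagged):
    """Return (start, end) token spans matching the pattern DT? JJ* NN+."""
    # Encode each POS tag as one letter so a regex can act as the finite-state machine.
    code = {"DT": "D", "JJ": "J", "NN": "N", "NNS": "N", "NNP": "N"}
    tag_string = "".join(code.get(tag, "x") for _, tag in tagged)
    return [(m.start(), m.end()) for m in re.finditer(r"D?J*N+", tag_string)]

# Hand-tagged input (the tags are assumed, not produced by a tagger).
sentence = [("The", "DT"), ("election", "NN"), ("in", "IN"), ("the", "DT"),
            ("U.S.", "NNP"), ("will", "MD"), ("occur", "VB"), ("in", "IN"),
            ("November", "NNP")]

spans = np_chunks(sentence)
phrases = [" ".join(word for word, _ in sentence[a:b]) for a, b in spans]
print(phrases)
```

Because the pattern is non-recursive, it can never nest one NP inside another, which is exactly what makes the output a flat sequence of chunks.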

Shallow Parsing Example

Flat syntactic representation: Shallow parsers produce fragments of non-recursive constituents called "chunks."
Example sentences demonstrate various chunk representations:
- 1. NP: The election in NP: the U.S. VP: will occur PP: in November
- 2. NP: The election PP: in the U.S. will occur PP: in November
- 3. NP: The election PP: in the U.S. VP: will occur PP: in November
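A flat chunk analysis maps directly onto per-token labels. The sketch below converts the last analysis above into BIO labels; the label format (B-NP, I-NP, and so on) and the function name chunks_to_bio are assumptions based on common convention. Tokens outside any chunk would receive an O label, which this particular analysis does not need.

```python
def chunks_to_bio(chunks):
    """Turn a list of (label, text) chunks into parallel token and BIO-label lists."""
    tokens, labels = [], []
    for label, text in chunks:
        for i, word in enumerate(text.split()):
            tokens.append(word)
            # First token of a chunk gets B-, the rest get I-.
            labels.append(("B-" if i == 0 else "I-") + label)
    return tokens, labels

chunks = [("NP", "The election"), ("PP", "in the U.S."),
          ("VP", "will occur"), ("PP", "in November")]
tokens, labels = chunks_to_bio(chunks)
for token, label in zip(tokens, labels):
    print(token, label)
```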

Benefits of Shallow Parsing

Speed: Generally faster than full parsing.
Robustness: More robust when processing ungrammatical input (e.g., from Twitter, speech).
Less Complexity: Certain ambiguity issues can be ignored; deep syntactic structures are not always necessary for specific NLP applications.

Weaknesses of Shallow Parsing

Attachment: Cannot resolve attachment ambiguities (e.g., which phrase a PP modifies).
Embedding: Cannot represent embedded or reduced relative clauses, because chunks are non-recursive.
Syntactic Roles: Typically does not assign syntactic roles such as subject or object.

Named Entity Recognition (NER)

Definition

NER systems identify specific types of entities such as:
- Proper Names: individuals, organizations, locations (Elvis Presley, IBM, Washington)
- Dates & Times: variations including November 9, 1997, 10:29 pm, etc.
- Measures: percentages, monetary values, various measurements (45%, $1.4 billion)
- Other Types: URLs, email addresses, etc.

Challenges

Dynamic Naming: New proper names are frequently created.
Capitalization Issues: Capitalization is an unreliable cue, since not every word in a proper name is capitalized (e.g., "of" in "University of Cincinnati").
Case in Spoken Language: Spoken language lacks case distinction.
Abbreviations and Acronyms: Not all acronyms are proper names (e.g., NLP, AI).

Ambiguity in Proper Names

The same name can denote entities of different types (e.g., Ford may be a person or a company; Washington may be a person or a location), which leads to classification errors.
Acronyms can represent multiple meanings (e.g., ACL, UC).

Rule-Based vs. Machine Learning Approaches

Rule-Based Systems

Advantages: High performance in specialized domains.
Disadvantages: Costly and limited to domain-specific applications.

Machine Learning Systems

Advantages: Adaptability for new domains.
Disadvantages: Require annotated training corpus.

Common Types of Rules in NER

List Matching: Common person names, organization names, and location gazetteers.
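Leading/Trailing Terms: Words that precede or follow a name and signal its type, such as titles before person names (e.g., Prof.) and corporate suffixes after organization names (e.g., Inc.).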
Surface Structure Patterns: Specific formats for dates, phone numbers, etc.
Contextual Patterns: Surrounding phrases that signal an entity type (e.g., "headquarters in X" suggests that X is a location).
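The four rule types above can be sketched in a few lines. Everything in this example is a toy assumption: the word lists, the date pattern (which covers only the "Month D, YYYY" format), the context pattern, and the function name apply_rules. A real rule-based system would need far larger lists and many more patterns.

```python
import re

PERSON_TITLES = {"Prof.", "Dr.", "Mr.", "Ms."}   # leading terms (toy list)
ORG_SUFFIXES = {"Inc.", "Corp.", "Ltd."}         # trailing terms (toy list)
LOCATIONS = {"Boston", "Alaska", "Washington"}   # gazetteer (toy list)
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b")
CONTEXT_LOC_RE = re.compile(r"headquarters in ([A-Z][a-z]+)")

def apply_rules(text):
    """Return a set of (string, type) pairs found by the toy rules."""
    found = set()
    for m in DATE_RE.finditer(text):              # surface structure pattern
        found.add((m.group(), "DATE"))
    for m in CONTEXT_LOC_RE.finditer(text):       # contextual pattern
        found.add((m.group(1), "LOCATION"))
    tokens = text.split()
    for i, tok in enumerate(tokens):
        if tok in PERSON_TITLES and i + 1 < len(tokens):   # leading term
            found.add((tokens[i + 1], "PERSON"))
        if tok in ORG_SUFFIXES and i > 0:                  # trailing term
            found.add((tokens[i - 1] + " " + tok, "ORGANIZATION"))
        if tok.strip(".,") in LOCATIONS:                   # list matching
            found.add((tok.strip(".,"), "LOCATION"))
    return found

text = ("Prof. Smith joined Acme Inc. on November 9, 1997, "
        "near its headquarters in Boston.")
print(sorted(apply_rules(text)))
```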

Machine Learning Models for NER

NER models can be constructed using:
- Hidden Markov Models (HMM)
- Maximum-entropy Markov Models (MEMM)
- Conditional Random Fields (CRF)

Sequence Tagging Methodology

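NER can be cast as a token-level classification (tagging) problem using BIO notation: B marks the first token of an entity, I marks a token inside an entity, and O marks a token outside any entity.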
- Example: "John/B Smith/I gave/O Mary/B a/O book/O about/O Alaska/B"
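Once each token carries a B, I, or O tag, the entity strings can be read off with a single pass. The sketch below decodes the example above; the function name bio_to_spans is an assumption, and an I tag with no preceding B is leniently treated as the start of a new entity.

```python
def bio_to_spans(tagged):
    """Collect entity strings from a list of (word, tag) pairs with tags B, I, O."""
    spans, current = [], []
    for word, tag in tagged:
        if tag == "I" and current:
            current.append(word)          # continue the open entity
            continue
        if current:
            spans.append(" ".join(current))   # close the open entity
        current = [word] if tag != "O" else []
    if current:
        spans.append(" ".join(current))
    return spans

example = "John/B Smith/I gave/O Mary/B a/O book/O about/O Alaska/B"
tagged = [tuple(item.rsplit("/", 1)) for item in example.split()]
entities = bio_to_spans(tagged)
print(entities)
```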

HMM for NER

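HMMs use tag-to-tag transition probabilities and tag-to-word emission probabilities, both estimated from a labeled NE corpus.
Limitations: Because each word is generated from its tag alone, HMMs cannot easily incorporate arbitrary, overlapping features of the input (e.g., capitalization or suffixes).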
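A minimal sketch of how the two probability tables might be estimated by relative-frequency counting follows. The one-sentence corpus, the start symbol "<s>", and the function name train_hmm are assumptions for illustration; a practical model would also need smoothing for unseen words and transitions.

```python
from collections import Counter, defaultdict

def train_hmm(corpus):
    """Estimate transition and emission probabilities by relative frequency."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sentence in corpus:
        prev = "<s>"                      # assumed start-of-sentence symbol
        for word, tag in sentence:
            trans[prev][tag] += 1         # count tag-to-tag transitions
            emit[tag][word] += 1          # count tag-to-word emissions
            prev = tag

    def normalize(table):
        return {key: {x: c / sum(counts.values()) for x, c in counts.items()}
                for key, counts in table.items()}

    return normalize(trans), normalize(emit)

corpus = [[("John", "B"), ("Smith", "I"), ("gave", "O"), ("Mary", "B"),
           ("a", "O"), ("book", "O"), ("about", "O"), ("Alaska", "B")]]
transitions, emissions = train_hmm(corpus)
print(transitions["B"])    # distribution over tags that follow B
print(emissions["B"])      # distribution over words emitted by B
```

Note that the emission table is indexed only by the tag and the word itself, which is why properties such as capitalization cannot be added without changing the model.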

Maximum Entropy Markov Models (MEMM)

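MEMMs use a logistic regression (maximum entropy) classifier to estimate the probability of each tag given the previous tag and features of the observed input.
The classifier is built from feature functions over the input and the previous tag, so performance depends on careful feature selection.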
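To make the idea of feature functions concrete, the sketch below builds a feature dictionary for one tagging decision. The specific features and the function name memm_features are illustrative assumptions, not the feature set of any particular system; the resulting dictionary would be fed to a logistic regression classifier that outputs a distribution over tags.

```python
def memm_features(words, i, prev_tag):
    """Features for predicting the tag of words[i], given the previous tag."""
    word = words[i]
    prev_word = words[i - 1].lower() if i > 0 else "<s>"
    return {
        "word=" + word.lower(): 1,
        "is_capitalized": int(word[0].isupper()),
        "suffix2=" + word[-2:].lower(): 1,
        "prev_word=" + prev_word: 1,
        "prev_tag=" + prev_tag: 1,        # the Markov dependency on the last tag
    }

features = memm_features(["John", "Smith", "gave"], 1, "B")
print(features)
```

Unlike HMM emissions, these features may overlap freely and may look at any part of the input sentence.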

Conditional Random Fields (CRF)

CRFs normalize globally over the entire tag sequence rather than locally at each position, which addresses the label bias problem associated with MEMMs.
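The difference in normalization can be written out for the standard linear-chain case. The notation here is an assumption, not taken from the lecture: x is the input sentence of length n, y the tag sequence, f_k the feature functions, and \lambda_k their weights.

```latex
% MEMM: each step is normalized separately, conditioned on the previous tag.
P_{\mathrm{MEMM}}(y \mid x) = \prod_{i=1}^{n}
  \frac{\exp\big(\sum_k \lambda_k f_k(y_{i-1}, y_i, x, i)\big)}
       {\sum_{y'} \exp\big(\sum_k \lambda_k f_k(y_{i-1}, y', x, i)\big)}

% Linear-chain CRF: a single normalizer Z(x) for the whole tag sequence.
P_{\mathrm{CRF}}(y \mid x) = \frac{1}{Z(x)}
  \exp\Big(\sum_{i=1}^{n} \sum_k \lambda_k f_k(y_{i-1}, y_i, x, i)\Big),
\qquad
Z(x) = \sum_{y'} \exp\Big(\sum_{i=1}^{n} \sum_k \lambda_k f_k(y'_{i-1}, y'_i, x, i)\Big)
```

In the MEMM, every previous tag must distribute a total probability of 1 over its successors regardless of the observation, which is the source of label bias. The CRF's single normalizer lets scores at different positions trade off against each other.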

NER Implementation: MENERGI System

MENERGI is an example of a maximum entropy approach to NER that uses both local and global features.
It recognizes seven entity types and subdivides each type into position-based classes:
- Types: Person, Organization, Location, Date, Time, Money, Percent
- Subdivisions: Each type is split into Begin, Continue, End, and Unique classes, where Unique marks a single-token entity.
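The Begin/Continue/End/Unique subdivision can be sketched as a conversion from entity spans to per-token classes. The class-name format (e.g., person_begin), the label "other" for tokens outside any entity, and the function name bceu_tags are assumptions made for this example.

```python
def bceu_tags(tokens, entities):
    """entities: list of (start, end, type) spans, with end exclusive."""
    tags = ["other"] * len(tokens)        # assumed label for non-entity tokens
    for start, end, etype in entities:
        if end - start == 1:
            tags[start] = etype + "_unique"       # single-token entity
        else:
            tags[start] = etype + "_begin"
            for i in range(start + 1, end - 1):
                tags[i] = etype + "_continue"
            tags[end - 1] = etype + "_end"
    return tags

tokens = ["John", "Smith", "visited", "the", "University", "of",
          "Cincinnati", "in", "November"]
entities = [(0, 2, "person"), (4, 7, "organization"), (8, 9, "date")]
tags = bceu_tags(tokens, entities)
for token, tag in zip(tokens, tags):
    print(token, tag)
```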

Addressing Illegal Tag Sequences

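Classifying each token independently can yield inadmissible tag sequences (e.g., a person-begin class followed directly by a location-end class). One strategy is to define transition probabilities between classes so that inadmissible transitions receive zero probability, and then select the most probable admissible sequence.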
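One way to realize this is a Viterbi-style search that only extends a path when the transition is admissible. The sketch below assumes a 0/1 transition model, class names of the form type_begin, type_continue, type_end, type_unique plus "other", and made-up classifier probabilities; the function names are hypothetical as well.

```python
def admissible(prev, cur):
    """True if class `cur` may directly follow class `prev`."""
    prev_type, _, prev_part = prev.rpartition("_")
    cur_type, _, cur_part = cur.rpartition("_")
    if prev_part in ("begin", "continue"):
        # An open entity must be continued or ended with the same type.
        return cur_part in ("continue", "end") and cur_type == prev_type
    # Otherwise no entity is open, so continue/end classes are not allowed.
    return cur_part not in ("continue", "end")

def best_admissible(probs):
    """probs: one dict per token mapping class -> classifier probability."""
    best = {"<s>": (1.0, [])}             # class -> (score, best path ending there)
    for dist in probs:
        nxt = {}
        for tag, p in dist.items():
            candidates = [(score * p, path + [tag])
                          for prev, (score, path) in best.items()
                          if admissible(prev, tag)]
            if candidates:
                nxt[tag] = max(candidates, key=lambda c: c[0])
        best = nxt
    # A sequence may not stop in the middle of an entity.
    finals = [v for tag, v in best.items()
              if not tag.endswith(("_begin", "_continue"))]
    return max(finals, key=lambda c: c[0])[1] if finals else []

# Made-up per-token distributions for "John Smith spoke".
probs = [
    {"person_begin": 0.6, "other": 0.4},
    {"location_end": 0.5, "person_end": 0.4, "other": 0.1},
    {"other": 0.9, "person_unique": 0.1},
]
sequence = best_admissible(probs)
print(sequence)
```

Picking the top class at each position independently would choose location_end for the second token, which cannot follow person_begin; the constrained search avoids that sequence.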

Feature Sets and Dictionaries in NER

Local features are based on properties of the target word and its immediate context; global features draw on other occurrences of the same word elsewhere in the document.
External dictionaries (e.g., name lists) supply additional features, so their coverage and accuracy directly affect recognition quality.
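The distinction between feature sources can be sketched as follows, assuming that global features are computed from other occurrences of the same token in the document. The feature names, the toy name list, and the function name token_features are hypothetical and do not reproduce the actual MENERGI feature set.

```python
def token_features(doc, s, i, name_list):
    """doc: list of sentences (lists of tokens); (s, i) locates the target token."""
    word = doc[s][i]
    # Local features: properties of the target token itself.
    local = {
        "init_caps": word[0].isupper(),
        "first_word": i == 0,             # capitalization is uninformative here
        "in_name_list": word in name_list,    # external dictionary lookup
    }
    # Global feature: every other occurrence of the same token in the document.
    others = [(t, j) for t, sentence in enumerate(doc)
              for j, w in enumerate(sentence)
              if w == word and (t, j) != (s, i)]
    global_features = {
        "other_mid_sentence_caps": any(j > 0 and doc[t][j][0].isupper()
                                       for t, j in others),
    }
    return {**local, **global_features}

doc = [["Ford", "announced", "new", "trucks"],
       ["Analysts", "praised", "Ford", "today"]]
features = token_features(doc, 0, 0, name_list={"Ford", "IBM"})
print(features)
```

Here the sentence-initial "Ford" is capitalized for an uninformative reason, but the second occurrence is capitalized mid-sentence, which is evidence that the word is a proper name.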