2024-09-23 lecture 12
CS 5134/6034 Natural Language Processing Fall 2024 Lecture 12: Sequence Labeling
Shallow Parsing
Definition
Shallow parsing (also known as partial parsing or syntactic chunking) generates fragments representing local syntactic constituents rather than complete parse trees.
Key Constituents Identified
Traditional shallow parsers focus on:
Noun Phrases (NP)
Verb Phrases (VP)
Prepositional Phrases (PP)
Mechanisms
Finite state machines are commonly used for recognizing these constituents by applying simple grammar rules and heuristics.
Shallow parsers can be rule-based, or they can be trained as sequence taggers on data annotated with BIO labels (Begin, Inside, Outside).
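A rule-based chunker of this kind can be sketched as a small finite-state scan over part-of-speech tags. The rule below (optional determiner, any adjectives, one or more nouns) and the Penn Treebank style tags are assumptions chosen for illustration, not rules given in the lecture.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def chunk_noun_phrases(tagged):
    """Find NP chunks in a list of (word, pos_tag) pairs.

    Hypothetical rule: NP -> (DT)? (JJ)* (noun)+
    """
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                    # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":       # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] in NOUN_TAGS:  # one or more nouns
            k += 1
        if k > j:                                   # rule matched: emit chunk
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:                                       # no match: advance one token
            i += 1
    return chunks

sentence = [("The", "DT"), ("election", "NN"), ("in", "IN"), ("the", "DT"),
            ("U.S.", "NNP"), ("will", "MD"), ("occur", "VB"),
            ("in", "IN"), ("November", "NNP")]
print(chunk_noun_phrases(sentence))
```

The scan never backtracks and never builds nested structure, which is why this style of chunking is fast but limited to flat constituents.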
Shallow Parsing Example
Flat syntactic representation: Shallow parsers produce fragments of non-recursive constituents called "chunks."
The same sentence can be chunked in different ways, depending on which constituents the chunker identifies:
NP: The election in NP: the U.S. VP: will occur PP: in November
NP: The election PP: in the U.S. will occur PP: in November
NP: The election PP: in the U.S. VP: will occur PP: in November
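Because chunks are flat and non-overlapping, a chunking can be rewritten as one tag per token. The sketch below encodes the last chunking above in BIO form; the `B-NP` / `I-NP` tag spelling is a common convention assumed here rather than one stated in the lecture.

```python
def chunks_to_bio(chunks):
    """Convert [(label, [words...]), ...] into [(word, bio_tag), ...].

    A label of None marks words that belong to no chunk.
    """
    tagged = []
    for label, words in chunks:
        for position, word in enumerate(words):
            if label is None:
                tag = "O"
            else:
                prefix = "B" if position == 0 else "I"
                tag = f"{prefix}-{label}"
            tagged.append((word, tag))
    return tagged

chunking = [("NP", ["The", "election"]),
            ("PP", ["in", "the", "U.S."]),
            ("VP", ["will", "occur"]),
            ("PP", ["in", "November"])]
for word, tag in chunks_to_bio(chunking):
    print(f"{word}/{tag}", end=" ")
```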
Benefits of Shallow Parsing
Speed: Generally faster than full parsing.
Robustness: More robust when processing ungrammatical input (e.g., from Twitter, speech).
Less Complexity: Certain ambiguity issues can be ignored; deep syntactic structures are not always necessary for specific NLP applications.
Weaknesses of Shallow Parsing
Attachment Ambiguity: Does not resolve attachment ambiguities (e.g., which phrase a PP modifies).
Limited Representation: Cannot represent nested structures such as embedded relative clauses or reduced relative clauses.
Syntactic Roles: Typically does not assign syntactic roles such as subject or object.
Named Entity Recognition (NER)
Definition
NER systems identify specific types of entities such as:
Proper Names: individuals, organizations, locations (Elvis Presley, IBM, Washington)
Dates & Times: variations including November 9, 1997, 10:29 pm, etc.
Measures: percentages, monetary values, various measurements (45%, $1.4 billion)
Other Types: URLs, email addresses, etc.
Challenges
Dynamic Naming: New proper names are frequently created.
Capitalization Issues: Capitalization is an unreliable cue; not every word in a proper name is capitalized (e.g., "of" in "University of Cincinnati").
Case in Spoken Language: Transcribed speech has no case distinction, so capitalization cues are unavailable.
Abbreviations and Acronyms: Not all acronyms are proper names (e.g., NLP, AI).
Ambiguity in Proper Names
The same name can refer to different entity types, such as a person, a company, or a location (e.g., Ford, Washington), which makes classification harder.
Acronyms can represent multiple meanings (e.g., ACL, UC).
Rule-Based vs. Machine Learning Approaches
Rule-Based Systems
Advantages: High performance in specialized domains.
Disadvantages: Costly to build by hand and hard to transfer beyond the domain they were written for.
Machine Learning Systems
Advantages: Adaptability for new domains.
Disadvantages: Require an annotated training corpus.
Common Types of Rules in NER
List Matching: Common person names, organization names, and location gazetteers.
Leading/Trailing Terms: Titles and designators that precede or follow a name (e.g., Prof., Inc.).
Surface Structure Patterns: Specific formats for dates, phone numbers, etc.
Contextual Patterns: Surrounding words that signal an entity type (e.g., "headquarters in X" suggests that X is a location).
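The four rule types can be illustrated with a few regular expressions and a lookup set. Everything in this sketch (the gazetteer entries, the title list, the patterns, and the `find_entities` helper) is a hypothetical toy, far smaller than what a real rule-based system would need.

```python
import re

# Toy resources, assumed for illustration only.
LOCATION_GAZETTEER = {"Washington", "Alaska", "Cincinnati"}
TITLE_PATTERN = re.compile(
    r"\b(?:Prof|Dr|Mr|Ms)\.\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)")
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s\d{1,2},\s\d{4}\b")
CONTEXT_PATTERN = re.compile(r"\bheadquarters in ([A-Z][a-z]+)")

def find_entities(text):
    """Return (entity_text, entity_type, rule_type) triples."""
    entities = []
    for word in re.findall(r"[A-Za-z]+", text):      # list matching
        if word in LOCATION_GAZETTEER:
            entities.append((word, "LOCATION", "list"))
    for match in TITLE_PATTERN.finditer(text):       # leading term
        entities.append((match.group(1), "PERSON", "leading-term"))
    for match in DATE_PATTERN.finditer(text):        # surface structure
        entities.append((match.group(0), "DATE", "surface"))
    for match in CONTEXT_PATTERN.finditer(text):     # contextual pattern
        entities.append((match.group(1), "LOCATION", "context"))
    return entities

text = ("Prof. Smith left Washington for the company "
        "headquarters in Seattle on November 9, 1997.")
for entity in find_entities(text):
    print(entity)
```

Note that "Seattle" is found only through context, since it is missing from the toy gazetteer; this is the kind of coverage gap that makes hand-built rule sets costly to maintain.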
Machine Learning Models for NER
NER models can be constructed using:
Hidden Markov Models (HMM)
Maximum-entropy Markov Models (MEMM)
Conditional Random Fields (CRF)
Sequence Tagging Methodology
NER can be framed as a token-level tagging problem using BIO notation: B marks the first token of an entity, I a token inside an entity, and O a token outside any entity.
Example: "John/B Smith/I gave/O Mary/B a/O book/O about/O Alaska/B"
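Reading entities back out of a BIO sequence is a single left-to-right pass. The sketch below parses the slash-tagged example above and groups tokens into spans; the helper names are hypothetical, and a stray I tag with no preceding B is treated as outside, which is one possible policy among several.

```python
def parse_slash_tags(text):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    return [tuple(item.rsplit("/", 1)) for item in text.split()]

def bio_to_spans(tagged_tokens):
    """Group (word, tag) pairs with tags B/I/O into entity strings."""
    spans, current = [], []
    for word, tag in tagged_tokens:
        if tag == "B":                  # a new entity starts here
            if current:
                spans.append(" ".join(current))
            current = [word]
        elif tag == "I" and current:    # continue the open entity
            current.append(word)
        else:                           # O, or an I with nothing to continue
            if current:
                spans.append(" ".join(current))
            current = []
    if current:                         # flush an entity ending the sentence
        spans.append(" ".join(current))
    return spans

example = "John/B Smith/I gave/O Mary/B a/O book/O about/O Alaska/B"
print(bio_to_spans(parse_slash_tags(example)))
```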
HMM for NER
HMMs use transition and emission probabilities estimated from a labeled NE corpus.
Limitations: Cannot leverage arbitrary features.
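As a rough illustration of where those probabilities come from, the sketch below computes relative-frequency estimates from a one-sentence "corpus" in B/I/O notation. The `estimate_hmm` helper and the `<s>` start symbol are assumptions of this example, and no smoothing is applied, so unseen words and transitions get no probability at all.

```python
from collections import Counter, defaultdict

def estimate_hmm(tagged_sentences):
    """Relative-frequency estimates of P(tag | previous tag) and P(word | tag)."""
    transition_counts = defaultdict(Counter)
    emission_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        previous = "<s>"                      # assumed start-of-sentence symbol
        for word, tag in sentence:
            transition_counts[previous][tag] += 1
            emission_counts[tag][word] += 1
            previous = tag

    def normalize(counts):
        return {context: {key: value / sum(counter.values())
                          for key, value in counter.items()}
                for context, counter in counts.items()}

    return normalize(transition_counts), normalize(emission_counts)

corpus = [[("John", "B"), ("Smith", "I"), ("gave", "O"), ("Mary", "B"),
           ("a", "O"), ("book", "O"), ("about", "O"), ("Alaska", "B")]]
transitions, emissions = estimate_hmm(corpus)
print(transitions["B"])   # what follows a B tag in this tiny corpus
print(emissions["O"])     # which words were emitted by the O tag
```

The emission table conditions only on the word's identity, which is the limitation noted above: cues such as capitalization or suffixes cannot be added without changing the model.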
Maximum Entropy Markov Models (MEMM)
MEMMs use a logistic regression (maximum entropy) classifier to predict each tag from the observed input and the previous tag.
MEMMs define feature functions for predicting labels and require careful feature selection.
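A feature function for such a classifier might look like the sketch below. The specific features and their names are illustrative assumptions, not the feature set of any particular system; the point is that, unlike HMM emissions, the classifier can condition on arbitrary properties of the input together with the previous tag.

```python
def extract_features(words, index, previous_tag):
    """Features for predicting the tag of words[index] in an MEMM-style tagger."""
    word = words[index]
    return {
        "word.lower": word.lower(),
        "word.is_capitalized": word[:1].isupper(),
        "word.has_digit": any(ch.isdigit() for ch in word),
        "suffix3": word[-3:].lower(),
        "previous_word": words[index - 1].lower() if index > 0 else "<s>",
        "next_word": (words[index + 1].lower()
                      if index + 1 < len(words) else "</s>"),
        "previous_tag": previous_tag,   # what makes the model Markov
    }

words = "John Smith gave Mary a book".split()
print(extract_features(words, 1, "B"))
```

At decoding time the previous tag is itself a prediction, so the tagger is normally run with Viterbi or beam search rather than by trusting each local decision.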
Conditional Random Fields (CRF)
CRFs score and normalize over the entire tag sequence rather than normalizing each tagging decision individually, which addresses the label bias problem associated with MEMMs.
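The difference is easiest to see by writing the two models side by side. The notation here is assumed for illustration: x is the input sentence, y the tag sequence, f_k generic feature functions, and w_k their weights.

```latex
% MEMM: a product of locally normalized distributions, one per position
P_{\text{MEMM}}(y \mid x) = \prod_{i=1}^{n}
  \frac{\exp\big(\sum_k w_k f_k(y_{i-1}, y_i, x, i)\big)}
       {\sum_{y'} \exp\big(\sum_k w_k f_k(y_{i-1}, y', x, i)\big)}

% Linear-chain CRF: a single normalization over all tag sequences
P_{\text{CRF}}(y \mid x) =
  \frac{\exp\big(\sum_{i=1}^{n} \sum_k w_k f_k(y_{i-1}, y_i, x, i)\big)}
       {\sum_{y'} \exp\big(\sum_{i=1}^{n} \sum_k w_k f_k(y'_{i-1}, y'_i, x, i)\big)}
```

In the MEMM, each position's scores must sum to one given the previous tag, so a state with few outgoing transitions passes on its probability mass almost regardless of the observation. The CRF's single denominator removes that constraint.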
NER Implementation: MENERGI System
Example of a maximum entropy approach for NER, leveraging both local and global features.
Recognizes several named entity types and tags each token by entity type and by position within the entity:
Types: Person, Organization, Location, Date, Time, Money, Percent
Subdivisions: Begin/Continue/End and Unique classifications.
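Under this scheme a one-token entity gets a Unique tag, and a longer entity gets Begin, zero or more Continue tags, and End. The sketch below shows the mapping; the `type_position` tag spelling is a hypothetical naming chosen for this example.

```python
def entity_tags(entity_type, length):
    """Positional tags for an entity of the given type spanning `length` tokens."""
    if length < 1:
        raise ValueError("an entity must span at least one token")
    if length == 1:
        return [f"{entity_type}_unique"]
    middle = [f"{entity_type}_continue"] * (length - 2)
    return [f"{entity_type}_begin", *middle, f"{entity_type}_end"]

print(entity_tags("location", 1))       # e.g. "Alaska"
print(entity_tags("person", 2))         # e.g. "Elvis Presley"
print(entity_tags("organization", 3))   # e.g. "University of Cincinnati"
```

Compared with plain BIO, the explicit End and Unique tags let the classifier learn cues that are specific to the last token of a name.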
Addressing Illegal Tag Sequences
Inadmissible tag sequences (e.g., a Continue tag with no preceding Begin tag) are ruled out by defining transition probabilities between classes.
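One way to realize this, assumed here for illustration, is to give admissible tag pairs a transition probability of 1 and inadmissible pairs 0, then run Viterbi over the classifier's per-token probabilities. The sketch uses a single entity type with tags B, C, E, U, O to keep it short; the probabilities are made up.

```python
def is_admissible(previous, current):
    """An open entity (B or C) must be followed by C or E, and only then."""
    entity_open = previous in ("B", "C")
    continues = current in ("C", "E")
    return entity_open == continues

def viterbi(token_probs):
    """Best admissible tag path given per-token {tag: probability} dicts."""
    best = {"<s>": (1.0, [])}           # tag -> (score, path ending in tag)
    for probs in token_probs:
        new_best = {}
        for tag, p in probs.items():
            candidates = [(score * p, path + [tag])
                          for prev, (score, path) in best.items()
                          if is_admissible(prev, tag)]   # 0/1 transition
            if candidates:
                new_best[tag] = max(candidates, key=lambda c: c[0])
        best = new_best
    finals = [entry for tag, entry in best.items()
              if is_admissible(tag, "</s>")]   # no entity left open at the end
    return max(finals, key=lambda c: c[0])[1] if finals else []

token_probs = [{"B": 0.6, "U": 0.3, "O": 0.1},
               {"O": 0.5, "E": 0.4, "C": 0.1},
               {"O": 0.9, "U": 0.1}]
greedy = [max(p, key=p.get) for p in token_probs]
print(greedy)                 # per-token argmax; B followed by O is illegal
print(viterbi(token_probs))   # best sequence that respects the constraints
```

Here the per-token argmax opens an entity and never closes it, while the constrained search settles on a Begin/End pair with the highest overall score among legal sequences.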
Feature Sets and Dictionaries in NER
Local features are based on properties of the target word and its immediate context; global features draw on other occurrences of the same word elsewhere in the document.
External dictionaries (e.g., lists of person, organization, and location names) supply additional evidence; their coverage and accuracy affect recognition quality.