2024-09-23 lecture 12

CS 5134/6034 Natural Language Processing Fall 2024 Lecture 12: Sequence Labeling

Shallow Parsing

Definition

  • Shallow parsing (also known as partial parsing or syntactic chunking) generates fragments representing local syntactic constituents rather than complete parse trees.

Key Constituents Identified

  • Traditional shallow parsers focus on:

    • Noun Phrases (NP)

    • Verb Phrases (VP)

    • Prepositional Phrases (PP)

Mechanisms

  • Finite state machines are commonly used for recognizing these constituents by applying simple grammar rules and heuristics.

  • Shallow parsers can be rule-based, or trained as sequence labelers on data annotated with BIO tags.
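
The finite-state idea can be sketched with regular expressions over part-of-speech tags. This is a minimal illustration, not the lecture's own chunker: the rule set `CHUNK_RULES`, the Penn Treebank-style tags, and the pre-tagged input are all assumptions made for the example.

```python
import re

# Hypothetical chunk rules over Penn Treebank-style POS tags (illustrative only).
# Each tag is written as "<TAG>" so a regular expression acts as a finite-state recognizer.
NP = r"(<DT>)?(<JJ>)*(<NNP?S?>)+"
CHUNK_RULES = [
    ("NP", re.compile(NP)),
    ("VP", re.compile(r"(<MD>)?(<VB[DGNPZ]?>)+")),
    ("PP", re.compile(r"<IN>" + NP)),
]

def chunk(tagged):
    """Greedily group (word, tag) pairs into flat chunks, longest rule match first."""
    tags = [f"<{tag}>" for _, tag in tagged]
    chunks, i = [], 0
    while i < len(tagged):
        rest = "".join(tags[i:])
        best_label, best_len = None, 0
        for label, pattern in CHUNK_RULES:
            m = pattern.match(rest)
            if m:
                n = m.group(0).count("<")  # number of tags the rule consumed
                if n > best_len:
                    best_label, best_len = label, n
        if best_label is None:
            # No rule applies: leave the word outside any chunk.
            chunks.append(("O", [tagged[i][0]]))
            i += 1
        else:
            chunks.append((best_label, [w for w, _ in tagged[i:i + best_len]]))
            i += best_len
    return chunks

SENTENCE = [("The", "DT"), ("election", "NN"), ("in", "IN"), ("the", "DT"),
            ("U.S.", "NNP"), ("will", "MD"), ("occur", "VB"),
            ("in", "IN"), ("November", "NNP")]

for label, words in chunk(SENTENCE):
    print(label, " ".join(words))
```

Because the rules never nest one chunk inside another, the output is a flat sequence of chunks, which is what distinguishes this from a full parse.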

Shallow Parsing Example

  • Flat syntactic representation: Shallow parsers produce fragments of non-recursive constituents called "chunks."

  • Example sentences demonstrate various chunk representations:

      1. NP: The election in NP: the U.S. VP: will occur PP: in November

      2. NP: The election PP: in the U.S. will occur PP: in November

      3. NP: The election PP: in the U.S. VP: will occur PP: in November
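
A chunking like the last one listed above can be flattened into per-word BIO tags, which is the form a trained chunker would predict. The sketch below assumes a simple `(label, words)` representation for chunks and a `B-`/`I-` prefix convention; both are choices made for this example.

```python
def chunks_to_bio(chunks):
    """Convert a list of (label, words) chunks into (word, BIO-tag) pairs."""
    tagged = []
    for label, words in chunks:
        for k, word in enumerate(words):
            if label == "O":
                tagged.append((word, "O"))
            else:
                # First word of a chunk gets B-, the remaining words get I-.
                prefix = "B" if k == 0 else "I"
                tagged.append((word, f"{prefix}-{label}"))
    return tagged

CHUNKS = [("NP", ["The", "election"]), ("PP", ["in", "the", "U.S."]),
          ("VP", ["will", "occur"]), ("PP", ["in", "November"])]

for word, tag in chunks_to_bio(CHUNKS):
    print(f"{word}/{tag}", end=" ")
```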

Benefits of Shallow Parsing

  • Speed: Generally faster than full parsing.

  • Robustness: More robust when processing ungrammatical input (e.g., from Twitter, speech).

  • Less Complexity: Certain ambiguity issues can be ignored; deep syntactic structures are not always necessary for specific NLP applications.

Weaknesses of Shallow Parsing

  • Attachment: Does not resolve attachment ambiguities (e.g., which constituent a PP modifies); chunks are simply left unattached.

  • No Recursion: Because chunks are flat, embedded relative clauses and reduced relative clauses cannot be represented.

  • Syntactic Roles: Typically does not assign syntactic roles such as subject or object.

Named Entity Recognition (NER)

Definition

  • NER systems identify specific types of entities such as:

    • Proper Names: individuals, organizations, locations (Elvis Presley, IBM, Washington)

    • Dates & Times: variations including November 9, 1997, 10:29 pm, etc.

    • Measures: percentages, monetary values, various measurements (45%, $1.4 billion)

    • Other Types: URLs, email addresses, etc.

Challenges

  1. Dynamic Naming: New proper names are frequently created.

  2. Capitalization Issues: Capitalization is an unreliable cue, since proper names can contain uncapitalized words (e.g., the "of" in "University of Cincinnati").

  3. Case in Spoken Language: Spoken language lacks case distinction.

  4. Abbreviations and Acronyms: Not all acronyms are proper names (e.g., NLP, AI).

Ambiguity in Proper Names

  • The same name can denote entities of different types (e.g., Ford or Washington may be a person, a company, or a location), which makes classification difficult.

  • Acronyms can represent multiple meanings (e.g., ACL, UC).

Rule-Based vs. Machine Learning Approaches

Rule-Based Systems

  • Advantages: High performance in specialized domains.

  • Disadvantages: Costly to build and maintain by hand, and the rules rarely transfer outside the domain they were written for.

Machine Learning Systems

  • Advantages: Adaptability for new domains.

  • Disadvantages: Require annotated training corpus.

Common Types of Rules in NER

  • List Matching: Common person names, organization names, and location gazetteers.

  • Leading/Trailing Terms: Titles and designators that precede or follow a name (e.g., "Prof." before a person name, "Inc." after a company name).

  • Surface Structure Patterns: Specific formats for dates, phone numbers, etc.

  • Contextual Patterns: Surrounding words that signal an entity type (e.g., "headquarters in X" suggests that X is a location).
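
The four rule types above can be combined in a small sketch. Everything here is illustrative: the tiny `LOCATIONS` gazetteer, the title and designator lists, and the example sentence are invented for the demonstration, and a real system would need far larger resources and conflict resolution between rules.

```python
import re

# Hypothetical toy resources; real gazetteers are much larger.
LOCATIONS = {"Washington", "Alaska", "Cincinnati"}
TITLE = r"(?:Prof\.|Dr\.|Mr\.|Ms\.)"
CAP = r"[A-Z][A-Za-z]+"
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

RULES = [
    # Leading term: a title followed by capitalized words marks a person.
    ("PERSON", re.compile(rf"{TITLE}\s+({CAP}(?:\s+{CAP})*)")),
    # Trailing term: capitalized words followed by a company designator.
    ("ORGANIZATION", re.compile(rf"({CAP}(?:\s+{CAP})*\s+(?:Inc\.|Corp\.|Ltd\.))")),
    # Surface structure pattern: dates such as "November 9, 1997".
    ("DATE", re.compile(r"((?:" + MONTHS + r")\s+\d{1,2},\s+\d{4})")),
    # Contextual pattern: "headquarters in X" suggests X is a location.
    ("LOCATION", re.compile(rf"headquarters\s+in\s+({CAP})")),
]

def find_entities(text):
    """Return a sorted list of (entity text, type) pairs found by the rules."""
    found = set()
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            found.add((m.group(1), label))
    # List matching: look up each capitalized word in the gazetteer.
    for m in re.finditer(CAP, text):
        if m.group(0) in LOCATIONS:
            found.add((m.group(0), "LOCATION"))
    return sorted(found)

TEXT = ("Prof. Smith joined Acme Inc. on November 9, 1997 "
        "at its headquarters in Dayton, far from Alaska.")
for entity, label in find_entities(TEXT):
    print(label, "->", entity)
```

Note that "Dayton" is found only through context and "Alaska" only through the list, which shows why rule-based systems usually mix several rule types.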

Machine Learning Models for NER

  • NER models can be constructed using:

    • Hidden Markov Models (HMM)

    • Maximum-entropy Markov Models (MEMM)

    • Conditional Random Fields (CRF)

Sequence Tagging Methodology

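  • Named entity recognition can be cast as a word-level tagging problem using BIO notation: B marks the first word of an entity, I a word inside an entity, and O a word outside any entity.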

    • Example: "John/B Smith/I gave/O Mary/B a/O book/O about/O Alaska/B"
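
To recover the entities from a tagged sentence, the B/I/O tags are read left to right and grouped into spans. The sketch below uses the untyped B/I/O tags from the example; treating a stray I (one with no preceding B) as O is an assumption made here, and other conventions exist.

```python
def bio_to_spans(tagged):
    """Group (word, tag) pairs with tags B/I/O into a list of entity strings."""
    spans, current = [], []
    for word, tag in tagged:
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [word]          # start a new entity
        elif tag == "I" and current:
            current.append(word)      # extend the open entity
        else:
            # O, or an I with no open entity (treated as O in this sketch).
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

TAGGED = [("John", "B"), ("Smith", "I"), ("gave", "O"), ("Mary", "B"),
          ("a", "O"), ("book", "O"), ("about", "O"), ("Alaska", "B")]
print(bio_to_spans(TAGGED))
```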

HMM for NER

  • HMMs use transition and emission probabilities based on a labeled NE corpus.

  • Limitations: Cannot easily incorporate arbitrary, overlapping features of the input (e.g., capitalization, suffixes, neighboring words).
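
As a rough sketch of how this works, the code below estimates transition and emission probabilities by counting over a tiny labeled corpus and then decodes a new sentence with the Viterbi algorithm. The two-sentence corpus, the add-one smoothing, and the `START` pseudo-tag are assumptions made for the example, and the input sentence is assumed to be non-empty.

```python
import math
from collections import Counter, defaultdict

TAGS = ["B", "I", "O"]

def train_hmm(corpus):
    """Count transitions and emissions; return smoothed log-probability functions."""
    trans, emit, vocab = defaultdict(Counter), defaultdict(Counter), set()
    for sentence in corpus:
        prev = "START"
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag

    def log_trans(prev, tag):
        # Add-one smoothing over the tag set.
        return math.log((trans[prev][tag] + 1) / (sum(trans[prev].values()) + len(TAGS)))

    def log_emit(tag, word):
        # Add-one smoothing over the vocabulary plus one slot for unknown words.
        return math.log((emit[tag][word] + 1) / (sum(emit[tag].values()) + len(vocab) + 1))

    return log_trans, log_emit

def viterbi(words, log_trans, log_emit):
    """Return the most probable tag sequence for a non-empty list of words."""
    # Each column maps tag -> (best log-probability ending in tag, previous tag).
    best = [{t: (log_trans("START", t) + log_emit(t, words[0]), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            score, prev = max((best[-1][p][0] + log_trans(p, t), p) for p in TAGS)
            column[t] = (score + log_emit(t, word), prev)
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))

CORPUS = [
    [("John", "B"), ("Smith", "I"), ("gave", "O"), ("Mary", "B"), ("a", "O"), ("book", "O")],
    [("Mary", "B"), ("visited", "O"), ("Alaska", "B")],
]
log_trans, log_emit = train_hmm(CORPUS)
print(viterbi(["Mary", "gave", "John", "a", "book"], log_trans, log_emit))
```

The emission model only sees word identity, which illustrates the limitation noted above: there is no natural place to add a feature such as "the word is capitalized".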

Maximum Entropy Markov Models (MEMM)

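  • MEMMs use a logistic regression (maximum entropy) classifier at each position to estimate the probability of the current tag given the previous tag and features of the input; the tag sequence is then decoded from these local distributions.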

  • MEMMs define feature functions for predicting labels and require careful feature selection.
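
The local classifier can be sketched as a softmax over tags, where each tag's score is the sum of the weights of its active features. The feature templates and the hand-set `WEIGHTS` below are hypothetical; in practice the weights are learned from an annotated corpus, and decoding typically uses Viterbi rather than the greedy left-to-right pass shown here.

```python
import math

TAGS = ["B", "I", "O"]

def features(words, i, prev_tag):
    """Feature functions for position i; they may inspect any part of the input."""
    word = words[i]
    return [
        f"word={word.lower()}",
        f"capitalized={word[0].isupper()}",
        f"prev_tag={prev_tag}",
        f"prev_word={words[i - 1].lower() if i > 0 else '<s>'}",
    ]

# Hypothetical hand-set weights for (feature, tag) pairs; unlisted pairs weigh 0.
WEIGHTS = {
    ("capitalized=True", "B"): 2.0,
    ("capitalized=True", "I"): 1.0,
    ("capitalized=False", "O"): 2.0,
    ("prev_tag=B", "I"): 1.5,
    ("prev_tag=O", "I"): -3.0,
    ("prev_tag=START", "I"): -3.0,
}

def tag_distribution(words, i, prev_tag):
    """P(tag | prev_tag, input) as a softmax over summed feature weights."""
    active = features(words, i, prev_tag)
    scores = {t: sum(WEIGHTS.get((f, t), 0.0) for f in active) for t in TAGS}
    z = sum(math.exp(s) for s in scores.values())
    return {t: math.exp(s) / z for t, s in scores.items()}

def greedy_decode(words):
    """Pick the most probable tag at each step, conditioning on the previous pick."""
    tags, prev = [], "START"
    for i in range(len(words)):
        dist = tag_distribution(words, i, prev)
        prev = max(TAGS, key=lambda t: dist[t])
        tags.append(prev)
    return tags

print(greedy_decode(["John", "Smith", "gave"]))
```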

Conditional Random Fields (CRF)

  • CRFs score and normalize the entire tag sequence at once rather than normalizing each tagging decision separately, which addresses the label bias problem associated with MEMMs.
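
The difference is easiest to see by writing the two models side by side. The notation below is assumed for this comparison rather than taken from the lecture: x is the input sentence, y the tag sequence of length n, f_k the feature functions, and w_k their weights.

```latex
% MEMM: each step is normalized separately, given the previous tag.
P_{\text{MEMM}}(y \mid x) = \prod_{i=1}^{n}
  \frac{\exp\big(\sum_k w_k f_k(y_{i-1}, y_i, x, i)\big)}
       {\sum_{y'} \exp\big(\sum_k w_k f_k(y_{i-1}, y', x, i)\big)}

% Linear-chain CRF: a single normalization over all tag sequences.
P_{\text{CRF}}(y \mid x) =
  \frac{\exp\big(\sum_{i=1}^{n} \sum_k w_k f_k(y_{i-1}, y_i, x, i)\big)}
       {\sum_{y'_{1:n}} \exp\big(\sum_{i=1}^{n} \sum_k w_k f_k(y'_{i-1}, y'_i, x, i)\big)}
```

In the MEMM, every previous tag must distribute a total probability of 1 over its successors regardless of the observation, which is the source of label bias; the CRF's single denominator removes that constraint.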

NER Implementation: MENERGI System

  • Example of a maximum entropy approach for NER, leveraging both local and global features.

  • Recognizes multiple named entities and applies specific tagging schemes:

    • Types: Person, Organization, Location, Date, Time, Money, Percent

    • Subdivisions: Begin/Continue/End and Unique classifications.
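
Combining the entity types with these subdivisions yields one tag per (type, position) pair. The sketch below shows how the words of a single entity would be labeled; the `Type-Position` tag spelling and the function name are assumptions made for this example, not the system's actual format.

```python
def encode_entity(words, entity_type):
    """Label the words of one entity with Begin/Continue/End, or Unique if one word."""
    if len(words) == 1:
        return [(words[0], f"{entity_type}-Unique")]
    positions = ["Begin"] + ["Continue"] * (len(words) - 2) + ["End"]
    return [(word, f"{entity_type}-{pos}") for word, pos in zip(words, positions)]

print(encode_entity(["University", "of", "Cincinnati"], "Organization"))
print(encode_entity(["IBM"], "Organization"))
```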

Addressing Illegal Tag Sequences

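  • Inadmissible tag sequences (e.g., a Continue tag that does not follow a Begin of the same type) are handled by defining transition probabilities between classes, so that illegal transitions can be ruled out during decoding.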
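
One simple way to realize this, shown here as an assumption rather than as the system's documented behavior, is a transition probability of 1 for admissible tag pairs and 0 otherwise. The `Type-Position` tag spelling and the `START`/`END` pseudo-tags are choices made for this sketch.

```python
def split_tag(tag):
    """Split 'Person-Begin' into ('Person', 'Begin'); O/START/END have no type."""
    if tag in ("O", "START", "END"):
        return None, tag
    entity_type, position = tag.rsplit("-", 1)
    return entity_type, position

def transition_prob(prev_tag, tag):
    """Return 1.0 if tag may follow prev_tag, else 0.0."""
    prev_type, prev_pos = split_tag(prev_tag)
    cur_type, cur_pos = split_tag(tag)
    if prev_pos in ("Begin", "Continue"):
        # An open entity must be continued or closed by the same type.
        ok = cur_pos in ("Continue", "End") and cur_type == prev_type
    else:
        # No entity is open: only O, a new entity, or the sentence end may follow.
        ok = cur_pos in ("O", "Begin", "Unique", "END")
    return 1.0 if ok else 0.0

def is_admissible(tags):
    """Check a whole tag sequence, including its start and end."""
    seq = ["START"] + list(tags) + ["END"]
    return all(transition_prob(a, b) == 1.0 for a, b in zip(seq, seq[1:]))

print(is_admissible(["Person-Begin", "Person-End", "O"]))   # admissible
print(is_admissible(["Person-Begin", "Organization-End"]))  # types do not match
```

Multiplying these 0/1 values into the path score during Viterbi decoding ensures that only admissible sequences can be selected.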

Feature Sets and Dictionaries in NER

  • Local features are based on properties of the target word and its immediate neighbors; global features draw on other occurrences of the same word elsewhere in the document.

  • External dictionaries (e.g., name lists) can improve recognition, but only to the extent that they are comprehensive and accurate.
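
The three kinds of evidence can be illustrated with a small feature extractor. The specific features, the `FIRST_NAMES` list, and the two-sentence document below are hypothetical and chosen only to show the distinction between local, global, and dictionary features.

```python
# Hypothetical toy name list standing in for an external dictionary.
FIRST_NAMES = {"John", "Mary"}

def local_features(sentence, i):
    """Features computed from the target word and its position in the sentence."""
    word = sentence[i]
    return {
        "init_caps": word[0].isupper(),
        "all_caps": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "first_word": i == 0,
        "in_first_name_list": word in FIRST_NAMES,  # dictionary lookup
    }

def global_features(document, word):
    """Features computed from other occurrences of the word in the document."""
    # Capitalization at the start of a sentence is uninformative, so check
    # whether the word also appears capitalized in a non-initial position.
    seen = any(
        token == word and j > 0 and token[0].isupper()
        for sentence in document
        for j, token in enumerate(sentence)
    )
    return {"capitalized_mid_sentence_elsewhere": seen}

DOCUMENT = [
    ["Ford", "shares", "rose", "today"],
    ["Analysts", "say", "Ford", "will", "expand"],
]
print(local_features(DOCUMENT[0], 0))
print(global_features(DOCUMENT, "Ford"))
```

Here the sentence-initial "Ford" is ambiguous on local evidence alone, but the second sentence supplies the global cue that it is capitalized mid-sentence as well.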