Terms are _____, Probabilistic, ________-dependent, purposive, and _______.
Subjective, context, evolve over time
Two fundamental obstacles to creating universal term system:
1. Model construction problem
2. Symbol grounding problem
Compositional systems (postcoord) are _____ and _____ to maintain than enumerative systems.
Easier, less expensive
What is coding?
Compression/summarization of data
False Positive (FP)
Assigning a label to pts who DO NOT warrant label
False Negatives (FN)
Failing to assign label to pts who DO warrant label
Reasons for coding errors:
Data errors in pt record (human caused), data not available at time of coding, cannot establish primary purpose of visits (multi-morbidity), coder expertise, data entry errors
Perfect terminology (impossible) has 2 reqs:
Ability to cover all concepts
Independence from reasoning
Concepts are probabilistic, but ______ are not.
Terminologies
Release cycle for terminologies has two phases:
Gradual modification period
Major revision
When changing enumerative terminologies, change must be reflected:
Across whole system
When changing compositional terminologies, change only requires:
Change to core terms associated with disease (fewer changes needed than enumerative)
Reference terminologies can be used for:
Automated error checking
Mapping terminologies is easier if they are created from:
Same compositional core (enumerative created independently, compositional created using building block method)
NLP seeks to match _____ with ______.
Context, concepts/terms
Named Entity Recognition (NER)
Process of assigning predefined labels to text (labels may come from terminology or may be names of items)
Word level processing analyzes individual word parts independent of
Linguistic structure or meaning (bag of words)
Tokenization
Splits text into individual tokens (words, numbers, symbols, punctuation)
Stemming
Converting different variants of words into common stems (ex. Catheterization -> catheter)
Lemmatization:
Converts related words into lemma (ex. went -> go)
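A minimal pure-Python sketch of these three word-level steps; the suffix rules and the lemma lookup table are illustrative toys, not a real stemmer (e.g. Porter) or dictionary-backed lemmatizer:

```python
import re

def tokenize(text):
    # Split text into lowercase word tokens, dropping punctuation.
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    # Naive suffix-stripping: maps variants onto a common stem.
    for suffix in ("izations", "ization", "ation", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

LEMMAS = {"went": "go", "gone": "go", "better": "good"}  # toy lookup table

def lemmatize(word):
    # Map irregular forms onto their dictionary lemma.
    return LEMMAS.get(word, word)

stem("catheterization")  # -> "catheter"
lemmatize("went")        # -> "go"
```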
Regular Expression
Describes in logical form all the rules that dictate how a string of symbols can be decomposed
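As a sketch, a regular expression can decompose a dosage string into named parts; the pattern and field names here are illustrative, not a production drug-name matcher:

```python
import re

# Illustrative pattern: decompose "drug amount unit" into named groups.
DOSAGE = re.compile(
    r"(?P<drug>[A-Za-z]+)\s+(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|mL)"
)

m = DOSAGE.search("Pt started on metformin 500 mg daily")
fields = m.groupdict()
# {'drug': 'metformin', 'amount': '500', 'unit': 'mg'}
```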
Reference Resolution:
Process of using context to determine whether two separate mentions of a concept refer to the same event
Word-Sense Disambiguation
When a word has more than one meaning, selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used. This concept is known as ________.
Negation
Determining whether a text implies concept "a" or "not a"
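A crude sketch of negation detection, assuming a small hand-picked cue list; real systems (e.g. NegEx-style rules) use scoped, curated triggers rather than this substring check:

```python
NEGATION_CUES = ("no ", "denies", "without", "negative for")  # illustrative cues

def is_negated(sentence, concept):
    # Concept "a" vs "not a": check whether a negation cue precedes the concept.
    s = sentence.lower()
    idx = s.find(concept.lower())
    if idx == -1:
        return False
    prefix = s[:idx]
    return any(cue in prefix for cue in NEGATION_CUES)

is_negated("Patient denies chest pain", "chest pain")   # True
is_negated("Patient reports chest pain", "chest pain")  # False
```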
N-Grams
Commonly associated words
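N-grams can be generated by sliding a fixed-size window over a token sequence; a minimal sketch:

```python
def ngrams(tokens, n):
    # Slide a window of size n across the token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

ngrams(["acute", "myocardial", "infarction"], 2)
# [('acute', 'myocardial'), ('myocardial', 'infarction')]
```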
Parsing
Taking a sequence of words and assigning them to different roles in the grammar
Query Expansion
Action of a search engine to include synonyms and lexical variants
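A sketch of dictionary-based query expansion; the synonym table is a toy stand-in for a real thesaurus such as the UMLS:

```python
SYNONYMS = {"mi": ["myocardial infarction", "heart attack"]}  # toy thesaurus

def expand_query(query):
    # Add synonyms and lexical variants of each recognized query term.
    terms = [query]
    for term, alts in SYNONYMS.items():
        if term in query.lower().split():
            terms.extend(alts)
    return terms

expand_query("MI treatment")
# ['MI treatment', 'myocardial infarction', 'heart attack']
```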
What is the most common application of NLP?
Information Extraction
Information extraction:
Looks for patterns or performs analysis to locate specific info in text
Information Retrieval:
Keyword searching through tokenization/lemmatization
Used to access documents in large collection
Goal of info retrieval:
Match user's query against available docs to create a list
Question Answering:
User submits NL question, QA system automatically answers in NL
Precise questions —> Direct answers
Advanced text summarization and generation
Text Summarization:
Takes several docs and produces single, coherent text which synthesizes main points
Steps in Summarization Process:
Content selection
Organization
Re-generation
Text Labeling:
Categorization of text into known types
Text Generation:
Formulation of NL sentences from a non-human-readable source
Application Casting:
Deciding which NLP tasks are appropriate to expected output
Topic Modelling:
Takes docs and identifies topics (unsupervised, bag of words approach)
Sequence Labeling:
Keeping track of order in which textual units occur (supervised)
Relation Extraction:
Relation detection and determination of relation type
Approaches to relation extraction:
Knowledge-based
Statistical
Linguistic Levels:
Word-Level (tokens/morphology)
Sentence-Level (syntax/semantics)
Document-Level (pragmatics/discourse)
Tokens:
Basic language units
Language Units:
Morphemes, words, numbers, symbols, punctuation
Morphology:
words/meaningful parts of words
Combo of morphemes to produce words/lexemes
Morpheme:
root, prefix, suffix
Lexeme:
Forms of the same word (ex. activated, activate, activation)
Syntax:
structure of phrases/sentences
Categorization of words in a language and the structure of phrases/sentences
Discourse structure refers to how author ____.
Organizes info in docs
(aids comprehension for reader)
Coreference Resolution:
Words/phrases referring to same entity
UMLS Metathesaurus establishes:
Relationships between concepts
Word Embedding
Words/phrases from vocab are numeric vectors
2 Functions of Word Embedding:
Capture meaning of word using context
Condense representation into vector
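Both functions can be illustrated with toy vectors and cosine similarity; the 3-dimensional embeddings below are made up for the example (real embeddings are learned from context and have hundreds of dimensions):

```python
import math

# Toy embeddings: words used in similar contexts get nearby vectors.
EMB = {
    "fever":    [0.90, 0.10, 0.20],
    "pyrexia":  [0.85, 0.15, 0.25],
    "fracture": [0.10, 0.90, 0.30],
}

def cosine(u, v):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Synonyms land closer together than unrelated terms:
cosine(EMB["fever"], EMB["pyrexia"]) > cosine(EMB["fever"], EMB["fracture"])  # True
```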
Semantics:
Meaning/interpretation of language (words/phrases/sentences)
Entity Linking:
Representing a word with a unique semantic concept
Intrinsic NLP software eval:
Measure changes in system output due to system parameter changes
Extrinsic NLP software eval:
Measure method's performance in a given task
True Positive (TP)
Outputs correctly labeled as HAVING
True Negative (TN)
Outputs correctly labeled as NOT HAVING
Recall (performance assessment)
Number of correct results out of the gold standard: TP/(TP+FN)
Precision (performance assessment)
Number of correct results out of total results returned: TP/(TP+FP)
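The two formulas above as code, with an arbitrary worked example (8 true positives, 2 false positives, 8 false negatives):

```python
def precision(tp, fp):
    # Fraction of returned results that are correct: TP / (TP + FP).
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of gold-standard items that were found: TP / (TP + FN).
    return tp / (tp + fn)

precision(8, 2)  # 0.8
recall(8, 8)     # 0.5
```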
Clinical Terminologies are important for:
Decision making, PH surveillance, data mining
Pragmatics
Impact of context/intent of speaker on meaning
Discourse
Paragraph/documents
Corpus
Collection of documents
Bag of Words
Word-level processing, analyzes individual words independent of structure/meaning, SIMPLEST way to analyze text
Bag of Words Steps
Remove irrelevant text elements (stop words, punctuation)
Tokenization
Regular expression (stemming, lemmatization)
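The steps above can be sketched in a few lines; the stop-word list is an illustrative fragment, not a standard one:

```python
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "to", "in"}  # illustrative stop words

def bag_of_words(text):
    # 1. Tokenize and lowercase, dropping punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    # 2. Remove irrelevant elements (stop words).
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 3. Count occurrences, ignoring order and structure.
    return Counter(tokens)

bag_of_words("The pain in the chest")
# Counter({'pain': 1, 'chest': 1})
```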
Many concepts require >1 ______ to describe them.
Lemma
_______ used to present duration in an N-Gram.
Temporal Modifier
In term frequency approach, we prioritize mapping based on:
Historically determined frequency
(Term Frequency) = weight
Inverse Document Frequency (IDF):
Determine weights of rare words in corpus
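A minimal sketch of IDF on a three-document toy corpus (documents represented as word sets); the standard log(N/df) form is used, though real systems often add smoothing:

```python
import math

def idf(term, docs):
    # Rare terms across the corpus get higher weights: log(N / df).
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)

docs = [{"chest", "pain"}, {"chest", "xray"}, {"pneumothorax"}]
idf("chest", docs)         # log(3/2) ~ 0.405 (common -> low weight)
idf("pneumothorax", docs)  # log(3/1) ~ 1.099 (rare -> high weight)
```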
Parsing
Applying grammar to interpret a text
Parse Tree
Graphic representation of nested structure of grammar
Grammatical Clues we can use to determine structure of text:
Parts of Speech, Parsing, Context-Free Grammar, Probabilistic Grammar
Domain knowledge:
Application of ontology, conceptual model of domain, describes all concepts and relationships between concepts
Text Mining
Finding unknown patterns/relationships in clinical texts
Adverse event reporting
Linguistic Stack:
Words - Syntax - Semantics - Discourse - Pragmatics - Formal Understanding - Generation
To extract lab results from Clinical Documents: (Linguistic Stack)
Words - Syntax - Semantics
To extract Pt Hx from Clin Docs: (Linguistic Stack)
Words - Syntax - Semantics - Discourse - Pragmatics - Formal Understanding
To translate pt hx from English to Chinese: (Linguistic Stack)
Words - Syntax - Semantics - Discourse - Pragmatics - Formal Understanding - Generation
To keyword search: (Linguistic Stack)
Words
To do concept search: (Linguistic Stack)
Words - Syntax - Semantics
To do voice-based question answering: (Linguistic Stack)
Words - Syntax - Semantics - Discourse - Pragmatics - Formal Understanding - Generation
Regular Expression Good for:
Simple Pattern Extraction in Word-based application
Regular Expression Bad for:
Anything that needs to understand English
Grammars Good for:
Bottom-up structure specification, syntactic/semantic parsing
Grammars Bad for:
Anything that requires a lot of context
Supervised ML Classification requires us to define ____ and _____ but not how they are connected.
Inputs, Outputs
Supervised ML Classification Good for:
A lot of annotated data
Supervised ML Regression has inputs similar to __________ but real-valued _____.
Supervised ML Classification, Outputs
Supervised ML Regression good for:
A lot of real-valued data (numeric)
Unsupervised ML Clustering allows for ______ without need for _____.
Grouping similar items, Annotated Data
Unsupervised ML Clustering Good For:
Similarity tasks, NOT needle-in-haystack tasks
Steps to Question Answering:
Step 1: Answer Type Detection (ML Classification)
Step 2: Keyword Extraction (Heuristics)
Step 3: Information Retrieval (Inverted Index)
Step 4: Answer Extraction (ML Classification)
Step 5: Answer Ranking (ML Regression)
Heuristics Method:
Remove stop words, stemming, give high value to terms in UMLS, combine collocations
Inverted Index Method:
Very quick keyword search over large number of docs
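A minimal sketch of an inverted index: a dictionary mapping each word to the set of document ids containing it, so lookup is a hash hit instead of a scan over every document:

```python
from collections import defaultdict

def build_index(docs):
    # Map each word to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = ["chest pain noted", "no chest pain", "left arm fracture"]
index = build_index(docs)
index["chest"]  # {0, 1}
```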