Terms are _____, Probabilistic, ________-dependent, purposive, and _______.
Subjective, context, evolve over time
Two fundamental obstacles to creating universal term system:
1. Model construction problem
2. Symbol grounding problem
Compositional systems (postcoord) are _____ and _____ to maintain than enumerative systems.
Easier, less expensive
What is coding?
Compression/summarization of data
False Positive (FP)
Assigning a label to pts who DO NOT warrant label
False Negatives (FN)
Failing to assign label to pts who DO warrant label
Reasons for coding errors:
Data errors in pt record (human caused), data not available at time of coding, cannot establish primary purpose of visits (multi-morbidity), coder expertise, data entry errors
Perfect terminology (impossible) has 2 reqs:
Ability to cover all concepts
Independence from reasoning
Concepts are probabilistic, but ______ are not.
Terminologies
Release cycle for terminologies has two phases:
Gradual modification period
Major revision
When changing enumerative terminologies, change must be reflected:
Across whole system
When changing compositional terminologies, change only requires:
Change to core terms associated with disease (fewer changes needed than enumerative)
Reference terminologies can be used for:
Automated error checking
Mapping terminologies is easier if they are created from:
Same compositional core (enumerative created independently, compositional created using building block method)
NLP seeks to match _____ with ______.
Context, concepts/terms
Named Entity Recognition (NER)
Process of assigning predefined labels to text (labels may come from terminology or may be names of items)
Word level processing analyzes individual word parts independent of
Linguistic structure or meaning (bag of words)
Tokenization
Splits text into individual tokens (words, numbers, symbols, punctuation)
Stemming
Converting different variants of words into common stems (ex. Catheterization -> catheter)
Lemmatization:
Converts related words into lemma (ex. went -> go)
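A minimal pure-Python sketch of these three word-level steps; the suffix rules and the lemma lookup table are illustrative toys, not a real stemmer (e.g. Porter) or dictionary-backed lemmatizer:

```python
import re

def tokenize(text):
    # Split text into lowercase word tokens, dropping punctuation.
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    # Naive suffix-stripping: maps variants onto a common stem.
    for suffix in ("izations", "ization", "ation", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

LEMMAS = {"went": "go", "gone": "go", "better": "good"}  # toy lookup table

def lemmatize(word):
    # Map irregular forms onto their dictionary lemma.
    return LEMMAS.get(word, word)

stem("catheterization")  # -> "catheter"
lemmatize("went")        # -> "go"
```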
Regular Expression
Describes in logical form all the rules that dictate how a string of symbols can be decomposed
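As a sketch, a regular expression can decompose a dosage string into named parts; the pattern and field names here are illustrative, not a production drug-name matcher:

```python
import re

# Illustrative pattern: decompose "drug amount unit" into named groups.
DOSAGE = re.compile(
    r"(?P<drug>[A-Za-z]+)\s+(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|mL)"
)

m = DOSAGE.search("Pt started on metformin 500 mg daily")
fields = m.groupdict()
# {'drug': 'metformin', 'amount': '500', 'unit': 'mg'}
```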
Reference Resolution:
Process of using context to determine whether two separate mentions of a concept refer to the same event
Word-Sense Disambiguation
When a word has more than one meaning, selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used. This concept is known as ________.
Negation
Determining whether a text implies concept "a" or "not a"
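A crude sketch of negation detection, assuming a small hand-picked cue list; real systems (e.g. NegEx-style rules) use scoped, curated triggers rather than this substring check:

```python
NEGATION_CUES = ("no ", "denies", "without", "negative for")  # illustrative cues

def is_negated(sentence, concept):
    # Concept "a" vs "not a": check whether a negation cue precedes the concept.
    s = sentence.lower()
    idx = s.find(concept.lower())
    if idx == -1:
        return False
    prefix = s[:idx]
    return any(cue in prefix for cue in NEGATION_CUES)

is_negated("Patient denies chest pain", "chest pain")   # True
is_negated("Patient reports chest pain", "chest pain")  # False
```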
N-Grams
Commonly associated words
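N-grams can be generated by sliding a fixed-size window over a token sequence; a minimal sketch:

```python
def ngrams(tokens, n):
    # Slide a window of size n across the token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

ngrams(["acute", "myocardial", "infarction"], 2)
# [('acute', 'myocardial'), ('myocardial', 'infarction')]
```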
Parsing
Taking a sequence of words and assigning them to different roles in the grammar
Query Expansion
Action of a search engine to include synonyms and lexical variants
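A sketch of dictionary-based query expansion; the synonym table is a toy stand-in for a real thesaurus such as the UMLS:

```python
SYNONYMS = {"mi": ["myocardial infarction", "heart attack"]}  # toy thesaurus

def expand_query(query):
    # Add synonyms and lexical variants of each recognized query term.
    terms = [query]
    for term, alts in SYNONYMS.items():
        if term in query.lower().split():
            terms.extend(alts)
    return terms

expand_query("MI treatment")
# ['MI treatment', 'myocardial infarction', 'heart attack']
```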
What is the most common application of NLP?
Information Extraction
Information extraction:
Looks for patterns or performs analysis to locate specific info in text
Information Retrieval:
Keyword searching through tokenization/lemmatization
Used to access documents in large collection
Goal of info retrieval:
Match user's query against available docs to create a list
Question Answering:
User submits NL question, QA system automatically answers in NL
Precise questions —> Direct answers
Advanced text summarization and generation
Text Summarization:
Takes several docs and produces single, coherent text which synthesizes main points
Steps in Summarization Process:
Content selection
Organization
Re-generation
Text Labeling:
Categorization of text into known types
Text Generation:
Formulation of NL sentences from a non-human-readable source
Application Casting:
Deciding which NLP tasks are appropriate to expected output
Topic Modelling:
Takes docs and identifies topics (unsupervised, bag of words approach)
Sequence Labeling:
Keeping track of order in which textual units occur (supervised)
Relation Extraction:
Relation detection and determination of relation type
Approaches to relation extraction:
Knowledge-based
Statistical
Linguistic Levels:
Word-Level (tokens/morphology)
Sentence-Level (syntax/semantics)
Document-Level (pragmatics/discourse)
Tokens:
Basic language units
Language Units:
Morphemes, words, numbers, symbols, punctuation
Morphology:
words/meaningful parts of words
Combo of morphemes to produce words/lexemes
Morpheme:
root, prefix, suffix
Lexeme:
Forms of the same word (ex. activated, activate, activation)
Syntax:
structure of phrases/sentences
Categorization of words in a language and the structure of phrases/sentences
Discourse structure refers to how author ____.
Organizes info in docs
(aids comprehension for reader)
Coreference Resolution:
Words/phrases referring to same entity
UMLS Metathesaurus establishes:
Relationships between concepts
Word Embedding
Words/phrases from vocab are numeric vectors
2 Functions of Word Embedding:
Capture meaning of word using context
Condense representation into vector
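Both functions can be illustrated with toy vectors and cosine similarity; the 3-dimensional embeddings below are made up for the example (real embeddings are learned from context and have hundreds of dimensions):

```python
import math

# Toy embeddings: words used in similar contexts get nearby vectors.
EMB = {
    "fever":    [0.90, 0.10, 0.20],
    "pyrexia":  [0.85, 0.15, 0.25],
    "fracture": [0.10, 0.90, 0.30],
}

def cosine(u, v):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Synonyms land closer together than unrelated terms:
cosine(EMB["fever"], EMB["pyrexia"]) > cosine(EMB["fever"], EMB["fracture"])  # True
```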
Semantics:
Meaning/interpretation of language (words/phrases/sentences)
Entity Linking:
Representing a word with a unique semantic concept
Intrinsic NLP software eval:
Measure changes in system output due to system parameter changes
Extrinsic NLP software eval:
Measure method's performance in a given task
True Positive (TP)
Outputs correctly labeled as HAVING
True Negative (TN)
Outputs correctly labeled as NOT HAVING
Recall (performance assessment)
Number of correct results out of the gold standard: TP/(TP+FN)
Precision (performance assessment)
Number of correct results out of total results returned: TP/(TP+FP)
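The two formulas above as code, with an arbitrary worked example (8 true positives, 2 false positives, 8 false negatives):

```python
def precision(tp, fp):
    # Fraction of returned results that are correct: TP / (TP + FP).
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of gold-standard items that were found: TP / (TP + FN).
    return tp / (tp + fn)

precision(8, 2)  # 0.8
recall(8, 8)     # 0.5
```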
Clinical Terminologies are important for:
Decision making, PH surveillance, data mining
Pragmatics
Impact of context/intent of speaker on meaning
Discourse
Paragraph/documents
Corpus
Collection of documents
Bag of Words
Word-level processing, analyzes individual words independent of structure/meaning, SIMPLEST way to analyze text
Bag of Words Steps
Remove irrelevant text elements (stop words, punctuation)
Tokenization
Regular expression (stemming, lemmatization)
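The steps above can be sketched in a few lines; the stop-word list is an illustrative fragment, not a standard one:

```python
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "to", "in"}  # illustrative stop words

def bag_of_words(text):
    # 1. Tokenize and lowercase, dropping punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    # 2. Remove irrelevant elements (stop words).
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 3. Count occurrences, ignoring order and structure.
    return Counter(tokens)

bag_of_words("The pain in the chest")
# Counter({'pain': 1, 'chest': 1})
```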
Many concepts require >1 ______ to describe them.
Lemma
_______ used to present duration in an N-Gram.
Temporal Modifier
In term frequency approach, we prioritize mapping based on:
Historically determined frequency
(Term Frequency) = weight
Inverse Document Frequency (IDF):
Determine weights of rare words in corpus
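A minimal sketch of IDF on a three-document toy corpus (documents represented as word sets); the standard log(N/df) form is used, though real systems often add smoothing:

```python
import math

def idf(term, docs):
    # Rare terms across the corpus get higher weights: log(N / df).
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)

docs = [{"chest", "pain"}, {"chest", "xray"}, {"pneumothorax"}]
idf("chest", docs)         # log(3/2) ~ 0.405 (common -> low weight)
idf("pneumothorax", docs)  # log(3/1) ~ 1.099 (rare -> high weight)
```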
Parsing
Applying grammar to interpret a text
Parse Tree
Graphic representation of nested structure of grammar
Grammatical Clues we can use to determine structure of text:
Parts of Speech, Parsing, Context-Free Grammar, Probabilistic Grammar
Domain knowledge:
Application of ontology, conceptual model of domain, describes all concepts and relationships between concepts
Text Mining
Finding unknown patterns/relationships in clinical texts
Adverse event reporting
Linguistic Stack:
Words - Syntax - Semantics - Discourse - Pragmatics - Formal Understanding - Generation
To extract lab results from Clinical Documents: (Linguistic Stack)
Words - Syntax - Semantics
To extract Pt Hx from Clin Docs: (Linguistic Stack)
Words - Syntax - Semantics - Discourse - Pragmatics - Formal Understanding
To translate pt hx from English to Chinese: (Linguistic Stack)
Words - Syntax - Semantics - Discourse - Pragmatics - Formal Understanding - Generation
To keyword search: (Linguistic Stack)
Words
To do concept search: (Linguistic Stack)
Words - Syntax - Semantics
To do voice-based question answering: (Linguistic Stack)
Words - Syntax - Semantics - Discourse - Pragmatics - Formal Understanding - Generation
Regular Expression Good for:
Simple Pattern Extraction in Word-based application
Regular Expression Bad for:
Anything that needs to understand English
Grammars Good for:
Bottom-up structure specification, syntactic/semantic parsing
Grammars Bad for:
Anything that requires a lot of context
Supervised ML Classification requires us to define ____ and _____ but not how they are connected.
Inputs, Outputs
Supervised ML Classification Good for:
A lot of annotated data
Supervised ML Regression has inputs similar to __________ but real-valued _____.
Supervised ML Classification, Outputs
Supervised ML Regression good for:
A lot of real-valued data (numeric)
Unsupervised ML Clustering allows for ______ without need for _____.
Grouping similar items, Annotated Data
Unsupervised ML Clustering Good For:
Similarity tasks, NOT needle-in-haystack tasks
Steps to Question Answering:
Step 1: Answer Type Detection (ML Classification)
Step 2: Keyword Extraction (Heuristics)
Step 3: Information Retrieval (Inverted Index)
Step 4: Answer Extraction (ML Classification)
Step 5: Answer Ranking (ML Regression)
Heuristics Method:
Remove stop words, stemming, give high value to terms in UMLS, combine collocations
Inverted Index Method:
Very quick keyword search over large number of docs
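A minimal sketch of an inverted index: a dictionary mapping each word to the set of document ids containing it, so lookup is a hash hit instead of a scan over every document:

```python
from collections import defaultdict

def build_index(docs):
    # Map each word to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = ["chest pain noted", "no chest pain", "left arm fracture"]
index = build_index(docs)
index["chest"]  # {0, 1}
```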