Mod6

105 Terms

1
New cards

Terms are _____, Probabilistic, ________-dependent, purposive, and _______.

Subjective, context, evolve over time

2
New cards

Two fundamental obstacles to creating universal term system:

  1. Model construction problem

  2. Symbol grounding problem

3
New cards

Compositional systems (postcoord) are _____ and _____ to maintain than enumerative systems.

Easier, less expensive

4
New cards

What is coding?

Compression/summarization of data

5
New cards

False Positive (FP)

Assigning a label to pts who DO NOT warrant label

6
New cards

False Negatives (FN)

Failing to assign label to pts who DO warrant label

7
New cards

Reasons for coding errors:

Data errors in pt record (human caused), data not available at time of coding, cannot establish primary purpose of visits (multi-morbidity), coder expertise, data entry errors

8
New cards

Perfect terminology (impossible) has 2 reqs:

  1. Ability to cover all concepts

  2. Independence from reasoning

9
New cards

Concepts are probabilistic, but ______ are not.

Terminologies

10
New cards

Release cycle for terminologies has two phases:

  1. Gradual modification period

  2. Major revision

11
New cards

When changing enumerative terminologies, change must be reflected:

Across whole system

12
New cards

When changing compositional terminologies, change only requires:

Change to core terms associated with disease (fewer changes needed than enumerative)

13
New cards

Reference terminologies can be used for:

Automated error checking

14
New cards

Mapping terminologies is easier if they are created from:

Same compositional core (enumerative created independently, compositional created using building block method)

15
New cards

NLP seeks to match _____ with ______.

Context, concepts/terms

16
New cards

Named Entity Recognition (NER)

Process of assigning predefined labels to text (labels may come from terminology or may be names of items)

17
New cards

Word level processing analyzes individual word parts independent of

Linguistic structure or meaning (bag of words)

18
New cards

Tokenization

Splits text into individual tokens (words, numbers, symbols, punctuation)
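As an illustration, tokenization can be sketched in a few lines of Python (the regex-based `tokenize` helper is a made-up minimal example, not a standard API):

```python
import re

def tokenize(text):
    # Minimal tokenizer: lowercase, then pull out runs of word characters,
    # dropping punctuation and whitespace.
    return re.findall(r"\w+", text.lower())

tokenize("Pt denies chest pain; BP 120/80.")
# -> ['pt', 'denies', 'chest', 'pain', 'bp', '120', '80']
```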

19
New cards

Stemming

Converting different variants of words into common stems (ex. Catheterization -> catheter)
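A toy suffix-stripping stemmer, assuming a hand-picked suffix list (real systems use the Porter stemmer or similar):

```python
def stem(word):
    # Strip the first matching suffix; the length guard keeps short stems intact.
    for suffix in ("ization", "ation", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

stem("catheterization")  # -> 'catheter'
```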

20
New cards

Lemmatization:

Converts related words into lemma (ex. went -> go)

21
New cards

Regular Expression

Describes in logical form all rules that dictate how a string of symbols can be decomposed
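For example, a regular expression can pull a blood-pressure reading out of free text (the pattern below is a simplified illustration):

```python
import re

# Match readings like "120/80": two or three digits, a slash, two or three digits.
bp_pattern = re.compile(r"\b(\d{2,3})/(\d{2,3})\b")

match = bp_pattern.search("Vitals stable, BP 120/80, HR 72.")
systolic, diastolic = match.groups()  # ('120', '80')
```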

22
New cards

Reference Resolution:

Process of using context to determine whether two separate mentions of a concept refer to the same event

23
New cards

Word-Sense Disambiguation

When a word has more than one meaning, selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used. This concept is known as ________.

24
New cards

Negation

Determining whether a text implies concept "a" or "not a"
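A crude sketch of cue-based negation detection (real systems such as NegEx also handle cue scope and termination terms; this cue list is illustrative only):

```python
import re

# Illustrative negation cues; a real lexicon is much larger.
NEGATION_CUES = re.compile(r"\b(no|not|denies|without|negative for)\b", re.IGNORECASE)

def is_negated(sentence):
    # Treat the concept as negated if any cue appears in the sentence.
    return bool(NEGATION_CUES.search(sentence))

is_negated("Patient denies chest pain")   # -> True
is_negated("Patient reports chest pain")  # -> False
```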

25
New cards

N-Grams

Commonly associated words
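N-grams can be produced by sliding a fixed-size window over a token list; a minimal sketch:

```python
def ngrams(tokens, n):
    # Every contiguous run of n tokens becomes one n-gram.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

ngrams(["chest", "pain", "for", "3", "days"], 2)
# -> [('chest', 'pain'), ('pain', 'for'), ('for', '3'), ('3', 'days')]
```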

26
New cards

Parsing

Taking a sequence of words and assigning them to different roles in the grammar

27
New cards

Query Expansion

Action of a search engine to include synonyms and lexical variants

28
New cards

What is the most common application of NLP?

Information Extraction

29
New cards

Information extraction:

Looks for patterns or performs analysis to locate specific info in text

30
New cards

Information Retrieval:

Keyword searching through tokenization/lemmatization

Used to access documents in large collection

31
New cards

Goal of info retrieval:

Match user's query against available docs to create a list

32
New cards

Question Answering:

User submits NL question, QA system automatically answers in NL

Precise questions —> Direct answers

Advanced text summarization and generation

33
New cards

Text Summarization:

Takes several docs and produces single, coherent text which synthesizes main points

34
New cards

Steps in Summarization Process:

  1. Content selection

  2. Organization

  3. Re-generation

35
New cards

Text Labeling:

Categorization of text into known types

36
New cards

Text Generation:

Formulation of NL sentences from a non-human-readable source

37
New cards

Application Casting:

Deciding which NLP tasks are appropriate to expected output

38
New cards

Topic Modelling:

Takes docs and identifies topics (unsupervised, bag of words approach)

39
New cards

Sequence Labeling:

Keeping track of order in which textual units occur (supervised)

40
New cards

Relation Extraction:

Relation detection and determination of relation type

41
New cards

Approaches to relation extraction:

  1. Knowledge-based

  2. Statistical

42
New cards

Linguistic Levels:

  1. Word-Level (tokens/morphology)

  2. Sentence-Level (syntax/semantics)

  3. Document-Level (pragmatics/discourse)

43
New cards

Tokens:

Basic language units

44
New cards

Language Units:

Morphemes, words, numbers, symbols, punctuation

45
New cards

Morphology:

words/meaningful parts of words

Combo of morphemes to produce words/lexemes

46
New cards

Morpheme:

root, prefix, suffix

47
New cards

Lexeme:

Forms of the same word (ex. activated, activate, activation)

48
New cards

Syntax:

structure of phrases/sentences

Categorization of words in a language and the structure of phrases/sentences

49
New cards

Discourse structure refers to how author ____.

Organizes info in docs

(aids comprehension for reader)

50
New cards

Coreference Resolution:

Words/phrases referring to same entity

51
New cards

UMLS Metathesaurus establishes:

Relationships between concepts

52
New cards

Word Embedding

Words/phrases from vocab are represented as numeric vectors

53
New cards

2 Functions of Word Embedding:

  1. Capture meaning of word using context

  2. Condense representation into vector
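A toy illustration of the vector idea, using made-up 3-dimensional embeddings (real embeddings have hundreds of dimensions and are learned from context); words with similar meanings end up with higher cosine similarity:

```python
import math

# Made-up toy vectors for illustration only.
vectors = {
    "fever":    [0.9, 0.1, 0.0],
    "pyrexia":  [0.8, 0.2, 0.1],
    "fracture": [0.0, 0.9, 0.4],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Synonyms score higher than unrelated terms:
# cosine(fever, pyrexia) > cosine(fever, fracture)
```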

54
New cards

Semantics:

Meaning/interpretation of language (words/phrases/sentences)

55
New cards

Entity Linking:

Representing a word with a unique semantic concept

56
New cards

Intrinsic NLP software eval:

Measure changes in system output due to system parameter changes

57
New cards

Extrinsic NLP software eval:

Measure method's performance in a given task

58
New cards

True Positive (TP)

Outputs correctly labeled as HAVING

59
New cards

True Negative (TN)

Outputs correctly labeled as NOT HAVING

60
New cards

Recall (performance assessment)

Number of correct results / gold standard: TP/(TP+FN)

61
New cards

Precision (performance assessment)

Number of correct results / total results: TP/(TP+FP)
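The recall and precision formulas in code, with illustrative counts:

```python
def recall(tp, fn):
    # Share of gold-standard positives that the system found.
    return tp / (tp + fn)

def precision(tp, fp):
    # Share of system outputs that were actually correct.
    return tp / (tp + fp)

precision(8, 2)  # 8 / 10 = 0.8
recall(8, 4)     # 8 / 12 ≈ 0.667
```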

62
New cards

Clinical Terminologies are important for:

Decision making, PH surveillance, data mining

66
New cards

Pragmatics

Impact of context/intent of speaker on meaning

68
New cards

Discourse

Paragraph/documents

69
New cards

Corpus

Collection of documents

70
New cards

Bag of Words

Word-level processing, analyzes individual words independent of structure/meaning, SIMPLEST way to analyze text

71
New cards

Bag of Words Steps

  1. Remove irrelevant text elements (stop words, punctuation)

  2. Tokenization

  3. Regular expression (stemming, lemmatization)
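The three steps above, sketched end to end (the stop-word list and the plural-stripping "stemmer" are deliberately tiny stand-ins):

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "with", "and"}  # illustrative subset

def bag_of_words(text):
    # Steps 1-2: drop punctuation, tokenize, remove stop words.
    tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]
    # Step 3 stand-in: naive stemming that strips a trailing "s".
    tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    return Counter(tokens)

bag_of_words("History of fevers and chills")
# -> Counter({'history': 1, 'fever': 1, 'chill': 1})
```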

72
New cards

Many concepts require >1 ______ to describe them.

Lemma

73
New cards

_______ used to represent duration in an N-Gram.

Temporal Modifier

74
New cards

In term frequency approach, we prioritize mapping based on:

Historically determined frequency

(Term Frequency) = weight

75
New cards

Inverse Document Frequency (IDF):

Determine weights of rare words in corpus
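A minimal IDF computation over a toy three-document corpus (documents pre-tokenized for brevity):

```python
import math

docs = [
    ["chest", "pain", "noted"],
    ["chest", "xray", "clear"],
    ["pain", "resolved"],
]

def idf(term, docs):
    # log(N / document frequency): rarer terms get larger weights.
    n_containing = sum(term in doc for doc in docs)
    return math.log(len(docs) / n_containing)

idf("chest", docs)  # in 2 of 3 docs -> log(3/2) ≈ 0.405
idf("xray", docs)   # in 1 of 3 docs -> log(3)   ≈ 1.099
```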

76
New cards

Parsing

Applying grammar to interpret a text

77
New cards

Parse Tree

Graphic representation of nested structure of grammar

78
New cards

Grammatical Clues we can use to determine structure of text:

Parts of Speech, Parsing, Context-Free Grammar, Probabilistic Grammar

79
New cards

Domain knowledge:

Application of ontology, conceptual model of domain, describes all concepts and relationships between concepts

80
New cards

Text Mining

Finding unknown patterns/relationships in clinical texts

Adverse event reporting

81
New cards

Linguistic Stack:

Words - Syntax - Semantics - Discourse - Pragmatics - Formal Understanding - Generation

82
New cards

To extract lab results from Clinical Documents: (Linguistic Stack)

Words - Syntax - Semantics

83
New cards

To extract Pt Hx from Clin Docs: (Linguistic Stack)

Words - Syntax - Semantics - Discourse - Pragmatics - Formal Understanding

84
New cards

To translate pt hx from English to Chinese: (Linguistic Stack)

Words - Syntax - Semantics - Discourse - Pragmatics - Formal Understanding - Generation

85
New cards

To keyword search: (Linguistic Stack)

Words

86
New cards

To do concept search: (Linguistic Stack)

Words - Syntax - Semantics

87
New cards

To do voice-based question answering: (Linguistic Stack)

Words - Syntax - Semantics - Discourse - Pragmatics - Formal Understanding - Generation

88
New cards

Regular Expression Good for:

Simple Pattern Extraction in Word-based application

89
New cards

Regular Expression Bad for:

Anything that needs to understand English

90
New cards

Grammars Good for:

Bottom-up structure specification, syntactic/semantic parsing

91
New cards

Grammars Bad for:

Anything that requires a lot of context

92
New cards

Supervised ML Classification requires the user to define ____ and _____ but not how they are connected.

Inputs, Outputs

93
New cards

Supervised ML Classification Good for:

A lot of annotated data

94
New cards

Supervised ML Regression has similar inputs to __________ but real-valued _____.

Supervised ML Classification, Outputs

95
New cards

Supervised ML Regression good for:

A lot of real-valued data (numeric)

96
New cards

Unsupervised ML Clustering allows for ______ without need for _____.

Grouping similar items, Annotated Data

97
New cards

Unsupervised ML Clustering Good For:

Similarity tasks, NOT needle-in-haystack tasks

98
New cards

Steps to Question Answering:

Step 1: Answer Type Detection (ML Classification)

Step 2: Keyword Extraction (Heuristics)

Step 3: Information Retrieval (Inverted Index)

Step 4: Answer Extraction (ML Classification)

Step 5: Answer Ranking (ML Regression)

99
New cards

Heuristics Method:

Remove stop words, stemming, give high value to terms in UMLS, combine collocations

100
New cards

Inverted Index Method:

Very quick keyword search over large number of docs
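A minimal inverted index: each term maps to the set of documents containing it, so a keyword lookup is a single dictionary access (toy corpus with made-up doc ids):

```python
from collections import defaultdict

docs = {
    1: "chest pain noted",
    2: "chest xray clear",
    3: "pain resolved",
}

# Build the index once; queries then avoid scanning every document.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

index["chest"]  # -> {1, 2}
index["pain"]   # -> {1, 3}
```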