Section 7 - Natural Language Processing (NLP)

Last updated 2:23 AM on 5/15/26

13 Terms

1

NLP

Any computer manipulation of natural language, from counting words to fully understanding meaning.

2

NER

Named Entity Recognition: finds and classifies named entities in text, such as PERSON, PLACE, ORGANISATION, DATE, MONEY, PERCENTAGE.
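
As a rough illustration of "find and classify", the sketch below tags three of the entity types listed above using hand-written patterns. This is only a toy: the patterns and the `find_entities` helper are assumptions made up for this example, and real NER systems learn from context instead of relying on fixed rules.

```python
import re

# Toy patterns for three entity types (hypothetical, for illustration only).
PATTERNS = {
    "MONEY": re.compile(r"[$£€]\d+(?:,\d{3})*(?:\.\d+)?"),
    "PERCENTAGE": re.compile(r"\d+(?:\.\d+)?%"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def find_entities(text):
    """Return (matched_text, label) pairs in the order they appear in text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    found.sort()  # order by position in the text
    return [(span, label) for _, span, label in found]


entities = find_entities("Sales rose 12% to $1,500 on 5/15/26.")
for span, label in entities:
    print(span, "->", label)
```

Types such as PERSON, PLACE and ORGANISATION cannot be captured reliably this way, which is why practical systems depend on surrounding context.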

3

Challenges of NER

The same word can mean different things in different contexts (for example, "Apple" may be a company or a fruit), so the system must read the full context to classify it correctly.

4

Advantages of One-Hot Encoding

Simple and easy to implement; every word has a unique identifier.

5

Disadvantages of One-Hot Encoding

Vector size = vocabulary size (10,000+ numbers per word); cannot handle new words that are not in the vocabulary; zero semantics (the vectors carry no information about meaning).
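
A minimal sketch of one-hot encoding, assuming a made-up four-word vocabulary (the names `vocabulary`, `one_hot` and `dot` are chosen for this example). It shows the three drawbacks in miniature: the vector is as long as the vocabulary, an unseen word cannot be encoded, and any two different words look equally unrelated.

```python
# Hypothetical tiny vocabulary; a real one would have 10,000+ entries.
vocabulary = ["cat", "dog", "fish", "bird"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}


def one_hot(word):
    """Return a vector of zeros with a single 1 at the word's index."""
    if word not in word_to_index:
        # New words have no slot in the vector, so they cannot be encoded.
        raise KeyError(f"'{word}' is not in the vocabulary")
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector


def dot(a, b):
    """Dot product, used here as a simple similarity measure."""
    return sum(x * y for x, y in zip(a, b))


cat = one_hot("cat")
dog = one_hot("dog")
similarity = dot(cat, dog)  # different words never share a non-zero position
print(cat, dog, similarity)
```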

6

Word2Vec

Maps each word to a DENSE vector (about 300 numbers), with similar words close together in this mathematical space.

7

Meaning in Word2Vec

Encoded as direction and distance; words used in similar contexts cluster together.
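
To make "close together" concrete, the sketch below compares dense vectors with cosine similarity, which measures how closely two vectors point in the same direction. The three-number vectors are hand-picked assumptions for illustration, not output from a trained Word2Vec model, which would produce roughly 300 numbers per word.

```python
import math

# Hypothetical, hand-picked vectors (NOT trained embeddings).
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "banana": [0.1, 0.0, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: near 1 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


king_queen = cosine_similarity(vectors["king"], vectors["queen"])
king_banana = cosine_similarity(vectors["king"], vectors["banana"])
print(round(king_queen, 3), round(king_banana, 3))
```

With these toy values, "king" and "queen" score much higher than "king" and "banana", mirroring how words used in similar contexts cluster together.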

8

Comparison: One-Hot Encoding vs Word2Vec

One-Hot:

Vector size = vocabulary size (10,000+)

Almost all zeros (very sparse)

No relationship between any words

No training needed

Word2Vec:

Vector size = ~300 (fixed, compact)

All numbers are meaningful (dense)

Similar words are mathematically close

Requires training on a large text corpus

9

TF-IDF

Measures how IMPORTANT and DISTINCTIVE a word is to a specific document.

TF-IDF = TF x IDF.

10

TF (Term Frequency)

Number of times the word appears in the document / total number of words in the document; a high TF means the word appears often in this document.
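
The TF formula can be sketched in a few lines. The whitespace tokenisation, the lowercasing and the `term_frequency` name are simplifying assumptions for this example.

```python
def term_frequency(word, document):
    """Occurrences of word divided by the total number of words."""
    words = document.lower().split()  # naive whitespace tokenisation
    if not words:
        return 0.0
    return words.count(word.lower()) / len(words)


doc = "the cat sat on the mat"
tf_the = term_frequency("the", doc)  # 2 occurrences out of 6 words
tf_cat = term_frequency("cat", doc)  # 1 occurrence out of 6 words
print(tf_the, tf_cat)
```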

11

IDF (Inverse Document Frequency)

log(total number of documents / number of documents containing this word); high IDF means word is RARE across all documents.
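
A small sketch of the IDF formula, assuming the natural logarithm (the card does not state a base) and a made-up three-document collection. Returning 0.0 for a word that appears in no document is also an assumption, since the formula is undefined in that case.

```python
import math


def inverse_document_frequency(word, documents):
    """log(total documents / documents containing the word)."""
    containing = sum(1 for doc in documents if word in doc.lower().split())
    if containing == 0:
        return 0.0  # assumption: undefined case mapped to 0
    return math.log(len(documents) / containing)


# Hypothetical mini-corpus for illustration.
documents = [
    "the cat sat on the mat",
    "the dog chased the ball",
    "the stock market fell",
]
idf_the = inverse_document_frequency("the", documents)  # in 3 of 3 documents
idf_cat = inverse_document_frequency("cat", documents)  # in 1 of 3 documents
print(idf_the, idf_cat)
```

A word found in every document gets log(3/3) = 0, while a word found in only one document gets log(3/1), the largest value possible for this collection.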

12

Purpose of IDF

Punishes words that appear everywhere and rewards rare, distinctive words.

13

Example of TF Limitation

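The word 'the' appears very often (high TF) in every document, but it tells us nothing about what any particular document is about. TF alone cannot separate common words from distinctive ones.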
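
The sketch below puts the two parts together to show how multiplying by IDF corrects this limitation. It assumes the natural logarithm, whitespace tokenisation and a made-up three-document collection; the `tf_idf` helper is a name chosen for this example.

```python
import math


def tf_idf(word, document, documents):
    """TF-IDF = TF x IDF for one word in one document of a collection."""
    words = document.lower().split()
    tf = words.count(word) / len(words) if words else 0.0
    containing = sum(1 for doc in documents if word in doc.lower().split())
    idf = math.log(len(documents) / containing) if containing else 0.0
    return tf * idf


# Hypothetical mini-corpus for illustration.
documents = [
    "the cat sat on the mat",
    "the dog chased the ball",
    "the stock market fell",
]
score_the = tf_idf("the", documents[0], documents)  # high TF, but IDF is 0
score_cat = tf_idf("cat", documents[0], documents)  # lower TF, positive IDF
print(score_the, score_cat)
```

Even though 'the' has the highest TF in the first document, its score falls to zero because it appears in every document, while 'cat' keeps a positive score.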