Lecture 15: Text analytics

17 Terms

Q: What are some applications of text mining?

Spam filters, search engine relevancy, social media summarization, essay grading, author attribution, AI-written news stories.

Q: What format is used for tidy text mining in R?

The tidy text format: a tibble with one token (usually one word) per row.

Q: What function in R splits text into individual words?

unnest_tokens() from the tidytext package.

Q: What does unnest_tokens(word, text) do?

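Splits the text column into individual words and stores them in a new column named word, one word per row. By default the words are lowercased and punctuation is dropped.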
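
A minimal sketch of this call, assuming the dplyr and tidytext packages are installed. The tibble `lines` and its column names are made up for illustration; the two sentences are borrowed from the sentiment card in this deck.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line input; `line` and `text` are illustrative column names.
lines <- tibble(
  line = 1:2,
  text = c("I hate the dentist", "I love candy")
)

# `word` is the name of the output column, `text` is the input column.
# The result has one row per word, and the `line` column is carried along.
tokens <- lines %>%
  unnest_tokens(word, text)

print(tokens)
```

Each row of `tokens` keeps its `line` number, which is what later makes per-line summaries possible.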

Q: What is a stopword in text mining?

A common word like “the” or “and” that is usually removed because it carries little meaning.

Q: How do you remove stopwords in R?

Use anti_join(stop_words) from dplyr, where stop_words is the stopword table that ships with tidytext.

Q: What command counts word frequencies after removing stopwords?

count(word, sort = TRUE).
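
The stopword and counting steps can be chained into one pipeline. This is a sketch under assumptions: dplyr and tidytext are installed, and the tibble `lines` with its three sentences is invented for the example.

```r
library(dplyr)
library(tidytext)

# Hypothetical input text, invented for this example.
lines <- tibble(
  line = 1:3,
  text = c(
    "I hate the dentist",
    "I love candy",
    "Candy is the best and candy is sweet"
  )
)

word_counts <- lines %>%
  unnest_tokens(word, text) %>%          # one lowercased word per row
  anti_join(stop_words, by = "word") %>% # drop rows whose word is a stopword
  count(word, sort = TRUE)               # frequency table, most common first

print(word_counts)
```

Passing `by = "word"` makes the join column explicit instead of relying on dplyr's automatic column matching.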

Q: What are the three major sentiment datasets mentioned?

AFINN, Bing, and NRC.

Q: What does the get_sentiments("afinn") function do?

Returns a tibble that maps each word to an integer sentiment score from -5 (most negative) to +5 (most positive).

Q: What does a negative value in AFINN sentiment scores indicate?

A negative or unpleasant sentiment.

Q: How do you compute average sentiment by line in R?

unnest_tokens() ➔ inner_join() with the sentiment lexicon ➔ group_by(line) ➔ summarize(mean(value)).
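
The steps above can be sketched as follows, assuming dplyr and tidytext are installed. Because `get_sentiments("afinn")` needs a separate lexicon download, this example substitutes a tiny stand-in table, `toy_lexicon`, with the same `word` and `value` columns. Its scores are made up for illustration and are not presented as AFINN's actual values.

```r
library(dplyr)
library(tidytext)

# Hypothetical input using the deck's two example sentences.
lines <- tibble(
  line = 1:2,
  text = c("I hate the dentist", "I love candy")
)

# Stand-in for get_sentiments("afinn"); illustrative scores only.
toy_lexicon <- tibble(
  word  = c("hate", "love"),
  value = c(-3, 3)
)

line_sentiment <- lines %>%
  unnest_tokens(word, text) %>%             # one word per row
  inner_join(toy_lexicon, by = "word") %>%  # keep only words that have a score
  group_by(line) %>%
  summarize(mean_value = mean(value))       # average score per line

print(line_sentiment)
```

Words missing from the lexicon are dropped by the inner join, so a line with no scored words would not appear in the result at all.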

Q: What is an example of text for sentiment analysis?

“I hate the dentist”, “I love candy”.

Q: What is the goal of clustering in text analytics?

Group data points without using labels.

Q: What clustering algorithm is mentioned?

K-means clustering.

Q: How does K-means clustering work?

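Start with K cluster centers, assign each point to its nearest center, then move each center to the mean of its assigned points. Repeat the two steps until the assignments stop changing.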
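
A from-scratch sketch of that loop in base R, for illustration only. The data are two made-up, well-separated 2-D blobs, and the centers are seeded with the first and last point, which is a simplification chosen to keep the example deterministic. It does not handle clusters that become empty.

```r
set.seed(1)

# Hypothetical data: 20 points near (0, 0) followed by 20 points near (5, 5).
points <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)

k <- 2
# Simplified initialization: use the first and last point as starting centers.
centers <- points[c(1, nrow(points)), , drop = FALSE]
assignment <- rep(0L, nrow(points))

for (iter in 1:100) {
  # Assignment step: squared Euclidean distance from each point to each center.
  dists <- sapply(1:k, function(j) rowSums(sweep(points, 2, centers[j, ])^2))
  new_assignment <- max.col(-dists, ties.method = "first")  # nearest center

  # Stop once no point changes cluster.
  if (all(new_assignment == assignment)) break
  assignment <- new_assignment

  # Update step: move each center to the mean of its assigned points.
  for (j in 1:k) {
    centers[j, ] <- colMeans(points[assignment == j, , drop = FALSE])
  }
}

print(table(assignment))
print(centers)
```

In practice, base R's built-in `kmeans()` function would normally be used instead of a hand-written loop.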

Q: What are strengths of K-means clustering?

Simple to compute and easy to explain.

Q: What are weaknesses of K-means clustering?

Requires choosing K beforehand and only finds convex-shaped clusters.