Q: What are some applications of text mining?
Spam filters, search engine relevancy, social media summarization, essay grading, author attribution, AI-written news stories.
Q: What format is used for tidy text mining in R?
Tidy text format: a table with one token (usually one word) per row, typically stored as a tibble.
Q: What function in R splits text into individual words?
unnest_tokens() from the tidytext package.
Q: What does unnest_tokens(word, text) do?
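Splits the text column into separate words, one per row, and stores them in a new column named word (lowercased, with punctuation removed by default).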
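A minimal sketch of unnest_tokens(word, text) in action, assuming the dplyr and tidytext packages are installed. The tibble text_df and its columns line and text are illustrative names, and the two lines reuse the sentiment example from later in this deck.

```r
library(dplyr)
library(tidytext)

# Two short lines of text, one line per row
text_df <- tibble(line = 1:2,
                  text = c("I hate the dentist", "I love candy"))

# Split the `text` column into one word per row in a new `word` column;
# by default words are lowercased and the input column `text` is dropped
tokens <- text_df %>%
  unnest_tokens(word, text)

print(tokens)
```

Each word keeps its line number, so later steps can still group results by line.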
Q: What is a stopword in text mining?
A common word like “the” or “and” that is usually removed because it carries little meaning.
Q: How do you remove stopwords in R?
Use anti_join(stop_words), which drops every word that appears in tidytext's built-in stop_words table.
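A short sketch of stopword removal, assuming dplyr and tidytext are installed; stop_words is the stopword table that tidytext makes available once the package is loaded. The sample text is illustrative.

```r
library(dplyr)
library(tidytext)

text_df <- tibble(line = 1:2,
                  text = c("I hate the dentist", "I love candy"))

tidy_words <- text_df %>%
  unnest_tokens(word, text) %>%
  # Keep only rows whose word does NOT appear in stop_words
  anti_join(stop_words, by = "word")

print(tidy_words)
```

An anti-join keeps the rows of the left table that have no match in the right table, which is exactly "remove these words".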
Q: What command counts word frequencies after removing stopwords?
count(word, sort = TRUE).
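A sketch of the full word-frequency pipeline, assuming dplyr and tidytext are installed. The three sentences are made-up sample text chosen so that one content word repeats.

```r
library(dplyr)
library(tidytext)

# Hypothetical sample text
text_df <- tibble(line = 1:3,
                  text = c("The cat sat on the mat",
                           "The cat ate the fish",
                           "A dog chased the cat"))

word_counts <- text_df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%  # drop "the", "on", "a", ...
  count(word, sort = TRUE)                # one row per word with its count in `n`, largest first

print(word_counts)
```

count() comes from dplyr; sort = TRUE orders the result from most to least frequent.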
Q: What are the three major sentiment datasets mentioned?
AFINN, Bing, and NRC.
Q: What does the get_sentiments("afinn") function do?
Loads the AFINN lexicon: a table mapping words (column word) to integer sentiment scores from -5 to +5 (column value).
Q: What does a negative value in AFINN sentiment scores indicate?
A negative or unpleasant sentiment.
Q: How do you compute average sentiment by line in R?
unnest_tokens() ➔ inner_join() with the sentiment lexicon ➔ group_by(line) ➔ summarize(mean(value)).
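A minimal sketch of the per-line average sentiment pipeline, assuming dplyr and tidytext are installed. get_sentiments("afinn") needs the textdata package and a one-time download, so this sketch swaps in a hypothetical two-word lexicon, toy_afinn, with illustrative scores in the same word/value layout.

```r
library(dplyr)
library(tidytext)

text_df <- tibble(line = 1:2,
                  text = c("I hate the dentist", "I love candy"))

# Hypothetical stand-in for get_sentiments("afinn"); scores are illustrative only
toy_afinn <- tibble(word = c("hate", "love"),
                    value = c(-3, 3))

line_sentiment <- text_df %>%
  unnest_tokens(word, text) %>%
  inner_join(toy_afinn, by = "word") %>%   # keep only words found in the lexicon
  group_by(line) %>%
  summarize(sentiment = mean(value))       # average score of the matched words on each line

print(line_sentiment)
```

Because of the inner join, a line with no lexicon words drops out of the result entirely rather than getting a score of 0.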
Q: What is an example of text for sentiment analysis?
“I hate the dentist”, “I love candy”.
Q: What is the goal of clustering in text analytics?
Group similar data points (for example, documents) without using labels; clustering is unsupervised.
Q: What clustering algorithm is mentioned?
K-means clustering.
Q: How does K-means clustering work?
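Choose K initial cluster centers, then repeat: assign each point to its nearest center, and move each center to the mean of the points assigned to it, until the assignments stop changing.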
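A minimal base-R sketch of the assign/update loop (Lloyd's algorithm), assuming numeric data in a matrix with one point per row. The function name simple_kmeans and the sample points are hypothetical, for illustration only.

```r
# Hypothetical helper: plain Lloyd-style K-means on a numeric matrix
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Initialise centers with k distinct rows chosen at random
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  assignment <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance from each point to each center
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_assignment <- max.col(-d, ties.method = "first")  # index of nearest center
    if (all(new_assignment == assignment)) break          # converged: nothing moved
    assignment <- new_assignment
    # Update step: move each center to the mean of its assigned points
    for (j in seq_len(k)) {
      members <- x[assignment == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = assignment, centers = centers)
}

set.seed(42)
x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
res <- simple_kmeans(x, k = 2)
print(res$cluster)
```

In practice, base R's stats::kmeans(x, centers = 2) does this job; its default algorithm is Hartigan-Wong, and algorithm = "Lloyd" selects the plain iteration sketched here.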
Q: What are strengths of K-means clustering?
Simple to compute and easy to explain.
Q: What are weaknesses of K-means clustering?
Requires choosing K beforehand and only finds convex-shaped clusters.