What is topic modeling?
Topic modeling is a method for automatically discovering the abstract "topics" that occur in a collection of documents.
What is Latent Dirichlet Allocation (LDA)?
LDA is a probabilistic topic model that assumes documents are mixtures of topics, and topics are mixtures of words.
Can a document have more than one topic in LDA?
Yes, each document can be represented as a distribution over multiple topics.
What is a key assumption in LDA about how documents are created?
LDA assumes each document is generated by first choosing its length (number of words) and a mixture of topics; then, for each word position, a topic is drawn from that mixture and a word is drawn from that topic's word distribution.
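The generative story on this card can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the vocabulary, the prior values `alpha` and `beta`, and the Poisson choice for document length are all hypothetical and not taken from the deck.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["gene", "cell", "dna", "ball", "team", "goal"]  # hypothetical vocabulary
n_topics = 2
alpha = np.full(n_topics, 0.5)    # assumed Dirichlet prior over topics per document
beta = np.full(len(vocab), 0.5)   # assumed Dirichlet prior over words per topic

# One word distribution per topic (each row sums to 1).
topic_word = rng.dirichlet(beta, size=n_topics)

def generate_document(mean_length=8):
    n_words = max(1, rng.poisson(mean_length))  # 1. choose the number of words (Poisson is an assumption)
    theta = rng.dirichlet(alpha)                # 2. choose the document's topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(n_topics, p=theta)            # 3. choose a topic for this word
        w = rng.choice(len(vocab), p=topic_word[z])  # 4. choose a word from that topic
        words.append(vocab[w])
    return theta, words

theta, words = generate_document()
print(theta, words)
```

Fitting LDA runs this story in reverse: given only the words, it estimates the topic mixtures and word distributions that plausibly produced them.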
What type of distributions are used in LDA?
Dirichlet distributions act as priors over each document's topic mixture and each topic's word distribution; multinomial (categorical) distributions are used to pick a topic for each word position and then a word from that topic.
What does LDA output?
LDA outputs topic distributions for each document and word distributions for each topic, but does not assign titles to topics.
What is a multinomial distribution?
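A probability distribution over the counts of each of k possible outcomes across n independent trials, where every trial has the same outcome probabilities (for example, how many times each word appears in a document).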
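A quick way to see this is to draw from a multinomial with NumPy. The three words and their probabilities below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
words = ["gene", "dna", "cell"]      # hypothetical outcomes
probs = [0.5, 0.3, 0.2]              # hypothetical word probabilities (sum to 1)

counts = rng.multinomial(10, probs)  # how often each word occurs in 10 draws
print(dict(zip(words, counts)))
```

The counts always add up to the number of draws, here 10.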
How does LDA differ from clustering?
Hard clustering methods such as k-means assign each document to exactly one cluster; LDA represents each document as a mixture of several topics.
What Python libraries are used for topic modeling?
Gensim (fast, scalable), scikit-learn (for small datasets), pyLDAvis (for visualization), and lda (a lightweight implementation).
Give an example of a topic distribution output from LDA.
Document 1: 10% Topic 3, 60% Topic 7, 30% Topic 5, where each topic is itself a distribution over words.
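The mixture on this card can be written as a simple mapping. The sketch below checks that the proportions sum to 1 and picks the dominant topic, which is the only information a one-label-per-document view would keep.

```python
# Topic proportions for Document 1, copied from the card above.
doc1 = {"Topic 3": 0.10, "Topic 7": 0.60, "Topic 5": 0.30}

# A topic mixture is a probability distribution, so it must sum to 1.
assert abs(sum(doc1.values()) - 1.0) < 1e-9

dominant = max(doc1, key=doc1.get)
print(dominant)  # Topic 7
```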
What is a document-level topic distribution in LDA?
It’s the probability distribution of topics within a single document, showing what proportion of the document is about each topic.
What is a topic-level word distribution in LDA?
It’s the probability distribution of words within a topic, indicating how likely each word is to appear if that topic is selected.
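What kind of distribution are the document-level and topic-level distributions in LDA?
Both are multinomial (categorical) distributions: one over topics for each document and one over words for each topic. Each is given a Dirichlet prior.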