K-means clustering is an example of:
Unsupervised learning (because it operates on data that has no predefined labels or targets)
Clustering groups data based on:
Similarity
K-means works best with:
Numerical data
Distance commonly used in K-means:
Euclidean
Centroid represents:
Cluster center (average/mean of all points within a cluster)
K-means output includes:
Cluster assignments
K-means is sensitive to:
Initialization (initial placement of centroids)
K must be:
Predefined
WSS (Within-Cluster Sum of Squares) measures:
Cluster compactness
Elbow method helps:
Choose K
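The cards above can be sketched in code. A minimal pure-Python K-means on made-up 1-D data, showing that WSS (within-cluster sum of squares) shrinks as K grows, which is what the elbow method plots; the data and seed are illustrative assumptions, not from the source.

```python
# Minimal K-means sketch on hypothetical 1-D data: WSS shrinks as K grows,
# which is the curve the elbow method inspects to choose K.
import random

def kmeans(points, k, iters=20, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)   # initialization matters for K-means
    for _ in range(iters):
        # assign each point to its nearest centroid (squared Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            clusters[i].append(p)
        # recompute each centroid as the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    # WSS: sum of squared distances from each point to its nearest centroid
    wss = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, wss

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 8.8, 9.3]
for k in (1, 2, 3):
    _, wss = kmeans(data, k)
    print(k, round(wss, 2))
```

Plotting WSS against K and looking for the "elbow" where the decrease levels off is how K is typically chosen.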
Association rules are used to:
Find relationships or patterns within large datasets
Apriori algorithm (association rule learning) works on:
Itemsets
Support measures:
Frequency (how frequently an itemset appears in a data set)
Confidence measures:
Conditional probability (if this then that)
Lift measures:
Strength of association relative to independence (lift = 1 means the items are independent)
Frequent itemset means:
Appears often
Apriori property states:
Subsets are frequent
(all non-empty subsets of a frequent itemset must also be frequent.)
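The level-wise growth and the Apriori property can be sketched as follows; the transactions and the 0.5 support threshold are made-up toy values, not from the source.

```python
# Apriori sketch on hypothetical toy baskets: grow itemsets level by level,
# pruning candidates below min_support (Apriori property: any superset of
# an infrequent itemset is also infrequent, so it need not be checked).
from itertools import combinations

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk", "butter"}]
min_support = 0.5  # itemset must appear in at least half the baskets

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted({i for t in transactions for i in t})
frequent, k = {}, 1
current = [frozenset([i]) for i in items]
while current:
    # keep only candidates meeting the support threshold
    survivors = {s: support(s) for s in current if support(s) >= min_support}
    frequent.update(survivors)
    k += 1
    # join surviving (k-1)-itemsets into candidate k-itemsets
    current = [a | b for a, b in combinations(survivors, 2) if len(a | b) == k]

for s, sup in sorted(frequent.items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(s), sup)
```

Here {bread, milk, butter} appears in only 1 of 4 baskets (support 0.25), so it is pruned even though all of its 2-item subsets are frequent at 0.5.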
Classification is:
Supervised learning
Regression predicts:
Continuous values
Logistic regression predicts:
Probability
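A minimal sketch of how logistic regression turns a linear score into a probability via the sigmoid; the weights and inputs below are made up for illustration, not fitted values.

```python
# Logistic regression sketch: the sigmoid squashes a linear score w·x + b
# into a probability in (0, 1). Coefficients here are hypothetical.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    # linear score w·x + b, then sigmoid -> P(class = 1)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

p = predict_proba([2.0, 1.0], w=[0.8, -0.4], b=-0.5)
print(round(p, 3))  # always strictly between 0 and 1
```

Thresholding this probability (commonly at 0.5) is what turns logistic regression into a classifier.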
Naive Bayes is:
A classifier
(a supervised machine learning algorithm commonly used for classification tasks like spam filtering, document categorization, and sentiment analysis)
Decision tree is:
A classifier
(predicts categorical target variables)
Time series analysis studies:
Data over time
Text analysis uses:
TF-IDF (Term Frequency-Inverse Document Frequency)
Data analytics lifecycle includes:
Model planning
Model planning determines:
Method choice (the type of model/algorithm to be used)
Clustering does NOT:
Predict labels
Association rules example:
Market basket
Support threshold filters:
Frequent itemsets
(Filters out itemsets that are not frequent, keeping only those that meet the minimum support level.)
Confidence threshold filters:
Rules
(filters association rules, keeping only those that meet a minimum confidence level.)
Lift > 1 indicates:
Positive association
Leverage measures:
Difference from independence
(the observed co-occurrence of items minus what would be expected if they were independent.)
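The rule metrics on these cards can be computed by hand on a toy example; the baskets below and the rule {bread} → {milk} are illustrative assumptions.

```python
# Rule-metric sketch on hypothetical baskets: support, confidence, lift,
# and leverage for the rule {bread} -> {milk}.
transactions = [{"bread", "milk"}, {"bread", "milk"}, {"bread"},
                {"milk", "butter"}, {"butter"}]
n = len(transactions)

def supp(itemset):
    return sum(itemset <= t for t in transactions) / n

s_a = supp({"bread"})            # P(bread)
s_b = supp({"milk"})             # P(milk)
s_ab = supp({"bread", "milk"})   # P(bread and milk)

confidence = s_ab / s_a          # conditional probability P(milk | bread)
lift = s_ab / (s_a * s_b)        # > 1 => positive association
leverage = s_ab - s_a * s_b      # observed minus expected-if-independent

print(confidence, lift, leverage)
```

With this data, lift is above 1 and leverage is positive, so bread and milk co-occur more often than independence would predict.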
K-means assumes clusters are:
Spherical (roughly round, with similar size and spread)
K-means limitation:
Not for categorical data
Cluster separation evaluated by:
Visualization (e.g., plotting the clusters to inspect how well they separate)
Outliers affect:
K-means strongly (because it uses the mean to calculate cluster centroids)
Apriori grows itemsets:
Iteratively (finds frequent 1-itemsets, joins them into candidate 2-itemsets, and so on)
Time series detects:
Trends and seasonality
Text representation includes:
Bag of words (BoW): converts text into numerical vectors
TF-IDF (Term Frequency-Inverse Document Frequency) measures:
Term importance (evaluate how important or relevant a word is to a document in a collection or corpus)
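The TF-IDF idea can be sketched in a few lines; the three-document corpus is made up, and this uses the plain log variant (real libraries such as scikit-learn apply smoothing that differs slightly).

```python
# TF-IDF sketch on a hypothetical toy corpus: a word scores high when it is
# frequent in a document but rare across the corpus.
import math

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "purred"]]

def tf(term, doc):
    # term frequency: share of the document's words that are this term
    return doc.count(term) / len(doc)

def idf(term):
    # inverse document frequency: rarer across documents -> larger value
    df = sum(term in d for d in docs)
    return math.log(len(docs) / df)

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)

# "the" appears in every document -> idf = log(1) = 0 -> TF-IDF of 0
print(tf_idf("the", docs[0]), tf_idf("cat", docs[0]))
```

This is why common filler words get near-zero weight while distinctive terms dominate a document's vector.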