ML Week 9

Last updated 11:19 AM on 5/3/26

68 Terms

1
New cards

What is the main difference between supervised and unsupervised learning?

The main difference is the presence or absence of labelled training data.

2
New cards

What does K-means clustering do?

It groups the data into k clusters based on similar features and common patterns.

3
New cards

What is the purpose of customer segmentation in K-means clustering?

To divide the customer base into groups that share similarities relevant to marketing.

4
New cards

What does 'k' represent in K-means clustering?

The number of clusters the algorithm is set to find, which is a user-defined parameter.

5
New cards

What is a cluster in K-means clustering?

A group of observations that are similar to each other in feature space.

6
New cards

What is cluster membership in K-means clustering?

It indicates which cluster a data point belongs to after running the clustering algorithm.

7
New cards

What is a cluster centroid?

The center of a cluster, calculated as the mean of all points assigned to that cluster.

8
New cards

What is the first step in the K-means clustering algorithm?

Randomly choose k of the existing data points to serve as the initial centroids.

9
New cards

What is the second step in the K-means clustering algorithm?

Measure the distance from each data point to each centroid.

10
New cards

What formula is used to calculate the distance in K-means clustering?

The Euclidean distance: d = √((x1 − x2)² + (y1 − y2)²).
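
As a small worked example of the distance step, the sketch below computes the Euclidean distance between one point and one centroid in two dimensions. The function name and the sample coordinates are illustrative assumptions, not part of the original cards. K-means can equally compare squared distances, since dropping the square root does not change which centroid is nearest.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two 2-D points p = (x1, y1) and q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

point = (3.0, 4.0)      # made-up data point
centroid = (0.0, 0.0)   # made-up centroid
print(euclidean_distance(point, centroid))  # 5.0
```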

11
New cards

What happens in the third step of the K-means clustering process?

Assign each data point to the cluster of the closest centroid.

12
New cards

What is the fourth step in the K-means clustering algorithm?

Compute the new centroids as the mean of the points in each cluster.
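
The four steps above can be sketched as a single pass in plain Python. This is a minimal illustration, not a full implementation: the four points are made up, the initial centroids are picked by hand rather than randomly so the result is easy to check, and the helper names `assign` and `update` are assumptions.

```python
import math

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 9.0)]  # made-up data
centroids = [(1.0, 1.0), (9.0, 9.0)]  # step 1, chosen by hand here instead of randomly

def assign(points, centroids):
    """Steps 2-3: for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def update(points, labels, k):
    """Step 4: each new centroid is the coordinate-wise mean of its cluster."""
    new_centroids = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        # Assumes every cluster has at least one point.
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids

labels = assign(points, centroids)
centroids = update(points, labels, k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(1.0, 1.5), (8.5, 8.5)]
```

Repeating `assign` and `update` until the centroids stop moving gives the full algorithm.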

13
New cards

What is hyperparameter tuning in the context of machine learning?

The process of optimizing the hyperparameters, i.e. the settings fixed before training that govern the learning process.

14
New cards

Why is selecting the value of k important in K-means clustering?

It determines how many clusters the algorithm will create, affecting the results.

15
New cards

What is the role of the centroid in K-means clustering?

It serves as the representative point of a cluster, guiding the assignment of data points.

16
New cards

What is the outcome of K-means clustering?

A partitioning of the data into k distinct clusters based on similarity.

17
New cards

What is a common application of K-means clustering?

Customer segmentation for marketing purposes.

18
New cards

What does the term 'feature space' refer to in K-means clustering?

The multidimensional space defined by the features of the data points.

19
New cards

How does K-means clustering handle unlabelled data?

By grouping it into clusters based on feature similarity without prior labels.

20
New cards

What is the significance of the mean in calculating cluster centroids?

It provides the average position of all points in the cluster, representing its center.

21
New cards

What is the iterative nature of K-means clustering?

The algorithm repeatedly assigns points to clusters and recalculates centroids until convergence.

22
New cards

What is the expected outcome after multiple iterations of K-means clustering?

Stabilization of cluster assignments and centroids, leading to a final clustering solution.

23
New cards

What challenges can arise in K-means clustering?

Choosing the right value of k and sensitivity to initial centroid placement.

24
New cards

What is the core idea of K-means clustering?

Repeat the process of computing centroids and assigning points to clusters until the cluster centers no longer change or the maximum number of iterations is reached.

25
New cards

How is the centroid of a cluster calculated?

The centroid is calculated as the mean of the points in the cluster: (x1 + x2 + ... + xm) / m, (y1 + y2 + ... + ym) / m.

26
New cards

What does WCSS stand for in K-means clustering?

WCSS stands for Within-Cluster Sum of Squares.

27
New cards

What is the purpose of the Elbow method in selecting K?

To identify the optimal number of clusters by finding the point where WCSS stops decreasing rapidly.
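
A rough sketch of how the Elbow method might be applied: run K-means for several values of k, record the WCSS for each, and look for the k after which the decrease flattens out. Everything below is an assumption made for illustration: the nine points are made up (three loose groups), the `kmeans` and `wcss` helpers are a plain-Python stand-in for a library implementation, and the fixed seed only makes the run repeatable.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k random data points
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:  # converged: centroids no longer change
            break
        centroids = new_centroids
    return labels, centroids

def wcss(points, labels, centroids):
    """Sum of squared distances from each point to its own cluster centroid."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
wcss_by_k = {}
for k in range(1, 6):
    labels, centroids = kmeans(points, k)
    wcss_by_k[k] = wcss(points, labels, centroids)
    print(k, round(wcss_by_k[k], 2))
```

Plotting `wcss_by_k` against k and picking the bend is a judgment call; because K-means depends on the initial centroids, it is common to repeat each k with several seeds and keep the lowest WCSS.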

28
New cards

What are hyperparameters?

Hyperparameters are settings used to control the training process and structure of the model, set before training and not learned from the data.

29
New cards

What is hyperparameter tuning?

The process of finding the optimal hyperparameter values for a learning algorithm to maximize model performance and minimize loss.

30
New cards

What is the difference between parameters and hyperparameters?

Parameters are learned from the data during training, while hyperparameters are set by the user before training.

31
New cards

What does the tuning procedure involve?

Defining the hyperparameters to tune, specifying the candidate values to check, running the tuning search, and selecting the best-performing hyperparameter values.

32
New cards

What are the two common strategies for hyperparameter tuning?

Grid search and random search.

33
New cards

How does grid search work?

It examines all combinations of predefined hyperparameter values.

34
New cards

How does random search differ from grid search?

Random search samples a limited number of combinations at random from the search space instead of trying every one, so it can explore potentially better combinations at lower cost.
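
The contrast between the two strategies can be sketched with the standard library alone. The search space, the `validation_score` function, and the helper names are all hypothetical: in a real project the score would come from training a model and evaluating it on validation data, and random search would often draw from continuous ranges rather than fixed lists.

```python
import itertools
import random

# Hypothetical search space for a tree-based model.
search_space = {
    "max_depth": [2, 4, 8],
    "n_estimators": [50, 100, 200],
}

def validation_score(params):
    """Made-up stand-in for 'train the model and score it on validation data'."""
    return -abs(params["max_depth"] - 4) - abs(params["n_estimators"] - 100) / 50

def grid_search(space):
    """Grid search: every combination of the predefined values."""
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*space.values())]

def random_search(space, n_iter, seed=0):
    """Random search: n_iter combinations drawn at random from the same space."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_iter)]

grid_candidates = grid_search(search_space)
random_candidates = random_search(search_space, n_iter=4)

best_grid = max(grid_candidates, key=validation_score)
best_random = max(random_candidates, key=validation_score)
print(len(grid_candidates), best_grid)      # 9 candidates evaluated
print(len(random_candidates), best_random)  # only 4 candidates evaluated
```

Grid search is guaranteed to find the best point on the grid but its cost grows multiplicatively with each added hyperparameter; random search caps the cost at `n_iter` evaluations.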

35
New cards

What is the significance of the maximum depth hyperparameter?

It controls how deep a decision tree can grow, affecting the model's complexity and performance.

36
New cards

What is the main goal of hyperparameter tuning?

To achieve the best generalization performance on unseen data.

37
New cards

What is meant by 'no absolute or single correct value of K'?

The optimal value of K can vary based on statistical heuristics, domain knowledge, and interpretability.

38
New cards

What is the impact of increasing K on WCSS?

As K increases, WCSS decreases (reaching zero when every point is its own cluster), but the rate of improvement diminishes.

39
New cards

What is the formula for calculating WCSS?

WCSS = Σ_j Σ_(x in cluster j) ‖x − c_j‖², i.e. the sum of squared distances between each point x and the centroid c_j of its cluster.

40
New cards

What is the purpose of defining hyperparameters before training?

To control how the model learns and to guide the training process.

41
New cards

What is the role of domain knowledge in selecting K?

Domain knowledge helps in interpreting the results and choosing a K that makes sense for the specific application.

42
New cards

What is the significance of WCSS in K-means clustering?

It measures the compactness of the clusters; for a fixed K, lower values indicate tighter clusters.

43
New cards

What does the term 'fine-grained clusters' refer to?

Clusters that reveal detailed patterns in the data, as opposed to coarse clusters that may generalize too much.

44
New cards

What does the term 'coarse clusters' refer to?

Clusters that represent broader, less detailed groupings in the data.

45
New cards

What is the importance of model interpretability in machine learning?

It helps stakeholders understand how predictions are made and the factors influencing them.

46
New cards

What is the relationship between hyperparameters and model performance?

The choice of hyperparameters directly affects how well the model learns and performs on unseen data.

47
New cards

What is LSOA?

Lower Layer Super Output Area, a small geographic area used for reporting statistics in England and Wales.

48
New cards

What is the significance of predicting gas and electricity consumption?

It helps in planning infrastructure and energy supply for regions, and informs energy efficiency policies.

49
New cards

What is feature engineering?

The process of using domain knowledge to select, modify, or create features that make machine learning algorithms work better.

50
New cards

What are the target variables for the energy consumption prediction?

Total consumption and mean/median consumption at the LSOA level.

51
New cards

What is the purpose of exploratory data analysis (EDA)?

To analyze data sets to summarize their main characteristics, often using visual methods.

52
New cards

What is the significance of hyperparameter tuning?

It optimizes the performance of machine learning models by finding the best hyperparameter values for the algorithms.

53
New cards

What is cross-validation?

A technique for assessing how well a model will generalize to an independent data set, by repeatedly training on one part of the data and validating on the held-out part.
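
One common form is k-fold cross-validation, where the data is split into folds and each fold takes one turn as the validation set. The sketch below only generates the index splits; the function name is an assumption, and in practice the indices are usually shuffled first and a model is trained and scored on every split.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, n_folds=5))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Averaging the validation score over all folds gives a steadier estimate of generalization performance than a single train/test split.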

54
New cards

What are some machine learning models suggested for training?

Support Vector Regression, Decision Tree Regression, Linear Regression, AdaBoost, Random Forest.

55
New cards

What is the purpose of feature importance analysis?

To determine which features contribute the most to the predictions made by the model.

56
New cards

What is the recommended approach for merging datasets?

Use pd.merge() to combine datasets based on common keys.

57
New cards

What should be done with missing data during data processing?

Check for missing data, then either remove it or fill it in using a method such as the mean or median.
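
The merging and missing-data steps from the two cards above might look like this in pandas. The table contents, the column names (`lsoa_code`, `gas_kwh`, `population`) and the choice of median filling are illustrative assumptions, not the actual course datasets.

```python
import pandas as pd

# Hypothetical tables keyed by an area code; all values are made up.
consumption = pd.DataFrame({
    "lsoa_code": ["A1", "A2", "A3"],
    "gas_kwh": [1200.0, None, 900.0],
})
population = pd.DataFrame({
    "lsoa_code": ["A1", "A2", "A3"],
    "population": [1500, 1700, 1600],
})

# Combine the two tables on their common key.
merged = pd.merge(consumption, population, on="lsoa_code", how="inner")

# Check for missing data, then fill the gap with the column median.
print(merged.isna().sum())
merged["gas_kwh"] = merged["gas_kwh"].fillna(merged["gas_kwh"].median())
print(merged)
```

Dropping incomplete rows with `merged.dropna()` is the alternative to filling; which is appropriate depends on how much data is missing and why.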

58
New cards

What is the role of spatial distribution in the analysis?

To visualize and understand the geographic patterns of energy consumption.

59
New cards

What is a potential feature to consider for energy consumption prediction?

Income estimates at the MSOA (Middle Layer Super Output Area) level.

60
New cards

What is the significance of using multiple models?

To compare their performance and select the best one based on accuracy and interpretability.

61
New cards

What is the suggested method for presenting model performance?

Use tables and graphs to summarize the prediction performance of the models.

62
New cards

What does the term 'model generalization performance' refer to?

The ability of a model to perform well on unseen data after being trained.

63
New cards

What is the purpose of calculating R² and MSE in model evaluation?

To assess the accuracy of the predictions made by the models.
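
For reference, both metrics can be computed by hand. The sketch below uses the usual definitions (MSE as the mean squared residual, R² as one minus the ratio of residual to total sum of squares); the function names and the four sample values are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [2.0, 4.0, 6.0, 8.0]  # made-up observed values
y_pred = [3.0, 4.0, 6.0, 7.0]  # made-up predictions
print(mse(y_true, y_pred))  # 0.5
print(r2(y_true, y_pred))   # 0.9
```

A lower MSE and an R² closer to 1 both indicate predictions that track the observed values more closely.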

64
New cards

What is the rationale behind selecting specific models for the research?

Based on their suitability for the data and the prediction task at hand.

65
New cards

What is K-means clustering commonly used for?

It is often used in customer segmentation in marketing to group similar individuals.

66
New cards

What is an application of K-means clustering outside of marketing?

Finding clusters of residential areas.

67
New cards

What does 'max depth' refer to in hyperparameter tuning?

It refers to the maximum depth of a decision tree.

68
New cards

What does 'n_estimators' indicate in the context of decision trees?

It refers to the number of decision trees combined in an ensemble model such as a Random Forest.