Mark lecture 1
What is unsupervised learning?
Learning from data without labels, where algorithms find patterns, structures, or relationships on their own.
What are the main types of unsupervised learning?
Clustering (grouping similar items), dimensionality reduction (simplifying data), and anomaly detection (finding outliers).
How does unsupervised learning differ from supervised learning?
Supervised learning uses labeled data to make predictions, while unsupervised learning discovers patterns without any predefined answers.
What is Principal Component Analysis (PCA)?
A technique that reduces data dimensions by finding new variables (principal components) that capture the most variation in the data.
What problem does PCA solve?
It simplifies complex data with many variables by creating fewer new variables that still preserve most of the important information.
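A minimal sketch of this in code, assuming scikit-learn is available; the synthetic data and the choice of two components are purely illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative data: 200 samples, 10 correlated features driven by 2 factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(200, 10))

# Standardize, then reduce 10 features to 2 principal components
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
Z = pca.fit_transform(X_std)

print(Z.shape)                          # (200, 2)
print(pca.explained_variance_ratio_)    # share of variance per component
```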
What are principal components?
New variables created by combining the original variables, ordered by how much data variation they capture.
What is variance in statistics?
A measure of how spread out data points are from their average value.
What is covariance?
A measure of how two variables change together: positive when they move in the same direction, negative when they move in opposite directions.
What is correlation?
A scaled version of covariance that ranges from -1 to +1, making it easier to interpret how strongly variables are related.
How are correlation and covariance related?
Correlation equals covariance divided by the product of standard deviations, putting the relationship on a standard scale (-1 to +1).
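A quick numpy check of that identity; the arrays are made up for illustration:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 6.0])

cov_xy = np.cov(x, y, ddof=1)[0, 1]     # sample covariance of x and y
corr_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(corr_xy)                          # same value as:
print(np.corrcoef(x, y)[0, 1])
```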
What's the first step in performing PCA?
Standardize the data by centering each variable at zero and scaling it to unit variance.
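That step written out by hand and checked against scikit-learn's StandardScaler (one common tool for it; the lecture doesn't prescribe a library):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(loc=[5.0, -2.0], scale=[10.0, 0.5], size=(100, 2))

# Center each column at zero, scale to unit variance
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# StandardScaler performs the same transformation
X_scaled = StandardScaler().fit_transform(X)
print(np.allclose(X_manual, X_scaled))   # True
```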
How are principal components ordered?
By the amount of variance they explain: the first component explains the most variance, the second explains the second most, and so on.
What do the weights in a principal component tell us?
They show how much each original variable contributes to that component, with larger values (positive or negative) indicating stronger influence.
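One way to inspect those weights with scikit-learn, on made-up data; components_ holds one row of weights per component:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 0] = X[:, 1] + 0.1 * rng.normal(size=100)   # correlate two variables

pca = PCA(n_components=2).fit(X)

# Each row of components_ holds one component's weights over the 5 variables
for i, w in enumerate(pca.components_):
    strongest = np.argsort(np.abs(w))[::-1][:2]
    print(f"PC{i+1}: largest weights on variables {strongest}: {w[strongest]}")
```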
How many principal components should you keep?
Enough to explain a sufficient share of the variance (often 80-90%), or the number suggested by the "elbow" in a scree plot.
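A sketch of the variance rule of thumb on scikit-learn's bundled digits; the 90% threshold just mirrors the card:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                  # 64 pixel features per image

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.90)) + 1
print(f"{k} components explain 90% of the variance")
```

scikit-learn will also pick the count directly if you pass a fraction, as in PCA(n_components=0.90).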
How can PCA be used for data visualization?
By reducing data to 2 or 3 dimensions, allowing us to plot and visually explore relationships in originally high-dimensional data.
How does PCA help with data compression?
It represents data using fewer variables (components) while preserving most of the important information.
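A minimal compress-and-reconstruct sketch on synthetic low-rank data; inverse_transform maps the component scores back to the original space:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 50))   # 50 features, rank ~3

pca = PCA(n_components=3)
Z = pca.fit_transform(X)                 # 500 x 3 instead of 500 x 50
X_hat = pca.inverse_transform(Z)         # approximate reconstruction

print(np.mean((X - X_hat) ** 2))         # tiny error: data was ~3-dimensional
```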
What was revealed in the drug use study example using PCA?
It found that legal substances had positive weights while illegal substances had negative weights, revealing two distinct patterns of student behavior.
How can PCA improve machine learning models?
By removing noise, reducing overfitting, speeding up training, and decorrelating features.
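A sketch of PCA as a preprocessing step in a supervised pipeline; the classifier and component count are illustrative choices, not from the lecture:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# PCA denoises and decorrelates the 64 pixel features before classification
model = make_pipeline(StandardScaler(), PCA(n_components=30),
                      LogisticRegression(max_iter=1000))
print(cross_val_score(model, X, y, cv=5).mean())
```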
What is Hebbian learning?
A neural learning rule stating "neurons that fire together, wire together," which provides an algorithmic approach to finding correlational patterns.
What is the basic Hebbian update rule?
Change in weight equals learning rate times input activation times output activation (Δw = η × x × y).
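The update as a one-step numpy sketch for a single linear unit; the variable names and values are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.01                          # learning rate η
w = rng.normal(size=3)              # weights of one linear unit
x = np.array([0.5, -1.0, 2.0])      # input activations

y = w @ x                           # output activation y = w·x
w += eta * x * y                    # Hebbian update: Δw = η × x × y
```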
How is Hebbian learning related to PCA?
With linear activation and proper normalization, Hebbian learning can implement PCA, finding the same principal components.
What's the main limitation of basic Hebbian learning?
It can only find the first principal component (direction of maximum variance) without additional modifications.
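A sketch of both points at once, using Oja's rule (one standard way to normalize the Hebbian update; whether the lecture used exactly this variant is an assumption). The weight vector converges to the first principal component and nothing more:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])
X -= X.mean(axis=0)                       # center the data

w, eta = rng.normal(size=2), 0.01
for _ in range(20):
    for x in X:
        y = w @ x
        w += eta * y * (x - y * w)        # Oja's rule: Hebbian term minus decay

pc1 = PCA(n_components=1).fit(X).components_[0]
print(np.abs(w @ pc1) / np.linalg.norm(w))   # ~1.0: w aligns with the first PC
```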
What is Sequential PCA (SPCA)?
A technique that uses multiple output units to learn multiple principal components in sequence, from most to least important.
How does Sanger's rule differ from basic Hebbian learning?
It includes an extra term that subtracts the influence of previously learned components, allowing each unit to learn a new component.
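A numpy sketch of Sanger's rule (the generalized Hebbian algorithm); each unit subtracts what earlier units, and itself, already reconstruct:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))
X -= X.mean(axis=0)

n_components, eta = 2, 0.005
W = rng.normal(size=(n_components, 4)) * 0.1

for _ in range(50):
    for x in X:
        y = W @ x
        # Sanger's rule: Hebbian term minus the reconstruction already
        # accounted for by this unit and the ones above it
        W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

pcs = PCA(n_components=2).fit(X).components_
print(np.abs(W @ pcs.T))          # ~identity: rows of W match the top 2 PCs
```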
Why would you need multiple principal components?
One component usually isn't enough to capture all important variation in the data; multiple components provide a more complete picture.
How can multiple principal components be used in image compression?
By representing images using just the top components, significantly reducing file size while maintaining most visual information.
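A sketch on the 8x8 digit images bundled with scikit-learn: each 64-pixel image is stored as 16 component scores, then reconstructed:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                      # 1797 images, 64 pixels each

pca = PCA(n_components=16)
codes = pca.fit_transform(X)                # 16 numbers per image (4x smaller)
X_hat = pca.inverse_transform(codes)        # reconstructed 8x8 images

kept = pca.explained_variance_ratio_.sum()
print(f"16/64 components keep {kept:.0%} of the variance")
```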
What is t-SNE?
t-distributed Stochastic Neighbor Embedding, a non-linear technique for visualizing high-dimensional data that preserves local structure better than PCA.
How does t-SNE differ from PCA?
t-SNE is non-linear and focuses on keeping similar points close together, while PCA is linear and focuses on maximum variance.
When would you use t-SNE instead of PCA?
When you care more about seeing clear clusters and local relationships than preserving global structure or exact distances.
How can t-SNE help visualize image data?
It can place similar images near each other in a 2D map, making it easy to see patterns and relationships in large image collections.
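A minimal t-SNE sketch on the same bundled digits; perplexity is t-SNE's main knob, and 30 is simply its default:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# Map 64-dimensional images to 2D, keeping similar images close together
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.scatter(emb[:, 0], emb[:, 1], c=y, s=5, cmap="tab10")
plt.colorbar(label="digit")
plt.show()
```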
How can PCA help with clustering?
By reducing dimensions, removing noise, and making distance calculations more meaningful, which often leads to better cluster separation.
How can PCA provide insight into cluster formation?
It can reveal the underlying factors that explain why certain data points cluster together.
Why use both PCA and clustering together?
PCA simplifies the data while preserving important information, and clustering then groups similar data points based on these simplified features.
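A sketch of the two-step combination; the component and cluster counts are illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_digits().data

# Simplify and denoise with PCA, then cluster in the reduced space
Z = PCA(n_components=15).fit_transform(StandardScaler().fit_transform(X))
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z)
print(labels[:20])
```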
What can you learn by coloring PCA plots by known categories?
You can see if the principal components naturally separate the categories, indicating they've captured meaningful variations.
Why is data standardization important before PCA?
Without standardization, variables with larger scales will dominate the principal components regardless of their actual importance.
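A quick demonstration of that failure mode, with deliberately mismatched feature scales:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
a = rng.normal(size=300)
# Same underlying signal in both features, but wildly different units
X = np.column_stack([a + 0.1 * rng.normal(size=300),
                     1000 * (a + 0.1 * rng.normal(size=300))])

print(PCA(n_components=1).fit(X).components_[0])
# ~[0, ±1]: the large-scale feature gets all the weight

X_std = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_std).components_[0])
# ~[0.71, 0.71] (up to sign): equal weights once the scales match
```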
What is a scree plot in PCA?
A graph showing the variance explained by each principal component, used to decide how many components to keep.
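Drawing one with matplotlib from a fitted PCA:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

pca = PCA().fit(load_digits().data)

# Scree plot: variance explained per component; look for the "elbow"
plt.plot(range(1, len(pca.explained_variance_ratio_) + 1),
         pca.explained_variance_ratio_, marker="o")
plt.xlabel("principal component")
plt.ylabel("variance explained")
plt.show()
```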
What are PCA's limitations?
It only captures linear relationships, can be difficult to interpret, and is sensitive to outliers.
How can you determine if PCA is capturing useful information?
By examining how much variance is explained, if components reveal meaningful patterns, and if they improve downstream tasks like classification.