1/6
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
What are the 3 main data science problems?
Regression
Clustering
Classification
What is regression?
Finds a relationship between different data points
Trend between 2 vars measured in same environment
Ex: finding rate of spread of diseases (new cases vs. time)
What is clustering?
Method of grouping closely related data
Used to find patterns and anomalies in data
Ex: group customers based on purchase behavior
What is classification?
Sorting of data into specific categories or groups
Train machine learning model by labeling data into desired categories
Ex: sort products as popular or unpopular
Precision and Recall Example
What classification algorithm do we use and what are advantages/disadvantages?
K Nearest Neighbors (KNN)
finds desired number of closest neighbors and assigns the classification corresponding to the most common one returned
advantages: no training period, easy implementation
disadvantages: difficult to scale for large data sets, sensitive to missing and noisy data
What clustering algorithm do we use and what does it do?
K-Means
divides data into clusters by minimizing the distance between each point and the cluster’s centroid
after all assignments done it recalculates the centroid and repeats the process until it finds true centroid