Lesson 8 - Data Science

7 Terms

1

What are the 3 main data science problems?

  • Regression

  • Clustering

  • Classification

2

What is regression?

  • Finds the relationship between variables, usually to predict a continuous value

  • Models the trend between 2 variables measured in the same environment

  • Ex: finding the rate of spread of a disease (new cases vs. time)
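
The disease-spread example can be sketched as a straight-line fit of new cases against time. This is only an illustration: the data below is made up, the helper name `fit_line` is hypothetical, and ordinary least squares is assumed as the fitting method (the card does not name one).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative data: day number vs. new cases reported that day.
days = [1, 2, 3, 4, 5]
new_cases = [12, 15, 21, 24, 28]

slope, intercept = fit_line(days, new_cases)
print(f"about {slope:.1f} additional new cases per day")
```

The slope is the "trend" from the card: how much the second variable changes, on average, for each unit change in the first.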

3

What is clustering?

  • Method of grouping closely related data

  • Used to find patterns and anomalies in data

  • Ex: group customers based on purchase behavior

4

What is classification?

  • Sorting of data into specific categories or groups

  • Train a machine learning model on data labeled with the desired categories

  • Ex: sort products as popular or unpopular

5

Precision and Recall Example

[Image: precision and recall example]

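
Since the card's image is not available, here is a separate, hypothetical example using the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP, FP, and FN are true positives, false positives, and false negatives. The labels and the helper name `precision_recall` are assumptions for illustration, not the card's original example.

```python
def precision_recall(actual, predicted, positive):
    """Return (precision, recall) for the class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)
    fp = sum(1 for a, p in pairs if p == positive and a != positive)
    fn = sum(1 for a, p in pairs if p != positive and a == positive)
    # Guard against division by zero when a count pair is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: P = popular, U = unpopular.
actual    = ["P", "P", "P", "U", "U", "P", "P", "U"]
predicted = ["P", "U", "P", "P", "U", "P", "U", "U"]

precision, recall = precision_recall(actual, predicted, positive="P")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here 3 of the 4 items predicted popular really are popular (precision 0.75), and 3 of the 5 truly popular items were found (recall 0.60).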
6

What classification algorithm do we use and what are advantages/disadvantages?

K Nearest Neighbors (KNN)

  • finds the k closest neighbors of a new data point and assigns it the most common class among those neighbors

  • advantages: no training period, easy implementation

  • disadvantages: difficult to scale for large data sets, sensitive to missing and noisy data
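
A minimal sketch of the KNN idea described above, assuming Euclidean distance and a simple majority vote. The product data, the feature choice (units sold, average rating), and the name `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs.
    """
    # No training step: every prediction scans the whole data set,
    # which is why KNN is hard to scale to large data sets.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up data: (units sold, average rating) -> label.
products = [
    ((90, 4.5), "popular"),
    ((80, 4.2), "popular"),
    ((85, 4.8), "popular"),
    ((10, 2.1), "unpopular"),
    ((15, 3.0), "unpopular"),
    ((5, 1.5), "unpopular"),
]

print(knn_classify(products, (70, 4.0), k=3))
```

The features here are left unscaled for brevity. In practice, features on very different scales are usually normalized first so that one of them does not dominate the distance.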

7

What clustering algorithm do we use and what does it do?

K-Means

  • divides data into k clusters by assigning each point to its nearest centroid, which minimizes the distance between each point and its cluster’s centroid

  • after all points are assigned, it recalculates each centroid as the mean of its cluster and repeats the process until the centroids stop changing
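
The assign-then-recalculate loop can be sketched as follows. This is an illustrative version only: the points are made up, the starting centroids are picked by hand (real implementations typically choose them randomly), and the name `k_means` is hypothetical.

```python
import math


def k_means(points, centroids, max_iters=100):
    """Return (centroids, clusters) after iterating until centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up 2D points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (9, 8)])
print(centroids)
```

The result depends on the starting centroids, so the loop can settle on a locally good grouping that is not the best possible one.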