FDA - Data Mining Methods, week 4

14 Terms

1

Data mining categories - Supervised method

Uses labeled data to predict the output for new, unseen data

→ Example: data= customer financial info, labeled with whether each customer defaulted on a loan

task= predict whether a new applicant is likely to default

2

Data mining categories - Unsupervised method

Uses unlabeled data to find hidden patterns, groupings or structures.

→ Example: data= shopping transactions

task= identify patterns like: ‘people who buy bread often buy butter’

3

Data mining categories - Global method

Creates information applicable to all data.

4

Data mining categories - Local method

Creates information only applicable to some of the data.

5

Data mining method - Linear regression, goal

  • Supervised and global

  • Goal= to create a linear model to relate an output variable to one or more input variables
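A minimal sketch of such a linear model for a single input variable, using the standard least-squares formulas. The function name `fit_line` and the data are made up for illustration and are not from the course material.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on the line y = 2x + 1.
inputs = [1.0, 2.0, 3.0, 4.0, 5.0]
outputs = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(inputs, outputs)

# Use the model to predict the output for a new, unseen input.
prediction = slope * 6.0 + intercept
```

The same model applies to every data point, which is what makes the method global.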

6

Data mining method - Linear regression, SSD

SSD= Sum of Squared Deviations

  • The lower the SSD of a model, the better the model.

→ SSD expresses how much the data varies around the model. A small SSD shows that the predictions are close to the real data.

  • Always judge the SSD in relation to the size of the y-values to see whether it is large or small.

→ small: SSD= 0.8 and y-values around 10

→ large: SSD= 3.6 and y-values ranging from 0 to 5
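The SSD can be sketched in a few lines: square each difference between a real value and the model's prediction, then add the squares up. The function name `ssd` and the numbers are illustrative assumptions.

```python
def ssd(actual, predicted):
    """Sum of squared deviations between real values and model predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Hypothetical y-values around 10 and the predictions of some model.
actual = [10.0, 9.5, 10.5, 10.0]
predicted = [10.5, 9.5, 10.0, 10.5]

# Deviations are -0.5, 0, 0.5, -0.5, so the squares add up to 0.75.
total = ssd(actual, predicted)
```

With y-values around 10, an SSD of 0.75 is small, matching the first example on this card.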

7

Data mining method - Linear regression, R2

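Explains how well the regression line fits the data.

R² ranges from 0 to 1 and cannot be negative for a least-squares line. The values from -1 to 1 belong to the correlation coefficient r; in simple linear regression R² = r².

→ negative r = a slope going down from left to right

→ positive r = a slope going up from left to right

The closer r (and therefore R²) is to 0, the less predictive power the regression line has.

→ r = -1 (R² = 1) = perfect predictive power

→ r = -0.2 (R² = 0.04) = little predictive power

→ r = 1 (R² = 1) = perfect predictive power

→ r = 0.3 (R² = 0.09) = little predictive power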
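A small sketch of the relation between slope direction and fit quality, assuming simple linear regression with one input, where R² is the square of the correlation coefficient r. The sign of r gives the direction of the slope, and R² itself lies between 0 and 1. The function name and data are made up.

```python
def r_and_r_squared(xs, ys):
    """Correlation coefficient r and R² (= r² in simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / (sxx * syy) ** 0.5
    return r, r * r


# Points exactly on a downward line: r is -1, R² is 1.
r_down, r2_down = r_and_r_squared([1, 2, 3, 4], [8, 6, 4, 2])

# Noisy upward trend: r is 0.6, R² is 0.36.
r_noisy, r2_noisy = r_and_r_squared([1, 2, 3, 4], [2, 1, 4, 3])
```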

8

Data mining method - Linear regression, residual plot

Used to check whether a linear regression model is appropriate for the data.

→ Good plot= residuals scattered around 0, with no pattern

→ Bad plot= residuals showing patterns, clusters or extreme points (outliers)
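As a rough illustration of a bad residual plot, the sketch below fits a straight line to data that is actually curved (y = x², a made-up example) and lists the residuals, meaning real value minus predicted value. The helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Real value minus predicted value for every data point."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Curved data: a straight line is the wrong model here.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]

res = residuals(xs, ys)  # positive, negative, negative, negative, positive
```

The residuals are centered on 0 but form a U shape, which is the kind of pattern that signals a poor linear fit.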

9

Data mining method - Clustering, goal

  • Unsupervised and global

  • Goal= partition all locations into geographically meaningful groups.

10

Data mining method - Clustering, k-means

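An algorithm that partitions the data into k clusters.

  1. Pick k random centroids, one per cluster.

  2. Assign each point to its nearest centroid.

  3. Recompute each centroid as the mean of its assigned points, and repeat steps 2 and 3 until the clusters are stable.

→ The smaller the distance between the points and the centroid of their own cluster, the better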
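The three steps can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: the function names, the fixed random seed and the six example points are assumptions, and real libraries add safeguards such as several random restarts.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k random data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # clusters are stable, so stop
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical locations forming two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = kmeans(points, k=2)

# Quality measure: total squared distance from each point to its own centroid.
within = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
```

A lower `within` value means the points sit closer to the centroid of their own cluster.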

11

Data mining method - Decision tree, goal

  • Supervised and global

  • Goal= Learn a tree to separate ‘quit’ and ‘non-quit’ cases.

12

Data mining method - Decision tree, structure

Branches= the lines between the ovals and the boxes

Nodes= the ovals, the ‘topics’ (the attribute that is tested)

Tree’s leaves= the boxes, the eventual outcome

→ Branches connect the nodes and the tree’s leaves
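One way to picture this structure is as nested dictionaries: each node names the attribute it tests, each branch is a possible value of that attribute, and each leaf is a final outcome. The attributes `satisfaction` and `overtime` are invented for illustration; only the ‘quit’ and ‘non-quit’ outcomes come from the cards.

```python
# Node = dict with the tested attribute; branch = key in "branches"; leaf = string.
tree = {
    "attribute": "satisfaction",
    "branches": {
        "low": {
            "attribute": "overtime",
            "branches": {"yes": "quit", "no": "non-quit"},
        },
        "high": "non-quit",
    },
}


def classify(node, case):
    """Follow the branches from the root node until a leaf is reached."""
    while isinstance(node, dict):
        value = case[node["attribute"]]
        node = node["branches"][value]
    return node


outcome = classify(tree, {"satisfaction": "low", "overtime": "yes"})
```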

13

Data mining method - Decision tree, confusion matrix

Used to check the quality of the decision tree: it counts, for each actual class, how many cases the tree predicted as each class.
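A minimal sketch of a confusion matrix for the ‘quit’ and ‘non-quit’ classes, built by counting (actual, predicted) pairs. The six cases are invented, and accuracy is only one of several quality measures that can be read from the matrix.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual class, predicted class) pair occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical true outcomes and the tree's predictions for six cases.
actual = ["quit", "quit", "non-quit", "non-quit", "non-quit", "quit"]
predicted = ["quit", "non-quit", "non-quit", "non-quit", "quit", "quit"]

matrix = confusion_matrix(actual, predicted)

# Correct predictions are the pairs where actual and predicted agree.
correct = sum(count for (a, p), count in matrix.items() if a == p)
accuracy = correct / len(actual)
```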

14

Data mining method - Association rules, goal

  • Unsupervised and local

  • Goal= to find high-confidence associations between frequently occurring items (here: exam outcomes)
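"Frequently occurring" and "high-confidence" are usually measured with support and confidence. The sketch below computes both for the rule "bread → butter", reusing the shopping example from the unsupervised card. The transactions are made up.

```python
def support(transactions, items):
    """Fraction of transactions that contain all the given items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions with the antecedent, the fraction that also has the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "butter"},
]

bread_support = support(transactions, {"bread"})                  # 4 of 5 baskets
rule_confidence = confidence(transactions, {"bread"}, {"butter"})  # 3 of those 4
```

The rule only says something about baskets that contain bread, which is why the method counts as local.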
