FDA - Data Mining Methods, week 4

Last updated 1:51 PM on 1/13/26

14 Terms

1

Data mining categories - Supervised method

Uses labeled data to predict the output for new, unseen data

→ Example: data= customer financial info, whether they defaulted on a loan

task= predict if new applicant is likely to default

2

Data mining categories - Unsupervised method

Uses unlabeled data to find hidden patterns, groupings or structures.

→ Example: data= shopping transactions

task= identify patterns like: ‘people who buy bread often buy butter’

3

Data mining categories - Global method

Creates information applicable to all data.

4

Data mining categories - Local method

Creates information only applicable to some of the data.

5

Data mining method - Linear regression, goal

  • Supervised and global

  • Goal= to create a linear model to relate an output variable to one or more input variables
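
A minimal sketch of how such a linear model can be fitted for a single input variable, using ordinary least squares. The function name `fit_line` and the sample data are made up for illustration; they are not from the course material.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: the output roughly doubles with the input
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

The model is global in the sense of these cards: one slope and one intercept are used for every data point.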

6

Data mining method - Linear regression, SSD

SSD= Sum of Squared Deviations, the sum of the squared differences between the real y-values and the model's predictions

  • The lower the SSD of a model, the better the model.

→ SSD expresses variation relative to the model. A small SSD shows that the predictions are close to the real data

  • Always judge the SSD in relation to the scale of the y-values to see if it is large or small

→ small: SSD= 0.8 and y-values around 10

→ large: SSD= 3.6 and y-values ranging from 0 to 5
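
A small sketch of the SSD calculation, assuming SSD means the sum over all data points of (real value minus predicted value) squared. The helper name `ssd` and the numbers are illustrative only.

```python
def ssd(actual, predicted):
    """Sum of squared deviations between real values and model predictions."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))


# Hypothetical data: real y-values around 10, model predicts 10 everywhere
actual = [9.6, 10.2, 10.5, 9.9]
predicted = [10.0, 10.0, 10.0, 10.0]

total = ssd(actual, predicted)
mean_y = sum(actual) / len(actual)
# Report the SSD next to the scale of y, since the SSD alone says little
print(f"SSD = {total:.2f}, typical y-value = {mean_y:.2f}")
```

Printing the SSD next to a typical y-value mirrors the advice on the card: the same SSD can be small for y-values around 10 and large for y-values between 0 and 5.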

7

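Data mining method - Linear regression, R²

Explains how well the regression line fits the data.

→ R² = the fraction of the variation in y that the model explains; for a least-squares line it ranges from 0 to 1

→ The direction of the slope is shown by the sign of the correlation coefficient r, not by R². Negative r = a slope down from left to right, positive r = a slope going up from left to right. In simple linear regression, R² = r²

The closer r (and therefore R²) is to 0, the less predictive power the regression line has.

→ r = -1 (R² = 1) = perfect predictive power

→ r = -0.2 (R² = 0.04) = little predictive power

→ r = 1 (R² = 1) = perfect predictive power

→ r = 0.3 (R² = 0.09) = little predictive power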
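
A sketch of one common way to compute R², assuming the usual definition R² = 1 - SSD / SST, where SST is the sum of squared deviations of the real y-values from their mean. The function name and the data are illustrative.

```python
def r_squared(actual, predicted):
    """R² = 1 - SSD / SST: the share of variation in y explained by the model."""
    mean_y = sum(actual) / len(actual)
    ssd = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    sst = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ssd / sst


actual = [1, 2, 3, 4]

perfect = r_squared(actual, [1, 2, 3, 4])          # predictions equal the data
rough = r_squared(actual, [1.5, 1.5, 3.5, 3.5])    # predictions are only close
baseline = r_squared(actual, [2.5, 2.5, 2.5, 2.5]) # always predict the mean

print(perfect, rough, baseline)
```

Predicting the mean for every point gives an R² of 0, which is why values near 0 indicate little predictive power.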

8

Data mining method - Linear regression, residual plot

Used to check the linear regression model by plotting the residuals (real y-value minus predicted y-value).

→ Good plot= residuals scattered around 0 with no pattern

→ Bad plot= residuals showing patterns, clusters or extreme points (outliers)
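
A sketch of the numbers behind a residual plot, assuming a residual is the real y-value minus the predicted y-value. The slope, the intercept and the data are hypothetical; a real check would plot `res` against `xs` and look at the picture.

```python
def residuals(actual, predicted):
    """Residual = real value minus predicted value, one per data point."""
    return [y - y_hat for y, y_hat in zip(actual, predicted)]


# Hypothetical data and a hypothetical fitted line y = 1.99 * x + 0.05
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = 1.99, 0.05

predicted = [slope * x + intercept for x in xs]
res = residuals(ys, predicted)

# In a good plot the residuals are centered around 0 and show no trend
mean_res = sum(res) / len(res)
for x, r in zip(xs, res):
    print(f"x = {x}: residual = {r:+.2f}")
print(f"mean residual = {mean_res:.4f}")
```

Here the residuals switch sign from point to point and average out to about 0, which matches the "good plot" description.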

9

Data mining method - Clustering, goal

  • Unsupervised and global

  • Goal= partition all locations into geographically meaningful groups.

10

Data mining method - Clustering, k-means

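An algorithm that clusters the data into k groups.

  1. Pick k random centroids, one per cluster.

  2. Assign each point to its nearest centroid.

  3. Recompute each centroid as the mean of its assigned points; repeat steps 2 and 3 until the clusters are stable

→ The smaller the distance between the points and the centroid of their own cluster, the better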
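
The steps above can be sketched in plain Python as follows. This is an illustration under assumptions, not a reference implementation: the starting centroids are fixed instead of random so the result is reproducible, and the quality measure `within_cluster_ssd` (sum of squared distances from each point to the centroid of its own cluster) is one common choice that the card does not name.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its points
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # stable: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


def within_cluster_ssd(centroids, clusters):
    """Sum of squared distances from each point to its own cluster centroid."""
    return sum(math.dist(p, c) ** 2
               for c, cluster in zip(centroids, clusters)
               for p in cluster)


# Hypothetical locations forming two obvious groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
start = [(1, 1), (8, 8)]  # assumption: fixed start instead of random picks

centroids, clusters = kmeans(points, start)
print(centroids)
print(within_cluster_ssd(centroids, clusters))
```

With random starting centroids the result can differ between runs, so k-means is often run several times and the run with the lowest within-cluster distance is kept.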

11

Data mining method - Decision tree, goal

  • Supervised and global

  • Goal= Learn a tree to separate ‘quit’ and ‘non-quit’ cases.

12

Data mining method - Decision tree, structure

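Branches= the lines in between the ovals and cubes (the possible answers to a test)

Nodes= the ovals, the ‘topics’ (the attribute that is tested at that point)

Tree’s leaves= the cubes, the eventual outcome (the predicted class)

→ Branches connect the nodes and the tree’s leaves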
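
A sketch of how nodes, branches and leaves fit together, written as a nested dictionary with a small function that follows the branches down to a leaf. The attributes `hours_per_week` and `years_at_company`, the thresholds and the tree itself are invented for illustration; only the ‘quit’ and ‘non-quit’ labels come from these cards.

```python
# Each node tests one attribute; "yes" and "no" are the branches; a leaf holds the outcome.
tree = {
    "question": ("hours_per_week", 45),          # node: is hours_per_week > 45?
    "yes": {"leaf": "quit"},                     # leaf
    "no": {
        "question": ("years_at_company", 2),     # node: is years_at_company > 2?
        "yes": {"leaf": "non-quit"},             # leaf
        "no": {"leaf": "quit"},                  # leaf
    },
}


def predict(node, case):
    """Follow the branches from the top node until a leaf is reached."""
    while "leaf" not in node:
        attribute, threshold = node["question"]
        node = node["yes"] if case[attribute] > threshold else node["no"]
    return node["leaf"]


print(predict(tree, {"hours_per_week": 50, "years_at_company": 5}))
print(predict(tree, {"hours_per_week": 40, "years_at_company": 5}))
print(predict(tree, {"hours_per_week": 40, "years_at_company": 1}))
```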

13

Data mining method - Decision tree, confusion matrix

Used to check the quality of the decision tree: a table that counts, per class, how many cases the tree predicted correctly and incorrectly.
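
A sketch of a two-class confusion matrix, assuming ‘quit’ is treated as the positive class. The labels below are made-up predictions, and accuracy is shown as one possible summary of the table.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive="quit"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = Counter({"TP": 0, "FP": 0, "FN": 0, "TN": 0})
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical real outcomes and tree predictions
actual = ["quit", "quit", "non-quit", "non-quit", "quit", "non-quit"]
predicted = ["quit", "non-quit", "non-quit", "quit", "quit", "non-quit"]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)

print(dict(cm))
print(f"accuracy = {accuracy:.2f}")
```

The off-diagonal counts (FP and FN) show which kind of mistake the tree makes, which a single accuracy figure hides.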

14

Data mining method - Association rules, goal

  • Unsupervised and local

  • Goal= to find high-confidence associations between frequently occurring items (here: exam outcomes)
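
A sketch of the two measures usually behind "frequently occurring" and "high-confidence", assuming the standard definitions: support is the share of transactions that contain an item set, and the confidence of a rule A → B is support(A and B) / support(A). The shopping data is invented, echoing the bread and butter example from the unsupervised card.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping transactions
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "butter", "jam"},
]

sup = support(transactions, {"bread", "butter"})
conf = confidence(transactions, {"bread"}, {"butter"})
print(f"support(bread, butter) = {sup:.2f}")
print(f"confidence(bread -> butter) = {conf:.2f}")
```

The rule is local in the sense of these cards: it only says something about the transactions that contain bread, not about all the data.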
