Data Mining S7

11 Terms

1

SMART

Specific, Measurable, Achievable, Realistic, Time-bound

2

Big Data 3V

Velocity, Volume, Variety

3

Method

  1. Business understanding (CRISP-DM only)
  2. Data understanding
  3. Data selection and preparation, including transformation (in SEMMA: Sample and Modify, the steps that touch the data)
  4. Modeling (KDD: open to all)
  5. Evaluation
  6. Implementation (CRISP-DM only)

4

Preprocessing

Cleaning, selection, transformation, feature engineering, dimensionality reduction, and web scraping

5

support

Frequency of the rule over the whole dataset: the share of records that contain both the antecedent and the consequent

6

confidence

Frequency (support) of the rule divided by the frequency (support) of the antecedent

7

lift

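Confidence of the rule divided by the support (probability) of the consequent: $\mathrm{lift}(A \Rightarrow B) = \mathrm{conf}(A \Rightarrow B) / \mathrm{supp}(B)$

Equivalently, support of the rule divided by the product of the supports of the antecedent and the consequent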
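
The three rule measures above can be checked by hand on a small example. The sketch below is illustrative only: the transactions are made-up toy data, and the function names `support`, `confidence`, and `lift` are hypothetical helpers, not part of any library.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by the support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


# Rule {bread} => {butter}
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
rule_lift = lift({"bread"}, {"butter"}, transactions)
```

A lift above 1 suggests the antecedent and consequent occur together more often than they would if independent. The sketch assumes the antecedent and consequent each appear in at least one transaction; otherwise the divisions fail.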

8

KDD, SEMMA, CRISP-DM

{Search for and discovery of models}, {SAS; focus on exploration and modeling}, {Business-oriented, flexible, iterative}

9

Data Mining tools

R, SAS, Weka, Orange, SQL Server Data Mining, KNIME, RapidMiner, Oracle Data Mining (Oracle Corp.), Google Colab with Python

10

QQ Plot

Quantile-quantile plot, used to check whether data follow a given distribution: plot the quantiles of the data against the quantiles of the theoretical distribution; if the points fall on the 45° line, the data follow that distribution. The quantile function is $F^{-1}$ (given a probability, it returns a value).
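
As a rough illustration of the QQ plot construction, the sketch below computes the point pairs that would be plotted, using a normal distribution as the theoretical reference. The choice of a normal fitted with the sample mean and standard deviation, the plotting positions $(i - 0.5)/n$, and the name `qq_points` are all assumptions made for this example.

```python
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Return (theoretical quantile, empirical quantile) pairs.

    Assumption: the reference distribution is a normal fitted to the data.
    """
    xs = sorted(data)  # empirical quantiles are the sorted observations
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1),
    # which the inverse CDF requires.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    theoretical = [dist.inv_cdf(p) for p in probs]  # F^{-1}(p)
    return list(zip(theoretical, xs))


points = qq_points([1, 2, 3, 4, 5])
```

Plotting these pairs (for example as a scatter plot) and comparing them with the line y = x gives the visual check described in the card.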

11

Stepwise: forward, backward, stepwise (combined)

  1. Forward: start from a single feature and add features one at a time, fitting a model and testing the criterion at each step.
  2. Backward: start from all features and remove features one at a time, fitting a model and testing the criterion at each step.
  3. Stepwise (combined): start from the empty or the full model and move forward or backward one feature at a time, so that an earlier step can be undone.
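
The forward variant can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not a reference implementation: it starts from the empty model, `score` is a hypothetical callable where higher is better, and `toy_score` with its `USEFUL` table stands in for actually fitting a model and computing a criterion.

```python
def forward_selection(features, score):
    """Greedy forward selection: repeatedly add the feature that most improves `score`."""
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining:
        # Score every model obtained by adding one remaining feature.
        candidates = [(score(selected + [f]), f) for f in remaining]
        new_best, feature = max(candidates)
        if new_best <= best:
            break  # no candidate improves the criterion, so stop
        selected.append(feature)
        remaining.remove(feature)
        best = new_best
    return selected


# Hypothetical criterion: rewards useful features and penalizes model size.
USEFUL = {"age": 3.0, "income": 2.0, "noise": 0.1}


def toy_score(subset):
    return sum(USEFUL[f] for f in subset) - 0.5 * len(subset)


chosen = forward_selection(["age", "income", "noise"], toy_score)
```

In practice `score` would fit a model on the candidate subset and return something like a negated AIC or a cross-validated metric. Backward elimination follows the same pattern, starting from all features and removing one per step.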