Methods of masterclass

Last updated 11:05 AM on 3/26/26
23 Terms

1
New cards

What is machine learning?

Machine Learning is about prediction: using data to teach algorithms to predict outcomes for inputs they have never seen.

  • We give the computer data and outcomes

  • The algorithm finds patterns by itself

  • It uses these patterns to make predictions on new data.

2
New cards

2 phases of machine learning

  • First: Learn from data

  • Then: Predict outcomes for new inputs

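The two phases can be sketched in code. This is a minimal illustration of my own, assuming a simple one-feature linear model fit by least squares (the card does not name a specific model):

```python
# Phase 1: learn from data — fit a slope and intercept by least squares.
# Phase 2: predict outcomes for new, unseen inputs.

def learn(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx            # (slope, intercept)

def predict(model, x_new):
    slope, intercept = model
    return slope * x_new + intercept

model = learn([1, 2, 3, 4], [2, 4, 6, 8])    # phase 1: learns y = 2x
print(predict(model, 5))                     # phase 2: unseen input -> 10.0
```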
3
New cards

What is the difference between Statistics and Machine Learning?

Goal

  • Statistics: understand how things are related

  • Machine Learning: predict as well as possible

Question

  • Statistics: what is the relationship between X and Y?

  • Machine Learning: if I know X, what is Y?

How you evaluate it

  • Statistics: look at things like coefficients and p-values

  • Machine Learning: look at how well the model predicts (error/accuracy)

Approach

  • Statistics: works with models and assumptions (e.g., a linear relationship)

  • Machine Learning: learns patterns from the data itself

Style

  • Statistics: clear and easy to explain, but less flexible

  • Machine Learning: flexible and powerful, but often harder to interpret

👉 In short:

  • Statistics = explaining why something happens

  • Machine Learning = predicting what will happen

4
New cards

What are the main types of Machine Learning?

  • Supervised Learning: the model learns from labeled data (data + outcomes). Examples: Classification, Regression

  • Unsupervised Learning: the model learns from unlabeled data and finds hidden patterns. Example: Clustering

  • Classification: predict a category

  • Regression: predict a number

  • Clustering: group similar data together
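The labeled/unlabeled contrast can be made concrete. A toy illustration of my own (the data and the tiny 1-D two-means routine are not from the card):

```python
# Supervised (classification): features come paired with a category label.
labeled = [(1.0, "cat"), (1.2, "cat"), (8.0, "dog"), (8.3, "dog")]

# Unsupervised (clustering): features only — group points that lie close together.
unlabeled = [1.0, 1.2, 8.0, 8.3]

def cluster_two(points, iters=10):
    """Tiny 1-D 2-means: find two group centers without using any labels."""
    c1, c2 = min(points), max(points)
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return c1, c2

print(cluster_two(unlabeled))  # two centers, near 1.1 and 8.15
```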

5
New cards

What are features, targets, and training in Machine Learning?

  • Feature (X): independent variable used to make a prediction. Examples: firm size, word counts, pixel values

  • Target / Label (Y): dependent variable the model tries to predict. Examples: management quality, AI vs. human, fraud or not

  • Training: the process of estimating a model by minimizing a loss function, e.g. minimizing the sum of squared errors
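"Training = minimizing a loss" can be shown directly. A toy sketch of my own: search over candidate constant predictions and keep the one with the lowest sum of squared errors (which turns out to be the mean):

```python
# "Training" here = choosing the parameter that minimizes a loss function,
# in this case the sum of squared errors (SSE).

def sse(prediction, ys):
    return sum((y - prediction) ** 2 for y in ys)

ys = [2.0, 4.0, 6.0]

# Try many candidate constant predictions; keep the one with the lowest loss.
candidates = [i / 10 for i in range(0, 101)]
best = min(candidates, key=lambda c: sse(c, ys))
print(best)  # -> 4.0, the mean of ys (the SSE-minimizing constant)
```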

6
New cards

What is the difference between Inference and Prediction?

Doel

  • Inference: begrijpen of X invloed heeft op Y

  • Prediction: Y zo goed mogelijk voorspellen

Focus

  • Inference: kijken wat elke variabele precies doet

  • Prediction: zo klein mogelijke fout maken

Belangrijk

  • Inference: let op problemen zoals verborgen factoren en oorzaak-gevolg

  • Prediction: oorzaak maakt niet uit, zolang de voorspelling goed is

Methodes

  • Inference: meer “strakke” modellen (zoals fixed effects)

  • Prediction: flexibele modellen (mogen complex en niet-lineair zijn)

Voorbeeldvraag

  • Inference: zorgt een grotere firma voor beter management?

  • Prediction: hoe goed kunnen we managementkwaliteit voorspellen?

7
New cards

What is a Regression Tree?

A regression tree is a type of decision tree that tries to predict a number.

  • It asks simple yes/no questions, step by step

  • Each question splits the data into smaller groups

  • The groups are formed so that the values within each group are as similar as possible

  • At the end (at the "leaves" of the tree) you get a prediction:
    → simply the average of that group

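The idea above in a minimal sketch, using toy data of my own: a single yes/no split, with each leaf predicting the mean outcome of its group:

```python
# One-split regression "tree" (illustrative): split on age < 30,
# then predict the mean outcome of the matching leaf.

data = [(22, 100), (25, 110), (40, 200), (45, 190)]  # (age, outcome)

left  = [y for age, y in data if age < 30]   # leaf 1
right = [y for age, y in data if age >= 30]  # leaf 2

def predict(age):
    leaf = left if age < 30 else right
    return sum(leaf) / len(leaf)             # leaf prediction = group average

print(predict(27))  # -> 105.0 (average of 100 and 110)
print(predict(41))  # -> 195.0 (average of 200 and 190)
```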
8
New cards

Why do we start learning Machine Learning with Regression Trees?

  • Easy to understand and visualize
    → You can picture it as a tree of simple steps

  • Works like an algorithm
    → The computer searches for the best splits itself (instead of estimating formulas, as in regression)

  • Illustrates key ML ideas:

    • Flexibility → can learn many different patterns

    • Overfitting → can fit the training data too well

    • Cross-validation → helps check whether the model also works on new data

9
New cards

What is the anatomy of a Regression Tree?

  • Node: a yes/no question that splits the data

  • Leaf: the final point where the tree makes a prediction (average outcome)

  • The tree automatically finds the best splits. No need to specify the functional form

10
New cards

How does a Regression Tree work?

  • At each step, the tree tries many different splits (e.g., age < 30?)

  • It picks the split that makes the resulting groups as internally similar as possible

  • It keeps splitting until it has to stop (e.g., the groups become too small)

Advantages

  • Handles non-linear relationships automatically (no straight line needed)

  • You don't have to specify interactions yourself

  • Easy to understand and explain (you can draw it as a tree)

11
New cards

What is the difference between a Regression Line and a Regression Tree?

  • Linear Regression: fits one straight line through all the data

  • Regression Tree: splits the data into groups and predicts the average outcome in each group

12
New cards

How good are our predictions? (RMSE: Root Mean Squared Error)

  • RMSE measures the average prediction error of a model

  • It compares the actual value (y) with the predicted value (ŷ)

  • The error is squared, averaged, and then square-rooted

  • The result is in the same units as the outcome (y)

Key idea:

  • Lower RMSE = better predictions

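The recipe in the bullets (square, average, square-root) translates directly into code; the numbers below are my own toy example:

```python
import math

# RMSE: square the errors, average them, then take the square root.
def rmse(actual, predicted):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted))
                     / len(actual))

print(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))  # errors 1, 0, -2 -> about 1.29
```

Because of the final square root, the result is in the same units as y, which makes it easy to interpret.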
13
New cards

What is a baseline model in prediction?

  • The baseline always predicts the average outcome (ȳ)

  • It ignores all variables

  • If your model cannot beat the baseline, it is useless

  • Always compare models to the baseline as a sanity check
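A minimal baseline sketch, with toy numbers of my own: always predict the training mean and measure the resulting RMSE, the number any real model must beat:

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted))
                     / len(actual))

# Baseline: ignore all variables and always predict the training average ȳ.
y_train = [10.0, 12.0, 14.0]
y_test  = [11.0, 13.0]

baseline_prediction = sum(y_train) / len(y_train)   # ȳ = 12.0
baseline_rmse = rmse(y_test, [baseline_prediction] * len(y_test))
print(baseline_rmse)  # -> 1.0; a useful model must score below this
```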

14
New cards

What happens when a Regression Tree becomes more complex (more splits/leaves)?

  • More splits/leaves → the tree captures more detailed patterns in the data

  • This increases model complexity

  • On the training data, RMSE usually keeps decreasing

Problem:

  • A very complex tree may fit noise instead of real patterns

  • This is called overfitting

Key idea:
Better performance on training data does not always mean better predictions on new data.

15
New cards

What is the difference between a training set and a test set?

  • Training set (~70%): used to train/estimate the model

  • Test set (~30%): not used during training; used to evaluate prediction performance
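A 70/30 split can be sketched as shuffle-then-slice (my own illustration; the fixed seed is just for reproducibility):

```python
import random

random.seed(0)                   # fixed seed so the split is reproducible

rows = list(range(100))          # stand-in for 100 observations
random.shuffle(rows)             # randomize order before slicing

cut = int(0.7 * len(rows))
train, test = rows[:cut], rows[cut:]

print(len(train), len(test))     # -> 70 30
```

Shuffling first matters: slicing an ordered dataset could put systematically different observations in the test set.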

16
New cards

What does a validation curve show?

  • As model complexity increases, training RMSE keeps decreasing

  • Test RMSE first decreases, then increases

  • When test error increases, the model is overfitting (memorizing noise)

Key idea:
The best model is at the "sweet spot" where test RMSE is lowest, meaning it generalizes best to new data.

17
New cards

What is overfitting?

  • Overfitting happens when a model learns the training data too well, including noise

  • As a result, it performs worse on new (test) data

Key lessons:

  • Training error ≠ true performance

  • Always evaluate models on test data the model has never seen

  • More complex models are not always better 📊

18
New cards

How do we turn text into numbers for a machine learning model?

  1. Split text into words and remove common words (e.g., “the”, “is”, “and”)

  2. Select informative words that appear differently in AI vs. human texts

  3. Create binary indicators:

    • 1 = word appears in the text

    • 0 = word does not appear

Example:
Select 100 words: 50 AI-signaling + 50 human-signaling.
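Step 3 (binary indicators) can be sketched as follows. The word list here is hypothetical, chosen by me for illustration, not taken from the card:

```python
import re

selected_words = ["delve", "moreover", "lol"]   # hypothetical signal words

def to_features(text):
    # Lowercase and extract alphabetic tokens, dropping punctuation.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    # 1 if the selected word appears in the text, 0 if it does not.
    return [1 if w in tokens else 0 for w in selected_words]

print(to_features("Moreover, we delve into the results"))  # -> [1, 1, 0]
```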

19
New cards

What is the difference between a Regression Tree and a Classification Tree?

  • Prediction: a Regression Tree predicts a number; a Classification Tree predicts a class/category (the majority class)

  • Split criterion: a Regression Tree minimizes prediction error (RMSE); a Classification Tree makes the groups as pure as possible

20
New cards

When do we use a logit (logistic regression) model?

  • When the outcome is binary (0 or 1)

  • The model predicts the probability that Y = 1

  • Uses an S-shaped curve so predictions stay between 0 and 1

  • Often classify as 1 if P > 0.5 📊
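A minimal logit sketch: the S-shaped (sigmoid) curve keeps probabilities between 0 and 1, and classification uses the 0.5 cutoff. The coefficients below are made up for illustration:

```python
import math

# Logistic curve: probability that Y = 1, always strictly between 0 and 1.
def predict_proba(x, intercept=-4.0, slope=2.0):   # illustrative coefficients
    return 1 / (1 + math.exp(-(intercept + slope * x)))

def classify(x, threshold=0.5):
    return 1 if predict_proba(x) > threshold else 0

print(round(predict_proba(1.0), 3))  # low probability -> class 0
print(round(predict_proba(3.0), 3))  # high probability -> class 1
print(classify(1.0), classify(3.0))  # -> 0 1
```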

21
New cards

What changes when moving from a Regression Tree to a Classification Tree?

  • Prediction: each leaf predicts a class (majority class) instead of a number

  • Split criterion: splits aim to make groups as pure as possible (observations mostly in the same class)

22
New cards

What is entropy in a classification tree?

  • Entropy measures how mixed a group is

  • Low entropy: mostly one class (pure group)

  • High entropy: classes are evenly mixed

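The two extremes in the bullets can be checked numerically. A small sketch computing entropy (base-2 log) from the class proportions of a group:

```python
import math

# Entropy of a group, from its class proportions: sum of p * log2(1/p).
def entropy(labels):
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(entropy(["A", "A", "A", "A"]))  # pure group -> 0.0 (low entropy)
print(entropy(["A", "A", "B", "B"]))  # evenly mixed -> 1.0 (high entropy)
```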
23
New cards
(question shown as an image)

Answer: (A) The FPR (50/500 = 10%) tells us what fraction of truly human texts get flagged. Overall accuracy (85%) is misleading here because it was measured on balanced data.
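The arithmetic behind the answer, assuming (as the 50/500 ratio implies) that the 500 truly human texts break down into 50 false positives and 450 true negatives:

```python
# False positive rate (FPR) = FP / (FP + TN): the share of truly human
# texts that get wrongly flagged as AI.
false_positives = 50    # human texts flagged as AI
true_negatives  = 450   # human texts correctly left alone

fpr = false_positives / (false_positives + true_negatives)
print(fpr)  # -> 0.1, i.e. 10% of truly human texts get flagged
```

Overall accuracy can stay high while the FPR is unacceptable, which is why the card warns that accuracy measured on balanced data is misleading here.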
