Topic 5 - Fundamentals of Data Science

Description and Tags

Comprehensive practice flashcards covering tidy data, bias, visualization, linear regression, probability distributions, hypothesis testing, and common R language statistical functions.

Last updated 7:36 AM on 6/3/26

30 Terms

1

What is Tidy Data? 🤔

A neat structure where each column is a variable (like price or weight) and each row is one observation (like one listing). 🗂️

2

How do we define Qualitative (Categorical) Variables? 🧐

These are variables whose values are categories rather than meaningful numbers (e.g., room type), and we summarize them using counts or proportions. They’re often shown using cute bar charts! 📊

3

What are Quantitative (Numerical) Variables? 📈

These are variables whose values are meaningful numbers (like price and age), summarized using the mean, SD, or median, and visualized with histograms or boxplots. Think of it as solid data you can count on! 😄

4

What does Bias (Systematic Error) mean? 🕵️‍♂️

An error consistently leaning in one direction, like a pirate always steering left! It includes sampling bias, response bias, and non-response bias. 🏴‍☠️

5

What is Sampling Bias? 🎯

It happens when your sample doesn’t reflect the whole population, like surveying cats about their favorite dogs! 🐱➡️🐶

6

What is Response Bias? 🗳️

Here, a poorly-worded question leads people to say something they don’t mean, like asking if they love broccoli when they really don’t! 🥦❌

7

What is Non-response Bias? ❓

Occurs when certain types of people don’t respond, like missing out on grandmas in online surveys! 🧓💻

8

How can we write the Measurement Error Formula? 🔍

It’s $\text{Individual measurement} = \text{exact value} + \text{chance error} + \text{bias}$. Think of it as detective work on your measurements! 🕵️‍♀️

9

What is Chance Error? 🤷‍♂️

These are random fluctuations whose typical size you can estimate by repeating the measurement and calculating the Standard Deviation (SD). Consider it the oops factor! 🤪

10

What do we know about Standard Deviation (SD)? 🎢

It measures the typical distance of the values from their mean and is always $\geq 0$; like the distance covered on a hike, it can’t be negative! ⛰️

11

What is the Mean Transformation formula? 📊

It goes like this: $\text{New Mean} = a + b \times (\text{old mean})$. Imagine adjusting your favorite recipe! 🍰

12

How does SD Transformation work? 🎉

It’s $\text{New SD} = |b| \times (\text{old SD})$; adding $a$ just shifts the values without changing the spread. Like changing the pizza topping without altering the slice size! 🍕
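
A quick numeric check of the two transformation rules above, as a minimal sketch. The temperatures are made-up values, and the Celsius-to-Fahrenheit conversion (a = 32, b = 1.8) is just one convenient linear transformation chosen for illustration. SD here means the population SD (`pstdev` in Python's standard library).

```python
from statistics import mean, pstdev

# Hypothetical temperatures in Celsius (made-up numbers for illustration).
celsius = [10.0, 15.0, 20.0, 25.0, 30.0]

# Linear transformation new = a + b * old (here: Celsius to Fahrenheit).
a, b = 32.0, 1.8
fahrenheit = [a + b * c for c in celsius]

# Rule 1: the new mean equals a + b * (old mean).
print(mean(fahrenheit), a + b * mean(celsius))

# Rule 2: the new SD equals |b| * (old SD); the shift a has no effect on spread.
print(pstdev(fahrenheit), abs(b) * pstdev(celsius))
```

Each printed pair should agree up to floating-point rounding.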

13

What does the Correlation Coefficient (r) tell us? 📏

It’s a value from $-1$ to $+1$ showing the strength/direction of a linear association. Think of it like the friendship meter! 👯

14

What is the Regression Line equation? 📉

It’s $\hat{y} = a + bx$; this line minimizes the sum of squared discrepancies between the actual and predicted $y$ values! Think of it as finding the best fit for your outfit! 👗👖

15

What is the Slope (b) of the Regression Line? 🤔

It shows how much the predicted $y$ changes for every 1-unit increase in $x$; $b = r \times \frac{SD_y}{SD_x}$. Think of it as how much taller you’ll grow each year! 📏
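
The slope formula can be tried out by hand on a tiny data set. This is a sketch on made-up numbers: it computes r as the average product of the values in standard units (using population SDs), then the slope from the card's formula. The intercept rule, a = mean of y minus b times mean of x, is not on the cards; it is the standard companion formula and is included here as an assumption about the course's notation.

```python
from statistics import mean, pstdev

# Hypothetical paired data (made up for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

mean_x, mean_y = mean(x), mean(y)
sd_x, sd_y = pstdev(x), pstdev(y)

# Correlation: average of the products of x and y in standard units.
r = mean([((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y) for xi, yi in zip(x, y)])

# Slope from the card: b = r * SD_y / SD_x.
b = r * sd_y / sd_x

# Intercept (assumed standard formula): the line passes through the point of averages.
a = mean_y - b * mean_x

print(f"r = {r:.4f}, slope b = {b:.4f}, intercept a = {a:.4f}")
```

For these numbers the fitted line works out to roughly ŷ = 2.2 + 0.6x.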

16

What does $R^2$ (Coefficient of Determination) express? 📊

It tells us the percentage of variation in $y$ explained by $x$. Think of it like your study time explaining your grade outcome! 🎓

17

What is a Residual? 📉

It’s $\text{Residual} = \text{Actual} - \text{Predicted}$; fun fact: for the regression line, the mean of the residuals is always zero! Like surprises that balance each other out! 🎲

18

What is RMS Error? 📏

It’s the Standard Deviation (SD) of the residuals: the typical size of a prediction error, in the same units as $y$. A handy summary of how well your predictions are doing! 🙌
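
Residuals, RMS error, and $R^2$ can all be checked on one small fit. The sketch below uses made-up data and a least-squares line with an intercept, which is an assumption; the zero-mean property of residuals depends on it. $R^2$ is computed as one minus the ratio of residual variation to total variation, which is one common way to express "variation explained".

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical paired data (made up for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Least-squares slope and intercept.
mean_x, mean_y = mean(x), mean(y)
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x

# Residual = actual - predicted.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# RMS error: square the residuals, average them, take the square root.
rms_error = sqrt(mean([e ** 2 for e in residuals]))

# R^2: share of the variation in y that the line accounts for.
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_residual / ss_total

print(f"mean residual = {mean(residuals):.6f}")
print(f"RMS error = {rms_error:.4f}, SD of residuals = {pstdev(residuals):.4f}")
print(f"R^2 = {r_squared:.4f}")
```

Because the residuals average to zero, their RMS and their SD come out the same.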

19

What does the Normal Distribution N(μ, σ²) Empirical Rule say? 📐

About 68% of the data is within $\pm 1$ SD of the mean, 95% within $\pm 2$ SD, and 99.7% within $\pm 3$ SD. It’s like the data party where most guests stay close to the middle! 🎉

20

How do we use R function: pnorm(x, mean, sd)? 📊

This function finds $P(X \le x)$ for a normal distribution; remember, the 3rd argument is SD, not variance. Think of it as your calculator buddy! 🖥️
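
The Empirical Rule and the SD-versus-variance warning can both be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` rather than R; `NormalDist(mu, sigma).cdf(x)` computes the same probability $P(X \le x)$ that `pnorm(x, mean, sd)` does, and it also takes the SD, not the variance. The N(100, 225) example is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Empirical Rule: probability of falling within k SDs of the mean.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, prob in within.items():
    print(f"within ±{k} SD: {prob:.4f}")

# Hypothetical N(mu = 100, sigma^2 = 225): convert the variance to an SD first.
variance = 225.0
sd = sqrt(variance)  # 15.0
p_correct = NormalDist(mu=100.0, sigma=sd).cdf(115.0)       # P(X <= 115), using the SD
p_wrong = NormalDist(mu=100.0, sigma=variance).cdf(115.0)   # mistake: variance passed as SD
print(f"correct: {p_correct:.4f}, with variance by mistake: {p_wrong:.4f}")
```

The three Empirical Rule probabilities should land close to 68%, 95%, and 99.7%.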

21

What does the Binomial Formula $P(X = k)$ provide us? 📈

It’s $\binom{n}{k} \times p^k \times (1-p)^{n-k}$: the probability of exactly $k$ successes in $n$ independent trials, each with success probability $p$. Picture it as breaking down choices like your snack options! 🍿
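
The binomial formula translates almost symbol for symbol into code. A minimal sketch, where the function name `binomial_pmf` and the coin-toss numbers are illustrative choices, not part of the flashcards:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: exactly 3 heads in 10 tosses of a fair coin.
print(binomial_pmf(3, 10, 0.5))  # 120 / 1024 = 0.1171875

# Sanity check: the probabilities over k = 0..n add up to 1.
print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))
```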

22

What is the Central Limit Theorem (CLT)? 🎇

It says the distribution of the sample sum or mean becomes approximately Normal as the sample size grows, like gathering friends for a party! 🎈

23

What are the Sample Sum Expected Value (EV) and Standard Error (SE) formulas? 🧮

$EV = n \times \mu$ and $SE = \sqrt{n} \times \sigma$, where $\mu$ and $\sigma$ are the population mean and SD and $n$ is the sample size. Think of them as your homework values and confidence levels! 📚

24

What are the Sample Mean Expected Value (EV) and Standard Error (SE) formulas? 📏

$EV = \mu$ and $SE = \frac{\sigma}{\sqrt{n}}$. These are like your average scores to keep you on track! 🎯
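
The sample-sum and sample-mean formulas can be compared against a small simulation. This sketch assumes a fair six-sided die as the population and n = 100 rolls per sample; both are made-up choices, and the seed is fixed only to make the run repeatable.

```python
import random
from math import sqrt
from statistics import mean, pstdev

# Hypothetical population: the faces of a fair die.
faces = [1, 2, 3, 4, 5, 6]
mu = mean(faces)       # population mean
sigma = pstdev(faces)  # population SD
n = 100                # sample size (number of rolls per sample)

# Formulas from the cards.
ev_sum, se_sum = n * mu, sqrt(n) * sigma
ev_mean, se_mean = mu, sigma / sqrt(n)
print(f"sum:  EV = {ev_sum:.1f}, SE = {se_sum:.3f}")
print(f"mean: EV = {ev_mean:.2f}, SE = {se_mean:.4f}")

# Simulation: draw many samples and look at the spread of their means.
random.seed(0)
sample_means = [mean(random.choices(faces, k=n)) for _ in range(2000)]
sim_ev_mean = mean(sample_means)
sim_se_mean = pstdev(sample_means)
print(f"simulated: EV of mean = {sim_ev_mean:.3f}, SE of mean = {sim_se_mean:.4f}")
```

The simulated values should sit close to the formula values, and a histogram of `sample_means` would look roughly bell-shaped, which is the Central Limit Theorem at work.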

25

What is Prosecutor's Fallacy? ⚖️

The error of confusing $P(\text{evidence} \mid \text{innocent})$ with $P(\text{innocent} \mid \text{evidence})$. It’s like mixing up your friends' stories! 🗣️
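
A small Bayes' rule calculation shows how far apart the two probabilities can be. All numbers below are invented for illustration: one guilty person in a city of a million, a forensic match that occurs by coincidence for 1 in 100,000 innocent people, and a guilty person who always matches.

```python
# Hypothetical setup (all numbers are made up for illustration).
population = 1_000_000
p_guilty = 1 / population
p_innocent = 1 - p_guilty

p_evidence_given_innocent = 1 / 100_000  # chance an innocent person matches
p_evidence_given_guilty = 1.0            # assume the guilty person always matches

# Bayes' rule: P(innocent | evidence).
p_evidence = (
    p_evidence_given_innocent * p_innocent + p_evidence_given_guilty * p_guilty
)
p_innocent_given_evidence = p_evidence_given_innocent * p_innocent / p_evidence

print(f"P(evidence | innocent) = {p_evidence_given_innocent:.6f}")
print(f"P(innocent | evidence) = {p_innocent_given_evidence:.4f}")
```

Under these assumptions a match is very unlikely for any one innocent person, yet a person who matches is still most likely innocent, because there are so many innocent people to begin with.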

26

What does a P-value represent? 🧙‍♂️

It’s the probability of seeing data at least as extreme as what was observed, assuming the null hypothesis ($H_0$) is true. Think of it as your luck factor in magic tricks! 🎩
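
To make the definition concrete, here is a sketch of a one-sided P-value for a made-up coin experiment: 60 heads in 100 tosses, with $H_0$ saying the coin is fair. "At least as extreme" is taken to mean 60 or more heads; the function name `upper_tail_p_value` is an illustrative choice.

```python
from math import comb

def upper_tail_p_value(k_observed: int, n: int, p0: float) -> float:
    """P(X >= k_observed) when X is Binomial(n, p0), i.e. assuming H0 is true."""
    return sum(
        comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(k_observed, n + 1)
    )

# Hypothetical data: 60 heads in 100 tosses; H0: the coin is fair (p0 = 0.5).
p_value = upper_tail_p_value(60, 100, 0.5)
print(f"one-sided P-value = {p_value:.4f}")
```

A small P-value says the observed result would be unusual if $H_0$ were true; it is not the probability that $H_0$ is true.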

27

What is a Chi-squared Test of Independence? ❓

A test where $H_0$ states that two categorical variables are independent. It’s like checking if two games can be played without affecting each other! 🎮🎲

28

What is the Confidence Interval (CI) Hypothesis Testing rule? 📏

If the value claimed by $H_0$ lies inside the CI, do not reject $H_0$; if it lies outside, reject $H_0$. It’s like checking if your favorite spot is still open! 🏞️
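
The rule can be sketched for a population mean. The assumptions here go beyond the card: a normal-based interval of the form estimate ± z × SE with a known population SD, a 95% confidence level, and made-up sample numbers. The helper name `ci_decision` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ci_decision(sample_mean, sigma, n, null_value, level=0.95):
    """Build a normal-based CI for the mean and apply the inside/outside rule."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = sigma / sqrt(n)                       # SE of the sample mean
    low, high = sample_mean - z * se, sample_mean + z * se
    decision = "do not reject H0" if low <= null_value <= high else "reject H0"
    return (low, high), decision

# Hypothetical sample: mean 52, population SD 10, n = 100.
interval, decision_50 = ci_decision(52.0, 10.0, 100, null_value=50.0)
_, decision_53 = ci_decision(52.0, 10.0, 100, null_value=53.0)

print(f"95% CI = ({interval[0]:.2f}, {interval[1]:.2f})")
print(f"H0: mean = 50 -> {decision_50}")
print(f"H0: mean = 53 -> {decision_53}")
```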

29

What is Homoscedasticity? 🎢

A condition, checked in residual plots, where the residuals scatter randomly around zero with a consistent spread across all values of $x$; think of it as a fun fair ride that stays steady! 🎠

30

What is Extrapolation? ⚠️

The mistake of using a regression model to predict for $x$ values outside the range of the data it was fitted on. It’s like guessing how popular a new snack might be without trying it! 🍿📉