Topic 5 - Fundamentals of Data Science

Comprehensive practice flashcards covering tidy data, bias, visualization, linear regression, probability distributions, hypothesis testing, and common R language statistical functions.

Last updated 7:36 AM on 6/3/26

30 Terms

What is Tidy Data? 🤔

A neat structure where each column is a variable (like price or weight) and each row is one observation (like one listing). 🗂️

How do we define Qualitative (Categorical) Variables? 🧐

These are variables whose values are categories rather than meaningful numbers (e.g., room type). We summarize them using counts or proportions, and they’re often shown using cute bar charts! 📊

What are Quantitative (Numerical) Variables? 📈

These are variables whose values are meaningful numbers (like price and age). We summarize them using the mean, SD, or median, and visualize them with histograms or boxplots. Think of it as solid data you can count on! 😄

What does Bias (Systematic Error) mean? 🕵️‍♂️

An error consistently leaning in one direction, like a pirate always steering left! It includes sampling bias, response bias, and non-response bias. 🏴‍☠️

What is Sampling Bias? 🎯

It happens when your sample doesn’t reflect the whole population, like surveying cats about their favorite dogs! 🐱➡️🐶

What is Response Bias? 🗳️

Here, a poorly worded or leading question pushes people toward an answer they don’t really mean, like asking if they love broccoli when they really don’t! 🥦❌

What is Non-response Bias? ❓

It occurs when certain types of people are less likely to respond, so their views are missing from the results, like missing out on grandmas in online surveys! 🧓💻

How can we write the Measurement Error Formula? 🔍

It’s $\text{Individual measurement} = \text{exact value} + \text{chance error} + \text{bias}$. Think of it as detective work on your measurements! 🕵️‍♀️

What is Chance Error? 🤷‍♂️

These are random fluctuations that vary from one measurement to the next. You can estimate their typical size by repeating the measurement and calculating the Standard Deviation (SD). Consider it the oops factor! 🤪

What do we know about Standard Deviation (SD)? 🎢

It measures the typical distance of values from the mean and is always $\geq 0$ (it equals 0 only when every value is the same). Like a distance on a hike, it can’t be negative! ⛰️

What is the Mean Transformation formula? 📊

It goes like this: $\text{New Mean} = a + b \times (\text{old mean})$. Imagine adjusting your favorite recipe! 🍰

How does SD Transformation work? 🎉

It’s $\text{New SD} = |b| \times (\text{old SD})$; adding $a$ just shifts every value by the same amount without changing the spread. Like changing the pizza topping without altering the slice size! 🍕
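
A quick way to check both transformation rules is to try them on numbers. The sketch below is only an illustration: the temperatures are made up, and the Celsius-to-Fahrenheit conversion ($a = 32$, $b = 1.8$) is just one convenient example of $a + b \times x$.

```python
from statistics import mean, pstdev

# Hypothetical temperatures in Celsius, made up for illustration.
celsius = [10.0, 20.0, 30.0]

# Linear transformation: new value = a + b * old value.
a, b = 32.0, 1.8
fahrenheit = [a + b * c for c in celsius]

# Mean and SD computed directly from the transformed data.
new_mean = mean(fahrenheit)
new_sd = pstdev(fahrenheit)

# Mean and SD predicted by the two transformation rules.
predicted_mean = a + b * mean(celsius)
predicted_sd = abs(b) * pstdev(celsius)

print(new_mean, predicted_mean)
print(new_sd, predicted_sd)
```

Both pairs should agree up to rounding: the shift $a$ moves the mean but leaves the SD alone.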

What does the Correlation Coefficient (r) tell us? 📏

It’s a value from $-1$ to $+1$ showing the strength/direction of a linear association. Think of it like the friendship meter! 👯

What is the Regression Line equation? 📉

It’s $\hat{y} = a + bx$, where $a$ is the intercept and $b$ is the slope. This line minimizes the sum of squared residuals. Think of it as finding the best fit for your outfit! 👗👖

What is the Slope (b) of the Regression Line? 🤔

It shows how much the predicted $y$ changes for every 1-unit increase in $x$: $b = r \times \frac{SD_y}{SD_x}$. Think of it as how you’ll grow taller each year! 📏
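
The slope formula can be tried out on a tiny data set. This is a minimal sketch with made-up numbers; the intercept formula $a = \bar{y} - b\bar{x}$ is not on the card, but it is the standard companion to the slope formula for a least-squares line.

```python
from statistics import mean, pstdev

# Hypothetical data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x, mean_y = mean(x), mean(y)
sd_x, sd_y = pstdev(x), pstdev(y)

# r is the average product of the two variables in standard units.
r = mean(((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y) for xi, yi in zip(x, y))

# Slope from the card's formula, intercept from a = mean_y - b * mean_x.
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

# Cross-check against the direct least-squares slope.
least_squares_slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)

print(r, slope, intercept, least_squares_slope)
```

For this data the two slopes should match, which is the point of the formula: the regression line can be built from five summary numbers (two means, two SDs, and $r$).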

What does $R^2$ (Coefficient of Determination) express? 📊

It tells us the proportion of the variation in $y$ that is explained by $x$; in simple linear regression it equals $r^2$. Think of it like your study time explaining your grade outcome! 🎓

What is a Residual? 📉

It’s $\text{Residual} = \text{Actual} - \text{Predicted}$; fun fact: for the regression line, the mean of the residuals is always zero! Like surprises that balance each other out! 🎲

What is RMS Error? 📏

It’s the Standard Deviation (SD) of the residuals: the typical size of a prediction error, in the same units as $y$. Basically a fun summary of how well your predictions are doing! 🙌
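
Residuals, $R^2$, and RMS error can all be seen together on one small example. The sketch below uses made-up data and fits the line by least squares; the shortcut $\sqrt{1 - r^2} \times SD_y$ is a standard identity for the RMS error, not something stated on the cards.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Fit the least-squares line y-hat = intercept + slope * x.
mean_x, mean_y = mean(x), mean(y)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

# Residual = actual - predicted.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
mean_residual = mean(residuals)

# RMS error: root of the mean squared residual.
rms_error = sqrt(mean(e ** 2 for e in residuals))

# R^2: share of the variation in y explained by the line.
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_residual / ss_total

# Shortcut for the RMS error: sqrt(1 - r^2) * SD of y.
shortcut = sqrt(1 - r_squared) * pstdev(y)

print(mean_residual, rms_error, shortcut, r_squared)
```

The mean residual should come out as zero (up to rounding), and the RMS error should match the shortcut.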

What does the Normal Distribution N(μ, σ²) Empirical Rule say? 📐

About 68% of the data falls within $\pm 1$ SD of the mean, 95% within $\pm 2$ SD, and 99.7% within $\pm 3$ SD. It’s like the data party where most guests crowd around the middle! 🎉

How do we use R function: pnorm(x, mean, sd)? 📊

This function finds $P(X \leq x)$ for a normal distribution; remember, the 3rd argument is SD, not variance. Think of it as your calculator buddy! 🖥️
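
To see what `pnorm` computes without opening R, here is a rough Python stand-in built on the standard library's `statistics.NormalDist`. The wrapper name `pnorm` and the example values (mean 100, SD 15) are chosen here for illustration only; like R's function, it takes the SD, not the variance.

```python
from statistics import NormalDist

def pnorm(x, mean=0.0, sd=1.0):
    """Return P(X <= x) for X ~ Normal(mean, sd), mirroring R's pnorm(x, mean, sd)."""
    return NormalDist(mu=mean, sigma=sd).cdf(x)

# Hypothetical example: P(X <= 115) when the mean is 100 and the SD is 15.
# In R this would be pnorm(115, 100, 15).
p_below_115 = pnorm(115, 100, 15)

# Empirical rule check on the standard normal: area within 1, 2, and 3 SDs.
within_1 = pnorm(1) - pnorm(-1)
within_2 = pnorm(2) - pnorm(-2)
within_3 = pnorm(3) - pnorm(-3)

print(round(p_below_115, 4))
print(round(within_1, 4), round(within_2, 4), round(within_3, 4))
```

The three areas should land close to the 68%, 95%, and 99.7% of the Empirical Rule.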

What does the Binomial Formula $P(X = k)$ provide us? 📈

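It’s $P(X = k) = \binom{n}{k} \times p^k \times (1-p)^{n-k}$, the probability of exactly $k$ successes in $n$ independent trials, each with success probability $p$. Picture it as breaking down choices like your snack options! 🍿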
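
The formula translates almost word for word into code. This is a small sketch; the function name `binomial_pmf` and the coin-flip numbers are made up for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: exactly 3 heads in 10 fair coin flips.
# In R this would be dbinom(3, 10, 0.5).
p_three_heads = binomial_pmf(3, 10, 0.5)

# Sanity check: the probabilities over all possible k add up to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(p_three_heads, total)
```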

What is the Central Limit Theorem (CLT)? 🎇

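It says the distribution of the sample sum or mean becomes approximately Normal as the sample size grows, whatever the shape of the population. It’s like gathering friends for a party: the bigger the crowd, the more predictable the vibe! 🎈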

What are the Sample Sum Expected Value (EV) and Standard Error (SE) formulas? 🧮

$EV = n \times \mu$ and $SE = \sqrt{n} \times \sigma$, where $\mu$ and $\sigma$ are the population mean and SD. EV is what you expect the sum to be; SE is how far off it typically is! 📚

What are the Sample Mean Expected Value (EV) and Standard Error (SE) formulas? 📏

$EV = \mu$ and $SE = \frac{\sigma}{\sqrt{n}}$. The sample mean aims at the population mean, and a bigger sample keeps it closer to the target! 🎯
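
Here is a small worked example of the EV and SE formulas for both the sample sum and the sample mean. The setup (100 rolls of a fair six-sided die) is an assumption chosen for illustration, not something from the cards.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical population: the faces of a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
mu = mean(faces)       # population mean
sigma = pstdev(faces)  # population SD

n = 100  # number of rolls (assumed for illustration)

# Sample sum: EV = n * mu, SE = sqrt(n) * sigma.
sum_ev = n * mu
sum_se = sqrt(n) * sigma

# Sample mean: EV = mu, SE = sigma / sqrt(n).
mean_ev = mu
mean_se = sigma / sqrt(n)

print(sum_ev, round(sum_se, 3))
print(mean_ev, round(mean_se, 3))
```

Notice that the SE of the sum grows with $\sqrt{n}$ while the SE of the mean shrinks with $\sqrt{n}$; dividing the sum's EV and SE by $n$ gives the mean's EV and SE.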

What is Prosecutor's Fallacy? ⚖️

The error of confusing $P(\text{evidence} | \text{innocent})$ with $P(\text{innocent} | \text{evidence})$ . It’s like mixing up your friends' stories! 🗣️
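
Bayes' rule shows how far apart the two probabilities can be. Every number in this sketch is invented for illustration: a 1-in-1,000 chance that an innocent person matches the evidence, a guilty person who always matches, and a prior of 1 guilty person among 10,000 possible suspects.

```python
# All values below are hypothetical, chosen only to illustrate the fallacy.
p_evidence_given_innocent = 0.001  # chance an innocent person matches the evidence
p_evidence_given_guilty = 1.0      # a guilty person always matches
p_guilty = 1 / 10_000              # prior: 1 guilty person among 10,000 suspects
p_innocent = 1 - p_guilty

# Total probability of seeing the evidence for a randomly chosen suspect.
p_evidence = (
    p_evidence_given_innocent * p_innocent + p_evidence_given_guilty * p_guilty
)

# Bayes' rule: P(innocent | evidence).
p_innocent_given_evidence = p_evidence_given_innocent * p_innocent / p_evidence

print(p_evidence_given_innocent, round(p_innocent_given_evidence, 3))
```

Under these made-up numbers, $P(\text{evidence} | \text{innocent})$ is tiny, yet $P(\text{innocent} | \text{evidence})$ is above 90%, because innocent people vastly outnumber the guilty one.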

What does a P-value represent? 🧙‍♂️

It’s the probability of seeing data at least as extreme as what was observed, assuming the null hypothesis ($H_0$) is true. A small P-value means the data would be surprising under $H_0$. Think of it as your luck factor in magic tricks! 🎩
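
A P-value can be computed by hand in simple cases. The sketch below assumes a made-up experiment (60 heads in 100 coin flips, with $H_0$: the coin is fair) and a one-sided alternative, so "at least as extreme" means 60 or more heads.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical experiment: 60 heads observed in 100 flips; H0 says p = 0.5.
n, observed_heads, p_null = 100, 60, 0.5

# One-sided P-value: chance of 60 or more heads if H0 is true.
# In R this would be 1 - pbinom(59, 100, 0.5).
p_value = sum(binomial_pmf(k, n, p_null) for k in range(observed_heads, n + 1))

print(round(p_value, 4))
```

The result is a little under 0.03, so a result this lopsided would be fairly rare for a fair coin.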

What is a Chi-squared Test of Independence? ❓

A test where $H_0$ states that two categorical variables are independent. It’s like checking if two games can be played without affecting each other! 🎮🎲
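
The test statistic compares the observed counts with the counts expected if the two variables really were independent. This is a hand-rolled sketch on a made-up 2×2 table; the function name is hypothetical, and in R the usual route would be `chisq.test()`.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts: rows are two groups, columns are two answer categories.
observed_counts = [[30, 20], [20, 30]]
statistic, df = chi_squared_statistic(observed_counts)

print(statistic, df)
```

A larger statistic means the observed counts sit further from what independence predicts, which gives a smaller P-value.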

What is the Confidence Interval (CI) Hypothesis Testing rule? 📏

If the $H_0$ value is inside the CI, do not reject $H_0$; if it is outside, reject it (a 95% CI matches a two-sided test at the 5% level). It’s like checking if your favorite spot is still open! 🏞️

What is Homoscedasticity? 🎢

A condition where the residuals have a constant spread around zero across all values of $x$; in a residual plot, the points scatter randomly in an even band. Think of it as a fun fair ride that stays steady! 🎠

What is Extrapolation? ⚠️

The mistake of using a regression model to predict values outside the range of the data it was fitted on, where the linear pattern may no longer hold. It’s like guessing how popular a new snack might be without trying it! 🍿📉