These flashcards cover key concepts related to exploratory data analysis, hypothesis testing, and statistics.
What is the purpose of Exploratory Data Analysis (EDA)?
To understand data, find patterns, detect anomalies, and prepare for analysis.
What are the two main types of data?
Numerical and categorical.
Mean vs. Median?
Mean is the average and is affected by outliers; median is the middle value and is robust to outliers.
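A small illustration of this card, using Python's standard `statistics` module. The numbers are made up for the example.

```python
from statistics import mean, median

# Hypothetical sample, then the same sample with one extreme value added.
values = [10, 11, 12, 12, 13]
with_outlier = values + [100]

print(mean(values), median(values))              # mean and median are close
print(mean(with_outlier), median(with_outlier))  # mean jumps, median stays at 12
```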
What does standard deviation measure?
The spread or variability of data.
What does a histogram show?
The distribution of a numerical variable, as counts of values per bin.
What does a boxplot show?
Distribution, quartiles, and outliers.
What is an outlier?
A value far from the rest of the dataset.
Why are outliers important?
They can distort analysis and statistics.
What is the IQR method for detecting outliers?
Outliers are below Q1 − 1.5×IQR or above Q3 + 1.5×IQR.
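The IQR rule can be sketched with the standard library as follows. The helper name `iqr_outliers` and the sample data are illustrative assumptions. Quartile conventions differ between tools, so the exact fences may vary slightly from what another package reports.

```python
from statistics import quantiles

def iqr_outliers(data):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points
    # (Python's default "exclusive" method).
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

data = [10, 12, 11, 13, 12, 14, 11, 100]  # made-up sample
print(iqr_outliers(data))  # [100]
```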
What is a Z-score used for?
Measuring how many standard deviations a value is from the mean.
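As a minimal sketch, a Z-score is the distance from the mean divided by the standard deviation. The helper `z_scores` and the data are assumptions for illustration. The population standard deviation is used here; a sample-based version would use `stdev` instead.

```python
from statistics import mean, pstdev

def z_scores(data):
    """Z-score of each value: (x - mean) / standard deviation."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation; stdev() is the sample version
    return [(x - mu) / sigma for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample with mean 5 and standard deviation 2
print(z_scores(data)[-1])  # the value 9 lies 2 standard deviations above the mean
```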
Which is more affected by outliers: mean or median?
Mean.
When should you keep outliers?
When they are real and meaningful.
What is correlation?
A measure of the strength and direction of the linear relationship between two variables.
What is the range of correlation?
-1 to +1.
What does +1 correlation mean?
Perfect positive linear relationship.
What does 0 correlation mean?
No linear relationship.
What does -1 correlation mean?
Perfect negative linear relationship.
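The three reference values can be checked with a hand-written Pearson coefficient. This is a sketch with made-up data, and `pearson` is a hypothetical helper rather than a library function. The last case shows that a coefficient of 0 only rules out a linear trend: the points follow a clear U shape.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # Sum of products of deviations (unnormalized covariance).
    co_dev = sum((a - mx) * (b - my) for a, b in zip(x, y))
    dev_x = sum((a - mx) ** 2 for a in x)
    dev_y = sum((b - my) ** 2 for b in y)
    return co_dev / sqrt(dev_x * dev_y)

x = [1, 2, 3, 4, 5]
print(pearson(x, [2 * v + 1 for v in x]))     # +1: perfect positive
print(pearson(x, [-v for v in x]))            # -1: perfect negative
print(pearson(x, [(v - 3) ** 2 for v in x]))  # 0: U-shaped pattern, no linear trend
```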
Does correlation imply causation?
No.
What does a heatmap show?
The values of a matrix as colors; in EDA, typically the strength of pairwise correlations.
What does a pairplot show?
Relationships between multiple variables.
What is the null hypothesis (H₀)?
No effect or no difference.
What is the alternative hypothesis (H₁)?
There is an effect or difference.
What is a p-value?
The probability of observing results at least as extreme as those actually observed, assuming H₀ is true.
What does a small p-value indicate?
Strong evidence against H₀.
What is the significance level (α)?
The p-value threshold at or below which H₀ is rejected (usually 0.05).
What is the decision rule for hypothesis testing?
If p ≤ α, reject H₀; otherwise, fail to reject H₀.
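To make the decision rule concrete, here is a sketch of an exact two-sided binomial test for a hypothetical experiment: a coin lands heads 16 times in 20 flips, and H₀ says the coin is fair. The scenario and the helper `binomial_p_value` are assumptions for illustration. The p-value adds up the probability of every outcome at least as far from the expected count as the observed one.

```python
from math import comb

def binomial_p_value(heads, flips):
    """Two-sided exact p-value for H0: the coin is fair (P(heads) = 0.5)."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count the equally likely flip sequences whose head count is at least
    # as far from the expected count as the observed one, assuming H0.
    extreme = sum(comb(flips, k) for k in range(flips + 1)
                  if abs(k - expected) >= observed_gap)
    return extreme / 2 ** flips

alpha = 0.05
p = binomial_p_value(heads=16, flips=20)  # hypothetical experiment
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(round(p, 4), decision)  # 0.0118 reject H0
```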
What is a Type I error?
Rejecting a true H₀ (false positive).
What is a Type II error?
Not rejecting a false H₀ (false negative).
Why do EDA before hypothesis testing?
To understand and clean data first.
How do outliers affect correlation?
They can falsely strengthen or weaken it.
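A made-up example of this effect: five points with no linear trend, then the same points plus a single extreme observation. The helper `pearson` is a hypothetical hand-written function, not a library call.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    co_dev = sum((a - mx) * (b - my) for a, b in zip(x, y))
    dev_x = sum((a - mx) ** 2 for a in x)
    dev_y = sum((b - my) ** 2 for b in y)
    return co_dev / sqrt(dev_x * dev_y)

x = [1, 2, 3, 4, 5]
y = [4, 1, 0, 1, 4]  # U-shaped: no linear trend

r_clean = pearson(x, y)
r_outlier = pearson(x + [20], y + [20])  # one extreme point added

print(r_clean)    # 0: no linear relationship
print(r_outlier)  # above 0.9: the single outlier creates a strong correlation
```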
Why is visualization important?
It reveals patterns statistics might miss.
What is skewness?
Asymmetry in data distribution.
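One common way to quantify this asymmetry is the moment-based skewness coefficient, sketched below with made-up data. The helper `skewness` is hypothetical, and statistical libraries may apply a sample-size correction that gives slightly different values.

```python
def skewness(data):
    """Population skewness: third central moment / (second central moment ** 1.5)."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n  # variance
    m3 = sum((x - mu) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))   # 0.0: symmetric
print(skewness([1, 2, 2, 3, 10]))  # positive: long right tail
```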
Mean is most affected by what?
Outliers.
Best measure for skewed data?
Median.
Correlation range shortcut?
[-1, 1].
When do you reject H₀ quickly?
When p ≤ α (usually 0.05).
Key warning about correlation?
Correlation does not imply causation.