Vocabulary practice flashcards covering key terms and concepts from the DATA1001/1901 Semester 1 lecture notes, including data types, EDA, probability, regression, and hypothesis testing.
What is data?
Data: Information about a subject; often a sample of a population.
What is quantitative data?
Quantitative Data: Numerical information; can be continuous (any value in a range, including decimals) or discrete (separate values, usually whole-number counts).
What is a continuous variable?
Continuous Variable: A quantitative variable that can take any value within a range, including decimals.
What is a discrete variable?
Discrete Variable: A quantitative variable that takes only separate, countable values, typically whole numbers.
What is qualitative data?
Qualitative Data: Labels or categories; arithmetic on them (such as taking a mean) is not meaningful.
What is nominal data?
Nominal Data: Qualitative data with no order (like gender).
What is ordinal data?
Ordinal Data: Qualitative data with a natural order, but the gaps between categories are not necessarily equal.
What is exploratory data analysis (EDA)?
EDA: First look at data; includes Background, Structure, Wrangling, Summaries.
What is tidy data?
Tidy Data: Variables are columns, observations are rows.
What is self-selection bias?
Self-selection Bias: Bias that arises when participants decide for themselves whether to take part, so the sample can differ systematically from the population.
What is a randomised controlled trial (RCT)?
RCT: A study design in which subjects are randomly assigned to treatment and control groups, which allows causal conclusions.
What is data linkage?
Data Linkage: Combining data from different sources.
What is a comparative boxplot?
Comparative Boxplot: Side-by-side boxplots showing a quantitative variable split by the categories of a qualitative variable.
What is a scatterplot?
Scatterplot: Visualizes the relationship between two quantitative variables.
What is a density histogram?
Density Histogram: Histogram with density on the y-axis, so each bar's area is the proportion of data in that bin (total area = 100%).
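A minimal sketch of how the bar heights of a density histogram can be computed. The data and bin edges are hypothetical, chosen only to show that unequal bin widths still give a total area of 1 (100%).

```python
# Hypothetical data and bin edges, chosen only for illustration.
data = [3, 7, 12, 15, 18, 22, 25, 31, 38, 39]
edges = [0, 10, 20, 40]  # bins are [0, 10), [10, 20), [20, 40)

n = len(data)
densities = []
areas = []
for left, right in zip(edges, edges[1:]):
    count = sum(left <= x < right for x in data)
    width = right - left
    density = (count / n) / width  # proportion per unit of x
    densities.append(density)
    areas.append(density * width)  # bar area = proportion of data in the bin

total_area = sum(areas)
print(densities, total_area)
```

The last bin is twice as wide as the others, so its height is scaled down accordingly; only the area reflects the share of data.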
What is the interquartile range (IQR)?
IQR: The spread of the middle 50% of data; IQR=Q3−Q1.
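A small sketch of IQR = Q3 − Q1 using Python's standard library. The sample is hypothetical, and quartile conventions differ between software packages, so other tools may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 10, 12]

# With n=4, statistics.quantiles returns the three quartile cut points.
# method="inclusive" is one common convention, not the only one.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q3, iqr)
```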
What is standard deviation (SD)?
SD: Roughly the typical distance of data points from the mean, computed as the root-mean-square of the deviations; always ≥ 0.
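A short sketch, on hypothetical data, of how the SD is usually computed: as the root-mean-square of the deviations from the mean, which is not the same number as the plain average of the absolute distances. This version divides by n (population SD); the sample SD divides by n − 1 instead.

```python
import math

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Population SD: square the deviations, average them, take the square root.
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

# Plain average of absolute distances from the mean, for comparison.
mean_abs_dev = sum(abs(x - mean) for x in data) / n

print(mean, sd, mean_abs_dev)
```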
What are robust statistics?
Robust Statistics: Measures such as the median and IQR that are little affected by outliers (unlike the mean and SD).
What is chance error?
Chance Error: Random measurement error; its size is estimated by the standard deviation of replicate measurements.
What is bias?
Bias: A consistent error in one direction, often harder to detect.
What is the normal distribution rule?
Normal Distribution Rule: About 68%, 95% and 99.7% of data lie within ±1, ±2 and ±3 SD of the mean.
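The three percentages can be checked against the standard normal curve. A minimal sketch using only the standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Area under the curve between -k and +k SDs, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within ±{k} SD: {p:.4f}")
```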
What is correlation (r)?
Correlation (r): Measure of the strength and direction of linear association between two quantitative variables; ranges from -1 to +1.
What is the central limit theorem?
Central Limit Theorem: For a large enough sample size, the sum (or mean) of the sample is approximately normally distributed, whatever the shape of the underlying population.
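A simulation sketch of the idea, assuming fair six-sided die rolls as the population. The number of rolls per sample and the number of samples are arbitrary choices for illustration. A single roll is far from normal, yet the sums behave roughly like a normal curve.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

n_rolls = 50      # rolls summed per sample (arbitrary choice)
n_samples = 2000  # number of simulated sums (arbitrary choice)

sums = [
    sum(random.randint(1, 6) for _ in range(n_rolls))
    for _ in range(n_samples)
]

mean_sum = statistics.mean(sums)
sd_sum = statistics.pstdev(sums)

# For a roughly normal shape, about 68% of sums should fall within 1 SD.
share_within_1sd = sum(abs(s - mean_sum) <= sd_sum for s in sums) / n_samples

print(mean_sum, sd_sum, share_within_1sd)
```

Theory gives a mean of 50 × 3.5 = 175 for the sums, so the simulated mean should land close to that.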
What is the prosecutor's fallacy?
Prosecutor's Fallacy: Mistaking P(evidence | innocent) for P(innocent | evidence).
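A worked sketch with made-up numbers showing how far apart the two probabilities can be. It assumes a population of 1,000,000 equally likely suspects, a 1 in 10,000 chance that an innocent person matches the evidence, and that the guilty person always matches; all three are illustrative assumptions.

```python
from fractions import Fraction

population = 1_000_000                      # hypothetical pool of possible suspects
p_match_if_innocent = Fraction(1, 10_000)   # P(evidence | innocent), hypothetical
p_match_if_guilty = Fraction(1)             # assumption: the guilty person always matches

prior_guilty = Fraction(1, population)      # assumption: everyone equally likely beforehand
prior_innocent = 1 - prior_guilty

# Bayes' rule: P(innocent | evidence) = P(evidence | innocent) P(innocent) / P(evidence)
p_evidence = (p_match_if_guilty * prior_guilty
              + p_match_if_innocent * prior_innocent)
p_innocent_given_evidence = p_match_if_innocent * prior_innocent / p_evidence

print(float(p_match_if_innocent), float(p_innocent_given_evidence))
```

Under these assumptions P(evidence | innocent) is tiny, while P(innocent | evidence) is about 99%, because roughly 100 innocent people are expected to match alongside the one guilty person.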
What are residuals?
Residuals: Vertical differences between the observed values and the values predicted by the regression line; for a least-squares line they average 0.
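A minimal sketch, on hypothetical data, that fits a least-squares line by hand and confirms the residuals average to 0.

```python
# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y minus the y predicted by the line.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
mean_residual = sum(residuals) / n

print(slope, intercept, residuals, mean_residual)
```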
What is R-squared (R²)?
R²: Proportion of the variation in the dependent variable that is explained by the independent variable(s).
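A sketch on hypothetical data computing R² as 1 − (unexplained variation ÷ total variation). With one independent variable and a least-squares line, this equals the square of the correlation r, which the code also checks.

```python
import math

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)  # total variation in y

# Least-squares line and the variation it leaves unexplained.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

r_squared = 1 - sse / syy
r = sxy / math.sqrt(sxx * syy)  # correlation coefficient

print(r_squared, r ** 2)
```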
What is a p-value?
P-value: Probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true.
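A small worked sketch with made-up numbers: 60 heads in 100 coin tosses, tested against the null hypothesis of a fair coin. "At least as extreme" is taken here to mean 60 or more heads (a one-sided test), which is an assumption of this example.

```python
from math import comb

n = 100        # hypothetical number of tosses
observed = 60  # hypothetical number of heads

# Under the null hypothesis (fair coin), P(X = k) = C(n, k) / 2^n.
# One-sided p-value: chance of 60 or more heads out of 100.
p_value = sum(comb(n, k) for k in range(observed, n + 1)) / 2 ** n

print(round(p_value, 4))
```

A small p-value means the observed result would be unusual if the null hypothesis were true; it is not the probability that the null hypothesis is true.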
What is Cochran's Rule?
Cochran's Rule: Every expected frequency in a chi-squared test should be ≥ 5 for the chi-squared approximation to be reliable.
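A sketch of checking the rule on a hypothetical 2 × 2 table of observed counts. Expected frequencies are computed the usual way for a test of independence: row total × column total ÷ grand total.

```python
# Hypothetical observed counts (rows and columns are two categorical variables).
observed = [[12, 8],
            [5, 3]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# The rule as stated on the card: every expected count is at least 5.
rule_ok = all(e >= 5 for row in expected for e in row)

print(expected, rule_ok)
```

Here the second row has expected counts below 5, so the check fails even though one observed count in that row is exactly 5; the rule concerns expected counts, not observed ones.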
What is YAML?
YAML: Configuration format used in the header of Quarto documents; correct indentation is essential.