Vocabulary practice flashcards covering key terms and concepts from the DATA1001/1901 Semester 1 lecture notes, including data types, EDA, probability, regression, and hypothesis testing.
What is data?
Information about a subject, which can be represented in various forms.
What is quantitative data?
Numerical information that can be measured or counted.
What is a continuous variable?
A quantitative variable that can take any decimal value within a range.
What is a discrete variable?
A quantitative variable that takes only separate, countable values, typically whole numbers.
What is qualitative data?
Non-numerical data categorized into groups or labels.
What is nominal data?
Qualitative data that has no inherent order or ranking.
What is ordinal data?
Qualitative data that has a defined order or ranking.
What is exploratory data analysis (EDA)?
The initial examination and interpretation of data to uncover patterns.
What is tidy data?
A structured way of organizing data where variables are columns and observations are rows.
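As a rough illustration of the tidy layout, the sketch below reshapes a small made-up table. The student names, quiz columns, and scores are hypothetical, and it assumes that one observation means one student's score on one quiz.

```python
# Hypothetical "wide" table: one row per student, one column per quiz.
wide_rows = [
    {"student": "A", "quiz1": 7, "quiz2": 9},
    {"student": "B", "quiz1": 5, "quiz2": 8},
]

# Tidy form: each row is one observation (a student's score on one quiz)
# and each column is one variable (student, quiz, score).
tidy_rows = [
    {"student": row["student"], "quiz": quiz, "score": row[quiz]}
    for row in wide_rows
    for quiz in ("quiz1", "quiz2")
]

for row in tidy_rows:
    print(row)
```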
What is self-selection bias?
A bias introduced when participants choose their own involvement in a study.
What is a randomised controlled trial (RCT)?
A study design that randomly assigns participants to intervention or control groups to establish causation.
What is data linkage?
The process of combining data from different sources to enrich analysis.
What is a comparative boxplot?
A set of side-by-side boxplots that compares the distribution of a quantitative variable across the groups of a qualitative variable.
What is a scatterplot?
A chart that displays the relationship between two quantitative variables using points.
What is a density histogram?
A histogram in which the height of each bar represents density (proportion per unit of the horizontal axis) rather than frequency, so the bar areas add up to 1.
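A minimal sketch of how the bar heights of a density histogram could be computed. The data and bin edges are made up for illustration, and each bin is assumed to include its left edge but not its right.

```python
# Made-up data and unequal-width bins (assumption: bins are [left, right)).
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges = [0, 2, 4, 6, 10]

n = len(data)
densities = []  # bar heights
areas = []      # bar areas, i.e. the proportion of data in each bin

for left, right in zip(edges[:-1], edges[1:]):
    count = sum(1 for x in data if left <= x < right)
    width = right - left
    density = count / (n * width)  # proportion per unit of the x-axis
    densities.append(density)
    areas.append(density * width)

print(densities)
print(sum(areas))  # the areas of a density histogram add up to 1
```

Because the height is a proportion per unit width, a wide bin and a narrow bin holding the same number of points get different heights.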
What is the interquartile range (IQR)?
The range that captures the middle 50% of the data, calculated as Q3 - Q1.
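The calculation Q3 - Q1 can be sketched with Python's standard library. The data are made up, and the "inclusive" quartile method is only one of several conventions, so other software may report slightly different quartiles for the same data.

```python
import statistics

# Made-up data for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3, iqr)
```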
What is standard deviation (SD)?
A measure of spread that indicates roughly how far data points typically lie from the mean, calculated as the root-mean-square deviation from the mean.
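A small sketch, on made-up data, of how the SD is computed. It also shows the mean absolute deviation for comparison, since the SD is a root-mean-square distance rather than a plain average of distances.

```python
import statistics

# Made-up data for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)

# Population SD: divide the sum of squared deviations by n.
population_sd = statistics.pstdev(data)

# Sample SD: divide by n - 1 instead, so it is slightly larger.
sample_sd = statistics.stdev(data)

# Plain average of the absolute distances from the mean, for comparison.
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)

print(mean, population_sd, sample_sd, mean_abs_dev)
```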
What are robust statistics?
Statistical measures, such as median and IQR, that are not greatly affected by outliers.
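To see what "not greatly affected by outliers" can look like, the sketch below replaces one value in a made-up data set with an extreme one and compares the summaries. The quartiles again use the "inclusive" convention, which is an assumption.

```python
import statistics

def summarise(values):
    """Return (mean, median, IQR) using the 'inclusive' quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.mean(values), statistics.median(values), q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # largest value replaced

clean_mean, clean_median, clean_iqr = summarise(clean)
out_mean, out_median, out_iqr = summarise(with_outlier)

print(clean_mean, clean_median, clean_iqr)
print(out_mean, out_median, out_iqr)
```

In this example the mean moves a long way while the median and IQR stay the same.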
What is chance error?
The random error that can occur in measurements due to variability.
What is bias?
A consistent error in measurement that skews results in a specific direction.
What is the normal distribution rule?
An approximation stating that, for normally distributed data, about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean, respectively.
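The three percentages can be checked against the standard normal curve. This sketch uses `statistics.NormalDist` from Python's standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between -k and +k standard deviations.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, proportion in within.items():
    print(f"within {k} SD: {proportion:.4f}")
```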
What is correlation (r)?
A statistical measure that indicates the strength and direction of a linear relationship between two variables.
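A minimal sketch of computing r from its definition, using made-up paired data. The helper name `correlation` is hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 4))
```

The sign of r gives the direction of the linear trend. The closer its magnitude is to 1, the more tightly the points cluster around a line.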
What is the central limit theorem?
A fundamental statistical principle stating that the sum (or mean) of a large number of independent random draws is approximately normally distributed, whatever the shape of the distribution the draws come from.
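A simulation sketch of the idea, using rolls of a fair die. The sample size, number of repeats, and seed are arbitrary choices for illustration.

```python
import math
import random

random.seed(1001)  # arbitrary seed so the run is repeatable

SAMPLE_SIZE = 30   # die rolls per sample
REPEATS = 5000     # number of samples drawn

# Mean of each sample of die rolls (a single roll is far from normal).
sample_means = [
    sum(random.randint(1, 6) for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE
    for _ in range(REPEATS)
]

# Theory for a fair die: mean 3.5, variance 35/12 per roll.
expected_mean = 3.5
expected_sd = math.sqrt(35 / 12 / SAMPLE_SIZE)

mean_of_means = sum(sample_means) / REPEATS
sd_of_means = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / REPEATS
)

# If the sample means are roughly normal, about 68% should lie
# within one SD of the centre.
share_within_1sd = sum(
    abs(m - expected_mean) <= expected_sd for m in sample_means
) / REPEATS

print(mean_of_means, sd_of_means, share_within_1sd)
```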
What is the prosecutor's fallacy?
The error of treating the probability of the evidence given innocence as if it were the probability of innocence given the evidence.
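A worked sketch with entirely hypothetical numbers shows how far apart the two probabilities can be. It assumes a pool of 1,000,000 possible suspects with exactly one guilty person, a 1 in 10,000 chance that an innocent person matches the evidence, and a certain match for the guilty person.

```python
# All numbers below are hypothetical, chosen only for illustration.
population = 1_000_000
p_guilty = 1 / population          # prior: exactly one guilty person
p_innocent = 1 - p_guilty

p_match_given_innocent = 1 / 10_000
p_match_given_guilty = 1.0

# Bayes' rule: P(innocent | match).
p_match = (
    p_match_given_guilty * p_guilty
    + p_match_given_innocent * p_innocent
)
p_innocent_given_match = p_match_given_innocent * p_innocent / p_match

print(p_match_given_innocent)   # what the fallacy quotes
print(p_innocent_given_match)   # what actually matters
```

Under these assumptions a match is very unlikely for any one innocent person, yet most people who match are innocent, because there are so many innocent people in the pool.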
What are residuals?
The differences between observed values and the values predicted by a regression model.
What is R-squared (R²)?
A statistic that represents the proportion of variance in the dependent variable that can be explained by the independent variable.
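Residuals and R² can be sketched together for a least-squares line fitted by hand. The data are made up and the variable names are illustrative.

```python
# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept for the line y = intercept + slope * x.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed value minus predicted value.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# R-squared = 1 - (unexplained variation / total variation).
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_residual / ss_total

print(slope, intercept)
print(residuals)
print(r_squared)
```

For a line with a single predictor, R² equals the square of the correlation r.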
What is a p-value?
The probability of obtaining results at least as extreme as the observed results, given that the null hypothesis is true.
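A small exact example, with a hypothetical experiment: a coin is tossed 10 times and lands heads 9 times. Under the null hypothesis that the coin is fair, the one-sided p-value is the chance of 9 or more heads.

```python
from math import comb

n_tosses = 10
observed_heads = 9  # hypothetical observed result

# Under a fair coin, each of the 2**10 sequences is equally likely.
# Count the sequences with at least as many heads as observed.
at_least_as_extreme = sum(
    comb(n_tosses, k) for k in range(observed_heads, n_tosses + 1)
)
p_value = at_least_as_extreme / 2 ** n_tosses

print(at_least_as_extreme, p_value)
```

Here "at least as extreme" is taken in one direction only (more heads). A two-sided test would also count 1 or fewer heads.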
What is Cochran's Rule?
A rule stating that the expected frequency for each category in a chi-squared test should be at least 5.
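A sketch of checking the rule before running a chi-squared test of whether a die is fair. The observed counts are made up, and the helper names are hypothetical.

```python
def meets_cochran_rule(expected, minimum=5):
    """True if every expected frequency is at least `minimum`."""
    return all(e >= minimum for e in expected)

def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up counts of each face from 60 rolls of a die.
observed = [8, 12, 9, 11, 10, 10]

# Under the null hypothesis of a fair die, every face is equally likely.
expected = [sum(observed) / len(observed)] * len(observed)

rule_ok = meets_cochran_rule(expected)
statistic = chi_squared_statistic(observed, expected)

# With only 24 rolls the expected count per face is 4, so the rule fails.
small_expected = [24 / 6] * 6
small_rule_ok = meets_cochran_rule(small_expected)

print(expected, rule_ok, statistic)
print(small_expected, small_rule_ok)
```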
What is YAML?
A human-readable data serialization format used for configuration files with a focus on simplicity and readability.