A set of vocabulary-style flashcards covering key data analysis concepts from the lecture notes.
Frequentist probability
Probabilities are objective and have a frequency interpretation; prior knowledge is not formally included.
Bayesian probability
Probabilities can be subjective and represent degrees of belief; prior knowledge is formally included in analyses.
Statistics
Scientific study of numerical data collected from natural phenomena and the methods used to collect, analyze, and interpret such data.
Science
Systematic study of the physical and natural world through observation and experiment.
Description (in science)
An adequate description of the things and events investigated.
Law or theory (in science)
General laws or theories by which particular events may be explained and predicted.
Scientific Method
A sequence: observation, formulating a question/problem, hypothesis, prediction, and experimental design.
Population
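In biology, all individuals of the same species in a defined location, sharing a gene pool; in statistics, the complete set of individuals or observations about which inferences are made.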
Sample
A subset of the population used to make inferences about the population.
Random sampling
A sampling method in which each member of the population has a known, nonzero chance of being selected; in simple random sampling, every member has an equal chance.
Random variable
A numerical quantity whose value is determined by chance, for example a measurement taken on an individual drawn at random from the population.
Parameters
Population characteristics (e.g., μ, σ², N) that describe the population.
Statistics (in sampling)
Numerical summaries computed from a sample (e.g., X̄, s², n).
Random sampling vs population vs sample
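A random sample is drawn from the population; statistics computed from the sample (e.g., X̄, s²) estimate the population parameters (e.g., μ, σ²), and probability describes how far those estimates may fall from the true values.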
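To make the population/sample relationship concrete, here is a minimal Python sketch. The population of 1,000 integers, the sample size of 50, the seed, and the function name sample_vs_population are all hypothetical choices made for illustration; they are not from the lecture notes.

```python
import random
import statistics

def sample_vs_population(population, n, seed=0):
    """Draw a simple random sample of size n and compare X̄ with μ."""
    rng = random.Random(seed)            # seeded so the sketch is repeatable
    sample = rng.sample(population, n)   # equal chance for each member, no replacement
    mu = statistics.fmean(population)    # parameter: population mean μ
    x_bar = statistics.fmean(sample)     # statistic: sample mean X̄
    return mu, x_bar, sample

# Hypothetical population of N = 1000 values
population = list(range(1, 1001))
mu, x_bar, sample = sample_vs_population(population, n=50)
print(f"mu = {mu}, x_bar = {x_bar}")
```

The sample mean will usually differ somewhat from μ; changing the seed gives a different sample and a different X̄, while μ stays fixed.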
Central tendency
A measure that summarizes the center of a data set (e.g., mean, median).
Mean (arithmetic mean)
The sum of values divided by the number of observations.
Median
The middle value of an ordered data set; splits data into two halves.
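The mean and median defined above can be computed in a few lines of Python. The data values below are made up for illustration, with one deliberately extreme value.

```python
import statistics

# Hypothetical measurements; 40 is an extreme value
data = [2, 3, 3, 5, 7, 10, 40]

mean = sum(data) / len(data)       # sum of values / number of observations
median = statistics.median(data)   # middle value of the ordered data

print(mean, median)  # prints: 10.0 5
```

In this example the single extreme value pulls the mean up to 10.0, while the median stays at 5.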
Variance
A measure of dispersion; the average squared deviation from the mean (the sample variance s² divides the sum of squared deviations by n − 1 rather than n).
Standard deviation
The square root of the variance; a measure of dispersion around the mean.
Standard error
The standard deviation of the sampling distribution of a statistic; for the sample mean it is estimated by s/√n.
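As a worked example of variance, standard deviation, and standard error, the sketch below computes all three by hand for a small sample. The sample values are hypothetical, and the n − 1 divisor assumes the sample (not population) variance is wanted.

```python
import math
import statistics

sample = [4, 8, 6, 5, 3, 7, 9]   # hypothetical sample
n = len(sample)

x_bar = sum(sample) / n                                # sample mean
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)   # sample variance s²
s = math.sqrt(s2)                                      # sample standard deviation s
se = s / math.sqrt(n)                                  # standard error of the mean

print(round(s2, 3), round(s, 3), round(se, 3))
```

The hand-computed s2 and s agree with statistics.variance(sample) and statistics.stdev(sample), which use the same n − 1 divisor.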
Range
The difference between the maximum and minimum values in a data set.
Quartiles
Values that divide data into four equal parts: Q1, Q2 (median), Q3.
Five-number summary
Min, Q1, Median, Q3, Max; used for box plots.
Box plot
A graphical display of the five-number summary showing distribution and potential outliers.
IQR (Interquartile Range)
Q3 − Q1; the spread of the middle 50% of the data.
Outlier fences (f1, f3)
Lower and upper bounds for outliers: f1 = Q1 − 1.5×IQR, f3 = Q3 + 1.5×IQR.
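The five-number summary, IQR, and outlier fences can be sketched together in Python. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method of statistics.quantiles (Python 3.8+), which may not match the convention used in the lecture notes. The data and the function name box_summary are hypothetical.

```python
import statistics

def box_summary(data):
    """Five-number summary, IQR, fences, and flagged outliers."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    f1 = q1 - 1.5 * iqr   # lower fence
    f3 = q3 + 1.5 * iqr   # upper fence
    return {
        "min": min(data), "Q1": q1, "median": q2, "Q3": q3, "max": max(data),
        "IQR": iqr, "f1": f1, "f3": f3,
        "outliers": [x for x in data if x < f1 or x > f3],
    }

# Hypothetical data with one unusually large value
summary = box_summary([1, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

For this data the quartiles are 4.0, 6.0, and 8.0, so IQR = 4.0, the fences are −2.0 and 14.0, and only the value 30 falls outside them.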
Empirical rule
For approximately normal (bell-shaped) data: approx. 68% of data fall within mean ± SD; approx. 95% within mean ± 2×SD; approx. 99.7% within mean ± 3×SD.
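The percentages in the empirical rule come from the normal distribution and can be checked with Python's statistics.NormalDist (Python 3.8+). The helper name share_within is hypothetical; the sketch assumes exactly normal data, which real data only approximate.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Share of a normal distribution lying within mu ± k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# prints:
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on the particular mean or SD passed in, since the interval is expressed in SD units.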
Accuracy
Closeness of a measured value to the true value.
Precision
Closeness of repeated measurements to each other (reproducibility).
30-300 rule
The range (max − min) of the measurements should span between 30 and 300 unit steps of the measurement scale; a guide for choosing how precisely to record measurements.
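A small Python sketch of the 30-300 check, assuming a "unit step" means the smallest increment in which the measurement is recorded. The function names and the example lengths are hypothetical.

```python
def unit_steps(min_value, max_value, unit):
    """Number of unit steps between the smallest and largest measurement."""
    # round() absorbs small floating-point error in the division
    return round((max_value - min_value) / unit)

def satisfies_30_300(min_value, max_value, unit):
    """True if the range spans between 30 and 300 unit steps."""
    return 30 <= unit_steps(min_value, max_value, unit) <= 300

# Hypothetical lengths ranging from 4.2 to 9.8 cm
print(unit_steps(4.2, 9.8, 0.1), satisfies_30_300(4.2, 9.8, 0.1))     # recorded to 0.1 cm
print(unit_steps(4, 10, 1), satisfies_30_300(4, 10, 1))               # recorded to 1 cm
print(unit_steps(4.20, 9.80, 0.01), satisfies_30_300(4.20, 9.80, 0.01))  # recorded to 0.01 cm
```

In this example, recording to 0.1 cm gives 56 steps and passes; recording to 1 cm gives only 6 steps (too coarse), and recording to 0.01 cm gives 560 steps (finer than needed).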
Quantitative variable
A numeric variable that can be measured; includes continuous and discrete types.
Continuous variable
Quantitative variable that can take any value within a range.
Discrete variable
Quantitative variable that takes only specific, separate values.
Ordinal (ranked) variable
Qualitative variable whose categories have an inherent order, but whose intervals between ranks are not necessarily equal.
Categorical variable
Qualitative variable representing categories without intrinsic order (also called a nominal variable).