Flashcards from Lecture Notes
Data Literacy
The ability to analyze, interpret, and question data, which is an increasingly valuable skill for evidence-based decision-making.
Data
Raw, unprocessed facts, such as a single supermarket transaction.
Information
The result of processing data to create meaning and enable decision-making.
Target Population
All subjects of interest in a study.
Sample
A manageable subset of the target population used to make studies feasible.
Observational Study
A study where the researcher collects data without intervention.
Experimental Study
A study where the researcher intervenes to influence outcomes.
Bias
A systematic error that leads to an incorrect estimate of a parameter or of an association.
Mode
The most frequently occurring value or category in a data set.
Mean
The arithmetic average of a data set.
Median
The middle value of an ordered data set.
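As a small illustration of the mode, mean, and median, the sketch below uses Python's standard `statistics` module. The list `values` is made-up data, not taken from the lecture notes.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # sum of the values divided by their count
median_value = statistics.median(values)  # middle of the ordered values (average of the two middle ones here)
mode_value = statistics.mode(values)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

With an even number of values there is no single middle value, so the median is taken as the average of the two middle values.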
Variance
The average of the squared deviations from the mean (for a sample, the sum of squares is usually divided by n - 1 rather than n).
Standard Deviation
The square root of the variance, indicating how spread out the data is.
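The following sketch shows how variance and standard deviation relate, using Python's `statistics` module on made-up data. It computes both the population form (divide by n) and the sample form (divide by n - 1). Which one a course expects is an assumption to check against the lecture notes.

```python
import statistics

# Hypothetical data set with mean 5, chosen so the arithmetic is easy to follow.
values = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = statistics.pvariance(values)  # sum of squared deviations / n
sample_variance = statistics.variance(values)       # sum of squared deviations / (n - 1)

population_sd = statistics.pstdev(values)  # square root of the population variance
sample_sd = statistics.stdev(values)       # square root of the sample variance

print(population_variance, sample_variance, population_sd, sample_sd)
```

The sample versions are slightly larger because dividing by n - 1 compensates for estimating the mean from the same data.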
Inter-Quartile Range (IQR)
The difference between the upper and lower quartiles.
Range
The difference between the maximum and minimum values.
Outliers
Values that fall outside the calculated fences, that is, below LQ - 1.5 x IQR or above UQ + 1.5 x IQR.
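A minimal sketch of the range, IQR, and outlier fences on made-up data. Textbooks and software use several quartile conventions. This example assumes the "inclusive" method of `statistics.quantiles`, so other tools may give slightly different quartiles for the same data.

```python
import statistics

# Hypothetical data set with one suspiciously large value.
data = [4, 5, 6, 7, 8, 9, 10, 11, 30]

# Quartile convention is an assumption; other methods can give different cut points.
lower_q, median_q, upper_q = statistics.quantiles(data, n=4, method="inclusive")

data_range = max(data) - min(data)  # maximum minus minimum
iqr = upper_q - lower_q             # upper quartile minus lower quartile

lower_fence = lower_q - 1.5 * iqr
upper_fence = upper_q + 1.5 * iqr

# Outliers are the values lying beyond either fence.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(data_range, iqr, lower_fence, upper_fence, outliers)
```

Note that the single extreme value inflates the range far more than it affects the IQR, which is why the IQR is the more robust measure of spread.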
Histogram
A chart whose bars show how many values of a single numerical variable fall into each interval (bin), revealing the shape of its distribution.
Contingency Table
A table of counts used to examine the relationship between two categorical variables.
Scatterplot
A plot of paired values used to examine the relationship between two numerical variables.
Normal Distribution
A symmetrical, bell-shaped distribution in which most observations cluster around the central peak.
Uniform Distribution
A distribution where all outcomes are equally likely.
Skewed Distribution
A distribution that is not symmetrical; can be skewed left (long tail on the left) or skewed right (long tail on the right).
One Sample t-Test
Compares a sample mean with a known or hypothesized value.
Two Sample t-Test
Compares means from two independent samples.
Paired Sample t-Test
Compares means from two related (paired) groups by testing the differences within each pair.
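The three t-tests can be sketched as test statistics computed by hand. The function names below are hypothetical, the two-sample version assumes the unequal-variance (Welch) form, and only the t statistic is computed. Turning it into a p-value needs the t distribution, which is not in Python's standard library.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for comparing a sample mean with the value mu0."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / standard_error


def paired_t(first, second):
    """t statistic for paired data: a one-sample test on the differences."""
    differences = [b - a for a, b in zip(first, second)]
    return one_sample_t(differences, 0.0)


def two_sample_t(x, y):
    """t statistic for two independent samples (Welch form, an assumption)."""
    standard_error = math.sqrt(
        statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    )
    return (statistics.mean(x) - statistics.mean(y)) / standard_error


# Hypothetical measurements, for illustration only.
before = [1, 2, 3]
after = [3, 6, 9]

t_one = one_sample_t([2, 4, 6], 2)
t_paired = paired_t(before, after)
t_two = two_sample_t(after, before)

print(t_one, t_paired, t_two)
```

The paired test reuses the one-sample function because, once the differences are calculated, it is simply a test of whether their mean is zero.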
Chi-Square Goodness of Fit Test
Used for a categorical variable to compare observed frequencies with expected proportions.
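A minimal sketch of the goodness-of-fit statistic, which sums (observed - expected)^2 / expected over the categories. The counts and the equal expected proportions are made up for illustration, and the p-value lookup against the chi-square distribution is left out.

```python
# Hypothetical observed counts for four categories.
observed = [18, 22, 20, 40]

# Hypothetical expected proportions under the null hypothesis (all equal).
expected_proportions = [0.25, 0.25, 0.25, 0.25]

total = sum(observed)
expected = [p * total for p in expected_proportions]  # expected counts

# Chi-square statistic: sum of (O - E)^2 / E over the categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus one.
degrees_of_freedom = len(observed) - 1

print(chi_square, degrees_of_freedom)
```

A large statistic relative to the chi-square distribution with these degrees of freedom suggests the observed frequencies do not match the expected proportions.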
Central Limit Theorem (CLT)
States that the mean of a random sample has a sampling distribution whose shape can be approximated by a Normal distribution; the larger the sample, the better the approximation.
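The theorem can be explored with a small simulation. The sketch below repeatedly draws samples from a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen purely as an assumption) and looks at the resulting sample means. The sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the simulation is repeatable

sample_size = 50   # observations per sample
n_samples = 2000   # number of repeated samples

# Draw many samples from a skewed population and record each sample mean.
sample_means = []
for _ in range(n_samples):
    sample = [rng.expovariate(1.0) for _ in range(sample_size)]
    sample_means.append(statistics.mean(sample))

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Theory: sample means centre on the population mean (1.0) with
# standard deviation sigma / sqrt(n) = 1 / sqrt(50).
theoretical_sd = 1.0 / math.sqrt(sample_size)

print(mean_of_means, sd_of_means, theoretical_sd)
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is skewed. Increasing `sample_size` tightens the curve and improves the approximation.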
Confidence Interval (CI)
An interval, computed from sample data, that is expected to contain the population parameter being estimated at a stated level of confidence (for example, 95%).
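One common case is a confidence interval for a mean, sketched below as mean ± z x (s / sqrt(n)). The function name is hypothetical and the data are made up. The sketch also assumes a Normal critical value, whereas small samples would normally use a t critical value instead.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for the population mean (Normal critical value assumed)."""
    n = len(sample)
    centre = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Critical value leaving (1 - confidence) / 2 in each tail, about 1.96 for 95%.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return centre - margin, centre + margin


# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = mean_confidence_interval(sample)

print(lower, upper)
```

Raising the confidence level widens the interval, while a larger sample narrows it because the standard error shrinks.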
Correlation
Measures the strength and direction of the linear association between two numeric variables; it is a number between -1 and 1.
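A minimal sketch of the Pearson correlation coefficient, written out by hand so each step is visible. The function name and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: y rises exactly in step with x, then falls exactly in step.
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])
r_negative = pearson_r(x, [10, 8, 6, 4, 2])

print(r_positive, r_negative)
```

Values near 1 or -1 indicate a strong linear association and values near 0 a weak one. The coefficient says nothing about non-linear patterns.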
Regression Line
The line of best fit through the data; it is used to predict a value of y from a given value of x.
Residuals
The vertical distances between the data points and the regression line, that is, each observed y minus the y predicted by the line.
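The sketch below fits a least-squares regression line to made-up data and computes the residuals. Least squares is assumed here as the fitting method, and the variable names are illustrative.

```python
# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares estimates: slope = S_xy / S_xx, intercept = mean_y - slope * mean_x.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Predicted y for each x, and residual = observed y minus predicted y.
predicted = [intercept + slope * a for a in x]
residuals = [b - p for b, p in zip(y, predicted)]

print(slope, intercept, residuals)
```

For a least-squares line with an intercept, the residuals always sum to zero (up to rounding). Plotting them against x is a common way to check the regression assumptions.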
Linearity
Regression assumption that the relationship between the two variables is linear.
Independence
Regression assumption that the residuals are independent of one another.
Normality
Assumption that residuals are normally distributed.
Equality of Variance (Homoscedasticity)
Assumption that residuals have equal variance.