Vocabulary flashcards covering key terms from the lecture notes on descriptive and inferential statistics.
Descriptive statistics
Characterizes a group using data from the group, focusing on central tendency and variability.
Inferential statistics
Generalizes findings from a sample to a larger population and asks about relationships or differences between variables or groups.
Two Questions:
1.) Is there a relationship between variables?
EXAMPLES:
Is there a relationship between GRE scores and success in a graduate program?
Is there a relationship between a field test and a lab-based measure for a variable of interest?
2.) Is there a difference across conditions?
EXAMPLES:
Is there a difference between age groups, gender, training status, etc.?
Is there a difference before and after an intervention?
Nominal level of measurement (Qualitative)
Categorical data in mutually exclusive categories (e.g., gender, marital status, favorite color). Nonparametric.
Ordinal level of measurement (Qualitative)
Ordered categories; distances between categories are not necessarily equal (e.g., levels of satisfaction, letter grades). Categorical data; nonparametric.
Interval level of measurement (Quantitative)
Equal intervals between data points, so differences are meaningful; no true zero (e.g., SAT scores, IQ). Parametric.
Ratio level of measurement (Quantitative)
Equal intervals with a true zero, so multiplication/division (ratios) are meaningful (e.g., age, height, weight, distance). Parametric.
Qualitative data
Non-numeric, categorical data that describes qualities; often paired with nominal/ordinal levels.
Quantitative data
Numeric data; can be further classified as parametric (usually interval/ratio) or nonparametric.
Central tendency (descriptive statistics)
Measures that describe the center of a data set (mean, median, mode).
Central tendency refers to statistical measures that summarize or capture the center point of a data set, commonly identified by the mean, median, and mode.
Variability (descriptive statistics)
How spread out the data are (range, standard deviation, variance, IQR).
Mean
Arithmetic average; sum of scores divided by the number of scores; population mean = μ, sample mean = X̄.
The mean is a measure of central tendency that represents the arithmetic average of a set of values, calculated by summing all scores and dividing by the total number of scores. In statistics, the population mean is denoted by μ, while the sample mean is represented by X̄.
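A minimal Python sketch (standard-library `statistics` module; the scores are hypothetical) showing the mean as the sum of scores divided by their count:

```python
import statistics

scores = [4, 8, 6, 5, 7]  # hypothetical sample scores

# Arithmetic mean: sum of scores divided by the number of scores
mean_manual = sum(scores) / len(scores)
mean_lib = statistics.mean(scores)

print(mean_manual, mean_lib)  # both equal 6
```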
Median
Middle value in ordered data; for even samples, the average of the two middle values.
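A short sketch of the odd/even rule. The hypothetical `median` helper is written out by hand and compared against `statistics.median`; the data are made up:

```python
import statistics

odd = [3, 1, 7, 5, 9]        # sorted: 1, 3, 5, 7, 9 -> middle value is 5
even = [3, 1, 7, 5, 9, 11]   # sorted: 1, 3, 5, 7, 9, 11 -> (5 + 7) / 2 = 6

def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median(odd), statistics.median(odd))    # 5 5
print(median(even), statistics.median(even))  # 6.0 6.0
```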
Mode
Most frequently occurring value in the distribution.
Not a very useful measure of central tendency: insensitive to large changes in the data set.
Not useful with small data sets.
Not useful with ratio and interval data.
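A sketch of the last point (standard library; data hypothetical). With categorical or ordinal ratings, the mode picks out a clear most common value. With continuous ratio data, exact repeats are rare, so every value ties as "the mode":

```python
import statistics

ratings = [2, 3, 3, 4, 5, 3, 1]    # hypothetical ordinal ratings
print(statistics.mode(ratings))     # 3 (occurs three times)

# Continuous (ratio) measurements rarely repeat exactly, so every value
# ties for most frequent and the mode says little about the center.
heights_cm = [171.2, 168.9, 180.4, 175.0, 162.3]
print(statistics.multimode(heights_cm))  # all five values are returned
```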
Range
Difference between the highest and lowest scores; simple spread measure.
Range = highest value − lowest value
The range is a measure of variability that quantifies the spread of a data set by calculating the difference between the highest and lowest values. It provides a simple indication of the extent of variation in the data.
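The formula above as a one-line Python sketch (scores hypothetical):

```python
scores = [12, 7, 19, 3, 15]             # hypothetical scores
data_range = max(scores) - min(scores)  # highest value - lowest value
print(data_range)                       # 16
```

Because only the two extreme scores enter the calculation, a single outlier can change the range a lot.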
Standard Deviation
Square root of the variance; indicates spread of scores around the mean.
It measures the amount of variation or dispersion in a set of values, helping to understand how much individual scores deviate from the mean.
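A short Python sketch (standard library; data hypothetical) that computes the sample SD as the square root of the sample variance. It uses the n − 1 denominator that `statistics.stdev` also uses:

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; mean = 5

mean = statistics.fmean(scores)
# Sample variance: sum of squared deviations from the mean, divided by n - 1
sample_var = sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)
sample_sd = math.sqrt(sample_var)  # SD is the square root of the variance

print(round(sample_sd, 4), round(statistics.stdev(scores), 4))  # the two agree
```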
Variance
Average of squared distances from the mean; the population variance σ² divides by N, while the sample variance s² divides by n − 1.
It quantifies how much the values in a data set differ from the mean, providing insight into data variability.
High variance indicates that scores are spread far from the mean.
Low variance indicates that most scores cluster tightly around the mean.
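A sketch contrasting low and high variance using two hypothetical data sets with the same mean (50). It shows both the population (divide by N) and sample (divide by n − 1) versions from Python's `statistics` module:

```python
import statistics

tight = [49, 50, 50, 51]   # scores clustered near the mean (50)
spread = [20, 40, 60, 80]  # scores far from the mean (50)

# pvariance divides by N (population); variance divides by n - 1 (sample, s^2)
print(statistics.pvariance(tight), statistics.variance(tight))    # 0.5 and about 0.67
print(statistics.pvariance(spread), statistics.variance(spread))  # 500 and about 666.67
```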
Standard Error of the Mean (SEM)
Standard deviation of the sampling distribution of the mean; SEM = SD/√n.
It estimates how much sample means are expected to vary around the population mean, which is central to inferential statistics.
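SEM = SD/√n as a Python sketch (hypothetical sample of n = 8):

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample, n = 8
sd = statistics.stdev(sample)       # sample standard deviation
sem = sd / math.sqrt(len(sample))   # SEM = SD / sqrt(n)
print(round(sem, 4))                # about 0.756
```

Because n sits under a square root, quadrupling the sample size only halves the SEM.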
Interquartile Range (IQR)
Width of the middle 50% of the distribution; IQR = Q3 − Q1.
A measure of variation for interval/ratio data.
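A sketch of IQR = Q3 − Q1 using `statistics.quantiles` (Python 3.8+; data hypothetical). Quartile conventions differ between software packages, so other tools may report a slightly different IQR. This uses the module's default "exclusive" method:

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical ordered scores

# n=4 returns the three quartile cut points: Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # width of the middle 50% of the distribution
print(q1, q3, iqr)  # 2.25 6.75 4.5
```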
Skewness
A non-normal data distribution; asymmetry of a distribution. Positive skew: tail extends to the right; negative skew: tail extends to the left.
The highest frequencies of scores do not fall centrally, but are shifted right or left.
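One common way to quantify skewness is the moment coefficient g1 = m3 / m2^1.5. Below is a Python sketch of it; the helper name `skewness` and the data are hypothetical, and no small-sample bias correction is applied. The sign gives the direction of the tail:

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 10]   # one large value stretches the right tail
left_tail = [-x for x in right_tail]  # mirror image: tail stretches to the left

print(skewness(right_tail) > 0, skewness(left_tail) < 0)  # True True
```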
Kurtosis
Non-normal data distribution; a vertical shift in the normal curve such that the middle of the curve is elevated or depressed.
Peakedness of a distribution relative to a normal curve.
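A Python sketch of moment-based excess kurtosis, g2 = m4 / m2² − 3, which is 0 for a normal curve. The helper name and both data sets are hypothetical. A negative value means a flatter (depressed) middle; a positive value means a more peaked (elevated) middle with heavier tails:

```python
from statistics import fmean

def excess_kurtosis(values):
    """Excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal curve; no bias correction)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # evenly spread: depressed middle
peaked = [5, 5, 5, 5, 5, 5, 5, 1, 9]  # piled up at the center plus two tail values

print(round(excess_kurtosis(flat), 2), round(excess_kurtosis(peaked), 2))  # -1.23 1.5
```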
Normal distribution
Bell-shaped distribution; many statistics assume normality; about 68% of scores fall within ±1 SD, 95% within ±2 SD, and 99.7% within ±3 SD.
A probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. It is characterized by its bell-shaped curve and defined by its mean and standard deviation.
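The 68-95-99.7 percentages can be checked in Python. For any normal distribution, the share of values within ±k SD of the mean is erf(k/√2):

```python
import math

# Share of a normal distribution lying within ±k SD of the mean: erf(k / sqrt(2))
coverage = {k: math.erf(k / math.sqrt(2)) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within ±{k} SD: {share:.1%}")
```

This prints about 68.3%, 95.4% and 99.7%.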
Z-score
Relates to a percentage (percentile) on the standard normal curve.
Number of standard deviations a value lies above or below the mean; z = (X − μ)/σ.
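A sketch converting a raw score to a z-score and then to a percentile with `statistics.NormalDist` (Python 3.8+). The population mean of 100, SD of 15 and score of 130 are hypothetical:

```python
from statistics import NormalDist

mu, sigma = 100, 15   # hypothetical population mean and SD
x = 130
z = (x - mu) / sigma  # 2.0 standard deviations above the mean

# Share of the standard normal curve below z (the percentile)
percentile = NormalDist().cdf(z)
print(z, round(percentile * 100, 1))  # 2.0 97.7
```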
Confidence interval
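An interval of values computed from the sample that, at a stated confidence level (e.g., 95%), is expected to cover the true population value.
For a 95% confidence interval: if we took many samples and built an interval from each, about 95% of those intervals would contain the true population proportion (or mean).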
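A simulation sketch of the repeated-sampling interpretation. It assumes a hypothetical normal population whose SD is known, which is what allows a z critical value (about 1.96); with an estimated SD, a t critical value would be used instead. The script draws many samples, builds a 95% interval from each, and counts how often the true mean is covered:

```python
import math
import random
from statistics import NormalDist, fmean

TRUE_MEAN, SIGMA, N = 50.0, 10.0, 25  # hypothetical population; sigma assumed known
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
rng = random.Random(42)               # seeded for reproducibility

def z_interval(sample, sigma, z):
    """CI for the mean with known sigma: mean ± z * sigma / sqrt(n)."""
    m = fmean(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return m - half_width, m + half_width

reps = 2000
hits = 0
for _ in range(reps):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    low, high = z_interval(sample, SIGMA, z_crit)
    hits += low <= TRUE_MEAN <= high  # True counts as 1

coverage = hits / reps
print(f"{coverage:.3f} of intervals covered the true mean")  # close to 0.95
```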
Research/alternative hypothesis (HA) (inferential statistics)
There is a relationship or a difference; may be one-sided or two-sided.
Type I error (errors in hypothesis testing)
False positive: falsely concluding there is a relationship or difference when there is none; rejecting H0 when it is true.
Type II error (errors in hypothesis testing)
False negative: falsely concluding there is no relationship or difference when there is one; failing to reject H0 when it is false.
Power = 1 − β (errors in hypothesis testing)
Probability of correctly rejecting a false null hypothesis; researchers often aim for power around 0.80.
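A Python sketch of power for the simplest case. It is chosen for illustration and assumes a one-sided, one-sample z test with known σ and a hypothetical standardized effect size d = 0.5. Under these assumptions, power = Φ(d√n − z₁₋α), and the n needed for a target power follows from the same formula. Real studies typically use t tests or dedicated power software:

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_one_sided_z(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a one-sided, one-sample z test with known sigma.
    effect_size is d = (mu_A - mu_0) / sigma."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(effect_size * math.sqrt(n) - z_crit)

def n_for_power(effect_size, target=0.80, alpha=0.05):
    """Smallest n reaching the target power (same z-test assumptions)."""
    z_sum = STD_NORMAL.inv_cdf(1 - alpha) + STD_NORMAL.inv_cdf(target)
    return math.ceil((z_sum / effect_size) ** 2)

print(round(power_one_sided_z(0.5, 25), 3))  # about 0.80
print(n_for_power(0.5))                      # 25
```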
Sampling
Process of selecting a subset of a population for study to make inferences about the population.
The sample should be large enough to be representative of the population, yet small enough to be practical.
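A simple random sample, in which every member is equally likely to be chosen, is one common way to aim for a representative sample. The Python sketch below uses a hypothetical population of 1,000 ID numbers and a hypothetical sample size of 50:

```python
import random
from statistics import fmean

rng = random.Random(2024)              # seeded for reproducibility
population = list(range(1, 1001))      # hypothetical population: 1,000 ID numbers
sample = rng.sample(population, k=50)  # simple random sample, no repeats

print(fmean(population), fmean(sample))  # the sample mean estimates the population mean
```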
Parametric data
Data assumed to come from a population with a defined distribution (often normal) described by parameters such as the mean and variance.
Nonparametric data
Data not assumed to come from a specific distribution; often ordinal or nominal.
Null hypothesis (H0) (in inferential statistics)
There is no relationship or no difference.
In practice, the hypothesis that is tested is H0; researchers seek evidence to reject H0.
Truth table concept (inferential statistics)
A mathematical table used to determine the validity of logical expressions by showing all possible truth values of their variables.
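In hypothesis testing: a table crossing whether H0 is actually true or false with whether H0 is rejected or not. Rejecting a true H0 is a Type I error, failing to reject a false H0 is a Type II error, and the other two cells are correct decisions.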
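The four cells of the decision table as a small Python lookup. The names `OUTCOMES` and `classify` are hypothetical, chosen only for this sketch:

```python
# Keys: (is H0 actually true?, did we reject H0?)
OUTCOMES = {
    (True, True): "Type I error (false positive), probability alpha",
    (True, False): "Correct decision, probability 1 - alpha",
    (False, True): "Correct decision (power), probability 1 - beta",
    (False, False): "Type II error (false negative), probability beta",
}

def classify(h0_true: bool, rejected: bool) -> str:
    """Return the cell of the hypothesis-testing decision table."""
    return OUTCOMES[(h0_true, rejected)]

for h0_true in (True, False):
    for rejected in (True, False):
        print(f"H0 true={h0_true!s:5} reject={rejected!s:5} -> {classify(h0_true, rejected)}")
```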
P-values and significance: (inferential statistics)
P-value indicates the probability of obtaining the observed data (or more extreme data) if H0 is true.
If p ≤ α (commonly α = 0.05), reject H0 and report statistical significance.
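A sketch of the decision rule for a test statistic that is standard normal under H0 (a z test, assumed here for simplicity). It computes the two-sided p-value and compares it with α = 0.05 for three hypothetical z values:

```python
from statistics import NormalDist

def two_sided_p(z):
    """P(|Z| >= |z|) under H0 for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
for z in (1.50, 1.96, 2.58):
    p = two_sided_p(z)
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"z = {z:.2f}: p = {p:.4f} -> {decision}")
```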
Cautions:
P-values depend on the data and on sample size.
They do not measure the probability that H0 is true; they reflect how compatible the data are with H0.
Multiple testing can inflate Type I error (problem of multiple comparisons).
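A sketch of how quickly the Type I error inflates. The calculation assumes m independent tests, all with a true H0, each run at α = 0.05. It also shows the Bonferroni correction (α/m per test), which is one standard remedy named here for illustration rather than one taken from the lecture:

```python
alpha = 0.05
fwer_by_m = {}
for m in (1, 5, 10, 20):
    # Chance of at least one false positive across m independent tests when every H0 is true
    fwer_by_m[m] = 1 - (1 - alpha) ** m
    bonferroni = alpha / m  # per-test threshold keeping the familywise rate at or below alpha
    print(f"{m:2d} tests: P(>=1 Type I error) = {fwer_by_m[m]:.3f}, Bonferroni alpha = {bonferroni:.4f}")
```

With 20 tests, the chance of at least one false positive is already about 64%.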
Clinical relevance vs statistical significance:
A finding can be statistically significant yet not clinically meaningful, indicating that results may not have practical implications in real-world settings.
Researchers should argue explicitly for clinical relevance; a small p-value alone does not establish it, and the significance threshold itself is a convention rather than a strict objective cutoff.
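A Python sketch of how a very large sample can make a trivial effect statistically significant. The scenario is hypothetical: a 0.5-point mean change on a scale with SD = 10, n = 10,000, and a one-sample z test. The effect size d = 0.05 is far below the 0.2 that Cohen's common rule of thumb labels "small":

```python
import math
from statistics import NormalDist

# Hypothetical: an intervention shifts a score by 0.5 points on a scale with SD = 10
mean_change, sd, n = 0.5, 10.0, 10_000

z = mean_change / (sd / math.sqrt(n))  # one-sample z statistic, about 5
p = 2 * (1 - NormalDist().cdf(z))      # two-sided p-value, far below 0.05
cohens_d = mean_change / sd            # standardized effect size, 0.05

print(f"z = {z:.1f}, p = {p:.1e}, d = {cohens_d:.2f}")
```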
Probability
If a coin has equal chances of heads/tails, the probability of heads is 0.5 (50%).
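Probability can also be read as a long-run relative frequency. The sketch below simulates a hypothetical 10,000 fair-coin flips with a seeded generator; the proportion of heads lands near 0.5:

```python
import random

rng = random.Random(7)  # seeded so the run is reproducible
flips = 10_000
heads = sum(rng.random() < 0.5 for _ in range(flips))  # each flip: heads with probability 0.5
print(heads / flips)  # close to 0.5; the gap tends to shrink as flips grows
```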
Recap on inference workflow:
State H0 and HA.
Choose a test statistic and significance level α.
Compute p-value or test statistic.
Compare to critical value or use p-value to make a decision.
Report CI and effect size to discuss practical significance.
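The workflow above as one Python sketch. It uses a two-sided one-sample z test, which assumes the population SD is known; that assumption is made only to keep the example in the standard library. The scores, μ0 = 100 and σ = 15 are hypothetical. The function returns the test statistic, p-value, decision, confidence interval and effect size:

```python
import math
from statistics import NormalDist, fmean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z test with known sigma.
    H0: mu = mu0    HA: mu != mu0"""
    n = len(sample)
    mean = fmean(sample)
    se = sigma / math.sqrt(n)                     # standard error of the mean
    z = (mean - mu0) / se                         # test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided p-value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05
    ci = (mean - z_crit * se, mean + z_crit * se) # (1 - alpha) confidence interval
    d = (mean - mu0) / sigma                      # effect size (Cohen's d)
    return {"z": z, "p": p, "reject_h0": p <= alpha, "ci": ci, "d": d}

# Hypothetical scores; H0 says the population mean is 100 (sigma assumed known = 15)
scores = [112, 98, 105, 120, 101, 99, 115, 108, 110, 104]
result = one_sample_z_test(scores, mu0=100, sigma=15)
print(result)
```

Here p ≈ 0.13, so H0 is not rejected, and the 95% CI (about 97.9 to 116.5) includes 100. The effect size (d ≈ 0.48) is still worth reporting when discussing practical significance.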