Statistics Vocabulary Flashcards (Data collection, sampling, data representation, measures, correlation, probability, hypothesis testing, distributions)

Description and Tags

A set of vocabulary flashcards covering data collection, sampling methods, data types, data representation, measures of location and spread, box plots, cumulative frequency, histograms, correlation, regression, probability, and hypothesis testing.

74 Terms

Population

The entire set of items or individuals of interest in a study.

Census

Observing or measuring every member of the population.

Sample

A subset of the population used to estimate information about the population.

Sampling frame

A list of sampling units from which a sample is drawn.

Sampling units

Individual items in the population that can be sampled.

Simple random sample

Every possible sample of size n has an equal chance of being chosen; requires a sampling frame.

Lottery sampling

A method of simple random sampling where sample units are drawn like tickets from a hat.
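
A minimal Python sketch of lottery-style simple random sampling, assuming the sampling frame is held in a list: the standard library's `random.sample` draws distinct units without replacement, so every possible sample of the chosen size is equally likely, like drawing tickets from a hat. The frame of 50 students is hypothetical.

```python
import random

# Hypothetical sampling frame: 50 numbered students
sampling_frame = [f"Student {i}" for i in range(1, 51)]

# Draw 5 distinct units without replacement ("tickets from a hat");
# every possible set of 5 units is equally likely to be chosen
sample = random.sample(sampling_frame, 5)
print(sample)
```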

Systematic sampling

Selects elements at regular intervals from an ordered list: every k-th item, where k = population size ÷ sample size, starting from an item chosen at random among the first k.
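
A sketch of systematic sampling in Python, assuming the sampling interval is k = population size ÷ sample size (rounded down) and that the starting point is chosen at random from the first k units, which is a common convention. The function `systematic_sample` and the numbered frame are hypothetical.

```python
import random

def systematic_sample(population, n, rng=random):
    """Take every k-th item from an ordered list, starting at a random position.

    Assumes len(population) >= n; k is the population size divided by n, rounded down.
    """
    k = len(population) // n            # sampling interval
    start = rng.randrange(k)            # random start among the first k items
    return [population[start + i * k] for i in range(n)]

units = list(range(1, 101))             # hypothetical frame: units numbered 1 to 100
print(systematic_sample(units, 10))     # 10 units, each 10 apart
```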

Stratified sampling

Population is divided into strata; random samples are taken from each, proportionate to stratum size.
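
Proportional allocation can be sketched as below, assuming each stratum's share of the sample is n × (stratum size ÷ population size), rounded to a whole number, followed by simple random sampling inside each stratum. The function `stratified_sample` and the school data are hypothetical.

```python
import random

def stratified_sample(strata, n, rng=random):
    """Proportional stratified sampling.

    strata: dict mapping stratum name -> list of members (hypothetical structure).
    Each stratum contributes round(n * stratum size / population size) members,
    chosen by simple random sampling within that stratum.
    """
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / population_size)  # round() sends exact halves to the even integer
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical school: 120 pupils in Year 12, 80 in Year 13; total sample of 20
pupils = {
    "Year 12": [f"Y12-{i}" for i in range(120)],
    "Year 13": [f"Y13-{i}" for i in range(80)],
}
result = stratified_sample(pupils, 20)
print({name: len(chosen) for name, chosen in result.items()})  # {'Year 12': 12, 'Year 13': 8}
```

Because of rounding, the stratum sample sizes may not always add up exactly to n; in that case they are adjusted by hand.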

Quota sampling

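Non-random sampling in which the population is divided into groups by characteristic and a quota is set for each group to reflect its share of the population; the interviewer selects people until every quota is filled.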

Opportunity sampling

A sample of people who are available at the time of the study and who fit the criteria (also called convenience sampling).

Non-random sampling

Sampling methods that do not use random selection (e.g., quota, opportunity).

Qualitative data

Non-numeric data such as hair colour or types of occupation.

Quantitative data

Numeric data that can be measured or counted.

Discrete data

Quantitative data that take only specific values (e.g., number of students).

Continuous data

Quantitative data that can take any value within a range (e.g., height, time).

Grouped data

Data organised into classes or intervals, each with a frequency.

Class boundaries

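The lower and upper values that define a class in a grouped frequency table and separate it from the adjacent classes; e.g., for whole-number data, the class 10–19 has class boundaries 9.5 and 19.5.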

Midpoint

The average of the class boundaries; used as a representative value for a class.

Class width

Difference between the upper and lower class boundaries.
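
A short Python sketch of these grouped-data quantities, assuming each class is given by its (lower, upper) class boundaries. Using midpoints as representative values also gives the usual estimate of the mean of grouped data, Σfx ÷ Σf with x the class midpoint. The function name and example classes are hypothetical.

```python
def class_summary(boundaries, frequencies):
    """boundaries: list of (lower, upper) class boundaries; frequencies: matching counts."""
    widths = [upper - lower for lower, upper in boundaries]         # class widths
    midpoints = [(lower + upper) / 2 for lower, upper in boundaries]
    # Estimated mean: every value in a class is represented by its midpoint
    est_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / sum(frequencies)
    return widths, midpoints, est_mean

bounds = [(0, 10), (10, 20), (20, 40)]   # hypothetical classes
freqs = [4, 10, 6]
print(class_summary(bounds, freqs))      # ([10, 10, 20], [5.0, 15.0, 30.0], 17.5)
```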

Frequency table

Table listing class intervals and their frequencies.

Raw data

Original data before any summarisation or grouping.

Large data set

A large real-world data set (e.g., weather data) with multiple variables, used to practise sampling and statistical techniques.

Sampling units

Individual items that are sampled from the population (often numbered or named).

Data type: qualitative

Non-numeric data (e.g., hair colour, species).

Data type: quantitative

Numeric data that can be measured or counted.

Data type: discrete

Data that can take only specific, separate values, often whole-number counts.

Data type: continuous

Data that can take any value within a range (measurements).

Class boundaries (grouped data)

The actual lower and upper limits of a class interval in a grouped distribution.

Frequency density

Frequency ÷ class width; gives the height of each bar in a histogram, so that the area of the bar corresponds to the frequency. Needed when class widths vary.
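
Frequency density is frequency ÷ class width, so each bar's area (density × width) equals the class frequency. A minimal sketch, using hypothetical classes of unequal width given as (lower, upper) boundaries:

```python
import math

def frequency_densities(boundaries, frequencies):
    """Frequency density = frequency / class width for each class."""
    return [f / (upper - lower) for (lower, upper), f in zip(boundaries, frequencies)]

bounds = [(0, 10), (10, 20), (20, 40)]   # hypothetical classes (unequal widths)
freqs = [4, 10, 6]
densities = frequency_densities(bounds, freqs)
print(densities)                          # [0.4, 1.0, 0.3]

# Check: bar area (density x width) recovers each frequency
areas = [d * (upper - lower) for d, (lower, upper) in zip(densities, bounds)]
print(all(math.isclose(a, f) for a, f in zip(areas, freqs)))   # True
```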

Histogram

A graph of grouped continuous data in which the area of each bar is proportional to the frequency of its class.

Frequency polygon

A line graph joining the midpoints of the tops of the histogram bars.

Box plot

A graphical representation showing min, Q1, median (Q2), Q3 and max, with possible outliers.

Outlier

An observation far from the pattern of the rest of the data; a common rule flags values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.

Interquartile range (IQR)

Difference between Q3 and Q1; a measure of spread.
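
A sketch of the common 1.5 × IQR rule for flagging outliers, assuming Q1 and Q3 have already been found (methods for calculating quartiles differ, so they are passed in rather than computed). The multiplier k = 1.5 is a convention rather than a fixed definition, and exam questions usually state the rule to use; the function names and data are hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1                       # interquartile range
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3, k=1.5):
    """Values strictly outside the fences are flagged as outliers."""
    lower, upper = outlier_fences(q1, q3, k)
    return [x for x in data if x < lower or x > upper]

data = [3, 5, 6, 7, 8, 9, 10, 25]       # hypothetical data
q1, q3 = 5.5, 9.5                       # medians of the lower and upper halves
print(outlier_fences(q1, q3))           # (-0.5, 15.5)
print(find_outliers(data, q1, q3))      # [25]
```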

Quartiles

Q1 (lower quartile), Q2 (median), Q3 (upper quartile).

P10, P90

10th and 90th percentiles: the values below which 10% and 90% of the data lie.

Percentile

A value below which a given percentage of data falls.

Cumulative frequency diagram

Plot of cumulative frequency against upper class boundary, used to estimate the median, quartiles and percentiles from the graph.

Measures of central tendency

Statistics that describe the centre of a data set (mean, median, mode).

Mean (x-bar)

Average value: x̄ = (sum of data values)/n.

Median

Middle value when data are arranged in order (or the average of the two middle values for even n).

Mode

Most frequent value in the data (or modal class in grouped data).

Variance

The average of the squared deviations from the mean: σ² = Σ(x − x̄)² / n = Σx²/n − x̄² (divisor n, treating the data as a population).

Standard deviation

The square root of the variance; a measure of spread in the same units as the data.
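
Mean, variance and standard deviation can be checked with a few lines of Python. This sketch uses divisor n, matching the variance formula above; the standard library's `statistics.pvariance` and `statistics.pstdev` use the same divisor, whereas `statistics.variance` and `statistics.stdev` divide by n − 1. The data values are hypothetical.

```python
import math

def mean_var_sd(data):
    """Mean, variance (divisor n) and standard deviation of a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n   # Σ(x − x̄)² / n
    return mean, variance, math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data
mean, variance, sd = mean_var_sd(data)
print(mean, variance, sd)             # 5.0 4.0 2.0

# Equivalent computational form: Σx²/n − x̄²
print(sum(x * x for x in data) / len(data) - mean ** 2)   # 4.0
```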

Coding a data set

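Transforming data using y = a + bx to simplify calculations. The mean follows the same rule (ȳ = a + bx̄), while spread is unaffected by adding a and is scaled by |b|: the standard deviation of y is |b| times that of x.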
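
The coding rules can be demonstrated numerically. This sketch assumes the coding y = a + bx with hypothetical values a = −10 and b = 2; it checks that ȳ = a + b·x̄ and that the (divisor n) standard deviation of y is |b| times that of x, so adding a constant leaves spread unchanged. The helper name `code` is hypothetical.

```python
import math
import statistics

def code(data, a, b):
    """Apply the coding y = a + b*x to every value."""
    return [a + b * x for x in data]

x = [12, 15, 18, 21, 24]         # hypothetical data
a, b = -10, 2
y = code(x, a, b)                # [14, 20, 26, 32, 38]

# The mean follows the same rule: ȳ = a + b·x̄ (here x̄ = 18, ȳ = 26)
print(statistics.mean(y) == a + b * statistics.mean(x))   # True
# Adding a does not change spread; multiplying by b scales it by |b|
print(math.isclose(statistics.pstdev(y), abs(b) * statistics.pstdev(x)))  # True
```

To decode, the rules are reversed: x̄ = (ȳ − a) ÷ b, and the standard deviation of x is that of y divided by |b|.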

Box plot features

Whiskers show min and max (excluding outliers); box spans Q1 to Q3; line at median; outliers plotted separately.

Skewness

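Asymmetry of the data distribution, reflected in the position of the median within the box plot: positive skew when Q3 − Q2 > Q2 − Q1 (median nearer the lower quartile), negative skew when Q3 − Q2 < Q2 − Q1.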
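
One common way to describe skew from a box plot compares the two parts of the box: positive skew when Q3 − Q2 > Q2 − Q1, negative skew when Q3 − Q2 < Q2 − Q1, and symmetrical when they are equal. A minimal sketch with a hypothetical function name and quartile values:

```python
def skew_from_quartiles(q1, q2, q3):
    """Classify skew by comparing the lower part (Q2 - Q1) and upper part (Q3 - Q2) of the box."""
    lower_part, upper_part = q2 - q1, q3 - q2
    if upper_part > lower_part:
        return "positive skew"
    if upper_part < lower_part:
        return "negative skew"
    return "symmetrical"

print(skew_from_quartiles(10, 12, 20))  # positive skew
print(skew_from_quartiles(10, 18, 20))  # negative skew
```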

Correlation

The linear relationship between two variables; the correlation coefficient r measures its strength and direction, with −1 ≤ r ≤ 1.
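
The correlation coefficient r for paired data is usually the product moment correlation coefficient (PMCC), r = Sxy ÷ √(Sxx · Syy), where Sxx = Σ(x − x̄)², Syy = Σ(y − ȳ)² and Sxy = Σ(x − x̄)(y − ȳ). A sketch in Python with hypothetical data:

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

print(pmcc([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))   # ≈ 0.775 (positive correlation)
print(pmcc([1, 2, 3], [6, 4, 2]))               # -1.0 (perfect negative correlation)
```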

Scatter diagram

Plot of paired data points (x, y) to assess relationships between two variables.

Regression line (least squares)

Line y = a + bx that minimises the sum of the squared vertical distances (residuals) between the data points and the line.
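
For the least squares line of y on x, the gradient is b = Sxy ÷ Sxx and the intercept is a = ȳ − b·x̄, so the line passes through (x̄, ȳ). The sketch below also includes a hypothetical `predict` helper that reports whether a prediction is interpolation (inside the observed x range) or extrapolation; the data are hypothetical.

```python
def least_squares(xs, ys):
    """Fit y = a + bx by minimising the sum of squared vertical distances (residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx        # gradient
    a = my - b * mx      # the line passes through (x̄, ȳ)
    return a, b

def predict(a, b, x, xs):
    """Return the predicted y and whether x lies inside the observed range (interpolation)."""
    return a + b * x, min(xs) <= x <= max(xs)

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]     # hypothetical data
a, b = least_squares(xs, ys)                   # a ≈ 2.2, b ≈ 0.6
print(predict(a, b, 3.5, xs))                  # interpolation: y ≈ 4.3, True
print(predict(a, b, 10, xs))                   # extrapolation: y ≈ 8.2, False
```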

Independent variable (explanatory)

The variable that is purposely changed or used to explain changes in the other variable.

Dependent variable (response)

The outcome measured in a study, believed to depend on the independent variable.

Prediction (interpolation)

Estimating a value within the range of observed data using the regression line.

Extrapolation

Predicting a value outside the range of observed data; usually less reliable.

Binomial distribution

X ~ B(n, p): X counts the number of successes in a fixed number n of independent trials, each with two outcomes (success, failure) and the same probability p of success.

Probability mass function (pmf)

P(X = x), giving the probability that a discrete random variable X takes the value x; for X ~ B(n, p), P(X = x) = C(n, x) p^x (1 − p)^(n − x).
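
For X ~ B(n, p), the pmf is P(X = x) = C(n, x) p^x (1 − p)^(n − x), where C(n, x) counts the ways of choosing which x of the n trials are successes. A sketch using Python's `math.comb` (available from Python 3.8); the function name is hypothetical:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

print(binom_pmf(3, 10, 0.5))                          # 0.1171875 (= 120/1024)
print(sum(binom_pmf(x, 10, 0.3) for x in range(11)))  # ≈ 1: the probabilities sum to one
```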

B(n, p) mean

Mean of a binomial distribution: np.

B(n, p) variance

Variance of a binomial distribution: np(1−p).
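
The formulas np and np(1 − p) can be checked against the definitions E(X) = Σ x·P(X = x) and Var(X) = Σ (x − E(X))²·P(X = x). A sketch, with n = 20 and p = 0.3 chosen arbitrarily and hypothetical function names:

```python
from math import comb, isclose

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_mean_var(n, p):
    """Mean and variance computed directly from the pmf over x = 0, 1, ..., n."""
    mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
    var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))
    return mean, var

n, p = 20, 0.3
mean, var = binom_mean_var(n, p)
print(isclose(mean, n * p), isclose(var, n * p * (1 - p)))  # True True
```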

Cumulative probability (binomial CD)

Probabilities P(X ≤ x), calculated from binomial cumulative distribution tables or a calculator; for whole-number x, P(X < x) = P(X ≤ x − 1).
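
A cumulative probability is a running total of the pmf. This sketch computes P(X ≤ x) for X ~ B(n, p) and derives P(X < x) and P(X > x) from it, using the fact that X only takes whole-number values; the function name and example values are hypothetical.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X ≤ x) for X ~ B(n, p): add the pmf for 0, 1, ..., x."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

n, p = 10, 0.5
print(binom_cdf(3, n, p))        # P(X ≤ 3) = 0.171875
print(binom_cdf(3 - 1, n, p))    # P(X < 3) = P(X ≤ 2) = 0.0546875
print(1 - binom_cdf(3, n, p))    # P(X > 3) = 1 − P(X ≤ 3) = 0.828125
```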

Null hypothesis (H0)

The statement about a population parameter that is assumed to be true and then tested, e.g., p = p0.

Alternative hypothesis (H1)

The statement describing how the parameter is believed to differ from its value under H0, e.g., p > p0, p < p0 or p ≠ p0.

Test statistic

A quantity calculated from sample data used to decide whether to reject H0.

Significance level

Probability threshold (e.g., 0.05) used to decide whether to reject H0.

Critical region

Set of values of the test statistic that lead to rejection of H0.
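
For a one-tailed binomial test, the critical region can be found by searching for the most extreme values whose total probability under H0 does not exceed the significance level. A sketch, assuming H0: p = p0 and X ~ B(n, p0); the function names are hypothetical, and the example uses n = 20, p0 = 0.5 and a 5% level.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X ≤ x) for X ~ B(n, p); equals 0 when x < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def lower_critical_value(n, p0, alpha):
    """Largest c with P(X ≤ c) ≤ alpha under H0 (H1: p < p0).

    The critical region is X ≤ c; returns None if no value qualifies.
    """
    c = None
    for x in range(n + 1):
        if binom_cdf(x, n, p0) <= alpha:
            c = x
        else:
            break                 # the cdf only increases, so stop at the first failure
    return c

def upper_critical_value(n, p0, alpha):
    """Smallest c with P(X ≥ c) ≤ alpha under H0 (H1: p > p0).

    The critical region is X ≥ c; returns None if no value qualifies.
    """
    for x in range(n + 1):
        if 1 - binom_cdf(x - 1, n, p0) <= alpha:   # P(X ≥ x) = 1 − P(X ≤ x − 1)
            return x
    return None

print(lower_critical_value(20, 0.5, 0.05))   # 5, i.e. critical region X ≤ 5
print(upper_critical_value(20, 0.5, 0.05))   # 15, i.e. critical region X ≥ 15
print(round(binom_cdf(5, 20, 0.5), 4))       # 0.0207, the actual significance level
```

Because X is discrete, the actual significance level (the probability of landing in the critical region when H0 is true) is usually below the nominal level, here about 2.07% rather than 5%.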

Acceptance region

Values of the test statistic that do not lead to rejection of H0.

One-tailed test

Hypothesis test where the alternative specifies a direction (>, <).

Two-tailed test

Hypothesis test where the alternative does not specify a direction (≠); the significance level is split equally between the two tails.
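
For a two-tailed test, a common convention is to allow at most half the significance level in each tail. A sketch under that assumption, with a hypothetical function name and example values n = 20, p0 = 0.3 at the 5% level:

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X ≤ x) for X ~ B(n, p); equals 0 when x < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def two_tailed_critical_region(n, p0, alpha):
    """Return (c1, c2) so the critical region is X ≤ c1 or X ≥ c2,
    with each tail probability at most alpha / 2 under H0: p = p0.
    Either value is None if that tail is empty."""
    half = alpha / 2
    c1 = None
    for x in range(n + 1):
        if binom_cdf(x, n, p0) <= half:
            c1 = x                      # P(X ≤ x) is still within alpha / 2
        else:
            break
    c2 = None
    for x in range(n + 1):
        if 1 - binom_cdf(x - 1, n, p0) <= half:
            c2 = x                      # smallest x with P(X ≥ x) ≤ alpha / 2
            break
    return c1, c2

print(two_tailed_critical_region(20, 0.3, 0.05))   # (1, 11): reject H0 if X ≤ 1 or X ≥ 11
```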

P-value

The probability, under H0, of obtaining a test statistic at least as extreme as the one observed.
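
A p-value for a binomial test can be computed directly from the pmf. This sketch assumes a one-tailed test with X ~ B(n, p0) under H0; the function names and the observed value x = 15 from n = 20 trials with p0 = 0.5 are hypothetical. H0 is rejected when the p-value is at most the significance level.

```python
from math import comb

def p_value_upper(x_obs, n, p0):
    """One-tailed p-value for H1: p > p0, i.e. P(X ≥ x_obs) under H0."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x_obs, n + 1))

def p_value_lower(x_obs, n, p0):
    """One-tailed p-value for H1: p < p0, i.e. P(X ≤ x_obs) under H0."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x_obs + 1))

pv = p_value_upper(15, 20, 0.5)
print(round(pv, 4), pv <= 0.05)   # 0.0207 True, so reject H0 at the 5% level
```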

Interpolation

Estimating a value within the range of observed data inside a class interval.
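
Interpolation inside a class interval is how the median, quartiles and percentiles are usually estimated from grouped data: assume the values in the class are spread evenly, then move the required fraction of the way through it. A sketch, assuming classes are given as (lower, upper) boundaries with non-zero frequencies and that the position of the qth fraction is q × n (so n/2 for the median, a convention often used for grouped continuous data; courses may differ). The function name and data are hypothetical.

```python
def grouped_percentile(boundaries, frequencies, q):
    """Estimate the value below which a fraction q of the data lies (q = 0.5 gives the median)."""
    n = sum(frequencies)
    position = q * n                       # e.g. n/2 for the median
    cum_before = 0                         # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if cum_before + f >= position:
            # Linear interpolation: values assumed evenly spread across the class
            return lower + (position - cum_before) * (upper - lower) / f
        cum_before += f
    raise ValueError("q must be between 0 and 1")

bounds = [(0, 10), (10, 20), (20, 40)]     # hypothetical classes
freqs = [4, 10, 6]
print(grouped_percentile(bounds, freqs, 0.5))    # median: 16.0
print(grouped_percentile(bounds, freqs, 0.25))   # Q1: 11.0
print(grouped_percentile(bounds, freqs, 0.75))   # Q3: ≈ 23.3
```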

Extrapolation (regression)

Estimating a value outside the observed data range; less reliable.

Beaufort scale

Descriptive scale for wind speed, from 0 (calm) to 12 (hurricane force), used with large data sets such as weather data.

Raw data vs summary statistics

Raw data are the original measurements; summary statistics (mean, median, etc.) describe the data.

Cleaning data (outliers)

Outliers may be genuine values or errors; an anomaly should only be removed when there is a justified reason, such as a clear recording error.