sac 1 statistics & data analysis terms T1-4

Description and Tags

This flashcard set includes key vocabulary and concepts in statistics and data analysis with their definitions for exam preparation.

Last updated 6:02 AM on 4/26/26

75 Terms

68-95-99.7% Rule

A rule stating that, in a normal distribution, approximately 68%, 95%, and 99.7% of values lie within one, two, and three standard deviations of the mean, respectively.
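
As a rough illustration of the rule, the sketch below turns a mean and standard deviation into the three intervals. The function name and the example figures (mean 170, standard deviation 8) are made up for illustration and are not from the card set.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals expected to hold about 68%, 95% and 99.7% of values
    when the data are approximately normally distributed."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }

# Hypothetical example: heights with mean 170 cm and standard deviation 8 cm.
intervals = empirical_rule_intervals(mean=170, sd=8)
print(intervals["95%"])  # about 95% of heights fall in this interval
```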

Allocation

The process of assigning tasks to different members of a group so that the work is completed efficiently.

Bar Chart

A statistical chart used to display the frequency distribution of categorical data.

Bivariate Data

Data where each observation records information about two variables for the same subject.

Boxplot

A graphical display of the five-number summary showing outliers if present.

Categorical Variable

A variable that represents a quality or characteristic of an individual and sorts individuals into categories, such as eye color or place of birth.

Centre of Distribution

A measure of the location of the middle of a distribution, such as the mean or median.

Centring

When smoothing over an even number of points, the process of averaging adjacent smoothed values so that they align with the original time points.
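
A minimal sketch of centring, assuming a four-point moving mean: each four-point mean falls between two time points, so averaging neighbouring means pulls the result back onto a time point. The function names and the data are hypothetical.

```python
def moving_means(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

def centred_moving_means(values, window):
    """Average adjacent moving means so that an even window lines up
    with the original time points."""
    means = moving_means(values, window)
    return [(a + b) / 2 for a, b in zip(means, means[1:])]

sales = [10, 14, 12, 16, 18, 20]          # made-up data for illustration
four_point = moving_means(sales, 4)       # each mean sits between two time points
centred = centred_moving_means(sales, 4)  # aligned with the 3rd and 4th values
print(four_point, centred)
```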

Coefficient of Determination (r²)

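The proportion of the variation in the response variable that is explained by the variation in the explanatory variable; a measure of the predictive power of a regression line.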

Continuous Variable

A numerical variable representing a quantity that is measured rather than counted.

Correlation Coefficient (r)

A statistical measure of the strength and direction of the linear association between two numerical variables, taking values from -1 to 1.
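
The sketch below computes r from the sums of squared and cross deviations, then squares it to give the coefficient of determination. The function name and the data (hours studied against test score) are invented for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired numerical data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]    # hypothetical explanatory variable
scores = [2, 4, 5, 4, 5]   # hypothetical response variable
r = correlation(hours, scores)
r_squared = r ** 2         # coefficient of determination
print(round(r, 3), round(r_squared, 3))
```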

Cycle (Time Series)

Periodic movement in a time series over a period greater than a year.

Data Transformations

Using a mathematical rule to change the scale on an axis to linearize a scatterplot.

Deseasonalise

The process of removing seasonality from a time series.

Discrete Variable

A numerical variable determined by counting, such as the number of people in a queue.

Dot Plot

A statistical graph using dots to display individual data values on a number line.

Explanatory Variable (EV)

In bivariate data, the variable used to explain or predict the response variable's value.

Extrapolation

Using a model to make predictions outside the range of the original data.

Five-number Summary

A list including the minimum, first quartile, median, third quartile, and maximum.
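
One way to compute the five-number summary is sketched below. It assumes the "median of each half" convention for quartiles (the median itself is left out of both halves when the count is odd); calculators and software may use slightly different quartile rules. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # values below the median position
    upper = data[(n + 1) // 2:]     # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

summary = five_number_summary([7, 2, 12, 9, 4, 30, 5, 11, 8])
print(summary)
```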

Frequency Table

A table listing the values a variable takes along with how often each value occurs.

Histogram

A statistical graph for displaying the frequency distribution of a numerical variable.

Interpolation

Using a regression line to make predictions within the range of explanatory variable values.

Interquartile Range (IQR)

Defined as IQR = Q3 - Q1, it measures the spread of the middle 50% of the data.

Irregular Fluctuations

Unpredictable fluctuations present in any real-world time series.

Least Squares Method

A technique for finding regression line equations that minimizes the sum of squares of residuals.
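
A small sketch of the least squares calculation for a line y = a + bx, using the standard closed-form formulas for the slope and intercept. The function name and the data are illustrative assumptions.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line that minimizes the sum of
    squared residuals for the paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope

# Made-up data for illustration.
intercept, slope = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(intercept, 2), round(slope, 2))
```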

Linear Regression

The process of fitting a straight line to bivariate data.

Log Scale

A scale used to transform skewed histograms to symmetry or linearize scatterplots.

Logarithmic Transformations

Transformations that linearize scatterplots by compressing the upper end of the scale.
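
To see the compression, the sketch below applies a base-10 log to values that grow tenfold each step. The data are invented; after the transformation the values are roughly evenly spaced.

```python
from math import log10

x_values = [1, 10, 100, 1000]           # made-up values spanning several orders of magnitude
log_x = [log10(x) for x in x_values]    # large values are pulled in far more than small ones
print(log_x)
```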

Lower Fence

A threshold for identifying low outliers, calculated as Q1 - 1.5 × IQR; values below it are classed as outliers.
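
The sketch below assumes the common 1.5 × IQR convention and also computes the matching upper fence (Q3 + 1.5 × IQR), which is not a card in this set. The quartile values and data are hypothetical.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR convention."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical quartiles for the made-up data below.
lower_fence, upper_fence = fences(q1=4.5, q3=11.5)
data = [2, 4, 5, 7, 8, 9, 11, 12, 30]
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```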

Mean (x̄)

The balance point of a data distribution, calculated as x̄=∑x / n.

Median (M)

The middle value in a data distribution that divides an ordered dataset into two equal parts.

Modal Category

The category or interval that occurs most frequently in a dataset.

Mode

The value that occurs most frequently in a dataset.

Modelling

The use of a mathematical rule to represent real-life situations.

Moving Mean Smoothing

A technique where original data values are replaced by the means of values around them.

Moving Median Smoothing

Smoothing a time series plot using moving medians instead of means.
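
Moving mean and moving median smoothing differ only in the summary applied to each window, which the sketch below makes explicit. It assumes an odd window size (so no centring is needed); the function name and data are illustrative.

```python
from statistics import mean, median

def smooth(values, window=3, summary=mean):
    """Replace each interior value with the mean (or median) of the
    odd-sized window centred on it. End values have no full window."""
    half = window // 2
    return [
        summary(values[i - half:i + half + 1])
        for i in range(half, len(values) - half)
    ]

temps = [10, 13, 16, 12, 17, 20, 14]        # made-up time series
mean_smoothed = smooth(temps, 3, mean)
median_smoothed = smooth(temps, 3, median)
print(mean_smoothed, median_smoothed)
```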

Negatively Skewed Distribution

A data distribution with a long tail to the left.

Nominal Variable

A categorical variable used for naming only, such as eye color.

Normal Distribution

A bell-shaped data distribution where the 68-95-99.7% rule applies.

Numerical Variable

A variable representing quantities that are counted or measured.

Ordinal Variable

A categorical variable that allows for both naming and ordering.

Outliers

Data values that stand out from the main body of a dataset.

Parallel Box Plots

Box plots drawn side-by-side for comparing distributions.

Percentage Frequency

Frequency expressed as a percentage of the total.
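
As a quick sketch, percentage frequencies can be computed from raw categorical data by counting each category and dividing by the total. The function name and the eye-colour data are made up.

```python
from collections import Counter

def percentage_frequencies(values):
    """Map each category to its frequency as a percentage of the total."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * count / total for category, count in counts.items()}

eye_colours = ["brown", "blue", "brown", "green", "brown"]  # hypothetical data
percentages = percentage_frequencies(eye_colours)
print(percentages)
```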

Positively Skewed Distribution

A data distribution with a long tail to the right.

Quartiles

Statistics that divide an ordered set into four equal groups.

Range (R)

The difference between the smallest and largest observations in a dataset.

Reciprocal Transformations

Transformations that compress the upper end of the scale more than log transformations.

Reseasonalise

Converting deseasonalised data back to its original seasonal form by multiplying each value by its seasonal index.
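
Deseasonalising and reseasonalising are inverse operations: one divides by the seasonal index and the other multiplies by it. The sketch below shows the round trip with a hypothetical value of 150 and a hypothetical seasonal index of 1.2.

```python
def deseasonalise(actual, seasonal_index):
    """Remove the seasonal effect: actual value / seasonal index."""
    return actual / seasonal_index

def reseasonalise(deseasonalised, seasonal_index):
    """Put the seasonal effect back: deseasonalised value x seasonal index."""
    return deseasonalised * seasonal_index

# Hypothetical figures: sales of 150 in a season whose index is 1.2.
smoothed = deseasonalise(150, 1.2)
restored = reseasonalise(smoothed, 1.2)
print(smoothed, restored)
```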

Residual

The vertical distance from a data point to the fitted regression line, calculated as residual = actual value - predicted value.
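
A short sketch of the residual calculation, taking each residual as the actual value minus the value predicted by the line. The line y = 2.2 + 0.6x and the data are assumptions chosen for illustration; positive residuals lie above the line and negative ones below it.

```python
def residuals(xs, ys, intercept, slope):
    """Actual value minus predicted value for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up data with an assumed fitted line y = 2.2 + 0.6x.
res = residuals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], intercept=2.2, slope=0.6)
print([round(value, 1) for value in res])
```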

Residual Plot

A plot of the residuals against the explanatory variable.

Response Variable (RV)

In bivariate data, the variable to be explained or predicted by the explanatory variable; usually the primary variable of interest in the investigation.

Scatterplot

A graph used to display bivariate data where data pairs are represented by points.

Seasonal Indices

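Indices that quantify seasonal variation by comparing each season's typical value with the average for all seasons; an index of 1.2 indicates a season that is typically 20% above average.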
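
A minimal sketch of seasonal indices from a single year of data, assuming each index is the season's value divided by that year's seasonal average. With several years of data, the indices for each season would typically be averaged across the years. The function name and figures are hypothetical.

```python
def seasonal_indices(one_year):
    """Seasonal index for each season: value / average of all seasons."""
    average = sum(one_year) / len(one_year)
    return [value / average for value in one_year]

quarterly_sales = [120, 80, 60, 140]   # made-up quarterly figures for one year
indices = seasonal_indices(quarterly_sales)
print(indices)
```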

Seasonality

The tendency for values in a time series to follow a pattern that repeats predictably at fixed intervals, such as days, months, or quarters.

Segmented Bar Chart

A graph that displays information contained in a two-way frequency table.

Shape of Distribution

The general form of a data distribution, described as symmetric, positively skewed, or negatively skewed.

Slope (of a straight line)

Defined as slope = Δy / Δx, it is also known as the gradient.

Smoothing

A technique used to reduce random variation in a time series to make underlying patterns (such as trends or seasonality) easier to see.

Spread of a Distribution

The extent to which data values are scattered around the centre of the distribution, measured by statistics such as the range, IQR, and standard deviation.

Squared Transformations

Transformations that stretch out the upper end of the scale on either axis.

Standard Deviation (s)

A summary statistic measuring the data's spread around the mean.
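
The sketch below computes the sample standard deviation by hand, dividing by n - 1, and compares it with `statistics.stdev` from Python's standard library. The helper name and data are illustrative.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation: sqrt of the sum of squared deviations
    from the mean, divided by (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

data = [4, 8, 6, 5, 7]   # made-up data with mean 6
s = sample_sd(data)
print(round(s, 3), round(stdev(data), 3))  # the two results should agree
```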

Standardised (z) Scores

Scores indicating how many standard deviations a data value lies above or below the mean, calculated as z = (x - x̄) / s.
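
As a small worked example, the sketch below standardises a value by subtracting the mean and dividing by the standard deviation. The figures (value 186, mean 170, standard deviation 8) are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(186, mean=170, sd=8)
print(z)  # a positive z means the value is above the mean
```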

Statistical Question

A question that depends on data for its answer.

Stem Plot

A method for displaying data by splitting each observation into a stem and leaf.

Strength of Linear Relationship

Classified as weak, moderate, or strong, determined by scatter in a scatterplot.

Structural Change (time series)

A sudden change in the established pattern of a time series plot.

Summary Statistics

Numerical values representing features such as centre and spread of a data distribution.

Symmetric Distribution

A data distribution whose shape is approximately a mirror image on either side of its centre.

Time Series Data

A collection of data values recorded at specific times.

Time Series Plot

A line graph plotting values of a response variable in time order.

Trend

The tendency for values in a time series to increase or decrease over time.

Trend Line Forecasting

Using a line fitted to a time series to predict future values.

Two-Way Frequency Table

A table classifying subjects according to two categorical variables.

Univariate Data

Data associated with a single variable.