This flashcard set includes key vocabulary and concepts in statistics and data analysis with their definitions for exam preparation.
68-95-99.7% Rule
A rule stating that, in a normal distribution, approximately 68%, 95% and 99.7% of values lie within one, two and three standard deviations of the mean, respectively.
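The percentages in this rule are rounded values of the exact normal-distribution proportions. The sketch below computes those proportions with the standard library's error function, using the identity P(|Z| < k) = erf(k / √2) for a standard normal variable Z.

```python
import math

def proportion_within(k: float) -> float:
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    # For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
```

The exact values (about 68.3%, 95.4% and 99.7%) round to the figures used in the rule.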
Allocation
The process of assigning tasks to the members of a group so that the tasks are completed as efficiently as possible.
Bar Chart
A statistical chart used to display the frequency distribution of categorical data.
Bivariate Data
Data where each observation records information about two variables for the same subject.
Boxplot
A graphical display of the five-number summary, with any outliers plotted as separate points.
Categorical Variable
A variable that represents characteristics of individuals, such as eye color or place of birth.
Centre of Distribution
A measure of location in a distribution, including mean and median.
Centring
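The process of realigning smoothed values with the original time periods, needed when smoothing over an even number of points because each moving mean then falls midway between two time periods.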
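Coefficient of Determination (r²)
A coefficient measuring the predictive power of a regression line: the proportion of the variation in the response variable that is explained by the variation in the explanatory variable.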
Continuous Variable
A numerical variable representing a quantity that is measured rather than counted.
Correlation Coefficient (r)
A statistical measure of the strength and direction of the linear association between two numerical variables.
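A minimal sketch of how r is calculated from paired data, using Pearson's formula (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The data values are made up for illustration. For a least squares line fitted to one explanatory variable, squaring r gives the coefficient of determination r².

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired numerical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")
```

Here r is about 0.775 (a fairly strong positive association) and r² = 0.6, so about 60% of the variation in y is explained by the variation in x.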
Cycle (Time Series)
Periodic movement in a time series over a period greater than a year.
Data Transformations
Using a mathematical rule to change the scale on an axis to linearize a scatterplot.
Deseasonalise
The process of removing seasonality from a time series.
Discrete Variable
A numerical variable determined by counting, such as the number of people in a queue.
Dot Plot
A statistical graph using dots to display individual data values on a number line.
Explanatory Variable (EV)
In bivariate data, the variable used to explain or predict the response variable's value.
Extrapolation
Using a model to make predictions outside the range of the original data.
Five-number Summary
A list including the minimum, first quartile, median, third quartile, and maximum.
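A sketch of computing the five-number summary. It assumes one common textbook convention for quartiles: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the median itself excluded from both halves when n is odd. Calculators and software may use other quartile definitions and give slightly different values. The data are made up, and at least two values are assumed.

```python
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    # Odd n: the middle value; even n: the mean of the two middle values
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    # The halves exclude the median itself when n is odd
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]
    return (s[0], median(lower), median(s), median(upper), s[-1])

data = [2, 4, 4, 5, 6, 7, 8, 9, 12]
print(five_number_summary(data))
```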
Frequency Table
A table listing the values a variable takes together with the number of times each value occurs (its frequency).
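A minimal sketch of building a frequency table with Python's `collections.Counter`, including a percentage frequency column (frequency divided by the total, times 100). The eye-colour data are made up for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency, percentage frequency) rows, ordered by value."""
    counts = Counter(values)
    total = len(values)
    return [(value, freq, 100 * freq / total) for value, freq in sorted(counts.items())]

eye_colour = ["brown", "blue", "brown", "green", "brown", "blue"]
for value, freq, pct in frequency_table(eye_colour):
    print(f"{value:<6} {freq:>3} {pct:5.1f}%")
```

The percentage frequencies always add to 100%, apart from rounding in the printed output.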
Histogram
A statistical graph for displaying the frequency distribution of a numerical variable.
Interpolation
Using a regression line to make predictions within the range of explanatory variable values.
Interquartile Range (IQR)
Defined as IQR = Q3 − Q1, it measures the spread of the middle 50% of the data.
Irregular Fluctuations
Unpredictable fluctuations present in any real-world time series.
Least Squares Method
A technique for finding regression line equations that minimizes the sum of squares of residuals.
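A sketch of the least squares calculation for a line y = a + bx, using the standard closed-form results: b is the sum of cross-products of deviations divided by the sum of squared x-deviations (equivalently b = r × s_y / s_x), and a = ȳ − b x̄, so the line passes through the point (x̄, ȳ). The data are made up.

```python
def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the line passes through (mean_x, mean_y)
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

Any other choice of a and b gives a larger sum of squared residuals for these points.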
Linear Regression
The process of fitting a straight line to bivariate data.
Log Scale
A scale used to transform positively skewed histograms towards symmetry or to linearize scatterplots.
Logarithmic Transformations
Transformations that linearize scatterplots by compressing the upper end of the scale.
Lower Fence
A threshold for identifying outliers, calculated as lower fence = Q1 − 1.5 × IQR; any value below it is classified as a possible outlier.
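A sketch of the fence calculation for outliers. It computes the lower fence Q1 − 1.5 × IQR and, by the matching convention, an upper fence Q3 + 1.5 × IQR. Quartiles are assumed to be medians of the lower and upper halves (median excluded when n is odd); other quartile conventions can shift the fences slightly. The data are made up.

```python
def median(s):
    """Median of an already-sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def outlier_fences(values):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    s = sorted(values)
    half = len(s) // 2
    # Quartiles as medians of the lower and upper halves
    q1 = median(s[:half])
    q3 = median(s[half + 1:] if len(s) % 2 else s[half:])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 4, 5, 6, 7, 8, 9, 25]
lower, upper = outlier_fences(data)
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)
```

Here Q1 = 4, Q3 = 8.5 and IQR = 4.5, so the fences are −2.75 and 15.25, and 25 is flagged as a possible outlier.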
Mean (x̄)
The balance point of a data distribution, calculated as x̄ = Σx / n.
Median (M)
The middle value in a data distribution that divides an ordered dataset into two equal parts.
Modal Category
The category or interval that occurs most frequently in a dataset.
Mode
The value that occurs most frequently in a dataset.
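A short sketch computing the three measures of centre above for a made-up dataset. The mean is calculated directly from x̄ = Σx / n; the median and mode use Python's `statistics.median` and `statistics.multimode` (the latter returns every value tied for the highest frequency, since a dataset can have more than one mode).

```python
import statistics

data = [3, 7, 7, 2, 9, 4, 7, 5]

mean = sum(data) / len(data)        # x̄ = Σx / n
median = statistics.median(data)    # mean of the two middle values, as n is even
modes = statistics.multimode(data)  # all values with the highest frequency

print(mean, median, modes)
```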
Modelling
The use of a mathematical rule to represent real-life situations.
Moving Mean Smoothing
A technique where each original data value is replaced by the mean of a group of consecutive values centred on it.
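A sketch of moving mean smoothing. With an odd window, each mean lines up with the middle value of its group. With an even window, each mean falls between two time periods, so this sketch applies centring by averaging each pair of adjacent means (a 2-point moving mean). Positions where the window does not fit are left as `None`. The function names are illustrative, and the data are a made-up linear series, which smoothing should leave unchanged.

```python
def moving_means(values, window):
    """Means of every run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

def smooth_odd(values, window):
    if window % 2 == 0:
        raise ValueError("window must be odd")
    pad = window // 2
    # Each mean is aligned with the middle value of its run
    return [None] * pad + moving_means(values, window) + [None] * pad

def smooth_even_centred(values, window):
    if window % 2 == 1:
        raise ValueError("window must be even")
    # Raw means fall midway between time periods; average adjacent pairs to centre them
    centred = moving_means(moving_means(values, window), 2)
    pad = window // 2
    return [None] * pad + centred + [None] * pad

series = [4, 6, 8, 10, 12, 14]
print(smooth_odd(series, 3))
print(smooth_even_centred(series, 4))
```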
Moving Median Smoothing
Smoothing a time series plot using moving medians instead of means.
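A sketch of moving median smoothing with an odd window, using `statistics.median` for each group of consecutive values. The made-up series contains one spike (30); because a median ignores extreme values, the spike does not appear in the smoothed series.

```python
import statistics

def moving_median(values, window):
    if window % 2 == 0:
        raise ValueError("use an odd window so each median aligns with a data point")
    pad = window // 2
    medians = [statistics.median(values[i:i + window])
               for i in range(len(values) - window + 1)]
    # Ends where the window does not fit are left empty
    return [None] * pad + medians + [None] * pad

sales = [5, 9, 4, 30, 6, 8, 7]
print(moving_median(sales, 3))
```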
Negatively Skewed Distribution
A data distribution with a long tail to the left.
Nominal Variable
A categorical variable used for naming only, such as eye color.
Normal Distribution
A bell-shaped data distribution where the 68-95-99.7% rule applies.
Numerical Variable
A variable representing quantities that are counted or measured.
Ordinal Variable
A categorical variable that allows for both naming and ordering.
Outliers
Data values that stand out from the main body of a dataset.
Parallel Box Plots
Box plots drawn side-by-side for comparing distributions.
Percentage Frequency
Frequency expressed as a percentage of the total.
Positively Skewed Distribution
A data distribution with a long tail to the right.
Quartiles
Statistics that divide an ordered set into four equal groups.
Range (R)
The difference between the largest and smallest observations in a dataset: R = largest value − smallest value.
Reciprocal Transformations
Transformations that compress the upper end of the scale more than log transformations.
Reseasonalise
Converting deseasonalised data back to its original (seasonal) form by multiplying by the seasonal index.
Residual
The vertical distance from a data point to the fitted regression line, calculated as residual = actual value − predicted value.
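A minimal sketch of computing residuals as actual minus predicted values. The points are made up, and the line y = 2.2 + 0.6x is the least squares line for these particular points. Positive residuals lie above the line and negative residuals below it; for a least squares line the residuals sum to zero.

```python
def residuals(xs, ys, a, b):
    # residual = actual y - predicted y, where the prediction is a + b*x
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, a=2.2, b=0.6)
print([round(r, 2) for r in res])
```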
Residual Plot
A plot of the residuals against the explanatory variable.
Response Variable (RV)
The primary variable of interest in a statistical investigation.
Scatterplot
A graph used to display bivariate data where data pairs are represented by points.
Seasonal Indices
Indices that quantify seasonal variations in data.
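A sketch of one common way to calculate seasonal indices: divide each value by the mean of its own year, then average these ratios for each season across the years. The indices then sum to the number of seasons. The sketch also shows deseasonalising (actual ÷ seasonal index) and reseasonalising (deseasonalised × seasonal index). The quarterly figures and function names are made up for illustration.

```python
def seasonal_indices(table):
    """table: one row per year, one column per season (e.g. quarters)."""
    n_seasons = len(table[0])
    # Divide each value by the mean of its own year
    ratios = [[v / (sum(year) / n_seasons) for v in year] for year in table]
    # Average the ratios for each season across the years
    return [sum(row[s] for row in ratios) / len(ratios) for s in range(n_seasons)]

def deseasonalise(value, index):
    return value / index

def reseasonalise(value, index):
    return value * index

quarterly = [
    [120, 80, 60, 140],
    [130, 90, 70, 150],
]
si = seasonal_indices(quarterly)
print([round(x, 3) for x in si])
print(deseasonalise(120, si[0]))
```

An index above 1 marks a season that is typically above the yearly average; dividing by it removes that seasonal effect.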
Seasonality
The tendency for values in a time series to vary predictably based on time periods.
Segmented Bar Chart
A graph that displays information contained in a two-way frequency table.
Shape of Distribution
The general form of a data distribution, described as symmetric, positively or negatively skewed.
Slope (of a straight line)
Defined as slope = Δy / Δx (rise over run), it is also known as the gradient.
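A minimal sketch of the slope formula applied to two points on a line. The points are made up; a vertical line (Δx = 0) has no defined slope, so the function raises an error in that case.

```python
def slope(p1, p2):
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("slope is undefined for a vertical line")
    return (y2 - y1) / (x2 - x1)  # rise over run: Δy / Δx

print(slope((1, 3), (4, 9)))  # Δy = 6, Δx = 3
```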
Smoothing
A technique used to reduce random variation in a time series to make underlying patterns (such as trends or seasonality) easier to see.
Spread of a Distribution
A measure of how widely the data values are scattered around the centre of the distribution.
Squared Transformations
Transformations that stretch out the upper end of the scale on either axis.
Standard Deviation (s)
A summary statistic measuring the data's spread around the mean.
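A sketch of the sample standard deviation s = √(Σ(x − x̄)² / (n − 1)), checked against Python's `statistics.stdev`, which uses the same n − 1 divisor. (A population standard deviation divides by n instead, as in `statistics.pstdev`.) The data are made up.

```python
import math
import statistics

def sample_sd(values):
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the mean, divided by n - 1
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data), statistics.stdev(data))
```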
Standardised (z) Scores
Scores indicating the distance (in standard deviations) and direction of a data value from the mean, calculated as z = (x − x̄) / s.
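A minimal sketch of standardising a value. The mean of 70 and standard deviation of 10 are made-up test statistics; a positive z-score lies above the mean, a negative one below it.

```python
def z_score(x, mean, sd):
    # Number of standard deviations x lies above (positive) or below (negative) the mean
    return (x - mean) / sd

print(z_score(85, mean=70, sd=10))
print(z_score(55, mean=70, sd=10))
```

Both scores are 1.5 standard deviations from the mean, but on opposite sides.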
Statistical Question
A question that depends on data for its answer.
Stem Plot
A method for displaying data by splitting each observation into a stem and leaf.
Strength of Linear Relationship
Classified as weak, moderate, or strong, determined by the amount of scatter about a straight line in a scatterplot.
Structural Change (time series)
A sudden change in the established pattern of a time series plot.
Summary Statistics
Numerical values representing features such as centre and spread of a data distribution.
Symmetric Distribution
A data distribution where values are evenly spread around the mean.
Time Series Data
A collection of data values recorded at specific times.
Time Series Plot
A line graph plotting values of a response variable in time order.
Trend
The tendency for values in a time series to increase or decrease over time.
Trend Line Forecasting
Using a line fitted to a time series to predict future values.
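A sketch of trend line forecasting, assuming the time periods are numbered t = 1, 2, ..., n and the trend line is fitted by least squares. The sales figures and function names are made up. Because a forecast lies beyond the observed time periods, it is an extrapolation and relies on the trend continuing.

```python
def fit_trend(values):
    """Least squares line y = a + b*t with time periods t = 1, 2, ..., n."""
    n = len(values)
    ts = range(1, n + 1)
    mean_t = (n + 1) / 2
    mean_y = sum(values) / n
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, values))
    stt = sum((t - mean_t) ** 2 for t in ts)
    b = sty / stt
    return mean_y - b * mean_t, b

def forecast(a, b, t):
    return a + b * t

sales = [12, 15, 17, 20, 21]
a, b = fit_trend(sales)
print(f"trend: y = {a:.1f} + {b:.1f}t, forecast for t = 6: {forecast(a, b, 6):.1f}")
```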
Two-Way Frequency Table
A table classifying subjects according to two categorical variables.
Univariate Data
Data associated with a single variable.