Flashcards covering descriptive statistics, measures of central tendency, measures of variability, the normal distribution, Z-scores, and shapes of distributions like skewness and kurtosis.
Descriptive Statistics
Statistics used to describe data by summarizing and organizing it.
Inferential Statistics
Statistics used to assess how likely it is that chance alone explains a pattern in the data, often in order to generalize from a sample to a population.
types of descriptive statistics
measures of central tendency - one number to represent all numbers
measures of variability - how the data varies, also called measure of spread or dispersion
statistics for describing shapes of distributions
Measures of Central Tendency
Statistics that describe the center of a distribution of values, providing a single number about which a group of numbers cluster, often called 'averages'.
mean, median, mode
tells you the most central value, the one around which the other values vary
Mode
The value that occurs most frequently in a set of data, best for nominal and dichotomous data.
if two values tie for the highest frequency, the distribution is bimodal
if three or more values tie for the highest frequency, the distribution is multimodal
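As a rough illustration of the mode rules on this card, the sketch below finds every value tied for the highest frequency and labels the result. The function name `describe_modes` and the sample data are made up for the example; `statistics.multimode` is part of the Python standard library (3.8+).

```python
from statistics import multimode

def describe_modes(values):
    """Return the mode(s) of the data and a label for how many there are."""
    modes = multimode(values)  # every value tied for the highest frequency
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

# Hypothetical nominal data: the mode works even though the values are not numbers.
colors = ["red", "blue", "blue", "green", "red"]
modes, label = describe_modes(colors)
print(modes, label)  # ['red', 'blue'] bimodal
```

One caveat of this simple labeling: if every value occurs exactly once, all of them tie, so the data would be reported as multimodal even though no value really stands out.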
Median
The middle value or score in a sorted set of data (ascending or descending order), with half of the scores falling above and half below; best for ordinal data.
to find it, first sort the data. if there is an even number of scores, there are two middle values: add them together and divide by 2
you can also use the cumulative percentage column of a frequency table: the median is the value at which the cumulative percentage first reaches 50%
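Both ways of finding the median described on this card can be sketched in a few lines. This is only an illustration: the function names and the sample scores are invented for the example.

```python
from collections import Counter

def median_sorted(values):
    """Median by sorting: the middle value, or the average of the two middle values."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def median_from_cumulative(values):
    """Median via a frequency table: first value whose cumulative share reaches 50%."""
    counts = Counter(values)
    total = len(values)
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        if cumulative / total >= 0.5:
            return value

odd_scores = [7, 3, 9, 1, 5]
even_scores = [7, 3, 9, 1]
ratings = [1, 2, 2, 3, 3, 3, 4, 5, 5, 5]

print(median_sorted(odd_scores))        # 5
print(median_sorted(even_scores))       # 5.0
print(median_from_cumulative(ratings))  # 3
```

Note that the cumulative approach always returns a value that actually occurs in the data. When exactly 50% of the scores fall at or below one value, it returns that value rather than the average of the two middle scores, which is often what you want for ordinal data.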
Mean
The arithmetic average of a set of scores, calculated by adding all raw scores and dividing by the number of scores; the most commonly used measure of central tendency, best for normally distributed scale (interval/ratio) data.
sensitive to every single number in the distribution, so it can be heavily influenced by outliers
Outliers
Scores far higher or lower than most others in a dataset, which can significantly influence the mean.
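A small sketch of how a single outlier moves the mean much more than the median. The income figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]   # hypothetical incomes, in thousands
with_outlier = incomes + [500]   # add one extreme score

mean_before = mean(incomes)          # 175 / 5 = 35
mean_after = mean(with_outlier)      # 675 / 6 = 112.5
median_before = median(incomes)      # 35
median_after = median(with_outlier)  # (35 + 38) / 2 = 36.5

print(mean_before, mean_after)       # 35 112.5
print(median_before, median_after)   # 35 36.5
```

The mean more than triples while the median barely changes, which is why the median is often preferred when a dataset contains outliers.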
Measures of Variability
Statistics that describe how the values of a variable are spread out or dispersed, or how much the values vary from each other.
the less variability there is, the less spread out the values are, and the better a measure of central tendency represents the dataset
Number of Categories
A measure of variability best suited for nominal data.
Range
A measure of variability that represents the difference between the highest and lowest values in a dataset, best for ordinal data.
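The two simplest measures of variability on these cards can each be computed in one line. The data below are invented for illustration.

```python
# Number of categories: count the distinct values of a nominal variable.
blood_types = ["A", "O", "B", "O", "AB", "A"]   # hypothetical nominal data
n_categories = len(set(blood_types))

# Range: highest value minus lowest value.
ratings = [3, 1, 4, 2, 5, 4]                    # hypothetical 1-5 ratings
value_range = max(ratings) - min(ratings)

print(n_categories)  # 4
print(value_range)   # 4
```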
Standard Deviation (SD)
A commonly used measure of the spread or dispersion of scores, representing the 'average distance' between each score and the mean. The smaller the SD, the less spread out the values are; the larger the SD, the more spread out they are.
informally described as the average distance from the mean; calculated as the square root of the variance
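To make the "average distance from the mean" idea concrete, here is a minimal sketch that computes the standard deviation step by step. It assumes the population formula (dividing by N); many courses and software packages report the sample formula (dividing by N - 1) instead, and the cards do not say which one is intended. The function name and scores are made up for the example.

```python
import math
from statistics import pstdev, stdev

def population_sd(values):
    """Standard deviation using the population formula (divide by N)."""
    m = sum(values) / len(values)                   # the mean
    squared_devs = [(x - m) ** 2 for x in values]   # squared distance of each score from the mean
    variance = sum(squared_devs) / len(values)      # average squared distance
    return math.sqrt(variance)                      # back to the original units

scores = [2, 4, 4, 4, 5, 5, 7, 9]
sd_manual = population_sd(scores)

print(sd_manual)       # 2.0
print(pstdev(scores))  # standard-library population SD, same result
print(stdev(scores))   # sample SD (divide by N - 1), slightly larger
```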
Z-scores (Standardized Scores)
Scores that tell you the number of standard deviations a raw score or value is from the mean (above or below) - calculated using mean and standard deviation
used to compare values from different distributions - e.g. comparing the incomes of two employees working in different cities
once every score in a distribution is converted to a z-score, the z-scores have a mean of 0 and a standard deviation of 1
z-score = (raw score - group mean) / group standard deviation
negative z score means actual raw score is below the mean
positive z score means actual raw score is above the mean
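The z-score formula on this card can be sketched as below. The city means, standard deviations, and incomes are hypothetical numbers chosen for the example, and the second part uses the population SD so that the converted scores come out with a mean of 0 and an SD of 1.

```python
from statistics import mean, pstdev

def z_score(raw, group_mean, group_sd):
    """Number of standard deviations a raw score lies above (+) or below (-) the mean."""
    return (raw - group_mean) / group_sd

# Comparing two employees in different cities (all figures hypothetical, in thousands).
z_a = z_score(60, group_mean=50, group_sd=10)  # above her city's mean
z_b = z_score(70, group_mean=80, group_sd=5)   # below his city's mean
print(z_a, z_b)  # 1.0 -2.0

# Standardizing a whole distribution.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
m, sd = mean(scores), pstdev(scores)
z = [z_score(x, m, sd) for x in scores]
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Employee B earns more in raw terms, but relative to the local distribution employee A is better off: A is one SD above the mean, while B is two SDs below it.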
Normal Distribution (Normal Curve, Bell Curve)
A theoretical distribution that is bell-shaped, symmetrical with its peak in the middle, and has most scores in the middle with fewer at the extreme ends. In a normal distribution, the mean, median, and mode are the same.
Central Limit Theorem
A theorem stating that the distribution of sample means approaches a normal curve as the sample size gets larger, even when the population itself is not normally distributed - the larger the sample, the closer the distribution of sample means is to a normal curve.
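The central limit theorem is usually stated in terms of sample means, and a small simulation can illustrate it. The sketch below is only a demonstration under assumed settings: the population is an exponential distribution (strongly right-skewed, with mean 1 and SD 1), and the sample sizes, number of samples, and seed are arbitrary choices.

```python
import random
from statistics import mean, pstdev

def sample_means(sample_size, n_samples=2000, seed=42):
    """Draw many samples from a skewed population and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetrical distribution."""
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / len(values)
    m3 = sum((x - m) ** 3 for x in values) / len(values)
    return m3 / m2 ** 1.5

small = sample_means(sample_size=2)
large = sample_means(sample_size=50)

# With larger samples, the sample means cluster more tightly around the
# population mean and their distribution becomes less skewed.
print(mean(small), pstdev(small), skewness(small))
print(mean(large), pstdev(large), skewness(large))
```

With samples of 50, the spread of the sample means should be close to the population SD divided by the square root of the sample size (about 0.14 here), and the skewness should be much closer to 0 than with samples of 2.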
Skewness
A statistic describing the asymmetry of a distribution, indicating how its shape differs from a normal curve.
Symmetrical Distribution
A distribution where the mean, median, and mode are equal, indicating no skewness.
Positive Skewness
A distribution where the tail trails off to the right, and the mean is usually higher than the median, which is in turn usually higher than the mode.
Negative Skewness
A distribution where the tail trails off to the left, and the mean is usually lower than the median, which is in turn usually lower than the mode.
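The skewness cards can be checked numerically. The sketch below assumes the simple moment-based skewness formula (third central moment divided by the 1.5 power of the second), which is one of several conventions; statistical packages often apply a small-sample correction. The data are invented so that the tail clearly trails off to the right.

```python
from statistics import mean, median, mode

def skewness(values):
    """Moment-based skewness: positive = right tail, negative = left tail, 0 = symmetrical."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 14]    # one large score pulls the tail right
left_skewed = [-x for x in right_skewed]       # mirror image: tail on the left
symmetrical = [1, 2, 3, 4, 5]

print(mean(right_skewed), median(right_skewed), mode(right_skewed))  # 4 3 2
print(skewness(right_skewed) > 0)  # True
print(skewness(left_skewed) < 0)   # True
print(skewness(symmetrical))       # 0.0
```

In the right-skewed example the mean (4) is above the median (3), which is above the mode (2), matching the ordering described for positive skewness.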
Kurtosis
A statistic describing how peaked or flat the centre of a distribution is, and how heavy or light its tails are, relative to a normal distribution.
leptokurtic - heavier tails and a higher, sharper centre than the normal curve
mesokurtic - normal curve
platykurtic - lighter tails and a lower, flatter centre than the normal curve
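A short sketch of how kurtosis can be computed and read. It assumes the moment-based "excess kurtosis" convention (fourth central moment divided by the squared variance, minus 3), under which a normal curve scores about 0, leptokurtic distributions score above 0, and platykurtic distributions score below 0. Other conventions exist, and the two small datasets are invented for the example.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: about 0 for a normal curve."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment (variance)
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                      # evenly spread, no real peak
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 10]     # tall centre plus two extreme scores

print(excess_kurtosis(flat) < 0)    # True: platykurtic (about -1.3)
print(excess_kurtosis(peaked) > 0)  # True: leptokurtic (about 1.5)
```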
why does the normal distribution matter
lots of tests - especially parametric tests - assume that the variables or data are normally distributed
if continuous data are not normally distributed, then you have to use nonparametric tests, which are generally less powerful - nominal and ordinal data are assumed not to be normally distributed, so they also call for nonparametric tests