Central Tendency
A statistical measure to determine a single score that defines the centre of a distribution
What are the 3 measures of central tendency
Mode
Median
Mean
Mode
The most frequent category/score in a distribution (often the high point in a graph)
With which scales of measurement can the mode be used
Nominal, ordinal and interval/ratio scales (the only measure of central tendency that works with nominal data)
T or F: You can only have one mode in each dataset
F, there can be multiple modes if two or more values share the highest frequency
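A minimal Python sketch of finding the mode(s), assuming a small set of hypothetical scores and colours (neither comes from the cards). `statistics.multimode` from the standard library returns every value tied for the highest frequency, so it handles both the single-mode and the multi-mode case, and it works on nominal labels as well as numbers.

```python
from statistics import multimode

scores = [2, 3, 3, 5, 7, 7, 9]            # hypothetical scores: 3 and 7 each occur twice
colours = ["red", "blue", "red", "green"]  # hypothetical nominal data

modes = multimode(scores)          # all values that share the highest frequency
colour_modes = multimode(colours)  # the mode also works for categories

print(modes)
print(colour_modes)
```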
What are the 4 major advantages of the mode
Easy to compute/determine
Value that is observed in dataset (real number from data)
Can be used for all scales of measurement
Not affected by outliers
What are the 2 major disadvantages of the mode
Not used in most statistical computations as it is not useful for making inferences
May not be representative of the entire collection of numbers
What is the measure of variability with the mode
There is none as the variable cannot be quantified
Median
Physical middle of an ordered set of data, also known as the 50th percentile, can be used with ordinal and ratio/interval scale
What is the most common use for the median
When data is extremely skewed
What are the 3 major advantages of the median
Can be computed on ordinal, interval or ratio data and is the best measure of central tendency for ordinal (ranked) data
Less biased measure of central tendency when interval/ratio data is skewed
Not affected by outliers or extreme scores
What are the 2 major disadvantages of the median
Not used in statistical computations/cannot be used to make inferences
Subject to sampling variation (not stable from sample to sample so we cannot infer anything about population)
What is the measure of variability used with the median
Find the min/max values as well as the range (max value - min value)
Mean
The average value in a dataset, can work with an interval/ratio scale
What is the symbol used for the population mean
Mu (μ)
What is the symbol used for the sample mean
M or x-bar (x̄)
What are the 3 major characteristics of the mean
Changing the value of any score or changing the number of scores will change the mean
Adding or subtracting a constant for each score in a distribution will add or subtract that same constant from the mean
Multiplying or dividing every score in a distribution by a constant will multiply or divide the mean by that same constant
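The three characteristics above can be checked with a quick sketch. The scores and the constants (3 and 2) are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean

scores = [2, 4, 6, 8]                 # hypothetical scores
changed = [2, 4, 6, 12]               # one score changed (8 -> 12)
shifted = [x + 3 for x in scores]     # add a constant to every score
scaled = [x * 2 for x in scores]      # multiply every score by a constant

print(mean(scores))    # original mean
print(mean(changed))   # changing any score changes the mean
print(mean(shifted))   # mean moves by the same constant
print(mean(scaled))    # mean is multiplied by the same constant
```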
What are the 4 major advantages of the mean
Can be manipulated algebraically
Takes into account quantitative info about each value
Value has the most meaning for interpretation, especially with ratio values
Most common value for inferential stats
What are the 3 major disadvantages of the mean
Applies only to interval/ratio data
Influenced by outliers
Computed value may not reflect any actual value in the dataset
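A short sketch of two of these disadvantages, using hypothetical scores: replacing the largest score with an outlier drags the mean a long way while the median stays put, and the mean of the original scores is not a value that appears in the dataset.

```python
from statistics import mean, median

scores = [10, 12, 13, 15, 20]          # hypothetical scores
with_outlier = [10, 12, 13, 15, 200]   # same scores, largest one replaced by an outlier

print(mean(scores), median(scores))              # mean and median are close
print(mean(with_outlier), median(with_outlier))  # mean jumps, median is unchanged
print(mean(scores) in scores)                    # the mean need not be an observed score
```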
What measure(s) of CT can be found with nominal data
Only the mode
What measure(s) of CT can be found with ordinal data
Median or mode, median is usually the best pick
Variability
Quantitative distance, the measure of the differences between the scores in a distribution
What are the 5 major measures of variability
Range
Semi-interquartile Range (SIQR)
Median Absolute Deviation (MAD)
Variance
Standard Deviation
Range
Distance covered by scores in a distribution from smallest to largest
T or F: Range is calculated differently depending on if the variables are continuous or discrete
T, discrete values are simply max - min, whereas continuous values use the upper real limit of max - lower real limit of min
What are the major advantages and disadvantages of range
Advantages: Quick to compute and includes entire distribution of data
Disadvantages: Derived from only two values, so the spread of the data between them is unknown, and sensitive to outliers
Semi-interquartile Range (SIQR)
Half of the range of the middle 50% of observations, calculated as (third quartile - first quartile) / 2
First quartile
25th percentile, the value below which the lowest 25% of the data fall
Third quartile
75th percentile, the value below which 75% of the data fall
Interquartile range
third quartile - first quartile
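A sketch of the quartile, interquartile range and SIQR calculations on hypothetical data. Quartile conventions differ between textbooks and software, so the `method="inclusive"` setting of `statistics.quantiles` used here is an assumption and a course formula may give slightly different cut points.

```python
from statistics import quantiles

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical scores

# n=4 splits the data into quarters and returns the three cut points
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

iqr = q3 - q1    # interquartile range: spread of the middle 50%
siqr = iqr / 2   # semi-interquartile range: half of that spread

print(q1, q3, iqr, siqr)
```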
Median Absolute Deviation (MAD)
Absolute measure of how many physical units values deviate from the median, calculated as: MAD = Mdn(|X - Mdn|) (take the absolute difference between each value and the median of the original dataset, then take the median of those differences)
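The MAD calculation can be sketched in two steps, assuming hypothetical scores with one extreme value. The helper name `mad` is illustrative only.

```python
from statistics import median

def mad(scores):
    """Median of the absolute deviations from the median."""
    center = median(scores)
    deviations = [abs(x - center) for x in scores]
    return median(deviations)

scores = [2, 4, 6, 8, 100]  # hypothetical scores; 100 is an extreme value
print(median(scores))       # median of the original data
print(mad(scores))          # median of |X - Mdn|
```

The extreme score of 100 contributes only one large deviation, and taking the median of the deviations keeps it from dominating the result.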
What are the 3 major advantages of the MAD
Takes all scores into account
Less sensitive than SD to extreme scores or skews in data
It is a minimum: we cannot get a smaller value even if we take the absolute deviations from another location in the dataset
What are the 2 major disadvantages of the MAD
Provides limited description of variability
Not useful in advanced statistical procedure
Variance
Average squared distance from the mean, one can calculate it by using the formula: V = SD² = Σ(x - x̄)² / N
What is the Sum of Squared Deviations (SS)
Sum of the squared difference between each score and its mean, calculated by: SS = Σ(x - x̄)²
Standard Deviation
Measure of standard/average distance from the mean (how dispersed scores are around the mean), calculated: SD = √(SD²) = √variance
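A worked sketch of SS, variance and SD on hypothetical scores, following the formulas on the cards (dividing SS by N). The standard library's `pvariance` and `pstdev` use the same divide-by-N form, so they serve as a cross-check here.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

scores = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical scores

m = mean(scores)                         # x-bar
ss = sum((x - m) ** 2 for x in scores)   # SS: sum of squared deviations
var = ss / len(scores)                   # variance: SS divided by N
sd = sqrt(var)                           # SD: square root of the variance

print(m, ss, var, sd)
print(pvariance(scores), pstdev(scores))  # library cross-check
```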
What measure of variability can be used with a nominal scale
None, needs quantitative values
What measure of variability can be used with an ordinal scale
Range, SIQR, MAD
What measure of variability can be used with an interval/ratio scale
Range, SIQR, MAD, Variance, SD
What are the two characteristics of standard deviation
Adding or subtracting a constant to every score in a distribution will not change the standard deviation
Multiplying or dividing every score in a distribution by a constant will multiply or divide the standard deviation by the same constant
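Both characteristics can be checked with a small sketch. The scores and the constants (10 and 3) are hypothetical, and `pstdev` (the divide-by-N form) is used as an assumption; the same pattern holds for the sample SD.

```python
from math import isclose
from statistics import pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical scores
shifted = [x + 10 for x in scores]      # add a constant to every score
scaled = [x * 3 for x in scores]        # multiply every score by a constant

print(pstdev(scores))    # original SD
print(pstdev(shifted))   # unchanged: every score moved by the same amount
print(pstdev(scaled))    # multiplied by the same constant

print(isclose(pstdev(shifted), pstdev(scores)))
print(isclose(pstdev(scaled), 3 * pstdev(scores)))
```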
What are the 3 advantages of SD
Accounts for all scores in a distribution
Good description of variability
Used in many advanced stats
What are the 2 disadvantages of SD
Can only be used with interval/ratio data
Sensitive to extreme scores or outliers and is biased when distributions are skewed
Pearson coefficient of skew (Skewp)
A skew statistic that helps us understand the magnitude of skew in a distribution (the Pearson coefficient is the skew statistic used here)
How does one calculate Skewp
Skewp = 3(x̄ - Mdn) / SD
What does the sign of Skewp tell us
The direction of skew (positive or negative)
What does the magnitude of Skewp tell us
The degree of skew; when there is no skew, Skewp = 0
What values of Skewp indicate an approximately normal distribution, and which measures of central tendency and variability should be used
|Skewp| between 0 and 0.5; use the mean and SD
What values of Skewp indicate a mild/moderate skew, and which measures of central tendency and variability should be used
|Skewp| between 0.5 and 1.0; use the mean and SD
What values of Skewp indicate a moderate/strong skew, and which measures of central tendency and variability should be used
|Skewp| between 1.0 and 2.0; use the mean and SD if |Skewp| ≤ 1.5, otherwise use the median and MAD
What values of Skewp indicate a severe skew, and which measures of central tendency and variability should be used
|Skewp| greater than 2.0; use the median and MAD
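A sketch of the Skewp calculation and the decision rule from the cards above. Several things here are assumptions rather than part of the cards: the function names, the hypothetical scores, the use of the divide-by-N SD (`pstdev`), and the treatment of values that fall exactly on a band boundary.

```python
from statistics import mean, median, pstdev

def pearson_skew(scores):
    """Skewp = 3 * (mean - median) / SD, using the divide-by-N SD (an assumption)."""
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("SD is zero, so Skewp is undefined")
    return 3 * (mean(scores) - median(scores)) / sd

def recommend(skew_p):
    """Map |Skewp| onto the bands above; boundary handling is an assumption."""
    size = abs(skew_p)
    if size <= 0.5:
        label = "approximately normal"
    elif size <= 1.0:
        label = "mild/moderate skew"
    elif size <= 2.0:
        label = "moderate/strong skew"
    else:
        label = "severe skew"
    # Mean and SD up to |1.5|, median and MAD beyond that
    measures = "mean and SD" if size <= 1.5 else "median and MAD"
    return label, measures

scores = [1, 2, 2, 3, 3, 3, 4, 20]  # hypothetical scores with one high outlier
skew_p = pearson_skew(scores)
print(round(skew_p, 2), recommend(skew_p))
```

The sign of the result gives the direction of skew; `recommend` only looks at the magnitude.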
T or F: We do not need central tendency and variability to fully grasp the shape of a distribution
F, we need both
Explain what each variable in the formula represents:
Xo = Xt ± Xe
Xo: represents our observed value
Xt: represents our measure of central tendency
Xe: represents our measure of error
Why is measure of error so important
Allows us to form hypotheses about why something occurred
Anscombe's quartet
Four datasets with nearly identical summary statistics whose actual values are very different, which becomes obvious once they are graphed
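A sketch that checks this on the y values of the quartet. The numbers are the published Anscombe (1973) values, not something taken from the cards, and `statistics.variance` is the sample (divide-by-N-1) form, which is an assumption about how the summary statistics are compared.

```python
from statistics import mean, median, variance

# y values of the four published datasets (x values omitted for brevity)
quartet_y = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV":  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

for name, y in quartet_y.items():
    # Mean and variance nearly match across datasets; median and max do not
    print(name, round(mean(y), 2), round(variance(y), 2), median(y), max(y))
```

Plotting each dataset would show the differences more directly; the median and maximum are printed here only as a quick numeric hint that the shapes differ.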
What is the overall conclusion we can make about shape, central tendency, variability and data observation
We need all of them if we are to be able to make assumptions about our data