acronym to describe a distribution
SOCS
Socs
spread; state the lowest and highest values (the range)
sOcs
outliers; state values that stand out
soCs
center; approximate mean or median
socS
shape; symmetry or skewness
symmetric
mean = median
skewed left
mean < median
negative skew
mean to the left of median
skewed right
mean > median
positive skew
mean to the right of the median
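The mean/median relationships on the skew cards above can be checked on small data sets. This is a minimal sketch; the three data sets and the helper name `compare_mean_median` are made up for illustration.

```python
from statistics import mean, median

# Made-up data sets, chosen only to illustrate each shape.
samples = {
    "symmetric": [1, 2, 3, 4, 5],
    "skewed right": [1, 2, 2, 3, 3, 3, 4, 20],
    "skewed left": [-20, -4, -3, -3, -3, -2, -2, -1],
}


def compare_mean_median(data):
    """Return '<', '=', or '>' describing the mean relative to the median."""
    m, med = mean(data), median(data)
    if m < med:
        return "<"
    if m > med:
        return ">"
    return "="


for name, data in samples.items():
    print(f"{name}: mean {compare_mean_median(data)} median")
```

The long tail pulls the mean toward it, while the median stays near the bulk of the data.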
time plot of a variable
plots each observation against the time at which it was measured
5-number summary
minimum, Q1, median, Q3, maximum
IQR
the distance between the first and third quartiles
IQR =
Q3 - Q1
calculate lower outlier cutoff (lower fence)
Q1 - 1.5(IQR); any value below this is an outlier
calculate upper outlier cutoff (upper fence)
Q3 + 1.5(IQR); any value above this is an outlier
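The 5-number summary, IQR, and 1.5(IQR) cards above can be sketched together. The data values are made up, and the quartile convention (`method="inclusive"`) is an assumption: textbooks and software use several slightly different rules, so Q1 and Q3 may differ a little from a hand calculation.

```python
from statistics import quantiles

data = sorted([2, 4, 4, 5, 7, 8, 9, 10, 30])  # made-up values

# Assumption: "inclusive" quartiles; other conventions give slightly
# different Q1 and Q3 for the same data.
q1, med, q3 = quantiles(data, n=4, method="inclusive")

five_number_summary = (data[0], q1, med, q3, data[-1])
iqr = q3 - q1

# Cutoffs (fences): values beyond them are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("5-number summary:", five_number_summary)
print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```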
second method to identify outliers
a value is an outlier if it lies 2 or more SDs above or below the mean
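The 2-SD rule above can be sketched as follows. The data are made up, and using the sample SD (`statistics.stdev`) rather than the population SD is an assumption of this example.

```python
from statistics import mean, stdev

data = [10, 12, 11, 13, 12, 11, 12, 30]  # made-up values

center = mean(data)
sd = stdev(data)  # assumption: sample SD (divides by n - 1)

# Flag any value at least 2 SDs away from the mean, in either direction.
flagged = [x for x in data if abs(x - center) >= 2 * sd]

print("mean:", center)
print("SD:", round(sd, 3))
print("flagged:", flagged)
```

Note that an extreme value inflates both the mean and the SD, so this rule can be less sensitive than the 1.5(IQR) rule on small data sets.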
standard deviation
typical (roughly average) distance of the values in a data set from the mean
SD for a population
SD = sqrt [ Σ ( x - μ )² / N ]
SD for a sample
SD = sqrt [ Σ(x - x̄)² / (n - 1) ]
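The two SD formulas above translate directly into code. This is a sketch with made-up data; the function names `population_sd` and `sample_sd` are illustrative, and the results are compared against the standard library's `pstdev` and `stdev`.

```python
import math
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values, mean = 5


def population_sd(values):
    """sqrt( sum of (x - mu)^2 / N )"""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


def sample_sd(values):
    """sqrt( sum of (x - x_bar)^2 / (n - 1) )"""
    x_bar = sum(values) / len(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))


print("population SD:", population_sd(data), "library:", pstdev(data))
print("sample SD:", sample_sd(data), "library:", stdev(data))
```

Dividing by n - 1 instead of N makes the sample SD slightly larger than the population SD for the same values.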
frequency table
gives the number of cases falling into each category
relative frequency table
gives the proportion of cases falling into each category
percentages, relative frequencies, and rates all provide the same information as
proportions
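A frequency table, a relative frequency table, and the matching percentages can be built from the same counts. This is a minimal sketch; the category labels are made up.

```python
from collections import Counter

responses = ["red", "blue", "red", "green", "blue", "red"]  # made-up data

# Frequency table: number of cases in each category.
frequency = Counter(responses)
total = sum(frequency.values())

# Relative frequency table: proportion of cases in each category.
relative_frequency = {cat: n / total for cat, n in frequency.items()}

# Percentages carry the same information as proportions, scaled by 100.
percentages = {cat: 100 * p for cat, p in relative_frequency.items()}

for cat in frequency:
    print(cat, frequency[cat], round(relative_frequency[cat], 3),
          f"{percentages[cat]:.1f}%")
```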
bar charts (graphs) are used to display
frequencies for categorical data
discrete variable
can take a countable number of values; the set of values may be finite or countably infinite
continuous variable
can take on infinitely many values that cannot be counted; no matter how small the interval, there is always another value in between
unimodal
one main peak
bimodal
two prominent peaks
uniform
all bars are roughly the same height; no clear peak
gap
region between two values with no data observed
cluster
concentration of data separated by gaps
statistic
numerical summary of sample data
pth percentile
value that has p% of data less than or equal to it
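The percentile definition above can be sketched with the nearest-rank rule: take the smallest data value that has at least p% of the data less than or equal to it. This is one of several percentile conventions, chosen here as an assumption; the function name `percentile` and the scores are made up.

```python
import math


def percentile(values, p):
    """Nearest-rank pth percentile: the smallest value with at least
    p% of the data less than or equal to it (0 < p <= 100)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position
    return ordered[rank - 1]


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up data
print(percentile(scores, 25))
print(percentile(scores, 90))
```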
variability
how spread out scores are in a distribution
The degree to which data points differ from each other or from the mean value. It can be measured by calculating the variance or standard deviation of a dataset.
3 measures of variability
range, IQR, SD
the empirical rule
This rule states that for a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, 95% falls within two standard deviations, and 99.7% falls within three standard deviations.
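The 68/95/99.7 figures above can be recovered from the normal curve itself. This sketch uses `statistics.NormalDist` from the Python standard library (3.8+); the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The rule's percentages are rounded versions of these proportions, and they apply only to data that are approximately normal.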