acronym to describe a distribution

SOCS

Socs

spread; state the lowest and highest values

sOcs

outliers; state values that stand out

soCs

center; state the approximate mean or median

socS

shape; symmetry or skewness

symmetric

mean ≈ median (exactly equal for a perfectly symmetric distribution)

skewed left

mean < median

negative skew

mean to the left of the median

skewed right

mean > median

positive skew

mean to the right of the median
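
The mean-vs-median cards above can be turned into a quick shape check. This is a minimal sketch: `describe_skew` and its `tolerance` parameter are hypothetical names, and comparing the mean with the median is only a rule of thumb, so a histogram is still the real test of shape.

```python
from statistics import mean, median

def describe_skew(data, tolerance=0.0):
    """Rule-of-thumb shape label from comparing the mean with the median.

    `tolerance` (hypothetical) is how close the two must be to call the
    distribution roughly symmetric.
    """
    m, med = mean(data), median(data)
    if abs(m - med) <= tolerance:
        return "roughly symmetric"
    # A long tail pulls the mean toward it, away from the median.
    return "skewed right" if m > med else "skewed left"

print(describe_skew([1, 2, 2, 3, 3, 3, 4, 20]))    # skewed right (mean 4.75 > median 3)
print(describe_skew([1, 17, 18, 18, 19, 19, 20]))  # skewed left (mean 16 < median 18)
```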

time plot of a variable

plots each observation against the time at which it was measured

five-number summary

minimum, Q1, median, Q3, maximum
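
A minimal sketch of computing the five-number summary, assuming the "median of each half" quartile method common in intro courses (the overall median is left out of both halves when n is odd). `five_number_summary` is a hypothetical name, and other software may use a different quartile rule and report slightly different Q1 and Q3.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves quartile rule."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = xs[:half]
    # For odd n, skip the middle value so it is not counted in either half.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary([2, 4, 4, 5, 7, 8, 9, 11, 15]))  # (2, 4.0, 7, 10.0, 15)
```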

IQR

the distance between the first and third quartiles

IQR =

Q3 - Q1

lower outlier boundary (fence)

Q1 - 1.5(IQR); values below this are outliers

upper outlier boundary (fence)

Q3 + 1.5(IQR); values above this are outliers
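
The IQR and the two 1.5(IQR) fences can be sketched together. This assumes the median-of-halves quartile method; `quartiles` and `iqr_outliers` are hypothetical helper names, not a standard library API.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 by the median-of-halves rule (median excluded when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

def iqr_outliers(data):
    """Return (lower fence, upper fence, outliers) using the 1.5 x IQR rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [x for x in data if x < low or x > high]

# Q1 = 4, Q3 = 10, IQR = 6, so the fences are 4 - 9 and 10 + 9.
print(iqr_outliers([2, 4, 4, 5, 7, 8, 9, 11, 30]))  # (-5.0, 19.0, [30])
```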

second method to identify outliers

a value is an outlier if it lies 2 or more SDs above or below the mean
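
A minimal sketch of the 2-SD rule. `sd_outliers` and its `k` parameter are hypothetical names. It uses the sample SD (`statistics.stdev`); for a whole population, `statistics.pstdev` would be the matching choice. An extreme value also inflates the SD, so with small data sets this rule can miss outliers that the 1.5(IQR) rule catches.

```python
from statistics import mean, stdev

def sd_outliers(data, k=2):
    """Values at least k sample SDs from the mean (k = 2 for the rule above)."""
    m, s = mean(data), stdev(data)
    return [x for x in data if abs(x - m) >= k * s]

# Mean is 16.375 and sample SD is about 9.75, so the cutoff distance is about 19.5.
print(sd_outliers([10, 11, 12, 13, 14, 15, 16, 40]))  # [40]
```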

standard deviation

roughly the typical (average) distance of the values in a data set from the mean

SD for a population

SD = sqrt [ Σ(x - μ)² / N ]

SD for a sample

SD = sqrt [ Σ(x - x̄)² / (n - 1) ]
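
The two formulas differ only in the denominator (N versus n - 1). This sketch applies each formula by hand and checks it against Python's `statistics.pstdev` (population) and `statistics.stdev` (sample); the data values are made up for illustration.

```python
import math
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
center = sum(data) / n  # plays the role of μ (population) or x̄ (sample)
squared_devs = sum((x - center) ** 2 for x in data)  # Σ(x - center)² = 32 here

pop_sd = math.sqrt(squared_devs / n)         # divide by N
samp_sd = math.sqrt(squared_devs / (n - 1))  # divide by n - 1

print(pop_sd, pstdev(data))  # 2.0 2.0
print(samp_sd, stdev(data))  # both about 2.138
```

Dividing by n - 1 makes the sample SD slightly larger, which compensates for a sample tending to underestimate the spread of its population.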

frequency table

gives the number of cases falling into each category

relative frequency table

gives the proportion of cases falling into each category
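
A frequency table and a relative frequency table can both be built from the same counts; the relative version just divides each count by the total. A minimal sketch using `collections.Counter`; `frequency_tables` is a hypothetical name and the color data is made up.

```python
from collections import Counter

def frequency_tables(values):
    """Return (counts, proportions) for a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    proportions = {category: count / total for category, count in counts.items()}
    return dict(counts), proportions

colors = ["red", "blue", "red", "green", "red", "blue"]
counts, props = frequency_tables(colors)
print(counts)  # {'red': 3, 'blue': 2, 'green': 1}
print(props)   # red 0.5, blue about 0.333, green about 0.167
```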

percentages, relative frequencies, and rates all provide the same information as

proportions

bar charts (bar graphs) are used to display

frequencies for categorical data

discrete variable

takes a countable number of values, which may be finite or countably infinite

continuous variable

can take on infinitely many values that cannot be counted; no matter how small an interval is, there is always another possible value in between

unimodal

one main peak

bimodal

two prominent peaks

uniform

all bars are about the same height, with no clear peak

gap

region between two values with no data observed

cluster

concentration of data separated by gaps

statistic

numerical summary of sample data

pth percentile

value that has p% of data less than or equal to it
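
One way to turn this definition into a calculation is the nearest-rank method: sort the data and take the smallest value with at least p% of the data at or below it. This is a sketch under that assumption; `percentile` is a hypothetical name, and calculators and software packages often interpolate between values instead, so their answers can differ slightly.

```python
import math

def percentile(data, p):
    """Smallest value with at least p% of the data less than or equal to it."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    xs = sorted(data)
    rank = math.ceil(p * len(xs) / 100)  # 1-based position in the sorted data
    return xs[rank - 1]

scores = [55, 62, 70, 71, 75, 80, 84, 88, 91, 97]
print(percentile(scores, 50))  # 75 (5 of 10 scores are <= 75)
print(percentile(scores, 90))  # 91 (9 of 10 scores are <= 91)
```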

variability

how spread out scores are in a distribution

The degree to which data points differ from each other or from the mean value. It can be measured with the range, IQR, variance, or standard deviation of a data set.

3 measures of variability

range, IQR, SD

the empirical rule

This rule states that for a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, 95% falls within two standard deviations, and 99.7% falls within three standard deviations.
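
The rounded 68/95/99.7 percentages come from the exact areas under a normal curve. This sketch computes them with `statistics.NormalDist` (Python 3.8+); `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def share_within(k):
    """Exact share of a normal distribution lying within k SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```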

