statistics
the study of how to collect/organize/analyze/interpret numerical information from a data set
individuals
people or objects included in the study (2 types)
variables
characteristics of the individuals to be measured or observed (2 types)
population data (parameter)
data from every individual in the population of interest; a parameter is a numerical measure that describes the population
sample data (statistic)
data from only some individuals in the population of interest; a statistic is a numerical measure that describes the sample
nominal level
applies to data that consists of names/labels/categories, order doesn't matter
ordinal level
applies to data that can be arranged in order, but differences between data values are not meaningful
interval level
numerical measurements where differences are meaningful, but there is no true zero, so ratios aren't meaningful
true zero
a zero that means none of the quantity is present; it stays zero when the measurement is converted into other units
ratio level
data with meaningful ratios, have a true zero
simple random sampling
individuals are selected at random with a random-number table or computer; every sample of the same size (and so every individual) in the population has an equal chance of being selected
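A minimal sketch of simple random sampling in Python, assuming a made-up population of 50 ID numbers and a sample size of 5. The seed is fixed only so the draw is repeatable.

```python
import random

# Hypothetical population: ID numbers 1..50 (invented for illustration).
population = list(range(1, 51))

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# sample() draws without replacement, so every set of 5 IDs is equally likely.
sample = rng.sample(population, k=5)
print(sample)
```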
stratified sampling
the population is divided into several groups (strata) and then a simple random sample is drawn from each group
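One way stratified sampling might look in Python. The strata (class years), the member labels, and the per-group sample size of 3 are all assumptions made up for this sketch.

```python
import random

# Hypothetical strata: students grouped by class year (labels invented).
strata = {
    "freshman": [f"F{i}" for i in range(1, 21)],
    "sophomore": [f"S{i}" for i in range(1, 16)],
    "junior": [f"J{i}" for i in range(1, 11)],
}

rng = random.Random(0)
per_group = 3  # assumed sample size taken from each stratum

# Draw a simple random sample from every stratum, so each group is represented.
stratified_sample = {
    name: rng.sample(members, per_group) for name, members in strata.items()
}
print(stratified_sample)
```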
systematic sampling
elements of the population are arranged in order and then every kth element is selected with a random starting point
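A short sketch of systematic sampling, assuming an ordered population of 100 made-up ID numbers and k = 10. The starting point is chosen at random among the first k elements.

```python
import random

# Hypothetical ordered population: ID numbers 1..100.
population = list(range(1, 101))
k = 10  # assumed step: take every 10th element

rng = random.Random(1)
start = rng.randrange(k)  # random starting index among the first k elements

# Take the starting element and every kth element after it.
systematic_sample = population[start::k]
print(systematic_sample)
```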
cluster sampling
the population is divided into sections (clusters) and then some sections are randomly selected, with every individual in each selected section included in the sample
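A sketch of cluster sampling under invented assumptions: four city blocks serve as clusters, and two of them are selected at random. Unlike stratified sampling, everyone in a chosen cluster is included.

```python
import random

# Hypothetical clusters: residents grouped by city block (names invented).
clusters = {
    "block A": ["a1", "a2", "a3"],
    "block B": ["b1", "b2"],
    "block C": ["c1", "c2", "c3", "c4"],
    "block D": ["d1", "d2", "d3"],
}

rng = random.Random(2)

# Randomly select whole clusters, not individuals.
chosen = rng.sample(sorted(clusters), k=2)

# Every individual in each chosen cluster goes into the sample.
cluster_sample = [person for name in chosen for person in clusters[name]]
print(chosen, cluster_sample)
```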
convenience sampling
uses whatever individuals are readily available (limited by time or location), so not every individual of the population has a chance of being selected
multistage sampling
using a combination of sampling methods to create smaller groups at each stage (not convenience, last stage is cluster)
class requirements
every data value must fall into one of the classes; the classes must not overlap; the classes must be of equal width; there must be no gaps between classes
relative frequency
class frequency/total number of data values
relative frequency percentage
relative frequency * 100
cumulative frequency
sum of the frequencies for that class and previous classes
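The three frequency definitions above can be sketched together in Python. The class limits and frequencies here are invented for illustration.

```python
from itertools import accumulate

# Hypothetical frequency table (classes and counts are made up).
class_limits = ["10-19", "20-29", "30-39", "40-49"]
frequencies = [4, 6, 8, 2]

total = sum(frequencies)  # total number of data values

# Relative frequency: class frequency / total number of data values.
relative = [f / total for f in frequencies]

# Relative frequency percentage: relative frequency * 100.
percent = [r * 100 for r in relative]

# Cumulative frequency: running sum of this class and all previous classes.
cumulative = list(accumulate(frequencies))

for row in zip(class_limits, frequencies, relative, percent, cumulative):
    print(row)
```

The last cumulative frequency always equals the total number of data values, and the relative frequencies add up to 1.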
mound shaped symmetric
bell shaped, both sides of the peak are roughly mirror images, mean/median/mode are approximately equal
skewed to the left
longer tail on the left, peak on the right, mean<median<mode
skewed to the right
longer tail on the right, peak on the left, mode<median<mean
unimodal
only has 1 peak/mode
bimodal
has 2 peaks/modes
population mean
μ = (sum of xi) / N, where N is the population size
sample mean
x̄ = (sum of xi) / n, where n is the sample size
deviation
xi - mean (μ for a population, x̄ for a sample), negative=value below the mean, positive=value above the mean
coefficient of variation
(standard deviation/mean) * 100, expressed as a percentage; small=most of the data is clustered close to the mean, large=data values are widely spread relative to the mean
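A small worked sketch of the mean, deviations, and coefficient of variation. The five data values are invented, and the data is treated as a sample, so the sample standard deviation (dividing by n - 1) is assumed.

```python
import statistics

# Hypothetical sample data (invented for illustration).
data = [4, 8, 6, 5, 7]

mean = statistics.mean(data)  # sample mean: sum of xi over n

# Deviation of each value from the mean; negative means below the mean.
deviations = [x - mean for x in data]

# Sample standard deviation (assumption: the data is a sample, not a population).
s = statistics.stdev(data)

# Coefficient of variation as a percentage of the mean.
cv = s / mean * 100

print(mean, deviations, round(cv, 2))
```

The deviations always sum to zero, which is why the standard deviation is built from squared deviations instead.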
outlier
a value that's considerably smaller or larger than the rest of the data
lower outlier limit
Q1-1.5(IQR), where IQR = Q3-Q1
upper outlier limit
Q3+1.5(IQR), where IQR = Q3-Q1
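A sketch of the outlier limits in Python, using invented data. Textbooks and software compute quartiles in slightly different ways; this example assumes the default method of `statistics.quantiles`, so Q1 and Q3 may differ a little from hand-calculated values.

```python
import statistics

# Hypothetical data set (invented); 30 looks unusually large.
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 30]

# quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Any value outside the limits is flagged as an outlier.
outliers = [x for x in data if x < lower_limit or x > upper_limit]
print(lower_limit, upper_limit, outliers)
```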