population
collection of all elements (people or things) of interest
example of population
ALL cscc students, ALL toddlers at daycare
sample
subset of the population actually surveyed; usually identified by the sample size (n = #)
example of sample
50 cscc students, 12 toddlers at daycare
parameter
summary values that come from population data, usually denoted by Greek letters (mu = population mean, sigma = population standard deviation)
statistic
summary values that come from sample data
descriptive statistics
facts/outcomes of collected data, describes what was collected (volleyball highlights)
inferential statistics
uses descriptive stats to make predictions or generalizations about the population (predictions for who will win the game)
data collection methods
census, simulation, experiment, sampling
census
collect “as is” data from the entire population (collect data from all stat1450 sp25 students)
simulation
collect “pretend” data; reduces time, risk, and cost (flight simulation, study effects of meds on mice)
experiment
impose a treatment and record the response, often compares control (placebo) vs. experimental groups
sampling
collect “as is” data from a subset of the population (survey 30 randomly selected hair salon customers)
sampling methods
simple random, convenience, cluster, stratified, systematic
simple random
requires randomly selecting from a list of the entire population so that every element of the population has the same chance of being selected; this is the least biased sampling method but the most costly/time-consuming
example of simple random
drawing names from a hat, computer random dialing
convenience
survey those that are easily accessible; this is the most biased method
example of convenience method
survey one class, phone in survey, magazine survey
cluster
sample all from a few pre-existing locations; in theory, each location is a good representation of the population (age, gender, etc)
example of cluster method
survey all students in 3 classes on campus
stratified
identify a characteristic that might bias the survey results (gender, politics, age), then sample equal/proportionate numbers from each group
example of stratified method
20 male and 20 female, 100 democrat and 100 republican
systematic
sample every nth element of the population (every 3rd package on the assembly line)
qualitative data
words, or numbers that act as labels/categories rather than measurements
examples of qualitative data
hometown, rank mood from 1-10, phone number, room number
quantitative data
numbers that represent counts or measurements
examples of quantitative data
age, gpa, number of siblings, “how much?”, total $
discrete data
finite number of possible outcomes in any given interval; graph displays gaps
continuous data
infinite number of possible outcomes, graph displays no gaps/breaks
levels of measurement
nominal, ordinal, interval, ratio
nominal
“name”, qualitative data (words) that aren’t ranked, names something EX: hometown, zip code, favorite color
ordinal
“order”, qualitative data (words) that have an inherent rank; ranks words EX: military rank, class rank, letter grade
interval
quantitative data on an arbitrary scale; zero does not mean absence of the quantity EX: clock time, temperature, year
ratio
quantitative data where zero means ‘nothing exists’, highest level of measurement EX: age, $, height, gpa
formula to find class width
(max-min)/ # of classes; round up to next whole #
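A minimal Python sketch of the class-width formula above. The data values are hypothetical, and rounding up with `math.ceil` is an assumption: some instructors also add 1 when the division comes out to a whole number.

```python
import math

def class_width(data, num_classes):
    """(max - min) / number of classes, rounded up to the next whole number."""
    raw = (max(data) - min(data)) / num_classes
    return math.ceil(raw)

# Hypothetical ages: (52 - 18) / 5 = 6.8, which rounds up to 7
ages = [18, 19, 21, 22, 25, 30, 34, 41, 47, 52]
print(class_width(ages, 5))
```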
histogram
bar graph
quantitative data
bars touch
horizontal axis: midpoints
vertical axis: frequency
relative frequency histogram
bar graph
bars touch
HA: midpoints
VA: relative frequency
polygon
line graph (connect dots)
HA: midpoints
VA: frequency
goes up and down
ogive
line graph
HA: upper class limits
VA: cumulative frequency
always increases/ stays level
NEVER decreases
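The ogive's "never decreases" rule follows from how cumulative frequency is built: each class adds a non-negative count to the running total. A small sketch with a hypothetical frequency table:

```python
from itertools import accumulate

# Hypothetical frequency table: upper class limits and class frequencies
upper_limits = [24, 31, 38, 45, 52]
frequencies = [5, 2, 1, 1, 1]

# Running total of the frequencies = cumulative frequency (the ogive's VA)
cumulative = list(accumulate(frequencies))
for ul, cf in zip(upper_limits, cumulative):
    print(ul, cf)
```

The last cumulative frequency always equals the total number of data values.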
stem and leaf plot rules
numeric order
can’t skip values
at least 3 stems required
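A sketch of the first two stem-and-leaf rules (numeric order, no skipped stems), assuming non-negative whole-number data with the tens digit as the stem and the ones digit as the leaf. The function name and data are illustrative only.

```python
def stem_and_leaf(data):
    """Return the lines of a stem-and-leaf plot (stem = tens digit, leaf = ones digit)."""
    values = sorted(data)                      # numeric order
    first, last = values[0] // 10, values[-1] // 10
    lines = []
    for stem in range(first, last + 1):        # don't skip stems, even empty ones
        leaves = [str(v % 10) for v in values if v // 10 == stem]
        lines.append(f"{stem} | {' '.join(leaves)}".rstrip())
    return lines

for line in stem_and_leaf([12, 15, 21, 27, 27, 43]):
    print(line)
```

Note the empty `3 |` stem stays in the plot so the gap in the data is visible.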
time series plot
quantitative data
show how something changes over time
uses time units along HA (days, years, months)
uses collected data along VA
dot plot
quantitative data
used to display shape
stacked dots for data
box plot rule about outliers
there is an outlier if a value is more than 1.5 × IQR beyond the box (below Q1 - 1.5·IQR or above Q3 + 1.5·IQR, where IQR = Q3 - Q1)
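A sketch of the 1.5 × IQR outlier check. Quartiles are computed with the median-of-halves method (middle value left out when n is odd), which is an assumption; other quartile methods can give slightly different fences. The data set is hypothetical.

```python
from statistics import median

def quartiles(data):
    """Q1, Q2, Q3 by the median-of-halves method (needs at least 2 values)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]   # skip the middle value when n is odd
    return median(lower), median(xs), median(upper)

def outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Q1 = 4.5, Q3 = 8.5, IQR = 4, fences at -1.5 and 14.5
print(outliers([2, 4, 5, 6, 7, 8, 9, 30]))
```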
pareto chart
qualitative data
HA: categories
gaps between bars
go in decreasing height
VA: frequency/ rel. frequency
displays mode
shape is data dependent
common mode is brown hair color
pie chart
qualitative data
circle graph
separate parts of a whole
each section displays the amount
all add to 100%
measures of center
mean, median, mode
mean
average; “not a resistant measure” → affected by extreme data
median
middle piece of data; considered a “resistant measure” → not affected by extreme data
mode
data that occurred most often, no mode if all values occurred the same number of times
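A quick Python sketch of the three measures of center. The `modes` helper is hypothetical and follows the card's rule of returning no mode when every value occurs equally often; the data set is made up to show the mean being pulled by an extreme value while the median is not.

```python
from collections import Counter
from statistics import mean, median

def modes(data):
    """All values tied for the highest count; [] (no mode) if every value occurs equally often."""
    counts = Counter(data)
    top = max(counts.values())
    if all(c == top for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == top]

data = [3, 5, 5, 7, 100]
# The extreme value 100 pulls the mean up to 24; the median stays at 5
print(mean(data), median(data), modes(data))
```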
trimmed mean
average calculated from a “trimmed” data set: trim a % of the data from both the low and high end of the sorted list, then find the mean of the remaining values
Multiply % times n (sample size) to determine how many values to delete from each end of the list
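A sketch of the trimmed-mean steps above. Rounding `percent * n` to the nearest whole number is an assumption (courses differ), and the scores are hypothetical.

```python
from statistics import mean

def trimmed_mean(data, percent):
    """Trim `percent` of the values from EACH end of the sorted list, then average the rest."""
    xs = sorted(data)
    k = round(percent * len(xs))       # number of values to delete from each end
    return mean(xs[k:len(xs) - k])

# 0.10 * 10 = 1 value deleted from each end (the 1 and the 99)
scores = [1, 60, 62, 65, 70, 71, 75, 80, 85, 99]
print(trimmed_mean(scores, 0.10))
```

Slicing with `len(xs) - k` (rather than `-k`) keeps the list intact when `k` is 0.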
outliers
extreme values that pull the mean away from the center
best when no outliers exist
mean
best when outliers exist
median
measures of variation (spread)
range, standard deviation
range
max - min
standard deviation
average distance that each piece of data is from the mean
Sx = sample standard deviation
σx (sigma x) = population standard deviation
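Range and both standard deviations on a hypothetical data set, using Python's `statistics` module: `stdev` divides by n - 1 (sample, Sx) and `pstdev` divides by N (population, σx).

```python
from statistics import stdev, pstdev

data = [4, 8, 6, 5, 3, 7]
data_range = max(data) - min(data)   # max - min
s_x = stdev(data)                    # sample standard deviation (divides by n - 1)
sigma_x = pstdev(data)               # population standard deviation (divides by N)
print(data_range, round(s_x, 3), round(sigma_x, 3))
```

Because it divides by a smaller number, Sx is always a bit larger than σx for the same data.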
uniform distribution
no mode, mean=median
normal distribution
one mode; mean = median = mode (all at the center, at the top of the peak)
skewed right
“positively skewed”, mean>median
skewed left
“negatively skewed”, mean<median
bimodal
2 modes; mean = median when the two peaks are symmetric
empirical rule
applies only to (approximately) normal, bell-shaped distributions, for k = 1, 2, 3 standard deviations respectively
empirical rule of 68%
mu ± 1sigma = about 68% of the data (for a sample, use sample mean ± 1Sx)
empirical rule of 95%
mu ± 2sigma = about 95%
empirical rule of 99.7%
mu ± 3sigma = about 99.7%
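A sketch that turns the empirical rule into actual intervals. The mean of 70 and standard deviation of 5 are hypothetical, and the rule is only appropriate if the data are roughly bell-shaped.

```python
def empirical_intervals(mean, sd):
    """Intervals holding about 68%, 95%, and 99.7% of a bell-shaped (normal) data set."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in [(1, 68), (2, 95), (3, 99.7)]}

# Hypothetical exam scores: mean 70, standard deviation 5
for pct, (lo, hi) in empirical_intervals(70, 5).items():
    print(f"about {pct}% of scores between {lo} and {hi}")
```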
chebyshev’s theorem
used when the distribution is not (or may not be) normal: for any k > 1, at least 1 - (1/k²) of the data lie within k standard deviations of the mean
chebyshev’s theorem of 75%
mean ± 2(stdev) = at least 75%
chebyshev’s theorem of 89%
mean ± 3(stdev) = at least 89% (8/9 ≈ 88.9%)
chebyshev’s theorem of 94%
mean ± 4(stdev) = at least 94% (15/16 = 93.75%)
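Chebyshev's bound 1 - 1/k² computed directly; it reproduces the 75%, 89%, and 94% cards (the last two are rounded). The function name is illustrative.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of ANY data set within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(k, f"at least {chebyshev_min_fraction(k):.1%}")
```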
coefficient of variation formula
CV = stdev/mean (often written as a percent); the smaller the CV, the more consistent the data
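A sketch comparing consistency with the coefficient of variation. Using the sample standard deviation is an assumption, and the two data sets are hypothetical; both have a mean of 100 but very different spread.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return stdev(data) / mean(data)

team_a = [98, 100, 102]
team_b = [80, 100, 120]
# team_a has the smaller CV, so its data are more consistent
print(coefficient_of_variation(team_a), coefficient_of_variation(team_b))
```

Multiply by 100 to report the CV as a percent.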
measures of relative position
percentiles, quartiles, z-scores, unusual data
percentiles
percent of data below a given value; P = (number of values below the given value / total number of values) × 100
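The percentile formula above, counting only values strictly below the given value. The scores are hypothetical, and other textbooks use slightly different percentile conventions.

```python
def percentile_rank(data, x):
    """Percent of values strictly below x: (count below x / total count) * 100."""
    below = sum(1 for v in data if v < x)
    return below * 100 / len(data)

scores = [55, 60, 62, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(scores, 85))  # 6 of the 10 values are below 85
```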
quartiles
divide data into quarters
Q1: 25% of data is below this
Q2: median, 50% of data is below this
Q3: 75% of data is below this
z-scores
number of standard deviations a value is from the mean
z = (x - mean) / stdev, where x is the data value
pos= above mean
neg= below mean
0= exactly on the mean
unusual data
data that is more than 2 standard deviations from the mean, to the left or right
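A sketch combining z-scores with the "unusual data" rule: a value is flagged when its z-score is more than 2 in either direction. The mean of 70 and standard deviation of 5 are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean: z = (x - mean) / sd."""
    return (x - mean) / sd

def is_unusual(x, mean, sd):
    """Flag values more than 2 standard deviations from the mean (either side)."""
    return abs(z_score(x, mean, sd)) > 2

# Hypothetical: mean 70, standard deviation 5
for score in (70, 78, 58):
    print(score, z_score(score, 70, 5), is_unusual(score, 70, 5))
```

A positive z lands above the mean, a negative z below it, and z = 0 exactly on it.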