data
systematically recorded information
used to calculate, analyze, or predict
something
Statistics
it is the study of how to collect,
organize, analyze, and interpret data (usually
collected from a group)
descriptive statistics
Using tables, graphs (histograms, scatterplots,
etc) and numbers (mean, median, standard
deviation, correlation, etc) to organize and
summarize and describe the data at hand
(usually sample data).
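A minimal sketch of the "numbers" side of descriptive statistics, using Python's standard `statistics` module. The sample values are made up for illustration and do not come from these notes.

```python
import statistics

# Hypothetical sample data (invented for this example).
sample = [2, 4, 4, 5, 7, 9]

# Numbers that summarize the data at hand.
summary = {
    "mean": statistics.mean(sample),      # arithmetic average
    "median": statistics.median(sample),  # middle value of the sorted data
    "stdev": statistics.stdev(sample),    # sample standard deviation (divides by n - 1)
}
```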
inferential statistics
Analyze a “small” specific set of data (sample)
in order to draw a conclusion about a “large”,
more general group (population)
• 2 Types
– Confidence intervals
– Hypothesis tests
individual
a person or object that you are interested in
finding information about
variable
the measurement or observation recorded for each
individual
population
set of all individuals that are of interest for some
question or study (typically very “large” and often
implied/unstated)
sample
a subset from the population that you actually
collect data for (usually “small” compared to the
population and always known)
parameter
is a numeric descriptor of / number
calculated from a population
– It is a fixed number, usually unknown, that you want to find
– EX: μ, σ (typically Greek letters)
statistic
is a numeric descriptor of / number
calculated from a sample
– readily known/found and used to estimate parameter
– EX: x̄, s
what is the difference between a parameter and statistic
Parameters are fixed, but statistics vary from
sample to sample
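A rough illustration of this difference, assuming a made-up population of 1,000 integers: the population mean is computed once and stays put, while the mean of each new sample comes out a little different. The seed, population, and sample size are all arbitrary choices for the sketch.

```python
import random
import statistics

# Hypothetical population (invented for illustration); seeded so runs repeat.
rng = random.Random(0)
population = [rng.randint(1, 100) for _ in range(1000)]

# Parameter: calculated from the whole population, so it is a fixed number.
mu = statistics.mean(population)

# Statistic: calculated from a sample, so it changes from sample to sample.
sample_means = [statistics.mean(rng.sample(population, 25)) for _ in range(5)]
```

Printing `mu` next to `sample_means` shows one fixed value alongside five estimates that scatter around it.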
qualitative
A word or name that describes a quality of the individual
(does not count or measure anything)
• Non-numeric (typically)
quantitative
Something that can be counted or measured from the individual
• Numeric
types of quantitative data
discrete and continuous
discrete data
finite number of subdivisions (things you count)
continuous
infinite subdivisions (things you measure)
census
Collects data on the entire population
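SRS
simple random sample: every individual has the same chance of
being selected, and every possible sample of the same size has the
same chance of being chosen; generally regarded as the best method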
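One possible way to draw such a sample in Python is `random.sample`, which picks without replacement so that every subset of size `n` is equally likely. The function name and the `seed` parameter are assumptions made for this sketch.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement; every size-n subset is equally likely."""
    rng = random.Random(seed)  # seed is optional and only makes the draw repeatable
    return rng.sample(population, n)
```

For example, `simple_random_sample(list(range(100)), 10)` returns 10 distinct values from 0 to 99.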
stratified
break the population into strata (not random) and
then take an SRS from each stratum
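A small sketch of the same idea, assuming the strata are already given as a dictionary mapping a stratum name to its members and that the same number is drawn from each. Both the data layout and the equal allocation are assumptions for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum individuals from every stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        # Randomness is used only inside each stratum, not to form the strata.
        sample.extend(rng.sample(members, n_per_stratum))
    return sample
```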
systematic
take every kth individual for the
sample
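A minimal sketch, assuming the population is an ordered list. The random starting point within the first `k` individuals is a common convention added here as an assumption; the definition above only says to take every kth individual.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k individuals, then take every kth one."""
    rng = random.Random(seed)
    start = rng.randrange(k)     # assumed convention: random start in 0..k-1
    return population[start::k]  # slice step k keeps every kth individual
```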
cluster
divide into clusters (not random) and
then get info from all individuals in some
randomly selected clusters
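For contrast with stratified sampling, a sketch of cluster sampling, assuming the clusters are given as a dictionary mapping a cluster name to its members (the names and layout are hypothetical). Here the randomness chooses whole clusters, and everyone in a chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of cluster names
    return [person for name in chosen for person in clusters[name]]
```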
convenience
sample individuals that are nearby
or conveniently located
data collection bias
– Measurement Device
– Personal Bias (Blind/Double Blind)
– Questioning technique / wording
sampling method bias
– Voluntary Response
– Convenience Sampling
– Nonresponse
– Undercoverage
– Response Bias
distribution
A list of all possible values of a variable and how
often each value occurs
When describing distributions of quantitative
variables, we focus on…
– Shape
– Center
– Spread (variation)
– Outliers
frequency
how often something occurs
frequency distribution
A summary table of a distribution listing variable
values, frequencies, and often other information
relative frequency
Percent of total, listed as percent or decimal; also
called sample proportion
frequency table
organizes collected data in table form using categories
(or classes) and frequencies (counts)
relative frequency table
organizes raw data in table form using categories (or
classes) and proportions (or percentages).
• Relative frequency tables are useful when comparing data sets where the sample sizes are not the same
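As a rough sketch, a frequency and relative frequency table can be built from raw data with `collections.Counter`. The function name and the returned layout (value mapped to a count and a proportion) are choices made for this example.

```python
from collections import Counter

def relative_frequency_table(values):
    """Map each category to (frequency, relative frequency as a decimal)."""
    counts = Counter(values)  # frequency: how often each value occurs
    n = len(values)
    return {value: (count, count / n) for value, count in counts.items()}
```

With the made-up data `["red", "blue", "red", "red"]`, "red" has frequency 3 and relative frequency 0.75, and "blue" has frequency 1 and relative frequency 0.25.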
graphical displays for qualitative data
pie chart
bar chart (bars don't touch)
Pareto chart
univariate data
one variable
bivariate data
two variables
multivariate data
3 or more variables
graphical displays for quantitative data
histogram
scatterplot
stem plot
histogram
a graph of the frequencies (or relative
frequencies)
for quantitative data
bunched into classes
class width
(max-min)/# of classes
always round up
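A worked sketch of the formula, assuming whole-number data so that rounding up means going to the next whole number. Note that `math.ceil` leaves a result that is already whole unchanged; some textbooks bump that case to the next whole number too, which this sketch does not do.

```python
import math

def class_width(values, num_classes):
    """(max - min) / number of classes, rounded up to a whole number."""
    raw = (max(values) - min(values)) / num_classes
    return math.ceil(raw)  # round up, never down
```

For example, with a minimum of 10, a maximum of 58, and 5 classes, the raw width is 48 / 5 = 9.6, which rounds up to 10.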
symmetrical
left and right sides are roughly mirror images (does not need to be perfect); often bell shaped
skewed left
tail to the left (negatively skewed)
skewed right
tail to the right (positively skewed)
uniform
symmetric
stem plots
are used for small (typically n ≤ 40)
quantitative data sets involving a single variable
can be used to determine shape
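A minimal sketch of building a stem plot, assuming non-negative whole numbers where the stem is everything but the last digit and the leaf is the last digit. Empty stems are kept so that the shape is not distorted.

```python
def stem_plot(values):
    """Return stem plot rows such as '3 | 4 6 8' for non-negative integers."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)  # stem = tens and up, leaf = ones digit
    lines = []
    for stem in range(min(stems), max(stems) + 1):    # include stems with no leaves
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines
```

Calling `print("\n".join(stem_plot(data)))` on a small data set gives rows whose lengths trace out the shape of the distribution, much like a histogram turned on its side.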
scatterplots
display trends in the relationship
between the 2 quantitative variables