Examples of Variability
gas prices, height, weight
3 main reasons to study statistics
to be informed, to make good decisions, and to evaluate decisions that affect me
population
the set of all individuals of interest
sample
a subset of individuals selected from the population (representative group of a population)
parameter
numerical characteristic of a population, fixed quantity
statistic
numerical characteristic of a sample, variable quantity
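The contrast between a fixed parameter and a varying statistic can be seen in a small simulation. This is only a sketch: the population of the numbers 1 to 100, the sample size of 10, and the seed are made-up choices for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)            # fixed seed so the run is repeatable (arbitrary choice)
population = list(range(1, 101))   # hypothetical population: the numbers 1..100

# Parameter: computed from the whole population, so it never changes.
parameter = mean(population)

# Statistic: computed from a sample, so it changes from sample to sample.
sample_means = [mean(rng.sample(population, 10)) for _ in range(3)]

print("population mean (parameter):", parameter)
print("sample means (statistics):  ", sample_means)
```

Each run of `rng.sample` draws a different subset, so the three sample means generally differ from one another and from the population mean.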
statistical questions
a question that is answered by collecting data that varies
what is needed to determine if a question is statistical
the population, the variable that is being measured, and the variation that occurs in the variable
variable
any characteristic observed in a study
descriptive statistics
methods of organizing and summarizing information
what is the purpose of descriptive statistics
to reduce the data to simple summaries without losing too much information
what is descriptive statistics used with?
samples or with a population data set
inferential statistics
methods for drawing conclusions about a population
what is inferential statistics based on?
samples
what is inferential statistics used with?
sample data
key words for descriptive stats
summarize, record, reflect, reduce
key words for inferential stats
concluding, predicting, estimating
categorical variable
places each observation into a non-numerical group or category
quantitative variable
numerical variable
discrete
possible values form a set of separate numbers
example of discrete
number of books, shoe size, number of apples
continuous
its possible values form a continuum of values over the real number line
example of continuous
distance, height, weight
we can … discrete variables
count
we can … continuous variables
measure
frequency distribution
a listing of distinct categories and their counts
relative frequency distribution
a listing of distinct values and their relative frequencies
what is the use of a relative frequency distribution
to compare samples, especially when the sample sizes are unequal
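A short sketch of why relative frequencies help when sample sizes differ. The two samples below are invented for illustration, and `relative_frequencies` is a hypothetical helper name.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {category: count / total} for a list of categorical values."""
    counts = Counter(values)       # the frequency distribution
    total = len(values)
    return {category: count / total for category, count in counts.items()}

# Hypothetical samples of unequal size.
sample_a = ["red", "blue", "red", "green"]                                # n = 4
sample_b = ["red", "blue", "blue", "blue", "green", "red", "red", "red"]  # n = 8

rel_a = relative_frequencies(sample_a)
rel_b = relative_frequencies(sample_b)

# The raw counts of "red" differ (2 vs 4), but the relative frequencies match.
print(rel_a["red"], rel_b["red"])
```

Comparing counts alone would suggest "red" is twice as common in the second sample; the relative frequencies show the share is the same.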
Pareto diagram
a type of bar graph where the categories are ordered by their counts in descending order, from the tallest bar to the shortest
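The ordering step behind a Pareto diagram can be sketched without any plotting. The complaint categories below are made up for illustration.

```python
from collections import Counter

# Hypothetical categorical data.
complaints = ["late", "damaged", "late", "wrong item", "late", "damaged"]

# most_common() returns (category, count) pairs sorted from highest count to lowest,
# which is the left-to-right bar order of a Pareto diagram.
pareto_order = Counter(complaints).most_common()

for category, count in pareto_order:
    print(f"{category:<12}{'#' * count}")
```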
pie chart
a circle divided into wedge-shaped pieces that are proportional to the relative frequencies of the categories
what are pie charts best used for
data sets with fewer than 8 categories
what are graphical displays for categorical data sets
pie chart, Pareto diagram, and bar charts
mean
sum of observations divided by the number of observations
median
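the middle value of the ordered data (the average of the two middle values when the number of observations is even)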
mode
any value that occurs with the greatest frequency
what is sensitive to extreme values
mean
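A small sketch of the three measures of center, and of how one extreme value moves the mean but not the median. The numbers are invented for illustration; the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median, multimode

data = [2, 3, 3, 4, 5]             # hypothetical observations
with_extreme = [2, 3, 3, 4, 100]   # same data with the largest value replaced

center = {
    "mean": mean(data),
    "median": median(data),
    "mode": multimode(data),       # list of all values tied for the greatest frequency
}

center_extreme = {
    "mean": mean(with_extreme),
    "median": median(with_extreme),
}

print(center)
print(center_extreme)
```

The median stays at 3 in both data sets, while the mean is pulled toward the extreme value.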
what isn’t sensitive to extreme values
median
percentiles
the pth percentile is the value below which p% of the observations fall
quartiles
divides the data into quarters
1st quartile
median of the lower half of data
2nd quartile
the median of the data
3rd quartile
median of the upper half of data
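The "median of each half" definitions above can be sketched as follows. One assumption to note: when the number of observations is odd, this sketch leaves the overall median out of both halves. Textbooks and software differ on that detail, so results may not match every calculator. The helper name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    Assumption: for an odd number of observations the middle value
    is excluded from both the lower and the upper half.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

odd_case = quartiles([1, 3, 5, 7, 9, 11, 13])   # lower half 1,3,5; upper half 9,11,13
even_case = quartiles([2, 4, 6, 8])             # lower half 2,4; upper half 6,8
print(odd_case, even_case)
```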
standard deviation
the square root of the variance
interquartile range (IQR)
the difference between Q3 and Q1
what does the IQR tell us
how spread out the middle 50% of data is
Range
max-min
why isn’t range used more
it only takes into account the largest and smallest observations, which may be extreme values that do not reflect the rest of the data
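A sketch of the range and the standard deviation on made-up data, showing how a single extreme observation inflates the range. The sample versions `variance` and `stdev` from the standard library are used here as an assumption; the cards do not say whether the sample or population formula is intended.

```python
import math
from statistics import variance, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical observations

data_range = max(data) - min(data)       # range = max - min
sample_variance = variance(data)         # sample variance (divides by n - 1)
sample_sd = stdev(data)                  # standard deviation = square root of the variance

# Adding one extreme observation changes the range drastically.
with_extreme = data + [40]
extreme_range = max(with_extreme) - min(with_extreme)

print(data_range, extreme_range)
print(sample_sd, math.sqrt(sample_variance))
```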
examples of quantitative graphs
dot plots, histograms, boxplots, density plots, comparative bar charts, and time plots
dot plots
display individual values of a data set
histograms
like a bar graph, but each bar covers an interval of a quantitative variable, so neighboring bars touch
density plot
a smoothed histogram that is useful for determining the shape of the data
boxplot
a graph of the five number summary
what is the five number summary?
minimum, Q1, median, Q3, and max
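The five number summary and the IQR can be sketched with the standard library. Note the assumption: `statistics.quantiles` uses its own default interpolation rule, which can differ from the median-of-halves quartiles on some data sets. The data are invented for illustration.

```python
from statistics import quantiles

data = [1, 3, 5, 7, 9, 11, 13]           # hypothetical observations

q1, q2, q3 = quantiles(data, n=4)        # three cut points that split the data into quarters

five_number_summary = {
    "min": min(data),
    "Q1": q1,
    "median": q2,
    "Q3": q3,
    "max": max(data),
}
iqr = q3 - q1                            # spread of the middle 50% of the data

print(five_number_summary)
print("IQR:", iqr)
```

These five values are exactly what a boxplot draws: the box runs from Q1 to Q3 with a line at the median.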
time plots
used to show changes over time
what are the components of SOCS
shape, outliers, center, and spread
when do we use SOCS?
when we are asked to describe the distribution of a quantitative variable
Modality
number of peaks
unimodal
1 peak
bimodal
2 peaks
multimodal
more than 2 peaks
left skewed
negatively skewed, left tail extends longer than the right tail
right skewed
positively skewed, right tail extends longer than the left tail
outliers
unusual values, separate from the rest of the data
when do we use the median to describe data?
when the data is skewed
when do we use the mean to describe data?
when the data isn’t skewed
center
for a symmetric distribution, report the mean (for a skewed distribution, report the median)
spread
report the standard deviation (for a skewed distribution, the IQR is the usual companion to the median)
do symmetric distributions have outliers?
not usually, although a symmetric distribution can still contain unusual values
do skewed distributions have outliers?
they can
bivariate data
data that has two variables
response variable
the outcome variable, measured so that comparisons can be made between groups
explanatory variable
explains the value of the response variable
contingency table
a frequency distribution for bivariate data
cell
each row+column combination
How to determine if there is an association between two categorical variables
compare the row percentages of the response within each category of the grouping variable. If the response percentages change across the groups, there is an association
what is the range of difference for determining if there is an association between two categorical variables?
a difference of 5-10% between the row percentages indicates an association
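The row-percentage comparison can be sketched on a small contingency table. The counts, group names, and the helper `row_percentages` are all hypothetical.

```python
# Hypothetical contingency table: rows are groups, columns are responses.
table = {
    "group_1": {"yes": 30, "no": 70},
    "group_2": {"yes": 45, "no": 55},
}

def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {resp: 100 * c / total for resp, c in counts.items()}
    return result

pct = row_percentages(table)

# Compare the same response across the groups.
diff_yes = abs(pct["group_1"]["yes"] - pct["group_2"]["yes"])
print(pct)
print("difference in 'yes' percentages:", diff_yes)
```

Here the "yes" percentage changes by 15 points between the groups, which is larger than the 5-10% guideline on the card.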
comparative bar chart
a bar chart that compares the conditional proportions of the response within each category of the grouping variable
positive association
as values of one variable increase, so do the values of the other
negative association
as the values of one variable increase, the values of the other decrease
no association
no relationship, neutral
correlation
a measure of the strength and direction of the linear relationship between two quantitative variables
magnitude
indicates the strength of the linear relationship
direction
the sign of the correlation coefficient, indicates the direction of the association
how do I know if the relationship is strong or weak from a numerical standpoint
the closer the correlation is to 1 or -1, the stronger the association; the closer it is to 0, the weaker
what to include in a description of the correlation coefficient
type of relationship (linear)
strength of association (weak, moderate, strong)
direction of association (positive or negative)
context (between the variables x and y)
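A minimal sketch of computing a correlation coefficient by hand. It assumes the coefficient meant here is the usual Pearson r, which the cards do not state explicitly; the helper name and the data are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])   # y rises with x along a straight line
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # y falls as x rises along a straight line

print(r_positive, r_negative)
```

The sign gives the direction and the distance from 0 gives the strength, so a description of the first result would read: a strong, positive, linear relationship between x and y.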
probability
chance of an event occurring
subjective probability
based on your own judgment of the likelihood. Ex: what are the chances I do my homework?
theoretical probability
based on a formula. Ex: when flipping a coin, there is a 50% chance of heads and a 50% chance of tails
experimental probability
based on the results of a random experiment. Ex: flipping a coin 10 times and using the number of heads obtained to estimate the probability of getting heads
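Experimental probability can be sketched with a simulated coin. The seed and the numbers of flips are arbitrary choices, and `estimate_heads` is a hypothetical helper name.

```python
import random

def estimate_heads(num_flips, rng):
    """Flip a simulated fair coin num_flips times and return the share of heads."""
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips

rng = random.Random(0)                         # fixed seed so the run is repeatable

estimate_small = estimate_heads(10, rng)       # like the 10-flip example on the card
estimate_large = estimate_heads(10_000, rng)   # many more flips

print("10 flips:    ", estimate_small)
print("10,000 flips:", estimate_large)
```

With only 10 flips the estimate can land far from the theoretical 50%; with many flips it tends to settle close to it.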
sample space
the collection of all possible outcomes of a random experiment
observation
the observed outcomes of a random experiment
event
a subset of the sample space, a collection of outcomes
complement
the subset of all outcomes within the sample space that are NOT in event A, “opposite”
intersection
the event containing all the elements that are common to both A and B, “overlap”
Union
The event containing all of the elements that belong to only A, only B, or both, “combination”
mutually exclusive
events that have no outcomes in common, cannot happen at the same time
Formula for complements
P(A^c)=1-P(A)
Formula for Unions
P(A or B)=P(A)+P(B)-P(A and B)
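The complement rule and the addition rule P(A or B) = P(A) + P(B) - P(A and B) can be checked on a small sample space. The example assumes one roll of a fair six-sided die with equally likely outcomes; the events A and B are made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed equally likely)
A = {2, 4, 6}                       # hypothetical event: roll is even
B = {4, 5, 6}                       # hypothetical event: roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

p_complement = prob(sample_space - A)   # complement: outcomes NOT in A
p_intersection = prob(A & B)            # intersection: in both A and B
p_union = prob(A | B)                   # union: in A, in B, or in both

# Complement rule: P(A^c) = 1 - P(A)
assert p_complement == 1 - prob(A)
# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert p_union == prob(A) + prob(B) - p_intersection

print(p_complement, p_intersection, p_union)
```

Subtracting P(A and B) removes the outcomes 4 and 6, which would otherwise be counted twice. For mutually exclusive events the intersection is empty, so that term is 0.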
Marginal
the probability of a single event on its own, such as P(A)