categorical data
Data that consists of names, labels, or other nonnumerical values (averages are meaningless)
examples: grade, favorite season, birth month
What is statistics?
A way of reasoning, along with a collection of tools and methods to help us understand the world
It’s the science of collecting, organizing, analyzing and interpreting data.
quantitative data
numerical data (averages are meaningful)
examples: number of people, age, time, height
frequency table
a table listing each category and its frequency (count)
relative frequency
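the proportion or percentage of observations in a category (frequency / total); a relative frequency table lists category, frequency (count), and relative frequency (%)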
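A minimal sketch of building a frequency table with relative frequencies. The sample data and the names `seasons`, `counts`, and `table` are made up for illustration.

```python
from collections import Counter

# Hypothetical sample: favorite season reported by 10 students
seasons = ["summer", "fall", "summer", "winter", "spring",
           "summer", "fall", "spring", "summer", "fall"]

counts = Counter(seasons)      # frequency (count) of each category
total = sum(counts.values())   # total number of observations

# Relative frequency = frequency / total, stored here as a percentage
table = {cat: (freq, 100 * freq / total) for cat, freq in counts.items()}

for cat, (freq, pct) in table.items():
    print(f"{cat:<8} {freq:>3} {pct:5.1f}%")
```

The percentages in a relative frequency table should add up to 100% (allowing for rounding).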
Ways to represent categorical data
pie chart, bar chart, dot plot
Rules for bar chart
Label your axes: variable name on horizontal, frequency on vertical
Scale axes: start the vertical axis at 0 and go up in equal increments until you reach or exceed the maximum frequency
Draw bars: make them equal in width and leave gaps
Types of quantitative data
discrete (countable values with gaps between them, usually whole numbers) and continuous (any value in an interval, including decimals)
dot plot advantages and disadvantages (quantitative data)
advantages: shows every individual value, easy to see shape of distribution
disadvantages: impractical for large data sets
Ways to represent quantitative variables
dot plot, stem plot, histogram
stem plot advantages and disadvantages
advantages: shows every individual value in the data set, easy to see distribution shape
disadvantages: difficult to make for large data sets
histogram advantages and disadvantages
advantages: good for large data sets, easy to see distribution shape
disadvantages: doesn't show every individual value
cumulative frequency
the sum of the frequencies for that class and all previous classes
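A short sketch of the running total described above, assuming the classes are listed in increasing order. The class labels and counts are hypothetical.

```python
from itertools import accumulate

# Hypothetical class frequencies, with classes in increasing order
classes = ["0-9", "10-19", "20-29", "30-39"]
frequencies = [3, 7, 6, 4]

# Each entry is this class's frequency plus the frequencies of all previous classes
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(classes, frequencies, cumulative):
    print(f"{label:<6} {freq:>3} {cum:>3}")
```

The last cumulative frequency always equals the total number of observations.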
factors to comment on to describe a quantitative variable
shape, center, variability, unusual features
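IQR (interquartile range)
Q3 - Q1, where Q1 is the median of the lower half of the data (between the minimum and the median) and Q3 is the median of the upper half (between the median and the maximum)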
Formula for outliers
Lower fence: Q1 - 1.5 x IQR
Upper fence: Q3 + 1.5 x IQR
A value below the lower fence or above the upper fence is an outlier
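A sketch of the fence rule on a made-up data set. It assumes the "median of each half" convention for quartiles (leaving the median out of both halves when the count is odd); calculators and software may use slightly different quartile conventions. The helper name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values; the middle value is excluded when n is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # values below the median position
    upper = data[-half:]   # values above the median position
    return median(lower), median(upper)

data = [2, 4, 5, 7, 8, 9, 10, 12, 30]  # hypothetical data set

q1, q3 = quartiles(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is flagged as an outlier
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```

Here Q1 = 4.5 and Q3 = 11, so the IQR is 6.5, the fences are -5.25 and 20.75, and only 30 is flagged.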
Range
highest value - lowest value
Variance
standard deviation squared
skewed left
mean is usually less than the median (the long left tail pulls the mean down)
skewed right
mean is usually greater than the median (the long right tail pulls the mean up)
standard deviation
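the square root of the variance (smaller than the variance when the variance is greater than 1, equal to it when the variance is exactly 1, and larger when the variance is between 0 and 1)
if you add a constant to all values, the SD remains the same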
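A quick check of how SD and variance compare, using two made-up data sets. This sketch assumes the population formulas (`pstdev`, `pvariance`); the cards don't say whether the population or sample version is intended, and the relationship SD² = variance holds for either.

```python
from math import isclose
from statistics import pstdev, pvariance

wide = [2, 4, 6, 8, 10]        # hypothetical data with variance greater than 1
narrow = [0.1, 0.2, 0.3, 0.4]  # hypothetical data with variance less than 1

for name, data in (("wide", wide), ("narrow", narrow)):
    sd = pstdev(data)
    var = pvariance(data)
    # Variance is always the SD squared
    print(name, "SD:", round(sd, 4), "variance:", round(var, 4),
          "SD^2 == variance:", isclose(sd ** 2, var))
```

For `wide` the SD is smaller than the variance; for `narrow` the SD is larger, because taking the square root of a number between 0 and 1 makes it bigger.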
ways to describe distribution
shape, center, spread (variability), unusual features
shapes of graphs
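Uniform (roughly the same frequency across all values)
Symmetric (no skew; left and right sides are roughly mirror images)
Skewed right (most values are low, with a long tail toward the high values)
Skewed left (most values are high, with a long tail toward the low values)
Unimodal (one peak where the data rises to form a lump)
Bimodal (two peaks where the data rises to form lumps)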
center
median and mean
unusual features
outliers, gaps, clusters
Variability
IQR, range, standard deviation, variance
Box plot/five number summary
minimum, Q1, median, Q3, maximum
Histogram
a bar graph depicting a frequency distribution in which values are grouped into intervals (bins), so it suits larger data sets but does not show individual data points
z-score
how many standard deviations a value is from the mean: z = (value - mean) / SD
value > mean: z is positive
value < mean: z is negative
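A tiny sketch of the z-score calculation. The function name `z_score` and the numbers (mean 70, SD 10) are made up for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Hypothetical test scores with mean 70 and SD 10
above = z_score(85, mean=70, sd=10)   # value above the mean gives a positive z
below = z_score(60, mean=70, sd=10)   # value below the mean gives a negative z
print(above, below)
```

A score of 85 is 1.5 SDs above the mean (z = 1.5), and a score of 60 is 1 SD below it (z = -1).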
Percentile
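the percentage of values less than or equal to a given value
example: a value at the 99th percentile is at or above 99% of the data, so it is in the top 1%
formula: (number of values at or below the value) / (total number of values) x 100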
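A sketch of the "at or below" percentile calculation from this card. The helper name `percentile_rank` and the scores are hypothetical; other percentile definitions exist (for example, counting only values strictly below).

```python
def percentile_rank(data, value):
    """Percentage of values in data that are less than or equal to value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

# Hypothetical set of 10 test scores
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank = percentile_rank(scores, 90)
print(rank)
```

Eight of the ten scores are at or below 90, so 90 sits at the 80th percentile under this definition.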
68-95-99.7 rule
in a normal model, about 68% of values fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations of the mean, and about 99.7% fall within 3 standard deviations of the mean
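The rule can be checked numerically with Python's standard-library `statistics.NormalDist`. This is only a sanity check on a standard normal model (mean 0, SD 1); the helper name `within` is made up.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a normal model within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    # Prints roughly 68%, 95%, and 99.7%
    print(k, f"{100 * within(k):.1f}%")
```

The same proportions hold for any mean and SD, since the rule is stated in units of standard deviations.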
Z-Score Chart
use it to look up the percentage below a given z-score when the 68-95-99.7 rule doesn't apply (for example, z = 1.5)
working backwards
find the closest percentage in the z-score chart, read off its z-score, then solve z = (x - mean) / SD for x
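A sketch of both directions using `statistics.NormalDist` in place of a printed chart: `cdf` plays the role of looking up a z-score, and `inv_cdf` plays the role of working backwards. The mean of 100 and SD of 15 are made-up values.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal model: mean 0, SD 1

# Forward lookup: proportion of values below z = 1.5 (about 0.933)
below = std_normal.cdf(1.5)

# Working backwards: which value has 90% of the data below it?
mean, sd = 100, 15             # hypothetical model parameters
z = std_normal.inv_cdf(0.90)   # z-score with 90% below it (about 1.28)
x = mean + z * sd              # solve z = (x - mean) / sd for x

print(round(below, 4), round(z, 2), round(x, 1))
```

With a printed chart you would pick the entry closest to 0.90, so the z-score you read off may differ slightly in the second decimal place.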
Adding data
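If adding a constant amount to every value: range, SD, and variance stay the same, but the mean changes by that amount
If increasing every value by a percentage: multiply the mean, range, and SD by the factor (for example, 1.10 for a 10% increase); variance is still SD², so it is multiplied by the factor squared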
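A sketch that checks both cases on a made-up data set: adding 5 to every value, and increasing every value by 10%. It assumes the population formulas (`pstdev`, `pvariance`); the helper `data_range` is hypothetical.

```python
from math import isclose
from statistics import mean, pstdev, pvariance

def data_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

data = [10, 20, 30, 40, 50]         # hypothetical data set
shifted = [x + 5 for x in data]     # add a constant amount
scaled = [x * 1.10 for x in data]   # increase every value by 10%

# Adding a constant moves the mean but leaves the spread alone
print(mean(shifted) - mean(data), data_range(shifted), pstdev(shifted))

# Scaling multiplies mean, range, and SD by 1.10, and variance by 1.10 squared
print(mean(scaled) / mean(data),
      pstdev(scaled) / pstdev(data),
      pvariance(scaled) / pvariance(data))
```

The second line of output shows ratios of about 1.1, 1.1, and 1.21, matching the factor and the factor squared.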
Finding median and IQR with cumulative frequency chart
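Point where cumulative proportion is 0.5 = median
Q1 is the point where the cumulative proportion is 0.25 (between the lowest value and the median); Q3 is the point where it is 0.75 (between the median and the highest value) → find both, then use Q3 - Q1 to find the IQR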
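A sketch of reading the median and quartiles off cumulative proportions, using made-up values and counts. It takes the first value whose cumulative proportion reaches the target; reading a cumulative frequency graph by eye may involve interpolating between points instead. The helper name `value_at` is hypothetical.

```python
from itertools import accumulate

values = [1, 2, 3, 4, 5]        # hypothetical values, in increasing order
frequencies = [2, 5, 6, 5, 2]   # hypothetical count for each value

total = sum(frequencies)
cumulative_proportions = [c / total for c in accumulate(frequencies)]

def value_at(proportion):
    """First value whose cumulative proportion reaches the target proportion."""
    for value, cum in zip(values, cumulative_proportions):
        if cum >= proportion:
            return value
    return values[-1]

q1 = value_at(0.25)
med = value_at(0.5)
q3 = value_at(0.75)
iqr = q3 - q1
print(q1, med, q3, iqr)
```

Here the cumulative proportions are 0.10, 0.35, 0.65, 0.90, and 1.00, so Q1 = 2, the median is 3, Q3 = 4, and the IQR is 2.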