Intro & Quantitative Data - TYPES OF DATA
Quantitative: data in the form of numerical values
ex> height, weight
Qualitative: data in the form of words, characteristics, etc.
ex> fav color, birthday month
Intro & Quantitative Data - TYPES OF GRAPHS
For univariable (1 variable) data: bar graph, pie chart, histogram, line graph, stem + leaf plot, dot plot, box plot
For bivariable (studies the relationship b/w 2 variables) data: scatter plot
Intro & Quantitative Data - Distribution
→ set of data showing how frequently each outcome occurs among all possible outcomes
Measures of Central Tendency → where center of distribution of data lies
mean, median, mode
Measures of Spread → amount of variation in distribution
range, IQR, standard deviation
Shape of Distribution
Intro & Quantitative Data - Histogram
title
x-axis (+labels)
y-axis (+labels)
bars touch, measures a quantitative variable against frequency
Intro & Quantitative Data - Dot Plot
title
x-axis
dots above corresponding values to represent frequency
Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Wherever tail is…)
…pulls the mean up or down…
Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Skew Right)
Skew Right: most data on left
mean > med
high values have a big weight on mean
few data points to right pull mean up
tail w/ less data on right
Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Skew Left)
Skew Left: most data on right
mean < med
tail on left
few data points to left pull mean down
Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Symmetric)
mean = med
Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Unimodal)
“one mode”
One hump w/ highest frequency
Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Uniform)
frequencies are about the same
Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Bimodal)
(symmetric)
Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Multimodal)
Intro & Quantitative Data - SYMBOLS: Population Mean
μ (“mu”)
Intro & Quantitative Data - SYMBOLS: Sample Mean
x̄ (x-bar)
x → any variable
Intro & Quantitative Data - SYMBOLS: Population Standard Deviation
𝛔 (sigma)
Intro & Quantitative Data - SYMBOLS: Population Variance
𝛔² (sigma squared)
Intro & Quantitative Data - SYMBOLS: Sample Standard Deviation
s
Intro & Quantitative Data - SYMBOLS: Sample Variance
s²
Intro & Quantitative Data - MEASURES OF CENTRAL TENDENCY
Typically the mean best describes a distribution
When outliers or a large skew exist, the median is best
outliers and skewness affect the mean b/c the mean takes into account the weight of all values whereas the median does not
Mode is used for qualitative data (you can’t find mean/median w/o #’s)
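A minimal sketch of why the median is preferred when outliers exist. The data values are hypothetical, chosen only for illustration; one large value is appended and the mean moves far more than the median.

```python
from statistics import mean, median

# Hypothetical data set, chosen only for illustration.
data = [2, 3, 4, 5, 6]
with_outlier = data + [100]   # same data plus one extreme value

# The mean uses the size of every value, so the outlier pulls it up.
print(mean(data), median(data))
print(mean(with_outlier), median(with_outlier))
```

Here the mean jumps from 4 to 20 while the median only moves from 4 to 4.5.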
Intro & Quantitative Data - HISTOGRAM W/ CLASSES
To create classes → class width = Range / # of classes
(must be whole #, ALWAYS round up)
Classes: start at the minimum and add the class width to get each class
MP (midpoint): (lower limit of class + upper limit of class) / 2
x-axis
Frequency: find how many numbers are present in the distribution in classes
Should add up to sample size!
Relative Frequency: frequency/sample size
Y-AXIS
Cumulative Relative Frequency: add up relative frequencies
Always ends at 1!
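The class table above can be sketched in code. This is only an illustration under stated assumptions: the sample, the number of classes, and the choice that each class covers lower <= x < upper are all hypothetical and not from the notes.

```python
import math

# Hypothetical sample and class count, chosen only for illustration.
data = [5, 7, 8, 12, 13, 15, 18, 21, 22, 29]
num_classes = 5

# Class width = range / # of classes, rounded up to a whole number.
# Assumption: floor + 1 is used so that an already-whole quotient also goes up,
# which keeps the maximum inside the last class.
class_width = math.floor((max(data) - min(data)) / num_classes) + 1

rows = []
cumulative = 0.0
for i in range(num_classes):
    lower = min(data) + i * class_width
    upper = lower + class_width              # class covers lower <= x < upper
    midpoint = (lower + upper) / 2           # MP, used on the x-axis
    frequency = sum(lower <= x < upper for x in data)
    relative = frequency / len(data)         # relative frequency
    cumulative += relative                   # cumulative relative frequency
    rows.append((lower, upper, midpoint, frequency, relative, round(cumulative, 4)))

for row in rows:
    print(row)
```

The frequencies add up to the sample size (10) and the last cumulative relative frequency is 1, matching the two checks in the notes.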
Intro & Quantitative Data - MEASURES OF SPREAD
Range (max-min) = 29-5 = 24
*The range is 24 or the range is from 5 to 29
IQR: interquartile range (Q3 - Q1)
Standard deviation
Intro & Quantitative Data - BOX PLOTS
List numbers in order
Find MEDIAN
Median term # when listed in order
(n + 1) / 2
Find Q1
Median of the values between the minimum and the median
Find Q3
Median of the values between the median and the maximum
25% of the data is within each quartile
SIZE of quartile doesn’t matter (just indicates more or less spread)
FOR OUTLIERS…
Solve for outliers and plot them as separate points
Make the maximum/minimum value (end of the whisker) the next most extreme number that is not an outlier
Intro & Quantitative Data - 5 NUMBER SUMMARY
Minimum
Q1
Median
Q3
Maximum
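A minimal sketch of the 5 number summary using the median-of-halves method from the box plot steps. The function name is hypothetical, and it assumes the median itself is left out of both halves when the number of values is odd (textbooks and software differ on this).

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)                 # list numbers in order
    n = len(data)
    half = n // 2
    lower = data[:half]                   # values below the median position
    upper = data[half + n % 2:]           # values above the median position
    return min(data), median(lower), median(data), median(upper), max(data)

# Hypothetical data, chosen only for illustration.
print(five_number_summary([5, 7, 8, 12, 13, 15, 18, 21, 22, 29]))
```

For this data the summary is min 5, Q1 8, median 14, Q3 21, max 29.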
Intro & Quantitative Data - OGIVE: CUMULATIVE RELATIVE FREQUENCY GRAPH
Plot points as a line
x-axis: MP’s
y-axis: Cumulative Relative Frequency
*ogives are only interpreted to the left ("this or less")
*to go from cumulative relative frequency to a box plot, estimate the quartiles (0%, 25%, 50%, 75%, 100%)
0% → min
25% → Q1
50% → Q2
75% → Q3
100% → max
Intro & Quantitative Data - STANDARD DEVIATION
→ the average distance each value lies from the mean
Make a table with x, (x-x̄), & (x-x̄)²
List data points under x column
Do (x-x̄) under (x-x̄) column
Add up all the values
Do (x-x̄)² under (x-x̄)² column
Add up all the values = TOTAL VARIATION
𝛔² (population variance) = total variation / number of data values
𝛔 (population standard deviation) = √𝛔²
On average ____ stray ____ (𝛔) away from the mean.
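The table steps above can be sketched as follows. The data set is hypothetical and is treated as a whole population, so the total variation is divided by the number of values.

```python
import math

# Hypothetical population, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                 # center of the data
deviations = [x - mean for x in data]        # the (x - mean) column
squared = [d ** 2 for d in deviations]       # the (x - mean)^2 column

total_variation = sum(squared)               # sum of the squared column
variance = total_variation / len(data)       # population variance
std_dev = math.sqrt(variance)                # population standard deviation

print(mean, sum(deviations), total_variation, variance, std_dev)
```

For this data the mean is 5, the deviations sum to 0, the total variation is 32, the variance is 4, and the standard deviation is 2.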
Intro & Quantitative Data - FORMULAS FOR STANDARD DEVIATION
For population: 𝛔 = √( Σ(x - μ)² / N )
For sample: s = √( Σ(x - x̄)² / (n - 1) )
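A short sketch of the difference between the two formulas, using Python's standard statistics module: pstdev divides by the number of values, stdev divides by one less. The data set is hypothetical.

```python
import math
from statistics import pstdev, stdev

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = pstdev(data)   # treats data as the whole population (divide by N)
sample_sd = stdev(data)        # treats data as a sample (divide by n - 1)

print(population_sd, sample_sd)
```

The sample value comes out slightly larger because the same total variation (32) is divided by 7 instead of 8.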
Intro & Quantitative Data - CALCULATE OUTLIERS
Rule is outliers fall outside of interval
[Q1 - 1.5(IQR), Q3 + 1.5(IQR)]
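A minimal sketch of the outlier rule. The function name, the data, and the quartiles are hypothetical; the quartiles were found with the median-of-halves method.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) ends of the interval from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data and its quartiles, chosen only for illustration.
data = [5, 7, 8, 12, 13, 15, 18, 21, 22, 29, 60]
q1, q3 = 8, 22

low, high = outlier_fences(q1, q3)
outliers = [x for x in data if x < low or x > high]   # values outside the interval
print(low, high, outliers)
```

Here the interval is [-13, 43], so 60 is the only outlier.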
Intro & Quantitative Data - WRITE A FEW SENTENCES DESCRIBING THE DATA
center
spread
shape
unusual features (outliers, gaps, clusters)
MUST be in context ⭐
Describing Qualitative Data - BAR CHART
x-axis
y-axis: frequency
bars DO NOT touch
Describing Qualitative Data - PARETO CHART
x-axis
y-axis: Frequency
bars DO NOT touch
*Bars in descending order, highlights the mode
Describing Qualitative Data - PIE CHART
percentage = relative frequency
# of people = frequency
Describing Qualitative Data - SEGMENTED BAR GRAPH
Make a table with RELATIVE FREQUENCY & CUMULATIVE RELATIVE FREQUENCY
Add relative frequencies before value to get cumulative relative frequency
x-axis: One Bar
y-axis: Cumulative Relative Frequency
label segments of bar
*break messes with scale… can make relative frequency look smaller than it is
Describing Qualitative Data - CONTINGENCY TABLE
2 variables
….of the… = denominator
Comparing Distributions
Include a discussion of center, spread and shape using context and comparative statements. Include approximate values/ranges when possible.
Comparing Distributions - Comparative Statements
Comparative Statements: greater than, higher, less than, lower, equal, etc. (except shape)
Use “whereas” only for shape
List:
mean
standard deviation
sample size
minimum value
Q1
median
Q3
maximum value
outliers
Introduction to Normal Distributions - normal distribution
a bell-shaped frequency distribution curve. Most of the data values in a normal distribution tend to cluster around the mean.
→ the further away a data point is from the mean, the less likely it is to happen
Introduction to Normal Distributions - Characteristics
unimodal (one mode, one peak), symmetric (right side mirrors left side), asymptotic (approach, but never touch x-axis), mean = median = mode (center = peak → 50% data below mean, 50% data above mean)
Introduction to Normal Distributions - What does the NORMAL MODEL look like?
x-axis: mean @ center + standard deviations away
curve with asymptotic ends
Names for Normal Distributions
One of the most important examples of a continuous probability distribution is the normal distribution. The graph is usually called normal, bell-shaped or Gaussian curve.
Properties of Normal Distributions - Area Under the Curve
Total area under the curve is always equal to one.
The portion of the area under the curve above a given interval represents the probability that a measurement will lie in that interval.
area under curve = probability
Properties of Normal Distributions - EMPIRICAL RULE
The Empirical Rule applies to any normal distribution and says:
→ about 68% of data lies within 1 std. dev. of mean
→ about 95% of data lies within 2 std. dev. of mean
→ about 99.7% of data lies within 3 std. dev. of mean
The Empirical Rule can be used to find different percentiles.
*MAKE SURE TO INCLUDE "ABOUT" WHEN ANSWERING QUESTIONS
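A quick check of the 68 / 95 / 99.7 percentages, sketched with NormalDist from Python's standard statistics module. The helper name within is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, std. dev. 1

def within(k):
    """Area under the normal curve within k std. dev. of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```

The areas come out to roughly 0.6827, 0.9545, and 0.9973, which is why answers should say "about".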
Properties of Normal Distributions - Normal distributions vary from one another in two ways: the mean may be located anywhere on the x axis and the bell shape may be more or less spread according to the size of the standard deviation. It would be difficult to compute the area under the curve for each different combination… Z SCORES
→ a z-score tells you exactly how many std. dev. a data value is above or below the mean
Properties of Normal Distributions - Z SCORES: How does standardizing affect the center, spread and shape of the distribution?
When the data is converted to z scores the mean (center) becomes 0, the std. dev. becomes 1, shape remains the same
z = (x - μ) / 𝛔 → standardized test statistic = (statistic - parameter) / std. dev.
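A minimal sketch of standardizing, assuming a hypothetical data set treated as a population. Each value is converted with z = (x - μ) / 𝛔, then the mean and std. dev. of the z-scores are checked.

```python
from statistics import mean, pstdev

# Hypothetical population, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)
sigma = pstdev(data)
z_scores = [(x - mu) / sigma for x in data]   # std. dev. above or below the mean

print(z_scores)
print(mean(z_scores), pstdev(z_scores))
```

The z-scores have mean 0 and std. dev. 1, and their order and relative spacing match the original data, so the shape is unchanged.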
Properties of Normal Distributions - Z SCORES: We can use these z-scores to then…
… calculate probabilities using our z-score chart to determine the area under the curve that corresponds with each z-score.
Properties of Normal Distributions - Z SCORES: Find the specified areas!
4 decimal places b/c table uses 4 decimal places (for area under curve/probability)
Go to z-score table using probability notation → P(z < / > / ≤ / ≥ _)
For area to the left (less than), find the z-score & its corresponding table value.
For area to the right (greater than), subtract the corresponding table value from 1.
For between, (larger value in table) - (smaller value in table).
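The three table lookups can be sketched in code, assuming the z-score table gives the area to the left of each z-score. NormalDist().cdf from Python's standard statistics module plays the role of the table; the z-scores are hypothetical.

```python
from statistics import NormalDist

table = NormalDist().cdf   # area to the left of a z-score, like the z-score table

left = table(-1.25)                    # P(z < -1.25): read the value directly
right = 1 - table(1.25)                # P(z > 1.25): subtract the value from 1
between = table(1.25) - table(-0.5)    # P(-0.5 < z < 1.25): larger minus smaller

print(round(left, 4), round(right, 4), round(between, 4))
```

The left and right areas match here because the normal curve is symmetric.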
Properties of Normal Distributions - Z SCORES: < / ≤
< / ≤ are the same! (a single point has no area under the curve)
Properties of Normal Distributions - ACCURACY
ACCURACY: The normal model is not accurate if the value 3 std. dev. below the mean is negative when negative values are impossible.
Depends on the context of the problem…
A normal model must be able to go 3 std. dev. in both directions.
Rescaling Data - How does shift affect mean + std. dev.?
The mean increases or decreases by shift.
The std. dev. stays the same (not affected)
Rescaling Data - How does multiplier affect mean + std. dev.?
→ multiplying (scaling) data
Both mean + std. dev. get multiplied by the scalar value.
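A minimal sketch of how a shift and a multiplier each affect the mean and std. dev. The data, the shift of 10, and the multiplier of 3 are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

shifted = [x + 10 for x in data]   # shift: add the same amount to every value
scaled = [x * 3 for x in data]     # multiplier: multiply every value by the same amount

print(mean(data), pstdev(data))        # original
print(mean(shifted), pstdev(shifted))  # mean moves by the shift, std. dev. unchanged
print(mean(scaled), pstdev(scaled))    # both are multiplied
```

The original mean and std. dev. are 5 and 2; after the shift they are 15 and 2, and after the multiplier they are 15 and 6.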
Rescaling Data - Adding a new data value to a distribution that is the same as the mean will…
not change the mean as it is equal to the current mean
decrease the std. dev. because there is less variability since we have added another value at the center
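A short check of this card with a hypothetical data set: one new value equal to the current mean is appended, and the mean and std. dev. are compared before and after.

```python
from statistics import mean, pstdev

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
extended = data + [mean(data)]   # add one more value located exactly at the mean

print(mean(data), pstdev(data))
print(mean(extended), pstdev(extended))
```

The mean stays at 5, while the std. dev. drops from 2 to about 1.89 because the same total variation is spread over one more value.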
Rescaling Data - *when you convert units…
…nothing is actually changing (this applies to z-scores as well)
Rescaling Data - Turn rescaling into an algebraic expression!
*substitute values you are looking for into expression… need to take into account which values are affected by shifts/multipliers.
*data points/measures of center affected by both shifts + mult. while measures of spread are only affected by mult.
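One way to sketch the algebraic expression is new value = shift + multiplier × old value. The function name and the numbers are hypothetical (a Celsius to Fahrenheit style conversion); the absolute value is an added assumption so that a negative multiplier still gives a non-negative spread.

```python
def rescale_summary(mean, std_dev, shift, multiplier):
    """Apply new = shift + multiplier * old to a mean and a std. dev."""
    new_mean = shift + multiplier * mean      # center: affected by shift and multiplier
    new_std_dev = abs(multiplier) * std_dev   # spread: affected by the multiplier only
    return new_mean, new_std_dev

# Hypothetical summary values: mean 20 and std. dev. 5, rescaled with
# shift 32 and multiplier 1.8.
new_mean, new_std_dev = rescale_summary(20, 5, shift=32, multiplier=1.8)
print(new_mean, new_std_dev)
```

The mean becomes 68 and the std. dev. becomes 9; the shift of 32 never touches the spread.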