AP Stats Unit 2: Normal Distribution


Description and Tags

Might not correspond to Unit 2: Exploring Two-Variable Data.


15 Terms

1

percentile

% of values that are less than or equal to a given value; i.e. the cumulative relative frequency

  • say “at the Xth percentile”, not “in”: a percentile is a location, not a bucket

calculating percentile:

  • percentile = proportion from the left

  • = (# of points less than or equal to the value)/(total # of points) * 100 (to get a whole-number percent)

*for normal distributions, can use:

  • normalcdf(lower: -10^99, upper: value (or z-score of value), mean: mean (or 0), SD: SD (or 1))

  • = proportion * 100

  • = percentile
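A minimal Python sketch of both calculations, useful for checking work done by hand. The `scores` list and the N(70, 10) model are made-up illustration values, and `percentile_of` is a hypothetical helper name. `NormalDist.cdf` from the standard library stands in for normalcdf with lower = -10^99.

```python
from statistics import NormalDist

def percentile_of(value, data):
    """Percent of data points that are less than or equal to value."""
    at_or_below = sum(1 for x in data if x <= value)
    return at_or_below / len(data) * 100

# Hypothetical dataset, for illustration only.
scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]
print(percentile_of(70, scores))  # 5 of the 10 values are <= 70

# Normal model: area to the left of the value, times 100.
# N(70, 10) is an assumed distribution, not taken from the notes.
model = NormalDist(mu=70, sigma=10)
print(model.cdf(80) * 100)  # percentile of 80 under N(70, 10)
```

Here 80 is exactly 1 SD above the assumed mean, so its percentile comes out near 84.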

2

proportion

percentile but as a fraction

= percentile/100

= %/100%

3

cumulative relative frequency

adding up all the relative frequencies (%s) up to and including that data point

equal to the percentile (as a proportion from the left)

4

GRAPH: cumulative relative frequency

graphing cumulative relative frequency:

  • x-axis: data value

  • y-axis: cumulative proportion (on a 0 to 1.0 scale)

  • use straight lines between the points known/given since we don’t know the behavior of the data between the points

estimating using a graph:

  • guess using the graph, linearly, where the line’s y-value would line up for that x-value

important notes:

  • median: x-value with 0.5 y-value

  • Q1: x-value with 0.25 y-value

  • Q3: x-value with 0.75 y-value
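The straight-line estimate described on this card can be sketched in Python as linear interpolation between the known points. The `points` list is a hypothetical graph invented for illustration, and `value_at` is an assumed helper name.

```python
def value_at(proportion, points):
    """Estimate the x-value where the cumulative curve reaches `proportion`.

    points: (x, cumulative proportion) pairs, sorted by x.
    Uses a straight line between each pair of neighboring points.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= proportion <= y1 and y1 > y0:
            fraction = (proportion - y0) / (y1 - y0)
            return x0 + fraction * (x1 - x0)
    raise ValueError("proportion is outside the graphed range")

# Hypothetical graph points, for illustration only.
points = [(0, 0.0), (10, 0.2), (20, 0.6), (30, 0.9), (40, 1.0)]

q1 = value_at(0.25, points)      # x-value with 0.25 y-value
median = value_at(0.50, points)  # x-value with 0.5 y-value
q3 = value_at(0.75, points)      # x-value with 0.75 y-value
print(q1, median, q3)
```

With these made-up points the estimates land at about 11.25, 17.5, and 25.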

5

INTERPRETATION: cumulative relative frequency

PERCENTILE% of [specific subjects] have a [measured variable] equal to or less than [value], the [value] of the [subject that the percentile is of]

6

z-score

= (value - mean)/SD

“standardized score” that allows for comparison to:

  • a standardized curve if you know the distribution of the data (e.g. a normal distribution)

  • the dataset overall, to see how far above or below the mean a value is, measured relative to the variability (SD) of the original data
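A short Python sketch of the formula, showing how z-scores let you compare values from different scales. The test scores, means, and SDs below are invented for illustration.

```python
def z_score(value, mean, sd):
    """Standardized score: how many SDs the value sits from the mean."""
    return (value - mean) / sd

# Hypothetical scores on two tests with different scales.
math_z = z_score(82, mean=75, sd=5)   # 7 points above, SD of 5
essay_z = z_score(88, mean=85, sd=6)  # 3 points above, SD of 6

# The raw essay score is higher, but the math score is
# further above its own mean in SD units.
print(math_z, essay_z)
```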

7

INTERPRETATION: z-score

the [measured variable] of the [specific subject that has the z-score] is [|z-score|] standard deviations above (z > 0) or below (z < 0) the mean of [mean value & context]

8

effects of +- a constant to all values in a dataset

  • shape: no change

  • center: +- that constant

    • ex: mean = old mean +- constant

      • all of them got +- constant, so total we had a change of n*+-constant

      • finding the mean → must divide by n, so the change also gets divided:

        • change = n*+-constant

        • n*+-constant/n = +-constant → this is the final change

    • ex: median = old median +- constant

      • because every value, including the middle one, had the constant +-

  • variability: no change

9

effects of */ a constant to all values in a dataset

  • shape: no change

    • stretching in any way → does NOT change the shape itself, but only the scale of the shape (because the distribution of the points themselves doesn’t change)

  • center: */ that constant

    • ex. mean = old mean */ constant

      • old mean = (total dataset)/n

      • new dataset: constant*(total dataset)

      • new mean = constant*(total dataset)/n = constant*(old mean)

    • ex. median = old median */ constant

  • variability (SD): */ that constant (use the absolute value if the constant is negative, since SD can’t be negative)

    • because the distances between the points are */ the constant as well

    • variance: */ constant², i.e. “Since we multiplied the SD by x, variance was multiplied by x²”
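The shifting rules from the previous card and the scaling rules on this one can be checked numerically. This is a rough sketch on a made-up dataset, using population SD and variance (`pstdev`, `pvariance`) from Python's standard library.

```python
from statistics import mean, median, pstdev, pvariance

# Hypothetical dataset: mean 5, median 4.5, population SD 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]

shifted = [x + 10 for x in data]  # add a constant to every value
scaled = [x * 3 for x in data]    # multiply every value by a constant

for name, values in [("original", data), ("+10", shifted), ("*3", scaled)]:
    print(name, mean(values), median(values), pstdev(values), pvariance(values))
```

Adding 10 moves the mean and median up by 10 and leaves the SD at 2. Multiplying by 3 triples the mean, median, and SD, and multiplies the variance by 9.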

10

shape, mean, & SD of a z-score distribution

  • shape: same

  • mean: 0

  • standard deviation: 1

because:

  • z-score is a linear transformation on the values: (value - mean)/SD

  • -mean: mean becomes 0, SD stays the same

  • /SD: mean is 0/SD = 0, SD becomes SD/SD = 1

  • *throughout all changes, shape stays the same
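As a quick numerical check of this card, the sketch below standardizes a made-up dataset and confirms that the z-scores have mean 0 and SD 1. It uses the population SD (`pstdev`), which is an assumption about which SD is meant.

```python
from statistics import mean, pstdev

# Hypothetical dataset: mean 5, population SD 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)
sigma = pstdev(data)

# Subtract the mean (center moves to 0), then divide by the SD (spread becomes 1).
z_scores = [(x - mu) / sigma for x in data]

print(z_scores)
print(mean(z_scores), pstdev(z_scores))
```

The z-scores keep the same order and relative spacing as the original values, which is why the shape is unchanged.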

11

density curve

graphical representation of the probability distribution of a numerical variable (may be continuous and is smoothest that way, but could be discrete)

features:

  • total area = 1

    • other “area” rules apply (represents #/density of points in that area); think of it like a “smoothed out histogram”

  • *the area under the curve over an interval is the integral of the density function f(x) over that interval (which is how the calc would do it)

telling if skew:

  • skewed left: many low-probability points to the left (tail off to the left)

    • mean < median

  • skewed right: many low-probability points to the right (tail off to the right)

    • mean > median

  • symmetric: points are distributed about evenly around a mean in the center

    • mean ~= median

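median:

  • 50% of the area on one side

  • 50% of the area on the other side

  • (the mean is the balance point of the curve, which matches the median when the curve is symmetric)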

12

normal distribution

a common bell-shaped curve that shows up often in nature/the world; the density curve is perfectly symmetric, single-peaked & follows the empirical rule

  • N(mean, SD): graph of normal distribution with mean & SD

  • normally distributed → follows the empirical rule; data that closely follow the empirical rule are approximately normal

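checking whether something is approximately normal:

  • the real %s of data within 1, 2, and 3 SD’s of the mean are close to the empirical rule’s 68%, 95%, and 99.7%

  • use normalcdf for a given value and check whether the actual proportion of data at or below that value is similar to the calculated normal proportion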

13

empirical rule (68-95-99.7 rule)

about 68% of data is within 1 SD of the mean (in either direction); about 95% is within 2 SD of the mean; about 99.7% is within 3 SD of the mean

  • applies only for normal distributions

usage:

  • can be used to estimate %s without a calc/z-table, particularly if the values lie on perfect SD counts (or almost perfect; the further they get from these perfect SD counts, the worse the approx. becomes, bc we assume it’s linear but it isn’t)

  • can be used to “guess” about how many SD’s or %s a data point should be between (given one or the other)
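The 68-95-99.7 figures are rounded areas under the standard normal curve. A small Python sketch, using the standard library's `NormalDist`, recovers them:

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# Area between -k and +k standard deviations, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
```

The printed areas round to 68%, 95%, and 99.7%. Values that fall between whole SD counts need normalcdf or a z-table rather than the rule.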

14

Calculating Proportion from a Boundary Value (specific x) — Normalcdf

GRAPH & PROCESS:

  • must draw curve — either original or standardized ok

    • (standardized needs you to INPUT the z-score)

    • label: mean, value, N(mean, SD)

    • shade: area of interest

  • *CALCULATE THE Z-SCORE REGARDLESS

    • always need to do this, even if you don’t use it in the final calculation

  • *NEED LABELS on functions (lower, upper, mean, SD)

    • mean, SD can be the greek letters or just the words

original version:

  • curve with N(mean, SD) given and then:

    • label mean

    • label cutoff value (literally just the value you have)

    • shade area that you are calculating

  • label all parts of the calculation: upper = actual value, mean (μ) = actual mean, SD (σ) = actual SD

  • = proportion

  • → area (* 100% of proportion, round to tenth of a percent)

standardized version:

  • curve with N(0,1) and then:

  • label 0

  • label the cutoff value (as a z-score)

  • shade the area that you are calculating

  • normalcdf(lower: NUMBER (-1*10^99), upper: NUMBER, mean: 0, SD: 1)

  • = proportion

  • → area (* 100% of proportion, round to tenth of a percent)

CALCULATION:

  • x is known, proportion/percentile/area is not

  • normalcdf(lower, upper, mean, SD)
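For checking calculator answers, normalcdf can be imitated in Python. This is a sketch, not the calculator's own implementation: the function name and argument order mirror the card, the work is done by `NormalDist.cdf` from the standard library, and the N(100, 15) example is invented.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under N(mean, sd) between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Hypothetical example: proportion below 130 under N(100, 15).
z = (130 - 100) / 15  # calculate the z-score regardless

original = normalcdf(lower=-1e99, upper=130, mean=100, sd=15)
standardized = normalcdf(lower=-1e99, upper=z, mean=0, sd=1)

print(z, original, standardized)
print(f"area: {original * 100:.1f}%")
```

Both versions give the same proportion, which is why either curve is acceptable as long as the inputs are labeled.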

15

Calculating Boundary Value (specific x) from a Proportion — invNorm

GRAPH & PROCESS:

  • must draw curve — either original or standardized ok

    • label: mean, N(mean, SD)

    • *also label value (specific x), BUT label with an “x”

    • shade: area of interest

  • *CALCULATE THE Z-SCORE

    • *can be done in 2 ways:

    • 1. before you get the actual value, use the standardized method → automatically returns the z-score for you

      • invNorm(area: proportion, mean: 0, SD: 1) = z-score

    • 2. use the original method → manually calc z-score afterward

  • *NEED LABELS on functions (area, mean, SD)

    • mean, SD can be the greek letters or just the words

CALCULATION:

  • proportion/percentile/area is known, x is not

  • invNorm(area, mean, SD, area of interest (left/right/center))

  • notes on left/right/center:

    • left: aka percentile from left, proportion from left, or area from left

    • right: aka percentile from right, proportion from right, or area from right

    • center*: usually not used until we get to confidence intervals - helps us find the boundary values for a confidence interval of 95%, for example; would give us the values on the low and high end if we “fill in” 95% in the middle

      • equivalent to finding 2.5% on left & 2.5% on the right (or 97.5% on the left)
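The left/right/center options can be sketched in Python with `NormalDist.inv_cdf`, which always takes an area from the left. The `inv_norm` function and its `tail` parameter are hypothetical names chosen to mirror the card, not the calculator's own code, and N(100, 15) is an invented example.

```python
from statistics import NormalDist

def inv_norm(area, mean=0.0, sd=1.0, tail="left"):
    """Boundary value(s) for a given area under N(mean, sd)."""
    dist = NormalDist(mean, sd)
    if tail == "left":
        return dist.inv_cdf(area)
    if tail == "right":
        # Area on the right = 1 - area on the left.
        return dist.inv_cdf(1 - area)
    if tail == "center":
        # Split the leftover area evenly between the two tails.
        low = dist.inv_cdf((1 - area) / 2)
        high = dist.inv_cdf((1 + area) / 2)
        return low, high
    raise ValueError("tail must be 'left', 'right', or 'center'")

z = inv_norm(0.975)                      # standardized method returns the z-score
x = inv_norm(0.975, mean=100, sd=15)     # original method returns x directly
low, high = inv_norm(0.95, tail="center")

print(z, x, low, high)
```

The center call with 0.95 returns the same upper boundary as 97.5% from the left, matching the note above.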