AP Statistics - Descriptive Statistics

53 Terms

1

Intro & Quantitative Data - TYPES OF DATA

  • Quantitative: data in the form of numerical values

    • ex> height, weight

  • Qualitative: data in the form of words, characteristics, etc.

    • ex> fav color, birthday month

2

Intro & Quantitative Data - TYPES OF GRAPHS

  • For univariate (1 variable) data: bar graph, pie chart, histogram, line graph, stem + leaf plot, dot plot, box plot

  • For bivariate (studies the relationship b/w 2 variables) data: scatter plot

3

Intro & Quantitative Data - Distribution

a description of how often each possible outcome occurs in a set of data

  • Measures of Central Tendency → where the center of the distribution lies

    • mean, median, mode

  • Measures of Spread → amount of variation in distribution

    • range, IQR, standard deviation

  • Shape of Distribution

4

Intro & Quantitative Data - Histogram

  • title

  • x-axis (+labels)

  • y-axis (+labels)

  • bars touch, measures a quantitative variable against frequency

5

Intro & Quantitative Data - Dot Plot

  • title

  • x-axis

  • dots above corresponding values to represent frequency

6

Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Wherever tail is…)

…pulls the mean up or down…

7

Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Skew Right)

  • Skew Right: most data on left

    • mean > med

    • high values have a big weight on mean

      • few data points to right pull mean up

    • tail w/ less data on right

8

Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Skew Left)

  • Skew Left: most data on right

    • mean < med

    • tail on left

    • few data points to left pull mean down

9

Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Symmetric)

  • mean = med

10

Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Unimodal)

  • “one mode”

    • One hump w/ highest frequency

11

Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Uniform)

  • frequencies are about the same

12

Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Bimodal)

  • “two modes”: two humps; may be symmetric

13

Intro & Quantitative Data - SHAPES OF DISTRIBUTIONS (Multimodal)

  • more than two modes (several humps)

14

Intro & Quantitative Data - SYMBOLS: Population Mean

  • μ (“mu”)

15

Intro & Quantitative Data - SYMBOLS: Sample Mean

  • x̄ (x-bar)

    • x → any variable

16

Intro & Quantitative Data - SYMBOLS: Population Standard Deviation

  • 𝛔 (sigma)

17

Intro & Quantitative Data - SYMBOLS: Population Variance

  • 𝛔² (sigma squared)

18

Intro & Quantitative Data - SYMBOLS: Sample Standard Deviation

  • s

19

Intro & Quantitative Data - SYMBOLS: Sample Variance

  • s²

20

Intro & Quantitative Data - MEASURES OF CENTRAL TENDENCY

  • Typically the mean best describes a distribution

  • When there are outliers or a strong skew, the median is best

    • outliers and skewness affect the mean b/c the mean takes into account the weight of all values whereas the median does not

  • Mode is used for qualitative data (you can’t find mean/median w/o #’s)

21

Intro & Quantitative Data - HISTOGRAM W/ CLASSES

  • To create classes → class width = Range / # of classes

    • (must be whole #, ALWAYS round up)

  • Classes: start at the minimum value and add the class width to get each next class

  • MP (midpoint): (lower limit of class + upper limit of class) / 2

    • x-axis

  • Frequency: find how many numbers are present in the distribution in classes

    • Should add up to sample size!

  • Relative Frequency: frequency/sample size

    • Y-AXIS

  • Cumulative Relative Frequency: add up relative frequencies

    • Always ends at 1!
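
A rough Python sketch of the class table above. Everything named here is made up for illustration: the data values, the `class_table` helper, and the choice of 4 classes. It also assumes that each class includes its lower limit but not its upper limit, and that "always round up" means going to the next whole number even when the division comes out whole.

```python
import math

def class_table(data, num_classes):
    """Build class limits, midpoints, and frequencies for a histogram."""
    n = len(data)
    low, high = min(data), max(data)
    # Class width = range / # of classes, rounded UP to a whole number.
    # Assumption: if the division is already whole we still go up by one,
    # so the maximum value always lands inside the last class.
    width = math.floor((high - low) / num_classes) + 1
    rows = []
    running_count = 0
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width  # upper limit is NOT included in the class
        freq = sum(1 for x in data if lower <= x < upper)
        running_count += freq
        rows.append({
            "class": (lower, upper),
            "midpoint": (lower + upper) / 2,
            "freq": freq,
            "rel_freq": freq / n,
            "cum_rel_freq": running_count / n,
        })
    return rows

# Hypothetical data with min 5 and max 29 (range 24).
data = [5, 7, 8, 12, 13, 15, 18, 21, 24, 29]
table = class_table(data, num_classes=4)
for row in table:
    print(row)
```

The frequencies add up to the sample size and the last cumulative relative frequency is 1, which are the two checks listed on the card.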

22

Intro & Quantitative Data - MEASURES OF SPREAD

  • Range (max-min) = 29-5 = 24

    • *The range is 24 or the range is from 5 to 29

  • IQR: interquartile range (Q3 - Q1)

  • Standard deviation

23

Intro & Quantitative Data - BOX PLOTS

  • List numbers in order

  • Find MEDIAN

    • Median term # when listed in order

      • (n + 1) / 2

  • Find Q1

    • Median between median and minimum value

  • Find Q3

    • Median between median and maximum value

  • 25% of the data is within each quartile

    • SIZE of quartile doesn’t matter (just indicates more or less spread)

  • FOR OUTLIERS…

    • Check for outliers using the 1.5(IQR) rule

    • Plot each outlier as its own point, and end the whisker at the most extreme value that is not an outlier

24

Intro & Quantitative Data - 5 NUMBER SUMMARY

  • Minimum

  • Q1

  • Median

  • Q3

  • Maximum
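
A small Python sketch of the 5 number summary, assuming the "median of each half" method from the box plot card (when n is odd, the median itself is left out of both halves). The data and function names are hypothetical, and calculators or software may use a slightly different quartile rule.

```python
def median(sorted_vals):
    """Median of a list that is already in order."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(data):
    vals = sorted(data)            # list numbers in order
    n = len(vals)
    lower_half = vals[: n // 2]    # values below the median position
    upper_half = vals[(n + 1) // 2:]  # values above the median position
    return (vals[0], median(lower_half), median(vals),
            median(upper_half), vals[-1])

# Hypothetical data values.
data = [5, 7, 8, 12, 13, 15, 18, 21, 24, 29]
summary = five_number_summary(data)
print(summary)  # (min, Q1, median, Q3, max)
```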

25

Intro & Quantitative Data - OGIVE: CUMULATIVE RELATIVE FREQUENCY GRAPH

  • Plot points as a line

  • x-axis: MP’s

  • y-axis: Cumulative Relative Frequency

  • *ogives are only interpreted to the left (“this or less”)

  • *to go from cumulative relative frequency to a box plot, estimate the quartiles (0%, 25%, 50%, 75%, 100%)

    • 0% → min

    • 25% → Q1

    • 50% → Q2

    • 75% → Q3

    • 100% → max

26

Intro & Quantitative Data - STANDARD DEVIATION

→ roughly the average distance each value lies from the mean

  • Make a table with x, (x-x̄), & (x-x̄)²

  • List data points under x column

  • Do (x-x̄) under (x-x̄) column

    • Add up all the values (should total 0, a quick check)

  • Do (x-x̄)² under (x-x̄)² column

    • Add up all the values = TOTAL VARIATION

  • 𝛔² (population variance) = total variation / number of values (N)

  • 𝛔 (population standard deviation) = √(𝛔²)

    • On average ____ stray ____ (𝛔) away from the mean.

27

Intro & Quantitative Data - FORMULAS FOR STANDARD DEVIATION

  • For population: 𝛔 = √( Σ(x - μ)² / N )

  • For sample: s = √( Σ(x - x̄)² / (n - 1) )
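
A short Python sketch of the standard deviation table, worked for both cases. The data values are hypothetical. The population version divides the total squared deviation by N, and the sample version divides by n - 1.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data values
n = len(data)
mean = sum(data) / n

# Table columns: x, (x - mean), (x - mean)^2
table = [(x, x - mean, (x - mean) ** 2) for x in data]
sum_of_deviations = sum(row[1] for row in table)   # quick check: should be 0
total_variation = sum(row[2] for row in table)     # sum of squared deviations

pop_variance = total_variation / n                 # divide by N
pop_sd = math.sqrt(pop_variance)

sample_variance = total_variation / (n - 1)        # divide by n - 1
sample_sd = math.sqrt(sample_variance)

print(mean, total_variation, pop_sd, sample_sd)
```

For this data the mean is 5 and the squared deviations add up to 32, so the population standard deviation is √(32/8) = 2 and the sample standard deviation is √(32/7), which is a little larger.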

28

Intro & Quantitative Data - CALCULATE OUTLIERS

  • Rule is outliers fall outside of interval

    • [Q1 - 1.5(IQR), Q3 + 1.5(IQR)]
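
A minimal Python sketch of the outlier rule. The data are hypothetical, and Q1 = 8 and Q3 = 24 are taken as given (they come from the median-of-each-half method on this list). The helper names are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return the interval [Q1 - 1.5(IQR), Q3 + 1.5(IQR)]."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def split_outliers(data, q1, q3):
    low_fence, high_fence = outlier_fences(q1, q3)
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return inside, outliers

# Hypothetical data; 60 sits far above the rest.
values = [5, 7, 8, 12, 13, 15, 18, 21, 24, 29, 60]
inside, outliers = split_outliers(values, q1=8, q3=24)

print(outlier_fences(8, 24))     # the two fences
print(outliers)                  # values outside the fences
print(min(inside), max(inside))  # where the box plot whiskers would end
```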

29

Intro & Quantitative Data - WRITE A FEW SENTENCES DESCRIBING THE DATA

  1. center

  2. spread

  3. shape

  4. unusual features (outliers, gaps, clusters)

  5. MUST be in context

30

Describing Qualitative Data - BAR CHART

  • x-axis

  • y-axis: frequency

  • bars DO NOT touch

31

Describing Qualitative Data - PARETO CHART

  • x-axis

  • y-axis: Frequency

  • bars DO NOT touch

  • *Bars in descending order, highlights the mode

32

Describing Qualitative Data - PIE CHART

  • percentage = relative frequency

  • # of people = frequency

33

Describing Qualitative Data - SEGMENTED BAR GRAPH

  • Make a table with RELATIVE FREQUENCY & CUMULATIVE RELATIVE FREQUENCY

    • Add relative frequencies before value to get cumulative relative frequency

  • x-axis: One Bar

  • y-axis: Cumulative Relative Frequency

  • label segments of bar

  • *break messes with scale… can make relative frequency look smaller than it is

34

Describing Qualitative Data - CONTINGENCY TABLE

  • 2 variables

  • “…of the…” = denominator (the group named after “of the” is the total you divide by)

35

Comparing Distributions

  • Include a discussion of center, spread and shape using context and comparative statements. Include approximate values/ranges when possible.

36

Comparing Distributions - Comparative Statements

  • Comparative Statements: greater than, higher, less than, lower, equal, etc. (except shape)

    • Use “whereas” only for shape

  • List:

    • mean

    • standard deviation

    • sample size

    • minimum value

    • Q1

    • median

    • Q3

    • maximum value

    • outliers

37

Introduction to Normal Distributions - normal distribution

  • a bell-shaped frequency distribution curve. Most of the data values in a normal distribution tend to cluster around the mean.

    • → the further away a data point is from the mean, the less likely it is to happen

38

Introduction to Normal Distributions - Characteristics

  • unimodal (one mode, one peak)

  • symmetric (right side mirrors left side)

  • asymptotic (tails approach, but never touch, the x-axis)

  • mean = median = mode (center = peak → 50% of data below mean, 50% of data above mean)

39

Introduction to Normal Distributions - What does the NORMAL MODEL look like?

  • x-axis: mean @ center + standard deviations away

  • curve with asymptotic ends

40

Names for Normal Distributions

  • One of the most important examples of a continuous probability distribution is the normal distribution. Its graph is usually called a normal, bell-shaped, or Gaussian curve.

41

Properties of Normal Distributions - Area Under the Curve

  • Total area under the curve is always equal to one.

  • The portion of the area under the curve above a given interval represents the probability that a measurement will lie in that interval.

    • area under curve = probability

42

Properties of Normal Distributions - EMPIRICAL RULE

  • The Empirical Rule applies to any normal distribution and says:

    • → about 68% of data lies within 1 std. dev. of mean

    • → about 95% of data lies within 2 std. dev. of mean

    • → about 99.7% of data lies within 3 std. dev. of mean

  • The Empirical Rule can be used to find different percentiles.

  • *MAKE SURE TO INCLUDE “ABOUT” WHEN ANSWERING QUESTIONS
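
A quick Python check of where the 68 / 95 / 99.7 figures come from. It uses the standard normal cumulative area written with `math.erf`; the helper names `normal_cdf` and `area_within` are made up for this sketch.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_within(k):
    """Area within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The exact areas are not round numbers, which is why answers that use the rule should say "about".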

43

Properties of Normal Distributions - Normal distributions vary from one another in two ways: the mean may be located anywhere on the x axis and the bell shape may be more or less spread according to the size of the standard deviation. It would be difficult to compute the area under the curve for each different combination… Z SCORES

  • → a z-score tells you exactly how many std. dev. a data value is above or below the mean

44

Properties of Normal Distributions - Z SCORES: How does standardizing affect the center, spread and shape of the distribution?

  • When the data is converted to z scores the mean (center) becomes 0, the std. dev. becomes 1, shape remains the same

  • z = (x - μ) / 𝛔 → standardized test statistic = (statistic - parameter) / std. dev.
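
A small Python sketch of standardizing, using hypothetical data treated as a whole population (so `statistics.pstdev` is the right standard deviation). It shows the z-scores ending up with mean 0 and std. dev. 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical population values
mu = statistics.mean(data)        # population mean
sigma = statistics.pstdev(data)   # population standard deviation

# z = (x - mu) / sigma for every data value
z_scores = [(x - mu) / sigma for x in data]

print(z_scores)
print(statistics.mean(z_scores), statistics.pstdev(z_scores))
```

The shape is unchanged because every value is shifted and scaled by the same amounts, so the values keep their order and relative spacing.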

45

Properties of Normal Distributions - Z SCORES: We can use these z-scores to then…

 calculate probabilities using our z-score chart to determine the area under the curve that corresponds with each z-score.

46

Properties of Normal Distributions - Z SCORES: Find the specified areas!

  • 4 decimal places b/c table uses 4 decimal places (for area under curve/probability)

    • Go to z-score table using probability notation → P(z < _), P(z > _), or P(_ < z < _)

      • For “less than” (area to the left), find the z-score & its corresponding table value.

      • For “greater than” (area to the right), subtract the corresponding table value from 1.

      • For between, (larger value in table) - (smaller value in table).
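
A Python sketch of the three table lookups, with the z-table replaced by a computed left-tail area. The z-scores used are arbitrary examples and `table_value` is a made-up name; it assumes the table gives the area to the left of z, rounded to 4 decimal places.

```python
import math

def table_value(z):
    """Area to the left of z, rounded to 4 places like a z-table entry."""
    return round(0.5 * (1 + math.erf(z / math.sqrt(2))), 4)

# Less than: read the table value directly.
p_less = table_value(-1.25)                   # P(z < -1.25)

# Greater than: subtract the table value from 1.
p_greater = 1 - table_value(1.25)             # P(z > 1.25)

# Between: larger table value minus smaller table value.
p_between = table_value(2) - table_value(-1)  # P(-1 < z < 2)

print(round(p_less, 4), round(p_greater, 4), round(p_between, 4))
```

The first two answers match because the curve is symmetric: the area left of -1.25 equals the area right of 1.25.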

47

Properties of Normal Distributions - Z SCORES: < / ≤

  • < / ≤ are the same! (the area at a single point is 0)

48

Properties of Normal Distributions - ACCURACY

  • ACCURACY: The normal model is not accurate if a value 3 std. dev. from the mean is impossible in context (ex> mean - 3 std. dev. is negative for something that can’t be negative).

    • Depends on the context of the problem…

    •  A normal model must be able to go 3 std. dev. in both directions.

49

Rescaling Data - How does shift affect mean + std. dev.?

  • The mean increases or decreases by shift.

  • The std. dev. stays the same (not affected)

50

Rescaling Data - How does multiplier affect mean + std. dev.?

  • → multiplying (scaling) data

  • Both mean + std. dev. get multiplied by the scalar value.

51

Rescaling Data - Adding a number to a distribution that is the same as the mean will…

  • not change the mean as it is equal to the current mean

  • decrease the std. dev. because there is less variability since we have added another value at the center
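
A quick Python check of this card on hypothetical data: one extra value equal to the current mean is appended, and the mean and std. dev. are compared before and after.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data values
mean_before = statistics.mean(data)
sd_before = statistics.pstdev(data)    # population std. dev.

# Add one more value that equals the current mean.
new_data = data + [mean_before]
mean_after = statistics.mean(new_data)
sd_after = statistics.pstdev(new_data)

print(mean_before, mean_after)   # unchanged
print(sd_before, sd_after)       # smaller after adding the value
```

The total squared deviation stays the same (the new value adds 0) but it is divided by a larger count, so the std. dev. goes down.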

52

Rescaling Data - *when you convert units…

nothing is actually changing, only the units (z-scores stay the same after a unit conversion)

53

Rescaling Data - Turn rescaling into an algebraic expression!

  • *substitute values you are looking for into expression… need to take into account which values are affected by shifts/multipliers.

  • *data points/measures of center affected by both shifts + mult. while measures of spread are only affected by mult.
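
A Python sketch of rescaling written as an algebraic expression, new value = multiplier × (old value) + shift. The Celsius to Fahrenheit conversion (multiplier 1.8, shift 32) and the temperatures are just an illustrative choice, and `rescale` is a made-up helper name.

```python
import statistics

def rescale(data, multiplier, shift):
    """Apply new = multiplier * old + shift to every value."""
    return [multiplier * x + shift for x in data]

celsius = [10, 20, 30]                    # hypothetical temperatures
fahrenheit = rescale(celsius, 1.8, 32)

mean_c, sd_c = statistics.mean(celsius), statistics.pstdev(celsius)
mean_f, sd_f = statistics.mean(fahrenheit), statistics.pstdev(fahrenheit)

# Center: affected by both the multiplier and the shift.
predicted_mean_f = 1.8 * mean_c + 32
# Spread: affected only by the multiplier (its absolute value).
predicted_sd_f = abs(1.8) * sd_c

# z-scores are the same in either unit.
z_c = [(x - mean_c) / sd_c for x in celsius]
z_f = [(x - mean_f) / sd_f for x in fahrenheit]

print(mean_f, predicted_mean_f)
print(sd_f, predicted_sd_f)
print(z_c, z_f)
```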