AP Stats Unit 1: Displaying Data

Might not correspond to Unit 1: Exploring One-Variable Data.

41 Terms

1

individuals

the subjects (people, animals, or things) that the data describe (“who”); the data can be categorical or quantitative

2

variable

the thing being measured from the individuals (“what”)

3

categorical variable

variable that takes on values that are category names or group labels

  • bar graphs: can only show frequency or relative frequency (%)

  • pie charts: can only show relative frequency (%)

4

quantitative variable

takes on numerical values; for a measured/counted quantity

  • dot plots: displays numerical values on a number line with dots showing their frequencies

  • histogram: sorts numerical values into buckets and shows their frequencies; shows general shape of distribution

  • stem-and-leaf plots: shows first digit(s) as a “stem” on one side of a line, and then the final digit (or another digit that is specified in the key) as the “leaf”; retains all original data while showing distribution

  • box plots: for displaying 5-number summary & outliers on a number line

5

SECTION: CATEGORICAL VARIABLES

(if you’re shuffling this will make no sense)

6

association

when knowing the value of one variable helps predict the value of the other variable

  • basically, any difference in the probability of getting a certain value of one variable after knowing the other variable’s value

  • i.e. the %s (conditional relative frequencies) for one variable differ across the categories of the other variable

  • does NOT imply causation; only shows some kind of relationship within the dataset alone (no statistical significance required either)

interpreting segmented bar graphs in terms of association:

  • NO association: segmented bar graphs are the same (%s wise)

  • HAS association: segmented bar graphs are different (difference in %s)
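
The segmented-bar comparison above can be sketched in code. This is a minimal illustration, not an official method: the table `counts`, its category names, and the helper names are all made up for the example. Each row is turned into its own set of percentages (one segmented bar), and the bars are then compared.

```python
# Hypothetical two-way table of counts: rows = grade level, columns = preferred pet.
counts = {
    "freshman": {"dog": 30, "cat": 20},
    "senior": {"dog": 15, "cat": 35},
}

def conditional_distribution(row):
    """Turn one row of counts into proportions that sum to 1 (one segmented bar)."""
    total = sum(row.values())
    return {category: n / total for category, n in row.items()}

def same_distribution(a, b, tol=1e-9):
    """True if two bars have the same segment proportions (within rounding error)."""
    return all(abs(a[k] - b[k]) <= tol for k in a)

segments = {group: conditional_distribution(row) for group, row in counts.items()}
for group, dist in segments.items():
    print(group, dist)

# Association (within this dataset) = at least one bar differs from the first bar.
first = next(iter(segments.values()))
has_association = any(not same_distribution(first, dist) for dist in segments.values())
print("association in this dataset:", has_association)
```

Here the freshman bar is 60% dog / 40% cat and the senior bar is 30% dog / 70% cat, so the bars differ and the made-up data show an association.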

7

margin

edges of a two-way table (row totals and column totals); each total is the number of data points that fall in one category of one variable

8

grand total

the total within a table that represents all possible data points summed across the groups (marginal totals on either side should both sum to the same number, the grand total)

9

frequency

flat number of how many data points fit a certain criterion

10

relative frequency

a frequency scaled to a proportion (out of 1) or a percent (out of 100%)

= frequency/total

11

marginal relative frequency

(# within a margin)/(grand total) * 100%

= B/C

  • B = total for one specific category (a row or column total)

  • C = grand total

12

joint relative frequency

(# that satisfies BOTH one variable’s category AND the other’s at once)/(grand total) * 100%

= A/C

  • A = count in one cell of the table (one specific category of each variable)

  • C = grand total

13

conditional relative frequency

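(# in one cell of the table)/(marginal total for the given row or column) * 100%

= A/B

  • A = count that satisfies BOTH one variable’s category AND the other’s (one cell of the table)

  • B = total within one of those categories (marginal total), i.e. the group you are conditioning on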
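
To see how the A, B, and C pieces fit together, here is a small sketch with a made-up two-way table (the class years, answers, and counts are hypothetical). It computes one marginal (B/C), one joint (A/C), and one conditional (A/B) relative frequency.

```python
# Hypothetical counts: (class year, answer) -> number of students.
table = {
    ("junior", "yes"): 12, ("junior", "no"): 28,
    ("senior", "yes"): 24, ("senior", "no"): 36,
}

grand_total = sum(table.values())                                   # C
junior_total = sum(n for (year, _), n in table.items() if year == "junior")  # B
junior_yes = table[("junior", "yes")]                               # A

marginal = junior_total / grand_total     # B / C: share of everyone who is a junior
joint = junior_yes / grand_total          # A / C: share of everyone who is a junior AND said yes
conditional = junior_yes / junior_total   # A / B: share of juniors who said yes

print(f"marginal:    {marginal:.0%}")
print(f"joint:       {joint:.0%}")
print(f"conditional: {conditional:.0%}")
```

With these counts the grand total is 100 and there are 40 juniors, so the three results are 40%, 12%, and 30%.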

14

bar graph (*considerations)

graph that displays the frequencies or relative frequencies of different groups via bars whose vertical heights scale with those frequencies

*IMPORTANT CONSIDERATIONS:

  • vertical axis MUST start at 0, or else data can be misrepresented (can scale small changes too large, etc)

  • CANNOT use “images” or non-bar visuals for bar graphs, bc pictures scale in width (and area) as well as height → can be misleading, making larger values look too large and smaller ones too small

15

side-by-side bar graph

bar graph that displays one variable as “groups” of bars, and the other variable as bars within those groups of bars

16

segmented bar graph

bar graph that has major bars representing one variable, then splits those bars into smaller “segments” for the other variable

  • *usually (for our purposes) shows all of them as relative frequencies and each of the bars as 100%; this is to distinguish them from a mosaic plot

17

mosaic plot

segmented bar graph that scales the width of each bar in proportion to the number of individuals in that category of the x-axis variable

18

proportion

relative frequency written as a decimal (between 0 and 1)

19

percent

relative frequency as a percent (proportion * 100%)

20

distribution

if asked to “find the distribution”:

  • list all of the “relative frequencies” but as proportions across the whole population

  • all of these results should add up to 1
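
A quick sketch of “finding the distribution” for a categorical variable, assuming a made-up list of survey responses: count each category, divide by the total, and check that the proportions add up to 1.

```python
from collections import Counter

# Hypothetical responses to "what is your favorite color?"
responses = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]

counts = Counter(responses)          # frequency of each category
n = len(responses)                   # total number of responses
distribution = {category: k / n for category, k in counts.items()}

for category, proportion in distribution.items():
    print(category, proportion)
print("sum of proportions:", sum(distribution.values()))
```

For this list the proportions are 0.5 (red), 0.375 (blue), and 0.125 (green), which sum to 1.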

21

SECTION: QUANTITATIVE VARIABLES

(if you’re shuffling this will make no sense)

22

discrete (quantitative) variable

quantitative variable that has a “countable” number of possible values, with gaps between them (there are NOT infinitely many intermediate values between two neighboring values)

23

continuous (quantitative) variable

quantitative variable where all intermediate values are okay (can go up to any precision level)

24

dot plot

plot that displays each individual response as a dot or marker (of equal size) above the value on a number line

  • usually used to show quantitative data because it’s on a number line; could potentially show categorical

25

stem-and-leaf plot (stem plot)

shows first digit(s) as a “stem” on one side of a line, and then the final digit (or another digit that is specified in the key) as the “leaf”

  • REQUIRES A KEY: X | X = # (explaining how to interpret the formatting)

  • MUST LEAVE GAPS/blank spaces; cannot skip a stem just because no data values have that stem (write the stem and leave its leaves blank)

    • this keeps the shape of the distribution accurate

  • retains all original data while showing the distribution (this is the goal of a stem-and-leaf plot)

pros & cons:

  • pros: shows how data is spread out; allows us to visually see the shape of the distribution

  • cons: when too many numbers for one of the digits, hard to read (though this can be alleviated by splitting the stem)
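
The rules above (key required, empty stems kept) can be sketched in a few lines. This is only an illustration under simple assumptions: the function name `stem_plot` is made up, the data are hypothetical test scores, and it only handles non-negative whole numbers with the tens as the stem and the ones as the leaf.

```python
def stem_plot(values):
    """Build the text lines of a stem-and-leaf plot (stem = tens, leaf = ones)."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)

    lines = []
    # Loop over EVERY stem from smallest to largest so empty stems still get a row.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    lines.append("Key: 4 | 7 = 47")
    return lines

scores = [47, 52, 55, 58, 71, 73, 73, 79]   # hypothetical data, no values in the 60s
plot = stem_plot(scores)
print("\n".join(plot))
```

The stem 6 still appears as an empty row, which is what shows the gap in the distribution.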

26

back-to-back stem and leaf plot (back-to-back stem plot)

shows two distributions of data side by side on the same stem, with the leaves going out on either side

27

histogram

sorts numerical values into buckets and shows their frequencies; shows general shape of distribution, not individual responses

  • NO spaces between bars in the histogram

  • data on the dividing line of a bucket → goes into HIGHER BUCKET (bucket to the right)

  • if each bucket is a single discrete value (e.g. the histogram is basically a dot plot that shows how many chose 1, how many chose 2, etc), then write the numbers under the middle of the bars instead of at the dividing lines
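
The “dividing line goes into the higher bucket” rule can be sketched as follows. The function name `bucket_counts`, its parameters, and the heights are all assumptions made for the example; each bucket covers its left edge up to (but not including) its right edge.

```python
def bucket_counts(values, start, width, n_buckets):
    """Count how many values fall in each bucket [left edge, right edge)."""
    counts = [0] * n_buckets
    for v in values:
        # Floor division sends a value sitting exactly on a dividing line
        # into the bucket on its right (the higher bucket).
        index = int((v - start) // width)
        if 0 <= index < n_buckets:
            counts[index] += 1
    return counts

heights = [60, 62, 65, 65, 67, 70, 70, 74]   # hypothetical heights in inches
counts = bucket_counts(heights, start=60, width=5, n_buckets=3)
print(counts)   # buckets: 60 to <65, 65 to <70, 70 to <75
```

The two 65s land in the 65 to 70 bucket and the two 70s land in the 70 to 75 bucket, giving counts of 2, 3, and 3.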

28

METHOD: describing a distribution

SOCS/SOCV: shape, outliers, center, spread/variability

+CONTEXT: MUST STATE AT LEAST ONCE (describe what exactly the numbers are - the distribution of what? what’s being measured?)

  • shape: skew (if any), modality, where most values are (gaps or patterns if any)

    • skew:

      • skewed right: values or outliers trail off towards the right (the long tail is on the higher side; most values sit on the lower side)

      • skewed left: values or outliers trail off towards the left (the long tail is on the lower side; most values sit on the higher side)

      • symmetric: no/little skew to the left or right

    • modality:

      • unimodal: 1 major peak

      • bimodal: 2 major peaks

      • uniform: no peaks, almost all the same frequency across

    • gaps: places where there are no data points at all

      • *NOTE: if you note outliers, you don’t really have to say gaps as well bc outliers imply gaps. but you can. so this piece is kinda optional

  • outliers: data that is really off from the others

    • “there are possible outliers at…” → don’t need to show calculation

    • “there are outliers at…” → show calculation used to get the outlier status

      • IQR method: below Q1 - 1.5*IQR or above Q3 + 1.5*IQR

      • SD method: more than 2 SD from the mean (below mean - 2*SD or above mean + 2*SD)

  • center: either mean or median, depending on the shape and whether there are OUTLIERS:

    • mean: used if shape is SYMMETRIC and NO outliers

    • median: used if shape is SKEWED and/or HAS outliers; more resistant to skew and outliers

  • spread/variability: range, IQR, or SD

    • range = max - min (single number): always allowed to use

    • SD = √(Σ(x - xbar)²/(n - 1)): should mostly use if you gave the mean in the last step

    • IQR = Q3 - Q1 (single number): should mostly use if you gave the median in the last step

29

standard deviation

s (sample) = √(Σ(x - xbar)²/(n - 1))

a typical (NOT average, bc n-1, not n) difference from the mean

  • for a population, σ uses the same formula but with n in the denominator (not n-1) and the population mean μ in place of xbar
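
A worked sketch of the formula with a made-up dataset, computing s (divide by n - 1) and σ (divide by n) by hand and comparing against Python’s `statistics.stdev` and `statistics.pstdev`.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data

n = len(data)
xbar = sum(data) / n                                # mean = 5
squared_devs = sum((x - xbar) ** 2 for x in data)   # Σ(x - xbar)² = 32

s = math.sqrt(squared_devs / (n - 1))   # sample SD: divide by n - 1
sigma = math.sqrt(squared_devs / n)     # population SD: divide by n

print("sample SD s:", round(s, 3))
print("population SD σ:", sigma)
print("matches statistics module:",
      math.isclose(s, statistics.stdev(data)),
      math.isclose(sigma, statistics.pstdev(data)))
```

For this dataset σ comes out to exactly 2, while s is a bit larger (about 2.138) because of the smaller denominator.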

30

METHOD: describing standard deviation

“the [variable in context] typically varies by about [standard deviation value] from the mean of [mean value]”

31

variance

square of the SD

s² = sample variance

σ² = population variance

32

resistance to outliers?

  • mean: NOT resistant to outliers; gets pulled significantly toward an outlier at either end

  • standard deviation: NOT resistant to outliers; INCREASES significantly with outliers

  • median: in comparison, RESISTANT to outliers

  • IQR: in comparison, RESISTANT to outliers

so, for:

  • symmetric distributions: use mean, SD

  • skewed distributions/outliers: use median, IQR
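
A small demonstration of resistance, using a made-up dataset and one added extreme value: the mean and SD move a lot, while the median barely changes.

```python
import statistics

typical = [10, 12, 13, 14, 16]    # hypothetical data, roughly symmetric
with_outlier = typical + [95]     # same data plus one high outlier

for label, data in [("without outlier", typical), ("with outlier", with_outlier)]:
    print(label,
          "| mean:", round(statistics.mean(data), 2),
          "| median:", statistics.median(data),
          "| SD:", round(statistics.stdev(data), 2))
```

The mean jumps from 13 to about 26.67, but the median only moves from 13 to 13.5.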

33

median

middle value of a dataset (or average of two middle values)

easy calculation for the POSITION (not value) in the sorted list: (n+1)/2

  • even n: you get a position ending in .5, so average the two values on either side of that position

  • odd n: you get a whole number, so pick the value at that position

  • *this is calculating the POSITION of the median within the list
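
The position trick can be written out directly. This is a minimal sketch with a made-up function name and made-up data; positions are counted starting from 1, so the code subtracts 1 to match Python’s 0-based list indexes.

```python
def median_by_position(values):
    """Find the median using the (n + 1) / 2 position rule on the sorted list."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position in the sorted list
    if n % 2 == 1:
        return ordered[int(position) - 1]       # odd n: whole-number position
    below = ordered[int(position) - 1]          # even n: value just below position x.5
    above = ordered[int(position)]              # even n: value just above position x.5
    return (below + above) / 2

odd_example = median_by_position([3, 1, 4, 1, 5])       # sorted: 1 1 3 4 5, position 3
even_example = median_by_position([3, 1, 4, 1, 5, 9])   # sorted: 1 1 3 4 5 9, position 3.5
print(odd_example, even_example)
```

The odd-length list gives 3 (the 3rd value), and the even-length list gives 3.5 (the average of the 3rd and 4th values).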

34

Q1 & Q3 calculations

Q1: 25th percentile

Q3: 75th percentile

easy calculation:

  • split the data from the median into two sides depending on n:

  • even # of terms: split ALL data in half EVENLY, then get the medians of each side

    • (use the strategy for easy median calculation)

  • odd # of terms: DO NOT include median when splitting data in half, then get medians of each side

    • (use the strategy for easy median calculation)
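
A sketch of the splitting rule described above, with a hypothetical helper name and made-up data. It assumes at least two data values. Note that calculators and software sometimes use other quartile rules, so their answers may differ slightly from this method.

```python
import statistics

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]     # bottom half
    upper = ordered[-half:]    # top half; for odd n the middle value (median) is left out
    return statistics.median(lower), statistics.median(upper)

odd_q1, odd_q3 = quartiles([1, 3, 5, 7, 9, 11, 13])      # median 7 is excluded
even_q1, even_q3 = quartiles([1, 2, 3, 4, 5, 6, 7, 8])   # data split evenly
print(odd_q1, odd_q3)
print(even_q1, even_q3)
```

The odd-length list gives Q1 = 3 and Q3 = 11; the even-length list gives Q1 = 2.5 and Q3 = 6.5.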

35

five number summary

  • minimum: smallest value in the entire dataset

    • *can be an outlier

    • “0th percentile”

  • Q1: median of the lower half of the dataset

    • 25th percentile

    • first quartile

  • median: median of the entire dataset

    • 50th percentile

  • Q3: median of the upper half of the dataset

    • 75th percentile

    • third quartile

  • maximum: largest value in the dataset

    • *can be an outlier

    • 100th percentile

36

interquartile range (IQR)

IQR = Q3 - Q1 (a single value/number)

represents the range that the middle 50% of the data falls within

*MUST SHOW CALCULATION if you get the IQR as a question

37

outliers: 1.5 IQR method

works better with medians; can be used if NOT symmetric (technically works with symmetric, but you should use means/SD in that case)

  • low outlier < Q1 - 1.5*IQR

  • high outlier > Q3 + 1.5*IQR
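
The two fences can be checked with a short sketch. The function name and data are made up, and the quartiles are found by taking medians of the lower and upper halves (median excluded for odd n), which is an assumption about the quartile rule being used.

```python
import statistics

def iqr_outliers(values):
    """Return the values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]

data = [2, 10, 11, 12, 13, 14, 15, 16, 30]   # hypothetical data
outliers = iqr_outliers(data)
print(outliers)
```

Here Q1 = 10.5 and Q3 = 15.5, so IQR = 5 and the fences are 3 and 23; the values 2 and 30 fall outside them.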

38

outliers: SD method

works with means/if you have a symmetric plot ONLY

  • low outlier < mean - 2*SD

  • high outlier > mean + 2*SD
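
The same idea for the SD method, sketched with a made-up function name and made-up data. It uses the sample standard deviation from `statistics.stdev` and the 2*SD cutoffs given on this card.

```python
import statistics

def sd_outliers(values):
    """Return the values more than 2 sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    low_cutoff = mean - 2 * sd
    high_cutoff = mean + 2 * sd
    return [x for x in values if x < low_cutoff or x > high_cutoff]

data = [48, 49, 50, 50, 50, 51, 52, 50, 49, 51, 80]   # hypothetical data
flagged = sd_outliers(data)
print(flagged)
```

The mean is about 52.7 and the SD is about 9.1, so the cutoffs are roughly 34.5 and 70.9, and only 80 is flagged. Notice that the outlier itself inflates both the mean and the SD, which is part of why this method is meant for symmetric data.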

39

boxplot

shows five-number summary of a quantitative set of data on a number line; can show outliers

  • MUST be on a number line

drawing:

  1. draw a number line

  2. draw vertical lines above each of the numbers in the 5-number summary

  3. connect the 3 lines in the middle (Q1, median, Q3) to each other, making a box (the width of the box is the IQR)

  4. draw lines from the sides of the box out to the other two vertical lines, making “whiskers”

  5. note outliers with an ASTERISK (*)

    1. you have to change the whiskers so it DOESN’T go out to the outlier, but instead goes out to the next highest/lowest value that is NOT an outlier
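
The numbers needed for the drawing steps above can be computed with a sketch like this. The function name, dictionary keys, and data are made up, outliers are found with the 1.5 IQR rule, and quartiles are taken as medians of the lower and upper halves. The whiskers stop at the most extreme values that are not outliers.

```python
import statistics

def boxplot_parts(values):
    """Compute the box, whisker ends, and outliers for a boxplot."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "whisker_low": inside[0],      # lowest value that is NOT an outlier
        "q1": q1,
        "median": statistics.median(ordered),
        "q3": q3,
        "whisker_high": inside[-1],    # highest value that is NOT an outlier
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

parts = boxplot_parts([2, 10, 11, 12, 13, 14, 15, 16, 30])   # hypothetical data
print(parts)
```

For this data the box runs from 10.5 to 15.5 with the median at 13, the whiskers end at 10 and 16, and the values 2 and 30 would be marked with asterisks.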

40

comparing distributions

same as describing distributions, but need more context and COMPARATIVE LANGUAGE:

  • “___ is greater than ___” for each one

  • shape: can be compared (which one is more/less skewed, comparing if their skews are different)

  • outliers: simply state if they have outliers or not (though you can say if the high/low outliers are higher or lower)

  • center: state whether measures of center are higher or lower than each other

  • spread/variability: state whether variability/spread is higher or lower than each other

41

skew: in a boxplot

hard to tell for sure; these are general guidelines:

  • if the boxes/halves “look symmetric” ish: can assume that the distribution is roughly symmetric (probably)

  • if one half (min to median) looks more than 2x as long as the other half (median to max), or vice versa:

    • then, this is actually skewed (toward the longer half)

  • if the boxes/halves differ, but not as much as 2x, then slightly skewed
