AP Statistics Chapters 1-3 Concepts


Description and Tags

Statistics

64 Terms

1

individuals

the objects described by a set of data; may be people, animals, or things

2

variables

any characteristic of an individual; can take different values for different individuals

3

categorical variable

a variable that places an individual into one of several groups or categories

4

quantitative variable

a variable that takes numerical values for which it makes sense to find an average

5

distribution

tells us what values the variable takes and how often it takes these values

6

frequency

the number of times a particular value for a variable has been observed

7

relative frequency

the ratio that compares the frequency of each category to the total frequency
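
A minimal sketch of this ratio in Python, assuming made-up counts from a hypothetical favorite-color survey (the data and names are illustrative only):

```python
# Hypothetical frequencies (counts) for each category.
counts = {"red": 12, "blue": 18, "green": 10}

total = sum(counts.values())  # total frequency

# Relative frequency = category frequency / total frequency.
relative = {category: n / total for category, n in counts.items()}

for category, share in relative.items():
    print(f"{category}: {share:.1%}")
```

The relative frequencies of all categories should add up to 1 (100%).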

8

pie graphs/charts

used only when you want to emphasize each category's relation to the whole

9

two-way table

a way to display the frequencies of two categorical variables; one variable is represented by rows, the other by columns

10

marginal distribution

in a two-way table of counts, the distribution of values of one of the categorical variables among all individuals described by the table

11

conditional distribution

describes the values of a variable among individuals who have a specific value of another variable; basically, looking for the values of this variable that satisfy a condition of the other variable
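
The difference between a marginal and a conditional distribution can be sketched in Python. The two-way table below is invented for illustration (grade level by preferred superpower); only the arithmetic is the point:

```python
# Hypothetical two-way table of counts: rows = grade level, columns = superpower.
table = {
    "junior": {"fly": 20, "invisible": 30},
    "senior": {"fly": 30, "invisible": 20},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of superpower: column totals divided by the grand total.
marginal_power = {}
for row in table.values():
    for power, n in row.items():
        marginal_power[power] = marginal_power.get(power, 0) + n / grand_total

# Conditional distribution of superpower among juniors only:
# each junior count divided by the junior row total.
junior_total = sum(table["junior"].values())
conditional_junior = {p: n / junior_total for p, n in table["junior"].items()}

print(marginal_power)
print(conditional_junior)
```

Here the marginal distribution uses everyone in the table, while the conditional distribution restricts attention to one row before computing percents.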

12

side-by-side bar graph

used to compare the distribution of a categorical variable in each of several groups; for each value of the categorical variable, there is a bar corresponding to each group. can be in counts or percents

13

segmented bar graph

displays the distribution of a categorical variable as segments of a rectangle, with the area of each segment proportional to the percent of individuals in the corresponding category

14

association between variables

if knowing the value of one variable helps predict the value of the other; if it doesn't then there is no association (the bar graphs would look the same)

15

dotplot

a graph w/ a horizontal axis and w/ dots above locations on the number line; displays quantitative variables . . . remember to label the graph

16

stemplot

used for fairly small data sets; show distribution by putting the final digit on the outside (leaves) and having the first digit(s) on the inside (stem) . . . remember to add a key . . . can also have a back-to-back stemplot

17

histogram

nearby values of quantitative data are grouped together . . . bars are side by side/connected . . . can show frequency counts or relative frequencies

18

"describe this distribution"

describe shape, outliers, center, spread, and include context

19

outliers and rule

any point that lies MORE than 1.5 IQRs above the third quartile or below the first quartile

value > Q3 + 1.5 × IQR = outlier . . . value < Q1 - 1.5 × IQR = outlier
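
A short Python sketch of the 1.5 × IQR rule. The data set and the quartile values passed in are hypothetical, and `outlier_fences` is an illustrative helper name, not a standard function:

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data with quartiles Q1 = 4.5 and Q3 = 9.5 (assumed as given).
data = [2, 4, 5, 6, 7, 8, 9, 10, 30]
low, high = outlier_fences(q1=4.5, q3=9.5)

# A value is flagged only if it falls strictly beyond a fence.
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)
```

With these numbers the IQR is 5, so the fences sit at -3 and 17 and only the value 30 is flagged.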

20

skewed left/right

a non-symmetrical distribution where one tail stretches out further (to the left/right) than the other . . . if the long tail is to the right, it's skewed-right, if the long tail is to the left, it's skewed-left

21

"compare these distributions"

describe the shape, outliers, spread, center, of each, but use comparative words/phrases and explain how they differ from each other . . . include CONTEXT

22

mean vs. median (when to use)

use the mean (and SD) when you have symmetric data with NO outliers or skewness . . . use the median (and IQR) when you have heavy skewness or outliers because the median is resistant

23

mean of population vs. sample

use x̄ (x-bar) when you are describing the mean of a sample . . . use μ (mu) when you are describing the mean of a population (whole thing)

24

standard deviation of population vs. sample

use s_x when you are describing the standard deviation of a sample . . . use σ (sigma) when you are describing the standard deviation of a population (whole thing)

25

quartiles

values that divide a data set into four equal parts . . . first (lower) quartile is @ 25th percentile and is the median of the values below the overall median . . . second quartile is @ 50th percentile and is the median . . . third (upper) quartile is @ 75th percentile and is the median of the values above the overall median . . . a fourth quartile is not used

26

IQR (interquartile range)

third quartile - first quartile; the range of the middle half/50% of the data

27

5 number summary

consists of the minimum, first quartile, median (second quartile), third quartile, and maximum
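
A minimal Python sketch of the 5 number summary, assuming the common classroom convention that each quartile is the median of one half of the ordered data (software and calculators sometimes use other conventions). The data set and the helper name `five_number_summary` are hypothetical:

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of each half."""
    xs = sorted(data)
    half = len(xs) // 2
    # When n is odd, the middle value belongs to neither half.
    lower, upper = xs[:half], xs[-half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical data set.
minimum, q1, med, q3, maximum = five_number_summary([1, 3, 4, 7, 8, 10, 12, 15])

iqr = q3 - q1                   # spread of the middle 50% of the data
data_range = maximum - minimum  # a single number, not an interval
print(minimum, q1, med, q3, maximum, iqr, data_range)
```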

28

range

maximum - minimum . . . is a single number . . . you cannot say the range is 100-300; say the range is 200, or that the data vary from 100 to 300

29

boxplot

a graph that does not display shape very well and does not display the number of observations but does display the 5 number summary in the form of a split box with two "whiskers" . . . also called a box-and-whisker plot

30

variance

the standard deviation squared . . . does not use the same units as the standard deviation and original data, so can only be used to prove something mathematically . . . s_x^2

31

standard deviation

roughly the typical (average) distance of the data values from the mean . . . ex: the football scores typically deviate/are off from the mean by about 3 points . . . lowest the SD can be is 0 (when all points are the same) . . . measured with same units as original data . . . is not resistant
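
A Python sketch of the sample standard deviation computed step by step and compared with the standard library's `statistics.stdev`. The scores are invented for illustration:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(scores)
sum_sq = sum((x - x_bar) ** 2 for x in scores)  # sum of squared deviations
variance = sum_sq / (len(scores) - 1)           # sample variance s_x^2 (divide by n - 1)
s_x = sqrt(variance)                            # sample standard deviation s_x

print(round(s_x, 3), round(stdev(scores), 3))

# When every value is identical there is no spread, so the SD is 0.
print(stdev([3, 3, 3]))
```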

32

resistant measure of center

the median is a resistant measure of center because it depends only on the middle value(s) of the ordered data, so extreme values barely affect it

33

mean/SD vs. median/IQR

the mean/SD are NOT resistant (because they use every data point) and will be affected by outliers and skewness, so they should only be used to describe a distribution when the data is roughly symmetric . . . the median/IQR ARE resistant (because they only use 1-2 points) and should be used when there is heavy skewness or outliers

34

percentile

the value with p percent of the observations less than or equal to it . . . expressed as a percentile . . . interpreted as: "the value of ___ is at the pth percentile. about p percent of the values are less than or equal to ___."
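
A small Python sketch of this definition, using the "less than or equal to" wording from the card. The scores and the helper name `percentile_of` are hypothetical:

```python
def percentile_of(value, data):
    """Percent of observations less than or equal to value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

# Hypothetical test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
p = percentile_of(85, scores)
print(p)  # 7 of the 10 scores are at or below 85
```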

35

z-score (standardized score)

a measure of how many standard deviations you are away from the mean (negative = below, positive = above) . . . calculated by (observation - mean)/(standard deviation)
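
The calculation can be sketched in Python. The mean of 80 and SD of 4 are made-up values for a hypothetical test:

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd

# Hypothetical test with mean 80 and standard deviation 4.
above = z_score(86, mean=80, sd=4)  # positive: above the mean
below = z_score(70, mean=80, sd=4)  # negative: below the mean
print(above, below)
```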

36

cumulative relative frequency graph

can be used to describe the position of an individual within a distribution or to locate a specified percentile of the distribution . . . uses percentiles on y-axis . . . the steeper areas mean more observations in that area, and vice versa for gradually growing areas

37

recentering vs. rescaling

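recentering is when you add/subtract a constant to every value, moving the distribution left or right on the x-axis . . . shifts center (mean, median) by that constant, NOT changing shape or spread (range, IQR, SD) . . . rescaling is when you multiply/divide every value by a positive constant, making the distribution more spread apart or closer together . . . multiplies/divides center (mean, median) and spread (range, IQR, SD) by that constant, NOT changing shape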
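
A quick Python check of what adding a constant and multiplying by a positive constant do to center and spread. The data values are hypothetical:

```python
from statistics import mean, median, stdev

data = [1, 2, 3, 4, 10]            # hypothetical values
shifted = [x + 5 for x in data]    # recentering: add 5 to every value
scaled = [x * 3 for x in data]     # rescaling: multiply every value by 3

for label, values in (("original", data), ("shifted +5", shifted), ("scaled x3", scaled)):
    print(label, mean(values), median(values), round(stdev(values), 3))
```

Adding 5 moves the mean and median up by 5 and leaves the SD alone; multiplying by 3 multiplies the mean, median, and SD by 3.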

38

density curve

a mathematical curve that is always on or above the horizontal axis, has an area of 1 underneath it, and describes the overall pattern of a distribution . . . outliers are NOT described by the curve

39

find mean/median in density curve

when the density curve is symmetric, the mean/median are the same and are in the middle . . . when the curve is skewed-right, the mean is pulled toward the long right tail, so it lies farther right than the median . . . the median is the equal-areas point (half the area on each side) while the mean is the "balance point" . . . vice versa for skewed-left distributions

40

Normal distributions of data

distributions that are roughly bell-shaped and follow somewhat closely the empirical (68-95-99.7) rule . . . can be modeled by a Normal curve/model

41

Normal curve/model

mathematical model that describes Normal distributions . . . they have the same overall pattern: symmetrical, single-peaked, bell-shaped . . . described by giving its mean and SD (larger SD means more flat)

42

68%-95%-99.7% (empirical) rule of thumb

in a Normal model, about 68% of data will be within 1 SD of the mean, about 95% within two SDs, and about 99.7% within three SDs
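
The three percentages can be checked numerically with Python's standard library class `statistics.NormalDist` (available in Python 3.8 and later). This is only a sanity check of the rule of thumb, not a derivation:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD: {area:.1%}")
```

Because any Normal model can be standardized, the same areas apply regardless of the mean and SD.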

43

standard Normal model

the Normal model w/ mean 0 and SD 1 . . . the completely standardized Normal distribution

44

Normal probability plot

a display to help assess whether a distribution of data is approximately normal; if it is nearly straight, the data satisfy the nearly normal condition . . . found by getting the percentiles of each observation, then the z-scores for every percentile, and plot the data x w/ expected z-scores on the y-axis

45

response variable

on the y-axis, measures an outcome of a study

46

explanatory variable

on the x-axis, may help explain or predict changes in a response variable

47

correlation (r)

measures the direction and strength of the LINEAR relationship between two QUANTITATIVE variables . . . a high correlation by itself does not show that the relationship is linear (always check the scatterplot) . . . -1 ≤ r ≤ 1, where 0 is no linear relationship and ±1 is a perfect linear relationship . . . has NO unit of measurement . . . does NOT imply causation . . . NOT resistant . . . when x and y are flipped, the correlation r stays the same
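
A Python sketch of r as the average product of z-scores (dividing by n - 1), which also illustrates two of the properties listed: swapping x and y, or changing units, leaves r unchanged. The data and the helper name `correlation` are illustrative assumptions:

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z-score of x) * (z-score of y), divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_flipped = correlation(ys, xs)                     # swap explanatory and response
r_rescaled = correlation([x * 100 for x in xs], ys) # change the units of x
print(round(r, 4), round(r_flipped, 4), round(r_rescaled, 4))
```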

48

regression line

a line that describes how a response variable y changes as an explanatory variable x changes . . . oftentimes, these lines are used to predict the value of y for a given value of x . . . ONLY used when one variable helps explain/predict the other . . . also known as line of best fit

49

regression line equation

ŷ = a + bx

  • ŷ (y hat) is the PREDICTED value of the response variable y for a given value of the explanatory variable x

  • b is the slope, the amount by which y is PREDICTED to change when x increases by one unit

  • a is the y-intercept, the PREDICTED value of y when x=0

50

extrapolation

the use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line; these predictions are often NOT accurate . . . sometimes the y-intercept is an extrapolation because x=0 wouldn't make sense or makes y negative

51

residuals

the difference between an observed value of the response variable and the value predicted by the regression line (vertical difference) = observed y - predicted y = y - ŷ

52

least squares regression line (LSRL)

the line of y on x that makes the sum of the squared residuals as small as possible . . . it's the residuals squared because if you didn't square them, when you added them together they would all cancel out . . . the mean of the least squares residuals is always 0

53

residual plot

a scatterplot of the residuals against the explanatory variable . . . helps to assess whether a linear model is appropriate . . . turns the regression line horizontal . . . if random scatter is on the plot, it is linear, if there is a pattern left over (such as a curve), it's not linear and the linear model is not appropriate

54

standard deviation of the residuals (s)

measures the typical/approximate size of the prediction errors (residuals) when using the regression line . . . is s . . . written in the original units of the response variable . . . interpreted as: "when using the LSRL w/ x=[explanatory] to PREDICT y=[response], the model will typically be off by about ____ units."
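
A Python sketch of computing the residuals and s, using the usual formula s = sqrt(sum of squared residuals / (n - 2)). The data are hypothetical, and the line ŷ = 2.2 + 0.6x is taken as the LSRL for that data:

```python
from math import sqrt

# Hypothetical data and its least squares line y-hat = 2.2 + 0.6x (assumed as given).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# Residual = observed y - predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

sse = sum(e ** 2 for e in residuals)  # sum of squared residuals
s = sqrt(sse / (len(xs) - 2))         # standard deviation of the residuals

print([round(e, 2) for e in residuals], round(s, 3))
```

The residuals from a least squares line sum to 0 (up to rounding), which is why they are squared before being added.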

55

coefficient of determination (r^2)

the PERCENTAGE of the variation in the values of y that is accounted for by the LSRL of y on x . . . no units . . . measured 0 (does not predict at all) ≤ r^2 ≤ 1 (perfect) . . . is the correlation squared . . . interpreted as: "___% of the variation in [response] is accounted for/explained by the linear model on [explanatory]."
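
One way to see the "percentage of variation" idea is to compare the squared error left over after using the line with the squared error from predicting ȳ every time. A Python sketch under the same kind of assumptions (hypothetical data, with ŷ = 2.2 + 0.6x taken as its LSRL):

```python
# Hypothetical data and its least squares line y-hat = 2.2 + 0.6x (assumed as given).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

y_bar = sum(ys) / len(ys)

# Variation left over after using the line (sum of squared residuals).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
# Total variation in y around its mean.
sst = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse / sst
print(round(r_squared, 3))
```

For this made-up data about 60% of the variation in y is accounted for by the line.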

56

describing slope of LSRL

"This model PREDICTS that for every 1 additional [explanatory], the [response] increases (or decreases, if the slope is negative) by ____."

57

describing y-intercept of LSRL

"This model PREDICTS that [explanatory] of 0 (context) would have a [response] of ____."

58

outlier in regression

a point that does not follow the GENERAL TREND shown in the rest of the data AND has a LARGE RESIDUAL when the LSRL is calculated

59

high-leverage point

a point in regression with a substantially larger or smaller x-value than the other observations

60

influential point

any point in regression that, if removed, changes the relationship substantially (a much different slope, y-intercept, correlation, or r^2) . . . oftentimes, outliers and high-leverage points are influential

61

writing LSRL equations

ŷ = a + bx

b = correlation * (standard deviation of y's / standard deviation of x's) . . . b = r(s_y/s_x)

a = mean of y values - slope * mean of x values . . . a = ȳ - bx̄

LSRL always passes through point (x̄,ȳ)
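
A Python sketch that builds the LSRL from summary statistics using b = r(s_y/s_x) and a = ȳ - bx̄, then checks that the line passes through (x̄, ȳ). The paired data are hypothetical:

```python
from statistics import mean, stdev

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)
n = len(xs)

# Correlation r from the sum of products of deviations.
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)

b = r * s_y / s_x      # slope
a = y_bar - b * x_bar  # y-intercept

print(f"y-hat = {a:.1f} + {b:.1f}x")
print(abs((a + b * x_bar) - y_bar) < 1e-9)  # line passes through (x-bar, y-bar)
```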

62

regression to the mean

in a LSRL, ŷ is going to be closer to ȳ than x is to x̄ (measured in standard deviations), except for when r = 1 or -1 . . . when x is 1(s_x) above x̄, ŷ is only r*(s_y) above ȳ

63

standardizing regressions

when x and y are both standardized (turned into z-scores), (x̄,ȳ) becomes (0,0), s_x = s_y = 1, and b = r (slope is equal to the correlation), because b = r (s_y)/(s_x) = r (1/1)

64

describing scatterplots

form (linear, non-linear (curved, etc.))

direction (positive, negative, none)

strength (strong, moderately-strong, moderate, moderately-weak, weak)

outliers (possible outliers, one @ (x,y), etc.)

context (Ex: actual and guessed ages . . .)
