Ch. 1-4 Stats

121 Terms

1
New cards

exponential model

y = ab^x (note that a and b here are just placeholders: they are not the intercept and slope of a regression line; a is the value of y when x = 0 and b is the common ratio)

- if there is a common ratio (or an approximately common ratio) for each equal time period, you have exponential growth/decay

- common ratio > 1: growth

- 0 < common ratio < 1: decay

- make sure to note that you can't use the word exponential unless it has been proven by the data

- we usually use it to study growth/decay over time

- plot x vs. log y

- LSRL: log ŷ = a + bx (this a and b are the intercept and slope of the line, not the a and b of the model)
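
A minimal sketch of the x vs. log y idea on this card, using made-up data with a common ratio of 2. The helper `lsrl` and the names `A` and `B` are illustrative, not part of the card, and base-10 logs are an assumption (any base works).

```python
import math

def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: y doubles every time period.
x = [0, 1, 2, 3, 4]
y = [3, 6, 12, 24, 48]

# Check for a (roughly) common ratio between consecutive y values.
ratios = [y[i + 1] / y[i] for i in range(len(y) - 1)]

# Fit the line log10(y-hat) = a_log + b_log * x.
log_y = [math.log10(v) for v in y]
a_log, b_log = lsrl(x, log_y)

# Back-transform to the exponential model y-hat = A * B**x.
A = 10 ** a_log
B = 10 ** b_log
```

Here `A` comes out near 3 and `B` near 2, matching the starting value and common ratio built into the data.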

2
New cards

Statistics

the science and art of collecting, analyzing, and drawing conclusions from data

3
New cards

Individuals

- an object described in a set of data -- can be people, animals, or things

- WHO/WHAT are we gathering information about?

4
New cards

Variables

- an attribute that can take different values for different individuals

- what do we want to know about these individuals?

5
New cards

Qualitative/Categorical Variables

- assigns labels that place each individual into a particular group called a category

- distinct groups/classifications; can be numerical values that make no sense to average (phone numbers)

6
New cards

Marginal relative frequency

- gives the percent or proportion of individuals that have a specific value for one categorical variable

7
New cards

Conditional relative frequency

- gives the percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable (the condition)
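
As a rough illustration of the marginal and conditional relative frequency cards, the sketch below uses a made-up two-way table; the categories and counts are hypothetical.

```python
# Hypothetical two-way table of counts: (grade, preferred pet) -> count
counts = {
    ("junior", "cats"): 10, ("junior", "dogs"): 30,
    ("senior", "cats"): 20, ("senior", "dogs"): 40,
}
total = sum(counts.values())

# Marginal relative frequency: share of ALL individuals who prefer cats.
cats_total = sum(c for (grade, pet), c in counts.items() if pet == "cats")
marginal_cats = cats_total / total                      # 30 / 100

# Conditional relative frequency: share who prefer cats AMONG juniors.
juniors_total = sum(c for (grade, pet), c in counts.items() if grade == "junior")
cond_cats_given_junior = counts[("junior", "cats")] / juniors_total  # 10 / 40
```

The only difference between the two is the denominator: everyone in the table for the marginal value, only the individuals meeting the condition for the conditional value.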

8
New cards

Simpson's paradox

- an association between two variables that holds for each value of a third variable can be changed or even reversed when the data for all values of the third variable are combined
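
One way to see the reversal is with made-up success counts for two methods, A and B, split by a third variable (case difficulty). All numbers are hypothetical and chosen only to produce the effect.

```python
# Hypothetical (successes, attempts) for each method within each difficulty level.
data = {
    "A": {"easy": (9, 10), "hard": (30, 90)},
    "B": {"easy": (80, 90), "hard": (2, 10)},
}

# Success rate within each value of the third variable.
within = {
    level: {m: data[m][level][0] / data[m][level][1] for m in data}
    for level in ("easy", "hard")
}

# Success rate after combining all values of the third variable.
combined = {
    m: sum(s for s, _ in data[m].values()) / sum(n for _, n in data[m].values())
    for m in data
}
```

Method A has the higher rate for easy cases and for hard cases, yet B has the higher combined rate, because A was used mostly on hard cases.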

9
New cards

Side-by-side bar graph

Displays the distribution of a categorical variable for each value of another categorical variable. The bars are grouped together based on the values of one of the categorical variables and placed side by side.

10
New cards

Segmented bar graph

displays the distribution of a categorical variable as segments of a rectangle, with the area of each segment proportional to the percent of individuals in the corresponding category

11
New cards

Mosaic plot

a modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category

12
New cards

Association

- if knowing the value of one variable helps us predict the value of the other, there is association

- if knowing the value of one variable does not help us predict the value of the other, there is no association

13
New cards

Dot Plot

shows each data value as a dot above its location on a number line

14
New cards

first quartile

the median of the data values that are to the left of the median in the ordered list

15
New cards

cumulative relative frequency graph (ogive)

plots a point corresponding to the percentile of a given value in a distribution of quantitative data. Consecutive points are then connected with line segments to form the graph

16
New cards

no association

If knowing the value of one variable does not help you predict the value of the other.

17
New cards

cluster sampling

selects a sample by randomly choosing clusters and including each member of the selected clusters in the sample
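
A small sketch of cluster sampling with a made-up population of homerooms. The names are hypothetical, and the seed is there only to make the sketch reproducible.

```python
import random

# Hypothetical population: 5 homerooms (clusters) of 4 students each.
clusters = {
    f"room{r}": [f"room{r}-student{s}" for s in range(1, 5)]
    for r in range(1, 6)
}

rng = random.Random(0)  # seeded only for reproducibility

# Randomly choose 2 clusters, then include EVERY member of those clusters.
chosen = rng.sample(sorted(clusters), k=2)
sample = [person for c in chosen for person in clusters[c]]
```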

18
New cards

experiment

deliberately imposes some treatment on individuals to measure their responses

19
New cards

random assignment

experimental units are assigned to treatments using a chance process

20
New cards

Quantitative Variables

- takes number values that are quantities -- counts or measurements

- makes sense to carry out arithmetic operations like adding and averaging

21
New cards

Discrete Variable

- a quantitative variable that takes a fixed set of possible values with gaps between them (shoe size)

22
New cards

Continuous Variable

- a quantitative variable that can take any value in an interval on the number line (GPA)

23
New cards

Distribution

tells us what values the variable takes and how often it takes these values

24
New cards

Bar Graph (Bar Chart)

- shows each category as a bar

- the heights of the bars show the category frequencies or relative frequencies

- 1 categorical variable

25
New cards

Two-way (contingency) tables

- table of counts that summarizes data on the relationship between two categorical variables for some group of individuals

26
New cards

Joint relative frequency

- gives the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable

27
New cards

Symmetric

- if the right side of the graph (containing the half of observations with the largest values) is approximately a mirror image of the left side

28
New cards

Skewed to the left

if the left side of the graph is much longer than the right side

29
New cards

Skewed to the right

if the right side of the graph is much longer than the left side

30
New cards

Stem plot

Shows each data value separated into two parts: a stem, which consists of all but the final digit, and a leaf, the final digit. The stems are ordered from lowest to highest and arranged in a vertical column. The leaves are arranged in increasing order out from the appropriate stems.

31
New cards

Histogram

Shows each interval of values as a bar. The heights of the bars show the frequencies or relative frequencies of values in each interval.

32
New cards

Mean

the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores

33
New cards

statistic

a number that describes some characteristic of a sample

34
New cards

parameter

a number that describes some characteristic of the population

35
New cards

resistant

not sensitive to extreme values

36
New cards

median

midpoint of a distribution, the number such that about half the observations are smaller and about half are larger

37
New cards

range

distance between the minimum value and the maximum value

38
New cards

variance

the average squared deviation from the mean, written s^2

39
New cards

standard deviation

- measures the typical distance of the values in a distribution from the mean

- found by averaging the squared deviations and then taking the square root

- square root of variance
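
The variance and standard deviation cards can be sketched as below. The data are made up, and dividing by n - 1 (the sample convention, which is also what Python's `statistics.variance` uses) is an assumption, since the cards only say "average".

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
n = len(data)
x_bar = sum(data) / n

# Squared deviations from the mean.
squared_devs = [(v - x_bar) ** 2 for v in data]

# Assumed sample convention: divide by n - 1 rather than n.
variance = sum(squared_devs) / (n - 1)
std_dev = math.sqrt(variance)

# Cross-check against the standard library.
same_as_library = math.isclose(variance, statistics.variance(data))
```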

40
New cards

quartiles

divide the ordered data set into four groups having roughly the same number of values

41
New cards

third quartile

the median of the data values that are to the right of the median in the ordered list

42
New cards

interquartile range

distance between the first and third quartiles of a distribution

43
New cards

outliers

individual values that fall outside the overall pattern of a distribution

44
New cards

five-number summary

The minimum, first quartile (Q1), median, third quartile (Q3), and the maximum

45
New cards

box plot

visual representation of five-number summary

46
New cards

modified box plot

A box plot that indicates which data values, if any, are outliers by representing them as dots separate from the box plot. The whisker(s) connect the box to the lowest and/or highest data values that are not outliers, instead of the minimum and/or maximum values.
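
A sketch tying together the quartile, five-number summary, IQR, and modified box plot cards. The data are made up. The 1.5 x IQR cutoff for flagging outliers is a common convention assumed here; it is not stated on the cards.

```python
from statistics import median

def five_number_summary(values):
    """Quartiles are medians of the values left/right of the overall median."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]            # values to the left of the median
    upper = data[n // 2 + n % 2 :]    # values to the right of the median
    return data[0], median(lower), median(data), median(upper), data[-1]

scores = [1, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
minimum, q1, med, q3, maximum = five_number_summary(scores)
iqr = q3 - q1

# Assumed convention: values beyond 1.5 x IQR from the quartiles are outliers.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in scores if v < low_fence or v > high_fence]

# Whiskers of a modified box plot end at the most extreme non-outliers.
inside = [v for v in scores if low_fence <= v <= high_fence]
whiskers = (min(inside), max(inside))
```

With these values the maximum (30) is flagged, so the upper whisker stops at 9 instead of the maximum.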

47
New cards

percentile

the pth percentile of a distribution is the value with p% of observations less than or equal to it

48
New cards

standardized (z-score)

tells us how many standard deviations from the mean the value falls, and in what direction
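
As a quick illustration, a z-score is (value - mean) / standard deviation. The function name and the numbers below are hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies from `mean` (sign gives direction)."""
    return (value - mean) / sd

# Hypothetical test scores with mean 80 and standard deviation 4.
z_high = z_score(86, 80, 4)   # above the mean, so positive
z_low = z_score(70, 80, 4)    # below the mean, so negative
```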

49
New cards

density curve

models the distribution of a quantitative variable with a curve that

- is always on/above the horizontal axis

- has area exactly 1 underneath it

50
New cards

mean of a density curve

the point at which the curve would balance if made of solid material

51
New cards

median of a density curve

the equal-areas point, the point that divides the area under the curve in half

52
New cards

normal curve

a symmetric, single-peaked, bell-shaped density curve

53
New cards

normal distribution

- specified by mean and standard deviation

- described by a symmetric, single-peaked, bell-shaped density curve

54
New cards

Empirical Rule (68-95-99.7)

In a normal distribution, about 68% of the observations are within one standard deviation of the mean, about 95% are within two standard deviations, and about 99.7% are within three standard deviations
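
The three percentages can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area under the curve within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
```

The three areas come out close to 0.68, 0.95, and 0.997, which is where the rule's name comes from.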

55
New cards

Standard normal distribution

the normal distribution with mean 0 and standard deviation of 1

56
New cards

assess for normality method 1

1) construct a dot plot/stem plot (time-consuming), box plot (avoid relying on it as support), or histogram (the default; if it is inconclusive, then check a box plot)

2) see if the graph is approximately symmetric and bell-shaped about the mean

3) mark off the points at x̄ ± s, x̄ ± 2s, and x̄ ± 3s, then compare the count of observations in each interval with the Empirical Rule
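
Step 3 above can be sketched as follows. The data set is made up and far too small for a real check; it is only meant to show the counting.

```python
from statistics import mean, stdev

data = [4, 5, 5, 6, 6, 6, 7, 7, 8]  # hypothetical values
x_bar, s = mean(data), stdev(data)

def share_within(k):
    """Proportion of observations within k standard deviations of the mean."""
    lo, hi = x_bar - k * s, x_bar + k * s
    return sum(lo <= v <= hi for v in data) / len(data)

# Compare these with roughly 0.68, 0.95, and 0.997 from the Empirical Rule.
shares = {k: share_within(k) for k in (1, 2, 3)}
```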

57
New cards

normal probability plot

A scatterplot of the ordered pair (data value, expected z-score) for each of the individuals in a quantitative data set. That is, the x-coordinate of each point is the actual data value and the y-coordinate is the expected z-score corresponding to the percentile of that data value in a standard Normal distribution.

58
New cards

assess for normality method 2

1. Construct a normal probability plot

2. Plotted points will lie close to a straight line if the distribution is close to a normal distribution

3. Outliers will appear as points that are far away from the overall pattern of the plot
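
A rough sketch of how the points of a normal probability plot could be computed. The data are made up, and placing the i-th smallest value at percentile (i - 0.5) / n is an assumed convention; software packages differ on this detail.

```python
from statistics import NormalDist

data = [12, 15, 9, 20, 14]  # hypothetical values
n = len(data)
ordered = sorted(data)

# Assumed convention: the i-th smallest value sits at percentile (i - 0.5) / n.
expected_z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each point is (data value, expected z-score); plot these and look for a line.
points = list(zip(ordered, expected_z))
```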

59
New cards

explanatory variables

may help explain or predict changes in a response variable

60
New cards

independent variables

explanatory variables

61
New cards

response variables

measures an outcome of a study

62
New cards

dependent variables

response variables

63
New cards

positive association

when the values of one variable tend to increase as the values of the other variable increase

64
New cards

negative association

when the values of one variable tend to decrease as the values of the other variable increase

65
New cards

correlation coefficient

- r

- measures the direction and strength of the linear association between two quantitative variables
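
The card does not give a formula; the sketch below assumes the usual textbook one, r = (1 / (n - 1)) × the sum of the products of the z-scores of x and y, applied to made-up data.

```python
from statistics import mean, stdev

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, s_x = mean(x), stdev(x)
y_bar, s_y = mean(y), stdev(y)

# Average (with n - 1) of the products of standardized x and y values.
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)
```

For this data r is about 0.77: positive direction, moderately strong.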

66
New cards

least squares regression line (LSRL)

line that models how a response variable y changes as an explanatory variable x changes

- ŷ = a + bx

- line that makes the sum of the squared residuals as small as possible
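
A minimal sketch of fitting the line and checking the "smallest sum of squared residuals" property on made-up data. The helper names `lsrl` and `sse` are illustrative; a residual here is the actual y minus the predicted y.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - b * x_bar, b

def sse(a, b, xs, ys):
    """Sum of squared residuals (actual y minus predicted y) for a given line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = lsrl(x, y)                 # about 2.2 and 0.6
best = sse(a, b, x, y)            # sum of squared residuals for the LSRL
nudged_slope = sse(a, b + 0.1, x, y)
nudged_intercept = sse(a + 0.1, b, x, y)
```

Nudging the slope or the intercept away from the fitted values makes the sum of squared residuals larger, which is what "least squares" means.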

67
New cards

extrapolation

Use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line.

68
New cards

residual

the difference between the actual value of y and the value of y predicted by the regression line

69
New cards

scatterplots

shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis

70
New cards

intercept

predicted value of y when x = 0

71
New cards

slope

the amount by which the predicted value of y changes when x increases by 1 unit

72
New cards

residual plot

a scatterplot that displays the residuals on the vertical axis and the explanatory variable on the horizontal axis

73
New cards

standard deviation of residuals

s measures the size of a typical residual

- measures the typical distance between the actual y values and the predicted y values

74
New cards

coefficient of determination

measures the percent reduction in the sum of squared residuals when using the LSRL to make predictions, rather than the mean value of y

- measures the percent of the variability in the response variable that is accounted for by the LSRL
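
The "percent reduction" wording can be made concrete with made-up data. The line ŷ = 2.2 + 0.6x used below is the least-squares line for this particular data set, and dividing by n - 2 for the standard deviation of the residuals is the usual convention, assumed here because the cards give no formula.

```python
import math

# Hypothetical paired data and its least-squares line y-hat = 2.2 + 0.6x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

y_bar = sum(y) / len(y)

# Squared prediction error using the LSRL versus using the mean of y.
sse_line = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
sse_mean = sum((yi - y_bar) ** 2 for yi in y)

# Coefficient of determination: fraction of squared error removed by the line.
r_squared = 1 - sse_line / sse_mean

# Standard deviation of the residuals (assumed divisor: n - 2).
s = math.sqrt(sse_line / (len(x) - 2))
```

Here `r_squared` is about 0.6, so the line accounts for about 60% of the variability in y, and `s` is about 0.89, the size of a typical residual.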

75
New cards

high leverage points in regression

have much larger/much smaller x values than the other points in the data set

76
New cards

outliers in regression

point that does not follow the pattern of the data and has a large residual

77
New cards

influential points in regression

any point that, if removed, substantially changes the slope, y-intercept, correlation, coefficient of determination, or standard deviation of the residuals

78
New cards

power model

y = ax^b

- if y is proportional to a power of x, we should use a power model

- plot log(x) vs. log(y)

- LSRL: log ŷ = a + b(log(x))
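
A sketch of the log(x) vs. log(y) approach on made-up data generated from y = 2x^3. The helper `lsrl` and the names `a_hat` and `b_hat` are illustrative, and base-10 logs are an assumption.

```python
import math

def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope

# Hypothetical data following y = 2 * x**3 exactly.
x = [1, 2, 3, 4, 5]
y = [2, 16, 54, 128, 250]

log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]
intercept, slope = lsrl(log_x, log_y)   # log10(y-hat) = intercept + slope * log10(x)

# Back-transform to y-hat = a_hat * x**b_hat.
a_hat = 10 ** intercept
b_hat = slope
```

The slope of the log-log line recovers the power (about 3), and 10 raised to the intercept recovers the multiplier (about 2).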

79
New cards

population

the entire group of individuals we want information about

80
New cards

census

collects data from every individual in the population

81
New cards

sample

a subset of individuals in the population from which we actually collect data

82
New cards

convenience sampling

selects individuals from the population who are easy to reach

83
New cards

bias

the design of a study shows bias if it is very likely to underestimate, or very likely to overestimate, the value you want to know

84
New cards

voluntary response sampling

allows people to choose to be in the sample by responding to a general invitation

85
New cards

voluntary response bias

- people who self-select to participate in such surveys are usually not representative of the population of interest

- attracts people who feel strongly about an issue, and who often share the same opinion

86
New cards

random sampling/random selection

involves using a chance process to determine which members of a population are included in the sample

87
New cards

simple random sample

chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample
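
One way to draw a simple random sample in code, assuming a made-up roster of 100 students. `random.sample` draws without replacement, and the seed is there only for reproducibility.

```python
import random

population = [f"student{i}" for i in range(1, 101)]  # hypothetical roster of 100

rng = random.Random(42)  # seeded only for reproducibility

# Every possible group of 10 students is equally likely to be chosen.
srs = rng.sample(population, k=10)
```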

88
New cards

strata

groups of individuals in a population who share characteristics thought to be associated with the variables being measured in a study

89
New cards

stratified random sampling

selects a sample by choosing an SRS from each stratum and combining the SRSs into one overall sample
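
A sketch of stratified random sampling with made-up strata (class years). Taking 5 from each stratum is an arbitrary choice for illustration.

```python
import random

# Hypothetical strata: students grouped by class year.
strata = {
    "freshman": [f"fr{i}" for i in range(40)],
    "sophomore": [f"so{i}" for i in range(30)],
    "junior": [f"jr{i}" for i in range(20)],
    "senior": [f"sr{i}" for i in range(10)],
}

rng = random.Random(1)  # seeded only for reproducibility

# Take an SRS of 5 from EACH stratum, then combine them into one sample.
per_stratum = {name: rng.sample(members, k=5) for name, members in strata.items()}
sample = [person for chosen in per_stratum.values() for person in chosen]
```

Unlike cluster sampling, every stratum contributes some individuals, but only a random few from each.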

90
New cards

cluster

group of individuals in the population that are located near each other

91
New cards

systematic random sampling

selects a sample from an ordered arrangement of the population by randomly selecting one of the first k individuals and choosing every kth individual thereafter
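
A sketch of systematic random sampling from a made-up ordered list of 100 IDs with k = 10.

```python
import random

population = list(range(1, 101))  # hypothetical ordered list of IDs 1..100
k = 10

rng = random.Random(7)  # seeded only for reproducibility

# Randomly pick one of the first k individuals, then take every kth after it.
start = rng.randrange(k)          # index 0..k-1
sample = population[start::k]
```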

92
New cards

multistage sampling

combines two or more sampling methods

93
New cards

undercoverage

occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample

94
New cards

nonresponse

occurs when an individual chosen for the sample can't be contacted or refuses to participate

95
New cards

wording of questions bias

occurs when confusing or leading questions influence the answers people give

96
New cards

response bias

occurs when there is a systematic pattern of inaccurate answers to a survey question

97
New cards

observational study

observes individuals and measures variables of interest but does not attempt to influence the responses

98
New cards

retrospective observational studies

observational study that examines existing data for a sample of individuals

99
New cards

prospective observational studies

observational studies that track individuals into the future

100
New cards

confounding

occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other