Ch. 1-4 Stats

121 Terms

1

exponential model

y = ab^x (note that a and b here are not the intercept and slope of the LSRL below; a is the value of y when x = 0 and b is the common ratio)

- if there is a common ratio (or approximately common) for each equal time period, you have exponential growth/decay

- common ratio > 1: growth

- 0 < common ratio < 1: decay

- make sure to note that you can't use the word exponential unless it has been proven by the data

- we usually use it to study growth/decay over time

- x vs log y

- LSRL: log y^ = a + bx
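The log transformation on this card can be sketched in Python. This is only an illustration: the data are made up, `fit_line` is a hypothetical helper, and base-10 logs are assumed.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data that follow y = 3 * 2^x exactly.
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

# A (roughly) constant ratio between consecutive y values suggests an exponential model.
ratios = [ys[i + 1] / ys[i] for i in range(len(ys) - 1)]

# Fit the LSRL to (x, log y), then undo the log to recover y = a * b^x.
log_ys = [math.log10(y) for y in ys]
intercept, slope = fit_line(xs, log_ys)
a_hat = 10 ** intercept  # estimate of a
b_hat = 10 ** slope      # estimate of b, the common ratio
```

With real data the ratios will only be approximately equal, so the recovered a and b are estimates.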

2

Statistics

the science and art of collecting, analyzing, and drawing conclusions from data

3

Individuals

- an object described in a set of data -- can be people, animals, or things

- WHO/WHAT are we gathering information about?

4

Variables

- an attribute that can take different values for different individuals

- what do we want to know about these individuals?

5

Qualitative/Categorical Variables

- assigns labels that place each individual into a particular group called a category

- distinct groups/classifications; can be numerical values that make no sense to average (phone numbers)

6

Marginal relative frequency

- gives the percent or proportion of individuals that have a specific value for one categorical variable

7

Conditional relative frequency

- gives the percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable (the condition)

8

Simpson's paradox

- an association between two variables that holds for each value of a third variable can be changed or even reversed when the data for all values of the third variable are combined
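A small numeric illustration of the reversal, as a sketch. The counts are hypothetical and chosen only so that the paradox shows up; the names are illustrative.

```python
# Hypothetical (successes, trials) counts for two treatments within two groups.
counts = {
    "group 1": {"A": (8, 10), "B": (70, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    return successes / trials

# Within each group, treatment A has the higher success rate.
a_wins_each_group = all(
    rate(*group["A"]) > rate(*group["B"]) for group in counts.values()
)

def combined_rate(treatment):
    """Success rate for one treatment after pooling both groups."""
    successes = sum(group[treatment][0] for group in counts.values())
    trials = sum(group[treatment][1] for group in counts.values())
    return successes / trials

combined_a = combined_rate("A")  # 38 / 110
combined_b = combined_rate("B")  # 72 / 110

# After combining the groups, treatment B looks better: the association reverses.
reversed_when_combined = combined_b > combined_a
```

The reversal happens here because most of A's trials fall in the group with low success rates, while most of B's fall in the group with high success rates.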

9

Side-by-side bar graph

Displays the distribution of a categorical variable for each value of another categorical variable. The bars are grouped together based on the values of one of the categorical variables and placed side by side.

10

Segmented bar graph

displays the distribution of a categorical variable as segments of a rectangle, with the area of each segment proportional to the percent of individuals in the corresponding category

11

Mosaic plot

a modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category

12

Association

- if knowing the value of one variable helps us predict the value of the other, there is association

- if knowing the value of one variable does not help us predict the value of the other, there is no association

13

Dot Plot

shows each data value as a dot above its location on a number line

14

first quartile

the median of the data values that are to the left of the median in the ordered list

15

cumulative relative frequency graph (ogive)

plots a point corresponding to the percentile of a given value in a distribution of quantitative data. consecutive points are then connected with a line segment to form the graph

16

no association

If knowing the value of one variable does not help you predict the value of the other.

17

cluster sampling

selects a sample by randomly choosing clusters and including each member of the selected clusters in the sample
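One way to sketch cluster sampling in Python. The classroom data and the function name `cluster_sample` are hypothetical; the seed is only there to make the run repeatable.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

# Hypothetical clusters: classrooms and the students in them.
clusters = {
    "room 101": ["Ana", "Ben", "Cy"],
    "room 102": ["Dee", "Eli"],
    "room 103": ["Fay", "Gus", "Hal", "Ivy"],
    "room 104": ["Jo", "Kai"],
}

chosen, sample = cluster_sample(clusters, n_clusters=2, seed=1)
```

The chance process picks the clusters, not the individuals, so the sample size depends on which clusters are drawn.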

18

experiment

deliberately imposes some treatment on individuals to measure their responses

19

random assignment

experimental units are assigned to treatments using a chance process
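A minimal sketch of random assignment, assuming equal-sized groups are wanted. The unit labels, treatment names, and the helper `randomly_assign` are all hypothetical.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Shuffle the units by chance, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

units = [f"subject {i}" for i in range(1, 9)]
groups = randomly_assign(units, ["treatment", "control"], seed=42)
```

Shuffling first and then dealing keeps the group sizes balanced while still letting chance decide who gets which treatment.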

20

Quantitative Variables

- takes number values that are quantities -- counts or measurements

- makes sense to carry out arithmetic operations like adding and averaging

21

Discrete Variable

- a quantitative variable that takes a fixed set of possible values with gaps between them (shoe size)

22

Continuous Variable

- a quantitative variable that can take any value in an interval on the number line (GPA)

23

Distribution

tells us what values the variable takes and how often it takes these values

24

Bar Graph (Bar Chart)

- shows each category as a bar

- the heights of the bars show the category frequencies or relative frequencies

- 1 categorical variable

25

Two-way (contingency) tables

- table of counts that summarizes data on the relationship between two categorical variables for some group of individuals

26

Joint relative frequency

- gives the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable
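To see how joint, marginal, and conditional relative frequencies differ, here is a short sketch on a hypothetical two-way table. The counts and category labels are invented for illustration.

```python
# Hypothetical two-way table: rows are one categorical variable, columns the other.
table = {
    "female": {"yes": 30, "no": 20},
    "male": {"yes": 10, "no": 40},
}

total = sum(sum(row.values()) for row in table.values())  # 100 individuals

# Marginal: one variable only ("yes"), out of everyone.
marginal_yes = sum(row["yes"] for row in table.values()) / total

# Joint: a specific value of BOTH variables ("female" and "yes"), out of everyone.
joint_female_yes = table["female"]["yes"] / total

# Conditional: "yes" among only those who satisfy the condition "female".
conditional_yes_given_female = table["female"]["yes"] / sum(table["female"].values())
```

The only thing that changes between the three is the denominator and which cells are counted in the numerator.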

27

Symmetric

- if the right side of the graph (containing the half of observations with the largest values) is approximately a mirror image of the left side

28

Skewed to the left

if the left side of the graph is much longer than the right side

29

Skewed to the right

if the right side of the graph is much longer than the left side

30

Stem plot

Shows each data value separated into two parts: a stem, which consists of all but the final digit, and a leaf, the final digit. The stems are ordered from lowest to highest and arranged in a vertical column. The leaves are arranged in increasing order out from the appropriate stems.

31

Histogram

Shows each interval of values as a bar. The heights of the bars show the frequencies or relative frequencies of values in each interval.

32

Mean

the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores

33

statistic

a number that describes some characteristic of a sample

34

parameter

a number that describes some characteristic of the population

35

resistant

not sensitive to extreme values

36

median

midpoint of a distribution, the number such that about half the observations are smaller and about half are larger

37

range

distance between the minimum value and the maximum value

38

variance

roughly the average squared deviation from the mean, written s^2

39

standard deviation

- measures the typical distance of the values in a distribution from the mean

- found by averaging the squared deviations and then taking the square root

- square root of variance
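The calculation can be sketched as follows, assuming the usual sample convention of dividing by n - 1 when "averaging" the squared deviations. The data and function names are made up.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32
s_squared = sample_variance(data)  # 32 / 7
s = sample_sd(data)
```

Python's built-in `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.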

40

quartiles

divide the ordered data set into four groups having roughly the same number of values

41

third quartile

the median of the data values that are to the right of the median in the ordered list

42

interquartile range

distance between the first and third quartiles of a distribution

43

outliers

individual values that fall outside the overall pattern of a distribution

44

five-number summary

The minimum, first quartile (Q1), median, third quartile (Q3), and the maximum
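A sketch of the five-number summary using the quartile definitions from these cards (Q1 and Q3 are the medians of the values to the left and right of the median). The data and the function name are hypothetical; other software may use slightly different quartile rules.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                # values left of the median
    upper = data[half + 1:] if n % 2 else data[half:]  # values right of the median
    return (data[0], median(lower), median(data), median(upper), data[-1])

data = [7, 1, 15, 3, 10, 4, 12, 6, 8]
summary = five_number_summary(data)  # (1, 3.5, 7, 11.0, 15)
iqr = summary[3] - summary[1]        # Q3 - Q1 = 7.5
```

When n is odd, the median itself is left out of both halves, which matches the "left of" and "right of" wording on the quartile cards.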

45

box plot

visual representation of five-number summary

46

modified box plot

A box plot that indicates which data values, if any, are outliers by representing them as dots separate from the box plot. The whisker(s) connect the box to the lowest and/or highest data values that are not outliers, instead of the minimum and/or maximum values.

47

percentile

the pth percentile of a distribution is the value with p% of observations less than or equal to it

48

standardized (z-score)

tells us how many standard deviations from the mean the value falls, and in what direction
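Percentiles and z-scores both describe where a value sits in a distribution. A brief sketch with made-up test scores follows; the helper names are hypothetical, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev

def percentile_of(value, data):
    """Percent of observations less than or equal to the value."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

def z_score(value, data):
    """Number of standard deviations the value lies above (+) or below (-) the mean."""
    return (value - mean(data)) / stdev(data)

scores = [60, 70, 70, 80, 90]   # mean is 74
pct = percentile_of(80, scores)  # 4 of 5 scores are <= 80, so 80.0
z = z_score(80, scores)          # positive, because 80 is above the mean
```

The sign of z gives the direction and its size gives the distance in standard deviations.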

49

density curve

models the distribution of a quantitative variable with a curve that

- is always on/above the horizontal axis

- has area exactly 1 underneath it

50

mean of a density curve

the point at which the curve would balance if made of solid material

51

median of a density curve

the equal-areas point, the point that divides the area under the curve in half

52

normal curve

a symmetric, single-peaked, bell-shaped density curve

53

normal distribution

- specified by mean and standard deviation

- described by a symmetric, single-peaked, bell-shaped density curve

54

Empirical Rule (68-95-99.7)

In a normal distribution, about 68% of the observations are within one standard deviation of the mean, about 95% are within two standard deviations, and about 99.7% are within three standard deviations
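The 68-95-99.7 figures can be checked against the standard normal distribution with Python's standard library. This is just a verification sketch; `proportion_within` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Area under the standard normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Roughly 0.68, 0.95, and 0.997 for k = 1, 2, 3.
within = {k: proportion_within(k) for k in (1, 2, 3)}
```

Because every normal distribution standardizes to this one, the same proportions hold for any mean and standard deviation.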

55

Standard normal distribution

the normal distribution with mean 0 and standard deviation of 1

56

assess for normality method 1

1) construct a dot plot/stem plot (time-consuming), box plot (stay away from using it as support), or histogram (the default; if it looks iffy, then check a boxplot)

2) see if the graph is approximately symmetrical and bell-shaped about the mean

3) mark off the points at x̄ ± s, x̄ ± 2s, and x̄ ± 3s, then compare the count of observations in each interval with the Empirical Rule
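Step 3 can be sketched in code: count the observations within 1, 2, and 3 standard deviations of the mean and compare with 68%, 95%, and 99.7%. The data set is made up and far too small for a real check; `counts_within` is a hypothetical helper.

```python
from statistics import mean, stdev

def counts_within(data, max_k=3):
    """Count observations within k standard deviations of the mean, for k = 1..max_k."""
    x_bar, s = mean(data), stdev(data)
    return [
        sum(1 for x in data if abs(x - x_bar) <= k * s)
        for k in range(1, max_k + 1)
    ]

data = [4, 5, 5, 6, 6, 6, 7, 7, 8]  # mean 6, s is about 1.22
counts = counts_within(data)         # [7, 9, 9]
percents = [100 * c / len(data) for c in counts]
```

Large gaps between these percents and the Empirical Rule values would be evidence against normality.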

57

normal probability plot

A scatterplot of the ordered pair (data value, expected z-score) for each of the individuals in a quantitative data set. That is, the x-coordinate of each point is the actual data value and the y-coordinate is the expected z-score corresponding to the percentile of that data value in a standard Normal distribution.

58

assess for normality method 2

1. Construct a normal probability plot

2. Plotted points will lie close to a straight line if the distribution is close to a normal distribution

3. Outliers will appear as points that are far away from the overall pattern of the plot
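The points of a normal probability plot can be computed as below. This is a sketch under one assumption: the percentile of the i-th ordered value out of n is taken as (i - 0.5) / n, which is a common convention but not the only one. The data and function name are hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(data):
    """Return (data value, expected z-score) pairs for the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    return [
        (x, std_normal.inv_cdf((i - 0.5) / n))  # assumed percentile convention
        for i, x in enumerate(ordered, start=1)
    ]

points = normal_probability_points([12, 15, 9, 20, 14])
# Plot these pairs as a scatterplot; a roughly straight pattern suggests normality.
```

The middle value of an odd-sized data set sits at the 50th percentile, so its expected z-score is 0.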

59

explanatory variables

may help explain or predict changes in a response variable

60

independent variables

explanatory variables

61

response variables

measures an outcome of a study

62

dependent variables

response variables

63

positive association

when the values of one variable tend to increase as the values of the other variable increase

64

negative association

when the values of one variable tend to decrease as the values of the other variable increase

65

correlation coefficient

- r

- measures the direction and strength of the association
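One standard way to compute r is to average the products of the z-scores, dividing by n - 1. A sketch with made-up data follows; the function name `correlation` is chosen here for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z_x * z_y) over all points, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)  # positive: y tends to increase as x increases
```

r is always between -1 and 1; its sign gives the direction and its distance from 0 gives the strength of the linear association.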

66

least squares regression line (LSRL)

line that models how a response variable y changes as an explanatory variable x changes

- y hat = a + bx

- line that makes the sum of the squared residuals as small as possible
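The slope and intercept that minimize the sum of squared residuals have a closed form. The sketch below uses it on made-up data; `fit_lsrl` and `predict` are hypothetical names.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for y hat = a + b*x using the least-squares formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar  # the line always passes through (x bar, y bar)
    return a, b

def predict(a, b, x):
    return a + b * x

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_lsrl(xs, ys)      # a is about 2.2, b is about 0.6
y_hat_at_3 = predict(a, b, 3)  # equals y bar = 4, since x bar = 3
```

Any other line through these points would give a larger sum of squared residuals.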

67

extrapolation

Use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line.

68

residual

the difference between the actual value of y and the value of y predicted by the regression line

69

scatterplots

shows the relationship between two quantitative variables measured on the same individuals. the values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis

70

intercept

predicted value of y when x = 0

71

slope

the amount by which the predicted value of y changes when x increases by 1 unit

72

residual plot

a scatterplot that displays the residuals on the vertical axis and the explanatory variable on the horizontal axis

73

standard deviation of residuals

s measures the size of a typical residual

- measures the typical distance between the actual y values and the predicted y values

74

coefficient of determination

measures the percent reduction in the sum of squared residuals when using the LSRL to make predictions, rather than the mean value of y

- measures the percent of the variability in the response variable that is accounted for by the LSRL
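Residuals, the standard deviation of the residuals s, and the coefficient of determination r^2 can all be computed from one fitted line. This sketch uses made-up data, a hypothetical `fit_lsrl` helper, and the usual n - 2 divisor for s.

```python
import math

def fit_lsrl(xs, ys):
    """Return (a, b) for y hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_lsrl(xs, ys)

# Residual = actual y - predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

sse = sum(e ** 2 for e in residuals)       # squared residuals using the LSRL
y_bar = sum(ys) / len(ys)
sst = sum((y - y_bar) ** 2 for y in ys)    # squared residuals using the mean of y

r_squared = 1 - sse / sst                  # percent reduction, as a proportion
s = math.sqrt(sse / (len(xs) - 2))         # typical size of a residual
```

Here r^2 is about 0.6, so the LSRL accounts for about 60% of the variability in y for this toy data set.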

75

high leverage points in regression

have much larger/much smaller x values than the other points in the data set

76

outliers in regression

point that does not follow the pattern of the data and has a large residual

77

influential points in regression

any point that, if removed, substantially changes the slope, y-intercept, correlation, coefficient of determination, or standard deviation of the residuals

78

power model

y = ax^b

- if y is proportional to a power of x, we should use a power model

- log(x) vs log(y)

- LSRL: log y^ = a + b(log(x))
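The log-log transformation can be sketched the same way as any linear fit. The data are made up, `fit_line` is a hypothetical helper, and base-10 logs are assumed. Note that the slope of the fitted line estimates the power, and 10 raised to the intercept estimates the coefficient in y = ax^b.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope

# Made-up data that follow y = 2 * x^3 exactly.
xs = [1, 2, 3, 4]
ys = [2, 16, 54, 128]

log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]
intercept, slope = fit_line(log_xs, log_ys)

power_hat = slope            # estimate of the power
coef_hat = 10 ** intercept   # estimate of the coefficient
```

If the plot of log(x) vs log(y) is not roughly linear, a power model is probably not appropriate.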

79

population

the entire group of individuals we want information about

80

census

collects data from every individual in the population

81

sample

a subset of individuals in the population from which we actually collect data

82

convenience sampling

selects individuals from the population who are easy to reach

83

bias

the design of a study shows bias if it is likely to underestimate or likely to overestimate the value you want to know

84

voluntary response sampling

allows people to choose to be in the sample by responding to a general invitation

85

voluntary response bias

- people who self-select to participate in such surveys are usually not representative of the population of interest

- attracts people who feel strongly about an issue, and who often share the same opinion

86

random sampling/random selection

involves using a chance process to determine which members of a population are included in the sample

87

simple random sample

chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample
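In Python, `random.sample` draws without replacement so that every group of n individuals is equally likely, which matches this definition. A sketch with a hypothetical population list:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every group of size n is equally likely."""
    return random.Random(seed).sample(population, n)

# Hypothetical population of 50 labeled students.
population = [f"student {i}" for i in range(1, 51)]
srs = simple_random_sample(population, 5, seed=7)
```

The seed is only there to make the example repeatable; a real sample would normally be drawn without one.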

88

strata

groups of individuals in a population who share characteristics thought to be associated with the variables being measured in a study

89

stratified random sampling

selects a sample by choosing an SRS from each stratum and combining the SRSs into one overall sample
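A sketch of stratified random sampling, assuming the same number is drawn from each stratum (real designs often sample in proportion to stratum size). The strata and the name `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum from each stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample

# Hypothetical strata: students grouped by class year.
strata = {
    "freshmen": [f"F{i}" for i in range(1, 21)],
    "seniors": [f"S{i}" for i in range(1, 21)],
}
sample = stratified_sample(strata, n_per_stratum=3, seed=3)
```

Unlike cluster sampling, every stratum is represented and only some members of each are chosen.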

90

cluster

group of individuals in the population that are located near each other

91

systematic random sampling

selects a sample from an ordered arrangement of the population by randomly selecting one of the first k individuals and choosing every kth individual thereafter
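Systematic random sampling maps neatly onto list slicing. In this sketch the population is a hypothetical ordered list of 100 ID numbers and `systematic_sample` is an illustrative name.

```python
import random

def systematic_sample(ordered_population, k, seed=None):
    """Randomly pick one of the first k individuals, then take every kth after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1, i.e. one of the first k individuals
    return ordered_population[start::k]

population = list(range(1, 101))  # hypothetical ordered list of ID numbers
sample = systematic_sample(population, k=10, seed=5)
```

Only the starting point is random; once it is chosen, the rest of the sample is determined.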

92

multistage sampling

combines two or more sampling methods

93

undercoverage

occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample

94

nonresponse

occurs when an individual chosen for the sample can't be contacted or refuses to participate

95

wording of questions bias

confusing/leading questions

96

response bias

occurs when there is a systematic pattern of inaccurate answers to a survey question

97

observational study

observes individuals and measures variables of interest but does not attempt to influence the responses

98

retrospective observational studies

observational study that examines existing data for a sample of individuals

99

prospective observational studies

observational studies that track individuals into the future

100

confounding

occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other
