Ch. 1-4 Stats

121 Terms

New cards

exponential model

y = ab^x (note that a and b here are not the intercept and slope of a regression line; a is the value of y when x = 0 and b is the common ratio)

- if there is a common ratio (or approximately common) for each equal time period, you have exponential growth/decay

- common ratio > 1: growth

- 0 < common ratio < 1: decay

- make sure to note that you can't use the word exponential unless it has been proven by the data

- we usually use it to study growth/decay over time

- x vs log y: if the relationship is exponential, a scatterplot of log y against x is roughly linear

- LSRL: log(ŷ) = a + bx
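
A minimal sketch of the checks on this card, using made-up data in which y triples for each unit increase in x. It computes the common ratios, fits a least-squares line to log10(y) versus x, and back-transforms to y = ab^x. The data and variable names are assumptions for illustration only.

```python
import math

# Hypothetical data: y triples each time x increases by 1.
xs = [0, 1, 2, 3, 4]
ys = [2.0, 6.0, 18.0, 54.0, 162.0]

# Common ratios between consecutive y values (equal steps in x).
ratios = [ys[i + 1] / ys[i] for i in range(len(ys) - 1)]

# Least-squares line for log10(y) on x.
log_ys = [math.log10(y) for y in ys]
n = len(xs)
x_bar = sum(xs) / n
log_y_bar = sum(log_ys) / n
slope = (sum((x - x_bar) * (ly - log_y_bar) for x, ly in zip(xs, log_ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = log_y_bar - slope * x_bar

# Back-transform log10(y-hat) = intercept + slope * x into y-hat = a * b^x.
a = 10 ** intercept
b = 10 ** slope
print(ratios, round(a, 3), round(b, 3))
```

A common ratio above 1 (here about 3) points to growth, and the back-transformed b matches that ratio.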

New cards

Statistics

the science and art of collecting, analyzing, and drawing conclusions from data

New cards

Individuals

- an object described in a set of data -- can be people, animals, or things

- WHO/WHAT are we gathering information about?

New cards

Variables

- an attribute that can take different values for different individuals

- what do we want to know about these individuals?

New cards

Qualitative/Categorical Variables

- assigns labels that place each individual into a particular group called a category

- distinct groups/classifications; can be numerical values that make no sense to average (phone numbers)

New cards

Marginal relative frequency

- gives the percent or proportion of individuals that have a specific value for one categorical variable

New cards

Conditional relative frequency

- gives the percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable (the condition)

New cards

Simpson's paradox

- an association between two variables that holds for each value of a third variable can be changed or even reversed when the data for all values of the third variable are combined
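
A small numeric sketch of the reversal, using made-up counts (not real data). Treatment A has the higher success rate within each group, yet treatment B has the higher rate once the groups are combined, because the two treatments were applied to very different mixes of groups.

```python
# Hypothetical (successes, total) counts by treatment within each group.
counts = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

# Success rate for each treatment within each group.
within = {
    group: {t: s / n for t, (s, n) in by_treatment.items()}
    for group, by_treatment in counts.items()
}

# Success rate for each treatment after combining the groups.
combined = {}
for t in ("A", "B"):
    successes = sum(counts[g][t][0] for g in counts)
    total = sum(counts[g][t][1] for g in counts)
    combined[t] = successes / total

print(within)    # A beats B in both groups
print(combined)  # B beats A overall
```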

New cards

Side-by-side bar graph

Displays the distribution of a categorical variable for each value of another categorical variable. The bars are grouped together based on the values of one of the categorical variables and placed side by side.

New cards

Segmented bar graph

displays the distribution of a categorical variable as segments of a rectangle, with the area of each segment proportional to the percent of individuals in the corresponding category

New cards

Mosaic plot

a modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category

New cards

Association

- if knowing the value of one variable helps us predict the value of the other, there is association

- if knowing the value of one variable does not help us predict the value of the other, there is no association

New cards

Dot Plot

shows each data value as a dot above its location on a number line

New cards

first quartile

the median of the data values that are to the left of the median in the ordered list

New cards

cumulative relative frequency graph (ogive)

plots a point corresponding to the percentile of a given value in a distribution of quantitative data. consecutive points are then connected with a line segment to form the graph

New cards

no association

If knowing the value of one variable does not help you predict the value of the other.

New cards

cluster sampling

selects a sample by randomly choosing clusters and including each member of the selected clusters in the sample

New cards

experiment

deliberately imposes some treatment on individuals to measure their responses

New cards

random assignment

experimental units are assigned to treatments using a chance process

New cards

Quantitative Variables

- takes number values that are quantities -- counts or measurements

- makes sense to carry out arithmetic operations like adding and averaging

New cards

Discrete Variable

- a quantitative variable that takes a fixed set of possible values with gaps between them (shoe size)

New cards

Continuous Variable

- a quantitative variable that can take any value in an interval on the number line (GPA)

New cards

Distribution

tells us what values the variable takes and how often it takes these values

New cards

Bar Graph (Bar Chart)

- shows each category as a bar

- the heights of the bars show the category frequencies or relative frequencies

- 1 categorical variable

New cards

Two-way (contingency) tables

- table of counts that summarizes data on the relationship between two categorical variables for some group of individuals

New cards

Joint relative frequency

- gives the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable
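
The joint, marginal, and conditional relative frequencies defined in this deck can be compared side by side on one two-way table. The sketch below uses a made-up table of counts; the category names are assumptions for illustration.

```python
# Hypothetical two-way table of counts: rows are grade levels,
# columns are favorite subjects.
table = {
    "junior": {"math": 20, "art": 30},
    "senior": {"math": 40, "art": 10},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Joint: junior AND math, out of everyone.
joint_junior_math = table["junior"]["math"] / grand_total

# Marginal: math (any grade), out of everyone.
marginal_math = sum(row["math"] for row in table.values()) / grand_total

# Conditional: math among juniors only (the condition is grade = junior).
conditional_math_given_junior = table["junior"]["math"] / sum(table["junior"].values())

print(joint_junior_math, marginal_math, conditional_math_given_junior)
```

The only thing that changes between the three is the denominator: the grand total for joint and marginal, the row total for the conditional.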

New cards

Symmetric

- if the right side of the graph (containing the half of observations with the largest values) is approximately a mirror image of the left side

New cards

Skewed to the left

if the left side of the graph is much longer than the right side

New cards

Skewed to the right

if the right side of the graph is much longer than the left side

New cards

Stem plot

Shows each data value separated into two parts: a stem, which consists of all but the final digit, and a leaf, the final digit. The stems are ordered from lowest to highest and arranged in a vertical column. The leaves are arranged in increasing order out from the appropriate stems.

New cards

Histogram

Shows each interval of values as a bar. The heights of the bars show the frequencies or relative frequencies of values in each interval.

New cards

Mean

the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores

New cards

statistic

a number that describes some characteristic of a sample

New cards

parameter

a number that describes some characteristic of the population

New cards

resistant

not sensitive to extreme values

New cards

median

midpoint of a distribution, the number such that about half the observations are smaller and about half are larger

New cards

range

distance between the minimum value and the maximum value

New cards

variance

the average squared deviation from the mean, written s^2 (for a sample, the sum of squared deviations is divided by n - 1)

New cards

standard deviation

- measures the typical distance of the values in a distribution from the mean

- found by averaging the squared deviations and then taking the square root

- square root of variance
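
A short worked sketch of the steps above on a made-up sample. It assumes the sample version of the formula, in which the "average" of the squared deviations divides by n - 1, and checks the result against Python's `statistics.stdev`.

```python
import math
import statistics

data = [2, 4, 4, 6, 9]  # hypothetical sample

mean = sum(data) / len(data)                     # 5.0
squared_devs = [(x - mean) ** 2 for x in data]   # 9, 1, 1, 1, 16
variance = sum(squared_devs) / (len(data) - 1)   # 28 / 4
std_dev = math.sqrt(variance)

# The standard library's sample standard deviation should agree.
print(variance, std_dev, statistics.stdev(data))
```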

New cards

quartiles

divide the ordered data set into four groups having roughly the same number of values

New cards

third quartile

the median of the data values that are to the right of the median in the ordered list

New cards

interquartile range

distance between the first and third quartiles of a distribution

New cards

outliers

individual values that fall outside the overall pattern of a distribution

New cards

five-number summary

The minimum, first quartile (Q1), median, third quartile (Q3), and the maximum
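
A minimal sketch of the five-number summary that follows the quartile definitions in this deck: Q1 is the median of the values to the left of the median and Q3 is the median of the values to the right. The function name is hypothetical, and other software may use slightly different quartile conventions.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values to the left of the median
    upper = data[half + n % 2:]    # values to the right of the median
    return (data[0], median(lower), median(data), median(upper), data[-1])

summary = five_number_summary([12, 1, 7, 3, 10, 4, 8])
iqr = summary[3] - summary[1]      # interquartile range = Q3 - Q1
print(summary, iqr)
```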

New cards

box plot

visual representation of five-number summary

New cards

modified box plot

A box plot that indicates which data values, if any, are outliers by representing them as dots separate from the box plot. The whisker(s) connect the box to the lowest and/or highest data values that are not outliers, instead of the minimum and/or maximum values.

New cards

percentile

the pth percentile of a distribution is the value with p% of observations less than or equal to it

New cards

standardized (z-score)

tells us how many standard deviations from the mean the value falls, and in what direction
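
As a quick illustration, a z-score is the value minus the mean, divided by the standard deviation. The numbers below are made up.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical test scores with mean 70 and standard deviation 5.
above = z_score(80, 70, 5)     # 2 standard deviations above the mean
below = z_score(62.5, 70, 5)   # 1.5 standard deviations below the mean
print(above, below)
```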

New cards

density curve

models the distribution of a quantitative variable with a curve that

- is always on/above the horizontal axis

- has area exactly 1 underneath it

New cards

mean of a density curve

the point at which the curve would balance if made of solid material

New cards

median of a density curve

the equal-areas point, the point that divides the area under the curve in half

New cards

normal curve

a symmetric, single-peaked, bell-shaped density curve

New cards

normal distribution

- specified by mean and standard deviation

- described by a symmetric, single-peaked, bell-shaped density curve

New cards

Empirical Rule (68-95-99.7)

In a normal distribution, about 68% of the observations are within one standard deviation of the mean, about 95% are within two standard deviations, and about 99.7% are within three standard deviations
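
The three percentages can be checked against the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) and computes the area between -k and k for k = 1, 2, 3.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area under the curve within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, round(area, 4))
```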

New cards

Standard normal distribution

the normal distribution with mean 0 and standard deviation of 1

New cards

assess for normality method 1

1) construct a dot plot or stem plot (time-consuming), a box plot (stay away from using it as support), or a histogram (the default; if it looks iffy, then check a box plot)

2) see if the graph is approximately symmetrical and bell-shaped about the mean

3) mark off the points at x̄ ± s, x̄ ± 2s, and x̄ ± 3s, then compare the percent of observations in each interval with the Empirical Rule (68%, 95%, 99.7%)
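
Step 3 can be sketched as a small helper that reports what share of the observations falls within k standard deviations of the mean. The function name and the data are hypothetical; with a real data set the shares would be compared with 68%, 95%, and 99.7%.

```python
import statistics

def share_within(data, k):
    """Proportion of observations within k sample standard deviations of the mean."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if x_bar - k * s <= x <= x_bar + k * s]
    return len(inside) / len(data)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up, clearly not bell-shaped
shares = {k: share_within(data, k) for k in (1, 2, 3)}
print(shares)
```

Shares that are far from the Empirical Rule percentages, as with this flat made-up data, are evidence against normality.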

New cards

normal probability plot

A scatterplot of the ordered pair (data value, expected z-score) for each of the individuals in a quantitative data set. That is, the x-coordinate of each point is the actual data value and the y-coordinate is the expected z-score corresponding to the percentile of that data value in a standard Normal distribution.
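
A rough sketch of how the plotted pairs could be computed. It assumes the percentile of the i-th smallest of n values is taken as (i - 0.5)/n, which is one common convention; calculators and software may use a slightly different rule. The function name is hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(values):
    """Return (data value, expected z-score) pairs, ordered by data value."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting position: (i + 0.5) / n for zero-based rank i.
    return [(x, std_normal.inv_cdf((i + 0.5) / n)) for i, x in enumerate(data)]

points = normal_probability_points([14, 9, 11, 10, 12])  # made-up data
for x, z in points:
    print(x, round(z, 3))
```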

New cards

assess for normality method 2

1. Construct a normal probability plot

2. Plotted points will lie close to a straight line if the distribution is close to a normal distribution

3. Outliers will appear as points that are far away from the overall pattern of the plot

New cards

explanatory variables

may help explain or predict changes in a response variable

New cards

independent variables

explanatory variables

New cards

response variables

measures an outcome of a study

New cards

dependent variables

response variables

New cards

positive association

when the values of one variable tend to increase as the values of the other variable increase

New cards

negative association

when the values of one variable tend to decrease as the values of the other variable increase

New cards

correlation coefficient

- r

- measures the direction and strength of the linear association between two quantitative variables

New cards

least squares regression line (LSRL)

line that models how a response variable y changes as an explanatory variable x changes

- ŷ = a + bx

- line that makes the sum of the squared residuals as small as possible
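
A minimal sketch of fitting the line on made-up data, using the standard least-squares formulas (slope = sum of products of deviations divided by sum of squared x deviations; the line passes through the point of means). The function names are hypothetical. Nudging the intercept shows that a different line gives a larger sum of squared residuals.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of (actual y - predicted y)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = fit_lsrl(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
nudged = sum_squared_residuals(xs, ys, a + 0.1, b)  # any other line does worse
print(a, b, best, nudged)
```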

New cards

extrapolation

Use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line.

New cards

residual

the difference between the actual value of y and the value of y predicted by the regression line

New cards

scatterplots

shows the relationship between two quantitative variables measured on the same individuals. the values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis

New cards

intercept

predicted value of y when x = 0

New cards

slope

the amount by which the predicted value of y changes when x increases by 1 unit

New cards

residual plot

a scatterplot that displays the residuals on the vertical axis and the explanatory variable on the horizontal axis

New cards

standard deviation of residuals

s measures the size of a typical residual

- measures the typical distance between the actual y values and the predicted y values

New cards

coefficient of determination

measures the percent reduction in the sum of squared residuals when using the LSRL to make predictions, rather than the mean value of y

- measures the percent of the variability in the response variable that is accounted for by the LSRL
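
The "percent reduction" reading can be checked directly: compare the sum of squared residuals from the LSRL with the sum of squared deviations from the mean of y. The data below are made up and the variable names are assumptions for illustration.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

# Squared prediction errors using the LSRL versus using the mean of y.
sse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sse_mean = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse_line / sse_mean
print(round(r_squared, 3))
```

Here the LSRL cuts the sum of squared residuals from 6 to 2.4, a 60% reduction.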

New cards

high leverage points in regression

have much larger/much smaller x values than the other points in the data set

New cards

outliers in regression

point that does not follow the pattern of the data and has a large residual

New cards

influential points in regression

any point that, if removed, substantially changes the slope, y-intercept, correlation, coefficient of determination, or standard deviation of the residuals

New cards

power model

y = ax^b

- if y is proportional to a power of x, we should use a power model

- log(x) vs log(y)

- LSRL: log(ŷ) = a + b·log(x)
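
A small sketch of the log-log approach on made-up data that follow y = 3x^2 exactly. Fitting a least-squares line to log10(y) versus log10(x) gives a slope equal to the power, and 10 raised to the intercept recovers the multiplier. Variable names are assumptions for illustration.

```python
import math

xs = [1, 2, 4, 8]                  # hypothetical data
ys = [3 * x ** 2 for x in xs]      # y = 3 * x^2

log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]

n = len(xs)
lx_bar = sum(log_xs) / n
ly_bar = sum(log_ys) / n
slope = (sum((lx - lx_bar) * (ly - ly_bar) for lx, ly in zip(log_xs, log_ys))
         / sum((lx - lx_bar) ** 2 for lx in log_xs))
intercept = ly_bar - slope * lx_bar

power = slope                  # exponent in the power model
multiplier = 10 ** intercept   # coefficient in the power model
print(round(power, 3), round(multiplier, 3))
```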

New cards

population

the entire group of individuals we want information about

New cards

census

collects data from every individual in the population

New cards

sample

a subset of individuals in the population from which we actually collect data

New cards

convenience sampling

selects individuals from the population who are easy to reach

New cards

bias

a study design shows bias if it is very likely to underestimate or very likely to overestimate the value you want to know

New cards

voluntary response sampling

allows people to choose to be in the sample by responding to a general invitation

New cards

voluntary response bias

- people who self-select to participate in such surveys are usually not representative of the population of interest

- attracts people who feel strongly about an issue, and who often share the same opinion

New cards

random sampling/random selection

involves using a chance process to determine which members of a population are included in the sample

New cards

simple random sample

chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample

New cards

strata

groups of individuals in a population who share characteristics thought to be associated with the variables being measured in a study

New cards

stratified random sampling

selects a sample by choosing an SRS from each stratum and combining the SRSs into one overall sample
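
A minimal sketch of the procedure, assuming the strata are already known and stored as lists. The function name, the grade-level strata, and the fixed seed are all hypothetical; `random.Random.sample` draws without replacement, which plays the role of the SRS within each stratum.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Take an SRS of size per_stratum from each stratum and combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

# Hypothetical population split into strata by grade level.
strata = {
    "freshmen": [f"F{i}" for i in range(1, 41)],
    "juniors": [f"J{i}" for i in range(1, 31)],
    "seniors": [f"S{i}" for i in range(1, 21)],
}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
sample = stratified_sample(strata, per_stratum=2, rng=rng)
print(sample)
```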

New cards

cluster

group of individuals in the population that are located near each other

New cards

systematic random sampling

selects a sample from an ordered arrangement of the population by randomly selecting one of the first k individuals and choosing every kth individual thereafter
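
The rule can be sketched in a few lines: pick a random starting position among the first k individuals, then take every kth individual after it. The function name, the population, and the seed are assumptions for illustration.

```python
import random

def systematic_sample(population, k, rng):
    """Randomly choose one of the first k individuals, then every kth after that."""
    start = rng.randrange(k)   # index 0 .. k-1
    return population[start::k]

population = list(range(1, 101))   # hypothetical ordered list of 100 individuals
rng = random.Random(1)             # fixed seed only so the sketch is repeatable
sample = systematic_sample(population, k=10, rng=rng)
print(sample)
```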

New cards

multistage sampling

combines two or more sampling methods

New cards

undercoverage

occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample

New cards

nonresponse

occurs when an individual chosen for the sample can't be contacted or refuses to participate

New cards

wording of questions bias

confusing/leading questions

New cards

response bias

occurs when there is a systematic pattern of inaccurate answers to a survey question

New cards

observational study

observes individuals and measures variables of interest but does not attempt to influence the responses

New cards

retrospective observational studies

observational study that examines existing data for a sample of individuals

New cards

prospective observational studies

observational studies that track individuals into the future

New cards

confounding

occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other