Statistics Midterm

Statistics

Last updated 11:07 PM on 10/11/23

54 Terms

New cards

population

total set of individuals that are of interest

New cards

parameter

summarizes the population (μ-mean, and σ-standard deviation)

New cards

sample

portion OF the total population (should lack bias)

New cards

statistic

value summarizing the sample (Xbar-mean, s-standard deviation)

New cards

cases

individual items on which data is collected

New cards

respondents

people who answer survey

New cards

subjects/participants

people who are experimented on

New cards

experimental units

objects of an experiment when not a person

New cards

graphs for categorical data

bar chart
pie chart

New cards

graphs for quantitative data

dot plot
histogram
box plot

New cards

histogram benefit

seeing distribution of the data
good to compare two or three groups

New cards

to describe quantitative data

Shape

Outliers (and other unusual features)

Center

Spread

New cards

shape

modality/peaks
symmetry and skewness

New cards

outliers (what it is, how to find)

data value that is far above or far below the rest of the data
upper outlier: any value above Q3+1.5(IQR)
lower outlier: any value below Q1-1.5(IQR)
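
A minimal sketch of the 1.5(IQR) rule above, assuming quartiles are found with the median-of-halves method used elsewhere in these notes. The data values and the helper names `quartiles` and `outlier_fences` are made up for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]                # values below the middle position
    upper = data[len(data) - half:]    # values above the middle position
    return median(lower), median(data), median(upper)

def outlier_fences(values):
    """Return the lower and upper cutoffs: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 5, 6, 7, 8, 9, 30]       # hypothetical data set
low_fence, high_fence = outlier_fences(data)
outliers = [v for v in data if v < low_fence or v > high_fence]
print(low_fence, high_fence, outliers)  # -1.5 14.5 [30]
```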

New cards

center

median and mean

New cards

median (what it is, how to find, when to use it)

the middle of the data
order the values and find the one that is positionally the middle value (average the two middle values when n is even)
best for skewed distributions
resistant to outliers

New cards

mean (what it is, how to find, when to use it)

the average
ybar=total/n
good for roughly symmetric data
not resistant to outliers

New cards

what happens to the mean when the data is skewed

it is pulled toward the tail, in the direction of the skewness (ex. right skewed data will have a higher mean than median)
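
A quick check of this idea with two made-up data sets: one symmetric, one with a long right tail. The numbers are hypothetical and only meant to show the mean moving toward the tail while the median stays put.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_skewed = [1, 2, 2, 3, 4, 40]   # hypothetical data with one large value

# In symmetric data the mean and median agree.
print(mean(symmetric), median(symmetric))

# In right-skewed data the large value drags the mean above the median.
print(mean(right_skewed), median(right_skewed))
```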

New cards

spread (2 main kinds)

standard deviation
IQR

New cards

standard deviation (what is it, how to find, what different sd’s mean)

typical distance of the values from the mean, how tightly packed the data are
s = √( ∑(y−ybar)² / (n−1) )
small sd= data values less spread out and closer to the mean
large sd= data values more spread out and farther from the mean
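
The formula can be sketched step by step as below. The function name `sample_sd` and both data sets are assumptions for illustration; the two lists share the same mean but differ in how tightly they are packed.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation: sqrt of the sum of squared deviations over n - 1."""
    n = len(values)
    ybar = sum(values) / n
    squared_deviations = sum((y - ybar) ** 2 for y in values)
    return sqrt(squared_deviations / (n - 1))

tight = [9, 10, 11]    # hypothetical values packed close to the mean of 10
spread = [0, 10, 20]   # hypothetical values far from the same mean

print(sample_sd(tight))   # 1.0
print(sample_sd(spread))  # 10.0
```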

New cards

quartiles

Q1 (median of lower half of data)= 25th percentile
Q2 (median)= 50th percentile
Q3 (median of upper half of the data)= 75th percentile
Q4= maximum value in the data, 100th percentile

New cards

5 number summary

min, Q1, median, Q3, max
(IQR and range are calculated from these five values but are not part of the summary)
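
A small sketch that computes the summary for a made-up data set, assuming Q1 and Q3 are the medians of the lower and upper halves (the middle value is left out of both halves when n is odd). The helper name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    return data[0], q1, median(data), q3, data[-1]

data = [13, 1, 9, 5, 3, 11, 7]   # hypothetical data set
low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1                    # spread of the middle half
data_range = high - low          # spread of all the data
print((low, q1, med, q3, high), iqr, data_range)  # (1, 3, 7, 11, 13) 8 12
```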

New cards

IQR (how to find, benefits, drawbacks)

IQR=Q3-Q1
reasonable summary of distribution spread
not affected by outliers
most people don’t know what it is

New cards

center/spread combos

mean + standard deviation
- both best with roughly symmetric data
- based on magnitude and values
median + IQR
- both best for skewed data
- based on order of the data

New cards

independence

distribution of one variable is the same for all categories of another

New cards

dependent variables

have an association between the two variables

New cards

observational studies

look at sample of data to learn more about larger population
often lead to contradictory results because nothing is controlled for, so findings are rarely conclusive

New cards

boxplots

central box shows the middle half of the data
height of box=IQR
whiskers show skewness if they are not roughly the same length
if median is centered=roughly symmetric middle half of data
good for comparing many groups

New cards

z-score (what is it, how to find it, what do small/large, +/- z scores mean)

how far a value is from the mean, measured in standard deviations; useful for re-expressing values on a common scale
z=(y-ybar)/s
small/large: close/far from the mean (respectively)
positive/negative: above/below the mean (respectively)
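
A minimal sketch of standardizing a whole data set, assuming the sample standard deviation s is used. The data and the helper name `z_scores` are made up for illustration.

```python
from statistics import mean, stdev

def z_scores(values):
    """Re-express each value as its distance from the mean in standard deviations."""
    ybar = mean(values)
    s = stdev(values)   # sample standard deviation (divides by n - 1)
    return [(y - ybar) / s for y in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean is 5
z = z_scores(data)

for value, score in zip(data, z):
    # Negative scores sit below the mean, positive scores above it.
    print(value, round(score, 2))
```

After standardizing, the z-scores themselves have mean 0 and standard deviation 1, which is what makes values from different scales comparable.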

New cards

shifting data (adding or subtracting to all values) does what?

changes only the position, not the spread
mean, median, and quartiles (location) are changed
standard deviation, range, and IQR (spread) remain unchanged

New cards

rescaling the data (multiplying it by a constant) does what?

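changes both position and spread
mean, median, and quartiles (location) are multiplied by the constant
standard deviation, range, and IQR (spread) are multiplied by the absolute value of the constant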
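
The shifting and rescaling cards can be checked together with a short sketch. The data, the shift of 10, and the scale factor of 3 are arbitrary choices for illustration.

```python
from statistics import mean, stdev

data = [2, 4, 6, 8]                  # hypothetical data
shifted = [y + 10 for y in data]     # add a constant to every value
rescaled = [y * 3 for y in data]     # multiply every value by a constant

# Shifting moves the center but leaves the spread alone.
print(mean(data), mean(shifted), stdev(data), stdev(shifted))

# Rescaling multiplies both the center and the spread by the constant.
print(mean(data), mean(rescaled), stdev(data), stdev(rescaled))
```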

New cards

the normal model (notation, empirical rule)

N ( μ , σ )
68-95-99.7 rule: about 68% of values fall within 1σ of μ, about 95% within 2σ, about 99.7% within 3σ
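
The three percentages can be reproduced from the standard normal model. This sketch uses `NormalDist` from Python's standard library; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal model that lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```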

New cards

percentile

the percent of data that falls at or below some value

ex) the 80th percentile has 80% of the data at or below it

New cards

how to get from values (variables) to z-scores
how to get from z-score to area (%, proportion, percentile)

standardize: z= (x-μ)/σ
z-score to area: normalcdf(lower z, upper z)
area to z-score: invNorm(percentile or area below)
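
For practice away from the calculator, the same three steps can be sketched with Python's standard library. The helper names are hypothetical stand-ins for the calculator functions, and the example values (x = 130, μ = 100, σ = 15) are made up.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # mean 0, standard deviation 1

def standardize(x, mu, sigma):
    """Value to z-score."""
    return (x - mu) / sigma

def area_between(lower_z, upper_z):
    """z-scores to area: proportion of the normal model between two z-scores."""
    return STANDARD_NORMAL.cdf(upper_z) - STANDARD_NORMAL.cdf(lower_z)

def z_for_area_below(area):
    """Area to z-score: the z-score with the given proportion at or below it."""
    return STANDARD_NORMAL.inv_cdf(area)

z = standardize(130, mu=100, sigma=15)
print(z)                                    # 2.0
print(round(STANDARD_NORMAL.cdf(z), 4))     # area below z = 2
print(round(area_between(-1, 1), 4))        # area within one sd of the mean
print(round(z_for_area_below(0.80), 4))     # z-score at the 80th percentile
```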

New cards

scatterplots (what do they show, what are they good for)

relationship between two quantitative variables
detect patterns, trends, relationships, extraordinary values

New cards

roles for variables (what’s on y and x axis)

y axis= response variable, what we want to predict
x axis= explanatory variable, what is providing info and helping to predict response variable

New cards

correlation coefficient (r) (how to find, what it means, conditions)

on a TI-84 go to STAT→CALC→8
strength and direction of a linear relationship (between -1 and 1)
need a nearly linear relationship, quantitative variables, and no strong outliers
has no units
swapping x and y does not change the correlation
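
A hand-rolled sketch of r, to show where the number comes from when no calculator is involved. The data are made up, and the function name `correlation` is an assumption. The last two lines check that r stays the same when the two variables are swapped and when x is converted to different units.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]     # hypothetical explanatory values
scores = [2, 4, 5, 4, 5]    # hypothetical response values

r = correlation(hours, scores)
r_swapped = correlation(scores, hours)                    # x and y trade places
r_minutes = correlation([h * 60 for h in hours], scores)  # hours converted to minutes
print(round(r, 4), round(r_swapped, 4), round(r_minutes, 4))
```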

New cards

correlation does not equal causation

lurking variables
correlation means a LINEAR relationship, an association is ANY relationship

New cards

residual (how to find it, what is it)

y-ŷ (ŷ= predicted value)
residual is the difference between the observed value and the predicted value
points above the line have + residuals and below the line have - residuals

New cards

line of best fit (least squares line, regression line)

(what is it, what to do with it, how it's written)

line for which the sum of the squares of the residuals is the smallest
- squaring the residuals makes them all positive
best fitting line will have small residuals
- if you have small residuals, that means you predicted the results of your data well
ŷ=mx+b (m=slope, b=y-intercept)
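
A minimal sketch of fitting the least squares line by hand and then computing residuals as observed minus predicted. It uses the standard formulas slope = Sxy/Sxx and intercept = ybar - slope(xbar); the data and the function name `least_squares_line` are made up for illustration.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line that minimizes the sum of squared residuals."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return slope, intercept

xs = [1, 2, 3, 4, 5]   # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]   # hypothetical response values

slope, intercept = least_squares_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print(round(slope, 2), round(intercept, 2))
# Points above the line have positive residuals, points below have negative ones.
print([round(e, 2) for e in residuals])
```

For the least squares line the residuals always sum to zero, so the positive and negative misses balance out.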

New cards

line of best fit interpretation

for every 1-unit increase in (x), we can expect (y) to increase/decrease by (slope), on average
the intercept is the predicted (y) when (x) is 0

New cards

conditions for using regression

quantitative
straight enough
no outliers

New cards

what is r²

the fraction of the data’s variation accounted for by the model
found by squaring the correlation coefficient r; subtracting r² from 1 gives the fraction left in the residuals
- ex) r= 0.76, so r²= 0.76² = 0.58= 58%
  1-r² =1-0.58 = 0.42= 42%
interpreted as:
- 58% of the variation of y is accounted for by x
- 42% of the variation of y is left in the residuals

New cards

extrapolation

using the model to predict y for x-values outside the range of the data
the farther the x-value is from the mean of x, the less trust should be put in the predicted value of y

New cards

when do values have leverage

points whose x-values are far from the mean of the rest of the x-values have high leverage
points that are extreme in y have large residuals

New cards

when are values influential

if omitting it from the analysis changes the model enough to make a meaningful difference
determined by:
- residual
- leverage
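
One way to test for influence is to fit the line with and without the suspect point and compare. In this sketch the point (20, 0) is a made-up high-leverage value, and the function name `slope_of_best_fit` is hypothetical.

```python
from statistics import mean

def slope_of_best_fit(points):
    """Least squares slope for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

base_points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]   # hypothetical data
suspect_point = (20, 0)   # x far from the other x-values, y far from the trend

slope_without = slope_of_best_fit(base_points)
slope_with = slope_of_best_fit(base_points + [suspect_point])

# A large change (here the slope even changes sign) marks the point as influential.
print(round(slope_without, 3), round(slope_with, 3))
```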

New cards

simple random sample

guarantees that each person has an equal chance of being selected (every possible sample of the same size is equally likely)
makes a non-representative sample unlikely
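
A minimal sketch of drawing a simple random sample with Python's `random` module. The population of 100 student labels, the sample size, and the function name `simple_random_sample` are made up; the seed is only there so the draw can be repeated.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

students = [f"student_{i}" for i in range(1, 101)]   # hypothetical population
sample = simple_random_sample(students, 10, seed=1)
print(sample)
```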

New cards

stratified random sampling

divides the population into HOMOGENEOUS groups (strata), then randomly selects a proportionate number from each group
estimates will be more precise, but watch out for Simpson's paradox
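
A sketch of proportionate stratified sampling, assuming the strata are already known. The class-year strata, the 10% sampling fraction, and the function name `stratified_sample` are all hypothetical.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Randomly select the same fraction of members from every stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)   # proportionate share of this stratum
        sample.extend(rng.sample(members, k))
    return sample

strata = {
    "freshmen": [f"fr_{i}" for i in range(60)],   # hypothetical stratum of 60
    "seniors": [f"sr_{i}" for i in range(40)],    # hypothetical stratum of 40
}
sample = stratified_sample(strata, fraction=0.10, seed=1)
print(sample)
```

Because each stratum contributes in proportion to its size, the sample keeps the same 60/40 split as the population.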

New cards

cluster sampling

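dividing the population into smaller groups (clusters), then randomly selecting entire clusters
less expensive and less time consuming
some populations are naturally broken into clusters already
each cluster is HETEROGENEOUS (resembles the whole population)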
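
A sketch of cluster sampling, assuming whole clusters are selected at random and everyone inside a chosen cluster is surveyed. The dorm names, their residents, and the function name `cluster_sample` are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of the chosen ones."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}

dorms = {   # hypothetical clusters that already exist in the population
    "dorm_A": ["a1", "a2", "a3"],
    "dorm_B": ["b1", "b2", "b3"],
    "dorm_C": ["c1", "c2", "c3"],
    "dorm_D": ["d1", "d2", "d3"],
}
chosen = cluster_sample(dorms, n_clusters=2, seed=1)
print(chosen)
```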

New cards

multistage sampling

combination of several sampling methods
- ex) for a college, select dorms as a cluster and then continue with other sampling methods such as a census, etc.

New cards

observational studies

researchers do not assign choices, passively observe participants
bad for cause-and-effect establishment
tough to handle lurking variables

New cards

retrospective studies

collect data on something that has already occurred
similar pros and cons as obs. studies

New cards

prospective studies

study where we identify subjects in advance and collect data as events unfold
possible to isolate variables
can be expensive and time-consuming

New cards

4 principles of experimental design

control
- control what you can
randomize
- randomize the rest
replicate
block
- group similar individuals together and randomize within each of these blocks
- helps account for variability due to the difference between blocks
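
Blocking and randomizing can be sketched together: shuffle the units inside each block, then deal out the treatments in turn. The blocks, the unit labels, the two treatment names, and the function name `randomized_block_assignment` are all hypothetical.

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Randomly assign treatments within each block so every block gets a balanced split."""
    rng = random.Random(seed)
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)
        rng.shuffle(shuffled)            # randomize order within the block
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

blocks = {   # hypothetical blocks of similar individuals
    "athletes": ["a1", "a2", "a3", "a4"],
    "non_athletes": ["n1", "n2", "n3", "n4"],
}
assignment = randomized_block_assignment(blocks, ["treatment", "control"], seed=1)
print(assignment)
```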