Statistics Midterm

Last updated 11:07 PM on 10/11/23

54 Terms

1

population

total set of individuals that are of interest

2

parameter

value summarizing the population (μ = mean, σ = standard deviation)

3

sample

portion OF the total population (should lack bias)

4

statistic

value summarizing the sample (x̄ = mean, s = standard deviation)

5

cases

individual items on which data is collected

6

respondents

people who answer survey

7

subjects/participants

people who are experimented on

8

experimental units

objects of an experiment when not a person

9

graphs for categorical data

  • bar chart

  • pie chart

10

graphs for quantitative data

  • dot plot

  • histogram

  • box plot

11

histogram benefit

  • seeing distribution of the data

  • good to compare two or three groups

12

to describe quantitative data

Shape

Outliers (and other unusual features)

Center

Spread

13

shape

  • modality/peaks

  • symmetry and skewness

14

outliers (what it is, how to find)

  • data value that is far above or far below the rest of the data

  • upper outlier: any value above Q3 + 1.5(IQR)

  • lower outlier: any value below Q1 − 1.5(IQR)
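The 1.5(IQR) rule above can be sketched in Python. This is a minimal illustration, assuming the "median of each half" quartile convention from the quartiles card (the overall median is left out of both halves when n is odd). The helper names `quartiles` and `find_outliers` and the sample data are made up for the example, and the list needs at least two values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]     # values below the median position
    upper = data[-half:]    # values above the median position
    return median(lower), median(upper)

def find_outliers(values):
    """Return the values that fall outside the 1.5 * IQR fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

scores = [2, 4, 5, 6, 7, 8, 9, 30]   # hypothetical data
print(find_outliers(scores))
```

Other software may compute quartiles slightly differently, so the fences can shift a little for small data sets.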

15

center

median and mean

16

median (what it is, how to find, when to use it)

  • the middle of the data

  • order the values and find the one that is positionally the middle value

  • best for skewed distributions

  • resistant to outliers

17

mean (what it is, how to find, when to use it)

  • the average

  • ybar=total/n

  • best for roughly symmetric data

  • not resistant to outliers

18

what happens to the mean when the data is skewed

it will be further in the direction of the skewness (ex. right skewed data will have a higher mean than median)
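A quick way to see this is to compare the mean and median of a small right-skewed data set. The sketch below uses Python's standard `statistics` module; the income values are invented for illustration.

```python
from statistics import mean, median

# Hypothetical right-skewed data: one very large value in the right tail.
incomes = [30, 32, 35, 38, 40, 42, 200]

print("mean:", mean(incomes))
print("median:", median(incomes))

# The single large value pulls the mean toward the tail,
# while the median stays at the middle value.
print("mean > median:", mean(incomes) > median(incomes))
```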

19

spread (2 main kinds)

  • standard deviation

  • IQR

20

standard deviation (what is it, how to find, what different sd’s mean)

  • typical distance of the values from the mean, how tightly packed the data are

  • s = √( ∑(y−ybar)² / (n−1) )

  • small sd= data values less spread out and closer to the mean; large sd= data values more spread out
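The formula can be checked by hand in a few lines of Python. This is a sketch only: `sample_sd` is a hypothetical helper that follows the formula with n−1 in the denominator, and the two data sets are invented to contrast a small and a large sd around the same mean.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation: sqrt( sum((y - ybar)^2) / (n - 1) )."""
    n = len(values)
    ybar = sum(values) / n
    return sqrt(sum((y - ybar) ** 2 for y in values) / (n - 1))

tight = [9, 10, 10, 11]   # values packed close to the mean of 10
wide = [2, 8, 12, 18]     # same mean, values far more spread out

print(sample_sd(tight), sample_sd(wide))
print(stdev(tight), stdev(wide))   # the standard library gives the same results
```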

21

quartiles

  • Q1 (median of lower half of data)= 25th percentile

  • Q2 (median)= 50th percentile

  • Q3 (median of upper half of the data)= 75th percentile

  • Q4= max value in the data, 100th percentile

22

5 number summary

min, Q1, median, Q3, max (IQR and range can be calculated from these)

23

IQR (how to find, benefits, drawbacks)

  • IQR=Q3-Q1

  • reasonable summary of distribution spread

  • not affected by outliers

  • most people don’t know what it is

24

center/spread combos

  • mean + standard deviation

    • both best with roughly symmetric data

    • based on magnitude and values

  • median + IQR

    • both best for skewed data

    • based on order of the data

25

independence

distribution of one variable is the same for all categories of another

26

dependent variables

have an association between the two variables

27

observational studies

  • look at sample of data to learn more about larger population

  • often lead to contradictory results because nothing is controlled for or really conclusive

28

boxplots

  • central box shows the middle half of the data

  • height of box=IQR

  • whiskers show skewness if they are not roughly the same length

  • if median is centered=roughly symmetric middle half of data

  • compare many groups

29

z-score (what is it, how to find it, what do small/large, +/- z scores mean)

  • how far a value is from the mean in terms of standard deviations; useful for re-expressing values so different scales can be compared

  • z=(y-ybar)/s

  • small/large: close/far from the mean (respectively)

  • positive/negative: above/below the mean (respectively)
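As a rough illustration of z=(y-ybar)/s, the sketch below standardizes a few values from a small data set. The function name `z_score` and the height data are assumptions made for the example.

```python
from statistics import mean, stdev

def z_score(y, values):
    """Number of sample standard deviations that y lies from the mean."""
    return (y - mean(values)) / stdev(values)

heights = [60, 62, 64, 66, 68]   # hypothetical data, mean is 64

print(z_score(68, heights))   # above the mean, so positive
print(z_score(60, heights))   # below the mean, so negative
print(z_score(64, heights))   # equal to the mean, so zero
```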

30

shifting data (adding or subtracting to all values) does what?

  • changes only the position, not the spread

  • mean, median, and quartiles (location) are changed

  • standard deviation, range, and IQR (spread) remain unchanged

31

rescaling the data (multiplying it by a constant) does what?

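changes both position (mean, median, quartiles) and spread (standard deviation, range, IQR); each is multiplied by the constant (spread by its absolute value)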
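Shifting and rescaling can be compared side by side on a tiny data set. This is only a sketch; the data and the constants 10 and 3 are arbitrary choices for illustration.

```python
from statistics import mean, stdev

data = [2, 4, 6, 8]                 # hypothetical data
shifted = [y + 10 for y in data]    # add a constant to every value
scaled = [y * 3 for y in data]      # multiply every value by a constant

# Shifting moves the center but leaves the spread alone.
print(mean(data), mean(shifted))
print(stdev(data), stdev(shifted))

# Rescaling multiplies both the center and the spread.
print(mean(data), mean(scaled))
print(stdev(data), stdev(scaled))
```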

32

the normal model (notation, empirical rule)

  • N ( μ , σ )

  • 68-95-99.7 rule: about 68% of the data fall within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ
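The 68-95-99.7 figures can be checked against a standard normal model using `NormalDist` from Python's standard library. The helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal model within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")
```

The printed percentages are slightly more precise than the rounded 68-95-99.7 values used in the rule.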

33

percentile

the percent of data that falls at or below some value

ex) the 80th percentile has 80% of the data at or below it

34

how to get from values (variables) to z-scores, and from z-scores to area (%, proportion, percentile)

  • value to z-score: standardize, z= (x-μ)/σ

  • z-score to area: normcdf(lower z, upper z)

  • area to z-score: invnorm (percentile or area below)
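For practice away from the calculator, the same three steps can be approximated with Python's `NormalDist`. This is a sketch under assumptions: the function names are hypothetical stand-ins for the calculator commands, and the N(500, 100) test-score model is invented for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def standardize(x, mu, sigma):
    """Value to z-score: z = (x - mu) / sigma."""
    return (x - mu) / sigma

def area_between(lower_z, upper_z):
    """z-scores to area, in the spirit of normcdf(lower z, upper z)."""
    return standard_normal.cdf(upper_z) - standard_normal.cdf(lower_z)

def z_for_area_below(area):
    """Area below to z-score, in the spirit of invnorm(area below)."""
    return standard_normal.inv_cdf(area)

# Hypothetical model: scores follow N(500, 100); what share is below 650?
z = standardize(650, 500, 100)
print(z)
print(area_between(-1e99, z))    # very low bound stands in for negative infinity
print(z_for_area_below(0.80))    # z-score at the 80th percentile
```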

35

scatterplots (what do they show, what are they good for)

  • relationship between two quantitative variables

  • detect patterns, trends, relationships, extraordinary values

36

roles for variables (what’s on y and x axis)

  • y axis= response variable, what we want to predict

  • x axis= explanatory variable, what is providing info and helping to predict response variable

37

correlation coefficient (r) (how to find, what it means, conditions)

  • on a TI-84, go to STAT→CALC→8

  • strength and direction of linear relationship (between -1 and 1)

  • need a nearly linear relationship, quantitative variables, and no strong outliers

  • has no units

  • swapping x and y, or shifting/rescaling either variable by a positive constant, does not change the correlation
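One common way to compute r by hand is to average the products of the z-scores, dividing by n−1. The sketch below assumes that formula; the function name `correlation` and the study-hours data are invented for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum(zx * zy) / (n - 1), using sample standard deviations."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]           # hypothetical explanatory variable
scores = [52, 60, 61, 70, 77]     # hypothetical response variable

r = correlation(hours, scores)
print(r)
print(correlation(scores, hours))                    # swapping x and y: same r
print(correlation([h * 60 for h in hours], scores))  # hours to minutes: same r
```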

38

correlation does not equal causation

  • lurking variables

  • correlation measures a LINEAR relationship; an association is ANY relationship

39

residual (how to find it, what is it)

  • y-ŷ (ŷ= predicted value)

  • residual is the difference between the observed value and the predicted value

  • points above the line have + residuals and below the line have - residuals

40

line of best fit (least squares line, regression line)

(what is it, what to do with it, how it's written)

  • line for which the sum of the squares of the residuals is the smallest

    • squaring the residuals makes them all positive

  • best fitting line will have small residuals

    • if you have small residuals, that means you predicted the results of your data well

  • ŷ=mx+b (m= slope, b= y-intercept)
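A small sketch of fitting the least squares line and computing residuals, assuming the standard formulas slope = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² and intercept = ȳ − slope·x̄. The function names and the data are made up for the example.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares line."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return slope, intercept

def residuals(xs, ys, slope, intercept):
    """Observed y minus predicted y for every point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]

m, b = least_squares(xs, ys)
res = residuals(xs, ys, m, b)
print(m, b)
print(res)   # positive for points above the line, negative for points below
```

For a least squares line the residuals always sum to zero (up to rounding), which is a handy check on the arithmetic.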

41

line of best fit interpretation

  • for every 1-unit increase in (x), we can expect (y) to increase/decrease by (slope) on average; the intercept is the predicted (y) when x=0

42

conditions for using regression

  • quantitative

  • straight enough

  • no outliers

43

what is r²

  • the fraction of the data’s variation accounted for by the model

  • found by squaring the correlation coefficient r; 1-r² is the fraction of the variation left in the residuals

    • ex) r²= 0.76² = 0.58

      1-r² =1-0.58 = 0.42= 42%

  • interpreted as:

    • 58% of the variation of y is accounted for by x

    • 42% of the variation of y is left in the residuals (not accounted for by the model)
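The two ways of looking at r² (the square of r, and the share of y's variation not left in the residuals) can be checked against each other. This is a sketch with invented data; `r_and_r_squared` is a hypothetical helper that fits the least squares line internally.

```python
from math import sqrt
from statistics import mean

def r_and_r_squared(xs, ys):
    """Return (r, R^2), where R^2 = 1 - (residual variation / total variation)."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)            # total variation in y
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((y - (slope * x + intercept)) ** 2      # variation left in residuals
              for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)
    return r, 1 - sse / syy

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]

r, r2 = r_and_r_squared(xs, ys)
print(r ** 2, r2)      # the two values agree
print(1 - r2)          # fraction of variation left in the residuals
```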

44

extrapolation

  • predicting y for x-values outside the range of the data; the farther the x-value is from the mean of x, the less trust should be put in the predicted value of y

45

when do values have leverage

  • x-values that are far from the mean of the rest of the x-values

  • by contrast, points that are extreme in y (far from the line) have large residuals

46

when are values influential

  • if omitting it from the analysis changes the model enough to make a meaningful difference

  • determined by:

    • residual

    • leverage
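One way to check for influence is to fit the line with and without the suspect point and compare the slopes. The sketch below does that with invented data in which the last point has a very large x-value (high leverage); `slope_of` is a hypothetical helper.

```python
from statistics import mean

def slope_of(xs, ys):
    """Slope of the least squares line through the points."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: the last point sits far from the other x-values.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 5, 4, 5, 2]

slope_all = slope_of(xs, ys)
slope_without = slope_of(xs[:-1], ys[:-1])

print(slope_all)       # slope with the high-leverage point included
print(slope_without)   # slope after omitting it
```

Here omitting the point flips the sign of the slope, so it would count as influential.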

47

simple random sample

  • guarantees that each person has an equal chance of being selected

  • ensures that a non-representative sample is unlikely to occur

48

stratified random sampling

  • divides the population into HOMOGENEOUS groups (strata), then randomly selects a proportionate amount from each group

  • estimates will be more precise, but watch out for Simpson's paradox
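To contrast a simple random sample with a stratified one, here is a minimal sketch using Python's `random` module. The strata (class years), the 10% sampling fraction, and the function names are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, rng):
    """Every individual has the same chance of being chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, fraction, rng):
    """Randomly sample the same fraction from each homogeneous stratum."""
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(0)   # seeded only so the example is repeatable

# Hypothetical population split into two strata.
strata = {
    "freshmen": [f"F{i}" for i in range(40)],
    "seniors": [f"S{i}" for i in range(20)],
}
everyone = strata["freshmen"] + strata["seniors"]

srs = simple_random_sample(everyone, 6, rng)
picked = stratified_sample(strata, 0.10, rng)
print(srs)      # the freshman/senior mix varies from sample to sample
print(picked)   # always 4 freshmen and 2 seniors
```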

49

cluster sampling

  • dividing population into smaller groups (clusters), then randomly selecting entire clusters

  • less expensive and less time consuming

  • some populations are naturally broken into clusters already

  • each cluster is HETEROGENEOUS (a mix that resembles the whole population)
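Cluster sampling can be sketched as picking whole groups at random and keeping everyone in them. The dorm names, room labels, and the function `cluster_sample` below are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(1)   # seeded only so the example is repeatable

# Hypothetical population that is already split into dorms.
dorms = {
    "North": ["N1", "N2", "N3"],
    "South": ["S1", "S2", "S3"],
    "East": ["E1", "E2", "E3"],
    "West": ["W1", "W2", "W3"],
}

selected = cluster_sample(dorms, 2, rng)
print(selected)   # two dorms, each with all of its residents
```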

50

multistage sampling

  • combination of several sampling methods

    • ex) for a college, select dorms as a cluster and then continue with other sampling methods such as a census, etc.

51

observational studies

  • researchers do not assign choices, passively observe participants

  • bad for cause-and-effect establishment

  • tough to handle lurking variables

52

retrospective studies

  • collect data on something that has already occurred

  • similar pros and cons as observational studies

53

prospective studies

  • study where we identify subjects in advance and collect data as events unfold

  • possible to isolate variables

  • can be expensive and time-consuming

54

4 principles of experimental design

  1. control

    • control what you can

  2. randomize

    • randomize the rest

  3. replicate

  4. block

    • group similar individuals together and randomize within each of these blocks

    • helps account for variability due to the difference between blocks
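Blocking plus randomization can be sketched as shuffling the units inside each block and then dealing out the treatments in turn. The block names, unit labels, treatments, and the function `randomized_block_assignment` are assumptions for this illustration.

```python
import random

def randomized_block_assignment(blocks, treatments, rng):
    """Shuffle the units within each block, then alternate treatments."""
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)
        rng.shuffle(shuffled)            # randomize within the block
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

rng = random.Random(2)   # seeded only so the example is repeatable

# Hypothetical experiment: block by age group, two treatments.
blocks = {
    "under_30": ["A1", "A2", "A3", "A4"],
    "over_30": ["B1", "B2", "B3", "B4"],
}
plan = randomized_block_assignment(blocks, ["drug", "placebo"], rng)
print(plan)
```

Because treatments are dealt out within each block, every block ends up with the same number of units on each treatment.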