Stat 250 Final Exam


47 Terms

1

Case

smallest unit on which data are measured/recorded

2

Variable

characteristic that is measured or recorded and can vary

3

The _______ variable explains, predicts, or influences the _____ variable

Explanatory, response

4

___________ sampling is required for generalization; _____________ is required for an association to imply causation

random sampling; randomization of treatments

5

Categorical data: 2 Types

Names, labels, or categories; the possible values are called "levels"

Nominal: no ordering

Ordinal: natural ordering

6

Sampling bias is bias due to

the methods employed to obtain the sample

Bias = a systematic favoring of certain outcomes

i.e. the sample is NOT representative

7

Observational Study

uncontrolled; the researcher does not control either variable, just collects and records data; can have confounding variables; can infer association but not causation (association does not imply causation)

8

Confounding Variable

a variable that varies between cases and is related to both the explanatory and response variables; a problem because it offers an alternative explanation for an observed association

9

Experimental Study

controlled; the researcher controls the value/level of the explanatory variable and measures the response; if treatments are randomly assigned, confounding is not a problem; CAN infer causation when an association is found

10

If the units were selected at random, you _______ generalize to the population

CAN

11

If there was a random assignment of treatments, causality _______ be concluded

CAN

12

If there was NOT a random assignment of treatments, only _____, not _____, can be concluded

association; not causation

13

Charts to use when summarizing a categorical variable

Visualizations:

Bar chart, pie chart, table

Statistics:

counts, proportion/risk, odds
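A minimal Python sketch of these statistics, using a made-up list of yes/no responses (the data and variable names are illustrative assumptions, not from the course): counts per level, the proportion of "yes", and the odds of "yes", computed as p / (1 - p).

```python
from collections import Counter

# Hypothetical sample of one categorical variable with two levels
responses = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]

counts = Counter(responses)          # count for each level
n = len(responses)
p_yes = counts["yes"] / n            # proportion (risk) of "yes"
odds_yes = p_yes / (1 - p_yes)       # odds of "yes" = p / (1 - p)

print(counts)                # Counter({'yes': 5, 'no': 3})
print(p_yes)                 # 0.625
print(round(odds_yes, 3))    # 1.667
```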

14

Charts to use when summarizing two categorical variables

Visualizations:

side-by-side bar charts, stacked bar charts, two-way table

Statistics:

difference in proportions, conditional proportions, relative risk, odds ratio
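A sketch of the two-variable statistics in Python, using hypothetical counts for a two-way table (treatment vs. control, success vs. failure); the numbers are invented for illustration.

```python
# Hypothetical two-way table: rows = group, columns = outcome
table = {
    "treatment": {"success": 30, "failure": 70},
    "control":   {"success": 15, "failure": 85},
}

def proportion(row):
    """Conditional proportion of success within one group (row)."""
    return row["success"] / (row["success"] + row["failure"])

def odds(row):
    """Odds of success within one group: successes / failures."""
    return row["success"] / row["failure"]

p_t = proportion(table["treatment"])
p_c = proportion(table["control"])
diff = p_t - p_c                                          # difference in proportions
rr = p_t / p_c                                            # relative risk
or_ = odds(table["treatment"]) / odds(table["control"])   # odds ratio

print(round(diff, 3), round(rr, 3), round(or_, 3))        # 0.15 2.0 2.429
```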

15

Charts to use when summarizing a quantitative variable

Visualizations:

dot plot, histogram, box plot

Statistics:

mean, median, mode (center); standard deviation, variance, range, IQR (spread); percentiles/quartiles (position)

Five-number summary: min, Q1, median, Q3, max
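A Python sketch of the quantitative summaries on a hypothetical sample. Quartiles here follow one common textbook convention (the median of each half, excluding the overall median when n is odd); calculators and software may use other rules, so Q1 and Q3 can differ slightly.

```python
import statistics

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the 'median of each half' convention.
    The overall median is left out of both halves when n is odd."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                   # values below the median
    upper = data[half + 1:] if n % 2 else data[half:]     # values above the median
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

sample = [2, 4, 4, 5, 6, 7, 9, 10, 16]   # hypothetical quantitative sample

mn, q1, med, q3, mx = five_number_summary(sample)
print("5-number summary:", mn, q1, med, q3, mx)    # 2 4.0 6 9.5 16
print("mean:", statistics.mean(sample))            # 7
print("SD:", round(statistics.stdev(sample), 3))   # 4.213 (sample SD, n - 1 denominator)
print("IQR:", q3 - q1, "range:", mx - mn)          # 5.5 14
```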

16

When skewed LEFT, the mean is...

to the LEFT of the median

17

When skewed RIGHT, the mean is...

to the RIGHT of the median
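A quick Python check of the skew rule on invented data: a long tail pulls the mean toward it, while the median stays near the middle of the sorted values. This is a rule of thumb rather than a guarantee for every dataset.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 4, 20]      # hypothetical data with a long right tail
left_skewed = [-x for x in right_skewed]   # mirror image: long left tail

print(statistics.mean(right_skewed), statistics.median(right_skewed))  # 5 3   (mean > median)
print(statistics.mean(left_skewed), statistics.median(left_skewed))    # -5 -3 (mean < median)
```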

18

Z-scores are a measure of

distance from the mean in terms of "standard deviations"; puts values on standardized scale for comparison; meaningful for bell-shaped distributions

19

Z-score equation

z = (x - mean) / SD, i.e., the observed value minus the mean, divided by the standard deviation

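The z-score formula as a small Python function (the function name and the exam numbers are hypothetical): a positive z lies above the mean, a negative z below it.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam scores with mean 70 and SD 8
print(z_score(86, 70, 8))   # 2.0  -> 2 SDs above the mean
print(z_score(62, 70, 8))   # -1.0 -> 1 SD below the mean
```
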
20

Empirical Rule

For bell-shaped distributions, about 95% of the data lies within 2 SD of the mean: mean +/- 2(SD)
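A Python check, assuming an exactly normal distribution via the standard library's `statistics.NormalDist`: the share within 2 SD is about 95.45%, which the rule rounds to 95%. The exam mean and SD are made up.

```python
from statistics import NormalDist

z = NormalDist()                     # standard normal: mean 0, SD 1
within_2sd = z.cdf(2) - z.cdf(-2)    # P(-2 < Z < 2)
print(round(within_2sd, 4))          # 0.9545

mean, sd = 70, 8                     # hypothetical bell-shaped exam scores
print(mean - 2 * sd, mean + 2 * sd)  # 54 86 -> roughly the middle 95% of scores
```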

21

Summarizing a quantitative response, categorical explanatory

Visualizations:

side-by-side box plots, side-by-side dot plots, side-by-side histograms

Statistics:

difference in means

22

Summarizing two quantitative variables

Visualizations:

scatterplot

Statistics:

correlation

regression/slope

23

Interpreting a scatterplot/correlation

Direction: positive or negative

Form: linear or nonlinear

Strength: no relationship, weak, moderate, or strong

Outliers?

24

Point estimation

estimate the value of a parameter using a single value: the sample statistic

25

Interval estimation

account for uncertainty by creating an interval of plausible values in which we expect the parameter to lie

26

Hypothesis testing

determine whether the evidence supports a theory or hypothesis about a parameter

27

Statistic vs Parameter

we want to make inferences about a __________ using a __________

population parameter; sample statistic

28

Paired data vs Independent (2 groups)

Paired data = observations/cases in the two groups can be matched or paired together meaningfully; EX: do teenagers consume more sugar on average than their parents? (samples of teenager-parent pairs)

Independent samples = observations in the two groups are unrelated to one another and are not matched in any meaningful way

EX: do teenagers consume more sugar on average than adults? (two independent samples of teenagers and adults)

29

Sampling Distribution

the distribution of a sample statistic across many samples of the same size from the same population

30

Standard error

standard deviation of the sampling distribution

31

Bootstrapping

sample with replacement from the original sample, using the same sample size
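One bootstrap sample in Python, using `random.Random.choices`, which draws with replacement; the data and the helper name `bootstrap_sample` are hypothetical.

```python
import random

def bootstrap_sample(sample, rng=random):
    """Draw n values WITH replacement from the original sample (same size n)."""
    return rng.choices(sample, k=len(sample))

original = [12, 15, 9, 20, 14, 11]   # hypothetical original sample
rng = random.Random(1)               # seeded so the run is reproducible
resample = bootstrap_sample(original, rng)
print(resample)   # same length as original; some values may repeat, others may be missing
```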

32

Resampling method

Calculate the statistic for each bootstrap sample, create a dot plot of these bootstrap statistics to estimate the sampling distribution, and then use the same procedure as before: find the standard error as the standard deviation of the bootstrap statistics, then plug it into the CI formula.

33

Interval estimate: SE method

An interval provides a plausible range of values for the parameter. For a 95% CI:

Point estimate +/- margin of error, where margin of error = 2 x bootstrap SE
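A Python sketch of the SE method for a mean, assuming a made-up sample and 5,000 bootstrap samples (the number of repetitions and the seed are arbitrary choices): the bootstrap SE is the standard deviation of the bootstrap means, and the interval is point estimate +/- 2 x SE.

```python
import random
import statistics

def bootstrap_stats(sample, stat, reps=5000, seed=0):
    """Compute `stat` on each of `reps` bootstrap samples (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(sample)
    return [stat(rng.choices(sample, k=n)) for _ in range(reps)]

original = [12, 15, 9, 20, 14, 11, 17, 13, 10, 16]   # hypothetical sample
point_estimate = statistics.fmean(original)           # 13.7

boot_means = bootstrap_stats(original, statistics.fmean)
se = statistics.stdev(boot_means)    # bootstrap SE = SD of the bootstrap statistics
margin = 2 * se                      # margin of error for roughly 95% confidence
ci = (point_estimate - margin, point_estimate + margin)
print(point_estimate, round(se, 2), [round(v, 2) for v in ci])
```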

34

Point estimate

approximates the population parameter

35

Margin of Error

reflects the precision/uncertainty of the point estimate and determines the width of the interval (a larger sample size n gives a narrower CI)

36

When you increase sample size...

Precision?

Variation?

Standard error?

Increases precision, less uncertainty about the population parameter

Less variation between sample statistics/bootstrap statistics

Smaller standard error and narrower CI
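A small Python simulation of this card, assuming a hypothetical roughly normal population with mean 50 and SD 10: bootstrap SEs of the sample mean shrink as n grows (roughly in proportion to 1/sqrt(n)). The seeds and repetition count are arbitrary choices.

```python
import random
import statistics

def bootstrap_se(sample, reps=2000, seed=0):
    """Bootstrap SE of the mean: SD of the means of `reps` resamples."""
    rng = random.Random(seed)
    n = len(sample)
    means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps)]
    return statistics.stdev(means)

population_rng = random.Random(42)   # seeded draws from the hypothetical population
ses = {}
for n in (25, 100, 400):
    sample = [population_rng.gauss(50, 10) for _ in range(n)]
    ses[n] = bootstrap_se(sample)
    print(n, round(ses[n], 2))       # expect roughly 10 / sqrt(n): about 2, 1, 0.5
```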

37

Confidence interval: percentile method

for a p% confidence interval, keep the middle p% of bootstrap statistics
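A Python sketch of the percentile method on bootstrap means of a made-up sample: sort the bootstrap statistics and keep the middle 95% by cutting 2.5% from each tail. The helper `percentile_ci` is hypothetical, and software may interpolate percentiles slightly differently.

```python
import random
import statistics

def percentile_ci(boot_stats, level=0.95):
    """Keep the middle `level` share of the sorted bootstrap statistics."""
    s = sorted(boot_stats)
    n = len(s)
    tail = (1 - level) / 2               # e.g. 0.025 in each tail for 95%
    lo = s[round(tail * n)]              # first value kept
    hi = s[round((1 - tail) * n) - 1]    # last value kept
    return lo, hi

original = [12, 15, 9, 20, 14, 11, 17, 13, 10, 16]   # hypothetical sample
rng = random.Random(0)                                # seeded for reproducibility
boot_means = [statistics.fmean(rng.choices(original, k=len(original)))
              for _ in range(5000)]

lo, hi = percentile_ci(boot_means)
print(round(lo, 2), round(hi, 2))
```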

38

We are [x] confident that the [x] [x] is between [x] and [x]

we are [confidence level] confident that the [true population parameter] [in context] is between [lower limit] and [upper limit]

39

Writing hypotheses:

Ho is the NULL HYPOTHESIS

no effect or difference = always contains an equality sign (=)

40

Writing hypotheses:

Ha is the ALTERNATIVE HYPOTHESIS

the claim for which we seek significant evidence = contains an inequality sign (>, <, or ≠) based on the claim

41

Hypotheses are always about _______, not sample statistics

population parameters

42

P-value

the probability of obtaining a sample statistic as extreme as (or more extreme than) the observed sample statistic, assuming Ho is true

43

if the p-value < α (significance level)

unlikely sample to obtain if Ho were true

suggests that Ho is NOT true

REJECT Ho

statistically significant

evidence DOES support your claim (Ha)

44

if the p-value ≥ α (significance level)

reasonably likely sample to obtain if Ho is true

suggests Ho may be true

do NOT reject Ho

NOT statistically significant

evidence is NOT strong enough to support your claim (Ha)

45

Bootstrapping vs. Randomization: main difference

the main difference is that a randomization distribution is generated assuming Ho is true, while a bootstrap distribution is not (it is centered near the original sample statistic)
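A Python sketch of a randomization test for a difference in means, under stated assumptions: hypothetical data for two independent groups, a one-sided Ha (mean of A > mean of B), α = 0.05, and 10,000 shuffles. Reshuffling the group labels is what builds the distribution under Ho, unlike bootstrapping, which resamples with replacement from the original sample without assuming Ho.

```python
import random
import statistics

def randomization_p_value(group_a, group_b, reps=10_000, seed=0):
    """One-sided p-value for Ha: mean(A) > mean(B).
    Shuffling the group labels simulates Ho: the groups do not differ."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)   # random re-assignment of cases to groups
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if diff >= observed:  # as extreme as, or more extreme than, observed
            extreme += 1
    return observed, extreme / reps

a = [14, 18, 16, 20, 17, 19]   # hypothetical group A
b = [12, 13, 15, 11, 14, 13]   # hypothetical group B
alpha = 0.05

obs, p = randomization_p_value(a, b)
decision = "reject Ho" if p < alpha else "do not reject Ho"
print(round(obs, 3), p, decision)
```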

46

Type 1 error

false positive: rejecting Ho when Ho is actually true

47

Type 2 error

false negative: failing to reject Ho when Ho is actually false