Data Analysis Exam Review


83 Terms

1

Hypothesis

2

Null hypothesis

X does not cause Y or is not associated with Y

3

Type I error

False positive (wrongly rejecting a null hypothesis that is actually true)

4

Type II error

False negative (failing to reject a null hypothesis that is actually false)

5

Statistical significance

Lets us judge whether the data we observe would be unlikely to arise from random chance alone

6

Descriptive data

Meant to reflect some truth about the world as it is

7

Mean

The average, x̄: the result of adding up all the numbers in a given set, then dividing by how many numbers there are (n)

8

Median

The middle: the number in the middle when all numbers in a given set are arranged from smallest to largest (with an even count, the average of the two middle numbers)

9

Mode

The most common (most frequently repeated) observation or observations
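The three measures above can be compared on one small data set. This is a minimal sketch using Python's standard `statistics` module; the numbers in `scores` are made up for illustration.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
scores = [2, 3, 3, 5, 7, 10]

mean_score = statistics.mean(scores)      # sum of the values divided by n
median_score = statistics.median(scores)  # middle value; average of the two middle values when n is even
mode_score = statistics.mode(scores)      # most frequently occurring value

print(mean_score, median_score, mode_score)
```

Note how the single large value (10) pulls the mean above the median.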

10

Normal distribution

Data that are distributed normally are symmetrical around their mean in a bell shape

11

Standard Deviation

Measure of how dispersed the data are from their mean (how spread out the observations are); assigns a single number to the dispersion around the mean

12

The _ the standard deviation, the closer the data points tend to be to the mean

Lower

13

The lower the standard deviation, the ____ the data points tend to be to the mean

Closer
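To see why a lower standard deviation means points sit closer to the mean, here is a small sketch comparing two made-up data sets with the same mean. It assumes the sample version of the formula (dividing by n - 1); the cards do not say which version the course uses.

```python
import statistics

def sample_sd(values):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by (n - 1)."""
    m = sum(values) / len(values)
    squared_devs = [(v - m) ** 2 for v in values]
    return (sum(squared_devs) / (len(values) - 1)) ** 0.5

# Two hypothetical data sets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

print(sample_sd(tight), sample_sd(wide))
# Cross-check against the standard library's sample standard deviation.
print(statistics.stdev(tight), statistics.stdev(wide))
```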

14

Standard error

Measure of the dispersion of the sample means that would be obtained from repeated samples of the underlying population

15

The standard error =

Standard deviation of the sample means (estimated from a single sample as s / √n)

16

Large standard error

Sample means are spread out widely around the population mean

17

Small standard error

Sample means are clustered relatively tightly around the population mean, so any one sample mean is more likely to be representative of the overall population
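As a rough illustration of large versus small standard errors, the sketch below estimates the standard error from a single sample as s / sqrt(n). The samples are hypothetical; the larger one simply repeats the smaller one to show the effect of sample size.

```python
import math
import statistics

def standard_error(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical samples: same pattern of values, different sample sizes.
small_sample = [4, 8, 6, 5, 7]
large_sample = small_sample * 20  # 100 observations

print(standard_error(small_sample), standard_error(large_sample))
```

The larger sample gives a smaller standard error even though the spread of the individual values is about the same.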

18

Ordinal Variables

Have a clear order to them; can be ranked

19

Ordinal Variable Eg.

Party ID ordered from left (1=Strong Democrat) to right (7=Strong Republican)

20

Categorical Variables

Observations correspond to categories/classes that do not necessarily have intrinsic ordering

21

Categorical Variable Eg.

Male vs female (0, 1)

22

Numerical Variables

Exact numbers that are evenly spaced apart

23

Numerical Variables Eg.

Exact income in dollar amounts, age, etc.

24

Syntax for formulas in Excel

=FUNCTIONNAME(range), with the range in parentheses, e.g. =AVERAGE(A1:A10)

25

Correlation

Measures the degree to which two phenomena are related to one another

26

Two variables are positively correlated if

A change in one is associated with a change in the other in the same direction

27

Two variables are negatively correlated if

A change in one is associated with a change in the other in the opposite direction

28

Correlation coefficient

Encapsulates the association between two variables in a single descriptive statistic ranging from -1 to 1

29

Correlation of 0 (or close to it)

Variables have no meaningful linear association with one another
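The correlation cards can be made concrete with a short calculation. The sketch below assumes the coefficient in question is Pearson's r (the cards do not name it) and uses made-up study-hours and score data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-movement of x and y around their means,
    scaled so the result lies between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / math.sqrt(spread_x * spread_y)

# Hypothetical data: hours studied and two possible sets of scores.
hours = [1, 2, 3, 4, 5]
score_up = [52, 58, 61, 70, 74]    # rises with hours: positive correlation
score_down = [74, 70, 61, 58, 52]  # falls with hours: negative correlation

print(pearson_r(hours, score_up), pearson_r(hours, score_down))
```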

30

Spurious correlation

When two variables appear to be directly related, but either a hidden third variable actually influences both or the relationship exists purely by coincidence, without any underlying causal mechanism

31

Expected value (or payoff)

Sum of the payoffs of all the different outcomes, each weighted by its probability; tells you whether a particular event is "fair," given its price and expected outcome

32

P-Value

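A number between 0 and 1 that expresses the probability of observing data at least as extreme as what we actually observed if the null hypothesis were true (it is not the probability that the null hypothesis is true)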

33

Probability

Study of events and outcomes involving an element of uncertainty; not about the world we see, but about all the possible worlds

34

"Black swan" events

When events that may occur only "very rarely" do eventually happen

35

Expected Value

Way to more precisely understand the value of a decision - a shorthand for understanding the stakes of a choice

36

Calculate Expected Value

Weight (multiply) the payoff of each possible outcome by its probability, then sum the results
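A worked example may help with the calculation. The raffle below is hypothetical: the ticket price, prize, and probabilities are made up, and `expected_value` is just an illustrative helper name.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs whose probabilities sum to 1."""
    total_prob = sum(p for p, _ in outcomes)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Multiply each payoff by its probability, then add everything up.
    return sum(p * payoff for p, payoff in outcomes)

# Hypothetical $1 raffle ticket: 1% chance of winning $50, 99% chance of nothing.
ticket_price = 1.00
raffle = [(0.01, 50.0), (0.99, 0.0)]

ev = expected_value(raffle)
is_fair_or_better = ev >= ticket_price
print(ev, is_fair_or_better)
```

Here the expected payoff is below the ticket price, so on average the buyer loses money.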

37

Law of Large Numbers

The more independent observations you draw randomly from a population, the closer the mean of those observations will get to the mean of the population
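A quick simulation can illustrate the idea. This sketch assumes a fair six-sided die as the "population" (mean 3.5) and uses a fixed random seed so the run is repeatable; exact numbers will differ with other seeds.

```python
import random

def mean_of_die_rolls(n_rolls, rng):
    """Average of n_rolls independent rolls of a fair six-sided die."""
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return sum(rolls) / n_rolls

rng = random.Random(0)   # fixed seed, so the run is repeatable
population_mean = 3.5    # mean of a fair six-sided die

for n in (10, 1_000, 100_000):
    sample_mean = mean_of_die_rolls(n, rng)
    # The gap between sample mean and population mean tends to shrink as n grows.
    print(n, sample_mean, abs(sample_mean - population_mean))
```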

38

Selection bias

When the mechanism for sorting individuals into one group or the other is not random, so participants are not representative of the population

39

Publication bias

Positive findings are more likely to be published than negative findings, which can skew the results that we see

40

Recall bias

When participants do not remember previous events or experiences accurately; recalling the past based on the present or things that have since happened

41

Survivorship bias

When researchers focus on individuals, groups, or cases that have passed some sort of selection process while ignoring those who did not

42

Healthy user bias

People who faithfully engage in activities that are good for them are fundamentally different from those who don't, so differences in their outcomes may not be caused by the activity itself

43

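Central Limit Theorem

"A large, properly drawn sample will resemble the population from which it is drawn" - If we know about a population, we can make inferences about a sample, and vice versa. More precisely, the means of large, properly drawn samples are approximately normally distributed around the population mean, whatever the shape of the population itself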

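The Central Limit Theorem can be checked with a small simulation. This sketch assumes a deliberately skewed population (an exponential distribution with mean 1), a sample size of 50, and a fixed seed; all of these are illustrative choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed, so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a skewed (exponential) population with mean 1."""
    return statistics.mean([rng.expovariate(1.0) for _ in range(n)])

# Take many samples of size 50 and record each sample's mean.
sample_means = [sample_mean(50) for _ in range(2_000)]

center = statistics.mean(sample_means)   # should sit near the population mean
spread = statistics.stdev(sample_means)  # this is the standard error
# Share of sample means within two standard errors of the center.
within_two = sum(abs(m - center) <= 2 * spread for m in sample_means) / len(sample_means)

print(center, spread, within_two)
```

Even though the population is far from bell-shaped, roughly 95% of the sample means fall within two standard errors of the center, as a normal distribution would predict.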
44

Causal inference

Study of what we can learn about whether X causes Y, and if so, what effects X has on Y (causal hypothesis)

45

Experiments

Most common and strongest way to learn about the effect of X on Y

46

Causal Effect

Compares the outcomes of different treatment values to quantify the impact of a treatment

47

Causal Effect Formula

Y_i(1) - Y_i(0), i.e., subject i's outcome if treated minus the same subject's outcome if untreated

48

Confounders

Things we do not directly observe that may be affecting the relationship we are interested in

49

Some experimental basics

Treatment, Subjects or participants, Potential outcomes, Average treatment effect/average causal effect, Randomization

50

Average treatment effect (ATE)

Average of the causal effects (treated potential outcome - untreated potential outcome) for all subjects
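The causal effect and ATE definitions can be worked through on a toy table of potential outcomes. The four subjects and their values are hypothetical; in a real study only one of the two potential outcomes is ever observed for each subject.

```python
# Hypothetical potential outcomes for four subjects: (Y_i(1), Y_i(0)).
potential_outcomes = {
    "A": (15, 10),
    "B": (15, 15),
    "C": (30, 20),
    "D": (15, 20),
}

# Individual causal effect: treated outcome minus untreated outcome.
individual_effects = {
    subject: treated - untreated
    for subject, (treated, untreated) in potential_outcomes.items()
}

# Average treatment effect: the mean of the individual effects.
ate = sum(individual_effects.values()) / len(individual_effects)

print(individual_effects, ate)
```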

51

Three core assumptions of experimental analysis

Randomization, Noninterference, Symmetry

52

Randomization

Allows us to assume that treatment assignment is unrelated to potential outcomes and to pre-treatment covariates or characteristics

53

Noninterference

Each subject's potential outcomes reflect only whether that subject is treated or not -- a subject's potential outcomes are not affected by the treatments that other subjects receive

54

Symmetry

Identically structured treatments

55

Compliance

Whether or not participants actually completed the treatment they were assigned to

56

Intent-to-Treat (ITT)

All participants who were enrolled and randomly allocated to treatment are included in the analysis and are analyzed in the groups to which they were randomized
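To show what "analyzed in the groups to which they were randomized" means in practice, here is a minimal sketch with made-up trial records. The field names (`assigned`, `complied`, `outcome`) are hypothetical.

```python
# Hypothetical trial records: assigned group, whether the subject complied, outcome.
subjects = [
    {"assigned": "treatment", "complied": True, "outcome": 8},
    {"assigned": "treatment", "complied": False, "outcome": 5},
    {"assigned": "treatment", "complied": True, "outcome": 9},
    {"assigned": "control", "complied": True, "outcome": 5},
    {"assigned": "control", "complied": True, "outcome": 6},
    {"assigned": "control", "complied": True, "outcome": 4},
]

def group_mean(records, group):
    """Mean outcome of everyone randomized to `group`, compliers or not."""
    outcomes = [r["outcome"] for r in records if r["assigned"] == group]
    return sum(outcomes) / len(outcomes)

# ITT: compare groups as randomized, keeping the non-complier in the treatment group.
itt_effect = group_mean(subjects, "treatment") - group_mean(subjects, "control")
print(itt_effect)
```

Dropping the non-complier would make the treatment look more effective, but it would also break the comparability that randomization created.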

57

Attention

Were people paying attention to your study? Or did they just give you a random number to make you happy?

58

Pre-registration

Declaring what you expect to find before you run the study

59

Demand effects

People infer what you're trying to do and respond accordingly

60

T-test

Determines whether the mean difference between two groups (experiment and control/placebo) is statistically significant

61

If the t-statistic is higher than 1.96 OR lower than -1.96

The associated (two-tailed) p-value is .05 or less, assuming a large sample

62

Statistical significance p-value

0.05 or less
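The link between a t-statistic of 1.96 and a p-value of .05 can be checked numerically. This sketch assumes a large sample, so it uses the standard normal distribution as an approximation; small samples would need the t distribution instead.

```python
import math

def two_sided_p_value(t_stat):
    """Two-sided p-value using the standard normal approximation
    (reasonable for large samples only)."""
    # Standard normal CDF at |t|, written with the error function.
    cdf = 0.5 * (1.0 + math.erf(abs(t_stat) / math.sqrt(2.0)))
    return 2.0 * (1.0 - cdf)

for t in (0.5, 1.96, -2.5):
    p = two_sided_p_value(t)
    # A result is conventionally called significant when p is .05 or less.
    print(t, round(p, 4), p <= 0.05)
```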

63

T-test formula for experimental data

[Image of the t-test formula; not recoverable as text]

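The formula on this card was an image, so its exact form is unknown. As an assumption, one common version for comparing two experimental groups divides the difference in group means by the standard error of that difference (the unequal-variance form). The data below are made up.

```python
import math
import statistics

def t_statistic(treatment, control):
    """Difference in group means divided by the standard error of that
    difference (unequal-variance form; an assumed formula, see above)."""
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(treatment) / len(treatment)
        + statistics.variance(control) / len(control)
    )
    return diff / se

# Hypothetical outcome scores for the two groups.
treatment = [7, 9, 8, 10, 9, 8]
control = [6, 7, 5, 8, 6, 7]

print(t_statistic(treatment, control))
```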
64

Regression

Tool used to understand relationships between independent and dependent variables

65

Regression Equation

Y = a + bX

66

Y in Regression Equation

The outcome variable we are interested in (dependent variable)

67

X in Regression Equation

Independent variable

68

a in Regression Equation

The y-intercept (the value of Y when X is zero)

69

b in Regression Equation

The coefficient (slope) associated with the independent variable X; with several independent variables, each X has its own coefficient
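To make the roles of a and b in Y = a + bX concrete, here is a minimal least-squares sketch for the bivariate case. The education and wage numbers are hypothetical, and `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how much Y changes, on average, per one-unit change in X.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the predicted value of Y when X is zero.
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: years of education (X) and hourly wage (Y).
education = [10, 12, 14, 16, 18]
wage = [14, 18, 21, 27, 30]

a, b = fit_line(education, wage)
predicted_wage_at_15 = a + b * 15
print(a, b, predicted_wage_at_15)
```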

70

Bivariate Regression

Just one dependent and one independent variable

71

Multivariate Regression

One dependent variable and many independent variables

72

Regression Coefficient

Measures the average change in the Y variable for a one-unit change in the X variable, holding everything else constant

73

R Squared (R^2)

Measures the share of the variation in Y that is explained by your regression

74

0 in R^2

None of the variation in Y is explained by your regression

75

1 in R^2

All of the variation in Y is explained by your regression
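The two extremes of R^2 can be reproduced directly. This sketch assumes the usual definition, 1 minus the ratio of unexplained variation to total variation in Y, and uses made-up observations.

```python
def r_squared(ys, predictions):
    """1 - (unexplained variation / total variation in Y)."""
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    return 1 - unexplained / total

# Hypothetical observed values of Y.
observed = [14, 18, 21, 27, 30]

perfect_fit = list(observed)                                   # predictions match every point
mean_only = [sum(observed) / len(observed)] * len(observed)    # always predict the mean of Y

print(r_squared(observed, perfect_fit), r_squared(observed, mean_only))
```

A regression that predicts every point exactly scores 1; one that does no better than always guessing the mean of Y scores 0.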

76

Confidence interval

Gives an estimated range of values that is likely to include the "actual" number you are looking for
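As an illustration, a 95% confidence interval for a mean is often approximated as the sample mean plus or minus 1.96 standard errors. That multiplier is an assumption that suits large samples; small samples call for a somewhat wider interval. The scores below are hypothetical.

```python
import math
import statistics

def confidence_interval_95(sample):
    """Approximate 95% interval for the mean: mean +/- 1.96 standard errors."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - 1.96 * se, m + 1.96 * se

# Hypothetical sample of 10 test scores.
sample = [72, 75, 78, 80, 81, 83, 85, 86, 90, 70]

low, high = confidence_interval_95(sample)
print(low, high)
```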

77

Types of variables

Categorical, Ordinal, Nominal, Dummy/binary, Continuous

78

Ordinal

Variables that can be ranked

79

Nominal

Variables that cannot be ranked

80

Dummy/binary

Variables that only have two types of observations (almost always 1's and 0's)

81

Continuous

Observations are numbers that have intrinsic ordering, with potentially infinite range (e.g., height, weight)

82

Standardization

Achieved by converting everything into standard deviation units (subtract the mean, then divide by the standard deviation)
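Standardization can be sketched in a few lines. The example assumes the sample standard deviation is used and that the heights are made-up values; the results are often called z-scores.

```python
import statistics

def standardize(values):
    """Convert each value to standard deviation units (a z-score)."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical heights in centimeters.
heights_cm = [150, 160, 170, 180, 190]

z_scores = standardize(heights_cm)
print(z_scores)
```

After standardizing, the values have mean 0 and standard deviation 1, so variables measured in different units can be compared on the same scale.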

83

Running a regression in R

object_name <- lm(dependent_variable ~ independent_variable, data = dataset_name)