stats 3

Last updated 8:56 AM on 3/24/26

64 Terms

1
New cards

fit of the model

The degree to which a statistical model represents the data collected

2
New cards

outcome_i = (model) + error_i

the data we observe can be predicted from the model we choose to fit plus some amount of error

3
New cards

parameter p

are not measured and are (usually) constants believed to represent some fundamental truth about the relations between variables in the model

4
New cards

variables

are measured constructs that vary across entities in the sample

5
New cards

the mean

is a hypothetical value: it is a model created to summarize the data and there will be error in prediction

6
New cards

error/deviance

the score predicted by the model for that entity subtracted from the corresponding observed score (observed minus predicted)

7
New cards

standard deviation

tells us about how well the mean represents the sample data

8
New cards

sampling distribution

is the frequency distribution of sample means (or whatever parameter you’re trying to estimate) from the same population

9
New cards

standard error of the mean (SE)/ standard error

tells us how widely sample means spread around the population mean
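
A minimal sketch of how the mean, the errors around it, the standard deviation and the standard error relate to each other. The scores are hypothetical, and the standard deviation here is the sample version (dividing by n - 1).

```python
import math
import statistics

scores = [1, 3, 4, 3, 2]  # hypothetical sample, not from the deck

mean = statistics.mean(scores)        # the model: one value summarizing the data
errors = [x - mean for x in scores]   # observed score minus the score predicted by the model
sd = statistics.stdev(scores)         # sample standard deviation (divides by n - 1)
se = sd / math.sqrt(len(scores))      # standard error of the mean

print("mean:", mean)
print("errors:", errors)
print("standard deviation:", round(sd, 3))
print("standard error:", round(se, 3))
```

The errors sum to zero, which is why they are squared before being averaged into the variance.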

10
New cards

central limit theorem

as samples get large (usually defined as greater than 30), the sampling distribution has a normal distribution with a mean equal to the population mean

11
New cards

confidence intervals

boundaries within which we believe the population value will fall
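
One way to compute such boundaries for a mean, sketched under the assumption that the sample is large enough to use the normal distribution. With small samples the critical value would come from the t-distribution instead. The function name `mean_ci` is made up for this example.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(scores, level=0.95):
    """Normal-approximation confidence interval for the mean (hypothetical helper)."""
    n = len(scores)
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)   # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)      # critical value, about 1.96 for 95%
    return mean - z * se, mean + z * se

lower, upper = mean_ci([1, 3, 4, 3, 2])
print(round(lower, 2), round(upper, 2))
```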

12
New cards

t-distribution

is a family of probability distributions that change shape as the sample size gets bigger (when the sample is very big, it has the shape of a normal distribution)

13
New cards

5% threshold

only when there is a 5% chance or less (a probability of 0.05 or lower) of getting the result we have (or one more extreme) if no effect exists are we confident enough to accept that the effect is genuine

14
New cards

alpha (α)

the long-run error rate that you are prepared to accept

the probability of accepting an effect in our population as true, when no such effect exists

15
New cards

systematic variation

variation that can be explained by the model that we’ve fitted to the data (and, therefore, due to the hypothesis that we’re testing).

16
New cards

unsystematic variation

variation that cannot be explained by the model that we’ve fitted. In other words, it is error, or variation not attributable to the effect we’re investigating.

17
New cards

test statistic

The ratio of effect relative to error

18
New cards

significant test statistic

tells us that the model would be unlikely to fit this well if there were no effect in the population

19
New cards

type 1 error

occurs when we believe that there is a genuine effect in our population, when in fact there isn’t

20
New cards

type 2 error

occurs when we believe that there is no effect in the population when, in reality, there is

21
New cards

familywise or experiment-wise error rate

error rate across statistical tests conducted on the same data

22
New cards

Bonferroni correction

divide α by the number of comparisons, k, to control for familywise error rate
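
A short sketch of why the correction matters. The familywise formula below assumes the k tests are independent, which is an assumption added for the example; the function names are hypothetical.

```python
def familywise_error(alpha, k):
    """Probability of at least one Type I error across k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(alpha, k):
    """Per-test criterion after dividing alpha by the number of comparisons."""
    return alpha / k

k = 3
print("uncorrected familywise rate:", round(familywise_error(0.05, k), 4))
print("per-test alpha after correction:", round(bonferroni_alpha(0.05, k), 4))
print("familywise rate after correction:",
      round(familywise_error(bonferroni_alpha(0.05, k), k), 4))
```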

23
New cards

moderator variable

affects the relationship between two others

24
New cards

centering

refers to the process of transforming a variable into deviations around a fixed point
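
As an illustration, grand mean centering uses the mean of the variable as the fixed point. The helper `center` and its `point` argument are hypothetical names for this sketch.

```python
import statistics

def center(values, point=None):
    """Return deviations around a fixed point (the mean of the values by default)."""
    if point is None:
        point = statistics.mean(values)
    return [v - point for v in values]

print(center([2, 4, 6]))            # deviations around the mean
print(center([2, 4, 6], point=5))   # deviations around another fixed point
```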

25
New cards

mediation

refers to a situation when the relationship between a predictor variable and an outcome variable can be explained by their relationship to a third variable

26
New cards

the four conditions of mediation

1. the predictor variable must significantly predict the outcome variable in model 1

2. the predictor variable must significantly predict the mediator in model 2

3. the mediator must significantly predict the outcome variable in model 3

4. the predictor variable must predict the outcome variable less strongly in model 3 than in model 1.

27
New cards

three linear models of mediation

1. A linear model predicting the outcome from the predictor variable. The b-value for the predictor gives us the value of c

2. A linear model predicting the mediator from the predictor variable. The b value for the predictor gives us the value of a

3. A linear model predicting the outcome from both the predictor variable and the mediator. The b-value for the predictor gives us the value of c’ and the b-value for the mediator gives us the value of b

28
New cards

Sobel test

assesses the significance of the indirect effect
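
A sketch of the commonly used first-order version of the test, which is an assumption here because the card does not give a formula. It takes the a and b paths with their standard errors; the input values are hypothetical.

```python
import math
from statistics import NormalDist

def sobel_z(a, se_a, b, se_b):
    """z statistic for the indirect effect a*b using the first-order standard error."""
    se_ab = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    return (a * b) / se_ab

z = sobel_z(a=0.5, se_a=0.1, b=0.4, se_b=0.1)   # hypothetical path estimates
p = 2 * (1 - NormalDist().cdf(abs(z)))          # two-tailed p-value from the normal distribution
print(round(z, 3), round(p, 4))
```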

29
New cards

index of mediation

standardized indirect effect

30
New cards

dummy variable

is a way of representing groups of people using only zeros and ones
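
For example, a grouping variable with three groups can be coded as two dummy variables, with one group chosen as the baseline that scores zero on both. The function name and group labels below are hypothetical.

```python
def dummy_code(groups, baseline):
    """Return one 0/1 variable per non-baseline group."""
    levels = sorted(set(groups) - {baseline})
    return {level: [1 if g == level else 0 for g in groups] for level in levels}

groups = ["control", "drug_a", "drug_b", "control"]
for name, column in dummy_code(groups, baseline="control").items():
    print(name, column)
```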

31
New cards

direct effect

the effect of the predictor independent of the mediator

32
New cards

indirect effect

the effect of the predictor through the mediator

33
New cards

p-hacking

testing multiple hypotheses but only reporting the significant ones

34
New cards

HARKing

formulating or modifying hypotheses after data have already been analyzed to make the results seem predicted and theoretically sound

35
New cards

rules for residuals

  • any case has a standardized residual with an absolute value greater than 3.29 (3)

  • more than 1% of the sample have a standardized residual with an absolute value above 2.58 (2.5)

  • more than 5% of the sample have a standardized residual with an absolute value above 1.96 (2)

36
New cards

Cook's distance

can be thought of as a general measure of influence of a point on the values of the regression coefficients

greater than 1 may be cause for concern

37
New cards

point with high leverage

An observation with an outlying value on a predictor variable

can have a large effect on the estimate of regression coefficients

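look for leverage values greater than twice, 2 × ((k+1)/n), or three times, 3 × ((k+1)/n), the average leverage, where k is the number of predictors and n is the sample size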
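
The cut-offs above are quick to compute. In this sketch k is taken to be the number of predictors and n the number of cases; the function name is hypothetical.

```python
def leverage_cutoffs(k, n):
    """Average leverage (k + 1) / n and the two rule-of-thumb cut-offs."""
    average = (k + 1) / n
    return {"average": average, "twice": 2 * average, "three_times": 3 * average}

print(leverage_cutoffs(k=3, n=100))
```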

38
New cards

Mahalanobis distance

indicates the distance of cases from the means of the predictor variables

influential cases have values above 25 in large samples (500), above 15 in smaller samples (100), and above 11 in small samples (30)

39
New cards

detecting multicollinearity

1. correlations between predictors (!) higher than .80 or .90

2. VIF of a predictor >10

3. tolerance of a predictor <.10
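
Tolerance and VIF are two views of the same quantity: tolerance is 1 minus the R² from predicting one predictor from all the others, and VIF is its reciprocal. A sketch with hypothetical R² values:

```python
def tolerance(r_squared):
    """1 - R^2, where R^2 comes from regressing one predictor on the other predictors."""
    return 1 - r_squared

def vif(r_squared):
    """Variance inflation factor, the reciprocal of tolerance."""
    return 1 / tolerance(r_squared)

for r2 in (0.50, 0.80, 0.95):   # hypothetical R^2 values
    print(r2, round(tolerance(r2), 3), round(vif(r2), 2))
```

This is why the VIF > 10 and tolerance < .10 rules flag the same predictors.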

40
New cards

ways of bias entering

  • parameter estimates

  • standard errors and confidence intervals

  • test statistics and p-values

41
New cards

outlier

score very different from the rest of the data

42
New cards

assumption

is a condition that ensures that what you’re attempting to do works

43
New cards

main assumptions

  • additivity and linearity

  • normality of something or other

  • homoscedasticity/homogeneity of variance

  • independence

44
New cards

additivity and linearity

means that the relationship between the outcome variable and predictors is accurately described by the equation of the linear model

45
New cards

central limit theorem

regardless of the shape of the population, parameter estimates of that population will have a normal distribution provided the samples are ‘big enough’

should have at least 30

46
New cards

impact of homoscedasticity

  • parameters

  • null hypothesis significance testing

47
New cards

homoscedasticity/homogeneity of variance

In designs in which you test groups of cases, this assumption means that these groups come from populations with the same variance. In correlational designs, this assumption means that the variance of the outcome variable should be stable at all levels of the predictor variable.

48
New cards

Independence

the errors in your model are not related to each other

49
New cards

z-scores for outliers

in a normal distribution we’d expect about 5% to be greater than 1.96 (we often use 2 for convenience), 1% to have absolute values greater than 2.58, and none to be greater than about 3.29
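
A sketch of how these cut-offs can be applied: convert the scores to z-scores and report the share with absolute values beyond each cut-off. The function name and the data are hypothetical.

```python
import statistics

def outlier_summary(scores, cutoffs=(1.96, 2.58, 3.29)):
    """Proportion of scores whose absolute z-score exceeds each cut-off."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    abs_z = [abs((x - mean) / sd) for x in scores]
    return {c: sum(z > c for z in abs_z) / len(scores) for c in cutoffs}

scores = list(range(1, 21)) + [100]   # twenty ordinary scores plus one extreme score
print(outlier_summary(scores))
```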

50
New cards

skewness

positive values indicate a pile-up on the left of the distribution

negative values indicate a pile-up on the right

51
New cards

kurtosis

positive values indicate a heavy-tailed distribution

negative scores indicate a light-tailed distribution

the further the value is from zero, the less likely it is that the data are normally distributed
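
A sketch of moment-based skewness and excess kurtosis, both of which are zero for a normal distribution. Statistical packages usually apply small-sample adjustments, so their values may differ slightly from these. The function names are hypothetical.

```python
import statistics

def _central_moment(scores, power):
    mean = statistics.fmean(scores)
    return sum((x - mean) ** power for x in scores) / len(scores)

def skewness(scores):
    """Third standardized moment: positive means a pile-up on the left with a long right tail."""
    return _central_moment(scores, 3) / _central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """Fourth standardized moment minus 3: positive means heavy tails, negative means light tails."""
    return _central_moment(scores, 4) / _central_moment(scores, 2) ** 2 - 3

print(skewness([1, 1, 1, 2, 10]), excess_kurtosis([1, 2, 3, 4, 5]))
```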

52
New cards

Levene's test

tests the null hypothesis that the variances in different groups are equal

If Levene’s test is significant at p ≤ 0.05, people tend to conclude that the null hypothesis is incorrect and that the variances are significantly different; therefore, the assumption of homogeneity of variances has been violated

53
New cards

Kolmogorov–Smirnov test/ Shapiro–Wilk test

compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation.

If the test is non-significant (p > 0.05) it tells us that the distribution of the sample is not significantly different from a normal distribution

54
New cards

Q-Q plot

kurtosis is shown up by the dots sagging above or below the line,

skew is shown up by the dots snaking around the line in an ‘S’ shape.

55
New cards

TWAT

  • trim the data

  • winsorizing

  • apply a robust estimation method

  • transform the data

56
New cards

trimming the data

means deleting some scores from the extremes

should be done only if you have good reason to believe that this case is not from the population that you intended to sample

57
New cards

percentage based rule

would be, for example, deleting the 10% of highest and lowest scores

58
New cards

trimmed mean

calculate the mean in a sample that has been trimmed
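
A sketch of a trimmed mean using a percentage-based rule that removes the same proportion of scores from each end. The function name, the default of 10% and the scores are hypothetical.

```python
import statistics

def trimmed_mean(scores, proportion=0.10):
    """Mean after dropping the given proportion of scores from each extreme."""
    ordered = sorted(scores)
    cut = int(len(ordered) * proportion)       # number of scores dropped from each end
    kept = ordered[cut:len(ordered) - cut]
    return statistics.mean(kept)

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(statistics.mean(scores), trimmed_mean(scores))
```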

59
New cards

standard deviation based rule

involves calculating the mean and standard deviation of a set of scores, and then removing values that are a certain number of standard deviations greater than the mean

60
New cards

Winsorizing

involves replacing outliers with the next highest score that is not an outlier
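
A sketch of winsorizing, assuming the decision about what counts as an outlier has already been made and is passed in as lower and upper bounds. The bounds, function name and scores are hypothetical.

```python
def winsorize(scores, low, high):
    """Replace scores outside [low, high] with the nearest score that is inside it."""
    inside = [x for x in scores if low <= x <= high]
    lowest_ok, highest_ok = min(inside), max(inside)
    return [lowest_ok if x < low else highest_ok if x > high else x for x in scores]

print(winsorize([1, 2, 3, 4, 100], low=0, high=10))
```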

61
New cards

Robust methods

non-parametric tests that do not rely on the assumption of normality

62
New cards

bootstrap

the sample data are treated as a population from which smaller samples (called bootstrap samples) are taken (putting each score back before a new one is drawn from the sample). The parameter of interest (e.g., the mean) is calculated in each bootstrap sample
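
A sketch of the procedure for the mean. It assumes each bootstrap sample is the same size as the original sample, which is common practice, and uses a percentile interval as one possible way to summarize the bootstrap means. The function names, the seed and the scores are hypothetical.

```python
import random
import statistics

def bootstrap_means(scores, n_boot=2000, seed=0):
    """Means of n_boot samples drawn with replacement from the observed scores."""
    rng = random.Random(seed)                  # seeded so the sketch is reproducible
    n = len(scores)
    return [statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_boot)]

def percentile_ci(values, level=0.95):
    """Percentile interval taken from the sorted bootstrap estimates."""
    ordered = sorted(values)
    lower_index = int((1 - level) / 2 * len(ordered))
    upper_index = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lower_index], ordered[upper_index]

means = bootstrap_means([1, 3, 4, 3, 2])
print("bootstrap standard error:", statistics.stdev(means))
print("95% percentile interval:", percentile_ci(means))
```

The standard deviation of the bootstrap means serves as an estimate of the standard error.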

63
New cards

transforming data

you do something to every score to correct for distributional problems, outliers, lack of linearity or unequal variances

if you are looking at relationships between variables you can transform only the problematic variable, but if you are looking at differences between variables (e.g., changes in a variable over time) you must transform all the relevant variables.
