301 Midterm!

flashcard set

Description and Tags

Modules 1-4

157 Terms

1
New cards

learning to use statistics is NOT about…

  • calculations

  • mechanical conclusions

  • certainty and exactitude

2
New cards

statistics should be concerned with…

creating arguments that convey interesting and credible points based on interpretation of appropriate evidence from empirical measurements or observations

3
New cards

analysis of data requires observing some _____ _______

comparative difference

4
New cards

example of comparative difference

are the scores for a sample of people on some outcome measure different for two groups of people assigned to different conditions of an experiment

5
New cards

inferential statistics provides guidance in doing what

differentiating among different types of explanations (chance vs systematic) for a given comparative difference

6
New cards

what is considered to be a claim

a chosen explanation

7
New cards

what are the competing explanations for comparative differences (CDs)

chance vs systematic explanations

8
New cards

3 explanations for difference

  1. systematic factor accounts for ALL variation

  2. random chance accounts for ALL variation

  3. combination of systematic and chance account for variation

9
New cards

in psychology, which explanation for difference is rarely/never used

the first explanation (systematic factor accounts for ALL variation)

almost never tenable and often not directly tested

10
New cards

what explanations do we test? what happens if that explanation is rejected?

we first test the chance explanation

if our test rejects it, then we accept the combination explanation

11
New cards

what is NHST meant to assess

whether our observed difference is substantially different from what one would expect if random chance factors were completely responsible (the null hypothesis)

12
New cards

NHST is the ____ _____ for differentiating between chance and systematic influence explanations

dominant procedure

13
New cards

(NHST) if the data are not dramatically discrepant with what would be expected from chance, what can we conclude?

the random chance explanation is a tenable explanation for our difference

14
New cards

(NHST) if the test indicates the data are highly inconsistent with a chance explanation, what can we conclude?

we discard the purely chance explanation and by process of elimination prefer the explanation of a combined systematic influence and chance factors

15
New cards

why don’t we use terms like “accept” and “reject” in formal statistical language

these terms are generally considered too strong and definitive because NHSTs are aids to judging between explanations, not absolute declarations of truth or falsity

16
New cards

alternatives to “accept” and “reject”

if we fail to reject - “null hypothesis remains viable” or “we have retained the null hypothesis”

if we reject - we have “discredited” the null hypothesis

17
New cards

limitations of NHST

  • backhanded way of testing a research hypothesis

  • concluding that there is some systematic difference is a very limited form of information

18
New cards

what is the logic behind Abelson’s MAGIC criteria

Abelson believes that the goal of statistical analysis should be to make compelling and persuasive claims; the MAGIC criteria help us establish whether a claim is compelling and persuasive

19
New cards

MAGIC

  • magnitude

  • articulation

  • generality

  • interestingness

  • credibility

20
New cards

M (MAGIC)

magnitude; how big is the effect? large effects are more compelling than small ones

21
New cards

A (MAGIC)

articulation; how specific is the claim? precise statements are more compelling than imprecise ones.

22
New cards

G (MAGIC)

generality; how generally does it apply? more general effects are more compelling than less general ones. claims that would interest a more general audience are more compelling

23
New cards

I (MAGIC)

interestingness; interesting effects are those that "have the potential, through empirical analysis, to change what people believe about an important issue". more interesting effects are more compelling than less interesting ones. In addition, more surprising effects are more compelling than ones that merely confirm what is already known

24
New cards

C (MAGIC)

credibility; credible claims are more compelling than incredible ones. the researcher must show that the claims made are credible. results that contradict previously established ones are less credible

25
New cards

chance is the ______ ______ we assume, UNLESS….

baseline explanation; the data require us to adopt a more complex explanation (chance + systematic)

26
New cards

NHST is used to tell us how…

rare the discrepancy between the sample and the true population is

27
New cards

sampling error

error in statistical analysis arising from the unrepresentativeness of the sample taken

28
New cards

if NHST indicates the observed difference is common for samples from populations with no difference….

chance explanation is viable

29
New cards

if NHST indicates the observed difference is very rare for samples from populations with no difference….

chance explanation is NOT viable

30
New cards

what influences NHST?

sample size

31
New cards

in NHST, differences in larger samples are…

smaller because they tend to more accurately represent the population

unlikely; NHST is going to assess a difference as unlikely even if the difference is only small because sampling error is low in this situation

32
New cards

in NHST, differences in smaller samples are…

larger because they tend to less accurately represent the population

unlikely; NHST is going to assess a difference as unlikely only if the difference is substantial, because small and even moderate differences often occur by chance when sampling error is high

33
New cards

small sample =

bigger sampling error

34
New cards

large sample =

smaller sampling error

35
New cards

what is an independent samples t-test

a statistical test that compares the means of two samples that are independent of one another (they are drawn from separate populations)

36
New cards

generic formula for t-tests

t = (sample statistic − hypothesized population value) / estimated standard error of the statistic
37
New cards

what do larger t values indicate

greater likelihood of discrepancy from the hypothesized (population) value

less likely that the 2 samples were drawn from 2 populations that DO NOT differ in mean scores

more difference between pop.

more weird/unlikely

38
New cards

what is standard error

precision of sample estimates/amount of error in the sample

39
New cards

larger SE (t-test) =

smaller t value

40
New cards

what do smaller t values indicate

less difference between sample and population

less weird and unlikely

41
New cards

standard deviation reflects the…

dispersion of scores around the sample mean

42
New cards

formula for independent samples t-test

where 𝑋1 and 𝑋2 are means from the two samples

where 𝜇1 and 𝜇2 are means from the two populations
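A minimal sketch of how this t value could be computed, assuming the pooled-variance form (in line with the homogeneity of variance assumption) and a null hypothesis that the two population means do not differ. The function name and the scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample1, sample2, mu_diff=0.0):
    """Pooled-variance t for two independent samples (illustrative sketch)."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: df-weighted average of the two sample variances.
    pooled_var = (
        (n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)
    ) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means.
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    # (difference in sample means - hypothesized difference) / standard error
    t = ((mean(sample1) - mean(sample2)) - mu_diff) / se_diff
    return t, n1 + n2 - 2


# Hypothetical scores for two conditions of an experiment.
t_value, df = independent_t([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
```

With these scores the mean difference is 2 and the standard error of the difference is 1, so t is 2 on 8 df.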

43
New cards

characteristics of the t-value distribution

  • helps us interpret a t value with a given df

  • centers on a mean of zero with the majority of values falling relatively close to zero and fewer and fewer the more one moves away from zero

  • when there are few df, the values tend to spread out from zero. They cluster closer to zero as df increases, and the distribution eventually approximates the normal distribution (very close above 120 df)

44
New cards

the error in our sample mean as an estimate of the population mean (standard error) decreases as…

dispersion decreases and sample size increases
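A small sketch of this relationship, assuming the usual formula for the standard error of a mean (standard deviation divided by the square root of the sample size). The numbers are arbitrary.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / sqrt(n)


se_baseline = standard_error(sd=10, n=25)
# Halving the dispersion halves the standard error.
se_less_spread = standard_error(sd=5, n=25)
# Quadrupling the sample size also halves the standard error.
se_bigger_sample = standard_error(sd=10, n=100)
```

Note that the sample size has to be quadrupled to get the same reduction that halving the standard deviation gives.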

45
New cards

what is the level where we decide the chance explanation is no longer tenable? (t-tests)

alpha

46
New cards

conventional standard for alpha (t-tests)

5% (p ≤ .05)

47
New cards

two-tailed tests

test with no directional prediction

no special status to direction of effect so we consider the 2.5% most extreme negative values and 2.5% most extreme positive values

48
New cards

type I error

concluding a mean difference exists in the populations (rejecting the null) when there is actually no difference (the null is true)

49
New cards

alpha is the chance of ____ ___ _____ we are willing to accept (t-tests)

type I error

50
New cards

what is a one-tailed test

when a researcher has a strong basis to make a directional prediction regarding the mean difference, differences in the “wrong” direction will be dismissed and treated as similar to null effects

one will only consider extreme t values in one direction (e.g., positive)

rather than splitting 5% into both tails of the distribution, all 5% is placed in one tail

51
New cards

one-tailed tests make the test more….

“liberal” in that less extreme values are required for significance

52
New cards

a significant one-tailed t value (p = .05) will correspond with….

a two-tailed t value with a p = .10 (one-tailed p x 2)
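A rough sketch of the one-tailed vs two-tailed logic, assuming a large df so that the standard normal distribution can stand in for the t distribution (for small df the cutoffs are larger). Variable and function names are illustrative.

```python
from statistics import NormalDist

# Standard normal, used here as a large-df stand-in for the t distribution.
z = NormalDist()
alpha = 0.05

# Two-tailed: alpha is split, 2.5% in each tail.
critical_two_tailed = z.inv_cdf(1 - alpha / 2)
# One-tailed: all 5% sits in the predicted tail, so the cutoff is less extreme.
critical_one_tailed = z.inv_cdf(1 - alpha)


def p_values(statistic):
    """Return (one-tailed p, two-tailed p) for a positive test statistic."""
    one_tailed = 1 - z.cdf(statistic)
    return one_tailed, 2 * one_tailed


# A value sitting exactly on the one-tailed cutoff.
one_p, two_p = p_values(critical_one_tailed)
```

The one-tailed cutoff is about 1.645 against about 1.96 for the two-tailed test, and a value right at the one-tailed cutoff has a two-tailed p of about .10.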

53
New cards

criticisms of one-tailed directional tests

  • how sure do we need to be to use it?

  • too liberal?

  • can we/should we ignore differences in the opposite direction?

54
New cards

what is a lopsided test

not common but used to compromise between one and two-tailed tests when a researcher has a directional prediction

differentially weight the tails of the distribution (i.e., more liberal threshold for the predicted direction and a more conservative threshold for the unexpected direction)

makes it easier to abandon the null if it is in the expected direction, but does allow for abandoning the null for an unexpected finding if it meets a very stringent standard

no widely accepted standard - researcher could specify any differential weighting so long as the researcher could defend the logic of the choice

55
New cards

type II error

concluding there is no mean difference between our populations (accepting or failing to reject the null) when there is actually a difference in means between the populations (the null is false)

56
New cards

conventional level for type II error

.20

57
New cards

what is power

probability that a statistical test will correctly reject a false null hypothesis

58
New cards

relationship between power and type II error (t-tests)

power is inversely related to Type II error (power = 1 − β, where β is the probability of a Type II error)

as the statistical power of a test increases, the likelihood of making a Type II error decreases (and vice versa)

59
New cards

determinants of power (t-tests)

  • alpha level; stricter the alpha, the lower the power (controlled by researcher)

  • sample size; larger the sample size, the greater the power (controlled by researcher)

  • magnitude of effect - effect size; larger the effect of the IV, the greater the power (somewhat under control of researcher)

60
New cards

how can we plan sample size based on power (t-tests)

can use power as a basis for determining an appropriate sample size

if we specify our alpha (e.g., .05), specify our desired power (e.g., .80), and make an assumption about the magnitude of the effect we expect, we can calculate the sample size required to achieve that power
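A sketch of that calculation for a two-tailed independent samples comparison, assuming the common normal-approximation formula n per group = 2 × ((z for alpha + z for power) / d)². This is an approximation only; the function name is hypothetical, and exact t-based power software gives slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-tailed two-group comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # cutoff for the two-tailed alpha
    z_power = z.inv_cdf(power)          # z value matching the desired power
    # Round up, since a fraction of a participant cannot be recruited.
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


n_medium = n_per_group(0.5)  # expected effect of d = 0.5
n_small = n_per_group(0.2)   # expected effect of d = 0.2
```

Under this approximation an expected d of 0.5 calls for about 63 people per group, while an expected d of 0.2 calls for several hundred per group.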

61
New cards

traditional view of problems with low power

  • difficult to interpret null findings

  • wasteful to conduct research with low power if you get a null

  • perhaps not a big deal if we get a significant effect?

62
New cards

contemporary view of problems with low power

  • problem is more complex: low power is not only an issue of false negatives; it can also contribute to false positives

  • power is low because we typically have small effect and/or small sample size

  • studies of this sort will tend to have lots of error in estimating populations

  • not so misleading if we run lots of studies and report them all

  • is a problem IF…

    • we do a single study and then only report it if it’s significant

    • do lots of studies and only report the significant ones

  • power is a major concern in the replication crisis

63
New cards

assumptions of independent samples t-tests

  • independence of observations

  • the outcome variable should be normally distributed in each group

  • homogeneity (equality) of variance in the outcome variable across the groups

64
New cards

what is a repeated measures t-test

testing a difference between two means for the same sample of people

65
New cards

contexts for using repeated measures t-tests

  • testing Time 1 and Time 2 differences on the same outcome measure

  • testing differences on the same outcome measure under different conditions (e.g., within-subjects experiment)

  • testing differences in means for two different outcome measures (requires equivalence of scaling)

66
New cards

formula for repeated measures t-test

𝜇𝐷 is the mean of the difference scores in the population

𝑆𝐷 is the standard error for the sample mean of difference scores
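A reconstruction of the formula consistent with these definitions, assuming D̄ is the sample mean of the difference scores, s_D their standard deviation, and n the number of paired observations (these three symbols are assumptions, not taken from the card):

```latex
t = \frac{\bar{D} - \mu_D}{S_{\bar{D}}}, \qquad S_{\bar{D}} = \frac{s_D}{\sqrt{n}}, \qquad df = n - 1
```

Under the null hypothesis 𝜇𝐷 is usually set to zero, so t reduces to the mean difference score divided by its standard error.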

67
New cards

for repeated measures t-tests, factors affecting the size of the t value are….

similar: magnitude of difference, standard deviation of difference scores, and sample size

concepts such as alpha (α), one-tailed vs. two-tailed tests, beta (β), and power all remain the same

68
New cards

factors affecting the size of the t value (t-tests)

  • magnitude of difference

    • t-value tends to increase as the magnitude of the difference between the groups or conditions increases.

    • if the difference between groups is very small, the t-value is likely to be small

  • standard deviations of difference scores

    • when the SD is small (less variability), it leads to a larger t-value because the difference between groups is relatively more pronounced compared to the variation within each group.

    • a larger SD (greater variability) results in a smaller t-value because the difference between groups is less clear when compared to the inherent variability within each group.

  • sample size

    • larger sample size provides more data, which can reduce the impact of random variation and make it easier to detect significant differences

    • smaller sample sizes can lead to larger variability in the t-value, making it harder to detect significant differences unless the effect is very large

69
New cards

assumptions of repeated measures t-tests

  • independence of observations

  • difference scores are normally distributed

70
New cards

advantages/disadvantages of independent samples vs repeated measures designs

  • RM have more power

  • RM are more economical

  • IND. have no carry over effects

  • IND. less vulnerable to demand characteristics

71
New cards

carry over effects

when the effects of one treatment or condition persist and influence the outcomes of subsequent treatments or conditions

72
New cards

demand characteristics

subtle cues or expectations within an experiment that may influence participants' behavior or responses

73
New cards

NHST does not speak to the….

size of the difference; and we need to know the magnitude of the difference to make compelling statistical claims based on the MAGIC criteria !!

74
New cards

concluding that the null hypothesis is very unlikely (based on the p value) is not the same as concluding that…

the difference is large!

75
New cards

the proposed alternative to NHST

Bayesian statistics

76
New cards

advocates for Bayesian statistics argue that the logic of NHST is…

fundamentally flawed

just because the null is unlikely for our data, it does not necessarily follow that the data are likely to be drawn from a population where our systematic difference is true (i.e., the alternative hypothesis is true)

77
New cards

what is the Bayes factor

what we calculate in Bayesian stats

ratio of the likelihood of the data under the alternative hypothesis to the likelihood of the data under the null hypothesis

proposed as a way to gauge the magnitude of a difference, although it directly indexes relative evidence for the two hypotheses

78
New cards

interpretation of the Bayes factor

  • value of 1 means equal likelihood of alternative relative to null

  • below 1 means null more likely

  • above 1 means alternative more likely than null (threshold of 3 for moderate evidence, 10 for strong evidence)
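The thresholds above can be sketched as a simple lookup. The wording of the labels, and the choice to treat values of exactly 3 and 10 as reaching the threshold, are assumptions made for illustration.

```python
def describe_bayes_factor(bf):
    """Map a Bayes factor (alternative relative to null) to a verbal label."""
    if bf <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf < 1:
        return "null more likely"
    if bf == 1:
        return "equal likelihood"
    if bf < 3:
        return "alternative more likely, below the moderate threshold"
    if bf < 10:
        return "moderate evidence for the alternative"
    return "strong evidence for the alternative"


labels = [describe_bayes_factor(bf) for bf in (0.4, 1, 2, 5, 12)]
```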

79
New cards

objection to Bayesian statistics

does increase in confidence in the alternative relative to the null really translate into magnitude of the effect and how do we interpret that?

80
New cards

raw effect size

measure of the absolute or unadjusted difference between groups or conditions, typically expressed in the original units of measurement of the data, making it interpretable in a practical context

e.g. mean differences or unstandardized regression coefficients (in regression)

81
New cards

when are raw effect sizes useful

when the DV is on a metric that is meaningful and readily interpretable in light of some other criteria

when you want to convey the size of the effect in the same units as the data and when you need to understand the practical significance of an effect

82
New cards

when are raw effect sizes problematic

when the outcome variable is not easily interpretable with respect to specifiable criteria

when one needs to compare effects with outcome variables that are on different metrics

NOTE: raw effect sizes are not used much in psychology!

83
New cards

what is a standardized effect size

designed to provide a unitless or standardized representation of the effect

common standardized effect size measures include Cohen's d (for comparing means), Pearson's r (for correlational relationships) and Hedges' g (a variant of Cohen's d that corrects for sample size bias)

84
New cards

what is Cohen’s d

one of the most widely used effect size indices, which expresses magnitude as a standardized difference between means

85
New cards

Cohen’s d for independent samples

the mean difference divided by the pooled standard deviation of the two samples
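A minimal sketch of this calculation, assuming the pooled standard deviation weights each sample variance by its degrees of freedom. The function name and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d_s(sample1, sample2):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled SD: square root of the df-weighted average of the two variances.
    pooled_sd = sqrt(
        ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2))
        / (n1 + n2 - 2)
    )
    return (mean(sample1) - mean(sample2)) / pooled_sd


# Made-up scores: means of 7 and 5, each sample with a variance of 2.5.
d = cohens_d_s([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
```

Here the mean difference of 2 is divided by a pooled standard deviation of about 1.58, giving a d of roughly 1.26. The sign of d depends only on which sample is entered first.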

86
New cards

what impacts Cohen’s d (Ds) for independent samples

  • increases as the mean difference increases and the standard deviations decrease

  • not influenced by sample size

87
New cards

interpretation/range of Cohen’s d (Ds) for independent samples

in absolute value, has a minimum of 0 (no difference) and no upper boundary; the sign only indicates the direction of the difference

can be interpreted as a proportion of the standard deviation

  • 0.5 indicates the difference between the means is half the size of the dependent variable’s standard deviation

  • 1.00 indicates the difference is as big as the standard deviation of the dependent variable

  • 2.00 indicates a mean difference twice the size of the standard deviation of the dependent variable

88
New cards

Cohen’s d (Ds) guidelines for independent samples

0.2 (small), 0.5 (medium), and 0.8 (large)

not based on a compelling theoretical or empirical foundation, chosen arbitrarily

89
New cards

Cohen’s d for independent samples

dS

90
New cards

Cohen’s d for repeated measures

dAV or dRM

91
New cards

interpretation of Cohen’s d for repeated measures

𝑑𝑎𝑣 and 𝑑𝑟𝑚 are interpreted in a manner similar to ds

factors affecting the size of 𝑑𝑎𝑣 and 𝑑𝑟𝑚 are similar to those affecting ds

when standard deviations in both sets of observations are equal, 𝑑𝑎𝑣 and 𝑑𝑟𝑚 are equal

𝑑𝑎𝑣 will tend to be more similar than 𝑑𝑟𝑚 to ds except when r is low and the difference between standard deviations is large

𝑑𝑟𝑚 is more conservative than 𝑑𝑎𝑣 but is considered overly conservative when r is large
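A sketch of the two indices, assuming the commonly used definitions (the cards do not spell them out): 𝑑𝑎𝑣 divides the mean difference by the average of the two standard deviations, and 𝑑𝑟𝑚 divides it by the standard deviation of the difference scores and rescales by √(2(1 − r)). Function names and input values are illustrative.

```python
from math import sqrt


def d_av(mean_diff, sd1, sd2):
    """Mean difference divided by the average of the two standard deviations."""
    return mean_diff / ((sd1 + sd2) / 2)


def d_rm(mean_diff, sd1, sd2, r):
    """Mean difference over the SD of difference scores, times sqrt(2(1 - r))."""
    sd_diff = sqrt(sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2)
    return (mean_diff / sd_diff) * sqrt(2 * (1 - r))


# Equal standard deviations: the two indices coincide.
equal_av = d_av(2.0, 4.0, 4.0)
equal_rm = d_rm(2.0, 4.0, 4.0, r=0.5)

# Unequal standard deviations: d_rm comes out smaller in this example.
unequal_av = d_av(2.0, 3.0, 5.0)
unequal_rm = d_rm(2.0, 3.0, 5.0, r=0.5)
```

With equal standard deviations both functions return 0.5; with standard deviations of 3 and 5, 𝑑𝑎𝑣 stays at 0.5 while 𝑑𝑟𝑚 drops to about 0.46.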

92
New cards

what is Hedges' g

a modification of Cohen's d (another effect size measure) designed to account for potential bias in the estimation of effect sizes due to small sample sizes

Cohen’s d is a positively biased estimate of the population effect size, particularly for small samples; Hedges' g corrects d for that bias
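A sketch of the correction, assuming the widely used approximate correction factor 1 − 3 / (4(n1 + n2) − 9) for two independent groups. The function name and sample sizes are made up.

```python
def hedges_g(d, n1, n2):
    """Apply the approximate small-sample correction to Cohen's d."""
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


# The same d is shrunk noticeably with small groups...
g_small = hedges_g(0.50, n1=10, n2=10)
# ...and hardly at all with large groups.
g_large = hedges_g(0.50, n1=200, n2=200)
```

The correction always pulls g slightly toward zero, and the adjustment fades as the samples grow.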

93
New cards

what is Pearson’s r coefficient

used to quantify the strength and direction of the linear relationship between two continuous variables (another effect size); it is one of the most common methods for assessing the degree to which two variables are related to each other

94
New cards

what is a point biserial correlation

the relationship between a dichotomous variable (e.g., membership in one of two groups) and a continuous variable (e.g., a dependent variable)

can be expressed through Pearson's r
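A brief sketch of the idea: coding group membership as 0 and 1 and running an ordinary Pearson correlation against the outcome gives the point biserial correlation. The helper function and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation computed from deviation scores."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Group membership coded 0/1 (dichotomous) and a continuous outcome.
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
score = [3, 4, 5, 6, 7, 5, 6, 7, 8, 9]
r_pb = pearson_r(group, score)
```

With these scores r comes out at about .58, positive because the group coded 1 has the higher mean.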

95
New cards

interpreting r

ranges from -1.00 to 1.00 with .00 indicating no association

96
New cards

standardized effect sizes and their relationship to importance

small effects are not necessarily unimportant and large effects are not necessarily important

97
New cards

why do large effect sizes not directly imply practical significance

  • metric can be hard to interpret without more concrete reference criteria

  • durability of an effect might also be relevant in addition to its size

  • cost/benefit analysis also can determine practicality

98
New cards

when are small effects impressive

  • when there are minimal manipulations of the independent variable

  • when the dependent variable is difficult to influence

99
New cards

conceptual consequences of an effect

  • existence of an effect differentiates between competing theories

  • existence of an effect challenges reigning theory

  • existence of an effect demonstrates a new or disputed phenomenon

100
New cards

what can be used to calculate confidence intervals for the sample value

standard error
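A minimal sketch of building a confidence interval from a standard error, assuming a normal approximation (roughly ±1.96 standard errors for 95%). With small samples a critical value from the t distribution would be used instead. The function name and numbers are illustrative.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Normal-approximation confidence interval around a sample estimate."""
    # Critical z value: about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical sample mean of 50 with a standard error of 2.
lower, upper = confidence_interval(estimate=50.0, standard_error=2.0)
```

For a sample mean of 50 and a standard error of 2, the 95% interval runs from about 46.08 to about 53.92.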