Ch 13 Inferential Statistics


Psychology

97 Terms

1
Statistics
descriptive data that involves measuring 1(+) variables in a sample & computing descriptive summary data (eg means, correlation coefficients) for those variables
2
parameters
corresponding values in the pop
3
sampling error
the random variability in a stat from sample to sample. (term error refers to random variability, not anyone making a mistake)
4
any stat relationship in a sample can be interpreted in 2 ways
1) there’s a relationship in the pop, & relationship in the sample reflects this. 2) there’s no relationship in the pop, & relationship in sample reflects only sampling error
5
purpose of null hypo testing is to
help researchers decide btwn these 2 interpretations ^^ (relationship in the pop, & relationship in the sample reflects this OR no relationship in the pop, & relationship in sample reflects only sampling error)
6
Null hypothesis testing
(often called null hypothesis significance testing or NHST) is a formal approach to deciding btwn 2 interpretations of a stat relationship in a sample
7
Null hypothesis
one interpretation from null hypo testing. The idea that there’s no relationship in the pop & the relationship in the sample reflects only sampling error (symbolized H0, “H-zero”)
8
Alternative hypothesis
another interpretation from null hypo. This hypothesis proposes that there’s a relationship in the pop & that relationship in the sample reflects this relationship in the pop. (symbolized as H1)
9
Every statistical relationship in a sample can be interpreted in either of these 2 ways
1) it might have occurred by chance. 2) might reflect a relationship in the pop
10
Although there r many specific null hypothesis testing techns, all based on same general logic. Steps are
1) Assume for the moment that the null hypo is true. There’s no relationship btwn the variables in the pop. 2) Determine how likely the sample relationship would be if the null hypothesis were true. 3) If the sample relationship would be extremely unlikely, then reject the null hypothesis in favor of the alternative hypothesis. If it would not be extremely unlikely, then retain the null hypothesis.
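A minimal sketch of the three steps above in Python. It swaps in a permutation test (shuffling group labels) as a stand-in for the formal tests on later cards, since that makes "how likely would this be if the null hypo were true" directly countable. The scores, the function name permutation_p_value, & the shuffle count are all made up for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Two-sided p value for a difference in means via random relabeling."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Step 1: act as if H0 were true, so group labels are arbitrary.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Step 2: count shuffled results at least as extreme as the sample.
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Hypothetical scores, for illustration only.
treatment = [12, 15, 14, 16, 13, 17, 15, 14]
control = [10, 11, 12, 9, 13, 11, 10, 12]

alpha = 0.05
p = permutation_p_value(treatment, control)
# Step 3: reject H0 only if the sample result would be unlikely under H0.
decision = "reject H0" if p <= alpha else "retain H0"
print(f"p = {p:.4f} -> {decision}")
```

The p value here is an estimate that depends on the number of shuffles; the t-tests & ANOVA on later cards get it from a known distribution instead.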
11
Reject the null hypothesis
a decision made by researchers using null hypothesis testing which occurs when the sample relationship would be extremely unlikely
12
Retain the null hypothesis
a decision made by researchers in null hypothesis testing which occurs when the sample relationship would not be extremely unlikely
13
p value
crucial step in null hypo. The probability of obtaining sample result or more extreme result if null hypo were true. Not probability that any particular hypo is true or false. Instead, probability of obtaining the sample result if null hypo were true.
14
low p value means
sample/more extreme result would be unlikely if null hypo were true & leads to rejection of null hypo.
15
Not-low P value means
sample/more extreme result would be likely if null hypo were true & leads to retention of null hypo.
16
α (alpha)
the criterion that shows how low a p-value should be before the sample result is considered unlikely enough to reject the null hypothesis (usually set to 0.05).
17
if there’s a 5% (or less) chance of a result @least as extreme as the sample result if null hypothesis were true, then null hypo is
rejected & result is said to be stat sig
18
statistically significant
an effect that’s unlikely due to random chance & therefore likely represents a real effect in the pop
19
if there’s >5% chance of result as extreme as the sample result when the null hypo is true, then the null hypo is
retained. (This doesn’t necce mean researcher accepts null hypo as true, just that there isn’t enough evidence to reject it.)
20
P value is probability that null hypo is true & the sample result occurred by chance (T/F)
FALSE. P value is really the probability of a result @least as extreme as the sample result IF the null hypo WERE true.
21
null hypo test involves answering q
“If null hypo were true, what’s the probability of a sample result as extreme as this one?” In other words, “What is the p value?”
22
answer to question for null hypo depends on 2 considerations:
strength of relationship & size of sample. The stronger the sample relationship & the larger the sample, the less likely the result would be if the null hypo were true. That is, the lower the p value.
23
Columns of table represent
3 levels of relationship strength: weak, medium, and strong.
24
Rows represent
4 sample sizes that can be considered: small, medium, large, & extra large in context of psycho research.
25
Each cell represents
a combo of relationship strength & sample size. If a cell contains word Yes, then combo would be stat sig for both Cohen’s d & Pearson’s r. If contains word No, wouldn’t be stat sig for either.
26
A stat sig result is a strong one (T/F)
False, not necce. Even a very weak result can be stat sig if it’s based on a large enough sample. Word significant can cause peop to interpret stat sig difs as strong/important, but they can actually be quite weak. (This is why it’s important to distinguish btwn the statistical significance of a result & the practical significance of that result.)
27
practical significance
refers to importance/usefulness of result in some real-world context
28
in clinical practice, same concept as practical significance is often referred to as
“clinical significance”
29
t-Test
a test that involves looking @ the dif btwn 2 means. 3 types used for slightly dif research designs: one-sample t-test, the dependent samples t-test, & the independent-samples t-test
30
One-sample t-test
used to compare sample mean (M) w hypo pop mean (μ0) that provides an interesting standard of comparison
31
The null hypo is that the mean for the population (µ) is equal to the hypothetical population mean: μ = μ0 (T/F)
True
32
The alternative hypothesis is that the mean for the pop is equal to the hypo pop mean (T/F)
False. The alt hypo is that the pop mean differs from the hypo pop mean: μ ≠ μ0.
33
To decide btwn these 2 hypos, (null & alt), need to find
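probability of obtaining the sample mean (or one more extreme) if the null hypo were true. But finding this p value requires first computing a test statistic called t. The formula for t is: t = (M − μ0) / (SD / √N), where M is the sample mean, μ0 the hypo pop mean, SD the sample standard deviation, & N the sample size.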
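A small sketch of computing t for a one-sample test, assuming the standard formula t = (M − μ0) / (SD / √N) with SD computed using N − 1. The scores & the function name one_sample_t are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(scores, mu0):
    """Return (t, df) for a one-sample t-test of `scores` against mu0."""
    n = len(scores)
    standard_error = stdev(scores) / sqrt(n)  # SD / sqrt(N); SD uses N - 1
    t = (mean(scores) - mu0) / standard_error
    return t, n - 1


# Hypothetical data: 5 scores compared w a hypothetical pop mean of 5.
scores = [4, 6, 5, 7, 8]
t, df = one_sample_t(scores, mu0=5)
print(f"t({df}) = {t:.3f}")  # t(4) = 1.414
```

For df = 4 the two-tailed critical value at α = .05 is 2.776, so a t of 1.414 would mean retaining the null hypo.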
34
Test Statistic
a statistic (eg F, t, etc) that’s computed to compare against what’s expected under the null hypo, & thus helps find the p value. Useful bc we know how it’s distributed when null hypo is true.
35
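If p is = or < .05, we _____ null hypo & conclude the pop mean ____ the hypo mean of interest.
Reject null hypo, conclude the pop mean differs from the hypo mean of interest.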
36
If p is >.05, we _____ null hypo & conclude ____ the hypo mean of interest.
retain null hypo, conclude there’s not enough evidence to say the pop mean differs from
37
Critical Values
the absolute value that a test statistic (eg F, t, etc) must exceed to be considered stat sig
38
Two-tailed test
where we reject the null hypo if the test statistic for the sample is extreme in either direction (+/-). This test makes sense when we believe the sample mean might differ from the hypo pop mean but we don’t have good reason to expect dif to go in a particular direction.
39
One-tailed test
where we reject the null hypo only if the t score for the sample is extreme in 1 direction that we specify before collecting the data. This test makes sense when we have good reason to expect the sample mean will differ from the hypo pop mean in a particular direction.
40
Dependent-samples t-test
(sometimes called the paired-samples t-test) used to compare 2 means for the same sample tested @ 2 dif times or under 2 dif conditions. This comparison is appropriate for pretest-posttest designs or within-subject experiments. This test can also be one-tailed if the researcher has good reason to expect the dif goes in a particular direction
41
difference score
a method to reduce pairs of scores (eg pre- & post- test) to a single score by calculating the dif btwn them
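A sketch of how difference scores turn a dependent-samples t-test into a one-sample test of the mean dif against 0. The pretest/posttest scores & the function name dependent_samples_t are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def dependent_samples_t(first, second):
    """Return (difference scores, t, df) for paired scores."""
    if len(first) != len(second):
        raise ValueError("each participant needs a score in both conditions")
    # Reduce each pair of scores to a single difference score.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # One-sample t on the difference scores, w a hypothetical mean of 0.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return diffs, t, n - 1


# Hypothetical pretest-posttest scores for 5 participants.
pretest = [10, 12, 9, 14, 11]
posttest = [12, 15, 10, 15, 14]
diffs, t, df = dependent_samples_t(pretest, posttest)
print(diffs)  # [2, 3, 1, 1, 3]
print(f"t({df}) = {t:.3f}")  # t(4) = 4.472
```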
42
Independent-samples t-test
used to compare the means of 2 separate samples (M1 & M2). The 2 samples might have been tested under dif conditions in a btwn-subjects experiment, or they could be pre-existing groups in a cross-sectional design (eg. women & men, extraverts & introverts).
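A sketch of the t statistic for two separate samples. It assumes the pooled-variance version of the test (equal pop variances in both groups), which is one common form & not necessarily the one any particular textbook or package uses. Data & the function name independent_samples_t are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_samples_t(group1, group2):
    """Return (t, df) using a pooled estimate of the pop variance."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Weight each group's sample variance by its own degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / standard_error
    return t, df


# Hypothetical scores from 2 conditions of a btwn-subjects experiment.
condition_a = [5, 7, 6, 8, 9]
condition_b = [3, 4, 5, 4, 4]
t, df = independent_samples_t(condition_a, condition_b)
print(f"t({df}) = {t:.3f}")  # t(8) = 3.873
```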
43
T-tests are used to compare 2 means
(a sample mean w a pop mean, the means of two conditions or 2 groups).
44
When there r more than 2 groups or condition means to be compared, most common null hypo test is
analysis of variance (ANOVA)
45
Analysis of Variance (ANOVA)
a statistical test used when there r more than 2 groups or condition means to be compared.
46
One-way ANOVA
used for btwn-subjects designs w a single independent variable. Used to compare means of more than 2 samples (M1, M2…MG) in a btwn-subjects design. The null hypo is that all the means are equal in the population: µ1= µ2 =…= µG. The alternative hypo is that not all means in pop r =.
47
The test stat for the ANOVA is called
F
48
What is F
a ratio of 2 estimates of the pop variance based on the sample data.
49
2 estimates of pop variance:
1) Mean Squares btwn groups (MSB). 2) Mean Squares w/i groups (MSW)
50
Mean Squares btwn groups (MSB)
estimate of the pop variance & based on difs among the sample means
51
Mean Squares w/i groups (MSW)
estimate of the pop variance & based on difs among the scores w/i each group
52
The F statistic is the ratio of the MSB to the MSW & is expressed as follows, & useful bc
F = MSB / MSW. Useful bc we know how it’s distributed when null hypo is true, & this allows us to find p value.
53
The between-groups degrees of freedom is the
# of groups minus one: dfB = (G − 1).
54
The within-groups degrees of freedom is the
total sample size minus the # of groups: dfW = N − G.
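A sketch tying the ANOVA cards together: sums of squares, dfB = G − 1, dfW = N − G, then MSB, MSW, & F = MSB / MSW. The 3 groups of scores & the function name one_way_anova are made up; finding the p value from F is left to statistical software.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the pieces of a one-way ANOVA table for a list of groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each score sits from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1               # G - 1
    df_within = len(all_scores) - len(groups)  # N - G
    msb = ss_between / df_between
    msw = ss_within / df_within
    return {"dfB": df_between, "dfW": df_within,
            "MSB": msb, "MSW": msw, "F": msb / msw}


# Hypothetical scores for 3 separate groups of 3 participants.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({table['dfB']}, {table['dfW']}) = {table['F']:.2f}")  # F(2, 6) = 13.00
```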
55
The online tools in Ch12 & statistical software (Excel and SPSS) cannot compute F & find the p value (T/F)
False. Both can compute F & find the p value.
56
An “ANOVA table”
also includes “sum of squares” (SS) for btwn & within groups. Values r computed on way to finding MSB and MSW but aren’t typically reported by researcher.
57
When we reject the null hypo in a one-way ANOVA, we conclude that
the group means aren’t all the same in the pop… But this can indicate dif things. W 3 groups, it can indicate that all 3 means are sig dif from each other, or that 1 of means is sig dif from other 2 but other 2 aren’t sig dif from each other (eg mean for dieticians is sig dif from the means for psych & nutrition majors, but means for psych & nutrition majors aren’t sig dif from each other).
58
post hoc comparisons
an unplanned (not hypothesized) test of which pairs of group mean scores are dif from which others
59
One approach to post hoc comparisons
conduct a series of independent-samples t-tests comparing each group mean to each of other group means. Prob → If we conduct a t-test when null hypo is true, have 5% chance of mistakenly rejecting null hypo. If conduct several t-tests when null hypo is true, chance of mistakenly rejecting at least 1 null hypo increases w each test
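A rough sketch of how fast that chance grows. It assumes the tests are independent, so the chance of at least 1 mistaken rejection is 1 − (1 − α)^k for k tests. Pairwise t-tests on the same groups aren’t fully independent, so treat the numbers as an approximation. The function name familywise_error_rate is hypothetical.

```python
from math import comb


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least 1 Type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n_groups in (3, 4, 5):
    n_pairs = comb(n_groups, 2)  # number of pairwise comparisons
    rate = familywise_error_rate(n_pairs)
    print(f"{n_groups} groups -> {n_pairs} t-tests -> {rate:.3f}")
# 3 groups -> 3 t-tests -> 0.143
# 4 groups -> 6 t-tests -> 0.265
# 5 groups -> 10 t-tests -> 0.401
```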
60
one-way ANOVA is appropriate for what kind of design
between-subjects designs in which the means being compared come from separate groups of participants.
61
One-way ANOVA is not appropriate for what kind of design
within-subjects designs in which the means being compared come from the same participants tested under different conditions or at different times. This requires a slightly different approach, called repeated-measures ANOVA
62
Repeated-measures ANOVA
compares the means from the same participants tested under dif conditions or at dif times in which the dependent variable is measured multiple times for each participant
63
How are repeated-measures ANOVA & one-way ANOVA different?
The main difference is that measuring the dependent variable multiple times for each participant allows for a more refined measure of MSW.
64
When more than one independent variable is included in a factorial design, the appropriate approach is
the factorial ANOVA
65
Factorial ANOVA
a statistical method to detect differences in the means between conditions where there are 2(+) independent variables in a factorial design. It allows the detection of main effects & interaction effects.
66
Difference btwn factorial ANOVA and one-way/repeated-measures ANOVAs
main dif is that it produces an F ratio and p value for each main effect and for each interaction.
67
For relationships btwn quantitative variables, where Pearson’s r (the correlation coefficient) is used to describe strength of those relationships, the appropriate null hypo test is
a test of the correlation coefficient.
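One common way to run this test (an assumption here, since the cards don’t give a formula) is to convert r to a t score w N − 2 degrees of freedom: t = r√(N − 2) / √(1 − r²). The sketch also shows the earlier point that stronger relationships & larger samples push the test statistic up (& so the p value down). The function name t_for_pearson_r is hypothetical.

```python
from math import sqrt


def t_for_pearson_r(r, n):
    """Return (t, df) for testing H0: the pop correlation is 0."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 pairs of scores")
    df = n - 2
    return r * sqrt(df) / sqrt(1 - r ** 2), df


# Hypothetical combos of relationship strength & sample size.
for r in (0.1, 0.3, 0.5):
    for n in (27, 102):
        t, df = t_for_pearson_r(r, n)
        print(f"r = {r}, N = {n}: t({df}) = {t:.2f}")
```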
68
Is null hypo testing usually correct?
In null hypo testing, researcher tries to draw a reasonable conclusion abt the pop based on sample, but not guaranteed correct.
69
Rows represent 2 possible decisions researchers can make in null hypo testing
reject or retain the null hypo.
70
Columns represent 2 possible states of world
null hypo’s false or true.
71
4 cells of table represent 4 distinct outcomes of a null hypo test
Two of outcomes (reject null hypo when false & retain when true) are correct decisions. The other two (reject null hypo when true & retain it when it’s false) are errors.
72
Type I Error
a false + in which the researcher concludes that their results are statistically significant when in reality there is no real effect in the population & the results are due to chance. In other words, rejecting the null hypo when it’s true.
73
Type II Error
a missed opportunity in which the researcher concludes that their results are not statistically significant when in reality there’s a real effect in the pop & they just missed detecting it. In other words, retaining the null hypo when it’s false.
74
In principle, it’s possible to reduce the chance of a Type I error (T/F)
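True. by setting α to something less than .05 (e.g., .01). But making it harder to reject true null hypotheses also makes it harder to reject false ones and therefore increases the chance of a Type II error.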
75
In principle, it’s possible to reduce the chance of a Type II error (T/F)
True. by setting α to something greater than .05 (e.g., .10). But making it easier to reject false null hypotheses also makes it easier to reject true ones and therefore increases the chance of a Type I error.
76
Possibility of committing Type I & Type II errors has several important implications for interpreting the results of our own & others’ research:
We should be cautious abt interpreting the results of any indv study bc there’s a chance it reflects a Type I or II error. This possibility is why researchers consider it important to replicate their studies. Each time researchers replicate a study and find a similar result, they rightly become more confident that the result represents a real phenomenon and not just a Type I or Type II error.
77
File Drawer Problem
issue related to Type I errors. When researchers obtain non-sig results, they tend not to submit them for publication, or if they do submit them, journal editors/reviewers tend not to accept them. Researchers end up putting these non-sig results away in a file drawer (or nowadays, in a folder on their hard drive). As a consequence, the published literature fails to contain a full representation of the + & - findings abt a research q. Difficult to fix bc it results from the traditional way scientific research is conducted & published.
78
One effect of File Drawer Problem
published lit prob contains a higher proportion of Type I errors than we might expect on basis of stat considerations alone. Even when there’s a relationship btwn 2 variables in pop, published research lit is likely to overstate strength of that relationship.
79
One solution to File Drawer Problem
is registered reports, whereby journal editors/reviewers evaluate research submitted for publication w/o knowing the results. If the research q is judged interesting & the method sound, then a non-sig result should be just as important & worthy of publication as a sig one.
80
p-hacking
when researchers make various decisions in the research process to increase their chance of a statistically sig result (& type I error) by arbitrarily removing outliers, selectively choosing to report dependent variables, only presenting sig results, etc. until their results yield a desirable p value.
81
statistical power
in research design, it means the probability of rejecting the null hypo given the sample size & expected relationship strength.
82
2 steps to increase stat power (given that it depends prim on relationship & sample size)
increase strength of relationship & sample size
83
Increase the strength of the relationship
can sometimes be accomplished by using a stronger manipulation or by more carefully controlling extraneous variables to reduce the amount of noise in the data (e.g., by using a within-subjects design rather than a between-subjects design).
84
Increase the sample size
usual strategy. For any expected relationship strength, there will always be some sample large enough to achieve adequate power.
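A sketch of how power depends on relationship strength & sample size. To stay w/i the Python standard library it assumes a simplified two-tailed one-sample z-test w a known pop SD (not the t-tests on earlier cards), where the expected effect is Cohen’s d. The function name z_test_power is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size_d, n, alpha=0.05):
    """Probability of rejecting H0 in a two-tailed one-sample z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative, the z statistic is centered on d * sqrt(N).
    shift = effect_size_d * sqrt(n)
    upper_tail = std_normal.cdf(-z_crit + shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


for d in (0.2, 0.5, 0.8):
    for n in (20, 50, 100):
        print(f"d = {d}, N = {n}: power = {z_test_power(d, n):.2f}")
```

When d = 0 the null hypo is true, & the function returns α: the chance of a Type I error.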
85
Criticisms of null hypo testing focus on
researchers’ misunderstanding of it.
86
Criticisms of null hypo testing
researchers misunderstand it, its logic is flawed, & it’s uninformative.
87
Was null hypo testing ever banned?
In 2015, the editors of Basic and Applied Social Psychology banned use of null hypo testing & related statistical procedures. Can submit papers w p-values, but editors will remove them before publication. Editors didn’t provide better solution, but emphasized importance of descriptive stats & effect sizes.
88
What should be done abt probs w null hypo? Some suggestions in APA Publication Manual:
1) Each null hypo test should be accompanied by an effect size measure like Cohen’s d or Pearson’s r. This ensures an estimate of how strong relationship in pop is, not just if there is/isn’t one. 2) Use confidence intervals instead of null hypo tests.
89
Confidence intervals
a range of values that’s computed in such a way that some % of the time (usually 95%) the population parameter will lie within that range
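A simulation sketch of the "some % of the time" part of this definition. It draws many samples from a made-up normal pop w a known mean & SD, builds a 95% interval around each sample mean, & counts how often the interval contains the pop mean. Using a known pop SD (a z interval) is a simplifying assumption; the function name ci_coverage & all the numbers are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def ci_coverage(pop_mean=100.0, pop_sd=15.0, n=25,
                n_samples=2000, level=0.95, seed=1):
    """Fraction of simulated intervals that contain the pop mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * pop_sd / sqrt(n)  # half-width of each interval
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        sample_mean = sum(sample) / n
        if sample_mean - margin <= pop_mean <= sample_mean + margin:
            hits += 1
    return hits / n_samples


coverage = ci_coverage()
print(f"{coverage:.1%} of intervals contained the pop mean")
```

The result should land close to 95%, w some run-to-run wobble if the seed is changed.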
90
More radical solutions to probs of null hypo testing
involve using dif approaches to inferential statistics: Bayesian Statistics
91
Bayesian Statistics
an approach in which the researcher specifies the probability that the null hypo & any important alternative hypos are true before conducting the study, conducts the study, & then updates the probabilities based on the data
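A toy sketch of that updating step using Bayes’ rule w just 2 hypos. The prior probabilities & the likelihoods (how probable the observed data would be under each hypo) are invented numbers, & the function name bayes_update is hypothetical; real Bayesian analyses are more involved.

```python
def bayes_update(priors, likelihoods):
    """Return posterior probabilities for each hypothesis."""
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("prior probabilities must sum to 1")
    # Weight each prior by how well that hypothesis predicts the data.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Before the study: both hypos considered equally probable (assumption).
priors = {"H0": 0.5, "H1": 0.5}
# After the study: the data are far more probable under H1 (assumption).
likelihoods = {"H0": 0.05, "H1": 0.60}

posteriors = bayes_update(priors, likelihoods)
print({h: round(p, 3) for h, p in posteriors.items()})
# {'H0': 0.077, 'H1': 0.923}
```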
92
Replicability crisis
a phrase that refers to the inability of researchers to replicate earlier research findings
93
The low replicability of many studies is evidence of widespread use of questionable research practices by psycho researchers. These may include
(1) → Selective deletion of outliers in order to influence stat relationships among measured variables. (2) → Selective reporting of results, cherry-picking only those findings that support hypotheses. (3) → Mining the data w/o an a priori hypothesis, only to claim that a stat sig result had been originally predicted, a practice referred to as “HARKing” or hypothesizing after the results are known. (4) → A practice colloquially known as “p-hacking”, in which a researcher might perform inferential statistical calculations to see if a result was sig before deciding whether to recruit additional participants and collect more data. (5) → Outright fabrication of data, although this would be a case of fraud rather than a “research practice.”
94
HARKing
Hypothesizing After the Results are Known: A practice where researchers analyze data w/o a priori hypo, claiming afterward that a stat sig result had been orig predicted
95
this “crisis” has also highlighted the importance of enhancing scientific rigor by:
(1) → Designing & conducting studies that have sufficient statistical power, in order to increase the reliability of findings. (2) → Publishing both null & sig findings (thereby counteracting the publication bias & reducing file drawer problem). (3) → Describing one’s research designs in sufficient detail to enable other researchers to replicate your study using an identical or at least very similar procedure. (4) → Conducting high-quality replications and publishing these results.
96
One particularly promising response to the replicability crisis has been the emergence of
open science practices that increase the transparency & openness of scientific enterprise.
97
Open science practices
a practice in which researchers openly share their research materials w other researchers in hopes of increasing the transparency & openness of the scientific enterprise
