Intermediate Statistics Mini Quiz 3

70 Terms

1
New cards

What is a theoretical sampling distribution? Empirical sampling distribution?

Theoretical - distribution of results we would get if we calculated a statistic for an infinite number of samples, derived from logic or mathematical formulas

Empirical - frequency distribution of a statistic actually calculated across many observed (or simulated) samples

2
New cards

What is meant when we say that a sampling distribution has been estimated by simulation?

1. theoretical results are asymptotic: they tell us the shape of the distribution that we will get as sample size tends to infinity, so for finite samples we can instead draw many random samples by computer and see how the statistic is actually distributed

2. sampling distributions are central to statistical inference: we can calculate how often we will get a figure in the tail of the distribution through random variation across samples
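
A minimal sketch of estimating a sampling distribution by simulation, assuming a normal population with a made-up mean of 50 and SD of 10 (the function name and all numbers are illustrative, not from the course):

```python
import random
import statistics

def simulate_sample_means(population_mean, population_sd, n, n_samples, seed=0):
    """Draw n_samples random samples of size n and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(population_mean, population_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means(population_mean=50, population_sd=10, n=25, n_samples=5000)
# The SD of the simulated means estimates the standard error (theory: 10 / sqrt(25) = 2).
print(round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))
```

The list of simulated means is the empirical sampling distribution; counting how many fall beyond some cutoff gives the tail frequency described in point 2.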

3
New cards

What is standard error?

the standard deviation of a sampling distribution

if the sampling distribution is normal, about 68% of sample results will lie within 1 standard error of the mean, and 95% within 1.96 (about 2) standard errors
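
These shares can be checked with Python's standard library; a small sketch, where `share_within` is a helper name made up for this card:

```python
from statistics import NormalDist

def share_within(z):
    """Proportion of a normal distribution within z standard errors of the mean."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(z) - std_normal.cdf(-z)

print(round(share_within(1.0), 3))   # roughly 0.68
print(round(share_within(1.96), 3))  # roughly 0.95
```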

4
New cards

what does the central limit theorem tell us?

theorem that the sampling distribution of the mean approaches the normal distribution as sample size increases, even when the population itself is not normally distributed

5
New cards

Appropriate sampling distribution for the mean from small sample

t distribution

6
New cards

appropriate sampling distribution for the mean from large sample

normal

7
New cards

appropriate sampling distribution for the test of independence for crosstabulation

chi square

8
New cards

what are 3 ways the t distribution resembles the normal

1. symmetrical

2. unimodal

3. as sample size rises, they become more normal, their central peaks rise and their tails lighten

9
New cards

2 ways t-distribution differs from the normal

1. low central peak

2. heavy tails

10
New cards

how does the chi square distribution with one DF relate to, and differ from, the normal distribution? why?

a chi square variable with one DF is a squared standard normal variable, so it can only take non-negative values: every negative value in the normal distribution becomes positive once it is squared. the distribution is therefore strongly skewed to the right rather than symmetrical
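
A quick simulated check, assuming 20,000 made-up draws from a standard normal distribution (the seed and the number of draws are arbitrary):

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for a repeatable run
z_scores = [rng.gauss(0, 1) for _ in range(20000)]
squared = [z * z for z in z_scores]  # each value is one draw from chi square with 1 DF

print(min(squared) >= 0)                    # squaring removes every negative value
print(round(statistics.fmean(squared), 2))  # close to 1, the DF
print(statistics.median(squared) < statistics.fmean(squared))  # median below mean: right skew
```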

11
New cards

why does the standard error for a mean become greater as sample size becomes smaller?

the standard error of the mean is the standard deviation divided by the square root of the sample size (SD/√N), so for a given standard deviation a smaller N gives a larger standard error. means from small samples vary more from sample to sample; as sample size increases, sample means cluster more tightly around the population mean and the standard error decreases
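
A short sketch of the SD/√N formula, using a made-up SD of 10:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)

for n in (25, 100, 400):
    print(n, standard_error(10, n))  # SE shrinks as N grows: 2.0, 1.0, 0.5
```

Quadrupling the sample size only halves the standard error, because N enters through its square root.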

12
New cards

in what sense are there many t distributions? many chi squared distributions?

t - there is a whole family of t distributions, one for each DF, but the convention is to refer to them as a single distribution

chi - another family of distributions, one for each DF; the DF depend on the analysis, e.g. on the number of rows and columns in a crosstabulation

13
New cards

for chi square, how does the sampling distribution change with DF? why does this matter?

as DF increases, the shape, central tendency, and dispersion of the distribution shift

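the mean equals the DF, so it increases by 1 as DF increases by 1, and the distribution becomes more symmetric. this matters because the critical value for a significance test depends on DF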

14
New cards

for t distributions, why must we be concerned about DF?

the degrees of freedom define the shape of a t distribution: with fewer DF the tails are heavier, so the critical values are larger

15
New cards

suppose someone calculates the standard error of the mean for some variable, and obtains 1.0. Assuming simple random sampling, within what range of the true mean would 95% of the sample means lie?

95% will lie within 1.96 (approx 2) standard errors of the true mean; with a standard error of 1.0, that is within about 1.96 units either side of it

16
New cards

what is a suggested rule of thumb for when the sampling distribution of a proportion can be treated as normal? what did graphs suggest about the accuracy of this rule of thumb?

Np and N(1-p) should both exceed 10, graphs show they become more symmetrical when these values are greater than 10
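
The rule of thumb as a small check; the function name and the example values are made up for illustration:

```python
def proportion_is_roughly_normal(n, p, threshold=10):
    """Rule of thumb: both N*p and N*(1-p) should exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

print(proportion_is_roughly_normal(100, 0.5))   # True: 50 and 50
print(proportion_is_roughly_normal(100, 0.05))  # False: N*p is only about 5
```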

17
New cards

what does the law of large numbers tell us?

as samples grow larger, statistics calculated for them tend toward the results we would obtain if we got data from the full population

18
New cards

Difference between a null hypothesis and a research hypothesis

null states no difference between groups, or no association between variables

research states there is a difference between groups (some versions also predict which group is higher), or that there is an association between variables

19
New cards

what is a two-tailed test? one-tailed? situations when we would use them?

two - two critical regions in a distribution, extreme results in either tail are significant

- normal and t distributions

one - one critical region, results are significant only if they fall in the tail of the distribution specified by the research hypothesis

- chi-square and F distributions

20
New cards

what is a type 1 error? how can we reduce the chances of one?

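mistaken rejection of a true null hypothesis (false positive)

using a stricter significance level (e.g. .01 instead of .05); the chance of a type 1 error equals the significance level we choose, so larger samples alone do not lower it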

21
New cards

what is a type 2 error? what can we do to reduce the chances of one?

failure to reject a false null hypothesis (false negative)

taking larger samples and reducing standard errors

22
New cards

what is statistical power? why is it important?

likelihood that a significance test will detect an effect when there actually is one

low power tells us the sample size is too small to detect the effect we are looking for

23
New cards

what is a confidence interval? how are they constructed?

ranges within which a given proportion of sample results can be expected to fall

start from our observed result, and then place bounds around it (e.g. the estimate plus and minus 1.96 standard errors for 95%)
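
A minimal sketch of building a 95% interval around a sample mean, assuming made-up marks and the large-sample multiplier 1.96 (for a sample this small a t value would normally replace 1.96):

```python
import math
import statistics

def confidence_interval_95(sample):
    """Return (lower, upper): the sample mean plus and minus 1.96 standard errors."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * se, mean + 1.96 * se

scores = [62, 70, 55, 81, 67, 73, 59, 66, 77, 64]  # made-up marks
low, high = confidence_interval_95(scores)
print(round(low, 1), round(high, 1))
```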

24
New cards

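in graphing regression results, why are confidence bands wider toward the top and bottom of the line than near the centroid?

confidence bands are typically wider for low and high values of the predictor than for values near the mean. the regression line always passes through the centroid (the means of x and y), so uncertainty about the slope matters little there but shifts predictions more the further we move from the mean of x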

25
New cards

what fundamental question does bayesian inference try to answer that differs from the questions posed in standard inference?

how likely it is that the true difference between groups, or the association between variables, lies in a specific range

in standard inference we test whether we can hold to the null hypothesis

26
New cards

write the equation for bayes' theorem. what do the various elements in the formula mean?

P(A/D) = [P(D/A) × P(A)] ÷ P(D)

this follows from P(AD) = P(A/D)P(D) and P(DA) = P(D/A)P(A), because P(AD) = P(DA)

P(A) probability of characteristic A

P(D) probability of a result in the data

P(A/D) prob. of having A, given the result in the data

P(D/A) prob. of a result in the data, given characteristic A

P(AD) prob. of having A and of getting a result in the data

P(DA) prob. of getting a result in the data and having A

P(AD) = P(DA): the same combination of events
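
A worked sketch with made-up numbers, writing P(A/D) for the probability of A given D as on this card. It assumes P(D) can be built from the two ways D can occur (with A and without A); the function and argument names are illustrative:

```python
def posterior(prior_a, p_d_given_a, p_d_given_not_a):
    """P(A/D) from Bayes' theorem, with P(D) summed over A and not-A."""
    p_d = p_d_given_a * prior_a + p_d_given_not_a * (1 - prior_a)
    return p_d_given_a * prior_a / p_d

# Made-up numbers: 1% have characteristic A; the result D shows up in 90% of
# those with A and in 5% of those without.
print(round(posterior(0.01, 0.90, 0.05), 3))  # 0.154
```

Even with a result that is far more common among those with A, the posterior stays modest because the prior P(A) is so small.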

27
New cards

what is the bayes factor? how does it help us estimate the P(A/D)?

the ratio that gives us the factor by which we multiply P(A) to get P(A/D); from Bayes' theorem, this factor is P(D/A) ÷ P(D)

28
New cards

What is the difference between frequentist and a personal probability?

frequentist - probability is just the long-run relative frequency of some event, based on large numbers of observations gathered over time

personal - degree of belief held by an individual, based on past experience and judgment

29
New cards

difference between a standard confidence interval and a bayesian credible interval?

standard - the estimate plus and minus a margin based on its standard error; in repeated sampling, 95% of intervals built this way would contain the true value

bayesian - there is a 95% probability that the true (unknown) value lies within the interval, given the evidence provided by the observed data

30
New cards

what are 3 replies that those who accept personal probabilities have given to the argument that allowing them risks unrealistic conclusions being drawn from unrealistic priors?

1. data from a moderately large sample will swamp any reasonable priors

2. if people report their priors, others can redo the analysis

3. serious data analysts tend to use vague priors that assume no more than modest knowledge

31
New cards

What is a PRE measure?

proportional reduction in error

these include lambda, gamma, and somers' d

it tells us how much we can reduce our errors in predicting outcomes if we know how two variables are linked

32
New cards

What is lambda?

used for nominal measures

proportion by which we can reduce our errors in guessing a case's score on the dependent variable if we know how the dependent is linked to the independent
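
A minimal sketch of lambda for a small crosstab, assuming made-up counts with rows for the independent variable and columns for the dependent (the function name is illustrative):

```python
def goodman_kruskal_lambda(table):
    """Lambda for a crosstab: rows = categories of the independent variable,
    columns = categories of the dependent variable."""
    n = sum(sum(row) for row in table)
    column_totals = [sum(col) for col in zip(*table)]
    errors_without = n - max(column_totals)           # always guess the overall mode
    errors_with = n - sum(max(row) for row in table)  # guess the mode within each row
    return (errors_without - errors_with) / errors_without

# Made-up counts: two groups (rows) by three answers (columns).
table = [[30, 10, 10],
         [10, 10, 30]]
print(round(goodman_kruskal_lambda(table), 3))  # 0.333
```

Here guessing without the independent variable gives 60 errors and guessing within each row gives 40, so errors fall by one third.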

33
New cards

what is gamma?

designed for ordinal measures

we try to predict whether pairs of cases suggest a positive or negative relationship between the variables
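
A sketch of how pairs are tallied for gamma, assuming the usual definition (concordant minus discordant pairs, divided by their sum, with tied pairs left out) and made-up ordinal scores:

```python
from itertools import combinations

def gamma(pairs):
    """Gamma = (concordant - discordant) / (concordant + discordant); ties are ignored."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1    # the pair suggests a positive relationship
        elif direction < 0:
            discordant += 1    # the pair suggests a negative relationship
    return (concordant - discordant) / (concordant + discordant)

# Made-up (x, y) scores on 1-3 ordinal scales for four cases.
cases = [(1, 2), (2, 1), (2, 3), (3, 3)]
print(gamma(cases))  # 0.5: three concordant pairs, one discordant, two tied
```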

34
New cards

what is Somers' d?

modification of gamma

adds Ty to the denominator; Ty represents pairs of cases tied on the dependent variable, but not on the independent

35
New cards

What is Yule's Q?

special case of gamma for 2x2 tables

numerically identical to gamma, the difference is in usage. Yule did not restrict his statistic to ordered variables, so it can be used for nominal dichotomies

36
New cards

In the formula for r, what is the function of the SD in the denominator?

ensures that r will remain within the range from -1 to +1, telling us whether the association is positive or negative, or whether there is no linear association

37
New cards

why is the numerator of the formula for r called the covariance?

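because it is similar in form to the variance: instead of squaring one variable's deviations from its mean, we multiply the deviations of x by the deviations of y, so it measures how the two variables vary together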

38
New cards

In the formula for r, how is evidence of a positive association tallied up? of a negative association?

positive - if a case scores below the mean on each variable, (Xi-Xbar) will be negative and so will (Yi-Ybar), so (Xi-Xbar)(Yi-Ybar) will be positive; the same holds if it scores above the mean on both

negative - if X is above its mean and Y below its mean, or vice versa, their product will be negative, suggesting a negative association

39
New cards

starting from a formula for r that does not use algebraic notation, show what happens when the variables are standardized

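each variable's mean becomes 0 and its SD becomes 1, so the SDs drop out of the denominator

r becomes the mean of the products of the standardized x and y scores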
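
A small check of this with made-up study hours and marks, assuming population SDs (dividing by N) are used to standardize; the names are illustrative:

```python
import statistics

def pearson_r_from_z_scores(x, y):
    """r as the mean product of standardized scores (population SDs, dividing by N)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sd_x, sd_y = statistics.pstdev(x), statistics.pstdev(y)
    z_x = [(xi - mean_x) / sd_x for xi in x]
    z_y = [(yi - mean_y) / sd_y for yi in y]
    return statistics.fmean([zx * zy for zx, zy in zip(z_x, z_y)])

hours = [1, 2, 3, 4, 5]       # made-up study hours
marks = [35, 33, 38, 37, 42]  # made-up exam marks
print(round(pearson_r_from_z_scores(hours, marks), 3))  # 0.839
```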

40
New cards

what are 2 ways to interpret pearson's r?

1. based on the fact that squaring it yields r^2, a PRE measure

2. based on what happens if we standardized the variables in bivariate regression

41
New cards

difference between Spearman's rho and Pearson's r

for rho, instead of using the observed values of x and y, we use their ranked positions (1 to N)

42
New cards

what do we do before calculating rho if more than one case lies in a category?

we suppose that, with finer measurement, they could be distinguished, and we take the median rank that would then be found for the set
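
A sketch of assigning ranks with ties handled this way; the function name and the scores are made up:

```python
def ranks_with_ties(values):
    """Rank values from 1 to N, giving tied cases the middle rank of their set."""
    ordered = sorted(values)
    middle_rank = {}
    for value in set(values):
        first = ordered.index(value) + 1         # lowest rank the tied set would occupy
        last = first + ordered.count(value) - 1  # highest rank the tied set would occupy
        middle_rank[value] = (first + last) / 2  # median (and mean) of those ranks
    return [middle_rank[v] for v in values]

print(ranks_with_ties([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

The two tied cases would have held ranks 2 and 3 under finer measurement, so each receives 2.5.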

43
New cards

what is a scatterplot? what are some alternatives?

set of points plotted that show the extent of correlation

alternatives - boxplot, bar chart

44
New cards

what is a moving average

a calculation to analyze data points by creating a series of averages of different subsets of the full data set
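
A minimal sketch of a simple moving average, assuming equal weights and a made-up window of 3 consecutive values:

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

print(moving_average([2, 4, 6, 8, 10], window=3))  # [4.0, 6.0, 8.0]
```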

45
New cards

what are 2 advantages of a bar chart over a line graph? two disadvantages?

advantages - bars make it easier to estimate a value on the y-axis, greater visual impact

dis - breaks up a smooth trend line, high ink-to-information ratio

46
New cards

what is a mosaic plot? why are rectangles in the plot different sizes? why do we care about the "Pearson residuals"

a plot in which each cell is represented by a rectangle whose area is proportionate to the number of cases in the cell, so cells with more cases get larger rectangles; the rectangles are shaded differently to display residuals

Pearson residuals tell us the difference between observed and expected cell counts, scaled by the square root of the expected count
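
A sketch of Pearson residuals for a made-up 2x2 table, assuming expected counts come from the usual independence formula (row total × column total ÷ N):

```python
import math

def pearson_residuals(table):
    """(observed - expected) / sqrt(expected) for every cell of a crosstab."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [(obs - r * c / n) / math.sqrt(r * c / n) for obs, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]

table = [[30, 20],
         [20, 30]]  # made-up counts
residuals = pearson_residuals(table)
chi_square = sum(res ** 2 for row in residuals for res in row)
print(residuals, chi_square)
```

Positive residuals mark cells with more cases than expected and negative residuals cells with fewer; the squared residuals add up to the chi-square statistic for the table.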

47
New cards

why are some cells in a mosaic chart patterned or shaded differently? why might we be interested in a particularly light/dark cell?

standardized residuals are displayed by shading and patterns

dark shading usually marks heavy cells and light shading light cells, i.e. larger or smaller residual values. large residuals identify a bigger difference between the expected and observed cell counts

48
New cards

what measure do we typically use to identify heavy cells? how is it related to chi square? what levels of measurement are we typically interested in?

we can use a crosstabulation

crosstabulations tell us the breakdown of the data by the two variables; a chi-square test tells us whether the results of a crosstabulation are statistically significant

nominal and ordinal

49
New cards

what is an association plot? what is the difference between rectangles above/below the line?

used when we do not need to show residuals, just draw attention to cells that are heavy/light

heavy cells are darker and are above the line, light cells are shaded lighter and rest below the line

50
New cards

what are conditional tables? what is another name for them?

set of tables showing the relationship between 2 variables separately within each category of a third variable, the test factor

also called partial tables

51
New cards

how do conditional tables "control for" third variables?

within each conditional table the third variable (the test factor) is held constant, so it cannot account for any association between the other two variables in that table

52
New cards

what, for the Columbia school, was a "test factor"?

a third variable that was controlled to see how associations may change

53
New cards

what is a practical problem in breaking a sample down by many variables at once? what is one way to try to get around this problem and what difficulty arises if we take this route?

we will have too few cases left in some subtables

we could collapse test factors by for example making them dichotomous, but subtables would no longer be identifiable

54
New cards

term: specification

exists when the association between 2 variables is different for subsamples with different values of a third variable

55
New cards

term: moderation

when the relationship between 2 variables changes as the value of a third changes; the moderator is the third variable

56
New cards

term: distortion

exists when the relationship between 2 variables is reversed when we control for a third

57
New cards

term: spurious relationship

the observed correlation between 2 variables exists only because each is affected by a common cause

58
New cards

term: intervening variable

variable that comes between the independent and dependent variables in a causal chain: the independent affects it, and it in turn affects the dependent

59
New cards

term: mediator

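a variable that explains how 2 variables relate, by carrying the effect of one to the other; another name for an intervening variable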

60
New cards

what is a doubledecker? what does the width of the bars tell us?

a plot that shows two (or more) variables at the same time, with the categories of one nested within the categories of another

the width of each bar is proportionate to the size of the category

61
New cards

final exam = 30+2.2*(study hours) what does 30 tell us? the 2.2?

30 - tells us someone who did not study at all is predicted to get a 30

2.2 - the mark rises by 2.2 on average for each additional hour of study
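
The equation on this card as a tiny function (the function name is made up; the numbers are the card's own):

```python
def predicted_final_exam(study_hours):
    """Prediction from the card's equation: 30 + 2.2 * (study hours)."""
    return 30 + 2.2 * study_hours

print(round(predicted_final_exam(0), 1))   # 30.0, the intercept
print(round(predicted_final_exam(10), 1))  # 52.0, i.e. 30 plus ten steps of 2.2
```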

62
New cards

in the general formula y=a+bx, what are "a" and "b" called? give 2 names for b

a - intercept

b - slope/coefficient

63
New cards

final mark = 60+0.3*(math anxiety) what does 60 tell us? the 0.3?

60 - someone with no math anxiety is predicted to get a score of 60

0.3 - the grade rises on average by 0.3 for each additional point on the anxiety scale

64
New cards

explain the principle of least squares through which a regression line is chosen

we must choose the line that minimizes the sum of the squared distances between scores on the dependent variable and scores predicted for them
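
A minimal sketch of the least-squares choice for one predictor, using the standard closed-form formulas for the slope and intercept and made-up data (all names are illustrative):

```python
import statistics

def least_squares_line(x, y):
    """Return (a, b) for y = a + b*x, minimizing the sum of squared errors."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b

def sum_squared_errors(x, y, a, b):
    """Sum of squared distances between observed y and predicted y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

hours = [1, 2, 3, 4, 5]       # made-up study hours
marks = [35, 33, 38, 37, 42]  # made-up exam marks
a, b = least_squares_line(hours, marks)
print(round(a, 2), round(b, 2))  # 31.6 1.8
# Any other line, for example a steeper one, leaves a larger sum of squared errors.
print(sum_squared_errors(hours, marks, a, b) < sum_squared_errors(hours, marks, a, b + 0.5))
```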

65
New cards

two helpful by-products of our usual methods of choosing a bivariate regression line?

r^2 - how much variance in y is accounted for by x

standard error of the estimate - the typical size of our errors of prediction

66
New cards

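is the standard error equal to the variance of our errors of prediction? if not, what does it equal?

no, the variance of the errors is what we call the error variance

the standard error of the estimate is the square root of the error variance, so it is very similar to a standard deviation of the errors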

67
New cards

if errors of prediction are normally distributed, within what range will 95% of them lie?

95% of the errors will lie within 1.96 standard errors of the estimate of zero, i.e. 95% of the observations will lie within that distance of the regression line

68
New cards

show why pearsons r is equal to the regression coefficient for standardized variables

their numerators are the same

b will equal r when the denominators are the same, which is the case once both variables are standardized and their SDs equal 1
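
A numerical check with made-up data, assuming population SDs are used to standardize. It compares the slope fitted to z-scores with r obtained by rescaling the raw slope (b × SDx ÷ SDy); all names are illustrative:

```python
import statistics

def slope(x, y):
    """Least-squares slope b for predicting y from x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    return (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
            / sum((xi - mean_x) ** 2 for xi in x))

def standardize(values):
    """Convert scores to z-scores (mean 0, SD 1)."""
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

hours = [1, 2, 3, 4, 5]       # made-up study hours
marks = [35, 33, 38, 37, 42]  # made-up exam marks
b = slope(hours, marks)                               # marks per extra hour
beta = slope(standardize(hours), standardize(marks))  # SDs of marks per SD of hours
r = b * statistics.pstdev(hours) / statistics.pstdev(marks)
print(round(b, 3), round(beta, 3), round(r, 3))  # 1.8 0.839 0.839
```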

69
New cards

what is the generic interpretation for b? what does it become when the variables are standardized?

b gives us the average number of units of change in y for a unit of change in x

when variables are standardized, b becomes beta; beta gives us the average number of SDs of change in y for a change of one SD in x

70
New cards

if a predictor is dichotomous, what does b tell us?

b gives us the average difference between the 2 groups
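
A small check with made-up scores, assuming the dichotomous predictor is coded 0 and 1 (a dummy variable); the names are illustrative:

```python
import statistics

def slope(x, y):
    """Least-squares slope b for predicting y from x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    return (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
            / sum((xi - mean_x) ** 2 for xi in x))

group = [0, 0, 0, 1, 1, 1]        # made-up dummy variable: 0 = first group, 1 = second
score = [60, 64, 62, 70, 74, 72]  # made-up scores
mean_0 = statistics.fmean([s for g, s in zip(group, score) if g == 0])
mean_1 = statistics.fmean([s for g, s in zip(group, score) if g == 1])
b = slope(group, score)
print(b, mean_1 - mean_0)  # both 10.0: the slope is the gap between the group means
```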