Intermediate Statistics Mini Quiz 3

78 Terms

1

What is a theoretical sampling distribution? An empirical sampling distribution?

Theoretical - the distribution of results we would get if we calculated a statistic for an infinite number of samples, derived from logic or mathematical formulas

Empirical - the frequency distribution of observed results, e.g. a statistic calculated for a finite set of actual samples

2

What is meant when we say that a sampling distribution has been estimated by simulation?

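Drawing many random samples by computer from a known population, calculating the statistic for each, and treating the resulting empirical distribution as an estimate of the sampling distribution. We do this because:

1. theoretical results are asymptotic: they tell us the shape of the distribution we will get as sample size tends to infinity, not necessarily for the sample sizes we actually work with

2. sampling distributions are central to statistical inference: they let us calculate how often random variation across samples will produce a figure in the tail of the distribution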
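A minimal Python sketch of estimating a sampling distribution by simulation, assuming a hypothetical exponential population (mean 1, SD 1, strongly right-skewed); `simulate_sample_means` is an illustrative helper name, not from any course material. The SD of the simulated means is the simulated standard error, which should land near 1/√n, and comparing n = 5 with n = 50 shows it shrinking as samples grow.

```python
import random
import statistics

def simulate_sample_means(population_draw, n, reps, seed=1):
    """Draw `reps` random samples of size `n` and return the list of their means."""
    rng = random.Random(seed)
    return [statistics.fmean(population_draw(rng) for _ in range(n)) for _ in range(reps)]

# Hypothetical population: exponential with mean 1 and SD 1.
def draw(rng):
    return rng.expovariate(1.0)

for n in (5, 50):
    means = simulate_sample_means(draw, n, reps=10_000)
    # Simulated SE (SD of the means) next to the theoretical value sigma / sqrt(n).
    print(n, round(statistics.fmean(means), 3),
          round(statistics.stdev(means), 3), round(1 / n ** 0.5, 3))
```

Even with a skewed population, a histogram of the n = 50 means would look close to normal, which is what the central limit theorem predicts.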

3

What is standard error?

the standard deviation of a sampling distribution

if the sampling distribution is normal, about 68% of sample results will lie within 1 standard error of the true value, and 95% within 1.96 (approx. 2) standard errors
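To check the 68% and 95% figures: the proportion of a normal distribution lying within k standard errors of its centre is erf(k/√2). A small sketch using only Python's `math` module; `normal_coverage` is a hypothetical helper name.

```python
import math

def normal_coverage(k):
    """Proportion of a normal distribution lying within k SDs (standard errors) of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2):
    print(k, round(normal_coverage(k), 4))
```

With k = 1 this gives about .68, and with k = 1.96 about .95, so a standard error of 1.0 puts 95% of sample means within ±1.96 of the true mean.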

4

what does the central limit theorem tell us?

the sampling distribution of the mean approaches a normal distribution as sample size increases, whatever the shape of the population distribution (provided its variance is finite)

5

Appropriate sampling distribution for the mean from a small sample

t distribution

6

Appropriate sampling distribution for the mean from a large sample

normal

7

Appropriate sampling distribution for the test of independence in a crosstabulation

chi square

8

What are 3 ways the t distribution resembles the normal?

1. symmetrical

2. unimodal

3. as sample size (and so DF) rises, t distributions become more like the normal: their central peaks rise and their tails lighten

9

2 ways the t distribution differs from the normal

1. lower central peak

2. heavier tails

10

How does the chi-square distribution with one DF differ from a normal distribution? Why?

chi-square with one DF is the distribution of a squared standard normal variable, so it can take only non-negative values: all negative values in the normal distribution become positive once they are squared. Instead of being symmetric, it piles up near zero and has a long right tail

11

why does the standard error for a mean become greater as sample size becomes smaller?

the standard error of the mean equals the population standard deviation divided by the square root of the sample size (σ/√N). With a small N, a few unusual cases can pull a sample mean far from the true mean, so means vary more from sample to sample; as N grows, extreme cases balance out and the standard error shrinks

12

in what sense are there many t distributions? many chi squared distributions?

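t - there is a whole family of t distributions, one for each number of degrees of freedom, but convention is to refer to them as a single distribution

chi-square - also a family of distributions, one for each number of degrees of freedom; the DF depend on the number of categories in a simple distribution or on the numbers of rows and columns in a crosstabulation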

13

for chi square, how does the sampling distribution change with DF? why does this matter?

as DF increases, the shape, central tendency, and dispersion of the distribution shift

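the mean equals the DF, so it increases by 1 as DF increases by 1, and the distribution becomes more symmetric; this matters because the critical value needed for significance depends on the DF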
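A minimal simulation sketch, assuming the standard result that a chi-square variable with k DF is the sum of k squared independent standard normal variables (with k = 1 it is just a squared normal, so it can never be negative). `chi_square_draws` is a hypothetical helper; the simulated mean should sit near the DF and the variance near 2 × DF.

```python
import random
import statistics

def chi_square_draws(df, reps, seed=2):
    """Simulate a chi-square variable with `df` degrees of freedom as the
    sum of `df` squared independent standard normal draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(reps)]

for df in (1, 4, 10):
    draws = chi_square_draws(df, reps=20_000)
    # Non-negativity check, simulated mean (theory: df), simulated variance (theory: 2 * df).
    print(df, min(draws) >= 0, round(statistics.fmean(draws), 2),
          round(statistics.variance(draws), 2))
```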

14

for t distributions, why must we be concerned about DF?

the degrees of freedom define the shape of a t distribution: with few DF the tails are heavier, so a larger t value is needed for significance

15

suppose someone calculates the standard error of the mean for some variable and obtains 1.0. Assuming simple random sampling, within what range of the true mean would 95% of the sample means lie?

within 1.96 × 1.0 (approx. 2) units of the true mean, i.e. true mean ± 1.96

16

what is a suggested rule of thumb for when the sampling distribution of a proportion can be treated as normal? what did graphs suggest about the accuracy of this rule of thumb?

Np and N(1 - p) should both exceed 10; graphs show the sampling distribution becomes more symmetrical, and so closer to normal, once these values exceed 10

17

what does the law of large numbers tell us?

as samples grow larger, statistics calculated for them tend toward the results we would obtain if we had data for the full population
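A small illustration of the law of large numbers, assuming hypothetical fair coin flips as the "population" (true proportion .5); `running_proportion` is an illustrative helper name. The running proportion of heads wanders early on and settles near .5 as the number of flips grows.

```python
import random

def running_proportion(n_flips, p=0.5, seed=3):
    """Proportion of successes after each of n_flips simulated Bernoulli trials."""
    rng = random.Random(seed)
    successes = 0
    proportions = []
    for i in range(1, n_flips + 1):
        if rng.random() < p:
            successes += 1
        proportions.append(successes / i)
    return proportions

props = running_proportion(100_000)
for n in (10, 100, 1_000, 100_000):
    print(n, round(props[n - 1], 4))
```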

18

Difference between a null hypothesis and a research hypothesis

null states no difference between groups, or no association between variables

the research hypothesis states that there is a difference between groups (some also predict which group will score higher), or that there is an association between variables

19

what is a two-tailed test? one-tailed? situations when we would use them?

two-tailed - two critical regions in a distribution; extreme results in either tail are significant. Used when the research hypothesis does not predict a direction

- normal and t distributions

one-tailed - one critical region; results are significant only if they fall in the tail of the distribution specified by the research hypothesis. Used when the research hypothesis predicts a direction

- chi-square and F distributions

20

what is a type 1 error? how can we reduce the chances of one?

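mistaken rejection of a null hypothesis that is actually true (false positive)

use a stricter significance level (e.g. .01 instead of .05); this lowers the chance of a Type 1 error but raises the chance of a Type 2 error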

21

what is a type 2 error? what can we do to reduce the chances of one?

failure to reject a null hypothesis that is actually false (false negative)

taking larger samples and reducing standard errors

22

what is statistical power? why is it important?

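the probability that a test will detect an effect (reject the null hypothesis) when there actually is one; it equals 1 minus the probability of a Type 2 error

it tells us whether our sample is large enough: with low power, a real effect may go undetected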
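A minimal sketch of statistical power, assuming a two-sided one-sample z test with known population SD and an effect size measured in SD units; `power_two_sided_z` is a hypothetical helper built on `statistics.NormalDist` (Python 3.8+). It shows power rising with sample size, and falling to the significance level itself when there is no real effect.

```python
from statistics import NormalDist

def power_two_sided_z(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z test.

    effect_size: true difference from the null value in population-SD units.
    Power = probability that the test statistic lands in either critical
    region when that effect is real.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)        # 1.96 for alpha = .05
    shift = effect_size * n ** 0.5         # expected z statistic under the alternative
    return z.cdf(shift - crit) + z.cdf(-crit - shift)

for n in (10, 50, 200):
    print(n, round(power_two_sided_z(0.3, n), 3))
```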

23

what is a confidence interval? how are they constructed?

a range within which a given proportion (e.g. 95%) of sample results can be expected to fall

we start from our observed result and place bounds around it, typically the estimate plus and minus 1.96 standard errors for a 95% interval
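A minimal sketch of building a 95% confidence interval for a mean, assuming a sample large enough for the normal approximation (for small samples, a t critical value with N - 1 DF would replace 1.96). `mean_ci_95` is a hypothetical helper name.

```python
import math
import statistics

def mean_ci_95(values):
    """Large-sample 95% confidence interval for a mean: estimate ± 1.96 standard errors."""
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))  # estimated standard error
    return m - 1.96 * se, m + 1.96 * se
```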

24

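In graphing regression results, why are confidence bands wider far from the centroid (at low and high values of the predictor)?

confidence bands are wider for low and high values of the predictor than for values near its mean because the regression line always passes through the centroid, so error in the estimated slope matters least there; the further X is from its mean, the more any error in the slope shifts the predicted value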

25

What fundamental question does Bayesian inference try to answer that differs from the questions posed in standard inference?

how likely it is that the true difference between groups, or the true association between variables, lies in a specific range

standard inference instead asks whether our results would be unlikely enough, if the null hypothesis were true, for us to reject the null

26

write the equation for Bayes' theorem. what do the various elements in the formula mean?

$$P(AD) = P(A \mid D)\,P(D) = P(D \mid A)\,P(A)$$

$$P(A \mid D) = \frac{P(D \mid A)\,P(A)}{P(D)}$$

P(A) - probability of characteristic A (the prior)

P(D) - probability of a result in the data

P(A|D) - probability of having A, given the result in the data

P(D|A) - probability of a result in the data, given characteristic A

P(AD) - probability of having A and of getting the result in the data

P(DA) - probability of getting the result in the data and having A

P(AD) = P(DA) - the same combination of events, which is why the two products in the first equation are equal

27

what is the Bayes factor? how does it help us estimate P(A|D)?

the ratio P(D|A)/P(D): the factor by which we multiply the prior P(A) to get P(A|D)
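A minimal sketch of Bayes' theorem with hypothetical numbers: 1% of people have characteristic A, and the data result occurs for 90% of people with A and 5% of people without it. P(D) comes from the law of total probability. The returned `factor` follows this card's definition (P(D|A)/P(D)); other texts define the Bayes factor instead as a ratio of likelihoods under two competing hypotheses. `posterior` is a hypothetical helper name.

```python
def posterior(prior_a, p_d_given_a, p_d_given_not_a):
    """Bayes' theorem: P(A|D) = P(D|A) * P(A) / P(D).

    P(D) = P(D|A) P(A) + P(D|not A) P(not A).
    Returns (P(A|D), factor), where factor = P(D|A) / P(D) turns the
    prior P(A) into P(A|D).
    """
    p_d = p_d_given_a * prior_a + p_d_given_not_a * (1 - prior_a)
    factor = p_d_given_a / p_d
    return prior_a * factor, factor

post, factor = posterior(0.01, 0.90, 0.05)
print(round(post, 3), round(factor, 2))
```

Even with a fairly accurate result, P(A|D) stays modest (about .15) because the prior P(A) is so small.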

28

What is the difference between a frequentist and a personal probability?

frequentist - probability is the long-run relative frequency of some event, based on large numbers of observations gathered over time

personal - a degree of belief held by an individual, based on past experience and judgment

29

difference between a standard confidence interval and a bayesian credible interval?

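standard - the estimate plus and minus a multiple of its standard error (e.g. 1.96 for 95%); 95% of intervals constructed this way would contain the true value, but we cannot say there is a 95% probability that this particular interval does

bayesian - there is a 95% probability that the true (unknown) value lies within the interval, given the evidence provided by the observed data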

30

What are 3 replies that those who accept personal probabilities have given to the argument that using them risks unrealistic conclusions drawn from unrealistic priors?

1. data from a moderately large sample will swamp any reasonable prior

2. if people report their priors, others can redo the analysis with different ones

3. serious data analysts tend to use vague priors that assume no more than modest knowledge

31

What is a PRE measure?

proportional reduction in error

these include lambda, gamma, and Somers' d

it tells us how much we can reduce our errors in predicting outcomes if we know how two variables are linked

32

What is lambda?

used for nominal measures

proportion by which we can reduce our errors in guessing a case's score on the dependent variable if we know how the dependent is linked to the independent
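A minimal sketch of lambda as a PRE measure, assuming a crosstab stored as a list of rows where rows are categories of the dependent variable and columns are categories of the independent variable (an arbitrary layout choice). E1 counts errors when we guess the overall modal DV category for everyone; E2 counts errors when we guess the modal DV category within each IV category. `goodman_kruskal_lambda` is a hypothetical helper name.

```python
def goodman_kruskal_lambda(table):
    """Asymmetric lambda for predicting the row variable (DV) from the column variable (IV).

    table[i][j] = count of cases in DV category i and IV category j.
    Assumes the DV is not concentrated entirely in one category (E1 > 0).
    """
    n = sum(map(sum, table))
    e1 = n - max(sum(row) for row in table)                 # errors without the IV
    e2 = sum(sum(col) - max(col) for col in zip(*table))    # errors using the IV
    return (e1 - e2) / e1                                   # proportional reduction in error

# Hypothetical table: DV categories in rows, two IV groups in columns.
print(goodman_kruskal_lambda([[30, 10], [20, 40]]))
```

Here E1 = 40 and E2 = 30, so knowing the IV cuts prediction errors by a quarter.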

33

what is gamma?

designed for ordinal measures

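based on pairs of cases: we try to predict whether pairs are concordant (ordered the same way on both variables, suggesting a positive relationship) or discordant (suggesting a negative one); gamma = (concordant - discordant)/(concordant + discordant), ignoring tied pairs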

34

what is Somers' d?

modification of gamma

adds Ty to the denominator: the number of pairs of cases tied on the dependent variable but not on the independent variable

35

What is Yule's Q?

special case of gamma for 2×2 tables

numerically identical to gamma; the difference is in usage. Yule did not restrict his statistic to ordered variables, so it can be used for nominal dichotomies
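A minimal sketch of the pair-based measures, assuming each case is given as a pair of ordinal codes (x for the independent variable, y for the dependent). The function names are hypothetical. The check at the end expands a hypothetical 2×2 table into cases, so gamma should equal Yule's Q computed directly as (ad - bc)/(ad + bc).

```python
from itertools import combinations

def pair_counts(x, y):
    """Count concordant, discordant, and 'tied on y only' (Ty) pairs of cases."""
    concordant = discordant = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
        elif dy == 0 and dx != 0:
            tied_y_only += 1          # tied on the DV but not on the IV
    return concordant, discordant, tied_y_only

def gamma(x, y):
    c, d, _ = pair_counts(x, y)
    return (c - d) / (c + d)          # tied pairs are ignored

def somers_d(x, y):
    c, d, ty = pair_counts(x, y)
    return (c - d) / (c + d + ty)     # Ty added to the denominator

def yules_q(a, b, c, d):
    """Yule's Q for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical 2x2 table expanded into cases: a=20, b=10, c=5, d=15.
x = [0] * 20 + [0] * 10 + [1] * 5 + [1] * 15
y = [0] * 20 + [1] * 10 + [0] * 5 + [1] * 15
print(round(gamma(x, y), 3), round(yules_q(20, 10, 5, 15), 3), round(somers_d(x, y), 3))
```

Because Somers' d adds Ty to the denominator, it is smaller than gamma whenever there are pairs tied on the DV only.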

36

In the formula for r, what is the function of the SD in the denominator?

dividing the covariance by the product of the two SDs ensures that r stays within the range from -1 to +1, telling us whether the association is positive or negative, or whether there is no linear association

37

why is the numerator of the formula for r called the covariance?

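because it is similar in form to the variance: the variance averages (Xi - Xbar)(Xi - Xbar), while the covariance averages (Xi - Xbar)(Yi - Ybar), measuring how the two variables vary together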

38

In the formula for r, how is evidence of a positive association tallied up? Of a negative association?

positive - if a case scores below the mean on both variables, (Xi - Xbar) and (Yi - Ybar) are both negative, so their product (Xi - Xbar)(Yi - Ybar) is positive; if it scores above the mean on both, the product is also positive

negative - if a case is above the mean on X and below the mean on Y, or vice versa, the product is negative, suggesting a negative association

39

starting from a formula for r that does not use algebraic notation, show what happens when the variables are standardized

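each variable's mean becomes 0 and its SD becomes 1, so the SDs drop out of the denominator

r becomes the mean of the products of the standardized scores (zx × zy)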
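A minimal sketch computing r two ways: as covariance divided by the product of the SDs, and as the average product of z-scores. The helper names and data are hypothetical. Sample SDs (N - 1) are used throughout, so the "mean" of the products divides by N - 1; with population SDs (dividing by N) it would be an ordinary mean, and r is the same either way.

```python
import statistics

def pearson_r(x, y):
    """r = covariance / (SD of x * SD of y), using N - 1 throughout."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

def pearson_r_standardized(x, y):
    """Same r from standardized scores (mean 0, SD 1): the (N - 1)-based
    average of the products zx * zy."""
    def z(v):
        m, s = statistics.fmean(v), statistics.stdev(v)
        return [(vi - m) / s for vi in v]
    return sum(a * b for a, b in zip(z(x), z(y))) / (len(x) - 1)

x = [1, 2, 3, 4, 5]   # hypothetical scores
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4), round(pearson_r_standardized(x, y), 4))
```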

40

what are 2 ways to interpret pearson's r?

1. based on the fact that squaring it yields r^2, a PRE measure

2. based on what happens if we standardize the variables in bivariate regression: the slope then equals r

41

difference between Spearman's rho and Pearson's r

for rho, instead of using the observed values of X and Y, we use their ranks (1 to N) and calculate r on the ranks

42

what do we do before calculating rho if more than one case lies in a category?

we suppose that with finer measurement the tied cases could be distinguished, and give each of them the median rank the set would then occupy (e.g. three cases tied for ranks 4, 5 and 6 each get rank 5)
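A minimal sketch of Spearman's rho with tied ranks, assuming ties are handled as described: tied cases share the middle of the ranks they would occupy (for a run of consecutive ranks the median equals the mean). `average_ranks` and `spearman_rho` are hypothetical helper names.

```python
import statistics

def average_ranks(values):
    """Rank values from 1 to N; tied cases share the middle rank of their run."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same value.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # middle of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r calculated on the ranks instead of the raw scores."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

print(average_ranks([1, 2, 3, 7, 7, 7]))
```

Because only order matters, a perfectly monotonic but curved relationship (e.g. y = x²) gives rho = 1 even though Pearson's r would be below 1.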

43

what is a scatterplot? what are some alternatives?

a plot in which each case is a point positioned by its scores on two variables, showing the form and strength of their association

alternatives - boxplot, bar chart

44

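what is a moving average?

a series of averages, each calculated over a subset of consecutive data points (e.g. 3 years at a time) that moves forward one point at a time; it smooths out short-term fluctuations to reveal the trend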
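A minimal sketch of a simple moving average, assuming equal weights and a window that steps forward one point at a time; `moving_average` and the data are hypothetical.

```python
def moving_average(values, window=3):
    """Mean of each run of `window` consecutive values, stepping forward one point at a time."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

yearly = [2, 4, 6, 8, 10]   # hypothetical yearly figures
print(moving_average(yearly, 3))
```

A series of N points yields N - window + 1 averages, so the smoothed line is slightly shorter than the original.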

45

what are 2 advantages of a bar chart over a line graph? two disadvantages?

advantages - bars make it easier to estimate a value on the y-axis, greater visual impact

disadvantages - breaks up a smooth trend line, high ink-to-information ratio

46

what is a mosaic plot? why are rectangles in the plot different sizes? why do we care about the "Pearson residuals"?

a plot in which each cell is represented by a rectangle whose area is proportional to the number of cases in the cell; the rectangles are shaded differently to display residuals

Pearson residuals tell us the difference between observed and expected cell counts, scaled by the square root of the expected count
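A minimal sketch of Pearson residuals for a crosstab, assuming expected counts under independence (row total × column total / N). A positive residual marks a heavy cell (more cases than expected) and a negative one a light cell; the squared residuals sum to the chi-square statistic. `pearson_residuals` and the table are hypothetical.

```python
def pearson_residuals(table):
    """(observed - expected) / sqrt(expected) for each cell of a crosstab."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    return [[(obs - rt * ct / n) / (rt * ct / n) ** 0.5
             for obs, ct in zip(row, col_tot)]
            for row, rt in zip(table, row_tot)]

res = pearson_residuals([[30, 10], [20, 40]])
chi_square = sum(v * v for row in res for v in row)
print([[round(v, 2) for v in row] for row in res], round(chi_square, 2))
```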

47

why are some cells in a mosaic chart patterned or shaded differently? why might we be interested in a particularly light/dark cell?

standardized residuals are displayed by shading and patterns

darker shading usually marks a heavy cell and lighter shading a light cell, meaning larger or smaller residual values; large residuals identify cells where the observed and expected counts differ most

48

What measure do we typically use to identify heavy cells? How is it related to chi-square? What levels of measurement are we typically interested in?

we can use a crosstabulation, comparing observed with expected cell counts

crosstabulations break the data down by the two variables; the chi-square test tells us whether the association shown in the crosstabulation is statistically significant

nominal and ordinal

49

what is an association plot? what is the difference between rectangles above/below the line?

used when we do not need to show the residuals in detail, just to draw attention to cells that are heavy or light

heavy cells are darker and rise above the line; light cells are shaded lighter and fall below it

50

what are conditional tables? what is another name for them?

a set of crosstabulations of two variables, one for each category of a third variable (the test factor); each table covers a mutually exclusive subsample

also called partial tables

51

how do conditional tables "control for" third variables?

within each table the third variable (the test factor) is held fixed, so it cannot produce the association between the other two variables in that table

52

what, for the Columbia school, was a "test factor"?

a third variable that was controlled to see how the association between two variables changes

53

what is a practical problem in breaking a sample down by many variables at once? what is one way to try to get around this problem and what difficulty arises if we take this route?

we quickly run out of cases: some subtables will have too few cases to analyze reliably

we could collapse test factors, for example by making them dichotomous, but then each subtable mixes cases with different values of the test factor, so we no longer fully control for it

54

term: specification

exists when the association between 2 variables is different for subsamples with different values of a third variable

55

term: moderation

when the relationship between 2 variables changes as the value of a third changes; the moderator is the third variable

56

term: distortion

exists when the relationship between 2 variables is reversed when we control for a third

57

term: spurious relationship

the observed correlation between 2 variables exists because each is affected by a common cause

58

term: intervening variable

a variable that lies between the independent and dependent variables in a causal chain: the independent variable affects it, and it in turn affects the dependent variable

59

term: mediator

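a variable that explains how (through what process) one variable affects another: the independent variable affects the mediator, which in turn affects the dependent variable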

60

what is a doubledecker? what does the width of the bars tell us?

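a plot that shows the distribution of one variable within the categories of one or more other variables at the same time, as adjacent bars divided by the outcome categories

the width of each bar is proportional to the number of cases in that category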

61

final exam = 30+2.2*(study hours) what does 30 tell us? the 2.2?

30 - tells us someone who did not study at all is predicted to get a 30

2.2 - for each additional hour of study, the predicted mark rises by 2.2 points on average

62

in the general formula y=a+bx, what are "a" and "b" called? give 2 names for b

a - intercept

b - slope/coefficient

63

final mark = 60+0.3*(math anxiety) what does 60 tell us? the 0.3?

60 - someone with no math anxiety is predicted to get a score of 60

0.3 - the predicted final mark rises by 0.3 on average for each additional point on the anxiety scale

64

explain the principle of least squares through which a regression line is chosen

we choose the line that minimizes the sum of the squared vertical distances between observed scores on the dependent variable and the scores predicted for them

65

two helpful by-products of our usual methods of choosing a bivariate regression line?

r^2 - how much variance in y is accounted for by x

standard error of the estimate - the typical size of our errors of prediction

66

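is the standard error of the estimate equal to the variance of our errors of prediction? if not, what does it equal?

no: the variance of the errors of prediction is called the error variance

the standard error of the estimate is its square root, essentially the standard deviation of the errors of prediction (usually calculated with N - 2 in the denominator)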
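A minimal sketch of a least-squares bivariate regression and its two by-products, using the usual closed-form estimates b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. `fit_line` and the data are hypothetical; the standard error of the estimate is computed as √(SSE / (N - 2)), the square root of the error variance.

```python
import math
import statistics

def fit_line(x, y):
    """Least-squares line y = a + b*x, plus r^2 and the standard error of the estimate."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx                      # slope minimizing the sum of squared errors
    a = my - b * mx                    # line passes through the centroid
    errors = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in errors)   # sum of squared errors of prediction
    sst = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - sse / sst                 # share of variance in y accounted for by x
    see = math.sqrt(sse / (len(x) - 2))
    return a, b, r2, see

x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]   # hypothetical scores
a, b, r2, see = fit_line(x, y)
print(round(a, 3), round(b, 3), round(r2, 3), round(see, 3))
```

Nudging the slope away from b in either direction makes the sum of squared errors larger, which is the least-squares principle in action.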

67

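if errors of prediction are normally distributed, within what range will 95% of them lie?

95% of the errors will lie within ±1.96 standard errors of the estimate, i.e. 95% of observations will lie within 1.96 standard errors of the estimate of the regression line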

68

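show why Pearson's r is equal to the regression coefficient for standardized variables

their numerators are the same (the covariance):

$$b = \frac{\operatorname{cov}(X,Y)}{s_X^2}, \qquad r = \frac{\operatorname{cov}(X,Y)}{s_X s_Y}$$

b will equal r when the denominators are the same; after standardizing, the SDs of X and Y both equal 1, so both denominators equal 1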
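A quick numerical check of this result, assuming hypothetical scores: the slope from regressing standardized Y on standardized X equals r, and the raw slope converts to the same value via b × sX / sY. `standardize` and `slope` are illustrative helper names.

```python
import statistics

def standardize(v):
    m, s = statistics.fmean(v), statistics.stdev(v)
    return [(vi - m) / s for vi in v]

def slope(x, y):
    """Least-squares slope b = covariance(x, y) / variance(x)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]   # hypothetical scores
beta = slope(standardize(x), standardize(y))
print(round(slope(x, y), 4), round(beta, 4))
```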

69

what is the generic interpretation for b? what does it become when the variables are standardized?

b gives us the average number of units of change in y for a unit of change in x

when variables are standardized, b becomes beta, which gives us the average number of SDs of change in y for a one-SD change in x

70

if a predictor is dichotomous, what does b tell us?

b gives us the average difference on the dependent variable between the 2 groups (the mean for the group coded 1 minus the mean for the group coded 0)
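A minimal check with hypothetical data, assuming the dichotomous predictor is coded 0/1: the least-squares slope equals the difference between the two group means. `slope` is an illustrative helper name.

```python
import statistics

def slope(x, y):
    """Least-squares slope b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

# Hypothetical data: group coded 0 or 1, with an outcome score for each case.
group = [0, 0, 0, 1, 1, 1, 1]
score = [50, 55, 60, 62, 66, 70, 74]
b = slope(group, score)
diff = (statistics.fmean(s for g, s in zip(group, score) if g == 1)
        - statistics.fmean(s for g, s in zip(group, score) if g == 0))
print(round(b, 6), diff)
```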

71

what is truncation? when might we apply it in regression?

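truncation means restricting the analysis to part of the range of a variable, dropping or recoding cases beyond a cut-off; we might apply it when the trend line flattens out beyond some value of X, so that a straight line fits the remaining range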

72

what might we do to deal with an accelerating curve?

also called an exponential curve

take the log of the dependent variable (e.g. income), which straightens the curve, and present results for logged income

73

how do we typically interpret the coefficient for a spline? why are these created?

it gives us the change in slope at a point we call the knot

splines deal with data where different slopes exist for different ranges of X values
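A minimal sketch of a linear spline with one knot, assuming the common "hinge" coding where the extra predictor is max(0, X - knot). Fitting would be ordinary least squares with this hinge term added as a predictor; the sketch only evaluates hypothetical coefficients to show that the hinge coefficient is the change in slope at the knot. `spline_predict` is a hypothetical helper.

```python
def spline_predict(x, a, b1, b2, knot):
    """Predicted y from y = a + b1*x + b2*max(0, x - knot).

    Below the knot the slope is b1; above it the slope is b1 + b2,
    so b2 is the change in slope at the knot.
    """
    return a + b1 * x + b2 * max(0.0, x - knot)

# Hypothetical coefficients: slope 2 below a knot at x = 5, slope 2 - 1.5 = 0.5 above it.
coefs = dict(a=10.0, b1=2.0, b2=-1.5, knot=5.0)
for x in (3, 4, 5, 7, 8):
    print(x, spline_predict(x, **coefs))
```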

74

when does b give us an estimate of how much % change we get in y for a unit of change in x?

when the dependent variable has been logged and b is small (below about .20), 100 × b gives the approximate % change in Y for a unit of change in X; the exact figure is 100 × (e^b - 1)
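A small comparison of the rule of thumb with the exact figure, assuming the dependent variable was logged with natural logs (ln Y = a + bX). `percent_change` is a hypothetical helper name.

```python
import math

def percent_change(b):
    """For ln(Y) = a + b*X: (approximate, exact) % change in Y per unit of X."""
    return 100 * b, 100 * (math.exp(b) - 1)

for b in (0.02, 0.10, 0.20, 0.50):
    approx, exact = percent_change(b)
    print(f"b = {b:.2f}: approx {approx:.1f}%, exact {exact:.1f}%")
```

The gap is negligible for small b, roughly two points at b = .20, and large at b = .50, which is why the shortcut is limited to small coefficients.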

75

what is an alternative to the quadratic curve? advantage of this alternative?

a set of dummy variables for ranges of X

allows us to give a verbal comparison of the reference category with each of the others

76

principle used to select the equation for a multiple regression? 3 advantages of choosing it this way?

principle of least squares

1) the regression surface passes through the centroid (the means of all the variables)

2) predictions neither overestimate nor underestimate on average

3) it leads to measures of how well we are doing in predicting the DV, such as R^2

77

difference between r^2 and R^2

r^2 - how much variance in y is accounted for by x

R^2 - the proportion of variance in the DV accounted for by all the IVs together in a multiple regression

78

in multiple regression, how do we obtain coefficients that show the effects of an IV independent of other IVs?

we include all the IVs in the same regression equation; each coefficient then gives the average change in the DV for a unit change in that IV, holding the other IVs constant. E.g. if salary = a + 1000 × (years worked) + ..., salary rises by $1000 on average for each additional year worked, independent of the other IVs
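One way to see what "holding the other IVs constant" means, sketched under the assumption of two IVs and hypothetical data: remove from x1 the part predicted by x2, then regress y on what is left. By the Frisch-Waugh-Lovell theorem this gives the same coefficient for x1 as the full multiple regression. All helper names are hypothetical. Here y is built exactly as 5 + 2·x1 + 3·x2 with x1 and x2 correlated, so the bivariate slope of y on x1 is inflated while the partial coefficient recovers 2.

```python
import statistics

def slope(x, y):
    """Bivariate least-squares slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

def residuals(x, y):
    """What is left of y after removing the part predicted from x by least squares."""
    b = slope(x, y)
    a = statistics.fmean(y) - b * statistics.fmean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def partial_coefficient(y, x1, x2):
    """Coefficient for x1 in the multiple regression of y on x1 and x2."""
    return slope(residuals(x2, x1), y)

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [5 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print(round(slope(x1, y), 3), round(partial_coefficient(y, x1, x2), 3))
```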
