Data science quiz 11/4


23 Terms

1

t-statistic represents

how many standard errors the estimate is from zero
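As a minimal sketch of this definition, the t-statistic can be computed by dividing the distance between the estimate and the null value by the standard error. The function name and the numbers below are hypothetical, chosen only for illustration.

```python
def t_statistic(estimate, standard_error, null_value=0.0):
    """Number of standard errors between the estimate and the null value."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return (estimate - null_value) / standard_error


# Hypothetical coefficient estimate and standard error, for illustration only.
t = t_statistic(estimate=1.5, standard_error=0.5)
print(t)  # the estimate is 3 standard errors away from zero
```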

2

omitted variable bias

  • arises when a relevant variable is left out of the model and is correlated with a variable that is included

  • a correctly specified model should reflect a plausible set of assumptions and should be complete

  • OVB is very hard to detect

3

A model with a higher R² means

the model explains more of the variance in the dependent variable

4

data mining

  • selectively reporting significant results

  • scouring through datasets for (partial) correlations from which to develop a thesis

5

how can outliers and violations of normality affect statistical inference

Outliers and non-normal data can distort estimates and invalidate hypothesis tests. To guard against this, check residual plots, apply transformations, or use robust methods.

6

consider a regression output where R² = 0.85 and TSS = 1000, what is the ESS?

ESS = R² × TSS = 0.85 × 1000 = 850
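The same arithmetic can be sketched in a few lines. The helper name is hypothetical; the RSS line assumes the usual decomposition TSS = ESS + RSS, which holds for OLS with an intercept.

```python
def explained_sum_of_squares(r_squared, tss):
    """ESS is the share of total variation (TSS) that the model explains."""
    return r_squared * tss


tss = 1000
ess = explained_sum_of_squares(0.85, tss)
rss = tss - ess  # what is left unexplained, assuming TSS = ESS + RSS
print(ess, rss)
```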

7

three main types of distributional tests

  • seeing where a value fits within a distribution (determines how unusual or typical a single value is compared to the rest of the data)

  • making a certainty estimate from a sample to the general population (estimating population parameters from a sample; the larger the sample, the better our estimates)

  • comparing two sample means (involves testing whether two groups are significantly different from each other)

8

z-score

number of standard deviations away from the mean
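A small sketch of the calculation, assuming the data set is treated as the whole population (so the population standard deviation is used). The scores are made up for illustration.

```python
import statistics


def z_score(value, data):
    """How many standard deviations `value` lies from the mean of `data`."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation (an assumption)
    return (value - mean) / sd


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population standard deviation 2
z = z_score(9, scores)
print(z)  # 9 is two standard deviations above the mean
```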

9

confidence intervals

often shown as error bars; a 95% confidence interval is a range built from the sample so that, across repeated samples, about 95% of such intervals would contain the true value of the measure you are interested in
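One way to sketch a confidence interval for a sample mean is shown below. It assumes a normal approximation (a z critical value); for small samples a t critical value would give a somewhat wider interval. The function name and data are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * se
    return mean - margin, mean + margin


sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]  # made-up measurements
low, high = mean_confidence_interval(sample)
print(low, high)
```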

10

standard error formula

the standard deviation of the sample mean: SE = s / √n, where s is the sample standard deviation and n is the sample size
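A minimal sketch of the standard error of the mean, computed as the sample standard deviation divided by the square root of the sample size. The data are made up for illustration.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error(sample)
print(se)
```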

11

what question do we answer with a t test

is the difference between the two sample means statistically significant?
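As a rough sketch, the test statistic for comparing two sample means can be computed as below. This uses Welch's form (unequal variances allowed), which is one common choice and an assumption here. Judging significance still requires comparing the result with a critical value or computing a p-value, which this sketch does not do. The groups are made-up data.

```python
import math
import statistics


def welch_t_statistic(a, b):
    """Difference in sample means divided by its estimated standard error."""
    mean_diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return mean_diff / se


group_a = [5.1, 4.9, 5.6, 5.8, 6.0]  # hypothetical measurements
group_b = [4.1, 4.5, 4.0, 4.8, 4.3]
t = welch_t_statistic(group_a, group_b)
print(t)  # a large absolute value suggests the means differ
```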

12

what are all the explanations of correlation between A and B?

A causes B (possibly through C)

B causes A (possibly through C)

C causes both A and B

C causes B, and A is only spuriously correlated

13

What other misspecifications could affect a model of a data set?

  • omitted variable bias

  • included variable bias

  • normality and outliers

  • data mining

  • interaction effects

  • causal endogeneity

  • ecological fallacy

14

included variable bias

the addition of multiple (usually irrelevant) variables to obtain the desired result

15

normality and outliers 

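  • OLS inference assumes the residuals follow a (close to) normal distribution; skewed variables can still be transformed (for example, with a log) to get closer to this

  • make sure you identify outliers and deal with them (for example, remove them when that is justified)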

16

How is the margin of error affected by sample size

The margin of error decreases as the sample size increases.
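A short sketch of why this happens: the margin of error is a critical value times the standard error, and the standard error shrinks with the square root of n. The normal approximation and the standard deviation of 10 are assumptions for illustration.

```python
import math
import statistics


def margin_of_error(sd, n, confidence=0.95):
    """Critical value times standard error, using a normal approximation."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / math.sqrt(n)


# Quadrupling the sample size halves the margin of error.
for n in (25, 100, 400):
    print(n, round(margin_of_error(sd=10, n=n), 2))
```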

17

inferential statistics

using sample data to make inferences about a larger population

18

distributional tests

statistical tools that help determine how data points relate to theoretical probability distributions

19

OLS (ordinary least squares) multivariate analysis

a statistical technique used to estimate the relationship between one dependent variable and two or more independent variables; an extension of simple linear regression
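To make the idea concrete, here is a minimal pure-Python sketch that fits an OLS model with two independent variables by solving the normal equations. In practice a statistics library would normally be used; the helper names and the noise-free data (generated from y = 1 + 2·x1 + 3·x2) are assumptions for illustration.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols_fit(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # prepend a column of ones
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]  # (x1, x2) pairs
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]     # noise-free, so the fit is exact
coefficients = ols_fit(x_rows, y)
print(coefficients)  # approximately [1, 2, 3]
```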

20

R² formula

1 - (residual sum of squares / total sum of squares)
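The formula can be sketched directly from observed and predicted values. The function name and the numbers are hypothetical; the predictions stand in for whatever model produced them.

```python
def r_squared(observed, predicted):
    """R² = 1 - RSS / TSS."""
    mean_y = sum(observed) / len(observed)
    tss = sum((y - mean_y) ** 2 for y in observed)               # total variation
    rss = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # unexplained
    return 1 - rss / tss


observed = [1, 2, 3, 4, 5]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]  # made-up model output
r2 = r_squared(observed, predicted)
print(round(r2, 2))  # TSS = 10 and RSS = 0.10, so R² is 0.99
```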

21

TSS (total sum of squares)

  • measures the total variation in the dependent variable

  • quantifies how spread out the observed data are before considering any model

22

ESS (explained sum of squares)

  • measures the variation in the dependent variable that the model explains; used to describe how well a model, often a regression model, represents the data being modelled

23

RSS (residual sum of squares)

  • measures the unexplained variation

  • sum of the squared differences between each observed point and the line of best fit
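To tie the three sums of squares together, the sketch below fits a simple least-squares line and checks that TSS = ESS + RSS, which holds for OLS with an intercept. The function name and data are made up for illustration.

```python
def sums_of_squares(x, y):
    """Fit a least-squares line and return (TSS, ESS, RSS)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    tss = sum((yi - mean_y) ** 2 for yi in y)                # total variation
    ess = sum((fi - mean_y) ** 2 for fi in fitted)           # explained by the line
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # left in the residuals
    return tss, ess, rss


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
tss, ess, rss = sums_of_squares(x, y)
print(tss, ess, rss)  # roughly 6, 3.6 and 2.4
```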