Deviance/ Error
The distance of each score from the mean
Sum of squared errors (SS)
The total amount of error around the mean (the errors/deviances are squared before adding them up)
Variance
The average squared distance of scores from the mean (the sum of squares divided by the number of scores). Tells us how widely dispersed scores are around the mean.
Standard Deviation
The square root of the variance
Z-score
The sign tells us whether the original score was above or below the mean; the value tells us how far the score was from the mean in standard-deviation units. (z-score = (score - mean of all scores) / standard deviation of all scores)
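A minimal Python sketch (with made-up scores) tying the four definitions above together:

```python
import numpy as np

# Made-up example scores
scores = np.array([4.0, 6.0, 7.0, 9.0, 14.0])

mean = scores.mean()
deviances = scores - mean          # deviance/error of each score
ss = np.sum(deviances ** 2)        # sum of squared errors
variance = ss / len(scores)        # SS divided by the number of scores
sd = np.sqrt(variance)             # standard deviation
z_scores = (scores - mean) / sd    # z-scores

print(mean, ss, variance, sd, z_scores)
```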
Probability theory
Uses the language of sets
Sets
A collection of things/ elements
Universal Set
The set of all things that we could possibly consider in the context of what we are studying (S = {1,2,3,4,5,6} --> for a die)
Function
A rule that takes an input from a specific set, called the domain, and produces an output from another set, called the co-domain
Sample Space
The set of all possible outcomes
Range
The set containing all the possible values of f(x). Thus the range of a function is always a subset of its co-domain
Mutually exclusive events vs. independence
--> Mutually exclusive events cannot happen at the same time --> so they cannot be statistically independent, since knowing that one occurs gives information about the other (specifically, that it certainly does not occur) --> if A and B are mutually exclusive events, they are statistically independent if and only if P(A) = 0 or P(B) = 0 (or both are zero)
The law of large numbers
The higher the number of trials, the closer the observed proportion gets to the true probability
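A quick simulation sketch, assuming a fair coin with true P(heads) = 0.5:

```python
import numpy as np

# The relative frequency of heads approaches 0.5 as trials increase.
rng = np.random.default_rng(42)
for n in [10, 100, 10_000, 1_000_000]:
    flips = rng.integers(0, 2, size=n)   # 0 = tails, 1 = heads
    print(n, flips.mean())               # observed proportion of heads
```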
Central limit theorem
Gives us the framework to do statistical inference: as the sample size increases, the sampling distribution of the mean becomes more and more like a normal distribution, whatever the shape of the population
Sampling Distribution
The distribution of a statistic (such as the mean) across repeated samples. When you take the average of the sample averages, it will look like your population mean, and because the sampling distribution is normal we can come up with p-values
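A sketch of both ideas, drawing many samples from a skewed (exponential) population; the sample size and counts are arbitrary:

```python
import numpy as np

# Population: Exponential(scale=1), so the population mean is 1.0.
rng = np.random.default_rng(0)
sample_means = [rng.exponential(scale=1.0, size=50).mean()
                for _ in range(10_000)]

# The mean of the sample means is close to the population mean,
# and the sampling distribution is approximately normal.
print(np.mean(sample_means))   # ~1.0
print(np.std(sample_means))    # ~ standard error = 1/sqrt(50)
```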
Descriptive statistics
Summarize the characteristics of a data set (e.g. the mean, median, and range)
Inferential statistics
Allows you to test a hypothesis or assess whether your data are generalizable to the broader population: it takes data from a sample and draws conclusions about the larger population from which the sample was drawn (this happens, for example, when calculating p-values)
Normal Distribution
99.7% of sample results are contained within 3 standard errors, 95% within 2 standard errors, and 68% within 1 standard error
Standard error
= the standard deviation of a sampling distribution (for the mean: SE = s / √n, the sample standard deviation divided by the square root of the sample size)
Null hypothesis
The default assumption that there is no effect or no difference; statistical tests measure the strength of the evidence against it
Confidence Level
probability between the 2 rejection regions for a two-tailed test. If α = 0.10, then the Confidence Level is 1 - 0.10 = 0.90, or 90%
Confidence Interval
The interval whose bounds equal the lower and upper critical values; it can be constructed for any quantity you measure
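A sketch of a two-tailed confidence interval for a mean, with made-up data (a z critical value is used here; for small samples a t critical value, stats.t.ppf, would be more appropriate):

```python
import numpy as np
from scipy import stats

data = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7])
alpha = 0.05                                   # confidence level = 0.95

mean = data.mean()
se = data.std(ddof=1) / np.sqrt(len(data))     # standard error of the mean
z_crit = stats.norm.ppf(1 - alpha / 2)         # upper critical value (~1.96)

lower, upper = mean - z_crit * se, mean + z_crit * se
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```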
Type I error
False positive (rejecting a true null hypothesis)
Type II error
False negative (failing to reject a false null hypothesis)
non-parametric test (distribution-free test)
does not assume anything about the underlying distribution (for example, it does not require that the data come from a normal distribution)
parametric test
makes assumptions about a population's parameters (for example, the mean or standard deviation)
One tailed test
Used when you only want to know whether something is higher or lower (one direction), not both
What test should be used with unequal variances?
Welch's ANOVA
ANOVA
The analysis of variance: it tells you whether there is a difference between at least 2 of the groups, not which groups are different from one another
Total sum of squares (TSS or SST)
tells you how much variation there is in the dependent variable; it is a measure of how a data set varies around a central number (like the mean)
Sum of squares
a measure of spread based on squared deviations, just like variance; in ANOVA the main goal is to compare sums of squares to see whether the groups overlap
Between Sum of Squares (a.k.a. Explained/Model/Treatment) (SSB)
the explained sum of squares tells you how much of the variation in the dependent variable is explained by your model
Residual (Error) Sum of Squares (within sum of squares) (SSE)
tells you how much of the dependent variable’s variation your model did not explain. It is the sum of the squared differences between the actual Y and the predicted Y (observed vs expected)
F-distribution (use)
We use an F-distribution when we are studying the ratio of the variances of two normally distributed populations
F-test
The further the group means are from the grand mean, the larger the variance in the numerator becomes. In our F-test, this corresponds to a larger F-ratio (more between-group variance relative to within-group variance).
F-ratio
= MSB / MSE (mean square between groups divided by mean square error)
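A sketch computing the F-ratio by hand on three made-up groups, checked against scipy's f_oneway:

```python
import numpy as np
from scipy import stats

groups = [np.array([3.0, 4.0, 5.0]),
          np.array([6.0, 7.0, 8.0]),
          np.array([9.0, 10.0, 11.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# Between sum of squares: how far each group mean is from the grand mean.
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within (error) sum of squares: variation inside each group.
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

f_ratio = (ssb / df_between) / (sse / df_within)   # MSB / MSE
print(f_ratio)
print(stats.f_oneway(*groups))                     # should match
```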
What type of ANOVA? (1 grouping variable)
one-way ANOVA
What type of ANOVA? (another grouping variable added)
two-way ANOVA
What type of ANOVA? (a third grouping variable added; a factorial ANOVA)
three-way ANOVA
ANOVA Assumptions
1. Check assumptions: a. Normality (Shapiro-Wilk) b. Outliers (boxplots)
2. Run one-way ANOVA with post-hoc tests: a. Tukey & Games-Howell b. Levene's test = do the distributions for each group look almost the same; is there homogeneity of variance?
3. Run GLM to check partial eta squared: a. Estimates of effect size
4. Calculate omega-squared
5. Interpret the data
Eta square (Eta^2)
How well the model measures the outcome: how much of the total variance in the observations does my model explain (= SSbetween / SStotal)
Omega square
a less biased alternative measure of how well your model explains the results (especially when the sample size is small)
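Continuing the made-up numbers from the F-ratio sketch above, a quick effect-size computation (formulas as commonly given for one-way ANOVA):

```python
# Made-up sums of squares from the F-ratio sketch: SSB = 54, SSE = 6,
# with k = 3 groups and n = 9 observations in total.
ssb, sse = 54.0, 6.0
k, n = 3, 9

sst = ssb + sse
mse = sse / (n - k)

eta_squared = ssb / sst                                # SSB / SStotal
omega_squared = (ssb - (k - 1) * mse) / (sst + mse)    # less biased

print(eta_squared)     # 0.9
print(omega_squared)   # ~0.85
```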
Factorial ANOVA
= we are examining how much of the variance in our data can be explained by our independent variables (more than one) = it looks at the main effects of the PVs and their interaction effect on the OV = a factorial ANOVA with 2 PVs is a two-way ANOVA, etc.
When do we use a factorial ANOVA?
a) OV (outcome variable) is quantitative b) PVs are categorical c) Independent groups (between-subjects design) d) Variance is homogeneous across groups (similar in shape) e) Residuals (distance from the actual observations to the average) are normally distributed. If there is an interaction effect, there is dependency between the PVs.
Moderation
A way to check whether a third variable influences the strength or direction of the relationship between the independent and dependent variables
Mediator
= mediates the relationship between the independent and dependent variables; it explains the reason for such a relationship to exist. Is the influence of the mediator stronger than the direct influence of the independent variable? (Imagine grades lead to happiness and self-esteem is the mediator; with mediation analysis we try to see whether the variable "self-esteem" completely explains the effect of the "grades" variable.)
Correlation
= measures the degree of the relationship between two variables (x and y) = finds the numerical value that shows the relationship between the two variables and how they move together
Regression
= analysis that helps to determine the functional relationship between two variables (x and y), so that you are able to estimate the unknown variable and make future projections on events and goals
= used to estimate the values of a random variable (z) based on the values of your known (or fixed) variables (x and y) = the regression line is the best-fitting line through the data points
Pearson 'r'
= measures the strength of the linear relationship between two quantitative variables = is always a number between -1 and 1; r > 0 indicates a positive association, r < 0 a negative one
R squared
Coefficient of determination = SSB (SSM) / SST = tells you how much of the variability in the outcome is explained by your model; an indicator of how good your linear model is (the proportion of variability in your outcome that is explained by the model)
= shows how well the data fit the regression model (the goodness of fit) = the higher, the better
Bootstrap Procedure (non-parametric)
1. Choose a number of bootstrap samples to perform
2. Choose a sample size
3. For each bootstrap sample:
   a. Draw a sample with replacement of the chosen size
   b. Calculate the statistic on the sample
4. Calculate the mean of the calculated sample statistics
(see the code sketch below)
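A minimal sketch of the procedure above, estimating the mean of a made-up sample:

```python
import numpy as np

rng = np.random.default_rng(1)
data = np.array([2.3, 4.1, 3.8, 5.0, 2.9, 4.4, 3.5, 4.8])

n_bootstrap = 10_000                  # step 1: number of bootstrap samples
sample_size = len(data)               # step 2: sample size

boot_stats = []
for _ in range(n_bootstrap):          # step 3: for each bootstrap sample
    resample = rng.choice(data, size=sample_size, replace=True)   # 3a
    boot_stats.append(resample.mean())                            # 3b

print(np.mean(boot_stats))            # step 4: mean of the sample statistics
```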
Simple linear regression
represented by: y = β0 + β1x + ε
β0 --> y-intercept
β1 --> slope
ε --> error term
E(y) = β0 + β1x --> the mean or expected value of y for a given value of x
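A sketch fitting a simple linear regression on made-up data with scipy's linregress, which also reports Pearson r (and hence R squared, from the entries above):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

result = stats.linregress(x, y)
print(result.intercept)     # estimate of β0
print(result.slope)         # estimate of β1
print(result.rvalue)        # Pearson r
print(result.rvalue ** 2)   # R squared: variability explained by the model
```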
Adjusted R-squared
A modified version of R-squared that has been adjusted for the number of predictors in the model. The adjusted R-squared increases when the new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected. Typically, the adjusted R-squared is positive, not negative. It is always lower than the R-squared.
Standardized coefficient beta
= standardized coefficients bring everything to the same base (comparing apples with apples) = we use them when we want to compare two variables (for example, 0.467 has a bigger effect than 0.146) to see which has the bigger predictive effect. Use it when you want to compare effect sizes across PVs; easier to compare.
Unstandardized beta
= we want to figure out the exact predictive relation between the variables and our outcome, in original units (for example, for every unit increase in the math aptitude test score, we see a 0.116 increase in the statistics exam result = the actual outcome that happens).
Use it when you want to interpret an individual PV's impact on the OV; easier to interpret.
Orthogonalization
= refers to axes being at a right angle = in moderation we need it to fix the distorting effect of multicollinearity (which inflates standard errors and decreases the t-statistic). In factor analysis we also make use of orthogonalization when we rotate the factors, because all the multidimensional axes have to be at right angles to form the factors/components.
Tolerance
= an indication of the percentage of variance in a predictor that cannot be accounted for by the other predictors; very small values indicate that the predictor is redundant
Dummy variables
= in statistics and econometrics, particularly in regression analysis, a dummy variable is one that takes only the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome (converting categories into numbers is the main goal) = a dummy variable is dichotomous, but a dichotomous variable is not necessarily a dummy variable
In ANOVA the dependent variable is continuous; the independent variables can be dichotomous (dummy variables), but the dependent variable cannot
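A sketch of dummy coding with pandas' get_dummies on a made-up categorical column:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "green", "blue", "red"]})

# drop_first=True avoids the dummy-variable trap (perfect multicollinearity):
# k categories need only k-1 dummy variables.
dummies = pd.get_dummies(df["color"], drop_first=True, dtype=int)
print(dummies)
```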
Odds
= probability of success / probability of failure (e.g. if P(success) = 0.8, the odds are 0.8/0.2 = 4, or 4 to 1)
Multivariate statistical methods
= methods concerned with the joint behavior of more than one random variable
Goal of PCA (principal components analysis)
= reduce the number of dimensions that we have; we decide on the 2 or 3 components to keep (out of, say, 15 variables) using a scree plot
We started the survey with 15 questions, but we don't really know what variables they are measuring, so PCA helps us see whether there are any underlying variables that we cannot see just by looking at the data
Principal Components & Factor Analysis
= reduce the dimensionality of the problem to better understand the underlying factors affecting those variables
Factors
= a linear combination (variate) of the original variables. Factors also represent the underlying dimensions (constructs) that summarize or account for the original set of observed variables. Factors are a type of latent variable (hidden/underlying: it is there somewhere, but it is not directly observed).
Factor loading
= the correlation between a variable and a factor (e.g. the loading of item SWLS1 on a factor; each question in a survey is a variable)
Communality
= another word for R-square in this context (how much do the factors explain of a variable's variance)
Eigenvalue
% of variance explained * the total number of variables (throw away components with an eigenvalue below 1: the Kaiser criterion)
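A PCA sketch on made-up standardized data with scikit-learn; explained_variance_ holds the eigenvalues, so the Kaiser criterion is a simple comparison:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 5))               # 100 observations, 5 variables
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize first

pca = PCA()
pca.fit(X)

eigenvalues = pca.explained_variance_       # one eigenvalue per component
print(eigenvalues)
print(pca.explained_variance_ratio_)        # % of variance explained
print(eigenvalues > 1)                      # Kaiser criterion: keep these
```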
Covariance
= involves 2 dimensions (variables); think of it like an unstandardized correlation
Correlation (R)
= can be calculated as the covariance of the 2 dimensions divided by (SD of X * SD of Y)
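A sketch verifying the formula on two made-up variables:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

cov_xy = np.cov(x, y)[0, 1]                               # sample covariance
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))      # cov / (SDx * SDy)

print(r)
print(np.corrcoef(x, y)[0, 1])                            # should match
```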