Two population hypothesis testing for independent samples
Testing if population means/proportions are equal by comparing two independent samples
-Assume the difference = 0 (null hypothesis)
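A minimal sketch of this test in Python (not part of the deck), assuming the population standard deviations are known and using made-up sample figures:

```python
# Two-sample z-test for independent means with known population std devs.
# All numbers below are invented for illustration.
from math import sqrt
from scipy.stats import norm

xbar1, xbar2 = 24.5, 22.8      # sample means
sigma1, sigma2 = 3.0, 2.5      # known population standard deviations
n1, n2 = 40, 50                # sample sizes

# Standard error of the difference between two independent means
se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Test statistic under Ho: mu1 - mu2 = 0
z = (xbar1 - xbar2) / se

# Two-tailed p-value
p = 2 * norm.sf(abs(z))
print(f"z = {z:.3f}, p = {p:.4f}")
```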
Two sample hypothesis tests for dependent samples
-There is only one sample, made up of paired observations from two related populations
- We test whether the populations are equal by comparing the paired differences within the sample
- We assume the mean of the differences = 0 (null hypothesis)
- Aka paired samples, repeated measures, related samples
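A minimal sketch of a paired test using SciPy's ttest_rel; the before/after scores are invented:

```python
# Dependent (paired) samples t-test: one sample of paired differences.
from scipy.stats import ttest_rel

before = [72, 68, 75, 80, 66, 71]
after  = [78, 70, 74, 85, 70, 76]

# Ho: the mean of the paired differences = 0
t_stat, p_value = ttest_rel(after, before)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```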
Dependent samples vs independent samples
dependent samples are preferred over independent samples because they reduce the variation in the sampling distribution (aka remove extraneous variation)
Quantitative data (means)
Data that measures 'how much' of an attribute a variable has
Qualitative data (proportions)
Data that answers 'does the variable have the attribute?' with yes or no
3 basic assumptions for 2-sample hypothesis testing
1. The two populations are normally distributed, 2. The two populations are independent, 3. The two populations' standard deviations are known
2 extra assumptions made when test is about proportions (qualitative data)
1. Binomial conditions: Mutually exclusive results, independent trials
2. Large sample: np ≥ 5 and n(1-p) ≥ 5 for each sample
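A minimal sketch of a two-sample z-test for proportions with a pooled estimate; the counts are made up, and the large-sample condition above is checked explicitly:

```python
from math import sqrt
from scipy.stats import norm

x1, n1 = 30, 100   # successes, trials in sample 1
x2, n2 = 45, 120   # successes, trials in sample 2

p1, p2 = x1 / n1, x2 / n2
pc = (x1 + x2) / (n1 + n2)   # pooled proportion under Ho: p1 = p2

# Large-sample check: n*p and n*(1-p) must be >= 5 for each sample
assert min(n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)) >= 5

se = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))
print(f"z = {z:.3f}, p = {p_value:.4f}")
```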
Standard error
Standard deviation of the sampling distribution
Center of the sampling distribution
The population parameter(s)
ANOVA
Analysis of variance: technique to test whether three or more population means are equal (vs. at least one differing)
-The populations must follow a normal distribution
- The populations must have equal standard deviations
- The populations must be independent
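A minimal sketch of a one-way ANOVA with scipy.stats.f_oneway on three made-up treatment groups:

```python
from scipy.stats import f_oneway

group_a = [5, 7, 6, 8, 7]
group_b = [9, 8, 10, 9, 11]
group_c = [4, 6, 5, 5, 6]

# Ho: all three population means are equal
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```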
One-way ANOVA
ANOVA with only one factor (treatments)
-completely randomized design
Two-way ANOVA
ANOVA with two factors (treatments and blocks)
-randomized block design
-removes extraneous variation by decreasing MSE
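A minimal sketch of a randomized block (two-way) ANOVA using statsmodels' formula API; the data frame values and factor names are invented:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Three treatments observed once in each of four blocks (made-up values)
df = pd.DataFrame({
    "y":         [5, 7, 6, 8, 9, 8, 10, 9, 4, 6, 5, 5],
    "treatment": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "block":     ["b1", "b2", "b3", "b4"] * 3,
})

# Adding the block factor pulls block-to-block variation out of the error term
model = ols("y ~ C(treatment) + C(block)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```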
Treatments/Blocks
The independent populations being examined
SOV
Source of variation
-one-way: treatments, error, total
-two-way: treatments, blocks, error, total
SOS
Sum of squares aka variation
one-way: SST, SSE, SS total
two-way: SST, SSB, SSE, SS total
DF
Degrees of freedom
-one-way:
treatments: k-1
error: n-k
total: n-1
-two-way:
treatments: k-1
blocks: b-1
error: (k-1)(b-1)
total: n-1
*k=total # treatments
*b=total # blocks
*n= total # of observations
SST (treatment variation)
the sum of the squared differences between each treatment mean and the overall mean
-Aka "explained variation" because the variation is explained by the factor
-variation BETWEEN the treatment means
SSE (random variation)
the sum of the squared differences between each observation and its treatment mean
-Aka "unexplained variation" because the variation is due to random error/chance
-variation WITHIN the treatment means
SS total (total variation)
sum of the squared differences between each observation and the overall mean
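A minimal sketch verifying the partition SS total = SST + SSE on the same made-up groups as the one-way example above:

```python
import numpy as np

groups = [np.array([5, 7, 6, 8, 7]),
          np.array([9, 8, 10, 9, 11]),
          np.array([4, 6, 5, 5, 6])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

# Treatment variation: squared gaps BETWEEN treatment means and the grand mean
sst = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Random variation: squared gaps WITHIN each treatment
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
# Total variation: squared gaps between every observation and the grand mean
ss_total = ((all_obs - grand_mean) ** 2).sum()

print(sst + sse, ss_total)   # the two quantities agree
```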
Mean Squares
Estimate of variance: a sum of squares divided by its degrees of freedom
F-distribution
used to test whether samples are from populations with equal variances
- Has a family
- Continuous
- Asymptotic
- Positively skewed (unlike z and t, which are symmetric)
-Cannot be negative (unlike z and t)
6 Step Hypothesis Test for ANOVA
1. Null hypothesis
2. Alternative hypothesis
3. Test statistic
4. P-value
5. Decision
6. Answer
F-value (test statistic)
The ratio MST/MSE; indicates whether it is too much greater than 1 to have happened by chance
-If F ≈ 1, fail to reject Ho (the treatment and error mean squares estimate the same variance)
-If F is much greater than 1, reject Ho
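A minimal sketch of turning mean squares into the F statistic; the sums of squares are the (rounded) values from the partition sketch above, with k = 3 treatments and n = 15 observations:

```python
from scipy.stats import f

k, n = 3, 15                 # number of treatments, total observations
sst, sse = 45.73, 13.20      # rounded values from the partition sketch above

df_treat, df_error = k - 1, n - k
mst = sst / df_treat         # mean square for treatments
mse = sse / df_error         # mean square for error

f_stat = mst / mse           # much greater than 1 -> reject Ho
p_value = f.sf(f_stat, df_treat, df_error)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```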
Interaction
The effect of one factor on a response variable differs depending on the value of another factor
Scatter plot
Visual representation of the relationship between two variables (bivariate)
Correlation coefficient (r)
Measure of the strength of the linear relationship between two variables
Elements of correlation coefficient (r)
- Shows direction (positive/negative) and strength (weak/moderate/strong)
- 0 = no linear relationship
- +1 = perfect direct/positive relationship
- -1 = perfect inverse/negative relationship
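A minimal sketch computing r on made-up data with scipy.stats.pearsonr:

```python
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]

r, p_value = pearsonr(x, y)
print(f"r = {r:.3f}")   # sign gives direction, magnitude gives strength
```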
Spurious correlations
Relationship between two variables seems to be cause and effect but is actually not
Regression analysis
Technique used to find the equation of the line that best fits the data
Regression equation
Equation that expresses the linear relationship between 2 variables
Least squares method
Uses data to position the line of best fit that minimizes the sum of squares of the vertical distances between the points (actual y values) and the line (predicted y values)
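A minimal sketch of the least squares slope and intercept computed straight from the formulas, on made-up (x, y) pairs:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 4, 5, 4, 6, 7], dtype=float)

# Slope b minimizes the sum of squared vertical distances to the line
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()   # intercept: the line passes through (x-bar, y-bar)

y_hat = a + b * x             # predicted y values on the line
print(f"y-hat = {a:.3f} + {b:.3f} x")
```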
Standard error of the estimate
Measure of the dispersion around the regression line (average distance of the points from the line)
-smaller SE, better prediction
Coefficient of determination (r^2)
Proportion of variation in the dependent variable explained by the variation in the independent variable
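A minimal sketch computing the standard error of the estimate and r^2 from the residuals of the fit above (recomputed here so the snippet stands alone):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 4, 5, 4, 6, 7], dtype=float)
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()
y_hat = a + b * x

n = len(x)
residuals = y - y_hat
see = np.sqrt((residuals ** 2).sum() / (n - 2))   # dispersion around the line
r2 = 1 - (residuals ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(f"SEE = {see:.3f}, r^2 = {r2:.3f}")
```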
Global test
*for multiple regression
Hypothesis test to test the ability of the independent variables to predict the dependent variable
Homoscedasticity
Variation in the residuals (differences between the actual Y and predicted ŷ) is the same for all values of the x variables
Multicollinearity
Exists when independent variables are strongly correlated (close to +1 or -1)
-Effect: inaccurate estimates of the population slopes
-To detect it: check the VIF (variance inflation factor)
-If VIF > 10, that independent variable should be removed from the analysis
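A minimal sketch of a VIF check with statsmodels; x2 is built to be strongly correlated with x1, so its VIF (and x1's) should come out large:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=50)   # nearly a copy of x1
x3 = rng.normal(size=50)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i, name in enumerate(["const", "x1", "x2", "x3"]):
    print(name, variance_inflation_factor(X, i))   # VIF > 10 flags trouble
```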
Adjusted coefficient of determination (adjusted r^2)
Adjusts r^2 for the number of independent variables, since plain r^2 increases whenever more are added
Dummy variables
Variables with only 2 possible outcomes used to represent a qualitative variable in regression analysis
-coded 0 or 1
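A minimal sketch of 0/1 dummy coding with pandas.get_dummies on an invented column:

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "south", "north"]})

# drop_first=True keeps one dummy for the extra category (0 = north, 1 = south)
dummies = pd.get_dummies(df["region"], drop_first=True, dtype=int)
print(dummies)
```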
Interaction
The combined effect of 2+ independent variables on the dependent variable differs from the sum of their separate effects
Stepwise regression
Method used to decide which independent variables to use in a multiple regression equation
-Only independent variables with nonzero regression coefficients are kept (i.e., they must have a relationship with the dependent variable; otherwise they are useless)
forward selection method (stepwise)
Start with zero independent variables and add one at a time to the regression equation
backward elimination method (stepwise)
Start with the entire set of independent variables and eliminate one at a time
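A minimal sketch of the forward selection method: start from zero predictors and add, one at a time, whichever candidate most improves adjusted r^2 (the function and variable names here are made up):

```python
import numpy as np
import statsmodels.api as sm

def forward_select(y, candidates):
    """candidates: dict mapping name -> 1-D array of a potential predictor."""
    chosen, remaining = [], set(candidates)
    best_adj_r2 = -np.inf
    while remaining:
        # Score each remaining candidate by the adjusted r^2 it would yield
        scores = {}
        for name in remaining:
            cols = np.column_stack([candidates[c] for c in chosen + [name]])
            scores[name] = sm.OLS(y, sm.add_constant(cols)).fit().rsquared_adj
        best = max(scores, key=scores.get)
        if scores[best] <= best_adj_r2:   # no candidate helps any more; stop
            break
        best_adj_r2 = scores[best]
        chosen.append(best)
        remaining.remove(best)
    return chosen
```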