Define correlation
Describes the linear association between two numerical variables
What are the two components of describing correlation?
1. Sign / direction of relationship
2. Magnitude / strength: how tightly the data points cluster
What are the three possibilities of explaining a correlation between two variables?
1. Result occurred by chance
2. One variable influences another
3. Variables are influenced by some other variable
What are the two ways in which variables can be influenced by a third variable?
1. A third variable (C) could influence both variables being examined (A and B).
2. One examined variable (A) could influence 3rd variable (C) which can influence other examined variable (B)
What is the correlation coefficient?
Describes strength and direction of linear association
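As a minimal sketch, the correlation coefficient r can be computed from paired lists of numbers: the sum of cross-products of deviations divided by the square root of the product of the two sums of squares. The function name pearson_r and the data are hypothetical; in R, the built-in cor() would normally be used instead.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient r: sum of cross-products of deviations from
    the means, divided by sqrt(sum of squares of x * sum of squares of y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be the same length, with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical example data
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0  (perfect positive line)
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative line)
```

The sign of r gives the direction and its magnitude (0 to 1) gives the strength.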
What are some caveats of correlation?
1. Correlation does NOT imply causation
2. Beware of a third variable influencing the relationship
3. Correlation coefficient is a measure of linear association, does not address non-linear relationships
4. Confidence intervals and hypothesis tests are possible, but remember that statistical significance is not the same as biological significance
What kind of ratio is correlation coefficient?
Signal / Noise
What are the assumptions for correlation test?
1. Relationship between x and y is linear
2. Frequency distributions of x and y are separately normal
3. Variance of x doesn't change with y (and vice versa). This gives a circular or elliptical cloud of points, not a funnel-shaped one.
What do you do if correlation assumptions are untrue?
Attempt to transform the data, e.g. by taking the log (or natural log) of the values.
What does the Spearman's Rank correlation do?
Measures the strength and direction of linear association between ranks of two variables
How is Spearman's rank correlation performed?
Rank each variable separately from 1 to n, giving tied values the midrank (the average of the ranks they occupy). Then calculate the correlation coefficient on the ranks; in Spearman's test it is called rs. Compare rs to the critical value in a statistical table.
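The rank-then-correlate procedure can be sketched in Python, assuming plain lists of numbers. The names midranks and spearman_rs and the data are hypothetical; in R, cor.test(x, y, method = "spearman") would normally be used. The resulting rs would still need to be compared with a critical value from a table.

```python
import math

def midranks(values):
    """Rank values from lowest (rank 1) to highest; tied values share the
    average (mid) rank of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman's rs: the ordinary correlation coefficient computed on ranks."""
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

print(midranks([10, 20, 20, 40]))                       # [1.0, 2.5, 2.5, 4.0]
print(spearman_rs([1, 2, 3, 4, 5], [1, 4, 9, 16, 100]))  # 1.0
```

The second example is monotonic but not linear: the raw values would give r below 1, while the ranks agree perfectly.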
When would you want to use Spearman's test?
When the assumptions of the correlation test fail
What is linear regression?
Method that predicts the value of one numerical variable from that of another.
How is linear regression different from correlation?
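The two variables are not treated equally: one is the explanatory variable (x) used to predict the other, the response variable (y). Correlation treats the two variables symmetrically.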
What does regression measure?
How steeply y changes with changes in x
How does linear regression try to find the best fit?
Least squares: it chooses the line that minimizes the sum of the squared vertical deviations (residuals) in y.
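A least-squares fit of a straight line y = a + b·x can be sketched as below, using the standard closed-form result that the slope is Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and that the line passes through (x̄, ȳ). The function name and data are hypothetical; in R, lm(y ~ x) would be used.

```python
def least_squares_fit(x, y):
    """Return (intercept a, slope b) of the line y = a + b*x that minimizes
    the sum of squared vertical deviations (residuals) in y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx        # slope
    a = my - b * mx      # the least-squares line passes through (x-bar, y-bar)
    return a, b

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]     # hypothetical data lying exactly on y = 1 + 2x
a, b = least_squares_fit(x, y)
print(a, b)              # 1.0 2.0
print(a + b * 3.5)       # predicted y at x = 3.5 (inside the observed range): 8.0
```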
What can the regression line equation be used for?
To predict values of y for a known x
What are the assumptions of linear regression?
1. Linear relationship between x and y
2. Frequency distribution of y values for each x value is normal
3. Variance of y doesn't change with x
4. Each measured y at a given x is a random sample from a population of y-measurements.
If the variance of y doesn't change with x, what will the cloud of points look like?
A circular cloud of points, not a funnel-shaped one
How are residuals calculated for linear regression?
Observed y - predicted y
What does the Standard Error of slope measure in linear regression?
Measure of uncertainty of sample estimate of slope
What does the confidence interval of slope measure in linear regression?
A range of plausible values for the population slope: slope ± t(critical) × SE of slope, with df = n - 2
For what part of linear regression can t-tests be done?
The slope (H0: population slope = 0; t = slope / SE of slope, with df = n - 2)
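The slope's residuals, standard error, t-test, and confidence interval can be sketched in Python as below, assuming the residual mean square MS residual = Σ(residual²) / (n - 2) and SE of slope = sqrt(MS residual / Σ(x - x̄)²). The function name and data are hypothetical, and the critical t value (3.182, two-tailed 5% with df = 3) is taken from a t table rather than computed.

```python
import math

def slope_inference(x, y, t_crit):
    """Least-squares slope with its residuals, standard error, t statistic
    (H0: slope = 0), and confidence interval slope +/- t_crit * SE.
    t_crit must be looked up for df = n - 2; it is not computed here."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed - predicted
    ms_resid = sum(r ** 2 for r in residuals) / (n - 2)       # residual mean square
    se_b = math.sqrt(ms_resid / sxx)                          # standard error of slope
    t = b / se_b                                              # test statistic for H0: slope = 0
    ci = (b - t_crit * se_b, b + t_crit * se_b)
    return residuals, b, se_b, t, ci

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 8, 10]    # hypothetical data
# 3.182: two-tailed 5% critical t for df = n - 2 = 3 (from a t table)
res, b, se_b, t, ci = slope_inference(x, y, t_crit=3.182)
print([round(r, 2) for r in res])                 # [0.2, 0.2, -0.8, 0.2, 0.2]
print(round(b, 2), round(se_b, 3), round(t, 2))   # 2.0 0.163 12.25
print(tuple(round(v, 2) for v in ci))             # (1.48, 2.52)
```

The residuals sum to zero, as they always do for a least-squares line with an intercept.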
What does an ANOVA for linear regression compare?
The variance explained by the regression (predicted values of y vs. the mean value of y) with the variance of the residuals: F = MS regression / MS residual
What should a plot of residuals look like?
Roughly symmetric with equal variance above and below the y = 0 line, with little to no curvature from left to right along the x-axis
What can residual plots be used for?
They help in assessing assumptions
What kind of curve does normal distribution have?
A bell shaped curve
What are mean, median, and mode in normal distribution?
They are all the same
What can normal distribution be fully described by?
Its mean and standard deviation
What do most non-parametric tests do to the data?
They rank the data
Can you extrapolate based on linear regression?
No, extrapolation is not valid
Why is extrapolation not valid?
You don't know what data might do beyond observed values.
Narrower prediction interval means __ precision?
higher
Wider prediction interval means __ precision?
lower
What are the two ways in which predictions can happen for linear regression?
1. Predict mean y for given x
2. Predict single y for given x
What plot can be done to detect non-normality and unequal variance?
Residual plot
How is a residual plot done?
(y - ŷ) vs x
ŷ = predicted value
What three features should a residual plot have?
1. Roughly symmetric, equal variance above and below y = 0 line.
2. Little to no curvature from left to right along x-axis
3. Approximately equal variance of points above and below line at all values of x.
What is the R^2 value?
Fraction of variation in y that is "explained" by x.
What is the equation for R^2?
R^2 = SS regression / SS total
SS = sum of squares
How is SS total calculated?
Formula: SSTotal = Σ(yi - ȳ)²
yi: The observed values of the dependent variable.
ȳ: The mean of the observed values of the dependent variable.
How is SS regression calculated?
SSregression = Σ(ŷi - ȳ)²
ŷi: The predicted values of the dependent variable.
ȳ: The mean of the observed values of the dependent variable.
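A minimal sketch of R^2 from the two sums of squares above, fitting the least-squares line first. The same sums of squares also give the regression ANOVA F-ratio (MS regression / MS residual, with df 1 and n - 2). The function name and data are hypothetical.

```python
def r_squared_and_f(x, y):
    """R^2 = SS regression / SS total for a least-squares line, plus the
    regression ANOVA F = MS regression / MS residual (df 1 and n - 2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    y_hat = [a + b * xi for xi in x]                        # predicted values
    ss_total = sum((yi - my) ** 2 for yi in y)              # Σ(yi - ȳ)²
    ss_regression = sum((yh - my) ** 2 for yh in y_hat)     # Σ(ŷi - ȳ)²
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    r2 = ss_regression / ss_total
    f = (ss_regression / 1) / (ss_residual / (n - 2))
    return r2, f

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 8, 10]    # hypothetical data
r2, f = r_squared_and_f(x, y)
print(round(r2, 3), round(f, 1))   # 0.98 150.0
```

For simple linear regression, this F equals the square of the t statistic for the slope.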
Why is normal distribution used as an assumption for many statistical test?
It is very common in nature
The normal distribution is ________ around its mean.
symmetric
How are 2/3 of random draws from the normal distribution related to the mean?
About 2/3 (≈68%) of random draws are within one standard deviation of the mean.
How are 95% of random draws from the normal distribution related to the mean?
About 95% of random draws are within two (more precisely 1.96) standard deviations of the mean
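The 2/3 and 95% rules can be checked with Python's standard-library statistics.NormalDist. This is a sketch: the standard normal (mean 0, SD 1) stands in for any normal distribution, since the proportions depend only on distance from the mean in SD units.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
within_1sd = z.cdf(1) - z.cdf(-1)
within_2sd = z.cdf(2) - z.cdf(-2)
print(round(within_1sd, 4))         # 0.6827
print(round(within_2sd, 4))         # 0.9545
print(round(z.inv_cdf(0.975), 2))   # 1.96 SDs bound exactly the central 95%
```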
What do the characteristics of the normal distribution allow for?
Ready assessment of probability and statistical analysis.
Many statistical tests assume that the data, or the error associated with the data, are ___________?
Normally distributed
What are the three possibilities when assumptions are untrue?
1. Ignore assumptions
2. Transform data
3. Non-parametric tests
What are non-parametric tests?
Tests that do not require one to assume a certain distribution for raw data.
What are the three common assumptions for statistical analysis?
1. Random samples
2. Normality
3. Equal variance
Which assumptions are tested?
Normality and equal variance
What plot(s) can examine normality?
Histograms
Q-Q plots
What does normal data look like in a histogram?
Roughly symmetric and bell-shaped; normal data aren't skewed
What does normal data look like in a Q-Q plot?
Data should follow a relatively straight line, but can wiggle a little.
What plot(s) can be used to assess equal variance?
Boxplots
What does data with equal variance look like on a boxplot?
Interquartile Ranges (boxes) are similar size
Whisker lengths are similar
What hypothesis test can be used to assess normality?
Shapiro-Wilk test
What is the R command for a Shapiro-Wilk test?
shapiro.test()
What is the null hypothesis for a Shapiro-Wilk test?
The data set is normally distributed.
For a p-value less than the significance level (0.05) in a Shapiro-Wilk test, what would you conclude?
Reject null hypothesis (that data is normally distributed) and conclude that the data set is not normally distributed
Most non-parametric data tests use____
ranks
How is data ranked in most non-parametric tests?
From lowest to highest. Lowest gets rank 1, next lowest gets rank 2....etc
What does a Mann-Whitney test do?
Compares central tendencies of two groups using ranks
What is a Mann-Whitney test called in R?
A Wilcoxon test (Wilcoxon rank-sum test)
How is a Mann-Whitney test performed?
Data from both groups are ranked together in order, and the ranks for all individuals in each group are summed.
What does U stand for?
The test statistic of the Mann-Whitney U test
What is U?
Number of times an individual from population one has a lower rank than an individual from population two, out of all pairwise comparisons.
For a two-tailed Mann-Whitney U test, how many Us are calculated?
Two: U1 and U2 = n1n2 - U1
If more than one U is calculated, which U is picked?
The larger of U1 and U2
What is the equation for U?
U1 = n1n2 + n1(n1 + 1)/2 - R1
R1 = sum of all ranks for first group
n1 = sample size of group 1
n2 = sample size of group 2
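The U formula above can be sketched in Python and cross-checked against the pairwise-count definition of U1. This sketch assumes no tied values (ties would need midranks) and uses hypothetical data and function names; in practice wilcox.test() in R handles the ranking and test statistic, and U would still be compared with a critical value from a table.

```python
def mann_whitney_u(group1, group2):
    """Return (U1, U2, U) for a two-tailed Mann-Whitney U test, where
    U1 = n1*n2 + n1(n1 + 1)/2 - R1, U2 = n1*n2 - U1, U = the larger.
    Assumes no tied values (ties would need midranks)."""
    pooled = sorted(group1 + group2)
    if len(set(pooled)) != len(pooled):
        raise ValueError("tied values need midranks; not handled in this sketch")
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # rank 1 = lowest value
    n1, n2 = len(group1), len(group2)
    r1 = sum(rank[v] for v in group1)                 # R1: rank sum of group 1
    u1 = n1 * n2 + n1 * (n1 + 1) // 2 - r1            # n1(n1 + 1) is always even
    u2 = n1 * n2 - u1
    return u1, u2, max(u1, u2)

def u1_by_counting(group1, group2):
    """U1 by definition: pairs in which the group-1 value ranks lower."""
    return sum(1 for a in group1 for b in group2 if a < b)

g1 = [1.1, 2.3, 3.0]        # hypothetical data
g2 = [2.0, 4.5, 5.2, 6.1]
print(mann_whitney_u(g1, g2))   # (10, 2, 10)
print(u1_by_counting(g1, g2))   # 10
```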
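If U (the larger of U1 and U2) is greater than or equal to the critical value, what happens?
The null hypothesis can be rejected. (Tables built on the smaller U reverse this: reject when it is less than or equal to the critical value.)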
Ranks and test statistics are all done in R using the ___?
Wilcoxon test
What is the R command for a wilcoxon test?
wilcox.test()
What are the assumptions of the Mann-Whitney U test (Wilcoxon test)?
Both samples are random samples
Both populations have the same shape of distribution
What are the assumptions of a correlation test?
1. Linear
2. X and Y normally distributed
3. Constant variance
4. Random samples
What are the options when assumptions for correlation test fail?
Transformation of data, Spearman's rank correlation
When do you use Spearman's rank test?
When data violates normality assumption
When data has outliers
What is an R^2 value and how does it relate to ANOVA?
R^2 describes the fraction of variance in a response variable that is accounted for by an explanatory variable. The regression ANOVA tests whether that relationship is statistically significant.
Describe the general approach to problems
1. Determine objective
2. What does the data look like? Plot it and evaluate whether the assumptions are true for the test you want to do.
3. Are the variables normally distributed? Does the variance change, or is it equal?
What is the general approach to regression problems?
Same as the general approach to problems, with these additional steps:
1. If assumptions aren't true, try to transform the data
2. If the data can't be transformed, use a non-parametric test (Spearman's or Wilcoxon)
How do you know whether to use spearman's or wilcoxon test?
If you want to know if two variables are related, use Spearman's.
If you want to know if two groups are different, use Wilcoxon.