general null hypothesis definition
a statement that the value of a population parameter is equal to some value
general alternative hypothesis definition
a statement that the parameter has a value that somehow differs from the null hypothesis
general null hypothesis symbolic notation
H0: mean = expected value
general alternative hypothesis symbolic notation
HA: mean ≠ expected value
steps of a hypothesis test
1. establish null and alt. hypotheses
2. select a significance level
3. select an appropriate statistical test
4. collect sample and summarize the data into a test statistic
5. decide whether the result is statistically significant based on critical and/or p-value
alpha level definition
-aka significance level
-the probability value used as the cutoff for determining when the evidence is significant against the null hypothesis
-the probability of mistakenly rejecting the null hypothesis when it is true (type 1 error)
-alpha = 1 - confidence level
-commonly 0.01, 0.05, and 0.10
test statistic definition
-a value used in making a decision about the null hypothesis
-found by converting a sample statistic to a score with the assumption that the null hypothesis is true
-different for each statistical test used
-calculated by technology when running the particular statistical test
p-value definition and usage
-tells you how likely it is that the test statistic could have occurred under the null hypothesis
-the probability of getting a value of the test statistic that is at least as extreme as the actual calculated test statistic, assuming that the null hypothesis is true
-used to decide if a test is statistically significant
-p-value > alpha: fail to reject null hypothesis
-p-value < alpha: reject the null hypothesis
one-sample t-test in R
-t.test(x, mu, alternative)
-x = sample values
-mu = known population mean or value of comparison
-alternative = "less", "greater", or "two.sided"
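A minimal sketch of the call, using made-up sample values and a hypothetical comparison value of 50:
scores <- c(48, 52, 51, 49, 53, 47, 50, 54)          # hypothetical sample values
t.test(scores, mu = 50, alternative = "two.sided")   # returns t, df, and the p-value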
independent t-test assumptions, usage, and distinguishing features
-compares the mean scores of 2 different GROUPS of people or conditions
-independent groups- the measurement of an individual in one group is unrelated to measurements in the second group
-observations are independent and randomly selected
-populations are normally distributed or n > 30
-test for normality using quantile-quantile plot of dependent variable
independent t-test in R
-t.test(y ~ x, alternative) or t.test(group1, group2, alternative)
-y = dependent variable (the measured values)
-x = group (category with 2 levels)
-alternative = "less", "greater", or "two.sided"
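A minimal sketch using the tilde (formula) form, with a hypothetical data frame dat (score = measured values, group = 2 categories):
dat <- data.frame(score = c(5.1, 6.2, 5.8, 4.0, 4.5, 4.2),
                  group = c("A", "A", "A", "B", "B", "B"))   # hypothetical data
t.test(dat$score ~ dat$group, alternative = "two.sided")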
paired t-test assumptions, usage, and distinguishing features
-compares the mean scores for the same group of people on 2 different OCCASIONS
-simple random sampling is used
-the 2 groups of data are dependent
-the differences of individuals between 2 observations approximately follow a normal distribution or n > 30
-test for normality using quantile-quantile plot of difference between groups
paired t-test in R
-t.test(after, before, alternative, paired)
-after = dataset$after
-before = dataset$before
-alternative = "less", "greater", or "two.sided"
-paired = TRUE
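A minimal sketch, assuming a hypothetical data frame dataset with before and after measurements on the same individuals:
dataset <- data.frame(before = c(70, 68, 75, 80), after = c(74, 70, 78, 85))   # hypothetical paired data
t.test(dataset$after, dataset$before, alternative = "greater", paired = TRUE)
# the difference tested is after - before, in the order the samples are entered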
generic independent t-test null and alternative hypotheses
-H0: mean(difference) = 0
-HA: mean(difference) < or > 0
generic paired t-test null and alternative hypotheses
-H0: mean(difference) = 0
-HA: mean(difference) < or > 0
what is the difference in using a comma or a tilde to separate the data sample inputs in R
-comma: the difference is computed as x - y, in the order the samples are entered
-tilde: groups are taken in alphabetical (factor level) order, so the difference is first level minus second level
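A small sketch of both forms on hypothetical vectors (treatment and control are made-up names):
treatment <- c(5.1, 6.2, 5.8); control <- c(4.0, 4.5, 4.2)   # hypothetical values
t.test(treatment, control)        # comma: difference is treatment - control (order entered)
scores <- c(treatment, control)
group  <- c(rep("treatment", 3), rep("control", 3))
t.test(scores ~ group)            # tilde: groups taken in alphabetical order, so control - treatment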
chi-squared goodness of fit test variables, purpose, and requirements
-counts of categorical variables (frequencies)
-frequency tables (2 columns)
-used to test the hypothesis that an observed frequency distribution fits some expected distribution
-the sample data consists of frequency counts for each of the different categories
-the data are randomly selected
-for each category, the expected frequency is at least 5
chi-squared tests of independence variables, purpose, and requirements
-counts of categorical variables (frequencies)
-contingency tables (3+ columns) which consist of frequency counts of categorical data corresponding to 2 different variables
-test whether or not the row and column variables are independent
-the sample data are randomly selected
-the sample data are represented as frequency counts in a two-way table
-for every cell in the contingency table, the expected frequency (E) is at least 5
chi-squared distribution basic features
-continuous distribution
-minimum value of 0
-right-skewed
-shape determined by degrees of freedom (# of categories - 1)
-mean is the same as the degrees of freedom
generic chi-squared goodness of fit null and alternative hypotheses
-H0: the frequency counts agree with the expected distribution
-HA: the frequency counts do not agree with the expected distribution
generic chi-squared tests of independence null and alternative hypotheses
-H0: the row and column variables are independent (e.g., success or failure is independent of treatment type)
-HA: the row and column variables are dependent (e.g., success or failure depends on treatment type)
chi-squared goodness of fit test in R
-chisq.test(x, p)
-x = a vector with the observed values
-p (optional) = a vector of probabilities, if the expectation is NOT that all probabilities are equal
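A minimal sketch with hypothetical observed counts for four categories:
observed <- c(20, 30, 25, 25)                      # hypothetical frequency counts
chisq.test(observed)                               # all categories expected to be equally likely
chisq.test(observed, p = c(0.4, 0.3, 0.2, 0.1))    # expected proportions supplied explicitly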
making a contingency table in R
-matrix(x, ncol)
-x = a vector with the observed values, entered one column at a time, starting with the left-most column
-ncol = number of columns
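A small sketch building a 2 x 2 table from hypothetical counts, entered one column at a time:
counts <- matrix(c(30, 10, 20, 20), ncol = 2)      # column 1 = treatment A, column 2 = treatment B
rownames(counts) <- c("success", "failure")        # labels are optional, added for readability
colnames(counts) <- c("treatmentA", "treatmentB")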
chi-squared tests of independence in R
-chisq.test(x)
-x = a matrix with the observed values
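Continuing the sketch above, the matrix of observed counts is passed straight to chisq.test:
result <- chisq.test(counts)   # counts is the matrix built above
result$expected                # expected frequencies; check that each is at least 5
result$p.value                 # compare to alpha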
calculating expected values for a chi-squared test of independence
E = (row total * column total) / (grand total)
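-for example (made-up numbers): row total = 40, column total = 50, grand total = 200, so E = (40 * 50) / 200 = 10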
what are the main 2 problems of performing multiple t-tests?
-time consuming
-increase false positive probability
generic ANOVA null and alternative hypotheses
-H0: the means of all groups are equal
-HA: at least one group's mean is different
changes in variance effect on the F-statistic
-F = (variance between groups) / (variance within the groups)
-F = (n * s^2 of the group means) / (s^2 pooled)
-greater variance between groups increases F
-greater variance within the groups decreases F
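A rough sketch of the idea with hypothetical equal-sized groups, computing F by hand from the two variance estimates:
g1 <- c(4, 5, 6); g2 <- c(7, 8, 9); g3 <- c(5, 6, 7)   # hypothetical groups, n = 3 each
n <- 3
between <- n * var(c(mean(g1), mean(g2), mean(g3)))    # n * variance of the group means
within  <- mean(c(var(g1), var(g2), var(g3)))          # pooled within-group variance (equal n)
between / within                                       # the F statistic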
ANOVA assumptions
-random and independent observations
-assumption of normality or n > 30
-test normality using quantile-quantile plots
-assumption of homogeneity of variance
-test homogeneity of variance using Bartlett's test
Bartlett's test in R
-bartlett.test(y ~ x)
-y = the dependent variable (numbers)
-x = the independent variable (category names)
-P < 0.05 indicates that the groups do NOT have the same variance
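A minimal sketch, assuming a hypothetical data frame dat with a numeric score column and a categorical group column:
dat <- data.frame(score = c(4, 5, 6, 7, 8, 9, 5, 6, 7),
                  group = factor(rep(c("A", "B", "C"), each = 3)))   # hypothetical data
bartlett.test(dat$score ~ dat$group)   # p < 0.05 suggests the group variances differ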
ANOVA in R
-Bartlett's test
-aov(y ~ x)
-y = dependent variable (numbers)
-x = independent variable (category names)
-save ANOVA as an object
-summary(ANOVA object)
-Tukey's Honest Significant Difference test
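A minimal sketch of that workflow, continuing with the hypothetical dat data frame from the Bartlett's test sketch above:
fit <- aov(score ~ group, data = dat)   # y ~ x form: score = dependent, group = independent
summary(fit)                            # overall F statistic and p-value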
Tukey's Honest Significant Difference Test in R (post hoc)
-TukeyHSD(ANOVA object)
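Continuing with the ANOVA object saved in the sketch above:
TukeyHSD(fit)   # pairwise comparisons between group means, with adjusted p-values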
Pearson correlation distinguishing features and assumptions
-provides information on the strength and direction (positive or negative) of a linear relationship between 2 variables
-no causation inferred
-effects on variables are symmetric
-data are independent and random
-variables are quantitative
-both variables are normally distributed
-correlation coefficient alone cannot tell us if results are significant
Linear regression distinguishing features and assumptions
-provides an explicit formula for a line that can be used to predict values of one variable based on the other
-provides a robust measure of "fit"
-only accounts for uncertainty in the dependent variable
-often interpreted as one variable causing an effect on the other
-data are random and independent
-residuals are normally distributed
-homogeneity of variance
-check assumptions at the end of test
Pearson correlation coefficients
-a correlation exists between 2 variables when the values of one variable are somehow associated with the values of the other variable
-a linear correlation exists between 2 variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line
-r is always between -1 and 1
-r measures the strength of a linear relationship
-r is sensitive to outliers
generic Pearson correlation test null and alternative hypotheses
-H0: rho = 0
-HA: rho ≠ 0
Pearson correlation test in R
-cor.test(x, y)
--both variables need to be numeric vectors of equal length
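A minimal sketch with two hypothetical numeric vectors of equal length:
x <- c(1.2, 2.3, 3.1, 4.8, 5.0)   # hypothetical measurements
y <- c(2.0, 2.9, 3.8, 5.1, 5.5)
cor.test(x, y)                    # reports r and the p-value for H0: rho = 0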
parameters of a linear regression model
-y = b0 + b1 * x
-b0 = the intercept
-b1 = the slope
residuals are the vertical distances of the data to the value predicted by the line
-linear regression fits a line to the data that minimizes the squares of the residuals
interpret R2 and recognize residuals
-R2 = 1 - [sum of (yi - y^)^2 / sum of (yi - ybar)^2]
-y^ = predicted value
-ybar = mean value
-numerator = sum of squared residuals
-denominator = sum of squared distances from mean
-R2 = (variance explained by line) / (total variance)
-R2 states what percent of variance is explained by a certain variable
generic linear regression null and alternative hypotheses
-H0: b1 = 0 (slope of best fit line is 0)
-HA: b1 ≠ 0 (slope of best fit line is not 0)
linear regression in R
-lm(y ~ x)
-y = dependent variable
-x = independent variable
-save as model
-summary(model) to see p-value
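A minimal sketch, reusing the hypothetical x and y vectors from the correlation sketch above:
model <- lm(y ~ x)   # y = dependent, x = independent
summary(model)       # slope, intercept, R2, and the p-value for the slope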
checking linear regression test assumptions
-check after testing
-plot(model)
-Residual graph: want to see an even cloud across 0
-QQ graph: follow the QQline
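A short sketch of the diagnostic plots for the model fitted above:
plot(model, which = 1)   # residuals vs fitted: want an even cloud around 0
plot(model, which = 2)   # normal Q-Q plot: points should follow the line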
what kind of variables do chi-squared tests use
frequency counts of categories
what kind of variables do linear regression and correlation use
numerical variables compared to each other
what kind of data do paired t-tests, independent t-tests, and ANOVA use
population means compared to each other
what data do the z-test and one-sample t-test use
unknown population mean compared to known population mean or set value