Logistic Regression
Nonlinear regression model that relates a set of explanatory variables to a dichotomous dependent variable.
Logistic Regression is used when?
When the dependent variable is categorical and binary
Linear regression's predicted Y can exceed what range?
0 & 1 (predictions can fall below 0 or above 1)
Logistic Regression predicted y lies within what range?
0 & 1
P =
Occurrences/Chances
Odds =
Occurrences/Non-occurrences
Odds =
Ratio of the probability of an event and probability of the event not occurring.
Logged odds
The natural log of the odds (the logit); a logarithm expresses a number as an exponent of a constant or base
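The cards above chain together: probability → odds → logged odds. A minimal Python sketch of those conversions (the function names are illustrative, not from the card set):

```python
import math

def odds(p):
    """Odds: ratio of the probability of an event to the probability of it not occurring."""
    return p / (1 - p)

def log_odds(p):
    """Logged odds (the logit): the natural log of the odds."""
    return math.log(odds(p))

print(odds(0.8))      # a probability of 0.8 gives odds of about 4-to-1
print(log_odds(0.5))  # 0.0: even odds give a logit of zero
```

Logistic regression models the logged odds of the dichotomous outcome, which is why its predicted probabilities stay between 0 and 1.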
Correlation Coefficient - Pearson’s r
-1 to 1
If something’s correlation coefficient is 1 it is?
Perfectly correlated and positive
If something’s correlation coefficient is -1 it is?
Perfectly correlated and negative
If something’s correlation coefficient is 0 it is?
Unrelated
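The three cases above (r = 1, r = -1, r = 0) can be checked with a small Python sketch of Pearson's r, computed from its definition (covariance over the product of the standard deviations; the helper name is illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfectly correlated and positive
print(pearson_r([1, 2, 3], [6, 4, 2]))  # perfectly correlated and negative
```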
Bivariate Regression conditions
Gives effect of one independent variable on a dependent variable
Gives magnitude and direction
Magnitude
How much change we see in the dependent variable for a one-unit change in the independent variable
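In bivariate regression, the slope carries both pieces of information on the cards above: its size is the magnitude and its sign is the direction. A minimal OLS sketch (function name illustrative):

```python
def bivariate_regression(xs, ys):
    """OLS slope and intercept: the slope gives the magnitude and direction of
    the effect of the independent variable on the dependent variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Data generated from y = 3 + 2x, so the estimated effect (slope) is 2
print(bivariate_regression([0, 1, 2, 3], [3, 5, 7, 9]))  # (2.0, 3.0)
```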
Multiple regression
Gives effect of multiple independent variables on the dependent variable while controlling for the effects of the other independents.
Gives effect of each independent variable while considering the effect of other independents.
Gives the effect of the optimal combination of the independent variables on the dependent variable
Assertion of causality comes when
Temporal sequence
Relationships between variables
No plausible alternate explanations
Multicollinearity in multiple regression models
Situation in which independent variables are so strongly related that it is difficult to estimate the partial effect of each independent variable on the dependent variable
Multicollinearity becomes more of a concern as we increase the number of independent variables
Multicollinearity in multiple regression models
The problem is related to the strength or degree of the relationship between the independent variables.
Regression is robust and able to partial out the shared variance of the independent variables
The independent variables are expected to be related to some degree
Interaction effect
Occurs when the effect of an independent variable cannot be fairly summarized by a single partial effect
The effect varies depending on the value of another independent variable in the model
Cross-tabulation rules
Interpret by comparing percentages across columns at the same value of the dependent variable
Always calculate percentages within categories of the independent variable, never within categories of the dependent variable
Type one error occurs when
Rejecting the Ho when it is true and should not be rejected
Chi Square does which of the following?
Takes into account tabular data
Begins with cross-tabulation
Determines whether the observed distribution of cases departs significantly from what we would expect to find if the Ho were correct
Degrees of Freedom are
Maximum logically independent values
If the value of Chi Square is greater than the critical value
It is possible to reject the Ho
If the value of Chi Square is less than the critical value
Fail to reject the Ho
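The chi-square procedure on the cards above (start from a cross-tabulation, compare observed counts with what Ho would predict, check degrees of freedom) can be sketched in Python; the function name and example table are illustrative:

```python
def chi_square(observed):
    """Chi-square statistic for a cross-tabulation: compares observed cell
    counts with the counts expected if Ho (no relationship) were correct."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # degrees of freedom
    return stat, df

stat, df = chi_square([[20, 30], [30, 20]])
print(stat, df)  # with df = 1, the .05 critical value is 3.841
```

Here the statistic (4.0) exceeds the critical value, so it is possible to reject the Ho.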
Symmetrical
Does not distinguish between dependent and independent variable
Returns the same result no matter which is used
Take on the same value irrespective of which variable is used to explain a relationship
Asymmetrical
Value varies according to which variable is used to explain a relationship
Distinguishes between the dependent and independent
Returns a different result depending on which is used as the dependent and independent variable
Symmetrical measures of association
Gamma -1 to 1 - ordinal level
Cramer's V 0 to 1 - nominal level
Asymmetrical measures of association
Lambda 0 to 1 - nominal level
Somers' d -1 to 1 - ordinal level
Proportional Reduction in Error
Bounded by 0 and 1
Lambda is a PRE measure (Cramer's V is chi-square-based, not PRE)
PRE gauges the strength of relationships
Distribution
Shows possible values for a variable and how they occur
How the data are distributed across the range of possible values
Central limit theorem
Makes statistical inference possible
Allows us to draw conclusions about a population based on a sample mean
The means from an infinite number of samples drawn from a population are normally distributed and have a mean equal to the population mean. The standard deviation is equal to the population standard deviation divided by the square root of sample size.
1 standard deviation contains what %
68
2 standard deviation contains what %
95
3 standard deviations contain what %
99.7
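The 68-95-99.7 percentages above come from the normal distribution that the central limit theorem guarantees for sample means. They can be recovered in Python from the normal CDF (the fraction within k standard deviations is erf(k/√2); the function name is illustrative):

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

print(round(fraction_within(1), 4))  # 0.6827 -> "68%"
print(round(fraction_within(2), 4))  # 0.9545 -> "95%"
print(round(fraction_within(3), 4))  # 0.9973 -> "99.7%"
```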
T distribution was created by
William Gosset
Null hypothesis
Converse of hypothesis we are trying to support
Symbol for hypothesis
Ho
Alternative hypothesis
What we are trying to support
Alternative hypothesis symbol
Ha
Type two error
Failing to reject the Ho when it should be rejected
Levels of statistical significance come from
P Values
The lower the P value the more or less confidence
More
What are the accepted P Values
.10
.05
.01
If p <.10 what confidence level
90% confidence
If p <.05 what confidence level
95% confidence
If p <.01 what confidence level
99% confidence
If p <.001 what confidence level
99.9% confidence
If software reports p = .000 what confidence level
At least 99.9% confidence (a reported .000 means the p-value is too small to display, not literally zero)
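The p-value-to-confidence mapping on the cards above is just a set of conventional thresholds, which can be written out as a small Python function (the function name is illustrative):

```python
def confidence_level(p):
    """Conventional confidence level implied by a p-value:
    the lower the p-value, the more confidence in rejecting Ho."""
    if p < .01:
        return "99%"
    if p < .05:
        return "95%"
    if p < .10:
        return "90%"
    return "not significant at conventional levels"

print(confidence_level(0.03))   # 95%
print(confidence_level(0.001))  # 99%
```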
Why are difference of mean tests used
Used to determine if there is a statistically significant relationship between the means of two variables.
Z test are used for
Population data
T test are used for
Sample data
T tests can be used with
Paired samples
Independent samples
One sample
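Of the t-test variants listed above, the one-sample case is the simplest to sketch: the t statistic is the gap between the sample mean and the hypothesized mean, divided by the standard error estimated from the sample (which is why it is a t test, not a z test; the function name is illustrative):

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """One-sample t statistic: (sample mean - hypothesized mean) over the
    standard error, using the sample standard deviation."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error

print(one_sample_t([5, 6, 7, 8, 9], 7))  # 0.0: the sample mean equals mu0
```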
Interval and ratio levels of measurement and measures of central tendency
Mode median mean
Ordinal level of measurement and measures of central tendency
Mode median
Nominal level of measurement and measures of central tendency
Mode
Measures of dispersion
Range
Deviations from mean
Variance
Standard deviation
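All four measures of dispersion listed above are available directly in Python's standard library (the example data set is illustrative; `pvariance`/`pstdev` treat the data as a full population):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)

data_range = max(data) - min(data)           # range: highest minus lowest value
deviations = [x - mean for x in data]        # deviations from the mean (sum to zero)
variance = statistics.pvariance(data)        # average squared deviation from the mean
std_dev = statistics.pstdev(data)            # square root of the variance

print(data_range, variance, std_dev)  # 7 4 2.0
```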