MAX 201 Exam #2 Study Guide


What is a research question? A clear and specific question about the social world that can be answered through the collection and analysis of empirical data.

– what makes a good one? – criteria for research questions A good research question must be:

  • Empirical (not normative)

  • Generalizable

  • Clear, focused, and specific

  • Testable/answerable with data

  • Theoretically significant

  • Practically relevant

  • Original


Hypotheses (research and null – writing them, what are you assessing with each of them?)

  • Null Hypothesis (H0): You assume there is no relationship between the variables, and that the differences found in your sample are not found in the population. The analysis is approached by assuming you will fail to reject the null hypothesis.

  • Research Hypothesis (HR): This states your expectation about the relationship. It is only potentially supported if you reject the null hypothesis (meaning the effect is likely not due to chance).


Identify independent and dependent variables – what is an IV and a DV?

  • Independent Variable (IV): The predictor variable.

  • Dependent Variable (DV): The outcome variable.


Deductive research – what is the process and goal? The process involves starting with a theorized and hypothesized focal relationship (your expectation). The goal is to collect and analyze data to see if that expectation (the research hypothesis) is supported.


Types of Studies/Research Design

  • Trend Study

  • Cohort Study

  • Panel Study


Properties of a normal curve/distribution

  • The sampling distribution of means is approximately normal (per the Central Limit Theorem), even when the population itself is not, provided the samples are reasonably large.

  • Most sample means will fall close to the true population mean.

  • They are scattered around the mean as in any normal curve (e.g., 68% are within ±1 standard error from the mean).
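
A minimal sketch of where the 68% figure comes from (Python with SciPy here purely for illustration; the course itself works in SPSS): it is simply the area under the standard normal curve within ±1 standard error of the mean.

```python
from scipy import stats

# Area under the standard normal curve within +/-1 and +/-2 standard errors.
within_1 = stats.norm.cdf(1) - stats.norm.cdf(-1)
within_2 = stats.norm.cdf(2) - stats.norm.cdf(-2)

print(f"Within +/-1 SE: {within_1:.3f}")  # ~0.683
print(f"Within +/-2 SE: {within_2:.3f}")  # ~0.954
```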


Sampling – census vs. a sample

  • Sample: A selection of cases drawn from the population; the values computed from it (statistics) are used to estimate the corresponding characteristics (parameters) of the population.

  • Census: Collecting data from the entire population to observe the true effects/relationships.


Sources of error

  • Sampling Error


Population parameter vs. a statistic

  • Statistic: A value (e.g., a mean) obtained from a sample.

  • Population Parameter: The true value in the population (e.g., the population mean) that the statistic is used to estimate.


Estimating sampling means – creating a normal curve and the mean of the sampling distribution of means When you draw many samples, the means (statistics) from those samples form a normal distribution. The mean of this sampling distribution of means is equal to the mean of the population.
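
A minimal simulation sketch of this idea (Python/NumPy for illustration; the population, sample size, and number of samples are all hypothetical): drawing many samples, even from a skewed population, yields sample means that pile up around the population mean in a roughly normal shape.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical skewed "population" (e.g., incomes); its mean is the parameter.
population = rng.exponential(scale=50_000, size=100_000)

# Draw many samples and record each sample's mean (a statistic).
sample_means = [rng.choice(population, size=200, replace=False).mean()
                for _ in range(5_000)]

print("Population mean (parameter):      ", round(float(population.mean()), 1))
print("Mean of the sampling distribution:", round(float(np.mean(sample_means)), 1))
# The two values are nearly identical, and a histogram of sample_means
# looks approximately normal even though the population itself is skewed.
```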


Confidence intervals

  • What they tell you: They indicate how confident you are that the value obtained from your sample is close to the true mean of the population.

  • Interpreting the intervals: The 95% confidence interval is a range within which you are 95% confident that the population mean will fall.

  • Comparing groups: If the confidence intervals for two groups DO NOT overlap, you can conclude the groups are different. If they DO overlap, you can't say the groups are different.
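
A minimal sketch of computing and comparing two 95% confidence intervals (Python with SciPy for illustration; the group names and scores are hypothetical):

```python
import numpy as np
from scipy import stats

def ci_95(x):
    """95% confidence interval for the mean of sample x (t-based)."""
    x = np.asarray(x, dtype=float)
    return stats.t.interval(0.95, len(x) - 1, loc=x.mean(), scale=stats.sem(x))

rng = np.random.default_rng(0)
group_a = rng.normal(70, 10, size=100)   # hypothetical scores, group A
group_b = rng.normal(75, 10, size=100)   # hypothetical scores, group B

lo_a, hi_a = ci_95(group_a)
lo_b, hi_b = ci_95(group_b)
print(f"Group A 95% CI: ({lo_a:.1f}, {hi_a:.1f})")
print(f"Group B 95% CI: ({lo_b:.1f}, {hi_b:.1f})")

# If the intervals do not overlap, conclude the groups differ;
# if they overlap, we cannot say the groups are different.
print("Intervals overlap:", not (hi_a < lo_b or hi_b < lo_a))
```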


Inferential statistics – what are we doing when we use these? We are using the characteristics of the normal distribution to help us make inferences from the sample to the population.


Logic of hypothesis testing (how are we approaching an analysis in relation to the null and research hypotheses?) The approach is to assume nothing is going on (assume H0 is correct). The goal is to determine whether there is less than a 5% chance of obtaining results like yours if the null hypothesis were true.

  • If the p-value < 0.05, you reject the null hypothesis and conclude that something is going on (support for HR).

  • If the p-value > 0.05, you fail to reject the null hypothesis and continue to assume nothing is going on.
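
A tiny illustration of this decision rule (Python; the p-values are made up, and alpha = .05 matches the 5% threshold above):

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: compare the p-value to the .05 threshold."""
    if p_value < alpha:
        return "Reject H0 -> something is going on (support for HR)"
    return "Fail to reject H0 -> continue to assume nothing is going on"

print(decide(0.03))   # p < .05 -> reject the null hypothesis
print(decide(0.20))   # p > .05 -> fail to reject the null hypothesis
```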


Analyzing crosstab and Chi-square test for significance with SPSS output

  • Assessing Significance: If the p-value is less than .05, you can reject the null hypothesis, and the relationship is considered statistically significant (95% confident the differences aren't due to chance).

  • Interpreting: You first describe the overall distribution of the DV, then compare the categories of the DV across the groups (columns) to demonstrate the pattern in the data, and finally interpret the statistical significance and whether the research hypothesis is supported.


Calculating degrees of freedom in a Chi-square test The formula for the degrees of freedom (df) in a Chi-square test is:

df = (number of rows − 1) × (number of columns − 1)
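
A minimal sketch of running a Chi-square test on a crosstab (Python with SciPy standing in for SPSS output; the observed counts are hypothetical). The reported df matches the formula above:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x3 crosstab: rows = categories of the DV, columns = groups (IV).
observed = [[30, 45, 25],
            [20, 35, 45]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"Chi-square = {chi2:.2f}, p = {p:.4f}, df = {dof}")
# df = (rows - 1) x (columns - 1) = (2 - 1) x (3 - 1) = 2
# If p < .05, reject H0: the relationship is statistically significant.
```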


What is causation? – what three things do you need to show causation? To establish causation, three conditions must hold: (1) the IV and DV are empirically correlated/associated with one another; (2) the IV precedes the DV in time (time order); and (3) the relationship is non-spurious (it does not disappear once other variables are taken into account).

– what type of research design would you use to show causation (treatment/non-treatment)? You would use a design to compare treatment/non-treatment groups to estimate the effects of a treatment.


Causal links – explanations for the associations we see Causal links are the reasons or mechanisms that explain why two variables are associated, which often involves taking other factors (third variables) into account.


Multivariate analyses A multivariate analysis explores and describes the relationship between two variables while “holding constant” one or more other variables. It determines how the initial bivariate relationship changes or stays the same when a third variable is introduced.


What are spurious and non-spurious relationships?

  • Spurious Relationship: The initial statistically significant bivariate relationship becomes non-significant when a third variable is introduced. This means the original bivariate relationship was not a true relationship.

  • Non-Spurious Relationship: The introduction of a control variable (a third variable) does not make the focal relationship non-significant; the relationship remains significant.


What’s a specified relationship? A specified relationship exists when the relationship between the IV and DV is significant for some categories of the control variable, but you fail to reject the null hypothesis for other categories of the control variable.


Interpreting a multivariate crosstab/Chi-square test with SPSS output (determine if you have a spurious, specified, or non-spurious relationship using SPSS output) Start by looking at the bivariate focal relationship. Then introduce the third variable (control variable) and see how the focal relationship's p-value changes across the different categories of the control variable.
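
A minimal sketch of that two-step logic (Python with SciPy standing in for SPSS; all counts and the control-variable categories are hypothetical): test the bivariate focal relationship first, then rerun the test within each category of the control variable.

```python
from scipy.stats import chi2_contingency

# Step 1: the bivariate focal relationship (hypothetical IV-by-DV crosstab).
focal = [[40, 60],
         [70, 30]]
chi2, p, dof, _ = chi2_contingency(focal)
print(f"Bivariate focal relationship: chi-square = {chi2:.2f}, p = {p:.4f}")

# Step 2: the same crosstab within each category of the control variable.
by_control = {
    "control = low":  [[25, 30], [30, 20]],
    "control = high": [[15, 30], [40, 10]],
}
for category, table in by_control.items():
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"{category}: chi-square = {chi2:.2f}, p = {p:.4f}")

# Non-spurious: the relationship stays significant within the control categories.
# Spurious: it becomes non-significant in every category of the control variable.
# Specified: it is significant for some categories but not for others.
```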


What are confounding and mediating variables?

  • Confounding Variables (Confounders): Variables that influence both the IV and the DV, which creates the appearance of a relationship where there may not be a direct one.

  • Mediating Variables: A variable that lies on the pathway between the IV and DV (the IV affects the mediator, which in turn affects the DV), and therefore helps explain how or why the original IV and DV are related.


Interpreting a difference in means t-test using SPSS output You compare the average scores (means) of two groups on a continuous variable. You use the p-value to determine if the difference in means is statistically significant (p<.05). If significant, you reject the null hypothesis and conclude the groups' means are different.


What are we assessing with a t-test and the p-value?

  • t-Test: Assesses the null hypothesis that there is no difference between two subgroups in the population. It determines if the difference in the sample means is large enough to reject the null hypothesis.

  • p-value: Assesses how often you would get a test statistic this large when the null hypothesis is true. The goal is to find a p-value that is less than 5%.
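
A minimal sketch of a difference-in-means t-test (Python with SciPy standing in for SPSS output; the two subgroups and their scores are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_1 = rng.normal(52, 12, size=120)   # hypothetical scores, subgroup 1
group_2 = rng.normal(57, 12, size=130)   # hypothetical scores, subgroup 2

result = stats.ttest_ind(group_1, group_2, equal_var=False)
print(f"Mean 1 = {group_1.mean():.1f}, Mean 2 = {group_2.mean():.1f}")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
# If p < .05, reject the null hypothesis of no difference between the two
# subgroups in the population and conclude the means differ.
```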


Scatterplot and Correlations – what does a Pearson’s r correlation coefficient tell you? The Pearson’s r correlation coefficient is a value between −1 and +1. It tells you the extent to which two variables move or vary together in a predictable way.

Interpret a relationship (direction and strength of association) – identify the relationship on a scatterplot The correlation coefficient indicates the strength and direction of the association. The visual pattern on the scatterplot indicates whether the correlation is weak, moderate, or strong, and whether the direction is positive or negative.
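
A minimal sketch of computing Pearson's r (Python with SciPy for illustration; hours_studied and exam_score are hypothetical variables with a built-in positive relationship):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
hours_studied = rng.uniform(0, 20, size=100)                        # hypothetical IV
exam_score = 60 + 1.5 * hours_studied + rng.normal(0, 8, size=100)  # hypothetical DV

r, p = stats.pearsonr(hours_studied, exam_score)
print(f"Pearson's r = {r:.2f}, p = {p:.4f}")
# r near +1 or -1 = strong; near 0 = weak. The sign gives the direction:
# on a scatterplot these points slope upward, matching the positive r.
```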


Formula for a regression best fit line The formula for a simple linear regression line (the line of best fit) is:

Y = a + b(X)

What does each part of the equation mean?

  • Y: The Dependent Variable (DV).

  • a: The Y intercept (constant), which is the predicted value of Y when X is 0.

  • b: The regression coefficient (or slope).

  • X: The Independent Variable (IV).

What can we do with this equation? We can use it for regressions and making predictions (e.g., predicting the value of Y for a given value of X).

What is the best fit line? The best fit line is the one that has the smallest differences between the values predicted by the line and the data that were actually observed. It represents the average relationship in the data.

Interpret a regression coefficient – what does the regression coefficient tell us? The regression coefficient (b) tells us how much the dependent variable (Y) is expected to change when the independent variable (X) increases by one unit, and in which direction (positive or negative).

Regression analyses – using SPSS output to interpret a regression analysis You check the p-value. If the p-value is less than the significance threshold of .05, the relationship is statistically significant, and you can reject the null hypothesis. The sign (positive or negative) of the coefficient indicates the direction of the relationship.
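
A minimal sketch pulling the regression pieces together (Python with SciPy for illustration; x and y are hypothetical data): fit the best-fit line, read off a, b, and the p-value, and use Y = a + b(X) for a prediction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(0, 20, size=100)                  # hypothetical IV (X)
y = 60 + 1.5 * x + rng.normal(0, 8, size=100)     # hypothetical DV (Y)

fit = stats.linregress(x, y)
print(f"a (Y intercept)            = {fit.intercept:.2f}")
print(f"b (regression coefficient) = {fit.slope:.2f}")
print(f"p-value                    = {fit.pvalue:.4f}")  # < .05 -> significant

# Use the best-fit line Y = a + b(X) to make a prediction:
x_new = 10
print(f"Predicted Y when X = {x_new}: {fit.intercept + fit.slope * x_new:.1f}")
```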