COMS 312 Final Study Guide Notes
Content Analysis
- Definition: Quantitative study of human communication.
- Involves categorizing observed items or units.
- Why it’s useful:
- Efficient for large datasets.
- Provides context.
- Connects experiments and content analysis.
- Myths:
- It's easy.
- Applies to all content examinations.
- Requires no special preparation (coder training, coding rules, and check-ins are needed).
- Only for academic use.
- Goals:
- Generality (theoretical relevance).
- Descriptions (problem/phenomenon).
- Explanation (inferences about creators).
- Uses:
- Compare message prevalence over time.
- Compare message content and real life.
- Analyze message creators.
- Steps:
- Develop a testable proposition.
- Review literature.
- Develop hypotheses/research questions.
- Develop coding instructions/classification system.
- Unit of analysis: Coded or counted discrete thing.
- Sample codebook.
- Define population, sampling units.
- Code messages.
- Intercoder reliability: Agreement among coders.
- Cohen’s Kappa (nominal).
- Scott’s Pi (two coders).
- Krippendorff’s alpha (two or more coders, any measurement type).
- Analyze.
- Interpret.
- IR = (P_AO − P_AE) / (1 − P_AE), where P_AO is observed agreement and P_AE is expected (chance) agreement.
- Strengths:
- Experiments: Causal mechanisms can be determined.
- Surveys: Generalizable, good for variables we cannot manipulate.
- Content Analysis: Not tampering with environment.
- Weaknesses:
- Experiments: Hard to generalize.
- Surveys: Causality cannot be determined absolutely; we are working with perceptions.
- Content Analysis: Hard to determine source motivations or causal effects on human behavior.
- Intercoder/interrater reliability: Measure of agreement among coders, ensuring coding scheme isn't limited to individual opinions.
- Threats to reliability: Time and resource constraints.
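The intercoder agreement statistics above can be sketched in Python. This is a minimal illustration of Cohen’s Kappa for two coders assigning nominal categories; the codes below are invented for the example.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's Kappa for two coders assigning nominal categories."""
    n = len(coder_a)
    # Observed agreement: proportion of units both coders labeled the same.
    p_ao = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement: chance agreement from each coder's marginal distribution.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_ae = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (p_ao - p_ae) / (1 - p_ae)

# Hypothetical codes from two coders on eight message units.
a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
b = ["pos", "pos", "neg", "pos", "pos", "neg", "pos", "neg"]
print(round(cohens_kappa(a, b), 3))
```

Scott’s Pi has the same (P_AO − P_AE)/(1 − P_AE) form but computes expected agreement from the coders’ pooled category distribution rather than each coder’s separate marginals.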
Sampling
- Definition: Selecting events (often people) from a population.
- Population: Universe of events.
- Essentials:
- Sample reflects population variance.
- Generalizability.
- Sampling error: The degree to which the sample differs from the population.
- Confidence Interval: Range likely to include population parameter.
- Probability Sampling: Equal chance of selection.
- Requires a sampling frame.
- Simple random sample: every unit has an equal chance of being selected.
- Stratified random sample: represents known proportions of the population (e.g., race, gender, age).
- Cluster sampling: sampling in successive stages (e.g., randomly select clusters, then sample units within each).
- Non-random sampling (nonprobability):
- Simple convenience sampling (volunteer sampling, exclusion/inclusion criteria).
- Quota sampling: Nonrandom stratified sampling.
- Purposive/known group sampling: Groups with known characteristic.
- Snowball sampling: Participants help recruit.
- Problems with random samples:
- Sometimes impossible.
- Require resources.
- Definition of population.
- Problems with nonrandom sampling:
- Greater bias.
- Limits conclusions.
- Not representative.
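The contrast between simple random and stratified random sampling can be sketched with the standard library; the sampling frame and strata below are invented for illustration.

```python
import random

random.seed(42)  # reproducible draw for the example

# Hypothetical sampling frame: 100 people labeled by a known stratum.
frame = [{"id": i, "gender": "F" if i % 2 else "M"} for i in range(100)]

# Simple random sample: every unit has an equal chance of selection.
srs = random.sample(frame, k=10)

# Stratified random sample: draw randomly within each stratum so known
# portions of the population are represented in the sample.
strata = {"F": [p for p in frame if p["gender"] == "F"],
          "M": [p for p in frame if p["gender"] == "M"]}
stratified = [p for group in strata.values() for p in random.sample(group, k=5)]

print(len(srs), len(stratified))  # 10 10
```

Quota sampling mirrors the stratified logic, but fills each stratum nonrandomly (e.g., by convenience).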
Qualitative Research
- Qualitative: soft, flexible, subjective, political, specific, investigative/exploratory, grounded.
- Establish Credibility: triangulation and member validation.
- What counts as data? Researcher interpretation. Subjective valuing, contingent accuracy.
- Threats to credibility: inaccuracy/incompleteness, improper interpretation, selection bias.
- Field Interviewing: Semi-directed discourse to uncover participant’s point of view
- Steps:
- Conceptualize study and design research questions.
- Design the interview.
- Interview guide (may have probing questions).
- Sufficiency and saturation.
- Conduct interviews.
- Transcription.
- Analysis.
- Verification.
- Final description of analysis.
- Focus Group: Facilitator-led group discussion for data collection.
- Facilitator is key.
- Process:
- Create a focus group schedule
- Select participants.
- Conduct focus groups.
- Analyze data.
- Final description and analysis.
- Example: Colbert Report and Daily Show focus groups.
- Collecting narratives: stories people tell face to face, in surveys, or in conversation.
- Ethnography: Study and representation of people and their interactions. Combines interviews, observations, collecting narratives, and immersion.
Descriptive Statistics
- Why perform statistics?
- Descriptive statistics.
- Inferential statistics.
- Measures of central tendency.
- Measures of variability/dispersion.
- Importance of unbiased estimators.
- Logic of hypothesis testing
Distributions
- A distribution is a way of organizing data to show how frequently each value occurs.
- We are concerned with distribution because it helps us understand:
- Whether data is normally distributed.
- If results can be generalized.
- Whether results are due to chance or not.
- The standard normal curve is a bell-shaped, symmetric distribution where:
- The mean = 0
- The standard deviation = 1
- Predictable percentages of scores fall within 1, 2, and 3 standard deviations (68%, 95%, 99.7%).
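The 68/95/99.7 rule can be checked with `statistics.NormalDist` from Python’s standard library:

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # the standard normal curve

# Proportion of scores within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    pct = z.cdf(k) - z.cdf(-k)
    print(f"within ±{k} SD: {pct:.1%}")
```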
Kurtosis and Skewness
- Kurtosis: Refers to the "peakedness" of a distribution.
- Leptokurtic: Tall and thin.
- Platykurtic: Flat and wide.
- Mesokurtic: Normal kurtosis (the standard shape).
- Skewness: Refers to the asymmetry in the distribution.
- Positive skew: Tail is on the right.
- Negative skew: Tail is on the left.
- Probability helps us determine how likely it is that an observed result happened by chance.
- The lower the probability, the more likely the effect is real.
- A common cutoff is p < .05, which means we accept less than a 5% chance the result is due to randomness.
- We use distributions (like the standard normal curve) to understand where a score falls and how probable that score is.
- For example, scores in the tails of the distribution are less probable and may indicate a significant result.
Significance Level
- The significance level (alpha) is the threshold for deciding whether an effect is real.
- Commonly set at 0.05, meaning we accept a 5% chance of being wrong if we reject the null hypothesis.
Critical Region
- The critical region is the part of the distribution where, if a test statistic falls there, we reject the null hypothesis.
- It marks the most extreme values of the distribution (e.g., the most extreme 5% when alpha = .05).
Critical Value
- The critical value is the boundary score that separates the critical region from the rest.
- It depends on the type of test and the alpha level.
- If your test statistic is more extreme than the critical value, you reject the null hypothesis.
Errors in Hypothesis Testing
- Type I Error (α error):
- Rejecting a true null hypothesis.
- Saying there’s an effect when there isn’t.
- Controlled by alpha (usually .05).
- Type II Error (β error):
- Failing to reject a false null hypothesis.
- Saying there is no effect when there actually is one.
- Can be reduced with larger sample sizes or stronger experimental design.
Chi Square Test
- When to Use:
- When comparing frequencies or proportions between categorical variables.
- Commonly used in contingency tables (e.g., gender vs. voting preference).
- Goal:
- To test whether there is a significant association or independence between two categorical variables.
- Assumptions:
- Observations are independent.
- Categories are mutually exclusive.
- Expected frequency in each cell is typically ≥ 5 for validity.
- Limitations:
- Cannot be used with small sample sizes (due to expected count assumptions).
- Only detects association, not causal relationships.
- Assumes nominal-level data — can't be used for ordinal or interval without simplification.
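The chi-square statistic for a contingency table can be computed by hand; the 2×2 table below (gender vs. preference) is hypothetical.

```python
# Hypothetical 2x2 contingency table: gender (rows) vs. preference (columns).
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total * column total) / grand total.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi_sq += (obs - exp) ** 2 / exp

print(round(chi_sq, 2))  # compare against the critical value (3.84 at df = 1, alpha = .05)
```

Here every expected count is 25 (well above 5), so the expected-frequency assumption holds.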
Independent Samples T-Test
- When to Use:
- When comparing the means of two independent groups (e.g., men vs. women on test scores).
- Goal:
- To determine if the difference in means is statistically significant.
- Assumptions:
- Two groups are independent.
- Dependent variable is interval or ratio scale.
- Approximately normally distributed data.
- Homogeneity of variances (Levene’s Test can check this).
- Limitations:
- Sensitive to violations of normality or unequal variances.
- Assumes random sampling.
- Not suitable for more than two groups (ANOVA needed).
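The independent-samples t statistic can be computed directly from the definitions above; the two groups of test scores are invented for the example, and the pooled variance assumes homogeneity of variance.

```python
from statistics import mean, variance
from math import sqrt

# Hypothetical test scores for two independent groups.
group1 = [78, 85, 82, 90, 75, 88]
group2 = [72, 80, 76, 70, 79, 74]

n1, n2 = len(group1), len(group2)
# Pooled variance: weighted average of the two sample variances.
sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
# t = difference in means over the standard error of that difference.
t = (mean(group1) - mean(group2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(t, 2))
```

Compare the result against the critical t value for df = n1 + n2 − 2.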
Dependent (Paired) Samples T-Test
- When to Use:
- When comparing the means of the same group at two time points (e.g., pre-test and post-test).
- Goal:
- To assess if the mean difference within the same group is statistically significant.
- Assumptions:
- Paired observations (same participants).
- Differences are normally distributed.
- Dependent variable is interval or ratio scale.
- Limitations:
- Only compares two time points or conditions.
- Sensitive to outliers in the difference scores.
- Requires that the measurement conditions are equivalent.
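The paired version works on difference scores within the same participants; the pre/post scores below are hypothetical.

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical pre-test and post-test scores for the same six participants.
pre  = [60, 55, 70, 65, 58, 62]
post = [68, 60, 74, 70, 63, 66]

diffs = [b - a for a, b in zip(pre, post)]
n = len(diffs)
# t = mean difference over the standard error of the differences.
t = mean(diffs) / (stdev(diffs) / sqrt(n))
print(round(t, 2))
```

Compare the result against the critical t value for df = n − 1.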
ANOVAS
- Factor: An independent variable in an ANOVA (e.g., gender, treatment group).
- Level: Each condition or category within a factor (e.g., male/female, placebo/low dose/high dose).
- One-Way ANOVA
- When to Use: Use when comparing the means of three or more independent groups on one factor.
- Goal: To determine whether at least one group mean is significantly different from the others.
- Based on a Ratio of Variance: The F statistic compares:
- F = Between-group variance / Within-group variance
- Between-group variance = differences due to the treatment/factor.
- Within-group variance = natural variability within each group (error).
- Assumptions:
- Random sampling/assignment.
- The dependent variable is normally distributed in each group.
- Homogeneity of variance (equal variances across groups).
- Planned vs. Unplanned Comparisons
- Planned Comparisons: Specific group differences are hypothesized before analyzing data.
- Unplanned (Post-Hoc) Comparisons: Performed after finding a significant F to determine which groups differ.
- Example: Fisher’s LSD Test compares pairwise group means. If the difference is greater than the LSD, it is considered statistically significant.
- Two-Way ANOVA
- When to Use: When studying the effects of two independent variables (factors) on a dependent variable.
- Goal: To examine:
- The main effect of each factor.
- The interaction effect between the two factors.
- Main Effect
- The effect of one independent variable regardless of the other variable.
- Example: The effect of cohabitation on satisfaction, ignoring musical taste.
- Interaction Effect
- Occurs when the effect of one factor depends on the level of the other factor.
- Example: Shared music taste may increase relationship satisfaction more for cohabitating couples than for non-cohabitating couples.
- Visual Summary (from lecture)
- One-Way ANOVA Variance Partition: Total Variation → Between (Treatment) + Within (Error)
- Two-Way ANOVA Variance Partition: Total Variation → Between Treatments → Factor A → Factor B → Interaction (A × B) → Within Treatments
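The one-way variance partition above (Total → Between + Within) can be computed by hand; the three groups of scores are invented for the example.

```python
from statistics import mean

# Hypothetical scores for three independent groups (one factor, three levels).
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]

grand_mean = mean(x for g in groups for x in g)
k = len(groups)                  # number of levels
n = sum(len(g) for g in groups)  # total observations

# Between-group sum of squares: differences due to the treatment/factor.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: natural variability inside each group (error).
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# F = mean square between / mean square within.
F = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(F, 2))
```

A significant F only says that at least one group mean differs; post-hoc tests (e.g., Fisher’s LSD) locate which pairs differ.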
Correlation
- When do we calculate a correlation, and what is the goal?
- We calculate a Pearson correlation when both variables are interval or ratio level.
- The goal is to determine the strength and direction of a linear relationship between two variables.
- What are the assumptions?
- Both variables (X and Y) are interval or ratio scale.
- There is a random sample of X and Y pairs.
- Both X and Y are normally distributed.
- The relationship is linear.
- What is a linear association?
- A relationship where as one variable increases, the other either increases (positive correlation) or decreases (negative correlation) in a straight-line pattern.
- Example: More study time is associated with higher test scores (positive linear).
- What is a nonlinear association?
- A relationship where the change in one variable does not correspond to a consistent change in the other.
- These relationships cannot be captured by a straight line (e.g., U-shaped or curvilinear patterns).
- How do we interpret the size of a correlation?
- Use the correlation coefficient (r):
- Small: r ≈ 0.1 (r² ≈ 0.01)
- Moderate: r ≈ 0.3 (r² ≈ 0.09)
- Large: r ≥ 0.7 (r² ≈ 0.49 or higher)
- r² (coefficient of determination) tells us how much of the variation in Y is explained by X.
- What are the limitations of correlation?
- Correlation ≠ Causation.
- It only measures linear relationships.
- Outliers can distort the correlation.
- Cannot account for third variables (confounds).
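Pearson’s r can be computed from the definitions above; the study-hours and test-score pairs below are invented to show a strong positive linear association.

```python
from statistics import mean, stdev

# Hypothetical study hours (X) and test scores (Y).
x = [1, 2, 3, 4, 5]
y = [55, 62, 70, 74, 83]

n = len(x)
mx, my = mean(x), mean(y)
# Pearson r: sum of deviation cross-products, scaled by the
# standard deviations (n - 1 matches the sample stdev).
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * stdev(x) * stdev(y))
print(round(r, 3), round(r ** 2, 3))
```

Here r² indicates the proportion of variation in test scores explained by study time in this made-up data.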
Regression
- When do we calculate a regression, and what is the goal?
- Use regression when you want to predict the value of a dependent variable (Y) based on one or more independent variables (X).
- The goal is to create an equation (Y = mX + b) that best predicts Y from X.
- What are the assumptions of regression?
- X and Y must be correlated.
- Data must be interval or ratio level.
- The DV is normally distributed.
- The data is homoscedastic (equal variance along the line).
- Predictors are not highly correlated with each other (no multicollinearity).
- What are the advantages of regression?
- Can predict outcomes.
- Allows for multiple predictors.
- Shows the strength and unique contribution of each predictor (via beta weights).
- Combines benefits of correlation and group comparison (like ANOVA).
- What are the limitations of regression?
- Still doesn’t prove causation.
- Sensitive to outliers and violations of assumptions (like heteroscedasticity).
- Requires careful selection of predictors to avoid overfitting or multicollinearity.
Writing Up Results
- Revisit Lit review:
- Method: should describe in detail how the research was executed, including the research procedures and descriptive information on the variables in the study.
- Results section:
- The results sections should include
- Statistical test
- Results
- Significance
- Information without interpretation
- All results, even when hypotheses are not supported
- Tables and Graphs: should include a title, supplement (not repeat) the text, be referenced in the text, and are most useful for complex findings.
- Discussion: should
- Interpret the results
- Explain what the results mean and why we should care
- Explain how conclusions confirm/extend/challenge theory
- Explain whether the results are linked to the theory (consistent with the literature, or open to alternative interpretations)
- Dissect limitations and future directions