COMS 312 Final Study Guide Notes
Content Analysis
- Definition: Quantitative study of human communication.
- Involves categorizing observed items or units.
- Why it’s useful:
- Efficient for large datasets.
- Provides context.
- Connects experiments and content analysis.
- Myths:
- It's easy.
- Applies to all content examinations.
- Requires no special preparation (coder training, coding rules, and check-ins are needed).
- Only for academic use.
- Goals:
- Generality (theoretical relevance).
- Descriptions (problem/phenomenon).
- Explanation (inferences about creators).
- Uses:
- Compare message prevalence over time.
- Compare message content and real life.
- Analyze message creators.
- Steps:
- Develop a testable proposition.
- Review literature.
- Develop hypotheses/research questions.
- Develop coding instructions/classification system.
- Unit of analysis: Coded or counted discrete thing.
- Sample codebook.
- Define population, sampling units.
- Code messages.
- Intercoder reliability: Agreement among coders.
- Cohen’s Kappa (nominal).
- Scott’s Pi (two coders).
- Krippendorff’s alpha (two or more coders, any measurement type).
- Analyze.
- Interpret.
- IR = (P_AO − P_AE) / (1 − P_AE), where P_AO is observed agreement and P_AE is expected (chance) agreement.
- Strengths:
- Experiments: Causal mechanisms can be determined.
- Surveys: Generalizable, good for variables we cannot manipulate.
- Content Analysis: Not tampering with environment.
- Weaknesses:
- Experiments: Hard to generalize.
- Surveys: Causality cannot be determined absolutely; we are working with perceptions.
- Content Analysis: Hard to determine source motivations or causal effects on human behavior.
- Intercoder/interrater reliability: Measure of agreement among coders, ensuring coding scheme isn't limited to individual opinions.
- Threats to reliability: Time and resource constraints.
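The intercoder agreement statistics above can be sketched in Python. This is a minimal illustration of Cohen’s Kappa for two coders assigning nominal categories; the codes below are invented for the example.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's Kappa for two coders assigning nominal categories."""
    n = len(coder_a)
    # Observed agreement: proportion of units both coders labeled the same.
    p_ao = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement: chance agreement from each coder's marginal distribution.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_ae = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (p_ao - p_ae) / (1 - p_ae)

# Hypothetical codes from two coders on eight message units.
a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
b = ["pos", "pos", "neg", "pos", "pos", "neg", "pos", "neg"]
print(round(cohens_kappa(a, b), 3))
```

Scott’s Pi has the same (P_AO − P_AE)/(1 − P_AE) form but computes expected agreement from the coders’ pooled category distribution rather than each coder’s separate marginals.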
Sampling
- Definition: Selecting events (often people) from a population.
- Population: Universe of events.
- Essentials:
- Sample reflects population variance.
- Generalizability.
- Sampling error: The degree to which the sample differs from the population.
- Confidence Interval: Range likely to include population parameter.
- Probability Sampling: Equal chance of selection.
- Requires a sampling frame.
- Simple random sample: every unit has an equal chance of being selected.
- Stratified random sample: represents known proportions of the population (e.g., race, gender, age).
- Cluster sampling: sampling in successive stages (e.g., randomly select clusters, then sample units within each).
- Non-random sampling (nonprobability):
- Simple convenience sampling (volunteer sampling, exclusion/inclusion criteria).
- Quota sampling: Nonrandom stratified sampling.
- Purposive/known group sampling: Groups with known characteristic.
- Snowball sampling: Participants help recruit.
- Problems with random samples:
- Sometimes impossible.
- Require resources.
- Definition of population.
- Problems with nonrandom sampling:
- Greater bias.
- Limits conclusions.
- Not representative.
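The contrast between simple random and stratified random sampling can be sketched with the standard library; the sampling frame and strata below are invented for illustration.

```python
import random

random.seed(42)  # reproducible draw for the example

# Hypothetical sampling frame: 100 people labeled by a known stratum.
frame = [{"id": i, "gender": "F" if i % 2 else "M"} for i in range(100)]

# Simple random sample: every unit has an equal chance of selection.
srs = random.sample(frame, k=10)

# Stratified random sample: draw randomly within each stratum so known
# portions of the population are represented in the sample.
strata = {"F": [p for p in frame if p["gender"] == "F"],
          "M": [p for p in frame if p["gender"] == "M"]}
stratified = [p for group in strata.values() for p in random.sample(group, k=5)]

print(len(srs), len(stratified))  # 10 10
```

Quota sampling mirrors the stratified logic, but fills each stratum nonrandomly (e.g., by convenience).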
Qualitative Research
- Qualitative: soft, flexible, subjective, political, specific, investigative/exploratory, grounded.
- Establish Credibility: triangulation and member validation.
- What counts as data? Researcher interpretation. Subjective valuing, contingent accuracy.
- Threats to credibility: inaccuracy/incompleteness, improper interpretation, selection bias.
- Field Interviewing: Semi-directed discourse to uncover participant’s point of view
- Steps:
- Conceptualize study and design research questions.
- Design the interview.
- Interview guide (may have probing questions).
- Sufficiency and saturation.
- Conduct interviews.
- Transcription.
- Analysis.
- Verification.
- Final description of analysis.
- Focus Group: Facilitator-led group discussion for data collection.
- Facilitator is key.
- Process:
- Create a focus group schedule
- Select participants.
- Conduct focus groups.
- Analyze data.
- Final description and analysis.
- Example: Colbert Report and Daily Show focus groups.
- Collecting narratives: stories people tell face to face, in surveys, or in conversation.
- Ethnography: Study and representation of people and their interactions. Combines interviews, observations, collecting narratives, and immersion.
Descriptive Statistics
- Why perform statistics?
- Descriptive statistics.
- Inferential statistics.
- Measures of central tendency.
- Measures of variability/dispersion.
- Importance of unbiased estimators.
- Logic of hypothesis testing
Distributions
- A distribution is a way of organizing data to show how frequently each value occurs.
- We are concerned with distribution because it helps us understand:
- Whether data is normally distributed.
- If results can be generalized.
- Whether results are due to chance or not.
- The standard normal curve is a bell-shaped, symmetric distribution where:
- The mean = 0
- The standard deviation = 1
- Predictable percentages of scores fall within 1, 2, and 3 standard deviations (68%, 95%, 99.7%).
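The 68/95/99.7 rule can be checked with `statistics.NormalDist` from Python’s standard library:

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # the standard normal curve

# Proportion of scores within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    pct = z.cdf(k) - z.cdf(-k)
    print(f"within ±{k} SD: {pct:.1%}")
```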
Kurtosis and Skewness
- Kurtosis: Refers to the "peakedness" of a distribution.
- Leptokurtic: Tall and thin.
- Platykurtic: Flat and wide.
- Mesokurtic: Normal kurtosis (the standard shape).
- Skewness: Refers to the asymmetry in the distribution.
- Positive skew: Tail is on the right.
- Negative skew: Tail is on the left.
- Probability helps us determine how likely it is that an observed result happened by chance.
- The lower the probability, the more likely the effect is real.
- A common cutoff is p < .05, which means we accept less than a 5% chance the result is due to randomness.
- We use distributions (like the standard normal curve) to understand where a score falls and how probable that score is.
- For example, scores in the tails of the distribution are less probable and may indicate a significant result.
Significance Level
- The significance level (alpha) is the threshold for deciding whether an effect is real.
- Commonly set at 0.05, meaning we accept a 5% chance of being wrong if we reject the null hypothesis.
Critical Region
- The critical region is the part of the distribution where, if a test statistic falls there, we reject the null hypothesis.
- It marks the most extreme values of the distribution (e.g., the most extreme 5% when alpha = .05).
Critical Value
- The critical value is the boundary score that separates the critical region from the rest.
- It depends on the type of test and the alpha level.
- If your test statistic is more extreme than the critical value, you reject the null hypothesis.
Errors in Hypothesis Testing
- Type I Error (α error):
- Rejecting a true null hypothesis.
- Saying there’s an effect when there isn’t.
- Controlled by alpha (usually .05).
- Type II Error (β error):
- Failing to reject a false null hypothesis.
- Saying there is no effect when there actually is one.
- Can be reduced with larger sample sizes or stronger experimental design.
Chi Square Test
- When to Use:
- When comparing frequencies or proportions between categorical variables.
- Commonly used in contingency tables (e.g., gender vs. voting preference).
- Goal:
- To test whether there is a significant association or independence between two categorical variables.
- Assumptions:
- Observations are independent.
- Categories are mutually exclusive.
- Expected frequency in each cell is typically ≥ 5 for validity.
- Limitations:
- Cannot be used with small sample sizes (due to expected count assumptions).
- Only detects association, not causal relationships.
- Assumes nominal-level data — can't be used for ordinal or interval without simplification.
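The chi-square statistic for a contingency table can be computed by hand; the 2×2 table below (gender vs. preference) is hypothetical.

```python
# Hypothetical 2x2 contingency table: gender (rows) vs. preference (columns).
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total * column total) / grand total.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi_sq += (obs - exp) ** 2 / exp

print(round(chi_sq, 2))  # compare against the critical value (3.84 at df = 1, alpha = .05)
```

Here every expected count is 25 (well above 5), so the expected-frequency assumption holds.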
Independent Samples T-Test
- When to Use:
- When comparing the means of two independent groups (e.g., men vs. women on test scores).
- Goal:
- To determine if the difference in means is statistically significant.
- Assumptions:
- Two groups are independent.
- Dependent variable is interval or ratio scale.
- Approximately normally distributed data.
- Homogeneity of variances (Levene’s Test can check this).
- Limitations:
- Sensitive to violations of normality or unequal variances.
- Assumes random sampling.
- Not suitable for more than two groups (ANOVA needed).
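The independent-samples t statistic can be computed directly from the definitions above; the two groups of test scores are invented for the example, and the pooled variance assumes homogeneity of variance.

```python
from statistics import mean, variance
from math import sqrt

# Hypothetical test scores for two independent groups.
group1 = [78, 85, 82, 90, 75, 88]
group2 = [72, 80, 76, 70, 79, 74]

n1, n2 = len(group1), len(group2)
# Pooled variance: weighted average of the two sample variances.
sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
# t = difference in means over the standard error of that difference.
t = (mean(group1) - mean(group2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(t, 2))
```

Compare the result against the critical t value for df = n1 + n2 − 2.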
Dependent (Paired) Samples T-Test
- When to Use:
- When comparing the means of the same group at two time points (e.g., pre-test and post-test).
- Goal:
- To assess if the mean difference within the same group is statistically significant.
- Assumptions:
- Paired observations (same participants).
- Differences are normally distributed.
- Dependent variable is interval or ratio scale.
- Limitations:
- Only compares two time points or conditions.
- Sensitive to outliers in the difference scores.
- Requires that the measurement conditions are equivalent.
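The paired version works on difference scores within the same participants; the pre/post scores below are hypothetical.

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical pre-test and post-test scores for the same six participants.
pre  = [60, 55, 70, 65, 58, 62]
post = [68, 60, 74, 70, 63, 66]

diffs = [b - a for a, b in zip(pre, post)]
n = len(diffs)
# t = mean difference over the standard error of the differences.
t = mean(diffs) / (stdev(diffs) / sqrt(n))
print(round(t, 2))
```

Compare the result against the critical t value for df = n − 1.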
ANOVAS
- Factor: An independent variable in an ANOVA (e.g., gender, treatment group).
- Level: Each condition or category within a factor (e.g., male/female, placebo/low dose/high dose).
- One-Way ANOVA
- When to Use: Use when comparing the means of three or more independent groups on one factor.
- Goal: To determine whether at least one group mean is significantly different from the others.
- Based on a Ratio of Variance: The F statistic compares:
- F = Between-group variance / Within-group variance
- Between-group variance = differences due to the treatment/factor.
- Within-group variance = natural variability within each group (error).
- Assumptions:
- Random sampling/assignment.
- The dependent variable is normally distributed in each group.
- Homogeneity of variance (equal variances across groups).
- Planned vs. Unplanned Comparisons
- Planned Comparisons: Specific group differences are hypothesized before analyzing data.
- Unplanned (Post-Hoc) Comparisons: Performed after finding a significant F to determine which groups differ.
- Example: Fisher’s LSD Test compares pairwise group means. If the difference is greater than the LSD, it is considered statistically significant.
- Two-Way ANOVA
- When to Use: When studying the effects of two independent variables (factors) on a dependent variable.
- Goal: To examine:
- The main effect of each factor.
- The interaction effect between the two factors.
- Main Effect
- The effect of one independent variable regardless of the other variable.
- Example: The effect of cohabitation on satisfaction, ignoring musical taste.
- Interaction Effect
- Occurs when the effect of one factor depends on the level of the other factor.
- Example: Shared music taste may increase relationship satisfaction more for cohabitating couples than for non-cohabitating couples.
- Visual Summary (from lecture)
- One-Way ANOVA Variance Partition: Total Variation → Between (Treatment) + Within (Error)
- Two-Way ANOVA Variance Partition: Total Variation → Between Treatments → Factor A → Factor B → Interaction (A × B) → Within Treatments
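The one-way variance partition above (Total → Between + Within) can be computed by hand; the three groups of scores are invented for the example.

```python
from statistics import mean

# Hypothetical scores for three independent groups (one factor, three levels).
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]

grand_mean = mean(x for g in groups for x in g)
k = len(groups)                  # number of levels
n = sum(len(g) for g in groups)  # total observations

# Between-group sum of squares: differences due to the treatment/factor.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: natural variability inside each group (error).
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# F = mean square between / mean square within.
F = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(F, 2))
```

A significant F only says that at least one group mean differs; post-hoc tests (e.g., Fisher’s LSD) locate which pairs differ.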
Correlation
- When do we calculate a correlation, and what is the goal?
- We calculate a Pearson correlation when both variables are interval or ratio level.
- The goal is to determine the strength and direction of a linear relationship between two variables.
- What are the assumptions?
- Both variables (X and Y) are interval or ratio scale.
- There is a random sample of X and Y pairs.
- Both X and Y are normally distributed.
- The relationship is linear.
- What is a linear association?
- A relationship where as one variable increases, the other either increases (positive correlation) or decreases (negative correlation) in a straight-line pattern.
- Example: More study time is associated with higher test scores (positive linear).
- What is a nonlinear association?
- A relationship where the change in one variable does not correspond to a consistent change in the other.
- These relationships cannot be captured by a straight line (e.g., U-shaped or curvilinear patterns).
- How do we interpret the size of a correlation?
- Use the correlation coefficient (r):
- Small: r ≈ 0.1 (r² ≈ 0.01)
- Moderate: r ≈ 0.3 (r² ≈ 0.09)
- Large: r ≥ 0.7 (r² ≈ 0.49 or higher)
- r² (coefficient of determination) tells us how much of the variation in Y is explained by X.
- What are the limitations of correlation?
- Correlation ≠ Causation.
- It only measures linear relationships.
- Outliers can distort the correlation.
- Cannot account for third variables (confounds).
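Pearson’s r can be computed from the definitions above; the study-hours and test-score pairs below are invented to show a strong positive linear association.

```python
from statistics import mean, stdev

# Hypothetical study hours (X) and test scores (Y).
x = [1, 2, 3, 4, 5]
y = [55, 62, 70, 74, 83]

n = len(x)
mx, my = mean(x), mean(y)
# Pearson r: sum of deviation cross-products, scaled by the
# standard deviations (n - 1 matches the sample stdev).
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * stdev(x) * stdev(y))
print(round(r, 3), round(r ** 2, 3))
```

Here r² indicates the proportion of variation in test scores explained by study time in this made-up data.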
Regression
- When do we calculate a regression, and what is the goal?
- Use regression when you want to predict the value of a dependent variable (Y) based on one or more independent variables (X).
- The goal is to create an equation (Y = mX + b) that best predicts Y from X.
- What are the assumptions of regression?
- X and Y must be correlated.
- Data must be interval or ratio level.
- The DV is normally distributed.
- The data is homoscedastic (equal variance along the line).
- Predictors are not highly correlated with each other (no multicollinearity).
- What are the advantages of regression?
- Can predict outcomes.
- Allows for multiple predictors.
- Shows the strength and unique contribution of each predictor (via beta weights).
- Combines benefits of correlation and group comparison (like ANOVA).
- What are the limitations of regression?
- Still doesn’t prove causation.
- Sensitive to outliers and violations of assumptions (like heteroscedasticity).
- Requires careful selection of predictors to avoid overfitting or multicollinearity.
Writing Up Results
- Revisit Lit review:
- Method: should describe in detail how the research was executed, including the research procedures and descriptive information on the variables in the study.
- Results section:
- The results sections should include
- Statistical test
- Results
- Significance
- Information without interpretation
- All results, even when hypotheses are not supported
- Tables and Graphs: should include a title, supplement (not repeat) the text, be referenced in the text, and are most useful for complex findings.
- Discussion: should
- Interpret the results
- Explain what the results mean and why we should care
- Explain how conclusions confirm/extend/challenge theory
- Explain whether the results are linked to the theory (consistent with the literature, or open to alternative interpretations)
- Dissect limitations and future directions