Notes on Content Coding, Research Question Design, and Data Sources for Content Analysis

News sources and media influence

  • The session starts from how researchers gather background information from news and media bodies as the base for coding; this background informs what codes will be used.
  • Emphasis on not going beyond headings when reading some sources; headings often guide initial understanding but may oversimplify.
  • Historical context: about 28 years ago, people were already getting news from satirical/commentary sources like The Daily Show, indicating that media consumption has long included a mix of traditional and satirical outlets.
  • Research question motivation: surveys historically asked people where they get news (e.g., The Daily Show, Comedy Central, Ameripore, Facebook, MySpace, traditional media) to understand differences in news acquisition and how patterns shift across media formats.
  • Key distinction: traditional media vs. other forms of news (satirical shows, social media, etc.) and how news spreads across platforms (including TikTok today).
  • Personal anecdote: campus newspaper (The Spectrum at NDSU) was used to illustrate that headlines are how people encounter university news (e.g., remodeling dining areas).
  • Core takeaway: how we cope with news and how we code it depends on where the data comes from and what we’re trying to understand.

From data to codes: the coding mindset

  • The coding process must be guided by the research question; questions direct what data sources to use and what to look for when coding.
  • The next assignments will build: students will submit research questions, then theory, then content coding to answer those questions.
  • Content coding requires data that can answer the research question; data sources should be chosen to fit the question.
  • Examples of text-based data sources:
    • Transcripts of podcasts (e.g., two-hour episodes) → can be coded for content.
    • Interview transcripts.
    • Social media text: Mastodon, Threads, Truth Social, TikTok captions/comments, etc. Transcripts may be needed for some platforms.
    • Other text sources depending on question; data choice hinges on question and what you’re trying to find.
  • Data source selection depends on the research question and the nature of what you want to measure (e.g., attitudes, sentiments, behaviors).
  • Example discussion: a question about youth attitudes toward the shutdown of social media could use various sources (TikTok videos, statements, transcripts, etc.).
  • Special note on TikTok: video content lacks straightforward transcripts; you’d need to identify language and possibly extract textual elements; data availability and format influence data collection.
  • The scope of data sources should balance accessibility, relevance, and representativeness for the question.
  • Practical prompt: think about a topic (e.g., shutdown of social media) and discuss what kinds of texts you would code to answer that question; weigh pros/cons of each source.
  • The teacher highlights that the data choice affects the type of findings (results, analysis, conclusions) and the overall interpretation of social behaviors.

Generating research questions: approach and practice

  • Emphasis on generating two research questions (or more) in a small group or individually.
  • Encourage exploring ideas that stand out or are of broad interest, while also considering practical relevance for academic and organizational purposes.
  • The process of generating questions can be informed by patterns seen in daily observations and curiosity (e.g., observations about trends in social behavior, media use, or consumer behavior).
  • Examples discussed during the session, illustrating how questions can arise from personal interest or from current events:
    • How TikTok statements reflect public attitude toward a topic; how to code them as positive/negative or in broad attitude categories.
    • The impact of a topic’s data availability and audience demographics (e.g., TikTok vs The Wall Street Journal) on question feasibility.
    • A thought experiment about two potential questions:
      • At-home nail trends and their relation to economic indicators (unemployment, inflation, etc.).
      • The effect of weather (e.g., a sunny 70-degree day) on productivity and related behaviors.
  • Important design insight: a good research question should be answerable with observable data and facilitate clear operationalization of variables.
  • Operationalization is crucial: you must define how you will measure both independent and dependent variables (e.g., nail trends as a function of local economic conditions).
  • Group activity idea: propose two questions, then discuss what makes a question good and how to improve it; consider definitions, scope, and clarity.
  • Concrete example discussed: two questions emerged around nail trends and the state of the economy; a third example around social media shutdowns was also considered; all require clear constructs and measurable outcomes.
  • The instructor emphasizes moving beyond vague questions (e.g., "What makes students at MSU happy?") by specifying populations, settings, and metrics (e.g., undergraduates vs. graduates; on-campus vs. online experiences; happiness operationalized via a validated scale or observable behaviors).
  • A salient point: the data type determines the analysis path (e.g., observation, documents, interviews, or text data); the data type and question shape each other.
  • Real-world example: one research project examined advocacy after 9/11 (2001) and the balance of advocacy vs. hate events in newspaper reporting; this illustrates how questions must stay anchored in measurable patterns and be defensible.
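
The idea of coding statements as positive/negative or into broad attitude categories can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the session: the codebook words, the "mixed" and "uncoded" categories, and the function name code_statement are all assumptions made for the example. A real codebook would be developed iteratively from the data and applied by human coders.

```python
import re
from collections import Counter

# Hypothetical codebook: each attitude code maps to indicator words.
# These words are placeholders, not a validated instrument.
CODEBOOK = {
    "positive": {"support", "good", "agree", "necessary"},
    "negative": {"unfair", "bad", "oppose", "censorship"},
}


def code_statement(text: str) -> str:
    """Assign one attitude code to a statement by counting indicator words."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = Counter()
    for code, indicators in CODEBOOK.items():
        hits[code] = sum(1 for word in words if word in indicators)

    # No indicator word found: leave the statement for manual review.
    if not any(hits.values()):
        return "uncoded"

    ranked = hits.most_common()
    # A tie between the two leading codes is flagged rather than forced.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "mixed"
    return ranked[0][0]


# Made-up statements, used only to show the function in action.
for statement in [
    "I support the ban, it is necessary",
    "This is unfair censorship",
    "I support it but it feels unfair",
]:
    print(code_statement(statement), "|", statement)
```

Writing the rules down this explicitly is the point of operationalization: anyone can see exactly what counts as "positive" or "negative" and challenge it.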

Data sources for content analysis: breadth and quality considerations

  • Text-based sources to consider:
    • News articles and press releases.
    • Interview transcripts.
    • Podcast transcripts.
    • Social media posts and comments (Facebook, Mastodon, Threads, TikTok captions, etc.).
    • Other public documents relevant to the question.
  • Platform-specific considerations:
    • TikTok: rich in video content; lacks straightforward text transcripts; requires language identification and perhaps manual coding of dialogue.
    • Traditional outlets (e.g., Wall Street Journal): longer-form articles; may provide more formal language and structured data.
    • Emerging platforms (Mastodon, Threads, Truth Social): different user bases and discourse styles; data access and representativeness differ.
  • Data choice should be aligned with the research question’s target population and topic; for example, TikTok may be better for youth-oriented topics; traditional media may be better for debates or policy discussions.
  • Accessibility and speed: TikTok data can be quickly available and up-to-date; traditional sources may require archival access.
  • Example topics and data alignment:
    • Studying attitudes toward social media shutdowns may benefit from TikTok video content and platform captions rather than only traditional news outlets.
    • Studying media framing of advocacy vs hate after 9/11 may benefit from newspaper articles and editorials across local and national outlets.

Data organization and pre-coding steps

  • Organization basics:
    • Ensure data are in the same format for comparable coding (print out articles, standardize transcripts, etc.).
    • Create a consistent data structure for each unit of analysis (e.g., article, post, transcript segment).
  • Pre-coding workflow:
    • Initial skim: read quickly to gain a general sense of topics and content relevant to the research question.
    • Preliminary close-reading: annotate margins with initial notes or jottings about potential codes and themes.
    • Use these preliminary jottings to begin coding; develop a codebook iteratively as patterns emerge.
  • Rationale for multi-pass reading:
    • A single reading may miss important information present in other articles or posts; multiple passes help identify cross-article patterns and ensure codes capture diverse content.
    • Patterns become the color-coded themes tied to the research question; they guide subsequent analysis.
  • Example from practice: the Muslim Pain Advocacy Project coding project involved local and national newspapers and yielded insights about advocacy being more prevalent than hate events; this underscores the importance of not prematurely discarding potentially informative patterns.
  • Practical takeaway: pre-coding is not just labeling; it’s building a structured, reproducible approach to data that supports later analysis and generalization.
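
One way to picture "a consistent data structure for each unit of analysis" is a small record type that every article, post, or transcript segment is entered into before coding. The sketch below is an assumption-laden illustration: the class name CodedUnit, its fields, the helper tally_codes, and the sample entries are all hypothetical and not taken from the session.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class CodedUnit:
    """One unit of analysis stored in the same shape regardless of source."""
    unit_id: str
    source: str          # e.g., "local newspaper", "TikTok caption"
    unit_type: str       # e.g., "article", "post", "transcript segment"
    text: str
    codes: List[str] = field(default_factory=list)  # codes assigned so far
    notes: str = ""      # margin jottings from the close-reading pass


def tally_codes(units: List[CodedUnit]) -> Counter:
    """Count how many units carry each code (each unit counted once per code)."""
    counts = Counter()
    for unit in units:
        counts.update(set(unit.codes))
    return counts


# Placeholder units; the text and codes are invented for illustration.
units = [
    CodedUnit("u1", "local newspaper", "article", "placeholder text", ["advocacy"]),
    CodedUnit("u2", "national newspaper", "article", "placeholder text", ["advocacy", "hate"]),
    CodedUnit("u3", "local newspaper", "article", "placeholder text", ["advocacy"]),
]
print(tally_codes(units))
```

Keeping notes and codes alongside the text means later passes can revisit earlier jottings, and the tally makes cross-article patterns visible without rereading everything.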

Criteria for good research questions

  • Core attributes of a good question:
    • Clarity and definitional precision: avoid ambiguity in constructs and terms (e.g., what exactly counts as a "minority group" or what constitutes a "nail trend").
    • Measurability and operationalization: the question should be answerable using observable data and clearly defined variables.
    • Reproducibility: another researcher should be able to apply the same methods to yield similar results.
    • Specificity about scope: define populations, settings, and time frames (e.g., local economy vs. global economy; undergraduate students at MSU; pandemic-era data).
    • Relevance and significance: the question should offer value to academia, policy, or practical application (e.g., informing organizational strategy or public discourse).
  • Examples of refinement strategies:
    • Break broad questions into more specific components (e.g., the economy's impact on nail-trend indices at a local level; or whether self-reported happiness correlates with social activity in a university setting).
    • Use constructs with established measures to improve comparability across studies.
    • Consider potential confounders and specify how you will address them in design and analysis.
  • Important caveats:
    • Personal interest can drive engagement but must be balanced with methodological rigor and public relevance.
    • Researchers often need to defend inclusion of certain topics when advisors doubt their relevance; empirical data and plausible theory can help justify such choices.
  • Final note: research questions are not static; they evolve with data access, theoretical framing, and feedback from peers and advisors.
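
The reproducibility criterion above can be checked in practice by having two people code the same units and comparing their results. The notes do not name a specific statistic, so the choice here is an assumption: the sketch computes simple percent agreement and Cohen's kappa, a common chance-corrected agreement measure. The function names and the sample codes are hypothetical.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Share of units on which both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("Both coders must code the same non-empty set of units.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same code independently.
    expected = sum(
        freq_a[code] * freq_b[code] for code in set(coder_a) | set(coder_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Invented codes for four units, used only to show the calculation.
coder_1 = ["advocacy", "advocacy", "hate", "advocacy"]
coder_2 = ["advocacy", "hate", "hate", "advocacy"]
print(percent_agreement(coder_1, coder_2))
print(cohens_kappa(coder_1, coder_2))
```

Low agreement usually points back to the clarity criterion: the code definitions are ambiguous enough that two careful readers apply them differently.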

Examples and anecdotes from the session

  • The instructor shares a personal example from the 2001–2002 era about advocacy vs. hate incidents following 9/11; coding revealed that advocacy events outnumbered hate-related events, illustrating the value of empirical data in challenging common narratives.
  • A reflective anecdote about the consistency and clarity of language, constructs, and definitions, highlighting the importance of avoiding ambiguity in research questions and coding schemes.
  • A long digression on nails and beauty trends used to illustrate how everyday interests can inspire research questions, but these topics still require clear operational definitions and measurable data (e.g., economic indicators linked to at-home nail trends).
  • The class discussion includes a current events example: debates over the shutdown or ban of TikTok in certain jurisdictions and the data sources one might use to study its impact on information spread or public opinion.
  • The session encourages students to think about practical data-gathering strategies, including the potential to use transcripts from podcasts or interviews to build text datasets for coding.

Practical guidance for assignments and next steps

  • You will submit two research questions (or more) and then discuss in groups what makes a good research question and how to improve it.
  • You should consider how the data source you choose will support the question and what the coding scheme will look like (e.g., coding for sentiment, attitude, or topics).
  • After proposing questions, you will refine them to ensure they are measurable and reproducible, with clear definitions of constructs and units of analysis.
  • Remember to think about the potential data types and to justify why a given source (TikTok vs traditional outlets) is appropriate for your question.
  • The instructor emphasizes that the data and questions should be mutually supportive: the question should guide data collection, and data availability should guide the question formulation.
  • Final emphasis: prepare questions with a clear plan for data organization, coding approach, and analysis pathway, so your work can be evaluated for rigor and reproducibility.

Ethical, philosophical, and practical implications

  • Language and interpretation: ambiguous terms can lead to misclassification; clear definitions protect against misinterpretation and ensure replicability.
  • Representation and bias: data sources reflect certain populations more than others; be mindful of who is included or excluded in the data and how that shapes findings.
  • The value of replication: systematic organization and pre-coding steps facilitate reproducibility and allow others to verify results.
  • Personal interest vs broader impact: while personal curiosity can fuel good questions, justification to others (advisors, funders) is crucial for support and relevance.
  • Practical considerations: platform-specific data access, data length, and the presence or absence of transcripts influence how you design the study and what you can measure.
  • The overarching goal is to produce a robust, transparent, and useful study that can inform understanding of social behaviors and media effects in real-world contexts.