Quantitative Analysis
Arizona College of Social & Behavioral Sciences
Department of Communication
Quantitative Analysis of Text
Title: Turning Messages into Numbers.
Research Questions
Focus areas of inquiry in quantitative analysis of texts:
Do presidential speeches contain more religious rhetoric during times of war?
Does programming on cable television contain more sexual explicitness than network programming?
What kinds of advertising appear during children’s television programming?
Is the editorial content of national news organizations more or less liberal than that of regional or local news?
Does group interaction exhibit reliable communication patterns?
What are the communication differences between distressed and non-distressed couples?
Introduction to Quantitative Analysis of Text
The study of texts/messages is central to the communication discipline.
The primary problem is converting messages/content into quantifiable numbers.
This contrasts with qualitative research, which typically focuses on non-numeric data.
The term "message" is multifaceted:
Often refers to intentional, persuasive communication (e.g., political advertisements).
Can include elements from television scripts, editorials, speeches, or interpersonal interactions.
Two primary methods of data collection and analysis:
Content Analysis.
Interaction Analysis.
Content Analysis
A combination of data collection and analytical technique aimed at inferring communication phenomena by examining specific characteristics of messages and their distribution.
Types of content:
Manifest content: the actual words spoken/written.
Latent content: the underlying meanings of those words.
Example: Analyzing the hypothesis that rock song lyrics have become more sexually explicit over time, illustrated with lyrics from the Rolling Stones' song "Beast of Burden."
Content Analysis as Social Science
Objective: Must strictly adhere to established rules and procedures.
The research question or hypothesis dictates the entire study's framework.
Involves:
Sampling procedure: A method of selecting what content will be analyzed.
Rules for coding content: Standardized approaches to categorizing data.
Should be:
Reliable: If the procedures are followed correctly, different coders should reach similar results.
Systematic: Contents are organized based on clear criteria.
Generalizable: Findings should have theoretical significance.
Basic Principles of Content Analysis
Messages can be categorized into classes where elements share similar meanings or functions.
The number of categories is contingent on theoretical frameworks and the data itself.
Categories yield frequency counts, facilitating comparisons across different texts.
Some coding approaches utilize continuous scales, such as assessing persuasiveness or language intensity.
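A minimal sketch of how coded categories turn into comparable numbers, assuming each text has already been coded by hand. The text names, category labels, and ratings below are hypothetical and only illustrate the idea.

```python
from collections import Counter
from statistics import mean

# Hypothetical coded data: one category label per coded unit, grouped by text.
coded_units = {
    "speech_a": ["religious", "economic", "religious", "military"],
    "speech_b": ["economic", "economic", "military"],
}

def category_frequencies(labels):
    """Return {category: (count, proportion)} for one text's coded units."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.items()}

for text_id, labels in coded_units.items():
    print(text_id, category_frequencies(labels))

# Continuous-scale coding (e.g., language intensity rated 1-5 per unit)
# is summarized with a mean rather than a count.
intensity_ratings = [3, 4, 5, 4]  # hypothetical ratings
print("mean intensity:", mean(intensity_ratings))
```

Proportions, rather than raw counts, keep the comparison fair when texts differ in length.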
What Content Can Be Analyzed?
Anything that can be captured in data form:
Sources, senders, or recipients of messages.
Reasons for message dispatch (e.g., motivations behind breakup texts).
Types of messages sent through various communication technologies, and the modality chosen for them.
Content can be analyzed at surface level or deep functional levels (e.g., effects of messages).
Nonverbal elements, visual cues, and audio signals may also contribute to analysis.
Content Analysis Process
Develop a hypothesis or research question requiring content analysis.
Select messages for analysis, addressing sampling issues.
Select coding categories and units for analysis.
Create protocols for resolving coding discrepancies (establish necessary agreements).
If not all messages can be analyzed, sample messages from the larger set to ensure quality and relevance.
Code messages into categories based on established rules.
Interpret the results derived from the coding process.
Selecting What to Code
Identify where messages of interest can be found (e.g., political ads, television programming).
Narrow the data set to focus on relevant elements.
Consider how many lyrics or speeches to include in the coding process.
Sampling may still be necessary, especially if the population of messages is extensive.
Acknowledge that structural characteristics of messages can affect sampling strategies (e.g., news broadcast segments).
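When the population of messages is too large to code in full, a sample can be drawn programmatically. The sketch below shows a simple random sample and a stratified sample, assuming the messages are held in a list. The function names, the "channel" field, and the fixed seed are illustrative assumptions.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw up to n messages at random; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(population, min(n, len(population)))

def stratified_sample(population, key, n_per_stratum, seed=0):
    """Draw up to n_per_stratum messages from each stratum defined by key(item)."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

# Hypothetical population: 20 programs, half cable and half network.
programs = [{"id": i, "channel": "cable" if i % 2 else "network"} for i in range(20)]
print(simple_random_sample(programs, 5))
print(stratified_sample(programs, lambda p: p["channel"], 3))
```

Stratifying is one way to respect structural characteristics of the messages, since each stratum (here, channel type) is guaranteed to appear in the sample.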
Developing Content Categories
Categories can emerge from theory (deductive) or from the data (inductive):
What was said and how it was presented.
Categories need to be:
Exhaustive: Cover all possibilities.
Equivalent: Make sure the categories are comparable.
Mutually Exclusive: Ensure no overlaps occur. Be cautious with catchall categories like “Other.”
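The exhaustive and mutually exclusive requirements can be checked mechanically once category rules are written down. This is a rough sketch that assumes each category is expressed as a simple yes/no rule; the keyword rules and sample units are hypothetical, and real coding rules are usually richer than keyword matches.

```python
def check_scheme(units, rules):
    """Find units that no category covers and units that several categories claim.

    rules maps a category name to a function that returns True when
    a unit belongs in that category.
    """
    uncovered, overlapping = [], []
    for unit in units:
        matches = [cat for cat, rule in rules.items() if rule(unit)]
        if not matches:
            uncovered.append(unit)  # scheme is not exhaustive
        elif len(matches) > 1:
            overlapping.append((unit, matches))  # not mutually exclusive
    return uncovered, overlapping

# Hypothetical categories for ads aired during children's programming.
rules = {
    "toys": lambda ad: "toy" in ad,
    "food": lambda ad: "cereal" in ad,
}
ads = ["toy robot ad", "cereal ad", "toy in cereal box ad", "car ad"]

uncovered, overlapping = check_scheme(ads, rules)
print("uncovered:", uncovered)
print("overlapping:", overlapping)
```

A large uncovered list suggests missing categories; routing those units into "Other" hides the problem rather than solving it.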
Unit of Analysis
Defined as a discrete element that becomes coded and counted:
Typical units in communication include complete thoughts, opinions/sentences, thematic content, etc.
Training Coders
A coder is an individual who categorizes content based on defined criteria.
At least two coders are required, and all must undergo training to ensure consistent outcomes.
A well-documented coding system with rules must be created to guide the process.
Initial practice coding on similar texts helps achieve consistency.
Coding Reliability and Validity
Unitizing Reliability: Evaluates agreement between coders on how many units there are to code. Guetzkow’s U index measures the level of disagreement, where zero indicates no disagreement.
Intercoder (Categorizing) Reliability: Evaluates agreement between coders on the category assigned to each unit. Commonly assessed with Cohen’s Kappa statistic, which corrects for chance agreement.
Validity: Assesses the appropriateness of coding schemes for the set of messages analyzed.
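Both indices named above can be computed by hand. The sketch below uses the usual textbook formulas: Guetzkow's U as (O1 - O2) / (O1 + O2), where O1 and O2 are the number of units each coder identified (taken here as an absolute value), and Cohen's Kappa as (observed agreement - chance agreement) / (1 - chance agreement). The function names and sample codes are hypothetical.

```python
from collections import Counter

def guetzkow_u(units_coder1, units_coder2):
    """Disagreement in the number of units two coders identified (0 = none)."""
    return abs(units_coder1 - units_coder2) / (units_coder1 + units_coder2)

def cohens_kappa(codes1, codes2):
    """Chance-corrected agreement on the category assigned to each unit."""
    if len(codes1) != len(codes2) or not codes1:
        raise ValueError("both coders must code the same non-empty set of units")
    n = len(codes1)
    observed = sum(a == b for a, b in zip(codes1, codes2)) / n
    counts1, counts2 = Counter(codes1), Counter(codes2)
    # Chance agreement: product of each coder's category proportions, summed.
    expected = sum(counts1[cat] * counts2[cat] for cat in counts1) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical example: coder 1 found 48 units, coder 2 found 52.
print(guetzkow_u(48, 52))

coder1 = ["a", "a", "b", "b"]
coder2 = ["a", "a", "b", "a"]
print(cohens_kappa(coder1, coder2))
```

Raw percent agreement in this example is 0.75, but Kappa is lower because some of that agreement would be expected by chance alone.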
Interpreting Coding Results
Ensure the analysis is pertinent to the original hypothesis or research question.
Look for:
Frequencies of coded content.
Differences across categories or groups.
Emerging trends and patterns in the data.
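One common way to test whether category frequencies differ across groups is a Pearson chi-square test on the cross-tabulated counts. The notes do not name a specific test, so this is an assumed choice, and the counts below are invented for illustration. The sketch computes only the statistic and degrees of freedom; it also assumes every category has at least one coded unit.

```python
def chi_square(table):
    """Pearson chi-square statistic for a group-by-category count table.

    table maps group -> {category: count}. Returns (statistic, degrees_of_freedom).
    """
    groups = list(table)
    cats = sorted({c for row in table.values() for c in row})
    row_tot = {g: sum(table[g].get(c, 0) for c in cats) for g in groups}
    col_tot = {c: sum(table[g].get(c, 0) for g in groups) for c in cats}
    n = sum(row_tot.values())
    stat = 0.0
    for g in groups:
        for c in cats:
            # Expected count if group and category were unrelated.
            expected = row_tot[g] * col_tot[c] / n
            stat += (table[g].get(c, 0) - expected) ** 2 / expected
    df = (len(groups) - 1) * (len(cats) - 1)
    return stat, df

# Hypothetical counts of coded scenes by channel type.
counts = {
    "cable": {"explicit": 30, "not_explicit": 70},
    "network": {"explicit": 10, "not_explicit": 90},
}
print(chi_square(counts))
```

The statistic is then compared against a chi-square distribution with the returned degrees of freedom to judge whether the difference is larger than chance would produce.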
The Role of Computers in Content Analysis
Certain aspects of content analysis can be efficiently carried out using computer programs.
Example: Kevin Coe’s study of religious rhetoric in presidential speeches utilized computerized text analysis.
Limitations include a lack of human-like understanding or interpretative capabilities when analyzing language and context.
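A simple form of computerized text analysis counts how often words from a predefined dictionary appear in a text. The sketch below is a generic illustration of that idea, not the procedure used in the study mentioned above; the term list and the sample sentences are hypothetical.

```python
import re

# Hypothetical dictionary of religious terms (an assumption for illustration).
RELIGIOUS_TERMS = {"god", "faith", "pray", "prayer", "bless", "blessing"}

def dictionary_rate(text, dictionary):
    """Return dictionary hits per 1,000 words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in dictionary)
    return 1000 * hits / len(tokens)

print(dictionary_rate("God bless America", RELIGIOUS_TERMS))

# The limitation noted above: the program cannot read context, so a
# non-religious use of "faith" is counted the same way.
print(dictionary_rate("I have no faith in this budget", RELIGIOUS_TERMS))
```

Expressing the result as a rate per 1,000 words allows speeches of different lengths to be compared.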
Word Clouds in Content Analysis
Visual representations of the frequency of words, such as those derived from random tweets or social media content.
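Behind a word cloud is a table of word frequencies, with each word drawn larger the more often it occurs. The sketch below computes those relative sizes without drawing anything; the stopword list and sample posts are hypothetical, and real tools use much longer stopword lists.

```python
import re
from collections import Counter

# Small hypothetical stopword list of common words to leave out.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on"}

def word_weights(texts, top_n=10):
    """Return (word, count, relative_size) for the most frequent words.

    relative_size is the count divided by the largest count, so the
    most frequent word has size 1.0.
    """
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    if not counts:
        return []
    top = counts.most_common(top_n)
    largest = top[0][1]
    return [(word, n, n / largest) for word, n in top]

posts = ["The game is on", "Game day", "Love the game"]
for word, n, size in word_weights(posts):
    print(word, n, round(size, 2))
```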
Strengths and Limitations of Content Analysis
Strengths:
Data is closely aligned with communicators.
Unobtrusive nature of the analysis.
Versatile across various text/message structures.
Limitations:
Messages not captured cannot be analyzed.
Coding schemes might miss subtle nuances present in the messages.
Selection processes may not always represent the larger population adequately.
Interaction Analysis
Involves coding the interactions between two or more individual communicators, focusing on both verbal and nonverbal elements.
This approach allows for in-depth analysis of the intent and function of messages and of their effects over time.
Example: The script from "12 Angry Men" illustrates the kind of group interaction to which this analysis can be applied.
Preparing and Coding Interaction
Record interaction audio/video and transcribe it.
Train coders to understand the analysis components.
Ensure interactions are unitized for consistent coding.
Conduct reliability calculations for the coding of the interaction.
Resolve any discrepancies in coding among coders.
Examples of Interaction Analysis Coding
Example I:
Speaker: Tom says it’s better to go to Harvard. (Proposition)
Speaker: Terry asserts about degree options, leading to elaboration.
Example II:
Wife initiates control by saying they don’t do much as a family.
Husband questions her meaning, illustrating a dynamic of control and submission.
Example Questions from Interaction Analysis using the Ubuntu Corpus
Inquiry about user prompt locations.
Response and back-and-forth communication demonstrating resolution of user conflict with command lines.
Analyzing and Interpreting Coded Data
Return focus to the research question or hypothesis for analysis context.
Frequency analysis is a common starting point; comparing frequencies across categories, speakers, or time can reveal trends that are not evident from raw counts alone.
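For interaction data, one way to look past overall counts is to tally which coded act follows which. The notes do not name this technique; the sketch below is an assumed illustration (a count of adjacent pairs of codes), and the sequence of codes is hypothetical.

```python
from collections import Counter

def transition_counts(codes):
    """Count each (current_code, next_code) pair in a coded sequence."""
    return Counter(zip(codes, codes[1:]))

def transition_probabilities(codes):
    """Share of the time each code is followed by each other code."""
    pairs = transition_counts(codes)
    totals = Counter()
    for (first, _), n in pairs.items():
        totals[first] += n
    return {pair: n / totals[pair[0]] for pair, n in pairs.items()}

# Hypothetical sequence of coded turns from a group discussion.
turns = ["proposition", "elaboration", "proposition",
         "question", "proposition", "elaboration"]

for pair, p in transition_probabilities(turns).items():
    print(pair, round(p, 2))
```

A plain frequency count would show only that propositions are common; the pair counts show what tends to follow them.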
Machine Learning and Content Analysis
Machine Learning (ML):
Defined as the field of study that enables computers to learn without being explicitly programmed.
Facilitates the analysis of communication content through predefined and emergent patterns.
Types of ML:
Supervised learning: Machines trained on human-coded data.
Unsupervised learning: Machines discovering patterns without external guidance.
Application: ML has succeeded in settings such as Google search, where algorithms gather and process human language data.
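Supervised learning can be illustrated with a very small classifier trained on human-coded messages. The sketch below is a bare-bones multinomial Naive Bayes written from scratch, chosen only because it is short; the class name, training messages, and labels are hypothetical, and a real project would need far more coded data and a held-out test set.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesCoder:
    """Learns word patterns from human-coded messages, then codes new ones."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        n_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)
            for t in tokens:
                # Add-one smoothing avoids zero probabilities.
                score += math.log(
                    (self.word_counts[label][t] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical human-coded training messages.
train_texts = ["god bless our troops", "we pray for peace",
               "cut taxes and grow jobs", "the economy needs jobs"]
train_labels = ["religious", "religious", "economic", "economic"]

coder = NaiveBayesCoder().fit(train_texts, train_labels)
print(coder.predict("we pray for our troops"))
print(coder.predict("more jobs and lower taxes"))
```

The human coding supplies the categories; the program only learns which words tend to go with each one.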
AI in Content/Interaction Analysis
Generative AI is increasingly relevant for content analysis tasks.
Capable of coding messages either deductively or inductively.
Options for instruction include zero-shot prompting (instructions only, no examples) and few-shot prompting (instructions plus a few coded examples).
Effectiveness varies with the capabilities of the large language model (LLM), such as how much previous content it can retain, and with how its focus is set (e.g., conversational vs. technical coding).
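The difference between zero-shot and few-shot prompting comes down to what text is sent to the model. The sketch below only builds the prompt text and does not call any model; the function name, wording of the instructions, categories, and example messages are all hypothetical.

```python
def build_prompt(message, categories, examples=None):
    """Build a coding prompt; with no examples it is zero-shot, otherwise few-shot."""
    lines = [
        "Code the message into exactly one of these categories: "
        + ", ".join(categories) + ".",
        "Reply with the category name only.",
    ]
    # Few-shot: show the model already-coded messages before the new one.
    for text, label in examples or []:
        lines += ["", f"Message: {text}", f"Category: {label}"]
    lines += ["", f"Message: {message}", "Category:"]
    return "\n".join(lines)

categories = ["religious", "economic", "other"]

zero_shot = build_prompt("We pray for our troops.", categories)
few_shot = build_prompt(
    "We pray for our troops.",
    categories,
    examples=[("God bless America.", "religious"),
              ("We will cut taxes.", "economic")],
)
print(zero_shot)
print("---")
print(few_shot)
```

Model output should still be checked against human coders, using the same reliability procedures applied to any other coder.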