Week 11. Content analysis, big data analysis

47 Terms

1

What is the key difference between experiments/surveys and content analysis?

Experiments and surveys elicit new data, whereas content analysis studies existing communicative behavior.

2

What implications does content analysis have for sampling?

You sample content, not people.

3

Why does content analysis often have high ecological validity?

Because it studies behavior that occurs naturally in the real world.

4

Why is reliability crucial in content analysis?

Because content must be interpreted as objectively and consistently as possible.

5

What is content analysis (Berelson, 1952)?

A quantitative, systematic, and objective technique for describing the manifest content of communication.

6

What does “quantitative” mean in content analysis?

Counting how often something occurs.

7

What does “systematic” mean in content analysis?

Using clear, predefined rules for sampling and analysis.

8

What does “objective” mean in content analysis?

Coding rules are unambiguous and not dependent on personal interpretation.

9

What is “manifest content”?

Content that is directly observable (e.g., words, images). It is usually coded with the goal of uncovering some latent concept.

10

What is “latent content”?

Underlying meanings or constructs inferred from manifest content (e.g., framing, values).

11

What are the 8 steps in content analysis?

  1. Develop a hypothesis.

  2. Define the content to be analyzed (e.g., text only, no photos). → Set boundaries.

  3. Sample the content (e.g., 300 dating profiles; decide how much data to analyze). → Pick a sample.

  4. Select units for coding (e.g., individual words or sentences). → Select which pieces of the content you are going to count.

  5. Develop a coding scheme (the rules in a "codebook"). → Create clear rules to categorize the data.

  6. Code the units: Apply the coding rules systematically.

  7. Count occurrences: Analyse how often specific elements appear.

  8. Report the results: Share findings, using visuals and explanations.

12

What is a theory-driven content analysis approach?

Coding categories are based on existing theory. You select specific (categories of) words and count how often they were used based on these models.

13

Give examples of theory-driven word categories.

Body/sexuality words, status words, emotion words, pronouns (I, you, we).

14

What is a data-driven content analysis approach?

Using algorithms or machine learning to identify patterns without predefined categories.

15

What is a strength of the data-driven approach?

It can reveal unexpected patterns.

16

What is LIWC?

A dictionary-based tool that automatically counts words in predefined categories.
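
As a rough illustration of dictionary-based counting, the sketch below tallies how many words of a text fall into each predefined category. The two categories and their word lists are hypothetical and made up for this example; they are not LIWC's actual dictionary, and real tools handle word stems and many more categories.

```python
import re

# Hypothetical mini-dictionary for illustration only (NOT LIWC's word lists).
DICTIONARY = {
    "pronouns": {"i", "you", "we"},
    "emotion": {"love", "happy", "hate"},
}

def count_categories(text, dictionary=DICTIONARY):
    """Return (counts per category, total number of tokens) for `text`."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {category: 0 for category in dictionary}
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    return counts, len(tokens)

counts, n_tokens = count_categories("I love hiking and we love good food")
print(counts, n_tokens)
```

Because the same rules are applied to every text, two runs on the same input always give the same counts. Note that the counter only matches words: "I am not happy" still adds one to the emotion category.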

17

What is a key advantage of dictionary-based approaches?

High reliability and consistency.

18

What is a key disadvantage of dictionary-based approaches?

They ignore context: a word is counted the same way regardless of negation, sarcasm, or multiple meanings.

19

What does it mean that coding categories must be exhaustive?

Every coding unit must fit into a category.

20

What does it mean that coding categories must be exclusive?

Each unit can belong to only one category.
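
The two requirements (exhaustive and exclusive) can be checked mechanically once units have been coded. The sketch below is one possible check, assuming a hypothetical data layout in which each unit id maps to the list of categories a coder assigned to it.

```python
def check_coding(unit_ids, assignments):
    """Return (uncoded, multiply_coded) lists of unit ids.

    `assignments` maps a unit id to the list of categories assigned to it
    (a hypothetical layout chosen for this example).
    """
    # Exhaustive: every unit must have at least one category.
    uncoded = [u for u in unit_ids if len(assignments.get(u, [])) == 0]
    # Exclusive: no unit may have more than one category.
    multiply_coded = [u for u in unit_ids if len(assignments.get(u, [])) > 1]
    return uncoded, multiply_coded

unit_ids = ["s1", "s2", "s3"]
assignments = {"s1": ["emotion"], "s2": ["emotion", "status"], "s3": []}
uncoded, multiply_coded = check_coding(unit_ids, assignments)
print("Not exhaustive for:", uncoded)
print("Not exclusive for:", multiply_coded)
```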

21

What is intercoder reliability?

The extent to which different coders agree on coding decisions.
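
One way to put a number on intercoder agreement is sketched below. The cards do not name a specific statistic, so the choice of simple percent agreement and Cohen's kappa (agreement corrected for chance, for two coders) is an assumption made for illustration.

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Share of units on which both coders chose the same category."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of units.")
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Agreement expected by chance, from each coder's category proportions.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # Both coders used a single identical category throughout.
    return (p_observed - p_expected) / (1 - p_expected)

coder_a = ["pos", "pos", "neg", "neg"]
coder_b = ["pos", "neg", "neg", "neg"]
print(percent_agreement(coder_a, coder_b))  # 0.75
print(cohens_kappa(coder_a, coder_b))       # 0.5
```

The coders agree on 3 of 4 units, but half of that agreement would be expected by chance alone, which is why kappa is lower than the raw percentage.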

22

What does validity mean in content analysis coding?

Categories accurately represent the theoretical construct of interest.

23

What are the “three V’s” of big data? And what is big data?

Volume, Variety, Velocity.

Big data involves huge digital archives of "traces" left by spontaneous human behavior, often online.

24

What does “Volume” mean in big data?

Extremely large datasets covering many observations.

25

What does “Variety” mean in big data?

Different data types (text, images, audio, video).

26

What does “Velocity” mean in big data?

Data are generated and available rapidly, often in real time.

27

What is the fourth V of big data, and what does it mean?

Veracity.

Data quality: whether the data are accurate and truthful.

Interpretability: whether the data reflect everyday behaviour rather than unrealistic settings.

28

Are big data studies usually exploratory or confirmatory?

Exploratory.

29

Does big data research focus more on induction or deduction?

Induction.

30

Does big data research usually test causality?

No, it focuses mainly on correlations.

31

What are the opportunities in big data research?

  1. Big data may include rare phenomena and hard-to-reach populations.

  2. It reduces the risk of error and bias associated with small samples, because samples are typically large. (However, representative samples are not guaranteed.)

  3. It can lead to the discovery of correlations that no current theory would predict. (But at the risk of these correlations being spurious.)

  4. It provides the opportunity to construct more sophisticated statistical models. (But at the risk of these models being overfit.)

32

Why can big data capture rare phenomena?

Because datasets are very large and comprehensive.

33

Why is big data still not necessarily representative?

Because “big” does not equal “all”.

34

What are disadvantages of Big Data?

  1. Spurious correlations

  2. Overfitting

  3. Representation

35

What are spurious correlations in big data?

Statistically significant relationships that occur by chance.

36

Why are spurious correlations common in big data?

With many variables, some correlations will appear randomly.
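
A small simulation can make this concrete. The sketch below generates variables of pure random noise and counts how many pairs still correlate beyond a cutoff. All settings are assumptions chosen for illustration: 40 variables, 30 observations each, and a cutoff of |r| > 0.361, which is roughly the two-tailed p < .05 threshold for 30 observations.

```python
import random
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def count_chance_correlations(n_vars=40, n_obs=30, threshold=0.361, seed=1):
    """Count pairs of unrelated random variables with |r| above `threshold`."""
    rng = random.Random(seed)
    # Every variable is independent noise, so any correlation is pure chance.
    variables = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = list(combinations(range(n_vars), 2))
    hits = sum(
        1 for i, j in pairs
        if abs(pearson_r(variables[i], variables[j])) > threshold
    )
    return hits, len(pairs)

hits, total = count_chance_correlations()
print(f"{hits} of {total} pairs look 'significant' although none is real")
```

With 40 variables there are already 780 pairs to compare, so at a 5% false-positive rate dozens of chance hits are to be expected.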

37

What is overfitting?

A model fits existing data well but performs poorly on new data. The model may be too complicated and end up explaining idiosyncratic details and exceptions in the old data. Because it matches the existing data too tightly, it may fail to predict new data.
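
The sketch below illustrates the idea on a tiny, hand-made dataset. Everything in it is an assumption made for illustration: the true relationship is y = 2x, the training data carry alternating noise of +1 and -1, and the "too tight" model is a memorizer that simply returns the y value of the nearest training point. It is compared with a straight line fitted by ordinary least squares.

```python
def fit_line(xs, ys):
    """Simple model: straight line fitted by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_memorizer(xs, ys):
    """Overly flexible model: return the y of the nearest training point."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict

def mse(model, xs, ys):
    """Mean squared prediction error of `model` on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Training data: y = 2x plus alternating noise of +1 / -1.
train_x = list(range(10))
train_y = [2 * x + (1 if x % 2 == 0 else -1) for x in train_x]
# New data: points between the training points, following y = 2x exactly.
test_x = [x + 0.4 for x in train_x]
test_y = [2 * x for x in test_x]

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)
for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The memorizer reproduces the training data perfectly, noise included, and therefore does worse than the simple line on the new points.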

38

What does representation as a disadvantage mean in Big Data?

Big data often ignores groups with limited internet access.

39

Why can simpler models sometimes be better?

They generalize better to new data.

40

What are advantages of Big Data?

  • High ecological validity (natural behavior)

  • Includes rare events (e.g., plane crashes)

  • Reaches hard-to-reach populations

41

What are ethical considerations in big data?

  • De-anonymization: Combining a few "anonymous" data points (age + city + hobby) can identify specific individuals.

  • Privacy: Accessible data is not always public (e.g., closed groups). Consent is difficult to obtain.

  • Bots/AI: Researchers must verify if content is from real humans or automated bots.

42

What is de-anonymization?

Identifying individuals from supposedly anonymous data: combining a few "anonymous" data points (age, city, hobby) can be enough to single out a specific person.
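
To see why a few harmless-looking attributes can single someone out, the sketch below counts how many records in a dataset are unique on a given combination of fields. The records and field names are invented for this example.

```python
from collections import Counter

def unique_records(records, fields):
    """Return the records whose combination of `fields` occurs only once."""
    combos = Counter(tuple(r[f] for f in fields) for r in records)
    return [r for r in records if combos[tuple(r[f] for f in fields)] == 1]

# Invented example data: no names, only "anonymous" attributes.
records = [
    {"age": 34, "city": "Utrecht", "hobby": "chess"},
    {"age": 34, "city": "Utrecht", "hobby": "running"},
    {"age": 34, "city": "Utrecht", "hobby": "running"},
    {"age": 51, "city": "Leiden", "hobby": "running"},
]

print(len(unique_records(records, ["age"])))                   # 1
print(len(unique_records(records, ["age", "city", "hobby"])))  # 2
```

Each added field splits the data into smaller groups, so more records end up in a group of one and could be linked to a person by anyone who knows those details.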

43

What is the issue that arises with privacy in big data?

Accessible data is not always public (e.g., closed groups). Consent is difficult to obtain.

44

What is the issue that arises with bots/AI in big data?

Researchers must verify if content is from real humans or automated bots.

45

What is the main strength of content analysis?

Systematic, quantitative study of real-world communication.

46

What is the main risk of big data research?

Misinterpreting correlations as meaningful or causal.

47

Why are content analysis and big data analysis sometimes better than experiments, surveys, and interviews?

  • Sometimes cheaper and more straightforward than creating a new survey/experiment.

  • There is no substitute for the richness, creativity, and humor in the communication on social media platforms and in written news media.

  • There is no substitute for the hate, bias, and conformity found there either.

  • You can see the whole spectrum of human behavior by looking at content.