ARTICLE 1: Notes on Adolescents’ Diet-Related Social Media Communications (Acta Psychologica 2022)

Overview

Topic: Adolescents’ diet-related communications on social media; goal is to describe what they say (content) and how they say it (linguistic style), to understand dietary social influences and intervention opportunities.
Dataset: MyMovez project data; N = 72,384 unique messages from Dutch adolescents (N = 1038; ages 9–16; 53% female) collected Feb 2017–Jul 2018 via a custom app Social Buzz; messages anonymized. Mean age of authors: 11.23 years (SD = 1.41).
Measures were derived with a mix of dictionary-based filtering, topic modeling, sentiment analysis, and LIWC-based psycholinguistic dimensions; outcome variable = message likes (peer validation).
Main finding summarized in abstract: adolescents more often discuss neutral-to-unhealthy foods; content healthiness and topics did not predict likes, while positive sentiment and higher subjectivity did slightly predict more likes; visual (diet-related) content attracted more likes.

Key concepts and rationale

Dietary social influence in adolescence
- Obesity is a major global health threat; early-life dietary behaviors tend to persist and influence BMI trajectories and related health risks (type 2 diabetes, hypertension, cardiovascular issues).
- Adolescents are embedded in social environments (parents, peers) that shape dietary attitudes and behaviors via social influence and norms; peer socialization aligns behaviors with group norms.
- Social networks can propagate obesity-related behaviors; observing peers’ eating can lead to imitation (behavior modeling).
Online diet-related communication
- Adolescents are exposed to food marketing and branded content online; many share dietary posts themselves; online dietary content can shape short- and long-term attitudes and behaviors.
- The study investigates two dimensions of dietary online messages: content (what is mentioned) and linguistic style (how it is framed).
Linguistic style and influence potential
- Message sentiment (valence) and latent psychological dimensions (LIWC categories) offer context for how messages are received and engaged with.
- Influence potential is operationalized via likes, linked to social validation and engagement, which in turn relate to credibility and persuasiveness.
Hypotheses and research questions
- H1: Healthier diet-related messages receive fewer likes (negative association between healthiness and likes).
- H2: More positive sentiment in diet-related messages is associated with more likes (positive association).
- RQ3a/b: How broader topics and LIWC-driven psychological dimensions relate to likes.

Research questions and hypotheses (detailed)

RQ1a: Proportion of diet-related messages relative to total adolescent communication.
RQ1b: Healthiness level of referenced foods/items in messages (via nutri-score).
RQ1c: Broader latent topics in youths’ diet-related messages (via LDA).
RQ2a: Linguistic style: sentiment and subjectivity of diet-related messages.
RQ2b: How linguistic style varies with different levels of message healthiness.
RQ2c: Correlations between linguistic style features and healthiness.
H1: Negative association between healthiness of diet-related messages and number of likes.
H2: Positive association between sentiment of diet-related messages and number of likes.
RQ3a: Relationship between broader topics and likes.
RQ3b: Relationship between LIWC-dimensions (affection, socialness, drives, cognitive, perceptual, biological focus) and likes.

Method: dataset and computational approach

Dataset details:
- Message corpus: 72,384 unique messages (N = 1,038 adolescents; 9–16 years old; 53% female).
- Timeframe: Feb 2017 – Jul 2018.
- Platform: Social Buzz (Education/health program context); direct messages and group chats; allowed image exchanges; “Like” feature tracked.
- Funding and ethics: ERC-COG-617253 (ERC grant); ECSW2014100614-222 (ethics approval).
Diet-related dictionary construction (N = 1,943 words)
- Starting points from English diet lexica; translation to Dutch and curation to Dutch context; synonyms added via word embeddings; misspellings captured via stemming; final dictionary includes foods, meals, beverages, eating-related words, fast-food chains, supermarkets, brands, and food-related jargon.
- Full dictionary list accessible (MovezNetwork GitHub).
Data cleaning and preprocessing
- Five-step cleaning: remove non-informative stop words (pronouns, conjunctions) and punctuation; fix vowel elongations common in Dutch slang; lowercase; tokenize; lemmatize with pattern.nl; retain only nouns, verbs, adjectives for topic modeling.
- For topic modeling, remove rare words (occurring only once) and keep words with semantic weight; average message length $M = 3.03$ words per message ($SD = 4.58$).
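The cleaning steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline:
- The stop-word list is a hypothetical placeholder.
- Lowercasing is done first so that stop-word matching is case-insensitive.
- The elongation rule is our own heuristic: runs of three or more identical vowels are shortened to two.
- The pattern.nl lemmatization and part-of-speech filtering are left out.

```python
import re
from collections import Counter

# Hypothetical stop-word list (pronouns, articles, conjunctions); not the study's actual list.
STOP_WORDS = {"ik", "je", "jij", "het", "de", "een", "en", "of", "maar"}

# Assumption: shorten runs of 3+ identical vowels to two, because Dutch legitimately
# doubles vowels ("maar", "heel"). Trade-off: an elongated "zooo" becomes "zoo", not "zo".
ELONGATION = re.compile(r"([aeiou])\1{2,}")

# Letters only (Unicode-aware): drops punctuation, digits, and underscores.
TOKEN = re.compile(r"[^\W\d_]+")


def clean(message: str) -> list[str]:
    """Lowercase, shorten vowel elongations, tokenize, and drop stop words.
    Lemmatization and POS filtering (done with pattern.nl in the study) are omitted."""
    text = ELONGATION.sub(r"\1\1", message.lower())
    return [tok for tok in TOKEN.findall(text) if tok not in STOP_WORDS]


def drop_rare(docs: list[list[str]]) -> list[list[str]]:
    """Remove tokens that occur only once across the whole corpus (before topic modeling)."""
    counts = Counter(tok for doc in docs for tok in doc)
    return [[tok for tok in doc if counts[tok] > 1] for doc in docs]


print(clean("Ik vind het HEEEEL lekker, maaaaar..."))  # ['vind', 'heel', 'lekker']
```

Shortening runs to two vowels rather than one keeps legitimate Dutch double vowels intact. The cost is that an elongated single vowel ends up doubled.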
Nutri-score healthiness (content healthiness measure)
- Nutri-score ranges 1–5 (transformed from A–E to 1–5) representing very healthy to very unhealthy.
- For each diet word in a message, assign a nutri-score from nutritional data (Nutritionix); average across items in a message to obtain a per-message healthiness, then center and reverse code to range $[-2, 2]$ (−2 very unhealthy to 2 very healthy).
- If a message contains no diet words, healthiness is missing.
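The per-message healthiness score amounts to $h = 3 - \bar{s}$, where $\bar{s}$ is the mean Nutri-score (1–5) of the diet words in the message. Subtracting the midpoint 3 centers the scale, and the sign flip reverses it, so A (1) maps to +2 and E (5) maps to -2. Below is a minimal Python sketch. It assumes a hypothetical word-to-letter lookup table; the study derived its scores from Nutritionix data.

```python
from statistics import mean
from typing import Optional

# Nutri-score letters mapped to 1 (A, very healthy) through 5 (E, very unhealthy).
LETTER_TO_SCORE = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

# Hypothetical lookup table (diet word -> Nutri-score letter). In the study, scores came from
# Nutritionix data run through a Nutri-score calculator; these entries are illustrative.
NUTRI_LETTER = {"boerenkool": "A", "appel": "A", "pizza": "D", "oreo": "E"}


def message_healthiness(tokens: list[str]) -> Optional[float]:
    """Average Nutri-score of the scored diet words in a message, centered on the
    scale midpoint (3) and reversed: -2 = very unhealthy, +2 = very healthy."""
    scores = [LETTER_TO_SCORE[NUTRI_LETTER[tok]] for tok in tokens if tok in NUTRI_LETTER]
    if not scores:
        return None  # no scored diet word: healthiness is missing
    return 3 - mean(scores)


print(message_healthiness(["pizza", "oreo"]))  # -1.5
```

Averaging also pulls mixed messages toward the center. A message mentioning both kale (A) and Oreos (E) scores 0, which is one way the regression-to-the-mean issue noted in the limitations can arise.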
Content measures
- Content: healthiness of referenced items (per-message average); topic modeling via LDA on a Tf-idf representation (to improve interpretability).
- LDA setup: number of topics k = 50, chosen via topic coherence analysis; 15 dominant topics account for ~94% of diet-related messages; topic 16 (not among topics 1–15) serves as the reference category.
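Gensim's LDA is normally fed bag-of-words counts; the notes indicate the counts were first re-weighted with tf-idf. The sketch below shows what that weighting computes. It assumes raw term frequency times $\log_2(N/\mathrm{df})$ with L2 normalization, which is our reading of the defaults of gensim's `TfidfModel`. In practice one would presumably call `gensim.models.TfidfModel` and pass its output to `gensim.models.LdaModel` (e.g. with `num_topics=50, random_state=42`) rather than hand-roll the weighting.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Tf-idf weights per tokenized message: raw term frequency x log2(N / df),
    then L2-normalized. Zero-weight terms (present in every message) are dropped."""
    n_docs = len(docs)
    # Document frequency: number of messages containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        weights = {t: c * math.log2(n_docs / df[t]) for t, c in Counter(doc).items()}
        weights = {t: w for t, w in weights.items() if w > 0}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


messages = [["pizza", "eten"], ["pizza", "honger"], ["eten", "school"]]
for vec in tfidf(messages):
    print({t: round(w, 3) for t, w in vec.items()})
```

Terms that occur in every message get zero weight and drop out. That is one reason tf-idf input can make topics easier to interpret.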
Linguistic style measures
- Sentiment and subjectivity: pattern.nl (Dutch) outputs message sentiment in the range $[-1, 1]$ and subjectivity in $[0, 1]$.
- LIWC-2015 (six dimensions): Affection, Socialness, Cognitive focus, Perceptual focus, Biological focus, Drives; scores per message from LIWC dictionaries; many messages have zero scores due to short length or absence of words in a category.
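LIWC scores are percentages: the share of a message's words that fall into a category's dictionary. With messages averaging about three words, a single match already yields a score of roughly 33. This helps explain both the many zero scores and the fairly high means reported below.

The sketch below uses hypothetical mini-dictionaries; the real LIWC2015 dictionaries are licensed and not reproduced here. A trailing `*` marks a prefix wildcard, as in LIWC dictionary files.

```python
# Hypothetical mini-dictionaries for three of the six dimensions; LIWC2015's real
# dictionaries are licensed and much larger. "*" marks a prefix wildcard.
CATEGORIES = {
    "biological": ["eten", "drink*", "honger*"],
    "social": ["vriend*", "oma", "familie", "samen"],
    "affect": ["blij", "leuk*", "lekker*"],
}


def matches(token: str, entry: str) -> bool:
    """Exact match, or prefix match when the dictionary entry ends in '*'."""
    return token.startswith(entry[:-1]) if entry.endswith("*") else token == entry


def liwc_scores(tokens: list[str]) -> dict[str, float]:
    """Percentage of the message's tokens that match each category (0.0 for an empty message)."""
    total = len(tokens)
    scores = {}
    for category, entries in CATEGORIES.items():
        hits = sum(any(matches(tok, e) for e in entries) for tok in tokens)
        scores[category] = 100 * hits / total if total else 0.0
    return scores


print(liwc_scores(["ik", "heb", "honger"]))
```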
Message liking (outcome)
- Outcome variable: number of likes per message on Social Buzz; $M = 0.25$, $SD = 0.73$.
Covariates and exploratory variables
- Covariates: Sex (0 = male, 1 = female), Age.
- Exploratory variables: message length (words per message); presence of an image; if an image depicts diet-related content vs non-diet content.

Plan of analysis (analytical steps)

Step 1: Content analysis (RQ1a–c)
- Calculate proportion of diet-related messages (mentions at least one diet word).
- Compute average healthiness per message from nutri-scores; examine its distribution and compare how often each healthiness level occurs in the dictionary versus in actual use (to assess representativeness and potential regression to the mean).
- Use LDA results to identify dominant topics; assess interpretability and meaningfulness of topics.
Step 2: Linguistic style analysis (RQ2a–c)
- Compute sentiment and subjectivity (pattern.nl) for messages that express sentiment; summarize means and SDs; test association with healthiness via correlations.
- Compute six LIWC dimensions; report descriptive statistics; test correlations with healthiness.
Step 3: Relation to influence potential (likes; H1–H2; RQ3a–b)
- Regression analyses with message likes as dependent variable:
- Model 1: healthiness as predictor (H1).
- Model 2: message topics as predictors (deviation coding; RQ3a).
- Model 3: sentiment and subjectivity as predictors (H2).
- Model 4: six LIWC dimensions as predictors (RQ3b).
- Exploratory regression: all message dimensions together (25 predictors) to assess unique explained variance; include exploratory variables (image presence, diet-related image, message length).
- Bonferroni corrections applied for multiple testing (adjusted significance thresholds for the individual predictors, e.g., p < .003 or p < .002 depending on the model).
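Two details of Step 3 can be made concrete:
- Deviation (sum-to-zero) coding compares each of topics 1–15 with the grand mean rather than with a single baseline topic.
- The Bonferroni thresholds in the notes are consistent with dividing α = .05 by the number of tested predictors: .05/15 ≈ .0033 for the 15 topics and .05/25 = .002 for the 25-predictor model. That reading is our assumption, since the exact divisors are not stated.

The function names in this minimal sketch are hypothetical.

```python
def deviation_code(topics, levels, reference):
    """Deviation (sum-to-zero) coding: one column per non-reference level; messages in the
    reference level get -1 in every column. Without covariates, each coefficient then
    estimates that level's deviation from the unweighted mean of the level means."""
    columns = [lvl for lvl in levels if lvl != reference]
    rows = []
    for topic in topics:
        if topic not in levels:
            raise ValueError(f"unknown level: {topic!r}")
        if topic == reference:
            rows.append([-1] * len(columns))
        else:
            rows.append([1 if topic == col else 0 for col in columns])
    return columns, rows


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


print(round(bonferroni_threshold(0.05, 15), 4))  # 0.0033
print(round(bonferroni_threshold(0.05, 25), 4))  # 0.002
```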
Robustness checks and notes
- Reproducibility: fixed random state for LDA (random_state = 42).
- Acknowledgement of limitations: short messages limit topic modeling and context; automated text analysis may miss sarcasm/metaphor; healthiness score may be affected by regression to the mean; not measuring actual dietary behavior.

Key findings: content and healthiness

Frequency of diet-related communication (RQ1a)
- 2138 messages contained at least one diet word out of 72,384 total messages = 2.95%.
- Interpretation: dietary content is present but represents a relatively small share in overall adolescent communications.
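Flagging a message as diet-related is a dictionary lookup: a message counts if at least one of its tokens is in the diet dictionary. The minimal sketch below makes two simplifications:
- A hypothetical five-word excerpt stands in for the full dictionary.
- Exact token matching replaces the study's stemming-based handling of misspellings.

```python
import re

# Hypothetical excerpt of the diet dictionary; the full 1,943-word list is on the MovezNetwork GitHub.
DIET_WORDS = {"eten", "pizza", "friet", "honger", "taart"}
TOKEN = re.compile(r"[^\W\d_]+")


def is_diet_message(message: str) -> bool:
    """True if the message contains at least one dictionary word (exact token match;
    the study's stemming-based handling of misspellings is not reproduced)."""
    return any(tok in DIET_WORDS for tok in TOKEN.findall(message.lower()))


def diet_share(messages: list[str]) -> tuple[int, int, float]:
    """Number of diet-related messages, total messages, and their proportion."""
    flagged = sum(is_diet_message(m) for m in messages)
    return flagged, len(messages), flagged / len(messages)


print(f"{2138 / 72384:.2%}")  # 2.95%, the proportion reported above
```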
Healthiness of referenced items (RQ1b)
- Of all dictionary words, 195 unique items were mentioned; 907 messages received a nutri-score.
- Healthiness distribution (centered score): M = -0.03, SD = 1.23 on a scale from $-2$ (very unhealthy) to $2$ (very healthy).
- Compared with the distribution of healthiness levels in the dictionary, neutral-to-unhealthy items were overrepresented in actual use: neutral (0) items by +6% and unhealthy (-1) items by +12%. Healthier items were underrepresented. Both extremes, very healthy foods (e.g., kale) and very unhealthy foods (e.g., Oreos), were mentioned relatively less, which is attributed to regression-to-the-mean effects.
- Conclusion: adolescents tend to discuss neutral-to-unhealthy items more than very healthy items, after accounting for dictionary frequencies.
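The over- and underrepresentation figures compare two distributions of healthiness levels: their spread over the dictionary entries and their spread over actual mentions. The sketch below shows one plausible way to compute such percentage-point gaps, using made-up toy data. The exact procedure the authors used is not specified in these notes.

```python
from collections import Counter


def representation_gap(dictionary_levels: list[int], mention_levels: list[int]) -> dict[int, float]:
    """Percentage-point difference between each healthiness level's share of mentions and its
    share of dictionary entries (positive = overrepresented in actual use)."""
    in_dict, in_use = Counter(dictionary_levels), Counter(mention_levels)
    n_dict, n_use = len(dictionary_levels), len(mention_levels)
    return {
        level: 100 * (in_use[level] / n_use - in_dict[level] / n_dict)
        for level in sorted(set(in_dict) | set(in_use))
    }


# Toy data on the -2 (very unhealthy) to +2 (very healthy) scale; not the study's data.
dictionary = [2, 2, 1, 0, -1, -2]
mentions = [0, 0, -1, -1, -1, 2]
print({lvl: round(gap, 1) for lvl, gap in representation_gap(dictionary, mentions).items()})
```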
Topics in diet-related messages (RQ1c)
- LDA identified 50 topics; 15 dominant topics explained 94% of diet messages; topic 16 served as reference.
- Most prevalent topics included:
- Topic 1: planning to eat / leaving chat to eat (words like “eten” = eating, “gaan” = to go, “eerst” = first).
- Topic 2: hunger (e.g., “Hier krijg ik zo’n honger van” = This makes me so hungry).
- Topic 3: eating with family (social context such as “zameneten” (eating together), family, a celebration at grandmother’s, etc.).
- Topic 11: eating/drinking at school (social context of school meals).
- Other topics tended to be single-food items (e.g., cake, chicken, French fries) or ambiguous; overall, topic modeling offered only limited insights beyond these overlapping themes; 4 topics appeared meaningfully distinct, others were highly specific or ambiguous.
Linguistic style: sentiment and LIWC dimensions (RQ2a–c)
- Message sentiment (subset with sentiment/subjectivity): N = 628 messages; average sentiment positive (M = 0.25, SD = 0.47).
- Message subjectivity: N = 628; M = 0.78, SD = 0.22; indicates messages often reflect personal opinions or experiences.
- Relationship to healthiness: sentiment r = -0.04, p = .194; subjectivity r = -0.01, p = .744; no statistically significant association between sentiment/subjectivity and healthiness in simple correlations.
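pattern.nl scores sentiment with a lexicon of (mostly adjective) words, each carrying a polarity value and a subjectivity value. It averages those values over the lexicon words it finds in a message. Messages without any lexicon word receive no meaningful score, which presumably explains why only 628 of the 2,138 diet-related messages enter these analyses.

The sketch below is a deliberately simplified stand-in with a hypothetical three-word lexicon. pattern.nl's real lexicon is much larger, and it also handles intensifiers and negation, which this sketch ignores.

```python
from statistics import mean
from typing import Optional

# Hypothetical lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
LEXICON = {"lekker": (0.8, 1.0), "vies": (-0.8, 1.0), "gezond": (0.4, 0.6)}


def style(tokens: list[str]) -> Optional[tuple[float, float]]:
    """Mean polarity and mean subjectivity of the lexicon words in a message,
    or None if the message contains no lexicon word (left unscored)."""
    hits = [LEXICON[tok] for tok in tokens if tok in LEXICON]
    if not hits:
        return None
    return mean(p for p, _ in hits), mean(s for _, s in hits)


print(style(["pizza", "is", "lekker"]))  # (0.8, 1.0)
```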
LIWC psychological dimensions (RQ2b): descriptive and correlations
- Many messages scored zero on several LIWC dimensions because they were short or contained no words from the category dictionary. The means below are based on the messages with a non-zero score (n per dimension):
- Affection: mean 18.05 (SD 15.81; n = 461)
- Socialness: mean 18.43 (SD 9.86; n = 807)
- Cognitive focus: mean 19.84 (SD 10.25; n = 755)
- Perceptual focus: mean 17.34 (SD 15.87; n = 365)
- Biological focus: mean 26.36 (SD 20.00; n = 1592)
- Drives: mean 15.47 (SD 12.66; n = 376)
- Biological focus occurred most often, driven by words such as “eten” (eating) and “drinken” (drinking).
- Correlations with healthiness: small positive association for Affection and healthiness (r = 0.07, p = .043); other dimensions showed no significant associations with healthiness.

Key findings: relation to likes (influence potential) and hypotheses

H1: Healthiness and likes
- Regression model (healthiness predicting likes): not significant; Table 1 shows healthiness estimate ≈ 0.02, t ≈ 0.56, p ≈ .575; intercept and covariates also non-significant. Conclusion: no evidence that healthiness of referenced foods predicts message likes.
H2: Sentiment/subjectivity predicting likes
- Model with sentiment and subjectivity: overall model significant (F(4,2133) = 9.30, p < .001).
- Sentiment: b = 0.08, t = 3.44, p = .001; positive association with likes.
- Subjectivity: b = 0.07, t = 2.69, p = .007; positive association with likes.
- Sex and age effects: small or non-significant; sex p ≈ .09; age p ≈ .71.
- Conclusion: messages with more positive sentiment and higher subjectivity tend to receive slightly more likes.
RQ3a: Topic-level effects on likes
- Regression with 15 topic predictors (Topics 1–15) predicting likes; Bonferroni-adjusted threshold p < .003.
- After correction, no topic significantly predicted likes; none stood out above the grand mean.
RQ3b: LIWC dimensions predicting likes
- Regression with six LIWC dimensions predicting likes; overall model significant (F(8,2129) = 2.34, p = .017) but none of the individual dimensions reached significance after correction.
Exploratory analysis (multi-predictor model)
- Full model (25 predictors: content, sentiment, LIWC, topics, covariates, length, image indicators): F(25, 2006) = 14.38, p < .001; R^2 = 0.143.
- Significant predictors:
- Message subjectivity: small positive predictor of likes.
- Containing an image (yes): strong positive predictor of likes.
- Containing a diet-related image (yes): strong positive predictor; larger effect than non-diet images.
- Other variables (including message length) did not reach significance after adjustment.
Appendix findings (healthiness included vs. not included)
- When healthiness is included (Table 6), some diet-related features (e.g., presence of a diet-related image, certain topics) relate to likes, but the most robust effects remain image-related.

Results interpretation and discussion

Content vs. engagement
- About 3% of adolescent messages discuss dietary topics; healthiness is generally skewed toward neutral-to-unhealthy items, aligning with concerns about online dietary norms.
- Engagement (likes) was not driven by healthiness or most discrete topics; rather, how content was framed mattered: positive sentiment and subjective, person-centered framing predicted more likes.
- Visual content mattered: messages with images, especially diet-related images, attracted more likes; combining text and diet-related images yielded the strongest engagement signal.
Implications for dietary interventions and health communication
- Interventions aiming to influence adolescent dietary attitudes via social media should consider tone and framing: messages that are positive and convey personal perspective may engage peers more effectively.
- Visuals are key: integrating diet-related imagery could boost engagement, potentially increasing intervention reach and visibility in social feeds.
- Focus on the way messages are communicated, not just the content; if content is unhealthy, it may still be engaging if framed positively and personally.
The role of LDA topics
- LDA provided limited practical insight for short messages; 15 topics captured most messages but did not robustly predict engagement; short text length challenges topic interpretability.
Theoretical implications
- Supports the view that social validation (likes) is reinforced by emotionally charged and personally oriented communications, consistent with social influence theories and prior findings on engagement signals.
Practical implications for researchers and practitioners
- For big-data health communication, combining lexical sentiment, psychological dimensions, and visual content can improve understanding of engagement dynamics.
- Designing adolescent-centric interventions should incorporate co-creation with youth to ensure messages resonate and maintain engagement over time.

Strengths and limitations

Strengths
- Large, ecologically valid dataset from real-world adolescent social interactions.
- Mixed-methods computational approach: diet dictionary, nutri-score healthiness, LDA topics, Pattern sentiment, LIWC psychological dimensions.
- Examination of both what is said (content) and how it is said (linguistic style) alongside engagement metrics (likes).
- Ethical data handling: anonymization; consent procedures; data collected in a health-promotion context.
Limitations
- Automated text analysis may miss nuanced meaning, sarcasm, or metaphor; context is hard to infer from short messages.
- Short message length reduces reliability of certain NLP techniques (topic modeling particularly sensitive to short text).
- Healthiness measure (nutri-score) relies on dictionary coverage and nutrient data; regression-to-the-mean may bias extreme values.
- Likes as a proxy for influence potential do not directly equate to actual dietary behavior change.
- The platform (Social Buzz) resembles direct messaging; results may differ on one-to-many platforms (Instagram, TikTok).

Conclusions and practical takeaways

Overall dietary content among adolescents in the sample is skewed toward unhealthy or neutral items, consistent with expectations about the online food environment.
Engagement with diet-related messages is more strongly predicted by linguistic style and visual content than by the specific diet content or topics.
- Positive sentiment and higher subjectivity modestly increase like counts.
- Images, particularly diet-related images, are the strongest predictors of message liking; text+image congruence yields the largest engagement.
Implications for interventions
- Design messages with positive, personal framing and include relevant visuals to boost engagement and potential influence.
- Consider co-creating interventions with adolescents to ensure content is both engaging and credible to the target audience.
Future research directions
- Investigate cognitive mechanisms (processing fluency, cognitive load) related to image-driven engagement in adolescent health messages.
- Explore longitudinal effects: do high-like messages predict sustained engagement or behavior change over time?
- Test platform-specific effects across different social media environments (one-to-many vs. private messaging) and demographic subgroups.

Notes on key numerical references (for quick reference)

Dataset size and scope: $N = 72{,}384$ messages; $N = 1{,}038$ adolescents; age range 9–16; $M_{\text{age}} = 11.23$ ($SD = 1.41$); 53% female.
Diet-related messages: $2{,}138 / 72{,}384 = 2.95\%$.
Nutri-score healthiness per message: scale transformed to the range $[-2, 2]$; subset with healthiness: $N = 1008$ messages; healthiness $M = -0.03$, $SD = 1.23$.
LIWC dimensions: biological focus more common ($M = 26.36$, $SD = 20.00$; n = 1592); other dimensions report means with their sample sizes.
Sentiment/subjectivity (pattern.nl): sentiment $M = 0.25$, $SD = 0.47$ (n = 628); subjectivity $M = 0.78$, $SD = 0.22$ (n = 628).
Message likes: $M = 0.25$, $SD = 0.73$.
Key regression results (selected):
- Healthiness predicting likes: estimate $\approx 0.02$, not significant ($p = .575$) (Table 1).
- Sentiment predicting likes: $b = 0.08, t = 3.44, p = .001$ (Table 3).
- Subjectivity predicting likes: $b = 0.07, t = 2.69, p = .007$ (Table 3).
- LIWC dimensions collectively: $F(8, 2129) = 2.34$, $p = .017$ (Table 4), but none individually significant after correction.
- Exploratory full model: F(25, 2006) = 14.38, p < .001, R^2 = 0.143; image-related predictors strongest (Table 5).

Appendix: data and code references

MovezNetwork GitHub: Fruit-Basket-of-Adolescents-Food-Talk (diet dictionary and related code).
Pattern (pattern.nl) for Dutch sentiment/subjectivity processing.
LIWC2015 dictionaries for six psychological dimensions.
Gensim LDA (tf-idf representation) for topic modeling; topic coherence used to determine topic count.
Nutri-score calculator: Colruyt nutri-score calculator referenced; nutrition data from Nutritionix database.

Bottom-line takeaway

The study provides a nuanced view of how adolescents discuss diet on social media: healthiness alone does not drive engagement; rather, how positively and subjectively a message is framed, and whether it’s accompanied by diet-related imagery, drives peer engagement. This has direct implications for designing engaging, youth-centered health communication interventions on social media.