ARTICLE 1: Notes on Adolescents’ Diet-Related Social Media Communications (Acta Psychologica 2022)

Overview

  • Topic: Adolescents’ diet-related communications on social media; goal is to describe what they say (content) and how they say it (linguistic style), to understand dietary social influences and intervention opportunities.

  • Dataset: MyMovez project data; N = 72,384 unique messages from Dutch adolescents (N = 1038; ages 9–16; 53% female) collected Feb 2017–Jul 2018 via a custom app Social Buzz; messages anonymized. Mean age of authors: 11.23 years (SD = 1.41).

  • Measures were derived with a mix of dictionary-based filtering, topic modeling, sentiment analysis, and LIWC-based psycholinguistic dimensions; outcome variable = message likes (peer validation).

  • Main finding summarized in abstract: adolescents more often discuss neutral-to-unhealthy foods; content healthiness and topics did not predict likes, while positive sentiment and higher subjectivity did slightly predict more likes; visual (diet-related) content attracted more likes.

Key concepts and rationale

  • Dietary social influence in adolescence

    • Obesity is a major global health threat; early-life dietary behaviors tend to persist and influence BMI trajectories and related health risks (type 2 diabetes, hypertension, cardiovascular issues).

    • Adolescents are embedded in social environments (parents, peers) that shape dietary attitudes and behaviors via social influence and norms; peer socialization aligns behaviors with group norms.

    • Social networks can propagate obesity-related behaviors; observing peers’ eating can lead to imitation (behavior modeling).

  • Online diet-related communication

    • Adolescents are exposed to food marketing and branded content online; many share dietary posts themselves; online dietary content can shape short- and long-term attitudes and behaviors.

    • The study investigates two dimensions of dietary online messages: content (what is mentioned) and linguistic style (how it is framed).

  • Linguistic style and influence potential

    • Message sentiment (valence) and latent psychological dimensions (LIWC categories) offer context for how messages are received and engaged with.

    • Influence potential is operationalized via likes, linked to social validation and engagement, which in turn relate to credibility and persuasiveness.

  • Hypotheses and research questions

    • H1: Healthier diet-related messages receive fewer likes (negative association between healthiness and likes).

    • H2: More positive sentiment in diet-related messages is associated with more likes (positive association).

    • RQ3a/b: How broader topics and LIWC-driven psychological dimensions relate to likes.

Research questions and hypotheses (detailed)

  • RQ1a: Proportion of diet-related messages relative to total adolescent communication.

  • RQ1b: Healthiness level of referenced foods/items in messages (via nutri-score).

  • RQ1c: Broader latent topics in youths’ diet-related messages (via LDA).

  • RQ2a: Linguistic style: sentiment and subjectivity of diet-related messages.

  • RQ2b: How linguistic style varies with different levels of message healthiness.

  • RQ2c: Correlations between linguistic style features and healthiness.

  • H1: Negative association between healthiness of diet-related messages and number of likes.

  • H2: Positive association between sentiment of diet-related messages and number of likes.

  • RQ3a: Relationship between broader topics and likes.

  • RQ3b: Relationship between LIWC-dimensions (affection, socialness, drives, cognitive, perceptual, biological focus) and likes.

Method: dataset and computational approach

  • Dataset details:

    • Message corpus: 72,384 unique messages (N = 1,038 adolescents; 9–16 years old; 53% female).

    • Timeframe: Feb 2017 – Jul 2018.

    • Platform: Social Buzz (Education/health program context); direct messages and group chats; allowed image exchanges; “Like” feature tracked.

    • Ethical approvals: ERC-COG-617253; ECSW2014100614-222.

  • Diet-related dictionary construction (N = 1,943 words)

    • Starting points from English diet lexica; translation to Dutch and curation to Dutch context; synonyms added via word embeddings; misspellings captured via stemming; final dictionary includes foods, meals, beverages, eating-related words, fast-food chains, supermarkets, brands, and food-related jargon.

    • Full dictionary list accessible (MovezNetwork GitHub).

  • Data cleaning and preprocessing

    • Five-step cleaning: remove non-informative stop words (pronouns, conjunctions) and punctuation; fix vowel elongations common in Dutch slang; lowercase; tokenize; lemmatize with pattern.nl; retain only nouns, verbs, adjectives for topic modeling.

    • For topic modeling, remove rare words (occurring once); keep words with semantic weight; average message length M = 3.03 words per message (SD = 4.58).
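
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the study's pipeline: the stop-word list is a hypothetical stand-in, the elongation rule (collapse three or more identical vowels to one) is an assumed heuristic, and lemmatization and part-of-speech filtering with pattern.nl are left out to keep the sketch dependency-free.

```python
import re
import string

# Hypothetical mini stop-word list; the study's actual list is not given in these notes.
STOP_WORDS = {"ik", "je", "en", "de", "het", "een"}


def collapse_elongation(token: str) -> str:
    """Collapse runs of 3+ identical vowels to a single vowel (assumed heuristic).

    Example: 'jaaaa' -> 'ja'. Legitimate double vowels ('eet') are untouched,
    but an elongated double vowel ('heeeel') would be over-collapsed.
    """
    return re.sub(r"([aeiou])\1{2,}", r"\1", token)


def clean_message(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, tokenize, fix elongations, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [collapse_elongation(t) for t in text.split()]
    return [t for t in tokens if t not in STOP_WORDS]


print(clean_message("Ik wil lekkeeeer eten!!!"))
```

In a fuller pipeline, the returned tokens would then be lemmatized and filtered to nouns, verbs, and adjectives before topic modeling.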

  • Nutri-score healthiness (content healthiness measure)

    • Nutri-score ranges 1–5 (transformed from A–E to 1–5) representing very healthy to very unhealthy.

    • For each diet word in a message, assign a nutri-score from nutritional data (Nutritionix); average across items in a message to obtain a per-message healthiness score, then center and reverse code to the range [-2, 2] (−2 very unhealthy to 2 very healthy).

    • If a message contains no diet words, healthiness is missing.
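
A minimal sketch of the healthiness scoring described above, assuming the centering and reverse coding amount to subtracting the per-message mean from the scale midpoint of 3. The lookup table and its letter grades are illustrative placeholders, not values from the study or from Nutritionix.

```python
from statistics import mean
from typing import Optional

# A-E nutri-score letters mapped to 1-5 (1 = very healthy, 5 = very unhealthy).
LETTER_TO_SCORE = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

# Hypothetical lookup; letters are illustrative, not taken from the study.
NUTRI_SCORE = {"appel": "A", "brood": "B", "kaas": "D", "chips": "D", "cola": "E"}


def message_healthiness(tokens) -> Optional[float]:
    """Average the 1-5 scores of all diet words, then center and reverse code.

    Returns a value in [-2, 2] (2 = very healthy), or None when the message
    contains no scored diet word (healthiness is missing).
    """
    scores = [LETTER_TO_SCORE[NUTRI_SCORE[t]] for t in tokens if t in NUTRI_SCORE]
    if not scores:
        return None
    return 3 - mean(scores)  # assumed transform: midpoint 3 minus the mean


print(message_healthiness(["appel", "cola"]))
print(message_healthiness(["hallo"]))
```

Returning None rather than 0 keeps messages without diet words distinct from messages whose items average out to neutral.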

  • Content measures

    • Content: healthiness of referenced items (per-message average); topic modeling via LDA on a Tf-idf representation (to improve interpretability).

    • LDA setup: k = 50 topics, selected via topic coherence analysis; 15 dominant topics account for ~94% of messages; topic 16 used as the reference category (not among topics 1–15).

  • Linguistic style measures

    • Sentiment and subjectivity: pattern.nl (Dutch) outputs message sentiment in the range [-1, 1] and subjectivity in [0, 1].

    • LIWC-2015 (six dimensions): Affection, Socialness, Cognitive focus, Perceptual focus, Biological focus, Drives; scores per message from LIWC dictionaries; many messages have zero scores due to short length or absence of words in a category.
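
LIWC-style category scores are percentages of a message's words that fall in a category dictionary, which is why very short messages often score zero. The sketch below illustrates that idea only; the word set is a hypothetical placeholder, since the LIWC-2015 dictionaries are proprietary and are not reproduced in these notes.

```python
# Hypothetical stand-in for one category dictionary (not the LIWC-2015 word list).
BIOLOGICAL_WORDS = {"eten", "drinken", "honger"}


def category_percentage(tokens, category_words) -> float:
    """Percentage of tokens that belong to the given category dictionary."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in category_words)
    return 100 * hits / len(tokens)


print(category_percentage(["ik", "ga", "eten", "nu"], BIOLOGICAL_WORDS))
```

With a mean message length of about three words, a single matching word already yields a score of 25 to 100, while any message without a match scores zero.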

  • Message liking (outcome)

    • Outcome variable: number of likes per message on Social Buzz; M = 0.25, SD = 0.73.

  • Covariates and exploratory variables

    • Covariates: Sex (0 = male, 1 = female), Age.

    • Exploratory variables: message length (words per message); presence of an image; if an image depicts diet-related content vs non-diet content.

Plan of analysis (analytical steps)

  • Step 1: Content analysis (RQ1a–c)

    • Calculate proportion of diet-related messages (mentions at least one diet word).

    • Compute average healthiness per message from nutri-scores; examine distribution and differences between dictionary distribution vs. dataset uses (to assess representativeness and potential regression to the mean).

    • Use LDA results to identify dominant topics; assess interpretability and meaningfulness of topics.

  • Step 2: Linguistic style analysis (RQ2a–c)

    • Compute sentiment and subjectivity (pattern.nl) for messages that express sentiment; summarize means and SDs; test association with healthiness via correlations.

    • Compute six LIWC dimensions; report descriptive statistics; test correlations with healthiness.

  • Step 3: Relation to influence potential (likes; H1–H2; RQ3a–b)

    • Regression analyses with message likes as dependent variable:

    • Model 1: healthiness as predictor (H1).

    • Model 2: message topics as predictors (deviation coding; RQ3a).

    • Model 3: sentiment and subjectivity as predictors (H2).

    • Model 4: six LIWC dimensions as predictors (RQ3b).

    • Exploratory regression: all message dimensions together (25 predictors) to assess unique explained variance; include exploratory variables (image presence, diet-related image, message length).

    • Bonferroni corrections applied for multiple testing (adjusted significance thresholds, e.g., p < .003 or p < .002 depending on the model).
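
The adjusted thresholds can be reproduced with a short calculation. This sketch assumes a family-wise alpha of .05 and that the number of tests equals the number of predictors in a model (15 topics, 25 predictors in the exploratory model); the notes do not state either explicitly.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumed test counts: 15 topic predictors and 25 predictors in the full model.
for n_tests in (15, 25):
    print(n_tests, round(bonferroni_threshold(0.05, n_tests), 4))
```

Under these assumptions the results round to the p < .003 and p < .002 thresholds mentioned above.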

  • Robustness checks and notes

    • Reproducibility: fixed random state for LDA (random_state = 42).

    • Acknowledgement of limitations: short messages limit topic modeling and context; automated text analysis may miss sarcasm/metaphor; healthiness score may be affected by regression to the mean; not measuring actual dietary behavior.

Key findings: content and healthiness

  • Frequency of diet-related communication (RQ1a)

    • 2138 messages contained at least one diet word out of 72,384 total messages = 2.95%.

    • Interpretation: dietary content is present but represents a relatively small share in overall adolescent communications.
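
A minimal sketch of the dictionary-based flagging behind this proportion, followed by the reported arithmetic. The word set is a hypothetical subset of the 1,943-word dictionary, and exact token matching is a simplification of the study's approach.

```python
# Hypothetical subset of the diet dictionary, for illustration only.
DIET_WORDS = {"eten", "pizza", "honger", "drinken"}


def is_diet_related(tokens) -> bool:
    """A message counts as diet-related if it mentions at least one diet word."""
    return any(t in DIET_WORDS for t in tokens)


def diet_share(messages) -> float:
    """Proportion of tokenized messages flagged as diet-related."""
    flagged = sum(1 for m in messages if is_diet_related(m))
    return flagged / len(messages)


# Reported counts: 2,138 diet-related messages out of 72,384 in total.
reported_share = 2138 / 72384
print(f"{reported_share:.2%}")  # prints 2.95%
```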

  • Healthiness of referenced items (RQ1b)

    • Of all dictionary words, 195 unique items were mentioned; 907 messages received a nutri-score.

    • Healthiness distribution (centered score): M = -0.03, SD = 1.23 on a scale from -2 (very unhealthy) to 2 (very healthy).

    • When comparing dictionary distribution to observed usage, neutral-to-unhealthy items were overrepresented (neutral 0 and unhealthy -1 values overrepresented by +6% and +12%, respectively); healthier items underrepresented overall, though extreme very healthy foods (e.g., kale) and very unhealthy foods (e.g., Oreos) were relatively less mentioned due to regression-to-the-mean effects.

    • Conclusion: adolescents tend to discuss neutral-to-unhealthy items more than very healthy items, after accounting for dictionary frequencies.

  • Topics in diet-related messages (RQ1c)

    • LDA identified 50 topics; 15 dominant topics explained 94% of diet messages; topic 16 served as reference.

    • Most prevalent topics included:

    • Topic 1: planning to eat / leaving chat to eat (words like “eten” = eating, “gaan” = to go, “eerst” = first).

    • Topic 2: hunger (e.g., “Hier krijg ik zo’n honger van” = This makes me so hungry).

    • Topic 3: eating with family (social context like “zameneten”, family, celebration at grandmother’s, etc.).

    • Topic 11: eating/drinking at school (social context of school meals).

    • Other topics tended to be single-food items (e.g., cake, chicken, French fries) or ambiguous; overall, topic modeling offered only limited insights beyond these overlapping themes; 4 topics appeared meaningfully distinct, others were highly specific or ambiguous.

  • Linguistic style: sentiment and LIWC dimensions (RQ2a–c)

    • Message sentiment (subset with sentiment/subjectivity): N = 628 messages; average sentiment positive (M = 0.25, SD = 0.47).

    • Message subjectivity: N = 628; M = 0.78, SD = 0.22; indicates messages often reflect personal opinions or experiences.

    • Relationship to healthiness: sentiment r = -0.04, p = .194; subjectivity r = -0.01, p = .744; no statistically significant association between sentiment/subjectivity and healthiness in simple correlations.
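
The simple correlations reported here are Pearson coefficients over messages that have both a healthiness score and a style score. A dependency-free sketch is shown below; dropping pairs with a missing value is an assumption about how incomplete messages were handled, and the toy values are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys) -> float:
    """Pearson correlation over pairs where both values are present (not None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Toy data (invented): healthiness in [-2, 2], sentiment in [-1, 1].
healthiness = [2, 0, None, -1, 1]
sentiment = [0.5, 0.1, 0.9, 0.4, -0.2]
print(round(pearson_r(healthiness, sentiment), 2))
```

The function raises a ZeroDivisionError if either variable is constant or no complete pairs remain, so real data would need a guard for those cases.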

  • LIWC psychological dimensions (RQ2b): descriptive and correlations

    • Most messages had zero scores for several LIWC dimensions due to short length and lack of match to dictionaries:

    • Affection: mean 18.05 (SD 15.81; n = 461)

    • Socialness: mean 18.43 (SD 9.86; n = 807)

    • Cognitive focus: mean 19.84 (SD 10.25; n = 755)

    • Perceptual focus: mean 17.34 (SD 15.87; n = 365)

    • Biological focus: mean 26.36 (SD 20.00; n = 1592)

    • Drives: mean 15.47 (SD 12.66; n = 376)

    • Pattern: Biological focus appeared more frequently (due to words like “eten”/“drinken”).

    • Correlations with healthiness: small positive association for Affection and healthiness (r = 0.07, p = .043); other dimensions showed no significant associations with healthiness.

Key findings: relation to likes (influence potential) and hypotheses

  • H1: Healthiness and likes

    • Regression model (healthiness predicting likes): not significant; Table 1 shows healthiness estimate ≈ 0.02, t ≈ 0.56, p ≈ .575; intercept and covariates also non-significant. Conclusion: no evidence that healthiness of referenced foods predicts message likes.

  • H2: Sentiment/subjectivity predicting likes

    • Model with sentiment and subjectivity: overall model significant (F(4,2133) = 9.30, p < .001).

    • Sentiment: b = 0.08, t = 3.44, p = .001; positive association with likes.

    • Subjectivity: b = 0.07, t = 2.69, p = .007; positive association with likes.

    • Sex and age effects: small or non-significant; sex p ≈ .09; age p ≈ .71.

    • Conclusion: messages with more positive sentiment and higher subjectivity tend to receive slightly more likes.

  • RQ3a: Topic-level effects on likes

    • Regression with 15 topic predictors (Topic 1–15) vs. likes; Bonferroni-adjusted p < .003.

    • After correction, no topic significantly predicted likes; none stood out above the grand mean.
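
Deviation (effect) coding is what makes the topic coefficients comparisons against the grand mean rather than against a baseline topic. The sketch below shows one common way to build such codes, assuming each message is assigned a single dominant topic and that topic 16 is the category coded -1 in every column; both are assumptions, as the notes do not spell out the design matrix.

```python
def deviation_code(topic: int, n_topics: int = 16, reference: int = 16) -> list[int]:
    """Return the deviation-coded row for a message's dominant topic.

    Each non-reference topic gets its own column (1 in that column, 0 elsewhere);
    the reference topic is coded -1 in every column.
    """
    columns = [t for t in range(1, n_topics + 1) if t != reference]
    if topic == reference:
        return [-1] * len(columns)
    return [1 if topic == t else 0 for t in columns]


print(deviation_code(1))
print(deviation_code(16))
```

With this coding, each of the 15 regression coefficients estimates how far a topic's mean number of likes lies from the unweighted mean across all topics.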

  • RQ3b: LIWC dimensions predicting likes

    • Regression with six LIWC dimensions predicting likes; overall model significant (F(8,2129) = 2.34, p = .017) but none of the individual dimensions reached significance after correction.

  • Exploratory analysis (multi-predictor model)

    • Full model (25 predictors: content, sentiment, LIWC, topics, covariates, length, image indicators): F(25, 2006) = 14.38, p < .001; R^2 = 0.143.

    • Significant predictors:

    • Message subjectivity: small positive predictor of likes.

    • Containing an image (yes): strong positive predictor of likes.

    • Containing a diet-related image (yes): strong positive predictor; larger effect than non-diet images.

    • Other variables (including message length) did not reach significance after adjustment.

  • Appendix findings (healthiness included vs. not included)

    • When healthiness is included (Table 6): diet-related metadata (e.g., diet-related image presence, certain topics) influence likes, but the main robust effects remain image-related.

Results interpretation and discussion

  • Content vs. engagement

    • About 3% of adolescent messages discuss dietary topics; healthiness is generally skewed toward neutral-to-unhealthy items, aligning with concerns about online dietary norms.

    • Engagement (likes) was not driven by healthiness or most discrete topics; rather, how content was framed mattered: positive sentiment and subjective, person-centered framing predicted more likes.

    • Visual content mattered: messages with images, especially diet-related images, attracted more likes; combining text and diet-related images yielded the strongest engagement signal.

  • Implications for dietary interventions and health communication

    • Interventions aiming to influence adolescent dietary attitudes via social media should consider tone and framing: messages that are positive and convey personal perspective may engage peers more effectively.

    • Visuals are key: integrating diet-related imagery could boost engagement, potentially increasing intervention reach and visibility in social feeds.

    • Focus on the way messages are communicated, not just the content; if content is unhealthy, it may still be engaging if framed positively and personally.

  • The role of LDA topics

    • LDA provided limited practical insight for short messages; 15 topics captured most messages but did not robustly predict engagement; short text length challenges topic interpretability.

  • Theoretical implications

    • Supports the view that social validation (likes) is reinforced by emotionally charged and personally oriented communications, consistent with social influence theories and prior findings on engagement signals.

  • Practical implications for researchers and practitioners

    • For big-data health communication, combining lexical sentiment, psychological dimensions, and visual content can improve understanding of engagement dynamics.

    • Designing adolescent-centric interventions should incorporate co-creation with youth to ensure messages resonate and maintain engagement over time.

Strengths and limitations

  • Strengths

    • Large, ecologically valid dataset from real-world adolescent social interactions.

    • Mixed-methods computational approach: diet dictionary, nutri-score healthiness, LDA topics, Pattern sentiment, LIWC psychological dimensions.

    • Examination of both what is said (content) and how it is said (linguistic style) alongside engagement metrics (likes).

    • Ethical data handling: anonymization; consent procedures; data collected in a health-promotion context.

  • Limitations

    • Automated text analysis may miss nuanced meaning, sarcasm, or metaphor; context is hard to infer from short messages.

    • Short message length reduces reliability of certain NLP techniques (topic modeling particularly sensitive to short text).

    • Healthiness measure (nutri-score) relies on dictionary coverage and nutrient data; regression-to-the-mean may bias extreme values.

    • Likes as a proxy for influence potential do not directly equate to actual dietary behavior change.

    • The platform (Social Buzz) resembles direct messaging; results may differ on one-to-many platforms (Instagram, TikTok).

Conclusions and practical takeaways

  • Overall dietary content among adolescents in the sample is skewed toward unhealthy or neutral items, consistent with expectations about the online food environment.

  • Engagement with diet-related messages is more strongly predicted by linguistic style and visual content than by the specific diet content or topics.

    • Positive sentiment and higher subjectivity modestly increase like counts.

    • Images, particularly diet-related images, are the strongest predictors of message liking; text+image congruence yields the largest engagement.

  • Implications for interventions

    • Design messages with positive, personal framing and include relevant visuals to boost engagement and potential influence.

    • Consider co-creating interventions with adolescents to ensure content is both engaging and credible to the target audience.

  • Future research directions

    • Investigate cognitive mechanisms (processing fluency, cognitive load) related to image-driven engagement in adolescent health messages.

    • Explore longitudinal effects: do high-like messages predict sustained engagement or behavior change over time?

    • Test platform-specific effects across different social media environments (one-to-many vs. private messaging) and demographic subgroups.

Notes on key numerical references (for quick reference)

  • Dataset size and scope: N = 72,384 messages; N = 1,038 adolescents; age range 9–16; M_age = 11.23 (SD = 1.41); 53% female.

  • Diet-related messages: 2,138 / 72,384 = 2.95%.

  • Nutri-score healthiness per message: scale transformed to the range [-2, 2]; subset with healthiness: N = 1008 messages; healthiness M = -0.03, SD = 1.23.

  • LIWC dimensions: biological focus more common (mean 26.36, SD 20.00; n = 1592); other dimensions report means with their sample sizes.

  • Sentiment/subjectivity (pattern.nl): sentiment M = 0.25, SD = 0.47 (n = 628); subjectivity M = 0.78, SD = 0.22 (n = 628).

  • Message likes: M = 0.25, SD = 0.73.

  • Key regression results (selected):

    • Healthiness predicting likes: estimate not significant; p = .575 (Table 1).

    • Sentiment predicting likes: b = 0.08, t = 3.44, p = .001 (Table 3).

    • Subjectivity predicting likes: b = 0.07, t = 2.69, p = .007 (Table 3).

    • LIWC dimensions collectively: F = 2.34, p = .017 (Table 4), but none individually significant after correction.

    • Exploratory full model: F(25, 2006) = 14.38, p < .001, R^2 = 0.143; image-related predictors strongest (Table 5).

Appendix: data and code references

  • MovezNetwork GitHub: Fruit-Basket-of-Adolescents-Food-Talk (diet dictionary and related code).

  • Pattern (pattern.nl) for Dutch sentiment/subjectivity processing.

  • LIWC2015 dictionaries for six psychological dimensions.

  • Gensim LDA (tf-idf representation) for topic modeling; topic coherence used to determine topic count.

  • Nutri-score calculator: Colruyt nutri-score calculator referenced; nutrition data from Nutritionix database.

Bottom-line takeaway

  • The study provides a nuanced view of how adolescents discuss diet on social media: healthiness alone does not drive engagement; rather, how positively and subjectively a message is framed, and whether it’s accompanied by diet-related imagery, drives peer engagement. This has direct implications for designing engaging, youth-centered health communication interventions on social media.