Guarini Alumni Research Award Recipient 2021: Catherine Pollack, QBS

Obesity is a pressing public health problem in the United States, affecting over 40% of adults and 19% of children. It is therefore critical that clinical content on obesity be both readily available and factual. Although clinicians may be considered the "gold standard" for quality information on diseases like obesity, they are not always easily accessible. As a result, individuals have increasingly turned to alternative avenues, including the Internet and social media, for health information. Although this information is nearly ubiquitous, it may not always be credible, accurate, or safe. It is therefore important to explore the state of obesity-related content across social media platforms.

One such platform is Reddit, which has over 50 million daily active users. Reddit is made up of smaller communities called "subreddits," each of which consists of posts made by anonymous users. Other users can anonymously "react" to these posts in several ways, including commenting. Eighteen percent of US adults report "ever" using Reddit, including 36% of those between 18 and 29. Furthermore, 42% of users report that they use the platform for news-related purposes, which may indicate a high degree of trust in its content. It is therefore important to examine the quality of health information on the platform. Reddit is especially important to examine for obesity-related content, as it has historically faced scrutiny for stigmatizing content in which persons with obesity are ascribed negative traits (such as laziness) "due" to their weight. The purpose of this project was to evaluate the state of obesity-related content on Reddit, with a particular focus on obesity-related misinformation and stigma.

To accomplish this task, we first collected approximately 760,000 sentences from comments made between 2011 and 2019 that mentioned "obese" or "obesity." Through the generosity of the Guarini Alumni Research Award, we were then able to hire and train three research assistants to label a small subset of these data as factual content, misinformation, weight-related stigma, body positivity, or too ambiguous to determine. We used the labeled data to build a machine learning pipeline that could automatically assign each sentence to one of these five categories based on the sentence's linguistic properties. These properties included the sentiment of the text as well as various psycholinguistic features, such as the percent of first-person pronouns or the percent of terms related to leisure activities. After this automated labeling, we conducted additional statistical analysis to determine whether any features were overrepresented or underrepresented in one category relative to both an external and an internal comparator.
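As a rough illustration of what "percent of first-person pronouns" or "percent of leisure terms" means in practice, the sketch below computes such percentages for a single sentence. It is a minimal sketch under stated assumptions, not the project's pipeline: the word lists, the tokenizer, and the names `extract_features`, `FIRST_PERSON`, and `LEISURE` are hypothetical, and sentiment scoring is left out.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only;
# they are not the lexicons used in the study.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
LEISURE = {"movie", "music", "game", "party", "vacation"}

# Lowercase words, optionally with one apostrophe (e.g. "they'd").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return TOKEN_RE.findall(sentence.lower())


def percent_in_lexicon(tokens, lexicon):
    """Percent of tokens that appear in the given word list."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return 100.0 * hits / len(tokens)


def extract_features(sentence):
    """Return a small dictionary of per-sentence linguistic features."""
    tokens = tokenize(sentence)
    return {
        "word_count": len(tokens),
        "first_person_pct": percent_in_lexicon(tokens, FIRST_PERSON),
        "leisure_pct": percent_in_lexicon(tokens, LEISURE),
    }


if __name__ == "__main__":
    example = "I think my doctor said I should play a game"
    print(extract_features(example))
```

Feature dictionaries like this one could then be passed, together with the research assistants' labels, to any standard classifier; which classifier the project used is not stated here.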

Our findings suggest that several features did differ significantly in one category relative to others. For example, stigmatizing content tended to have more anger-related terms (e.g., "hate," "annoyed") and third-person plural pronouns (e.g., "they," "their," "they'd") but fewer first-person pronouns (e.g., "I," "me," "mine") compared to body positivity. By comparison, misinformation tended to have more quotation marks and "net speak" (e.g., "btw," "lol," "thx") but fewer words with at least six letters compared to factual content. Taken together, these findings contribute to our understanding of how users on Reddit discuss and perceive obesity, provide a baseline estimate of the amount of obesity-related misinformation and stigma on the platform, and introduce a tool that content moderators can use to identify content that may go against community standards and guidelines. Furthermore, the method developed here could be applied to other health topics (such as COVID-19, vaccinations, or vaping) with minimal additional intervention.
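One simple way to check whether a feature is over- or underrepresented in one category relative to another is a two-sided permutation test on the difference in means. The sketch below is an assumption chosen for illustration: the summary above does not say which statistical test the project used, and the function name `permutation_test` and the numbers in the usage example are made up.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed_difference, p_value), where the difference is
    mean(group_a) - mean(group_b).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up per-sentence percentages of anger-related terms,
    # used only to show how the function is called.
    stigma = [5, 6, 7, 8, 9, 10]
    body_positivity = [0, 1, 2, 3, 4, 5]
    difference, p_value = permutation_test(stigma, body_positivity)
    print(f"difference in means: {difference}, p-value: {p_value:.4f}")
```

A permutation test makes few distributional assumptions, which suits skewed features such as word percentages. With many features tested at once, a correction for multiple comparisons would also be needed.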

For more information, see our preprint, available at JMIR Preprints. We would like to acknowledge the Guarini Alumni Research Award for its generous contribution to this work.