How natural language processing can support public health

Technology has evolved swiftly over the past two decades, but since the novel coronavirus first surfaced in Wuhan last December, the world has rearranged itself even more rapidly.

With no vaccine for COVID-19 yet available, public health officials have relied on non-pharmaceutical interventions like face masks and social distancing in an attempt to quell its spread. However, advances in artificial intelligence -- and in natural language processing (NLP) in particular -- could help officials better adapt their responses and the way they position those recommendations based on community sentiment and reactions.

NLP is a subset of AI in which computers understand and interpret human language. Over the last three years, bigger models and new architectures have enabled NLP systems to finally surpass human performance on multiple well-known benchmarks. Topic modeling and aspect-based sentiment analysis are two ways NLP can be used to better understand the public’s reactions to COVID-19.

Agencies interested in applying NLP to crisis response should follow these three steps:

1. Start with topic modeling.

Topic modeling, in a nutshell, means feeding an AI system a large batch of documents and having it surface the most common keywords or topics. Already, academic research has used topic modeling to assess what people on Twitter are saying about the virus: common topics range from physical distancing to the original outbreak in Wuhan. This information is crucial for public officials to understand what citizens are most concerned about during a particular crisis. It can also show what topics people are not talking about, which may point to measures they are not following.
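As a rough illustration of the "surface the most common keywords" idea, the sketch below counts how many posts each term appears in. It is a deliberately simple stand-in, not a true topic model such as latent Dirichlet allocation, and the stopword list, function names and sample posts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on", "at"}

def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def top_keywords(docs, n=2):
    """Return the n terms that appear in the largest number of documents."""
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document so one long post cannot dominate.
        doc_freq.update(set(tokenize(doc)))
    return doc_freq.most_common(n)

# Made-up sample posts, for illustration only.
posts = [
    "Wearing masks at the grocery store",
    "Masks are sold out again",
    "Distancing is hard on the bus",
    "Masks and distancing in the office",
]
print(top_keywords(posts))
```

A production system would more likely fit a statistical topic model over thousands of posts, but the input and output have the same shape: documents in, ranked terms out.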

The biggest challenge with analyzing COVID-related posts is that a whole new language has emerged in response to the virus. The machine learning that underpins NLP relies on models trained on large, historical datasets. If that training data was scraped a year ago, nomenclature like “social distancing” wouldn’t be a part of it. To leverage NLP, researchers must collect data on an ongoing basis to account for new words being added to the lexicon.
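One way to watch for new vocabulary, sketched here under simplifying assumptions, is to compare two-word phrases in recent posts against those seen in the older training data. The function names, the `min_count` threshold and the sample sentences are hypothetical, and a real system would use a far larger corpus.

```python
import re
from collections import Counter

def bigrams(text):
    """Return consecutive word pairs from the lowercased text."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))

def new_phrases(historical_docs, recent_docs, min_count=2):
    """List two-word phrases that recur in recent posts but never appear in the historical data."""
    known = {bg for doc in historical_docs for bg in bigrams(doc)}
    counts = Counter(bg for doc in recent_docs for bg in bigrams(doc))
    return sorted(
        " ".join(bg)
        for bg, count in counts.items()
        if count >= min_count and bg not in known
    )

# Made-up sample data, for illustration only.
historical = ["keep your distance on the road", "social media is busy today"]
recent = [
    "social distancing is hard",
    "practice social distancing today",
    "media is busy",
]
print(new_phrases(historical, recent))
```

Phrases flagged this way are candidates for review; whether to retrain or extend a model on them remains a judgment call for the researchers.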

2. Use aspect-based sentiment analysis.

NLP can also be used for sentiment analysis which, as the name suggests, interprets emotions based on text, as opposed to simply clustering topics. This is particularly useful for crisis response, as it shows how the public feels about measures put in place. Right now, the main challenge is that besides the pandemic, many people are dealing with an economic recession and political upheaval. Thus, many social media posts touch on a wide range of loosely related topics. In sentiment analysis, that creates significant noise, obscuring sentiment that is specifically about COVID-19.

Aspect-based sentiment analysis can remedy this problem. It breaks the text down into components -- grouping related nouns, verbs and adjectives together -- and then analyzes the sentiment of each group. Users define which aspects are important, and a statement that mentions none of those aspects isn’t analyzed. Sentiment analysis is particularly useful to public health officials when it suggests, for example, that a non-pharmaceutical intervention will not be successful because people react negatively or ignore it altogether.
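The idea can be sketched with a toy, lexicon-based scorer. This is a crude approximation: it splits text into clauses rather than grouping words by grammatical role, and the aspect keywords and sentiment word lists below are hypothetical placeholders rather than a real lexicon.

```python
import re

# Hypothetical user-defined aspects and the keywords that signal them.
ASPECTS = {
    "masks": {"mask", "masks"},
    "distancing": {"distancing", "distance"},
}
# Hypothetical, tiny sentiment lexicons.
POSITIVE = {"good", "helpful", "safe", "glad"}
NEGATIVE = {"bad", "annoying", "pointless", "hate"}

def aspect_sentiment(text):
    """Score each aspect by the sentiment words that share a clause with it."""
    scores = {}
    # Split on sentence punctuation and on "but", which often separates opinions.
    for clause in re.split(r"[.;!?]|\bbut\b", text.lower()):
        words = set(re.findall(r"[a-z]+", clause))
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        for aspect, keywords in ASPECTS.items():
            if words & keywords:
                scores[aspect] = scores.get(aspect, 0) + score
    # Clauses that mention no aspect contribute nothing.
    return scores

print(aspect_sentiment("Masks are annoying but distancing feels safe. The economy is bad."))
```

In this example the complaint about the economy is ignored because it mentions no defined aspect, which is the noise-filtering behavior described above.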

3. Interpret results.

As is the case with anything AI-related, researchers must make sure they minimize biases before interpreting their data. Social media sites like Twitter are often used for rants more than praise. Very few people log on to profess how happy they were working from home last week. More directed surveys, on the other hand, will be less biased and may capture more balanced responses. As public officials conduct surveys, they can also look at other demographic information and make sure their sample represents their constituents.

Of course, officials don’t just want to understand sentiment at a particular moment in time; they want to understand trends. After several months of sheltering in place, there’s a good chance people’s perspectives have changed. Topic modeling and sentiment analysis alike must be presented in a way that shows changes over time, and those trends should inform future guidance.
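To show change over time, per-post sentiment scores can be grouped by period and averaged. The sketch below assumes each post has already been given a date and a numeric score; the function name and the sample values are made up for illustration.

```python
from collections import defaultdict
from datetime import date

def monthly_average(scored_posts):
    """Average sentiment scores per (year, month), returned in chronological order."""
    buckets = defaultdict(list)
    for day, score in scored_posts:
        buckets[(day.year, day.month)].append(score)
    return {month: sum(s) / len(s) for month, s in sorted(buckets.items())}

# Made-up (date, score) pairs, for illustration only.
scored = [
    (date(2020, 4, 3), -1),
    (date(2020, 4, 20), -1),
    (date(2020, 6, 5), 1),
    (date(2020, 6, 18), 0),
]
print(monthly_average(scored))
```

Plotting these averages per aspect would give officials a simple trend line to consult alongside survey data.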

NLP is a powerful technology -- and one that has made tremendous strides in recent years. In the midst of a crisis, this technology can offer scientists and public health officials a useful reference point for positioning their response and recommendations to the general public. Topic modeling and aspect-based sentiment analysis offer a level of sophistication and depth that other more rudimentary forms of social media monitoring lack. By leveraging these advances in machine learning and AI, public health officials can get an accurate, continuous pulse on how citizens are responding to their recommendations and adapt their responses accordingly.

About the Author

Sean McPherson is a deep learning data scientist at Intel.

