The New York City Health Department and Columbia University analyze Yelp comments for keywords indicating foodborne illness in the city.
Restaurant diners afflicted with food poisoning tend to share their gastrointestinal distress on social media rather than report the illness to the local health department, which monitors reported cases of food poisoning so it can curtail future outbreaks.
Vasudha Reddy and her colleagues at the New York City Department of Health and Mental Hygiene noticed mentions of food poisoning in Yelp reviews when they were tracking down an outbreak a few years ago. That’s when the city decided to partner with Columbia University to analyze Yelp comments for keywords indicating foodborne illness in the city.
“We realized there might be people who are not aware of reporting through the mechanism of 311,” said Reddy, a foodborne disease epidemiologist, adding that 311 calls or online complaints to the city remain the preferred mechanism for reporting these issues.
To sift through the comments, Columbia receives a daily feed from Yelp that is then analyzed by an algorithm that “breaks down the raw review text into words and phrases of up to three consecutive words,” Reddy and Thomas Effland, a computer science Ph.D. student at Columbia, said in a written explanation of the system. “It then uses the counts of these words and phrases to predict if the review discusses foodborne illness or not.”
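The feature-extraction step Reddy and Effland describe can be sketched in a few lines of Python. This is a minimal illustration, not Columbia's code: the tokenization rule (lowercase runs of letters, digits and apostrophes) and the function name `ngram_counts` are assumptions.

```python
import re
from collections import Counter


def ngram_counts(text: str, max_n: int = 3) -> Counter:
    """Count every word n-gram of length 1..max_n in a review.

    Assumption: words are lowercase runs of letters, digits and
    apostrophes; all other characters (including punctuation) are dropped.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n consecutive words across the review.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


counts = ngram_counts("Got food poisoning. Food poisoning is no joke.")
print(counts["food poisoning"])  # prints 2
```

Because punctuation is discarded, a phrase can span a sentence boundary (here "poisoning food poisoning" is counted once); the article does not say whether the real system allows that.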
The system relies on a well-known machine learning algorithm, but the training and data labeling have been done by city epidemiologists.
It picks up keywords such as “sick,” “vomiting,” “diarrhea,” “food poisoning” and others that could indicate illness contracted at the restaurant. Comments and reviews that trip the algorithm are then sent on to the Health Department, where officials follow up with the commenter for verification.
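The article does not name the “well-known machine learning algorithm,” so the sketch below swaps in logistic regression trained by stochastic gradient descent, one common way to classify text from word and phrase counts. The six labeled reviews stand in for the epidemiologists’ labels and are invented for illustration, as are the names `train`, `score` and `flag_for_review` and the 0.5 flagging threshold.

```python
import math
import re
from collections import Counter


def ngram_counts(text, max_n=3):
    """Counts of word 1- to max_n-grams (assumed tokenization: lowercase words)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(" ".join(words[i:i + n])
                   for n in range(1, max_n + 1)
                   for i in range(len(words) - n + 1))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(counts, weights, bias):
    """Predicted probability that a review discusses foodborne illness."""
    z = bias + sum(weights.get(g, 0.0) * c for g, c in counts.items())
    return sigmoid(z)


def train(labeled, epochs=200, lr=0.1):
    """Fit logistic regression with plain stochastic gradient descent.

    labeled: list of (review_text, label) pairs; label 1 = illness.
    """
    data = [(ngram_counts(text), label) for text, label in labeled]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for counts, label in data:
            err = label - score(counts, weights, bias)  # log-likelihood gradient
            bias += lr * err
            for g, c in counts.items():
                weights[g] = weights.get(g, 0.0) + lr * err * c
    return weights, bias


def flag_for_review(reviews, weights, bias, threshold=0.5):
    """Reviews to pass on for human follow-up (threshold is an assumption)."""
    return [r for r in reviews
            if score(ngram_counts(r), weights, bias) >= threshold]


# Hypothetical hand-labeled examples standing in for real training data.
labeled = [
    ("I got food poisoning after eating here", 1),
    ("was vomiting all night after dinner", 1),
    ("my whole family got sick with diarrhea the next day", 1),
    ("great pasta and friendly staff", 0),
    ("the service was slow but the food was tasty", 0),
    ("nice atmosphere, will come back", 0),
]
weights, bias = train(labeled)
flagged = flag_for_review(["got sick and vomiting after eating here",
                           "friendly staff and tasty food"], weights, bias)
print(flagged)  # prints ['got sick and vomiting after eating here']
```

With these toy labels only the first review is flagged. A learned model like this weighs every word and phrase rather than matching a fixed keyword list, which is why terms such as “sick” and “vomiting” end up driving the prediction without being hard-coded.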
“We set aside time every day to look at all the reviews that the computer algorithm spits out,” Reddy said.
Since 2012, the system has identified 8,523 complaints of foodborne illness. There have been about 28,000 complaints in total, so 311 is still the main source of complaints, she said. Since 2012, the algorithm has helped the Health Department identify 10 outbreaks, which are defined as two or more cases of gastrointestinal illness occurring within 30 days of each other and associated with eating at the same restaurant.
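The outbreak definition is concrete enough to express as a rule. This sketch assumes each case is already attributed to a restaurant and treats “within 30 days” as inclusive; real investigations involve verification and epidemiological judgment that the code does not capture. The function `find_outbreaks` and the restaurant names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def find_outbreaks(cases, window_days=30):
    """Return restaurants with two or more cases within window_days of each other.

    cases: iterable of (restaurant_id, illness_date) pairs, one per case.
    Assumption: "within 30 days" is inclusive (a 30-day gap still counts).
    """
    by_restaurant = defaultdict(list)
    for restaurant, day in cases:
        by_restaurant[restaurant].append(day)

    outbreaks = set()
    for restaurant, days in by_restaurant.items():
        days.sort()
        # Once dates are sorted, some pair falls inside the window
        # exactly when some adjacent pair does.
        if any((b - a).days <= window_days for a, b in zip(days, days[1:])):
            outbreaks.add(restaurant)
    return outbreaks


cases = [
    ("Restaurant A", date(2019, 3, 1)),
    ("Restaurant A", date(2019, 3, 20)),
    ("Restaurant B", date(2019, 1, 5)),
    ("Restaurant B", date(2019, 6, 1)),
    ("Restaurant C", date(2019, 2, 2)),
]
print(find_outbreaks(cases))  # prints {'Restaurant A'}
```

Restaurant A’s two cases are 19 days apart and meet the definition; Restaurant B’s are months apart, and Restaurant C has only one case.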
Since its initial release, the process has undergone improvements that have made it more accurate. These include a better classification algorithm and more, and better, data for training it.
Luis Gravano and Daniel Hsu, professors of computer science at Columbia Engineering and coauthors of a recent study on the Health Department system, said it has already improved the detection of outbreaks of foodborne illnesses.
“Effective information extraction regarding foodborne illness from social media is of high importance: online restaurant review sites are popular, and many people are more likely to discuss food poisoning incidents in such sites than on official government channels,” said Gravano and Hsu.