Making synthetic health records more realistic

For years, health researchers struggled to gain access to the detailed patient data they needed to test the effectiveness of tools, algorithms and disease modeling approaches.

In 2017, MITRE employees designed an open source tool that creates synthetic patient records – those generated by computer programs rather than collected from actual individuals. Called Synthea, the tool models the medical history of these fictional patients, generating “high-quality, synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare,” according to the project’s website. It gives researchers free access to patient data that has no protected health information or personally identifiable information.

According to MITRE, data on each patient runs from birth to the present day. Diseases, conditions and medical care are defined by one or more modules that represent common health conditions – such as allergies, breast and lung cancer and joint replacements – and the changes caused by the condition and its treatment. The synthetic patient data also features a complete medical history, including medications, medical encounters and social factors impacting health.

Synthea’s demonstration project, SyntheticMass, modeled health information for more than 1 million synthetic Massachusetts residents, giving researchers a data set they could use for demos and testing.

The Centers for Disease Control and Prevention used Synthea data for a project on childhood obesity, creating a data set reflecting pediatric growth curves. Researchers then created simulations of the impact of weight-loss programs to chart the range of outcomes.

Most recently, the synthetic data was used to study COVID-19 disease progression and treatment. In the early stages of the pandemic, researchers used Synthea to generate health records for fictional COVID patients, including their daily consumption of personal protective equipment and use of dialysis machines and ventilators, according to an October 2020 paper by Jason Walonoski, a MITRE software engineer who leads the Synthea team.

Now, the Department of Health and Human Services has launched the Synthetic Health Data Challenge to improve Synthea’s capabilities.

HHS’ Office of the National Coordinator for Health Information Technology (ONC) is asking researchers and developers to validate the realism of Synthea’s health records, expand the disease-progression and treatment modules used in the synthetic records and develop novel uses of synthetic health data.

"By enhancing Synthea with new clinical data modules or demonstrating novel uses of Synthea-generated synthetic data, Challenge participants will support [Patient-Centered Outcomes Research] research and development efforts by enhancing PCOR researchers' ability to conduct rigorous analyses and generate relevant findings," ONC Chief Scientist Teresa Zayas-Cabán said in the HHS announcement.

The two-phase challenge is accepting proposals in two categories: enhancements to Synthea and novel uses of Synthea-generated synthetic data. The best proposals will move on to the second phase, where prototypes or solutions will be developed. Awards totaling up to $100,000 are available to winning participants.

A phase one informational webinar will be held Feb. 2. More information on the challenge can be found here.  

About the Author

Susan Miller is executive editor at GCN.

Over a career spent in tech media, Miller has worked in editorial, print production and online, starting on the copy desk at IDG’s ComputerWorld, moving to print production for Federal Computer Week and later helping launch websites and email newsletter delivery for FCW. After a turn at Virginia’s Center for Innovative Technology, where she worked to promote technology-based economic development, she rejoined what was to become 1105 Media in 2004, eventually managing content and production for all the company's government-focused websites. Miller shifted back to editorial in 2012, when she began working with GCN.

Miller has a BA and MA from West Chester University and did Ph.D. work in English at the University of Delaware.

Connect with Susan at @sjaymiller.

