
Coronavirus resources climb into the cloud

Two industry giants are supporting research into COVID-19 by hosting COVID-19-related data in their clouds.

Amazon Web Services has been working with partners to develop a data lake of curated, updated datasets related to COVID-19. The goal is to give data scientists, researchers, public health officials and other organizations a centralized trove of vetted information about COVID-19, with the aim of helping them develop policies that curb the spread of the disease.

Hosted on the AWS cloud, the data lake includes “case tracking data” from The New York Times and Johns Hopkins University, data from Definitive Healthcare on hospital bed availability, and a vast library of coronavirus-related research articles (over 45,000 as of this writing) from the Allen Institute for AI. AWS plans to keep updating the resource as new and reliable information surfaces.

“The AWS COVID-19 data lake allows experimenters to quickly run analyses on the data in place without wasting time extracting and wrangling data from all the available data sources,” AWS said in its April 8 announcement.

Users can perform “trend analysis, do keyword search, perform question/answer analysis, build and run machine learning models, or run custom analyses” on the data using third-party solutions or AWS tools like Amazon Athena, Amazon QuickSight and Amazon Redshift Spectrum. There is no extra cost to access the data lake; users only pay the normal costs of whatever AWS services they use to work with the data. Users can choose to work solely within the data lake or combine it with their proprietary data. They also have the option to subscribe to the COVID-19 data sources directly via the AWS Data Exchange.
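As a rough illustration of what running an analysis "in place" with Amazon Athena might look like, the sketch below submits a SQL query, waits for it to finish and reads back the first page of results. It is a minimal sketch, not AWS's documented workflow for the data lake: the database, table and column names are hypothetical placeholders, since the announcement does not name them, and the `athena` argument is assumed to be a boto3 Athena client that the caller has already created with valid credentials.

```python
import time

# Hypothetical names for illustration only; the article does not name any
# database, table, or column in the data lake.
DATABASE = "covid_data_lake"
QUERY = """
SELECT report_date, SUM(confirmed_cases) AS total_cases
FROM case_tracking
GROUP BY report_date
ORDER BY report_date
"""


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Submit a query to Athena, wait for it to finish, and return result rows.

    `athena` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    `output_location` is an S3 URI where Athena writes query results.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    # Only the first page of results is fetched here; for SELECT queries
    # the first row holds the column names.
    results = athena.get_query_results(QueryExecutionId=query_id)
    rows = results["ResultSet"]["Rows"]
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
```

With a real client, a call might look like `run_athena_query(boto3.client("athena"), QUERY, DATABASE, "s3://your-bucket/athena-results/")`, where the bucket is one the caller owns. Normal Athena charges would apply, consistent with the pricing described above.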

Google Cloud, HCA Healthcare and SADA Systems are developing a COVID-19 National Response Portal, an open data platform and operational dashboard. The portal will promote the sharing of data about the COVID-19 pandemic and its spread in an effort to help hospitals and communities prepare and respond.

The portal will aggregate anonymous, HIPAA-compliant metrics from HCA hospital systems and display them on a single platform. Daily metrics submitted by U.S. hospital systems will include supply and utilization of ICU beds and ventilators; total numbers of positive, negative and pending COVID-19 test results; and total numbers of patients who have been discharged. The platform will also be able to draw on publicly available datasets, such as data on local shelter-in-place policies and traffic or mobility patterns, to help shed light on how public behaviors and policies may affect the spread of COVID-19.
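To make the daily metrics concrete, here is a minimal sketch of how submissions from several hospital systems could be rolled up by date. The record layout, field names and function are assumptions made for illustration; the article does not describe the portal's actual schema or implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyReport:
    """One hospital system's daily submission (field names are illustrative)."""
    system_id: str
    report_date: date
    icu_beds_total: int
    icu_beds_in_use: int
    ventilators_total: int
    ventilators_in_use: int
    tests_positive: int
    tests_negative: int
    tests_pending: int
    patients_discharged: int


# Numeric fields that are summed across hospital systems.
METRIC_FIELDS = (
    "icu_beds_total", "icu_beds_in_use",
    "ventilators_total", "ventilators_in_use",
    "tests_positive", "tests_negative", "tests_pending",
    "patients_discharged",
)


def aggregate_by_date(reports):
    """Sum every metric across hospital systems for each report date."""
    totals = defaultdict(lambda: defaultdict(int))
    for report in reports:
        day = totals[report.report_date]
        for name in METRIC_FIELDS:
            day[name] += getattr(report, name)
    # Convert nested defaultdicts to plain dicts for the caller.
    return {day: dict(metrics) for day, metrics in totals.items()}
```

Because only per-date totals leave the function, no single system's figures appear in the output, which is one simple way a dashboard can show aggregate capacity without exposing individual submissions.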

HCA Healthcare has invited groups representing approximately 4,000 hospitals across the country to join and share data on the platform.

Gladys Rama, a senior site producer for sibling sites of GCN, contributed to this article.

About the Author

Connect with the GCN staff on Twitter @GCNtech.

