Improving search across clinical trials

Searching clinical trial data may soon get easier thanks to open data technology dubbed OpenTrialsFDA.

The project is one of six finalists for the Open Science Prize, a competition to create open data technology that will advance biomedical research. The Open Science Prize is a collaborative effort between the Wellcome Trust, the National Institutes of Health and the Howard Hughes Medical Institute.

With OpenTrialsFDA, researchers can more easily search data in the Food and Drug Administration’s drug approval packages (DAPs), which contain detailed information about the methods and results of clinical trials, some of which has never been published.

Today, while the clinical information provided in DAPs is useful to researchers, it is not easily accessible, searchable or usable. Most of it is not machine-readable, consisting primarily of physical documents scanned as images. Additionally, the data is not indexed or searchable via clinical trial identifiers, making data navigation difficult.

According to the development team, OpenTrialsFDA will enable academic researchers to not only access and search unbiased descriptions of results of published and unpublished clinical trials of drugs used by billions of patients, but also reveal discrepancies between information in DAPs and in published journal articles.

The OpenTrialsFDA prototype uses a web interface with application programming interfaces that allow third-party software to access, search and present the FDA information. The team developed code for scraping data and files from Drugs@FDA, a searchable catalog of FDA-approved drug products, and used optical character recognition technology to automate the text extraction. Algorithms search for mentions of clinical trial identifiers using a search index built by the team.
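
To make the identifier-matching step concrete, the following is a minimal Python sketch, not the team's actual code. It assumes the identifiers of interest are ClinicalTrials.gov registry numbers (the letters "NCT" followed by eight digits) and that OCR has already produced plain text for each document. The function names, file names and sample text are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# ClinicalTrials.gov registry numbers are "NCT" followed by eight digits.
# OCR output may split the prefix from the digits, so the pattern allows
# one optional whitespace character or hyphen in between (an assumption).
NCT_PATTERN = re.compile(r"\bNCT[\s-]?(\d{8})\b", re.IGNORECASE)


def find_trial_ids(text):
    """Return the sorted, de-duplicated NCT identifiers mentioned in text."""
    return sorted({"NCT" + m.group(1) for m in NCT_PATTERN.finditer(text)})


def build_index(documents):
    """Map each trial identifier to the sorted names of documents citing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for trial_id in find_trial_ids(text):
            index[trial_id].add(name)
    return {trial_id: sorted(names) for trial_id, names in index.items()}


# Hypothetical OCR output standing in for pages of a drug approval package.
SAMPLE_DOCS = {
    "medical_review.txt": (
        "Study 301 (NCT00000102) enrolled 412 patients. See also nct 00000105."
    ),
    "statistical_review.txt": "Efficacy data from NCT00000102 were re-analysed.",
}

if __name__ == "__main__":
    for trial_id, names in sorted(build_index(SAMPLE_DOCS).items()):
        print(trial_id, names)
```

An index of this kind lets a lookup by trial identifier return every document that mentions the trial, which is the sort of document-to-trial matching the article describes. A production system would also need to handle OCR misreads, such as a letter O in place of a zero, which this sketch does not attempt.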

According to the team, OpenTrialsFDA marks the first time DAPs have been made available for electronic searching and matching and exposed as an API so they can be accessed by third-party software. Users can search text within documents, match documents to clinical trials and search across documents. All code, including future additions, is open source.

The Open Science Prize awarded each of the six finalists $80,000 to develop tools or services to “advance discovery and improve health.” All teams must have at least one member based in the United States and one abroad.

The other finalist projects are:

  1. Fruit Fly Brain Observatory -- improving the modeling of mental and neurological diseases by connecting data related to the fly brain.

  2. Open Neuroimaging Laboratory -- advancing brain research by enabling collaborative annotation, discovery and analysis of brain imaging data.

  3. MyGene2: Accelerating Gene Discovery with Radically Open Data Sharing -- facilitating public sharing of health and genetic data by integrating publicly available information.

  4. OpenAQ: A Global Community Building the First Open, Real-Time Air Quality Data Hub for the World -- providing real-time information on poor air quality by combining data from across the globe.

  5. Real-Time Evolutionary Tracking for Pathogen Surveillance and Epidemiological Investigation -- permitting analysis of emerging epidemics such as Ebola, MERS-CoV and Zika.

In the second phase of the competition, the team judged to have the prototype with the greatest potential to advance science will be awarded $230,000. Voting for this second phase of the prize will be open until 11:59 PM PST on Jan. 6, 2017.

About the Author

Kathleen Hickey is a freelance writer for GCN.

