cloud migration

Machine learning optimizes efficiency of cloud databases

Research intended to help scientists maximize data throughput when processing microbiome or metagenomics data ended up improving the cloud efficiency of long-running dynamic workloads, saving both cloud providers and users money.

The software, called OPTIMUSCLOUD, boosts efficiency for cloud-hosted databases by rightsizing resources. It works by using machine learning to develop algorithms that help optimize the cost and performance of both the virtual machine selection and the database management system options.

“Our system takes a look at the hundreds of options available and determines the best one normalized by the dollar cost,” said Somali Chaterji, a Purdue University assistant professor of agricultural and biological engineering and OPTIMUSCLOUD team leader. “When it comes to cloud databases and computations, you don’t want to buy the whole car when you only need a tire.”
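The idea of ranking options by performance "normalized by the dollar cost" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OPTIMUSCLOUD's actual method: the `Candidate` class, the instance and setting labels, and all throughput and price figures are hypothetical, and the exhaustive search stands in for the machine learning models the team describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical pairing of a VM type and a database setting."""
    vm_type: str             # made-up instance label, not a real EC2 type
    db_setting: str          # made-up database configuration label
    ops_per_sec: float       # assumed predicted throughput
    dollars_per_hour: float  # assumed hourly price


def ops_per_dollar(c: Candidate) -> float:
    """Throughput normalized by cost: operations completed per dollar spent."""
    return c.ops_per_sec * 3600.0 / c.dollars_per_hour


def best_per_dollar(candidates):
    """Return the candidate with the highest operations per dollar."""
    return max(candidates, key=ops_per_dollar)


# Illustrative numbers only; a real system would predict these values.
candidates = [
    Candidate("small", "default", 10_000, 0.10),
    Candidate("small", "tuned", 14_000, 0.10),
    Candidate("large", "default", 30_000, 0.40),
    Candidate("large", "tuned", 45_000, 0.40),
]

best = best_per_dollar(candidates)
print(best.vm_type, best.db_setting)
```

With these made-up figures, the large instance has the highest raw throughput, but the tuned small instance delivers more operations per dollar, which is the "tire, not the whole car" outcome Chaterji describes.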

More efficient computation means cloud providers do not have to aggressively over-provision their servers for fail-safe operations and clients can use only the cloud resources they need. “The prices for on-demand instances on Amazon EC2 vary by more than a factor of five thousand, depending on the virtual memory instance type you use,” Chaterji said. “Even a slight improvement in utilization can result in huge gains. … Consider that currently, even the best data centers run at lower than 50% utilization and so the costs that are passed down to end-users are hugely inflated.”
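The link between utilization and inflated cost is simple arithmetic: if only a fraction of paid-for capacity does useful work, the effective price of the useful portion is the list price divided by that fraction. The sketch below assumes a hypothetical $1.00 hourly price and a helper name, `cost_per_utilized_hour`, invented for this illustration.

```python
def cost_per_utilized_hour(price_per_hour: float, utilization: float) -> float:
    """Effective price of one hour of useful work at a given utilization (0-1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return price_per_hour / utilization


# Hypothetical $1.00/hour resource at two utilization levels.
at_half = cost_per_utilized_hour(1.00, 0.50)
at_eighty = cost_per_utilized_hour(1.00, 0.80)
print(f"50% utilization: ${at_half:.2f} per useful hour")
print(f"80% utilization: ${at_eighty:.2f} per useful hour")
```

At 50% utilization, each useful hour effectively costs twice the list price, so even modest utilization gains lower the cost passed on to users.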

The automated decision-making currently used for cloud workloads often works only for short, repetitive tasks. Rather than relying on it, Chaterji said her team created an optimal configuration to handle long-running, dynamic workloads, whether they come from sensor networks, high-performance scientific applications or COVID-19 simulations.

The technology works with Amazon’s AWS, Google Cloud and Microsoft Azure, and could work with other providers with some re-engineering, Chaterji said.

“It also may help researchers who are crunching their research data on remote data centers, compounded by the remote working conditions during the pandemic, where throughput is the priority,” she said.

About the Author

Connect with the GCN staff on Twitter @GCNtech.

