Flash memory boosts big data workloads

Government agencies looking to run big data workloads on Apache Accumulo more efficiently now have an option that uses flash memory to improve performance.

Fusion-io, a developer of flash memory technology, and Sqrrl, developer of a NoSQL database that can handle large volumes of data, have forged a tighter integration between their technologies to make applications running on Apache Accumulo faster, more secure and more cost-effective, company officials said.

Apache Accumulo is a highly distributed, massively parallel database capable of analyzing both structured and unstructured data. At the same time, it delivers fine-grained user access control and authentication, the companies said.

Accumulo was originally built by the National Security Agency and then turned over to the open source community. The NoSQL database lets administrators assign user access to information at the cell level, which means that access control labels are applied to every data object ingested into the database.

Administrators can then ensure that only people who meet required security clearances and privacy permissions can view and manipulate the data, which makes it a good fit for intelligence and federal agencies with high-security requirements.
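
The idea of cell-level labels can be illustrated with a minimal Python sketch. It is a simplified model and not Accumulo's actual API: the `Cell` class, the `required_labels` field, the `visible_cells` function and the sample records are all hypothetical, and it assumes a reader must hold every label attached to a cell in order to see it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One stored value plus the labels a reader must hold (hypothetical model)."""
    row: str
    column: str
    value: str
    required_labels: frozenset  # assumption: ALL of these labels are required


def visible_cells(cells, authorizations):
    """Return only the cells whose required labels are all held by the reader."""
    auths = frozenset(authorizations)
    # A cell is visible when its label set is a subset of the reader's labels.
    return [c for c in cells if c.required_labels <= auths]


# Made-up sample data, for illustration only.
cells = [
    Cell("person1", "name", "A. Analyst", frozenset()),
    Cell("person1", "clearance_note", "cleared", frozenset({"secret"})),
    Cell("person1", "medical", "none", frozenset({"secret", "privacy"})),
]

analyst_view = visible_cells(cells, {"secret"})
print([c.column for c in analyst_view])  # prints ['name', 'clearance_note']
```

In this sketch, a reader holding only the "secret" label sees two of the three cells in the same row, which is the kind of per-cell filtering the labels are meant to enable.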

Some caveats apply: a conventional Accumulo-based system typically relies on costly Dynamic Random Access Memory (DRAM) for performance. Not only can servers built with large DRAM configurations quickly exceed many agencies’ budgets, but there are also physical limits on how much DRAM can be added to any system. Additionally, DRAM does not retain data in the event of an unexpected power loss. Flash memory, in contrast, is non-volatile, which means that stored information is retained when the power is cut.

“An architecture built solely on DRAM for performance and spinning disk for storage will suffer from the inefficiencies involved in migrating data between the two,” Matt Kennedy, big data solutions architect with Fusion-io, wrote in a blog.

However, “big data platforms built with Fusion-io can entirely replace spinning-disk storage and reduce DRAM footprints or complement disk storage as a cache layer.” 

Tighter integration between Fusion ioMemory and Sqrrl Enterprise delivers higher performance for big data workloads running on Accumulo at lower cost than high-density DRAM systems, officials said. Sqrrl Enterprise is built on top of the open source versions of Accumulo and Hadoop, a framework for processing massive volumes of data. As a result, Sqrrl extends the capabilities of Accumulo with additional management, security and real-time analytical features.

About the Author

Rutrell Yasin is a freelance technology writer for GCN.

