
Defending IT infrastructure with analytics

To make it easier for next-generation threat hunters to analyze cybersecurity data across cloud environments, the Department of Homeland Security’s Cybersecurity and Infrastructure Security Agency and the DHS Science and Technology Directorate are developing an environment where new analytic tools and software can be researched and tested to counter existing and emerging threats.

CyLab will be a logical data warehouse that supports improvements to CISA's analytics and architecture by leveraging different cloud vendors and testing analytic solutions from development through production, according to Gary Jones, CISA's associate chief of strategic technology. Speaking in a July 26 webcast, he described how machine learning and threat-hunting capabilities are being developed for use by DHS staff and contractors, capabilities that can help defend not only federal systems and networks but also the nation's critical infrastructure.

CyLab’s data, though, is the basic ingredient for all the analysis, said Preston Werntz, the assistant chief data officer in the CTO's office in CISA. “We're really trying to make sure that data we've got is going to be in the best shape possible that we can move it into a CyLab and use it for these more advanced purposes,” he said. That entails bringing together what’s considered big data, cyber data, structured data and wide, or siloed, data that resides in smaller, perhaps unstructured data sets.

“All those different datas, even at the unclassified level, have certain sensitivities, maybe privacy sensitive, maybe critical infrastructure sensitive. So that governance and stewardship is so important,” he said.

CyLab is working to map all the data to different concepts and classes and increase the amount of captured metadata so the team can determine what data is appropriate for which ML model and help minimize the algorithms’ drift. It’s also important to keep on top of changes to the data, Werntz said.

The team is focused on two things, he said: getting CISA's data ready for use in CyLab, and putting policies in place so that the machine learning models built on that data are shared with stakeholders, industry and critical infrastructure operators in machine-readable formats.

Alexandria Phounsavath, director of S&T’s Data Analytics Technology Center, outlined CyLab’s three-part research plan.

The first part concerns the ecosystem, the multicloud environment where various cloud providers’ capabilities can be reviewed. The CyLab team will consider how to move data and run computations across clouds and solve information-sharing and privacy issues so researchers can easily collaborate. The environment will also feature high-performance computing resources necessary for training artificial intelligence algorithms.

The second part of the research plan, she said, addresses the AI/ML tools for the environment, including tools for data wrangling, model building and natural language processing.

The final area is what Phounsavath called a "stretch goal": bringing academic researchers into the collaborative, problem-solving space. "So, where is this space? What data sets go in there? What do you do with folks who may not be fully cleared?" she asked. If another incident like Colonial Pipeline triggers a flurry of initial activity, CyLab wants to be able to sustain not just that energy but the whole environment, she said.

CyLab is expected to become operational in 2024, with additional capabilities added over time, according to Jones. The environment will probably start with basic machine learning capabilities along with DevOps-style development, he said.

“CyLab isn't one and done. It's going to be an enduring capability for systems missions to benefit from innovation,” Phounsavath said. “In the field of analytics, the players, the landscape of … products changes in months, not years. We're going to be creating an environment where, although the threats are changing and evolving, so will the capabilities that CISA has to address them.”

About the Author

Susan Miller is executive editor at GCN.

Over a career spent in tech media, Miller has worked in editorial, print production and online, starting on the copy desk at IDG’s ComputerWorld, moving to print production for Federal Computer Week and later helping launch websites and email newsletter delivery for FCW. After a turn at Virginia’s Center for Innovative Technology, where she worked to promote technology-based economic development, she rejoined what was to become 1105 Media in 2004, eventually managing content and production for all the company's government-focused websites. Miller shifted back to editorial in 2012, when she began working with GCN.

Miller has a BA and MA from West Chester University and did Ph.D. work in English at the University of Delaware.

Connect with Susan at [email protected] or @sjaymiller.
