How to slash 'time to insight' when training AI
- By Rob Davies
- Oct 09, 2019
When agencies seek to develop or improve upon artificial intelligence applications, they often find that many of today’s IT systems are not robust enough to manage AI workloads at scale -- nor can they scale up and offer security at the speed required for AI modeling. This is especially true for legacy IT systems that are not purpose-built, AI-capable infrastructures. In fact, many infrastructures used today for AI have been force-fit -- and mis-fit -- into the AI space.
Before we look at what scale is required, and what IT infrastructure model is ideal, let’s quickly define the stages of advanced AI and machine learning development.
Two AI stages
In AI development, there is an initial training stage in which an AI practitioner will run AI model after model after model, drawing from deep wells of existing data.
Because AI development is iterative, the training stage often needs live data, and live data is frequently sensitive. (This is a central reason that training-stage development is often done on-prem.) AI achieves its efficacy by running many iterations across a large number of scenarios, settling on the ones that work after trying many that don’t. The training stage alone consumes enterprise-class computing power, which must be available on demand to be effective, and it requires storage that can scale quickly.
The second stage of AI development is called inference, and that’s when the trained model is applied to new data. Like AI training, inference consumes massive compute power and memory for even basic operations. Note that the IT configuration that suits the training stage may not suit the inference stage.
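As a rough illustration of the two stages, the toy sketch below assumes a one-parameter model and made-up numbers. The function names (train, infer), the sample data and the candidate values are all hypothetical; real workloads repeat the same pattern over far larger models and datasets.

```python
# Toy illustration of the two stages. The data and candidate values are
# invented for this sketch and do not come from any real workload.

def mean_squared_error(slope, samples):
    """Average squared gap between slope * x and the observed y."""
    return sum((slope * x - y) ** 2 for x, y in samples) / len(samples)

def train(samples, candidate_slopes):
    """Training stage: try many candidate models and keep the best one."""
    return min(candidate_slopes, key=lambda s: mean_squared_error(s, samples))

def infer(slope, new_inputs):
    """Inference stage: apply the chosen model to data it has not seen."""
    return [slope * x for x in new_inputs]

historical = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]  # existing data
candidates = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]            # models to try

best_slope = train(historical, candidates)
predictions = infer(best_slope, [5, 6])
print(best_slope, predictions)
```

Training here evaluates every candidate against the full dataset, which is why its cost grows with both the number of iterations and the volume of data, while inference is a single pass over new inputs.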
AI and data management processes
Unlike traditional IT infrastructure providers, AI-ready infrastructure providers deliver the technological tools and skills required to design, build and manage an effective AI network. Purpose-built, AI-capable infrastructures enable data architects and scientists to access massively parallel, all-flash performance so they can slash time-to-insight and put AI to use as quickly as possible.
Here’s what agencies should expect from purpose-built, AI-capable infrastructure:
Speed: It must power multinode deep learning out of the box, delivering linear-scale performance for critical training workloads.
Simplicity: It should be up and running in hours, not weeks, and it should be able to scale to demand even faster.
Future-proofed: The most advanced platform for real-world AI must offer a scalable architecture that responds to demand at any scale. Purpose-built, AI-capable infrastructures should be able to add GPUs for faster training and/or add blades for bigger datasets. Any AI-capable infrastructure must be able to grow with AI workload needs, at any stage of development or use, without downtime risks.
As-a-service delivery: Purpose-built AI infrastructure should always be procured as an on-premises solution simply because of the difficulty in loading the massive datasets required for AI into the cloud. The infrastructure can be purchased, or acquired as-a-service, but it needs to be in the agency datacenter.
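To get a feel for why moving training data to the cloud can be impractical, here is a back-of-the-envelope estimate. The 500 TB dataset size, the link speeds and the 80 percent efficiency factor are assumptions chosen for illustration, not figures from any particular agency or provider.

```python
def transfer_hours(dataset_terabytes, link_gigabits_per_second, efficiency=0.8):
    """Estimate the hours needed to move a dataset over a network link.

    efficiency is an assumed fraction of the line rate actually achieved.
    """
    dataset_bits = dataset_terabytes * 1e12 * 8          # decimal TB to bits
    usable_bits_per_second = link_gigabits_per_second * 1e9 * efficiency
    return dataset_bits / usable_bits_per_second / 3600  # seconds to hours

# Hypothetical 500 TB training set over three common link speeds.
for gbps in (1, 10, 100):
    print(f"{gbps} Gbps: {transfer_hours(500, gbps):.1f} hours")
```

Under these assumptions, the upload takes about 1,389 hours (roughly 58 days) at 1 Gbps and still about 139 hours at 10 Gbps, before any iteration on the data has begun.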
Choosing an AI data management provider
Applying the concept of AI to data management processes, including organization and analysis, is referred to as artificial intelligence data management. Every AI system should have data management layers, because unlike traditional analytics approaches, AI uses data pools to make productive changes to underlying algorithms.
When choosing an artificial intelligence data management provider, and the AI infrastructure that is implemented, look for these top features:
- Massively parallel, all-flash performance, where numerous individual processing units can execute many different parts of an AI model simultaneously.
- Automated incident response that detects and assesses security threats, a crucial feature given that a working model cannot be properly tested during training on stale data and therefore needs live data.
- Construction of a knowledge library based on tangible data, so that unseen patterns and defining opportunities can be detected and improved upon.
- Productivity monitoring on a variety of scales, so the system can suggest streamlined processes.
- Extraction of metadata from textual and non-textual unstructured data, highlighting elements such as keywords, categories, semantics and concepts.
- Virtual-assistant customer service, such as online chatbots, to aid in customer interaction.
- Predictive analysis and an untethered capability for machine learning.
- Security that protects not only sensitive data (personal data, images, sensitive medical records, etc.), but also the intellectual property and the investment made to develop the AI.
- A DevOps and systems architecture team (ideally from an experienced third-party as-a-service provider) that specializes in providing platforms designed specifically for AI. This not only offers requisite security for live, sensitive data, but it reduces the burden of managing infrastructure.
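To make the metadata-extraction item in the list above more concrete, the sketch below pulls keywords from unstructured text by simple word frequency. This is a deliberately naive stand-in, not a description of how any provider's product works; the stop-word list, the function name extract_keywords and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "was", "by"}

def extract_keywords(text, top_n=3):
    """Return the top_n most frequent non-stop-words as candidate keywords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]

sample = ("The sensor report flagged a sensor fault in the turbine. "
          "The turbine was inspected and the sensor replaced.")
print(extract_keywords(sample, top_n=2))
```

Frequency counting captures only keywords; extracting the categories, semantics and concepts the list mentions would call for more capable language models.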
Clearly, AI and machine learning offer some of the most exciting developments in IT and business at large, with beneficial uses that range from health care to defense.
One sure path to accelerating time-to-insight using artificial intelligence data management is to control the IT infrastructure. This means avoiding the trap of relying on constricted IT capacity, or on IT capacity that isn’t AI-ready, when engaging in AI development. It’s remarkably easy to avoid this constraint by relying on an experienced as-a-service IT provider that specializes in providing platforms designed specifically for AI. This not only offers requisite security for live, sensitive data, but it reduces the burden of managing infrastructure and allows workloads to scale up in a matter of hours, not weeks.
When developing AI/ML, agencies must ensure that the IT infrastructure is purpose-built, AI-ready and able to scale on demand. Addressing those concerns frees AI development teams from constricted, disorganized or insecure systems and allows them to do what they do best.
Rob Davies is executive vice president of operations at ViON.