Project Oxford APIs offer facial, voice recognition

If you’ve tried out Microsoft’s age-guessing website only to find its guess for your age was way off, you know facial recognition technology has a long way to go. But cloud-based Project Oxford, the ‘brains’ behind that site, has some tools developers can use now to easily build intelligence -- like facial and speech recognition -- into their apps.

Project Oxford’s application programming interfaces and software development kits allow developers to add intelligent services that leverage Microsoft's natural data understanding to their solutions.

The face API uses algorithms to detect and recognize human faces in images, with the ability to determine gender and age. That allows developers to check whether two images of faces are similar enough to be the same person (authentication) or to compare an image to a user-provided face database. It could be the foundation of a security application or help law enforcement agencies identify victims and criminals.
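The authentication scenario above -- checking whether two face images are similar enough to be the same person -- can be sketched as a request builder plus a decision rule. The endpoint URL, header name and response fields below are assumptions for illustration, not details taken from Microsoft's documentation:

```python
import json

# Hypothetical base URL -- the real Face API endpoint is an assumption here.
FACE_API_BASE = "https://api.example.com/face/v1.0"

def build_verify_request(face_id_1: str, face_id_2: str, api_key: str) -> dict:
    """Assemble an HTTP request asking whether two previously detected
    faces belong to the same person (the 'authentication' check)."""
    return {
        "method": "POST",
        "url": f"{FACE_API_BASE}/verify",
        "headers": {
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": api_key,  # assumed header name
        },
        "body": json.dumps({"faceId1": face_id_1, "faceId2": face_id_2}),
    }

def is_same_person(response: dict, threshold: float = 0.5) -> bool:
    """Interpret a hypothetical verify response of the form
    {"isIdentical": bool, "confidence": float}."""
    return (response.get("isIdentical", False)
            and response.get("confidence", 0.0) >= threshold)
```

A security application would send the built request with any HTTP client and gate access on `is_same_person`, tuning the confidence threshold to its own false-accept tolerance.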

Earlier this week the Department of State posted a solicitation for facial recognition software that would have the ability to match a photo on a passport or visa to the face of the person in possession of it. And in San Diego, the Encinitas Union School District started a new pilot program that uses facial recognition software on school-issued iPads to simplify the login process for students.

In Hyderabad, India, city police will soon use facial recognition software on cameras located around the city to potentially identify terror suspects, property offenders and other criminals by matching images with a data bank of images of known offenders.

The Project Oxford speech API uses algorithms to process spoken language into text and text into audio -- functionality that could be used for a wide range of citizen-facing services, such as initial call center screening. Files of spoken audio are transmitted to Microsoft's servers in the cloud, and a single text result is returned. There is also a client library that can be hosted locally, which allows for real-time streaming of audio and returns text as it is generated.
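The difference between the two modes -- shipping a whole file for one result versus streaming chunks and collecting partial results -- can be sketched as follows. The `recognize` callable stands in for whatever transcription call the service actually exposes; it is a placeholder, not a real API name:

```python
from typing import Callable, Iterator

def chunk_audio(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Split raw audio into fixed-size chunks, as a streaming client
    library might before sending each piece for transcription."""
    for i in range(0, len(audio), chunk_size):
        yield audio[i:i + chunk_size]

def transcribe_batch(audio: bytes, recognize: Callable[[bytes], str]) -> str:
    """Batch mode: send the whole file, get back a single text result."""
    return recognize(audio)

def transcribe_streaming(audio: bytes,
                         recognize: Callable[[bytes], str]) -> list[str]:
    """Streaming mode: send chunks as they arrive and collect the
    partial text results as they are generated."""
    return [recognize(chunk) for chunk in chunk_audio(audio)]
```

A call-center screening app would favor the streaming path, since callers can be routed before they finish speaking.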

The vision API produces visual features based on the input image's visual content -- categorizing the image, identifying dominant colors and even flagging image elements that might be inappropriate. The API can also detect and read text in an image and create thumbnail versions of the original image that are tailored to specific needs.
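The thumbnail feature crops an image to a requested size; the service does this "smartly" around the region of interest, but the geometry can be illustrated with a simplified center crop. This sketch is an assumption about the arithmetic, not the API's actual cropping logic:

```python
def center_crop_box(width: int, height: int,
                    target_w: int, target_h: int) -> tuple:
    """Compute a centered crop box (left, top, crop_w, crop_h) matching
    the target aspect ratio -- a simplified stand-in for the vision
    API's tailored thumbnail cropping."""
    target_ratio = target_w / target_h
    if width / height > target_ratio:
        # Image is wider than the target: trim the sides.
        crop_w, crop_h = int(height * target_ratio), height
    else:
        # Image is taller than the target: trim top and bottom.
        crop_w, crop_h = width, int(width / target_ratio)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, crop_w, crop_h
```

For a 1920x1080 frame and a square 100x100 thumbnail, this yields a 1080x1080 crop starting 420 pixels from the left edge, which would then be scaled down to the requested size.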

There’s also a Language Understanding Intelligent Service, offered as an invitation-only beta, that helps applications understand what users mean when they say or type something using natural, everyday language.

The services are currently available for limited free usage in beta (20 transactions per minute). They work across programming platforms and languages -- from Windows and Windows Phone to iOS and Android, Microsoft said. To try them out, a developer must have an Azure account.
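An app built against the beta needs to stay under that 20-transactions-per-minute cap. A minimal client-side throttle, sketched here with a sliding window (the limiter is our own illustration, not anything Microsoft ships):

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window throttle: refuse calls once max_calls have been
    made within the last window_s seconds -- e.g. the beta's cap of
    20 transactions per minute."""

    def __init__(self, max_calls: int = 20, window_s: float = 60.0):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = deque()  # timestamps of recent calls

    def allow(self, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False
```

Calls that come back `False` can be queued and retried once the oldest timestamp ages out of the window, rather than burning a transaction on a rejected request.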

About the Author

Derek Major is a former reporter for GCN.

