Machine learning models at the touch of a finger

To make it easier to run complex analytics, researchers at MIT have developed a "virtual data scientist" that uses touch-friendly interfaces to generate machine-learning models for making predictions from datasets.

VDS is a new component of Northstar, an interactive, cloud-based data-science platform. It lets users without a background in data science mine data and build, analyze and evaluate machine-learning pipelines. Users upload datasets, then combine and extract features using touch input, such as fingers or digital pens, on devices ranging from smartphones to interactive whiteboards.

Northstar, developed by researchers at MIT and Brown University, starts as a blank, white interface. As users upload datasets, they appear in a “datasets” box on the left. Data labels are displayed in a separate “attributes” box, and an “operators” box contains various algorithms. Data is stored and analyzed in the cloud.

Using their fingers or a digital pen, users drag and drop datasets, attributes and operators to design their analysis. As MIT explained, medical researchers studying connections between three diseases in people of a particular age group would upload their datasets and then drag a pattern-checking algorithm into the middle of the interface, where it first appears as a blank box. Into that box they would add disease features, such as “blood,” “infectious” and “metabolic,” to get the percentage of patients with each disease. When they drag the “age” feature into the interface, VDS displays a bar chart of the patients’ age distribution. Data in the two boxes can be linked by drawing a line between them. Circling an age range on the chart then computes the co-occurrence of the three diseases within that range.
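Under the hood, the medical example above comes down to a few simple aggregations. The Python sketch below is a minimal illustration under our own assumptions, not Northstar's or VDS's actual implementation: the toy patient records, the field names and the helper functions are hypothetical, and "co-occurrence" is taken here to mean the share of patients in the circled age range who have all three diseases.

```python
from collections import Counter

# Hypothetical toy records; field names are illustrative, not Northstar's schema.
PATIENTS = [
    {"age": 34, "blood": True,  "infectious": False, "metabolic": True},
    {"age": 41, "blood": True,  "infectious": True,  "metabolic": True},
    {"age": 45, "blood": False, "infectious": True,  "metabolic": False},
    {"age": 52, "blood": True,  "infectious": True,  "metabolic": True},
    {"age": 67, "blood": False, "infectious": False, "metabolic": True},
]
DISEASES = ("blood", "infectious", "metabolic")


def disease_percentages(records, diseases=DISEASES):
    """Percentage of records flagged with each disease (the 'percentages' box)."""
    n = len(records)
    return {d: (100.0 * sum(r[d] for r in records) / n if n else 0.0) for d in diseases}


def age_histogram(records, bin_width=10):
    """Count records per age bin, keyed by the bin's lower edge (the bar chart)."""
    counts = Counter((r["age"] // bin_width) * bin_width for r in records)
    return dict(sorted(counts.items()))


def co_occurrence(records, lo, hi, diseases=DISEASES):
    """Assumed definition: % of records aged lo..hi (inclusive) that have ALL diseases."""
    in_range = [r for r in records if lo <= r["age"] <= hi]
    if not in_range:
        return 0.0
    with_all = sum(all(r[d] for d in diseases) for r in in_range)
    return 100.0 * with_all / len(in_range)


if __name__ == "__main__":
    print(disease_percentages(PATIENTS))  # {'blood': 60.0, 'infectious': 60.0, 'metabolic': 80.0}
    print(age_histogram(PATIENTS))        # {30: 1, 40: 2, 50: 1, 60: 1}
    # "Circling" ages 40-59 selects three patients, two of whom have all three diseases.
    print(round(co_occurrence(PATIENTS, 40, 59), 1))  # 66.7
```

Linking the two boxes by drawing a line corresponds, in this sketch, to feeding the age filter into the disease computation; a real system would run such queries against cloud-hosted data rather than an in-memory list.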

“It’s like a big, unbounded canvas where you can lay out how you want everything,” said Emanuel Zgraggen, who is the key inventor of Northstar’s interactive interface. “Then, you can link things together to create more complex questions about your data.”

Going forward, the researchers plan to add a feature that alerts novice users to potential data bias or errors.

About the Author

Susan Miller is executive editor at GCN.

Over a career spent in tech media, Miller has worked in editorial, print production and online, starting on the copy desk at IDG’s ComputerWorld, moving to print production for Federal Computer Week and later helping launch websites and email newsletter delivery for FCW. After a turn at Virginia’s Center for Innovative Technology, where she worked to promote technology-based economic development, she rejoined what was to become 1105 Media in 2004, eventually managing content and production for all the company's government-focused websites. Miller shifted back to editorial in 2012, when she began working with GCN.

Miller has a BA and MA from West Chester University and did Ph.D. work in English at the University of Delaware.

Connect with Susan at @sjaymiller.
