
Algorithms can be gamed, but experienced humans can help

Researchers studying artificial intelligence and machine learning looked at the work of patent examiners to determine the degree to which ML systems can be “gamed” by individuals.

ML algorithms typically used to review resumes or insurance claims are trained to look for specific phrases, such as those that indicate a job applicant's competence or experience. A savvy job hunter, then, could exploit an algorithm's bias by listing relevant but false degrees or certifications. An insurance applicant could deliberately omit mention of prior accidents.

Could ML algorithms spot and correct for deliberately false inputs?

For their study, the researchers from the University of Maryland and Harvard Business School turned to the U.S. Patent and Trademark Office, whose patent examiners “face a time-consuming challenge of accurately determining the novelty and non-obviousness of a patent application,” they wrote. To help examiners find relevant prior art faster, USPTO uses ML to “read” the text of patent applications and compare it to textually similar innovations in ever-expanding databases of “prior art,” or all the existing patents, patent applications, publications and documentation related to an applicant’s product. That process allows examiners to determine if there is a “silver bullet” -- an already-patented invention whose existence would essentially kill this specific application.

To make their innovations appear new, some unscrupulous applicants try to game the system by including extraneous information or omitting relevant citations. They can also create hyphenated words and assign new meanings to existing words to better explain their novel and non-obvious inventions. The introduction of new vocabularies makes it “exceptionally difficult for ML to make reliable predictions about a future that is unfamiliar to its training dataset,” the researchers said.
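To illustrate why coined vocabulary can throw off a text-matching tool, here is a minimal Python sketch. It is not the USPTO's system or the researchers' method; it assumes a simple word-count cosine similarity, and the stopword list, function names and sample texts are all hypothetical, invented for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this illustration.
STOPWORDS = {"a", "an", "and", "for", "of", "the", "with"}


def tokenize(text):
    """Lowercase the text, keep hyphenated words whole, and drop stopwords."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [w for w in words if w not in STOPWORDS]


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented texts: one piece of "prior art" and two descriptions of the same idea.
prior_art = "a rechargeable battery pack with a cooling fan for a cordless drill"
plain_application = "a cordless drill with a rechargeable battery pack and a cooling fan"
gamed_application = (
    "a tether-free boring-tool with a re-energizable power-module and a thermal-mover"
)

plain_score = cosine_similarity(prior_art, plain_application)
gamed_score = cosine_similarity(prior_art, gamed_application)

print(f"plain wording vs. prior art:  {plain_score:.2f}")
print(f"coined wording vs. prior art: {gamed_score:.2f}")
```

In this toy setup, the plainly worded application shares all of its content words with the prior art, while the version written in coined hyphenated terms shares none, so a matcher that relies only on word overlap would not surface the earlier invention. Real systems are more sophisticated, but the sketch shows the kind of gap that, according to the researchers, experienced examiners help close.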

The researchers found that it is “practically impossible” to train an ML algorithm to spot incomplete or inconsistent patent applications on its own -- even though the technology can learn and correct for manipulations it finds.

The ML benefitted strongly, they said, from collaboration with humans -- but not just any humans. Those with broad skills and deep knowledge in a specific domain, those who can draw on relevant outside information and those with “vintage specific skills” -- in this case meaning long experience using ML technologies -- can better mitigate bias stemming from applicant manipulation.

“The promise of ML technology in the patent examination context lies in its ability to make superior predictions by identifying a narrower, more relevant distribution of prior art,” the researchers concluded. “However, when patent applications are characterized by (plausibly strategic) input incompleteness, the ML technology may be more likely to make biased predictions without domain specific expertise.”

The full paper, “Machine Learning and Human Capital Complementarities: Experimental Evidence on Bias Mitigation,” by Prithwiraj Choudhury of Harvard Business School, and Evan Starr and Rajshree Agarwal of the University of Maryland’s Robert H. Smith School of Business, can be found here.

About the Author

Susan Miller is executive editor at GCN.

Over a career spent in tech media, Miller has worked in editorial, print production and online, starting on the copy desk at IDG’s ComputerWorld, moving to print production for Federal Computer Week and later helping launch websites and email newsletter delivery for FCW. After a turn at Virginia’s Center for Innovative Technology, where she worked to promote technology-based economic development, she rejoined what was to become 1105 Media in 2004, eventually managing content and production for all the company's government-focused websites. Miller shifted back to editorial in 2012, when she began working with GCN.

Miller has a BA and MA from West Chester University and did Ph.D. work in English at the University of Delaware.

Connect with Susan at @sjaymiller.

