No Junk

PTO's Jay Pickens wants to clean up mailing lists to stop expensive, multiple mailings to the same addresses


Like most mass mailers, the Patent and Trademark Office doesn't want to waste time and money sending multiple envelopes to customers.

To clean up its mailing lists, a group within PTO's Search and Information Resources Administration (SIRA) is starting to use data-cleansing software that standardizes names, addresses and similar data fields.

The primary reason for seeking a data-cleansing product was to update PTO's mailing lists 'so we're not sending 10 pieces of mail to the same person,' said Jay Pickens, a SIRA computer specialist. 'It's expensive [for us] and aggravating for the recipient.'

PTO chose dfPower Studio from DataFlux Corp. of Raleigh, N.C.

To test the software, SIRA workers extracted patent holders' names and addresses from the primary patent database and then cleaned up the list with dfPower.

In its patent database, PTO maintains documents as they are received from patent applicants, Pickens said. Typographical errors lurk, as well as variant spellings. For instance, a company could call itself J.F. Smith Inc. in one application and John F. Smith Inc. in another.
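The J.F. Smith example can be made concrete with a small matching rule. The Python sketch below is only an illustration, not DataFlux's method. It assumes a hypothetical rule: two names refer to the same entity when they have the same number of words and each pair of words is either identical or a one-letter initial that matches the first letter of the other word. The function names are invented for this example.

```python
import re

def name_tokens(name: str) -> list[str]:
    # Lowercase and keep only runs of letters: "J.F. Smith Inc." -> ["j", "f", "smith", "inc"]
    return re.findall(r"[a-z]+", name.lower())

def tokens_compatible(a: str, b: str) -> bool:
    # Identical words match; a one-letter initial matches any word starting with that letter
    if a == b:
        return True
    if len(a) == 1 and b.startswith(a):
        return True
    return len(b) == 1 and a.startswith(b)

def same_entity(name1: str, name2: str) -> bool:
    # Hypothetical rule: same word count and every word pair compatible
    t1, t2 = name_tokens(name1), name_tokens(name2)
    return len(t1) == len(t2) and all(tokens_compatible(a, b) for a, b in zip(t1, t2))

print(same_entity("J.F. Smith Inc.", "John F. Smith Inc."))  # True
print(same_entity("J.F. Smith Inc.", "J.F. Jones Inc."))     # False
```

A rule this simple misses variants such as "JF Smith Inc.", where the initials run together, and it would wrongly merge two different people who happen to share initials and a surname.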

'We could spend months doing matches by hand,' Pickens said.

Tony Fisher, president and general manager of DataFlux, said the dfPower product was designed to help end users get rid of corrupt or stale data.

For example, Fisher said, he could be listed in various databases as Anthony Fisher, Tony Fisher and Tony Fischer. The software finds these near matches and converts them to an enterprisewide, customizable standard in an organization's data repository.

The company also makes Blue Fusion, a software development kit for programmers.
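Fisher's Anthony Fisher, Tony Fisher and Tony Fischer example mixes two kinds of variation: a nickname and a misspelling. The Python sketch below shows one common way to catch both, and it is an illustration under stated assumptions rather than a description of how dfPower works. It maps nicknames through a small, hypothetical lookup table, then groups names whose normalized forms have a `difflib.SequenceMatcher` similarity ratio of at least 0.85, a threshold picked arbitrarily for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical nickname table; a production tool would need a much larger, curated one.
NICKNAMES = {"tony": "anthony", "bill": "william", "bob": "robert"}

def normalize(name: str) -> str:
    # Lowercase, strip punctuation, and replace known nicknames with the formal name
    words = re.findall(r"[a-z]+", name.lower())
    return " ".join(NICKNAMES.get(w, w) for w in words)

def cluster_names(names: list[str], threshold: float = 0.85) -> list[list[str]]:
    # Greedy clustering: each name joins the first cluster whose representative
    # key is similar enough, otherwise it starts a new cluster.
    clusters: list[tuple[str, list[str]]] = []
    for name in names:
        key = normalize(name)
        for rep_key, members in clusters:
            if SequenceMatcher(None, key, rep_key).ratio() >= threshold:
                members.append(name)
                break
        else:
            clusters.append((key, [name]))
    return [members for _, members in clusters]

print(cluster_names(["Anthony Fisher", "Tony Fisher", "Tony Fischer", "Jay Pickens"]))
# [['Anthony Fisher', 'Tony Fisher', 'Tony Fischer'], ['Jay Pickens']]
```

Each cluster could then be collapsed into one standardized record. Raising the threshold makes matching stricter (fewer false merges, more missed duplicates); lowering it does the reverse.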

Pickens and his colleagues ran a test of dfPower on a nearly 370,000-record data set with light matching criteria. They found that 335,000 of the records were duplications of 23,000 unique records. Another 34,000 records were truly unique.

Thus the 370,000 records were whittled down to 57,000, or only 15 percent of the original, Pickens said. If this had been a real mailing list rather than a test, the software would have saved a lot of money, he said.
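The test figures add up once the two groups are combined: the 335,000 duplicate records plus the 34,000 truly unique ones make 369,000 (the "nearly 370,000" records in the test set), and a cleaned list keeps one record for each of the 23,000 groups of duplicates plus every unique record. The short Python check below reproduces that arithmetic; the function name is illustrative only.

```python
def summarize_dedup(duplicate_records: int, distinct_among_duplicates: int, singletons: int):
    # Input = all duplicated records plus records that appear only once.
    # Output = one survivor per group of duplicates plus every singleton.
    total_in = duplicate_records + singletons
    total_out = distinct_among_duplicates + singletons
    return total_in, total_out, total_out / total_in

total_in, total_out, share = summarize_dedup(335_000, 23_000, 34_000)
print(f"{total_in:,} records -> {total_out:,} records ({share:.1%} of the original)")
# prints: 369,000 records -> 57,000 records (15.4% of the original)
```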

Before PTO can deploy any software, the agency's Software Engineering Group must approve its addition to PTO's Technical Reference Model, Pickens said.

PTO technicians tested dfPower on Oracle Corp. databases, Microsoft Access databases and Extensible Markup Language files, PTO's primary formats for data storage and retrieval. The DataFlux software has an extensive group of Open Database Connectivity drivers for linking to databases, Pickens said.

The Software Engineering Group gave dfPower its OK in late February, Pickens said. PTO will first use the software to update its mailing list of nearly 26,000 registered patent attorneys, he said.

PTO sends periodic mailings to alert registered attorneys and independent inventors about training classes, the new electronic filing process [GCN, Nov. 20, 2000, Page 1] and recent legislation, such as the American Inventors Protection Act of 1999.

Another branch of PTO had maintained the mailing list. The last time SIRA used the list as is, it had a 30 percent return rate. 'To me, that's high,' Pickens said.

The software can't eliminate all duplicate names and addresses, but it can cut them back greatly, Pickens said. For example, dfPower recognizes the three-letter string 'IBM' regardless of whether the letters are separated by spaces.
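One simple way to get the 'IBM' behavior Pickens describes is to collapse runs of single letters separated by spaces or periods before records are compared. The regular expression below is a hypothetical sketch, not dfPower's actual rule; note that it would also turn personal initials such as "J. F." into "JF", which may or may not suit a given data set.

```python
import re

# Two or more single letters separated by spaces and/or periods, e.g. "I B M" or "I.B.M."
SPACED_INITIALS = re.compile(r"\b(?:[A-Za-z][.\s]+)+[A-Za-z]\b\.?")

def collapse_spaced_initials(text: str) -> str:
    # Remove the separators inside each match and uppercase the letters
    return SPACED_INITIALS.sub(lambda m: re.sub(r"[.\s]", "", m.group()).upper(), text)

for variant in ["IBM", "I B M", "I.B.M.", "I B M Corp."]:
    print(collapse_spaced_initials(variant))
# prints IBM, IBM, IBM and IBM Corp. on separate lines
```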

DataFlux offers an auditing tool that database administrators can download free from the company's Web site to see whether they have a dirty-data problem, Fisher said.

Pickens said dfPower works on minimally configured desktop PCs running 32-bit Microsoft Windows. The mailing list project probably will be done on a 450-MHz Pentium III PC from Micron Electronics Inc. of Nampa, Idaho, running Windows NT 4.0 and an Access database.

Still in testing stage

PTO is not yet using dfPower on an online database, and Pickens said he doesn't plan to try it on a live database.

Several other divisions of PTO are interested in the product, but they haven't yet developed plans to test it, Pickens said.

'I'm sure that almost all of them have data that needs to be cleaned,' he said.

Pickens purchased five dfPower licenses: two for SIRA and three for use elsewhere in PTO.

Pricing for the base dfPower software runs from $20,000 for a single-user license to $100,000 for an enterprise license.

DataFlux makes two add-on applications: dfPower Verify, which cleans addresses and ZIP codes to meet the Postal Service's Coding Accuracy Support System standards, and dfPower Match, which provides extended matching functions. The Verify and Match add-ons cost $20,000 and $15,000 respectively.
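Part of what address-cleaning tools do is put every address line into one canonical form, so that two spellings of the same address compare as equal. The Python sketch below illustrates only that step, using a handful of abbreviations from the Postal Service's Publication 28 addressing standard and a made-up sample address. It does not check addresses against Postal Service files or assign ZIP+4 codes, which is what CASS certification tests, and it is not how dfPower Verify is implemented.

```python
import re

# A small subset of Publication 28 abbreviations (street suffixes and the "suite"
# unit designator); the published tables are much longer.
ABBREVIATIONS = {
    "street": "ST",
    "avenue": "AVE",
    "road": "RD",
    "boulevard": "BLVD",
    "drive": "DR",
    "suite": "STE",
}

def standardize_address_line(line: str) -> str:
    # Split on anything that is not a letter or digit, uppercase each word,
    # and replace known words with their standard abbreviation
    words = re.findall(r"[A-Za-z0-9]+", line)
    return " ".join(ABBREVIATIONS.get(w.lower(), w.upper()) for w in words)

a = standardize_address_line("123 Main Street, Suite 4")
b = standardize_address_line("123 Main St. Ste 4")
print(a)       # 123 MAIN ST STE 4
print(a == b)  # True
```

Because this sketch abbreviates a word wherever it appears, it would also rewrite "Street" or "Drive" when they are part of a street's name; a fuller implementation would need to consider where each word falls in the address.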

The company offers a 20 percent discount to agencies, Fisher said.

