Storage space is tight at PTO
With volume of patents growing fast, office tackles organization and access
By Patricia Daukantas
The number of U.S. patents passed the 6 million mark during 1999, and the Patent and Trademark Office is grappling with electronic organization and access to all the records.
PTO's Web site, at www.uspto.gov, shows a .TIF image of the 6 millionth U.S. patent, issued to 3Com Corp. for HotSync.
The number of patent applications climbs by as much as 14 percent every year, said Wesley H. Gewehr, PTO's administrator for information dissemination. Patent filings themselves are getting bigger and require more storage room. Biotechnology gene sequence patent applications, for example, typically run to thousands of pages.
The office maintains two sets of electronic storage systems. One is accessible via the Web and the other is behind the agency's firewall, Gewehr said.
The Web-accessible storehouse, at www.uspto.gov/patft, lets anyone obtain a copy of any patent issued since 1976. The national system of PTO depository libraries provides access to separate data warehouses behind the agency's firewall.
The Web-accessible database has more than 2.2 million patents dating from 1976. PTO updates it weekly as new patents are issued, Gewehr said.
The searchable, text-only part of the database displays short summaries of patents, called the bibliographic file, and full-text versions. Altogether, the text file runs to 800G.
Each patent in the text file has a corresponding facsimile of the original patent document, Gewehr said. The image file now holds 23 million images of pages, or 2T worth of .tif files.
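The two figures above imply a rough average size per scanned page. A minimal sketch of that arithmetic follows; it assumes "2T" means 2 x 10^12 bytes (a decimal terabyte), which the article does not specify, and the function name is hypothetical.

```python
def average_page_bytes(total_bytes: float, pages: int) -> float:
    """Average storage per page image, in bytes."""
    return total_bytes / pages

# Article figures: 2T of .tif files holding 23 million page images.
# Assumption: 1T = 10**12 bytes; a binary terabyte would give a larger result.
avg = average_page_bytes(2e12, 23_000_000)

if __name__ == "__main__":
    print(f"about {avg / 1000:.0f} KB per page image")  # about 87 KB per page image
```

Under that assumption, each page image averages somewhat less than 90 KB.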
Also searchable via the Web are 1.3 million trademarks, amounting to 50G of text and images, Gewehr said.
Three locations
PTO's Web-accessible databases are spread out over three physically separate sites. A contractor, Dataware Technologies Inc., hosts the full-text database site in Albany, N.Y., said Larry Larson of PTO's Office of Electronic Information Products. The 70G bibliographic database resides on a single system in Research Triangle Park, N.C.
PTO houses the 2T image file at its facility in Arlington, Va., on RAID storage systems from EMC Corp. of Hopkinton, Mass. The database runs on Hewlett-Packard 9000 K-Class midrange servers under HP-UX.
PTO took three days to make a full backup when the image system went up last March, Gewehr said. Since then, the agency has made only incremental backups.
The image file system's 9G drives are partitioned into 100 logical arrays, Larson said. The system allocates incoming patents to the arrays based on the last two digits of each patent number.
'The most recent patents tend to have the most activity, and we want to spread that activity across the storage devices,' Gewehr said.
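The allocation rule Larson describes can be illustrated with a short sketch. This is not PTO's actual code: the function names are hypothetical, and it assumes the last two digits map directly onto array numbers 0 through 99. It shows why consecutive patent numbers, which are the newest and most heavily requested, end up on different arrays.

```python
from collections import Counter

NUM_ARRAYS = 100  # the article's figure: 100 logical arrays


def array_for_patent(patent_number: int) -> int:
    """Pick an array from the last two digits of the patent number.

    Assumption: the two digits are used directly as the array index.
    """
    return patent_number % NUM_ARRAYS


def spread(first: int, count: int) -> Counter:
    """Count how many of `count` consecutive patents land on each array."""
    return Counter(array_for_patent(n) for n in range(first, first + count))


if __name__ == "__main__":
    # The 1,000 patents issued starting at number 6,000,000.
    load = spread(6_000_000, 1_000)
    print(len(load), min(load.values()), max(load.values()))  # 100 10 10
```

Because patent numbers are issued sequentially, any run of recent patents is divided evenly among all 100 arrays rather than piling up on the newest storage.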
Although Web searches are common today, PTO in 1991 pioneered the use of electronic searches through a bulletin board system, Larson said. The agency introduced Web searches in 1995.
PTO officials want to extend the Web-accessible databases to cover patents granted from 1975 back to 1790, the date of the first U.S. patent, Gewehr said. To accomplish that goal, which would more than double PTO's holdings outside the firewall, the agency is seeking $6 million for the project in fiscal 2001.
'We already have all the documents in image form,' Gewehr said. 'That's inside the shop; we just haven't been able to put it outside.'
The internal databases behind the firewall are accessible to PTO employees, primarily patent examiners and trademark attorneys, Gewehr said. The public can gain entry through PCs at a PTO public search office in Arlington and at 30 of the 85 depository libraries around the country.
Unlike the Web-accessible databases, the agency's internal patent databases are organized to meet the needs of patent examiners, who must determine whether applications represent original intellectual property. They are not suitable for casual searching.
'We have this delivery channel to afford the public an opportunity to use the same databases that the examiners use, and that's important to patent practitioners and applicants,' Gewehr said. 'By using the same search resource, they have some assurance of obtaining a result close to what the examiner is obtaining.'
Gewehr and Brooks Hunt, PTO's director of patent search systems, said the internal system does searches by patent classification and finds more comprehensive details than a keyword search would.
'You can search patents on the Web, but you can't do a patent search,' Gewehr said.
PTO's Order Entry Management System [GCN, Aug. 23, 1999, Page 38] uses the internal databases to retrieve copies of patents and trademarks for public sale, Gewehr said.
The internal database has electronic text for 2.7 million patents going back to 1970, plus images of all 6.4 million patents back to 1790.
The 6.4 million includes the 6 million utility patents (inventions, business processes and the like) plus design, plant and reissued patents.
The patent text takes up 200G of storage, but the images fill up 6T, Gewehr said.
PTO also houses text and images for 2.5 million U.S. trademarks that are pending, registered, abandoned, canceled or expired.
Besides domestic patents and trademarks, PTO holds 5.1 million text abstracts and 6.8 million images of Japanese patents, plus 3.1 million European text abstracts and 8 million images. The foreign holdings total about 7T.
PTO exchanges electronic patent copies with other countries because disclosures elsewhere could bar granting of a U.S. patent, Hunt said.
Patent applications still circulate around PTO's offices on paper, but they are duplicated in a limited-access database that takes up 2.5T, Gewehr said. Applications remain confidential during the examination process.
A contractor, Reed Technology and Information Services, scans patent documents at its Horsham, Pa., facility and gives PTO the files in text, image and Adobe PostScript formats, Gewehr said.
PTO keeps its internal databases on the 11th floor of its Arlington building, in an EMC RAID system that takes up about an acre of floor space, Gewehr said.
For offsite storage, PTO backs up data on digital linear tape cartridges. Copies go to two offsite locations: one elsewhere in Arlington and one in the Iron Mountain/National Underground Storage Inc. facility in Boyers, Pa., north of Pittsburgh.