Army center enhances intranet with text-mining app

SemioMap can recognize HTML and Notes documents.


Using SemioMap, the Center for Army Lessons Learned at Fort Leavenworth, Kan., has
mapped 50,000 pages’ worth of text onto a single intranet screen.


The Web text-mining application from Semio Corp. of San Mateo, Calif., represents key
concepts in large volumes of unstructured text data without any initial tagging.


Semio’s just-released SemioMap 2.1 handles direct connections to Lotus Development
Corp. Domino or Notes servers and Web servers.


“Text mining is the logical extension of data mining,” said Claude Vogel,
Semio chairman and chief executive officer. Analogous to data mining of structured
numerical data, it complements search tools such as Web crawlers, intranet search engines,
document management systems and push technology applications.


SemioMap combines lexical processing, information clustering and graphical display with
the company’s patented semiotics technology. Semiotics, Vogel said, is the formal study of
signs carried by patterned communications.


SemioMap displays the relationships it uncovers as a graphical map directing users to
information hidden in massive volumes of text. The database that users build with SemioMap
contains all the concepts, relationships, patterns and links back to documents, Vogel
said.
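
The article does not describe SemioMap's internals, so the following is only a rough sketch of the general idea: pull terms out of raw text with no tagging, record which terms appear together, and keep links back to the source documents. The function names, the stop list and the sample documents are all hypothetical, and plain co-occurrence counting stands in for whatever clustering Semio actually uses.

```python
from collections import defaultdict
from itertools import combinations
import re

# Hypothetical stop list for illustration only; not taken from SemioMap.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for"}

def extract_terms(text):
    """Lowercase the text and return its set of non-stopword terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def build_concept_map(docs):
    """Return (term -> ids of documents containing it,
    term pair -> number of documents in which both terms appear)."""
    term_docs = defaultdict(set)
    pair_counts = defaultdict(int)
    for doc_id, text in docs.items():
        terms = extract_terms(text)
        for term in terms:
            term_docs[term].add(doc_id)  # link from concept back to document
        for a, b in combinations(sorted(terms), 2):
            pair_counts[(a, b)] += 1     # relationship between two concepts
    return term_docs, pair_counts

# Made-up sample documents, not CALL data.
docs = {
    "d1": "Convoy security in urban terrain",
    "d2": "Urban terrain and convoy routes",
    "d3": "Medical evacuation routes",
}
term_docs, pair_counts = build_concept_map(docs)
```

In this toy version, the pairs with the highest counts would be the strongest links to draw on a map, and `term_docs` tells a user which documents to open for any concept.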


As in data mining, he said, you don’t need to know at the outset what you are
looking for.


SemioMap also can work on unstructured data in transactional systems, said Gail
Claspell, Semio marketing director. It uses the structured fields in transactional systems
“as a way of slicing and dicing the data in the unstructured fields,” Claspell
said.
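
One way to picture the slicing and dicing Claspell describes is to group free-text term counts by the value of a structured field. This is a minimal sketch under that assumption, not Semio's method; the record layout, the field names `region` and `notes`, and the sample data are invented for illustration.

```python
from collections import Counter, defaultdict
import re

def terms_by_field(records, field, text_field):
    """Count the words in each record's free-text field,
    grouped by the value of one structured field."""
    slices = defaultdict(Counter)
    for rec in records:
        words = re.findall(r"[a-z]+", rec[text_field].lower())
        slices[rec[field]].update(words)
    return slices

# Made-up transactional records: one structured field, one unstructured field.
records = [
    {"region": "east", "notes": "late delivery, damaged box"},
    {"region": "east", "notes": "damaged box again"},
    {"region": "west", "notes": "late delivery"},
]
slices = terms_by_field(records, "region", "notes")
```

Comparing `slices["east"]` with `slices["west"]` then shows which terms dominate the free text for each value of the structured field.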


The core functions of SemioMap are written in C. Administrative interfaces and client
software are in Java.


Semio used technology from Inso Corp. of Boston and Adobe Systems Inc. of San Jose,
Calif., to make SemioMap recognize most document formats, including Hypertext Markup
Language and Lotus Notes documents.


Vogel said Semio maps can combine any of the document file formats for exchange via
e-mail.


SemioMap 2.1 starts at $5,000 with a Java applet client and server software that runs
under SunSoft Solaris or Microsoft Windows 9x and NT.


An entry-level server license for processing a 100M data set is $15,000, Claspell said.


Semio provides support via e-mail only, he said.


Although there is no limit on the number of content maps users can create with
SemioMap, the largest data set it can process at any given time is 7G, Claspell said.
Future versions of the software will support 30G data sets.


A view of the Army’s SemioMap application appears at http://call.army.mil.


Contact Semio at 650-638-3330. 
