E-Government run amok!

Handling the data influx - A new model protects agency servers from an online flood

National Science Foundation-funded researchers have found a way for agencies to accept large amounts of data arriving simultaneously from many sources.
The technique greatly reduces the number of servers an agency must deploy for peak loads by requiring only a receipt for the arriving material, rather than the material itself.
Agencies that experience a crush of incoming material just before a given deadline - such as the Internal Revenue Service at tax time - would find the technology useful, said Leana Golubchik, a University of Southern California professor of computer science who is one of the researchers on the project. Golubchik presented the technology at this year's Digital Government Research conference, sponsored by NSF.


'In the research literature, there is a lot of work on data dissemination, but there really isn't any work out there on the collection of data from a system's point of view,' said Golubchik. 'What we are looking for is to provide good performance to both sides - to the agencies collecting the data and to the users that are submitting this data.'
The technology requires end users to upload their data to intermediary computers located around the network, called Bistros. For each submission, the Bistro generates a checksum - a number computed from a block of data that verifies its integrity - and sends it to the agency. The agency records the time the checksum was received. The checksum ensures the file hasn't been tampered with; if someone changes the contents of the file, that file's checksum will change as well.
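The receipt step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not name the checksum algorithm, so SHA-256 stands in for it, and the function names (`make_receipt`, `record_receipt`, `verify`) and the receipt fields are hypothetical, not part of the researchers' system.

```python
import hashlib
from datetime import datetime, timezone


def checksum(data: bytes) -> str:
    """Return a hex digest of a submission (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def make_receipt(data: bytes) -> dict:
    """Bistro side: build the small receipt sent in place of the file."""
    return {"checksum": checksum(data), "size": len(data)}


def record_receipt(receipt: dict, log: list) -> dict:
    """Agency side: timestamp the receipt when it arrives and store it."""
    entry = dict(receipt, received_at=datetime.now(timezone.utc).isoformat())
    log.append(entry)
    return entry


def verify(data: bytes, entry: dict) -> bool:
    """Later, check that the uploaded file matches the recorded checksum."""
    return checksum(data) == entry["checksum"]


if __name__ == "__main__":
    agency_log = []
    entry = record_receipt(make_receipt(b"my tax return"), agency_log)
    print(verify(b"my tax return", entry))   # unchanged file matches
    print(verify(b"my tax return!", entry))  # altered file does not
```

The receipt is a few dozen bytes regardless of how large the submission is, which is why recording it at the deadline costs the agency far less than accepting the file itself.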


Since checksums are much smaller than the original files, the agency's servers won't be overloaded with traffic as users try to submit their material all at once, Golubchik said. The Bistros can then upload their material in a more gradual manner after the deadline has passed.
Any computer on a network can act as a Bistro, even the end-users' computers themselves. To ensure that the files stay intact while on the network, each Bistro breaks the files it receives into multiple parts and spreads them around to other Bistros. They are broken up in a redundant manner so not all the individual pieces are required to reconstruct that file. Spreading out the files among multiple nodes ensures that if any Bistro computer goes offline, the file can be recovered from other computers.
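One simple way to get the kind of redundancy described above is a single XOR parity piece. The sketch below is a minimal illustration, not the researchers' scheme: the article does not say which coding method the Bistros use, and XOR parity is the simplest case, tolerating the loss of exactly one piece. The function names and the choice of four data pieces are hypothetical.

```python
from functools import reduce
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> Tuple[List[bytes], int]:
    """Split data into k equal-size pieces plus one XOR parity piece.

    Returns the k + 1 pieces and the original length (needed to strip padding).
    """
    piece_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(piece_len * k, b"\0")
    pieces = [padded[i * piece_len:(i + 1) * piece_len] for i in range(k)]
    parity = reduce(xor_bytes, pieces)
    return pieces + [parity], len(data)


def reconstruct(pieces: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the file from k + 1 pieces, at most one of which is None."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can recover only one lost piece")
    pieces = list(pieces)
    if missing:
        # The XOR of all k + 1 pieces is zero, so the lost piece
        # equals the XOR of the pieces that survived.
        present = [p for p in pieces if p is not None]
        pieces[missing[0]] = reduce(xor_bytes, present)
    return b"".join(pieces[:-1])[:original_len]


if __name__ == "__main__":
    pieces, n = split_with_parity(b"comment on the proposed rule", 4)
    pieces[1] = None  # simulate one Bistro going offline
    print(reconstruct(pieces, n))
```

A production system would more likely use a code that survives several simultaneous losses; the single-parity version is shown only because it makes the idea visible in a few lines.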

In the first half of 2004, when the Environmental Protection Agency solicited public feedback for a proposed rule that would limit mercury output from power plants, it received almost 540,000 comments over the Internet. A staff of 15 people was assigned to make sense of this veritable mountain of e-mail.

Of these comments, almost 173,000 were electronic form letters that came from a single Web site - Moveon.org. Other advocacy groups also contributed a fair amount of the e-mail. These special-interest groups normally set up a Web page - or hire companies to provide such services - that allows citizens to easily send e-mail protesting a proposed regulation. In the EPA's case, few of these dispatches had any original comments.

Such is the emerging nature of electronic communications. On behalf of the populace, Congress makes the laws and the agencies write the regulations interpreting those laws. Such bodies consider public feedback, but historically input from the populace has been limited by the natural barriers between lawmakers and the public - the effort involved in writing a letter or visiting a representative from Congress. Now, thanks in large part to e-government initiatives, e-mail and Web sites offer an easier way to make one's views known. How will agencies grapple with the possible influx of communications? Will such input even be beneficial for decision-makers?

These are questions now being addressed by the National Science Foundation's Digital Government Research Program, an NSF research effort to investigate ways computer information sciences can improve government. The Digital Government Research conference, held this year in Atlanta, showcased research and technologies that the agency funded to address the challenge of citizen interaction.

When NSF started the Digital Government program, many agencies were still thinking of the Internet as a one-way conduit, with the Web site acting as a kind of electronic brochure, said Lawrence Brandt, program manager for the NSF's Digital Government program. 'Now I see a lot more citizens interacting with the government,' Brandt said.

That's not to say that wider avenues of communication will be entirely beneficial. The bulk e-mail that arrived at EPA was 'symbolic of where public comment is going,' concluded Stuart Shulman, an assistant professor of information sciences and public administration at the University of Pittsburgh who studied the EPA e-mail with an NSF grant. 'It is not a pretty picture.'

Information overload


To understand what sorts of e-mail agencies were receiving during an e-rulemaking process, Shulman's team analyzed electronic input from three different sets of comments submitted by the public for proposed regulations.

Of the 1,000 e-mails that Shulman sampled from the EPA feedback, only 174 had original material unaltered, he said. The rest were form letters submitted by citizens via Web pages set up by advocacy groups. The team grouped e-mails that had the same basic bodies of text and then did a Web search for that text.

Inevitably, the search would point to a Web site run by an advocacy group. Of the form dispatches they received, many had 20 words or fewer of original material. One quoted a lyric from Bob Dylan; others berated EPA for being a pawn of big business. Few had any input that could be viewed as useful.

'This, to my mind, is one of the sources of the problem; the trouble that rule makers face in this environment,' Shulman said. Special-interest groups like to drum up large numbers of e-mails when they find a particular rule objectionable, Shulman said. They can then tell reporters or the courts that hundreds of thousands of people had protested a particular regulation.

'The technology makes it easier and easier to move crowds,' Shulman said.

Pity the government office that must acknowledge these protesting crowds. Yet it is required. Fortunately, this is another focus of NSF's research - making tools that will help agencies fight fire with fire, so to speak. The Digital Government program has dedicated funding to building software that would help agencies sort through the volumes of electronic dispatches they receive based on the characteristics of the e-mail described by the work of Shulman's team.

At the conference in Atlanta, Carnegie Mellon University professor Jamie Callan and graduate student Hui Yang presented software they developed for sorting out 'near-duplicate' e-mails. Near-duplicate e-mails are particularly problematic since the agency may group them together and discard them, not reviewing unique and useful information buried somewhere in the text.

'Detecting copies of a form letter is easy, but a modified version is harder to detect,' Yang said. Callan and Yang's software algorithms can detect and group similar e-mails and then can highlight the unique comments in each message.
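To make the idea of near-duplicate detection concrete, here is a minimal sketch using a common textbook approach: compare overlapping word sequences ("shingles") with Jaccard similarity, then list the sentences a message adds beyond the form letter. The article does not describe how Callan and Yang's software works, so nothing here should be read as their algorithm; the function names, the three-word shingle size and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re


def shingles(text: str, n: int = 3) -> set:
    """Return the set of overlapping n-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(message: str, form_letter: str,
                      threshold: float = 0.5) -> bool:
    """Flag a message whose shingles mostly overlap the form letter's."""
    return jaccard(shingles(message), shingles(form_letter)) >= threshold


def unique_sentences(message: str, form_letter: str) -> list:
    """Return sentences in the message that do not appear in the form letter."""
    def split(text: str) -> list:
        parts = re.split(r"(?<=[.!?])\s+", text)
        return [s.strip() for s in parts if s.strip()]

    known = {s.lower() for s in split(form_letter)}
    return [s for s in split(message) if s.lower() not in known]


if __name__ == "__main__":
    form = ("Please reject the proposed rule. Mercury pollution harms "
            "children. Stronger limits are needed now.")
    msg = form + " My family fishes in a lake near a power plant."
    print(is_near_duplicate(msg, form))
    print(unique_sentences(msg, form))
```

The second function reflects the concern raised above: a modified form letter should be grouped with its siblings, but the sentence the sender added should still reach a reviewer.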

Researchers at Stanford University in California have also developed software that could help agencies sort through large amounts of e-mail. Gloria Lau, a researcher on the project, outlined how Stanford's software could categorize a batch of e-mails by the bill provision each message addresses. An agency worker could use the software to sift through thousands of e-mails to quickly determine which sections within a proposed policy were controversial, Lau said. This in turn would allow agency workers to better summarize and make sense of the public feedback.

Talking to the wind


Even with such tools at their disposal, would agencies find use in the flood of input they receive? This question forces policy makers to reconsider some of the basic tenets of how the country is run.

Laboring under the notion of a representative democracy, public officials now act as proxies for the public, voting on issues that citizens may not have the time or inclination to thoroughly understand themselves. But even if citizens could get more involved through the magic of technology, would they make the right choices? This question was the focus of an NSF-funded study undertaken by Vincent Price, Joseph Cappella and a number of graduate students at the University of Pennsylvania.

'We're testing assumptions in democratic theory,' Price said.

The team assembled a group of volunteers to discuss, in Web chat rooms, policy issues in health care, such as the lack of insurance coverage for many Americans. The group consisted of both U.S. citizens and a number of health care policy experts (the 'elites,' as Price called them) from agencies such as the Health and Human Services Department, as well as from the private sector.

'A number of policy analysts have said that in public policy formation process, the elites dismiss the views and interests of the very people for whom these policies are promulgated, largely because [the people] are thought to be insufficiently knowledgeable to be trusted with input,' Price said.

Initially, the volunteers felt that the health care policy issues were indeed too complicated. But as the volunteer group met online and talked about them, they seemed to gain a greater understanding of the issues. Although the research team is still analyzing the results of the forums, Price noted that 'online deliberation clearly does increase the strengths of opinions and the number of opinions they formed about health care policy issues.'

Online forums of this nature could possibly help citizens become better informed about an issue, which would make for better public deliberation, Price said.

Such forums could also reduce the burden of the formal rule-making process. Instead of sifting through thousands of e-mails to discover what people think about an issue, an agency could hold online meetings beforehand, allowing the public to work through the issues and the agency to pinpoint the pertinent issues around a proposed rule.

With the introduction of Web technologies such as chat rooms and e-mail, 'We have the ability to provide access to government,' said Jane Fountain, who is the director for the National Center for Digital Government at the University of Massachusetts, Amherst. The question that agencies must grapple with, she said, is which segments of the population will make best use of that access.

'What we are looking for is to get meaningful input from knowledgeable people,' Brandt said.

Technology in action


'Rule-making involves public participation. It is the ultimate essence of democracy,' said Neil Eisner, the Assistant General Counsel for Regulation and Enforcement at the Transportation Department. 'We get the public to help us decide what the rules are saying.'

With all its subagencies, Transportation may well be the largest rule-making agency in the government, he said. The Federal Aviation Administration alone issues up to 6,000 rules a year.

Despite the possibility that giving citizens the ability to comment online about proposed rules could result in a severe case of information overload, Eisner is bullish on using technology to hasten the rule-making process. He oversees the electronic docket system, called the Docket Management System, that the agency put in place to automate the flow of the rule-making process. EPA has a similar system called EDocket, which it shares with other agencies, including the Homeland Security and Housing and Urban Development departments.

Transportation's docket system cost about $4 million to create when it was developed in the mid-1990s, Eisner said, but it reduced the support staff from 24 full-time people to about 14 full-time people. 'We are saving almost $1.3 million a year,' Eisner said.

Despite these gains, Eisner sees the need for new technologies that would allow Transportation to make better use of the data it receives. Better intelligence tools are needed to summarize public comments, ones that would organize them into topic areas. The software should be able to present individual sections of a proposed regulation to the user, so he or she could comment on just that section. Another nice feature, Eisner said, would be interactive question-and-answer software.

'When the commenter says 'Your rule is stupid,' ' Eisner said, 'the computer should ask 'Can you tell me why it's stupid?' '
