GILS report tells how to evaluate Web usage

As interesting as the GILS report might be, even more compelling is one of its
appendices that details innovative work done by the authors in measuring World Wide Web
site usage. Every webmaster should pay attention. There is now a way to find out more
about your users and the way they use your Web site.


The work was done by John Carlo Bertot of the University of Maryland, Baltimore County;
Charles R. McClure of the School of Information Studies at Syracuse University; William
Moen of the University of North Texas; and Jeff Rubin, a research associate at Syracuse.
McClure has been a leading Internet researcher and promoter for years.


Not everyone is aware of the availability of Web page usage data. Web servers
automatically generate and update four usage log files. The result is an incredibly useful
but incredibly lengthy set of files about the traffic at a site. The files can be as large
as 100M each day, so handling them is not easy.


The first of the files is an access log. It contains the date, time and IP address for
each user action, such as file downloading. This information permits, among other things,
calculation of the percentage of users coming from each domain type, for example .com,
.gov or .edu.
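

Pulling that percentage out of the log does not take elaborate software. Here is a
minimal sketch in Python, assuming an NCSA-style common log format in which each line
begins with the requesting host name or IP address; the file name and the handling of
unresolved addresses are illustrative assumptions, not the authors' method.

    from collections import Counter

    def domain_counts(path="access_log"):
        """Tally top-level domains (.com, .gov, .edu ...) from an access log."""
        counts = Counter()
        with open(path) as log:
            for line in log:
                fields = line.split()
                if not fields:
                    continue              # skip blank lines
                host = fields[0]          # resolved host name or raw IP address
                last = host.rsplit(".", 1)[-1]
                # A purely numeric last label means an unresolved IP address.
                counts["unresolved" if last.isdigit() else "." + last.lower()] += 1
        return counts

    if __name__ == "__main__":
        counts = domain_counts()
        total = sum(counts.values()) or 1
        for domain, hits in counts.most_common():
            print(f"{domain:>12}  {100 * hits / total:5.1f}%")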


Even more useful is the data on the path that a user takes through your site.


For example, if most users navigate to a page buried five levels deep, it's probably
time to move that page closer to the home page.
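

One rough way to find those buried pages is to count how many directory levels down each
requested URL sits. The sketch below, again assuming a common-format access log, builds a
depth histogram; the file name and the depth-equals-path-segments rule are assumptions for
illustration.

    from collections import Counter

    def depth_histogram(path="access_log"):
        """Count requests by how many directory levels deep the URL is."""
        depths = Counter()
        with open(path) as log:
            for line in log:
                try:
                    request = line.split('"')[1]   # e.g. GET /a/b/c.html HTTP/1.0
                    url = request.split()[1]
                except IndexError:
                    continue                       # skip malformed lines
                segments = [s for s in url.split("?")[0].split("/") if s]
                depths[len(segments)] += 1
        return depths

    if __name__ == "__main__":
        for depth, hits in sorted(depth_histogram().items()):
            print(f"depth {depth}: {hits} requests")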


The second file, the agent log, identifies a user's browser version and operating
system. An administrator can use this data to keep tabs on user capabilities. If visitors
are still using old browsers, there is no reason to add a state-of-the-art feature quite
yet.
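

Summarizing the agent log is a similar pass. The sketch below assumes one user-agent
string per line, the way NCSA-style servers typically write it, and tallies browser
families; the file name is illustrative.

    from collections import Counter

    def browser_counts(path="agent_log"):
        """Count browser families from an agent log, one user-agent string per line."""
        counts = Counter()
        with open(path) as log:
            for line in log:
                agent = line.strip()
                if not agent:
                    continue
                # The token before the first slash is the browser family,
                # e.g. "Mozilla/3.01 (Win95; I)" -> "Mozilla".
                counts[agent.split("/", 1)[0]] += 1
        return counts

    if __name__ == "__main__":
        for family, hits in browser_counts().most_common(10):
            print(f"{family:<20} {hits}")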


The third file is an error log. It lists dead links, missing files and aborted
downloads. If users are constantly aborting the downloading of certain files, something
may be amiss.
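

A quick scan of the error log can show which problems dominate. The sketch below counts
lines containing a few telltale phrases; the file name and the keyword list are
assumptions, since error log formats vary from server to server.

    from collections import Counter

    # Phrases to look for; adjust to match what your server actually logs.
    KEYWORDS = ("does not exist", "not found", "aborted", "timed out")

    def error_counts(path="error_log"):
        """Count error-log lines by the kind of problem they report."""
        counts = Counter()
        with open(path) as log:
            for line in log:
                lower = line.lower()
                for key in KEYWORDS:
                    if key in lower:
                        counts[key] += 1
        return counts

    if __name__ == "__main__":
        for reason, hits in error_counts().most_common():
            print(f"{reason:<15} {hits}")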


The last of the four files is the referrer log. It identifies the site that each user
came from.


If a webmaster learns that most traffic comes from a particular Web site, that reveals
something about the users. This information is helpful to federal webmasters.
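

Counting referring hosts is the same sort of pass. The sketch below assumes an NCSA-style
referer log in which each line begins with the referring URL; the file name is
illustrative.

    from collections import Counter
    from urllib.parse import urlparse

    def referrer_counts(path="referer_log"):
        """Tally the hosts that sent visitors to the site."""
        counts = Counter()
        with open(path) as log:
            for line in log:
                fields = line.split()
                if not fields:
                    continue
                host = urlparse(fields[0]).netloc   # first field is the referring URL
                if host:
                    counts[host] += 1
        return counts

    if __name__ == "__main__":
        for host, hits in referrer_counts().most_common(20):
            print(f"{host:<40} {hits}")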


The study includes an analysis of the statistics for one server at the Environmental
Protection Agency. The server had about 1.5 million hits in one week. The daily log files
for this server ranged in size from 8M to 26M. Just one week's worth of data exceeded 500M
in total. Analyzing this much information is not for the faint-hearted. The authors had to
develop new software to handle the work, and that took months.


If you are the administrator for a Web site, federal or otherwise, you should think about
doing this analysis. What better way to find out who your users are, where they are coming
from and what they are doing at your site?


The work may be hard, but the results are worth it. Feedback directly from users is
hard to get.


Web usage files will create some interesting problems under the Freedom of Information
Act. At least one agency has already received a request for the files. The Defense
Technical Information Center was asked to release 11 months' worth of usage files that
totaled 18G of data. The request was rejected on privacy grounds and because disclosure
might undermine security.


The privacy argument is reasonable, and feds should dance lightly when dealing with
other issues regarding identifiable data. Of course, e-mail addresses could be excised,
but this is not simple when dealing with gigantic files. DTIC's security argument seems
more doubtful.
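

For what it is worth, excising addresses does not require loading a multigigabyte file
into memory; a single streaming pass will do. The sketch below shows one illustrative
approach, with an assumed pattern and file names, and is not offered as a complete answer
to the FOIA problem.

    import re

    # Rough pattern for e-mail addresses; tune it before relying on it for release.
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")

    def redact(src="access_log", dst="access_log.redacted"):
        """Copy a log file line by line, replacing anything that looks like an address."""
        with open(src) as infile, open(dst, "w") as outfile:
            for line in infile:
                outfile.write(EMAIL.sub("[removed]", line))

    if __name__ == "__main__":
        redact()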


Whatever the motivation for the denial, a better response would be for DTIC to analyze
its own data and release the results.


I don't get excited very often about academic research, but this new paper is dynamite.
Don't just sit there. Find a copy and see how you might use the techniques to analyze and
improve your own Web site. The analysis will appear as an appendix to the report and will
also be published in October in Government Information Quarterly. If you can't wait until
the paper is out, contact McClure directly at cmcclure@mailbox.syr.edu.
 


Robert Gellman, former chief counsel to the House Government Operations Subcommittee on
Information, Justice, Transportation and Agriculture, is a Washington privacy and
information policy consultant. His e-mail address is rgellman@cais.com.

