Carlton Lovegrove
An important component of any ecommerce initiative
is tracking the effectiveness of the marketing effort.
Careful analysis of a web site's statistics yields
information that can be used to fine-tune advertising,
web site content, and customer relationship management
strategies and policies. These are all important elements of Internet
Marketing plans and strategies that can ultimately dictate
the success or failure of any ecommerce initiative.
Surfing the World Wide Web involves traversing the
connections among hyperlinked documents. It is one of
the most common ways of accessing web pages. Theories
and models are beginning to explain how observed patterns
of surfing behavior emerge from fundamental human information
search processes. Therefore, the ability to predict
surfing patterns has the potential to be instrumental
in solving many problems facing producers and consumers
of web page content. For instance, web site designs
can be evaluated and optimized by predicting how users
will surf through their structures. Web client and server
applications can also reduce user-perceived network
latency by pre-fetching content predicted to be on the
surfing path of individual users or groups of users
with similar surfing patterns. Systems and user interfaces
can be enhanced by the ability to recommend content
of interest to users, or by displaying information in
a way that best matches users' interests. Proper analysis
of a web site's activity is therefore an important process
that supports an enhanced and intelligent design of
a web site.
A common and popular source of tracking data and statistics
for any website is the log file on the web server. Most
web servers have a system for recording all requests
for web site objects to a log file. The data in the
log file indicates which objects were requested, when,
and by whom or what. With appropriate software to process
this data, company managers and executives can measure
the success of their websites and develop strategies
to address weaknesses and improve their prospects
for future success by assessing their site's visibility
(the ease with which customers can locate the
site), navigability (the paths that customers
use to navigate through the site), and usability
(how easy it is for customers to use the site).
However, complete reliance on data collected in log
files has its pitfalls, some of which will be discussed
in this article. Other tools such as tracking counters
help overcome some of the problems encountered with
log file analysis. Selecting site statistics software
intelligently therefore requires recognizing the strengths
and weaknesses of each tool and striking a balance
that serves the mission and goals of your organization.
Understanding the statistics that web site analysis software
provides is critical to evaluating current marketing
strategies and designing subsequent ones.
Log file data
While web servers have the ability to record vast
amounts of information, relatively few fields are typically
recorded. Several formats have evolved from the Common
Logfile Format (CLF), including the Extended Common Logfile
Format (ECLF) as well as a variety of customized formats.
For the most part, the following fields are recorded
by web servers:
- the time of the request, to the second,
- the machine making the request, recorded as either
a domain name or an IP address,
- the name of the requested URL as specified by the
client,
- the size of the transferred object in bytes, and
- various HTTP-related information such as version number,
method, and return status.
Various web servers also enable other fields to be
recorded, the most common of which are:
- the URL of the previously viewed page (the “referrer”
field),
- the identity of the software used to make the request
(the “user agent” field), and
- a unique identifier issued by the server to each
client (typically a “cookie”).
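As a rough illustration of the fields listed above, the following sketch parses a single log line. It assumes the widely used "combined" layout (the CLF fields followed by the referrer and user agent); the regular expression, the field names, and the sample line are illustrative assumptions, and a given server may be configured to record different fields in a different order.

```python
import re
from datetime import datetime

# Assumed layout:
# host ident user [time] "method url protocol" status size "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A size of "-" means no body was transferred.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

# Hypothetical sample line; addresses and names are placeholders.
sample = ('192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /products/ HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "ExampleBrowser/1.0"')
record = parse_log_line(sample)
```

For a plain CLF line with no referrer or user agent, the last two fields simply come back as None, which mirrors the missing-data problem discussed later in this article.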
Understanding how this data is interpreted and displayed
in a readable format for subsequent decision analysis
is an important part of any statistical analysis.
Users should be aware that statistical analysis software
can present the same data in different ways. Subsequent sections
of this article address some of the important decisions
that the software must make when
creating reports on your web site activity.
URLs and Referrer Fields
While these fields are useful and support reasonable
characterizations of site usage, several limitations complicate
path reconstruction. The URL recorded is the URL as requested by
the user, not the location of the file returned by the
server. This can cause pages to be tabulated incorrectly
when relative hyperlinks, symbolic links, or hard-coded
expansion and translation rules are involved; for example,
directories do not always translate to “index.html.”
It also can lead to two paths being
considered different when in actuality they contain
the same content. While both pieces of information are
useful, the canonical file system-based URL returned
by the server would arguably be more useful, as it removes
the ambiguity about what resource was returned to the user.
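To make the ambiguity concrete, here is a minimal sketch that collapses requested URLs onto a single canonical form before page views are counted. The `canonical_url` helper and the default document name `index.html` are assumptions for illustration only; since a server's own translation rules and symbolic links may differ, only the server can supply the true mapping.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit

def canonical_url(requested, default_document="index.html"):
    """Approximate the resource returned for a requested URL (hypothetical rules)."""
    path = urlsplit(requested).path or "/"
    is_directory = path.endswith("/")
    # Collapse "." and ".." segments in the path.
    path = posixpath.normpath(path)
    if is_directory:
        # Assumption: directory requests are served the default document.
        path = posixpath.join(path, default_document)
    return path

# Three requests that (under the assumption above) return the same page.
requests = [
    "/products/",
    "/products/index.html",
    "/products/../products/index.html",
]
raw_counts = Counter(requests)
canonical_counts = Counter(canonical_url(u) for u in requests)
```

Counting the raw strings reports three pages with one view each, while counting the canonical form reports one page with three views.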
In addition, the contents of the referrer field vary
considerably. Various browsers
and proxies do not send this information to the server
for privacy and other reasons. The value
of the referrer field is also undefined when
the user requests a page by typing in the URL, selects
a page from a Favorites/Bookmarks list, or uses
other navigational aids such as the history list.
Furthermore, several browsers provide conflicting values
for the referrer field. To illustrate, suppose a user
selects a listing for the Dell Corporation on Yahoo.
In requesting the Dell splash page, the URL for the
page on Yahoo is provided as the value for the referrer
field. Now suppose the user clicks on the Products page,
returns to the Dell splash page, and reloads the splash
page. In several popular browsers, the referrer field
for Yahoo is included in the second request for the
Dell splash page although the last page viewed on the
user's surfing path was the Products page in the Dell
site. If one chooses to reconstruct paths by relying
upon the referrer field, the paths of two users may
be identified instead of only one. Given these limitations,
strong reliance upon the information in the referrer
field may be more problematic than one would initially
expect.
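The reload scenario above can be sketched with a deliberately naive path builder. The grouping rule and the placeholder URLs below are assumptions for illustration, not the method of any particular analysis package: each request extends the most recent path that ends at its referrer, and otherwise starts a new path.

```python
def reconstruct_paths(requests):
    """Group (url, referrer) pairs, in time order, into surfing paths.

    Naive rule (an assumption for illustration): a request extends the most
    recent path that ends at its referrer; otherwise it starts a new path.
    """
    paths = []
    for url, referrer in requests:
        for path in reversed(paths):
            if path[-1] == referrer:
                path.append(url)
                break
        else:
            # No existing path ends at this referrer.
            paths.append([url])
    return paths

# Placeholder URLs standing in for the pages in the example above.
DIRECTORY = "http://directory.example/listing"  # the directory listing page
SPLASH = "http://vendor.example/"               # the vendor's splash page
PRODUCTS = "http://vendor.example/products"     # the vendor's Products page

one_user = [
    (SPLASH, DIRECTORY),   # arrives from the directory listing
    (PRODUCTS, SPLASH),    # clicks through to the Products page
    (SPLASH, DIRECTORY),   # returns and reloads; the old referrer is resent
]
paths = reconstruct_paths(one_user)
```

Although a single user made all three requests, the reload carries the directory page as its referrer, so this rule splits the visit into two paths.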
User Agent Fields
The user agent field also suffers from imprecise semantics,
different implementations, and missing data. This can
partially be attributed to the use of the field by browser
vendors to perform content negotiation. Because the
rendering of HTML differs from browser to browser, servers
can alter the HTML they send based upon which
browser is on the other end. Consequently, the user
agent field may contain the names of multiple browsers.
Some proxies also append information to this field.
In addition, the value of the user agent field can vary
for requests made by the same user using the same Web
browser. Adding to the confusion, there is no standardized
way to determine whether requests are made by autonomous
agents (e.g., robots), semi-autonomous agents acting
on behalf of users (e.g., copying a set of pages for
off-line reading), or humans following hyperlinks in
real time. Clearly, it is important to be able to distinguish
these classes of requests when attempting to model surfing
behaviors.
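Because no standard exists, any classification of requesters from the user agent field is a heuristic. The sketch below shows one possible approach based on keyword matching; the keyword lists, the class labels, and the `classify_user_agent` helper are all assumptions, and a scheme like this will misclassify agents that omit or disguise their identity.

```python
ROBOT_HINTS = ("bot", "crawler", "spider")      # assumed keywords, not a standard
OFFLINE_HINTS = ("offline", "wget", "httrack")  # assumed keywords, not a standard

def classify_user_agent(user_agent):
    """Guess the class of requester from the user agent string (heuristic only)."""
    if not user_agent or user_agent == "-":
        # Missing data: the field was not sent or not logged.
        return "unknown"
    text = user_agent.lower()
    if any(hint in text for hint in ROBOT_HINTS):
        return "robot"
    if any(hint in text for hint in OFFLINE_HINTS):
        return "offline-copier"
    # Anything else is only presumed to be a person browsing in real time.
    return "probably-human"

labels = [
    classify_user_agent("ExampleBot/2.1"),
    classify_user_agent("Wget/1.21"),
    classify_user_agent("Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"),
    classify_user_agent(None),
]
```

The third string also illustrates the earlier point about content negotiation: a single user agent value can name more than one browser.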