Hosting Manual

Introduction
Your account comes with HTTP-Analyze preinstalled and configured. HTTP-Analyze is a log analyzer for web servers. It analyzes the logfile of a web server and creates a comprehensive summary report from the information found there. HTTP-Analyze has been optimized to process large logfiles as fast as possible.

In easier-to-understand terms, HTTP-Analyze is a powerful traffic analyzer that quickly and efficiently delivers statistics on the traffic your web pages have generated. It has a user-friendly graphical user interface (GUI) that produces your traffic reports at the click of a mouse button. Below we explain in more detail how this software works with your web site and define the terms used in the results you'll receive.

The web server is a program running on a networked machine, waiting for connections from the outside world in order to serve documents requested by a browser. To communicate, the server and the browser use an asynchronous communication method called HTTP (Hypertext Transfer Protocol). It works as follows:
The document delivered as an answer to this request may contain inline objects. Inline objects are simply URLs pointing to another resource: a document, an image, an applet, a video/audio stream, or any other addressable HTML object.
The browser then requests all inline objects of the current page from the server using the steps 2 and 3 above, before it can display the content of that page.

This communication method is called asynchronous because the browser sends out many requests for inline objects at once, using different communication channels, without waiting for a response from the server before sending the next request.

Since the browser's requests are often handled by different server processes or different threads of a server process, the logfile entries for a document and for its inline objects are not related to each other in any predictable way. For example, the order in which the server logs the successful transmission of the document itself and of the inline images it contains is not predictable and depends on the type of documents and objects, server speed, system and network load, and many other parameters.

What is logged?
Each and every response from the server - whether it indicates success, an error, or even a timeout (i.e. no response) - gets logged in the server's logfile. Since the server was hit by a request, such a response is called a Hit. In other words, the total number of hits must equal the total number of lines in the logfile minus the number of corrupt and empty lines.

A typical logfile entry in the Common Logfile Format (CLF) looks like:

    hostname - - [01/Feb/1998:10:10:00 +0100] "GET /index.html HTTP/1.0" 200 4839

The hostname field contains the fully qualified domain name (FQDN) of the site accessing your server (see »Special Cases« below). The next two fields usually contain a minus ('-') to indicate that those fields are empty. The date is surrounded by square brackets ('[' and ']'). The next field contains the request: the request method ('GET', for example), the name of the requested document (URL), and the protocol specification ('HTTP/1.0'). The last two fields are the response status code ('200') and the number of bytes sent ('4839').

The user agent and the referrer URL may be appended to a CLF entry in one of two forms, where CLF stands for an entry like the one shown above:

    CLF Mozilla/2.0 (X11; IRIX 6.3; IP22) http://foo/bar.html
    CLF "http://foo/bar.html" "Mozilla/2.0 (X11; IRIX 6.3; IP22)"
Note that in the second form, the user-agent and the referrer URL are surrounded by double quotes, which makes them ambiguous in certain cases, such as erroneous referrer URLs that themselves contain double quotes. Therefore, the first form should be preferred if possible.
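The CLF layout and the hit-counting rule described above can be illustrated with a short Python sketch. It is not how http-analyze itself parses logfiles; the regular expression and the names `CLF_PATTERN`, `parse_clf_line`, and `count_hits` are assumptions made for illustration only. The pattern is anchored at the start of the line but not at the end, so entries with an appended user agent and referrer still match.

```python
import re

# Illustrative pattern for one CLF entry (an assumption, not http-analyze's parser).
# The two fields after the hostname are usually '-'; the size may also be '-'.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)(?: (?P<protocol>[^"]+))?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Return a dict of fields for a CLF entry, or None if the line is corrupt."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

def count_hits(lines):
    """Count hits: every line that is neither empty nor corrupt."""
    hits = 0
    for line in lines:
        if not line.strip():
            continue  # empty line, not a hit
        if parse_clf_line(line) is None:
            continue  # corrupt line, not a hit
        hits += 1
    return hits

# Hypothetical sample data: two valid entries, one empty and one corrupt line.
sample_log = [
    'hostname - - [01/Feb/1998:10:10:00 +0100] "GET /index.html HTTP/1.0" 200 4839',
    '',
    'this line is corrupt',
    'hostname - - [01/Feb/1998:10:10:01 +0100] "GET /logo.gif HTTP/1.0" 304 -',
]

print(parse_clf_line(sample_log[0])["url"])
print(count_hits(sample_log))
```

With this sample, four lines yield two hits, because the empty line and the corrupt line are not counted.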
The entries shown above are the only information the server records in the logfile. Much more information may be transferred from the browser to the server, but although this additional information is available to CGI scripts running on your server, it is not written to the logfile. Therefore, http-analyze can only show you a summary of the information in the logfile - nothing more, nothing less.

Special Cases
Caching in the browser: As soon as a page has been saved in a browser's disk cache, the browser might send out conditional requests for documents or inline objects. A conditional request asks the web server to send a document or object only if it has been modified since the last time the page was requested (provided the page is still in the browser's cache). This way, network traffic is reduced somewhat, since documents must be transferred only if they have changed recently. If such a conditional request arrives, the server will respond with a Code 304 (Not Modified) status to indicate that the document hasn't changed, or with a Code 200 (OK) status if it has changed in the meantime. Since the browser may be configured (and usually is by default) to send out such conditional requests only once per session and otherwise use the copy from the cache unconditionally, you may not even see a Code 304 response if this user visits your site again in the same session. Conditional requests are then sent out again only after the user terminates the browser session and later restarts the browser.

Caching in a proxy server: Organizations with a large number of users - such as companies, universities, or online providers - often use a so-called proxy server for mainly two reasons:
Both forms of caching make it technically impossible to count visitors or to track their way through your web site. All you see in your server's logfile are a few initial hits from the proxy or browser and probably some Code 304 responses resulting from conditional requests, depending on the preference settings of the proxy or browser.
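The server's decision between Code 304 and Code 200 for a conditional request, as described under "Caching in the browser", can be sketched as follows. This is a simplified illustration, not the logic of any particular web server: it assumes the browser sends the cached copy's date in the standard HTTP `If-Modified-Since` request header, and the function name `respond` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def respond(last_modified, if_modified_since=None):
    """Return the status code a server might send for a (conditional) GET.

    last_modified: timezone-aware datetime of the document's last change.
    if_modified_since: value of the If-Modified-Since header, or None for
    an unconditional request.
    """
    if if_modified_since is None:
        return 200  # unconditional request: always send the document
    cached_at = parsedate_to_datetime(if_modified_since)
    if last_modified <= cached_at:
        return 304  # Not Modified: the cached copy is still current
    return 200      # document changed after it was cached: send it again

# Hypothetical example: the document changed at 09:00, the browser cached it at 09:10.
doc_changed = datetime(1998, 2, 1, 9, 0, 0, tzinfo=timezone.utc)
cached = format_datetime(
    datetime(1998, 2, 1, 9, 10, 0, tzinfo=timezone.utc), usegmt=True
)

print(respond(doc_changed))          # unconditional request
print(respond(doc_changed, cached))  # conditional request, document unchanged
```

Only the responses the server actually sends appear in the logfile, so requests answered entirely from a browser or proxy cache leave no entry at all.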
Definition of Terms
The statistics report contains, among others, the following information:

  the number of hits, 304's, files, pageviews, sessions, data sent (in KB)
  the amount of data requested, transferred, and saved by cache (in KB)
  the number of unique URLs, sites, and sessions per month
  the number of all response codes other than 200 (OK)
  the average hits per weekday and for last week
  the maximum/average hits per day and per hour
  the number of hits, files, 304's, sites, data sent by day
  the top 5 days, 24 hours, 5 minutes and 5 seconds of the summary period
  the top 30 most commonly accessed URLs (hits, 304's, data sent)
  the 10 least frequently accessed URLs (hits, 304's, data sent)
  the top 30 client domains accessing your server most often
  the top 30 browser types
  the top 30 referrer hosts
  the overview/detailed list of all files requested
  the overview/detailed list of all sites by domain and reverse domain
  the overview/detailed list of all browser types
  the overview/detailed list of all referrer URLs
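
To give a feel for how a few of the figures listed above could be derived from logfile entries, here is a minimal Python sketch. It is not http-analyze's actual computation, and its exact definitions of terms such as files, pageviews, and sessions are not reproduced here. The function `summarize` and the `(url, status, bytes_sent)` tuple layout are assumptions made for illustration.

```python
from collections import Counter

def summarize(entries, top_n=30):
    """Compute a few report figures from (url, status, bytes_sent) tuples."""
    hits = len(entries)
    not_modified = sum(1 for _, status, _ in entries if status == 304)
    kb_sent = sum(size for _, _, size in entries) / 1024
    # Response codes other than 200 (OK), counted per code.
    other_codes = Counter(status for _, status, _ in entries if status != 200)
    # Most commonly accessed URLs, by number of hits.
    top_urls = Counter(url for url, _, _ in entries).most_common(top_n)
    return {
        "hits": hits,
        "304s": not_modified,
        "kb_sent": kb_sent,
        "unique_urls": len({url for url, _, _ in entries}),
        "other_codes": dict(other_codes),
        "top_urls": top_urls,
    }

# Hypothetical sample entries.
entries = [
    ("/index.html", 200, 4839),
    ("/logo.gif", 200, 1024),
    ("/index.html", 304, 0),
    ("/missing.html", 404, 257),
]

report = summarize(entries)
print(report["hits"], report["304s"], report["unique_urls"])
print(report["top_urls"][0])
```

Figures that depend on time, such as hits per hour or the top 5 days, would additionally need the date field of each entry.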
The following table summarizes the meaning of all terms in the statistics report that are not self-explanatory:
1 shown only on the total summary page.