Schema ID: www
Timestamp Field: time
In this DLF schema, each record represents a request to the web server. It contains the same information as the Common Log Format supported by most web servers.
Fields in the Schema
Type: hostname
Defaults: -
The hostname (or IP address) of the client that made the request.
Type: string
Defaults: -
If the request was authenticated, this field contains the name of the authenticated user. Note that there is no indication of which authentication method was used (RFC 1413 ident, WWW authentication, etc.).
Type: string
Defaults: -
The numeric result code of the request, for example 200, 301, etc.
Type: bytes
Defaults: -
The number of bytes sent to the client during the request.
Type: string
Defaults: -
The method used by the client for the request. This is usually one of GET, HEAD, POST, etc.
Type: url
Defaults: -
The URL that was requested by the client.
Type: string
Defaults: -
The protocol used by the client. This is usually either HTTP/1.0 or HTTP/1.1.
Type: timestamp
Defaults: 0
The time of the request.
Type: string
Defaults: -
The content of the Referer header that was sent along with the request. It usually contains the referring URL, that is, the URL of the page the user was browsing when this URL was requested.
Type: string
Defaults: -
The content of the User-Agent header that was sent along with the request. It usually contains information about the web browser used by the client.
Type: string
Defaults: -
When automatic compression is used, this field contains the result code from the compression submodule.
Type: int
Defaults: 0
When automatic compression of the results is used, this field contains the compression ratio achieved.
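Because a www record carries the same information as a Common Log Format line (optionally followed by the "combined" Referer and User-Agent fields), the mapping can be sketched as a small parser. This is a minimal Python sketch, not Lire's own converter. The dictionary keys client_host, requested_page, referer, useragent and time appear in this document; user, result, bytes, method and protocol are hypothetical names, because the field names of the listing above are not shown. Missing values fall back to the schema defaults ("-", or 0 for the timestamp).

```python
import re
from datetime import datetime

# host ident authuser [date] "method url protocol" status bytes ["referer" "agent"]
CLF_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_clf(line):
    """Map one log line onto www-style fields (some key names hypothetical)."""
    m = CLF_RE.match(line)
    if m is None:
        return None
    g = m.groupdict()
    # Log dates look like 10/Oct/2000:13:55:36 -0700; store epoch seconds.
    ts = datetime.strptime(g["date"], "%d/%b/%Y:%H:%M:%S %z").timestamp()
    return {
        "client_host": g["host"],
        "user": g["user"],                 # "-" when not authenticated
        "result": g["status"],             # kept as a string, like the schema
        "bytes": int(g["bytes"]) if g["bytes"] != "-" else "-",
        "method": g["method"],
        "requested_page": g["url"],
        "protocol": g["protocol"],
        "time": int(ts),
        "referer": g["referer"] or "-",    # absent in plain CLF lines
        "useragent": g["agent"] or "-",
    }
```

Lines that do not match (for example HTTP/0.9 requests without a protocol token) return None in this sketch rather than a partially filled record.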
Schema ID: www-attack
Base Schema: www
Module: Lire::Extensions::WWW::AttackSchema
Required Fields: requested_page
This is an extended schema for the WWW service which tries to detect common web attacks based on the requested URL.
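The idea of flagging attacks from the requested URL alone can be illustrated with simple signature matching. This is a hedged sketch: the signature table below is a hypothetical example, not the rule set used by Lire::Extensions::WWW::AttackSchema. Percent-escapes are decoded first so that encoded forms such as %2e%2e%2f are caught.

```python
from urllib.parse import unquote

# Hypothetical signatures (all lowercase); first matching category wins.
ATTACK_SIGNATURES = {
    "directory traversal": ("../", "..\\"),
    "system file access": ("/etc/passwd", "boot.ini"),
    "command execution": ("cmd.exe", "/bin/sh"),
    "script injection": ("<script",),
}

def detect_attack(requested_page):
    """Return the name of the first matching attack category, or None."""
    # Decode %xx escapes and ignore case before matching.
    url = unquote(requested_page).lower()
    for attack, patterns in ATTACK_SIGNATURES.items():
        if any(p in url for p in patterns):
            return attack
    return None
```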
Schema ID: www-domain
Base Schema: www
Module: Lire::Extensions::WWW::DomainSchema
Required Fields: client_host
This is an extended schema for the WWW service which adds country and client_domain fields based on the client host.
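One plausible way to derive both fields from client_host is to treat the last two hostname labels as the domain and look the top-level domain up in a country table. This sketch assumes that approach; the tiny country table and the "unknown" placeholder are hypothetical, and the two-label rule is a simplification that mishandles suffixes such as co.uk. Bare IP addresses yield no domain information here.

```python
import ipaddress

# Illustrative ccTLD table only; a real one would cover all country codes.
COUNTRY_BY_TLD = {"ca": "Canada", "fr": "France", "de": "Germany", "jp": "Japan"}

def domain_fields(client_host):
    """Return (client_domain, country) for a client host (hypothetical rules)."""
    try:
        ipaddress.ip_address(client_host)
        return ("unknown", "unknown")      # an IP address has no domain name
    except ValueError:
        pass
    labels = client_host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return ("unknown", "unknown")
    # "www.example.fr" -> domain "example.fr", top-level domain "fr".
    domain = ".".join(labels[-2:])
    return (domain, COUNTRY_BY_TLD.get(labels[-1], "unknown"))
```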
Schema ID: www-robot
Base Schema: www
Module: Lire::Extensions::WWW::RobotSchema
Required Fields: None
This is an extended schema for the WWW service which adds a robot field based on information from the domain name or the user_agent string.
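Robot detection from the domain name or the user agent string can be sketched as substring matching. The robot table, the generic hints and the "unknown robot" value below are hypothetical illustrations, not Lire's robot list; None stands for a request that does not look like a robot.

```python
# Hypothetical tables: substring -> robot name, plus generic robot keywords.
KNOWN_ROBOTS = {"googlebot": "Googlebot", "bingbot": "Bingbot", "slurp": "Yahoo! Slurp"}
GENERIC_HINTS = ("bot", "crawler", "spider")

def guess_robot(client_host, user_agent):
    """Return a robot name, or None when the client looks like a regular browser."""
    agent = user_agent.lower()
    host = client_host.lower()
    for key, name in KNOWN_ROBOTS.items():
        # A known robot may be recognised from its agent string or its hostname.
        if key in agent or key in host:
            return name
    # Otherwise fall back to generic keywords in the user agent only.
    if any(h in agent for h in GENERIC_HINTS):
        return "unknown robot"
    return None
```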
Schema ID: www-search
Base Schema: www
Module: Lire::Extensions::WWW::SearchSchema
Required Fields: referer
This is an extended schema for the WWW service which analyzes the referrals. It extracts the referring site and also determines whether the request came from a search engine.
Fields in the Schema
Type: string
Defaults: -
The site which referred the request. This is usually a hostname, but it can also be bookmarks when the user followed a bookmark.
Type: string
Defaults: -
The name of the search engine, when the request was referred through a search engine.
Type: string
Defaults: -
The search phrase used when the request was referred through a search engine.
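The three fields above can be derived from the referer with standard URL parsing. This is a sketch under stated assumptions: the search engine table (hostname substring and query parameter) is hypothetical, and an empty or "-" referer is assumed to mean a bookmark, matching the description of the referring site field. Other missing values use the "-" default.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical table: hostname substring -> (engine name, query parameter).
SEARCH_ENGINES = {
    "google.": ("Google", "q"),
    "bing.com": ("Bing", "q"),
    "search.yahoo.": ("Yahoo!", "p"),
}

def referer_fields(referer):
    """Return (referring_site, search_engine, search_phrase) with "-" defaults."""
    if referer in ("", "-"):
        return ("bookmarks", "-", "-")     # assumption: no referer means a bookmark
    parts = urlsplit(referer)
    site = parts.hostname or "-"           # hostname is already lowercased
    for key, (engine, param) in SEARCH_ENGINES.items():
        if key in site:
            # parse_qs decodes "+" and %xx escapes and returns a list per parameter.
            phrase = parse_qs(parts.query).get(param, ["-"])[0]
            return (site, engine, phrase)
    return (site, "-", "-")
```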
Schema ID: www-url
Base Schema: www
Module: Lire::Extensions::WWW::URLSchema
Required Fields: requested_page
This is an extended schema for the WWW service which parses the requested URL and adds several fields based on this information.
Fields in the Schema
Type: filename
Defaults: -
The portion of the requested URL that represents a filename, that is, everything that comes before the ? which starts the QUERY_STRING.
Type: string
Defaults: -
The extension of the requested file.
Type: filename
Defaults: -
The directory portion of the URL.
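The three www-url fields can be sketched with plain string and path operations. Because the field names are not shown above, the function returns a tuple (file, extension, directory). Two choices are assumptions: the directory is given without a trailing slash (as posixpath.dirname returns it), and the extension is returned without its leading dot. Empty values use the "-" default.

```python
import posixpath

def url_fields(requested_page):
    """Split a requested URL into (file, extension, directory)."""
    # Everything before the "?" that starts the QUERY_STRING.
    path = requested_page.split("?", 1)[0]
    directory = posixpath.dirname(path) or "-"
    # splitext keeps only the last suffix: "/a.tar.gz" -> ".gz".
    ext = posixpath.splitext(path)[1].lstrip(".") or "-"
    return (path or "-", ext, directory)
```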
Schema ID: www-user_agent
Base Schema: www
Module: Lire::Extensions::WWW::UserAgentSchema
Required Fields: useragent
This is an extended schema for the WWW service which adds fields derived from the information in the user_agent field.
Fields in the Schema
Type: string
Defaults: Unknown
The browser that was probably used to make the request as guessed from the user_agent field.
Type: string
Defaults: Unknown
The client's operating system as guessed from the user_agent field.
Type: string
Defaults: Unknown
The client's language as guessed from the locale information contained in the user_agent field.
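Guessing the browser, operating system and language is essentially ordered pattern matching on the user agent string. The rule tables below are a small hypothetical example, not Lire's rules; order matters because, for instance, Chrome user agents also contain "Safari/". Each value defaults to "Unknown", as in the schema. The language is taken from a locale token such as "en-US" inside the parenthesised comment.

```python
import re

# Hypothetical rule tables; the first matching rule wins.
BROWSER_RULES = [("Firefox", r"Firefox/"), ("Opera", r"Opera|OPR/"),
                 ("Chrome", r"Chrome/"), ("Safari", r"Safari/"), ("MSIE", r"MSIE ")]
OS_RULES = [("Windows", r"Windows"), ("Mac OS", r"Mac OS X|Macintosh"),
            ("Linux", r"Linux"), ("FreeBSD", r"FreeBSD")]
# A lowercase two-letter language, optionally followed by a region: "; en-US;".
LANG_RE = re.compile(r"[(;]\s*([a-z]{2})(?:[-_][A-Za-z]{2})?\s*[;)]")

def first_match(rules, agent):
    """Return the name of the first rule whose pattern occurs in agent."""
    for name, pattern in rules:
        if re.search(pattern, agent):
            return name
    return "Unknown"

def user_agent_fields(agent):
    """Return (browser, os, language), each defaulting to "Unknown"."""
    m = LANG_RE.search(agent)
    return (first_match(BROWSER_RULES, agent), first_match(OS_RULES, agent),
            m.group(1) if m else "Unknown")
```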
Schema ID: www-user_session
Base Schema: www
Module: Lire::Extensions::WWW::UserSessionSchema
Required Fields: time, client_host
Timestamp Field: session_start
This is a derived schema for the WWW service which represents user sessions. User sessions track the path of users through the web site. Users are identified by their IP address and their user agent information. This is not a foolproof method. For one thing, it clearly fails for users who share a homogeneous environment and browse from behind a proxy server.
A possible enhancement would be to use tracking information from a cookie.
A session represents all the consecutive requests made by a user. The session ends once 30 minutes have passed without any request from that user.
Fields in the Schema
Type: string
Defaults: -
This field contains an arbitrary session identifier.
Type: timestamp
Defaults: 0
The time at which the session started.
Type: timestamp
Defaults: 0
The time of the last request in the session.
Type: duration
Defaults: 0
The time elapsed between the first and last requests.
Type: int
Defaults: 0
The number of pages requested by the user in this session. (This excludes requests ending in .png, .jpg, .jpeg, .gif and .css.)
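The page_count exclusion rule can be expressed as a suffix test on each requested URL. This sketch assumes the test ignores the query string and letter case, which the description does not specify. The second value returned stands in for the total-request field that follows, whose name is not shown here.

```python
# Suffixes that do not count as pages, as listed in the page_count description.
EXCLUDED_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".css")

def is_page(requested_page):
    """True when a request counts towards page_count."""
    # Assumption: ignore the query string and compare case-insensitively.
    path = requested_page.split("?", 1)[0].lower()
    return not path.endswith(EXCLUDED_SUFFIXES)

def page_and_request_counts(requests):
    """Return (page_count, total request count) for one session's URLs."""
    return (sum(1 for r in requests if is_page(r)), len(requests))
```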
Type: int
Defaults: 0
The number of requests made by the user in this session.
Type: filename
Defaults: -
The first page requested by the user. (See page_count for exclusions.)
Type: filename
Defaults: -
The 2nd page requested by the user.
Type: filename
Defaults: -
The 3rd page requested by the user.
Type: filename
Defaults: -
The 4th page requested by the user.
Type: filename
Defaults: -
The 5th page requested by the user.
Type: filename
Defaults: -
The last page requested by the user.
Type: bool
Defaults: -
Whether this session was completed. A completed session is one for which we know for sure that any further request from the user would have started a new session. Concretely, all requests made in the last 30 minutes of the period covered by the log file belong to uncompleted sessions.
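The completed flag follows directly from the 30-minute rule: a session can only be known to be over if the log covers more than 30 minutes after its last request. A minimal sketch, assuming times in epoch seconds and that exactly 30 minutes of silence is not yet enough to close a session; log_end is a hypothetical parameter holding the end of the period covered by the log file.

```python
SESSION_TIMEOUT = 30 * 60  # seconds

def is_completed(session_end, log_end):
    """True when the log extends more than 30 minutes past the session's last request."""
    return log_end - session_end > SESSION_TIMEOUT
```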
Type: int
Defaults: 0
This starts at 1 for the user's first session in the log file and is incremented for each new session that the same user starts in that log file.
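The per-user session number can be computed by visiting sessions in order of their start time and counting per user. A sketch assuming each session is given as a hypothetical (user_key, session_start) pair, where user_key would be the client host and user agent; the numbers are returned in the input order.

```python
from collections import defaultdict

def number_sessions(sessions):
    """Return the per-user session number for each (user_key, start) pair."""
    seen = defaultdict(int)
    numbers = []
    # Visit sessions chronologically so numbering follows start time.
    for idx in sorted(range(len(sessions)), key=lambda i: sessions[i][1]):
        seen[sessions[idx][0]] += 1
        numbers.append((idx, seen[sessions[idx][0]]))
    numbers.sort()  # restore the caller's order
    return [n for _, n in numbers]
```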