Continuous online extraction of HTTP traces from packet traces
Introduction
To improve the performance of the network and of network protocols, it is important to characterize the
dominant applications [4, 8, 9, 12, 19, 22, 23]. Only by utilizing data about all events initiated by the
Web (including TCP and HTTP events) can one hope to understand the chain of performance problems that
current Web users face. Given the popularity of the Web, it is crucial to understand how its usage relates to
the performance of the network, the servers, and the clients. Such comprehensive information is only available
via packet monitoring. Unfortunately, extracting HTTP information from packet sniffer data is non-trivial
due to the huge volume of data, the line speed of the monitored links, the need for continuous monitoring,
and the need to preserve privacy. These needs translate into requirements for online processing and online
extraction of the relevant data, the topic of this paper.
The software described in this paper runs on the PacketScope monitor developed by AT&T Labs [1].
The PacketScope is deployed at several different locations within AT&T WorldNet, a production IP network,
and at AT&T Labs-Research. One PacketScope monitors T3 backbone links, another may monitor traffic
generated by a large set of modems on an FDDI ring or traffic on other FDDI rings, and a third monitors
traffic between AT&T Labs-Research and the Internet. First deployed in Spring 1997, the software has run
without interruption for weeks at a time, collecting and reconstructing detailed logs of millions of Web
downloads with a worst-case packet loss below 0.3%.
The rest of this paper is organized as follows. Section 2 discusses the advantages of packet sniffing and
Section 3 outlines some of the difficulties of extracting HTTP data from packet traces. The overall software
architecture is described in Section 4. Our solution is presented in Section 5 and finally Section 6 briefly
summarizes some of the lessons learned.
Strength of packet monitoring
There are many ways of gaining access to information about user accesses to the Web:
from users running modified Web browsers;
from Web content providers logging information about which data is retrieved from their Web servers;
from Web proxies logging information about which data is requested by their users;
from the wire via packet monitoring.
While each of these methods has its advantages, most have severe limitations regarding the detail of
information that can be logged. Distributing modified Web browsers to a representative sample of consumers
and having them agree to monitor their browsing behavior is problematic, especially since Microsoft Internet
Explorer and Netscape's browser became more popular than Mosaic and Lynx.
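Of the sources listed above, only monitoring on the wire exposes both the TCP events and the full HTTP
headers of a transfer. As a rough sketch (not the extraction code described in this paper; the sample payload
and helper name are purely illustrative), the following C fragment pulls the request line and Host header out
of an already reassembled client-to-server payload:

#include <stdio.h>
#include <string.h>

/* Sketch: given the client->server bytes of one HTTP request (already
 * reassembled from TCP segments), print the request line and Host header.
 * Real wire data is far messier; see the lessons in the Summary. */
static void log_request(const char *payload)
{
    const char *eol = strstr(payload, "\r\n");
    if (eol == NULL)
        return;                                 /* no complete request line yet */
    printf("request: %.*s\n", (int)(eol - payload), payload);

    const char *host = strstr(payload, "\r\nHost:");
    if (host != NULL) {
        host += 7;                              /* skip "\r\nHost:" */
        while (*host == ' ')
            host++;
        const char *end = strstr(host, "\r\n");
        if (end != NULL)
            printf("host: %.*s\n", (int)(end - host), host);
    }
}

int main(void)
{
    log_request("GET /index.html HTTP/1.0\r\nHost: www.example.com\r\n\r\n");
    return 0;
}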
Packet Monitoring Software
The hardware and software design of the monitoring system was driven by the desire to gather continuous
traces without downtime on a high-speed transmission medium. The monitor should be deployable even on
backbone links. Due to asymmetric routing, which is common in today's Internet, a backbone link may see
the packets of only one direction of a TCP connection.
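As a minimal sketch of the capture side (not the actual PacketScope software; the interface name fddi0,
the snap length, and the filter expression are assumptions for illustration), a libpcap-based monitor might
open the link, restrict capture to Web traffic, and run an endless capture loop:

#include <pcap.h>
#include <stdio.h>
#include <stdlib.h>

static void handle_packet(u_char *user, const struct pcap_pkthdr *hdr,
                          const u_char *bytes)
{
    /* With asymmetric routing, a backbone link may carry only one direction
     * of a connection, so the extractor cannot assume it will ever see the
     * matching request or response packets. */
    (void)user; (void)bytes;
    printf("%ld.%06ld captured %u of %u bytes\n",
           (long)hdr->ts.tv_sec, (long)hdr->ts.tv_usec,
           hdr->caplen, hdr->len);
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    struct bpf_program prog;

    /* Open the monitored interface: 1500-byte snaplen, promiscuous mode. */
    pcap_t *pc = pcap_open_live("fddi0", 1500, 1, 1000, errbuf);
    if (pc == NULL) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return EXIT_FAILURE;
    }

    /* Restrict capture to Web traffic. */
    if (pcap_compile(pc, &prog, "tcp port 80", 1, 0) == -1 ||
        pcap_setfilter(pc, &prog) == -1) {
        fprintf(stderr, "filter: %s\n", pcap_geterr(pc));
        return EXIT_FAILURE;
    }

    /* Run until interrupted: continuous, online processing. */
    pcap_loop(pc, -1, handle_packet, NULL);
    pcap_close(pc);
    return EXIT_SUCCESS;
}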
Hardware design: The AT&T PacketScope [1] consists of standard hardware components: a DEC Alpha
400 MHz workstation with an 8 GB RAID disk array and a 7-tape DLT tape robot. For more details on the
hardware architecture see Figure 1. Several security precautions have been taken, including using no IP
addresses and using read-only device drivers. The DEC Alpha platform was chosen because of the kernel
performance optimizations for packet sniffing by Mogul and Ramakrishnan [20].
Summary
The most important lesson is: expect the unexpected. It is crucial to avoid assumptions about how well-
behaved the clients, the servers, or the network might be. They aren't. Other common lessons from
the implementation include: don't try to do too much processing in the time-critical steps of the logfile
extraction; simplify wherever sensible and reasonable; reduce memory use and disk I/O.
But in the end the most crucial lesson was to never expect a perfect logfile. There will always be one
more exception or one more misbehaved client/server. Therefore, the analysis program should test every
assumption the data has to satisfy and should eliminate any data that violates it. If one spends enough care
looking into the possible reasons for exceptions, the number of requests discarded by this step is small.
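As a hedged sketch of that last lesson (the record format and the particular checks are invented for
illustration, not taken from the paper), such an assumption-testing pass might read one reconstructed
request per line, verify every property the later analysis relies on, and count and discard any record
that fails:

#include <stdio.h>
#include <stdlib.h>

/* Sketch of an assumption-checking pass over a reconstructed logfile.
 * Assumed record format (one request per line, invented for illustration):
 *   <request timestamp> <response timestamp> <status code> <bytes>
 * Records violating any assumption are discarded rather than repaired. */
int main(void)
{
    char line[4096];
    unsigned long kept = 0, dropped = 0;

    while (fgets(line, sizeof line, stdin) != NULL) {
        double t_req, t_rsp;
        long status, bytes;

        if (sscanf(line, "%lf %lf %ld %ld", &t_req, &t_rsp, &status, &bytes) != 4
            || t_rsp < t_req                /* response cannot precede request */
            || status < 100 || status > 599 /* valid HTTP status codes only    */
            || bytes < 0) {                 /* negative sizes indicate bugs    */
            dropped++;
            continue;
        }
        kept++;
        fputs(line, stdout);                /* record satisfies all assumptions */
    }

    fprintf(stderr, "kept %lu records, discarded %lu\n", kept, dropped);
    return 0;
}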