04-12-2012, 12:47 PM
ProgME: Towards Programmable Network MEasurement
1ProgME.pdf (Size: 2.07 MB / Downloads: 42)
ABSTRACT
Traffic measurements provide critical input for a wide range
of network management applications, including traffic engineering,
accounting, and security analysis. Existing measurement
tools collect traffic statistics based on some predetermined,
inflexible concept of “flows”. They do not have
sufficient built-in intelligence to understand the application
requirements or adapt to the traffic conditions. Consequently,
they have limited scalability with respect to the number of
flows and the heterogeneity of monitoring applications.
We present ProgME, a Programmable MEasurement architecture
based on a novel concept of flowset – arbitrary
set of flows defined according to application requirements
and/or traffic conditions. Through a simple flowset composition
language, ProgME can incorporate application requirements,
adapt itself to circumvent the challenges on scalability
posed by the large number of flows, and achieve a
better application-perceived accuracy. ProgME can analyze
and adapt to traffic statistics in real-time. Using sequential
hypothesis test, ProgME can achieve fast and scalable heavy
hitter identification.
INTRODUCTION
Accurate measurement of network traffic is a keystone of
a wide range of network management tasks, e.g., traffic engineering,
accounting, network monitoring, and anomaly detection.
A measurement tool, be it a dedicated hardware or
software running on routers or firewalls, collects statistics of
network traffic. Management applications use these statistics
to make network control decisions, such as re-routing
traffic, charging customer, or raising alarms to administrators.
The insights gained from traffic measurement are invaluable
to administrators in making informed decisions on
network planning or operations.
ARBITRARY FLOWSET
Traditionally, network statistics are collected based on the
concept of flows. A flow f refers to a set of packets that
have the same n-tuple value in their header fields. Let H :
{H1,H2, · · · ,Hn} denote the header fields used in the flow
definition. Typical definitions of flow include the 5-tuple of
H : {prt, sip, spt, dip, dpt} or the 2-tuple of H : {sip, dip} in
which prt is the protocol field, sip and dip are the source
and destination IP address and spt and dpt are the source
and destination port, respectively. Other header fields, e.g.,
Type-of-Service (TOS), could be used as well. A flow is
often used as the base unit for traffic measurement.
Underlying Data Structure
The string representation of flowset is not an ideal form for
complicated set operations. Following the approach used to
encode firewall rules and access lists in recent studies [21, 37],
we use binary decision diagram (BDD) [7] as the underlying
data structure for flowset (referred to as flowset label
hereafter). BDD is an efficient data structure that is widely
used in formal verification and simplification of digital circuits.
A BDD is a directed acyclic graph that can compactly
and canonically represent a set of boolean expressions.
Every bit of IP header corresponds to a BDD variable.
Collect and Report Statistics
Collecting traffic statistics is a simple two-step process.
Upon receiving a packet, FQAE first uses the same hash
function to extract the bits from the packet header and
lookup the TMC for a list of candidate flowsets. Then,
FQAE compares the packet to the flowset sequentially until
a matching flowset is found.
During the measurement process, FQAE performs trafficaware
optimization by sorting the order of candidates in
the TMC based on the number of packets observed earlier
(TrafficSort). Note that this seemingly simple optimization
is possible only because FQAE make flowsets fully disjoint.
If flowsets have non-empty intersections, finding the optimal
order is NP-complete, and one will have to resolve to
heuristics, as some have attempted in the context of packet
filtering [1, 18].
Based on the statistics collected for each sub-queries, answering
user queries requires a simple aggregation. Note
that the fundamental difference here, as compared to perflow
statistics, is that sub-queries are generated according
to user queries and we expect the number to be significantly
smaller than the number of flows in traffic.
RELATEDWORK
Online aggregation [2, 22] has received considerable attention
in the database community. A typical example is to find
the sum or average of a large number of objects. Instead of
running through a large number of objects and return an
accurate result after a long latency, such systems use statistical
methods to provide running (online) estimation so
that users can decide in real time whether to continue. If a
database of flow/packet records has been built, such a system
can be adapted to query a database of flow records. The
proposed flowset composition language (FCL) can be used
for efficient specification of user queries, and FQAE can be
used for aggregation on the database side.
There are several work on producing traffic summary or
identifying hierarchical heavy hitters. Aguri [8] is a traffic
profiler that aggregates small flow records (both temporal
and spatial) until the aggregated weight is larger than a certain
threshold. Autofocus [14, 32] is an traffic analysis and
visualization tool that finds both uni-dimensional or multidimensional
clusters from traffic trace or flow records.
CONCLUSION
In this paper, we presented ProgME, a framework for programmable
network measurement. The core idea of ProgME
is to collect traffic statistics based on a novel and versatile
concept of flowset i.e., arbitrary set of flows, instead of the
traditional inflexible concept of flow. The core of ProgME
is a flowset-based query answer engine (FQAE), which can
be programmed by users and applications via the proposed
flowset composition language. Knowledge about user requirements
offers measurement tools a fresh perspective and
enables them to adapt itself by collecting statistics according
to the tasks at hand. We further extended ProgME with an
adaptive multi-resolution tiling (MRT) algorithm that can
iteratively re-program itself to identify heavy hitters. We
show that ProgME, being a versatile tool, can adapt to different
measurement tasks. We believe ProgME is a useful
addition to the arsenal of measurement tools.