28-09-2012, 12:12 PM
A Lock-Free, Cache-Efficient Multi-Core Synchronization Mechanism for Line-Rate Network Traffic Monitoring
A Lock-Free, Cache-Efficient.pdf (Size: 248.96 KB / Downloads: 43)
Abstract
Line-rate data traffic monitoring in high-speed
networks is essential for network management. To satisfy the
line-rate requirement, one can leverage multi-core architectures
to parallelize traffic monitoring so as to improve information
processing capabilities over traditional uni-processor architec-
tures. Nevertheless, realizing the full potential of multi-core
architectures still needs substantial work, especially in the
face of the ever-increasing volume and complexity of network
traffic. This paper addresses the issue through the design
of a lock-free, cache-efficient synchronization mechanism that
serves as a basic building block for a general class of multi-
threaded, multi-core traffic monitoring applications. We embed
the synchronization mechanism into MCRingBuffer, a multi-
core shared ring buffer that provides fast data accesses among
threads running in different cores. MCRingBuffer allows con-
current lock-free data accesses and improves the cache locality
of accessing the control variables that are used for thread
synchronization.
INTRODUCTION
Monitoring the behavior of data traffic in communication
networks is essential for a variety of network management
applications such as accounting, resource provisioning, failure
diagnosis, and intrusion detection. One crucial requirement
of network traffic monitoring is to process packets at
line rate, meaning that the packet processing speed must
keep up with the bandwidth of the communication link
where data traffic is monitored. However, as the volume
of network traffic continuously surges, so does the link
bandwidth, typically to Gigabit-per-second scales. Also,
as networking applications are getting more diversified,
data traffic contains more sophisticated information that
further complicates the monitoring process. Conventional
uni-processor architectures may no longer support line-rate
traffic monitoring, and this motivates the need of more
advanced architectures.
MCRINGBUFFER
We describe MCRingBuffer, a shared ring buffer tailored
for multi-core architectures. We start by describing an existing
concurrent lock-free ring buffer that we use as a building
block for MCRingBuffer. Then we present MCRingBuffer,
including its design, sketch proof of correctness, and evaluation
results.
Background
Ring buffer design, also known as the producer/consumer
problem, is a classical problem in many introductory operating
systems textbooks. In the problem, we have a bounded
buffer with a fixed number of slots. A producer inserts
elements to the buffer only when the buffer is not full, while
a consumer extracts elements from the buffer only when the
buffer is not empty. In addition, the first-in-first-out (FIFO)
property needs to be guaranteed, meaning that the elements
extracted by the consumer appear in the same order as the
elements inserted by the producer.
Design
MCRingBuffer is built upon BasicRingBuffer to support
concurrent lock-free accesses, and enhances BasicRing-
Buffer with a key objective to improve the cache locality
of thread synchronization. In particular, MCRingBuffer is
a software-based data structure without any hardware synchronization
primitives. MCRingBuffer de-couples the data
and control operations as in BasicRingBuffer, so that it
can transfer data elements of general data types. Also, the
producer and consumer threads do not need to specifically
schedule their insert and extract operations to maximize the
performance gain of MCRingBuffer.
We make several assumptions for MCRingBuffer, while
these assumptions are inherited from BasicRingBuffer. First,
the single-producer/single-consumer model is used. Also,
reading and writing the shared control variables read and
write, which are of integer type, are indivisible, atomic
operations. In general, atomic operations on 32-bit integers
are guaranteed in modern 32/64-bit CPUs such as Intel (see
[5], Chapter 8). Furthermore, the memory model satisfies
sequential consistency [11], in which the order of operations
executed by all threads preserves the order of operations
specified by each individual thread.
RELATED WORK
In Section II-A2, we review different lock-free ring
buffers proposed in the literature. Here, we focus on reviewing
existing parallel network traffic monitoring schemes.
Kruegel et al. [7] propose a parallel, stateful intrusion
detection architecture that partitions and processes traffic on
different PCs (called sensors). Foschini et al. [1] extend [7]
with a parallel matching algorithm for stateful signatures.
Weaver and Sommer [19] propose an intrusion detection
system using a cluster of end hosts. Paxson et al. [12]
sketch a network monitoring system tailored for multi-core
architectures, and their main focus is on how to dispatch
packets to the analysis threads on different cores. Wang et
al. [18] apply the variant of FastForward [2] and show that
multi-core-based processing can boost the packet processing
speed to Gigabit levels. Similar work of multi-core-based
traffic monitoring is also found in [13], [20], whose major
focus is on the algorithmic design, such as flow classification
[13] and load balancing [20]. In contrast, our work on
MCRingBuffer mainly addresses a shared data structure that
minimizes the cost of inter-core (or inter-thread) communication
and applies to general multi-threaded, multi-core
network traffic monitoring systems.
CONCLUSIONS AND FUTURE WORK
We propose a basic primitive that enables line-rate network
traffic monitoring. We present MCRingBuffer, a lockfree,
cache-efficient ring buffer that achieves efficient thread
synchronization in multi-core architectures. MCRingBuffer
improves the cache locality of thread synchronization via:
(i) cache-line protection and (ii) batch updates of control
variables. Also, MCRingBuffer is a software-based data
structure that supports generic data types of elements.