TCP Offload Engines
As network interconnect speeds advance to Gigabit
Ethernet and 10 Gigabit Ethernet, host processors can
become a bottleneck in high-speed computing—often
requiring more CPU cycles to process the TCP/IP protocol
stack than the business-critical applications they are running.
As network speed increases, so does the performance
degradation incurred by the corresponding increase in
TCP/IP overhead. The performance degradation problem
can be particularly severe in Internet SCSI (iSCSI)–based
applications, which use IP to transfer storage block I/O data
over the network.
By carrying SCSI commands over IP networks, iSCSI
facilitates both intranet data transfers and long-distance
storage management. To improve data-transfer performance
over IP networks, the TCP Offload Engine (TOE) model can
relieve the host CPU from the overhead of processing
TCP/IP. TOEs allow the operating system (OS) to move
all TCP/IP traffic to specialized hardware on the network
adapter while leaving TCP/IP control decisions to the host
server.
TCP/IP helps ensure reliable, in-order data delivery
Currently the de facto standard for internetwork data
transmission, the TCP/IP protocol suite is used to
transmit information over local area networks (LANs),
wide area networks (WANs), and the Internet. TCP/IP
processes can be conceptualized as layers in a hierarchical
stack; each layer builds upon the layer below it, providing
additional functionality. The layers most relevant to TOEs
are the IP layer and the TCP layer (see Figure 1).
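The layering idea can be sketched in a few lines of Python. The header contents below are simplified placeholders, not the real TCP or IP wire formats; the function names and field values are illustrative assumptions only.

```python
# Illustrative sketch of TCP/IP layering: each layer wraps the data
# handed down by the layer above it. Headers here are simplified
# text placeholders, not real TCP or IP wire formats.

def tcp_wrap(payload: bytes, src_port: int, dst_port: int) -> bytes:
    # TCP layer: prepend a simplified header carrying port numbers
    header = f"TCP[{src_port}->{dst_port}]".encode()
    return header + payload

def ip_wrap(segment: bytes, src_ip: str, dst_ip: str) -> bytes:
    # IP layer: prepend a simplified header carrying addresses
    header = f"IP[{src_ip}->{dst_ip}]".encode()
    return header + segment

app_data = b"GET /index.html"
segment = tcp_wrap(app_data, 49152, 80)             # TCP builds on application data
packet = ip_wrap(segment, "10.0.0.1", "10.0.0.2")   # IP builds on the TCP segment
print(packet)
```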
Large send offload
Large send offload (LSO), also known as TCP segmentation offload
(TSO), frees the OS from the task of segmenting the application’s
transmit data into MTU-size chunks. Using LSO, TCP can transmit
a chunk of data larger than the MTU size to the network adapter.
The adapter driver then divides the data into MTU-size chunks and
uses the prototype TCP and IP headers of the send buffer to create
TCP/IP headers for each packet in preparation for transmission.
LSO can be a highly effective way to scale performance
across multiple Gigabit Ethernet links, although it delivers its benefits
only under certain conditions. The LSO technique is most efficient when transferring
large messages. Also, because LSO is a stateless offload, it yields
performance benefits only for traffic being sent; it offers no improvements
for traffic being received. Although LSO can reduce CPU utilization
by approximately half, this benefit can be realized only if
the receiver’s TCP window size is set to 64 KB. LSO has little effect
on interrupt processing because it is a transmit-only offload.
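The segmentation step that LSO moves into the adapter driver can be sketched as follows. This is a minimal model, not a driver implementation: a real driver also recomputes checksums and copies many more header fields, and all names and values here are illustrative assumptions.

```python
# Simplified sketch of LSO-style segmentation: split one large send
# buffer into MTU-size chunks and stamp each chunk with a copy of the
# prototype header, advancing the TCP sequence number per chunk.
# Real drivers also fix up checksums, flags, and IP identification.

MTU_PAYLOAD = 1460  # payload per 1500-byte MTU, minus 20-byte IP + 20-byte TCP headers

def segment_large_send(data: bytes, proto_header: dict) -> list:
    packets = []
    seq = proto_header["seq"]
    for off in range(0, len(data), MTU_PAYLOAD):
        chunk = data[off:off + MTU_PAYLOAD]
        hdr = dict(proto_header, seq=seq)  # per-packet copy of the prototype header
        packets.append((hdr, chunk))
        seq += len(chunk)
    return packets

packets = segment_large_send(b"x" * 64000, {"seq": 1000, "src": 49152, "dst": 80})
print(len(packets))           # 64000 bytes / 1460 per packet -> 44 packets
print(packets[-1][0]["seq"])  # last chunk starts at offset 43 * 1460, so 1000 + 62780
```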
CPU interrupt processing
An application that generates a write to a remote host over a network
triggers a series of interrupts as the data is segmented into packets
and the incoming acknowledgments are processed. Handling each
interrupt creates a significant amount of context switching—a type
of multitasking that directs the focus of the host CPU from one
process to another—in this case, from the current application process
to the OS kernel and back again. Although interrupt-processing
aggregation techniques can help reduce the overhead, they do not
reduce the event processing required to send packets. Additionally,
every data transfer generates a series of data copies from the application
data buffers to the system buffers, and from the system buffers
to the network adapters.
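As a rough back-of-the-envelope illustration of this per-write event load, the sketch below counts segments, ACK interrupts, and data copies for a single socket write. The counts are simplified assumptions for illustration, not measurements; real stacks vary widely.

```python
import math

MSS = 1460  # assumed TCP payload per 1500-byte Ethernet frame

def events_for_write(write_size: int, acks_per_segments: int = 2) -> dict:
    """Rough, simplified estimate of host events for one socket write.

    Assumes one ACK interrupt per `acks_per_segments` segments
    (delayed ACKs) and two data copies per segment: application
    buffer -> system buffer -> network adapter. Illustrative only.
    """
    segments = math.ceil(write_size / MSS)
    acks = math.ceil(segments / acks_per_segments)
    copies = 2 * segments
    return {"segments": segments, "ack_interrupts": acks, "data_copies": copies}

print(events_for_write(64 * 1024))  # one 64 KB write already costs dozens of events
```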
High-speed networks such as Gigabit Ethernet compel host
CPUs to keep up with much larger packet rates. At full Gigabit
Ethernet speed with 1500-byte packets, the host OS stack would need
to process more than 83,000 packets per second, or a packet every
12 microseconds. Smaller
packets put an even greater burden on the host CPU. TOE processing
can enable a dramatic reduction in network transaction
load. Using TOEs, the host CPU can process an entire application
I/O transaction with one interrupt. Therefore, applications working
with data sizes that are multiples of network packet sizes will
benefit the most from TOEs. CPU interrupt processing can be reduced
from thousands of interrupts to one or two per I/O transaction.
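The packet-rate figure cited above can be checked with simple arithmetic, assuming 1500-byte frames on a 1 Gbps link and ignoring Ethernet preamble and inter-frame gap:

```python
# Verify the ~83,000 packets-per-second figure for Gigabit Ethernet.
link_bps = 1_000_000_000   # Gigabit Ethernet line rate, bits per second
frame_bytes = 1500         # MTU-size packet

packets_per_second = link_bps / (frame_bytes * 8)
microseconds_per_packet = 1_000_000 / packets_per_second

print(round(packets_per_second))       # ~83,333 packets per second
print(round(microseconds_per_packet))  # a packet every ~12 microseconds
```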
TOEs provide options for optimal performance or flexibility
Administrators can implement TOEs in several ways, as best suits
performance and flexibility requirements. Both processor-based and
chip-based methods exist. The processor-based approach provides
the flexibility to add new features and use widely available components,
while chip-based techniques offer excellent performance
at a low cost. In addition, some TOE implementations offload processing
completely while others do so partially.