A Dynamic Performance-Based Flow Control Method for High-Speed Data Transfer
Abstract
New types of specialized network applications are being created that need to be able to transmit large amounts of data
across dedicated network links. TCP fails to be a suitable method of bulk data transfer in many of these applications, giving rise to new
classes of protocols designed to circumvent TCP’s shortcomings. It is typical in these high-performance applications, however, that the
system hardware is simply incapable of saturating the bandwidths supported by the network infrastructure. When the bottleneck for
data transfer occurs in the system itself and not in the network, it is critical that the protocol scales gracefully to prevent buffer overflow
and packet loss. It is therefore necessary to build a high-speed protocol adaptive to the performance of each system by including a
dynamic performance-based flow control. This paper develops such a protocol, Performance Adaptive UDP (henceforth PA-UDP),
which aims to dynamically and autonomously maximize performance under different systems. A mathematical model and related
algorithms are proposed to describe the theoretical basis behind effective buffer and CPU management. A novel delay-based rate-throttling
model is also demonstrated to be very accurate under diverse system latencies. Based on these models, we implemented a
prototype under Linux, and the experimental results demonstrate that PA-UDP outperforms other existing high-speed protocols on
commodity hardware in terms of throughput, packet loss, and CPU utilization. PA-UDP is efficient not only for high-speed research
networks, but also for reliable high-performance bulk data transfer over dedicated local area networks where congestion and fairness
are typically not a concern.
INTRODUCTION
A certain class of next-generation science applications
needs to be able to transfer increasingly large amounts
of data between remote locations. Toward this goal, several
new dedicated networks with bandwidths upward of
10 Gbps have emerged to facilitate bulk data transfers.
Such networks include UltraScience Net (USN) [1], CHEETAH
[2], OSCARS [3], User-Controlled Light Paths
(UCLPs) [4], Enlightened [5], Dynamic Resource Allocation
via GMPLS Optical Networks (DRAGONs) [6], Japanese
Gigabit Network II [7], Bandwidth on Demand (BoD) on the
Geant2 network [8], Hybrid Optical and Packet Infrastructure
(HOPI) [9], Bandwidth Brokers [10], and others.
The goal of our work is to present a protocol that can
maximally utilize the bandwidth of these private links
through a novel performance-based system flow control. As
multigigabit speeds become more pervasive in dedicated
LANs and WANs, and as hard drives remain relatively
stagnant in read and write speeds, it becomes increasingly
important to address these issues within the data transfer
protocol. We demonstrate a mathematical basis for the
control algorithms we use, and we implement and benchmark
our method against other commonly used applications
and protocols. A new protocol is necessary,
unfortunately, because TCP, the de facto standard of
network communication, has proven unsuitable for
high-speed bulk transfer. Owing to several assumptions
made during its design, TCP is difficult to configure to
saturate the bandwidth of these links.
TCP SOLUTIONS
As mentioned in Section 1, the congestion window provided
by TCP can make it impossible to saturate link bandwidth
under certain conditions. In the example pertaining to (1),
one obvious speed boost would be to increase the congestion
window beyond one packet. Assuming a no-loss link, a
window size of n packets would allow for 12.5n Mbps
throughput. On real networks, however, it turns out that the
Bandwidth-Delay Product (BDP) of the network is integral to
the window size. As the name suggests, the BDP is simply the
product of the bandwidth of the channel multiplied by the
end-to-end delay of the hosts. In a sense, this is the amount of
data present “on the line” at any given moment. A 10 Gbps
channel with an RTT of 10 ms would need approximately a
12.5 Megabyte buffer on either end, because at any given
time, 12.5 Megabytes would be on the line that potentially
would need to be resent due to errors in the line or packet loss
at the receiving end.
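To make the arithmetic concrete, the following minimal C sketch reproduces the buffer-size calculation from the example above (the 10 Gbps and 10 ms figures are from the text; the code itself is only illustrative):

    #include <stdio.h>

    /* Bandwidth-Delay Product for the example above:
     * a 10 Gbps link with a 10 ms round-trip time. */
    int main(void)
    {
        double bandwidth_bps = 10e9;  /* 10 Gbps channel */
        double rtt_s = 0.010;         /* 10 ms round-trip time */

        /* bits in flight on the line, divided by 8 to get bytes */
        double bdp_bytes = bandwidth_bps * rtt_s / 8.0;

        printf("BDP = %.1f MB\n", bdp_bytes / 1e6);  /* prints 12.5 MB */
        return 0;
    }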
HIGH-SPEED RELIABLE UDP
High-speed Reliable UDP protocols include SABUL/UDT
[16], [17], RBUDP [18], Tsunami [19], and Hurricane [39],
among others [20].
UDP-based protocols generally follow a similar structure:
UDP is used for bulk data transfer and TCP is used marginally
for control mechanisms. Most high-speed reliable UDP
protocols use delay-based rate control to remove the need
for congestion windows. This control scheme allows a host to
statically set the rate and undoes the throughput-limiting
stairstep effects of AIMD. Furthermore, reliable delivery is
ensured with either delayed, selective, or negative acknowledgments
of packets. Negative acknowledgments are optimal
in cases where packet loss is minimal. If there is little loss,
acknowledging only lost packets will incur the least amount
of synchronous communication between the hosts. A simple
packet numbering scheme and application-level logic can
provide in-order, reliable delivery of data. Finally, reliable
UDP is positioned at the application level, which allows users
to explore more customized approaches to suit the type of
transfer, whether it is disk-to-disk, memory-to-disk, or any
combination thereof.
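As a rough illustration of delay-based rate control, the C sketch below paces datagrams by sleeping for the inter-packet gap implied by a target rate. It is deliberately naive: it ignores the time consumed by the send call itself, which an accurate throttling model must account for. The function name and parameters are ours, not PA-UDP's:

    #include <time.h>

    /* Sleep between datagrams so that packet_bytes / gap
     * approximates the target rate in bits per second. */
    static void pace(long packet_bytes, double rate_bps)
    {
        double gap_s = (packet_bytes * 8.0) / rate_bps;  /* seconds per packet */
        struct timespec ts;
        ts.tv_sec  = (time_t)gap_s;
        ts.tv_nsec = (long)((gap_s - (double)ts.tv_sec) * 1e9);
        nanosleep(&ts, NULL);  /* called once after each send */
    }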
GOALS FOR HIGH-SPEED BULK TRANSFER
Ideally, we would want a high-performing protocol suitable
for a variety of high-speed, high-latency networks without
much configuration necessary at the user level. Furthermore,
we would like to see good performance on many
types of hardware, including commodity hardware and
disk systems. Understanding the interplay between these
algorithms and the host properties is crucial.
On high-speed, high-latency, congestion-free networks, a
protocol should strive to accomplish two goals: to maximize
goodput by minimizing synchronous, latency-bound communication
and to maximize the data rate according to the
receiver’s capacity. (Here, we define goodput as the
throughput of usable data, discounting any protocol headers
or transport overhead [21].)
Latency-bound communication is one of the primary
problems of TCP due to the positive acknowledgment
congestion window mechanism. As previous solutions have
shown, asynchronous communication is the key to achieving
maximum goodput. When UDP is used in tandem with
TCP, UDP packets can be sent asynchronously, allowing the
synchronous TCP component to do its job without limiting
the overall bandwidth.
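As a hint of how this combination looks in code, the C sketch below polls the TCP control socket with a zero timeout, so the sender can check for control traffic between asynchronous UDP sends without ever blocking; the helper is an assumption of ours, not the paper's code:

    #include <sys/select.h>

    /* Return nonzero if the TCP control socket has data waiting.
     * The zero timeout makes this a poll, not a blocking wait. */
    static int control_pending(int tcp_fd)
    {
        fd_set rfds;
        struct timeval tv = { 0, 0 };
        FD_ZERO(&rfds);
        FD_SET(tcp_fd, &rfds);
        return select(tcp_fd + 1, &rfds, NULL, NULL, &tv) > 0;
    }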
The Data Sender
Depending on the application, the sender may be locked into
the same kinds of performance-limiting factors as the
receiver. For disk-to-disk transfers, if the disk read rate is
slower than the bandwidth of the channel, the host must rely
on buffers preallocated before the transfer begins. This is virtually
the same relationship as seen in (6). Unfortunately, if the
bottleneck occurs at this point, nothing can be done but to
improve the host’s disk performance. Unlike the receiver,
however, CPU latency and kernel buffers are less crucial to
performance and disk read speeds are almost universally
faster than disk write speeds. Therefore, if buffers of
comparable size are used on both ends, the burden will always
be on the receiver to keep up with the sender and not vice
versa. Note that this only applies to disk-to-disk transfers.
Data Flow and Structures
A loose description of data flow and important data
structures for both the sender and receiver is shown in
Figs. 8 and 9. The sender sends data through the UDP
socket, which is asynchronous, while periodically probing
the TCP socket for control and retransmission requests. A
buffer is maintained, so the sender does not have to
reread from disk when a retransmitted packet is needed.
Alternatively, when the data are generated on the fly, such as
from sensors or other nonreproducible events, a buffer may be
crucial to the integrity of the received data.
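One way to realize such a retransmission buffer is a fixed-size ring indexed by sequence number, so a retransmit request never forces a disk reread. This is a minimal sketch with illustrative sizes and names, not PA-UDP's actual structures:

    #include <string.h>

    #define RING_SLOTS 4096
    #define PAYLOAD    1400

    struct slot { unsigned long seq; char data[PAYLOAD]; };
    static struct slot ring[RING_SLOTS];

    /* Remember a sent packet for possible retransmission. */
    static void remember(unsigned long seq, const char *buf, size_t len)
    {
        struct slot *s = &ring[seq % RING_SLOTS];
        s->seq = seq;
        memcpy(s->data, buf, len < PAYLOAD ? len : PAYLOAD);
    }

    /* Return the payload for seq, or NULL if it was overwritten. */
    static const char *lookup(unsigned long seq)
    {
        struct slot *s = &ring[seq % RING_SLOTS];
        return s->seq == seq ? s->data : NULL;
    }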
At the receiver end, as shown in Fig. 9, there are six threads.
Threads provide easily attainable parallelism and, crucially,
hide latencies. Furthermore, the use of threading
to achieve periodicity of independent functions simplifies the
system code. As the Recv thread receives packets, two Disk
threads write them to disk in parallel. Asynchronously, the
Rexmt thread sends retransmit requests, and the Rate control
thread profiles and sends the current optimum sending rate
to the sender. The File processing thread ensures that the data
are in the correct order once the transfer is over.
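The thread layout just described might be spawned along the following lines; the bodies are stubs and the names are ours, not the paper's implementation:

    #include <pthread.h>

    void *recv_thread(void *a)  { (void)a; return NULL; } /* read UDP datagrams into buffers  */
    void *disk_thread(void *a)  { (void)a; return NULL; } /* drain buffers to disk in parallel */
    void *rexmt_thread(void *a) { (void)a; return NULL; } /* send retransmit requests over TCP */
    void *rate_thread(void *a)  { (void)a; return NULL; } /* profile and report the optimal rate */
    void *file_thread(void *a)  { (void)a; return NULL; } /* reorder data once the transfer ends */

    int start_receiver(void)
    {
        pthread_t t[6];
        pthread_create(&t[0], NULL, recv_thread,  NULL);
        pthread_create(&t[1], NULL, disk_thread,  NULL); /* two disk */
        pthread_create(&t[2], NULL, disk_thread,  NULL); /* threads  */
        pthread_create(&t[3], NULL, rexmt_thread, NULL);
        pthread_create(&t[4], NULL, rate_thread,  NULL);
        pthread_create(&t[5], NULL, file_thread,  NULL);
        for (int i = 0; i < 6; i++)
            pthread_join(t[i], NULL);
        return 0;
    }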
RELATED WORK
High-bandwidth data transport is required for large-scale
distributed scientific applications. The default implementations
of Transmission Control Protocol (TCP) [30] and User
Datagram Protocol (UDP) do not adequately meet these
requirements. While several Internet backbone links have
been upgraded to OC-192 and 10GigE WAN PHY, end users
have not experienced proportional throughput increases. The
weekly traffic measurements reported in [41] reveal that most
bulk TCP flows carrying more than 10 MB of data on
Internet2 experience throughputs of only 5 Mbps or less. For
control applications, TCP may result in jittery dynamics on
lossy links [37].
Currently, there are two approaches to transport protocol
design: TCP enhancements and UDP-based transport with
non-Additive Increase Multiplicative Decrease (AIMD) control.
In recent years, many changes to TCP have been
introduced to improve its performance for high-speed networks
[29]. Efforts by Kelly have resulted in a TCP variant
called Scalable TCP [32]. High-Speed TCP Low Priority
(HSTCP-LP) is a TCP-LP version with an aggressive window
increase policy targeted toward high-bandwidth and
long-distance networks [33].
CONCLUSIONS
The protocol based on the ideas in this paper has shown that
transfer protocols designed for high-speed networks should
not only rely on good theoretical performance but also be
intimately tied to the system hardware on which they run.
Thus, a high-performance protocol should adapt to different
environments to ensure maximum performance, and transfer
rates should be set appropriately to proactively curb packet
loss. If this relationship is properly understood, optimal
transfer rates can be achieved over high-speed, high-latency
networks at all times without excessive amounts of user
customization and parameter guesswork.
In addition to low packet loss and high throughput, PA-UDP
has been shown to be computationally efficient in terms of
processing power per throughput. The adaptive nature of
PA-UDP shows that it can scale computationally, given
different hardware constraints. PA-UDP was tested against
many other high-speed reliable UDP protocols, and also
against BBCP, a high-speed TCP-based tool. Among all
protocols tested, PA-UDP consistently outperformed the
other protocols in CPU utilization efficiency.