21-06-2012, 05:31 PM
A Machine Learning Approach to TCP Throughput Prediction
INTRODUCTION
The availability of multiple paths between sources and receivers
enabled by content distribution, multihoming, and
overlay or virtual networks creates a need to
select the “best” path for a particular data transfer. A common
starting point for this problem is to define “best” in terms of the
throughput that can be achieved over a particular path between
two end-hosts for a given sized TCP transfer. In this case, the
fundamental challenge is to develop a technique that provides
an accurate TCP throughput forecast for arbitrary and possibly
highly dynamic end-to-end paths.
Prior work on the problem of TCP throughput prediction has
largely fallen into two categories: those that investigate formula-based
approaches and those that investigate history-based approaches.
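To make the two categories concrete, here is a minimal Python sketch that is not taken from this work. The formula-based side uses the well-known square-root model of Mathis et al., which estimates steady-state TCP throughput from MSS, RTT and loss rate. The history-based side uses an exponentially weighted moving average of past transfers on the same path. The function names, the smoothing factor `alpha = 0.3` and all input values are illustrative assumptions.

```python
import math

def mathis_throughput(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Formula-based estimate in bytes/s from the square-root model.

    throughput ~= (MSS / RTT) * sqrt(3/2) / sqrt(p)
    Assumes steady-state congestion avoidance with loss rate p > 0;
    it ignores timeouts, slow start, and receiver-window limits.
    """
    if loss_rate <= 0:
        raise ValueError("the square-root model needs a nonzero loss rate")
    return (mss_bytes / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)

def ewma_forecast(history: list[float], alpha: float = 0.3) -> float:
    """History-based estimate: exponentially weighted moving average of
    past throughputs on the same path, oldest sample first."""
    if not history:
        raise ValueError("need at least one past measurement")
    estimate = history[0]
    for sample in history[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate

# Hypothetical inputs: 1460-byte MSS, 100 ms RTT, 1% loss.
print(f"formula-based: {mathis_throughput(1460, 0.1, 0.01) * 8 / 1e6:.2f} Mb/s")
# Hypothetical past transfers on one path, in Mb/s.
print(f"history-based: {ewma_forecast([10.0, 12.0, 9.0, 11.0]):.2f} Mb/s")
```

The formula needs fresh path measurements (RTT, loss) but no history. The moving average needs prior transfers on the path but no model of TCP.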
A MULTIVARIATE MACHINE LEARNING TOOL
The main hypothesis of this work is that history-based TCP
throughput prediction can be improved by incorporating measurements
of end-to-end path properties. The task of throughput
prediction can be formulated as a regression problem, i.e., predicting
a real-valued number based on multiple real-valued
input features. Each file transfer is represented by a feature
vector $\mathbf{x} \in \mathbb{R}^d$ of dimension $d$. Each dimension is an observed
feature, e.g., the file size or proximal measurements of path
properties such as queuing delay, loss, and available bandwidth.
Given $\mathbf{x}$, we want to predict the throughput $y \in \mathbb{R}$. This
is achieved by training a regression function $f: \mathbb{R}^d \to \mathbb{R}$
and applying $f$ to $\mathbf{x}$. The function $f$ is trained using training
data, i.e., historical file transfers with known features and the
corresponding measured throughput.
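The regression formulation can be sketched in plain Python. The sketch trains a function on historical transfers and then applies it to a new feature vector. It assumes ordinary least squares (solved through the normal equations) as a simple stand-in regressor, which is not the regression method of this work. The feature order, the helper names and the training data are hypothetical. The targets are generated from made-up "true" weights, not from measured transfers.

```python
from typing import List, Sequence

def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y ~ w0 + w1*x1 + ... + wd*xd by ordinary least squares.

    Solves the normal equations (A^T A) w = A^T y with Gauss-Jordan
    elimination, where A is X with a leading column of ones (intercept).
    """
    A = [[1.0, *row] for row in X]
    n_cols = len(A[0])
    # Augmented system [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(n_cols)]
        + [sum(r[i] * t for r, t in zip(A, y))]
        for i in range(n_cols)
    ]
    for col in range(n_cols):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n_cols), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; cannot solve")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n_cols):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][-1] for i in range(n_cols)]

def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Apply the trained regression function to a feature vector x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

# Hypothetical feature order:
# [file size (MB), queuing delay (ms), loss (%), available bandwidth (Mb/s)]
X_train = [
    [4, 5, 0.2, 40],
    [8, 5, 0.2, 40],
    [4, 12, 0.2, 40],
    [4, 5, 0.6, 40],
    [4, 5, 0.2, 70],
    [2, 9, 0.1, 20],
]
true_w = [2.0, 0.1, -1.5, -4.0, 0.8]  # synthetic ground truth, NOT measured data
y_train = [predict(true_w, x) for x in X_train]  # throughput in Mb/s

w = fit_linear_regression(X_train, y_train)
forecast = predict(w, [6, 7, 0.3, 50])
print(f"predicted throughput: {forecast:.2f} Mb/s")
```

Least squares assumes throughput is a linear function of the features, which real TCP transfers generally are not. A nonlinear regressor could replace it behind the same fit-then-predict interface.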
The Laboratory Experimental Environment
The laboratory testbed used in our experiments consisted of commodity end-hosts connected
to a dumbbell-like topology of Cisco GSR 12000 routers.
Both measurement and background traffic were generated and
received by the end-hosts. Traffic flowed from the sending
hosts on separate paths via Gigabit Ethernet to separate Cisco
GSRs (hop B in the figure) where it was forwarded on OC12
(622 Mb/s) links. This configuration was created in order to
accommodate a precision passive measurement system. Traffic
from the OC12 links was then multiplexed onto a single OC3
(155 Mb/s) link (hop C in the figure), which formed the bottleneck
where congestion took place.
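The capacity arithmetic behind the bottleneck can be checked in a few lines. This minimal Python sketch takes the nominal line rates from the description and ignores SONET framing and TCP/IP header overhead. It also assumes a single flow has the whole bottleneck to itself. The dictionary, the helper names and the 8 MB file size are hypothetical.

```python
# Nominal line rates (Mb/s) of the links a laboratory transfer crosses,
# taken from the testbed description; SONET and TCP/IP overheads ignored.
path_links_mbps = {
    "Gigabit Ethernet": 1000.0,
    "OC12": 622.0,
    "OC3": 155.0,
}

def bottleneck(links: dict[str, float]) -> tuple[str, float]:
    """Return the slowest link on the path: it caps end-to-end throughput."""
    name = min(links, key=links.__getitem__)
    return name, links[name]

def min_transfer_time_s(file_bytes: float, capacity_mbps: float) -> float:
    """Lower bound on transfer time if one flow had the whole bottleneck."""
    return file_bytes * 8 / (capacity_mbps * 1e6)

name, rate = bottleneck(path_links_mbps)
print(f"bottleneck: {name} at {rate:.0f} Mb/s")
print(f"one OC12 can offer {path_links_mbps['OC12'] / rate:.1f}x what the OC3 carries")
# Hypothetical 8 MB (8e6-byte) file:
print(f"8 MB needs at least {min_transfer_time_s(8e6, rate):.3f} s")
```

A single OC12 can already deliver about four times the OC3 rate. Once traffic from the OC12 links is multiplexed at hop C, queues build there, which is why that hop is where congestion took place.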
The RON Experimental Environment
The RON wide area experiments were conducted in January
2007 over 18 paths between seven different node locations. Two
nodes were in Europe (in Amsterdam, The Netherlands, and
London, U.K.), and the remainder were located at universities
in the continental United States (Cornell University, Ithaca,
NY; University of Maryland, College Park; University of New
Mexico, Albuquerque; New York University, New York; and
University of Utah, Salt Lake City). Of the 18 paths, two are
trans-European, nine are trans-Atlantic, and seven are transcontinental
U.S. The RON testbed has a significantly larger number
of available nodes and paths, but two considerations limited the
number of nodes and paths that we could use.