22-11-2012, 06:29 PM
Minimizing File Download Time in Stochastic Peer-to-Peer Networks
Minimizing File Download.pdf (Size: 313.99 KB / Downloads: 83)
Abstract
The peer-to-peer (P2P) file-sharing applications are
becoming increasingly popular and account for more than 70%
of the Internet’s bandwidth usage. Measurement studies show
that a typical download of a file can take from minutes up to
several hours depending on the level of network congestion or the
service capacity fluctuation. In this paper, we consider two major
factors that have significant impact on average download time,
namely, the spatial heterogeneity of service capacities in different
source peers and the temporal fluctuation in service capacity of
a single source peer. We point out that the common approach of
analyzing the average download time based on average service
capacity is fundamentally flawed. We rigorously prove that both
spatial heterogeneity and temporal correlations in service capacity
increase the average download time in P2P networks and then
analyze a simple, distributed algorithm to effectively remove these
negative factors, thus minimizing the average download time. We
show through analysis and simulations that it outperforms most
of other algorithms currently used in practice under various
network configurations.
INTRODUCTION
Peer-to-peer (P2P) technology is heavily used for content
distribution applications. The early model for content distribution
is a centralized one, in which the service provider simply
sets up a server and every user downloads files from it. In
this type of network architecture (server-client), many users
have to compete for limited resources in terms of bottleneck
bandwidth or processing power of a single server. As a result,
each user may receive very poor performance. From a single
user’s perspective, the duration of a download session, or the
download time for that individual user is the most often used
performance metric.
P2P technology tries to solve the issue of scalability by
making the system distributed. Each computer (peer) in the
network can act as both a server and a client at the same
time. When a peer completes downloading some files from
the network, it can become a server to service other peers
in the network. It is obvious that as time goes on, the service
capacity of the entire network will increase due to the increase
in the number of servicing peers. With this increasing service
capacity, theoretical studies have shown that the average download
time for each user in the network is much shorter than
that of a centralized network architecture in ideal cases [2],
[3]. In other words, users of a P2P network should enjoy much
faster downloads.
Our Contribution
The examples in Section I-A give us a motivation to seek
methods that can reduce the download time of each individual
user. The main contribution of this paper is to show that the
predicted value given in (1) is actually achievable without
requiring any global information, regardless of the distribution
of service capacities and correlations in a P2P network.
In this paper, we first characterize the relationship between
the heterogeneity in service capacity and the average download
time for each user, and show that the degree of diversity in
service capacities has negative impact on the average download
time. After we formally define the download time over
a stochastic capacity process, we prove that the correlations
in the capacity make the average download time much larger
than the commonly accepted value F/c, where c is the average
capacity of the source peer. It is thus obvious that the average
download time will be reduced if there exists a (possibly distributed)
algorithm that can efficiently eliminate the negative
impact of both the heterogeneity in service capacities over
different source peers and the correlations in time of a given
source peer.
BACKGROUND
In this section we briefly describe the characteristics of
the service capacity that a single user receives from the
network from the user’s perspective. Specifically, we consider
the heterogeneity of service capacities over different network
paths and the stochastic fluctuation of the capacity over time
for a given source peer.
Heterogeneity of Service Capacity
In a P2P network, just like any other network, the service
capacities from different source peers are different. There
are many reasons for this heterogeneity. On each peer side,
physical connection speeds at different peers vary over a wide
range [22] (e.g., DSL, Cable, T1, etc). Also, it is reasonable
to assume that most peers in a typical P2P network are just
personal computers, whose processing powers are also widely
different. The limitation in the processing power can limit how
fast a peer can service others and hence limits the service
capacity.
On the network side, peers are geographically located over
a large area and each logical connection consists of multiple
hops. The distance between two peers and the number of hops
surely affect its round-trip-time (RTT), which in turns affects
the throughput due to the TCP congestion control. Moreover,
in a typical P2P network, this information is usually ‘hidden’
when a user simply gets a list of available source peers that
have contents the user is looking for.
MINIMIZING AVERAGE DOWNLOAD TIME OVER
STOCHASTIC CHANNELS
Intuitively, if a downloader relies on a single source peer
for its entire download, it risks making an unlucky choice
of a slow source resulting in a long download. Since the
service capacity of each source peer is different and fluctuates
over time, utilizing different source peers either simultaneously
(parallel downloading) or sequentially within one download
session would be a good idea to diversify the risk. Parallel
downloading improves the performance by reducing the file
size over the ‘worst’ source peer and also may increase the
service capacity one receives from the network by utilizing
‘unused’ capacities of other source peers. If a downloader
utilizes one source peer at a time, switching around seems
to be a good strategy to avoid the ’bad’ source peer. Now,
the question is, “What is the criterion for switching, i.e., is it
chunk-based or time-based?” In this section we will analyze
the performance of (i) parallel downloading, (ii) random
chunk-based switching, and (iii) random time-based (periodic)
switching.