08-06-2013, 04:05 PM
Minimizing File Download Time in Stochastic Peer-to-Peer Networks
ABSTRACT
Peer-to-peer (P2P) file-sharing applications are becoming increasingly popular and account for more than 70% of the Internet’s bandwidth usage. Measurement studies show that a typical download of a file can take from minutes up to several hours, depending on the level of network congestion and the fluctuation of service capacity. In this paper, we consider two major factors that have a significant impact on the average download time: the spatial heterogeneity of service capacities across different source peers and the temporal fluctuation in the service capacity of a single source peer. We point out that the common approach of analyzing the average download time based on average service capacity is fundamentally flawed. We rigorously prove that both spatial heterogeneity and temporal correlations in service capacity increase the average download time in P2P networks, and then analyze a simple, distributed algorithm that effectively removes these negative factors, thus minimizing the average download time. We show through analysis and simulations that it outperforms most other algorithms currently used in practice under various network configurations.
INTRODUCTION
PEER-TO-PEER (P2P) technology is heavily used for content distribution applications. The early model for content distribution was a centralized one, in which the service provider simply sets up a server and every user downloads files from it. In this type of network architecture (server-client), many users have to compete for limited resources, such as the bottleneck bandwidth or processing power of a single server. As a result, each user may receive very poor performance. From a single user’s perspective, the duration of a download session, i.e., the download time for that individual user, is the most often used performance metric. By using P2P technology, users can get a much faster download experience.
SCOPE OF THE PROJECT
P2P technology tries to solve the issue of scalability by making the system distributed. Each computer (peer) in the network can act as both a server and a client at the same time. When a peer completes downloading some files from the network, it can become a server to service other peers in the network. It is obvious that as time goes on, the service capacity of the entire network will increase due to the growing number of servicing peers. With this increasing service capacity, the average download time for each user in the network is much shorter than under a centralized network architecture. In other words, users of a P2P network should enjoy much faster downloads.
INTRODUCTION TO THE PROJECT
Due to the distributed nature of the P2P network, searching for and locating data of interest in the network has been an important issue in the project. In reality, data searching contributes only a very small portion of a download session, while most of the delay is caused by actually transferring the file from source peers. Thus, if we want to minimize the download time for each user, reducing the actual file transfer time makes a more noticeable difference. We have therefore focused on reducing the total download duration.
There are two major issues in this approach. One is that global information about the peers in the network is required. The other is that the analysis is based on averaged quantities, e.g., the average capacities of all possible source peers in the network. The approach of using the average service capacity to analyze the average download time, however, is fundamentally flawed.
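The flaw can be illustrated with a toy calculation (the numbers below are hypothetical, chosen only for illustration): since download time is file size divided by capacity, Jensen's inequality gives E[F/C] ≥ F/E[C], so an analysis that averages the capacity first systematically underestimates the true average download time.

```python
# Hypothetical numbers: a 100 MB file and a source whose capacity is
# equally likely to be 50 or 150 KB/s (so the average capacity is 100 KB/s).
FILE_SIZE = 100_000       # KB
capacities = [50, 150]    # KB/s, each with probability 1/2

# Naive analysis: divide by the *average* capacity E[C].
naive_estimate = FILE_SIZE / (sum(capacities) / len(capacities))

# Actual average download time: average of F/C over the capacity states.
true_average = sum(FILE_SIZE / c for c in capacities) / len(capacities)

print(f"naive  F/E[C] : {naive_estimate:.0f} s")   # 1000 s
print(f"actual E[F/C] : {true_average:.0f} s")     # 1333 s, by Jensen's inequality
```

The gap (1000 s vs. roughly 1333 s) is exactly what the abstract refers to: averaging the service capacity hides the penalty paid during slow periods.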
HETEROGENEITY OF SERVICE CAPACITY
In a P2P network, just like any other network, the service capacities of different source peers are different. There are many reasons for this heterogeneity. On the peer side, physical connection speeds at different peers vary over a wide range (e.g., DSL, Cable, T1, etc.). Also, it is reasonable to assume that most peers in a typical P2P network are just personal computers, whose processing powers also differ widely. The limitation in processing power can limit how fast a peer can service others and hence limits the service capacity. On the network side, peers are geographically located over a large area and each logical connection consists of multiple hops. The distance between two peers and the number of hops clearly affect their round-trip time (RTT), which in turn affects the throughput due to TCP congestion control. Moreover, in a typical P2P network, this information is usually “hidden”: a user simply gets a list of available source peers that have the content the user is looking for. Note that the aforementioned factors do not change over the timescale of a typical P2P session (days or a week). Hence, we assume that those factors mainly determine the long-term average of the service capacity of a given source peer.
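A small sketch of the spatial effect (capacity values are made up for illustration): a set of heterogeneous source peers is compared against a homogeneous network with the same total capacity, under the simple policy that a downloader picks one source uniformly at random and stays with it.

```python
FILE_SIZE = 100_000                     # KB, illustrative file size
hetero = [20, 50, 100, 400, 430]        # KB/s; mean capacity = 200 KB/s
homo = [200] * 5                        # same total capacity, no heterogeneity

def avg_download_time(caps):
    """Average download time when the downloader picks one source
    uniformly at random and stays with it for the whole file."""
    return sum(FILE_SIZE / c for c in caps) / len(caps)

print(f"homogeneous   : {avg_download_time(homo):.0f} s")    # 500 s
print(f"heterogeneous : {avg_download_time(hetero):.0f} s")  # far larger
```

Both networks have the same average capacity, yet the heterogeneous one is much slower on average because the unlucky users stuck on the 20 KB/s peer dominate the mean.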
Correlations in Service Capacity
While the long-term average of the service capacity is mainly governed by topological parameters, the actual service capacity during a typical session is never constant; it always fluctuates over time. There are many factors causing this fluctuation. First, the number of connections a source peer allows changes over time, which creates a fluctuation in the service capacity seen by each user. Second, other applications running on a source peer (usually a PC), such as online games, may consume CPU cycles and reduce the amount of capacity it can offer. Third, temporary congestion at any link in the network can also reduce the service capacity of all users utilizing that link.
Fig. 1 shows a typical fluctuation of available end-to-end capacity. The time scale for the figure in the measurement study is on the order of minutes. We know from measurement studies that a typical file download session can last from minutes to hours for a file size of several megabytes. This implies that the service capacity over the timescale of one download session is stochastic and correlated. In Fig. 1, the short-term variations in the capacity are mainly due to the window size fluctuation in TCP, while the long-term variations are due to network congestion, changes in workload or in the number of connecting users at the source peer, etc. The long-term fluctuation typically lasts over a longer time scale, say, a few minutes up to several hours.
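The effect of correlation alone can be sketched with a two-state capacity process (all parameters below are hypothetical): a "sticky" Markov chain that stays in its fast or slow state for a long time is compared against i.i.d. samples with the same 50/50 marginal distribution.

```python
import random

random.seed(1)
FILE_SIZE = 60_000      # KB, illustrative
FAST, SLOW = 200, 20    # KB/s, the two service-capacity states
P_STAY = 0.999          # correlated chain: the current state persists

def download_time(p_stay):
    """Seconds to finish a download when the per-second capacity follows a
    two-state Markov chain. p_stay=0.5 makes the samples i.i.d. with the
    same 50/50 marginal, so only the correlation differs."""
    state = random.choice([FAST, SLOW])
    remaining, t = FILE_SIZE, 0
    while remaining > 0:
        remaining -= state
        t += 1
        if random.random() > p_stay:
            state = SLOW if state == FAST else FAST
    return t

trials = 1000
iid = sum(download_time(0.5) for _ in range(trials)) / trials
corr = sum(download_time(P_STAY) for _ in range(trials)) / trials
print(f"i.i.d. capacity     : {iid:.0f} s")
print(f"correlated capacity : {corr:.0f} s")  # noticeably larger on average
```

With i.i.d. fluctuations the fast and slow seconds average out within a session; with strong correlation, a session that starts in the slow state tends to stay slow for most of the download, inflating the mean.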
MINIMIZING AVERAGE DOWNLOAD TIME OVER STOCHASTIC CHANNELS
Intuitively, if a downloader relies on a single source peer for its entire download, it risks making an unlucky choice of a slow source, resulting in a long download. Since the service capacity of each source peer is different and fluctuates over time, utilizing different source peers either simultaneously (parallel downloading) or sequentially within one download session is a good way to diversify the risk. Parallel downloading improves performance by reducing the amount of data fetched from the “worst” source peer, and it may also increase the total service capacity one receives from the network by utilizing “unused” capacities of other source peers. If a downloader utilizes one source peer at a time, switching around seems to be a good strategy to avoid a “bad” source peer.
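As a sketch of why switching helps (peer capacities and slot length are hypothetical), the simulation below compares staying with one randomly chosen source against re-picking a source uniformly at random every fixed time slot. With time-based switching, the rate received in each slot averages toward E[C], so the mean download time approaches F/E[C] instead of the larger F·E[1/C].

```python
import random

random.seed(7)
FILE_SIZE = 100_000                    # KB, illustrative
CAPACITIES = [20, 50, 100, 400, 430]   # KB/s, heterogeneous source peers
SLOT = 10                              # seconds per slot for the switcher

def permanent():
    """Stay with one randomly chosen source for the whole download."""
    c = random.choice(CAPACITIES)
    return FILE_SIZE / c

def time_based_switching():
    """Every SLOT seconds, re-pick a source uniformly at random."""
    remaining, t = FILE_SIZE, 0
    while remaining > 0:
        c = random.choice(CAPACITIES)
        remaining -= c * SLOT
        t += SLOT
    return t

trials = 5000
perm = sum(permanent() for _ in range(trials)) / trials
switch = sum(time_based_switching() for _ in range(trials)) / trials
print(f"permanent connection : {perm:.0f} s")    # approx. F * E[1/C]
print(f"time-based switching : {switch:.0f} s")  # approx. F / E[C]
```

The switcher spends the same *time*, not the same *bytes*, on each source, so the slow 20 KB/s peer no longer dominates the average; this is the intuition behind switching-based strategies for removing heterogeneity.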
Different strategies have different impacts on the average download time of each peer, which may result in different system dynamics as well, e.g., how fast a downloader can start to contribute (become a source peer) or how fast a peer leaves the system after finishing its download. If no peer leaves the system and all peers are willing to share after they complete their download (either the entire file or a chunk), the aggregate service capacity in the system keeps increasing as time goes on, because the number of source peers continuously grows.
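This self-scaling effect can be sketched with a minimal flash-crowd model (all parameters are hypothetical): one new downloader arrives per time step, every source contributes a fixed rate shared evenly among active downloaders, and a downloader that finishes stays on as a source.

```python
FILE_SIZE = 500      # KB, kept small so completions occur within the run
C = 100              # KB/s contributed by each source peer
sources = 1          # one initial seed
progress = []        # KB received so far by each active downloader

capacity_history = []
for t in range(200):
    progress.append(0.0)                        # a new downloader joins
    rate = sources * C / len(progress)          # even share of total capacity
    progress = [p + rate for p in progress]
    finished = sum(1 for p in progress if p >= FILE_SIZE)
    progress = [p for p in progress if p < FILE_SIZE]
    sources += finished                         # completed peers become sources
    capacity_history.append(sources * C)

print(f"aggregate capacity: {capacity_history[0]:.0f} -> "
      f"{capacity_history[-1]:.0f} KB/s")
```

Once the first downloaders finish and begin serving, each completion raises the shared rate, which accelerates the next completions; the aggregate capacity at the end of the run is strictly larger than at the start.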