06-02-2013, 04:34 PM
Seminar Report on Web Caching
Abstract
The World Wide Web can be viewed as a large distributed information system that provides access to shared data objects. As one of the most popular applications on the Internet, the Web has grown exponentially in size, resulting in network congestion and server overloading. Because the Web hosts documents of a diverse nature, everyone can find information to their liking, but this scorching rate of growth has placed a heavy load on Internet communication channels. This situation is likely to continue for the foreseeable future as more and more information services move onto the Web. The result is increased access latency for users. Access latency, the time interval between a user issuing a request and its actual completion, can arise for several reasons: servers can be flooded with more requests than they can handle optimally, and the network path between the user and the server can become congested due to increased traffic on any of its constituent links.
Caching popular objects close to users offers a way to combat this latency by letting users fetch data from a nearby cache rather than from a distant server. Web caching is recognized as one of the most effective schemes for alleviating service bottlenecks and reducing network traffic, thereby minimizing user access latency.
In this seminar I would like to cover why we need web caching, the desirable properties of a web cache, cache resolution, cache replacement policies, cache coherency, and the caching of dynamic data.
Introduction
The World Wide Web can be viewed as a large distributed information system that provides access to shared data objects. As one of the most popular applications on the Internet, the Web has grown exponentially in size, resulting in network congestion and server overloading. Web caching is recognized as one of the most effective schemes for alleviating service bottlenecks and reducing network traffic, thereby minimizing user access latency.
Hierarchical caching architecture
One approach to coordinating caches in the same system is to set up a caching hierarchy. With hierarchical caching, caches are placed at multiple levels of the network. For simplicity, assume there are four levels of caches: bottom, institutional, regional, and national. At the bottom level of the hierarchy are the client/browser caches. When a request is not satisfied by the client cache, it is redirected to the institutional cache. If the document is not found at the institutional level, the request is forwarded to the regional-level cache, which in turn forwards unsatisfied requests to the national-level cache. If the document is not found at any cache level, the national-level cache contacts the original server directly. When the document is found, either at a cache or at the original server, it travels down the hierarchy, leaving a copy at each of the intermediate caches along its path. Further requests for the same document travel up the caching hierarchy only until the document is hit at some cache level.
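The lookup-and-copy-back behaviour described above can be sketched as follows. This is a minimal illustration using in-memory dictionaries; the class and function names are assumptions for the example, not part of any real proxy software.

```python
class CacheLevel:
    """One level of the hierarchy (e.g. browser, institutional, ...)."""
    def __init__(self, name):
        self.name = name
        self.store = {}  # url -> document

def fetch(url, levels, origin):
    """Walk up the hierarchy from the bottom level; on a hit (or an
    origin fetch), leave a copy at every level the request missed."""
    missed = []
    for level in levels:                 # bottom -> national
        if url in level.store:
            document = level.store[url]  # hit at this level
            break
        missed.append(level)
    else:
        document = origin[url]           # missed everywhere: contact origin
    for level in missed:                 # deposit copies on the way down
        level.store[url] = document
    return document

# Usage: the four levels from the text, and a toy origin server.
levels = [CacheLevel(n) for n in ("browser", "institutional", "regional", "national")]
origin = {"/index.html": "<html>hello</html>"}
fetch("/index.html", levels, origin)   # first request travels to the origin
# A repeat request is now satisfied by the browser-level cache.
```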
Dynamic Data Caching
As mentioned before, the benefit of current Web caching schemes is limited by the fact that only a fraction of web data is cacheable. Non-cacheable data (e.g. personalized data, authenticated data, dynamically generated data) accounts for a significant percentage of the total; for example, measurement results show that 30% of user requests contain cookies. Many web sites disable caching of their documents. One main reason is that many sites carry banner ads and collect money based on the number of hits a document receives; if the document is cached, those hits cannot be counted. Another reason is to collect information about who is using the content. For these reasons, not all documents can be cached. Current dynamic data caching approaches can be classified into two categories: active cache and server accelerator.
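A proxy typically decides cacheability from response headers, using exactly the signals mentioned above (cookies, explicit server directives). The following is an illustrative heuristic only; a real proxy follows the full HTTP caching rules, which are considerably more involved.

```python
def is_cacheable(headers):
    """Rough cacheability check for a shared proxy cache, based on the
    header signals discussed in the text. Illustrative, not complete."""
    cache_control = headers.get("Cache-Control", "").lower()
    if "no-store" in cache_control or "private" in cache_control:
        return False   # server explicitly disabled shared caching
    if "Set-Cookie" in headers:
        return False   # personalized response, likely per-user
    return True

# Usage: a plain static page caches; a cookie-setting response does not.
is_cacheable({"Cache-Control": "max-age=3600"})   # cacheable
is_cacheable({"Set-Cookie": "session=42"})        # not cacheable
```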
Active Cache migrates part of the server's processing of each user request to the caching proxy in a flexible, on-demand fashion via “cache applets.” A cache applet is server-supplied code attached to a Uniform Resource Locator (URL) or a collection of URLs; the code is typically written in a platform-independent programming language such as Java. When caching a document, the proxy also fetches the corresponding cache applet. When a user request hits the cached copy and the proxy wishes to service the request, the proxy must invoke the cache applet with the user request and other information as arguments. The applet then decides what the proxy sends back to the user: it can give the proxy a new document to return, allow the proxy to use the cached copy, or instruct the proxy to forward the request to the Web server. Furthermore, the applet can deposit information in a log object, which is sent back to the server periodically and when the applet or the document is purged from the cache. It has been shown that the Active Cache scheme can yield significant network bandwidth savings at the expense of CPU cost. However, due to this significant CPU overhead, user access latency is much larger than it would be without caching dynamic objects.
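The three possible outcomes of an applet invocation can be sketched as below. The applet interface, the `serve` dispatcher, and the banner-counting applet are all hypothetical names invented for this illustration (the original Active Cache applets were Java code, not Python).

```python
# The three decisions an applet can return to the proxy, per the text.
NEW_DOCUMENT, USE_CACHED, FORWARD_TO_SERVER = range(3)

def banner_applet(request, cached_copy, log):
    """Hypothetical server-supplied applet: record the hit in the log
    object, then let the proxy serve the cached copy, unless the
    request is personalized (carries a cookie)."""
    log.append(request["url"])           # deposit hit info in the log object
    if request.get("cookie"):
        return FORWARD_TO_SERVER, None   # personalized: let the server answer
    return USE_CACHED, cached_copy

def serve(request, cached_copy, applet, log):
    """Proxy side: invoke the applet and act on its decision."""
    action, body = applet(request, cached_copy, log)
    if action == USE_CACHED:
        return cached_copy
    if action == NEW_DOCUMENT:
        return body                      # applet-generated replacement document
    return "fetched-from-server"         # stand-in for forwarding to the server

# Usage: the hit is counted even though the proxy answers from cache.
log = []
serve({"url": "/ad"}, "<banner>", banner_applet, log)
```

This is what lets a site keep its per-hit ad accounting (the log object) while still benefiting from proxy caching, at the CPU cost of running the applet on every hit.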
Conclusion
As Web services become more and more popular, users will suffer from network congestion and server overloading. Web caching is recognized as one of the most effective techniques for alleviating server bottlenecks and reducing network traffic, thereby minimizing user access latency. The main challenges in Web caching are proxy placement, dynamic data caching, and cache routing. The research frontier in Web performance improvement lies in developing efficient, scalable, robust, adaptive, and stable Web caching schemes that can be easily deployed in current and future networks.