12-11-2012, 04:12 PM
The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM
Abstract
Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale gracefully to meet the needs of large-scale Web applications, and improvements in disk capacity have far outstripped improvements in access latency and bandwidth. This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers. We believe that RAMClouds can provide durable and available storage with 100-1000x the throughput of disk-based systems and 100-1000x lower access latency. The combination of low latency and large scale will enable a new breed of data-intensive applications.
Introduction
For four decades magnetic disks have provided the primary means of storing online information in computer systems. Over that period disk technology has undergone dramatic improvements, and it has been harnessed by a variety of higher-level storage systems such as file systems and relational databases. However, the performance of disk has not improved as rapidly as its capacity, and developers are finding it increasingly difficult to scale disk-based systems to meet the needs of large-scale Web applications. Many people have proposed new approaches to disk-based storage as a solution to this problem; others have suggested replacing disks with flash memory devices. In contrast, we believe that the solution is to shift the primary locus of online data from disk to random access memory, with disk relegated to a backup/archival role.
RAMCloud Overview
RAMClouds are most likely to be used in datacenters containing large numbers of servers divided roughly into two categories: application servers, which implement application logic such as generating Web pages or enforcing business rules, and storage servers, which provide longer-term shared storage for the application servers. Traditionally the storage has consisted of files or relational databases, but in recent years a variety of new storage mechanisms have been developed to improve scalability, such as Bigtable [4] and memcached [16]. Each datacenter typically supports numerous applications, ranging from small ones using only a fraction of an application server to large-scale applications with thousands of dedicated application and storage servers.
Motivation
Application scalability
The motivation for RAMClouds comes from two sources: applications and technology. From the standpoint of applications, relational databases have been the storage system of choice for several decades but they do not scale to the level required by today's large-scale applications. Virtually every popular Web application has found that a single relational database cannot meet its throughput requirements. As the site grows it must undergo a series of massive revisions, each one introducing ad hoc techniques to scale its storage system, such as partitioning data among multiple databases. These techniques work for a while, but scalability issues return when the site reaches a new level of scale or a new feature is introduced, requiring yet more special-purpose techniques.
Caching
Historically, caching has been viewed as the answer to problems with disk latency: if most accesses are made to a small subset of the disk blocks, high performance can be achieved by keeping the most frequently accessed blocks in DRAM. In the ideal case a system with caching can offer DRAM-like performance with disk-like cost.
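As a rough illustration of how far a cache falls from this ideal case when some accesses miss, the sketch below computes the average access time as a weighted mix of DRAM and disk latency. The latency values (0.1 μs for DRAM, 10 ms for disk) and the function name are illustrative assumptions, not figures from the paper.

```python
def expected_access_time_us(hit_rate, dram_us=0.1, disk_us=10_000.0):
    """Average access time in microseconds for a cache with the given hit rate.

    dram_us and disk_us are assumed, illustrative latencies.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Hits are served from DRAM, misses fall through to disk.
    return hit_rate * dram_us + (1.0 - hit_rate) * disk_us


for hit_rate in (0.90, 0.99, 1.0):
    avg = expected_access_time_us(hit_rate)
    print(f"hit rate {hit_rate:.0%}: {avg:.1f} us on average")
```

Under these assumed latencies the miss term dominates: only a hit rate of exactly 100% yields DRAM-like average latency, which is why locality matters so much for caching.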
However, the trends in Table 2 are diluting the benefits of caching by requiring a larger and larger fraction of data to be kept in DRAM. Furthermore, some new Web applications such as Facebook appear to have little or no locality, due to complex linkages between data (e.g., friendships in Facebook). As of August 2009 about 25% of all the online data for Facebook is kept in main memory on memcached servers at any given point in time, providing a hit rate of 96.5%. When additional caches on the database servers are counted, the total amount of memory used by the storage system equals approximately 75% of the total size of the data (excluding images). Thus a RAMCloud would only increase memory usage for Facebook by about one third.
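The "about one third" figure follows directly from the 75% number: moving from memory equal to 75% of the data size to memory equal to 100% of it is a relative increase of 25/75. A minimal check, with variable names chosen here for illustration:

```python
from fractions import Fraction

# From the text: memory in use today is roughly 75% of the total data size.
current_fraction = Fraction(75, 100)
# A RAMCloud keeps all of the data in DRAM (ignoring any extra overhead).
ramcloud_fraction = Fraction(1)

relative_increase = (ramcloud_fraction - current_fraction) / current_fraction
print(relative_increase)  # 1/3
```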
Research Issues
There are numerous challenging issues that must be addressed before a practical RAMCloud system can be constructed. This section describes several of those issues, along with some possible solutions. Some of these issues, such as data model and concurrency control, are common to all distributed storage systems, so there may be solutions devised by other projects that will also work for RAMClouds; others, such as low latency RPC and data durability, may require a unique approach for RAMClouds.
Low latency RPC
Although latencies less than 10 μs have been achieved in specialized networks such as Infiniband and Myrinet, most existing datacenters use networking infrastructure based on Ethernet/IP/TCP, with typical round-trip times for remote procedure calls of 300-500 μs. We believe it is possible to reduce this to 5-10 μs, but doing so will require innovations at several levels. The greatest obstacle today is latency in the network switches.
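To make the effect of round-trip time concrete, the sketch below counts how many strictly sequential RPCs fit into a fixed time budget. The round-trip times are the ones quoted above; the 100 ms budget and the function name are assumptions made purely for illustration.

```python
def max_sequential_rpcs(budget_us: int, round_trip_us: int) -> int:
    """Number of back-to-back RPCs that complete within budget_us."""
    if round_trip_us <= 0:
        raise ValueError("round_trip_us must be positive")
    return budget_us // round_trip_us


BUDGET_US = 100_000  # assumed request budget of 100 ms, not from the paper

for rtt_us in (500, 300, 10, 5):
    count = max_sequential_rpcs(BUDGET_US, rtt_us)
    print(f"{rtt_us:>3} us round trip: {count} sequential RPCs")
```

With this assumed budget, an application limited to a few hundred dependent storage accesses at today's latencies could issue tens of thousands at the targeted 5-10 μs.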
Data model
Another issue for RAMClouds is the data model for the system. There are many possible choices, and existing systems have explored almost every imaginable combination of features. The data model includes three overall aspects. First, it defines the nature of the basic objects stored in the system. In some systems the basic objects are just variable-length “blobs” of bytes where the storage system knows nothing about their internal structure. At the other extreme, the basic objects can have specific structure enforced by the system; for example, a relational database requires each record to consist of a fixed number of fields with prespecified types such as integer, string, or date. In either case we expect most objects to be small (perhaps a few hundred bytes), containing information equivalent to a record in a database or an object in a programming language such as C++ or Java.
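The two extremes described above can be sketched as follows. This is a toy, in-memory illustration only: `BlobStore` and `UserRecord` are hypothetical names invented here, and nothing in the paper prescribes either interface.

```python
from dataclasses import dataclass
from datetime import date


class BlobStore:
    """Opaque-object extreme: the store never inspects the bytes it holds."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, value: bytes) -> None:
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("blob values must be bytes")
        self._objects[key] = bytes(value)

    def get(self, key: str) -> bytes:
        return self._objects[key]


@dataclass(frozen=True)
class UserRecord:
    """Structured extreme: a fixed set of fields with prespecified types."""

    user_id: int
    name: str
    joined: date

    def __post_init__(self):
        # Enforce the declared types, as a relational schema would.
        checks = (("user_id", int), ("name", str), ("joined", date))
        for field_name, expected in checks:
            if not isinstance(getattr(self, field_name), expected):
                raise TypeError(f"{field_name} must be {expected.__name__}")


store = BlobStore()
store.put("user:1", b"any bytes the application likes")
record = UserRecord(user_id=1, name="alice", joined=date(2009, 8, 1))
```

With blobs, all interpretation is left to the application; with typed records, the system can reject malformed data but must know the schema in advance.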
Distribution and scaling
A RAMCloud must provide the appearance of a single unified storage system even though it is actually implemented with thousands of server machines. The distribution of the system should not be reflected in the APIs it provides to application developers, and it should be possible to reconfigure the system (e.g., by adding or removing servers) without impacting running applications or involving developers. This represents one of the most important advantages of RAMCloud over most existing storage systems, where distribution and scaling must be managed explicitly by developers.
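One way to picture an API that hides distribution is a client-side facade whose `put` and `get` calls never mention servers. The sketch below is a single-process toy under stated assumptions: `UnifiedStore` is a hypothetical name, each "server" is just a dictionary, and hash-modulo placement is a placeholder chosen for brevity, since the text does not say how a RAMCloud would place or migrate data.

```python
import hashlib


class UnifiedStore:
    """Toy facade: callers see one key-value store, not individual servers."""

    def __init__(self, num_servers: int):
        if num_servers < 1:
            raise ValueError("need at least one server")
        # Each dict stands in for the DRAM of one storage server.
        self._servers = [dict() for _ in range(num_servers)]

    def _server_for(self, key: str) -> dict:
        # Stable hash so placement does not change between runs.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._servers)
        return self._servers[index]

    def put(self, key: str, value: bytes) -> None:
        self._server_for(key)[key] = value

    def get(self, key: str) -> bytes:
        return self._server_for(key)[key]

    def add_server(self) -> None:
        """Reconfigure by adding a server and re-placing every object."""
        old_servers = self._servers
        self._servers = [dict() for _ in range(len(old_servers) + 1)]
        for server in old_servers:
            for key, value in server.items():
                self.put(key, value)


cluster = UnifiedStore(num_servers=3)
for i in range(100):
    cluster.put(f"object:{i}", str(i).encode("ascii"))
cluster.add_server()  # application code below is unchanged
value = cluster.get("object:42")
```

Application code calls the same `put` and `get` before and after `add_server`. Re-placing every object on each change, as this toy does, would be far too expensive at scale; a practical design would need to move much less data.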