21-05-2012, 02:59 PM
CloudTPS: Scalable Transactions for Web Applications in the Cloud
Abstract
NoSQL Cloud data services provide scalability and high availability
properties for web applications but at the same time they sacrifice data
consistency. However, many applications cannot afford any data inconsistency.
CloudTPS is a scalable transaction manager to allow cloud
database services to execute the ACID transactions of web applications,
even in the presence of server failures and network partitions.
We implement this approach on top of the two main families of scalable
data layers: Bigtable and SimpleDB. Performance evaluation on top
of HBase (an open-source version of Bigtable) in our local cluster and
Amazon SimpleDB in the Amazon cloud shows that our system scales
linearly at least up to 40 nodes in our local cluster and 80 nodes in the
Amazon cloud.
Introduction
Cloud computing offers the vision of a virtually infinite pool of computing,
storage and networking resources where applications can be scalably
deployed [22]. In particular, NoSQL cloud database services such as Amazon SimpleDB [2] and Google Bigtable [13] offer a scalable data tier for applications in the cloud. These systems typically partition the application data to provide incremental scalability, and replicate the partitioned data to tolerate server failures.
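To make the partition-and-replicate idea concrete, here is a minimal Python sketch. It is not taken from the paper: the function name `replicas_for`, the node names, and the modulo placement rule are all assumptions chosen for brevity. Real data stores typically use range partitioning or consistent hashing so that adding a node moves only a small share of the data.

```python
import hashlib


def replicas_for(key, nodes, replication_factor=2):
    """Return the nodes assumed to store `key` (hypothetical placement rule).

    The primary is picked by hashing the key; the remaining copies go to the
    nodes that follow it in the list, wrapping around at the end.
    """
    count = min(replication_factor, len(nodes))
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    primary = int(digest, 16) % len(nodes)
    return [nodes[(primary + i) % len(nodes)] for i in range(count)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
    for item_key in ("customer:42", "order:1001"):
        print(item_key, "->", replicas_for(item_key, cluster))
```

Because each key lives on more than one node, a single server failure leaves at least one copy reachable. Keeping those copies consistent with each other is the harder problem.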
Related Work
2.1 Data Storage in the Cloud
The simplest way to store structured data in the cloud is to deploy a relational database such as MySQL or Oracle. The relational data model, typically implemented via the SQL language, provides great flexibility in accessing data. It supports sophisticated data access operations such as aggregation, range queries, join queries, etc. RDBMSs support transactions and guarantee strong data consistency. One can easily deploy a classical RDBMS such as MySQL or Oracle in the cloud and thus get support for transactional consistency. However, the features of flexible data querying and strong data consistency prevent one from partitioning data automatically, which is the key for performance scalability. These database systems rely on replication techniques and therefore do not bring extra scalability improvement compared to a non-cloud deployment [29, 41, 6]. On the other hand, a new family of cloud database services such as Google Bigtable [13], Amazon SimpleDB [2], Yahoo PNUTS [14], and Cassandra [31], uses simplified data models based on attribute-value pairs.
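As a rough illustration of an attribute-value data model, the Python sketch below stores each row as a key mapped to a set of attribute-value pairs and emulates a join in application code. The table contents and the helper names `get` and `book_with_author` are hypothetical. This is not the API of Bigtable, SimpleDB, or any other service named above.

```python
# Hypothetical tables: each maps a row key to a set of attribute-value pairs.
books = {
    "book:1": {"title": "Dune", "author_id": "author:7"},
    "book:2": {"title": "Untitled draft"},  # rows need not share the same attributes
}
authors = {
    "author:7": {"name": "Frank Herbert"},
}


def get(table, key):
    """Primary-key lookup, the basic access path in this simplified model."""
    return table.get(key)


def book_with_author(book_key):
    """Emulate a join in application code with two separate key lookups."""
    book = get(books, book_key)
    if book is None:
        return None
    author = get(authors, book.get("author_id"))
    return {"title": book["title"], "author": author["name"] if author else None}


if __name__ == "__main__":
    print(book_with_author("book:1"))
    print(book_with_author("book:2"))
```

Nothing in this sketch makes the two lookups atomic. If the rows sit on different servers and change between the calls, the application can observe an inconsistent result.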
Distributed Transactional Systems
There have been decades of research efforts in efficiently implementing distributed transactions for distributed database systems [42, 38, 39]. A number of distributed commit protocols [33, 43, 46, 21] and concurrency control mechanisms [7, 8, 9] have been proposed to maintain the ACID properties of distributed transactions. However, as distributed databases use the same relational data model as RDBMS, they also cannot partition the data automatically and thus lack scalability. On the other hand, we can apply these techniques as building blocks in designing CloudTPS. We rely on 2-Phase Commit (2PC) [33, 43] as the distributed commit protocol for ensuring Atomicity, and on timestamp-ordering [7] for concurrency control.
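The following Python sketch shows how 2PC and timestamp ordering can fit together in general. It is a simplified, single-process illustration and not the CloudTPS implementation: the `Participant` class, the `two_phase_commit` function, and the rule of rejecting any transaction older than the last committed write on an item are assumptions made for this example. Networking, logging, and failure recovery are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical data node holding some items and their last-commit timestamps."""
    data: dict = field(default_factory=dict)
    last_ts: dict = field(default_factory=dict)  # item key -> timestamp of last committed write
    staged: dict = field(default_factory=dict)   # transaction timestamp -> pending writes

    def prepare(self, ts, writes):
        # Timestamp ordering (simplified): vote "no" if a newer transaction
        # has already committed a write to any item this transaction touches.
        if any(self.last_ts.get(key, 0) > ts for key in writes):
            return False
        self.staged[ts] = dict(writes)
        return True

    def commit(self, ts):
        # Apply the writes staged during the prepare phase.
        for key, value in self.staged.pop(ts, {}).items():
            self.data[key] = value
            self.last_ts[key] = ts

    def abort(self, ts):
        # Discard staged writes without touching committed data.
        self.staged.pop(ts, None)


def two_phase_commit(ts, work):
    """Run 2PC for the transaction with timestamp `ts`.

    `work` is a list of (participant, writes) pairs. Returns True on commit.
    """
    prepared = []
    # Phase 1: collect votes; a single "no" aborts the whole transaction.
    for node, writes in work:
        if node.prepare(ts, writes):
            prepared.append(node)
        else:
            for voted_yes in prepared:
                voted_yes.abort(ts)
            return False
    # Phase 2: every participant voted "yes", so all of them commit.
    for node in prepared:
        node.commit(ts)
    return True


if __name__ == "__main__":
    node_a, node_b = Participant(), Participant()
    print(two_phase_commit(2, [(node_a, {"x": 1}), (node_b, {"y": 2})]))
    # An older transaction that touches "y" arrives late and is rejected as a whole.
    print(two_phase_commit(1, [(node_a, {"z": 9}), (node_b, {"y": 5})]))
    print(node_a.data, node_b.data)
```

The point of the sketch is atomicity: when one participant votes "no", the writes staged on the other participants are discarded too, so the transaction takes effect everywhere or nowhere.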
H-Store [47, 27] is a distributed main memory OLTP database, which executes on a cluster of shared-nothing main memory executor nodes. H-Store supports transactions accessing multiple data records with SQL semantics, implemented as predefined stored procedures written in C++. H-Store also replicates data records to tolerate machine failures. H-Store focuses on absolute system performance in terms of transaction throughput, and achieves very high performance on one executor node. However, the scalability of H-Store relies on careful data partitioning across executor nodes, such that most transactions access only one executor node. On the other hand, we prefer to focus on achieving linear scalability specifically for Web applications, such that any increase in workload can be accommodated by provisioning more servers. Also note that H-Store does not maintain persistent logs or keep any data in the non-volatile storage of either the executor nodes or any backing store. By contrast, CloudTPS checkpoints the updates.
Conclusion
Many Web applications need strong data consistency for their correct execution. However, although the high scalability and availability properties of the cloud make it a good platform to host Web content, scalable cloud database services only provide relatively weak consistency properties. This article shows how one can support ACID transactions without compromising the scalability property of the cloud for Web applications, even in the presence of server failures and network partitions.