29-11-2012, 04:40 PM
Highly Scalable, Ultra-Fast and Lots of Choices
Introduction
The main motivation to consider employing a non-relational database system for a new application
often comes from non-functional requirements on performance and scalability that your favorite relational
database system cannot support, at least not for reasonable costs.
NoSQL products promise to scale structured data storage beyond the limits of relational database
systems. While this has already proven true for quite a number of applications, the ability to serve a
huge number of concurrent users and great volumes of data comes at a price.
First of all, the NoSQL space provides a variety of very different approaches to data storage and access.
Coming from the world of relational database systems, you will quickly realize that standards
such as SQL do not exist (at least not yet). In fact, many NoSQL products have little in common
other than that they are different from relational database products.
In order to decide on a specific product, it helps a lot to understand the fundamentals of how data is
actually stored. Fortunately, there are some commonalities among groups of NoSQL databases. Although
it is difficult to clearly categorize every NoSQL database by how its data storage is structured,
choosing an actual product becomes easier once you know typical use cases for specific data
store types.
The goal of this paper is to clearly distinguish the currently existing data store types, at least in
theory. The paper assumes that you have a good understanding of relational database systems, including
properties such as transactional support, referential integrity, consistency constraints, etc.
Once you've identified a specific data store type as appropriate for your use case, you need to dig
deeper in order to choose an actual product. You will need to concern yourself with features and
properties of NoSQL products such as BASE properties, scalability and replication issues, concurrency
modes, failure modes, etc. These features are out of scope of this paper, however.
Non-Relational Data Store
A new project has been initiated. You and your team are experienced in using and running relational
database systems, but the decision on which data store to use has not been made yet.
The scope of the project encompasses a huge user base that will generate a massive amount of
data and/or a large number of concurrent requests that need to be quickly processed for a
smooth user experience. What type of data storage is appropriate in this situation?
Relational database products offer well-known, valuable solutions for all kinds of applications but
scaling a relational database beyond a certain limit (which depends on the actual product, the underlying
hardware and the available knowledge on optimization) is next to impossible.
Non-relational (i.e. NoSQL) databases offer a variety of solutions to create highly scalable applications,
but you cannot rely on your existing experience and instead need to find experienced developers and
administrators to fully benefit from any product that has not yet been employed in your organization.
Your team may be eager to learn a new technology but the introduction of a new technology
imposes a learning curve that costs time and money.
You can try to guess the amount of data and number of concurrent requests after the initial roll-out
but it is very hard to predict future functionality and usage of the new application. Reasonable
estimates may show that you would be well off sticking with a relational database for a while. But
your application would not be the first one to have virtually been knocked out by suddenly and unexpectedly
becoming very popular.
NoSQL Data Store Types
The following discussion of NoSQL data store types shows the variety among the existing
products. In reality it is not possible to clearly categorize every NoSQL product into one of the following
types. Quite a few products are hybrids that combine several concepts.
The most basic concept of non-relational data storage is the Key/Value Store. Such a database
stores loosely structured data that is accessible by key only. Several variants exist that form
groups of their own: Blob Stores are specialized to hold large binary data, Column Family Stores
persist related values closely together for better analytical capabilities across huge numbers of
columns, and Document Stores provide means to create rich domain models. Graph Databases rely
on a different concept, focusing on the relations between data entries.
Key/Value Store
Your domain model or an important part of it is rather simple, i.e. there are few interdependencies
and constraints. But the data changes often or your application generates a huge amount of it.
There are many cases where you need to store data that is not absolutely critical but you've got
plenty of it. Think about log or statistical data, current status information of an online game or similar
scenarios where small chunks of data are written often and read more or less often. Using a relational
database in such cases does not scale well because of transactional overhead or database locks.
Speed is your main concern when reading and writing data. Your application must react
quickly even at peak time. The ability to scale well is more important than data integrity and
the ability to comfortably query data.
Properties of relational database systems such as transactional support and referential integrity are
not important to you or, at least, less important than throughput but your data still needs to be
stored safely and high availability is a key issue.
Your domain model is not highly complex and your application's data has little structure but you
need at least support for some data types.
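The access pattern described above can be sketched in a few lines of Python. This is a toy in-memory model, not the API of any actual NoSQL product: the class name KeyValueStore, its methods and the key "player:42:status" are hypothetical and chosen for illustration only. It shows that values are reachable by key only and that writes simply overwrite, with no transactions or constraints involved.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key/value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # A write overwrites any previous value: no transaction, no constraint check.
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # The key is the only access path to a value.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


# Small chunks of data, written often: the status of a player in an online game.
store = KeyValueStore()
store.put("player:42:status", {"level": 7, "hp": 81})
store.put("player:42:status", {"level": 7, "hp": 64})  # frequent overwrite
status = store.get("player:42:status")
```

Note what the sketch leaves out: there is no way to ask for "all players with hp below 70" other than scanning every entry, which is the price paid for giving up comfortable querying.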
Final remarks
NoSQL databases provide a wide range of solutions when you need highly scalable data storage
([1], [2]). But there is no all-in-one solution available that promises to replace the currently dominating
relational database systems. Rather, most NoSQL databases are best suited to address specific
problems and use cases ([3]).
While the data storage type of a NoSQL product is an important decision criterion, there are more
criteria to consider. The CAP theorem allows for different tradeoffs, i.e. whether a product restricts
the consistency, availability or partition tolerance of the system. Furthermore, different
products support different clustering and replication strategies that may affect an application's general
architecture. All of these criteria (and several more) are out of scope of this paper and deserve a
paper of their own.
The term polyglot persistence ([4], [5]) expresses the fact that a single data storage may not suit all
needs of an application. Instead, an application might encompass several different approaches to
data persistence. To give an example: while you would probably keep financial records in a relational
database, catalog data that describes a variety of products may be better kept in a Document
Store. Session information of the users of a web site may be best stored in a Key/Value Store
whereas data that tracks the users' behavior for later analysis is a candidate to be kept in a Column
Family Store. Add a Blob Store for binary data such as images and a Graph Database to model and
analyze the relations between your users and you've got a very rich data persistence landscape.
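The polyglot persistence example above can be written down as a simple lookup table. The following is a minimal sketch under stated assumptions: the data category labels (such as "user_sessions") and the helper store_for are hypothetical names introduced here, and the mapping merely restates the assignments from the preceding paragraph. A real application would hide an actual client for each store behind such a lookup.

```python
from enum import Enum


class StoreType(Enum):
    RELATIONAL = "relational database"
    DOCUMENT = "Document Store"
    KEY_VALUE = "Key/Value Store"
    COLUMN_FAMILY = "Column Family Store"
    BLOB = "Blob Store"
    GRAPH = "Graph Database"


# Hypothetical category labels, mapped as in the example in the text.
STORE_FOR = {
    "financial_records": StoreType.RELATIONAL,
    "product_catalog": StoreType.DOCUMENT,
    "user_sessions": StoreType.KEY_VALUE,
    "behavior_tracking": StoreType.COLUMN_FAMILY,
    "images": StoreType.BLOB,
    "user_relations": StoreType.GRAPH,
}


def store_for(kind: str) -> StoreType:
    """Return the store type assigned to a category of application data."""
    try:
        return STORE_FOR[kind]
    except KeyError:
        # Fail loudly rather than silently falling back to a default store.
        raise ValueError(f"no store type assigned for {kind!r}") from None


session_store = store_for("user_sessions")
```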