18-10-2016, 10:25 AM
Welcome to Apache Cassandra™
The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.
Overview:
• Proven
Cassandra is in use at Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel, and over 1500 more companies that have large, active data sets.
One of the largest production deployments is Apple's, with over 75,000 nodes storing over 10 PB of data. Other large Cassandra installations include Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes, 250 TB).
• Fault Tolerant
Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
• Performant
Cassandra consistently outperforms popular NoSQL alternatives in benchmarks and real applications, primarily because of fundamental architectural choices.
• Decentralized
There are no single points of failure. There are no network bottlenecks. Every node in the cluster is identical.
• Durable
Cassandra is suitable for applications that can't afford to lose data, even when an entire data center goes down.
• You're in Control
Choose between synchronous or asynchronous replication for each update. Highly available asynchronous operations are optimized with features like Hinted Handoff and Read Repair.
• Elastic
Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
• Professionally Supported
Cassandra support contracts and services are available from third parties.
A description of the Cassandra data model
Cassandra is a partitioned row store, where rows are organized into tables with a required primary key.
The first component of a table's primary key is the partition key; within a partition, rows are clustered by the remaining columns of the PK. Other columns may be indexed independent of the PK.
This allows pervasive denormalization to "pre-build" resultsets at update time, rather than doing expensive joins across the cluster.
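As a sketch of what this looks like in CQL (the table and column names here are illustrative, not from the text above), a compound primary key separates the partition key from the clustering columns:

```cql
-- 'user_id' is the partition key: its hash determines which nodes store the row.
-- Within a partition, rows are clustered (stored sorted) by 'posted_at'.
CREATE TABLE posts (
    user_id   uuid,
    posted_at timestamp,
    content   text,
    category  text,
    PRIMARY KEY (user_id, posted_at)
);

-- A regular column may be indexed independently of the primary key:
CREATE INDEX ON posts (category);
```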
Basic Rules of Cassandra Data Modeling
Picking the right data model is the hardest part of using Cassandra. If you have a relational background, CQL will look familiar, but the way you use it can be very different. The goal of this post is to explain the basic rules you should keep in mind when designing your schema for Cassandra. If you follow these rules, you’ll get pretty good performance out of the box. Better yet, your performance should scale linearly as you add nodes to the cluster.
Non-Goals
Developers coming from a relational background usually carry over rules about relational modeling and try to apply them to Cassandra. To avoid wasting time on rules that don’t really matter with Cassandra, I want to point out some non-goals:
Minimize the Number of Writes
Writes in Cassandra aren’t free, but they’re awfully cheap. Cassandra is optimized for high write throughput, and almost all writes are equally efficient [1]. If you can perform extra writes to improve the efficiency of your read queries, it’s almost always a good tradeoff. Reads tend to be more expensive and are much more difficult to tune.
Minimize Data Duplication
Denormalization and duplication of data is a fact of life with Cassandra. Don’t be afraid of it. Disk space is generally the cheapest resource (compared to CPU, memory, disk IOPs, or network), and Cassandra is architected around that fact. In order to get the most efficient reads, you often need to duplicate data.
Besides, Cassandra doesn’t have JOINs, and you don’t really want to use those in a distributed fashion.
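For instance, since there are no JOINs, the same fact is simply written once per read path at update time. A minimal sketch, assuming two hypothetical tables keyed for different queries:

```cql
-- The same post is duplicated into two tables, one per query pattern.
-- A logged batch keeps the two denormalized writes together.
BEGIN BATCH
    INSERT INTO posts_by_user (user_id, posted_at, content)
    VALUES (62c36092-82a1-3a00-93d1-46196ee77204, '2016-10-18 10:25:00', 'hello');
    INSERT INTO posts_by_category (category, posted_at, user_id, content)
    VALUES ('news', '2016-10-18 10:25:00', 62c36092-82a1-3a00-93d1-46196ee77204, 'hello');
APPLY BATCH;
```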
Basic Goals
These are the two high-level goals for your data model:
1. Spread data evenly around the cluster
2. Minimize the number of partitions read
There are other, lesser goals to keep in mind, but these are the most important. For the most part, I will focus on the basics of achieving these two goals. There are other fancy tricks you can use, but you should know how to evaluate them first.
Rule 1: Spread Data Evenly Around the Cluster
You want every node in the cluster to have roughly the same amount of data. Cassandra makes this easy, but it’s not a given. Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. So, the key to spreading data evenly is this: pick a good primary key. I’ll explain how to do this in a bit.
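As an illustrative sketch (the schemas are hypothetical), a low-cardinality partition key concentrates data on a few nodes, while a high-cardinality one spreads it evenly:

```cql
-- Problematic: only ~200 distinct partition key values, so a handful
-- of nodes end up holding the huge partitions for populous countries.
CREATE TABLE users_by_country (
    country text,
    user_id uuid,
    name    text,
    PRIMARY KEY (country, user_id)
);

-- Better for even distribution: the hash of a unique user_id spreads
-- rows uniformly around the cluster.
CREATE TABLE users (
    user_id uuid PRIMARY KEY,
    country text,
    name    text
);
```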
Rule 2: Minimize the Number of Partitions Read
Partitions are groups of rows that share the same partition key. When you issue a read query, you want to read rows from as few partitions as possible.
Why is this important? Each partition may reside on a different node. The coordinator will generally need to issue separate commands to separate nodes for each partition you request. This adds a lot of overhead and increases the variation in latency. Furthermore, even on a single node, it’s more expensive to read from multiple partitions than from a single one due to the way rows are stored.
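To make this concrete (continuing with a hypothetical posts table), compare:

```cql
-- Reads a single partition: the coordinator only contacts the replicas
-- that own the partition for this user_id.
SELECT * FROM posts_by_user
WHERE user_id = 62c36092-82a1-3a00-93d1-46196ee77204;

-- Touches every partition in the table, hitting many nodes; latency
-- grows with cluster size and varies widely.
SELECT * FROM posts_by_user;
```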
Conflicting Rules?
If it’s good to minimize the number of partitions that you read from, why not put everything in a single big partition? You would end up violating Rule #1, which is to spread data evenly around the cluster.
The point is, these two goals often conflict, so you’ll need to try to balance them.
Model Around Your Queries
The way to minimize partition reads is to model your data to fit your queries. Don’t model around relations. Don’t model around objects. Model around your queries. Here’s how you do that:
Step 1: Determine What Queries to Support
Try to determine exactly what queries you need to support. This can include a lot of considerations that you may not think of at first. For example, you may need to think about:
• Grouping by an attribute
• Ordering by an attribute
• Filtering based on some set of conditions
• Enforcing uniqueness in the result set
• etc …
Changes to just one of these query requirements will frequently warrant a data model change for maximum efficiency.
Step 2: Try to create a table where you can satisfy your query by reading (roughly) one partition
In practice, this generally means you will use roughly one table per query pattern. If you need to support multiple query patterns, you usually need more than one table.
To put this another way, each table should pre-build the “answer” to a high-level query that you need to support. If you need different types of answers, you usually need different tables. This is how you optimize for reads.
Remember, data duplication is okay. Many of your tables may repeat the same data.
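Sketching this with two hypothetical query patterns, "posts by a given user" and "posts in a given category", each pattern gets its own table whose partition key matches the query:

```cql
-- Pre-builds the answer to: SELECT ... WHERE user_id = ?
CREATE TABLE posts_by_user (
    user_id   uuid,
    posted_at timestamp,
    content   text,
    PRIMARY KEY (user_id, posted_at)
);

-- Pre-builds the answer to: SELECT ... WHERE category = ?
-- The content column is duplicated between the two tables on purpose.
CREATE TABLE posts_by_category (
    category  text,
    posted_at timestamp,
    user_id   uuid,
    content   text,
    PRIMARY KEY (category, posted_at, user_id)
);
```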
Limitations
CQL
• No join or subquery support, and limited support for aggregation. This is by design, to force you to denormalize into partitions that can be efficiently queried from a single replica, instead of having to gather data from across the entire cluster.
• Ordering is done per-partition, and is specified at table creation time. Again, this is to enforce good application design; sorting thousands or millions of rows can be fast in development, but sorting billions in production is a bad idea.
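For example (a hypothetical time-series table), the per-partition sort order is declared with CLUSTERING ORDER BY when the table is created, and reads within a partition come back in that order with no sort step:

```cql
CREATE TABLE events_by_day (
    day     date,
    ts      timestamp,
    payload text,
    PRIMARY KEY (day, ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- Newest events first, read from a single partition:
SELECT * FROM events_by_day WHERE day = '2016-10-18' LIMIT 10;
```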
Storage engine
• All data for a single partition must fit (on disk) on a single machine in the cluster. Because partition keys alone are used to determine the nodes responsible for replicating their data, the amount of data associated with a single key has this upper bound.
• A single column value may not be larger than 2GB; in practice, "single digits of MB" is a more reasonable limit, since there is no streaming or random access of blob values.
• Collection values may not be larger than 64KB.
• The maximum number of cells (rows x columns) in a single partition is 2 billion.
Compatibility guarantees
This document describes the compatibility guarantees offered during an upgrade of Apache Cassandra. When a version is mentioned, this document assumes "tick-tock" versioning, so in X.Y, X is the major version and Y the minor one.
General Definition
When we say that upgrading from version X to version Y is supported, we always at least mean that there is a path (documented in the NEWS file if any specifics are required) for upgrading all the nodes of a cluster from X to Y in a rolling fashion and so without incurring the unavailability of the database as a whole (that is, without loss of service).
Note however that during major upgrades (3.x to 4.y) ALTER, repair, bootstrap, and decommission might be temporarily unavailable until the upgrade completes. Starting with 4.y, we plan to remove this limitation.
It is also always strongly discouraged to upgrade to any version without first testing the upgrade in a staging environment and without keeping at least a snapshot of the sstables. This is particularly ill-advised for major upgrades.
Stable vs Experimental
Every feature is considered either experimental or stable. No guarantees of any sort are provided for anything experimental, beyond a gentleman's agreement not to completely change or remove features in a minor release without serious reasons.
Minor upgrades
Upgrading a node to a newer minor version within the same major should be, from a user's point of view, virtually indistinguishable from simply restarting the node (without upgrading it). This means in particular:
• No removal/modifications of any configuration option, startup option, exposed metrics or general behavior of the Cassandra process.
• No removal of, nor syntactic or semantic change to, CQL, authentication, any existing version of the binary protocol, or Thrift.
Those guarantees should be enforced as strongly as possible. In the real world however, despite our efforts to avoid it, unfortunate backward-incompatible changes might end up in a release due to:
• an error: if such a change slips past our vigilance and testing and makes it into a release, we will fix the break as soon as possible (in a "patch" release).
• on rare occasions, fixing a bug may require a breaking change. In the hopefully very rare case where preserving the bug is considered much worse than breaking compatibility, we may make such a change in a minor release.
In both cases, we will communicate any such breaking change on the mailing list as soon as it is found.
While no features will be removed in a minor upgrade, some features may be deprecated in a minor release from time to time. See the section on deprecation for more details.
New features may and will be added, however, though only in feature releases (even-numbered ones). Such new features should not be used until the entire cluster has been upgraded to support them.
The corollary is that, provided you accept being limited to the features supported by the lowest version in the cluster, clusters with mixed versions within a major are supported.
Cassandra Integration Points
Cassandra has been integrated with a number of projects and products. This page aims to be a list of these, with a description and links. It is not meant to prefer one project over another so please keep the names in alphabetical order.