SAP HANA
Introduction
SAP HANA is a flexible, data-source-agnostic appliance that enables customers to analyze large volumes of SAP ERP data in real time, avoiding the need to materialize transformations. It is a hardware and software combination that integrates a number of SAP components, including the SAP HANA database, SAP LT (Landscape Transformation) Replication Server, SAP HANA Direct Extractor Connection (DXC) and Sybase replication technology. SAP HANA is delivered as an optimized appliance in conjunction with leading SAP hardware partners, and is SAP AG’s implementation of in-memory database technology. There are four components within the software group:
1. SAP HANA DB (or HANA DB) refers to the database technology itself.
2. SAP HANA Studio refers to the suite of modeling tools provided by SAP.
3. SAP HANA Appliance refers to HANA DB as delivered on partner-certified hardware as an appliance. It also includes the modeling tools from HANA Studio as well as the replication and data transformation tools used to move data into HANA DB.
4. SAP HANA Application Cloud refers to the cloud-based infrastructure for the delivery of applications (typically existing SAP applications rewritten to run on HANA).
Keeping data in-memory:
The capacity of main memory in servers has continuously increased over the years, while prices have dropped dramatically. Today, a single enterprise-class server can hold several terabytes of main memory. This combination of growing capacity and falling cost makes it viable to keep huge amounts of business data in memory. This section discusses the benefits and challenges of doing so.
1. TREX (Text Retrieval and Extraction) is a search engine that began in 1996 as a student project at SAP in collaboration with DFKI. TREX became a standard component of SAP NetWeaver in 2000. In-memory attributes were added in 2002 and a columnar data store in 2003, both as ways to enhance performance.
2. In 2005 SAP acquired Menlo Park-based Transact in Memory, Inc.[5] With the acquisition came P*Time, a lightweight in-memory online transaction processing (OLTP) RDBMS technology with a row-based data store.
3. MaxDB (formerly SAP DB), a relational database that came to SAP from Nixdorf via Software AG (Adabas D), was added to TREX and P*Time to provide persistence and more traditional database features such as backup.
In 2008, SAP CTO Vishal Sikka wrote about HANA: "...our teams working together with the Hasso Plattner Institute and Stanford University demonstrated how a new application architecture is possible, one that enables real-time complex analytics and aggregation, up to date with every transaction, in a way never thought possible in financial applications". In 2009 a development initiative was launched at SAP to integrate the three technologies above into a more comprehensive feature set. The resulting product was known internally and externally as NewDB until the name HANA DB was finalized in 2011.
Big data
Big data refers to datasets that exceed the capabilities of commonly used tools. While no formal size-based definition exists, such datasets typically reach terabytes (TB), petabytes (PB), or even exabytes (EB) in size. SAP has positioned HANA as its solution to big-data challenges at the low end of this scale. At launch, HANA started with 1 TB of RAM supporting up to 5 TB of uncompressed data. In late 2011, hardware with 8 TB of RAM became available, supporting up to 40 TB of uncompressed data. SAP-owned Sybase IQ, with its more mature MapReduce-like functionality, has been cited as a potentially better fit for larger datasets.
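Those RAM and data figures imply a compression ratio of roughly 5:1 (1 TB of RAM for 5 TB of data, 8 TB for 40 TB). A quick back-of-the-envelope check in Python; the function name is made up for illustration:

def required_ram_tb(uncompressed_tb, compression_ratio=5.0):
    # RAM (in TB) needed to hold a dataset in memory, assuming the
    # ~5:1 compression ratio implied by the figures above.
    return uncompressed_tb / compression_ratio

print(required_ram_tb(5))   # 1.0 -> matches the launch configuration
print(required_ram_tb(40))  # 8.0 -> matches the late-2011 hardware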
Methodology
Data persistence:
Keeping data in main memory brings up the question of what will happen in case of a loss of power. In database technology, atomicity, consistency, isolation, and durability (ACID) is a set of requirements that guarantees that database transactions are processed reliably:
A transaction has to be atomic. That is, if part of a transaction fails, the entire transaction has to fail and leave the database state unchanged.
The consistency of a database must be preserved by the transactions that it performs.
Isolation ensures that no transaction is able to interfere with another transaction.
Durability means that after a transaction has been committed it will remain committed.
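To make the atomicity guarantee concrete, here is a minimal Python sketch using the standard sqlite3 module. sqlite3 is chosen only because it ships with Python; it is not a HANA API, and the table and account names are made up for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # the connection commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'alice'")
        # Simulate a failure (e.g. power loss) before the matching credit
        # to 'bob' runs; the transfer is left half-finished.
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass

# Atomicity: the partial debit was rolled back, so balances are unchanged.
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 100, 'bob': 0}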
Compression:
Even though today’s memory capacities allow enormous amounts of data to be kept in memory, compressing the data in memory is still desirable. The goal is to compress data in a way that does not use up the performance gained, while still minimizing data movement from RAM to the processor. By using dictionaries to represent text values as integer numbers, the database can compress data significantly and thus reduce data movement, without imposing additional CPU load for decompression; in fact, this can even add to performance. Figure 2-1 illustrates this with a simplified example.
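The same idea can be shown in a few lines of Python. This is a minimal sketch of dictionary encoding, not HANA's actual implementation, and the column values are made up for illustration:

def dictionary_encode(column):
    dictionary = []   # each distinct value, stored exactly once
    codes = []        # one small integer code per row
    index = {}        # value -> code lookup
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes

def dictionary_decode(dictionary, codes):
    return [dictionary[code] for code in codes]

cities = ["Berlin", "Walldorf", "Berlin", "Berlin", "Walldorf"]
dictionary, codes = dictionary_encode(cities)
print(dictionary)  # ['Berlin', 'Walldorf']
print(codes)       # [0, 1, 0, 0, 1]
assert dictionary_decode(dictionary, codes) == cities

Because scans and comparisons can run directly on the integer codes (for example, filtering on "Berlin" becomes filtering on code 0), queries never need to decompress the column, which is why this scheme can improve performance rather than cost CPU time.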