Comparing High Level MapReduce Query Languages PPT

**seminar flower** · 06-09-2012, 03:54 PM


Abstract.

The MapReduce parallel computational model is of increasing
importance. A number of High Level Query Languages (HLQLs) have
been constructed on top of the Hadoop MapReduce realization, primarily
Pig, Hive, and JAQL. This paper makes a systematic performance
comparison of these three HLQLs, focusing on scale up, scale out and
runtime metrics. We further make a language comparison of the HLQLs
focusing on conciseness and computational power. The HLQL development
communities are engaged in the study, which has revealed the technical
bottlenecks and limitations described in this document and is influencing
their development.

Introduction

The MapReduce model proposed by Google [8] has become a key data processing
model, with a number of realizations including the open source Hadoop [3]
implementation. A number of HLQLs have been constructed on top of Hadoop
to provide more abstract query facilities than using the low-level, Java-based
Hadoop API directly. Pig [18], Hive [24], and JAQL [2] are all important HLQLs.
This paper makes a systematic investigation of the HLQLs. Specifically, we
investigate whether the HLQLs are indeed more abstract: that is, how much
shorter are the queries in each HLQL compared with direct use of the API?
What performance penalty do the HLQLs pay to provide more abstract queries?
How expressive are the HLQLs - are they relationally complete, SQL equivalent,
or even Turing complete? More precisely, the paper makes the following research
contributions with respect to Pig, Hive, and JAQL.

Hadoop

Hadoop [3] is an Apache open source MapReduce (MR) implementation that is well
suited to large data warehouses, and it has gained traction in industrial
datacentres at Yahoo, Facebook and IBM. The Hadoop software stack is packaged
with a set of complementary services and higher level abstractions over
MR. The core elements of Hadoop, however, are MapReduce, the distributed
data processing model and execution environment, and the Hadoop Distributed
Filesystem (HDFS), a distributed filesystem that runs on large clusters.
HDFS provides high throughput access to application data and is suitable for
applications that have large data sets.
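
As a rough illustration of the MapReduce processing model described above, the following Python sketch runs the map, shuffle and reduce phases in memory on a single machine. It is not Hadoop code and does not use the Hadoop API; the helper names (`map_phase`, `shuffle`, `reduce_phase`, `run_mapreduce`) and the word-count example are assumptions introduced only for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group all values that share a key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to that key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_mapreduce(records, mapper, reducer):
    """Run one complete (single-machine, in-memory) MapReduce job."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical example job: count word occurrences.
def word_count_mapper(line):
    return [(word, 1) for word in line.split()]


def word_count_reducer(word, counts):
    return sum(counts)


counts = run_mapreduce(
    ["pig hive jaql", "hive pig", "pig"],
    word_count_mapper,
    word_count_reducer,
)
print(counts)
```

In a real Hadoop deployment the input records would be read from HDFS and the map and reduce tasks would be distributed across the cluster; the sketch only shows the shape of the computation.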

High Level Query Languages

Justifications for higher level query languages over the MR paradigm are presented
in [15]. That work outlines the lack of support that MR provides for complex
N-step dataflows, which often arise in real-world data analysis scenarios. In addition,
MR provides no explicit support for multiple data sources. A
number of HLQLs have been developed on top of Hadoop, and we review Pig
[18], Hive [24], and JAQL [2] in comparison with raw MapReduce. Their relationship
to Hadoop is depicted in Figure 1. Programs written in these languages
are compiled into a sequence of MapReduce jobs, to be executed in the Hadoop
MapReduce environment.
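
To make the idea of an N-step dataflow concrete, the Python sketch below chains two MapReduce-style jobs so that the output of the first becomes the input of the second. This is only a single-machine analogy for the job sequence that an HLQL compiler produces; it is not how Pig, Hive or JAQL generate jobs, and `run_job`, `run_pipeline` and the visit data are hypothetical names and values made up for the example.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one in-memory MapReduce-style job and return (key, result) pairs."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [(key, reducer(key, values)) for key, values in groups.items()]


def run_pipeline(records, jobs):
    """Run jobs in order, feeding each job's output into the next one."""
    for mapper, reducer in jobs:
        records = run_job(records, mapper, reducer)
    return records


# Hypothetical input: (user, page) visit records.
visits = [("alice", "/a"), ("bob", "/a"), ("alice", "/b"), ("carol", "/c")]

# Job 1: count the visits made by each user.
count_visits = (
    lambda rec: [(rec[0], 1)],
    lambda user, ones: sum(ones),
)

# Job 2: group users by their visit count.
group_by_count = (
    lambda rec: [(rec[1], rec[0])],
    lambda count, users: sorted(users),
)

result = dict(run_pipeline(visits, [count_visits, group_by_count]))
print(result)
```

Written directly against a MapReduce API, each step needs its own mapper, reducer and job wiring, which is the overhead the HLQLs aim to hide behind a single query.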

HLQL Comparison

Language Design. The language design motivations are reflected by the contrasting
features of each high level query language. Hive provides HiveQL, a
declarative, SQL-like language (Listing 1.3). Pig by comparison
provides Pig Latin (Listing 1.2), a dataflow language influenced by both
the declarative style of SQL (it includes SQL-like functions) and the more
procedural style of MR (Listing 1.1). Finally, JAQL is a functional, higher-order
programming language, in which functions may be assigned to variables and
evaluated later (Listing 1.4). In contrast, Pig and Hive are strictly evaluated during
the compilation process, to identify type errors prior to runtime.
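
The higher-order style attributed to JAQL above, where a function is bound to a variable and only evaluated later, can be sketched in Python as follows. This is not JAQL syntax and makes no claim about JAQL's implementation; `make_filter`, `compose` and the sample records are assumptions chosen purely to illustrate functions being treated as values.

```python
def make_filter(predicate):
    """Return a function that keeps only the records satisfying the predicate."""
    def apply(records):
        return [r for r in records if predicate(r)]
    return apply


def compose(*steps):
    """Return a function that applies the given steps in order."""
    def pipeline(records):
        for step in steps:
            records = step(records)
        return records
    return pipeline


# Functions are bound to variables here; no data has been processed yet.
keep_large = make_filter(lambda r: r["size"] > 100)
names_only = lambda records: [r["name"] for r in records]
query = compose(keep_large, names_only)

# Hypothetical records; the query is evaluated only when it is called.
files = [
    {"name": "a.log", "size": 50},
    {"name": "b.log", "size": 500},
    {"name": "c.log", "size": 150},
]
result = query(files)
print(result)
```

Because `query` is an ordinary value, it can be passed to other functions or reused on different inputs before anything is evaluated.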
