26-10-2016, 11:27 AM
INTRODUCTION
Apache Spark
Spark is an open-source big data processing framework built around speed, ease of use, and sophisticated analytics. It has several advantages over other big data and MapReduce technologies such as Hadoop and Storm. Spark provides a comprehensive, unified framework for processing data sets that differ both in nature (text data, graph data, etc.) and in source (batch vs. real-time streaming data). It enables applications in Hadoop clusters to run up to 100 times faster in memory and up to ten times faster on disk. Spark lets you quickly write applications in Java, Scala, or Python, comes with a built-in set of over 80 high-level operators, and can be used interactively to query data from the shell. In addition to Map and Reduce operations, it supports SQL queries, streaming data, machine learning, and graph data processing. Developers can use these capabilities stand-alone or combine them in a single data pipeline. Together, they make Spark a complete suite of tools for big data processing.
Apache Mesos
Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. Mesos is open-source software originally developed at the University of California, Berkeley. It sits between the application layer and the operating system and makes it easier to deploy and manage applications in large-scale clustered environments. It can run many applications on a dynamically shared pool of nodes.
Mesos uses a distributed two-level scheduling mechanism called resource offers. Mesos decides how many resources to offer each framework, while frameworks decide which resources to accept and which computations to run on them. Mesos is thus a thin resource-sharing layer that enables fine-grained sharing across diverse cluster computing frameworks by giving them a common interface for accessing cluster resources. The idea is to deploy multiple distributed systems on a shared pool of nodes in order to increase resource utilization.
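The two-level resource-offer idea described above can be illustrated with a small, self-contained Python sketch. This is a toy model and not the Mesos API: the names `Offer`, `Framework`, `Master`, `on_offer` and `offer_round`, the node names, and the fixed offer order are all assumptions made for illustration. The master (first level) decides what to offer to whom; each framework (second level) decides how much of an offer to accept.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """A bundle of free resources on one node (hypothetical shape)."""
    node: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """Toy framework scheduler: the second level of the two-level scheme."""
    name: str
    task_cpus: float
    task_mem_mb: int
    pending_tasks: int
    launched: list = field(default_factory=list)

    def on_offer(self, offer: Offer) -> Offer:
        """Launch as many pending tasks as fit; return the resources used."""
        used_cpus, used_mem = 0.0, 0
        while (self.pending_tasks > 0
               and offer.cpus - used_cpus >= self.task_cpus
               and offer.mem_mb - used_mem >= self.task_mem_mb):
            used_cpus += self.task_cpus
            used_mem += self.task_mem_mb
            self.pending_tasks -= 1
            self.launched.append(offer.node)
        return Offer(offer.node, used_cpus, used_mem)


class Master:
    """Toy master: the first level, which decides what to offer to whom."""

    def __init__(self, nodes: dict):
        # nodes maps node name -> (free cpus, free memory in MB)
        self.free = dict(nodes)

    def offer_round(self, frameworks: list) -> None:
        # Assumption: offer each node's free resources to frameworks in a
        # fixed order; a real allocator would apply a fairness policy here.
        for node in sorted(self.free):
            for fw in frameworks:
                cpus, mem = self.free[node]
                if cpus <= 0 or mem <= 0:
                    break  # nothing left on this node to offer
                used = fw.on_offer(Offer(node, cpus, mem))
                self.free[node] = (cpus - used.cpus, mem - used.mem_mb)


# Two frameworks sharing one pool of nodes (all values are made up).
spark = Framework("spark", task_cpus=2.0, task_mem_mb=2048, pending_tasks=2)
web = Framework("web", task_cpus=1.0, task_mem_mb=1024, pending_tasks=3)
master = Master({"node-a": (4.0, 8192), "node-b": (2.0, 4096)})
master.offer_round([spark, web])

print(spark.launched, web.launched, web.pending_tasks)
```

In this run the first framework fills node-a, the second places two of its three tasks on node-b, and the third task stays pending until a later offer round. The point of the split is that the master never needs to understand a framework's tasks; it only tracks which resources are free.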
Kubernetes
Kubernetes is an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, providing container-centric infrastructure. Kubernetes can schedule and run application containers on clusters of physical or virtual machines.
Containers are an operating-system-level virtualization method for running multiple isolated Linux systems on a single control host that share one Linux kernel. Instead of creating a full-fledged virtual machine, each container gets a virtual environment with its own process and network space. Containers are isolated from each other and from the host: they have their own filesystems, they cannot see each other's processes, and their computational resource usage can be bounded. They are easier to build than VMs, and because they are decoupled from the underlying infrastructure and from the host filesystem, they are portable across clouds and OS distributions.
Mesos vs. Kubernetes
Mesos is mainly used by those working with larger data clusters, for example clusters running Spark, whereas Kubernetes offers developers a lightweight cluster management tool for projects that are similarly packaged. Mesos demands more of its services when an application is being built, whereas Kubernetes is built for flexibility without sacrificing control over resource management, allowing clusters to be scaled up, scaled down, or paused as needed. Mesos uses ZooKeeper for master election and discovery, and Apache Aurora is a scheduler that runs on Mesos. Kubernetes can also run on top of Mesos and share a cluster. Mesos coordinates through event-driven message passing, whereas Kubernetes coordinates via etcd, an open-source distributed key-value store that serves as the backbone of distributed systems by providing a canonical hub for cluster coordination and state management. Kubernetes allows quick scaling and lightweight cluster management, while Mesos requires more resources.