Big Data and Distributed Data Mining: An Example of Future Networks

**mkaasees** · 29-09-2016, 03:48 PM

1456876647-IJARICS1391041.docx (Size: 67.46 KB)

Abstract

This paper describes a perspective on the analytics of big data generated by sensors and devices at the edge of networks. It discusses the importance of data at the edge of networks, where some of the "biggest" big data is generated, and gives a quick overview of emerging technologies, including distributed frameworks such as Apache Hadoop and MapReduce.

1. Introduction

The explosion of big data is testing the capabilities of even the most advanced analytics tools. IT is challenged by the sheer volume, variety [1] [5], and velocity of this flood of complex structured, semi-structured, and unstructured data, which also offers organizations exciting opportunities to gain richer, deeper, and more accurate insights into their business.

1.1. What is Big Data?

Big data is a buzzword, or catch-phrase, used to describe a volume of structured and unstructured data so massive that it is difficult to process using traditional database and software techniques [6] [7].

The term big data is believed to have originated with Web search companies, which had to query very large distributed aggregations of loosely structured [23] data.

Big data analytics requires capturing and processing data where it resides. This paper explores the value of data at the edge of networks, where some of the "biggest" big data is generated. As the use of sensors and devices, as well as intelligent systems [4] [5] [6], continues to expand, the potential to gain insight from the flood of data from these sources becomes a new and compelling opportunity. Businesses that can harness the power of big data at the edge and unlock its value to the organization will outperform their competitors, with greater capabilities to innovate creatively and to solve complex problems whose solutions have been out of reach in the past.

Big data is typically described by the first three characteristics below, sometimes referred to as the three Vs. However, organizations [6] [7] [12] need a fourth, value, to make big data work.

• Volume. Huge data sets that are orders of magnitude larger than data managed in traditional storage and analytical solutions. Think petabytes instead of terabytes.

• Variety. Heterogeneous, complex, and variable data [23] [31], generated in formats as different as e-mail, social media, video, images, blogs, and sensor data, as well as "shadow data" such as access logs and Web search histories.

• Velocity. Data is generated as a constant stream, with real-time queries for meaningful information served up on demand rather than in batches.

• Value. Meaningful insights that deliver predictive analytics for future trends and patterns from deep, complex analysis based on machine learning, statistical modeling, and graph algorithms. These analytics go beyond the results of traditional business intelligence querying and reporting.

1.2. An Example of Big Data Technology: The Apache Hadoop Framework and MapReduce

New technologies are emerging to make big data analytics possible and cost-effective [31]. The Apache Hadoop framework is emerging as a leading approach. It redefines the way data is managed and analyzed by leveraging the power of a distributed grid of computing resources.

The Hadoop open-source framework [5] [6] [7] [21] uses a simple programming model to enable distributed processing of large data sets on clusters of computers. The complete technology stack includes common utilities, a distributed file system, analytics and data storage platforms, and an application layer that manages distributed processing, parallel computation, workflow, and configuration management. In addition to offering high availability, the Hadoop framework is more cost-effective for handling large, complex, or unstructured data sets than conventional approaches, and it offers massive scalability and speed.
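To make the "simple programming model" concrete, here is a minimal sketch of the classic MapReduce word-count pattern in plain Python. It is an illustration under stated assumptions, not Hadoop code: it uses no Hadoop APIs, runs every phase sequentially in one process, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) are hypothetical. In Hadoop, the framework itself would split the input, run map tasks in parallel across the cluster, and perform the shuffle between the map and reduce stages.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record (a line of text)."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    # On a real cluster each input split would be mapped on a different node;
    # here every phase runs one after another in a single process.
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["big data at the edge", "big data in the cloud"]
    print(run_job(lines))
```

The sketch keeps map and reduce as pure functions of their inputs, which is what allows a framework to run many copies in parallel and to rerun a failed task on another node.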

2. Big Data at the Edge

Much of the current discussion about big data analytics focuses on managing and analyzing unstructured data from business and social sources such as e-mail, videos, tweets, Facebook posts, reviews, and Web behavior. While this type of big data analytics promises to provide significant value to organizations, data generated at the edge of the network by sensors and other devices represents another huge, untapped resource, with the potential to deliver insights that can transform the operations and strategic initiatives of public- and private-sector organizations.

Data from intelligent systems and sensors is some of the highest-volume, fastest-streaming, and/or most complex big data. The data sources are distributed across the network, and data is collected by an enormous variety of equipment, such as utility meters, traffic and security cameras, RFID [22] [26] [29] [31] readers, factory-line sensors, fitness machines, and medical devices. Ubiquitous connectivity and the growth of sensors and intelligent systems have opened up a whole new storehouse of valuable information. Edge data can provide significant value to both the private and public sectors as a source of deeper, richer insight, gained faster and more cost-effectively than in the past. In many cases, analysis of edge data can help organizations respond to events and solve problems that were previously out of reach.

3. Implications for Technology

For data to be analyzed where it resides, compute and storage capabilities must be available both locally at the edge and in the cloud. This local infrastructure must address a set of unique challenges arising from the characteristics of the data and related issues.

• Sensed data is massive and streams 24/7.

• Data is noisy and dirty and requires preprocessing.

• Data has strong locality characteristics, meaning that the devices are operated, and their data consumed, locally.

• Data ownership, interoperability, security, and privacy are big issues.
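The preprocessing and locality points above suggest cleaning and summarizing sensor data on or near the edge device, so that only compact summaries need to travel to the cloud. The sketch below is hypothetical throughout: it assumes a reading format of (timestamp in seconds, value or `None`), a caller-supplied valid range, and fixed, non-overlapping time windows. None of these details come from the paper, and a real deployment would tune the filtering to each sensor type.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical reading format: (timestamp in seconds, value or None if the sensor dropped out).
Reading = Tuple[float, Optional[float]]


@dataclass
class WindowSummary:
    window_start: float
    count: int
    average: float
    minimum: float
    maximum: float


def clean(readings: Iterable[Reading], low: float, high: float) -> List[Tuple[float, float]]:
    """Drop missing values and readings outside the assumed valid range [low, high]."""
    return [(t, v) for t, v in readings if v is not None and low <= v <= high]


def summarize(readings: Iterable[Tuple[float, float]], window_s: float) -> List[WindowSummary]:
    """Aggregate cleaned readings into fixed, non-overlapping time windows."""
    buckets: Dict[float, List[float]] = {}
    for t, v in readings:
        start = (t // window_s) * window_s  # start time of the window this reading falls in
        buckets.setdefault(start, []).append(v)
    return [
        WindowSummary(start, len(vals), mean(vals), min(vals), max(vals))
        for start, vals in sorted(buckets.items())
    ]


if __name__ == "__main__":
    # Invented sample: one dropout (None) and one implausible spike (999.0).
    raw = [(0, 20.0), (10, 22.0), (20, None), (30, 999.0), (65, 21.0), (70, 23.0)]
    for summary in summarize(clean(raw, low=-40.0, high=60.0), window_s=60.0):
        print(summary)  # only these summaries would be forwarded upstream
```

With a valid range of -40 to 60 and 60-second windows, the dropout and the spike are discarded, and the four remaining readings collapse into two window summaries.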

How does this translate into a real-life example?

Here's a transportation and public safety example.

• Road sensors may belong to different departments.

• Some cameras are owned by public security, while others belong to public transportation.

• Data is generated on private vehicles.

The issues: Can the data from these multiple systems be integrated and analyzed for meaningful insight? Who owns the data generated on private vehicles? Is the data secure?

These issues are well worth resolving. Multiple data streams can unlock intrinsic correlations [4] [5] [16] that can have great significance overall. A recent study in a city in the People's Republic of China (PRC) shows that if you can detect the morning wash time from the water supply subsystem, you can infer the morning rush hour; similarly, if you can detect when offices are powered down in the evening, you can infer the evening rush hour. Understanding these relationships can help cities better handle traffic at peak times and improve the availability of water and electrical resources when they are most needed.
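One simple way to look for the kind of cross-stream relationship described above is lagged correlation: shift one series against another and find the lag at which they line up best. The sketch below is a hypothetical illustration of that technique, not the method used in the cited study. The hourly series are invented, and the "traffic" series is simply the "water" series shifted by one hour, so the output only demonstrates how the calculation works.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_lag(leading: Sequence[float], following: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Return (lag, r) for the lag at which `following` best tracks `leading`.

    Compares leading[t] with following[t + lag] for lag = 0..max_lag.
    """
    if len(leading) != len(following):
        raise ValueError("series must have the same length")
    results: List[Tuple[int, float]] = []
    for lag in range(max_lag + 1):
        x = leading[: len(leading) - lag]
        y = following[lag:]
        results.append((lag, pearson(x, y)))
    return max(results, key=lambda item: item[1])


if __name__ == "__main__":
    # Synthetic hourly values for hours 0-23, invented for illustration only.
    water = [1, 1, 1, 1, 1, 2, 6, 9, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 1, 1, 1]
    traffic = water[-1:] + water[:-1]  # same shape, one hour later
    lag, r = best_lag(water, traffic, max_lag=3)
    print(f"best lag: {lag} h, r = {r:.3f}")
```

Correlation at a lag is evidence of a relationship, not proof of cause; in practice the lag found this way would be checked against domain knowledge before a city relied on it.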

4. What’s next?

Big data is a game changer, and it's already here. While most of the momentum around big data today centers on social media sources, I believe that realizing the promise of big data analytics [1] [2] [21] must include a way to harness the potential of big data from intelligent systems and sensors.

• Understand use cases and their implications. We must understand how existing disparate data sources can be evolved into a network of integrated, intelligent, connected systems.

• Define the usage model requirements for the analytics of edge data. The architecture must take advantage of big data distributed frameworks [24] [27] to move computation closer to where the data resides and support big data analytics at the edge via intelligent systems and local clouds.

• Enable the fast and secure delivery of aggregated data from edge analytics systems [27] [28] to other cloud and analytics platforms for further analysis.

• Address issues related to data ownership, interoperability, security, and privacy.
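Moving computation closer to where the data resides can be illustrated with a toy, locality-aware task assignment: each task goes to a node that already holds its data block when that node has a free slot, and falls back to any other free node otherwise. Hadoop's schedulers likewise prefer data-local tasks, but the greedy function below is a hypothetical simplification, not Hadoop's scheduler, and the block names, node names, and slot counts are invented.

```python
from typing import Dict, List, Tuple


def assign_tasks(
    block_locations: Dict[str, List[str]],
    free_slots: Dict[str, int],
) -> List[Tuple[str, str, bool]]:
    """Greedily assign one task per data block, preferring a node that stores the block.

    Returns (block, node, is_local) triples in block order.
    Raises RuntimeError if no node has a free slot left for some block.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    plan: List[Tuple[str, str, bool]] = []
    for block, holders in block_locations.items():
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            remote = [n for n, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError(f"no free slot for block {block}")
            node, is_local = remote[0], False  # data must be shipped to this node
        slots[node] -= 1
        plan.append((block, node, is_local))
    return plan


if __name__ == "__main__":
    blocks = {"b1": ["edge-a"], "b2": ["edge-b"], "b3": ["edge-a"]}
    capacity = {"edge-a": 1, "edge-b": 1, "cloud": 1}
    for block, node, is_local in assign_tasks(blocks, capacity):
        print(block, "->", node, "(local)" if is_local else "(remote)")
```

A greedy pass like this is easy to reason about but not optimal: in the example, the third block lands on the cloud node because the only edge node holding it is already busy.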

4.1. Take the Next Steps to Manage and Analyze Edge Data

Here's how you can get ready to take advantage of this fast-moving area for your organization.

• Keep up to date with what's happening. For example, Intel offers practical guidance to help you deploy big data environments more quickly and with lower risk.

• Explore business opportunities deriving from the analytics of edge data. Collaborate with the business to understand existing edge systems and the potential use for data. For more information
