17-10-2016, 02:35 PM
1459434873-SeminaronBigDataBayesianNetworkLearningApproach.pptx
Introduction
A Bayesian Network (BN) is a probabilistic graphical model that provides theoretically sound mechanisms for processing uncertain information and representing relations among variables.
BNs have been applied in a wide range of domains such as Health Care, Education, Finance, Environment, Bioinformatics, Telecommunication, and Information Technology.
With the abundant data resources available today, learning BNs from Big Data can uncover valuable business insights and bring potential revenue to the many domains that deal with Big Data.
We introduce a data partition approach that makes Bayesian network learning scalable; the same approach can also be applied to many other machine learning techniques to make them scalable and Big Data ready.
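As a rough sketch of the data-partition idea (the chunk count and row layout below are illustrative assumptions, not the paper's actual implementation), a large data set is split into near-equal partitions so each can be handed to an independent local learner:

```python
# Sketch: split a large dataset into near-equal partitions so that each
# partition can be processed by an independent local learner in parallel.
# The number of partitions and the toy rows are illustrative assumptions.

def partition(rows, n_parts):
    """Split a list of data rows into n_parts near-equal partitions."""
    size, rem = divmod(len(rows), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rem else 0)  # spread the remainder
        parts.append(rows[start:end])
        start = end
    return parts

rows = [{"x": i % 2, "y": i % 3} for i in range(10)]
parts = partition(rows, 3)
print([len(p) for p in parts])  # → [4, 3, 3]
```

Every row lands in exactly one partition, which is what lets the local learning steps run independently before their results are merged.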
Existing System
Several Distributed Data Parallel (DDP) patterns, such as Map, Reduce, Match, CoGroup, and Cross, have been identified to easily build efficient and scalable data parallel analysis and analytics applications.
Each DDP pattern executes user-defined functions (UDF) in parallel over input data sets.
Since each DDP execution engine defines its own API for how UDFs should be implemented, an application implemented for one engine may be difficult to run on another engine.
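A minimal sketch of the Map and Reduce DDP patterns, using Python's standard thread pool rather than any specific DDP engine's API (which, as noted above, differs from engine to engine); the UDF bodies here are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# User-defined functions (UDFs): the Map UDF runs independently on each
# input record; the Reduce UDF folds the mapped results together.
def map_udf(record):
    return record * record          # example: square each value

def reduce_udf(acc, value):
    return acc + value              # example: sum the squares

data = list(range(5))
with ThreadPoolExecutor(max_workers=2) as pool:
    mapped = list(pool.map(map_udf, data))   # Map pattern: parallel UDF calls
result = reduce(reduce_udf, mapped, 0)       # Reduce pattern: fold results
print(result)                                # → 30  (0+1+4+9+16)
```

The point of the pattern is the contract, not the executor: only the two UDFs are application-specific, so a DDP engine can parallelize and distribute the rest.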
Arising Problems
How can we effectively pre-process Big Data to evaluate its quality and reduce the size if necessary?
How can we design a workflow capable of taking gigabyte-scale data sets and learning BNs with decent accuracy?
How can we provide easy scalability support to BN learning algorithms?
These three questions are the main motivations for this research, which led to the creation of a novel workflow: the Scalable Bayesian Network Learning (SBNL) workflow.
Proposed System: Scalable Bayesian Network Learning (SBNL) workflow
The SBNL workflow has three research components that contribute to the current literature:
Intelligent Big Data pre-processing through a proposed data quality score, called Arc S, that measures and ensures data quality and data faithfulness.
A new weight-based ensemble algorithm (Max-Min Hill Climbing) that learns a BN structure from an ensemble of local results.
A user-friendly approach to building and running scalable Big Data machine learning applications via Kepler scientific workflows built on top of DDP patterns and engines.
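The weight-based ensemble step can be sketched as arc voting: each local learner returns a set of directed arcs plus a quality weight for its partition (standing in for a score such as Arc S), and arcs whose weighted vote clears a threshold survive into the global structure. The voting rule, threshold, and toy local results below are assumptions for illustration, not the algorithm's published details:

```python
# Hypothetical sketch of a weight-based ensemble over local BN structures.
# Each local result is (arc_set, weight); an arc is kept if its weighted
# vote reaches at least half of the total weight. All values illustrative.

def ensemble_arcs(local_results, threshold=0.5):
    """local_results: list of (set of (parent, child) arcs, weight) pairs."""
    total = sum(w for _, w in local_results)
    votes = {}
    for arcs, w in local_results:
        for arc in arcs:
            votes[arc] = votes.get(arc, 0.0) + w
    return {arc for arc, v in votes.items() if v / total >= threshold}

local_results = [
    ({("A", "B"), ("B", "C")}, 0.9),   # high-quality partition
    ({("A", "B")},             0.6),
    ({("B", "C"), ("C", "A")}, 0.5),   # lower-quality partition
]
print(sorted(ensemble_arcs(local_results)))  # → [('A', 'B'), ('B', 'C')]
```

Weighting by partition quality lets a noisy partition's spurious arc (here ("C", "A")) be outvoted by the better partitions, which is the intuition behind combining pre-processing scores with ensemble learning.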
Conclusion
By combining machine learning, distributed computing, and workflow techniques, the Scalable Bayesian Network Learning (SBNL) workflow has been designed.
An illustration has been provided as to how the Kepler scientific workflow system can easily provide scalability to Bayesian network learning.
SBNL obtains significant performance gains when applied to distributed environments while maintaining the same learning accuracy, making it an ideal workflow for Big Data Bayesian network learning.