Decision Trees for Uncertain Data DETAILS

**seminar ideas** · 30-07-2012, 01:26 PM

Decision Trees for Uncertain Data


Although previous techniques can improve the efficiency of means, they do not consider the spatial relationship among cluster representatives, nor do they make use of the proximity between groups of uncertain objects to perform pruning in batch. A simple way to handle data uncertainty is to abstract probability distributions by summary statistics such as means and variances. We call this approach Averaging. Another approach is to consider the complete information carried by the probability distributions to build a decision tree. We call this approach Distribution-based.

PROPOSED SYSTEM :

We study the problem of constructing decision tree classifiers on data with uncertain numerical attributes. Our goals are (1) to devise an algorithm for building decision trees from uncertain data using the Distribution-based approach; (2) to investigate whether the Distribution-based approach could lead to a higher classification accuracy compared with the Averaging approach; and (3) to establish a theoretical foundation on which pruning techniques are derived that can significantly improve the computational efficiency of the Distribution-based algorithms.

MODULES

Data Insertion

In many applications, data uncertainty is common. The value of a feature (attribute) is then best captured not by a single point value, but by a range of values giving rise to a probability distribution. In this module, the uncertain data is inserted by the user.
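One possible way to hold such a value in code is as a small set of sample points with probabilities. This is only a sketch: the class name `UncertainValue`, the helper `uniform_over`, and the choice of a uniform distribution are assumptions made for illustration, not something given in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainValue:
    """Hypothetical container: a discrete pdf as ((value, probability), ...)."""
    points: tuple

    def __post_init__(self):
        total = sum(p for _, p in self.points)
        if any(p < 0 for _, p in self.points) or abs(total - 1.0) > 1e-9:
            raise ValueError("probabilities must be non-negative and sum to 1")


def uniform_over(low, high, samples):
    """Hypothetical helper: equal probability on evenly spaced points in [low, high]."""
    if samples < 2:
        raise ValueError("need at least two sample points")
    step = (high - low) / (samples - 1)
    return UncertainValue(
        tuple((low + i * step, 1.0 / samples) for i in range(samples))
    )


# Example: a reading known only to lie somewhere between 36.0 and 38.0.
reading = uniform_over(36.0, 38.0, 5)
```

A training tuple would then pair one or more such values with a class label.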

Generate Tree

Building a decision tree on tuples with numerical, point-valued data is computationally demanding. A numerical attribute usually has a possibly infinite domain of real or integer values, inducing a large search space for the best “split point”. Given a set of n training tuples with a numerical attribute, there are as many as n-1 binary split points, that is, ways to partition the set of tuples into two non-empty groups. Finding the best split point is thus computationally expensive. To improve efficiency, many techniques have been proposed to reduce the number of candidate split points.
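The exhaustive search over the n-1 candidates can be sketched as below. The post does not say how a split is scored, so weighted entropy is used here as an assumption; the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(labels).values()
    )


def best_split(values, labels):
    """Score the midpoint between each pair of adjacent sorted values.

    Returns (split_point, weighted_entropy); lower entropy is better.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_point, best_score = None, float("inf")
    for i in range(1, n):  # at most n-1 candidate split points
        left_v, right_v = pairs[i - 1][0], pairs[i][0]
        if left_v == right_v:
            continue  # identical values cannot be separated
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_point, best_score = (left_v + right_v) / 2, score
    return best_point, best_score


# Hypothetical toy data: the classes separate cleanly between 2.0 and 3.0.
point, score = best_split([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])
```

Each candidate is rescored from scratch here, which is exactly the cost that the techniques for reducing candidate split points try to avoid.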
Averaging

A simple way to handle data uncertainty is to abstract probability distributions by summary statistics such as means and variances. We call this approach Averaging (AVG). The most straightforward version replaces each pdf with its expected value, effectively converting the uncertain tuples into point-valued tuples and reducing the problem to the one for point-valued data.

AVG is a greedy algorithm that builds a tree top-down. When processing a node, we examine a set of tuples S. The algorithm starts at the root node, with S being the set of all training tuples. At each node n, we first check whether all the tuples in S have the same class label. If they do, n becomes a leaf node.
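The conversion step of Averaging might look like the sketch below. It assumes each pdf is stored as a list of (value, probability) pairs; that representation, the function names, and the toy data are all assumptions made for illustration.

```python
def expected_value(pdf):
    """Mean of a discrete pdf given as (value, probability) pairs."""
    total = sum(p for _, p in pdf)
    return sum(v * p for v, p in pdf) / total


def to_point_valued(uncertain_tuples):
    """Replace each pdf by its mean, giving ordinary (value, label) tuples."""
    return [(expected_value(pdf), label) for pdf, label in uncertain_tuples]


def all_same_class(tuples):
    """Check made first at every node: does S contain only one class label?"""
    return len({label for _, label in tuples}) <= 1


# Hypothetical training set with one uncertain attribute per tuple.
training = [
    ([(1.0, 0.5), (3.0, 0.5)], "a"),
    ([(10.0, 0.25), (14.0, 0.75)], "b"),
]
points = to_point_valued(training)
```

After this step any ordinary point-valued tree builder can be applied to `points`, which is why Averaging is cheap but discards the shape of each distribution.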

Distribution Based

Another approach is to consider the complete information carried by the probability distributions when building a decision tree. We call this approach Distribution-based. Our goals are:
(1) to devise an algorithm for building decision trees from uncertain data using the Distribution-based approach;
(2) to investigate whether the Distribution-based approach could lead to a higher classification accuracy compared with the Averaging approach;
(3) to establish a theoretical foundation on which pruning techniques are derived that can significantly improve the computational efficiency of the Distribution-based algorithms.
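The post does not spell out how the full distribution enters the tree construction. One way to do it, taken here as an assumption, is to let a tuple whose pdf straddles the split point contribute fractionally to both sides, weighted by its probability mass on each side. The sketch below scores a single split point that way; the names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def split_mass(pdf, z):
    """Probability that the attribute value is <= z, for (value, prob) pairs."""
    return sum(p for v, p in pdf if v <= z)


def weighted_entropy(class_weights):
    """Entropy (in bits) of fractional class counts."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    return -sum(
        (w / total) * math.log2(w / total)
        for w in class_weights.values()
        if w > 0
    )


def split_score(uncertain_tuples, z):
    """Weighted entropy after splitting at z, with tuples counted fractionally."""
    left, right = defaultdict(float), defaultdict(float)
    for pdf, label in uncertain_tuples:
        p_left = split_mass(pdf, z)
        left[label] += p_left          # share of the tuple going left
        right[label] += 1.0 - p_left   # remaining share going right
    n_left, n_right = sum(left.values()), sum(right.values())
    return (
        n_left * weighted_entropy(left) + n_right * weighted_entropy(right)
    ) / (n_left + n_right)


# Hypothetical data: the two pdfs overlap, so no split separates them fully.
data = [
    ([(1.0, 0.5), (3.0, 0.5)], "a"),
    ([(2.0, 0.5), (4.0, 0.5)], "b"),
]
score_at_1_5 = split_score(data, 1.5)
```

Every sample point of every pdf is now a potential split point, which is why this approach needs far more computation than Averaging and why pruning of candidates matters.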

**seminar ideas** · 10-08-2012, 03:32 PM

Decision Trees for Uncertain Data


Abstract

Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information.
With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution.
Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU-demanding than that for certain data.
To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.

Existing System

In traditional decision-tree classification, a feature (an attribute) of a tuple is either categorical or numerical.
For the latter, a precise and definite point value is usually assumed.
In many applications, however, data uncertainty is common.
Although the previous techniques can improve the efficiency of means, they do not consider the spatial relationship among cluster representatives, nor do they make use of the proximity between groups of uncertain objects to perform pruning in batch.

Proposed System

A simple way to handle data uncertainty is to abstract probability distributions by summary statistics such as means and variances. We call this approach Averaging.
Another approach is to consider the complete information carried by the probability distributions to build a decision tree. We call this approach Distribution-based.
We study the problem of constructing decision tree classifiers on data with uncertain numerical attributes.

Data Insertion :

In many applications, data uncertainty is common. The value of a feature (attribute) is then best captured not by a single point value, but by a range of values giving rise to a probability distribution.
With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution.
This uncertain data is inserted by the user.

**seminar flower** · 27-08-2012, 04:42 PM

Decision Trees for Uncertain Data


What is a Decision Tree?

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
It is one way to display an algorithm.
Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal.
Another use of decision trees is as a descriptive means for calculating conditional probabilities.
A decision tree consists of three types of nodes:
1. Decision nodes - commonly represented by squares
2. Chance nodes - represented by circles
3. End nodes - represented by triangles
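To show how the three node types work together, here is a minimal sketch that evaluates such a tree from the leaves upward. It assumes the usual expected-value rule (a chance node averages its branches, a decision node picks the best option); the dictionary layout and all numbers are hypothetical.

```python
def rollback(node):
    """Return the value of a decision-analysis tree, evaluated bottom-up."""
    kind = node["kind"]
    if kind == "end":
        # End node: a fixed payoff.
        return node["payoff"]
    if kind == "chance":
        # Chance node: probability-weighted average of its branches.
        return sum(p * rollback(child) for p, child in node["branches"])
    if kind == "decision":
        # Decision node: choose the option with the highest value.
        return max(rollback(child) for child in node["options"])
    raise ValueError(f"unknown node kind: {kind!r}")


# Hypothetical tree: take a sure payoff, or gamble on a 50/50 outcome.
tree = {
    "kind": "decision",
    "options": [
        {"kind": "end", "payoff": 50.0},
        {
            "kind": "chance",
            "branches": [
                (0.5, {"kind": "end", "payoff": 120.0}),
                (0.5, {"kind": "end", "payoff": 0.0}),
            ],
        },
    ],
}
best_value = rollback(tree)
```
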
Example
Decision trees can be used to optimize an investment portfolio.
The following example shows a portfolio of 7 investment options (projects).
The organization has $10,000,000 available for the total investment.
Bold lines mark the best selection (options 1, 3, 5, 6, and 7), which costs $9,750,000 and creates a payoff of $16,175,000.

What is Uncertain Data?

In computer science, uncertain data is data that contains specific uncertainty. Uncertain data is typically found in the area of sensor networks. When representing such data in a database, some indication of the probability of the various values is needed.
