10-08-2012, 03:32 PM
Decision Trees for Uncertain Data
Decision Trees for Uncertain Data.pptx (Size: 1.11 MB / Downloads: 46)
Abstract
Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information.
With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution.
Since processing pdfs (probability density functions) is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than for certain data.
To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
Existing System
In traditional decision-tree classification, a feature (an attribute) of a tuple is either categorical or numerical.
For the latter, a precise and definite point value is usually assumed.
In many applications, however, data uncertainty is common.
Although the previous techniques can improve efficiency, they do not consider the spatial relationship among cluster representatives when performing pruning in batch.
Proposed System
A simple way to handle data uncertainty is to abstract probability distributions by summary statistics such as means and variances. We call this approach Averaging.
Another approach is to consider the complete information carried by the probability distributions to build a decision tree. We call this approach Distribution-based.
We study the problem of constructing decision tree classifiers on data with uncertain numerical attributes.
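The difference between the two approaches above can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: all names (`averaging`, `prob_left`) and the sample-based pdf representation are assumptions made here for clarity. Averaging collapses each tuple's pdf to its mean and sends the tuple wholly to one side of a split point, while the distribution-based approach splits the tuple fractionally by the probability mass on each side.

```python
import random

random.seed(0)

def averaging(samples):
    """Averaging: collapse each pdf (here, a list of samples) to its mean."""
    return sum(samples) / len(samples)

def prob_left(samples, split):
    """Distribution-based: probability mass of the pdf at or below the split."""
    return sum(1 for s in samples if s <= split) / len(samples)

# Two tuples whose true attribute values are uncertain, each represented
# by 1000 samples drawn from a Gaussian pdf (an illustrative choice).
tuple_a = [random.gauss(2.0, 0.5) for _ in range(1000)]
tuple_b = [random.gauss(3.0, 0.5) for _ in range(1000)]

split = 2.5
# Averaging: each tuple goes entirely left or entirely right of the split.
left_avg = [averaging(t) <= split for t in (tuple_a, tuple_b)]
# Distribution-based: each tuple contributes a fractional weight to each side.
left_frac = [prob_left(t, split) for t in (tuple_a, tuple_b)]
```

Note how the distribution-based view preserves information that Averaging discards: a tuple whose pdf straddles the split contributes partial weight to both branches, which is exactly the extra information a distribution-based tree exploits.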
Data Insertion
In many applications, however, data uncertainty is common. The value of a feature/attribute is thus best captured not by a single point value, but by a range of values giving rise to a probability distribution.
With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution.
This uncertain data is inserted by the user.
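One simple way to capture such an uncertain attribute at insertion time is a discretized pdf: a list of (value, probability) pairs rather than a single point value. The sketch below is only illustrative; the function names and the discretized representation are assumptions, not the system's actual storage format.

```python
def make_uncertain_value(points, weights):
    """Normalize weights so the (value, probability) pairs form a valid pdf."""
    total = sum(weights)
    return [(v, w / total) for v, w in zip(points, weights)]

def expected_value(dist):
    """The single point value the Averaging approach would keep instead."""
    return sum(v * p for v, p in dist)

# A user inserts a tuple whose sensor reading is uncertain: it is 20.0
# with probability 0.5, and 19.0 or 21.0 with probability 0.25 each.
reading = make_uncertain_value([19.0, 20.0, 21.0], [1, 2, 1])

# The distribution-based approach stores `reading` intact;
# Averaging would reduce it to its mean.
mean = expected_value(reading)
```

Keeping the full pair list is what makes later tree construction more expensive: every split evaluation must iterate over the distribution instead of comparing one number.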