25-08-2017, 09:32 PM
Weka and NetDraw
Introduction to Weka (Data Mining Tool)
Weka was developed at the University of Waikato in New Zealand. http://www.cs.waikato.ac.nz/ml/weka/
Weka is an open-source data mining tool developed in Java. It is used for research, education, and applications. It runs on Windows, Linux, and Mac.
What can Weka do?
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset (using the GUI) or called from your own Java code (using the Weka Java library).
Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
Weka Tools/Functions
Tools (or functions) in Weka include:
Data preprocessing (e.g., Data Filters),
Classification (e.g., BayesNet, KNN, C4.5 Decision Tree, Neural Networks, SVM),
Regression (e.g., Linear Regression, Isotonic Regression, SVM for Regression),
Clustering (e.g., Simple K-means, Expectation Maximization (EM)),
Association rules (e.g., Apriori Algorithm, Predictive Accuracy, Confirmation Guided),
Feature Selection (e.g., Cfs Subset Evaluation, Information Gain, Chi-squared Statistic), and
Visualization (e.g., View different two-dimensional plots of the data).
How to use Weka?
Weka Data File Format (Input)
Weka for Data Mining
Sample Output from Weka (Output)
Weka Data File Format (Input)
FILE FORMAT
@relation RELATION_NAME
@attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
@attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
@attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
@attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
@data
DATAROW1
DATAROW2
DATAROW3
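As a concrete illustration of the template above, the sketch below embeds a small, made-up ARFF file (a hypothetical `weather` relation with two attributes and three data rows) and reads it with plain Java, without the Weka library. The class name `ArffSketch` and its fields are assumptions made for this example only, and the parser handles just the simple layout shown here (no quoted values, no sparse rows).

```java
import java.util.ArrayList;
import java.util.List;

// Minimal ARFF reader for illustration only; this class is not part of Weka.
class ArffSketch {
    // A small, made-up ARFF file that follows the template above.
    static final String SAMPLE = String.join("\n",
        "@relation weather",
        "@attribute temperature numeric",
        "@attribute play {yes,no}",
        "@data",
        "30,no",
        "21,yes",
        "18,yes");

    String relation;
    final List<String> attributeNames = new ArrayList<>();
    final List<String> attributeTypes = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            // Skip blank lines and ARFF comments, which start with '%'.
            if (line.isEmpty() || line.startsWith("%")) continue;
            String lower = line.toLowerCase();
            if (inData) {
                // Every line after @data is one comma-separated data row.
                arff.rows.add(line.split("\\s*,\\s*"));
            } else if (lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // Split into keyword, attribute name, attribute type.
                String[] parts = line.split("\\s+", 3);
                arff.attributeNames.add(parts[1]);
                arff.attributeTypes.add(parts[2]);
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return arff;
    }

    public static void main(String[] args) {
        ArffSketch arff = parse(SAMPLE);
        System.out.println(arff.relation + ": " + arff.attributeNames.size()
            + " attributes, " + arff.rows.size() + " rows");
    }
}
```

Here `temperature` is a numeric attribute and `play` is a nominal attribute whose allowed values are listed in braces. Each line after `@data` supplies one value per declared attribute, in declaration order.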
Weka for Data Mining
There are two main ways to use Weka for data mining tasks.
Use the Weka Graphical User Interface (GUI)
The GUI is straightforward and easy to use, but it is not flexible: it cannot be called from your own application.
Import the Weka Java library into your own Java application
Developers can use the Weka Java library to develop software, or modify the source code to meet special requirements. This approach is more flexible and advanced, but it is not as easy to use as the GUI.
Import Weka Java library to your own Java application
Three sets of classes you may need when developing your own application:
Classes for Loading Data
Classes for Classifiers
Classes for Evaluation
Classes for Loading Data
Related Weka classes
weka.core.Instances
weka.core.Instance
weka.core.Attribute
How to load input data file into instances?
Each data row becomes an Instance, each attribute becomes an Attribute, and the whole dataset becomes an Instances object.
An Instances object contains the Attribute and Instance objects.
How to get every Instance within the Instances?
How to get an Attribute?
How to get the Attribute value of each Instance?
Class Index (very important: it tells Weka which attribute is the class to be predicted)
Classes for Classifiers
Weka classes for C4.5, Naïve Bayes, and SVM
Classifier: all classes which extend weka.classifiers.Classifier
C4.5: weka.classifiers.trees.J48
NaiveBayes: weka.classifiers.bayes.NaiveBayes
SVM: weka.classifiers.functions.SMO
How to build a classifier?
Classes for Evaluation
Related Weka classes
weka.classifiers.CostMatrix
weka.classifiers.Evaluation
How to use the evaluation classes?
Cross Validation
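In cross-validation, a single dataset is split into N equal shares (folds). In each round, N-1 shares are used as the training dataset and the remaining share is used as the testing dataset. This is repeated N times so that every share is used for testing exactly once.
The most widely used setting is 10-fold cross-validation.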
How to obtain the training dataset and the testing dataset?
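The split described above can be sketched in plain Java, independent of the Weka library. The class `FoldSketch` and its methods are hypothetical names introduced for this example. The sketch assigns row `i` to fold `i % numFolds`, which is only one simple way to form the shares; in practice the data is usually shuffled first.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative N-fold split over row indices; this class is not part of Weka.
class FoldSketch {
    // Row indices used for testing in the given fold (fold is 0-based).
    static List<Integer> testIndices(int numRows, int numFolds, int fold) {
        List<Integer> test = new ArrayList<>();
        for (int i = 0; i < numRows; i++) {
            if (i % numFolds == fold) test.add(i);
        }
        return test;
    }

    // All remaining row indices, used for training in that fold.
    static List<Integer> trainIndices(int numRows, int numFolds, int fold) {
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < numRows; i++) {
            if (i % numFolds != fold) train.add(i);
        }
        return train;
    }

    public static void main(String[] args) {
        int numRows = 20;
        int numFolds = 10;
        for (int fold = 0; fold < numFolds; fold++) {
            System.out.println("fold " + fold
                + " train=" + trainIndices(numRows, numFolds, fold).size()
                + " test=" + testIndices(numRows, numFolds, fold));
        }
    }
}
```

With the Weka Java library itself, the `Instances` class offers `trainCV(numFolds, numFold)` and `testCV(numFolds, numFold)`, which return the training and testing datasets for one fold.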
Conclusion about Weka
In sum, the overall goal of Weka is to build a state-of-the-art facility for developing machine learning (ML) techniques and to allow people to apply them to real-world data mining problems.
Detailed documentation about the different functions provided by Weka can be found on the Weka website.
WEKA is available at:
http://www.cs.waikato.ac.nz/ml/weka
Introduction to NetDraw (Visualization Tool)
NetDraw is a free program written by Steve Borgatti of Analytic Technologies. It is often used for visualizing both 1-mode and 2-mode social network data.
You can download it from:
http://www.analytictech.com/downloadnd.htm
(Compared to Weka, it is much easier to use.)
What can NetDraw do?
NetDraw can:
handle multiple relations at the same time, and
use node attributes to set colors, shapes, and sizes of nodes.
Pictures can be saved in metafile, jpg, gif and bitmap formats.
Two basic kinds of layouts are implemented: a circle layout and a multidimensional scaling (MDS) layout based on geodesic distance.
You can also rotate, flip, shift, resize and zoom configurations.
How to use NetDraw?
NetDraw Input Data File Format
Draw Networks using NetDraw