26-10-2016, 12:49 PM
Abstract: The number of people playing cricket grows every day, so the selectors of a cricket board must choose players carefully. Their decision depends largely on the average and strike rate of each player. Here, data mining methods are applied to a cricket statistics dataset to determine an accurate model. We use one data mining algorithm and analyze its performance in estimating the average and strike rate of a player.
Keywords- Data Mining, C4.5 algorithm, Data Preprocessing
I. INTRODUCTION
Evaluating the performance of a player depends on two factors: average and strike rate. These factors are derived from seven attributes: player, span, matches, runs, outs, high score, and balls faced. Data mining is a branch of Artificial Intelligence that is now applied in a wide variety of domains. The data mining concepts used here to evaluate the cricket dataset are data preprocessing, the J48 algorithm, and data visualization.
II. BACKGROUND
III. PROJECT
The dataset used was obtained from http://knoema. The figure below shows our dataset.
The attributes in the cricket statistics dataset are:
1. Player ------ Name of the player.
2. Span ------ Period of time over which the player played the game.
3. Matches ------ Number of cricket matches played by the player.
4. Runs ------ Number of runs scored by the player.
5. Outs ------ Number of times the player was out across all matches.
6. Highscore ------ Highest score made by the player across all matches.
7. Ballsfaced ------ Number of balls faced across all matches.
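The two factors used for evaluation follow from these attributes by the standard cricket definitions: batting average is runs per dismissal, and strike rate is runs per 100 balls faced. The sketch below is only an illustration; the `PlayerRecord` class, the function names, and the sample values are assumptions and do not come from the dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayerRecord:
    # Field names mirror the seven attributes listed above (hypothetical type).
    player: str
    span: str
    matches: int
    runs: int
    outs: int
    highscore: int
    ballsfaced: int


def batting_average(rec: PlayerRecord) -> Optional[float]:
    """Runs scored per dismissal; undefined (None) if the player was never out."""
    if rec.outs == 0:
        return None
    return rec.runs / rec.outs


def strike_rate(rec: PlayerRecord) -> Optional[float]:
    """Runs scored per 100 balls faced; undefined (None) if no balls were faced."""
    if rec.ballsfaced == 0:
        return None
    return 100.0 * rec.runs / rec.ballsfaced


# Made-up record for illustration only, not taken from the dataset.
example = PlayerRecord("Player A", "2005-2015", 120, 4500, 100, 150, 5000)
print(batting_average(example))  # 45.0
print(strike_rate(example))      # 90.0
```

Returning `None` for a player with no dismissals or no balls faced is one possible choice; such rows could equally be dropped during preprocessing.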
A standard data analysis was performed on the dataset to identify patterns in the data and to present the data in tables based on attribute ranges and their frequencies.
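A table of attribute ranges and their frequencies can be produced by binning a numeric attribute into fixed-width ranges and counting the values in each. The following is a minimal sketch under assumptions: the helper name `range_frequencies`, the bin width of 1000, and the run totals are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def range_frequencies(values, bin_width):
    """Count how many values fall into each range [low, low + bin_width).

    Returns (low, high, count) tuples sorted by range; empty ranges are omitted.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    return [(low, low + bin_width, counts[low]) for low in sorted(counts)]


# Made-up run totals for illustration only, not taken from the dataset.
runs = [120, 950, 1800, 2300, 2750, 4100, 4400]
table = range_frequencies(runs, 1000)
for low, high, freq in table:
    # Print each range as an inclusive interval with its frequency.
    print(f"{low}-{high - 1}: {freq}")
```

The same helper could be applied to any of the numeric attributes, such as matches or balls faced, with a bin width suited to that attribute's scale.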
IV. CLASSIFICATION
J48 algorithm: J48 is an open source Java implementation of the C4.5 algorithm in the WEKA data mining tool. C4.5 is an algorithm, developed by Ross Quinlan, that generates a decision tree. It is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier.
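C4.5 chooses splits using entropy-based measures, and its gain ratio criterion normalizes information gain by the split information. The sketch below evaluates binary splits of one numeric attribute by gain ratio. It is a simplified illustration, not the WEKA J48 code: it omits pruning, missing-value handling, and other refinements, and the function names, class labels, and data are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` versus `value > threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    info_gain = entropy(labels) - weights[0] * entropy(left) - weights[1] * entropy(right)
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


def best_threshold(values, labels):
    """Threshold with the highest gain ratio, or None if all values are equal."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split the data
    if not candidates:
        return None
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Made-up batting averages and class labels for illustration only.
averages = [12.0, 18.5, 25.0, 41.0, 48.0, 52.5]
status = ["Average", "Average", "Average", "Good", "Good", "Good"]
print(best_threshold(averages, status))  # 25.0
```

A full decision-tree learner would repeat this search over every attribute at each node and recurse on the resulting subsets until the leaves are sufficiently pure.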
V. SUMMARY
The cricket statistics data was obtained from http://knoema. The main objective of this paper is to predict player performance based on the attributes of the dataset. Here the predictors are the strike rate and average of the player. The prediction is made using the J48 algorithm, which is applied to the training dataset, where it learns to assign class labels from the attributes of the dataset and then gives the results.
VI. CONCLUSION
The analysis of the model shows that the data was first cleaned using data preprocessing, after which the class label was predicted using the J48 algorithm in order to predict the performance of the player. Using data visualization, we represented the various players and their playing status as Best, Good, Average, or Worst.