Early Prediction of Students’ Grade Point Averages at Graduation: A Data Mining Approach

Abstract

Problem Statement: There has recently been interest in educational databases, which contain a variety of valuable but often hidden data that can be used to help less successful students improve their academic performance. Extracting this hidden information typically draws on educational data mining (EDM), which studies the available data in order to bring valuable hidden information to light. Clustering, classification, and regression methods such as K-means clustering, neural networks (NN), extreme learning machine (ELM), and support vector machines (SVM) can be used to predict aspects of educational data. EDM outputs can ultimately identify which students will need additional help to improve their grade point averages (GPAs) at graduation.

Purpose of Study: This study aims to apply several data mining prediction techniques to help educational institutions predict their students' GPAs at graduation. If students are predicted to have low GPAs at graduation, extra efforts can be made to improve their academic performance and, in turn, their GPAs.

Methods: NN, SVM, and ELM algorithms are applied to data from computer education and instructional technology students to predict their GPAs at graduation.
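
ELM is the least familiar of the three methods, and the text does not give the study's configuration. The following is therefore only a minimal sketch of a generic single-hidden-layer extreme learning machine for regression. It assumes a tanh activation and a ridge-regularized least-squares solve for the output weights, a common variant of the basic formulation, which uses the Moore-Penrose pseudoinverse instead. The class name `ELMRegressor` and the parameters `n_hidden`, `reg`, and `seed` are hypothetical, illustrative choices, not the authors' settings. The point it illustrates is that the input-to-hidden weights are random and never trained, so fitting reduces to a single linear solve.

```python
import math
import random


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class ELMRegressor:
    """Hypothetical single-hidden-layer ELM (ridge-regularized variant, not the study's code)."""

    def __init__(self, n_hidden=20, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Fixed random projection of one input row, followed by tanh.
        return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.weights, self.biases)]

    def fit(self, X, y):
        d = len(X[0])
        # Step 1: draw input weights and biases at random; they are never updated.
        self.weights = [[self.rng.uniform(-1, 1) for _ in range(d)]
                        for _ in range(self.n_hidden)]
        self.biases = [self.rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        H = [self._hidden(x) for x in X]  # hidden-layer output matrix
        L = self.n_hidden
        # Step 2: solve (H^T H + reg * I) beta = H^T y for the output weights.
        hth = [[sum(h[i] * h[j] for h in H) + (self.reg if i == j else 0.0)
                for j in range(L)] for i in range(L)]
        hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(L)]
        self.beta = _solve(hth, hty)
        return self

    def predict(self, X):
        return [sum(b * h for b, h in zip(self.beta, self._hidden(x))) for x in X]
```

In a GPA setting, each row of `X` would hold one student's numeric features and `y` the observed graduation GPAs, so `ELMRegressor().fit(X_train, y_train).predict(X_test)` would return predicted GPAs for new students. The small `reg` term keeps the linear system well conditioned when hidden units are nearly collinear.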

Findings and Results: A comparative analysis of the results, evaluated using the correlation coefficient criterion, indicates that the SVM technique yielded the most accurate predictions, at a rate of 97.98%. The ELM method yielded the second most accurate prediction rate (94.92%), and NN yielded the least accurate (93.76%).
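
The abstract reports these rates in terms of a correlation coefficient but does not state the formula. Assuming it is Pearson's correlation coefficient between actual and predicted GPAs, multiplied by 100 to give a percentage, it could be computed as in this minimal sketch. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    # Covariance and variances (the common 1/n factors cancel in the ratio).
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_p)
```

Under this assumption, a reported rate such as 97.98% would correspond to `100 * pearson_r(actual_gpas, predicted_gpas)` on the evaluation set, where `actual_gpas` and `predicted_gpas` are placeholder names.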

Conclusions and Recommendations: The use of data mining methodologies for educational purposes has recently expanded. Assessing students' needs, dropout likelihood, performance, and placement test improvement are some important emerging applications of data mining in education. Since educational institutions face several seemingly unsolvable domain-related problems, this study's results reveal that EDM can help educational institutions analyze and solve these problems. Furthermore, ensemble models can be used to obtain improved results, while feature selection algorithms can be used to reduce the computational complexity of the prediction methods.

Keywords: GPA prediction, educational data mining, prediction methods, higher education

Data mining is the process of extracting important patterns from a given database and is therefore a valuable tool for converting data into usable information. Data mining has a wide range of applications in different areas, including marketing, banking, educational research, surveillance, telecommunications fraud detection, and scientific discovery (Han & Kamber, 2008). More specifically, data mining can discover hidden information to inform decision-making in various domains. The education system is one of these domains in which the primary concern is the evaluation and, in turn, enhancement of educational organizations.

Institutions of higher learning such as universities are at the core of educational systems, in which extensive research and development is performed in a competitive environment. The principal mission of these institutions is to generate, collect, and share knowledge. In particular, universities need knowledge mined from past and current data sets, which can then be delivered to university administrators to help them monitor conditions and take action to resolve problems.

A growing volume of data is currently stored in educational databases, which contain hidden information that can help improve students' academic performance. Educational data mining (EDM) is thus used to study the available data and extract this hidden information for subsequent use. The extracted information can support several educational processes, such as predicting course enrollment, estimating student dropout rates (Yukselturk, Ozekes, & Turel, 2014), detecting abnormal values in students' result sheets, and predicting student performance. Several prediction techniques can help educational institutions predict their students' grade point averages (GPAs) at graduation. If this prediction indicates that a student will have a low GPA, extra efforts can be made to improve the student's academic performance and, in turn, his or her GPA at graduation. In this context, neural network (NN), support vector machine (SVM), and extreme learning machine (ELM) algorithms can be applied to such data, and comparing their results can help identify which students should receive extra academic help.

Since data mining techniques can be used to identify student performance trends, many researchers have investigated EDM. For this study, a literature review on EDM was conducted to better understand the importance of its applications in higher education, especially for improving student performance.

Bharadwaj and Pal (2011a) used EDM to evaluate the performance of 300 students from five different colleges who were enrolled in an undergraduate computer application course. The study applied Bayesian classification to 17 attributes; of these, students' performance on a senior secondary exam, residence, various habits, family annual income, and family status were shown to be important parameters for academic performance. In a subsequent study, Bharadwaj and Pal (2011b) constructed a new data set with the attributes of student attendance and test, seminar, and assignment scores in order to predict academic performance. Meanwhile, Ramaswami and Bhaskaran (2009) compared various feature selection methods to find the feature combination that gave the best prediction accuracy. Their data set included several interesting features such as student vision, eating habits, and family attributes. More recently, Sen, Uçar, and Delen (2012) used various data mining models to predict secondary education placement test results. They used sensitivity analysis to identify the most important predictors and demonstrated that the C5 decision tree algorithm was a better predictor than NN, SVM, and logistic regression models. In earlier, similar work, Kovacic (2010) used EDM to determine the extent to which enrollment data could be used to predict student academic performance, applying the CHAID and CART algorithms to the enrollment data of information systems students at the Open Polytechnic of New Zealand. Among other studies, Ben-Zadok, Hershkovitz, Mintz, and Nachmias (2009) presented a student warning scheme that uses student data to analyze learning behavior and warns at-risk students before their final exams. Al-Radaideh, Al-Shawakfa, and Al-Najjar (2006) used data mining methods to analyze student academic data and improve the quality of the higher education system.
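
The Bayesian classification used by Bharadwaj and Pal is not specified in detail here. As an illustration only, a categorical naive Bayes classifier with add-one (Laplace) smoothing, one common form of Bayesian classification, can be sketched as follows. The class name `CategoricalNaiveBayes` and the attribute values in the example (a hypothetical exam grade band and residence type) are assumptions, not the study's 17 attributes or its data.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Hypothetical naive Bayes over categorical attributes with add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # value_counts[(attribute_index, label)][value] -> count in training data
        self.value_counts = defaultdict(Counter)
        self.values = defaultdict(set)  # distinct values seen for each attribute
        for row, label in zip(rows, labels):
            for i, v in enumerate(row):
                self.value_counts[(i, label)][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum over attributes of log P(value | label)
            score = math.log(count / self.n)
            for i, v in enumerate(row):
                k = len(self.values[i])
                score += math.log((self.value_counts[(i, label)][v] + 1) / (count + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For instance, after `fit` on rows such as `("high", "urban")` labeled `"good"` or `"poor"`, `predict(("low", "rural"))` returns whichever label has the largest posterior score. Working in log space avoids underflow when many attributes are multiplied together.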

Feng, Beck, Heffernan, and Koedinger (2008) predicted the standardized test scores of middle and high school students using a regression model with 25 variables. Kobrin, Camara, and Milewski (2002) studied student SAT scores and high school grades within several diverse student bodies and ultimately identified three groups. The first group comprised students whose grades and test scores did not differ significantly, while the second contained students whose SAT scores were significantly better than their grades would otherwise have suggested. Finally, the third group consisted of students whose SAT scores were abnormally low compared to their high school performance; interestingly, women and minority students were represented more heavily in this group than in the other two. Shaeela, Tasleem, Ahsan, and Khan (2010) proposed an unsupervised k-means clustering algorithm to predict students' learning activities; their results suggested that the outputs could be helpful for both instructors and students. In similar work, Erdoğan and Timor (2005) used the k-means algorithm to identify the characteristics of 722 students at Maltepe University, seeking a probable relationship between university entrance exam results and other academic achievements.
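
Both of the last two studies rely on k-means clustering. A minimal sketch of the standard Lloyd iteration, which alternates an assignment step and a centroid-update step until assignments stop changing, is shown below in plain Python. The function name `kmeans`, the random initialization from data points, and the `n_iter` and `seed` parameters are illustrative assumptions, not details taken from either study.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialize centroids with k data points chosen at random without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Each point could be a tuple of a student's numeric attributes, such as a hypothetical entrance exam score and GPA. Because k-means uses Euclidean distance, features measured on very different scales would normally be standardized before clustering.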

mkaasees