Predicting School Failure and Dropout by Using Data Mining Techniques

**seminar code** · 02-07-2014, 03:41 PM


INTRODUCTION

Recent years have shown a growing interest and concern
in many countries about the problem of school failure and
the determination of its main contributing factors [1]. A great
deal of research [2] has been done on identifying the factors
that affect the low performance of students (school failure and
dropout) at different educational levels (primary, secondary,
and higher) using the large amount of information that current
computers can store in databases. All these data are a “gold
mine” of valuable information about students. Identifying and
finding useful information hidden in large databases is a difficult
task [3]. A very promising solution to achieve this goal is the
use of knowledge discovery in databases techniques, or data
mining in education, called educational data mining (EDM) [4].
This new area of research focuses on the development of
methods to better understand students and the settings in which
they learn [5]. In fact, there are good examples of how to
apply EDM techniques to create models that predict dropping
out and student failure specifically [6]. These works have
shown promising results with respect to those sociological,
economic, or educational characteristics that may be more
relevant in the prediction of low academic performance [7].
It is also important to note that most of the research on
the application of EDM to resolve the problems of student
failure and dropout has been applied primarily to the specific
case of higher education [8] and more specifically to online o

DATA PRE-PROCESSING

Before applying DM algorithms it is necessary to carry
out some pre-processing tasks such as cleaning, integration,
discretization, and variable transformation [13]. It must be
pointed out that data pre-processing was a very important task
in this work, because the quality and reliability of the available
information directly affect the results obtained. In fact,
some specific pre-processing tasks were applied to prepare all
the previously described data so that the classification task
could be carried out correctly. Firstly, all available data were
integrated into a single dataset. During this process, those
students without 100% complete information were eliminated.
All students who did not answer our specific survey or the
CENEVAL survey were excluded. Some modifications were
also made to the values of some attributes. For example,
words that contained the letter “Ñ” were replaced by “N”.
A new attribute for the age of each student in years was
created using the day, month, and year of birth of each student.
Furthermore, the continuous variables were transformed into
discrete variables, which provide a much more comprehensible
view of the data. For example, the numerical values of the
scores obtained by students in each subject were changed to
categorical values in the following way:
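
The cleaning steps described in this section could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the field names (`birth_day`, `birth_month`, `birth_year`), the reference date used to compute age, the handling of lowercase "ñ", and the `discretize` helper with caller-supplied cut points are all hypothetical. The score categories actually used in the study are not reproduced in this excerpt, so none are assumed here.

```python
from datetime import date


def is_complete(record, required):
    """True when every required field is present and non-empty."""
    return all(record.get(key) not in (None, "") for key in required)


def strip_enye(text):
    """Replace 'Ñ' with 'N' (the lowercase case is an added assumption)."""
    return text.replace("Ñ", "N").replace("ñ", "n")


def age_in_years(day, month, year, on):
    """Whole years between the birth date and the reference date `on`."""
    birthday_passed = (on.month, on.day) >= (month, day)
    return on.year - year - (0 if birthday_passed else 1)


def discretize(value, cut_points, labels):
    """Map a number to a label.

    `cut_points` are ascending, exclusive upper bounds and `labels` has one
    more entry than `cut_points`. Both are supplied by the caller.
    """
    for bound, label in zip(cut_points, labels):
        if value < bound:
            return label
    return labels[-1]


def preprocess(records, required, on):
    """Drop incomplete records, normalise text, and add an `age` attribute."""
    cleaned = []
    for record in records:
        if not is_complete(record, required):
            continue  # students without complete information are eliminated
        row = {
            key: strip_enye(value) if isinstance(value, str) else value
            for key, value in record.items()
        }
        row["age"] = age_in_years(
            row["birth_day"], row["birth_month"], row["birth_year"], on
        )
        cleaned.append(row)
    return cleaned
```

A call such as `preprocess(records, ["name", "birth_day", "birth_month", "birth_year"], date(2011, 6, 14))` would return only the complete records, each with an added `age` field. Keeping the cut points as parameters of `discretize` avoids hard-coding thresholds that the excerpt does not state.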

INTERPRETATION OF RESULTS

In this section, some examples of different rules discovered
by some of the algorithms are shown in order to compare
their interpretability and usefulness for early identification of
students at risk of failing and for making decisions about
how to help these students. These rules show us the relevant
factors and relationships that lead a student to pass or fail.
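
To make the idea of an interpretable rule concrete, the sketch below shows one way an ordered list of IF-THEN rules could be represented and applied to a student record. It is a minimal illustration only: the attribute names and values (`attr_a`, `attr_b`, `"low"`) are hypothetical placeholders and are not rules or attributes taken from the study.

```python
def rule_matches(conditions, student):
    """True when the student satisfies every attribute=value condition."""
    return all(student.get(attr) == value for attr, value in conditions.items())


def classify(rules, student, default):
    """Return the label of the first matching rule, else the default label."""
    for conditions, label in rules:
        if rule_matches(conditions, student):
            return label
    return default


# Hypothetical rule list: each entry reads "IF all conditions hold THEN label".
example_rules = [
    ({"attr_a": "low", "attr_b": "low"}, "Fail"),
    ({"attr_a": "low"}, "Fail"),
]
```

Because each rule is a plain set of attribute conditions, a teacher or counsellor can read it directly, which is the kind of interpretability this section compares across algorithms.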

CONCLUSION

As we have seen, predicting student failure at school can be
a difficult task not only because it is a multifactor problem (in
which there are a lot of personal, family, social, and economic
factors that can be influential) but also because the available
data are normally imbalanced. To resolve these problems, we
have shown the use of different DM algorithms and approaches
for predicting student failure. We have carried out several
experiments using real data from high school students in
Mexico. We have applied different classification approaches
for predicting the academic status or final student performance
at the end of the course. Furthermore, we have shown that some
approaches such as selecting the best attributes, cost-sensitive
classification, and data balancing can also be very useful for
improving accuracy.
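
Two of the approaches named above, data balancing and cost-sensitive classification, can be illustrated with a short sketch. The balancing step here uses plain random oversampling of the smaller classes, which is a simple stand-in chosen for illustration and not necessarily the balancing method used in the study. The cost values in the usage note are likewise hypothetical.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class is as large as the biggest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [row for row, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def min_cost_label(probs, cost):
    """Pick the prediction with the lowest expected misclassification cost.

    probs: {label: estimated probability of that label}
    cost:  {(true_label, predicted_label): penalty}
    """
    def expected_cost(predicted):
        return sum(p * cost[(true, predicted)] for true, p in probs.items())

    return min(probs, key=expected_cost)
```

For example, with estimated probabilities `{"Pass": 0.7, "Fail": 0.3}` and a hypothetical cost table in which missing a failing student costs five times as much as a false alarm, `min_cost_label` returns `"Fail"` even though `"Pass"` is the more probable class. This is how a cost-sensitive approach can favour detecting the minority class.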
It is important to note that gathering information and
pre-processing data were two very important tasks in this
work. In fact, the quality and the reliability of the information
used directly affect the results obtained. However, these
are arduous tasks that take a lot of time. Specifically,
we had to capture data from a paper-and-pencil
survey and we had to integrate data from three different sources
to form the final dataset.
In general, regarding the DM approaches used and the
classification results obtained, the main conclusions are as
follows:
