25-06-2013, 12:50 PM
Data Mining Curriculum: A Proposal (Version 1.0)
Introduction
Recent tremendous technical advances in processing power, storage capacity, and interconnectivity of computer
technology are creating unprecedented quantities of digital data. Data mining, the science of extracting
useful knowledge from such huge data repositories, has emerged as a young and interdisciplinary field in
computer science. Data mining techniques have been widely applied to problems in industry, science, engineering,
and government, and it is widely believed that data mining will have a profound impact on our
society. The growing consensus that data mining can bring real value has led to an explosion in demand
for novel data mining technologies and for students who are trained in data mining—students who have an
understanding of data mining techniques, can apply them to real-life problems, and are trained for research
and development of new data mining methods. Courses in data mining have started to spring up all over the
world.
Based on this development of the field, the ACM SIGKDD Executive Committee has set up the ACM
SIGKDD Curriculum Committee to design a sample curriculum for data mining that gives recommendations
for educating the next generation of students in data mining. Based on feedback from researchers, educators,
and students, we are convinced that it is an important task to have a carefully designed, conceptually strong,
technically rich, and balanced curriculum for this discipline. A comprehensive and balanced curriculum will
ensure that education in data mining sets a solid foundation for the healthy growth of the field. It will
promote systematic training of students in computer science, information sciences, and other related fields,
and it will provide guidance for the training of the next generation of data mining researchers, developers,
and technology users.
Curriculum Design Philosophy
Data mining is an interdisciplinary field at the intersection of artificial intelligence, machine learning, statistics,
and database systems, and we believe that different educators will emphasize different topics in their courses.
Thus we divided this curriculum proposal into two parts. The first part, titled Foundations, contains
basic material that we believe should be covered in any introductory course on data mining. The second
part called Advanced Topics is a comprehensive collection of material that can be sampled to complete an
introductory course or selections of which can form the basis for an advanced course in data mining.
We believe that the teaching of data mining should concentrate on long-lasting scientific principles and
concepts of the field. Thus, instead of covering every last detail of the most recent research, we designed the
basic material to lay a solid foundation that opens the door to exploring more advanced material.
Course Topics and Models
Recall that we partitioned our curriculum into two parts: A course on Foundations and a course on Advanced
Topics. A standard 14-week, one-semester introductory course on data mining (offered to either senior
undergraduate or first-year graduate students) could cover all the units in Foundations and a selected set
of units from the Advanced Topics. A selected set of units from the Advanced Topics can be covered in a
second course.
Foundations (Course I)
Introduction
Basic concepts of data mining, including motivation; definition; the relationships of
data mining with database systems, statistics, and machine learning; the different kinds of data repositories
on which data mining can be performed; the different kinds of patterns and knowledge to be mined; the
concept of interestingness; and current trends and developments in data mining. The material can
probably be introduced by showing a few case studies.
Different course modules and educational goals
Since the course can be taught in different fields, such as computer science, business, and statistics, and with
different emphases, such as databases, information systems, and machine learning, we should not expect the
material to be covered in its entirety or with the same emphasis in every course. We plan to insert some modules based on
feedback from instructors who have taught the material in specific fields.
Laboratories and exercises
Laboratories and exercises give students an opportunity to carry out experiments that illustrate topics in a
realistic setting and at the same time learn the specifics of the software used. Students may also be assigned
to work on projects too large to be completed during a single class period. Laboratories can provide time for
independent project work and programming assignments with reporting similar to that done in other topics
in computer science.