17-07-2013, 03:29 PM
Clustering and Sequential Pattern Mining of Online Collaborative Learning Data
Abstract
Group work is widespread in education. The growing use of online tools supporting group work generates huge
amounts of data. We aim to exploit this data to support mirroring: presenting useful high-level views of information about the
group, together with desired patterns characterizing the behaviour of strong groups. The goal is to enable the groups and their
facilitators to see relevant aspects of the group’s operation and provide feedback on whether these are more likely to be associated with
positive or negative outcomes and where the problems are. We explore how useful mirror information can be extracted via a
theory-driven approach and a range of clustering and sequential pattern mining techniques. The context is a senior software development
project where students use the collaboration tool TRAC. We extract patterns distinguishing the better from the weaker groups
and gain insight into the success factors. The results point to the importance of leadership and group interaction, and give
promising indications of whether they are occurring. Patterns indicating good individual practices were also identified. We found that
some key measures can be mined from early data. The results are promising for advising groups at the start and early
identification of effective and poor practices, in time for remediation.
INTRODUCTION
Group work is commonplace in many aspects of life,
particularly in the workplace where there are many
situations which require small groups of people to work
together to achieve a goal. For example, a task that re-
quires a complex combination of skills may only be pos-
sible if a group of people, each offering different skills,
can work together. To take just one other example, it may
be necessary to draw on the combined efforts of a group
to achieve a task in the time available. However, it is often
difficult to make a group operate effectively, with high
productivity and satisfaction within the group about its
operation. Reflecting the importance of group work, there
has been a huge body of research on how to make groups
more effective and how to help group members build
relevant skills. In one meta-analysis of this body of work,
a set of five key factors and three enablers has been iden-
tified [1]. For example, this work points both to the im-
portance of leadership as one of the five key factors and
to the effectiveness of training in leadership.
GOALS OF MINING GROUP WORK LOGS
We set our primary goal for the data mining as providing
mirroring tools that would be useful for helping improve
the learning about group work. This goal is realistic in the
context of the highly complex and variable nature of long-
term, small group activity, especially where the learners
undertake a diverse range of tasks, such as creating a
software system for an authentic client.
CONTEXT OF THE STUDY
Learners
The learners were students completing a senior software
development project course. Over 12 weeks, and working
in groups of 5-7 students, they were required to develop a
software solution for a client. The topics varied from cre-
ating a computer-based driving ability test to developing
an object tracking system for an art installation. The
groups were required to use Extreme Programming (XP)
[17], including use of user stories, small releases, and col-
lective code ownership.
We have collected data over three semesters, for cohorts
in 2005 and 2006. This paper reports on the last 2006
cohort because our teaching changed markedly in 2006
and that cohort was given much more support and in-
struction in group work skills. This means their data is
richer and more meaningful, and is also not comparable
with the data from 2005.
DATA EXPLORATION
Before any data mining was carried out, the data was ex-
amined to see whether any simple statistics could distin-
guish the stronger from the weaker groups.
Firstly, we checked the total number of ticket events
for each group, as shown in Fig. 1. Intuitively we expect a
large number to be associated with strong groups as the
tickets allow group members to keep track of their work,
including allocating and accepting tasks. Indeed, the results
show that the top group had the highest number of ticket
events. However, the performance of the other groups
does not seem to correlate with the number of ticket
events. For example, Group 2 had one of the lowest num-
bers. Upon interviewing members from this group (after
the completion of the course), we found that they were
reluctant to use the system as they felt it to be too cum-
bersome, and hence preferred to communicate their pro-
gress by other means.
Limitations of Clustering
The main limitation was the small data sample, especially
in the first task, clustering of groups. Although the data
contained more than 15000 events, we had only 7 groups
and 43 students. Nevertheless, we think that the collected
data and selected attributes allowed for uncovering use-
ful patterns characterising the work of stronger and
weaker students as discussed above. The follow-up inter-
views were very helpful for interpreting and validating
the patterns.
How to select the most appropriate clustering algo-
rithm and how to set its parameters is another important
issue. There are methods for determining a good number
of clusters and evaluating the clustering quality in terms
of cohesion and separation of the clusters found [20]. We
believe that in this application the expert knowledge of
the course co-ordinators and facilitators is essential to
find a meaningful number of clusters and extract meaningful
characteristics, and then use them on new cohorts. For
larger datasets, hierarchical clustering may not be applicable
due to its high time and memory requirements; k-means
may still be a good choice, especially some of its
modifications, such as bisecting k-means [20], which is
less sensitive to initialization and is also more efficient.
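
To make the idea of bisecting k-means and of cohesion and separation concrete, the following is a minimal Python sketch under stated assumptions. The 2-D points are invented feature vectors, not data from the study. The seeds for each split are chosen with a deterministic farthest-point heuristic, which is a simplification of the usual approach of trying several random bisections and keeping the best one. Cohesion is measured as the within-cluster sum of squared errors (SSE) and separation as the between-cluster sum of squares (SSB).

```python
def mean(points):
    """Centroid of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = mean(points)
    return sum(sqdist(p, c) for p in points)

def assign(points, c0, c1):
    """Partition points by nearest of two centres (ties go to c0)."""
    left = [p for p in points if sqdist(p, c0) <= sqdist(p, c1)]
    right = [p for p in points if sqdist(p, c0) > sqdist(p, c1)]
    return left, right

def two_means(points, iters=20):
    """Split one cluster in two with basic k-means (k = 2).
    Assumption: deterministic farthest-point seeding instead of random seeds."""
    c = mean(points)
    s1 = max(points, key=lambda p: sqdist(p, c))   # farthest from centroid
    s2 = max(points, key=lambda p: sqdist(p, s1))  # farthest from first seed
    left, right = assign(points, s1, s2)
    for _ in range(iters):
        new_left, new_right = assign(points, mean(left), mean(right))
        if not new_left or not new_right or new_left == left:
            break
        left, right = new_left, new_right
    return left, right

def bisecting_kmeans(points, k):
    """Repeatedly split the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: sse(clusters[j]))
        if sse(clusters[i]) == 0:  # all points identical: nothing to split
            break
        clusters.extend(two_means(clusters.pop(i)))
    return clusters

def cohesion(clusters):
    """Total within-cluster SSE (lower means tighter clusters)."""
    return sum(sse(c) for c in clusters)

def separation(clusters):
    """Between-cluster sum of squares (higher means better separated)."""
    overall = mean([p for c in clusters for p in c])
    return sum(len(c) * sqdist(mean(c), overall) for c in clusters)

# Hypothetical feature vectors forming three loose blobs.
points = [(1, 1), (1, 2), (2, 1),
          (10, 10), (11, 10), (10, 11),
          (1, 20), (2, 20), (1, 19)]
clusters = bisecting_kmeans(points, 3)
print(clusters, cohesion(clusters), separation(clusters))
```

For any partition, cohesion plus separation equals the total sum of squares of the data, so comparing the two values across candidate numbers of clusters gives a simple numerical check to set beside the judgement of the course co-ordinators.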
CONCLUSION
We performed mining of data collected from students
working in teams and using an online collaboration tool
in a one-semester software development project. Our goal
was to support learning group skills in the context of a
standard state-of-the-art tool. Clustering was applied to
find both groups of similar teams and similar individual
members, and sequential pattern mining was used to ex-
tract sequences of frequent events. The results revealed
interesting patterns characterising the work of stronger
and weaker students. Key results point to the value of
analysis based on each resource and on individuals,
rather than just the group level. We also found that some
key measures can be mined from early data, in time for
these to be used by facilitators as well as individuals in
the groups. Some of the patterns are specific to our context
(i.e. the course requirements and tool used). Others
are more generic and consistent with psychological theo-
ries of group work, e.g. the importance of group interac-
tion and leadership for success.