Abstract— Game-based competitions are commonly used
within the Computational Intelligence (CI) and Artificial Intelligence
(AI) in games community to benchmark algorithms
and to attract new researchers. While many competitions have
been organised based on different games, the success of these
competitions is highly varied. This short paper is a self-help
guide for current and aspiring competition organisers.
After analysing the fate of a number of recent
competitions, some factors likely to contribute to the success
or failure of a competition are laid out, and a set of concrete
recommendations is offered. There is also a discussion of how
to write up game-based AI competitions and what we can
ultimately learn from them.
I. INTRODUCTION
In the research field of artificial and computational intelligence
in games, game-based competitions have come to
assume a central place. Competitions are held each year in
conjunction with the two main conferences IEEE Computational
Intelligence and Games (CIG) and AAAI Artificial
Intelligence in Interactive Digital Entertainment (AIIDE),
as well as at major conferences dedicated to evolutionary
computation, game design, machine learning etc. These competitions
are based on games (computer games as well as
digital versions of board games), and submissions to these
competitions are in the form of either some CI/AI based
software or the output of this software. The winner is the
submission that best solves some problem posed by the game;
often, the problem is to play the game as well as possible,
but it could also be to generate fun levels for the game,
accurately analyse player data or imitate human players.
In August 2013, I held a tutorial at CIG where I discussed
what game-based AI competitions are, what we can learn
from them and how we can make them better. The tutorial
was based on my experience of running several such competitions
and of being competitions chair for conferences.
In the ensuing vigorous discussion it was suggested that the
tutorial be written up and published. Thus this short paper.
II. WHY DO IT?
First of all, let us motivate why we organise and participate
in game-based AI competitions at all. One of the most important
reasons, and probably the most academically respectable
reason, is that competitions provide fair, transparent and
reusable means of benchmarking algorithms. For example,
variants of Pac-Man had been used several times in the
past to test or demonstrate AI algorithms, such as genetic
programming [1]. However, since different implementations of the game and different experimental conditions were used,
there was no meaningful way to compare the results. After
the introduction of the Ms. Pac-Man competition, however,
researchers seeking to use some version of Pac-Man for their
research have had access to (and generally used) software and
experimental conditions as specified by the competition [2].
Anyone can check the league tables on the competition
website, and compare their own algorithm with the results
published there. Even after the competition has ceased to run
officially, researchers have continued to use the competition
software and rules to benchmark their solutions.
A strongly related benefit for researchers is that there is
a benchmark available at all. Many researchers wishing to
do research on problems relevant for games have in the past
faced the problem of having to develop the game (or an API
for an existing game) themselves. The existence of games
already equipped with interfaces and in other ways prepared
for AI research (such as being able to run in “headless”
mode without graphics and with faster execution) allows
researchers to focus on research. This has during the last few
years allowed many researchers from other fields to enter the
AI/CI in games field. It has also allowed many researchers
already in the field to get more research done faster.
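To make this concrete, here is a minimal sketch, in Java, of the kind of API an AI-ready game exposes: a headless simulation that can be reset, stepped and scored without any graphics. The interface names are purely hypothetical, not taken from any actual competition framework.

// A minimal sketch of an "AI-ready" game API. All names here are
// hypothetical; actual competition frameworks differ in detail.
interface GameSimulation {
    void reset(long seed);    // start a fresh, reproducible episode
    void step(int action);    // advance one tick, no rendering involved
    double[] observe();       // the current game state as features
    boolean isOver();
    double score();
}

interface Agent {
    int act(double[] observation);
}

final class HeadlessEval {
    // Runs one episode without graphics or sleeps, so it executes as
    // fast as the CPU allows -- the property that lets researchers run
    // thousands of evaluations instead of watching games in real time.
    static double evaluate(GameSimulation game, Agent agent, long seed) {
        game.reset(seed);
        while (!game.isOver()) {
            game.step(agent.act(game.observe()));
        }
        return game.score();
    }
}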
Game-based AI competitions are useful not only for
research, but also for teaching. Universities now organise
their own local competitions based on the software and rules
used in the global competitions, and base course assignments
on the competition software. The prospect of being able to
work with a “real” game and benchmark the performance of
their implementations against each other and against leading
researchers is often appreciated by students [3].
The final reason for constructing these competitions and
their associated benchmarks might seem trivial but really is
not. It is that it makes it easier for people outside of our
research field to understand and engage with what we are
doing. It is not easy to explain to journalists, administrators
or even academics from other departments what research in
CI and AI is. Common testbeds such as pole balancing, TSP
and machine scheduling often come across as incomprehensible
and/or pointless. Everyone understands Pac-Man.
III. WHICH COMPETITIONS ARE THERE?
In the rest of the paper, I will use concrete examples of
game-based AI competitions to support my arguments. It will
therefore be useful to start by discussing some of them.
This paper is not a survey of game-based AI competitions;
such a survey ought to be written, but would result in a
considerably longer paper. Instead, this section is a selective
enumeration of competitions (most of them associated with
the CIG and AIIDE conferences) ordered by domain.
The earliest game-based AI competitions were based on
classical board games; Chess has long been studied in AI and
cognitive science research, and a Computer Chess Olympiad
has been running since 1974 [4]. Checkers has also been
the subject of much research, at least until it was solved in
2007 [5]. Recently, much effort has gone into playing Go,
motivating the development of the Monte Carlo Tree Search
algorithm [6], [7]. In the CIG community, there has been
a competition in Othello-playing at some conferences; competitors
submitted board evaluation functions which would
work with a shallow MiniMax search [8].
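In that format the search itself is fixed by the organisers, and an entry is nothing more than an evaluation function. A minimal sketch of how such a competition plugs entries into a depth-limited minimax might look as follows; the type names are hypothetical.

// Sketch of the "submit a board evaluation function" format: the
// organiser supplies the fixed shallow minimax search below, and each
// entry implements only BoardEvaluator. Names are hypothetical.
interface Board {
    boolean isTerminal();
    Iterable<Board> successors(int player); // all moves for that player
}

interface BoardEvaluator {
    double evaluate(Board board, int player); // higher = better for player
}

final class ShallowMinimax {
    // Depth-limited minimax from the perspective of `player` (0 or 1),
    // calling the submitted evaluator at the leaves.
    static double minimax(Board board, int depth, int player,
                          boolean maximising, BoardEvaluator eval) {
        if (depth == 0 || board.isTerminal()) {
            return eval.evaluate(board, player);
        }
        int toMove = maximising ? player : 1 - player;
        double best = maximising ? Double.NEGATIVE_INFINITY
                                 : Double.POSITIVE_INFINITY;
        for (Board next : board.successors(toMove)) {
            double v = minimax(next, depth - 1, player, !maximising, eval);
            best = maximising ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }
}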
Strategy games have in common with classic board games
the control over multiple units and the focus on long-term
strategy. Surprisingly few competitions have been based on
turn-based strategy games. One of them is Planet Wars, a
mechanically simple game that was the basis of a Google
competition. There was also briefly a competition based on
the nuclear strategy game DefCon in 2009. Instead, popular
competitions in the CI/AI community have been based on
real-time strategy games (RTS). ORTS (Open RTS) was
developed by Buro and used for a competition that ran
2003–2009 [9]. In recent years, focus has shifted to the
StarCraft competitions. These are based on the now-classic
Blizzard RTS game together with a third-party interface
called BroodWar API; there is an annual competition at CIG
and another at AIIDE, and both are currently very active [10].
Car racing games could be positioned at the other end
from classic board games on several dimensions of the games
continuum; while they feature elements of tactics (such as
overtaking) and occasionally strategy (such as damage and
fuel management), the greatest emphasis is on the continuous
control problem of driving the car so that lap times are
minimised. The first simulated car racing competition, in
2007, was based on an idiosyncratic two-dimensional racing
game with simple physics and attracted a large number of
submissions [11]. This competition morphed into the Simulated
Car Racing Championship, based on the more complex
game TORCS, which has been running since 2008 [12].
Several competitions have been based on classic arcade
games or console games from the 80s. The Ms. Pac-Man
Screen Capture competition, where agents interfaced to an
emulated version of the original Ms. Pac-Man game, ran from
2007 to 2011 [2]. A Java-based Ms. Pac-Man competition ran
in 2011 and 2012, where the core mechanics and levels of
the game were replicated in a framework to which the
controllers can interface directly [13]. The Mario AI Championship
ran in various versions from 2009 to 2012 [14],
[15]. This competition had four tracks. In one track submitted
controllers competed on playing unseen levels as well as
possible, in another on learning to play specific levels, in a
third on playing in a human-like manner and in the fourth
on generating levels. Competitions have also briefly existed
based on the 2D space shooter Xpilot [16], and on the
cooperative platformer Geometry Friends [17].
Another category of games is first-person shooter (FPS)
games. Unreal Tournament 2004 is perhaps the only commercial FPS for which a third-party interface for agent
control exists, and as a result it has been used extensively
for competitions. The 2k BotPrize was a Turing-test-like
competition for developing the bot that played in the most
human-like manner in a multiplayer game; it ran from 2008
until 2013, when a bot finally managed to convince more than
half of the judges that it was a human player [18]. The same
game has also been used for a “deathmatch” competition in
2009, where the aim was simply to be the last bot standing.
Finally, there are a few competitions that are not based on
existing games or game genres. One is the physical travelling
salesman problem (PTSP) competition, which can be seen as
a hybrid of the traditional TSP and a racing game.
Controllers compete for reaching a number of checkpoints
in minimum time, while taking turning radius, momentum
etc. into account [19]. Another is Cellz, which is a game
where the task of the controller is to control multiple “cellz”,
which move about in a 2D environment, eat smaller cellz
and multiply [20]. A more famous example is the general
game playing competition (GGP), which tasks submitted
controllers with learning to play an arbitrary unseen boardgame-like
game, after being provided with the rules [21].
Competitions are also held in other fields, including reinforcement
learning, evolutionary computation and planning,
and much could be learnt from how those are organised.
An interesting competition to learn from is the HUMIEs,
focusing on human-competitive results from evolutionary
computation in any domain, which has several times been
won by game-playing agents [22]. Much could probably also
be learnt from how human-human game-playing competitions
(eSports) are organised.
IV. SUCCESS AND FAILURE
Not all game-based AI competitions are equal: some have
more impact than others. A competition may get no entrants,
run only once, stagnate, or evolve and keep being relevant.
Get no entrants. Both the Cellz competition and the
Xpilot competition ran once, and received no submissions.
The first Geometry Friends competition had only one entrant.
This is of course the least desired fate for a competition,
so it is important to analyse the reasons for the lack of
competitors. In the case of Cellz, a probable cause is that the
game was not similar enough to any existing computer game,
and was not really playable by humans. (In the later stages
of the game, a human would have to control dozens of cellz
simultaneously in real time.) Therefore, it is hard to compare
the performance of a controller to how a human would
have played, and also hard to intuit strategies. Additionally,
perhaps few people perceived Cellz as an interesting or
important problem to work on. Xpilot on the other hand is
a fairly well-known game with strong similarities to classic
arcade games such as Asteroids and Defender. However, it
was rather complicated to get started with developing an entry
for the competition. There was no interface for any high-level
language in common use, such as Java; instead there
was a Scheme interface. The Geometry Friends competition software did not include a sample controller, which probably
dissuaded many potential competitors.
Run only once. The first version of the Simulated Car
Racing competition ran only one year, before being replaced
with a competition with the same name and partly the same
organisers but based on a very different racing game. The
Unreal 2k4 Deathmatch competition ran only once, attracting
submissions from three competitors, and was not followed
by another such competition. A key reason for competitions
being discontinued is almost certainly lack of time and energy
on part of the organisers. Running competitions is hard
work, and sometimes thankless. If the number of submissions
is low, it is tempting to not re-run the competition. Another
possible reason is that the problem appears solved or close to
solved in its current form. In the first version of the simulated
car racing competition, the top competitors reached very
similar scores, and seemed to be playing the game near-optimally.
Some sort of change was needed to make the
problem more challenging; thus the radical redesign.
Stagnate. The new version of the Simulated Car Racing
competition kept attracting better and better submissions
for two or three years, but then it stagnated: in 2012 and
2013, none of the new submissions beat the best submission
from 2011. The controllers neither performed better nor were
they more sophisticated. There were also fewer submissions
in these latter years. As a result, the competition is not
running in 2014 and it is unclear whether it will run again.
Something similar happened for the Gameplay track of the
Mario AI competition but in a shorter time span: 2009 saw
two competition events and 2010 saw three; in between
each event, the framework evolved. The sophistication of the
submissions increased throughout 2010, even though there
were fewer of them. In 2011 the Gameplay track saw very
few entrants, and in 2012 there were too few entrants to run.
One important possible reason for stagnation is that no
further progress seems attainable, either because the problem
is perceived as “solved”, or because the submitted controllers
seem so sophisticated that it would be impossible to improve
on them in time for the deadline. This is particularly a problem
for submissions by teams of students. Other important
possible reasons are that the organisers of the competition
fail to sufficiently advertise the competition, make results of
previous years available, and make source code of previous
years’ competitors available. It is worth noting that even
if a competition stagnates, the competition software can
go on to be widely used for research papers that report
experiments done using that software but not submitted to
the competition. This is the case for both the Simulated Car
Racing competition and the Mario AI competition.
Keep evolving and being relevant. The 2k BotPrize
managed to get numerous submissions during the six years it
was running. More importantly, these submissions generally
became better at fooling the judges, until an entry (as
mentioned above) finally managed to win the prize in 2013.
In its lifetime, it saw at least one major overhaul of its
rules, and there was continual improvement of the Pogamut framework it was built on. This is an example of a competition
that managed to keep evolving and stayed relevant until
the very end. Another good example here is the StarCraft
competitions. These competitions have run since 2010, and
the submissions to the competitions have been getting better
and more sophisticated each year. This is somewhat surprising,
as the complexity of the best controllers is such
that it could be expected to deter many students and other
more casual participants. But apparently, many of the
best competitors in that competition are either researchers on
long-term or permanent contracts, or developers outside of
academia working in their spare time.
Competitions that keep being relevant have at least two
things in common: (1) the organisers keep investing time
in e.g. improving the usability of the software, bug-fixing
both rules and software, keeping results tables up to date and
promoting the competition, and (2) the underlying problem
is not perceived as solved, but it is seen as realistic to make
short-term advancement towards solving the problem.
V. HOW TO SUCCEED
The following advice is based on the fates of existing
competitions discussed above, and on my own experience of
running competitions. It can be thought of as “the ten habits
of highly successful competition organisers”. The list is
ordered by how well-corroborated these habits are by existing
competitions. I regard the first five as purely beneficial habits
that you should follow, whereas the latter five are positive in
most cases, though there might be situations where they cannot
or should not be followed.
Choose a fun game. A game that humans play of their
own free will is more interesting to write AI for. It also helps
if the game is famous so that many have already played it.
Be transparent and reliable. Lay down the rules for your
competition early on. Be clear about how scoring will happen
and whether you have an open-source requirement for your
submissions. Stick to your rules. Keep your website updated.
Be platform-agnostic. It is often erroneously assumed that
academic AI researchers are willing to spend time learning
any particular programming language or build system that the
competition might require, or go out and buy the hardware or
software it requires. A good competition builds on software
that runs on any platform, and can easily interface with
several major programming languages in use in academia
(such as Java, C and Python).
Be persistent. Running a competition once is of limited
value. Running a competition ten times over the space of
five years is of much higher value, and does not require
more initial effort, but does require much more persistence.
Repeating a competition allows competitors to improve and
tune their submissions, and allows people who see earlier
competition events to get inspired and participate in later
events. Therefore it is worth running again even if you get
no submissions the first time. Establish a clear governance
structure, and make sure there is someone there to take over
running the competition when you or your students graduate.
Run a discussion group. Even though you keep your
website updated, if your competition is popular there will
be more questions than you can handle, especially regarding
technical details. So start a discussion group (e.g. using
Google Groups) and invite all competitors and other users
of the software to do technical support for each other.
Offer money. Not only are researchers, like other humans,
motivated by money, but the fact that someone is prepared
to pay money to the winner makes the competition seem
more serious.
Avoid network communication. It should be possible to link
directly to the competition software, as overhead for communication
via IP can slow down the software substantially.
Make it possible to speed up the game. Most forms of
reinforcement learning, in particular evolutionary computation,
require many thousands of trials. If you want your
competitors to be able to use learning algorithms effectively,
make sure your game can be sped up by several orders
of magnitude. This unfortunately causes problems for using
existing closed-source commercial games as a basis for a
competition, as they can rarely be sped up significantly.
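One common way to achieve this, sketched below using the hypothetical GameSimulation and Agent interfaces from Section II (and not how any particular framework does it), is to decouple the simulation update from rendering and wall-clock time, so that the same loop serves both human spectators and learning algorithms.

interface Renderer {
    void draw(GameSimulation game);
}

final class GameLoop {
    // The simulation is advanced by step(); drawing and sleeping only
    // happen in real-time mode. In headless mode the loop runs flat
    // out, typically several orders of magnitude faster.
    static void run(GameSimulation game, Agent agent, Renderer renderer,
                    boolean realTime) throws InterruptedException {
        while (!game.isOver()) {
            game.step(agent.act(game.observe()));
            if (realTime) {
                renderer.draw(game);
                Thread.sleep(1000 / 60); // roughly 60 ticks per second
            }
        }
    }
}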
Make it really easy to get started. This is surprisingly often
overlooked. An average postgraduate student or faculty AI
researcher, who might not be a stellar programmer, should be
able to have a proof-of-concept submission up and running
within hours. Instructions should be less than a page.
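In practice this means shipping a working sample entry. Again using the hypothetical interfaces above, the entire starting point for a competitor could be as small as the following: copy the file, replace the random action choice, and run the main method.

import java.util.Random;

final class SampleAgent implements Agent {
    private final Random rng = new Random();

    // Baseline behaviour: pick one of (say) four actions at random.
    // A competitor's first task is simply to replace this method body.
    public int act(double[] observation) {
        return rng.nextInt(4);
    }

    public static void main(String[] args) {
        // CompetitionGame is a hypothetical GameSimulation implementation
        // that a competition would ship with its starter kit.
        GameSimulation game = new CompetitionGame();
        double score = HeadlessEval.evaluate(game, new SampleAgent(), 42L);
        System.out.println("Sample agent score: " + score);
    }
}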
Keep everything open source. Competition entries should
be open source so as to allow competitors to learn from
each other, and so as to prevent cheating. It might be
best to release the source code of entrants only after each
competition event, so as to avoid copying techniques during
the run-up to an event. The competition software itself
should be open source if possible, as it is the only fully
complete specification of the competition rules. This is a
question of fairness: closed-source software can always be
decompiled by anyone with sufficient time and resources, so
keeping it closed only advantages those who can afford to do so.
(The parameters, levels or datasets used for the final test can
be kept secret until the competition event.) Note that it is not
enough to stipulate open access to source code; all the
source code should actually be available on the website.