18-04-2013, 02:06 PM
SixthSense: Fast and Reliable Recognition of Dead Ends in MDPs
SixthSense.pdf (Size: 257.93 KB / Downloads: 17)
Abstract
The results of the latest International Probabilistic Planning
Competition (IPPC-2008) indicate that the presence of dead
ends, states with no trajectory to the goal, makes MDPs hard
for modern probabilistic planners. Implicit dead ends, states
with executable actions but no path to the goal, are particularly
challenging; existing MDP solvers spend much time and
memory identifying these states.
As a first attempt to address this issue, we propose a machine
learning algorithm called SIXTHSENSE. SIXTHSENSE helps
existing MDP solvers by finding nogoods, conjunctions of
literals whose truth in a state implies that the state is a dead
end. Importantly, our learned nogoods are sound, and hence
the states they identify are true dead ends. SIXTHSENSE is
very fast, needs little training data, and takes only a small
fraction of total planning time. While IPPC problems may
have millions of dead ends, they may typically be represented
with only a dozen or two no-goods. Thus, nogood learning
efficiently produces a quick and reliable means for dead-end
recognition. Our experiments show that the nogoods found
by SIXTHSENSE routinely reduce planning space and time
on IPPC domains, enabling some planners to solve problems
they could not previously handle.
INTRODUCTION
Recent work on “probabilistic interestingness” has suggested
that probabilistic planners have difficulty on domains
with avoidable dead ends, non-goal states with no potential
trajectories to the goal [14]. This conjecture is supported
by the recent International Probabilistic Planning Competition
(IPPC) [5], in which domains with a complex deadend
structure, e.g., Exploding Blocks World, have proven
the most challenging. Surprisingly, however, there has been
little research on methods for quickly and reliably avoiding
such dead ends in Markov Decision Processes (MDPs).
It is useful to distinguish between two types of dead ends.
If a non-goal state doesn’t satisfy the preconditions of any
action, we term it an explicit dead end. Implicit dead ends,
on the other hand, have executable actions (just no workable
path to the goal) and are much harder for MDP solvers to
detect than the explicit type.
BACKGROUND
Markov Decision Processes (MDPs). In this paper, we focus
on probabilistic planning problems that are modeled by
factored goal-oriented indefinite-horizon MDPs. They are
defined as tuples of the form hS;A; T ; C; G; s0i, where S
is a finite set of states, A is a finite set of actions, T is a
transition function S A S ! [0; 1] giving the probability
of moving from si to sj by executing a, C is a map
S A ! R+ specifying action costs, s0 is the start state,
and G is a set of (absorbing) goal states. Indefinite horizon
refers to the fact that the total action cost is accumulated
over a finite-length action sequence whose length is apriori
unknown. Our method also handles discounted infinitehorizon
MDPs, which reduce to the goal-oriented case [1].
APPROACH
A domain may have an exponential number of dead end
states, but usually there are just a few “explanations” for why
a state has no goal trajectory. A Mars rover flipped upside
down is in a dead-end state, irrespective of the values of the
other variables. In the Drive domain of IPPC-06, all states
with the (not (alive)) literal are dead ends. Knowing these
explanations obviates the dead-end analysis of each state individually
and the need to store the explained dead ends in
order to identify them later.
The method we are proposing, SIXTHSENSE, strives to
induce explanations as above in the factored MDP setting
and use them to help the planner recognize dead ends
quickly and reliably. Formally, its objective is to find nogoods,
conjunctions of literals with the property that all
states in which such a conjunction holds (or, the states it represents)
are dead ends. After at least one nogood is discovered,
whenever the planner encounters a new state, SIXTHSENSE
notifies the planner if the state is represented by a
known nogood and hence is a dead end.
DISCUSSION
Although our preliminary experiments clearly indicate the
benefits of nogoods and SIXTHSENSE, we believe that
SIXTHSENSE’s effectiveness in some settings will increase
if the algorithm is extended to generate nogoods in firstorder
logic. This capability would be helpful, for example,
inthe EBW domain, where, besides the immediate nogoods,
there are others of the form “block b is not in its goal position
and has an exploded block somewhere in the stack above
it”. Indeed, to move b one would first need to remove all
blocks, including the exploded one, above it in the stack, but
in EBW exploded blocks can’t be relocated. Expressed in
first-order logic, this statement would clearly capture many
dead ends. In propositional logic, however, it would translate
to a disjunction of many ground conjunctions, each of
which is a nogood. Each such ground nogood separately accounts
for a small fraction of dead ends in the domain and
is thus almost statistically unnoticeable, preventing SIXTHSENSE
from discovering it. Granted, our experiments imply
that first-order nogoods are not numerically significant in the
benchmark EBW problems. However, one can construct instances
where this would not be true.
RELATED WORK
To our knowledge, there have been no explicit previous
attempts to handle identification of dead ends in MDPs.
The “Sensitive but Slow” and “Fast but Insensitive” mechanisms
weren’t actually designed specifically for this purpose
and are unsatisfactory in many ways described in the Experiments
section. The general approach of SIXTHSENSE
somewhat resembles work on explanation-based learning
(EBL) [7]. In EBL, the planner would try to derive control
rules for action selection by analyzing its own execution
traces. Besides EBL, SIXTHSENSE can also be viewed as
a machine learning algorithm for rule induction, similar in
purpose, for example, to CN2 [8]. While this analogy is
valid, SIXTHSENSE operates under different requirements
than most such algorithms, because we demand that SIXTHSENSE-
derived rules (nogoods) have zero false-positive rate.
Last but not least, our term “nogood” shares its name with
a concept from the area of constraint satisfaction problems
(CSPs). However, the semantics of nogoods is CSPs is different,
and the methodology for finding them, largely summarized
in [9], has nothing in common with ours. The
idea of leveraging basis functions was inspired by their use
in [10] and ReTrASE, as well as the evidence provided
by solvers like FFReplan and RFF that many deterministic
plans that we derive basis functions from can be computed
very quickly in quantities.
CONCLUSION
We have identified recognition of implicit dead ends in
MDPs as a source of time and memory savings for probabilistic
planners. To materialize these benefits, we proposed
SIXTHSENSE, a machine learning algorithm that uncovers
the few concise “explanations” of implicit deads inherent in
most real-life and artificial scenarios requiring planning under
uncertainty. The explanations (nogoods) help the planner
recognize most dead ends in a problem quickly and reliably
reliably,
removing the need for a separate analysis of each such
state and expensive methods to do it. We feel that in the
future SIXTHSENSE could be improved further by being extended
to handle first-order logic expressions, which may
be useful in domains like EBW. We empirically illustrate
SIXTHSENSE’s operation and show how, even as it is, it can
help a wide range of existing planners save a large fraction
of resources on problems with dead ends. Moreover, these
gains are achieved with very little overhead.