A brief overview of AI planningJussi Rintanen, June 2010
A brief overview of AI planning
The planning problem in Artificial Intelligence is about
the decision making performed by intelligent creatures like robots,
humans, or computer programs when trying to achieve some goal.
It involves choosing a sequence of actions that will (with a high likelihood)
transform the state of the world, step by step, so that it will
satisfy the goal. The world is typically viewed to consist of atomic
facts, and actions make some facts true and some facts false.
In the following we discuss a number of ways how to formalize planning,
and how the planning problem can be solved automatically.
Models of state transition systems
Most works on planning use a state-transition system model
(we often just write transition system), in
which the world/system consists of a (finite or infinite) number of
states, and actions/transitions change the current state to
a next state. (Exceptions to this are planning with Hierarchical Task Networks (HTN), and planning problems closer to Scheduling, in which the world states are not represented explicitly.)
There are several possible representations for state-transition systems.
A state-transition system can be represented as an arc-labeled directed
multi-graph.
Each node of the graph is a state, and the arcs
represent actions.
An "action" is usually something that can be taken in several states,
with the same or different effects. All the arcs corresponding to
one action are labeled with the name of the action.
There may be more than one action that moves us from state A to state B,
and this means that there are more than one arc from A to B (this is why we said that the graph is a multi-graph.)
Succinct representations of transition systems
In planning (similarly to other areas, including Discrete Event Systems and Computer-Aided Verification), the transition systems are usually not represented
explicitly as a graph, simply because those graphs could be far too big.
Instead, a compact (succinct, factored) representation
is used.
These representation are typically based on state variables.
A state, instead of being an atomic object, is a valuation
of a (finite) number of state variables.
If there are state variables A : {1,2,3}, B : {0,1} and C : {0,1}, then
there are 3*2*2* states 100, 200, 300, 101, 201, 301, 110, 210, 310, 111,
211, and 311.
If the domain of each state variable is finite, and there are finitely many state variables, then the number of states is finite as well.
When states are represented as valuations of state variables,
an action can be represented as a procedure
or a program for changing the values of the state variables.
Below we explain a couple of possibilities how these procedures
can be defined.
STRIPS
The simplest language used for formalizing actions is the STRIPS
language.
In STRIPS, the state variables have the domain {0,1} (equivalently
{FALSE, TRUE}),
and an action consists of three sets of state variables,
the PRECONDITION, the ADD={a1,a2,...,an} list, and the DELETE={d1,d2,...,dm} list (it is assumed that ADD and DELETE don't intersect.)
An action is possible in a state if all the variables in PRECONDITION
have the value 1.
Taking the action corresponds to executing the following program, consisting
of assignment statements only:
a1 := 1
a2 := 1
...
an := 1
d1 := 0
d2 := 0
...
dm := 0.
In STRIPS planning, a goal is usually expressed as a set of
state variables. A state is a goal state if all of the goals
have the value 1 in it.
Petri nets
Petri nets are a model of state transition
systems in which several transitions may take place simultaneously in the sense
that they are independent.
Some of the restricted Petri net models (e.g. 1-safe state-transition nets)
are equivalent to the propositional STRIPS model.
For Petri nets, the state variables are the places.
In state-transition nets, each place can hold 0 or more tokens.
Essentially, a place is a state variable with the set of natural numbers
as its domain.
The class of 1-safe Petri nets adds the condition that each place can
only hold 0 or 1 tokens. This way each state variable has the domain {0,1},
and there will be only a finite number of states.
Transitions (actions) in Petri nets are described slightly differently.
Each transition has a set of predecessor places and a set of
successor places.
The transition is possible if all predecessor places N have a token
(this corresponds to the PRECONDITION in STRIPS).
When the transition is fired, all the successor places will
receive an additional token.
STRIPS actions can be relatively easily translated into Petri nets with
the 1-safety property (see Hickmott et al., IJCAI'07),
but there are some small complications because in STRIPS the assignment
a1 := 1 has the same result no matter what the old value of a1 was,
whereas in Petri nets if a1 is a successor place, the number of tokens
is increased by one, and may become something else than 1
if the place already had one or more tokens.
Petri nets in general cannot be translated into STRIPS, because in general there may be any number of tokens in a place, and consequently an infinite number of states. Petri nets with the 1-safety property however can be translated.
C. A. Petri, Kommunikation mit Automaten, Institut für instrumentelle Mathematik, Universität Bonn, Schriften des IIM, No. 2, Germany, 1962.
Murata, T., Petri nets: properties, analysis and applications, Proceedings of IEEE, 77(3), pp. 541-580, 1989.
PDDL/ADL
PDDL/ADL is a generalization of STRIPS.
Here we describe a propositional variant of PDDL/ADL.
(PDDL actually has a Lisp-like syntax, but we don't use it here.)
The differences of PDDL in comparison to STRIPS are
The PRECONDITION may be an arbitrary Boolean combination of atomic facts
about the state variables.
Atomic facts say something about one state variable,
for example a=0 or b=1.
A precondition could for example be (a=1)∨¬(b=1∧c=0).
Instead of the unconditional assignments represented by ADD and DELETE,
the effects may be conditional. This means that the effects are
of the form,
IF condition THEN a := v
where the condition is a Boolean combination of facts.
STRIPS corresponds to PDDL with trivial conditions that are always true (the condition is the constant TRUE).
Goals may be Boolean combinations of atomic facts (formulas).
PDDL can be reduced to STRIPS, but
there is NO one to one correspondence between the PDDL actions and
the STRIPS actions, because several STRIPS actions may be needed for
one PDDL action, and
the set of STRIPS actions may have a size that is exponential
in the size of the set of PDDL actions. This is because
a PDDL action may need to be reduced to an exponential number of STRIPS
actions. For example, a PDDL action with effects
IF a1=0 THEN a1 := 1
IF a1=1 THEN a1 := 0
IF a2=0 THEN a2 := 1
IF a2=1 THEN a2 := 0
...
IF an=0 THEN an := 1
IF an=1 THEN an := 0
corresponds to 2n STRIPS actions because the effects of
the actions are different in 2n different classes of states,
and for each class of state a different STRIPS action is needed.
Representation of actions in the propositional logic
Here we restrict to Boolean state variables, and will denote the atomic facts a=0 and a=1 respectively by a and ¬a.
In the representation of actions and change in the propositional logic
we need two sets of propositional variables: one expressing the values
of the state variables in the predecessor state, and one for the successor
state. We distinguish between the two by the ' sign so that the old
value of x is denoted by the propositional variable x and the new value by
the propositional variable x'.
All the above action representations (STRIPS, 1-safe Petri nets, PDDL) can
be compactly represented in the propositional logic.
Given an action, its logic representation consists of two parts:
the precondition formula P (exactly like in PDDL)
description of what happens to each state variable,
which is a conjunction of formulas, one conjunct for each state variable.
Each state variable ai is represented as an equivalence
F(a1,a2,...,an) ↔ a'i
where F is a Boolean function on the state variables (a formula).
For STRIPS the functions F are trivial: for a state variable x they are
either x, the constant TRUE, or the constant FALSE. This means that
in STRIPS either the value of a state variable remains the same, or the
variable unconditionally becomes TRUE or FALSE.
The reduction from 1-safe Petri nets to the propositional logic goes
through the STRIPS representation.
For PDDL the functions F may be arbitrarily complex. We don't give
a systematic mapping from PDDL to the propositional logic here, although
the mapping is straightforward.
An action consisting of the precondition d and
the conditional effect IF a∧b THEN c is represented by the propositional
formula
d∧(((a∧b)∨c)↔c').
The formula says that the new value of c will be TRUE if and only if
a and b are both true or c was true already.
State-space search
State-space search is one of the earliest methods for answering
questions about transition systems, including model-checking (verification),
planning and others. It is easy to implement efficiently, but, when
the number of states is high, its applicability is limited by
the necessity to do the search only one state at a time.
When the number of states is some tens of millions or less,
this approach is efficient and can give guarantees of finding solutions
in a limited amount of time.
Search algorithms
There are several types of search algorithms that are routinely
applied in planning. These include the following.
well-known uninformed search algorithms like depth-first search,
breadth-first search
systematic heuristic search algorithms with optimality guarantees, for example
A* and its variants like IDA*, WA*
systematic heuristic search algorithms without optimality guarantees,
for example the standard "best-first" search algorithm which is like
A* but ignores the cost-so-far component of the valuation function
incomplete unsystematic search algorithms, most notably stochastic
search algorithms
Incomplete search algorithms can be useful for specific problems when good
heuristics are available, or as a part of a portfolio of search algorithms
(see portfolios below.)
References
J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley, 1984.
Symmetry reduction
Symmetry reduction methods try to decrease the effective
search space by recognizing symmetries in the state-space
graph.
In planning, symmetries are typically caused by the interchangeability
of objects. If there is a plan that involves some interchangable objects A and B, there is a symmetric plan with the roles of A and B interchanged.
The standard works on symmetry reduction for state-space search are
from early nineties.
References
Starke, P. H., Reachability analysis of Petri nets using symmetries,
Journal of Mathematical Modelling and Simulation in Systems Analysis,
8(4/5), pp. 293-303, 1991.
Partial-order reduction
Partial-order reduction methods try to decrease the number of
search steps by recognizing different forms of independence of
actions/transitions. The basic idea is that if actions/transitions A and B
are independent in the sense that they can be taken in either order
A;B or B;A, one only needs to consider one of these orderings.
Many of the existing partial-order reduction methods address deadlock
detection, but there is a simple reduction from the problem of
reaching a goal state to the problem of reaching a deadlock state,
and therefore these methods are easily applicable to the planning problem, too.
Antti Valmari, Stubborn Sets for Reduced State Space Generation, Advances in Petri Nets 1990. 10th International Conference on Applications and Theory of Petri Nets, Bonn, Germany, Lecture Notes in Computer Science 483, pp. 491-515, Springer-Verlag, 1991.
Heuristics
There are several ways of systematically (automatically) creating heuristics.
the main approaches are different forms of relaxation, in which either
the original problem instance is simplified, and then the simplified
problem instance is solved, or
the problem instance is unmodified, but the definition of solution
is simplified so that it can be efficiently solved. (In many cases the same
thing can be achieved in either of these ways, but the latter option seems
to be more general.)
Well-known examples of this are the following.
Computing the max-heuristic of Bonet and Geffner for STRIPS
proceeds by ignoring all effects x := 0 and retaining all effects x := 1, and
then finding a plan for this simplified problem instance.
The shortest solution(s) to this problem instance will give a lower bound
for the plan length for the original instance, and can be used as a heuristic
in a heuristic search algorithm such as A* to find shortest plans.
Another, general method for deriving heuristics is
pattern databases (Culberson & Schaeffer, 1998, with application
to planning by Edelkamp in 2001).
The idea is to relax the original problem instance in alternative ways,
by eliminating all state variables except for a small number n
of them (with n typically 10 or less, so that the problem
becomes almost trivial), and then solving the simplified problem instances
optimally.
The length of any of these optimal solutions is a lower bound for
the length of the shortest plan of the original problem instance.
generalized abstractions of Dräger et al. (2006), which
can be viewed as a generalization of pattern databases that allows
abstraction by arbitrary aggregation of states, not only by ignoring
some of the state variables (this was called combine-and-abstract by
Dräger et al., and planning researchers have renamed this to
merge-and-shrink).
There are methods for deriving heuristics from other heuristics.
The simplest one, used in connection for pattern databases,
is to take the maximum of different heuristics. Since all
of the constituent heuristics are lower bounds for the true shortest plan
length, their maximum is a lower bound as well.
There are variations of the "taking the maximum" scheme that try
to be more accurate: when two heuristics respectively give lower bounds
n and m, and their calculation (e.g. through the max-heuristics or pattern databases) involved completely disjoint sets of actions,
then also the sum of the heuristics is still a lower bound
for the length of the shortest plan of the original plan.
The use of this strategy is more complicated, because it may be difficult
to find a partition the original planning problem in a way that the disjointness
property can be guaranteed in a useful way.
The above discussion is about admissible heuristics, which represent
true lower bounds for the actual shortest plan length.
Admissible heuristics are needed when trying to find optimal
plans with algorithms such as A*.
When optimality is not required, heuristics don't have to be admissible,
and then basically anything goes. Work on finding efficient non-admissible
heuristics is typically driven by their performance w.r.t. standard
benchmark problems, as they don't have to satisfy any special properties.
References
B. Bonet and H. Geffner, Planning as Heuristic Search, Artificial Intelligence Journal, 129(1-2), pp. 5-33, 2001.
K. Dräger, B. Finkbeiner and A. Podelski, Directed Model Checking with Distance-Preserving Abstractions, in Model Checking Software, LNCS 3925, pages 19-34, 2006.
J. C. Culberson and J. Schaeffer, Pattern Databases, Computational
Intelligence, 14(4), pp. 318-334, 1998.
S. Edelkamp, Planning with Pattern Databases, Proceedings of
the 6th European Conference on Planning (ECP-01), to appear.
Planning by SAT and constraint-satisfaction
Many transition systems have too many states to consider them explicitly
one by one, and then factored representations that allow representing
large numbers of states and state sequences may be a more efficient
alternative.
Such representations and search methods are known as symbolic
or factored.
The earliest symbolic search methods were based on Binary Decision Diagrams
(BDDs, discussed below). The currently most popular method is based on
reduction to the propositional satisfiability problem SAT and the use
of efficient algorithms for solving the SAT problem.
SAT methods are less prone to excessive memory consumption than BDDs.
Unlike state-space search, their memory consumption is not linear in
the number of visited (stored) states, and they can be implemented
with a memory consumption that is linear in the length of a plan or
transition sequence (as opposed to the in general exponential memory
consumption of explicit state-space search.) However, the currently
leading algorithms for the SAT problem gain much of their efficiency
by consuming more memory than the theoretical optimal algorithms.
All the main improvements to state-space search (symmetry reduction,
partial-order reduction, heuristics) have counterparts in the SAT
setting, in the form of symmetry-breaking constraints,
parallel or partially-ordered plans, and SAT-solvers
with planning-specific heuristics.
We will not be discussing these in more detail here.
SAT algorithms
References to the technology underlying the current generation of
efficient SAT algorithms see the following.
References
D. Mitchell, A SAT Solver Primer, EATCS Bulletin, 85, pp. 112-133, February 2005.
P. Beame, H. Kautz, and A. Sabharwal, Towards Understanding and Harnessing the Potential of Clause Learning, Journal of AI Research, 22, pp. 319-351, 2004.
Reduction to the SAT problem
Plans can be found by solving a satisfiability (SAT) problem, or solving
a constraint satisfaction problem (CSP). We describe the SAT case; CSP and SAT frameworks are very much similar, and the ideas expressed in terms of SAT are directly applicable in the CSP setting.
The idea is to have formulas PT (where T is a positive integer)
such that PT is satisfiable if and only if there is a plan
with a horizon 0,1,...,T.
Depending on the definition of plans, at each time point 0,1,...,T-1
there may be one or more actions.
The formulas PT can be constructed from the logic representation
of actions we described earlier.
Instead of using the propositional variables A∪A'
where A = { a1,..., an } and A' = { a'1,..., a'n }
for the values of the state variables in the current and in the next time
point, we have the propositional variables
A@i = { a1@i,..., an@i } for different time points
0≤i≤T.
Given a formula F for some action expressed in terms of
the variables in A∪A',
we obtain F@i by replacing all these variables by
a1@i,..., an@i,a1@(i+1),..., an@(i+1).
When we have formulas F1,...,Fn representing n different
actions, we can represent the transition relation between states
at time i as the formula
R@i = F1@i∨F2@i∨...∨Fn@i.
Now a sequence of T actions can be represented as
R@0∧R@1∧...∧R@(T-1).
Finally,finding a plan (of length T) from any state satisfying
a formula I to any state satisfying formula G can be expressed
as the formula
PT = I@0∧R@0∧R@1∧...∧R@(T-1)∧G@T, where
I@0 and G@T are obtained from I and G by replacing each state variable a
in these formulas respectively by a@0 and a@T.
The time it takes to test the satisfiability of the formulas
PT strongly depends on the number of propositional variables
in the formulas, which, for a given planning problem, is determined
by the parameter T. Most works that improve the efficiency
of the satisfiability tests do this by decreasing T by allowing more
than one action at each time point. See the references for more details.
References
H. Kautz and B. Selman, Pushing the envelope: planning, propositional logic, and stochastic search, AAAI'96, pp. 1194-1201, AAAI Press, 1996.
See my planning as satisfiability page for more details, including high-performance implementations of the approach.
Planning by "symbolic" search methods such as Binary Decision Diagrams
Symbolic search methods here means the use of Binary Decision Diagrams (BDD)
and similar normal forms (such as DNNF) for propositional formulas for
representing sets of states and transition relations compactly.
"Symbolic" planning uses exactly the same formulas we used
above with SAT-based planning. The difference is that instead of
constructing the formulas
PT = I@0∧R@0∧R@1∧...∧R@(T-1)∧G@T
for different values of T>0 and testing their satisfiability,
we incrementally and implicitly compute from the formulas
I@0
I@0∧R@0
I@0∧R@0∧R@1
I@0∧R@0∧R@1∧R@2
the sets of states that are respectively reachable by 0, 1, 2 or more actions,
and see if some of the goal states are reachable.
Instead of the timed variables a@i we only use the variables a and a' for
the current and the next state.
I represents the initial state(s).
Let R = F1∨F2∨...∨Fn where
the Fi formulas are as defined earlier.
Now I∧R represents the set S of state pairs (s,s') where state s' follows s
when one of the actions is taken and s is an initial state.
To compute the set { s' | (s,s') ∈S } we eliminate from I∧R all
propositional variables in A.
For this we use the existential abstraction operation defined
by ∃x.F = F[0/x]∨F[1/x].
For sets X = { x1,...,xk } of propositional variables
we define ∃X.F = ∃x1.∃x2....∃xk.F.
We also define a renaming operation: F[A'/A] means that all occurrences
of variables in A' are replaced by the corresponding variable in A
(for example a' is replaced by a.)
Now sets of state reachable by 0,1,2 and more steps are represented by
the following formulas:
I
(∃A.(I∧R))[A'/A]
(∃A.(((∃A.(I∧R))[A'/A])∧R))[A'/A]
(∃A.((∃A.(((∃A.(I∧R))[A'/A])∧R))[A'/A])∧R)[A'/A]
This can be more clearly written as a small program which computes the sets
of states reachable with 0,1,2 and more steps until a set that intersects
the goals G is reached.
states[0] := I;
t := 0;
while states[t]∧G is unsatisfiable do
t := t+1;
states[t] := (∃A.(states[t-1]∧R))[A'/A];
end while
The above algorithm terminates at iteration t iff t is the length
of the shortest action sequence from a state in I to a state in G.
At this point a plan can be output by the following algorithm.
goal := any valuation that satisfies state[t]∧G;
while goals does not satisfy I do
i := any action index i such that states[t]∧Fi↓(goal[A/A']) is satisfiable;
output action i;
goal := any valuation (on A) that satisfies states[t]∧Fi↓(goal[A/A']);
end while
Above, goal[A/A'] is a valuation like goal, except that it is defined on
A' instead of A, that is. goal[A/A'](a') = goal(a) for all a∈A.
And R↓v for a valuation v on A is a formula obtained from R by replacing
each occurrence of a∈A by TRUE if v(a)=true and by FALSE if v(a)=false.
Adnan Darwiche, Decomposable negation normal form, Journal of the ACM, 48(4), 2001.
This paper proposed another normal form that is often more compact than BDDs.
Coudert, Berthet and Madre, Verification of Synchronous Sequential Machines Based on Symbolic Execution}, in Automatic Verification Methods for Finite State Systems, LNCS 407, Springer-Verlag, 1990. This paper and the next are the first ones to use BDDs for solving
state-space reachability problems.
Burch, Clarke, Long, MacMillan and Dill, Symbolic Model Checking for Sequential Circuit Verification, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 13(4), 1994.
Meta level search strategies
When solving a planning problem, there are some meta-level strategies
that can be applied to any type of search algorithm (or their combination).
Algorithm portfolios
The idea of using a combination of several techniques has already been
mentioned in the context of heuristics, where aggregates of two or
more unrelated heuristics can be formed in different ways.
Algorithm portfolios do the same at the highest level of a program
that solves a planning problem.
The general idea is to utilize the complementarities between different
search methods.
If the strengths of algorithm 1 are sufficiently complementary to
the strengths of algorithm 2, the combination of these two algorithms
may be much stronger than either of the components.
A portfolio can be used in different ways.
Selection: Choose one algorithm from the portfolio, based on the properties
of the problem instance, and run it.
Sequential composition:
Run several algorithms according to a given schedule, for example
the first algorithm for n seconds or until some termination
criterion is satisfied, and if no solution has been found,
terminate the run and continue with the next algorithm.
Parallel composition:
Run several algorithms in parallel (in one or several CPUs/cores), with
the same or different rates, until one of them finds a solution.
In the planning area, the construction of portfolios has been more of
an art than a science, typically constructing a fixed portfolio by
selecting a suitable combination of algorithms that perform well with
the target problem set.
Theoretically more sound methods have been presented for example in the SAT area.
References
Gomes & Selman, Algorithm portfolio design: theory vs. practice,
Uncertainty in AI, 1997.