A brief overview of AI planning Jussi Rintanen, June 2010


The planning problem in Artificial Intelligence is about the decision making performed by intelligent creatures like robots, humans, or computer programs when trying to achieve some goal. It involves choosing a sequence of actions that will (with a high likelihood) transform the state of the world, step by step, so that it will satisfy the goal. The world is typically viewed as consisting of atomic facts, and actions make some facts true and some facts false. In the following we discuss a number of ways to formalize planning, and how the planning problem can be solved automatically.

Models of state transition systems

Most works on planning use a state-transition system model (we often just write transition system), in which the world/system consists of a (finite or infinite) number of states, and actions/transitions change the current state to a next state. (Exceptions to this are planning with Hierarchical Task Networks (HTN), and planning problems closer to Scheduling, in which the world states are not represented explicitly.)

There are several possible representations for state-transition systems. A state-transition system can be represented as an arc-labeled directed multi-graph. Each node of the graph is a state, and the arcs represent actions. An "action" is usually something that can be taken in several states, with the same or different effects. All the arcs corresponding to one action are labeled with the name of the action. There may be more than one action that moves us from state A to state B, and in that case there is more than one arc from A to B (this is why we said that the graph is a multi-graph).

Succinct representations of transition systems

In planning (similarly to other areas, including Discrete Event Systems and Computer-Aided Verification), the transition systems are usually not represented explicitly as a graph, simply because those graphs could be far too big. Instead, a compact (succinct, factored) representation is used. These representations are typically based on state variables. A state, instead of being an atomic object, is a valuation of a (finite) number of state variables.

If there are state variables A : {1,2,3}, B : {0,1} and C : {0,1}, then there are 3*2*2 = 12 states. Writing a state as the values of A, B and C in that order, these are 100, 200, 300, 101, 201, 301, 110, 210, 310, 111, 211, and 311.
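The enumeration of these states can be sketched in a few lines of Python (the dictionary representation of a valuation is just one possible, illustrative choice):

```python
from itertools import product

# State variables and their domains, as in the example above.
domains = {"A": (1, 2, 3), "B": (0, 1), "C": (0, 1)}

# A state is a valuation: here, a dictionary mapping each variable to a value.
states = [dict(zip(domains, values)) for values in product(*domains.values())]

print(len(states))  # 3*2*2 = 12
```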

If the domain of each state variable is finite, and there are finitely many state variables, then the number of states is finite as well.

When states are represented as valuations of state variables, an action can be represented as a procedure or a program for changing the values of the state variables. Below we explain a couple of ways in which these procedures can be defined.

STRIPS

The simplest language used for formalizing actions is the STRIPS language. In STRIPS, the state variables have the domain {0,1} (equivalently {FALSE, TRUE}), and an action consists of three sets of state variables, the PRECONDITION, the ADD={a1,a2,...,an} list, and the DELETE={d1,d2,...,dm} list (it is assumed that ADD and DELETE don't intersect.)

An action is possible in a state if all the variables in PRECONDITION have the value 1. Taking the action corresponds to executing the following program, consisting of assignment statements only:
a1 := 1
a2 := 1
...
an := 1
d1 := 0
d2 := 0
...
dm := 0.

In STRIPS planning, a goal is usually expressed as a set of state variables. A state is a goal state if all of the goals have the value 1 in it.
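The STRIPS semantics above can be sketched in a few lines, assuming an illustrative representation in which a state is the set of variables that have the value 1:

```python
# A state is the set of state variables with value 1; all others have value 0.
def applicable(state, precondition):
    # The action is possible iff every precondition variable has value 1.
    return precondition <= state

def apply_action(state, add, delete):
    # The assignments a := 1 for ADD and d := 0 for DELETE.
    return (state | add) - delete

def is_goal(state, goals):
    # A goal state gives value 1 to all goal variables.
    return goals <= state

# A hypothetical action with PRECONDITION={p}, ADD={q}, DELETE={p}.
state = frozenset({"p", "r"})
if applicable(state, {"p"}):
    state = apply_action(state, frozenset({"q"}), frozenset({"p"}))
```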

Petri nets

Petri nets are a model of state transition systems in which several transitions may take place simultaneously in the sense that they are independent. Some of the restricted Petri net models (e.g. 1-safe state-transition nets) are equivalent to the propositional STRIPS model.

For Petri nets, the state variables are the places. In state-transition nets, each place can hold 0 or more tokens. Essentially, a place is a state variable with the set of natural numbers as its domain. The class of 1-safe Petri nets adds the condition that each place can only hold 0 or 1 tokens. This way each state variable has the domain {0,1}, and there will be only a finite number of states.

Transitions (actions) in Petri nets are described slightly differently. Each transition has a set of predecessor places and a set of successor places. The transition is possible if all predecessor places have a token (this corresponds to the PRECONDITION in STRIPS). When the transition is fired, one token is removed from each predecessor place and each successor place receives an additional token.
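Firing a transition can be sketched as follows (standard place/transition semantics; the place names are illustrative):

```python
# A marking maps each place to its number of tokens.
def enabled(marking, predecessors):
    # The transition is possible iff every predecessor place has a token.
    return all(marking[p] >= 1 for p in predecessors)

def fire(marking, predecessors, successors):
    m = dict(marking)
    for p in predecessors:
        m[p] -= 1          # remove one token from each predecessor place
    for p in successors:
        m[p] += 1          # add one token to each successor place
    return m

marking = {"p1": 1, "p2": 1, "p3": 0}
if enabled(marking, {"p1", "p2"}):
    marking = fire(marking, {"p1", "p2"}, {"p3"})
```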

STRIPS actions can be relatively easily translated into Petri nets with the 1-safety property (see Hickmott et al., IJCAI'07), but there are some small complications because in STRIPS the assignment a1 := 1 has the same result no matter what the old value of a1 was, whereas in Petri nets if a1 is a successor place, the number of tokens is increased by one, and may end up greater than 1 if the place already had one or more tokens.

Petri nets in general cannot be translated into STRIPS, because in general there may be any number of tokens in a place, and consequently an infinite number of states. Petri nets with the 1-safety property however can be translated.

1. C. A. Petri, Kommunikation mit Automaten, Institut für instrumentelle Mathematik, Universität Bonn, Schriften des IIM, No. 2, Germany, 1962.
2. Murata, T., Petri nets: properties, analysis and applications, Proceedings of IEEE, 77(3), pp. 541-580, 1989.

PDDL

PDDL/ADL is a generalization of STRIPS. Here we describe a propositional variant of PDDL/ADL. (PDDL actually has a Lisp-like syntax, but we don't use it here.)

The differences of PDDL in comparison to STRIPS are

• The PRECONDITION may be an arbitrary Boolean combination of atomic facts about the state variables. Atomic facts say something about one state variable, for example a=0 or b=1. A precondition could for example be (a=1)∨¬(b=1∧c=0).
• Instead of the unconditional assignments represented by ADD and DELETE, the effects may be conditional. This means that the effects are of the form IF condition THEN a := v, where the condition is a Boolean combination of facts. STRIPS corresponds to PDDL with trivial conditions that are always true (the condition is the constant TRUE).
• Goals may be Boolean combinations of atomic facts (formulas).
PDDL can be reduced to STRIPS, but
• there is NO one-to-one correspondence between the PDDL actions and the STRIPS actions, because several STRIPS actions may be needed for one PDDL action, and
• the set of STRIPS actions may have a size that is exponential in the size of the set of PDDL actions. This is because a PDDL action may need to be reduced to an exponential number of STRIPS actions. For example, a PDDL action with effects

IF a1=0 THEN a1 := 1
IF a1=1 THEN a1 := 0

IF a2=0 THEN a2 := 1
IF a2=1 THEN a2 := 0

...

IF an=0 THEN an := 1
IF an=1 THEN an := 0

corresponds to 2^n STRIPS actions, because the effects of the action are different in 2^n different classes of states, and for each class of states a different STRIPS action is needed.
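The state-dependence of conditional effects can be made concrete with a small sketch (the pair representation of effects is an illustrative choice). Each condition is evaluated against the old state, so one such action behaves differently in every class of states:

```python
# An effect is a pair (condition, (variable, value)); conditions are
# evaluated against the OLD state, and all applicable assignments are
# then made simultaneously.
def apply_effects(state, effects):
    new_state = dict(state)
    for condition, (var, value) in effects:
        if condition(state):
            new_state[var] = value
    return new_state

# The "flip every variable" action from the text, for n = 3.
effects = []
for v in ("a1", "a2", "a3"):
    effects.append((lambda s, v=v: s[v] == 0, (v, 1)))
    effects.append((lambda s, v=v: s[v] == 1, (v, 0)))

flipped = apply_effects({"a1": 0, "a2": 1, "a3": 0}, effects)  # every bit flipped
```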

Representation of actions in the propositional logic

Here we restrict to Boolean state variables, and will denote the atomic facts a=0 and a=1 respectively by a and ¬a. In the representation of actions and change in the propositional logic we need two sets of propositional variables: one expressing the values of the state variables in the predecessor state, and one for the successor state. We distinguish between the two by the ' sign so that the old value of x is denoted by the propositional variable x and the new value by the propositional variable x'.

All the above action representations (STRIPS, 1-safe Petri nets, PDDL) can be compactly represented in the propositional logic. Given an action, its logic representation consists of two parts:

1. the precondition formula P (exactly like in PDDL)
2. description of what happens to each state variable, which is a conjunction of formulas, one conjunct for each state variable. Each state variable ai is represented as an equivalence F(a1,a2,...,an) ↔ a'i where F is a Boolean function on the state variables (a formula).
For STRIPS the functions F are trivial: for a state variable x they are either x, the constant TRUE, or the constant FALSE. This means that in STRIPS either the value of a state variable remains the same, or the variable unconditionally becomes TRUE or FALSE.

The reduction from 1-safe Petri nets to the propositional logic goes through the STRIPS representation.

For PDDL the functions F may be arbitrarily complex. We don't give a systematic mapping from PDDL to the propositional logic here, although the mapping is straightforward. An action consisting of the precondition d and the conditional effect IF a∧b THEN c is represented by the propositional formula d∧(((a∧b)∨c)↔c'). The formula says that the new value of c will be TRUE if and only if a and b are both true or c was true already.
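The claim about this formula can be verified mechanically by enumerating all valuations (a brute-force check of the conjunct for c only; the full representation would also contain conjuncts for the other state variables):

```python
from itertools import product

def action_formula(a, b, c, d, c_new):
    # d ∧ (((a∧b) ∨ c) ↔ c')
    return d and (((a and b) or c) == c_new)

# The formula should be true exactly when the precondition d holds and the
# new value of c is TRUE iff a and b are both true or c was already true.
for a, b, c, d, c_new in product([False, True], repeat=5):
    expected = d and c_new == ((a and b) or c)
    assert action_formula(a, b, c, d, c_new) == expected
```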

State-space search

State-space search is one of the earliest methods for answering questions about transition systems, including model-checking (verification), planning and others. It is easy to implement efficiently, but, when the number of states is high, its applicability is limited by the necessity to do the search only one state at a time. When the number of states is some tens of millions or less, this approach is efficient and can give guarantees of finding solutions in a limited amount of time.

Search algorithms

There are several types of search algorithms that are routinely applied in planning. These include the following.
1. well-known uninformed search algorithms like depth-first search, breadth-first search
2. systematic heuristic search algorithms with optimality guarantees, for example A* and its variants like IDA*, WA*
3. systematic heuristic search algorithms without optimality guarantees, for example the standard "best-first" search algorithm, which is like A* but ignores the cost-so-far component of the evaluation function
4. incomplete unsystematic search algorithms, most notably stochastic search algorithms
Incomplete search algorithms can be useful for specific problems when good heuristics are available, or as a part of a portfolio of search algorithms (see portfolios below.)
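As a baseline, uninformed breadth-first search over STRIPS states can be sketched as follows (the set-based action format is an illustrative choice, not a standard one); it finds a shortest plan but must store every visited state:

```python
from collections import deque

def bfs_plan(init, goal, actions):
    # actions: list of (name, precondition, add, delete), each component a set.
    start = frozenset(init)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan  # shortest plan, since BFS expands by plan length
        for name, pre, add, dele in actions:
            if pre <= state:
                successor = frozenset((state | add) - dele)
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, plan + [name]))
    return None  # the goal is unreachable

actions = [("a", {"p"}, {"q"}, set()), ("b", {"q"}, {"r"}, {"p"})]
plan = bfs_plan({"p"}, {"r"}, actions)
```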

References

J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley, 1984.

Symmetry reduction

Symmetry reduction methods try to decrease the effective search space by recognizing symmetries in the state-space graph. In planning, symmetries are typically caused by the interchangeability of objects. If there is a plan that involves some interchangeable objects A and B, there is a symmetric plan with the roles of A and B interchanged.

The standard works on symmetry reduction for state-space search are from the early nineties.

References

1. Starke, P. H., Reachability analysis of Petri nets using symmetries, Journal of Mathematical Modelling and Simulation in Systems Analysis, 8(4/5), pp. 293-303, 1991.

Partial-order reduction

Partial-order reduction methods try to decrease the number of search steps by recognizing different forms of independence of actions/transitions. The basic idea is that if actions/transitions A and B are independent in the sense that they can be taken in either order A;B or B;A, one only needs to consider one of these orderings.

Many of the existing partial-order reduction methods address deadlock detection, but there is a simple reduction from the problem of reaching a goal state to the problem of reaching a deadlock state, and therefore these methods are easily applicable to the planning problem, too.

1. Antti Valmari, Stubborn Sets for Reduced State Space Generation, Advances in Petri Nets 1990. 10th International Conference on Applications and Theory of Petri Nets, Bonn, Germany, Lecture Notes in Computer Science 483, pp. 491-515, Springer-Verlag, 1991.

Heuristics

There are several ways of systematically (automatically) creating heuristics. The main approaches are different forms of relaxation, in which either
• the original problem instance is simplified, and then the simplified problem instance is solved, or
• the problem instance is unmodified, but the definition of solution is simplified so that it can be efficiently solved. (In many cases the same thing can be achieved in either of these ways, but the latter option seems to be more general.)
Well-known examples of this are the following.
• Computing the max-heuristic of Bonet and Geffner for STRIPS proceeds by ignoring all effects x := 0 and retaining all effects x := 1, and then finding a plan for this simplified problem instance.
The shortest solution(s) to this problem instance will give a lower bound for the plan length for the original instance, and can be used as a heuristic in a heuristic search algorithm such as A* to find shortest plans.
• Another, general method for deriving heuristics is pattern databases (Culberson & Schaeffer, 1998, with application to planning by Edelkamp in 2001). The idea is to relax the original problem instance in alternative ways, by eliminating all state variables except for a small number n of them (with n typically 10 or less, so that the problem becomes almost trivial), and then solving the simplified problem instances optimally. The length of any of these optimal solutions is a lower bound for the length of the shortest plan of the original problem instance.
• generalized abstractions of Dräger et al. (2006), which can be viewed as a generalization of pattern databases that allows abstraction by arbitrary aggregation of states, not only by ignoring some of the state variables (this was called combine-and-abstract by Dräger et al., and planning researchers have renamed this to merge-and-shrink).
There are methods for deriving heuristics from other heuristics.
• The simplest one, used in connection for pattern databases, is to take the maximum of different heuristics. Since all of the constituent heuristics are lower bounds for the true shortest plan length, their maximum is a lower bound as well.
• There are variations of the "taking the maximum" scheme that try to be more accurate: when two heuristics respectively give lower bounds n and m, and their calculation (e.g. through the max-heuristic or pattern databases) involved completely disjoint sets of actions, then the sum n+m of the heuristics is still a lower bound for the length of the shortest plan of the original problem instance. The use of this strategy is more complicated, because it may be difficult to partition the original planning problem in a way that guarantees the disjointness property while still being useful.
The above discussion is about admissible heuristics, which represent true lower bounds for the actual shortest plan length. Admissible heuristics are needed when trying to find optimal plans with algorithms such as A*. When optimality is not required, heuristics don't have to be admissible, and then basically anything goes. Work on finding efficient non-admissible heuristics is typically driven by their performance w.r.t. standard benchmark problems, as they don't have to satisfy any special properties.
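As an illustration of the relaxation idea, the max-heuristic for unit-cost STRIPS can be sketched as a fixpoint computation (delete lists are dropped, and the estimated cost of a set of facts is the maximum of the costs of its members; the action format is illustrative):

```python
def h_max(state, goal, actions):
    # actions: list of (precondition, add) pairs; DELETE lists are ignored.
    INF = float("inf")
    cost = {v: 0 for v in state}   # facts true now cost nothing
    changed = True
    while changed:
        changed = False
        for pre, add in actions:
            if all(p in cost for p in pre):
                c = 1 + max((cost[p] for p in pre), default=0)
                for v in add:
                    if c < cost.get(v, INF):
                        cost[v] = c
                        changed = True
    # cost of a conjunction of goals = max of the individual costs
    return max((cost.get(g, INF) for g in goal), default=0)

actions = [({"p"}, {"q"}), ({"q"}, {"r"})]
h = h_max({"p"}, {"r"}, actions)  # reaching r needs at least two actions
```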

References

1. B. Bonet and H. Geffner, Planning as Heuristic Search, Artificial Intelligence Journal, 129(1-2), pp. 5-33, 2001.
2. K. Dräger, B. Finkbeiner and A. Podelski, Directed Model Checking with Distance-Preserving Abstractions, in Model Checking Software, LNCS 3925, pages 19-34, 2006.
3. J. C. Culberson and J. Schaeffer, Pattern Databases, Computational Intelligence, 14(4), pp. 318-334, 1998.
4. S. Edelkamp, Planning with Pattern Databases, Proceedings of the 6th European Conference on Planning (ECP-01), 2001.

Planning by SAT and constraint-satisfaction

Many transition systems have too many states to consider them explicitly one by one, and then factored representations that allow representing large numbers of states and state sequences may be a more efficient alternative. Such representations and search methods are known as symbolic or factored. The earliest symbolic search methods were based on Binary Decision Diagrams (BDDs, discussed below). The currently most popular method is based on reduction to the propositional satisfiability problem SAT and the use of efficient algorithms for solving the SAT problem.

SAT methods are less prone to excessive memory consumption than BDDs. Unlike state-space search, their memory consumption is not linear in the number of visited (stored) states, and they can be implemented with a memory consumption that is linear in the length of a plan or transition sequence (as opposed to the in general exponential memory consumption of explicit state-space search.) However, the currently leading algorithms for the SAT problem gain much of their efficiency by consuming more memory than the theoretical optimal algorithms.

All the main improvements to state-space search (symmetry reduction, partial-order reduction, heuristics) have counterparts in the SAT setting, in the form of symmetry-breaking constraints, parallel or partially-ordered plans, and SAT-solvers with planning-specific heuristics. We will not be discussing these in more detail here.

SAT algorithms

For references to the technology underlying the current generation of efficient SAT algorithms, see the following.

References

1. D. Mitchell, A SAT Solver Primer, EATCS Bulletin, 85, pp. 112-133, February 2005.
2. P. Beame, H. Kautz, and A. Sabharwal, Towards Understanding and Harnessing the Potential of Clause Learning, Journal of AI Research, 22, pp. 319-351, 2004.

Reduction to the SAT problem

Plans can be found by solving a satisfiability (SAT) problem, or by solving a constraint satisfaction problem (CSP). We describe the SAT case; the CSP and SAT frameworks are very similar, and the ideas expressed in terms of SAT are directly applicable in the CSP setting.

The idea is to have formulas PT (where T is a positive integer) such that PT is satisfiable if and only if there is a plan with a horizon 0,1,...,T. Depending on the definition of plans, at each time point 0,1,...,T-1 there may be one or more actions.

The formulas PT can be constructed from the logic representation of actions we described earlier. Instead of using the propositional variables A∪A' where A = { a1,..., an } and A' = { a'1,..., a'n } for the values of the state variables in the current and in the next time point, we have the propositional variables A@i = { a1@i,..., an@i } for different time points 0≤i≤T.

Given a formula F for some action expressed in terms of the variables in A∪A', we obtain F@i by replacing the variables a1,...,an respectively by a1@i,...,an@i and the variables a'1,...,a'n respectively by a1@(i+1),...,an@(i+1).

When we have formulas F1,...,Fn representing n different actions, we can represent the transition relation between states at time i as the formula R@i = F1@i∨F2@i∨...∨Fn@i.

Now a sequence of T actions can be represented as R@0∧R@1∧...∧R@(T-1).

Finally, finding a plan (of length T) from any state satisfying a formula I to any state satisfying a formula G can be expressed as the formula PT = I@0∧R@0∧R@1∧...∧R@(T-1)∧G@T, where I@0 and G@T are obtained from I and G by replacing each state variable a in these formulas respectively by a@0 and a@T.

The time it takes to test the satisfiability of the formulas PT strongly depends on the number of propositional variables in the formulas, which, for a given planning problem, is determined by the parameter T. Most works that improve the efficiency of the satisfiability tests do this by decreasing T by allowing more than one action at each time point. See the references for more details.
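For a toy instance the construction of PT can be simulated directly (here satisfiability is decided by brute-force enumeration of valuations instead of a real SAT solver, and the two-variable problem is invented for illustration):

```python
from itertools import product

# State variables p, q. One action: precondition p, effect q := 1,
# with the frame conjunct p' ↔ p keeping p unchanged.
def R(s, s_next):
    return s["p"] and s_next["q"] and (s_next["p"] == s["p"])

def P(T, I, G):
    # Satisfiability of I@0 ∧ R@0 ∧ ... ∧ R@(T-1) ∧ G@T, by enumerating
    # all valuations of the timed variables p@0, q@0, ..., p@T, q@T.
    for bits in product([False, True], repeat=2 * (T + 1)):
        seq = [{"p": bits[2 * i], "q": bits[2 * i + 1]} for i in range(T + 1)]
        if I(seq[0]) and G(seq[T]) and all(R(seq[i], seq[i + 1]) for i in range(T)):
            return True
    return False

I = lambda s: s["p"] and not s["q"]   # initial state: p=1, q=0
G = lambda s: s["q"]                  # goal: q=1
```

P(0, I, G) is unsatisfiable (the goal does not hold initially), while P(1, I, G) is satisfiable, so the shortest plan has length 1.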

References

1. H. Kautz and B. Selman, Pushing the envelope: planning, propositional logic, and stochastic search, AAAI'96, pp. 1194-1201, AAAI Press, 1996.
2. J. Rintanen, K. Heljanko and I. Niemelä, Planning as Satisfiability: parallel plans and algorithms for plan search, Artificial Intelligence, 170(12-13), pages 1031-1080, 2006.
3. J. Rintanen. Planning as satisfiability: heuristics, Artificial Intelligence Journal, 2012.
4. J. Rintanen, Planning and SAT, in A. Biere, H. van Maaren, M. Heule and Toby Walsh, Eds., Handbook of Satisfiability, pp. 483-504, IOS Press, 2009.
5. J. Rintanen. Planning with SAT, admissible heuristics and A*. In Proceedings of the International Joint Conference on Artificial Intelligence, AAAI Press, pages 2015-2020, 2011. (© 2011 American Association for Artificial Intelligence. AAAI)
See my planning as satisfiability page for more details, including high-performance implementations of the approach.

Planning by "symbolic" search methods such as Binary Decision Diagrams

By symbolic search methods we here mean the use of Binary Decision Diagrams (BDDs) and similar normal forms for propositional formulas (such as DNNF) for representing sets of states and transition relations compactly.

"Symbolic" planning uses exactly the same formulas we used above with SAT-based planning. The difference is that instead of constructing the formulas PT = I@0∧R@0∧R@1∧...∧R@(T-1)∧G@T for different values of T>0 and testing their satisfiability, we incrementally and implicitly compute from the formulas

```
I@0
I@0∧R@0
I@0∧R@0∧R@1
I@0∧R@0∧R@1∧R@2
```
the sets of states that are respectively reachable by 0, 1, 2 or more actions, and see if some of the goal states are reachable. Instead of the timed variables a@i we only use the variables a and a' for the current and the next state.

I represents the initial state(s). Let R = F1∨F2∨...∨Fn where the Fi formulas are as defined earlier. Now I∧R represents the set S of state pairs (s,s') where state s' follows s when one of the actions is taken and s is an initial state. To compute the set { s' | (s,s') ∈S } we eliminate from I∧R all propositional variables in A. For this we use the existential abstraction operation defined by ∃x.F = F[0/x]∨F[1/x]. For sets X = { x1,...,xk } of propositional variables we define ∃X.F = ∃x1.∃x2....∃xk.F.
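Existential abstraction can be sketched with formulas represented as Python predicates over valuations (a simple stand-in for the corresponding BDD operations):

```python
def substitute(F, x, value):
    # F[value/x]: fix the value of variable x in formula F
    return lambda v: F({**v, x: value})

def exists(x, F):
    # ∃x.F = F[0/x] ∨ F[1/x]
    return lambda v: substitute(F, x, False)(v) or substitute(F, x, True)(v)

F = lambda v: v["a"] and not v["b"]   # the formula a ∧ ¬b
G = exists("a", F)                    # ∃a.(a ∧ ¬b), equivalent to ¬b
```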

We also define a renaming operation: F[A'/A] means that all occurrences of variables in A' are replaced by the corresponding variable in A (for example a' is replaced by a.)

Now the sets of states reachable by 0, 1, 2 and more steps are represented by the following formulas:

```
I
(∃A.(I∧R))[A'/A]
(∃A.(((∃A.(I∧R))[A'/A])∧R))[A'/A]
(∃A.((∃A.(((∃A.(I∧R))[A'/A])∧R))[A'/A]∧R))[A'/A]
```

This can be more clearly written as a small program which computes the sets of states reachable with 0,1,2 and more steps until a set that intersects the goals G is reached.

```
states[0] := I;
t := 0;
while states[t]∧G is unsatisfiable do
  t := t+1;
  states[t] := (∃A.(states[t-1]∧R))[A'/A];
end while
```
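The same loop can be sketched with explicit sets of states standing in for the BDDs (every operation used here has a BDD counterpart: conjunction with R followed by ∃A and renaming is the image computation; the integer state names are illustrative):

```python
def reachable_layers(I, R, G):
    # I, G: sets of states; R: set of (state, successor) pairs.
    layers = [frozenset(I)]
    seen = frozenset(I)
    while not (layers[-1] & frozenset(G)):
        # image computation: states reachable in one more step
        image = frozenset(s2 for (s1, s2) in R if s1 in layers[-1])
        if image <= seen:
            return layers, None   # fixpoint reached: G is unreachable
        seen |= image
        layers.append(image)
    return layers, len(layers) - 1

R = {(0, 1), (1, 2), (2, 0)}
layers, t = reachable_layers({0}, R, {2})   # shortest action sequence: length 2
```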

The above algorithm terminates at iteration t iff t is the length of the shortest action sequence from a state in I to a state in G (if no goal state is reachable, an additional fixpoint test is needed to detect that no new states are being found). At this point a plan can be output by the following algorithm.

```
goal := any valuation that satisfies states[t]∧G;
while t > 0 do
  i := any action index such that states[t-1]∧(Fi↓(goal[A/A'])) is satisfiable;
  output action i (the actions are produced from last to first);
  goal := any valuation (on A) that satisfies states[t-1]∧(Fi↓(goal[A/A']));
  t := t-1;
end while
```
Above, goal[A/A'] is a valuation like goal, except that it is defined on A' instead of A, that is, goal[A/A'](a') = goal(a) for all a∈A. And F↓v, for a formula F and a valuation v on A', is the formula obtained from F by replacing each occurrence of a'∈A' by TRUE if v(a')=true and by FALSE if v(a')=false.

References

1. Bryant, Symbolic Boolean Manipulation with Ordered Binary Decision Diagrams, ACM Computing Surveys, 24(3), 1992.
2. Adnan Darwiche, Decomposable negation normal form, Journal of the ACM, 48(4), 2001. This paper proposed another normal form that is often more compact than BDDs.
3. Coudert, Berthet and Madre, Verification of Synchronous Sequential Machines Based on Symbolic Execution, in Automatic Verification Methods for Finite State Systems, LNCS 407, Springer-Verlag, 1990.
This paper and the next are the first ones to use BDDs for solving state-space reachability problems.
4. Burch, Clarke, Long, MacMillan and Dill, Symbolic Model Checking for Sequential Circuit Verification, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 13(4), 1994.

Meta level search strategies

When solving a planning problem, there are some meta-level strategies that can be applied to any type of search algorithm (or their combination).

Algorithm portfolios

The idea of using a combination of several techniques has already been mentioned in the context of heuristics, where aggregates of two or more unrelated heuristics can be formed in different ways. Algorithm portfolios do the same at the highest level of a program that solves a planning problem. The general idea is to utilize the complementarities between different search methods. If the strengths of algorithm 1 are sufficiently complementary to the strengths of algorithm 2, the combination of these two algorithms may be much stronger than either of the components.

A portfolio can be used in different ways.

• Selection: Choose one algorithm from the portfolio, based on the properties of the problem instance, and run it.
• Sequential composition: Run several algorithms according to a given schedule, for example the first algorithm for n seconds or until some termination criterion is satisfied, and if no solution has been found, terminate the run and continue with the next algorithm.
• Parallel composition: Run several algorithms in parallel (in one or several CPUs/cores), with the same or different rates, until one of them finds a solution.
In the planning area, the construction of portfolios has been more of an art than a science, typically constructing a fixed portfolio by selecting a suitable combination of algorithms that perform well with the target problem set. Theoretically more sound methods have been presented for example in the SAT area.
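Sequential composition, for example, can be sketched as follows (the solver interface and the toy solvers are invented for illustration):

```python
def run_portfolio(schedule, problem):
    # schedule: list of (solver, time_budget) pairs. Each solver is expected
    # to respect its budget and to return a solution or None.
    for solver, budget in schedule:
        solution = solver(problem, budget)
        if solution is not None:
            return solution
    return None  # every component failed within its budget

# Two toy "algorithms" with complementary strengths.
def solver_a(problem, budget):
    return "plan-a" if problem == "easy" else None

def solver_b(problem, budget):
    return "plan-b" if problem == "hard" else None

schedule = [(solver_a, 5), (solver_b, 55)]
```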

References

1. Gomes & Selman, Algorithm portfolio design: theory vs. practice, Uncertainty in AI, 1997.

Jussi Rintanen