Comment on the 2011 planning competition

After an impressive performance on the problems from all previous planning competitions (see here), the SAT-based planners fared much worse than expected in the 2011 competition in terms of the number of problems solved (none of the SAT-based planners spent any time minimizing the cost of the plan). The choice of problems for the competition had an impact on this, of course: for example, the very easy sequential problems with several hundred or a couple of thousand actions were not solved with the planners' default settings, simply because of the very long plan length. A more important issue, however, was the way the difficulty level of the problem instances was chosen for each domain.

Unlike in the previous competitions, the organizers determined the "right" difficulty level so that the majority of the participating planners could solve most of the instances, while the instances were still not too easy. This introduced a strong bias in favor of planners that are representative of the majority. Of course, most of the participating planners belonged to the HSP-FF-LAMA family.

What this meant in practice is that the best majority planners solved most of the problems (as was intended). For the best SAT-based planners, the difficulty level did not match the planners' capabilities: several domains were far too easy (all instances were solved in a fraction of a second) and some domains were far too difficult (not a single instance was solved).

If the difficulty level of all domains had been based on the SAT-based planners, we would have gotten exactly the opposite result: the best SAT-based planners would have solved (almost) all instances, and the best planners representing other paradigms would have had difficulties on several domains, in some cases not solving any instances.
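
The calibration effect described above can be illustrated with a small toy model. This is only a sketch under invented assumptions: the planner names, the three domains, and the capacity numbers (the largest instance size a planner can solve in a domain) are all hypothetical and do not come from the competition data. Instance sizes in each domain are picked around the median capacity of a chosen reference group of planners.

```python
from statistics import median_low

# Hypothetical capacities: the largest instance size each planner can solve
# in each domain. All names and numbers are invented for illustration.
CAPACITY = {
    "heuristic_1": {"A": 20, "B": 30, "C": 25},
    "heuristic_2": {"A": 22, "B": 28, "C": 27},
    "heuristic_3": {"A": 18, "B": 32, "C": 24},
    "sat_1":       {"A": 200, "B": 5, "C": 26},
}
DOMAINS = ["A", "B", "C"]


def calibrate(reference_planners, n_instances=10):
    """Choose instance sizes per domain around the median capacity of the
    reference planners, so that a typical reference planner solves most,
    but not all, of the instances."""
    suite = {}
    for domain in DOMAINS:
        target = median_low(CAPACITY[p][domain] for p in reference_planners)
        # Sizes range from 30% to 120% of the target (integer arithmetic).
        suite[domain] = [target * (3 + i) // 10 for i in range(n_instances)]
    return suite


def coverage(planner, suite):
    """Count the instances whose size is within the planner's capacity."""
    return sum(
        size <= CAPACITY[planner][domain]
        for domain, sizes in suite.items()
        for size in sizes
    )


# Suite calibrated on all participants (the majority is heuristic search).
majority_suite = calibrate(list(CAPACITY))
# Counterfactual suite calibrated on the SAT-like planner only.
sat_suite = calibrate(["sat_1"])

for name in CAPACITY:
    print(name, coverage(name, majority_suite), coverage(name, sat_suite))
```

In this toy setup the SAT-like planner solves every instance of domain A and none of domain B under the majority calibration, and it trails all three heuristic planners in total coverage. Under the SAT-based calibration the ranking is reversed, and the heuristic planners solve no instances of domain A.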

Apparently, the possibility of this kind of bias was not noticed (or it was not considered a major issue) when the procedure for choosing problem instances for the competition was devised.