ICAPS 2011 International Probabilistic Planning Competition (IPPC)
General Information
This track (UNCERTAINTY / PROBABILISTIC):
Organization:
- Lead: Scott Sanner
(NICTA and the ANU; ssanner [at] gmail.com )
- Lead: Sungwook Yoon (PARC; sungwook.yoon [at] gmail.com)
- Domain contributor: Thomas Walsh
(University of Arizona; thomasjwalsh [at] gmail.com )
IPPC 2011 Mailing List
Current and Past Planning Competitions
See also the subsequent 2014 Discrete track
and 2015 Continuous track, both using RDDL.
IPPC 2011 Final Competition Evaluation Information
Thanks to Amazon for a research grant that covered all
Amazon Elastic Compute Cloud (EC2)
costs for the competitors and organizers.
- Winners of Boolean MDP Track
- 1st place -- PROST: Thomas Keller, Patrick Eyerich (University of Freiburg, Germany)
source code
- 2nd place -- Glutton: Andrey Kolobov, Peng Dai, Mausam, Dan Weld (University of Washington, USA)
source code
- Winners of Boolean POMDP Track
- 1st place -- POMDPX: Kegui Wu, Wee Sun Lee, David Hsu (National University of Singapore)
source code
- 2nd place -- KAIST: Dongho Kim, Kanghoon Lee, Kee-Eung Kim (KAIST, South Korea)
source code
IPPC 2011 Competitors
- Boolean MDP Track:
- SPUDD: George Zhu, Marek Grzes, Jesse Hoey; University of Waterloo
description
source code
- Glutton: Andrey Kolobov, Peng Dai, Mausam, Dan Weld; University of Washington
description
source code (NEW)
- PROST: Thomas Keller, Patrick Eyerich; University of Freiburg
description
source code (NEW)
- MIT-ACL: Tuna Toksoz, Kemal Ure, Josh Redding, Alborz Geramifard; MIT
description
- Beaver: Aswin Nadamuni, Prasad Tadepalli, Saket Joshi, Alan Fern; Oregon State University
description
- Boolean POMDP Track:
- Symbolic Perseus: Kyle Morrison, Pascal Poupart, Jesse Hoey; University of Waterloo
description
source code
(original Matlab/Java software,
original Java-only software)
- POMDPX NUS: Kegui Wu, Wee Sun Lee, David Hsu; National University of Singapore
description
source code
- KAIST AILAB: Dongho Kim, Kanghoon Lee, Kee-Eung Kim; KAIST
description
source code
- McGill: Shaowei Png, Sylvie Ong, Joelle Pineau; McGill University
description
- POND-Hindsight: Dan Bryce, Alan Olsen; Utah State University
description
- HyPlan: Eddy Borera, Larry Pyeatt; Texas Tech University
description
Language Change this Year to RDDL!
- RDDL Language Guide
- RDDL is the Relational Dynamic Influence Diagram Language -- pronounced "riddle"
- For now, please cite as
@unpublished{Sanner:RDDL,
  author = "Scott Sanner",
  title  = "Relational Dynamic Influence Diagram Language (RDDL): Language Description",
  note   = "http://users.cecs.anu.edu.au/~ssanner/IPPC_2011/RDDL.pdf",
  year   = 2010}
- A Short Introduction to RDDL Tutorial
- Code Repository for RDDL Simulator:
- rddlsim project hosted on Google Code
- To get started see: rddlsim/INSTALL.txt.
- Recommendation: check out the code via svn (so you can just 'svn update' to get the latest revisions and domains)
- Project repository includes:
- Example problem domains:
- Boolean track MDP and POMDP domains and translations:
- General track (boolean, int, continuous) RDDL domains can be found in rddlsim/files/rddl/examples.
Offered RDDL Translations
For this IPPC, we provide translations from RDDL to PPDDL, SPUDD, and Symbolic Perseus formats.
Translators exist for the following, but are not provided:
- Cassandra Format (for POMDPs) --
you can use the above Symbolic Perseus translations with Pascal Poupart's Symbolic Perseus -> Cassandra translator described at the bottom of this page.
Warning: enumerated state formats like Cassandra can blow up to the point that translations of most IPPC problems are impossible; for example, a problem with just 50 boolean state fluents already has 2^50 enumerated states.
To run the translator yourself, see the INSTALL.txt instructions in the rddlsim project.
Competition Format (all tracks)
Domains
- 8 domains
- 10 instances per domain
- No discounting and fixed horizon of 40 for all instances
- 30 trials per instance
Procedure
- We will use the Amazon Elastic Compute Cloud (EC2)
- You will receive instructions on how to login to your own personal Linux or Windows node (your choice) hosted on EC2
- Amazon EC2 instructions: pdf
- You should install your planner directly on your Linux or Windows EC2 node (see notes)
- You'll have admin rights so you can install anything you want
- The EC2 node specs are those of the Standard Instance -- large (7.5 GB) listed at the above EC2 link
- Your Client IP address is logged on each connection and will be verified to originate from within the EC2 cluster
- Rules:
- Your planner can only use the resources of one EC2 Standard Instance -- large; using any
other computational resources is prohibited
- Your planner can use any resources on your single EC2 instance (e.g., both processors)
- Your Client can only maintain one connection with the Server at any time -- under current rules you cannot run multiple Clients in parallel
- You are allowed to manually interact with your planner to kill it, debug errors or change parameters, and restart it
- You are not allowed to add structured information to your planner based on your manual domain analysis, e.g., providing a hand-built domain-specific set of policy restriction rules for elevators
- A planner freeze has been in effect since 23 April 2011, 05:00 GMT; specifically, the only modifications you can now make to your planner code are the following:
- Debugging to fix parser faults and bugs (poor performance is not a bug)
- Tuning the set of parameters (boolean flags, int & real-valued constants)
- Your nominated group representative must fill out and sign a form stating that you have abided by the above rules (or listing any deviations and reasons if not): IPPC rules agreement
- This signed form will be publicly posted along with the competition results
- At competition time
- We will tell you
- The server host name and port to communicate with
- If you have two or more distinctly different planners (i.e., each publishable separately) then email the organizers to arrange one server for each planner
- The location of all RDDL domains / instances and their translations
- A list of all instance names for the competition in a plain text file (80 lines, 1 instance name per line)
- Your client will have 24 hours to complete trials for all instances (no other time limit per instance / trial)
- Your client requests instance names to run, so you choose the instance order; notes:
- A problem name (legacy term from PPDDL) is the same thing as an instance name (for RDDL)
- If you're using a translation of RDDL (PPDDL, SPUDD, Symbolic Perseus, etc.), some notes:
- the translated filename format is <instance-name>.<suffix>; instance names will be unique in the competition, so that is all the information you'll need from the filename
- sending an action to the server requires separating the action predicate and terms (see the encoding sketch below):
- when concurrency is used, multiple actions will be separated from each other by a triple "___"
- the action predicate will always be separated from the terms by a double "__"
- the terms will be separated from each other by a single "_" (term names will never use a "_")
- Instances will have a suffix "__1" up to "__10" where 1 will be the smallest (usually easiest) and 10 the largest (usually hardest)
- You need not compete on all instances or for all trials; in that case, the better of the average scores from the random and NOOP policies is assigned (see below)
- If you're confused about client interaction with the server, please make sure you first test your client against the rddlsim server (see rddlsim/INSTALL.txt)
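For concreteness, here is a minimal sketch of the action-string convention above. It is written in Java (rddlsim itself is a Java project), but the class and method names are illustrative only and are not part of rddlsim's client API:

import java.util.Arrays;
import java.util.List;

// Minimal sketch of the IPPC 2011 action-string convention described above.
// Class and method names are illustrative; they are NOT rddlsim's client API.
public class ActionStrings {

    // Encode one grounded action, e.g. encode("move", "x1", "y2") -> "move__x1_y2":
    // the predicate is separated from its terms by "__", terms from each other by "_".
    static String encode(String predicate, String... terms) {
        return terms.length == 0 ? predicate
                                 : predicate + "__" + String.join("_", terms);
    }

    // Encode concurrent actions by joining the single-action encodings with "___".
    static String encodeConcurrent(List<String> encodedActions) {
        return String.join("___", encodedActions);
    }

    // Decode a message such as "move__x1_y2___open__d3" back into its parts.
    // This relies on the guarantee that term names never contain "_".
    static void decode(String msg) {
        for (String action : msg.split("___")) {      // split concurrent actions
            String[] parts = action.split("__");      // predicate vs. term list
            String[] terms = parts.length > 1 ? parts[1].split("_") : new String[0];
            System.out.println(parts[0] + " " + Arrays.toString(terms));
        }
    }

    public static void main(String[] args) {
        String msg = encodeConcurrent(Arrays.asList(
                encode("move", "x1", "y2"), encode("open", "d3")));
        System.out.println(msg);  // prints: move__x1_y2___open__d3
        decode(msg);              // prints: move [x1, y2]  then  open [d3]
    }
}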
- Post-competition evaluation
- We will shut down the server at the 24 hour time limit
- The server maintains a log file for all instances and trials you complete
- If you execute more than 30 trials per instance, we will only use data for the last 30 trials
- Overall planner evaluation criterion for ranking:
- For each instance trial, the server records a raw score
- A raw score is the sum of discounted rewards over the finite horizon as seen from the initial state (could be positive or negative)
- To ensure compatibility with PPDDL for the IPPC 2011, all discount factors will be 1.0 (i.e., undiscounted)
- For the final competition (boolean MDP and POMDP tracks), a fixed horizon of 40 will be used for all instances
- We will compute a [0,1] normalized average score per instance from raw scores
- Per instance, averaged over all 30 trials, we will determine
- the minimum average score (minS_{instance}) is the max over the average scores from a purely random policy and an always-noop policy
- the maximum average score (maxS_{instance}) is the max over all competitors, a purely random policy, and a pure noop policy
- if a planner does not complete a trial for an instance, minS_{instance} is assigned as the raw score for that missing trial
- we will make available all raw and normalized data as well as minS_{instance} and maxS_{instance} used to compute the normalized score for each instance
- normalized-score_{planner,instance} = max(0, [(sum_{trials 1..30} raw-score_{planner,instance,trial}) / 30 - minS_{instance}] / [maxS_{instance} - minS_{instance}])
- we use a max here so that, in the unlikely event a planner does worse than the noop or random policies, its score is still 0; we don't want to penalize a planner that tries and fails vs. a planner that simply skips an instance and gets 0 automatically (a minimal sketch of this scoring computation appears after this section)
- Final planner evaluation criterion
- avg-norm-score_{planner} = (sum_{instance 1..80} normalized-score_{planner,instance}) / 80
- note 1: 80 instances are from 8 domains X 10 instances per domain (instance names uniquely determine the domain)
- note 2: given the normalized score per instance, it is to your advantage to complete easier instances before harder ones
- Min / max score:
- The minimum avg-norm-score for any competing planner is 0
- The maximum avg-norm-score for any competing planner is 1
- Planners will be ranked by their avg-norm-score
- See the evaluation code that will be used for final competition scoring
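As a worked illustration of the normalization and ranking criteria above, here is a minimal sketch in Java. It is not the official evaluation code linked above; the names simply mirror the formulas in the text:

import java.util.Arrays;

// Minimal sketch of the per-instance normalization and final ranking criterion.
// NOT the official evaluation code (see link above); names mirror the text.
public class Scoring {

    // normalized-score_{planner,instance}: minS is the better of the random and
    // noop policies' average scores; maxS is the max over all competitors and
    // both baseline policies.
    static double normalizedScore(double[] rawTrialScores, double minS, double maxS) {
        double avg = Arrays.stream(rawTrialScores).average().orElse(minS);
        // Clamp at 0 so a planner that tries and fails scores no worse than one
        // that skips the instance entirely.
        return Math.max(0.0, (avg - minS) / (maxS - minS));
    }

    // avg-norm-score_{planner}: mean normalized score over all 80 instances
    // (missing trials are assigned minS as their raw score before averaging).
    static double avgNormScore(double[] normalizedScoresPerInstance) {
        return Arrays.stream(normalizedScoresPerInstance).average().orElse(0.0);
    }

    public static void main(String[] args) {
        double[] trials = new double[30];
        Arrays.fill(trials, 120.0);  // 30 trials with raw score 120 each
        // With minS = 50 and maxS = 150: (120 - 50) / (150 - 50) = 0.7
        System.out.println(normalizedScore(trials, 50.0, 150.0));
    }
}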
Schedule
- Draft Language, Parser, and Simulator Release: September 2010
- Language Finalization and Test Domains Available: October 2010
- Call for Participation: November 2010
- Test Round: March 11-18, 2011
- Final Competition Round: starting from April 24th, 2011, 01:00 GMT, open for 7 days
- The competition is over, thanks to all who competed. Results available below.
- Final Competition Problems: tar.gz or zip
- Amazon EC2 instructions: pdf
- Competitor time signup page: here
- At the designated start time, the organizers will
- Email a link to the final competition instances
- Email the SERVER_NAME and PORT information
- Shut down the server 24 hours after the designated start time
- Notes:
- Use a single descriptive name for your planner
- See competition info below for rules
- Please sign and scan (or photograph) this IPPC rules agreement form and email it to Scott and Sungwook.
FAQ
Q: Why switch from PPDDL to RDDL for this year's IPPC?
A: Lifted RDDL is strictly more expressive than lifted PPDDL, and this expressiveness is required
to specify most of the domains in this year's competition. For further motivation, see this
discussion
and the RDDL documentation.
Q: Sorry, I don't have time to read the RDDL documentation. In brief, what is the new expressiveness that RDDL offers?
A: In short, RDDL offers:
- support for complex, scalable transition and general reward distributions,
- easier expression of exogenous and independent events,
- support for multi-valued, integer, and continuous fluents, with complex compositional distributions for these fluents,
- support for partial observability (POMDPs), and
- a clear semantics for concurrent action execution in a stochastic setting with unrestricted concurrency;
this last point is a major motivation for why RDDL (a) specifies actions as fluents, (b) specifies
transitions with a stratified lifted dynamic Bayes net (DBN),
and (c) uses state-action constraints in place of preconditions.
Q: If RDDL is so expressive, who will be able to compete?
A: It's interesting to note that this expressiveness only differentiates lifted RDDL from
lifted PPDDL (at least for the boolean MDP track of the competition). However, almost all planners plan
at a ground (factored) level and not at a lifted level. At the ground level, RDDL and PPDDL are
equally expressive -- likewise for SPUDD and Symbolic Perseus (for PO-PPDDL) which are inherently ground factored
languages. So previous PPDDL / MDP / POMDP ground planners can all compete without modification.
Thus the real impact of RDDL's
expressiveness on the competition this year is not on the language that (most) planners plan
in, but rather on the language that domain designers design in.
With RDDL we can now specify more expressive and interesting
lifted domains that were impossible to express before.
Q: What drove the development of RDDL?
A: The reasons for using a theoretically motivated lifted representation of MDPs and POMDPs in RDDL
are two-fold:
- to ensure all groundings of this representation are guaranteed to be well-defined MDPs or POMDPs and
- to promote
future research into lifted solutions. Results on (exact) lifted solutions for RDDL will most likely be for restricted subsets
of the full RDDL language... searching for an
exact lifted solution to full RDDL is hypothesized to be harmful to one's mental health.
Q: Aside from RDDL, were there any other objectives in running this year's competition?
A: Well, yes, quite a few actually...
One goal is simply to represent more realistic probabilistic planning problems that
may help our community connect more with other fields and industry. To this end, competitors
will certainly see some new domains this year that were not possible to represent
in previous competitions (traffic control with cars moving independently,
elevator control with independent arrivals, UAV reconnaissance, etc.).
A second goal is to pull in the (quite large) partially observed crowd, who have been
generally overlooked in previous ICAPS IPPCs. Previous IPPCs have supported conformant
planning (no observations, non-deterministic effects), but not partial observations
and probabilistic effects as we do this year. This sort of partial observability is
heavily used in applications from robotics to speech recognition dialog management to healthcare.
Another goal is to overcome the language barrier that has divided planning under
uncertainty researchers for the last decade or so.
There have been two (largely) distinct communities that have
addressed planning under uncertainty: the MDP/POMDP community and the "probabilistic planning"
community (the former have been focused more on bounded optimality and
expected discounted reward, while
the latter have focused more on purely goal-oriented problems and satisficing
solutions). These two communities rarely compare on the same problems despite the
overlap and potential for cross-pollination of ideas. One reason has been that
each community has used different language specifications for problems.
This is the main reason why we provide translators
from RDDL to PPDDL (probabilistic planning community), SPUDD (MDP community),
Symbolic Perseus (POMDP community), and other formats. And we note that the amazingly
diverse competitor pool this year reflects the fruits of this translation effort.
Finally, I (Scott) note that PPDDL did wonders for the planning under uncertainty community by
giving us a common set of problems and benchmarks to drive forward probabilistic planning
through comparative evaluation. But PPDDL is now over 7 years old and I believe the
previous IPPCs have thoroughly exercised the limits of what can be naturally modeled in lifted
PPDDL. My hope is that RDDL and this IPPC 2011 help the community break the
lifted PPDDL expressiveness barrier and (with the other general RDDL language extensions listed above)
provide a common relational (read: object-oriented) language for the compact, expressive, and
scalable specification of the problems that planning under uncertainty will tackle in the
next decade.