ICAPS 2011 International Probabilistic Planning Competition (IPPC)
General Information
This track (UNCERTAINTY / PROBABILISTIC):
Organization:
- Lead: Scott Sanner
(NICTA and the ANU; ssanner [at] gmail.com )
- Lead: Sungwook Yoon (PARC; sungwook.yoon [at] gmail.com)
- Domain contributor: Thomas Walsh
(University of Arizona; thomasjwalsh [at] gmail.com )
IPPC 2011 Mailing List
Current and Past Planning Competitions
See also the subsequent 2014 Discrete track
and 2015 Continuous track, both using RDDL.
IPPC 2011 Final Competition Evaluation Information
Thanks to Amazon for a research grant that covered all
Amazon Elastic Compute Cloud (EC2)
costs for the competitors and organizers.
- Winners of Boolean MDP Track
- 1st place -- PROST: Thomas Keller, Patrick Eyerich (University of Freiburg, Germany)
source code
- 2nd place -- Glutton: Andrey Kolobov, Peng Dai, Mausam, Dan Weld (University of Washington, USA)
source code
- Winners of Boolean POMDP Track
- 1st place -- POMDPX: Kegui Wu, Wee Sun Lee, David Hsu (National University of Singapore)
source code
- 2nd place -- KAIST: Dongho Kim, Kanghoon Lee, Kee-Eung Kim (KAIST, South Korea)
source code
IPPC 2011 Competitors
- Boolean MDP Track:
- SPUDD: George Zhu, Marek Grzes, Jesse Hoey; University of Waterloo
description
source code
- Glutton: Andrey Kolobov, Peng Dai, Mausam, Dan Weld; University of Washington
description
source code (NEW)
- PROST: Thomas Keller, Patrick Eyerich; University of Freiburg
description
source code (NEW)
- MIT-ACL: Tuna Toksoz, Kemal Ure, Josh Redding, Alborz Geramifard; MIT
description
- Beaver: Aswin Nadamuni, Prasad Tadepalli, Saket Joshi, Alan Fern; Oregon State University
description
- Boolean POMDP Track:
- Symbolic Perseus: Kyle Morrison, Pascal Poupart, Jesse Hoey; University of Waterloo
description
source code
(original Matlab/Java software,
original Java-only software)
- POMDPX NUS: Kegui Wu, Wee Sun Lee, David Hsu; National University of Singapore
description
source code
- KAIST AILAB: Dongho Kim, Kanghoon Lee, Kee-Eung Kim; KAIST
description
source code
- McGill: Shaowei Png, Sylvie Ong, Joelle Pineau; McGill University
description
- POND-Hindsight: Dan Bryce, Alan Olsen; Utah State University
description
- HyPlan: Eddy Borera, Larry Pyeatt; Texas Tech University
description
Language Change this Year to RDDL!
- RDDL Language Guide
- RDDL is the Relational Dynamic Influence Diagram Language -- pronounced "riddle"
- For now, please cite as
@unpublished{Sanner:RDDL,
  author = "Scott Sanner",
  title  = "Relational Dynamic Influence Diagram Language (RDDL): Language Description",
  note   = "http://users.cecs.anu.edu.au/~ssanner/IPPC_2011/RDDL.pdf",
  year   = 2010}
- A Short Introduction to RDDL Tutorial
- Code Repository for RDDL Simulator:
- rddlsim project hosted on Google Code
- To get started see: rddlsim/INSTALL.txt.
- Recommendation: check out the code via svn (so you can just 'svn update' to get the latest revisions and domains)
- Project repository includes:
- Example problem domains:
- Boolean track MDP and POMDP domains and translations:
- General track (boolean, int, continuous) RDDL domains can be found in rddlsim/files/rddl/examples.
Offered RDDL Translations
For this IPPC, we provide translations from RDDL to PPDDL, SPUDD, and Symbolic Perseus formats.
Translators exist for the following, but are not provided:
- Cassandra Format (for POMDPs) --
you can use the above Symbolic Perseus translations with Pascal Poupart's Symbolic Perseus -> Cassandra translator described at the bottom of this page.
Warning: enumerated state formats like Cassandra can blow up to the point that translations of most IPPC problems are impossible; for example, a problem with just 50 boolean state fluents already has 2^50 enumerated states.
To run the translator yourself, see the INSTALL.txt instructions in the rddlsim project.
Competition Format (all tracks)
Domains
- 8 domains
- 10 instances per domain
- No discounting and fixed horizon of 40 for all instances
- 30 trials per instance
Procedure
- We will use the Amazon Elastic Compute Cloud (EC2)
- You will receive instructions on how to login to your own personal Linux or Windows node (your choice) hosted on EC2
- Amazon EC2 instructions: pdf
- You should install your planner directly on your Linux or Windows EC2 node (see notes)
- You'll have admin rights so you can install anything you want
- The EC2 node specs are those of the Standard Instance -- large (7.5 GB) listed at the above EC2 link
- Your Client IP address is logged on each connection and will be verified to originate from within the EC2 cluster
- Rules:
- Your planner can only use the resources of one EC2 Standard Instance -- large; using any
other computational resources is prohibited
- Your planner can use any resources on your single EC2 instance (e.g., both processors)
- Your Client can only maintain one connection with the Server at any time -- under current rules you cannot run multiple Clients in parallel
- You are allowed to manually interact with your planner to kill it, debug errors or change parameters, and restart it
- You are not allowed to add structured information to your planner based on your manual domain analysis, e.g., providing a hand-built domain-specific set of policy restriction rules for elevators
- A planner freeze has been in effect since 23 April 2011, 05:00 GMT; specifically, the only modifications you can now make to your planner code are the following:
- Debugging to fix parser faults and bugs (poor performance is not a bug)
- Tuning the set of parameters (boolean flags, int & real-valued constants)
- Your nominated group representative must fill out and sign a form stating that you have abided by the above rules (or listing any deviations and reasons if not): IPPC rules agreement
- This signed form will be publicly posted along with the competition results
- At competition time
- We will tell you
- The server host name and port to communicate with
- If you have two or more distinctly different planners (i.e., each publishable separately) then email the organizers to arrange one server for each planner
- The location of all RDDL domains / instances and their translations
- A list of all instance names for the competition in a plain text file (80 lines, 1 instance name per line)
- Your client will have 24 hours to complete trials for all instances (no other time limit per instance / trial)
- Your client requests instance names to run, so you choose the instance order; notes:
- A problem name (legacy term from PPDDL) is the same thing as an instance name (for RDDL)
- If you're using a translation of RDDL (PPDDL, SPUDD, Symbolic Perseus, etc.), some notes:
- the translated filename format is <instance-name>.<suffix>; instance names will be unique in the competition, so that is all the information you'll need from the filename
- sending an action to the server requires separating the action predicate and terms (see the encoding sketch below):
- when concurrency is used, multiple actions will be separated from each other by a triple "___"
- the action predicate will always be separated from the terms by a double "__"
- the terms will be separated from each other by a single "_" (term names will never use a "_")
- Instances will have a suffix "__1" up to "__10" where 1 will be the smallest (usually easiest) and 10 the largest (usually hardest)
- You need not compete on all instances or for all trials; in that case, the better of the average scores from the random and NOOP policies is assigned (see below)
- If you're confused about client interaction with the server, please make sure you first test your client against the rddlsim server (see rddlsim/INSTALL.txt)
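For concreteness, here is a minimal sketch of the action-string convention above. It is written in Java (rddlsim itself is a Java project), but the class and method names are illustrative only and are not part of rddlsim's client API:

import java.util.Arrays;
import java.util.List;

// Minimal sketch of the IPPC 2011 action-string convention described above.
// Class and method names are illustrative; they are NOT rddlsim's client API.
public class ActionStrings {

    // Encode one grounded action, e.g. encode("move", "x1", "y2") -> "move__x1_y2":
    // the predicate is separated from its terms by "__", terms from each other by "_".
    static String encode(String predicate, String... terms) {
        return terms.length == 0 ? predicate
                                 : predicate + "__" + String.join("_", terms);
    }

    // Encode concurrent actions by joining the single-action encodings with "___".
    static String encodeConcurrent(List<String> encodedActions) {
        return String.join("___", encodedActions);
    }

    // Decode a message such as "move__x1_y2___open__d3" back into its parts.
    // This relies on the guarantee that term names never contain "_".
    static void decode(String msg) {
        for (String action : msg.split("___")) {      // split concurrent actions
            String[] parts = action.split("__");      // predicate vs. term list
            String[] terms = parts.length > 1 ? parts[1].split("_") : new String[0];
            System.out.println(parts[0] + " " + Arrays.toString(terms));
        }
    }

    public static void main(String[] args) {
        String msg = encodeConcurrent(Arrays.asList(
                encode("move", "x1", "y2"), encode("open", "d3")));
        System.out.println(msg);  // prints: move__x1_y2___open__d3
        decode(msg);              // prints: move [x1, y2]  then  open [d3]
    }
}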
- Post-competition evaluation
- We will shut down the server at the 24 hour time limit
- The server maintains a log file for all instances and trials you complete
- If you execute more than 30 trials per instance, we will only use data for the last 30 trials
- Overall planner evaluation criterion for ranking:
- For each instance trial, the server records a raw score
- A raw score is the sum of discounted rewards over the finite horizon as seen from the initial state (could be positive or negative)
- To ensure compatibility with PPDDL for the IPPC 2011, all discount factors will be 1.0 (i.e., undiscounted)
- For the final competition (boolean MDP and POMDP tracks), a fixed horizon of 40 will be used for all instances
- We will compute a [0,1] normalized average score per instance from raw scores
- Per instance, averaged over all 30 trials, we will determine
- the minimum average score (minS_{instance}) is the max over the average scores from a purely random policy and an always-noop policy
- the maximum average score (maxS_{instance}) is the max over all competitors, a purely random policy, and a pure noop policy
- if a planner does not complete a trial for an instance, minS_{instance} is assigned as the raw score for that missing trial
- we will make available all raw and normalized data as well as minS_{instance} and maxS_{instance} used to compute the normalized score for each instance
- normalized-score_{planner,instance} = max(0, [(sum_{trials 1..30} raw-score_{planner,instance,trial}) / 30 - minS_{instance}] / [maxS_{instance} - minS_{instance}])
- we use a max here so that, in the unlikely event a planner does worse than the noop or random policies, its score is still 0; we don't want to penalize a planner that tries and fails vs. a planner that simply skips an instance and gets 0 automatically (a minimal sketch of this scoring computation appears after this section)
- Final planner evaluation criterion
- avg-norm-score_{planner} = (sum_{instance 1..80} normalized-score_{planner,instance}) / 80
- note 1: 80 instances are from 8 domains X 10 instances per domain (instance names uniquely determine the domain)
- note 2: given the normalized score per instance, it is to your advantage to complete easier instances before harder ones
- Min / max score:
- The minimum avg-norm-score for any competing planner is 0
- The maximum avg-norm-score for any competing planner is 1
- Planners will be ranked by their avg-norm-score
- See the evaluation code that will be used for final competition scoring
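As a worked illustration of the normalization and ranking criteria above, here is a minimal sketch in Java. It is not the official evaluation code linked above; the names simply mirror the formulas in the text:

import java.util.Arrays;

// Minimal sketch of the per-instance normalization and final ranking criterion.
// NOT the official evaluation code (see link above); names mirror the text.
public class Scoring {

    // normalized-score_{planner,instance}: minS is the better of the random and
    // noop policies' average scores; maxS is the max over all competitors and
    // both baseline policies.
    static double normalizedScore(double[] rawTrialScores, double minS, double maxS) {
        double avg = Arrays.stream(rawTrialScores).average().orElse(minS);
        // Clamp at 0 so a planner that tries and fails scores no worse than one
        // that skips the instance entirely.
        return Math.max(0.0, (avg - minS) / (maxS - minS));
    }

    // avg-norm-score_{planner}: mean normalized score over all 80 instances
    // (missing trials are assigned minS as their raw score before averaging).
    static double avgNormScore(double[] normalizedScoresPerInstance) {
        return Arrays.stream(normalizedScoresPerInstance).average().orElse(0.0);
    }

    public static void main(String[] args) {
        double[] trials = new double[30];
        Arrays.fill(trials, 120.0);  // 30 trials with raw score 120 each
        // With minS = 50 and maxS = 150: (120 - 50) / (150 - 50) = 0.7
        System.out.println(normalizedScore(trials, 50.0, 150.0));
    }
}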
Schedule
- Draft Language, Parser, and Simulator Release: September 2010
- Language Finalization and Test Domains Available: October 2010
- Call for Participation: November 2010
- Test Round: March 11-18, 2011
- Final Competition Round: starting from April 24th, 2011, 01:00 GMT, open for 7 days
- The competition is over, thanks to all who competed. Results available below.
- Final Competition Problems: tar.gz or zip
- Amazon EC2 instructions: pdf
- Competitor time signup page: here
- At the designated start time, the organizers will
- Email a link to the final competition instances
- Email the SERVER_NAME and PORT information
- Shut down the server 24 hours after the designated start time
- Notes:
- Use a single descriptive name for your planner
- See competition info below for rules
- Please sign and scan (or photograph) this IPPC rules agreement form and email it to Scott and Sungwook.
FAQ
Q: Why switch from PPDDL to RDDL for this year's IPPC?
A: Lifted RDDL is strictly more expressive than lifted PPDDL, and this expressiveness is required
to specify most of the domains in this year's competition. For further motivation, see this
discussion
and the RDDL documentation.
Q: Sorry, I don't have time to read the RDDL documentation. In brief, what is the new expressiveness that RDDL offers?
A: In short, RDDL offers:
- support for complex, scalable transition and general reward distributions,
- easier expression of exogenous and independent events,
- support for multi-valued, integer, and continuous fluents, with complex compositional distributions for these fluents,
- support for partial observability (POMDPs), and
- a clear semantics for concurrent action execution in a stochastic setting with unrestricted concurrency;
this last point is a major motivation for why RDDL (a) specifies actions as fluents, (b) specifies
transitions with a stratified lifted dynamic Bayes net (DBN),
and (c) uses state-action constraints in place of preconditions.
Q: If RDDL is so expressive, who will be able to compete?
A: It's interesting to note that this expressiveness only differentiates lifted RDDL from
lifted PPDDL (at least for the boolean MDP track of the competition). However, almost all planners plan
at a ground (factored) level and not at a lifted level. At the ground level, RDDL and PPDDL are
equally expressive -- likewise for SPUDD and Symbolic Perseus (for PO-PPDDL) which are inherently ground factored
languages. So previous PPDDL / MDP / POMDP ground planners can all compete without modification.
Thus the real impact of RDDL's
expressiveness on the competition this year is not on the language that (most) planners plan
in, but rather on the language that domain designers design in.
With RDDL we can now specify more expressive and interesting
lifted domains that were impossible to express before.
Q: What drove the development of RDDL?
A: The reasons for using a theoretically motivated lifted representation of MDPs and POMDPs in RDDL
are two-fold:
- to ensure all groundings of this representation are guaranteed to be well-defined MDPs or POMDPs and
- to promote
future research into lifted solutions. Results on (exact) lifted solutions for RDDL will most likely be for restricted subsets
of the full RDDL language... searching for an
exact lifted solution to full RDDL is hypothesized to be harmful to one's mental health.
Q: Aside from RDDL, were there any other objectives in running this year's competition?
A: Well, yes, quite a few actually...
One goal is simply to represent more realistic probabilistic planning problems that
may help our community connect more with other fields and industry. To this end, competitors
will certainly see some new domains this year that were not possible to represent
in previous competitions (traffic control with cars moving independently,
elevator control with independent arrivals, UAV reconnaissance, etc.).
A second goal is to pull in the (quite large) partially observed crowd, who have been
generally overlooked in previous ICAPS IPPCs. Previous IPPCs have supported conformant
planning (no observations, non-deterministic effects), but not partial observations
and probabilistic effects as we do this year. This sort of partial observability is
heavily used in applications from robotics to speech recognition dialog management to healthcare.
Another goal is to overcome the language barrier that has divided planning under
uncertainty researchers for the last decade or so.
There have been two (largely) distinct communities that have
addressed planning under uncertainty: the MDP/POMDP community and the "probabilistic planning"
community (the former have been focused more on bounded optimality and
expected discounted reward, while
the latter have focused more on purely goal-oriented problems and satisficing
solutions). These two communities rarely compare on the same problems despite the
overlap and potential for cross-pollination of ideas. One reason has been that
each community has used different language specifications for problems.
This is the main reason why we provide translators
from RDDL to PPDDL (probabilistic planning community), SPUDD (MDP community),
Symbolic Perseus (POMDP community), and other formats. And we note that the amazingly
diverse competitor pool this year reflects the fruits of this translation effort.
Finally, I (Scott) note that PPDDL did wonders for the planning under uncertainty community by
giving us a common set of problems and benchmarks to drive forward probabilistic planning
through comparative evaluation. But PPDDL is now over 7 years old and I believe the
previous IPPCs have thoroughly exercised the limits of what can be naturally modeled in lifted
PPDDL. My hope is that RDDL and this IPPC 2011 help the community break the
lifted PPDDL expressiveness barrier and (with the other general RDDL language extensions listed above)
provide a common relational (read: object-oriented) language for the compact, expressive, and
scalable specification of the problems that planning under uncertainty will tackle in the
next decade.