How did you select the instances?
Nice question. Thank you for asking! How the problems were generated is key to the competition being as "objective" as possible.
First, we list the attributes we consider desirable in a method for selecting the problem set used for the evaluation:
- Reproducible; others can follow the method and obtain (essentially) the same problems.
- Unbiased; does not favour one system over another.
- Useful; does not result in all problems being trivially solved, or all being unsolvable.
- General; can be applied to any set of planners, on any domain.
As previously stated, the aim of the problem-generation method was to make the competition as objective as possible while stretching the difficulty of the test instances. Clearly, there is a trade-off to consider between "trivial" instances, i.e. those on which almost all planners find very similar solutions, and overly complex instances, which no planner solves.
Assume there were 20 planners. For each domain the following process was run.
1. Identify sizes.
If the domain is old (used in previous IPCs), then
use the sizes of the larger benchmark problems (the top half), and also extend them, following the "trend" set by the original organisers.
Else (the domain has not been used in previous IPCs), then
use some well-known planners, either from the literature or from IPC-7, to identify challenging problem sizes.
2. Given the sizes identified at step 1, generate between 30 and 50 instances per domain by using available problem generators.
3. Anonymise planners 1..20
4. Run all 20 planners on the generated instances.
5. Collect results in terms of solved problems and quality of solutions.
6. Order the problems by the number of planners that solved them.
7. Select the final set:
If between (circa) 10 and 20 instances have been solved by some of the considered planners, then
select the top 20 instances according to the order at step 6.
Else, if either a very small number of instances have been solved or most of the instances are trivial, then
go back to step 1 and identify different sizes, taking into account the results obtained.
Else, remove the trivial and the overly complex instances in order to obtain a final set of 20 problems.
8. Remove anonymity from planners
9. Run the planners on the chosen 20 benchmarks and rank them using the IPC score.
In case no generator was available, at step 1 we selected all the available instances (usually from previous IPCs).
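The selection and ranking steps (6–9) can be sketched in Python. This is a hypothetical illustration, not the organisers' actual tooling: the function names, data layout, and thresholds are assumptions, and the scoring uses the commonly described satisficing IPC score (best known cost divided by the planner's cost, 0 if unsolved).

```python
# Hypothetical sketch of the instance-selection loop described above.
# results maps problem -> {planner: solution cost, or None if unsolved}.

def select_instances(results, target=20, lo=10):
    """Steps 6-7: order problems by how many planners solved them,
    then pick a final set of `target` instances, or signal a retry."""
    n_planners = len(next(iter(results.values())))
    solved_by = {p: sum(c is not None for c in r.values())
                 for p, r in results.items()}
    # Step 6: order problems by number of planners that solved them.
    ordered = sorted(results, key=solved_by.get, reverse=True)
    solved = [p for p in ordered if solved_by[p] > 0]
    if lo <= len(solved) <= target:
        return ordered[:target]
    trivial = [p for p in ordered if solved_by[p] == n_planners]
    if len(solved) < lo or len(trivial) > len(results) // 2:
        return None  # back to step 1: identify different sizes
    # Otherwise keep instances that are neither trivial nor unsolved.
    middle = [p for p in ordered if 0 < solved_by[p] < n_planners]
    return middle[:target]

def rank(final_set, results):
    """Step 9: rank planners by summed IPC score over the final set."""
    planners = list(next(iter(results.values())).keys())
    scores = {pl: 0.0 for pl in planners}
    for prob in final_set:
        costs = [c for c in results[prob].values() if c is not None]
        if not costs:
            continue
        best = min(costs)
        for pl, c in results[prob].items():
            if c is not None:
                scores[pl] += best / c  # 0 is added implicitly if unsolved
    return sorted(scores, key=scores.get, reverse=True)
```

The `None` return from `select_instances` models the "go back to step 1" branch: the caller would identify new sizes and regenerate instances before retrying.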