XCSP 2019 Solvers Competition: benchmark selection (draft)
This year, the selection of instances will be automated. This
document describes the selection procedure.
Generalities
- The number of selected instances will be 300 for both CSP and COP tracks
- The pseudo random generator will be initialized with 2019 as seed.
- The selection of instances will be based on an estimated hardness. This hardness is evaluated by running the top 3 solvers of last year's competition (the reference solvers).
Hardness estimation
The hardness of an instance can be defined as the minimum time required to solve it (by any solver that has no prior knowledge of that instance). In practice, this hardness will be estimated by running a limited number of solvers.
The hardness of an instance (its hardness score) will be estimated by averaging the PAR2 scores of the top 3 solvers of the last competition, with a timeout set to 40 minutes. Only one version of a solver may appear in this top 3. In 2018, the top 3 solvers in the CSP main track were scop (order+MapleCOMSPS), PicatSAT (2018-08-14) and Mistral-2.0 (2018-08-01). In the COP main track, the top 3 solvers were PicatSAT (2018-08-14), Concrete (3.9.2) and Choco-solver (4.0.7b seq (e747e1e)).
Using only 3 solvers to estimate the hardness introduces a slight but obvious bias. Using the VBS (virtual best solver) would be slightly better, but is not computationally affordable. As an illustration, evaluating the hardness of 24451 instances with only 3 solvers already takes 698 days of CPU time.
The PAR2 score is equal to the CPU time of the solver when the instance is solved, and 2 times the timeout when the instance is unsolved (when the solver reaches the timeout, or doesn't answer for any other reason). In CSP, the instance is solved when the solver answers SAT or UNSAT. In COP, the instance is solved when the solver answers OPT or UNSAT.
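The PAR2 and hardness-score computations described above can be sketched as follows (a minimal sketch, not the organizers' actual program; the function names are illustrative):

```python
TIMEOUT = 40 * 60  # reference-solver timeout, in seconds (40 minutes)

def par2(cpu_time, solved):
    """PAR2 score of one run: the CPU time if the instance is solved,
    twice the timeout otherwise (timeout reached, crash, no answer...)."""
    return cpu_time if solved else 2 * TIMEOUT

def hardness(runs):
    """Hardness score of an instance: the average PAR2 score over the
    reference solvers. `runs` is a list of (cpu_time, solved) pairs,
    one per reference solver."""
    return sum(par2(t, ok) for t, ok in runs) / len(runs)

# Example: two solvers answer in 300 s and 900 s, the third times out.
score = hardness([(300.0, True), (900.0, True), (2400.0, False)])
# -> (300 + 900 + 4800) / 3 = 2000.0 seconds
```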
Instances selection (theory)
Here are the rules governing the instances selection:
- At most 10 instances can be selected per series (defined as a directory in the instances distribution)
- At most half of the instances should be new (submitted to the competition).
- At least 10 instances will be selected from each series submitted to the competition (fresh, unpublished instances are particularly valuable in a competition). If this conflicts with the previous constraint, only 150/(number of new series) instances will be selected per new series.
- Instances will be classified in 4 categories:
- easy: the average PAR2 score is less than or equal to 2 minutes. These instances are likely to be solved by every solver, and therefore won't help discriminate between the solvers.
- medium: the average PAR2 score (in minutes) is within the range (2,30]
- hard: the average PAR2 score (in minutes) is within the range (30,80)
- open: the average PAR2 score is 80 minutes (twice the timeout of the reference solvers, which means that none of them gave an answer). These instances are likely to remain unsolved but represent an interesting challenge.
- The selection will retain 10% of easy instances, 35% of medium instances, 35% of hard instances and 20% of open instances.
- Picking N instances randomly inside a category does not guarantee a uniform distribution of hardness scores. Therefore, the interval of hardness scores defining a category will be divided into 10 sub-intervals of equal size and, when possible, N/10 instances will be picked randomly in each sub-interval. If a sub-interval doesn't contain enough instances to satisfy all the constraints, the number of instances that should have been selected in this sub-interval will be carried over to its neighbouring sub-intervals.
- To avoid picking only easy (resp. medium, hard, open) instances in a series, only one instance is selected at a time in each category. This means we first select an easy instance, then a medium instance, then a hard instance and finally an open instance. The process is repeated as long as necessary.
- If the quota of the series is reached, the series is disabled (no instance can be selected in this series any more).
- Instances submitted to the competition will be selected first.
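The category thresholds and the stratified sampling within a category could be sketched as follows (a sketch under assumptions, not the organizers' actual program: hardness scores are taken in minutes, and the names `classify` and `stratified_pick` are illustrative):

```python
import random

rng = random.Random(2019)  # pseudo-random generator seeded with 2019

def classify(score):
    """Map a hardness score (average PAR2, in minutes) to a category."""
    if score <= 2:
        return "easy"
    if score <= 30:
        return "medium"
    if score < 80:
        return "hard"
    return "open"  # score == 80: no reference solver gave an answer

def stratified_pick(instances, lo, hi, n, k=10):
    """Pick about n instances with hardness in [lo, hi), spread over k
    sub-intervals of equal size; when a sub-interval doesn't contain
    enough instances, the deficit is carried over to the next one.

    `instances` is a list of (name, hardness_score) pairs."""
    width = (hi - lo) / k
    buckets = [[] for _ in range(k)]
    for name, score in instances:
        if lo <= score < hi:
            i = min(int((score - lo) / width), k - 1)
            buckets[i].append(name)
    picked, carry = [], 0
    per_bucket = n // k
    for bucket in buckets:
        want = per_bucket + carry
        take = min(want, len(bucket))
        picked += rng.sample(bucket, take)
        carry = want - take  # deficit carried to the neighbouring sub-interval
    return picked
```

For instance, picking 10 hard instances would be `stratified_pick(instances, 30, 80, 10)`: the interval (30, 80) is cut into 10 sub-intervals of 5 minutes each, and one instance is drawn at random from each.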
Instances selection (practice)
In practice, it was sometimes impossible to satisfy all rules (because the list of instances was not diverse enough) and some rules were slightly simplified to facilitate the implementation. Here are the changes that have been made to stay as close as possible to the above rules, while avoiding unsatisfiable constraints.
- For the main tracks, CSP and COP, the maximum number of instances per series has been raised to 15 and a series is defined by the first level of directory (e.g. XCSP17/Random and not XCSP17/Random/*).
- For the mini track, some directories have been merged into a single series to avoid too many instances from the same problem (for CSP: {"XCSP17/Random","XCSP17/Crossword/Crossword-m1c-lex","XCSP17/Crossword/Crossword-m1c-ogd","XCSP17/Crossword/Crossword-m1c-uk","XCSP17/Crossword/Crossword-m1c-words","XCSP17/Subisomorphism/Subisomorphism-m1-si"}, for COP: {"XCSP17/Random","XCSP17/PseudoBoolean"}). Only 200 CSP instances and 180 COP instances could be selected. For COP, we had to switch to 30% easy, 30% medium, 10% hard, 30% open, with at most 20 instances per series.
Publication
The program and data used to perform the selection are available in the archive XCSP19-selection.tar.gz. The SHA512 hash of the program and data files was sent to contestants (to guarantee that the organizers couldn't change the selection), but the selection itself was not disclosed until the end of the competition (to prevent anyone from tuning their solver to the selected instances).