Benchmarks selection
The steering committee decided on a random selection of instances, selecting (when possible) the same number of instances per domain. A domain is a set of similar instances. Basically, one directory in an archive equals one domain, but some directories were merged into a single domain when they contained similar instances. At most 5% of the whole instance set can be selected per submitter. This limit is meant to prevent any single submitter from having too much influence on the competition results.
Past competition instances were classified by Johannes Fichte (thanks!) based on data in Appendix A of Pavel Smirnov's thesis. That classification is available in the file 1-instance_list_domains.csv. New instances submitted to PB24 were classified by the organizer according to the names of their directories.
The goal was to obtain between 300 and 600 instances in the DEC-LIN and OPT-LIN tracks. Given the number of domains, 10 instances were randomly selected per domain in all tracks, except in the DEC-LIN track where 15 instances per domain were selected. If a domain doesn't contain enough instances, all of them are selected. Duplicate instances are removed; duplicates are detected by comparing the md5 checksums of instances, so the check is purely syntactic. It does not recognize that an instance with reordered constraints or renamed variables may be the same as another instance.
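The per-domain selection with md5-based duplicate removal can be sketched as follows. This is a minimal illustration, not the actual select.cc: the structure names, the function, and the use of std::mt19937 are assumptions made for the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <random>
#include <set>
#include <string>
#include <vector>

// Each instance carries a precomputed md5 fingerprint (as in md5.list).
struct Instance {
  std::string name;
  std::string md5;
};

// Shuffle each domain, then take up to perDomain instances whose md5
// fingerprint has not been selected yet (syntactic duplicate removal).
// If a domain has fewer distinct instances than perDomain, all are taken.
std::vector<Instance> selectPerDomain(
    std::map<std::string, std::vector<Instance>>& domains,
    std::size_t perDomain, unsigned seed) {
  std::mt19937 rng(seed);            // fixed seed (2024 in the competition)
  std::set<std::string> seen;        // fingerprints already selected
  std::vector<Instance> selected;
  for (auto& [domain, instances] : domains) {
    std::shuffle(instances.begin(), instances.end(), rng);
    std::size_t taken = 0;
    for (const auto& inst : instances) {
      if (taken == perDomain) break;
      if (!seen.insert(inst.md5).second) continue;  // duplicate -> skip
      selected.push_back(inst);
      ++taken;
    }
  }
  return selected;
}
```

Note that the md5 set is shared across domains, so an instance submitted under two domains is only selected once.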
Most contributors submitted only a few domains, and therefore couldn't hit the limit of 5% of the whole set of instances. In contrast, Jakob Nordström submitted many instances and domains. Three of these domains were considered to contain well-known instances, and were therefore not subject to the 5% limit: MIPLIB, KNAP, multverif. Many instances were merged into preexisting domains (the ones identified in the past competition instances). The remaining instances were subject to the limit of 5% of the whole instance set: 24 instances were selected across 7 domains (4 instances in domains evencolouring, vertex_cover and maxcut, and 3 instances in domains injcomp, dominating_set, linpeb and compose pebling). Some submitted instances contained trivially unsatisfiable constraints and were not considered (not even normalized).
The archive PB24-selection.tgz contains the source files used to perform the selection. The program was run on a system with glibc 2.37-19 (Fedora 38). The random seed used was 2024. The archive contains:
- runSelection: this is the script that runs the selection process in all categories. Running this script will generate *-selection.list files (as well as *-selection.sql files, which are used for registering the selection in the database). The *-selection.list files contain the list of selected instances.
- {dec-lin,dec-nlc,opt-lin,opt-nlc,wbo}.list: these are the lists of available instances, grouped by domain, with the number of instances to select per domain
- md5.list: the md5 fingerprint of each instance
- select.cc: the program performing the random selection. It is derived from the one available at https://www.cril.univ-artois.fr/XCSP19/selection/. It adds support for the removal of duplicates (based on the md5 fingerprint). This program was written in a rush and is therefore not well documented, not efficient, and not forgiving.
- savedResult/: contains the selection actually used for the competition. Running the selection script should yield the same files as in this directory.
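Recording the glibc version alongside the seed matters because the C library's random() produces an implementation-defined stream: rerunning on a different libc could, in principle, yield a different selection even with the same seed. (That select.cc depends on glibc's generator is an assumption, suggested by the glibc version being recorded.) By contrast, std::mt19937 produces a stream fully specified by the C++ standard, so a fixed seed reproduces the same draws on any conforming platform:

```cpp
#include <cstdint>
#include <random>

// First value drawn from a Mersenne Twister with the given seed.
// The C++ standard fixes mt19937's output exactly, so this function
// returns the same value for the same seed on every conforming platform,
// independent of the libc version.
std::uint32_t firstDraw(std::uint32_t seed) {
  std::mt19937 rng(seed);
  return rng();
}
```

Replaying the same seed therefore reproduces the same shuffle order, which is what makes the files in savedResult/ reproducible from runSelection.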
Selecting instances is a difficult problem. The selection made here is not perfect and some mistakes were probably made. However, it is hopefully clear that the selection was automated, mainly random, and largely independent of the submitted solvers.
Additional technical details
Coming soon...