Supplementary Material for Paper #7524
Contents
This archive contains several folders and files:
- ijcai22-7524-proofs.pdf: Proofs of the different propositions
- bin contains the openwbo executable required by our scripts (works on Linux only)
- datasets contains examples of datasets that can be used to test our scripts and to reproduce our results
- plot_rf contains additional plots for all datasets. For each dataset, we provide two box plots: the first one shows the computation time required to generate minimum-weight majoritary reasons for the instances, and the second one shows the number of minimum-weight majoritary reasons.
- result_RF_IJCAI is a folder containing .json files that can be generated by launching the command
python generate_data_RF_IJCAI22.py {DatasetName}
(see below). The .json files that have been produced contain the following keys:
- acc: The average accuracy over the 10 random forests produced by the cross validation process
- instance: The list of binarized instances that have been considered in the experiments for the corresponding dataset and the random forests that have been learned. Instances 1 to 25 are picked from the test set of the first random forest, instances 26 to 50 are associated with the second random forest, and so on
- classified: A list of tuples containing a Boolean value indicating whether the classifier predicted the right class for the corresponding instance, and a number (1 or 0) giving this class
- len_bin: A list providing for each random forest two successive numbers: the first one is the number of Boolean features used in the forest, and the second one is the number of original features in the dataset
- dir_r: A list of tuples indicating for each instance the size of a direct reason that has been computed and the computation time needed to get it
- enum_min_maj_r: A list of tuples indicating, for each instance, the size of each min majoritary reason computed, the computation time needed to get them, and a Boolean that is not relevant here
- enum_pref_f_import_r: Same as enum_min_maj_r but for feature-importance-preferred reasons
- enum_pref_shap_r: Same as enum_min_maj_r but for Shapley-preferred reasons
- enum_wordf_import_r: Same as enum_min_maj_r but for word-frequency-preferred reasons
- script contains the Python scripts that have been written. It also contains the file info_data_RF.json, which gives the number of trees used for each dataset.
- environment.yml: This file helps you reproduce our Python environment, which is required to run our scripts.
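As an illustration of how the keys of a result file listed above might be consumed, here is a minimal Python sketch. It assumes that each entry of dir_r is a (size, time) pair and that each entry of classified is a (Boolean, class) pair, as described above. The function names load_results and summarize_results and the synthetic data are hypothetical and are not part of our scripts.

```python
import json
from statistics import mean


def load_results(path):
    """Load one result file (a .json file with the keys described above)."""
    with open(path) as handle:
        return json.load(handle)


def summarize_results(results):
    """Compute a few summary numbers from one result dictionary."""
    # Assumption: each dir_r entry is a (size, time) pair.
    direct_sizes = [entry[0] for entry in results["dir_r"]]
    direct_times = [entry[1] for entry in results["dir_r"]]
    # Assumption: each classified entry is a (correct?, class) pair.
    correct = [bool(entry[0]) for entry in results["classified"]]
    return {
        "accuracy": results["acc"],
        "instances": len(results["instance"]),
        "correct_fraction": sum(correct) / len(correct),
        "mean_direct_size": mean(direct_sizes),
        "mean_direct_time": mean(direct_times),
    }


# Synthetic data for illustration only; these are not actual results.
example = {
    "acc": 0.9,
    "instance": [[1, 0], [0, 1]],
    "classified": [[True, 1], [False, 0]],
    "dir_r": [[4, 0.5], [6, 1.5]],
}
summary = summarize_results(example)
print(summary)
```

To apply the same summary to a generated file, pass the dictionary returned by load_results to summarize_results.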
Software
How to set up our Python environment before running our scripts
- Make sure you are using a Linux OS and a version of Python 3.x
- Install anaconda
- Open a terminal in this repository with anaconda activated (If conda is activated, you will get "(base)" displayed on your terminal)
- Execute the command
conda env create --file environment.yml
to recreate our Python environment on your system
- Execute the command
conda activate paper7524
to activate this environment
How to use our scripts
- Go inside the script directory
- If you want to generate just one reason of each type (direct, min, Shapley-preferred, ...), run:
python3 play_with_RF.py compas
(you can change the name of the dataset (see datasets directory)).
- If you want to generate new JSON files with all computed data (this can take a long time for some datasets):
python3 generate_RF_ijcai22.py compas
(you can change the name of the dataset (see datasets directory)).
The new JSON file will be created inside the result_RF_IJCAI-local directory.
Remark: If openwbo does not work, please download it, compile it, and put the executable inside the bin directory.
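Before launching a long run, it can be useful to check that the solver binary is present and executable. The following is a minimal sketch; the helper name solver_status and the relative path ../bin/openwbo (as seen from the script directory) are assumptions made for illustration, so adjust the path to your setup.

```python
import os


def solver_status(path):
    """Return 'ok', 'missing' or 'not executable' for the given binary path."""
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


# Assumed location of the solver when running from the script directory.
assumed_path = os.path.join("..", "bin", "openwbo")
print(assumed_path, "->", solver_status(assumed_path))
```

If the status is "not executable", restoring the execute permission on the file (for example with chmod +x) may be enough; if it is "missing", follow the remark above.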
Description of our scripts
generate_data_RF.py
This script generates results following a 10-fold cross-validation process, as explained in the experimental section of the paper.
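For readers unfamiliar with the procedure, the sketch below shows a generic k-fold split of instance indices in plain Python. It is only an illustration of the technique and is not the code used in our script; the function name k_fold_indices, the seed, and the number of instances are hypothetical.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train, test) index lists for a k-fold cross-validation."""
    indices = list(range(n_instances))
    # Shuffle once with a fixed seed so that the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(100))
for number, (train, test) in enumerate(splits, start=1):
    print(f"fold {number}: {len(train)} training / {len(test)} test instances")
```

Each index appears in exactly one test fold, so one model can be learned per fold on the training part and evaluated on the held-out part.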
my_tree.py
This script contains pieces of code to encode decision trees and to analyze them.
my_forest.py
This script contains pieces of code to encode forests of decision trees (in particular, random forests) and to analyze them.
play_with_RF.py
This script computes one reason of each type for a given dataset.
analysis_RF_IJCAI22.py
This script creates the LaTeX table of the paper.
plot_RF.py
This script creates the box plots located in directory plot_rf.
Other scripts
- encodage_CNF.py contains pieces of code to encode propositional formulae into CNF formulae using the Tseitin transformation
- timeout.py contains pieces of code to trigger a time-out exception
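To give an idea of what a Tseitin-style encoding does, here is a small self-contained sketch. It does not reproduce the interface of encodage_CNF.py: the formula representation (integers for variables, tuples for "not", "and", "or") and the names tseitin, evaluate and cnf_satisfied are assumptions made for illustration. Each "and"/"or" subformula gets a fresh variable defined by three clauses, and the literal of the root is asserted as a unit clause.

```python
from itertools import product


def tseitin(formula, n_vars):
    """Return (clauses, total_vars) for a CNF equisatisfiable with formula.

    A formula is an int (variable 1..n_vars) or a tuple
    ("not", f), ("and", f, g) or ("or", f, g).
    """
    clauses = []
    counter = [n_vars]

    def encode(f):
        # Return a literal that is equivalent to f given the clauses so far.
        if isinstance(f, int):
            return f
        if f[0] == "not":
            return -encode(f[1])
        a, b = encode(f[1]), encode(f[2])
        counter[0] += 1
        x = counter[0]  # fresh variable standing for the subformula
        if f[0] == "and":    # x <-> (a and b)
            clauses.extend([[-x, a], [-x, b], [x, -a, -b]])
        elif f[0] == "or":   # x <-> (a or b)
            clauses.extend([[x, -a], [x, -b], [-x, a, b]])
        else:
            raise ValueError(f"unknown connective: {f[0]}")
        return x

    clauses.append([encode(formula)])  # the whole formula must hold
    return clauses, counter[0]


def evaluate(f, assignment):
    """Evaluate a formula under a dict mapping variables to Booleans."""
    if isinstance(f, int):
        return assignment[f]
    if f[0] == "not":
        return not evaluate(f[1], assignment)
    if f[0] == "and":
        return evaluate(f[1], assignment) and evaluate(f[2], assignment)
    return evaluate(f[1], assignment) or evaluate(f[2], assignment)


def cnf_satisfied(clauses, assignment):
    """Check whether every clause has at least one true literal."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)


formula = ("or", ("and", 1, 2), ("not", 3))
clauses, total_vars = tseitin(formula, n_vars=3)
print(total_vars, "variables,", len(clauses), "clauses")
for values in product([False, True], repeat=3):
    print(values, evaluate(formula, dict(zip((1, 2, 3), values))))
```

The resulting CNF grows linearly with the size of the formula, at the cost of introducing auxiliary variables.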