Saving/Loading Models

The PyXAI library provides functions to save and load models together with their training hyper-parameters, and to save and load preselected instances. PyXAI can save several models from an experimental protocol in a directory chosen by the user (called <save_directory> here and set to "try_save" in the examples). Each model is associated with an identifier <i> and two files:

<save_directory>/<dataset>.<i>.map: JSON file containing information such as training_index, test_index, accuracy, solver_name, …
<save_directory>/<dataset>.<i>.model: Raw model in Scikit-learn, XGBoost or Generic format.

You can also save preselected instances. This requires an additional file:

<save_directory>/<dataset>.<i>.instances (optional): JSON file containing the indexes of some preselected instances.
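
The naming scheme above can be sketched as a small helper. This is only an illustration under stated assumptions: `expected_files` and `missing_required` are hypothetical names that are not part of PyXAI, and the code only builds and checks paths without reading the files.

```python
import os


def expected_files(save_directory, dataset, i):
    """Build the file paths that the naming scheme implies for model <i>.

    Hypothetical helper for illustration (not part of PyXAI).
    """
    base = os.path.join(save_directory, f"{dataset}.{i}")
    return {
        "map": base + ".map",              # JSON metadata
        "model": base + ".model",          # raw model
        "instances": base + ".instances",  # optional: preselected instance indexes
    }


def missing_required(save_directory, dataset, i):
    """Return the required files (.map and .model) that are not on disk."""
    paths = expected_files(save_directory, dataset, i)
    return [paths[key] for key in ("map", "model") if not os.path.isfile(paths[key])]


# Show the three paths associated with model 0 of the compas dataset.
for kind, path in expected_files("try_save", "compas", 0).items():
    print(kind, "->", path)
```

Calling `missing_required("try_save", "compas", 0)` before loading is one way to detect an incomplete save directory early.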

For the .model files, PyXAI supports several backup formats:

Scikit-learn and LightGBM: The raw model is saved using the pickle library.

XGBoost: The raw model is saved using the XGBoost built-in backup functions.

Generic: The raw model is saved in a JSON file using PyXAI's own data structures (not compatible with regression at the moment).
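
To make the difference between a pickle backup and a JSON backup concrete, here is a minimal sketch using only the Python standard library. The `toy_tree` dictionary is a made-up stand-in: it does not reflect PyXAI's actual data structures or the content of a real `.model` file.

```python
import json
import pickle

# Made-up stand-in for a tiny decision tree (NOT PyXAI's internal format).
toy_tree = {
    "feature": 0,
    "threshold": 0.5,
    "left": {"leaf": 0},
    "right": {"leaf": 1},
}

# pickle produces a binary, Python-specific serialization.
pickled = pickle.dumps(toy_tree)
restored_from_pickle = pickle.loads(pickled)

# json produces human-readable text that other tools can inspect.
as_json = json.dumps(toy_tree, indent=2)
restored_from_json = json.loads(as_json)

print(type(pickled).__name__, type(as_json).__name__)
```

A pickle file can only be read back from Python, and the classes of the pickled object must be importable at load time (for a Scikit-learn model, this means having Scikit-learn installed), whereas a JSON file is plain text.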

Saving Models

As an illustration, we use the compas dataset. Let us start by creating two Random Forests using a leave-one-group-out cross-validation protocol and selecting an instance:

from pyxai import Learning, Explainer, Tools

learner = Learning.Scikitlearn("../dataset/compas.csv", learner_type=Learning.CLASSIFICATION)
models = learner.evaluate(method=Learning.LEAVE_ONE_GROUP_OUT, output=Learning.RF, n_models=2)
instance, prediction = learner.get_instances(n=1)

data:
      Number_of_Priors  score_factor  Age_Above_FourtyFive   
0                    0             0                     1  \
1                    0             0                     0   
2                    4             0                     0   
3                    0             0                     0   
4                   14             1                     0   
...                ...           ...                   ...   
6167                 0             1                     0   
6168                 0             0                     0   
6169                 0             0                     1   
6170                 3             0                     0   
6171                 2             0                     0   

      Age_Below_TwentyFive  African_American  Asian  Hispanic   
0                        0                 0      0         0  \
1                        0                 1      0         0   
2                        1                 1      0         0   
3                        0                 0      0         0   
4                        0                 0      0         0   
...                    ...               ...    ...       ...   
6167                     1                 1      0         0   
6168                     1                 1      0         0   
6169                     0                 0      0         0   
6170                     0                 1      0         0   
6171                     1                 0      0         1   

      Native_American  Other  Female  Misdemeanor  Two_yr_Recidivism  
0                   0      1       0            0                  0  
1                   0      0       0            0                  1  
2                   0      0       0            0                  1  
3                   0      1       0            1                  0  
4                   0      0       0            0                  1  
...               ...    ...     ...          ...                ...  
6167                0      0       0            0                  0  
6168                0      0       0            0                  0  
6169                0      1       0            0                  0  
6170                0      0       1            1                  0  
6171                0      0       1            0                  1  

[6172 rows x 12 columns]
--------------   Information   ---------------
Dataset name: ../dataset/compas.csv
nFeatures (nAttributes, with the labels): 12
nInstances (nObservations): 6172
nLabels: 2
---------------   Evaluation   ---------------
method: LeaveOneGroupOut
output: RF
learner_type: Classification
learner_options: {'max_depth': None, 'random_state': 0}
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics:
   accuracy: 66.42903434867142
nTraining instances: 3086
nTest instances: 3086

For the evaluation number 1:
metrics:
   accuracy: 64.45236552171096
nTraining instances: 3086
nTest instances: 3086

---------------   Explainer   ----------------
For the evaluation number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 63

For the evaluation number 1:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 69

---------------   Instances   ----------------
number of instances selected: 1
----------------------------------------------

The save method allows one to save models:

<Learner Object>.save(models, save_directory, generic=False):
Saves the models in `save_directory` as two files per model: `<save_directory>/<dataset>.<i>.map` and `<save_directory>/<dataset>.<i>.model`, where `<i>` is the index of the model in the `models` parameter. The backup format of the `.model` files is that of the <Learner Object> used (Scikit-learn or XGBoost), or Generic if the `generic` parameter is set to `True`.
models `List` of `DecisionTree`|`RandomForest`|`BoostedTrees`: List of models to be saved.
save_directory `String`: The directory where the models are saved. Creates the directory if it does not exist.
generic `Boolean`: If set to `True`, saves the model in the `.model` file using PyXAI's own data structures. Default value is `False`.

learner.save(models, "try_save", generic=True)

Model saved: (try_save/compas.0.model, try_save/compas.0.map)
Model saved: (try_save/compas.1.model, try_save/compas.1.map)

If models based on the same dataset already exist in this folder, the method overwrites them.
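
Because existing files are overwritten, it can be useful to copy the directory before saving again. The following is a minimal sketch using only the standard library; `backup_if_exists` is a hypothetical helper, not a PyXAI function, and the timestamped naming of the copy is an arbitrary choice.

```python
import os
import shutil
import time


def backup_if_exists(save_directory):
    """Copy save_directory to a timestamped sibling directory.

    Hypothetical helper (not part of PyXAI). Returns the path of the copy,
    or None if save_directory does not exist yet.
    """
    # Normalize so that a trailing separator does not place the copy inside
    # the directory being copied.
    source = os.path.normpath(save_directory)
    if not os.path.isdir(source):
        return None
    backup = f"{source}.backup-{time.strftime('%Y%m%d-%H%M%S')}"
    # copytree raises FileExistsError if a copy with the same timestamp exists.
    shutil.copytree(source, backup)
    return backup


# Intended use, before calling learner.save(models, "try_save"):
print(backup_if_exists("try_save"))
```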

Loading Models

Once the models have been saved, you can load them in another program.

The save method is part of a <Learner Object> while the load method comes from the Learning module.

Learning.load(models_directory):
Returns a tuple `(Learner, models)` where the type of `Learner` is the one chosen when saving: `Learning.Generic`|`Learning.Scikitlearn`|`Learning.Xgboost`. The type of the models depends on the backup: `DecisionTree`|`RandomForest`|`BoostedTrees`.
models_directory `String`: The models location.

learner, models = Learning.load(models_directory="try_save") 

for model in models:
  explainer = Explainer.initialize(model, instance)
  print("sufficient_reason:", explainer.sufficient_reason())    

----------   Loading Information   -----------
mapping file: try_save/compas.0.map
nFeatures (nAttributes, with the labels): 12
nInstances (nObservations): 6172
nLabels: 2
----------   Loading Information   -----------
mapping file: try_save/compas.1.map
nFeatures (nAttributes, with the labels): 12
nInstances (nObservations): 6172
nLabels: 2
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics: {'accuracy': 66.42903434867142}
nTraining instances: 3086
nTest instances: 3086

For the evaluation number 1:
metrics: {'accuracy': 64.45236552171096}
nTraining instances: 3086
nTest instances: 3086

---------------   Explainer   ----------------
For the evaluation number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 63

For the evaluation number 1:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 69

sufficient_reason: (-1, -2, -3, -4, 5, -6, -9, -11, -13)
sufficient_reason: (-1, -2, -3, -4, -6, 8, -13)

Saving/Loading Instances

PyXAI can also save and load instances. For this purpose, we use the get_instances method.

<Learner Object>.get_instances(model=None, indexes=Indexes.All, *, dataset=None, n=None, correct=None, predictions=None, save_directory=None, instances_id=None):
Returns the instances in a `Tuple`. Each instance comes with the model's prediction if a model is specified, or alone otherwise (model=None). An instance is represented by a `numpy.array` object. Note that when a single instance is requested (n=1), the method returns that instance directly rather than a `Tuple` of instances.
model `DecisionTree` `RandomForest` `BoostedTrees`: The model computed by the `evaluate` method.
indexes `Learning.TRAINING` `Learning.TEST` `Learning.MIXED` `Learning.ALL` `String`: Selects instances from the training instances (`Learning.TRAINING`), from the test instances (`Learning.TEST`), or from both, giving priority to training instances (`Learning.MIXED`). Defaults to `Learning.ALL`, which takes all instances into account. Finally, when the indexes parameter is a `String`, it designates a file containing indexes and the method loads the associated instances.
dataset `String` `pandas.DataFrame`: In some situations, this method needs the dataset (Optional).
n `Integer`: The wanted number of instances (None for all).
correct `None` `True` `False`: Only available if a model is given. Selects by default all instances (`None`), or only the instances correctly classified by the model (`True`), or only the instances misclassified by the model (`False`).
predictions `None` `List of Integer`: Only available if a model is given. Selects by default all instances (`None`) or a `List of Integer` representing the desired classes/labels of instances to select.
save_directory `None` `String`: Saves the instance indexes into a file inside the directory given by this parameter.
instances_id `None` `Integer`: An identifier added to the name of the file saved with the `save_directory` parameter, and used to load instances through the `indexes` parameter.
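
The effect of the `n`, `correct` and `predictions` parameters can be pictured with a pure-Python stand-in. This is a sketch of the filtering logic as described above, not PyXAI's implementation: the `select` function and the `(instance, true_label, predicted_label)` record layout are hypothetical, and two points are assumptions that the documentation does not state, namely that `predictions` filters on the class predicted by the model and that the first matching instances are kept in order.

```python
def select(records, n=None, correct=None, predictions=None):
    """Filter (instance, true_label, predicted_label) records.

    Hypothetical stand-in for the documented parameters:
      n           -- maximum number of instances (None for all)
      correct     -- None: no filter, True: well classified, False: misclassified
      predictions -- None: no filter, else list of accepted predicted classes
    """
    selected = []
    for instance, true_label, predicted in records:
        if n is not None and len(selected) >= n:
            break
        if correct is not None and (predicted == true_label) != correct:
            continue
        if predictions is not None and predicted not in predictions:
            continue
        selected.append((instance, predicted))
    return selected


# Toy records: instance name, true label, label predicted by some model.
records = [("a", 0, 0), ("b", 1, 0), ("c", 1, 1), ("d", 0, 1)]
print(select(records, n=1, correct=True, predictions=[1]))
```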

On the one hand, to save instances (more precisely, the indexes of the instances), we use the save_directory and instances_id parameters. On the other hand, to load them, we use the indexes and instances_id parameters.

In this example, for each of the two models, the indexes of 10 instances of the test set are saved in the try_save directory:

for model_id, model in enumerate(models):
    instances = learner.get_instances(
        dataset="../dataset/compas.csv",
        indexes=Learning.TEST,
        n=10,
        model=model,
        save_directory="try_save",
        instances_id=model_id)

---------------   Instances   ----------------
Indexes of selected instances saved in: try_save/compas.0.instances
number of instances selected: 10
----------------------------------------------
---------------   Instances   ----------------
Indexes of selected instances saved in: try_save/compas.1.instances
number of instances selected: 10
----------------------------------------------

If the dataset has never been loaded, get_instances does not load it entirely: it reads only the rows at the required indexes.
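
The idea of reading only the required rows can be sketched with the standard `csv` module. This is an illustration under assumptions, not the way PyXAI reads the dataset: `read_rows` is a hypothetical helper, the CSV file is assumed to start with a header line, and indexes are assumed to be 0-based positions among the data rows.

```python
import csv


def read_rows(csv_path, wanted_indexes):
    """Stream a CSV file and keep only the data rows at the wanted indexes.

    Hypothetical helper (not part of PyXAI). Returns (header, {index: row}).
    """
    wanted = set(wanted_indexes)
    last = max(wanted) if wanted else -1
    rows = {}
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)  # first line holds the column names
        for index, row in enumerate(reader):
            if index > last:
                break  # every wanted row has been seen: stop reading
            if index in wanted:
                rows[index] = row
    return header, rows
```

For example, `read_rows("../dataset/compas.csv", [3, 42])` would stop reading after data row 42 instead of parsing the whole file.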

Later, in another program, you can load the same instances using these instructions:

for model_id, model in enumerate(models):
    instances = learner.get_instances(
        dataset="../dataset/compas.csv",
        indexes="try_save",
        model=model,
        instances_id=model_id)

---------------   Instances   ----------------
Loading instances file: try_save/compas.0.instances
number of instances selected: 10
----------------------------------------------
---------------   Instances   ----------------
Loading instances file: try_save/compas.1.instances
number of instances selected: 10
----------------------------------------------

More information about the get_instances method is given in the Generating Models pages.