Saving/Loading Models

The PyXAI library provides functions to save and load models together with the hyper-parameters used for training, and to save and load preselected instances. PyXAI can save several models from an experimental protocol in a directory chosen by the user (called <save_directory> here and set to "try_save" in this example). Each model is associated with an identifier <i> and two files:

  • <save_directory>/<dataset>.<i>.map: JSON file containing information such as training_index, test_index, accuracy, solver_name, …
  • <save_directory>/<dataset>.<i>.model: Raw model in Scikit-learn, XGBoost or Generic format.

Moreover, you can also save some preselected instances. This requires an additional file:

  • <save_directory>/<dataset>.<i>.instances (optional): JSON file containing the indexes of some preselected instances.
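
The naming scheme above can be sketched with a small helper. This is only an illustration of the file layout described here, not part of the PyXAI API: the function name expected_files is hypothetical.

```python
from pathlib import Path


def expected_files(save_directory, dataset, index):
    """Return the file paths described above for one saved model.

    Hypothetical helper: it only mirrors the naming scheme
    <save_directory>/<dataset>.<i>.<extension>.
    """
    base = Path(save_directory)
    return {
        extension: base / f"{dataset}.{index}.{extension}"
        for extension in ("map", "model", "instances")  # "instances" is optional
    }


paths = expected_files("try_save", "compas", 0)
for extension, path in paths.items():
    print(extension, "->", path.as_posix())
```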

For .model files, PyXAI supports several backup formats:

  • Scikit-learn and LightGBM: The raw model is saved using the pickle library.
  • XGBoost: The raw model is saved using the XGBoost built-in backup functions.
  • Generic: The raw model is saved in a JSON file using PyXAI's own data structures (not compatible with regression at the moment).
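
To give a feel for the difference between a pickle-based backup and a JSON-based one, here is a minimal standard-library sketch. The toy_model dictionary is a stand-in invented for this example. It is not a PyXAI, Scikit-learn or XGBoost structure, and the sketch does not reproduce how PyXAI writes its files.

```python
import json
import pickle

# Made-up stand-in for a trained model (illustration only).
toy_model = {"n_trees": 2, "split": ("Number_of_Priors", 3)}

# pickle serializes Python objects as they are, so the round trip is exact.
from_pickle = pickle.loads(pickle.dumps(toy_model))

# JSON stores plain, language-neutral data: the tuple comes back as a list.
from_json = json.loads(json.dumps(toy_model))

print("pickle round trip identical:", from_pickle == toy_model)
print("JSON round trip identical:", from_json == toy_model)
print("JSON content:", from_json)
```

A pickle file is convenient but tied to Python and to the library versions that produced it, whereas a JSON file is readable text that requires an explicit conversion to and from the in-memory model.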

Saving Models

As an illustration, we use the compas dataset. Let us start by training two Random Forests with a leave-one-group-out cross-validation protocol and selecting an instance:

from pyxai import Learning, Explainer, Tools

learner = Learning.Scikitlearn("../dataset/compas.csv", learner_type=Learning.CLASSIFICATION)
models = learner.evaluate(method=Learning.LEAVE_ONE_GROUP_OUT, output=Learning.RF, n_models=2)
instance, prediction = learner.get_instances(n=1)
data:
      Number_of_Priors  score_factor  Age_Above_FourtyFive   
0                    0             0                     1  \
1                    0             0                     0   
2                    4             0                     0   
3                    0             0                     0   
4                   14             1                     0   
...                ...           ...                   ...   
6167                 0             1                     0   
6168                 0             0                     0   
6169                 0             0                     1   
6170                 3             0                     0   
6171                 2             0                     0   

      Age_Below_TwentyFive  African_American  Asian  Hispanic   
0                        0                 0      0         0  \
1                        0                 1      0         0   
2                        1                 1      0         0   
3                        0                 0      0         0   
4                        0                 0      0         0   
...                    ...               ...    ...       ...   
6167                     1                 1      0         0   
6168                     1                 1      0         0   
6169                     0                 0      0         0   
6170                     0                 1      0         0   
6171                     1                 0      0         1   

      Native_American  Other  Female  Misdemeanor  Two_yr_Recidivism  
0                   0      1       0            0                  0  
1                   0      0       0            0                  1  
2                   0      0       0            0                  1  
3                   0      1       0            1                  0  
4                   0      0       0            0                  1  
...               ...    ...     ...          ...                ...  
6167                0      0       0            0                  0  
6168                0      0       0            0                  0  
6169                0      1       0            0                  0  
6170                0      0       1            1                  0  
6171                0      0       1            0                  1  

[6172 rows x 12 columns]
--------------   Information   ---------------
Dataset name: ../dataset/compas.csv
nFeatures (nAttributes, with the labels): 12
nInstances (nObservations): 6172
nLabels: 2
---------------   Evaluation   ---------------
method: LeaveOneGroupOut
output: RF
learner_type: Classification
learner_options: {'max_depth': None, 'random_state': 0}
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics:
   accuracy: 66.42903434867142
nTraining instances: 3086
nTest instances: 3086

For the evaluation number 1:
metrics:
   accuracy: 64.45236552171096
nTraining instances: 3086
nTest instances: 3086

---------------   Explainer   ----------------
For the evaluation number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 63

For the evaluation number 1:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 69

---------------   Instances   ----------------
number of instances selected: 1
----------------------------------------------

The save method allows one to save models:

<Learner Object>.save(models, save_directory, generic=False):
Saves the models in save_directory as two files per model: <save_directory>/<dataset>.<i>.map and <save_directory>/<dataset>.<i>.model, where <i> is the index of the model in the models parameter. The backup format of the .model files is that of the <Learner Object> used (Scikit-learn or XGBoost), or Generic if the generic parameter is set to True.
  • models List of DecisionTree|RandomForest|BoostedTrees: List of models to be saved.
  • save_directory String: The directory where the models are saved. The directory is created if it does not exist.
  • generic Boolean: If set to True, saves the model in the .model file using PyXAI's own data structures. Default value is False.
learner.save(models, "try_save", generic=True)
Model saved: (try_save/compas.0.model, try_save/compas.0.map)
Model saved: (try_save/compas.1.model, try_save/compas.1.map)

If models based on the same dataset already exist in this folder, the method overwrites them.
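
Because existing files are overwritten, it can be useful to check a directory before saving. The following is a minimal sketch using only the standard library. existing_model_files is a hypothetical helper, not a PyXAI function, and it assumes only the <dataset>.<i>.map and <dataset>.<i>.model naming scheme described at the top of this page. The demonstration runs on a temporary directory with empty placeholder files.

```python
import tempfile
from pathlib import Path


def existing_model_files(save_directory, dataset):
    """List the .map/.model files already present for a dataset (hypothetical helper)."""
    directory = Path(save_directory)
    if not directory.is_dir():
        return []  # nothing saved yet, so nothing can be overwritten
    return sorted(
        path.name
        for path in directory.iterdir()
        if path.name.startswith(dataset + ".") and path.suffix in (".map", ".model")
    )


# Demonstration on a temporary directory filled with empty placeholder files.
with tempfile.TemporaryDirectory() as directory:
    for name in ("compas.0.map", "compas.0.model", "compas.0.instances", "other.0.map"):
        (Path(directory) / name).touch()
    found = existing_model_files(directory, "compas")

print("Files that a new save would overwrite:", found)
```

In practice, the helper would be called with the real directory, for instance existing_model_files("try_save", "compas"), before calling learner.save.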

Loading Models

After saving the models, you can load them in another program.

The save method is part of a <Learner Object> while the load method comes from the Learning module.

Learning.load(models_directory):
Returns a tuple (Learner, models) where the type of Learner is the one chosen when saving: Learning.Generic|Learning.Scikitlearn|Learning.Xgboost. The type of the models depends on the backup: DecisionTree|RandomForest|BoostedTrees.
  • models_directory String: The location of the models.
learner, models = Learning.load(models_directory="try_save")

# 'instance' is the instance obtained earlier with learner.get_instances(n=1)
for model in models:
    explainer = Explainer.initialize(model, instance)
    print("sufficient_reason:", explainer.sufficient_reason())
----------   Loading Information   -----------
mapping file: try_save/compas.0.map
nFeatures (nAttributes, with the labels): 12
nInstances (nObservations): 6172
nLabels: 2
----------   Loading Information   -----------
mapping file: try_save/compas.1.map
nFeatures (nAttributes, with the labels): 12
nInstances (nObservations): 6172
nLabels: 2
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics: {'accuracy': 66.42903434867142}
nTraining instances: 3086
nTest instances: 3086

For the evaluation number 1:
metrics: {'accuracy': 64.45236552171096}
nTraining instances: 3086
nTest instances: 3086

---------------   Explainer   ----------------
For the evaluation number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 63

For the evaluation number 1:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 69

sufficient_reason: (-1, -2, -3, -4, 5, -6, -9, -11, -13)
sufficient_reason: (-1, -2, -3, -4, -6, 8, -13)

Saving/Loading Instances

PyXAI also allows saving and loading instances. For this purpose, we use the get_instances method.

<Learner Object>.get_instances(model=None, indexes=Indexes.All, *, dataset=None, n=None, correct=None, predictions=None, save_directory=None, instances_id=None):
Returns the instances in a Tuple. Each instance is paired with the model's prediction if a model is specified, or returned alone otherwise (model=None). An instance is represented by a numpy.array object. Note that when a single instance is requested (n=1), the method returns just that instance rather than a Tuple of instances.
  • model DecisionTree|RandomForest|BoostedTrees: The model computed by the evaluation method.
  • indexes Learning.TRAINING|Learning.TEST|Learning.MIXED|Learning.ALL|String: Allows one to get instances from a subset consisting of the training instances (Learning.TRAINING), the test instances (Learning.TEST), or both, giving priority to training instances (Learning.MIXED). Defaults to Learning.ALL, which takes all instances into account. When the indexes parameter is a String, it refers to a file containing indexes, and the method loads the associated instances.
  • dataset String|pandas.DataFrame: In some situations, this method needs the dataset (optional).
  • n Integer: The desired number of instances (None for all).
  • correct None|True|False: Only available if a model is given. Selects all instances by default (None), only the instances correctly classified by the model (True), or only the instances misclassified by the model (False).
  • predictions None|List of Integer: Only available if a model is given. Selects all instances by default (None), or only the instances whose class/label is in the given List of Integer.
  • save_directory None|String: Saves the instance indexes in a file inside the directory given by this parameter.
  • instances_id None|Integer: Adds an identifier to the name of the file saved with the save_directory parameter. It is also used to load instances with the indexes parameter.

To save instances (more precisely, their indexes), we use the save_directory and instances_id parameters. To load them, we use the indexes and instances_id parameters.

In this example, for each of the two models, the indexes of 10 instances of the test set are saved in the try_save directory:

for model_id, model in enumerate(models):
    # Select 10 test instances for this model and save their indexes
    instances = learner.get_instances(
        dataset="../dataset/compas.csv",
        indexes=Learning.TEST,
        n=10,
        model=model,
        save_directory="try_save",
        instances_id=model_id)
---------------   Instances   ----------------
Indexes of selected instances saved in: try_save/compas.0.instances
number of instances selected: 10
----------------------------------------------
---------------   Instances   ----------------
Indexes of selected instances saved in: try_save/compas.1.instances
number of instances selected: 10
----------------------------------------------

If the dataset has never been loaded, get_instances does not load it completely and reads only the necessary indexes in the dataset.
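
The idea of reading only the required rows can be sketched with the standard csv module. This is not PyXAI's implementation, only an illustration of the principle. read_selected_rows and the small in-memory dataset are made up for the example.

```python
import csv
import io


def read_selected_rows(csv_file, wanted_indexes):
    """Stream a CSV file and keep only the rows whose 0-based data index is wanted."""
    wanted = set(wanted_indexes)
    reader = csv.reader(csv_file)
    header = next(reader)  # the first line holds the column names
    rows = {}
    for index, row in enumerate(reader):
        if index in wanted:
            rows[index] = row
            if len(rows) == len(wanted):
                break  # stop early: every requested row has been read
    return header, rows


# Tiny in-memory stand-in for a dataset file (made-up values).
toy_csv = io.StringIO("f1,f2,label\n0,1,0\n4,0,1\n2,2,1\n7,0,0\n")
header, rows = read_selected_rows(toy_csv, [1, 3])
print(header)
print(rows)
```

The file is read line by line and never held in memory as a whole, which is the benefit the sentence above refers to.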

Later, in another program, you can load the same instances using these instructions:

for model_id, model in enumerate(models):
    # Reload the instances whose indexes were saved for this model
    instances = learner.get_instances(
        dataset="../dataset/compas.csv",
        indexes="try_save",
        model=model,
        instances_id=model_id)
---------------   Instances   ----------------
Loading instances file: try_save/compas.0.instances
number of instances selected: 10
----------------------------------------------
---------------   Instances   ----------------
Loading instances file: try_save/compas.1.instances
number of instances selected: 10
----------------------------------------------
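
Since the .instances files (like the .map files) are JSON, they can be inspected with the standard json module. The exact layout of these files is not described on this page, so the sketch below makes no assumption about it and only reports the top-level structure. describe_json is a hypothetical helper, not a PyXAI function.

```python
import json
from pathlib import Path


def describe_json(text):
    """Summarize JSON text: top-level type, plus keys (dict) or length (list)."""
    content = json.loads(text)
    if isinstance(content, dict):
        return f"dict with keys {sorted(content)}"
    if isinstance(content, list):
        return f"list of {len(content)} items"
    return type(content).__name__


path = Path("try_save") / "compas.0.instances"  # file written in the saving step above
if path.is_file():
    print(describe_json(path.read_text()))
else:
    print(f"{path.as_posix()} not found: run the saving step first")
```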

More information about the get_instances method is given in the Generating Models pages.