{ "cells": [ { "cell_type": "markdown", "id": "bc80c219", "metadata": {}, "source": [ "# Saving/Loading Models" ] }, { "cell_type": "markdown", "id": "a307abac", "metadata": {}, "source": [ "The PyXAI library implements functions to save and load models and related hyper-parameters for training and to save and load preselected instances. PyXAI can save several models from an experimental protocol in a directory given by the user (named in this example `````` and set to ```\"try_save\"```). Each model is associated with an identifier `````` and two files: \n", " \n", "* ```/..map```: JSON file containing many information: training_index, test_index, accuracy, solver_name, ...\n", "* ```/..model```: Raw model in the form of Scikit-learn, XGBoost or Generic.\n", "\n", "Moreover, you can also save some preselected instances. This requires an additional file:\n", "* ```/..instances``` (optional): JSON file containing the indexes of some preselected instances.\n" ] }, { "cell_type": "markdown", "id": "dc9ec87a", "metadata": {}, "source": [ "{: .attention }\n", "> For the models of ```.model``` files, PyXAI supports multiple backup formats:\n", "> * Scikit-learn and LightGBM: The raw model is saved using the [pickle](https://docs.python.org/3/library/pickle.html) library. \n", "> * XGBoost: The raw model is saved using the [XGBoost built-in backup functions](https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html). \n", "> * Generic: The raw model is saved using the own data structures of PyXAI in a JSON File (Not compatible with regression at the moment). " ] }, { "cell_type": "markdown", "id": "89054ae8", "metadata": {}, "source": [ "## Saving Models" ] }, { "cell_type": "markdown", "id": "e55570fc", "metadata": {}, "source": [ "As a matter of illustration, we take the ```compas``` dataset. Let us start by creating two Random Forests using a leave-one-group-out cross-validation protocol and choose an instance: " ] }, { "cell_type": "code", "execution_count": 1, "id": "c3366cfd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " Number_of_Priors score_factor Age_Above_FourtyFive \n", "0 0 0 1 \\\n", "1 0 0 0 \n", "2 4 0 0 \n", "3 0 0 0 \n", "4 14 1 0 \n", "... ... ... ... \n", "6167 0 1 0 \n", "6168 0 0 0 \n", "6169 0 0 1 \n", "6170 3 0 0 \n", "6171 2 0 0 \n", "\n", " Age_Below_TwentyFive African_American Asian Hispanic \n", "0 0 0 0 0 \\\n", "1 0 1 0 0 \n", "2 1 1 0 0 \n", "3 0 0 0 0 \n", "4 0 0 0 0 \n", "... ... ... ... ... \n", "6167 1 1 0 0 \n", "6168 1 1 0 0 \n", "6169 0 0 0 0 \n", "6170 0 1 0 0 \n", "6171 1 0 0 1 \n", "\n", " Native_American Other Female Misdemeanor Two_yr_Recidivism \n", "0 0 1 0 0 0 \n", "1 0 0 0 0 1 \n", "2 0 0 0 0 1 \n", "3 0 1 0 1 0 \n", "4 0 0 0 0 1 \n", "... ... ... ... ... ... 
\n", "6167 0 0 0 0 0 \n", "6168 0 0 0 0 0 \n", "6169 0 1 0 0 0 \n", "6170 0 0 1 1 0 \n", "6171 0 0 1 0 1 \n", "\n", "[6172 rows x 12 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Evaluation ---------------\n", "method: LeaveOneGroupOut\n", "output: RF\n", "learner_type: Classification\n", "learner_options: {'max_depth': None, 'random_state': 0}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 66.42903434867142\n", "nTraining instances: 3086\n", "nTest instances: 3086\n", "\n", "For the evaluation number 1:\n", "metrics:\n", " accuracy: 64.45236552171096\n", "nTraining instances: 3086\n", "nTest instances: 3086\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 63\n", "\n", "For the evaluation number 1:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 69\n", "\n", "--------------- Instances ----------------\n", "number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explainer, Tools\n", "\n", "learner = Learning.Scikitlearn(\"../dataset/compas.csv\", learner_type=Learning.CLASSIFICATION)\n", "models = learner.evaluate(method=Learning.LEAVE_ONE_GROUP_OUT, output=Learning.RF, n_models=2)\n", "instance, prediction = learner.get_instances(n=1)" ] }, { "cell_type": "markdown", "id": "503016ff", "metadata": {}, "source": [ "The save method allows one to save models: " ] }, { "cell_type": "markdown", "id": "f02bb711", "metadata": {}, "source": [ "| <Learner Object>.save(models, save_directory, generic=False): | \n", "| :----------- | \n", "| Saves models in the ```save_directory``` in the form of two files: ```/..map``` and ```/..model``` where `````` is the index of a model given in the model parameter. The backup formats of ```.model``` files is the same as the <Learner Object> used (Scikit-learn or XGBoost) or Generic if the generic parameter is set to ```True```. |\n", "| models ```List``` of ```DecisionTree```\\|```RandomForest```\\|```BoostedTrees```: List of models to be saved.|\n", "| save_directory ```String```: The directory where the models are saved. Creates the directory if it does not exist.|\n", "| generic ```Boolean```: If generic is set to ```True```, saves the model in the ```.model``` file with the own data structures of PyXAI. Default value is ```False```.|\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "46b7efe6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model saved: (try_save/compas.0.model, try_save/compas.0.map)\n", "Model saved: (try_save/compas.1.model, try_save/compas.1.map)\n" ] } ], "source": [ "learner.save(models, \"try_save\", generic=True)" ] }, { "cell_type": "markdown", "id": "d82b9196", "metadata": {}, "source": [ "{: .warning }\n", "> If models based on the same dataset already exist in this folder, the method overwrites them." ] }, { "cell_type": "markdown", "id": "9041d85f", "metadata": {}, "source": [ "## Loading Models" ] }, { "cell_type": "markdown", "id": "0e13917e", "metadata": {}, "source": [ "After you have saved the data, you can load them into another program. 
" ] }, { "cell_type": "markdown", "id": "a8c30acb", "metadata": {}, "source": [ "{: .attention }\n", "> The save method is part of a <Learner Object> while the load method comes from the ```Learning``` module. " ] }, { "cell_type": "markdown", "id": "561598de", "metadata": {}, "source": [ "| Learning.load(models_directory): | \n", "| :----------- | \n", "| Returns a tuple ```(Learner, models)``` where the type of ```Learner``` is the one chosen when saving:```Learning.Generic```\\|```Learning.Scikitlearn```\\|```Learning.Xgboost```. Moreover, the type of models depends on the backup. They can be ```DecisionTree```\\|```RandomForest```\\|```BoostedTrees```.|\n", "| models_directory ```String```: The models location. |\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "f1b820c7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "---------- Loading Information -----------\n", "mapping file: try_save/compas.0.map\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "---------- Loading Information -----------\n", "mapping file: try_save/compas.1.map\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics: {'accuracy': 66.42903434867142}\n", "nTraining instances: 3086\n", "nTest instances: 3086\n", "\n", "For the evaluation number 1:\n", "metrics: {'accuracy': 64.45236552171096}\n", "nTraining instances: 3086\n", "nTest instances: 3086\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 63\n", "\n", "For the evaluation number 1:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 69\n", "\n", "sufficient_reason: (-1, -2, -3, -4, 5, -6, -9, -11, -13)\n", "sufficient_reason: (-1, -2, -3, -4, -6, 8, -13)\n" ] } ], "source": [ "learner, models = Learning.load(models_directory=\"try_save\") \n", "\n", "for model in models:\n", " explainer = Explainer.initialize(model, instance)\n", " print(\"sufficient_reason:\", explainer.sufficient_reason()) " ] }, { "cell_type": "markdown", "id": "57a62bac", "metadata": {}, "source": [ "## Saving/Loading Instances" ] }, { "cell_type": "markdown", "id": "d8b650f1", "metadata": {}, "source": [ "PyXAI also allows to save and load instances. To this purpose, we use the ```get_instances``` method.\n", "\n", "| <Learner Object>.get_instances(model=None, indexes=Indexes.All, *, dataset=None, n=None, correct=None, predictions=None, save_directory=None, instances_id=None): | \n", "| :----------- | \n", "| Returns the instances in a ```Tuple```. Each instance is with the prediction of the model or alone depending on whether the model is specified or not (model=None). An instance is represented by a ```numpy.array``` object. 
{ "cell_type": "markdown", "id": "2e9e8dc7", "metadata": {}, "source": [ "On the one hand, to save instances (more precisely, the indexes of the instances), we use the parameters ```save_directory``` and ```instances_id```. On the other hand, to load them, we use the ```indexes``` and ```instances_id``` parameters. " ] },
{ "cell_type": "markdown", "id": "3ae1f652", "metadata": {}, "source": [ "In this example, for each of the two models, the indexes of 10 instances of the test set are saved into the ```try_save``` directory:" ] },
{ "cell_type": "code", "execution_count": 4, "id": "885c8fa0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------- Instances ----------------\n", "Indexes of selected instances saved in: try_save/compas.0.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n", "--------------- Instances ----------------\n", "Indexes of selected instances saved in: try_save/compas.1.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n" ] } ], "source": [ "for id, model in enumerate(models):\n", " instances = learner.get_instances(\n", " dataset=\"../dataset/compas.csv\",\n", " indexes=Learning.TEST, \n", " n=10, \n", " model=model, \n", " save_directory=\"try_save\",\n", " instances_id=id)" ] },
{ "cell_type": "markdown", "id": "460bbad2", "metadata": {}, "source": [ "{: .attention }\n", "> If the dataset has never been loaded, ```get_instances``` does not load it completely: it only reads the rows corresponding to the necessary indexes." ] },
{ "cell_type": "markdown", "id": "312d0f71", "metadata": {}, "source": [ "Later, in another program, you can load the same instances using these instructions:" ] },
{ "cell_type": "code", "execution_count": 5, "id": "31c6dc91", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------- Instances ----------------\n", "Loading instances file: try_save/compas.0.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n", "--------------- Instances ----------------\n", "Loading instances file: try_save/compas.1.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n" ] } ], "source": [ "for id, model in enumerate(models):\n", " instances = learner.get_instances(\n", " dataset=\"../dataset/compas.csv\",\n", " indexes=\"try_save\", \n", " model=model, \n", " instances_id=id)" ] },
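{ "cell_type": "markdown", "id": "e5b8a3c6", "metadata": {}, "source": [ "Since the instances reloaded with a model come with the corresponding predictions (as described in the table above), they can be chained directly with the ```Explainer``` module. The next cell is only a sketch of this idea (again, not executed in this notebook): " ] },
{ "cell_type": "code", "execution_count": null, "id": "f6c9b4d7", "metadata": {}, "outputs": [], "source": [ "# Sketch (not executed here): explain each reloaded instance with its own model.\n", "# As described in the table above, each element returned with a model is an (instance, prediction) pair.\n", "for id, model in enumerate(models):\n", "    instances = learner.get_instances(\n", "        dataset=\"../dataset/compas.csv\",\n", "        indexes=\"try_save\",\n", "        model=model,\n", "        instances_id=id)\n", "    for instance, prediction in instances:\n", "        explainer = Explainer.initialize(model, instance)\n", "        print(\"sufficient_reason:\", explainer.sufficient_reason())" ] },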
] }, { "cell_type": "markdown", "id": "312d0f71", "metadata": {}, "source": [ "Later, in another program, you can load the same instances using these instructions:" ] }, { "cell_type": "code", "execution_count": 5, "id": "31c6dc91", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------- Instances ----------------\n", "Loading instances file: try_save/compas.0.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n", "--------------- Instances ----------------\n", "Loading instances file: try_save/compas.1.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n" ] } ], "source": [ "for id, model in enumerate(models):\n", " instances = learner.get_instances(\n", " dataset=\"../dataset/compas.csv\",\n", " indexes=\"try_save\", \n", " model=model, \n", " instances_id=id)" ] }, { "cell_type": "markdown", "id": "947815af", "metadata": {}, "source": [ "More information about the ```get_instances``` method is given in the [Generating Models](/documentation/learning/generating/) pages. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }