{ "cells": [ { "cell_type": "markdown", "id": "bc80c219", "metadata": {}, "source": [ "# Saving/Loading Models" ] }, { "cell_type": "markdown", "metadata": {}, "source": "The PyXAI library provides functions to save and load models and related hyper-parameters, as well as preselected instances. PyXAI can save several models from an experimental protocol in a directory chosen by the user (named `````` in this example). Each model is associated with an index `````` and two files:\n\n* ```/..map```: JSON file containing training and test indexes, accuracy, solver name, etc.\n* ```/..pkl```: Raw model saved as a Pickle file.\n\nYou can also save preselected instances, which requires an additional file:\n* ```/..instances``` (optional): JSON file containing the indexes of preselected instances.\n" }, { "cell_type": "markdown", "id": "89054ae8", "metadata": {}, "source": [ "## Saving Models" ] }, { "cell_type": "markdown", "metadata": {}, "source": "As an illustration, we use the ```compas``` dataset. We start by training two Random Forests using a leave-one-group-out cross-validation protocol and selecting one instance:" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " Number_of_Priors score_factor Age_Above_FourtyFive \n", "0 0 0 1 \\\n", "1 0 0 0 \n", "2 4 0 0 \n", "3 0 0 0 \n", "4 14 1 0 \n", "... ... ... ... \n", "6167 0 1 0 \n", "6168 0 0 0 \n", "6169 0 0 1 \n", "6170 3 0 0 \n", "6171 2 0 0 \n", "\n", " Age_Below_TwentyFive African_American Asian Hispanic \n", "0 0 0 0 0 \\\n", "1 0 1 0 0 \n", "2 1 1 0 0 \n", "3 0 0 0 0 \n", "4 0 0 0 0 \n", "... ... ... ... ... \n", "6167 1 1 0 0 \n", "6168 1 1 0 0 \n", "6169 0 0 0 0 \n", "6170 0 1 0 0 \n", "6171 1 0 0 1 \n", "\n", " Native_American Other Female Misdemeanor Two_yr_Recidivism \n", "0 0 1 0 0 0 \n", "1 0 0 0 0 1 \n", "2 0 0 0 0 1 \n", "3 0 1 0 1 0 \n", "4 0 0 0 0 1 \n", "... ... ... ... ... ... 
\n", "6167 0 0 0 0 0 \n", "6168 0 0 0 0 0 \n", "6169 0 1 0 0 0 \n", "6170 0 0 1 1 0 \n", "6171 0 0 1 0 1 \n", "\n", "[6172 rows x 12 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Evaluation ---------------\n", "method: LeaveOneGroupOut\n", "output: RF\n", "learner_type: Classification\n", "learner_options: {'max_depth': None, 'random_state': 0}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 66.42903434867142\n", "nTraining instances: 3086\n", "nTest instances: 3086\n", "\n", "For the evaluation number 1:\n", "metrics:\n", " accuracy: 64.45236552171096\n", "nTraining instances: 3086\n", "nTest instances: 3086\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 63\n", "\n", "For the evaluation number 1:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 69\n", "\n", "--------------- Instances ----------------\n", "number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": "from pyxai import Learning, Explainer, Tools\n\nlearner = Learning.Scikitlearn(\"../dataset/compas.csv\", problem_type=Learning.CLASSIFICATION)\nmodels = learner.evaluate(splitting_method=Learning.LEAVE_ONE_GROUP_OUT, model_type=Learning.RF, splitting_parameters={'n_models': 2})\ninstance, prediction = learner.get_instances(n=1)" }, { "cell_type": "markdown", "metadata": {}, "source": "The ```save``` method of ```ModelIO``` allows saving the models:" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model saved: (try_save/compas.0.model, try_save/compas.0.map)\n", "Model saved: 
(try_save/compas.1.model, try_save/compas.1.map)\n" ] } ], "source": "Learning.ModelIO.save(models, \"try_save\")" }, { "cell_type": "markdown", "id": "d82b9196", "metadata": {}, "source": [ "{: .warning }\n", "> If models based on the same dataset already exist in this folder, the method overwrites them." ] }, { "cell_type": "markdown", "id": "9041d85f", "metadata": {}, "source": [ "## Loading Models" ] }, { "cell_type": "markdown", "metadata": {}, "source": "After saving the models, you can reload them in another program using the ```load``` method of ```ModelIO```. The example below reuses the ```instance``` selected earlier:" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "---------- Loading Information -----------\n", "mapping file: try_save/compas.0.map\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "---------- Loading Information -----------\n", "mapping file: try_save/compas.1.map\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics: {'accuracy': 66.42903434867142}\n", "nTraining instances: 3086\n", "nTest instances: 3086\n", "\n", "For the evaluation number 1:\n", "metrics: {'accuracy': 64.45236552171096}\n", "nTraining instances: 3086\n", "nTest instances: 3086\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 63\n", "\n", "For the evaluation number 1:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 69\n", "\n", "sufficient_reason: (-1, -2, -3, -4, 5, -6, -9, -11, -13)\n", "sufficient_reason: (-1, -2, -3, -4, -6, 8, -13)\n" ] } ], "source": "from pyxai import Learning, Explainer\n\nlearner, models = Learning.ModelIO.load(\"try_save\")\n\nfor model in models:\n explainer = Explainer.initialize(model, 
instance)\n print(\"sufficient_reason:\", explainer.sufficient_reason())" }, { "cell_type": "markdown", "id": "57a62bac", "metadata": {}, "source": [ "## Saving/Loading Instances" ] }, { "cell_type": "markdown", "metadata": {}, "source": "PyXAI can also save and load instances through the ```get_instances``` method." }, { "cell_type": "markdown", "metadata": {}, "source": "To save instances (more precisely, their indexes), use the ```save_directory``` and ```instances_id``` parameters. To reload them, use the ```indexes``` and ```instances_id``` parameters." }, { "cell_type": "markdown", "id": "3ae1f652", "metadata": {}, "source": [ "In this example, for each of the two models, the indexes of 10 instances of the test set are saved into the ```try_save``` directory:" ] }, { "cell_type": "code", "execution_count": 4, "id": "885c8fa0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------- Instances ----------------\n", "Indexes of selected instances saved in: try_save/compas.0.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n", "--------------- Instances ----------------\n", "Indexes of selected instances saved in: try_save/compas.1.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n" ] } ], "source": [ "for id, model in enumerate(models):\n", "    instances = learner.get_instances(\n", "        dataset=\"../dataset/compas.csv\",\n", "        indexes=Learning.TEST,\n", "        n=10,\n", "        model=model,\n", "        save_directory=\"try_save\",\n", "        instances_id=id)" ] }, { "cell_type": "markdown", "id": "460bbad2", "metadata": {}, "source": [ "{: .attention }\n", "> If the dataset has not been loaded yet, ```get_instances``` does not load it entirely: it reads only the instances at the required indexes." 
] }, { "cell_type": "markdown", "id": "312d0f71", "metadata": {}, "source": [ "Later, in another program, you can reload the same instances by passing the save directory as ```indexes```:" ] }, { "cell_type": "code", "execution_count": 5, "id": "31c6dc91", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------- Instances ----------------\n", "Loading instances file: try_save/compas.0.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n", "--------------- Instances ----------------\n", "Loading instances file: try_save/compas.1.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n" ] } ], "source": [ "for id, model in enumerate(models):\n", "    instances = learner.get_instances(\n", "        dataset=\"../dataset/compas.csv\",\n", "        indexes=\"try_save\",\n", "        model=model,\n", "        instances_id=id)" ] }, { "cell_type": "markdown", "metadata": {}, "source": "More information about the ```get_instances``` method is available on the [Generating Models](/documentation/learning/generating/) page." } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }