{ "cells": [ { "cell_type": "markdown", "id": "bc80c219", "metadata": {}, "source": [ "# Saving/Loading Models" ] }, { "cell_type": "markdown", "id": "a307abac", "metadata": {}, "source": [ "The PyXAI library implements functions to save and load models and related hyper-parameters for training and to save and load preselected instances. PyXAI can save several models from an experimental protocol in a directory given by the user (named in this example `````` and set to ```\"try_save\"```). Each model is associated with an identifier `````` and two files: \n", " \n", "* ```/..map```: JSON file containing many information: training_index, test_index, accuracy, solver_name, ...\n", "* ```/..model```: Raw model in the form of Scikit-learn, XGBoost or Generic.\n", "\n", "Moreover, you can also save some preselected instances. This requires an additional file:\n", "* ```/..instances``` (optional): JSON file containing the indexes of some preselected instances.\n" ] }, { "cell_type": "markdown", "id": "dc9ec87a", "metadata": {}, "source": [ "{: .attention }\n", "> For the models of ```.model``` files, PyXAI supports multiple backup formats:\n", "> * Scikit-learn and LightGBM: The raw model is saved using the [pickle](https://docs.python.org/3/library/pickle.html) library. \n", "> * XGBoost: The raw model is saved using the [XGBoost built-in backup functions](https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html). \n", "> * Generic: The raw model is saved using the own data structures of PyXAI in a JSON File (Not compatible with regression at the moment). " ] }, { "cell_type": "markdown", "id": "89054ae8", "metadata": {}, "source": [ "## Saving Models" ] }, { "cell_type": "markdown", "id": "e55570fc", "metadata": {}, "source": [ "As a matter of illustration, we take the ```compas``` dataset. Let us start by creating two Random Forests using a leave-one-group-out cross-validation protocol and choose an instance: " ] }, { "cell_type": "code", "execution_count": 1, "id": "c3366cfd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " Number_of_Priors score_factor Age_Above_FourtyFive \n", "0 0 0 1 \\\n", "1 0 0 0 \n", "2 4 0 0 \n", "3 0 0 0 \n", "4 14 1 0 \n", "... ... ... ... \n", "6167 0 1 0 \n", "6168 0 0 0 \n", "6169 0 0 1 \n", "6170 3 0 0 \n", "6171 2 0 0 \n", "\n", " Age_Below_TwentyFive African_American Asian Hispanic \n", "0 0 0 0 0 \\\n", "1 0 1 0 0 \n", "2 1 1 0 0 \n", "3 0 0 0 0 \n", "4 0 0 0 0 \n", "... ... ... ... ... \n", "6167 1 1 0 0 \n", "6168 1 1 0 0 \n", "6169 0 0 0 0 \n", "6170 0 1 0 0 \n", "6171 1 0 0 1 \n", "\n", " Native_American Other Female Misdemeanor Two_yr_Recidivism \n", "0 0 1 0 0 0 \n", "1 0 0 0 0 1 \n", "2 0 0 0 0 1 \n", "3 0 1 0 1 0 \n", "4 0 0 0 0 1 \n", "... ... ... ... ... ... 
\n", "6167 0 0 0 0 0 \n", "6168 0 0 0 0 0 \n", "6169 0 1 0 0 0 \n", "6170 0 0 1 1 0 \n", "6171 0 0 1 0 1 \n", "\n", "[6172 rows x 12 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Evaluation ---------------\n", "method: LeaveOneGroupOut\n", "output: RF\n", "learner_type: Classification\n", "learner_options: {'max_depth': None, 'random_state': 0}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 66.42903434867142\n", "nTraining instances: 3086\n", "nTest instances: 3086\n", "\n", "For the evaluation number 1:\n", "metrics:\n", " accuracy: 64.45236552171096\n", "nTraining instances: 3086\n", "nTest instances: 3086\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 63\n", "\n", "For the evaluation number 1:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 69\n", "\n", "--------------- Instances ----------------\n", "number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explainer, Tools\n", "\n", "learner = Learning.Scikitlearn(\"../dataset/compas.csv\", learner_type=Learning.CLASSIFICATION)\n", "models = learner.evaluate(method=Learning.LEAVE_ONE_GROUP_OUT, output=Learning.RF, n_models=2)\n", "instance, prediction = learner.get_instances(n=1)" ] }, { "cell_type": "markdown", "id": "503016ff", "metadata": {}, "source": [ "The save method allows one to save models: " ] }, { "cell_type": "markdown", "id": "f02bb711", "metadata": {}, "source": [ "| <Learner Object>.save(models, save_directory, generic=False): | \n", "| :----------- | \n", "| Saves models in the ```save_directory``` in the form of two files: ```/..map``` and ```/..model``` where `````` is the index of a model given in the model parameter. The backup formats of ```.model``` files is the same as the <Learner Object> used (Scikit-learn or XGBoost) or Generic if the generic parameter is set to ```True```. |\n", "| models ```List``` of ```DecisionTree```\\|```RandomForest```\\|```BoostedTrees```: List of models to be saved.|\n", "| save_directory ```String```: The directory where the models are saved. Creates the directory if it does not exist.|\n", "| generic ```Boolean```: If generic is set to ```True```, saves the model in the ```.model``` file with the own data structures of PyXAI. Default value is ```False```.|\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "46b7efe6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model saved: (try_save/compas.0.model, try_save/compas.0.map)\n", "Model saved: (try_save/compas.1.model, try_save/compas.1.map)\n" ] } ], "source": [ "learner.save(models, \"try_save\", generic=True)" ] }, { "cell_type": "markdown", "id": "d82b9196", "metadata": {}, "source": [ "{: .warning }\n", "> If models based on the same dataset already exist in this folder, the method overwrites them." ] }, { "cell_type": "markdown", "id": "9041d85f", "metadata": {}, "source": [ "## Loading Models" ] }, { "cell_type": "markdown", "id": "0e13917e", "metadata": {}, "source": [ "After you have saved the data, you can load them into another program. 
" ] }, { "cell_type": "markdown", "id": "a8c30acb", "metadata": {}, "source": [ "{: .attention }\n", "> The save method is part of a <Learner Object> while the load method comes from the ```Learning``` module. " ] }, { "cell_type": "markdown", "id": "561598de", "metadata": {}, "source": [ "| Learning.load(models_directory): | \n", "| :----------- | \n", "| Returns a tuple ```(Learner, models)``` where the type of ```Learner``` is the one chosen when saving:```Learning.Generic```\\|```Learning.Scikitlearn```\\|```Learning.Xgboost```. Moreover, the type of models depends on the backup. They can be ```DecisionTree```\\|```RandomForest```\\|```BoostedTrees```.|\n", "| models_directory ```String```: The models location. |\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "f1b820c7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "---------- Loading Information -----------\n", "mapping file: try_save/compas.0.map\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "---------- Loading Information -----------\n", "mapping file: try_save/compas.1.map\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics: {'accuracy': 66.42903434867142}\n", "nTraining instances: 3086\n", "nTest instances: 3086\n", "\n", "For the evaluation number 1:\n", "metrics: {'accuracy': 64.45236552171096}\n", "nTraining instances: 3086\n", "nTest instances: 3086\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 63\n", "\n", "For the evaluation number 1:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 69\n", "\n", "sufficient_reason: (-1, -2, -3, -4, 5, -6, -9, -11, -13)\n", "sufficient_reason: (-1, -2, -3, -4, -6, 8, -13)\n" ] } ], "source": [ "learner, models = Learning.load(models_directory=\"try_save\") \n", "\n", "for model in models:\n", " explainer = Explainer.initialize(model, instance)\n", " print(\"sufficient_reason:\", explainer.sufficient_reason()) " ] }, { "cell_type": "markdown", "id": "57a62bac", "metadata": {}, "source": [ "## Saving/Loading Instances" ] }, { "cell_type": "markdown", "id": "d8b650f1", "metadata": {}, "source": [ "PyXAI also allows to save and load instances. To this purpose, we use the ```get_instances``` method.\n", "\n", "| <Learner Object>.get_instances(model=None, indexes=Indexes.All, *, dataset=None, n=None, correct=None, predictions=None, save_directory=None, instances_id=None): | \n", "| :----------- | \n", "| Returns the instances in a ```Tuple```. Each instance is with the prediction of the model or alone depending on whether the model is specified or not (model=None). An instance is represented by a ```numpy.array``` object. 
{ "cell_type": "markdown", "id": "2e9e8dc7", "metadata": {}, "source": [ "On the one hand, to save instances (more precisely, the indexes of the instances), we use the parameters ```save_directory``` and ```instances_id```. On the other hand, to load them, we use the ```indexes``` and ```instances_id``` parameters. " ] },
{ "cell_type": "markdown", "id": "3ae1f652", "metadata": {}, "source": [ "In this example, for each of the two models, the indexes of 10 instances of the test set are saved into the ```try_save``` directory:" ] },
{ "cell_type": "code", "execution_count": 4, "id": "885c8fa0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------- Instances ----------------\n", "Indexes of selected instances saved in: try_save/compas.0.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n", "--------------- Instances ----------------\n", "Indexes of selected instances saved in: try_save/compas.1.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n" ] } ], "source": [ "for id, model in enumerate(models):\n", " instances = learner.get_instances(\n", " dataset=\"../dataset/compas.csv\",\n", " indexes=Learning.TEST, \n", " n=10, \n", " model=model, \n", " save_directory=\"try_save\",\n", " instances_id=id)" ] },
{ "cell_type": "markdown", "id": "460bbad2", "metadata": {}, "source": [ "{: .attention }\n", "> If the dataset has never been loaded, ```get_instances``` does not load it completely: it only reads the rows corresponding to the necessary indexes." ] },
{ "cell_type": "markdown", "id": "312d0f71", "metadata": {}, "source": [ "Later, in another program, you can load the same instances using these instructions:" ] },
{ "cell_type": "code", "execution_count": 5, "id": "31c6dc91", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------- Instances ----------------\n", "Loading instances file: try_save/compas.0.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n", "--------------- Instances ----------------\n", "Loading instances file: try_save/compas.1.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n" ] } ], "source": [ "for id, model in enumerate(models):\n", " instances = learner.get_instances(\n", " dataset=\"../dataset/compas.csv\",\n", " indexes=\"try_save\", \n", " model=model, \n", " instances_id=id)" ] },
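{ "cell_type": "markdown", "id": "e5b8a3c6", "metadata": {}, "source": [ "Since the instances reloaded with a model come with the corresponding predictions (as described in the table above), they can be chained directly with the ```Explainer``` module. The next cell is only a sketch of this idea (again, not executed in this notebook): " ] },
{ "cell_type": "code", "execution_count": null, "id": "f6c9b4d7", "metadata": {}, "outputs": [], "source": [ "# Sketch (not executed here): explain each reloaded instance with its own model.\n", "# As described in the table above, each element returned with a model is an (instance, prediction) pair.\n", "for id, model in enumerate(models):\n", "    instances = learner.get_instances(\n", "        dataset=\"../dataset/compas.csv\",\n", "        indexes=\"try_save\",\n", "        model=model,\n", "        instances_id=id)\n", "    for instance, prediction in instances:\n", "        explainer = Explainer.initialize(model, instance)\n", "        print(\"sufficient_reason:\", explainer.sufficient_reason())" ] },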
] }, { "cell_type": "markdown", "id": "312d0f71", "metadata": {}, "source": [ "Later, in another program, you can load the same instances using these instructions:" ] }, { "cell_type": "code", "execution_count": 5, "id": "31c6dc91", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------- Instances ----------------\n", "Loading instances file: try_save/compas.0.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n", "--------------- Instances ----------------\n", "Loading instances file: try_save/compas.1.instances\n", "number of instances selected: 10\n", "----------------------------------------------\n" ] } ], "source": [ "for id, model in enumerate(models):\n", " instances = learner.get_instances(\n", " dataset=\"../dataset/compas.csv\",\n", " indexes=\"try_save\", \n", " model=model, \n", " instances_id=id)" ] }, { "cell_type": "markdown", "id": "947815af", "metadata": {}, "source": [ "More information about the ```get_instances``` method is given in the [Generating Models](/documentation/learning/generating/) pages. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }