{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Importing Models From Libraries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PyXAI can [generate models]({{ site.baseurl }}/pyxai/documentation/learning/generating/) for you through dedicated functions. If your model has already been trained with another library, however, you can import it into PyXAI and extract explanations from it. This page explains how."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Procedure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "Consider the following code, which trains a ```RandomForestClassifier``` from [Scikit-learn](https://scikit-learn.org/stable/) on the breast cancer dataset:"
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from sklearn import datasets\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "\n",
    "model_rf = RandomForestClassifier(random_state=0)\n",
    "data = datasets.load_breast_cancer(as_frame=True)\n",
    "X = data.data.to_numpy()\n",
    "Y = data.target.to_numpy()\n",
    "\n",
    "feature_names = data.feature_names\n",
    "model_rf.fit(X, Y);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can import this ML model using the ```import_models``` method of the ```ModelIO``` class."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following table summarizes the model types that ```import_models``` supports for each library:\n\n<table>\n<thead>\n  <tr>\n    <th>Type</th>\n    <th>Scikit-learn</th>\n    <th>XGBoost</th>\n    <th>LightGBM</th>\n  </tr>\n</thead>\n<tbody>\n  <tr>\n    <td style=\"text-align:center\">Decision Tree</td>\n    <td style=\"text-align:center\">DecisionTreeClassifier</td>\n    <td style=\"text-align:center\"></td>\n    <td style=\"text-align:center\"></td>\n  </tr>\n  <tr>\n    <td style=\"text-align:center\">Random Forest</td>\n    <td style=\"text-align:center\">RandomForestClassifier</td>\n    <td style=\"text-align:center\"></td>\n    <td style=\"text-align:center\"></td>\n  </tr>\n  <tr>\n    <td style=\"text-align:center\">Boosted Tree</td>\n    <td style=\"text-align:center\"></td>\n    <td style=\"text-align:center\">XGBClassifier<br>XGBRegressor</td>\n    <td style=\"text-align:center\">LGBMRegressor</td>\n  </tr>\n</tbody>\n</table>\n<br>\n\nThe next cell imports the random forest trained above and attaches the feature names to the learner:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyxai import Learning, Explainer\n\nlearner, model = Learning.ModelIO.import_models(model_rf, instances_type='tabular')\nlearner.feature_names = feature_names"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, you can select an instance and compute explanations for it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "---------------   Instances   ----------------\n",
      "data:\n",
      "     mean radius  mean texture  mean perimeter  mean area  mean smoothness   \n",
      "0          17.99         10.38          122.80     1001.0          0.11840  \\\n",
      "1          20.57         17.77          132.90     1326.0          0.08474   \n",
      "2          19.69         21.25          130.00     1203.0          0.10960   \n",
      "3          11.42         20.38           77.58      386.1          0.14250   \n",
      "4          20.29         14.34          135.10     1297.0          0.10030   \n",
      "..           ...           ...             ...        ...              ...   \n",
      "564        21.56         22.39          142.00     1479.0          0.11100   \n",
      "565        20.13         28.25          131.20     1261.0          0.09780   \n",
      "566        16.60         28.08          108.30      858.1          0.08455   \n",
      "567        20.60         29.33          140.10     1265.0          0.11780   \n",
      "568         7.76         24.54           47.92      181.0          0.05263   \n",
      "\n",
      "     mean compactness  mean concavity  mean concave points  mean symmetry   \n",
      "0             0.27760         0.30010              0.14710         0.2419  \\\n",
      "1             0.07864         0.08690              0.07017         0.1812   \n",
      "2             0.15990         0.19740              0.12790         0.2069   \n",
      "3             0.28390         0.24140              0.10520         0.2597   \n",
      "4             0.13280         0.19800              0.10430         0.1809   \n",
      "..                ...             ...                  ...            ...   \n",
      "564           0.11590         0.24390              0.13890         0.1726   \n",
      "565           0.10340         0.14400              0.09791         0.1752   \n",
      "566           0.10230         0.09251              0.05302         0.1590   \n",
      "567           0.27700         0.35140              0.15200         0.2397   \n",
      "568           0.04362         0.00000              0.00000         0.1587   \n",
      "\n",
      "     mean fractal dimension  ...  worst texture  worst perimeter  worst area   \n",
      "0                   0.07871  ...          17.33           184.60      2019.0  \\\n",
      "1                   0.05667  ...          23.41           158.80      1956.0   \n",
      "2                   0.05999  ...          25.53           152.50      1709.0   \n",
      "3                   0.09744  ...          26.50            98.87       567.7   \n",
      "4                   0.05883  ...          16.67           152.20      1575.0   \n",
      "..                      ...  ...            ...              ...         ...   \n",
      "564                 0.05623  ...          26.40           166.10      2027.0   \n",
      "565                 0.05533  ...          38.25           155.00      1731.0   \n",
      "566                 0.05648  ...          34.12           126.70      1124.0   \n",
      "567                 0.07016  ...          39.42           184.60      1821.0   \n",
      "568                 0.05884  ...          30.37            59.16       268.6   \n",
      "\n",
      "     worst smoothness  worst compactness  worst concavity   \n",
      "0             0.16220            0.66560           0.7119  \\\n",
      "1             0.12380            0.18660           0.2416   \n",
      "2             0.14440            0.42450           0.4504   \n",
      "3             0.20980            0.86630           0.6869   \n",
      "4             0.13740            0.20500           0.4000   \n",
      "..                ...                ...              ...   \n",
      "564           0.14100            0.21130           0.4107   \n",
      "565           0.11660            0.19220           0.3215   \n",
      "566           0.11390            0.30940           0.3403   \n",
      "567           0.16500            0.86810           0.9387   \n",
      "568           0.08996            0.06444           0.0000   \n",
      "\n",
      "     worst concave points  worst symmetry  worst fractal dimension  target  \n",
      "0                  0.2654          0.4601                  0.11890       0  \n",
      "1                  0.1860          0.2750                  0.08902       0  \n",
      "2                  0.2430          0.3613                  0.08758       0  \n",
      "3                  0.2575          0.6638                  0.17300       0  \n",
      "4                  0.1625          0.2364                  0.07678       0  \n",
      "..                    ...             ...                      ...     ...  \n",
      "564                0.2216          0.2060                  0.07115       0  \n",
      "565                0.1628          0.2572                  0.06637       0  \n",
      "566                0.1418          0.2218                  0.07820       0  \n",
      "567                0.2650          0.4087                  0.12400       0  \n",
      "568                0.0000          0.2871                  0.07039       1  \n",
      "\n",
      "[569 rows x 31 columns]\n",
      "--------------   Information   ---------------\n",
      "Dataset name: pandas.core.frame.DataFrame\n",
      "nFeatures (nAttributes, with the labels): 31\n",
      "nInstances (nObservations): 569\n",
      "nLabels: 2\n",
      "number of instances selected: 1\n",
      "----------------------------------------------\n",
      "instance: [1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01\n",
      " 1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02\n",
      " 6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01\n",
      " 1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01\n",
      " 4.601e-01 1.189e-01]\n",
      "prediction: 0\n"
     ]
    }
   ],
   "source": [
    "instance, prediction = learner.get_instances(dataset=data.frame, model=model, n=1)\n",
    "print(\"instance:\", instance)\n",
    "print(\"prediction:\", prediction)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "len direct reason: 294\n",
      "len sufficient reason: 159\n",
      "to_features: ('mean radius > 15.045000076293945', 'mean texture <= 11.585000038146973', 'mean perimeter > 96.57999801635742', 'mean area > 694.5', 'mean smoothness > 0.09075499698519707', 'mean compactness > 0.09524999931454659', 'mean concavity > 0.17409999668598175', 'mean concave points > 0.07939000055193901', 'mean symmetry > 0.12639999762177467', 'radius error > 0.7730999886989594', 'texture error > 0.7377500236034393', 'perimeter error > 2.76200008392334', 'area error > 33.064998626708984', 'smoothness error in ]0.005567499902099371, 0.009928999934345484]', 'compactness error > 0.00834800023585558', 'concavity error in ]0.018459999933838844, 0.2157999947667122]', 'fractal dimension error in ]0.0030724999960511923, 0.012140000239014626]', 'worst radius > 17.72499942779541', 'worst texture in ]15.434999942779541, 18.289999961853027]', 'worst perimeter > 120.70000076293945', 'worst area > 953.7000122070312', 'worst smoothness > 0.1363999992609024', 'worst concavity > 0.4524500072002411', 'worst concave points > 0.16029999405145645', 'worst symmetry > 0.37139999866485596', 'worst fractal dimension > 0.10035499930381775')\n"
     ]
    }
   ],
   "source": [
    "explainer = Explainer.initialize(model, instance=instance)\n",
    "\n",
    "direct = explainer.direct_reason()\n",
    "print(\"len direct reason:\", len(direct))\n",
    "\n",
    "sufficient = explainer.sufficient_reason()\n",
    "print(\"len sufficient reason:\", len(sufficient))\n",
    "\n",
    "print(\"to_features:\", explainer.to_features(sufficient))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "{: .attention }\n> Setting ```learner.feature_names``` allows the ```to_features``` method to display the correct feature names. If not set, the feature names will be of the form f1, f2, f3, ..., f30, where the numbers correspond to the rank of the feature in the dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load/Save From Libraries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "Training an ML model and computing explanations can be done by two different programs. You can save the model in the first program and load it in the second."
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Scikit-learn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After importing a Scikit-learn model into PyXAI using ```import_models```, you can save it with ```save``` and reload it later with ```load```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyxai import Learning\nfrom sklearn import datasets\nfrom sklearn.ensemble import RandomForestClassifier\n\nrf = RandomForestClassifier()\nX, Y = datasets.load_breast_cancer(return_X_y=True)\nrf.fit(X, Y)\n\nlearner, model = Learning.ModelIO.import_models(rf, instances_type='tabular')\nLearning.ModelIO.save(model, \"my_models/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can reload this model in another program using ```load```:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyxai import Learning\n\nlearner, model = Learning.ModelIO.load(\"my_models/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### XGBoost"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After importing an XGBoost model into PyXAI using ```import_models```, you can save it with ```save``` and reload it later with ```load```. See the [XGBoost documentation](https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html) for the native format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyxai import Learning\nfrom sklearn import datasets\nfrom xgboost import XGBClassifier\n\nX, Y = datasets.load_iris(return_X_y=True)\nbt = XGBClassifier(eval_metric=\"mlogloss\")\nbt.fit(X, Y)\n\nlearner, model = Learning.ModelIO.import_models(bt, instances_type='tabular')\nLearning.ModelIO.save(model, \"my_models/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can reload this model in another program using ```load```:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyxai import Learning\n\nlearner, model = Learning.ModelIO.load(\"my_models/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### LightGBM"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After importing a LightGBM model into PyXAI using ```import_models```, you can save it with ```save``` and reload it later with ```load```. See the [LightGBM documentation](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html) for the native format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyxai import Learning\nfrom sklearn import datasets\nimport lightgbm\n\nX, Y = datasets.load_iris(return_X_y=True)\nlgbm = lightgbm.LGBMRegressor(n_estimators=5, random_state=0)\nlgbm.fit(X, Y)\n\nlearner, model = Learning.ModelIO.import_models(lgbm, instances_type='tabular')\nLearning.ModelIO.save(model, \"my_models/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can reload this model in another program using ```load```:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyxai import Learning\n\nlearner, model = Learning.ModelIO.load(\"my_models/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example with cross-validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example shows how to import several models trained with cross-validation and compute explanations for each of them. We start by implementing a function that loads a CSV dataset whose last column holds the labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas\n",
    "import numpy\n",
    "\n",
    "def load_dataset(dataset):\n",
    "    data = pandas.read_csv(dataset).copy()\n",
    "\n",
    "    # extract labels\n",
    "    labels = data[data.columns[-1]]\n",
    "    labels = numpy.array(labels)\n",
    "\n",
    "    # remove the label of each instance\n",
    "    data = data.drop(columns=[data.columns[-1]])\n",
    "\n",
    "    # extract the feature names\n",
    "    feature_names = list(data.columns)\n",
    "\n",
    "    return data.values, labels, feature_names"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, we implement a function performing cross-validation. More precisely, we use the Leave One Group Out cross-validator of [Scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeaveOneGroupOut.html) and a ```lightgbm.LGBMRegressor``` from the [LightGBM](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html) library:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "import functools\n",
    "import random\n",
    "import operator\n",
    "import lightgbm\n",
    "from sklearn.model_selection import LeaveOneGroupOut\n",
    "\n",
    "def cross_validation(X, Y, n_trees=100, n_forests=2):\n",
    "    n_instance = len(Y)\n",
    "    quotient = n_instance // n_forests\n",
    "    remain = n_instance % n_forests\n",
    "\n",
    "    # Groups creation: assign each instance a group label in 1..n_forests, then shuffle\n",
    "    groups = [quotient*[i] for i in range(1,n_forests+1)]\n",
    "    groups = functools.reduce(operator.iconcat, groups, [])\n",
    "    groups += [i for i in range(1,remain+1)]\n",
    "    random.shuffle(groups)\n",
    "\n",
    "    # One model is trained per held-out group\n",
    "    loo = LeaveOneGroupOut()\n",
    "    forests = []\n",
    "    for index_training, index_test in loo.split(X, Y, groups=groups):\n",
    "        # Creation of instances (X) and labels (Y) according to the index of loo.split()\n",
    "        # for both training and test set\n",
    "        x_train = [X[x] for x in index_training]\n",
    "        y_train = [Y[x] for x in index_training]\n",
    "        x_test = [X[x] for x in index_test]\n",
    "        y_test = [Y[x] for x in index_test]\n",
    "\n",
    "        # Training phase: n_trees sets the number of boosted trees\n",
    "        learner = lightgbm.LGBMRegressor(n_estimators=n_trees, random_state=0)\n",
    "        learner.fit(x_train, y_train)\n",
    "        # Get the regressor predictions on the test set (not used further here)\n",
    "        y_predict = learner.predict(x_test)\n",
    "\n",
    "        forests.append((learner, index_training, index_test))\n",
    "    return forests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we use the two previous functions and import the models into PyXAI to compute explanations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyxai import Learning, Explainer\n\ndata, labels, feature_names = load_dataset(\"../dataset/winequality-red.csv\")\nresults = cross_validation(data, labels, n_trees=5)\n\nmodels = [result[0] for result in results]\ntraining_indexes = [result[1] for result in results]\ntest_indexes = [result[2] for result in results]\n\nlearner, models = Learning.ModelIO.import_models(models, instances_type='tabular')\n\nfor i, model in enumerate(models):\n    instances = learner.get_instances(dataset=\"../dataset/winequality-red.csv\", model=model, n=2, indexes=Learning.TEST, test_indexes=test_indexes[i])\n\n    for (instance, prediction_classifier) in instances:\n        explainer = Explainer.initialize(model, instance=instance)\n        prediction = model.predict_instance(instance)\n        print(\"prediction:\", prediction)\n        direct = explainer.direct_reason()\n        print(\"len direct reason:\", len(direct))\n        explainer.set_interval(prediction - 0.2, prediction + 0.2)\n        ts = explainer.tree_specific_reason()\n        print(\"len tree_specific_reason:\", len(ts))\n        print(\"---------------------------\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With PyXAI, you can also generate your own models. See the [Generating Models]({{ site.baseurl }}/documentation/learning/builder/) page for more information."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}