{ "cells": [ { "cell_type": "markdown", "id": "601b0667", "metadata": {}, "source": [ "# Concepts" ] }, { "cell_type": "markdown", "id": "43ffdd12", "metadata": {}, "source": [ "This section deals with the concepts of the ```Explainer``` object of PyXAI. First, we show how to use it; then we explain the notion of binary variables in detail; finally, we give an example." ] }, { "cell_type": "markdown", "id": "107a059b", "metadata": {}, "source": [ "## Main Methods" ] }, { "cell_type": "markdown", "id": "49832bd4", "metadata": {}, "source": [ "First of all, in order to explain instances from a given model, you need to create an ```Explainer``` object. This is done\n", "using the function ```Explainer.initialize```." ] }, { "cell_type": "markdown", "id": "2e84822a", "metadata": {}, "source": [ "| Explainer.initialize(model, instance=None, features_type=None):|\n", "|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| \n", "| Depending on the model given in the first argument, this method creates an ```ExplainerDT```, an ```ExplainerRF```, an ```ExplainerBT```, or an ```ExplainerRegressionBT``` object. This object is able to give explanations about the instance given as the second parameter. This last parameter is optional because you can set the instance later using the ```set_instance``` function. |\n", "| model ```DecisionTree``` ```RandomForest``` ```BoostedTree```: The model for which explanations will be calculated.|\n", "| instance ```Numpy Array``` of ```Float```: The instance to be explained. 
Default value is ```None```.|\n", "| features_type ```String``` ```Dict``` ```None```: Either a dictionary indicating the type of features or the path to a ```.types``` file containing this information. Setting it activates domain theories. More details are given on the [Theories](/documentation/explainer/theories/) page. |" ] }, { "cell_type": "markdown", "id": "32ef6d0c", "metadata": {}, "source": [ "Once the ```Explainer``` is created, you can explain several instances using it. Each time you want to explain a new instance, you need to call the function ```set_instance``` (it is not necessary to create a new ```Explainer``` object)." ] }, { "cell_type": "markdown", "id": "da895ea8", "metadata": {}, "source": [ "| <Explainer Object>.set_instance(instance): |\n", "|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "| Sets a new instance to be explained. |\n", "| instance ```Numpy Array``` of ```Float```: The instance to be explained. |" ] }, { "cell_type": "markdown", "id": "fcad9d89", "metadata": {}, "source": [ "The prediction made by the ML model is given by the ```target_prediction``` variable of the ```Explainer``` Object. Now, we have to consider the binary representation." ] }, { "cell_type": "markdown", "id": "fa3aa966", "metadata": {}, "source": [ "## Binary representation" ] }, { "cell_type": "markdown", "id": "7432c7d8", "metadata": {}, "source": [ "First, let us recall that all the ML models we consider consist of trees (only one tree in the case of a decision tree). Each tree contains nodes\n", "representing conditions \"feature operator threshold ?\" (such as \"$x_4 \ge 0.5$ ?\"). Internally,\n", "the ```Explainer``` works with these conditions, which are treated as Boolean variables. The **binary representation**\n", "of an instance is a set of Boolean variables matching such conditions. 
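To build intuition, this mapping can be sketched in plain Python (the conditions below are invented for illustration; this is not the PyXAI API):

```python
# Hypothetical conditions of a small tree: (feature index, operator, threshold).
# The Boolean variable i is attached to the i-th condition; a positive literal
# means the condition holds for the instance, a negative literal means it does not.
conditions = [(3, ">", 0.8), (2, "<=", 4.95), (3, "<=", 1.75)]

def binary_representation(instance):
    literals = []
    for i, (feature, op, threshold) in enumerate(conditions, start=1):
        value = instance[feature]
        holds = value > threshold if op == ">" else value <= threshold
        literals.append(i if holds else -i)
    return tuple(literals)

print(binary_representation([5.1, 3.5, 1.4, 0.2]))  # -> (-1, 2, 3)
```

As in PyXAI, a positive literal means the instance satisfies the condition, and a negative literal means it does not.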
Each Boolean variable\n", "represents a condition \"feature operator threshold ?\" of the model. The binary representation\n", "can be found in the ```binary_representation``` variable of\n", "the ```Explainer``` Object. The function ```to_features``` converts a binary representation (or an explanation) into a tuple of conditions\n", "\"feature operator threshold\" representing the features used." ] }, { "cell_type": "markdown", "id": "0f45a3e1", "metadata": {}, "source": [ "| <Explainer Object>.to_features(binary_representation, *, eliminate_redundant_features=True, details=False, contrastive=False, without_intervals=False):|\n", "|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "| When the parameter ```details``` is set to ```False```, this method returns a ```Tuple``` of ```String``` where each ```String``` represents a condition \"feature operator threshold\" associated with the binary representation given as the first parameter. By default, a string represents such a condition, but if you need more information, you can set the parameter ```details``` to ```True```. In this case, the method returns a ```Tuple``` of ```Dict``` where each dictionary provides more information on the condition. 
This method also allows one to eliminate redundant conditions. For example, if we have \"feature_a > 3\" and \"feature_a > 2\", we keep only the Boolean variable corresponding to \"feature_a > 3\", since it implies \"feature_a > 2\". Therefore, if you want to get all conditions, you have to set the parameter ```eliminate_redundant_features``` to ```False```. |\n", "| binary_representation ```List``` ```Tuple```: A set of (signed) binary variables. |\n", "| eliminate_redundant_features ```Boolean```: ```True``` or ```False``` depending on whether you want to eliminate redundant conditions. Default value is ```True```.|\n", "| details ```Boolean```: ```True``` or ```False``` depending on whether you want details or not. Default value is ```False```.| \n", "| contrastive ```Boolean```: ```True``` or ```False``` depending on whether you want to get a contrastive explanation or not. When this parameter is set to ```True```, the elimination of redundant features must be reversed. Default value is ```False```.|\n", "| without_intervals ```Boolean```: ```True``` if you do not want conditions to be merged into a compact representation with intervals, ```False``` otherwise. Default value is ```False```.|\n" ] }, { "cell_type": "markdown", "id": "4621eeab", "metadata": {}, "source": [ "{: .note}\n", "> The details provided when the ```details``` parameter is set to ```True``` in the ```to_features``` function are represented by the\n", "> keys of the returned dictionary:\n", "\n", "- ```[\"id\"]```: The id of the feature.\n", "- ```[\"name\"]```: The name of the feature (if the labels are known, otherwise the features are named f1, f2 and so on).\n", "- ```[\"operator\"]```: The operator associated with the condition.\n", "- ```[\"threshold\"]```: The threshold of the condition.\n", "- ```[\"sign\"]```: The sign of the Boolean variable in the binary representation: ```True``` if the condition is satisfied,\n", " otherwise ```False```.\n", "- ```[\"weight\"]```: The weight of the condition, used only with user preferences." 
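The redundancy elimination described above can be pictured with a small plain-Python sketch (an illustration of the idea only, not the PyXAI implementation):

```python
# Hypothetical satisfied conditions: ("feature_a" > 3) implies ("feature_a" > 2),
# so only the strongest lower bound per feature needs to be kept.
conditions = [("feature_a", ">", 3), ("feature_a", ">", 2), ("feature_b", ">", 1)]

def eliminate_redundant(conds):
    strongest = {}
    for name, op, threshold in conds:
        # For ">" conditions, the largest threshold implies all smaller ones.
        if name not in strongest or threshold > strongest[name][2]:
            strongest[name] = (name, op, threshold)
    return list(strongest.values())

print(eliminate_redundant(conditions))
# -> [('feature_a', '>', 3), ('feature_b', '>', 1)]
```

For "<=" conditions the situation is symmetric: the smallest threshold implies the others.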
] }, { "cell_type": "markdown", "id": "eeeaf935", "metadata": {}, "source": [ "{: .note}\n", "> Explanations computed using our explainer module may contain redundant conditions. Let us take an example with a feature\n", "> $f_1$ and two Boolean variables $x_1$ and $x_2$ associated with the conditions $(f_1 \ge 5)$ and $(f_1 \ge 3)$,\n", "> respectively. If, in the instance, $f_1=6$, then $x_1$ and $x_2$ are both set to true. The explanation that is derived can involve\n", "> both of them. By setting the ```eliminate_redundant_features``` parameter to ```True``` in the method ```to_features```,\n", "> we remove $(f_1 \ge 3)$, which is redundant." ] }, { "cell_type": "markdown", "id": "3d72e250", "metadata": {}, "source": [ "{: .attention }\n", "> Forgetting to set the parameter ```contrastive``` to ```True``` when displaying a contrastive explanation may result in an incorrect explanation. " ] }, { "cell_type": "markdown", "id": "f4b1899e", "metadata": {}, "source": [ "The function ```to_features``` is independent of the instance given to the explainer through the ```initialize```\n", "and ```set_instance``` methods; it depends only on the binary representation given as a parameter." ] }, { "cell_type": "markdown", "id": "4896445c", "metadata": {}, "source": [ "## Example" ] }, { "cell_type": "markdown", "id": "b4057ff0", "metadata": {}, "source": [ "In the following, we present an example based on the dataset [iris](/assets/notebooks/dataset/iris.csv) and a Decision Tree as the ML model. You should take a look at the [Generating Model](/documentation/learning/) page if you need more information about the ```Learning``` module." 
] }, { "cell_type": "code", "execution_count": 1, "id": "7a754ae5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "0 5.1 3.5 1.4 0.2 Iris-setosa\n", "1 4.9 3.0 1.4 0.2 Iris-setosa\n", "2 4.7 3.2 1.3 0.2 Iris-setosa\n", "3 4.6 3.1 1.5 0.2 Iris-setosa\n", "4 5.0 3.6 1.4 0.2 Iris-setosa\n", ".. ... ... ... ... ...\n", "145 6.7 3.0 5.2 2.3 Iris-virginica\n", "146 6.3 2.5 5.0 1.9 Iris-virginica\n", "147 6.5 3.0 5.2 2.0 Iris-virginica\n", "148 6.2 3.4 5.4 2.3 Iris-virginica\n", "149 5.9 3.0 5.1 1.8 Iris-virginica\n", "\n", "[150 rows x 5 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../../dataset/iris.csv\n", "nFeatures (nAttributes, with the labels): 5\n", "nInstances (nObservations): 150\n", "nLabels: 3\n", "--------------- Evaluation ---------------\n", "method: HoldOut\n", "output: DT\n", "learner_type: Classification\n", "learner_options: {'max_depth': None, 'random_state': 0}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 97.77777777777777\n", "nTraining instances: 105\n", "nTest instances: 45\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Decision Tree Model**\n", "nFeatures: 4\n", "nNodes: 6\n", "nVariables: 5\n", "\n", "--------------- Instances ----------------\n", "number of instances selected: 1\n", "----------------------------------------------\n", "instance: [5.1 3.5 1.4 0.2]\n", "binary representation: (-1, -2, -3, 4, -5)\n", "target_prediction: 0\n", "to_features: ('Sepal.Width > 3.100000023841858', 'Petal.Length <= 4.950000047683716', 'Petal.Width <= 0.75')\n", "to_features (keep redundant): ('Sepal.Width > 3.100000023841858', 'Petal.Length <= 4.950000047683716', 'Petal.Width <= 0.75', 'Petal.Width <= 1.6500000357627869', 'Petal.Width <= 1.75')\n", "to_features with details: 
OrderedDict([('Petal.Width', [{'id': 4, 'name': 'Petal.Width', 'operator': , 'sign': True, 'operator_sign_considered': , 'threshold': 0.75, 'weight': None, 'theory': None, 'string': 'Petal.Width <= 0.75'}])])\n" ] } ], "source": [ "from pyxai import Learning, Explainer, Tools\n", "learner = Learning.Scikitlearn(\"../../dataset/iris.csv\", learner_type=Learning.CLASSIFICATION)\n", "model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.DT)\n", "\n", "instance, prediction = learner.get_instances(model, n=1, correct=True, predictions=[0])\n", "explainer = Explainer.initialize(model, instance)\n", "\n", "print(\"instance:\", instance)\n", "print(\"binary representation:\", explainer.binary_representation)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "\n", "print(\"to_features:\", explainer.to_features(explainer.binary_representation))\n", "print(\"to_features (keep redundant):\", explainer.to_features(explainer.binary_representation, eliminate_redundant_features=False))\n", "\n", "print(\"to_features with details:\", explainer.to_features([-1], details=True))" ] }, { "cell_type": "markdown", "id": "373b9642", "metadata": {}, "source": [ "We notice that the binary representation contains more than 4 variables because the decision tree of the model is\n", "composed of five conditions (i.e., five Boolean variables). Indeed, the feature Petal.Width appears three times whereas the feature Sepal.Length does not appear at all. We can see that, for this binary representation, we can eliminate two redundant conditions related to the Petal.Width feature." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }