{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b1db8c9a",
   "metadata": {},
   "source": [
    "# Direct Reason"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "514a5144",
   "metadata": {},
   "source": [
    "Let $BT$ be a boosted tree composed of the regression trees {$T_1, \\ldots, T_n$} and let $x$ be an instance. The **direct reason** for $x$ is the term of the binary representation obtained as the conjunction of the terms corresponding to the root-to-leaf paths of all the trees $T_i$ that are compatible with $x$. Because it is so simple, it is one of the easiest abductive explanations for $x$ to compute, but it can be highly redundant. More information about the direct reason can be found in the article [Computing Abductive Explanations for Boosted Trees](https://arxiv.org/abs/2209.07740)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d88fcc64",
   "metadata": {},
   "source": [
    "| <font style=\"font-family:Consolas,Monaco,Lucida Console,Liberation Mono,DejaVu Sans Mono,Bitstream Vera Sans Mono,Courier New;\" size=\"+1pt\">&lt;Explainer Object&gt;.direct_reason():</font> | \n",
    "| :----------- | \n",
    "| Returns the direct reason for the current instance. Returns ```None``` if this reason contains some excluded features. All kinds of operators in the conditions are supported. This reason is expressed in terms of binary variables; use the ```to_features``` method to obtain a representation based on the original features. |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4432d14",
   "metadata": {},
   "source": [
    "The basic methods (```initialize```, ```set_instance```, ```to_features```, ```is_reason```, ...) of the ```explainer``` module used in the next examples are described in the [Explainer Principles](/documentation/explainer/) page. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d70f9a73",
   "metadata": {},
   "source": [
    "## Example from Hand-Crafted Trees"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3df1afd3",
   "metadata": {},
   "source": [
    "For this example, we use the binary classification model from the [Building Models](/documentation/learning/builder/BTbuilder/) page. The figure below represents a boosted tree $BT$ using $4$ features ($A_1$, $A_2$, $A_3$ and $A_4$), where $A_1$ and $A_2$ are numerical, $A_3$ is categorical and $A_4$ is Boolean. The direct reason for the instance $x$ = ($A_1=4$, $A_2 = 3$, $A_3 = 1$, $A_4 = 1$) is shown in red. This reason contains all the features of the instance. \n",
    "\n",
    "<img src=\"attachment:BTdirect.png\" alt=\"BTdirect\" width=\"700\" />\n",
    "\n",
    "We have $w(T_1, x)=0.3$, $w(T_2, x)=0.5$ and $w(T_3, x)=0.1$, so $W(F, x) = 0.9$. Since this is a binary classification task and $W(F, x) > 0$, $x$ is classified as a positive instance ($BT(x) = 1$).\n",
    "\n",
    "{: .attention }\n",
    "> In the code below, the features $A_3$ and $A_4$ are treated as numerical. Categorical and Boolean features will be implemented in future versions of PyXAI.\n",
    "\n",
    "We now show how to get direct reasons using PyXAI: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "411398a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyxai import Builder, Explainer\n",
    "\n",
    "node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.3)\n",
    "node1_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=-0.3, right=node1_1)\n",
    "node1_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=0.4, right=node1_2)\n",
    "node1_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=node1_3)\n",
    "tree1 = Builder.DecisionTree(4, node1_4)\n",
    "\n",
    "node2_1 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.4, right=0.3)\n",
    "node2_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=node2_1)\n",
    "node2_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node2_2, right=0.5)\n",
    "tree2 = Builder.DecisionTree(4, node2_3)\n",
    "\n",
    "node3_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=0.2, right=0.3)\n",
    "node3_2_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.2)\n",
    "node3_2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.1, right=node3_1)\n",
    "node3_2_3 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=0.1)\n",
    "node3_3_1 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node3_2_1, right=node3_2_2)\n",
    "node3_3_2 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=-0.4, right=node3_2_3)\n",
    "node3_4 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=node3_3_1, right=node3_3_2)\n",
    "tree3 = Builder.DecisionTree(4, node3_4)\n",
    "\n",
    "BT = Builder.BoostedTrees([tree1, tree2, tree3], n_classes=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2263e89",
   "metadata": {},
   "source": [
    "We compute the direct reason for this instance: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "935837ea",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "instance: (4,3,1,1)\n",
      "binary_representation: (1, 2, 3, 4)\n",
      "target_prediction: 1\n",
      "direct: (1, 2, 3, 4)\n",
      "to_features: ('f1 > 2', 'f2 > 1', 'f3 == 1', 'f4 == 1')\n"
     ]
    }
   ],
   "source": [
    "explainer = Explainer.initialize(BT)\n",
    "explainer.set_instance((4,3,1,1))\n",
    "direct = explainer.direct_reason()\n",
    "print(\"instance: (4,3,1,1)\")\n",
    "print(\"binary_representation:\", explainer.binary_representation)\n",
    "print(\"target_prediction:\", explainer.target_prediction)\n",
    "print(\"direct:\", direct)\n",
    "print(\"to_features:\", explainer.to_features(direct))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc75b1a7",
   "metadata": {},
   "source": [
    "As you can see, in this case, the direct reason coincides with the full instance."
   ]
  },
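  {
   "cell_type": "markdown",
   "id": "9e2f4c71",
   "metadata": {},
   "source": [
    "The hand trace below is a sanity check derived from the node definitions in the code above, assuming (as the reported weights indicate) that the right child is followed when the condition of a node holds. For $x = (4, 3, 1, 1)$, the root-to-leaf paths compatible with $x$ are:\n",
    "\n",
    "- $T_1$: $A_4 = 1$, $A_2 > 1$, $A_3 = 1$ and $A_1 > 2$, ending in the leaf of weight $0.3$;\n",
    "- $T_2$: $A_2 > 1$, ending in the leaf of weight $0.5$;\n",
    "- $T_3$: $A_3 = 1$, $A_2 > 1$ and $A_4 = 1$, ending in the leaf of weight $0.1$.\n",
    "\n",
    "$$W(F, x) = w(T_1, x) + w(T_2, x) + w(T_3, x) = 0.3 + 0.5 + 0.1 = 0.9 > 0$$\n",
    "\n",
    "The conjunction of these path conditions is $(A_1 > 2) \\wedge (A_2 > 1) \\wedge (A_3 = 1) \\wedge (A_4 = 1)$, which matches the four conditions returned by ```to_features```."
   ]
  },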
  {
   "cell_type": "markdown",
   "id": "4061b821",
   "metadata": {},
   "source": [
    "## Example from a Real Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c187df9",
   "metadata": {},
   "source": [
    "For this example, we take the ```compas.csv``` dataset. We create one model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "e3fe96a1",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "data:\n",
      "      Number_of_Priors  score_factor  Age_Above_FourtyFive   \n",
      "0                    0             0                     1  \\\n",
      "1                    0             0                     0   \n",
      "2                    4             0                     0   \n",
      "3                    0             0                     0   \n",
      "4                   14             1                     0   \n",
      "...                ...           ...                   ...   \n",
      "6167                 0             1                     0   \n",
      "6168                 0             0                     0   \n",
      "6169                 0             0                     1   \n",
      "6170                 3             0                     0   \n",
      "6171                 2             0                     0   \n",
      "\n",
      "      Age_Below_TwentyFive  African_American  Asian  Hispanic   \n",
      "0                        0                 0      0         0  \\\n",
      "1                        0                 1      0         0   \n",
      "2                        1                 1      0         0   \n",
      "3                        0                 0      0         0   \n",
      "4                        0                 0      0         0   \n",
      "...                    ...               ...    ...       ...   \n",
      "6167                     1                 1      0         0   \n",
      "6168                     1                 1      0         0   \n",
      "6169                     0                 0      0         0   \n",
      "6170                     0                 1      0         0   \n",
      "6171                     1                 0      0         1   \n",
      "\n",
      "      Native_American  Other  Female  Misdemeanor  Two_yr_Recidivism  \n",
      "0                   0      1       0            0                  0  \n",
      "1                   0      0       0            0                  1  \n",
      "2                   0      0       0            0                  1  \n",
      "3                   0      1       0            1                  0  \n",
      "4                   0      0       0            0                  1  \n",
      "...               ...    ...     ...          ...                ...  \n",
      "6167                0      0       0            0                  0  \n",
      "6168                0      0       0            0                  0  \n",
      "6169                0      1       0            0                  0  \n",
      "6170                0      0       1            1                  0  \n",
      "6171                0      0       1            0                  1  \n",
      "\n",
      "[6172 rows x 12 columns]\n",
      "--------------   Information   ---------------\n",
      "Dataset name: ../../../dataset/compas.csv\n",
      "nFeatures (nAttributes, with the labels): 12\n",
      "nInstances (nObservations): 6172\n",
      "nLabels: 2\n",
      "---------------   Evaluation   ---------------\n",
      "method: HoldOut\n",
      "output: BT\n",
      "learner_type: Classification\n",
      "learner_options: {'seed': 0, 'max_depth': None, 'eval_metric': 'mlogloss'}\n",
      "---------   Evaluation Information   ---------\n",
      "For the evaluation number 0:\n",
      "metrics:\n",
      "   accuracy: 66.73866090712744\n",
      "nTraining instances: 4320\n",
      "nTest instances: 1852\n",
      "\n",
      "---------------   Explainer   ----------------\n",
      "For the evaluation number 0:\n",
      "**Boosted Tree model**\n",
      "NClasses: 2\n",
      "nTrees: 100\n",
      "nVariables: 42\n",
      "\n",
      "---------------   Instances   ----------------\n",
      "number of instances selected: 1\n",
      "----------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "from pyxai import Learning, Explainer\n",
    "\n",
    "learner = Learning.Xgboost(\"../../../dataset/compas.csv\", learner_type=Learning.CLASSIFICATION)\n",
    "model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.BT)\n",
    "instance, prediction = learner.get_instances(model, n=1, correct=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bcd926f6",
   "metadata": {},
   "source": [
    "Finally, we display the direct reason for this instance: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "f5ba8cd7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "instance: [0 0 1 0 0 0 0 0 1 0 0]\n",
      "prediction: 0\n",
      "\n",
      "len binary representation: 42\n",
      "len direct: 38\n",
      "is_reason: True\n",
      "to_features: ('Number_of_Priors < 0.5', 'score_factor < 0.5', 'Age_Above_FourtyFive >= 0.5', 'Age_Below_TwentyFive < 0.5', 'African_American < 0.5', 'Asian < 0.5', 'Hispanic < 0.5', 'Native_American < 0.5', 'Other >= 0.5', 'Female < 0.5', 'Misdemeanor < 0.5')\n"
     ]
    }
   ],
   "source": [
    "explainer = Explainer.initialize(model, instance)\n",
    "print(\"instance:\", instance)\n",
    "print(\"prediction:\", prediction)\n",
    "print()\n",
    "direct_reason = explainer.direct_reason()\n",
    "print(\"len binary representation:\", len(explainer.binary_representation))\n",
    "print(\"len direct:\", len(direct_reason))\n",
    "print(\"is_reason:\", explainer.is_reason(direct_reason))\n",
    "print(\"to_features:\", explainer.to_features(direct_reason))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "663667d5",
   "metadata": {},
   "source": [
    "Note that this direct reason contains 38 of the 42 binary variables of the binary representation. It explains why the model predicts $0$ for this instance, but it is probably not the most compact reason. Other types of reasons are presented on the [Explanations Computation](/documentation/explanations/BTexplanations/) page."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}