{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b1db8c9a",
   "metadata": {},
   "source": [
    "# Sufficient Reasons"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "514a5144",
   "metadata": {},
   "source": [
    "Let $f$ be a Boolean function represented by a random forest $RF$, $x$ be an instance and $1$ (resp. $0$) the prediction of $RF$ on $x$ ($f(x) = 1$ (resp 0)), a **sufficient reason** for $x$ is a term of the binary representation of the instance that is a prime implicant of $f$ (resp $\\neg f$) that covers $x$.\n",
    "\n",
    "In other words, a **sufficient reason** for an instance $x$ given a class described by a Boolean function $f$ is a subset $t$ of the characteristics of $x$ that is minimal w.r.t. set inclusion,and such that any instance $x'$ sharing this set t of characteristics is classified by $f$ as $x$ is.\n",
    "\n",
    "More information about sufficient reasons can be found in the paper [Trading Complexity for Sparsity in Random Forest Explanations](https://ojs.aaai.org/index.php/AAAI/article/view/20484).\n",
    "\n",
    "The function ```ExplainerRF.sufficient_reason``` allows computing this kind of explanation.\n",
    "\n",
    "The library also provides a way to check that a reason is sufficient using the function ```is_sufficient_reason```."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b326e678",
   "metadata": {},
   "source": "A sufficient reason is minimal w.r.t. set inclusion, i.e., there is no subset of this reason which is also a sufficient reason. A **minimal sufficient reason** for $x$ is a sufficient reason for $x$ that\ncontains a minimal number of literals. In other words, a **minimal sufficient reason** has a minimal size.  \n\nThe function ```ExplainerRF.minimal_sufficient_reason``` allows computing this kind of explanation.\n"
  },
  {
   "cell_type": "markdown",
   "id": "f766d997",
   "metadata": {},
   "source": [
    "The PyXAI library also provides a way to verify that a reason is sufficient:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8fcf0b8",
   "metadata": {},
   "source": [
    "{: .warning}\n",
 Unfortunately, searching for MUS">
    "> Unfortunately, computing a sufficient reason (which amounts to searching for a MUS), and even more so a minimal one, is a computationally difficult task. If the dataset contains many features, or if the binary representation of the instance contains many binary variables, finding one may be out of reach. To deal with this problem, we introduced the notion of [Majoritary Reason](/documentation/classification/RFexplanations/majoritary/), an abductive explanation that is much easier to compute."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "869cba3c",
   "metadata": {},
   "source": [
    "## Example from Hand-Crafted Trees"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a557012",
   "metadata": {},
   "source": [
    "For this example, we take the random forest of the [Building Models](/documentation/learning/builder/RFbuilder/) page consisting of $4$ binary features ($x_1$, $x_2$, $x_3$ and $x_4$). \n",
    "\n",
    "The following figure shows in red and bold a minimal sufficient reason $(x_1, x_4)$ for the instance $(1,1,1,1)$. \n",
    "<img src=\"attachment:RFsufficient1.png\" alt=\"RFsufficient1\" width=\"700\" />\n",
    "\n",
    "As you can see in the figure, some leaves of this sufficient reason (in red) can have a prediction equal to 0 or 1. These are the predictions from the trees ($T_1$, $T_2$ and $T_3$), but not from the random forest. We need to calculate for each possible interpretation arising from this reason the prediction $f$ from the random forest:    \n",
    "<table>\n",
    "<thead>\n",
    "  <tr>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$x_1$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$x_2$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$x_3$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$x_4$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$T_1$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$T_2$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$T_3$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$f$</th>\n",
    "  </tr>\n",
    "</thead>\n",
    "<tbody>\n",
    "  <tr>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "  </tr>\n",
    "</tbody>\n",
    "</table>\n",
    "<style>\n",
    "td{text-align: center;}\n",
    "</style>\n",
    "As at least 2 trees out of 3 give the right prediction (1), $(x_1, x_4)$ is indeed a sufficient reason. \n",
    "\n",
    "The next figure shows in blue and bold a minimal sufficient reason $(-x_4)$ for the instance $(0,1,0,0)$. \n",
    "<img src=\"attachment:RFsufficient2.png\" alt=\"RFsufficient2\" width=\"700\" />\n",
    "\n",
    "As before, we compute the predictions associated with this reason: \n",
    "\n",
    "<table>\n",
    "<thead>\n",
    "  <tr>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$x_1$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$x_2$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$x_3$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$x_4$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$T_1$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$T_2$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$T_3$</th>\n",
    "    <th style='padding-left:0!important;padding-right:0!important;min-width:15px!important'>$f$</th>\n",
    "  </tr>\n",
    "</thead>\n",
    "<tbody>\n",
    "  <tr>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "    <td>1</td>\n",
    "    <td>0</td>\n",
    "    <td>0</td>\n",
    "  </tr>\n",
    "</tbody>\n",
    "</table>\n",
    "\n",
    "As at least 2 trees out of 3 have the right prediction (0), $(-x_4)$ is indeed a sufficient reason. \n",
    "\n",
    "Now, we show how to get them with PyXAI. We start by building the random forest:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "745fbf2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyxai import Builder, Explaining\n",
    "\n",
    "nodeT1_1 = Builder.DecisionNode(1, left=0, right=1)\n",
    "nodeT1_3 = Builder.DecisionNode(3, left=0, right=nodeT1_1)\n",
    "nodeT1_2 = Builder.DecisionNode(2, left=1, right=nodeT1_3)\n",
    "nodeT1_4 = Builder.DecisionNode(4, left=0, right=nodeT1_2)\n",
    "\n",
    "tree1 = Builder.DecisionTree(4, nodeT1_4, force_features_equal_to_binaries=True)\n",
    "\n",
    "nodeT2_4 = Builder.DecisionNode(4, left=0, right=1)\n",
    "nodeT2_1 = Builder.DecisionNode(1, left=0, right=nodeT2_4)\n",
    "nodeT2_2 = Builder.DecisionNode(2, left=nodeT2_1, right=1)\n",
    "\n",
    "tree2 = Builder.DecisionTree(4, nodeT2_2, force_features_equal_to_binaries=True) #4 features but only 3 used\n",
    "\n",
    "nodeT3_1_1 = Builder.DecisionNode(1, left=0, right=1)\n",
    "nodeT3_1_2 = Builder.DecisionNode(1, left=0, right=1)\n",
    "nodeT3_4_1 = Builder.DecisionNode(4, left=0, right=nodeT3_1_1)\n",
    "nodeT3_4_2 = Builder.DecisionNode(4, left=0, right=1)\n",
    "\n",
    "nodeT3_2_1 = Builder.DecisionNode(2, left=nodeT3_1_2, right=nodeT3_4_1)\n",
    "nodeT3_2_2 = Builder.DecisionNode(2, left=0, right=nodeT3_4_2)\n",
    "\n",
    "nodeT3_3_1 = Builder.DecisionNode(3, left=nodeT3_2_1, right=nodeT3_2_2)\n",
    "\n",
    "tree3 = Builder.DecisionTree(4, nodeT3_3_1, force_features_equal_to_binaries=True)\n",
    "forest = Builder.RandomForest([tree1, tree2, tree3], n_classes=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bad9b535",
   "metadata": {},
   "source": [
    "Then we compute a sufficient reasons for each of these two instances: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0f5c98bf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "minimal: (1, 4)\n",
      "-------------------------------\n",
      "target_prediction: 0\n",
      "sufficient: (-1, -3)\n",
      "minimal: (-4,)\n"
     ]
    }
   ],
   "source": [
    "explainer = Explaining.initialize(forest)\n",
    "explainer.set_instance((1,1,1,1))\n",
    "\n",
    "sufficient = explainer.sufficient_reason()\n",
    "assert explainer.is_sufficient_reason(sufficient)\n",
    "assert sufficient == (1, 4), \"The sufficient reason is not good !\"\n",
    "\n",
    "minimal = explainer.minimal_sufficient_reason()\n",
    "print(\"minimal:\", minimal)\n",
    "assert minimal == (1, 4), \"The minimal sufficient reason is not good !\"\n",
    "\n",
    "print(\"-------------------------------\")\n",
    "instance = (0,1,0,0)\n",
    "explainer.set_instance(instance)\n",
    "print(\"target_prediction:\", explainer.target_prediction)\n",
    "\n",
    "sufficient = explainer.sufficient_reason()\n",
    "print(\"sufficient:\", sufficient)\n",
    "assert sufficient == (-1, -3), \"The sufficient reason is not good !\"\n",
    "\n",
    "minimal = explainer.minimal_sufficient_reason()\n",
    "print(\"minimal:\", minimal)\n",
    "assert minimal == (-4, ), \"The minimal sufficient reason is not good !\" \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0420183",
   "metadata": {},
   "source": [
    "## Example from a Real dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "03c8f44e",
   "metadata": {},
   "source": [
    "For this example, we take the [compas](/assets/notebooks/dataset/compas.csv) dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "5a1c9c9b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------   Information   ---------------\n",
      "Problem type: classification\n",
      "Instances type: tabular\n",
      "Labels type: classes\n",
      "\n",
      "Dataset path: ../../../dataset/compas.csv\n",
      "nFeatures (nAttributes, with the labels): 11\n",
      "nInstances (nObservations): 6172\n",
      "nLabels: 2\n",
      "---------------   Model creation, fitting and evaluation  ---------------\n",
      "Splitting method: hold-out\n",
      "Problem type: classification\n",
      "Models type: random-forest\n",
      "model_parameters: {}\n",
      "---------   Evaluation Information   ---------\n",
      "For the evaluation number 0:\n",
      "Metrics:\n",
      "   sklearn_confusion_matrix: [[612, 211], [295, 425]]\n",
      "   precision: 66.82389937106919\n",
      "   recall: 59.02777777777778\n",
      "   f1_score: 62.68436578171092\n",
      "   specificity: 74.36208991494532\n",
      "   true_positive: 425\n",
      "   true_negative: 612\n",
      "   false_positive: 211\n",
      "   false_negative: 295\n",
      "   accuracy: 67.20674011665587\n",
      "Number of Training instances: 4629\n",
      "Number of Testing instances: 1543\n",
      "\n",
      "---------------   Explainer   ----------------\n",
      "For the split number 0:\n",
      "**Random Forest Model**\n",
      "nClasses: 2\n",
      "nTrees: 100\n",
      "nVariables: 70\n",
      "\n",
      "---------------   Instances   ----------------\n",
      "Number of instances selected: 1\n",
      "----------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "from pyxai import Learning, Explaining\n",
    "\n",
    "learner = Learning.Scikitlearn(\"../../../dataset/compas.csv\", problem_type='classification')\n",
    "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.RF)\n",
    "instance, prediction = learner.get_instances(model, n=1, is_correct=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cacbab0",
   "metadata": {},
   "source": [
    "This dataset is not very large and the computation of a sufficient reason is quite easy, but it is not so easy to derive a minimal one. Since the related solver (Optux) does not propose a time limit mode we commented the related code. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "b7691f19",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "instance: Misdemeanor             0\n",
      "Number_of_Priors        0\n",
      "score_factor            0\n",
      "Age_Above_FourtyFive    1\n",
      "Age_Below_TwentyFive    0\n",
      "African_American        0\n",
      "Asian                   0\n",
      "Hispanic                0\n",
      "Native_American         0\n",
      "Other                   1\n",
      "Female                  0\n",
      "Name: 0, dtype: int64\n",
      "prediction: 0\n",
      "\n",
      "\n",
      "sufficient reason: (-1, -2, -3, -4, -6, -11, -12)\n",
      "to features ['Number_of_Priors <= 0.5', 'score_factor <= 0.5', 'African_American <= 0.5', 'Hispanic <= 0.5', 'Female <= 0.5']\n",
      "is sufficient_reason (for max 50 checks):  None\n",
      "\n"
     ]
    }
   ],
   "source": [
    "explainer = Explaining.initialize(model, instance)\n",
    "print(\"instance:\", instance)\n",
    "print(\"prediction:\", prediction)\n",
    "print()\n",
    "sufficient_reason = explainer.sufficient_reason()\n",
    "print(\"\\nsufficient reason:\", sufficient_reason)\n",
    "print(\"to features\", explainer.to_features(sufficient_reason))\n",
    "print(\"is sufficient_reason (for max 50 checks): \", explainer.is_sufficient_reason(sufficient_reason, n_samples=50))\n",
    "print()\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}