{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "91e36913",
   "metadata": {},
   "source": [
    "## Tree-Specific Explanations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "514a5144",
   "metadata": {},
   "source": [
    "Let $BT$ be a Boosted Tree composed of {$T_1,\\ldots T_n$} regression trees and $x$ be an instance.\n",
    "* A **worst instance** extending $t$ given $F$ is an instance $x'$ such that $t \\subseteq t_{x'}$ and: $$x' = \\mathit{argmin}_{x'': t \\subseteq t_{x''}}(\\{w(F, x'')\\})$$\n",
    "* A **best instance** extending $t$ given $F$ is an instance $x'$ such that $t \\subseteq t_{x'}$ and: $$x' = \\mathit{argmax}_{x'': t \\subseteq t_{x''}}(\\{w(F, x'')\\})$$\n",
    "\n",
    "$W(t, F)$ (resp. $B(t, F)$) denotes the set of worst (resp. best) instances covered by $t$ given $F$, and $w_\\downarrow(t, F)$ (resp. $w_\\uparrow(t, F)$) denotes the weight of the worst (resp. best) instance covered by $t$ given $F$. For example, for the sufficient reason $t$ = ($A_1 > 2$, $A_4 = 1$) for the instance $x$ = ($A_1=4$, $A_2 = 3$, $A_3 = 1$, $A_4 = 1$) of the [sufficient reason](/documentation/classification/BTexplanations/sufficient/) page, a worst instance can be $x'$ = ($A_1=4$, $A_2 = 1$, $A_3 = 1$, $A_4 = 1$) with $w(F, x') = 0.3$ . Identifying a worst (resp. best) instance $x$\n",
    "for given a boosted tree is intractable. Interestingly, when the classifier consists of a regression tree,\n",
    "identifying an element of $W(t, T)$ (resp. $B(t, T)$) for a tree $T$ is easy. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3fb21492",
   "metadata": {},
   "source": [
    "We are now ready to define the notion of **tree-specific** explanation $t$ for an instance $x$ given a boosted tree $BT$. We start with the binary case, i.e., when $BT$ consists of a single forest $F$:\n",
    "\n",
    "* If $BT(x) = 1$, then $t$ is a **tree-specific** explanation for $x$ given $F$ if and only if $t$ is a subset of $t_{x}$ such that $\\sum_{k=1}^p w_\\downarrow(t, T_k) > 0$ and no proper subset of $t$ satisfies the latter condition.\n",
    "* If $BT(x) = 0$, then $t$ is a **tree-specific** explanation for $x$ given $F$ if and only if $t$ is a subset of $t_{x}$ such that $\\sum_{k=1}^p w_\\uparrow(t, T_k) \\leq 0$ and no proper subset of $t$ satisfies the latter condition.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b780cae3",
   "metadata": {},
   "source": [
    "More generally, in the multi-class setting, TS-explanations can be defined as follows:\n",
    "\n",
    "Let $BT = \\{F^1,\\cdots, F^{m}\\}$ be a boosted tree over $\\{A_1, \\ldots, A_n\\}$ where each $F^j$ ($j \\in [m]$) contains $p_j$ trees, and $x$ such that $BT(x) = i$. Conceptually, $t$ is a **tree-specific** explanation for $x$ given $BT$ if and only if $t$ is a subset of $t_{x}$ such that for every $j \\in [m] \\setminus \\{i\\}$, we have $\\sum_{k=1}^{p_i} w_\\downarrow(t, T_k^i) > \\sum_{k=1}^{p_j} w_\\uparrow(t, T_k^j)$, and no proper subset of $t$ satisfies the latter condition.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "701d48f7",
   "metadata": {},
   "source": [
    "In general, the notions of tree-specific explanation and of sufficient reason do not coincide. Indeed, a sufficient reason is a prime implicant (covering $x$) of the Boosted Tree $BT$, while a tree-specific explanation is an implicant $t$ (covering $x$). Since there is a simple, linear-time algorithm for computing $w_\\downarrow(t, F)$ and $w_\\uparrow(t, F)$ and for deriving a worst-case and a best-case instance, the **tree-specific** explanations for $x$ are much easier to compute than the sufficient reasons and remain abductive.\n",
    "\n",
    "More information about tree-specific explanations can be found in the paper [Computing Abductive Explanations\n",
    "for Boosted Trees](https://proceedings.mlr.press/v206/audemard23a/audemard23a.pdf).\n",
    "\n",
    "The function ```ExplainerBT.tree_specific_reason``` allows computing this kind of explanation.\n",
    "\n",
    "The library also provides a way to check that a reason is tree specific  using the function ```is_tree_specific_reason```."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "714c0d17",
   "metadata": {},
   "source": [
    "A natural way of improving the quality of tree-specific explanations is to seek for the most parsimonious ones. A **minimal tree-specific explanation** for $x$ is a tree-specific explanation for $x$ that\n",
    "contains a minimal number of literals. In other words, a **minimal tree-specific explanation** has a minimal size.  \n",
    "\n",
    "The function ```ExplainerBT.minimal_tree_specific_reasons``` allows computing this kind of explanation."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fb5f191",
   "metadata": {},
   "source": [
    "The PyXAI library also provides a way to test that an explanation is actually a tree-specific explanation:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d487e10f",
   "metadata": {},
   "source": [
    "## Example from Hand-Crafted Trees"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b37b50b",
   "metadata": {},
   "source": [
    "For this example, we take the Boosted Tree of the [Building Models](/documentation/learning/builder/BTbuilder/) page. \n",
    "\n",
    "The next figure represents a Boosted  Tree using $4$  features ($A_1$, $A_2$, $A_3$ and $A_4$), where $A_1$ and $A_2$ are numerical, $A_3$ is categorical and $A_4$ is a Boolean. In this figure, we show worst instances and the corresponding weights (in dark red) of the sufficient reason $t$ = ($A_1 = 4$, $A_4 = 1$) for the instance $x$ = ($A_1 = 4$, $A_2 = 3$, $A_3 = 1$, $A_4 = 1$).\n",
    "\n",
    "<img src=\"attachment:BTts1.png\" alt=\"BTts1\" width=\"700\" />\n",
    "\n",
    "This sufficient reason $t$ = ($A_1 = 4$, $A_4 = 1$) is not a tree-specific explanation for $x$ given $BT$. Indeed, we have $w_\\downarrow(t, T_1) = -0.3$, $w_\\downarrow(t, T_2) = 0.3$, and $w_\\downarrow(t, T_3) = -0.4$, hence \n",
    "$w_\\downarrow(t, T_1) + w_\\downarrow(t, T_2) + w_\\downarrow(t, T_3) = -0.4 < 0$ while $w(F, x) = 0.9 > 0$.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca3b6756",
   "metadata": {},
   "source": [
    "In the next figure, in this figure, $t' = (A_2 > 1, A_4 = 1)$  is a tree-specific explanation for $x$ given $BT$.\n",
    "\n",
    "<img src=\"attachment:BTts2.png\" alt=\"BTts2\" width=\"700\" />\n",
    "\n",
    "We have $w_\\downarrow(t', T_1) = -0.3$, $w_\\downarrow(t', T_2) = 0.5$, and $w_\\downarrow(t', T_3) = 0.1$, hence \n",
    "$w_\\downarrow(t', T_1) + w_\\downarrow(t', T_2) + w_\\downarrow(t', T_3) = 0.3 > 0$.\n",
    "It can be verified that $t'$ also is a sufficient reason for $x$ given $BT$."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b4b8844",
   "metadata": {},
   "source": [
    "We show now how to get them with PyXAI. We start by building the Boosted Tree:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "db173b61",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyxai import Builder, Explaining\n",
    "\n",
    "node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.3)\n",
    "node1_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=-0.3, right=node1_1)\n",
    "node1_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=0.4, right=node1_2)\n",
    "node1_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=node1_3)\n",
    "tree1 = Builder.DecisionTree(4, node1_4)\n",
    "\n",
    "node2_1 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.4, right=0.3)\n",
    "node2_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=node2_1)\n",
    "node2_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node2_2, right=0.5)\n",
    "tree2 = Builder.DecisionTree(4, node2_3)\n",
    "\n",
    "node3_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=0.2, right=0.3)\n",
    "node3_2_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.2)\n",
    "node3_2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.1, right=node3_1)\n",
    "node3_2_3 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=0.1)\n",
    "node3_3_1 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node3_2_1, right=node3_2_2)\n",
    "node3_3_2 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=-0.4, right=node3_2_3)\n",
    "node3_4 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=node3_3_1, right=node3_3_2)\n",
    "tree3 = Builder.DecisionTree(4, node3_4)\n",
    "\n",
    "BT = Builder.BoostedTrees([tree1, tree2, tree3], n_classes=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45cce4b2",
   "metadata": {},
   "source": [
    "We compute a tree-specific explanation for this instance: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "f35d03a3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "target_prediction: 1\n",
      "tree specific: (1, 2)\n",
      "minimal: (1, 2)\n"
     ]
    }
   ],
   "source": [
    "explainer = Explaining.initialize(BT)\n",
    "explainer.set_instance((4,3,2,1))\n",
    "\n",
    "tree_specific = explainer.tree_specific_reason()\n",
    "print(\"target_prediction:\", explainer.target_prediction)\n",
    "print(\"tree specific:\", tree_specific)\n",
    "assert explainer.is_tree_specific_reason(tree_specific)\n",
    "\n",
    "minimal = explainer.minimal_tree_specific_reasons(n=1)\n",
    "print(\"minimal:\", minimal)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e92e4ae4",
   "metadata": {},
   "source": [
    "## Example from a Real Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d58684a0",
   "metadata": {},
   "source": [
    "For this example, we take the compas.csv dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "ab7788c9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------   Information   ---------------\n",
      "Problem type: classification\n",
      "Instances type: tabular\n",
      "Labels type: classes\n",
      "\n",
      "Dataset path: ../../../dataset/compas.csv\n",
      "nFeatures (nAttributes, with the labels): 11\n",
      "nInstances (nObservations): 6172\n",
      "nLabels: 2\n",
      "---------------   Model creation, fitting and evaluation  ---------------\n",
      "Splitting method: hold-out\n",
      "Problem type: classification\n",
      "Models type: boosted-tree\n",
      "model_parameters: {}\n",
      "---------   Evaluation Information   ---------\n",
      "For the evaluation number 0:\n",
      "Metrics:\n",
      "   sklearn_confusion_matrix: [[634, 196], [287, 426]]\n",
      "   precision: 68.48874598070739\n",
      "   recall: 59.74754558204769\n",
      "   f1_score: 63.820224719101134\n",
      "   specificity: 76.3855421686747\n",
      "   true_positive: 426\n",
      "   true_negative: 634\n",
      "   false_positive: 196\n",
      "   false_negative: 287\n",
      "   accuracy: 68.69734283862606\n",
      "Number of Training instances: 4629\n",
      "Number of Testing instances: 1543\n",
      "\n",
      "---------------   Explainer   ----------------\n",
      "For the split number 0:\n",
      "**Boosted Tree model**\n",
      "NClasses: 2\n",
      "nTrees: 100\n",
      "nVariables: 38\n",
      "\n",
      "---------------   Instances   ----------------\n",
      "Number of instances selected: 1\n",
      "----------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "from pyxai import Learning, Explaining\n",
    "\n",
    "learner = Learning.Xgboost(\"../../../dataset/compas.csv\", problem_type='classification')\n",
    "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.BT)\n",
    "instance, prediction = learner.get_instances(model, n=1, is_correct=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a20938fa",
   "metadata": {},
   "source": [
    "We compute a tree-specific explanation and a minimal one.\n",
    "Since finding a minimal one explanation is time consuming, we set a time limit. In the case of the time limit is reached, we may obtain only an approximation of a minimal explanation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "1ccce2fc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "instance: Misdemeanor             0\n",
      "Number_of_Priors        0\n",
      "score_factor            0\n",
      "Age_Above_FourtyFive    1\n",
      "Age_Below_TwentyFive    0\n",
      "African_American        0\n",
      "Asian                   0\n",
      "Hispanic                0\n",
      "Native_American         0\n",
      "Other                   1\n",
      "Female                  0\n",
      "Name: 0, dtype: int64\n",
      "prediction: 0\n",
      "\n",
      "\n",
      "tree specific reason: (-1, -2, -3, -4, 5, 6, -7, -8, -9, -10, -11, -13, -14, -15, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -33, -37)\n",
      "len tree specific reason: 28\n",
      "to features ['Number_of_Priors < 1.0', 'score_factor < 1.0', 'Age_Above_FourtyFive >= 1.0', 'Age_Below_TwentyFive < 1.0', 'African_American < 1.0', 'Asian < 1.0', 'Hispanic < 1.0', 'Other >= 1.0', 'Female < 1.0']\n",
      "is tree specific reason:  True\n",
      "\n",
      "\n",
      "minimal: ((-1, -2, -3, -4, 5, -7, -8, -10, -11, -13, -15, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -29, -33, -37, -38), (-1, -2, -3, -4, 5, -7, -8, -10, -11, -13, -15, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -29, -33, -34, -37), (-1, -2, -3, -4, 5, -7, -8, -10, -11, -13, -14, -15, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -29, -33, -37))\n",
      "minimal: 3\n",
      "to features ['Number_of_Priors < 1.0', 'score_factor < 1.0', 'Age_Above_FourtyFive >= 1.0', 'Age_Below_TwentyFive < 1.0', 'African_American < 1.0', 'Asian < 1.0', 'Hispanic < 1.0', 'Other >= 1.0', 'Female < 1.0']\n",
      "is majoritary reason:  True\n"
     ]
    }
   ],
   "source": [
    "explainer = Explaining.initialize(model, instance)\n",
    "print(\"instance:\", instance)\n",
    "print(\"prediction:\", prediction)\n",
    "print()\n",
    "tree_specific_reason = explainer.tree_specific_reason()\n",
    "#for s in sufficient_reasons:\n",
    "print(\"\\ntree specific reason:\", tree_specific_reason)\n",
    "print(\"len tree specific reason:\", len(tree_specific_reason))\n",
    "print(\"to features\", explainer.to_features(tree_specific_reason))\n",
    "print(\"is tree specific reason: \", explainer.is_tree_specific_reason(tree_specific_reason))\n",
    "print()\n",
    "minimal = explainer.minimal_tree_specific_reasons(time_limit=5)\n",
    "print(\"\\nminimal:\", minimal)\n",
    "print(\"minimal:\", len(minimal))\n",
    "print(\"to features\", explainer.to_features(tree_specific_reason))\n",
    "print(\"is majoritary reason: \", explainer.is_tree_specific_reason(tree_specific_reason))\n",
    "if explainer.elapsed_time == Explaining.TIMEOUT: \n",
    "    print(\"time out... It is an approximation\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b9c2c4d",
   "metadata": {},
   "source": [
    "Other types of explanations are presented in the [Explanations Computation](/documentation/explanations/BTexplanations/) page.\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}