{ "cells": [ { "cell_type": "markdown", "id": "91e36913", "metadata": {}, "source": [ "## Tree-Specific Explanations" ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": [ "Let $BT$ be a Boosted Tree composed of {$T_1,\\ldots T_n$} regression trees and $x$ be an instance.\n", "* A **worst instance** extending $t$ given $F$ is an instance $x'$ such that $t \\subseteq t_{x'}$ and: $$x' = \\mathit{argmin}_{x'': t \\subseteq t_{x''}}(\\{w(F, x'')\\})$$\n", "* A **best instance** extending $t$ given $F$ is an instance $x'$ such that $t \\subseteq t_{x'}$ and: $$x' = \\mathit{argmax}_{x'': t \\subseteq t_{x''}}(\\{w(F, x'')\\})$$\n", "\n", "$W(t, F)$ (resp. $B(t, F)$) denotes the set of worst (resp. best) instances covered by $t$ given $F$, and $w_\\downarrow(t, F)$ (resp. $w_\\uparrow(t, F)$) denotes the weight of the worst (resp. best) instance covered by $t$ given $F$. For example, for the sufficient reason $t$ = ($A_1 > 2$, $A_4 = 1$) for the instance $x$ = ($A_1=4$, $A_2 = 3$, $A_3 = 1$, $A_4 = 1$) of the [sufficient reason](/documentation/classification/BTexplanations/sufficient/) page, a worst instance can be $x'$ = ($A_1=4$, $A_2 = 1$, $A_3 = 1$, $A_4 = 1$) with $w(F, x') = 0.3$ . Identifying a worst (resp. best) instance $x$\n", "for given a boosted tree is intractable. Interestingly, when the classifier consists of a regression tree,\n", "identifying an element of $W(t, T)$ (resp. $B(t, T)$) for a tree $T$ is easy. " ] }, { "cell_type": "markdown", "id": "3fb21492", "metadata": {}, "source": [ "We are now ready to define the notion of **tree-specific** explanation $t$ for an instance $x$ given a boosted tree $BT$. We start with the binary case, i.e., when $BT$ consists of a single forest $F$:\n", "\n", "* If $BT(x) = 1$, then $t$ is a **tree-specific** explanation for $x$ given $F$ if and only if $t$ is a subset of $t_{x}$ such that $\\sum_{k=1}^p w_\\downarrow(t, T_k) > 0$ and no proper subset of $t$ satisfies the latter condition.\n", "* If $BT(x) = 0$, then $t$ is a **tree-specific** explanation for $x$ given $F$ if and only if $t$ is a subset of $t_{x}$ such that $\\sum_{k=1}^p w_\\uparrow(t, T_k) \\leq 0$ and no proper subset of $t$ satisfies the latter condition.\n" ] }, { "cell_type": "markdown", "id": "b780cae3", "metadata": {}, "source": [ "More generally, in the multi-class setting, TS-explanations can be defined as follows:\n", "\n", "Let $BT = \\{F^1,\\cdots, F^{m}\\}$ be a boosted tree over $\\{A_1, \\ldots, A_n\\}$ where each $F^j$ ($j \\in [m]$) contains $p_j$ trees, and $x$ such that $BT(x) = i$. Conceptually, $t$ is a **tree-specific** explanation for $x$ given $BT$ if and only if $t$ is a subset of $t_{x}$ such that for every $j \\in [m] \\setminus \\{i\\}$, we have $\\sum_{k=1}^{p_i} w_\\downarrow(t, T_k^i) > \\sum_{k=1}^{p_j} w_\\uparrow(t, T_k^j)$, and no proper subset of $t$ satisfies the latter condition.\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "701d48f7", "metadata": {}, "source": [ "In general, the notions of tree-specific explanation and of sufficient reason do not coincide. Indeed, a sufficient reason is a prime implicant (covering $x$) of the Boosted Tree $BT$, while a tree-specific explanation is an implicant $t$ (covering $x$). Since there is a simple, linear-time algorithm for computing $w_\\downarrow(t, F)$ and $w_\\uparrow(t, F)$ and for deriving a worst-case and a best-case instance, the **tree-specific** explanations for $x$ are much easier to compute than the sufficient reasons and remain abductive.\n", "\n", "More information about tree-specific explanations can be found in the paper [Computing Abductive Explanations\n", "for Boosted Trees](https://proceedings.mlr.press/v206/audemard23a/audemard23a.pdf).\n", "\n", "The function ```ExplainerBT.tree_specific_reason``` allows computing this kind of explanation.\n", "\n", "The library also provides a way to check that a reason is tree specific using the function ```is_tree_specific_reason```." ] }, { "cell_type": "markdown", "id": "714c0d17", "metadata": {}, "source": [ "A natural way of improving the quality of tree-specific explanations is to seek for the most parsimonious ones. A **minimal tree-specific explanation** for $x$ is a tree-specific explanation for $x$ that\n", "contains a minimal number of literals. In other words, a **minimal tree-specific explanation** has a minimal size. \n", "\n", "The function ```ExplainerBT.minimal_tree_specific_reasons``` allows computing this kind of explanation." ] }, { "cell_type": "markdown", "id": "4fb5f191", "metadata": {}, "source": [ "The PyXAI library also provides a way to test that an explanation is actually a tree-specific explanation:" ] }, { "cell_type": "markdown", "id": "d487e10f", "metadata": {}, "source": [ "## Example from Hand-Crafted Trees" ] }, { "cell_type": "markdown", "id": "8b37b50b", "metadata": {}, "source": [ "For this example, we take the Boosted Tree of the [Building Models](/documentation/learning/builder/BTbuilder/) page. \n", "\n", "The next figure represents a Boosted Tree using $4$ features ($A_1$, $A_2$, $A_3$ and $A_4$), where $A_1$ and $A_2$ are numerical, $A_3$ is categorical and $A_4$ is a Boolean. In this figure, we show worst instances and the corresponding weights (in dark red) of the sufficient reason $t$ = ($A_1 = 4$, $A_4 = 1$) for the instance $x$ = ($A_1 = 4$, $A_2 = 3$, $A_3 = 1$, $A_4 = 1$).\n", "\n", "\"BTts1\"\n", "\n", "This sufficient reason $t$ = ($A_1 = 4$, $A_4 = 1$) is not a tree-specific explanation for $x$ given $BT$. Indeed, we have $w_\\downarrow(t, T_1) = -0.3$, $w_\\downarrow(t, T_2) = 0.3$, and $w_\\downarrow(t, T_3) = -0.4$, hence \n", "$w_\\downarrow(t, T_1) + w_\\downarrow(t, T_2) + w_\\downarrow(t, T_3) = -0.4 < 0$ while $w(F, x) = 0.9 > 0$.\n" ] }, { "cell_type": "markdown", "id": "ca3b6756", "metadata": {}, "source": [ "In the next figure, in this figure, $t' = (A_2 > 1, A_4 = 1)$ is a tree-specific explanation for $x$ given $BT$.\n", "\n", "\"BTts2\"\n", "\n", "We have $w_\\downarrow(t', T_1) = -0.3$, $w_\\downarrow(t', T_2) = 0.5$, and $w_\\downarrow(t', T_3) = 0.1$, hence \n", "$w_\\downarrow(t', T_1) + w_\\downarrow(t', T_2) + w_\\downarrow(t', T_3) = 0.3 > 0$.\n", "It can be verified that $t'$ also is a sufficient reason for $x$ given $BT$." ] }, { "cell_type": "markdown", "id": "6b4b8844", "metadata": {}, "source": [ "We show now how to get them with PyXAI. We start by building the Boosted Tree:" ] }, { "cell_type": "code", "execution_count": 1, "id": "db173b61", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explaining\n", "\n", "node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.3)\n", "node1_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=-0.3, right=node1_1)\n", "node1_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=0.4, right=node1_2)\n", "node1_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=node1_3)\n", "tree1 = Builder.DecisionTree(4, node1_4)\n", "\n", "node2_1 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.4, right=0.3)\n", "node2_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=node2_1)\n", "node2_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node2_2, right=0.5)\n", "tree2 = Builder.DecisionTree(4, node2_3)\n", "\n", "node3_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=0.2, right=0.3)\n", "node3_2_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.2)\n", "node3_2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.1, right=node3_1)\n", "node3_2_3 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=0.1)\n", "node3_3_1 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node3_2_1, right=node3_2_2)\n", "node3_3_2 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=-0.4, right=node3_2_3)\n", "node3_4 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=node3_3_1, right=node3_3_2)\n", "tree3 = Builder.DecisionTree(4, node3_4)\n", "\n", "BT = Builder.BoostedTrees([tree1, tree2, tree3], n_classes=2)" ] }, { "cell_type": "markdown", "id": "45cce4b2", "metadata": {}, "source": [ "We compute a tree-specific explanation for this instance: " ] }, { "cell_type": "code", "execution_count": 4, "id": "f35d03a3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "target_prediction: 1\n", "tree specific: (1, 2)\n", "minimal: (1, 2)\n" ] } ], "source": [ "explainer = Explaining.initialize(BT)\n", "explainer.set_instance((4,3,2,1))\n", "\n", "tree_specific = explainer.tree_specific_reason()\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"tree specific:\", tree_specific)\n", "assert explainer.is_tree_specific_reason(tree_specific)\n", "\n", "minimal = explainer.minimal_tree_specific_reasons(n=1)\n", "print(\"minimal:\", minimal)\n" ] }, { "cell_type": "markdown", "id": "e92e4ae4", "metadata": {}, "source": [ "## Example from a Real Dataset" ] }, { "cell_type": "markdown", "id": "d58684a0", "metadata": {}, "source": [ "For this example, we take the compas.csv dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance. " ] }, { "cell_type": "code", "execution_count": 5, "id": "ab7788c9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: classification\n", "Instances type: tabular\n", "Labels type: classes\n", "\n", "Dataset path: ../../../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 11\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Model creation, fitting and evaluation ---------------\n", "Splitting method: hold-out\n", "Problem type: classification\n", "Models type: boosted-tree\n", "model_parameters: {}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "Metrics:\n", " sklearn_confusion_matrix: [[634, 196], [287, 426]]\n", " precision: 68.48874598070739\n", " recall: 59.74754558204769\n", " f1_score: 63.820224719101134\n", " specificity: 76.3855421686747\n", " true_positive: 426\n", " true_negative: 634\n", " false_positive: 196\n", " false_negative: 287\n", " accuracy: 68.69734283862606\n", "Number of Training instances: 4629\n", "Number of Testing instances: 1543\n", "\n", "--------------- Explainer ----------------\n", "For the split number 0:\n", "**Boosted Tree model**\n", "NClasses: 2\n", "nTrees: 100\n", "nVariables: 38\n", "\n", "--------------- Instances ----------------\n", "Number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explaining\n", "\n", "learner = Learning.Xgboost(\"../../../dataset/compas.csv\", problem_type='classification')\n", "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.BT)\n", "instance, prediction = learner.get_instances(model, n=1, is_correct=True)" ] }, { "cell_type": "markdown", "id": "a20938fa", "metadata": {}, "source": [ "We compute a tree-specific explanation and a minimal one.\n", "Since finding a minimal one explanation is time consuming, we set a time limit. In the case of the time limit is reached, we may obtain only an approximation of a minimal explanation." ] }, { "cell_type": "code", "execution_count": 7, "id": "1ccce2fc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: Misdemeanor 0\n", "Number_of_Priors 0\n", "score_factor 0\n", "Age_Above_FourtyFive 1\n", "Age_Below_TwentyFive 0\n", "African_American 0\n", "Asian 0\n", "Hispanic 0\n", "Native_American 0\n", "Other 1\n", "Female 0\n", "Name: 0, dtype: int64\n", "prediction: 0\n", "\n", "\n", "tree specific reason: (-1, -2, -3, -4, 5, 6, -7, -8, -9, -10, -11, -13, -14, -15, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -33, -37)\n", "len tree specific reason: 28\n", "to features ['Number_of_Priors < 1.0', 'score_factor < 1.0', 'Age_Above_FourtyFive >= 1.0', 'Age_Below_TwentyFive < 1.0', 'African_American < 1.0', 'Asian < 1.0', 'Hispanic < 1.0', 'Other >= 1.0', 'Female < 1.0']\n", "is tree specific reason: True\n", "\n", "\n", "minimal: ((-1, -2, -3, -4, 5, -7, -8, -10, -11, -13, -15, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -29, -33, -37, -38), (-1, -2, -3, -4, 5, -7, -8, -10, -11, -13, -15, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -29, -33, -34, -37), (-1, -2, -3, -4, 5, -7, -8, -10, -11, -13, -14, -15, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -29, -33, -37))\n", "minimal: 3\n", "to features ['Number_of_Priors < 1.0', 'score_factor < 1.0', 'Age_Above_FourtyFive >= 1.0', 'Age_Below_TwentyFive < 1.0', 'African_American < 1.0', 'Asian < 1.0', 'Hispanic < 1.0', 'Other >= 1.0', 'Female < 1.0']\n", "is majoritary reason: True\n" ] } ], "source": [ "explainer = Explaining.initialize(model, instance)\n", "print(\"instance:\", instance)\n", "print(\"prediction:\", prediction)\n", "print()\n", "tree_specific_reason = explainer.tree_specific_reason()\n", "#for s in sufficient_reasons:\n", "print(\"\\ntree specific reason:\", tree_specific_reason)\n", "print(\"len tree specific reason:\", len(tree_specific_reason))\n", "print(\"to features\", explainer.to_features(tree_specific_reason))\n", "print(\"is tree specific reason: \", explainer.is_tree_specific_reason(tree_specific_reason))\n", "print()\n", "minimal = explainer.minimal_tree_specific_reasons(time_limit=5)\n", "print(\"\\nminimal:\", minimal)\n", "print(\"minimal:\", len(minimal))\n", "print(\"to features\", explainer.to_features(tree_specific_reason))\n", "print(\"is majoritary reason: \", explainer.is_tree_specific_reason(tree_specific_reason))\n", "if explainer.elapsed_time == Explaining.TIMEOUT: \n", " print(\"time out... It is an approximation\")\n" ] }, { "cell_type": "markdown", "id": "7b9c2c4d", "metadata": {}, "source": [ "Other types of explanations are presented in the [Explanations Computation](/documentation/explanations/BTexplanations/) page.\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }