{ "cells": [ { "cell_type": "markdown", "id": "91e36913", "metadata": {}, "source": [ "## Tree-Specific Explanations" ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": [ "Let $BT$ be a Boosted Tree composed of {$T_1,\\ldots T_n$} regression trees and $x$ be an instance.\n", "* A **worst instance** extending $t$ given $F$ is an instance $x'$ such that $t \\subseteq t_{x'}$ and: $$x' = \\mathit{argmin}_{x'': t \\subseteq t_{x''}}(\\{w(F, x'')\\})$$\n", "* A **best instance** extending $t$ given $F$ is an instance $x'$ such that $t \\subseteq t_{x'}$ and: $$x' = \\mathit{argmax}_{x'': t \\subseteq t_{x''}}(\\{w(F, x'')\\})$$\n", "\n", "$W(t, F)$ (resp. $B(t, F)$) denotes the set of worst (resp. best) instances covered by $t$ given $F$, and $w_\\downarrow(t, F)$ (resp. $w_\\uparrow(t, F)$) denotes the weight of the worst (resp. best) instance covered by $t$ given $F$. For example, for the sufficient reason $t$ = ($A_1 > 2$, $A_4 = 1$) for the instance $x$ = ($A_1=4$, $A_2 = 3$, $A_3 = 1$, $A_4 = 1$) of the [sufficient reason](/documentation/classification/BTexplanations/sufficient/) page, a worst instance can be $x'$ = ($A_1=4$, $A_2 = 1$, $A_3 = 1$, $A_4 = 1$) with $w(F, x') = 0.3$ . Identifying a worst (resp. best) instance $x$\n", "for given a boosted tree is intractable. Interestingly, when the classifier consists of a regression tree,\n", "identifying an element of $W(t, T)$ (resp. $B(t, T)$) for a tree $T$ is easy. " ] }, { "cell_type": "markdown", "id": "3fb21492", "metadata": {}, "source": [ "We are now ready to define the notion of **tree-specific** explanation $t$ for an instance $x$ given a boosted tree $BT$. 
We start with the binary case, i.e., when $BT$ consists of a single forest $F$:\n", "\n", "* If $BT(x) = 1$, then $t$ is a **tree-specific** explanation for $x$ given $F$ if and only if $t$ is a subset of $t_{x}$ such that $\\sum_{k=1}^p w_\\downarrow(t, T_k) > 0$ and no proper subset of $t$ satisfies the latter condition.\n", "* If $BT(x) = 0$, then $t$ is a **tree-specific** explanation for $x$ given $F$ if and only if $t$ is a subset of $t_{x}$ such that $\\sum_{k=1}^p w_\\uparrow(t, T_k) \\leq 0$ and no proper subset of $t$ satisfies the latter condition.\n" ] }, { "cell_type": "markdown", "id": "b780cae3", "metadata": {}, "source": [ "More generally, in the multi-class setting, TS-explanations can be defined as follows:\n", "\n", "Let $BT = \\{F^1,\\cdots, F^{m}\\}$ be a boosted tree over $\\{A_1, \\ldots, A_n\\}$ where each $F^j$ ($j \\in [m]$) contains $p_j$ trees, and $x$ such that $BT(x) = i$. Conceptually, $t$ is a **tree-specific** explanation for $x$ given $BT$ if and only if $t$ is a subset of $t_{x}$ such that for every $j \\in [m] \\setminus \\{i\\}$, we have $\\sum_{k=1}^{p_i} w_\\downarrow(t, T_k^i) > \\sum_{k=1}^{p_j} w_\\uparrow(t, T_k^j)$, and no proper subset of $t$ satisfies the latter condition.\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "701d48f7", "metadata": {}, "source": [ "In general, the notions of tree-specific explanation and of sufficient reason do not coincide. Indeed, a sufficient reason is a prime implicant (covering $x$) of the Boosted Tree $BT$, while a tree-specific explanation is an implicant $t$ (covering $x$). 
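When the classifier consists of a single regression tree $T$, $w_\\downarrow(t, T)$ (resp. $w_\\uparrow(t, T)$) can be obtained by one recursive traversal: follow the branch imposed by $t$ whenever the node condition is fixed by $t$, and otherwise take the minimum (resp. maximum) over both subtrees. The sketch below illustrates this idea in plain Python; the tuple-based tree encoding and the helper names (`w_down`, `w_up`) are ours, not PyXAI's internal representation.

```python
# Toy encoding of a regression tree: a leaf is a float weight, an internal
# node is (feature, threshold, left, right) with the convention
# "go right when the feature value is > threshold".

def w_down(tree, term):
    """Worst (minimal) leaf weight reachable under the partial term.

    `term` maps a feature to its fixed value; features absent from the
    term are unconstrained, so both branches must be explored."""
    if not isinstance(tree, tuple):  # leaf: return its weight
        return tree
    feature, threshold, left, right = tree
    if feature in term:  # condition fixed by the term: follow that branch
        branch = right if term[feature] > threshold else left
        return w_down(branch, term)
    # feature free: the worst case is the smaller of the two subtrees
    return min(w_down(left, term), w_down(right, term))

def w_up(tree, term):
    """Best (maximal) leaf weight reachable under the partial term."""
    if not isinstance(tree, tuple):
        return tree
    feature, threshold, left, right = tree
    if feature in term:
        branch = right if term[feature] > threshold else left
        return w_up(branch, term)
    return max(w_up(left, term), w_up(right, term))

# Example: a small tree over features 1 and 2.
tree = (1, 2, (2, 1, -0.2, 0.1), (2, 1, 0.3, 0.5))
print(w_down(tree, {1: 4}))  # feature 1 fixed to 4 (> 2) -> 0.3
print(w_up(tree, {}))        # nothing fixed -> 0.5
```

Each node is visited at most once, so the traversal runs in time linear in the size of the tree.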
Since there is a simple, linear-time algorithm for computing $w_\\downarrow(t, F)$ and $w_\\uparrow(t, F)$ and for deriving a worst-case and a best-case instance, the **tree-specific** explanations for $x$ are much easier to compute than the sufficient reasons and remain abductive.\n", "\n", "More information about tree-specific explanations can be found in the paper [Computing Abductive Explanations\n", "for Boosted Trees](https://proceedings.mlr.press/v206/audemard23a/audemard23a.pdf)." ] }, { "cell_type": "markdown", "id": "e75c9f20", "metadata": {}, "source": [ "| <ExplainerBT Object>.tree_specific_reason(*, n_iterations=50, time_limit=None, seed=0): | \n", "| :----------- | \n", "|This method calls a greedy algorithm to compute a tree-specific explanation. The algorithm is run n_iterations times and the smallest tree specific reason that has been computed is returned.

Excluded features are supported. The reasons are given in terms of binary variables; use the ```to_features``` method to obtain a representation based on the features considered at the start.|\n", "| time_limit ```Integer``` ```None```: The time limit of the method in seconds. Set this to ```None``` to give this process an infinite amount of time. Default value is ```None```.|\n", "| n_iterations ```Integer```: The number of iterations done by the greedy algorithm. Default value is 50.|\n", "| seed ```Integer```: The seed used by the greedy algorithm. Set this parameter to 0 to use a random seed. Default value is 0.|\n" ] }, { "cell_type": "markdown", "id": "714c0d17", "metadata": {}, "source": [ "A natural way of improving the quality of tree-specific explanations is to seek the most parsimonious ones. A **minimal tree-specific explanation** for $x$ is a tree-specific explanation for $x$ that\n", "contains a minimal number of literals. In other words, a **minimal tree-specific explanation** is one of minimal size. " ] }, { "cell_type": "markdown", "id": "48840e70", "metadata": {}, "source": [ "| <ExplainerBT Object>.minimal_tree_specific_reason(*, n=1, time_limit=None): | \n", "| :----------- | \n", "|This method considers a constraint optimisation problem (COP) representing the boosted tree, encoded using [PyCSP](https://pycsp.org). To solve it, several calls to a CP solver ([ACE](https://github.com/xcsp3team/ace)) are performed and the result of each call is a minimal tree-specific explanation. The method prevents the same explanation from being found twice by adding constraints between invocations.

Returns ```n``` minimal tree-specific explanations of the current instance in a tuple (when ```n``` is set to 1, the reason itself is returned instead of a tuple). Excluded features are supported. The reasons are given in terms of binary variables; use the ```to_features``` method to obtain a representation based on the features considered at the start.|\n", "| n ```Integer```: The number of minimal tree-specific reasons computed. Default value is 1.|\n", "| time_limit ```Integer``` ```None```: The time limit of the method in seconds. Set this to ```None``` to give this process an infinite amount of time. Default value is ```None```.|" ] }, { "cell_type": "markdown", "id": "4fb5f191", "metadata": {}, "source": [ "The PyXAI library also provides a way to test that an explanation is actually a tree-specific explanation:" ] }, { "cell_type": "markdown", "id": "779a1fda", "metadata": {}, "source": [ "One can also compute preferred tree-specific explanations. Indeed, the user may prefer explanations containing some features and can provide weights in order to specify preferences. Please take a look at the [Preferences](/documentation/explainer/preferences/) page for more information on preference handling." ] }, { "cell_type": "markdown", "id": "c4bf853b", "metadata": {}, "source": [ "| <ExplainerBT Object>.prefered_tree_specific_reason(*, method, n=1, time_limit=None, weights=None): | \n", "| :----------- | \n", "|This method considers a constraint optimisation problem (COP) representing the boosted tree, encoded using [PyCSP](https://pycsp.org). To solve it, several calls to a CP solver ([ACE](https://github.com/xcsp3team/ace)) are performed and the result of each call depends on the ```method``` used. If the method is ```PreferredReasonMethod.WEIGHTS``` then weights are given by the parameter ```weights```, otherwise this parameter is ignored. 
If the method is ```PreferredReasonMethod.INCLUSION_PREFERRED```, the partition of features is given by the parameter ```features_partition```; otherwise, this parameter is ignored.

Returns ```n``` preferred tree-specific explanations for the current instance in a tuple (when ```n``` is set to 1, the reason itself is returned instead of a tuple). Excluded features are supported. The reasons are given in terms of binary variables; use the ```to_features``` method to obtain a representation based on the features considered at the start.|\n", "| method ```PreferredReasonMethod.WEIGHTS``` ```PreferredReasonMethod.SHAPLEY``` ```PreferredReasonMethod.FEATURE_IMPORTANCE``` ```PreferredReasonMethod.WORD_FREQUENCY```: The method used to discriminate features.|\n", "| time_limit ```Integer``` ```None```: The time limit of the method in seconds. Set this to ```None``` to give this process an infinite amount of time. Default value is ```None```.|\n", "| n ```Integer```: The number of tree-specific reasons computed. Currently, only ```n=1``` or ```n=Explainer.ALL``` is supported. Default value is 1.|\n", "| weights ```List```: The weights (list of floats, one per feature) used to discriminate features. Only useful when ```method``` is ```PreferredReasonMethod.WEIGHTS```. Default value is ```None```.|" ] }, { "cell_type": "markdown", "id": "3a17fe2f", "metadata": {}, "source": [ "| <ExplainerBT Object>.is_tree_specific_reason(reason): | \n", "| :----------- | \n", "| This method checks whether an explanation is a tree-specific one. To do so, we first call the method ```is_implicant``` to check whether this explanation leads to the correct prediction. |\n", "| reason ```List``` of ```Integer```: The explanation to be checked.|" ] }, { "cell_type": "markdown", "id": "d487e10f", "metadata": {}, "source": [ "## Example from Hand-Crafted Trees" ] }, { "cell_type": "markdown", "id": "8b37b50b", "metadata": {}, "source": [ "For this example, we take the Boosted Tree of the [Building Models](/documentation/learning/builder/BTbuilder/) page. 
\n", "\n", "The next figure represents a Boosted Tree using $4$ features ($A_1$, $A_2$, $A_3$ and $A_4$), where $A_1$ and $A_2$ are numerical, $A_3$ is categorical and $A_4$ is a Boolean. In this figure, we show worst instances and the corresponding weights (in dark red) of the sufficient reason $t$ = ($A_1 = 4$, $A_4 = 1$) for the instance $x$ = ($A_1 = 4$, $A_2 = 3$, $A_3 = 1$, $A_4 = 1$).\n", "\n", "\"BTts1\"\n", "\n", "This sufficient reason $t$ = ($A_1 = 4$, $A_4 = 1$) is not a tree-specific explanation for $x$ given $BT$. Indeed, we have $w_\\downarrow(t, T_1) = -0.3$, $w_\\downarrow(t, T_2) = 0.3$, and $w_\\downarrow(t, T_3) = -0.4$, hence \n", "$w_\\downarrow(t, T_1) + w_\\downarrow(t, T_2) + w_\\downarrow(t, T_3) = -0.4 < 0$ while $w(F, x) = 0.9 > 0$.\n" ] }, { "cell_type": "markdown", "id": "ca3b6756", "metadata": {}, "source": [ "In the next figure, in this figure, $t' = (A_2 > 1, A_4 = 1)$ is a tree-specific explanation for $x$ given $BT$.\n", "\n", "\"BTts2\"\n", "\n", "We have $w_\\downarrow(t', T_1) = -0.3$, $w_\\downarrow(t', T_2) = 0.5$, and $w_\\downarrow(t', T_3) = 0.1$, hence \n", "$w_\\downarrow(t', T_1) + w_\\downarrow(t', T_2) + w_\\downarrow(t', T_3) = 0.3 > 0$.\n", "It can be verified that $t'$ also is a sufficient reason for $x$ given $BT$." ] }, { "cell_type": "markdown", "id": "6b4b8844", "metadata": {}, "source": [ "We show now how to get them with PyXAI. 
We start by building the Boosted Tree:" ] }, { "cell_type": "code", "execution_count": 1, "id": "db173b61", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explainer\n", "\n", "node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.3)\n", "node1_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=-0.3, right=node1_1)\n", "node1_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=0.4, right=node1_2)\n", "node1_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=node1_3)\n", "tree1 = Builder.DecisionTree(4, node1_4)\n", "\n", "node2_1 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.4, right=0.3)\n", "node2_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=node2_1)\n", "node2_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node2_2, right=0.5)\n", "tree2 = Builder.DecisionTree(4, node2_3)\n", "\n", "node3_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=0.2, right=0.3)\n", "node3_2_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.2)\n", "node3_2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.1, right=node3_1)\n", "node3_2_3 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=0.1)\n", "node3_3_1 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node3_2_1, right=node3_2_2)\n", "node3_3_2 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=-0.4, right=node3_2_3)\n", "node3_4 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=node3_3_1, right=node3_3_2)\n", "tree3 = Builder.DecisionTree(4, node3_4)\n", "\n", "BT = Builder.BoostedTrees([tree1, tree2, tree3], n_classes=2)" ] }, { "cell_type": "markdown", "id": "45cce4b2", "metadata": {}, "source": [ "We compute a tree-specific explanation for this instance: " ] }, { "cell_type": "code", "execution_count": 2, 
"id": "f35d03a3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "target_prediction: 1\n", "tree specific: (1, 2)\n", "minimal: (1, 2)\n" ] } ], "source": [ "explainer = Explainer.initialize(BT)\n", "explainer.set_instance((4,3,2,1))\n", "\n", "tree_specific = explainer.tree_specific_reason()\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"tree specific:\", tree_specific)\n", "assert explainer.is_tree_specific_reason(tree_specific)\n", "\n", "minimal = explainer.minimal_tree_specific_reason()\n", "print(\"minimal:\", minimal)\n", "assert minimal == (1, 2), \"The minimal reason is not good !\"\n" ] }, { "cell_type": "markdown", "id": "e92e4ae4", "metadata": {}, "source": [ "## Example from a Real Dataset" ] }, { "cell_type": "markdown", "id": "d58684a0", "metadata": {}, "source": [ "For this example, we take the compas.csv dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance. " ] }, { "cell_type": "code", "execution_count": 3, "id": "ab7788c9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " Number_of_Priors score_factor Age_Above_FourtyFive \n", "0 0 0 1 \\\n", "1 0 0 0 \n", "2 4 0 0 \n", "3 0 0 0 \n", "4 14 1 0 \n", "... ... ... ... \n", "6167 0 1 0 \n", "6168 0 0 0 \n", "6169 0 0 1 \n", "6170 3 0 0 \n", "6171 2 0 0 \n", "\n", " Age_Below_TwentyFive African_American Asian Hispanic \n", "0 0 0 0 0 \\\n", "1 0 1 0 0 \n", "2 1 1 0 0 \n", "3 0 0 0 0 \n", "4 0 0 0 0 \n", "... ... ... ... ... \n", "6167 1 1 0 0 \n", "6168 1 1 0 0 \n", "6169 0 0 0 0 \n", "6170 0 1 0 0 \n", "6171 1 0 0 1 \n", "\n", " Native_American Other Female Misdemeanor Two_yr_Recidivism \n", "0 0 1 0 0 0 \n", "1 0 0 0 0 1 \n", "2 0 0 0 0 1 \n", "3 0 1 0 1 0 \n", "4 0 0 0 0 1 \n", "... ... ... ... ... ... 
\n", "6167 0 0 0 0 0 \n", "6168 0 0 0 0 0 \n", "6169 0 1 0 0 0 \n", "6170 0 0 1 1 0 \n", "6171 0 0 1 0 1 \n", "\n", "[6172 rows x 12 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../../../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Evaluation ---------------\n", "method: HoldOut\n", "output: BT\n", "learner_type: Classification\n", "learner_options: {'seed': 0, 'max_depth': None, 'eval_metric': 'mlogloss'}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 66.73866090712744\n", "nTraining instances: 4320\n", "nTest instances: 1852\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Boosted Tree model**\n", "NClasses: 2\n", "nTrees: 100\n", "nVariables: 42\n", "\n", "--------------- Instances ----------------\n", "number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explainer\n", "\n", "learner = Learning.Xgboost(\"../../../dataset/compas.csv\", learner_type=Learning.CLASSIFICATION)\n", "model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.BT)\n", "instance, prediction = learner.get_instances(model, n=1, correct=True)" ] }, { "cell_type": "markdown", "id": "a20938fa", "metadata": {}, "source": [ "We compute a tree-specific explanation and a minimal one.\n", "Since finding a minimal one explanation is time consuming, we set a time limit. In the case of the time limit is reached, we may obtain only an approximation of a minimal explanation." 
] }, { "cell_type": "code", "execution_count": 4, "id": "1ccce2fc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: [0 0 1 0 0 0 0 0 1 0 0]\n", "prediction: 0\n", "\n", "\n", "tree specific reason: (-1, -2, -3, -4, 5, -6, -7, -8, -9, -10, -11, -14, 15, -19, -20, -21, -25, -26, -29, -32, -34, -37, -38, -40)\n", "len tree specific reason: 24\n", "to features ('Number_of_Priors < 0.5', 'score_factor < 0.5', 'Age_Above_FourtyFive >= 0.5', 'Age_Below_TwentyFive < 0.5', 'African_American < 0.5', 'Asian < 0.5', 'Hispanic < 0.5', 'Native_American < 0.5', 'Other >= 0.5', 'Female < 0.5', 'Misdemeanor < 0.5')\n", "is tree specific reason: True\n", "\n", "\n", "minimal: (-1, -2, -3, -4, 5, -6, -7, -8, -9, -10, -11, 15, -16, -18, -19, -21, -28, -29, -32, -33, -34, -37)\n", "minimal: 22\n", "to features ('Number_of_Priors < 0.5', 'score_factor < 0.5', 'Age_Above_FourtyFive >= 0.5', 'Age_Below_TwentyFive < 0.5', 'African_American < 0.5', 'Asian < 0.5', 'Hispanic < 0.5', 'Native_American < 0.5', 'Other >= 0.5', 'Female < 0.5', 'Misdemeanor < 0.5')\n", "is majoritary reason: True\n", "time out... 
It is an approximation\n" ] } ], "source": [ "explainer = Explainer.initialize(model, instance)\n", "print(\"instance:\", instance)\n", "print(\"prediction:\", prediction)\n", "print()\n", "tree_specific_reason = explainer.tree_specific_reason()\n", "#for s in sufficient_reasons:\n", "print(\"\\ntree specific reason:\", tree_specific_reason)\n", "print(\"len tree specific reason:\", len(tree_specific_reason))\n", "print(\"to features\", explainer.to_features(tree_specific_reason))\n", "print(\"is tree specific reason: \", explainer.is_tree_specific_reason(tree_specific_reason))\n", "print()\n", "minimal = explainer.minimal_tree_specific_reason(time_limit=5)\n", "print(\"\\nminimal:\", minimal)\n", "print(\"minimal:\", len(minimal))\n", "print(\"to features\", explainer.to_features(tree_specific_reason))\n", "print(\"is majoritary reason: \", explainer.is_tree_specific_reason(tree_specific_reason))\n", "if explainer.elapsed_time == Explainer.TIMEOUT: \n", " print(\"time out... It is an approximation\")\n" ] }, { "cell_type": "markdown", "id": "7b9c2c4d", "metadata": {}, "source": [ "Other types of explanations are presented in the [Explanations Computation](/documentation/explanations/BTexplanations/) page." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }