{ "cells": [ { "cell_type": "markdown", "id": "b1db8c9a", "metadata": {}, "source": [ "# Direct Reason" ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": [ "Let $BT$ be a boosted tree composed of {$T_1,\\ldots T_n$} regression trees and $x$ an instance, the **direct reason** for $x$ is the term of the binary representation corresponding to the conjunction of the terms corresponding to the root-to-leaf paths of all $T_i$ that are compatible with $x$. Due to its simplicity, it is one of the easiest abductive explanation for $x$ to compute, but it can be highly redundant. More information about the direct reason can be found in the article [Computing Abductive Explanations for Boosted Trees](https://arxiv.org/abs/2209.07740)." ] }, { "cell_type": "markdown", "id": "e4432d14", "metadata": {}, "source": [ "The basic methods ([``initialize``](/documentation/api/modules/explaining/), ```set_instance```, ```to_features```, ```is_reason```, ...) of the ```Explaining``` module used in the next examples are described in the [Explainer Principles](/documentation/explainer/) page. " ] }, { "cell_type": "markdown", "id": "d70f9a73", "metadata": {}, "source": [ "## Example from Hand-Crafted Trees" ] }, { "cell_type": "markdown", "id": "3df1afd3", "metadata": {}, "source": [ "For this example, we take an example of binary classification from the [Building Models](/documentation/learning/builder/BTbuilder/) page. This figure represents a boosted tree $BT$ using $4$ features ($A_1$, $A_2$, $A_3$ and $A_4$), where $A_1$ and $A_2$ are numerical, $A_3$ is categorical and $A_4$ is Boolean. The direct reason for the instance $x$ = ($A_1=4$, $A_2 = 3$, $A_3 = 1$, $A_4 = 1$) is in red. This reason contains all features of the instance. \n", "\n", "\"BTdirect\"\n", "\n", "We have $w(T_1, x)=0.3$, $w(T_2, x)=0.5$ and $w(T_3, x)=0.1$. So $W(F, x) = 0.9$. 
As we are in the case of binary classification and $W(F, x) > 0$, $x$ is classified as a positive instance ($BT(x) = 1$).\n", "\n", "{: .attention }\n", "> We consider that the features $A_3$ and $A_4$ are numerical. Categorical and Boolean features will be implemented in future versions of PyXAI. \n", "\n", "We now show how to get direct reasons using PyXAI: " ] }, { "cell_type": "code", "execution_count": 1, "id": "411398a5", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explaining\n", "\n", "node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.3)\n", "node1_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=-0.3, right=node1_1)\n", "node1_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=0.4, right=node1_2)\n", "node1_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=node1_3)\n", "tree1 = Builder.DecisionTree(4, node1_4)\n", "\n", "node2_1 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.4, right=0.3)\n", "node2_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=node2_1)\n", "node2_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node2_2, right=0.5)\n", "tree2 = Builder.DecisionTree(4, node2_3)\n", "\n", "node3_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=0.2, right=0.3)\n", "node3_2_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.2)\n", "node3_2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.1, right=node3_1)\n", "node3_2_3 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=0.1)\n", "node3_3_1 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node3_2_1, right=node3_2_2)\n", "node3_3_2 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=-0.4, right=node3_2_3)\n", "node3_4 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=node3_3_1, 
right=node3_3_2)\n", "tree3 = Builder.DecisionTree(4, node3_4)\n", "\n", "BT = Builder.BoostedTrees([tree1, tree2, tree3], n_classes=2)" ] }, { "cell_type": "markdown", "id": "a2263e89", "metadata": {}, "source": [ "We compute the direct reason for this instance: " ] }, { "cell_type": "code", "execution_count": 2, "id": "935837ea", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: (4,3,1,1)\n", "binary_representation: (1, 2, 3, 4)\n", "target_prediction: 1\n", "direct: (1, 2, 3, 4)\n", "to_features: ['f1 > 2', 'f2 > 1', 'f3 == 1', 'f4 == 1']\n" ] } ], "source": [ "explainer = Explaining.initialize(BT)\n", "explainer.set_instance((4,3,1,1))\n", "direct = explainer.direct_reason()\n", "print(\"instance: (4,3,1,1)\")\n", "print(\"binary_representation:\", explainer.binary_representation)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"direct:\", direct)\n", "print(\"to_features:\", explainer.to_features(direct))\n" ] }, { "cell_type": "markdown", "id": "fc75b1a7", "metadata": {}, "source": [ "As you can see, in this case, the direct reason coincides with the full instance." ] }, { "cell_type": "markdown", "id": "4061b821", "metadata": {}, "source": [ "## Example from a Real Dataset" ] }, { "cell_type": "markdown", "id": "7c187df9", "metadata": {}, "source": [ "For this example, we use the ```compas.csv``` dataset. We create one model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance. 
" ] }, { "cell_type": "code", "execution_count": 3, "id": "e3fe96a1", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: classification\n", "Instances type: tabular\n", "Labels type: classes\n", "\n", "Dataset path: ../../../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 11\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Model creation, fitting and evaluation ---------------\n", "Splitting method: hold-out\n", "Problem type: classification\n", "Models type: boosted-tree\n", "model_parameters: {}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "Metrics:\n", " sklearn_confusion_matrix: [[653, 206], [295, 389]]\n", " precision: 65.3781512605042\n", " recall: 56.87134502923976\n", " f1_score: 60.82877247849882\n", " specificity: 76.0186263096624\n", " true_positive: 389\n", " true_negative: 653\n", " false_positive: 206\n", " false_negative: 295\n", " accuracy: 67.53078418664938\n", "Number of Training instances: 4629\n", "Number of Testing instances: 1543\n", "\n", "--------------- Explainer ----------------\n", "For the split number 0:\n", "**Boosted Tree model**\n", "NClasses: 2\n", "nTrees: 100\n", "nVariables: 38\n", "\n", "--------------- Instances ----------------\n", "Number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explaining\n", "\n", "learner = Learning.Xgboost(\"../../../dataset/compas.csv\", problem_type='classification')\n", "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.BT)\n", "instance, prediction = learner.get_instances(model, n=1, is_correct=True)" ] }, { "cell_type": "markdown", "id": "bcd926f6", "metadata": {}, "source": [ "Finally, we display the direct reason for this instance: " ] }, { "cell_type": "code", "execution_count": 4, 
"id": "f5ba8cd7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: Misdemeanor 0\n", "Number_of_Priors 0\n", "score_factor 0\n", "Age_Above_FourtyFive 1\n", "Age_Below_TwentyFive 0\n", "African_American 0\n", "Asian 0\n", "Hispanic 0\n", "Native_American 0\n", "Other 1\n", "Female 0\n", "Name: 0, dtype: int64\n", "prediction: 0\n", "\n", "len binary representation: 38\n", "len direct: 36\n", "is_reason: True\n", "to_features: ['Misdemeanor < 1.0', 'Number_of_Priors < 1.0', 'score_factor < 1.0', 'Age_Above_FourtyFive >= 1.0', 'Age_Below_TwentyFive < 1.0', 'African_American < 1.0', 'Asian < 1.0', 'Hispanic < 1.0', 'Native_American < 1.0', 'Other >= 1.0', 'Female < 1.0']\n" ] } ], "source": [ "explainer = Explaining.initialize(model, instance)\n", "print(\"instance:\", instance)\n", "print(\"prediction:\", prediction)\n", "print()\n", "direct_reason = explainer.direct_reason()\n", "print(\"len binary representation:\", len(explainer.binary_representation))\n", "print(\"len direct:\", len(direct_reason))\n", "print(\"is_reason:\", explainer.is_implicant(direct_reason))\n", "print(\"to_features:\", explainer.to_features(direct_reason))" ] }, { "cell_type": "markdown", "id": "e0f91d31-c8be-4cc9-bf37-a0977df39908", "metadata": {}, "source": [ "We can remark that this direct reason contains 38 binary variables of the implicant out of 42. This reason explains why the model predicts for this instance. But this is probably not the most compact reason for this instance, we invite you to look at the other types of reasons presented on the [Explanations Computation](/documentation/explanations/RFexplanations/) page." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }