{ "cells": [ { "cell_type": "markdown", "id": "b1db8c9a", "metadata": {}, "source": [ "# Direct Reason" ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": [ "Let $BT$ be a boosted tree composed of the regression trees {$T_1,\\ldots, T_n$} and let $x$ be an instance. The **direct reason** for $x$ is the term of the binary representation obtained as the conjunction of the terms corresponding to the root-to-leaf paths of all $T_i$ that are compatible with $x$. Due to its simplicity, it is one of the easiest abductive explanations for $x$ to compute, but it can be highly redundant. More information about the direct reason can be found in the article [Computing Abductive Explanations for Boosted Trees](https://arxiv.org/abs/2209.07740)." ] }, { "cell_type": "markdown", "id": "d88fcc64", "metadata": {}, "source": [ "| <Explainer Object>.direct_reason(): | \n", "| :----------- | \n", "| Returns the direct reason for the current instance. Returns ```None``` if this reason contains some excluded features. All kinds of operators in the conditions are supported. This reason is expressed in terms of binary variables; use the ```to_features``` method to obtain a representation based on the original features. |" ] }, { "cell_type": "markdown", "id": "e4432d14", "metadata": {}, "source": [ "The basic methods (```initialize```, ```set_instance```, ```to_features```, ```is_reason```, ...) of the ```explainer``` module used in the next examples are described on the [Explainer Principles](/documentation/explainer/) page. " ] }, { "cell_type": "markdown", "id": "d70f9a73", "metadata": {}, "source": [ "## Example from Hand-Crafted Trees" ] }, { "cell_type": "markdown", "id": "3df1afd3", "metadata": {}, "source": [ "This example uses a binary classification model from the [Building Models](/documentation/learning/builder/BTbuilder/) page. 
This figure represents a boosted tree $BT$ using $4$ features ($A_1$, $A_2$, $A_3$ and $A_4$), where $A_1$ and $A_2$ are numerical, $A_3$ is categorical and $A_4$ is Boolean. The direct reason for the instance $x$ = ($A_1=4$, $A_2 = 3$, $A_3 = 1$, $A_4 = 1$) is shown in red. This reason contains all features of the instance. \n", "\n", "\"BTdirect\"\n", "\n", "We have $w(T_1, x)=0.3$, $w(T_2, x)=0.5$ and $w(T_3, x)=0.1$, so $W(F, x) = 0.3 + 0.5 + 0.1 = 0.9$. As we are in the case of binary classification and $W(F, x) > 0$, $x$ is classified as a positive instance ($BT(x) = 1$).\n", "\n", "{: .attention }\n", "> In the code below, the features $A_3$ and $A_4$ are treated as numerical. Categorical and Boolean features will be implemented in future versions of PyXAI. \n", "\n", "We now show how to get direct reasons using PyXAI: " ] }, { "cell_type": "code", "execution_count": 1, "id": "411398a5", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explainer\n", "\n", "# In these trees, the right child is followed when the node condition holds, the left child otherwise.\n", "# Numeric children are leaf weights.\n", "# Tree T1\n", "node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.3)\n", "node1_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=-0.3, right=node1_1)\n", "node1_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=0.4, right=node1_2)\n", "node1_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=node1_3)\n", "tree1 = Builder.DecisionTree(4, node1_4)\n", "\n", "# Tree T2\n", "node2_1 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.4, right=0.3)\n", "node2_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=node2_1)\n", "node2_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node2_2, right=0.5)\n", "tree2 = Builder.DecisionTree(4, node2_3)\n", "\n", "# Tree T3\n", "node3_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=0.2, right=0.3)\n", "node3_2_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.2)\n", "node3_2_2 = Builder.DecisionNode(4, operator=Builder.EQ, 
threshold=1, left=-0.1, right=node3_1)\n", "node3_2_3 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=0.1)\n", "node3_3_1 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node3_2_1, right=node3_2_2)\n", "node3_3_2 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=-0.4, right=node3_2_3)\n", "node3_4 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=node3_3_1, right=node3_3_2)\n", "tree3 = Builder.DecisionTree(4, node3_4)\n", "\n", "BT = Builder.BoostedTrees([tree1, tree2, tree3], n_classes=2)" ] }, { "cell_type": "markdown", "id": "a2263e89", "metadata": {}, "source": [ "We compute the direct reason for this instance: " ] }, { "cell_type": "code", "execution_count": 2, "id": "935837ea", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: (4,3,1,1)\n", "binary_representation: (1, 2, 3, 4)\n", "target_prediction: 1\n", "direct: (1, 2, 3, 4)\n", "to_features: ('f1 > 2', 'f2 > 1', 'f3 == 1', 'f4 == 1')\n" ] } ], "source": [ "explainer = Explainer.initialize(BT)\n", "explainer.set_instance((4,3,1,1))\n", "direct = explainer.direct_reason()\n", "print(\"instance: (4,3,1,1)\")\n", "print(\"binary_representation:\", explainer.binary_representation)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"direct:\", direct)\n", "print(\"to_features:\", explainer.to_features(direct))\n" ] }, { "cell_type": "markdown", "id": "fc75b1a7", "metadata": {}, "source": [ "As you can see, in this case, the direct reason coincides with the full instance." ] }, { "cell_type": "markdown", "id": "4061b821", "metadata": {}, "source": [ "## Example from a Real Dataset" ] }, { "cell_type": "markdown", "id": "7c187df9", "metadata": {}, "source": [ "For this example, we take the ```compas.csv``` dataset. We create one model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance. 
" ] }, { "cell_type": "code", "execution_count": 3, "id": "e3fe96a1", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " Number_of_Priors score_factor Age_Above_FourtyFive \n", "0 0 0 1 \\\n", "1 0 0 0 \n", "2 4 0 0 \n", "3 0 0 0 \n", "4 14 1 0 \n", "... ... ... ... \n", "6167 0 1 0 \n", "6168 0 0 0 \n", "6169 0 0 1 \n", "6170 3 0 0 \n", "6171 2 0 0 \n", "\n", " Age_Below_TwentyFive African_American Asian Hispanic \n", "0 0 0 0 0 \\\n", "1 0 1 0 0 \n", "2 1 1 0 0 \n", "3 0 0 0 0 \n", "4 0 0 0 0 \n", "... ... ... ... ... \n", "6167 1 1 0 0 \n", "6168 1 1 0 0 \n", "6169 0 0 0 0 \n", "6170 0 1 0 0 \n", "6171 1 0 0 1 \n", "\n", " Native_American Other Female Misdemeanor Two_yr_Recidivism \n", "0 0 1 0 0 0 \n", "1 0 0 0 0 1 \n", "2 0 0 0 0 1 \n", "3 0 1 0 1 0 \n", "4 0 0 0 0 1 \n", "... ... ... ... ... ... \n", "6167 0 0 0 0 0 \n", "6168 0 0 0 0 0 \n", "6169 0 1 0 0 0 \n", "6170 0 0 1 1 0 \n", "6171 0 0 1 0 1 \n", "\n", "[6172 rows x 12 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../../../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Evaluation ---------------\n", "method: HoldOut\n", "output: BT\n", "learner_type: Classification\n", "learner_options: {'seed': 0, 'max_depth': None, 'eval_metric': 'mlogloss'}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 66.73866090712744\n", "nTraining instances: 4320\n", "nTest instances: 1852\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Boosted Tree model**\n", "NClasses: 2\n", "nTrees: 100\n", "nVariables: 42\n", "\n", "--------------- Instances ----------------\n", "number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explainer\n", "\n", "learner = 
Learning.Xgboost(\"../../../dataset/compas.csv\", learner_type=Learning.CLASSIFICATION)\n", "model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.BT)\n", "instance, prediction = learner.get_instances(model, n=1, correct=True)" ] }, { "cell_type": "markdown", "id": "bcd926f6", "metadata": {}, "source": [ "Finally, we display the direct reason for this instance: " ] }, { "cell_type": "code", "execution_count": 4, "id": "f5ba8cd7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: [0 0 1 0 0 0 0 0 1 0 0]\n", "prediction: 0\n", "\n", "len binary representation: 42\n", "len direct: 38\n", "is_reason: True\n", "to_features: ('Number_of_Priors < 0.5', 'score_factor < 0.5', 'Age_Above_FourtyFive >= 0.5', 'Age_Below_TwentyFive < 0.5', 'African_American < 0.5', 'Asian < 0.5', 'Hispanic < 0.5', 'Native_American < 0.5', 'Other >= 0.5', 'Female < 0.5', 'Misdemeanor < 0.5')\n" ] } ], "source": [ "explainer = Explainer.initialize(model, instance)\n", "print(\"instance:\", instance)\n", "print(\"prediction:\", prediction)\n", "print()\n", "direct_reason = explainer.direct_reason()\n", "print(\"len binary representation:\", len(explainer.binary_representation))\n", "print(\"len direct:\", len(direct_reason))\n", "print(\"is_reason:\", explainer.is_reason(direct_reason))\n", "print(\"to_features:\", explainer.to_features(direct_reason))" ] }, { "cell_type": "markdown", "id": "663667d5", "metadata": {}, "source": [ "Note that this direct reason contains 38 of the 42 binary variables of the binary representation. It explains why the model predicts $0$ for this instance, but it is probably not the most compact reason for it. Other types of reasons are presented on the [Explanations Computation](/documentation/explanations/BTexplanations/) page. 
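
For intuition, the hand-crafted example from the first section can be replayed in plain Python, without PyXAI. The sketch below is not the library's implementation: the tuple encoding of nodes and the helper names (`node`, `walk`, `direct_reason`) are hypothetical, and it assumes that the right child is followed when a node's condition holds, which is consistent with the weights $0.3$, $0.5$ and $0.1$ reported for $x$. It collects the conditions met along the root-to-leaf path of each tree and sums the leaf weights.

```python
# Plain-Python replay of the hand-crafted trees; illustrative only, not PyXAI code.
# A node is a tuple (feature, op, threshold, left, right); a leaf is a float weight.
# Assumption: the right child is taken when the condition holds, the left otherwise.

def node(feature, op, threshold, left, right):
    return (feature, op, threshold, left, right)

def build_trees():
    n1_1 = node(1, '>', 2, -0.2, 0.3)
    n1_2 = node(3, '==', 1, -0.3, n1_1)
    n1_3 = node(2, '>', 1, 0.4, n1_2)
    tree1 = node(4, '==', 1, -0.5, n1_3)

    n2_1 = node(4, '==', 1, -0.4, 0.3)
    n2_2 = node(1, '>', 2, -0.2, n2_1)
    tree2 = node(2, '>', 1, n2_2, 0.5)

    n3_1 = node(1, '>', 2, 0.2, 0.3)
    n3_2_1 = node(1, '>', 2, -0.2, 0.2)
    n3_2_2 = node(4, '==', 1, -0.1, n3_1)
    n3_2_3 = node(4, '==', 1, -0.5, 0.1)
    n3_3_1 = node(2, '>', 1, n3_2_1, n3_2_2)
    n3_3_2 = node(2, '>', 1, -0.4, n3_2_3)
    tree3 = node(3, '==', 1, n3_3_1, n3_3_2)
    return [tree1, tree2, tree3]

def holds(value, op, threshold):
    # Only the two operators used in the example are handled.
    return value > threshold if op == '>' else value == threshold

def walk(tree, x):
    # Follow the unique root-to-leaf path compatible with x.
    literals, current = [], tree
    while isinstance(current, tuple):
        feature, op, threshold, left, right = current
        satisfied = holds(x[feature - 1], op, threshold)
        literals.append((feature, op, threshold, satisfied))
        current = right if satisfied else left
    return literals, current

def direct_reason(trees, x):
    # Conjunction (without duplicates) of the path literals of every tree.
    reason, total, visited = [], 0.0, 0
    for tree in trees:
        literals, weight = walk(tree, x)
        total += weight
        visited += len(literals)
        for literal in literals:
            if literal not in reason:
                reason.append(literal)
    return reason, total, visited

trees = build_trees()
reason, total, visited = direct_reason(trees, (4, 3, 1, 1))
print('direct reason:', sorted(reason))
print('sum of weights:', round(total, 6))
print('conditions visited:', visited, 'distinct:', len(reason))
```

For $x$, the three paths visit eight conditions in total, of which four are distinct, so the direct reason has four literals, one per feature.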
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }