{
"cells": [
{
"cell_type": "markdown",
"id": "b1db8c9a",
"metadata": {},
"source": [
"# Direct Reason"
]
},
{
"cell_type": "markdown",
"id": "514a5144",
"metadata": {},
"source": [
"Let $BT$ be a boosted tree composed of {$T_1,\\ldots T_n$} regression trees and $x$ an instance, the **direct reason** for $x$ is the term of the binary representation corresponding to the conjunction of the terms corresponding to the root-to-leaf paths of all $T_i$ that are compatible with $x$. Due to its simplicity, it is one of the easiest abductive explanation for $x$ to compute, but it can be highly redundant. More information about the direct reason can be found in the article [Computing Abductive Explanations for Boosted Trees](https://arxiv.org/abs/2209.07740)."
]
},
{
"cell_type": "markdown",
"id": "d88fcc64",
"metadata": {},
"source": [
"| <Explainer Object>.direct_reason(): | \n",
"| :----------- | \n",
"| Returns the direct reason for the current instance. Returns ```None``` if this reason contains some excluded features. All kinds of operators in the conditions are supported. This reason is in the form of binary variables, you must use the ```to_features ``` method if you want to obtain a representation based on the features considered at start. |"
]
},
{
"cell_type": "markdown",
"id": "e4432d14",
"metadata": {},
"source": [
"The basic methods (```initialize```, ```set_instance```, ```to_features```, ```is_reason```, ...) of the ```explainer``` module used in the next examples are described in the [Explainer Principles](/documentation/explainer/) page. "
]
},
{
"cell_type": "markdown",
"id": "d70f9a73",
"metadata": {},
"source": [
"## Example from Hand-Crafted Trees"
]
},
{
"cell_type": "markdown",
"id": "3df1afd3",
"metadata": {},
"source": [
"For this example, we take an example of binary classification from the [Building Models](/documentation/learning/builder/BTbuilder/) page. This figure represents a boosted tree $BT$ using $4$ features ($A_1$, $A_2$, $A_3$ and $A_4$), where $A_1$ and $A_2$ are numerical, $A_3$ is categorical and $A_4$ is Boolean. The direct reason for the instance $x$ = ($A_1=4$, $A_2 = 3$, $A_3 = 1$, $A_4 = 1$) is in red. This reason contains all features of the instance. \n",
"\n",
"
\n",
"\n",
"We have $w(T_1, x)=0.3$, $w(T_2, x)=0.5$ and $w(T_3, x)=0.1$. So $W(F, x) = 0.9$. As we are in the case of binary classification and $W(F, x) > 0$, $x$ is classified as a positive instance ($BT(x) = 1$).\n",
"\n",
"{: .attention }\n",
 We consider that the features">
"> In the code below, the features $A_3$ and $A_4$ are treated as numerical. Categorical and Boolean features will be implemented in future versions of PyXAI. \n",
"\n",
"We now show how to get direct reasons using PyXAI: "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "411398a5",
"metadata": {},
"outputs": [],
"source": [
"from pyxai import Builder, Explainer\n",
"\n",
"node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.3)\n",
"node1_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=-0.3, right=node1_1)\n",
"node1_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=0.4, right=node1_2)\n",
"node1_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=node1_3)\n",
"tree1 = Builder.DecisionTree(4, node1_4)\n",
"\n",
"node2_1 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.4, right=0.3)\n",
"node2_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=node2_1)\n",
"node2_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node2_2, right=0.5)\n",
"tree2 = Builder.DecisionTree(4, node2_3)\n",
"\n",
"node3_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=0.2, right=0.3)\n",
"node3_2_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.2)\n",
"node3_2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.1, right=node3_1)\n",
"node3_2_3 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=0.1)\n",
"node3_3_1 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node3_2_1, right=node3_2_2)\n",
"node3_3_2 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=-0.4, right=node3_2_3)\n",
"node3_4 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=node3_3_1, right=node3_3_2)\n",
"tree3 = Builder.DecisionTree(4, node3_4)\n",
"\n",
"BT = Builder.BoostedTrees([tree1, tree2, tree3], n_classes=2)"
]
},
{
"cell_type": "markdown",
"id": "a2263e89",
"metadata": {},
"source": [
"We compute the direct reason for this instance: "
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "935837ea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"instance: (4,3,2,1)\n",
"binary_representation: (1, 2, 3, 4)\n",
"target_prediction: 1\n",
"direct: (1, 2, 3, 4)\n",
"to_features: ('f1 > 2', 'f2 > 1', 'f3 == 1', 'f4 == 1')\n"
]
}
],
"source": [
"explainer = Explainer.initialize(BT)\n",
"explainer.set_instance((4,3,1,1))\n",
"direct = explainer.direct_reason()\n",
"print(\"instance: (4,3,2,1)\")\n",
"print(\"binary_representation:\", explainer.binary_representation)\n",
"print(\"target_prediction:\", explainer.target_prediction)\n",
"print(\"direct:\", direct)\n",
"print(\"to_features:\", explainer.to_features(direct))\n"
]
},
{
"cell_type": "markdown",
"id": "fc75b1a7",
"metadata": {},
"source": [
"As you can see, in this case, the direct reason coincides with the full instance."
]
},
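{
"cell_type": "markdown",
"id": "a7c3e9d1",
"metadata": {},
"source": [
"The definition can also be checked by hand on this small example. The cell below is a minimal plain-Python sketch, not PyXAI code: the tuple encoding of the trees and the helper names `walk`, `direct_reason` and `describe` are assumptions made for illustration. It follows the instance through each tree, sums the leaf weights, and collects the conditions met along the three paths."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9f2b4e6",
"metadata": {},
"outputs": [],
"source": [
"import operator\n",
"\n",
"# Hypothetical plain-Python encoding of the three trees (not the PyXAI API).\n",
"# Internal node: (feature, op, threshold, left, right); leaf: a float weight.\n",
"# The right child is followed when the condition holds, which reproduces\n",
"# the leaf weights 0.3, 0.5 and 0.1 stated in the text.\n",
"OPS = {'>': operator.gt, '==': operator.eq}\n",
"\n",
"n1_1 = (1, '>', 2, -0.2, 0.3)\n",
"n1_2 = (3, '==', 1, -0.3, n1_1)\n",
"n1_3 = (2, '>', 1, 0.4, n1_2)\n",
"tree1 = (4, '==', 1, -0.5, n1_3)\n",
"\n",
"n2_1 = (4, '==', 1, -0.4, 0.3)\n",
"n2_2 = (1, '>', 2, -0.2, n2_1)\n",
"tree2 = (2, '>', 1, n2_2, 0.5)\n",
"\n",
"n3_1 = (1, '>', 2, 0.2, 0.3)\n",
"n3_2_1 = (1, '>', 2, -0.2, 0.2)\n",
"n3_2_2 = (4, '==', 1, -0.1, n3_1)\n",
"n3_2_3 = (4, '==', 1, -0.5, 0.1)\n",
"n3_3_1 = (2, '>', 1, n3_2_1, n3_2_2)\n",
"n3_3_2 = (2, '>', 1, -0.4, n3_2_3)\n",
"tree3 = (3, '==', 1, n3_3_1, n3_3_2)\n",
"\n",
"TREES = [tree1, tree2, tree3]\n",
"\n",
"def walk(tree, x):\n",
"    # Follow x from the root to a leaf, recording each condition and its truth value.\n",
"    path = []\n",
"    while isinstance(tree, tuple):\n",
"        feature, op, threshold, left, right = tree\n",
"        holds = OPS[op](x[feature - 1], threshold)\n",
"        path.append(((feature, op, threshold), holds))\n",
"        tree = right if holds else left\n",
"    return tree, path\n",
"\n",
"def direct_reason(trees, x):\n",
"    # Sum the leaf weights and take the union of the conditions met on every path.\n",
"    total, conditions = 0.0, {}\n",
"    for tree in trees:\n",
"        weight, path = walk(tree, x)\n",
"        total += weight\n",
"        conditions.update(path)\n",
"    return total, conditions\n",
"\n",
"def describe(conditions):\n",
"    # Render the conditions as readable strings, ordered by feature index.\n",
"    return [f'f{f} {op} {t}' if holds else f'not (f{f} {op} {t})'\n",
"            for (f, op, t), holds in sorted(conditions.items())]\n",
"\n",
"total, conditions = direct_reason(TREES, (4, 3, 1, 1))\n",
"print('W =', round(total, 3), '-> class', int(total > 0))\n",
"print('direct reason:', describe(conditions))"
]
},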
{
"cell_type": "markdown",
"id": "4061b821",
"metadata": {},
"source": [
"## Example from a Real Dataset"
]
},
{
"cell_type": "markdown",
"id": "7c187df9",
"metadata": {},
"source": [
"For this example, we take the ```compas.csv``` dataset. We create one model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e3fe96a1",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"data:\n",
" Number_of_Priors score_factor Age_Above_FourtyFive \n",
"0 0 0 1 \\\n",
"1 0 0 0 \n",
"2 4 0 0 \n",
"3 0 0 0 \n",
"4 14 1 0 \n",
"... ... ... ... \n",
"6167 0 1 0 \n",
"6168 0 0 0 \n",
"6169 0 0 1 \n",
"6170 3 0 0 \n",
"6171 2 0 0 \n",
"\n",
" Age_Below_TwentyFive African_American Asian Hispanic \n",
"0 0 0 0 0 \\\n",
"1 0 1 0 0 \n",
"2 1 1 0 0 \n",
"3 0 0 0 0 \n",
"4 0 0 0 0 \n",
"... ... ... ... ... \n",
"6167 1 1 0 0 \n",
"6168 1 1 0 0 \n",
"6169 0 0 0 0 \n",
"6170 0 1 0 0 \n",
"6171 1 0 0 1 \n",
"\n",
" Native_American Other Female Misdemeanor Two_yr_Recidivism \n",
"0 0 1 0 0 0 \n",
"1 0 0 0 0 1 \n",
"2 0 0 0 0 1 \n",
"3 0 1 0 1 0 \n",
"4 0 0 0 0 1 \n",
"... ... ... ... ... ... \n",
"6167 0 0 0 0 0 \n",
"6168 0 0 0 0 0 \n",
"6169 0 1 0 0 0 \n",
"6170 0 0 1 1 0 \n",
"6171 0 0 1 0 1 \n",
"\n",
"[6172 rows x 12 columns]\n",
"-------------- Information ---------------\n",
"Dataset name: ../../../dataset/compas.csv\n",
"nFeatures (nAttributes, with the labels): 12\n",
"nInstances (nObservations): 6172\n",
"nLabels: 2\n",
"--------------- Evaluation ---------------\n",
"method: HoldOut\n",
"output: BT\n",
"learner_type: Classification\n",
"learner_options: {'seed': 0, 'max_depth': None, 'eval_metric': 'mlogloss'}\n",
"--------- Evaluation Information ---------\n",
"For the evaluation number 0:\n",
"metrics:\n",
" accuracy: 66.73866090712744\n",
"nTraining instances: 4320\n",
"nTest instances: 1852\n",
"\n",
"--------------- Explainer ----------------\n",
"For the evaluation number 0:\n",
"**Boosted Tree model**\n",
"NClasses: 2\n",
"nTrees: 100\n",
"nVariables: 42\n",
"\n",
"--------------- Instances ----------------\n",
"number of instances selected: 1\n",
"----------------------------------------------\n"
]
}
],
"source": [
"from pyxai import Learning, Explainer\n",
"\n",
"learner = Learning.Xgboost(\"../../../dataset/compas.csv\", learner_type=Learning.CLASSIFICATION)\n",
"model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.BT)\n",
"instance, prediction = learner.get_instances(model, n=1, correct=True)"
]
},
{
"cell_type": "markdown",
"id": "bcd926f6",
"metadata": {},
"source": [
"Finally, we display the direct reason for this instance: "
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f5ba8cd7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"instance: [0 0 1 0 0 0 0 0 1 0 0]\n",
"prediction: 0\n",
"\n",
"len binary representation: 42\n",
"len direct: 38\n",
"is_reason: True\n",
"to_features: ('Number_of_Priors < 0.5', 'score_factor < 0.5', 'Age_Above_FourtyFive >= 0.5', 'Age_Below_TwentyFive < 0.5', 'African_American < 0.5', 'Asian < 0.5', 'Hispanic < 0.5', 'Native_American < 0.5', 'Other >= 0.5', 'Female < 0.5', 'Misdemeanor < 0.5')\n"
]
}
],
"source": [
"explainer = Explainer.initialize(model, instance)\n",
"print(\"instance:\", instance)\n",
"print(\"prediction:\", prediction)\n",
"print()\n",
"direct_reason = explainer.direct_reason()\n",
"print(\"len binary representation:\", len(explainer.binary_representation))\n",
"print(\"len direct:\", len(direct_reason))\n",
"print(\"is_reason:\", explainer.is_reason(direct_reason))\n",
"print(\"to_features:\", explainer.to_features(direct_reason))"
]
},
{
"cell_type": "markdown",
"id": "663667d5",
"metadata": {},
"source": [
"We can remark that this direct reason contains 38 binary variables of the implicant out of 42. This reason explains why the model predicts $0$ for this instance. But this is probably not the most compact reason for this instance, we invite you to look at the other types of reasons presented on the [Explanations Computation](/documentation/explanations/BTexplanations/) page. "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}