{ "cells": [ { "cell_type": "markdown", "id": "b1db8c9a", "metadata": {}, "source": [ "# Direct Reason" ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": [ "Let $T$ be a decision tree and $x$ be an instance, the **direct reason** for $x$ is a term of the binary representation of the instance corresponding to the unique root-to-leaf path of $T$ that is compatible with $x$. Due to its simplicity, it is one of the easiest reason to calculate, but in general it is redundant. More information about the direct reason can be found in the paper [On the Explanatory Power of Decision Trees](https://arxiv.org/abs/2108.05266)." ] }, { "cell_type": "markdown", "id": "e4432d14", "metadata": {}, "source": [ "The basic methods [``initialize``](/documentation/api/modules/explaining/), ```set_instance```, ```to_features```, ```is_reason```, ...) of the ```Explainer``` module used in the next examples are described in the [Explainer Principles](/pyxai/documentation/explainer/) page. " ] }, { "cell_type": "markdown", "id": "d70f9a73", "metadata": {}, "source": [ "## Example from a Hand-Crafted Tree" ] }, { "cell_type": "markdown", "id": "3df1afd3", "metadata": {}, "source": [ "For this example, we take the Decision Tree of the [Building Models](/documentation/learning/builder/DTbuilder/) page. \n", "\n", "\"DTbuilder\"\n", "\n", "This figure represents a Decision Tree using $4$ binary features ($x_1$, $x_2$, $x_3$ and $x_4$). The direct reason for the instance $(1,1,1,1)$ is in red and the one for $(0,0,0,0)$ is in blue. Now, we show how to get them with PyXAI. We start by building the decision tree: " ] }, { "cell_type": "code", "execution_count": 2, "id": "411398a5", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explaining\n", "\n", "node_x4_1 = Builder.DecisionNode(4, left=0, right=1)\n", "node_x4_2 = Builder.DecisionNode(4, left=0, right=1)\n", "node_x4_3 = Builder.DecisionNode(4, left=0, right=1)\n", "node_x4_4 = Builder.DecisionNode(4, left=0, right=1)\n", "node_x4_5 = Builder.DecisionNode(4, left=0, right=1)\n", "\n", "node_x3_1 = Builder.DecisionNode(3, left=0, right=node_x4_1)\n", "node_x3_2 = Builder.DecisionNode(3, left=node_x4_2, right=node_x4_3)\n", "node_x3_3 = Builder.DecisionNode(3, left=node_x4_4, right=node_x4_5)\n", "\n", "node_x2_1 = Builder.DecisionNode(2, left=0, right=node_x3_1)\n", "node_x2_2 = Builder.DecisionNode(2, left=node_x3_2, right=node_x3_3)\n", "\n", "node_x1_1 = Builder.DecisionNode(1, left=node_x2_1, right=node_x2_2)\n", "\n", "tree = Builder.DecisionTree(4, node_x1_1, force_features_equal_to_binaries=True)" ] }, { "cell_type": "markdown", "id": "a2263e89", "metadata": {}, "source": [ "And we compute the direct reasons for these two instances: " ] }, { "cell_type": "code", "execution_count": 4, "id": "935837ea", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: (1,1,1,1)\n", "binary representation: (1, 2, 3, 4)\n", "target_prediction: 1\n", "direct: (1, 2, 3, 4)\n", "to_features: ['f1 >= 0.5', 'f2 >= 0.5', 'f3 >= 0.5', 'f4 >= 0.5']\n", "------------------------------------------------\n", "instance: (0,0,0,0)\n", "binary representation: (-1, -2, -3, -4)\n", "target_prediction: 0\n", "direct: (-1, -2)\n", "to_features: ['f1 < 0.5', 'f2 < 0.5']\n" ] } ], "source": [ "explainer = Explaining.initialize(tree)\n", "explainer.set_instance((1,1,1,1))\n", "direct = explainer.direct_reason()\n", "print(\"instance: (1,1,1,1)\")\n", "print(\"binary representation:\", explainer.binary_representation)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"direct:\", direct)\n", "print(\"to_features:\", explainer.to_features(direct))\n", "print(\"------------------------------------------------\")\n", "explainer.set_instance((0,0,0,0))\n", "direct = explainer.direct_reason()\n", "print(\"instance: (0,0,0,0)\")\n", "print(\"binary representation:\", explainer.binary_representation)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"direct:\", direct)\n", "print(\"to_features:\", explainer.to_features(direct))" ] }, { "cell_type": "markdown", "id": "4061b821", "metadata": {}, "source": [ "## Example from a Real Dataset" ] }, { "cell_type": "markdown", "id": "7c187df9", "metadata": {}, "source": [ "For this example, we take the [compas](/assets/notebooks/dataset/compas.csv) dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance. " ] }, { "cell_type": "code", "execution_count": 10, "id": "e3fe96a1", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: classification\n", "Instances type: tabular\n", "Labels type: classes\n", "\n", "Dataset path: ../../../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 11\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Model creation, fitting and evaluation ---------------\n", "Splitting method: hold-out\n", "Problem type: classification\n", "Models type: decision-tree\n", "model_parameters: {}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "Metrics:\n", " sklearn_confusion_matrix: [[637, 200], [327, 379]]\n", " precision: 65.45768566493955\n", " recall: 53.682719546742206\n", " f1_score: 58.988326848249024\n", " specificity: 76.10513739545998\n", " true_positive: 379\n", " true_negative: 637\n", " false_positive: 200\n", " false_negative: 327\n", " accuracy: 65.84575502268308\n", "Number of Training instances: 4629\n", "Number of Testing instances: 1543\n", "\n", "--------------- Explainer ----------------\n", "For the split number 0:\n", "**Decision Tree Model**\n", "nFeatures: 11\n", "nNodes: 564\n", "nVariables: 45\n", "\n", "--------------- Instances ----------------\n", "Number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explaining\n", "\n", "learner = learner = Learning.Scikitlearn(\"../../../dataset/compas.csv\", problem_type='classification', instances_type='tabular', labels_type='classes')\n", "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.DT)\n", "instance, prediction = learner.get_instances(model, n=1, is_correct=True)" ] }, { "cell_type": "markdown", "id": "bcd926f6", "metadata": {}, "source": [ "Finally, we compute the direct reason for this instance: " ] }, { "cell_type": "code", "execution_count": 12, "id": "f5ba8cd7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: Number_of_Priors 0\n", "score_factor 0\n", "Age_Above_FourtyFive 1\n", "Age_Below_TwentyFive 0\n", "Origin_African_American 0\n", "Origin_Asian 0\n", "Origin_Hispanic 0\n", "Origin_Native_American 0\n", "Origin_Other 1\n", "Female 0\n", "Misdemeanor 0\n", "Name: 0, dtype: int64\n", "prediction: 0\n", "\n", "len binary representation: 45\n", "len direct: 10\n", "is_reason: True\n", "to_features: ['Number_of_Priors <= 0.5', 'score_factor <= 0.5', 'Age_Above_FourtyFive > 0.5', 'Age_Below_TwentyFive <= 0.5', 'Origin_African_American <= 0.5', 'Origin_Hispanic <= 0.5', 'Origin_Other > 0.5', 'Female <= 0.5', 'Misdemeanor <= 0.5']\n" ] } ], "source": [ "explainer = Explaining.initialize(model, instance)\n", "print(\"instance:\", instance)\n", "print(\"prediction:\", prediction)\n", "print()\n", "direct_reason = explainer.direct_reason()\n", "print(\"len binary representation:\", len(explainer.binary_representation))\n", "print(\"len direct:\", len(direct_reason))\n", "print(\"is_reason:\", explainer.is_reason(direct_reason))\n", "print(\"to_features:\", explainer.to_features(direct_reason))" ] }, { "cell_type": "markdown", "id": "663667d5", "metadata": {}, "source": [ "We can remark that this direct reason contains 9 binary variables out of 46 variables in the binary representation. This reason explains why the model predicts $0$ for this instance. But this is probably not the most compact abductive explanation for the instance, we invite you to take a look at the other types of reasons presented on the [Decision Tree Explanations](/documentation/explanations/DTexplanations/) page. " ] }, { "cell_type": "markdown", "id": "d1405d96-58d2-4eb4-8a78-05ddd5dfd77c", "metadata": { "jp-MarkdownHeadingCollapsed": true }, "source": [ "## See Also\n", " - API: ```Builder```, ```ExplainerDT```, ```Learner```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }