{ "cells": [ { "cell_type": "markdown", "id": "b1db8c9a", "metadata": {}, "source": [ "# Direct Reason" ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": [ "Let $RF=${$T_1, \\ldots, T_n$} be a random forest and $x$ be an instance, the **direct reason** for $x$ is a term of the binary representation corresponding to the conjunction of the terms corresponding to the root-to-leaf paths of all $T_i$ that are compatible with $x$. Due to its simplicity, it is an abductive explanation that is easy to compute, but it can be redundant. More information about the direct reason can be found in the paper [Trading Complexity for Sparsity in Random Forest Explanations](https://ojs.aaai.org/index.php/AAAI/article/view/20484)." ] }, { "cell_type": "markdown", "id": "d88fcc64", "metadata": {}, "source": [ "| <Explainer Object>.direct_reason(): | \n", "| :----------- | \n", "| Returns the unique direct reason for the current instance. Return ```None``` if this reason contains some excluded features. This reason is in the form of binary variables, you must use the ```to_features ``` method if you want it in the form of features. |" ] }, { "cell_type": "markdown", "id": "e4432d14", "metadata": {}, "source": [ "The basic methods (```initialize```, ```set_instance```, ```to_features```, ```is_reason```, ...) of the ```Explainer``` module used in the next examples are described in the [Explainer Principles](/documentation/explainer/) page. " ] }, { "cell_type": "markdown", "id": "d70f9a73", "metadata": {}, "source": [ "## Example from Hand-Crafted Trees" ] }, { "cell_type": "markdown", "id": "3df1afd3", "metadata": {}, "source": [ "For this example, we take the Decision Tree of the [Building Models](/documentation/learning/builder/RFbuilder/) page. \n", "\n", "\"RFdirect\"\n", "\n", "This figure represents a Random Forest with 3 Decision Trees using $4$ binary features ($x_1$, $x_2$, $x_3$ and $x_4$). The direct reason for the instance $(1,1,1,1)$ is in red and the one for $(0,0,0,0)$ is in blue. Now, we show how to get them with PyXAI. 
{ "cell_type": "markdown", "id": "e4432d14", "metadata": {}, "source": [ "The basic methods (```initialize```, ```set_instance```, ```to_features```, ```is_reason```, ...) of the ```Explainer``` module used in the next examples are described in the [Explainer Principles](/documentation/explainer/) page. " ] }, { "cell_type": "markdown", "id": "d70f9a73", "metadata": {}, "source": [ "## Example from Hand-Crafted Trees" ] }, { "cell_type": "markdown", "id": "3df1afd3", "metadata": {}, "source": [ "For this example, we take the Random Forest of the [Building Models](/documentation/learning/builder/RFbuilder/) page. \n", "\n", "*Figure: RFdirect*\n", "\n", "This figure represents a Random Forest with 3 Decision Trees using $4$ binary features ($x_1$, $x_2$, $x_3$ and $x_4$). The direct reason for the instance $(1,1,1,1)$ is in red and the one for $(0,0,0,0)$ is in blue. Now, we show how to get them with PyXAI. We start by building the random forest:" ] }, { "cell_type": "code", "execution_count": 1, "id": "411398a5", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explainer\n", "\n", "nodeT1_1 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT1_3 = Builder.DecisionNode(3, left=0, right=nodeT1_1)\n", "nodeT1_2 = Builder.DecisionNode(2, left=1, right=nodeT1_3)\n", "nodeT1_4 = Builder.DecisionNode(4, left=0, right=nodeT1_2)\n", "tree1 = Builder.DecisionTree(4, nodeT1_4, force_features_equal_to_binaries=True)\n", "\n", "nodeT2_4 = Builder.DecisionNode(4, left=0, right=1)\n", "nodeT2_1 = Builder.DecisionNode(1, left=0, right=nodeT2_4)\n", "nodeT2_2 = Builder.DecisionNode(2, left=nodeT2_1, right=1)\n", "tree2 = Builder.DecisionTree(4, nodeT2_2, force_features_equal_to_binaries=True)  # 4 features, but only 3 are used\n", "\n", "nodeT3_1_1 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT3_1_2 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT3_4_1 = Builder.DecisionNode(4, left=0, right=nodeT3_1_1)\n", "nodeT3_4_2 = Builder.DecisionNode(4, left=0, right=1)\n", "nodeT3_2_1 = Builder.DecisionNode(2, left=nodeT3_1_2, right=nodeT3_4_1)\n", "nodeT3_2_2 = Builder.DecisionNode(2, left=0, right=nodeT3_4_2)\n", "nodeT3_3_1 = Builder.DecisionNode(3, left=nodeT3_2_1, right=nodeT3_2_2)\n", "tree3 = Builder.DecisionTree(4, nodeT3_3_1, force_features_equal_to_binaries=True)\n", "\n", "forest = Builder.RandomForest([tree1, tree2, tree3], n_classes=2)" ] }, { "cell_type": "markdown", "id": "a2263e89", "metadata": {}, "source": [ "We compute the direct reasons for these two instances: " ] }, { "cell_type": "code", "execution_count": 2, "id": "935837ea", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: (1,1,1,1)\n", "binary representation: (1, 2, 3, 4)\n", "target_prediction: 1\n", "direct: (1, 2, 3, 4)\n", "to_features: ('f1 >= 0.5', 'f2 >= 0.5', 'f3 >= 0.5', 'f4 >= 0.5')\n", "------------------------------------------------\n", "instance: (0,0,0,0)\n", "binary representation: (-1, -2, -3, -4)\n", "target_prediction: 0\n", "direct: (-1, -2, -3, -4)\n", "to_features: ('f1 < 0.5', 'f2 < 0.5', 'f3 < 0.5', 'f4 < 0.5')\n" ] } ], "source": [ "explainer = Explainer.initialize(forest)\n", "explainer.set_instance((1,1,1,1))\n", "direct = explainer.direct_reason()\n", "print(\"instance: (1,1,1,1)\")\n", "print(\"binary representation:\", explainer.binary_representation)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"direct:\", direct)\n", "print(\"to_features:\", explainer.to_features(direct))\n", "print(\"------------------------------------------------\")\n", "explainer.set_instance((0,0,0,0))\n", "direct = explainer.direct_reason()\n", "print(\"instance: (0,0,0,0)\")\n", "print(\"binary representation:\", explainer.binary_representation)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"direct:\", direct)\n", "print(\"to_features:\", explainer.to_features(direct))" ] }, { "cell_type": "markdown", "id": "ee8de6c5", "metadata": {}, "source": [ "As you can see, in this case, the direct reason corresponds to the full instance." ] },
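{ "cell_type": "markdown", "id": "c3f0b9e1", "metadata": {}, "source": [ "Since the explainer still holds the instance $(0,0,0,0)$, we can verify, as a quick sanity check (a sketch relying only on the ```is_reason``` method described in the [Explainer Principles](/documentation/explainer/) page), that the computed term is indeed a reason:\n", "```python\n", "# `explainer` and `direct` come from the previous cell; is_reason checks\n", "# that the term is sufficient to obtain the current prediction.\n", "assert explainer.is_reason(direct)\n", "```" ] },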
{ "cell_type": "markdown", "id": "4061b821", "metadata": {}, "source": [ "## Example from a Real Dataset" ] }, { "cell_type": "markdown", "id": "7c187df9", "metadata": {}, "source": [ "For this example, we take the [compas](/assets/notebooks/dataset/compas.csv) dataset. We build a model using the hold-out approach (by default, the test size is set to 30%) and select a correctly classified instance. " ] }, { "cell_type": "code", "execution_count": 3, "id": "e3fe96a1", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " Number_of_Priors score_factor Age_Above_FourtyFive \n", "0 0 0 1 \\\n", "1 0 0 0 \n", "2 4 0 0 \n", "3 0 0 0 \n", "4 14 1 0 \n", "... ... ... ... \n", "6167 0 1 0 \n", "6168 0 0 0 \n", "6169 0 0 1 \n", "6170 3 0 0 \n", "6171 2 0 0 \n", "\n", " Age_Below_TwentyFive African_American Asian Hispanic \n", "0 0 0 0 0 \\\n", "1 0 1 0 0 \n", "2 1 1 0 0 \n", "3 0 0 0 0 \n", "4 0 0 0 0 \n", "... ... ... ... ... \n", "6167 1 1 0 0 \n", "6168 1 1 0 0 \n", "6169 0 0 0 0 \n", "6170 0 1 0 0 \n", "6171 1 0 0 1 \n", "\n", " Native_American Other Female Misdemeanor Two_yr_Recidivism \n", "0 0 1 0 0 0 \n", "1 0 0 0 0 1 \n", "2 0 0 0 0 1 \n", "3 0 1 0 1 0 \n", "4 0 0 0 0 1 \n", "... ... ... ... ... ... \n", "6167 0 0 0 0 0 \n", "6168 0 0 0 0 0 \n", "6169 0 1 0 0 0 \n", "6170 0 0 1 1 0 \n", "6171 0 0 1 0 1 \n", "\n", "[6172 rows x 12 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../../../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Evaluation ---------------\n", "method: HoldOut\n", "output: RF\n", "learner_type: Classification\n", "learner_options: {'max_depth': None, 'random_state': 0}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 65.71274298056156\n", "nTraining instances: 4320\n", "nTest instances: 1852\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 68\n", "\n", "--------------- Instances ----------------\n", "number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explainer\n", "\n", "learner = Learning.Scikitlearn(\"../../../dataset/compas.csv\", learner_type=Learning.CLASSIFICATION)\n", "model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.RF)\n", "instance, prediction = learner.get_instances(model, n=1, correct=True)" ] }, { "cell_type": "markdown", "id": "bcd926f6", "metadata": {}, "source": [ "We display the direct reason for this instance: " ] }, { "cell_type": "code", "execution_count": 4, "id": "f5ba8cd7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: [0 0 1 0 0 0 0 0 1 0 0]\n", "prediction: 0\n", "\n", "len binary representation: 68\n", "len direct: 22\n", "is_reason: True\n", "to_features: ('Number_of_Priors <= 0.5', 'score_factor <= 0.5', 'Age_Above_FourtyFive > 0.5', 'Age_Below_TwentyFive <= 0.5', 'African_American <= 0.5', 'Asian <= 0.5', 'Hispanic <= 0.5', 'Native_American <= 0.5', 'Other > 0.5', 'Female <= 0.5', 'Misdemeanor <= 0.5')\n" ] } ], "source": [ "explainer = Explainer.initialize(model, instance)\n", "print(\"instance:\", instance)\n", "print(\"prediction:\", prediction)\n", "print()\n", "direct_reason = explainer.direct_reason()\n", "print(\"len binary representation:\", len(explainer.binary_representation))\n", "print(\"len direct:\", len(direct_reason))\n", "print(\"is_reason:\", explainer.is_reason(direct_reason))\n", "print(\"to_features:\", explainer.to_features(direct_reason))" ] }, { "cell_type": "markdown", "id": "663667d5", "metadata": {}, "source": [ "Note that this direct reason contains 22 binary variables out of the 68 in the binary representation of the instance (the ```to_features``` view lists only 11 conditions because several binary conditions bearing on the same feature are merged). This reason explains why the model predicts $0$ for this instance, but it is probably not the most compact abductive explanation. We invite you to look at the other types of reasons presented on the [Explanations Computation](/documentation/explanations/RFexplanations/) page. " ] },
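{ "cell_type": "markdown", "id": "9d8e7f6a", "metadata": {}, "source": [ "As an aside, the redundancy mentioned in the definition can be probed directly: a literal of the direct reason is redundant if dropping it still yields a reason. A brute-force sketch (illustrative only, and potentially slow, since it calls ```is_reason``` once per literal):\n", "```python\n", "# `explainer` and `direct_reason` come from the cells above.\n", "redundant = [lit for lit in direct_reason\n", "             if explainer.is_reason(tuple(l for l in direct_reason if l != lit))]\n", "print(\"redundant literals:\", redundant)\n", "```" ] }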
"663667d5", "metadata": {}, "source": [ "We can remark that this direct reason contains 22 binary variables out of the 68 variables in the binary representation corresponding to the instance. This reason explains why the model predicts $0$ for this instance. But this is probably not the most compact abductive explanation for it. We invite you to look at the other types of reasons presented on the [Explanations Computation](/documentation/explanations/RFexplanations/) page. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }