{ "cells": [ { "cell_type": "markdown", "id": "b1db8c9a", "metadata": {}, "source": [ "# Sufficient Reasons" ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": [ "Let $f$ be a Boolean function represented by a random forest $RF$, $x$ be an instance and $1$ (resp. $0$) the prediction of $RF$ on $x$ ($f(x) = 1$ (resp 0)), a **sufficient reason** for $x$ is a term of the binary representation of the instance that is a prime implicant of $f$ (resp $\\neg f$) that covers $x$.\n", "\n", "In other words, a **sufficient reason** for an instance $x$ given a class described by a Boolean function $f$ is a subset $t$ of the characteristics of $x$ that is minimal w.r.t. set inclusion,and such that any instance $x'$ sharing this set t of characteristics is classified by $f$ as $x$ is.\n", "\n", "More information about sufficient reasons can be found in the paper [Trading Complexity for Sparsity in Random Forest Explanations](https://ojs.aaai.org/index.php/AAAI/article/view/20484).\n", "\n", "The function ```ExplainerRF.sufficient_reason``` allows computing this kind of explanation.\n", "\n", "The library also provides a way to check that a reason is sufficient using the function ```is_sufficient_reason```." ] }, { "cell_type": "markdown", "id": "b326e678", "metadata": {}, "source": "A sufficient reason is minimal w.r.t. set inclusion, i.e., there is no subset of this reason which is also a sufficient reason. A **minimal sufficient reason** for $x$ is a sufficient reason for $x$ that\ncontains a minimal number of literals. In other words, a **minimal sufficient reason** has a minimal size. \n\nThe function ```ExplainerRF.minimal_sufficient_reason``` allows computing this kind of explanation.\n" }, { "cell_type": "markdown", "id": "f766d997", "metadata": {}, "source": [ "The PyXAI library also provides a way to verify that a reason is sufficient:" ] }, { "cell_type": "markdown", "id": "e8fcf0b8", "metadata": {}, "source": [ "{: .warning}\n", "> Unfortunately, searching for MUS or even more a minimal MUS is a difficult computational task. If the dataset contains a lot of features or if the binary representation of the instance contains many binary variables, finding a MUS may be out of reach. In order to deal with this problem we introduced the notion of [Majoritary Reason](/documentation/classification/RFexplanations/majoritary/) which is an abductive explanation much easier to compute. " ] }, { "cell_type": "markdown", "id": "869cba3c", "metadata": {}, "source": [ "## Example from Hand-Crafted Trees" ] }, { "cell_type": "markdown", "id": "6a557012", "metadata": {}, "source": [ "For this example, we take the random forest of the [Building Models](/documentation/learning/builder/RFbuilder/) page consisting of $4$ binary features ($x_1$, $x_2$, $x_3$ and $x_4$). \n", "\n", "The following figure shows in red and bold a minimal sufficient reason $(x_1, x_4)$ for the instance $(1,1,1,1)$. \n", "\"RFsufficient1\"\n", "\n", "As you can see in the figure, some leaves of this sufficient reason (in red) can have a prediction equal to 0 or 1. These are the predictions from the trees ($T_1$, $T_2$ and $T_3$), but not from the random forest. We need to calculate for each possible interpretation arising from this reason the prediction $f$ from the random forest: \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
$x_1$$x_2$$x_3$$x_4$$T_1$$T_2$$T_3$$f$
10011111
10111101
11010111
11111111
\n", "\n", "As at least 2 trees out of 3 give the right prediction (1), $(x_1, x_4)$ is indeed a sufficient reason. \n", "\n", "The next figure shows in blue and bold a minimal sufficient reason $(-x_4)$ for the instance $(0,1,0,0)$. \n", "\"RFsufficient2\"\n", "\n", "As before, we compute the predictions associated with this reason: \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
$x_1$$x_2$$x_3$$x_4$$T_1$$T_2$$T_3$$f$
00000000
00100000
01000100
01100100
10000010
10100000
11000100
11100100
\n", "\n", "As at least 2 trees out of 3 have the right prediction (0), $(-x_4)$ is indeed a sufficient reason. \n", "\n", "Now, we show how to get them with PyXAI. We start by building the random forest:" ] }, { "cell_type": "code", "execution_count": 1, "id": "745fbf2c", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explaining\n", "\n", "nodeT1_1 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT1_3 = Builder.DecisionNode(3, left=0, right=nodeT1_1)\n", "nodeT1_2 = Builder.DecisionNode(2, left=1, right=nodeT1_3)\n", "nodeT1_4 = Builder.DecisionNode(4, left=0, right=nodeT1_2)\n", "\n", "tree1 = Builder.DecisionTree(4, nodeT1_4, force_features_equal_to_binaries=True)\n", "\n", "nodeT2_4 = Builder.DecisionNode(4, left=0, right=1)\n", "nodeT2_1 = Builder.DecisionNode(1, left=0, right=nodeT2_4)\n", "nodeT2_2 = Builder.DecisionNode(2, left=nodeT2_1, right=1)\n", "\n", "tree2 = Builder.DecisionTree(4, nodeT2_2, force_features_equal_to_binaries=True) #4 features but only 3 used\n", "\n", "nodeT3_1_1 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT3_1_2 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT3_4_1 = Builder.DecisionNode(4, left=0, right=nodeT3_1_1)\n", "nodeT3_4_2 = Builder.DecisionNode(4, left=0, right=1)\n", "\n", "nodeT3_2_1 = Builder.DecisionNode(2, left=nodeT3_1_2, right=nodeT3_4_1)\n", "nodeT3_2_2 = Builder.DecisionNode(2, left=0, right=nodeT3_4_2)\n", "\n", "nodeT3_3_1 = Builder.DecisionNode(3, left=nodeT3_2_1, right=nodeT3_2_2)\n", "\n", "tree3 = Builder.DecisionTree(4, nodeT3_3_1, force_features_equal_to_binaries=True)\n", "forest = Builder.RandomForest([tree1, tree2, tree3], n_classes=2)" ] }, { "cell_type": "markdown", "id": "bad9b535", "metadata": {}, "source": [ "Then we compute a sufficient reasons for each of these two instances: " ] }, { "cell_type": "code", "execution_count": 2, "id": "0f5c98bf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "minimal: (1, 4)\n", "-------------------------------\n", "target_prediction: 0\n", "sufficient: (-1, -3)\n", "minimal: (-4,)\n" ] } ], "source": [ "explainer = Explaining.initialize(forest)\n", "explainer.set_instance((1,1,1,1))\n", "\n", "sufficient = explainer.sufficient_reason()\n", "assert explainer.is_sufficient_reason(sufficient)\n", "assert sufficient == (1, 4), \"The sufficient reason is not good !\"\n", "\n", "minimal = explainer.minimal_sufficient_reason()\n", "print(\"minimal:\", minimal)\n", "assert minimal == (1, 4), \"The minimal sufficient reason is not good !\"\n", "\n", "print(\"-------------------------------\")\n", "instance = (0,1,0,0)\n", "explainer.set_instance(instance)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "\n", "sufficient = explainer.sufficient_reason()\n", "print(\"sufficient:\", sufficient)\n", "assert sufficient == (-1, -3), \"The sufficient reason is not good !\"\n", "\n", "minimal = explainer.minimal_sufficient_reason()\n", "print(\"minimal:\", minimal)\n", "assert minimal == (-4, ), \"The minimal sufficient reason is not good !\" \n" ] }, { "cell_type": "markdown", "id": "e0420183", "metadata": {}, "source": [ "## Example from a Real dataset" ] }, { "cell_type": "markdown", "id": "03c8f44e", "metadata": {}, "source": [ "For this example, we take the [compas](/assets/notebooks/dataset/compas.csv) dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance. " ] }, { "cell_type": "code", "execution_count": 3, "id": "5a1c9c9b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: classification\n", "Instances type: tabular\n", "Labels type: classes\n", "\n", "Dataset path: ../../../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 11\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Model creation, fitting and evaluation ---------------\n", "Splitting method: hold-out\n", "Problem type: classification\n", "Models type: random-forest\n", "model_parameters: {}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "Metrics:\n", " sklearn_confusion_matrix: [[612, 211], [295, 425]]\n", " precision: 66.82389937106919\n", " recall: 59.02777777777778\n", " f1_score: 62.68436578171092\n", " specificity: 74.36208991494532\n", " true_positive: 425\n", " true_negative: 612\n", " false_positive: 211\n", " false_negative: 295\n", " accuracy: 67.20674011665587\n", "Number of Training instances: 4629\n", "Number of Testing instances: 1543\n", "\n", "--------------- Explainer ----------------\n", "For the split number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 70\n", "\n", "--------------- Instances ----------------\n", "Number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explaining\n", "\n", "learner = Learning.Scikitlearn(\"../../../dataset/compas.csv\", problem_type='classification')\n", "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.RF)\n", "instance, prediction = learner.get_instances(model, n=1, is_correct=True)" ] }, { "cell_type": "markdown", "id": "4cacbab0", "metadata": {}, "source": [ "This dataset is not very large and the computation of a sufficient reason is quite easy, but it is not so easy to derive a minimal one. Since the related solver (Optux) does not propose a time limit mode we commented the related code. " ] }, { "cell_type": "code", "execution_count": 4, "id": "b7691f19", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: Misdemeanor 0\n", "Number_of_Priors 0\n", "score_factor 0\n", "Age_Above_FourtyFive 1\n", "Age_Below_TwentyFive 0\n", "African_American 0\n", "Asian 0\n", "Hispanic 0\n", "Native_American 0\n", "Other 1\n", "Female 0\n", "Name: 0, dtype: int64\n", "prediction: 0\n", "\n", "\n", "sufficient reason: (-1, -2, -3, -4, -6, -11, -12)\n", "to features ['Number_of_Priors <= 0.5', 'score_factor <= 0.5', 'African_American <= 0.5', 'Hispanic <= 0.5', 'Female <= 0.5']\n", "is sufficient_reason (for max 50 checks): None\n", "\n" ] } ], "source": [ "explainer = Explaining.initialize(model, instance)\n", "print(\"instance:\", instance)\n", "print(\"prediction:\", prediction)\n", "print()\n", "sufficient_reason = explainer.sufficient_reason()\n", "print(\"\\nsufficient reason:\", sufficient_reason)\n", "print(\"to features\", explainer.to_features(sufficient_reason))\n", "print(\"is sufficient_reason (for max 50 checks): \", explainer.is_sufficient_reason(sufficient_reason, n_samples=50))\n", "print()\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }