{ "cells": [ { "cell_type": "markdown", "id": "b1db8c9a", "metadata": {}, "source": [ "# Sufficient Reasons" ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": [ "Let $f$ be a Boolean function represented by a random forest $RF$, $x$ be an instance and $1$ (resp. $0$) the prediction of $RF$ on $x$ ($f(x) = 1$ (resp 0)), a **sufficient reason** for $x$ is a term of the binary representation of the instance that is a prime implicant of $f$ (resp $\\neg f$) that covers $x$.\n", "\n", "In other words, a **sufficient reason** for an instance $x$ given a class described by a Boolean function $f$ is a subset $t$ of the characteristics of $x$ that is minimal w.r.t. set inclusion,and such that any instance $x'$ sharing this set t of characteristics is classified by $f$ as $x$ is.\n", "\n", "More information about sufficient reasons can be found in the paper [Trading Complexity for Sparsity in Random Forest Explanations](https://ojs.aaai.org/index.php/AAAI/article/view/20484)." ] }, { "cell_type": "markdown", "id": "91c11ebe", "metadata": {}, "source": [ "| <Explainer Object>.sufficient_reason(*, time_limit=None): | \n", "| :----------- | \n", "| This method creates a CNF formula corresponding to the negation of the random forest and the instance and then calls a solver ([Muser](https://bitbucket.org/anton_belov/muser2)) that extracts a MUS (a Minimum Unsatisfiable Formula) to derive sufficient reasons. Enumerating MUS is a hard task, this is why this function returns only one sufficient reason.

Returns a sufficient reason for the current instance. Supports the excluded features. The reason is given in terms of binary variables; use the ```to_features``` method if you want a representation based on the features considered at the start.|\n", "| time_limit ```Integer``` ```None```: The time limit of the method in seconds. Set it to ```None``` to give this process an unlimited amount of time. Default value is ```None```.|\n" ] }, { "cell_type": "markdown", "id": "b326e678", "metadata": {}, "source": [ "A sufficient reason is minimal w.r.t. set inclusion, i.e., no proper subset of this reason is also a sufficient reason. A **minimal sufficient reason** for $x$ is a sufficient reason for $x$ that\n", "contains a minimal number of literals. In other words, a **minimal sufficient reason** has a minimal size. " ] }, { "cell_type": "markdown", "id": "0240672f", "metadata": {}, "source": [ "| <ExplainerRF Object>.minimal_sufficient_reason(*, time_limit=None): | \n", "| :----------- | \n", "|This method creates a CNF formula corresponding to the negation of the random forest and the instance, and then calls a solver ([OptUx](https://pysathq.github.io/docs/html/api/examples/optux.html)) that extracts a smallest MUS (Minimal Unsatisfiable Subset) to derive a minimal sufficient reason. Finding a smallest MUS is a hard task, which is why this method extracts only one minimal sufficient reason.

Returns a minimal sufficient reason for the current instance. Supports the excluded features. The reason is given in terms of binary variables; use the ```to_features``` method if you want a representation based on the features considered at the start.|\n", "| time_limit ```Integer``` ```None```: The time limit of the method in seconds. Set it to ```None``` to give this process an unlimited amount of time. Default value is ```None```.|" ] }, { "cell_type": "markdown", "id": "f766d997", "metadata": {}, "source": [ "The PyXAI library also provides a way to verify that a reason is sufficient:" ] }, { "cell_type": "markdown", "id": "b149861a", "metadata": {}, "source": [ "| <Explainer Object>.is_sufficient_reason(reason, *, n_samples=50): | \n", "| :----------- | \n", "| This method checks whether a reason is sufficient. For this purpose, it first calls the method ```is_reason``` to check whether n_samples complete binary representations randomly generated from this reason lead to the correct prediction. Then, it checks the minimality of the reason w.r.t. set inclusion: it removes one literal from the reason, uses ```is_reason``` to check that the resulting term is no longer a reason, and puts the literal back, repeating this operation for every literal of the reason. Because this method relies on random generation and a limited number of samples, it is neither deterministic nor complete (i.e., it is not guaranteed to provide the right answer). Therefore, this method can return ```True```, ```False``` or ```None```. |\n", "| reason ```List``` of ```Integer```: The reason to check.|\n", "| n_samples ```Integer```: The number of samples to check, i.e., the number of complete binary representations generated randomly from the reason. 
Default value is 50.|" ] }, { "cell_type": "markdown", "id": "e8fcf0b8", "metadata": {}, "source": [ "{: .warning}\n", "> Unfortunately, extracting a MUS, and a fortiori a smallest MUS, is a difficult computational task. If the dataset contains many features or if the binary representation of the instance contains many binary variables, finding a MUS may be out of reach. To deal with this problem, we introduced the notion of [Majoritary Reason](/documentation/classification/RFexplanations/majoritary/), an abductive explanation that is much easier to compute. " ] }, { "cell_type": "markdown", "id": "869cba3c", "metadata": {}, "source": [ "## Example from Hand-Crafted Trees" ] }, { "cell_type": "markdown", "id": "6a557012", "metadata": {}, "source": [ "For this example, we take the random forest of the [Building Models](/documentation/learning/builder/RFbuilder/) page consisting of $4$ binary features ($x_1$, $x_2$, $x_3$ and $x_4$). \n", "\n", "The following figure shows in red and bold a minimal sufficient reason $(x_1, x_4)$ for the instance $(1,1,1,1)$. \n", "\"RFsufficient1\"\n", "\n", "As you can see in the figure, some leaves of this sufficient reason (in red) can have a prediction equal to 0 or 1. These are the predictions of the trees ($T_1$, $T_2$ and $T_3$), but not of the random forest. We need to compute, for each possible interpretation arising from this reason, the prediction $f$ of the random forest: \n", "\n", "
| $x_1$ | $x_2$ | $x_3$ | $x_4$ | $T_1$ | $T_2$ | $T_3$ | $f$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
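
The table above can be checked by brute force. The following sketch is not PyXAI code: the functions ```t1```, ```t2``` and ```t3``` are hand transcriptions of the trees built with ```Builder.DecisionTree``` further down this page (left branch = 0, right branch = 1). It enumerates every completion of the term $(x_1, x_4)$:

```python
# Sketch: verify by brute force that (x1, x4) is a sufficient reason for
# the instance (1,1,1,1). The three functions transcribe the hand-crafted
# trees from the builder code below (left = 0, right = 1).
from itertools import product

def t1(x1, x2, x3, x4):
    if x4 == 0: return 0
    if x2 == 0: return 1
    return x1 if x3 == 1 else 0

def t2(x1, x2, x3, x4):
    if x2 == 1: return 1
    return x4 if x1 == 1 else 0

def t3(x1, x2, x3, x4):
    if x3 == 0:
        return x1 if x2 == 0 else (x1 if x4 == 1 else 0)
    return 1 if (x2 == 1 and x4 == 1) else 0

def forest(x):  # majority vote over the three trees
    return int(t1(*x) + t2(*x) + t3(*x) >= 2)

# Fix x1 = 1 and x4 = 1, enumerate the free variables x2 and x3:
assert all(forest((1, x2, x3, 1)) == 1 for x2, x3 in product((0, 1), repeat=2))
```

Since every completion is classified as 1 by the majority vote, the term passes the sufficiency check, matching the four rows of the table.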
\n", "\n", "As at least 2 trees out of 3 give the right prediction (1), $(x_1, x_4)$ is indeed a sufficient reason. \n", "\n", "The next figure shows in blue and bold a minimal sufficient reason $(-x_4)$ for the instance $(0,1,0,0)$. \n", "\"RFsufficient2\"\n", "\n", "As before, we compute the predictions associated with this reason: \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
| $x_1$ | $x_2$ | $x_3$ | $x_4$ | $T_1$ | $T_2$ | $T_3$ | $f$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
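
This table can be verified by brute force as well. The sketch below is again a hand transcription of the trees built with ```Builder.DecisionTree``` further down this page (not PyXAI code); it checks that fixing $x_4 = 0$ alone forces the prediction 0, and that the reason is minimal, since the empty term is not a reason (the forest is not constantly 0):

```python
# Sketch: verify by brute force that (-x4) is a sufficient reason for
# (0,1,0,0), and that it is minimal w.r.t. set inclusion. The tree
# functions transcribe the builder code below (left = 0, right = 1).
from itertools import product

def t1(x1, x2, x3, x4):
    if x4 == 0: return 0
    if x2 == 0: return 1
    return x1 if x3 == 1 else 0

def t2(x1, x2, x3, x4):
    if x2 == 1: return 1
    return x4 if x1 == 1 else 0

def t3(x1, x2, x3, x4):
    if x3 == 0:
        return x1 if x2 == 0 else (x1 if x4 == 1 else 0)
    return 1 if (x2 == 1 and x4 == 1) else 0

def forest(x):  # majority vote over the three trees
    return int(t1(*x) + t2(*x) + t3(*x) >= 2)

# Fixing x4 = 0 yields prediction 0 for all 8 completions:
assert all(forest((x1, x2, x3, 0)) == 0
           for x1, x2, x3 in product((0, 1), repeat=3))
# Minimality: the empty term is not a reason, since f is not constant:
assert forest((1, 1, 1, 1)) == 1
```
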
\n", "\n", "As at least 2 trees out of 3 have the right prediction (0), $(-x_4)$ is indeed a sufficient reason. \n", "\n", "Now, we show how to get them with PyXAI. We start by building the random forest:" ] }, { "cell_type": "code", "execution_count": 1, "id": "745fbf2c", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explainer\n", "\n", "nodeT1_1 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT1_3 = Builder.DecisionNode(3, left=0, right=nodeT1_1)\n", "nodeT1_2 = Builder.DecisionNode(2, left=1, right=nodeT1_3)\n", "nodeT1_4 = Builder.DecisionNode(4, left=0, right=nodeT1_2)\n", "\n", "tree1 = Builder.DecisionTree(4, nodeT1_4, force_features_equal_to_binaries=True)\n", "\n", "nodeT2_4 = Builder.DecisionNode(4, left=0, right=1)\n", "nodeT2_1 = Builder.DecisionNode(1, left=0, right=nodeT2_4)\n", "nodeT2_2 = Builder.DecisionNode(2, left=nodeT2_1, right=1)\n", "\n", "tree2 = Builder.DecisionTree(4, nodeT2_2, force_features_equal_to_binaries=True) #4 features but only 3 used\n", "\n", "nodeT3_1_1 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT3_1_2 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT3_4_1 = Builder.DecisionNode(4, left=0, right=nodeT3_1_1)\n", "nodeT3_4_2 = Builder.DecisionNode(4, left=0, right=1)\n", "\n", "nodeT3_2_1 = Builder.DecisionNode(2, left=nodeT3_1_2, right=nodeT3_4_1)\n", "nodeT3_2_2 = Builder.DecisionNode(2, left=0, right=nodeT3_4_2)\n", "\n", "nodeT3_3_1 = Builder.DecisionNode(3, left=nodeT3_2_1, right=nodeT3_2_2)\n", "\n", "tree3 = Builder.DecisionTree(4, nodeT3_3_1, force_features_equal_to_binaries=True)\n", "forest = Builder.RandomForest([tree1, tree2, tree3], n_classes=2)" ] }, { "cell_type": "markdown", "id": "bad9b535", "metadata": {}, "source": [ "Then we compute a sufficient reasons for each of these two instances: " ] }, { "cell_type": "code", "execution_count": 2, "id": "0f5c98bf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "minimal: (1, 4)\n", 
"-------------------------------\n", "target_prediction: 0\n", "sufficient: (-1, -3)\n", "minimal: (-4,)\n" ] } ], "source": [ "explainer = Explainer.initialize(forest)\n", "explainer.set_instance((1,1,1,1))\n", "\n", "sufficient = explainer.sufficient_reason()\n", "assert explainer.is_sufficient_reason(sufficient)\n", "assert sufficient == (1, 4), \"The sufficient reason is not good !\"\n", "\n", "minimal = explainer.minimal_sufficient_reason()\n", "print(\"minimal:\", minimal)\n", "assert minimal == (1, 4), \"The minimal sufficient reason is not good !\"\n", "\n", "print(\"-------------------------------\")\n", "instance = (0,1,0,0)\n", "explainer.set_instance(instance)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "\n", "sufficient = explainer.sufficient_reason()\n", "print(\"sufficient:\", sufficient)\n", "assert sufficient == (-1, -3), \"The sufficient reason is not good !\"\n", "\n", "minimal = explainer.minimal_sufficient_reason()\n", "print(\"minimal:\", minimal)\n", "assert minimal == (-4, ), \"The minimal sufficient reason is not good !\" \n" ] }, { "cell_type": "markdown", "id": "e0420183", "metadata": {}, "source": [ "## Example from a Real dataset" ] }, { "cell_type": "markdown", "id": "03c8f44e", "metadata": {}, "source": [ "For this example, we take the [compas](/assets/notebooks/dataset/compas.csv) dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance. " ] }, { "cell_type": "code", "execution_count": 3, "id": "5a1c9c9b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " Number_of_Priors score_factor Age_Above_FourtyFive \\\n", "0 0 0 1 \n", "1 0 0 0 \n", "2 4 0 0 \n", "3 0 0 0 \n", "4 14 1 0 \n", "... ... ... ... 
\n", "6167 0 1 0 \n", "6168 0 0 0 \n", "6169 0 0 1 \n", "6170 3 0 0 \n", "6171 2 0 0 \n", "\n", " Age_Below_TwentyFive African_American Asian Hispanic \\\n", "0 0 0 0 0 \n", "1 0 1 0 0 \n", "2 1 1 0 0 \n", "3 0 0 0 0 \n", "4 0 0 0 0 \n", "... ... ... ... ... \n", "6167 1 1 0 0 \n", "6168 1 1 0 0 \n", "6169 0 0 0 0 \n", "6170 0 1 0 0 \n", "6171 1 0 0 1 \n", "\n", " Native_American Other Female Misdemeanor Two_yr_Recidivism \n", "0 0 1 0 0 0 \n", "1 0 0 0 0 1 \n", "2 0 0 0 0 1 \n", "3 0 1 0 1 0 \n", "4 0 0 0 0 1 \n", "... ... ... ... ... ... \n", "6167 0 0 0 0 0 \n", "6168 0 0 0 0 0 \n", "6169 0 1 0 0 0 \n", "6170 0 0 1 1 0 \n", "6171 0 0 1 0 1 \n", "\n", "[6172 rows x 12 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../../../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Evaluation ---------------\n", "method: HoldOut\n", "output: RF\n", "learner_type: Classification\n", "learner_options: {'max_depth': None, 'random_state': 0}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 65.71274298056156\n", "nTraining instances: 4320\n", "nTest instances: 1852\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 68\n", "\n", "--------------- Instances ----------------\n", "number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explainer\n", "\n", "learner = Learning.Scikitlearn(\"../../../dataset/compas.csv\", learner_type=Learning.CLASSIFICATION)\n", "model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.RF)\n", "instance, prediction = learner.get_instances(model, n=1, correct=True)" ] }, { "cell_type": "markdown", "id": "4cacbab0", "metadata": {}, "source": [ "This dataset is not very 
large and the computation of a sufficient reason is quite easy, but deriving a minimal one is much harder. Since the underlying solver (OptUx) does not provide a time-limit mode, we commented out the related code. " ] }, { "cell_type": "code", "execution_count": 4, "id": "b7691f19", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: [0 0 1 0 0 0 0 0 1 0 0]\n", "prediction: 0\n", "\n", "\n", "sufficient reason: (-1, -2, -3, -6, -11, -14)\n", "to features ('Number_of_Priors <= 0.5', 'score_factor <= 0.5', 'Age_Below_TwentyFive <= 0.5', 'African_American <= 0.5')\n", "is sufficient_reason (for max 50 checks): None\n", "\n" ] } ], "source": [ "explainer = Explainer.initialize(model, instance)\n", "print(\"instance:\", instance)\n", "print(\"prediction:\", prediction)\n", "print()\n", "sufficient_reason = explainer.sufficient_reason()\n", "print(\"\\nsufficient reason:\", sufficient_reason)\n", "print(\"to features\", explainer.to_features(sufficient_reason))\n", "print(\"is sufficient_reason (for max 50 checks): \", explainer.is_sufficient_reason(sufficient_reason, n_samples=50))\n", "print()\n" ] }, { "cell_type": "markdown", "id": "14fa4d1d", "metadata": {}, "source": [ "Other types of explanations are presented in the [Explanations Computation](/documentation/explanations/RFexplanations/) page." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }