{ "cells": [ { "cell_type": "markdown", "id": "b1db8c9a", "metadata": {}, "source": [ "# Majoritary Reasons " ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": [ "Let $f$ be a Boolean function represented by a random forest $RF$, $x$ be an instance and $1$ (resp. $0$) the prediction of $RF$ on $x$ ($f(x) = 1$ (resp $f(x)=0$)).\n", "A **majoritary reason** for $x$ is a term $t$ covering $x$, such that $t$ \n", "is an implicant of at least a majoritary of decision trees $T_i$ that is minimal w.r.t. set inclusion." ] }, { "cell_type": "markdown", "id": "701d48f7", "metadata": {}, "source": [ "In general, the notions of majoritary reason and of sufficient reason do not coincide. Indeed, a sufficient reason is a prime implicant (covering $x$) of the forest $F$, while a majoritary reason is an implicant $t$ (covering $x$) of a majority of decision trees in the forest $F$. More information about majoritary reasons can be found in the article [Trading Complexity for Sparsity in Random Forest Explanations](https://ojs.aaai.org/index.php/AAAI/article/view/20484)." ] }, { "cell_type": "markdown", "id": "adad5906", "metadata": {}, "source": [ "| <ExplainerRF Object>.majoritary_reason(*, n=1, n_iterations=50, time_limit=None, seed=0): | \n", "| :----------- | \n", "| If ```n``` is set to 1, this method calls a greedy algorithm to compute a majoritary reason. The algorithm is run ```n_iterations``` times and the smallest majoritary reason found is returned. On the contrary, if ```n``` is set to ```Explainer.ALL```, a CNF formula associated to the random forest is created and a SAT solver is called to solve it. Each solution corresponds to a majoritary reason.

Excluded features are supported. The reasons are given as binary variables; use the ```to_features``` method if you want a representation based on the original features.|\n", "| time_limit ```Integer``` ```None```: The time limit of the method in seconds. Set this to ```None``` to give this process an infinite amount of time. Default value is ```None```.|\n", "| n ```Integer```: The number of majoritary reasons computed. Currently, only n=1 and n=Explainer.ALL are supported. Default value is 1.|\n", "| n_iterations ```Integer```: Only used if n=1. The number of iterations performed by the greedy algorithm. Default value is 50.|\n", "| seed ```Integer```: The seed used by the greedy algorithm. Set this parameter to 0 to use a random seed. Default value is 0.|\n" ] },
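{ "cell_type": "markdown", "id": "f3a9c0e1", "metadata": {}, "source": [ "For instance, once an explainer and an instance have been set up, a majoritary reason can be computed along the following lines. This is only a sketch: ```model``` and ```instance``` are placeholders here, and full, runnable examples are given later on this page.\n", "\n", "```python\n", "explainer = Explainer.initialize(model)   # model: a random forest (placeholder)\n", "explainer.set_instance(instance)          # instance: the instance to explain (placeholder)\n", "reason = explainer.majoritary_reason(n_iterations=100, seed=123)\n", "print(explainer.to_features(reason))      # representation based on the original features\n", "```" ] },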

{ "cell_type": "markdown", "id": "714c0d17", "metadata": {}, "source": [ "As for minimal sufficient reasons, a natural way of improving the quality of majoritary reasons is to seek the most parsimonious ones. A **minimal majoritary reason** for $x$ is a majoritary reason for $x$ that\n", "contains a minimal number of literals. In other words, a **minimal majoritary reason** has a minimal size. " ] }, { "cell_type": "markdown", "id": "48840e70", "metadata": {}, "source": [ "| <ExplainerRF Object>.minimal_majoritary_reason(*, n=1, time_limit=None): | \n", "| :----------- | \n", "|This method considers a CNF formula representing the random forest as hard clauses and adds binary variables representing the instance as unary soft clauses with weights equal to 1. Several calls to a MAXSAT solver (OPENWBO) are performed and the result of each call is a minimal majoritary reason. The method prevents the same reason from being found more than once by adding clauses (called blocking clauses) between invocations.<br><br>Returns ```n``` minimal majoritary reasons of the current instance in a Tuple (when ```n``` is set to 1, the reason is returned directly rather than in a Tuple). Excluded features are supported. The reasons are given as binary variables; use the ```to_features``` method if you want to convert them into features.|\n", "| n ```Integer```: The number of majoritary reasons computed. Currently, only n=1 and n=Explainer.ALL are supported. Default value is 1.|\n", "| time_limit ```Integer``` ```None```: The time limit of the method in seconds. Set this to ```None``` to give this process an infinite amount of time. Default value is ```None```.|" ] },
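{ "cell_type": "markdown", "id": "a7d45b2c", "metadata": {}, "source": [ "Since the underlying MAXSAT calls can be expensive on large forests, the ```time_limit``` parameter documented above can be used to bound the computation. A sketch, assuming an ```explainer``` and an instance are already set as in the examples below (the exact behaviour when the limit is reached is not detailed here):\n", "\n", "```python\n", "minimal = explainer.minimal_majoritary_reason(time_limit=60)  # give the solver at most 60 seconds\n", "print(explainer.to_features(minimal))\n", "```" ] },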

{ "cell_type": "markdown", "id": "bc7a977a", "metadata": {}, "source": [ "One can also find preferred majoritary reasons. Indeed, the user may prefer reasons containing some features and can provide weights in order to select some reasons instead of others. Please take a look at the [Preferences](/documentation/explainer/preferences/) page for more information on preference handling." ] }, { "cell_type": "markdown", "id": "264f0d82", "metadata": {}, "source": [ "| <ExplainerRF Object>.prefered_majoritary_reason(*, method, n=1, time_limit=None, weights=None, features_partition=None): | \n", "| :----------- | \n", "|This method considers a CNF formula representing the random forest as hard clauses and adds binary variables representing the instance as unary soft clauses whose weights depend on the ```method``` used. If the method is ```PreferredReasonMethod.WEIGHTS```, the weights are given by the parameter ```weights```; otherwise this parameter is unused. If the method is ```PreferredReasonMethod.INCLUSION_PREFERRED```, the partition of features is given by the parameter ```features_partition```; otherwise this parameter is unused. To derive a preferred reason, several calls to a MAXSAT solver (OPENWBO) are performed and the result of each call is a preferred majoritary reason. The method prevents the same reason from being found more than once by adding clauses (called blocking clauses) between invocations.<br><br>Returns ```n``` preferred majoritary reasons of the current instance in a Tuple (when ```n``` is set to 1, the reason is returned directly rather than in a Tuple). Excluded features are supported. The reasons are given as binary variables; use the ```to_features``` method if you want a representation based on the original features.|\n", "| method ```PreferredReasonMethod.WEIGHTS``` ```PreferredReasonMethod.SHAPLEY``` ```PreferredReasonMethod.FEATURE_IMPORTANCE``` ```PreferredReasonMethod.WORD_FREQUENCY``` ```PreferredReasonMethod.INCLUSION_PREFERRED```: The method used to derive preferred majoritary reasons.|\n", "| n ```Integer```: The number of majoritary reasons computed. Currently, only n=1 and n=Explainer.ALL are supported. Default value is 1.|\n", "| time_limit ```Integer``` ```None```: The time limit of the method in seconds. Set this to ```None``` to give this process an infinite amount of time. Default value is ```None```.|\n", "| weights ```List```: The weights (a list of floats, one per feature) used to discriminate features. Only useful when ```method``` is ```PreferredReasonMethod.WEIGHTS```. Default value is ```None```.|\n", "| features_partition ```List``` of ```List```: The partition of features. The first elements are preferred to the second ones, and so on. Only useful when ```method``` is ```PreferredReasonMethod.INCLUSION_PREFERRED```. Default value is ```None```.|" ] }, { "cell_type": "markdown", "id": "4fb5f191", "metadata": {}, "source": [ "The PyXAI library also provides a way to test that a reason actually is a majoritary reason:" ] }, { "cell_type": "markdown", "id": "3a17fe2f", "metadata": {}, "source": [ "| <ExplainerRF Object>.is_majoritary_reason(reason): | \n", "| :----------- | \n", "| This method checks whether a reason is a majoritary reason. It first calls the method ```is_implicant``` to check whether this reason leads to the correct prediction. Then it verifies the minimality of the reason in the sense of set inclusion: it deletes a literal from the reason, checks with ```is_implicant``` that the resulting term is no longer an implicant, and puts the literal back. This operation is repeated for every literal of the reason. The method is deterministic and returns ```True``` or ```False```.|\n", "| reason ```List``` of ```Integer```: The reason to be checked.|" ] },
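{ "cell_type": "markdown", "id": "5e8c91aa", "metadata": {}, "source": [ "The check described above can be pictured as follows. This is only a schematic sketch, not the library's implementation; it assumes that ```is_implicant``` behaves as documented:\n", "\n", "```python\n", "def check_majoritary_reason(explainer, reason):\n", "    # The reason must lead to the prediction (implicant of a majority of trees).\n", "    if not explainer.is_implicant(reason):\n", "        return False\n", "    # Minimality w.r.t. set inclusion: dropping any literal must break the property above.\n", "    for literal in reason:\n", "        weakened = tuple(l for l in reason if l != literal)\n", "        if explainer.is_implicant(weakened):\n", "            return False\n", "    return True\n", "```" ] },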
{ "cell_type": "markdown", "id": "d487e10f", "metadata": {}, "source": [ "## Example from Hand-Crafted Trees" ] }, { "cell_type": "markdown", "id": "8b37b50b", "metadata": {}, "source": [ "For this example, we take the random forest of the [Building Models](/documentation/learning/builder/RFbuilder/) page, which uses 4 binary features ($x_1$, $x_2$, $x_3$ and $x_4$). " ] }, { "cell_type": "markdown", "id": "6b4b8844", "metadata": {}, "source": [ "The following figure shows in red and bold a minimal majoritary reason $(x_2, x_3, x_4)$ for the instance $(1,1,1,1)$. \n", "*Figure: RFmajoritary1*\n", "\n", "For the majoritary reason $(x_2, x_3, x_4)$, we can see that even if $T_1$ leads to a prediction equal to 0 (when we have $-x_1$), there is always a majority of decision trees (i.e. $T_2$ and $T_3$) that give a prediction of 1. \n", "\n", "The next figure shows in blue and bold a minimal majoritary reason $(x_2, -x_4)$ for the instance $(0,1,0,0)$. \n", "\n", "*Figure: RFmajoritary2*\n", "\n", "For $(x_2, -x_4)$, $T_2$ always gives a prediction of 1 while $T_1$ and $T_3$ always give a prediction of 0. So in all cases we have a majority of trees ($T_1$ and $T_3$) that lead to the right prediction (0).\n", "\n", "Now, we show how to get them with PyXAI. We start by building the random forest:" ] }, { "cell_type": "code", "execution_count": 1, "id": "db173b61", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explainer\n", "\n", "nodeT1_1 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT1_3 = Builder.DecisionNode(3, left=0, right=nodeT1_1)\n", "nodeT1_2 = Builder.DecisionNode(2, left=1, right=nodeT1_3)\n", "nodeT1_4 = Builder.DecisionNode(4, left=0, right=nodeT1_2)\n", "tree1 = Builder.DecisionTree(4, nodeT1_4, force_features_equal_to_binaries=True)\n", "\n", "nodeT2_4 = Builder.DecisionNode(4, left=0, right=1)\n", "nodeT2_1 = Builder.DecisionNode(1, left=0, right=nodeT2_4)\n", "nodeT2_2 = Builder.DecisionNode(2, left=nodeT2_1, right=1)\n", "tree2 = Builder.DecisionTree(4, nodeT2_2, force_features_equal_to_binaries=True) #4 features but only 3 used\n", "\n", "nodeT3_1_1 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT3_1_2 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT3_4_1 = Builder.DecisionNode(4, left=0, right=nodeT3_1_1)\n", "nodeT3_4_2 = Builder.DecisionNode(4, left=0, right=1)\n", "nodeT3_2_1 = Builder.DecisionNode(2, left=nodeT3_1_2, right=nodeT3_4_1)\n", "nodeT3_2_2 = Builder.DecisionNode(2, left=0, right=nodeT3_4_2)\n", "nodeT3_3_1 = Builder.DecisionNode(3, left=nodeT3_2_1, right=nodeT3_2_2)\n", "tree3 = Builder.DecisionTree(4, nodeT3_3_1, force_features_equal_to_binaries=True)\n", "\n", "forest = Builder.RandomForest([tree1, tree2, tree3], n_classes=2)" ] }, { "cell_type": "markdown", "id": "45cce4b2", "metadata": {}, "source": [ "We compute a majoritary reason for each of these two instances: " ] }, { "cell_type": "code", "execution_count": 2, "id": "f35d03a3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "target_prediction: 1\n", "majoritary: (1, 2, 4)\n", "minimal: (2, 3, 4)\n", "-------------------------------\n", "target_prediction: 0\n", "majoritary: (-1, -4)\n", "minimal: (2, -4)\n" ] } ], "source": [ "explainer = Explainer.initialize(forest)\n", "explainer.set_instance((1,1,1,1))\n", "\n", "majoritary = explainer.majoritary_reason(seed=1234)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"majoritary:\", majoritary)\n", "assert explainer.is_majoritary_reason(majoritary)\n", "\n", "\n", "minimal = explainer.minimal_majoritary_reason()\n", "print(\"minimal:\", minimal)\n", "assert minimal == (2, 3, 4), \"The minimal reason is not correct!\"\n", "\n", "print(\"-------------------------------\")\n", "instance = (0,1,0,0)\n", "explainer.set_instance(instance)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "\n", "majoritary = explainer.majoritary_reason()\n", "print(\"majoritary:\", majoritary)\n", "\n", "minimal = explainer.minimal_majoritary_reason()\n", "print(\"minimal:\", minimal)\n", "assert minimal == (2, -4), \"The minimal reason is not correct!\"\n" ] }, { "cell_type": "markdown", "id": "e92e4ae4", "metadata": {}, "source": [ "## Example from a Real Dataset" ] }, { "cell_type": "markdown", "id": "d58684a0", "metadata": {}, "source": [ "For this example, we take the [compas](/assets/notebooks/dataset/compas.csv) dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a correctly classified instance. 
" ] }, { "cell_type": "code", "execution_count": 3, "id": "ab7788c9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " Number_of_Priors score_factor Age_Above_FourtyFive \n", "0 0 0 1 \\\n", "1 0 0 0 \n", "2 4 0 0 \n", "3 0 0 0 \n", "4 14 1 0 \n", "... ... ... ... \n", "6167 0 1 0 \n", "6168 0 0 0 \n", "6169 0 0 1 \n", "6170 3 0 0 \n", "6171 2 0 0 \n", "\n", " Age_Below_TwentyFive African_American Asian Hispanic \n", "0 0 0 0 0 \\\n", "1 0 1 0 0 \n", "2 1 1 0 0 \n", "3 0 0 0 0 \n", "4 0 0 0 0 \n", "... ... ... ... ... \n", "6167 1 1 0 0 \n", "6168 1 1 0 0 \n", "6169 0 0 0 0 \n", "6170 0 1 0 0 \n", "6171 1 0 0 1 \n", "\n", " Native_American Other Female Misdemeanor Two_yr_Recidivism \n", "0 0 1 0 0 0 \n", "1 0 0 0 0 1 \n", "2 0 0 0 0 1 \n", "3 0 1 0 1 0 \n", "4 0 0 0 0 1 \n", "... ... ... ... ... ... \n", "6167 0 0 0 0 0 \n", "6168 0 0 0 0 0 \n", "6169 0 1 0 0 0 \n", "6170 0 0 1 1 0 \n", "6171 0 0 1 0 1 \n", "\n", "[6172 rows x 12 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../../../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 12\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Evaluation ---------------\n", "method: HoldOut\n", "output: RF\n", "learner_type: Classification\n", "learner_options: {'max_depth': None, 'random_state': 0}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 65.71274298056156\n", "nTraining instances: 4320\n", "nTest instances: 1852\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 68\n", "\n", "--------------- Instances ----------------\n", "number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explainer\n", "\n", "learner = Learning.Scikitlearn(\"../../../dataset/compas.csv\", learner_type=Learning.CLASSIFICATION)\n", "model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.RF)\n", "instance, prediction = learner.get_instances(model, n=1, correct=True)" ] }, { "cell_type": "markdown", "id": "a20938fa", "metadata": {}, "source": [ "We compute a majoritary reason for the instance and a minimal one." 
] }, { "cell_type": "code", "execution_count": 4, "id": "1ccce2fc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: [0 0 1 0 0 0 0 0 1 0 0]\n", "prediction: 0\n", "\n", "\n", "majoritary reason: (-1, -2, -3, -4, -6, -11, -14, -18)\n", "len majoritary reason: 8\n", "to features ('Number_of_Priors <= 0.5', 'score_factor <= 0.5', 'Age_Below_TwentyFive <= 0.5', 'African_American <= 0.5', 'Female <= 0.5')\n", "is majoritary reason: True\n", "\n", "\n", "minimal: (-1, -2, -3, -4, -6, -11, -13, -14)\n", "minimal: 8\n", "to features ('Number_of_Priors <= 0.5', 'score_factor <= 0.5', 'Age_Below_TwentyFive <= 0.5', 'African_American <= 0.5', 'Female <= 0.5')\n", "is majoritary reason: True\n" ] } ], "source": [ "explainer = Explainer.initialize(model, instance)\n", "print(\"instance:\", instance)\n", "print(\"prediction:\", prediction)\n", "print()\n", "majoritary_reason = explainer.majoritary_reason()\n", "#for s in sufficient_reasons:\n", "print(\"\\nmajoritary reason:\", majoritary_reason)\n", "print(\"len majoritary reason:\", len(majoritary_reason))\n", "print(\"to features\", explainer.to_features(majoritary_reason))\n", "print(\"is majoritary reason: \", explainer.is_majoritary_reason(majoritary_reason))\n", "print()\n", "minimal = explainer.minimal_majoritary_reason()\n", "print(\"\\nminimal:\", minimal)\n", "print(\"minimal:\", len(minimal))\n", "print(\"to features\", explainer.to_features(majoritary_reason))\n", "print(\"is majoritary reason: \", explainer.is_majoritary_reason(majoritary_reason))\n" ] }, { "cell_type": "markdown", "id": "7b9c2c4d", "metadata": {}, "source": [ "Other types of explanations are presented in the [Explanations Computation](/documentation/explanations/RFexplanations/) page." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }