{ "cells": [ { "cell_type": "markdown", "id": "b1db8c9a", "metadata": {}, "source": [ "# Majoritary Reasons " ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": [ "Let $f$ be a Boolean function represented by a random forest $RF$, $x$ be an instance and $1$ (resp. $0$) the prediction of $RF$ on $x$, i.e. $f(x) = 1$ (resp. $f(x) = 0$).\n", "A **majoritary reason** for $x$ is a term $t$ covering $x$ such that $t$ \n", "is an implicant of at least a majority of the decision trees $T_i$ of $RF$ (resp. of their negations $\\neg T_i$), and $t$ is minimal w.r.t. set inclusion." ] }, { "cell_type": "markdown", "id": "701d48f7", "metadata": {}, "source": [ "In general, the notions of majoritary reason and of sufficient reason do not coincide. Indeed, a sufficient reason is a prime implicant (covering $x$) of the forest $RF$, while a majoritary reason is an implicant $t$ (covering $x$) of a majority of the decision trees in $RF$. More information about majoritary reasons can be found in the article [Trading Complexity for Sparsity in Random Forest Explanations](https://ojs.aaai.org/index.php/AAAI/article/view/20484).\n", "\n", "The function ```majoritary_reason``` computes this kind of explanation.\n", "\n", "The library also provides the function ```is_majoritary_reason``` to check whether a reason is majoritary." ] }, { "cell_type": "markdown", "id": "ab6ae05a-1648-4bd2-826d-6e54b163e7ba", "metadata": {}, "source": [ "### Minimal majoritary reason \n", "\n", "As for minimal sufficient reasons, a natural way of improving the quality of majoritary reasons is to look for the most parsimonious ones. A **minimal majoritary reason** for $x$ is a majoritary reason for $x$ that\n", "contains a minimal number of literals. In other words, a **minimal majoritary reason** has the smallest size among all majoritary reasons for $x$. 
\n", "\n", "The function ```minimal_majoritary_reason``` computes this kind of explanation.\n" ] }, { "cell_type": "markdown", "id": "bc7a977a", "metadata": {}, "source": [ "### Preferred reasons\n", "\n", "One can also find preferred majoritary reasons. Indeed, the user may prefer reasons containing some features and can provide weights in order to favor some reasons over others. Please take a look at the [Preferences](/documentation/explainer/preferences/) page for more information on preference handling.\n", "\n", "The function ```preferred_majoritary_reason``` computes this kind of explanation.\n" ] }, { "cell_type": "markdown", "id": "d487e10f", "metadata": {}, "source": [ "## Example from Hand-Crafted Trees" ] }, { "cell_type": "markdown", "id": "8b37b50b", "metadata": {}, "source": [ "For this example, we take the random forest from the [Building Models](/documentation/learning/builder/RFbuilder/) page, which is built on 4 binary features ($x_1$, $x_2$, $x_3$ and $x_4$). " ] }, { "cell_type": "markdown", "id": "6b4b8844", "metadata": {}, "source": [ "The following figure shows in red and bold a minimal majoritary reason $(x_2, x_3, x_4)$ for the instance $(1,1,1,1)$. \n", "\n", "[Figure: RFmajoritary1]\n", "\n", "For the majoritary reason $(x_2, x_3, x_4)$, we can see that even if $T_1$ leads to a prediction equal to 0 (when $x_1 = 0$, i.e. with $-x_1$), there is always a majority of decision trees (namely $T_2$ and $T_3$) that give a prediction of 1. \n", "\n", "The next figure shows in blue and bold a minimal majoritary reason $(x_2, -x_4)$ for the instance $(0,1,0,0)$. \n", "\n", "[Figure: RFmajoritary2]\n", "\n", "For $(x_2, -x_4)$, $T_2$ always gives a prediction of 1 while $T_1$ and $T_3$ always give a prediction of 0. So in all cases a majority of trees ($T_1$ and $T_3$) leads to the right prediction (0).\n", "\n", "We now show how to compute these reasons with PyXAI. 
We start by building the random forest:" ] }, { "cell_type": "code", "execution_count": 1, "id": "db173b61", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explaining\n", "\n", "# DecisionNode(feature, left, right): in this example the left child is followed when the feature is 0,\n", "# the right child when it is 1; integer children (0 or 1) are leaves, i.e. predictions.\n", "\n", "# Tree T1\n", "nodeT1_1 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT1_3 = Builder.DecisionNode(3, left=0, right=nodeT1_1)\n", "nodeT1_2 = Builder.DecisionNode(2, left=1, right=nodeT1_3)\n", "nodeT1_4 = Builder.DecisionNode(4, left=0, right=nodeT1_2)\n", "tree1 = Builder.DecisionTree(4, nodeT1_4, force_features_equal_to_binaries=True)\n", "\n", "# Tree T2\n", "nodeT2_4 = Builder.DecisionNode(4, left=0, right=1)\n", "nodeT2_1 = Builder.DecisionNode(1, left=0, right=nodeT2_4)\n", "nodeT2_2 = Builder.DecisionNode(2, left=nodeT2_1, right=1)\n", "tree2 = Builder.DecisionTree(4, nodeT2_2, force_features_equal_to_binaries=True)  # 4 features but only 3 used\n", "\n", "# Tree T3\n", "nodeT3_1_1 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT3_1_2 = Builder.DecisionNode(1, left=0, right=1)\n", "nodeT3_4_1 = Builder.DecisionNode(4, left=0, right=nodeT3_1_1)\n", "nodeT3_4_2 = Builder.DecisionNode(4, left=0, right=1)\n", "nodeT3_2_1 = Builder.DecisionNode(2, left=nodeT3_1_2, right=nodeT3_4_1)\n", "nodeT3_2_2 = Builder.DecisionNode(2, left=0, right=nodeT3_4_2)\n", "nodeT3_3_1 = Builder.DecisionNode(3, left=nodeT3_2_1, right=nodeT3_2_2)\n", "tree3 = Builder.DecisionTree(4, nodeT3_3_1, force_features_equal_to_binaries=True)\n", "\n", "# Random forest made of the three trees (binary classification)\n", "forest = Builder.RandomForest([tree1, tree2, tree3], n_classes=2)" ] }, { "cell_type": "markdown", "id": "45cce4b2", "metadata": {}, "source": [ "We compute a majoritary reason and a minimal majoritary reason for each of these two instances: " ] }, { "cell_type": "code", "execution_count": 4, "id": "f35d03a3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "target_prediction: 1\n", "majoritary: (1, 2, 4)\n", "minimal: (2, 3, 4)\n", "-------------------------------\n", "target_prediction: 0\n", "majoritary: (-1, 
-4)\n", "minimal: (2, -4)\n" ] } ], "source": [ "# First instance: (1,1,1,1)\n", "explainer = Explaining.initialize(forest)\n", "explainer.set_instance((1,1,1,1))\n", "\n", "# One majoritary reason, checked with is_majoritary_reason\n", "majoritary = explainer.majoritary_reason(seed=1234)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"majoritary:\", majoritary)\n", "assert explainer.is_majoritary_reason(majoritary)\n", "\n", "# A majoritary reason of minimal size\n", "minimal = explainer.minimal_majoritary_reason()\n", "print(\"minimal:\", minimal)\n", "\n", "print(\"-------------------------------\")\n", "# Second instance: (0,1,0,0)\n", "instance = (0,1,0,0)\n", "explainer.set_instance(instance)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "\n", "majoritary = explainer.majoritary_reason()\n", "print(\"majoritary:\", majoritary)\n", "\n", "minimal = explainer.minimal_majoritary_reason()\n", "print(\"minimal:\", minimal)\n" ] }, { "cell_type": "markdown", "id": "e92e4ae4", "metadata": {}, "source": [ "## Example from a Real Dataset" ] }, { "cell_type": "markdown", "id": "d58684a0", "metadata": {}, "source": [ "For this example, we take the [compas](/assets/notebooks/dataset/compas.csv) dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance. 
" ] }, { "cell_type": "code", "execution_count": 1, "id": "ab7788c9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: classification\n", "Instances type: tabular\n", "Labels type: classes\n", "\n", "Dataset path: ../../../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 11\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Model creation, fitting and evaluation ---------------\n", "Splitting method: hold-out\n", "Problem type: classification\n", "Models type: random-forest\n", "model_parameters: {}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "Metrics:\n", " sklearn_confusion_matrix: [[595, 216], [321, 411]]\n", " precision: 65.55023923444976\n", " recall: 56.14754098360656\n", " f1_score: 60.48565121412804\n", " specificity: 73.36621454993835\n", " true_positive: 411\n", " true_negative: 595\n", " false_positive: 216\n", " false_negative: 321\n", " accuracy: 65.19766688269605\n", "Number of Training instances: 4629\n", "Number of Testing instances: 1543\n", "\n", "--------------- Explainer ----------------\n", "For the split number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 69\n", "\n", "--------------- Instances ----------------\n", "Number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explaining\n", "\n", "learner = Learning.Scikitlearn(\"../../../dataset/compas.csv\", problem_type='classification')\n", "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.RF)\n", "instance, prediction = learner.get_instances(model, n=1, is_correct=True)" ] }, { "cell_type": "markdown", "id": "a20938fa", "metadata": {}, "source": [ "We compute a majoritary reason for the instance and a minimal one." 
] }, { "cell_type": "code", "execution_count": 2, "id": "1ccce2fc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: Misdemeanor 0\n", "Number_of_Priors 0\n", "score_factor 0\n", "Age_Above_FourtyFive 1\n", "Age_Below_TwentyFive 0\n", "African_American 0\n", "Asian 0\n", "Hispanic 0\n", "Native_American 0\n", "Other 1\n", "Female 0\n", "Name: 0, dtype: int64\n", "prediction: 0\n", "\n", "\n", "majoritary reason: (-2, 4, -5, -6, 7, -10, -11, -14, -15, -17, -29)\n", "len majoritary reason: 11\n", "to features ['Number_of_Priors <= 0.5', 'score_factor <= 0.5', 'Age_Above_FourtyFive > 0.5', 'Age_Below_TwentyFive <= 0.5', 'African_American <= 0.5', 'Other > 0.5', 'Female <= 0.5']\n", "is majoritary reason: True\n", "\n", "\n", "minimal: (-1, -2, -3, 4, -5, -8, -10, -11, -15, -17)\n", "len minimal: 10\n", "to features ['Number_of_Priors <= 0.5', 'score_factor <= 0.5', 'Age_Above_FourtyFive > 0.5', 'Age_Below_TwentyFive <= 0.5', 'African_American <= 0.5', 'Other > 0.5', 'Female <= 0.5']\n", "is majoritary reason: True\n" ] } ], "source": [ "explainer = Explaining.initialize(model, instance)\n", "print(\"instance:\", instance)\n", "print(\"prediction:\", prediction)\n", "print()\n", "# One majoritary reason, its size, its features and a validity check\n", "majoritary_reason = explainer.majoritary_reason()\n", "print(\"\\nmajoritary reason:\", majoritary_reason)\n", "print(\"len majoritary reason:\", len(majoritary_reason))\n", "print(\"to features\", explainer.to_features(majoritary_reason))\n", "print(\"is majoritary reason: \", explainer.is_majoritary_reason(majoritary_reason))\n", "print()\n", "# Same report for a majoritary reason of minimal size\n", "minimal = explainer.minimal_majoritary_reason()\n", "print(\"\\nminimal:\", minimal)\n", "print(\"len minimal:\", len(minimal))\n", "print(\"to features\", explainer.to_features(minimal))\n", "print(\"is majoritary reason: \", explainer.is_majoritary_reason(minimal))\n" ] }, { "cell_type": "markdown", "id": "7b9c2c4d", "metadata": {}, "source": [ "Other types 
of explanations are presented in the [Explanations Computation](/documentation/explanations/RFexplanations/) page." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }