{ "cells": [ { "cell_type": "markdown", "id": "b1db8c9a", "metadata": {}, "source": [ "# Sufficient Reasons" ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": [ "Let $f$ be a Boolean function represented by a decision tree $T$, let $x$ be an instance, and let $p$ be the prediction of $T$ on $x$ (i.e., $f(x) = p$). A **sufficient reason** for $x$ is a term over the literals of the binary representation of $x$ (so it covers $x$) that is a prime implicant of $f$ if $p = 1$, or of $\\neg f$ if $p = 0$.\n", "\n", "In other words, a **sufficient reason** for an instance $x$ given a class described by a Boolean function $f$ is a subset $t$ of the characteristics of $x$ that is minimal w.r.t. set inclusion and such that any instance $x'$ sharing this set $t$ of characteristics is classified by $f$ as $x$ is.\n", "\n", "The function ```ExplainerDT.sufficient_reason``` computes this kind of explanation.\n", "\n", "The library can also check whether a reason is sufficient, using the function ```is_sufficient_reason```." ] }, { "cell_type": "markdown", "id": "6119a2c9-7ff8-4d6f-816e-cf85251fb753", "metadata": {}, "source": [ "### Minimal Sufficient Reason" ] }, { "cell_type": "markdown", "id": "b326e678", "metadata": {}, "source": [ "A sufficient reason is minimal w.r.t. set inclusion, i.e., removing any literal from it yields a term that no longer guarantees the prediction. A **minimal sufficient reason** for $x$ is a sufficient reason for $x$ that\n", "contains a minimum number of literals; in other words, a **minimal sufficient reason** has the smallest size among all sufficient reasons for $x$.\n", "\n", "The function ```ExplainerDT.minimal_sufficient_reason``` computes this kind of explanation." ] }, { "cell_type": "markdown", "id": "0b280433-c872-4012-9d3b-56b311f025ca", "metadata": {}, "source": [ "### Preferences over Sufficient Reasons" ] }, { "cell_type": "markdown", "id": "f38f0ac8", "metadata": {}, "source": [ "One can also compute preferred sufficient reasons. 
Indeed, the user may prefer reasons containing some features rather than others, and can provide weights to discriminate between features. See the [Preferences](/documentation/explainer/preferences/) page for more information.\n", "\n", "The function ```preferred_sufficient_reason``` computes this kind of explanation." ] }, { "cell_type": "markdown", "id": "8d9c8f14-fdaa-435b-8c0d-a3b44fed5247", "metadata": {}, "source": [ "### Other Methods" ] }, { "cell_type": "markdown", "id": "c708202d", "metadata": {}, "source": [ "Recall that the literals of a binary representation represent the conditions of the form \"*feature* *operator* *threshold* ?\" (such as \"$x_4 \\ge 0.5$ ?\") implied by an instance. A literal $l$ of a binary representation is a **necessary feature** for $x$ if and only if $l$ belongs to every sufficient reason $t$ for $x$. In contrast, a literal $l$ of a binary representation is a **relevant feature** for $x$ if and only if $l$ belongs to at least one sufficient reason $t$ for $x$.\n", "\n", "PyXAI provides methods to compute them:\n", "\n", " - ```necessary_literals```.\n", " - ```relevant_literals```." ] }, { "cell_type": "markdown", "id": "6d4deafe", "metadata": {}, "source": [ "For a given instance, it can also be useful to count its sufficient reasons, either in total or per literal of the binary representation. PyXAI provides two methods for this:\n", "\n", " - ```n_sufficient_reasons```.\n", " - ```n_sufficient_reasons_per_attribute```." ] }, { "cell_type": "markdown", "id": "273580b5", "metadata": {}, "source": [ "More information about sufficient reasons and minimal sufficient reasons can be found in the paper [On the Explanatory Power of Decision Trees](https://arxiv.org/abs/2108.05266).\n", "The basic methods ([``initialize``](/documentation/api/modules/explaining/), ```set_instance```, ```to_features```, ```is_reason```, ...) 
of the ```Explainer``` module used in the next examples are described in the [Explainer Principles](/documentation/explainer/) page." ] }, { "cell_type": "markdown", "id": "869cba3c", "metadata": {}, "source": [ "## Example from a Hand-Crafted Tree" ] }, { "cell_type": "markdown", "id": "6a557012", "metadata": {}, "source": [ "For this example, we take the decision tree from the [Building Models](/documentation/learning/builder/DTbuilder/) page, which has $4$ binary features ($x_1$, $x_2$, $x_3$ and $x_4$).\n", "\n", "The following figure shows in red and bold a minimal sufficient reason $(x_1, x_4)$ for the instance $(1,1,1,1)$.\n", "\n", "The next figure shows in blue and bold a minimal sufficient reason $(\\neg x_4)$ for the instance $(0,0,0,0)$.\n", "\n", "We now show how to obtain these reasons with PyXAI. We start by building the decision tree:" ] }, { "cell_type": "code", "execution_count": 6, "id": "745fbf2c", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explaining\n", "\n", "# Leaves are the classes 0 and 1; inner nodes test one of the 4 binary features.\n", "node_x4_1 = Builder.DecisionNode(4, left=0, right=1)\n", "node_x4_2 = Builder.DecisionNode(4, left=0, right=1)\n", "node_x4_3 = Builder.DecisionNode(4, left=0, right=1)\n", "node_x4_4 = Builder.DecisionNode(4, left=0, right=1)\n", "node_x4_5 = Builder.DecisionNode(4, left=0, right=1)\n", "\n", "node_x3_1 = Builder.DecisionNode(3, left=0, right=node_x4_1)\n", "node_x3_2 = Builder.DecisionNode(3, left=node_x4_2, right=node_x4_3)\n", "node_x3_3 = Builder.DecisionNode(3, left=node_x4_4, right=node_x4_5)\n", "\n", "node_x2_1 = Builder.DecisionNode(2, left=0, right=node_x3_1)\n", "node_x2_2 = Builder.DecisionNode(2, left=node_x3_2, right=node_x3_3)\n", "\n", "node_x1_1 = Builder.DecisionNode(1, left=node_x2_1, right=node_x2_2)\n", "\n", "tree = Builder.DecisionTree(4, node_x1_1, force_features_equal_to_binaries=True)" ] }, { "cell_type": "markdown", "id": "bad9b535", "metadata": {}, "source": [ "We then compute the sufficient 
reasons for each of the two instances:" ] }, { "cell_type": "code", "execution_count": 8, "id": "0f5c98bf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sufficient_reasons: ((1, 4), (2, 3, 4))\n", "to_features: ['f1 >= 0.5', 'f4 >= 0.5']\n", "to_features: ['f2 >= 0.5', 'f3 >= 0.5', 'f4 >= 0.5']\n", "[['v', '1', '-2', '-3', '']]\n", "minimal_sufficient_reason: (1, 4)\n", "-------------------------------\n", "sufficient_reasons: ((-4,), (-1, -2), (-1, -3))\n", "to_features: ['f4 < 0.5']\n", "to_features: ['f1 < 0.5', 'f2 < 0.5']\n", "to_features: ['f1 < 0.5', 'f3 < 0.5']\n", "[['v', '-1', '2', '-3', '-4', '5', '-6', '-7', '']]\n", "minimal_sufficient_reasons: (-4,)\n" ] } ], "source": [ "explainer = Explaining.initialize(tree)\n", "explainer.set_instance((1,1,1,1))\n", "\n", "sufficient_reasons = explainer.sufficient_reason(n=Explaining.ALL)\n", "print(\"sufficient_reasons:\", sufficient_reasons)\n", "assert sufficient_reasons == ((1, 4), (2, 3, 4)), \"The sufficient reasons are not correct!\"\n", "\n", "for sufficient in sufficient_reasons:\n", "    print(\"to_features:\", explainer.to_features(sufficient))\n", "    assert explainer.is_sufficient_reason(sufficient), \"This has to be a sufficient reason!\"\n", "\n", "minimals = explainer.minimal_sufficient_reason()\n", "print(\"minimal_sufficient_reason:\", minimals)\n", "assert minimals == (1, 4), \"The minimal sufficient reasons are not correct!\"\n", "\n", "print(\"-------------------------------\")\n", "\n", "explainer.set_instance((0,0,0,0))\n", "\n", "sufficient_reasons = explainer.sufficient_reason(n=Explaining.ALL)\n", "print(\"sufficient_reasons:\", sufficient_reasons)\n", "assert sufficient_reasons == ((-4,), (-1, -2), (-1, -3)), \"The sufficient reasons are not correct!\"\n", "\n", "for sufficient in sufficient_reasons:\n", "    print(\"to_features:\", explainer.to_features(sufficient))\n", "    assert explainer.is_sufficient_reason(sufficient), \"This has to be a 
sufficient reason!\"\n", "\n", "minimals = explainer.minimal_sufficient_reason(n=1)\n", "print(\"minimal_sufficient_reasons:\", minimals)\n", "assert minimals == (-4,), \"The minimal sufficient reasons are not correct!\"" ] }, { "cell_type": "markdown", "id": "e0420183", "metadata": {}, "source": [ "## Example from a Real Dataset" ] }, { "cell_type": "markdown", "id": "03c8f44e", "metadata": {}, "source": [ "For this example, we use the ```compas.csv``` dataset. We create a model using the hold-out approach (with the default settings used here, 25% of the instances, i.e. 1543 of 6172, are kept for testing) and select a correctly classified instance." ] }, { "cell_type": "code", "execution_count": 9, "id": "5a1c9c9b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: classification\n", "Instances type: tabular\n", "Labels type: classes\n", "\n", "Dataset path: ../../../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 11\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Model creation, fitting and evaluation ---------------\n", "Splitting method: hold-out\n", "Problem type: classification\n", "Models type: decision-tree\n", "model_parameters: {}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "Metrics:\n", " sklearn_confusion_matrix: [[631, 212], [326, 374]]\n", " precision: 63.82252559726962\n", " recall: 53.42857142857142\n", " f1_score: 58.16485225505443\n", " specificity: 74.85172004744959\n", " true_positive: 374\n", " true_negative: 631\n", " false_positive: 212\n", " false_negative: 326\n", " accuracy: 65.13285806869735\n", "Number of Training instances: 4629\n", "Number of Testing instances: 1543\n", "\n", "--------------- Explainer ----------------\n", "For the split number 0:\n", "**Decision Tree Model**\n", "nFeatures: 11\n", "nNodes: 574\n", "nVariables: 48\n", "\n", "--------------- Instances ----------------\n", "Number of instances 
selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explaining\n", "\n", "learner = Learning.Scikitlearn(\"../../../dataset/compas.csv\", problem_type='classification')\n", "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.DT)\n", "instance, prediction = learner.get_instances(model, n=1, is_correct=True)" ] }, { "cell_type": "markdown", "id": "4cacbab0", "metadata": {}, "source": [ "We then compute, for this instance, a sufficient reason, a minimal sufficient reason, the necessary and relevant literals, and the number of sufficient reasons (in total and per literal):" ] }, { "cell_type": "code", "execution_count": 10, "id": "b7691f19", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: Misdemeanor 0\n", "Number_of_Priors 0\n", "score_factor 0\n", "Age_Above_FourtyFive 1\n", "Age_Below_TwentyFive 0\n", "African_American 0\n", "Asian 0\n", "Hispanic 0\n", "Native_American 0\n", "Other 1\n", "Female 0\n", "Name: 0, dtype: int64\n", "prediction: 0\n", "\n", "\n", "sufficient reason: 4\n", "to features ['Misdemeanor <= 0.5', 'Number_of_Priors <= 0.5', 'score_factor <= 0.5']\n", "is sufficient_reason (for max 50 checks): True\n", "\n", "[['v', '1', '-2', '-3', '4', '5', '-6', '-7', '-8', '-9', '-10', '-11', '-12', '-13', '-14', '15', '-16', '-17', '-18', '-19', '-20', '21', '22', '23', '24', '25', '-26', '-27', '-28', '-29', '-30', '-31', '-32', '-33', '-34', '-35', '-36', '-37', '-38', '-39', '']]\n", "\n", "minimal: 4\n", "is sufficient_reason (for max 50 checks): True\n", "\n", "\n", "necessary literals: [-1]\n", "\n", "necessary literals features: ['score_factor <= 0.5']\n", "\n", "relevant literals: [-5, -6, -3, -11, -2, 4, -18, -13, 7, -8, -9, -12, -15, -31]\n", "\n", "n sufficient reasons: 15\n", "\n", "sufficient_reasons_per_attribute: {-1: 15, -5: 8, -6: 7, -3: 12, -11: 5, -2: 5, 4: 10, -18: 10, -13: 10, 7: 5, -8: 7, -9: 5, -12: 4, -15: 1, -31: 1}\n", "\n", "sufficient_reasons_per_attribute features: OrderedDict({'Misdemeanor': [{'id': 1, 'name': 
'Misdemeanor', 'operator_sign_considered': , 'threshold': np.float64(0.5), 'weight': 5, 'string': 'Misdemeanor <= 0.5'}], 'Number_of_Priors': [{'id': 2, 'name': 'Number_of_Priors', 'operator_sign_considered': , 'threshold': np.float64(0.5), 'weight': 8, 'string': 'Number_of_Priors <= 0.5'}], 'score_factor': [{'id': 3, 'name': 'score_factor', 'operator_sign_considered': , 'threshold': np.float64(0.5), 'weight': 15, 'string': 'score_factor <= 0.5'}], 'Age_Above_FourtyFive': [{'id': 4, 'name': 'Age_Above_FourtyFive', 'operator_sign_considered': , 'threshold': np.float64(0.5), 'weight': 10, 'string': 'Age_Above_FourtyFive > 0.5'}], 'Age_Below_TwentyFive': [{'id': 5, 'name': 'Age_Below_TwentyFive', 'operator_sign_considered': , 'threshold': np.float64(0.5), 'weight': 12, 'string': 'Age_Below_TwentyFive <= 0.5'}], 'African_American': [{'id': 6, 'name': 'African_American', 'operator_sign_considered': , 'threshold': np.float64(0.5), 'weight': 4, 'string': 'African_American <= 0.5'}], 'Asian': [{'id': 7, 'name': 'Asian', 'operator_sign_considered': , 'threshold': np.float64(0.5), 'weight': 7, 'string': 'Asian <= 0.5'}], 'Hispanic': [{'id': 8, 'name': 'Hispanic', 'operator_sign_considered': , 'threshold': np.float64(0.5), 'weight': 7, 'string': 'Hispanic <= 0.5'}], 'Other': [{'id': 10, 'name': 'Other', 'operator_sign_considered': , 'threshold': np.float64(0.5), 'weight': 5, 'string': 'Other > 0.5'}], 'Female': [{'id': 11, 'name': 'Female', 'operator_sign_considered': , 'threshold': np.float64(0.5), 'weight': 5, 'string': 'Female <= 0.5'}]})\n" ] } ], "source": [ "explainer = Explaining.initialize(model, instance)\n", "print(\"instance:\", instance)\n", "print(\"prediction:\", prediction)\n", "print()\n", "sufficient_reason = explainer.sufficient_reason(n=1)\n", "#for s in sufficient_reasons:\n", "print(\"\\nsufficient reason:\", len(sufficient_reason))\n", "print(\"to features\", explainer.to_features(sufficient_reason))\n", "print(\"is sufficient_reason (for max 50 checks): 
\", explainer.is_sufficient_reason(sufficient_reason, n_samples=50))\n", "print()\n", "minimal = explainer.minimal_sufficient_reason()\n", "print(\"\\nminimal:\", len(minimal))\n", "print(\"is sufficient_reason (for max 50 checks): \", explainer.is_sufficient_reason(minimal, n_samples=50))\n", "print()\n", "print(\"\\nnecessary literals: \", explainer.necessary_literals())\n", "print(\"\\nnecessary literals features: \", explainer.to_features(explainer.necessary_literals()))\n", "print(\"\\nrelevant literals: \", explainer.relevant_literals())\n", "print()\n", "print(\"n sufficient reasons:\", explainer.n_sufficient_reasons())\n", "sufficient_reasons_per_attribute = explainer.n_sufficient_reasons_per_attribute()\n", "print(\"\\nsufficient_reasons_per_attribute:\", sufficient_reasons_per_attribute)\n", "print(\"\\nsufficient_reasons_per_attribute features:\", explainer.to_features(sufficient_reasons_per_attribute, details=True))\n" ] }, { "cell_type": "markdown", "id": "14fa4d1d", "metadata": {}, "source": [ "Other types of explanations are presented in the [Explanations Computation](/documentation/explanations/DTexplanations/) page." ] }, { "cell_type": "markdown", "id": "552c7f56-8c63-4f41-8b3a-dd59034cbc51", "metadata": {}, "source": [ "## See Also\n", " - API: ```Builder```, ```ExplainerDT```, ```Learner```." 
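] }, { "cell_type": "markdown", "id": "3f2a9c71", "metadata": {}, "source": [ "## Appendix: Checking the Hand-Crafted Example by Enumeration" ] }, { "cell_type": "markdown", "id": "3f2a9c72", "metadata": {}, "source": [ "With only $4$ binary features, the definitions above can be checked by brute force. The following cell is a minimal sketch that neither uses nor reproduces the PyXAI implementation. It hard-codes the Boolean function of the hand-crafted tree, assuming that ```left``` is the branch taken when the tested feature is $0$. This gives $f = x_4 \\wedge (x_1 \\vee (x_2 \\wedge x_3))$, and under this reading we obtain exactly the sufficient reasons printed in the hand-crafted example. The cell enumerates the subsets of the literals of an instance by increasing size. It keeps those that force the prediction on every completion and contain no smaller such subset. From these it derives the minimal sufficient reasons, the necessary and relevant literals, and the number of sufficient reasons per literal. Literals follow the signed-integer convention of the outputs above ($4$ for $x_4$, $-4$ for $\\neg x_4$). The helper names (```tree_function```, ```literals_of```, ```fixes_prediction```, ```sufficient_reasons```) are hypothetical. The enumeration is exponential in the number of features, so it only serves to illustrate the definitions." ] }, { "cell_type": "code", "execution_count": null, "id": "3f2a9c73", "metadata": {}, "outputs": [], "source": [ "from collections import Counter\n", "from itertools import combinations, product\n", "\n", "\n", "def tree_function(x):\n", "    # Boolean function of the hand-crafted tree, ASSUMING that 'left' is the\n", "    # branch for feature value 0: f = x4 and (x1 or (x2 and x3)).\n", "    x1, x2, x3, x4 = x\n", "    return int(x4 == 1 and (x1 == 1 or (x2 == 1 and x3 == 1)))\n", "\n", "\n", "def literals_of(instance):\n", "    # Signed-integer literals of a binary instance: i if x_i = 1, -i if x_i = 0.\n", "    return tuple(i if v == 1 else -i for i, v in enumerate(instance, start=1))\n", "\n", "\n", "def fixes_prediction(term, f, n, p):\n", "    # True if every x in {0,1}^n satisfying all literals of term is classified\n", "    # as p, i.e. term is an implicant of f (p = 1) or of not f (p = 0).\n", "    for x in product((0, 1), repeat=n):\n", "        satisfies = all(x[abs(l) - 1] == (1 if l > 0 else 0) for l in term)\n", "        if satisfies and f(x) != p:\n", "            return False\n", "    return True\n", "\n", "\n", "def sufficient_reasons(f, instance):\n", "    # Subsets of the instance's literals are visited by increasing size, so an\n", "    # implicant is subset-minimal iff it contains no reason found earlier.\n", "    n, p = len(instance), f(instance)\n", "    reasons = []\n", "    for size in range(n + 1):\n", "        for term in combinations(literals_of(instance), size):\n", "            if any(set(r) <= set(term) for r in reasons):\n", "                continue\n", "            if fixes_prediction(term, f, n, p):\n", "                reasons.append(term)\n", "    return reasons\n", "\n", "\n", "for instance in [(1, 1, 1, 1), (0, 0, 0, 0)]:\n", "    reasons = sufficient_reasons(tree_function, instance)\n", "    smallest = min(len(r) for r in reasons)\n", "    print('instance:', instance, 'prediction:', tree_function(instance))\n", "    print('  sufficient reasons:', reasons)\n", "    print('  minimal sufficient reasons:', [r for r in reasons if len(r) == smallest])\n", "    print('  necessary literals:', sorted(set.intersection(*map(set, reasons))))\n", "    print('  relevant literals:', sorted(set.union(*map(set, reasons))))\n", "    print('  reasons per literal:', dict(Counter(l for r in reasons for l in r)))" ] }, { "cell_type": "markdown", "id": "3f2a9c74", "metadata": {}, "source": [ "Under the assumption above, the cell should report $(1, 4)$ and $(2, 3, 4)$ for $(1,1,1,1)$, with $(1, 4)$ as the only minimal sufficient reason and $4$ as the only necessary literal. For $(0,0,0,0)$ it should report $(-4)$, $(-1, -2)$ and $(-1, -3)$, with no necessary literal. This matches the PyXAI output of the hand-crafted example." 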
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }