{ "cells": [ { "cell_type": "markdown", "id": "b1db8c9a", "metadata": {}, "source": [ "# Contrastive Reasons" ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": [ "{: .attention }\n", "> Currently, contrastive reasons for boosted trees are only available for binary classification." ] }, { "cell_type": "markdown", "id": "2570300b-d8a7-479a-880f-cd91d8ceb6af", "metadata": {}, "source": "Unlike abductive explanations, which explain why an instance $x$ is classified as belonging to a given class, **contrastive explanations** explain why $x$ has not been classified by the ML model as expected.\n\nLet $f$ be a Boolean function represented by a boosted tree $BT$, $x$ be an instance and 1 (resp. 0) the prediction of $BT$ on $x$, i.e., $f(x)=1$ (resp. $f(x)=0$). A **contrastive reason** for $x$ is a term $t$ such that:\n* $t \\subseteq t_{x}$ and $t_{x} \\setminus t$ is not an implicant of $f$ (resp. $\\neg f$);\n* for every $\\ell \\in t$, $t \\setminus \\{\\ell\\}$ does not satisfy the previous condition (i.e., $t$ is minimal w.r.t. set inclusion).\n\nPut differently, a **contrastive reason** for $x$ is a subset $t$ of the characteristics of $x$ that is minimal w.r.t. set inclusion among those such that at least one instance $x'$ that coincides with $x$ except on the characteristics from $t$ is not classified by the boosted tree as $x$ is. Stated otherwise, a **contrastive reason** represents the adjustments to the features that are needed to change the prediction for an instance.\n\nA contrastive reason is minimal w.r.t. set inclusion, i.e., no proper subset of this reason is also a contrastive reason. A **minimal contrastive reason** for $x$ is a contrastive reason for $x$ that contains a minimal number of literals. In other words, a **minimal contrastive reason** has minimal size. 
\n\nThe function [``contrastive_reason``](/documentation/api/classes/explainerBT/#contrastive_reason) allows computing this kind of explanation.\n\nThe library also provides a way to check that a reason is contrastive, using the function ``is_contrastive_reason``.\n" }, { "cell_type": "markdown", "id": "aac420d8-b694-4326-b094-80f37df8505b", "metadata": {}, "source": [ "## Example from Building Trees" ] }, { "cell_type": "markdown", "id": "3ab1f94f-2827-4bd4-ba29-e9ce8439dc1b", "metadata": {}, "source": [ "We now show how to compute contrastive reasons with PyXAI. We start by building the boosted tree:" ] }, { "cell_type": "code", "execution_count": 1, "id": "6039bf90-9d4f-4ae3-8088-a2a21d11819c", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explaining\n", "\n", "# Nodes are built bottom-up; a numeric left/right value is a leaf weight.\n", "node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.3)\n", "node1_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=-0.3, right=node1_1)\n", "node1_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=0.4, right=node1_2)\n", "node1_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=node1_3)\n", "tree1 = Builder.DecisionTree(4, node1_4)\n", "\n", "node2_1 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.4, right=0.3)\n", "node2_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=node2_1)\n", "node2_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node2_2, right=0.5)\n", "tree2 = Builder.DecisionTree(4, node2_3)\n", "\n", "node3_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=0.2, right=0.3)\n", "node3_2_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.2)\n", "node3_2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.1, right=node3_1)\n", "node3_2_3 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=0.1)\n", "node3_3_1 = Builder.DecisionNode(2, 
operator=Builder.GT, threshold=1, left=node3_2_1, right=node3_2_2)\n", "node3_3_2 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=-0.4, right=node3_2_3)\n", "node3_4 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=node3_3_1, right=node3_3_2)\n", "tree3 = Builder.DecisionTree(4, node3_4)\n", "\n", "BT = Builder.BoostedTrees([tree1, tree2, tree3], n_classes=2)" ] }, { "cell_type": "markdown", "id": "7bb91b78-5e35-4c63-be7a-22a6c522e6ef", "metadata": {}, "source": [ "We compute a minimal contrastive reason for the instance (4,3,2,1), then check that changing the feature it identifies flips the prediction:" ] }, { "cell_type": "code", "execution_count": 2, "id": "edda6920-1c68-4438-ac36-be411421677b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "target_prediction: 1\n", "minimal contrastive reason: ['f4 == 1']\n" ] } ], "source": [ "explainer = Explaining.initialize(BT)\n", "explainer.set_instance((4,3,2,1))\n", "\n", "contrastive_reason = explainer.minimal_contrastive_reason()\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"minimal contrastive reason:\", explainer.to_features(contrastive_reason))\n", "assert explainer.is_contrastive_reason(contrastive_reason), \"It is not a contrastive reason!\"" ] }, { "cell_type": "code", "execution_count": 3, "id": "b1e548b3-abcc-4f6d-842e-2cd4c601537d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "target_prediction: 0\n" ] } ], "source": [ "# Build a contrastive instance by changing the feature named in the reason (f4: 1 -> 0).\n", "explainer.set_instance((4,3,2,0))\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "# The prediction is now 0." ] }, { "cell_type": "markdown", "id": "1622b7b9-2e82-4ee4-94a3-c1bc56cc8eb8", "metadata": {}, "source": [ "## Example from Real Dataset" ] }, { "cell_type": "markdown", "id": "e89785e8-cbcf-4703-91ff-de06253be788", "metadata": {}, "source": [ "For this example, we use the compas.csv dataset, which is known to be biased. 
We create a model using the hold-out approach (by default, the test size is set to 30%) and select a misclassified instance." ] }, { "cell_type": "code", "execution_count": 5, "id": "79e0a715-e25b-41d5-a6f9-93cd79bbf305", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: classification\n", "Instances type: tabular\n", "Labels type: classes\n", "\n", "Dataset path: ../../../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 11\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Model creation, fitting and evaluation ---------------\n", "Splitting method: hold-out\n", "Problem type: classification\n", "Models type: boosted-tree\n", "model_parameters: {}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "Metrics:\n", " sklearn_confusion_matrix: [[624, 209], [294, 416]]\n", " precision: 66.56\n", " recall: 58.59154929577465\n", " f1_score: 62.32209737827716\n", " specificity: 74.90996398559425\n", " true_positive: 416\n", " true_negative: 624\n", " false_positive: 209\n", " false_negative: 294\n", " accuracy: 67.40116655865198\n", "Number of Training instances: 4629\n", "Number of Testing instances: 1543\n", "\n", "--------------- Explainer ----------------\n", "For the split number 0:\n", "**Boosted Tree model**\n", "NClasses: 2\n", "nTrees: 100\n", "nVariables: 38\n", "\n", "--------------- Instances ----------------\n", "Number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explaining\n", "\n", "learner = Learning.Xgboost(\"../../../dataset/compas.csv\", problem_type='classification')\n", "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.BT)\n", "# is_correct=False selects an instance that the model misclassifies.\n", "instance, prediction = learner.get_instances(model, n=1, is_correct=False)\n" ] }, { "cell_type": "markdown", "id": 
"27ba9df8-5185-4d2d-a387-d383bbcacf37", "metadata": {}, "source": [ "We compute a contrastive instance (with a time limit equal to 5 seconds). In order to compute contrastive reasons, it is better to activate the theory related to the type of features (see this [page](/documentation/explainer/theories/))." ] }, { "cell_type": "code", "execution_count": 7, "id": "ab2f471d-1faf-4465-bf72-29efb76dbd00", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "feature_names: ['Misdemeanor', 'Number_of_Priors', 'score_factor', 'Age_Above_FourtyFive', 'Age_Below_TwentyFive', 'African_American', 'Asian', 'Hispanic', 'Native_American', 'Other', 'Female']\n", "--------- Theory Feature Types -----------\n", "Before the one-hot encoding of categorical features:\n", "Numerical features: 1\n", "Categorical features: 2\n", "Binary features: 3\n", "Number of features: 6\n", "Characteristics of categorical features: {'African_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'African_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Asian': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Asian', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Hispanic': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Hispanic', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Native_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Native_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Other': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Other', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Age_Above_FourtyFive': ['Age', 'Above_FourtyFive', ['Above_FourtyFive', 'Below_TwentyFive']], 'Age_Below_TwentyFive': ['Age', 'Below_TwentyFive', ['Above_FourtyFive', 'Below_TwentyFive']]}\n", "\n", "Number of used features in the model (before the encoding 
of categorical features): 6\n", "Number of used features in the model (after the encoding of categorical features): 11\n", "----------------------------------------------\n", "instance: Misdemeanor 0\n", "Number_of_Priors 0\n", "score_factor 0\n", "Age_Above_FourtyFive 0\n", "Age_Below_TwentyFive 0\n", "African_American 1\n", "Asian 0\n", "Hispanic 0\n", "Native_American 0\n", "Other 0\n", "Female 0\n", "Name: 1, dtype: int64\n", "target_prediction: 0\n", "minimal contrastive reason: ['Number_of_Priors < 28.0', '{African_American,Asian,Hispanic,Native_American,Other} = African_American']\n", "is contrastive: True\n" ] } ], "source": [ "# Declare the feature types so that the explainer can activate the matching theory.\n", "compas_types = {\n", "    \"numerical\": [\"Number_of_Priors\"],\n", "    \"binary\": [\"Misdemeanor\", \"score_factor\", \"Female\"],\n", "    \"categorical\": {\"{African_American,Asian,Hispanic,Native_American,Other}\": [\"African_American\", \"Asian\", \"Hispanic\", \"Native_American\", \"Other\"],\n", "                    \"Age*\": [\"Above_FourtyFive\", \"Below_TwentyFive\"]}\n", "}\n", "\n", "explainer = Explaining.initialize(model, instance, features_type=compas_types)\n", "\n", "# Search for a minimal contrastive reason for at most 5 seconds.\n", "contrastive_reason = explainer.minimal_contrastive_reason(time_limit=5)\n", "print(\"instance: \", instance)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"minimal contrastive reason:\", explainer.to_features(contrastive_reason, contrastive=True))\n", "print(\"is contrastive: \", explainer.is_contrastive_reason(contrastive_reason))" ] }, { "cell_type": "markdown", "id": "313852ec-74db-4767-b566-efef07ce2f74", "metadata": {}, "source": [ "To change the classification, one needs to change both the origin and the number of priors (to at least 28, according to the output above).\n", "\n", "Other types of explanations are presented in the [Explanations Computation](/documentation/explanations/RFexplanations/) page." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }