{ "cells": [ { "cell_type": "markdown", "id": "601b0667", "metadata": {}, "source": [ "# Concepts" ] }, { "cell_type": "markdown", "id": "43ffdd12", "metadata": {}, "source": [ "This section deals with the concepts of the ```Explainer``` object of PyXAI. First, we show how to use it; then we explain the notion of binary variables in detail; finally, we give an example." ] }, { "cell_type": "markdown", "id": "107a059b", "metadata": {}, "source": [ "## Main Methods" ] }, { "cell_type": "markdown", "id": "49832bd4", "metadata": {}, "source": [ "First of all, in order to explain instances from a given model, you need to create an ```Explainer``` object. This is done\n", "using the function ```Explainer.initialize```." ] }, { "cell_type": "markdown", "id": "2e84822a", "metadata": {}, "source": [ "| Explainer.initialize(model, instance=None, features_type=None):|\n", "|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| \n", "| Depending on the model given in the first argument, this method creates an ```ExplainerDT```, an ```ExplainerRF```, an ```ExplainerBT```, or an ```ExplainerRegressionBT``` object. This object is able to give explanations about the instance given as the second parameter. This last parameter is optional because you can set the instance later using the ```set_instance``` function. |\n", "| model ```DecisionTree``` ```RandomForest``` ```BoostedTree```: The model for which explanations will be calculated.|\n", "| instance ```Numpy Array``` of ```Float```: The instance to be explained. 
Default value is ```None```.|\n", "| features_type ```String``` ```Dict``` ```None```: Either a dictionary indicating the type of features or the path to a ```.types``` file containing this information. Setting it activates domain theories. More details are given on the [Theories](/documentation/explainer/theories/) page. |" ] }, { "cell_type": "markdown", "id": "32ef6d0c", "metadata": {}, "source": [ "Once the ```Explainer``` is created, you can explain several instances using it. Each time you want to explain a new instance, you need to call the function ```set_instance``` (it is not necessary to create a new ```Explainer``` object)." ] }, { "cell_type": "markdown", "id": "da895ea8", "metadata": {}, "source": [ "| <Explainer Object>.set_instance(instance): |\n", "|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "| Sets a new instance to be explained. |\n", "| instance ```Numpy Array``` of ```Float```: The instance to be explained. |" ] }, { "cell_type": "markdown", "id": "fcad9d89", "metadata": {}, "source": [ "The prediction made by the ML model is given by the ```target_prediction``` variable of the ```Explainer``` Object. Now, we have to consider the binary representation." ] }, { "cell_type": "markdown", "id": "fa3aa966", "metadata": {}, "source": [ "## Binary representation" ] }, { "cell_type": "markdown", "id": "7432c7d8", "metadata": {}, "source": [ "First, let us recall that all the ML models we consider consist of trees (only one tree in the case of a decision tree). Each tree contains nodes\n", "representing conditions \"feature operator threshold ?\" (such as \"$x_4 \ge 0.5$ ?\"). Internally,\n", "the ```Explainer``` works with these conditions, which are treated as Boolean variables. The **binary representation**\n", "of an instance is a set of Boolean variables matching such conditions. 
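To build intuition, this mapping can be sketched in plain Python (the conditions below are invented for illustration; this is not the PyXAI API):

```python
# Hypothetical conditions of a small tree: (feature index, operator, threshold).
# The Boolean variable i is attached to the i-th condition; a positive literal
# means the condition holds for the instance, a negative literal means it does not.
conditions = [(3, ">", 0.8), (2, "<=", 4.95), (3, "<=", 1.75)]

def binary_representation(instance):
    literals = []
    for i, (feature, op, threshold) in enumerate(conditions, start=1):
        value = instance[feature]
        holds = value > threshold if op == ">" else value <= threshold
        literals.append(i if holds else -i)
    return tuple(literals)

print(binary_representation([5.1, 3.5, 1.4, 0.2]))  # -> (-1, 2, 3)
```

As in PyXAI, a positive literal means the instance satisfies the condition, and a negative literal means it does not.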
Each Boolean variable\n", "represents a condition \"feature operator threshold ?\" of the model. The binary representation\n", "can be found in the ```binary_representation``` variable of\n", "the ```Explainer``` Object. The function ```to_features``` converts a binary representation (or an explanation) into a tuple of conditions\n", "\"feature operator threshold\" representing the features used." ] }, { "cell_type": "markdown", "id": "0f45a3e1", "metadata": {}, "source": [ "| <Explainer Object>.to_features(binary_representation, *, eliminate_redundant_features=True, details=False, contrastive=False, without_intervals=False):|\n", "|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "| When the parameter ```details``` is set to ```False```, this method returns a ```Tuple``` of ```String``` where each ```String``` represents a condition \"feature operator threshold\" associated with the binary representation given as the first parameter. By default, a string represents such a condition, but if you need more information, you can set the parameter ```details``` to ```True```. In this case, the method returns a ```Tuple``` of ```Dict``` where each dictionary provides more information on the condition. 
This method also allows one to eliminate redundant conditions. For example, if we have \"feature_a > 3\" and \"feature_a > 2\", we keep only the Boolean variable corresponding to \"feature_a > 3\", since it implies \"feature_a > 2\". Therefore, if you want to get all conditions, you have to set the parameter ```eliminate_redundant_features``` to ```False```. |\n", "| binary_representation ```List``` ```Tuple```: A set of (signed) binary variables. |\n", "| eliminate_redundant_features ```Boolean```: ```True``` or ```False``` depending on whether you want to eliminate redundant conditions. Default value is ```True```.|\n", "| details ```Boolean```: ```True``` or ```False``` depending on whether you want details or not. Default value is ```False```.| \n", "| contrastive ```Boolean```: ```True``` or ```False``` depending on whether you want to get a contrastive explanation or not. When this parameter is set to ```True```, the elimination of redundant features must be reversed. Default value is ```False```.|\n", "| without_intervals ```Boolean```: ```True``` if you do not want conditions to be merged into a compact representation with intervals, ```False``` otherwise. Default value is ```False```.|\n" ] }, { "cell_type": "markdown", "id": "4621eeab", "metadata": {}, "source": [ "{: .note}\n", "> The details provided when the ```details``` parameter is set to ```True``` in the ```to_features``` function are represented by the\n", "> keys of the returned dictionary:\n", "\n", "- ```[\"id\"]```: The id of the feature.\n", "- ```[\"name\"]```: The name of the feature (if the labels are known, otherwise the features are named f1, f2 and so on).\n", "- ```[\"operator\"]```: The operator associated with the condition.\n", "- ```[\"threshold\"]```: The threshold of the condition.\n", "- ```[\"sign\"]```: The sign of the Boolean variable in the binary representation: ```True``` if the condition is satisfied,\n", " otherwise ```False```.\n", "- ```[\"weight\"]```: The weight of the condition, used only with user preferences." 
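The redundancy elimination described above can be pictured with a small plain-Python sketch (an illustration of the idea only, not the PyXAI implementation):

```python
# Hypothetical satisfied conditions: ("feature_a" > 3) implies ("feature_a" > 2),
# so only the strongest lower bound per feature needs to be kept.
conditions = [("feature_a", ">", 3), ("feature_a", ">", 2), ("feature_b", ">", 1)]

def eliminate_redundant(conds):
    strongest = {}
    for name, op, threshold in conds:
        # For ">" conditions, the largest threshold implies all smaller ones.
        if name not in strongest or threshold > strongest[name][2]:
            strongest[name] = (name, op, threshold)
    return list(strongest.values())

print(eliminate_redundant(conditions))
# -> [('feature_a', '>', 3), ('feature_b', '>', 1)]
```

For "<=" conditions the situation is symmetric: the smallest threshold implies the others.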
] }, { "cell_type": "markdown", "id": "eeeaf935", "metadata": {}, "source": [ "{: .note}\n", "> Explanations computed using our explainer module may contain redundant conditions. Let us take an example with a feature\n", "> $f_1$ and two Boolean variables $x_1$ and $x_2$ associated with the conditions $(f_1 \ge 5)$ and $(f_1 \ge 3)$,\n", "> respectively. If, in the instance, $f_1=6$, then $x_1$ and $x_2$ are both set to true. The explanation that is derived can involve\n", "> both of them. By setting the ```eliminate_redundant_features``` parameter to ```True``` in the method ```to_features```,\n", "> we remove $(f_1 \ge 3)$, which is redundant." ] }, { "cell_type": "markdown", "id": "3d72e250", "metadata": {}, "source": [ "{: .attention }\n", "> Forgetting to set the parameter ```contrastive``` to ```True``` when displaying a contrastive explanation may result in an incorrect explanation. " ] }, { "cell_type": "markdown", "id": "f4b1899e", "metadata": {}, "source": [ "The function ```to_features``` is independent of the instance given to the explainer through the ```initialize```\n", "and ```set_instance``` methods; it depends only on the binary representation given as a parameter." ] }, { "cell_type": "markdown", "id": "4896445c", "metadata": {}, "source": [ "## Example" ] }, { "cell_type": "markdown", "id": "b4057ff0", "metadata": {}, "source": [ "In the following, we present an example based on the dataset [iris](/assets/notebooks/dataset/iris.csv) and a Decision Tree as the ML model. You should take a look at the [Generating Model](/documentation/learning/) page if you need more information about the ```Learning``` module." 
] }, { "cell_type": "code", "execution_count": 1, "id": "7a754ae5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "0 5.1 3.5 1.4 0.2 Iris-setosa\n", "1 4.9 3.0 1.4 0.2 Iris-setosa\n", "2 4.7 3.2 1.3 0.2 Iris-setosa\n", "3 4.6 3.1 1.5 0.2 Iris-setosa\n", "4 5.0 3.6 1.4 0.2 Iris-setosa\n", ".. ... ... ... ... ...\n", "145 6.7 3.0 5.2 2.3 Iris-virginica\n", "146 6.3 2.5 5.0 1.9 Iris-virginica\n", "147 6.5 3.0 5.2 2.0 Iris-virginica\n", "148 6.2 3.4 5.4 2.3 Iris-virginica\n", "149 5.9 3.0 5.1 1.8 Iris-virginica\n", "\n", "[150 rows x 5 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../../dataset/iris.csv\n", "nFeatures (nAttributes, with the labels): 5\n", "nInstances (nObservations): 150\n", "nLabels: 3\n", "--------------- Evaluation ---------------\n", "method: HoldOut\n", "output: DT\n", "learner_type: Classification\n", "learner_options: {'max_depth': None, 'random_state': 0}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 97.77777777777777\n", "nTraining instances: 105\n", "nTest instances: 45\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Decision Tree Model**\n", "nFeatures: 4\n", "nNodes: 6\n", "nVariables: 5\n", "\n", "--------------- Instances ----------------\n", "number of instances selected: 1\n", "----------------------------------------------\n", "instance: [5.1 3.5 1.4 0.2]\n", "binary representation: (-1, -2, -3, 4, -5)\n", "target_prediction: 0\n", "to_features: ('Sepal.Width > 3.100000023841858', 'Petal.Length <= 4.950000047683716', 'Petal.Width <= 0.75')\n", "to_features (keep redundant): ('Sepal.Width > 3.100000023841858', 'Petal.Length <= 4.950000047683716', 'Petal.Width <= 0.75', 'Petal.Width <= 1.6500000357627869', 'Petal.Width <= 1.75')\n", "to_features with details: 
OrderedDict([('Petal.Width', [{'id': 4, 'name': 'Petal.Width', 'operator': , 'sign': True, 'operator_sign_considered': , 'threshold': 0.75, 'weight': None, 'theory': None, 'string': 'Petal.Width <= 0.75'}])])\n" ] } ], "source": [ "from pyxai import Learning, Explainer, Tools\n", "learner = Learning.Scikitlearn(\"../../dataset/iris.csv\", learner_type=Learning.CLASSIFICATION)\n", "model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.DT)\n", "\n", "instance, prediction = learner.get_instances(model, n=1, correct=True, predictions=[0])\n", "explainer = Explainer.initialize(model, instance)\n", "\n", "print(\"instance:\", instance)\n", "print(\"binary representation:\", explainer.binary_representation)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "\n", "print(\"to_features:\", explainer.to_features(explainer.binary_representation))\n", "print(\"to_features (keep redundant):\", explainer.to_features(explainer.binary_representation, eliminate_redundant_features=False))\n", "\n", "print(\"to_features with details:\", explainer.to_features([-1], details=True))" ] }, { "cell_type": "markdown", "id": "373b9642", "metadata": {}, "source": [ "We notice that the binary representation contains more than 4 variables because the decision tree of the model is\n", "composed of five conditions (i.e., five Boolean variables). Indeed, the feature Petal.Width appears three times whereas the feature Sepal.Length does not appear at all. We can see that, for this binary representation, we can eliminate two redundant conditions related to the Petal.Width feature." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }