{
"cells": [
{
"cell_type": "markdown",
"id": "601b0667",
"metadata": {},
"source": [
"# Concepts"
]
},
{
"cell_type": "markdown",
"id": "43ffdd12",
"metadata": {},
"source": [
"This section deals with the concepts behind the ```Explainer``` object of PyXAI. First, we show how to use it, then we explain the notion of binary variables in detail, and finally we give an example."
]
},
{
"cell_type": "markdown",
"id": "107a059b",
"metadata": {},
"source": [
"## Main Methods"
]
},
{
"cell_type": "markdown",
"id": "49832bd4",
"metadata": {},
"source": [
"First of all, in order to explain instances from a given model, you need to create an ```Explainer``` object. This is done\n",
"using the function ```Explainer.initialize```."
]
},
{
"cell_type": "markdown",
"id": "2e84822a",
"metadata": {},
"source": [
"| Explainer.initialize(model, instance=None, features_type=None):|\n",
"|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| \n",
"| Depending on the model given as first argument, this method creates an ```ExplainerDT```, an ```ExplainerRF```, an ```ExplainerBT```, or an ```ExplainerRegressionBT``` object. This object is able to give explanations about the instance given as second parameter. This last parameter is optional because you can set the instance later using the ```set_instance``` method. |\n",
"| model ```DecisionTree``` ```RandomForest``` ```BoostedTree```: The model for which explanations will be calculated.|\n",
"| instance ```Numpy Array``` of ```Float```: The instance to be explained. Default value is ```None```.|\n",
"| features_type ```String``` ```Dict``` ```None```: Either a dictionary indicating the type of each feature or the path to a ```.types``` file containing this information. Setting this parameter activates domain theories. More details are given on the [Theories](/documentation/explainer/theories/) page. |"
]
},
{
"cell_type": "markdown",
"id": "32ef6d0c",
"metadata": {},
"source": [
"Once the ```Explainer``` is created, you can use it to explain several instances. Each time you want to explain a new instance, you need to call the ```set_instance``` method (it is not necessary to create a new ```Explainer``` object)."
]
},
{
"cell_type": "markdown",
"id": "da895ea8",
"metadata": {},
"source": [
"| <Explainer Object>.set_instance(instance): |\n",
"|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n",
"| Sets a new instance to be explained. |\n",
"| instance ```Numpy Array``` of ```Float```: The instance to be explained. |"
]
},
{
"cell_type": "markdown",
"id": "fcad9d89",
"metadata": {},
"source": [
"The prediction made by the ML model is given by the ```target_prediction``` variable of the ```Explainer``` object. Now, let us consider the binary representation."
]
},
{
"cell_type": "markdown",
"id": "fa3aa966",
"metadata": {},
"source": [
"## Binary representation"
]
},
{
"cell_type": "markdown",
"id": "7432c7d8",
"metadata": {},
"source": [
"First, let us recall that all the ML models we consider consist of trees (only one in the case of a decision tree). Each tree contains nodes\n",
"representing conditions of the form \"feature operator threshold ?\" (such as \"$x_4 \\ge 0.5$ ?\"). Internally,\n",
"the ```Explainer``` works with these conditions, which are treated as Boolean variables. The **binary representation**\n",
"of an instance is a set of Boolean variables matching such conditions: each Boolean variable\n",
"represents a condition \"feature operator threshold ?\" of the model. The binary representation\n",
"can be found in the ```binary_representation``` variable of\n",
"the ```Explainer``` object. The function ```to_features``` converts a binary representation (or an explanation) into a tuple of conditions\n",
"\"feature operator threshold\" describing the features used."
]
},
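{
"cell_type": "markdown",
"id": "a3f1b2c4",
"metadata": {},
"source": [
"To make this concrete, here is a minimal plain-Python sketch (a hypothetical illustration, not the PyXAI internals) of how an instance can be mapped to a signed binary representation: each condition of the model becomes a Boolean variable whose sign indicates whether the instance satisfies it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4c2d3e5",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch (not the PyXAI internals): each condition of the model\n",
"# becomes a Boolean variable; the sign encodes whether it is satisfied.\n",
"conditions = [  # (id, feature, operator, threshold)\n",
"    (1, \"Petal.Length\", \">\", 2.45),\n",
"    (2, \"Petal.Width\", \">\", 1.75),\n",
"    (3, \"Petal.Width\", \">\", 0.75),\n",
"]\n",
"\n",
"def binary_representation(instance, conditions):\n",
"    \"\"\"Return the signed Boolean variables of an instance.\"\"\"\n",
"    representation = []\n",
"    for var_id, feature, operator, threshold in conditions:\n",
"        value = instance[feature]\n",
"        satisfied = value > threshold if operator == \">\" else value <= threshold\n",
"        representation.append(var_id if satisfied else -var_id)\n",
"    return tuple(representation)\n",
"\n",
"instance = {\"Sepal.Length\": 5.1, \"Sepal.Width\": 3.5, \"Petal.Length\": 1.4, \"Petal.Width\": 0.2}\n",
"print(binary_representation(instance, conditions))  # (-1, -2, -3)"
]
},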
{
"cell_type": "markdown",
"id": "0f45a3e1",
"metadata": {},
"source": [
"| <Explainer Object>.to_features(self, binary_representation, *, eliminate_redundant_features=True, details=False, contrastive=False, without_intervals=False):|\n",
"|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n",
"| When the parameter ```details``` is set to ```False```, returns a ```Tuple``` of ```String``` where each ```String``` represents a condition \"feature operator threshold\" associated with the binary representation given as first parameter. By default, a string represents such a condition, but if you need more information, you can set the parameter ```details``` to ```True```. In this case, the method returns a ```Tuple``` of ```Dict``` where each dictionary provides more information on the condition. This method also allows one to eliminate redundant conditions. For example, if we have \"feature_a > 3\" and \"feature_a > 2\", we keep only the Boolean variable linked to the condition \"feature_a > 3\". Therefore, if you want to get all conditions, you have to set the parameter ```eliminate_redundant_features``` to ```False```. |\n",
"| binary_representation ```List``` ```Tuple```: A set of (signed) binary variables. |\n",
"| eliminate_redundant_features ```Boolean```: ```True``` or ```False``` depending on whether you want to eliminate redundant conditions. Default value is ```True```.|\n",
"| details ```Boolean```: ```True``` or ```False``` depending on whether you want details or not. Default value is ```False```.| \n",
"| contrastive ```Boolean```: ```True``` or ```False``` depending on whether you want to get a contrastive explanation or not. When this parameter is set to ```True```, the elimination of redundant features is reversed accordingly. Default value is ```False```.|\n",
"| without_intervals ```Boolean```: ```True``` or ```False``` depending on whether you want to disable the compact representation with intervals. Default value is ```False```.|\n"
]
},
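{
"cell_type": "markdown",
"id": "c5d3e4f6",
"metadata": {},
"source": [
"As an illustration, here is a simplified, hypothetical sketch (not the actual PyXAI code) of what a ```to_features```-like conversion does: a positive variable yields its condition as-is, while a negative variable means the condition is falsified, so its operator is flipped."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6e4f5a7",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch of a to_features-like conversion (not the PyXAI code):\n",
"# a negative variable means the condition is falsified, so its operator is flipped.\n",
"conditions = {  # id -> (feature, operator, threshold)\n",
"    1: (\"Petal.Length\", \">\", 2.45),\n",
"    2: (\"Petal.Width\", \">\", 1.75),\n",
"}\n",
"\n",
"def to_features_sketch(binary_representation, conditions):\n",
"    features = []\n",
"    for literal in binary_representation:\n",
"        feature, operator, threshold = conditions[abs(literal)]\n",
"        if literal < 0:  # falsified condition: flip the operator\n",
"            operator = \"<=\" if operator == \">\" else \">\"\n",
"        features.append(f\"{feature} {operator} {threshold}\")\n",
"    return tuple(features)\n",
"\n",
"print(to_features_sketch((-1, 2), conditions))  # ('Petal.Length <= 2.45', 'Petal.Width > 1.75')"
]
},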
{
"cell_type": "markdown",
"id": "4621eeab",
"metadata": {},
"source": [
"{: .note}\n",
"> The details provided when the ```details``` parameter is set to ```True``` in the ```to_features``` method are given by the\n",
"> keys of the returned dictionaries:\n",
"\n",
"- ```[\"id\"]```: The id of the feature.\n",
"- ```[\"name\"]```: The name of the feature (if labels are known, otherwise they are named f1, f2 and so on).\n",
"- ```[\"operator\"]```: The operator associated with the condition.\n",
"- ```[\"threshold\"]```: The threshold of the condition.\n",
"- ```[\"sign\"]```: The sign of the Boolean variable in the binary representation: ```True``` if the condition is satisfied\n",
" else ```False```.\n",
"- ```[\"weight\"]```: The weight of the condition, used only with user preferences."
]
},
{
"cell_type": "markdown",
"id": "eeeaf935",
"metadata": {},
"source": [
"{: .note}\n",
"> Explanations computed using our explainer module may contain redundant conditions. Let us take an example with a feature\n",
"> $f_1$ and two Boolean variables $x_1$ and $x_2$ associated with the conditions $(f_1 \\ge 5)$ and $(f_1 \\ge 3)$,\n",
"> respectively. If, in the instance, $f_1 = 6$, then $x_1$ and $x_2$ are both set to true. The explanation that is derived can involve\n",
"> both of them. By setting the ```eliminate_redundant_features``` parameter to ```True``` in the method ```to_features```,\n",
"> we remove $(f_1 \\ge 3)$, which is redundant."
]
},
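{
"cell_type": "markdown",
"id": "e7f5a6b8",
"metadata": {},
"source": [
"This elimination can be sketched in plain Python as follows (a hypothetical illustration, not the PyXAI implementation): among satisfied conditions \"$f \\ge t$\" bearing on the same feature, only the largest threshold is informative."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8a6b7c9",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch (not the PyXAI implementation): among satisfied\n",
"# conditions \"f >= t\" on the same feature, only the largest threshold is\n",
"# informative; the weaker ones are redundant.\n",
"def eliminate_redundant(satisfied_conditions):\n",
"    \"\"\"Keep, for each (feature, operator), the tightest satisfied '>=' condition.\"\"\"\n",
"    tightest = {}\n",
"    for feature, operator, threshold in satisfied_conditions:\n",
"        key = (feature, operator)\n",
"        if key not in tightest or threshold > tightest[key]:\n",
"            tightest[key] = threshold\n",
"    return [(f, op, t) for (f, op), t in tightest.items()]\n",
"\n",
"# f1 = 6 satisfies both f1 >= 5 and f1 >= 3; only f1 >= 5 is kept.\n",
"print(eliminate_redundant([(\"f1\", \">=\", 5), (\"f1\", \">=\", 3)]))  # [('f1', '>=', 5)]"
]
},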
{
"cell_type": "markdown",
"id": "3d72e250",
"metadata": {},
"source": [
"{: .attention }\n",
"> Forgetting to set the parameter ```contrastive``` to ```True``` when displaying a contrastive explanation may result in an incorrect explanation."
]
},
{
"cell_type": "markdown",
"id": "f4b1899e",
"metadata": {},
"source": [
"The method ```to_features``` is independent of the instance given to the explainer through the ```initialize```\n",
"and ```set_instance``` methods: it depends only on the binary representation given as a parameter."
]
},
{
"cell_type": "markdown",
"id": "4896445c",
"metadata": {},
"source": [
"## Example"
]
},
{
"cell_type": "markdown",
"id": "b4057ff0",
"metadata": {},
"source": [
"We present in the following an example based on the [iris](/assets/notebooks/dataset/iris.csv) dataset and a decision tree as the ML model. You should take a look at the [Generating Models](/documentation/learning/) page if you need more information about the ```Learning``` module."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "7a754ae5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"data:\n",
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n",
"0 5.1 3.5 1.4 0.2 Iris-setosa\n",
"1 4.9 3.0 1.4 0.2 Iris-setosa\n",
"2 4.7 3.2 1.3 0.2 Iris-setosa\n",
"3 4.6 3.1 1.5 0.2 Iris-setosa\n",
"4 5.0 3.6 1.4 0.2 Iris-setosa\n",
".. ... ... ... ... ...\n",
"145 6.7 3.0 5.2 2.3 Iris-virginica\n",
"146 6.3 2.5 5.0 1.9 Iris-virginica\n",
"147 6.5 3.0 5.2 2.0 Iris-virginica\n",
"148 6.2 3.4 5.4 2.3 Iris-virginica\n",
"149 5.9 3.0 5.1 1.8 Iris-virginica\n",
"\n",
"[150 rows x 5 columns]\n",
"-------------- Information ---------------\n",
"Dataset name: ../../dataset/iris.csv\n",
"nFeatures (nAttributes, with the labels): 5\n",
"nInstances (nObservations): 150\n",
"nLabels: 3\n",
"--------------- Evaluation ---------------\n",
"method: HoldOut\n",
"output: DT\n",
"learner_type: Classification\n",
"learner_options: {'max_depth': None, 'random_state': 0}\n",
"--------- Evaluation Information ---------\n",
"For the evaluation number 0:\n",
"metrics:\n",
" accuracy: 97.77777777777777\n",
"nTraining instances: 105\n",
"nTest instances: 45\n",
"\n",
"--------------- Explainer ----------------\n",
"For the evaluation number 0:\n",
"**Decision Tree Model**\n",
"nFeatures: 4\n",
"nNodes: 6\n",
"nVariables: 5\n",
"\n",
"--------------- Instances ----------------\n",
"number of instances selected: 1\n",
"----------------------------------------------\n",
"instance: [5.1 3.5 1.4 0.2]\n",
"binary representation: (-1, -2, -3, 4, -5)\n",
"target_prediction: 0\n",
"to_features: ('Sepal.Width > 3.100000023841858', 'Petal.Length <= 4.950000047683716', 'Petal.Width <= 0.75')\n",
"to_features (keep redundant): ('Sepal.Width > 3.100000023841858', 'Petal.Length <= 4.950000047683716', 'Petal.Width <= 0.75', 'Petal.Width <= 1.6500000357627869', 'Petal.Width <= 1.75')\n",
"to_features with details: OrderedDict([('Petal.Width', [{'id': 4, 'name': 'Petal.Width', 'operator': , 'sign': True, 'operator_sign_considered': , 'threshold': 0.75, 'weight': None, 'theory': None, 'string': 'Petal.Width <= 0.75'}])])\n"
]
}
],
"source": [
"from pyxai import Learning, Explainer, Tools\n",
"learner = Learning.Scikitlearn(\"../../dataset/iris.csv\", learner_type=Learning.CLASSIFICATION)\n",
"model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.DT)\n",
"\n",
"instance, prediction = learner.get_instances(model, n=1, correct=True, predictions=[0])\n",
"explainer = Explainer.initialize(model, instance)\n",
"\n",
"print(\"instance:\", instance)\n",
"print(\"binary representation:\", explainer.binary_representation)\n",
"print(\"target_prediction:\", explainer.target_prediction)\n",
"\n",
"print(\"to_features:\", explainer.to_features(explainer.binary_representation))\n",
"print(\"to_features (keep redundant):\", explainer.to_features(explainer.binary_representation, eliminate_redundant_features=False))\n",
"\n",
"print(\"to_features with details:\", explainer.to_features([-1], details=True))"
]
},
{
"cell_type": "markdown",
"id": "373b9642",
"metadata": {},
"source": [
"We notice that the binary representation contains more than 4 variables because the decision tree of the model\n",
"contains five distinct conditions. Indeed, the feature Petal.Width appears three times whereas the feature Sepal.Length does not appear at all. We can also see that, for this binary representation, two redundant conditions related to the Petal.Width feature can be eliminated."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}