{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Concepts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This section presents the concepts behind the ```Explaining``` module of PyXAI. First, we show how to use it, then we explain the notion of binary variables, and finally we give an example.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Main Methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To explain instances from a given model, you first need to create an ```Explainer``` object. This is done\n", "using the function ```Explaining.initialize```." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the ```Explainer``` is created, you can use it to explain several instances. Each time you want to explain a new instance, call the function ```set_instance``` (it is not necessary to create a new ```Explainer``` object)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The prediction made by the ML model is given by the ```target_prediction``` attribute of the ```Explainer``` object. Next, we consider the binary representation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Binary representation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let us recall that all the ML models we consider consist of trees (a single tree in the case of a decision tree). Each tree contains nodes representing conditions \"\\<feature\\> \\<operator\\> \\<threshold\\> ?\" (such as \"$x_4 \\ge 0.5$ ?\"). Internally, the ```Explainer``` works with these conditions, treated as Boolean variables. The **binary representation** of an instance is a set of Boolean variables, each representing one such condition of the model. The binary representation can be found in the ```binary_representation``` attribute of the ```Explainer``` object. 
The function ```to_features``` converts a binary representation (or an explanation) into a list of conditions \"\\<feature\\> \\<operator\\> \\<threshold\\>\" representing the features used. With ```details=True```, it returns a dictionary with all details instead.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function ```to_features``` is independent of the instance given to the explainer through the ```initialize```\n", "and ```set_instance``` methods: it depends only on the binary representation passed as a parameter." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We present below an example based on the [iris](../../dataset/iris.csv) dataset, with a Decision Tree as the ML model. Take a look at the [Generating Model](/documentation/learning/) page if you need more information about the ```Learning``` module.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: classification\n", "Instances type: tabular\n", "Labels type: classes\n", "\n", "Dataset path: ../../dataset/iris.csv\n", "nFeatures (nAttributes, with the labels): 4\n", "nInstances (nObservations): 150\n", "nLabels: 3\n", "--------------- Model creation, fitting and evaluation ---------------\n", "Splitting method: hold-out\n", "Problem type: classification\n", "Models type: decision-tree\n", "model_parameters: {}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "Metrics:\n", " micro_averaging_accuracy: 96.49122807017544\n", " micro_averaging_precision: 94.73684210526315\n", " micro_averaging_recall: 94.73684210526315\n", " macro_averaging_accuracy: 96.49122807017542\n", " macro_averaging_precision: 93.73219373219372\n", " macro_averaging_recall: 93.73219373219372\n", " true_positives: {'Iris-setosa': 16, 
'Iris-versicolor': 8, 'Iris-virginica': 12}\n", " true_negatives: {'Iris-setosa': 22, 'Iris-versicolor': 28, 'Iris-virginica': 24}\n", " false_positives: {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 1}\n", " false_negatives: {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 1}\n", " accuracy: 94.73684210526315\n", " sklearn_confusion_matrix: [[16, 0, 0], [0, 8, 1], [0, 1, 12]]\n", "Number of Training instances: 112\n", "Number of Testing instances: 38\n", "\n", "--------------- Explainer ----------------\n", "For the split number 0:\n", "**Decision Tree Model**\n", "nFeatures: 4\n", "nNodes: 8\n", "nVariables: 8\n", "\n", "--------------- Instances ----------------\n", "Number of instances selected: 1\n", "----------------------------------------------\n", "instance: Sepal.Length 5.1\n", "Sepal.Width 3.5\n", "Petal.Length 1.4\n", "Petal.Width 0.2\n", "Name: 0, dtype: float64\n", "binary representation: (-1, -2, -3, -4, -5, -6, -7, -8)\n", "target_prediction: Iris-setosa\n", "to_features: ['Sepal.Length <= 5.950000047683716', 'Petal.Length <= 2.449999988079071', 'Petal.Width <= 1.550000011920929']\n", "to_features (keep redundant): ['Sepal.Length <= 6.599999904632568', 'Sepal.Length <= 5.950000047683716', 'Petal.Length <= 2.449999988079071', 'Petal.Length <= 4.75', 'Petal.Length <= 4.950000047683716', 'Petal.Length <= 4.8500001430511475', 'Petal.Width <= 1.699999988079071', 'Petal.Width <= 1.550000011920929']\n", "to_features with details: OrderedDict({'Petal.Length': [{'id': 3, 'name': 'Petal.Length', 'operator_sign_considered': , 'threshold': np.float64(2.449999988079071), 'weight': None, 'string': 'Petal.Length <= 2.449999988079071'}]})\n" ] } ], "source": [ "from pyxai import Learning, Explaining\n\nlearner = Learning.Scikitlearn(\"../../dataset/iris.csv\", problem_type=Learning.CLASSIFICATION)\nmodel = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.DT)\n\ninstance, prediction = learner.get_instances(model, n=1, 
is_correct=True)\nexplainer = Explaining.initialize(model, instance)\n\nprint(\"instance:\", instance)\nprint(\"binary representation:\", explainer.binary_representation)\nprint(\"target_prediction:\", explainer.target_prediction)\n\nprint(\"to_features:\", explainer.to_features(explainer.binary_representation))\nprint(\"to_features (keep redundant):\", explainer.to_features(explainer.binary_representation, eliminate_redundant_features=False))\n\nprint(\"to_features with details:\", explainer.to_features([-1], details=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the binary representation contains more variables (8) than there are features (4): the same feature can be tested in several nodes of the decision tree, and each node contributes one condition. By default, the function ```to_features``` eliminates redundant conditions, keeping only the tightest bound for each feature. Passing ```eliminate_redundant_features=False``` returns all conditions without simplification.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 5 }