{ "cells": [ { "cell_type": "markdown", "id": "95a492d4", "metadata": {}, "source": [ "# Quick Start" ] }, { "cell_type": "markdown", "id": "80685d54", "metadata": {}, "source": [ "{: .attention }\n", "> Once pyxai has been installed, you can use these commands:\n", "> \n", "> ```python3 ```: Execute a python file with lines of code using PyXAI\\\\\n", "> ```python3 -m pyxai -gui```: Open the PyXAI's Graphical User Interface\\\\\n", "> ```python3 -m pyxai -explanations```: Copy the explanations backups of GUI in your current directory\\\\ \n", "> ```python3 -m pyxai -examples```: Copy the examples in your current directory \n", "\n", "Let us give a quick illustration of PyXAI, showing how to compute explanations given a ML model. \n", "\"Iris\"" ] }, { "cell_type": "markdown", "id": "f898fcd6", "metadata": {}, "source": [ "The first thing to do is to import the components of PyXAI. In order to import only the necessary methods into a project, PyXAI is composed of three distinct modules: ```Learning```, ```Explainer```, and ```Tools```." ] }, { "cell_type": "code", "execution_count": 1, "id": "1500590b", "metadata": {}, "outputs": [], "source": [ "from pyxai import Learning, Explainer, Tools" ] }, { "cell_type": "markdown", "id": "ca4d6513", "metadata": {}, "source": [ "If you encounter a problem, this is certainly because you need the python package PyXAI to be installed on your system. You need to execute a command like ```python3 -m pip install pyxai```. See the [Installation](/documentation/installation) page for details." ] }, { "cell_type": "markdown", "id": "8dff254a", "metadata": {}, "source": [ "In most situations, the use of PyXAI library requires to achieve two successive steps: first the generation of an ML model from a dataset (with the ```Learning``` module) and second, given the learned model, the computation of explanations for some instances (using the ```Explainer``` module). " ] }, { "cell_type": "markdown", "id": "a5714f78", "metadata": {}, "source": [ "## Machine Learning" ] }, { "cell_type": "markdown", "id": "aa225621", "metadata": {}, "source": [ "For this example, we want to create a decision tree classifier for the iris dataset using [Scikit-learn](https://scikit-learn.org/stable/)." ] }, { "cell_type": "code", "execution_count": 2, "id": "bc48a391", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "0 5.1 3.5 1.4 0.2 Iris-setosa\n", "1 4.9 3.0 1.4 0.2 Iris-setosa\n", "2 4.7 3.2 1.3 0.2 Iris-setosa\n", "3 4.6 3.1 1.5 0.2 Iris-setosa\n", "4 5.0 3.6 1.4 0.2 Iris-setosa\n", ".. ... ... ... ... ...\n", "145 6.7 3.0 5.2 2.3 Iris-virginica\n", "146 6.3 2.5 5.0 1.9 Iris-virginica\n", "147 6.5 3.0 5.2 2.0 Iris-virginica\n", "148 6.2 3.4 5.4 2.3 Iris-virginica\n", "149 5.9 3.0 5.1 1.8 Iris-virginica\n", "\n", "[150 rows x 5 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../dataset/iris.csv\n", "nFeatures (nAttributes, with the labels): 5\n", "nInstances (nObservations): 150\n", "nLabels: 3\n" ] } ], "source": [ "learner = Learning.Scikitlearn(\"../dataset/iris.csv\", learner_type=Learning.CLASSIFICATION)" ] }, { "cell_type": "markdown", "id": "d12124c7", "metadata": {}, "source": [ "It is possible to download this dataset from the [UCI Machine Learning Repository -- Iris Data Set](http://archive.ics.uci.edu/ml/datasets/Iris) or [here](/assets/notebooks/dataset/iris.csv). In our case, it is located in the directory ```../dataset```. 
The parameter ```learner_type=Learning.CLASSIFICATION``` tells PyXAI to perform a classification task. The Iris dataset contains four features (length and width of sepals and petals) for 50 samples of each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). The goal of the classifier is to predict, for a given instance, the right class among these three: setosa, virginica and versicolor. " ] }, { "cell_type": "markdown", "id": "d54c678e", "metadata": {}, "source": [ "To create models, PyXAI implements methods to directly run an ML experimental protocol (with the train-test split technique). Several evaluation methods (```Learning.HOLD_OUT```, ```Learning.K_FOLDS```, ```Learning.LEAVE_ONE_GROUP_OUT```) and models (```Learning.DT```, ```Learning.RF```, ```Learning.BT```) are available. \n", "\n", "In this example, we compute a Decision Tree (see the parameter ```output=Learning.DT```)." ] }, { "cell_type": "code", "execution_count": 3, "id": "0c692233", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------- Evaluation ---------------\n", "method: HoldOut\n", "output: DT\n", "learner_type: Classification\n", "learner_options: {'max_depth': None, 'random_state': 0}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 97.77777777777777\n", "nTraining instances: 105\n", "nTest instances: 45\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Decision Tree Model**\n", "nFeatures: 4\n", "nNodes: 6\n", "nVariables: 5\n", "\n" ] } ], "source": [ "model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.DT)" ] }, { "cell_type": "markdown", "id": "1e45e421", "metadata": {}, "source": [ "Once the model is created, we select an instance for which explanations will be derived. Here, thanks to the ```correct=True``` and ```predictions=[0]``` parameters, a correctly classified instance is chosen: the model predicts for it the first class ```0``` (i.e. the Iris setosa class). " ] }, { "cell_type": "code", "execution_count": 4, "id": "2577c3ec", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------- Instances ----------------\n", "number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "instance, prediction = learner.get_instances(model, n=1, correct=True, predictions=[0])" ] }, { "cell_type": "markdown", "id": "5372cec4", "metadata": {}, "source": [ "Please consult the [Learning](/documentation/learning/generating) page for more details about this ML part. " ] }, { "cell_type": "markdown", "id": "23b4ec8b", "metadata": {}, "source": [ "## Explainer" ] }, { "cell_type": "markdown", "id": "5b4a98b8", "metadata": {}, "source": [ "The ```Explainer``` module contains different methods to generate explanations. For this purpose, the model and the target instance are passed as parameters to the ```initialize``` function of this module. " ] }, { "cell_type": "code", "execution_count": 5, "id": "7d7e859c", "metadata": {}, "outputs": [], "source": [ "explainer = Explainer.initialize(model, instance)" ] }, { "cell_type": "markdown", "id": "5257e4b0", "metadata": {}, "source": [ "The ```initialize``` function converts the instance into a set of binary variables (called the binary representation) encoding the associated model. More precisely, each of these binary variables represents a condition of the form (feature $op$ value) appearing in the model, where $op$ is a standard comparison operator. 
[Scikit-learn](https://scikit-learn.org/stable/) and [XGBoost](https://xgboost.readthedocs.io/en/stable/) use the operator $\\ge$. Given the instance, the sign of a binary variable indicates whether the corresponding condition is satisfied or not. Below, we display the instance, its binary representation, and the conditions related to this representation, obtained with the function ```to_features``` which is explained later." ] }, { "cell_type": "code", "execution_count": 6, "id": "4ce31e13", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: [5.1 3.5 1.4 0.2]\n", "binary representation: (-1, -2, -3, 4, -5)\n", "conditions related to the binary representation: ('Sepal.Width > 3.100000023841858', 'Petal.Length <= 4.950000047683716', 'Petal.Width <= 0.75', 'Petal.Width <= 1.6500000357627869', 'Petal.Width <= 1.75')\n" ] } ], "source": [ "print(\"instance:\", instance)\n", "print(\"binary representation:\", explainer.binary_representation)\n", "print(\"conditions related to the binary representation:\", explainer.to_features(explainer.binary_representation,eliminate_redundant_features=False))" ] }, { "cell_type": "markdown", "id": "28df2c0f", "metadata": {}, "source": [ "We notice that the binary representation of this instance contains more than 4 variables because the decision tree of the model contains five conditions (binary variables). Indeed, the feature Petal.Width appears 3 times in the tree whereas Sepal.Length is not used at all. Please see the [concepts](/documentation/explainer/concepts/) page for more information on binary representations." ] }, { "cell_type": "markdown", "id": "62766926", "metadata": {}, "source": [ "### Abductive explanations" ] }, { "cell_type": "markdown", "id": "ea0e27bc", "metadata": {}, "source": [ "In PyXAI, several types of explanations are available. In their binary form (expressed with the binary variables representing conditions), explanations are called reasons. In our example, we choose to compute one of the most popular types of explanations: a sufficient reason. A sufficient reason for an instance X is an abductive explanation (any instance X' sharing the conditions of this reason is classified by the model in the same way as X) such that no proper subset of it is an abductive explanation (i.e., the explanation is minimal with respect to set inclusion). " ] }, { "cell_type": "code", "execution_count": 7, "id": "3929b134", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sufficient_reason: (-1,)\n" ] } ], "source": [ "sufficient_reason = explainer.sufficient_reason(n=1)\n", "print(\"sufficient_reason:\", sufficient_reason)" ] }, { "cell_type": "markdown", "id": "b8e5f6bc", "metadata": {}, "source": [ "We can get the features involved in the reason thanks to the method ```to_features```:" ] }, { "cell_type": "code", "execution_count": 8, "id": "b297f93b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "to_features: ('Petal.Width <= 0.75',)\n" ] } ], "source": [ "print(\"to_features:\", explainer.to_features(sufficient_reason))" ] }, { "cell_type": "markdown", "id": "337056c0", "metadata": {}, "source": [ "The ```to_features``` method eliminates redundant features by default and is also able to return more information about the features using the ```details``` parameter. This method is described in the [concepts](/documentation/explainer/concepts/) page. " ] }, { "cell_type": "markdown", "id": "d524a051", "metadata": {}, "source": [ "We can check whether the derived explanation is indeed a sufficient reason." 
] }, { "cell_type": "code", "execution_count": 9, "id": "21a1bd99", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "is sufficient: True\n" ] } ], "source": [ "print(\"is sufficient: \", explainer.is_sufficient_reason(sufficient_reason))" ] }, { "cell_type": "markdown", "id": "43404046", "metadata": {}, "source": [ "{: .attention }\n", "\n", "> It is important to note that computing and checking reasons are done independently." ] }, { "cell_type": "markdown", "id": "28f5daf6", "metadata": {}, "source": [ "To conclude, the sufficient reason (```('Petal.Width < 0.75',)```) explains why the instance ```[5.1 3.5 1.4 0.2]``` is well classified by the model (the prediction was Iris-setosa). It is because the fourth feature (the petal width in cm), set to 0.2 cm, is not greater or equal than 0.75 cm (see the attached image). \n", "\"Iris\"" ] }, { "cell_type": "markdown", "id": "82f1f716", "metadata": {}, "source": [ "### Contrastive explanations" ] }, { "cell_type": "markdown", "id": "25efc49d", "metadata": {}, "source": [ "Now, let us consider another instance, a wrongly classified one using the parameter ```correct=False``` of the function ```get_instance```. We set this instance to the explainer with the ```set_instance``` method." ] }, { "cell_type": "code", "execution_count": 10, "id": "c3887aae", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------- Instances ----------------\n", "number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "instance, prediction = learner.get_instances(model, n=1, correct=False)\n", "explainer.set_instance(instance)" ] }, { "cell_type": "markdown", "id": "9e38acce", "metadata": {}, "source": [ "We can explain why this instance is **not** classified differently by providing a contrastive explanation." ] }, { "cell_type": "code", "execution_count": 11, "id": "6c0ad185", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "contrastive reason (1,)\n", "to_features: ('Petal.Width > 0.75',)\n" ] } ], "source": [ "contrastive_reason = explainer.contrastive_reason()\n", "print(\"contrastive reason\", contrastive_reason)\n", "print(\"to_features:\", explainer.to_features(contrastive_reason, contrastive=True))" ] }, { "cell_type": "markdown", "id": "d1b16200", "metadata": {}, "source": [ "More information about explanations can be found in the [Explainer Principles](/documentation/explainer/) page, the [Explaining Classification](/documentation/classification/) page and the [Explaining Regression](/documentation/regression/) page." ] } ], "metadata": { "celltoolbar": "Pièces jointes", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }