{ "cells": [ { "cell_type": "markdown", "id": "0bbebed5", "metadata": {}, "source": [ "# Visualisation of Explanations (With GUI)" ] }, { "cell_type": "markdown", "id": "e7cb1097", "metadata": {}, "source": [ "Some datasets are associated with graphical representations. It is for instance the case for the MNIST dataset ([Modified National Institute of Standards and Technology database](https://paperswithcode.com/dataset/mnist)) which is a large collection of handwritten digits. Based on on [PyQT6](https://www.riverbankcomputing.com/software/pyqt/) and [Matplotlib](https://matplotlib.org/), PyXAI provides a Graphical User Interface (GUI) to display, save and load, instances and explanations for any dataset in a smart way. " ] }, { "cell_type": "markdown", "id": "b704f3c2", "metadata": {}, "source": [ "You can open the PyXAI's Graphical User Interface (GUI) with this command: \n", "\n", "```\n", "python3 -m pyxai -gui\n", "```\n", "\n", "You can also open the PyXAI's Graphical User Interface inside a Python file thanks to the ```Explainer``` module: \n", "\n", "```\n", "from pyxai import Explainer\n", "Explainer.show()\n", "```\n", "\n", "PyXAI saves and loads visualisations of instances and explanations through JSON files with the ```.explainer``` extension. 
To get demonstration backup files in your current directory, type this command: \n", "\n", "```\n", "python3 -m pyxai -explanations\n", "```\n", "\n", "This command creates a new directory named ```explanations``` in your current directory, containing the backup files:\n", "\n", "```console\n", "Python version: 3.10.12\n", "PyXAI version: 1.0.post1\n", "PyXAI location: /home/adminlocal/Bureau/pyxai/pyxai-backend-experimental/pyxai\n", "Source of files found: /home/adminlocal/Bureau/pyxai/pyxai-backend-experimental/pyxai/explanations/\n", "Successful creation of the /home/adminlocal/Bureau/pyxai/pyxai-website/explanations/ directory containing the explanations.\n", "```" ] }, { "cell_type": "markdown", "id": "9c471ffc", "metadata": {}, "source": [ "## Loading" ] }, { "cell_type": "markdown", "id": "da6ed447", "metadata": {}, "source": [ "Open PyXAI's Graphical User Interface (GUI) with this command: \n", "\n", "```\n", "python3 -m pyxai -gui\n", "```\n", "\n", "Click on ```File``` and then ```Load Explainer``` in the menu bar at the top left of the application:\n", "\n", "\"GUI-load\"\n", "\n", "Then choose the file to load; here, we have chosen the file ```BT-iris.explainer```:\n", "\n", "\"GUI-load-2\"\n" ] }, { "cell_type": "markdown", "id": "d9888a3a", "metadata": {}, "source": [ "## Saving (with a tabular dataset)" ] }, { "cell_type": "markdown", "id": "ed712648", "metadata": {}, "source": [ "The [Australian Credit Approval dataset](https://www.openml.org/search?type=data&sort=runs&id=40981&status=active) concerns credit card applications:" ] }, { "cell_type": "code", "execution_count": 6, "id": "65e12823", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " A1 A2 A3 A4_1 A4_2 A4_3 A5_1 A5_2 A5_3 A5_4 ... A8 A9 A10 \n", "0 1 65 168 0 1 0 0 0 0 1 ... 0 0 1 \\\n", "1 0 72 123 0 1 0 0 0 0 0 ... 0 0 1 \n", "2 0 142 52 1 0 0 0 0 0 1 ... 0 0 1 \n", "3 0 60 169 1 0 0 0 0 0 0 ... 
1 1 12 \n", "4 1 44 134 0 1 0 0 0 0 0 ... 1 1 15 \n", ".. .. ... ... ... ... ... ... ... ... ... ... .. .. ... \n", "685 1 163 160 0 1 0 0 0 0 0 ... 1 0 1 \n", "686 1 49 14 0 1 0 0 0 0 0 ... 0 0 1 \n", "687 0 32 145 0 1 0 0 0 0 0 ... 1 0 1 \n", "688 0 122 193 0 1 0 0 0 0 0 ... 1 1 2 \n", "689 1 245 2 0 1 0 0 0 0 0 ... 0 1 2 \n", "\n", " A11 A12_1 A12_2 A12_3 A13 A14 A15 \n", "0 1 0 1 0 32 161 0 \n", "1 0 0 1 0 53 1 0 \n", "2 1 0 1 0 98 1 0 \n", "3 1 0 1 0 1 1 1 \n", "4 0 0 1 0 18 68 1 \n", ".. ... ... ... ... ... ... ... \n", "685 0 0 1 0 1 1 1 \n", "686 0 0 1 0 1 35 0 \n", "687 0 0 1 0 32 1 1 \n", "688 0 0 1 0 38 12 1 \n", "689 0 1 0 0 159 1 1 \n", "\n", "[690 rows x 39 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../dataset/australian_0.csv\n", "nFeatures (nAttributes, with the labels): 39\n", "nInstances (nObservations): 690\n", "nLabels: 2\n", "--------------- Evaluation ---------------\n", "method: HoldOut\n", "output: RF\n", "learner_type: Classification\n", "learner_options: {'max_depth': None, 'random_state': 0}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 85.5072463768116\n", "nTraining instances: 483\n", "nTest instances: 207\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 1361\n", "\n", "--------------- Instances ----------------\n", "number of instances selected: 10\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explainer\n", "\n", "# Machine learning part\n", "learner = Learning.Scikitlearn(\"../dataset/australian_0.csv\", learner_type=Learning.CLASSIFICATION)\n", "model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.RF)\n", "instances = learner.get_instances(model, n=10, seed=11200, correct=True)\n", "\n", "australian_types = {\n", " \"numerical\": 
Learning.DEFAULT,\n", " \"categorical\": {\"A4*\": (1, 2, 3), \n", " \"A5*\": tuple(range(1, 15)),\n", " \"A6*\": (1, 2, 3, 4, 5, 7, 8, 9), \n", " \"A12*\": tuple(range(1, 4))},\n", " \"binary\": [\"A1\", \"A8\", \"A9\", \"A11\"],\n", "}" ] }, { "cell_type": "markdown", "id": "932ee707", "metadata": {}, "source": [ "We choose here to compute one majoritary reason per instance:" ] }, { "cell_type": "code", "execution_count": 7, "id": "885c1c93", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------- Theory Feature Types -----------\n", "Before the encoding (without one hot encoded features), we have:\n", "Numerical features: 6\n", "Categorical features: 4\n", "Binary features: 4\n", "Number of features: 14\n", "Values of categorical features: {'A4_1': ['A4', 1, (1, 2, 3)], 'A4_2': ['A4', 2, (1, 2, 3)], 'A4_3': ['A4', 3, (1, 2, 3)], 'A5_1': ['A5', 1, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_2': ['A5', 2, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_3': ['A5', 3, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_4': ['A5', 4, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_5': ['A5', 5, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_6': ['A5', 6, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_7': ['A5', 7, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_8': ['A5', 8, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_9': ['A5', 9, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_10': ['A5', 10, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_11': ['A5', 11, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_12': ['A5', 12, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_13': ['A5', 13, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_14': ['A5', 14, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A6_1': ['A6', 1, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_2': ['A6', 2, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_3': ['A6', 3, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_4': ['A6', 4, 
(1, 2, 3, 4, 5, 7, 8, 9)], 'A6_5': ['A6', 5, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_7': ['A6', 7, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_8': ['A6', 8, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_9': ['A6', 9, (1, 2, 3, 4, 5, 7, 8, 9)], 'A12_1': ['A12', 1, (1, 2, 3)], 'A12_2': ['A12', 2, (1, 2, 3)], 'A12_3': ['A12', 3, (1, 2, 3)]}\n", "\n", "Number of used features in the model (before the encoding): 14\n", "Number of used features in the model (after the encoding): 38\n", "----------------------------------------------\n", "majoritary_reason 12\n", "majoritary_reason 13\n", "majoritary_reason 12\n", "majoritary_reason 12\n", "majoritary_reason 13\n", "majoritary_reason 15\n", "majoritary_reason 13\n", "majoritary_reason 14\n", "majoritary_reason 13\n", "majoritary_reason 12\n" ] } ], "source": [ "explainer = Explainer.initialize(model, features_type=australian_types)\n", "for (instance, prediction) in instances:\n", " explainer.set_instance(instance)\n", "\n", " majoritary_reason = explainer.majoritary_reason(time_limit=10)\n", " print(\"majoritary_reason\", len(majoritary_reason))" ] }, { "cell_type": "markdown", "id": "3646de73", "metadata": {}, "source": [ "Then we start PyXAI's Graphical User Interface from inside the Python file, through the ```visualisation``` attribute of the explainer:" ] }, { "cell_type": "code", "execution_count": null, "id": "b081dd28", "metadata": {}, "outputs": [], "source": [ "explainer.visualisation.gui()" ] }, { "cell_type": "markdown", "id": "95f10631", "metadata": {}, "source": [ "The last lines of code display the instances and the explanations:\n", "\n", "\"GUI-save-1\"\n", "\n", "You can save this explainer by clicking on ```File``` and then ```Save Explainer``` in the menu bar at the top left of the application.\n", "\n", "\"GUI-save-2\"\n", "\n", "For a tabular dataset where a [Theory](/documentation/explainer/theories/) is taken into account, each feature of an explanation is displayed according to its type: \n", "- For Boolean features: the value is indicated directly as \"is True\" or \"is False\" (A_1 
in the example)\n", "- For categorical features: the blue (resp. red) values must be equal (resp. not equal) to explain the classification or the regression (A_4, A_5 and A_6 in the example).\n", "- For numerical features: a horizontal axis represents the interval in which the values must be contained to explain the classification or the regression (in blue). In addition, the red dot represents the current feature value of the instance (A_2 and A_3 in the example).\n" ] }, { "cell_type": "markdown", "id": "889576ca", "metadata": {}, "source": [ "## Saving (with an image dataset)" ] }, { "cell_type": "markdown", "id": "6c83cb20", "metadata": {}, "source": [ "We use a modified version of the [MNIST](/assets/notebooks/dataset/mnist49.csv) dataset that focuses on the digits 4 and 9. We create a model using the hold-out approach (by default, the test size is set to 30%). We choose to use a Boosted Tree built with XGBoost. " ] }, { "cell_type": "code", "execution_count": 1, "id": "c42c7c13", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data:\n", " 0 1 2 3 4 5 6 7 8 9 ... 775 776 777 778 779 780 781 \n", "0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \\\n", "1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n", "2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n", "3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n", "4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n", "... .. .. .. .. .. .. .. .. .. .. ... ... ... ... ... ... ... ... \n", "13777 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n", "13778 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n", "13779 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n", "13780 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n", "13781 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n", "\n", " 782 783 784 \n", "0 0 0 4 \n", "1 0 0 9 \n", "2 0 0 4 \n", "3 0 0 9 \n", "4 0 0 4 \n", "... ... ... ... 
\n", "13777 0 0 4 \n", "13778 0 0 4 \n", "13779 0 0 4 \n", "13780 0 0 9 \n", "13781 0 0 4 \n", "\n", "[13782 rows x 785 columns]\n", "-------------- Information ---------------\n", "Dataset name: ../dataset/mnist49.csv\n", "nFeatures (nAttributes, with the labels): 785\n", "nInstances (nObservations): 13782\n", "nLabels: 2\n", "--------------- Evaluation ---------------\n", "method: HoldOut\n", "output: BT\n", "learner_type: Classification\n", "learner_options: {'seed': 0, 'max_depth': None, 'eval_metric': 'mlogloss'}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "metrics:\n", " accuracy: 98.57315598548972\n", "nTraining instances: 9647\n", "nTest instances: 4135\n", "\n", "--------------- Explainer ----------------\n", "For the evaluation number 0:\n", "**Boosted Tree model**\n", "NClasses: 2\n", "nTrees: 100\n", "nVariables: 1590\n", "\n", "--------------- Instances ----------------\n", "number of instances selected: 10\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explainer, Tools\n", "\n", "\n", "# Machine learning part\n", "learner = Learning.Xgboost(\"../dataset/mnist49.csv\", learner_type=Learning.CLASSIFICATION)\n", "model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.BT)\n", "instances = learner.get_instances(model, n=10, correct=True, predictions=[0])" ] }, { "cell_type": "code", "execution_count": 2, "id": "297e3d0b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "len direct: 335\n", "len direct: 349\n", "len direct: 342\n", "len direct: 388\n", "len direct: 348\n", "len direct: 347\n", "len direct: 291\n", "len direct: 357\n", "len direct: 346\n", "len direct: 355\n" ] } ], "source": [ "# Explanation part\n", "explainer = Explainer.initialize(model)\n", "for (instance, prediction) in instances:\n", " explainer.set_instance(instance)\n", "\n", " direct = explainer.direct_reason()\n", " print(\"len 
direct:\", len(direct))\n", "\n", " tree_specific_reason = explainer.tree_specific_reason()\n", " print(\"len tree_specific_reason:\", len(tree_specific_reason))\n", "\n", " minimal_tree_specific_reason = explainer.minimal_tree_specific_reason(time_limit=100)\n", " print(\"len minimal tree_specific_reason:\", len(minimal_tree_specific_reason))\n" ] }, { "cell_type": "markdown", "id": "b05c6ddb", "metadata": {}, "source": [ "For a dataset containing images, you need to provide some information specific to images (through the ```image``` parameter of the ```gui``` method) in order to display instances and explanations correctly." ] }, { "cell_type": "markdown", "id": "54b33b9b", "metadata": {}, "source": [ "| Explainer.visualisation.gui(image=None, time_series=None):|\n", "|:---| \n", "| Open PyXAI's Graphical User Interface. |\n", "| image ```Dict``` ```None```: Python dictionary containing some information specific to images, with 4 keys: [\"shape\", \"dtype\", \"get_pixel_value\", \"instance_index_to_pixel_position\"].| \n", "|\"shape\": Tuple representing the number of horizontal and vertical pixels. If the number of values representing a pixel is not equal to 1, this number must be placed as the last value of this tuple. |\n", "|\"dtype\": Domain of values for each pixel (a numpy dtype, or a tuple of numpy dtypes, e.g. for an RGB pixel). 
|\n", "|\"get_pixel_value\": Python function with 4 parameters which returns the value of a pixel according to a pixel position (x,y).|\n", "|\"instance_index_to_pixel_position\": Python function with 2 parameters which returns a pixel position (x,y) according to an index of the instance. |\n", "| time_series ```Dict``` ```None```: To display time series. Python dictionary where a key is the name of a time serie and each value of a key is a list containing time series feature names.| \n" ] }, { "cell_type": "markdown", "id": "07b091db", "metadata": {}, "source": [ "Here we give an example for the MNIST images, each of which is composed of 28 $\\times$ 28 pixels, and the value of a pixel is an 8-bit integer:" ] }, { "cell_type": "code", "execution_count": null, "id": "9185026a", "metadata": {}, "outputs": [], "source": [ "import numpy\n", "\n", "def get_pixel_value(instance, x, y, shape):\n", " index = x * shape[0] + y \n", " return instance[index]\n", "\n", "def instance_index_to_pixel_position(i, shape):\n", " return i // shape[0], i % shape[0]\n", "\n", "explainer.visualisation.gui(image={\"shape\": (28,28),\n", " \"dtype\": numpy.uint8,\n", " \"get_pixel_value\": get_pixel_value,\n", " \"instance_index_to_pixel_position\": instance_index_to_pixel_position})" ] }, { "cell_type": "markdown", "id": "41b37429", "metadata": {}, "source": [ "\"GUI-save-3\"" ] }, { "cell_type": "markdown", "id": "4c61428a", "metadata": {}, "source": [ "You can save this explainer by clicking on ```File``` and then ```Save Explainer``` in the menu bar at the top left of the application.\n", "\n", "As an example, we also consider images in dataset [CIFAR](https://www.cs.toronto.edu/~kriz/cifar.html), which are 32 x 32 pixels with three 8-bit values (RGB) per pixel:" ] }, { "cell_type": "code", "execution_count": null, "id": "28faea26", "metadata": {}, "outputs": [], "source": [ "def get_pixel_value(instance, x, y, shape):\n", " n_pixels = shape[0]*shape[1]\n", " index = x * shape[0] + y 
\n", " return (instance[0:n_pixels][index], instance[n_pixels:n_pixels*2][index],instance[n_pixels*2:][index])\n", "\n", "def instance_index_to_pixel_position(i, shape):\n", " n_pixels = shape[0]*shape[1]\n", " if i < n_pixels:\n", " value = i \n", " elif i >= n_pixels and i < n_pixels*2:\n", " value = i - n_pixels \n", " else:\n", " value = i - (n_pixels*2) \n", " return value // shape[0], value % shape[0]\n", " \n", "explainer.visualisation.gui(image={\"shape\": (32,32,3),\n", " \"dtype\": numpy.uint8,\n", " \"get_pixel_value\": get_pixel_value,\n", " \"instance_index_to_pixel_position\": instance_index_to_pixel_position})" ] }, { "cell_type": "markdown", "id": "0dccb8e0", "metadata": {}, "source": [ "\"GUI-save-4\"\n", "\n", "You can save individual images by clicking on ```File``` in the menu bar at the top left of the application." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" } }, "nbformat": 4, "nbformat_minor": 5 }