Papers Video GitHub In-the-Loop EXPEKCTATION Release Notes About

Visualisation of Explanations (Without GUI)

You can also display some explanations without PyXAI’s Graphical Interface. To achieve this, the Explainer object of PyXAI defines several methods. As a reminder, see the Explainer page for instructions on how to create this object.

To display an explanation in a Jupyter notebook, use notebook:
To display an explanation on screen, use screen:
To return PIL images of an explanation, use get_PILImage:
To save PNG image files in the current directory, use save_png:
A PIL image can be resized using resize_PILimage:

With a tabular dataset

The Australian Credit Approval dataset is a credit card application:

from pyxai import Learning, Explainer

# Machine learning part
learner = Learning.Scikitlearn("../dataset/australian_0.csv", problem_type=Learning.CLASSIFICATION)
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.RF)
instance, prediction = learner.get_instances(model, n=1, seed=11200, is_correct=True)

australian_types = {
    "numerical": Learning.DEFAULT,
    "categorical": {"A4*": (1, 2, 3), 
                    "A5*": tuple(range(1, 15)),
                    "A6*": (1, 2, 3, 4, 5, 7, 8, 9), 
                    "A12*": tuple(range(1, 4))},
    "binary": ["A1", "A8", "A9", "A11"],
}

data:
     A1   A2   A3  A4_1  A4_2  A4_3  A5_1  A5_2  A5_3  A5_4  ...  A8  A9  A10   
0     1   65  168     0     1     0     0     0     0     1  ...   0   0    1  \
1     0   72  123     0     1     0     0     0     0     0  ...   0   0    1   
2     0  142   52     1     0     0     0     0     0     1  ...   0   0    1   
3     0   60  169     1     0     0     0     0     0     0  ...   1   1   12   
4     1   44  134     0     1     0     0     0     0     0  ...   1   1   15   
..   ..  ...  ...   ...   ...   ...   ...   ...   ...   ...  ...  ..  ..  ...   
685   1  163  160     0     1     0     0     0     0     0  ...   1   0    1   
686   1   49   14     0     1     0     0     0     0     0  ...   0   0    1   
687   0   32  145     0     1     0     0     0     0     0  ...   1   0    1   
688   0  122  193     0     1     0     0     0     0     0  ...   1   1    2   
689   1  245    2     0     1     0     0     0     0     0  ...   0   1    2   

     A11  A12_1  A12_2  A12_3  A13  A14  A15  
0      1      0      1      0   32  161    0  
1      0      0      1      0   53    1    0  
2      1      0      1      0   98    1    0  
3      1      0      1      0    1    1    1  
4      0      0      1      0   18   68    1  
..   ...    ...    ...    ...  ...  ...  ...  
685    0      0      1      0    1    1    1  
686    0      0      1      0    1   35    0  
687    0      0      1      0   32    1    1  
688    0      0      1      0   38   12    1  
689    0      1      0      0  159    1    1  

[690 rows x 39 columns]
--------------   Information   ---------------
Dataset name: ../dataset/australian_0.csv
nFeatures (nAttributes, with the labels): 39
nInstances (nObservations): 690
nLabels: 2
---------------   Evaluation   ---------------
method: HoldOut
output: RF
learner_type: Classification
learner_options: {'max_depth': None, 'random_state': 0}
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics:
   accuracy: 85.5072463768116
   precision: 84.70588235294117
   recall: 80.89887640449437
   f1_score: 82.75862068965516
   specificity: 88.98305084745762
   true_positive: 72
   true_negative: 105
   false_positive: 13
   false_negative: 17
   sklearn_confusion_matrix: [[105, 13], [17, 72]]
nTraining instances: 483
nTest instances: 207

---------------   Explainer   ----------------
For the evaluation number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 1361

---------------   Instances   ----------------
number of instances selected: 1
----------------------------------------------

explainer = Explainer.initialize(model, instance, features_type=australian_types)
majoritary_reason = explainer.majoritary_reason(time_limit=10)
print("majoritary_reason", len(majoritary_reason))
explainer.visualisation.notebook(instance, majoritary_reason)

---------   Theory Feature Types   -----------
Before the encoding (without one hot encoded features), we have:
Numerical features: 6
Categorical features: 4
Binary features: 4
Number of features: 14
Values of categorical features: {'A4_1': ['A4', 1, (1, 2, 3)], 'A4_2': ['A4', 2, (1, 2, 3)], 'A4_3': ['A4', 3, (1, 2, 3)], 'A5_1': ['A5', 1, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_2': ['A5', 2, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_3': ['A5', 3, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_4': ['A5', 4, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_5': ['A5', 5, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_6': ['A5', 6, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_7': ['A5', 7, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_8': ['A5', 8, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_9': ['A5', 9, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_10': ['A5', 10, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_11': ['A5', 11, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_12': ['A5', 12, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_13': ['A5', 13, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A5_14': ['A5', 14, (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)], 'A6_1': ['A6', 1, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_2': ['A6', 2, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_3': ['A6', 3, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_4': ['A6', 4, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_5': ['A6', 5, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_7': ['A6', 7, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_8': ['A6', 8, (1, 2, 3, 4, 5, 7, 8, 9)], 'A6_9': ['A6', 9, (1, 2, 3, 4, 5, 7, 8, 9)], 'A12_1': ['A12', 1, (1, 2, 3)], 'A12_2': ['A12', 2, (1, 2, 3)], 'A12_3': ['A12', 3, (1, 2, 3)]}

Number of used features in the model (before the encoding): 14
Number of used features in the model (after the encoding): 38
----------------------------------------------
majoritary_reason 12

png

With an image dataset

We use a modified version of MNIST dataset that focuses on 4 and 9 digits. We create a model using the hold-out approach (by default, the test size is set to 30%). We chose to use a Boosted Tree by using XGBoost.

from pyxai import Learning, Explainer, Tools


# Machine learning part
learner = Learning.Xgboost("../dataset/mnist49.csv", problem_type=Learning.CLASSIFICATION)
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.BT)
instance, prediction = learner.get_instances(model, n=1, is_correct=True, subset_predicted_classes=[0])

data:
       0  1  2  3  4  5  6  7  8  9  ...  775  776  777  778  779  780  781   
0      0  0  0  0  0  0  0  0  0  0  ...    0    0    0    0    0    0    0  \
1      0  0  0  0  0  0  0  0  0  0  ...    0    0    0    0    0    0    0   
2      0  0  0  0  0  0  0  0  0  0  ...    0    0    0    0    0    0    0   
3      0  0  0  0  0  0  0  0  0  0  ...    0    0    0    0    0    0    0   
4      0  0  0  0  0  0  0  0  0  0  ...    0    0    0    0    0    0    0   
...   .. .. .. .. .. .. .. .. .. ..  ...  ...  ...  ...  ...  ...  ...  ...   
13777  0  0  0  0  0  0  0  0  0  0  ...    0    0    0    0    0    0    0   
13778  0  0  0  0  0  0  0  0  0  0  ...    0    0    0    0    0    0    0   
13779  0  0  0  0  0  0  0  0  0  0  ...    0    0    0    0    0    0    0   
13780  0  0  0  0  0  0  0  0  0  0  ...    0    0    0    0    0    0    0   
13781  0  0  0  0  0  0  0  0  0  0  ...    0    0    0    0    0    0    0   

       782  783  784  
0        0    0    4  
1        0    0    9  
2        0    0    4  
3        0    0    9  
4        0    0    4  
...    ...  ...  ...  
13777    0    0    4  
13778    0    0    4  
13779    0    0    4  
13780    0    0    9  
13781    0    0    4  

[13782 rows x 785 columns]
--------------   Information   ---------------
Dataset name: ../dataset/mnist49.csv
nFeatures (nAttributes, with the labels): 785
nInstances (nObservations): 13782
nLabels: 2
---------------   Evaluation   ---------------
method: HoldOut
output: BT
learner_type: Classification
learner_options: {'seed': 0, 'max_depth': None, 'eval_metric': 'mlogloss'}
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics:
   accuracy: 98.57315598548972
   precision: 98.53911404335533
   recall: 98.67862199150542
   f1_score: 98.60881867484083
   specificity: 98.4623015873016
   true_positive: 2091
   true_negative: 1985
   false_positive: 31
   false_negative: 28
   sklearn_confusion_matrix: [[1985, 31], [28, 2091]]
nTraining instances: 9647
nTest instances: 4135

---------------   Explainer   ----------------
For the evaluation number 0:
**Boosted Tree model**
NClasses: 2
nTrees: 100
nVariables: 1590

---------------   Instances   ----------------
number of instances selected: 1
----------------------------------------------

# Explanation part
explainer = Explainer.initialize(model, instance)
minimal_tree_specific_reason = explainer.minimal_tree_specific_reason(time_limit=10)
print("len minimal tree_specific_reason:", len(minimal_tree_specific_reason))

len minimal tree_specific_reason: 187

For a dataset containing images, you need to provide certain information specific to images (through the image parameter of the visualization method) in order to display instances and explanations correctly. Here we give an example for the MNIST images, each of which is composed of 28 $\times$ 28 pixels, and the value of a pixel is an 8-bit integer:

import numpy

def get_pixel_value(instance, x, y, shape):
    index = x * shape[0] + y 
    return instance[index]

def instance_index_to_pixel_position(i, shape):
    return i // shape[0], i % shape[0]

explainer.visualisation.notebook(instance, minimal_tree_specific_reason, image={"shape": (28,28),
                      "dtype": numpy.uint8,
                      "get_pixel_value": get_pixel_value,
                      "instance_index_to_pixel_position": instance_index_to_pixel_position})

png

<Figure size 432x288 with 0 Axes>

Blue (resp. Red) pixels represent positive (resp. negative) conditions in the explanation. A blue (resp. red) pixel means that the color of the pixel must be white (black) to be included in the explanation. In this way, all instances (images) that have white pixels when the explanation has blue pixels and have black pixels when the explanation has red pixels will have the same classification (in our case, it will be 4).

You can open on your screen the images with:

explainer.visualisation.screen(instance, minimal_tree_specific_reason, image={"shape": (28,28),
                      "dtype": numpy.uint8,
                      "get_pixel_value": get_pixel_value,
                      "instance_index_to_pixel_position": instance_index_to_pixel_position})

<Figure size 432x288 with 0 Axes>

You can save PNG files with:

explainer.visualisation.save_png("mnist49_example.png", instance, minimal_tree_specific_reason, image={"shape": (28,28),
                      "dtype": numpy.uint8,
                      "get_pixel_value": get_pixel_value,
                      "instance_index_to_pixel_position": instance_index_to_pixel_position})

<Figure size 432x288 with 0 Axes>

You can get PIL images with:

images = explainer.visualisation.get_PILImage(instance, minimal_tree_specific_reason, image={"shape": (28,28),
                      "dtype": numpy.uint8,
                      "get_pixel_value": get_pixel_value,
                      "instance_index_to_pixel_position": instance_index_to_pixel_position})
print(images)

[<PIL.Image.Image image mode=L size=28x28 at 0x7FF4D8850C10>, <PIL.Image.Image image mode=RGBA size=28x28 at 0x7FF4D86571F0>]

<Figure size 432x288 with 0 Axes>

You can resize a PIL image with:

image_resized = explainer.visualisation.resize_PILimage(images[1], 100)
from IPython.display import display
display(image_resized)

png

With a time series dataset

The ArrowHead dataset consists of outlines of the images of arrowheads. The shapes of the projectile points are converted into a time series using the angle-based method. The classes (called “Avonlea”, “Clovis” and “Mix”) are based on shape distinctions such as the presence and location of a notch in the arrow.

Note that we need the sktime package to load this dataset: python3 -m pip install sktime We start by preparing the data:

import pandas
from sktime.datasets import load_arrow_head

X, y = load_arrow_head()

rawdata = []
n_instances = X["dim_0"].shape[0]
n_features = len(X["dim_0"][0])
print("n_instances:", n_instances)
print("n_features:", n_features)

for i in range(n_instances):
    instance = X["dim_0"][i]
    label = y[i]
    rawdata.append(list(instance)+[label])
    
dataframe = pandas.DataFrame(rawdata)
dataframe.columns = [str(i) for i in range(n_features)]+["label"]
print(dataframe)
    

n_instances: 211
n_features: 251
            0         1         2         3         4         5         6   
 -1.963009 -1.957825 -1.956145 -1.938289 -1.896657 -1.869857 -1.838705  \
 -1.774571 -1.774036 -1.776586 -1.730749 -1.696268 -1.657377 -1.636227   
 -1.866021 -1.841991 -1.835025 -1.811902 -1.764390 -1.707687 -1.648280   
 -2.073758 -2.073301 -2.044607 -2.038346 -1.959043 -1.874494 -1.805619   
 -1.746255 -1.741263 -1.722741 -1.698640 -1.677223 -1.630356 -1.579440   
..        ...       ...       ...       ...       ...       ...       ...   
-1.625142 -1.622988 -1.626062 -1.610799 -1.568567 -1.552384 -1.535922   
-1.657757 -1.664673 -1.632645 -1.611002 -1.591838 -1.548710 -1.502004   
-1.603279 -1.587365 -1.577407 -1.556043 -1.531201 -1.505051 -1.487764   
-1.739020 -1.741534 -1.732863 -1.722299 -1.698109 -1.665590 -1.618474   
-1.630727 -1.629918 -1.620556 -1.607832 -1.579719 -1.562561 -1.526980   

            7         8         9  ...       242       243       244   
 -1.812289 -1.736433 -1.673329  ... -1.655329 -1.719153 -1.750881  \
 -1.609807 -1.543439 -1.486174  ... -1.484666 -1.539972 -1.590150   
 -1.582643 -1.531502 -1.493609  ... -1.652337 -1.684565 -1.743972   
 -1.731043 -1.712653 -1.628022  ... -1.743732 -1.819801 -1.858136   
 -1.551225 -1.473980 -1.459377  ... -1.607101 -1.635137 -1.686346   
..        ...       ...       ...  ...       ...       ...       ...   
-1.498036 -1.460416 -1.430095  ... -1.447404 -1.476224 -1.531105   
-1.447861 -1.389342 -1.322178  ... -1.580908 -1.631919 -1.664673   
-1.455712 -1.419410 -1.391145  ... -1.492721 -1.511071 -1.539271   
-1.585797 -1.547740 -1.504280  ... -1.487935 -1.536131 -1.594358   
-1.507067 -1.464752 -1.417547  ... -1.350127 -1.403990 -1.451258   

          245       246       247       248       249       250  label  
 -1.796273 -1.841345 -1.884289 -1.905393 -1.923905 -1.909153      0  
 -1.635663 -1.639989 -1.678683 -1.729227 -1.775670 -1.789324      1  
 -1.799117 -1.829069 -1.875828 -1.862512 -1.863368 -1.846493      2  
 -1.886146 -1.951247 -2.012927 -2.026963 -2.073405 -2.075292      0  
 -1.691274 -1.716886 -1.740726 -1.743442 -1.762729 -1.763428      1  
..        ...       ...       ...       ...       ...       ...    ...  
-1.551077 -1.567998 -1.597641 -1.623653 -1.622988 -1.624174      2  
-1.672895 -1.678613 -1.666825 -1.667466 -1.681769 -1.678613      2  
-1.565389 -1.577493 -1.585284 -1.603086 -1.605382 -1.605597      2  
-1.615147 -1.636495 -1.669889 -1.699870 -1.713118 -1.728176      2  
-1.472321 -1.513637 -1.550431 -1.581576 -1.595273 -1.620783      2  

[211 rows x 252 columns]

We create a model using the hold-out approach (by default, the test size is set to 30%). We chose to use a Boosted Tree by using XGBoost.

from pyxai import Learning, Explainer, Tools


# Machine learning part
learner = Learning.Xgboost(dataframe, problem_type=Learning.CLASSIFICATION)
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.BT)
instance, prediction = learner.get_instances(model, n=1, is_correct=True, subset_predicted_classes=[0])

data:
            0         1         2         3         4         5         6   
0   -1.963009 -1.957825 -1.956145 -1.938289 -1.896657 -1.869857 -1.838705  \
1   -1.774571 -1.774036 -1.776586 -1.730749 -1.696268 -1.657377 -1.636227   
2   -1.866021 -1.841991 -1.835025 -1.811902 -1.764390 -1.707687 -1.648280   
3   -2.073758 -2.073301 -2.044607 -2.038346 -1.959043 -1.874494 -1.805619   
4   -1.746255 -1.741263 -1.722741 -1.698640 -1.677223 -1.630356 -1.579440   
..        ...       ...       ...       ...       ...       ...       ...   
206 -1.625142 -1.622988 -1.626062 -1.610799 -1.568567 -1.552384 -1.535922   
207 -1.657757 -1.664673 -1.632645 -1.611002 -1.591838 -1.548710 -1.502004   
208 -1.603279 -1.587365 -1.577407 -1.556043 -1.531201 -1.505051 -1.487764   
209 -1.739020 -1.741534 -1.732863 -1.722299 -1.698109 -1.665590 -1.618474   
210 -1.630727 -1.629918 -1.620556 -1.607832 -1.579719 -1.562561 -1.526980   

            7         8         9  ...       242       243       244   
0   -1.812289 -1.736433 -1.673329  ... -1.655329 -1.719153 -1.750881  \
1   -1.609807 -1.543439 -1.486174  ... -1.484666 -1.539972 -1.590150   
2   -1.582643 -1.531502 -1.493609  ... -1.652337 -1.684565 -1.743972   
3   -1.731043 -1.712653 -1.628022  ... -1.743732 -1.819801 -1.858136   
4   -1.551225 -1.473980 -1.459377  ... -1.607101 -1.635137 -1.686346   
..        ...       ...       ...  ...       ...       ...       ...   
206 -1.498036 -1.460416 -1.430095  ... -1.447404 -1.476224 -1.531105   
207 -1.447861 -1.389342 -1.322178  ... -1.580908 -1.631919 -1.664673   
208 -1.455712 -1.419410 -1.391145  ... -1.492721 -1.511071 -1.539271   
209 -1.585797 -1.547740 -1.504280  ... -1.487935 -1.536131 -1.594358   
210 -1.507067 -1.464752 -1.417547  ... -1.350127 -1.403990 -1.451258   

          245       246       247       248       249       250  label  
0   -1.796273 -1.841345 -1.884289 -1.905393 -1.923905 -1.909153      0  
1   -1.635663 -1.639989 -1.678683 -1.729227 -1.775670 -1.789324      1  
2   -1.799117 -1.829069 -1.875828 -1.862512 -1.863368 -1.846493      2  
3   -1.886146 -1.951247 -2.012927 -2.026963 -2.073405 -2.075292      0  
4   -1.691274 -1.716886 -1.740726 -1.743442 -1.762729 -1.763428      1  
..        ...       ...       ...       ...       ...       ...    ...  
206 -1.551077 -1.567998 -1.597641 -1.623653 -1.622988 -1.624174      2  
207 -1.672895 -1.678613 -1.666825 -1.667466 -1.681769 -1.678613      2  
208 -1.565389 -1.577493 -1.585284 -1.603086 -1.605382 -1.605597      2  
209 -1.615147 -1.636495 -1.669889 -1.699870 -1.713118 -1.728176      2  
210 -1.472321 -1.513637 -1.550431 -1.581576 -1.595273 -1.620783      2  

[211 rows x 252 columns]
--------------   Information   ---------------
Dataset name: pandas.core.frame.DataFrame
nFeatures (nAttributes, with the labels): 252
nInstances (nObservations): 211
nLabels: 3
---------------   Evaluation   ---------------
method: HoldOut
output: BT
learner_type: Classification
learner_options: {'seed': 0, 'max_depth': None, 'eval_metric': 'mlogloss'}
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics:
   micro_averaging_accuracy: 89.58333333333334
   micro_averaging_precision: 84.375
   micro_averaging_recall: 84.375
   macro_averaging_accuracy: 89.58333333333334
   macro_averaging_precision: 84.61147421931736
   macro_averaging_recall: 85.05072463768116
   true_positives: {'0': 18, '1': 22, '2': 14}
   true_negatives: {'0': 37, '1': 36, '2': 45}
   false_positives: {'0': 2, '1': 5, '2': 3}
   false_negatives: {'0': 7, '1': 1, '2': 2}
   accuracy: 84.375
   sklearn_confusion_matrix: [[18, 4, 3], [1, 22, 0], [1, 1, 14]]
nTraining instances: 147
nTest instances: 64

---------------   Explainer   ----------------
For the evaluation number 0:
**Boosted Tree model**
NClasses: 3
nTrees: 300
nVariables: 345

---------------   Instances   ----------------
number of instances selected: 1
----------------------------------------------

Next, we calculate an explanation for an arrow (an instance) that is classified as 0 (corresponding to the Avonlea era). We set the time limit to 60 seconds.

# Explanation part
explainer = Explainer.initialize(model, instance)
tree_specific_reason = explainer.tree_specific_reason(time_limit=60)
print("tree_specific_reason:", len(tree_specific_reason))

tree_specific_reason: 38

If we display this explanation in a Jupyter notebook, it is difficult to interpret.

explainer.visualisation.notebook(instance, tree_specific_reason)

png

To overcome this problem, we display it as a time series.

explainer.visualisation.notebook(instance, tree_specific_reason, time_series={"serie":learner.feature_names[:-1]}, width=600)

png

The red line represents the instance values. The two blue lines represent the extreme values of the explanation for each feature. A blue value at the top (resp. at the bottom) means that the extreme value for the associated feature is $+\infty$ (resp. $-\infty$).

The explanation shows us that all the other instances (other lines we can imagine) between the two blue lines will have the same classification. In our case, we clearly see that the explanation tells us that it is because of the two bumps on each side that this instance is classified as such.