PyXAI

Concepts

This section presents the concepts behind the Explaining module of PyXAI. We first show how to use it, then explain the notion of binary representation, and finally give an example.

Main Methods

To explain instances of a given model, you first need to create an Explainer object. This is done with the function Explaining.initialize.

Once the Explainer is created you can explain several instances using it. Each time you want to explain a new instance, you need to call the function set_instance (it is not necessary to create a new Explainer object).

The prediction made by the ML model for the current instance is available in the target_prediction attribute of the Explainer object. Internally, the Explainer works on a binary representation of the instance, which is described in the next section.
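The reuse pattern described above (one Explainer, several instances) can be sketched as follows. This is only an illustration: StubExplainer is a hypothetical stand-in that imitates the set_instance method and the target_prediction attribute so that the snippet runs without a trained model, and the helper predictions_for is not part of PyXAI. We also assume that calling set_instance refreshes target_prediction.

```python
class StubExplainer:
    """Hypothetical stand-in for a PyXAI Explainer (not the real class)."""

    def __init__(self, predict):
        self._predict = predict
        self.target_prediction = None

    def set_instance(self, instance):
        # Assumed behaviour: setting an instance refreshes target_prediction.
        self.target_prediction = self._predict(instance)


def predictions_for(explainer, instances):
    """Reuse a single explainer for several instances (hypothetical helper)."""
    results = []
    for instance in instances:
        explainer.set_instance(instance)  # no new Explainer object is created
        results.append(explainer.target_prediction)
    return results


# Toy rule used only by the stub: petal length (third value) at most 2.45.
stub = StubExplainer(lambda x: "Iris-setosa" if x[2] <= 2.45 else "other")
predictions = predictions_for(stub, [(5.1, 3.5, 1.4, 0.2), (6.3, 2.9, 5.6, 1.8)])
print(predictions)
```

With PyXAI itself, the explainer passed to such a helper would be the object returned by Explaining.initialize, as in the example at the end of this page.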

Binary representation

First, let us recall that all the ML models we consider consist of trees (a single one for a decision tree). Each tree contains nodes representing conditions “<id_feature> <operator> <threshold> ?” (such as “$x_4 \ge 0.5$ ?”). Internally, the Explainer treats these conditions as Boolean variables: each Boolean variable represents one condition of the model, and the binary representation of an instance is the set of Boolean variables obtained by evaluating these conditions on the instance. The binary representation can be found in the binary_representation attribute of the Explainer object.

The function to_features converts a binary representation (or an explanation) into a list of conditions “<id_feature> <operator> <threshold>” representing the features used. With details=True, it returns a dictionary containing the details of each condition instead.
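To make the idea of a binary representation concrete, here is a minimal sketch that evaluates a list of conditions on an instance and produces signed integers. It is not PyXAI's implementation. The Condition type, the binarize function, the numbering of the variables and the rounded thresholds are all assumptions made for illustration. We also assume that variable i stands for "feature > threshold", so that a negative literal reads "feature <= threshold".

```python
from typing import NamedTuple


class Condition(NamedTuple):
    """Hypothetical container for one node condition: feature > threshold."""
    feature: str
    threshold: float


def binarize(instance, conditions):
    """Return one signed literal per condition: +i if it holds, -i otherwise."""
    literals = []
    for var_id, cond in enumerate(conditions, start=1):
        holds = instance[cond.feature] > cond.threshold
        literals.append(var_id if holds else -var_id)
    return tuple(literals)


# Illustrative conditions with rounded thresholds.
conditions = [
    Condition("Petal.Length", 2.45),
    Condition("Petal.Width", 1.55),
    Condition("Sepal.Length", 5.95),
]

setosa = {"Sepal.Length": 5.1, "Sepal.Width": 3.5, "Petal.Length": 1.4, "Petal.Width": 0.2}
other = {"Sepal.Length": 6.3, "Sepal.Width": 2.9, "Petal.Length": 4.9, "Petal.Width": 1.5}

setosa_literals = binarize(setosa, conditions)
other_literals = binarize(other, conditions)
print(setosa_literals)
print(other_literals)
```

In this sketch, an instance that falsifies every condition gets only negative literals, and changing one feature value flips the sign of the literals for the conditions it crosses.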

The result of to_features does not depend on the instance given to the Explainer through the initialize and set_instance methods. It depends only on the binary representation passed as a parameter.
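The following sketch illustrates why such a conversion needs only the literals and the model's conditions, never the instance. It is a simplified illustration and not the code of to_features. The VARIABLES table, the function literals_to_conditions and the rounded thresholds are hypothetical, and we assume that a positive literal means "feature > threshold" and a negative one "feature <= threshold".

```python
# Hypothetical table of the model's variables: id -> (feature name, threshold).
VARIABLES = {
    1: ("Petal.Length", 2.45),
    2: ("Petal.Width", 1.55),
    3: ("Sepal.Length", 5.95),
}


def literals_to_conditions(literals, variables=VARIABLES):
    """Turn signed literals into readable conditions, using only the table."""
    conditions = []
    for literal in literals:
        feature, threshold = variables[abs(literal)]
        operator = ">" if literal > 0 else "<="
        conditions.append(f"{feature} {operator} {threshold}")
    return conditions


decoded = literals_to_conditions([-1, 3])
print(decoded)  # ['Petal.Length <= 2.45', 'Sepal.Length > 5.95']
```

No instance appears anywhere in the function: any list of literals over the model's variables can be decoded.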

Example

We present below an example based on the iris dataset and a Decision Tree as ML model. You should take a look at the Generating Model page if you need more information about the Learning module.

from pyxai import Learning, Explaining

# Train a decision tree on the iris dataset using a hold-out split.
learner = Learning.Scikitlearn("../../dataset/iris.csv", problem_type=Learning.CLASSIFICATION)
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.DT)

# Select one correctly classified instance and create the Explainer for it.
instance, prediction = learner.get_instances(model, n=1, is_correct=True)
explainer = Explaining.initialize(model, instance)

print("instance:", instance)
print("binary representation:", explainer.binary_representation)
print("target_prediction:", explainer.target_prediction)

# Convert the binary representation into conditions, with and without simplification.
print("to_features:", explainer.to_features(explainer.binary_representation))
print("to_features (keep redundant):", explainer.to_features(explainer.binary_representation, eliminate_redundant_features=False))

# to_features accepts any list of literals, here the single literal -1.
print("to_features with details:", explainer.to_features([-1], details=True))

Output:

--------------   Information   ---------------
Problem type: classification
Instances type: tabular
Labels type: classes

Dataset path: ../../dataset/iris.csv
nFeatures (nAttributes, with the labels): 4
nInstances (nObservations): 150
nLabels: 3
---------------   Model creation, fitting and evaluation  ---------------
Splitting method: hold-out
Problem type: classification
Models type: decision-tree
model_parameters: {}
---------   Evaluation Information   ---------
For the evaluation number 0:
Metrics:
   micro_averaging_accuracy: 96.49122807017544
   micro_averaging_precision: 94.73684210526315
   micro_averaging_recall: 94.73684210526315
   macro_averaging_accuracy: 96.49122807017542
   macro_averaging_precision: 93.73219373219372
   macro_averaging_recall: 93.73219373219372
   true_positives: {'Iris-setosa': 16, 'Iris-versicolor': 8, 'Iris-virginica': 12}
   true_negatives: {'Iris-setosa': 22, 'Iris-versicolor': 28, 'Iris-virginica': 24}
   false_positives: {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 1}
   false_negatives: {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 1}
   accuracy: 94.73684210526315
   sklearn_confusion_matrix: [[16, 0, 0], [0, 8, 1], [0, 1, 12]]
Number of Training instances: 112
Number of Testing instances: 38

---------------   Explainer   ----------------
For the split number 0:
**Decision Tree Model**
nFeatures: 4
nNodes: 8
nVariables: 8

---------------   Instances   ----------------
Number of instances selected: 1
----------------------------------------------
instance: Sepal.Length    5.1
Sepal.Width     3.5
Petal.Length    1.4
Petal.Width     0.2
Name: 0, dtype: float64
binary representation: (-1, -2, -3, -4, -5, -6, -7, -8)
target_prediction: Iris-setosa
to_features: ['Sepal.Length <= 5.950000047683716', 'Petal.Length <= 2.449999988079071', 'Petal.Width <= 1.550000011920929']
to_features (keep redundant): ['Sepal.Length <= 6.599999904632568', 'Sepal.Length <= 5.950000047683716', 'Petal.Length <= 2.449999988079071', 'Petal.Length <= 4.75', 'Petal.Length <= 4.950000047683716', 'Petal.Length <= 4.8500001430511475', 'Petal.Width <= 1.699999988079071', 'Petal.Width <= 1.550000011920929']
to_features with details: OrderedDict({'Petal.Length': [{'id': 3, 'name': 'Petal.Length', 'operator_sign_considered': <OperatorCondition.LE: 'LE'>, 'threshold': np.float64(2.449999988079071), 'weight': None, 'string': 'Petal.Length <= 2.449999988079071'}]})

We notice that the binary representation contains more variables (8) than the dataset has features (4), because a feature can appear in several nodes of the decision tree, and each node contributes one condition. By default, the function to_features eliminates redundant conditions and keeps only the tightest bound for each feature. Passing eliminate_redundant_features=False returns all conditions without simplification.
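The simplification described above can be sketched as follows. This is not PyXAI's implementation, only one way to keep the tightest bound per feature. The function eliminate_redundant is hypothetical, it handles only the operators "<=" and ">", and the thresholds are rounded versions of those printed in the output above.

```python
def eliminate_redundant(conditions):
    """Keep the tightest threshold for each (feature, operator) pair."""
    tightest = {}
    for feature, operator, threshold in conditions:
        key = (feature, operator)
        if key not in tightest:
            tightest[key] = threshold
        elif operator == "<=":
            # For an upper bound, the smallest threshold is the tightest.
            tightest[key] = min(tightest[key], threshold)
        else:
            # For a lower bound (">"), the largest threshold is the tightest.
            tightest[key] = max(tightest[key], threshold)
    return [(feature, op, value) for (feature, op), value in tightest.items()]


all_conditions = [
    ("Sepal.Length", "<=", 6.6), ("Sepal.Length", "<=", 5.95),
    ("Petal.Length", "<=", 2.45), ("Petal.Length", "<=", 4.75),
    ("Petal.Length", "<=", 4.95), ("Petal.Length", "<=", 4.85),
    ("Petal.Width", "<=", 1.7), ("Petal.Width", "<=", 1.55),
]

simplified = eliminate_redundant(all_conditions)
for feature, op, value in simplified:
    print(feature, op, value)
```

Applied to the eight conditions of the example, the sketch keeps three of them, one per feature that appears in the tree, which matches the default result of to_features shown above.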