PyXAI

Concepts

This section deals with the concepts of the Explainer object of PyXAI. First, we show you how to use it, then we explain in detail the notion of binary variables, and to finish, we give an example.

Main Methods

First of all, in order to explain instances from a given model, you need to create an Explainer Object. This is done using the function Explainer.initialize.

Explainer.initialize(model, instance=None, features_type=None):
Depending on the model given as the first argument, this method creates an `ExplainerDT`, an `ExplainerRF`, an `ExplainerBT`, or an `ExplainerRegressionBT` object. This object can give explanations about the instance given as the second parameter. This parameter is optional because you can set the instance later using the `set_instance` function.
model `DecisionTree` `RandomForest` `BoostedTree`: The model for which explanations will be calculated.
instance `Numpy Array` of `Float`: The instance to be explained. Default value is `None`.
features_type `String` `Dict` `None`: Either a dictionary indicating the type of features or the path to a `.types` file containing this information. Providing it activates domain theories. Default value is `None`. More details are given on the Theories page.

Once the Explainer is created you can explain several instances using it. Each time you want to explain a new instance, you need to call the function set_instance (it is not necessary to create a new Explainer object).

<Explainer Object>.set_instance(instance):
Sets a new instance to be explained.
instance `Numpy Array` of `Float`: The instance to be explained.

The prediction made by the ML model is given by the target_prediction variable of the Explainer Object. Next, we need to consider the binary representation of an instance.

Binary representation

First, let us recall that all the ML models we consider consist of trees (only one for a decision tree). Each tree contains nodes representing conditions “<id_feature> <operator> <threshold> ?” (such as “$x_4 \ge 0.5$ ?”). Internally, the Explainer works with these conditions, which are treated as Boolean variables: each Boolean variable represents one condition of the model. The binary representation of an instance is the set of signed Boolean variables indicating which of these conditions the instance satisfies. It can be found in the binary_representation variable of the Explainer Object. The function to_features converts a binary representation (or an explanation) into a tuple of conditions “<id_feature> <operator> <threshold>” representing the features used.
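To make this encoding concrete, here is a minimal sketch in plain Python. It is not PyXAI's implementation: the `Condition` class, the `binarize` helper and the three conditions are hypothetical, and the sketch assumes that a positive literal `i` means condition `i` holds for the instance while `-i` means it does not.

```python
import operator
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical stand-in for a tree node condition "<id_feature> <operator> <threshold>".
@dataclass(frozen=True)
class Condition:
    feature: int       # 0-based index of the feature in the instance
    symbol: str        # comparison operator, as text
    threshold: float

OPS = {">=": operator.ge, ">": operator.gt}

def binarize(instance: Sequence[float], conditions: Sequence[Condition]) -> Tuple[int, ...]:
    """Return one signed literal per condition (assumed convention: +i if it holds, -i otherwise)."""
    literals = []
    for var_id, cond in enumerate(conditions, start=1):
        holds = OPS[cond.symbol](instance[cond.feature], cond.threshold)
        literals.append(var_id if holds else -var_id)
    return tuple(literals)

# Illustrative conditions only; a real model defines its own.
conditions = [
    Condition(0, ">=", 5.0),  # variable 1
    Condition(0, ">=", 3.0),  # variable 2
    Condition(1, ">", 0.5),   # variable 3
]
binary_representation = binarize([6.0, 0.2], conditions)
print(binary_representation)
```

Note that the number of Boolean variables depends on the number of distinct conditions in the model, not on the number of features of the instance.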

<Explainer Object>.to_features(self, binary_representation, *, eliminate_redundant_features=True, details=False, contrastive=False, without_intervals=False):
By default (the parameter `details` is set to `False`), returns a `Tuple` of `String` where each `String` represents a condition “<id_feature> <operator> <threshold>” associated with the binary representation given as first parameter. If you need more information, set the parameter `details` to `True`: the method then returns a `Tuple` of `Dict` where each dictionary provides more information on the condition. This method also eliminates redundant conditions by default. For example, if we have “feature_a > 3” and “feature_a > 2”, we keep only the Boolean variable corresponding to “feature_a > 3”. Therefore, if you want to get all conditions, you have to set the parameter `eliminate_redundant_features` to `False`.
binary_representation `List` `Tuple`: A set of (signed) binary variables.
eliminate_redundant_features `Boolean`: `True` or `False` depending on whether you want to eliminate redundant conditions or not. Default value is `True`.
details `Boolean`: `True` or `False` depending on whether you want details or not. Default value is `False`.
contrastive `Boolean`: `True` or `False` depending on whether you want to get a contrastive explanation or not. When this parameter is set to `True`, the elimination of redundant features must be reversed. Default value is `False`.
without_intervals `Boolean`: `True` or `False` depending on whether you want to consider a compact representation with intervals or not. Default value is `False`.

The details provided with the details parameter set to True in the to_features function are represented by the keys of the returned dictionary:

["id"]: The id of the feature.
["name"]: The name of the feature (if labels are known, otherwise they are named f1, f2 and so on).
["operator"]: The operator associated with the condition.
["threshold"]: The threshold of the condition.
["sign"]: The sign of the Boolean variable in the binary representation: True if the condition is satisfied else False.
["weight"]: The weight of the condition, used only with user preferences.

Explanations computed using our explainer module may contain redundant conditions. Let us take an example with a feature $f_1$ and two Boolean variables $x_1$ and $x_2$ associated with the conditions $(f_1 \ge 5)$ and $(f_1 \ge 3)$ respectively. If $f_1=6$ in the instance, then $x_1$ and $x_2$ are both set to true. The explanation that is derived can involve both of them. By setting the eliminate_redundant_features parameter to True in the method to_features, we remove $(f_1 \ge 3)$, which is redundant because it is implied by $(f_1 \ge 5)$.
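The idea of redundancy elimination can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not the code used by to_features: conditions are hypothetical `(name, operator, threshold)` tuples that are all satisfied by the instance, and for each feature and operator only the tightest threshold is kept.

```python
from typing import Dict, List, Tuple

Condition = Tuple[str, str, float]  # (feature name, operator, threshold)

def eliminate_redundant(conditions: List[Condition]) -> List[Condition]:
    """Keep, for each (feature, operator) pair, only the tightest condition.

    Assumes every condition holds for the instance, so a lower bound with a
    larger threshold (or an upper bound with a smaller one) implies the others.
    """
    best: Dict[Tuple[str, str], float] = {}
    for name, op, threshold in conditions:
        key = (name, op)
        if key not in best:
            best[key] = threshold
        elif op in (">", ">="):
            best[key] = max(best[key], threshold)  # larger lower bound is tighter
        elif op in ("<", "<="):
            best[key] = min(best[key], threshold)  # smaller upper bound is tighter
        else:
            raise ValueError(f"unsupported operator: {op}")
    # Dictionaries preserve insertion order, so features keep their first-seen order.
    return [(name, op, threshold) for (name, op), threshold in best.items()]

# The example from the text: (f1 >= 5) makes (f1 >= 3) redundant.
kept = eliminate_redundant([("f1", ">=", 5.0), ("f1", ">=", 3.0)])
print(kept)

# Conditions shaped like the iris example further down (thresholds rounded here).
iris_kept = eliminate_redundant([
    ("Sepal.Width", ">", 3.1),
    ("Petal.Length", "<=", 4.95),
    ("Petal.Width", "<=", 0.75),
    ("Petal.Width", "<=", 1.65),
    ("Petal.Width", "<=", 1.75),
])
print(iris_kept)
```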

When displaying a contrastive explanation, forgetting to set the parameter contrastive to True may result in an incorrect explanation.

The function to_features does not depend on the instance given to the explainer through the initialize and set_instance methods; it depends only on the binary representation passed as parameter.

Example

In the following, we present an example based on the iris dataset, with a Decision Tree as the ML model. You should take a look at the Generating Model page if you need more information about the Learning module.

from pyxai import Learning, Explainer, Tools
learner = Learning.Scikitlearn("../../dataset/iris.csv", learner_type=Learning.CLASSIFICATION)
model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.DT)

instance, prediction = learner.get_instances(model, n=1, correct=True, predictions=[0])
explainer = Explainer.initialize(model, instance)

print("instance:", instance)
print("binary representation:", explainer.binary_representation)
print("target_prediction:", explainer.target_prediction)

print("to_features:", explainer.to_features(explainer.binary_representation))
print("to_features (keep redundant):", explainer.to_features(explainer.binary_representation, eliminate_redundant_features=False))

print("to_features with details:", explainer.to_features([-1], details=True))

data:
     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width         Species
0             5.1          3.5           1.4          0.2     Iris-setosa
1             4.9          3.0           1.4          0.2     Iris-setosa
2             4.7          3.2           1.3          0.2     Iris-setosa
3             4.6          3.1           1.5          0.2     Iris-setosa
4             5.0          3.6           1.4          0.2     Iris-setosa
..            ...          ...           ...          ...             ...
145           6.7          3.0           5.2          2.3  Iris-virginica
146           6.3          2.5           5.0          1.9  Iris-virginica
147           6.5          3.0           5.2          2.0  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica
149           5.9          3.0           5.1          1.8  Iris-virginica

[150 rows x 5 columns]
--------------   Information   ---------------
Dataset name: ../../dataset/iris.csv
nFeatures (nAttributes, with the labels): 5
nInstances (nObservations): 150
nLabels: 3
---------------   Evaluation   ---------------
method: HoldOut
output: DT
learner_type: Classification
learner_options: {'max_depth': None, 'random_state': 0}
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics:
   accuracy: 97.77777777777777
nTraining instances: 105
nTest instances: 45

---------------   Explainer   ----------------
For the evaluation number 0:
**Decision Tree Model**
nFeatures: 4
nNodes: 6
nVariables: 5

---------------   Instances   ----------------
number of instances selected: 1
----------------------------------------------
instance: [5.1 3.5 1.4 0.2]
binary representation: (-1, -2, -3, 4, -5)
target_prediction: 0
to_features: ('Sepal.Width > 3.100000023841858', 'Petal.Length <= 4.950000047683716', 'Petal.Width <= 0.75')
to_features (keep redundant): ('Sepal.Width > 3.100000023841858', 'Petal.Length <= 4.950000047683716', 'Petal.Width <= 0.75', 'Petal.Width <= 1.6500000357627869', 'Petal.Width <= 1.75')
to_features with details: OrderedDict([('Petal.Width', [{'id': 4, 'name': 'Petal.Width', 'operator': <OperatorCondition.GT: 52>, 'sign': True, 'operator_sign_considered': <OperatorCondition.LE: 51>, 'threshold': 0.75, 'weight': None, 'theory': None, 'string': 'Petal.Width <= 0.75'}])])

We notice that the binary representation contains five variables, more than the four features of the instance, because the decision tree of the model involves five distinct conditions. Indeed, the feature Petal.Width appears 3 times whereas the feature Sepal.Length does not appear. We can see that, for this binary representation, two redundant conditions related to the Petal.Width feature can be eliminated.
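These observations can be checked with a short script that works directly on the strings printed above. It is a minimal sketch that does not call PyXAI: the condition strings and feature names are copied from the output, and the count of removable conditions assumes that repeated conditions on a feature all point in the same direction, as is the case here.

```python
from collections import Counter

# Conditions copied from the "to_features (keep redundant)" output above.
conditions = (
    "Sepal.Width > 3.100000023841858",
    "Petal.Length <= 4.950000047683716",
    "Petal.Width <= 0.75",
    "Petal.Width <= 1.6500000357627869",
    "Petal.Width <= 1.75",
)
# Feature names taken from the dataset header.
feature_names = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]

# The feature name is the first token of each condition string.
counts = Counter(condition.split()[0] for condition in conditions)
unused = [name for name in feature_names if counts[name] == 0]
# With same-direction conditions, all but one per feature can be dropped.
removable = sum(count - 1 for count in counts.values())

print("conditions per feature:", dict(counts))
print("features not used by the tree:", unused)
print("removable conditions:", removable)
```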