
Direct Reason

Let $RF = \{T_1, \ldots, T_n\}$ be a random forest and $x$ be an instance. The direct reason for $x$ is the term of the binary representation obtained as the conjunction of the terms corresponding to the root-to-leaf paths, one in each $T_i$, that are compatible with $x$. It is an abductive explanation that is simple and easy to compute, but it can be redundant. More information about the direct reason can be found in the paper Trading Complexity for Sparsity in Random Forest Explanations.
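
To make the definition concrete, here is a minimal pure-Python sketch that does not use PyXAI. It assumes binary features numbered from 1, encodes a leaf as an integer class and an internal node as a hypothetical tuple `(feature, left, right)` whose right child is followed when the feature equals 1, and writes a literal as `+i` when $x_i = 1$ and `-i` when $x_i = 0$. The toy forest and the names `path_literals` and `direct_reason` are illustrative assumptions, not part of the library.

```python
def path_literals(tree, instance):
    """Return the signed literals on the root-to-leaf path followed by `instance`."""
    literals = []
    node = tree
    while not isinstance(node, int):  # an int is a leaf (class label)
        feature, left, right = node
        if instance[feature - 1] == 1:
            literals.append(feature)   # x_feature = 1: go right
            node = right
        else:
            literals.append(-feature)  # x_feature = 0: go left
            node = left
    return literals


def direct_reason(forest, instance):
    """Conjoin the path literals of every tree, dropping duplicates."""
    literals = set()
    for tree in forest:
        literals.update(path_literals(tree, instance))
    return tuple(sorted(literals, key=abs))


# Hypothetical forest over 3 binary features.
tree_a = (1, 0, (2, 0, 1))  # tests x1, then x2 when x1 = 1
tree_b = (2, (3, 0, 1), 1)  # tests x2, then x3 when x2 = 0
tree_c = (1, 0, 1)          # tests x1 only
toy_forest = [tree_a, tree_b, tree_c]

print(direct_reason(toy_forest, (1, 1, 0)))
print(direct_reason(toy_forest, (0, 0, 0)))
```

For the instance `(1, 1, 0)`, none of the followed paths tests $x_3$, so the term contains only the literals on $x_1$ and $x_2$: the direct reason can be shorter than the instance, even though nothing guarantees that every literal it keeps is necessary.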

The basic methods (initialize, set_instance, to_features, is_reason, …) of the Explainer module used in the next examples are described in the Explainer Principles page.

Example from Hand-Crafted Trees

For this example, we take the random forest from the Building Models page.

[Figure: RFdirect]

This figure represents a Random Forest with 3 Decision Trees using $4$ binary features ($x_1$, $x_2$, $x_3$ and $x_4$). The direct reason for the instance $(1,1,1,1)$ is in red and the one for $(0,0,0,0)$ is in blue. Now, we show how to get them with PyXAI. We start by building the random forest:

from pyxai import Builder, Explaining

nodeT1_1 = Builder.DecisionNode(1, left=0, right=1)
nodeT1_3 = Builder.DecisionNode(3, left=0, right=nodeT1_1)
nodeT1_2 = Builder.DecisionNode(2, left=1, right=nodeT1_3)
nodeT1_4 = Builder.DecisionNode(4, left=0, right=nodeT1_2)
tree1 = Builder.DecisionTree(4, nodeT1_4, force_features_equal_to_binaries=True)

nodeT2_4 = Builder.DecisionNode(4, left=0, right=1)
nodeT2_1 = Builder.DecisionNode(1, left=0, right=nodeT2_4)
nodeT2_2 = Builder.DecisionNode(2, left=nodeT2_1, right=1)
tree2 = Builder.DecisionTree(4, nodeT2_2, force_features_equal_to_binaries=True)  # 4 features, but only 3 are used

nodeT3_1_1 = Builder.DecisionNode(1, left=0, right=1)
nodeT3_1_2 = Builder.DecisionNode(1, left=0, right=1)
nodeT3_4_1 = Builder.DecisionNode(4, left=0, right=nodeT3_1_1)
nodeT3_4_2 = Builder.DecisionNode(4, left=0, right=1)
nodeT3_2_1 = Builder.DecisionNode(2, left=nodeT3_1_2, right=nodeT3_4_1)
nodeT3_2_2 = Builder.DecisionNode(2, left=0, right=nodeT3_4_2)
nodeT3_3_1 = Builder.DecisionNode(3, left=nodeT3_2_1, right=nodeT3_2_2)
tree3 = Builder.DecisionTree(4, nodeT3_3_1, force_features_equal_to_binaries=True)

forest = Builder.RandomForest([tree1, tree2, tree3], n_classes=2)

We compute the direct reasons for these two instances:

explainer = Explaining.initialize(forest)
explainer.set_instance((1,1,1,1))
direct = explainer.direct_reason()
print("instance: (1,1,1,1)")
print("binary representation:", explainer.binary_representation)
print("target_prediction:", explainer.target_prediction)
print("direct:", direct)
print("to_features:", explainer.to_features(direct))
print("------------------------------------------------")
explainer.set_instance((0,0,0,0))
direct = explainer.direct_reason()
print("instance: (0,0,0,0)")
print("binary representation:", explainer.binary_representation)
print("target_prediction:", explainer.target_prediction)
print("direct:", direct)
print("to_features:", explainer.to_features(direct))

instance: (1,1,1,1)
binary representation: (1, 2, 3, 4)
target_prediction: 1
direct: (1, 2, 3, 4)
to_features: ['f1 >= 0.5', 'f2 >= 0.5', 'f3 >= 0.5', 'f4 >= 0.5']
------------------------------------------------
instance: (0,0,0,0)
binary representation: (-1, -2, -3, -4)
target_prediction: 0
direct: (-1, -2, -3, -4)
to_features: ['f1 < 0.5', 'f2 < 0.5', 'f3 < 0.5', 'f4 < 0.5']

As you can see, in this case, the direct reason corresponds to the full instance.
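
Covering the full instance does not mean that every literal is needed. The following pure-Python sketch (not PyXAI code) re-encodes the three trees above as hypothetical `(feature, left, right)` tuples, assumes that the forest predicts by majority vote, and checks by brute force over all $2^4$ instances whether a term still forces the prediction. The helper names `classify`, `predict` and `is_abductive` are assumptions made for illustration.

```python
from itertools import product

# Leaves are ints; an internal node is (feature, left, right),
# and the right child is followed when the feature equals 1.
T1 = (4, 0, (2, 1, (3, 0, (1, 0, 1))))
T2 = (2, (1, 0, (4, 0, 1)), 1)
T3 = (3,
      (2, (1, 0, 1), (4, 0, (1, 0, 1))),
      (2, 0, (4, 0, 1)))
FOREST = [T1, T2, T3]


def classify(tree, x):
    """Follow the path of `x` in one tree and return the leaf label."""
    node = tree
    while not isinstance(node, int):
        feature, left, right = node
        node = right if x[feature - 1] == 1 else left
    return node


def predict(forest, x):
    """Majority vote over the trees (binary classification)."""
    votes = sum(classify(tree, x) for tree in forest)
    return 1 if 2 * votes > len(forest) else 0


def is_abductive(forest, term, target, n_features=4):
    """True if every instance satisfying `term` is classified as `target`."""
    for x in product((0, 1), repeat=n_features):
        satisfies = all(x[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in term)
        if satisfies and predict(forest, x) != target:
            return False
    return True


print(is_abductive(FOREST, (1, 2, 3, 4), 1))  # the direct reason for (1,1,1,1)
print(is_abductive(FOREST, (2, 3, 4), 1))     # same term without the literal on x1
print(is_abductive(FOREST, (2, 4), 1))        # additionally without the literal on x3
```

Under these assumptions, the term `(2, 3, 4)` is still an abductive explanation for the prediction $1$, whereas `(2, 4)` is not. The literal on $x_1$ in the direct reason for $(1,1,1,1)$ is therefore redundant, which illustrates why the direct reason is not necessarily the most compact explanation.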

Example from a Real Dataset

For this example, we take the compas dataset. We create one model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance.

from pyxai import Learning, Explaining

learner = Learning.Scikitlearn("../../../dataset/compas.csv", problem_type='classification', instances_type='tabular', labels_type='classes')
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.RF)
instance, prediction = learner.get_instances(model, n=1, is_correct=True)

--------------   Information   ---------------
Problem type: classification
Instances type: tabular
Labels type: classes

Dataset path: ../../../dataset/compas.csv
nFeatures (nAttributes, with the labels): 11
nInstances (nObservations): 6172
nLabels: 2
---------------   Model creation, fitting and evaluation  ---------------
Splitting method: hold-out
Problem type: classification
Models type: random-forest
model_parameters: {}
---------   Evaluation Information   ---------
For the evaluation number 0:
Metrics:
   sklearn_confusion_matrix: [[604, 197], [313, 429]]
   precision: 68.53035143769968
   recall: 57.8167115902965
   f1_score: 62.71929824561404
   specificity: 75.40574282147315
   true_positive: 429
   true_negative: 604
   false_positive: 197
   false_negative: 313
   accuracy: 66.94750486066104
Number of Training instances: 4629
Number of Testing instances: 1543

---------------   Explainer   ----------------
For the split number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 68

---------------   Instances   ----------------
Number of instances selected: 1
----------------------------------------------

We display the direct reason for this instance:

explainer = Explaining.initialize(model, instance)
print("instance:", instance)
print("prediction:", prediction)
print()
direct_reason = explainer.direct_reason()
print("len binary representation:", len(explainer.binary_representation))
print("len direct:", len(direct_reason))
print("is_reason:", explainer.is_reason(direct_reason))
print("to_features:", explainer.to_features(direct_reason))

instance: Number_of_Priors           0
score_factor               0
Age_Above_FourtyFive       1
Age_Below_TwentyFive       0
Origin_African_American    0
Origin_Asian               0
Origin_Hispanic            0
Origin_Native_American     0
Origin_Other               1
Female                     0
Misdemeanor                0
Name: 0, dtype: int64
prediction: 0

len binary representation: 68
len direct: 23
is_reason: True
to_features: ['Number_of_Priors <= 0.5', 'score_factor <= 0.5', 'Age_Above_FourtyFive > 0.5', 'Age_Below_TwentyFive <= 0.5', 'Origin_African_American <= 0.5', 'Origin_Asian <= 0.5', 'Origin_Hispanic <= 0.5', 'Origin_Native_American <= 0.5', 'Origin_Other > 0.5', 'Female <= 0.5', 'Misdemeanor <= 0.5']

Note that this direct reason contains 23 binary variables out of the 68 variables in the binary representation of the instance. It explains why the model predicts $0$ for this instance, but it is probably not the most compact abductive explanation for it. We invite you to look at the other types of reasons presented on the Explanations Computation page.