PyXAI

Direct Reason

Let $BT$ be a boosted tree composed of regression trees $\{T_1, \ldots, T_n\}$ and let $x$ be an instance. The direct reason for $x$ is the term of the binary representation formed by the conjunction of the terms corresponding to the root-to-leaf paths of all $T_i$ that are compatible with $x$. Because it is so simple, it is one of the easiest abductive explanations for $x$ to compute, but it can be highly redundant. More information about the direct reason can be found in the article Computing Abductive Explanations for Boosted Trees.
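
As a rough illustration of this definition, the following sketch collects the literals met along the path followed by an instance in each tree and takes their union. The nested-tuple tree encoding and the helper names (`path_conditions`, `direct_reason`) are assumptions made for this example only; they are not part of PyXAI.

```python
# Hypothetical encoding (not PyXAI's): a leaf is a float weight; an internal
# node is (feature_index, threshold, left, right) and tests
# x[feature_index] > threshold, going right when the test holds, left otherwise.

def path_conditions(tree, x):
    """Return (conditions, leaf_weight) for the path followed by x in tree."""
    conditions = []
    node = tree
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        holds = x[feature] > threshold
        # Record the literal satisfied by x: (feature, threshold, truth value).
        conditions.append((feature, threshold, holds))
        node = right if holds else left
    return conditions, node


def direct_reason(trees, x):
    """Union of the literals met on the path of x in every tree."""
    reason = set()
    for tree in trees:
        conditions, _ = path_conditions(tree, x)
        reason.update(conditions)
    return sorted(reason)


# Two toy trees over features 0 and 1 (made-up thresholds and weights).
tree_a = (0, 2, -0.2, (1, 1, -0.1, 0.3))
tree_b = (1, 1, -0.4, (0, 5, 0.2, 0.6))
x = (4, 3)

reason = direct_reason([tree_a, tree_b], x)
print(reason)
```

In this toy case the literal on feature 1 is shared by both paths and appears only once in the union, while feature 0 contributes two separate literals (`x[0] > 2` and `not x[0] > 5`). A direct reason simply keeps every literal encountered, which is why it can be redundant.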

The basic methods (initialize, set_instance, to_features, is_reason, …) of the Explaining module used in the next examples are described in the Explainer Principles page.

Example from Hand-Crafted Trees

For this example, we take the binary classification model from the Building Models page. This figure represents a boosted tree $BT$ using $4$ features ($A_1$, $A_2$, $A_3$ and $A_4$), where $A_1$ and $A_2$ are numerical, $A_3$ is categorical and $A_4$ is Boolean. The direct reason for the instance $x$ = ($A_1=4$, $A_2 = 3$, $A_3 = 1$, $A_4 = 1$) is in red. This reason contains all features of the instance.

[Figure: BTdirect, the boosted tree $BT$ with the direct reason for $x$ shown in red]

We have $w(T_1, x)=0.3$, $w(T_2, x)=0.5$ and $w(T_3, x)=0.1$, so $W(F, x) = 0.3 + 0.5 + 0.1 = 0.9$. Since this is a binary classification task and $W(F, x) > 0$, $x$ is classified as a positive instance ($BT(x) = 1$).

In the code below, the features $A_3$ and $A_4$ are treated as numerical. Categorical and Boolean features will be implemented in future versions of PyXAI.

We now show how to get direct reasons using PyXAI:

from pyxai import Builder, Explaining

node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.3)
node1_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=-0.3, right=node1_1)
node1_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=0.4, right=node1_2)
node1_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=node1_3)
tree1 = Builder.DecisionTree(4, node1_4)

node2_1 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.4, right=0.3)
node2_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=node2_1)
node2_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node2_2, right=0.5)
tree2 = Builder.DecisionTree(4, node2_3)

node3_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=0.2, right=0.3)
node3_2_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.2)
node3_2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.1, right=node3_1)
node3_2_3 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=0.1)
node3_3_1 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node3_2_1, right=node3_2_2)
node3_3_2 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=-0.4, right=node3_2_3)
node3_4 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=node3_3_1, right=node3_3_2)
tree3 = Builder.DecisionTree(4, node3_4)

BT = Builder.BoostedTrees([tree1, tree2, tree3], n_classes=2)
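
The weights quoted above can be checked without PyXAI by mirroring the three trees in plain Python. This is only a sketch: the tuple encoding and the `leaf_weight` helper are hypothetical. It assumes that the right child is taken when a node's test holds, which is consistent with the weights $0.3$, $0.5$ and $0.1$ given earlier.

```python
import operator

# Plain-Python mirror of the three trees (hypothetical encoding, not PyXAI's):
# a leaf is a float; a node is (feature_id, test, threshold, left, right),
# and the right child is taken when the test holds (assumption).
GT, EQ = operator.gt, operator.eq

n1_1 = (1, GT, 2, -0.2, 0.3)
n1_2 = (3, EQ, 1, -0.3, n1_1)
n1_3 = (2, GT, 1, 0.4, n1_2)
tree1 = (4, EQ, 1, -0.5, n1_3)

n2_1 = (4, EQ, 1, -0.4, 0.3)
n2_2 = (1, GT, 2, -0.2, n2_1)
tree2 = (2, GT, 1, n2_2, 0.5)

n3_1 = (1, GT, 2, 0.2, 0.3)
n3_2_1 = (1, GT, 2, -0.2, 0.2)
n3_2_2 = (4, EQ, 1, -0.1, n3_1)
n3_2_3 = (4, EQ, 1, -0.5, 0.1)
n3_3_1 = (2, GT, 1, n3_2_1, n3_2_2)
n3_3_2 = (2, GT, 1, -0.4, n3_2_3)
tree3 = (3, EQ, 1, n3_3_1, n3_3_2)


def leaf_weight(tree, x):
    """Follow x from the root to a leaf and return the leaf weight."""
    node = tree
    while isinstance(node, tuple):
        feature_id, test, threshold, left, right = node
        # Feature ids start at 1, as in the Builder calls above.
        node = right if test(x[feature_id - 1], threshold) else left
    return node


x = (4, 3, 1, 1)
weights = [leaf_weight(t, x) for t in (tree1, tree2, tree3)]
total = sum(weights)
prediction = 1 if total > 0 else 0  # binary case: positive iff the sum is > 0
print(weights, round(total, 6), prediction)
```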

We compute the direct reason for this instance:

explainer = Explaining.initialize(BT)
explainer.set_instance((4,3,1,1))
direct = explainer.direct_reason()
print("instance: (4,3,1,1)")
print("binary_representation:", explainer.binary_representation)
print("target_prediction:", explainer.target_prediction)
print("direct:", direct)
print("to_features:", explainer.to_features(direct))

instance: (4,3,1,1)
binary_representation: (1, 2, 3, 4)
target_prediction: 1
direct: (1, 2, 3, 4)
to_features: ['f1 > 2', 'f2 > 1', 'f3 == 1', 'f4 == 1']

As you can see, in this case, the direct reason coincides with the full instance.

Example from a Real Dataset

For this example, we take the compas.csv dataset. We create one model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance.

from pyxai import Learning, Explaining

learner = Learning.Xgboost("../../../dataset/compas.csv", problem_type='classification')
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.BT)
instance, prediction = learner.get_instances(model, n=1, is_correct=True)

--------------   Information   ---------------
Problem type: classification
Instances type: tabular
Labels type: classes

Dataset path: ../../../dataset/compas.csv
nFeatures (nAttributes, with the labels): 11
nInstances (nObservations): 6172
nLabels: 2
---------------   Model creation, fitting and evaluation  ---------------
Splitting method: hold-out
Problem type: classification
Models type: boosted-tree
model_parameters: {}
---------   Evaluation Information   ---------
For the evaluation number 0:
Metrics:
   sklearn_confusion_matrix: [[653, 206], [295, 389]]
   precision: 65.3781512605042
   recall: 56.87134502923976
   f1_score: 60.82877247849882
   specificity: 76.0186263096624
   true_positive: 389
   true_negative: 653
   false_positive: 206
   false_negative: 295
   accuracy: 67.53078418664938
Number of Training instances: 4629
Number of Testing instances: 1543

---------------   Explainer   ----------------
For the split number 0:
**Boosted Tree model**
NClasses: 2
nTrees: 100
nVariables: 38

---------------   Instances   ----------------
Number of instances selected: 1
----------------------------------------------
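
The metrics reported above follow directly from the confusion matrix. The short check below recomputes them, assuming the usual scikit-learn layout `[[TN, FP], [FN, TP]]` for labels 0 and 1, which matches the true/false positive and negative counts printed in the log.

```python
# Confusion matrix as printed above: rows are true classes, columns are
# predicted classes, i.e. [[TN, FP], [FN, TP]] for binary labels 0/1.
(tn, fp), (fn, tp) = [[653, 206], [295, 389]]

precision = 100 * tp / (tp + fp)        # share of predicted positives that are correct
recall = 100 * tp / (tp + fn)           # share of actual positives that are found
specificity = 100 * tn / (tn + fp)      # share of actual negatives that are found
accuracy = 100 * (tp + tn) / (tp + tn + fp + fn)
f1_score = 2 * precision * recall / (precision + recall)

n_test = tp + tn + fp + fn              # size of the test split

for name, value in [("precision", precision), ("recall", recall),
                    ("f1_score", f1_score), ("specificity", specificity),
                    ("accuracy", accuracy)]:
    print(f"{name}: {value:.4f}")
print("test instances:", n_test)
```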

Finally, we display the direct reason for this instance:

explainer = Explaining.initialize(model, instance)
print("instance:", instance)
print("prediction:", prediction)
print()
direct_reason = explainer.direct_reason()
print("len binary representation:", len(explainer.binary_representation))
print("len direct:", len(direct_reason))
print("is_reason:", explainer.is_reason(direct_reason))
print("to_features:", explainer.to_features(direct_reason))

instance: Misdemeanor             0
Number_of_Priors        0
score_factor            0
Age_Above_FourtyFive    1
Age_Below_TwentyFive    0
African_American        0
Asian                   0
Hispanic                0
Native_American         0
Other                   1
Female                  0
Name: 0, dtype: int64
prediction: 0

len binary representation: 38
len direct: 36
is_reason: True
to_features: ['Misdemeanor < 1.0', 'Number_of_Priors < 1.0', 'score_factor < 1.0', 'Age_Above_FourtyFive >= 1.0', 'Age_Below_TwentyFive < 1.0', 'African_American < 1.0', 'Asian < 1.0', 'Hispanic < 1.0', 'Native_American < 1.0', 'Other >= 1.0', 'Female < 1.0']

Note that this direct reason contains 36 of the 38 binary variables of the binary representation. It explains why the model predicts $0$ for this instance. However, it is probably not the most compact reason for this instance; we invite you to look at the other types of reasons presented on the Explanations Computation page.
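
The output above shows 36 binary literals rendered as only 11 feature conditions, so several literals must concern the same feature. One plausible way to picture this is to keep only the tightest bound per feature. This is not necessarily what `to_features` does internally. The sketch below illustrates the idea on made-up literals: the thresholds and the `tightest_conditions` helper are hypothetical and are not taken from the compas model or from PyXAI.

```python
# Hypothetical literals: (feature_name, operator, threshold). The values are
# invented for illustration and do not come from the compas model.
literals = [
    ("Number_of_Priors", "<", 1.0),
    ("Number_of_Priors", "<", 3.0),
    ("Number_of_Priors", "<", 6.0),
    ("Age_Above_FourtyFive", ">=", 1.0),
    ("Female", "<", 1.0),
]


def tightest_conditions(literals):
    """Keep, per (feature, operator), only the most restrictive threshold."""
    best = {}
    for feature, op, threshold in literals:
        key = (feature, op)
        if key not in best:
            best[key] = threshold
        elif op == "<":
            best[key] = min(best[key], threshold)  # smaller upper bound is tighter
        else:  # ">="
            best[key] = max(best[key], threshold)  # larger lower bound is tighter
    return [f"{feature} {op} {threshold}" for (feature, op), threshold in best.items()]


conditions = tightest_conditions(literals)
print(conditions)
```

Here five literals collapse to three conditions, because `Number_of_Priors < 1.0` already implies the two weaker bounds on the same feature.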