Direct Reason
Let $BT$ be a boosted tree composed of regression trees $\{T_1, \ldots, T_n\}$ and let $x$ be an instance. The direct reason for $x$ is the term of the binary representation formed by the conjunction of the terms corresponding to the root-to-leaf paths of all $T_i$ that are compatible with $x$. It is one of the easiest abductive explanations for $x$ to compute, but it can be highly redundant. More information about the direct reason can be found in the article Computing Abductive Explanations for Boosted Trees.
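The definition above can be illustrated without PyXAI. The following is a minimal sketch, not PyXAI's implementation: the tuple encoding of trees, the helper names `path_conditions` and `direct_reason`, and the two small trees are all hypothetical and chosen only for illustration. It assumes, as in the examples on this page, that the right child is followed when a node's condition holds.

```python
import operator

# Hypothetical encoding (not PyXAI's data structures):
# an internal node is (feature, op, threshold, left, right); a leaf is a float weight.
OPS = {">": operator.gt, "==": operator.eq}

def path_conditions(node, x):
    """Return the conditions met on the root-to-leaf path followed by x, and the leaf weight."""
    conditions = []
    while isinstance(node, tuple):
        feature, op, threshold, left, right = node
        holds = OPS[op](x[feature - 1], threshold)  # features are 1-indexed
        conditions.append((feature, op, threshold, holds))
        node = right if holds else left  # assumption: right child when the condition holds
    return conditions, node

def direct_reason(trees, x):
    """Conjunction (represented as a set) of the path conditions of x over all trees."""
    reason = set()
    for tree in trees:
        conditions, _ = path_conditions(tree, x)
        reason.update(conditions)
    return reason

# Two small made-up trees over two numerical features.
t1 = (1, ">", 2, -0.2, (2, ">", 1, -0.1, 0.3))
t2 = (2, ">", 1, -0.4, 0.5)

reason = direct_reason([t1, t2], (4, 3))
print(sorted(reason))
```

Each element of the resulting set records a tested condition and whether the instance satisfies it. The condition on feature 2 appears on the paths of both trees but is stored only once, which is why the direct reason is a conjunction of distinct literals rather than a list of paths.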
The basic methods (initialize, set_instance, to_features, is_reason, …) of the Explaining module used in the next examples are described in the Explainer Principles page.
Example from Hand-Crafted Trees
For this example, we use the binary classification example from the Building Models page. The figure represents a boosted tree $BT$ using $4$ features ($A_1$, $A_2$, $A_3$ and $A_4$), where $A_1$ and $A_2$ are numerical, $A_3$ is categorical and $A_4$ is Boolean. The direct reason for the instance $x$ = ($A_1=4$, $A_2 = 3$, $A_3 = 1$, $A_4 = 1$) is shown in red. This reason contains all features of the instance.

We have $w(T_1, x)=0.3$, $w(T_2, x)=0.5$ and $w(T_3, x)=0.1$, so $W(BT, x) = 0.3 + 0.5 + 0.1 = 0.9$. Since this is a binary classification task and $W(BT, x) > 0$, $x$ is classified as a positive instance ($BT(x) = 1$).
In the code of this example, the features $A_3$ and $A_4$ are treated as numerical. Categorical and Boolean features will be implemented in future versions of PyXAI.
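The leaf weights and their sum can be checked by hand or with a few lines of plain Python. The sketch below does not use PyXAI: it re-encodes the three trees of this example as nested tuples (a hypothetical encoding chosen for illustration, with the helper `leaf_weight` also hypothetical) and assumes that the right child is followed when a node's condition holds.

```python
import operator

OPS = {">": operator.gt, "==": operator.eq}

def leaf_weight(node, x):
    """Follow x down a tree encoded as (feature, op, threshold, left, right) tuples."""
    while isinstance(node, tuple):
        feature, op, threshold, left, right = node
        node = right if OPS[op](x[feature - 1], threshold) else left
    return node

# The three trees of the example, transcribed into the tuple encoding.
tree1 = (4, "==", 1, -0.5,
         (2, ">", 1, 0.4,
          (3, "==", 1, -0.3,
           (1, ">", 2, -0.2, 0.3))))
tree2 = (2, ">", 1,
         (1, ">", 2, -0.2, (4, "==", 1, -0.4, 0.3)),
         0.5)
tree3 = (3, "==", 1,
         (2, ">", 1,
          (1, ">", 2, -0.2, 0.2),
          (4, "==", 1, -0.1, (1, ">", 2, 0.2, 0.3))),
         (2, ">", 1, -0.4, (4, "==", 1, -0.5, 0.1)))

x = (4, 3, 1, 1)
weights = [leaf_weight(t, x) for t in (tree1, tree2, tree3)]
total = sum(weights)
prediction = 1 if total > 0 else 0  # binary classification: positive when the sum is > 0

print("weights:", weights)
print("total (rounded):", round(total, 10))
print("prediction:", prediction)
```

The total is rounded before printing only because floating-point addition may not give exactly 0.9.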
We now show how to get direct reasons using PyXAI:
from pyxai import Builder, Explaining
# Tree 1: each DecisionNode tests one feature (1-indexed); leaves are given directly as weights.
node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.3)
node1_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=-0.3, right=node1_1)
node1_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=0.4, right=node1_2)
node1_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=node1_3)
tree1 = Builder.DecisionTree(4, node1_4)  # 4 features, root node1_4
# Tree 2
node2_1 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.4, right=0.3)
node2_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=node2_1)
node2_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node2_2, right=0.5)
tree2 = Builder.DecisionTree(4, node2_3)
# Tree 3
node3_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=0.2, right=0.3)
node3_2_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.2)
node3_2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.1, right=node3_1)
node3_2_3 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=0.1)
node3_3_1 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node3_2_1, right=node3_2_2)
node3_3_2 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=-0.4, right=node3_2_3)
node3_4 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=node3_3_1, right=node3_3_2)
tree3 = Builder.DecisionTree(4, node3_4)
# Boosted tree for binary classification built from the three trees
BT = Builder.BoostedTrees([tree1, tree2, tree3], n_classes=2)
We compute the direct reason for this instance:
explainer = Explaining.initialize(BT)
explainer.set_instance((4,3,1,1))
direct = explainer.direct_reason()
print("instance: (4,3,1,1)")
print("binary_representation:", explainer.binary_representation)
print("target_prediction:", explainer.target_prediction)
print("direct:", direct)
print("to_features:", explainer.to_features(direct))
instance: (4,3,1,1)
binary_representation: (1, 2, 3, 4)
target_prediction: 1
direct: (1, 2, 3, 4)
to_features: ['f1 > 2', 'f2 > 1', 'f3 == 1', 'f4 == 1']
As you can see, in this case, the direct reason coincides with the full instance.
Example from a Real Dataset
For this example, we use the compas.csv dataset. We create one model using the hold-out approach (in the run shown here, 1543 of the 6172 instances, i.e. 25%, are held out for testing) and select a well-classified instance.
from pyxai import Learning, Explaining
learner = Learning.Xgboost("../../../dataset/compas.csv", problem_type='classification')
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.BT)
instance, prediction = learner.get_instances(model, n=1, is_correct=True)
-------------- Information ---------------
Problem type: classification
Instances type: tabular
Labels type: classes
Dataset path: ../../../dataset/compas.csv
nFeatures (nAttributes, with the labels): 12
nInstances (nObservations): 6172
nLabels: 2
--------------- Model creation, fitting and evaluation ---------------
Splitting method: hold-out
Problem type: classification
Models type: boosted-tree
model_parameters: {}
--------- Evaluation Information ---------
For the evaluation number 0:
Metrics:
sklearn_confusion_matrix: [[653, 206], [295, 389]]
precision: 65.3781512605042
recall: 56.87134502923976
f1_score: 60.82877247849882
specificity: 76.0186263096624
true_positive: 389
true_negative: 653
false_positive: 206
false_negative: 295
accuracy: 67.53078418664938
Number of Training instances: 4629
Number of Testing instances: 1543
--------------- Explainer ----------------
For the split number 0:
**Boosted Tree model**
NClasses: 2
nTrees: 100
nVariables: 38
--------------- Instances ----------------
Number of instances selected: 1
----------------------------------------------
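The metrics reported in the evaluation output follow directly from the confusion matrix. The short sketch below recomputes them in plain Python as a sanity check. It assumes the scikit-learn layout for a binary confusion matrix, that is `[[TN, FP], [FN, TP]]`; the variable names are chosen for illustration.

```python
# Confusion matrix as printed in the evaluation output,
# assumed to follow the scikit-learn layout [[TN, FP], [FN, TP]].
(tn, fp), (fn, tp) = [[653, 206], [295, 389]]

n_test = tn + fp + fn + tp
precision = 100 * tp / (tp + fp)      # share of predicted positives that are correct
recall = 100 * tp / (tp + fn)         # share of actual positives that are found
specificity = 100 * tn / (tn + fp)    # share of actual negatives that are found
accuracy = 100 * (tp + tn) / n_test
f1_score = 2 * precision * recall / (precision + recall)

print("test instances:", n_test)
print("precision:", precision)
print("recall:", recall)
print("specificity:", specificity)
print("accuracy:", accuracy)
print("f1_score:", f1_score)
```

The four cells sum to the number of testing instances, and the recomputed percentages agree with the reported ones up to floating-point rounding.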
Finally, we display the direct reason for this instance:
explainer = Explaining.initialize(model, instance)
print("instance:", instance)
print("prediction:", prediction)
print()
direct_reason = explainer.direct_reason()
print("len binary representation:", len(explainer.binary_representation))
print("len direct:", len(direct_reason))
print("is_reason:", explainer.is_reason(direct_reason))
print("to_features:", explainer.to_features(direct_reason))
instance: Misdemeanor 0
Number_of_Priors 0
score_factor 0
Age_Above_FourtyFive 1
Age_Below_TwentyFive 0
African_American 0
Asian 0
Hispanic 0
Native_American 0
Other 1
Female 0
Name: 0, dtype: int64
prediction: 0
len binary representation: 38
len direct: 36
is_reason: True
to_features: ['Misdemeanor < 1.0', 'Number_of_Priors < 1.0', 'score_factor < 1.0', 'Age_Above_FourtyFive >= 1.0', 'Age_Below_TwentyFive < 1.0', 'African_American < 1.0', 'Asian < 1.0', 'Hispanic < 1.0', 'Native_American < 1.0', 'Other >= 1.0', 'Female < 1.0']
Note that this direct reason contains 36 of the 38 binary variables of the binary representation. It explains why the model predicts 0 for this instance, but it is probably not the most compact reason. We invite you to look at the other types of reasons presented on the Explanations Computation page.
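The to_features output lists only 11 conditions although the direct reason holds many more binary variables, which suggests that several binary variables bear on the same feature with different thresholds and that only the tightest condition per feature is displayed. The sketch below illustrates that idea only. It is not PyXAI's code: the function `tightest_conditions`, the literal encoding, and the thresholds 2.5 and 4.5 are made up for the example.

```python
def tightest_conditions(literals):
    """Keep, per feature, the tightest lower bound (>=) and upper bound (<).

    literals: iterable of (feature_name, threshold, is_ge), where is_ge=True
    encodes 'feature >= threshold' and is_ge=False encodes 'feature < threshold'.
    """
    lower, upper = {}, {}
    for feature, threshold, is_ge in literals:
        if is_ge:
            lower[feature] = max(lower.get(feature, threshold), threshold)
        else:
            upper[feature] = min(upper.get(feature, threshold), threshold)
    conditions = []
    for feature in sorted(set(lower) | set(upper)):
        if feature in lower:
            conditions.append(f"{feature} >= {lower[feature]}")
        if feature in upper:
            conditions.append(f"{feature} < {upper[feature]}")
    return conditions

# Hypothetical literals: three thresholds on the same feature, one on another.
literals = [
    ("Number_of_Priors", 1.0, False),
    ("Number_of_Priors", 2.5, False),  # made-up threshold, implied by '< 1.0'
    ("Number_of_Priors", 4.5, False),  # made-up threshold, implied by '< 1.0'
    ("Other", 1.0, True),
]

conditions = tightest_conditions(literals)
print(conditions)
```

Here four literals collapse into two displayed conditions, since `Number_of_Priors < 1.0` already implies the two looser bounds.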