
Direct Reason

Let $RF = \{T_1, \ldots, T_n\}$ be a random forest and $x$ be an instance. The direct reason for $x$ is the term of the binary representation corresponding to the conjunction of the terms associated with the root-to-leaf paths of all trees $T_i$ that are compatible with $x$. Due to its simplicity, it is an abductive explanation that is easy to compute, but it can contain redundant literals. More information about the direct reason can be found in the paper Trading Complexity for Sparsity in Random Forest Explanations.
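To make this definition concrete, here is a minimal, hypothetical sketch (independent of PyXAI's actual implementation) that collects the literals along the path of one tree compatible with a binary instance; the direct reason is simply the union of these terms over all trees of the forest:

def path_literals(node, x):
    # Walk the tree following the instance's binary values and collect
    # one literal per tested variable: +v if x[v] is 1, -v otherwise.
    literals = []
    while isinstance(node, tuple):  # internal node: (variable, left_child, right_child)
        variable, left, right = node
        if x[variable - 1] == 1:
            literals.append(variable)
            node = right
        else:
            literals.append(-variable)
            node = left
    return literals  # the term of the root-to-leaf path compatible with x

# Hypothetical toy tree testing variable 1, then variable 2; leaves are classes.
tree = (1, 0, (2, 0, 1))
print(sorted(path_literals(tree, (1, 1))))  # [1, 2]
# For a forest, the direct reason is the union of these terms over all trees.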

<Explainer Object>.direct_reason():
Returns the unique direct reason for the current instance, or None if this reason contains some excluded features. This reason is given in the form of binary variables; use the to_features method if you want it in the form of features.
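A typical call sequence looks like this (a sketch assuming a model and an instance are already available; the complete examples below are self-contained):

explainer = Explainer.initialize(model)
explainer.set_instance(instance)
direct = explainer.direct_reason()
if direct is not None:  # None when the reason involves excluded features
    print(explainer.to_features(direct))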

The basic methods (initialize, set_instance, to_features, is_reason, …) of the Explainer module used in the next examples are described in the Explainer Principles page.

Example from Hand-Crafted Trees

For this example, we take the Random Forest of the Building Models page.

(Figure: RFdirect)

This figure represents a Random Forest with 3 Decision Trees using $4$ binary features ($x_1$, $x_2$, $x_3$ and $x_4$). The direct reason for the instance $(1,1,1,1)$ is in red and the one for $(0,0,0,0)$ is in blue. Now, we show how to get them with PyXAI. We start by building the random forest:

from pyxai import Builder, Explainer

# Each DecisionNode tests a binary variable (first argument);
# left/right children are subtrees or class leaves (0/1).
nodeT1_1 = Builder.DecisionNode(1, left=0, right=1)
nodeT1_3 = Builder.DecisionNode(3, left=0, right=nodeT1_1)
nodeT1_2 = Builder.DecisionNode(2, left=1, right=nodeT1_3)
nodeT1_4 = Builder.DecisionNode(4, left=0, right=nodeT1_2)
tree1 = Builder.DecisionTree(4, nodeT1_4, force_features_equal_to_binaries=True)

nodeT2_4 = Builder.DecisionNode(4, left=0, right=1)
nodeT2_1 = Builder.DecisionNode(1, left=0, right=nodeT2_4)
nodeT2_2 = Builder.DecisionNode(2, left=nodeT2_1, right=1)
tree2 = Builder.DecisionTree(4, nodeT2_2, force_features_equal_to_binaries=True)  # 4 features, but only 3 are used

nodeT3_1_1 = Builder.DecisionNode(1, left=0, right=1)
nodeT3_1_2 = Builder.DecisionNode(1, left=0, right=1)
nodeT3_4_1 = Builder.DecisionNode(4, left=0, right=nodeT3_1_1)
nodeT3_4_2 = Builder.DecisionNode(4, left=0, right=1)
nodeT3_2_1 = Builder.DecisionNode(2, left=nodeT3_1_2, right=nodeT3_4_1)
nodeT3_2_2 = Builder.DecisionNode(2, left=0, right=nodeT3_4_2)
nodeT3_3_1 = Builder.DecisionNode(3, left=nodeT3_2_1, right=nodeT3_2_2)
tree3 = Builder.DecisionTree(4, nodeT3_3_1, force_features_equal_to_binaries=True)

forest = Builder.RandomForest([tree1, tree2, tree3], n_classes=2)

We compute the direct reasons for these two instances:

explainer = Explainer.initialize(forest)
explainer.set_instance((1,1,1,1))
direct = explainer.direct_reason()
print("instance: (1,1,1,1)")
print("binary representation:", explainer.binary_representation)
print("target_prediction:", explainer.target_prediction)
print("direct:", direct)
print("to_features:", explainer.to_features(direct))
print("------------------------------------------------")
explainer.set_instance((0,0,0,0))
direct = explainer.direct_reason()
print("instance: (0,0,0,0)")
print("binary representation:", explainer.binary_representation)
print("target_prediction:", explainer.target_prediction)
print("direct:", direct)
print("to_features:", explainer.to_features(direct))
instance: (1,1,1,1)
binary representation: (1, 2, 3, 4)
target_prediction: 1
direct: (1, 2, 3, 4)
to_features: ('f1 >= 0.5', 'f2 >= 0.5', 'f3 >= 0.5', 'f4 >= 0.5')
------------------------------------------------
instance: (0,0,0,0)
binary representation: (-1, -2, -3, -4)
target_prediction: 0
direct: (-1, -2, -3, -4)
to_features: ('f1 < 0.5', 'f2 < 0.5', 'f3 < 0.5', 'f4 < 0.5')

As you can see, in this case, the direct reason corresponds to the full instance.
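We can also check with the is_reason method (described on the Explainer Principles page) that such a term is indeed an abductive explanation:

explainer.set_instance((1, 1, 1, 1))
direct = explainer.direct_reason()
print("is_reason:", explainer.is_reason(direct))  # should print: is_reason: True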

Example from a Real Dataset

For this example, we take the compas dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a correctly classified instance.

from pyxai import Learning, Explainer

learner = Learning.Scikitlearn("../../../dataset/compas.csv", learner_type=Learning.CLASSIFICATION)
model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.RF)
instance, prediction = learner.get_instances(model, n=1, correct=True)
data:
      Number_of_Priors  score_factor  Age_Above_FourtyFive   
0                    0             0                     1  \
1                    0             0                     0   
2                    4             0                     0   
3                    0             0                     0   
4                   14             1                     0   
...                ...           ...                   ...   
6167                 0             1                     0   
6168                 0             0                     0   
6169                 0             0                     1   
6170                 3             0                     0   
6171                 2             0                     0   

      Age_Below_TwentyFive  African_American  Asian  Hispanic   
0                        0                 0      0         0  \
1                        0                 1      0         0   
2                        1                 1      0         0   
3                        0                 0      0         0   
4                        0                 0      0         0   
...                    ...               ...    ...       ...   
6167                     1                 1      0         0   
6168                     1                 1      0         0   
6169                     0                 0      0         0   
6170                     0                 1      0         0   
6171                     1                 0      0         1   

      Native_American  Other  Female  Misdemeanor  Two_yr_Recidivism  
0                   0      1       0            0                  0  
1                   0      0       0            0                  1  
2                   0      0       0            0                  1  
3                   0      1       0            1                  0  
4                   0      0       0            0                  1  
...               ...    ...     ...          ...                ...  
6167                0      0       0            0                  0  
6168                0      0       0            0                  0  
6169                0      1       0            0                  0  
6170                0      0       1            1                  0  
6171                0      0       1            0                  1  

[6172 rows x 12 columns]
--------------   Information   ---------------
Dataset name: ../../../dataset/compas.csv
nFeatures (nAttributes, with the labels): 12
nInstances (nObservations): 6172
nLabels: 2
---------------   Evaluation   ---------------
method: HoldOut
output: RF
learner_type: Classification
learner_options: {'max_depth': None, 'random_state': 0}
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics:
   accuracy: 65.71274298056156
nTraining instances: 4320
nTest instances: 1852

---------------   Explainer   ----------------
For the evaluation number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 68

---------------   Instances   ----------------
number of instances selected: 1
----------------------------------------------
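Note that get_instances can also return several instances at once; in that case it yields a list of (instance, prediction) pairs (a sketch, assuming the same learner and model as above):

# Hypothetical variant: pick 10 correctly classified instances.
instances = learner.get_instances(model, n=10, correct=True)
for instance, prediction in instances:
    print(prediction)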

We display the direct reason for this instance:

explainer = Explainer.initialize(model, instance)
print("instance:", instance)
print("prediction:", prediction)
print()
direct_reason = explainer.direct_reason()
print("len binary representation:", len(explainer.binary_representation))
print("len direct:", len(direct_reason))
print("is_reason:", explainer.is_reason(direct_reason))
print("to_features:", explainer.to_features(direct_reason))
instance: [0 0 1 0 0 0 0 0 1 0 0]
prediction: 0

len binary representation: 68
len direct: 22
is_reason: True
to_features: ('Number_of_Priors <= 0.5', 'score_factor <= 0.5', 'Age_Above_FourtyFive > 0.5', 'Age_Below_TwentyFive <= 0.5', 'African_American <= 0.5', 'Asian <= 0.5', 'Hispanic <= 0.5', 'Native_American <= 0.5', 'Other > 0.5', 'Female <= 0.5', 'Misdemeanor <= 0.5')

Note that this direct reason contains 22 binary variables out of the 68 in the binary representation of the instance. This reason explains why the model predicts $0$ for this instance, but it is probably not the most compact abductive explanation for it. We invite you to look at the other types of reasons presented on the Explanations Computation page.
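For instance, the majoritary_reason method of the explainer for random forests typically produces shorter abductive explanations (a sketch reusing the explainer above; see the Explanations Computation page for the exact options):

# Hypothetical follow-up: a (usually) more compact reason for the same instance.
majoritary = explainer.majoritary_reason()
print("len majoritary:", len(majoritary))
print("to_features:", explainer.to_features(majoritary))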