
Sufficient Reasons

Let $f$ be a Boolean function represented by a random forest $RF$, let $x$ be an instance, and suppose that $RF$ predicts $1$ (resp. $0$) on $x$, i.e., $f(x) = 1$ (resp. $f(x) = 0$). A sufficient reason for $x$ is a term over the binary representation of the instance that covers $x$ and is a prime implicant of $f$ (resp. $\neg f$).

In other words, a sufficient reason for an instance $x$ given a class described by a Boolean function $f$ is a subset $t$ of the characteristics of $x$ that is minimal w.r.t. set inclusion and such that any instance $x'$ sharing this set $t$ of characteristics is classified by $f$ as $x$ is.
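To make the definition concrete, here is a minimal brute-force sketch that tests whether a term is a sufficient reason for an instance of a small Boolean function. It is not how PyXAI computes sufficient reasons (the library relies on dedicated solvers), and it enumerates all completions of the term, so it is only usable with a handful of binary features. The helper names `is_implicant`, `is_sufficient` and `toy_f` are hypothetical; literals use the signed convention seen in the PyXAI outputs on this page ($i$ for $x_i = 1$, $-i$ for $x_i = 0$).

```python
from itertools import product


def is_implicant(f, term, n, target):
    """Return True if every instance over n binary features that satisfies
    `term` is mapped by f to `target`.

    Literals are signed 1-based indices: i means x_i = 1, -i means x_i = 0.
    """
    fixed = {abs(lit): int(lit > 0) for lit in term}
    free = [i for i in range(1, n + 1) if i not in fixed]
    for values in product((0, 1), repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, values))}
        x = tuple(assignment[i] for i in range(1, n + 1))
        if f(x) != target:
            return False
    return True


def is_sufficient(f, term, x):
    """Brute-force test: is `term` a sufficient reason for instance x under f?"""
    target = f(x)
    # The term must cover x: each literal agrees with x.
    if any(x[abs(lit) - 1] != int(lit > 0) for lit in term):
        return False
    # Any instance sharing the term must be classified as x is.
    if not is_implicant(f, term, len(x), target):
        return False
    # Prime: supersets of an implicant are implicants, so it is enough to
    # check that removing any single literal breaks the implicant property.
    return all(
        not is_implicant(f, [other for other in term if other != lit], len(x), target)
        for lit in term
    )


# Toy Boolean function over 3 features: x1 AND (x2 OR x3).
def toy_f(x):
    return int(x[0] and (x[1] or x[2]))


print(is_sufficient(toy_f, (1, 2), (1, 1, 0)))      # True
print(is_sufficient(toy_f, (1, 2, -3), (1, 1, 0)))  # False: not minimal
print(is_sufficient(toy_f, (1,), (1, 1, 0)))        # False: x2 = x3 = 0 flips f
```

The second call fails only because of minimality: $(x_1, x_2, \neg x_3)$ is an implicant, but dropping $\neg x_3$ still leaves one.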

More information about sufficient reasons can be found in the paper Trading Complexity for Sparsity in Random Forest Explanations.

The function ExplainerRF.sufficient_reason allows computing this kind of explanation.

The library also provides a way to check that a reason is sufficient using the function is_sufficient_reason.

A sufficient reason is minimal w.r.t. set inclusion, i.e., no proper subset of this reason is also a sufficient reason. A minimal sufficient reason for $x$ is a sufficient reason for $x$ that contains a minimum number of literals. In other words, a minimal sufficient reason has minimum size among all sufficient reasons for $x$.
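The difference between the two notions can be seen on a toy function by exhaustive enumeration. The sketch below (hypothetical helpers, exponential in the number of features, not the PyXAI implementation) lists all subset-minimal sufficient reasons of an instance and then keeps those of minimum size. For $f = x_1 \lor (x_2 \land x_3)$ and the instance $(1,1,1)$, both $(x_1)$ and $(x_2, x_3)$ are sufficient reasons, but only $(x_1)$ is a minimal sufficient reason.

```python
from itertools import combinations, product


def implies_target(f, term, n, target):
    """True if every completion of `term` over n binary features gets `target`.
    Literals are signed 1-based indices (i: x_i = 1, -i: x_i = 0)."""
    fixed = {abs(lit): int(lit > 0) for lit in term}
    free = [i for i in range(1, n + 1) if i not in fixed]
    for values in product((0, 1), repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, values))}
        if f(tuple(assignment[i] for i in range(1, n + 1))) != target:
            return False
    return True


def all_sufficient_reasons(f, x):
    """All subset-minimal sufficient reasons for x (exhaustive enumeration)."""
    n, target = len(x), f(x)
    # The literals of x: i if x_i = 1, -i if x_i = 0.
    literals = [i + 1 if v else -(i + 1) for i, v in enumerate(x)]
    implicants = [
        term
        for k in range(n + 1)
        for term in combinations(literals, k)
        if implies_target(f, term, n, target)
    ]
    # Keep the implicants that have no implicant as a proper subset.
    return [t for t in implicants if not any(set(s) < set(t) for s in implicants)]


def minimal_sufficient_reasons(f, x):
    """Sufficient reasons for x of minimum size."""
    reasons = all_sufficient_reasons(f, x)
    smallest = min(len(t) for t in reasons)
    return [t for t in reasons if len(t) == smallest]


# Toy Boolean function over 3 features: x1 OR (x2 AND x3).
def toy_f(x):
    return int(x[0] or (x[1] and x[2]))


print(all_sufficient_reasons(toy_f, (1, 1, 1)))      # [(1,), (2, 3)]
print(minimal_sufficient_reasons(toy_f, (1, 1, 1)))  # [(1,)]
```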

The function ExplainerRF.minimal_sufficient_reason allows computing this kind of explanation.


Unfortunately, searching for a sufficient reason (computed as a MUS, i.e., a minimal unsatisfiable subset) or, even more so, for a minimal sufficient reason (a minimum-size MUS) is a difficult computational task. If the dataset contains many features, or if the binary representation of the instance contains many binary variables, finding a MUS may be out of reach. To deal with this problem, we introduced the notion of Majoritary Reason, an abductive explanation that is much easier to compute.

Example from Hand-Crafted Trees

For this example, we take the random forest from the Building Models page, which is built over $4$ binary features ($x_1$, $x_2$, $x_3$ and $x_4$).

The following figure shows in red and bold a minimal sufficient reason $(x_1, x_4)$ for the instance $(1,1,1,1)$.

[Figure: RFsufficient1]

As the figure shows, the leaves compatible with this sufficient reason (in red) may predict either 0 or 1. These are the predictions of the individual trees ($T_1$, $T_2$ and $T_3$), not of the random forest. We therefore need to compute the random forest prediction $f$ for every instance covered by this reason, i.e., for every assignment of $x_2$ and $x_3$:

$x_1$	$x_2$	$x_3$	$x_4$	$T_1$	$T_2$	$T_3$	$f$
1	0	0	1	1	1	1	1
1	0	1	1	1	1	0	1
1	1	0	1	0	1	1	1
1	1	1	1	1	1	1	1

Since at least 2 of the 3 trees predict 1 for each of these instances, the random forest always predicts 1, so $(x_1, x_4)$ is indeed a sufficient reason.
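The table above can be reproduced with a few lines of plain Python. The sketch assumes a hypothetical nested-tuple encoding `(feature, left, right)` that mirrors the `Builder.DecisionNode` calls further down this page, with the left branch followed when the tested feature is 0 and the right branch when it is 1 (the reading consistent with the tables on this page); the forest prediction is taken as the majority vote of the three trees.

```python
from itertools import product

# Hypothetical plain-Python mirror of the Builder code further down this page:
# a node is (feature, left, right); left is followed when x_feature = 0 and
# right when x_feature = 1; leaves are the predicted classes 0 or 1.
T1 = (4, 0, (2, 1, (3, 0, (1, 0, 1))))
T2 = (2, (1, 0, (4, 0, 1)), 1)
T3 = (3, (2, (1, 0, 1), (4, 0, (1, 0, 1))), (2, 0, (4, 0, 1)))
TREES = (T1, T2, T3)


def predict(node, x):
    """Follow one tree down to a leaf; x[0] holds x_1."""
    while isinstance(node, tuple):
        feature, left, right = node
        node = right if x[feature - 1] else left
    return node


def forest(x):
    """Majority vote of the three trees."""
    return int(sum(predict(t, x) for t in TREES) >= 2)


# The reason (x_1, x_4) fixes x_1 = 1 and x_4 = 1; x_2 and x_3 are free.
rows = []
for x2, x3 in product((0, 1), repeat=2):
    x = (1, x2, x3, 1)
    rows.append((*x, *(predict(t, x) for t in TREES), forest(x)))
    print(*rows[-1], sep="\t")  # x1 x2 x3 x4 T1 T2 T3 f

assert all(row[-1] == 1 for row in rows)  # the forest always predicts 1
```

Each printed row matches the corresponding row of the table.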

The next figure shows in blue and bold a minimal sufficient reason $(-x_4)$ (that is, $x_4 = 0$) for the instance $(0,1,0,0)$.

[Figure: RFsufficient2]

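As before, we compute the predictions associated with this reason. Since $T_1$ tests $x_4$ at its root and outputs 0 when $x_4 = 0$, it predicts 0 for all of these instances: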

$x_1$	$x_2$	$x_3$	$x_4$	$T_1$	$T_2$	$T_3$	$f$
0	0	0	0	0	0	0	0
0	0	1	0	0	0	0	0
0	1	0	0	0	1	0	0
0	1	1	0	0	1	0	0
1	0	0	0	0	0	1	0
1	0	1	0	0	0	0	0
1	1	0	0	0	1	0	0
1	1	1	0	0	1	0	0

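Since at least 2 of the 3 trees predict 0 for each of these instances ($T_1$ always does, and $T_2$ and $T_3$ never both predict 1), the random forest always predicts 0, so $(-x_4)$ is indeed a sufficient reason.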
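The same kind of check works for $(-x_4)$: fix $x_4 = 0$ and enumerate the eight assignments of $x_1$, $x_2$, $x_3$. As before, the trees are written in a hypothetical nested-tuple form `(feature, left, right)` mirroring the `Builder` code below (left for 0, right for 1), and the forest uses a majority vote.

```python
from itertools import product

# Hypothetical mirror of the Builder trees: (feature, left, right), where left
# is followed when x_feature = 0 and right when x_feature = 1.
TREES = (
    (4, 0, (2, 1, (3, 0, (1, 0, 1)))),                          # T1
    (2, (1, 0, (4, 0, 1)), 1),                                  # T2
    (3, (2, (1, 0, 1), (4, 0, (1, 0, 1))), (2, 0, (4, 0, 1))),  # T3
)


def predict(node, x):
    """Follow one tree down to a leaf; x[0] holds x_1."""
    while isinstance(node, tuple):
        feature, left, right = node
        node = right if x[feature - 1] else left
    return node


def forest(x):
    """Majority vote of the three trees."""
    return int(sum(predict(t, x) for t in TREES) >= 2)


# The reason (-x_4) fixes x_4 = 0; x_1, x_2 and x_3 are free.
rows = []
for x1, x2, x3 in product((0, 1), repeat=3):
    x = (x1, x2, x3, 0)
    rows.append((*x, *(predict(t, x) for t in TREES), forest(x)))
    print(*rows[-1], sep="\t")  # x1 x2 x3 x4 T1 T2 T3 f

assert all(row[-1] == 0 for row in rows)  # the forest always predicts 0
```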

Now we show how to obtain these reasons with PyXAI. We start by building the random forest:

from pyxai import Builder, Explaining

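# Tree T1 (nodes are built bottom-up; in each DecisionNode, left is the
# branch for x_i = 0 and right the branch for x_i = 1)
nodeT1_1 = Builder.DecisionNode(1, left=0, right=1)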
nodeT1_3 = Builder.DecisionNode(3, left=0, right=nodeT1_1)
nodeT1_2 = Builder.DecisionNode(2, left=1, right=nodeT1_3)
nodeT1_4 = Builder.DecisionNode(4, left=0, right=nodeT1_2)

tree1 = Builder.DecisionTree(4, nodeT1_4, force_features_equal_to_binaries=True)

# Tree T2
nodeT2_4 = Builder.DecisionNode(4, left=0, right=1)
nodeT2_1 = Builder.DecisionNode(1, left=0, right=nodeT2_4)
nodeT2_2 = Builder.DecisionNode(2, left=nodeT2_1, right=1)

tree2 = Builder.DecisionTree(4, nodeT2_2, force_features_equal_to_binaries=True) #4 features but only 3 used

# Tree T3
nodeT3_1_1 = Builder.DecisionNode(1, left=0, right=1)
nodeT3_1_2 = Builder.DecisionNode(1, left=0, right=1)
nodeT3_4_1 = Builder.DecisionNode(4, left=0, right=nodeT3_1_1)
nodeT3_4_2 = Builder.DecisionNode(4, left=0, right=1)

nodeT3_2_1 = Builder.DecisionNode(2, left=nodeT3_1_2, right=nodeT3_4_1)
nodeT3_2_2 = Builder.DecisionNode(2, left=0, right=nodeT3_4_2)

nodeT3_3_1 = Builder.DecisionNode(3, left=nodeT3_2_1, right=nodeT3_2_2)

tree3 = Builder.DecisionTree(4, nodeT3_3_1, force_features_equal_to_binaries=True)
# The random forest made of T1, T2 and T3 (two classes)
forest = Builder.RandomForest([tree1, tree2, tree3], n_classes=2)

Then we compute a sufficient reason and a minimal sufficient reason for each of these two instances:

explainer = Explaining.initialize(forest)
explainer.set_instance((1,1,1,1))

sufficient = explainer.sufficient_reason()
assert explainer.is_sufficient_reason(sufficient)
assert sufficient == (1, 4), "The sufficient reason is not good !"

minimal = explainer.minimal_sufficient_reason()
print("minimal:", minimal)
assert minimal == (1, 4), "The minimal sufficient reason is not good !"

print("-------------------------------")
instance = (0,1,0,0)
explainer.set_instance(instance)
print("target_prediction:", explainer.target_prediction)

sufficient = explainer.sufficient_reason()
print("sufficient:", sufficient)
assert sufficient == (-1, -3), "The sufficient reason is not good !"

minimal = explainer.minimal_sufficient_reason()
print("minimal:", minimal)
assert minimal == (-4, ), "The minimal sufficient reason is not good !" 

minimal: (1, 4)
-------------------------------
target_prediction: 0
sufficient: (-1, -3)
minimal: (-4,)
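For a forest this small, the library's answers can be cross-checked by exhaustive search. The sketch below reuses the hypothetical nested-tuple encoding of the three trees (left for 0, right for 1, majority vote) and lists every subset-minimal sufficient reason of each instance; it is a brute-force illustration, not the solver-based method PyXAI uses.

```python
from itertools import combinations, product

# Hypothetical mirror of the Builder trees: (feature, left, right), where left
# is followed when x_feature = 0 and right when x_feature = 1.
TREES = (
    (4, 0, (2, 1, (3, 0, (1, 0, 1)))),                          # T1
    (2, (1, 0, (4, 0, 1)), 1),                                  # T2
    (3, (2, (1, 0, 1), (4, 0, (1, 0, 1))), (2, 0, (4, 0, 1))),  # T3
)


def predict(node, x):
    """Follow one tree down to a leaf; x[0] holds x_1."""
    while isinstance(node, tuple):
        feature, left, right = node
        node = right if x[feature - 1] else left
    return node


def forest(x):
    """Majority vote of the three trees."""
    return int(sum(predict(t, x) for t in TREES) >= 2)


def is_implicant(term, target, n=4):
    """True if every instance matching the signed literals of `term` gets `target`."""
    fixed = {abs(lit): int(lit > 0) for lit in term}
    free = [i for i in range(1, n + 1) if i not in fixed]
    for values in product((0, 1), repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, values))}
        if forest(tuple(assignment[i] for i in range(1, n + 1))) != target:
            return False
    return True


def sufficient_reasons(x):
    """All subset-minimal sufficient reasons for x, by exhaustive search."""
    target = forest(x)
    literals = [i + 1 if v else -(i + 1) for i, v in enumerate(x)]
    implicants = [t for k in range(len(x) + 1)
                  for t in combinations(literals, k) if is_implicant(t, target)]
    return [t for t in implicants if not any(set(s) < set(t) for s in implicants)]


for instance in ((1, 1, 1, 1), (0, 1, 0, 0)):
    print(instance, forest(instance), sufficient_reasons(instance))
```

The search finds $(1, 4)$ and $(2, 3, 4)$ for $(1,1,1,1)$, and $(-4)$ and $(-1, -3)$ for $(0,1,0,0)$. This is consistent with the PyXAI output: $(-1, -3)$ is minimal w.r.t. set inclusion but is not a minimal sufficient reason, since $(-4)$ is smaller.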

Example from a Real Dataset

For this example, we take the COMPAS dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a correctly classified instance.

from pyxai import Learning, Explaining

learner = Learning.Scikitlearn("../../../dataset/compas.csv", problem_type='classification')
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.RF)
instance, prediction = learner.get_instances(model, n=1, is_correct=True)

--------------   Information   ---------------
Problem type: classification
Instances type: tabular
Labels type: classes

Dataset path: ../../../dataset/compas.csv
nFeatures (nAttributes, with the labels): 11
nInstances (nObservations): 6172
nLabels: 2
---------------   Model creation, fitting and evaluation  ---------------
Splitting method: hold-out
Problem type: classification
Models type: random-forest
model_parameters: {}
---------   Evaluation Information   ---------
For the evaluation number 0:
Metrics:
   sklearn_confusion_matrix: [[612, 211], [295, 425]]
   precision: 66.82389937106919
   recall: 59.02777777777778
   f1_score: 62.68436578171092
   specificity: 74.36208991494532
   true_positive: 425
   true_negative: 612
   false_positive: 211
   false_negative: 295
   accuracy: 67.20674011665587
Number of Training instances: 4629
Number of Testing instances: 1543

---------------   Explainer   ----------------
For the split number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 70

---------------   Instances   ----------------
Number of instances selected: 1
----------------------------------------------

This dataset is not very large, so computing a sufficient reason is quite easy, but deriving a minimal one is much harder. Since the underlying solver (Optux) does not offer a time-limit mode, we leave out the code that computes a minimal sufficient reason.

explainer = Explaining.initialize(model, instance)
print("instance:", instance)
print("prediction:", prediction)
print()
sufficient_reason = explainer.sufficient_reason()
print("\nsufficient reason:", sufficient_reason)
print("to features", explainer.to_features(sufficient_reason))
print("is sufficient_reason (for max 50 checks): ", explainer.is_sufficient_reason(sufficient_reason, n_samples=50))
print()

instance: Misdemeanor             0
Number_of_Priors        0
score_factor            0
Age_Above_FourtyFive    1
Age_Below_TwentyFive    0
African_American        0
Asian                   0
Hispanic                0
Native_American         0
Other                   1
Female                  0
Name: 0, dtype: int64
prediction: 0


sufficient reason: (-1, -2, -3, -4, -6, -11, -12)
to features ['Number_of_Priors <= 0.5', 'score_factor <= 0.5', 'African_American <= 0.5', 'Hispanic <= 0.5', 'Female <= 0.5']
is sufficient_reason (for max 50 checks):  None
