Link Search Menu Expand Document
PyXAI
Papers Video GitHub In-the-Loop EXPEKCTATION Release Notes About
download notebook

Sufficient Reasons

Let $f$ be a Boolean function represented by a decision tree $T$, $x$ be an instance and $p$ be the prediction of $T$ on $x$ ($f(x) = p$), a sufficient reason for $x$ is a term of the binary representation of the instance that is a prime implicant of $f$ that covers $x$.

In other words, a sufficient reason for an instance $x$ given a class described by a Boolean function $f$ is a subset $t$ of the characteristics of $x$ that is minimal w.r.t. set inclusion and such that any instance $x’$ sharing this set $t$ of characteristics is classified by $f$ as $x$ is.

The function ExplainerDT.sufficient_reason allows computing this kind of explanation.

The library provides a way to check that a reason is sufficient using the function is_sufficient_reason.

Minimal Sufficient Reason

A sufficient reason is minimal w.r.t. set inclusion, i.e. there is no subset of this reason which is also a sufficient reason. A minimal sufficient reason for $x$ is a sufficient reason for $x$ that contains a minimal number of literals. In other words, a minimal sufficient reason has a minimal size.

The function ExplainerDT.minimal_sufficient_reason allows computing this kind of explanation.

Preferences over Sufficient Reasons

One can also compute preferred sufficient reasons. Indeed, the user may prefer reason containing some features and can provide weights in order to discriminate some features. Please take a look to the Preferences page for more information.

The function preferred_sufficient_reason allows computing this kind of explanation.

Other methods

Reminder that the literals of a binary representation represent the conditions “<id_feature> <operator> <threshold> ?” (such as “$x_4 \ge 0.5$ ?”) implied by an instance. A literal $l$ of a binary representation is a necessary feature for $x$ if and only if $l$ belongs to every sufficient reason $t$ for $x$. In contrast, a literal $l$ of a binary representation is a relevant feature for $x$ if and only if $l$ belongs to at least one sufficient reason $t$ for $x$.

PyXAI provides methods to compute them :

For a given instance, it can be interesting to compute the number of sufficient reasons or the number of sufficient reasons per literal of the binary representation. PyXAI allows this:

More information about sufficient reasons and minimal sufficient reasons can be found in the paper On the Explanatory Power of Decision Trees. The basic methods (initialize, set_instance, to_features, is_reason, …) of the Explainer module used in the next examples are described in the Explainer Principles page.

Example from a Hand-Crafted Tree

For this example, we take the Decision Tree of the Building Models page consisting of $4$ binary features ($x_1$, $x_2$, $x_3$ and $x_4$).

The following figure shows in red and bold a minimal sufficient reason $(x_1, x_4)$ for the instance $(1,1,1,1)$. DTbuilder

The next figure gives in blue and bold a minimal sufficient reason $(-x_4)$ for the instance $(0,0,0,0)$. DTbuilder

We now show how to get those reasons with PyXAI. We start by building the decision tree:

from pyxai import Builder, Explaining

node_x4_1 = Builder.DecisionNode(4, left=0, right=1)
node_x4_2 = Builder.DecisionNode(4, left=0, right=1)
node_x4_3 = Builder.DecisionNode(4, left=0, right=1)
node_x4_4 = Builder.DecisionNode(4, left=0, right=1)
node_x4_5 = Builder.DecisionNode(4, left=0, right=1)

node_x3_1 = Builder.DecisionNode(3, left=0, right=node_x4_1)
node_x3_2 = Builder.DecisionNode(3, left=node_x4_2, right=node_x4_3)
node_x3_3 = Builder.DecisionNode(3, left=node_x4_4, right=node_x4_5)

node_x2_1 = Builder.DecisionNode(2, left=0, right=node_x3_1)
node_x2_2 = Builder.DecisionNode(2, left=node_x3_2, right=node_x3_3)

node_x1_1 = Builder.DecisionNode(1, left=node_x2_1, right=node_x2_2)

tree = Builder.DecisionTree(4, node_x1_1, force_features_equal_to_binaries=True)

And we compute the sufficient reasons for each of these two instances:

explainer = Explaining.initialize(tree)
explainer.set_instance((1,1,1,1))

sufficient_reasons = explainer.sufficient_reason(n=Explaining.ALL)
print("sufficient_reasons:", sufficient_reasons)
assert sufficient_reasons == ((1, 4), (2, 3, 4)), "The sufficient reasons are not good !"

for sufficient in sufficient_reasons:
  print("to_features:", explainer.to_features(sufficient))  
  assert explainer.is_sufficient_reason(sufficient), "This is have to be a sufficient reason !"

minimals = explainer.minimal_sufficient_reason()
print("minimal_sufficient_reason:", minimals)
assert minimals == (1, 4), "The minimal sufficient reasons are not good !"

print("-------------------------------")

explainer.set_instance((0,0,0,0))

sufficient_reasons = explainer.sufficient_reason(n=Explaining.ALL)
print("sufficient_reasons:", sufficient_reasons)
assert sufficient_reasons == ((-4,), (-1, -2), (-1, -3)), "The sufficient reasons are not good !"

for sufficient in sufficient_reasons:
  print("to_features:", explainer.to_features(sufficient))
  assert explainer.is_sufficient_reason(sufficient), "This is have to be a sufficient reason !"

minimals = explainer.minimal_sufficient_reason(n=1)
print("minimal_sufficient_reasons:", minimals)
assert minimals == (-4,), "The minimal sufficient reasons are not good !"
sufficient_reasons: ((1, 4), (2, 3, 4))
to_features: ['f1 >= 0.5', 'f4 >= 0.5']
to_features: ['f2 >= 0.5', 'f3 >= 0.5', 'f4 >= 0.5']
[['v', '1', '-2', '-3', '']]
minimal_sufficient_reason: (1, 4)
-------------------------------
sufficient_reasons: ((-4,), (-1, -2), (-1, -3))
to_features: ['f4 < 0.5']
to_features: ['f1 < 0.5', 'f2 < 0.5']
to_features: ['f1 < 0.5', 'f3 < 0.5']
[['v', '-1', '2', '-3', '-4', '5', '-6', '-7', '']]
minimal_sufficient_reasons: (-4,)

Example from a Real Dataset

For this example, we take the compas.csv dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a well-classified instance.

from pyxai import Learning, Explaining

learner = Learning.Scikitlearn("../../../dataset/compas.csv", problem_type='classification')
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.DT)
instance, prediction = learner.get_instances(model, n=1, is_correct=True)
--------------   Information   ---------------
Problem type: classification
Instances type: tabular
Labels type: classes

Dataset path: ../../../dataset/compas.csv
nFeatures (nAttributes, with the labels): 11
nInstances (nObservations): 6172
nLabels: 2
---------------   Model creation, fitting and evaluation  ---------------
Splitting method: hold-out
Problem type: classification
Models type: decision-tree
model_parameters: {}
---------   Evaluation Information   ---------
For the evaluation number 0:
Metrics:
   sklearn_confusion_matrix: [[631, 212], [326, 374]]
   precision: 63.82252559726962
   recall: 53.42857142857142
   f1_score: 58.16485225505443
   specificity: 74.85172004744959
   true_positive: 374
   true_negative: 631
   false_positive: 212
   false_negative: 326
   accuracy: 65.13285806869735
Number of Training instances: 4629
Number of Testing instances: 1543

---------------   Explainer   ----------------
For the split number 0:
**Decision Tree Model**
nFeatures: 11
nNodes: 574
nVariables: 48

---------------   Instances   ----------------
Number of instances selected: 1
----------------------------------------------

And we compute a sufficient reason for this instance:

explainer = Explaining.initialize(model, instance)
print("instance:", instance)
print("prediction:", prediction)
print()
sufficient_reason = explainer.sufficient_reason(n=1)
#for s in sufficient_reasons:
print("\nsufficient reason:", len(sufficient_reason))
print("to features", explainer.to_features(sufficient_reason))
print("is sufficient_reason (for max 50 checks): ", explainer.is_sufficient_reason(sufficient_reason, n_samples=50))
print()
minimal = explainer.minimal_sufficient_reason()
print("\nminimal:", len(minimal))
print("is sufficient_reason (for max 50 checks): ", explainer.is_sufficient_reason(sufficient_reason, n_samples=50))
print()
print("\nnecessary literals: ", explainer.necessary_literals())
print("\nnecessary literals features: ", explainer.to_features(explainer.necessary_literals()))
print("\nrelevant literals: ", explainer.relevant_literals())
print()
print("n sufficient reasons:", explainer.n_sufficient_reasons())
sufficient_reasons_per_attribute = explainer.n_sufficient_reasons_per_attribute()
print("\nsufficient_reasons_per_attribute:", sufficient_reasons_per_attribute)
print("\nsufficient_reasons_per_attribute features:", explainer.to_features(sufficient_reasons_per_attribute, details=True))

instance: Misdemeanor             0
Number_of_Priors        0
score_factor            0
Age_Above_FourtyFive    1
Age_Below_TwentyFive    0
African_American        0
Asian                   0
Hispanic                0
Native_American         0
Other                   1
Female                  0
Name: 0, dtype: int64
prediction: 0


sufficient reason: 4
to features ['Misdemeanor <= 0.5', 'Number_of_Priors <= 0.5', 'score_factor <= 0.5']
is sufficient_reason (for max 50 checks):  True

[['v', '1', '-2', '-3', '4', '5', '-6', '-7', '-8', '-9', '-10', '-11', '-12', '-13', '-14', '15', '-16', '-17', '-18', '-19', '-20', '21', '22', '23', '24', '25', '-26', '-27', '-28', '-29', '-30', '-31', '-32', '-33', '-34', '-35', '-36', '-37', '-38', '-39', '']]

minimal: 4
is sufficient_reason (for max 50 checks):  True


necessary literals:  [-1]

necessary literals features:  ['score_factor <= 0.5']

relevant literals:  [-5, -6, -3, -11, -2, 4, -18, -13, 7, -8, -9, -12, -15, -31]

n sufficient reasons: 15

sufficient_reasons_per_attribute: {-1: 15, -5: 8, -6: 7, -3: 12, -11: 5, -2: 5, 4: 10, -18: 10, -13: 10, 7: 5, -8: 7, -9: 5, -12: 4, -15: 1, -31: 1}

sufficient_reasons_per_attribute features: OrderedDict({'Misdemeanor': [{'id': 1, 'name': 'Misdemeanor', 'operator_sign_considered': <OperatorCondition.LE: 'LE'>, 'threshold': np.float64(0.5), 'weight': 5, 'string': 'Misdemeanor <= 0.5'}], 'Number_of_Priors': [{'id': 2, 'name': 'Number_of_Priors', 'operator_sign_considered': <OperatorCondition.LE: 'LE'>, 'threshold': np.float64(0.5), 'weight': 8, 'string': 'Number_of_Priors <= 0.5'}], 'score_factor': [{'id': 3, 'name': 'score_factor', 'operator_sign_considered': <OperatorCondition.LE: 'LE'>, 'threshold': np.float64(0.5), 'weight': 15, 'string': 'score_factor <= 0.5'}], 'Age_Above_FourtyFive': [{'id': 4, 'name': 'Age_Above_FourtyFive', 'operator_sign_considered': <OperatorCondition.GT: 'GT'>, 'threshold': np.float64(0.5), 'weight': 10, 'string': 'Age_Above_FourtyFive > 0.5'}], 'Age_Below_TwentyFive': [{'id': 5, 'name': 'Age_Below_TwentyFive', 'operator_sign_considered': <OperatorCondition.LE: 'LE'>, 'threshold': np.float64(0.5), 'weight': 12, 'string': 'Age_Below_TwentyFive <= 0.5'}], 'African_American': [{'id': 6, 'name': 'African_American', 'operator_sign_considered': <OperatorCondition.LE: 'LE'>, 'threshold': np.float64(0.5), 'weight': 4, 'string': 'African_American <= 0.5'}], 'Asian': [{'id': 7, 'name': 'Asian', 'operator_sign_considered': <OperatorCondition.LE: 'LE'>, 'threshold': np.float64(0.5), 'weight': 7, 'string': 'Asian <= 0.5'}], 'Hispanic': [{'id': 8, 'name': 'Hispanic', 'operator_sign_considered': <OperatorCondition.LE: 'LE'>, 'threshold': np.float64(0.5), 'weight': 7, 'string': 'Hispanic <= 0.5'}], 'Other': [{'id': 10, 'name': 'Other', 'operator_sign_considered': <OperatorCondition.GT: 'GT'>, 'threshold': np.float64(0.5), 'weight': 5, 'string': 'Other > 0.5'}], 'Female': [{'id': 11, 'name': 'Female', 'operator_sign_considered': <OperatorCondition.LE: 'LE'>, 'threshold': np.float64(0.5), 'weight': 5, 'string': 'Female <= 0.5'}]})

Other types of explanations are presented in the Explanations Computation page.

See Also