Link Search Menu Expand Document
PyXAI
Papers Video GitHub In-the-Loop EXPEKCTATION Release Notes About
download notebook

Contrastive Reasons

Currently, contrastives reasons for Boosted trees are only available for binary classification.

Unlike abductive explanations that explain why an instance $x$ is classified as belonging to a given class, the contrastive explanations explains why $x$ has not been classified by the ML model as expected.

Let 𝑓 be a Boolean function represented by a random forest 𝑅𝐹, 𝑥 be an instance and 1 (resp. 0) the prediction of 𝑅𝐹 on 𝑥 (𝑓(𝑥)=1 (resp $f(x)=0$)), a contrastive reason for $x$ is a term $t$ such that:

  • $t \subseteq t_{x}$, $t_{x} \setminus t$ is not an implicant of $f;$
  • for every $\ell \in t$, $t \setminus {\ell}$ does not satisfy this previous condition (i.e., $t$ is minimal w.r.t. set inclusion).

Formally, a contrastive reason for $x$ is a subset $t$ of the characteristics of $x$ that is minimal w.r.t. set inclusion among those such that at least one instance $x’$ that coincides with $x$ except on the characteristics from $t$ is not classified by the decision tree as $x$ is. Stated otherwise, a contrastive reason represents adjustments of the features that we have to do to change the prediction for an instance.

A contrastive reason is minimal w.r.t. set inclusion, i.e. there is no subset of this reason which is also a contrastive reason. A minimal contrastive reason for $x$ is a contrastive reason for $x$ that contains a minimal number of literals. In other words, a minimal contrastive reason has a minimal size.

The function contrastive_reason allows computing this kind of explanation.

The library also provides a way to check that a reason is contrastive using the function is_contrastive_reason.

Example from Building Trees

We show now how to get them with PyXAI. We start by building the Boosted Tree:

from pyxai import Builder, Explaining

node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.3)
node1_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=-0.3, right=node1_1)
node1_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=0.4, right=node1_2)
node1_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=node1_3)
tree1 = Builder.DecisionTree(4, node1_4)

node2_1 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.4, right=0.3)
node2_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=node2_1)
node2_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node2_2, right=0.5)
tree2 = Builder.DecisionTree(4, node2_3)

node3_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=0.2, right=0.3)
node3_2_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.2)
node3_2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.1, right=node3_1)
node3_2_3 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=0.1)
node3_3_1 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node3_2_1, right=node3_2_2)
node3_3_2 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=-0.4, right=node3_2_3)
node3_4 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=node3_3_1, right=node3_3_2)
tree3 = Builder.DecisionTree(4, node3_4)

BT = Builder.BoostedTrees([tree1, tree2, tree3], n_classes=2)

We compute a contrastive reason for these two instances:

explainer = Explaining.initialize(BT)
explainer.set_instance((4,3,2,1))

contrastive_reason = explainer.minimal_contrastive_reason()
print("target_prediction:", explainer.target_prediction)
print("minimal contrastive reason:", explainer.to_features(contrastive_reason))
assert explainer.is_contrastive_reason(contrastive_reason), "It is not a contrastive reason !"
target_prediction: 1
minimal contrastive reason: ['f4 == 1']
# We can create a contrastive instance
explainer.set_instance((4,3,2,0))
print("target_prediction:", explainer.target_prediction)
# The prediction is 0 now.
target_prediction: 0

Example from Real Dataset

For this example, we take the compas.csv known to be biased. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a wrong classified instance.

from pyxai import Learning, Explaining

learner = Learning.Xgboost("../../../dataset/compas.csv", problem_type='classification')
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.BT)
instance, prediction = learner.get_instances(model, n=1, is_correct=False)


--------------   Information   ---------------
Problem type: classification
Instances type: tabular
Labels type: classes

Dataset path: ../../../dataset/compas.csv
nFeatures (nAttributes, with the labels): 11
nInstances (nObservations): 6172
nLabels: 2
---------------   Model creation, fitting and evaluation  ---------------
Splitting method: hold-out
Problem type: classification
Models type: boosted-tree
model_parameters: {}
---------   Evaluation Information   ---------
For the evaluation number 0:
Metrics:
   sklearn_confusion_matrix: [[624, 209], [294, 416]]
   precision: 66.56
   recall: 58.59154929577465
   f1_score: 62.32209737827716
   specificity: 74.90996398559425
   true_positive: 416
   true_negative: 624
   false_positive: 209
   false_negative: 294
   accuracy: 67.40116655865198
Number of Training instances: 4629
Number of Testing instances: 1543

---------------   Explainer   ----------------
For the split number 0:
**Boosted Tree model**
NClasses: 2
nTrees: 100
nVariables: 38

---------------   Instances   ----------------
Number of instances selected: 1
----------------------------------------------

We compute a contrastive instance (with a time limit equal to 5 seconds). In order to compute contrastive reasons, it is better to activate the theory related to the type of features (see this page).

compas_types = {
    "numerical": ["Number_of_Priors"],
    "binary": ["Misdemeanor", "score_factor", "Female"],
    "categorical": {"{African_American,Asian,Hispanic,Native_American,Other}": ["African_American", "Asian", "Hispanic", "Native_American", "Other"],
                    "Age*": ["Above_FourtyFive", "Below_TwentyFive"]}
}



explainer = Explaining.initialize(model, instance, features_type=compas_types)

contrastive_reason = explainer.minimal_contrastive_reason(time_limit=5)
print("instance: ", instance)
print("target_prediction:", explainer.target_prediction)
print("minimal contrastive reason:", explainer.to_features(contrastive_reason, contrastive=True))
print("is contrastive: ", explainer.is_contrastive_reason(contrastive_reason))
feature_names: ['Misdemeanor', 'Number_of_Priors', 'score_factor', 'Age_Above_FourtyFive', 'Age_Below_TwentyFive', 'African_American', 'Asian', 'Hispanic', 'Native_American', 'Other', 'Female']
---------   Theory Feature Types   -----------
Before the one-hot encoding of categorical features:
Numerical features: 1
Categorical features: 2
Binary features: 3
Number of features: 6
Characteristics of categorical features: {'African_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'African_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Asian': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Asian', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Hispanic': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Hispanic', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Native_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Native_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Other': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Other', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Age_Above_FourtyFive': ['Age', 'Above_FourtyFive', ['Above_FourtyFive', 'Below_TwentyFive']], 'Age_Below_TwentyFive': ['Age', 'Below_TwentyFive', ['Above_FourtyFive', 'Below_TwentyFive']]}

Number of used features in the model (before the encoding of categorical features): 6
Number of used features in the model (after the encoding of categorical features): 11
----------------------------------------------
instance:  Misdemeanor             0
Number_of_Priors        0
score_factor            0
Age_Above_FourtyFive    0
Age_Below_TwentyFive    0
African_American        1
Asian                   0
Hispanic                0
Native_American         0
Other                   0
Female                  0
Name: 1, dtype: int64
target_prediction: 0
minimal contrastive reason: ['Number_of_Priors < 28.0', '{African_American,Asian,Hispanic,Native_American,Other} = African_American']
is contrastive:  True

If one wants to change the classification, one needs to change the origin and the number of priors (to at least 4).

Other types of explanations are presented in the Explanations Computation page.