
Contrastive Reasons

Currently, contrastive reasons for boosted trees are only available for binary classification.

Unlike abductive explanations, which explain why an instance $x$ is classified as belonging to a given class, contrastive explanations explain why $x$ has not been classified by the ML model as expected.

Let $f$ be the Boolean function represented by a boosted tree $BT$, let $x$ be an instance, and let $1$ (resp. $0$) be the prediction of $BT$ on $x$, i.e., $f(x) = 1$ (resp. $f(x) = 0$). Denoting by $t_x$ the term corresponding to $x$, a contrastive reason for $x$ is a term $t$ such that:

  • $t \subseteq t_x$ and $t_x \setminus t$ is not an implicant of $f$ (resp. $\lnot f$);
  • for every $\ell \in t$, $t \setminus \{\ell\}$ does not satisfy the previous condition (i.e., $t$ is minimal w.r.t. set inclusion).

Formally, a contrastive reason for $x$ is a subset $t$ of the characteristics of $x$ that is minimal w.r.t. set inclusion among those such that at least one instance $x'$ coinciding with $x$ except on the characteristics from $t$ is not classified by the model as $x$ is. Stated otherwise, a contrastive reason represents an adjustment of the features that must be made in order to change the prediction for the instance.

A contrastive reason is minimal w.r.t. set inclusion, i.e., no proper subset of it is also a contrastive reason. A minimal contrastive reason for $x$ is a contrastive reason for $x$ that contains a minimal number of literals; in other words, it is of minimal size.
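To make the definition concrete, here is a small brute-force sketch. It is an illustration only, not PyXAI's algorithm (PyXAI relies on a MIP solver, as described below): it enumerates the subset-minimal contrastive reasons of a classifier over binary features by trying, for each candidate subset of features, every reassignment of those features.

from itertools import combinations, product

def is_contrastive(f, x, subset):
    # `subset` is contrastive if some instance that coincides with x
    # outside `subset` receives a different prediction than x
    for values in product([0, 1], repeat=len(subset)):
        candidate = list(x)
        for i, v in zip(subset, values):
            candidate[i] = v
        if f(tuple(candidate)) != f(x):
            return True
    return False

def contrastive_reasons(f, x):
    # enumerate subsets by increasing size and keep only those that do
    # not contain an already-found (hence smaller) contrastive reason
    reasons = []
    for size in range(1, len(x) + 1):
        for subset in combinations(range(len(x)), size):
            if any(set(r) <= set(subset) for r in reasons):
                continue  # not minimal w.r.t. set inclusion
            if is_contrastive(f, x, subset):
                reasons.append(subset)
    return reasons

# Example: a majority vote over three binary features. Flipping any
# single feature of (1, 1, 1) cannot change the majority, so the
# contrastive reasons are exactly the pairs of features.
majority = lambda x: int(sum(x) >= 2)
print(contrastive_reasons(majority, (1, 1, 1)))
# [(0, 1), (0, 2), (1, 2)]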

<ExplainerBT Object>.minimal_contrastive_reason(*, time_limit=None):
This method computes one minimal contrastive reason for the current instance using a MIP solver and returns it. Excluded features are supported. The reason is given in the form of binary variables; you must use the to_features method if you want a representation based on the features considered at the start.
time_limit Integer or None: The time limit for the method, in seconds. Set it to None to give the process an unlimited amount of time. The default value is None.
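As an illustration of the calling pattern, here is a hedged sketch that also excludes a feature from the computed reasons. It assumes a model and instance obtained from a Learning step as in the second example below; set_excluded_features is the method documented in PyXAI's pages on preferences (check its exact signature for your version).

# Hedged sketch: `model` and `instance` are assumed to come from a
# Learning step as in the example on the compas dataset below.
explainer = Explainer.initialize(model, instance)
# Assumption: set_excluded_features, as documented in the PyXAI
# preferences pages, prevents reasons from touching this feature.
explainer.set_excluded_features(["Female"])

reason = explainer.minimal_contrastive_reason(time_limit=60)
print("reason:", explainer.to_features(reason, contrastive=True))
print("is contrastive:", explainer.is_contrastive_reason(reason))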

Example from Building Trees

We now show how to compute them with PyXAI. We start by building the boosted tree:

from pyxai import Builder, Explainer

node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.3)
node1_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=-0.3, right=node1_1)
node1_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=0.4, right=node1_2)
node1_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=node1_3)
tree1 = Builder.DecisionTree(4, node1_4)

node2_1 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.4, right=0.3)
node2_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=node2_1)
node2_3 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node2_2, right=0.5)
tree2 = Builder.DecisionTree(4, node2_3)

node3_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=0.2, right=0.3)
node3_2_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2, left=-0.2, right=0.2)
node3_2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.1, right=node3_1)
node3_2_3 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-0.5, right=0.1)
node3_3_1 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=node3_2_1, right=node3_2_2)
node3_3_2 = Builder.DecisionNode(2, operator=Builder.GT, threshold=1, left=-0.4, right=node3_2_3)
node3_4 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=node3_3_1, right=node3_3_2)
tree3 = Builder.DecisionTree(4, node3_4)

BT = Builder.BoostedTrees([tree1, tree2, tree3], n_classes=2)

We compute a minimal contrastive reason for the instance $(4,3,2,1)$ and then use it to derive a contrastive instance:

explainer = Explainer.initialize(BT)
explainer.set_instance((4,3,2,1))

contrastive_reason = explainer.minimal_contrastive_reason()
print("target_prediction:", explainer.target_prediction)
print("minimal contrastive reason:", explainer.to_features(contrastive_reason))
assert explainer.is_contrastive_reason(contrastive_reason), "This is not a contrastive reason!"
target_prediction: 1
minimal contrastive reason: ('f4 == 1',)
# We can create a contrastive instance
explainer.set_instance((4,3,2,0))
print("target_prediction:", explainer.target_prediction)
# The prediction is 0 now.
target_prediction: 0
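To see why flipping the fourth feature is enough to change the prediction, we can trace the ensemble by hand: for binary classification, the boosted tree sums the leaf values reached in each tree and predicts class 1 exactly when that sum is positive. The following standalone sketch (independent of PyXAI, assuming the convention that the right child of a node is taken when its condition holds) hardcodes the three trees built above:

# Each tree of the ensemble written as nested conditionals returning
# the value of the leaf reached by the instance (f1, f2, f3, f4).
def tree1(f1, f2, f3, f4):
    if f4 == 1:                                   # node1_4
        if f2 > 1:                                # node1_3
            if f3 == 1:                           # node1_2
                return 0.3 if f1 > 2 else -0.2    # node1_1
            return -0.3
        return 0.4
    return -0.5

def tree2(f1, f2, f3, f4):
    if f2 > 1:                                    # node2_3
        return 0.5
    if f1 > 2:                                    # node2_2
        return 0.3 if f4 == 1 else -0.4           # node2_1
    return -0.2

def tree3(f1, f2, f3, f4):
    if f3 == 1:                                   # node3_4
        if f2 > 1:                                # node3_3_2
            return 0.1 if f4 == 1 else -0.5       # node3_2_3
        return -0.4
    if f2 > 1:                                    # node3_3_1
        if f4 == 1:                               # node3_2_2
            return 0.3 if f1 > 2 else 0.2         # node3_1
        return -0.1
    return 0.2 if f1 > 2 else -0.2                # node3_2_1

def predict(x):
    # class 1 if and only if the sum of the leaf values is positive
    return 1 if tree1(*x) + tree2(*x) + tree3(*x) > 0 else 0

print(predict((4, 3, 2, 1)))  # -0.3 + 0.5 + 0.3 =  0.5  -> class 1
print(predict((4, 3, 2, 0)))  # -0.5 + 0.5 - 0.1 = -0.1  -> class 0

With f4 = 1 the score is 0.5, while with f4 = 0 it drops to -0.1, matching the predictions shown above: this is exactly why ('f4 == 1',) alone is a contrastive reason.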

Example from Real Dataset

For this example, we take the compas.csv dataset, which is known to be biased. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a wrongly classified instance.

from pyxai import Learning, Explainer

learner = Learning.Xgboost("../../../dataset/compas.csv", learner_type=Learning.CLASSIFICATION)
model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.BT)
instance, prediction = learner.get_instances(model, n=1, correct=False)


data:
      Number_of_Priors  score_factor  Age_Above_FourtyFive  \
0                    0             0                     1   
1                    0             0                     0   
2                    4             0                     0   
3                    0             0                     0   
4                   14             1                     0   
...                ...           ...                   ...   
6167                 0             1                     0   
6168                 0             0                     0   
6169                 0             0                     1   
6170                 3             0                     0   
6171                 2             0                     0   

      Age_Below_TwentyFive  Origin_African_American  Origin_Asian  \
0                        0                        0             0   
1                        0                        1             0   
2                        1                        1             0   
3                        0                        0             0   
4                        0                        0             0   
...                    ...                      ...           ...   
6167                     1                        1             0   
6168                     1                        1             0   
6169                     0                        0             0   
6170                     0                        1             0   
6171                     1                        0             0   

      Origin_Hispanic  Origin_Native_American  Origin_Other  Female  \
0                   0                       0             1       0   
1                   0                       0             0       0   
2                   0                       0             0       0   
3                   0                       0             1       0   
4                   0                       0             0       0   
...               ...                     ...           ...     ...   
6167                0                       0             0       0   
6168                0                       0             0       0   
6169                0                       0             1       0   
6170                0                       0             0       1   
6171                1                       0             0       1   

      Misdemeanor  Two_yr_Recidivism  
0               0                  0  
1               0                  1  
2               0                  1  
3               1                  0  
4               0                  1  
...           ...                ...  
6167            0                  0  
6168            0                  0  
6169            0                  0  
6170            1                  0  
6171            0                  1  

[6172 rows x 12 columns]
--------------   Information   ---------------
Dataset name: ../../../dataset/compas.csv
nFeatures (nAttributes, with the labels): 12
nInstances (nObservations): 6172
nLabels: 2
---------------   Evaluation   ---------------
method: HoldOut
output: BT
learner_type: Classification
learner_options: {'seed': 0, 'max_depth': None, 'eval_metric': 'mlogloss'}
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics:
   accuracy: 66.73866090712744
   precision: 66.66666666666666
   recall: 53.855278766310796
   f1_score: 59.580052493438316
   specificity: 77.50247770069376
   true_positive: 454
   true_negative: 782
   false_positive: 227
   false_negative: 389
   sklearn_confusion_matrix: [[782, 227], [389, 454]]
nTraining instances: 4320
nTest instances: 1852

---------------   Explainer   ----------------
For the evaluation number 0:
**Boosted Tree model**
NClasses: 2
nTrees: 100
nVariables: 42

---------------   Instances   ----------------
number of instances selected: 1
----------------------------------------------

We compute a contrastive reason (with a time limit of 5 seconds). In order to compute contrastive reasons, it is better to activate the theory related to the types of the features (see this page).

types = {
    "numerical": ["Number_of_Priors"],
    "categorical": {
        "Origin_*": ["African_American", "Asian", "Hispanic", "Native_American", "Other"],
        "Age_*": ["Above_FourtyFive", "Below_TwentyFive"],
    },
    "binary": Learning.DEFAULT,
}

explainer = Explainer.initialize(model, instance, features_type=types)

contrastive_reason = explainer.minimal_contrastive_reason(time_limit=5)
print("instance: ", instance)
print("target_prediction:", explainer.target_prediction)
print("minimal contrastive reason:", explainer.to_features(contrastive_reason, contrastive=True))
print("is contrastive: ", explainer.is_contrastive_reason(contrastive_reason))
---------   Theory Feature Types   -----------
Before the encoding (without one hot encoded features), we have:
Numerical features: 1
Categorical features: 2
Binary features: 3
Number of features: 6
Values of categorical features: {'Origin_African_American': ['Origin_', 'African_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Origin_Asian': ['Origin_', 'Asian', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Origin_Hispanic': ['Origin_', 'Hispanic', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Origin_Native_American': ['Origin_', 'Native_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Origin_Other': ['Origin_', 'Other', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Age_Above_FourtyFive': ['Age_', 'Above_FourtyFive', ['Above_FourtyFive', 'Below_TwentyFive']], 'Age_Below_TwentyFive': ['Age_', 'Below_TwentyFive', ['Above_FourtyFive', 'Below_TwentyFive']]}

Number of used features in the model (before the encoding): 6
Number of used features in the model (after the encoding): 11
----------------------------------------------
instance:  [0 0 0 0 1 0 0 0 0 0 0]
target_prediction: 0
minimal contrastive reason: ('Number_of_Priors < 3.5', 'Origin_ = African_American')
is contrastive:  True

If one wants to change the classification, one needs to change both the origin and the number of priors (raising it to at least 4).
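We can check this claim directly by building the corresponding contrastive instance and letting the explainer recompute the prediction. In this hedged sketch, the indices follow the column order of the dataset shown above (Number_of_Priors at index 0, Origin_African_American at index 4); putting the new origin at index 8 (Origin_Other) is an assumption about the encoding.

import numpy

# Hedged sketch: indices assume the dataset's column order shown above;
# adjust them if your encoding differs.
contrastive_instance = numpy.array(instance)
contrastive_instance[0] = 4   # Number_of_Priors: cross the 3.5 threshold
contrastive_instance[4] = 0   # drop Origin = African_American ...
contrastive_instance[8] = 1   # ... and pick another origin, e.g. Other

explainer.set_instance(contrastive_instance)
print("new prediction:", explainer.target_prediction)  # expected: 1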