PyXAI

Coverage Reasons

A coverage reason (coverage-based prime implicant explanation, CPI-Xp) for an instance $x$ is an abductive explanation that is maximally general with respect to the domain theory $\Sigma^f$: among all the abductive explanations of $x$, it is satisfied by (covers) as many instances satisfying $\Sigma^f$ as possible. Unlike a sufficient reason, a coverage reason is not required to be subset-minimal, so it may involve more conditions. Such an extra condition is always implied by the other conditions together with $\Sigma^f$: dropping it yields an abductive explanation whose coverage cannot be larger, since the coverage is already maximal. A coverage reason that is in addition subset-minimal is a minimal coverage reason (mCPI-Xp).
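To make the definition concrete, the following brute-force sketch enumerates the abductive explanations of an instance on a tiny, hypothetical problem and keeps those of maximal coverage. The toy setting is an assumption made for illustration: one Boolean feature `a`, and a categorical feature "color" one-hot encoded as the columns `red` and `blue`, with the exactly-one constraint as domain theory. It illustrates the definition only; it is not PyXAI's algorithm, and its cost is exponential in the number of features.

```python
from itertools import combinations, product

# Hypothetical toy problem: one Boolean feature "a" and one categorical
# feature "color" one-hot encoded as the columns "red" and "blue".
FEATURES = ("a", "red", "blue")


def theory(z):
    # Hypothetical domain theory Sigma^f: exactly one column of "color" is set.
    return z["red"] != z["blue"]


def classifier(z):
    # Hypothetical classifier f: predicts 1 when the color is red or when a holds.
    return int(z["red"] or z["a"])


def consistent_instances():
    # Enumerate every assignment of the features that satisfies the theory.
    for values in product((False, True), repeat=len(FEATURES)):
        z = dict(zip(FEATURES, values))
        if theory(z):
            yield z


def matches(z, term):
    # A term is a set of (feature, value) literals taken from the instance.
    return all(z[feature] == value for feature, value in term)


def is_abductive(term, target):
    # Every theory-consistent instance matching the term gets the same prediction.
    return all(classifier(z) == target
               for z in consistent_instances() if matches(z, term))


def coverage(term):
    # Number of theory-consistent instances covered by the term.
    return sum(1 for z in consistent_instances() if matches(z, term))


def coverage_reasons(x):
    target = classifier(x)
    literals = [(feature, x[feature]) for feature in FEATURES]
    # All subsets of the instance's literals that are abductive explanations.
    abductive = [frozenset(term)
                 for k in range(len(literals) + 1)
                 for term in combinations(literals, k)
                 if is_abductive(term, target)]
    best = max(coverage(term) for term in abductive)
    reasons = [term for term in abductive if coverage(term) == best]
    # A minimal coverage reason has no abductive proper subset.
    minimal = [term for term in reasons
               if not any(other < term for other in abductive)]
    return reasons, minimal


x = {"a": False, "red": True, "blue": False}
reasons, minimal = coverage_reasons(x)
print(sorted(sorted(term) for term in reasons))
print(sorted(sorted(term) for term in minimal))
```

For this instance (predicted 1), the sketch finds three coverage reasons, {red = 1}, {blue = 0} and {red = 1, blue = 0}, each covering the same two consistent instances. Only the first two are minimal coverage reasons: in the third one, blue = 0 is implied by red = 1 under the theory. A condition such as blue = 0 plays the same role as 'A4 != 1' in the australian example.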

A detailed and illustrated presentation of coverage reasons is given on the Random Forests / Coverage Reason page. Computing a coverage reason requires a domain theory, so the feature types must be provided (through the features_type parameter) when initializing the explainer (see the Theories page).
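Coverage is counted only over instances that satisfy $\Sigma^f$, which is why the theory is needed. As a minimal sketch, assuming the domain theory contains an exactly-one constraint for each one-hot encoded categorical feature, the snippet below groups the one-hot columns using a dictionary in the format of the "Characteristics of categorical features" entry printed by the explainer in the example of this page (an excerpt restricted to A4 and A12), then counts the one-hot assignments that survive. The helper names are hypothetical, and the constraints a real theory may place on numerical thresholds are left out.

```python
from collections import defaultdict
from itertools import product

# Excerpt in the format printed by the explainer:
# one-hot column -> [original feature, value, domain of the feature].
characteristics = {
    "A4_1": ["A4", 1, [1, 2, 3]],
    "A4_2": ["A4", 2, [1, 2, 3]],
    "A4_3": ["A4", 3, [1, 2, 3]],
    "A12_1": ["A12", 1, [1, 2, 3]],
    "A12_2": ["A12", 2, [1, 2, 3]],
    "A12_3": ["A12", 3, [1, 2, 3]],
}


def one_hot_groups(chars):
    # Hypothetical helper: group the one-hot columns by the feature they encode.
    groups = defaultdict(list)
    for column, (feature, _value, _domain) in chars.items():
        groups[feature].append(column)
    return dict(groups)


def satisfies_theory(row, groups):
    # Exactly-one constraint: each categorical feature takes a single value.
    return all(sum(row[column] for column in columns) == 1
               for columns in groups.values())


groups = one_hot_groups(characteristics)
columns = list(characteristics)
rows = [dict(zip(columns, bits)) for bits in product((0, 1), repeat=len(columns))]
consistent = [row for row in rows if satisfies_theory(row, groups)]
print(len(rows), len(consistent))  # 64 9
```

Without the theory, the six columns admit 64 assignments; with it, only the 3 × 3 = 9 assignments that encode an actual pair of values for A4 and A12 remain, and only those count toward the coverage of an explanation.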

We train a boosted tree on the australian dataset (its australian_0.types file gives the feature types and thus activates the domain theory) and compute a coverage reason, then a minimal one, for a correctly classified instance. The to_features method turns each explanation into a compact, human-readable list of conditions on the original features.

```python
from pyxai import Learning, Explaining

# Train a boosted tree (XGBoost) on the australian dataset with a hold-out split.
learner = Learning.Xgboost("../../dataset/australian_0.csv", problem_type=Learning.CLASSIFICATION)
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.BT)

# Select one correctly classified instance.
instance, prediction = learner.get_instances(model, n=1, seed=11200, is_correct=True)

# The .types file gives the feature types, which activates the domain theory.
explainer = Explaining.initialize(model, instance=instance, features_type="../../dataset/australian_0.types")
print("prediction:", prediction)

# A coverage reason, then a coverage reason that is also subset-minimal.
coverage = explainer.coverage_reason()
print("\ncoverage reason:", explainer.to_features(coverage))

minimal = explainer.minimal_coverage_reason()
print("minimal coverage reason:", explainer.to_features(minimal))
```
```
--------------   Information   ---------------
Problem type: classification
Instances type: tabular
Labels type: classes

Dataset path: ../../dataset/australian_0.csv
nFeatures (nAttributes, with the labels): 38
nInstances (nObservations): 690
nLabels: 2
---------------   Model creation, fitting and evaluation  ---------------
Splitting method: hold-out
Problem type: classification
Models type: boosted-tree
model_parameters: {}


---------   Evaluation Information   ---------
For the evaluation number 0:
Metrics:
   sklearn_confusion_matrix: [[90, 5], [9, 69]]
   precision: 93.24324324324324
   recall: 88.46153846153845
   f1_score: 90.78947368421053
   specificity: 94.73684210526315
   true_positive: 69
   true_negative: 90
   false_positive: 5
   false_negative: 9
   accuracy: 91.90751445086705
Number of Training instances: 517
Number of Testing instances: 173

---------------   Explainer   ----------------
For the split number 0:
**Boosted Tree model**
NClasses: 2
nTrees: 100
nVariables: 293

---------------   Instances   ----------------
Number of instances selected: 1
----------------------------------------------
---------   Theory Feature Types   -----------
Before the one-hot encoding of categorical features:
Numerical features: 6
Categorical features: 4
Binary features: 4
Number of features: 14
Characteristics of categorical features: {'A4_1': ['A4', 1, [1, 2, 3]], 'A4_2': ['A4', 2, [1, 2, 3]], 'A4_3': ['A4', 3, [1, 2, 3]], 'A5_1': ['A5', 1, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_2': ['A5', 2, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_3': ['A5', 3, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_4': ['A5', 4, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_5': ['A5', 5, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_6': ['A5', 6, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_7': ['A5', 7, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_8': ['A5', 8, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_9': ['A5', 9, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_10': ['A5', 10, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_11': ['A5', 11, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_12': ['A5', 12, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_13': ['A5', 13, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_14': ['A5', 14, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A6_1': ['A6', 1, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_2': ['A6', 2, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_3': ['A6', 3, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_4': ['A6', 4, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_5': ['A6', 5, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_7': ['A6', 7, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_8': ['A6', 8, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_9': ['A6', 9, [1, 2, 3, 4, 5, 7, 8, 9]], 'A12_1': ['A12', 1, [1, 2, 3]], 'A12_2': ['A12', 2, [1, 2, 3]], 'A12_3': ['A12', 3, [1, 2, 3]]}

Number of used features in the model (before the encoding of categorical features): 14
Number of used features in the model (after the encoding of categorical features): 27
----------------------------------------------
prediction: 1

coverage reason: ['A1 = 1', 'A2 < 312.0', 'A3 in [47.0, 56.0[', 'A4 != 1', 'A5 = 9', 'A6 = 4', 'A7 >= 44.0', 'A8 = 1', 'A9 = 1', 'A10 in [5.0, 8.0[', 'A12 = 2', 'A13 in [26.0, 36.0[', 'A14 < 17.0']

minimal coverage reason: ['A1 = 1', 'A2 < 312.0', 'A3 in [47.0, 56.0[', 'A4 != 1', 'A5 = 9', 'A6 = 4', 'A7 >= 44.0', 'A8 = 1', 'A9 = 1', 'A10 in [5.0, 8.0[', 'A12 = 2', 'A13 in [26.0, 36.0[', 'A14 < 17.0']
```
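An explanation of a tree ensemble is, we assume, a conjunction of threshold tests appearing in the trees, so it may contain several tests on the same numerical feature, while to_features displays a single merged condition such as 'A3 in [47.0, 56.0['. The sketch below reproduces that kind of merging with a hypothetical condense helper (not PyXAI's implementation): the conjunction of bounds is their intersection, so the largest lower bound and the smallest upper bound are kept. The thresholds 30.0 and 400.0 in the example are made up.

```python
from collections import defaultdict


def condense(literals):
    """Merge threshold literals on the same numerical feature.

    Hypothetical helper: each literal is (feature, op, threshold) with op in
    {">=", "<"}, and all literals are assumed to hold for the instance.
    """
    lower = defaultdict(list)  # thresholds t of literals "feature >= t"
    upper = defaultdict(list)  # thresholds t of literals "feature < t"
    order = []                 # features in order of first appearance
    for feature, op, threshold in literals:
        if feature not in order:
            order.append(feature)
        if op == ">=":
            lower[feature].append(threshold)
        elif op == "<":
            upper[feature].append(threshold)
        else:
            raise ValueError(f"unsupported operator: {op}")
    conditions = []
    for feature in order:
        # Intersection of the bounds: tightest lower and upper bounds.
        lo = max(lower[feature]) if lower[feature] else None
        hi = min(upper[feature]) if upper[feature] else None
        if lo is not None and hi is not None:
            conditions.append(f"{feature} in [{lo}, {hi}[")
        elif lo is not None:
            conditions.append(f"{feature} >= {lo}")
        else:
            conditions.append(f"{feature} < {hi}")
    return conditions


literals = [("A3", ">=", 47.0), ("A3", ">=", 30.0), ("A3", "<", 56.0),
            ("A2", "<", 312.0), ("A2", "<", 400.0), ("A7", ">=", 44.0)]
print(condense(literals))
```

Here the redundant tests A3 >= 30.0 and A2 < 400.0 disappear in the merged form, which yields the same three conditions on A3, A2 and A7 as in the printed coverage reason.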

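As with random forests, thanks to the domain theory a single condition is reported per categorical feature (an equality such as 'A5 = 9' or an exclusion such as 'A4 != 1'), and for numerical features the widest thresholds compatible with the prediction are kept. The function ExplainerBT.minimal_coverage_reason returns a coverage reason that is in addition subset-minimal. For this instance, both calls give the same list of conditions once rendered by to_features.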
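The single condition per categorical feature can be read through the exactly-one constraint: a one-hot column set to 1 fixes the value, whereas columns set to 0 only exclude values, unless every value but one is excluded. The hypothetical collapse_one_hot helper below sketches this simplification. It is not PyXAI's to_features, and the "not in" format used when several values are excluded is an assumption.

```python
def collapse_one_hot(feature, literals, domain):
    """Summarize the one-hot literals of a categorical feature as one condition.

    Hypothetical helper: `literals` maps a value of the feature to the 0/1
    value fixed on its one-hot column; `domain` lists all possible values.
    """
    if not literals:
        raise ValueError("no literal on this feature")
    chosen = [value for value, bit in literals.items() if bit == 1]
    if len(chosen) > 1:
        raise ValueError("literals violate the exactly-one constraint")
    if chosen:
        # One column set to 1 fixes the value of the feature.
        return f"{feature} = {chosen[0]}"
    excluded = sorted(value for value, bit in literals.items() if bit == 0)
    remaining = [value for value in domain if value not in excluded]
    if len(remaining) == 1:
        # All other values are excluded, so the theory forces the last one.
        return f"{feature} = {remaining[0]}"
    if len(excluded) == 1:
        return f"{feature} != {excluded[0]}"
    return f"{feature} not in {excluded}"


print(collapse_one_hot("A5", {9: 1}, list(range(1, 15))))
print(collapse_one_hot("A4", {1: 0}, [1, 2, 3]))
print(collapse_one_hot("A12", {1: 0, 3: 0}, [1, 2, 3]))
```

Under this reading, 'A4 != 1' would come from the single literal A4_1 = 0 and 'A5 = 9' from A5_9 = 1, while an equality such as 'A12 = 2' could come either from A12_2 = 1 or from excluding the two other values; which literals PyXAI actually keeps is not shown on this page.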