{ "cells": [ { "cell_type": "markdown", "id": "9e8488b0", "metadata": {}, "source": [ "# Coverage Reasons" ] }, { "cell_type": "markdown", "id": "ecee6f20", "metadata": {}, "source": [ "A **coverage reason** (coverage-based prime implicant explanation, CPI-Xp) for an instance $x$ is an **abductive** explanation that is **maximally general** with respect to the domain theory $\\Sigma^f$: among all the abductive explanations of $x$, it covers as many instances satisfying $\\Sigma^f$ as possible. Unlike a sufficient reason, it is *not* required to be subset-minimal, so it may involve more conditions. A coverage reason that is also subset-minimal is a **minimal coverage reason** (mCPI-Xp).\n", "\n", "A detailed and illustrated presentation of coverage reasons is given on the [Random Forests / Coverage Reason](/documentation/classification/RFexplanations/coverage_reason/) page. Computing a coverage reason requires a domain theory, so the feature types must be provided when initializing the explainer (see the [Theories](/documentation/explainer/theories/) page)." ] }, { "cell_type": "markdown", "id": "a6d2175c", "metadata": {}, "source": [ "We train a boosted tree on the [australian](/assets/notebooks/dataset/australian_0.csv) dataset (its [australian_0.types](/assets/notebooks/dataset/australian_0.types) file supplies the feature types that activate the domain theory) and compute a coverage reason, then a minimal one, for a correctly classified instance. The ```to_features``` method gives a compact, human-readable form." 
] }, { "cell_type": "code", "execution_count": 1, "id": "c87cab2d", "metadata": { "execution": { "iopub.execute_input": "2026-06-09T10:11:12.821575Z", "iopub.status.busy": "2026-06-09T10:11:12.821463Z", "iopub.status.idle": "2026-06-09T10:12:18.614009Z", "shell.execute_reply": "2026-06-09T10:12:18.613570Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: classification\n", "Instances type: tabular\n", "Labels type: classes\n", "\n", "Dataset path: ../../dataset/australian_0.csv\n", "nFeatures (nAttributes, with the labels): 38\n", "nInstances (nObservations): 690\n", "nLabels: 2\n", "--------------- Model creation, fitting and evaluation ---------------\n", "Splitting method: hold-out\n", "Problem type: classification\n", "Models type: boosted-tree\n", "model_parameters: {}\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "Metrics:\n", " sklearn_confusion_matrix: [[90, 5], [9, 69]]\n", " precision: 93.24324324324324\n", " recall: 88.46153846153845\n", " f1_score: 90.78947368421053\n", " specificity: 94.73684210526315\n", " true_positive: 69\n", " true_negative: 90\n", " false_positive: 5\n", " false_negative: 9\n", " accuracy: 91.90751445086705\n", "Number of Training instances: 517\n", "Number of Testing instances: 173\n", "\n", "--------------- Explainer ----------------\n", "For the split number 0:\n", "**Boosted Tree model**\n", "NClasses: 2\n", "nTrees: 100\n", "nVariables: 293\n", "\n", "--------------- Instances ----------------\n", "Number of instances selected: 1\n", "----------------------------------------------\n", "--------- Theory Feature Types -----------\n", "Before the one-hot encoding of categorical features:\n", "Numerical features: 6\n", "Categorical features: 4\n", "Binary features: 4\n", "Number of features: 14\n", "Characteristics of categorical features: 
{'A4_1': ['A4', 1, [1, 2, 3]], 'A4_2': ['A4', 2, [1, 2, 3]], 'A4_3': ['A4', 3, [1, 2, 3]], 'A5_1': ['A5', 1, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_2': ['A5', 2, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_3': ['A5', 3, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_4': ['A5', 4, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_5': ['A5', 5, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_6': ['A5', 6, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_7': ['A5', 7, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_8': ['A5', 8, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_9': ['A5', 9, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_10': ['A5', 10, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_11': ['A5', 11, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_12': ['A5', 12, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_13': ['A5', 13, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_14': ['A5', 14, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A6_1': ['A6', 1, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_2': ['A6', 2, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_3': ['A6', 3, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_4': ['A6', 4, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_5': ['A6', 5, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_7': ['A6', 7, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_8': ['A6', 8, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_9': ['A6', 9, [1, 2, 3, 4, 5, 7, 8, 9]], 'A12_1': ['A12', 1, [1, 2, 3]], 'A12_2': ['A12', 2, [1, 2, 3]], 'A12_3': ['A12', 3, [1, 2, 3]]}\n", "\n", "Number of used features in the model (before the encoding of categorical features): 14\n", "Number of used features in the model (after the encoding of categorical features): 27\n", "----------------------------------------------\n", "prediction: 1\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "coverage reason: ['A1 = 1', 'A2 < 312.0', 'A3 in [47.0, 56.0[', 'A4 != 1', 'A5 = 9', 'A6 = 4', 'A7 >= 44.0', 'A8 = 1', 'A9 = 1', 'A10 in [5.0, 8.0[', 'A12 = 2', 
'A13 in [26.0, 36.0[', 'A14 < 17.0']\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "minimal coverage reason: ['A1 = 1', 'A2 < 312.0', 'A3 in [47.0, 56.0[', 'A4 != 1', 'A5 = 9', 'A6 = 4', 'A7 >= 44.0', 'A8 = 1', 'A9 = 1', 'A10 in [5.0, 8.0[', 'A12 = 2', 'A13 in [26.0, 36.0[', 'A14 < 17.0']\n" ] } ], "source": [ "from pyxai import Learning, Explaining\n", "\n", "# Train a boosted tree on the dataset with a hold-out split.\n", "learner = Learning.Xgboost(\"../../dataset/australian_0.csv\", problem_type=Learning.CLASSIFICATION)\n", "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.BT)\n", "# Select one correctly classified instance.\n", "instance, prediction = learner.get_instances(model, n=1, seed=11200, is_correct=True)\n", "\n", "# The .types file provides the feature types, which activates the domain theory.\n", "explainer = Explaining.initialize(model, instance=instance, features_type=\"../../dataset/australian_0.types\")\n", "print(\"prediction:\", prediction)\n", "\n", "# Coverage reason (CPI-Xp), not necessarily subset-minimal.\n", "coverage = explainer.coverage_reason()\n", "print(\"\\ncoverage reason:\", explainer.to_features(coverage))\n", "\n", "# Minimal coverage reason (mCPI-Xp): a coverage reason that is also subset-minimal.\n", "minimal = explainer.minimal_coverage_reason()\n", "print(\"minimal coverage reason:\", explainer.to_features(minimal))" ] }, { "cell_type": "markdown", "id": "c7e3feb6", "metadata": {}, "source": [ "As with random forests, a single condition per categorical feature is reported thanks to the domain theory (for example ```A5 = 9``` or ```A4 != 1```), and the widest thresholds compatible with the prediction are kept. The method ```ExplainerBT.minimal_coverage_reason``` returns a coverage reason that is also subset-minimal. In this run both calls return the same 13 conditions, so the coverage reason found is already subset-minimal." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 5 }