{ "cells": [ { "cell_type": "markdown", "id": "d8c54475", "metadata": {}, "source": [ "# Coverage Reasons" ] }, { "cell_type": "markdown", "id": "fcef538a", "metadata": {}, "source": [ "A **coverage reason** (coverage-based prime implicant explanation, CPI-Xp) for an instance $x$ is an **abductive** explanation that is **maximally general** with respect to the domain theory $\\Sigma^f$: among all the abductive explanations of $x$, it covers as many instances satisfying $\\Sigma^f$ as possible. Unlike a sufficient reason, it is *not* required to be subset-minimal, so it may involve more conditions. A coverage reason that is in addition subset-minimal is a **minimal coverage reason** (mCPI-Xp).\n", "\n", "A detailed and illustrated presentation of coverage reasons is given on the [Random Forests / Coverage Reason](/documentation/classification/RFexplanations/coverage_reason/) page. Computing a coverage reason requires a domain theory, so the feature types must be provided when initializing the explainer (see the [Theories](/documentation/explainer/theories/) page)." ] }, { "cell_type": "markdown", "id": "8ec9f82b", "metadata": {}, "source": [ "We train a decision tree on the [australian](/assets/notebooks/dataset/australian_0.csv) dataset (its [australian_0.types](/assets/notebooks/dataset/australian_0.types) file activates the domain theory) and compute a coverage reason, then a minimal one, for a well-classified instance. The ```to_features``` method gives a compact, human-readable form." 
] }, { "cell_type": "code", "execution_count": 1, "id": "178b0d68", "metadata": { "execution": { "iopub.execute_input": "2026-06-09T10:11:06.623247Z", "iopub.status.busy": "2026-06-09T10:11:06.623138Z", "iopub.status.idle": "2026-06-09T10:11:08.152473Z", "shell.execute_reply": "2026-06-09T10:11:08.151873Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: classification\n", "Instances type: tabular\n", "Labels type: classes\n", "\n", "Dataset path: ../../dataset/australian_0.csv\n", "nFeatures (nAttributes, with the labels): 38\n", "nInstances (nObservations): 690\n", "nLabels: 2\n", "--------------- Model creation, fitting and evaluation ---------------\n", "Splitting method: hold-out\n", "Problem type: classification\n", "Models type: decision-tree\n", "model_parameters: {}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "Metrics:\n", " sklearn_confusion_matrix: [[79, 13], [15, 66]]\n", " precision: 83.54430379746836\n", " recall: 81.48148148148148\n", " f1_score: 82.5\n", " specificity: 85.86956521739131\n", " true_positive: 66\n", " true_negative: 79\n", " false_positive: 13\n", " false_negative: 15\n", " accuracy: 83.8150289017341\n", "Number of Training instances: 517\n", "Number of Testing instances: 173\n", "\n", "--------------- Explainer ----------------\n", "For the split number 0:\n", "**Decision Tree Model**\n", "nFeatures: 38\n", "nNodes: 66\n", "nVariables: 59\n", "\n", "--------------- Instances ----------------\n", "Number of instances selected: 1\n", "----------------------------------------------\n", "--------- Theory Feature Types -----------\n", "Before the one-hot encoding of categorical features:\n", "Numerical features: 6\n", "Categorical features: 4\n", "Binary features: 4\n", "Number of features: 14\n", "Characteristics of categorical features: {'A4_1': ['A4', 1, [1, 2, 3]], 'A4_2': ['A4', 2, [1, 2, 3]], 'A4_3': 
['A4', 3, [1, 2, 3]], 'A5_1': ['A5', 1, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_2': ['A5', 2, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_3': ['A5', 3, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_4': ['A5', 4, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_5': ['A5', 5, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_6': ['A5', 6, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_7': ['A5', 7, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_8': ['A5', 8, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_9': ['A5', 9, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_10': ['A5', 10, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_11': ['A5', 11, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_12': ['A5', 12, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_13': ['A5', 13, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_14': ['A5', 14, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A6_1': ['A6', 1, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_2': ['A6', 2, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_3': ['A6', 3, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_4': ['A6', 4, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_5': ['A6', 5, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_7': ['A6', 7, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_8': ['A6', 8, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_9': ['A6', 9, [1, 2, 3, 4, 5, 7, 8, 9]], 'A12_1': ['A12', 1, [1, 2, 3]], 'A12_2': ['A12', 2, [1, 2, 3]], 'A12_3': ['A12', 3, [1, 2, 3]]}\n", "\n", "Number of used features in the model (before the encoding of categorical features): 13\n", "Number of used features in the model (after the encoding of categorical features): 24\n", "----------------------------------------------\n", "prediction: 1\n", "\n", "coverage reason: ['A2 > 46.5', 'A6 != 9', 'A7 <= 115.0', 'A8 = 1', 'A10 > 4.5', 'A14 <= 83.0']\n", "minimal coverage reason: ['A2 > 46.5', 'A6 != 9', 'A7 <= 115.0', 'A8 = 1', 'A10 > 4.5', 'A14 <= 83.0']\n" ] } ], "source": [ "from pyxai import Learning, Explaining\n", "\n", "learner = 
Learning.Scikitlearn(\"../../dataset/australian_0.csv\", problem_type=Learning.CLASSIFICATION)\n", "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.DT)\n", "instance, prediction = learner.get_instances(model, n=1, seed=11200, is_correct=True)\n", "\n", "explainer = Explaining.initialize(model, instance=instance, features_type=\"../../dataset/australian_0.types\")\n", "print(\"prediction:\", prediction)\n", "\n", "coverage = explainer.coverage_reason()\n", "print(\"\\ncoverage reason:\", explainer.to_features(coverage))\n", "\n", "minimal = explainer.minimal_coverage_reason()\n", "print(\"minimal coverage reason:\", explainer.to_features(minimal))" ] }, { "cell_type": "markdown", "id": "ba4cfca7", "metadata": {}, "source": [ "As with random forests, the domain theory lets the explainer report a single condition per categorical feature (here ```A6 != 9```), and the widest thresholds compatible with the prediction are kept. The function ```ExplainerDT.minimal_coverage_reason``` returns a coverage reason that is also subset-minimal. In this example, both calls print the same six conditions, which suggests that the coverage reason found is already subset-minimal." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 5 }