{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9e8488b0",
   "metadata": {},
   "source": [
    "# Coverage Reasons"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ecee6f20",
   "metadata": {},
   "source": [
    "A **coverage reason** (coverage-based prime implicant explanation, CPI-Xp) for an instance $x$ is an **abductive** explanation that is **maximally general** with respect to the domain theory $\\Sigma^f$: among all the abductive explanations of $x$, it covers as many instances satisfying $\\Sigma^f$ as possible. Unlike a sufficient reason, it is *not* required to be subset-minimal, so it may involve more conditions. A coverage reason that is in addition subset-minimal is a **minimal coverage reason** (mCPI-Xp).\n",
    "\n",
    "A detailed and illustrated presentation of coverage reasons is given on the [Random Forests / Coverage Reason](/documentation/classification/RFexplanations/coverage_reason/) page. Computing a coverage reason requires a domain theory, so the feature types must be provided when initializing the explainer (see the [Theories](/documentation/explainer/theories/) page)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6d2175c",
   "metadata": {},
   "source": [
    "We train a boosted tree on the [australian](/assets/notebooks/dataset/australian_0.csv) dataset (its [australian_0.types](/assets/notebooks/dataset/australian_0.types) file activates the domain theory) and compute a coverage reason, then a minimal one, for a well-classified instance. The ```to_features``` method gives a compact, human-readable form."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c87cab2d",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-09T10:11:12.821575Z",
     "iopub.status.busy": "2026-06-09T10:11:12.821463Z",
     "iopub.status.idle": "2026-06-09T10:12:18.614009Z",
     "shell.execute_reply": "2026-06-09T10:12:18.613570Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------   Information   ---------------\n",
      "Problem type: classification\n",
      "Instances type: tabular\n",
      "Labels type: classes\n",
      "\n",
      "Dataset path: ../../dataset/australian_0.csv\n",
      "nFeatures (nAttributes, with the labels): 38\n",
      "nInstances (nObservations): 690\n",
      "nLabels: 2\n",
      "---------------   Model creation, fitting and evaluation  ---------------\n",
      "Splitting method: hold-out\n",
      "Problem type: classification\n",
      "Models type: boosted-tree\n",
      "model_parameters: {}\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "---------   Evaluation Information   ---------\n",
      "For the evaluation number 0:\n",
      "Metrics:\n",
      "   sklearn_confusion_matrix: [[90, 5], [9, 69]]\n",
      "   precision: 93.24324324324324\n",
      "   recall: 88.46153846153845\n",
      "   f1_score: 90.78947368421053\n",
      "   specificity: 94.73684210526315\n",
      "   true_positive: 69\n",
      "   true_negative: 90\n",
      "   false_positive: 5\n",
      "   false_negative: 9\n",
      "   accuracy: 91.90751445086705\n",
      "Number of Training instances: 517\n",
      "Number of Testing instances: 173\n",
      "\n",
      "---------------   Explainer   ----------------\n",
      "For the split number 0:\n",
      "**Boosted Tree model**\n",
      "NClasses: 2\n",
      "nTrees: 100\n",
      "nVariables: 293\n",
      "\n",
      "---------------   Instances   ----------------\n",
      "Number of instances selected: 1\n",
      "----------------------------------------------\n",
      "---------   Theory Feature Types   -----------\n",
      "Before the one-hot encoding of categorical features:\n",
      "Numerical features: 6\n",
      "Categorical features: 4\n",
      "Binary features: 4\n",
      "Number of features: 14\n",
      "Characteristics of categorical features: {'A4_1': ['A4', 1, [1, 2, 3]], 'A4_2': ['A4', 2, [1, 2, 3]], 'A4_3': ['A4', 3, [1, 2, 3]], 'A5_1': ['A5', 1, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_2': ['A5', 2, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_3': ['A5', 3, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_4': ['A5', 4, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_5': ['A5', 5, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_6': ['A5', 6, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_7': ['A5', 7, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_8': ['A5', 8, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_9': ['A5', 9, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_10': ['A5', 10, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_11': ['A5', 11, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_12': ['A5', 12, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_13': ['A5', 13, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A5_14': ['A5', 14, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]], 'A6_1': ['A6', 1, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_2': ['A6', 2, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_3': ['A6', 3, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_4': ['A6', 4, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_5': ['A6', 5, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_7': ['A6', 7, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_8': ['A6', 8, [1, 2, 3, 4, 5, 7, 8, 9]], 'A6_9': ['A6', 9, [1, 2, 3, 4, 5, 7, 8, 9]], 'A12_1': ['A12', 1, [1, 2, 3]], 'A12_2': ['A12', 2, [1, 2, 3]], 'A12_3': ['A12', 3, [1, 2, 3]]}\n",
      "\n",
      "Number of used features in the model (before the encoding of categorical features): 14\n",
      "Number of used features in the model (after the encoding of categorical features): 27\n",
      "----------------------------------------------\n",
      "prediction: 1\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "coverage reason: ['A1 = 1', 'A2 < 312.0', 'A3 in [47.0, 56.0[', 'A4 != 1', 'A5 = 9', 'A6 = 4', 'A7 >= 44.0', 'A8 = 1', 'A9 = 1', 'A10 in [5.0, 8.0[', 'A12 = 2', 'A13 in [26.0, 36.0[', 'A14 < 17.0']\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "minimal coverage reason: ['A1 = 1', 'A2 < 312.0', 'A3 in [47.0, 56.0[', 'A4 != 1', 'A5 = 9', 'A6 = 4', 'A7 >= 44.0', 'A8 = 1', 'A9 = 1', 'A10 in [5.0, 8.0[', 'A12 = 2', 'A13 in [26.0, 36.0[', 'A14 < 17.0']\n"
     ]
    }
   ],
   "source": [
    "from pyxai import Learning, Explaining\n",
    "\n",
    "learner = Learning.Xgboost(\"../../dataset/australian_0.csv\", problem_type=Learning.CLASSIFICATION)\n",
    "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.BT)\n",
    "instance, prediction = learner.get_instances(model, n=1, seed=11200, is_correct=True)\n",
    "\n",
    "explainer = Explaining.initialize(model, instance=instance, features_type=\"../../dataset/australian_0.types\")\n",
    "print(\"prediction:\", prediction)\n",
    "\n",
    "coverage = explainer.coverage_reason()\n",
    "print(\"\\ncoverage reason:\", explainer.to_features(coverage))\n",
    "\n",
    "minimal = explainer.minimal_coverage_reason()\n",
    "print(\"minimal coverage reason:\", explainer.to_features(minimal))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7e3feb6",
   "metadata": {},
   "source": [
    "As with random forests, a single equality condition per categorical feature is reported (thanks to the domain theory), and the widest thresholds compatible with the prediction are kept. The function ```ExplainerBT.minimal_coverage_reason``` returns a coverage reason that is in addition subset-minimal."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}