{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9214fb4e",
   "metadata": {},
   "source": [
    "# Rectification for Random Forests"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11bd9e02",
   "metadata": {},
   "source": [
    "To rectify a random forest, we simply rectify each of its trees."
   ]
  },
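  {
   "cell_type": "markdown",
   "id": "c4a7e210",
   "metadata": {},
   "source": [
    "The principle can be sketched on a toy forest. The code below only illustrates the intended semantics and is not PyXAI's implementation, which operates on the tree structures themselves. Each tree is modeled as a plain Python function, and the names `rule_covers`, `rectify_tree` and `predict_forest`, as well as the toy features, are hypothetical. A rectified tree returns the label of the rule on every instance the rule covers and keeps its original prediction elsewhere, so the majority vote of the rectified forest follows the rule on covered instances."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b3d9f58",
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "\n",
    "# Toy stand-ins (hypothetical, not PyXAI objects): a tree is any function\n",
    "# mapping an instance (a dict of feature values) to a label.\n",
    "def tree_a(x):\n",
    "    return 1 if x['priors'] > 2 else 0\n",
    "\n",
    "def tree_b(x):\n",
    "    return 1 if x['score_factor'] == 1 else 0\n",
    "\n",
    "def tree_c(x):\n",
    "    return 0\n",
    "\n",
    "def rule_covers(x):\n",
    "    # Condition part of the classification rule supplied by the user.\n",
    "    return x['priors'] == 0 and x['misdemeanor'] == 1\n",
    "\n",
    "def rectify_tree(tree, covers, label):\n",
    "    # The rectified tree returns the rule's label on covered instances\n",
    "    # and the original prediction everywhere else.\n",
    "    def rectified(x):\n",
    "        return label if covers(x) else tree(x)\n",
    "    return rectified\n",
    "\n",
    "def predict_forest(trees, x):\n",
    "    # Majority vote over the trees.\n",
    "    votes = Counter(tree(x) for tree in trees)\n",
    "    return votes.most_common(1)[0][0]\n",
    "\n",
    "forest = [tree_a, tree_b, tree_c]\n",
    "# Rectifying the forest amounts to rectifying each of its trees.\n",
    "rectified_forest = [rectify_tree(t, rule_covers, 1) for t in forest]\n",
    "\n",
    "covered = {'priors': 0, 'score_factor': 0, 'misdemeanor': 1}\n",
    "uncovered = {'priors': 0, 'score_factor': 0, 'misdemeanor': 0}\n",
    "\n",
    "print('covered, before:', predict_forest(forest, covered))\n",
    "print('covered, after:', predict_forest(rectified_forest, covered))\n",
    "print('uncovered, after:', predict_forest(rectified_forest, uncovered))"
   ]
  },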
  {
   "cell_type": "markdown",
   "id": "e38b2031",
   "metadata": {},
   "source": [
    "## Example from a Real Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d16c6e3",
   "metadata": {},
   "source": [
    "For this example, we use the compas.csv dataset. We train a random forest using the hold-out approach (by default, the test size is set to 30%) and select one misclassified instance from the test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9e802eda",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "data:\n",
      "      Number_of_Priors  score_factor  Age_Above_FourtyFive  \\\n",
      "0                    0             0                     1   \n",
      "1                    0             0                     0   \n",
      "2                    4             0                     0   \n",
      "3                    0             0                     0   \n",
      "4                   14             1                     0   \n",
      "...                ...           ...                   ...   \n",
      "6167                 0             1                     0   \n",
      "6168                 0             0                     0   \n",
      "6169                 0             0                     1   \n",
      "6170                 3             0                     0   \n",
      "6171                 2             0                     0   \n",
      "\n",
      "      Age_Below_TwentyFive  African_American  Asian  Hispanic  \\\n",
      "0                        0                 0      0         0   \n",
      "1                        0                 1      0         0   \n",
      "2                        1                 1      0         0   \n",
      "3                        0                 0      0         0   \n",
      "4                        0                 0      0         0   \n",
      "...                    ...               ...    ...       ...   \n",
      "6167                     1                 1      0         0   \n",
      "6168                     1                 1      0         0   \n",
      "6169                     0                 0      0         0   \n",
      "6170                     0                 1      0         0   \n",
      "6171                     1                 0      0         1   \n",
      "\n",
      "      Native_American  Other  Female  Misdemeanor  Two_yr_Recidivism  \n",
      "0                   0      1       0            0                  0  \n",
      "1                   0      0       0            0                  1  \n",
      "2                   0      0       0            0                  1  \n",
      "3                   0      1       0            1                  0  \n",
      "4                   0      0       0            0                  1  \n",
      "...               ...    ...     ...          ...                ...  \n",
      "6167                0      0       0            0                  0  \n",
      "6168                0      0       0            0                  0  \n",
      "6169                0      1       0            0                  0  \n",
      "6170                0      0       1            1                  0  \n",
      "6171                0      0       1            0                  1  \n",
      "\n",
      "[6172 rows x 12 columns]\n",
      "--------------   Information   ---------------\n",
      "Dataset name: ../dataset/compas.csv\n",
      "nFeatures (nAttributes, with the labels): 12\n",
      "nInstances (nObservations): 6172\n",
      "nLabels: 2\n",
      "---------------   Evaluation   ---------------\n",
      "method: HoldOut\n",
      "output: RF\n",
      "learner_type: Classification\n",
      "learner_options: {'max_depth': None, 'random_state': 0}\n",
      "---------   Evaluation Information   ---------\n",
      "For the evaluation number 0:\n",
      "metrics:\n",
      "   accuracy: 65.71274298056156\n",
      "   precision: 64.64788732394366\n",
      "   recall: 54.44839857651246\n",
      "   f1_score: 59.11139729555699\n",
      "   specificity: 75.12388503468782\n",
      "   true_positive: 459\n",
      "   true_negative: 758\n",
      "   false_positive: 251\n",
      "   false_negative: 384\n",
      "   sklearn_confusion_matrix: [[758, 251], [384, 459]]\n",
      "nTraining instances: 4320\n",
      "nTest instances: 1852\n",
      "\n",
      "---------------   Explainer   ----------------\n",
      "For the evaluation number 0:\n",
      "**Random Forest Model**\n",
      "nClasses: 2\n",
      "nTrees: 100\n",
      "nVariables: 68\n",
      "\n",
      "---------------   Instances   ----------------\n",
      "number of instances selected: 1\n",
      "----------------------------------------------\n",
      "prediction: 0\n"
     ]
    }
   ],
   "source": [
    "from pyxai import Learning, Explainer\n",
    "\n",
    "learner = Learning.Scikitlearn(\"../dataset/compas.csv\", learner_type=Learning.CLASSIFICATION)\n",
    "model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.RF)\n",
    "\n",
    "dict_information = learner.get_instances(model, n=1, indexes=Learning.TEST, correct=False, details=True)\n",
    "\n",
    "instance = dict_information[\"instance\"]\n",
    "label = dict_information[\"label\"]\n",
    "prediction = dict_information[\"prediction\"]\n",
    "\n",
    "print(\"prediction:\", prediction)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a22e957",
   "metadata": {},
   "source": [
    "We initialize the explainer with the selected instance and the feature types, which define the associated theory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4ed8f056",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "---------   Theory Feature Types   -----------\n",
      "Before the one-hot encoding of categorical features:\n",
      "Numerical features: 1\n",
      "Categorical features: 2\n",
      "Binary features: 3\n",
      "Number of features: 6\n",
      "Characteristics of categorical features: {'African_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'African_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Asian': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Asian', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Hispanic': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Hispanic', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Native_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Native_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Other': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Other', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Age_Above_FourtyFive': ['Age', 'Above_FourtyFive', ['Above_FourtyFive', 'Below_TwentyFive']], 'Age_Below_TwentyFive': ['Age', 'Below_TwentyFive', ['Above_FourtyFive', 'Below_TwentyFive']]}\n",
      "\n",
      "Number of used features in the model (before the encoding of categorical features): 6\n",
      "Number of used features in the model (after the encoding of categorical features): 11\n",
      "----------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "compas_types = {\n",
    "    \"numerical\": [\"Number_of_Priors\"],\n",
    "    \"binary\": [\"Misdemeanor\", \"score_factor\", \"Female\"],\n",
    "    \"categorical\": {\"{African_American,Asian,Hispanic,Native_American,Other}\": [\"African_American\", \"Asian\", \"Hispanic\", \"Native_American\", \"Other\"],\n",
    "                    \"Age*\": [\"Above_FourtyFive\", \"Below_TwentyFive\"]}\n",
    "}\n",
    "\n",
    "\n",
    "explainer = Explainer.initialize(model, instance=instance, features_type=compas_types)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95124c65",
   "metadata": {},
   "source": [
    "We compute a majoritary reason explaining why the model predicts 0 for this instance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d2ec090c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "explanation: (-2, -3, -6, 9)\n",
      "to_features: ('Number_of_Priors <= 0.5', 'score_factor = 0', 'Age != Below_TwentyFive', 'Misdemeanor = 1')\n"
     ]
    }
   ],
   "source": [
    "reason = explainer.majoritary_reason(n=1)\n",
    "print(\"explanation:\", reason)\n",
    "print(\"to_features:\", explainer.to_features(reason))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8eb3b471",
   "metadata": {},
   "source": [
    "Suppose the user knows that every instance covered by the explanation (-2, -3, -6, 9) should be classified as positive (label 1). The model must then be rectified with the corresponding classification rule, whose conditions are the explanation and whose label is 1.\n",
    "Once the model has been rectified, the instance is classified as the user expects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "06abf749",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "-------------- Rectification information:\n",
      "Classification Rule - Number of nodes: 9\n",
      "Model - Number of nodes: 89814\n",
      "Model - Number of nodes (after rectification): 290854\n",
      "Model - Number of nodes (after simplification using the theory): 93768\n",
      "Model - Number of nodes (after elimination of redundant nodes): 60176\n",
      "--------------\n",
      "new prediction: 1\n"
     ]
    }
   ],
   "source": [
    "# Rectify the forest so that instances satisfying `reason` are labeled 1.\n",
    "model = explainer.rectify(conditions=reason, label=1)\n",
    "print(\"new prediction:\", model.predict_instance(instance))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}