{ "cells": [ { "cell_type": "markdown", "id": "9214fb4e", "metadata": {}, "source": [ "# Rectification for Random Forests" ] }, { "cell_type": "markdown", "id": "11bd9e02", "metadata": {}, "source": "To rectify a random forest, we simply rectify each of its trees." }, { "cell_type": "markdown", "id": "e38b2031", "metadata": {}, "source": [ "## Example from a Real Dataset" ] }, { "cell_type": "markdown", "id": "6d16c6e3", "metadata": {}, "source": "For this example, we use the compas.csv dataset. We create a random forest model using the hold-out approach (the output below reports 1543 test instances out of 6172, i.e. 25%) and select a misclassified test instance." }, { "cell_type": "code", "execution_count": 1, "id": "9e802eda", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: classification\n", "Instances type: tabular\n", "Labels type: classes\n", "\n", "Dataset path: ../dataset/compas.csv\n", "nFeatures (nAttributes, with the labels): 11\n", "nInstances (nObservations): 6172\n", "nLabels: 2\n", "--------------- Model creation, fitting and evaluation ---------------\n", "Splitting method: hold-out\n", "Problem type: classification\n", "Models type: random-forest\n", "model_parameters: {}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "Metrics:\n", " sklearn_confusion_matrix: [[636, 216], [313, 378]]\n", " precision: 63.63636363636363\n", " recall: 54.70332850940666\n", " f1_score: 58.83268482490272\n", " specificity: 74.64788732394366\n", " true_positive: 378\n", " true_negative: 636\n", " false_positive: 216\n", " false_negative: 313\n", " accuracy: 65.71613739468567\n", "Number of Training instances: 4629\n", "Number of Testing instances: 1543\n", "\n", "--------------- Explainer ----------------\n", "For the split number 0:\n", "**Random Forest Model**\n", "nClasses: 2\n", "nTrees: 100\n", "nVariables: 71\n", "\n", "--------------- Instances ----------------\n", 
"Number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explaining\n", "\n", "learner = Learning.Scikitlearn(\"../dataset/compas.csv\", problem_type='classification')\n", "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.RF, splitting_parameters={'random_state':0})\n", "\n", "dict_information = learner.get_instances(model, n=1, indexes=Learning.TEST, is_correct=False, details=True, seed=2)\n", "\n", "\n", "instance = dict_information[\"instance\"]\n", "label = dict_information[\"label\"]\n", "prediction = dict_information[\"prediction\"]\n", "\n" ] }, { "cell_type": "markdown", "id": "5a22e957", "metadata": {}, "source": [ "We activate the explainer with the associated theory and the selected instance: " ] }, { "cell_type": "code", "execution_count": 3, "id": "4ed8f056", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "feature_names: ['Misdemeanor', 'Number_of_Priors', 'score_factor', 'Age_Above_FourtyFive', 'Age_Below_TwentyFive', 'African_American', 'Asian', 'Hispanic', 'Native_American', 'Other', 'Female']\n", "--------- Theory Feature Types -----------\n", "Before the one-hot encoding of categorical features:\n", "Numerical features: 1\n", "Categorical features: 2\n", "Binary features: 3\n", "Number of features: 6\n", "Characteristics of categorical features: {'African_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'African_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Asian': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Asian', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Hispanic': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Hispanic', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Native_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 
'Native_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Other': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Other', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Age_Above_FourtyFive': ['Age', 'Above_FourtyFive', ['Above_FourtyFive', 'Below_TwentyFive']], 'Age_Below_TwentyFive': ['Age', 'Below_TwentyFive', ['Above_FourtyFive', 'Below_TwentyFive']]}\n", "\n", "Number of used features in the model (before the encoding of categorical features): 6\n", "Number of used features in the model (after the encoding of categorical features): 11\n", "----------------------------------------------\n" ] } ], "source": [ "compas_types = {\n", " \"numerical\": [\"Number_of_Priors\"],\n", " \"binary\": [\"Misdemeanor\", \"score_factor\", \"Female\"],\n", " \"categorical\": {\"{African_American,Asian,Hispanic,Native_American,Other}\": [\"African_American\", \"Asian\", \"Hispanic\", \"Native_American\", \"Other\"],\n", " \"Age*\": [\"Above_FourtyFive\", \"Below_TwentyFive\"]}\n", "}\n", "\n", "\n", "explainer = Explaining.initialize(model, instance=instance, features_type=compas_types)" ] }, { "cell_type": "code", "execution_count": 4, "id": "d2ec090c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Current prediction: 1\n", "explanation: (3, -4, 7, -8, -9, 12, -51)\n", "to_features: ['Misdemeanor = 0', 'Number_of_Priors in ]0.5, 1.0]', 'score_factor = 1', 'Age = Below_TwentyFive', '{African_American,Asian,Hispanic,Native_American,Other} != Hispanic', 'Female = 0']\n" ] } ], "source": [ "reason = explainer.majoritary_reason(n=1)\n", "print(\"Current prediction: \", explainer.target_prediction)\n", "print(\"explanation:\", reason)\n", "print(\"to_features:\", explainer.to_features(reason))" ] }, { "cell_type": "markdown", "id": "8eb3b471", "metadata": {}, "source": [ "The cell above computes a majoritary reason for this instance.\n", "\n", "Suppose that the user knows that every instance covered by 
the explanation (3, -4, 7, -8, -9, 12, -51) should be classified as a negative instance (label 0). The model must then be rectified by the corresponding classification rule.\n", "Once the model has been rectified, the instance is classified as expected by the user:" ] }, { "cell_type": "code", "execution_count": 5, "id": "06abf749", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "current prediction: 1\n", "Rectify - Number of nodes - Initial (c++): 93368\n", "Rectify - Number of nodes - After rectification (c++): 112936\n", "Rectify - Number of nodes - After simplification with the theory (c++): 97314\n", "Rectify - Number of nodes - After elimination of redundant nodes (c++): 61592\n", "Rectify - Number of nodes - Final (c++): 61592\n", "Rectification time: 0.21008984799999908\n", "--------------\n", "new prediction: 0\n" ] } ], "source": [ "print(\"current prediction:\", model.predict_instance(instance))\n", "model = explainer.rectify(conditions=reason, label=0)  # force label 0 for every instance covered by the reason\n", "print(\"new prediction:\", model.predict_instance(instance))" ] }, { "cell_type": "code", "execution_count": null, "id": "55cb1cae", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 5 }