Rectification for Random Forests

To rectify a random forest, we simply rectify each of its trees.
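To illustrate why rectifying every tree is enough, here is a minimal toy sketch. It is not PyXAI's implementation: trees are modelled as plain Python functions, and the helper names (`rectify_tree`, `rectify_forest`, `predict`) are hypothetical. Each rectified tree returns the imposed label whenever the rule holds and its previous output otherwise, so the majority vote is unanimous on every instance covered by the rule.

```python
from collections import Counter
from typing import Callable, Dict, List

Instance = Dict[str, int]
Tree = Callable[[Instance], int]
Rule = Callable[[Instance], bool]


def rectify_tree(tree: Tree, rule: Rule, label: int) -> Tree:
    """Return a tree that outputs `label` when `rule` holds, else the old output."""
    def rectified(x: Instance) -> int:
        return label if rule(x) else tree(x)
    return rectified


def rectify_forest(forest: List[Tree], rule: Rule, label: int) -> List[Tree]:
    """Rectify a forest by rectifying each of its trees."""
    return [rectify_tree(tree, rule, label) for tree in forest]


def predict(forest: List[Tree], x: Instance) -> int:
    """Majority vote over the trees."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy forest of three hand-written "trees" (assumption: purely illustrative).
forest: List[Tree] = [
    lambda x: 1 if x["a"] == 1 else 0,
    lambda x: 1 if x["b"] == 1 else 0,
    lambda x: 1,
]

# Classification rule: "if a = 1 and b = 0 then label 0".
rule: Rule = lambda x: x["a"] == 1 and x["b"] == 0

covered = {"a": 1, "b": 0}
not_covered = {"a": 1, "b": 1}

before = predict(forest, covered)
rectified_forest = rectify_forest(forest, rule, label=0)
after = predict(rectified_forest, covered)

# Instances outside the rule keep their original prediction.
unchanged = predict(rectified_forest, not_covered) == predict(forest, not_covered)
print(before, after, unchanged)
```

In this toy model the rectified trees simply wrap the old ones. The log shown further down suggests that PyXAI instead rewrites and simplifies the tree structures themselves.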

Example from a Real Dataset

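For this example, we use the compas.csv dataset. We create a model using the hold-out approach and select a misclassified instance. By default, the test size is set to 30%; in this run, however, the output below reports 1543 test instances out of 6172, i.e. a 25% test split.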

from pyxai import Learning, Explaining

learner = Learning.Scikitlearn("../dataset/compas.csv", problem_type='classification')
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.RF, splitting_parameters={'random_state':0})

dict_information = learner.get_instances(model, n=1, indexes=Learning.TEST, is_correct=False, details=True, seed=2)


instance = dict_information["instance"]
label = dict_information["label"]
prediction = dict_information["prediction"]


--------------   Information   ---------------
Problem type: classification
Instances type: tabular
Labels type: classes

Dataset path: ../dataset/compas.csv
nFeatures (nAttributes, with the labels): 11
nInstances (nObservations): 6172
nLabels: 2
---------------   Model creation, fitting and evaluation  ---------------
Splitting method: hold-out
Problem type: classification
Models type: random-forest
model_parameters: {}
---------   Evaluation Information   ---------
For the evaluation number 0:
Metrics:
   sklearn_confusion_matrix: [[636, 216], [313, 378]]
   precision: 63.63636363636363
   recall: 54.70332850940666
   f1_score: 58.83268482490272
   specificity: 74.64788732394366
   true_positive: 378
   true_negative: 636
   false_positive: 216
   false_negative: 313
   accuracy: 65.71613739468567
Number of Training instances: 4629
Number of Testing instances: 1543

---------------   Explainer   ----------------
For the split number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 71

---------------   Instances   ----------------
Number of instances selected: 1
----------------------------------------------
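The metrics reported in the evaluation output above can be recomputed from the confusion matrix alone. The short check below assumes the usual scikit-learn layout `[[TN, FP], [FN, TP]]`, which agrees with the `true_negative`, `false_positive`, `false_negative` and `true_positive` values printed in the log. The variable names are illustrative.

```python
# Confusion matrix copied from the evaluation log: [[TN, FP], [FN, TP]].
confusion = [[636, 216], [313, 378]]
(tn, fp), (fn, tp) = confusion

precision = 100 * tp / (tp + fp)
recall = 100 * tp / (tp + fn)
specificity = 100 * tn / (tn + fp)
f1_score = 2 * precision * recall / (precision + recall)
accuracy = 100 * (tp + tn) / (tp + tn + fp + fn)

# Share of the dataset held out for testing, from the instance counts in the log.
n_train, n_test = 4629, 1543
test_fraction = n_test / (n_train + n_test)

print(f"precision:   {precision:.2f}")
print(f"recall:      {recall:.2f}")
print(f"specificity: {specificity:.2f}")
print(f"f1_score:    {f1_score:.2f}")
print(f"accuracy:    {accuracy:.2f}")
print(f"test share:  {test_fraction:.2%}")
```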

We activate the explainer with the associated theory and the selected instance:

compas_types = {
    "numerical": ["Number_of_Priors"],
    "binary": ["Misdemeanor", "score_factor", "Female"],
    "categorical": {"{African_American,Asian,Hispanic,Native_American,Other}": ["African_American", "Asian", "Hispanic", "Native_American", "Other"],
                    "Age*": ["Above_FourtyFive", "Below_TwentyFive"]}
}


explainer = Explaining.initialize(model, instance=instance, features_type=compas_types)

feature_names: ['Misdemeanor', 'Number_of_Priors', 'score_factor', 'Age_Above_FourtyFive', 'Age_Below_TwentyFive', 'African_American', 'Asian', 'Hispanic', 'Native_American', 'Other', 'Female']
---------   Theory Feature Types   -----------
Before the one-hot encoding of categorical features:
Numerical features: 1
Categorical features: 2
Binary features: 3
Number of features: 6
Characteristics of categorical features: {'African_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'African_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Asian': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Asian', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Hispanic': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Hispanic', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Native_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Native_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Other': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Other', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Age_Above_FourtyFive': ['Age', 'Above_FourtyFive', ['Above_FourtyFive', 'Below_TwentyFive']], 'Age_Below_TwentyFive': ['Age', 'Below_TwentyFive', ['Above_FourtyFive', 'Below_TwentyFive']]}

Number of used features in the model (before the encoding of categorical features): 6
Number of used features in the model (after the encoding of categorical features): 11
----------------------------------------------
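The feature counts in the theory summary above follow directly from the `compas_types` dictionary. As a small sanity check (plain Python, no PyXAI call involved), each categorical group counts as one feature before one-hot encoding and as one column per value after it:

```python
compas_types = {
    "numerical": ["Number_of_Priors"],
    "binary": ["Misdemeanor", "score_factor", "Female"],
    "categorical": {
        "{African_American,Asian,Hispanic,Native_American,Other}": [
            "African_American", "Asian", "Hispanic", "Native_American", "Other"
        ],
        "Age*": ["Above_FourtyFive", "Below_TwentyFive"],
    },
}

n_numerical = len(compas_types["numerical"])
n_binary = len(compas_types["binary"])
categorical = compas_types["categorical"]

# Before encoding: one feature per categorical group.
features_before = n_numerical + n_binary + len(categorical)

# After encoding: one column per value of each categorical group.
features_after = n_numerical + n_binary + sum(len(values) for values in categorical.values())

print("before encoding:", features_before)
print("after encoding:", features_after)
```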
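
We then compute a majoritary reason for this instance and translate it back into conditions on the features:

reason = explainer.majoritary_reason(n=1)
print("Current prediction: ", explainer.target_prediction)
print("explanation:", reason)
print("to_features:", explainer.to_features(reason))

Current prediction:  1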
explanation: (3, -4, 7, -8, -9, 12, -51)
to_features: ['Misdemeanor = 0', 'Number_of_Priors in ]0.5, 1.0]', 'score_factor = 1', 'Age = Below_TwentyFive', '{African_American,Asian,Hispanic,Native_American,Other} != Hispanic', 'Female = 0']
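The explanation is a tuple of signed integers. The sketch below assumes the common convention that a positive literal `k` requires Boolean variable `k` to be true and a negative literal `-k` requires it to be false. The `covers` helper and the truth values in `assignment` are hypothetical, chosen only to show what it means for an instance to be covered by a reason; they are not read from the model.

```python
from typing import Dict, Iterable


def covers(reason: Iterable[int], assignment: Dict[int, bool]) -> bool:
    """True if every literal of `reason` agrees with the Boolean assignment."""
    return all(assignment[abs(literal)] == (literal > 0) for literal in reason)


reason = (3, -4, 7, -8, -9, 12, -51)

# Hypothetical truth values for the Boolean variables mentioned by the reason.
assignment = {3: True, 4: False, 7: True, 8: False, 9: False, 12: True, 51: False}
is_covered = covers(reason, assignment)

# Flipping a single variable that the reason constrains breaks the coverage.
flipped = dict(assignment)
flipped[12] = False
is_still_covered = covers(reason, flipped)

print(is_covered, is_still_covered)
```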

Suppose that the user knows that every instance covered by this explanation, (3, -4, 7, -8, -9, 12, -51), should be classified as a negative instance (label 0). The model must then be rectified by the corresponding classification rule. Once the model has been rectified, the instance is classified as expected by the user:

print("current prediction:", model.predict_instance(instance))
model = explainer.rectify(conditions=reason, label=0)  # impose label 0 on every instance covered by the reason
print("new prediction:", model.predict_instance(instance))

current prediction: 1
Rectify - Number of nodes - Initial (c++): 93368
Rectify - Number of nodes - After rectification (c++): 112936
Rectify - Number of nodes - After simplification with the theory (c++): 97314
Rectify - Number of nodes - After elimination of redundant nodes (c++): 61592
Rectify - Number of nodes - Final (c++): 61592
Rectification time: 0.21008984799999908
--------------
new prediction: 0
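The node counts in the rectification log can be compared with a few lines of arithmetic. The snippet below only reuses the numbers printed above; the stage labels are shortened for readability.

```python
# Node counts copied from the rectification log.
stages = {
    "initial": 93368,
    "after rectification": 112936,
    "after simplification with the theory": 97314,
    "after elimination of redundant nodes": 61592,
}

initial = stages["initial"]
final = stages["after elimination of redundant nodes"]

# Relative change of each stage with respect to the initial forest.
for name, nodes in stages.items():
    change = 100 * (nodes - initial) / initial
    print(f"{name}: {nodes} nodes ({change:+.1f}% vs. initial)")

growth_after_rectification = 100 * (stages["after rectification"] - initial) / initial
overall_reduction = 100 * (initial - final) / initial
```

In this run, rectification first enlarges the forest, and the two simplification steps then bring it below its initial size.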