
Rectification for Random Forests

To rectify a random forest, it suffices to rectify each of its trees.
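To illustrate the idea, here is a minimal plain-Python sketch (not the PyXAI implementation; the tree and rule are hypothetical): rectifying a tree by a positive classification rule forces the prediction to 1 on every instance covered by the rule, and leaves the tree's decision unchanged elsewhere.

```python
# Sketch of rectification by a positive classification rule (rule -> class 1).
def rectify_positive(tree_predict, rule_covers):
    """Return a decision function forced to 1 on instances covered by the rule."""
    return lambda x: 1 if rule_covers(x) else tree_predict(x)

# Hypothetical toy tree and rule over a dict-based instance.
tree = lambda x: 1 if x["Number_of_Priors"] > 3 else 0
rule = lambda x: x["score_factor"] == 1 and x["Age_Below_TwentyFive"] == 1

rectified = rectify_positive(tree, rule)
x = {"Number_of_Priors": 0, "score_factor": 1, "Age_Below_TwentyFive": 1}
print(tree(x))       # original tree predicts 0
print(rectified(x))  # the rule covers x, so the prediction is forced to 1
```

Rectifying a whole forest then amounts to applying this transformation to every tree, followed by simplification, which is what PyXAI does internally.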

Example from a Real Dataset

For this example, we take the compas.csv dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select a misclassified instance.

from pyxai import Learning, Explainer

learner = Learning.Scikitlearn("../dataset/compas.csv", learner_type=Learning.CLASSIFICATION)
model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.RF)

dict_information = learner.get_instances(model, n=1, indexes=Learning.TEST, correct=False, details=True)

instance = dict_information["instance"]
label = dict_information["label"]
prediction = dict_information["prediction"]

print("prediction:", prediction)
data:
      Number_of_Priors  score_factor  Age_Above_FourtyFive  \
0                    0             0                     1   
1                    0             0                     0   
2                    4             0                     0   
3                    0             0                     0   
4                   14             1                     0   
...                ...           ...                   ...   
6167                 0             1                     0   
6168                 0             0                     0   
6169                 0             0                     1   
6170                 3             0                     0   
6171                 2             0                     0   

      Age_Below_TwentyFive  African_American  Asian  Hispanic  \
0                        0                 0      0         0   
1                        0                 1      0         0   
2                        1                 1      0         0   
3                        0                 0      0         0   
4                        0                 0      0         0   
...                    ...               ...    ...       ...   
6167                     1                 1      0         0   
6168                     1                 1      0         0   
6169                     0                 0      0         0   
6170                     0                 1      0         0   
6171                     1                 0      0         1   

      Native_American  Other  Female  Misdemeanor  Two_yr_Recidivism  
0                   0      1       0            0                  0  
1                   0      0       0            0                  1  
2                   0      0       0            0                  1  
3                   0      1       0            1                  0  
4                   0      0       0            0                  1  
...               ...    ...     ...          ...                ...  
6167                0      0       0            0                  0  
6168                0      0       0            0                  0  
6169                0      1       0            0                  0  
6170                0      0       1            1                  0  
6171                0      0       1            0                  1  

[6172 rows x 12 columns]
--------------   Information   ---------------
Dataset name: ../dataset/compas.csv
nFeatures (nAttributes, with the labels): 12
nInstances (nObservations): 6172
nLabels: 2
---------------   Evaluation   ---------------
method: HoldOut
output: RF
learner_type: Classification
learner_options: {'max_depth': None, 'random_state': 0}
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics:
   accuracy: 65.71274298056156
   precision: 64.64788732394366
   recall: 54.44839857651246
   f1_score: 59.11139729555699
   specificity: 75.12388503468782
   true_positive: 459
   true_negative: 758
   false_positive: 251
   false_negative: 384
   sklearn_confusion_matrix: [[758, 251], [384, 459]]
nTraining instances: 4320
nTest instances: 1852

---------------   Explainer   ----------------
For the evaluation number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 68

---------------   Instances   ----------------
number of instances selected: 1
----------------------------------------------
prediction: 0

We activate the explainer with the associated theory and the selected instance:

compas_types = {
    "numerical": ["Number_of_Priors"],
    "binary": ["Misdemeanor", "score_factor", "Female"],
    "categorical": {"{African_American,Asian,Hispanic,Native_American,Other}": ["African_American", "Asian", "Hispanic", "Native_American", "Other"],
                    "Age*": ["Above_FourtyFive", "Below_TwentyFive"]}
}


explainer = Explainer.initialize(model, instance=instance, features_type=compas_types)
---------   Theory Feature Types   -----------
Before the one-hot encoding of categorical features:
Numerical features: 1
Categorical features: 2
Binary features: 3
Number of features: 6
Characteristics of categorical features: {'African_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'African_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Asian': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Asian', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Hispanic': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Hispanic', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Native_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Native_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Other': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Other', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Age_Above_FourtyFive': ['Age', 'Above_FourtyFive', ['Above_FourtyFive', 'Below_TwentyFive']], 'Age_Below_TwentyFive': ['Age', 'Below_TwentyFive', ['Above_FourtyFive', 'Below_TwentyFive']]}

Number of used features in the model (before the encoding of categorical features): 6
Number of used features in the model (after the encoding of categorical features): 11
----------------------------------------------

We compute why the model predicts 0 for this instance:

reason = explainer.majoritary_reason(n=1)
print("explanation:", reason)
print("to_features:", explainer.to_features(reason))
explanation: (-2, -3, -6, 9)
to_features: ('Number_of_Priors <= 0.5', 'score_factor = 0', 'Age != Below_TwentyFive', 'Misdemeanor = 1')
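As a quick sanity check (plain Python with hypothetical feature values matching the instance; not part of the PyXAI API), the textual conditions returned by `to_features` can be evaluated directly against the instance to confirm that it is covered by the explanation:

```python
# Hypothetical instance values consistent with the conditions printed above.
instance = {"Number_of_Priors": 0, "score_factor": 0,
            "Age_Below_TwentyFive": 0, "Misdemeanor": 1}

conditions = [
    instance["Number_of_Priors"] <= 0.5,    # 'Number_of_Priors <= 0.5'
    instance["score_factor"] == 0,          # 'score_factor = 0'
    instance["Age_Below_TwentyFive"] == 0,  # 'Age != Below_TwentyFive'
    instance["Misdemeanor"] == 1,           # 'Misdemeanor = 1'
]
print(all(conditions))  # True: the instance is covered by the explanation
```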

Suppose that the user knows that every instance covered by the explanation (-2, -3, -6, 9) should be classified as a positive instance. The model must then be rectified by the corresponding classification rule. Once the model has been rectified, the instance is classified as expected by the user:

model = explainer.rectify(conditions=reason, label=1)
print("new prediction:", model.predict_instance(instance))
-------------- Rectification information:
Classification Rule - Number of nodes: 9
Model - Number of nodes: 89814
Model - Number of nodes (after rectification): 290854
Model - Number of nodes (after simplification using the theory): 93768
Model - Number of nodes (after elimination of redundant nodes): 60176
--------------
new prediction: 1