Rectification for Random Forests
To rectify a random forest, we simply rectify each of its trees.
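The per-tree idea can be illustrated with a small semantic sketch. It is not PyXAI's implementation (which works on the structure of the trees); it only models the expected behaviour, assuming that rectifying a tree by a rule means "answer with the rule's label on covered instances, and keep the original answer elsewhere". All names below (`rectify_tree`, `rectify_forest`, `predict_forest`, the toy trees and the toy rule) are hypothetical and introduced for illustration only.

```python
from collections import Counter
from typing import Callable, Dict, List

Instance = Dict[str, float]
Tree = Callable[[Instance], int]
Rule = Callable[[Instance], bool]


def rectify_tree(tree: Tree, rule: Rule, label: int) -> Tree:
    """Return a tree that outputs `label` on instances covered by `rule`
    and behaves like the original tree on every other instance."""
    def rectified(instance: Instance) -> int:
        return label if rule(instance) else tree(instance)
    return rectified


def rectify_forest(forest: List[Tree], rule: Rule, label: int) -> List[Tree]:
    """Rectify a forest by rectifying each of its trees."""
    return [rectify_tree(tree, rule, label) for tree in forest]


def predict_forest(forest: List[Tree], instance: Instance) -> int:
    """Majority vote over the trees of the forest."""
    votes = Counter(tree(instance) for tree in forest)
    return votes.most_common(1)[0][0]


# Three toy "trees" (hypothetical, not learned from data).
def tree_a(x: Instance) -> int:
    return 0

def tree_b(x: Instance) -> int:
    return 1 if x["Number_of_Priors"] > 2 else 0

def tree_c(x: Instance) -> int:
    return 1 if x["score_factor"] == 1 else 0

# Toy rule (hypothetical): covered instances must be labelled 1.
def toy_rule(x: Instance) -> bool:
    return x["Misdemeanor"] == 1 and x["Number_of_Priors"] == 0

toy_forest = [tree_a, tree_b, tree_c]
rectified_forest = rectify_forest(toy_forest, toy_rule, label=1)

covered = {"Number_of_Priors": 0, "score_factor": 0, "Misdemeanor": 1}
not_covered = {"Number_of_Priors": 0, "score_factor": 0, "Misdemeanor": 0}

print(predict_forest(toy_forest, covered))            # before rectification
print(predict_forest(rectified_forest, covered))      # after rectification
print(predict_forest(rectified_forest, not_covered))  # unchanged outside the rule
```

Because every rectified tree votes for the rule's label on a covered instance, the majority vote of the forest necessarily agrees with the rule there, while predictions on uncovered instances are left untouched.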
Example from a Real Dataset
For this example, we use the compas.csv dataset. We train a model using the hold-out approach (by default, the test size is 30%) and select a misclassified instance.
from pyxai import Learning, Explainer

# Load the dataset and train a random forest with a hold-out split.
learner = Learning.Scikitlearn("../dataset/compas.csv", learner_type=Learning.CLASSIFICATION)
model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.RF)

# Pick one misclassified instance (correct=False) from the test set.
dict_information = learner.get_instances(model, n=1, indexes=Learning.TEST, correct=False, details=True)
instance = dict_information["instance"]
label = dict_information["label"]
prediction = dict_information["prediction"]
print("prediction:", prediction)
data:
Number_of_Priors score_factor Age_Above_FourtyFive \
0 0 0 1
1 0 0 0
2 4 0 0
3 0 0 0
4 14 1 0
... ... ... ...
6167 0 1 0
6168 0 0 0
6169 0 0 1
6170 3 0 0
6171 2 0 0
Age_Below_TwentyFive African_American Asian Hispanic \
0 0 0 0 0
1 0 1 0 0
2 1 1 0 0
3 0 0 0 0
4 0 0 0 0
... ... ... ... ...
6167 1 1 0 0
6168 1 1 0 0
6169 0 0 0 0
6170 0 1 0 0
6171 1 0 0 1
Native_American Other Female Misdemeanor Two_yr_Recidivism
0 0 1 0 0 0
1 0 0 0 0 1
2 0 0 0 0 1
3 0 1 0 1 0
4 0 0 0 0 1
... ... ... ... ... ...
6167 0 0 0 0 0
6168 0 0 0 0 0
6169 0 1 0 0 0
6170 0 0 1 1 0
6171 0 0 1 0 1
[6172 rows x 12 columns]
-------------- Information ---------------
Dataset name: ../dataset/compas.csv
nFeatures (nAttributes, with the labels): 12
nInstances (nObservations): 6172
nLabels: 2
--------------- Evaluation ---------------
method: HoldOut
output: RF
learner_type: Classification
learner_options: {'max_depth': None, 'random_state': 0}
--------- Evaluation Information ---------
For the evaluation number 0:
metrics:
accuracy: 65.71274298056156
precision: 64.64788732394366
recall: 54.44839857651246
f1_score: 59.11139729555699
specificity: 75.12388503468782
true_positive: 459
true_negative: 758
false_positive: 251
false_negative: 384
sklearn_confusion_matrix: [[758, 251], [384, 459]]
nTraining instances: 4320
nTest instances: 1852
--------------- Explainer ----------------
For the evaluation number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 68
--------------- Instances ----------------
number of instances selected: 1
----------------------------------------------
prediction: 0
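As a sanity check on the evaluation output above, the reported metrics can be re-derived from the confusion matrix alone. The sketch below assumes the scikit-learn layout `[[TN, FP], [FN, TP]]`, which agrees with the `true_negative`, `false_positive`, `false_negative` and `true_positive` counts printed in the log; the variable names are ours.

```python
# Confusion matrix taken from the evaluation output: [[TN, FP], [FN, TP]].
confusion = [[758, 251], [384, 459]]
(tn, fp), (fn, tp) = confusion

n_test = tn + fp + fn + tp  # number of test instances

# All metrics are expressed as percentages, as in the log.
accuracy = 100 * (tp + tn) / n_test
precision = 100 * tp / (tp + fp)
recall = 100 * tp / (tp + fn)
specificity = 100 * tn / (tn + fp)
f1_score = 2 * precision * recall / (precision + recall)

print("nTest instances:", n_test)
print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
print("specificity:", specificity)
print("f1_score:", f1_score)
```

The recall of about 54% means that nearly half of the positive test instances are predicted as negative, which is why a misclassified instance is easy to find here.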
We activate the explainer with the associated theory and the selected instance:
compas_types = {
    "numerical": ["Number_of_Priors"],
    "binary": ["Misdemeanor", "score_factor", "Female"],
    "categorical": {
        "{African_American,Asian,Hispanic,Native_American,Other}": ["African_American", "Asian", "Hispanic", "Native_American", "Other"],
        "Age*": ["Above_FourtyFive", "Below_TwentyFive"],
    },
}
explainer = Explainer.initialize(model, instance=instance, features_type=compas_types)
--------- Theory Feature Types -----------
Before the one-hot encoding of categorical features:
Numerical features: 1
Categorical features: 2
Binary features: 3
Number of features: 6
Characteristics of categorical features: {'African_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'African_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Asian': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Asian', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Hispanic': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Hispanic', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Native_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Native_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Other': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Other', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Age_Above_FourtyFive': ['Age', 'Above_FourtyFive', ['Above_FourtyFive', 'Below_TwentyFive']], 'Age_Below_TwentyFive': ['Age', 'Below_TwentyFive', ['Above_FourtyFive', 'Below_TwentyFive']]}
Number of used features in the model (before the encoding of categorical features): 6
Number of used features in the model (after the encoding of categorical features): 11
----------------------------------------------
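To give an intuition of the kind of domain knowledge carried by the feature types declared above, the sketch below checks one plausible constraint: the one-hot columns of a single categorical feature should not be set to 1 simultaneously. This is an assumption made for illustration, not a description of how PyXAI encodes its theory internally; `violated_groups` is a hypothetical helper. The first example row reuses the values of row 0 of the printed dataset, and the second one is deliberately inconsistent.

```python
from typing import Dict, List

# One-hot columns grouped by the categorical feature they encode
# (same grouping as in compas_types, with full column names).
one_hot_groups: Dict[str, List[str]] = {
    "{African_American,Asian,Hispanic,Native_American,Other}": [
        "African_American", "Asian", "Hispanic", "Native_American", "Other",
    ],
    "Age": ["Age_Above_FourtyFive", "Age_Below_TwentyFive"],
}


def violated_groups(instance: Dict[str, int], groups: Dict[str, List[str]]) -> List[str]:
    """Return the categorical features having more than one active one-hot column."""
    return [
        name
        for name, columns in groups.items()
        if sum(instance[column] for column in columns) > 1
    ]


# Row 0 of the printed dataset (only the one-hot columns are needed here).
row_0 = {
    "Age_Above_FourtyFive": 1, "Age_Below_TwentyFive": 0,
    "African_American": 0, "Asian": 0, "Hispanic": 0,
    "Native_American": 0, "Other": 1,
}

# An impossible combination: above forty-five and below twenty-five at once.
impossible = dict(row_0, Age_Below_TwentyFive=1)

print(violated_groups(row_0, one_hot_groups))
print(violated_groups(impossible, one_hot_groups))
```

Ruling out such impossible combinations is what allows a rectified model to be simplified, since branches that can only be reached by inconsistent instances carry no information.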
We compute why the model predicts 0 for this instance:
reason = explainer.majoritary_reason(n=1)
print("explanation:", reason)
print("to_features:", explainer.to_features(reason))
explanation: (-2, -3, -6, 9)
to_features: ('Number_of_Priors <= 0.5', 'score_factor = 0', 'Age != Below_TwentyFive', 'Misdemeanor = 1')
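The feature-level reading printed above is a conjunction of four conditions, and an instance is covered by the explanation when it satisfies all of them. The following sketch makes this concrete in plain Python; `is_covered` and `failed_conditions` are hypothetical helpers (not part of PyXAI), and we assume that 'Age != Below_TwentyFive' corresponds to the column `Age_Below_TwentyFive` being 0. The two example rows reuse the values of rows 3 and 6170 of the printed dataset, restricted to the relevant columns.

```python
from typing import Callable, Dict, List

Instance = Dict[str, int]

# The four conditions of the explanation, keyed by their printed form.
conditions: Dict[str, Callable[[Instance], bool]] = {
    "Number_of_Priors <= 0.5": lambda x: x["Number_of_Priors"] <= 0.5,
    "score_factor = 0": lambda x: x["score_factor"] == 0,
    "Age != Below_TwentyFive": lambda x: x["Age_Below_TwentyFive"] == 0,
    "Misdemeanor = 1": lambda x: x["Misdemeanor"] == 1,
}


def failed_conditions(instance: Instance) -> List[str]:
    """Return the conditions of the explanation that the instance does not satisfy."""
    return [name for name, test in conditions.items() if not test(instance)]


def is_covered(instance: Instance) -> bool:
    """An instance is covered when no condition fails."""
    return not failed_conditions(instance)


# Rows 3 and 6170 of the printed dataset (relevant columns only).
row_3 = {"Number_of_Priors": 0, "score_factor": 0,
         "Age_Below_TwentyFive": 0, "Misdemeanor": 1}
row_6170 = {"Number_of_Priors": 3, "score_factor": 0,
            "Age_Below_TwentyFive": 0, "Misdemeanor": 1}

print(is_covered(row_3))
print(is_covered(row_6170), failed_conditions(row_6170))
```

Features that do not appear in the explanation (for example `Female`) play no role in deciding coverage, so a single explanation typically covers many instances.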
Suppose the user knows that every instance covered by the explanation (-2, -3, -6, 9) should be classified as positive (label 1). The model must then be rectified with the corresponding classification rule. Once the model has been rectified, the instance is classified as the user expects:
model = explainer.rectify(conditions=reason, label=1)
print("new prediction:", model.predict_instance(instance))
-------------- Rectification information:
Classification Rule - Number of nodes: 9
Model - Number of nodes: 89814
Model - Number of nodes (after rectification): 290854
Model - Number of nodes (after simplification using the theory): 93768
Model - Number of nodes (after elimination of redundant nodes): 60176
--------------
new prediction: 1