Rectification for Random Forests
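To rectify a random forest, we simply rectify each of its trees: once every tree assigns the requested label to the instances covered by the rule, the majority vote of the forest necessarily does too.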
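The idea can be sketched semantically in plain Python. This is a minimal sketch, not PyXAI's implementation. Trees are modeled as hypothetical Python functions from an instance (a dict) to a class label. Rectifying a tree means returning the rule's label on covered instances and keeping the tree's own prediction everywhere else. PyXAI instead edits the tree structures and then simplifies them, as the node counts in its log show.

```python
from collections import Counter
from typing import Callable, Dict, List

Instance = Dict[str, float]
Tree = Callable[[Instance], int]    # hypothetical model of a tree: instance -> label
Rule = Callable[[Instance], bool]   # hypothetical model of the rule's conditions

def rectify_tree(tree: Tree, covers: Rule, label: int) -> Tree:
    """Predict `label` on instances covered by the rule, and what `tree` predicts elsewhere."""
    def rectified(x: Instance) -> int:
        return label if covers(x) else tree(x)
    return rectified

def rectify_forest(trees: List[Tree], covers: Rule, label: int) -> List[Tree]:
    # Rectifying the forest = rectifying each of its trees independently.
    return [rectify_tree(tree, covers, label) for tree in trees]

def forest_predict(trees: List[Tree], x: Instance) -> int:
    # Majority vote; ties go to the smaller label.
    votes = Counter(tree(x) for tree in trees)
    return max(sorted(votes), key=lambda c: votes[c])

# Three toy trees over a few compas-like features (made-up splits).
trees: List[Tree] = [
    lambda x: 1 if x["Number_of_Priors"] > 0.5 else 0,
    lambda x: 1 if x["score_factor"] == 1 else 0,
    lambda x: 0 if x["Female"] == 1 else 1,
]
covers: Rule = lambda x: x["Misdemeanor"] == 0 and x["score_factor"] == 1
x = {"Misdemeanor": 0, "Number_of_Priors": 1.0, "score_factor": 1, "Female": 0}  # covered
y = {"Misdemeanor": 1, "Number_of_Priors": 1.0, "score_factor": 1, "Female": 0}  # not covered

fixed = rectify_forest(trees, covers, label=0)
print(forest_predict(trees, x), forest_predict(fixed, x))  # 1 0
print(forest_predict(trees, y), forest_predict(fixed, y))  # 1 1
```

Because every rectified tree returns 0 on covered instances, the vote cannot produce anything else there, while uncovered instances such as `y` keep the forest's original prediction.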
Example from a Real Dataset
For this example, we use the compas.csv dataset. We create a random forest using the hold-out approach (in the run below, 1543 of the 6172 instances, i.e. 25%, are kept for testing) and select a misclassified test instance.
from pyxai import Learning, Explaining
learner = Learning.Scikitlearn("../dataset/compas.csv", problem_type='classification')
model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.RF, splitting_parameters={'random_state':0})
dict_information = learner.get_instances(model, n=1, indexes=Learning.TEST, is_correct=False, details=True, seed=2)
instance = dict_information["instance"]
label = dict_information["label"]
prediction = dict_information["prediction"]
-------------- Information ---------------
Problem type: classification
Instances type: tabular
Labels type: classes
Dataset path: ../dataset/compas.csv
nFeatures (nAttributes, with the labels): 11
nInstances (nObservations): 6172
nLabels: 2
--------------- Model creation, fitting and evaluation ---------------
Splitting method: hold-out
Problem type: classification
Models type: random-forest
model_parameters: {}
--------- Evaluation Information ---------
For the evaluation number 0:
Metrics:
sklearn_confusion_matrix: [[636, 216], [313, 378]]
precision: 63.63636363636363
recall: 54.70332850940666
f1_score: 58.83268482490272
specificity: 74.64788732394366
true_positive: 378
true_negative: 636
false_positive: 216
false_negative: 313
accuracy: 65.71613739468567
Number of Training instances: 4629
Number of Testing instances: 1543
--------------- Explainer ----------------
For the split number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 71
--------------- Instances ----------------
Number of instances selected: 1
----------------------------------------------
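The two steps performed above (a hold-out split, then the choice of a test instance that the model gets wrong) can be sketched in plain Python. The helpers `hold_out_split` and `pick_misclassified` are hypothetical and are not PyXAI functions. The test fraction 0.25 and the ceiling rounding are assumptions chosen to reproduce the 4629/1543 counts of the log. The actual partition chosen by PyXAI and scikit-learn will differ.

```python
import math
import random
from typing import List, Sequence, Tuple

def hold_out_split(n_instances: int, test_size: float = 0.25,
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the instance indexes and keep ceil(test_size * n) of them for testing."""
    indexes = list(range(n_instances))
    random.Random(seed).shuffle(indexes)
    n_test = math.ceil(test_size * n_instances)
    return indexes[n_test:], indexes[:n_test]  # (train, test)

def pick_misclassified(test: Sequence[int], labels: Sequence[int],
                       predictions: Sequence[int], n: int = 1, seed: int = 2) -> List[int]:
    """Randomly pick n test indexes whose prediction differs from the true label."""
    wrong = [i for i in test if labels[i] != predictions[i]]
    return random.Random(seed).sample(wrong, n)

train, test = hold_out_split(6172)
print(len(train), len(test))  # 4629 1543

# Toy labels and a constant classifier, only to exercise pick_misclassified.
labels = [i % 2 for i in range(6172)]
predictions = [0] * 6172
print(pick_misclassified(test, labels, predictions))
```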
We initialize the explainer with the selected instance and the feature types, which define the associated theory:
compas_types = {
"numerical": ["Number_of_Priors"],
"binary": ["Misdemeanor", "score_factor", "Female"],
"categorical": {"{African_American,Asian,Hispanic,Native_American,Other}": ["African_American", "Asian", "Hispanic", "Native_American", "Other"],
"Age*": ["Above_FourtyFive", "Below_TwentyFive"]}
}
explainer = Explaining.initialize(model, instance=instance, features_type=compas_types)
feature_names: ['Misdemeanor', 'Number_of_Priors', 'score_factor', 'Age_Above_FourtyFive', 'Age_Below_TwentyFive', 'African_American', 'Asian', 'Hispanic', 'Native_American', 'Other', 'Female']
--------- Theory Feature Types -----------
Before the one-hot encoding of categorical features:
Numerical features: 1
Categorical features: 2
Binary features: 3
Number of features: 6
Characteristics of categorical features: {'African_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'African_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Asian': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Asian', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Hispanic': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Hispanic', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Native_American': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Native_American', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Other': ['{African_American,Asian,Hispanic,Native_American,Other}', 'Other', ['African_American', 'Asian', 'Hispanic', 'Native_American', 'Other']], 'Age_Above_FourtyFive': ['Age', 'Above_FourtyFive', ['Above_FourtyFive', 'Below_TwentyFive']], 'Age_Below_TwentyFive': ['Age', 'Below_TwentyFive', ['Above_FourtyFive', 'Below_TwentyFive']]}
Number of used features in the model (before the encoding of categorical features): 6
Number of used features in the model (after the encoding of categorical features): 11
----------------------------------------------
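The categorical groups declared in `compas_types` tell the explainer which one-hot columns belong together, so that impossible combinations can be ruled out. Below is a minimal sketch of such constraints. It assumes at-most-one semantics per group: all-zero rows are allowed, since each group appears to omit a reference category, e.g. ages between 25 and 45. The helper names are hypothetical, and this is not necessarily how PyXAI encodes its theory internally.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# One-hot groups taken from compas_types (Age* expanded to the Age_ column names).
GROUPS: Dict[str, List[str]] = {
    "{African_American,Asian,Hispanic,Native_American,Other}":
        ["African_American", "Asian", "Hispanic", "Native_American", "Other"],
    "Age": ["Age_Above_FourtyFive", "Age_Below_TwentyFive"],
}

def at_most_one_clauses(groups: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """One binary clause (not a or not b) per pair of columns in the same group."""
    clauses: List[Tuple[str, str]] = []
    for columns in groups.values():
        clauses.extend(combinations(columns, 2))
    return clauses

def satisfies(row: Dict[str, int], clauses: List[Tuple[str, str]]) -> bool:
    # A row violates (not a or not b) only when both columns are set to 1.
    return all(not (row[a] and row[b]) for a, b in clauses)

clauses = at_most_one_clauses(GROUPS)
print(len(clauses))  # 10 pairs for the race group + 1 pair for Age = 11

columns = [c for cols in GROUPS.values() for c in cols]
young = {**{c: 0 for c in columns}, "Age_Below_TwentyFive": 1, "Other": 1}
impossible = {**young, "Age_Above_FourtyFive": 1}
print(satisfies(young, clauses), satisfies(impossible, clauses))  # True False
```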
reason = explainer.majoritary_reason(n=1)
print("Current prediction: ", explainer.target_prediction)
print("explanation:", reason)
print("to_features:", explainer.to_features(reason))
Current prediction: 1
explanation: (3, -4, 7, -8, -9, 12, -51)
to_features: ['Misdemeanor = 0', 'Number_of_Priors in ]0.5, 1.0]', 'score_factor = 1', 'Age = Below_TwentyFive', '{African_American,Asian,Hispanic,Native_American,Other} != Hispanic', 'Female = 0']
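Each integer in the explanation is a literal over the Boolean conditions (feature tests) used by the forest. A positive literal states that the condition holds for the instance, and a negative one that it does not. `to_features` translates the literals back and can merge several of them into a single condition, such as the interval ]0.5, 1.0]. Roughly, a majoritary reason is a subset of the instance's literals with two properties: it makes a strict majority of the trees predict the target class on every instance it covers, and no literal can be dropped without losing that majority. The brute-force checker below sketches that definition on a made-up forest over three Boolean variables. The helper names are hypothetical, and PyXAI does not compute reasons by enumeration.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Dict[int, bool]          # Boolean variable id -> truth value
Tree = Callable[[Assignment], int]    # toy tree: assignment -> class label
Term = Tuple[int, ...]                # signed literals, e.g. (1, -2)

def covers(term: Term, x: Assignment) -> bool:
    # Literal l holds when variable |l| is True (l > 0) or False (l < 0).
    return all(x[abs(lit)] == (lit > 0) for lit in term)

def is_implicant(term: Term, tree: Tree, target: int, n_vars: int) -> bool:
    """True if tree predicts target on every assignment of vars 1..n_vars covered by term."""
    for values in product([False, True], repeat=n_vars):
        x = dict(zip(range(1, n_vars + 1), values))
        if covers(term, x) and tree(x) != target:
            return False
    return True

def is_majoritary(term: Term, trees: List[Tree], target: int, n_vars: int) -> bool:
    votes = sum(is_implicant(term, tree, target, n_vars) for tree in trees)
    return 2 * votes > len(trees)

def is_majoritary_reason(term: Term, instance: Assignment, trees: List[Tree],
                         target: int, n_vars: int) -> bool:
    if not covers(term, instance) or not is_majoritary(term, trees, target, n_vars):
        return False
    # Dropping literals can only lower the vote count, so single drops suffice for minimality.
    return all(not is_majoritary(term[:i] + term[i + 1:], trees, target, n_vars)
               for i in range(len(term)))

trees: List[Tree] = [
    lambda x: int(x[1]),
    lambda x: int(x[1] and not x[2]),
    lambda x: int(x[3]),
]
instance = {1: True, 2: False, 3: False}  # the forest predicts 1 (two votes out of three)
print(is_majoritary_reason((1, -2), instance, trees, 1, 3))      # True
print(is_majoritary_reason((1,), instance, trees, 1, 3))         # False: one tree forced to 1
print(is_majoritary_reason((1, -2, -3), instance, trees, 1, 3))  # False: -3 can be dropped
```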
We have just computed a majoritary reason for this instance. Suppose that the user knows that every instance covered by this explanation, (3, -4, 7, -8, -9, 12, -51), should be classified as a negative instance (label 0). The model must then be rectified according to the corresponding classification rule. Once the model has been corrected, the instance is classified as the user expects:
print("current prediction:", model.predict_instance(instance))
model = explainer.rectify(conditions=reason, label=0) # impose label 0 on every instance covered by the reason
print("new prediction:", model.predict_instance(instance))
current prediction: 1
Rectify - Number of nodes - Initial (c++): 93368
Rectify - Number of nodes - After rectification (c++): 112936
Rectify - Number of nodes - After simplification with the theory (c++): 97314
Rectify - Number of nodes - After elimination of redundant nodes (c++): 61592
Rectify - Number of nodes - Final (c++): 61592
Rectification time: 0.21008984799999908
--------------
new prediction: 0
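The log above reports the number of nodes growing after rectification and shrinking again after simplification. Below is a minimal sketch of both phases on a single toy decision tree over Boolean variables. First, the rule "term implies label" is grafted on top of the tree: a chain of tests on the term's literals that falls back to the original tree as soon as a literal fails. Then a simplification pass removes tests already decided on the current path, as well as nodes whose two children are identical. This illustrates the idea under these assumptions only. It is not PyXAI's C++ implementation, which also simplifies with the theory, and the tuple encoding of trees is hypothetical.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical encoding: ("leaf", label) or ("test", var, subtree_if_false, subtree_if_true).
Node = Union[Tuple[str, int], Tuple[str, int, "Node", "Node"]]

def predict(node: Node, x: Dict[int, bool]) -> int:
    while node[0] == "test":
        _, var, if_false, if_true = node
        node = if_true if x[var] else if_false
    return node[1]

def rectify(tree: Node, term: Tuple[int, ...], label: int) -> Node:
    """Covered instances reach a leaf with `label`; any failing literal leads back to `tree`."""
    result: Node = ("leaf", label)
    for lit in reversed(term):
        if lit > 0:
            result = ("test", lit, tree, result)
        else:
            result = ("test", -lit, result, tree)
    return result

def simplify(node: Node, known: Optional[Dict[int, bool]] = None) -> Node:
    """Skip tests already decided on the path; collapse tests whose children coincide."""
    known = known or {}
    if node[0] == "leaf":
        return node
    _, var, if_false, if_true = node
    if var in known:
        return simplify(if_true if known[var] else if_false, known)
    low = simplify(if_false, {**known, var: False})
    high = simplify(if_true, {**known, var: True})
    return low if low == high else ("test", var, low, high)

def count_nodes(node: Node) -> int:
    return 1 if node[0] == "leaf" else 1 + count_nodes(node[2]) + count_nodes(node[3])

tree: Node = ("test", 1, ("leaf", 0), ("test", 3, ("leaf", 0), ("leaf", 1)))
rectified = rectify(tree, (1, -2), label=0)
simplified = simplify(rectified)
print(count_nodes(tree), count_nodes(rectified), count_nodes(simplified))  # 5 13 7

x = {1: True, 2: False, 3: True}  # covered by (1, -2)
y = {1: True, 2: True, 3: True}   # not covered
print(predict(tree, x), predict(simplified, x))  # 1 0
print(predict(tree, y), predict(simplified, y))  # 1 1
```

As in the PyXAI log, grafting increases the size of the tree, and the simplification pass then removes the tests made redundant by the rule without changing any prediction.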