
# Direct Reason

Let $BT$ be a boosted tree composed of regression trees $\{T_1,\ldots,T_n\}$ and $x$ an instance. The direct reason for $x$ is the subset of $t_{\vec x}$ (the binary form of the instance) corresponding to the conjunction, for each $T_i$, of the term associated with the unique root-to-leaf path of $T_i$ that is compatible with $x$. Due to its simplicity, it is one of the easiest abductive explanations to compute, but it can be highly redundant. More information about the direct reason can be found in the article Computing Abductive Explanations for Boosted Regression Trees.
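As a library-independent sketch of this definition (the helper below is illustrative and not part of PyXAI's API), the direct reason can be seen as the union of the literals collected along the compatible path of each tree:

```python
# Illustrative sketch (not the PyXAI API): build a direct reason from the
# unique root-to-leaf path that each tree follows for the instance.
def direct_reason(compatible_paths):
    literals = set()
    for path in compatible_paths:  # one literal list per tree T_i
        literals.update(path)      # conjunction of the path's literals
    return frozenset(literals)

# Two toy trees whose compatible paths share the literal 1; the reason
# is the (possibly redundant) union of both terms.
reason = direct_reason([[1, -2], [1, 3]])
assert reason == frozenset({1, -2, 3})
```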

<Explainer Object>.direct_reason():
Returns the direct reason for the current instance, or None if this reason contains some excluded features. All kinds of operators in the conditions are supported. The reason is given in the form of binary variables; use the to_features method if you want a representation based on the original features.

The basic methods (initialize, set_instance, to_features, is_reason, …) of the explainer module used in the next examples are described on the Explainer Principles page.

## Example from Hand-Crafted Trees

Let us consider a loan application scenario that will be used as a running example. The goal is to predict the amount of money that can be granted to an applicant described by three attributes ($A = \{A_1, A_2, A_3\}$).

• $A_1$ is a numerical attribute giving the income per month of the applicant
• $A_2$ is a categorical feature giving the applicant's employment status: "employed", "unemployed" or "self-employed"
• $A_3$ is a Boolean feature set to true if the customer is married, false otherwise.

In this example:

• $A_1$ is represented by the feature identifier $F_1$
• $A_2$ has been one-hot encoded and is represented by the feature identifiers $F_2$, $F_3$ and $F_4$, which respectively represent the conditions $A_2^{1} = employed$, $A_2^{2} = unemployed$ and $A_2^{3} = self\text{-}employed$
• $A_3$ is represented by the feature identifier $F_5$ and the condition $(A_3 = 1)$ (”the applicant is married”)

We consider the instance $x=(2200, 0, 0, 1, 1)$, corresponding to a person with a salary of 2200 per month, self-employed (one-hot encoded) and married. The model then predicts $F(x) = 1500 + 250 + 250 = 2000$.

The direct reason for the instance $x = (2200, 0, 0, 1, 1)$ can be represented by $\{A_1{>}2000, \overline{A_1{>}3000}, A_2^3, A_3\}$.

We now show how to get it using PyXAI:

```python
from pyxai import Builder, Explainer

node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=3000, left=1500, right=1750)
node1_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2000, left=1000, right=node1_1)
node1_3 = Builder.DecisionNode(1, operator=Builder.GT, threshold=1000, left=0, right=node1_2)
tree1 = Builder.DecisionTree(5, node1_3)

node2_1 = Builder.DecisionNode(5, operator=Builder.EQ, threshold=1, left=100, right=250)
node2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-100, right=node2_1)
node2_3 = Builder.DecisionNode(2, operator=Builder.EQ, threshold=1, left=node2_2, right=250)
tree2 = Builder.DecisionTree(5, node2_3)

node3_1 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=500, right=250)
node3_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=250, right=100)
node3_3 = Builder.DecisionNode(1, operator=Builder.GE, threshold=2000, left=0, right=node3_1)
node3_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=node3_3, right=node3_2)
tree3 = Builder.DecisionTree(5, node3_4)

BT = Builder.BoostedTreesRegression([tree1, tree2, tree3])
```
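To see where $F(x) = 2000$ comes from, here is a minimal, library-independent sketch that traverses the three hand-crafted trees for $x$ (plain dicts standing in for the Builder objects; this is not the PyXAI API):

```python
# Library-independent sketch (plain dicts, not the PyXAI Builder API) of
# how each regression tree maps the instance to a leaf value: "right" is
# the branch taken when the node's condition holds.
def node(feature, op, threshold, left, right):
    return {"f": feature, "op": op, "t": threshold, "left": left, "right": right}

def evaluate(n, x):
    if not isinstance(n, dict):          # leaf: its regression value
        return n
    v = x[n["f"] - 1]                    # feature identifiers start at 1
    holds = {"GT": v > n["t"], "GE": v >= n["t"], "EQ": v == n["t"]}[n["op"]]
    return evaluate(n["right"] if holds else n["left"], x)

# The three hand-crafted trees above, transcribed as dicts
tree1 = node(1, "GT", 1000, 0, node(1, "GT", 2000, 1000, node(1, "GT", 3000, 1500, 1750)))
tree2 = node(2, "EQ", 1, node(4, "EQ", 1, -100, node(5, "EQ", 1, 100, 250)), 250)
tree3 = node(4, "EQ", 1, node(1, "GE", 2000, 0, node(3, "EQ", 1, 500, 250)), node(3, "EQ", 1, 250, 100))

x = (2200, 0, 0, 1, 1)
prediction = sum(evaluate(t, x) for t in (tree1, tree2, tree3))
assert prediction == 2000                # 1500 + 250 + 250
```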



We now compute the direct reason for this instance:

```python
explainer = Explainer.initialize(BT)
explainer.set_instance((2200, 0, 0, 1, 1))
direct = explainer.direct_reason()
print("instance: (2200, 0, 0, 1, 1)")
print("binary_representation:", explainer.binary_representation)
print("target_prediction:", explainer.target_prediction)
print("direct:", direct)
print("to_features:", explainer.to_features(direct))
```


instance: (2200, 0, 0, 1, 1)
binary_representation: (1, 2, -3, -4, 5, 6, 7, -8)
target_prediction: 2000
direct: (1, 2, -3, -4, 5, 6, -8)
to_features: ('f1 in ]2000, 3000]', 'f2 != 1', 'f3 != 1', 'f4 == 1', 'f5 == 1')


As you can see, in this case, the direct reason corresponds to almost the full instance: it keeps 7 of the 8 literals of the binary representation.
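To make this concrete, here is a quick check in plain Python (values copied from the output above; this is not the PyXAI API) that the direct reason is a subset of the binary representation:

```python
# Values copied from the run above.
binary_representation = (1, 2, -3, -4, 5, 6, 7, -8)
direct = (1, 2, -3, -4, 5, 6, -8)

assert set(direct) <= set(binary_representation)       # subset of t_x
assert len(direct) == len(binary_representation) - 1   # only literal 7 is dropped
```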

## Example from a Real Dataset

For this example, we take the Houses-prices dataset. We create a model using the hold-out approach (by default, the test size is set to 30%) and select an instance. As this dataset contains strings, we encode the data using PyXAI's Preprocessor:

```python
from pyxai import Learning

preprocessor = Learning.Preprocessor("../../dataset/houses-prices.csv", target_feature="SalePrice", learner_type=Learning.REGRESSION)

preprocessor.unset_features(["Id"])

preprocessor.set_categorical_features(columns=[
    "MSSubClass",
    "Street",
    "LotShape",
    "LandContour",
    "LotConfig",
    "LandSlope",
    "Neighborhood",
    "Condition1",
    "Condition2",
    "BldgType",
    "HouseStyle",
    "OverallQual",
    "OverallCond",
    "RoofStyle",
    "RoofMatl",
    "ExterQual",
    "ExterCond",
    "Foundation",
    "Heating",
    "HeatingQC",
    "CentralAir",
    "PavedDrive",
    "SaleCondition"])

preprocessor.set_numerical_features({
    "LotArea": None,
    "YearBuilt": None,
    "1stFlrSF": None,
    "2ndFlrSF": None,
    "LowQualFinSF": None,
    "GrLivArea": None,
    "FullBath": None,
    "HalfBath": None,
    "BedroomAbvGr": None,
    "KitchenAbvGr": None,
    "TotRmsAbvGrd": None,
    "Fireplaces": None,
    "WoodDeckSF": None,
    "OpenPorchSF": None,
    "EnclosedPorch": None,
    "3SsnPorch": None,
    "ScreenPorch": None,
    "PoolArea": None,
    "MiscVal": None,
    "MoSold": None,
    "YrSold": None
})

preprocessor.process()
dataset_name = "../../dataset/houses-prices.csv".split("/")[-1].split(".")[0] + "-converted"
preprocessor.export(dataset_name, output_directory="../../dataset")
```
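The string manipulation used above to derive `dataset_name` can also be written with `pathlib` (an equivalent sketch, not something PyXAI requires):

```python
from pathlib import Path

# Strip the directory and the ".csv" extension, then append the suffix.
dataset_name = Path("../../dataset/houses-prices.csv").stem + "-converted"
assert dataset_name == "houses-prices-converted"
```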

Index(['Id', 'MSSubClass', 'LotArea', 'Street', 'LotShape', 'LandContour',
'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt',
'Foundation', 'Heating', 'HeatingQC', 'CentralAir', '1stFlrSF',
'2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'FullBath', 'HalfBath',
'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces',
'PavedDrive', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch',
'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold',
'SaleCondition', 'SalePrice'],
dtype='object')
---------------    Converter    ---------------
Feature deleted:  Id
One hot encoding new features for MSSubClass: 16
-> The feature Street is boolean! No One Hot Encoding for this features.
-> However, the boolean feature Street contains strings. A ordinal encoding must be performed.
One hot encoding new features for LotShape: 4
One hot encoding new features for LandContour: 4
One hot encoding new features for LotConfig: 5
One hot encoding new features for LandSlope: 3
One hot encoding new features for Neighborhood: 25
One hot encoding new features for Condition1: 9
One hot encoding new features for Condition2: 8
One hot encoding new features for BldgType: 5
One hot encoding new features for HouseStyle: 8
One hot encoding new features for OverallQual: 10
One hot encoding new features for OverallCond: 9
One hot encoding new features for RoofStyle: 6
One hot encoding new features for RoofMatl: 8
One hot encoding new features for ExterQual: 4
One hot encoding new features for ExterCond: 5
One hot encoding new features for Foundation: 6
One hot encoding new features for Heating: 6
One hot encoding new features for HeatingQC: 5
-> The feature CentralAir is boolean! No One Hot Encoding for this features.
-> However, the boolean feature CentralAir contains strings. A ordinal encoding must be performed.
One hot encoding new features for PavedDrive: 3
One hot encoding new features for SaleCondition: 6
Dataset saved: ../../dataset/houses-prices-converted.csv
Types saved: ../../dataset/houses-prices-converted.types
-----------------------------------------------

DataFrame is highly fragmented.  This is usually the result of calling frame.insert many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()
DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.

Dataset saved: ../../dataset/houses-prices-converted_0.csv
Types saved: ../../dataset/houses-prices-converted_0.types
-----------------------------------------------


Now we produce a model and pick up an instance:

```python
from pyxai import Learning, Explainer

learner = Learning.Xgboost("../../dataset/houses-prices-converted_0.csv", learner_type=Learning.REGRESSION)
model = learner.evaluate(method=Learning.HOLD_OUT, output=Learning.BT)
instance, prediction = learner.get_instances(model, n=1)
```

data:
MSSubClass_20  MSSubClass_30  MSSubClass_40  MSSubClass_45  \
0                 0              0              0              0
1                 1              0              0              0
2                 0              0              0              0
3                 0              0              0              0
4                 0              0              0              0
...             ...            ...            ...            ...
2914              0              0              0              0
2915              0              0              0              0
2916              1              0              0              0
2917              0              0              0              0
2918              0              0              0              0

MSSubClass_50  MSSubClass_60  MSSubClass_70  MSSubClass_75  \
0                 0              1              0              0
1                 0              0              0              0
2                 0              1              0              0
3                 0              0              1              0
4                 0              1              0              0
...             ...            ...            ...            ...
2914              0              0              0              0
2915              0              0              0              0
2916              0              0              0              0
2917              0              0              0              0
2918              0              1              0              0

MSSubClass_80  MSSubClass_85  ...  MiscVal  MoSold  YrSold  \
0                 0              0  ...        0       2    2008
1                 0              0  ...        0       5    2007
2                 0              0  ...        0       9    2008
3                 0              0  ...        0       2    2006
4                 0              0  ...        0      12    2008
...             ...            ...  ...      ...     ...     ...
2914              0              0  ...        0       6    2006
2915              0              0  ...        0       4    2006
2916              0              0  ...        0       9    2006
2917              0              1  ...      700       7    2006
2918              0              0  ...        0      11    2006

SaleCondition_Abnorml  SaleCondition_AdjLand  SaleCondition_Alloca  \
0                         0                      0                     0
1                         0                      0                     0
2                         0                      0                     0
3                         1                      0                     0
4                         0                      0                     0
...                     ...                    ...                   ...
2914                      0                      0                     0
2915                      1                      0                     0
2916                      1                      0                     0
2917                      0                      0                     0
2918                      0                      0                     0

SaleCondition_Family  SaleCondition_Normal  SaleCondition_Partial  \
0                        0                     1                      0
1                        0                     1                      0
2                        0                     1                      0
3                        0                     0                      0
4                        0                     1                      0
...                    ...                   ...                    ...
2914                     0                     1                      0
2915                     0                     0                      0
2916                     0                     0                      0
2917                     0                     1                      0
2918                     0                     1                      0

SalePrice
0     208500.000000
1     181500.000000
2     223500.000000
3     140000.000000
4     250000.000000
...             ...
2914  167081.220949
2915  164788.778231
2916  219222.423400
2917  184924.279659
2918  187741.866657

[2919 rows x 180 columns]
--------------   Information   ---------------
Dataset name: ../../dataset/houses-prices-converted_0.csv
nFeatures (nAttributes, with the labels): 180
nInstances (nObservations): 2919
nLabels: None
---------------   Evaluation   ---------------
method: HoldOut
output: BT
learner_type: Regression
learner_options: {'seed': 0, 'max_depth': None}
---------   Evaluation Information   ---------
For the evaluation number 0:
metrics:
mean_squared_error: 1997310553.8387074
root_mean_squared_error: 44691.28051240765
mean_absolute_error: 29588.51328599622
nTraining instances: 2043
nTest instances: 876

---------------   Explainer   ----------------
For the evaluation number 0:
**Boosted Tree model**
NClasses: None
nTrees: 100
nVariables: 1696

---------------   Instances   ----------------
number of instances selected: 1
----------------------------------------------


Finally, we display the direct reason for this instance. Note that the theory created by PyXAI's Preprocessor is taken into account by adding the parameter features_type="../../dataset/houses-prices-converted_0.types" to the initialize method. More information about theories is available on this page.

```python
explainer = Explainer.initialize(model, instance, features_type="../../dataset/houses-prices-converted_0.types")
print("instance:", instance)
print("prediction:", prediction)
print()
direct_reason = explainer.direct_reason()
print("len binary representation:", len(explainer.binary_representation))
print("len direct:", len(direct_reason))
print("is_reason:", explainer.is_reason(direct_reason))
print("to_features:", explainer.to_features(direct_reason))
```

---------   Theory Feature Types   -----------
Before the encoding (without one hot encoded features), we have:
Numerical features: 22
Categorical features: 21
Binary features: 2
Number of features: 45

Number of used features in the model (before the encoding): 44
Number of used features in the model (after the encoding): 153
----------------------------------------------
instance: [   0    0    0    0    0    1    0    0    0    0    0    0    0    0
0    0 8450    1    0    0    0    1    0    0    0    1    0    0
0    0    1    1    0    0    0    0    0    0    0    1    0    0
0    0    0    0    0    0    0    0    0    0    0    0    0    0
0    0    0    0    0    1    0    0    0    0    0    0    0    0
1    0    0    0    0    0    1    0    0    0    0    0    0    0
0    0    1    0    0    0    0    0    0    0    0    1    0    0
0    0    0    0    0    1    0    0    0    0 2003 2003    0    1
0    0    0    0    0    1    0    0    0    0    0    0    0    0
1    0    0    0    0    0    1    0    0    1    0    0    0    0
1    0    0    0    0    1    0    0    0    0    1  856  854    0
1710    2    1    3    1    8    0    0    0    1    0   61    0    0
0    0    0    2 2008    0    0    0    0    1    0]
prediction: 199248.22

len binary representation: 1696
len direct: 413
is_reason: True
to_features: ('MSSubClass = 60', 'LotArea in [8159, 8592.5[', 'LotShape = Reg', 'LandContour = Lvl', 'LotConfig != {Corner,CulDSac,FR3,FR2}', 'LandSlope = Gtl', 'Neighborhood = CollgCr', 'Condition1 = Norm', 'Condition2 != Feedr', 'HouseStyle != {1.5Fin,1.5Unf,SFoyer}', 'OverallQual = 7', 'OverallCond = 5', 'YearBuilt in [2000, 2005.5[', 'YearRemodAdd in [1975.5, 2006.5[', 'RoofStyle = Gable', 'RoofMatl != {WdShngl,Tar&Grv}', 'ExterQual != {TA,Ex,Fa}', 'ExterCond = TA', 'Foundation = PConc', 'Heating = GasA', 'HeatingQC = Ex', 'CentralAir = 1', '1stFlrSF in [843, 868.5[', '2ndFlrSF in [841.5, 899.5[', 'LowQualFinSF < 114', 'GrLivArea in [1619, 1743.5[', 'FullBath in [0.5, 2.5[', 'BedroomAbvGr in [2.5, 5.5[', 'KitchenAbvGr in [0.5, 1.5[', 'TotRmsAbvGrd in [4.5, 9.5[', 'Fireplaces < 0.5', 'PavedDrive = Y', 'WoodDeckSF < 84.5', 'OpenPorchSF in [60.5, 62.5[', 'EnclosedPorch < 19', '3SsnPorch < 88', 'ScreenPorch < 32', 'PoolArea < 496', 'MiscVal < 467.5', 'MoSold in [1.5, 5.5[', 'YrSold in [2006.5, 2009.5[', 'SaleCondition != {Family,Abnorml,Partial}')


Note that the direct reason for this instance $x$ contains 413 of the 1696 binary variables of $t_{\vec x}$. This reason explains why the model predicts this regression value for the instance, but it is probably not the most compact reason available. We invite you to look at the other types of reasons presented on the Boosted Tree Explanations page; in particular, tree-specific reasons are often more compact and therefore more interpretable.
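As a quick sanity check on the compactness figures reported above (values copied from the run; plain Python, not the PyXAI API):

```python
# The direct reason keeps 413 of the 1696 binary variables of t_x.
n_binary, n_direct = 1696, 413
fraction_kept = n_direct / n_binary
assert 0.24 < fraction_kept < 0.25  # roughly a quarter of the literals remain
```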