{ "cells": [ { "cell_type": "markdown", "id": "b1db8c9a", "metadata": {}, "source": [ "# Direct Reason" ] }, { "cell_type": "markdown", "id": "514a5144", "metadata": {}, "source": "Let $BT$ be a boosted tree composed of the regression trees $\\{T_1, \\ldots, T_n\\}$ and let $x$ be an instance. The **direct reason** for $x$ is a subset of $t_{\\vec x}$ (the binary form of the instance) corresponding to the conjunction, over all trees $T_i$, of the term associated with the unique root-to-leaf path of $T_i$ that is compatible with $x$. Due to its simplicity, it is one of the easiest abductive explanations to compute, but it can be highly redundant. More information about the direct reason can be found in the article [Computing Abductive Explanations for Boosted Regression Trees](https://www.ijcai.org/proceedings/2023/382)." }, { "cell_type": "markdown", "id": "e4432d14", "metadata": {}, "source": [ "The basic methods ([``initialize``]({{ site.baseurl }}/documentation/api/modules/explaining/), ```set_instance```, ```to_features```, ```is_reason```, ...) of the ```explainer``` module used in the next examples are described in the [Explainer Principles](/documentation/explainer/) page. " ] }, { "cell_type": "markdown", "id": "d70f9a73", "metadata": {}, "source": [ "## Example from Hand-Crafted Trees" ] }, { "cell_type": "markdown", "id": "3df1afd3", "metadata": {}, "source": [ "Let us consider a loan application scenario that will be used as a running example. The goal is to predict\n", "the amount of money that can be granted to an applicant described using three attributes ($A = \\{A_1, A_2, A_3\\}$). \n", "- $A_1$ is a numerical attribute giving the monthly income of the applicant\n", "- $A_2$ is a categorical attribute giving the applicant's employment status: “employed”, “unemployed” or “self-employed”\n", "- $A_3$ is a Boolean attribute set to true if the applicant is married, false otherwise. 
\n", "\n", "\"BTdirect\"\n", "\n", "In this example:\n", "\n", "- $A_1$ is represented by the feature identifier $F_1$\n", "- $A_2$ has been one-hot encoded and is represented by the feature identifiers $F_2$, $F_3$ and $F_4$, which respectively represent the conditions $A_2^{1} = \\text{employed}$, $A_2^{2} = \\text{unemployed}$ and $A_2^{3} = \\text{self-employed}$\n", "- $A_3$ is represented by the feature identifier $F_5$ and the condition $(A_3 = 1)$ (“the applicant is married”)\n", "\n", "We consider the instance $x=(2200, 0, 0, 1, 1)$, corresponding to a married, self-employed applicant (the employment status being one-hot encoded) with an income of 2200 per month. Then, $F(x) = 1500 + 250 + 250 = 2000$ dollars.\n", "\n", "The direct reason for the instance $x = (2200, 0, 0, 1, 1)$ is shown in red in the figure and can be represented by $\\{A_1{>}2000, \\overline{A_1{>}3000}, A_2^3, A_3\\}$.\n", "\n", "We now show how to get it using PyXAI: " ] }, { "cell_type": "code", "execution_count": 1, "id": "411398a5", "metadata": {}, "outputs": [], "source": [ "from pyxai import Builder, Explaining\n", "\n", "# Tree 1: successive threshold tests on the monthly income (feature 1)\n", "node1_1 = Builder.DecisionNode(1, operator=Builder.GT, threshold=3000, left=1500, right=1750)\n", "node1_2 = Builder.DecisionNode(1, operator=Builder.GT, threshold=2000, left=1000, right=node1_1)\n", "node1_3 = Builder.DecisionNode(1, operator=Builder.GT, threshold=1000, left=0, right=node1_2)\n", "tree1 = Builder.DecisionTree(5, node1_3)  # 5 is the number of features\n", "\n", "\n", "# Tree 2: tests on the employment status (features 2 and 4) and the marital status (feature 5)\n", "node2_1 = Builder.DecisionNode(5, operator=Builder.EQ, threshold=1, left=100, right=250)\n", "node2_2 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=-100, right=node2_1)\n", "node2_3 = Builder.DecisionNode(2, operator=Builder.EQ, threshold=1, left=node2_2, right=250)\n", "tree2 = Builder.DecisionTree(5, node2_3)\n", "\n", "# Tree 3: tests on the employment status (features 3 and 4) and the income (feature 1)\n", "node3_1 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=500, right=250)\n", "node3_2 = Builder.DecisionNode(3, operator=Builder.EQ, threshold=1, left=250, right=100)\n", "node3_3 = Builder.DecisionNode(1, 
operator=Builder.GE, threshold=2000, left=0, right=node3_1)\n", "node3_4 = Builder.DecisionNode(4, operator=Builder.EQ, threshold=1, left=node3_3, right=node3_2)\n", "tree3 = Builder.DecisionTree(5, node3_4)\n", "\n", "\n", "BT = Builder.BoostedTreesRegression([tree1, tree2, tree3])\n" ] }, { "cell_type": "markdown", "id": "a2263e89", "metadata": {}, "source": [ "We now compute the direct reason for this instance: " ] }, { "cell_type": "code", "execution_count": 2, "id": "935837ea", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "instance: (2200, 0, 0, 1, 1)\n", "binary_representation: (1, 2, -3, -4, 5, 6, 7, -8)\n", "target_prediction: 2000\n", "direct: (1, 2, -3, -4, 5, 6, -8)\n", "to_features: ['f1 in ]2000, 3000]', 'f2 != 1', 'f3 != 1', 'f4 == 1', 'f5 == 1']\n" ] } ], "source": [ "instance = (2200, 0, 0, 1, 1)\n", "explainer = Explaining.initialize(BT)\n", "explainer.set_instance(instance)\n", "direct = explainer.direct_reason()\n", "print(\"instance:\", instance)\n", "print(\"binary_representation:\", explainer.binary_representation)\n", "print(\"target_prediction:\", explainer.target_prediction)\n", "print(\"direct:\", direct)\n", "print(\"to_features:\", explainer.to_features(direct))\n" ] }, { "cell_type": "markdown", "id": "fc75b1a7", "metadata": {}, "source": [ "As you can see, in this case the direct reason involves every feature of the instance." ] }, { "cell_type": "markdown", "id": "4061b821", "metadata": {}, "source": [ "## Example from a Real Dataset" ] }, { "cell_type": "markdown", "id": "7c187df9", "metadata": {}, "source": [ "For this example, we take the [Houses-prices](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques) dataset (a copy is available [here](/assets/notebooks/dataset/houses-prices.csv)). We create a model using the hold-out approach (by default, the test size is set to 30%) and select an instance. 
As this dataset contains strings, we encode the data using PyXAI's [Preprocessor]({{ site.baseurl }}/documentation/preprocessor/): " ] }, { "cell_type": "code", "execution_count": 3, "id": "59532d7f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: regression\n", "Instances type: tabular\n", "Labels type: None\n", "\n", "Dataset path: None\n", "--------------- Converter ---------------\n", "Feature deleted: Id\n", "One hot encoding new features for MSSubClass: 16\n", "-> The feature Street is boolean! No One Hot Encoding for this features.\n", "-> However, the boolean feature Street contains strings. A ordinal encoding must be performed.\n", "One hot encoding new features for LotShape: 4\n", "One hot encoding new features for LandContour: 4\n", "One hot encoding new features for LotConfig: 5\n", "One hot encoding new features for LandSlope: 3\n", "One hot encoding new features for Neighborhood: 25\n", "One hot encoding new features for Condition1: 9\n", "One hot encoding new features for Condition2: 8\n", "One hot encoding new features for BldgType: 5\n", "One hot encoding new features for HouseStyle: 8\n", "One hot encoding new features for OverallQual: 10\n", "One hot encoding new features for OverallCond: 9\n", "One hot encoding new features for RoofStyle: 6\n", "One hot encoding new features for RoofMatl: 8\n", "One hot encoding new features for ExterQual: 4\n", "One hot encoding new features for ExterCond: 5\n", "One hot encoding new features for Foundation: 6\n", "One hot encoding new features for Heating: 6\n", "One hot encoding new features for HeatingQC: 5\n", "-> The feature CentralAir is boolean! No One Hot Encoding for this features.\n", "-> However, the boolean feature CentralAir contains strings. 
A ordinal encoding must be performed.\n", "One hot encoding new features for PavedDrive: 3\n", "One hot encoding new features for SaleCondition: 6\n", "Dataset saved: ../../dataset/houses-prices-converted.csv\n", "Types saved: ../../dataset/houses-prices-converted.types\n", "-----------------------------------------------\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/audemard/expectation/softwares/pyxai-mlp/pyxai/sources/learners/preprocessor/tabular_preprocessor.py:500: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.data.insert(index, name, transformed_df[name], True)\n" ] } ], "source": [ "from pyxai import Learning\n", "\n", "preprocessor = Learning.TabularPreprocessor(\"../../dataset/houses-prices.csv\", target_feature=\"SalePrice\", problem_type=\"regression\")\n", "\n", "preprocessor.unset_features([\"Id\"])\n", "\n", "preprocessor.set_categorical_features(features=[\n", " \"MSSubClass\",\n", " \"Street\",\n", " \"LotShape\", \n", " \"LandContour\", \n", " \"LotConfig\", \n", " \"LandSlope\", \n", " \"Neighborhood\", \n", " \"Condition1\", \n", " \"Condition2\", \n", " \"BldgType\", \n", " \"HouseStyle\", \n", " \"OverallQual\", \n", " \"OverallCond\", \n", " \"RoofStyle\", \n", " \"RoofMatl\", \n", " \"ExterQual\", \n", " \"ExterCond\", \n", " \"Foundation\", \n", " \"Heating\", \n", " \"HeatingQC\", \n", " \"CentralAir\", \n", " \"PavedDrive\", \n", " \"SaleCondition\"])\n", "\n", "preprocessor.set_numerical_features({\n", " \"LotArea\": None,\n", " \"YearBuilt\": None, \n", " \"YearRemodAdd\": None, \n", 
" \"1stFlrSF\": None,\n", " \"2ndFlrSF\": None,\n", " \"LowQualFinSF\": None,\n", " \"GrLivArea\": None,\n", " \"FullBath\": None,\n", " \"HalfBath\": None,\n", " \"BedroomAbvGr\": None,\n", " \"KitchenAbvGr\": None,\n", " \"TotRmsAbvGrd\": None,\n", " \"Fireplaces\": None,\n", " \"WoodDeckSF\": None,\n", " \"OpenPorchSF\": None,\n", " \"EnclosedPorch\": None,\n", " \"3SsnPorch\": None,\n", " \"ScreenPorch\": None,\n", " \"PoolArea\": None,\n", " \"MiscVal\": None,\n", " \"MoSold\": None,\n", " \"YrSold\": None\n", " })\n", "\n", "\n", "preprocessor.process()\n", "dataset_name = \"../../dataset/houses-prices.csv\".split(\"/\")[-1].split(\".\")[0]+\"-converted\" \n", "preprocessor.export(dataset_name, output_directory=\"../../dataset\")" ] }, { "cell_type": "markdown", "id": "1e06e792", "metadata": {}, "source": [ "```console\n", "Index(['Id', 'MSSubClass', 'LotArea', 'Street', 'LotShape', 'LandContour',\n", " 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',\n", " 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt',\n", " 'YearRemodAdd', 'RoofStyle', 'RoofMatl', 'ExterQual', 'ExterCond',\n", " 'Foundation', 'Heating', 'HeatingQC', 'CentralAir', '1stFlrSF',\n", " '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'FullBath', 'HalfBath',\n", " 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces',\n", " 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch',\n", " 'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold',\n", " 'SaleCondition', 'SalePrice'],\n", " dtype='object')\n", "--------------- Converter ---------------\n", "Feature deleted: Id\n", "One hot encoding new features for MSSubClass: 16\n", "-> The feature Street is boolean! No One Hot Encoding for this features.\n", "-> However, the boolean feature Street contains strings. 
A ordinal encoding must be performed.\n", "One hot encoding new features for LotShape: 4\n", "One hot encoding new features for LandContour: 4\n", "One hot encoding new features for LotConfig: 5\n", "One hot encoding new features for LandSlope: 3\n", "One hot encoding new features for Neighborhood: 25\n", "One hot encoding new features for Condition1: 9\n", "One hot encoding new features for Condition2: 8\n", "One hot encoding new features for BldgType: 5\n", "One hot encoding new features for HouseStyle: 8\n", "One hot encoding new features for OverallQual: 10\n", "One hot encoding new features for OverallCond: 9\n", "One hot encoding new features for RoofStyle: 6\n", "One hot encoding new features for RoofMatl: 8\n", "One hot encoding new features for ExterQual: 4\n", "One hot encoding new features for ExterCond: 5\n", "One hot encoding new features for Foundation: 6\n", "One hot encoding new features for Heating: 6\n", "One hot encoding new features for HeatingQC: 5\n", "-> The feature CentralAir is boolean! No One Hot Encoding for this features.\n", "-> However, the boolean feature CentralAir contains strings. 
A ordinal encoding must be performed.\n", "One hot encoding new features for PavedDrive: 3\n", "One hot encoding new features for SaleCondition: 6\n", "Dataset saved: ../../dataset/houses-prices-converted_0.csv\n", "Types saved: ../../dataset/houses-prices-converted_0.types\n", "-----------------------------------------------\n", "```" ] }, { "cell_type": "markdown", "id": "8bb64dc8", "metadata": {}, "source": [ "We now train a model and pick an instance: " ] }, { "cell_type": "code", "execution_count": 4, "id": "e3fe96a1", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------- Information ---------------\n", "Problem type: regression\n", "Instances type: tabular\n", "Labels type: continuous-values\n", "\n", "Dataset path: ../../dataset/houses-prices-converted_0.csv\n", "nFeatures (nAttributes, with the labels): 179\n", "nInstances (nObservations): 2919\n", "--------------- Model creation, fitting and evaluation ---------------\n", "Splitting method: hold-out\n", "Problem type: regression\n", "Models type: boosted-tree\n", "model_parameters: {}\n", "--------- Evaluation Information ---------\n", "For the evaluation number 0:\n", "Metrics:\n", " mean_squared_error: 2285286545.0315585\n", " root_mean_squared_error: 47804.670744934105\n", " mean_absolute_error: 30935.259742048067\n", "Number of Training instances: 2189\n", "Number of Testing instances: 730\n", "\n", "--------------- Explainer ----------------\n", "For the split number 0:\n", "**Boosted Tree Regression model**\n", "nTrees: 100\n", "nVariables: 1056\n", "\n", "--------------- Instances ----------------\n", "Number of instances selected: 1\n", "----------------------------------------------\n" ] } ], "source": [ "from pyxai import Learning, Explaining\n", "\n", "learner = Learning.Xgboost(\"../../dataset/houses-prices-converted_0.csv\", problem_type=Learning.REGRESSION)\n", "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, 
model_type=Learning.BT)\n", "instance, prediction = learner.get_instances(model, n=1)" ] }, { "cell_type": "markdown", "id": "bcd926f6", "metadata": {}, "source": [ "Finally, we display the direct reason for this instance. Note that the theory derived from the feature types exported by PyXAI's Preprocessor is taken into account by passing the parameter ```features_type=\"../../dataset/houses-prices-converted_0.types\"``` to the ```initialize``` method. More information about theories is available on this [page]({{ site.baseurl }}/documentation/explainer/theories/)." ] }, { "cell_type": "code", "execution_count": 5, "id": "f5ba8cd7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "feature_names: ['MSSubClass_20', 'MSSubClass_30', 'MSSubClass_40', 'MSSubClass_45', 'MSSubClass_50', 'MSSubClass_60', 'MSSubClass_70', 'MSSubClass_75', 'MSSubClass_80', 'MSSubClass_85', 'MSSubClass_90', 'MSSubClass_120', 'MSSubClass_150', 'MSSubClass_160', 'MSSubClass_180', 'MSSubClass_190', 'LotArea', 'Street', 'LotShape_IR1', 'LotShape_IR2', 'LotShape_IR3', 'LotShape_Reg', 'LandContour_Bnk', 'LandContour_HLS', 'LandContour_Low', 'LandContour_Lvl', 'LotConfig_Corner', 'LotConfig_CulDSac', 'LotConfig_FR2', 'LotConfig_FR3', 'LotConfig_Inside', 'LandSlope_Gtl', 'LandSlope_Mod', 'LandSlope_Sev', 'Neighborhood_Blmngtn', 'Neighborhood_Blueste', 'Neighborhood_BrDale', 'Neighborhood_BrkSide', 'Neighborhood_ClearCr', 'Neighborhood_CollgCr', 'Neighborhood_Crawfor', 'Neighborhood_Edwards', 'Neighborhood_Gilbert', 'Neighborhood_IDOTRR', 'Neighborhood_MeadowV', 'Neighborhood_Mitchel', 'Neighborhood_NAmes', 'Neighborhood_NPkVill', 'Neighborhood_NWAmes', 'Neighborhood_NoRidge', 'Neighborhood_NridgHt', 'Neighborhood_OldTown', 'Neighborhood_SWISU', 'Neighborhood_Sawyer', 'Neighborhood_SawyerW', 'Neighborhood_Somerst', 'Neighborhood_StoneBr', 'Neighborhood_Timber', 'Neighborhood_Veenker', 'Condition1_Artery', 'Condition1_Feedr', 'Condition1_Norm', 'Condition1_PosA', 'Condition1_PosN', 'Condition1_RRAe', 
'Condition1_RRAn', 'Condition1_RRNe', 'Condition1_RRNn', 'Condition2_Artery', 'Condition2_Feedr', 'Condition2_Norm', 'Condition2_PosA', 'Condition2_PosN', 'Condition2_RRAe', 'Condition2_RRAn', 'Condition2_RRNn', 'BldgType_1Fam', 'BldgType_2fmCon', 'BldgType_Duplex', 'BldgType_Twnhs', 'BldgType_TwnhsE', 'HouseStyle_1.5Fin', 'HouseStyle_1.5Unf', 'HouseStyle_1Story', 'HouseStyle_2.5Fin', 'HouseStyle_2.5Unf', 'HouseStyle_2Story', 'HouseStyle_SFoyer', 'HouseStyle_SLvl', 'OverallQual_1', 'OverallQual_2', 'OverallQual_3', 'OverallQual_4', 'OverallQual_5', 'OverallQual_6', 'OverallQual_7', 'OverallQual_8', 'OverallQual_9', 'OverallQual_10', 'OverallCond_1', 'OverallCond_2', 'OverallCond_3', 'OverallCond_4', 'OverallCond_5', 'OverallCond_6', 'OverallCond_7', 'OverallCond_8', 'OverallCond_9', 'YearBuilt', 'YearRemodAdd', 'RoofStyle_Flat', 'RoofStyle_Gable', 'RoofStyle_Gambrel', 'RoofStyle_Hip', 'RoofStyle_Mansard', 'RoofStyle_Shed', 'RoofMatl_ClyTile', 'RoofMatl_CompShg', 'RoofMatl_Membran', 'RoofMatl_Metal', 'RoofMatl_Roll', 'RoofMatl_Tar&Grv', 'RoofMatl_WdShake', 'RoofMatl_WdShngl', 'ExterQual_Ex', 'ExterQual_Fa', 'ExterQual_Gd', 'ExterQual_TA', 'ExterCond_Ex', 'ExterCond_Fa', 'ExterCond_Gd', 'ExterCond_Po', 'ExterCond_TA', 'Foundation_BrkTil', 'Foundation_CBlock', 'Foundation_PConc', 'Foundation_Slab', 'Foundation_Stone', 'Foundation_Wood', 'Heating_Floor', 'Heating_GasA', 'Heating_GasW', 'Heating_Grav', 'Heating_OthW', 'Heating_Wall', 'HeatingQC_Ex', 'HeatingQC_Fa', 'HeatingQC_Gd', 'HeatingQC_Po', 'HeatingQC_TA', 'CentralAir', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'PavedDrive_N', 'PavedDrive_P', 'PavedDrive_Y', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold', 'SaleCondition_Abnorml', 'SaleCondition_AdjLand', 'SaleCondition_Alloca', 'SaleCondition_Family', 'SaleCondition_Normal', 
'SaleCondition_Partial']\n", "--------- Theory Feature Types -----------\n", "Before the one-hot encoding of categorical features:\n", "Numerical features: 22\n", "Categorical features: 21\n", "Binary features: 2\n", "Number of features: 45\n", "Characteristics of categorical features: {'MSSubClass_20': ['MSSubClass', 20, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_30': ['MSSubClass', 30, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_40': ['MSSubClass', 40, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_45': ['MSSubClass', 45, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_50': ['MSSubClass', 50, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_60': ['MSSubClass', 60, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_70': ['MSSubClass', 70, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_75': ['MSSubClass', 75, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_80': ['MSSubClass', 80, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_85': ['MSSubClass', 85, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_90': ['MSSubClass', 90, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_120': ['MSSubClass', 120, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_150': ['MSSubClass', 150, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_160': ['MSSubClass', 160, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_180': ['MSSubClass', 180, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 120, 150, 160, 180, 190]], 'MSSubClass_190': ['MSSubClass', 190, [20, 30, 40, 45, 50, 60, 70, 75, 80, 85, 90, 
120, 150, 160, 180, 190]], 'LotShape_IR1': ['LotShape', 'IR1', ['IR1', 'IR2', 'IR3', 'Reg']], 'LotShape_IR2': ['LotShape', 'IR2', ['IR1', 'IR2', 'IR3', 'Reg']], 'LotShape_IR3': ['LotShape', 'IR3', ['IR1', 'IR2', 'IR3', 'Reg']], 'LotShape_Reg': ['LotShape', 'Reg', ['IR1', 'IR2', 'IR3', 'Reg']], 'LandContour_Bnk': ['LandContour', 'Bnk', ['Bnk', 'HLS', 'Low', 'Lvl']], 'LandContour_HLS': ['LandContour', 'HLS', ['Bnk', 'HLS', 'Low', 'Lvl']], 'LandContour_Low': ['LandContour', 'Low', ['Bnk', 'HLS', 'Low', 'Lvl']], 'LandContour_Lvl': ['LandContour', 'Lvl', ['Bnk', 'HLS', 'Low', 'Lvl']], 'LotConfig_Corner': ['LotConfig', 'Corner', ['Corner', 'CulDSac', 'FR2', 'FR3', 'Inside']], 'LotConfig_CulDSac': ['LotConfig', 'CulDSac', ['Corner', 'CulDSac', 'FR2', 'FR3', 'Inside']], 'LotConfig_FR2': ['LotConfig', 'FR2', ['Corner', 'CulDSac', 'FR2', 'FR3', 'Inside']], 'LotConfig_FR3': ['LotConfig', 'FR3', ['Corner', 'CulDSac', 'FR2', 'FR3', 'Inside']], 'LotConfig_Inside': ['LotConfig', 'Inside', ['Corner', 'CulDSac', 'FR2', 'FR3', 'Inside']], 'LandSlope_Gtl': ['LandSlope', 'Gtl', ['Gtl', 'Mod', 'Sev']], 'LandSlope_Mod': ['LandSlope', 'Mod', ['Gtl', 'Mod', 'Sev']], 'LandSlope_Sev': ['LandSlope', 'Sev', ['Gtl', 'Mod', 'Sev']], 'Neighborhood_Blmngtn': ['Neighborhood', 'Blmngtn', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_Blueste': ['Neighborhood', 'Blueste', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_BrDale': ['Neighborhood', 'BrDale', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 
'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_BrkSide': ['Neighborhood', 'BrkSide', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_ClearCr': ['Neighborhood', 'ClearCr', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_CollgCr': ['Neighborhood', 'CollgCr', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_Crawfor': ['Neighborhood', 'Crawfor', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_Edwards': ['Neighborhood', 'Edwards', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_Gilbert': ['Neighborhood', 'Gilbert', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 
'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_IDOTRR': ['Neighborhood', 'IDOTRR', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_MeadowV': ['Neighborhood', 'MeadowV', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_Mitchel': ['Neighborhood', 'Mitchel', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_NAmes': ['Neighborhood', 'NAmes', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_NPkVill': ['Neighborhood', 'NPkVill', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_NWAmes': ['Neighborhood', 'NWAmes', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 
'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_NoRidge': ['Neighborhood', 'NoRidge', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_NridgHt': ['Neighborhood', 'NridgHt', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_OldTown': ['Neighborhood', 'OldTown', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_SWISU': ['Neighborhood', 'SWISU', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_Sawyer': ['Neighborhood', 'Sawyer', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_SawyerW': ['Neighborhood', 'SawyerW', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_Somerst': 
['Neighborhood', 'Somerst', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_StoneBr': ['Neighborhood', 'StoneBr', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_Timber': ['Neighborhood', 'Timber', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Neighborhood_Veenker': ['Neighborhood', 'Veenker', ['Blmngtn', 'Blueste', 'BrDale', 'BrkSide', 'ClearCr', 'CollgCr', 'Crawfor', 'Edwards', 'Gilbert', 'IDOTRR', 'MeadowV', 'Mitchel', 'NAmes', 'NPkVill', 'NWAmes', 'NoRidge', 'NridgHt', 'OldTown', 'SWISU', 'Sawyer', 'SawyerW', 'Somerst', 'StoneBr', 'Timber', 'Veenker']], 'Condition1_Artery': ['Condition1', 'Artery', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNe', 'RRNn']], 'Condition1_Feedr': ['Condition1', 'Feedr', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNe', 'RRNn']], 'Condition1_Norm': ['Condition1', 'Norm', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNe', 'RRNn']], 'Condition1_PosA': ['Condition1', 'PosA', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNe', 'RRNn']], 'Condition1_PosN': ['Condition1', 'PosN', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNe', 'RRNn']], 'Condition1_RRAe': ['Condition1', 'RRAe', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNe', 'RRNn']], 
'Condition1_RRAn': ['Condition1', 'RRAn', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNe', 'RRNn']], 'Condition1_RRNe': ['Condition1', 'RRNe', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNe', 'RRNn']], 'Condition1_RRNn': ['Condition1', 'RRNn', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNe', 'RRNn']], 'Condition2_Artery': ['Condition2', 'Artery', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNn']], 'Condition2_Feedr': ['Condition2', 'Feedr', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNn']], 'Condition2_Norm': ['Condition2', 'Norm', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNn']], 'Condition2_PosA': ['Condition2', 'PosA', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNn']], 'Condition2_PosN': ['Condition2', 'PosN', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNn']], 'Condition2_RRAe': ['Condition2', 'RRAe', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNn']], 'Condition2_RRAn': ['Condition2', 'RRAn', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNn']], 'Condition2_RRNn': ['Condition2', 'RRNn', ['Artery', 'Feedr', 'Norm', 'PosA', 'PosN', 'RRAe', 'RRAn', 'RRNn']], 'BldgType_1Fam': ['BldgType', '1Fam', ['1Fam', '2fmCon', 'Duplex', 'Twnhs', 'TwnhsE']], 'BldgType_2fmCon': ['BldgType', '2fmCon', ['1Fam', '2fmCon', 'Duplex', 'Twnhs', 'TwnhsE']], 'BldgType_Duplex': ['BldgType', 'Duplex', ['1Fam', '2fmCon', 'Duplex', 'Twnhs', 'TwnhsE']], 'BldgType_Twnhs': ['BldgType', 'Twnhs', ['1Fam', '2fmCon', 'Duplex', 'Twnhs', 'TwnhsE']], 'BldgType_TwnhsE': ['BldgType', 'TwnhsE', ['1Fam', '2fmCon', 'Duplex', 'Twnhs', 'TwnhsE']], 'HouseStyle_1.5Fin': ['HouseStyle', '1.5Fin', ['1.5Fin', '1.5Unf', '1Story', '2.5Fin', '2.5Unf', '2Story', 'SFoyer', 'SLvl']], 'HouseStyle_1.5Unf': ['HouseStyle', '1.5Unf', ['1.5Fin', '1.5Unf', '1Story', '2.5Fin', '2.5Unf', '2Story', 'SFoyer', 'SLvl']], 'HouseStyle_1Story': 
['HouseStyle', '1Story', ['1.5Fin', '1.5Unf', '1Story', '2.5Fin', '2.5Unf', '2Story', 'SFoyer', 'SLvl']], 'HouseStyle_2.5Fin': ['HouseStyle', '2.5Fin', ['1.5Fin', '1.5Unf', '1Story', '2.5Fin', '2.5Unf', '2Story', 'SFoyer', 'SLvl']], 'HouseStyle_2.5Unf': ['HouseStyle', '2.5Unf', ['1.5Fin', '1.5Unf', '1Story', '2.5Fin', '2.5Unf', '2Story', 'SFoyer', 'SLvl']], 'HouseStyle_2Story': ['HouseStyle', '2Story', ['1.5Fin', '1.5Unf', '1Story', '2.5Fin', '2.5Unf', '2Story', 'SFoyer', 'SLvl']], 'HouseStyle_SFoyer': ['HouseStyle', 'SFoyer', ['1.5Fin', '1.5Unf', '1Story', '2.5Fin', '2.5Unf', '2Story', 'SFoyer', 'SLvl']], 'HouseStyle_SLvl': ['HouseStyle', 'SLvl', ['1.5Fin', '1.5Unf', '1Story', '2.5Fin', '2.5Unf', '2Story', 'SFoyer', 'SLvl']], 'OverallQual_1': ['OverallQual', 1, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], 'OverallQual_2': ['OverallQual', 2, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], 'OverallQual_3': ['OverallQual', 3, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], 'OverallQual_4': ['OverallQual', 4, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], 'OverallQual_5': ['OverallQual', 5, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], 'OverallQual_6': ['OverallQual', 6, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], 'OverallQual_7': ['OverallQual', 7, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], 'OverallQual_8': ['OverallQual', 8, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], 'OverallQual_9': ['OverallQual', 9, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], 'OverallQual_10': ['OverallQual', 10, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], 'OverallCond_1': ['OverallCond', 1, [1, 2, 3, 4, 5, 6, 7, 8, 9]], 'OverallCond_2': ['OverallCond', 2, [1, 2, 3, 4, 5, 6, 7, 8, 9]], 'OverallCond_3': ['OverallCond', 3, [1, 2, 3, 4, 5, 6, 7, 8, 9]], 'OverallCond_4': ['OverallCond', 4, [1, 2, 3, 4, 5, 6, 7, 8, 9]], 'OverallCond_5': ['OverallCond', 5, [1, 2, 3, 4, 5, 6, 7, 8, 9]], 'OverallCond_6': ['OverallCond', 6, [1, 2, 3, 4, 5, 6, 7, 8, 9]], 'OverallCond_7': ['OverallCond', 7, [1, 2, 3, 4, 5, 6, 7, 8, 9]], 'OverallCond_8': ['OverallCond', 8, [1, 2, 3, 4, 5, 6, 7, 8, 9]], 'OverallCond_9': ['OverallCond', 
9, [1, 2, 3, 4, 5, 6, 7, 8, 9]], 'RoofStyle_Flat': ['RoofStyle', 'Flat', ['Flat', 'Gable', 'Gambrel', 'Hip', 'Mansard', 'Shed']], 'RoofStyle_Gable': ['RoofStyle', 'Gable', ['Flat', 'Gable', 'Gambrel', 'Hip', 'Mansard', 'Shed']], 'RoofStyle_Gambrel': ['RoofStyle', 'Gambrel', ['Flat', 'Gable', 'Gambrel', 'Hip', 'Mansard', 'Shed']], 'RoofStyle_Hip': ['RoofStyle', 'Hip', ['Flat', 'Gable', 'Gambrel', 'Hip', 'Mansard', 'Shed']], 'RoofStyle_Mansard': ['RoofStyle', 'Mansard', ['Flat', 'Gable', 'Gambrel', 'Hip', 'Mansard', 'Shed']], 'RoofStyle_Shed': ['RoofStyle', 'Shed', ['Flat', 'Gable', 'Gambrel', 'Hip', 'Mansard', 'Shed']], 'RoofMatl_ClyTile': ['RoofMatl', 'ClyTile', ['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl']], 'RoofMatl_CompShg': ['RoofMatl', 'CompShg', ['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl']], 'RoofMatl_Membran': ['RoofMatl', 'Membran', ['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl']], 'RoofMatl_Metal': ['RoofMatl', 'Metal', ['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl']], 'RoofMatl_Roll': ['RoofMatl', 'Roll', ['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl']], 'RoofMatl_Tar&Grv': ['RoofMatl', 'Tar&Grv', ['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl']], 'RoofMatl_WdShake': ['RoofMatl', 'WdShake', ['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl']], 'RoofMatl_WdShngl': ['RoofMatl', 'WdShngl', ['ClyTile', 'CompShg', 'Membran', 'Metal', 'Roll', 'Tar&Grv', 'WdShake', 'WdShngl']], 'ExterQual_Ex': ['ExterQual', 'Ex', ['Ex', 'Fa', 'Gd', 'TA']], 'ExterQual_Fa': ['ExterQual', 'Fa', ['Ex', 'Fa', 'Gd', 'TA']], 'ExterQual_Gd': ['ExterQual', 'Gd', ['Ex', 'Fa', 'Gd', 'TA']], 'ExterQual_TA': ['ExterQual', 'TA', ['Ex', 'Fa', 'Gd', 'TA']], 'ExterCond_Ex': ['ExterCond', 'Ex', ['Ex', 'Fa', 'Gd', 'Po', 'TA']], 'ExterCond_Fa': 
['ExterCond', 'Fa', ['Ex', 'Fa', 'Gd', 'Po', 'TA']], 'ExterCond_Gd': ['ExterCond', 'Gd', ['Ex', 'Fa', 'Gd', 'Po', 'TA']], 'ExterCond_Po': ['ExterCond', 'Po', ['Ex', 'Fa', 'Gd', 'Po', 'TA']], 'ExterCond_TA': ['ExterCond', 'TA', ['Ex', 'Fa', 'Gd', 'Po', 'TA']], 'Foundation_BrkTil': ['Foundation', 'BrkTil', ['BrkTil', 'CBlock', 'PConc', 'Slab', 'Stone', 'Wood']], 'Foundation_CBlock': ['Foundation', 'CBlock', ['BrkTil', 'CBlock', 'PConc', 'Slab', 'Stone', 'Wood']], 'Foundation_PConc': ['Foundation', 'PConc', ['BrkTil', 'CBlock', 'PConc', 'Slab', 'Stone', 'Wood']], 'Foundation_Slab': ['Foundation', 'Slab', ['BrkTil', 'CBlock', 'PConc', 'Slab', 'Stone', 'Wood']], 'Foundation_Stone': ['Foundation', 'Stone', ['BrkTil', 'CBlock', 'PConc', 'Slab', 'Stone', 'Wood']], 'Foundation_Wood': ['Foundation', 'Wood', ['BrkTil', 'CBlock', 'PConc', 'Slab', 'Stone', 'Wood']], 'Heating_Floor': ['Heating', 'Floor', ['Floor', 'GasA', 'GasW', 'Grav', 'OthW', 'Wall']], 'Heating_GasA': ['Heating', 'GasA', ['Floor', 'GasA', 'GasW', 'Grav', 'OthW', 'Wall']], 'Heating_GasW': ['Heating', 'GasW', ['Floor', 'GasA', 'GasW', 'Grav', 'OthW', 'Wall']], 'Heating_Grav': ['Heating', 'Grav', ['Floor', 'GasA', 'GasW', 'Grav', 'OthW', 'Wall']], 'Heating_OthW': ['Heating', 'OthW', ['Floor', 'GasA', 'GasW', 'Grav', 'OthW', 'Wall']], 'Heating_Wall': ['Heating', 'Wall', ['Floor', 'GasA', 'GasW', 'Grav', 'OthW', 'Wall']], 'HeatingQC_Ex': ['HeatingQC', 'Ex', ['Ex', 'Fa', 'Gd', 'Po', 'TA']], 'HeatingQC_Fa': ['HeatingQC', 'Fa', ['Ex', 'Fa', 'Gd', 'Po', 'TA']], 'HeatingQC_Gd': ['HeatingQC', 'Gd', ['Ex', 'Fa', 'Gd', 'Po', 'TA']], 'HeatingQC_Po': ['HeatingQC', 'Po', ['Ex', 'Fa', 'Gd', 'Po', 'TA']], 'HeatingQC_TA': ['HeatingQC', 'TA', ['Ex', 'Fa', 'Gd', 'Po', 'TA']], 'PavedDrive_N': ['PavedDrive', 'N', ['N', 'P', 'Y']], 'PavedDrive_P': ['PavedDrive', 'P', ['N', 'P', 'Y']], 'PavedDrive_Y': ['PavedDrive', 'Y', ['N', 'P', 'Y']], 'SaleCondition_Abnorml': ['SaleCondition', 'Abnorml', ['Abnorml', 'AdjLand', 'Alloca', 'Family', 
'Normal', 'Partial']], 'SaleCondition_AdjLand': ['SaleCondition', 'AdjLand', ['Abnorml', 'AdjLand', 'Alloca', 'Family', 'Normal', 'Partial']], 'SaleCondition_Alloca': ['SaleCondition', 'Alloca', ['Abnorml', 'AdjLand', 'Alloca', 'Family', 'Normal', 'Partial']], 'SaleCondition_Family': ['SaleCondition', 'Family', ['Abnorml', 'AdjLand', 'Alloca', 'Family', 'Normal', 'Partial']], 'SaleCondition_Normal': ['SaleCondition', 'Normal', ['Abnorml', 'AdjLand', 'Alloca', 'Family', 'Normal', 'Partial']], 'SaleCondition_Partial': ['SaleCondition', 'Partial', ['Abnorml', 'AdjLand', 'Alloca', 'Family', 'Normal', 'Partial']]}\n", "\n", "Number of used features in the model (before the encoding of categorical features): 45\n", "Number of used features in the model (after the encoding of categorical features): 156\n", "----------------------------------------------\n", "instance: MSSubClass_20 0\n", "MSSubClass_30 0\n", "MSSubClass_40 0\n", "MSSubClass_45 0\n", "MSSubClass_50 0\n", " ..\n", "SaleCondition_AdjLand 0\n", "SaleCondition_Alloca 0\n", "SaleCondition_Family 0\n", "SaleCondition_Normal 1\n", "SaleCondition_Partial 0\n", "Name: 0, Length: 179, dtype: int64\n", "prediction: 187277.17\n", "\n", "len binary representation: 1056\n", "len direct: 349\n", "to_features: ['MSSubClass = 60', 'LotArea in [8170.0, 8750.0[', 'Street = 1', 'LotShape != {IR2,IR3}', 'LandContour = Lvl', 'LotConfig = Inside', 'LandSlope != {Sev,Mod}', 'Neighborhood = CollgCr', 'Condition1 != {Artery,PosN,RRNn,RRAe,RRNe}', 'Condition2 = Norm', 'BldgType = 1Fam', 'HouseStyle != {1.5Fin,SFoyer,SLvl}', 'OverallQual = 7', 'OverallCond = 5', 'YearBuilt in [1996.0, 2006.0[', 'YearRemodAdd in [2002.0, 2006.0[', 'RoofStyle != {Mansard,Flat,Shed}', 'ExterQual != {TA,Fa,Ex}', 'ExterCond = TA', 'Foundation = PConc', 'Heating = GasA', 'HeatingQC != TA', 'CentralAir = 1', '1stFlrSF in [728.0, 866.0[', '2ndFlrSF in [843.0, 863.0[', 'LowQualFinSF < 140.0', 'GrLivArea in [1702.0, 1802.0[', 'FullBath >= 2.0', 'HalfBath >= 
1.0', 'BedroomAbvGr in [3.0, 4.0[', 'KitchenAbvGr in [1.0, 2.0[', 'TotRmsAbvGrd in [7.0, 10.0[', 'Fireplaces < 1.0', 'PavedDrive = Y', 'WoodDeckSF < 48.0', 'OpenPorchSF in [59.0, 63.0[', 'EnclosedPorch < 32.0', '3SsnPorch < 144.0', 'ScreenPorch < 53.0', 'MiscVal < 54.0', 'MoSold < 4.0', 'YrSold in [2008.0, 2009.0[', 'SaleCondition != {Family,AdjLand}']\n" ] } ], "source": [ "explainer = Explaining.initialize(model, instance, features_type=\"../../dataset/houses-prices-converted_0.types\")\n", "print(\"instance:\", instance)\n", "print(\"prediction:\", prediction)\n", "print()\n", "direct_reason = explainer.direct_reason()\n", "print(\"len binary representation:\", len(explainer.binary_representation))\n", "print(\"len direct:\", len(direct_reason))\n", "print(\"to_features:\", explainer.to_features(direct_reason))" ] }, { "cell_type": "markdown", "id": "663667d5", "metadata": {}, "source": [ "Note that the direct reason for this instance $x$ contains 349 binary variables of $t_{\\vec x}$ out of 1056. This reason explains why the model predicts the regression value for this instance. However, it is probably not the most compact reason for this instance; we invite you to look at the other types of reasons presented on the [Boosted Tree Explanations]({{ site.baseurl }}/documentation/regression/BTregression/) page. In particular, the [Tree-Specific]({{ site.baseurl }}/documentation/regression/BTregression/treespecific/) reasons are often more compact and therefore more interpretable. 
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }