PyXAI

Building Models

The builder module of PyXAI lets you build a tree-based classifier by hand from nodes and leaves. This is useful for testing and checking different types of explanations. The classes and methods are defined on this page, and examples are provided in the Decision Tree, Random Forest and Boosted Tree pages.

PyXAI data structures are used directly in the builder module. A DecisionNode object represents an internal node, while a LeafNode object represents a leaf. A DecisionTree object consists of DecisionNode and LeafNode objects. Models are specified through the DecisionTree, RandomForest and BoostedTrees objects. A DecisionTree contains a single tree, whereas a RandomForest or BoostedTrees object represents a set of trees and therefore contains several DecisionTree objects. Each DecisionNode represents a condition “<id_feature> <operator> <threshold> ?” (such as “$x_4 \ge 0.5$ ?”) and is created using the builder.DecisionNode class:

builder.DecisionNode(id_feature, *, operator=builder.GE, threshold=0.5, left, right, parent=None):
Creates and returns a DecisionNode. In the process, it may also create one or two LeafNode objects without returning them. A DecisionNode represents a condition “<id_feature> <operator> <threshold> ?” (such as “$x_4 \ge 0.5$ ?”) in the model, while a LeafNode holds a value.
id_feature Integer: The feature identifier used in the condition “<id_feature> <operator> <threshold> ?”.
operator builder.GE, builder.GT, builder.LE, builder.LT, builder.EQ, builder.NEQ: The operator used in the condition “<id_feature> <operator> <threshold> ?”. Default value is builder.GE.
threshold Float: The threshold used in the condition “<id_feature> <operator> <threshold> ?”. Default value is 0.5.
parent DecisionNode None: The parent of this node. If this parameter is set to None, the parent is defined automatically when the tree is created.
left DecisionNode Integer Float: The left child of the node. When this parameter is an Integer or a Float, a LeafNode is generated.
right DecisionNode Integer Float: The right child of the node. When this parameter is an Integer or a Float, a LeafNode is generated.

When the operator and threshold parameters are not given, they take their default values. The associated condition then has the form “$x \ge 0.5$ ?”, which can be used to represent a binary feature (i.e. a feature with value 0 or 1).
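
To make the role of the three condition parameters concrete, here is a minimal sketch of how a condition “<id_feature> <operator> <threshold> ?” could be evaluated on an instance. It does not use PyXAI: the OPERATORS table and the condition_holds helper are hypothetical stand-ins written for illustration, and the sketch assumes that feature identifiers start at 1.

```python
import operator

# Hypothetical stand-ins for the six builder operator constants
# (plain Python functions, not PyXAI objects).
OPERATORS = {
    "GE": operator.ge,
    "GT": operator.gt,
    "LE": operator.le,
    "LT": operator.lt,
    "EQ": operator.eq,
    "NEQ": operator.ne,
}


def condition_holds(instance, id_feature, op="GE", threshold=0.5):
    """Evaluate "<id_feature> <operator> <threshold> ?" on an instance.

    Assumption: feature identifiers start at 1, so feature i is instance[i - 1].
    The defaults mirror the documented ones (GE and 0.5).
    """
    return OPERATORS[op](instance[id_feature - 1], threshold)


instance = [0, 1, 35.0, 1]

# Default operator and threshold: "x_2 >= 0.5 ?" on a binary feature.
binary_check = condition_holds(instance, 2)

# Explicit operator and threshold: "x_3 > 40 ?" on a numerical feature.
numeric_check = condition_holds(instance, 3, op="GT", threshold=40)
```

With the defaults, a binary feature equal to 1 satisfies the condition and a binary feature equal to 0 does not, which is why the default condition suits 0/1 features.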

builder.DecisionTree(n_features, root, target_class=0, force_features_equal_to_binaries=False):
Returns a DecisionTree of n_features which has as root node the root parameter.
n_features Integer: The total number of features used by all trees of the model (not only those of this DecisionTree).
root DecisionNode LeafNode: The root of the tree.
target_class Integer: Equal to 0 in binary classification. Not used in multi-class classification, except for BoostedTrees, where each DecisionTree represents a single class. Default value is 0.
force_features_equal_to_binaries Boolean: Setting this parameter to True ensures that the binary variables are equal to the feature identifiers. This assumes that each feature represents a condition (which is not the case for most datasets). Set this option to True only when you build a tree in which all features are Boolean conditions. Default value is False.

The DecisionTree, RandomForest and BoostedTrees classes are used to create models. During this process, binary variables representing the conditions of the nodes (of the form “<id_feature> <operator> <threshold> ?”) are defined. These binary variables are used to represent instances and explanations. The value of a binary variable identifies the condition used, and its sign indicates whether this condition is satisfied or not in the model. By default, the values of these binary variables are arbitrary: they depend on the order in which the tree is traversed. However, in the rare cases where the features themselves represent conditions (this is not true for most datasets), setting the force_features_equal_to_binaries parameter to True yields binary variables, and hence explanations, that directly match the features, without the need to use the to_features method.
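
The encoding described above can be sketched in a few lines. This is only an illustration of the idea, not PyXAI's implementation: the nested-tuple tree format and the number_conditions and to_binary helpers are hypothetical, and the actual numbering chosen by PyXAI may differ.

```python
import operator

OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le,
       "<": operator.lt, "==": operator.eq, "!=": operator.ne}

# Hypothetical tree format: (id_feature, op, threshold, left, right);
# a bare number is a leaf.
tree = (3, ">", 20, 0, (1, ">=", 0.5, 0, 1))


def number_conditions(node, ids=None):
    """Give each distinct condition an integer identifier, in traversal order."""
    if ids is None:
        ids = {}
    if isinstance(node, tuple):
        id_feature, op, threshold, left, right = node
        # Only a condition that has not been seen yet receives a new identifier.
        ids.setdefault((id_feature, op, threshold), len(ids) + 1)
        number_conditions(left, ids)
        number_conditions(right, ids)
    return ids


def to_binary(instance, ids):
    """One signed literal per condition: +v if it is satisfied, -v otherwise."""
    literals = []
    for (id_feature, op, threshold), var in ids.items():
        # Assumption: feature identifiers start at 1.
        holds = OPS[op](instance[id_feature - 1], threshold)
        literals.append(var if holds else -var)
    return literals


ids = number_conditions(tree)
binary_instance = to_binary([0, 0, 35.0], ids)
```

In this sketch, variable 1 stands for the condition on feature 3 and variable 2 for the condition on feature 1, so the binary variables do not coincide with the feature identifiers. This mismatch is what force_features_equal_to_binaries is meant to avoid when every feature is already a Boolean condition.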

builder.RandomForest|builder.BoostedTrees(trees, n_classes):
Returns a RandomForest or a BoostedTrees model made of the given trees and predicting n_classes classes.
trees List of DecisionTree: The trees.
n_classes Integer: The number of classes to predict (can be multi-class).
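
As an illustration of what a set of trees means for classification, the following sketch combines several small trees by majority vote. It relies on assumptions that this page does not state: the right child is taken when the condition holds, every condition uses the default operator (greater than or equal), and the forest predicts the most frequent class among its trees. The tuple format and both helpers are hypothetical and are not part of PyXAI.

```python
from collections import Counter

# Hypothetical tree format: (id_feature, threshold, left, right);
# a bare integer is a leaf holding a class.


def predict_tree(node, instance):
    """Follow the conditions from the root down to a leaf."""
    while isinstance(node, tuple):
        id_feature, threshold, left, right = node
        # Assumption: go right when "x >= threshold" holds, left otherwise.
        node = right if instance[id_feature - 1] >= threshold else left
    return node


def predict_forest(trees, instance):
    """Return the class predicted by the largest number of trees."""
    votes = Counter(predict_tree(tree, instance) for tree in trees)
    return votes.most_common(1)[0][0]


trees = [
    (1, 0.5, 0, 1),                  # x_1 >= 0.5 ? then 1 else 0
    (2, 0.5, 0, (1, 0.5, 0, 1)),     # x_2 >= 0.5 ? then test x_1, else 0
    (3, 0.5, 1, 0),                  # x_3 >= 0.5 ? then 0 else 1
]

prediction = predict_forest(trees, [1, 0, 0])
```

For the instance [1, 0, 0], the three trees vote 1, 0 and 1 respectively, so the sketch predicts class 1.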

The builder module of PyXAI also lets you build a tree-based regression model from nodes and leaves. This is useful for testing and verifying different types of explanations.

builder.BoostedTreesRegression(trees):
Returns a BoostedTreesRegression consisting of trees.
trees List of DecisionTree: The trees.
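
For regression, the leaves hold numerical values instead of classes. The sketch below assumes that the prediction of a boosted regression model is the sum of the leaf values reached in each tree, and leaves out any base score or learning rate that a trained model may apply. The tuple format and helper names are hypothetical, the right child is assumed to be taken when the condition holds, and none of this is PyXAI code.

```python
# Hypothetical tree format: (id_feature, threshold, left, right);
# a bare number is a leaf holding a regression value.


def leaf_value(node, instance):
    """Follow the conditions from the root down to a leaf and return its value."""
    while isinstance(node, tuple):
        id_feature, threshold, left, right = node
        # Assumption: go right when "x >= threshold" holds, left otherwise.
        node = right if instance[id_feature - 1] >= threshold else left
    return node


def predict_regression(trees, instance):
    """Sum the contributions of all trees (no base score in this sketch)."""
    return sum(leaf_value(tree, instance) for tree in trees)


trees = [
    (1, 0.5, -1.5, 2.0),                  # x_1 >= 0.5 ? then 2.0 else -1.5
    (2, 30, 0.25, (1, 0.5, 0.5, 1.0)),    # x_2 >= 30 ? then test x_1, else 0.25
]

high = predict_regression(trees, [1, 42.0])
low = predict_regression(trees, [0, 10.0])
```

Here the first instance reaches the leaves 2.0 and 1.0, and the second reaches -1.5 and 0.25.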

An example of how to build a regression model is given in the second part of the Boosted Tree page.

