Class Xgboost (extends Learner)
This class creates a model using the XGBoost library.
The methods of this class (evaluate, get_instances, …) are documented in the parent class Learner.
def __init__(self, data: str|DataFrame|NoneData = NoneData, *,
problem_type: str|ProblemType|None = None,
model_type: str|ModelType|None = None,
instances_type: str|InstancesType|None = None,
labels_type: str|LabelsType|None = None,
get_item_function: Callable|None = None,
instances_directory: str|None = None,
labels_directory: str|None = None):
Initialise the learner with the dataset and its characteristics.
The dataset can be given as a path to a CSV, JSON or Excel file, or as a pandas DataFrame. For a CSV or Excel file, the last column is treated as the label column and all other columns as features. For a JSON file, the dataset must follow a specific format (see documentation).
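To make the CSV/Excel column convention concrete, here is a minimal sketch using only Python's standard csv module. It is not PyXAI's loading code: the helper name split_features_and_labels and the sample data are illustrative assumptions.

```python
import csv
import io

def split_features_and_labels(csv_text):
    """Illustrative helper (not part of PyXAI): treat the last column as the label."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    # Every column except the last is a feature; the last one is the label.
    feature_names, label_name = header[:-1], header[-1]
    features = [record[:-1] for record in records]
    labels = [record[-1] for record in records]
    return feature_names, label_name, features, labels

# Made-up sample data: two feature columns followed by the label column.
sample = "height,weight,species\n1.2,3.4,cat\n0.5,1.1,mouse\n"
feature_names, label_name, features, labels = split_features_and_labels(sample)
print(feature_names, label_name, labels)
```

If the label is not already the last column of your file, reordering the columns before passing the dataset would be one way to match this convention.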
Parameters
data : str | pandas.DataFrame | NoneData
The dataset to use, either as a path to a CSV, JSON or Excel file, or as a pandas DataFrame.
problem_type : str | ProblemType (optional, default=None)
The type of problem (classification, regression, …)
Possible values are defined in the ProblemType enum.
model_type : str | ModelType (optional, default=None)
The type of model (linear, tree-based, neural network, …)
Can be left as None here and given later to the evaluate method.
Possible values are defined in the ModelType enum.
instances_type : str | InstancesType (optional, default=None)
The type of instances (image, tabular, text, temporal, …)
Possible values are defined in the InstancesType enum.
When set to None, it defaults to “tabular” if data is a CSV file and get_item_function is None.
labels_type : str | LabelsType (optional, default=None)
The type of labels (class, text, mask, contours, …)
Possible values are defined in the LabelsType enum.
When set to None, it defaults to “classes” if problem_type is “classification” and to “continuous-values” if problem_type is “regression”.
get_item_function : Callable (optional, default=None)
A function to get an instance from the dataset. This function is used to get an instance in the right format for the model and the explainer.
If the dataset is a pandas DataFrame and the instances are tabular, this function is not necessary and can be set to None.
In other cases, this function should be defined by the user. It should take as input a row of the dataframe and return the corresponding instance in the right format for the model and the explainer.
instances_directory : str (optional, default=None)
The directory where the instances are stored (only for a JSON dataset).
This parameter is used to extend the path of instances in the dataframe when the instances are of type image and the dataset is given as a json file.
labels_directory : str (optional, default=None)
The directory where the labels are stored (only for a JSON dataset).
This parameter is used to extend the paths of labels in the dataframe when the labels are of type masks and the dataset is given as a json file.
Warning: NOT YET IMPLEMENTED
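The default rules described above for instances_type and labels_type can be restated as the sketch below. The function names are hypothetical, the CSV check by file extension is an assumption, and only string values of problem_type are handled. This is not the library's implementation; where the descriptions above give no rule, the sketch simply returns None.

```python
def default_instances_type(instances_type, data, get_item_function):
    """Hypothetical restatement of the documented default for instances_type."""
    if instances_type is not None:
        return instances_type
    # Assumption: a CSV dataset is recognised by its file extension.
    is_csv_path = isinstance(data, str) and data.lower().endswith(".csv")
    if is_csv_path and get_item_function is None:
        return "tabular"
    return None  # other cases are not covered by the documented rule

def default_labels_type(labels_type, problem_type):
    """Hypothetical restatement of the documented default for labels_type."""
    if labels_type is not None:
        return labels_type
    defaults = {"classification": "classes", "regression": "continuous-values"}
    return defaults.get(problem_type)  # None when no documented rule applies

print(default_instances_type(None, "dataset.csv", None))
print(default_labels_type(None, "regression"))
```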
Returns
Xgboost :
A Learner object.
Examples
from pyxai import Learning, Tools
learner = Learning.Xgboost(Tools.Options.dataset, problem_type=Learning.CLASSIFICATION)
model = learner.evaluate(
    splitting_method=Learning.HOLD_OUT,
    model_type=Learning.BT,
    splitting_parameters={'test_size': 0.2},
    model_parameters={'max_depth': 6, 'base_score': 0.5})
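When the instances are not tabular, a get_item_function has to be supplied. The sketch below shows one possible shape for such a function, assuming each row exposes its fields by name (as a pandas Series or a dict does). The column name "text" and the lowercase-and-split processing are made up for illustration; the format the model and explainer actually expect depends on your setup.

```python
def get_text_instance(row):
    """Hypothetical get_item_function: build an instance from one dataset row."""
    # Assumes a column named "text"; adapt this to your own dataset.
    return row["text"].lower().split()

# A plain dict stands in for a DataFrame row here.
example_row = {"text": "Gradient Boosted Trees", "label": 1}
instance = get_text_instance(example_row)
print(instance)
```

A function of this kind would be passed as get_item_function=get_text_instance when constructing the learner.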