PyXAI

Class ModelIO

A utility class designed to handle input/output operations for machine learning models of PyXAI.

This class covers importing, loading, and saving models.


    def import_models(models,
                  data: str|pandas.DataFrame|NoneData = NoneData, *,
                  problem_type: str|ProblemType|None = None,
                  model_type: str|ModelType|None = None,
                  instances_type: str|InstancesType|None = None,
                  labels_type: str|LabelsType|None = None,
                  get_item_function: Callable|None = None,
                  instances_directory: str|None = None,
                  labels_directory: str|None = None):

Import existing ML models.

The method detects the type of models and applies the correct conversions in order to translate them into PyXAI data structures.

Parameters

models : list[RandomForestClassifier|DecisionTreeClassifier|XGBClassifier|XGBRegressor|LGBMRegressor]

List of models to import.

data : str | pandas.DataFrame | NoneData (optional, default=NoneData)

The dataset to use, either as a path to a CSV, JSON, or Excel file or as a pandas DataFrame.

problem_type : str | ProblemType (optional, default=None)

The type of problem.
Possible values are defined in the ProblemType enum.

model_type : str | ModelType (optional, default=None)

The type of model (linear, tree-based, neural network, …)
Possible values are defined in the ModelType enum.

instances_type : str | InstancesType (optional, default=None)

The type of instances (image, tabular, text, temporal, …)
Possible values are defined in the InstancesType enum.

labels_type : str | LabelsType (optional, default=None)

The type of labels (class, text, mask, contours, …)
Possible values are defined in the LabelsType enum.
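
The type parameters above accept either a plain string (as in instances_type='tabular') or a member of the corresponding enum. As a minimal sketch of how such a string-or-enum argument could be normalized, the block below uses a stand-in enum. The names InstancesTypeSketch and normalize are hypothetical and are not PyXAI's actual definitions; the real members are defined in PyXAI's InstancesType enum.

```python
from enum import Enum


class InstancesTypeSketch(Enum):
    # Hypothetical stand-in; the real members live in PyXAI's InstancesType enum.
    TABULAR = "tabular"
    IMAGE = "image"
    TEXT = "text"
    TEMPORAL = "temporal"


def normalize(value, enum_cls):
    """Return an enum member given a member, a string, or None."""
    if value is None or isinstance(value, enum_cls):
        return value
    # Lookup by value raises ValueError for an unknown string.
    return enum_cls(value.lower())


instances_type = normalize("tabular", InstancesTypeSketch)
```

Normalizing once at the entry point lets the rest of the code compare enum members instead of raw strings.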

get_item_function : Callable (optional, default=None)

A function that retrieves an instance from the dataset in the format expected by the model and the explainer.
If the dataset is a pandas DataFrame and the instances are tabular, this function is not needed and can be left as None.
In all other cases, the user must define it. It takes a row of the dataframe as input and returns the corresponding instance in the format expected by the model and the explainer.
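
As a hedged illustration of the contract described above (a row goes in, an instance comes out), the sketch below assumes a hypothetical dataset whose rows store each instance as a comma-separated string in a column named "values". The column name, the string encoding, and the list-of-floats return format are assumptions made for illustration; the format actually required depends on the model and the explainer in use.

```python
def get_item(row):
    """Turn one dataset row into an instance.

    `row` is assumed to support key access, as a pandas Series or a dict does.
    The "values" column is hypothetical.
    """
    return [float(x) for x in row["values"].split(",")]


# A plain dict stands in for a DataFrame row here.
example_row = {"values": "5.1, 3.5, 1.4", "label": 0}
instance = get_item(example_row)
```

Such a function would then be passed as get_item_function=get_item.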

instances_directory : str (optional, default=None)

The directory where the instances are stored (only for a JSON dataset). 
This parameter is used to extend the path of instances in the dataframe when the instances are of type image and the dataset is given as a json file.
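
Put differently, the instance paths stored in the JSON dataset are read relative to instances_directory. A minimal sketch of that kind of path extension, using only the standard library, is shown here. The helper name extend_path and the sample paths are hypothetical, and PyXAI's own joining logic may differ.

```python
import os


def extend_path(relative_path, instances_directory=None):
    """Prefix a dataset path with the instances directory, if one is given."""
    if instances_directory is None:
        return relative_path
    return os.path.join(instances_directory, relative_path)


full_path = extend_path("cats/001.png", "data/images")
```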

labels_directory : str (optional, default=None)

The directory where the labels are stored (only for a JSON dataset). 
This parameter is used to extend the paths of labels in the dataframe when the labels are of type mask and the dataset is given as a json file.
Warning: NOT YET IMPLEMENTED

Returns

tuple(Scikitlearn|Xgboost|LightGBM, list[DecisionTree|RandomForest|BoostedTrees|BoostedTreesRegression]) :

Returns a tuple (learner, models) containing the appropriate learner and the PyXAI models.

Examples

from pyxai import Tools, Learning, Explaining
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

model_rf = RandomForestClassifier(random_state=0)
data = datasets.load_breast_cancer(as_frame=True)
X = data.data
Y = data.target

feature_names = data.feature_names
model_rf.fit(X, Y)

learner, model = Learning.ModelIO.import_models(model_rf, instances_type='tabular')
learner.feature_names = feature_names
instance, prediction = learner.get_instances(dataset=data.frame, model=model, n=1)

explainer = Explaining.initialize(model, instance=instance)
direct = explainer.direct_reason()

    def load(models_directory, *, data=NoneData, tests=False):

Load models from the models_directory directory that were previously saved with the save method.

The directory must contain two files per model:
- <models_directory>/<dataset>.<i>.pkl: the model, stored as a Pickle file
- <models_directory>/<dataset>.<i>.map: information about the model, in JSON format
where <i> is the index of a model.
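
Before calling load, it can be useful to check that a directory follows this layout. The sketch below is a standalone helper and not part of PyXAI; the function name find_model_pairs is hypothetical, and it assumes <i> is a non-negative integer.

```python
import re
import tempfile
from pathlib import Path

# Matches "<dataset>.<i>.pkl" or "<dataset>.<i>.map", with <i> assumed to be an integer.
_PATTERN = re.compile(r"^(?P<dataset>.+)\.(?P<index>\d+)\.(?P<ext>pkl|map)$")


def find_model_pairs(models_directory):
    """Return sorted (dataset, index) keys that have both a .pkl and a .map file."""
    found = {}
    for path in Path(models_directory).iterdir():
        match = _PATTERN.match(path.name)
        if match and path.is_file():
            key = (match["dataset"], int(match["index"]))
            found.setdefault(key, set()).add(match["ext"])
    return sorted(key for key, exts in found.items() if exts == {"pkl", "map"})


# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["iris.0.pkl", "iris.0.map", "iris.1.pkl", "notes.txt"]:
        (Path(tmp) / name).touch()
    # iris.1 is skipped because its .map file is missing.
    pairs = find_model_pairs(tmp)
```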

Parameters

models_directory : str

The directory of models to load.

data : str | pandas.DataFrame | NoneData (optional, default=NoneData)

The dataset to use, either as a path to a CSV, JSON, or Excel file or as a pandas DataFrame.

Examples

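from pyxai import Tools, Learning

# dataset, problem_type, labels_type, model_type, and n_instances are assumed to be defined beforehand.
learner = Learning.Scikitlearn(dataset, problem_type=problem_type, labels_type=labels_type)
model = learner.evaluate(splitting_method='hold-out', model_type=model_type, splitting_parameters={"test_size":0.2, "random_state":0})

# Save the model, then discard the in-memory objects.
Learning.ModelIO.save(model, "my_models")

del learner
del model

# Reload the learner and the model from disk; the dataset is passed through the data keyword.
learner, model = Learning.ModelIO.load("my_models", data=dataset)
instances = learner.get_instances(model=model, n=n_instances, indexes=Learning.TEST)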

    def save(models, save_directory):

Save models in save_directory as two files per model: <save_directory>/<dataset>.<i>.map and <save_directory>/<dataset>.<i>.pkl, where <i> is the index of a model in the models parameter.

The .pkl files store the models in Pickle format. The .map files contain information about the models in JSON format.
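
This layout keeps the serialized object separate from human-readable metadata. As a generic illustration of that two-file pattern only (this is not PyXAI's implementation; the helper save_pair, the metadata keys, and the dict standing in for a model are all hypothetical):

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_pair(obj, metadata, directory, dataset, index):
    """Write <dataset>.<index>.pkl (pickle) and <dataset>.<index>.map (JSON)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    stem = f"{dataset}.{index}"
    pkl_path = directory / f"{stem}.pkl"
    map_path = directory / f"{stem}.map"
    with open(pkl_path, "wb") as f:
        pickle.dump(obj, f)
    with open(map_path, "w") as f:
        json.dump(metadata, f)
    return pkl_path, map_path


# Round trip in a throwaway directory: write both files, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    pkl_path, map_path = save_pair({"weights": [1, 2]}, {"n_features": 2},
                                   Path(tmp) / "my_models", "demo", 0)
    restored = pickle.loads(pkl_path.read_bytes())
    info = json.loads(map_path.read_text())
```

Because unpickling can execute arbitrary code, Pickle files should only be loaded from trusted sources.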

Parameters

models : list[DecisionTree|RandomForest|BoostedTrees|BoostedTreesRegression] | DecisionTree|RandomForest|BoostedTrees|BoostedTreesRegression

A PyXAI model or a list of PyXAI models to save.

save_directory : str

The directory where the models are saved. Creates the directory if it does not exist.

Returns

tuple(Scikitlearn|Xgboost|LightGBM, list[DecisionTree|RandomForest|BoostedTrees|BoostedTreesRegression]) :

Returns a tuple (learner, models) containing the appropriate learner and the PyXAI models.

Examples

from pyxai import Tools, Learning

# dataset, problem_type, labels_type, and model_type are assumed to be defined beforehand.
learner = Learning.Scikitlearn(dataset, problem_type=problem_type, labels_type=labels_type)
model = learner.evaluate(splitting_method='hold-out', model_type=model_type, splitting_parameters={"test_size":0.2, "random_state":0})

Learning.ModelIO.save(model, "saving_directory")
