Class ModelIO
A utility class designed to handle input/output operations for machine learning models of PyXAI.
This class covers importing, loading, and saving models.
def import_models(models,
data: str|pandas.DataFrame|NoneData = NoneData, *,
problem_type: str|ProblemType|None = None,
model_type: str|ModelType|None = None,
instances_type: str|InstancesType|None = None,
labels_type: str|LabelsType|None = None,
get_item_function: Callable|None = None,
instances_directory: str|None = None,
labels_directory: str|None = None):
Import existing ML models.
The method detects the type of models and applies the correct conversions in order to translate them into PyXAI data structures.
Parameters
models : list[RandomForestClassifier|DecisionTreeClassifier|XGBClassifier|XGBRegressor|LGBMRegressor]
The model or list of models to import.
data : str | pandas.DataFrame | NoneData (optional, default=NoneData)
The dataset to use, either as a path to a CSV, JSON, or Excel file or as a pandas DataFrame.
model_type : str | ModelType (optional, default=None)
The type of model (linear, tree-based, neural network, …).
Possible values are defined in the ModelType enum.
instances_type : str | InstancesType (optional, default=None)
The type of instances (image, tabular, text, temporal, …)
Possible values are defined in the InstancesType enum.
labels_type : str | LabelsType (optional, default=None)
The type of labels (class, text, mask, contours, …)
Possible values are defined in the LabelsType enum.
get_item_function : Callable (optional, default=None)
A function to get an instance from the dataset. This function is used to get an instance in the right format for the model and the explainer.
If the dataset is a pandas DataFrame and the instances are tabular, this function is not necessary and can be set to None.
In other cases, this function should be defined by the user. It should take as input a row of the dataframe and return the corresponding instance in the right format for the model and the explainer.
instances_directory : str (optional, default=None)
The directory where the instances are stored (only for a JSON dataset).
This parameter is used to extend the path of instances in the dataframe when the instances are of type image and the dataset is given as a json file.
labels_directory : str (optional, default=None)
The directory where the labels are stored (only for a JSON dataset).
This parameter is used to extend the paths of labels in the dataframe when the labels are of type masks and the dataset is given as a json file.
Warning: NOT YET IMPLEMENTED
Returns
tuple(Scikitlearn|Xgboost|LightGBM, list[DecisionTree|RandomForest|BoostedTrees|BoostedTreesRegression]) :
Returns a tuple (learner, models) containing the appropriate learner and the PyXAI models.
Examples
from pyxai import Tools, Learning, Explaining
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
model_rf = RandomForestClassifier(random_state=0)
data = datasets.load_breast_cancer(as_frame=True)
X = data.data
Y = data.target
feature_names = data.feature_names
model_rf.fit(X, Y)
learner, model = Learning.ModelIO.import_models(model_rf, instances_type='tabular')
learner.feature_names = feature_names
instance, prediction = learner.get_instances(dataset=data.frame, model=model, n=1)
explainer = Explaining.initialize(model, instance=instance)
direct = explainer.direct_reason()
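The get_item_function parameter described above takes one row of the dataframe and returns the instance in the format expected by the model and the explainer. The following is a minimal sketch of the shape such a function could take, not PyXAI's own code. The column name "image_path" and the directory "data/images" are hypothetical. Whether PyXAI has already applied instances_directory to the path before calling the function is not stated here, so the sketch joins the directory explicitly.

```python
import os

# Hypothetical values: the column name "image_path" and the directory
# "data/images" are illustrative and do not come from PyXAI.
INSTANCES_DIRECTORY = "data/images"


def get_item(row):
    """Return the instance for one dataset row.

    `row` only needs to support row["column"] lookups, as a pandas Series
    or a plain dict does. Here the instance is just the full file path;
    a real function would typically load and preprocess the file.
    """
    return os.path.join(INSTANCES_DIRECTORY, row["image_path"])


# A plain dict stands in for a DataFrame row.
example_row = {"image_path": "cat_001.png", "label": "cat"}
instance = get_item(example_row)
```

A function of this shape would be passed as get_item_function=get_item. For tabular instances in a pandas DataFrame it is not needed and can stay None.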
def load(models_directory, *, data=NoneData, tests=False):
Load the models that were saved using the save method from the models_directory directory.
The models_directory must contain two files per model:
- <models_directory>/<dataset>.<i>.pkl: the model, stored as a Pickle file.
- <models_directory>/<dataset>.<i>.map: information about the model, stored in JSON format.
where <i> is the index of a model.
Parameters
models_directory : str
The directory of models to load.
data : str | pandas.DataFrame | NoneData (optional, default=NoneData)
The dataset to use, either as a path to a CSV, JSON, or Excel file or as a pandas DataFrame.
Examples
from pyxai import Tools, Learning
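# dataset, problem_type, labels_type, model_type and n_instances are assumed to be defined beforehand.
learner = Learning.Scikitlearn(dataset, problem_type=problem_type, labels_type=labels_type)
model = learner.evaluate(splitting_method='hold-out', model_type=model_type, splitting_parameters={"test_size":0.2, "random_state":0})
# Save the model, then discard the in-memory objects.
Learning.ModelIO.save(model, "my_models")
del learner
del model
# Reload the learner and the model from disk (the keyword is data, as in the signature).
learner, model = Learning.ModelIO.load("my_models", data=dataset)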
instances = learner.get_instances(model=model, n=n_instances, indexes=Learning.TEST)
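The two-files-per-model layout that load expects can be checked before calling it. The sketch below is an illustration using only the standard library, not part of PyXAI. The helper name list_model_files, the dataset name "mydataset", and the placeholder file contents are all hypothetical.

```python
import json
import pickle
import tempfile
from pathlib import Path


def list_model_files(models_directory):
    """Return sorted (pkl_path, map_path) pairs found in models_directory.

    Raises FileNotFoundError if a .pkl file has no matching .map file.
    """
    pairs = []
    for pkl_path in sorted(Path(models_directory).glob("*.pkl")):
        map_path = pkl_path.with_suffix(".map")
        if not map_path.exists():
            raise FileNotFoundError(f"missing {map_path.name}")
        pairs.append((pkl_path, map_path))
    return pairs


# Build a throwaway directory that follows the <dataset>.<i>.pkl/.map pattern.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(2):
        stem = Path(tmp) / f"mydataset.{i}"
        # Placeholder payloads; real files are produced by ModelIO.save.
        Path(f"{stem}.pkl").write_bytes(pickle.dumps({"placeholder": i}))
        Path(f"{stem}.map").write_text(json.dumps({"index": i}))
    found = [(p.name, m.name) for p, m in list_model_files(tmp)]
```

A check like this catches a missing .map file before loading. It says nothing about whether the file contents are valid.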
def save(models, save_directory):
Save models in the save_directory as two files per model: <save_directory>/<dataset>.<i>.map and <save_directory>/<dataset>.<i>.pkl, where <i> is the index of a model given in the models parameter.
The .pkl files store the models as Pickle files. The .map files contain information about the models in JSON format.
Parameters
models : list[DecisionTree|RandomForest|BoostedTrees|BoostedTreesRegression] | DecisionTree|RandomForest|BoostedTrees|BoostedTreesRegression
PyXAI model or list of PyXAI models to be saved.
save_directory : str
The directory where the models are saved. Creates the directory if it does not exist.
Returns
tuple(Scikitlearn|Xgboost|LightGBM, list[DecisionTree|RandomForest|BoostedTrees|BoostedTreesRegression]) :
Returns a tuple (learner, models) containing the appropriate learner and the PyXAI models.
Examples
from pyxai import Tools, Learning
# dataset, problem_type, labels_type and model_type are assumed to be defined beforehand.
learner = Learning.Scikitlearn(dataset, problem_type=problem_type, labels_type=labels_type)
model = learner.evaluate(splitting_method='hold-out', model_type=model_type, splitting_parameters={"test_size":0.2, "random_state":0})
Learning.ModelIO.save(model, "saving_directory")