# Importing Models From Libraries

PyXAI can [generate models]({{ site.baseurl }}/pyxai/documentation/learning/generating/) for you: it provides dedicated functions that simplify this task. However, if your model has already been trained, you may want to import it into PyXAI in order to compute explanations afterwards. This page explains how to do so.

## Procedure 

Consider the following source code, which creates a ```RandomForestClassifier``` using [Scikit-learn](https://scikit-learn.org/stable/):

In [1]:
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

model_rf = RandomForestClassifier(random_state=0)
data = datasets.load_breast_cancer(as_frame=True)
X = data.data.to_numpy()
Y = data.target.to_numpy()

feature_names = data.feature_names
model_rf.fit(X, Y);

You can import this ML model thanks to the ```Learning.import_models()``` method:

| <font style="font-family:Consolas,Monaco,Lucida Console,Liberation Mono,DejaVu Sans Mono,Bitstream Vera Sans Mono,Courier New;" size="+1pt">Learning.import_models(models, feature_names=[]):</font> | 
| :----------- | 
| Import the ```models```. The method detects the type of  models and applies the correct conversions in order to translate them into PyXAI data structures. Return a tuple ```(<Learner Object>, models)``` where the returned ```models``` depend on the conversions applied. More precisely, the returned ```models``` can be of the form ```DecisionTree```\|```RandomForest```\|```BoostedTrees```\|```BoostedTreesRegression```.  |
| <b><font style="font-family:Consolas,Monaco,Lucida Console,Liberation Mono,DejaVu Sans Mono,Bitstream Vera Sans Mono,Courier New;">models</font></b> ```List``` of ```RandomForestClassifier```\|```DecisionTreeClassifier```\|```XGBClassifier```\|```XGBRegressor```\|```LGBMRegressor```: List of models to import.|
| <b><font style="font-family:Consolas,Monaco,Lucida Console,Liberation Mono,DejaVu Sans Mono,Bitstream Vera Sans Mono,Courier New;">feature_names</font></b> ```List``` of ```String```: The feature names. If ```feature_names``` is not specified, the features are named with 'f' followed by a number (e.g., f1, f2, f3, ..., f30) in the explanations provided by the ```to_features()``` method. |

Here is a table summarizing the model types supported for three standard ML libraries:

<table>
<thead>
  <tr>
    <th>Type</th>
    <th>Scikit-learn</th>
    <th>XGBoost</th>
    <th>LightGBM</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td style="text-align:center">Decision Tree</td>
    <td style="text-align:center">DecisionTreeClassifier</td>
    <td style="text-align:center"></td>
    <td style="text-align:center"></td>
  </tr>
  <tr>
    <td style="text-align:center">Random Forest</td>
    <td style="text-align:center">RandomForestClassifier</td>
    <td style="text-align:center"></td>
    <td style="text-align:center"></td>
  </tr>
  <tr>
    <td style="text-align:center">Boosted Tree</td>
    <td style="text-align:center"></td>
    <td style="text-align:center">XGBClassifier<br>XGBRegressor</td>
    <td style="text-align:center">LGBMRegressor</td>
  </tr>
</tbody>
</table>
<br>
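To make the table concrete, here is a minimal sketch of how a type check based on the class name could work. It is not PyXAI's actual implementation: the `CONVERSIONS` dictionary, the `pyxai_kind` helper and the stand-in classes are hypothetical, and mapping both regressors to `BoostedTreesRegression` is an assumption based on the return types listed for `Learning.import_models()`.

```python
# Hypothetical lookup table derived from the compatibility table.
# The regressor entries are an assumption, not taken from PyXAI's source.
CONVERSIONS = {
    "DecisionTreeClassifier": "DecisionTree",
    "RandomForestClassifier": "RandomForest",
    "XGBClassifier": "BoostedTrees",
    "XGBRegressor": "BoostedTreesRegression",
    "LGBMRegressor": "BoostedTreesRegression",
}


def pyxai_kind(model):
    """Return the name of the PyXAI structure a model would be converted to."""
    name = type(model).__name__
    if name not in CONVERSIONS:
        raise TypeError(f"unsupported model type: {name}")
    return CONVERSIONS[name]


# Stand-in classes so the sketch runs without any ML library installed.
class RandomForestClassifier:
    pass


class SVC:
    pass


print(pyxai_kind(RandomForestClassifier()))  # RandomForest

try:
    pyxai_kind(SVC())
except TypeError as error:
    print(error)  # unsupported model type: SVC
```

A check of this kind before calling `Learning.import_models()` can give an early, readable error when a model type is outside the table.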

In [2]:
from pyxai import Tools, Learning, Explainer
learner, model = Learning.import_models(model_rf, feature_names)

---------------   Explainer   ----------------
For the evaluation number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 1755



Then, you can get explanations by executing: 

In [3]:
instance, prediction = learner.get_instances(dataset=data.frame, model=model, n=1)
print("instance:", instance)
print("prediction:", prediction)

---------------   Instances   ----------------
data:
     mean radius  mean texture  mean perimeter  mean area  mean smoothness   
0          17.99         10.38          122.80     1001.0          0.11840  \
1          20.57         17.77          132.90     1326.0          0.08474   
2          19.69         21.25          130.00     1203.0          0.10960   
3          11.42         20.38           77.58      386.1          0.14250   
4          20.29         14.34          135.10     1297.0          0.10030   
..           ...           ...             ...        ...              ...   
564        21.56         22.39          142.00     1479.0          0.11100   
565        20.13         28.25          131.20     1261.0          0.09780   
566        16.60         28.08          108.30      858.1          0.08455   
567        20.60         29.33          140.10     1265.0          0.11780   
568         7.76         24.54           47.92      181.0          0.05263   

     mean 

In [4]:
explainer = Explainer.initialize(model, instance=instance)

direct = explainer.direct_reason()
print("len direct reason:", len(direct))

sufficient = explainer.sufficient_reason()
print("len sufficient reason:", len(sufficient))

print("to_features:", explainer.to_features(sufficient))

len direct reason: 294
len sufficient reason: 159
to_features: ('mean radius > 15.045000076293945', 'mean texture <= 11.585000038146973', 'mean perimeter > 96.57999801635742', 'mean area > 694.5', 'mean smoothness > 0.09075499698519707', 'mean compactness > 0.09524999931454659', 'mean concavity > 0.17409999668598175', 'mean concave points > 0.07939000055193901', 'mean symmetry > 0.12639999762177467', 'radius error > 0.7730999886989594', 'texture error > 0.7377500236034393', 'perimeter error > 2.76200008392334', 'area error > 33.064998626708984', 'smoothness error in ]0.005567499902099371, 0.009928999934345484]', 'compactness error > 0.00834800023585558', 'concavity error in ]0.018459999933838844, 0.2157999947667122]', 'fractal dimension error in ]0.0030724999960511923, 0.012140000239014626]', 'worst radius > 17.72499942779541', 'worst texture in ]15.434999942779541, 18.289999961853027]', 'worst perimeter > 120.70000076293945', 'worst area > 953.7000122070312', 'worst smoothness > 0.136

{: .attention }
> Passing ```feature_names``` to ```Learning.import_models()``` lets the ```to_features()``` method report the actual feature names. If you do not pass them, the feature names will be of the form f1, f2, f3, ..., f30, where the numbers correspond to the ranks of the features in the dataset.
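The placeholder naming scheme can be illustrated with a few lines of plain Python. This is only a sketch of the f1, ..., f30 convention described in the note; `default_feature_names` is a hypothetical helper, not a PyXAI function.

```python
def default_feature_names(n_features):
    """Hypothetical helper: placeholder names f1..fN, numbered by column rank."""
    return [f"f{rank}" for rank in range(1, n_features + 1)]


# The breast cancer dataset used on this page has 30 features.
names = default_feature_names(30)
print(names[0], names[-1])  # f1 f30
```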

{: .attention }
> You can use ```learner.get_label_from_value(value)``` and ```learner.get_value_from_label(label)``` to convert between the original labels and the values produced by the label encoding. The Python dictionary ```learner.dict_labels``` contains the encoding performed.

## Load/Save From Libraries

The creation of ML models and the computation of explanations can be done by two different programs: you can save the models with the first one and load them with the second one.
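As a minimal sketch of this two-program workflow, the following round trip uses only the standard library. The dictionary is a stand-in for a trained model (an assumption made so that the example runs without any ML library), and the temporary path is illustrative.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; any picklable Python object works the same way.
model = {"kind": "RandomForest", "n_trees": 100}

path = os.path.join(tempfile.mkdtemp(), "example.model")

# Program 1: save the model. The context manager closes the file.
with open(path, "wb") as file:
    pickle.dump(model, file)

# Program 2: load it back. Only unpickle files you trust, because
# pickle.load can execute arbitrary code.
with open(path, "rb") as file:
    restored = pickle.load(file)

print(restored == model)  # True
```

Note that both programs should use compatible versions of the library that produced the model, since a pickled estimator is not guaranteed to load under a different version.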

### Scikit-learn

We follow the documentation of [Scikit-learn](https://scikit-learn.org/stable/model_persistence.html) which advises the use of module [pickle](https://docs.python.org/3/library/pickle.html).

In [5]:
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import pickle

rf = RandomForestClassifier()
X, Y = datasets.load_breast_cancer(return_X_y=True)
rf.fit(X, Y)

# Serialize the trained model; the context manager closes the file
with open("example.model", 'wb') as file:
    pickle.dump(rf, file)

You can load this model into another program thanks to these lines of code:

In [6]:
import pickle

with open("example.model", 'rb') as file:
    learner = pickle.load(file)

And then you can import your model:

In [7]:
from pyxai import Tools, Learning, Explainer
learner, model = Learning.import_models(learner)

---------------   Explainer   ----------------
For the evaluation number 0:
**Random Forest Model**
nClasses: 2
nTrees: 100
nVariables: 1675



### XGBoost

We follow the documentation of [XGBoost](https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html).

In [8]:
from sklearn import datasets
from xgboost import XGBClassifier

X, Y = datasets.load_iris(return_X_y=True)
bt = XGBClassifier(eval_metric="mlogloss")
bt.fit(X, Y)
bt.save_model('my_model.json')

You can load this model into another program thanks to these lines of code:

In [9]:
from xgboost import XGBClassifier

bt_loaded = XGBClassifier(eval_metric='mlogloss')
bt_loaded.load_model('my_model.json')

And then you can import your model:

In [10]:
from pyxai import Tools, Learning, Explainer
learner, model = Learning.import_models(bt_loaded)

---------------   Explainer   ----------------
For the evaluation number 0:
**Boosted Tree model**
NClasses: 3
nTrees: 300
nVariables: 33



### LightGBM

The [LightGBM](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html) documentation describes how to save/load a ```Booster``` object. To save/load an ```LGBMRegressor```, we use [pickle](https://docs.python.org/3/library/pickle.html) instead.

In [11]:
from sklearn import datasets
import lightgbm
import pickle

X, Y = datasets.load_iris(return_X_y=True)
learner = lightgbm.LGBMRegressor(n_estimators=5, random_state=0)
learner.fit(X, Y)

# Serialize the trained model; the context manager closes the file
with open("example.model", 'wb') as file:
    pickle.dump(learner, file)

You can load this model into another program thanks to these lines of code:

In [12]:
import pickle

with open("example.model", 'rb') as file:
    learner_loaded = pickle.load(file)

And then you can import your model:

In [13]:
from pyxai import Tools, Learning, Explainer
learner, model = Learning.import_models(learner_loaded)

---------------   Explainer   ----------------
For the evaluation number 0:
**Boosted Tree model**
NClasses: None
nTrees: 5
nVariables: 9



## Example with cross-validation

This example shows how to import models and to compute explanations. We start by implementing a function to process the dataset: 

In [14]:
import pandas
import numpy

def load_dataset(dataset):
    data = pandas.read_csv(dataset).copy()

    # extract labels
    labels = data[data.columns[-1]]
    labels = numpy.array(labels)

    # remove the label of each instance
    data = data.drop(columns=[data.columns[-1]])

    # extract the feature names
    feature_names = list(data.columns)

    return data.values, labels, feature_names

Then, we implement a function that performs a cross-validation. More precisely, we use the Leave One Group Out cross-validator of [Scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeaveOneGroupOut.html) and a ```lightgbm.LGBMRegressor``` from the [LightGBM](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html) library:

In [15]:
import functools
import random 
import operator
import lightgbm
from sklearn.model_selection import LeaveOneGroupOut

def cross_validation(X, Y, n_trees=100, n_forests=2) :
    n_instance = len(Y)
    quotient = n_instance // n_forests
    remain = n_instance % n_forests

    # Groups creation
    groups = [quotient*[i] for i in range(1,n_forests+1)]
    groups = functools.reduce(operator.iconcat, groups, [])
    groups += [i for i in range(1,remain+1)]
    random.shuffle(groups)

    # One fold per group: each group is held out once as the test set
    loo = LeaveOneGroupOut()
    forests = []
    for index_training, index_test in loo.split(X, Y, groups=groups):
        # Creation of instances (X) and labels (Y) according to the index of loo.split() 
        # for both training and test set
        x_train = [X[x] for x in index_training]
        y_train = [Y[x] for x in index_training]
        x_test = [X[x] for x in index_test]
        y_test = [Y[x] for x in index_test]

        # Training phase: n_trees sets the number of boosted trees
        learner = lightgbm.LGBMRegressor(n_estimators=n_trees, random_state=0)
        learner.fit(x_train, y_train)
        # Predictions of the regressor on the test set (not used further here)
        y_predict = learner.predict(x_test)

        forests.append((learner, index_training, index_test))
    return forests
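The group construction in `cross_validation` decides which instances end up in each fold. The sketch below reproduces that logic with the standard library only, so it can be run without Scikit-learn or LightGBM. `make_groups` and `leave_one_group_out` are hypothetical helpers written for illustration; the second one only mimics the behaviour of `LeaveOneGroupOut` (one fold per distinct group) and is not Scikit-learn's implementation.

```python
import random


def make_groups(n_instances, n_groups, seed=0):
    """Assign each instance a group label in 1..n_groups, as evenly as possible."""
    quotient, remain = divmod(n_instances, n_groups)
    groups = [g for g in range(1, n_groups + 1) for _ in range(quotient)]
    # Leftover instances are spread over the first groups.
    groups += list(range(1, remain + 1))
    random.Random(seed).shuffle(groups)
    return groups


def leave_one_group_out(groups):
    """Yield (train_indexes, test_indexes), holding out one group per fold."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield train, test


groups = make_groups(n_instances=7, n_groups=2)
folds = list(leave_one_group_out(groups))
for train, test in folds:
    print("train:", train, "test:", test)
```

With 7 instances and 2 groups, group 1 receives 4 instances and group 2 receives 3, so the two folds have test sets of size 4 and 3 and every instance is tested exactly once.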

Finally, we use the two previous functions and import the models into PyXAI in order to compute explanations.

In [16]:
from pyxai import Tools, Learning, Explainer

data, labels, feature_names = load_dataset("../dataset/winequality-red.csv")
results = cross_validation(data, labels, n_trees=5)

models = [result[0] for result in results]
training_indexes = [result[1] for result in results]
test_indexes = [result[2] for result in results]

learner, models = Learning.import_models(models)

for i, model in enumerate(models):
    instances = learner.get_instances(dataset="../dataset/winequality-red.csv", model=model, n=2, indexes=Learning.TEST, test_indexes=test_indexes[i])

    for (instance, prediction_classifier) in instances:
        explainer = Explainer.initialize(model, instance=instance)
        prediction = model.predict_instance(instance)
        print("prediction:", prediction)
        direct = explainer.direct_reason()
        print("len direct reason:", len(direct))
        explainer.set_interval(prediction - 0.2, prediction + 0.2)
        ts = explainer.tree_specific_reason()
        print("len tree_specific_reason:", len(ts))
        print("---------------------------")

---------------   Explainer   ----------------
For the evaluation number 0:
**Boosted Tree model**
NClasses: None
nTrees: 5
nVariables: 73

For the evaluation number 1:
**Boosted Tree model**
NClasses: None
nTrees: 5
nVariables: 86

---------------   Instances   ----------------
number of instances selected: 2
----------------------------------------------
prediction: 5.344027682956049
len direct reason: 12
len tree_specific_reason: 5
---------------------------
prediction: 5.3536241433926985
len direct reason: 9
len tree_specific_reason: 5
---------------------------
---------------   Instances   ----------------
number of instances selected: 2
----------------------------------------------
prediction: 5.407780216085812
len direct reason: 28
len tree_specific_reason: 9
---------------------------
prediction: 5.54355845301202
len direct reason: 28
len tree_specific_reason: 9
---------------------------


With PyXAI, you can also generate your own models. We invite you to look at the [Generating Models](/documentation/learning/builder/) page for more information.