{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Concepts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This section introduces the main concepts of the ```Explaining``` module of PyXAI. We first show how to use it, then explain the notion of binary variables, and finally give an example.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Main Methods"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To explain instances of a given model, you first need to create an ```Explainer``` object. This is done\n",
    "using the function ```Explaining.initialize```."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the ```Explainer``` is created, you can use it to explain several instances. Each time you want to explain a new instance, call the ```set_instance``` method; there is no need to create a new ```Explainer``` object."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The prediction made by the ML model for the current instance is available in the ```target_prediction``` attribute of the ```Explainer``` object. We now turn to the binary representation of an instance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Binary representation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, let us recall that all the ML models we consider consist of trees (only one for a decision tree). Each tree contains nodes representing conditions \"\\<id_feature\\> \\<operator\\> \\<threshold\\> ?\" (such as \"$x_4 \\ge 0.5$ ?\"). Internally, the ```Explainer``` works with these conditions, treated as Boolean variables: each Boolean variable represents one condition of the model. The **binary representation** of an instance is the set of Boolean variables matching such conditions. It can be found in the ```binary_representation``` attribute of the ```Explainer``` object.\n",
    "\n",
    "The function ```to_features``` converts a binary representation (or an explanation) into a list of conditions \"\\<id_feature\\> \\<operator\\> \\<threshold\\>\" representing the features used. Passing ```details=True``` returns instead a dictionary, keyed by feature name, that holds the full details of each condition.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function ```to_features``` is independent of the instance given to the explainer through the ```initialize```\n",
    "and ```set_instance``` methods: it depends only on the binary representation passed as a parameter."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We present below an example based on the [iris](../../dataset/iris.csv) dataset, with a decision tree as the ML model. See the [Generating Model](/documentation/learning/) page for more information about the ```Learning``` module.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------   Information   ---------------\n",
      "Problem type: classification\n",
      "Instances type: tabular\n",
      "Labels type: classes\n",
      "\n",
      "Dataset path: ../../dataset/iris.csv\n",
      "nFeatures (nAttributes, with the labels): 4\n",
      "nInstances (nObservations): 150\n",
      "nLabels: 3\n",
      "---------------   Model creation, fitting and evaluation  ---------------\n",
      "Splitting method: hold-out\n",
      "Problem type: classification\n",
      "Models type: decision-tree\n",
      "model_parameters: {}\n",
      "---------   Evaluation Information   ---------\n",
      "For the evaluation number 0:\n",
      "Metrics:\n",
      "   micro_averaging_accuracy: 96.49122807017544\n",
      "   micro_averaging_precision: 94.73684210526315\n",
      "   micro_averaging_recall: 94.73684210526315\n",
      "   macro_averaging_accuracy: 96.49122807017542\n",
      "   macro_averaging_precision: 93.73219373219372\n",
      "   macro_averaging_recall: 93.73219373219372\n",
      "   true_positives: {'Iris-setosa': 16, 'Iris-versicolor': 8, 'Iris-virginica': 12}\n",
      "   true_negatives: {'Iris-setosa': 22, 'Iris-versicolor': 28, 'Iris-virginica': 24}\n",
      "   false_positives: {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 1}\n",
      "   false_negatives: {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 1}\n",
      "   accuracy: 94.73684210526315\n",
      "   sklearn_confusion_matrix: [[16, 0, 0], [0, 8, 1], [0, 1, 12]]\n",
      "Number of Training instances: 112\n",
      "Number of Testing instances: 38\n",
      "\n",
      "---------------   Explainer   ----------------\n",
      "For the split number 0:\n",
      "**Decision Tree Model**\n",
      "nFeatures: 4\n",
      "nNodes: 8\n",
      "nVariables: 8\n",
      "\n",
      "---------------   Instances   ----------------\n",
      "Number of instances selected: 1\n",
      "----------------------------------------------\n",
      "instance: Sepal.Length    5.1\n",
      "Sepal.Width     3.5\n",
      "Petal.Length    1.4\n",
      "Petal.Width     0.2\n",
      "Name: 0, dtype: float64\n",
      "binary representation: (-1, -2, -3, -4, -5, -6, -7, -8)\n",
      "target_prediction: Iris-setosa\n",
      "to_features: ['Sepal.Length <= 5.950000047683716', 'Petal.Length <= 2.449999988079071', 'Petal.Width <= 1.550000011920929']\n",
      "to_features (keep redundant): ['Sepal.Length <= 6.599999904632568', 'Sepal.Length <= 5.950000047683716', 'Petal.Length <= 2.449999988079071', 'Petal.Length <= 4.75', 'Petal.Length <= 4.950000047683716', 'Petal.Length <= 4.8500001430511475', 'Petal.Width <= 1.699999988079071', 'Petal.Width <= 1.550000011920929']\n",
      "to_features with details: OrderedDict({'Petal.Length': [{'id': 3, 'name': 'Petal.Length', 'operator_sign_considered': <OperatorCondition.LE: 'LE'>, 'threshold': np.float64(2.449999988079071), 'weight': None, 'string': 'Petal.Length <= 2.449999988079071'}]})\n"
     ]
    }
   ],
   "source": [
    "from pyxai import Learning, Explaining\n",
    "\n",
    "# Train a decision tree on the iris dataset using a hold-out split.\n",
    "learner = Learning.Scikitlearn(\"../../dataset/iris.csv\", problem_type=Learning.CLASSIFICATION)\n",
    "model = learner.evaluate(splitting_method=Learning.HOLD_OUT, model_type=Learning.DT)\n",
    "\n",
    "# Select one correctly classified instance and create the explainer for it.\n",
    "instance, prediction = learner.get_instances(model, n=1, is_correct=True)\n",
    "explainer = Explaining.initialize(model, instance)\n",
    "\n",
    "print(\"instance:\", instance)\n",
    "print(\"binary representation:\", explainer.binary_representation)\n",
    "print(\"target_prediction:\", explainer.target_prediction)\n",
    "\n",
    "# Convert the binary representation into conditions, with and without redundancy elimination.\n",
    "print(\"to_features:\", explainer.to_features(explainer.binary_representation))\n",
    "print(\"to_features (keep redundant):\", explainer.to_features(explainer.binary_representation, eliminate_redundant_features=False))\n",
    "\n",
    "# to_features depends only on its argument: here a single literal, with details.\n",
    "print(\"to_features with details:\", explainer.to_features([-1], details=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the binary representation contains more variables (8) than there are features (4): a feature can appear in several nodes of the decision tree, and each node contributes one condition. By default, the function ```to_features``` eliminates redundant conditions, keeping only the tightest bound for each feature. Passing ```eliminate_redundant_features=False``` returns all conditions without simplification.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}