Seminar by David Ing
On Integrating Logical Analysis of Data into Random Forests
Oct. 16, 2025 - 14:00

Random Forests (RFs) are one of the most popular classifiers in machine learning. An RF is an ensemble learning method that combines multiple Decision Trees (DTs), providing a more robust and accurate model than a single DT. However, a key step in building RFs is the random selection of many different features during the construction of the DTs, which yields a forest that relies on a large set of features and makes it difficult to extract short, concise explanations. In this paper, we propose integrating Logical Analysis of Data (LAD) into RFs. LAD is a pattern learning framework that combines optimization, Boolean functions, and combinatorial theory. One of its main goals is to generate minimal support sets (MSSes) that discriminate between different groups of data. More precisely, we show how to enhance the classical RF algorithm by randomly choosing MSSes, rather than randomly choosing feature subsets that potentially contain irrelevant features, when constructing the DTs. Experiments on benchmark datasets reveal that integrating LAD into classical RFs using MSSes can maintain similar accuracy, produce forests of similar size, reduce the set of used features, and enable the extraction of significantly shorter explanations compared to classical RFs.
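For illustration only, the sketch below shows one way the core idea could look in code: each tree in the ensemble is fitted on a bootstrap sample but restricted to a single, randomly drawn MSS instead of a random feature subset. The class name `MSSRandomForest`, the `mss_list` input, and the use of scikit-learn's `DecisionTreeClassifier` are assumptions made for this sketch, not the authors' implementation; the MSSes are assumed to have been computed beforehand by a LAD step.

```python
# Minimal sketch (not the authors' implementation): build a forest in which each
# tree is restricted to one randomly chosen minimal support set (MSS) instead of
# a random feature subset. Assumes the MSSes are given as lists of column indices
# and that class labels are integer-encoded.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class MSSRandomForest:
    def __init__(self, mss_list, n_trees=100, random_state=0):
        self.mss_list = mss_list          # e.g. [[0, 3], [1, 4, 7], ...]
        self.n_trees = n_trees
        self.rng = np.random.default_rng(random_state)
        self.trees = []                   # pairs (fitted tree, feature indices)

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.n_trees):
            # Bootstrap sample, as in a classical random forest.
            rows = self.rng.integers(0, n, size=n)
            # Randomly pick one MSS; the tree only sees those features.
            mss = self.mss_list[self.rng.integers(0, len(self.mss_list))]
            tree = DecisionTreeClassifier(random_state=0)
            tree.fit(X[np.ix_(rows, mss)], y[rows])
            self.trees.append((tree, mss))
        return self

    def predict(self, X):
        # Majority vote over the per-tree predictions.
        votes = np.array([t.predict(X[:, mss]) for t, mss in self.trees])
        return np.apply_along_axis(
            lambda col: np.bincount(col).argmax(), 0, votes.astype(int)
        )
```

Under this reading, every tree only ever sees the features of one MSS, so the union of features used across the forest is bounded by the features appearing in the chosen MSSes, which is consistent with the reduced feature sets and shorter explanations reported in the abstract.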