Contributions to Constraint-based Approaches for Data Mining
- PhD Student:
- Amel Hidouri
- Co-Advisors:
- Saïd Jabbour
- Bouteina Ben Yaghlane
- Funding: Other
Thesis co-supervised with the University of Tunis.
This thesis belongs to the field of data mining and, more precisely, to the extraction of knowledge from data by enumerating interesting patterns. This research area emerged in the 1990s and has become a core part of data mining and machine learning.
High Utility Itemset Mining (HUIM, for short) is a well-known problem in pattern mining that extends the classical problem of mining frequent itemsets. Instead of counting only how often an itemset occurs, HUIM assigns it a utility, which can be evaluated in terms of profit, cost, or any other user preference. The objective of HUIM is to find the itemsets whose utility meets a user-specified minimum threshold.
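To make the definition concrete, the following minimal Python sketch computes the utility of an itemset over a toy transaction database and enumerates the high utility itemsets by brute force. It is only an illustration and not the method developed in the thesis. The data, the names `transactions`, `unit_profit`, `utility`, and `high_utility_itemsets`, the choice of utility as quantity times unit profit, and the non-strict comparison with the threshold are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]

# Hypothetical per-unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for transaction in transactions:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_util):
    """Brute-force enumeration: keep itemsets whose utility is at least min_util (assumed non-strict)."""
    items = sorted({item for transaction in transactions for item in transaction})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            value = utility(itemset, transactions, unit_profit)
            if value >= min_util:
                result[itemset] = value
    return result


print(high_utility_itemsets(transactions, unit_profit, min_util=10))
```

Exhaustive enumeration is exponential in the number of items and is used here only to illustrate the problem statement. Note that utility, unlike frequency, is not anti-monotone: in this toy data the itemset `("a", "c")` has a higher utility than `("c",)` alone, which is one reason the problem is harder than frequent itemset mining.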
Declarative approaches have recently been proposed for various data mining tasks such as mining frequent itemsets, association rules, sequences, or graphs. Their main advantage is that new constraints can easily be added to the model in order to search for particular patterns.
The first goal of the thesis is to propose a declarative framework, grounded in symbolic artificial intelligence, for mining high utility itemsets from transaction databases. Our method is based on the propositional satisfiability problem. Second, in order to improve scalability, we investigate how decomposition and parallelism can address a common weakness of symbolic techniques, namely their difficulty in handling large databases. The third contribution is a propositional satisfiability-based framework for computing various condensed representations of high utility patterns, as a way to reduce the size of the mining algorithm’s output. Finally, the last objective of this thesis is to assess the performance of the proposed methods through a comparison with a set of approaches from the literature on real and synthetic data.