• Author:
  • Engelbert Mephu Nguifo
  • HDR defended on:
  • Dec 10, 2001

Knowledge extraction is an interactive, iterative process of analyzing a large set of raw data to extract exploitable knowledge, in which the user-analyst plays a central role. With a view to designing knowledge extraction systems, we present our work on data preprocessing and lattice-based classification methods for data mining.

The first part introduces the problem of data preprocessing through two types of methods: attribute selection and attribute construction. It then discusses several classification methods based on decision trees, nearest neighbors, Galois lattices, and fuzzy logic.

In the second part, we present our methods for designing classification systems based on the Galois lattice. These methods were progressively extended with majority voting, information theory measures, attribute transformation, nearest neighbor techniques, prototype selection, and fuzzy subset theory. They have been used to implement several systems (LEGAL, Flexible-LEGAL, GLUE, IGLUE, CIBLe), which have been evaluated on several data sets. This part also presents a method for transforming symbolic attributes into numerical attributes.
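For readers unfamiliar with Galois lattices, the following minimal sketch shows what the nodes of such a lattice are: pairs made of a set of objects (the extent) and the set of attributes they all share (the intent). The toy context and the function names are hypothetical, and the brute-force enumeration is only an illustration; it is not the algorithm used in LEGAL or the other systems named above.

```python
from itertools import combinations

# Hypothetical toy context: each object maps to its set of binary attributes.
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"a", "c"},
    "o3": {"a", "b", "c"},
    "o4": {"c"},
}
ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def extent(attrs):
    """Return the objects that have every attribute in attrs."""
    return frozenset(o for o, has in CONTEXT.items() if attrs <= has)


def intent(objs):
    """Return the attributes shared by every object in objs."""
    if not objs:
        # By convention, the empty set of objects shares all attributes.
        return frozenset(ALL_ATTRIBUTES)
    return frozenset(set.intersection(*(CONTEXT[o] for o in objs)))


def concepts():
    """Enumerate all (extent, intent) pairs by closing every attribute subset.

    Brute force: exponential in the number of attributes, so only
    suitable for very small contexts.
    """
    found = set()
    attrs = sorted(ALL_ATTRIBUTES)
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            objs = extent(set(subset))
            found.add((objs, intent(objs)))
    return found


if __name__ == "__main__":
    # Print concepts from the largest extent to the smallest.
    for objs, attrs in sorted(concepts(), key=lambda c: (-len(c[0]), sorted(c[1]))):
        print(sorted(objs), sorted(attrs))
```

Ordering these pairs by inclusion of their extents gives the lattice. A lattice-based classifier can then use the intents of selected nodes as candidate rules, for example by combining them through majority voting as mentioned above.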

Finally, the third part concerns the interaction and application aspects of our contributions. We show how an objection-based control technique allows the user-analyst to interact with the system in order to validate the knowledge produced. We then present a comparative study of the operators used in dialogue and those used in machine learning. We conclude with a presentation of applications, notably in molecular biology, where several problems have been addressed, including splice site prediction, protein sequence alignment, and amino acid coding analysis.

Keywords: Data mining, Machine learning, Attribute transformation, Human-computer interaction, Bioinformatics.