• Funding: Artois
  • Start year: 2022

This thesis concerns the representation and processing of heterogeneous and uncertain data, knowledge and constraints.

Heterogeneity refers first of all to the different types of information involved. For example, some data may be structured and easy to query, while other data may be unstructured and difficult to use. Examples of unstructured data are images, reports and analogue or digitized maps of networks (e.g. urban networks). Heterogeneity also refers to the various imperfections associated with information: uncertain factual data, incomplete data, flexible constraints, potentially contradictory information, etc.

This thesis is set in this context of managing heterogeneous information and data. Its first objective is the extraction, completion (from multi-source data and information), interpretation and automatic annotation of unstructured data. The aim is to combine Machine Learning techniques with approaches based on knowledge, constraints and ontologies in order to represent unstructured data (e.g. images) in a format that Artificial Intelligence methods can readily exploit (e.g. as databases).

The second objective of the thesis is to define formal languages (logical and graphical models) to represent the different forms of data and knowledge. Emphasis will be placed on so-called ‘tractable’ languages, specially designed for applications that deal with large amounts of data and in which query answering is an important reasoning task.

The final objective of the thesis is to define efficient query and inference mechanisms for answering queries over heterogeneous information and for reasoning about constrained spatial data. With heterogeneous information, managing ‘conflicting’ information is an important issue. Conflict management has received considerable attention in the literature and remains an open problem in Artificial Intelligence. Several strategies can be adopted when a knowledge base contains conflicts. The aim is to extend the mechanisms for selecting preferred repairs, defined for lightweight ontologies, to heterogeneous data in the presence of uncertainty and flexible constraints.
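To make the notion of a repair concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method developed in the thesis. A repair is taken to be a maximal consistent subset of the facts. The toy facts, their certainty degrees, the disjointness constraint and the "maximize the least certain fact" preference criterion are all hypothetical choices made for the example.

```python
from itertools import combinations

# Hypothetical toy data: facts (individual, class) with a certainty degree in [0, 1].
FACTS = {
    ("pipe_12", "WaterPipe"): 0.9,
    ("pipe_12", "GasPipe"): 0.4,
    ("pipe_12", "Buried"): 0.7,
}

# Hypothetical negative constraint: nothing is both a water pipe and a gas pipe.
DISJOINT = [("WaterPipe", "GasPipe")]


def is_consistent(facts):
    """Return True if no individual belongs to two disjoint classes."""
    facts = set(facts)
    for a, b in DISJOINT:
        for ind, cls in facts:
            if cls == a and (ind, b) in facts:
                return False
    return True


def repairs(facts):
    """Enumerate the maximal (for set inclusion) consistent subsets of facts."""
    facts = list(facts)
    found = []
    # Visit larger subsets first, so a consistent subset that is strictly
    # contained in an already found repair can be skipped as non-maximal.
    for size in range(len(facts), -1, -1):
        for subset in combinations(facts, size):
            s = frozenset(subset)
            if is_consistent(s) and not any(s < r for r in found):
                found.append(s)
    return found


def preferred_repairs(facts, weights):
    """Keep the repairs whose least certain fact is as certain as possible.

    This criterion is an assumption chosen for illustration only.
    """
    all_repairs = repairs(facts)

    def score(repair):
        return min((weights[f] for f in repair), default=1.0)

    best = max(score(r) for r in all_repairs)
    return [r for r in all_repairs if score(r) == best]
```

On this toy input, `repairs(FACTS)` yields two repairs, one keeping the water-pipe fact and one keeping the gas-pipe fact, and `preferred_repairs(FACTS, FACTS)` keeps only the former because its least certain fact has degree 0.7 rather than 0.4. The enumeration is exponential in the number of facts, so it only illustrates the definition; it says nothing about the efficient mechanisms the thesis aims for.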

The models and algorithms developed in this thesis will be validated on urban network data rich in heterogeneous information.