Using belief merging to improve ontologies and the exploitation of open data • Truong Thanh Ma
- Thesis supervisor: Sébastien Konieczny
- Funding: Artois
Context and related work: Structured web knowledge, often in the simple form of subject-predicate-object triples (such as “paris/capital-of/france”), plays an increasingly important role in domains such as information retrieval, natural language processing and machine learning. More generally, this type of knowledge is available in widely used sources, including formal ontologies such as SUMO, SNOMED and ResearchCyc, knowledge graphs such as DBpedia, Google Freebase and Wikidata, semantic markup (e.g. RDFa), and web pages (e.g. in the form of tables, lists, or natural language assertions). Recent applications such as search engines, recommendation sites, and social networks require knowledge bases with wide coverage, even if that means accepting some inaccuracies. This has motivated a large body of recent work on automatically extending knowledge bases by (i) finding plausible rules and facts that can be added to the knowledge base, and (ii) merging knowledge provided by several, possibly incompatible, sources in order to obtain a single, global view of a specific task or domain.
In this thesis proposal, the main focus is on the merging task, which is a challenging problem in its own right. Gathering information provided by several sources often leads to inconsistencies and ambiguities. For instance, the SUMO ontology states that “creek” is disjoint from “river”, while Wikipedia states that “creek” is of type “river”. Another problem is that the available information may be ambiguous. For instance, the concept “café” is considered a type of “RestaurantOrganization” in OpenCyc, although it could equally be considered a type of “DrinkingEstablishment” (a concept in OpenCyc that only contains “Pub” and “Bar”).
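The “creek”/“river” conflict can be made concrete with a minimal Python sketch. The triple encoding, the predicate names `disjoint-with` and `type-of`, and the function `find_conflicts` are assumptions introduced for illustration; they are not taken from SUMO or Wikipedia. The check only spots direct clashes between a disjointness axiom and a subsumption claim over the same pair of concepts, and performs no further reasoning.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical, simplified renderings of the two sources' claims.
sumo = {Triple("creek", "disjoint-with", "river")}
wikipedia = {Triple("creek", "type-of", "river")}


def find_conflicts(source_a, source_b):
    """Return (disjointness, subsumption) pairs that clash once the sources are pooled."""
    pooled = source_a | source_b
    conflicts = []
    for d in pooled:
        if d.predicate != "disjoint-with":
            continue
        for t in pooled:
            # A concept cannot be both a subtype of and disjoint from another
            # concept without becoming empty (unsatisfiable).
            if t.predicate == "type-of" and {t.subject, t.obj} == {d.subject, d.obj}:
                conflicts.append((d, t))
    return conflicts


conflicts = find_conflicts(sumo, wikipedia)
for disjointness, subsumption in conflicts:
    print("clash:", disjointness, "vs", subsumption)
```

Each source is fine on its own; the clash only appears once the two are pooled, which is why merging needs more than a plain union of the sources.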
The goal of merging, as classically studied, is to define operators that take as input the information provided by the different sources and produce as output a base that is consistent and synthesises the best of the sources while dealing with possible inaccuracies. Merging open-domain knowledge turns out to be particularly challenging. For instance, Tanon et al. [2016] point out the problems and difficulties encountered when merging Freebase with Wikidata.
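As an illustration of what such an operator can look like in the propositional case, the sketch below implements a well-known model-based operator from the belief merging literature: it selects the models of an integrity constraint that minimise the sum of Hamming distances to the sources. It is a brute-force toy under stated assumptions (formulas encoded as Python predicates over truth assignments, each source and the constraint satisfiable, and the hypothetical names `merge_sum`, `k1`, `k2`, `k3`), not the method proposed in this project.

```python
from itertools import product


def interpretations(atoms):
    """Enumerate every truth assignment over the given atoms."""
    for values in product((False, True), repeat=len(atoms)):
        yield dict(zip(atoms, values))


def models(formula, atoms):
    """All assignments satisfying a formula given as a Python predicate."""
    return [w for w in interpretations(atoms) if formula(w)]


def hamming(w1, w2):
    """Number of atoms on which two assignments disagree."""
    return sum(w1[a] != w2[a] for a in w1)


def merge_sum(profile, atoms, constraint=lambda w: True):
    """Models of the constraint minimising the summed distance to the sources.

    Assumes every source and the constraint are satisfiable; otherwise
    min() is called on an empty sequence and raises ValueError.
    """
    candidates = models(constraint, atoms)
    source_models = [models(k, atoms) for k in profile]

    def score(w):
        # Distance to a source = distance to its closest model.
        return sum(min(hamming(w, m) for m in ms) for ms in source_models)

    best = min(score(w) for w in candidates)
    return [w for w in candidates if score(w) == best]


atoms = ("a", "b")
k1 = lambda w: w["a"] and w["b"]       # source 1: a and b
k2 = lambda w: w["a"] and not w["b"]   # source 2: a and not b
k3 = lambda w: w["a"] and w["b"]       # source 3: a and b

result = merge_sum([k1, k2, k3], atoms)
constrained = merge_sum([k1, k2, k3], atoms, constraint=lambda w: not w["b"])
print(result)
print(constrained)
```

Without a constraint, the two sources asserting `b` outweigh the one denying it; once the constraint forbids `b`, the operator falls back to the closest admissible assignment. Enumerating all assignments is exponential in the number of atoms, which is one reason scalability is a concern for open-domain data.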
Objectives: The general goal of the Ph.D. project is to deliver flexible and robust methods for merging open-domain knowledge. The first step consists in studying the classical belief merging approaches [Konieczny and Pino Pérez, 2002] that have already been proposed within a propositional-logic setting. This part focuses on comparing these methods in terms of how useful their results are when merging open-domain data. The second step consists in proposing flexible merging methods that use vector space representations of entities [Bouraoui and Schockaert, 2018] in order to deal with problems such as ambiguity and typicality [Booth et al., 2013; Booth et al., 2015; Varzinczak, 2018]. These methods should fulfil a (non-exhaustive) list of requirements. First, the results of merging should be supported by intuitive explanations, as this will allow us to assess the plausibility of the resulting knowledge. Second, the merging process should be relatively cautious, since in a deductive setting the impact of accepting incorrect conclusions could be far-reaching. Finally, conclusions derived inductively from vector space representations should be associated with commensurable confidence scores that can be used to restore consistency when conflicts arise [Bouraoui et al., 2017].
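To illustrate how commensurable confidence scores could be used to restore consistency, here is a minimal sketch of a generic greedy repair: assertions are considered from most to least confident, and an assertion is kept only if it does not clash with one already accepted. This is a standard cautious strategy chosen for illustration, not the procedure of [Bouraoui et al., 2017]; the confidence values, the predicate names and the helpers `clashes` and `greedy_repair` are all hypothetical.

```python
def clashes(t1, t2):
    """True when one triple asserts disjointness and the other subsumption
    over the same pair of concepts."""
    s1, p1, o1 = t1
    s2, p2, o2 = t2
    if {p1, p2} != {"disjoint-with", "type-of"}:
        return False
    return {s1, o1} == {s2, o2}


def greedy_repair(scored_triples):
    """Keep assertions in decreasing order of confidence, skipping any that
    clashes with an assertion accepted earlier."""
    accepted, rejected = [], []
    for triple, confidence in sorted(scored_triples, key=lambda x: x[1], reverse=True):
        if any(clashes(triple, kept) for kept, _ in accepted):
            rejected.append((triple, confidence))
        else:
            accepted.append((triple, confidence))
    return accepted, rejected


# Made-up confidence scores, for illustration only.
scored = [
    (("creek", "type-of", "river"), 0.6),
    (("creek", "disjoint-with", "river"), 0.9),
    (("cafe", "type-of", "DrinkingEstablishment"), 0.7),
]

accepted, rejected = greedy_repair(scored)
print("accepted:", accepted)
print("rejected:", rejected)
```

The comparison across sources is only meaningful if the scores are commensurable, i.e. a 0.9 from one source means the same as a 0.9 from another. The rejected list also doubles as a simple explanation of which assertions were given up and why.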
The scalability of sound and complete merging methods is likely to be limited. We will hence work on identifying tractable fragments, i.e., syntactic restrictions under which polynomial-time sound and complete reasoning is possible. Finally, the proposed models will be used not only for merging knowledge bases, but also for dealing with inconsistency, especially inconsistency arising as a borderline effect. That is, instead of collecting the most plausible knowledge that could be added to the knowledge base, we look for the least plausible knowledge already in it.
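One possible reading of "finding the least plausible knowledge" is sketched below, assuming that each entity has a vector representation and that the plausibility of a membership assertion is approximated by the cosine similarity between the entity and the centroid of the other members of the concept. The two-dimensional vectors, the extra members "tavern" and "library", and the function names are toy assumptions made for illustration; this is not the model the project will develop.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def least_plausible(members, embeddings):
    """Score each member against the centroid of the others and return the
    lowest-scoring member together with all scores. Needs at least two members."""
    scores = {}
    for member in members:
        others = [embeddings[o] for o in members if o != member]
        scores[member] = cosine(embeddings[member], centroid(others))
    return min(scores, key=scores.get), scores


# Hand-made toy vectors; real embeddings would be learned from data.
embeddings = {
    "pub": [1.0, 0.1],
    "bar": [0.9, 0.2],
    "tavern": [1.0, 0.0],
    "library": [0.0, 1.0],
}
drinking_establishments = ["pub", "bar", "tavern", "library"]

suspect, scores = least_plausible(drinking_establishments, embeddings)
print(suspect, scores)
```

The lowest-scoring assertion is only a candidate for removal. A cautious procedure would still check it against the rest of the knowledge base before discarding it.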
Required knowledge:
- Formal logic
- Machine learning
Knowledge to be developed during the thesis:
- Classical belief revision and merging
- Forms of reasoning
- Statistical learning
Potential practical applications of this work include increased interoperability between different ontologies and web platforms, and the development of better web query answering tools. These improvements can considerably enhance the user experience when querying the web, and commercial websites in particular.
Keywords: Open-Domain Knowledge, Ontology, Belief Merging, Statistical Learning, Vector Space Embeddings.
[Booth et al., 2013] R. Booth, T. Meyer, and I. Varzinczak. A propositional typicality logic for extending rational consequence. In E.L. Fermé, D.M. Gabbay, and G.R. Simari, editors, Trends in Belief Revision and Argumentation Dynamics, volume 48 of Studies in Logic – Logic and Cognitive Systems, pages 123–154. King’s College Publications, 2013.
[Booth et al., 2015] R. Booth, G. Casini, T. Meyer, and I. Varzinczak. On the entailment problem for a logic of typicality. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), pages 2805–2811, 2015.
[Bouraoui et al., 2017] Z. Bouraoui, S. Jameel, and S. Schockaert. Inductive reasoning about ontologies using conceptual spaces. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI), 2017.
[Bouraoui and Schockaert, 2018] Z. Bouraoui and S. Schockaert. Learning conceptual space representations of interrelated concepts. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI-ECAI), 2018.
[Konieczny and Pino Pérez, 2002] S. Konieczny and R. Pino Pérez. Merging information under constraints: A logical framework. Journal of Logic and Computation, 12(5):773–808, 2002.
[Tanon et al., 2016] T. P. Tanon, D. Vrandecic, S. Schaffert, T. Steiner, and L. Pintscher. From Freebase to Wikidata: The great migration. In Proceedings of the 25th International Conference on World Wide Web (WWW), pages 1419–1428, 2016.
[Varzinczak, 2018] I. Varzinczak. A note on a description logic of concept and role typicality for defeasible reasoning over ontologies. Logica Universalis, 12(3-4):297–325, 2018.