Zied Bouraoui

Structured knowledge about entities plays an increasingly important role in web applications such as search engines, recommendation sites, and social networks. Such knowledge is available, for example, in ontologies such as SUMO or OpenCyc, in knowledge graphs such as DBpedia, Google Freebase, Facebook Graph and Wikidata, as semantic markup (e.g. RDFa), or on web pages (e.g. in the form of tables, lists, or natural language assertions). While these web applications typically require some form of reasoning, deduction is often too limited in this setting. One reason is that the available information may be conflicting. For instance, the concept “ice cream shop” is considered a type of “restaurant” on Wikipedia, but is disjoint from “restaurant” in OpenCyc. Another reason is that the available knowledge is often incomplete. For example, the SUMO ontology encodes knowledge about Summer Olympic Games sports such as basketball, football and handball, but mentions nothing about beach volleyball. Clearly, web applications would benefit from robust commonsense reasoning methods that could handle inconsistency and incompleteness in a principled, data-driven way.

Commonsense reasoning is a well-known problem in artificial intelligence, with at least two facets. The first, largely studied in knowledge representation (KR), concerns nonmonotonic reasoning frameworks such as default logics and answer set programming. The second, which has traditionally received less attention in KR but has been widely studied in other research areas such as machine learning and information retrieval, relates to similarity- and analogy-based reasoning, including induction. Similarity and analogy are naturally a matter of degree, which makes it difficult, or even impossible, to study the latter types of commonsense reasoning in a purely symbolic setting.

To address this point, we developed a number of methods for robust and flexible data-driven reasoning with logic-based structured knowledge from the web (i.e. knowledge expressed in description logics or existential rules). At the centre of this work are semantic spaces, which act as an interface between logical representations and data, together with an efficient Bayesian inference machinery that allows us to make plausible inferences in a principled way. Semantic spaces are vector space representations of entities, which have been widely studied in the cognitive science literature to model phenomena such as categorisation, induction, vagueness and typicality.

In this talk, I will show how such semantic spaces can be generated from a combination of structured knowledge and bag-of-words representations. I will then introduce a Bayesian method inspired by cognitive models of category-based induction. This method is useful for deriving concept membership (e.g. x is a kind of X) and subsumption (e.g. every X is a Y) assertions, but it cannot be applied directly to model relations between entities or concepts (for instance, a rule of the form \forall x \forall y R(x,y) -> \exists z P(z,x)). To overcome this limitation, three additional methods have been investigated. The first views this problem of “relation induction” as a Bayesian regression problem, the second considers it as a special case of category-based induction, and the third models a form of analogical reasoning. Our experimental results show that these approaches can substantially outperform state-of-the-art methods for knowledge base completion.
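As a rough illustration of how concept membership might be induced from a semantic space, the sketch below fits one Gaussian to the vectors of known instances of a concept, another to known non-instances, and applies Bayes' rule to a new entity. This is a generic construction and not the method presented in the talk. The diagonal Gaussian model, the function names, the uniform prior and the two-dimensional coordinates are all assumptions made for this example.

```python
import math

def fit_diagonal_gaussian(vectors, min_var=1e-3):
    """Estimate per-dimension mean and variance from example vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    # Floor the variance so that tightly clustered examples do not
    # produce a degenerate (near-zero-width) density.
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, min_var)
        for d in range(dims)
    ]
    return means, variances

def log_density(x, means, variances):
    """Log of a diagonal Gaussian density evaluated at point x."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
        for xi, mu, var in zip(x, means, variances)
    )

def membership_posterior(x, members, non_members, prior=0.5):
    """P(x belongs to the concept | x), via Bayes' rule over two Gaussians."""
    log_in = math.log(prior) + log_density(x, *fit_diagonal_gaussian(members))
    log_out = math.log(1 - prior) + log_density(x, *fit_diagonal_gaussian(non_members))
    # Normalise in log space for numerical stability.
    top = max(log_in, log_out)
    exp_in = math.exp(log_in - top)
    exp_out = math.exp(log_out - top)
    return exp_in / (exp_in + exp_out)

# Toy 2-d semantic space; the coordinates are invented for illustration.
olympic_sports = [(0.9, 0.8), (1.0, 1.1), (1.1, 0.9)]   # e.g. basketball, football, handball
other_entities = [(-1.0, -0.9), (-0.8, -1.2), (-1.1, -1.0)]
beach_volleyball = (0.95, 1.0)  # entity missing from the ontology

p = membership_posterior(beach_volleyball, olympic_sports, other_entities)
print(f"P(beach volleyball is an Olympic sport) = {p:.3f}")
```

Because the new entity lies close to the known instances in the space, the posterior is high, which is the kind of plausible (rather than deductive) inference that incomplete knowledge bases call for.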

Keywords: Ontologies, Description Logics, Bayesian Inference, Vector Space Embeddings, Data-Driven Reasoning.