Large scale reasoning using models of human cognition
|has title::Improvements in large-scale automated reasoning inspired by models of cognition|
|Master:||project within::Cognitive Science|
|Student name:||student name::Arjon Buikstra|
|Supervisor:||Annette ten Teije|
|Second reader:||has second reader::Frank van Harmelen|
|Company:||has company::VU & ABC at Max Planck Institute for Human Development in Berlin|
This paper presents several approaches to reducing error in result sets from a large-scale semantic data repository. The methods are inspired by cognitive science, specifically by work on simple heuristics and similarity theory. We present a problem case based on a memory-foraging experiment, for which we constructed fifty SPARQL queries, human quality measures, and a gold standard of results.
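The quality measures against the gold standard can be made concrete as precision and recall over a result set. The function below is a minimal illustrative sketch, not the thesis's actual evaluation code; the example sets are hypothetical.

```python
# Illustrative sketch: score a query's result set against a gold standard
# as a precision/recall pair (the thesis's human quality measures are richer).

def precision_recall(results, gold):
    """Precision: fraction of returned items that are correct.
    Recall: fraction of gold-standard items that were returned."""
    results, gold = set(results), set(gold)
    hits = results & gold
    precision = len(hits) / len(results) if results else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall

# 3 of 4 returned items are correct; 3 of 4 gold items were found.
p, r = precision_recall({"a", "b", "c", "x"}, {"a", "b", "c", "d"})
print(p, r)  # → 0.75 0.75
```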
The main contribution is an algorithm that calculates an endogenous, feature-based similarity value for the items in a set, which can be used to separate hits from false alarms in a set of query results. This similarity measure is shown to be reliably related, in the statistical sense, to actual measures of result quality based on precision and recall. The algorithm is a simple heuristic for judging the quality of items in a result set, and can automatically filter false positives from the results of any query on any semantic repository.
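A minimal sketch of how such an endogenous filter might work, assuming each result is described by a set of features (e.g. its RDF properties). The Jaccard-style overlap measure, the threshold value, and the toy results are illustrative assumptions, not the thesis's exact formulation.

```python
# Hypothetical sketch: filter a result set by endogenous feature similarity.
# An item's score is its mean pairwise feature overlap with the other items;
# items that resemble the rest of the set are kept, outliers are dropped.

def jaccard(a, b):
    """Feature overlap between two items (sets of features)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def endogenous_filter(items, threshold=0.3):
    """Keep items whose mean similarity to the rest exceeds threshold."""
    kept = []
    for name, feats in items.items():
        others = [f for n, f in items.items() if n != name]
        score = sum(jaccard(feats, f) for f in others) / len(others)
        if score >= threshold:
            kept.append(name)
    return kept

results = {
    "berlin": {"city", "capital", "germany"},
    "paris":  {"city", "capital", "france"},
    "london": {"city", "capital", "england"},
    "banana": {"fruit", "yellow"},  # likely false positive
}
print(endogenous_filter(results))  # → ['berlin', 'paris', 'london']
```

The method is endogenous in that it needs no external knowledge: the set of results itself defines what a typical result looks like.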
Additionally, a global (information-eager) measure of similarity based on latent semantic analysis is presented. This measure is effective provided that the semantic space sufficiently covers the results. In some of our experiments it did not, which suggests that approaches using a global similarity space are more fragile than the simpler endogenous similarity method.
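To illustrate the global approach, the sketch below builds a latent space from a toy term-document matrix via truncated SVD, the core step of latent semantic analysis. The corpus, terms, and rank are made-up assumptions; in the thesis setting the semantic space would come from a large external corpus.

```python
# Hypothetical sketch of a global (information-eager) similarity measure via
# latent semantic analysis: project terms into a low-rank space built from a
# background corpus, then compare them by cosine similarity in that space.
import numpy as np

# Toy term-document matrix (rows: terms, columns: background documents).
terms = ["city", "capital", "germany", "france", "fruit", "yellow"]
X = np.array([
    [1, 1, 0],   # city
    [1, 1, 0],   # capital
    [1, 0, 0],   # germany
    [0, 1, 0],   # france
    [0, 0, 1],   # fruit
    [0, 0, 1],   # yellow
], dtype=float)

# Rank-2 truncated SVD gives each term a position in the latent space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
latent = U[:, :2] * s[:2]

def lsa_similarity(t1, t2):
    """Cosine similarity of two terms in the latent semantic space."""
    v1, v2 = latent[terms.index(t1)], latent[terms.index(t2)]
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Terms that co-occur in the corpus end up close in the latent space,
# so a result whose terms lie far from the rest can be flagged as noise.
```

If the background corpus never mentions the queried results, their latent vectors carry no signal, which is exactly the coverage fragility noted above.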
Our research question is: “How can theories and models from cognitive science be used to improve querying over large data sets?” Throughout this project we show that methods based on theories of similarity from cognitive science can serve as heuristics that improve the results of queries over large data sets. We present simple algorithms that reduce the error inherent in large-scale data and help produce a satisfactory result set for any query.