Event representation and time-based identity resolution / data consolidation
|has title::Event representation and time-based identity resolution in micro-blogging|
|Master:||project within::Knowledge Technology and Intelligent Internet Applications|
|Student name:||student name::J. van Dijk|
|Second reader:||has second reader::Willem van Hage|
The rise of micro-blogging has created a whole new way of communicating. Micro-blogging makes it possible to spread the latest news around the world within a very short time, and this way of communicating opens up many opportunities for new applications. Many micro-blogging applications are now available on computers and mobile devices. Their users often post about what they are doing, or about what is happening or interesting in their lives at that moment. One common topic is the events that users are attending or following; this study focuses on events such as festivals, sport events and conferences.

This study develops an approach for automatically detecting events in micro-blogging messages. An event is treated as something that happens at a location during some time interval. The goal of the study is a general method for event detection. Information about the detected events can then be used to generate news or overviews of those events. Examples include which artists perform at a festival, or the scores of the previous day's playoff matches.

The implementation has two types of analysis. The semantic analysis is based on a set of models that contain information about the actors of events, such as bands/artists, speakers or sport teams. These models are applied to collections of micro-blogging messages. For example, the model of a band is applied to the data about a festival. If the band performs at that festival, the system first uses the content of the messages about that band to detect whether users mention where and at what time it performs. In the second step, a statistical approach is applied to the collection of messages if the semantic results are not accurate enough or are based on too little data. Three methods based on statistical principles are implemented: the Threshold, Area and Crossing methods.
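The abstract names the Threshold, Area and Crossing methods but does not define them. The sketch below is for illustration only and rests on an assumption: that a threshold-style method works on the number of messages mentioning an actor per hour. Under that assumption, it reports the longest run of consecutive hours in which that count reaches a fixed threshold, and treats that run as the estimated time interval of the event. The hourly bucketing, the function names `hourly_counts` and `threshold_interval`, the `threshold` parameter and the example timestamps are all hypothetical. They are not the thesis's definitions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


def hourly_counts(timestamps: List[datetime]) -> Dict[datetime, int]:
    """Count messages per hour by truncating each timestamp to the start of its hour."""
    counts: Dict[datetime, int] = {}
    for ts in timestamps:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[hour] = counts.get(hour, 0) + 1
    return counts


def threshold_interval(
    timestamps: List[datetime], threshold: int
) -> Optional[Tuple[datetime, datetime]]:
    """Hypothetical threshold method: return the longest run of consecutive hours
    whose message count is at least `threshold`, as (start, end) with `end`
    exclusive, or None if no hour reaches the threshold."""
    counts = hourly_counts(timestamps)
    if not counts:
        return None
    hour, last = min(counts), max(counts)
    best: Optional[Tuple[datetime, datetime]] = None
    run_start: Optional[datetime] = None
    # Walk one hour past the last bucket so that an open run is always closed.
    while hour <= last + timedelta(hours=1):
        if counts.get(hour, 0) >= threshold:
            if run_start is None:
                run_start = hour
        elif run_start is not None:
            # A run just ended; keep it if it is longer than the best so far.
            if best is None or hour - run_start > best[1] - best[0]:
                best = (run_start, hour)
            run_start = None
        hour += timedelta(hours=1)
    return best


if __name__ == "__main__":
    # Hypothetical timestamps of messages that mention one band at a festival.
    day = datetime(2010, 7, 3)
    msgs = [day.replace(hour=h, minute=m) for h, m in
            [(10, 5), (19, 10), (19, 20), (19, 40), (20, 5), (20, 30), (22, 0)]]
    start, end = threshold_interval(msgs, threshold=2)
    print(start.strftime("%H:%M"), "-", end.strftime("%H:%M"))  # 19:00 - 21:00
```

In this illustration, isolated mentions (10:05, 22:00) fall below the threshold. The burst of messages between 19:00 and 21:00 is taken as the performance slot. The choice of threshold trades missed events against false detections.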
The results of each method (semantic as well as statistical) are passed to a data consolidation step, which combines them into a single result. Finally, the system represents the detected events. Combining the semantic and statistical methods in the data consolidation step yields an interesting prototype system. It also gives insight into how event detection can work for micro-blogging messages.
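The consolidation step is only summarized here. The sketch below encodes one possible reading of it, following the order the abstract describes. A semantic detection is used when enough messages back it. Otherwise, the estimates from the statistical methods are merged into one interval. The `Detection` record, the `support` count, the `min_support` cut-off and the middle-value merge rule are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Detection:
    source: str      # hypothetical label, e.g. "semantic", "threshold", "area", "crossing"
    start: datetime
    end: datetime
    support: int     # assumed measure: number of messages backing this detection


def consolidate(
    semantic: Optional[Detection],
    statistical: List[Optional[Detection]],
    min_support: int = 5,
) -> Optional[Detection]:
    """Combine semantic and statistical detections into a single result."""
    # Use the semantic detection directly when enough messages support it.
    if semantic is not None and semantic.support >= min_support:
        return semantic
    candidates = [d for d in statistical if d is not None]
    if not candidates:
        # Nothing to fall back on: keep the (possibly weak or missing) semantic result.
        return semantic
    # Assumed merge rule: middle value (upper middle for an even count) of the
    # start times and of the end times, taken separately.
    starts = sorted(d.start for d in candidates)
    ends = sorted(d.end for d in candidates)
    mid = len(candidates) // 2
    return Detection("consolidated", starts[mid], ends[mid],
                     sum(d.support for d in candidates))
```

Taking the middle value over the three statistical methods is less sensitive to one method drifting than an average would be. Because every detection satisfies `start <= end`, the merged start can never fall after the merged end.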