Sentiment Analysis of Dutch Social Media

From Master Projects


Status: ongoing
Master: Technical Artificial Intelligence
Student name: Dennis Lokkers
Student number: 1204203
Start date: 2009/03/30
End date: 2009/10/30
Supervisor: Stefan Schlobach
Second reader: Shenghui Wang
Company: BuzzCapture
Poster: Posternaam.pdf

The internet is frequently used as a medium for exchanging information and opinions. Knowledge of consumers' opinions about products can be of great value to organizations. Because the amount of information on the internet is so large, research is being conducted on classifiers that automatically label texts by sentiment.

Analyzing opinions expressed in user-posted articles poses several difficulties. The data is noisy, an article does not necessarily discuss only one product, there is more than one domain to analyze, and the data is in Dutch, whereas most research concerns English texts. The number of domains affects training: every time a new domain has to be analyzed, the classifier has to be trained from scratch. In addition, the accuracy of the analysis suffers when a text expresses sentiment about more than one product. Some work has been done on sentiment analysis of Dutch articles, but the existing approaches either perform poorly or are designed for very specific domains.

This master project explores ways to improve sentiment analysis of online user postings in Dutch. We will investigate whether prior knowledge improves sentiment analysis and reduces the amount of training the classifier needs, and whether sentence-level analysis performs better than text-level analysis on texts that discuss multiple subjects. We will train existing classifiers on corpora from different domains, at both the sentence level and the text level. We will also measure how including prior domain knowledge affects accuracy and the amount of training needed.
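To make the difference between text-level and sentence-level analysis concrete, here is a minimal Python sketch. It is not the project's method: it assumes that prior knowledge takes the form of a small polarity lexicon (`PRIOR_LEXICON`, hypothetical, with invented entries) and scores text with that lexicon instead of a trained classifier. The function names and the example post are also illustrative only.

```python
import re

# Hypothetical Dutch prior-polarity lexicon (illustrative, not from the project):
# +1 marks a positive word, -1 a negative word.
PRIOR_LEXICON = {
    "goed": 1, "geweldig": 1, "mooi": 1, "snel": 1,
    "slecht": -1, "traag": -1, "duur": -1, "waardeloos": -1,
}


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; a crude stand-in for a real Dutch tokenizer."""
    return re.findall(r"\w+", text.lower())


def polarity(text: str) -> int:
    """Sum of the lexicon polarities of all tokens in `text` (unknown words count 0)."""
    return sum(PRIOR_LEXICON.get(tok, 0) for tok in tokenize(text))


def label(score: int) -> str:
    """Map a polarity score to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def text_level(text: str) -> str:
    """One label for the whole post, regardless of how many subjects it mentions."""
    return label(polarity(text))


def sentence_level(text: str, subject: str) -> str:
    """Label only the sentences that mention `subject` (single-word subjects only)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    relevant = [s for s in sentences if subject.lower() in tokenize(s)]
    return label(sum(polarity(s) for s in relevant))


if __name__ == "__main__":
    post = "De nieuwe telefoon is geweldig. De provider is traag en duur."
    print(text_level(post))                  # negative
    print(sentence_level(post, "telefoon"))  # positive
    print(sentence_level(post, "provider"))  # negative
```

On a post that praises the phone but criticizes the provider, the text-level label is negative, while the sentence-level analysis gives each subject its own polarity. A trained classifier could take the place of `polarity` without changing the sentence-selection step.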