Sentiment Analysis of Dutch Social Media
Sentiment Analysis of Dutch Social Media | |
---|---|
Status: | ongoing |
Master: | Technical Artificial Intelligence |
Student name: | Dennis Lokkers |
Student number: | 1204203 |
Dates | |
Start: | 2009/03/30 |
End: | 2009/10/30 |
Supervision | |
Supervisor: | Stefan Schlobach |
Second reader: | Shenghui Wang |
Company: | BuzzCapture |
Poster: | Media:Media:Posternaam.pdf |
Signature supervisor
..................................
Abstract
The internet is widely used as a medium for exchanging information and opinions. Knowing what consumers think about a product can be of great value to organizations. Because the amount of information on the internet is so large, research is being conducted on classifiers that automatically label text by sentiment.
Analyzing opinions expressed in user-posted articles poses several difficulties. The data is noisy, an article does not necessarily discuss only one product, more than one domain has to be analyzed, and the data is in Dutch, whereas most research addresses English texts. These difficulties affect training: every time a new domain has to be analyzed, training starts from scratch. They also affect accuracy, which suffers when the sentiment expressed in a text concerns more than one product. A start has been made on sentiment analysis of Dutch articles, but the existing approaches either do not perform well or are designed for very specific domains.
This master project explores ways to improve sentiment analysis of online user postings in Dutch. We will investigate whether prior knowledge improves the sentiment analysis and reduces the amount of training the classifier needs, and whether sentence-level analysis performs better than text-level analysis on texts containing multiple subjects. We will train existing classifiers on corpora from different domains, at both sentence level and text level. We will also measure the effect of including prior domain knowledge on accuracy and on the amount of training needed.
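The difference between text-level and sentence-level analysis can be illustrated with a minimal sketch. It is not the project's method: instead of a trained classifier it uses a simple word-count scorer, and the small polarity lexicon `PRIOR_LEXICON`, the function names, and the example post are all hypothetical, chosen only to show how prior knowledge and the unit of analysis interact when a post discusses two products.

```python
import re

# Hypothetical prior knowledge: a tiny hand-made Dutch polarity lexicon.
# A real system would learn or import a far larger resource.
PRIOR_LEXICON = {
    "goed": 1, "geweldig": 1, "mooi": 1,
    "slecht": -1, "traag": -1, "waardeloos": -1,
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def score(text):
    """Sum the lexicon polarities of all tokens (unknown words count as 0)."""
    return sum(PRIOR_LEXICON.get(token, 0) for token in tokenize(text))

def label(value):
    """Map a numeric score to a sentiment label."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"

def text_level(text):
    """Assign one label to the whole post."""
    return label(score(text))

def sentence_level(text):
    """Assign one label per sentence, splitting naively on ., ! and ?"""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [(sentence, label(score(sentence))) for sentence in sentences]

# Hypothetical post that mentions two products with opposite sentiment.
post = ("De camera van de telefoon is geweldig. "
        "De batterij van de laptop is slecht.")

print(text_level(post))
for sentence, sentiment in sentence_level(post):
    print(sentiment, "|", sentence)
```

In this toy example the opposite polarities cancel at text level, so the post as a whole is labeled neutral, while the sentence-level pass keeps the positive and negative judgments apart. Whether that advantage holds for trained classifiers on noisy Dutch data is the question the project sets out to test.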