Difference between revisions of "Mapping drugs and medicine websites into VAD space"

Within the EU TAFEIC project (Tools Against Financial and Economic Internet Crime), [http://www.parabots.nl/ Parabots] and [http://sentient.nl/ Sentient] are collecting large sets of websites relating to the sale of drugs over the web. The goal of this thesis is to find and develop a way to cluster these sites and to give fiscal inspectors a tool that visualizes this data in a clear way, helping them to identify malicious and/or fraudulent websites.

To do that, a 3D representation will be used to show groups of similar websites, so that the final representation will be sphere-shaped. Instead of applying a traditional dimensionality reduction algorithm, the three dimensions will be taken and adapted from psychology theory, starting from Wundt's three-dimensional theory of emotions [1]. Based on the studies by Osgood et al. [2], the dimensions that will be used are valence, arousal and dominance.

Valence defines the extent to which a website is trying to sell illegal drugs, rather than selling legal ones or treating the subject in other permitted ways.

Arousal measures how often a website has been updated.

Dominance is a measure of the popularity of the website, based on Alexa rank and Google PageRank.

The purpose of this study is to see whether a model from a different (namely psychological) field can be mapped coherently onto a new context, in our case drugs and medicine websites, and yield clusters that are useful to the final users of the tool (fiscal inspectors). The second purpose is to see how a three-dimensional representation, rather than the usual two-dimensional one, can help users to understand and make use of the outcome of a clustering task.

'''References'''

[1] Reisenzein, R. (1992). A structuralist reconstruction of Wundt's three-dimensional theory of emotion.

[2] Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of meaning. Urbana, IL: University of Illinois Press.

Revision as of 09:59, 2 March 2016


Title: Mapping drugs and medicine websites into VAD space
Status: ongoing
Master project within: Technical Artificial Intelligence
Student name: Alberto Caroli
Dates
Start: 2015/04/13
End: 2015/10/13
Supervision
Supervisor: Guszti Eiben
Second supervisor: Bas Weitjens
Company: Sentient
Thesis: Media:Thesis.pdf
Poster: Media:Posternaam.pdf

Abstract

In this research the VAD emotional state model is used to map Drugs & Medicine websites to a three-dimensional continuous space, by transposing a website’s characteristics to the psychological dimensions of Valence, Arousal and Dominance. For the Valence dimension, we performed supervised continuous sentiment analysis, trained on samples different from those in the investigated website data set. To develop a reliable model, we ran experiments showing that changing the granularity of the outcome of a sentiment analysis task does not influence its performance when the training set and the test set come from different contexts. We also experimented with two interpretations of Valence and found them to be orthogonal. The Arousal dimension is represented by the amount of change in a website’s pages, and Dominance is a measure of the website’s popularity. When comparing the dimensions with each other, we find a small correlation between the Arousal and Dominance dimensions, due to the intrinsic properties of relevant websites. Finally, after performing a clustering task, we show that points of the same cluster, represented in the VAD space, are closer to each other than they are in a random space.
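
The final comparison in the abstract can be illustrated with a minimal sketch. The abstract does not say how the "random space" was built, so the baseline here (uniformly random points inside the same bounding box, keeping the same cluster labels) is an assumption, as are the function names and the sample coordinates; none of them come from the thesis.

```python
import itertools
import math
import random


def mean_intra_cluster_distance(points, labels):
    """Average Euclidean distance over all pairs of points that share a label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    distances = [
        math.dist(a, b)
        for members in clusters.values()
        for a, b in itertools.combinations(members, 2)
    ]
    return sum(distances) / len(distances) if distances else 0.0


def random_baseline(points, labels, trials=200, seed=0):
    """Same measure, averaged over random point sets (assumed baseline).

    Each trial draws uniformly random points inside the bounding box of
    the real points and reuses the original cluster labels.
    """
    rng = random.Random(seed)
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    total = 0.0
    for _ in range(trials):
        fake = [
            tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
            for _ in points
        ]
        total += mean_intra_cluster_distance(fake, labels)
    return total / trials


# Hypothetical (valence, arousal, dominance) coordinates, not thesis data.
vad_points = [
    (0.10, 0.20, 0.10), (0.15, 0.25, 0.12), (0.12, 0.18, 0.09),
    (0.90, 0.80, 0.85), (0.88, 0.82, 0.90), (0.92, 0.79, 0.87),
]
cluster_labels = [0, 0, 0, 1, 1, 1]

observed = mean_intra_cluster_distance(vad_points, cluster_labels)
baseline = random_baseline(vad_points, cluster_labels)
print(f"observed={observed:.3f} baseline={baseline:.3f}")
```

An observed value well below the baseline would indicate that same-cluster websites sit closer together in the VAD space than randomly placed points do. Shuffling the cluster labels over the real points would be another reasonable baseline.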