Mapping drugs and medicine websites into VAD space

Revision as of 20:05, 31 May 2015

Within the EU TAFEIC project (Tools Against Financial and Economic Internet Crime), Parabots and Sentient are collecting large sets of websites related to the sale of drugs over the web. The goal of this thesis is to develop a way to cluster these sites and to give fiscal inspectors a tool that visualizes the data clearly enough to help them identify malicious and/or fraudulent websites. To do this, a 3D representation will be used to show groups of similar websites, so that the final representation will be sphere-shaped. Instead of applying a traditional dimensionality reduction algorithm, the three dimensions will be taken and adapted from psychological theory, starting from Wundt's three-dimensional theory of emotions [1]. Based on the studies by Osgood et al. [2], the dimensions that will be used are valence, arousal and dominance. Valence defines the extent to which a website is trying to sell illegal drugs, rather than selling legal ones or treating the subject in other permitted ways. Arousal measures how often a website is updated. Dominance is a measure of the popularity of the website, based on its Alexa rank and Google PageRank.
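As an illustration only, the mapping of a single website to a point inside a sphere could be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the thesis: the `SiteFeatures` fields, the scaling constants `max_updates` and `max_rank`, the logarithmic treatment of the Alexa rank and the rescaling into the unit ball are all hypothetical choices made for the example, and Google PageRank is left out for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteFeatures:
    """Hypothetical per-site features; none of these names come from the project."""
    illegal_score: float      # assumed: 0.0 (clearly legal) to 1.0 (clearly illegal sales)
    updates_per_month: float  # assumed proxy for how often the site is updated
    alexa_rank: int           # 1 is the most popular site


def clamp(x: float) -> float:
    """Restrict a value to the interval [-1, 1]."""
    return max(-1.0, min(1.0, x))


def vad_point(site: SiteFeatures,
              max_updates: float = 30.0,
              max_rank: int = 1_000_000) -> tuple[float, float, float]:
    """Map a site to (valence, arousal, dominance) inside the unit ball."""
    # Valence: higher means the site leans more towards selling illegal drugs.
    valence = clamp(2.0 * site.illegal_score - 1.0)

    # Arousal: update frequency, capped at max_updates and rescaled to [-1, 1].
    capped = min(max(site.updates_per_month, 0.0), max_updates)
    arousal = clamp(2.0 * capped / max_updates - 1.0)

    # Dominance: popularity on a log scale; rank 1 maps to +1, max_rank to -1.
    rank = min(max(site.alexa_rank, 1), max_rank)
    dominance = clamp(1.0 - 2.0 * math.log(rank) / math.log(max_rank))

    # Points outside the unit ball are pulled back onto its surface,
    # so that every site lies within a sphere of radius 1.
    norm = math.sqrt(valence ** 2 + arousal ** 2 + dominance ** 2)
    if norm > 1.0:
        valence, arousal, dominance = valence / norm, arousal / norm, dominance / norm
    return valence, arousal, dominance
```

Calling `vad_point(SiteFeatures(illegal_score=0.9, updates_per_month=12, alexa_rank=50_000))` returns one coordinate triple per site; the resulting points could then be passed to any clustering or 3D plotting routine.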

The first purpose of this study is to see whether a model from a different field (namely psychology) can be mapped coherently onto a new context, in our case drugs and medicine websites, and yield clusters that are useful to the final users of the tool (fiscal inspectors). The second purpose is to see how a three-dimensional representation, rather than the usual two-dimensional one, can help users understand and make use of the outcome of a clustering task.


