Profiling Java-programmers based on application of Data Mining Techniques on Resumes


Title: Profiling Java-programmers based on application of Data Mining Techniques on Resumes
Status: ongoing
Student name: Aron de Jong
Start date: 2014/12/01
End date: 2015/08/01
Supervisor: Mark Hoogendoorn
Second reader: Guszti Eiben
Thesis: Media:Aron de Jong_Thesis.pdf
Poster: Media:Aron de Jong_Posternaam.pdf


War for talent

Companies compete heavily to attract, select and retain talented employees (Trank et al., 2002). They also try to make the recruitment process more effective by attracting and selecting the right type of employees: those whose past achievements indicate a high level of ability. The recruitment process runs into the following problems:

Manual analysis of a large number of applicants. Companies receive a large number of application letters. The curricula vitae of job applicants are generally analysed by hand, which can be error-prone, costly and cumbersome (Mehta et al., 2013).

Shortage of qualified workers, leading to a war for talent. Most companies face difficulties attracting qualified applicants (Ng and Burke, 2005). Because talented applicants are scarce, companies compete to select qualified applicants from the same small pool of candidates, which results in a 'war for talent'.

Breaugh & Starke (2002) present a model describing the recruitment process. The model contains five stages: (1) Recruitment objectives, (2) Strategy development, (3) Recruitment activities, (4) Intervening/Process variables, (5) Recruitment results. This framework incorporates several separate recruitment models and theories.

Goal of thesis

The thesis applies Data Mining techniques to increase the effectiveness of the recruitment process. Many IT secondment firms face a shortage of qualified workers, especially Java programmers. The research question is whether a job profile can be built with Data Mining techniques so that Java programmers can be attracted more effectively. In terms of the strategy development stage of Breaugh & Starke's model, the profile can be used to answer two sub-questions: (1) What are the characteristics of a good applicant for a job function? (2) Where can these good applicants be found? To answer these questions, resumes of existing employees are used as the data source.


Currently, resumes (N=1800) are used as input data. These data are crawled and parsed from the employee database of an IT firm (intranet). The resumes have the following categories: Summary (profile, related projects), Personalia (surname, gender, job function, manager, work unit), Languages, Knowledge (theoretical and practical skills), Education (degrees, certificates) and Working Experience.
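
As an illustration of how a parsed resume with these categories could be held in memory, here is a minimal Python sketch. All class, field and function names are assumptions made for this example and are not taken from the thesis. In particular, deriving the "Java programmer" label from the job function is a hypothetical rule, not the labelling actually used in the project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Personalia:
    # Fields mirror the Personalia category; names are illustrative.
    surname: str
    gender: str
    job_function: str
    manager: str
    work_unit: str


@dataclass
class Resume:
    # One group of fields per resume category; names are illustrative.
    personalia: Personalia
    profile: str = ""                                           # Summary
    related_projects: List[str] = field(default_factory=list)   # Summary
    languages: List[str] = field(default_factory=list)          # Languages
    knowledge: List[str] = field(default_factory=list)          # Knowledge (skills)
    degrees: List[str] = field(default_factory=list)            # Education
    certificates: List[str] = field(default_factory=list)       # Education
    working_experience: List[str] = field(default_factory=list)


def is_java_programmer(resume: Resume) -> bool:
    """Hypothetical labelling rule: look for 'java' in the job function."""
    return "java" in resume.personalia.job_function.lower()
```

Keeping the job function in a separate field from the Knowledge, Education and Working Experience lists makes it straightforward to use the former as the prediction target and the latter as predictors.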

Data Mining approach

The current approach is to predict whether an employee is a Java programmer, based on Knowledge skills, Prior Education background and Previous Working Experience background. Skills that trivially identify a Java programmer (such as Java Spring, Java EE, etc.) are removed as predictors. Furthermore, irrelevant predictors were removed using feature selection.
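
The two filtering steps can be sketched in Python as follows. The page does not say which feature selection method is used, so this sketch assumes ranking binary skill features by information gain as a stand-in. The function names, the representation of a resume as a set of skill strings, and the toy data are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Set, Tuple


def entropy(labels: List[bool]) -> float:
    """Shannon entropy (in bits) of a list of binary labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(has_feature: List[bool], labels: List[bool]) -> float:
    """Reduction in label entropy after splitting on one binary feature."""
    with_f = [y for x, y in zip(has_feature, labels) if x]
    without_f = [y for x, y in zip(has_feature, labels) if not x]
    n = len(labels)
    remainder = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - remainder


def select_features(resumes: List[Set[str]], labels: List[bool],
                    trivial: Set[str], k: int) -> List[Tuple[str, float]]:
    """Drop trivial skills, then keep the k skills with the highest gain."""
    vocabulary = set().union(*resumes) - trivial
    scored = [(skill, information_gain([skill in r for r in resumes], labels))
              for skill in sorted(vocabulary)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data, invented for illustration only (not from the resume database).
toy_resumes = [
    {"java spring", "maven", "scrum"},
    {"java ee", "maven", "sql"},
    {"c#", "sql", "scrum"},
    {"cobol", "sql"},
]
toy_labels = [True, True, False, False]  # True = Java programmer
top = select_features(toy_resumes, toy_labels,
                      trivial={"java spring", "java ee"}, k=2)
```

Removing the trivial skills before scoring matters: a skill such as Java Spring would otherwise dominate the ranking while saying nothing new about what characterises a Java programmer.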


[1] Trank, C. Q., Rynes, S. L., & Bretz Jr, R. D. (2002). Attracting applicants in the war for talent: Differences in work preferences among high achievers. Journal of Business and Psychology, 16(3), 331-345.

[2] Mehta, S., Pimplikar, R., Singh, A., Varshney, L. R., & Visweswariah, K. (2013). Efficient multifaceted screening of job applicants. In Proceedings of the 16th International Conference on Extending Database Technology (pp. 661-671). ACM.

[3] Ng, E. S., & Burke, R. J. (2005). Person–organization fit and the war for talent: does diversity management make a difference? The International Journal of Human Resource Management, 16(7), 1195-1210.