Bioinformatic analysis of whole genome sequencing data from clinical Mycobacterium tuberculosis and Leptospira sp. strains

From Master Projects

About Bioinformatic analysis of whole genome sequencing data from clinical Mycobacterium tuberculosis and Leptospira sp. strains


Tuberculosis (TB) is one of the most severe infectious diseases, even though drug-susceptible TB can be treated effectively with a combination of antibiotics. The emerging epidemic of drug-resistant TB calls for improved diagnostic tools and early detection of drug resistance. Early detection allows appropriate treatment of the patient and could thereby reduce the incidence of multidrug-resistant TB (MDR-TB), extensively drug-resistant TB (XDR-TB), and secondary cases.

We have collected M. tuberculosis strains from patients in a Central Asian country with a high MDR-TB burden, and 25 of these strains have been whole-genome sequenced on the Illumina platform. These strains carry different drug resistance mutations but appear to be epidemiologically linked according to routine molecular typing; some of them belong to the same "cluster". We would like the student to analyse the genomic data in detail. This will help determine the epidemiological and phylogenetic relationships among the selected strains, and between them and M. tuberculosis strains that are epidemic in other regions of the world, and will hopefully provide additional insights into the problem of MDR-TB.
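One common way to make "epidemiologically linked" concrete with whole-genome data is to count pairwise SNP differences between strains and group strains that fall under a distance cut-off. The sketch below is illustrative only and assumes the variant calling (e.g. read mapping with BWA or Bowtie followed by SAMtools or GATK) has already produced a SNP alignment: one string per strain, concatenating the called base at each variable site, with `N` or `-` marking missing calls. The function names, strain names and the threshold are hypothetical; published M. tuberculosis transmission studies have used cut-offs of roughly 5 to 12 SNPs, and the choice of threshold is a study-specific assumption.

```python
from itertools import combinations

MISSING = "N-"  # assumption: N or - marks an uncalled site in the SNP alignment


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count sites where both strains have a called base and the bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in MISSING and b not in MISSING
    )


def pairwise_distances(alignment: dict) -> dict:
    """Return {(strain_x, strain_y): SNP distance} for every unordered pair."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


def snp_clusters(alignment: dict, threshold: int) -> list:
    """Single-linkage clustering: strains within `threshold` SNPs share a cluster."""
    parent = {name: name for name in alignment}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (x, y), dist in pairwise_distances(alignment).items():
        if dist <= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for name in alignment:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Toy SNP alignment with hypothetical strain names (not real data).
    alignment = {
        "strain_A": "ACGTACGTAC",
        "strain_B": "ACGTACGTAT",
        "strain_C": "TTGTACGAAT",
        "strain_D": "TTGTACGAAN",
    }
    print(pairwise_distances(alignment))
    print(snp_clusters(alignment, threshold=2))
```

Single linkage is used here because it matches the usual "chain of close transmission" reading of a SNP cluster; a phylogenetic tree built from the same alignment would complement this view rather than replace it.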

Leptospirosis is an infectious, re-emerging zoonotic disease caused by pathogenic species of the genus Leptospira. Conventionally, leptospires are classified by serology, with the serovar as the basic taxon. More than 250 serovars, grouped into 25 serogroups, have been identified so far. Lipopolysaccharide (LPS) is the dominant antigen recognized during infection and the major antigen underlying the serological classification of leptospires. In silico comparative analysis of the LPS biosynthetic loci (rfb) could reveal genetic markers that define the serovar, avoiding the need for complicated and expensive serological methods. The proposed study can be performed on raw genome data from two serovars in our collection, together with genome data from more than 200 Leptospira serovars available from the National Center for Biotechnology Information (NCBI).
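A simple first pass at finding serovar-defining markers in the rfb locus is a presence/absence comparison: keep the features found in every genome of one serovar and in no genome of any other serovar. The sketch below assumes each genome's rfb locus has already been annotated and reduced to a set of feature identifiers (for example ortholog-group or allele labels); the input format, the function name and all genome, serovar and gene names are hypothetical placeholders, not real data. A real analysis would likely also need sequence-level comparison, since serovars can differ by alleles rather than by whole genes.

```python
from collections import defaultdict


def serovar_markers(genomes: dict) -> dict:
    """Find rfb features unique to, and conserved within, each serovar.

    genomes: {genome_id: (serovar, set_of_rfb_feature_ids)}
    returns: {serovar: sorted list of candidate marker features}
    """
    by_serovar = defaultdict(list)
    for serovar, features in genomes.values():
        by_serovar[serovar].append(set(features))

    markers = {}
    for serovar, feature_sets in by_serovar.items():
        # Features present in every genome of this serovar.
        core = set.intersection(*feature_sets)
        # Features seen in any genome of any other serovar.
        elsewhere = set().union(
            *(fs for other, sets in by_serovar.items() if other != serovar for fs in sets)
        )
        markers[serovar] = sorted(core - elsewhere)
    return markers


if __name__ == "__main__":
    # Toy input with hypothetical identifiers only.
    genomes = {
        "genome_1": ("serovar_X", {"orf01", "orf02", "orf03"}),
        "genome_2": ("serovar_X", {"orf01", "orf02", "orf03", "orf04"}),
        "genome_3": ("serovar_Y", {"orf01", "orf05", "orf04"}),
    }
    print(serovar_markers(genomes))
```

Requiring a feature in all genomes of a serovar keeps the candidate list conservative; with only one or two genomes per serovar, as for the strains in our collection, the candidates will need confirmation against the larger NCBI dataset.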

Skills needed

Basic knowledge of the bioinformatic principles of genome analysis, confidence with computational analysis, and a desire to become familiar with software for the genomic analysis of pathogens (e.g. BWA, Bowtie, GATK, SAMtools, Artemis); familiarity with data interpretation and phylogenetic analysis is preferred. Basic skills in the Linux operating system and the command line (Perl).

Skills taught

Genomic analysis of prokaryotic organisms (bacteria), basic molecular epidemiology of two infectious diseases, phylogenetic analysis, oral presentation of results at work meetings, and a written report at the end of the internship.


KIT biomedical research: