Selecting docking conformations based on predicted interface and interaction strength
|Student name:||Sije van der Veen|
|Second supervisor:||Anton Feenstra|
Over the past decades, the human genome has been unravelled. The genome contains many genes that encode protein sequences. These proteins can have several functions, and one of their properties is that they can interact with other proteins. Compared to the human genome, much less experimental data is available on protein-protein interactions (PPIs). Developing predictive methods for PPIs is therefore an interesting topic for research and development.
Computational protein-protein docking is a valuable tool for determining the conformation of complexes formed by interacting proteins. The main difficulty is ranking the predicted bound orientations so that the 'best' one can be selected. At the IBIVU, research is carried out on methods that predict protein interactions, in order to identify stable complexes of interacting proteins based on molecular dynamics simulations. A full atomistic simulation requires about a year on a single CPU per PPI, which makes it infeasible to apply to, for example, 1000 docking orientations for a single interacting protein pair. A coarse-grained force field can be used instead, which brings the run time down to about half a day per PPI. This is still expensive, and certainly far too expensive to investigate all possible PPIs in a genome: for example, the 20,000 genes in the human genome may give rise to roughly 200 million potentially interacting protein pairs.
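The figures quoted above can be checked with a short back-of-the-envelope calculation. The sketch below uses only the numbers from the text; the function names are illustrative, and it assumes that self-interactions are excluded and that each pair or orientation needs exactly one simulation, which the text does not state explicitly.

```python
from math import comb

N_GENES = 20_000              # approximate number of human genes (from the text)
DAYS_PER_RUN_ATOMISTIC = 365  # "about a year" per simulation (from the text)
DAYS_PER_RUN_CG = 0.5         # "about half a day" per simulation (from the text)
N_ORIENTATIONS = 1_000        # example number of docking orientations (from the text)


def candidate_pairs(n_genes: int) -> int:
    """Number of unordered protein pairs, excluding self-interactions (assumption)."""
    return comb(n_genes, 2)


def cpu_years(n_runs: int, days_per_run: float) -> float:
    """Total single-CPU time in years, assuming one simulation per run (assumption)."""
    return n_runs * days_per_run / 365


n_pairs = candidate_pairs(N_GENES)
print(f"candidate pairs: {n_pairs:,}")
print(f"atomistic, {N_ORIENTATIONS} orientations of one pair: "
      f"{cpu_years(N_ORIENTATIONS, DAYS_PER_RUN_ATOMISTIC):,.0f} CPU-years")
print(f"coarse-grained, one run per pair, whole genome: "
      f"{cpu_years(n_pairs, DAYS_PER_RUN_CG):,.0f} CPU-years")
```

Under these assumptions the pair count comes out just under 200 million, and even the coarse-grained approach would need several hundred thousand CPU-years for a genome-wide scan.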
In order to reduce the amount of computation needed, we will test whether filtering the docking ranking by interface information, generated by an interface site predictor, can yield accurate predictions. Interaction site predictions are quick but not very accurate. Binding orientations that do not include the predicted interface site will be filtered out in order to generate a more accurate ranking. Statistical tests will be performed to assess the significance of the rankings and of the changes in the rankings produced by this process.
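The filtering step described above could look roughly like the following minimal sketch. It is not the project's implementation: the `Pose` structure, the coverage measure, the `min_coverage` threshold and the convention that a lower docking score is better are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical record for one docking orientation."""
    pose_id: str
    score: float                  # docking score; lower is assumed to be better
    contact_residues: frozenset   # residue numbers of one protein touching its partner


def interface_coverage(pose: Pose, predicted: set) -> float:
    """Fraction of the predicted interface residues that this pose actually contacts."""
    if not predicted:
        return 0.0
    return len(pose.contact_residues & predicted) / len(predicted)


def filter_and_rank(poses, predicted, min_coverage=0.5):
    """Drop poses that miss the predicted interface, then rank the rest by score."""
    kept = [p for p in poses if interface_coverage(p, predicted) >= min_coverage]
    return sorted(kept, key=lambda p: p.score)


# Toy example with made-up residue numbers and scores.
predicted_interface = {10, 11, 12, 13}
poses = [
    Pose("A", -5.0, frozenset({10, 11, 12, 40})),
    Pose("B", -9.0, frozenset({80, 81, 82})),      # best score, wrong site
    Pose("C", -7.0, frozenset({11, 12, 13, 14})),
]
ranking = filter_and_rank(poses, predicted_interface)
print([p.pose_id for p in ranking])  # ['C', 'A']
```

In this toy case the top-scoring pose B is removed because it does not touch the predicted site, so pose C moves to the top of the filtered ranking. Because interface predictions are not very accurate, a soft threshold such as `min_coverage` may be preferable to requiring a perfect match.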
The programming work will be done in Python, using libraries such as Biopython. The R language will be used for the statistical part of the work. Several available interface prediction methods will be used, and docking software may also be used. The work will start on Monday 31 August and run for a period of 20 weeks.
- May, A., Pool, R., van Dijk, E., Bijlard, J., Abeln, S., Heringa, J., Feenstra, K.A. (2014). Coarse-grained versus atomistic simulations: realistic interaction free energies for real proteins. Bioinformatics, 30(3): 326-334. http://bioinformatics.oxfordjournals.org/content/30/3/326.abstract