Team:IISER-Pune-India/Model

Objectives


  1. To identify and model structures of protein-protein interactions between Plasmodium falciparum infected erythrocytes and Human endothelial cell-surface receptors that play a pivotal role in the pathology of malaria.

  2. In silico design and development of a library of peptide drugs against Plasmodium falciparum using Cyclotide kalata B1 as the drug scaffold.

  3. To generate Cyclotide kalata B1 grafted with the peptide of interest and assess its stability.


Our Approach:

1) Preprocessing


Results of this step is given in the Model Results subpage.


The first and most crucial step in developing any inhibitory drug is to choose the interaction which the inhibitor will target. We started with a thorough literature review to document important protein-protein interactions between P. falciparum-infected RBCs and Human endothelial cells and their receptors that are responsible for the pathology of the disease.We started our search from the PlasmoDB Database [1]. As we needed an exhaustive list of interactions, along with their crystal structures, we performed an advanced search on RCSB [2] for all Protein-protein interactions between Plasmodium falciparum and Homo sapiens. This gave us a list of 444 PDB IDs. We wrote a program for a static-web parser that dynamically parsed the RCSB Database to yield all the required information on all the shortlisted PDB IDs.

We worked with this database of 444 interactions and narrowed our interests to choose only those interactions which contained the wildtype Human protein and P. falciparum-infected erythrocytes proteins (We did not consider the interactions with artificial antibodies because we were working only on naturally found protein protein interactins). We started our dry lab work with the following five interactions:


Interaction ID PDB ID Description Type of Interaction Experimental Method
1 5MZA The DBL-beta domain of PF11_0521 PfEMP1 bound to human ICAM-1 Cell Adhesion X-Ray Diffraction
2 4U0Q Plasmodium falciparum reticulocyte-binding protein homologue 5 (PfRH5) bound to basigin Signaling Protein X-Ray Diffraction
3 5LGD The CIDRa domain from MCvar1 PfEMP1 bound to CD36 Cell Adhesion X-Ray Diffraction
4 4V3D The CIDRa domain from HB3var03 PfEMP1 bound to endothelial protein C receptor Signaling Protein X-Ray Diffraction
5 6S8U PfEMP1 IT4var13 DBLbeta domain bound to ICAM-1 Cell Adhesion X-Ray Diffraction

The above 5 protein crystal structures obtained from RCSB were incomplete (with missing residues and regions). We attempted to computationally model the missing regions and build models using homology modeling (with SWISS-MODEL[3]).

How we designed Peptide Inhibitors

The strategy we used to inhibit the formation of the Host-Parasite protein complex was to design a peptide that would competitively bind to the Parasite protein, preventing its binding to the host (Human) protein. Therefore, we identified motifs on the Human protein that remain closest to and bind strongly to the parasite protein. We isolated these motifs along with the parasite protein into different structures and considered each one of them as a potential Inhibitory Model. By this method, we generated multiple models each consisting of the Parasite protein and a peptide sequence (6-20 amino acids in length) from the Human protein.


2) Computational Saturated Mutagenesis and Scoring


Results of this step is given in the Model Results subpage.


Computational Saturated Mutagenesis


Saturation mutagenesis is a protein-engineering technique in which single/multiple codons is substituted with all possible amino acids at a particular position. This is used widely to improve upon certain properties of proteins such as catalytic activity, enantioselectivity, thermostability, and binding affinity.[4]

Here we performed Saturation Mutagenesis on the peptide sidechain of each model to obtain mutated sequences that show better binding affinity than the original peptide. We wrote a custom script in Python 2.7 that acts as a wrapper function, using the inbuilt functions of UCSF Chimera[5] to mutate a specific residue to a different side-chain conformation (rotamer) or into a different amino acid for each position of the original sequence. This was repeated for all the Inhibitory Models.

We used the Dunbrack 2010 smooth backbone-dependent rotamer library and for chain-terminal residues, the Dunbrack 2002 backbone-independent version was used.[6] The backbone-dependent rotamer library consists of rotamer frequencies, mean dihedral angles, variances as a function of the backbone dihedral angles and the rotamer probabilities, estimated using kernel density estimation. Bond lengths and angles were taken from the Amber parameter files all*94.lib and hydrogens were not included. Only the sidechain atoms of a rotamer were evaluated.

Rotamers were chosen automatically based on the lowest clash score, the most number of Hydrogen-bonds, best fit to an electron density map, and/or highest probability according to the library. For clashes and Hydrogen-bond detection, interactions with other rotamers in the same set and the current residue at that position were disregarded, but all other atoms in the vicinity were included.

Identifying Clashes and Hydrogen bonds


Clashes, which is one of the selection criteria, are those unfavourable interactions that occur when atoms come too close together i.e. are in close contact with each other. The clash parameters such as ‘overlap’ control what can to be considered as a clash. The Van der Waals (VDW) overlap between two atoms i and j is defined as the sum of their VDW radii minus the distance between them and minus an allowance for potentially hydrogen-bonded pairs:

Overlapij = rVDW-i + rVDW-j - dij - allowanceij

A cutoff of 0.6 Å was set as the minimum VDW overlap that would count as a clash. A larger positive cutoff restricts the results to more severe clashes. When the VDW overlap was calculated, 0.4 Å was subtracted for atom pairs consisting of possible hydrogen bond donors (or its hydrogen) and possible acceptor as an allowance. An allowance > 0 reflects the observation that atoms sharing a hydrogen bond can come closer to each other than would be expected from their VDW radii. The clash score is calculated as a simple count of the number of clashes.

To identify the Hydrogen-bonds, a set of geometric criteria based on a survey of small-molecule crystal structures, as described in[7] was used with a tolerance of 0.4 Å on the distance and 20.0 degrees on the angular range criteria for each category of Hydrogen-bond. There is an upper bound on distance and one or more angular range criteria for each category of the Hydrogen-bond. It is generally useful to add some relaxation to this criteria because most structures are not as high-resolution as those used in the survey.


Scoring of the mutants


To compare the binding energies of these mutant peptides to the parasite protein, we scored the models obtained after saturation mutagenesis using FoldX, an empirical force field.[8] FoldX is a force field developed for the rapid evaluation of stability, folding, and dynamics of proteins to assess the effect of mutations. The different energy terms taken into account in FoldX have been weighted using empirical data from protein engineering experiments, and the predictive power has been tested on a very large set of protein mutants, covering most of the structural environments found in proteins.[9]

We worked with FoldX because it had been validated on protein-peptide complexes, to estimate the change of the protein–peptide-binding energy with respect to the energy of an experimentally resolved complex and also because the use of FoldX in protein design has been extensively reported.[10]


Repairing and Standardizing Models before scoring


  • The 'RepairPDB' function within FoldX was used to perform a quick optimization in native structures[10] before scoring the models. 'RepairPDB' identifies those residues which have bad torsion angles, or Van der Waals' clashes, or total energy, and repairs them. First, it looks for all Asn (Asparagine), Gln (Glutamine) and His (Histidine) residues and flips them by 180 degrees.

  • This is done to prevent incorrect rotamer assignment in the structure due to the fact that the electron density of Asn and Gln carboxamide groups is almost symmetrical and the correct placement can only be discerned by calculating the interactions with the surrounding atoms. The same applies to His.

  • A small optimization of the side chains is done to eliminate small Van der Waals’ clashes. This way moving side chains in the final step is prevented. The residues that have very bad energies are identified and mutated and their neighbours to themselves to explore the different rotamer combinations to find the new energy minima.[8] We used the default values for the various parameters in the command.

Assessing Energy Values


The interaction between two molecules is driven by the free energy of binding, (ΔGbinding), that is directly related to the thermodynamic dissociation constant (Kd) by the following equation:

ΔGbinding = -RT ln(Kd)

where R is the gas constant (1.9859 cal mol-1 K-1) and T is the temperature in Kelvin.[9]

In order to calculate the free energy of binding of a complex AB, FoldX ‘AnalyseComplex’ command was used. It computes the Gibbs energies of the complex (ΔGAB) and of the two molecules A and B alone. It operates by unfolding the selected targets and determining the stability of the remaining molecules and then subtracting the sum of the individual energies from the global energy.[8] The free energy of folding (the stability) of the protein is broken down into the energy components used by FoldX such as the contribution of backbone Hydrogen-bonds, the contribution of sidechain-sidechain and sidechain-backbone Hydrogen-bonds, Electrostatic interactions, Penalization for burying polar groups, Contribution of hydrophobic groups, Energy penalization due to Van der Waals’ clashes (inter-residue), Entropy cost of fixing the side chain, etc.

The interaction energy is then given by:

ΔGbinding = ΔGAB - (ΔGA + ΔGB)

The output contains the ΔGbinding for each pair of polypeptide chains in the PDB file plus an additional term that reflects the intrachain Van der Waals clashes of residues forming part of the interface.[9]

After scoring all our mutant models (models obtained after performing saturation mutagenesis on the initial models), we sorted the peptide inhibitors according to their Binding/Interaction Energy. We could identify many peptides that had better scores than the wildtype sequence and hence could bind stronger to the Parasite protein. We also created and scored Hybrid peptides for many mutants by selecting the residues with the least Binding Energy at each position and stitched them together to form a peptide of the same length.


3) Molecular Dynamics Simulations


Results of this step is given in the Model Results subpage.


Why Molecular Dynamic Simulations are necessary


Simulations can provide the ultimate detail concerning individual particle motions as a function of time and can be used to address specific questions about the properties of a model system. Molecular dynamics can be defined as a computer simulation technique that permits the prediction of the time evolution of a particular interacting system involving the generation of atomic trajectories of a system using numerical integration of Newton’s laws of motion for a specific interatomic potential defined along with appropriate initial and boundary conditions.[11] Molecular dynamics simulations are ubiquitous for understanding the physical basis of the structure and function of biological macromolecules.

We performed MD Simulations to understand how well the designed inhibitory peptide interacts with the Parasite protein. If the interaction is strong, then it may act as an efficient peptide inhibitor and can slow down the pathogenesis of Plasmodium falciparum malaria. We also plan to graft the inhibitor into a synthetic kalataB1 cyclotide and check its stability under physiological conditions.


GROMACS for performing MD Simulations


For performing the MD Simulations, we have used the GROMACS 2019.1 package.[12] We chose the AMBER99SB-ILDN force field as it had a better accord with experimental NMR relaxation data of test protein systems[13] and also was used for the MD Simulation of a similar study of peptide-peptide interaction.[14] This force field, based on the ff94 force field is commonly associated with the AMBER simulation package is one of the most widely used parameter sets for biomolecular simulation.[13] All MD Simulations were performed on the PARAM BRAHMA facility under the National Supercomputing Mission, Government of India at IISER Pune.

Methods

MD runs were performed in triplicates for three best scoring inhibitor models along with their respective Parasite protein. We generated a topology file using the AMBER99SB-ILDN force fields and solvated it using the SPC/E water model.[15] A water box of the minimum solute-box distance of 1.0 nm was used to solvate each system. Sodium and Chloride ions were added to achieve charge neutrality by replacing the solvent molecules. The Particle mesh Ewald (PME) sum method[16] was used to treat the electrostatic interactions and LINCS algorithm[17] to constrain the hydrogen bond length.

A time step of 2 fs was used for the integration. The whole system was minimized for 50000 steps or till the maximum force was less than 1000.0 kJ/mol/nm. The system was then brought to 300 K in an NVT ensemble simulation of 100 ps using Velocity Rescaling thermostat.[18] The pressure was stabilized in an NPT ensemble for 100 ps using Parrinello-Rahman barostat.[19] The entire system was simulated for 100 ns, where temperature and pressure are regulated by Velocity rescaling thermostat[18] and Parrinello-Rahman barostat,[19] respectively. The snapshots of the structure were taken at every 500 ps starting from zero. The temperature and potential energy of the system were monitored for anomalies.

We visualized the trajectory file to see how the inhibitor interacts with the Parasite protein. RMSD plots were made using trjconv tool in GROMACS to verify the stability of the proteins. Furthermore, the distance between centroids of inhibitor and Parasite protein over time and intermolecular hydrogen bond frequency over the triplicate MD runs were plotted and analyzed using custom scripts.


Analysis of Molecular Dynamics Data

An MD simulation is an in-silico experiment with large volumes of data. We wish to interpret this data to learn more about the interaction between our Inhibitor and the P. falciparum protein. To analyse the MD simulations of a protein and its peptide inhibitor, we need to understand two particular aspects in greater detail:

  1. How does the distance between the centroids of the Protein and peptide evolve over time?
    1. If it increasing, it means that over-time the peptide flies away from the protein and is not binding effectively.
    2. If the distance between the centroid of the two entities remains constant, then over the duration of the simulation, the peptide binds strongly to the protein and can act as a good inhibitor.
    3. If the distance decreases, then the peptide moves closer to the Protein core. (However, this is assumed to be unlikely)

    To perform these calculations, we wrote the script PDB_centroid_analyser

  2. Hydrogen Bond Profile Analysis

    To understand effective binding, we analyse the number of intermolecular Hydrogen bonds between the Protein and the peptide. This entails calculating the hydrogen bond profile of each snapshot of the MD simulation and to determine which residues and specifically which atom in these residues of the peptide inhibitor form hydrogen bonds with the protein. We also aim to determine the relative abundance of Hydrogen bonds formed over the entire simulation period. To perform these calculations we wrote PDB_Hbond_analyser.py


4) Grafting of inhibitors into cyclotides


Results of this step is given in the Model Results subpage.


The ultimate aim of our project is to graft our designed inhibitors into Cyclotide kalata-B1 and hence synthesise a stable peptide inhibitor for various key interactions involved in the pathogenesis of P. falciparum malaria. So we decided to computationally assess the stability of our grafted cyclotide drug. There were many challenges to this, beginning with the homology modelling of cyclic peptides. Most homology modelling packages give us a linear peptide and there was difficulty in finding a template, so we approached Dr. David Norman, College of Life Sciences, University of Dundee, for help. He provided us with a tutorial[20] on the MODELLER[21] package and offered help in case we faced any difficulties.

The first step in modelling a sequence is aligning our sequence to a template. Since structures identical to our grafted cyclotides could not be found, we used the structure of kalata B1 given in the PDB database with PDB ID 4TTN to get a linear homology model using SWISS-MODEL.[3] We used a custom script built on top of MODELLER[21] to align our target sequence to the model obtained from SWISS-MODEL. The alignment was done using the BLOSUM62 scoring scheme. The alignment file along with the PDB file of the model from SWISS-MODEL was used to generate an N to C backbone circularised. MODELLER gives non circularised peptides as output by default, so we used a custom script using a special patch to achieve an end to end circularisation of the backbone.

There are a lot of shortcomings for this method. One major problem is that as the hybrid cyclotide was modelled based only on a native kalata B1 backbone, the 3-dimensional structure of the grafted region may not have modelled properly. This may affect the binding affinity of the inhibitor for the Parasite protein. We performed MD simulations on these hybrid cyclotides using the same parameters we used for our peptide binding studies. As the study is still rudimentary, we just observed the RMSD plots with respect to the backbone to get an idea of its stability


References


[1] Bahl, A. (2003). PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Research, 1, 212–215;
DOI: 10.1093/nar/gkg081

[2] Parasuraman, S. (2012). Protein data bank. Journal of Pharmacology and Pharmacotherapeutics, 4, 351;

DOI: 10.4103/0976-500x.103704

[3] Schwede, T. (2003). SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Research, 13, 3381–3385;
DOI: 10.1093/nar/gkg520

[4] When Second Best Is Good Enough: Another Probabilistic Look at Saturation Mutagenesis. Yuval Nov (2011).
DOI: 10.1128/AEM.06265-11

[5] UCSF Chimera--a visualization system for exploratory research and analysis. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. J Comput Chem. 2004 Oct;25(13):1605-12. PMID: 15264254;
DOI: 10.1002/jcc.20084

[6] A Smoothed Backbone-Dependent Rotamer Library for Proteins Derived from Adaptive Kernel Density Estimates and Regressions Maxim V. Shapovalov, Roland L. Dunbrack Jr. Open archive:
DOI: 10.1016/j.str.2011.03.019

[7] Three-dimensional hydrogen-bond geometry and probability information from a crystal survey. Mills JE, Dean PM. J Comput Aided Mol Des. 1996 Dec;10(6):607-22;
DOI: 10.1007/BF00134183

[8] The FoldX web server: an online force field; Joost Schymkowitz, Jesper Borg, Francois Stricher, Robby Nys, Frederic Rousseau, and Luis Serrano; Nucleic Acids Res. 2005 Jul 1; 33(Web Server issue) Published online 2005 Jun 27.
DOI: 10.1093/nar/gki387

[9] ADAN: a database for prediction of protein-protein interaction of modular domains mediated by linear motifs; J. A. Encinar, G. Fernandez-Ballester, I. E. Sánchez, E. Hurtado-Gomez, F. Stricher, P. Beltrao, L. Serrano; Bioinformatics, September 2009;
DOI: 10.1093/bioinformatics/btp424

[10] BindProfX: Assessing Mutation-Induced Binding Affinity Change by Protein Interface Profiles with Pseudo-Counts, Peng Xiong, Chengxin Zhang, Wei Zheng, Yang Zhang;
DOI: 10.1016/j.jmb.2016.11.022

[11] “Molecular Dynamics - an Overview | ScienceDirect Topics.” ScienceDirect.Com | Science, Health and Medical Journals, Full Text Articles and Books.,
www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/molecular-dynamics Accessed 6 Oct. 2020

[12] Berendsen, H. J. C., et al. “GROMACS: A Message-Passing Parallel Molecular Dynamics Implementation.” Computer Physics Communications, no. 1–3, Elsevier BV, Sept. 1995, pp. 43–56. Crossref,
DOI: 10.1016/0010-4655(95)00042-e

[13] Hornak, Viktor, et al. “Comparison of Multiple Amber Force Fields and Development of Improved Protein Backbone Parameters.” Proteins: Structure, Function, and Bioinformatics, no. 3, Wiley, Nov. 2006, pp. 712–25. Crossref,
DOI: 10.1002/prot.21123

[14] Sen, Neeladri, et al. “Predicting and Designing Therapeutics against the Nipah Virus.” PLOS Neglected Tropical Diseases, edited by Jeanne Salje, no. 12, Public Library of Science (PLoS), Dec. 2019, p. e0007419. Crossref,
DOI: 10.1371/journal.pntd.0007419

[15] Mark, Pekka, and Lennart Nilsson. “Structure and Dynamics of the TIP3P, SPC, and SPC/E Water Models at 298 K.” The Journal of Physical Chemistry A, no. 43, American Chemical Society (ACS), Nov. 2001, pp. 9954–60. Crossref,
DOI: 10.1021/jp003020w

[16] Darden, Tom, et al. “Particle Mesh Ewald: AnN⋅log(N) Method for Ewald Sums in Large Systems.” The Journal of Chemical Physics, no. 12, AIP Publishing, June 1993, pp. 10089–92. Crossref,
DOI: 10.1063/1.464397

[17] Hess, Berk, et al. “LINCS: A Linear Constraint Solver for Molecular Simulations.” Journal of Computational Chemistry, no. 12, Wiley, Sept. 1997, pp. 1463–72. Crossref,
DOI: 10.1002/(sici)1096-987x(199709)18:12 <1463::aid-jcc4>3.0.co;2-h

[18] Bussi, Giovanni, et al. “Canonical Sampling through Velocity Rescaling.” The Journal of Chemical Physics, no. 1, AIP Publishing, Jan. 2007, p. 014101. Crossref,
DOI: 10.1063/1.2408420

[19] Parrinello, M., and A. Rahman. “Polymorphic Transitions in Single Crystals: A New Molecular Dynamics Method.” Journal of Applied Physics, no. 12, AIP Publishing, Dec. 1981, pp. 7182–90. Crossref,
DOI: 10.1063/1.328693

[20] Cyclic protein - Modeller Wiki. (n.d.). Andrej Sali Lab. Retrieved October 20, 2020, from
salilab.org/modeller/wiki/Cyclic protein

[21] Webb, B., & Sali, A. (2014). Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics, 1, 5.6.1-5.6.32.
DOI: 10.1002/0471250953.bi0506s47