Team:Nanjing-China/Model

Experiments

picture display fail Model
Since the objective of our experiment is to find the best condition for the occurrence of a strong interaction between polyP and proteins, the modeling team set out to simulate the binding between them.
Preparation
The structure of target proteins
Despite the fact that Green Fluorescent Protein (GFP) and spidroins are both proteins that have been studied extensively, Protein Data Bank could not offer us the exact 3D structure of our target proteins because GFP has been mutated in order to bind more efficiently with polyP and no 3D structures of spidroin in full length have been revealed yet. To obtain the accurate structures of our proteins, we adopted homology modeling for +36GFP and hybrid modeling, which is a mixture of homology modeling and ab initio modeling, for spidroins.
Using Swiss model and Robetta, we successfully obtained the tertiary structure of +36GFP, spidroin(sp), spidroin1(a variant of native spidroin, sp1) and spidroin2(a different variant of native spidroin, sp2) (Fig 2). The details of the sequence information for each protein can be found in the experiment part.
Hypothesis
Since the question we want to answer is the exact binding behaviors between proteins and polyP, we dedicate to answering it in 2 ways. First, we seek to find the ratio for optimal binding between polyP and target proteins which is especially useful for the experiment team to conduct their experiments. Second, we want to find the interacting patterns between these two. Our initial hypothesis for the binding pattern between polyP and target proteins is that for one single polyP, multiple proteins could bind to it at different sites.
To test this hypothesis, we first simplify the model by assuming that the effect of the binding sites on polyP60 would be equivalent to that of a shorter polyP with the same length, so the docking results between polyPs in different lengths and target proteins should kill two birds with one stone: On one hand, docking is a rather precise way to tell us the interacting patterns. On the other hand, since the polyP used by the experiment team is of 45 units long, if we successfully find a polyP with a specific length N that bind to polyP most potently, the final ratio of target proteins and PolyP should be 45/N. A graphic representation for the hypothesis above is shown in Fig 1.
Before we step any further, the above hypotheses provide us with several testable points:
(1) when multiple proteins bind to a single polyP, interactions between proteins should not be discernable enough to influence the interacting behavior of single protein.
(2) PolyP should adopt a nearly linear conformation for the efficient binding of multiple proteins since linear conformation can reduce the influence of non-binding sites on binding significantly because circular conformation is more likely to put non-binding sites of polyP in space adjacent to the binding area.
Part I      docking  and  analysis
We first begin our modeling by docking the polyPs in various length (3, 5, 10, 20 units) to the target proteins, during which we wish to see some patterns of binding and use the predicted binding energy to tell the difference of binding abilities between different proteins and find the optimal length N for binding.
Using Autodock4, we carried out the semi-flexible docking, i.e. the conformation of the ligand can be changed while the conformation of receptor protein remains rigid.
The basic principle of Autodock can be divided into 2 parts. First, the Autogrid will generate a gridbox that for each grid in the box, the program will put a probe atom and then calculate several kind of interactions. Then the Autodock4 will do conformational search and evaluate the energy for every single conformation consequently, after which we will get the final conformations ranked by energy.
We process the docking procedure according to the tutorial:
(1) The first step is to pre-process the PDB file of proteins and ligand by adding hydrogens, computing charges and assigning AD4 type for each atom. For ligand, we need to choose roots and torsion trees for it in extra so we can allow the conformation of the ligand to change during the docking process.
(2) The second step is to set the map types for searching by choosing the ligand, in which way we can take all the atom types of the ligand as well as electrostatics and desolvation into consideration. After that we create a searching space called gridbox which contains all the atoms of the receptor protein, completing the process of preparing GPF file.
(3) The third step is to let the Autogrid executable file process the GPF file.
(4) The forth step is to create a DPF file for Autodock executable file, and during this process we get to choose the appropriate search algorithm.
(5) After processed by Autodock executable file, we can receive a DLG file which presents us the conformations of ligand docking into the receptor protein.
The whole procedures of docking are in great consistent with the tutorial with a few significant differences: In an effort to achieve more accurate docking results, for polyP within 20 units, we first adopted simulated annealing algorithm in a single docking process and let it run 100 times. Then in an effort to optimize our results, we seek to narrow down the searching space and use a different algorithm: genetic algorithm. To achieve that goal, we select the top 10 docking results of the previous process, take a close look at the interacting amino acids of the protein and manually divide the docking results into groups based on the distance of interacting amino acids. For example, picture 3B and 3C show adjacent binding areas, therefore we put these 2 docking results into 1 group. picture 3D and 3E show a distinct yet adjacent binding areas, so we put these two into another group. We then use the gridbox that contains all the interacting residues from 1 group as the search area (Fig 3A right), which is much smaller than before (Fig 3A left).
Our docking results (Fig 4-6) revealed several interesting points
(1)Analysis of the interacting residues
The first observation that we make is that for one protein, there exists multiple docking sites (data not shown). Many docking sites consist of lysine and arginine, which possess positive charges under appropriate pH conditions and can form strong attractive charge interactions (or electrostatic interactions) with polyP. However, we also observe that for some positive charged amino acids (Fig 5H), they merely form hydrogen bonds with polyP. We therefore speculate that since the semi-flexible docking prohibits the conformation change of proteins, it is possible that these amino acids are not in their “optimal conformations” and therefore do not form the electrostatic interactions. Alternatively, it is also of great possibility that these reflect certain positive charged amino acids simply do not form electrostatic interactions in an effort to gain a more stable conformation in the overall structure, which reminds us, as well as the experiment team, that later when trying to mutate the proteins, we should look into the whole structure of the protein more rather than the single effect of one residue. Due to the fact that the experiment team can provide us with only limited information about the formation of interactions, we could not rule out any of the possibilities.
(2)His tag, a multi-functional site?
Interestingly, the His tag was intentionally inserted into the proteins to ease the purification process, some of our docking results (Fig 5B, 5D, 5F, 6D), however, indicate that it is a promising docking site as well. Since Histidine is a basic amino acid just like arginine and lysine and His tag adopt a rather flexible conformation (Fig 5B), it is possible that under appropriate pH conditions His tag can undertake the job of binding to polyP. However, experiments must be carried out to provide a more concrete proof to such hypothesis.
(3)Binding energy prediction indicates that the mutation of sp is a success.
Theoretically, the binding energy employed by Autodock4 is predicted by a semi empirical free energy force field which takes dispersion/repulsion, hydrogen bonding, electrostatic interactions and desolvation into account, and such predictions are separated into 2 steps: First the energy differences between the bounded forms and unbounded forms of ligand/protein will be calculated. Second the energy of binding between ligand and protein will be calculated. The combination of 2 steps will give us a relatively accurate prediction of binding energy.
Interestingly, this force field is designed to estimate the binding in a water environment, which satisfies our need perfectly (The experiment was carried out in a water environment).
The binding energy predicted by Autodock4(Fig 7 and table 1) shows that while the binding energy between sp and polyP remains high (indicating poor interactions), sp2 and +36GFP show lower binding energy (indicating strong interactions). This result suggests that by mutating certain residues of native sp into arginine, sp2 improves the ability of binding to polyP to an extent that is nearly comparable to +36GFP, indicating that the mutation of sp made by the experiment group is a success.
(4)The content of secondary structures provides us with extra information about binding.
The analysis above all focus on the primary and tertiary structure of proteins, however, it is almost impossible for one to neglect that the secondary structure of +36GFP differ a lot to that of sp or sp2. While +36GFP mainly consists of β-sheets, the majority of secondary structures of all sps is α-helix (Fig 8).
While these results seem nothing peculiar to us, they are in consistent with earlier notions that β-sheets are more likely to combine with the polyP. Some reasons are obvious such as β-sheets possess a large and extended surface (The large surface exposes more sites to bind with the proteins better than α-helix). And according to new study there are highly percentage of β-sheets in the complex of polyP and proteins. Considering that many polyP binding partners are mainly α-helix in their natural conformations, the results show that polyP binding either induces secondary structural changes or stabilizes the β-sheet structure. There is no obvious interaction between polyPs and natural proteins so we prefer the second theory (for the reason why to bind with β-sheet better haven’t been worked out yet). But due to the highly negative charge of polyP we may be hard to tell whether polyP interacts with proteins via their charged side chains or binds to the backbone of β-sheet structures. However, it’s certain that β-sheets are apt to bind with polyP.
The data taken together suggest that it will be helpful for the experiment team to measure the change in the secondary structures of target proteins before and after binding to polyP, due to the fact that the secondary structure of proteins after binding to polyP is difficult to predict.
Part II      Molecular Dynamic Simulations
While insightful observations have been made by docking, it could not give us an overall picture of the whole system of polyP and target proteins in solution. To refine our modeling, we further use the molecular dynamic program GROMACS to simulate the whole process of binding. The core idea of GROMACS is by calculating interactions (force) that acting on every atom, one can obtain trajectories of each atoms during the whole process using Newton’s law of motion. Theoretically, this method can be best at predicting molecules’ behavior.
In this part, we use a protein-polyP ratio derived from our experiment team’s experiment design. We put 3 refined GFP and 43 polyP molecules into a cubic (1.2nm per edge) simulation box, all these molecules were arranged randomly to mimic the initial state of the mixed solution. Both the protein and polyP are configured with the CHARMM36 force field by GROMACS and other tools. Then, the box is filled with water molecule TIP3P until it reaches a reasonable density (~1000 kg/m3). Since the solution should be electroneutral, we further replace 105 water molecules with anions (chloride ion) to balance the positive charge in the system. Also, we perform energy minimization to ensure that the system is configured without steric clashes or inappropriate geometry. Since we place solvents (TIP3P water) and ions around polyP and protein randomly, it is important to let them move to the clefts that water or other small molecules should have been placed. This process includes two steps, NVT equilibration and NPT equilibration. A graphic description of the NVT equilibration and NPT equilibration can be seen in Fig 10 and Fig 11. The first step also ensures that the solvents (usually optimized with themselves, may not be the temperature we need) be brought to the temperature that our simulation needs, and the NPT equilibration can stabilize the density of the system such that our system won’t collapse. After these configuration and equilibration steps, we perform the production step of our system, with time step of 2 fs and total 750000 steps. After these, we obtain the trajectory file which describes the movements of every atoms during the MD production step.
Results from Fig 9 confirm that polyP can bind to the protein at multiple positions. Additionally, it also shows us that usually polyP adopt a linear conformation which, compared to circular conformation, is more likely to interact with multiple proteins. What’s more, the simulation that contains multiple proteins and ligands shows that the interplay between protein and polyP can be more complicated than the docking results revealed: A single polyP chain can be long enough to bind two proteins at the same time while occupying different binding sites of the protein. These results together provide a direct proof to our initial hypothesis that one polyP could bind to multiple target proteins.
Conclusion & Discussion
In this two parts, we modeling team first come up with a hypothesis that one polyP could bind to multiple target proteins. Then we use the docking method to find the binding patterns between them. The results cannot come to a definite conclusion about the exact ratio of polyP and target proteins for efficient binding since Fig 7 do not contain any useful information about the ratio. However, they (Fig 4-6) do show the interacting residues and types of interactions, which partially fulfill our initial goals. And most importantly, it indicates that the mutation of sp made by the experiment team is a success. After that, the molecular dynamics (Fig 9) give us a direct proof to the initial hypothesis that multiple proteins could bind to a single polyP and further proves our hypothesis by confirming that polyP adopt a linear conformation after binding and two proteins are not close enough to exert any influence on each other.
What we have to realize is that the semi-flexible docking methods we employ cannot compare to the flexible docking methods in terms of accuracy. Therefore, there is space for us to improve the quality of docking results. Similarly, molecular dynamics simulation is a powerful tool in studying protein-ligand interactions, however, both computational capability and size of the system (i.e. the total atom number included in the system) will significantly limit its temporal-spatial resolution. Therefore, from this perspective, we believe that our method can be further improved through a finer tuned system’s configuration and longer simulation time (MD production time), or switching to dissipative particle dynamics (DPD) to obtain a longer simulation time at the cost of ignoring detailed interactions between atoms.
What’s more, the results of molecular dynamics of the whole system of polyP and proteins remind us that while some polyP can be used as a bridge to connect different proteins to form the material that has novel characteristics, there exist certain amount of polyP that do not interact with proteins directly. Possible explanation is that these polyPs simply serve to stabilize the whole system at the sacrifice of their own, whether they form higher order structure or not is not known. In order to improve our modeling results and understand the experiment results, we further speculate that polyPs may form some kind of layer that allows the target proteins to roll freely before they go through marked conformational changes to bind efficiently to the layer of polyPs. These are something we want to research on in the future.
figures

Fig.1 Graphic demonstration of our hypothesis. Our initial hypothesis is that the upper situation is the reality of polyP interacting with proteins. To simplify our hypothesis and better suit the docking process, we speculate that the upper situation is nearly equal to the lower situation. Through this simplification, we get to work on a ‘1 vs 1’ scenario rather than a ‘multiple vs 1’ scenario. After we determine the length of binding sites by docking, by simply calculating how many binding sites can one single polyP possess, we should get the optimal ration.




Fig.2 Tertiary structures of target proteins. Using Swiss-model, we obtained the structure of +36GFP(A). Additionally, by using Robetta, we obtained the structure of sp (B), sp1 (C) and sp2 (D).
Fig.3 Screenshots from Autodock tools representing our ‘narrowing down’ strategy. The searching areas differ a lot in sizes and locations before (A left) and after (A right) the first docking process. While picture B and C show adjacent binding areas, picture D and E show a distinct yet adjacent binding areas. The interacting residues detected by the first docking process are shown in atom spheres. Note that all the pictures shown above are merely for demonstration and therefore are not the real docking results.
Fig.4 Docking results for +36GFP: While pictures on the left show the most possible binding sites for polyP3(A), polyP5(C), polyP10 (E)and polyP20(G), pictures on the right give us a closer look at the binding sites and amino acids that are responsible for interacting with polyP3(B), polyP5(D), polyP10(F) and polyP20(H). The dotted lines in green are conventional hydrogen bonds, the dotted lines in orange are attractive charge interactions and the dotted lines in gray are carbon hydrogen bond interactions (all of the interactions above are predicted rather than experimental proved).


Fig.5 Docking results for sp: While pictures on the left show the most possible binding sites for polyP3(A), polyP5(C), polyP10 (E)and polyP20(G), pictures on the right give us a closer look at the binding sites and amino acids that are responsible for interacting with polyP3(B), polyP5(D), polyP10(F) and polyP20(H). The dotted lines in green are conventional hydrogen bonds, the dotted lines in orange are attractive charge interactions and the dotted lines in gray are carbon hydrogen bond interactions (all of the interactions above are predicted rather than experimental proved).


Fig.6 Docking results for sp2: While pictures on the left show the most possible binding sites for polyP3(A), polyP5(C), polyP10 (E)and polyP20(G), pictures on the right give us a closer look at the binding sites and amino acids that are responsible for interacting with polyP3(B), polyP5(D), polyP10(F) and polyP20(H). The dotted lines in green are conventional hydrogen bonds, the dotted lines in orange are attractive charge interactions and the dotted lines in gray are carbon hydrogen bond interactions (all of the interactions above are predicted rather than experimental proved).


Table 1Predicted binding energy for every docking simulation shown above. The data are collected from Autodock4.


Fig.7 Predicted binding energy for every docking simulation shown above. The blue, orange and gray dots represented +36GFP, sp and sp2, respectively. The data are collected from Autodock4 and shown in table 1.


(A)
(B)
(C)
(D)
Fig.8 Secondary structures of +36GFP, sp and sp2 predicted by Psipred. Detailed information about the secondary structures of +36GFP, sp and sp2 is shown in A, B, C, respectively. A comparison which only takes the two most classic secondary structures: α-helix and β-strands into account is made in D.


  (A)   (B)   (C)
Fig.9 +36GFPs bind to a single polyP60. This is extracted from 1 state of 152 state during molecular dynamics stimulations, while we have found many other states possess such “2 vs 1” scenario, this is especially typical. While it is obvious that the polyP60 confer a nearly linear conformation under such condition(A), we find that the interacting amino acids for each +36GFP differ a lot (B, C).


Fig.10 System's temperature during NVT equilibration: This step ensures that both the solvent and solute are brought to the temperature we desire. In the modelling step, we chose room temperature (~300K) as the system’s initial state. This plot indicates that the system was brought to 300K successfully with reasonable fluctuations. The temperature is derived from the average kinetic energy of the molecules.


Fig.11 System's density during NPT equilibration: During the NPT equilibration we set number of particles, pressure and temperature to constant. And let the density of the system adjust to a proper value. This step is also known as isothermal-isobaric process and most closely resembles experimental conditions.