Team:Tuebingen/Model

PacMn

Model

Phytochelatin

Phytochelatins are a group of polypeptides which have gained a lot of momentum in the field of synthetic biology and metabolic engineering as they hold the potential to be employed as detoxification modules (for more information see: Design).
As phytochelatins are an important compartiment of our project, we wanted to understand its ion capturing capabilities on a deeper level. Thus our aim was to characterize the polypeptide and to explore possibilities to improve its manganese ion chelation.
As a result, we generated two altered sequences. In the original sequence (1), we have a lot of glutamic acid - cysteine pairs. We assumed that the glutamic acid pulls ions through electronegativity and that the thiol groups of the cysteine then bind the heavy metal atoms. One way in which we tried to improve this binding was through exchanging the cysteines with histidines, as we assumed that the amino acid would bind to metal ions via its imidazole ring. In the second altered sequence, we replaced cysteine with aspartic acid, since its electronegative properties (in addition to Glutamic acid) could bind heavy metal attraction.
In the following, those three generated sequences will be referred to as:
  • Origin, for the Cysteine including sequence.
  • His, for the Histidin including sequence.
  • Asp, for the Aspartic acid including sequence.
For all sequences, we needed to determine a three dimensional structure and test its stability. After this we planned to analyze them further by for example determining and comparing the binding energies via Umbrella sampling. That is a technique to study protein binding-unbinding processes. As well it is possible to estimate the binding free energy between those states.
We built models as specified in the following:
Figure 1: Workflow for the bioinformatic analysis of phytochelatin and its variants.

Figure 1: Workflow for the bioinformatic analysis of phytochelatin and its variants.

Protein structure prediction

To model the three dimensional structures of our versions of the phytochelatin from their sequences, we evaluated several web tools listed here:
  • I-TASSER - https://zhanglab.ccmb.med.umich.edu/I-TASSER
  • Phyre2 - http://www.sbg.bio.ic.ac.uk/phyre2
  • CBS - http://www.cbs.dtu.dk/
  • Orion - https://www.dsimb.inserm.fr
Those tools all employ different methods based on either homology modelling, threading or ab initio, to predict the structures based on the sequence.
Homology modelling methods can be used if there exist (enough) homolog proteins With similar sequence and known structure, as they compare the query peptide sequence against a database. The structures of close matches can be used as templates. Threading is often used to find template structures in a parallelized and fast way.
If there only exists little or no knowledge about homologous proteins, ab-initio modelling might be necessary. In this case, a free energy minimization is performed while iterating over different possible structures for smaller parts of the sequence.
In our case, there exist phytochelatins with known structures and similar sequences, but with much less repeats than observed in our phytochelatin. After extensive research, we decided to work with a hierarchical threading method “I-TASSER”, as it aligns fragments of the query sequence with fragments of a database. Furthermore, the structures obtained with “I-TASSER” scored highest in terms of confidence scores in comparison to all other tools we tested.
Briefly, I-TASSER uses a procedure using four different steps: (8)
  1. Template identification by LOMETS using Threading (3)
  2. Fragment structure reassembly by replica-exchange Monte Carlo simulations (4) using ab initio and the threading results from step 1
  3. Atomic level structure refinement by REMO (5) and FG-MD (6) using molecular dynamic simulations, H-bond network optimization
  4. Structure-based function interpretation by COFACTOR (7) by evaluating similar template proteins about their functions and structures
I-TASSER generates a confidence score to measure the quality of a model-result. Thus, for all of our structures we chose the model with the highest confidence score (C-Score). Models with a C-Score of -1,5 are expected to have a correct fold. Our models have the following C-Scores:
  • Origin: -1,28 (Figure 2)
  • His: -1,12 (Figure 3)
  • Asp: -0,75 (Figure 4)

As these are acceptable confidence scores, we used the Origin, His and Asp models for our next step.
Figure 2

Figure 2: I-TASSER result for the Origin sequence.

Figure 3

Figure 3: I-TASSER result for the His sequence.

Figure 4

Figure 4: I-TASSER result for the Asp sequence.

Molecular dynamics simulation

Next, we needed to test whether the predicted structures are plausible in a Molecular Dynamics (MD) simulation. The method simulates the movement of a set of atoms over time by calculating and analyzing forces between them. The structure being stable for a defined period of time is a good indicator for its stability and thus plausibility in vivo.
To achieve this, we used GROMACS 2018.6 a MD package capable of simulating Molecular Mechanical (MM) and Quantum Mechanical (QM) forces applied to all the particles of a molecule. (9) Using MD simulations it can be assumed whether a molecule can withstand the forces of a defined system and as well calculate the way it will change when exposed to a specific system.
Those systems´ forces are defined via force fields. A force field is a set of parameters for each atom type or group in the simulation that is used to calculate the energy of the system. Since several different force fields for various molecule types exist it is imperative to find the most suitable for the problem at hand.
Although, we can presume that our Origin sequence will be stable, since we got it from an already existing iGEM-Part, we commenced our MD simulation with the Origin structure as well in order to prove this presumption and to receive reference values for the other sequences. Thus, we started by following the “Lysozyme in Water Tutorial” (10) with the exceptions mentioned here:
In our first approach we used the AMBER99-SB Force Field, because AMBER is pretty well known and was recommended to us by Dr. Thiel . Furthermore, we found articles on how to extend the AMBER99-SB Force Field by values of heavy metals (in focus: Manganese). (12) Generally, it is challenging to come across force fields with these properties already built in so we decided that the aforementioned approach is presumably the most promising. Accordingly, we employed the following parameters provided as .mdp files:
integrator = steep
emtol = 1000.0
emstep = 0.01
nsteps = 50000
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = cutoff
rcoulomb = 1.4
rvdw = 1.4
pbc = xyz
define = -DPOSRES
integrator = md
nsteps = 50000
dt = 0.002
nstxout = 500
nstvout = 500
nstenergy = 500
nstlog = 500
continuation = no
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.4
rvdw = 1.4
DispCorr = EnerPres
coulombtype = PME
pme_order = 4
fourierspacing = 0.16
pcoupl = no
pbc = xyz
gen_vel = yes
gen_temp = 300
gen_seed = -1
tcoupl = V-rescale
tc-grps = Protein Non-Protein
ref_t = 300 300
tau_t = 0.1 0.1
define = -DPOSRES
integrator = md
nsteps = 50000
dt = 0.002
nstxout = 500
nstvout = 500
nstenergy = 500
nstlog = 500
continuation = yes
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.4
rvdw = 1.4
DispCorr = EnerPres
coulombtype = PME
pme_order = 4
fourierspacing = 0.16 tcoupl = V-rescale
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 300 300
pcoupl = Parrinello-Rahman
pcoupltype = isotropic
tau_p = 2.0
ref_p = 1.0
compressibility = 4.5e-5
refcoord_scaling = com
pbc = xyz
gen_vel = no
integrator = md
nsteps = 25000000
dt = 0.002
nstxout = 0
nstvout = 0
nstfout = 0
nstenergy = 5000
nstlog = 5000
nstxout-compressed = 5000
compressed-x-grps = System
continuation = yes
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.4
rvdw = 1.4
coulombtype = PME
pme_order = 4
fourierspacing = 0.16
tcoupl = V-rescale
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 300 300
pcoupl = Parrinello-Rahman
pcoupltype = isotropic
tau_p = 2.0
ref_p = 1.0
compressibility = 4.5e-5
pbc = xyz
DispCorr = EnerPres
gen_vel = no
integrator = steep
emtol = 1000.0
emstep = 0.01
nsteps = 50000
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = PME
rcoulomb = 1.4
rvdw = 1.4
pbc = xyz


In brief, the parameters mostly differ by the following aspects:
  • The most important thing here was to set the coulomb for electrostatic, and van der Waals for intramolecular forces, effects to 1.4 (instead of 1.0). (13)
  • The simulation time was risen up to 50ns (instead of 1ns) for a more reliable test of stability

Figure 5: 50ns Videos of our Phytochelatin-sequences.
As a result, the Origin and our His variant molecules sustained in the AMBER Force Field for 50 nm, whilst our Asp prototype fell apart after ~48ns. This indicates that the Origin and His variant may be stable in vivo, an important indication for a putative application of the His variant in the wet lab.
Concerning the Asp variant, no force field is proven to model 100% of real conditions and a stability for 48 ns are, compared to publications which require 20 ns, quite strong. Thus, we decided to place the Asp variant within the OPLS-AA/L force field. The OPLS-AA/L force field is an all atom force field which was used in the referred tutorial as well. (11) In this force field, the Asp variant remained very stable, making it a second promising candidate for our wet lab work.
Of note, the His modification showed quite a lot of gyration in comparison to the other two variants, which could eventually lead to the protein being torn apart on the long run.

Binding Site Prediction:

In addition to our MD simulation, we used another tool to elucidate the probable binding sites of metal ions (Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+) for Origin, His and Asp using IonCom. IonCom is a webtool from zhanglab to predict possible binding sites to metals. It combines methods like ab initio training with template-based transferals (2).
Contrary to our suspicion of additional sulfide-bonds from the thiol-groups of the Cysteine in the Origin sequence, we found Glutamic Acid (E15 & E19) to bind to Mn2+. Additionally, the Origin sequence does not have active sites for Fe2+, Fe3+, Na+, K+)). In contrast, our His and Asp are both binding to all 7 different metal types and therefore appear to be more versatile than the Origin sequence.
To conclude, we suggest testing and using both of our phytochelatin variants additionally to the original sequence in the wet lab, as they appear to be stable and more versatile in terms of ion binding.
Figure 6

Figure 6: IonCom result for Origin binding to Mn

IonCom Results
Sequence Zn Cu Fe2+ Ca Mn Na K Fe3+ Mg
Origin E1 C2 E3 C4 E5 C6 E7 C8 E9 C10 E11 C12 E13 C14 E15 C16 E17 C18 E19 C20 E21 C22 E23 C24 E25 C26 E27 C28 E29 C30 E31 C32 E33 C34 E35 C36 E37 C38 E39 C40 C2 C18 C22 C26 C28 C32 C38 E39 E13 E15 E17 E19 E15 E19 E35 E37
Asp E1 D16 E19 D22 E23 D24 E25 D26 E27 D28 E29 D30 E31 D32 D34 E35 D36 D38 E39 D18 E19 D22 E23 D26 D28 E29 D32 E33 D36 E39 D36 E1 D4 D6 E7 D8 D10 E13 D14 E15 D16 D18 E19 E21 D22 E23 D24 E25 D26 E27 D28 E29 D30 E31 D32 D34 E35 D36 E37 D38 E39 D40 E19 D20 E21 D22 E23 D24 E25 D26 E27 D28 E29 D30 E31 D32 E33 D34 E35 E37 D38 E39 E25 D26 D28 E29 D30 E31 D32 D34 E35 D36 E39 E19 E23 D24 E25 D26 E27 D28 E29 E31 D32 E35 E37 D38 E39
His E1 H2 E3 H4 E5 H6 E7 H8 E9 H10 E11 H12 H14 E15 H16 E17 H18 E19 H20 E21 H22 E23 H24 E25 H26 E27 H28 E29 H30 E31 H32 E33 H34 E35 H36 E37 H38 E39 H40 H2 H4 H6 H12 H22 H26 H18 E19 H22 H24 H26 H28 E29 H30 H32 E33 H34 E35 H36 H38 E39 E15 H16 E17 H18 E19 E21 H22 E23 H24 H26 H28 E29 H30 E31 H32 E33 H34 E35 H38 E39 H12 E15 H16 E17 H18 E19 H20 E21 H22 E23 H24 E25 H26 E27 H28 E29 H30 E31 H32 E33 H34 H36 E37 E39 H24 H28 H30 H32 E33 H12 E15 H16 E17 H18 E19 H20 E21 H22 E23 H24 E25 H26 E27 H28 E29 H30 E31 H32 E33 H34 E35 H36 E37 H38 E39 H2 H4 H6 H12 H22 H26 H4 H6
Final assessment
Despite not being able to execute all the planned tasks to evaluate phytochelatin alternatives and comparing them to the origin sequence, we can conclude that both of our variants are acceptable, stable candidates, since they sustained for 50 ns in the forcefield. Asp, although torn apart in the AMBER99-SB force field, performed extremely stable in the OPLS force field. Both phytochelatin variants have the potential to improve the binding of phytochelatin to metals, but considering the higher radius of gyration, we suggest to focus on the Asp variant.
As an outlook, the performance of Umbrella sampling to estimate the free energy between bound and unbound structures would present us with a putatively higher variant amongst our alternatives. Furthermore, considering quantum mechanical aspects in our force field would potentially increase the yield of our simulation, as recommended by our collaboration partner from Waterloo.

References

  1. PubChem [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2004-. PubChem Compound Summary for CID 20756463, Phytochelatin; [cited 2020 Oct. 16]
  2. Xiuzhen Hu, Qiwen Dong, Jianyi Yang, Yang Zhang. Recognizing metal and acid radical ion binding sites by integrating ab initio modeling with template-based transferals. Bioinformatics 32, no. 21 (2016): 3260-3269
  3. Wu, S., Zhang, Y. LOMETS: a local meta-threading-server for protein structure prediction. Nucleic Acids Res. 35, 3375-3382 (2007).
  4. Zhang, Y., Kihara, D., Skolnick, J. Local energy landscape flattening: parallel hyperbolic Monte Carlo sampling of protein folding. Proteins. 48, 192-201 (2002).
  5. Li, Y., Zhang, Y. REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks. Proteins. 76, 665-676 (2009).
  6. Zhang, J., Zhang, Y. High-resolution protein structure refinement using fragment guided molecular dynamics simulations. (2011).
  7. Roy, A., Zhang, Y. COFACTOR: protein-ligand binding site predictions by global structure similarity match and local geometry refinement. Forthcoming (2011).
  8. Roy, A., Xu, D., Poisson, J., Zhang, Y. A Protocol for Computer-Based Protein Structure and Function Prediction. J. Vis. Exp. (57), e3259, doi:10.3791/3259 (2011).
  9. M.J. Abraham, D. van der Spoel, E. Lindahl, B. Hess, and the GROMACS development team, GROMACS User Manual version 2019,
  10. GROMACS Tutorial - Lysozyme in Water - Justin A. Lemkul, Ph.D. - Virginia Tech Department of Biochemistry
  11. J. Am. Chem. Soc. 1988, 110, 6, 1657–1666. Publication Date:March 1, 1988 https://doi.org/10.1021/ja00214a001 Copyright © 1988 American Chemical Society
  12. AMBER parameter database
  13. [AMBER] AMBER 99SB force field used by GROMACS