PacMn

Model

Phytochelatin

Phytochelatins are a group of polypeptides which have gained a lot of momentum in the field of synthetic biology and metabolic engineering as they hold the potential to be employed as detoxification modules (for more information see: Design).

As phytochelatins are an important compartiment of our project, we wanted to understand its ion capturing capabilities on a deeper level. Thus our aim was to characterize the polypeptide and to explore possibilities to improve its manganese ion chelation.

As a result, we generated two altered sequences. In the original sequence (1), we have a lot of glutamic acid - cysteine pairs. We assumed that the glutamic acid pulls ions through electronegativity and that the thiol groups of the cysteine then bind the heavy metal atoms. One way in which we tried to improve this binding was through exchanging the cysteines with histidines, as we assumed that the amino acid would bind to metal ions via its imidazole ring. In the second altered sequence, we replaced cysteine with aspartic acid, since its electronegative properties (in addition to Glutamic acid) could bind heavy metal attraction.
In the following, those three generated sequences will be referred to as:

Origin, for the Cysteine including sequence.
His, for the Histidin including sequence.
Asp, for the Aspartic acid including sequence.

For all sequences, we needed to determine a three dimensional structure and test its stability. After this we planned to analyze them further by for example determining and comparing the binding energies via Umbrella sampling. That is a technique to study protein binding-unbinding processes. As well it is possible to estimate the binding free energy between those states.
We built models as specified in the following:

Figure 1: Workflow for the bioinformatic analysis of phytochelatin and its variants.

Protein structure prediction

To model the three dimensional structures of our versions of the phytochelatin from their sequences, we evaluated several web tools listed here:

I-TASSER - https://zhanglab.ccmb.med.umich.edu/I-TASSER
Phyre2 - http://www.sbg.bio.ic.ac.uk/phyre2
CBS - http://www.cbs.dtu.dk/
Orion - https://www.dsimb.inserm.fr

Those tools all employ different methods based on either homology modelling, threading or ab initio, to predict the structures based on the sequence.

Homology modelling methods can be used if there exist (enough) homolog proteins With similar sequence and known structure, as they compare the query peptide sequence against a database. The structures of close matches can be used as templates. Threading is often used to find template structures in a parallelized and fast way.
If there only exists little or no knowledge about homologous proteins, ab-initio modelling might be necessary. In this case, a free energy minimization is performed while iterating over different possible structures for smaller parts of the sequence.

In our case, there exist phytochelatins with known structures and similar sequences, but with much less repeats than observed in our phytochelatin. After extensive research, we decided to work with a hierarchical threading method “I-TASSER”, as it aligns fragments of the query sequence with fragments of a database. Furthermore, the structures obtained with “I-TASSER” scored highest in terms of confidence scores in comparison to all other tools we tested.

Briefly, I-TASSER uses a procedure using four different steps: (8)

Template identification by LOMETS using Threading (3)
Fragment structure reassembly by replica-exchange Monte Carlo simulations (4) using ab initio and the threading results from step 1
Atomic level structure refinement by REMO (5) and FG-MD (6) using molecular dynamic simulations, H-bond network optimization
Structure-based function interpretation by COFACTOR (7) by evaluating similar template proteins about their functions and structures

I-TASSER generates a confidence score to measure the quality of a model-result. Thus, for all of our structures we chose the model with the highest confidence score (C-Score). Models with a C-Score of -1,5 are expected to have a correct fold. Our models have the following C-Scores:

Origin: -1,28 (Figure 2)
His: -1,12 (Figure 3)
Asp: -0,75 (Figure 4)

As these are acceptable confidence scores, we used the Origin, His and Asp models for our next step.

Figure 2: I-TASSER result for the Origin sequence.

Figure 3: I-TASSER result for the His sequence.

Figure 4: I-TASSER result for the Asp sequence.

Molecular dynamics simulation

Next, we needed to test whether the predicted structures are plausible in a Molecular Dynamics (MD) simulation. The method simulates the movement of a set of atoms over time by calculating and analyzing forces between them. The structure being stable for a defined period of time is a good indicator for its stability and thus plausibility in vivo.

To achieve this, we used GROMACS 2018.6 a MD package capable of simulating Molecular Mechanical (MM) and Quantum Mechanical (QM) forces applied to all the particles of a molecule. (9) Using MD simulations it can be assumed whether a molecule can withstand the forces of a defined system and as well calculate the way it will change when exposed to a specific system.

Those systems´ forces are defined via force fields. A force field is a set of parameters for each atom type or group in the simulation that is used to calculate the energy of the system. Since several different force fields for various molecule types exist it is imperative to find the most suitable for the problem at hand.

Although, we can presume that our Origin sequence will be stable, since we got it from an already existing iGEM-Part, we commenced our MD simulation with the Origin structure as well in order to prove this presumption and to receive reference values for the other sequences. Thus, we started by following the “Lysozyme in Water Tutorial” (10) with the exceptions mentioned here:

In our first approach we used the AMBER99-SB Force Field, because AMBER is pretty well known and was recommended to us by Dr. Thiel . Furthermore, we found articles on how to extend the AMBER99-SB Force Field by values of heavy metals (in focus: Manganese). (12) Generally, it is challenging to come across force fields with these properties already built in so we decided that the aforementioned approach is presumably the most promising. Accordingly, we employed the following parameters provided as .mdp files:

integrator = steep
emtol = 1000.0
emstep = 0.01
nsteps = 50000
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = cutoff
rcoulomb = 1.4
rvdw = 1.4
pbc = xyz

define = -DPOSRES
integrator = md
nsteps = 50000
dt = 0.002
nstxout = 500
nstvout = 500
nstenergy = 500
nstlog = 500
continuation = no
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.4
rvdw = 1.4
DispCorr = EnerPres
coulombtype = PME
pme_order = 4
fourierspacing = 0.16
pcoupl = no
pbc = xyz
gen_vel = yes
gen_temp = 300
gen_seed = -1
tcoupl = V-rescale
tc-grps = Protein Non-Protein
ref_t = 300 300
tau_t = 0.1 0.1

define = -DPOSRES
integrator = md
nsteps = 50000
dt = 0.002
nstxout = 500
nstvout = 500
nstenergy = 500
nstlog = 500
continuation = yes
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.4
rvdw = 1.4
DispCorr = EnerPres
coulombtype = PME
pme_order = 4
fourierspacing = 0.16 tcoupl = V-rescale
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 300 300
pcoupl = Parrinello-Rahman
pcoupltype = isotropic
tau_p = 2.0
ref_p = 1.0
compressibility = 4.5e-5
refcoord_scaling = com
pbc = xyz
gen_vel = no

integrator = md
nsteps = 25000000
dt = 0.002
nstxout = 0
nstvout = 0
nstfout = 0
nstenergy = 5000
nstlog = 5000
nstxout-compressed = 5000
compressed-x-grps = System
continuation = yes
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.4
rvdw = 1.4
coulombtype = PME
pme_order = 4
fourierspacing = 0.16
tcoupl = V-rescale
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 300 300
pcoupl = Parrinello-Rahman
pcoupltype = isotropic
tau_p = 2.0
ref_p = 1.0
compressibility = 4.5e-5
pbc = xyz
DispCorr = EnerPres
gen_vel = no

integrator = steep
emtol = 1000.0
emstep = 0.01
nsteps = 50000
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = PME
rcoulomb = 1.4
rvdw = 1.4
pbc = xyz

In brief, the parameters mostly differ by the following aspects:

The most important thing here was to set the coulomb for electrostatic, and van der Waals for intramolecular forces, effects to 1.4 (instead of 1.0). (13)
The simulation time was risen up to 50ns (instead of 1ns) for a more reliable test of stability

Figure 5: 50ns Videos of our Phytochelatin-sequences.

As a result, the Origin and our His variant molecules sustained in the AMBER Force Field for 50 nm, whilst our Asp prototype fell apart after ~48ns. This indicates that the Origin and His variant may be stable in vivo, an important indication for a putative application of the His variant in the wet lab.

Concerning the Asp variant, no force field is proven to model 100% of real conditions and a stability for 48 ns are, compared to publications which require 20 ns, quite strong. Thus, we decided to place the Asp variant within the OPLS-AA/L force field. The OPLS-AA/L force field is an all atom force field which was used in the referred tutorial as well. (11) In this force field, the Asp variant remained very stable, making it a second promising candidate for our wet lab work.

Of note, the His modification showed quite a lot of gyration in comparison to the other two variants, which could eventually lead to the protein being torn apart on the long run.

Binding Site Prediction:

In addition to our MD simulation, we used another tool to elucidate the probable binding sites of metal ions (Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+) for Origin, His and Asp using IonCom. IonCom is a webtool from zhanglab to predict possible binding sites to metals. It combines methods like ab initio training with template-based transferals (2).

Contrary to our suspicion of additional sulfide-bonds from the thiol-groups of the Cysteine in the Origin sequence, we found Glutamic Acid (E15 & E19) to bind to Mn2+. Additionally, the Origin sequence does not have active sites for Fe2+, Fe3+, Na+, K+)). In contrast, our His and Asp are both binding to all 7 different metal types and therefore appear to be more versatile than the Origin sequence.

To conclude, we suggest testing and using both of our phytochelatin variants additionally to the original sequence in the wet lab, as they appear to be stable and more versatile in terms of ion binding.

Figure 6: IonCom result for Origin binding to Mn

IonCom Results

Sequence	Zn	Cu	Fe2+	Ca	Mn	Na	K	Fe3+	Mg
Origin	E1 C2 E3 C4 E5 C6 E7 C8 E9 C10 E11 C12 E13 C14 E15 C16 E17 C18 E19 C20 E21 C22 E23 C24 E25 C26 E27 C28 E29 C30 E31 C32 E33 C34 E35 C36 E37 C38 E39 C40	C2 C18 C22 C26 C28 C32 C38 E39		E13 E15 E17 E19	E15 E19				E35 E37
Asp	E1 D16 E19 D22 E23 D24 E25 D26 E27 D28 E29 D30 E31 D32 D34 E35 D36 D38 E39	D18 E19 D22 E23 D26 D28 E29 D32 E33 D36 E39	D36	E1 D4 D6 E7 D8 D10 E13 D14 E15 D16 D18 E19 E21 D22 E23 D24 E25 D26 E27 D28 E29 D30 E31 D32 D34 E35 D36 E37 D38 E39 D40	E19 D20 E21 D22 E23 D24 E25 D26 E27 D28 E29 D30 E31 D32 E33 D34 E35 E37 D38 E39	E25 D26 D28 E29 D30 E31 D32 D34 E35 D36 E39	E19 E23 D24 E25 D26 E27 D28 E29 E31 D32 E35 E37 D38 E39
His	E1 H2 E3 H4 E5 H6 E7 H8 E9 H10 E11 H12 H14 E15 H16 E17 H18 E19 H20 E21 H22 E23 H24 E25 H26 E27 H28 E29 H30 E31 H32 E33 H34 E35 H36 E37 H38 E39 H40	H2 H4 H6 H12 H22 H26	H18 E19 H22 H24 H26 H28 E29 H30 H32 E33 H34 E35 H36 H38 E39	E15 H16 E17 H18 E19 E21 H22 E23 H24 H26 H28 E29 H30 E31 H32 E33 H34 E35 H38 E39	H12 E15 H16 E17 H18 E19 H20 E21 H22 E23 H24 E25 H26 E27 H28 E29 H30 E31 H32 E33 H34 H36 E37 E39	H24 H28 H30 H32 E33	H12 E15 H16 E17 H18 E19 H20 E21 H22 E23 H24 E25 H26 E27 H28 E29 H30 E31 H32 E33 H34 E35 H36 E37 H38 E39	H2 H4 H6 H12 H22 H26	H4 H6

Final assessment

Despite not being able to execute all the planned tasks to evaluate phytochelatin alternatives and comparing them to the origin sequence, we can conclude that both of our variants are acceptable, stable candidates, since they sustained for 50 ns in the forcefield. Asp, although torn apart in the AMBER99-SB force field, performed extremely stable in the OPLS force field. Both phytochelatin variants have the potential to improve the binding of phytochelatin to metals, but considering the higher radius of gyration, we suggest to focus on the Asp variant.

As an outlook, the performance of Umbrella sampling to estimate the free energy between bound and unbound structures would present us with a putatively higher variant amongst our alternatives. Furthermore, considering quantum mechanical aspects in our force field would potentially increase the yield of our simulation, as recommended by our collaboration partner from Waterloo.

Team:Tuebingen/Model

Model

Phytochelatin

Protein structure prediction

Molecular dynamics simulation

Binding Site Prediction:

Final assessment

References

Team:Tuebingen/Model

Model

Phytochelatin

Protein structure prediction

Molecular dynamics simulation

ions.mdp

nvt.mdp

npt.mdp

minim.mdp

md.mdp

Binding Site Prediction:

Final assessment

References