Phytochelatin
Phytochelatins are a group of polypeptides which have gained a lot of momentum in the field of synthetic
biology and metabolic engineering as they hold the potential to be employed as detoxification modules
(for
more information see: Design).
As phytochelatins are an important compartiment of our project, we wanted to understand its ion
capturing
capabilities on a deeper level. Thus our aim was to characterize the polypeptide and to explore
possibilities to improve its manganese ion chelation.
As a result, we generated two altered sequences.
In the original sequence (1), we have a lot of glutamic acid - cysteine pairs. We assumed that the
glutamic
acid pulls ions through electronegativity and that the thiol groups of the cysteine then bind the heavy
metal atoms.
One way in which we tried to improve this binding was through exchanging the cysteines with histidines,
as
we assumed that the amino acid would bind to metal ions via its imidazole ring.
In the second altered sequence, we replaced cysteine with aspartic acid, since its electronegative
properties (in addition to Glutamic acid) could bind heavy metal attraction.
In the following, those three generated sequences will be referred to as:
In the following, those three generated sequences will be referred to as:
- Origin, for the Cysteine including sequence.
- His, for the Histidin including sequence.
- Asp, for the Aspartic acid including sequence.
For all sequences, we needed to determine a three dimensional structure and test its stability. After
this
we planned to analyze them further by for example determining and comparing the binding energies via
Umbrella sampling.
That is a technique to study protein binding-unbinding processes. As well it is possible to estimate the
binding free energy between those states.
We built models as specified in the following:
We built models as specified in the following:
Figure 1: Workflow for the bioinformatic analysis of phytochelatin and its variants.
Protein structure prediction
To model the three dimensional structures of our versions of the phytochelatin from their sequences, we
evaluated several web tools listed here:
- I-TASSER - https://zhanglab.ccmb.med.umich.edu/I-TASSER
- Phyre2 - http://www.sbg.bio.ic.ac.uk/phyre2
- CBS - http://www.cbs.dtu.dk/
- Orion - https://www.dsimb.inserm.fr
Those tools all employ different methods based on either homology modelling, threading or ab initio, to
predict the structures based on the sequence.
Homology modelling methods can be used if there exist (enough) homolog proteins With similar sequence
and
known structure, as they compare the query peptide sequence against a database. The structures of close
matches can be used as templates. Threading is often used to find template structures in a parallelized
and
fast way.
If there only exists little or no knowledge about homologous proteins, ab-initio modelling might be necessary. In this case, a free energy minimization is performed while iterating over different possible structures for smaller parts of the sequence.
If there only exists little or no knowledge about homologous proteins, ab-initio modelling might be necessary. In this case, a free energy minimization is performed while iterating over different possible structures for smaller parts of the sequence.
In our case, there exist phytochelatins with known structures and similar sequences, but with much less
repeats than observed in our phytochelatin. After extensive research, we decided to work with a
hierarchical
threading method “I-TASSER”, as it aligns fragments of the query sequence with fragments of a database.
Furthermore, the structures obtained with “I-TASSER” scored highest in terms of confidence scores in
comparison to all other tools we tested.
Briefly, I-TASSER uses a procedure using four different steps: (8)
- Template identification by LOMETS using Threading (3)
- Fragment structure reassembly by replica-exchange Monte Carlo simulations (4) using ab initio and the threading results from step 1
- Atomic level structure refinement by REMO (5) and FG-MD (6) using molecular dynamic simulations, H-bond network optimization
- Structure-based function interpretation by COFACTOR (7) by evaluating similar template proteins about their functions and structures
I-TASSER generates a confidence score to measure the quality of a model-result. Thus, for all of our
structures we chose the model with the highest confidence score (C-Score). Models with a C-Score of -1,5
are
expected to have a correct fold. Our models have the following C-Scores:
As these are acceptable confidence scores, we used the Origin, His and Asp models for our next step.
- Origin: -1,28 (Figure 2)
- His: -1,12 (Figure 3)
- Asp: -0,75 (Figure 4)
As these are acceptable confidence scores, we used the Origin, His and Asp models for our next step.
Figure 2: I-TASSER result for the Origin sequence.
Figure 3: I-TASSER result for the His sequence.
Figure 4: I-TASSER result for the Asp sequence.
Molecular dynamics simulation
Next, we needed to test whether the predicted structures are plausible in a Molecular Dynamics (MD)
simulation. The method simulates the movement of a set of atoms over time by calculating and analyzing
forces between them. The structure being stable for a defined period of time is a good indicator for its
stability and thus plausibility in vivo.
To achieve this, we used GROMACS 2018.6 a MD package capable of simulating Molecular Mechanical (MM) and
Quantum Mechanical (QM) forces applied to all the particles of a molecule. (9)
Using MD simulations it can be assumed whether a molecule can withstand the forces of a defined system
and
as well calculate the way it will change when exposed to a specific system.
Those systems´ forces are defined via force fields. A force field is a set of parameters for each atom
type
or group in the simulation that is used to calculate the energy of the system. Since several different
force
fields for various molecule types exist it is imperative to find the most suitable for the problem at
hand.
Although, we can presume that our Origin sequence will be stable, since we got it from an already
existing
iGEM-Part, we commenced our MD simulation with the Origin structure as well in order to prove this
presumption and to receive reference values for the other sequences. Thus, we started by following the
“Lysozyme in Water Tutorial” (10) with the exceptions mentioned here:
In our first approach we used the AMBER99-SB Force Field, because AMBER is pretty well known and was
recommended to us by Dr.
Thiel . Furthermore, we found articles on how to extend the
AMBER99-SB Force Field by values of heavy metals (in focus: Manganese).
(12)
Generally, it is challenging to come across force fields with these properties already built in so we
decided that the aforementioned approach is presumably the most promising. Accordingly, we employed the
following parameters provided as .mdp files:
integrator = steep
emtol = 1000.0
emstep = 0.01
nsteps = 50000
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = cutoff
rcoulomb = 1.4
rvdw = 1.4
pbc = xyz
emtol = 1000.0
emstep = 0.01
nsteps = 50000
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = cutoff
rcoulomb = 1.4
rvdw = 1.4
pbc = xyz
define = -DPOSRES
integrator = md
nsteps = 50000
dt = 0.002
nstxout = 500
nstvout = 500
nstenergy = 500
nstlog = 500
continuation = no
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.4
rvdw = 1.4
DispCorr = EnerPres
coulombtype = PME
pme_order = 4
fourierspacing = 0.16
pcoupl = no
pbc = xyz
gen_vel = yes
gen_temp = 300
gen_seed = -1
tcoupl = V-rescale
tc-grps = Protein Non-Protein
ref_t = 300 300
tau_t = 0.1 0.1
integrator = md
nsteps = 50000
dt = 0.002
nstxout = 500
nstvout = 500
nstenergy = 500
nstlog = 500
continuation = no
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.4
rvdw = 1.4
DispCorr = EnerPres
coulombtype = PME
pme_order = 4
fourierspacing = 0.16
pcoupl = no
pbc = xyz
gen_vel = yes
gen_temp = 300
gen_seed = -1
tcoupl = V-rescale
tc-grps = Protein Non-Protein
ref_t = 300 300
tau_t = 0.1 0.1
define = -DPOSRES
integrator = md
nsteps = 50000
dt = 0.002
nstxout = 500
nstvout = 500
nstenergy = 500
nstlog = 500
continuation = yes
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.4
rvdw = 1.4
DispCorr = EnerPres
coulombtype = PME
pme_order = 4
fourierspacing = 0.16 tcoupl = V-rescale
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 300 300
pcoupl = Parrinello-Rahman
pcoupltype = isotropic
tau_p = 2.0
ref_p = 1.0
compressibility = 4.5e-5
refcoord_scaling = com
pbc = xyz
gen_vel = no
integrator = md
nsteps = 50000
dt = 0.002
nstxout = 500
nstvout = 500
nstenergy = 500
nstlog = 500
continuation = yes
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.4
rvdw = 1.4
DispCorr = EnerPres
coulombtype = PME
pme_order = 4
fourierspacing = 0.16 tcoupl = V-rescale
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 300 300
pcoupl = Parrinello-Rahman
pcoupltype = isotropic
tau_p = 2.0
ref_p = 1.0
compressibility = 4.5e-5
refcoord_scaling = com
pbc = xyz
gen_vel = no
integrator = md
nsteps = 25000000
dt = 0.002
nstxout = 0
nstvout = 0
nstfout = 0
nstenergy = 5000
nstlog = 5000
nstxout-compressed = 5000
compressed-x-grps = System
continuation = yes
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.4
rvdw = 1.4
coulombtype = PME
pme_order = 4
fourierspacing = 0.16
tcoupl = V-rescale
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 300 300
pcoupl = Parrinello-Rahman
pcoupltype = isotropic
tau_p = 2.0
ref_p = 1.0
compressibility = 4.5e-5
pbc = xyz
DispCorr = EnerPres
gen_vel = no
nsteps = 25000000
dt = 0.002
nstxout = 0
nstvout = 0
nstfout = 0
nstenergy = 5000
nstlog = 5000
nstxout-compressed = 5000
compressed-x-grps = System
continuation = yes
constraint_algorithm = lincs
constraints = h-bonds
lincs_iter = 1
lincs_order = 4
cutoff-scheme = Verlet
ns_type = grid
nstlist = 10
rcoulomb = 1.4
rvdw = 1.4
coulombtype = PME
pme_order = 4
fourierspacing = 0.16
tcoupl = V-rescale
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 300 300
pcoupl = Parrinello-Rahman
pcoupltype = isotropic
tau_p = 2.0
ref_p = 1.0
compressibility = 4.5e-5
pbc = xyz
DispCorr = EnerPres
gen_vel = no
integrator = steep
emtol = 1000.0
emstep = 0.01
nsteps = 50000
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = PME
rcoulomb = 1.4
rvdw = 1.4
pbc = xyz
emtol = 1000.0
emstep = 0.01
nsteps = 50000
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = PME
rcoulomb = 1.4
rvdw = 1.4
pbc = xyz
In brief, the parameters mostly differ by the following aspects:
- The most important thing here was to set the coulomb for electrostatic, and van der Waals for intramolecular forces, effects to 1.4 (instead of 1.0). (13)
- The simulation time was risen up to 50ns (instead of 1ns) for a more reliable test of stability
Figure 5: 50ns Videos of our Phytochelatin-sequences.
As a result, the Origin and our His variant molecules sustained in the AMBER Force Field for 50 nm,
whilst
our Asp prototype fell apart after ~48ns. This indicates that the Origin and His variant may be stable
in
vivo, an important indication for a putative application of the His variant in the wet lab.
Concerning the Asp variant, no force field is proven to model 100% of real conditions and a stability
for 48
ns are, compared to publications which require 20 ns, quite strong. Thus, we decided to place the Asp
variant within the OPLS-AA/L force field. The OPLS-AA/L force field is an all atom force field which was
used in the referred tutorial as well. (11) In this force field, the Asp variant remained very stable,
making it a second promising candidate for our wet lab work.
Of note, the His modification showed quite a lot of gyration in comparison to the other two variants,
which
could eventually lead to the protein being torn apart on the long run.
Binding Site Prediction:
In addition to our MD simulation, we used another tool to elucidate the probable binding sites of metal
ions
(Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+) for Origin, His and Asp using IonCom. IonCom is a
webtool from zhanglab to predict possible binding sites to metals. It combines methods like ab initio
training with template-based transferals (2).
Contrary to our suspicion of additional sulfide-bonds from the thiol-groups of the Cysteine in the
Origin
sequence, we found Glutamic Acid (E15 & E19) to bind to Mn2+. Additionally, the Origin sequence does not
have active sites for Fe2+, Fe3+, Na+, K+)). In contrast, our His and Asp are both binding to all 7
different metal types and therefore appear to be more versatile than the Origin sequence.
To conclude, we suggest testing and using both of our phytochelatin variants additionally to the
original
sequence in the wet lab, as they appear to be stable and more versatile in terms of ion binding.
Figure 6: IonCom result for Origin binding to Mn
IonCom Results
Sequence | Zn | Cu | Fe2+ | Ca | Mn | Na | K | Fe3+ | Mg |
---|---|---|---|---|---|---|---|---|---|
Origin | E1 C2 E3 C4 E5 C6 E7 C8 E9 C10 E11 C12 E13 C14 E15 C16 E17 C18 E19 C20 E21 C22 E23 C24 E25 C26 E27 C28 E29 C30 E31 C32 E33 C34 E35 C36 E37 C38 E39 C40 | C2 C18 C22 C26 C28 C32 C38 E39 | E13 E15 E17 E19 | E15 E19 | E35 E37 | ||||
Asp | E1 D16 E19 D22 E23 D24 E25 D26 E27 D28 E29 D30 E31 D32 D34 E35 D36 D38 E39 | D18 E19 D22 E23 D26 D28 E29 D32 E33 D36 E39 | D36 | E1 D4 D6 E7 D8 D10 E13 D14 E15 D16 D18 E19 E21 D22 E23 D24 E25 D26 E27 D28 E29 D30 E31 D32 D34 E35 D36 E37 D38 E39 D40 | E19 D20 E21 D22 E23 D24 E25 D26 E27 D28 E29 D30 E31 D32 E33 D34 E35 E37 D38 E39 | E25 D26 D28 E29 D30 E31 D32 D34 E35 D36 E39 | E19 E23 D24 E25 D26 E27 D28 E29 E31 D32 E35 E37 D38 E39 | ||
His | E1 H2 E3 H4 E5 H6 E7 H8 E9 H10 E11 H12 H14 E15 H16 E17 H18 E19 H20 E21 H22 E23 H24 E25 H26 E27 H28 E29 H30 E31 H32 E33 H34 E35 H36 E37 H38 E39 H40 | H2 H4 H6 H12 H22 H26 | H18 E19 H22 H24 H26 H28 E29 H30 H32 E33 H34 E35 H36 H38 E39 | E15 H16 E17 H18 E19 E21 H22 E23 H24 H26 H28 E29 H30 E31 H32 E33 H34 E35 H38 E39 | H12 E15 H16 E17 H18 E19 H20 E21 H22 E23 H24 E25 H26 E27 H28 E29 H30 E31 H32 E33 H34 H36 E37 E39 | H24 H28 H30 H32 E33 | H12 E15 H16 E17 H18 E19 H20 E21 H22 E23 H24 E25 H26 E27 H28 E29 H30 E31 H32 E33 H34 E35 H36 E37 H38 E39 | H2 H4 H6 H12 H22 H26 | H4 H6 |
Final assessment
Despite not being able to execute all the planned tasks to evaluate phytochelatin alternatives and
comparing
them to the origin sequence, we can conclude that both of our variants are acceptable, stable
candidates,
since they sustained for 50 ns in the forcefield. Asp, although torn apart in the AMBER99-SB force
field,
performed extremely stable in the OPLS force field.
Both phytochelatin variants have the potential to improve the binding of phytochelatin to metals, but
considering the higher radius of gyration, we suggest to focus on the Asp variant.
As an outlook, the performance of Umbrella sampling to estimate the free energy between bound and
unbound
structures would present us with a putatively higher variant amongst our alternatives. Furthermore,
considering quantum mechanical aspects in our force field would potentially increase the yield of our
simulation, as recommended by our collaboration partner from Waterloo.