Difference between revisions of "Team:XMU-China/Model"

Line 23: Line 23:
 
     <link href="https://cdn.bootcss.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet">
 
     <link href="https://cdn.bootcss.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet">
 
     <script src="https://2020.igem.org/common/MathJax-2.5-latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
 
     <script src="https://2020.igem.org/common/MathJax-2.5-latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
 +
<script type="text/x-mathjax-config">
 +
        MathJax.Hub.Config({
 +
        tex2jax:{
 +
        inlineMath:[ ['$','$'],["\\(","\\)"] ],
 +
        displayMath:[ ['$$','$$'],["\\[","\\]"] ]
 +
        }
 +
        });
 +
    </script>
 +
    <script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_CHTML"></script>
 
</head>
 
</head>
 
<body>
 
<body>

Revision as of 10:37, 26 October 2020

1.Detection

To achieve the goal to detect the Glyphosate, we designed a system which is composed of EcAKR4-1, GRHPR and iNAP. We convert glyphosate into AMPA and glyoxylate through a Glyphosate oxidase (GOX) from Echinochloa Colona named EcAKR4-1. Then glyoxylate can be transferred into glycolic acid with the function of a Human Glyoxylate Reductase named GRHPR, which will simultaneously convert a molecule of NADPH to NADP+.

We use iNAP, a fused protein of T-Tex and cpYFP to detect the amount of NADPH in the system. When NADPH exists, the conformational changes of the fusion protein make it become fluorescent activity. Therefore, the amount of glyphosate can be indirectly quantified by determining the amount of NADPH consumed in the process from fluorescence intensity.

We modeled to describe the dynamics of the detection system and giving a full version of working curve which can be used in the degradation systems.
The whole reactions in the detection system are as following:

$$glyphosate\xrightarrow{{GOX}}AMPA$$ $$glyoxylic\mathop {}\limits_{}^{} acid + NADPH\xrightarrow{{GRHPR}}glycolic\mathop {}\limits_{}^{} acid + NAD{P^ + }$$ $$iNAP + NADPH\xrightarrow[{}]{{}}iNAP - NADPH$$ $$glyphosate\xrightarrow{{{k_1}}}\{ \} $$ $$AMPA\xrightarrow{{{k_2}}}\{ \} $$ $$glyoxylic\mathop {}\limits_{}^{} acid\xrightarrow{{{k_3}}}\{ \} $$ $$NADPH\xrightarrow{{{k_4}}}\{ \} $$ $$glycolic\mathop {}\limits_{}^{} acid\xrightarrow{{{k_5}}}\{ \} $$ $$NAD{P^ + }\xrightarrow{{{k_6}}}\{ \} $$ $$iNAP\xrightarrow{{{k_7}}}\{ \} $$ $$iNAP - NADPH\xrightarrow{{{k_8}}}\{ \} $$

And those reactions can be described by the following ODE equations:

$$\frac{{d[glyphosate]}}{{dt}} = - \frac{{{v_{gox}}[glyphosate]}}{{{K_{m(gox)}} + [glyphosate]}} - {k_1}[glyphosate]$$ $$\frac{{d[AMPA]}}{{dt}} = - \frac{{{v_{gox}}[glyphosate]}}{{{K_{m(gox)}} + [glyphosate]}} - {k_2}[AMPA]$$ $$\frac{{d[glyoxylic\mathop {}\limits^{} acid]}}{{dt}} = - \frac{{{v_{gox}}[glyphosate]}}{{{K_{m(gox)}} + [glyphosate]}} - \frac{{{\rm{V}}[NADPH][Glyoxylic\mathop {}\limits^{} acid]}}{{{K_{m(Glyoxylic\mathop {}\limits_{} acid)}}[NADPH] + {K_{m(NADPH)}}[Glyoxylic\mathop {}\limits_{} acid] + {K_{ia}}{K_{m(Glyoxylic\mathop {}\limits_{} acid)}} + [NADPH][Glyoxylic\mathop {}\limits_{} acid]}} - {k_3}[glyoxylic\mathop {}\limits_{} acid]$$ $$\frac{{d[NADPH]}}{{dt}}{\rm{ = }}\frac{{{\rm{V}}[NADPH][Glyoxylic\mathop {}\limits^{} acid]}}{{{K_{m(Glyoxylic\mathop {}\limits_{} acid)}}[NADPH] + {K_{m(NADPH)}}[Glyoxylic\mathop {}\limits_{} acid] + {K_{ia}}{K_{m(Glyoxylic\mathop {}\limits_{} acid)}} + [NADPH][Glyoxylic\mathop {}\limits_{} acid]}} - {k_4}[NADPH]$$ $$\frac{{d[glycolic\mathop {}\limits^{} acid]}}{{dt}}{\rm{ = }}\frac{{{\rm{V}}[NADPH][Glyoxylic\mathop {}\limits^{} acid]}}{{{K_{m(Glyoxylic\mathop {}\limits_{} acid)}}[NADPH] + {K_{m(NADPH)}}[Glyoxylic\mathop {}\limits_{} acid] + {K_{ia}}{K_{m(Glyoxylic\mathop {}\limits_{} acid)}} + [NADPH][Glyoxylic\mathop {}\limits_{} acid]}} - {k_5}[glycolic\mathop {}\limits_{}^{} acid]$$ $$\frac{{d[NAD{P^ + }]}}{{dt}}{\rm{ = }}\frac{{{\rm{V}}[NADPH][Glyoxylic\mathop {}\limits^{} acid]}}{{{K_{m(Glyoxylic\mathop {}\limits_{} acid)}}[NADPH] + {K_{m(NADPH)}}[Glyoxylic\mathop {}\limits_{} acid] + {K_{ia}}{K_{m(Glyoxylic\mathop {}\limits_{} acid)}} + [NADPH][Glyoxylic\mathop {}\limits_{} acid]}} - {k_6}[NAD{P^ + }]$$ $$\frac{{d[iNAP]}}{{dt}} = - {k_i}[iNAP][NADPH] + {k_{ - i}}[iNAP - NADPH] - {k_7}[iNAP]$$ $$\frac{{d[iNAP]}}{{dt}} = {k_i}[iNAP][NADPH] - {k_{ - i}}[iNAP - NADPH] - {k_8}[iNAP - NADPH]$$

The reaction between iNAP and NADPH is quite fast so we using the rapid equilibrium assumption. After fitting with experimental data, we found our ODE equations can describe all the dynamics in Fig.1.

Taking the fluorescence in 。。。。。。And the working curve can be simulation and obtained in Fig.2.

2.Degradation

Modeling of PhnJ was aimed in search of optimized enzymatic protein to achieve idealized degradative function of PhnJ after being transformed into E. coli. The following processes and methods led us to explore the optimal solution.

2.1 An Top-down strategy : in the holoenzyme view

Before we going deeply into the mechanism of PhnJ, we firstly try to understand the holoenzyme—PhnGHIJK, where the PhnJ is assembled to after translation.

PhnGHIJ, the C-P lyase core complex, contains four protein subunits, in which PhnJ is the core enzyme, catalyzing the lysis of C-P bond in glyphosate and methylphosphonate as a C-P lyase. PhnI is a nucleosidase to deglycosylate ATP and GTP into ribose 5-triphosphate. PhnGHL support the reaction of C-P lyase pathway in the place of PhnI to transfer the phosphonate to the ATP position in ribose C-1` position to displace adenine and generate the reaction intermediate ribose 5`-triphosphonate alkyl phosphonate. PhnM release the pyrophosphate and PhnJ cleaves the C-P bond and PhnP and PhnN convert it to PRPP [2-18]. PhnK associates with PhnGHIJ as the fifth protein, resembles ABC cassette proteins with unclear stoichiometry [1].

这是一个figure-word:[3, structural insights into...]

“The C-P lyase core complex resembles the letter 'H' with rounded arms that are twisted approximately 45° in and out of the plane with respect to each other.” [3] PhnG and PhnJH compose the two arms of “H”, while two PhnI touch and stretch over the two arms in the middle, as the center of C-P lyase complex.

PhnI is comprised with four-stranded, antiparallel beta-sheet next to a four-helix bundle with two helical extension in both C- and N- termini to grasp and tether PhnJ by extensive interaction. Furthermore, PhnJ connects and associates to PhnH through packing of two conserved alpha-helices in both proteins. The interaction of PhnHJ heterodimer is similar to the subunits structure of PhnH homodimer.

Just as described above, the collaboration between PhnG, PhnH, PhnI, PhnJ, PhnK is quite important so we suppose that increasing the binding affinity between the four proteins may contribute to the performance of Glyphosate degradation.

For the PhnJ (exogenous), our work mainly focuses on getting the structure by homologous modeling.

The gene sequence of PhnJ in Enterobacter cloacea K7 is reported to be highly conserved and high identity (99%) to the phnJ sequence in Enterobacter cloacae strain EcWSU1[4]. Therefore, we used codon-optimization to the preference of E. coli and submited the gene sequence in the website of SWISS-MODEL (https://swissmodel.expasy.org/), filtrated and obtained the optimal homologous protein model of PhnJ (exogenous).

In order to achieve a more precise and accurate result of PhnJ protein model, we implemented MOE to build another homologous model with different algorithm.

Plus, MOE offers geometry analysis of protein model, which brings much convenience to our model assessment and evaluation.

Initially, we assessed the SWISS-MODEL result of PhnJ in Molprobity (http://molprobity.biochem.duke.edu/) , with the function of “Analyze geometry without all-atom contacts”. It was reported that the model possesses numbers of clashes, outlier rotamers, and CaBlam outliers.

Thus, we built another homologous PhnJ model in software MOE and achieve geometry analysis within the software. Moreover, with the basic understanding of the C-P lyase complex in the aspect of structure and protein interaction, we designed several mutations in PhnJ and transformed them into the fittest configuration of each mutated amino acid. Basically, the interface of PhnJ to PhnH and PhnI was the essence of mutation. For instance, the hydrophilic lysine in the PhnJH interface was mutated into hydrophobic phenylalanine to achieve a strengthened hydrophobic interaction and diminish the surface water inside. So we try to choose the following three mutation, including R21M, which mutate the 21th amino acid from R to M, P45Q and T16S & R40Y which have two sites mutated.

The mutation is Homologous modeling by Swiss-Model, and the mutated site can be seen in Fig?-Fig.?.

Protein-Protein docking is carried out by the ClusPro server and the binding energy can be seen in the table.1.

Endangerous R21M T16S & R40Y P45Q
Center -1299.0 -1821.9 -2078.1 -1739.4
Lowest -3913.7 -4023.7 -4044.8 -4001.6

2.2 An Bottom-up strategy : in the PhnJ view

Only obtaining the higher binding affinity can’t guarantee that the mutated protein will have better performance to degradation the glyphosate. We should go deeper into the reaction mechanism of PhnJ to check whether the mutated action leads to a higher speed of reaction or just inactive the enzyme.

2.2.1 The reaction mechanism of PhnJ

The reaction mechanism of PhnJ can divide into the following six-step.

Step 1. Initial formation of the 5`deoxyadenosyl radical. The mechanism is assumed to be identical to other Radical SAM type proteins.

Step 2. The adenosyl radical abstracts a hydrogen atom from the catalytic glycine residue. The products of this activation step represent the ground state of this enzyme.

Step 3. The glycyl radical abstracts a hydrogen atom from the catalytic cysteine residue.

Step 4. The cysteinyl residue attacks the phosphate group, forming a pentavalent intermediate.

Step 5. The pentavalent intermediate collapses, eliminating a methyl radical with concomitant abstraction of a hydrogen atom from the glycine residue.

Step 6. One of the hydroxyl groups of the ribose intermediate initiates a nucleophilic attack on the covalently bound phosphate to regenerate the cysteine residue and final product. The enzyme is now in a state that it can perform another round of catalysis.

In the all six-step reaction mechanism, some super-distance hydrogen atom transforms are worth to study, like Step2 and Step3. The previous research shows that the hydrogen atom transforms which carried by the Hydrogen bond network, some proton channel or protein re-folding. In other views for PhnJ itself, the super-distance hydrogen atom transforms just happened between the binding site and catalytic site which means that it’s possible that the substrate is transported inside or outside the protein. But for the most common situation in such a complex system, those mechanisms mentioned above are likely to work together to finish hydrogen atom transforms.

So here we have following three parts to evaluate the state of hydrogen atom transforms in mutation which is operated in three pars in reaction mechanism including Step2, Step3 and key reaction process in Step5.

2.2.2 Finding the adenosyl radical transforms path in Step2,an general method combining Monte Carlo sample and reinforcement learning

As we can see in reaction mechanism Step2,we can find that first super-distance hydrogen atom transforms in PhnJ’s reaction coordinates. After Initial formation of the 5`deoxyadenosyl radical, The adenosyl radical abstracts a hydrogen atom from the catalytic glycine residue. But the distance between adenosyl radical and Gly32 is too far to finish the direct transform without the moving of adenosyl radical. In order to finding the possible hydrogen atom transforms path in Step2, we propose an algorithm call F2ARTP (Finding the adenosyl radical transforms path) to calculate the path.

The FHATP algorithm can be split into two parts: the Monte Carlo sampling which finally gives a score function to evaluate the possible docking position of the substrate using the Gibson binding energy and, reinforcement learning which will find the path that satisfies the minimization of total reaction energy and moving distance. The workflow of FHATP can be seen in Fig.?,

The Monte Carlo Sampling parts is used to collect the sample which can be used to integrate the scoring functions which is used to evaluate the possible docking position of the substrate and work as the return function in Deep Q-Learning. It can be split into following parts:

1. Pre-processing the protein structure for molecular dynamics. Adding missing hydrogen atoms and convert non-standard residues to their standard equivalents to make the PDB structure suitable for performing the molecular dynamics.

2. Using the classical molecular dynamics to collect the possible flexible structural sample which can be used for molecular docking.

3. Pick a reasonable number of frames from the results of MD and using as Accepter to perform molecular docking to find all the possible binding sites for substrate around the protein and their Gibson binding energy.

4. All step here using the hypothesis that the protein is the flexible part while the substrate is the rigid part because we need to identify the position of the substrate using the only one coordinates in cartesian coordinates, which is a convenience for us to integrating the result of molecular docking. In the step, we should find a mapping between the possible docking position and its binding energy. We can choose the table function or neural network.

The two key methods: molecular dynamics and molecular docking is described as following.

Molecular dynamics (MD) is a computer simulation method for studying the physical movements of atoms and molecules. The atoms and molecules are allowing to interact for a fixed period of time, giving a view of the dynamic evolution of the system. In the most common version, the trajectories of atoms and molecules are determined by numerically solving Newton's equations of motion for a system of interacting particles, whose forces between the particles and their potential energies are often calculated using interatomic potentials or molecular mechanics force fields.

To a system which consists of /N/ molecule or atoms, the total energy of a system includes kinetic energy and potential energy,which can be describe by the formula below:

$$E = {E_{kin}} + U$$

where ${E_{kin}}$ donates the kinetic energy and $U$ donates the potential energy.

In a molecule system, the total potential energy can be calculated by adding the ${U_{nb}}$ bond stretching potentials energy ${U_b}$ angle bending potentials energy ${U_\theta }$ torsion angle potentials energy ${U_\phi }$ out-of-plane potentials energy ${U_\chi }$ and some other cross effect ${U_{cross}}$ together,which also can be describe by the formula below:

$$U = {U_{nb}} + {U_b} + {U_\theta } + {U_\phi } + {U_\chi } + {U_{cross}}$$

This formula above also called the force field in molecular dynamics’ theory. There are many force field in the world that based on the statistical thermodynamics and empirical result. In the research of the protein and nucleic acid, the Amber force field is one of the best force field in the world. So we choose Amber as our force field and the formula of this field show below:

\[\begin{array}{l} E = \sum\limits_{bond} {{K_b}{{({r_{ij}} - {r_0})}^2}} + \sum\limits_{angle} {{K_\theta }{{(\theta - {\theta _0})}^2}} + \sum\limits_{dihedral} {\frac{{{K_\phi }[1 + \cos (n\phi - {\phi _0})]}}{2}} \\ + \sum\limits_{impr} {\frac{{{K_\chi }[1 + \cos (n\chi - {\chi _0})]}}{2}} + \sum\limits_{nobond} {{\varepsilon _{ij}}\left[ {{{\left( {\frac{{R_{ij}^0}}{{{R_{ij}}}}} \right)}^{12}} - 2{{\left( {\frac{{R_{ij}^0}}{{{R_{ij}}}}} \right)}^6}} \right]} + \sum\limits_{nobond} {\frac{{{q_i}{q_j}}}{{{R_{ij}}}}} \end{array}\]

The items in the formula refers to bond stretching term、angle bending potentials、dihedral angle potentials、put of plane angle potentials、improper dihedral angle potentials、Van Der Waals interaction and Coulombic interaction terms in order.

Now considering the system which contains of consists of $N$ molecule or atoms in the classic mechanics, the atom or molecule can be characterized as follow:

The position of atom $i$ is ${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over r} _i}$ , the speed of atom $i$ is \[{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over v} _i}\] , the acceleration of atom $i$ is ${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over a} _i} = \frac{d}{{dt}}{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over v} _i} = \frac{{{d^2}}}{{d{t^2}}}{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over r} _i}$ , $i = 1,2 \cdots N$ .

After the integral operation, we can get two formula of \[{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over v} _i}\] and ${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over r} _i}$ :

$${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over v} _i} = {\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over v} _i}^0 + {\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over a} _i}t$$ $${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over r} _i} = {\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over r} _i}^0 + {\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over v} _i}^0 + \frac{1}{2}{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over a} _i}{t^2}$$

where the ${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over v} _i}^0$ refers to initial speed and the ${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over r} _i}^0$ refers to initial

According to the classic mechanics, the force applied to atoms is the negative gradient of potential energy:

$${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over F} _i} = - {\nabla _i}U = - \left( {\frac{\partial }{{\partial x}}\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over i} + \frac{\partial }{{\partial y}}\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over j} + \frac{\partial }{{\partial z}}\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over k} } \right)U$$

Due to Newton’s second law, the acceleration of atom $i$ can also be described as:

$${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over a} _i} = \frac{{{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over F} }_i}}}{{{m_i}}}$$

Molecular docking is a key tool in structural molecular biology. The goal of ligand-protein and protein-protein docking is to predict the predominant binding modes of a ligand (include molecules, ions and proteins) with a protein of known three-dimensional structure. Successful docking methods search high-dimensional spaces effectively and use a scoring function that correctly ranks candidate dockings. Docking can be used to perform virtual screening on large libraries of compounds, rank the results, and propose structural hypotheses of how the ligands inhibit the target, which is invaluable in lead optimization. The setting up of the input structures for the docking is just as important as the docking itself, and analyzing the results of stochastic search methods can sometimes be unclear.

The reinforcement learning parts is used to identify the path that satisfies the minimization of total reaction energy and moving distance.

Reinforcement learning is the training of machine learning models to make a sequence of decisions. The agent learns to achieve a goal in an uncertain, potentially complex environment. In reinforcement learning, an artificial intelligence faces a game-like situation. The computer employs trial and error to come up with a solution to the problem. To get the machine to do what the programmer wants, the artificial intelligence gets either rewards or penalties for the actions it performs. Its goal is to maximize the total reward.

In this part, we using the Deep Q-Learning showing in the Fig.? Comparing to Q-Learning, Deep Q-Learning using the neural network in stead of Q-table to get the better result, which is proved is better in playing Atari games. And recently years with the development of AlphaFold, Reinforcement learning is used in biological design more and more frequently.

As the algorithm workflow mentioned. We treat the problem that Finding the adenosyl radical transforms path as a semi- Atari game. For this game, we need to define the Target-Network and Reward used in the algorithm.
For the Target-Network, we used an integrated network including an ANN which taking the two kinds of input: the position of agents and the distance between the destination and real-time position, a 3-D Convolutional Neural Network which taking a 3D image record the score function’ values which besides agents (cut-off distance is 2A using a cube).

For the reword, before we get the destination, the reward will be the following formula:

$$R{\rm{ = }}\frac{{\Delta \Delta G}}{{\Delta d}}$$

Which is used to calculate as energy barrier per unit between two steps.

And after we get the destination, the reward will be the following formula:

$$R{\rm{ = }}\frac{{\sum {\Delta \Delta G} }}{{total\_seep}}$$

Which is used to calculate as energy used per step in the whole game session.

As for the PhnJ which we want to design, there will be more specific things in this F2ARTP algorithm. Also in the following paragraphs, we will mention some result about it.

PhnJ and its mutation's PDB file can get in 2.1 while the structure and the molecular forcefield of adenosyl radical can be generated by LigParGen and balance by Avagadro with MMF868. And finally, molecular dynamics is carried out by OpenMM and DQN is coding with TensorFlow.

First result we want to post here is the docking result which can be seen in Fig.?.

When integrate the scoring function, we should noticed that PhnJ is parts of PhnGHIJK, which means that there are some interface region which is impossible regard as docking region as you can see in Fig.?, We should delete them all.

And finally the four path calculated by F2ARTP can be seen in the following picture and the total energy barrier and the total step can be seen in the table2.

这里有一个表格啊啊啊!!!!!都是问号的!!!

2.2.3 Detect the hydrogen atom transforms in Step3 using molecular dynamics

As the second super-distance hydrogen atom transforms in PhnJ’s reaction coordinates. The glycyl radical abstracts a hydrogen atom from the catalytic cysteine residue. But the distance between Gly32 and Cys272 is also large than any chemical bound so we need to find the way to detect the hydrogen atom transforms.

To calculate the hydrogen atom path, the QM or QM/MM will be the most perfect method but our system is too large for QM and the distance between the beginning palace(Gly32) and destination(Cys272) is too far to set the QM region. So we finally choosing the classical molecular dynamics we mentioned in 2.2.2 to detect the hydrogen atom transform by evaluate the dynamics distance between Gly32 and Cys272.

For the MD structure prepare, we draw 3, 5′-phospho-α-d-ribosyl 1′-(N-phosphonomethylglycine) in ChemDraw and get the smiles representation. After putting it in the LigPraGen, we can simply get the PDB files and forcefield, then and releax by Avagadro with MMF868. After that we dock this small molecular with PhnJ in PatchDock web service and perform MD with OpenMM.

We use the distance between Cys272-S and Gly32-R as an mark, and the result can be see in Fig.?.

2.2.4 Explore the breaking of C-P bonds in Step5 using QM/MM

The PhnJ belongs to a family called C-P lysis, which can break the C-P bound, like the step5 described in 2.2.1. The pentavalent intermediate collapses, eliminating a methyl radical with concomitant abstraction of a hydrogen atom from the glycine residue. We want to try to simulate this by QM/MM method.

The QM/MM method is a balance between QM and MD, which can simulate the break of chemically bound that beyond the MD while guarantee that the time cost is acceptable than QM.The simulation system in QM/MM will be split into three parts, the MM region, the QM region, and the interface. QM region is calculated with Principles of quantum mechanics while MM with classical mechanics. The interface is often set with the hydrogen atom(Link atom) which can be easy to calculate.

QM Calculations are the pinnacle of precision compared to all other methods used for calculating molecular systems, but this precision comes with very high computational costs. All QM calculation try to solve the Schroedinger Equation :
H Hamiltonian, Ψ Wavefunction , E Energy
They do that by calculating the Hamiltonian of the system. The Hamiltonian is an operator that if applied to the wavefunction of a system results in the same wavefunction times its energy. QM calculations are characterized by two important things: the functional and the basis set used. The functional is characterized by the way the Hamiltonian is calculated. For all calculations that we do, we also need a starting point to describe orbitals or the density function (we will go into more detail to this later) and the basis set describes the number and type of the orbital functions that we use to calculate the system.Born Oppenheimer ApproximationA necessary assumption to use QM methods is the Born Oppenheimer Approximation. This is the approximation that due to the vast difference in speed between the electrons and the nucleus of the atoms the movement of both can be investigated apart from each other. For QM Methods this means that we only look at the movement of electrons and neglect the very slow movements of the nuclei when solving the Schroedinger Equation.Functionals and Basis setsFunctionalsThere are two fundamentally different types of functionals, wave function based and density function based methods. We only used density function based methods, to be precise B3LYP. Density Function Theory (DFT) is centered not on the wavefunction but on the square of the wave function, i.e. the electron density. It is based on the theorem of Hohenberg and Kohn and modern DFT is also based on the Kohn Sham approach. The electron density is a measurable quantity that is dependent on the cartesian coordinates of the electrons. In the first days of DFT (when invented by physicists) mostly the Linear Density Approach (LDA) was used and this approach works fine for metallic solids, in which the electrons are evenly distributed. However, this is a poor approximation when we want to model molecular systems. As a further Development of this approximation, Gradient corrected methods (GGA) have been invented. These methods consider the electron density not to be uniform.

In our system, the C-P bond with its group and Gly32-R set as QM region, and the rest part set as the MM region. The interface is C-P bound and Gly32-R bound, which is break and replace with link atom. QM/MM use PyChemShell to couple two software: GLUP for MD and ORCA for QM.

But unfortunately, due to the time and computer capability’s limitation, the QM/MM simulation is still running when the time Wiki Freeze.

3.Kill Switch

As mentioned on the project design page, for the biosafety we have created two kinds of kill switch which is used both in biological system and hardware. It obtains that our bacteria for degradation only survive when formaldehyde presents.

However, we found that formaldehyde itself also has toxic effect to the cell, which means that whether the concentration of formaldehyde is too high or too low, there’s a high chance that our bacteria would die. Therefore, figuring out the optimum concentration of formaldehyde only by doing some experiments will be time-consuming and almost impossible. Hence, it’s necessary to build a model so as to illustrate the relationship between the concentration of formaldehyde and the survival of cell. Eventually, we can find the most suitable concentration of formaldehyde for the engineered bacteria and the best concentration of glyphosate as well. Besides, it is also of beneficial for large-scale industrialization in the future.

As shown above, the gene circuit contains the formaldehyde promoter, the inverter (including cI lam protein), RBS and the toxin mazF. To begin with, we found related parameters of those biobricks on the Part Page and read some articles and papers about modeling.

1.The relationship between the concentration of formaldehyde and the expression of cI lam protein

1) Molecular(formaldhyde)

$$\frac{d[formaldhyde]}{dt}=\mathrm{-Degradation}$$ $$\frac{d[formaldhyde]}{dt}=-D_{formaldhyde}\cdot [formaldhyde]$$

2) P-formaldehyde promoter(DNA→mRNA):

$$\frac{d[{mRNA}_{P-formaldehyde}]}{dt}=\frac{(V_O+(V_{max}\cdot [formaldhyde]^{n}))}{K_M^{n}+[formaldhyde]^{n}}-D_{mRNA}\cdot [{mRNA}_P]$$

3) Peptide of B0034 after P-formaldehyde(mRNA→peptide)

4) cl lam protein(peptide→protein)

Since it’s unlikely to obtain some of the parameters in those three functions above, we integrated them into one:

2.The relationship between the expression of cI lam protein and the expression of toxin mazF protein

1) P-cl lam promoter(DNA→mRNA):

2) Peptide of B0034 after P-cl lam(mRNA→peptide)

3) mazF(peptide→protein)

For the same reason, we integrated those functions and listed the one below:

Significantly, the DmRNA, Dpeptide, R, V0, Vmax, KM and n are different in different part.

3.The relationship between the expression of toxin and the survival of cells

To stimulate the toxic effect of mazF to our bacteria, we introduced the Monod function.

μ is the specific growth rate (h-1); X is the cell concentration (g/L); S is the glycerol concentration (as nutrients) (g/L); μmax is the maximum specific growth rate (h-1); KS is the saturation constant (g/L)

4.The relationship between the concentration of formaldehyde and the survival of cells

The Monod function was also applied for the stimulation.

After the listing of functions, we designed some experiments to get those parameters. To begin with, we chose a strain of E.coli BL21 that do not have the gene circuit of suicide system and tested the toxic effect of formaldehyde by measuring the OD600. Then, we detected the toxic effect of mazF through CFU method. After that, to measure the expression of cI lam and mazF, we replaced them with ECFP & EYFP separately and detected the fluorescence.