# Team:XMU-China/Model

1.Detection

To achieve the goal to detect glyphosate, we designed a system that is composed of EcAKR4-1, GRHPR, and iNAP. We convert glyphosate into AMPA and glyoxylate through a Glyphosate oxidase (GOX) from Echinochloa Colona named EcAKR4-1. Then glyoxylate can be transferred into glycolic acid with the function of a Human Glyoxylate Reductase named GRHPR, which will simultaneously convert a molecule of NADPH to NADP+.

Fig. 1 The reaction happened in the detection system

We used iNAP, a fused protein of T-Tex, and cpYFP to detect the amount of NADPH in the system. When NADPH exists, the conformational changes of the fusion protein arouse its fluorescent activity. Therefore, the amount of glyphosate can be indirectly quantified by determining the amount of NADPH consumed in the process from fluorescence intensity.

We modeled to describe the dynamics of the detection system and gave a full version of the calibration curve which can be used in the degradation system.

The whole reactions in the detection system are as following:

$$glyphosate\xrightarrow{{GOX}}AMPA$$ $$glyoxylic\mathop {}\limits_{}^{} acid + NADPH\xrightarrow{{GRHPR}}glycolic\mathop {}\limits_{}^{} acid + NAD{P^ + }$$ $$iNAP + NADPH\xrightarrow[{}]{{}}iNAP - NADPH$$ $$glyphosate\xrightarrow{{{k_1}}}\{ \}$$ $$AMPA\xrightarrow{{{k_2}}}\{ \}$$ $$glyoxylic\mathop {}\limits_{}^{} acid\xrightarrow{{{k_3}}}\{ \}$$ $$NADPH\xrightarrow{{{k_4}}}\{ \}$$ $$glycolic\mathop {}\limits_{}^{} acid\xrightarrow{{{k_5}}}\{ \}$$ $$NAD{P^ + }\xrightarrow{{{k_6}}}\{ \}$$ $$iNAP\xrightarrow{{{k_7}}}\{ \}$$ $$iNAP - NADPH\xrightarrow{{{k_8}}}\{ \}$$

And those reactions can be described by the following ODE equations:

$$\frac{{d[glyphosate]}}{{dt}} = - \frac{{{v_{gox}}[glyphosate]}}{{{K_{m(gox)}} + [glyphosate]}} - {k_1}[glyphosate]$$ $$\frac{{d[AMPA]}}{{dt}} = - \frac{{{v_{gox}}[glyphosate]}}{{{K_{m(gox)}} + [glyphosate]}} - {k_2}[AMPA]$$ $$\begin{array}{l} \frac{{d[glyoxylic\mathop {}\limits^{} acid]}}{{dt}} = - \frac{{{v_{gox}}[glyphosate]}}{{{K_{m(gox)}} + [glyphosate]}} - {k_3}[glyoxylic\mathop {}\limits_{} acid]\\ - \frac{{{\rm{V}}[NADPH][Glyoxylic\mathop {}\limits^{} acid]}}{{{K_{m(Glyoxylic\mathop {}\limits_{} acid)}}[NADPH] + {K_{m(NADPH)}}[Glyoxylic\mathop {}\limits_{} acid] + {K_{ia}}{K_{m(Glyoxylic\mathop {}\limits_{} acid)}} + [NADPH][Glyoxylic\mathop {}\limits_{} acid]}} \end{array}$$ $$\frac{{d[NADPH]}}{{dt}}{\rm{ = }}\frac{{{\rm{V}}[NADPH][Glyoxylic\mathop {}\limits^{} acid]}}{{{K_{m(Glyoxylic\mathop {}\limits_{} acid)}}[NADPH] + {K_{m(NADPH)}}[Glyoxylic\mathop {}\limits_{} acid] + {K_{ia}}{K_{m(Glyoxylic\mathop {}\limits_{} acid)}} + [NADPH][Glyoxylic\mathop {}\limits_{} acid]}} - {k_4}[NADPH]$$ $$\begin{array}{l} \frac{{d[glycolic\mathop {}\limits^{} acid]}}{{dt}}{\rm{ = }}\frac{{{\rm{V}}[NADPH][Glyoxylic\mathop {}\limits^{} acid]}}{{{K_{m(Glyoxylic\mathop {}\limits_{} acid)}}[NADPH] + {K_{m(NADPH)}}[Glyoxylic\mathop {}\limits_{} acid] + {K_{ia}}{K_{m(Glyoxylic\mathop {}\limits_{} acid)}} + [NADPH][Glyoxylic\mathop {}\limits_{} acid]}}\\ - {k_5}[glycolic\mathop {}\limits_{}^{} acid] \end{array}$$ $$\frac{{d[NAD{P^ + }]}}{{dt}}{\rm{ = }}\frac{{{\rm{V}}[NADPH][Glyoxylic\mathop {}\limits^{} acid]}}{{{K_{m(Glyoxylic\mathop {}\limits_{} acid)}}[NADPH] + {K_{m(NADPH)}}[Glyoxylic\mathop {}\limits_{} acid] + {K_{ia}}{K_{m(Glyoxylic\mathop {}\limits_{} acid)}} + [NADPH][Glyoxylic\mathop {}\limits_{} acid]}} - {k_6}[NAD{P^ + }]$$ $$\frac{{d[iNAP]}}{{dt}} = - {k_i}[iNAP][NADPH] + {k_{ - i}}[iNAP - NADPH] - {k_7}[iNAP]$$ $$\frac{{d[iNAP]}}{{dt}} = {k_i}[iNAP][NADPH] - {k_{ - i}}[iNAP - NADPH] - {k_8}[iNAP - NADPH]$$

The reaction between iNAP and NADPH is real fast so we used the rapid equilibrium assumption. After fitting with experimental data, we found our ODE equations can describe the system well as you can see in Fig.1.

Fig. 2 The comparison between experimental data and simulation data

Taking the fluorescence in 3960 seconds as a mark, the calibration curve can be calculated by running the different initial concentrations of glyphosate. The calibration curve is shown in Fig. 3.

Fig. 3 The calibration curve in the detection system (the fluorescence in 3960 seconds as a mark)

From the Fig.3 we can get the information that the detection range of our system is from 0 to 0.5 mM.

Modeling of PhnJ was aimed in search of optimized enzymatic protein to achieve the idealized function of PhnJ after being transformed into E.coli. The following processes and methods led us to explore the optimal solution.

# 2.1 An Top-down strategy : begin with the holoenzyme's view

Before we went deeply into the mechanism of PhnJ, we first tried to understand the holoenzyme—PhnGHIJK, where the PhnJ is assembled after translation.

PhnGHIJ, the C-P lyase core complex, contains four protein subunits, in which PhnJ is the core enzyme, catalyzing the lysis of C-P bond in glyphosate and methyl phosphonate as a C-P lyase. PhnM release the pyrophosphate and PhnJ cleaves the C-P bond and PhnP and PhnN convert it to PRPP. PhnK associates with PhnGHIJ as the fifth protein resembles ABC cassette proteins with unclear stoichiometry.PhnI is a nucleosidase to deglycosylate ATP and GTP into ribose 5-triphosphate. PhnGHL supports the reaction of the C-P lyase pathway in the place of PhnI to transfer the phosphonate to the ATP position in the ribose C-1 position to displace adenine and generate the reaction intermediate ribose 5-triphosphate alkyl phosphonate.

Fig. 4 The reaction carried out by PhnGHIJ

“The C-P lyase core complex resembles the letter 'H' with rounded arms that are twisted approximately 45° in and out of the plane with respect to each other." PhnG and PhnJH compose the two arms of “H”, while two PhnI touch and stretch over the two arms in the middle, as the center of C-P lyase complex.

Fig. 5 The 3D structure of PhnGHIJ(PDB ID:4xb6)

PhnI is comprised with a four-stranded, antiparallel beta-sheet next to a four-helix bundle with two helical extensions in both C- and N- termini to grasp and tether PhnJ by extensive interaction. Furthermore, PhnJ connects and associates with PhnH through the packing of two conserved alpha-helices in both proteins. The interaction of PhnHJ heterodimer is similar to the subunits structure of PhnH homodimer.

Fig. 6 The functional structure of PhnGHIJ(PDB ID:4xb6)

Just as described above, the collaborations between PhnG, PhnH, PhnI, PhnJ, PhnK are important so we suppose that increasing the binding affinity between the five proteins may contribute to the performance of Glyphosate degradation.

For the PhnJ (exogenous), our work mainly focused on getting the structure by homologous modeling.

The gene sequence of PhnJ in Enterobacter cloacae K7 was reported to be highly conserved and high identity (99%) to the phnJ sequence in Enterobacter cloacae strain EcWSU1. Therefore, we used codon-optimization to the preference of E. coli and submitted the gene sequence in the website of SWISS-MODEL (https://swissmodel.expasy.org/), filtrated and obtained the optimal homologous protein model of PhnJ (exogenous).

Fig. 7 The structure of PhnJ (exogenous) predicted by SWISS-MODEL

In order to achieve a more precise and accurate result of the PhnJ protein model, we implemented MOE to build another homologous model with different algorithms.

Fig. 8 The structure of PhnJ (exogenous) predicted by MOE

Plus, MOE offers geometry analysis of protein model, which brings much convenience to our model assessment and evaluation.

Fig. 9 The geometry analysis of PhnJ (exogenous) predicted by MOE

Initially, we assessed the SWISS-MODEL result of PhnJ in Molprobity (http://molprobity.biochem.duke.edu/), with the function of “Analyze geometry without all-atom contacts”. It was reported that the model possesses numbers of clashes, outlier rotamers, and CaBlam outliers.

Fig. 10 Analyze geometry without all-atom contacts for PhnJ (exogenous) predicted by SWISS-Model

Thus, we built another homologous PhnJ model in software MOE and achieved geometry analysis within the software. Moreover, with the basic understanding of the C-P lyase complex in the aspect of structure and protein interaction, we designed several mutations in PhnJ and transformed them into the fittest configuration of each mutated amino acid. Basically, the interface of PhnJ to PhnH and PhnI was the essence of mutation. For instance, the hydrophilic lysine in the PhnJH interface was mutated into hydrophobic phenylalanine to achieve a strengthened hydrophobic interaction and diminish the surface water inside. So we tried to choose the following three mutations, including R21M, which have mutate the 21st amino acid from R to M, P45Q, and T16S & R40Y which have two sites mutated.

The mutation was homologous modeling by Swiss-Model, and the mutated site can be seen in Fig. 11.

Fig. 11 The mutation site of PhnJ designed with the top-down strategy

Protein-Protein docking was carried out by the ClusPro server and the binding energy can be seen in table.1.

Table.1 The binding energy between PhnJ and the rest part of PhnGHIJ

Endogenous R21M T16S & R40Y P45Q
Center -1299.0 -1821.9 -2078.1 -1739.4
Lowest -3913.7 -4023.7 -4044.8 -4001.6

# 2.2 An Bottom-up strategy : begin with the PhnJ's view

Only obtaining the higher binding affinity can’t guarantee that the mutated protein will have better performance to degradation the glyphosate. We should go deeper into the reaction mechanism of PhnJ to check whether the mutated sites leads to a higher speed of reaction or just inactive the enzyme.

# 2.2.1 The reaction mechanism of PhnJ

The reaction mechanism of PhnJ can divided into the following six-step.

Step1. The initial formation of the 5' deoxyadenosine radical. The mechanism is assumed to be identical to other Radical SAM type proteins.

Step2. The adenosyl radical abstracts a hydrogen atom from the catalytic glycine residue. The products of this activation step represent the ground state of this enzyme.

Step3. The glycyl radical abstracts a hydrogen atom from the catalytic cysteine residue.

Step4. The cysteinyl residue attacks the phosphate group, forming a pentavalent intermediate.

Step5. The pentavalent intermediate collapses, eliminating a methyl radical with the concomitant abstraction of a hydrogen atom from the glycine residue.

Step6. One of the hydroxyl groups of the ribose intermediate initiates a nucleophilic attack on the covalently bound phosphate to regenerate the cysteine residue and final product. The enzyme is now in a state that it can perform another round of catalysis.

Besides the reaction mechanism, some super-distance hydrogen atom transfer are worth to study, like Step2 and Step3. The previous research has shown that the hydrogen atom transfer can be carried by the hydrogen bond network, some proton channel, or protein re-folding. While for PhnJ itself, the super-distance hydrogen atom transfer just happened between the binding site and catalytic site which means that the substrate may be transported inside or outside the protein. But for the most common situation in such a complex system, those mechanisms mentioned above are likely to work together to finish hydrogen atom transfer.

Fig. 12 The distance between Gly32 and Cys262 in PhnJ

So here we have the following three parts to evaluate the state of hydrogen atom transfer in a mutation which is operated in three steps in reaction mechanism including Step2, Step3, and Step5.

# 2.2.2 Finding the adenosyl radical transforms path in Step2，an general method combining Monte Carlo sampling and reinforcement learning is created and concluded.

In the step2 of the reaction mechanism, the first super-distance hydrogen atom transfer takes place in PhnJ’s reaction coordinates. After the initial formation of the 5`deoxyadenosine radical, the adenosyl radical abstracts a hydrogen atom from the catalytic glycine residue (Gly32). But the distance between adenosyl radical and Gly32 is too far to finish the direct transfer without moving adenosyl radical. To find the possible hydrogen atom transfer pathways in Step2, we propose an algorithm call F2ARTP (Finding the adenosyl radical transfer path) to calculate the path.

The F2ARTP algorithm can be splited into two parts: the Monte Carlo sampling which will gives a score function to evaluate the possible docking position of the substrate using the Gibson binding energy and, reinforcement learning which will find the path that satisfies the minimized requirements of total reaction energy and moving distance. The workflow of F2ARTP can be seen in Fig. 13.

Fig. 13 The F2ARTP algorithm

The Monte Carlo Sampling part is used to collect the sample which can be used to integrate the scoring functions that are used to evaluate the possible docking position of the substrate and work as a part of the return function in Deep Q-Learning. It can be split into the following parts:

1. Pre-processing the protein structure for molecular dynamics. Adding missing hydrogen atoms and convert non-standard residues to their standard equivalents to make the PDB structure suitable for performing the molecular dynamics.

2. Using the classical molecular dynamics to collect the possible flexible structural sample which can be used for molecular docking.

3. Pick a reasonable number of frames from the results of MD and use it as accepter while perform molecular docking to find all the possible binding sites for substrate around the protein and calculate their Gibson binding energy.

4. All steps here using the hypothesis that the protein is the flexible part while the substrate is the rigid part because we need to identify the position of the substrate using the only one coordinates in cartesian coordinates, which is a convenience for us to integrating the result of molecular docking. In the step, we should find a mapping between the possible docking position and its binding energy. We can choose the table function or neural network.

The two key methods: molecular dynamics and molecular docking is described as following.

Molecular dynamics (MD) is a computer simulation method for studying the physical movements of atoms and molecules. The atoms and molecules are allowing to interact for a fixed period of time, giving a view of the dynamic evolution of the system. In the most common version, the trajectories of atoms and molecules are determined by numerically solving Newton's equations of motion for a system of interacting particles, whose forces between the particles and their potential energies are often calculated using interatomic potentials or molecular mechanics force fields.

To a system which consists of /N/ molecule or atoms, the total energy of a system includes kinetic energy and potential energy，which can be describe by the formula below:

$$E = {E_{kin}} + U$$

where ${E_{kin}}$ donates the kinetic energy and $U$ donates the potential energy.

In a molecule system, the total potential energy can be calculated by adding the ${U_{nb}}$ bond stretching potentials energy ${U_b}$ angle bending potentials energy ${U_\theta }$ torsion angle potentials energy ${U_\phi }$ out-of-plane potentials energy ${U_\chi }$ and some other cross effect ${U_{cross}}$ together，which also can be describe by the formula below:

$$U = {U_{nb}} + {U_b} + {U_\theta } + {U_\phi } + {U_\chi } + {U_{cross}}$$

This formula above also called the force field in molecular dynamics’ theory. There are many force field in the world that based on the statistical thermodynamics and empirical result. In the research of the protein and nucleic acid, the Amber force field is one of the best force field in the world. So we choose Amber as our force field and the formula of this field show below:

$\begin{array}{l} E = \sum\limits_{bond} {{K_b}{{({r_{ij}} - {r_0})}^2}} + \sum\limits_{angle} {{K_\theta }{{(\theta - {\theta _0})}^2}} + \sum\limits_{dihedral} {\frac{{{K_\phi }[1 + \cos (n\phi - {\phi _0})]}}{2}} \\ + \sum\limits_{impr} {\frac{{{K_\chi }[1 + \cos (n\chi - {\chi _0})]}}{2}} + \sum\limits_{nobond} {{\varepsilon _{ij}}\left[ {{{\left( {\frac{{R_{ij}^0}}{{{R_{ij}}}}} \right)}^{12}} - 2{{\left( {\frac{{R_{ij}^0}}{{{R_{ij}}}}} \right)}^6}} \right]} + \sum\limits_{nobond} {\frac{{{q_i}{q_j}}}{{{R_{ij}}}}} \end{array}$

The items in the formula refers to bond stretching term、angle bending potentials、dihedral angle potentials、put of plane angle potentials、improper dihedral angle potentials、Van Der Waals interaction and Coulombic interaction terms in order.

Now considering the system which contains of consists of $N$ molecule or atoms in the classic mechanics, the atom or molecule can be characterized as follow:

The position of atom $i$ is ${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over r} _i}$ , the speed of atom $i$ is ${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over v} _i}$ , the acceleration of atom $i$ is ${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over a} _i} = \frac{d}{{dt}}{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over v} _i} = \frac{{{d^2}}}{{d{t^2}}}{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over r} _i}$ , $i = 1,2 \cdots N$ .

After the integral operation, we can get two formula of ${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over v} _i}$ and ${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over r} _i}$ :

$${\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over v} _i} = {\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over v} _i}^0 + {\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over a} _i}t$$ $${\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over r} _i} = {\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over r} _i}^0 + {\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over v} _i}^0 + \frac{1}{2}{\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over a} _i}{t^2}$$

where the ${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over v} _i}^0$ refers to initial speed and the ${\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over r} _i}^0$ refers to initial

According to the classic mechanics, the force applied to atoms is the negative gradient of potential energy:

$${\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over F} _i} = - {\nabla _i}U = - \left( {\frac{\partial }{{\partial x}}\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over i} + \frac{\partial }{{\partial y}}\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over j} + \frac{\partial }{{\partial z}}\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over k} } \right)U$$

Due to Newton’s second law, the acceleration of atom $i$ can also be described as:

$${\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over a} _i} = \frac{{{{\mathord{\buildrel{\lower3pt\hbox{\scriptscriptstyle\rightharpoonup}} \over F} }_i}}}{{{m_i}}}$$

Molecular docking is a key tool in structural molecular biology. The goal of ligand-protein and protein-protein docking is to predict the predominant binding modes of a ligand (include molecules, ions, and proteins) with a protein of known three-dimensional structure. Successful docking methods search high-dimensional spaces effectively and use a scoring function that correctly ranks candidate dockings. Docking can be used to perform virtual screening on large libraries of compounds, rank the results, and propose structural hypotheses of how the ligands inhibit the target, which is invaluable in lead optimization. The setting up of the input structures for the docking is just as important as the docking itself, and analyzing the results of stochastic search methods can sometimes be unclear.

The reinforcement learning part is used to identify the path that satisfies both of the minimization of total reaction energy and moving distance.

Reinforcement learning is the training of machine learning models to outcome a sequence of decisions. The agent learns to achieve a goal in an uncertain and potentially complex environment. In reinforcement learning, artificial intelligence faces a game-like situation. The computer employs trial and error to come up with a solution to the problem. To satisfy the program's demand by the machine, artificial intelligence gets either rewards or penalties for the actions it performs. Its goal is to maximize the total reward.

In this part, we used the Deep Q-Learning showing in Fig. 14 Comparing to Q-Learning, Deep Q-Learning used the neural network instead of Q-table to get the better result, which was proved to be better in playing Atari games. And in recent years, with the development of AlphaFold, Reinforcement learning is applied in biological design much more frequently.

Fig. 14 The Deep Q-Learning algorithm

As the algorithm workflow mentioned above, we treated the problem that Finding the adenosyl radical transforms path as a semi- Atari game. For this game, we needed to define the Target-Network and Reward used in the algorithm.

For the Target-Network, we used an integrated network including an ANN which taking two kinds of input: the position of agents and the distance between the destination and real-time position, and a 3-D Convolutional Neural Network which taking a 3D image record the score function’ values which besides agents (cut-off distance is 2A using a cube).

For the reword, before we got the destination, the reward would be the following formula:

$$R{\rm{ = }}\frac{{\Delta \Delta G}}{{\Delta d}}$$

Which was used to calculate as energy barrier per unit between two steps.

And after we got the destination, the reward would be the following formula:

$$R{\rm{ = }}\frac{{\sum {\Delta \Delta G} }}{{total\_seep}}$$

Which was used to calculate the energy used per step in the whole game session.

As for the PhnJ which we wanted to design, there will be more specific things in this F2ARTP algorithm. Also in the following paragraphs, we will mention some results about it.

PhnJ and its mutation's PDB file can get in 2.1 while the structure and the molecular forcefield of adenosyl radical can be generated by LigParGen and all structures are balanced by Avagadro with MMF868 forcefield. And finally, molecular dynamics was carried out by OpenMM, and DQN was coded with TensorFlow.

The First result we wanted to post here is the docking result which can be seen in Fig. 15.

Fig. 15 All the docking position in the paths detected by the PatchDock algorithm

When integrating the scoring function, we should notice that PhnJ is a part of PhnGHIJK, which means that there are some interface regions which are impossible to be regarded as docking region as you can see in Fig. 16, We should delete them all.

Fig. 16 The interfacal region between PhnJ and other proteins in PhnGHIJK

And finally the all possible paths calculated by F2ARTP can be seen in the following video.

And we took three scrennshouts(Fig. 17 to Fig. 19) by sequential and drawn the paths we found to discuss it.

Fig. 17 The first screenshot of the video

Fig. 18 The second screenshot of the video

Fig. 19 The third screenshot of the video

The F2ARTP algorithm finally found three paths: two of them can directly reach the Gly32 (the destination) with different paths while the other path will drop into a strange energy local minimization points which can't be escaped. Four enzymes we used in the F2ARTP algorithm including endogenous PhnJ, R21M, T16S&R40Y, P45Q belongs to different paths: the P45Q's agent will drop into the energy local minimization points (the path in blue) while the R21M and endogenous PhnJ goes in the same way (the path in red). The T16S&R40Y uses the path in yellow to reach the Gly32 .

Finally, the total energy barrier and the total step can be seen in the Table.2.

Table.2 Total energy barrier and the total steps for PhnJ and its mutation

Endogenous R21M T16S & R40Y P45Q
Steps 562 485 681 942
Total Energy Barrier 1324.49 1624.88 1224.36 1985.5

# 2.2.3 Detect the hydrogen atom transfer in Step3 using molecular dynamics

As the second super-distance hydrogen atom transfer in PhnJ’s reaction coordinates. the glycyl radical abstracts a hydrogen atom from the catalytic cysteine residue. But the distance between Gly32 and Cys272 is also large than any chemical bound so we need to find a way to detect the hydrogen atom transfer.

To calculate the hydrogen atom path, the QM or QM/MM will be the most perfect method but our system is too large for QM and the distance between the destination(Gly32) and beginning palace(Cys272) is too far to set the QM region. So we finally chosen the classical molecular dynamics we mentioned in 2.2.2 to detect the hydrogen atom transform by evaluating the dynamic distance between Gly32 and Cys272.

For the MD structure preparation, we drawn 3,5'-phospho-α-d-ribosyl1'-(N-phosphonomethylglycine) in ChemDraw and got the smiles representation. After putting it in the LigPraGen web service, we could simply get the PDB files and it's forcefield. Then the target structure will be relaxed by Avagadro with MMF868 forcefield. After that we dock this small molecular with PhnJ in PatchDock web service and perform MD with OpenMM.

We used the distance between Cys272-S and Gly32-R as a mark, and the result can be seen in Fig. 20.

Fig. 20 The dynamics distance between Cys272-S and Gly32-R

The data was not right or maybe in this step, the protein re-folding was not the right method to finish hydrogen atom transfer. It was still unsure for us.

# 2.2.4 Explore the breaking of C-P bonds in Step5 using QM/MM

The PhnJ belongs to a family called C-P lysis, which can break the C-P bound, like the step5 described in 2.2.1. The pentavalent intermediate collapses, eliminating a methyl radical with the concomitant abstraction of a hydrogen atom from the glycine residue. We wanted to simulate this by QM/MM method.

The QM/MM method is a balance between QM and MD, which can simulate the break of chemically bound that beyond the MD while guarantee that the time cost is acceptable than QM. The simulation system in QM/MM will be split into three parts, the MM region, the QM region, and the interface. QM region is calculated with the principles of quantum mechanics while MM with classical mechanics. The interface is often set with the hydrogen atom(Link atom) which can be easy to calculate.

QM Calculations are the pinnacle of precision compared to all other methods used for calculating molecular systems, but this precision comes with very high computational costs. All QM calculation tries to solve the Schroedinger Equation: $$E = \widehat H\psi$$

We can solve the equation by calculating the Hamiltonian of the system. The Hamiltonian is an operator that if applied to the wavefunction of a system results in the same wavefunction times its energy. QM calculations are characterized by two important things: the functional and the basis set used. The functional is characterized by the way the Hamiltonian is calculated. For all calculations that we do, we also need a starting point to describe orbitals or the density function and the basis set describes the number and type of the orbital functions that we use to calculate the system. Born Oppenheimer Approximation A necessary assumption to use QM methods is the Born Oppenheimer Approximation. This is the approximation that due to the vast difference in speed between the electrons and the nucleus of the atoms the movement of both can be investigated apart from each other. For QM Methods this means that we only look at the movement of electrons and neglect the very slow movements of the nuclei when solving the Schroedinger Equation.

There are two fundamentally different types of functionals, wave function-based, and density function-based methods. We only used density function-based methods. Density Function Theory (DFT) is centered not on the wavefunction but the square of the wave function, i.e. the electron density. It is based on the theorem of Hohenberg and Kohn and modern DFT is also based on the Kohn Sham approach. The electron density is a measurable quantity that is dependent on the cartesian coordinates of the electrons. As a further development of this approximation, Gradient corrected methods (GGA) have been invented. These methods consider the electron density not to be uniform.

There are multiple terms in the Hamiltonian that have to be calculated. The most troublesome one is the so-called "exchange-correlation" term. There is a class of DFT functionals, the so-called Hybrid Functionals, that use a combination of exchange-correlation terms of different methods. The method we used, B3LYP, uses a combination of LDA, GGA, and Hartree Fock (A wavefunction based method) exchange-correlation.

Now that we know which method we use, we can take a look at the second important characteristic of our QM calculations, the basis set. All methods described previously, even if based on DFT, use Orbitals in their calculations. By combining a certain number of orbitals, the molecular orbitals of the system have to be expressed. This starting number of orbitals is called the basis set. The more orbitals we have the more accurate the calculations become, but they also get more demanding.

In our system, the C-P bond with its group and Gly32-R set as QM region, and the rest part set as the MM region. The interface is C-P bound and Gly32-R bound, which is broken and replace with link atom. QM/MM use PyChemShell to couple two software: GLUP for MD and ORCA for QM.

But unfortunately, due to the time and computer capability’s limitation, the QM/MM simulation is still running when the time Wiki Freeze.

# 2.5 Result Discussion

All the mutation was tested and the result is shown in Fig. 21.

Fig. 21 Relationship between concentration of glyphosate and culture time

Two mutations, R21M and T16S&R40Y, were successful in circuit construct and the total glyphosate was measured with our detection system. We could see that all the mutations are weak than the original enzyme. But when silencing the original enzyme in E.coil, the degradation efficiency was significantly increasing which indicated that the exogenous PhnJ and endogenous PhnJ is a competition relationship when they're assembling into the holoenzyme PhnGHIJ. Think more about this question, exactly we will not get the real PhnJ mutation's efficiency unless silencing the endogenous PhnJ in the experiments.

Go back to our model, we tried to design with the top-down strategy while checking with the bottom-up strategy. Even our mutation was confirmed as the higher binding infinity but they seem that didn't won in the competition with the endogenous PhnJ. But when checking with the bottom-up strategy, our F2ARTP algorithm gave the prediction that T16S&R40Y will almost have the same path as endogenous PhnJ, but also predicted that the R21M has the same way as the endogenous PhnJ. It was a little different compared to the experimental data but it may due to the lack of data or the limit sampling number. While the classical molecular dynamics is running in the opposite direction, but it will be reasonable if the step3 is not the limitation step for the enzyme and we are looking forward to the QM/MM's result.

3.Kill Switch

As it is mentioned in the project design page, we have created two kinds of kill switches which are used in both detection and degradation system for the biosafety. It should obtain the expectation that our bacteria for degradation only survive when formaldehyde presents in the system and for detection only survives when Arabinose presents in the detection system. In Fig22 and Fig. 23, there are no differences in the two gene circuits except the promoters：pBAD promoter in the detection system while the formaldehyde promoter in the degradation system.

Fig. 22 The kill switch in the degradation system

Fig. 23 The kill switch in the detection system

However, we found that formaldehyde itself also has a toxic effect on the cell, which means that whether the concentration of formaldehyde is too high or too low, there’s a high chance that our bacteria would die. Therefore, figuring out the optimum concentration of formaldehyde only by doing some experiments will be time-consuming and almost impossible. Hence, it’s necessary to build a model to illustrate the relationship between the concentration of formaldehyde and the survival of cells. Eventually, we can find the most suitable concentration of formaldehyde for the engineered bacteria and the best concentration of glyphosate as well. Besides, it is also beneficial for large-scale industrialization in the future. Now we take the circuit in Fig. 22 as an example to introduce the ODEs model for the kill switch.

The circuit in Fig. 22 contains the formaldehyde promoter, the inverter (including cI lam protein), RBS, and the toxin mazF. To begin with, we found related parameters of those biobricks on the Part Page and read some articles and papers about modeling.

1.The relationship between the concentration of formaldehyde and the expression of cI lam protein

1) Molecular(formaldhyde)

$$\frac{d[formaldhyde]}{dt}=\mathrm{-Degradation}$$

2) P-formaldehyde promoter（DNA→mRNA）：

$$\frac{{d\left[ {mRN{A_{P - formaldehyde}}} \right]}}{{dt}} = \frac{{\left( {a - \xi } \right)\cdot{{\left[ {formaldhyde} \right]}^n}}}{{{K_M}^n + {{\left[ {formaldhyde} \right]}^n}}} + \xi - {D_{mRNA}}\cdot\left[ {mRN{A_P}} \right]$$

3) Peptide of B0034 after P-formaldehyde（mRNA→peptide）

$$\frac{{d\left[ {peptid{e_{P - formaldehyde - RBS}}} \right]}}{{dt}} = RBS\;Strength\cdot\left[ {mRN{A_{P - formaldehyde}}} \right] - {D_{protein}}\cdot\left[ {peptide} \right]$$

4) cl lam protein（peptide→protein）

$$\frac{{d\left[ {cl\;lam\;protein} \right]}}{{dt}} = \left( {Peptide\;Maturation\;Ratio} \right) \times {\rm{\;}}\left[ {peptide} \right]$$

Since it’s unlikely to obtain some of the parameters in those three functions above, we integrated them into one:

2.The relationship between the expression of cI lam protein and the expression of toxin mazF protein

1) P-cl lam promoter（DNA→mRNA）：

$$\frac{{d\left[ {mRN{A_{P - cl\;lam}}} \right]}}{{dt}} = \frac{{\left( {a - \xi } \right)}}{{1 + {{\left[ {\frac{{cl\;lam\;protein}}{{{K_M}}}} \right]}^n}}} + \xi - {D_{mRNA}}\cdot\left[ {mRN{A_{P - cl\;lam}}} \right]$$

2) Peptide of B0034 after P-cl lam（mRNA→peptide）

$$\frac{{d\left[ {peptid{e_{P - cl\;lam - RBS}}} \right]}}{{dt}} = RBS\;Strength\cdot\left[ {mRN{A_{P - cl\;lam}}} \right] - {D_{protein}}\cdot[peptide]$$

3) mazF（peptide→protein）

$$\frac{{d\left[ {mazF} \right]}}{{dt}} = \left( {Peptide\;Maturation\;Ratio} \right) \cdot \left[ {peptide} \right]$$

Significantly, the DmRNA, Dpeptide, R, V0, Vmax, KM and n are different in different part.

3.The relationship between the expression of toxin and the OD600

To stimulate the toxic effect of mazF to our bacteria, we introduced following functions:

$$\frac{{dO{D_{600}}}}{{dt}} = O{D_{600}} \cdot {\mu _{\max }} \cdot (1 - \frac{{O{D_{600}}}}{{O{D_{600}}_{\max } \cdot \frac{{{k_{mazF}}}}{{{k_{mazF}} + [mazF]}}}})$$ $$O{D_{600}} = \sqrt {\frac{{CFU}}{k}}$$

By our experimental data, we firstly get the relationship between OD600 and CFU which following the formula below:

$$O{D_{600}} = \sqrt {\frac{{CFU}}{486}}$$

Comparison of the formula above and experimental data can be seen in Fig. 24.

Fig. 24 The relationship between OD600 and CFU

Using this formula, we can transform the CFU we measured into OD600 and fit data with the ODEs model we build. The result is quite well in the Fig. 25 comparing to the experimental data.

Fig. 25 The simulation data and experimental data in kill switch system with pBAD promoter

Unfortunately, the circuit in Fig. 22 failed, but by calculating all the pBAD-related parameters' sensitivity of OD600, we found that it's small enough as you can see in Fig. 26 which means that even we change the promoter, the result will keep stable. So we have proved the feasibility of the kill switch both in detection and degradation system.

Fig. 26 all the pBAD-related parameters' sensitivity of OD600

Reference

1 Pan, L. et al. Aldo-keto Reductase Metabolizes Glyphosate and Confers Glyphosate Resistance in Echinochloa colona. Plant Physiol 181, 1519-1534, doi:10.1104/pp.19.00979 (2019).
2 Rumsby, G. & Cregeen, D. P. Identification and expression of a cDNA for human hydroxypyruvate/glyoxylate reductase. Biochimica et Biophysica Acta (BBA) - Gene Structure and Expression 1446, 383-388, doi:https://doi.org/10.1016/S0167-4781(99)00105-0 (1999).
3 Tao, R. et al. Genetically encoded fluorescent sensors reveal dynamic regulation of NADPH metabolism. Nat Methods 14, 720-728, doi:10.1038/nmeth.4306 (2017).
4 Zhao, Y. et al. SoNar, a Highly Responsive NAD+/NADH Sensor, Allows High-Throughput Metabolic Screening of Anti-tumor Agents. Cell Metabolism 21, 777-789, doi:10.1016/j.cmet.2015.04.009 (2015).
5 van Bloois, E., Winter, R. T., Kolmar, H. & Fraaije, M. W. Decorating microbes: surface display of proteins on Escherichia coli. Trends Biotechnol 29, 79-86, doi:10.1016/j.tibtech.2010.11.003 (2011).
6 Fu, G. M. et al. Pathway and rate-limiting step of glyphosate degradation by Aspergillus oryzae A-F02. Prep Biochem Biotechnol 47, 782-788, doi:10.1080/10826068.2017.1342260 (2017).
7 Hove-Jensen, B., Zechel, D. L. & Jochimsen, B. Utilization of glyphosate as phosphate source: biochemistry and genetics of bacterial carbon-phosphorus lyase. Microbiol Mol Biol Rev 78, 176-197, doi:10.1128/MMBR.00040-13 (2014).
8 Singh, S. et al. Glyphosate uptake, translocation, resistance emergence in crops, analytical monitoring, toxicity and degradation: a review. Environmental Chemistry Letters 18, 663-702, doi:10.1007/s10311-020-00969-z (2020).
9 Kryuchkova, Y. V. et al. Isolation, and characterization of a glyphosate-degrading rhizosphere strain, Enterobacter cloacae K7. Microbiological Research 169, 99-105, doi:https://doi.org/10.1016/j.micres.2013.03.002 (2014).
10 Errey, J. C. & Blanchard, J. S. Functional Annotation and Kinetic Characterization of PhnO from Salmonella enterica. Biochemistry 45, 3033-3039, doi:10.1021/bi052297p (2006).
11 Chan, C. T., Lee, J. W., Cameron, D. E., Bashor, C. J. & Collins, J. J. 'Deadman' and 'Passcode' microbial kill switches for bacterial containment. Nat Chem Biol 12, 82-86, doi:10.1038/nchembio.1979 (2016).