The model part in iGEM provides us an opportunity to attempt some
interdisciplinary methods, including computer science, chemistry, physics, math and so on. One of
the most important features of modeling is that we can design experiments reasonably and have a
better knowledge of some quantitative features of the biological system after modeling.
Interestingly, models can also help us to discover something novel and achieve goals that can’t be
fulfilled with only traditional experiments.
In this year, kinetics models are utilized to help to design our
experiments and estimate results for some tasks that can’t be completed in the wet lab. We also use
a cell programming language to visualize our system and act as a complement of kinetics models.
Furthermore, bioinformatics and structural biology are combined to help us have a better knowledge
of our enzymes and provide a firm grounding for protein design for us in phase 2. Now see texts
below for detailed information.
Q: What is your inspiration of building this model?
A: When we firstly thought about what may be the target of our
models, we put forward building a quorum dynamics model at once. Because we wanted to get an
overview of the quorum sensing system, to see how it works and the effects our quenching enzymes can
have on it. And it may also show the connection between the two quorum sensing systems and help us
find the enzymes with the best overall inhibition of the two systems. So after brainstorm we decided
to make it.
Q: How do you want to improve this model in the future?
A: We simplify the quorum sensing systems of the Pseudomonas
aeruginosa in this model a lot so we really want to take more factors into consideration in the
future to make it more similar to the real situation. And as the results don’t show the connection
between the two quorum sensing systems well, we hope to find more accurate parameters or conduct
more experiments for data to modify the present model.
Q: Why did you use this model to simulate biofilms?
A: The biofilm model we established is based on the diffusion
equation and the law of conservation of mass, which has certain theoretical basis and can simulate
the growth of biofilm well.
Q: What are the advantages and disadvantages of biofilm model?
A: Since the model has clear physical and biological significance,
it can make the growth of biofilm easier to understand. At the same time, using the form of partial
differential equation, the dimension and the condition of a definite solution can be changed to meet
the needs of different cases. But because of the complexity of the form of the partial differential
equation, it is difficult to solve the model.
Q: How do you want to improve this model in the future?
A: We want that the partial differential equation of the biofilm
model can be further simplified without changing the original significance, so as to make the
solution more convenient.
Connection with wet lab:
The quorum dynamics model helps us analyze the data we got from the experiments with quorum
quenching enzymes. In this way, we can rapidly find the effects of different enzymes with different
Km values towards the two AHLs have on the pathogens’quorum sensing systems. And then we can choose
a better enzyme to inhibit the toxicity of the pathogens.
The biofilm model can help us simulate the growth process of biofilm, so as to predict the growth of
biofilm and the influence of quorum sensing on it without doing wet experiments. In this way, it may
provide guidance the design of wet lab.
To prove that enzymes from our engineering probiotics can actually
decrease the AHLs concentration in the pathogens effectively and help us see the effects of
using different enzymes directly (search for better enzymes), we build a quorum dynamics model
composed of ODEs to simulate the process of quorum sensing and quenching in Pseudomonas
aeruginosa. And we analyze the model with MATLAB. Although with some problems existing, this
model still successfully shows that the enzymes can make a big influence of the AHLs in the
pathogens that can inhibit the toxicity of them.
We start with these assumptions:
1. We assume that protein turnover is slower than the degradation of mRNA and that there is no
significant post-transcriptional regulation;
2. It is also assumed that there is no shortage of substrate for autoinducer synthesis. As a
consequence, there is no need to explicitly model the biosynthesis of OdDHL and BHL by the LasI
and RhlI synthases;
3. Production of LasR, RhlR, OdDHL, and BHL are assumed to follow Michaelis- Menten
kinetics;
4. The OdDHL and BHL’s transmembrane diffusion coefficients of engineered bacteria are the same
as those of pseudomonas aeruginosa (, );
5. Spontaneous degradation and background synthesis of non-swarm sensitive factors in
extracellular environment and engineered bacteria;
6. Group quenching enzyme is Mihalanase. [1]
Our model layout is as follows:
For Intracellular pathogens:
LasR:
According to the law of mass action, R1 forms the complex formation at a rate of k1, while the
complex decomposes to R1 at a rate of k2. R1 degrades at the rate of bR1. Based on the Michaelis
Menten equation, the production of LasR is also determined by C1, VR1, and KR1. VR1 is the
maximum rate of LasR production. KR1 is a constant that determines the affinity between the
complex formation (C1) and the LasR promoter. And we assume LasR to be constitutively expressed
at rate R10. Thus the system corresponds to
RhlR:
Similar to R1, the system of RhlR corresponds to
Complex formation:
According to the law of mass action, C1 associates at rate k1 and dissociates at rate k2. In the
same way, C2 associates at rate k3 and dissociates at rate k4, so the system corresponds to
OdDHL:
According to the law of mass action, A1 forms the complex formation at a rate of k1, while the
complex decomposes to A1 at a rate of k2. A1 degrades at the rate of bA1. Based on the Michaelis
Menten equation, the production of OdDHL is also determined by C1, VA1, and KA1. VA1 is the
maximum rate of OdDHL production. KA1 is a constant that determines the affinity between the
complex formation (C1) and the OdDHL promoter. And we assume OdDHL to be constitutively
expressed at rate A10. A1 diffuses outward at the rate of d1β. d1 is the Diffusion coefficient,
and β is the ratio of pathogen to extracellular volume. Thus the system corresponds to
BHL:
Similar to A1, the system of BHL corresponds to
For extracellular part:
OdDHL:
A1 flows out of the cell at the rate of d1β. d1 is the Diffusion coefficient, and β is the ratio
of pathogen to extracellular volume. Thus the system corresponds to
Similar to A1ex, the system of extracellular BHL corresponds to
After the addition of engineered bacteria, considering that the extracellular OdDHL and BHL
enter into the engineered bacteria through diffusion, equations (7)-(8) are changed into
equations (9)-(10) as follows:
OdDHL:
For intracellular engineered bacteria:
OdDHL:
Based on the Michaelis Menten equation, the decomposition of OdDHL in engineered bacteria is
determined by VA1in, and KA1in. VA1in is the maximum rate of OdDHL decomposition. KA1in is a
constant that determines the affinity between the quenching enzyme and the OdDHL promoter. A1in
diffuses outward at the rate of d3α. d3 is the Diffusion coefficient, and α is the ratio of
engineered bacteria to extracellular volume. Thus the system corresponds to
Similar to A2ex, the system of BHL in engineered bacteria corresponds to
Figure 1 The concentration of each substance in the system without adding engineered bacteria
The figure above shows the concentration of each substance in the system without adding engineered bacteria. It can be seen that after reaching equilibrium, the intracellular OdDHL, BHL concentrations and extracellular concentrations of pathogenic bacteria are approximately equal.
Figure 2 The concentration of each substance in the system adding engineered bacteria
The figure above shows the concentration of each substance in the system after adding engineer bacteria. It can be seen that compared with Figure 1, the equilibrium concentration of most substances is significantly reduced, and the equilibrium time is earlier. The following is to illustrate by comparing the concentration change curves of each substance before and after the addition of engineer bacteria.
Figure 3 Comparison of the concentration before and after adding engineered bacteria
As can be seen from the above comparison figure, after the addition of engineered bacteria, the equilibrium concentration of LasR and RhlR as well as the time to reach the equilibrium do not change much, that is, the engineered bacteria could be considered to have insignificant influence on the intracellular LasR and RhlR of the pathogenic bacteria. However, the equilibrium concentration of OdDHL, BHL both in and out of the cells of pathogenic bacteria decrease obviously, and the time to reach the equilibrium is shortened. This indicates that the engineered bacteria can evidently change the equilibrium state of OdDHL and BHL, and have a significant effect on the concentration reduction.
(A) |
(B) |
(C) |
(D) |
Figure 4 The sensitivities of OdDHL(A) and BHL(B) to the Km value of the enzyme towards OdDHL and the sensitivities of OdDH(C) and BHL(D) to the complex of the OdDHL/LasR.
After the modeling process above, we also use SimBiology in MATLAB to do the parameter sensitivity analysis. We want to use the analysis to see how much the changes of the Km value of the enzyme can affect the concentrations of both AHLs which will prove the importance of choosing a better enzyme to degrade AHLs. The results (not shown totally) show that OdDHL is quite sensitive to the Km value of the enzyme while BHL is not. This makes sense because this parameter is used to describe how well the enzyme can degrade OdDHL. But this also means that BHL is not sensitive to the change of OdDHL. As the connection between the two systems is the complex of OdDHL/LasR, we then analyze the sensitivity of the two AHLs to the complex. It turns out that both AHLs are not sensitive to the concentration of the complex. Together with the data above showing that the concentration of the complex of OdDHL/LasR is very low in the steady state, we find that this model doesn’t manifest the connection between the two AHL systems as we expect. This may be due to the parameters we get from the reference don’t fit our situation as they add more factors to their model. So the next step is to modify the parameters with more suitable data by experiments or other literature which can be done next year.
Table 1 Variables
Table 2 Parameters
In order to explore the growth rate of biofilm and the influence of the quorum sensing of pathogens on the growth of biofilm, a partial differential equation model of biofilm is established to simulate the growth process of biofilm. The idea of the model comes from the diffusion equation, which is based on the law of conservation of mass. And MATLAB is used to solve the model numerically. Although the model is difficult to solve, which means the ideal result is not completely obtained, it is still possible to determine the linear correlation between the growth rate of biofilm and time, and the growth perpendicular to the surface will be limited.
The basic assumptions are as follows:
1. Biofilms grow in a static medium and that their growth perpendicular to the surface is
restricted by saturated nutrients in the local population density.
2. The bacterial population is composed of two cell subsets corresponding to down-regulated and
up-regulated cells respectively. AHL was down-regulated and up-regulated at a fixed rate.
3. The rate of division of the two cell subsets is the same.
4. Biofilm is considered to be uniform, so cell density NT can be interpreted as proportional to
the local height H of biofilm.
5. The growth model of biofilm is only in one dimension.
Modeling ideas:
The process of biofilm’s growth is through a splitting unit pushing its adjacent units aside to
accommodate the extra volume. Thus, cell growth produces movement within the colony at a rate
described as v tangent to the solid surface. So the growth of biofilms can be described by
diffusion process, which is based on the law of conservation of mass, as following:
Supposing Ω is a single unit, is its border ∂Ω, n is the normal vector directing to the outward, f is the u generated per unit volume, J is the u flowing across Ω. Then from the conservation law above, in the case of locally generated u, we can get
Our PDE Model:
The generation of down-regulated cells per unit volume includes the
increase caused by cell
division and the conversion of up-regulated cells to down-regulated cells, as well as the
decline caused by the conversion of down-regulated cells to up-regulated cells induced by AHL.
The division rate of up-regulated and down-regulated cells is the same, and it is denoted as
taking the influence of carrying capacity into consideration. And one up-regulated cell can
divide into γ up-regulated cell and 2-γ down-regulated cell. The conversion rate of
down-regulated
cells to up-regulated cells induced by AHL is α, and the conversion rate of up-regulated cells
to
down-regulated cells is β. Thus the system of down-regulated cells is
The generation of up-regulated cells per unit volume includes the increase caused by cell division and the conversion of down-regulated cells to down-regulated cells, as well as the decline caused by the conversion of up-regulated cells to down-regulated cells induced by AHL. The rate of division of up-regulated is the same as ones of down-regulated cells. As saying above, one up-regulated cell can divide into γ up-regulated cell. So the number of up-regulated cells that actually proliferate once they complete a division is 1- γ. The conversion rate of down-regulated cells to up-regulated cells induced by AHL is α, and the conversion rate of up-regulated cells to down-regulated cells is β. Thus the system of up-regulated cells is
The change of AHL per unit volume includes includes the increase caused by diffusion and the secretion of both up-regulated cells and down-regulated cells, as well as the the decline of degeneration, escape into the external media and consumption of inducing the transformation of up-regulated cells to down-regulated cells. Where, the diffusion coefficient is D, and the secretion rate of down-regulated cells and up-regulated cells is and separately. The degeneration rate of AHL is λ, and the escape rate is assumed to be inversely proportional to the local cell density, where the proportionality coefficient is Q. Thus the system of AHL is
Let
Through numerical simulation of the above model by MATLAB, we can get the variation of biofilm size with time, which is shown as below:
Figure 1 The variation of biofilm size with time
As can be seen from this figure, with time going on, the size of biofilm is approximately linear with time, that is, the growth rate of biofilm is approximately constant. Substituting this result back into the system non-dimensionalised, it can be seen that at a fixed position, with the increase of time, and will eventually approach 0, which means an equilibrium state is reached. It also confirms the first assumption, the growth of both down-regulated and up-regulated cells perpendicular to the surface is restricted, is reasonable.
Table 1 Variables
Table 2 Parameters
Q: Why did you decide to use this language?
A: gro allows us to make a vivid movie for our quorum quenching system and make some calculations about
cell behaviour that are hard to be taken into account in traditional kinetics models.
Q: Why did you consider spatial distribution as an interesting problem?
A:
We started paying attention to it when executing programs with gro. We are very curious about
whether the spatial distribution of the quorum quenching system will have a profound effect on the
dynamics of this system and whether there is an optimal pattern for the highest quenching efficiency. We
can’t increase engineered bacteria without limitation because of the risk of inflammation storm.
What’s more, The spatial distribution of different cells/molecules in the biological system is a hot issue
in biology.
Q: How did you do for spatial distribution this year?
A:
We used gro to simulate what was happening in the laboratory and avoided problems we encountered
when trying to use math to depict complicated cell-cell interaction. However, we should seek for
more accurate methods for this problem in the future.
Connection with wet lab:
gro models help us to estimate the results of spatial distribution experiments, which we don’t perform
this iGEM season. It also might help us to explore if there is a best scheme when this method(quorum
quenching) is applied to clinical treatment.
gro is a cell programming language, which is designed for modeling and simulating the behaviors
of cells in silico [2]. This programming language allows users to consider growth, division, signal
receiving and sending, chemical reaction, and gene expression. Detailed behaviors of cells are
so essential in synthetic biology that increasing people attach more importance to it. In fact,
iGEM team from Imperial College London firstly used this language as an important part of their
model work in 2016, hoping that this program can act as a complement to their Simbiology
simulation.
Initially, we took this special language into consideration for our project for several
reasons:
gro allows us to visualize our microorganism system, thus helping us and readers to have a
better understanding of quorum sensing and quorum quenching;
Kinetics models mentioned above mainly focus on chemical substances and their quantitative
changes, while gro can provide us more information about cells (bacteria).
Consequently, codes were written to display our quorum quenching system. You can play the video
below to have a glimpse of the visual illustration of the system:
(A)
(B)
(C)
(D)
(E)
(F)
Figure 1 Results of gro simulation. Please see the text for more information. (A) Result of a system without engineered bacteria.(B-F) Selected results of a system with engineered bacteria.
Interestingly, we found something worth exploring further when executing the simulation above. It
was clear to see that the spatial distribution of different microorganisms had a conspicuous
effect on the things that occur subsequently.
Please image a game where you control one character to fight against two or more opponents. How
to make the most of your weapon is always a tricky problem, depending on your attack scope, your
opponents’ attack scope, and other factors.
This scene also exists in the biological world. In our quorum quenching system, engineered
bacteria have enzymes that can degrade AHL and destroy quorum sensing of pseudomonas aeruginosa.
However, if they are too far away from pseudomonas aeruginosa, it is difficult for them to use
their enzymes effectively and then protect human cells, owing to the degradation and transport
limitation of proteins. Similarly, AHL from pseudomonas aeruginosa can’t expand as they want. It
seems that we now have no mature techniques to accurately control relative positions of
engineered bacteria as we want, but we still attach much importance to this problem for these
reasons:
Spatial distribution is a common problem in almost all kinds of cell-cell dialogues. Hence, it
is meaningful to explore this problem as we can in our system. Additionally, the scale of
microorganisms we take into consideration is much larger than single cells. Quorum sensing
following by the formation of biofilm makes it possible for pseudomonas aeruginosa to take up a
relative place on the organ’s surface. It is believed that with the advent of PK/PD progress,
some medical treatments have the potential to achieve accurate control of the drugs.
Firstly, we tried to form a concise mathematical model for this spatial distribution problem.
The diffusion of AHL satisfies Fick’s law.
We wrote codes to display different situations where engineered bacteria and pseudomonas aeruginosa were positioned in the virtual plate. There is no denying that the initial distribution of different microorganisms (and chemical substances secreted by them) have a profound effect on the process of simulation, such as the pattern of different signals.
Yellow cells in these movies represent pseudomonas aeruginosa, while red cells represent engineered bacteria. Black cells refer to all kinds of cells that can’t produce fluorescent molecules, we can ignore these cells in the movie, but they will be included in the calculation. Pink signals are AHL, while blue signals mean engineered bacteria can destroy quorum sensing in these areas. Two movies above show different patterns after running because of their initial difference of spatial distribution.
(A)
(B)
(C)
(D)
(E)
(F)
(G)
(I)
Figure 2 Different patterns of spatial distribution and corresponding results of gro simulations. Please see the text for more information. (A)-(E) Five kinds of distributions and their results. (F)-(G) These two patterns share the same structure as (E), but the distance of two microorganisms has changed to different values. (I) A summary of (A)-(G)
If you are interested in our codes, please contact us without hesitation.
Q: Why did you perform this bioinformatics analysis?
A: Ancestral protein reconstruction and enzymatic promiscuity are two notions that gain more and more
popularity nowadays. We want to create an enzyme that is not only efficient but also broad-spectrum,
so we tried to reconstruct ancestral proteins to analyze in silico at first.
Q: What software/web-server did you use?
A:
NCBI BLAST for sequences collection;
MEGA-X MUSCLE for multiple sequences alignment;
Mrbayes and GRASP server for reconstruction;
Figtree and iTOL for tree drawing;
EXPASY SWISS-MODEL and Phyre2 for protein structure prediction.
Connection with experiments:
Obtaining ancestral proteins is an essential step for us to modify/create enzymes. Ancestral
proteins provide us more residues information which is not available when we perform traditional
biochemical researches, like site-specific mutagenesis. These sequences can be analyzed and then
synthesized in wet lab to perform enzymatic experiments.
Promiscuity is an interesting property of some enzymes, ancestral proteins particularly. Most
evolutionists suppose that proteins tend to be more and more specific during the process of
molecular evolution. That is to say, ancestral proteins are more likely to be able to recognize
different substrates and catalyze different reactions, compared with extant enzymes. Utilizing
promiscuous enzymes is an appealing strategy of all kinds of organisms. The limitation, ranging from
sequence space to energy and nutrients, make it necessary for ancient organisms to utilize
multifunctional enzymes to cover basic needs for lives. This phenomenon reveals an interesting
trade-off during evolution. There are some downsides of promiscuous enzymes. The most obvious one is
that its catalytic efficiency is relatively lower than more substrate-specific modern enzymes.
However, if a biologist can obtain an ancient enzyme, he can modify its property with the aid of
many methods, like directed evolution, mechanism-based site-specific mutagenesis and so on. So an
ancient enzyme is promising in synthetic biology, despite its unsatisfying catalytic efficiency.
So how to get an ancestral protein sequence? With the advent of bioinformatics and advanced
mathematical modeling in biology, resurrecting ancient molecules becomes more and more tangible.
The phylogenetic analysis provides us an opportunity to figure out the evolution of not only species,
but also more small things, including circuits, molecules, and so on. In the field of protein
engineered, ancestral protein reconstruction is regarded as a useful method and a perfect
complement of directed evolution [4,5].
In our project, most enzymes we chose to use this year, like PvdQ, are all substrate-specific, while
AHL contains a series of organics with different lengths of carbon chains, including OdDHL and BHL.
Although AiiA is considered as an enzyme that has the potential to catalyze different reactions, its
catalytic activity is much higher when interacting with substrates possessing long carbon chains. So
ancestral protein reconstruction may help us to explore some key residues that play a significant
part in promiscuity and even design an enzyme whose activity is high no matter what AHL it meets in
the near future.
Now procedures of PvdQ sequences reconstruction will be given as an example to illustrate our
methods and results.
Firstly, we selected extant sequences to build our analytic set. Based on the PvdQ sequence, homologous
sequences were retrieved from NCBI ref-seq database, with the help of BLAST. We selected sequences
with 30-90% identical residues and removed sequences with higher identity, as suggested by most
papers. The number of sequences in our analytic set ranged from 35-60, which was a relatively small
set.
Then, a multiple alignment was created for subsequent analysis. To achieve this goal, we utilized
MEGA-X software [6]. Protein sequences were aligned with the support of the MUSCLE tool in MEGA-X.
Furthermore, the method of maximum likelihood(ML) was used to infer the best substitution model for
phylogenetic tree construction. The table below presents the top ten models for this sequence set,
according to different indexes, including AICc, BIC, and LnL. It was clear to see that the model
LG+G+I+F was the best substitution model in this run, namely the LG model with Gamma-distributed rates
and frequency plus Invariant Sites. We used this model when reconstructing ancestral sequence in
GRASP web-server.
Table 1 Substitution model results from MEGA-X.
The next stage was to compute a phylogenetic tree. There are lots of state-of-the-art computational approaches to fulfill this target, including ML and Bayesian inference. We used Mrbayes 3.2.7 [7], the mechanism of which is based on the notion of posterior probabilities, as our analytic platform. Most default settings were maintained for our analysis, with the exception of rates and some parameters in the command mcmc. Three tables below show key parameters in our run.
Table 2-4 Parameters used in Bayesian analysis.
Figure 1 Phylogenetic tree of PvdQ protein, generated by iTOL.
Finally, we tried to reconstruct ancestral sequences. The extant sequence set and the tree
determined above formed the basis for the computation of ancestral sequences. Most advanced
phylogenetic platforms can both create the tree and analyze the ancient states, such as Mrbayes. As
for our project, we constrained a set of all taxa in the sequence set, let the program report the
results and got the ancestral sequence for all taxa after analysis.
What’s more, there are increasing web-server designed for biologists to reconstruct ancestral
sequence, like FASTML and PhyloBot. We also used Graphical Representation of Ancestral Sequence
Predictions (GRASP) [9], an up-to-date web-server open to the public in 2020 for our reconstruction.
Here we present results for PvdQ analysis with GRASP web-server.
Figure 2 Results of GRASP reconstruction, including a tree, a multiple sequence alignment and a box for ancestral sequence presentation.
Solid edges are fully supported by bi-directional edge-parsimony. These links are strong candidates
for how an ancestor can be assembled. A dotted edge is maximally parsimonious when only one of the
nodes it joins is considered. These edges are weaker candidates but are kept to allow full-length
ancestors to be formulated in the absence of the stronger option above.
GRASP employs exact probabilistic inference; it distinguishes between two types of
reconstructions:
Joint reconstruction: GRASP infers the most probable combination of all ancestral states. The
assignment of all ancestral states (across the whole phylogenetic tree) optimizes a single objective
(the joint state with the greatest likelihood). This will tell users which are the most probable
ancestors, given the extant sequences and their phylogenetic relationships.
Marginal reconstruction: GRASP infers the probability distribution of each residue for a single,
nominated ancestor. This inference will present you with the probability of amino acids at each
position in the nominated ancestor, given extant sequences and their phylogenetic relationships.
For detailed information, please see http://grasp.scmb.uq.edu.au/.You can also click here to
download files we provide for you and have a try to reconstruct ancestral sequence with GRASP
immediately.
After obtaining sequences of all nodes of the tree, we selected some appropriate sequences(nodes)
according to the branch length among nodes and the place where the node is positioned in. Generally,
sequences that appear later during evolution were discarded, while attempts were made to select
ancestral sequences that were ancestors of many extant sequences. In the meanwhile, we made sure
that these extant sequences were not strongly related to each other in order to allow more
significant mutations to stay in the ancestral sequences. We finally chose seven ancestral sequences
of AiiA and five sequences of PvdQ.
Then, BLAST was used to detect the highest identity between extant sequence and ancestral sequence.
Results are showed in the table below.
Table 5 The highest sequence identity of selected ancestral protein sequence.
Sequence PvdQ_N0, PvdQ_N1, AiiA_N4, AiiA_N6, and AiiA_N8 were well-suited to our analysis, with the
highest identity 84.80%, 78.93%, 95.60%, 94.00%, 80.08% respectively. The reason why we chose these
five sequences was that their corresponding nodes during evolution suggested that they might be
more likely to show enzymatic promiscuity. What’s more, the highest sequence identity can’t be too low
to ensure the accuracy of protein structure prediction, while sequences highly similar to extant
sequences may make us ignore some key mutations. Actually, attempts could be made to examine all
sequences to avoid mistakes or omissions, but it took us too much time to achieve that goal this
year.
Next, SWISS-MODEL [10] was used to predict protein structures toughly. After that, we utilized Phrye2
(Protein Homology Recognition Engine V 2.0) web-server [11] for accurate protein structure prediction.
It is a state-of-the-art protein bioinformatics tools, using remote homology detection methods to
build protein structures, predict ligand binding sites, and analyze the effect of mutations on
structures.
Key steps of this server include:
Backbone determination (α-carbon), namely placement of corresponding residues according to
structure templates.
Loop modeling for insertions or deletions spare parts algorithm; from a special loop library.
Side-chain positioning. For those conserved side chains, the algorithm will run according to
structure templates; for those varied ones, other sophisticated algorithms will be used.
Model refinement, namely energy minimization.
We submitted five sequences mentioned above to this server and performed intensive modeling, after
which results were sent back in a few hours. The highest structure identity of each protein was showed
below. It is acknowledged that sequences with more than 25% identity over an alignment of 80
residues or more adopt the same basic structure. All of the figures of identity of our sequences reach
70%, so it is obvious that the proteins we submitted were all well-suited to this modeling method, with
100% confidence in this web-server. These structures will be used to analyze and modify in the next part.
Table 6 The highest structure identity of selected ancestral protein sequence.
Table 7 RMSD between different protein structures.
Q
: What did you do to analyze structure?
A:
Sequence alignment, structure alignment, molecular docking, virtual mutation and so forth have been
tried this year. We select important residues through 2 ways: differences found in the alignment of
extant sequence and ancestral sequence; literature. These residues were analyzed as further as we
can in phase 1.
Q
: Why did you reconstruct ancestral protein before the structural biology survey?
A:
Some residues that exist in ancestral sequences may play an important role in broad target scope. So
ancestral sequences may inform us of some key residues that can’t be inferred in a traditional way. In
fact, we successfully found some residues worth exploring, such as PHE68, GLU136, and TYR194.
Q
: How could in silico methods be used in an improved screening workflow?
A:
It's proposed that the process of drug screening starts from in silico methods to generate massive
screening, e.g. from 1000 candidates to 100. Then in vitro platforms are used to facilitate rapid
and high-throughput testing of in silico results, e.g. from 100 candidates to 10. Ultimately, the effects of
several promising candidates are verified via in vivo systems.
Connection with wet lab:
We plan to design and synthesize sequences according to our analysis, and some enzymatic experiments
will be done to check if our analysis is right. We also can perform simulations that are led by
experiments.
The results generated by AutoDock Vina will be directly tested in cell-free quorum sensing pathways,
then verified in Pseudomonas aeruginosa if promising.
We successfully got some ancestral sequences and corresponding protein structures in part 3.
Structural analysis was now important for our design. Information obtained from in silico
analysis can inform us what residue is worth paying attention to and how can we optimize our
enzymes. In this part, Discovery Studio and PyMOL were used to perform a series of projects
including structural alignment, virtual amino acid mutation, molecular docking and molecular
dynamics(MD) and so forth.
The enzyme we analyzed here was AiiA, a metal-dependent N-acyl homoserine lactone hydrolase that
displays broad substrate specificity but has a preference for substrates with long N-acyl
substitutions(carbon chains). The reason why AiiA is more appropriate for our analysis is that
it has a relatively simple structure and its catalytic mechanism has been clearly revealed. An
the addition−elimination reaction leads to ring-opening, with two zinc ions bound together in a
the dinuclear cluster at the enzyme’s active site [12].
Figue1 Reaction mechanism of C6-HSL catalyzed by AiiA.
Firstly, we performed advanced molecular docking for AiiAWT protein(PDB code: 3DHB) and C6-HSL. Small molecules were drawn and optimized in Kingdraw software. Water molecules were removed from protein, while some residues and hydrogen atoms were added to the protein to make its structure complete. Receptor binding sites are very important in molecular docking and molecular dynamics. In this part, we defined the receptor binding site according to where the ligand was positioned in the protein. We also checked to make sure that key residues were included in the site we defined. The radius of the sphere was set to 8 angstroms. Molecular docking simulations of the 10 lowest-energy poses were done using the grid-based approach CDOCKER, a molecular dynamics simulated annealing-based algorithm in which the receptor is held rigid while the ligands are allowed to flex during the refinement, implemented in Discovery Studio. All of the structures were energy-minimized using force field CHARMm. The RMSD threshold was set to 0.5 angstroms to ensure as much diversity as possible in the conformation.
Figure 2 The ligand-binding pocket of AiiA protein, generated by PyMOL. Green molecule is C6-Hse, yellow molecules and white molecules are important residues mentioned below.
(A)
(B)
(C)
Figure 3 (A)-(C) Residue interaction histogram for AiiA and C6-HSL docking.
Figure 4 Heatmap of interaction between important residues of AiiA and C6-HSL. Please note that in this picture we mistake the molecules’ name, so the name “3DHB” refers to C6-HSL.
We analyzed every ligand poses and the results are pictures showed above. It is clear to see
that PHE64, PHE68, HIS106, PHE107, ASP108, and GLU136 strongly interact with the ligand,
consistent with the conclusions of many pieces of research. HIS106, PHE107, ASP108 contribute most hydrogen
bonds, while both PHE64 and PHE68 provide hydrophobic interaction.
Next, we performed CDOCKER for AiiA_N8 protein and C6-HSL. The difference between these two
simulations was that we defined the receptor binding sites here based on protein cavities,
instead of the method used in the first docking task.
Figure 5 Interaction between important residues of AiiA_N8 and C6-HSL.
(A)
(B)
(C)
(D)
(E)
(F)
Figure 6 (A)-(F) Residue interaction histogram for AiiA_8 and C6-HSL docking.
Figure 7 Heatmap of interaction between important residues of AiiA_N8 and C6-HSL. Please note that the name “molecule1” refers to C6-HSL.
Subsequently, we aligned sequence of AiiA protein and AiiA_N8 protein, and carefully examined their differences. Here, EsPript 3 web-server [13] was used to make the align picture showed below.
Figure 8 Alignment of AiiA and AiiA_N8 sequences.
We found that most different residues of these two sequences were located outside the activity
site or even the surface of this protein, such as Ser14 and Arg14. We proposed that most of
these residues acted as a role of stabilizing the protein, that is to say, these mutations were
likely to be neutral. Three pair of promising residues which were sited near the activity site,
were found after examination: ALA20 and Ser20, Arg 68 and PHE68, Asp136 and Glu136(the former
residue corresponds to residues of AiiA_N8). These results were recorded for further
analysis.
Then, virtual amino acid mutations based on interaction forces were rendered to AiiA protein. We
performed this simulation to see if there were some mutations that can enhance the interaction
between AiiA protein and C6-HSL. Firstly, force field CHARMm was applied to the protein-ligand
system. Amino acid residues within 3 angstroms around the ligand were selected and mutated to
ALA in turn(single mutation). The result is shown below.
Figure 9 Report form of selected residues of AiiA mutated to ALA. Effects are judged by the energy when bound to C6-HSL.
Figure 10 Report form of 7 key residues of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C6-HSL.
We decided to examine three pairs of different residues mentioned above carefully. The first one we analyzed was residue 68, because it was located in the activity site. PHE68 of AiiA was mutated to other nineteen amino acids and we calculated the mutation energy.
(A)
(B)
Figure 11 (A)-(B) Report form of PHE68 of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C6-HSL.
At the next stage, we changed the substrate to C5-HSL and performed the same simulations.
Figure 12 Heatmap of interaction between important residues of AiiA and C5-HSL.
We mutated residues residing in the sphere located in the activity site, with the radius of 4.5 angstroms, to ALA.
Figure 13 Report form of selected residues of AiiA mutated to ALA. Effects are judged by the energy when bound to C5-HSL.
As a consequence, we mutated PHE68 to other nineteen amino acids. Results were shown below.
(A)
(B)
Figure 14 (A)-(B) Report form of PHE68 of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C5-HSL.
All mutations were neutral, which convinced us that PHE68 was not important in terms of binding and catalysis. According to evolutionary theory of enzymes, PHE68 may exist to stabilize the whole molecule.
Similar simulations were done for the substrate C8-HSL/C10-HSL and results were shown in pictures below.
Figure 15 Heatmap of interaction between important residues of AiiA and C8-HSL.
Figure 16 Report form of selected residues of AiiA mutated to ALA. Effects are judged by the energy when bound to C8-HSL.
Figure 17 Report form of PHE68 of AiiA mutated to other 19 amino acids.
Figure 18 Heatmap of interaction between important residues of AiiA and C10-HSL.
Figure 19 Report form of selected residues of AiiA mutated to ALA. Effects are judged by the energy when bound to C10-HSL.
Figure 20 Report form of PHE68 of AiiA mutated to other 19 amino acids.
Overall, results shown above tell us two things:
1. Some residues, especially TYR194, play an important role in catalysis and binding when the
substrates have long carbon chains. TYR194 is conserved in our sequence data set, but attempts
can be made to mutate this residue to see whether it hinders the enzyme to be
multifunctional.
2. It seems that PHE is more important when the substrates have long carbon chains, but it may
only act as a stabilizing residue. This residue is worth exploring and more accurate methods
like molecular dynamics should be utilized. From literature we know PHE68 and PHE 64 form a
hydrophobic clamp, making an interaction that would also be available for longer substrates, but
not for shorter substrates. This residue is a promising one that should be done with lots of
experiments in phase 2.
We also analyzed the other 2 pairs of different residues found in alignment, ALA20, and Ser20,
Asp136 and Glu136. Single site mutation was performed in both residue 20 and residue 136.
Firstly, we examined the binding of C10-HSL and AiiA.
Figure 21 Report form of SER20 of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C10-HSL.
Figure 22 Report form of GLU136 of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C10-HSL.
Results show
that, ALA20 and SER20 are both well-suited to this binding event. However, GLU136 is more
important compared with ASP136, because the mutation to ASP destabilized the binding.
Similar simulations were executed to AiiA and C5-HSL.
Figure 23 Report form of SER20 of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C5-HSL.
Figure 24 Report form of GLU136 of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C5-HSL.
Themutation
from SER20 to ALA20 didn’t influence
the whole system. Interestingly, the mutation fromGLU136 and ASP 136 didn’t affect the binding
event either, compared with the binding of
C10-HSL and AiiA.
These results may suggest that GLU136 is important for catalysis of substrates with long carbon chains and didn’t play a significant role of binding C5-HSL. This residue can be modified in the future.
In our project, in vitro platforms driven drug screening was considered as a powerful alternative to currently used in silico-in vivo methods. The well-known in silico methods include molecular docking, which is founded on the lock-and-key model. To illustrate the strengths of our cell-free quorum sensing systems, an improved workflow combined with in silico methods needs to be proposed and practiced. The new workflow encompassed in silico docking, in vitro rapid and high-throughput screening, and later in vivo testing for verification. Through investigating literature, we found salicylic acid is an inhibitory ligand to both LasI (PDB ID: 1RO5) and LasR (PDB ID: 2UV0), and how its structural analogs interact with these two receptor proteins remain uncertain. Thus we wanted to screen the structural analogs of salicylic acid in silico and test promising candidates (with salicylic acid as the standard) in the cell-free quorum-sensing systems for proof-of-concept.
AutoDock [14] was a set of well-developed molecular docking softwares, and AutoDock Vina was its newest version with enhanced performance. We decided to use AutoDock Vina to facilitate the virtual screening of quorum sensing inhibitors. Additionally, the free database of commercially available compounds ZINC [15] was used. We downloaded the mol2 files of salicylic acid (ZINC1554) and its structural analogues in ZINC. Furthermore, PyMOL was used to delete the original ligand and waters in LasR.
Figure 1 structural analogues of salicylic acid in ZINC
To predict the binding pocket of receptor proteins for more accurate docking, we used a user-friendly webserver COACH [16] . The results for LasI were displayed here, with the most promising binding center predicted as (44.963, -15.387, -15.641).
Figure 2 predicting the binding pocket of LasI by COACH
The results for LasR were displayed here, with the most promising binding center predicted as (22.403, 15.683, 79.699). Based on these information, the subsequent targeted docking will perform faster and more accurate than blind docking.
Figure 3 predicting the binding pocket of LasR by COACH
Unfortunately, due to uncharacterized bugs and limited time, the results of virtual screening using AutoDock Vina cannot be displayed, also, our efforts to reconstitute the cell-free quorum sensing pathways didn’t lead to success this year. However, we anticipate the above-mentioned in silico workflow combined with in vitro reconstituted pathways has the full potential for rapid and high-throughput screening of candidate inhibitors to quorum sensing-associated proteins, and we will endeavour to accomplish this work after the iGEM season.
Q: Why did you build this model?
A:
Fibrosis is an important medical phenomenon. For example, fibrosis of lung is a vital indicator
for doctors. Pseudomonas aeruginosa
can also make people suffer from fibrosis, so we think it is interesting to learn more about
this phenomenon from the perspective of circuits
and math.
Connection with experiments:
We plan to perform co-culture of different immune cells in phase 2, so this model adapted from
some excellent papers acts as our lamp
when designing tasks in the wet lab.
Inflammation is one of the most important medical phenomena, which can cause more serious
diseases, including fibrosis and even
cancer. Pseudomonas aeruginosa will enter lungs of human beings and activate the immune system,
after which lead to inflammation
in lungs. Inflammation is set off to fight against pathogens. During this process, blood vessels
will expand to give room for different kinds
of immune cells. To seal up the injury as fast as possible, fibrous materal is required to cover
the injury. This process depends on a cell-cell
dialogue, which happens between macrophages and myofibroblasts. Macrophages will secret some
kinds of growth factors to activate
fibroblasts, after which fibroblasts tend to change their shapes and function, before becoming
myofribroblasts. This is a concise description
of why the fibrosis happens in tissues. In reality, fibrosis is a key medical imaging indicator
in X-ray/MRI medical examination. One of the most
significant example is that whether the lung of a patient contracting with Covid-19 is fibrous
has become an essential medical criterion. A
patient with a fibrous lung is more likely to die of pneumonia. As a consequence, it is vital to
learn more about this process (fibrosis) and try to
put up with some ideas to prevent fibrosis if one is diagnosed with pneumonia because of
bacteria(pseudomonas aeruginosa), virus
(SARS-Cov-2) and other factors. In this part, we tried to illustrate this key phenomenon
happening in inflammation with our mathematical
model adapted from a list of interesting papers [17,18].
Figure 1 An illustration of interaction system of two kinds of immune cells
The myofibroblasts, F, produce a specific growth factor for themselves at the rate of β2 , called PDGF, which we will denote as c1. c1 also can be produced by macrophages, M, at the rate of β1. In addition, it is endocytosed by F at the rate of α1 , and degrades at the rate of γ .Thus the system of c1 corresponds to
Since the time scale of c1’s production and degradation is smaller than that of cell’s division and death, we can assume that c1 has reached their steady state. In other words, growth factors c1 are in quasi-steady-state, which means , thus
The myofibroblasts proliferate at the rate (at certain growth factors) of λ1, and remove at the rate of μ1. Taking the carrying capacity (K) into account, we multiply division rate by the term 1-F/K. And we can normalize F concentration by it carrying capacity K, so that the carrying capacity term is 1. Thus the system of F corresponds to
So if we substitute equation (2) into equation (3), we can get
The myofibroblasts (F) can also produce a specific growth factor for the macrophages (M) at the rate of β3, called CSF, which represented as c2. In addition, c2 is endocytosed by M at the rate of α2, and degrades at the rate of γ . Thus the system of c2 corresponds to
Similar with c1 ,we can assume growth factors c2 are in quasi-steady-state, which means , thus
The macrophages, M, proliferate at the rate (at certain growth factors) of λ2, and remove at the rate of μ2. Unlike myofibroblasts (F), The growth of M is not limited by the carrying capacity.Thus the system of M corresponds to
So if we substitute equation (6) into equation (7), we can get
We can find points of stability in the system by plotting Nullclines. First, by drawing a curve satisfying and , we can find the system's fixed point.Then, by analyzing the Hessian matrix at each fixed point, or analyzing the trend of the phase diagram near the fixed point, we can determine whether the fixed point is a stable point of the system. The results are shown in the figure below.
Figure 2 Nullclines. The blue line is the solve of , and the red line is the solve of . The solid points are the stable fixed points, and the hollow points are the unstable fixed points.
The impact of trauma on ultimate homeostasis is analyzed below. Assuming that the corresponding proliferation rate of M at the time of trauma is I(t), (7) is modified to the following expression
According to the above analysis, the system has two stabled state, namely, final recovery or fibrosis, denoted as OFF-state or ON-state respectively.If the trauma do not last long enough,the dynamics stays within the basin of attraction to the OFF states, just as the diagram 1 shows. So the dynamics will finally go to the OFF state. On the contrary, if the trauma do last long enough, or if the trauma is multiple, the dynamics will cross the separatrix just as the diagram 2,3 show. Thus the dynamics will finally go to the ON state.
Diagram 1 If the trauma do not last long enough, the dynamics, which stays within the basin of attraction to the OFF states, ends up with the healing states(adapted from a list of papers [17,18]).
Diagram 2 If the trauma do last long enough, the dynamics, which cross the separatrix, ends up with the fibrosis states(adapted from a list of papers [17,18]).
Diagram 3 With multiple traumas, the dynamics, eventually crossing the separatrix , ends up with the fibrosis states(adapted from a list of papers [17,18]).
Table 1 Parameters
Q: Why did you build this kinetics model?
A: At first, we planned to execute experiments about this module. Teammates in the wet lab were curious
about the quantitative features of
this system, so we built this model for our experiments. We finally had no time to perform this
experiment, so figures can’t be got for
calculation. This model will be used in phase 2, when we have enough time to complete this
part.
Connection with experiments:
Please see Q&A above.
As described
in Part “design”, we utilized PQS as
an inducer for a inducible promoter, namely PqsR promoter. The process, PQS binding pqs promoter
and activating the transcription of
related genes, can be quantitatively described with Michaelis-Menten equation,
where a
denotes activators and p denotes pqs
molecules. ω is a parameter that should be scanned after obtaining experimental data, while Kd
is a parameter similar to Michaelis constant Km.
Similarly,
activators produced from the event
mentioned above are tend to interact with the induced promoter and lead to transcription of
genes associated with immunity, so we can also
describe this process with Michaelis-Menten equation,
where c
denotes cytokines. κ is a parameter that
should be scanned after obtaining experimental data, while Kd2 is a parameter similar to
Michaelis constant Km.
Compared with
inducible promoter, constitutive
promoter can’t be perfectly depicted with Michaelis-Menten equation. Statistical physics gave us
a reasonable way to quantify this common
biological component19,
where E
denotes TEV protease, Nns denotes the
number of DNA, P is the number of polymerase, this ratio can be estimated from literature. β is
boltzmann factor and it should be multiplied by
the energy difference of 2 states(namely, DNA bound to or not bound to polymerase).
TEV protease
will separate cytokine and sec-tag,
thus hindering the excretion of cytokines. This reaction conforms to the principles of
Michaelis-Menten equation, so a Michaelis-Menten
equation is built for it,
where kcat
and Kd3 are parameters that should be
determined from experimental data.
Due to the
fact that transportation of cytokines among
cells requires both energy and nutrients, that is to say, this process will be confronted with
the limitation, the excretion of cytokines can be
modeled like this,
where η is
the constant of degradation.
Finally, the
concentrations of total cytokines,
cytokines(in the cell) and cytokines(outside the cell)can be expressed:
where Ci
denotes cytokines in the cell and Ce
denotes cytokines located outside the cell.
We believe
that our models are promising enough,
we can at least do these things in phase II:
1. Adjust parameters of our quorum sensing kinetics models in according with experiment
figures.
2. Find better methods to replace gro like Simmune suit developed by NIH(we have no time to
learn more about this suit this year) or use
an advanced version gro developed in 2017.
3. Expand our analysis sequence set and synthesize ancestral sequences for experimental
proof.
4. Keep analyzing protein structures and try to use more advanced methods like molecular
dynamics. Necessary experiments are required
to demonstrate our structural analysis. Molecular docking of LasR/LasI should be done as
well.
5. Try to do some experiments related to TEV protease and inflammation system to show that our
models built for these two designs are right.
1. Fagerlind, M. G. et al. The role of regulators in the expression of quorum-sensing signals in
Pseudomonas aeruginosa. Journal of molecular microbiology and biotechnology 6, 88-100 (2003).
2. Jang, S. S., Oishi, K. T., Egbert, R. G. & Klavins, E. Specification and simulation of
synthetic multicelled behaviors. ACS synthetic biology 1, 365-374 (2012).
3. Alon, U. An introduction to systems biology: design principles of biological circuits.
(CRC press, 2019).
4. Merkl, R. & Sterner, R. Ancestral protein reconstruction: techniques and applications.
Biol Chem 397, 1-21, doi:10.1515/hsz-2015-0158 (2016).
5. Joy, J. B., Liang, R. H., McCloskey, R. M., Nguyen, T. & Poon, A. F. Ancestral
reconstruction. PLoS computational biology 12, e1004763 (2016).
6. Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary
genetics analysis across computing platforms. Molecular biology and evolution 35, 1547-1549
(2018).
7. Ronquist, F. & Huelsenbeck, J. P. MrBayes 3: Bayesian phylogenetic inference under mixed
models. Bioinformatics 19, 1572-1574 (2003).
8. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v4: recent updates and new
developments. Nucleic acids research 47, W256-W259 (2019).
9. Foley, G. et al. Identifying and engineering ancient variants of enzymes using Graphical
Representation of Ancestral Sequence Predictions (GRASP). bioRxiv, 2019.2012. 2030.891457
(2020).
10. Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and
complexes. Nucleic acids research 46, W296-W303 (2018).
11. Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. The Phyre2 web
portal for protein modeling, prediction and analysis. Nature protocols 10, 845-858 (2015).
12. Liu, D. et al. Mechanism of the quorum-quenching lactonase (AiiA) from Bacillus
thuringiensis. 1. Product-bound structures. Biochemistry 47, 7706-7714 (2008).
13. Gouet, P., Robert, X. & Courcelle, E. ESPript/ENDscript: extracting and rendering
sequence and 3D information from atomic structures of proteins. Nucleic acids research 31,
3320-3323 (2003).
14. Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with
a new scoring function, efficient optimization, and multithreading. Journal of computational
chemistry 31, 455-461 (2010).
15. Sterling, T. & Irwin, J. J. ZINC 15–ligand discovery for everyone. Journal of chemical
information and modeling 55, 2324-2337 (2015).
16. Yang, J., Roy, A. & Zhang, Y. Protein–ligand binding site recognition using
complementary binding-specific substructure comparison and sequence profile alignment.
Bioinformatics 29, 2588-2595 (2013).
17. Adler, M. et al. Endocytosis as a stabilizing mechanism for tissue homeostasis.
Proceedings of the National Academy of Sciences 115, E1926-E1935 (2018).
18. Adler, M. et al. Principles of Cell Circuits for Tissue Repair and Fibrosis. Iscience
23, 100841 (2020).
19. Phillips, R., Kondev, J., Theriot, J. & Garcia, H. Physical biology of the cell.
(Garland Science, 2012).