Team:WHU-China/Model

Proof-of-Concept

Overview

The model part in iGEM provides us an opportunity to attempt some interdisciplinary methods, including computer science, chemistry, physics, math and so on. One of the most important features of modeling is that we can design experiments reasonably and have a better knowledge of some quantitative features of the biological system after modeling. Interestingly, models can also help us to discover something novel and achieve goals that can’t be fulfilled with only traditional experiments.
In this year, kinetics models are utilized to help to design our experiments and estimate results for some tasks that can’t be completed in the wet lab. We also use a cell programming language to visualize our system and act as a complement of kinetics models. Furthermore, bioinformatics and structural biology are combined to help us have a better knowledge of our enzymes and provide a firm grounding for protein design for us in phase 2. Now see texts below for detailed information.

Kinetics Models of Quorum Sensing

Q: What is your inspiration of building this model?
A: When we firstly thought about what may be the target of our models, we put forward building a quorum dynamics model at once. Because we wanted to get an overview of the quorum sensing system, to see how it works and the effects our quenching enzymes can have on it. And it may also show the connection between the two quorum sensing systems and help us find the enzymes with the best overall inhibition of the two systems. So after brainstorm we decided to make it.

Q: How do you want to improve this model in the future?
A: We simplify the quorum sensing systems of the Pseudomonas aeruginosa in this model a lot so we really want to take more factors into consideration in the future to make it more similar to the real situation. And as the results don’t show the connection between the two quorum sensing systems well, we hope to find more accurate parameters or conduct more experiments for data to modify the present model.

Q: Why did you use this model to simulate biofilms?
A: The biofilm model we established is based on the diffusion equation and the law of conservation of mass, which has certain theoretical basis and can simulate the growth of biofilm well.

Q: What are the advantages and disadvantages of biofilm model?
A: Since the model has clear physical and biological significance, it can make the growth of biofilm easier to understand. At the same time, using the form of partial differential equation, the dimension and the condition of a definite solution can be changed to meet the needs of different cases. But because of the complexity of the form of the partial differential equation, it is difficult to solve the model.

Q: How do you want to improve this model in the future?
A: We want that the partial differential equation of the biofilm model can be further simplified without changing the original significance, so as to make the solution more convenient.

Connection with wet lab:
The quorum dynamics model helps us analyze the data we got from the experiments with quorum quenching enzymes. In this way, we can rapidly find the effects of different enzymes with different Km values towards the two AHLs have on the pathogens’quorum sensing systems. And then we can choose a better enzyme to inhibit the toxicity of the pathogens. The biofilm model can help us simulate the growth process of biofilm, so as to predict the growth of biofilm and the influence of quorum sensing on it without doing wet experiments. In this way, it may provide guidance the design of wet lab.

Quorum Sensing

To prove that enzymes from our engineering probiotics can actually decrease the AHLs concentration in the pathogens effectively and help us see the effects of using different enzymes directly (search for better enzymes), we build a quorum dynamics model composed of ODEs to simulate the process of quorum sensing and quenching in Pseudomonas aeruginosa. And we analyze the model with MATLAB. Although with some problems existing, this model still successfully shows that the enzymes can make a big influence of the AHLs in the pathogens that can inhibit the toxicity of them.

We start with these assumptions:
1. We assume that protein turnover is slower than the degradation of mRNA and that there is no significant post-transcriptional regulation;
2. It is also assumed that there is no shortage of substrate for autoinducer synthesis. As a consequence, there is no need to explicitly model the biosynthesis of OdDHL and BHL by the LasI and RhlI synthases;
3. Production of LasR, RhlR, OdDHL, and BHL are assumed to follow Michaelis- Menten kinetics;
4. The OdDHL and BHL’s transmembrane diffusion coefficients of engineered bacteria are the same as those of pseudomonas aeruginosa (, );
5. Spontaneous degradation and background synthesis of non-swarm sensitive factors in extracellular environment and engineered bacteria;
6. Group quenching enzyme is Mihalanase. [1]

Our model layout is as follows:

For Intracellular pathogens:

LasR:
According to the law of mass action, R1 forms the complex formation at a rate of k1, while the complex decomposes to R1 at a rate of k2. R1 degrades at the rate of bR1. Based on the Michaelis Menten equation, the production of LasR is also determined by C1, VR1, and KR1. VR1 is the maximum rate of LasR production. KR1 is a constant that determines the affinity between the complex formation (C1) and the LasR promoter. And we assume LasR to be constitutively expressed at rate R10. Thus the system corresponds to

RhlR:
Similar to R1, the system of RhlR corresponds to

Complex formation:
According to the law of mass action, C1 associates at rate k1 and dissociates at rate k2. In the same way, C2 associates at rate k3 and dissociates at rate k4, so the system corresponds to

OdDHL:
According to the law of mass action, A1 forms the complex formation at a rate of k1, while the complex decomposes to A1 at a rate of k2. A1 degrades at the rate of bA1. Based on the Michaelis Menten equation, the production of OdDHL is also determined by C1, VA1, and KA1. VA1 is the maximum rate of OdDHL production. KA1 is a constant that determines the affinity between the complex formation (C1) and the OdDHL promoter. And we assume OdDHL to be constitutively expressed at rate A10. A1 diffuses outward at the rate of d1β. d1 is the Diffusion coefficient, and β is the ratio of pathogen to extracellular volume. Thus the system corresponds to

BHL:
Similar to A1, the system of BHL corresponds to


For extracellular part:

OdDHL:
A1 flows out of the cell at the rate of d1β. d1 is the Diffusion coefficient, and β is the ratio of pathogen to extracellular volume. Thus the system corresponds to

BHL:
Similar to A1ex, the system of extracellular BHL corresponds to

After the addition of engineered bacteria, considering that the extracellular OdDHL and BHL enter into the engineered bacteria through diffusion, equations (7)-(8) are changed into equations (9)-(10) as follows:

OdDHL:

BHL:


For intracellular engineered bacteria:

OdDHL:
Based on the Michaelis Menten equation, the decomposition of OdDHL in engineered bacteria is determined by VA1in, and KA1in. VA1in is the maximum rate of OdDHL decomposition. KA1in is a constant that determines the affinity between the quenching enzyme and the OdDHL promoter. A1in diffuses outward at the rate of d3α. d3 is the Diffusion coefficient, and α is the ratio of engineered bacteria to extracellular volume. Thus the system corresponds to

BHL:
Similar to A2ex, the system of BHL in engineered bacteria corresponds to
In conclusion, the system without engineered bacteria is expressed by (1)-(8), and the system adding engineered bacteria is expressed by (1)-(6), (9)-(12).


Results:

Figure 1 The concentration of each substance in the system without adding engineered bacteria

The figure above shows the concentration of each substance in the system without adding engineered bacteria. It can be seen that after reaching equilibrium, the intracellular OdDHL, BHL concentrations and extracellular concentrations of pathogenic bacteria are approximately equal.

Figure 2 The concentration of each substance in the system adding engineered bacteria

The figure above shows the concentration of each substance in the system after adding engineer bacteria. It can be seen that compared with Figure 1, the equilibrium concentration of most substances is significantly reduced, and the equilibrium time is earlier. The following is to illustrate by comparing the concentration change curves of each substance before and after the addition of engineer bacteria.

Figure 3 Comparison of the concentration before and after adding engineered bacteria


As can be seen from the above comparison figure, after the addition of engineered bacteria, the equilibrium concentration of LasR and RhlR as well as the time to reach the equilibrium do not change much, that is, the engineered bacteria could be considered to have insignificant influence on the intracellular LasR and RhlR of the pathogenic bacteria. However, the equilibrium concentration of OdDHL, BHL both in and out of the cells of pathogenic bacteria decrease obviously, and the time to reach the equilibrium is shortened. This indicates that the engineered bacteria can evidently change the equilibrium state of OdDHL and BHL, and have a significant effect on the concentration reduction.


(A)


(B)


(C)


(D)

Figure 4 The sensitivities of OdDHL(A) and BHL(B) to the Km value of the enzyme towards OdDHL and the sensitivities of OdDH(C) and BHL(D) to the complex of the OdDHL/LasR.

After the modeling process above, we also use SimBiology in MATLAB to do the parameter sensitivity analysis. We want to use the analysis to see how much the changes of the Km value of the enzyme can affect the concentrations of both AHLs which will prove the importance of choosing a better enzyme to degrade AHLs. The results (not shown totally) show that OdDHL is quite sensitive to the Km value of the enzyme while BHL is not. This makes sense because this parameter is used to describe how well the enzyme can degrade OdDHL. But this also means that BHL is not sensitive to the change of OdDHL. As the connection between the two systems is the complex of OdDHL/LasR, we then analyze the sensitivity of the two AHLs to the complex. It turns out that both AHLs are not sensitive to the concentration of the complex. Together with the data above showing that the concentration of the complex of OdDHL/LasR is very low in the steady state, we find that this model doesn’t manifest the connection between the two AHL systems as we expect. This may be due to the parameters we get from the reference don’t fit our situation as they add more factors to their model. So the next step is to modify the parameters with more suitable data by experiments or other literature which can be done next year.

Table 1 Variables

Table 2 Parameters

Biofilm

In order to explore the growth rate of biofilm and the influence of the quorum sensing of pathogens on the growth of biofilm, a partial differential equation model of biofilm is established to simulate the growth process of biofilm. The idea of the model comes from the diffusion equation, which is based on the law of conservation of mass. And MATLAB is used to solve the model numerically. Although the model is difficult to solve, which means the ideal result is not completely obtained, it is still possible to determine the linear correlation between the growth rate of biofilm and time, and the growth perpendicular to the surface will be limited.


The basic assumptions are as follows:
1. Biofilms grow in a static medium and that their growth perpendicular to the surface is restricted by saturated nutrients in the local population density.
2. The bacterial population is composed of two cell subsets corresponding to down-regulated and up-regulated cells respectively. AHL was down-regulated and up-regulated at a fixed rate.
3. The rate of division of the two cell subsets is the same.
4. Biofilm is considered to be uniform, so cell density NT can be interpreted as proportional to the local height H of biofilm.
5. The growth model of biofilm is only in one dimension.


Modeling ideas:
The process of biofilm’s growth is through a splitting unit pushing its adjacent units aside to accommodate the extra volume. Thus, cell growth produces movement within the colony at a rate described as v tangent to the solid surface. So the growth of biofilms can be described by diffusion process, which is based on the law of conservation of mass, as following:

change in u = u generated locally + u from transporting

Supposing Ω is a single unit, is its border ∂Ω, n is the normal vector directing to the outward, f is the u generated per unit volume, J is the u flowing across Ω. Then from the conservation law above, in the case of locally generated u, we can get

By Gauss formula
we can change the formula above into
that is

Our PDE Model:

The generation of down-regulated cells per unit volume includes the increase caused by cell division and the conversion of up-regulated cells to down-regulated cells, as well as the decline caused by the conversion of down-regulated cells to up-regulated cells induced by AHL. The division rate of up-regulated and down-regulated cells is the same, and it is denoted as taking the influence of carrying capacity into consideration. And one up-regulated cell can divide into γ up-regulated cell and 2-γ down-regulated cell. The conversion rate of down-regulated cells to up-regulated cells induced by AHL is α, and the conversion rate of up-regulated cells to down-regulated cells is β. Thus the system of down-regulated cells is

The generation of up-regulated cells per unit volume includes the increase caused by cell division and the conversion of down-regulated cells to down-regulated cells, as well as the decline caused by the conversion of up-regulated cells to down-regulated cells induced by AHL. The rate of division of up-regulated is the same as ones of down-regulated cells. As saying above, one up-regulated cell can divide into γ up-regulated cell. So the number of up-regulated cells that actually proliferate once they complete a division is 1- γ. The conversion rate of down-regulated cells to up-regulated cells induced by AHL is α, and the conversion rate of up-regulated cells to down-regulated cells is β. Thus the system of up-regulated cells is

Due to
adding (1) and (2), we can get

The change of AHL per unit volume includes includes the increase caused by diffusion and the secretion of both up-regulated cells and down-regulated cells, as well as the the decline of degeneration, escape into the external media and consumption of inducing the transformation of up-regulated cells to down-regulated cells. Where, the diffusion coefficient is D, and the secretion rate of down-regulated cells and up-regulated cells is and separately. The degeneration rate of AHL is λ, and the escape rate is assumed to be inversely proportional to the local cell density, where the proportionality coefficient is Q. Thus the system of AHL is

Let the inverse density function, represents the proportion of growth that occurs tangential to the surface. So the system of v can describe as
Through the discussion above, the system of equations is
In here, we choose the form of and to be
In which K is the saturation density per area. The initial conditions and boundary conditions of the system are as follows
Non-dimensionalisation

Let
Thus the system changes to
And the initial conditions and the boundary conditions change to

Results:

Through numerical simulation of the above model by MATLAB, we can get the variation of biofilm size with time, which is shown as below:

Figure 1 The variation of biofilm size with time

As can be seen from this figure, with time going on, the size of biofilm is approximately linear with time, that is, the growth rate of biofilm is approximately constant. Substituting this result back into the system non-dimensionalised, it can be seen that at a fixed position, with the increase of time, and will eventually approach 0, which means an equilibrium state is reached. It also confirms the first assumption, the growth of both down-regulated and up-regulated cells perpendicular to the surface is restricted, is reasonable.

Table 1 Variables

Table 2 Parameters

gro Language for Visualization

Q: Why did you decide to use this language?
A: gro allows us to make a vivid movie for our quorum quenching system and make some calculations about cell behaviour that are hard to be taken into account in traditional kinetics models.

Q: Why did you consider spatial distribution as an interesting problem?
A: We started paying attention to it when executing programs with gro. We are very curious about whether the spatial distribution of the quorum quenching system will have a profound effect on the dynamics of this system and whether there is an optimal pattern for the highest quenching efficiency. We can’t increase engineered bacteria without limitation because of the risk of inflammation storm. What’s more, The spatial distribution of different cells/molecules in the biological system is a hot issue in biology.

Q: How did you do for spatial distribution this year?
A: We used gro to simulate what was happening in the laboratory and avoided problems we encountered when trying to use math to depict complicated cell-cell interaction. However, we should seek for more accurate methods for this problem in the future.

Connection with wet lab:
gro models help us to estimate the results of spatial distribution experiments, which we don’t perform this iGEM season. It also might help us to explore if there is a best scheme when this method(quorum quenching) is applied to clinical treatment.

Quorum quenching system in gro language

gro is a cell programming language, which is designed for modeling and simulating the behaviors of cells in silico [2]. This programming language allows users to consider growth, division, signal receiving and sending, chemical reaction, and gene expression. Detailed behaviors of cells are so essential in synthetic biology that increasing people attach more importance to it. In fact, iGEM team from Imperial College London firstly used this language as an important part of their model work in 2016, hoping that this program can act as a complement to their Simbiology simulation.

Initially, we took this special language into consideration for our project for several reasons:
gro allows us to visualize our microorganism system, thus helping us and readers to have a better understanding of quorum sensing and quorum quenching;
Kinetics models mentioned above mainly focus on chemical substances and their quantitative changes, while gro can provide us more information about cells (bacteria).
Consequently, codes were written to display our quorum quenching system. You can play the video below to have a glimpse of the visual illustration of the system:

Movie 1 A movie of quorum quenching system. Please see the text for more information.
Red cells, green cells and blue cells in this video represent pseudomonas aeruginosa, engineered bacteria and human cells respectively. Pink background means that AHL is diffusing, while we used blue background to denote where engineered bacteria were destroying the system of quorum sensing. Black cells refer to all kinds of cells that can’t produce fluorescent molecules, we can ignore these cells in the movie, but they were included in the calculation. As can be seen in this video, in the beginning, red cells kept reproducing, while blue cells which are exposed to the pink signal background kept shrinking and even collapsed. After engineered bacteria, those green cells, came to rescue human cells by degrading those AHL signal molecules, blue cells were able to reproduce normally.

(A)

(B)

(C)

(D)

(E)

(F)

Figure 1 Results of gro simulation. Please see the text for more information. (A) Result of a system without engineered bacteria.(B-F) Selected results of a system with engineered bacteria.

Please see these quantitative results for better understandings. The figures titled “pathogen/human cells” inform us that, human cells nearly stopped reproducing and growing in the beginning because of AHL, so the ratio of pathogens(namely pseudomonas aeruginosa )/human cells kept rising. However, engineered bacteria save them, after which the ratio kept decreasing in each figure. We also plotted the relationship between time and ratios of different microorganisms(pseudomonas aeruginosa, human cells, and engineered bacteria) and overall cells. It is obvious that the figure of human cells in each graph experiences decrease and increase in turn. This change owes to engineered bacteria. A system without engineered bacteria was also created to make comparisons.
Spatial distribution in gro language

Interestingly, we found something worth exploring further when executing the simulation above. It was clear to see that the spatial distribution of different microorganisms had a conspicuous effect on the things that occur subsequently.
Please image a game where you control one character to fight against two or more opponents. How to make the most of your weapon is always a tricky problem, depending on your attack scope, your opponents’ attack scope, and other factors.

This scene also exists in the biological world. In our quorum quenching system, engineered bacteria have enzymes that can degrade AHL and destroy quorum sensing of pseudomonas aeruginosa. However, if they are too far away from pseudomonas aeruginosa, it is difficult for them to use their enzymes effectively and then protect human cells, owing to the degradation and transport limitation of proteins. Similarly, AHL from pseudomonas aeruginosa can’t expand as they want. It seems that we now have no mature techniques to accurately control relative positions of engineered bacteria as we want, but we still attach much importance to this problem for these reasons:
Spatial distribution is a common problem in almost all kinds of cell-cell dialogues. Hence, it is meaningful to explore this problem as we can in our system. Additionally, the scale of microorganisms we take into consideration is much larger than single cells. Quorum sensing following by the formation of biofilm makes it possible for pseudomonas aeruginosa to take up a relative place on the organ’s surface. It is believed that with the advent of PK/PD progress, some medical treatments have the potential to achieve accurate control of the drugs.
Firstly, we tried to form a concise mathematical model for this spatial distribution problem.
The diffusion of AHL satisfies Fick’s law.

Considering the degradation of molecule, we modified this equation like this:
where α is the degradation coefficient and C denotes the concentration of AHL. Boundary conditions here are a steady concentration of AHL at its source at x = 0, C(x = 0) = Co, and zero boundary conditions far into this field, C(∞) = 0. So at steady-state,this equation became:
However, we were confronted with difficulties when considering the geometrical of those microorganisms. We came up with some assumptions, for instance, circles are used to represent colonies and they would stop expanding when it was tangent to other circles. These assumptions were poorly fit to the fact, due to the fact that interaction among microorganisms is extremely complex. This interaction is so stochastic that we finally turned to some computational tools for help.
gro is a helpful tool here for us to have a better understanding of spatial distribution. The algorithm of gro language is complicated. One significant example is that it takes physical interaction among cells into consideration. gro includes important noises calculation and random interaction when dealing with cell-cell interaction. This feature enables the simulation process highly similar to what is happening in the laboratory. Furthermore, signals in gro program will be represented by different colors, which makes it easy for us to directly observe the distribution of signals when running the program.

We wrote codes to display different situations where engineered bacteria and pseudomonas aeruginosa were positioned in the virtual plate. There is no denying that the initial distribution of different microorganisms (and chemical substances secreted by them) have a profound effect on the process of simulation, such as the pattern of different signals.
Movie2 Two examples of spatial distribution.

Yellow cells in these movies represent pseudomonas aeruginosa, while red cells represent engineered bacteria. Black cells refer to all kinds of cells that can’t produce fluorescent molecules, we can ignore these cells in the movie, but they will be included in the calculation. Pink signals are AHL, while blue signals mean engineered bacteria can destroy quorum sensing in these areas. Two movies above show different patterns after running because of their initial difference of spatial distribution.

(A)

(B)

(C)

(D)

(E)

(F)

(G)

(I)

Figure 2 Different patterns of spatial distribution and corresponding results of gro simulations. Please see the text for more information. (A)-(E) Five kinds of distributions and their results. (F)-(G) These two patterns share the same structure as (E), but the distance of two microorganisms has changed to different values. (I) A summary of (A)-(G)

We also wrote codes to calculate the ratio of pseudomonas aeruginosa and engineered bacteria(bad and good). The results above tell us that different spatial distribution of two kinds of bacteria will have a profound effect on what occurs. What’s more, we ran the program in each pattern for several times, and it is clear that the figure is keeping fluctuating because of stochastic process. This also suggests that it is hard to depict this process exactly without computational biology, so this is the direction we will head for in the future.
If you are interested in our codes, please contact us without hesitation.
Ancestral Protein Reconstruction

Q: Why did you perform this bioinformatics analysis?
A: Ancestral protein reconstruction and enzymatic promiscuity are two notions that gain more and more popularity nowadays. We want to create an enzyme that is not only efficient but also broad-spectrum, so we tried to reconstruct ancestral proteins to analyze in silico at first.

Q: What software/web-server did you use?
A: NCBI BLAST for sequences collection;
MEGA-X MUSCLE for multiple sequences alignment;
Mrbayes and GRASP server for reconstruction;
Figtree and iTOL for tree drawing;
EXPASY SWISS-MODEL and Phyre2 for protein structure prediction.

Connection with experiments:
Obtaining ancestral proteins is an essential step for us to modify/create enzymes. Ancestral proteins provide us more residues information which is not available when we perform traditional biochemical researches, like site-specific mutagenesis. These sequences can be analyzed and then synthesized in wet lab to perform enzymatic experiments.

Promiscuity is an interesting property of some enzymes, ancestral proteins particularly. Most evolutionists suppose that proteins tend to be more and more specific during the process of molecular evolution. That is to say, ancestral proteins are more likely to be able to recognize different substrates and catalyze different reactions, compared with extant enzymes. Utilizing promiscuous enzymes is an appealing strategy of all kinds of organisms. The limitation, ranging from sequence space to energy and nutrients, make it necessary for ancient organisms to utilize multifunctional enzymes to cover basic needs for lives. This phenomenon reveals an interesting trade-off during evolution. There are some downsides of promiscuous enzymes. The most obvious one is that its catalytic efficiency is relatively lower than more substrate-specific modern enzymes. However, if a biologist can obtain an ancient enzyme, he can modify its property with the aid of many methods, like directed evolution, mechanism-based site-specific mutagenesis and so on. So an ancient enzyme is promising in synthetic biology, despite its unsatisfying catalytic efficiency.
So how to get an ancestral protein sequence? With the advent of bioinformatics and advanced mathematical modeling in biology, resurrecting ancient molecules becomes more and more tangible. The phylogenetic analysis provides us an opportunity to figure out the evolution of not only species, but also more small things, including circuits, molecules, and so on. In the field of protein engineered, ancestral protein reconstruction is regarded as a useful method and a perfect complement of directed evolution [4,5].
In our project, most enzymes we chose to use this year, like PvdQ, are all substrate-specific, while AHL contains a series of organics with different lengths of carbon chains, including OdDHL and BHL. Although AiiA is considered as an enzyme that has the potential to catalyze different reactions, its catalytic activity is much higher when interacting with substrates possessing long carbon chains. So ancestral protein reconstruction may help us to explore some key residues that play a significant part in promiscuity and even design an enzyme whose activity is high no matter what AHL it meets in the near future.

Now procedures of PvdQ sequences reconstruction will be given as an example to illustrate our methods and results.
Firstly, we selected extant sequences to build our analytic set. Based on the PvdQ sequence, homologous sequences were retrieved from NCBI ref-seq database, with the help of BLAST. We selected sequences with 30-90% identical residues and removed sequences with higher identity, as suggested by most papers. The number of sequences in our analytic set ranged from 35-60, which was a relatively small set.
Then, a multiple alignment was created for subsequent analysis. To achieve this goal, we utilized MEGA-X software [6]. Protein sequences were aligned with the support of the MUSCLE tool in MEGA-X. Furthermore, the method of maximum likelihood(ML) was used to infer the best substitution model for phylogenetic tree construction. The table below presents the top ten models for this sequence set, according to different indexes, including AICc, BIC, and LnL. It was clear to see that the model LG+G+I+F was the best substitution model in this run, namely the LG model with Gamma-distributed rates and frequency plus Invariant Sites. We used this model when reconstructing ancestral sequence in GRASP web-server.

Table 1 Substitution model results from MEGA-X.


The next stage was to compute a phylogenetic tree. There are lots of state-of-the-art computational approaches to fulfill this target, including ML and Bayesian inference. We used Mrbayes 3.2.7 [7], the mechanism of which is based on the notion of posterior probabilities, as our analytic platform. Most default settings were maintained for our analysis, with the exception of rates and some parameters in the command mcmc. Three tables below show key parameters in our run.

Table 2-4 Parameters used in Bayesian analysis.

Our tree showed below was edited in iTOL sever [8].

Figure 1 Phylogenetic tree of PvdQ protein, generated by iTOL.

Finally, we tried to reconstruct ancestral sequences. The extant sequence set and the tree determined above formed the basis for the computation of ancestral sequences. Most advanced phylogenetic platforms can both create the tree and analyze the ancient states, such as Mrbayes. As for our project, we constrained a set of all taxa in the sequence set, let the program report the results and got the ancestral sequence for all taxa after analysis.

What’s more, there are increasing web-server designed for biologists to reconstruct ancestral sequence, like FASTML and PhyloBot. We also used Graphical Representation of Ancestral Sequence Predictions (GRASP) [9], an up-to-date web-server open to the public in 2020 for our reconstruction.

Here we present results for PvdQ analysis with GRASP web-server.


Figure 2 Results of GRASP reconstruction, including a tree, a multiple sequence alignment and a box for ancestral sequence presentation.


Solid edges are fully supported by bi-directional edge-parsimony. These links are strong candidates for how an ancestor can be assembled. A dotted edge is maximally parsimonious when only one of the nodes it joins is considered. These edges are weaker candidates but are kept to allow full-length ancestors to be formulated in the absence of the stronger option above.

GRASP employs exact probabilistic inference; it distinguishes between two types of reconstructions:
Joint reconstruction: GRASP infers the most probable combination of all ancestral states. The assignment of all ancestral states (across the whole phylogenetic tree) optimizes a single objective (the joint state with the greatest likelihood). This will tell users which are the most probable ancestors, given the extant sequences and their phylogenetic relationships.
Marginal reconstruction: GRASP infers the probability distribution of each residue for a single, nominated ancestor. This inference will present you with the probability of amino acids at each position in the nominated ancestor, given extant sequences and their phylogenetic relationships.
For detailed information, please see http://grasp.scmb.uq.edu.au/.You can also click here to download files we provide for you and have a try to reconstruct ancestral sequence with GRASP immediately.

The link to pvdq formal.fas and pvdq.nwk

After obtaining sequences of all nodes of the tree, we selected some appropriate sequences(nodes) according to the branch length among nodes and the place where the node is positioned in. Generally, sequences that appear later during evolution were discarded, while attempts were made to select ancestral sequences that were ancestors of many extant sequences. In the meanwhile, we made sure that these extant sequences were not strongly related to each other in order to allow more significant mutations to stay in the ancestral sequences. We finally chose seven ancestral sequences of AiiA and five sequences of PvdQ.
Then, BLAST was used to detect the highest identity between extant sequence and ancestral sequence. Results are showed in the table below.

Table 5 The highest sequence identity of selected ancestral protein sequence.

Sequence PvdQ_N0, PvdQ_N1, AiiA_N4, AiiA_N6, and AiiA_N8 were well-suited to our analysis, with the highest identity 84.80%, 78.93%, 95.60%, 94.00%, 80.08% respectively. The reason why we chose these five sequences was that their corresponding nodes during evolution suggested that they might be more likely to show enzymatic promiscuity. What’s more, the highest sequence identity can’t be too low to ensure the accuracy of protein structure prediction, while sequences highly similar to extant sequences may make us ignore some key mutations. Actually, attempts could be made to examine all sequences to avoid mistakes or omissions, but it took us too much time to achieve that goal this year.

Next, SWISS-MODEL [10] was used to predict protein structures toughly. After that, we utilized Phrye2 (Protein Homology Recognition Engine V 2.0) web-server [11] for accurate protein structure prediction. It is a state-of-the-art protein bioinformatics tools, using remote homology detection methods to build protein structures, predict ligand binding sites, and analyze the effect of mutations on structures.

Key steps of this server include:
Backbone determination (α-carbon), namely placement of corresponding residues according to structure templates.
Loop modeling for insertions or deletions spare parts algorithm; from a special loop library.
Side-chain positioning. For those conserved side chains, the algorithm will run according to structure templates; for those varied ones, other sophisticated algorithms will be used.
Model refinement, namely energy minimization.

We submitted five sequences mentioned above to this server and performed intensive modeling, after which results were sent back in a few hours. The highest structure identity of each protein was showed below. It is acknowledged that sequences with more than 25% identity over an alignment of 80 residues or more adopt the same basic structure. All of the figures of identity of our sequences reach 70%, so it is obvious that the proteins we submitted were all well-suited to this modeling method, with 100% confidence in this web-server. These structures will be used to analyze and modify in the next part.

Table 6 The highest structure identity of selected ancestral protein sequence.

Table 7 RMSD between different protein structures.

Structural Biology and Molecular Docking

Q : What did you do to analyze structure?
A: Sequence alignment, structure alignment, molecular docking, virtual mutation and so forth have been tried this year. We select important residues through 2 ways: differences found in the alignment of extant sequence and ancestral sequence; literature. These residues were analyzed as further as we can in phase 1.

Q : Why did you reconstruct ancestral protein before the structural biology survey?
A: Some residues that exist in ancestral sequences may play an important role in broad target scope. So ancestral sequences may inform us of some key residues that can’t be inferred in a traditional way. In fact, we successfully found some residues worth exploring, such as PHE68, GLU136, and TYR194.

Q : How could in silico methods be used in an improved screening workflow?
A: It's proposed that the process of drug screening starts from in silico methods to generate massive screening, e.g. from 1000 candidates to 100. Then in vitro platforms are used to facilitate rapid and high-throughput testing of in silico results, e.g. from 100 candidates to 10. Ultimately, the effects of several promising candidates are verified via in vivo systems.

Connection with wet lab:
We plan to design and synthesize sequences according to our analysis, and some enzymatic experiments will be done to check if our analysis is right. We also can perform simulations that are led by experiments.
The results generated by AutoDock Vina will be directly tested in cell-free quorum sensing pathways, then verified in Pseudomonas aeruginosa if promising.

Structural analysis of extant/ancestral lactonase and acylase

We successfully got some ancestral sequences and corresponding protein structures in part 3. Structural analysis was now important for our design. Information obtained from in silico analysis can inform us what residue is worth paying attention to and how can we optimize our enzymes. In this part, Discovery Studio and PyMOL were used to perform a series of projects including structural alignment, virtual amino acid mutation, molecular docking and molecular dynamics(MD) and so forth.

The enzyme we analyzed here was AiiA, a metal-dependent N-acyl homoserine lactone hydrolase that displays broad substrate specificity but has a preference for substrates with long N-acyl substitutions(carbon chains). The reason why AiiA is more appropriate for our analysis is that it has a relatively simple structure and its catalytic mechanism has been clearly revealed. An the addition−elimination reaction leads to ring-opening, with two zinc ions bound together in a the dinuclear cluster at the enzyme’s active site [12].

Figue1 Reaction mechanism of C6-HSL catalyzed by AiiA.

Firstly, we performed advanced molecular docking for AiiAWT protein(PDB code: 3DHB) and C6-HSL. Small molecules were drawn and optimized in Kingdraw software. Water molecules were removed from protein, while some residues and hydrogen atoms were added to the protein to make its structure complete. Receptor binding sites are very important in molecular docking and molecular dynamics. In this part, we defined the receptor binding site according to where the ligand was positioned in the protein. We also checked to make sure that key residues were included in the site we defined. The radius of the sphere was set to 8 angstroms. Molecular docking simulations of the 10 lowest-energy poses were done using the grid-based approach CDOCKER, a molecular dynamics simulated annealing-based algorithm in which the receptor is held rigid while the ligands are allowed to flex during the refinement, implemented in Discovery Studio. All of the structures were energy-minimized using force field CHARMm. The RMSD threshold was set to 0.5 angstroms to ensure as much diversity as possible in the conformation.

Figure 2 The ligand-binding pocket of AiiA protein, generated by PyMOL. Green molecule is C6-Hse, yellow molecules and white molecules are important residues mentioned below.

(A)

(B)

(C)

Figure 3 (A)-(C) Residue interaction histogram for AiiA and C6-HSL docking.

Figure 4 Heatmap of interaction between important residues of AiiA and C6-HSL. Please note that in this picture we mistake the molecules’ name, so the name “3DHB” refers to C6-HSL.

We analyzed every ligand poses and the results are pictures showed above. It is clear to see that PHE64, PHE68, HIS106, PHE107, ASP108, and GLU136 strongly interact with the ligand, consistent with the conclusions of many pieces of research. HIS106, PHE107, ASP108 contribute most hydrogen bonds, while both PHE64 and PHE68 provide hydrophobic interaction.

Next, we performed CDOCKER for AiiA_N8 protein and C6-HSL. The difference between these two simulations was that we defined the receptor binding sites here based on protein cavities, instead of the method used in the first docking task.

Figure 5 Interaction between important residues of AiiA_N8 and C6-HSL.

(A)

(B)

(C)

(D)

(E)

(F)

Figure 6 (A)-(F) Residue interaction histogram for AiiA_8 and C6-HSL docking.

Figure 7 Heatmap of interaction between important residues of AiiA_N8 and C6-HSL. Please note that the name “molecule1” refers to C6-HSL.


Most results here were similar to those of AiiA protein.
Subsequently, we aligned sequence of AiiA protein and AiiA_N8 protein, and carefully examined their differences. Here, EsPript 3 web-server [13] was used to make the align picture showed below.

Figure 8 Alignment of AiiA and AiiA_N8 sequences.

We found that most different residues of these two sequences were located outside the activity site or even the surface of this protein, such as Ser14 and Arg14. We proposed that most of these residues acted as a role of stabilizing the protein, that is to say, these mutations were likely to be neutral. Three pair of promising residues which were sited near the activity site, were found after examination: ALA20 and Ser20, Arg 68 and PHE68, Asp136 and Glu136(the former residue corresponds to residues of AiiA_N8). These results were recorded for further analysis.
Then, virtual amino acid mutations based on interaction forces were rendered to AiiA protein. We performed this simulation to see if there were some mutations that can enhance the interaction between AiiA protein and C6-HSL. Firstly, force field CHARMm was applied to the protein-ligand system. Amino acid residues within 3 angstroms around the ligand were selected and mutated to ALA in turn(single mutation). The result is shown below.

Figure 9 Report form of selected residues of AiiA mutated to ALA. Effects are judged by the energy when bound to C6-HSL.

From the report form above, we saw that seven residues that were mutated to ALA destabilized the system, including the last five residues in the table above. This suggested that these seven residues were important in the interaction between protein and ligand. So we analyzed these seven residues further. We successively mutated these seven residues to nineteen amino acids and calculated mutation energy(binding).

Figure 10 Report form of 7 key residues of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C6-HSL.

As can be seen from this table, there were some mutations that stabilize the binding, which means that these mutations are likely to improve the efficiency of this enzyme towards C6-HSL. These conclusions can help us to create an enzyme that is more substrate-specific. However, seven residues mentioned above were all the same in two sequences, AiiA and AiiA_N8. So we had to seek for other approaches to explore.
We decided to examine three pairs of different residues mentioned above carefully. The first one we analyzed was residue 68, because it was located in the activity site. PHE68 of AiiA was mutated to other nineteen amino acids and we calculated the mutation energy.

(A)

(B)

Figure 11 (A)-(B) Report form of PHE68 of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C6-HSL.


Interestingly, all mutation of PHE68 showed that the mutation didn’t stabilize the protein. It seemed that PHE68 for protein in terms of binding to C6-HSL was a good choice.
At the next stage, we changed the substrate to C5-HSL and performed the same simulations.

Figure 12 Heatmap of interaction between important residues of AiiA and C5-HSL.

Compared with the heatmap of AiiA-C6-HSL, this residue was more active: ILE73; these residues were less active: TRY194, HIS 169 and GLU136.
We mutated residues residing in the sphere located in the activity site, with the radius of 4.5 angstroms, to ALA.

Figure 13 Report form of selected residues of AiiA mutated to ALA. Effects are judged by the energy when bound to C5-HSL.

As we can see, PHE68 mutation didn’t have an influence on the binding, in contrast to the simulation of AiiA-C6-HSL. It suggested that PHE68 might not act as an inevitable role in binding to AHL with relative short carbon chains.
As a consequence, we mutated PHE68 to other nineteen amino acids. Results were shown below.

(A)

(B)

Figure 14 (A)-(B) Report form of PHE68 of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C5-HSL.


All mutations were neutral, which convinced us that PHE68 was not important in terms of binding and catalysis. According to evolutionary theory of enzymes, PHE68 may exist to stabilize the whole molecule.
Similar simulations were done for the substrate C8-HSL/C10-HSL and results were shown in pictures below.

Figure 15 Heatmap of interaction between important residues of AiiA and C8-HSL.

Figure 16 Report form of selected residues of AiiA mutated to ALA. Effects are judged by the energy when bound to C8-HSL.

Figure 17 Report form of PHE68 of AiiA mutated to other 19 amino acids.

Figure 18 Heatmap of interaction between important residues of AiiA and C10-HSL.

Figure 19 Report form of selected residues of AiiA mutated to ALA. Effects are judged by the energy when bound to C10-HSL.

Figure 20 Report form of PHE68 of AiiA mutated to other 19 amino acids.


Overall, results shown above tell us two things:
1. Some residues, especially TYR194, play an important role in catalysis and binding when the substrates have long carbon chains. TYR194 is conserved in our sequence data set, but attempts can be made to mutate this residue to see whether it hinders the enzyme to be multifunctional.
2. It seems that PHE is more important when the substrates have long carbon chains, but it may only act as a stabilizing residue. This residue is worth exploring and more accurate methods like molecular dynamics should be utilized. From literature we know PHE68 and PHE 64 form a hydrophobic clamp, making an interaction that would also be available for longer substrates, but not for shorter substrates. This residue is a promising one that should be done with lots of experiments in phase 2.

We also analyzed the other 2 pairs of different residues found in alignment, ALA20, and Ser20, Asp136 and Glu136. Single site mutation was performed in both residue 20 and residue 136. Firstly, we examined the binding of C10-HSL and AiiA.

Figure 21 Report form of SER20 of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C10-HSL.

Figure 22 Report form of GLU136 of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C10-HSL.

Results show that, ALA20 and SER20 are both well-suited to this binding event. However, GLU136 is more important compared with ASP136, because the mutation to ASP destabilized the binding.

Similar simulations were executed to AiiA and C5-HSL.

Figure 23 Report form of SER20 of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C5-HSL.

Figure 24 Report form of GLU136 of AiiA mutated to other 19 amino acids. Effects are judged by the energy when bound to C5-HSL.

Themutation from SER20 to ALA20 didn’t influence the whole system. Interestingly, the mutation fromGLU136 and ASP 136 didn’t affect the binding event either, compared with the binding of C10-HSL and AiiA.

These results may suggest that GLU136 is important for catalysis of substrates with long carbon chains and didn’t play a significant role of binding C5-HSL. This residue can be modified in the future.

Virtual screening for anti-LasR/LasI molecules

In our project, in vitro platforms driven drug screening was considered as a powerful alternative to currently used in silico-in vivo methods. The well-known in silico methods include molecular docking, which is founded on the lock-and-key model. To illustrate the strengths of our cell-free quorum sensing systems, an improved workflow combined with in silico methods needs to be proposed and practiced. The new workflow encompassed in silico docking, in vitro rapid and high-throughput screening, and later in vivo testing for verification. Through investigating literature, we found salicylic acid is an inhibitory ligand to both LasI (PDB ID: 1RO5) and LasR (PDB ID: 2UV0), and how its structural analogs interact with these two receptor proteins remain uncertain. Thus we wanted to screen the structural analogs of salicylic acid in silico and test promising candidates (with salicylic acid as the standard) in the cell-free quorum-sensing systems for proof-of-concept.

AutoDock [14] was a set of well-developed molecular docking softwares, and AutoDock Vina was its newest version with enhanced performance. We decided to use AutoDock Vina to facilitate the virtual screening of quorum sensing inhibitors. Additionally, the free database of commercially available compounds ZINC [15] was used. We downloaded the mol2 files of salicylic acid (ZINC1554) and its structural analogues in ZINC. Furthermore, PyMOL was used to delete the original ligand and waters in LasR.

Figure 1 structural analogues of salicylic acid in ZINC

To predict the binding pocket of receptor proteins for more accurate docking, we used a user-friendly webserver COACH [16] . The results for LasI were displayed here, with the most promising binding center predicted as (44.963, -15.387, -15.641).

Figure 2 predicting the binding pocket of LasI by COACH

The results for LasR were displayed here, with the most promising binding center predicted as (22.403, 15.683, 79.699). Based on these information, the subsequent targeted docking will perform faster and more accurate than blind docking.

Figure 3 predicting the binding pocket of LasR by COACH

Unfortunately, due to uncharacterized bugs and limited time, the results of virtual screening using AutoDock Vina cannot be displayed, also, our efforts to reconstitute the cell-free quorum sensing pathways didn’t lead to success this year. However, we anticipate the above-mentioned in silico workflow combined with in vitro reconstituted pathways has the full potential for rapid and high-throughput screening of candidate inhibitors to quorum sensing-associated proteins, and we will endeavour to accomplish this work after the iGEM season.

An Inflammation Model for fibrosis

Q: Why did you build this model?
A: Fibrosis is an important medical phenomenon. For example, fibrosis of lung is a vital indicator for doctors. Pseudomonas aeruginosa can also make people suffer from fibrosis, so we think it is interesting to learn more about this phenomenon from the perspective of circuits and math.

Connection with experiments:
We plan to perform co-culture of different immune cells in phase 2, so this model adapted from some excellent papers acts as our lamp when designing tasks in the wet lab.

Inflammation is one of the most important medical phenomena, which can cause more serious diseases, including fibrosis and even cancer. Pseudomonas aeruginosa will enter lungs of human beings and activate the immune system, after which lead to inflammation in lungs. Inflammation is set off to fight against pathogens. During this process, blood vessels will expand to give room for different kinds of immune cells. To seal up the injury as fast as possible, fibrous materal is required to cover the injury. This process depends on a cell-cell dialogue, which happens between macrophages and myofibroblasts. Macrophages will secret some kinds of growth factors to activate fibroblasts, after which fibroblasts tend to change their shapes and function, before becoming myofribroblasts. This is a concise description of why the fibrosis happens in tissues. In reality, fibrosis is a key medical imaging indicator in X-ray/MRI medical examination. One of the most significant example is that whether the lung of a patient contracting with Covid-19 is fibrous has become an essential medical criterion. A patient with a fibrous lung is more likely to die of pneumonia. As a consequence, it is vital to learn more about this process (fibrosis) and try to put up with some ideas to prevent fibrosis if one is diagnosed with pneumonia because of bacteria(pseudomonas aeruginosa), virus (SARS-Cov-2) and other factors. In this part, we tried to illustrate this key phenomenon happening in inflammation with our mathematical model adapted from a list of interesting papers [17,18].


Figure 1 An illustration of interaction system of two kinds of immune cells


The myofibroblasts, F, produce a specific growth factor for themselves at the rate of β2 , called PDGF, which we will denote as c1. c1 also can be produced by macrophages, M, at the rate of β1. In addition, it is endocytosed by F at the rate of α1 , and degrades at the rate of γ .Thus the system of c1 corresponds to

Since the time scale of c1’s production and degradation is smaller than that of cell’s division and death, we can assume that c1 has reached their steady state. In other words, growth factors c1 are in quasi-steady-state, which means , thus

The myofibroblasts proliferate at the rate (at certain growth factors) of λ1, and remove at the rate of μ1. Taking the carrying capacity (K) into account, we multiply division rate by the term 1-F/K. And we can normalize F concentration by it carrying capacity K, so that the carrying capacity term is 1. Thus the system of F corresponds to

So if we substitute equation (2) into equation (3), we can get

The myofibroblasts (F) can also produce a specific growth factor for the macrophages (M) at the rate of β3, called CSF, which represented as c2. In addition, c2 is endocytosed by M at the rate of α2, and degrades at the rate of γ . Thus the system of c2 corresponds to

Similar with c1 ,we can assume growth factors c2 are in quasi-steady-state, which means , thus

The macrophages, M, proliferate at the rate (at certain growth factors) of λ2, and remove at the rate of μ2. Unlike myofibroblasts (F), The growth of M is not limited by the carrying capacity.Thus the system of M corresponds to

So if we substitute equation (6) into equation (7), we can get


We can find points of stability in the system by plotting Nullclines. First, by drawing a curve satisfying and , we can find the system's fixed point.Then, by analyzing the Hessian matrix at each fixed point, or analyzing the trend of the phase diagram near the fixed point, we can determine whether the fixed point is a stable point of the system. The results are shown in the figure below.

Figure 2 Nullclines. The blue line is the solve of , and the red line is the solve of . The solid points are the stable fixed points, and the hollow points are the unstable fixed points.


The impact of trauma on ultimate homeostasis is analyzed below. Assuming that the corresponding proliferation rate of M at the time of trauma is I(t), (7) is modified to the following expression

According to the above analysis, the system has two stabled state, namely, final recovery or fibrosis, denoted as OFF-state or ON-state respectively.If the trauma do not last long enough,the dynamics stays within the basin of attraction to the OFF states, just as the diagram 1 shows. So the dynamics will finally go to the OFF state. On the contrary, if the trauma do last long enough, or if the trauma is multiple, the dynamics will cross the separatrix just as the diagram 2,3 show. Thus the dynamics will finally go to the ON state.

Diagram 1 If the trauma do not last long enough, the dynamics, which stays within the basin of attraction to the OFF states, ends up with the healing states(adapted from a list of papers [17,18]).


Diagram 2 If the trauma do last long enough, the dynamics, which cross the separatrix, ends up with the fibrosis states(adapted from a list of papers [17,18]).


Diagram 3 With multiple traumas, the dynamics, eventually crossing the separatrix , ends up with the fibrosis states(adapted from a list of papers [17,18]).


Table 1 Parameters


Kinetics Model of TEV protease

Q: Why did you build this kinetics model?
A: At first, we planned to execute experiments about this module. Teammates in the wet lab were curious about the quantitative features of this system, so we built this model for our experiments. We finally had no time to perform this experiment, so figures can’t be got for calculation. This model will be used in phase 2, when we have enough time to complete this part.


Connection with experiments:
Please see Q&A above.

As described in Part “design”, we utilized PQS as an inducer for a inducible promoter, namely PqsR promoter. The process, PQS binding pqs promoter and activating the transcription of related genes, can be quantitatively described with Michaelis-Menten equation,

where a denotes activators and p denotes pqs molecules. ω is a parameter that should be scanned after obtaining experimental data, while Kd is a parameter similar to Michaelis constant Km.

Similarly, activators produced from the event mentioned above are tend to interact with the induced promoter and lead to transcription of genes associated with immunity, so we can also describe this process with Michaelis-Menten equation,

where c denotes cytokines. κ is a parameter that should be scanned after obtaining experimental data, while Kd2 is a parameter similar to Michaelis constant Km.

Compared with inducible promoter, constitutive promoter can’t be perfectly depicted with Michaelis-Menten equation. Statistical physics gave us a reasonable way to quantify this common biological component19,

where E denotes TEV protease, Nns denotes the number of DNA, P is the number of polymerase, this ratio can be estimated from literature. β is boltzmann factor and it should be multiplied by the energy difference of 2 states(namely, DNA bound to or not bound to polymerase).

TEV protease will separate cytokine and sec-tag, thus hindering the excretion of cytokines. This reaction conforms to the principles of Michaelis-Menten equation, so a Michaelis-Menten equation is built for it,

where kcat and Kd3 are parameters that should be determined from experimental data.

Due to the fact that transportation of cytokines among cells requires both energy and nutrients, that is to say, this process will be confronted with the limitation, the excretion of cytokines can be modeled like this,

where η is the constant of degradation.

Finally, the concentrations of total cytokines, cytokines(in the cell) and cytokines(outside the cell)can be expressed:

where Ci denotes cytokines in the cell and Ce denotes cytokines located outside the cell.

Future

We believe that our models are promising enough, we can at least do these things in phase II:

1. Adjust parameters of our quorum sensing kinetics models in according with experiment figures.

2. Find better methods to replace gro like Simmune suit developed by NIH(we have no time to learn more about this suit this year) or use an advanced version gro developed in 2017.

3. Expand our analysis sequence set and synthesize ancestral sequences for experimental proof.

4. Keep analyzing protein structures and try to use more advanced methods like molecular dynamics. Necessary experiments are required to demonstrate our structural analysis. Molecular docking of LasR/LasI should be done as well.

5. Try to do some experiments related to TEV protease and inflammation system to show that our models built for these two designs are right.


Reference

1. Fagerlind, M. G. et al. The role of regulators in the expression of quorum-sensing signals in Pseudomonas aeruginosa. Journal of molecular microbiology and biotechnology 6, 88-100 (2003).
2. Jang, S. S., Oishi, K. T., Egbert, R. G. & Klavins, E. Specification and simulation of synthetic multicelled behaviors. ACS synthetic biology 1, 365-374 (2012).
3. Alon, U. An introduction to systems biology: design principles of biological circuits. (CRC press, 2019).
4. Merkl, R. & Sterner, R. Ancestral protein reconstruction: techniques and applications. Biol Chem 397, 1-21, doi:10.1515/hsz-2015-0158 (2016).
5. Joy, J. B., Liang, R. H., McCloskey, R. M., Nguyen, T. & Poon, A. F. Ancestral reconstruction. PLoS computational biology 12, e1004763 (2016).
6. Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Molecular biology and evolution 35, 1547-1549 (2018).
7. Ronquist, F. & Huelsenbeck, J. P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572-1574 (2003).
8. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic acids research 47, W256-W259 (2019).
9. Foley, G. et al. Identifying and engineering ancient variants of enzymes using Graphical Representation of Ancestral Sequence Predictions (GRASP). bioRxiv, 2019.2012. 2030.891457 (2020).
10. Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic acids research 46, W296-W303 (2018).
11. Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. The Phyre2 web portal for protein modeling, prediction and analysis. Nature protocols 10, 845-858 (2015).
12. Liu, D. et al. Mechanism of the quorum-quenching lactonase (AiiA) from Bacillus thuringiensis. 1. Product-bound structures. Biochemistry 47, 7706-7714 (2008).
13. Gouet, P., Robert, X. & Courcelle, E. ESPript/ENDscript: extracting and rendering sequence and 3D information from atomic structures of proteins. Nucleic acids research 31, 3320-3323 (2003).
14. Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry 31, 455-461 (2010).
15. Sterling, T. & Irwin, J. J. ZINC 15–ligand discovery for everyone. Journal of chemical information and modeling 55, 2324-2337 (2015).
16. Yang, J., Roy, A. & Zhang, Y. Protein–ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics 29, 2588-2595 (2013).
17. Adler, M. et al. Endocytosis as a stabilizing mechanism for tissue homeostasis. Proceedings of the National Academy of Sciences 115, E1926-E1935 (2018).
18. Adler, M. et al. Principles of Cell Circuits for Tissue Repair and Fibrosis. Iscience 23, 100841 (2020).
19. Phillips, R., Kondev, J., Theriot, J. & Garcia, H. Physical biology of the cell. (Garland Science, 2012).