A scientific model is a simplification of reality used to predict the behaviour of a real-life
system. It is an educated guess that scientists use to make a prediction about real life. With these
models or simulations, insights into the modelled system are gained. In the past years, models have
become an important part of scientific research since they save both time and costs to researchers,
which after studying the model can prognosticate how a system will behave when working with it and
changing some parameters.
During our project, we have performed different kinds of modelling in order to predict and optimize the performance of our SINISENS biosensor. For this, we have used different software: MATLAB, Rosetta and SWISS-MODEL. To recap, lets remember that our genetic circuit consists on a promoter that controls the expression of our MphR transcription factor and ermC gene, which gives resistance to macrolide antibiotics. When there are no macrolides present, MphR binds to the pMphR and represses the expression of the gene controlled by this promoter, egfp. Consequently, if there are no macrolides present there is no output signal, this is, no fluorescence. On the other hand, if there are macrolides present in the sample, these bind to the MphR, forming a complex uncapable to bind to the pMphR. Thus, if macrolides are present, there is expression of the egfp gene, so the biosensor emits fluorescence (Figure 1).
MATLAB: MATHEMATICAL MODELLING
Mathematical modelling consists on describing a physical system with mathematical functions and
properties. It usually starts with an examination of a physical system to determine how it works.
The end goal of such process is to find a set of functions that describe the physical system as
precisely as possible. This set of functions is called mathematical model. Mathematical
models can be as simple as just one function, but they can contain tens or even hundreds of
functions that all affect the output. A simple example of such model could be the trajectory of a
thrown ball which can be simplified to one quadratic function. Mathematical models can be useful
during the design of a new physical system. For example, a mathematical model of a flying ball can
tell at what angle the ball needs to be thrown to reach a required distance.
In synthetic biology, mathematical models can be very beneficial. For example, they can describe how output measured from biosensor changes when parameters of a system are tuned. These parameters might include biological constants such as promoter strenght, dissociation constant, and protein degradation rate.
Our motivation behind matematical modelling was to achieve three goals we had set to ourselves:
- Estimate the range of macrolide concentration at which our biosensor could be used.
- Get insights on how we could improve the sensitivity of the biosensor.
- Calibrate our biosensor.
To achieve these goals, we built two different mathematical models. We decided that mathematical modelling was a proper way to achieve these goals due to the fact that biological systems such as pathways can be described with mathematical functions and the output is easily comparable as it is numerical.
Estimating the Range of the Biosensor
We built our first model to determine the range of the biosensor. To achieve this, we used
experimental data from literature and applied it to the Hill equation. In biochemistry, the Hill
equation describes the relationship between ligand and receptor. We used this equation to plot the
percentage of receptors bound to macrolides as a function of macrolide concentration. We used data
from wildtype as well as mutated E. coli , since we wanted to get insights on how
mutations affect the MphR range. In addition, we plotted macrolide concentrations in wastewater to
see if these were inside the range of the biosensor.
The results of the model can be seen in figure 2: the maximum concentration of macrolides in effluent wastewaters is not inside the range of the biosensor. In fact, our biosensor needed to be 10-100 times more sensitive to detect the low concentrations of macrolides found in wastewaters. These results were crucial to move one step further in our modelling: we decided that we were planning to use Rosetta to predict mutations in the binding site of MphR that increase the binding affinity of MphR to macrolide antibiotics. Furthermore, since the sensitivity had to be heavily increased, we arrived at the conclusion that we would need to tune our system even more and we settled the third main goal of our project, to concentrate macrolide antibiotics inside our biosensor cells.
So, this first model highlighted the importance of modelling as it gave us insight to prepare for this problem well before obtaining any experimental data from lab.
Introduction and Motivation
The goal of the second model was to understand how the biosensor worked. This would help us to gain
insights on how the sensitivity could be improved by discovering possible bottlenecks in the
pathway. In addition, the mathematical model could help us during the calibration of the final
device by estimating the output in different scenarios.
To built this model, we used MATLAB’s add-on application, SimBiology. SimBiology is a tool designed for systems biology; this is, mathematical analysis of biological systems. This app allows building models interactively with a graphical interface in addition to the traditional method of programming with MATLAB-language. The graphical interface of SimBiology, referred as block diagram editor, allows building complicated biological systems visually. Thus, it requires little to no programming experience to be able to build a mathematical model.
System and Reactions
In figure 3 the reader can see a visual model of our genetic system modelled with MatLab
SimBiology. Briefly, mphr transcribes and translates and once there is MphR protein
in the cell, it binds to the plasmid in the PmphR promoter, repressing the expression of
egfp. On the other hand, outside the cell there are macrolide antibiotics that will get into
the cell and bind to the MphR. When this happens, MphR unbinds from the plasmid and the egfp
transcribes and translates.
This model constists of two parts: a sample of wastwater and an E. coli cell in it. In SimBiology, these are referred as compartments. Our pathway starts from outside of the cell; wastewater contains some concentration of macrolide antibiotics. We assumed that macrolides get inside of the cell by diffusion: passive transport of molecules through the cell membrane. The rate of such transportation method depends on the concentration difference between the two sides of the membrane. It is important to notice that as the movement of molecules is more or less random, in the end the system settles down in an equilibrium state. In this state, the concentration on both sides of the membrane is equal. This reaction can be modelled with reversible mass action kintetics, which states that the rate of the reaction is proportional to the concentration of the reactants.
After the macrolide molecules have entered the cell, they can bind to the repressor MphR, which is controlled by a constitutive promoter about which we will discuss later. This reaction results in the derepression of egfp expression:
- Two macrolide molecules bind to the MphR transcription factor and this causes MphR to unbind from the plasmid.
- Transcription of egfp begins.
- The end products of the reaction are the plasmid and the MphR-macrolide complex.
We decided to model the derepression, as all other reactions, using mass action kinetics. We made
this decision due to the fact that there is very limited amount of information and experimental data
on these system. Thus, finding different constants such as the promoter strength or the dissociation
constant might be impossible. As a consequence, we might need to estimate these constants, which is
egfp transcription can then start once the plasmid is derepressed. At the end, this results in the last step of the pathway: the EGFP protein translation. Noteworthy, in our system there are two transcription-translation pathways: transcription-translation of MphR and transcription-translation of EGFP. These two pathways are very similar and only the constants are different. However, there is a small difference in the regulation of the transcription: mphr transcription is controlled by a constitutive promoter, which means that the translation is continuous. On the other hand, egfp transcription is controlled by the previously mentioned transcription factor, the MphR repressor. This means that the egfp transcription stops as soon as the MphR binds to the plasmid.
Opposite to the previously discussed reactions (derepression and diffusion), neither transcription nor translation are reversible reactions. In addition, the reactants in these reactions are not used in the process. Hence, if the model doesn’t take that into account, there would be a reaction product overflow at the end. In real biological systems proteins are not permanent; they degrade and their concentration decreases when the cell grows. Thus, we added reactions for mRNA degradation as well as protein degradation. In addition, we assumed dilution not to be significant due to the fact that the volume of immobilized cells stays constant in the sensing timeframe of 1-2 hours, which is the time of use of our biosensor.
Two reactions we have yet to discuss are repression of the plasmid by MphR and binding of two macrolide molecules with the unbound MphR. These reactions are similar: both are reversible as well as both have two types of reactants binding to one complex. The only difference is that MphR binds with two molecules, which makes the reaction rate funciton a bit different.
The last part of modelling the SINISENS pathway was to set initial concentrations (Table 1) as well as finding proper constants for each reaction (Table 2). Due to the limited amount of experimental data and information available, some of the reaction constants had to be estimated by either fitting data or using numerical averages of a similar part.
|Macrolides outside the cell||1.41E-09||1 mol/L||---|||
|MphR:plasmid||200||Molecules||High Copy Number plasmid|||
|Average E. coli cell volume||0.47||μm^3||Used to convert molecules to concentration|||
|Wastewater Sample Size||1||L||---||---|
|Others||---||---||Other initial values are assumed to be zero||---|
|Average Translation Rate (E. coli)||17-21||Amino acids / second||Used 17AA/s to calculate MphR and EGFP translation rate|||
|Length of MphR (WT)||194||Amino acids||Used to calculate MphR translation rate|||
|Length of EGFP||240||Amino acids||Used to calculate EGFP translation rate|||
|Absolute Promoter Strength of BBa_J23101||0.03||Polymerase / second||Used to calculate promoter strenght of BBa_J23106|||
|Relative Promoter Strenght of BBa_J23106||0.67||Relative||Compared to BBa_J23101, used to calculate promoter strenght of BBa_J23106: 0.03 · 0.67 = 0.02 Polymerase / second|||
|Typical mRNA Degradation Rate||3-8||Minutes||Used 5 minutes|||
|EGFP Degradation Rate||0.021||1/hour||---|||
|Dissociation Constant of MphR Binding to its Operator (DNA chain)||574 +/- 29||nM||K_d: Ratio of reverse reaction rate and forward reaction rate. Used in repression.|||
|Concentration of Ligand at Half Maximum Normalized GFP Fluorescence||0.19 +/- 0.02||μM||Used to calculate dissociation constant of macrolide molecule binding to MphR|||
|Hill Coefficient||1.7 +/- 0.3||---||Used to calculate dissociation constant of macrolide molecule binding to MphR|||
When modelling physical systems, it is impossible to take every little detail into account to add them in mathematical equations. For example, in the throwing ball example we did not take air resistance into account to simplify the problem. The same applies for modelling biological systems: we have to make assumptions to simplify the problem. Below is a list of assumptions that we made in order to simplify our system in the mathematical model.
- The volume of immobilized cell stays constant, therefore dilution is insignificant in the measuring time-frame.
- At the beginning, all of the plasmids are repressed by MphR.
- At the beginning, the cell only contains plasmids with MphR.
- Macrolides get inside the cell by passive diffusion.
- Leakiness in the model is explained as a reversible reaction.
Using the values from tables 1 and 2, we run a SimBiology
simulation with stop time of 10 hours. Resulting plots can be seen in figures 4 and 5, which show
concentration in molarity on y-axis and time on x-axis. We used ode45 as solver in the simulation as
it was recommended by MATLAB documentation .
EGFP concentration is the most important measure due to the fact that it is the output of our optical biosensor. As expected, at the beginning EGFP concentration grows exponentially due to mass action kinetics: the rate of the reaction is proportional to concentrations of reactants. The production of EGFP slows down after 4 hours.
The goal of our final product was not only to detect macrolide antibiotics but also quantify them. However, the above figures give us little to no information on how would our sensor behave in different macrolide concentrations. Therefore, we run a scan with 10 different initial concentrations of macrolides in wastewater starting from 1E-10 M to 1E-5 M by evenly spaced intervals. The result of this scan can be seen in figure 6. As expected, a higher concentration of macrolides in wastewater produces higher output of EGFP. However, the difference between selected concentrations is very little, especially in the working time-frame of the final biosensor: first 2 hours. This might become a problem if leakiness and noise of the output is high enough to hide the low differences in the EGFP output.
Later on in the project, we had a suspicion that we could improve our biosensor output by choosing a weaker promoter for MphR transcription. Due to the fact that MphR works as repressor, it is evident that a cell with less MphR protein will produce more fluorescence. Thus, using a weaker promoter will result in higher EGFP output. To confirm this hypothesis we decided to run a MATLAB scan with five different promoter strengths with constant intervals between 0 and 0.02 polymerase/second (constitutive promoter BBa_J23106 ) in a constant concentration. The results of the scan confirmed the hypothesis: output is higher when less repressor is present. This result inspired us to try a different promoter for our biosensor; we switched from our initial constitutive promoter to an inducible promoter, pBAD.
ROSETTA: PROTEIN MODELLING
As we previously said, after realizing we needed to increase the range of our biosensor, we decided
to use Rosetta to perform some modelling in our MphR protein. Rosetta is a software that includes
algorithms for computational modelling and analysis of protein structures. It allows enzyme design,
de novo protein design, ligand docking and structure prediction of biological macromolecules and
complexes among others. We used this software to try to predict modifications in the binding sites
of MphR that would increase the binding affinity that this transcription factor has for macrolides.
Concretely, we modelled this increase focusing in two macrolides, erythromycin and clarithromycin,
which are both in the “watch list” of pharmaceuticals for EU-wide monitoring in aquatic environments
. For more information on Rosetta you can visit the Rosetta official site.
In order to achieve our objective, we based our modelling in the paper “Rosetta and the Design of Ligand Binding Sites” by Moretti et al. . We found this paper as a source from a Rosetta Guide iGEM Technion 2016 and iGEM TU Eindhoven 2016 wrote during the iGEM 2016 Competition. For our modelling we needed to follow several steps:
- Preparation of the MphR pdb file.
- Preparation of the ligand files.
- Docking of the ligands in the binding sites of the protein.
- Design of the binding site.
- Re-run the results.
Noteworthy mentioning, we took part in a journal initiative
organized by the Maastricht 2020 iGEM team, in which we wrote a step-by-step guide for
designing ligand binding sites. This guide is based in the Moretti et al. paper above mentioned and
the previous guide made by the 2016 iGEM teams, but it is thought to be used by non-computational
biologists, so it has more explanations and more concrete steps. We hope our Guide for Using
Rosetta when Designing Ligand Binding Sites helps future iGEM participants to use this
software even if they lack computational biologists in their teams.
Next, we are going to do an overview of our whole modelling process, but if the reader is interested in using the Rosetta software, they can download our step-by-step guide.
1. Preparation of the MphR pdb file
First, we downloaded the MphR pdb file from Protein Data Bank (PDB) and we used the molecular visualization software PyMOL to gain insights on its structure. As seen in the video, this transcription factor is an homodimer formed by two chains with two ligand binding sites, one binding site per chain. In the video, we can visualize the macrolide antibiotic erythromycin (orange) attached to the MphR. After visualizing the structure of MphR, we prepared the PDB file by cleaning it (removing the waters and other molecules) and relaxing it, so it could be used for the designing of the ligand binding sites. More information about the concrete commands applied can be found in our guide.
2. Preparation of the ligand files
In the second step, we downloaded the ligand files (erythromycin and clarithromycin) from ZINC and studied their structures in PyMOL (Figure 8). We also did some preparations to be able to use the files in Rosetta. To work in line with the experiments performed in wet-lab, we added hydrogens to both ligands according to the pH of wastewater samples we were working with (pH=6). This pH was measured from sterilized wastewater samples from HSY wastewater treatment plant that we used during the lab experiments. After the addition of hydrogens, we prepared a library of conformers to be used during the modelling.
3. Docking of ligands to the binding site
Molecular docking predicts the orientation and binding of a molecule to another molecule. There are
many different software that can be used for molecular docking, such as Autodock 4.2, AutoDock Vina and SwissDock.
In our study, the ligands were macrolide antibiotics (erythromycin and clarithromycin) and the receptor the transcription factor MphR. The binding site of erythromycin to MphR has been determined with crystallography  and is therefore known. Because of this, we did not have to perform blind docking. However, some other problems were caused by the size of the ligand and its multiple torsion angles. In general, it is harder to perform accurate docking with a bigger ligand, because of the amount of orientations the molecule can have. A second challenge in the docking process was that the MphR protein is an homodimer and therefore it binds to two ligands. Nonetheless, docking can be done only with one ligand at a time and at the end, the docked ligands have to be combined in one file. However, the additional challenge arose from the difference between the two chains (chain A and chain B) of the MphR. For the chain A the docking was easily successful but for the chain B the docking was more challenging. The estimations of the binding affinities were much lower and to achieve the correct visual orientation many runs with different parameters and software were needed. This may be caused by.....??
We used Autodock 4.2 to perform our first docking of erythromycin with the prepared and relaxed MphR. We achieved good results with chain A, but not chain B. After multiple runs with different parameters we tried to do the docking with the AutoDock Vina software. With slight adjustments to the parameters, the docking was successful for both erythromycin and clarithromycin. For comparison we also used the SwissDock software to dock erythromycin to MphR. However, we still achieved the best results with AutoDock Vina, therefore, it is the software we finally used.
For the analysis of the docking results we use the molecular visualization software PyMOL. Since the binding of erythromycin to MphR has been experimentally determined, we compared the docked conformation with the crystallized structure (Figures 9 and 10). We chose the conformation based on the best visual alignment of the ligand in combination with the estimated binding affinities, which were between –9 and –13 kcal/mol.
4. Design of the binding site
The following step is the designing itself, in which modifications to the ligand binding sites are made and assessed. Noteworthy, this step is computationally heavy and users who want to perform it should have access to a computer cluster if they want to obtain results in a reasonable time-frame. To perform this step, we needed to create a file in which we specified the amino acid residues we wanted to mutate. During our research, we came up with a paper by Feng et al. (2015) , which states that the main energy contributors for the binding between MphR and the macrolides are Met59, Val66, Ser92, Met93, Tyr103, Val126, His147, Thr154 and Met155. These are the MphR residues we modelled during our project. It is important to mention that during our modelling we did not allow mutations to cysteine because this amino acid tends to form sulphur bonds. Because of that, if allowing mutations for this amino acid almost all mutations obtained were cysteines. For more concrete information on this step, check our guide.
In this step, we filtered two times the results obtained during the designing part in order to get the results with the best scores, which are indicative of a probable better mutation. Here, the user can choose which metrics they use for the filtering. In our case, we used the metrics specified in tables 3 and 4. For all the metrics, we used the average values as the cut-off to obtain our final results. For more information on filtering and metrics, our guide can be checked.
|total_score||Measure of protein stability|
|if_X_fa_atr||Measure of attractive potential between atoms in different residues in the interface protein-ligand|
|fa_rep||Measure of ligand clashes in the whole protein|
|if_X_fa_rep||Measure of ligand clashes in the interface protein-ligand|
|ligand_is_touching_X||Measure of the distance between protein and ligand|
|delta_unsatHbonds||Unsatisfied hydrogen bonds|
|dG_separated/dSASAx100||Binding energy per contact area|
The final step of our modelling process consisted on re-running the design of the ligand binding sites using the best files obtained in the previous round. It is recommended to re-run the design from 3 to 5 times in order to obtain better results in each round. We performed the re-running 3 times for each ligand and we selected the 5 files with the best binding energies as our final files for each ligand. Figure 11 shows a summary of all the designing process above explained.
Final Analyses and Results
Tables 5 and 6 show the design scores for the final files we selected for erythromycin and clarithromycin. To ensure these files had better scores than the PDB file with erythromycin bound, we evaluated the file from PDB and the final files with the same Rosetta scoring function, ref2015. Table 7 shows the scoring of the proteins for this function. Focusing in the total_score metric, it can be clearly seen that the proteins with mutations obtained with Rosetta have more stability than the file retrieved from PDB, since they have lower scores. This indicates that a protein with the modelled mutations is expected to work in real life.
|MphR Protein Data Bank||-527,085||---|
After selecting these results, we downloaded the FASTA protein sequences and compare them. Worth mentioning, for erythromycin two out of the five selected sequences had the same mutations and for clarithromycin this happened with 3 out of the 5 selected sequences. We interpreted this as a good sign: if several of the best sequences have the same mutations, we expect these mutations to surely increase the binding affinity of MphR to macrolides. An important issue to point out is that even if our protein was an homodimer, we obtained different predicted mutations for the two chains. We talked with some people experienced in modelling and they recommended us to work with the mutations obtained in chain A, since we already had problems with the docking of the ligand with chain B, as above mentioned. Therefore, we agreed with our wet-lab team that we would order the MphR with the mutations obtained in chain A to test them. Below, we show the ordered sequences (Figure 12): (i) one sequence with the common mutations for erythromycin and clarithromycin, (ii) two sequences of the mutations obtained for erythromycin, concretely, the sequence with the best score and the sequences that had the same mutations and (iii) two sequences of the mutations obtained for clarithromycin, the sequence with the best score and the sequences that had the same mutations.
Since the concentration of macrolide antibiotics is very low in the wastewaters (Figure 2), we
investigated ways to further increase the sensitivity of the biosensor. One way to do this, would be
to increase the concentration of macrolides within the cells. After researching, we came up with a
paper by Mohammad et al. (2010)  in which it was shown that the modification of the
transmembrane protein FhuA could potentially increase the amount of some molecules such as
antibiotics inside the cells. In order to investigate this, we used the SWISS-MODEL server, a fully
automated protein structure homology-modelling server thought to make protein modelling accessible
to all life science researchers worldwide.
With SWISS-MODEL we compared the wildtype FhuA (video: green) with a modified FhuA with some loops removed (video: blue) and estimated the stability of the proteins.
Worth mentioning, although we studied the stability of FhuA computationally, at the end we did not run the experiments in the lab.
1. Kasey, C. M., Zerrad, M., Li, Y., Cropp, T. A., & Williams, G. J. (2018). Development of
Transcription Factor-Based Designer Macrolide Biosensors for Metabolic Engineering and Synthetic
Biology. ACS Synthetic Biology, 7(1), 227–239. https://doi.org/10.1021/acssynbio.7b00287
2. Schafhauser, Bruno Henrique, Lauren A. Kristofco, Cíntia Mara Ribas de Oliveira, and Bryan W. Brooks. “Global Review and Analysis of Erythromycin in the Environment: Occurrence, Bioaccumulation and Antibiotic Resistance Hazards.” Environmental Pollution 238 (July 1, 2018): 440–51. https://doi.org/10.1016/j.envpol.2018.03.052.
3. iGEM Registry: http://parts.igem.org/Part:pSB1K3
4. B10NUMB3R5: https:// bionumbers.hms.harvard.edu/ bionumber.aspx?id=114925&ver= 1&trm= cell+ volume+ Bacteria+ Escherichia+ coli&org=
5. Ross, J. F., and M. Orlowski. “Growth-Rate-Dependent Adjustment of Ribosome Function in Chemostat-Grown Cells of the Fungus Mucor Racemosus.” Journal of Bacteriology 149, no. 2 (February 1982): 650.
8. iGEM Warsaw: http:// 2010.igem.org/ Team:Warsaw/ Stage1/ PromMeas
9. iGEM registry: http:// parts.igem.org/ Promoters/ Catalog/ Anderson
10. B10NUMB3R5: https:// bionumbers.hms.harvard.edu/ bionumber.aspx?id=108598&ver= 0&trm=mrna+ degradation&org=
11. iGEM registry: http:// parts.igem.org/ Part:BBa_E0040
12. Zheng, Jianting, Vatsala Sagar, Adam Smolinsky, Chase Bourke, Nicole LaRonde-LeBlanc, and T. Ashton Cropp. “Structure and Function of the Macrolide Biosensor Protein, MphR(A), with and without Erythromycin.” Journal of Molecular Biology 387, no. 5 (April 17, 2009): 1250–60. https://doi.org/10.1016/j.jmb.2009.02.058.
13. https:// www.mathworks.com/ help/ matlab/ math/ choose-an-ode-solver.html#bu3n5rf-1
14. Agency for Healthcare Research and Quality. (2018). 2017 national healthcare quality and disparities report (Report No. 18-0033-EF). U.S. Department of Health and Human Services. https:// www.ahrq.gov/ research/ findings/ nhqrdr/ nhqdr17/ index.html
15. Moretti, R., Bender, B. J. & Allison, B. (2016). Methods in Molecular Biology: Design and Creation of Ligand Binding Proteins, 1414, 47–62. https://doi.org/10.1007/978-1-4939-3569-7
16. Feng, T., Zhang, Y., Ding, J. N., Fan, S., & Han, J. G. (2015). Insights into resistance mechanism of the macrolide biosensor protein MphR(A) binding to macrolide antibiotic erythromycin by molecular dynamics simulation. Journal of Computer-Aided Molecular Design, 29(12), 1123–1136. https://doi.org/10.1007/s10822-015-9881-0
17. Mohammad, M. M., Howard, K. R., & Movileanu, L. (2011). Redesign of a plugged β-barrel membrane protein. Journal of Biological Chemistry, 286(10), 8000–8013. https://doi.org/10.1074/jbc.M110.197723