Aalto-Helsinki 2020

MODELLING

INTRODUCTION

A scientific model is a simplification of reality used to predict the behaviour of a real-life system. It is an educated guess that scientists use to make a prediction about real life. With these models or simulations, insights into the modelled system are gained. In the past years, models have become an important part of scientific research since they save researchers’ time and costs. After studying, the model can predict how a system will behave when working with it and changing some parameters.

During our project, we have performed different kinds of modelling in order to predict and optimize the performance of our SINISENS biosensor. In order to achieve this, we have used different softwares: MATLAB, Rosetta and SWISS-MODEL. Our genetic circuit consists of a promoter that controls the expression of our MphR transcription factor and ermC gene, which gives resistance to macrolide antibiotics. When there are no macrolides present, MphR binds to the pMphR and represses the expression of the gene controlled by this promoter, egfp. Consequently, if there are no macrolides present there is no output signal, this is, no fluorescence. On the other hand, if there are macrolides present in the sample, these bind to the MphR, forming a complex incapable to bind to the pMphR. Thus, if macrolides are present, egfp gene is expressed, so the biosensor emits fluorescence (Fig. 1).

**Figure 1.** Genetic circuit of the SINISENS biosensor.

MATLAB: MATHEMATICAL MODELLING

Mathematical modelling consists of describing a physical system with mathematical functions and properties. It usually starts with an examination of a physical system to determine how it works. The end goal of such process is to find a set of functions that describe the physical system as precisely as possible. This set of functions is called a mathematical model. Mathematical models can be as simple as just one function, but they can contain tens or even hundreds of functions that all affect the output. A simple example of such a model could be the trajectory of a thrown ball which can be simplified to one quadratic function. Mathematical models can be useful during the design of a new physical system. For example, a mathematical model of a flying ball can tell at what angle the ball needs to be thrown to reach a required distance.

In synthetic biology, mathematical models can be very beneficial. For example, they can describe how output measured from biosensor changes when parameters of a system are tuned. These parameters might include biological constants, such as promoter strength, dissociation constant, and protein degradation rate.

Our motivation behind mathematical modelling was to achieve three goals we had set to ourselves:

Estimate the range of macrolide concentration at which our biosensor could be used.
Get insights on how we could improve the sensitivity of the biosensor.
Calibrate our biosensor.

To achieve these goals, we built two different mathematical models. We decided that mathematical modelling was a proper way to achieve these goals due to the fact that biological systems such as pathways can be described with mathematical functions and the output is easily comparable as it is numerical.

Estimating the Range of the Biosensor

We built our first model to determine the range of the biosensor. To achieve this, we used experimental data from literature and applied it to the Hill equation. In biochemistry, the Hill equation describes the relationship between ligand and receptor. We used this equation to plot the percentage of receptors bound to macrolides as a function of macrolide concentration. We used data from wildtype as well as mutated Escherichia coli [1], since we wanted to get insights on how mutations affect the MphR range. In addition, we plotted macrolide concentrations in wastewater to see if these were inside the range of the biosensor.

The maximum concentration of macrolides in effluent wastewaters is not inside the range of the biosensor. In fact, our biosensor would need to be 10-100 times more sensitive than the best performing circuit found from literature to detect the low concentrations of macrolides found in wastewaters (Fig. 2). These results were crucial to move one step further in our modelling: we decided that we were planning to use Rosetta to predict mutations in the binding site of MphR that increase the binding affinity of MphR to macrolide antibiotics. Furthermore, since the sensitivity had to be heavily increased, we arrived at the conclusion that we would need to tune our system even more and we settled the third main goal of our project, to concentrate macrolide antibiotics inside our biosensor cells.

To conclude, this first model highlighted the importance of modelling as it gave us insight to prepare for this problem well before obtaining any experimental data from the lab.

**Figure 2.** Plot of the Hill function showing at which macrolide concentration does MphR bound to these antibiotics. Blue line shows the result for the wildtype MphR and red line for a modified RBS [1].

Biosensor Functionality

Introduction and Motivation

The goal of the second model was to understand how the biosensor worked. This would help us to gain insights on how the sensitivity could be improved by discovering possible bottlenecks in the pathway. In addition, the mathematical model could help us during the calibration of the final device by estimating the output in different scenarios.

To build this model, we used MATLAB’s add-on application, SimBiology. SimBiology is a tool designed for systems biology; this is mathematical analysis of biological systems. This app allows building models interactively with a graphical interface in addition to the traditional method of programming with MATLAB-language. The graphical interface of SimBiology, referred to as block diagram editor, allows building complicated biological systems visually. Thus, it requires little to no programming experience to be able to build a mathematical model.

System and Reactions

In our modelled genetic system (Fig. 3) mphR is transcribed and translated. Once there is MphR protein in the cell, it binds to the pMphR promoter, repressing the expression of egfp. If there are macrolide antibiotics present, they will enter the cell and bind to the MphR. This results in MphR unbinding from the promoter and the egfp can be expressed.

This model consists of two parts: a sample of wastewater and an E. coli cell in it. In SimBiology, these are referred to as compartments. Our pathway starts from outside of the cell; wastewater contains some concentration of macrolide antibiotics. We assumed that macrolides get inside of the cell by diffusion: passive transport of molecules through the cell membrane. The rate of such a transportation method depends on the concentration difference between the two sides of the membrane. It is important to notice that as the movement of molecules is more or less random, in the end the system settles down in an equilibrium state. In this state, the concentration on both sides of the membrane is equal. This reaction can be modelled with reversible mass action kinetics, which states that the rate of the reaction is proportional to the concentration of the reactants.

After the macrolide molecules have entered the cell, they can bind to the repressor MphR, which is controlled by a constitutive promoter about which we will discuss later. This reaction results in the derepression of egfp expression:

Two macrolide molecules bind to the MphR transcription factor and this causes MphR to unbind from the plasmid.
Transcription of egfp begins.
The end products of the reaction are the plasmid and the MphR-macrolide complex.

We decided to model the derepression, as all other reactions, using mass action kinetics. We made this decision due to the fact that there is very limited amount of information and experimental data on these systems. Thus, finding different constants such as the promoter strength or the dissociation constant might be impossible. As a consequence, we might need to estimate these constants, which is discussed later.

The transcription of egfp can then start once the promoter is derepressed. At the end, this results in the last step of the pathway: the EGFP protein translation. Noteworthy, in our system there are two transcription-translation pathways: transcription-translation of MphR and transcription-translation of EGFP. These two pathways are very similar and only the constants are different. However, there is a small difference in the regulation of the transcription: mphr transcription is controlled by a constitutive promoter, which means that the translation is continuous. On the other hand, egfp transcription is controlled by the previously mentioned transcription factor, the MphR repressor. This means that the egfp transcription stops as soon as the MphR binds to the plasmid.

Opposite to the previously discussed reactions (derepression and diffusion), neither transcription nor translation are reversible reactions. In addition, the reactants in these reactions are not used in the process. Hence, if the model doesn’t take that into account, there would be a reaction product overflow at the end. In real biological systems, proteins are not permanent; they degrade and their concentration decreases when the cell grows. Thus, we added reactions for mRNA degradation as well as protein degradation. In addition, we assumed dilution not to be significant due to the fact that the volume of immobilized cells stays constant in the sensing time frame of 1-2 hours, which is the time of use of our biosensor.

Two reactions we have yet to discuss are repression of the egfp gene in the plasmid by MphR and binding of two macrolide molecules with the unbound MphR. These reactions are similar: both are reversible as well as both have two types of reactants binding to one complex. The only difference is that MphR binds with two molecules, which makes the reaction rate function a bit different.

**Figure 3.** Genetic system modelled with MatLab SimBiology.

Finding Constants

The last part of modelling the SINISENS pathway was to set initial concentrations (Table 1) as well as finding proper constants for each reaction (Table 2). Due to the limited amount of experimental data and information available, some of the reaction constants had to be estimated by either fitting data or using numerical averages of a similar part.

**Table 1.** Initial values of the reactants.
Initial Values	Amount	Units	Notes	Source
Macrolides outside the cell	1.41E-09	1 mol/L	---	[2]
MphR:plasmid	200	Molecules	High Copy Number plasmid	[3]
Average E. coli cell volume	0.47	μm^3	Used to convert molecules to concentration	[4]
Wastewater Sample Size	1	L	---	---
Others	---	---	Other initial values are assumed to be zero	---

**Table 2.** Reaction constants.
Reaction Constants	Amount	Units	Notes	Source
Average Translation Rate (E. coli)	17-21	Amino acids / second	Used 17AA/s to calculate MphR and EGFP translation rate	[5]
Length of MphR (WT)	194	Amino acids	Used to calculate MphR translation rate	[6]
Length of EGFP	240	Amino acids	Used to calculate EGFP translation rate	[7]
Absolute Promoter Strength of BBa_J23101	0.03	Polymerase / second	Used to calculate promoter strenght of BBa_J23106	[8]
Relative Promoter Strenght of BBa_J23106	0.67	Relative	Compared to BBa_J23101, used to calculate promoter strenght of BBa_J23106: 0.03 · 0.67 = 0.02 Polymerase / second	[9]
Typical mRNA Degradation Rate	3-8	Minutes	Used 5 minutes	[10]
EGFP Degradation Rate	0.021	1/hour	---	[11]
Dissociation Constant of MphR Binding to its Operator (DNA chain)	574 +/- 29	nM	K_d: Ratio of reverse reaction rate and forward reaction rate. Used in repression.	[12]
Concentration of Ligand at Half Maximum Normalized GFP Fluorescence	0.19 +/- 0.02	μM	Used to calculate dissociation constant of macrolide molecule binding to MphR	[1]
Hill Coefficient	1.7 +/- 0.3	---	Used to calculate dissociation constant of macrolide molecule binding to MphR	[1]

Assumptions

When modelling physical systems, it is impossible to take every little detail into account to add them in mathematical equations. For example, in the throwing ball example we did not take air resistance into account to simplify the problem. The same applies for modelling biological systems: we have to make assumptions to simplify the problem. Below is a list of assumptions that we made in order to simplify our system in the mathematical model.

The volume of immobilized cells stays constant, therefore dilution is insignificant in the measuring time-frame.
At the beginning, egfp is repressed by MphR in all the plasmids.
At the beginning, the cell only contains plasmids with MphR.
Macrolides get inside the cell by passive diffusion.
Leakiness in the model is explained as a reversible reaction.

Results

Using the values from tables 1 and 2, we run a SimBiology simulation with stop time of 10 hours. Resulting plots can be seen in figures 4 and 5, which show concentration in molarity on y-axis and time on x-axis. We used ode45 as a solver in the simulation as it was recommended by MATLAB documentation [13].

EGFP concentration is the most important measure due to the fact that it is the output of our optical biosensor. As expected, at the beginning EGFP concentration grows exponentially due to mass action kinetics: the rate of the reaction is proportional to concentrations of reactants. The production of EGFP slows down after 4 hours.

**Figure 4.** EGFP concentration in our model. As expected, EGFP concentration grows exponentially and it slows down after a while (concretely, 4 hours).

**Figure 5.** Concentration of other reactants and products of our system. The concentration of macrolides outside the cell is always fixed. Macrolide concentration inside the cell increases fast and equilibrates when reaching the concentration of macrolides outside the cell since we assumed macrolide diffusion. Concentration of EGFP mRNA increases fast and then it decreases due to EGFP mRNA degradation. Free plasmid concentration is 0 at the beginning because all MphR is bound to the plasmid. When macrolides start entering the cell and binding MphR, the concentration of free plasmid increases. After a while there is a slight decrease because MphR is constitutively produced and it starts binding to the plasmid again until an equilibrium is reached.

The goal of our final product was not only to detect macrolide antibiotics but also quantify them. However, the above figures give us little to no information on how our sensor would behave in different macrolide concentrations. Therefore, we run a scan with 10 different initial concentrations of macrolides in wastewater starting from 1E-10 M to 1E-5 M by evenly spaced intervals. The result of this scan can be seen in figure 6. As expected, a higher concentration of macrolides in wastewater produces higher output of EGFP. However, the difference between selected concentrations is very little, especially in the working time-frame of the final biosensor: first 2 hours. This might become a problem if leakiness and noise of the output is high enough to hide the low differences in the EGFP output.

**Figure 6.** Output of the SINISENS biosensor with different macrolide concentration in wastewater.

Later on in the project, we had a suspicion that we could improve our biosensor output by choosing a weaker promoter for MphR transcription. Due to the fact that MphR works as a repressor, it is evident that a cell with less MphR protein will produce more fluorescence. Thus, using a weaker promoter will result in higher EGFP output. To confirm this hypothesis we decided to run a MATLAB scan with five different promoter strengths with constant intervals between 0 and 0.02 polymerase/second (constitutive promoter BBa_J23106 [9]) in a constant concentration. The results of the scan confirmed the hypothesis: output is higher when less repressor is present. This result inspired us to try a different promoter for our biosensor; we switched from our initial constitutive promoter to an inducible promoter, pBAD.

**Figure 7.** Output of a scan with 5 different promoter strengths.

ROSETTA: PROTEIN MODELLING

As we previously said, after realizing we needed to increase the range of our biosensor, we decided to use Rosetta to perform binding site modelling in our MphR protein. Rosetta is a software that includes algorithms for computational modelling and analysis of protein structures. It allows enzyme design, de novo protein design, ligand docking and structure prediction of biological macromolecules and complexes among others. We used this software to try to predict modifications in the binding sites of MphR that would increase the binding affinity that this transcription factor has for macrolides. Concretely, we modelled this increase focusing on two macrolides, erythromycin and clarithromycin, which have been placed on the ‘watch list’ of pharmaceuticals for EU-wide monitoring in aquatic environments in 2015 and are now considered for prioritization [14]. For more information, refer to the official website of Rosetta.

In order to achieve our objective, we based our modelling in the paper “Rosetta and the Design of Ligand Binding Sites” by Moretti et al. [15]. We found this paper as a source from a Rosetta Guide iGEM Technion 2016 and iGEM TU Eindhoven 2016 wrote during the iGEM 2016 Competition. For our modelling we needed to follow several steps:

Preparation of the MphR pdb file.
Preparation of the ligand files.
Docking of the ligands in the binding sites of the protein.
Design of the binding site.
Filtering.
Re-run the results.

Noteworthy mentioning, we took part in a journal initiative organized by the Maastricht 2020 iGEM team, in which we wrote a step-by-step guide for designing ligand binding sites. This guide is based on the Moretti and colleagues paper above mentioned and the previous guide made by the 2016 iGEM teams, but it is thought to be used by non-computational biologists, so it has more explanations and more detailed steps. We hope our Guide for Using Rosetta when Designing Ligand Binding Sites helps future iGEM participants to use this software even if they lack computational biologists in their teams.

In the following sections, we present an overview of our whole modelling process, but if the reader is interested in using the Rosetta software, they can download our step-by-step guide.

1. Preparation of the MphR pdb file

First, we downloaded the MphR pdb file from Protein Data Bank (PDB) and we used the molecular visualization software PyMOL to gain insights on its structure. As seen in the video, this transcription factor is an homodimer formed by two chains with two ligand binding sites, one binding site per chain. In the video, we can visualize the macrolide antibiotic erythromycin (orange) attached to the MphR. After visualizing the structure of MphR, we prepared the PDB file by cleaning it (removing the waters and other molecules) and relaxing it, so it could be used for the designing of the ligand binding sites. The module for relaxing the protein is the following: Khatib F, Cooper S, Tyka MD, Xu K, Makedon I, Popovic Z, Baker D, and Players F. (2011). Algorithm discovery by protein folding game players. Proc Natl Acad Sci USA 108(47):18949-53. doi: 10.1073/pnas.1115898108. More information about the concrete commands applied can be found in our guide.

2. Preparation of the ligand files

In the second step, we downloaded the ligand files (erythromycin and clarithromycin) from ZINC and studied their structures in PyMOL (Figure 8). We also did some preparations to be able to use the files in Rosetta. To work in line with the experiments performed in wet-lab, we added hydrogens to both ligands according to the pH of wastewater samples we were working with (pH=6). This pH was measured from sterilized wastewater samples from one of HSY wastewater treatment plants that we used during the lab experiments. After the addition of hydrogens, we prepared a library of conformers to be used during the modelling.

**Figure 8.** Structures of erythromycin (orange) and clarithromycin (red). Both macrolides are 14-membered lactone rings; the only difference between them is that at position six, erythromycin has an hydroxyl group and clarithromycin a methoxy group.

3. Docking of ligands to the binding site

Molecular docking predicts the orientation and binding of a molecule to another molecule. There are many different software that can be used for molecular docking, such as Autodock 4.2, AutoDock Vina and SwissDock.

In our study, the ligands were macrolide antibiotics (erythromycin and clarithromycin) and the receptor the transcription factor MphR. The binding site of erythromycin to MphR has been determined with crystallography [12] and is therefore known. Because of this, we did not have to perform blind docking. However, some other problems were caused by the size of the ligand and its multiple torsion angles. In general, it is harder to perform accurate docking with a bigger ligand, because of the amount of orientations the molecule can have. A second challenge in the docking process was that the MphR protein is an homodimer and therefore it binds to two ligands. Nonetheless, docking can be done only with one ligand at a time and at the end, the docked ligands have to be combined in one file. However, the additional challenge arose from the difference between the two chains (chain A and chain B) of the MphR. For the chain A the docking was easily successful but for the chain B the docking was more challenging. The estimations of the binding affinities were much lower and to achieve the correct visual orientation many runs with different parameters and software were needed.

We used Autodock 4.2 to perform our first docking of erythromycin with the prepared and relaxed MphR. We achieved good results with chain A, but not chain B. After multiple runs with different parameters we tried to do the docking with the AutoDock Vina software. With slight adjustments to the parameters, the docking was successful for both erythromycin and clarithromycin. For comparison we also used the SwissDock software to dock erythromycin to MphR. However, we still achieved the best results with AutoDock Vina, therefore, it is the software we finally used.

For the analysis of the docking results we use the molecular visualization software PyMOL. Since the binding of erythromycin to MphR has been experimentally determined, we compared the docked conformation with the crystallized structure (Figures 9 and 10). We chose the conformation based on the best visual alignment of the ligand in combination with the estimated binding affinities, which were between –9 and –13 kcal/mol.

**Figure 9.** Docked erythromycin with AutoDock Vina (red) compared to the crystallized structure of erythromycin bound to MphR(A) retrieved from PDB (blue).

**Figure 10.** Docked clarithromycin with AutoDock Vina (red) compared to the crystallized structure of erythromycin bound to MphR(A) retrieved from PDB (blue).

4. Design of the binding site

The following step is the designing itself, in which modifications to the ligand binding sites are made and assessed. Noteworthy, this step is computationally heavy and users who want to perform it should have access to a computer cluster if they want to obtain results in a reasonable time-frame. To perform this step, we needed to create a file in which we specified the amino acid residues we wanted to mutate. During our research, we came up with a paper by Feng et al. (2015) [16], which states that the main energy contributors for the binding between MphR and the macrolides are Met59, Val66, Ser92, Met93, Tyr103, Val126, His147, Thr154 and Met155. These are the MphR residues we modelled during our project. It is important to mention that during our modelling we did not allow mutations to cysteine because this amino acid tends to form sulphur bonds. Because of that, if allowing mutations for this amino acid almost all mutations obtained were cysteines. For more concrete information on this step, check our guide.

5. Filtering

In this step, we filtered two times the results obtained during the designing part in order to get the results with the best scores, which are indicative of a probable better mutation. Here, the user can choose which metrics they use for the filtering. In our case, we used the metrics specified in tables 3 and 4. For all the metrics, we used the average values as the cut-off to obtain our final results. For more information on filtering and metrics, our guide can be checked.

**Table 3.** Metrics used in the first filtering step.
Metric	Explanation
total_score	Measure of protein stability
if_X_fa_atr	Measure of attractive potential between atoms in different residues in the interface protein-ligand
fa_rep	Measure of ligand clashes in the whole protein
if_X_fa_rep	Measure of ligand clashes in the interface protein-ligand
ligand_is_touching_X	Measure of the distance between protein and ligand
interface_delta_X	Binding energy

**Table 4.** Metrics used in the second filtering step.
Metric	Explanation
packstat	Packing metric
sc_value	Shape complementarity
delta_unsatHbonds	Unsatisfied hydrogen bonds
dG_separated/dSASAx100	Binding energy per contact area
dG_separated	Binding energy

6. Re-Run

The final step of our modelling process consisted of re-running the design of the ligand binding sites using the best files obtained in the previous round. It is recommended to re-run the design from 3 to 5 times in order to obtain better results in each round. We performed the re-running 3 times for each ligand and we selected the 5 files with the best binding energies as our final files for each ligand. Figure 11 shows a summary of all the designing processes above explained.

**Figure 11.** Summary of the steps followed for MphR modelling.

Final Analyses and Results

Tables 5 and 6 show the design scores for the final files we selected for erythromycin and clarithromycin. To ensure these files had better scores than the PDB file with erythromycin bound, we evaluated the file from PDB and the final files with the same Rosetta scoring function, ref2015. Table 7 shows the scoring of the proteins for this function. Focusing in the total_score metric, it can be clearly seen that the proteins with mutations obtained with Rosetta have more stability than the file retrieved from PDB, since they have lower scores. This indicates that a protein with the modelled mutations is expected to work in real life.

**Table 5.** Final scores of the selected erythromycin and clarithromycin files (first filtering step).
File	total_score	fa_rep	if_X_ fa_atr	if_X_ fa_rep	ligand_is_ touching_X	interface_ delta_X
Erythromycin 1	-1658,168	185,078	-49,016	7,503	1	-28,208
Erythromycin 2	-1668,171	183,354	-48,433	8,016	1	-26,936
Erythromycin 3	-1670,627	184,126	-51,915	8,126	1	-27,151
Erythromycin 4	-1659,102	182,991	-48,626	6,068	1	-30,052
Erythromycin 5	-1660,972	185,565	-47,716	6,736	1	-25,527
Clarithromycin 1	-1678,894	189,084	-56,086	6,633	1	-35,033
Clarithromycin 2	-1675,459	187,563	-55,160	5,954	1	-33,033
Clarithromycin 3	-1679,833	185,432	-55,558	5,842	1	-34,199
Clarithromycin 4	-1672,542	187,834	-54,593	6,168	1	-31,958
Clarithromycin 5	-1675,155	183,774	-53,198	6,856	1	-33,903

**Table 6.** Final scores of the selected erythromycin and clarithromycin files (second filtering step).
File	packstat	sc_value	delta_ unsatHbonds	dG_separated/ dSASAx100	dG_separated
Erythromycin 1	0,657	0,535	12	-1,918	-52,311
Erythromycin 2	0,625	0,533	13	-1,880	-52,008
Erythromycin 3	0,613	0,542	15	-1,848	-50,556
Erythromycin 4	0,593	0,539	17	-1,802	-50,519
Erythromycin 5	0,603	0,528	14	-1,845	-50,516
Clarithromycin 1	0,610	0,584	17	-2,236	-62,079
Clarithromycin 2	0,614	0,581	16	-2,252	-61,642
Clarithromycin 3	0,642	0,579	15	-2,237	-61,031
Clarithromycin 4	0,630	0,580	16	-2,188	-60,103
Clarithromycin 5	0,614	0,559	15	-2,171	-59,758

**Table 7.** Scoring results from ref2015 Rosetta function.
File	total_score	dG_separated
MphR Protein Data Bank	-527,085	---
Erythromycin 1	-944,062	-52,311
Erythromycin 2	-960,434	-52,008
Erythromycin 3	-962,823	-50,556
Erythromycin 4	-952,452	-50,519
Erythromycin 5	-948,739	-50,516
Clarithromycin 1	-955,655	-62,079
Clarithromycin 2	-963,694	-61,642
Clarithromycin 3	-967,505	-61,031
Clarithromycin 4	-957,522	-60,103
Clarithromycin 5	-971,376	-59,758

After selecting these results, we downloaded the FASTA protein sequences and compared them. Worth mentioning, for erythromycin two out of the five selected sequences had the same mutations and for clarithromycin this happened with 3 out of the 5 selected sequences. We interpreted this as a good sign: if several of the best sequences have the same mutations, we expect these mutations to likely increase the binding affinity of MphR to macrolides. An important issue to point out is that even if our protein was an homodimer, we obtained different predicted mutations for the two chains. We talked with Maria Sammalkorpi and Bart Rooijakkers, who have experience in modelling and they recommended us to work with the mutations obtained in chain A, since we already had problems with the docking of the ligand with chain B, as above mentioned. Therefore, we agreed with our team that we would order the MphR with the mutations obtained in chain A to test them. Below, we show the ordered sequences (Fig. 12): (i) one sequence with the common mutations for erythromycin and clarithromycin, (ii) two sequences of the mutations obtained for erythromycin, concretely, the sequence with the best score and the sequences that had the same mutations and (iii) two sequences of the mutations obtained for clarithromycin, the sequence with the best score and the sequences that had the same mutations.

**Figure 12.** MphR sequences with the mutations that were ordered to test in wet-lab.

SWISS-MODEL

Since the concentration of macrolide antibiotics is very low in the wastewaters (Fig. 2), we investigated ways to further increase the sensitivity of the biosensor. One way to do this, would be to increase the concentration of macrolides within the cells. After researching, we found a paper by Mohammad and colleagues [17] in which it was shown that the modification of the transmembrane protein FhuA could potentially increase the amount of some molecules such as antibiotics inside the cells. In the article they had replaced some of the loops of the protein with a single polypeptide. However, we wanted to estimate the protein stability of FhuA with only removed loops. In order to investigate this, we used the SWISS-MODEL server, a fully automated protein structure homology-modelling server thought to make protein modelling accessible to all life science researchers worldwide.

With SWISS-MODEL we compared the wildtype FhuA (video: green) with a modified FhuA with the loops removed (video: blue) and estimated the stability of the proteins. The sequence for the FhuA protein can be found in Figure 13, the highlighted sequences are the ones removed. Based on the paper by Mohammad and the colleagues [17] we chose to remove L3 (Tyr243–Asn273), L5 (Asp394–Asn419), L11 (Asn682–Arg704), L4 (Cys318–His339) and the cork domain (Met1–Pro160).

**Figure 13.** MphR sequences with the mutations that were ordered to test in wet-lab.

The results from the SWISS-MODEL scoring, which were very promising, can be seen in tables 8 and 9. Notable is that all the values of the modified FhuA protein lie close to the ones of the unmodified. Worth mentioning, although we studied the stability of FhuA computationally, at the end we did not run the experiments in the lab.

**Table 8.** Modified Fhu(A) scores in SWISS-MODEL.
Modified Fhu(A)	Scores	Definitions or residues
GMQE	0,99	The resulting GMQE score is expressed as a number between 0 and 1, reflecting the expected accuracy of a model built with that alignment and template and the coverage of the target. Higher numbers indicate higher reliability [18].
MolProbity Score	1,692	---
Clash Score	1,83	---
Ramachandran Favoured	93,76%	---
Ramachandran Outliers	1,25%	A459 ALA, A235 ASP, A278 PHE, A394 LYS, A463 MET, A347 GLY
Rotamer Outliers	2,71%	A279 ASP, A298 VAL, A10 THR, A267 GLN, A401 VAL, A170 LEU, A45 GLN, A480 THR, A175 VAL, A171 GLN, A236 LYS
C-β Deviations	1	A187 ASP
Bad Bonds	3/3943	_1A GP1-_1B GP4, _1D GMH-_1E GPH, _1F GLC-_1H GLA
Bad Angles	34/5342	A384 ASP, A235 ASP, A278 PHE, A478 THR, A440 ASN, A265 ASP, A172 ASN, A451 ASP, A390 ASP, A264 ASP, (A397 THR-A398 PRO), A480 THR, (A472 VAL-A473 ASN), A230 ASP, A473 ASN, A191 THR, A404 HIS, (A62 ARG-A63 PRO), A187 ASP, (A293 GLU-A294 PRO), A144 GLU, A437 ASP, A344 ASP, A139 ASN, A197 ASP, (A246 ASP-A247 TRP), (A346 GLU-A347 GLY), A92 THR, A471 HIS, A114 THR, A6 PHE, A246 ASP
QMEAN	-2,07	The QMEAN Z-score provides an estimate of the "degree of nativeness" of the structural features observed in the model on a global scale and is described in Benkert et al,. It indicates whether the QMEAN score of the model is comparable to what one would expect from experimental structures of similar size. QMEAN Z-scores around zero indicate good agreement between the model structure and experimental structures of similar size. Scores of -4,0 or below are an indication of models with low quality. The four individual terms of the global QMEAN quality scores are Cβ, all atom, solvation and torsion. Positive values indicate that the model scores higher than experimental structures on average. Negative numbers indicate that the model scores lower than experimental structures on average [18].
Cβ	-1,95	---
All Atom	1,27	---
Solvation	1,30	---
Torsion	-2,27	---

**Table 9.** Modified Fhu(A) scores in SWISS-MODEL.
Unmodified Fhu(A)	Scores	Definitions or residues
GMQE	0,99	The resulting GMQE score is expressed as a number between 0 and 1, reflecting the expected accuracy of a model built with that alignment and template and the coverage of the target. Higher numbers indicate higher reliability [18].
MolProbity Score	1,84	---
Clash Score	2,22	(_1F GLC-_1H GLA), (A266 PHE-A403 TYR), (A384 ARG-_1C KDO)
Ramachandran Favoured	93,37%	---
Ramachandran Outliers	1,01%	A128 ARG, A555 GLY, A667 ALA, A602 LYS, A486 PHE, A443 ASP, A20 SER
Rotamer Outliers	3,45%%	A268 GLU, A475 GLN, A609 VAL, A132 MET, A71 SER, A352 LEU, A457 ASP, A118 ASP, A487 ASP, A419 ASN, A711 THR, A357 VAL, A205 GLN, A66 VAL, A397 VAL, A112 GLN, A93 ARG, A76 VAL, A353 GLN, A444 LYS
C-β Deviations	2	A403 TYR, A592 ASP
Bad Bonds	4/5600	_1A GP1-_1B GP4, _1D GMH-_1E GPH, A403 TYR, _1F GLC-_1H GLA
Bad Angles	35/7602	A266 PHE, A592 ASP, A413 ASN, A171 ASP, A699 PHE, A403 TYR, A623 PHE, A323 ASN, A20 SER, A486 PHE, (A73 THR-A74 PRO), A354 ASN, A557 PHE, A404 ASN, (A605 THR-A606 PRO), A339 HIS, A89 HIS, A443 ASP, A132 MET, A308 SER, A304 GLU, A679 HIS, A419 ASN, A88 ASP, A186 ASP, A61 HIS, A402 LEU, (A666 LEU-A667)
QMEAN	-1,60	The QMEAN Z-score provides an estimate of the "degree of nativeness" of the structural features observed in the model on a global scale and is described in Benkert et al,. It indicates whether the QMEAN score of the model is comparable to what one would expect from experimental structures of similar size. QMEAN Z-scores around zero indicate good agreement between the model structure and experimental structures of similar size. Scores of -4,0 or below are an indication of models with low quality. The four individual terms of the global QMEAN quality scores are Cβ, all atom, solvation and torsion. Positive values indicate that the model scores higher than experimental structures on average. Negative numbers indicate that the model scores lower than experimental structures on average [18].
Cβ	-1,99	---
All Atom	-0,28	---
Solvation	-0,31	---
Torsion	-1,15	---

REFERENCES

1. Kasey, C. M., Zerrad, M., Li, Y., Cropp, T. A., & Williams, G. J. (2018). Development of Transcription Factor-Based Designer Macrolide Biosensors for Metabolic Engineering and Synthetic Biology. ACS Synthetic Biology, 7(1), 227–239. https://doi.org/10.1021/acssynbio.7b00287
2. Schafhauser, Bruno Henrique, Lauren A. Kristofco, Cíntia Mara Ribas de Oliveira, and Bryan W. Brooks. “Global Review and Analysis of Erythromycin in the Environment: Occurrence, Bioaccumulation and Antibiotic Resistance Hazards.” Environmental Pollution 238 (July 1, 2018): 440–51. https://doi.org/10.1016/j.envpol.2018.03.052.
3. iGEM Registry: http://parts.igem.org/Part:pSB1K3
4. B10NUMB3R5: https:// bionumbers.hms.harvard.edu/ bionumber.aspx?id=114925&ver= 1&trm= cell+ volume+ Bacteria+ Escherichia+ coli&org=
5. Ross, J. F., and M. Orlowski. “Growth-Rate-Dependent Adjustment of Ribosome Function in Chemostat-Grown Cells of the Fungus Mucor Racemosus.” Journal of Bacteriology 149, no. 2 (February 1982): 650.
6. https:// www.ebi.ac.uk/ ena/ data/ view/ BAB12241
7. https:// www.fpbase.org/ protein/ egfp/
8. iGEM Warsaw: http:// 2010.igem.org/ Team:Warsaw/ Stage1/ PromMeas
9. iGEM registry: http:// parts.igem.org/ Promoters/ Catalog/ Anderson
10. B10NUMB3R5: https:// bionumbers.hms.harvard.edu/ bionumber.aspx?id=108598&ver= 0&trm=mrna+ degradation&org=
11. iGEM registry: http:// parts.igem.org/ Part:BBa_E0040
12. Zheng, Jianting, Vatsala Sagar, Adam Smolinsky, Chase Bourke, Nicole LaRonde-LeBlanc, and T. Ashton Cropp. “Structure and Function of the Macrolide Biosensor Protein, MphR(A), with and without Erythromycin.” Journal of Molecular Biology 387, no. 5 (April 17, 2009): 1250–60. https://doi.org/10.1016/j.jmb.2009.02.058.
13. https:// www.mathworks.com/ help/ matlab/ math/ choose-an-ode-solver.html#bu3n5rf-1
14. Decision (EU) 2015/495 of 20 March 2015 establishing a watch list of substances for Union-wide monitoring in the field of water policy pursuant to Directive 2008/105/EC of the European Parliament and of the Council ' (2015) Official Journal L78, p. 40.
15. Moretti, R., Bender, B. J. & Allison, B. (2016). Methods in Molecular Biology: Design and Creation of Ligand Binding Proteins, 1414, 47–62. https://doi.org/10.1007/978-1-4939-3569-7
16. Feng, T., Zhang, Y., Ding, J. N., Fan, S., & Han, J. G. (2015). Insights into resistance mechanism of the macrolide biosensor protein MphR(A) binding to macrolide antibiotic erythromycin by molecular dynamics simulation. Journal of Computer-Aided Molecular Design, 29(12), 1123–1136. https://doi.org/10.1007/s10822-015-9881-0
17. Mohammad, M. M., Howard, K. R., & Movileanu, L. (2011). Redesign of a plugged β-barrel membrane protein. Journal of Biological Chemistry, 286(10), 8000–8013. https://doi.org/10.1074/jbc.M110.197723
18. https://swissmodel.expasy.org/docs/help