## Introduction

Our dry lab efforts expanded to several different tracks, each
confronting a different set of problems that required solving by different methods and tools. We emphasized cooperation with the wet lab, which lead to a great deal attention being paid to interpreting results and turning them into insights that can be integrated into the wet lab. Our main focus areas were:

- ✧
**Structural modeling**of the alpha subunit of yeast G protein yielded unexpected insights into protein interaction. - ✧
**ODE (ordinary differential equation)**models for our designs of signaling pathways in yeast provided us with a comparison of their properties. - ✧
**SDE (stochastic differential equation)**models for the same three designs predicted unexpected behaviors that may arise in the cell.

## Differential Equations for Signaling Pathways

Since we were developing a biosensor which needed to sense changes in rather
low concentrations, it was crucial for us to have a good understanding of
the signal transduction pathway we were coupling it to, and what reactions
made up that pathway (please refer to our project design page
for more information on our biosensor design). Through our dry lab work,
we set out to answer these questions:

To answer these questions, we made use of

We needed to identify the relevant parameters for the models, in order to run realistic simulations. These values were found by researching literature, other models, and past iGEM wiki-pages. The parameters were selected to be consistent among our different designs. The parameters found were taken as rough estimates (see figure 1), because the performance of biological systems can vary for several reasons that have yet to be elucidated. However, rough estimates are enough to get great insights into the mechanics of cellular pathways. Optimally, producing more precise laboratory data, and fitting the models hereto, would be the next step to obtaining more realistic modeling results.

- 1) How do we make sure that our biosensor can react to the low concentrations of cytokines found in sweat, within a reasonable time limit set by our Human Practices research?
- 2) Where are the bottlenecks in the pathway, and can we fix them?
- 3) Which signal transduction steps are robust, and which are sensitive?

To answer these questions, we made use of

**differential equations**: Equations that illustrate how the state of a variable changes with time.## How We Model

We generated two classes of models based on ODEs and SDEs, respectively. First we will walk through the details of ODE implementation, which we built upon by introducing features unique to SDEs. Then, some hints about alternative modeling frameworks will be featured for the benefit of future iGEMers!We needed to identify the relevant parameters for the models, in order to run realistic simulations. These values were found by researching literature, other models, and past iGEM wiki-pages. The parameters were selected to be consistent among our different designs. The parameters found were taken as rough estimates (see figure 1), because the performance of biological systems can vary for several reasons that have yet to be elucidated. However, rough estimates are enough to get great insights into the mechanics of cellular pathways. Optimally, producing more precise laboratory data, and fitting the models hereto, would be the next step to obtaining more realistic modeling results.

**Figure 1: A list of parameters used in our models.**

Due to the design of our models, the actual parameters were often derived from the ones listed in the figure.

## Ordinary Differential Equations (ODE)

To build our ODEs, the below-stated three steps were followed:- 1) Formulate a diagram that specifies the key players (state variables) and describes all possible ways these variables might interact with each other.
- 2) Explicitly write out the quantitative form of all interactions.
- 3) Convert the explicit quantitative interactions into a mathematical framework.

##### Step 1 - Making a Diagram

In order to make a diagram, we needed to have an idea of what our design should look like. For this, one could assume that:- a) The presence of a biomolecule in proximity to the cellular membrane lead to the binding of our receptor of interest to its ligand/biomolecule, and then the binding of a coreceptor to this complex.
- b) Our receptor and coreceptor were both transmembrane proteins, with intracellular domains that associate intracellularly following the extracellular association.
- c) Their intracellular association was facilitated by the natural affinity between the two halves of a split-ubiquitin molecule (for reference, see design page). This association and reconstitution of a full ubiquitin prompted its cleavage by naturally occurring deubiquitinases.
- d) Such a cleaving event released a transcription factor that was bound to the C-terminal half of the split ubiquitin.
- e) The freed transcription factor traveled to the nucleus and initiated transcription of the gene into mRNA.
- f) mRNA was translated into a reporter protein.

**Figure 2: Diagram of a signaling pathway of the design #1.**

The signal proceeds from the interleukin, receptor, and co-receptor towards NanoLuc (the reporter). Design #2 looks virtually identical.

##### Step 2 - Writing Interactions in a Quantitative Form

The next step was to find out how much product was produced from a given amount of reactant. As we were dealing with signaling pathways, our concentrations were changing in accordance with the association and dissociation of the variables. The equation derived hereof can be found below:###### k1[A][B]=>[C]

where k1 is a kinetic parameter, [A] and [B] are reactant concentrations, and [C] is a product concentration. Note that such a reaction proceeds only in one direction. However, the real nature of many reactions was better represented by breakdown into a forward reaction and a backward reaction. This was because it was empirically observed that chemical reactions tend to reach a state of equilibrium with nonzero concentrations of reactants and products. This was understood as a situation described by:###### k_on[A][B]<=>k_off[C]

where k_on and k_off are kinetic parameters (binding and unbinding constant, respectively). Note that the reaction proceeds in both directions. After some time, equilibrium is reached, which is described by:###### k=k_on/k_off=[C]/([A][B]).

In practice, it was much easier to deduce k compared to its constituents k_on and k_off, because k could be calculated in a straightforward manner from observed concentrations. This was first addressed later. The equilibrium was now written as:

###### k_on * [biomolecule] * [receptor] * [coreceptor] <=> k_off * [complex]

However, the next reaction where deubiquitinase cleaved the reassociated split-ubiquitin proceeded only in a forward direction, as it is irreversible – once the transcription factor becomes released, it is never again part of the transmembrane protein.###### k_on * [complex] * [deubiquitinase] => [transcription factor]

The next step was transcription, which was a slightly more complicated reaction, only proceeding in the forward direction. The kinetics of transcription follows a simple form of Hill equation that was written as**[A]/(Kd+[A])=>[B]**where [A] and [B] are concentrations and Kd is a dissociation constant (a measure of how tightly the transcription factor bound a promoter). This equation further needed to be multiplied by a specific rate of transcription

**r=g*R/L**where g is a number of copies of a given gene, R is a general rate of transcription and L is a gene length.

###### rate * [transcription factor] / (Kd + [transcription factor]) => [mRNA]

Translation rate was a simpler case, as knowing the concentration of mRNA and the specific translation rate was enough to determine the translation rate. The specific translation rate is r=R/L where R is a general rate of translation and L is the length of mRNA.###### rate * [mRNA] => [protein]

##### Step 3 - Converting quantified interactions into a mathematical framework.

Concerting quantified interactions into a mathematical framework is the first step that was model dependent. Here, we wanted to create a model based on ordinary differential equations. Such equations described how values of given variables change over time, or rather how they changed during each time step. The ODE-based models worked by computing a set of equations iteratively – i.e. output of a previous computation was used as an input for a next run. Turning quantified interactions into differential equations was done by rewriting them from a standpoint of variables. Each differential equation needed to capture all changes that happen to a single variable. This included both additions and subtractions – if a certain molecule was a product of one reaction but a reactant in other reaction, the concentration of the molecule could both rise or drop.In the following reaction, the first term captured emergence of the complex by association of the biomarker, the receptor and the coreceptor. The second term captured the backward reaction where the complex dissociated into its constituent parts. The third term captured spending of the complex by catalytic activity of deubiquitinase.

###### D(complex) = k_on * [biomolecule] * [receptor] * [coreceptor] – k_off * [complex] – k_cat * [complex] * [deubiquitinase]

We had now made a system of ordinary differential equations that described kinetics of our signaling pathway. Next, we needed to translate our mathematical model into a programming language of choice, e.g. Matlab. Each variable was named ‘c’ and an arbitrary number. Parameters and whole equations were handled in a similar fashion:###### d4 = k1 * c1 * c2 * c3 – k2 * c4 – k3* c4 * k4;

Except from providing initial conditions, we did not need to worry about all the c’s and d’s any longer. The k’s, however, informed about the speed and order by which our system worked.Here is an archive with the models. As scientific work must be translatable at all times, we paid special attention to satisfying standards like MIRIAM. Knowing that many various software tools were used for pathway modeling, we utilized Moccasin to turn our Matlab code into SBML files that could be opened with a wide variety of programs. Below are snippets of code from our models:

**Figure 3: Example of the Matlab code from one of our models.**

It shows a part of the description and a checklist derived from the MIRIAM standard for models.

Click for our parameters, plot, and differential equations

## Stochastic Differential Equations (SDE)

##### Know what

After working with ODEs for a while, we encountered the limits of using deterministic mathematics to model something as unpredictable as a cell. Therefore, we chose to utilize stochastic differential equations (SDEs). Such equations take into account randomness (i.e. noise, stochasticity) which makes them useful for establishing a probabilistic model. The true SDE model incorporates randomness in all its calculations. We noted that this appears to be the first use of true SDE model in the context of iGEM, as some past teams presented ODE models with randomization of results performed after the end of the simulation, as an SDE model. However, such an ad-hoc randomization does not expand the expressive power of the model in any way and does not change the mathematical nature of the model.An SDE model contains everything from the ODE model and much more, thanks to expanding its expressive power from single values to ranges of probabilities. With this tool, we started thinking further ahead of the laboratory experiments and asked hypothetical questions like – how much can the reporter concentration vary if the biomarker concentration fluctuates? How well does the biosensor function if the cells face senescence? How does the behavior of the cell change if one of the metabolites is toxic?

These are the kinds of biologically relevant questions that can be answered with stochastic modeling, partially because SDEs can represent inherent randomness present in all natural processes. Additionally, the SDEs allowed us to take our lack of knowledge of fine bioprocesses into account. All this was possible because an SDE framework made it possible to subject different variables to different amounts of noise, or even different kinds of noise.

##### Know how

This part built upon the previous section concerning ODEs. In fact, the ODEs described in the previous section were so similar to the SDEs that we took our implemented system of differential equations and copied it in the right place of our code for the SDE-based model. Before that, we simplified the equation of our mathematical model to the form where M denoted all the deterministic parts. In other words, M was all the contribution coming from the signaling pathway of our interest.###### D(complex) = k_on * [biomolecule] * [receptor] * [coreceptor] – k_off * [complex] – k_cat * [complex] * [deubiquitinase] = M

Now, we needed a term that was driven entirely by randomness. We called it N. If we decided that N stood for a stochastic part of the equation, we got an equation like this:

###### D(complex) = M + N

In the context of SDEs, N represents a Wiener process. N is seen as a variable, whose value changes randomly but without any discontinuity occuring in its graph.

**Figure 4: An illustrative example of continuous noise generated by a Wiener process.**

Such a process may be tied to one or more variables in our SDE models.

This allowed us to model the effects of randomness and probabilities. Going back to the mathematical formula, this was expanded further, so that we obtained a stochastic differential equation that we later implemented in Matlab.

###### D(complex) = M + N = k1 * [biomolecule] * [receptor] * [coreceptor] - k2 * [complex] - k3 * [complex] * [deubiquitinase] + N

Here is an archive with the models. We implemented our SDE-based model in Matlab. Yet, there was a limited built-in support for SDEs in Matlab, so we used a package called SDETools that was available online. It separated the deterministic and stochastic parts of each equation so that the deterministic parts stayed grouped together, which was why we could easily copy and paste our system of ODEs. The stochastic part was then defined separately.This was no issue because Wiener processes can be defined in any number of dimensions, i.e. if we had ten differential equations, we simply defined N as a 10-dimensional process – one dimension per equation.

As mentioned earlier, our intention was to make the code readable and translatable, so we followed the same sets of standards for our SDE models as for our ODE models. One exception was that no SBML file was generated as there was no software support for SDETools due to it being an unofficial package. For the structure of our Matlab code, click here for contact information, here for parameters, here for deterministic equations, Wiener process, vectors and options, and finally, running the code.

##### To Future iGEMers!

There are some limitations to SDETools. Using a programming
language called Julia together with available packages could give a better SDE
modeling experience (our recommendation for future teams).

There are more ways to model that are quite different from differential equations. Some of them are rules-based or stochastic, such as: Gillespie algorithm, Markov chains, Petri nets.

Different algorithms come with different limitations or assumptions (that may not be immediately visible) so it could be a good idea to model the same system in several ways to make sure that one's predictions are meaningful.

Comparison of amplification provided by a one-to-one relationship versus one-to-many relationship.

## Note On Alternative Modeling Paradigms

Apart from ODEs and SDEs, there were many other modeling approaches that are useful for modeling of signaling pathways. The first one to mention is partial differential equations (PDEs). They are very similar to ODEs, but they also allow a variable change to be dependent on other variables, and not only on time. This extension was essential for modeling physical processes that are located in space (e.g. diffusion).There are more ways to model that are quite different from differential equations. Some of them are rules-based or stochastic, such as: Gillespie algorithm, Markov chains, Petri nets.

Different algorithms come with different limitations or assumptions (that may not be immediately visible) so it could be a good idea to model the same system in several ways to make sure that one's predictions are meaningful.

## Pheromone Cascade – Attractive but Complicated

Taking inspiration from last year’s UCopenhagen iGEM team, Ovulaid, we modeled the
yeast pheromone pathway in a similar way to what the did (using only two equations to
abstract the whole cascade) and provided it with a parameter fixed for
experimental data found in literature. After some experimentation with our
models, we became convinced that the reactions, in order to lead to empirically observed hypersensitivity, were necessarily
more complex compared to what some papers claimed.

Notably, one-to-one reactions could not provide significant amplification from a theorical standpoint. After reading through literature, we found more detailed descriptions of the pheromone cascade which included a series of complexes and one-to-many interactions. In other words, a lot of amplification. As such, we became convinced that our model now more accurately represented the complexity found in the pheromone pathway.

Notably, one-to-one reactions could not provide significant amplification from a theorical standpoint. After reading through literature, we found more detailed descriptions of the pheromone cascade which included a series of complexes and one-to-many interactions. In other words, a lot of amplification. As such, we became convinced that our model now more accurately represented the complexity found in the pheromone pathway.

**Figure 5: Amplification comparison.**

Comparison of amplification provided by a one-to-one relationship versus one-to-many relationship.

We found a brilliant curated model on biomodels.com. We extracted the important
part of the cascade and added it in all our models of the design #3. This complicated our models, however, making the SDE implementation run into RAM memory issues.

**Figure 6: Diagram of a signaling pathway of the design #3.**

The signal proceeds from the interleukin, receptor, and co-receptor towards NanoLuc (the reporter).

### Impact on Wet Lab - ODE/SDE Experiments

#### How our modeling was tied to wet lab experiments

Since the beginning of our iGEM journey, wet lab and dry lab worked closely together. Thanks to our supervisors, we always intended to use our dry lab activities to guide our wet lab decisions. As our project ended up focusing a great deal on protein analysis, the wet lab work with G-alpha would not have been possible without the simultaneous work in the dry lab.The dry lab guided the wet lab in finding out which of the several biosensor designs was the most viable, in shaping the design of the assays for our

*Saccharomyces cerevisiae*cells, and also in engineering the proteins with the desired properties.

In all the following experiments, all values were held equal except from the one studied at a time.

#### Our ODE Models

##### Dynamic Ranges

**Figure 7: Comparison of dynamic ranges of our designs.**

Note the logarithmic scale on the x-axis. It indicates that the pheromone cascade which differs design #3 from the other two designs, provides a significant signal amplification which leads to a higher sensitivity.

Dynamic ranges showed how well different designs responded to a range of
interleukin concentrations. It was observed that the lack of pheromone
cascade in the design #1 and the design #2 lead to insufficient sensitivity.
The pheromone cascade increased sensitivity by several orders of magnitude.
These findings resulted in more effort allocated to the design #3 (with G-alpha and TEV protease).

##### Sensitivity Analysis

Sensitivity analysis showed how perturbations of parameter values impacted
the reporter concentration. It identified more robust parameters (i.e. resilient
to perturbations) and more sensitive parameters (i.e. even slight perturbations
lead to a noticeable impact). Additionally, it was observed how the importance
of certain parameters changed over time. In the design #1, the affinity of a
transcription factor to a promoter (k15 in the figure) was a sensitive parameter,
so that the reporter concentration range could be adjusted by choice of a promoter.
The reporter concentration was shown to be sensitive to interleukin concentration (c_1 in the figure).

**Figure 8: Sensitivity analysis of the design #1 performed in SimBiology.**

The reporter was most sensitive to the affinity of a transcription factor to a promoter (k15) and the interleukin concentration (c_1).

##### Correlation Among Pathway Proteins

Here we performed principal component analysis (PCA) and obtained a matrix of all correlations between variables in the design #3 (spectrum from blue to red represented perfect negative correlation to perfect positive correlation, respectively). The results allowed us to understand indirect interactions between proteins in the signaling pathway. Provided that laboratory tools and methods for advanced pathway engineering existed, this figure could guide us in pathway optimization.**Figure 9: Results from our PCA - Correlation matrix of all proteins in the signaling pathway of the design #3 and also time.**

Blue boxes are associated with a negative correlation (the values closer to -1 signified a higher negative correlation - if one concentration went up, the other went down), the red were associated with a positive correlation (the values closer to +1 signified a higher positive correlation - if one concentration went up, so did the other). Values closer to zero signified less correlation between two concentrations.

##### Impact of Promoter

Affinity of a transcription factor (TF) to a promoter was varied in
this case study. High-affinity TF-promoter pair (Kd = 1 nM) lead to an increased
sensitivity accompanied by significantly higher reporter concentrations,
compared with a low-affinity TF-promoter pair (Kd = 100 nM). The biosensor’s dynamic range
as well as reporter concentrations can be adjusted by the right choice of
a TF-promoter pair.

**Figure 10: The impact of transcription factors on biosensor sensitivity.**

Higher affinity of a transcription factor to a promoter lead to a higher sensitivity and higher reporter concentrations.

##### Inflammatory vs baseline reporter concentrations

**Figure 11: The impact of inflammation levels on reporter concentrations.**

High levels of inflammation leads to a higher reporter concentration, if compared to baseline inflammation levels.

In this case study, the design #3 was subjected to low and high
interleukin concentration (i.e. levels of inflammation).
High inflammation (c = 10 nM) lead to a fast development of high reporter
concentrations, compared with low inflammation (c = 1 pM). Significant differences
of the plotted curves suggested the design #3 could distinguish between
low and high inflammation, as long as the measurements were done at the
right time (preferably ~5 hours mark).

**This fits perfectly with our Human Practices considerations, where multiple experts preferred a design that provided fast results, and patients were willing to wear the patch for up to 24 hours, according to our survey results!**##### Impact of proteolytic activity

Proteolytic activity of TEV protease shifted the dynamic range of
design #3 by orders of magnitude, without a change in overall reporter
concentrations. There were many versions of TEV proteases available
with various levels of proteolytic activity. Protease choice had a
significant impact on the dynamic range of a biosensor. See figure 12 for comparison of high proteolytic actvity (k_cat = 0.18 s^-1) and low proteolytic actvity (k_cat = 0.000907 s^-1).

**Figure 12: The impact of TEV protease activity on biosensor sensitivity.**

Higher proteolytic activity of a TEV protease leads to a higher sensitivity of the biosensor.

#### SDEs - hypothetical scenarios with effects of noise

Figures are illustrative. Use of SDE models was limited by computational requirements of the current implementation. Thus, desired phenomena were amplified to be observable on a short timescale. SDE models are an extension that is less of a design tool and more of an analytic tool for further improvement of the biosensor. SDE models can aid with identification of unknown behaviors.##### Rapid aging lead to disorder in the cell

This case study examined the impact of noise increasing with time. Such a phenomenon could be present in senescent cells, where by-products of aging obstruct the reliability of cellular processes. If such divergence of curves was observed in our sweat patches, we would have known whether the dysfunctionality is primarily time dependent.**Figure 13: The impact of increasing noise on the reporter concentration.**

The amount of noise is increasing with time (illustrative).

##### Longer measurement allows the signal pathway to yield more consistent results

This case study examined the impact of noise decreasing with time. Such a phenomenon could be observed in any cell, especially soon after triggering the receptor system before the pathway processes overcame the magnitude of the biochemical background. If such divergence of curves was observed in our sweat patches, we would know that this is a perfectly expected behavior coming from the physical law that longer measurements yield more precise results.**Figure 14: The impact of decreasing noise on the reporter concentration.**

The amount of noise is decreasing with time (illustrative).

##### Unknown process decreases and increases disorder in the cell

This case study examined a rather exotic-looking hypothetical scenario where high levels of noise diminished and came back to initial levels afterwards. Such bizarre divergence of curves could indicate that the cellular pathway only worked under very specific conditions that were achieved only for a short time.**Figure 15: The impact of fluctuating noise levels on the reporter concentration.**

The levels of noise are first decreasing, then increasing again (illustrative).

##### Natural variability in measurements

This case study examined natural variability of measurements that followed the normal distribution. By fitting data from experiments to the divergence of these curves, we could know how much noise was present in our biosensor or provided by the measurement technique.**Figure 16: The impact of naturally arising variability in measurements on the measured reporter concentrations.**

The amount of noise is uniform (illustrative).

##### Receptors are not specific enough

This case study examined impact of noisy activation of signaling pathways that could be mediated by a receptor system that was not specific enough. If such divergence of curves was observed in our sweat patches, we would have known the problem was possibly caused by unspecific receptors.**Figure 17: Lower specificity of the receptors towards their ligand effects the reporter concentrations.**

The uniform level of noise is applied only to variables that represent the receptors (illustrative).

##### Receptor disrupts cellular processes

This case study examined the impact of noise increase tied to reporter concentration. If such divergence of curves was observed in our sweat patches, we would have known that the reporter was causing significant harm to the reliability of cellular processes.**Figure 18: The effect of reporter toxicity on the reporter concentration.**

The amount of noise applied to all variables increases with reporter concentration, which leads to highly variable reporter concentrations (illustrative).

##### Variability in Interleukin Levels

This case study examined the impact of variability in interleukin concentration. This behavior could be expected in our sweat patch as the baseline interleukin concentrations likely vary among people. If such divergence of curves was observed in our sweat patches, we would have known that this behavior was expected and we could deduce variability among people.**Figure 19: The effect of variable interleukin concentration on the reporter concentration.**

The uniform level of noise is applied only to the interleukin concentration (illustrative).

##### Diminishing disorder of cellular processes

This case study examined the effect of initially high but diminishing noise. This exotic scenario could represent the cellular process stabilizing at random states. Such a behavior is not expected in a functioning biosensor and could indicate there was a serious issue with part of the signaling pathway, or perhaps that the cells had been compromised.**Figure 20: Lowering of the initially high level of noise effects the reporter concentration.**

The noise applied to all variables is decreasing with time (illustrative).

## G-alpha and Structural Modelling with Rosetta

Just to refresh, our work on G-alpha hinged on finding a mutant that preserved
natural function when there was no signal (i.e. no cytokine), and also dissociated
appropriately in the presence of a signal.

Following the recommendation from team Sinisens (iGEM Aalto-Helsinki 2020), we looked into Rosetta Software Suite. Alanine scan was a Rosetta script available via an old version of Robetta server, and we used it to calculate how Gibbs free energy of the two proteins (G-alpha and G-beta) changed upon changing a single residue to alanine. Changes in Gibbs free energy correlate with binding affinity.

We submitted a list of residues selected according to our previous analyses. A crystal structure of the bovine G protein complex served as a template for spatial structural positioning of our proteins.

Following the recommendation from team Sinisens (iGEM Aalto-Helsinki 2020), we looked into Rosetta Software Suite. Alanine scan was a Rosetta script available via an old version of Robetta server, and we used it to calculate how Gibbs free energy of the two proteins (G-alpha and G-beta) changed upon changing a single residue to alanine. Changes in Gibbs free energy correlate with binding affinity.

We submitted a list of residues selected according to our previous analyses. A crystal structure of the bovine G protein complex served as a template for spatial structural positioning of our proteins.

**Figure 21: Change in Gibbs free energy (in kcal/mol) generated by replacing the selected residues with alanine.**

As Gibbs free energy correlates with a change in affinity, it appears that residue 227 is the most important one for the binding interface of G-alpha and G-beta proteins.

Thus, we needed to simulate binding affinities (or change in Gibbs free energy – delta delta G) of seven G-alpha proteins and nine fragments formed after cleavage. We eventually landed on using a Rosetta script called flex_ddg, as this script has been shown to be less computationally demanding than other scripts, while making great predictions of molecular dynamics .

We particularly found the script that was written with a capability to include refinement of every protein structure many times to be useful. We followed recommendations from script creators and went with 35 000 independent refinements per structure. Although the script was initially developed for mutagenesis simulations, we found an easy way to adjust it for our own use. For each of our 16 proteins (mutants + fragments), we generated 35 independently optimized structures to probe the conformational space (this took 20 days on 16 CPUs with 64 GBs of RAM). We used the scoring function REF2015 as it was shown to evaluate results of flex_ddg better than others (https://pubs.acs.org/doi/10.1021/acscentsci.8b00717).

**Figure 22: Change in Gibbs free energy of fragments produced by cleavage of G-alpha mutants.**

The x-axis denotes the length of each fragment. There seems to be a slight trend towards a higher affinity to G-beta protein in smaller fragments, most likely caused by more degrees of freedom for a binding region constituting a majority of such fragments.

**Figure 23: Changes in Gibbs free energy of our G-alpha mutants relative to the wildtype.**

Numbers in their labels signify what cleavage sites were included in each mutant. As all changes in energy are negative, it indicates that all mutants have a higher affinity to G-beta protein.

Here is an archive with input files and results. Results of the simulation were surprising as they revealed our
reasoning might have been flawed. Essentially, our assumption had been that
cleaved fragments dissociate from the protein, yet our results, as can be observed in figures 22 and 23,
showed that the majority of protein fragments had a higher affinity than the parent proteins.
We also realized that our simulation was not directly answering our true question
– could cleavage of our mutant G-alpha trigger Ste5 recruitment by G-beta?
Directly simulating that could include up to seven simultaneously interacting
proteins, which translated into a very demanding computational task.

After analyzing the results, we decided to run another simulation with the same script but different input, to focus on fragments of the best-scoring protein – M124 (GPA mutant with TEV sites numbered 1, 2, and 4). What made this simulation different and more precise than the previous one was a ground reference for the calculation of Gibbs free energy – this time optimized for a particular protein whose spatial coordinates were obtained through the previous Rosetta simulation. Due to a lack of time, only a single structure per protein fragment was generated. Results were analyzed in the same way (see figure 24) and further confirmed that the fragments were likely to exhibit higher affinity. Spatial coordinates of the fragments were compared in PyMol (see figure 25) and it turned out their position clashed, which meant that real protein fragments did not have the option to stay in their optimal position. This exposed another limitation of simulating interaction only of a single protein fragment at a time. Furthermore, there was still the option that fragments could get digested by naturally present enzymes due to their clash and probable dysfunctionality, although that could compromise the whole G protein complex.

After analyzing the results, we decided to run another simulation with the same script but different input, to focus on fragments of the best-scoring protein – M124 (GPA mutant with TEV sites numbered 1, 2, and 4). What made this simulation different and more precise than the previous one was a ground reference for the calculation of Gibbs free energy – this time optimized for a particular protein whose spatial coordinates were obtained through the previous Rosetta simulation. Due to a lack of time, only a single structure per protein fragment was generated. Results were analyzed in the same way (see figure 24) and further confirmed that the fragments were likely to exhibit higher affinity. Spatial coordinates of the fragments were compared in PyMol (see figure 25) and it turned out their position clashed, which meant that real protein fragments did not have the option to stay in their optimal position. This exposed another limitation of simulating interaction only of a single protein fragment at a time. Furthermore, there was still the option that fragments could get digested by naturally present enzymes due to their clash and probable dysfunctionality, although that could compromise the whole G protein complex.

**Figure 24: Change in Gibbs free energy of fragments produced by cleavage of G-alpha mutant m124.**

The labels on the x-axis signify the position of each fragment in the original protein.

**Figure 25: An example of clashing positions of the fragments.**

This indicates a situation that cannot happen in the real-world conditions, thus reducing our trust in the prediction.

##### Future Vision

In the future, simulating molecular dynamics via force fields could be applied
to obtain insights into interactions of the protein fragments of G-alpha. Furthermore, this
could inform us whether G-alpha cleavage leads to any conformational changes in
G-beta, which could shed light on its ability or inability to recruit Ste5
and thus trigger the pheromone cascade.

## Transmembrane Domain Engineering Attempt

The extracellular portions of the human interleukin receptors were then fused to a transmembrane domain to secure localization to the membrane, as previously mentioned. We ran a sequence analysis and transmembrane protein prediction with TMHMM 2.0 (https://services.healthtech.dtu.dk/service.php?TMHMM-2.0) on 13 different endogenous single-pass type I transmembrane proteins (see figure 26).

**Figure 26: Predicted scores for select transmembrane domain sequences.**

These scores informed our decisions when engineering our novel transmembrane domain.

After alignment of sequences for regions identified as transmembrane, most conserved residues were observed. Further, amino acid compositions of the transmembrane domains were taken into account. Using our knowledge of the characteristics of the phospholipid bilayer and transmembrane proteins in general; we refined our own candidate for a transmembrane protein. This TMD would have hydrophobic amino acid residues pointing “inwards”, with the polar amino acid residues on either side and tryptophan in-between those areas. Our candidate had the following sequence of amino acids:

**IAGIVIGVVLGVIFILIAILFAFW**.
This candidate proved to have a higher predicted localization to the membrane than the TMDs we compared it to. However, we compared it to Wsc1 domain and learned these two had virtually same properties for a transmembrane domain (see figure 27).

**Figure 27: Predicted scores for our novel transmembrane domain compared to transmembrane domain Wsc1.**

The graph indicates that the Wsc1 domain that can be found in nature performs better than our own novel transmembrane domain.

Thus, we decided to use the TMD Wsc1 for our project design henceforth, as some of our designs build on papers that use the Wsc1 domain. This way, we can protect the project from uncertainty accompanying novel designer proteins.

- Barlow KA, Ó Conchúir S, Thompson S, Suresh P, Lucas JE, Heinonen M, Kortemme T. Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein-Protein Binding Affinity upon Mutation. J Phys Chem B. 2018 May 31;122(21):5389-5399. doi: 10.1021/acs.jpcb.7b11367. Epub 2018 Feb 15. PMID: 29401388; PMCID: PMC5980710. ↩