Team:UCopenhagen/Model

Introduction

Our dry lab efforts this year expanded to several different tracks, each focused on a different problem and solving it by different methods and tools. Our focus has primarily been on:
  • Structural modeling and mutation of the alpha subunit of yeast G protein.
  • Developing and comparing ODE models for our three iterative designs.
  • Developing and comparing SDE models for our three iterative designs.
As you'll see in the following!

Differential Equations for Signaling Pathways

Since we were developing a biosensor which needed to sense changes in rather low concentrations, it was crucial for us to have a good understanding of the signal transduction pathway we were coupling it to, and what reactions made up that pathway (please refer to our project design page for more information on our biosensor design). Through our dry lab work, we set out to answer these questions:
  1. How do we make sure that our biosensor can react to the low concentrations of cytokines found in sweat, within a reasonable time limit set by our Human Practices research?
  2. Where are the bottlenecks in the pathway, and can we fix them?
  3. Which signal transduction steps are robust or sensitive?

And to answer them, we made use of ordinary differential equations (ODEs): Equations that illustrate how the state of a variable changes with time.

After working with ODEs for a while, though, we realized the limits of using deterministic mathematics to model something as unpredictable as a cell, and so we opened up to the possibility of using stochastic differential equations (SDEs).
Results from such a model contained everything from the ODE model and much more, thanks to expanding from single values to ranges of probabilities. With this tool, we started asking questions like: how much can the reporter concentration vary given the same biomarker concentration? How well does the biosensor function if the cells face senescence? How does the behavior of the cell change if one of the metabolites is toxic?

These are the kinds of biologically relevant questions that can be answered with stochastic modeling, partly because SDEs can represent the inherent randomness present in all real processes. We argued they are especially relevant because they allow us to take our lack of knowledge of fine bioprocesses into account.

How We Model

We generated two classes of models, based on ODEs and SDEs respectively. First, we'll walk through the details of the ODE implementation, after which we build upon it by introducing features unique to SDEs. Some hints about alternative modeling frameworks are added at the bottom of this page for future iGEMers!

We needed to identify parameter values for the models to give somewhat realistic data. We searched through the literature, other models, and past iGEM wikis. The parameters were selected to be consistent among our different designs. The parameters found were taken as rough estimates (see the figure), because the performance of biological systems can vary for reasons not yet understood. However, rough estimates are enough to get great insights into the mechanics of cellular pathways. The next step could be obtaining precise data for our own biological parts in the lab, and fitting our models to such first-hand data.
List of parameters used in our models.

Ordinary Differential Equations (ODEs)

To build our ODEs, we followed three steps:
  1. Formulate a diagram that specifies the key players (state variables) and describes all possible ways these variables might interact with each other.
  2. Explicitly write out the quantitative form of all interactions.
  3. Convert the explicit quantitative interactions into a mathematical framework.
Step 1 - Making a Diagram
In order to make a diagram, we needed to have some idea of what our design was intended to be like. So, let's say we knew something like:
  • a) The presence of a biomolecule in proximity to the cellular membrane led to the binding of our receptor to the biomolecule, and then the binding of a coreceptor to this complex.
  • b) Our receptor and coreceptor were both transmembrane proteins, with intracellular domains that associated with each other following the extracellular association.
  • c) Their intracellular association was facilitated by the natural affinity of the two halves of a split-ubiquitin molecule (explained on the design page). This association and reconstitution of a full ubiquitin prompted its cleavage by naturally occurring deubiquitinases.
  • d) Such a cleaving event released a transcription factor that was bound to the C-terminal half of the split ubiquitin.
  • e) The freed transcription factor traveled to the nucleus and initiated transcription of the gene into mRNA.
  • f) mRNA was translated into a reporter protein.
From the points above, our key players (state variables) were identified as: biomolecule, transmembrane receptor protein, transmembrane coreceptor protein, free transcription factor, gene, mRNA, reporter protein. All other factors were treated as parameters with a constant value. The interactions between our variables were basically the six points above. Based on all this information, we made a nice diagram where the pathway became more intuitively visible.
Diagram of a signaling pathway of the design #1. The design #2 looks virtually identical.
Step 2 - Writing Interactions in a Quantitative Form
The next step was to find out what amount of a given reactant led to what amount of a given product. As we were dealing with signaling pathways, our concentrations were changing in accordance with the association and dissociation of the variables. We ended up with equations like
k1[A][B]=>[C]
where k1 was a kinetic parameter, [A] and [B] were reactant concentrations, and [C] was a product concentration. Note that such a reaction proceeded only in one direction. However, the reality of many reactions was better represented by breakdown into a forward reaction and a backward reaction. This was because it was empirically observed that chemical reactions tend to reach a state of equilibrium with nonzero concentrations of reactants and products. This was understood as a situation described by
k_on[A][B]<=>k_off[C]
where k_on and k_off were kinetic parameters (binding and unbinding constant, respectively) and [A], [B] and [C] are concentrations. Note that the reaction proceeded in both directions. And after some time, it reached the equilibrium described by
k=k_on/k_off=[C]/([A][B]).

Now, back to the empirical setting: in practice, it was much easier to deduce k than its constituents k_on and k_off, because k could be calculated in a straightforward manner from observed concentrations. We come back to this issue later. For now, we wrote down:
k_on * [biomolecule] * [receptor] * [coreceptor] <=> k_off * [complex]
However, the next reaction where deubiquitinase cleaved the reassociated split-ubiquitin proceeded only in a forward direction – once the transcription factor was released, it was never again part of the transmembrane protein.
k_on * [complex] * [deubiquitinase] => [transcription factor]
The next step was transcription, a slightly more complicated reaction that also proceeded only in the forward direction. Its kinetics followed a simple form of the Hill equation, written as [A]/(Kd+[A])=>[B] where [A] and [B] were concentrations and Kd was a dissociation constant (a measure of how tightly the transcription factor bound the promoter). This equation further needed to be multiplied by a specific rate of transcription r=g*R/L where g was the number of copies of a given gene, R was a general rate of transcription and L was the gene length.
rate * [transcription factor] / (Kd + [transcription factor]) => [mRNA]
Translation was a simpler case again, where knowing the concentration of mRNA and the specific translation rate was enough. The specific translation rate was r=R/L where R was a general rate of translation and L was the length of the mRNA.
rate * [mRNA] => [protein]
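
As a quick illustration of how these specific rates could be computed, a short Matlab sketch with made-up placeholder numbers (not our measured parameters):

% Placeholder numbers only - not our measured parameters.
g      = 1;           % gene copy number
R_txn  = 50;          % general transcription rate [nt/s]
L_gene = 1500;        % gene length [nt]
r_txn  = g * R_txn / L_gene;   % specific transcription rate

R_tln  = 30;          % general translation rate [nt/s]
L_mRNA = 1500;        % mRNA length [nt]
r_tln  = R_tln / L_mRNA;       % specific translation rate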
Step 3 - Converting Quantified Interactions into a Mathematical Framework
This was the first step that was model-dependent. Here we wanted to create a model based on ordinary differential equations. Such equations describe how the values of given variables change over time, or rather how they change during each time step. ODE-based models work by computing a set of equations iteratively, i.e. the output of one computation is used as the input for the next run. Turning quantified interactions into differential equations was done by rewriting them from the standpoint of the variables. Each differential equation needed to capture all changes that happen to a single variable. This included both additions and subtractions: if a certain molecule was a product of one reaction but a reactant in another reaction, its concentration could both rise and drop.

In the following equation, the first term captured the emergence of the complex by association of the biomarker, the receptor and the coreceptor. The second term captured the backward reaction, where the complex dissociated into its constituent parts. The third term captured the spending of the complex by the catalytic activity of deubiquitinase.
D(complex) = k_on * [biomolecule] * [receptor] * [coreceptor] - k_off * [complex] - k_cat * [complex] * [deubiquitinase]
Alright, now we had a system of ordinary differential equations that described the kinetics of our signaling pathway. We needed to translate our mathematical model into a programming language of choice, e.g. Matlab. Things became a little less transparent because each variable was named 'c' plus an arbitrary number, and the same went for parameters and whole equations.
d4 = k1 * c1 * c2 * c3 - k2 * c4 - k3 * c4 * k4;
Apart from providing initial conditions, we did not need to worry about all the c's and d's any longer. The k's, however, were what told us something about the speed and order by which our system worked. We gathered values for the parameters from literature, from other models, from experiments and from our educated guesses, while paying attention to the units our parameters came in.
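
To make the structure concrete, here is a minimal, hypothetical Matlab sketch of such a system. The variable names, parameter values, degradation terms and even the number of equations are placeholders rather than our actual model code:

% pathway_rhs.m - right-hand side of a hypothetical 7-variable pathway model.
% Variable order: [biomolecule; receptor; coreceptor; complex; TF; mRNA; reporter].
% Deubiquitinase is treated as a constant parameter (k.dub), as in the text.
% Simple degradation terms (k.deg) are an added assumption to keep levels bounded.
function dc = pathway_rhs(t, c, k)
    assoc  = k.on  * c(1) * c(2) * c(3);   % biomolecule + receptor + coreceptor associate
    dissoc = k.off * c(4);                 % the complex falls apart again
    cleave = k.cat * c(4) * k.dub;         % deubiquitinase releases the transcription factor

    dc = zeros(7,1);
    dc(1) = -assoc + dissoc;                               % biomolecule
    dc(2) = -assoc + dissoc;                               % receptor
    dc(3) = -assoc + dissoc;                               % coreceptor
    dc(4) =  assoc - dissoc - cleave;                      % complex
    dc(5) =  cleave;                                       % free transcription factor
    dc(6) =  k.r_txn * c(5) / (k.Kd + c(5)) - k.deg*c(6);  % mRNA (Hill-type transcription)
    dc(7) =  k.r_tln * c(6) - k.deg*c(7);                  % reporter protein
end

% In a separate script, solve and plot the reporter over time:
k  = struct('on',1e-3, 'off',1e-2, 'cat',1e-2, 'dub',0.1, ...
            'Kd',1, 'r_txn',0.03, 'r_tln',0.02, 'deg',1e-3);   % placeholder values
c0 = [0.01; 1; 1; 0; 0; 0; 0];                                 % initial concentrations
[t, c] = ode45(@(t,c) pathway_rhs(t,c,k), [0 36000], c0);      % ten hours, time in seconds
plot(t/3600, c(:,7)); xlabel('time [h]'); ylabel('reporter concentration');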

Because we thought scientific work should be translatable at all times, we paid special attention to satisfying standards like MIRIAM. Knowing that many different software tools are used for pathway modeling, we used Moccasin to turn our Matlab code into SBML files that can be opened with a wide variety of programs. Although our Matlab files contained plenty of comments, it was worth showing how we structured our Matlab code.
Example of the Matlab code from one of our models.


Stochastic Differential Equations (SDEs)

This part builds upon the previous one concerning ODEs. In fact, the two model classes are so similar that we took our implemented system of differential equations and copied it into the right place of our code for the SDE-based model. But before we did that, we stepped back a bit to where we finished with our mathematical model. We had the equation, and we simplified it to a form where M denoted all the deterministic parts. In other words, M was all the contribution coming from the signaling pathway of our interest.
D(complex) = k_on * [biomolecule] * [receptor] * [coreceptor] - k_off * [complex] - k_cat * [complex] * [deubiquitinase] = M

Now, we needed a term that was driven entirely by randomness; we called it N. If we decided that N stood for the stochastic part of the equation, we got an equation like this:
D(complex) = M + N

So far so good. But what was N? In the context of SDEs, it represented a Wiener process, which can be seen as a random variable whose value keeps randomly changing up and down over time.
Illustrative example of continuous noise generated by a so-called Wiener process.
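
For intuition, a path like the one in the figure can be generated in a few lines of Matlab (a sketch with an arbitrary time step):

% One-dimensional Wiener process: independent Gaussian increments with variance dt.
dt = 0.01;                              % time step
t  = 0:dt:10;                           % time grid
dW = sqrt(dt) * randn(1, numel(t)-1);   % increments ~ N(0, dt)
W  = [0, cumsum(dW)];                   % the path starts at zero
plot(t, W); xlabel('time'); ylabel('W(t)');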

This allowed us to model the effects of randomness and probabilities. Making N behave as described above was a matter of implementation. Alright, now we went back to the mathematical formula and expanded it again, so that we obtained a stochastic differential equation that we later implemented.
D(complex) = M + N = k1 * [biomolecule] * [receptor] * [coreceptor] - k2 * [complex] - k3 * [complex] * [deubiquitinase] + N
We implemented our SDE-based model in Matlab. Yet, built-in support for SDEs in Matlab was limited, so we used the SDETools package that was available online. It separates the deterministic and stochastic parts of each equation so that the deterministic parts stay grouped together, which is why we could easily copy/paste our system of ODEs. The stochastic part was then defined separately.

This was no issue because Wiener processes could be defined in any number of dimensions, i.e. if we had ten differential equations, we simply defined N as a 10-dimensional process – one dimension per equation.
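
We cannot reproduce the exact SDETools call here, so as a stand-in, here is a plain Euler-Maruyama sketch of D(c) = M + N, reusing pathway_rhs, k and c0 from the ODE sketch above; the step size and noise amplitude are placeholders:

% Euler-Maruyama for dc = M(c) dt + sigma dW, with a 7-dimensional Wiener increment.
dt    = 1;                            % time step [s]
T     = 36000;                        % total simulated time [s]
n     = round(T/dt);
sigma = 1e-4;                         % noise amplitude (placeholder)
c     = zeros(7, n+1);
c(:,1) = c0;                          % same initial conditions as the ODE sketch

for i = 1:n
    M  = pathway_rhs(i*dt, c(:,i), k);            % deterministic part (the drift)
    dW = sqrt(dt) * randn(7,1);                   % one Wiener increment per equation
    c(:,i+1) = max(c(:,i) + M*dt + sigma*dW, 0);  % keep concentrations non-negative
end
plot((0:n)*dt/3600, c(7,:)); xlabel('time [h]'); ylabel('reporter');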

As already mentioned, our intention was to make the code readable and translatable, so we followed the same set of standards for our SDE models. One exception was that no SBML files were generated, as there was no software support for SDETools since it is an unofficial package. The structure of our Matlab code was the same: contact information, parameters, deterministic equations, the Wiener process, vectors and options, and finally the code that runs the simulation.
To Future iGEMers!
There were some limitations to SDETools. Using the programming language Julia together with its available packages could give a nicer SDE modeling experience - our recommendation for future teams.

Note On Alternative Modeling Paradigms

Apart from ODEs and SDEs, there were many other modeling approaches useful for modeling signaling pathways. The first one to mention was certainly partial differential equations (PDEs). They were very similar to ODEs, but they allowed a variable to change with respect to more than one independent variable, e.g. space as well as time. This extension was essential for modeling physical processes that are located in space (e.g. diffusion).

There were more ways to model that were quite different from differential equations; some of them were rule-based or stochastic. Let's name a few: the Gillespie algorithm, Markov chains, Petri nets.
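
To give a flavor of these alternatives, a minimal Gillespie (stochastic simulation algorithm) sketch for a single reversible binding reaction A + B <=> C might look like this in Matlab; the rate constants and molecule counts are arbitrary:

% Gillespie stochastic simulation of a single reversible binding A + B <-> C.
k_on = 0.001; k_off = 0.1;            % example stochastic rate constants (placeholders)
x = [100; 100; 0];                    % molecule counts of A, B and C
t = 0; T_end = 100;
ts = t; xs = x;                       % recorded trajectory

while t < T_end
    a  = [k_on * x(1) * x(2);         % propensity of A + B -> C
          k_off * x(3)];              % propensity of C -> A + B
    a0 = sum(a);
    if a0 == 0, break; end
    t = t - log(rand) / a0;           % exponentially distributed waiting time
    if rand < a(1)/a0
        x = x + [-1; -1; +1];         % a binding event occurred
    else
        x = x + [+1; +1; -1];         % an unbinding event occurred
    end
    ts(end+1) = t; xs(:,end+1) = x;   % store the new state
end
stairs(ts, xs(3,:)); xlabel('time'); ylabel('copies of C');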

Different algorithms came with different limitations or assumptions (which might not be immediately visible) so it could be a good idea to model the same system in several ways to make sure the predictions were meaningful. It was a bit unfortunate that so much of the research in biology took advantage solely of ODEs/PDEs, while there was so much more.

Pheromone Cascade – Attractive but Complicated

Taking inspiration from last year's UCopenhagen iGEM team, we modeled the pheromone pathway in a similar way (using only two equations to abstract the whole cascade) and provided it with a parameter fitted to experimental data found in the literature. After some experimentation with our models at that time, we became convinced that the reactions necessarily had to be more complex in order to produce the empirically observed hypersensitivity.

Notably, one-to-one reactions could not provide significant amplification from a theoretical standpoint. After some reading, we found more detailed descriptions of the pheromone cascade that included a series of complexes and one-to-many interactions. In other words, a lot of amplification. As such, we became convinced that our model now more accurately represented the complexity found in the pheromone pathway.
Comparison of amplification provided by a one-to-one relationship versus one-to-many relationship.
We found a brilliant curated model on biomodels.com. We extracted the important part of the cascade and added it to our model. The only downside was that the model became quite a lot more complicated (and the SDE implementation thus started running into RAM issues).
Diagram of a signaling pathway of the design #3. Note how much more complicated it has become thanks to the pheromone pathway.

Impact on Wet Lab - Experiments

How our modeling was tied to wet lab experiments

Since the beginning of our journey, wet lab and dry lab worked closely together. Thanks to our supervisors, we always intended to use our dry lab activities to guide our wet lab decisions. As our project ended up focusing a great deal on protein analysis, the wet lab work with G-alpha would not have been possible without the simultaneous work in the dry lab.

The dry lab guided the wet lab in finding out which of the several biosensor designs was the most viable, in shaping the design of the assays for our yeast cells, and also in engineering the proteins with the desired properties.

Mutual involvement and daily briefings guaranteed that we headed in the same direction.

In all the following experiments, all values were held constant except for the ones studied.

Our ODE Models

Dynamic Ranges
Comparison of dynamic ranges of our designs. Note the logarithmic scale on the x-axis.
Dynamic ranges showed how well different designs responded to a range of interleukin concentrations. It was observed that the lack of the pheromone cascade in the design #1 and the design #2 led to insufficient sensitivity. The pheromone cascade increased sensitivity by several orders of magnitude. These findings resulted in more effort being allocated to the design #3 (with G-alpha and TEV protease).
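
A dynamic-range curve of this kind can be produced by sweeping the input concentration on a logarithmic grid and recording the final reporter level; here is a sketch reusing the hypothetical pathway_rhs and parameters from the ODE section (not our actual model files):

% Sweep interleukin (biomolecule) concentrations and record the reporter after 10 h.
il_grid  = logspace(-6, 0, 25);              % input concentrations (arbitrary units)
reporter = zeros(size(il_grid));

for i = 1:numel(il_grid)
    c0_i = [il_grid(i); 1; 1; 0; 0; 0; 0];   % only the input concentration varies
    [~, c] = ode45(@(t,c) pathway_rhs(t,c,k), [0 36000], c0_i);
    reporter(i) = c(end, 7);                 % final reporter concentration
end

semilogx(il_grid, reporter);
xlabel('interleukin concentration'); ylabel('reporter');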
Sensitivity Analysis
Sensitivity analysis showed how perturbations of parameter values impacted the reporter concentration. It identified more robust parameters (i.e. resilient to perturbations) and more sensitive parameters (i.e. even slight perturbations led to a noticeable impact). Additionally, it was observed how the importance of certain parameters changed over time. In the design #1, the affinity of the transcription factor to the promoter (k15 in the figure) was a sensitive parameter, so the reporter concentration range could be adjusted by the choice of promoter. The reporter concentration was also shown to be sensitive to the interleukin concentration (c_1 in the figure).
Sensitivity analysis of the design #1 performed in SimBiology. The reporter was most sensitive to: affinity of a transcription factor to a promoter (k15), interleukin concentration (c_1).
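
Our sensitivity analysis was performed in SimBiology; a rough, do-it-yourself equivalent is a one-at-a-time perturbation of the hypothetical sketch model from above:

% One-at-a-time sensitivity: perturb each parameter by +10% and compare
% the relative change of the final reporter concentration.
names = fieldnames(k);
sens  = zeros(numel(names), 1);
[~, c_ref] = ode45(@(t,c) pathway_rhs(t,c,k), [0 36000], c0);
ref = c_ref(end, 7);

for i = 1:numel(names)
    kp = k;
    kp.(names{i}) = 1.1 * k.(names{i});                 % +10% perturbation
    [~, c_p] = ode45(@(t,c) pathway_rhs(t,c,kp), [0 36000], c0);
    sens(i) = (c_p(end,7) - ref) / (0.1 * ref);         % normalized sensitivity
end

bar(sens);
set(gca, 'XTick', 1:numel(names), 'XTickLabel', names);
ylabel('relative sensitivity of the reporter');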
Correlation Among Pathway Proteins
Here we performed principal component analysis (PCA) and obtained a matrix of all correlations between variables in the design #3 (spectrum from blue to red represented perfect negative correlation to perfect positive correlation, respectively). The results allowed us to understand indirect interactions between proteins in the signaling pathway. Provided that laboratory tools and methods for advanced pathway engineering existed, this figure could guide us in pathway optimization.
Correlation matrix of all proteins in the signaling pathway of the design #3 and also time. The blue was associated with a negative correlation (the values closer to -1 signified a higher negative correlation - if one concentration went up, the other went down), the red was associated with a positive correlation (the values closer to +1 signified a higher positive correlation - if one concentration went up, so did the other). Values closer to zero signified less correlation between two concentrations.
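
The correlation matrix itself can be computed directly from simulated trajectories, for example with corrcoef on the sketch model (time appended as an extra column, as in the figure):

% Pairwise correlations of all state variables (and time) along one trajectory.
[t, c] = ode45(@(t,c) pathway_rhs(t,c,k), [0 36000], c0);
R = corrcoef([c, t]);                  % Pearson correlation matrix
imagesc(R, [-1 1]); colorbar;          % blue-to-red style map from -1 to +1
title('correlation between pathway variables and time');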
Impact of Promoter
The affinity of a transcription factor (TF) to a promoter was varied in this case study. A high-affinity TF-promoter pair led to increased sensitivity accompanied by significantly higher reporter concentrations, compared with a low-affinity TF-promoter pair. The biosensor's dynamic range as well as its reporter concentrations can be adjusted by the right choice of TF-promoter pair.
Higher affinity of a transcription factor to a promoter led to a higher sensitivity and higher reporter concentrations.
Inflammatory vs baseline reporter concentrations
High levels of inflammation led to a higher reporter concentration compared to baseline inflammation levels.
In this case study, the design #3 was subjected to low and high interleukin concentrations (i.e. levels of inflammation). High inflammation led to a fast development of high reporter concentrations, compared with low inflammation. Significant differences between the plotted curves suggested the design #3 could distinguish between low and high inflammation, as long as the measurements were done at the right time (preferably around the 5-hour mark).


This fits perfectly with our Human Practices considerations, where multiple experts preferred a design that provided rapid results, and patients were willing to wear the patch for up to 24 hours, according to our survey results!
Impact of proteolytic activity
The proteolytic activity of the TEV protease shifted the dynamic range of the design #3 by orders of magnitude, without a change in overall reporter concentrations. There are many versions of TEV protease available with various levels of proteolytic activity. The protease choice had a significant impact on the dynamic range of the biosensor.
Higher proteolytic activity of a TEV protease led to a higher sensitivity of the biosensor.

SDEs - hypothetical scenarios with effects of noise

Figures are illustrative. The use of SDE models was limited by the computational requirements of the current implementation. Thus, the desired phenomena were amplified to be observable on a short timescale. The SDE models were an extension that was less of a design tool and more of an analytic tool for further improvement of the biosensor: they aided in the identification of unknown behaviors.
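
The scenarios below differ only in how the noise amplitude behaves over time or with the state. For example, noise growing with time (the aging scenario below) can be sketched by making the amplitude a function of t in the Euler-Maruyama loop above, reusing dt, n, c0, k and pathway_rhs from the earlier sketches; the scaling is a placeholder:

% Noise amplitude growing linearly with time (the 'rapid aging' scenario, illustrative).
sigma_t = @(t) 1e-5 * (1 + t/3600);               % placeholder time-dependent amplitude

c = zeros(7, n+1); c(:,1) = c0;
for i = 1:n
    M  = pathway_rhs(i*dt, c(:,i), k);
    dW = sqrt(dt) * randn(7,1);
    c(:,i+1) = max(c(:,i) + M*dt + sigma_t(i*dt)*dW, 0);
end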
Rapid aging led to disorder in the cell
This case study examined the impact of noise increasing with time. Such a phenomenon could be present in senescent cells, where the by-products of aging obstruct the reliability of cellular processes. If such divergence of curves was observed in our sweat patches, we would have known the dysfunctionality was primarily time-dependent.
Amount of noise increasing with time (illustrative).
Longer time allows the signal pathway to yield more consistent results
This case study examined the impact of noise decreasing with time. Such a phenomenon could be observed in any cell, especially soon after triggering the receptor system, before the pathway processes overcame the magnitude of the biochemical background. If such divergence of curves was observed in our sweat patches, we would have known this was perfectly expected behavior, following the general principle that longer measurements yield more precise results.
Amount of noise decreasing with time (illustrative).
Unknown process can decrease and increase disorder in the cell
This case study examined a rather exotic-looking scenario where high levels of noise diminished and came back to initial levels afterwards. Such bizarre divergence of curves could show that the cellular pathway only worked under very specific conditions that were achieved only for a short time.
Decreasing and increasing amount of noise (illustrative).
Natural variability in measurements
This case study examined the natural variability of measurements that follows the normal distribution. By fitting experimental data to the divergence of these curves, we could learn how much noise was present in our biosensor or introduced by the measurement technique.
Uniform amount of noise (illustrative).
Receptors are not specific enough
This case study examined the impact of noisy activation of signaling pathways, which could be mediated by a receptor system that was not specific enough. If such divergence of curves was observed in our sweat patches, we would have known the problem was possibly caused by unspecific receptors.
Noise applied on the receptors (illustrative).
Reporter disrupts cellular processes
This case study examined the impact of a noise increase tied to reporter concentration. If such divergence of curves was observed in our sweat patches, we would have known the reporter was causing significant harm to the reliability of cellular processes.
Amount of noise increasing with reporter concentration (illustrative).
Variability in Interleukin Levels
This case study examined the impact of variability in interleukin concentration. This behavior could be expected in our sweat patch, as baseline interleukin concentrations likely vary among people. If such divergence of curves was observed in our sweat patches, we would have known this behavior was expected and we could deduce the variability among people.
Noise applied on interleukin concentrations (illustrative).
Diminishing disorder of cellular processes
This case study examined the effect of initially high but diminishing noise. This exotic scenario could represent the cellular processes stabilizing at random states. Such behavior could not be expected in a functioning biosensor and could indicate there was a serious issue with part of the signaling pathway, or perhaps that the cells had been compromised.
Decreasing amount of noise (illustrative).

G-alpha and Structural Modelling With Rosetta

Just to refresh, our work on G-alpha hinged on finding a mutant that preserved natural function when there was no signal (i.e. no cytokine), and also dissociated appropriately in the presence of a signal.

Following the recommendation from team Aalto-Helsinki, we looked into the Rosetta Software Suite. Alanine scan is a Rosetta script available via an old version of the Robetta server, and we used it to calculate how the Gibbs free energy of the two proteins (G-alpha and G-beta) changed upon mutating a single residue to alanine. Changes in Gibbs free energy correlate with binding affinity.

We submitted a list of residues selected according to our previous analyses. The crystal structure of the bovine G protein complex served as a template for the spatial structural positioning of our proteins.
Thanks to Kurt V. Mikkelsen, we got the opportunity to further our simulations with Rosetta. After several engineering cycles, we crafted seven G-alpha mutants and wanted to find out which one was the best candidate for our purposes: a mutant G-alpha with binding affinity like the wild type, but with the lowest possible binding affinity of its fragments after cleavage.
Thus, we needed to simulate the binding affinities (i.e. changes in Gibbs free energy, delta delta G) of seven G-alphas and nine fragments formed after cleavage. We eventually landed on using the Rosetta script flex_ddg, as this script had been shown to be less computationally demanding than other scripts, while making great predictions of molecular dynamics [1].

We particularly liked that the script was written with the capability to refine every protein structure many times (we followed the recommendations and went with 35 000 times). Although the script was initially developed for mutagenesis simulations, we found an easy way to adjust it for our own use. For each of our 16 proteins (mutants + fragments), we generated 35 independently optimized structures to probe the conformational space (this took 20 days on 16 CPUs with 64 GB of RAM). We used the scoring function REF2015, as it was shown to evaluate the results of flex_ddg better than others.
The results of the simulation were shocking for the team, as they revealed our reasoning might have been flawed. Essentially, our assumption had been that cleaved fragments dissociate from the protein, yet our results (as in the figure) showed that the majority of protein fragments had a higher affinity than the parent proteins. We also realized that our simulation was not directly answering our true question: could cleavage of our mutant G-alpha trigger Ste5 recruitment by G-beta? Directly simulating that could involve up to seven simultaneously interacting proteins, which translated into a very demanding computational task.

After analyzing the results, we decided to run another simulation with the same script but different input, to focus on the fragments of the best-scoring protein, M124 (GPA mutant with TEV sites numbered 1, 2, and 4). What made this simulation different and more precise than the previous one was the ground reference for the calculation of Gibbs free energy, this time optimized for a particular protein whose spatial coordinates were obtained through the previous Rosetta simulation. Due to a lack of time, only a single structure per protein fragment was generated. The results were analyzed in the same way (see the figure) and further confirmed that the fragments were likely to exhibit higher affinity. The spatial coordinates of the fragments were compared in PyMol and it turned out their positions clashed, which meant that real protein fragments would not have the option to stay in their optimal positions. This exposed another limitation of simulating the interaction of only a single protein fragment at a time. Furthermore, there was still the possibility that the fragments could get digested by naturally present enzymes due to their clash and probable dysfunctionality, although that could compromise the whole G protein complex.
In the future, simulating molecular dynamics via force fields could be applied to obtain insights into the interactions of the protein fragments. Furthermore, this could inform us whether G-alpha cleavage leads to any conformational changes in G-beta, which could shed light on its ability or inability to recruit Ste5 and thus trigger the pheromone cascade.
  1. https://pubmed.ncbi.nlm.nih.gov/29401388/

Icons made by Freepik from www.flaticon.com