Team:Chalmers-Gothenburg/GEM

iGEM Chalmers Gothenburg 2020

C:\No_Time_to_Waste\Modeling\GEM.html>_

Genome scale models: an introduction
A genome scale model (GEM) is a mathematical representation of the organism’s metabolism, which depicts all the metabolites, proteins and genes that can be found in the cell, the reactions in which they take part, as well as the relationship between genes and proteins [1].

The organism’s metabolism is represented as a matrix, in which a stoichiometric coefficient indicates the generation or consumption (i.e. 1, -1) or the absence (i.e. 0) of a metabolite. Furthermore, several techniques are available to predict the flux value through a reaction, and to optimize this value for a reaction of interest. One of these techniques is flux balance analysis, or FBA, that uses linear programming to perform the optimization [1].

Using this approach, we can model the metabolism to calculate yields or test the theoretical capabilities of an organism. We can also use GEMs to build in silico pathways for a certain organism. For example, one can manually add the reactions and metabolites that are part of a certain pathway and use the model to simulate the behavior of a genetically modified strain in the lab.

Our genome scale model comprises the whole metabolism of E. Coli MG1655. We used the model iML1515, which is the most recent version and to the extent of our knowledge, the most complete GEM produced until today [1]. To the information contained in iML1515 (reactions, metabolites and genes) we added 9 exogenous reactions that correspond to the genes that we transformed into our strain, plus the metabolites and genes that take part in them. In this way we obtained an in-silico version of the work we planned to conduct in the wet lab.

Methods
Many tools have been developed to analyze the metabolism of a cell using genome scale modelling. Two of them are COBRA [2], developed mainly by the University of Luxembourg and UC San Diego, and RAVEN [3], developed at Chalmers University of Technology. To run the linear programming optimizations, we used the solver Gurobi [4].

We decided to use RAVEN for our analysis because of the expertise that our university could provide us. We checked the capabilities of the model to ensure that the model was still functional when adding our modifications and performed several analyses to understand what our strain would be able to do while using polyurethane, PET and PEG as carbon source, since these are the plastic fibers that are present in clothing and that we aim to degrade.

Analysis and results

Flux balance analysis and yield calculation

We carried out a flux balance analysis (FBA), which is a function from the RAVEN toolbox, for our three different carbon sources: glucose, polyurethane and PET and for the standard media for the iML1515 model and our defined minimal media.

The FBA assumes a steady-state growth and mass balance and tries to maximize the flux through an objective function [5]. The objective function defines the reaction whose flux we want to maximize or minimize under the specific set of constraints (that can be thought as conditions) [6]. In our case, we wanted to maximize growth, which in these models is usually represented by the production of biomass. The production of biomass is defined by a reaction in the model, that we can call the biomass reaction. Therefore, the flux through the biomass reaction is what we want to optimize, and the biomass reaction is our objective function. Thus, the flux that the FBA calculates for that reaction is the maximum rate of biomass production.

The FBA returns a flux distribution that satisfies the objective function, which in our case is maximizing the biomass production. However, the FBA reports a single steady-state solution, and it is possible that multiple optimal solutions exist [6]. (Link to FVA)

From the FBA results, we calculated the maximum theoretical yield that we can obtain by growing E. coli in the different media and carbon sources. To calculate the theoretical yield, the amount of glucose, polyurethane and PET that were supplied to the media were normalized to the number of carbons in each molecule. Since polyurethane and PET are polymers, we also chose a definition of molecule for polyurethane (C17H16N2O4) and PET (C10H8O4). Flux Balance Analysis Yield Figure 1. Maximum theoretical biomass yield calculated from the FBA performed over different Carbon sources and media. The simulations on minimal media are indicated by the word “minimal” following the carbon source.
From the results in Figure 1, we can see that the plastic sources of carbon are less efficient than glucose. Furthermore, it appears that PET is a worse carbon source than polyurethane. However, this can be explained by how we defined our degradation pathway in the model. According to our reactions, when PET is degraded it is transformed into terephthalate and ethylene glycol (EG). Since terephthalate cannot be used by the cell, it is taken out of the system in our simulation in order to maintain the steady state. This means that half of the carbon contained in a molecule of PET is lost, which explains the low efficiency when compared to growth in polyurethane.

The results are one of the reasons that pointed us towards considering the production of an enzyme cocktail rather than cultivating the strain in a bioreactor: E. coli does not seem to grow well in our plastic substrates, and the yield becomes considerably worse when growing on minimal media, which means that it is likely that we would have to supply a complex defined media. The necessity of a rich media could considerably increase the costs of the process and would add time in optimizing the media.

placegolder 2 Map central ay Figure 2. Map of the central carbon metabolism of E. coli utilizing glucose and Polyurethane as carbon source. Colours reflect the fold change in flux between glucose and polyurethane.
Placeholder3 map with pet Figure 3. Map of the central carbon metabolism of E. coli utilizing glucose and PET as carbon source.

The maps visualize the fold change differences in the flux that the reactions carry for polyurethane and PET vs. glucose. Low fold changes are coloured in dark green, and the scale changes all the way to red, which represent that largest fold changes. It is clear that there are changes in the metabolism of E. coli when growing on the different substrates.

To further explore the adaptations of the metabolism to the carbon sources, we used the maximum theoretical production of biomass, as determined by the FBA, as an additional constrain to our model. Then, we computed over 3000 possible solutions to our growth simulation for each carbon source, using the function randomSampling from the RAVEN toolbox. For each run, the function randomly picks a reaction to maximize. Doing this is a good way to explore the solution space of the metabolism. In other words, we explored different flux distributions that the metabolism can adopt when specifying the amount of carbon source and the biomass that needs to be produced.

t-SNE
To get a visualization of the results of the randomSampling function, we used a technique called t-distributed stochastic neighbour embedding (t-SNE).

t-SNE is a dimensionality reduction technique that maps a set of high-dimensional points to two dimensions, so that close neighbours remain close and distant data points remain distant [8].

To get an idea of whether there would be some significant changes between the different conditions, we used the t-SNE algorithm to see if it would be able to distinctively cluster the flux distributions according to the carbon source the simulations used to grow. The t-SNE was performed on the fluxes not optimized for maximum production of biomass.
placeholder 4 missing figcap Figure 4. t-SNE embedding. t-SNE over all the flux distributions for the different carbon sources. There is a clear clusterization between the different substrates, and we can also see that polyurethane and PET cluster closer to each other than to glucose, being polyurethane closes to glucose than PET. The elipse contains the dots corresponding 95% interval confidence of our data.

The ellipse shows that most of our data clusters extremely close together, probably because most of the fluxes are quite similar. It fits the points for each group with a 95% interval confidence assuming that the data follows a multivariate t-distribution. There is a clear clustering of the fluxes corresponding to glucose (on the right side of the plot) vs the fluxes from plastic substrates (on the left). There is also some separation between fluxes from polyurethane and from PET. Therefore, it seems we can expect some differences that can be interesting to explore.

To further explore whether there is a significant change in the usage of certain enzymes or pathways depending on the carbon source, we wanted to perform a differential analysis where we compared the flux carried by each enzyme in each of the possible solutions for the three conditions. Before doing this, we filtered out those solutions for which the biomass production is 0, since we are not interested in those distributions that do not allow the bacteria to grow.

Exploring the solution space with random sampling
We calculated the flux fold change for each reaction across the different conditions and performed a Kruskal-Wallis test (non-parametric variant of one-way ANOVA) to assess if the differences were significant or not. To facilitate the analysis, we did a query in KEGG to extract the pathways reported for each enzyme, so that we can group enzymes by pathway and see which pathways are enriched in enzymes that have different activity levels for each substrate.
Figure 5. Flux visualization for each of the enzymes involved in the Glycolysis/Gluconeogenesis . Each dot corresponds to one of the generated random solutions. The horizontal line represents the value of 0 in the plot. Ns: p > 0.05; *: p <= 0.05; **: p <= 0.01; ***: p <= 0.001; ****: p <= 0.0001
Soft Segment Hard Segment Hard Segment


Each dot in the graphs corresponds to one solution. Therefore, a scattered cloud means that the enzyme or reaction can take many different flux values. We can also see that for PET and polyurethane, most of the enzymes carry a similar flux distribution but for glucose the flux is usually higher. It is important to note here that, in genome scale modelling, the sign indicates the direction of the reaction. Therefore, if an enzyme adopts a negative flux for one of the conditions, and positive for the rest, it means that the directionality of the reaction is changing.

In general, there is a wider variety of fluxes when simulating growth in glucose. It reflects the most efficient use that E. coli does of this substrate, since its metabolism is adapted to use it.

From these results, we can also infer enzymes that could be a good target for future optimization. For example, if we the metabolism of Glycine, serine and threonine, we see that GLYCK and GLYCK2 consistently have higher fluxes for the simulations in polyurethane and PET. These enzymes are glycerate kinases and participate in the production of 3-phospho-D-glycerate, which in term has an important role in glycolysis and the TCA cycle. We could try to upregulate these two enzymes in our strain to allow for a higher activity. Another interesting target is the enzyme lactaldehyde reductase, which carries flux when simulating in polyurethane and PET but not for glucose.

We have identified about 15 enzymes belonging to the native metabolism of E. coli that are interesting targets for optimization.

Parsimonious FBA to find the most efficient flux distribution
Parsimonious FBA performs a bi-linear optimization. First, it runs a normal FBA to optimize the production of biomass. Then, it tries to minimize the flux through each reaction while maintaining the biomass level at its maximum, and it labels each gene according to its contribution to the optimal growth prediction into essential genes, genes that are part of the optimal solution, genes that are predicted to maximize the flux if included, and genes that are predicted to decrease the flux [9], [10]. We can compare the essential genes predicted for each substrate, and the genes that are predicted to maximize flux, with the results of our random sampling to increase the confidence in our findings. Evolutionary, the cell wants to be as efficient as possible. Therefore, it makes sense that eventually the bacterial metabolism will converge to the most efficient version of its metabolic network.

We were interested in the enzymes that are predicted to increase the efficiency of the network, to compare if the pFBA prediction would match to the enzymes that appeared to have a significantly different activity across carbon sources from the random sampling analysis.

The list of enzymes that was provided by the pFBA has many enzymes in common with the enzymes showing higher activity for the plastic substrates in the random sampling analysis. Therefore, it increases the confidence in the prediction that a higher expression of these enzymes could lead to an increase growth of the recombinant strain of E. coli in polyurethane or PET.
Plastic Enzyme ID Reaction
PET/PU 'ATPM' ATP maintenance requirement
PET 'CO2tpp' CO2 transporter via diffusion (periplasm)
PET/PU 'CYTBO3_4pp' Cytochrome oxidase bo3 (ubiquinol-8: 4 protons) (periplasm)
PET/PU 'GLYCTO2' Glycolate oxidase
PET 'MALS' Malate synthase
PET/PU 'ME1' Malic enzyme (NAD)
PET/PU 'NADH16pp' NADH dehydrogenase (ubiquinone-8 & 3 protons) (periplasm)
PET/PU 'PDH' Pyruvate dehydrogenase
PET/PU 'TRSARr' Tartronate semialdehyde reductase
PU 'ACALD' Acetaldehyde dehydrogenase (acetylating)
PU 'ATPS4rpp' ATP synthase (four protons for one ATP) (periplasm)
PU 'CYTBO3_4pp' Cytochrome oxidase bo3 (ubiquinol-8: 4 protons) (periplasm)
PU 'GLUDy' Glutamate dehydrogenase (NADP)
PU 'GLXCL' Glyoxalate carboligase
PU 'MDH2' Malate dehydrogenase (ubiquinone 8 as acceptor)
PU 'THD2pp' NAD(P) transhydrogenase (periplasm)
PU 'VALTA' Valine transaminase
PU 'VALt3pp' L-valine transport out via proton antiport (periplasm)
PU 'VALtex' L-valine transport via diffusion (extracellular to periplasm)
Table 1. Summary of the enzymes predicted to increase the efficiency of the metabolic network according to the results of the pFBA. The “Plastic” column indicates in what substrate the presence of the enzyme would lead to a more efficient metabolic network.


Flux Variability Analysis to explore the flexibility of the optimal solutions
Flux Variability Analysis (FVA) is used to find the minimum and maximum flux for reactions in the model, while still supporting some state of the network [11] such as the optimal growth as reported by an FBA. As mentioned before, only one optimal solution is reported by FBA, while many optimal solutions may exist for our model. With the FVA, we can explore the range of these alternative optimal solutions [6]. We run the simulations maximizing first for the biomass function and then allowing the system to find optimal solutions that would maintain 90% of the maximum growth rate. By doing so, we maintain growth while exploring if our bacteria could be able to produce anything in addition to growing. In this way we could explore future possibilities to generate products, such as platform chemicals that could be used for other processes, from the degradation of the plastic fibres.

Placeholder 6 Figure 6. Results from the Flux Variability Analysis. We can see the range between the maximum and minimum value for each reaction for the three different conditions.

We can see that the variability of the reactions when simulating growth in PET is lower compared to polyurethane or glucose. This means that there are not so many optimal solutions to the system for growth in PET, which means that the constraints on the model are tighter. If we think of the design of the degradation pathway of Elastane (link to Design page), we can see that it makes sense. When degrading PET, we obtain two products, terephthalate and ethylene glycol. To simplify our model and adapt it to the scope of our project, we ignored the production of terephthalate, since E. coli cannot metabolize it. In reality, terephthalate will simply accumulate in the media. However, to be able to run our simulations, we need the model to achieve a steady state, which means that there cannot be accumulation of any product. To solve this, we included a sink for terephthalate in our model, so that the compound can leave the system and it is not further considered for the metabolism of the cell, which is similar enough to its accumulation in the media in real life. In the model, this means that half of the carbon atoms that are fed to the metabolism from PET are discarded in the form of terephtalate, whereas all carbon atoms are utilized from glucose and polyurethane.

For polyurethane, on the contrary, the degradation pathway yields ethylene glycol and 2-oxopent-4-enoate. Both compounds can be further metabolized by the cell and used as energy sources. This also probably accounts for the higher flux variability of polyurethane than glucose: when using glucose as carbon source the cell mainly uses the glycolysis pathway to generate the energy, but when using polyurethane, it can use many different pathways, albeit less efficient than glycolysis.

It is important to remember here that a higher variability range does not mean large flux through a reaction: it just means that there is a large difference between the maximum and minimum value that the flux can take. In other words, an enzyme that has a low variability range according to the FVA can still carry a large flux, it only means that it carries a similar amount of flux for all the possible optimizations that are calculated by the FVA. A low variability range can also indicate that the confidence in the prediction for that enzyme is higher, since it has a similar value no matter what the network looks like.

When we studied the reactions that have higher and lower variability range for the three carbon sources, we discovered that, as expected, the reactions with lowest variability range are similar between polyurethane and PET, and different from glucose, whereas the reactions with highest variability range are different for each carbon source. We decided to study some of the reactions that seemed to be interesting targets for optimization from the randomSampling exploration. Placeholder 7 Figure 7. Variability range for enzymes that had a significantly different activity from the exploratory analysis. We calculated the difference between the maximum and minimum flux for a few handpicked enzymes, to explore the variability of enzymes that behave different for the different carbon sources.

Unexpectedly, the variability is quite high. Ideally, we would like the enzymes that we want to target to have a low variability range since the confidence in our predictions would be higher in that case, as we have explained above. This does not mean that these enzymes are not good targets for optimization, but rather than we should expect discrepancies on how they behave in the lab and how we expect them to behave from modelling.

Adding value to recycling: can we produce something when degrading plastic?
E. Coli has been extensively used to produce bio-based chemicals [12]. Therefore, it is not a stretch to think that we could use our chassis to produce a valuable product when growing on plastic fibers. To check whether it is possible to produce anything besides biomass, we performed a bi-linear optimization: first we optimized for the production of biomass, and then, setting the optimal biomass value as an additional constrain (as we did for the parsimonious FBA, for example). Then, we changed the objective function to the secretion of other metabolites. In our case, we decided to explore the possibility to produce succinate, a well-known platform chemical that can be produced by E. Coli under certain conditions [13], and methanol, that has also reportedly been produced in E. Coli using biomass-derived sugar compounds as carbon source [14].

Unfortunately, according to our model it is not possible to produce succinate even when reducing the growth to a lower rate of 0.01 g/gDWh. However, it seems possible to produce very small amounts of methanol when lowering the biomass production to 0.01 g/gDWh. Placeholder 8 Figure 8. Methanol yield on carbon source. Yield of methanol production when growing the model on polyurethane and PET and specifying a biomass production rate of 0.01 g/gDWh.
Although it is obvious that a significant production of methanol cannot be achieved from our strain, the fact that it is possible to secrete even a minuscule amount opens the door for future projects that could work towards optimization of the production of methanol from plastic fibers.

These results do not mean that it would be impossible to produce other compounds, such as succinate, but rather than a heavier work on metabolic engineering is required. Furthermore, we also should take our results with a grain of salt, since our simulations are a simplification of the model and for example we are not taking into account the amount of carbon that we are losing in the production of terephthalate (from the soft segment degradation) or benzoate (from the hard segment degradation). Further research is needed to either try to further degrade these products or to see whether they could be turn into something of value: for example extract the benzoate and use it as platform chemical for other processes.

Limitations of GEMs
As it happens with many in-silico approaches, there are limitations to the extent that the GEM represents reality. Here we present some of general limitations inherent to the use of GEMs, and some that arise from the way we conducted our experiments. It is important to take these limitations into account when drawing conclusions from the results.

General limitations

For example, in vivo most reactions are harnessed by physico-chemical and spatial constraints, the availability of nutrients, and regulation systems. Nowadays, the amount of enzyme available to catalyze the reaction is believed to be one of the main constraints of a metabolic network. However, in the GEM most of the reactions are not constrained: we allow the concentration of enzyme to take any value within a very large range. This means that our model can give us a solution that is not realistic, because for it to happen the amount of protein that should be present in the cell just cannot be there (after all, the cell is a tiny thing). A way to reduce the impact of this limitation is to use the GECKO methodology [15] to obtain an enzyme-constraint model.

Another problem of FBA and FVA is that they do not account for the tight regulation that the metabolism of the cell is under [5]. For example, negative feedback loops from metabolites in the network. This means that the predicted phenotypes are not accurate, and it cannot predict metabolites concentrations, since it only balances fluxes [5].

Limitations of our approach

The model that we chose, iML1515, presented some problems in the beginning: some of the metabolites’ names were incorrect because they had been duplicated, i.e. for water H2O H2O [e]. To solve this, we made a script to further curate the model. However, it is possible that there were other incongruencies in the model that we did not realize and that could influence the analysis results.

To analyse the pathways that exhibited higher variability with random sampling , we wrote a script that would use the gene IDs from our model to query the KEGG database and retrieve the associated pathways. KEGG often has more than one pathway associated with a gene. However, to simplify the workflow, we chose only the first pathway that was reported and disregarded the rest.

The solutions coming from random sampling are, obviously, random, which means that different results may be obtained for different runs. However, using a large enough sample diminishes the stochastic effects and give considerably good results.

What does this mean for our project?
Our goal is to understand what the metabolism of our strain of E. coli is, in theory, able to do when growing on plastic fibres. We have also tried to predict which metabolic state is more optimal and what are the enzymes that contribute to the maximum theoretical growth rate. These enzymes can be targeted in the lab to improve the efficiency of the strain.

As we have explained, as of today modelling is not an ineffable tool that will give us a unique solution – it is likely that it will not even give us a real solution. However, to optimize a strain to produce a specific product or protein can be a very long and difficult project. With our predictions we have identified both shortcomings of our original plan: i.e. it will be hard to grow our bacterial strain in a bioreactor and to obtain an efficient elastane degradation; and several possibilities for future optimization. Therefore, the modelling has been an important decision-making tool: in our case, the low yields when simulating growth in plastic fibres, and the fact that it seemed unrealistic for a project so constricted in time to get E. Coli to produce any kind of chemical when degrading plastic supported our decision to switch from engineering all our enzymes into the bacteria towards purifying them one by one and producing an enzymatic cocktail.

Furthermore, by performing a detailed in silico analysis of the metabolism of E. Coli, we have also improved our general knowledge about the metabolism of the strain we are working with, which can also help troubleshoot problems that arise when working in the lab.

  1. [1] C. Gu, G. B. Kim, W. J. Kim, H. U. Kim, and S. Y. Lee, “Current status and applications of genome-scale metabolic models,” Genome Biology, vol. 20, no. 1. BioMed Central Ltd., pp. 1–18, Jun. 13, 2019, doi: 10.1186/s13059-019-1730-3.
  2. [2] L. Heirendt et al., “Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0,” Nat. Protoc., vol. 14, no. 3, pp. 639–702, Mar. 2019, doi: 10.1038/s41596-018-0098-2.
  3. [3] H. Wang et al., “RAVEN 2.0: A versatile toolbox for metabolic network reconstruction and a case study on Streptomyces coelicolor,” PLOS Comput. Biol., vol. 14, no. 10, p. e1006541, Oct. 2018, doi: 10.1371/journal.pcbi.1006541.
  4. [4] Gurobi Optimization, “Gurobi Optimizer.” Houston, Texas.
  5. [5] E. J. O’Brien, J. M. Monk, and B. O. Palsson, “Using genome-scale models to predict biological capabilities,” Cell, vol. 161, no. 5. Cell Press, pp. 971–987, May 30, 2015, doi: 10.1016/j.cell.2015.05.019.
  6. [6] H. A. Herrmann, B. C. Dyson, L. Vass, G. N. Johnson, and J. M. Schwartz, “Flux sampling is a powerful tool to study metabolism under changing environmental conditions,” npj Syst. Biol. Appl., vol. 5, no. 1, pp. 1–8, Dec. 2019, doi: 10.1038/s41540-019-0109-0.
  7. [7] Z. A. King, A. Dräger, A. Ebrahim, N. Sonnenschein, N. E. Lewis, and B. O. Palsson, “Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways,” PLOS Comput. Biol., vol. 11, no. 8, p. e1004321, Aug. 2015, doi: 10.1371/journal.pcbi.1004321.
  8. [8] D. Kobak and P. Berens, “The art of using t-SNE for single-cell transcriptomics,” Nat. Commun., vol. 10, no. 1, pp. 1–14, Dec. 2019, doi: 10.1038/s41467-019-13056-x.
  9. [9] N. E. Lewis et al., “Omic data from evolved E. coli are consistent with computed optimal growth from genome‐scale models,” Mol. Syst. Biol., vol. 6, no. 1, p. 390, Jan. 2010, doi: 10.1038/msb.2010.47.
  10. [10] M. L. Jenior, T. J. Moutinho, B. V. Dougherty, and J. A. Papin, “Transcriptome-guided parsimonious flux analysis improves predictions with metabolic networks in complex environments,” PLoS Comput. Biol., vol. 16, no. 4, p. e1007099, Apr. 2020, doi: 10.1371/journal.pcbi.1007099.
  11. [11] S. Gudmundsson and I. Thiele, “Computationally efficient flux variability analysis,” BMC Bioinformatics, vol. 11, no. 1, p. 489, Sep. 2010, doi: 10.1186/1471-2105-11-489.
  12. [12] X. Chen et al., “Metabolic engineering of Escherichia coli: A sustainable industrial platform for bio-based chemical production,” Biotechnology Advances, vol. 31, no. 8. Elsevier, pp. 1200–1223, Dec. 01, 2013, doi: 10.1016/j.biotechadv.2013.02.009.
  13. [13] C. Thakker, I. Martínez, K. Y. San, and G. N. Bennett, “Succinate production in Escherichia coli,” Biotechnology Journal, vol. 7, no. 2. NIH Public Access, pp. 213–224, Feb. 2012, doi: 10.1002/biot.201100061.
  14. [14] T. Takeya, M. Yamakita, D. Hayashi, K. Fujisawa, Y. Sakai, and H. Yurimoto, “Methanol production by reversed methylotrophy constructed in Escherichia coli,” Biosci. Biotechnol. Biochem., vol. 84, no. 5, pp. 1062–1068, May 2020, doi: 10.1080/09168451.2020.1715202.
  15. [15] B. J. Sánchez, C. Zhang, A. Nilsson, P. Lahtvee, E. J. Kerkhoven, and J. Nielsen, “Improving the phenotype predictions of a yeast genome‐scale metabolic model by incorporating enzymatic constraints,” Mol. Syst. Biol., vol. 13, no. 8, p. 935, Aug. 2017, doi: 10.15252/msb.20167411.