Poster: TAU_Israel

sTAUbility: an Innovative Approach to Increase the Genetic Stability of Heterologous Genes
Presented by Team TAU Israel 2020

Karin Sionov¹, Matan Arbel¹, Itamar Menuhin¹, Doron Naki¹, Niv Amity¹, Bar Glickstein¹, Omer Edgar¹, Noa Kraicer¹, Itai Katzir¹, Einav Saadia¹, David Kenigsberger¹, Hadar Ben Shoshan¹, Dan Alon², Adi Yanai², and Prof. Tamir Tuller³.

¹iGEM Student Team Member, ²iGEM Team Mentor, ³iGEM Team Primary PI.

Abstract

A key challenge in the field of synthetic biology is the genomic instability of introduced genes. Once a gene is inserted into a host organism, it can impose an additional metabolic load, significantly reducing host fitness. Mutations that damage the introduced gene are therefore likely to be selected for, diminishing its expression. These mutations could render synthetic-biology products obsolete and require constant maintenance. We propose interlocking a target gene to the N-terminus of an essential gene in the host’s genome, under the same promoter. This way, mutations in the target gene are likely to affect the expression of the essential gene, leading to the death of the mutated host. We are developing software called sTAUbility that matches the best-fitting essential gene and linker to a given target gene, based on bioinformatic models and novel approaches for measuring stability. Furthermore, sTAUbility optimizes the combined construct for efficient gene expression and increased stability.
Problem Description, Motivation and Inspiration

Problem Description
When a foreign genetic construct is introduced to a host cell, it causes an additional metabolic load and inherently decreases the cell's fitness, since it requires extra energy without any benefit to the host. Therefore, genetically modified organisms which acquire mutations that inactivate the construct will have improved fitness relative to other modified individuals and their genotypes will become the most frequent in the population, leading to the loss of the construct. This phenomenon results in the evolutionary instability of introduced synthetic genes, hindering many major advances in biosynthetic designs and implementation.

Motivation
We discovered that there are no existing tools to design stable constructs. Previous iGEM projects have focused on identifying sequences that contribute to the relative instability of genetic constructs, but they did not recommend a proper solution for avoiding instability patterns. Many commercially available expression optimization tools do not consider mutational stability at all when offering an optimized sequence.
In addition to a lack of tools for designing stable foreign genetic constructs, limitations exist in available techniques to measure stability. Currently, stability measurements use reporter genes to indicate the target construct's preservation status in the population, and can only measure stability at small scales. Moreover, there are no available techniques for measuring the co-stability of linked genes expressed together.

Inspiration
Our initial inspiration was the Lethbridge 2013 project. They developed software that enables dual coding of proteins, utilizing pseudoknots as regulatory parts. This idea inspired us to use pseudoknots in the fight against evolution, preserving the functionality of genetic constructs introduced into a new host.

Originally, we planned to create an overlap between a foreign gene of interest (henceforth referred to as the "target gene") and an essential gene, to increase the target gene's mutational stability. However, we realized that the probability of such an overlap is very low, and came up with a new generic method for increasing the stability of a gene of interest.
Idea
We propose interlocking a given target gene to the N-terminus of an essential gene in the host’s genome, under the same promoter. An essential gene is a gene whose activity is vital for the cell's survival. Thus, mutations in the target gene are likely to affect the expression of the essential gene, leading to the death of the mutated host.



The way we link the genes together depends on the purpose of the construct. In this project, we focused mainly on two linkage options: fusion linkers and 2A linkers.
We show that each conjugated gene provides a different stability level when attached to a specific target gene. Moreover, we hypothesize that a non-essential gene may promote a target gene’s stability better than an essential gene, perhaps because it is conserved in evolution even though it is not defined as "essential".
Accordingly, the recommendation engine that we developed offers an end-to-end solution and generates the best-fitting conjugated gene and linker for a given target gene, increasing stability and expression levels.
Project Goals
  1. Devise novel prediction and optimization models which simulate the co-stability of interlocked genes and help understand their behavior.

  2. Develop a software tool (sTAUbility Enhancer) that designs a construct for a given target gene, composed of the best-matching essential gene and linker for improved stability and higher expression levels. This software provides users with our solution, in an easy-to-use framework.

  3. Create an innovative technique (Gene-SEQ: Stability-Enhancing Quantifier), for large-scale measurement of genetic (co)-stability. The data from such experiments will be used to train our models further and create a more robust and versatile software.

  4. Demonstrate that attaching a target gene to the N-terminus of a host organism’s essential gene significantly increases the evolutionary stability of that target gene, depending on the essential gene's properties.

Engineering Process
Several steps in our modular design required preliminary assessment. We demonstrate the use of engineering principles through five sub-projects that influenced our design and allowed us to improve our methods, based on models and statistical analysis.
The five sub-projects are as follows:




  1. Feasibility analysis for linking target and essential genes by sequence overlap. We evaluated the possibility of linking target and host genes with overlapping frame-shifted sequences, but a probability analysis demonstrated that this approach was too limited. We improved our proposed solution accordingly and designed new ways to interlock the essential and target genes.

  2. Co-culture statistical analysis. In the original design of the Gene-SEQ experiment, we planned to take a small daily sample of the culture (~50 microliters) to seed the following day's culture. After sampling, we planned to perform FACS and separate variants according to their fluorescence level. In this analysis, we attempted to quantify the sample size that would enable us to keep most of the variants in the population, regardless of their fitness.
    We realized that sampling a small fraction of the population every day is problematic: very quickly, we would sample only the variants with the highest fitness, as they have many copies. Accordingly, there would be no point in performing FACS separation, because it would separate by fluorescence, not fitness.
    We solved this problem using the Chi.Bio system. This system dilutes the population instead of sampling a small portion of the culture, maintaining higher heterogeneity.

  3. Chi.Bio setup. In-depth analysis of the Chi.Bio platform allowed us to use the sophisticated hardware to monitor optical density (OD) and fluorescence, constantly dilute the culture with fresh media, and always maintain an OD of our choice in our Gene-SEQ experiment.

  4. Restriction enzyme analysis. We sought to select the optimal restriction enzymes for our experimental design, whether custom-designed, commercial, or even multiple enzymes. An optimal enzyme would allow us to uniquely identify the different strains in the mixed libraries of the Gene-SEQ experiment, preserving as much of the target gene (GFP or RFP) as possible for mutational-footprint analysis. It would also minimize overall construct length for much more efficient self-ligation.

  5. Selection of 10 essential genes for the Proof-of-Concept experiment. The experimental design included measuring the mutational stability of ten diverse essential genes interlocked to GFP, as compared to the stability of GFP alone. Our selection model helped us define what is meant by "diverse genes", enabling us to achieve much higher information gain.
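
The sampling bottleneck described in sub-project 2 can be illustrated with a short Python sketch. It assumes, for simplicity, that every cell is carried over independently with a probability equal to the sampled fraction of the culture. The function names and any numbers passed to them are hypothetical and only illustrate the reasoning; they are not the analysis itself.

```python
def loss_probability(cells_of_variant: int, sampled_fraction: float) -> float:
    """Probability that none of a variant's cells end up in the sample,
    assuming each cell is carried over independently (an assumption)."""
    return (1.0 - sampled_fraction) ** cells_of_variant


def expected_variants_kept(cell_counts, sampled_fraction):
    """Expected number of distinct variants still present after one sampling step."""
    return sum(1.0 - loss_probability(n, sampled_fraction) for n in cell_counts)


# Hypothetical culture: three variants with very different abundances.
counts = [10, 1_000, 100_000]
for n in counts:
    print(n, loss_probability(n, sampled_fraction=0.001))
print(expected_variants_kept(counts, sampled_fraction=0.001))
```

Under this model, rare (low-fitness) variants are lost far more often than abundant ones at every sampling step, which is why repeated small samples quickly reduce the population to its fittest members.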

Gene-SEQ: Measurement
Introduction
We created a novel technique aimed at large-scale measurement of evolutionary genetic stability and fitness, called Gene-SEQ (Stability Enhancing Quantifier). This technique can measure thousands of constructs simultaneously in a single reactor.
Using the Gene-SEQ technique, we measured the genetic stability of multiple target genes, and the genetic co-stability of multiple target-essential linkages. Furthermore, Gene-SEQ allows for the extraction of the mutational footprints and fitness of each target-essential linkage, which will be used for further improvement of our software.



Our Gene-SEQ experiment
In the Gene-SEQ experiment, we used two libraries [1] provided by Prof. Maya Schuldiner from the Weizmann Institute. These libraries contain about 6000 yeast variants each, one for each ORF in the yeast genome, with a distinct promoter per library. Each variant has a fluorescent gene (either GFP or RFP) fused to the N-terminus of a different gene.




We prepared two mixed cultures, one with RFP-tagged strains and the other with GFP-tagged strains. We grew them in an in-lab evolution experiment with the state-of-the-art Chi.Bio machinery, permitting constant growth. We extracted samples every few days (21 days in total). We applied our Gene-SEQ deep-sequencing protocol to the samples; the resulting sequences serve as mutational data for our optimization model.

POC Experiment
We hypothesized that interlocking a target gene upstream of an essential gene prolongs its evolutionary half-life. To test this hypothesis, we compared the evolutionary half-life of ten diverse constructs against the half-life of the unattached target gene. We did this by measuring their relative preservation of fluorescence, which is used as a measure of stability.


We selected the ten genes by first filtering for essential genes with no sensitivity to over-expression [2]. We then filtered them further for sufficient fluorescence, resulting in 600 possible genes. Following the feature generation process, we utilized partitional clustering to select the ten genes that are both the most diverse and the most representative of the gene population.
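
The clustering-based selection can be sketched in Python as follows. The poster does not name the partitional algorithm, so this sketch assumes plain k-means and picks, from each cluster, the gene closest to the cluster centre. The feature vectors, gene names, and the `pick_representatives` helper are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iterations):
        assignment = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, [nearest(p) for p in points]


def pick_representatives(features, k, seed=0):
    """features maps gene name -> feature vector; returns one gene per non-empty cluster."""
    genes = sorted(features)
    points = [features[g] for g in genes]
    centroids, assignment = kmeans(points, k, seed=seed)
    chosen = []
    for c, centroid in enumerate(centroids):
        members = [g for g, a in zip(genes, assignment) if a == c]
        if members:
            chosen.append(min(members, key=lambda g: math.dist(features[g], centroid)))
    return chosen
```

Calling `pick_representatives(features, 10)` on the filtered gene set would return up to ten genes, one per cluster. Clusters spread the choices across feature space (diversity), while taking the member nearest each centre keeps every choice typical of its group (representativeness).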


This experiment had two main goals:
  1. Confirm that fusion linking with an essential gene likely prolongs a target gene’s evolutionary half-life, in comparison to the target gene alone (as the negative control for stability).
  2. Demonstrate that different essential genes provide varying stability levels.

Each well was measured for fluorescence and OD every day for 180 generations, using a plate reader.
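
One way to turn such daily readings into an evolutionary half-life is sketched below in Python. It assumes that the OD-normalized fluorescence decays roughly exponentially with the number of generations, which is a modelling assumption and not a statement about how the measurements were actually processed. The function name and inputs are hypothetical.

```python
import math


def half_life(generations, fluorescence_per_od):
    """Least-squares fit of ln(signal) = a - rate * generation.

    Returns ln(2) / rate, i.e. the number of generations after which
    the fitted signal drops to half of its starting value.
    """
    logs = [math.log(v) for v in fluorescence_per_od]
    n = len(generations)
    mean_x = sum(generations) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in generations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(generations, logs))
    slope = sxy / sxx
    if slope >= 0:
        return math.inf  # no measurable decay in this window
    return math.log(2) / -slope
```

Comparing the fitted half-life of each linked construct with that of the unlinked control gives a single number per construct, which makes the ten constructs directly comparable.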


Model
The sTAUbility Enhancer is our main software product. According to the paradigm presented in the 'Idea' section, this software recommends which conjugated gene and linker should be used, and provides a final sequence, optimized for stability and expression levels. For this purpose, it utilizes the following models:

Stability Predictor
This model aims to estimate which conjugated gene will most stabilize the desired construct, by predicting the co-stability levels collected from large-scale experiments.


For this purpose, we generated ~2000 features according to various bioinformatic and biophysical models [3-5] and chose the most significant ones. Our random-forest model used these features and predicted the most fluorescent genes (= most stable) in the yeast library. The genes with the highest score are likely to best stabilize the target construct and will be used subsequently.

Fusion Linker
When two genes are fused, the resulting proteins can misfold because of interactions with the linker and with each other, significantly reducing expression levels and functionality. Therefore, after finding the optimal conjugated gene, our linker module approximates the change in the folding of the target and conjugated proteins by calculating their relative disorder profile [6]. Our model then predicts which linker is likely to cause the least change in the folding of the conjugated proteins, ensuring their function and preserving expression levels.
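
A minimal Python sketch of this comparison is given below. It assumes that per-residue disorder scores (for example, from a predictor such as IUPred2A [6]) are already available for each protein alone and within each candidate fusion, and it measures the "change of folding" as the mean absolute difference between the two profiles. Both the metric and the helper names are illustrative assumptions, not the linker module itself.

```python
def profile_change(alone, in_fusion):
    """Mean absolute per-residue difference between two disorder profiles."""
    if len(alone) != len(in_fusion):
        raise ValueError("profiles must cover the same residues")
    return sum(abs(a - b) for a, b in zip(alone, in_fusion)) / len(alone)


def best_linker(target_alone, partner_alone, fused_profiles):
    """Pick the linker whose fusion perturbs both proteins the least.

    fused_profiles maps a linker name to a pair:
    (target profile inside the fusion, partner profile inside the fusion).
    """
    def total_change(linker):
        target_fused, partner_fused = fused_profiles[linker]
        return (profile_change(target_alone, target_fused)
                + profile_change(partner_alone, partner_fused))

    return min(fused_profiles, key=total_change)
```

The same scoring could be weighted towards the target protein if its function matters more than that of the conjugated gene; the equal weighting here is only the simplest choice.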



Optimization
Our final model utilizes the DNA Chisel framework [7] and optimizes the combined construct to avoid mutational hotspots (detected using EFM principles; see 'EFM Optimizer' section), while also accounting for weak mRNA folding at the start of the coding sequence, GC content, and codon usage bias. These factors influence the stability and expression levels of the combined construct.

The EFM Optimizer Software
The original EFM Calculator [8-9] was created by the 2015 Texas iGEM Team; it is a web tool that takes a sequence as input and finds mutational hotspots – Simple Sequence Repeats (SSR) and Recombination-Mediated Deletions (RMD). By carefully editing these sites, a user can significantly improve a sequence’s genetic stability.


RMD sites [9].

SSR sites [9].
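
To make the two hotspot types concrete, the following Python sketch scans a DNA sequence for tandem repeats of short units (SSR-like sites) and for longer exact repeats that occur more than once (candidate RMD sites). The thresholds (`max_unit`, `min_repeats`, `min_len`) are illustrative defaults chosen for this sketch and are not claimed to match the EFM Calculator's settings.

```python
import re


def find_ssrs(seq, max_unit=6, min_repeats=3):
    """Return (start, unit, repeat_count) for tandem repeats of short units."""
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "ATAT"), since the shorter unit already reports them.
            if unit in (unit + unit)[1:-1]:
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


def find_long_repeats(seq, min_len=16):
    """Return {subsequence: [positions]} for exact repeats of length min_len."""
    seq = seq.upper()
    positions = {}
    for i in range(len(seq) - min_len + 1):
        positions.setdefault(seq[i:i + min_len], []).append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}
```

For example, `find_ssrs("GGATATATATCC")` reports the `AT` unit repeated four times starting at position 2. Editing such sites with synonymous codons is the kind of manual correction that an optimizer can automate.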


Our EFM Optimizer utilizes these and other principles to provide an intuitive product, easily embedded into common workflows. The improvements compared to the original tool include:
  1. Analyzing multiple sequences, thus significantly saving time and enabling better tracking of the results.
  2. Taking methylation sites into account, in order to improve applications for mammalian and insect cells. Methylation sites are significant epigenetic inheritance hotspots which increase DNA folding and reduce expression levels.
  3. By utilizing the optimization framework described, we provide the users with final sequences, optimized for the avoidance of these hotspots, GC content, and codon usage bias. These factors all influence stability and expression levels in the output sequence.
  4. Lastly, our EFM Optimizer is standalone software rather than a web tool, which enables easier long-term product upkeep and better version control for our users.
In summary, by using our EFM Optimizer you avoid time-consuming manual corrections that could lead to non-optimal results, and are provided with an end-to-end solution at the click of a button.
The sTAUbility Enhancer Software
The sTAUbility Enhancer is a tool created for designing embeddings of target genes into host genomes to enhance evolutionary stability and expression levels. Due to the variety of linkage options and given that each conjugated gene will provide different degrees of stability when attached to a specific target gene, we offer our software as a way to communicate our solution.





The sTAUbility Enhancer is a three-phase tool. By combining all the developed models, it allows the user to:
  1. Choose the best conjugated gene to increase your target gene's stability.
  2. Select the best linker for your construct's purpose, considering protein folding in the case of fusion linkers.
  3. Optimize the combined construct for efficient translation and increased stability.

We created a robust, versatile GUI Framework that is designed according to the following guidelines:
  1. Providing useful, empirically-based outputs for our users.
  2. Being as intuitive and user-friendly as possible.
  3. Interfacing easily with workflows. This includes using standard data formats and providing outputs that can be used as-is, without further manipulation.
  4. Utilizing existing, established platforms, in order to facilitate higher confidence in our product.
Results
POC experiment

Change in fluorescence (% of original fluorescence) after 180 generations, relative to day one. The green bar shows the fluorescence change for a target gene introduced into the host genome without being linked to an essential host gene (control); the blue bars show the fluorescence change for target genes linked to various essential host genes. All but one of these show significant improvements in fluorescence compared to the non-linked control.


In the POC experiment’s results, we display the fluorescence readings after 180 generations, relative to the experiment’s initial fluorescence.
Our POC experiment’s results demonstrate two properties:
  1. The target-essential gene linkage significantly prolongs the target gene’s evolutionary half-life in the vast majority of cases.
  2. Each essential gene affects the evolutionary half-life differently, emphasizing the need for careful selection of the host gene to be included in the combined construct, which our software can provide.

Stability Prediction: sTAUbility Enhancer software
To prove the stability predictor’s accuracy, we ran our sTAUbility Enhancer software with GFP as a target gene and ranked all of the yeast's ORFs according to their predicted co-stability. We then tested whether there is a correlation between the predicted ranking and the empirical ranking of the nine validated genes analyzed in our POC experiment. The dot-plot reflects the finding that there is a significant correlation between the software’s prediction and the empirical results.
This result demonstrates our software’s validity and accuracy in predicting which essential gene is most capable of prolonging a target gene’s evolutionary half-life.

Correlation between empirical data and predictions made by our sTAUbility Enhancer software.
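
The poster does not state which correlation coefficient was used. Since two rankings are being compared, a rank correlation such as Spearman's is one natural choice, and the Python sketch below computes it from scratch under that assumption. The function names are hypothetical.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)
```

Here `x` would hold the predicted co-stability scores and `y` the measured fluorescence preservation for the same genes, in the same order. A value close to 1 means the software orders the genes almost exactly as the experiment does.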



Moreover, in 100% of the tests, our predictions included at least 3 of the 10 genes that were truly the best!
Further data collected by our Gene-SEQ experiment is expected to substantially improve the software’s performance in the future.
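
A top-10 hit count of this kind can be computed as in the Python sketch below. It assumes that "best" means the highest score and that both the predictions and the measurements are available as dictionaries keyed by gene name. The helper name and inputs are hypothetical.

```python
def top_k_overlap(predicted_scores, measured_scores, k=10):
    """Number of genes shared by the predicted top k and the measured top k."""
    top_predicted = set(sorted(predicted_scores, key=predicted_scores.get, reverse=True)[:k])
    top_measured = set(sorted(measured_scores, key=measured_scores.get, reverse=True)[:k])
    return len(top_predicted & top_measured)
```

Repeating this count over several held-out test sets and checking that it never falls below 3 is one way to express the statement above as a reproducible check.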



EFM optimizer

To prove the efficiency and robustness of our EFM optimizer, we analyzed several highly conserved proteins in "ConSurf" - a bioinformatic tool that assigns a conservation score for each amino acid in a given protein.
We then compared the average conservation score of all residues in these proteins against the average score only in areas indicated by our software as genetically unstable.
We hypothesized that the areas marked by the EFM Optimizer will have a lower conservation score, as they are genetically unstable.

Average amino-acid conservation score of the entire protein compared to the average conservation score of areas predicted by the EFM to be less genetically stable.

Our results demonstrate that our hypothesis was correct; regions marked as unstable by the EFM Optimizer are significantly less conserved (p-value of 0.0003) than the rest of the protein.
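
The statistical test behind this p-value is not stated on the poster. As an illustration of the comparison only, the Python sketch below computes the two averages for each protein and then applies an exact paired sign-flip test to the per-protein differences. It assumes that a higher score means stronger conservation; the choice of test and the helper names are assumptions.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


def region_vs_whole(scores, flagged_positions):
    """Return (whole-protein mean, mean over flagged positions) for one protein."""
    return mean(scores), mean([scores[i] for i in flagged_positions])


def sign_flip_p_value(differences):
    """Exact one-sided paired test on per-protein differences.

    Each difference is (whole-protein mean - flagged-region mean). The p-value
    is the fraction of sign assignments whose summed difference is at least
    as large as the observed one.
    """
    observed = sum(differences)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(differences)):
        total += 1
        if sum(s * d for s, d in zip(signs, differences)) >= observed:
            count += 1
    return count / total
```

The exact enumeration grows as 2^n with the number of proteins, so it suits only a modest protein set; for larger sets, random sign flips would be the usual substitute.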
Human Practices
Our Human Practices work is our journey to explore the impact and implementation of our solution. We designed a unique workflow called "ICE-PCR", which is composed of 6 planes:



  1. Industry - We needed to understand both the need for our product and how to refine it for the needs of users. For this purpose, we had several meetings with some of the leading pharmaceutical & biotechnology companies in Israel and around the world. For example, we met with Lonza, a Swiss multinational chemicals and biotechnology company. Following our meeting, Lonza gave us professional support, help in designing a business plan, and $10K for further development.
  2. Consult – We refined every component of our product, again and again, using the help of many academic consultants who graciously agreed to aid us.
  3. Entrepreneurship - We submitted a patent related to our idea and designed an initial business plan with the support of experts from Lonza.
  4. Public engagement - We held several lectures about synthetic biology for various audiences and also hosted a workshop about genomic stability at the German iGEMer’s meetup. Furthermore, we aided Prof. Tuller’s lab in their research, and are currently engaging with other labs that have expressed interest.
  5. Collaborations - Using our software and the knowledge we have accumulated, we helped other iGEM teams improve their projects. In addition, we collaborated with the other Israeli teams and established “The Israeli Academic Inclusivity Video Project Collaboration”, aiming to promote scientific aspirations in underrepresented groups in Israel.

  6. Responsible research - Inspired by the work of previous iGEM teams, we utilized the AREA framework and simplified it, making it easier to implement. This way, we ensured that each step of our research was done safely and responsibly.

The AREA framework.




Both the ICE-PCR and AREA frameworks connect to our original inspiration - evolution. Responsible research and innovation integrate into each of the 6 planes we defined, helping us to constantly evolve - as individuals, as a team, and as a society. For instance, consultations with experts led to a solution better tailored to the user’s needs. In addition, comprehensive public engagement will help us reach many segments of society, increasing awareness of synthetic biology and equality in research.
Achievements
During this year, we had numerous achievements. To name a few, we:
  • Designed an innovative model for predicting genetic co-stability, using state-of-the-art bioinformatic tools and empirical data gathered from large-scale genomic experiments.

  • Created two software tools. First, we hope that the sTAUbility Enhancer will revolutionize the field of synthetic biology, and allow us as a community to dream bigger. Second, the EFM Optimizer has already proven to be useful for labs and other iGEM teams and will help many more in the future.

  • Developed a novel technique for large-scale measurement of genetic stability.

  • Provided a Proof of Concept for our solution.

  • Initiated a movement in the Israeli science community aiming for a more inclusive tomorrow.

  • Built a mutually beneficial partnership with a major biotechnology company that will help our vision come to life.

  • Had a lot of fun, learned a great deal, and joined an amazing community!
Future directions

  • Additional linkers and plasmid integration – In future versions of our software we intend to offer more linkage options, such as a pseudoknot with a left-over that fits the C-terminus of the target protein, or a "super linker": a signal peptide combined with a pseudoknot, allowing for the secretion of the target protein and for the cleavage of the linker in the process.
    Moreover, in a future version of our product, we would like to address the integration of plasmids into a host organism.

  • Empirical validation of the software, by conducting evolution experiments and testing different target genes and their proposed candidates for increased stability.

  • Further analysis of the Gene-SEQ sequencing results, to quantify the stability and fitness of each variant, and to understand the difference between essential host genes and other genes in terms of their co-stability performance.

  • Implementation of our solution in other organisms. This process requires collecting additional data, but the methods are the same as those already applied.

  • Broaden our research regarding descriptive features that affect the stability of the organism's genes, and add new features that can improve our stability predictor's performance.

  • Continue our collaboration with Lonza, according to the business plan we designed.

  • Publish a paper that summarizes our project and its influence on synthetic biology research.

Acknowledgements and Sponsors
Acknowledgements
Prof. Tamir Tuller
Dan Alon
Adi Yannai
Prof. Maya Schuldiner
Alessio Marcozzi, Cyclomics company
Lonza Pharma & Biotech Group
Edinburgh Genome Foundry
Shaked Bergman
Michael Peeri
Ben Oz
Barrick's lab
Prof. Martin Kupiec
Prof. Uri Gophna
Prof. Itai Benhar
Daniel Sionov
Rozi Buber
Idan Melchior
Dr. Keren Raiter

References

[1]         I. Yofe et al., “One library to make them all: Streamlining the creation of yeast libraries via a SWAp-Tag strategy,” Nat. Methods, vol. 13, no. 4, pp. 371–378, 2016.

[2]         J. M. Cherry et al., “Saccharomyces Genome Database: the genomics resource of budding yeast,” Nucleic Acids Res., vol. 40, no. D1, pp. D700–D705, Jan. 2012.

[3]         T. Tuller et al., “An Evolutionarily Conserved Mechanism for Controlling the Efficiency of Protein Translation,” Cell, vol. 141, no. 2, pp. 344–354, 2010.

[4]         M. Usaj et al., “TheCellMap.org: A Web-Accessible Database for Visualizing and Mining the Global Yeast Genetic Interaction Network,” G3 Genes|Genomes|Genetics, vol. 7, no. 5, pp. 1539–1549, May 2017.

[5]         R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison, Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, 1998.

[6]         B. Mészáros, G. Erdős, and Z. Dosztányi, “IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding,” Nucleic Acids Res., vol. 46, no. W1, pp. W329–W337, Jul. 2018.

[7]         V. Zulkower and S. Rosser, “DNA Chisel, a versatile sequence optimizer,” bioRxiv, p. 2019.12.16.877480, Jan. 2019.

[8]         B. R. Jack et al., “Predicting the Genetic Stability of Engineered DNA Sequences with the EFM Calculator,” ACS Synth. Biol., vol. 4, no. 8, pp. 939–943, Aug. 2015.

[9]         B. A. Renda, M. J. Hammerling, and J. E. Barrick, “Engineering reduced evolutionary potential for synthetic biology,” Mol. Biosyst., vol. 10, no. 7, pp. 1668–1678, 2014.

 

Sponsors