As iGEMers, we know that every Synthetic Biologist wishes to create better, safer and more stable genetic constructs. Our answer? Genetic Entanglement!
With our project HuGenesS we wish to introduce a technology that allows the entanglement of two genes of choice, so that a single stretch of DNA will encode two proteins with overlapping coding sequences, but in different reading frames. We believe that this project could ignite interest in this technology in future iGEM teams and aid in multiple areas of Synthetic Biology, such as improved genetic construct design, improved vector design and biocontainment.
Overlapping genes and their potential uses in Synthetic Biology
Gene overlap is a phenomenon in which two proteins are encoded by the same stretch of DNA. The genes can run in opposite directions, i.e. on opposite strands of the same DNA segment. Alternatively, the overlap can be unidirectional: the two genes lie on the same strand and typically code for two different proteins in different reading frames.
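To make the idea of reading frames concrete, here is a minimal Python sketch that translates one short DNA stretch in two frames of the same strand, and also on the opposite strand. The toy sequence and the helper names (translate, reverse_complement) are our own illustrative assumptions; they are not taken from CAMEOS or from any of our constructs, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate every complete codon of `dna`, starting at offset `frame`."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(dna.upper()))


toy = "ATGGCCAAATGA"                        # hypothetical 12-nt stretch
print(translate(toy, 0))                    # same strand, frame 0
print(translate(toy, 1))                    # same strand, shifted by one base
print(translate(reverse_complement(toy)))   # opposite strand
```

The same twelve nucleotides give three unrelated amino-acid strings, which is exactly what makes it possible, in principle, to pack two coding sequences into one stretch of DNA.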
Such genes occur in nature in all domains of life, and their existence has been known for decades. Most of the known overlapping genes come from viruses, which served as an inspiration for our project. The first ideas for our project this year centered on phagotherapy, namely adding new genes to small filamentous phages to make them suitable for combating antibiotic-resistant bacteria. However, we quickly hit an obstacle: the maximum possible size of a stable viral particle was too small to accommodate the genes we wished to add. Our search for solutions led us to an old paper on the φX174 bacteriophage, documenting the first discovery of two pairs of unidirectional overlapping genes (Sanger et al., 1977).
We realised that if we could generate such genes ourselves, the potential applications would go far beyond fitting more information into small vectors. For example, gene entanglement could be used to stabilise a gene of interest, or for the biocontainment of a gene and/or a genetically engineered microorganism (GEM) (Blazejewski et al., 2019).
From our dive into the field of phagotherapy, we knew that one of the major problems genetically engineered microorganisms (GEMs) face is the high mutation rate of introduced genes. However, by entangling a gene of interest with a gene essential for multiplication or survival, the rate at which mutations accumulate in the gene of interest can be reduced. A mutation occurring in the overlapping region of the gene of interest automatically changes the sequence of the essential gene too. Since many such mutations would have a deleterious impact on the function of the essential gene, mutant versions will be eliminated and the integrity of the gene of interest in the population will be maintained.
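The coupling between the two genes can be illustrated with a small Python sketch. It enumerates every single-base substitution in a toy overlapping stretch and counts which of the two encoded amino-acid sequences it alters. The sequence, the frame offsets and the function name classify_substitutions are hypothetical choices made for illustration only; this is not how CAMEOS scores sequences, and it says nothing about whether a given amino-acid change is actually deleterious.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame):
    """Translate every complete codon of `dna`, starting at offset `frame`."""
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(frame, len(dna) - 2, 3))


def classify_substitutions(dna, frame_a=0, frame_b=1):
    """Count single-base substitutions by which encoded protein(s) they change.

    Bases at the very ends may be covered by only one frame, so a few
    substitutions can never affect both proteins in this toy setting.
    """
    ref_a, ref_b = translate(dna, frame_a), translate(dna, frame_b)
    labels = {(False, False): "neither", (True, False): "A only",
              (False, True): "B only", (True, True): "both"}
    counts = {label: 0 for label in labels.values()}
    for pos, old in enumerate(dna):
        for new in BASES:
            if new == old:
                continue
            mutant = dna[:pos] + new + dna[pos + 1:]
            hits_a = translate(mutant, frame_a) != ref_a
            hits_b = translate(mutant, frame_b) != ref_b
            counts[labels[(hits_a, hits_b)]] += 1
    return counts


toy = "ATGGCTAAAGGTCTGACC"  # hypothetical overlapping stretch
counts = classify_substitutions(toy)
print(counts)
```

If gene A is the gene of interest and gene B the essential gene, the "A only" count is the number of substitutions that could alter A without touching B. The smaller that count is relative to "both", the more strongly selection on B also protects A.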
Genetic entanglement could also prove useful for GEM confinement and for preventing horizontal gene transfer. For this application, one could interweave a gene of interest with a gene encoding the toxin of a toxin-antitoxin system. Ensuring that the antitoxin is only produced in a particular growth medium would prevent the GMO chassis from surviving and multiplying outside the laboratory or industrial setting. Moreover, if only the synthetic chassis produces the antitoxin, other organisms would be prevented from acquiring the gene of interest through horizontal gene transfer. This would be a step up from simply having the gene of interest and the toxin on the same vector, as it eliminates the chance of them being separated, for example by recombination.
Consideration of all the exciting things we could do by entangling pairs of genes, as well as a recent scientific development revealed by a search of the literature, made us officially choose Genetic Entanglement as the focus of our project - HuGenesS (Hug Genes Saclay). The genes are interlaced, as if in a loving hug!
Understanding and improving the CAMEOS software
The scientific development that made our project possible was the recent publication of CAMEOS (Constraining Adaptive Mutations using Engineered Overlapping Sequences) (Blazejewski et al., 2019), an algorithm that can artificially entangle pairs of protein-coding genes into a single sequence.
There are several necessary steps in the pipeline of creating entangled genes. Let us imagine we want to entangle two coding sequences, A and B:
1) First, protein Multiple Sequence Alignments (MSAs) of each of the two gene products, A and B, with their homologs should be compiled. This provides the algorithm with the "digital fingerprint" of each protein family: information about which features give each protein its identity and which regions are most important to conserve. The homologs of the two genes that would necessitate the least significant changes to either sequence upon entanglement are chosen.
2) Second, the CAMEOS algorithm is run, adjusting the sequences of both genes in a way that both minimises substitutions and indels in their encoded amino acid sequences and least perturbs long-range interactions within the proteins.
3) Third, a number of variants of the entangled sequence are generated. These then have to be analysed manually to choose the most promising ones: those that contain both coding sequences and have the fewest serious modifications to either amino-acid sequence.
The algorithm is mainly based on two mathematical tools: Hidden Markov Models, used to construct the “digital fingerprint” and to generate and assess entangled sequences, and Markov Random Fields, used to improve the entangled sequences by taking into account long-range interactions in the protein tertiary structure. More on the maths behind CAMEOS, as well as suggested improvements to the selection algorithm, can be found in our Modelling section.
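As a much simpler stand-in for the profile models mentioned above, the Python sketch below derives per-column residue frequencies and Shannon entropy from a toy alignment. This is not a Hidden Markov Model and not the CAMEOS implementation. It only illustrates how an MSA reveals which positions are conserved and which tolerate change. The alignment and the function names are hypothetical.

```python
import math
from collections import Counter


def column_profiles(msa):
    """Per-column residue frequencies for equal-length aligned sequences.

    Gap characters, if present, are simply counted as another symbol here.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profiles = []
    for column in zip(*msa):
        counts = Counter(column)
        profiles.append({res: n / len(column) for res, n in counts.items()})
    return profiles


def column_entropy(profile):
    """Shannon entropy in bits; 0 means the column is fully conserved."""
    return -sum(p * math.log2(p) for p in profile.values())


toy_msa = ["MKVLA", "MKILA", "MRVLG", "MKVLS"]  # hypothetical homologs
profiles = column_profiles(toy_msa)
entropies = [column_entropy(p) for p in profiles]
for position, h in enumerate(entropies):
    print(position, round(h, 3))
```

Low-entropy columns are the ones an entanglement would ideally leave untouched, while high-entropy columns offer more freedom to accommodate the second gene.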
Getting accustomed to the CAMEOS software was no easy task, but in the end we generated over 12 different kinds of sequence entanglements using the pipeline described above, which can be seen in our Engineering section. We also aimed to improve the pipeline by creating our own Software: a script to streamline the generation of CAMEOS inputs from the MSA files for the two genes, and two scripts for result analysis, which replace the native criteria for choosing optimal entangled sequences with a less arbitrary method based on Pareto optimization. To create an easier and more user-friendly experience for future iGEM teams, we produced a tutorial on using the CAMEOS software, the CAMEOS Course.
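The idea behind Pareto-based selection can be sketched in a few lines of Python. The example below is not our actual analysis script. The candidate names, the two scores (one per encoded protein) and the convention that higher is better are all assumptions made for illustration. A variant is kept if no other variant is at least as good on both scores and strictly better on one.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    score_a: float  # hypothetical quality score for protein A (higher = better)
    score_b: float  # hypothetical quality score for protein B (higher = better)


def dominates(p, q):
    """True if p is at least as good as q on both scores and better on one."""
    return (p.score_a >= q.score_a and p.score_b >= q.score_b
            and (p.score_a > q.score_a or p.score_b > q.score_b))


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


variants = [
    Candidate("v1", 0.90, 0.20),
    Candidate("v2", 0.60, 0.60),
    Candidate("v3", 0.50, 0.50),
    Candidate("v4", 0.20, 0.95),
    Candidate("v5", 0.60, 0.40),
]
front = pareto_front(variants)
print([c.name for c in front])
```

Unlike a fixed threshold or a weighted sum, this selection needs no arbitrary trade-off between the two proteins: it simply discards variants that are worse on both counts than some alternative.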
Gene Entanglement - proof of concept
As a proof of concept, we chose to entangle sequences that should produce easily observable phenotypes to test in the lab. After going through the CAMEOS pipeline, we generated 6 HuGenesS in silico, predicted to encode (1) fluorescent proteins and antibiotic resistance, (2) antibiotic resistance and luciferase, an enzyme that produces light, or (3) luciferase and CcdB, a toxin that is lethal for bacteria that do not express the antitoxin. The entangled HuGenesS were further optimized for cloning; a more detailed account can be found in our Project Design section. We succeeded in cloning one sequence combining kanamycin nucleotidyltransferase (Knt; aminoglycoside antibiotic resistance) and green fluorescent protein (GFP): our BioBrick knt-gfp (BBa_K3427001). Unfortunately, the encoded proteins did not function as we had hoped, and further experiments are required to troubleshoot and improve our entangled sequence design. Our lab work did, however, yield one BioBrick success: our search for potential reporter proteins to entangle led us to successfully characterize a novel thermostable flavin-based fluorescent protein, stable_YFP_LOV (BBa_K3427000), as can be seen in our Results.
Our engagement with other iGEM teams, the scientific community and the general public
We appreciate that the iGEM competition is not a solitary effort. We wanted to define how Gene Entanglement could benefit future iGEMers, fit into the field of Synthetic Biology and be applied in practice. We were eager to collaborate with other iGEM teams and to seek the opinions of experts, as well as of the community.
On our exciting iGEM journey, GO Paris-Saclay participated in global meetups, such as iGEMeetParis and the meetup focused on Phagotherapy hosted by team Marburg, and collaborated with 11 French and Swiss teams on a video, “iGEM FR”, presenting our teams, projects and regions. We also conducted several insightful talks and interviews with researchers in various fields of Synthetic Biology, including Maher BEN KHALED (Toulouse Biotechnology Institute, INRAE), who works on plastic biodegradation; Dr Tristan Rossignol, a work package leader in the European project CHASSY (Model-Based Construction and Optimisation of Versatile Chassis Yeast Strains for Production of Valuable Lipid and Aromatic Compounds), at whose request we produced several entangled gene sequences for their model organism Yarrowia lipolytica; and Dr Tomasz Blazejewski and Dr Harris H. Wang, two of the co-authors of the CAMEOS article, who helped us better understand their software and its applications.
We considered the practical implementation of our project by researching French laws on GMOs and analysing the current market offer with a view to potential commercialization.
We also realise that as an iGEM team, one of our main goals is to make Synthetic Biology more accessible to the general public. This is why our team held interactive science workshops at the Apériscience exhibition following the "DNA superstar or superflic" conference (presented by Catherine Bourgain) at the Massy Library and at the Fête de la Science at the Cité des Sciences et de l'Industrie.
More information on our involvement with the iGEM, scientific and general community can be found in our Human Practices section.
Sanger, F., Air, G.M., Barrell, B.G., et al. (1977). Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265: 687–695. doi:10.1038/265687a0
Blazejewski, T., Ho, H.I., and Wang, H.H. (2019). Synthetic sequence entanglement augments stability and containment of genetic information in cells. Science 365: 595–598.