Poster: Heidelberg
The Legend of Cellda – A Link Between Proteins
Team Members
Lucas Arnoldt, Jannik Berger, Elizaveta Bobkova, Fabian Bradic, Elizaveta Chernova, Wangjun Hu,
Julia Köberle, Maria Kuchina, Julia Langer, Angela Maidhof, Lukas Voos, Mario Wisbar
Supervisors
Michael Jendrusch, Christian Kobrow, Lukas Bange, Prof. Dr. Stefan Wölfl
Abstract
DNA, RNA and proteins are the fundamental building blocks of synthetic biology and life itself. Of
them all, fusion proteins, which are often very large and inflexible, take the spotlight, allowing for
a modular approach to protein engineering. To mediate interactions between all three core components
of life in a more flexible way, we aim to harness the power and versatility of RNAs. We apply machine
learning to design flexible RNA modules and binding proteins for assembling complexes of
protein domains, RNA and DNA in a sequence-specific manner. In addition, we engineer
trans-splicing ribozymes to reconfigure our RNAs and proteins at the transcript level. We also
apply molecular modelling to provide tools for designing specific RNA-binding proteins. Taken
together, we provide a controllable, compact, and dynamic alternative to fusion proteins with
applications in drug delivery, expanding the toolbox of synthetic biology.
Introduction - Our Inspiration and Design of The Legend of Cellda
Living organisms are made of intricate regulatory pathways that hold the power to replicate,
synthesize and break down molecules and react to external conditions. The most common type of
regulation in such systems is the control of transcription at the DNA level, which is exerted through
protein transcription factors. However, when looking at the central dogma of molecular biology,
we saw many more cellular gateways that we thought might be just as well suited as points of control.
How can one achieve this? We thought that we could extend the design capabilities of molecular
biologists beyond proteins and create RNA-associated parts that would allow control of RNA-mediated
transcription, RNA- and protein-mediated translation and finally even
post-translational RNA-protein interaction.
To find parts that would pave the way into these new domains of cellular regulation, we harnessed
the power of RNA binding proteins (RBPs). They allow us to create RNA-linked fusion
proteins and thus open up an opportunity to regulate proteins post-translationally, creating
highly interchangeable fusion proteins. Furthermore, we used them for protein-mediated RNA manipulation.
We thought of ways to regulate transcription and translation by designing trans-splicing
ribozymes and RNA-DNA-DNA triple helices.
To support our efforts in the wetlab, we developed PRISM, a language model that allows the
creation of RBPs, and 3DOC, a software tool that allows modelling of fusion proteins. For the
complex construction of RNA, we created CoNCoRDe, which is based on our RNA combinatorics
modelling and facilitates RNA design.
We present The Legend of Cellda – A Link Between Proteins, a new look at cellular regulation.
Protein-RNA Linking
By using RNA to tether together protein subunits, post-translational assembly of fusion
proteins becomes possible. Large proteins, like dCas-transcription factor complexes, could in
this way be assembled after the independent translation of the protein subunits, allowing for
highly interchangeable and space-saving regulatory systems (J. Lian et al., 2017). Furthermore, the
delivery of large protein complexes through size-limited viral vectors may become possible,
because such proteins could simply be split, distributed among different vectors and then be
assembled in the targeted cell.
To validate this idea, we tested naturally occurring viral RNA binding proteins (RBPs)
such as the Antitermination Protein N from the bacteriophage lambda (lambda N) (C. R. Conant et
al., 2008) and the MS2 Coat protein from the bacteriophage MS2 (MCP) (D. Peabody, 1993). They are
relatively small and can bind to a specific, short RNA motif with high association strength.
For our proof of concept we designed a simple reporter assay (C.G. Wilson et al., 2004), in which
two domains of a split GFP are brought into close proximity by linking them with RNA. We tethered
MCP to the N-terminal domain of GFP (NGFP) and lambda N to the C-terminal domain of GFP (CGFP) with a
flexible GS-Linker to ensure correct protein folding and activity. Then, we designed
multiple RNA linkers with a variable sequence which contained the RNA binding motifs of MCP and
lambda N. The designed sequences were flanked with 5’ and 3’ Hammerhead-Ribozymes to ensure optimal
processing and thus folding of the RNA Linkers.
We hypothesized that by co-transforming a cell with these constructs, the two halves of split GFP could assemble via an RNA linker and the cell would fluoresce green in proportion to the assembly strength of the subunits. This assay is not only a proof of concept, but could also be used to characterize RBPs or RNA Linkers and test their potential. For example, we also designed this assay for versions of the Pumilio homology domains.
Cloning with the Marburg Collection in E. coli
For our cloning we were able to obtain a physical copy of the Marburg Collection from the
iGEM Marburg team 2018. We successfully showed how it can be implemented in E. coli, adapted some
of the protocols and created a short guide with tips and our experiences that we collected
while working with the cloning method. Future iGEM teams can learn the following from us:
- In Level 0 cloning, only small amounts (7.5 ng) of the entry vector should be used, as larger amounts lead to plates containing an overwhelming number of colonies that contain just the entry vector.
- Educt plasmids should be added in equal molar amounts when preparing the Level 1 and 2 cloning. The resistance gene should always be added at one tenth of the molar concentration of the other components, as larger amounts lead to plates containing an overwhelming number of colonies that contain the resistance in some combination with other educt plasmids.
- We adapted the PCR protocols to a high temperature ligase – NEB Hi-T4 ligase – and theorize that it may be worthwhile to work with a ligase of lower activity to possibly gain better cloning efficiency.
- E. coli do not have to be lysed prior to colony PCRs.
- In Level 1 and 2 Cloning many colonies should be picked and sequenced (at least 5 per construct). In Level 0 Cloning picking 3 is usually sufficient.
- Working with this collection really felt like using a library of biobricks. It was great to see how much one could profit from the work of other iGEM teams, and we hope that we were able to add a bit ourselves.
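The molar ratios recommended in the tips above can be turned into pipetting masses with a short calculation. The following is a minimal sketch and not part of our protocols: it assumes the rule-of-thumb value of 650 g/mol per base pair of double-stranded DNA, and the plasmid names, plasmid sizes and the 20 fmol target are hypothetical illustration values.

```python
# Rule-of-thumb average molar mass of one base pair of double-stranded DNA (g/mol).
# This is an assumption for illustration, not a value taken from our protocols.
GRAMS_PER_MOL_PER_BP = 650.0


def ng_for_fmol(length_bp: int, fmol: float) -> float:
    """Mass in ng that contains `fmol` femtomoles of a plasmid of `length_bp` base pairs."""
    # fmol * (g/mol) gives femtograms; 1 fg = 1e-6 ng.
    return fmol * length_bp * GRAMS_PER_MOL_PER_BP * 1e-6


def reaction_masses(parts_bp: dict, resistance_bp: int, fmol_per_part: float = 20.0) -> dict:
    """Equimolar masses for all parts, and one tenth of that molar amount for the resistance part."""
    masses = {name: ng_for_fmol(bp, fmol_per_part) for name, bp in parts_bp.items()}
    masses["resistance"] = ng_for_fmol(resistance_bp, fmol_per_part / 10)
    return masses


if __name__ == "__main__":
    # Hypothetical plasmid sizes in base pairs.
    parts = {"promoter": 2300, "rbs": 2250, "cds": 3000, "terminator": 2400}
    for name, ng in reaction_masses(parts, resistance_bp=3200).items():
        print(f"{name}: {ng:.1f} ng")
```

Because the masses scale with plasmid length, equal molar amounts generally do not correspond to equal masses, which is easy to overlook at the bench.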
Modular RNA Binding - Pentatricopeptide Repeat and Pumby Proteins
After working with small viral RNA binding proteins (RBPs) we set out to find further RBPs and
came across the pentatricopeptide repeat proteins (PPRs) and pumilio homology domains, or pumby
proteins.
They are made of repeating domains, each of which binds a specific ribonucleobase. By concatenating
these domains, a protein that binds a specific RNA motif can be created (K. P. Adamala et al.,
2016; S. Manna, 2015).
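The modular principle described here, one repeat domain per ribonucleobase, can be illustrated with a minimal sketch. The module identifiers below are placeholders and not real repeat sequences; the actual amino acid code of PPR or pumby repeats, and the orientation in which the repeats read the RNA, have to be taken from the cited literature and are deliberately not modelled here.

```python
# Placeholder identifiers, one per ribonucleobase. These are hypothetical names,
# NOT real repeat sequences; a real design would substitute published modules.
MODULE_FOR_BASE = {
    "A": "module_A",
    "C": "module_C",
    "G": "module_G",
    "U": "module_U",
}


def modules_for_motif(motif: str) -> list:
    """Return one placeholder repeat module per nucleotide of the target RNA motif."""
    motif = motif.upper().replace("T", "U")  # accept DNA-style input as well
    unknown = set(motif) - set(MODULE_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported nucleotides: {sorted(unknown)}")
    return [MODULE_FOR_BASE[base] for base in motif]


if __name__ == "__main__":
    print(modules_for_motif("AUGC"))
```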
We decided to use their ability to target specific RNA motifs to create a control element for
protein expression that is based not, as usual, on the control of DNA transcription, but on
mRNA translation. For this, we designed a simple reporter assay, in which we engineered a PPR
protein to bind the ribosomal binding site (RBS) of the mRNA of an sfGFP reporter. In case of
successful binding we would expect translation of sfGFP to be blocked, decreasing fluorescence
in proportion to the expression strength of the PPR protein.
Further, naturally occurring PPR proteins in chloroplasts are often fused to catalytic
protein subunits that perform actions like RNA splicing or editing (S. Manna, 2015). We wanted to
mimic this by creating a universally targetable endonuclease consisting of a PPR and the
non-specifically cleaving Ribonuclease A. To validate this complex we re-engineered the Broccoli
aptamer (G. S. Filonov et al., 2014) to contain a PPR binding site. In case of successful
binding we would again predict a loss of fluorescence due to the degradation of the aptamer.
These experiments could show the vast customizability and utility of modular RBPs in protein
expression control or functional RNA manipulation.
DNA-RNA Triple Helix
The RNA·DNA-DNA triple helix is a structure formed when an RNA binds to the major
groove of a DNA double helix by Hoogsteen base pairing (Kunkler et al., 2019). Hoogsteen interaction
is a type of non-canonical hydrogen bond configuration between nucleotides (Ghosal & Muniyappa,
2006). Our motivation to use the RNA·DNA-DNA triple helix was its theoretical ability to
take over the DNA-targeting role of a dCas protein. Complexes consisting of dCas, a guide RNA and a
transcription factor are becoming ever more common in transcriptional regulation (Brezgin et al., 2019;
Dong et al., 2018). The use of RNA·DNA-DNA triple helices would make such systems more lightweight
by reducing the size of the gene construct and diminishing the metabolic burden of the complex.
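As a rough illustration of how a triple-helix-forming RNA could be derived from a DNA target, the sketch below assumes the parallel pyrimidine motif, in which the third strand runs parallel to the purine strand of the duplex and forms U·A-T and C·G-C base triplets. Which motif and which sequences suit a given construct is an assumption here; the function name is hypothetical and the example target is invented.

```python
# Third-strand RNA base that forms a Hoogsteen pair with each purine of the
# duplex (parallel pyrimidine motif: U·A-T and C·G-C triplets).
HOOGSTEEN_PARTNER = {"A": "U", "G": "C"}


def third_strand_rna(purine_strand: str) -> str:
    """Return the RNA third strand (5'->3') for a DNA homopurine tract given 5'->3'.

    In the parallel motif assumed here the third strand has the same
    orientation as the purine strand, so no reversal is needed.
    """
    tract = purine_strand.upper()
    invalid = set(tract) - set(HOOGSTEEN_PARTNER)
    if invalid:
        raise ValueError(f"target must be a homopurine tract, found: {sorted(invalid)}")
    return "".join(HOOGSTEEN_PARTNER[base] for base in tract)


if __name__ == "__main__":
    print(third_strand_rna("GAAGGA"))  # invented example target
```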
Figure 1. Schematic representation of the first experiment.
We designed a proof of concept. In this experiment the transcription activator SoxS
is fused to the MS2 coat protein (MCP), an RNA binding protein (Wu & Weiss, 1992; Dong et al., 2018). The
counterpart of MCP, the MS2 hairpin, is connected to the RNA sequence that can form an
RNA·DNA-DNA triple helix. The double-stranded DNA sequence needed for this interaction is placed
upstream of a weak constitutive promoter which regulates the expression of superfolder GFP (sfGFP;
see Figure 1). The hypothesis is that when all three cassettes are expressed simultaneously, the
induction construct assembles. SoxS therefore comes into close proximity to the promoter that drives
sfGFP expression and activates it. We designed multiple assays with different triple helices and
different distances between their binding sites and the promoter (Bikard et al., 2013).
Figure 2. Schematic representation of the second experiment.
Exploring more complicated regulation circuits, we designed an experiment in which the same gene
can be activated and/or repressed (see Figure 2). SoxS (activator) and KRAB (repressor) are fused
to different RNA binding proteins (MCP/LambdaN). The MS2 coat protein hairpin (MS2) is
connected to the triple-helix-forming RNA and expressed under an araC-pBAD promoter. The LambdaN
hairpin (BoxB) is connected to the triple-helix-forming RNA and expressed under a lac promoter. Both
RNA parts of the triple helix have the same sequence, which can bind double-stranded DNA upstream of
a constitutive promoter of medium strength that regulates the expression of superfolder GFP. The
hypothesis is that when L-arabinose is present in the cell, the activation construct assembles
and superfolder GFP expression increases. When allolactose is present in the cell,
the repression construct can assemble and the expression of superfolder GFP decreases.
Split Trans-Splicing Ribozymes
In our quest for total control of RNA interactions, we also wanted to make use of
RNA-RNA interactions. We can find this interaction in nature by looking at ribozymes and thus
decided to focus on a real classic: the self-splicing group I intron from Tetrahymena
thermophila.
To transform this ribozyme into a useful control element we first had to modify it and turn its
self-splicing activity into trans-splicing activity.
The group I intron can be structurally separated into the 5’-exon, a 5’-external guide sequence
(EGS), the complement of the EGS, an internal guide sequence (IGS), the ribozyme core flanked by
hairpin loops and a 3’-exon (Y. Ikawa & S. Matsumura, 2018). Because these functional units do not
always directly depend on each other for activity, we could split the 5’-exon and 5’-external
guide sequence from the rest of the ribozyme, resulting in a split ribozyme. If these two new
split sequences come into proximity, they should form the functional ribozyme and show
trans-splicing activity, fusing the 5’-exon of one sequence to the 3’-exon of the
other sequence.
To validate this system we designed an experiment with two sequences. One expresses a short peptide
with a start codon as the 5’-exon, preceded by a strong ribosomal binding site (RBS) and followed
by the 5’-external guide sequence. The other contains the rest of the ribozyme and a reporter gene
such as a fluorescent protein or a resistance. If trans-splicing occurs, a 3’-exon like mScarlet is
fused to the RBS and the peptide with a start codon and is thus translated. This part can be used
as a customizable, targetable mRNA translation activator.
3DOC
We are introducing the tool 3DOC (3D domain concatenation), which is based on the
mathematical concept of an operad and is capable of fusing 3D structures of proteins. The
pipeline thereby enables the creation of fusion protein-binding domains and fusion proteins. For
instance, it can be used to create pumby and PPR protein sequences and PDB files, which can be linked
modularly to bind specific RNAs (Adamala KP et al., 2016; Coquille S et al., 2014).
3DOC makes use of BLASTp together with PyRosetta or trRosetta to generate a PDB file
for given protein sequences (Chaudhury et al., 2010; Yang J et al., 2020). If
several sequences are entered, the PDB structures are fused or interconnected automatically with the
user-supplied amino acid linker via PyRosetta. Python tools to model protein-RNA complexes are
bundled with 3DOC (Das R et al., 2010; Cheng C et al., 2015; Kappel K and Das R, 2019).
PRISM
With PRISM (Protein-RNA Interaction Sequence Modelling) we
are presenting a language model based on the transformer architecture that is capable of generating an
RNA-binding protein (RBP) sequence for an RNA motif chosen by the user (Vaswani A et al.,
2017).
PRISM was trained with more than 500,000 sequences from the Swiss-Prot database (The UniProt
Consortium et al., 2019). We integrated and mapped several other databases. PRISM was then
fine-tuned on RBPs using the ATtRACT database (Giudice G et al., 2016). We based
our model on the CTRL architecture while adapting the layer normalisation as shown in the
ReZero architecture (Keskar NS et al., 2019; Bachlechner et al., 2020).
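For readers unfamiliar with ReZero, the published idea is to replace layer normalisation around each sublayer with a residual connection scaled by a learnable scalar that starts at zero, so that every block initially acts as the identity. The following is a minimal, framework-free sketch of that update rule only; it is not PRISM's code, and `toy_sublayer` is a hypothetical stand-in for an attention or feed-forward block.

```python
from typing import Callable, List

Vector = List[float]


def rezero_residual(x: Vector, sublayer: Callable[[Vector], Vector], alpha: float) -> Vector:
    """ReZero update: x_next = x + alpha * F(x), with alpha a learnable scalar initialised to 0."""
    fx = sublayer(x)
    return [xi + alpha * fi for xi, fi in zip(x, fx)]


def toy_sublayer(x: Vector) -> Vector:
    """Hypothetical stand-in for a transformer sublayer (attention or feed-forward)."""
    return [2.0 * v + 1.0 for v in x]


if __name__ == "__main__":
    x = [1.0, 2.0]
    print(rezero_residual(x, toy_sublayer, alpha=0.0))  # identity at initialisation
    print(rezero_residual(x, toy_sublayer, alpha=0.5))  # sublayer contributes once alpha grows
```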
The training ran for 16 epochs with a batch size of 128. Fine-tuning ran for
20 epochs with a batch size of 64. The training loss did not decline as much as we had hoped in
the initial training and fine-tuning. The literature shows that machine learning on protein sequences
takes relatively long (Madani et al., 2020). We therefore expect that increasing the training time
could improve the performance of PRISM.
Model
Figure 1. Visual representation of the RNA operad.
RNA secondary structure can be mathematically described as an operad – a concept
from category theory. We defined loops, stems and multiloops as operations. Furthermore, we
defined two ways to compose these operations: sequential and parallel composition (see Figure 1).
Defining operations and composition allows us to describe the rules by which the RNA structure is
formed. To get an actual RNA structure from our operad we have to use a mathematical concept known
as an algebra over the operad. In our case, this means translating our operations into a dot-bracket
structure. This is achieved by writing every nucleotide in a loop as a dot and every nucleotide
in a stem as a bracket. A multiloop can be seen as a combination of loops and stems. Therefore,
every element of a multiloop can be analyzed separately. Consequently, by proceeding recursively,
the dot-bracket structure of the whole RNA can be generated.
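The recursive translation into dot-bracket notation can be sketched in a few lines. This is a simplified illustration under our own assumptions rather than the model's actual implementation: the class names `Loop`, `Stem` and `Multiloop` are hypothetical, a stem encloses exactly one inner structure (sequential composition), and a multiloop places its parts side by side (parallel composition).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Loop:
    size: int  # number of unpaired nucleotides


@dataclass(frozen=True)
class Stem:
    pairs: int  # number of base pairs in the stem
    inner: "Structure"  # structure enclosed by the stem (sequential composition)


@dataclass(frozen=True)
class Multiloop:
    parts: Tuple["Structure", ...]  # elements set side by side (parallel composition)


Structure = Union[Loop, Stem, Multiloop]


def dot_bracket(structure: Structure) -> str:
    """Interpret a structure term as a dot-bracket string, recursing into its parts."""
    if isinstance(structure, Loop):
        return "." * structure.size
    if isinstance(structure, Stem):
        return "(" * structure.pairs + dot_bracket(structure.inner) + ")" * structure.pairs
    if isinstance(structure, Multiloop):
        return "".join(dot_bracket(part) for part in structure.parts)
    raise TypeError(f"not an RNA structure term: {structure!r}")


if __name__ == "__main__":
    hairpin = Stem(3, Loop(4))
    print(dot_bracket(hairpin))  # (((....)))
    branched = Stem(2, Multiloop((Loop(1), hairpin, Loop(1), Stem(3, Loop(3)), Loop(1))))
    print(dot_bracket(branched))  # ((.(((....))).(((...))).))
```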
Figure 2. Schematic representation of the RNA secondary structure generating algorithm.
In a similar way, we defined a resource algebra on RNA structures, which tells us the length of our
RNA. This allowed the definition of a resource coalgebra which, given the
basic elements of an RNA structure and the total length of the RNA, provides the minimal and maximal
lengths of the single elements needed to arrive at a structure of the given length. We used our
resource coalgebra to develop a program that generates secondary structures of RNA of a given length
uniformly at random (see Figure 2). Constraints on the number as well as on the minimal and
maximal sizes of single elements can be applied. This algorithm can be very useful for RNA
structure prediction as well as for generating training data for machine learning, as
we did with our network CoNCoRDe.
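To make uniform sampling at a fixed length concrete, the sketch below uses a standard counting recurrence in place of the resource coalgebra described above: it counts all pseudoknot-free dot-bracket structures of each length and then chooses every decomposition with probability proportional to the number of structures it leads to. The function names and the `min_loop` constraint (minimum number of nucleotides enclosed by a base pair) are assumptions for illustration and do not reproduce our program.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def count_structures(n: int, min_loop: int = 3) -> int:
    """Number of dot-bracket structures of length n whose pairs enclose >= min_loop nucleotides."""
    if n == 0:
        return 1
    total = count_structures(n - 1, min_loop)  # first nucleotide unpaired
    for inner in range(min_loop, n - 1):  # first nucleotide paired, enclosing `inner` nucleotides
        total += count_structures(inner, min_loop) * count_structures(n - 2 - inner, min_loop)
    return total


def sample_structure(n: int, min_loop: int = 3, rng=None) -> str:
    """Draw one structure of length n uniformly at random from all counted structures."""
    rng = rng or random
    if n == 0:
        return ""
    r = rng.randrange(count_structures(n, min_loop))
    unpaired = count_structures(n - 1, min_loop)
    if r < unpaired:
        return "." + sample_structure(n - 1, min_loop, rng)
    r -= unpaired
    for inner in range(min_loop, n - 1):
        rest = n - 2 - inner
        weight = count_structures(inner, min_loop) * count_structures(rest, min_loop)
        if r < weight:
            return ("(" + sample_structure(inner, min_loop, rng) + ")"
                    + sample_structure(rest, min_loop, rng))
        r -= weight
    raise AssertionError("unreachable: weights sum to count_structures(n)")


if __name__ == "__main__":
    print(count_structures(30))
    print(sample_structure(30, rng=random.Random(0)))
```

Weighting each choice by the number of completions is what makes the result uniform; picking the first decision with equal probability would over-represent structures with few alternatives.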
Strict 2-Category of Pseudoknot RNA Structures
Figure 3. The symmetric monoidal strict 2-category of pseudoknot RNA structures from
generators and relations. By virtue of being a strict symmetric monoidal 2-category, it admits
three types of compositions: on 0-cells (natural numbers) the monoidal product (red) given
by addition; in addition, on 1-cells (signatures) sequential composition given by
substitution of signatures (blue); on 2-cells (RNA secondary structures), both preceding modes of
composition together with 2-cell horizontal composition (previously "overlay" composition)
(green).
To extend our operadic model to pseudoknot secondary structures, we left behind operads and
instead chose to work with the more general monoidal categories or, more specifically, strict
symmetric monoidal 2-categories. A monoidal 2-category comes with three kinds of
composable entities – n-cells, for n = 0, 1, 2 – and three kinds of
corresponding laws of composition. Composition of 0-cells is the monoidal product, which corresponds
to setting RNA structures side by side and which we refer to as parallel composition (red).
1-cells correspond to signatures defining which fragments of a given RNA secondary structure
are unpaired and which are paired. 1-cell composition corresponds to substituting a sequence of
paired and unpaired RNA fragments into another (blue). 2-cells correspond to actual RNA secondary
structures. Their composition (green) finally allows us to model pseudoknots. 2-cell horizontal
composition overlays two RNA structures with compatible signatures (paired and unpaired fragments)
to construct a pseudoknot. 2-cell vertical composition (blue) induced by 1-cell composition then
corresponds to RNA structure sequential composition in our operad of RNAs.
CoNCoRDe
Building on our model providing a compositional description of RNA structure we developed
CoNCoRDe – Compositional Nested Conditional RNA Design, our software for accelerating RNA
sequence design. Given a target RNA secondary structure, CoNCoRDe decomposes it into its
poset of substructures, and proceeds by recursively solving substructures until it
arrives at a solution for the whole RNA structure. When solving multiple structures, it remembers
the solutions to previous substructures, resulting in a continually learning algorithm.
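The decompose, solve and remember loop can be sketched as follows. This is a toy illustration, not CoNCoRDe itself: the class name `MemoDesigner` is hypothetical, and the placeholder solver simply writes G-C for every base pair and A for every unpaired position without checking that the sequence actually folds into the target, which a real solver would have to do.

```python
def matching_bracket(db: str, start: int) -> int:
    """Index of the ')' that closes the '(' at position `start`."""
    depth = 0
    for j in range(start, len(db)):
        if db[j] == "(":
            depth += 1
        elif db[j] == ")":
            depth -= 1
            if depth == 0:
                return j
    raise ValueError("unbalanced dot-bracket structure")


class MemoDesigner:
    """Solves structures recursively and remembers every solved substructure."""

    def __init__(self):
        self.memo = {}  # substructure -> designed sequence
        self.hits = 0   # how often a remembered solution was reused

    def solve(self, db: str) -> str:
        if db in self.memo:
            self.hits += 1
            return self.memo[db]
        pieces, i = [], 0
        while i < len(db):
            if db[i] == ".":
                pieces.append("A")  # placeholder choice for unpaired positions
                i += 1
            elif db[i] == "(":
                j = matching_bracket(db, i)
                # Placeholder G-C pair around the recursively solved inner substructure.
                pieces.append("G" + self.solve(db[i + 1:j]) + "C")
                i = j + 1
            else:
                raise ValueError(f"unexpected character {db[i]!r} at position {i}")
        self.memo[db] = "".join(pieces)
        return self.memo[db]


if __name__ == "__main__":
    designer = MemoDesigner()
    print(designer.solve("((...))"))
    print(designer.solve(".((...))."), "reused substructures:", designer.hits)
```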
Figure 1. Reinforcement learning environment for training agents to design RNA sequences
given a target RNA secondary structure. The agent designs an RNA sequence, which is then folded
by ViennaRNA and is assigned a score depending on how well it matches the target structure.
In a reinforcement learning loop this results in continuous agent improvement.
To solve substructures, we use ViennaRNA, a simple genetic algorithm, as well as a
neural network agent trained using reinforcement learning. To that end, we implemented a
learning environment for agents designing RNA sequences for a target RNA structure.
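A stripped-down version of such an environment could look like the sketch below. It rests on several assumptions of ours: the agent appends one nucleotide per step, the reward is only given at the end, and the score is the fraction of positions at which the folded structure matches the target. The folding function is passed in as a parameter so that the example runs without ViennaRNA; the class and function names are hypothetical.

```python
from typing import Callable, Tuple


def structure_similarity(predicted: str, target: str) -> float:
    """Fraction of positions at which two dot-bracket strings agree (assumed scoring scheme)."""
    if len(predicted) != len(target):
        raise ValueError("structures must have the same length")
    if not target:
        return 1.0
    return sum(p == t for p, t in zip(predicted, target)) / len(target)


class RNADesignEnv:
    """Episode: build a sequence nucleotide by nucleotide, then score its folded structure."""

    def __init__(self, target: str, fold: Callable[[str], str]):
        self.target = target
        self.fold = fold  # maps a sequence to a dot-bracket structure
        self.sequence = ""

    def reset(self) -> str:
        self.sequence = ""
        return self.sequence

    def step(self, nucleotide: str) -> Tuple[str, float, bool]:
        if len(nucleotide) != 1 or nucleotide not in "ACGU":
            raise ValueError("action must be one of A, C, G, U")
        if len(self.sequence) >= len(self.target):
            raise RuntimeError("episode finished, call reset()")
        self.sequence += nucleotide
        done = len(self.sequence) == len(self.target)
        reward = structure_similarity(self.fold(self.sequence), self.target) if done else 0.0
        return self.sequence, reward, done


if __name__ == "__main__":
    # Stub folding function for demonstration only; it does not predict real structures.
    stub_fold = lambda seq: "(((...)))" if seq == "GGGAAACCC" else "." * len(seq)
    env = RNADesignEnv("(((...)))", stub_fold)
    env.reset()
    for nt in "GGGAAACCC":
        state, reward, done = env.step(nt)
    print(state, reward, done)
```

With the ViennaRNA Python bindings installed, the stub could be replaced by `lambda seq: RNA.fold(seq)[0]`, since `RNA.fold` returns the predicted structure together with its minimum free energy.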
Figure 2. Training progress of our reinforcement learning agent learning to design RNA.
Note that the agent reaches the maximum environment reward (dotted line) within 200k timesteps.
In this environment, our agent quickly learns to design RNA sequences for fixed-length target
structures.
Figure 3. Hard example RNAs solved by our CoNCoRDe algorithm.
RISE
RISE is our iGEM Registry Intelligent Search Engine. It lets users
filter and search the iGEM registry. We thereby add functionality to the registry, so a
user can search for a certain keyword or filter, for instance, by the uses of a part. RISE also
presents the information in the registry in a standardized table.
The annotations and the protein sequence can be exported as .gb and .fasta files.
These files can be used in our pipeline 3DOC. RISE helps to make the valuable iGEM registry, which is
widely used by iGEMers, even more useful. RISE is offered as a Jupyter Notebook, which makes it
easily accessible to all potential users.
Collaboration
Collaboration is the key to science. iGEM is not only a team effort but also about working
together in one giant project to spread and work on synthetic biology. We are glad to have
participated in the postcard challenge by iGEM Düsseldorf, the iGEM Meetup by iGEM
Gießen-Marburg, the iJET campaign by iGEM Darmstadt and iGEM Aachen as well as the iGEM
flash mob by iGEM Moscow.
Over the last few months we also collaborated with iGEM Manchester on a meta-analysis of
iGEM wikis. After we got to know their plans for an iGEM meta-analysis, we decided to help them
with the implementation of a web scraper to quantify specific parameters from different
iGEM team wiki pages. These include, for instance, the number of pictures and the mean number of
words per sentence, and are split by winners (and nominees) and participants. We kept in constant
contact with the iGEM team Manchester to discuss our work and theirs and to find the best
parameters for which to search the wiki pages. The final analysis and report were created by
the iGEM team Manchester.
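The two example parameters named above, the number of pictures and the mean number of words per sentence, can be computed from a page's HTML with the Python standard library alone. The sketch below is an illustration under our own assumptions (sentences end at ".", "!" or "?", and script and style content is ignored); it is not the scraper used in the collaboration, and the names `WikiStats` and `page_metrics` are hypothetical.

```python
import re
from html.parser import HTMLParser


class WikiStats(HTMLParser):
    """Collects the number of <img> tags and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.text = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text.append(data)


def page_metrics(html: str) -> dict:
    """Return the image count and the mean number of words per sentence of a page."""
    parser = WikiStats()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean = sum(lengths) / len(lengths) if lengths else 0.0
    return {"images": parser.images, "mean_words_per_sentence": mean}


if __name__ == "__main__":
    sample = '<p>We love RNA. It is flexible!</p><img src="a.png"><img src="b.png"/>'
    print(page_metrics(sample))
```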
Human Practice
In order to engage with the public we hosted an online Science Slam with other iGEM teams.
The slam was won by iGEM Ghana with their project “Coast Busters”. To educate the next
generation of scientists we created an experiment booklet with a 3-day course
program for pupils at student research centers, which includes a booklet version for students
and teachers, and accompanying material for teachers. We believe we have created a long-lasting
program with our experiment booklet and hope to implement it in more schools and
research centers.
Our integrated Human Practices include nine in-depth interviews with experts from various
fields who shaped our project. For instance, they were the reason why we decided to design the
mRNA translation inhibition and PPR-Ribonuclease complex experiments, and they influenced how we
constructed plasmids making use of Golden Gate cloning. The feedback of our experts has
shaped our work in the drylab as well, so that we can make our tools more accessible to our
academic target audience.
Achievements
- We provided a useful contribution to future iGEM teams by writing a guide to working with the Marburg Collection in E. coli and adapting the cloning protocols to a high temperature ligase.
- We have shown engineering success, by providing intensively researched plans for our experiments and navigating the engineering cycle with in silico improvement of parts needed for our experiments, and first results in our wetlab experimental validation.
- We worked together with iGEM Team Manchester to analyze iGEM Wikis via web scraping.
- We talked to stakeholders from politics and science about the future of synthetic biology.
- With CoNCoRDe and 3DOC we are the first iGEM team to model RNAs and proteins using category theory. CoNCoRDe is our software to simplify RNA design and 3DOC is capable of fusing protein domains, such as pumby modules.
- We have demonstrated with PRISM (Protein RNA Interaction Sequence Modelling) the generation of RNA-binding protein sequences based on a user-inputted RNA-motif.
- Science communication was an important issue for us. We developed a 3-day course program (experiment booklet) for pupils and reached out into society with the science slam.
- RISE stands for iGEM Registry Intelligent Search Engine. This is what we present as excellence in another area. With RISE we present an easy-to-use tool, made for iGEMers and scientists out there!
- We spent our summer working remotely and in the lab. The iGEM experience was quite different this year, but we had fun!
Attributions
Many people helped us over the course of our project. We want to thank them for all the support!
Rene Inckemann, a long-term iGEM Marburg and Düsseldorf member and instructor, was our mentor in the iGEM Mentorship program. He mentored us regarding our plans in the wetlab and with the wiki. He showed us how to use the iGEM Marburg 2019 Golden Gate Assembly Cloning Technique. Rene really was a huge support in developing our project further.
Christian Leiblein and the Studierendenwerk Heidelberg provided us with advice and the Marstall Cafe (If you ever come to Heidelberg it definitely is worth going there.) as our hub for our digital Science Slam.
Dr. Franziska Grün, a scientist at the Jäschke Lab at Heidelberg University, showed us how to do an in-vitro transcription.
Charlotte Westhoven, a second-year bachelor student in Molecular Biotechnology, did the voice-over for our video for the GASB video competition and the iGEM project promotion video.
Advisors and Instructors:
Michael Jendrusch
A third-year master student in Molecular Biotechnology, was our instructor.
He was the one we came to first when we had a question in the lab or were fiddling around with our
code. Without Michael, our project would have been a far more arduous journey! He also supervised
the development of PRISM and CoNCoRDe and helped us with our wiki development.
Christian Kobrow
A third-year master student in Physics, helped us in organizing our
development tools and GitHub. He was the one we asked when there were questions concerning RISE.
Also, he supported us with our wiki development.
Daniil Pshegorlinsky
A second-year bachelor student in Game Design, helped us in the
development of our wiki and our website. He also gave us advice on how to set up a data protection
compliant Lab-switch program.
Lukas Bange
Helped us in the development of our wiki and supervised our work in the lab.
Principal Investigator:
Prof. Dr. Stefan Wölfl
Our wonderful PI, who always supported our project and relieved us of many organizational
tasks. He also helped us with his advice.
Special thanks:
Dr. Karin Hübner, Nina Beil, and the Bioquant-Team provided us with a lab and a meeting space. They always helped us with technical issues in the lab.
Expert Interviews
We greatly enjoyed talking to experts in science communication, stakeholders and scientists
who told us more about their work. Their feedback on our project encouraged us to spread synthetic
biology even more and made us think beyond the borders of our own project!
Dr. Lorenz Adlung - PhD in Systems Biology, Weizmann Institute of Science, Israel
Theresia Bauer – Minister for Science, Research and Art Baden-Württemberg
Prof. Dr. Chase Beisel – Helmholtz centre for infection research
Dr. Kerstin Göpfrich – Max-Planck-Institute for medical research
Prof. Markus Gumbel – Hochschule Mannheim
Dr. Dorothea Kaufmann - IPMB, Heidelberg University
Dr. Anja Störiko - Editorial staff at Association for General and Applied Microbiology - VAAM
Integrated Human Practices
We are really grateful to all of the scientists we talked to for sharing their knowledge and skills
with us. Their advice shaped our project in many ways and has inspired us to think of other ways to
spread synthetic biology in times of COVID-19. Without all the input, advice and information, our
project would not be the same!
Prof. Dr. Chase Beisel – Helmholtz centre for infection research
Dr. Katharina Höfer - Max-Planck-Institute for terrestrial microbiology
Rene Inckemann - PhD student at the Max-Planck-Institute for terrestrial microbiology
Julius Upmeier zu Belzen – PhD student at Berlin Institute of Health
Carola Fischer - PhD student at Technical University Berlin
Prof. Dr. Fausto Giunchiglia – Università di Trento
Prof. Markus Gumbel – Hochschule Mannheim
Benedict Wolf - Student assistant at Berlin Institute of Health
Dr. Christiane Talke-Messerer - Student Research Center Lörrach-Tripoint
Sponsors
We would like to thank our sponsors and promoters. With their support they made our project
possible!
Heidelberg University Foundation
LBBW foundation Baden-Württemberg
Merck
Promega
References
- A. Barkan, M. Rojas, S. Fujii, A. Yap, Y. S. Chong, C. Bond, & I. Small (2012). A Combinatorial Amino Acid Code for RNA Recognition by Pentatricopeptide Repeat Proteins. PLoS Genetics, 8.
- A. Griffiths, J. Miller, R. C. Lewontin, & D. Suzuki (2000). An Introduction to Genetic Analysis, 7th edition.
- A. Kissinger, & J. V. D. Wetering (2019). Reducing T-count with the ZX-calculus. arXiv: Quantum Physics.
- A. Kumar, X. Peng, & S. Levine (2019). Reward-Conditioned Policies. ArXiv, abs/1912.13465.
- A. M. Watkins, & R. Das (2019). FARFAR2: Improved de novo Rosetta prediction of complex global RNA folds. bioRxiv.
- A. Madani, B. McCann, N. Naik, N. Keskar, N. Anand, R. R. Eguchi, P.-S. Huang, & R. Socher (2020). ProGen: Language Modeling for Protein Generation. bioRxiv.
- A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, & S. Chintala (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. ArXiv, abs/1912.01703.
- A. Patel, J. Zhao, D. Duan, & Y. Lai (2019). Design of AAV Vectors for Delivery of Large or Multiple Transgenes. Methods in molecular biology, 1950, 19-33.
- A. Senior, R. Evans, J. Jumper, J. R. M. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Židek, A. W. R. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, P. Kohli, D. Jones, D. Silver, K. Kavukcuoglu, & D. Hassabis (2020). Improved protein structure prediction using potentials from deep learning. Nature, 577, 706-710.
- A. Singh, E. Jang, A. Irpan, D. Kappler, M. Dalal, S. Levine, M. Khansari, & C. Finn (2020). Scalable Multi-Task Imitation Learning with Autonomous Improvement. 2020 IEEE International Conference on Robotics and Automation (ICRA), 2167-2173.
- A. V. Anzalone, P. B. Randolph, J. R. Davis, A. Sousa, L. W. Koblan, J. M. Levy, P. J. Chen, C. Wilson, G. A. Newby, A. Raguram, & D. Liu (2019). Search-and-replace genome editing without double-strand breaks or donor DNA. Nature, 576, 149 – 157.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, & I. Polosukhin (2017). Attention is All you Need. ArXiv, abs/1706.03762.
- B. Fong, & D. I. Spivak (2018). Seven Sketches in Compositionality: An Invitation to Applied Category Theory. arXiv: Category Theory.
- Batey, Rambo, & Doudna (1999). Tertiary Motifs in RNA Structure and Folding. Angewandte Chemie, 38 16, 2326-2343.
- Bikard, D., Jiang, W., Samai, P., Hochschild, A., Zhang, F., & Marraffini, L. (2013). Programmable repression and activation of bacterial gene expression using an engineered CRISPR-Cas system. Nucleic Acids Research, 41(15), 7429-7437.
- Boehm, C., & Bock, R. (2019). Recent advances and current challenges in synthetic biology of the plastid genetic system and metabolism. Plant physiology, 179(3), 794–802.
- C. Cheng, F.-C. Chou, & R. Das (2015). Modeling complex RNA tertiary folds with Rosetta. Methods in enzymology, 553, 35-64.
- C. Domenger, & D. Grimm (2019). Next-generation AAV vectors – don’t judge a virus (only) by its cover. Human molecular genetics.
- C. Dong, J. Fontana, A. Patel, J. Carothers, & J. G. Zalatan (2018). Synthetic CRISPR-Cas gene activators for transcriptional reprogramming in bacteria. Nature Communications, 9.
- C. Elliott (2018). The simple essence of automatic differentiation. Proceedings of the ACM on Programming Languages, 2, 1 – 29.
- C. N. Kunkler, J. P. Hulewicz, S. C. Hickman, M. C. Wang, P. J. McCown, & J. Brown (2019). Stability of an RNA•DNA–DNA triple helix depends on base triplet composition and length of the RNA third strand. Nucleic Acids Research, 47, 7213 – 7222.
- C. R. Conant, J. Goodarzi, S. E. Weitzel, & P. V. von Hippel (2008). The antitermination activity of bacteriophage lambda N protein is controlled by the kinetics of an RNA-looping-facilitated interaction with the transcription complex. Journal of molecular biology, 384 1, 87-108.
- C.G. Wilson, T. Magliery, & L. Regan (2004). Detecting protein-protein interactions with GFP-fragment reassembly. Nature Methods, 1, 255-262.
- D. Carr (2005). Interferon Methods and Protocols. In Methods in Molecular Medicine.
- D. Fiorenza, C. Rogers, & U. Schreiber (2016). Higher U(1)-gerbe connections in geometric prequantization. Reviews in Mathematical Physics, 28, 1650012.
- D. Ghosh, A. Gupta, J. Fu, A. Reddy, C. Devin, B. Eysenbach, & S. Levine (2019). Learning To Reach Goals Without Reinforcement Learning. ArXiv, abs/1912.06088.
- D. Kim, D. Chivian, & D. Baker (2004). Protein structure prediction and analysis using the Robetta server. Nucleic acids research, 32 Web Server issue, W526-31.
- D. Klopfenstein, L. Zhang, B. Pedersen, F. Ramirez, A. W. Vesztrocy, A. Naldi, C. Mungall, J. M. Yunes, O. B. Botvinnik, M. D. Weigel, W. Dampier, C. Dessimoz, P. Flick, & H. Tang (2018). GOATOOLS: A Python library for Gene Ontology analyses. Scientific Reports, 8.
- D. P. Kingma, & J. Ba (2015). Adam: A Method for Stochastic Optimization. CoRR, abs/1412.6980.
- D. Peabody (1993). The RNA binding site of bacteriophage MS2 coat protein. The EMBO Journal, 12.
- Dong-Jiunn, J. Truong, K. Kühner, R. Kühn, S. Werfel, S. Engelhardt, W. Wurst, & O. Ortiz (2015). Development of an intein-mediated split–Cas9 system for gene therapy. Nucleic Acids Research, 43, 6450 – 6458.
- E. Volkin, & W. Cohn (1953). On the structure of ribonucleic acids. II. The products of ribonuclease action. The Journal of biological chemistry, 205 2, 767-82.
- Engelhardt, F., Praetorius, F., Wachauf, C., Brüggenthies, G., Kohler, F., Kick, B., Kadletz, K., Pham, P., Behler, K., Gerling, T., & others (2019). Custom-size, functional, and durable DNA origami with design-specific scaffolds. ACS Nano, 13(5), 5015–5027.
- European Parliament, & the Council (2007). Regulation (EC) No 1394/2007 of the European Parliament and of the Council of 13 November 2007 on advanced therapy medicinal products and amending Directive 2001/83/EC and Regulation (EC) No 726/2004. Official Journal of the European Union.
- F. Runge, D. Stoll, S. Falkner, & F. Hutter (2019). Learning to Design RNA. ArXiv, abs/1812.11951.
- G. Ghosal, & K. Muniyappa (2006). Hoogsteen base-pairing revisited: resolving a role in normal biological processes and human diseases. Biochemical and biophysical research communications, 343 1, 1-7.
- G. Giudice, F. Sánchez-Cabo, C. Torroja Fungairino, & E. Lara Pezzi (2016). ATtRACT—a database of RNA-binding proteins and associated motifs. Database: The Journal of Biological Databases and Curation, 2016.
- G. S. Filonov, J. D. Moon, N. Svensen, & S. Jaffrey (2014). Broccoli: Rapid Selection of an RNA Mimic of Green Fluorescent Protein by Fluorescence-Based Selection and Directed Evolution. Journal of the American Chemical Society, 136, 16299 – 16308.
- H. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. Shindyalov, & P. Bourne (2000). The Protein Data Bank. Acta crystallographica. Section D, Biological crystallography, 58 Pt 6 No 1, 899-907.
- I. C. de Beauchene, S. D. de Vries, & M. Zacharias (2016). Fragment-based modelling of single stranded RNA bound to RNA recognition motif containing proteins. Nucleic Acids Research, 44, 4565 – 4580.
- I. C. de Beauchene, S. D. Vries, & M. Zacharias (2016). Binding Site Identification and Flexible Docking of Single Stranded RNA to Proteins Using a Fragment-Based Approach. PLoS Computational Biology, 12.
- I. Hofacker (2009). RNA Secondary Structure Analysis Using the Vienna RNA Package. Current Protocols in Bioinformatics, 26.
- I. Loshchilov, & F. Hutter (2019). Decoupled Weight Decay Regularization. In ICLR.
- Ingraham, J., Garg, V.K., Barzilay, R., & Jaakkola, T. (2019). Generative Models for Graph-Based Protein Design. DGS@ICLR.
- J. Baez, & J. Master (2020). Open Petri Nets. Math. Struct. Comput. Sci., 30, 314-341.
- J. Baez, J. Foley, & J. Moeller (2019). Network Models from Petri Nets with Catalysts. ArXiv, abs/1904.03550.
- J. Lian, M. HamediRad, S. Hu, & H. Zhao (2017). Combinatorial metabolic engineering using an orthogonal tri-functional CRISPR system. Nature Communications, 8.
- J. Shi, R. Das, & V. Pande (2018). SentRNA: Improving computational RNA design by incorporating a prior of human design strategies. ArXiv, abs/1803.03146.
- J. Upmeier zu Belzen, T. Bürgel, S. Holderbach, F. Bubeck, L. Adam, C. Gandor, M. Klein, J. Mathony, P. Pfuderer, L. Platz, M. Przybilla, M. Schwendemann, D. Heid, M. Hoffmann, M. Jendrusch, C. Schmelas, M. C. Waldhauer, I. Lehmann, D. Niopek, & R. Eils (2019). Leveraging implicit knowledge in neural networks for functional dissection and engineering of proteins. Nature Machine Intelligence, 1, 225-235.
- J. Wu, & B. Weiss (1992). Two-stage induction of the soxRS (superoxide response) regulon of Escherichia coli. Journal of bacteriology, 174 12, 3915-20.
- Jianyi Yang, I. Anishchenko, H. Park, Zh. Peng, S. Ovchinnikov, & D. Baker (2020). Improved protein structure prediction using predicted interresidue orientations. Proceedings of the National Academy of Sciences, 117, 1496 – 1503.
- K. Kappel, & R. Das (2019). Sampling Native-like Structures of RNA-Protein Complexes through Rosetta Folding and Docking. Structure, 27 1, 140-151.e5.
- K. P. Adamala, D. Martin-Alarcon, & E. Boyden (2016). Programmable RNA-binding protein composed of repeats of a single modular unit. Proceedings of the National Academy of Sciences, 113, E2579 – E2588.
- Kim, H., Bojar, D., & Fussenegger, M. (2019). A CRISPR/Cas9-based central processing unit to program complex logic computation in human cells. Proceedings of the National Academy of Sciences, 116(15), 7214–7219.
- L. Khalatbari, M. R. Kangavari, S. Hosseini, Hongzhi Yin, & N. Cheung (2019). MCP: A multi-component learning machine to predict protein secondary structure. Computers in biology and medicine, 110, 144-155.
- Lei S. Qi, Matthew H. Larson, L. Gilbert, J. Doudna, J. Weissman, A. Arkin, & W. Lim (2013). Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression.
- M. Ashburner, C. Ball, J. Blake, D. Botstein, H. Butler, J. Cherry, A. Davis, K. Dolinski, S. Dwight, J. Eppig, M. Harris, D. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. Richardson, M. Ringwald, G. Rubin, & G. Sherlock (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25, 25-29.
- M. Gilson, T. Liu, M. Baitaluk, G. Nicola, L. Hwang, & J. Chong (2016). BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Research, 44, D1045 – D1053.
- M. Huot, S. Staton, & M. Vákár (2020). Correctness of Automatic Differentiation via Diffeologies and Categorical Gluing. In FoSSaCS.
- M. Steinegger, M. Meier, M. Mirdita, Harald Vöhringer, Stephan J. Haunsberger, & J. Söding (2019). HH-suite3 for fast remote homology detection and deep protein annotation. bioRxiv.
- M. Zhang, Q. Su, Y. Lu, M. Zhao, & B. Niu (2017). Application of Machine Learning Approaches for Protein-protein Interactions Prediction. Medicinal chemistry (Shariqah (United Arab Emirates)), 13 6, 506-514.
- Morgan, S., Grootendorst, P., Lexchin, J., Cunningham, C., & Greyson, D. (2011). The cost of drug development: a systematic review. Health policy, 100(1), 4–17.
- N. Behr (2019). Tracelets and Tracelet Analysis Of Compositional Rewriting Systems. ArXiv, abs/1904.12829.
- N. Katz, E. Tripto, S. Goldberg, O. Atar, Z. Yakhini, Y. Orenstein, & R. Amit (2019). Overcoming the design, build, test (DBT) bottleneck for synthesis of nonrepetitive protein-RNA binding cassettes for RNA applications. bioRxiv.
- N. Keskar, B. McCann, L. R. Varshney, C. Xiong, & R. Socher (2019). CTRL: A Conditional Transformer Language Model for Controllable Generation. ArXiv, abs/1909.05858.
- O. Vinyals, I. Babuschkin, W. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y. Wu, R. Ring, D. Yogatama, D. Wünsch, K. McKinney, O. Smith, T. Schaul, T. Lillicrap, K. Kavukcuoglu, D. Hassabis, C. Apps, & D. Silver (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 1-5.
- P. Cock, T. Antao, J. Chang, B. Chapman, C. J. Cox, A. Dalke, I. Friedberg, T. Hamelryck, F. Kauff, B. Wilczynski, & M. Hoon (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25, 1422 – 1423.
- P. D. Hsu, E. Lander, & F. Zhang (2014). Development and Applications of CRISPR-Cas9 for Genome Engineering. Cell, 157, 1262-1278.
- P. Kerpedjiev, S. Hammer, & I. Hofacker (2015). Forna (force-directed RNA): Simple and effective online RNA secondary structure diagrams. Bioinformatics, 31, 3377 – 3379.
- R. Breaker (2018). Riboswitches and Translation Control. Cold Spring Harbor perspectives in biology, 10 11.
- R. Das, & D. Baker (2008). Macromolecular modeling with rosetta. Annual review of biochemistry, 77, 363-82.
- R. Das, J. Karanicolas, & D. Baker (2010). Atomic accuracy in predicting and designing non-canonical RNA structure. Nature methods, 7, 291 – 294.
- R. Koodli, B. Keep, K. R. Coppess, F. Portela, E. Players, & R. Das (2019). RNA design movesets and strategies from an Internet-scale videogame. bioRxiv, 326736.
- R. Lehmann, & C. Nüsslein-Volhard (1987). Involvement of the pumilio gene in the transport of an abdominal signal in the Drosophila embryo. Nature, 329, 167-170.
- R. Srivastava, P. Shyam, F. Mutz, W. Jaśkowski, & J. Schmidhuber (2019). Training Agents using Upside-Down Reinforcement Learning. ArXiv, abs/1912.02877.
- Ronny Lorenz, S. Bernhart, C. H. Z. Siederdissen, Hakim Tafer, C. Flamm, P. Stadler, & I. Hofacker (2011). ViennaRNA Package 2.0. Algorithms for Molecular Biology : AMB, 6, 26 – 26.
- S. Aubourg, N. Boudet, M. Kreis, & A. Lecharny (2004). In Arabidopsis thaliana, 1% of the genome codes for a novel protein family unique to plants. Plant Molecular Biology, 42, 603-613.
- S. Brezgin, Anastasiya Kostyusheva, D. Kostyushev, & V. Chulanov (2019). Dead Cas Systems: Types, Principles, and Applications. International Journal of Molecular Sciences, 20.
- S. C. Coquille, A. Filipovska, T. Chia, L. Rajappa, J. P. Lingford, M. Razif, S. Thore, & O. Rackham (2014). An artificial PPR scaffold for programmable RNA recognition. Nature communications, 5, 5729.
- S. Chaudhury, S. Lyskov, & J. J. Gray (2010). PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics, 26 5, 689-91.
- S. Dong, Y. Wang, C. Cassidy-Amstutz, G. Lu, R. Bigler, M. R. Jezyk, C. Li, T. Hall, & Z. Wang (2011). Specific and Modular Binding Code for Cytosine Recognition in Pumilio/FBF (PUF) RNA-binding Domains. The Journal of Biological Chemistry, 286, 26732 – 26742.
- S. Lane (1971). Categories for the Working Mathematician.
- S. Levine, A. Kumar, G. Tucker, & J. Fu (2020). Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems. ArXiv, abs/2005.01643.
- S. Lyskov, F.-C. Chou, S. O. Conchuir, B. S. Der, K. Drew, D. Kuroda, J. Xu, B. D. Weitzner, P. D. Renfrew, P. Sripakdeevong, B. Borgo, J. Havranek, B. Kuhlman, T. Kortemme, R. Bonneau, J. J. Gray, & R. Das (2013). Serverification of Molecular Modeling Applications: The Rosetta Online Server That Includes Everyone (ROSIE). PLoS ONE, 8.
- S. Manna (2015). An overview of pentatricopeptide repeat proteins and their applications. Biochimie, 113, 93-9.
- S. Schmidt (2013). Fusion protein technologies for biopharmaceuticals : applications and challenges.
- T. C. Bachlechner, B. P. Majumder, H. H. Mao, G. Cottrell, & J. McAuley (2020). ReZero is All You Need: Fast Convergence at Large Depth. ArXiv, abs/2003.04887.
- The Gene Ontology Consortium (2019). The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research, 47, D330 – D338.
- The UniProt Consortium (2019). UniProt: a worldwide hub of protein knowledge. Nucleic Acids Research, 47, D506 – D515.
- Van Norman, G. (2016). Drugs, devices, and the FDA: part 1: an overview of approval processes for drugs. JACC: Basic to Translational Science, 1(3), 170–179.
- W. G. Alexander (2019). Marionette strains aim to make refining metabolic pathways faster and easier. Synthetic Biology, 4.
- Weizhong Chen, H. Zhang, Y. Zhang, Yu Wang, J. Gan, & Q. Ji (2019). Molecular basis for the PAM expansion and fidelity enhancement of an evolved Cas9 nuclease. PLoS Biology, 17.
- X. Chen, J. Zaro, & W. Shen (2013). Fusion protein linkers: property, design and functionality. Advanced drug delivery reviews, 65 10, 1357-69.
- X. Peng, A. Kumar, G. Zhang, & S. Levine (2019). Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning. ArXiv, abs/1910.00177.
- Y. Ikawa, & S. Matsumura (2018). Engineered Group I Ribozymes as RNA-Based Modular Tools to Control Gene Expression.
- Z. Sükösd, E. Andersen, & R. B. Lyngso (2014). SCFGs in RNA Secondary Structure Prediction: A Hands-on Approach. Methods in Molecular Biology, 1097, 143-162.
- Z. Wu, Sh. Pan, F. Chen, G. Long, C. Zhang, & P. Yu (2020). A Comprehensive Survey on Graph Neural Networks. IEEE transactions on neural networks and learning systems.