Team:Heidelberg/Integrated

Integrated Human Practices
Talking to Giants

Over the course of our project we worked with many scientists, sought their advice, and gained experience by discussing our approaches and ideas with them. Based on feedback from these scientists, who are the target group for our toolbox, we continuously improved our project throughout the year. All of them shaped our project and gave us extremely valuable advice on how to make it better, and we are very grateful for their feedback. On this page we list all subprojects for which we received advice and the scientists we talked to.

Triple-Helix Experiment and RNA-linker

Prof. Dr. Chase Beisel

Professor in Chemical Engineering
Helmholtz Center for Infection Research in Würzburg

Prof. Dr. Chase Beisel is the head of the “RNA Synthetic Biology” research group at the Helmholtz Center for Infection Research in Würzburg. He is a chemical engineer and biotechnologist. He works on CRISPR-Cas systems with the aim of using them against genetic diseases or multi-resistant pathogens. He came to CRISPR through his background in RNA synthetic biology. He also studies naturally occurring RNAs to understand which RNAs are conserved and what they do. He uses these insights to engineer RNA more effectively.

We contacted Prof. Beisel because of his experience with CRISPR systems, which we wanted to address in our Triple-Helix experiment. We discussed with him our idea that the triple helix could substitute for the rather large nucleases in CRISPR-Cas systems, which often lead to cloning issues. Prof. Beisel encouraged us to focus strongly on the exact design of the nucleic acid interaction sites, as this is one of the main challenges of the project. Based on this feedback, we started working on our software CoNcoRDe, which, among other things, enables us to design RNAs and trans-splicing ribozymes more effectively. We also talked about cellular delivery and biological circuit design, which could be a possible real-world implementation of our wet lab results.



Dr. Katharina Höfer

PhD in Biochemistry
Head of the Research Group “Bacterial Transcriptomics”
Max Planck Institute for Terrestrial Microbiology in Marburg

In her lab, Dr. Katharina Höfer works on NAD-capped RNAs. We contacted her to better understand the role of RNA and to ask for advice on planning our experiments, since RNAs play a big role in our project. Dr. Höfer gave us input on the functions of the NAD cap. One of its major advantages is that it makes RNAs more stable, which is exactly what we needed for our project. Unfortunately, there is no reliable method to produce NAD-capped RNA in vivo yet. Dr. Höfer recommended that we pay attention primarily to the structure of the RNA, since highly structured RNAs, especially those with loops at the 5’ and 3’ ends, are less likely to be degraded in the cell. We made sure to design our RNAs in accordance with this advice whenever possible.

At the time of the interview we were considering developing an aptamer-based sensor as an example application for our toolbox. In 2015 the iGEM Heidelberg team, for which Dr. Höfer was an advisor, also used aptamers in their project. We could profit from this experience: Dr. Höfer’s main suggestion was to verify, before designing the aptamer, whether the concentration of the prospective aptamer ligand is high enough to be detected. She also expressed her wish for a sensor for NAD-capped RNA. Because of the lack of laboratory time due to COVID-19, this project was abandoned.

A further idea of ours was to develop RNA linkers of variable length. This could be achieved by manipulating the structure of the RNA. Dr. Höfer’s advice was to use known riboswitch sequences. She also suggested using an RNA with multiple Mango aptamer sequences. We learned that adding the aptamer ligand would lead both to a change in the RNA length and to a detectable light signal.


Pentatricopeptide repeat proteins and Golden Gate Cloning

René Inckemann

M.Sc. in Molecular and cellular biology.
PhD Student at AG Erb
Max Planck Institute for Terrestrial Microbiology

When we were first designing the protocols for our experiments, we did not know much about cloning. Our first approach to designing our plasmid constructs was to connect our gene blocks to each other using Golden Gate cloning and to join their ends to the plasmid backbone using 3AA cloning, which, of course, is a horrific idea.

Luckily, we met René Inckemann at the online iGEM Meetup 2020 hosted by the iGEM Team Marburg and Gießen, while participating in the promotion video competition set out by the German Association of Synthetic Biology (GASB). René introduced us to the Mo-Clo approach to Golden Gate cloning and also gave us an introduction to the iGEM Marburg Assembly method, which he helped to design. After the presentation of the assembly method we were hooked on the idea on the spot and subsequently decided to redesign our plasmids to fit a Mo-Clo assembly. Further, René was kind enough to give us a physical copy of the iGEM Marburg Collection 2018, which gave us a head start when creating our plasmid constructs, as we were quickly able to choose from a large selection of promoters, terminators, resistances and much more.

We stayed in contact with René for most of the second half of the iGEM season. When we told him that we were working on Pumilio homology domain proteins and the possibility of designing them synthetically to bind any RNA sequence, he called our attention to the Pentatricopeptide Repeat (PPR) proteins, which he knew had similar characteristics. Through this discussion we discovered the many catalytic activities that are associated with PPR proteins, which subsequently led us to design the mRNA translation inhibition and PPR-Ribonuclease complex experiments.

PRISM and 3DOC

Julius Upmeier zu Belzen

M.Sc. in Bioinformatics
Student Assistant for Human Genome Machine Learning
Berlin Institute of Health

In his current and previous projects, Julius has gained a lot of experience in using Gene Ontology (GO) terms to train neural networks. He started working with GO terms as a member of the iGEM Heidelberg 2017 team, which presented the neural networks GAIA and DeeProtein, which are able to assign a protein’s function based on its amino acid sequence Belzen2019LeveragingIK. His explanations helped us to better understand the structure of the GO annotation system.

We chose Swiss-Prot as the training dataset for our language model. It contains a huge variety of GO terms assigned to different proteins. This is problematic, since our network cannot directly process all GO terms present in the dataset. Julius suggested using the goatools library to determine more general “parent” GO terms and to use them instead of the highly specific GO terms Klopfenstein2018GOATOOLSAP. He also supported our approach of using semantic similarity between GO terms with different parents to reduce the number of GO terms. Moreover, he explained to us how to use the goatools library to create new, condensed GO terms based on semantic similarity. Julius also proposed embedding GO terms using a word2vec approach. Compared to one-hot encoding, this would preserve semantic similarity and reduce the dimensionality of the vector representation.
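The idea of replacing specific GO terms with more general parent terms can be sketched as follows. This is a minimal illustration on a hand-written toy hierarchy, not our actual pipeline: the term IDs, the `PARENTS` mapping and all function names are hypothetical, and in practice the hierarchy would be loaded from the Gene Ontology with a library such as goatools.

```python
# Toy "is_a" hierarchy (child -> list of parents). All IDs are made up.
PARENTS = {
    "T:root": [],
    "T:binding": ["T:root"],
    "T:catalysis": ["T:root"],
    "T:rna_binding": ["T:binding"],
    "T:mrna_binding": ["T:rna_binding"],
    "T:nuclease": ["T:catalysis"],
    "T:rnase": ["T:nuclease"],
}


def depth(term):
    """Length of the shortest path from the term up to a root term."""
    parents = PARENTS[term]
    if not parents:
        return 0
    return 1 + min(depth(p) for p in parents)


def ancestors(term):
    """All terms reachable via parent links, including the term itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def generalize(term, max_depth):
    """Replace a term by its ancestors at max_depth; keep shallow terms as-is."""
    if depth(term) <= max_depth:
        return {term}
    return {a for a in ancestors(term) if depth(a) == max_depth}


def condense(annotations, max_depth):
    """Map each protein's set of terms onto the reduced label vocabulary."""
    return {
        protein: set().union(*(generalize(t, max_depth) for t in terms))
        for protein, terms in annotations.items()
    }


annotations = {
    "P1": {"T:mrna_binding", "T:rnase"},
    "P2": {"T:rna_binding"},
}
condensed = condense(annotations, max_depth=1)
```

Lowering `max_depth` shrinks the label vocabulary at the cost of specificity, so the cut-off is a trade-off that would have to be chosen with the downstream task in mind.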

We were planning to exclude RNA-binding proteins with catalytic activity from our training dataset because they usually bind RNA in a sequence-unspecific way. Julius warned us to be very careful, since many proteins could carry the corresponding GO terms even though their catalytic activity is not their primary function and can be disregarded. Based on this advice we revised the list of sequences we wanted to exclude because of a specific GO term.


Carola Fischer

MSc in Physics
PhD Student Technical University Berlin

Carola Fischer finished her master’s degree in physics at the University of Heidelberg this year. She is now doing her PhD in magnetic resonance imaging at the Technical University Berlin.

During her master’s studies Carola gained a lot of experience with machine learning problems. She is an expert when it comes to neural network development. Through her academic background she learned to build neural networks from the ground up. In the beginning she taught us to implement a simple neural network using PyTorch. We also had the opportunity to discuss with her the Transformer architecture as proposed by Vaswani et al. 2017 in the article “Attention is all you need” Vaswani2017AttentionIA. We used her advice when our team examined the influence of hyperparameters on the performance of our model, as well as how to reduce fluctuations in accuracy and refine the model in general.

We also discussed with Carola the similarities and differences between sequences of protein families. We were worried that related protein families could impair the meaningfulness of our validation error. She suggested that we validate our pretraining with a k-fold cross-validation to check that the validation error stays the same across all validation folds. We proposed performing the cross-validation on only a subset of our data, as cross-validation is computationally expensive. Based on her feedback, we decided to implement a cross-validation to check the quality of our dataset.
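A k-fold cross-validation on a subset of the data can be sketched as below. This is a simplified illustration under stated assumptions, not our training code: `train_and_eval` is a hypothetical callback that trains a model on the training split and returns its validation error, and `subset_size` is an assumed parameter for limiting the computational cost.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx); every sample is in exactly one validation fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


def cross_validate(samples, k, train_and_eval, subset_size=None, seed=0):
    """Run k-fold cross-validation, optionally on a random subset of the data."""
    if subset_size is not None and subset_size < len(samples):
        samples = random.Random(seed).sample(samples, subset_size)
    errors = []
    for train_idx, val_idx in k_fold_indices(len(samples), k, seed):
        train = [samples[i] for i in train_idx]
        val = [samples[i] for i in val_idx]
        errors.append(train_and_eval(train, val))
    return errors


def error_spread(errors):
    """Difference between the worst and best fold; small values indicate stable folds."""
    return max(errors) - min(errors)
```

A large `error_spread` across folds would hint that some folds are systematically easier or harder than others. Note that this plain random split does not keep related sequences together in one fold.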

We also asked Carola for advice on optimizing the fine-tuning process. To ensure that the fine-tuning dataset, although relatively small compared to the pretraining dataset, has a strong impact on our prediction and embedding layers, she advised us to adapt the batch size, the learning rate and the optimizer to the new dataset independently of the settings we used in pretraining.


Prof. Dr. Fausto Giunchiglia

Professor of Computer Science
University of Trento, ECCAI fellow, member of the Academia Europaea

Prof. Giunchiglia is one of the most experienced professionals we contacted, and we profited a lot from talking to him. In our interview, we talked about the state of modern synthetic biology and related fields. We asked for his opinion on the benefits of applying formal methods to the design of robust biological systems. We were surprised to learn that he doesn’t think these methods could be extended to model biological systems. According to him, we need a much deeper revision of the underlying computational model.

Our dry lab team is working with maximum-likelihood language models applied to protein and RNA sequences. We were wondering how biological sequences differ from natural language and in which ways we can treat them similarly. We learned from Prof. Giunchiglia that the essential backbone of natural language is a sequential structure. On top of this, one can construct secondary connections. When we asked him how the modeling of protein sequences could benefit from the wealth of knowledge in natural language processing, he told us that he would focus on knowledge graphs and graph embeddings. Prof. Giunchiglia also gave us valuable tips for avoiding some pitfalls when applying machine learning to natural language data. For example, according to him, it is hard to understand how a decision was made when a deep neural network (DNN) is used. It is also important to have many examples that are not biased.

We also discussed other concepts from natural language processing and adjacent areas of computer science that could be beneficial for treating biological sequences. Prof. Giunchiglia stated that language can be seen as a set of sentences that represent similar or identical meanings in different ways. He also said that the techniques used to disambiguate language are strongly influenced by the fact that they focus on language. He thinks that graphs are the way to go because, as he explained, they can be shaped to model exactly the phenomenon of interest.

Full Interview with Prof. Giunchiglia

Prof. Dr. Markus Gumbel

Professor for Medical Informatics
Mannheim University of Applied Sciences

More and more projects in the life sciences make use of machine learning methods, and so do we. This is a very powerful and sophisticated technology. In order to learn about possible pitfalls beforehand and to get advice on specific parts of our project, we talked to Prof. Gumbel, who has many years of experience in this field. He gave us a lot of intellectual input; among other things, he introduced us to some of the benefits of machine learning for computational biology.

Regarding our own project, Prof. Gumbel alleviated our fear that the amount of RNA and amino acid sequence data we are working with might be too large and made us aware of what to look for in the data. Prof. Gumbel also encouraged us not to shy away from getting deep into the technology of using computational methods for protein design. We introduced him to our idea of developing a software package for the iGEM community and asked Prof. Gumbel for his opinion on what makes a good software tool for biologists. From his experience, he could tell us that the tool should be web based, heavily linked to the biological databases, and provide an interface to Excel. Moreover, we got advice from him on which programming language is best suited for our purposes. As a result, we decided to implement most of our software directly in Python.

Full Interview with Prof. Gumbel

Collaboration with iGEM Manchester - Webscraping

Benedict Wolf

BSc in Molecular Biotechnology
Student Assistant at Berlin Institute of Health

Benedict “Bene” Wolf is an old acquaintance. His participation in the iGEM competition 2019 prepared him well to support the iGEM team 2020. After iGEM he decided to specialize in bioinformatics for his bachelor thesis, working at the Berlin Institute of Health under the supervision of Prof. Dr. Roland Eils.

After our discussion with the iGEM team Manchester concerning our collaboration on an iGEM Wiki meta study, we reached out to Bene. In 2019 Bene was responsible for programming the Wiki. He was therefore able to help us thoroughly analyse the programmatic structure of an iGEM wiki page. Based on this analysis we were able to significantly improve the performance of our web scraper.

Furthermore, we discussed with Bene which parameters of the Wiki may be worth analysing. He suggested counting the number of embedded documents, such as PDFs, as these additional documents may help to understand the breadth of a team's work.
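Counting embedded documents on a wiki page could look roughly like the sketch below. It uses only the Python standard library and is an illustration rather than our actual scraper: the class and function names are hypothetical, and it assumes that PDFs are referenced through `href`, `src` or `data` attributes with a path ending in `.pdf`.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class DocumentCounter(HTMLParser):
    """Collects the distinct URLs on a page that point to PDF files."""

    URL_ATTRS = {"href", "src", "data"}

    def __init__(self):
        super().__init__()
        self.pdf_urls = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                # Look at the path only, so query strings are ignored.
                path = urlparse(value).path
                if path.lower().endswith(".pdf"):
                    self.pdf_urls.add(value)


def count_pdfs(html):
    """Return the number of distinct PDF URLs referenced in the HTML text."""
    parser = DocumentCounter()
    parser.feed(html)
    return len(parser.pdf_urls)


page = (
    '<a href="/files/protocol.PDF">Protocol</a>'
    '<embed src="https://example.org/poster.pdf?download=1">'
    '<a href="/Team:Example/Results">Results</a>'
    '<a href="/files/protocol.PDF">Protocol again</a>'
)
n_pdfs = count_pdfs(page)
```

Collecting the URLs in a set means that a document linked several times on the same page is counted only once.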

Experiment booklet

Dr. Christiane Talke-Messerer

PhD in Microbiology
Head of the Department for LifeSciences
Student Research Center phaenovum Lörrach-Tripoint

We were highly motivated to spread the word about science, synthetic biology, and our project, not only to adults but also to students. We find it a pity that only a few students learn something about synthetic biology in school, and we want to help change that in the future.

Because of the great effort behind every experiment and every lesson, we decided to help teachers by developing a course program on basic synthetic biology for student research centers. Dr. Christiane Talke-Messerer accompanied us throughout this development. Drawing on her long experience as a supervisor of student projects in biology and as a teacher of several courses in microbiology, she gave us regular feedback on our ideas. We had lots of questions about what kinds of experiments are possible with students and how we should approach the composition of the booklet and its instructions. So as never to lose sight of our goal of helping teachers and creating something they can really use, we followed her advice on this as well.

As a result, we created a 3-day course program in which we introduce students to Golden Gate Cloning, a technique that we also used in our project. We learned from Dr. Talke-Messerer that a course program is more interesting if it has something like a framework and if students have the possibility to make their own decisions and explore, as in a real experiment. We implemented both ideas and chose our iGEM project as the framework, so that there is a personal story behind the program.