Poster: SYSU-Software



Maloadis
Writers: Junli Du, Yue Zeng
Designers: Haoxi Zhang, Jiangyue Yuan, Yuhou Chen, Yilin Zhong
Primary PI: Jianhua Yang
Instructors: Tianqin Li, Tianlun Lei
Sponsor: School of Life Sciences, Sun Yat-sen University, Guangzhou, Guangdong, China

Abstract
The efficiency of genetic engineering can be hindered by the myriad possible genetic structures and their complicated details. Our team, SYSU-Software, aims to use computer algorithms to exploit the massive existing data and reduce redundancy in the engineering procedure. We therefore created Maloadis, an integrated automated genetic circuit design platform. Maloadis implements automated top-down design with the GeneNet algorithm, and is capable of designing and rating possible genetic circuits according to users' requirements. It also exploits the abundant information in genetic circuit images: a trained neural network extracts parts and structures from them to search for related previous work. To improve the success rate of wet-lab experiments, Maloadis predicts gene expression levels with integrated models, and offers suggestions that shorten the experiment cycle using a Bayesian optimization algorithm. We present Maloadis as a de novo approach to facilitate synthetic biology design automation.
Background
Designing and implementing a genetic circuit top-down can be a long process. The challenge of design lies in the complexity and variety of genetic circuit structures, which require synthetic biologists to spend a lot of time on research and on considering all the possibilities. On the other hand, experiments may not yield ideal results the first time, and repeated trial and error is often unavoidable in the lab. The problems we are facing are:
How to:
  • Find possible genetic circuit structures that meet our initial goal?
  • Exploit existing circuit design data?
  • Improve experiment results within shorter cycles?
Motivation
We want to improve the traditional design process of genetic circuits in the following aspects:
  • Design
    Besides designing a genetic circuit from scratch with genetic parts, users can also design genetic circuits based on their functional requirements. We want to employ a top-down automated design method to make the design process faster and more effective.
  • Search
    Traditional text-based keyword searching is not efficient enough to make use of the massive data of synthetic biology. We aim to build an image search engine so that users can directly search and scroll through genetic circuit images to quickly grasp the key information in papers and iGEM Registry designs.
  • Experiments
    Improving lab results can be a long process because the ideal parameters are not known at first. We want to give users instructive suggestions for every round of the experiment and so shorten the experiment cycle.
Project Goals
  • To automatically design genetic circuits based on users' demands. Users only need to provide our software with a simple description of the desired circuit function; a genetic circuit structure, and even a circuit filled with detailed genetic parts, will then be returned.
  • To design an image search engine for genetic circuit design. Our software will extract the information in circuit images in a precise and purposeful way, and users can directly use the circuits they designed on our platform to search for similar circuit designs.
  • To provide experiment suggestions for lab researchers. Our software will learn from users' experiment results and suggest ways to achieve better system performance, thereby shortening the experiment cycle.
Workflow
This is a general view of how Maloadis works.

Maloadis incorporates the Design, Build, Test, Learn workflow with four functions. Users can use the automated design platform and image search on Maloadis during the design stage.

Then, Maloadis provides simulation of the genetic circuit. After users build their designed circuit in the lab, the simulation result can be used to test whether the lab result reaches the ideal theoretical value.

To further improve the circuit design, users can get experiment suggestions using the parameter optimization function on Maloadis to better learn and optimize lab results.
Automated Design
Regulate gene expression with transcription factors and promoters. We help users regulate target gene expression by designing the transcription factors and promoters.
  • GeneNet machine-learning algorithm[1]: automatically designs genetic circuit structures based on desired functions
    We use the GeneNet machine-learning algorithm to automatically design a genetic circuit structure based on the desired functions.
  • Autofill Algorithm: autofills devices and parts for the circuit structure
    The Autofill Algorithm fills the structure with optimal transcription factors and promoters that can realize it.
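The autofill step can be pictured as a small assignment problem. The sketch below is only an illustration under stated assumptions, not the Maloadis implementation: the circuit structure is assumed to be a set of signed edges between abstract nodes (+1 for activation, -1 for repression), `known` is a hypothetical lookup table of signed TF-to-TF regulations, and the names `autofill`, `tfA`, `tfB`, `tfC` are invented for the example. A plain backtracking search assigns a distinct transcription factor to each node so that every edge of the structure is backed by a known regulation.

```python
def autofill(nodes, edges, known):
    """Assign a distinct TF to every abstract node of a circuit structure.

    nodes: list of abstract node names.
    edges: {(src_node, dst_node): sign}, sign is +1 (activate) or -1 (repress).
    known: {(src_tf, dst_tf): sign}, a hypothetical table of known regulations.
    Returns {node: tf}, or None if no consistent assignment exists.
    """
    tfs = sorted({tf for pair in known for tf in pair})

    def consistent(assignment):
        # Every edge whose two endpoints are already assigned must be
        # supported by a known regulation with the same sign.
        for (src, dst), sign in edges.items():
            if src in assignment and dst in assignment:
                if known.get((assignment[src], assignment[dst])) != sign:
                    return False
        return True

    def backtrack(index, assignment):
        if index == len(nodes):
            return dict(assignment)
        for tf in tfs:
            if tf in assignment.values():
                continue  # each TF is used at most once
            assignment[nodes[index]] = tf
            if consistent(assignment):
                result = backtrack(index + 1, assignment)
                if result is not None:
                    return result
            del assignment[nodes[index]]
        return None

    return backtrack(0, {})


# Toy example: a three-node repression ring.
ring = {("n1", "n2"): -1, ("n2", "n3"): -1, ("n3", "n1"): -1}
table = {("tfA", "tfB"): -1, ("tfB", "tfC"): -1,
         ("tfC", "tfA"): -1, ("tfA", "tfC"): +1}
filled = autofill(["n1", "n2", "n3"], ring, table)
```

A real autofill would also rank alternative assignments (for example by predicted expression level) rather than return the first consistent one.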

TF-Binding sites affinity prediction

The automated design process above is feasible, but it uses only the qualitative relationships (inhibitory or stimulatory) provided by the GeneNet algorithm. The GeneNet algorithm also provides users with quantified activation parameters.

Here, we designed a deep learning algorithm based on ChIP-seq data sets to predict the affinity score between transcription factors and binding sites, and we attempted to use this affinity information to give users a reference for the activation parameters.
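To make the idea of a convolutional affinity predictor concrete, here is a minimal sketch of the forward pass of a single convolutional filter over a one-hot encoded DNA sequence, followed by ReLU and global max pooling. It assumes nothing about the trained Maloadis network: the filter weights below are hand-written for a toy motif rather than learned from ChIP-seq data, and the function names are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(encoded, kernel, bias=0.0):
    """Slide one filter along the sequence and apply ReLU to each window."""
    width = len(kernel)
    outputs = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset in range(width):
            row = encoded[start + offset]
            total += sum(x * w for x, w in zip(row, kernel[offset]))
        outputs.append(max(total, 0.0))  # ReLU
    return outputs


def affinity_score(seq, kernel):
    """Global max pooling: the best-matching window gives the score."""
    return max(conv1d(one_hot(seq), kernel), default=0.0)


def motif_kernel(motif):
    """Toy filter (assumption): weight 1 for the motif base, 0 otherwise."""
    return one_hot(motif)


kernel = motif_kernel("TATA")
strong = affinity_score("GGTATAGG", kernel)  # contains the motif exactly
weak = affinity_score("GGGGGGGG", kernel)    # shares no base with the motif
```

In a trained network many such filters are learned jointly and their pooled outputs feed further layers; the sketch only shows why a convolution acts like a motif scanner.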

A large body of eukaryotic and prokaryotic transcription factor ChIP-seq data was obtained[2, 3].
A convolutional neural network was trained to predict the affinity score between transcription factors and binding sites.

References:
[1]. Hiscock, T. W. Adapting machine-learning algorithms to design gene circuits. BMC bioinformatics 20, 214-214, (2019).
[2]. Reddy, T.B., et al., TB database: an integrated platform for tuberculosis research. Nucleic Acids Res, 2009. 37(Database issue): p. D499-508.
[3]. Yevshin, I., et al., GTRD: a database on gene transcription regulation-2019 update. Nucleic Acids Res, 2019. 47(D1): p. D100-D105.
Image Search
  • Preprocessing
    Input images are first processed with YOLOv4 object detection and OCR. Each recognized label is then corrected to the fuzzy-match candidate with the shortest Levenshtein distance.
  • Search similar structures
    To search for similar circuit structures, we first extract the function of each circuit component (promoter, CDS, etc.) and map these functions onto the structure. We then apply a fuzzy matching algorithm based on Levenshtein distance.
  • Search similar parts
    To search for similar parts, we score each part by the negative of its Levenshtein-based fuzzy matching distance, so that closer matches score higher. We then sum the scores of the components in the priority queue to score the matching circuit.
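The three steps above all rest on Levenshtein distance, which can be sketched in a few lines. This is an illustration under assumptions, not the Maloadis code: the one-letter role encoding in `ROLE_CODES` and the helper names are hypothetical, and the scoring simply uses the negative distance as described above.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct_label(ocr_text, known_names):
    """Snap an OCR reading to the known part name at the shortest distance."""
    return min(known_names,
               key=lambda name: levenshtein(ocr_text.lower(), name.lower()))


# Hypothetical one-letter codes for component roles.
ROLE_CODES = {"promoter": "P", "rbs": "R", "cds": "C", "terminator": "T"}


def encode_structure(roles):
    """Turn a list of component roles into a string such as 'PRCT'."""
    return "".join(ROLE_CODES[role] for role in roles)


def rank_structures(query_roles, database):
    """Rank stored circuits by negative distance (higher score = closer)."""
    query = encode_structure(query_roles)
    scored = [(-levenshtein(query, encode_structure(roles)), name)
              for name, roles in database.items()]
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


database = {
    "circuit_a": ["promoter", "rbs", "cds", "terminator"],
    "circuit_b": ["promoter", "rbs", "cds", "rbs", "cds", "terminator"],
    "circuit_c": ["promoter", "cds", "terminator"],
}
ranking = rank_structures(["promoter", "rbs", "cds", "terminator"], database)
fixed = correct_label("1acI", ["lacI", "tetR", "araC"])
```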

The user's input image is identified with YOLOv4[1,2] and OCR[3], then searched against either the ACS database or the iGEM Registry, both of which we incorporated, depending on the proportion of standardized iGEM parts it contains. Users can choose to search for similar parts, similar structures, or both.

Search results of genetic circuit images are shown below:

References:
[1] https://arxiv.org/abs/2004.10934.
[2] https://github.com/AlexeyAB/darknet.
[3] https://opencv.org.
Simulation
  • One click simulation of gene circuits constructed by users
  • Circuit Design Optimization Based on Evolutionary Algorithms
To help researchers better understand how the gene circuit they created works, we built a series of models to simulate its dynamic behavior. Most biochemical processes consist of activation or repression, so we constructed activation and inhibition formulas based on the Hill equation. After the user constructs a gene circuit in the designer, our software automatically creates the corresponding ODE system[1].
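A minimal sketch of how Hill-type activation and repression terms can be assembled into an ODE system from a circuit description is given here. It is an assumption-laden illustration rather than the Maloadis model: all genes share placeholder parameters `beta` (maximal production), `gamma` (degradation), `k` (half-saturation) and `n` (Hill coefficient), multiple regulators are combined by multiplication, and integration uses a simple forward Euler step.

```python
def hill_activation(x, k, n):
    """Fraction of maximal production when x activates the target."""
    return x ** n / (k ** n + x ** n)


def hill_repression(x, k, n):
    """Fraction of maximal production when x represses the target."""
    return k ** n / (k ** n + x ** n)


def build_rhs(edges, beta, gamma, k, n):
    """Build d(state)/dt from signed edges (regulator, target, sign)."""
    def rhs(state):
        derivative = {}
        for gene, level in state.items():
            production = beta
            for regulator, target, sign in edges:
                if target == gene:
                    term = hill_activation if sign > 0 else hill_repression
                    production *= term(state[regulator], k, n)
            derivative[gene] = production - gamma * level
        return derivative
    return rhs


def simulate(rhs, state, dt, steps):
    """Forward Euler integration; returns the list of visited states."""
    trajectory = [dict(state)]
    for _ in range(steps):
        d = rhs(state)
        state = {gene: state[gene] + dt * d[gene] for gene in state}
        trajectory.append(state)
    return trajectory


# Toy circuit: gene A is expressed constitutively and represses gene B.
rhs = build_rhs([("A", "B", -1)], beta=1.0, gamma=1.0, k=0.5, n=2)
final = simulate(rhs, {"A": 0.0, "B": 0.0}, dt=0.01, steps=2000)[-1]
```

With these placeholder values A settles near beta/gamma and B near the repressed level set by the Hill term; a stiff or oscillating circuit would call for a proper ODE solver instead of Euler steps.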

References:
[1]. Yeoh, J.W., et al., An Automated Biomodel Selection System (BMSS) for Gene Circuit Designs. ACS Synth Biol, 2019. 8(7): p. 1484-1497.
Parameter Optimization

After designing the primary circuit, how can it be tuned to reach the best performance?
This year, we take the expression levels of genes in cells as the system parameters to be adjusted and apply a Bayesian optimization algorithm to them.

Here's the workflow:

Why did we choose it?
  • Top-down method: the lack of a detailed mechanism will not stop you from achieving good results.
    The only preparatory work for optimization is to determine the initial design, the inputs and outputs of the system, and the objective function[1].
  • Efficient experiment instruction: gives users suggestions on how to change the parameters of the gene circuit to optimize it within a short experiment cycle[2].



Four optimization algorithms were compared. The figure shows their improvement over Random Search: the horizontal axis is the number of trials evaluated, and the vertical axis is each algorithm's optimality gap as a fraction of the Random Search optimality gap at the same point.

  • Learn from results: after the instructions are carried out, Maloadis learns from every experiment result and gives users new instructions for the next round.
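The suggest-and-learn loop described above can be sketched with a small Gaussian-process surrogate and the expected-improvement acquisition function. Everything specific here is an assumption made for illustration, since the poster does not state which kernel or acquisition function Maloadis uses: a zero-mean prior, an RBF kernel with a fixed length scale, a single tunable parameter scaled to [0, 1], a maximization objective, and hypothetical function names.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar parameter values."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    gram = [[rbf(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(gram, ys)))
    reduction = sum(k * w for k, w in zip(k_star, solve(gram, k_star)))
    variance = max(rbf(x_new, x_new) - reduction, 1e-12)
    return mean, math.sqrt(variance)


def expected_improvement(mean, std, best):
    """Expected gain over the best observed value (maximization)."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


def suggest_next(xs, ys, candidates):
    """Pick the candidate setting with the highest expected improvement."""
    best = max(ys)
    return max(candidates,
               key=lambda x: expected_improvement(*gp_posterior(xs, ys, x), best))


# One round: two measured settings, then a suggestion for the next one.
tried = [0.2, 0.8]        # parameter values already tested (scaled to 0..1)
measured = [0.1, 0.5]     # hypothetical measured objective values
grid = [i / 20 for i in range(21)]
next_setting = suggest_next(tried, measured, grid)
```

After each experiment the new measurement is appended to `tried` and `measured` and `suggest_next` is called again, which mirrors the learn-from-results loop.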
References:
[1] HamediRad M, Chao R, Weisberg S et al. Towards a fully automated algorithm driven platform for biosystems design. Nat Commun. 2019;10(1):5150
[2] Golovin D, Solnik B, Moitra S et al. Google Vizier: A Service for Black-Box Optimization. 2017:1487–1495
Human Practices

Our software team conducted Human Practices for two purposes: to investigate whether our project is both responsible and beneficial, and to influence communities outside iGEM.

Concerned about the safety issues our software might cause, we first consulted a professor specializing in bioethics and biosafety. Next, we consulted a law firm on intellectual property protection and user information privacy.

To investigate how our project can benefit society, we sent out questionnaires to iGEM teams, biology companies and researchers. Moreover, we consulted wet-lab researchers and a biomedical company to see how our software can be applied in lab research and the bio-industry.

For technical support, we consulted professors specializing in bioinformatics and computer science. Their advice guided us through our project.

We reached a group of college freshmen and sophomores by holding a mini class to share and discuss the frontiers and applications of synthetic biology.
Wet-lab Validations
To validate our auto-designer's performance, we conducted the following experiments. First, we designed an input function whose output was an oscillation. Based on the Regulon Database, we filled in the gene-gene interaction matrix. We constructed two plasmids using Gibson assembly: one called pACYC Duet-1, which contained promoter hnsp, transcription factor mazE, promoter csgDp1 and transcription factor hns, and the other called pQE-30, which contained promoter mazEp2, transcription factor cpxR and the GFP gene. After co-transformation into DH5α competent cells, we measured the GFP expression level for several hours. Finally, we plotted GFP expression over time and found it in line with our expectation.

The detailed experiment steps:
1. Design a circuit.
2. Choose the plasmids and chassis (two plasmids are needed for the circuit to function in this experiment).
3. Use PCR to amplify the DNA fragments we need from synthesized DNA supplied by a biological company.

Fig.1 Electrophoresis results of PCR product of hnsp, cspA, mazE, mazEp2, cspA2, cpxR2 and marker2 (from left to right).

4. Add Gibson overlaps to the target genes and linearize the plasmids.
5. Assemble DNA fragments with plasmids as designed.

Fig.2 Electrophoresis results of PCR product of EGFP (with overlaps), T7 terminator, hnsp (with overlaps), hns2 (with overlaps), T7 terminator (with overlaps), pQE plasmid (linearized) (from left to right).

6. Transform the two plasmids into the chassis individually.

Fig.3 Bacterial culture and screen result of pACYC Duet-1 plasmid transformation.

7. Cultivate the chassis bacteria on solid medium containing ampicillin or chloromycetin.

Fig.4 Solid medium culture result of bacteria transformed with the pACYC Duet-1 plasmid (leftmost).

8. Select the survivors and cultivate them in liquid medium to amplify the two target plasmids.

We have completed step 8. Our future work:
9. Sequence the plasmids.
10. Co-transform the two plasmids into the chassis bacteria.
11. Cultivate and measure the expression.
Team Members
Biology Group
Junli Du (student leader), Yue Zeng (student leader), Youqi Wang, Xiaoyan Zhang, Musen Lin, Xinyao Zhou, Yifei Zang, Qing Lu, Ruoheng Mo, Weining Li, Shulin Jin

Modeling Group
Bei Zhang, Jiahang Cao, Jiaohao Tian, Likun Zhang

Programmers
Yuze Fu, Yongye Su, Yawen Guan, Enze He

Designers
Jiangyue Yuan, Yilin Zhong, Yuhou Chen, Haoxi Zhang

Primary PI
Jianhua Yang

Advisors
Tianqin Li, Tianlun Lei

Instructors
Zhumei He, Jian Ren, Jianzhong Liu, Kun Zeng
Acknowledgements
Special Thanks to
Prof. Yang Jianhua
Prof. Ren Jian
Prof. He Zhumei
Prof. Liu Jianzhong
Prof. Zeng Kun
Guang Dong Ying Zun Law Firm
Ms. Xiao Dongqi from KingMed Diagnostics
Mr. Li Bin
Mr. Li Tianqin
Mr. Lei Tianlun

Funding attributions:

Sun Yat-sen University
School of Life Sciences, Sun Yat-sen University
School of Communication and Design, Sun Yat-sen University
School of Data and Computer Science, Sun Yat-sen University
School of Mathematics, Sun Yat-sen University
KingMed Diagnostics, Guangzhou headquarters