Team:SYSU-CHINA/Engineering

Engineering
Overview
Although the epidemic limited the progress of our experimental work, we kept improving our project in various ways, including regular group meetings, communication with our PI, online interviews and surveys, and exchanges of opinions and ideas with other teams and related groups. We also put considerable effort into algorithm design and modeling in order to achieve part of our goals.

With sufficient theoretical support, a detailed experimental design, the assistance of our algorithm and models, and voices from many different fields, we are convinced that this project can achieve the desired results if conditions permit.
Design scheme
In our project, an algorithm-guided model is trained on natural substrates of ADAR1 and used to establish a candidate dsRNA library. The candidates are joined to sequences that help them cyclize and are transferred into HeLa cells. In the cells, they compete for ADAR1 binding with an editable stem-loop that has a toxic gene downstream of it. The whole directed-evolution system is regulated by IFN-α and the Tet-on system. In this way, the cells survive only when endogenous ADAR1 is inhibited by the transferred dsRNAs. Efficient substrates are then extracted and used to train the model for the next round. By cycling through this screening process, we can obtain high-affinity inhibitors of ADAR1 efficiently and extend the approach to other RBPs. (To learn more: https://2020.igem.org/Team:SYSU-CHINA/Design)
Component construction and validation
To put the design scheme into practice, we developed a detailed plan that starts with the design and verification of each component of the scheme. Some of the basic sequences (stem-loop and apoptin; dsRNA with ribozyme and ligation sequences) were synthesized by a company, and plasmid vectors (pCMV-Tag 3B, pGPU6-GFP-Neo, pTRE2hyg, pTet-On-Advanced) were purchased in preparation for component assembly.

1. The killing efficiency of toxic genes
In our design, cells transfected with low-affinity dsRNA die because the stem-loop is edited. We therefore need to construct parts separately to confirm the killing efficiency of the selected toxic gene, apoptin, in HeLa cells, so as to determine how many "losers" escape the screening.

We insert apoptin downstream of pCMV on the plasmid pCMV-Tag3B and determine the killing efficiency of apoptin from the cell mortality after the recombinant plasmid is transfected into the cells.
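The calculation described above can be sketched as follows. This is a minimal illustration rather than the team's actual analysis: it assumes mortality is counted as dead cells over total cells, that an empty-vector control group is available, and that background death is removed with Abbott's correction. The function names and the cell counts are hypothetical.

```python
def mortality(dead: int, total: int) -> float:
    """Fraction of counted cells that are dead."""
    if total <= 0 or not 0 <= dead <= total:
        raise ValueError("need 0 <= dead <= total and total > 0")
    return dead / total


def corrected_killing_efficiency(m_treated: float, m_control: float) -> float:
    """Abbott's correction: death attributable to apoptin after removing
    the background death seen in the control group."""
    if not 0.0 <= m_control < 1.0:
        raise ValueError("control mortality must be in [0, 1)")
    return (m_treated - m_control) / (1.0 - m_control)


# Hypothetical counts, for illustration only (not measured data).
m_apoptin = mortality(dead=780, total=1000)   # pCMV-apoptin group
m_empty = mortality(dead=120, total=1000)     # empty-vector control (assumed)
efficiency = corrected_killing_efficiency(m_apoptin, m_empty)
escape_fraction = 1.0 - efficiency            # "losers" that survive anyway
print(f"killing efficiency = {efficiency:.2f}, escape = {escape_fraction:.2f}")
```

The escape fraction is the quantity of interest for the screen: it estimates how many cells carrying a low-affinity dsRNA would survive even though apoptin was expressed.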
Figure 1. pCMV-apoptin
2. Leaky expression of the stem-loop
Figure 2. Three different groups of stem-loop
Although the stem-loop structure can in theory prevent the expression of apoptin downstream of it, we know from the literature that the stem-loop still allows a certain degree of leaky expression. That is, even if the stem-loop is not edited, the cell may die because the downstream toxic gene is expressed. We therefore need to determine how much of the observed cell death is due to the stem-loop being edited and failing to block downstream gene expression, by comparing the experimental group with the positive and negative controls[1]. This tells us how many “unfortunate winners” are cleared by mistake.

We inserted GFP, the stem-loop and apoptin successively downstream of pCMV. After transfecting the plasmid into the cells, we will verify the parts by measuring the fluorescence level and mortality of the cells in the three groups.

Figure 3. pCMV-GFP-STEM-apoptin
3. Cyclization of dsRNA
As mentioned above, ribozymes and ligation sequences are added at both ends of the dsRNA to cyclize it, which improves the intracellular stability of the dsRNA and lets it accumulate to a concentration high enough to function in the cell[2].

To determine whether cyclization succeeds, we will verify it by RT-PCR, in addition to the reference methods of complete enzyme digestion and electrophoresis. We will use a pair of primers on the target sequence that point away from each other (divergent primers). When cyclization succeeds, we obtain a fragment of the expected length. When it fails, either no amplification occurs or we obtain only short fragments.
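The logic of the outward-facing primer test can be illustrated with a small in-silico PCR. This is a simplified sketch under stated assumptions: the template is the cDNA of the RNA, primers match exactly and only once, and no primer site spans the ligation junction. The sequence and the primers below are made up for illustration and are not the ones used in the project.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon(template: str, fwd: str, rev: str, circular: bool):
    """Predicted PCR product on the sense strand, or None if there is none."""
    f = template.find(fwd)            # forward primer extends to the right
    r = template.find(revcomp(rev))   # reverse primer site on the sense strand
    if f < 0 or r < 0:
        return None
    r_end = r + len(rev)
    if f + len(fwd) <= r:
        # Convergent primers: ordinary product between the two sites.
        return template[f:r_end]
    if circular and r_end <= f:
        # Divergent primers: extension must read through the ligation junction.
        return template[f:] + template[:r_end]
    return None


# Hypothetical 40-nt template with divergent primers at its two ends.
template = "ATGGCTAGCAAGGAGCTGTTCACCGGGGTGGTGCCCATCC"
fwd = template[30:]            # points toward the 3' end
rev = revcomp(template[:10])   # points toward the 5' end

print(amplicon(template, fwd, rev, circular=False))  # no product on linear RNA
print(amplicon(template, fwd, rev, circular=True))   # junction-spanning product
```

Only the circular template yields a product, and its length is fixed by the primer positions, which is why a band of the expected size indicates successful cyclization.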

Figure 4. pGPU6-dsRNA
Figure 5. Schematic diagram of the method for detecting dsRNA cyclization
4. Efficiency of transcription initiation by the Tet-on system
Figure 6. Tet-on system
In our design, we use the Tet-on system to control the expression of the stem loop-toxic gene. The Tet-on system allows us to start the transcription as needed[3]. At the same time, the efficiency of its initiation of gene transcription determines how many "stem-loop competitors" will compete with the candidate dsRNAs transferred into the cell to bind to ADAR1. This is one of the factors that affect the selection pressure on dsRNA.

We will insert the GFP gene into pTRE2hyg, culture the transfected cells with Dox in an increasing concentration gradient, and use the fluorescence level to characterize the relationship between the transcription efficiency of the Tet-on system and the Dox concentration.

Figure 7. pTRE-GFP
5. Concentration of IFN-α and ADAR1 editing level
IFN-α is another factor that affects the selection pressure on dsRNA, because its concentration affects the level of ADAR1 editing in cells. In other words, the concentration of IFN-α determines the difficulty for dsRNA to bind with ADAR1 because the higher the concentration of IFN-α, the higher the expression level of ADAR1. To effectively prevent ADAR1 from editing the stem-loop, the dsRNA is required to have a higher affinity. Therefore, we need to determine how much pressure different concentrations of IFN-α will cause on dsRNA.
There are two verification schemes:
1) Divide the HeLa cells into groups and add IFN-α in a concentration gradient. After culturing for a period of time, extract the protein and perform western blotting.
2) Transfect cells with pCMV-GFP-STEM-apoptin, which was verified in step 2 above, then add IFN-α in a concentration gradient and culture for a period of time. Compare the cell mortality with that of cells transfected with pCMV-apoptin and of cells transfected with pCMV-GFP-STEM-apoptin without IFN-α.
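Why a higher ADAR1 level demands a higher-affinity dsRNA can be seen in a toy equilibrium model of competitive binding. This is only a sketch and not the team's simulation model: it assumes simple 1:1 binding, that both RNAs are in large excess over ADAR1, and that the ADAR1 level stands in for the IFN-α dose. The function name and all concentrations and Kd values are hypothetical.

```python
def stemloop_bound_fraction(adar1: float, stemloop: float, dsrna: float,
                            kd_stemloop: float, kd_dsrna: float) -> float:
    """Fraction of stem-loop transcripts bound by ADAR1 at equilibrium.

    Assumes both RNAs are in large excess over ADAR1, so free RNA is
    approximately total RNA. All quantities share one concentration unit.
    Derived from A_free = A_total / (1 + S/Kd_S + D/Kd_D) and
    bound fraction = A_free / Kd_S.
    """
    return adar1 / (kd_stemloop + stemloop + dsrna * kd_stemloop / kd_dsrna)


# Hypothetical values: ADAR1 level rises with the IFN-alpha dose.
for adar1 in (5.0, 10.0, 20.0):
    weak = stemloop_bound_fraction(adar1, 100.0, 100.0, 50.0, kd_dsrna=50.0)
    strong = stemloop_bound_fraction(adar1, 100.0, 100.0, 50.0, kd_dsrna=5.0)
    print(f"ADAR1={adar1:5.1f}  weak dsRNA: {weak:.4f}  strong dsRNA: {strong:.4f}")
```

In this model the bound fraction of the stem-loop grows in proportion to the ADAR1 level, so keeping it below a given threshold at a higher IFN-α dose requires a dsRNA with a lower Kd, which is the selection pressure described above.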

6. Current results
1) All components were constructed and verified by sequencing.
2) Some of them were transfected into cells and a small amount of data was obtained (ImageJ was used to analyze the total fluorescence intensity and count the number of fluorescent cells). The results showed that the Dox concentration affected the efficiency of the promoter in initiating transcription.
3) A composite part, pTRE-GFP-STEM-apoptin (BBa_K3502011), was constructed. We found that introducing the stem-loop reduced the background expression of the Tet-on system, suggesting that the expression level of the gene upstream of the stem-loop is affected when an inducible promoter and a stem-loop structure are used together. Note that the stem-loop may therefore influence the behavior of the inducible promoter.
Figure 8. Fluorescence of the cells after transfection with pTRE-GFP and pTRE-GFP-STEM-apoptin and culture with a Dox concentration gradient for 48 h. The left one is the negative control transfected with pTRE2hyg.
Figure 9, 10. The relationship between Dox concentration and the number of fluorescent cells as well as total fluorescence intensity after transfection with different plasmids
Directed evolution
Once all the components are validated, we begin the directed evolution. Three plasmids are transferred into HeLa cells: pTRE-GFP-STEM-apoptin, pGPU6-dsRNA and pTet-On-Advanced. The dsRNA library is constructed under the guidance of the algorithm and by error-prone PCR. All candidates are assembled with the sequence that promotes cyclization and then transferred into cells carrying the stem loop-toxic gene. When the dsRNA has a higher affinity for ADAR1, the stem-loop is not edited and the cell survives. Conversely, if the stem-loop is edited, the cell dies and the corresponding dsRNA is eliminated. The results are fed back to the algorithm to train the model for the next round.
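The selection cycle can be mimicked with a small stochastic simulation. This is a toy sketch, not the project's pipeline: each candidate is reduced to a hidden affinity score between 0 and 1, the probability that a dsRNA protects the stem-loop is set equal to that score, and random mutation of survivors stands in for both error-prone PCR and the model-guided library design. The `escape` and `leak` rates (losers that survive, winners that die) and all other parameters are hypothetical.

```python
import random


def run_round(library, rng, escape=0.05, leak=0.10):
    """Return the affinities of candidates whose host cell survives."""
    survivors = []
    for affinity in library:
        protected = rng.random() < affinity     # dsRNA outcompetes the stem-loop
        if protected:
            alive = rng.random() >= leak        # leaky apoptin kills some winners
        else:
            alive = rng.random() < escape       # incomplete killing spares some losers
        if alive:
            survivors.append(affinity)
    return survivors


def next_library(survivors, size, rng, sigma=0.05):
    """Resample survivors and perturb them (stand-in for mutagenesis)."""
    return [min(1.0, max(0.0, rng.choice(survivors) + rng.gauss(0.0, sigma)))
            for _ in range(size)]


def evolve(rounds=5, size=500, seed=0):
    """Mean library affinity before selection and after each round."""
    rng = random.Random(seed)
    library = [rng.random() for _ in range(size)]
    means = [sum(library) / len(library)]
    for _ in range(rounds):
        survivors = run_round(library, rng)
        if not survivors:
            break
        library = next_library(survivors, size, rng)
        means.append(sum(library) / len(library))
    return means


print([round(m, 3) for m in evolve()])
```

The mean affinity of the library rises from round to round, and raising `escape` or `leak` slows that rise, which is why the killing efficiency and the leaky expression are characterized before the screen is run.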
Algorithm and modeling work
1. Modeling work
We built models for the wet-lab experiments in two ways: fitting experimental data and running simulations. For example, curve fitting of our DOX-MFI data gave a good fit (see Figure 11), and the simulation model shows different amounts of toxic gene expression under different IFN and DOX dosages (see Figure 12). More data are needed to test whether the models work well and whether they can be used to guide our experiments. However, because of COVID-19, we lacked the time to collect more experimental data to verify the reliability of the models.

At present, due to the lack of experimental data, we turn to the data in the literature to prove that our model is feasible, and to some extent, to guide our experiments. Yet whether the data in the literature can be completely suitable for our experiments is still to be further confirmed.
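As an illustration of what fitting a DOX-MFI curve involves, the sketch below fits a Hill-type dose-response function by least squares. The functional form is an assumption, since the text does not state which function was fitted in MATLAB, and the data are synthetic values generated from known parameters, not measurements. The names `hill` and `fit_hill`, the dose units and the search grids are hypothetical.

```python
def hill(x, basal, vmax, k, n):
    """Hill-type response: basal + vmax * x^n / (k^n + x^n)."""
    return basal + vmax * x**n / (k**n + x**n)


def fit_hill(doses, mfi, k_grid, n_grid):
    """Grid search over (k, n); basal and vmax solved by linear least squares.

    Returns (sse, basal, vmax, k, n) for the best grid point.
    """
    best = None
    m = len(doses)
    sy = sum(mfi)
    for k in k_grid:
        for n in n_grid:
            h = [x**n / (k**n + x**n) for x in doses]
            sh = sum(h)
            shh = sum(v * v for v in h)
            shy = sum(v * y for v, y in zip(h, mfi))
            det = m * shh - sh * sh
            if abs(det) < 1e-12:
                continue  # h is constant, basal and vmax are not separable
            vmax = (m * shy - sh * sy) / det
            basal = (sy - vmax * sh) / m
            sse = sum((basal + vmax * v - y) ** 2 for v, y in zip(h, mfi))
            if best is None or sse < best[0]:
                best = (sse, basal, vmax, k, n)
    return best


# Synthetic data from known parameters (illustration only).
doses = [0, 10, 50, 100, 200, 500, 1000, 2000]      # e.g. ng/mL, assumed
mfi = [hill(x, 50.0, 1000.0, 200.0, 2.0) for x in doses]

sse, basal, vmax, k, n = fit_hill(
    doses, mfi, k_grid=[50, 100, 200, 400, 800], n_grid=[1.0, 1.5, 2.0, 2.5, 3.0]
)
print(f"basal={basal:.1f} vmax={vmax:.1f} k={k} n={n} sse={sse:.3g}")
```

With real measurements, a finer grid or a dedicated nonlinear least-squares routine would be more appropriate, and the residuals should be checked before the fitted curve is used to choose a Dox dose.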

To learn more about our experimental modeling, see https://2020.igem.org/Team:SYSU-CHINA/Model

Figure 11. MATLAB curve-fitting result for the MFI-DOX scatter plot
Figure 12. Different amounts of toxic gene expression under different IFN and DOX dosage
2. Deep learning
Figure 13. Results of Training Dataset. The left one is the core of our analysis, and the right one is the accuracy of model prediction.
In the left figure, the loss of the model on the training set decreases, which means that the neural network keeps learning. To learn more about our convolutional neural network, see https://2020.igem.org/Team:SYSU-CHINA/Model

3. Feature extraction
Tables 1 and 2. The top 10 k-mers marked in the 5-mer and 6-mer figures.
As shown in Tables 1 and 2, we obtained the top 10 5-mer and 6-mer sequences in both the positive and negative training sets. Analyzing them tells us which sequences are more likely to appear in which training set, and therefore more about how to obtain a dsRNA with higher affinity for ADAR1.
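A k-mer ranking of this kind can be sketched in a few lines. The code below is a minimal illustration, not the team's implementation: it counts overlapping k-mers, lists the most frequent ones, and scores each k-mer by a log2 frequency ratio between the positive and negative sets. The enrichment score and its pseudocount are assumptions, and the sequences are made up.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count all overlapping k-mers in a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def top_kmers(seqs, k, n=10):
    """The n most frequent k-mers with their counts."""
    return kmer_counts(seqs, k).most_common(n)


def log2_enrichment(pos, neg, k, pseudocount=1.0):
    """log2 ratio of k-mer frequency in the positive vs. negative set."""
    p, q = kmer_counts(pos, k), kmer_counts(neg, k)
    kmers = set(p) | set(q)
    p_total = sum(p.values()) + pseudocount * len(kmers)
    q_total = sum(q.values()) + pseudocount * len(kmers)
    return {
        m: math.log2(((p[m] + pseudocount) / p_total) /
                     ((q[m] + pseudocount) / q_total))
        for m in kmers
    }


# Made-up sequences, for illustration only.
positive = ["ACGUACGUAGGC", "UUACGUACGUAA"]
negative = ["GGGCCCGGGCCC", "CCCGGGCCCAAA"]
print(top_kmers(positive, 5, n=3))
scores = log2_enrichment(positive, negative, 5)
print(sorted(scores, key=scores.get, reverse=True)[:3])
```

A positive score marks a k-mer that is more frequent in the positive set, a negative score one that is more frequent in the negative set.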

To learn more about our feature extraction work, see https://2020.igem.org/Team:SYSU-CHINA/Model
Future Work
  • Experiment Part
    1. Verification and characterization
    We will complete the validation of each component in HeLa cells and adjust our plan according to the results. We also plan to extend our list of toxic genes in order to find the most suitable and effective one.
    2. Close the loop between experiment and algorithm
    Once we get the first results from our experiments, we will feed these data into our algorithms to improve their accuracy, so as to evaluate the effect of directed evolution and better predict effective substrates. In turn, we can test the output of our algorithms by applying it in the following experiments. By closing this loop between the two parts, we can improve both of them at the same time.

    Figure 14. Close the Loop
  • Model Part
    1. Modeling work
    In the future, we plan to replace the parameters taken from the literature with values from our own experimental results, so that our model can better guide each step of the experiment, for example by calculating the concentrations of the various substances required to obtain the target effect.

    Furthermore, we hope that our modeling work can connect the dimensionless quantity "affinity" in our deep learning part with the real data from our experiments through an "evolution percentage" that we define ourselves (that is, the ratio of the actual capability of a dsRNA to inhibit ADAR1 to the ideal case), so that our deep learning work can better guide our experiments.

    2. Deep learning
    Whether a neural network generalizes well largely depends on whether it has enough effective data for training. In our project, we can collect more data on the affinity of dsRNA and the substrate through experiments, so as to expand the training set. At the same time, in-depth cooperation between biology and data science will help us explore more effective encodings of dsRNA sequences, enabling the network to extract features from dsRNA more conveniently and comprehensively. In addition, we can compare the performance of neural networks with different structures on dsRNA sequences. We can even fine-tune existing neural networks according to the characteristics of dsRNA sequence data to obtain a better model.

    In the future, because neural network algorithms are highly portable, we expect that our network can be used more widely. Any kind of base sequence can be brought into the input format of our network simply by one-hot encoding it. The output format of the network is even less restrictive: although we discretize the affinity value in our project and thus turn the task into a binary classification problem, the network can also be adapted to regression and multi-class classification problems.
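The input and output formats mentioned above can be sketched as follows. This is a minimal illustration and not the network's actual preprocessing code: the alphabet order, the handling of unknown bases, the padding and the affinity threshold are all assumptions.

```python
ALPHABET = "ACGU"  # assumed channel order


def one_hot(seq: str, alphabet: str = ALPHABET):
    """Encode a sequence as a list of one-hot rows, one row per base.

    DNA input is accepted by mapping T to U. Bases outside the alphabet
    (for example N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def pad(matrix, length: int, width: int = len(ALPHABET)):
    """Truncate or zero-pad to a fixed number of rows for batching."""
    return matrix[:length] + [[0] * width for _ in range(length - len(matrix))]


def binarize(affinity: float, threshold: float = 0.5) -> int:
    """Turn a continuous affinity into a class label (threshold is hypothetical)."""
    return 1 if affinity >= threshold else 0


print(pad(one_hot("ACGUN"), 6))
print([binarize(a) for a in (0.1, 0.5, 0.9)])
```

Switching the task to regression would only change the label side: the continuous affinity would be used directly instead of passing through `binarize`.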

    3. Feature Extraction
    Due to limited time and capability, we performed feature extraction only on the primary structure of RNA. To obtain more reliable features, our next goal is to explore models for predicting secondary structure using tools like RNA Structure, and then to combine the primary- and secondary-structure features to build a new prediction model that improves the accuracy of the outcome.

  • Implementing our work in the real world
    Although our project concentrates on a basic new method for the directed evolution of dsRNA, how our work can benefit the real world still needs to be addressed. We have therefore gotten in touch with several biotechnology companies and researchers to learn about the state of research on nucleic acid drugs, nucleic acid inhibitors and so on. (To learn more: https://2020.igem.org/Team:SYSU-CHINA/Human_Practices)

    In the future, we will continue to exchange ideas and opinions with biotech and pharmaceutical companies as well as relevant researchers and potential stakeholders, so as to provide useful references for the development of nucleic acid drugs, the screening of RBP inhibitors and so on.

    References
    [1] Fritzell K, Xu LD, Otrocka M, Andréasson C, Öhman M. Sensitive ADAR editing reporter in cancer cells enables high-throughput screening of small molecule libraries. Nucleic Acids Res. 2019 Feb 28;47(4):e22. doi: 10.1093/nar/gky1228. PMID: 30590609; PMCID: PMC6393238.
    [2] Litke JL, Jaffrey SR. Highly efficient expression of circular RNA aptamers in cells using autocatalytic transcripts. Nat Biotechnol. 2019 Jun;37(6):667-675. doi: 10.1038/s41587-019-0090-6. Epub 2019 Apr 8. PMID: 30962542; PMCID: PMC6554452.
    [3] Das AT, Zhou X, Metz SW, Vink MA, Berkhout B. Selecting the optimal Tet-On system for doxycycline-inducible gene expression in transiently transfected and stably transduced mammalian cells. Biotechnol J. 2016 Jan;11(1):71-9. doi: 10.1002/biot.201500236. Epub 2015 Sep 24. PMID: 26333522.