Team:SYSU-CHINA/Contribution

Contribution
What We Achieved
All components were built and validated by sequencing
We completed the construction of all components efficiently, within a month. The details are as follows:

1. Preparation
1) Parts synthesized by the company:
Stem-loop and apoptin (including experimental group, positive control and negative control)
dsRNA with ribozyme and ligation sequence
2) Plasmids:
pCMV-Tag 3B
pGPU6-GFP-Neo
pTRE2hyg
pTet-On-Advanced

2. Constructed components and what they are used for
1) pCMV-apoptin
Transcription of the toxic gene is driven by the constitutive CMV promoter (pCMV). This component is used to determine the killing efficiency of apoptin.
2) pGPU6-dsRNA
Transcription of the dsRNA is driven by the U6 promoter. This component is used to confirm the effect and efficiency of dsRNA cyclization.
3) pCMV-GFP-STEM-apoptin (including experimental group, positive control and negative control)
Transcription of the GFP gene, the stem-loop and apoptin is driven by a constitutive promoter. These three parts are used to determine the effect of the stem-loop and the relationship between IFN-α and the editing efficiency of ADAR1.
4) pTRE-GFP
It is used to characterize the efficiency of the Tet-On system by fluorescence level, and to determine the relationship between the applied Dox concentration and the efficiency of the promoter in initiating transcription.
5) pTRE-GFP-STEM-apoptin
The target sequence. It will eventually compete with pGPU6-dsRNA to bind ADAR1 in HeLa cells.

3. The result of component construction
Figure 1. Results of electrophoresis: pCMV-apoptin (4628bp) indicated by the red arrow and pGPU6-dsRNA (5332bp) indicated by the blue arrow
Figure 2. Results of electrophoresis: pTRE-GFP (6060bp)
Figure 3. Results of electrophoresis: pTRE-GFP-STEM-apoptin (6546bp)
Figure 4. Results of electrophoresis: pCMV-GFP-STEM-apoptin (5421bp), pCMV-GFP-STEM_positive-apoptin (5421bp), pCMV-GFP-STEM_negative-apoptin (5403bp). pCMV-GFP-STEM-apoptin does not show a normal band, only a faint band indicated by the blue arrow

4. Basic Part
This year we recharacterized an existing part (BBa_K2748002) to further determine the relationship between Dox concentration and the efficiency of the Tet-On system in initiating transcription.

A paradigm and general methods of dsRNA directed evolution
At present, directed evolution technology for enzymes and other proteins is relatively mature, while methods for RNA are limited. We believe that our project can fill this gap to a certain extent. By linking the ADAR1 editing event to a selectable trait, we combined directed evolution with a deep learning algorithm that extracts and analyzes RNA features, which allows us to screen effectively for RNA with higher affinity. (See more at https://2020.igem.org/Team:SYSU-CHINA/Design)

A mathematical modeling for experimental dose analysis
We used SimBiology to plot curves from our data, which preliminarily demonstrated the feasibility of our pathway model. More importantly, we improved the part of the 2013 SYSU-China team's model that calculates the death rate of cells. We are currently using it to measure the toxicity of our toxin gene, just as it was used in the 2013 SYSU-China project. With some modification of the formula, we will also be able to use it to calculate the death rate of cells in subsequent experiments, thus providing data for measuring the "evolution percentage". (See more at https://2020.igem.org/Team:SYSU-CHINA/Model)

An algorithm for RNA feature extraction and analysis using deep learning algorithms as the framework
We extract the primary sequence of dsRNAs for analysis and bring methods from the field of machine learning into the project. A data set pairing each dsRNA with its expression effect, as obtained in our previous experiments, was constructed and used as the training set. A neural network trained on this data can then predict the expression effect of an input RNA in advance, reducing the workload of experimenters. Others who apply our algorithm to a larger dataset may obtain better results. The approach can also be generalized to study other RNA-binding proteins. (See more at https://2020.igem.org/Team:SYSU-CHINA/Model)

A new idea for the development and screening of RNA drugs
Some RBPs are strongly associated with various diseases. If the corresponding inhibitors can be screened more effectively, the treatment of related diseases could improve and their negative impacts could be reduced. In our project, directed evolution is used to screen dsRNA substrates of ADAR1, which provides a reference for screening nucleic acid drugs. So far, we have also communicated with relevant pharmaceutical companies and look forward to further cooperation. (See more at https://2020.igem.org/Team:SYSU-CHINA/Human_Practices)
Learn from our Experience
Troubleshooting in Modelling Work
1. How to Express the Progress of Directed Evolution Accurately
In modeling the directed evolution experiments, inspired by a suggestion from our senior, Jianzhao Yang, on quantifying selection pressure, we defined the concept of "evolution percentage". It mathematically connects dsRNA inhibition ability to cell survival rate and lets us express the progress of our directed evolution more accurately.
We believe that our experience in defining "evolution percentage" will help with the modeling of projects similar to ours.

2. Experience of using MATLAB
The SimBiology application in MATLAB played an important role in our modeling. We drew diagrams of the pathways acting in the cell and used them to run simulations, which gave us an overview of how the concentrations of the different substrates change over time. We learned a lot from the SimBiology tutorial on the official MATLAB website and also explored how to use it for modeling experiments. The modeling page shows what we have done with SimBiology.

Troubleshooting in Algorithms
1. The attempt and abandonment of the Fully Connected Neural Network
The Fully Connected Neural Network was our first attempt. Since a fully connected layer takes a one-dimensional vector as input, we used 1, 2, 3, 4 to represent the bases A, C, G and T respectively and vectorized the original dsRNA sequence accordingly. The result was far from acceptable, and we think the reason is that this encoding was inappropriate: the values assigned to the four bases differ considerably in magnitude, which made the neural network excessively unstable.

We then considered a new way to encode dsRNA sequences that limits the value of each element to [0,1]. A straightforward choice is One-Hot Encoding, which became our final encoding method. However, since the Fully Connected Neural Network can only handle a one-dimensional vector, the two-dimensional (777,4) matrix has to be flattened along the first dimension. With the encoding problem fixed, the model achieved a prediction accuracy of 62% on the test set. We still think that the interpretability of the model is not good enough, because the Fully Connected Neural Network treats each position of the sequence as an independent feature and does not take into account the contribution to affinity of subsequences made up of several consecutive bases. The Fully Connected Neural Network is better suited to relatively independent features than to sequence data, where the order of bases really makes a difference.
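The two encodings discussed above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual code: the function names and the toy sequence are assumptions, and flattening is assumed to proceed row by row.

```python
# Illustrative sketch only; names and the toy sequence are assumptions.
BASES = "ACGT"

def integer_encode(seq):
    """Map A, C, G, T to 1, 2, 3, 4 (the first encoding that was tried)."""
    return [BASES.index(base) + 1 for base in seq]

def one_hot_encode(seq):
    """Return a len(seq) x 4 matrix with a single 1 per row (columns A, C, G, T)."""
    return [[1 if base == ref else 0 for ref in BASES] for base in seq]

def flatten(matrix):
    """Flatten row by row into the 1-D vector a fully connected layer expects."""
    return [value for row in matrix for value in row]

seq = "GATC"
print(integer_encode(seq))           # [3, 1, 4, 2]
print(one_hot_encode(seq))           # [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0]]
print(len(flatten(one_hot_encode(seq))))  # 16
```

For a sequence of length 777, the one-hot matrix has shape (777,4) and the flattened vector has 3108 elements, every one of them 0 or 1.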

2. Use Recurrent Neural Networks
After summarizing the reasons for the failure of the Fully Connected Neural Network, we came up with the idea of using Recurrent Neural Networks, which are particularly suitable for processing sequential data. Their advantage is that they take into account the cumulative effect of all bases that appear before a given base. However, after trying several variants of the Recurrent Neural Network, we found that the model learned very poorly, or that its learning seemed to stagnate. Reflecting further on the weaknesses of the model, we realized that it may be better to consider only a few adjacent bases than to accumulate the effect of all previous bases, because bases that are too far away from the current base may not be related to it.

This led to our final model: a one-dimensional Convolutional Neural Network. The advantage of Convolutional Neural Networks is that they can extract local features of a sequence. Simply by specifying a window size, the network can learn effective features of the same size as the window from dsRNA sequences. The final result, as we have shown, is that the loss on the training set keeps decreasing, while the loss on the validation set first decreases and then rises, which is consistent with our expectation.
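To make the role of the window size concrete, the following sketch applies a single hand-written filter to a one-hot encoded sequence. It only illustrates the sliding-window operation at the core of a one-dimensional convolutional layer. The function names, the filter and the toy sequence are assumptions; a real model would learn many filters during training and combine them with nonlinearities and pooling.

```python
# Illustrative sketch only; names, filter and toy sequence are assumptions.
BASES = "ACGT"

def one_hot_encode(seq):
    """Return a len(seq) x 4 matrix of floats (columns A, C, G, T)."""
    return [[1.0 if base == ref else 0.0 for ref in BASES] for base in seq]

def conv1d_valid(x, kernel):
    """Slide a (window x 4) kernel along a (length x 4) matrix.

    Returns one activation per window position (stride 1, no padding).
    """
    window = len(kernel)
    activations = []
    for start in range(len(x) - window + 1):
        total = 0.0
        for offset in range(window):
            for channel in range(len(BASES)):
                total += x[start + offset][channel] * kernel[offset][channel]
        activations.append(total)
    return activations

# A hand-written filter that responds most strongly to the 3-mer "GAT".
# A trained network would learn such weights from data instead.
gat_filter = one_hot_encode("GAT")

activations = conv1d_valid(one_hot_encode("ACGATGAT"), gat_filter)
print(activations)  # [0.0, 0.0, 3.0, 0.0, 0.0, 3.0]
```

The activation peaks at the two positions where the window matches GAT exactly, which is how a convolutional filter can pick up a local motif wherever it occurs in the sequence.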

3. How to Process Sequences for Extraction
When we tried to extract the features of the sequence around the editing sites, we found that there were few editing sites and that they were concentrated in only a few sequences. Additionally, the motifs that have an editing function are not the same as those with a binding function, so they cannot help in determining affinity. To solve this, we used k-mers to slice each sequence with a step size of 1.
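Slicing with a step size of 1 can be sketched as follows. The function name `kmers` and the example sequence are assumptions, and the value of k used in the project is not stated here.

```python
def kmers(seq, k, step=1):
    """Return all length-k substrings of seq, advancing `step` bases each time."""
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]

print(kmers("GATCA", 3))  # ['GAT', 'ATC', 'TCA']
```

A sequence of length n yields n - k + 1 overlapping k-mers at step size 1, so every position contributes to the features even when annotated editing sites are sparse.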

Troubleshooting in Experimental Work
1. Apoptin is not recommended for use in HEK293
To test the killing efficiency of apoptin, we first transfected pCMV-apoptin into HEK293 to see whether it would work. Consistent with what we had previously learned, after days of culturing the death rates of the experimental group and the control group were very similar, which suggests that apoptin may not be a suitable toxic gene for experiments based on HEK293.

2. A quick and easy way to test the cyclization of RNA
According to the literature we found, cyclization of the RNA adapter is verified by enzyme digestion and electrophoresis. In practice, we found this quite difficult to achieve in a short time. At the suggestion of our senior Yang Wenbing, we designed a pair of primers covering the junction and verified the cyclization by RT-PCR and electrophoresis, which also helps amplify the signal. If the cyclization is successful, we obtain a product of the expected length; otherwise the fragment is shorter.
Figure 5. Schematic diagram of the method for detecting dsRNA cyclization