Contribution to Software
One of the main goals of our project is to design a set of toehold switches that function in mammalian cells. These toehold switches form the sensing module of our therapeutic circuit: upon sensing a trigger sequence in SARS-CoV-2 mRNA, they activate a downstream cascade that ultimately interferes with SARS-CoV-2 replication.
Rather than reinventing the wheel, we looked for software tools that can design mammalian toehold switches. To our knowledge, no currently available tool serves this specific purpose: all existing tools were originally designed to generate prokaryotic toehold switches. Despite the structural differences between the two classes of toehold switches, their design algorithms are broadly similar. At the system level, the algorithm proceeds as follows: 1) pinpoint candidate trigger regions in the input sequence, 2) design toehold switches for these triggers, and 3) perform energy calculations and structural predictions (Pardee et al., 2016). The design scheme in step 2 differs slightly between the two classes of toehold switches (discussed in detail on the Software page). After surveying the software tools developed by previous iGEM teams, we aimed to adapt an existing tool to our purpose.
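The three-step design loop can be sketched as a small driver function. This is a minimal illustration, not Toeholder's code: `design_toeholds` and its three callables are hypothetical names, and the 30-nt trigger length follows the "roughly 30 nucleotides" sensed region mentioned later on this page. Because only step 2 differs between prokaryotic and mammalian switches, the class of switch is decided by which `design_switch` function is passed in.

```python
from typing import Callable, Dict, Iterable, List


def design_toeholds(sequence: str,
                    find_triggers: Callable[[str], Iterable[int]],
                    design_switch: Callable[[str], str],
                    score: Callable[[str, str], Dict[str, float]],
                    trigger_len: int = 30) -> List[Dict[str, object]]:
    """Hypothetical three-step pipeline: locate triggers, design switches, score them."""
    candidates = []
    for start in find_triggers(sequence):                  # step 1: candidate trigger sites
        trigger = sequence[start:start + trigger_len]
        if len(trigger) < trigger_len:                      # site too close to the 3' end
            continue
        switch = design_switch(trigger)                     # step 2: class-specific design
        record = {"start": start, "trigger": trigger, "switch": switch}
        record.update(score(switch, trigger))               # step 3: energies / structures
        candidates.append(record)
    return candidates
```

Swapping a prokaryotic `design_switch` for a mammalian one leaves steps 1 and 3 untouched, which is why adapting an existing tool seemed more practical than writing one from scratch.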
We identified three candidate software tools developed by three iGEM teams: CUHK, EPFL, and Ulaval. Unfortunately, all of them failed. We submitted multiple queries to the web tools of CUHK and of Ulaval (Toeholder) but received no results, and launching the EPFL source code returned an ‘Internal Server Error’.
As the most recent of these tools, Ulaval’s Toeholder was the most attractive candidate for debugging and further development. The Ulaval team was responsive and helped us debug some of the errors. The tool crashed when we input the full SARS-CoV-2 genome (~30 kb); the Ulaval team pointed out that the tool was originally designed to handle inputs smaller than 7.7 kb. Moreover, Toeholder was built on older versions of its libraries. Following the Toeholder documentation led us to install the most recent library versions rather than the versions the tool was built on, which made the installed dependencies incompatible with Toeholder. After testing older versions of these libraries, we found that Toeholder only functions when BioPython 1.73 or older is installed. To mitigate this issue and make the tool easier to use for future iGEM teams, we rewrote the installation instructions and specified the exact library versions to install.
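A small guard can catch the BioPython incompatibility before Toeholder fails further downstream. This is a sketch, not part of Toeholder: the function names are hypothetical, and the only grounded fact is the "1.73 or older" limit stated above. It reads the installed version through the standard `importlib.metadata` module.

```python
import re
from importlib import metadata

MAX_SUPPORTED = (1, 73)  # Toeholder was found to work only with BioPython <= 1.73


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '1.73' or '1.79'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def biopython_is_compatible(version: str) -> bool:
    """True if the given BioPython version is at or below the supported limit."""
    return parse_major_minor(version) <= MAX_SUPPORTED


def check_installed_biopython() -> bool:
    """Check the installed BioPython distribution; False if it is missing or too new."""
    try:
        installed = metadata.version("biopython")
    except metadata.PackageNotFoundError:
        return False
    return biopython_is_compatible(installed)
```

In a requirements file, the same constraint can be expressed as `biopython<=1.73` (or pinned exactly with `biopython==1.73`).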
Figure 1 Prokaryotic vs. Mammalian Toehold Switches. Left: a mammalian toehold switch, in which the loop is a Kozak sequence and the black segment is the start codon. Right: a prokaryotic toehold switch, in which the loop is a prokaryotic RBS that is separate from the start codon (the bulge in the middle).
To generate mammalian toehold switches, we changed the implementation of the get_full_switch function. The change mainly concerns the RBS and start codon region: we engineered a loop that encompasses the Kozak sequence, which itself includes the start codon. The structural differences are depicted in Figure 1, and details are described on the Software page.
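The modified get_full_switch is not reproduced here. As a minimal sketch of the architecture described above, assuming a 12-nt toehold, an 18-bp stem, a 9-nt Kozak-like loop that ends in the start codon (`GCCACCAUG`), and a placeholder 21-nt linker, a mammalian switch could be assembled from a 30-nt trigger as follows. All segment lengths, sequences and function names are hypothetical illustrations, not Toeholder's values.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Hypothetical segments for illustration only (not Toeholder's values):
KOZAK_LOOP = "GCCACCAUG"            # Kozak-like loop ending in the AUG start codon
LINKER = "AACCUGGCGGCAGCGCAAAAG"    # placeholder 21-nt linker (7 codons, no stops)


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def build_mammalian_switch(trigger: str, toehold_len: int = 12):
    """Assemble toehold + 5' stem half + Kozak loop + 3' stem half + linker.

    Returns the switch and the index of the A in its AUG start codon.
    """
    trigger = trigger.upper().replace("T", "U")
    sensing = reverse_complement(trigger)          # pairs with the whole trigger
    toehold, stem_5p = sensing[:toehold_len], sensing[toehold_len:]
    stem_3p = reverse_complement(stem_5p)          # closes the hairpin stem
    switch = toehold + stem_5p + KOZAK_LOOP + stem_3p + LINKER
    start = len(sensing) + KOZAK_LOOP.index("AUG")
    return switch, start


def in_frame_stops(rna: str, start: int) -> list:
    """Offsets (relative to `start`) of stop codons read in frame from `start`."""
    return [i - start for i in range(start, len(rna) - 2, 3)
            if rna[i:i + 3] in STOP_CODONS]
```

Because translation begins at the AUG inside the loop, the 3' stem half and the linker are translated; an in-frame stop codon there would truncate the output protein, so `in_frame_stops(switch, start)` offers a quick screen for candidates to discard or redesign.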
Toeholder was too slow and could not process large files. We tried running the tool on Google Colab, but it took more than 12 hours to process a 1.8 kb FASTA file containing the S2 subunit of the SARS-CoV-2 Spike protein. We therefore set out to develop the tool further by improving its processing time and capacity. Inspecting the code showed that the original Toeholder passes the whole FASTA sequence to the mfe (minimum free energy) function of the NUPACK suite for the energy calculation in each iteration. In the worst case, the number of iterations is N, where N is the length of the input sequence. Since the complexity of mfe is O(N³) (Zadeh et al., 2011), this step is the bottleneck, and the overall cost of the loop grows as O(N⁴).
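Under the simplifying assumption that each mfe call on an N-nt input costs on the order of N³ operations and that the loop makes up to N such calls, the cost of the original approach can be estimated as follows (constant factors ignored):

$$
T_{\text{orig}}(N) = \sum_{i=1}^{N} O(N^3) = O(N^4),
\qquad
\frac{T_{\text{orig}}(2N)}{T_{\text{orig}}(N)} \approx 2^4 = 16
$$

In this model, doubling the input length multiplies the runtime by roughly sixteen, which is why the mfe step dominates as inputs grow.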
Toeholder thus calls the NUPACK mfe function N times, passing the whole input sequence as a parameter each time. NUPACK mfe is used to predict the minimum free energy of binding between the toehold switch and the trigger, and the original Toeholder treats the whole input sequence (the S2 sequence in our case) as the trigger. However, toehold switches generally sense a region of only about 30 nucleotides. Assuming that nucleotides in close proximity are more likely to interact than distant ones, we decided to parse the input sequence and pass only a 200-nucleotide trigger flanking the 30-nucleotide sensed region. Similar approaches have been used to determine the secondary structure of conserved regions in the SARS-CoV-2 genome (Huston et al., 2020; Rangan et al., 2020). Because the trigger size is fixed irrespective of the input length, each mfe call takes constant time, reducing the complexity of this step from O(N³) to O(1). To compare the processing time of our modified Toeholder with that of the original package, we performed a benchmarking experiment. The results confirm that our version is much faster and can handle significantly larger files (see Software). Overall, as the input size increases, the processing time of our version grows linearly rather than as O(N⁴).
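A minimal sketch of the windowing idea, assuming the 200 nt are the total window length centred on the 30-nt sensed region (the text does not state whether 200 nt is the total or per side). `energy_fn` is a hypothetical stand-in for the NUPACK mfe call; no NUPACK API is used here, and both function names are illustrative.

```python
from typing import Callable, List, Tuple


def flanking_window(seq: str, start: int, sensed_len: int = 30,
                    window: int = 200) -> Tuple[str, int]:
    """Return a fixed-size window around seq[start:start + sensed_len] and its offset.

    The window is centred on the sensed region where possible and shifted
    inward at the sequence ends, so all windows have the same length
    (unless the whole sequence is shorter than `window`).
    """
    if window < sensed_len:
        raise ValueError("window must be at least as long as the sensed region")
    if len(seq) <= window:
        return seq, 0
    pad = (window - sensed_len) // 2
    lo = min(max(0, start - pad), len(seq) - window)
    return seq[lo:lo + window], lo


def scan_candidates(seq: str, energy_fn: Callable[[str], float],
                    sensed_len: int = 30, window: int = 200) -> List[Tuple[int, float]]:
    """Call energy_fn once per candidate start, always on a window of constant size."""
    results = []
    for start in range(len(seq) - sensed_len + 1):
        sub, _ = flanking_window(seq, start, sensed_len, window)
        results.append((start, energy_fn(sub)))
    return results
```

Because every window has the same length, each `energy_fn` call costs the same regardless of `len(seq)`, so the total work grows with the number of candidate start positions, i.e. linearly in N.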
Figure 2 RBS-Linker. The RBS-linker region is highlighted in yellow; it spans from the beginning of the Kozak loop to the end of the 21-nucleotide linker.
We also added more energy features. One important factor in determining the efficacy of toehold switches is ΔG_RBS-linker, the Gibbs free energy of the sequence spanning the ribosome binding site and the 21-nucleotide linker (depicted in Figure 2) (Green et al., 2014). The closer ΔG_RBS-linker is to zero, the higher the efficacy of the toehold switch, because a weakly structured RBS-linker region leaves the RBS available for ribosome binding.
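To make the definition concrete, the sketch below slices out the RBS-linker region (from the start of the Kozak loop to the end of the 21-nt linker, as in Figure 2) and ranks candidates by how close ΔG_RBS-linker is to zero. It assumes a hypothetical layout of a 12-nt toehold, 18-bp stem, 9-nt loop and 21-nt linker; `energy_fn` is a placeholder for whichever folding tool computes the region's MFE (e.g. NUPACK mfe). Ranking on this single feature is illustrative only; the actual selection criteria are on the Modeling page.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical segment lengths (nt); adjust to the actual switch architecture.
TOEHOLD_LEN, STEM_LEN, LOOP_LEN, LINKER_LEN = 12, 18, 9, 21


def rbs_linker_region(switch: str) -> str:
    """Slice from the first nucleotide of the loop to the last nucleotide of the linker."""
    start = TOEHOLD_LEN + STEM_LEN
    end = start + LOOP_LEN + STEM_LEN + LINKER_LEN
    if len(switch) < end:
        raise ValueError("switch is shorter than the assumed layout")
    return switch[start:end]


def rank_by_rbs_linker(switches: Dict[str, str],
                       energy_fn: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Score each switch's RBS-linker region; dG closest to zero comes first."""
    scored = [(name, energy_fn(rbs_linker_region(seq))) for name, seq in switches.items()]
    return sorted(scored, key=lambda item: abs(item[1]))
```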
Additionally, we output the MFE difference, defined as:
$$\text{MFE difference} = \Delta G_{\text{bound}} - \left(\Delta G_{\text{toehold}} + \Delta G_{\text{trigger}}\right)$$
A more negative MFE difference indicates more favorable binding between the toehold switch and the trigger.
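The formula translates directly into a helper function. This is a sketch with a hypothetical name; the three inputs would come from separate MFE calculations on the switch-trigger complex, the switch alone and the trigger alone, in kcal/mol.

```python
def mfe_difference(dg_bound: float, dg_toehold: float, dg_trigger: float) -> float:
    """MFE difference = dG_bound - (dG_toehold + dG_trigger), in kcal/mol.

    More negative values mean the switch-trigger complex is more stable
    relative to the two unbound species, i.e. binding is more favorable.
    """
    return dg_bound - (dg_toehold + dg_trigger)
```

For example, with made-up energies ΔG_bound = −50.0, ΔG_toehold = −20.0 and ΔG_trigger = −10.0 kcal/mol, `mfe_difference(-50.0, -20.0, -10.0)` returns −20.0.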
Finally, we changed the output format so that information about the toehold and trigger sequences, structures, and energy calculations is displayed in a single CSV file. This makes it easier for users to choose suitable candidate toehold switches by analyzing one file rather than traversing a separate folder for each toehold switch, as in the original package. Our criteria for selecting candidate toehold switches are detailed on the Modeling page.
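As a sketch of the single-file output, Python's standard `csv` module can write one row per toehold switch. The column names and function names below are hypothetical; the actual output columns of our Toeholder version may differ.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Hypothetical column names for illustration.
FIELDS = ["switch_id", "trigger_seq", "switch_seq", "switch_structure",
          "dg_rbs_linker", "mfe_difference"]


def write_summary(rows: Iterable[Dict[str, object]], handle: TextIO) -> None:
    """Write one row per toehold switch into a single CSV table."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def summary_to_string(rows: Iterable[Dict[str, object]]) -> str:
    """Render the summary table as CSV text (useful for quick inspection)."""
    buffer = io.StringIO()
    write_summary(rows, buffer)
    return buffer.getvalue()
```

To write a file instead, open it with `newline=""` (as the `csv` documentation recommends) and pass the handle to `write_summary`.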
Contribution to Parts
Toehold Switches
Using our improved version of Toeholder, we designed de novo a set of 30 toehold switches targeting the S2 mRNA of SARS-CoV-2. Of these 30 toehold switches, 15 are series B-like and 15 are first-generation-like (see Engineering and Parts).
siRNA
We designed two de novo double-stranded siRNAs targeting the viral mRNA that codes for the replicase proteins. Each siRNA sequence comprises 21 nucleotides that were selected computationally with the siDirect software tool and then subjected to a BLAST search to ensure the lack of homology with off-target host genes (see Parts).
VLP
First, to enhance protein production in the baculovirus system, we added the pSeL120 promoter (part number), which yields higher expression than the commonly used polyhedrin promoter in insect cells.
Second, we improved the polyhedrin promoter (BBa_K1734000) by adding the pSel promoter in tandem. According to the literature, this composite promoter has higher expression activity than either promoter alone. More information about the characterization of the promoter can be found in the description of our composite part (part number). We also used this promoter in our composite part () to increase the yield of the reporter protein, dsRed2, allowing us to measure the baculoviruses’ production rate.
Third, we improved the human phosphoglycerate kinase (hPGK) promoter (BBa_J176028) by adding further information from the literature about its applications, especially in gene therapy using lentiviral vectors, which is relevant to our delivery system. This information justifies our choice of this promoter in the composite part () that forms a major part of our transfer plasmid. This plasmid is one of the four essential plasmids for creating our modular baculovirus-mediated lentivirus vector, and it is where our toehold and siRNA constructs will be inserted to be loaded into the vectors. For the design phase, however, we will use green fluorescent protein (GFP) as a reporter to monitor the efficacy of transduction. In this composite part (), we used the hPGK promoter and added WPRE to the end of the GFP to enhance its expression.
Fourth, we improved the dsRed part (BBa_K1323009) by providing an insect-cell codon-optimized sequence for dsRed2, a modified, rapidly maturing tetrameric fluorescent protein. This modified part (no.) has stronger expression and stability, and it can act either as a reporter protein or as a tag for creating easily trackable fusion proteins. Furthermore, we enhanced the expression of the dsRed2 gene in this composite part (no.), where we used the improved polh-pSel promoter and added an AcMNPV late gene poly(A) tail to the end of the dsRed2 gene.
Fifth, we created 18 parts for our delivery system. The following composite parts (1,2,3,4) represent the essential components of the four minimal plasmids required to assemble our pseudoviral vectors. These parts were created to guide future iGEM teams in using our modular baculovirus-mediated lentivirus vectors, which have specific targeting capabilities and a high loading capacity. Composite parts (1,2) are required for producing the main structural and enzymatic proteins of the vectors. Composite part (3) carries the insert, that is, the gene of interest to be loaded into the vectors. Composite part (4) is responsible for pseudotyping, that is, expressing the SARS-CoV-2 spike on the vector surface. This part can be modified to express different spike proteins and thereby target different cells, depending on the project's purpose. In our case, three variants of the SARS-CoV-2 spike were included in the following basic parts (no.1, no.2, no.3).
References
Green, A. A., Silver, P. A., Collins, J. J., & Yin, P. (2014). Toehold Switches: De-Novo-Designed Regulators of Gene Expression. Cell, 159(4), 925–939. https://doi.org/10.1016/j.cell.2014.10.002
Huston, N. C., Wan, H., de Cesaris Araujo Tavares, R., Wilen, C., & Pyle, A. M. (2020). Comprehensive in-vivo secondary structure of the SARS-CoV-2 genome reveals novel regulatory motifs and mechanisms. bioRxiv. https://doi.org/10.1101/2020.07.10.197079
Pardee, K., Green, A. A., Takahashi, M. K., Braff, D., Lambert, G., Lee, J. W., Ferrante, T., Ma, D., Donghia, N., Fan, M., Daringer, N. M., Bosch, I., Dudley, D. M., O’Connor, D. H., Gehrke, L., & Collins, J. J. (2016). Rapid, Low-Cost Detection of Zika Virus Using Programmable Biomolecular Components. Cell, 165(5), 1255–1266. https://doi.org/10.1016/j.cell.2016.04.059
Rangan, R., Zheludev, I. N., & Das, R. (2020). RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses. bioRxiv. https://doi.org/10.1101/2020.03.27.012906
Zadeh, J. N., Wolfe, B. R., & Pierce, N. A. (2011). Nucleic acid sequence design via efficient ensemble defect optimization. Journal of Computational Chemistry, 32(3), 439–452. https://doi.org/10.1002/jcc.21633