# Modeling
## Objectives
We employed several computational approaches to optimize the specificity of our biosensor towards magalasian *Dalbergia* species. Our objective was to design a sensor capable of detecting any sample from the *Dalbergia* species but with enough specificity to differentiate *Dalbergia* from other trees that are not protected by the CITES Appendix II [[1]](#ref1).
## DNA Barcoding
DNA Barcoding is a technology for the identification of species, that uses short DNA sequences with enough variability to discriminate among species. Depending on the target organism different genes can be used for precise identification [[2]](#ref2). For example, in animals, a portion of the cytochrome c oxidase I (COX I) gene is routinely used [[3]](#ref3). However, in plants, low substitution rates of mitochondrial DNA lead to the use of alternative genomic regions. Some of the most common genes used are: 1. rbcL:510, 2. matK:565, 3. trnL-F:459 [[4]](#ref4).
## Data
We first looked in the literature for complete *Dalbergia* genomes that would allow us to detect regions in the genome suitable to perform toehold identification. However, few complete genomes of *Dalbergia* have been published to date. Thus, we decided to work with sequences obtained from Bold Systems: a database specialized in DNA-based species identification [[5]](#ref5). Searching the database for *Dalbergia* we found 341 published records with specimens coming from 20 countries and deposited in 19 institutions. Specific to Madagascar, the database stores 177 published records from 32 species, and the majority of them store the sequence of the genes RbcL, MatK and TrnL-UAA. The records include samples from *Dalbergia monticola, chapelieri, baronii, madagascariensis, maritima* var. *pubescens, orientalis, occulta, normandii, trichocarpa* and others [[6]](#ref6).
In principle, we are designing a toehold switch [[7]](#ref7) capable of detecting any species of malagasian *Dalbergia* because all of the *Dalbergia* species are protected by the CITES Appendix II [[1]](#ref1). However, since these species clearly present differences at the level of these three genes, it is not immediately clear what should be the target of the Toehold switch. As a result, the team opted for two routes. First, we calculated consensus sequences for each of the genes reported in the database and designed toehold switches based on them. Second, we created toehold switches specific for each of the species and later compared between them. In this second option, we worked with the rosewood species with higher traffic volume: *Dalbergia maritima*.
## CUHK Model
Our first approach in designing our switches was the usage of a web tool developed by an earlier participating iGEM team: Hong Kong-CUHK 2017 [[8]](#ref8).
The CUHK model takes as an input our target rosewood sequence, chops it into smaller fragments and embedded them each into a fixed switch sequence. Then, the RNAfold algorithm, which predicts an unknown structure from a known sequence, is called to predict minimum free energy (MFE) of trigger, switch, switch-domains and trigger-switch complex. The multiple output switches are then ranked according to an efficacy score calculated with a linear regression model trained on 181 switches and their ON/OFF rate to identify the most promising switch candidates.
We used the CUHK model with the following parameters:
- Target RNA sequence: *Dalbergia maritima* MatK (BBa_K3453020), RbcL (BBa_K3453040) or TrnL-UAA (BBa_K3453060) gene fragments
- Loop sequence: Custom: GGACUUUAGAACAGAGGAGAUAAAGAUG
- Trigger length: 120
- Toehold domain length: 35
- Maximum toehold domain base pairs: 5
- Working temperature (°C): 37
- Promoter sequence: T7
- Linker sequence: AACCUGGCGGCAGCGCAAAAG
- RE site for cloning: --
The custom Loop and Linker sequences are those previously optimised for the Series B of toehold switch sensors for Zika virus detection [[9]](#ref9).
The CUHK model returned a list of 50 switches for each of the three *Dalbergia maritima* genes tested. However, the efficacy score of the toehold switches rapidly decreases so that for each gene only a few can actually work as possible candidates (Table 1).
In the top 3 ranked MatK switches by the CUHK pipeline, the minimum free energy of the switch monomers is approximately 10 units lower than those of the trigger monomers. Even more, the dimer configurations have more than double the value in MFE making them much more stable. These three switches also have a high difference between the MFE of trigger-switch dimer and the trigger-trigger dimer suggesting that the activated switch conformation is favored thermodynamically. Following their calculations, we expect that the MatK 1.1 and MatK 1.2 will have similar beneficial results and that MatK 1.3 will be useful but with lesser performance.
Comparing efficacy scores between genes, it seems that the toehold switches designed to identify TrnL-UAA will have a much higher performance compared to those designed for RbcL, and similar performance compared to best ranked MatK toehold switches.
A comparaison between the experimental results and this efficacy score is presented on the ["Engineering success"](https://2020.igem.org/Team:Evry_Paris-Saclay/Engineering) page of this wiki, suggesting that the efficacy score is not such a good predictor.
**Table 1.** Top 3 best ranked toehold switches estimated by the CUHK pipeline for the three *Dalbergia maritima* genes MatK, RbcL, TrnL-UAA. This set of 9 toehold switches was later synthesized and evaluated *in vivo* as well as in a cell-free system.
Minimum Free Energy (MFE)
Gene
RBS-Linker
Switch
Trigger
Trigger-Switch Dimer
Switch-Switch Dimer
Trigger-Trigger Dimer
MFE Difference
Efficacy Score
MatK
-7
-27.5
-19.2
-84.5
-61
-48.1
37.8
98
-5.9
-31.3
-21.1
-86.5
-68.4
-51.9
34.1
90
-4.8
-29.5
-21.1
-78.7
-64
-48.7
28.1
55
-7.5
-27
-19.2
-81
-62.7
-44.8
34.8
44
RbcL
-8.2
-24.2
-20.1
-79.8
-52.7
-56
35
35
-7.5
-29.1
-20.2
-85.2
-56.1
-47
39
32
-7.5
-24.8
-21.5
-84.6
-68.8
-53
34
15
TrnL-UAA
-5.7
-18.2
-23.3
-74
-45.7
-52.6
32.5
115
-5
-29.3
-20.2
-82.4
-62.6
-51.6
32.9
110
-7.7
-22.8
-22.9
-86.4
-52.6
-52.8
40.7
110
## NuPACK Model
Our second approach in designing our switches was the usage of the Multi-State Design function of the NuPACK software suite [[10]](#ref10). *Multitubedesign* takes a desired toehold structure in its on- and off-state together with a target sequence and other sequence constraints as an input and calculates putative switch sequences. In the past, the NuPACK software has been employed by other iGEM teams (e.g. CSMU_Taiwan 2020, EPFL 2019, Warwick 2018, CLSB-UK 2017, etc.) with the goal of designing toehold switches.
When designing toehold switches there are several constraints to be imposed for optimal performance.
#### Energetic constraints
1. Toehold switches must form a stable hairpin loop structure to avoid the expression of the downstream gene in the absence of the target.
2. The toehold switch - trigger complex and its single components should have similar stabilities so that the unfolding of the hairpin loop and the consequently activation of translation of the downstream gene is thermodynamically favorable.
#### Sequence constraints
3. The toehold switch sequence should not contain unwanted stop codons that forbid the translation of the downstream gene.
#### Material and experimental constraints
4. The temperature of the experiment was set at 37ºC because at this temperature we tested the toehold switches.
5. The material was chosen as Rna1999 which is the default in NuPACK for RNA calculations at 37ºC.
NuPACK works based on several user-defined components that range from domains to complexes. Our nupack model assumes that the switch and triggers are composed of the domains listed in Table 2. This highly detailed complex allows engineering of a more detailed RNA structure.
**Table 2.** Our NuPACK user-defined components.
Domain
Nucleotides
Five End Trigger
GGGN29
Trigger Binding Site
GGG + “Trigger Sequence”
Three End Trigger
N33
B
N3
C
N5
Ribosome Binding Site
AACAGAGGAG
D
ATAN3
ATG
ATG
E
N9
Linker
N21
GFP
ATGCGTAAA
### Processing
Given a species of *Dalbergia*, our software:
1. Downloads of all of the samples in Bold Systems from a specific species (each sample usually consisting of the three genes MatK, RbcL and TrnL-UAA).
2. Converts each of the sequences to its reverse complement, which will serve as the input to the NuPACK model. This step was skipped for *Dalbergia maritima* MatK and TrnL-UAA as the sequence in Bold were already in reverse orientation compared to the gene transcriptional orientation.
3. It cuts each of the genes in small chunks of 30 nucleotides with a jump step of 10 nucleotides, to explore the viability of every region of the gene as a toehold target. For example, the sequence of MatK, consisting of 500 nucleotides, is cut up into 12 pieces.
4. It runs a NuPACK model using the specifications described before to find the nucleotide sequence that most likely forms the input structure.
5. Next, with the aid of NuPACK [[10]](#ref10), RBS Calculator [[11]](#ref11) and Kinfold from the Vienna Package [[12]](#ref12) we rank the toehold switches predicted using several parameters, including:
1. The MFE of the structures predicted,
2. The rate of translation initiation for both the ATG of the stem and GFP,
3. The Hamming distance between the structures desired and predicted
4. Folding times for the C region of the hairpin.
### Implementation
The software was developed under the Python and Bash programming language. It uses only four scripts to achieve all the design tasks plus some quality control of the sequences. The details of the code will be open-sourced in an upcoming publication. More details can be discussed upon reasonable request.
### Results
From the 27 samples of malagasian *Dalbergia maritima* we obtained 435 toehold switches that were deemed viable by our requirements (Table 3). These toehold switches were inequitable distributed over the three genes that we had chosen, displaying why relying on a single gene for toehold specification can be very restrictive.
**Table 3.** Distribution of the 435 toehold switches designed to target the three *Dalbergia maritima* using the Toehold designer.
Gene
Toehold switches
MatK
12
RbcL
318
TrnL-UAA
105
We obtained free energy results for the secondary structure of the switches and triggers designed. In Figure 1, we show their free energies by corresponding genes. The large number of toehold switches designed for samples containing the RbcL switches are largely distributed over the energy range [-40,-15]. The lower number of toehold switches designed fo TrnL-UAA have a narrower range but share the same mode close to -25. Finally, the low number of MatK switches show a mode free energy in -20, close to 5 higuer.
**Figure 1.** Histogram of the Free Energy distribution of the switches generated for magalasian *Dalbergia maritima*.
We designed toehold switches for the *Dalbergia maritima* var. *pubescens* species but, in reality, several species of *Dalbergia* are commonly trafficked from Madagascar. Thus, the team also designed toehold switches for these other species and compared their efficacy (Figure 2 and Table 4).
**Figure 2.** Histogram of the Free Energy distribution of the switches generated for magalasian *Dalbergia occulta*.
**Table 4.** Distribution of the 435 toehold switches designed to target the three *Dalbergia occulta* using the Toehold designer.
Gene
Toehold switches
MatK
1
RbcL
45
TrnL-UAA
13
The success of our detection mechanism is based on selective pairing. For example, other legal logs allowed for trafficking in Madagascar should not be detected because this scenario would carry a financial and commercial cost with the ultimate consequence that our device is not widely appropriated. Thus, we used other biological samples arising from Madagascar to simulate molecularly what would happen if a sample from a non *Dalbergia* tree is tested.
As described above, the ranking was mainly based in 4 parameters of the toehold switches and in addition, their secondary structures (Table 5). Even though the toehold switches predicted by our tool do not reach the MFE calculations estimated by the CUHK software, they have good rates of translation initiation for both the ATG of the stem and GFP as well as good folding times and secondary structures (not shown).
**Table 5.** Thermodynamic parameters of the three best ranked toehold switches estimated by the in-house pipeline for three *Dalbergia maritima* genes MatK, RbcL, TrnL-UAA. This set of 9 toehold switches was later synthesized and evaluated *in vivo*. * represents a miscalculation in the software. TIR2 stem stands for Translation Initiation Rate of stem in the Toehold Switch on state.
Gene
TIR Stem
Fold Change TIR stem
MFE Trigger
MFE Hairpin
MatK
103355
50
-21,6
-4,8
43951
82
-25,5
-4,4
36710
147
-19,9
-8
TrnL-UAA
72106
98
-20
-8,6
65899
*
-22,1
-13,8
62999
26
-17,8
-4,7
RbcL
113089
90
-16,3
-8
94458
652
-24
-9,7
86328
86
-21,4
-6,6
### Conclusion
With the overarching goal of saving the Rosewood trees, the in-house pipeline developed is predicated on the principle of inverse RNA folding where it takes a target sequence and defined structures to predict the final toehold switch sequence. Further, the output switches are ranked and chosen according to several metrics: (i) the predicted Translation Initiation Rates of the GFP reporter in the OFF and ON states, (ii) the Hamming distances between the predicted and the desired structures, and (iii) the minimum free energies of the different switch domains. We successfully tested several switches designed *in silico* in *E. coli* cells as well as in an *E. coli* cell-free system.
## Moving forward
Until now, toehold switch based sensors have been mostly designed to detect only a single target RNA at a time, for example: Zika virus [[9]](#ref9). Other possible applications are the usage of multiple switches to construct logic gates and perform cellular computation [[13]](#ref13). As *Dalbergia maritima* can be identified by sequencing/detecting one of the three genes MatK, RbcL and TrnL-UAA, the detection accuracy could be increased by building a logic circuit, for example an AND gate, consisting of three switches, that only shows activation if all three *Dalbergia maritima* genes are present."
Moreover, recent advances in the field of RNA secondary structure predictions can also be incorporated in further iterations of the project. For example, recently a dataset of 91534 toehold switches was synthesised to train a Deep Neural Network model that outperformed previous state-of-the-art thermodynamic and kinetic models like the ones used here [[14]](#ref14).
## References
[1] Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). Appendices | CITES.
[2] Savolainen V, Cowan RS, Vogler AP, Roderick GK, Lane R. Towards writing the encyclopedia of life: an introduction to DNA barcoding. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences (2005) 360: 1805–1811.
[3] Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM. Identification of Birds through DNA Barcodes. PLoS biology (2004) 2: e312.
[4] CBOL Plant Working Group. A DNA barcode for land plants. Proceedings of the National Academy of Sciences of the United States of America (2009) 106: 12794–12797.
[5] Ratnasingham S, Hebert PDN. BOLD: The barcode of life data system (http://www.barcodinglife.org). Molecular Ecology Notes (2007) 7: 355–364.
[6] Hassold S, Lowry PP 2nd, Bauert MR, Razafintsalama A, Ramamonjisoa L, Widmer A. DNA barcoding of Malagasy rosewoods: towards a molecular identification of CITES-listed Dalbergia species. PLoS One (2016) 11, e0157881.
[7] Green AA, Silver PA, Collins JJ, Yin P. Toehold switches: de-novo-designed regulators of gene expression. Cell (2014) 159, 925-939.
[8] To AC-Y, Chu DH-T, Wang AR, Li FC-Y, Chiu AW-O, Gao DY, Choi CHJ, Kong S-K, Chan T-F, Chan K-M, Yip KY. A comprehensive web tool for toehold switch design. Bioinformatics (2018) 34: 2862–2864.
[9] Pardee K, Green AA, Takahashi MK, Braff D, Lambert G, Lee JW, Ferrante T, Ma D, Donghia N, Fan M, Daringer NM, Bosch I, Dudley DM, O’Connor DH, Gehrke L, Collins JJ. Rapid, low-cost detection of Zika virus using programmable biomolecular components. Cell (2016) 165: 1255–1266.
[10] Zadeh JN, Steenberg CD, Bois JS, Wolfe BR, Pierce MB, Khan AR, Dirks RM, Pierce NA. NUPACK: Analysis and design of nucleic acid systems. Journal of Computational Chemistry (2011) 32: 170–173.
[11] Salis HM. The ribosome binding site calculator. Methods in Enzymology (2011) 498: 19–42.
[12] Lorenz R, Bernhart SH, Höner Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL. ViennaRNA Package 2.0. Algorithms for molecular biology: AMB (2011) 6: 26.
[13] Green AA, Kim J, Ma D, Silver PA, Collins JJ, Yin P. Complex cellular logic computation using ribocomputing devices. Nature (2017) 548: 117–121.
[14] Angenent-Mari NM, Garruss AS, Soenksen LR, Church G, Collins JJ. A deep learning approach to programmable RNA switches. Nature Communications (2020) 11: 5057.