Team:Heidelberg/Triple Helix

DNA:RNA Triple Helices
Tethering RNA to DNA

Introduction


We all learn at school that two nucleic acid strands can interact by forming hydrogen bonds between adenine and thymine or between guanine and cytosine bases, producing a double helix. This is known as Watson-Crick interaction. Nucleic acids can also bind through Hoogsteen interaction, in which the bases are oriented differently so that different atoms form the hydrogen bonds (see Figure 1). Hoogsteen interactions can give rise to alternative DNA and RNA structures, such as the G-quadruplex and the triple helix.triple_helix_1triple_helix_7

Figure 1: Comparison of Watson-Crick and Hoogsteen interactions
Watson-Crick and Hoogsteen interactions differ in the way the bases are oriented and in the atoms forming the hydrogen bonds. A and B show Watson-Crick interactions; C and D show Hoogsteen interactions. A and C show the interaction between cytosine and guanine; B and D show the interaction between adenine and thymine.



In our experiment we focused on the RNA•DNA–DNA triple helix (• = Hoogsteen interaction, – = Watson-Crick interaction). It forms when an RNA strand inserts into the major groove of the DNA double helix. The bonds between different bases vary in stability, which makes binding sequence-specific. Both specificity and bond strength are considerably lower than for Watson-Crick interactions.triple_helix_1

Our goal is to design an RNA that can form a triple helix with a specific DNA sequence and simultaneously recruit transcription factors. To recruit them, the RNA is designed to form a hairpin structure that can be bound by an RNA-binding protein. If a transcription factor is fused to that RNA-binding protein, it can be recruited to a specific target sequence.

Motivation


In recent years, considerable effort has gone into creating logic gates in living cells at the level of transcription. One promising approach uses dCas fused to a specific activator together with multiple guide RNAs that bind to the regions flanking the promoter of a reporter gene. Alternatively, a dCas fused to a repressor may be used.triple_helix_2 The problem with this approach is that it requires a very large protein complex. Firstly, this forces scientists to use multiple plasmids to deliver the entire system into the cell. Secondly, producing the complex consumes a lot of the cell’s resources.

Here, a more lightweight system that is entirely RNA-based could greatly improve efficiency. A possible solution is to use an RNA that can form a triple helix. It would greatly reduce the size of the construct and would be much cheaper for the cell to produce. Additionally, advances made in translational regulation, such as riboswitches, could also be applied at the transcriptional level.triple_helix_8

Design

Design of the triple helix


Kunkler et al. recently showed that the shortest RNA that forms a triple helix with DNA in vitro is 19 nucleotides long.triple_helix_1 Increasing the length of the RNA strand did not increase bond stability. Therefore, we designed our RNA-DNA pairs to be 19 nucleotides long. Firstly, we generated a random 5’-3’ DNA sequence containing all nucleotides except thymine. Thymine was excluded because it forms only weak bonds with every possible RNA nucleotide, so using it would decrease stability and sequence specificity. Then, an RNA sequence was generated by choosing, for each DNA nucleotide, the RNA nucleotide that forms the strongest bond with it, as determined by Kunkler et al.triple_helix_1 The pairs that were used are listed in Table 1. To avoid errors and speed up the generation of RNA sequences, we used a script, which is available in our GitHub repository.

Table 1: DNA and RNA nucleotide pairs used for triple helix design.

DNA RNA
Adenine Uracil
Cytosine Uracil
Guanine Cytosine
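
The design procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the script from our repository: the function names `random_target_dna` and `design_rna` are hypothetical, and the sketch assumes that the RNA is written 5’-3’ in the same direction as the DNA strand, so that bases are mapped position by position.

```python
import random

# Pairing rules from Table 1: DNA base -> RNA base in the third strand.
PAIRING = {"A": "U", "C": "U", "G": "C"}


def random_target_dna(length=19, rng=None):
    """Return a random 5'-3' DNA sequence made of A, C and G only (no thymine)."""
    rng = rng or random.Random()
    return "".join(rng.choice("ACG") for _ in range(length))


def design_rna(dna):
    """Translate a thymine-free DNA target into the RNA third strand.

    Assumption: the RNA is read 5'-3' in the same direction as the DNA
    strand given here, so bases are mapped position by position.
    """
    dna = dna.upper()
    if "T" in dna:
        raise ValueError("target contains thymine, which Table 1 does not cover")
    try:
        return "".join(PAIRING[base] for base in dna)
    except KeyError as err:
        raise ValueError(f"unexpected base {err.args[0]!r}") from None


target = random_target_dna()
print(target, design_rna(target))
```

Under this assumption, `design_rna("AAAAAGAAAAGAAAAGAAAGAA")` returns `UUUUUCUUUUCUUUUCUUUCUU`.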

Additionally, we used the following sequences, which Kunkler et al. showed to form a triple helix in vitro: RNA (5’-UUUUUCUUUUCUUUUCUUUCUU-3’), DNA (5’-AAAAAGAAAAGAAAAGAAAGAA-3’).triple_helix_1 These parts can be found in the registry as BBa_K3657022 and BBa_K3657023.

Background


The ability of RNA and DNA to form triple helices may be fascinating on its own, but how exactly can it be used to regulate gene expression? In recent years there has been much research on the CRISPRa/i system. It is based on designed guide RNAs that bind in front of the promoter region through complementary base pairing.triple_helix_9 This allows gene expression to be regulated specifically. In early experiments, the transcription factors were fused to the Cas protein.triple_helix_9 In more recent attempts, transcription factors were attached to the guide RNA to increase the modularity of the system.triple_helix_3 One way to bind transcription factors to the guide RNA is to use the MS2 coat protein, which binds an RNA hairpin loop. Chen Dong et al. designed a system consisting of a guide RNA with an MS2 hairpin and different transcription activators fused to the MS2 coat protein. This system allows the guide RNA or the transcription factor to be exchanged without having to introduce multiple dCas proteins into a cell.triple_helix_3

Chen Dong et al. screened multiple transcription activators and showed that SoxS was the most effective. SoxS is part of the superoxide response regulation.triple_helix_5 For activation to work, the activator has to be brought close to the promoter of the gene of interest, and fine-tuning is necessary: it was shown that beyond a certain point, closer proximity reduces the efficacy of the activation.triple_helix_4 One has to find the sweet spot. In the case of dCas9-RpoZ (dCas fused to the RNAP omega subunit), this optimal binding site is located 91 base pairs upstream of the transcription start site, as demonstrated by Bikard et al.triple_helix_4

Design of the Experiment

Proof of concept - Recruitment of activator


For the first experiment we wanted to find out whether the RNA sequence of the triple helix could replace dCas as the specific DNA-binding factor and whether it could bring a transcription activator into proximity of a reporter gene. Our approach was similar to the experiment that Chen Dong et al. conducted with a dCas.triple_helix_3 Using the MoClo system and the Marburg collection, we designed three different expression cassettes.

The first consists of SoxS fused to the MS2 coat protein. The second expresses the RNA, which consists of a region that can form the triple helix and an MS2 hairpin loop. The sequence for the RNA is flanked by hammerhead ribozymes. These ensure that the RNA part needed to form the triple helix and the MS2 hairpin is cut out, which prevents disruptive overhangs that could form unwanted secondary structures. The third expression cassette is a superfold GFP cassette with a weak constitutive promoter (P_J23109). The DNA sequence that can form a triple helix is inserted upstream of the promoter. The first and second “level 1” plasmids are combined into one regulatory “level 2” plasmid. The reporter plasmid is used directly as a “level 1” plasmid. The two plasmids are co-transformed into one cell.

The hypothesis is that, when expressed, the MS2 coat protein binds the MS2 hairpin. This leads to the assembly of the activator. The construct can now bind to the sequence in front of the superfold GFP promoter and bring SoxS into close proximity to it, which activates the expression of the fluorescent protein (see Figure 2).

Figure 2: Schematic representation of the first experiment
The RNA binds to the DNA by forming a triple helix. The MS2 coat protein binds to the hairpin of the RNA. The SoxS protein is thereby brought into proximity of the promoter and activates the expression of the gene.

With this basic design, multiple assays are possible. Firstly, we wanted to test whether three different DNA and RNA pairs form a triple helix in vivo. One pair was shown by Kunkler et al. to fold into a triple helix in vitro.triple_helix_1 The other two were designed as described above. Secondly, we wanted to determine the distance between the RNA binding site and the promoter that gives optimal activation. Following the results of Bikard et al., we designed one promoter with the binding site 91 base pairs upstream of the transcription start site.triple_helix_4 Because the triple helix construct is much smaller than the dCas9-RpoZ complex used in the paper, we additionally designed variants with the binding site closer to the transcription start site, 71 and 61 base pairs upstream of it, respectively.triple_helix_6
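
The three binding-site positions can be written out as coordinates relative to the transcription start site. The sketch below is only a worked calculation under assumptions that are not stated in the text: the distance is taken to be measured to the promoter-proximal end of the binding site, the site is taken to be 19 nucleotides long, and the helper name `site_window` is hypothetical.

```python
SITE_LENGTH = 19  # length of the designed triple helix target (assumed here)


def site_window(distance, site_length=SITE_LENGTH):
    """Return (start, end) of the binding site relative to the start site.

    Positions are negative upstream of the transcription start site.
    Assumption: `distance` is the offset of the promoter-proximal end
    of the binding site.
    """
    if distance <= 0 or site_length <= 0:
        raise ValueError("distance and site_length must be positive")
    end = -distance
    start = end - (site_length - 1)
    return start, end


for distance in (91, 71, 61):
    print(distance, site_window(distance))
```

If the distance is instead measured to the centre or the distal end of the site, the window shifts accordingly; `site_length` can be changed for targets of a different length.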

Regulation system


We designed our second experiment to further explore the possibilities that the modularity of this approach provides. We wanted a reporter gene (superfold GFP) to be activated by the addition of L-arabinose or repressed by the addition of lactose (see Figure 3). SoxS was used as the activator. We chose the KRAB protein as the repressor. Although KRAB has so far only been described as a transcription factor in mammals, we thought it would be worth a try, since there are no known repressors in E. coli that act on a promoter in the same manner as SoxS. KRAB is a large protein that should be able to block RNA polymerase from reaching the promoter.

KRAB was fused to the lambdaN RNA-binding protein and cloned into a “level 1” plasmid together with the SoxS-MS2 fusion protein. Because they were cloned into the same cassette, both are expressed under the same strong promoter. To separate the two transcription factors, a stop codon was placed behind SoxS and an RBS was placed in front of lambdaN. In the reporter plasmid, a degradation tag was added behind the superfold GFP and a medium-strength Anderson promoter (P_J23106) was used. These modifications were made to obtain a fast, visible response when superfold GFP is activated or repressed.

For this experiment, two RNAs that can form triple helices are needed. One includes the MS2 hairpin, the other includes the BoxB hairpin, which lambdaN can bind. The RNA with the MS2 hairpin is under control of the araC pBad promoter (BBa_K808000) and depends on L-arabinose. Expression of the RNA with BoxB is regulated by the lac promoter (BBa_R0010) and can be induced by allolactose. Both RNAs contain identical triple helix forming sequences. As in the first experiment, the RNA coding sequence is flanked by hammerhead ribozymes. The expression cassette with the transcription factors and the two cassettes with the triple helix RNAs are cloned into one “level 2” plasmid. Cells are co-transformed with this “level 2” plasmid and the reporter plasmid. This experiment can be performed with different triple helices and with different distances between the RNA binding site and the transcription start site.

Figure 3: Schematic representation of the second experiment
A - When L-arabinose is present in the cell, the araC pBad promoter is induced and the RNA containing the MS2 hairpin is transcribed. It forms a triple helix in front of the superfold GFP promoter and recruits the SoxS activator through the interaction between the MS2 coat protein and the MS2 hairpin. SoxS increases superfold GFP expression.
B - When allolactose is present in the cell, the lac promoter is induced. The RNA containing BoxB is transcribed and forms a triple helix in front of the superfold GFP promoter. The KRAB repressor is recruited through the interaction between BoxB and lambdaN and represses the expression of superfold GFP.
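
The intended behaviour of this two-input system can be summarised as a small truth table. The sketch below encodes only the design hypothesis, not measured behaviour. The function name `expected_reporter_state` and the state labels are hypothetical, and the outcome with both inducers present is deliberately left open, because both RNAs target the same binding site.

```python
def expected_reporter_state(arabinose, allolactose):
    """Expected superfold GFP state under the design hypothesis.

    arabinose   -> RNA with MS2 hairpin is transcribed -> SoxS recruited
    allolactose -> RNA with BoxB is transcribed        -> KRAB recruited
    """
    if arabinose and allolactose:
        # Both RNAs carry the same triple helix forming sequence and
        # compete for one site; the design does not predict the result.
        return "undetermined"
    if arabinose:
        return "activated"
    if allolactose:
        return "repressed"
    return "basal"


for ara in (False, True):
    for lac in (False, True):
        print(ara, lac, expected_reporter_state(ara, lac))
```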

Due to COVID-19 we had access to a laboratory only for a very limited time. Unfortunately, we were therefore not able to obtain any results from these experiments.


Measurement


The fluorescence intensity can be measured in a plate reader and indicates how strongly superfold GFP is expressed. Changes in the level of fluorescence allow activation and repression to be detected and their efficacies to be compared. By comparing the measurements with a positive control (expression of superfold GFP under a constitutive promoter) and a negative control (untransformed cells), we could determine whether the RNA forms a triple helix with the DNA. Additionally, we could determine which distance from the transcription start site is optimal. Furthermore, for the system with two promoters (second experiment), the speed of the response to the addition of allolactose or L-arabinose could be measured. We could also check how the cells react if they are first exposed to one of the substances and then to the other.
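
One possible way to turn plate reader readings into such a comparison is sketched below. It is an assumption about the analysis, not a protocol we carried out: the negative control is treated as background, each sample is expressed relative to a reference sample (for example the reporter without induction, which is our assumption), and all function and parameter names are hypothetical.

```python
from statistics import mean


def background_corrected(readings, negative_control):
    """Mean fluorescence of `readings` minus the mean of the negative control."""
    return mean(readings) - mean(negative_control)


def fold_change(sample, reference, negative_control):
    """Background-corrected fluorescence of `sample` relative to `reference`.

    Values above 1 suggest activation, values below 1 suggest repression.
    """
    ref = background_corrected(reference, negative_control)
    if ref <= 0:
        raise ValueError("reference is not above background")
    return background_corrected(sample, negative_control) / ref
```

Each argument is a list of replicate readings from the same plate. Normalising to cell density is not included here and would have to be added if the cultures grow differently.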

Plasmid Maps

All used plasmids are available as a GenBank file: link

References