RNAlphABA

Introduction

RNAlpha is an efficient and predictable polycistronic expression system that we wish to construct this year, partly based on work in [1]. It spaces polycistrons by CRISPR palindromic sequences (hairpins) and cleaved them by Csy4 cleavage after transcription so that each cistron could be translated individually. In prokaryotes, the addition of CRISPR hairpins and Csy4 effectively eliminates the effect of upstream and downstream UTRs on gene expression and results in higher and more predictable expression levels than the usual polycistrons.

In eukaryotes, there are so many uncertainties in this system. Since for eukaryotic mRNAs, the 5' m7G cap and the 3' tail have a strong influence on mRNA activity and stability, processing mRNAs by Csy4 alone may decrease it. From the experiments of Erin et al. in eukaryotic cells [2], we know that Csy4 can still function in eukaryotes. But the 5'UTR cleavage leads to mRNA degradation, resulting in a 20-40-fold decrease in expression, while only a modest decrease in expression (about 2.5-fold) is observed for 3'UTR cleavage.

There are still many unknowns about this RNAlpha system. Thus, we hope to determine the feasibility of this system through integrated transcription-translation modeling and further simulate the whole system in dry lab.

Prokaryotic Single Cistron Integrated Transcription-Translation Model

First, we wish to simulate the mechanism of RNAlpha in prokaryotes and explore why the RNAlpha system is viable in prokaryotes. Before proceeding with the exploration, let’s describe the basic transcription and translation process in prokaryotes. The process is shown in the figure below:

Fig 1. Translation and transcription process of prokaryotic single cistron[3].

For a certain protein, we can describe its transcription, translation, and maturation based on The Michaelis-Menten equation [3].

Transcription Model For Single Cistron In Prokaryotes

For the transcription process, we are interested in the dynamics of mRNA most. We can describe this by the following formula: $$\frac{d[mRNA]}{dt}=v_{mSyn}-v_{mDeg}\tag{2.1}$$ Where $[mRNA]$ represents the concentration of mRNA in the system, while $V_{mSyn}$ and $V_{mDeg}$ represent the rates of mRNA synthesis and degradation in the cell, respectively. We assume that the concentration of RNA polymerase in the system is constant. According to the Michalis-Menten equation (see ABA Modelling section), $V_{mSyn}$ and $V_{mDeg}$ can be expressed as follows:

$$v_{mSyn}=\frac{k_{cat,ms}[E_{\sigma}][Pro]}{K_{M,\sigma}+[E_{\sigma}]}\tag{2.2}$$ $$v_{mDeg}=\frac{k_{cat,md}[Rnase][mRNA]}{K_{M,d}+[mRNA]}\tag{2.3}$$ The $k_{cat,ms}$，$k_{cat,md}$ represent the number of molecules converted by each polymerase or degradation enzyme per unit time during RNA synthesis and degradation. $E_{\sigma}$ represents the whole RNA polymerase incorporating with the $\sigma$ factor. $Pro$ represents the promoter, and $Rnase$ represents the RNA degradation enzyme. $K_{M,\sigma}$ and $K_{M,d}$ represent the $K_M$ values of the polymerase and degradation enzyme in turn. Where $k_{cat,ms}, K_{M,\sigma}$ both correlate with the intensity of promoter.

In our modeling, $k_{cat,md}[Rnase]$ can be approximated by a constant , written as $k_{d,m}$. Therefore, Equation (2.3) can be rewritten as follows: $$v_{mDeg}=\frac{k_{d,m}[mRNA]}{K_{M,d}+[mRNA]}\tag{2.4}$$ In fact, determining the amount of $\sigma$ factor bound with RNA polymerase is very inconvenient. Also, we know in prokaryotes, the $\sigma$ factor forms a complex with the core enzyme only when transcription is turned on, which is: $$[\sigma] = [\sigma_f]+[E_{\sigma}]\tag{2.5}$$ where $\sigma$ represents all the $\sigma$ factor while $\sigma_f$ represents the free $\sigma$ factor. What's more, we know that the free factor will bind to the core enzyme as a whole enzyme, denote as: $$\sigma_{f} + E_0 \mathop{\rightleftharpoons}\limits_{K_{\sigma}} E_{\sigma}\tag{2.6}$$ $K_{sigma}$ denotes the dissociation constant of the whole enzyme.Thus the $E_{sigma}$ concentration can be expressed as: $$[E_{\sigma}]=\frac{[E_0][\sigma_{f}]}{K_\sigma}\tag{2.7}$$ Since it is convenient for us to determine the total amount of enzyme, according to Equation(2.5) yields: $$[E_{\sigma}] = \frac{[E_0][\sigma]}{K_\sigma+[E_0]} \tag{2.8}$$ Since we assume that the total amount of RNA polymerase core enzyme in the system is constant, there are three forms of RNA polymerase core enzyme in the system, which are noted as $E_0$ (free state), $E_{\sigma}$(forms a complex with sigma factor), and $E_b$ (binds to promoter or gene and is undergoing transcription). $$[E_t] = [E_{\sigma}]+[E_0]+[E_b] \tag{2.9}$$ $[E_t]$ represents the total concentration of the core enzyme in the system. The concentration of $E_b$ can be expressed as $$[E_b] = \frac{[E_{\sigma}][Pro]}{K_{M,\sigma}+[E_{\sigma}]}(1+k_{cat,ms}\frac{L_m}{C_m}) \tag{2.10}$$ Where $L_m$ refers to the nucleic acid length and $C_m$ represents the transcription rate. This formula can be divided into two parts, where the first half (the non-bracketed part) refers to the probability of the whole enzyme binding to the promoter; and the second half (the bracketed part) indicates the number of RNA polymerases bound in the case of gene saturation [5]. Substituting Equation (2.5) into the above equation again, we get $$[E_b] = \frac{[E_{0}][\sigma][Pro]}{K_{M,\sigma} (K_{\sigma}+[E_0])+[E_0][\sigma]} (1+k_{cat,ms}\frac{L_m}{C_m}) \tag{2.11}$$ Thus, the $E_t$ concentration can also be expressed as $$[E_t] = [E_0]+\frac{[E_0][\sigma]}{K_\sigma+[E_0]} + \\ \frac{[E_{0}][\sigma][Pro]}{K_{M,\sigma} (K_{\sigma}+[E_0])+[E_0][\sigma]} (1+k_{cat,ms}\frac{L_m}{C_m}) \tag{2.12}$$ Combining Equation(2.2), (2.4), (2.8), we can rewrite Equation (2.1) as $$\frac{d[mRNA]}{dt}=\frac{k_{cat,ms}[E_0][\sigma][Pro]} {K_{M,\sigma}(K_{\sigma}+[E_0])+[E_0][\sigma]} -\frac{k_{d,m}[mRNA]}{K_{M,d}+[mRNA]} \tag{2.13}$$

Translation models for single cistron in prokaryotes

For the translation process, we are interested in the dynamic change of protein concentration most. In this model, we consider that there are two states of protein: inactivation and maturation. With reference to the model in the transcription section, the changes in the concentration of them can be written as: $$\frac{d[P_{imm}]}{dt} = \frac{k_{cat,ps}[mRNA][R_0]} {K_{M,R}+[R_0]}-k_{mat}[P_{imm}] \tag{2.14}$$ $$\frac{d[P_{mat}]}{dt} = k_{mat}[P_{imm}] - k_{deg}[P_{mat}] \tag{2.15}$$ Where $k_{cat,ps}$,$K_{M,R}$ represent the kinetic parameters of protein synthesis, we consider it to be related to the untranslated region (UTR). $P_{imm}$ and $P_{mat}$ represent the immature and mature proteins, while $R_0$ represents the free ribosome. $k_{mat}$, $k_{deg}$represents the rate of protein maturation and degradation, and we consider the parameters of protein maturation and degradation to be constant in prokaryotes.

Here we can still assume that there are both free and bound states of the ribosome and that the number of ribosome-sized subunits in the default cell system is equal and constant. This is expressed as: $$[R_t] = [R_0] +[R_b] \tag{2.16}$$ In the above equation, $R_t$ represents the total ribosome and $R_b$ represents the ribosome bound to the mRNA which is being translated. Referring to Equation (2.10), the $R_b$ concentration can be expressed as: $$[R_b] = \frac{[R_0][mRNA]}{K_{M,R}+[R_0]} (1+k_{cat,ps}\frac{L_m}{C_p}) \tag{2.17}$$ Where $C_p$ denotes the translation rate, while equation (2.16) can be rewritten as: $$[R_t] = [R_0] +\frac{[R_0][mRNA]}{K_{M,R}+[R_0]} (1+k_{cat,ps}\frac{L_m}{C_p}) \tag{2.18}$$ The above models provide a more accurate description of prokaryotic single cistron expression, but we are still unable to quantify the amount of polycistron expression. For prokaryotic polycistrons, the transcription process is identical to that of a single cistron, but the translation efficiency of polycistrons can be quite different.

Prokaryotic Polycistron Translation Model

A prokaryotic polycistron contains at least a promoter, a 5' untranslated region (5' UTR), two protein coding sequences (CDS), a spacer sequence before the CDS, and a terminator. Provided that the default CDS codons are fully optimized (i.e., the default peptide chain extension rate is constant), we can obtain the relationship of the initial translation rates as for polycistron according to the model proposed by Tian et al [6].

Fig 2. Diagram of Polycistron.

For the first cistron, its initial translation rate $r_1$ is denoted as: $$r_1\propto e^{-\beta \Delta G_1} \tag{2.19}$$ Where $\beta$ represents Boltzmann factor, determined by previous studies to be approximately $0.45\pm 0.05mol/kcal$. $\Delta G_1$ represents the total free energy of the ribosomal 30S subunit bound to the 5'UTR. It consists of five components: $$\Delta G_1 = \Delta G_{mRNA-rRNA} + \Delta G_{spacing} +\Delta G_{start} \\ +\Delta G_{standby} - \Delta G_{mRNA} \tag{2.20}$$

$\Delta G_{mRNA-rRNA}$ is the energy released by the binding of mRNA to rRNA, negative value;
$\Delta G_{spacing}$ represents the energy loss due to the gap between the 16SrRNA binding position and the initiation codon, positive value;
$\Delta G_{start}$ is the free energy released by the pairing of $tRNA^{fMet}$ and the initiation condon, negative value
$\Delta G_{standby}$ which represents the energy loss caused by binding the mRNA standby site to the platform region of the ribosome, positive value;
$\Delta G_{mRNA}$ representing the energy required to unravel the secondary structure of the mRNA, which is a negative value, negative value.

For these free energy parameters, we can take the mRNA sequence as a parameter and compute it using the RNA folding algorithm tool [7].

For the second cis-response, we have to consider two scenarios, as shown in Figure 3. The first circumstance is that the ribosome that finished the translation of the first cistron continues the translation, which is called re-initiation, also known as translation coupling; The second circumstance is called de novo initiation, where the new ribosome participates in the translation of the second cistron.

Fig 3. The process of translating the second cistron[6].

Taking these two cases together into account, the translation initial rate of the second cistron can be expressed as: $$r_2 \propto r_{reini}^{(2)}+e^{-\beta \Delta G_2} \tag{2.21}$$ where $r_{reini}^{(2)}$ denotes the onset rate of second cistron re-initiation, and the rate of second cistron re-initiation can be expressed as: $$r_{reini}^{(2)} = k_P \cdot k_{reini} (d_{1,2}) \cdot e^{-\beta \Delta G_1} \tag{2.22}$$ In the equation above, $k_P$ represents the ratio constant between the rate of ribosome assembly and the initial rate of translation. $k_{reini}$ depends on the distance between cistrons, while $d_{i,j}$ represents the distance between the $i$ number and the $j$ number cistron, as $d_{i,j} = x_{j,start} - x_{i,stop} -3$, that $x_{j,start}$ represents the first position of the initiation codon of the $j$ number cistron, and $x_{i,stop}$ represents the first position of the terminates codon of $i$ number cistron. Through experiment, $k_{reini}(d_{(1,2)})$ can be expressed as: $$ \begin{equation} k_{reini}(d_{1,2}) = \left\{ \begin{array}{lll} 0.0072, & 0 \le d_{(1,2)} \le 25 \\ 0.022, & d_{(1,2)} = -4 \\ 0.0004 d + 0.0112, & -25 \le d_{(1,2)} < -10 \end{array} \right. \nonumber \tag{2.23} \end{equation} $$ For the initial translation rate of the second cistron de novo pathway, we calculate it using an improved free energy formula. This is due to the fact that during the translation extension of the first cistron, the second cistron may automatically unfold. Therefore the translation rate of the first cistron may change the translation start rate of the second cistron. Thus $\Delta G_2$ is expressed as: $$\Delta G_2 = \Delta G_{mRNA-rRNA} + \Delta G_{spacing} +\Delta G_{start} +\Delta G_{standby} \\ - \Delta G_{noncou} - \Delta G_{cou}F_{cou} \tag{2.24}$$ Where $\Delta G_{cou}$ is used to quantify the unfolding free energy of the secondary structure overlapping the first CDS sequence, and $\Delta G_{noncou}$ can quantify the unfolding free energy of all other RNA structures. $F_{cou}$ is used to quantify the ability of the extended 70S ribosome to release inhibitory RNA structures, which can be expressed as: $$F_{cou} = \frac{1}{1+C\cdot e^{\beta \Delta G_1}} \tag{2.25}$$ The studies and models above show that in the double cistrons’ case, the translation rate of the first CDS is usually influenced only by its upstream RBS intensity, whereas the translation rate of the second CDS is constrained by the translation rate of the preceding cistron in addition to the upstream RBS influence. This has also been confirmed by the study of Ayelet et al [8], which found that by adjusting the upstream RBS intensity (for about 200 times), due to the translational coupling effects, the expression of downstream genes will be 5 times different from the previous state. Similarly, specific sequences upstream can affect the downstream translational coupling mechanism. The same method can be used for predicting the expression of triple cistrons.

In fact, although there is a model that can predict the relationship between the expression of polycistrons, the model can still be influenced by many factors, including the upstream RBS and the specific sequences anterior and posterior to the genes. The RNAlpha system can effectively reduce the number of influencing factors, making it easier to predict the condition of the polycistronic mRNA expression.

In the following content, we will model the transcriptional expression of the RNAlpha system and compare the similarities and differences between conventional and RNAlpha expression in prokaryotic systems.

Prokaryotic RNAlpha System Model

RNAlpha is a polycistronic expression systems that we hope to construct based on the work in [1]. Qi et al. tested [1] and found that as for prokaryotic bi-cistrons, the expression of the upstream gene is linearly related to single cistrons, while downstream gene expression is highly variable. Whereas for RNA-processed cistrons, upstream gene expression is almost identical to single cistron, and the downstream gene expression is lower to a certain degree, but still linearly correlated with single cistron expression. We speculate that this is due to the limited cleavage efficiency of Csy4. This means that the system largely reduces the influence of upstream and downstream sequences as well as the effect of RBS on expression, making the results highly predictable.

Fig 4. Bi-cistron expression comparison chart[1].

For the modeling of RNAlpha in the prokaryotes, we focus on both Csy4 and the transcription and translation process of the polycistron system. Here we take the case of double cistron into account, as shown in Figure 5.

Fig 5. RNAlpha model in prokaryotes.

According to Equation (2.11), the central enzyme concentration involved in the transcription of a given mRNA should be related to its length, like: $$[E_{b,bic}] = \frac{[E_{0}][\sigma][Pro_1]}{K_{M,\sigma,1}(K_{\sigma}+[E_0])+[E_{0}][\sigma]}(1+k_{cat,ms,1}\frac{L_{m,bic}}{C_m}) \tag{2.26}$$ $$[E_{b,csy4}] = \frac{[E_{0}][\sigma][Pro_2]}{K_{M,\sigma,2}(K_{\sigma}+[E_0])+[E_0][\sigma]}(1+k_{cat,ms,2}\frac{L_{m,csy4}}{C_m}) \tag{2.27}$$ $E_{b,bic}$ and $E_{b,csy4}$ represent RNA polymerases bound to the bi-cistrons and csy4 sequences. Respectively, while $K_{M,\sigma,1}$ and $k_{cat,ms,1}$ are kinetic parameters corresponding to promoter 1, $K_{M,\sigma,2}$ and $k_{cat,ms,2}$ are kinetic parameters corresponding to promoter 2. The $E_t$ concentration in equation (2.12) can be rewritten as: $$ \begin{equation} \begin{aligned} \lbrack E_t \rbrack = [E_0]+\frac{[E_0][\sigma]}{K_\sigma+[E_0]} \\ +\frac{[E_{0}][\sigma][Pro_1]}{K_{M,\sigma,1}(K_{\sigma}+[E_0])+[E_{0}][\sigma]}(1+k_{cat,ms,1}\frac{L_{m,bic}}{C_m}) \\ + \frac{[E_{0}][\sigma][Pro_2]}{K_{M,\sigma,2}(K_{\sigma}+[E_0])+[E_0][\sigma]}(1+k_{cat,ms,2}\frac{L_{m,csy4}}{C_m}) \end{aligned} \nonumber \tag{2.28} \end{equation} $$ From this equation, we can state that the mRNA accumulation rate of csy4 is: $$\frac{d[mRNA_{csy4}]}{dt}=\frac{k_{cat,ms,2}[E_0][\sigma][Pro_2]} {K_{M,\sigma,2}(K_{\sigma}+[E_0])+[E_0][\sigma]}\\ -\frac{k_{d,m}[mRNA_{csy4}]}{K_{M,d}+[mRNA_{csy4}]} \tag{2.29}$$ Ignoring Csy4 cleavage, we consider the overall mRNA accumulation rate of bi-cistrons to be: $$\frac{d[mRNA_{bic}]}{dt}=\frac{k_{cat,ms,1}[E_0][\sigma][Pro_1]} {K_{M,\sigma,1}(K_{\sigma}+[E_0])+[E_0][\sigma]}\\ -\frac{k_{d,m}[mRNA_{bic}]}{K_{M,d}+[mRNA_{bic}]} \tag{2.30}$$ According to Equations(2.14-2.15), we can derive the rate of accumulation of immature and mature Csy4 protein as: $$\frac{d[Csy4]}{dt} = \frac{k_{cat,ps,c}[mRNA_{csy4}][R_0]} {K_{M,R,c}+[R_0]} - k_{deg}[Csy4] \tag{2.31}$$ Some parts of mature Csy4 forms a complex with mRNA during cleavage, and it can be indicated as: $$mRNA_{bic} + Csy4 \mathop{\rightleftharpoons}\limits^{k_{c1}}\limits_{k_{c-1}} ES_{csy4} \xrightarrow{k_{c2}} Csy4 +mRNA_{gen1} +mRNA_{gen2} \tag{2.32}$$ Therefore, the two mRNA concentration changes produced by cleavage can be represented as: $$ \frac{d[mRNA_{gen1}]}{dt} = \frac{d[mRNA_{gen2}]}{dt} = \frac{k_{c2}[mRNA_{bic}][Csy4]} {K_{M,Csy4}}\\ -\frac{k_{d,m}[mRNA_{gen1}]}{K_{M,d}+[mRNA_{gen1}]} \tag{2.33}$$ Here, we define the parameters $k_{c2}$ and $K_{M,Csy4}$, which in turn are the $k_{cat}$ and $K_M$ values of Csy4 protein. Since the bi-cistron is cleaved, equations (2.30) and (2.31) also need to be modified as follows: $$ \begin{equation} \begin{aligned} \frac{d[mRNA_{bic}]}{dt} =\frac{k_{cat,ms,1}[E_0][\sigma][Pro_1]} {K_{M,\sigma,1}(K_{\sigma}+[E_0])+[E_0][\sigma]} -\frac{k_{d,m}[mRNA_{bic}]}{K_{M,d}+[mRNA_{bic}]} \\+\frac{k_{c-1}[mRNA_{bic}][Csy4]} {K_{M,Csy4}} - k_{c1}[mRNA_{bic}][Csy4] \end{aligned} \nonumber \tag{2.34} \end{equation} $$ Thus changes in Csy4 concentrations can also be corrected as: $$ \begin{equation} \begin{aligned} \frac{d[Csy4]}{dt} = \frac{k_{cat,ps,c}[mRNA_{csy4}][R_0]} {K_{M,R,c}+[R_0]} - k_{deg}[Csy4] \\ +\frac{(k_{c-1}+k_{c2})[mRNA_{bic}][Csy4]} {K_{M,Csy4}} - k_{c1}[mRNA_{bic}][Csy4] \end{aligned} \nonumber \tag{2.35} \end{equation} $$ As for protein synthesis, we consider that the kinetics of the cleaved cistrons are consistent with the single cistron, and its $k_{cat}$ won’t be affected by RBS, while the rate of protein synthesis of the uncleaved polycistrons is consistent with the polycistron translation rule. Since we obtained energy-based upstream and downstream polycistron expression relationships (Equations. 3.19-3.25), we can continue to model the system based on the M-M equation. For the CRISPR hairpin structure greatly increases the energy to unfold the secondary structure, we consider that the uncleaved polycistrons express mainly the first cistronic protein while the second one is expressed at a very low level, which is also consistent with the model in [1]. Thus, the protein expression of genes 1, 2 can be expressed as: $$\frac{d[P_{1,imm}]}{dt} = \frac{k_{cat,ps} [mRNA_{gen1}][R_0]}{K_{M,R}+[R_0]} \\ +\frac{k_{cat,ps,1} [mRNA_{bic}][R_0]}{K_{M,R,1}+[R_0]} - k_{mat}[P_{1,imm}] \tag{2.36}$$ $$\frac{d[P_{2,imm}]}{dt} = \frac{k_{cat,ps} [mRNA_{gen1}][R_0]}{K_{M,R}+[R_0]} \\ +\frac{k_{cat,ps,2} [mRNA_{bic}][R_0]}{K_{M,R,2}+[R_0]} - k_{mat}[P_{2,imm}] \tag{2.37}$$ $$\frac{d[P_{1,mat}]}{dt} = k_{mat}[P_{1,imm}] -k_{deg}[P_{1,mat}] \tag{2.38}$$ $$\frac{d[P_{2,mat}]}{dt} = k_{mat}[P_{2,imm}] -k_{deg}[P_{2,mat}] \tag{2.39}$$ Among them, $k_{cat,ps,1}$ and $K_{M,R,1}$ can be correlated with the rate of protein synthesis. In addition, the concentration of ribosomes bound to each mRNA can be expressed sequentially as is shown below: $$[R_{b,bic}] = \frac{[R_0][mRNA_{bic}]}{K_{M,R,1}+[R_0]} (1+k_{cat,ps,1}\frac{L_{m,bic}}{C_p}) \tag{2.40}$$ $$[R_{b,gen1}] = \frac{[R_0][mRNA_{gen1}]}{K_{M,R}+[R_0]} (1+k_{cat,ps}\frac{L_{m,gen1}}{C_p}) \tag{2.41}$$ $$[R_{b,gen2}] = \frac{[R_0][mRNA_{gen2}]}{K_{M,R}+[R_0]} (1+k_{cat,ps}\frac{L_{m,gen2}}{C_p}) \tag{2.42}$$ $$[R_{b,csy4}] = \frac{[R_0][mRNA_{csy4}]}{K_{M,R,c}+[R_0]} (1+k_{cat,ps,c}\frac{L_{m,csy4}}{C_p}) \tag{2.43}$$ $$ \begin{equation} \begin{aligned} \lbrack R_t \rbrack = [R_0] + \frac{[R_0][mRNA_{bic}]}{K_{M,R,1}+[R_0]} (1+k_{cat,ps,1}\frac{L_{m,bic}}{C_p}) \\ + \frac{[R_0][mRNA_{gen1}]}{K_{M,R}+[R_0]} (1+k_{cat,ps}\frac{L_{m,gen1}}{C_p})\\ + \frac{[R_0][mRNA_{gen2}]}{K_{M,R}+[R_0]} (1+k_{cat,ps}\frac{L_{m,gen2}}{C_p}) \\ +\frac{[R_0][mRNA_{csy4}]}{K_{M,R,c}+[R_0]} (1+k_{cat,ps,c}\frac{L_{m,csy4}}{C_p}) \end{aligned} \nonumber \tag{2.44} \end{equation} $$ Having known the total core enzyme and ribosome amounts, we can solve the differential equations above to simulate changes in mRNA and protein molecular concentrations to better understand our model.

Prokaryotic RNAlpha Model Solving

Next, we substituted the data from previous studies to simulate the gene expression of the two models. Here we model the first two CDS of the cistron with deGFP and deRFP. The upstream promoter of the polycistron is P70d and the one for Csy4 is P70b. All three RBS used were determined using RBS1,2,3 obtained from [3]. For RNAlpha, the equations to be solved are rewritten as follows: $$\frac{d[mRNA_{csy4}]}{dt}=\frac{k_{cat,ms,b}[E_0][\sigma][P70b]} {K_{M,\sigma,b}(K_{\sigma}+[E_0])+[E_0][\sigma]} -\frac{k_{d,m}[mRNA_{csy4}]}{K_{M,d}+[mRNA_{csy4}]} $$ \begin{equation} \begin{aligned} \frac{d[mRNA_{bic}]}{dt} &=\frac{k_{cat,ms,a}[E_0][\sigma][P70d]} {K_{M,\sigma,a}(K_{\sigma}+[E_0])+[E_0][\sigma]} -\frac{k_{d,m}[mRNA_{bic}]}{K_{M,d}+[mRNA_{bic}]}\\ &+\frac{k_{c-1}[mRNA_{bic}][Csy4]} {K_{M,Csy4}} - k_{c1}[mRNA_{bic}][Csy4] \end{aligned} \nonumber \end{equation} $$ \frac{d[mRNA_{gfp}]}{dt} = \frac{k_{c2}[mRNA_{bic}][Csy4]} {K_{M,Csy4}} -\frac{k_{d,m}[mRNA_{gfp}]}{K_{M,d}+[mRNA_{gfp}]} $$ $$ \frac{d[mRNA_{rfp}]}{dt} = \frac{k_{c2}[mRNA_{bic}][Csy4]} {K_{M,Csy4}} -\frac{k_{d,m}[mRNA_{rfp}]}{K_{M,d}+[mRNA_{rfp}]} $$ \begin{equation} \begin{aligned} \frac{d[Csy4]}{dt} = \frac{k_{cat,ps,3}[mRNA_{csy4}][R_0]} {K_{M,R,3}+[R_0]} - k_{deg}[Csy4] \\ +\frac{(k_{c-1}+k_{c2})[mRNA_{bic}][Csy4]} {K_{M,Csy4}} - k_{c1}[mRNA_{bic}][Csy4] \end{aligned} \nonumber \end{equation} $$\frac{d[deGFP_{imm}]}{dt} = \frac{k_{cat,ps} [mRNA_{gfp}][R_0]}{K_{M,R}+[R_0]}\\ +\frac{k_{cat,ps,1} [mRNA_{bic}][R_0]}{K_{M,R,1}+[R_0]} - k_{mat}[deGFP_{imm}] $$ $$\frac{d[deRFP_{imm}]}{dt} = \frac{k_{cat,ps} [mRNA_{rfp}][R_0]}{K_{M,R}+[R_0]}\\ +\frac{k_{cat,ps,2} [mRNA_{bic}][R_0]}{K_{M,R,2}+[R_0]} - k_{mat}[deRFP_{imm}]$$ $$\frac{d[deGFP_{mat}]}{dt} = k_{mat}[deGFP_{imm}] -k_{deg}[deGFP_{mat}]$$ $$\frac{d[deRFP_{mat}]}{dt} = k_{mat}[deRFP_{imm}] -k_{deg}[deRFP_{mat}]$$ \begin{equation} \begin{aligned} \lbrack E_t \rbrack = [E_0]+\frac{[E_0][\sigma]}{K_\sigma+[E_0]} +\frac{[E_{0}][\sigma][P70d]}{K_{M,\sigma,a}(K_{\sigma}+[E_0])+[E_{0}][\sigma]}(1+k_{cat,ms,d}\frac{L_{m,bic}}{C_m}) \\ + \frac{[E_{0}][\sigma][P70b]}{K_{M,\sigma,b}(K_{\sigma}+[E_0])+[E_0][\sigma]}(1+k_{cat,ms,b}\frac{L_{m,csy4}}{C_m}) \end{aligned} \nonumber \end{equation} $$ \begin{equation} \begin{aligned} \lbrack R_t \rbrack = [R_0] + \frac{[R_0][mRNA_{bic}]}{K_{M,R,1}+[R_0]} (1+k_{cat,ps,1}\frac{L_{m,rfp}}{C_p})\\ + \frac{[R_0][mRNA_{gfp}]}{K_{M,R}+[R_0]} (1+k_{cat,ps}\frac{L_{m,gfp}}{C_p})\\ +\frac{[R_0][mRNA_{rfp}]}{K_{M,R}+[R_0]} (1+k_{cat,ps}\frac{L_{m,rfp}}{C_p})\\ +\frac{[R_0][mRNA_{csy4}]}{K_{M,R,3}+[R_0]} (1+k_{cat,ps,3}\frac{L_{m,csy4}}{C_p}) \end{aligned} \nonumber \tag{2.45} \end{equation} $$ Based on the experiments in the literature and some reasonable hypotheses, we can obtain the parameters as shown in Table 1.

Table 1. Parameters List for Prokayotic RNAlpha Per Cell

Parameters	Value	Units	Reference
$k_{cat,ms,d}$	0.014	$s^{-1}$	[9]
$k_{cat,ms,b}$	0.012	$s^{-1}$	[3]
$K_{M,\sigma,d}$	1	$nM$	[3]
$K_{M,\sigma,b}$	1	$nM$	[3]
$k_{d,m}$	6.6	$nM/s$	[3]
$K_{M,d}$	8000	$nM$	[3]
$K_{\sigma}$	0.26	$nM$	[3]
$k_{cat,ps}$	0.006	$s^{-1}$	Set
$k_{cat,ps,1}$	0.006	$s^{-1}$	[3]
$k_{cat,ps,2}$	0.0008	$s^{-1}$	Set
$k_{cat,ps,3}$	0.0015	$s^{-1}$	[9]
$k_{c1}$	0.002	$s^{-1}$	Set
$k_{c-1}$	0	$s^{-1}$	Set
$k_{c2}$	0.1	$s^{-1}$	Set
$K_{M,R}$	10	$nM$	Set
$K_{M,R,1}$	10	$nM$	[3]
$K_{M,R,2}$	10	$nM$	[3]
$K_{M,R,3}$	10	$nM$	[3]
$K_{M,Csy4}$	50	$nM$	Set
$k_{mat}$	0.0007	$s^{-1}$	[3]
$k_{deg}$	0.00005	$s^{-1}$	Set
$L_{m,bic}$	1600	$bp$	[9]
$L_{m,gfp}$	800	$bp$	[9]
$L_{m,rfp}$	800	$bp$	[9]
$L_{m,csy4}$	600	$bp$	[1]
$C_{m}$	10	$bp/s$	[3]
$C_{p}$	2.5	$bp/s$	[3]
$[P70d]$	5	$nM$	[3]
$[P70b]$	5	$nM$	[3]
$[\sigma]$	30	$bp/s$	[3]
$[E_t]$	400	$bp/s$	[3]
$[R_t]$	1100	$bp/s$	[3]

We plot the concentration of cistron proteins assuming no restriction of energy supply, also with constant concentrations of DNA, ribosomes, and enzymes involved in transcription-translation in the system, and the results are shown in Figure 6.

Fig 6. Concentrations Of Molecules In Prokaryotic RNAlpha.

We can observe a rapid accumulation of $deGFP_{imm}$ at first due to a large number of transcription and translation of the polycistron mRNA, and a small amount of $deRFP_{imm}$ accumulates at the same time. After a short period of time, the polycistron mRNA is cleaved into two mRNAs, and due to the high degradation rate, the two single cistron mRNAs maintain a lower concentration but had a high translation efficiency. After about 2h, the immature deGFP and deRFP concentrations were close to a steady-state, while the mature deGFP and deRFP concentrations showed a linear growing trend. The above modeling results indicate that our RNAlpha system is theoretically feasible and efficient in prokaryotes and can maintain the proportional expression of both two cistrons.

Eukaryotic Integrated Transcription and Translation Model

For eukaryotic transcription-translation system, there are undoubtedly more elements to consider than in prokaryotic one. For example, the involvement of many protein factors, the influence of the nuclear membrane on the system, and so on. In addition, since the eukaryotic RNAlpha will have a greater affection on the transcription and translation process of the system, it is necessary to introduce more precise models to describe the eukaryotic transcription and translation process.

For eukaryotic organisms, mRNA needs to be processed before it can be transported to the cytoplasm for translation, and there is a certain amount of degradation during the processing. Since the RNAlpha system processes mRNAs outside the nucleus, it is necessary for us to model changes in mRNA concentrations in the nucleus and outside the nucleus separately. Under this regard, we represent the processed mRNA in the nucleus as $mRNA_{nuc}$, while the processed and successfully transported to the outside of the nucleus is $mRNA{cyt}$ . Similar to the original nucleus, the changes of $mRNA_{nuc}$ concentration can be represented as follows: $$\frac{d[mRNA_{nuc}]}{dt} = v_{TC} -v_{mTran}\tag{2.46}$$ Where $v_{TC}$ and $v_{tran}$ indicate the rate of transcribed mature mRNA and mRNA transport out of the nucleus, respectively. While the $mRNA_{cyt}$ concentration changes as: $$\frac{d[mRNA_{cyt}]}{dt} = v_{mTran} -v_{mDeg}\tag{2.47}$$ Where $V_{mDeg}$ indicates the rate of mRNA degradation.

Fig 7. Schematic representation of the eukaryotic transcription process[10].

Since the process of transcription and translation in eukaryotic systems involves many protein factors compared to prokaryotic systems, to facilitate modeling we no longer model the transcription rate based on the concentration of each factor in the system, but instead directly consider the transcription rate as a constant. Referring to our previous work [11], the transcription rate can be expressed as: $$ v_{TC} = f_{mat} \cdot k_{TC} \cdot F \tag{2.48}$$ $k_{TC}$ indicates the corresponding transcription start rate under the normal condition of the promoter, $F$ indicates the coefficient of the impact of transcription factors on the transcription rate, $f_{mat}$ indicates the proportion of transcribed mRNA maturation, and the probability of pre-mRNA maturation is usually only 5-10% [10]. $F$ expresses as: $$ \begin{equation} F = \left\{ \begin{array}{ll} \theta, & activator \\ 1- \theta, & repressor \\ \end{array} \right. \nonumber \tag{2.49} \end{equation} $$ According to the Hill equation, the ligand-bound receptor protein concentration fraction $\theta$ can be expressed by the ligand concentration $[L]$, the dissociation constant $k_d$, and the Hill coefficient $n$. $$\theta = \frac{[L]^n}{K_d^n+[L]^n} \tag{2.50}$$ Thus, when the transcription factor acts as an activator, $v_{TC}$ is indicated as: $$v_{TC} = f_{mat} \cdot k_{TC} \frac{[acti]^n}{K^n_A +[acti]^n} =\frac{f_{mat} k_{TC}}{1+(\frac{K_d}{[acti]})^n} \tag{2.51}$$ When transcription factors act as inhibitors, $v_{TC}$ is indicated as: $$v_{TC} = f_{mat}\cdot k_{TC}(1-\frac{[repr]^n}{K^n_A +[repr]^n}) =\frac{f_{mat} k_{TC}}{1+(\frac{[repr]}{K_d})^n} \tag{2.52}$$ For the transportation of mRNA inside the nucleus, it is exported to the cytoplasm in the form of RNP through the nuclear pore; this process is expressed as: $$v_{mTran} = k_{Tran}[mRNA_{nuc}] \tag{2.53}$$ Where $k_{Tran}$ indicates the rate of mRNA transportation to the outside of the nucleus. For mRNA degradation, eukaryotes include $5'\rightarrow 3'$ and $3'\rightarrow 5'$ two degradation pathways. The former one is first decapping by enzyme DCP2, then digested by an exonuclease; The latter one is degradated by the exosome complex [12]. Therefore, we consider that the degradation processing rate is: $$v_{mDeg} = (\delta_{5d}+\delta_{3d})[mRNA_{cyt}] \tag{2.54}$$ where $\delta_{5d}$,$\delta_{3d}$ denotes the rate of mRNA degradation starting from the 5' and 3' end in turn. In summary, the dynamic changes of mRNA can be expressed as follows: \begin{equation} \frac{d[mRNA_{nuc}]}{dt} = \left\{ \begin{array}{ll} \frac{f_{mat} k_{TC}}{1+(\frac{K_d}{[acti]})^n} -k_{Tran}[mRNA_{nuc}], & activator \\ \frac{f_{mat} k_{TC}}{1+(\frac{[repr]}{K_d})^n} -k_{Tran}[mRNA_{nuc}], & repressor \\ \end{array} \right. \nonumber \tag{2.55} \end{equation} $$\frac{d[mRNA_{cyt}]}{dt} = k_{Tran}[mRNA_{nuc}] -(k_{5d}+k_{3d})[mRNA_{cyt}]\tag{2.56}$$ For proteins, due to the prevalence of protein processing after translation in eukaryotes, we consider about their mature and immature states. Their dynamics changes are expressed as： $$\frac{d[P_{imm}]}{dt} = k_{TL}[mRNA_{cyt}] -k_{mat}[P_{imm}] \tag{2.57}$$ $$\frac{d[P_{mat}]}{dt} = k_{mat}[P_{imm}] - \beta[P_{mat}] \tag{2.58}$$

Where $\beta$ represents the rate constant for protein degradation.

Eukaryotic RNAlpha System Model

For eukaryotes, we hope to use a mathematical model to analyze the feasibility of this system. From previous studies, we found that in Saccharomyces cerevisiae, the insertion of a CRISPR hairpin structure at the 5' end lowered the protein expression level by about 20-fold after Csy4 cleavage, while the 3' end insertion only decreased by about 2.5-fold, indicating that the 3' end hairpin structure can largely replace the poly(A) tail [2].

Our system is consistent with the prokaryotic system as shown in Figure 8, except we plan to perform recapping of the cleaved mRNAs in the cytoplasm in order to ensure protein expression. However the 3' hairpin structure can maintain the stability of mRNA, thus there is no need for the tailing process. The complex of two heterologous proteins D1 and D12, which are encoded by cowpox virus, are used in recapping inside the cytoplasm, although they have only been shown to participate in vitro capping. D1 is capable of performing the functions of the three enzymes of nuclear recapping enzymes TPase, MTase, and GTase, whereas D12 has no catalytic activity but stimulates the methyltransferase activity of D1 [13,14]. In addition, we consider that each polycistron is transcribed by a constitutive promoter.

There are still many unknowns about this RNAlpha system. Thus, we hope to determine the feasibility of this system through integrated transcription-translation modeling and further simulate the whole system in dry lab.

Fig 8. RNAlpha Pathway In Eukaryotes.

For dynamic concentration changes of polycistrons, Csy4, and capping enzymes in mRNA, we denote them as: $$\frac{d[mRNA_{bic-nuc}]}{dt}=f_{bic-mat}\cdot k_{TC1} -k_{bic-Tran}[mRNA_{bic-nuc}] \tag{2.59}$$ $$\frac{d[mRNA_{csy4-nuc}]}{dt}=f_{csy4-mat}\cdot k_{TC2} -k_{csy4-Tran}[mRNA_{csy4-nuc}] \tag{2.60}$$ $$\frac{d[mRNA_{cap-nuc}]}{dt}=f_{cap-mat}\cdot k_{TC3} -k_{cap-Tran}[mRNA_{cap-nuc}] \tag{2.61}$$ Consider that proteins in eukaryotic cells need to be processed, so we default that each enzyme has mature and immature state. In the case of Csy4, the mature protein needs to be transported to the nucleus and cleave the polycistron mRNA. Thus the dynamics of Csy4 mRNA and proteins are expressed as: $$\frac{d[mRNA_{csy4-cyt}]}{dt}= k_{csy4-Tran}[mRNA_{csy4-nuc}]\\ - (\delta_{csy4-5d}+\delta_{csy4-3d})[mRNA_{csy4-cyt}] \tag{2.62}$$ $$\frac{d[Csy4_{imm}]}{dt} = k_{TL-csy4}[mRNA_{csy4-cyt}] -k_{csy4-mat}[Csy4_{imm}] \tag{2.63}$$ $$\frac{d[Csy4_{mat}]}{dt} = k_{csy4-mat}[Csy4_{imm}] \\ - \beta_{csy4}[Csy4_{mat}] - T_{csy4}[Csy4_{mat}]\tag{2.64}$$ $$\frac{d[Csy4_{nuc}]}{dt} = T_{csy4}[Csy4_{mat}] - \beta_{csy4'}[Csy4_{mat}] \tag{2.65}$$ where $T_{csy4}$ indicates the efficiency of Csy4 protein translocation into the nucleus, and because of the different concentrations of protein-degrading enzymes in and out of the nucleus, we believe that the rate of Csy4 degradation in the nucleus is $\beta_{csy4'}$. In the case of capping enzymes, we used 2A sequences to maintain the ratio between D1 and D12 in order to obtain a comparable expression of D1 and D12. Here, we believe that the fracture of the 2A sequence can successfully translate D12, while a failed process can only induce the translation of D1. The dynamics of the capping enzyme’s transcription and translation can be represented as: $$\frac{d[mRNA_{cap-cyt}]}{dt}= k_{cap-Tran}[mRNA_{cap-nuc}]\\ - (\delta_{cap-5d}+\delta_{cap-3d})[mRNA_{cap-cyt}] \tag{2.66}$$ $$\frac{d[D1_{imm}]}{dt} = k_{TL-cap}[mRNA_{cap-cyt}] -k_{D1-mat}[D1_{imm}] \tag{2.67}$$ $$\frac{d[D12_{imm}]}{dt} = k_{TL-cap} \cdot e_{2A}[mRNA_{cap-cyt}] -k_{D12-mat}[D12_{imm}] \tag{2.68}$$ Where e2A denotes the 2A cleavage efficiency. Since D1 and D12 need to form a capping enzyme complex in order to be active, we assume the binding dynamics of these two as follows: $$D1_{mat} + D12_{mat} \mathop{\rightleftharpoons}\limits^{k_{Df}}\limits_{k_{Db}} D_{Cap} \tag{2.68}$$ Thus the change in Dcap concentration can be expressed as: $$\frac{d[D_{Cap}]}{dt} = k_{Df}[D1_{mat}][D12_{mat}] -k_{Db}[D_{cap}] \tag{2.69}$$ The concentration changing of free mature D1 and D12 is: $$\frac{d[D1_{mat}]}{dt} = k_{D1-mat}[D1_{imm}] - \beta_{D1}[D1_{mat}]\\ -k_{Df}[D1_{mat}][D12_{mat}] +k_{Db}[D_{cap}] \tag{2.70}$$ $$\frac{d[D12_{mat}]}{dt} = k_{D12-mat}[D12_{imm}] - \beta_{D12}[D12_{mat}]\\ -k_{Df}[D1_{mat}][D12_{mat}] +k_{Db}[D_{cap}] \tag{2.71}$$ Here we think that the double cistrons mRNA in the nucleus needs to be cleaved by Csy4 before it can be transported out of the cell for translation. The number one cistron mRNA is produced directly after cleavage, and the number two cistron mRNA needs to be capped after cleavage for translation. Thus the overall process is expressed as: $$mRNA_{bic-nuc} + Csy4_{nuc} \mathop{\rightleftharpoons}\limits^{k_{c1}}\limits_{k_{c-1}} ES_{csy4} \xrightarrow{k_{c2}} Csy4_{nuc} +mRNA_{1-nuc} +mRNA_{2pre-nuc} \tag{2.72}$$ $$mRNA_{2pre-cyt} + D_{cap} \mathop{\rightleftharpoons}\limits^{k_{d1}}\limits_{k_{d-1}} ES_{cap} \xrightarrow{k_{d2}} D_{cap} +mRNA_{2-cyt} \tag{2.73}$$ In addition, since the 3' end of mRNA1 replaces the poly(A) tail with a hairpin structure, the rate of degradation starting from the 3' end is slightly higher than that of the poly(A)-containing mRNA. Therefore, we set the degradation rate to normal mRNA ratio as $f_{3d}$. On the other hand, we set the degradation rate of $mRNA_{2pre-cyt}$ starting from 5' end and without a cap as a ratio of $f_{5d}$ to normal mRNAs. Furthermore, the efficiency of RFP precursor mRNA transport out of the nucleus is also reduced for mRNAs lacking the cap. We thus consider the mRNA dynamics of several to be expressed as follows: $$\frac{[mRNA_{bic-nuc}]}{dt} = f_{bic-mat}\cdot k_{TC1} - k_{bic-Tran}[mRNA_{bic-nuc}]\\ -\frac{k_{c2}[mRNA_{bic-nuc}][Csy4_{nuc}]}{K_{M,Csy4}} \tag{2.74}$$ $$\frac{d[mRNA_{1-nuc}]}{dt} = \frac{k_{c2}[mRNA_{bic-nuc}][Csy4_{nuc}]}{K_{M,Csy4}}\\ - k_{1-Tran}[mRNA_{1-nuc}] \tag{2.75}$$ $$\frac{d[mRNA_{2pre-nuc}]}{dt} = \frac{k_{c2}[mRNA_{bic-nuc}][Csy4_{nuc}]}{K_{M,Csy4}}\\ - k_{2-Tran}[mRNA_{2pre-nuc}] \tag{2.76}$$ $$\frac{d[mRNA_{1-cyt}]}{dt} = k_{1-Tran}[mRNA_{1-nuc}]\\ - (\delta_{1-5d}+\delta_{1-3d}\cdot f_{3d})[mRNA_{1-cyt}] \tag{2.77}$$ $$ \begin{equation} \begin{aligned} \frac{d[mRNA_{2pre-cyt}]}{dt} = k_{2-Tran}[mRNA_{2pre-nuc}] - \frac{k_{d2}[mRNA_{2pre-cyt}][D_{cap}]}{K_{M,cap}} \\ \cdot (\delta_{2-5d}\cdot f_{5d}+ \delta_{2-3d})[mRNA_{2pre-cyt}] \end{aligned} \nonumber \tag{2.78} \end{equation} $$ $$\frac{d[mRNA_{2-cyt}]}{dt} = \frac{k_{d2}[mRNA_{2pre-cyt}][D_{cap}]}{K_{M,cap}}\\ - (\delta_{2-5d}+\delta_{2-3d})[mRNA_{2-cyt}] \tag{2.79}$$ Similarly, due to the above changes, concentration changes of $Csy4_{nuc}$ and $D_{cap}$ need to be rewritten as: $$ \begin{equation} \begin{aligned} \frac{d[Csy4_{nuc}]}{dt} = T_{csy4}[Csy4_{mat}] - \beta_{csy4'}[Csy4_{mat}] \\ + \frac{(k_{c-1}+k_{c2})[mRNA_{bic-nuc}][Csy4_{nuc}]}{K_{M,csy4}} \\ -k_{c1}[mRNA_{bic-nuc}][Csy4_{nuc}] \end{aligned} \nonumber \tag{2.80} \end{equation} $$ $$ \begin{equation} \begin{aligned} \frac{d[D_{Cap}]}{dt} = k_{Df}[D1_{mat}][D12_{mat}] -k_{Db}[D_{cap}] + \frac{(k_{d-1}+k_{d2})[mRNA_{2pre-cyt}][D_{cap}]}{K_{M,cap}} \\-k_{d1}[mRNA_{2pre-cyt}][D_{cap}] \end{aligned} \nonumber \tag{2.81} \end{equation} $$ Dynamic concentration changes of protein 1, 2 can be expressed as: $$\frac{d[P1_{imm}]}{dt} = k_{TL-p1}[mRNA_{1-cyt}] -k_{p1-mat}[P1_{imm}] \tag{2.82}$$ $$\frac{d[P2_{imm}]}{dt} = k_{TL-p2}[mRNA_{2-cyt}] -k_{p2-mat}[P2_{imm}] \tag{2.83}$$ $$\frac{d[P1_{mat}]}{dt} = k_{p1-mat}[P1_{imm}] - \beta_{p1}[P1_{mat}] \tag{2.84}$$ $$\frac{d[P2_{mat}]}{dt} = k_{p2-mat}[P2_{imm}] - \beta_{p2}[P2_{mat}] \tag{2.85}$$

Eukaryotic RNAlpha Model Solution

Similar to the previous article, we choose the data from real experiments to simulate the performance of RNAlpha. We still make deGFP and deRFP the first and second of the cistrons. The bi-cistron uses a stronger promoter, while Csy4 and capping enzyme use a weaker on upstream, and the strength of RBS is 1>2>4>3. For eukaryotic RNAlpha, the equation to be solved is rewritten as follows: $$\frac{[mRNA_{bic-nuc}]}{dt} = f_{bic-mat}\cdot k_{TC1} - k_{bic-Tran}[mRNA_{bic-nuc}]\\ -\frac{k_{c2}[mRNA_{bic-nuc}][Csy4_{nuc}]}{K_{M,Csy4}} $$ $$\frac{d[mRNA_{csy4-nuc}]}{dt}=f_{csy4-mat}\cdot k_{TC2} -k_{csy4-Tran}[mRNA_{csy4-nuc}]$$ $$\frac{d[mRNA_{cap-nuc}]}{dt}=f_{cap-mat}\cdot k_{TC3} -k_{cap-Tran}[mRNA_{cap-nuc}]$$ $$\frac{d[mRNA_{gfp-nuc}]}{dt} = \frac{k_{c2}[mRNA_{bic-nuc}][Csy4_{nuc}]}{K_{M,Csy4}}\\ - k_{gfp-Tran}[mRNA_{gfp-nuc}] $$ $$\frac{d[mRNA_{rfp,pre-nuc}]}{dt} = \frac{k_{c2}[mRNA_{bic-nuc}][Csy4_{nuc}]}{K_{M,Csy4}}\\ - k_{rfp-Tran}[mRNA_{rfp,pre-nuc}] $$ $$\frac{d[mRNA_{csy4-cyt}]}{dt}= k_{csy4-Tran}[mRNA_{csy4-nuc}]\\ - (\delta_{csy4-5d}+\delta_{csy4-3d})[mRNA_{csy4-cyt}] $$ $$\frac{d[mRNA_{cap-cyt}]}{dt}= k_{cap-Tran}[mRNA_{cap-nuc}] - (\delta_{cap-5d}+\delta_{cap-3d})[mRNA_{cap-cyt}] $$ $$\frac{d[mRNA_{gfp-cyt}]}{dt} = k_{gfp-Tran}[mRNA_{gfp-nuc}]\\ - (\delta_{gfp-5d}+\delta_{gfp-3d}\cdot f_{3d})[mRNA_{gfp-cyt}] $$ \begin{equation} \begin{aligned} \frac{d[mRNA_{rfp,pre-cyt}]}{dt} = k_{rfp-Tran}[mRNA_{rfp,pre-nuc}] - \frac{k_{d2}[mRNA_{rfp,pre-cyt}][D_{cap}]}{K_{M,cap}} \\ -(\delta_{rfp-5d}\cdot f_{5d}+\delta_{rfp-3d})[mRNA_{rfp,pre-cyt}] \end{aligned} \nonumber \end{equation} $$\frac{d[mRNA_{rfp-cyt}]}{dt} = \frac{k_{d2}[mRNA_{rfp,pre-cyt}][D_{cap}]}{K_{M,cap}} - (\delta_{rfp-5d}+\delta_{rfp-3d})[mRNA_{rfp-cyt}] $$ $$\frac{d[Csy4_{imm}]}{dt} = k_{TL-csy4}[mRNA_{csy4-cyt}] -k_{csy4-mat}[Csy4_{imm}] $$ $$\frac{d[Csy4_{mat}]}{dt} = k_{csy4-mat}[Csy4_{imm}] - \beta_{csy4}[Csy4_{mat}] - T_{csy4}[Csy4_{mat}]$$ $$\frac{d[Csy4_{nuc}]}{dt} = T_{csy4}[Csy4_{mat}] - \beta_{csy4'}[Csy4_{mat}] + \frac{(k_{c-1}+k_{c2})[mRNA_{bic-nuc}][Csy4_{nuc}]}{K_{M,csy4}} \\-k_{c1}[mRNA_{bic-nuc}][Csy4_{nuc}] $$ $$\frac{d[D1_{imm}]}{dt} = k_{TL-cap}[mRNA_{cap-cyt}] -k_{D1-mat}[D1_{imm}]$$ $$\frac{d[D12_{imm}]}{dt} = k_{TL-cap} \cdot e_{2A}[mRNA_{cap-cyt}] -k_{D12-mat}[D12_{imm}]$$ $$\frac{d[D1_{mat}]}{dt} = k_{D1-mat}[D1_{imm}] - \beta_{D1}[D1_{mat}] -k_{Df}[D1_{mat}][D12_{mat}] +k_{Db}[D_{cap}]$$ $$\frac{d[D12_{mat}]}{dt} = k_{D12-mat}[D12_{imm}] - \beta_{D12}[D12_{mat}] -k_{Df}[D1_{mat}][D12_{mat}] +k_{Db}[D_{cap}]$$ $$\frac{d[D_{Cap}]}{dt} = k_{Df}[D1_{mat}][D12_{mat}] -k_{Db}[D_{cap}]\\ + \frac{(k_{d-1}+k_{d2})[mRNA_{rfp,pre-cyt}][D_{cap}]}{K_{M,cap}} -k_{d1}[mRNA_{rfp,pre-cyt}][D_{cap}]$$ $$\frac{d[deGFP_{imm}]}{dt} = k_{TL-gfp}[mRNA_{gfp-cyt}] -k_{gfp-mat}[deGFP_{imm}]$$ $$\frac{d[deRFP_{imm}]}{dt} = k_{TL-rfp}[mRNA_{rfp-cyt}] -k_{rfp-mat}[deRFP_{imm}]$$ $$\frac{d[deGFP_{mat}]}{dt} = k_{gfp-mat}[deGFP_{imm}] - \beta_{gfp}[deGFP_{mat}]$$ $$\frac{d[deRFP_{mat}]}{dt} = k_{rfp-mat}[deRFP_{imm}] - \beta_{rfp}[deRFP_{mat}] \tag{2.87}$$

Based on experiments in the literature and calculations from our previous work, we can obtain the parameters shown in Table 2:

Table 2. Parameters List for Eukayotic RNAlpha Per Cell

Parameters	Value	Units	Reference
$f_{bic-mat}$	0.1	-	[10]
$f_{csy4-mat}$	0.1	-	[10]
$f_{cap-mat}$	0.1	-	[10]
$k_{TC1}$	0.2	$nM/s$	[11]
$k_{TC2}$	0.03	$nM/s$	[11]
$k_{TC3}$	0.5	$nM/s$	[11]
$k_{bic-Tran}$	0.2	$s^{-1}$	Set
$k_{csy4-Tran}$	0.3	$s^{-1}$	Set
$k_{cap-Tran}$	0.1	$s^{-1}$	Set
$k_{gfp-Tran}$	0.3	$s^{-1}$	Set
$k_{rfp-Tran}$	0.1	$s^{-1}$	Set
$\delta_{gfp-5d}$	0.0008	$s^{-1}$	[11]
$\delta_{gfp-3d}$	0.0002	$s^{-1}$	[11]
$\delta_{rfp-5d}$	0.0008	$s^{-1}$	[11]
$\delta_{rfp-3d}$	0.0002	$s^{-1}$	[11]
$\delta_{csy4-5d}$	0.001	$s^{-1}$	[11]
$\delta_{csy4-3d}$	0.0003	$s^{-1}$	[11]
$\delta_{cap-5d}$	0.001	$s^{-1}$	[11]
$\delta_{cap-3d}$	0.0003	$s^{-1}$	[11]
$f{3d}$	2.5	-	[2]
$f{5d}$	20	-	[2]
$k_{TL-gfp}$	40	$s^{-1}$	[11]
$k_{TL-rfp}$	25	$s^{-1}$	[11]
$k_{TL-csy4}$	15	$s^{-1}$	[11]
$k_{TL-gfp}$	20	$s^{-1}$	[11]
$e_{2A}$	0.8	-	[16]
$k_{c1}$	0.002	$s^{-1}$	Set
$k_{c2}$	0.1	$s^{-1}$	Set
$k_{c-1}$	0.0001	$s^{-1}$	Set
$k_{d1}$	0.001	$s^{-1}$	Set
$k_{d2}$	0.01	$s^{-1}$	Set
$k_{d-1}$	0.0001	$s^{-1}$	Set
$K_{M,csy4}$	50	$nM$	Set
$K_{M,cap}$	10	$nM$	Set
$k_{gfp-mat}$	0.001	$s^{-1}$	[15]
$k_{rfp-mat}$	0.001	$s^{-1}$	[15]
$k_{csy4-mat}$	0.0005	$s^{-1}$	Set
$k_{D1-mat}$	0.0001	$s^{-1}$	Set
$k_{D12-mat}$	0.0002	$s^{-1}$	Set
$k_{Df}$	0.1	$s^{-1}$	Set
$k_{Db}$	0.001	$s^{-1}$	Set
$T_{csy4}$	0.8	$s^{-1}$	Set
$\beta_{gfp}$	0.0003	$s^{-1}$	[17]
$\beta_{rfp}$	0.0003	$s^{-1}$	[17]
$\beta_{csy4}$	0.0001	$s^{-1}$	[17]
$\beta_{csy4'}$	0.0004	$s^{-1}$	[17]
$\beta_{D1}$	0.0002	$s^{-1}$	[17]
$\beta_{D12}$	0.0005	$s^{-1}$	[17]

In this regard, we plot the concentration of the above-mentioned components ignoring the limitation of the energy supply, and considering there are the constant concentration of DNA, ribosomes, and enzymes involved in transcription and translation in the system, as shown below.

Fig 9. Intranuclear mRNA concentration in RNAlpha.

The changes in mRNA concentration in the nucleus are mainly influenced by the transcription rate as well as the exocytosis rate. We can observe that the intranuclear mRNA concentration of Csy4 and capping enzyme reached the steady-state within a short period of time, and the double cis-transon was also rapidly divided into GFP and RFP-pre mRNAs under the action of Csy4. The lack of a cap on RFP-pre results in a greatly reduced nucleation efficiency, which makes its steady-state concentration several times higher than that of GFP, the ratio that is consistent with the nucleation efficiency we previously set.

Fig 10. Cytoplasmic mRNA concentration in RNAlpha.

In the cytoplasm, the capping enzyme’s mRNA is several times longer than Csy4’s, making it less efficient at transporting out of the nuclear than Csy4, which results in a lower concentration of it. The concentration of deGFP mRNA, on the other hand, was higher than that of deRFP in the early stages, then quickly exceeds GFP’s due to the action of the capping enzyme. Since the 3' end of GFP mRNA has a hairpin structure, making its degradation efficiency slightly higher than that of mRNAs that containing poly(A) tails, its concentration is lower than that of deRFP at steady state. However, since the cytoplasmic capping process is the one we envisioned, we are unable to experimentally verify its feasibility at this time. In addition, because this enzyme caps methylation in a different way than in conventional organisms, we have no way of knowing whether it affects mRNA stability, so we consider it here to be the same as a normal cap.

Fig 11. Enzyme Protein Concentration in RNAlpha.

For both exogenous enzyme protein concentrations, the concentrations of mature Csy4, D1, and D12 in the cytoplasm are consistently quite low. Since Csy4 is rapidly transported to the nucleus and there is a great tendency for D1 and D12 to form complexes, the concentrations of these three as intermediate forms are maintained at low levels. In contrast, immature D1 and D12 both continue to accumulate at relatively high rates, due to higher rates of translation compared to protein maturation. The concentrations of Csy4 protein and the capping enzyme complex in the nucleus showed an exponential increase, which provided a guarantee for the mRNA processing of the target gene.

Fig 12. Target Protein Concentration in RNAlpha.

Ultimately, we observed a delayed effect in the accumulation of immature and mature proteins, which was caused by the fact that the mRNAs of both were accumulated only after Csy4 processing. The final deGFP and deRFP concentrations were close to the steady-state, showing a certain proportional state. The concentration ratios of mature deGEP and deRFP are close to those caused by RBS initiation intensity, which theoretically validates the ability of the eukaryotic mRNAlpha system to achieve predictions of target protein expression.

Conclusion

In summary, we have modeled simulations of the mRNAlpha system in prokaryotes and eukaryotes, and conclude that the quantitative expression of proteins can be predicted by the RNAlpha system. In addition, we provide a set of models describing transcription and translation in prokaryotes and eukaryotes with reference to the work of other people, hoping that this system can be helpful to other teams. Subsequently, we also hope to continue to validate the feasibility of the RNAlpha system in experiments to Provides a universal quantitative polycistron expression system.

Appendix

    % ProRNAlpha.m：
    function [f] = ProRNAlpha(t, x)

    kcatmsa = 0.014;
    kcatmsb = 0.012;
    kMsa = 1;
    kMsb = 1;
    kdm = 6.6;
    kMd = 8000;
    ks = 0.26;
    kcatps = 0.006;
    kcatps1 = 0.006;
    kcatps2 = 0.0008;
    kcatps3 = 0.0015;
    kc1 = 0.002;
    kc_1 = 0;
    kcatc = 0.1;
    kc2 = 0.1;
    kMr = 10;
    kMr1 = 10;
    kMr2 = 10;
    kMr3 = 10;
    kMc = 50;
    kmat = 0.0007;
    kdeg = 0.00005;
    %lmb = 1600;
    %lmg = 800;
    %lmr = 800;
    %cm = 10;
    %cp = 2.5;
    s=30;
    p70a = 5;
    p70b = 5;
    e0 = 400;
    r0 =1100;
    
    mRNAc=x(1);
    mRNAb=x(2);
    mRNAr=x(3);
    mRNAg=x(4);
    csy4m=x(5);
    gfpi=x(6);
    rfpi=x(7);
    gfpm=x(8);
    rfpm=x(9);
    
    dmRNAcdt = (kcatmsb*e0*s*p70b)/(kMsb*(ks+e0)+(e0*s))-(kdm*mRNAc)/(kMd+mRNAc);
    dmRNAbdt = (kcatmsa*e0*s*p70a)/(kMsa*(ks+e0)+(e0*s))-(kdm*mRNAb)/(kMd+mRNAb)...
    +(kc_1*mRNAb*csy4m)/(kMc) - kc1*mRNAb*csy4m;
    dmRNArdt = (kcatc*mRNAb*csy4m)/(kMd)-(kdm*mRNAr)/(kMd+mRNAr);
    dmRNAgdt = (kcatc*mRNAb*csy4m)/(kMd)-(kdm*mRNAg)/(kMd+mRNAg);
    dcsy4mdt = (kcatps3*mRNAc*r0)/(kMr3+r0)-kdeg*csy4m+((kc_1+kc2)...
    *mRNAb*csy4m)/(kMc)-kc1*mRNAb*csy4m;
    dgfpidt = (kcatps*mRNAg*r0)/(kMr+r0)+(kcatps1*mRNAb*r0)/(kMr1+r0)-kmat*gfpi;
    drfpidt = (kcatps*mRNAr*r0)/(kMr+r0)+(kcatps2*mRNAb*r0)/(kMr2+r0)-kmat*rfpi;
    dgfpmdt = kmat*gfpi - kdeg*gfpm;
    drfpmdt = kmat*rfpi - kdeg*rfpm;
    
    f = [dmRNAcdt; dmRNAbdt; dmRNArdt; dmRNAgdt; dcsy4mdt;... 
    dgfpidt; drfpidt; dgfpmdt;drfpmdt;];
    
    end
    
    %tspan = 0:1:15000;
    %initial = [0,0,0,0,0,0,0,0,0];
    
    %[t,x] = ode45( @ProRNAlpha, tspan, initial);


    % EuRNAlpha.m：
    function [f] = EuRNAlpha(t, x)

    fbm = 0.1;
    f4m = 0.1;
    fcm = 0.1;
    kTC1 = 0.2;
    kTC2 = 0.03;
    kTC3 = 0.05;
    kbT = 0.2;
    k4T = 0.3;
    kcT = 0.1;
    kgT = 0.3;
    krT = 0.1;
    dg5d = 0.0008;
    dg3d = 0.0002;
    dr5d = 0.0008;
    dr3d = 0.0002;
    d45d = 0.001;
    d43d = 0.0003;
    dc5d = 0.001;
    dc3d = 0.0003;
    f3d = 2.5;
    f5d = 20;
    kTLg = 40;
    kTLr = 25;
    kTL4 = 15;
    kTLc = 20;
    e2A = 0.8;
    kc1= 0.002;
    kc2 = 0.1;
    kc_1 = 0.0001;
    kd1 = 0.001;
    kd2 = 0.01;
    kd_1 = 0.0001;
    KM4 = 50;
    KMc = 10;
    kgm = 0.001;
    krm = 0.001;
    k4m = 0.0005;
    kD1m = 0.0001;
    kD12m = 0.0002;
    kDf = 0.05;
    kDb = 0.001;
    T4 = 0.8;
    bg = 0.0003;
    br = 0.0003;
    b4 = 0.001;
    b4p = 0.0004;
    bD1 = 0.0002;
    bD12 = 0.0005;
    
    
    mRNAbn=x(1);
    mRNA4n=x(2);
    mRNAcn=x(3);
    mRNAgn=x(4);
    mRNArpn=x(5);
    mRNA4c=x(6);
    mRNAcc=x(7);
    mRNAgc=x(8);
    mRNArpc=x(9);
    mRNArc = x(10);
    Csy4i = x(11);
    Csy4m = x(12);
    Csy4n = x(13);
    D1i = x(14);
    D12i = x(15);
    D1m = x(16);
    D12m = x(17);
    Dcap = x(18);
    GFPi = x(19);
    RFPi = x(20);
    GFPm = x(21);
    RFPm = x(22);
    
    dmRNAbndt = fbm*kTC1-kbT*mRNAbn - (kc2*mRNAbn*Csy4n)/KM4;
    dmRNA4ndt = f4m* kTC2-k4T*mRNA4n;
    dmRNAcndt = fcm* kTC3-kcT*mRNAcn;
    dmRNAgndt = (kc2*mRNAbn*Csy4n)/(KM4) - kgT*mRNAgn;
    dmRNArpndt = (kc2*mRNAbn*Csy4n)/(KM4) - krT*mRNArpn;
    dmRNA4cdt = k4T*mRNA4n - (d45d +d43d)*mRNA4c;
    dmRNAccdt = kcT*mRNA4n - (dc5d +dc3d)*mRNAcc;
    dmRNAgcdt = kgT*mRNAgn - (dg5d + dg3d*f3d)*mRNAgc;
    dmRNArpcdt = krT*mRNArpn - (kd2*mRNArpc*Dcap)/(KMc) - (dr5d*f5d+dr3d)*mRNArpc;
    dmRNArcdt = (kd2*mRNArpc*Dcap)/(KMc) - (dr5d+dr3d)*mRNArc;
    dCsy4idt = kTL4*mRNAcc - k4m*Csy4i;
    dCsy4mdt = k4m*Csy4i - b4*Csy4m - T4*Csy4m;
    dCsy4ndt = T4*Csy4m -b4p*Csy4m + ((kc_1+kc2)*mRNAbn*Csy4n)/KM4-kc1*mRNAbn*Csy4n;
    dD1idt = kTLc*mRNAcc - kD1m*D1i;
    dD12idt = kTLc*e2A*mRNAcc - kD12m*D12i;
    dD1mdt = kD1m*D1i - bD1*D1m - kDf*D1m*D12m + kDb*Dcap;
    dD12mdt = kD12m*D12i - bD12*D12m - kDf*D1m*D12m + kDb*Dcap;
    dDcapdt = kDf*D1m*D12m - kDb*Dcap+ ((kd_1+kd2)*mRNArpc*Dcap)/KMc-kd1*mRNArpc*Dcap;
    dGFPidt = kTLg*mRNAgc-kgm*GFPi;
    dRFPidt = kTLr*mRNArc-krm*RFPi;
    dGFPmdt = kgm*GFPi -bg*GFPm;
    dRFPmdt = krm*RFPi -br*RFPm;
    
    f = [dmRNAbndt; dmRNA4ndt; dmRNAcndt; 
        dmRNAgndt; dmRNArpndt;dmRNA4cdt; dmRNAccdt;
        dmRNAgcdt; dmRNArpcdt; dmRNArcdt;
        dCsy4idt; dCsy4mdt;dCsy4ndt;
        dD1idt;dD12idt; dD1mdt;dD12mdt;dDcapdt;
        dGFPidt; dRFPidt; dGFPmdt; dRFPmdt;];
    end
    
    %tspan = 0:1:21600;
    %initial = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
    
    %[t,x] = ode45( @EuRNAlpha, tspan, initial);

Reference

1. Qi, Lei S., et al. "RNA processing enables predictable programming of gene expression." Nature Biotechnology 30.10 (2012): 1002-1006.

2. Borchardt, Erin K., et al. "Controlling mRNA stability and translation with the CRISPR endoribonuclease Csy4.." RNA 21.11 (2015): 1921-1930.

3. Marshall, Ryan, and Vincent Noireaux. "Quantitative modeling of transcription and translation of an all-E. coli cell-free system." Scientific Reports 9.1 (2019).

4. Karzbrun, Eyal, et al. "Coarse-Grained Dynamics of Protein Synthesis in a Cell-Free System." Physical Review Letters 106.4 (2011).

5. Bremer, Hans, Patrick P. Dennis, and Mans Ehrenberg. "Free RNA polymerase and modeling global transcription in Escherichia coli.." Biochimie 85.6 (2003): 597-609.

6. Tian, Tian, and Howard M. Salis. "A predictive biophysical model of translational coupling to coordinate and control protein expression in bacterial operons." Nucleic Acids Research 43.14 (2015): 7137-7151.

7. Lorenz, Ronny, et al. "ViennaRNA Package 2.0." Algorithms for Molecular Biology 6.1 (2011): 26-26.

8. Levinkarp, Ayelet, et al. "Quantifying Translational Coupling in E. coli Synthetic Operons Using RBS Modulation and Fluorescent Reporters." ACS Synthetic Biology 2.6 (2013): 327-336.

9. Garamella, Jonathan, et al. "The All E. coli TX-TL Toolbox 2.0: A Platform for Cell-Free Synthetic Biology." ACS Synthetic Biology 5.4 (2016): 344-355.

10. Deleon, Samadar Bentabou, and Eric H. Davidson. "Modeling the dynamics of transcriptional gene regulatory networks for animal development." Developmental Biology 325.2 (2009): 317-328.

11. https://2019.igem.org/Team:SCU-China/DelaySystem

12. Goldstrohm, Aaron C., and Marvin Wickens. "Multifunctional deadenylase complexes diversify mRNA control." Nature Reviews Molecular Cell Biology 9.4 (2008): 337-344.

13. Fuchs, Annalisa, Ancilla Neu, and Remco Sprangers. "A general method for rapid and cost-efficient large-scale production of 5′ capped RNA." RNA 22.9 (2016): 1454-1466.

14. Kyrieleis, O. J P, et al. "Crystal Structure of Vaccinia Virus mRNA Capping Enzyme Provides Insights into the Mechanism and Evolution of the Capping Apparatus." Structure 22.3 (2014): 452-465.

15. Iizuka, Ryo, Mai Yamagishishirasaki, and Takashi Funatsu. "Kinetic study of de novo chromophore maturation of fluorescent proteins." Analytical Biochemistry 414.2 (2011): 173-178.

16. Souzamoreira, Tatiana M., et al. "Screening of 2A peptides for polycistronic gene expression in yeast.." Fems Yeast Research 18.5 (2018).

17. Belle, Archana, et al. "Quantification of protein half-lives in the budding yeast proteome." Proceedings of the National Academy of Sciences of the United States of America 103.35 (2006): 13004-13009.

RNAlphABA >

Team:SCU-China/RNAlpha

RNALPHA MODEL