Rapidemic
Models
Welcome to the modeling page! On this page, we introduce the two models that we developed in this project. The first model is an epidemiological model based on the SEIRQ (susceptible-exposed-infectious-resistant-quarantined) framework, which we slightly modified to explore the effects of disease monitoring and intervention. The second model is a kinetic representation of our sequence detection kit. This model is used to assess the performance of our kit, troubleshoot remaining problems, and guide future optimization of our technique. A more in-depth explanation of these models can be found via the links below!
We have built an ODE framework to simulate the effects of rapid and accurate testing on the spread of an infectious disease during an outbreak. This model shows the importance of large testing capacity in combination with isolation to control an outbreak. Thus, it underlines the need for rapid diagnostic tools (RDTs) that can be quickly produced and distributed to the site of testing at the early stage of an outbreak. In other words, it shows why there is such a strong need for kits like our rapid epidemic response kit Rapidemic.
Aims
This model aims to explore the effect of diagnostic surveillance, represented by the parameter D (surveillance period), on the outcome of an outbreak. In general, a smaller value of this parameter signifies intensified testing. Rapid diagnostic tools (RDTs) are assumed to allow more intensive diagnostic surveillance than tests that require a laboratory, due to their general ease of use. With this model, we show the effect of diagnostic surveillance on the spread of an infectious disease, and thereby how our Rapidemic rapid epidemic response kit could impact the world. Additionally, the model explores the effects of a sudden increase in diagnostic surveillance, resembling the deployment of a newly developed (rapid) diagnostic tool. Aside from its main use, the model can also be used to simulate the effects of different false positive and false negative rates (i.e. the accuracy of the diagnostic test), as well as of self-quarantine efforts, on the outcome of disease management efforts.
Model Structure
The model structure (Fig. 1) is based on the SEIR model, in which S, E, I and R represent the susceptible, exposed, infectious and recovered fractions of the population, respectively. In this structure, the susceptible population moves to the exposed state at rate r*B*I/N (see Table 1 for parameter descriptions). Individuals in the exposed state are infected but not yet infectious. The exposed population moves to the infectious state at rate Eps, and the infectious population moves to the recovered state at rate G. The exposed and infectious populations are tested and isolated at rates d2 and d3, respectively. When individuals in these states test negative due to a nonzero false negative rate of the diagnostic kit, they are not moved into quarantine and remain within their corresponding compartments E or I. Individuals that are susceptible (S) or recovered (R) can test positive due to a nonzero false positive rate of the diagnostic kit; they are quarantined at rates d1,FP and d4,FP, respectively (FP stands for false-positive). Quarantined people are released from quarantine at rate q, with the assumption that after the quarantine period they are fully recovered and no longer infectious. The recovered population is assumed to be immune to the disease.
Fig. 1 General structure of SEIR-Q3 model.
In our model, the isolation rates d1,FP, d2, d3 and d4,FP depend on 1) the probability of being tested and 2) the accuracy of the test. The mathematical description of the first dependency, the probability of being tested, was inspired by Hart et al. (2019)¹. This probability depends on the severity of symptoms of the host (pi) and the surveillance period (D). The severity of symptoms (pi) may differ per state (S, E, I or R); infectious people generally show more severe symptoms than people in the other states and therefore have a higher chance of being tested. Combining the probability of being tested with the false positive rate (FP) and false negative rate (FN) of the diagnostic test leads to the isolation rates shown in Fig. 2:
Fig. 2 Mathematical descriptions of the isolation rates.
Help from an epidemiologist
In our modeling project, we spoke to Edouard Bonneville, a PhD student at the Leiden University Medical Center specialized in survival analysis and infectious disease modeling and forecasting. He suggested the SEIR model as a widely used framework for epidemiological modeling and recommended it as a good starting point for our modeling project. We adjusted the basic SEIR model to our requirements; we incorporated the surveillance rate, the sensitivity and specificity of the testing method, and the effect of showing symptoms on the probability of being tested. We presented our model to Edouard, who agreed with our model structure as an explanatory tool to show how diagnostic testing impacts epidemics. We incorporated the helpful feedback he gave on the model structure and presentation, leading to the interactive model that you can play around with below!
Results
SEIR-Q3 MODEL
Three case studies
Intensity of diagnostic surveillance (D)
The diagnostic surveillance period (D) determines how often the population is tested in general and, if tested positive, put in isolation. This specific parameter (D) does not take into account whether a person has symptoms or not. Therefore, we speak of the population in general. If the surveillance period D is long, this means that the time until someone is tested again is long, and thus the population is not tested frequently.
Scenario 1
Imagine a high-income country that has a high testing capacity and a governmental policy that promotes testing. In this scenario, the surveillance period is short, because the population is tested frequently. Take, for instance, D = 20 days. For this value, and with the parameter values pi (severity of symptoms), FP (false positive rate) and FN (false negative rate) as in Table 1, we calculate that the infectious population showing symptoms of the infectious disease is tested every ~12 days.
Scenario 2
Imagine a less economically developed country with a low testing capacity. The surveillance period is long, for instance D = 100 days. In that case, the infectious population showing symptoms of the infectious disease is tested only every ~61 days.
For the two scenarios described above (D = 20 and D = 100) and a third scenario with D = 80, we simulate how the infectious population changes over time during an outbreak (Fig. 3); this is the 'flatten-the-curve' plot that you may recognize from recent news reports about COVID-19³,⁴. This figure clearly shows the importance of testing: when the surveillance period is shorter, i.e. the population is tested more frequently, the peak is lower and also delayed. This shows that, with frequent testing and conditional isolation, we can prevent part of the population from being infected with the disease. Moreover, hospitals will have more time to prepare for the (low) peak of infected cases.
In the interactive model, change the surveillance period to see its effect on the infectious population during an outbreak. For example, decrease the surveillance period from 40 to 20 to intensify testing, or increase the surveillance period from 40 to 80 to minimise testing.
Fig. 3 Number of people of the infectious population in the simulation of an outbreak with different surveillance periods: 20 days, 40 days and 80 days.
Accuracy of the kit (sensitivity and specificity)
Aside from the testing frequency, the accuracy of the diagnostic test also plays an important role in controlling the spread of an infectious disease. If the population is tested very frequently, but the results cannot be trusted, how can we then isolate the right individuals to effectively control the spread of the disease?
The accuracy of a diagnostic kit is described by two main parameters: the false negative rate (which determines the sensitivity) and the false positive rate (which determines the specificity). Recently, several RDTs for COVID-19 have come to the market, but the clinical use of these antigen-based tests is currently strongly discouraged by the World Health Organization (WHO)⁵. Especially the false negative rate can vary greatly between these tests. Here we investigate the impact of such high false negative rates on the progression of an outbreak.
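For reference, these two rates relate to the conventional accuracy measures as:

$ \text{sensitivity} = 1 - FN, \qquad \text{specificity} = 1 - FP$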
Fig. 4 shows that, in combination with low diagnostic surveillance, the effect of false negative rates from 0.1 to 0.3 (i.e. 10% to 30%) is relatively small. However, the effect of these false negative rates becomes larger in combination with high diagnostic surveillance (Fig. 5). Thus, we can conclude that under a policy of frequent testing, it is important to have accurate diagnostic tests!
In the interactive model, change the false negative rate and/or the false positive rate to see their effects on the infectious population during an outbreak. For example, increase the false negative rate from 0 to 0.2 for a diagnostic kit with low sensitivity.
Fig. 4 The effect of false negatives of the diagnostic test on the infectious population with low diagnostic surveillance (surveillance period is 50 days). False negative rates are 0, 0.1 and 0.2.
Fig. 5 The effect of false negatives of the diagnostic test on the infectious population with high diagnostic surveillance (surveillance period is 20 days). False negative rates are 0, 0.1 and 0.2.
A sudden increase in D resembling the development and use of a new kit (TD)
As we have seen during the recent outbreak of COVID-19, the world needs considerable time to enable large-scale, population-wide testing. It takes time to, for instance, re-organize laboratories and collect enormous amounts of reagents for large-scale PCR testing, or develop new point-of-care rapid diagnostic tests (RDTs). Nevertheless, quick adaptation is essential for the containment of infectious diseases, as shown in Fig. 6. In this plot, a sudden increase in diagnostic surveillance occurs at an early stage (14 days, orange), a later stage (130 days, blue) and a very late stage (200 days, black). For the quick diagnostic response, the peak is low and delayed compared to the later diagnostic interventions. This indicates that a rapid diagnostic response is crucial for obtaining control over the outbreak.
Fig. 6 The effect of a sudden increase in diagnostic surveillance on the infectious population. The diagnostic surveillance increases (surveillance period D changes from 50 days to 20 days) at 14 days, 130 days and 200 days.
Contribution to future iGEM teams
We created this model to explain the importance of testing and to show how our new diagnostic tool could impact the world. Since we promote our diagnostic tool as a versatile method that detects various pathogens, we based our parameter values on previous outbreaks of Ebola as a model example. In future work, this model could be adjusted to other specific diseases, since diseases differ in many aspects. Parameters that could be adjusted are, for instance, the disease transmission rate B and the severity of symptoms pi (see Table 1). Structural changes may also be necessary for infectious diseases in which the recovered population is not immune, or only immune for a small period of time. With such adaptations, future iGEM teams could apply this model to show how their new diagnostic tool would impact the spread of their specific disease of interest.
How to run the model yourself?
- Download and save the .txt and .xlsx files below in the same folder. Change the extension of the .txt file: rename it from 'SEIR-Q3 Epidemiological model.txt' to 'SEIR-Q3 Epidemiological model.R'. Copy the pathname of the folder as shown below, or (a) select the folder and use the shortcut ⌥⌘C on Mac, or (b) right-click the folder and choose 'copy as path' on Windows.
Download the SEIR-Q3 parameters.xlsx
Download the SEIR-Q3 Epidemiological model.txt
Copy the pathname of the folder (Mac).
Copy the path of the folder (Windows).
- Open the .R file in RStudio and paste the pathname of the folder into the second line of the code: replace 'put your directory to this .R file here' with your pathname.
Example of pathname.
- This model requires the R packages readxl, deSolve, ggplot2, gridExtra and reshape2. These packages are loaded, or installed if missing, automatically when you run the script; a sketch of this load-or-install pattern is shown after these steps. The installation may take a few moments the first time, but future runs will be much faster!
- Run the script. The plots will appear in the 'Plots' tab on the right.
- Change the parameter values in the .xlsx file and run the script again to observe the changes in the population states. Three simulations are plotted simultaneously to make it easy to compare different sets of parameters.
- Perform a literature search to obtain specific parameters and characteristics for your disease of interest. Think about how you would change the structure of the model and the parameter values. State your research questions; can you simplify the model structure?
- Adjust the structure of the model in the .R file and change the parameter values in the .xlsx file to make the model specific to your disease of interest.
Tip: use the R package 'Shiny' to build a web app to make your model interactive and share it with others!
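For reference, the load-or-install pattern mentioned in the steps above typically looks like the following sketch (the exact code in the .R file may differ):

```r
# Load each required package, installing it first if it is missing.
required_packages <- c("readxl", "deSolve", "ggplot2", "gridExtra", "reshape2")

for (pkg in required_packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # first run only; may take a few moments
  }
  library(pkg, character.only = TRUE)
}
```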
Methods
Algorithms and Packages
Simulations of the ordinary differential equations (ODEs) were performed using the 'deSolve' package in R. For the interactive (JavaScript) version of the model, no ODE solver library was available, so the ODEs were solved using Euler's method with a pre-defined time increment of 0.1 days.
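For each state variable $x$ with time derivative $f(x, t)$ given by the equations in the next section, this Euler update reads:

$ x(t + \Delta t) = x(t) + \Delta t \cdot f\big(x(t), t\big), \qquad \Delta t = 0.1 \text{ days}$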
Differential Equations used in the Model
$ \dfrac{dS}{dt} = \left( -r*B *\dfrac{I}{N} * S \right) -(d_{1FP} * S ) + (q * Qs)$
$ \dfrac{dE}{dt} = \left(r*B *\dfrac{I}{N} * S \right) -(Eps * E) - (d_2 * E)$
$ \dfrac{dI}{dt} = (Eps * E) - (G * I) - (d_3 * I)$
$ \dfrac{dR}{dt} = (G * I) + (q * Q) - (d_{4FP} * R) + (q * Qr)$
$ \dfrac{dQs}{dt} = (d_{1FP} * S) - (q * Qs)$
$ \dfrac{dQ}{dt} = (d_2 * E) + (d_3 * I) - (q * Q)$
$ \dfrac{dQr}{dt} = (d_{4FP} * R) - (q * Qr)$
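To illustrate how these equations can be integrated with the 'deSolve' package, here is a minimal sketch using the parameter names of Tables 1 and 2. The isolation rates d1FP-d4FP are entered as placeholder constants here, whereas the full script computes them from pi, D, FP and FN and also implements the switch times Tr and TD:

```r
library(deSolve)

# SEIR-Q3 derivatives; state order must match the 'state' vector below
seirq3 <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    N   <- S + E + I + R + Qs + Q + Qr
    dS  <- -r * B * I / N * S - d1FP * S + q * Qs
    dE  <-  r * B * I / N * S - Eps * E - d2 * E
    dI  <-  Eps * E - G * I - d3 * I
    dR  <-  G * I + q * Q - d4FP * R + q * Qr
    dQs <-  d1FP * S - q * Qs
    dQ  <-  d2 * E + d3 * I - q * Q
    dQr <-  d4FP * R - q * Qr
    list(c(dS, dE, dI, dR, dQs, dQ, dQr))
  })
}

state <- c(S = 4500000, E = 1, I = 7, R = 0, Qs = 0, Q = 1, Qr = 0)  # Table 2
parms <- c(r = 5, B = 0.054, Eps = 0.2, G = 0.1, q = 0.07,           # Table 1
           d1FP = 0, d2 = 0.002, d3 = 0.018, d4FP = 0)               # placeholder isolation rates

out <- ode(y = state, times = seq(0, 600, by = 0.1), func = seirq3, parms = parms)
plot(out)  # one panel per compartment
```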
Assumptions
- Birth and mortality rates are fixed.
- The time in quarantine is longer than the latent and infectious period combined: $ \dfrac{1}{q} > \dfrac{1}{Eps} + \dfrac{1}{G}$
- Recovered people are immune; people cannot be infected twice.
Table 1. Parameters for simulating the dynamics of an Ebola epidemic
Parameter | Description | Value | Unit | Source | Note |
---|---|---|---|---|---|
L | Birth rate | 0 | 1/day | Assumption | |
r | Number of contacts | 5 | contacts/day | Assumption | |
B | Probability of disease transmission | 0.054 | 1/contact | Legrand et al., 2007 | |
M | Mortality rate | 0 | 1/day | Assumption | |
Eps | Per-capita rate of progression to infectious state | 0.2 | 1/day | Legrand et al., 2007 | = 1 / latent period |
G | Per-capita recovery rate of infectious population | 0.1 | 1/day | Legrand et al., 2007 | = 1 / infectious period |
p1 | Probability of being tested when susceptible | 0.1 | - | Assumption | |
p2 | Probability of being tested when exposed | 0.1 | - | Assumption | |
p3 | Probability of being tested when infectious | 0.9 | - | Assumption | |
p4 | Probability of being tested when recovered | 0.1 | - | Assumption | |
D | Surveillance period | 50 | days | | |
FP | False positive rate of test kit | 0 | (ratio) | | |
FN | False negative rate of test kit | 0 | (ratio) | | |
q | Rate of release from quarantine | 0.07 | 1/day | Assumption | = 1 / isolation period |
tmax | Total time to simulate | 600 | days | | |
Tr | Contact rate switch time | 600 | days | | Moment of global (not person-specific) interventions (e.g. travel restrictions, distancing, lockdown) |
TD | Test rate switch time | 600 | days | | Indicates the moment at which the testing capacity is increased, for instance when a new test is developed |
r1 | Number of contacts (r) when t > Tr | 1 | contacts/day | Assumption | |
D1 | Surveillance period (D) when t > TD | 11 | days | Assumption | |
Table 2. Initial values
Parameter | Value |
---|---|
S | 4500000 |
E | 1 |
I | 7 |
R | 0 |
Qs | 0 |
Q | 1 |
Qr | 0 |
References
- S. Hart, L.F.R. Hochfilzer, N.J. Cunniffe, H. Lee, et al. "Accurate forecasts of the effectiveness of interventions against Ebola may require models that account for variations in symptoms during infection." Epidemics (2019).
- J. Legrand, R.F. Grais, P.Y. Boelle, A.J. Valleron, and A. Flahault. "Understanding the dynamics of Ebola epidemics." Epidemiology and Infection, 135(4):610–621 (2007). doi: 10.1017/S0950268806007217
- Stevens, H. "Why outbreaks like coronavirus spread exponentially, and how to 'flatten the curve'", 2020. Retrieved September 28, 2020 from https://www.washingtonpost.com/graphics/2020/world/corona-simulator/
- Leunissen, M., Hal, G. van, and Knegtel, T. "Waarom je afstand moet houden om de corona-uitbraak af te remmen" [Why you need to keep your distance to slow down the corona outbreak], 2020. Retrieved September 28, 2020 from https://www.volkskrant.nl/kijkverder/2020/socialdistancing/#//
- World Health Organization. "Advice on the use of point-of-care immunodiagnostic tests for COVID-19: Rapid diagnostic tests based on antigen detection." WHO Coronavirus Disease (COVID-19) Pandemic (2020).
Introduction
The current section will discuss the development of a model that represents the kinetics of our sequence detection technique. This model was developed in order to respond to the occurrence of false-positive signals in our experiments (see more on our Results page). The resulting model will be used in both descriptive and predictive manners in order to achieve the following goals.
- Provide a quantitative description of false-positive signal generation in our detection scheme
- Identify potential points for optimization
- Assess the performance of our kit
In an ideal situation, predictions and explanations provided by the model would be used to guide our further experiments. However, laboratory access restrictions due to coronavirus-related measures forced us to slightly adapt our timeline. Model development using our preliminary data was therefore performed in parallel with our experimental optimization attempt, which was guided solely by the preliminary data. The resulting model was thus used to describe and troubleshoot our laboratory results, and to develop potential optimization strategies for our technique.
Fig. M1. Overall experiment-modeling timeline of this project. New safety measures at our university significantly restricted our lab access. This forced us to perform the in vitro optimization attempt in parallel with the in silico (model-based) optimization. Insights obtained from the model have therefore not yet been implemented in our experiments. Instead, the model was used to troubleshoot past experimental results and to identify potential points for future optimization.
The kinetic model of our kit was developed by integrating three separate kinetic models. Each of the three constituent models represents one of the three reactions included in our sequence detection scheme: 1) RPA, 2) LSDA, and 3) the GQ-catalyzed TMB oxidation reaction. A detailed explanation of how each of these models was developed and integrated is available in Panel 1: Development of the Kinetic Model.
What can our model simulate?
The resulting kinetic model can be used to predict product concentrations of each individual reaction at any given time point (Fig. M2).
Fig. M2 Input and output of the Rapidemic model. The Rapidemic model integrates three reactions that build up our sequence detection scheme. In general, the model would take the target sequence concentration in the sample, DMSO concentration, and reaction time of each reaction as inputs. In return, the model will provide product/intermediate concentration of the three reactions at any given time point. This model simulation will be used in subsequent analyses in both descriptive and predictive manners.
As a serial integration product of three individual models, the Rapidemic model adheres to the assumptions and limitations of its composing models. Individual assumptions and limitations of these individual models were discussed in Panel 1: Development of the Kinetic Model. In addition to the individual assumptions, the integrated Rapidemic model also adheres to the following assumptions.
- Carryover constituents from each preceding reaction that do not directly participate in the next reaction are assumed to have a negligible effect on that reaction.
- Each reaction follows the same reaction trajectory as described by each individual model regardless of its connection with the other reactions.
- The amount of DNA is expressed in equivalents of the relative fluorescence unit (RFU) signal of the nonspecific RPA amplicon.
- Due to a lack of available information, the resource limitation of endpoint signal generation could not be properly described. As a result, the model may overestimate signal generation at a) low TMB concentrations, and b) longer incubation periods of the GQ-mediated TMB oxidation reaction.
- The model cannot be used to explore the effects of different reaction components other than a) initial target concentration, b) DMSO concentration, and c) incubation time of each reaction.
What can we learn from the model?
The model that we developed allows us to describe the kinetics of the three-step reactions included in our sequence detection scheme. This model can be used to perform simulations of how our kit behaves in different conditions within the simulation boundaries. In the following discussion, we will show how we can use these simulations to better understand what we saw in the lab, point towards a potential strategy to further optimize our technique, and predict the limits of the optimized technique.
Point-simulations: understanding experimental results
- Our model provided a quantitative understanding of how false-positive signals could have been generated in our experiments.
- Our model identified possible inhibitory effects of DMSO in either LSDA or TMB oxidation reactions and suggested the use of a DMSO concentration lower than 2.5% (v/v).
Understanding the generation of the false-positive signals.
Our experiments identified primer-associated non-specific amplification in RPA as the source of false-positive signals (see Results page). Here, we explored the kinetics of the observed false-positive signal generation and its effect on the kit as a whole. This analysis was performed by simulating the three integrated reactions under the conditions that we used in our experiment, that is, with 0% DMSO in the RPA reaction. For this simulation, the initial target concentration was arbitrarily set to 500 RFU. This simulation helped us to obtain a quantitative understanding of the generation of this false-positive signal: how it is generated in RPA, and how it affects each subsequent reaction (Fig. M3).
Fig. M3 Simulating false-positive signal generation. The integrated model was used to simulate amplicon generation (left), GQ DNAzyme generation (middle), and TMB oxidation (right) in a sequential manner. The simulation showed how false signal generation during the RPA reaction can be transferred to subsequent reactions and give rise to a false-positive signal reading at the end of the test. The amplification signal was reported in relative fluorescence units (RFU) and does not distinguish true amplicon from primer-derived products. GQ DNAzyme production is shown as RFU-equivalents. The change in solution absorbance at 650 nm (A650) due to TMB oxidation was used as the final signal readout.
Troubleshooting our final experiment using 2.5% DMSO.
Our experiments revealed how the addition of DMSO to RPA reactions may attenuate the severity of false-positive signal generation (see Results page). However, our preliminary experiments had not yet identified an optimal DMSO concentration for obtaining maximum signal distinction between positive and negative tests. The 2.5% DMSO concentration used in our experiments was chosen solely based on our preliminary RT-RPA results (pre-analysis). In the following analysis, we use our model to troubleshoot the problems that we encountered during this experiment.
The addition of 2.5% DMSO could attenuate the generation of false-positive signals. However, true-positive signals were similarly reduced to a level that makes it difficult to qualitatively distinguish it from negative results. To provide a better understanding of this observation, we used the model to simulate the progress of the reaction at 2.5% DMSO concentration (Fig. M4).
Fig. M4 Troubleshooting experimental optimization attempt. The integrated model was used to simulate amplicon generation (left), GQ DNAzyme generation (middle), and TMB oxidation (right) at an initial 2.5% DMSO concentration during RPA. The simulation predicted minimization of the final false-positive signal to an absorbance of less than 0.2 (a.u.), as observed during the experiment. However, a much higher absorbance signal was expected from true-positive samples, indicating possible inhibitory effects of DMSO on the LSDA and/or TMB oxidation reactions. The RPA amplification signal was reported in relative fluorescence units (RFU) and does not distinguish true amplicon from primer-derived products. GQ DNAzyme production is shown as RFU-equivalents. The change in solution absorbance at 650 nm (A650) due to TMB oxidation was used as the final signal readout.
Our model predicted an endpoint absorbance signal of approximately 0.600 (a.u.) at 2.5% DMSO. This is considerably higher than our observation in the laboratory, in which a maximum absorbance signal of only 0.175 (a.u.) was reached. This discrepancy hinted at an inhibitory effect of DMSO on either the LSDA or the TMB oxidation reaction. From this analysis, we identified the importance of reducing the DMSO concentration in future optimization attempts.
Optimization: maximizing distinguishability
- Optimization of DMSO concentration was performed by repeating our simulation to identify a concentration that maximizes the difference between endpoint positive- and negative- signals.
- Our optimization identified 1.33% DMSO (v/v) as optimal for maximizing signal distinguishability. This improvement will be implemented in the future.
Tuning DMSO concentration to minimize false-positives.
One of the biggest advantages of modeling lies in its ability to simulate different input conditions many times over, to identify the condition best suited to reach a certain objective. In the following analysis, we attempt to find a DMSO concentration for RPA that maximizes the difference in the final colorimetric signal.
Our main optimization attempt consisted of simulations that take the DMSO concentration as input and return the difference in endpoint colorimetric signal (dA650) as output. This difference (dA650) was calculated between the cases where the target template is present or absent from the initial RPA sample. The simulation was repeated many times under an optimization algorithm to search for the DMSO concentration that maximizes this difference in signal output.
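A minimal sketch of this search, assuming a hypothetical wrapper simulate_endpoint_A650(dmso, target_present) around the integrated model (this wrapper is not part of the published scripts):

```r
# Signal difference between a positive and a negative sample at a given DMSO %.
# simulate_endpoint_A650() is a hypothetical wrapper around the integrated model.
dA650 <- function(dmso) {
  simulate_endpoint_A650(dmso, target_present = TRUE) -
    simulate_endpoint_A650(dmso, target_present = FALSE)
}

# One-dimensional search over the 0-2.5% (v/v) DMSO range
opt <- optimize(dA650, interval = c(0, 2.5), maximum = TRUE)
opt$maximum  # DMSO concentration (% v/v) that maximizes the signal difference
```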
A DMSO concentration of 1.33% (marked by the vertical dashed line in Fig. M5) was identified as the optimum for this objective. Additional scanning over higher and lower DMSO concentration ranges confirmed the presence of an optimum in this exact concentration region. Notice how this concentration is approximately half of the DMSO concentration that we used in our final experiment (2.5%). The proposed optimal DMSO concentration may thus serve as a prospective solution to attenuate the inhibitory effect of DMSO on LSDA and/or TMB oxidation that we previously observed (see Troubleshooting our final experiment using 2.5% DMSO). Therefore, our in silico analyses identified 1.33% as the DMSO concentration that best distinguishes signal intensity between positive and negative tests. This modification of the DMSO concentration from 2.5% to 1.33% (v/v) is proposed as a potential point of improvement for our kit, to be tested in future experiments.
Fig. M5 Simulated endpoint absorbance signal at different DMSO concentrations. The light-blue and orange colors indicate the endpoint absorbance signals obtained in the presence and absence of the target template, respectively. The dark-green dataset represents the difference between these two values. The black dashed vertical line indicates the 1.33% DMSO concentration - the concentration predicted to maximize the difference in signal intensity between positive and negative samples.
Screening: assessment on the kit performance
RPA allows robust reporting of the final absorbance signal.
One important factor of a point-of-care detection method is the robustness of its final signal readout with regard to the initial target amount. In the following analysis, we will investigate the robustness of the developed sequence detection technique.
In the following analysis, the model simulation is repeated several times at different starting template concentrations. We then investigate how the final colorimetric signal varies as the starting template concentration changes. In general, a steady endpoint signal would reflect the robustness of the method, whereas a correlation between template concentration and the resulting signal intensity would suggest the method's potential as a quantitative technique.
Our simulations showed that our technique generates a relatively steady endpoint signal (A650) in the presence of at least 100 RFU-equivalents of target template. This consistency in signal generation is mostly due to the logistic limit posed by the RPA reaction (Fig. M6, left). The robustness is reflected by the visually indistinguishable 0.05 (a.u.) increase in A650 when the initial target concentration is increased from 100 to 10,000 RFU (Fig. M6, right). The model thus predicts that our technique generates a robust signal across different target template concentrations. The lower limit of this robustness may also serve as a lower limit of detection for our technique, although further experiments are still needed to fully characterize it.
Fig. M6. Our technique generates a steady endpoint signal within a given template concentration range. This analysis was performed at a DMSO concentration of 1.33% in each RPA reaction. Left) The logistic limitation of the RPA reaction sets a maximum amount of amplicon at the end of the 20-minute reaction. This maximum can be reached within the given reaction time when the starting template concentration is approximately equal to or above 100 RFU-equivalents. Right) This limit in amplicon amount allows constant signal generation in the downstream colorimetric reaction.
Predicting minimum reaction time.
Our experiments showed that a solution with an A650 value of ~0.3 (a.u.) can easily be distinguished from one with an A650 value of ~0.2 (a.u.). In the following analysis, we use our model to find the minimum incubation time required to obtain an endpoint positive signal of at least 0.3 (a.u.) while restricting false-positive signals to at most 0.2 (a.u.). This analysis was performed by employing an optimization algorithm to minimize the following objective function.
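In schematic form (a reconstruction based on the description below, with $t_{RPA}$, $t_{LSDA}$ and $t_{TMB}$ the incubation times of the three reactions), the objective function reads:

$ f(t_{RPA}, t_{LSDA}, t_{TMB}) = t_{RPA} + t_{LSDA} + t_{TMB} + 10^{10} \cdot \mathbb{1}\left[ A_{650}^{TP} < 0.3 \ \text{or} \ A_{650}^{FP} > 0.2 \right]$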
In the above equation, a boolean penalty of $10^{10}$ is applied whenever the true-positive signal falls below 0.3 (a.u.) or the false-positive signal rises above 0.2 (a.u.). When both restrictions are satisfied, the objective function simply returns the total time required by the whole three-step reaction. In this analysis, the DMSO concentration was set to 1.33% (previously identified as the optimal concentration).
This analysis revealed that our method could be shortened to 46.5 minutes while maintaining a final true-positive signal of at least 0.3 (a.u.) and limiting false-positive signals to 0.2 (a.u.). These 46.5 minutes consist of 15 minutes of RPA, 10 minutes of LSDA, and finally 21.5 minutes of GQ-mediated TMB oxidation.
Fig. M7. Reaction progress at the obtained minimum reaction time. The incubation time for each reaction was obtained by parametric optimization. As a whole, the reaction incorporates 15 minutes of RPA (left), 10 minutes of LSDA (middle), and 21.5 minutes of TMB oxidation (right) - giving a minimum total test time of 46.5 minutes.
In this section, we will describe how we developed a semi-empirical model to simulate the underlying reaction kinetics of our three-step detection scheme. We began by modeling the three reactions individually: 1) recombinase polymerase amplification (RPA), 2) linear strand displacement amplification (LSDA), and 3) TMB oxidation by the GQ DNAzyme. The parameters used in these models were optimized to fit our experimental observations. Missing kinetic information was substituted with values obtained from the literature. Lastly, we integrated the three individual models to obtain a model that describes the kinetics of our DNA detection kit. This integrated model provides a general description of how our kit works and identifies potential points for further improvement of the kit's performance.
Recombinase Polymerase Amplification (RPA) Model
In our RPA experiments, we revealed the presence of primer-derived side products in addition to our targeted amplicons. The addition of dimethyl sulfoxide (DMSO) to the RPA mixture was found to effectively inhibit the production of the side-products, even though its presence may also interfere with amplicon generation. Here we developed a model to describe how the two molecular entities (amplicon and side product) are being generated in an RPA reaction in the presence/absence of different concentrations of DMSO.
RPA Model Development
Molecular basis of the model
RPA is a reaction that amplifies a specific dsDNA sequence flanked by a specific set of primers, much like a regular PCR reaction. However, unlike PCR, RPA is performed under isothermal conditions. Just like in PCR, the amount of target product (amplicon) increases exponentially, because the amplicon itself can serve as a template for subsequent amplicon generation. This allowed us to implement a logistic model to empirically describe amplicon generation in an 'optimal' RPA reaction - an RPA reaction in which only the targeted sequence is amplified. Unfortunately, our experiments revealed the presence of primer complex-related side product(s) in our RPA reactions (see Results page). A simple logistic equation is therefore no longer sufficient to accurately model our reactions.
To account for side product formation, we added the mass balance of a second molecular entity to our RPA model (Panel Fig. M1). Our experiments revealed that these side products can be produced without any dsDNA template present in the reaction mixture. Since the effective rate in a logistic equation is the product of the reaction's specific rate and the product concentration/amount, a logistic equation cannot describe how the side product is generated de novo in our RPA reactions. Thus, a linear resource-limited production model was selected to model side product generation. In reality, side products may also be produced exponentially following their de novo linear generation. Nevertheless, we fixed this exponential rate to zero to avoid identifiability problems while still describing de novo formation of the side product (see Panel 2: Understanding parameter identifiability problem).
Model assumptions
Based on this mechanistic basis of the reaction, the model was developed by assuming the following points.
- RPA reactions produce exactly two entities, the target amplicon and the side product
- The generation rate of the two entities was limited by the amount of a single arbitrary resource. This resource limitation will be depicted as a maximum possible amplicon and side product concentration
- Generation of the target amplicon occurred in an exponential manner
- Generation of the side product occurred in a linear manner
- DMSO increases each entity's production cost towards the resource pool, effectively reducing its generation rate as well as its maximum amount.
- The increased production 'cost' of each molecular entity at increasing DMSO concentrations does not affect its burden on the production of the other entity (individual burdens are applied).
Model equations
The resulting model can be described using the following equations. Model parameters were fitted to our experimental observations. Amplicon and side product concentrations were handled as equivalent relative fluorescence units (RFU). Additional experiments would be needed to translate these amounts into absolute concentration units (e.g. by making a DNA copy number-to-RFU calibration curve).
Amplicon [A] production rate
Side product [S] production rate
DMSO effect on production rate
DMSO effect on maximum product amount
Panel Fig. M1. Equations of the RPA model.
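One concrete set of equations consistent with these assumptions (a sketch of the general form only; the exact expressions and fitted values are those shown in Panel Figs. M1 and M2) is:

$ \dfrac{d[A]}{dt} = k_A(C) \cdot [A] \cdot \left(1 - \dfrac{[A]}{A_{max}(C)}\right)$
$ \dfrac{d[S]}{dt} = k_S(C) \cdot \left(1 - \dfrac{[S]}{S_{max}(C)}\right)$
$ k_X(C) = \dfrac{k_{X,0}}{1 + \left(C / K_X\right)^{n_X}} \quad \text{(and analogously for } A_{max} \text{ and } S_{max}\text{)}$

in which [A] and [S] are the amplicon and side product amounts (RFU-equivalents), C is the DMSO concentration (% v/v), and the Hill-type terms describe the DMSO effects on the production rates and maximum product amounts.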
Parameter Optimization
Parameter fitting for RPA model was performed in a sequential manner, starting from the least complex case and building up to depict more complex scenarios. We began by fitting side product generation in the absence of both target dsDNA and DMSO, in which DMSO effect can be excluded and a single product generation occurred. This analysis provided us with the parameter values needed to describe side product generation in the absence of DMSO.
Amplicon generation could not be separated from side product generation. We thus fit the parameter values related to amplicon generation to the 0% DMSO observation in the presence of target by fixing the parameter values that describe side product generation. This analysis provided us with parameter values needed to describe main amplicon generation in the absence of DMSO.
Similarly, DMSO effects on side product generation rates were obtained by fitting the model to all observations in which target dsDNA was absent, while fixing the values describing side product generation in the absence of DMSO. Functions derived from the Hill equation were used to describe DMSO's effect, due to their ability to empirically describe the changes in production rate and maximum side product amount at different DMSO concentrations. This analysis provided us with the parameter values needed to describe side product generation in the presence of DMSO.
Parameters describing amplicon generation in the presence of DMSO were fitted by first fixing the parameter values identified in the previous steps. Since our measurements do not distinguish amplicon and side product concentrations, we assumed that the DMSO effect on amplicon generation follows the same trend as its effect on side product generation. This analysis provided us with the parameter values needed to describe amplicon generation in the presence of DMSO and completes our RPA model.
What we learned from the RPA model
Panel Fig. M2. The RPA model can describe our experimental observations. The table on the left lists the parameter values included in the model, obtained from parametric optimization. The figure on the right shows how the model simulation (lines) conforms relatively well to the experimental observations (data points).
Our RPA model describes how amplicon and side products are produced during an RPA reaction. As seen in the figure above, where data points and lines represent observations and model predictions, respectively, the model describes our experimental observations well. The model accounts for side product generation in the absence of the target template. Moreover, it explains how increasing DMSO concentrations affect the rate and extent of amplicon and side product generation.
This model also allows us to simulate amplicon/side product generation at DMSO concentrations that we have not yet explored in our experiments. On its own, the RPA model can already be used to optimize the DMSO concentration in RPA such that the difference in final RPA signal (as RFU-equivalents) between positive and negative tests is maximized. This preliminary attempt identified 1.71% DMSO as optimal for maximizing the final RPA signal difference (Panel Fig. M3). However, the non-linearity of the three reactions in our sequence detection scheme does not guarantee that this DMSO concentration remains optimal when the output signal is observed downstream of the RPA reaction. Indeed, a similar optimization using the integrated model identified a different optimal DMSO concentration (see: What can we learn from the model?). Since our technique is in the end an integration of three reactions (RPA, LSDA, and TMB oxidation), the integrated model provides a more accurate basis for proposing reaction optimizations.
Panel Fig. M3. 1.71% DMSO was identified as optimal for maximizing the endpoint RPA signal difference. The endpoint RPA signal is reported in relative fluorescence units (RFU) and does not distinguish amplicon from side products. This result is considered less precise than that obtained from the similar analysis using the integrated model.
Linear Strand Displacement Amplification (LSDA) Model
Our sequence detection scheme relies on LSDA as a connector between detection using RPA and positive signal reporting by guanine-quadruplex (GQ)-catalyzed TMB oxidation. LSDA reaction takes RPA amplicon as a substrate and produces GQ ssDNA as a product. Our experiments revealed how RPA side-product may also produce GQ ssDNA when used as a substrate in LSDA reactions.
To sufficiently describe LSDA kinetics, time-course information on the GQ concentration is required. This can be obtained by monitoring the GQ sequence's specific absorbance at 295 nm. However, our experimental observations lacked these data due to absorbance interference by the material of the container used in this experiment. Because of this instrumental limitation in measuring and distinguishing GQ sequences from non-GQ dsDNA, the current model was developed based on kinetic LSDA information obtained from the literature¹.
LSDA Model Development
Theoretical basis of the model
Kinetic parameters used for modeling the LSDA reaction were obtained from the literature¹. In this paper, the authors observed an approximately linear production of single-stranded GQ sequence using 0.4 units/µL Nt.BstNBI nickase. The information that we obtained from this paper is the following.
- GQ sequence production rate was approximately linear at a given time frame
- Apparent rate constant of the reaction was 0.64 (1/min) at the beginning of the reaction
- Apparent rate constant of the reaction was 0.12 (1/min) at a later stage of the reaction
- Transition from 0.64 to 0.12 (1/min) was reached within the first ~10 minutes of the reaction
Specific assumptions and limitations for the current LSDA model
The current LSDA model is developed based on the report of a different study. The reaction conditions used in this preceding study may differ from our reaction conditions. The accuracy of this model should thus be assessed with future experiments. Nevertheless, the following assumptions were set in order to build the current model.
- The performance of Bst 2.0 polymerase and vent (exo-) DNA polymerase were approximately equal
- Apparent rate constant of the reaction is linearly correlated with Nt.BstNBI concentration
- The apparent rate of the reaction decreased from its maximum to its minimum in a linear manner within the first 10 minutes of the reaction
- Displacement of the GQ sequences EAD2 (used in the literature) and EAD2+3'A (used in our experiments) were equally efficient
- Absolute quantification of both the input and the output of LSDA was not available
- The amount of RPA amplicon/side product and the produced GQ ssDNA were instead expressed as RFU-equivalent
- One RFU-equivalent corresponds to the amount of one arbitrary molar unit of RPA side product that gives rise to exactly one RFU signal
- Nevertheless, simulation using any units of concentration is possible given a starting concentration of the template dsDNA
Model equation
Based on the aforementioned information and assumptions, GQ production during LSDA reaction can thus be expressed as the following.
$ \dfrac{d[GQ]}{dt} = k_{app}(t) \cdot [T], \quad \text{where } [T] \text{ is the LSDA template (RPA amplicon or side product)}$
$ k_{app}(t) = \begin{cases} k_{max} - (k_{max} - k_{min}) \cdot \dfrac{t}{t_{transition}} & t \le t_{transition} \\ k_{min} & t > t_{transition} \end{cases}$
Panel Fig. M4. Parameters used in our LSDA model.
Parameter values used in this model were fixed to the theoretical values obtained from the literature. No parameter optimization was thus performed for the current LSDA model.
Our LSDA Model
Here we adapted the theoretical LSDA model to our experimental conditions. This was done by multiplying the theoretical reaction rate by the ratio between the Nt.BstNBI nickase concentration used in our study and that used in the reference experiment (see Reference). In general, simulation using our model could, to some extent, capture the overall trend observed in the reference study (Panel Fig. M5), even though the currently assumed linear reduction in rate might be too severe a depiction of the actual reaction kinetics. Nevertheless, additional measurements are required to further improve the accuracy of this model.
Panel Fig. M5. Simulation of GQ production using LSDA. Our model (left) could capture overall kinetic trends shown by the referenced study (right).
Guanine-Quadruplex (GQ)-mediated TMB Oxidation Model
3,3',5,5'-tetramethylbenzidine (TMB) oxidation was used as the final signal reporter. The rate of GQ-catalyzed TMB oxidation can be observed by monitoring the reaction solution's absorbance at 650 nm (A650), the wavelength that is selectively absorbed by oxidized TMB. The kinetics of the reaction in pH 6.0 phosphate buffer were found to follow Michaelis-Menten kinetics (see Results page). However, our final assay using pH 3.8 phosphate-citrate buffer did not indicate that resource limitation significantly inhibited the TMB oxidation reaction within the given time frame. In this model, we therefore assume resource limitation to be negligible, and the rate of TMB oxidation under our experimental settings is assumed to follow first-order kinetics. Fitting the oxidation kinetic parameters required integration of the preceding models. This analysis provided the final parameters required to construct an in silico model of our DNA detection kit.
Because the starting point of the GQ model depends on the two preceding models, the development of this TMB oxidation model was performed in parallel with the model integration, which is explained in the next section.
GQ-Catalyzed TMB Oxidation Model
Linearity assumption within the observation time frame
Previous characterization of EAD2+3'A activity in catalyzing TMB oxidation revealed that the DNAzyme follows Michaelis-Menten kinetics: the effective reaction rate scales with TMB concentration but reaches a certain maximum. However, our subsequent observation under a different buffer condition did not indicate significant resource limitation effects. Without an observable A650 endpoint plateau and without absolute quantification of the TMB concentration, this resource limitation effect could not be fitted to our experimental observations with sufficient accuracy. This forced us to resort to a simple linear model. Although this forced linearity may restrict the model's accuracy under different experimental settings, the resulting model sufficiently describes colorimetric signal (A650) generation under our experimental settings.
Additional assumptions of the model
In addition to the aforementioned linearity assumption, the model was developed using the following additional assumptions.
- Background TMB oxidation rate is constant and follows zero-order reaction kinetics
- GQ concentration is expressed as RFU-equivalent unit
- TMB oxidation within the observation period did not proceed further than the first oxidation state (oxidation to the second oxidation state leads to a reduction in A650 signal).
Model equation
Here we modeled GQ-catalyzed TMB oxidation using first-order kinetics, in which the rate of A650 increase is proportional to the GQ DNAzyme concentration. The model is used to describe the progression of A650 over the course of the observation.
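Under these assumptions, the rate equation can be written as (parameter values are those listed in Panel Fig. M6):

$ \dfrac{dA_{650}}{dt} = k_{GQ} \cdot [GQ] + k_{background}$

in which [GQ] is the GQ DNAzyme concentration in RFU-equivalents, $k_{GQ}$ is the specific GQ-catalyzed oxidation rate, and $k_{background}$ is the constant background oxidation rate.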
Panel Fig. M6. Parameters used in our TMB oxidation model.
Sequential Integration of the Models
In the following analysis, we perform parametric optimization to obtain the values required for the integration of the three models. This analysis was performed using observation data in which the DNAzyme was obtained as a product of the two preceding reactions; the absolute concentration of the DNAzyme was thus unknown. Using our RPA and LSDA models, we estimated the GQ concentration in our TMB oxidation reaction. The integration of the three kinetic models required additional modeling measures before the TMB oxidation rate parameters could be fitted.
Sequential Integration and Parameter Optimization
The RPA, LSDA, and TMB oxidation models were integrated by sequentially passing the product of each preceding reaction as the starting substrate for the next reaction. The product concentration of each preceding reaction was diluted according to the sample dilution scheme used in our experiment.
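A minimal sketch of this serial hand-over, assuming hypothetical wrappers simulate_rpa(), simulate_lsda() and simulate_tmb() around the three individual models and illustrative dilution factors (the actual factors were fixed to our experimental dilution scheme):

```r
# Serial integration: the (diluted) product of each reaction seeds the next one.
# simulate_rpa(), simulate_lsda() and simulate_tmb() are hypothetical wrappers
# around the three individual kinetic models; dilution factors are illustrative.
simulate_kit <- function(target_rfu, dmso, t_rpa, t_lsda, t_tmb,
                         dil_rpa_to_lsda = 1 / 10, dil_lsda_to_tmb = 1 / 10) {
  rpa  <- simulate_rpa(target_rfu, dmso, t_rpa)                # amplicon + side product (RFU-eq)
  lsda <- simulate_lsda(amplicon  = rpa$amplicon  * dil_rpa_to_lsda,
                        side_prod = rpa$side_prod * dil_rpa_to_lsda,
                        time      = t_lsda)                    # GQ ssDNA (RFU-eq)
  tmb  <- simulate_tmb(gq = lsda$gq * dil_lsda_to_tmb, time = t_tmb)  # A650 time course
  tmb$A650_end                                                 # endpoint colorimetric signal
}
```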
Panel Fig. M7. The three reactions were integrated in a serial manner. The dilution rate used at each connection was fixed to the value used in our experiments.
Before the integration of the three reactions, two remaining parameters had to be obtained. The first is the TMB oxidation rate. The second plays a role in connecting the RPA and LSDA reactions. Without information on absolute RPA amplicon/side product concentrations, there is no guarantee that one RFU-equivalent of RPA amplicon represents the same molar amount as one RFU-equivalent of RPA side product. As an illustration: just because 100 molecules of side product are needed to increase the solution's RFU by 1, it does not necessarily mean that 100 molecules of amplicon would translate into the same RFU increase - their molar values per unit of RFU might not be equal. Consequently, their contributions to GQ production in LSDA might also differ. This second parameter was introduced to empirically normalize the contributions of the two molecular entities in LSDA.
Another important assumption is that both molecular entities can produce the GQ sequence in an LSDA reaction. This assumption was made in an attempt to explain our experimental observation that positive TMB oxidation signals could be obtained from samples that did not contain the targeted template.
Parameter Optimization
Parameter optimization was performed to fit two parameters: the TMB oxidation rate and an LSDA correction factor. The simulation in each optimization cycle was initiated from the RPA model, with the end products of each simulation passed to the subsequent reaction. The initial positive signal of the RPA reaction was fixed to a value of 50 RFU-equivalents. The resulting prediction of the A650 time course was used to assess the model's fitness.
Optimization predicted a specific EAD2+3'A TMB oxidation rate of 3.18 × 10⁻⁷ A650/[GQ], in which [GQ] represents the concentration of EAD2+3'A DNAzyme in RFU-equivalents. The correction factor relating amplicon-derived DNAzyme production to side-product-derived DNAzyme production was predicted to be 74.7 (a.u.).
As expected, the linear model could not capture the non-linear behavior of the A650 progression (Panel Fig. M8). Nevertheless, the predictions lie within 1.5 standard deviations of each observation triplicate.
Panel Fig. M8 The linear model sufficiently predicts TMB oxidation progression within the observed time range and conditions.
Predictive limitations due to forced linearity used in TMB model
The current TMB oxidation model assumes zero-order kinetics with respect to TMB concentration. Though effective as an empirical description, this approach may limit the model's predictive capability over different GQ concentration ranges. Moreover, the model might not be applicable if the TMB concentration is changed, in particular to a lower TMB concentration at which the rate restriction due to TMB limitation is no longer negligible. Nevertheless, resource limitation during TMB oxidation did not appear to be significant within the time frame of our experiment. Implementation of this aspect may come in the future, after further characterization of the technique.
Some experimental data cannot distinguish between two rates included in a reaction of interest. For the sake of simplicity, let us assume that these rates follow first-order kinetics and are each describable by one parameter. Simultaneously implementing the two rates in a model then poses a parameter identifiability problem - a situation in which the outcome can be described equally well by an infinite number of combinations of the two contributing parameters.
Here is an illustration of the problem. Imagine you are handed two infinitely large buckets of water. The water temperature in bucket A is 80°C, while that in bucket B is 40°C. You are given the task of mixing the water such that the final temperature is 60°C. Of course, you can easily complete this task by mixing 10 mL of water from bucket A with 10 mL of water from bucket B. However, nothing is stopping you from mixing 20 mL from each bucket - as long as you mix them in a 1:1 ratio, the final temperature will, theoretically, always be 60°C. Similarly, mixing 1 µL from each bucket is just as correct as mixing 1 L from each bucket. The existence of this infinite set of solutions is what we refer to as an identifiability problem.
Although the outcome of this situation might not be as serious in the water mixing task, the identifiability problem in a parameter optimization attempt can pose a huge difference in what the resulting model depicts (as illustrated in the figures below).
Panel Fig. M9 Some possible scenarios leading to identifiability problem.
In our case, this identifiability problem arose during the development of the RPA model, because our measurements cannot distinguish between the amplicon and side-product production rates - both products are measured together as relative fluorescence units (RFU). In the figure above, this is comparable to the second scheme.
To overcome this problem, the development of the RPA model was performed in a sequential manner, starting from the case where only one of the reactions took place. Returning to the second scheme of the figure above, this is comparable to solving the case in which the concentration of A is zero. Once the rate from A to P is excluded, determining kB becomes an easy task. By assuming the rate from B to P to be independent of the rate from A to P, we can then determine the value of kA by fixing kB to the value determined from the case where the flux from A to P was zero.
In the case of our RPA model, the analysis began by first determining the side-product generation rate in the absence of the target sequence (one of the rates was zero). After characterizing side-product generation, we fixed the parameter values describing it and determined the values of the remaining parameters required to describe amplicon generation in the more complex setting (both rates nonzero).
All analyses were conducted in R. Simulations of ordinary differential equations (ODEs) were performed using the 'RxODE' package. Each parameter optimization routine was performed using a two-step approach combining particle swarm optimization (PSO) and limited-memory Broyden-Fletcher-Goldfarb-Shanno with box constraints (L-BFGS-B), implemented in R using the 'pso' and 'optimx' packages, respectively. Here, the meta-heuristic PSO method was used to maximize search-space exploration, which is often the weak point of the standard L-BFGS-B method. This approach improves the algorithm's chance of identifying a global optimum in a multi-parametric optimization problem. Parallel computing was performed using the packages 'foreach' and 'doParallel', when applicable, to speed up computationally heavier optimization runs.
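A sketch of this two-step routine, assuming a hypothetical objective() function that simulates the model for a given parameter vector and returns the sum of squared residuals against the observations (parameter names, bounds and control settings are illustrative):

```r
library(pso)
library(optimx)

# objective() is a hypothetical fit function: it simulates the model for a
# parameter vector and returns the sum of squared residuals against the data.

lower <- c(k_tmb = 1e-9, lsda_factor = 1e-2)  # illustrative parameter bounds
upper <- c(k_tmb = 1e-5, lsda_factor = 1e+3)

# Step 1: global exploration with particle swarm optimization (pso)
pso_fit <- psoptim(par = rep(NA, length(lower)), fn = objective,
                   lower = lower, upper = upper, control = list(maxit = 200))

# Step 2: local refinement with L-BFGS-B (optimx), started from the PSO optimum
refined <- optimx(par = pso_fit$par, fn = objective,
                  method = "L-BFGS-B", lower = lower, upper = upper)
```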
References
- Nie, J., Cai, L. Y., Zhang, F. T., Zhao, M. Z., Zhou, Y. L., & Zhang, X. X. (2014). A G-quadruplex based platform for label-free monitoring of DNA reaction kinetics. The Analyst, 139(24), 6542–6546. https://doi.org/10.1039/c4an01569j