Team:SJTU-BioX-Shanghai/Kinetics Model

Kinetics Model

Off-target Rate

Previous studies have produced accurate tools for identifying and ranking potential off-target sites[1],[2],[3], but few have examined the actual off-target rate in depth. This is mainly because accurate off-target rates are difficult to measure experimentally.

To assess the off-target effect of catalytically dead Cas9 (dCas9) across the genome, chromatin immunoprecipitation followed by sequencing (ChIP-seq) of dCas9-bound DNA can reveal different degrees of enrichment at off-target sites[4]. However, no reliable method has yet been proposed for converting enriched read counts into an off-target binding probability.

Meanwhile, genome-wide, unbiased identification of DSBs enabled by sequencing (GUIDE-seq) captures the double-strand breaks (DSBs) introduced by RNA-guided endonucleases (RGENs) across the genome[5], enabling genome-wide profiling of off-target cleavage by CRISPR/Cas9 nucleases. In short, GUIDE-seq and other high-throughput sequencing approaches yield an indel frequency that serves as an experimental approximation of the off-target rate. Indel frequency is calculated from the number of indel reads and total reads, and is normalized to remove background noise from array synthesis and PCR amplification[6].

$$\text{Indel frequency (\%)} = \frac{\text{Indel reads} - (\text{Total reads}\times\text{Background indel frequency})}{\text{Total reads} - (\text{Total reads}\times\text{Background indel frequency})}\times100$$
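The normalization above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the datasets. The function name `indel_frequency` and its argument names are hypothetical, and the background indel frequency is assumed to be given as a fraction between 0 and 1 rather than as a percentage.

```python
def indel_frequency(indel_reads, total_reads, background_freq):
    """Background-corrected indel frequency in percent.

    Assumption: background_freq is a fraction in [0, 1), e.g. 0.05 for 5 %.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0.0 <= background_freq < 1.0:
        raise ValueError("background_freq must be a fraction in [0, 1)")
    # Number of indel reads expected from background noise alone.
    expected_background = total_reads * background_freq
    return (indel_reads - expected_background) / (total_reads - expected_background) * 100.0


# Example: 150 indel reads out of 1000, with a 5 % background.
corrected = indel_frequency(150, 1000, 0.05)
```

A sample whose indel reads match the background expectation exactly gets a corrected frequency of zero.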

Principle

Previous studies have demonstrated that guide RNA structure affects cleavage at on-target and off-target sites[7],[8]. In other words, each target sequence has its own kinetics of off-target binding and cleavage. Crystal structures and experiments suggest that the PAM sequence is essential for initiating Cas9 binding[9], while the seed sequence directly adjacent to the PAM is also critical for Cas9 binding, R-loop formation, and nuclease activity. We have therefore combined a stochastic process (a Markov process) with a kinetic equation (the Eyring equation) to estimate the off-target rate for a given Cas9/dCas9 target sequence. Our model links the free-energy change of the Cas9/dCas9 system to PAM recognition and to the mismatch pattern on the 20-bp protospacer.

Kinetics process of binding and cleavage of CRISPR/Cas9
Mismatch results in increase in transition state energy

In our kinetics model, the binding process of Cas9/dCas9 consists of a finite set of states, from the unbound state through PAM recognition to base pairing. A previous study proposed that the probability of cleavage or transcriptional regulation at a target site, once the substrate is bound, is equivalent to the stationary probability of a birth-death process[10]. We label the states in the set $S$ by $i \in [-1,N+1]$, where $s_{-1}$, $s_0$, and $s_{N+1}$ represent the unbound state, PAM recognition, and final cleavage/regulation, respectively. An intermediate state $s_i$ ($i\in[1,N]$) indicates that the R-loop has reached the $i$th position. $N$ is usually 20 for Cas9/dCas9 target sequences, and each state $i\in[0,N]$ has rates $k_f(i)$ and $k_b(i)$ for transitioning to $i+1$ and $i-1$, respectively. Finally, we define the probability of cleavage or regulation $P_i$ as the proportion of Cas9/dCas9-sgRNA complexes that, starting from state $s_i$, complete the targeting process before falling back to the previous state $s_{i-1}$. Counting all paths that start from $s_n$, step forward to $s_{n+1}$, and return to $s_n$ $m$ times before completing gives a recursion relation for the off-target probability.

$$P_n = \sum_{m=0}^{\infty}(f(n)(1-P_{n+1}))^mf(n)P_{n+1},\text{where } f(n)=\frac{k_f(n)}{k_b(n)+k_f(n)}$$

$$\left[1-f(n)(1-P_{n+1})\right]\times P_n = f(n)P_{n+1}$$

$$P_n = \frac{f(n)P_{n+1}}{1-f(n)(1-P_{n+1})} = \frac{P_{n+1}}{\gamma_n+P_{n+1}},\text{where }\gamma_n =\frac{k_b(n)}{k_f(n)}$$

The off-target probability starting from a full R-loop, without the R-loop first shrinking, is simply $f(N)$, the proportion of transitions into the final cleavage/regulation state.

$$P_N = \frac{k_f(N)}{k_b(N)+k_f(N)} = \frac{1}{1+\gamma_N}$$

Now we can obtain the off-target rate according to the recursion relation.

$$\frac{1}{P_0} = 1+\gamma_0\frac{1}{P_1} = 1+\gamma_0+\gamma_0\gamma_1\frac{1}{P_2}=\cdots=1+\sum_{n=0}^N\prod_{i=0}^n\gamma_i$$

$$P_{\text{off-target}} = P_0 = \left(1+\sum_{n=0}^N\prod_{i=0}^n\gamma_i\right)^{-1}$$
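As a quick numerical check of the derivation, the sketch below evaluates $P_0$ in two ways: by applying the recursion $P_n = P_{n+1}/(\gamma_n+P_{n+1})$ backwards from $P_{N+1}=1$, and by the closed-form sum of products. The function names and the example $\gamma$ values are illustrative assumptions, not fitted quantities.

```python
def probability_by_recursion(gammas):
    """P_0 from the backward recursion; gammas[n] = k_b(n) / k_f(n) for n = 0..N."""
    p_next = 1.0  # P_{N+1}: the cleavage/regulation state has been reached
    for gamma in reversed(gammas):
        p_next = p_next / (gamma + p_next)
    return p_next


def probability_by_closed_form(gammas):
    """P_0 = 1 / (1 + sum over n of prod_{i<=n} gamma_i)."""
    total = 1.0
    running_product = 1.0
    for gamma in gammas:
        running_product *= gamma
        total += running_product
    return 1.0 / total


example_gammas = [0.5, 2.0, 1.0]  # hypothetical values for a toy model with N = 2
p_recursive = probability_by_recursion(example_gammas)
p_closed = probability_by_closed_form(example_gammas)
```

For the toy values above, both routes give $1/3.5$, since $1 + 0.5 + 0.5\cdot2 + 0.5\cdot2\cdot1 = 3.5$.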

The Eyring equation connects the reaction rate (a kinetic concept) with free energy (a thermodynamic concept) through transition state theory, which gives an energetic interpretation of the off-target effect of Cas9/dCas9 on a given target sequence.

$$k = \frac{k_BT}{h}\exp\left({\frac{-\Delta G}{RT}}\right)$$
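The Eyring equation can be evaluated directly, as in the sketch below. The function name `eyring_rate`, the default temperature of 310.15 K (37 °C), and the choice of J/mol for the activation free energy are assumptions made for illustration.

```python
import math

BOLTZMANN = 1.380649e-23     # k_B in J/K
PLANCK = 6.62607015e-34      # h in J*s
GAS_CONSTANT = 8.314462618   # R in J/(mol*K)


def eyring_rate(delta_g, temperature=310.15):
    """Rate constant in 1/s for an activation free energy delta_g in J/mol."""
    prefactor = BOLTZMANN * temperature / PLANCK
    return prefactor * math.exp(-delta_g / (GAS_CONSTANT * temperature))


# Raising the barrier by 10 kJ/mol slows the step by a factor exp(10000 / RT).
slowdown = eyring_rate(50_000.0) / eyring_rate(60_000.0)
```

Only ratios of such rates enter the model, so the prefactor $k_BT/h$ cancels and just the barrier difference matters.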

We can next express $\gamma_i$ through the energy difference between two adjacent transition states, because $k_f(i)$ and $k_b(i)$ are proportional to the corresponding reaction rate constants.

$$\gamma_i = \frac{k_b(i)}{k_f(i)} = \exp\left(-\frac{T_{i-1,i}-E_i}{RT}\right)/\exp\left(-\frac{T_{i,i+1}-E_i}{RT}\right) = \exp\left(\frac{T_{i,i+1}-T_{i-1,i}}{RT}\right),$$

where $T_{i,i+1}$ is the free energy of the transition state between states $i$ and $i+1$, and $E_i$ is the free energy of state $i$.
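The cancellation of the state energy can be checked numerically. In the sketch below, the function name `gamma_from_barriers` and its arguments are hypothetical, energies are in units of $RT$ by default, and the common Eyring prefactor is omitted because it cancels in the ratio.

```python
import math


def gamma_from_barriers(t_backward, t_forward, e_state, rt=1.0):
    """gamma_i = k_b(i) / k_f(i) from transition-state and state free energies.

    t_backward: transition-state energy between states i-1 and i
    t_forward:  transition-state energy between states i and i+1
    e_state:    free energy of state i (drops out of the ratio)
    """
    k_forward = math.exp(-(t_forward - e_state) / rt)
    k_backward = math.exp(-(t_backward - e_state) / rt)
    return k_backward / k_forward


# The ratio depends only on the difference between the two barrier tops.
gamma_a = gamma_from_barriers(1.0, 2.5, e_state=0.3)
gamma_b = gamma_from_barriers(1.0, 2.5, e_state=0.9)
```

Both calls return $\exp(1.5)$ up to rounding: a higher forward barrier makes $\gamma_i$ larger and forward progress less likely.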

$$P_{\text{off-target}} = \left(1+\sum_{n=0}^N\prod_{i=0}^n\gamma_i\right)^{-1} = \left[1+\sum_{n=0}^N\exp\left(\frac{\Delta T_n}{RT}\right)\right]^{-1},\quad\Delta T_n = T_{n,n+1}-T_{-1,0}$$

We use four kinetic parameters to simplify the energy differences between transition states, so that the off-target rate can be estimated directly for different sequences. We assume that the free-energy reductions from PAM recognition and from each correct base pair are fixed values for a given target or sgRNA sequence, and that each mismatch, and the final cleavage or regulation step, likewise contributes a fixed energy change.

$$\Delta T_n = -\Delta_{PAM} - n_c(n)\Delta c + (n-n_c(n))\Delta I + \delta_{n,N}\Delta_{clv/reg}$$

Explanation of some symbols in the formulae above

| Symbol | Meaning |
| --- | --- |
| $\Delta_{PAM}$ | Energy change upon PAM recognition |
| $\Delta c$ | Energy change for a correct match |
| $\Delta I$ | Energy change for an incorrect match (mismatch) |
| $\Delta_{clv/reg}$ | Energy change when cleavage or regulation happens |
| $n_c(n)$ | Number of correct matches within the first $n$ positions |
| $\delta_{n,N}$ | Indicator variable: 1 when $n=N$, otherwise 0 |
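The four-parameter model can be sketched as follows. This is an illustration under stated assumptions, not the code used for the fits. The function names are hypothetical, energies are in units of $RT$, the match list is assumed to be ordered from the PAM-proximal position outwards, and the signs follow the $\Delta T_n$ formula literally (positive $\Delta_{PAM}$ and $\Delta c$ lower the barrier, positive $\Delta I$ raises it). The numerical parameter values are illustrative only.

```python
import math


def transition_energy_differences(matches, d_pam, d_c, d_i, d_clv):
    """Return [dT_0, ..., dT_N] for a list of per-position match flags."""
    n_total = len(matches)
    energies = []
    n_correct = 0  # n_c(n): correct matches within the first n positions
    for n in range(n_total + 1):
        if n > 0 and matches[n - 1]:
            n_correct += 1
        delta_t = -d_pam - n_correct * d_c + (n - n_correct) * d_i
        if n == n_total:  # delta_{n,N}: the last step includes cleavage/regulation
            delta_t += d_clv
        energies.append(delta_t)
    return energies


def off_target_probability(matches, d_pam, d_c, d_i, d_clv):
    """P = 1 / (1 + sum_n exp(dT_n)), with energies in units of RT."""
    energies = transition_energy_differences(matches, d_pam, d_c, d_i, d_clv)
    return 1.0 / (1.0 + sum(math.exp(e) for e in energies))


params = dict(d_pam=1.0, d_c=0.5, d_i=2.0, d_clv=0.5)  # illustrative values
on_target = [True] * 20
proximal_mismatch = [False] + [True] * 19   # mismatch next to the PAM
distal_mismatch = [True] * 19 + [False]     # mismatch at the far end

p_on = off_target_probability(on_target, **params)
p_proximal = off_target_probability(proximal_mismatch, **params)
p_distal = off_target_probability(distal_mismatch, **params)
```

With these conventions a PAM-proximal mismatch raises every later $\Delta T_n$, whereas a PAM-distal mismatch raises only the last one, so `p_proximal` comes out smaller than `p_distal`.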

Analysis

We applied our kinetics model to fit several off-target rate datasets (mainly obtained with GUIDE-seq). Although the reliability of the predictions depends strongly on the fitted data, the off-target rate given by the model can still guide sequence choice in experimental design. Moreover, the kinetic parameters can help interpret mechanistic differences in binding or cleavage between target sequences. They also allow us to compare the kinetic process of wild-type (WT) Cas9 with that of evolved variants with higher specificity or fidelity.

We fit the model to several target sequences (SpCas9 vs xCas9) to infer differences in the binding process. We use a genetic algorithm to approach the optimal fit and obtain approximate kinetic parameters for PAM recognition, R-loop formation, and cleavage.
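The text does not give the details of the genetic algorithm, so the following is only a minimal sketch of how such a fit could look. The population size, the elitist selection, the uniform crossover, the Gaussian mutation, the squared-error loss, and the parameter bounds are all assumptions, and the data are synthetic values generated from made-up parameters rather than measured indel frequencies.

```python
import math
import random


def predicted_probability(matches, params):
    """Model probability for one match pattern; params = (d_pam, d_c, d_i, d_clv)."""
    d_pam, d_c, d_i, d_clv = params
    total, n_correct, n_total = 1.0, 0, len(matches)
    for n in range(n_total + 1):
        if n > 0 and matches[n - 1]:
            n_correct += 1
        delta_t = -d_pam - n_correct * d_c + (n - n_correct) * d_i
        if n == n_total:
            delta_t += d_clv
        total += math.exp(delta_t)
    return 1.0 / total


def loss(params, dataset):
    """Sum of squared errors between model and observed rates (an assumed loss)."""
    return sum((predicted_probability(m, params) - y) ** 2 for m, y in dataset)


def genetic_fit(dataset, generations=60, pop_size=40, bounds=(-5.0, 5.0), seed=0):
    """Elitist genetic algorithm; returns the best parameters and the loss history."""
    rng = random.Random(seed)
    low, high = bounds
    population = [[rng.uniform(low, high) for _ in range(4)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda p: loss(p, dataset))
        history.append(loss(population[0], dataset))
        elite = population[: pop_size // 4]  # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            mother, father = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(mother, father)]  # crossover
            child = [min(high, max(low, g + rng.gauss(0.0, 0.2))) for g in child]  # mutation
            children.append(child)
        population = elite + children
    population.sort(key=lambda p: loss(p, dataset))
    history.append(loss(population[0], dataset))
    return population[0], history


# Synthetic example: a 6-position target and all of its single-mismatch variants.
true_params = (1.0, 0.5, 2.0, 0.5)  # made-up values used to generate the data
patterns = [[True] * 6] + [[i != j for i in range(6)] for j in range(6)]
dataset = [(m, predicted_probability(m, true_params)) for m in patterns]
best_params, loss_history = genetic_fit(dataset, generations=20, pop_size=20, seed=1)
```

Because the best quarter of each generation is carried over unchanged, the best loss never increases from one generation to the next. The recovered parameters need not match the generating ones exactly, since different parameter sets can give similar probabilities.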

Model parameters and correlation coefficients for different target sequences and Cas9 variants

| Target | Cas9 | $\Delta_{PAM}/k_BT$ | $\Delta c/k_BT$ | $\Delta I/k_BT$ | $\Delta_{clv}/k_BT$ | corrcoef |
| --- | --- | --- | --- | --- | --- | --- |
| VEGFA | SpCas9 | -0.483 | -0.270 | 1.713 | 1.779 | 0.861 |
| VEGFA | xCas9 3.7 | -1.288 | -0.456 | 4.946 | 0.098 | 0.988 |
| HEK site1 | SpCas9 | -1.528 | -0.543 | 3.028 | 1.031 | 0.739 |
| HEK site1 | xCas9 3.7 | -4.302 | -0.522 | 6.332 | 0.494 | 0.968 |


$\Delta_{PAM}$ and $\Delta_{clv}$ mainly determine the maximum binding and cleavage probability of the Cas9 protein on a given target sequence, and therefore indicate the on-target rate. Differences in DNA and sgRNA structure and in their interaction may affect the stability of the system after PAM recognition. This result is in line with the general understanding that targets differ in how easily they are bound. Meanwhile, the energy reduction from a correct match is much smaller than the penalty for a mismatch. The most important finding from our model is the marked difference in kinetic parameters before and after directed evolution. Transcriptional activation and genomic DNA cleavage experiments in human cells show that evolved xCas9-3.7[11] binds better across PAM sequences, which the model attributes to a $\Delta_{PAM}$ of larger magnitude, while the reduced off-target rate comes from a stricter penalty on mismatches (a larger $\Delta I$).

The alteration of transition state energy when targeting an off-target site.
Off-target effect caused by the mismatch of a single position.

We observe a seemingly anomalous result: the off-target effect becomes more pronounced when a single mismatch lies in the PAM-distal region. However, off-target sites with a single mismatch in the PAM-distal region rarely occur in the genome, so we can still draw the conclusion that xCas9 variants obtained by directed evolution significantly reduce the off-target effect and improve DNA specificity. In addition, the seed (PAM-adjacent) region is extended after directed evolution, so that correct pairing between sgRNA and DNA is required over a longer stretch. (Seed-region length: 9 vs 4 for VEGFA and 6 vs 4 for HEK site1, taking the position at which the off-target probability drops to half of its maximum as the border of the seed region.)
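The half-maximum criterion for the seed border can be sketched as below. This is one possible reading of the criterion, in which the seed length is the number of PAM-proximal positions whose single-mismatch probability stays below half of the largest single-mismatch probability. The function names and parameter values are hypothetical, energies are in units of $RT$, position 1 is taken as PAM-proximal, and the signs follow the $\Delta T_n$ formula literally.

```python
import math


def single_mismatch_profile(n_positions, d_pam, d_c, d_i, d_clv):
    """Model probability for one mismatch at each position 1..N (1 = PAM-proximal)."""
    profile = []
    for mismatch_pos in range(1, n_positions + 1):
        total, n_correct = 1.0, 0
        for n in range(n_positions + 1):
            if n > 0 and n != mismatch_pos:
                n_correct += 1
            delta_t = -d_pam - n_correct * d_c + (n - n_correct) * d_i
            if n == n_positions:
                delta_t += d_clv
            total += math.exp(delta_t)
        profile.append(1.0 / total)
    return profile


def seed_length(profile):
    """Count PAM-proximal positions whose probability is below half of the maximum."""
    half_max = max(profile) / 2.0
    length = 0
    for probability in profile:
        if probability >= half_max:
            break
        length += 1
    return length


# Illustrative parameters only; a larger mismatch penalty tends to lengthen the seed.
profile_weak = single_mismatch_profile(20, d_pam=1.0, d_c=0.5, d_i=2.0, d_clv=0.0)
profile_strict = single_mismatch_profile(20, d_pam=1.0, d_c=0.5, d_i=5.0, d_clv=0.0)
seed_weak = seed_length(profile_weak)
seed_strict = seed_length(profile_strict)
```

When $\Delta I + \Delta c > 0$ the profile rises monotonically with distance from the PAM, so the first position that reaches half of the maximum marks the seed border.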

Off-target effect caused by the mismatch of two positions.

When two positions are mismatched, xCas9 clearly reduces the off-target effect at both the VEGFA site and HEK site1. The fits have limitations: we lack measured off-target sequences with up to two mismatches, the energy change per mismatch may vary along the 20-bp sequence, and the indel frequencies have a high coefficient of variation. Even so, the kinetics model still provides a reliable estimate of the off-target rate and supports reasonable inferences about the binding and cleavage mechanism.

References

[1] Labun, K., Montague, T.G., Krause, M., Cleuren, Y.N., Tjeldnes, H., & Valen, E. (2019). CHOPCHOP v3: expanding the CRISPR web toolbox beyond genome editing. Nucleic Acids Research, 47, W171-W174.
[2] Xie, S., Shen, B., Zhang, C., Huang, X., & Zhang, Y. (2014). sgRNAcas9: A Software Package for Designing CRISPR sgRNA and Evaluating Potential Off-Target Cleavage Sites. PLoS ONE, 9.
[3] Concordet, J., & Haeussler, M. (2018). CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Research, 46, W242-W245.
[4] O'Geen, H., Henry, I., Bhakta, M.S., Meckler, J.F., & Segal, D. (2015). A genome-wide analysis of Cas9 binding specificity using ChIP-seq and targeted sequence capture. Nucleic Acids Research, 43, 3389-3404.
[5] Tsai, S., Zheng, Z., Nguyen, N., Liebers, M., Topkar, V.V., Thapar, V., Wyvekens, N., Khayter, C., Iafrate, A., Le, L., Aryee, M., & Joung, J.K. (2015). GUIDE-Seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nature Biotechnology, 33, 187-197.
[6] Kim, H.K., Lee, S., Kim, Y., Park, J., Min, S., Choi, J.W., Huang, T.P., Yoon, S., Liu, D., & Kim, H. (2020). High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells. Nature Biomedical Engineering, 4, 111-124.
[7] Xu, X., Duan, D., & Chen, S. (2017). CRISPR-Cas9 cleavage efficiency correlates strongly with target-sgRNA folding stability: from physical mechanism to off-target assessment. Scientific Reports, 7.
[8] Gong, S., Yu, H., Johnson, K.A., & Taylor, D. (2018). DNA Unwinding Is the Primary Determinant of CRISPR-Cas9 Activity. Cell Reports, 22(2), 359-371.
[9] Anders, C., Niewoehner, O., Duerst, A., & Jinek, M. (2014). Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature, 513, 569-573.
[10] Klein, M., Eslami-Mossallam, B., Arroyo, D.G., & Depken, M. (2018). Hybridization Kinetics Explains CRISPR-Cas Off-Targeting Rules. Cell Reports, 22(6), 1413-1423.
[11] Hu, J., Miller, S.M., Geurts, M.H., Tang, W., Chen, L., Sun, N., Zeina, C.M., Gao, X., Rees, H., Lin, Z., & Liu, D. (2018). Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature, 556, 57-63.