Team:SJTU-BioX-Shanghai/Molecular Dynamics

home

Molecular Dynamics

Molecular Dynamics

Motivation and Methods

Cas9 has always been wildly used as a gene editing tool. In order to understand more about the relationship between Cas9 protein structure and function, we performed a comparative molecular dynamics simulation of dCas9 on and off target states.

We prepared 3 simulation systems including target, lure2 and lure3. The dCas9 binding site we used was the PDCD1 gene editing target in therapy published recently[1]. We used dCas9 X-ray crystal structure from Protein Data Bank (PDB code: 6K3Z) as our template, and mutated the sgRNA and target DNA sequence to PDCD1 sequence in our project. In "target" system, we used the target which was exactly matched with the sgRNA protospacers. This represented the on-target case. Lure2 was from the most abundant off-target site, and lure3 has the highest probability predicted by our model. They both have several mismatches with sgRNA protospacer recognition site.

System sequence Simulation Time (ns) Temperature(K)
Target GCAGTTGTGTGACACGGAAG 200 298.15
Lure2 GACGTTTATTGACCCGGATG 100 298.15
Lure3 GCAGTGATGTGGCACGTAAG 100 298.15

All the simulations were run by AMBER18 suit, using the most popular ff14SB force field[1] for protein, bsc1[2]for DNA and OL3[3] for RNA simulation, with TIP4P-Ew[4] solvent model (the most accurate and commonly used system for protein-nucleotide simulation system). For each system we simulated for 100 nanosecond (ns), target simulation we ran another 100 ns simulation to have more samples.

Molecular Dynamics and Data Analysis Methods

Initial protein structures were mutated by Pymol program. Simulation systems were built with the LEaP module in the Amber 18 suite.[5] Counterions were added to neutralize the systems, which were then solvated in a truncated octahedral box of TIP4P-Ew water molecules with a buffer of 10 Å.(41) Long-range electrostatic interactions were calculated with the particle mesh Ewald (PME) algorithm.[6] The CUDA version of PMEMD[7]was used to accelerate the MD simulations. All bonds involving hydrogen atoms were constrained with the SHAKE algorithm.[8] All of the systems were relaxed for 20 000 steps with steepest-descent minimization and then were heated up for 20 ps and equilibrated for 20 ps in the NPT ensemble at 298 K with PMEMD. CPPTRAJ in AmberTools18 was used to analyze the root-mean-square deviations (RMSDs), RMS fluctuations (RMSFs), protein contact maps, hydrogen bonds distribution, secondary structure and residue distances.[5]

Result Analysis

General Analysis

After simulation, we first compared the root mean square deviation (RMSD) and root mean square fluctuation (RMSF). In RMSD analysis, we selected the first frame as reference structure.

Fig.1 RMSD and RMSF of simulated systems

Comparing the RMSD in 3 systems, we chose 30 to 100 (200) ns simulation as production simulation. Protein structure are relatively stable after 30 ns. Three protein were all stable in simulation.

Base Pair Distances

We calculated distances between paired nucleotides in the recognition region. The figures below are arranged from far to near PAM(protospacer adjacent motif). When Cas9 identifies its target, except for PAM region, it compares the protospacers on sgRNA with corresponding DNA sequence containing 20 base pairs. When there is a mismatch between those base pairs, we call it the off-target scenario. So we compared the distances between on-target and off-target sites and found that the distances between base pair showed a shift when there was a mismatch. In addition, sometimes mismatches can affects its neighbors, like mismatch on 6th place would affect the distance on 5th place in Lure2.

Fig.2 Distances of 20 base pairs in recognition areas

Hydrogen Bonds

We also analyzed the differences between hydrogen bonds. There are plenty of hydrogen bonds from dCas9 protein to DNA-RNA complex, which perform a key role in binding and locating. Here we present the average amount of hydrogen bonds between protein and DNA/RNA. The hydrogen donor is protein and the acceptor is nucleotides, with more negatively charged atoms on nucleotides.

Fig.3 Hydrogen bonds distribution in Cas9 on and off target systems

We can see that the hydrogen bonds broke when there was a mismatch at the 10 and 11th point on sgRNA protospacers. This may cause an energy rise in dCas9 specificity when binding different targets. The 2nd base on PAM region also showed a difference.

Contact Map

Contact can reflect the relationship between protein and its ligand. Here we calculated the contact map between protein residues and RNA protospacers. The contact map revealed some important residues in protein, showing that there is a structural change when Cas9 bonds a wrong off-target site. In this figure, first 3 rows show the contact map for target, lure2 and lure3. The 4 and 5th row show the difference between lure2/lure3 and target state.

Fig.4 Trimmed contact map of sgRNA contact with protein complex

Reference

[1] Lu, Y., Xue, J., Deng, T., Zhou, X., Yu, K., Deng, L., ... & Shen, H. (2020). Safety and feasibility of CRISPR-edited T cells in patients with refractory non-small-cell lung cancer. Nature Medicine, 26(5), 732-740.
[2] Ivani, I., Dans, P. D., Noy, A., Pérez, A., Faustino, I., Hospital, A., ... & Portella, G. (2016). Parmbsc1: a refined force field for DNA simulations. Nature methods, 13(1), 55.
[3] Zgarbová, M., Otyepka, M., Šponer, J., Mládek, A., Banáš, P., Cheatham III, T. E., & Jurecka, P. (2011). Refinement of the Cornell et al. nucleic acids force field based on reference quantum chemical calculations of glycosidic torsion profiles. Journal of chemical theory and computation, 7(9), 2886-2902.
[4] Horn, H. W., Swope, W. C., Pitera, J. W., Madura, J. D., Dick, T. J., Hura, G. L., & Head-Gordon, T. (2004). Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. The Journal of chemical physics, 120(20), 9665-9678.
[5] Case, D. A., Cheatham III, T. E., Darden, T., Gohlke, H., Luo, R., Merz Jr, K. M., ... & Woods, R. J. (2005). The Amber biomolecular simulation programs. Journal of computational chemistry, 26(16), 1668-1688.
[6] Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W., & Klein, M. L. (1983). Comparison of simple potential functions for simulating liquid water. The Journal of chemical physics, 79(2), 926-935.
[7] Gotz, A. W., Williamson, M. J., Xu, D., Poole, D., Le Grand, S., & Walker, R. C. (2012). Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized born. Journal of chemical theory and computation, 8(5), 1542-1555.
[8] Ryckaert, J. P., Ciccotti, G., & Berendsen, H. J. (1977). Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. Journal of computational physics, 23(3), 327-341.