Team:SJTU-BioX-Shanghai/Rational Design

Rational Design

Promare

Protein and Macromolecular Remodeler (PROMARE) is protein design software originally developed by the SJTU-BioX-Shanghai team. Promare integrates three related modules: a feature extraction module, a mutation module, and a design module.

Sketch of Promare software

With Promare, users can analyze protein-macromolecule interactions, screen point mutations, and finally arrive at a design strategy for the target protein. Here, macromolecule refers to DNA, RNA, glycogen, and so on. Protein-macromolecule complexes occur ubiquitously in nature, for example in the construction of viruses. In addition, they play fundamental roles in all basic life processes (protein translation, cell division, vesicle trafficking, intra- and inter-cellular exchange of material between compartments, etc.)[1,2].

What can Promare do?

Promare can help you find the key residues in a protein and design a strategy to improve the protein's function.

How does Promare find key residues?

With Promare, you can visualize key properties of the target protein, such as distances, interaction networks, and contact maps. By comparing these with mutant data from the literature (for example, from directed evolution), researchers can identify the key features and analyze how they change upon mutation.

I already have a highly reactive or stable protein. Why should I use Promare to design proteins?

It is great when you already have a protein with the function you want. Such proteins may come from directed evolution, from saturation mutagenesis, or from molecular dynamics or quantum chemistry calculations. But all of these approaches cost a huge amount of money and time to obtain a single protein. With Promare, you can collect the insights gained by earlier researchers and design your own protein. For example, maybe there is a protein that is stable at 310 K, but you want one that is stable at 305 K; or maybe there is an engineered Cas9 with a wider PAM, but you only want the NGA PAM. Promare can find the features that matter for these functions; you can then modify the protein structure, build feature diversity, and find the protein with the specific function you want.

In our project, we used Promare to design our target protein, Cas9 (more precisely, dCas9). We found that the dynamic structural differences between on-target and off-target complexes often lie at the protein-DNA-RNA boundaries, so we focused on these regions in rational design [Connection with Molecular Dynamics]. We also found that many amino acids in the Cas9 protein interact with the DNA-sgRNA double helix. In the feature extraction module, we mainly focused on these regions.

Feature Extraction

We compared the structures of wild-type SpCas9 and xCas9, a variant obtained through directed evolution with PACE by the David R. Liu group[3]. Both protein structures, SpCas9 and xCas9, are from this article[4]. Compared with the commonly used wild-type SpCas9 protein, xCas9 has higher specificity and a wider PAM region[3]. Therefore, we used Promare to compare the structures of SpCas9 and xCas9 and find the key features related to the difference in their off-target effects.

For the Cas9 on-target and off-target effects, one of the most important features is the local structure formed by the DNA-sgRNA double helix and the nearby protein residues. The recognition area in Cas9 consists of the first 20 nucleotides of the sgRNA and the 20 nucleotides in the DNA that are followed by the NGG PAM.

We first selected all the residues in close contact with these nucleotides (close contact: within 15 Å). These residues are located in the recognition region.
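The selection step can be sketched in a few lines of Python. This is a minimal illustration, not Promare's actual code: the function name `residues_in_contact`, the input layout, and the choice to measure atom-to-atom distances (rather than, say, residue centroids) are all assumptions made for the example. Only the 15 Å cutoff comes from the text.

```python
from math import dist

CONTACT_CUTOFF = 15.0  # Å, the close-contact threshold quoted in the text


def residues_in_contact(protein_atoms, nucleotide_atoms, cutoff=CONTACT_CUTOFF):
    """Return the sorted ids of residues with at least one atom within
    `cutoff` of any nucleotide atom.

    protein_atoms:    list of (residue_id, (x, y, z)) tuples (hypothetical layout)
    nucleotide_atoms: list of (x, y, z) tuples for the recognition nucleotides
    """
    selected = set()
    for res_id, coord in protein_atoms:
        if res_id in selected:
            continue  # this residue already qualified through another atom
        if any(dist(coord, nuc) <= cutoff for nuc in nucleotide_atoms):
            selected.add(res_id)
    return sorted(selected)


# Toy coordinates, not taken from any real structure.
protein_atoms = [
    (10, (0.0, 0.0, 0.0)),
    (10, (1.5, 0.0, 0.0)),
    (11, (30.0, 0.0, 0.0)),
    (12, (0.0, 14.0, 0.0)),
]
nucleotide_atoms = [(0.0, 0.0, 5.0)]

print(residues_in_contact(protein_atoms, nucleotide_atoms))  # [10, 12]
```

In practice the coordinates would be read from the SpCas9 and xCas9 structure files, and the nucleotide atoms would be restricted to the recognition area.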

This figure shows the location of the recognition area in SpCas9 and xCas9.

Comparing the distance maps and contact maps of SpCas9 and xCas9, we found that the contacts change considerably between the two proteins, and that the recognition area differs more than other regions.
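As a rough sketch of this kind of comparison, the snippet below builds a distance map, thresholds it into a contact map, and lists the entries whose contact state differs between two structures. The function names, the use of one representative point per residue and per nucleotide, and the reuse of the 15 Å cutoff for the contact map are assumptions for illustration; the source does not specify how Promare defines its maps.

```python
from math import dist


def distance_map(coords_a, coords_b):
    """Matrix of pairwise distances: rows follow coords_a, columns coords_b."""
    return [[dist(a, b) for b in coords_b] for a in coords_a]


def contact_map(dmap, cutoff=15.0):
    """Boolean matrix: True where the distance is within the cutoff (in Å)."""
    return [[d <= cutoff for d in row] for row in dmap]


def changed_contacts(cmap_1, cmap_2):
    """Indices (i, j) whose contact state differs between the two maps."""
    return [
        (i, j)
        for i, row in enumerate(cmap_1)
        for j, in_contact in enumerate(row)
        if in_contact != cmap_2[i][j]
    ]


# Toy data: two residues and two nucleotides, one point each.
nucleotides = [(0.0, 0.0, 10.0), (20.0, 0.0, 10.0)]
residues_wild = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
residues_variant = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0)]  # second residue has moved

cmap_wild = contact_map(distance_map(residues_wild, nucleotides))
cmap_variant = contact_map(distance_map(residues_variant, nucleotides))

print(changed_contacts(cmap_wild, cmap_variant))  # [(1, 0), (1, 1)]
```

Counting the changed entries inside and outside the recognition area gives a simple way to quantify where two structures differ most.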

Interestingly, we found that the 8 mutation sites in xCas9 are widely distributed. Most are not close to the recognition area: only 4 of them are within the 15 Å range, and 1 of them is on the outer surface of the protein, far from the recognition area.


After analyzing the contacts and mutation sites, we focused on the cavity that the Cas9 protein provides for the base pairs in the recognition area. From our molecular dynamics simulations, we know that off-target mismatches occupy more space, which means that a bigger cavity has a higher tolerance for off-target binding. To quantify the cavity size, we studied the atomic chemical environment around the phosphorus atom on the outer part of each nucleotide. By extracting the 10 protein atoms closest to each nucleotide, we can estimate the size of the space between the nucleotide and the protein. This figure shows the difference between SpCas9 (blue) and xCas9 (orange) in the top 10 closest protein-nucleotide distances.

In this figure, xCas9 shows an obvious improvement in the recognition-area cavity. The top 10 distances can therefore serve as a key feature when studying the off-target effect.
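The top 10 closest-distance feature described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the function names `closest_distances` and `cavity_profile` and the dictionary input are hypothetical, and distances are measured from the phosphorus atom of each nucleotide to every protein atom.

```python
from math import dist


def closest_distances(phosphorus, protein_atoms, k=10):
    """Return the k smallest distances (ascending) from one phosphorus atom
    to the protein atoms."""
    return sorted(dist(phosphorus, atom) for atom in protein_atoms)[:k]


def cavity_profile(phosphorus_atoms, protein_atoms, k=10):
    """Map each nucleotide id to its k closest protein-atom distances.

    phosphorus_atoms: dict {nucleotide_id: (x, y, z)} (hypothetical layout)
    protein_atoms:    list of (x, y, z) tuples
    """
    return {
        nuc_id: closest_distances(p, protein_atoms, k)
        for nuc_id, p in phosphorus_atoms.items()
    }


# Toy data: 12 protein atoms spaced 1 Å apart along the x axis.
protein_atoms = [(float(x), 0.0, 0.0) for x in range(1, 13)]
phosphorus_atoms = {1: (0.0, 0.0, 0.0), 2: (13.0, 0.0, 0.0)}

profile = cavity_profile(phosphorus_atoms, protein_atoms)
for nuc_id, distances in profile.items():
    print(nuc_id, distances)
```

Larger values in a nucleotide's profile suggest more free space around that nucleotide, so comparing the profiles of two structures nucleotide by nucleotide gives the kind of plot described in the text.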

Mutation and Design

The mutation step in Promare is based on Python and PyMOL[5]. Our Python script builds a PyMOL batch (pml) file. By running the pml file, users can mutate the protein into many candidates for selection.
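A script of this kind might look like the sketch below, which only assembles the text of a pml file and does not import PyMOL. It is not the team's script. It assumes the commonly used scripting pattern for PyMOL's mutagenesis wizard (`set_mode`, `do_select`, `apply`), which should be checked against the installed PyMOL version; the object name `target`, the file names, the chain id, and the two example sites are placeholders.

```python
# Three-letter codes of the 20 standard amino acids.
AMINO_ACIDS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]


def mutation_lines(pdb_path, chain, resi, new_resn, out_path, obj="target"):
    """PyMOL script lines that load a structure, mutate one residue, and save it.

    The wizard calls follow the usual mutagenesis-wizard scripting pattern
    (an assumption; verify against your PyMOL version).
    """
    return [
        "reinitialize",
        f"load {pdb_path}, {obj}",
        "wizard mutagenesis",
        "refresh_wizard",
        f'cmd.get_wizard().set_mode("{new_resn}")',
        f'cmd.get_wizard().do_select("/{obj}//{chain}/{resi}/")',
        "cmd.get_wizard().apply()",
        "cmd.set_wizard()",
        f"save {out_path}, {obj}",
    ]


def build_pml(pdb_path, chain, sites, prefix="mutant"):
    """Build the full pml text: 19 single mutants for every site.

    sites maps residue number -> wild-type three-letter code.
    """
    lines = []
    for resi, wild_resn in sorted(sites.items()):
        for new_resn in AMINO_ACIDS:
            if new_resn == wild_resn:
                continue  # skip the wild-type residue, leaving 19 mutants
            out_path = f"{prefix}_{wild_resn}{resi}{new_resn}.pdb"
            lines.extend(mutation_lines(pdb_path, chain, resi, new_resn, out_path))
    return "\n".join(lines) + "\n"


# Placeholder input file, chain, and sites, for illustration only.
script = build_pml("cas9.pdb", "A", {1333: "ARG", 1335: "ARG"})
print(script.count("cmd.get_wizard().apply()"))  # 38
```

The returned text can be written to a file such as `mutate.pml` and run in batch mode with `pymol -cq mutate.pml`. Reloading the structure before each mutation keeps every candidate a single-point mutant of the original protein.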

We built a mutation process that mutates every residue appearing among the top 10 closest distances. Within 2 hours of computation on a PC, we searched 19 × 129 = 2451 mutants and evaluated them. The scoring function is as follows:

$$ \text{Score} = \Delta a + 0.3\times \Delta b + 0.1\times \Delta c $$

where a, b, and c are the distances from a nucleotide to its first, second, and third closest protein atoms. Comparing all 2451 mutants, we found that only a small fraction of them differed from wild-type SpCas9.
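The scoring function is simple enough to write out directly. In the sketch below, Δ is taken to be the mutant distance minus the wild-type distance; that sign convention is an assumption, since the text does not state it. The function name `score` and the example distances are hypothetical.

```python
WEIGHTS = (1.0, 0.3, 0.1)  # weights of Δa, Δb, Δc in the scoring function


def score(wild_type, mutant, weights=WEIGHTS):
    """Score = Δa + 0.3·Δb + 0.1·Δc.

    wild_type, mutant: the three closest protein-nucleotide distances
    (a, b, c) in Å. Δ is assumed to mean mutant minus wild type.
    """
    deltas = [m - w for m, w in zip(mutant, wild_type)]
    return sum(weight * delta for weight, delta in zip(weights, deltas))


# Size of the search space quoted in the text: 19 substitutions at 129 sites.
n_mutants = 19 * 129
print(n_mutants)  # 2451

# Toy distances in Å, not real measurements.
wild_type = (3.0, 3.5, 4.0)
mutant = (3.5, 3.5, 5.0)
print(round(score(wild_type, mutant), 3))
```

A mutant whose three closest distances are unchanged scores zero under this convention, which matches the observation that most mutants show no difference from the wild type.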

The following figure shows the scores of the 129 mutated sites; each site has 19 possible mutations.

From these mutated sites, we selected some for expression and tested them with our characterization and off-target detection system.

References

[1] Dutta, S., & Berman, H. M. (2005). Large macromolecular complexes in the Protein Data Bank: a status report. Structure, 13(3), 381-388.
[2] Neuman, N. (2016). The complex macromolecular complex. Trends in biochemical sciences, 41(1), 1-3.
[3] Hu, J. H., Miller, S. M., Geurts, M. H., Tang, W., Chen, L., Sun, N., ... & Liu, D. R. (2018). Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature, 556(7699), 57-63.
[4] Chen, W., Zhang, H., Zhang, Y., Wang, Y., Gan, J., & Ji, Q. (2019). Molecular basis for the PAM expansion and fidelity enhancement of an evolved Cas9 nuclease. PLoS biology, 17(10), e3000496.
[5] DeLano, W. L. (2002). PyMOL.