Model
Overview
Because of wet-lab limitations, we used modeling to test our hypotheses that ACE2 can capture the spike protein and that the protease (pepP) can cleave the spike protein as intended. To do this, we used docking and protein-protein interaction analysis to find the binding sites.
Structural Modeling
Method
We used SeeSAR to dock all the ligands that are available during editing. Clashes can also be visualized to better guide the editing process. During docking, it is often necessary to be a little more forgiving about clashes, because poses are generated only on a coarse-grained grid of torsion angles. For this reason, we treated clash tolerance as an adjustable parameter in Docking mode alongside affinity, which can also be shown in the results table.
SeeSAR 10 protocol
- load protein and define pockets
- select pocket and load ligand
- dock the ligand and calculate the affinity
Docking
Spike and pepP affinity
The spike protein has many pockets (figure 1). After filtering by sequence, we selected the pockets that contain the ordered sequence Pro-Phe (according to the reference, pepP cuts the Pro-Phe sequence into two pieces) and kept the site located on the outer surface. We then loaded pepP as the ligand. In the docking analysis, several poses showed high affinity without clashes. This result suggests that our protease (pepP) construct is very likely to bind the spike protein and cleave it as expected.
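The sequence filter described above can be illustrated with a short Python sketch. It is not part of the SeeSAR workflow: the function name `find_motif` and the toy fragment are assumptions made for illustration, and Pro-Phe is written in one-letter code as `PF`.

```python
from typing import List


def find_motif(sequence: str, motif: str = "PF") -> List[int]:
    """Return the 1-based start position of every occurrence of `motif`."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based residue number
        start = sequence.find(motif, start + 1)
    return positions


# Toy fragment for illustration only; this is NOT the real spike sequence.
toy_fragment = "MKPFVLLPFAPPF"
print(find_motif(toy_fragment))  # [3, 8, 12]
```

The returned residue numbers could then be compared by hand against the residues lining each pocket to decide which pockets contain the motif.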
ACE2 and pepP affinity
Although the ACE2 RBD has 10 pockets, only one of them (figure 1) contains the ordered Pro-Phe sequence. Fortunately, that pocket is also the only one located in the core rather than on the outer surface. We then used this pocket to dock the spike and estimate the affinity (figure 2). The results show that the ACE2 RBD can still bind the spike. To reduce the risk of pepP acting at this pocket, we introduced a point mutation that changes YAA to AAA, which should prevent pepP from functioning at the original site.
Monitoring Spike Protein
Monitor SARS-CoV-2 Spike
In our project, understanding how SARS-CoV-2 may evolve naturally is an important safety step in choosing the right protease to cut the spike protein. Monitoring real SARS-CoV-2 spike sequences tells us several things. First, it shows how often the spike protein mutates and which mutations are most prevalent. Second, once we know which mutations are more likely to occur, we can check that the protease cutting sites we found are not mutation hot spots, so the protease should still be able to cut at those sites without much interference. Lastly, we can visualize the mutation sites through protein structural modeling to see exactly where on the spike protein structure the mutations occur.
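As a rough illustration of the hot-spot check, the sketch below flags variable positions in a set of aligned spike protein sequences and tests whether a candidate Pro-Phe site avoids them. The function names, the 5% threshold, and the toy sequences are all assumptions for illustration, not values from our analysis.

```python
from collections import Counter
from typing import Dict, List


def variable_positions(aligned: List[str], min_minor_fraction: float = 0.05) -> Dict[int, float]:
    """Map each 1-based position to its non-majority residue fraction,
    keeping only positions at or above `min_minor_fraction`.
    Assumes every sequence in `aligned` has the same length."""
    hot = {}
    for position, column in enumerate(zip(*aligned), start=1):
        majority_count = Counter(column).most_common(1)[0][1]
        minor_fraction = 1 - majority_count / len(column)
        if minor_fraction >= min_minor_fraction:
            hot[position] = minor_fraction
    return hot


def site_is_stable(hot: Dict[int, float], site_start: int, site_length: int = 2) -> bool:
    """True if none of the residues in the site is a flagged position."""
    return all(pos not in hot for pos in range(site_start, site_start + site_length))


# Toy alignment (hypothetical): only position 5 varies (D or G).
toy_alignment = ["MKPFD", "MKPFG", "MKPFG", "MKPFD"]
hot_spots = variable_positions(toy_alignment)
print(hot_spots)                     # {5: 0.5}
print(site_is_stable(hot_spots, 3))  # True: residues 3-4 (PF) never vary here
```

A site that overlaps a flagged position would be a weaker choice for protease cutting, since a mutation there could remove the recognition sequence.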
Collect Genomes and Preprocessing Data
To do the monitoring, we first collected the sequences of multiple SARS-CoV-2 genomes from the GISAID database. GISAID stands for Global Initiative on Sharing All Influenza Data, and it now also shares COVID-19 data. The sequences are downloaded from the website as FASTA files. We took all the sequences whose location includes “Taiwan”. In total there were 114 sequences, with dates ranging from January to late April.
We then imported the FASTA files into Geneious for better visualization of the sequences. Of the 114 sequences retrieved, one was removed before further investigation because it was far shorter than the rest. Next, we used the reference genome on NCBI to locate the spike sequence. Knowing where the sequence starts and ends, we trimmed the remaining 113 sequences to leave only the spike sequence. The result is 113 spike sequences ready for further analysis.
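The location filter can also be reproduced outside the GISAID website. The sketch below is a minimal plain-Python version that reads FASTA text and keeps records whose header mentions a keyword. The function names and the toy headers are hypothetical, and real GISAID headers may be formatted differently.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {header: sequence}, joining wrapped sequence lines."""
    parts: Dict[str, list] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is not None:
            parts[header].append(line)
    return {name: "".join(chunks) for name, chunks in parts.items()}


def filter_by_keyword(records: Dict[str, str], keyword: str = "Taiwan") -> Dict[str, str]:
    """Keep only records whose header contains `keyword` (case-insensitive)."""
    return {name: seq for name, seq in records.items() if keyword.lower() in name.lower()}


# Toy FASTA text with made-up headers and sequences.
toy_fasta = """>hCoV-19/Taiwan/example-1/2020
ATGC
GGTA
>hCoV-19/Japan/example-2/2020
ATGA
""".splitlines()

taiwan_records = filter_by_keyword(read_fasta(toy_fasta))
print(list(taiwan_records))  # ['hCoV-19/Taiwan/example-1/2020']
```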
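We did this step interactively in Geneious. For readers who prefer a script, the sketch below shows one possible plain-Python equivalent. It assumes the genomes are already aligned to the reference so that coordinates match; the function names and the 0.9 length cutoff are hypothetical. The spike coordinates 21563 to 25384 are the S gene annotation of NCBI reference NC_045512.2 as we understand it and should be checked against the record before use.

```python
from statistics import median
from typing import Dict

# Assumed S gene coordinates on NC_045512.2 (1-based, inclusive); verify before use.
SPIKE_START, SPIKE_END = 21563, 25384


def drop_short(records: Dict[str, str], min_fraction: float = 0.9) -> Dict[str, str]:
    """Remove sequences shorter than `min_fraction` of the median length."""
    cutoff = min_fraction * median(len(seq) for seq in records.values())
    return {name: seq for name, seq in records.items() if len(seq) >= cutoff}


def trim_region(records: Dict[str, str], start: int, end: int) -> Dict[str, str]:
    """Keep only positions start..end (1-based, inclusive) of each sequence."""
    return {name: seq[start - 1:end] for name, seq in records.items()}


# Toy data with toy coordinates; real genomes would use SPIKE_START and SPIKE_END.
toy_records = {"a": "AAACCCGGGTTT", "b": "AAACCCGGATTT", "c": "AAAC"}
kept = drop_short(toy_records)      # "c" is far shorter than the median and is dropped
print(trim_region(kept, 4, 9))      # {'a': 'CCCGGG', 'b': 'CCCGGA'}
```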
Two Analysis Paths
We analyzed the spike sequences in two ways. The first is a time-based analysis: we sort the sequences in chronological order and observe which mutations become the new trend. The second is a location-based analysis: we sort the spike sequences by location. The Taiwan SARS-CoV-2 sequences were further sorted into Taipei, New Taipei, Taoyuan, Keelung, Tainan, and Kaohsiung.
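Both analysis paths amount to grouping the sequences by some key and counting which residue appears at a site of interest. The sketch below shows this under stated assumptions: the record layout (`date`, `location`, `spike`), the function name `tally_site`, and the five-residue toy sequences are all hypothetical, and site 3 stands in for a real site such as 614.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Callable, Dict, List


def tally_site(records: List[dict], site: int, key: Callable[[dict], str]) -> Dict[str, Dict[str, int]]:
    """Count the residue found at 1-based `site`, grouped by `key(record)`."""
    table = defaultdict(Counter)
    for record in records:
        table[key(record)][record["spike"][site - 1]] += 1
    return {group: dict(table[group]) for group in sorted(table)}


# Toy records with made-up dates, locations, and sequences.
toy_records = [
    {"date": date(2020, 1, 25), "location": "Taoyuan", "spike": "MKDLV"},
    {"date": date(2020, 3, 10), "location": "Taipei", "spike": "MKGLV"},
    {"date": date(2020, 3, 18), "location": "Taipei", "spike": "MKGLV"},
    {"date": date(2020, 3, 20), "location": "Kaohsiung", "spike": "MKDLV"},
]

# Time-based path: group by month.
by_month = tally_site(toy_records, 3, key=lambda r: r["date"].strftime("%Y-%m"))
# Location-based path: group by city.
by_city = tally_site(toy_records, 3, key=lambda r: r["location"])
print(by_month)
print(by_city)
```

Passing a different `key`, for example one that returns "before" or "after" relative to March 1st, gives a before-and-after split with the same function.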
Visualization
Figure 1 : A time record of incidence of mutations for site 614 on SARS-CoV-2 Spike Protein Sequence
From the graph (figure 1) we can see that the occurrence of D614G has greatly increased since March, which is in line with global trends. Most of Taiwan’s COVID-19 cases are people returning from foreign countries, so it is expected that Taiwan follows global patterns.
Figure 2 : A time record of incidence of mutations for site 791 on SARS-CoV-2 Spike Protein Sequence
The T791I mutation (figure 2) is far less prevalent than D614G. In Taiwan, T791I showed up briefly, then quickly disappeared and has not been seen since.
Figure 3 : A location record of incidence of mutations for site 614 on SARS-CoV-2 Spike Protein Sequence before and after March 1st.
From the graph (figure 3) we can see that before March the amino acid at site 614 was D in every sequence, but after March the northern region of Taiwan shows the greatest impact from D614G. We speculate that this is because much of the population lives in the northern region, and a place with more cases would naturally show more similarity to global trends.
Figure 4 : A location record of incidence of mutations for site 791 on SARS-CoV-2 Spike Protein Sequence before and after March 1st.
The T791I mutation is sometimes seen as a co-mutation with D614G, but in Taiwan T791I showed up early in the outbreak and later died down. Interestingly, the only location that showed this mutation was Taoyuan county (figure 4). Taoyuan county is home to Taiwan’s biggest international airport, so we speculate that people carrying this mutation acquired it abroad and brought it back to Taiwan. The mutation then quickly died down and disappeared, perhaps because it brought no evolutionary advantage to the virus and because few people carried it to begin with.
Structural Visualization
After seeing which mutations occur among the SARS-CoV-2 spike sequences, we wanted to see exactly where these common mutations sit on the structure of the spike protein. For this we used SWISS-MODEL for protein structural modeling. SWISS-MODEL predicts protein structures through homology modeling: it uses an experimentally verified structure as a template and, from that template, predicts the structure of the new target protein.
Figure 5 : Protein Structural Visualization of SARS-CoV-2 Spike Protein. Site 614 is located at the red band in the yellow circle.
From the graph (figure 5) we can see that site 614 is near the border between the S1 and S2 subunits of the spike protein. Given its important structural position and the way the virus infects a human (the virus has to undergo protease trimming before actually entering the cell), we speculate that a mutation at this location may change infectivity.
Figure 6 : Protein Structural Visualization of SARS-CoV-2 Spike Protein. Site 791 is located at the red band in the yellow circle.
From the graph (figure 6) we can see that site 791 is nearer the bottom region of the spike protein. Because this mutation did not show up in later Taiwan cases and the mutation site is nearer the bottom region, we speculate that it does not have much impact on the infectivity of the virus.
References
- Elbe, S., and Buckland-Merrett, G. (2017) Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges, 1:33-46. doi:10.1002/gch2.1018
- Geneious 2020.1.2 (http://www.geneious.com, Kearse et al., 2012)
- Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F.T., de Beer, T.A.P., Rempfer, C., Bordoli, L., Lepore, R., Schwede, T. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46(W1), W296-W303 (2018).
- Korber, B., Fischer, W.M., Gnanakaran, S., Yoon, H., Theiler, J., Abfalterer, W., Hengartner, N., Giorgi, E.E., Bhattacharya, T., Foley, B., et al. Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus. Cell 182(4), 812-827.e19 (2020).