Team:IISER Berhampur/Model

IISER-BPR IGEM

Introduction


FRaPPe, the syn-bio device which we are designing, is an engineered bridge between our knowledge on biological systems and its applications in drug development. But because of the complexity of the biological systems, there exist a lot of knowledge gaps and hence engineering new things just out of our biological knowledge is not that easy. Modelling helps in getting out of this difficulty and provides a lot of insights to make the engineering process smoother, much before the actual execution. Hence, we took an approach of- ‘Imagine each step >>> Ask Questions: Why? How? >>> Modelling to answer the questions >>> Integrate the answers into the project >>> Proceed ahead ’ for designing our overall project.








The flow chart above gives an outline of all such questions and how we tried to answer them. Details regarding each aspect of modelling are described in the different sections below.













Mutational Analysis




Before going into this section, let’s first have a look at the various types of DENV Proteins.




Fig 1: Schematic diagram of the DENV genome showing structural and non-structural polyproteins that are encoded by the DENV genome (Chew et al., 2017)


The RNA genome of DENV encodes 3 structural proteins (C, prM and E) and 7 non-structural proteins (NS1, NS2A, NS2B, NS3, NS4A, NS4B, NS5). The structural proteins are the molecules that the virus particle comes equipped with, such as the proteins comprising their capsids or envelope. The non-structural proteins are encoded by the viral genome and expressed in the infected host cells. These play important roles in several processes such as replication of the virus.


Now let’s come back to the topic of Mutational analysis.


To answer the question ‘Which proteins to select as the target of the peptide inhibitors: Non-Structural or Structural proteins?’, we studied the mutational landscape of the DENV genome, to infer which proteins are relatively stable in terms of their mutational entropy so that the inhibitor is not quickly rendered ineffective by mutations in the genome.


To study the genome data of dengue virus, we used the data available in the Nextstrain (https://nextstrain.org/dengue) open-source project website. This website provides data for the genetic diversity of the DENV genome over several years. It compares the number of mutation events and Shannon Entropy values(which gives a measure of how frequently a gene mutates) of each codon that codes for an amino acid in the structural and non-structural proteins of the Dengue virus. It takes into account results from numerous studies from all over the world and provides a timeline for the evolution of the DENV genome and geographical distribution of the different genotypes.


To achieve our aim to understand the difference between the rate of mutations in NS Proteins and that in structural proteins, we took into account the number of mutation events that have occurred in each codon of all the genes of DENV. Out of these, we took those positions where the number of mutational events had been 10 or greater than 10. Some number of mutations is a common occurrence in most genes due to background mutational events. But if this number is significantly high in a gene, it shows that there is some kind of selection pressure. So this bar of 10 mutations was set as our threshold to isolate those positions where mutations were greater than normally occurring mutations, which may suggest a greater tendency to mutate. These positions would be more prone to undergo changes, thus rendering our inhibitor ineffective if targeted against these sites.


After this, the selected codon positions in each of the genes were plotted against ‘The Number of mutations’ and compared.


The results obtained were as follow:


  • For each of the four serotypes, the C and E structural proteins, along with the NS1 protein contained at least one codon that had undergone 10 or more than 10 mutations, implying that these proteins have evolved the most over the course of time. All the other NS Proteins did not have mutations greater than 10, in some serotype or the other.
  • There are comparatively less number of codon positions with high mutational events, in the Structural Proteins. This can be because Structural proteins are shorter in length and have less number of amino acids, as it is. Moreover, despite the shorter length, the number of mutations in C and E proteins for all 4 serotypes, is comparable to that of NS proteins that are longer. This proves that the rate of mutations is high in structural proteins.

Hence, it can be concluded that Non-Structural Proteins would be a better target for our Peptide Inhibitor since it was seen that Structural proteins are more prone to mutations. If chosen as a target for our peptide inhibitor, a more mutationally stable protein would ensure that the peptide inhibitor is viable for a longer time.









Graphs


1) DENV-1



2) DENV-2



3) DENV-3



4)- DENV-4






References




Hadfield et al., Nextstrain: real-time tracking of pathogen evolution, Bioinformatics (2018)Nextstrain: real-time tracking of pathogen evolution





Protein-Protein Interaction Studies




To validate the working of FRaPPe, we thought of selecting a particular DENV associated Protein-Protein Interaction, design peptide inhibitors against them and finally validate the efficiency of the designed peptide inhibitors in inhibiting the target PPI using our reporter system. This will also serve as an exemplar pipeline which can be followed while using our reporter system for building peptide inhibitors against various PPIs.


As we had decided to go ahead with non-structural proteins, hence we searched through all the available literature on DENV interactomes to find out various PPIs involving DENV NSPs (and the interacting partners being either DENV proteins or human proteins) and their specific details (e.g. interacting domains, amino acids involved in the interactions etc.). Based on this, we prepared diagrams representing the interactomes of each NSP.




Figure 1: Representative interactome of DENV NS5


Based on all the data we had collected, we wanted to select a particular PPI as our target PPI. For this selection, we appointed several criteria such as the amount of data available, whether the structures are available, how significant the role of the PPI is in DENV pathogenesis etc. Finally, by all these, we narrowed down to the interaction between Dengue Virus Non-structural Protein 5 (DENV NS5) and Human Signal Transducer and Activator of Transcription-2 (hSTAT2).


Just after the target PPI was selected, we again jumped into the literature to mine more data on DENV NS5-hSTAT2 interaction.


Here we will give a brief overview (mostly structural details) of what we obtained through the data-mining (the molecular mechanisms and pathway through which NS5-STAT2 interaction helps in DENV pathogenesis and the detailed biology relating it to our project will be discussed separately on Engineering Page.

  • hSTAT2 consists of 5 domains which are as follow (from N to C terminal): Amino-terminal domain, Coiled-coil domain (139-316), DNA-binding domain (317-458), -SH2 domain (568-686, 581-700) and Carboxy terminal domain.
  • In contrast, DENV NS5 consists of 2 domains N-terminal Methyl Transferase domain and a C-terminal RNA dependent RNA polymerase domain.
  • There are four serotypes of DENV (1-4) between which the structure, functions and mechanisms vary, but there is a lot of overlap too. The interaction between NS5-STAT2 has been mostly studied in DENV2 but has been also studied somewhat in DENV1. But this particular PPI is thought to be applicable for all serotypes because of the relatively high percentage of NS5 sequence identity when comparing DENV2 to the other 3 serotypes.
  • It is hypothesized that NS5 and STAT2 first bind with each other through specific regions present on each of them, then NS5 mediates STAT2 degradation and in this also specific regions on NS5 and STAT2 are responsible. But in this project, we are interested only in inhibiting the binding between NS5 and STAT2 (which will eventually inhibit the degradation of STAT2). Though several studies support this, evidence regarding direct interaction between NS5 and STAT2 is still lacking (some speculations suggest that NS5 and STAT2 may be actually binding to a third protein instead of binding to each other).
  • In spite of lack of any concrete evidence regarding NS5 and STAT2 direct interaction, people have mapped the ability of NS5 to bind STAT2 to a region within the hSTAT2 coiled-coil domain. Studies show that amino acids between 181-200, which falls in a loop region (in a helix-loop-helix motif), of STAT2, are indispensable for this interaction. Similarly, amino acids between residues 202 and 306 from the N-terminal region of the protein from the dengue 2 serotype have been shown to interact directly with STAT2.

The next job was to create a model of the NS5-STAT2 complex (as no models based on experimentally solved structures were available in the literature). For that, first, we went through the RCSB Protein Data Bank website (http://www.rcsb.org/) and searched for the available structures of DENV NS5 and hSTAT2, based on which we finally selected these two PDB structures for modelling:


  • PDB ID 5ZQK: Dengue Virus Non Structural Protein 5 (Dengue Virus 2) This was used to extract the DENV NS5 (monomeric) structure.
  • PDB ID 6WCZ: CryoEM Structure of full-length ZIKV NS5-hSTAT2 complex. This was used to extract STAT2 structure.
  • Then, using HawkDock Server (http://cadd.zju.edu.cn/hawkdock/), we docked (global docking) the extracted NS5 and STAT2 structure. Then we filtered the output models on the basis of how well they satisfy the experimental results mentioned earlier. The rank 7 model satisfied the experimental results very well with a score of -4037.07. The binding free energy was predicted to be -15.24 kcal/mol by MM/GBSA analysis. We named it as ‘model 1’.



Model 1 : Model of DENV NS5-hSTAT2 complex (Model 1) (Colour Codes: DENV NS5- Blue, hSTAT2- Green)


Model 1 : Model1 ( Colour Code DENV NS5- Yellow, hSTAT2- Pink)



But the problem with the available PDB structures for DENV NS5 and hSTAT2 was that none of them were for the complete proteins. Hence the docked structure was also made of these partial proteins. Hence, to overcome this problem we later thought of using homology modeling for building the complete structure. For that, first we obtained the sequence for full-length DENV NS5 (Accession: 5ZQK_A) and full-length hSTAT2 (Accession: 6WCZ_A) from NCBI Protein (https://www.ncbi.nlm.nih.gov/protein/). Then using SWISS-MODEL (https://swissmodel.expasy.org/) we created structures of complete DENV NS5 (Template: 5zqk.1.A) and hSTAT2 (Template: 6ux2.1.A) proteins by homology-modeling. Now using these two structures, we prepared the model for FL DENV NS5- FL hSTAT2 complex (Model 2) using a procedure very similar to what was adopted for the complex model created earlier (Model 1).




Model 2 : Model of full length DENV NS5-hSTAT2 complex (Model 2) (Colour Codes: DENV NS5- Blue, hSTAT2-Green, White Sphere- N-terminal residues, Red Spheres- C-terminal residues)


Based on this complex structure (Model 2), after analysing the position of N and C terminals, it seemed that it is better to have protein tags such as fluorescent proteins or the CID (Chemically Induced Dimerization) elements only at N terminals of both the proteins so that the tags do not perturb the protein structure or their interaction. This insight was taken into consideration while designing the constructs to avoid spurious results as well as to engineer an effective and efficient reporter (Please visit the Engineering Success Section for more details).






References:


1. Ashour, Joseph, et al. "Mouse STAT2 restricts early dengue virus replication." Cell host & microbe 8.5 (2010): 410-421.


2. Ashour, Joseph, et al. "NS5 of dengue virus mediates STAT2 binding and degradation." Journal of virology 83.11 (2009): 5408-5418.


3. Aslam, B., et al. "Structural modeling and analysis of dengue-mediated inhibition of interferon signaling pathway." Genetics and molecular research: GMR 14.2 (2015): 4215-37.


4. Boxiao, W., et al. (2020) CryoEM structure of full-length ZIKV NS5-hSTAT2 complex doi: http://doi.org/10.2210/pdb6WCZ/pdb


5. Chen F, Liu H, Sun HY, Pan PC, Li YY, Li D, Hou TJ. Assessing the performance of the MM/PBSA and MM/GBSA methods. 6. Capability to predict protein-protein binding free energies and re-rank binding poses generated by protein-protein docking. Physical Chemistry Chemical Physics, 2016, 18(18):22129-22139.


6. Chew, Miaw-Fang, Keat-Seong Poh, and Chit-Laa Poh. "Peptides as therapeutic agents for dengue virus." International journal of medical sciences 14.13 (2017): 1342.


7. Choubey, Sanjay Kumar, et al. "Structural and functional insights of STAT2-NS5 interaction for the identification of NS5 antagonist–An approach for restoring interferon signaling." Computational Biology and Chemistry 88 (2020): 10733


8. El Sahili, Abbas, and Julien Lescar. "Dengue virus non-structural protein 5." Viruses 9.4 (2017): 91.


9. El Sahili, Abbas, et al. "NS5 from dengue virus serotype 2 can adopt a conformation analogous to that of its Zika virus and Japanese encephalitis virus homologues." Journal of virology 94.1 (2019).


10. Feng T, Chen F, Kang Y, Sun HY, Liu H, Li D, Zhu F, Hou TJ. HawkRank: a new scoring function for protein-protein docking based on weighted energy terms. Journal of Cheminformatics, 2017, 9(1):66.


11. Guex, N., Peitsch, M.C., Schwede, T. Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective. Electrophoresis 30, S162-S173 (2009).


12. H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne. (2000) The Protein Data Bank Nucleic Acids Research, 28: 235-242


13. H.M. Berman, K. Henrick, H. Nakamura (2003) Announcing the worldwide Protein Data Bank Nature Structural Biology 10 (12): 980


14.Hou TJ, Wang JM, Li YY, Wang W. Assessing the performance of the MM/PBSA and MM/GBSA methods: I. The accuracy of binding free energy calculations based on molecular dynamics simulations. Journal of Chemical Information & Modeling, 2011, 51(1):69-82.


15. Hou TJ, Qiao XB, Zhang W, Xu XJ. Empirical Aqueous Solvation Models Based on Accessible Surface Areas with Implicit Electrostatics. Journal of Physical Chemistry B, 2002, 106(43):11295-11304.


17. Mazzon, Michela, et al. "Dengue virus NS5 inhibits interferon-α signaling by blocking signal transducer and activator of transcription 2 phosphorylation." The Journal of infectious diseases 200.8 (2009): 1261-1270.


18. Morrison, Juliet, and Adolfo García-Sastre. "STAT2 signaling and dengue virus infection." Jak-stat 3.1 (2014): e27715.


19. Protein [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] - [cited 2020 Oct 02]. Available from:https://www.ncbi.nlm.nih.gov/protein/


20. ASun HY, Li YY, Tian S, Xu L, Hou, TJ. Assessing the performance of MM/PBSA and MM/GBSA methods. 4. Accuracies of MM/PBSA and MM/GBSA methodologies evaluated by various simulation protocols using PDBbind data set. Physical Chemistry Chemical Physics, 2014, 16(31):16719-16729.


21. Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F.T., de Beer, T.A.P., Rempfer, C., Bordoli, L., Lepore, R., Schwede, T. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46, W296-W303 (2018).


22. Weng GQ, Wang EC, Wang Z, Liu H, Li D, Zhu F, Hou TJ. HawkDock: a web server to predict and analyze the structures of protein-protein complexes based on computational docking and MM/GBSA. Nucleic Acids Research, 2019, 47(W1): W322-W330.


23. Zacharias M. Protein-protein docking with a reduced protein model accounting for side-chain flexibility. Protein Science, 2003, 12(6):1271-1282.


24. The PyMOL Molecular Graphics System, Version 2.3.2 Schrödinger, LLC.





Peptide Inhibitor (iPEP) Designing



So once we were done with the target PPI selection and building its complex model, the next job was to design peptide inhibitors against it.


The first question which came to us was ‘Our iPEPs should bind to which protein among NS5 and STAT2 so as to interrupt their interaction?’ Since the pathogenic protein NS5 used to bind and inhibit the function of the helpful STAT2 protein, we decided to target NS5 using iPEPs i.e. our designed iPEPs will be binding the residues on NS5 that are involved in its interaction with STAT2. In this way interaction of NS5 with STAT2 will be inhibited, hence allowing STAT2 to carry out its normal function, and in addition, since some residues of NS5 are now masked, it may inhibit other pathogenic functions of NS5 apart from its effect on STAT2. Hence, NS5 will be considered as a receptor and STAT2 as its ligand and our target is to block the receptor.



Then, with the Model 1 of NS5-STAT2 complex, we derived potential peptide sequences for iPEPs using Rosetta Petiderive Protocol (https://rosie.rosettacommons.org/peptiderive/). We obtained 3 sequences as output as follows (Lower the score, better is the peptide in binding to the receptor):


1: KEQKILQE (Score: -10.131 REU)

2: QTKEQKILQE (Score: -12.029 REU)

3: QTKEQKILQETL (Score: -12.115 REU)

These 3 sequences then entered into an in-silico iPEP designing Pipeline we adopted for this project. While peptide-based drugs are slowly growing in use, they have 2 major disadvantages associated with them.


  • In-vivo instability - Degradation by various natural proteolytic enzymes present inside the body decreases the half-life of the peptide drugs as well as decreases its biochemical concentration.
  • Membrane impermeability- Membrane impermeability is one of the major problems associated with peptide drugs and even though peptides can be natural ligands for GPCR but very few of the peptides are naturally taken up by cells.

After deducing the peptide sequences which can act as inhibitors, a number of chemical modifications were done in-silico to increase the half-life of the peptides and to stabilise it more.


Some of the common modifications which were done and their purposes are mentioned below. The list is prepared according to various literature studies which focused on the same aspect. The modifications which were necessary and useful for the peptide inhibitors were chosen.


  • Termini protection- Methionine was added at the end terminus as without them the peptides are usually more prone to degradation. So the addition of a chain of Methionine ensured that the termini are well protected from any degradation.
  • Amino acid substitution- Certain Amino Acid residues such as arginine were replaced with lysine to modify the rigidity and the conformation of the peptide and make it more suitable for our own purposes.
  • Identification of hydrophobic Amino acids which were replaced by polar or charged residues for the modulation of the biological pI while maintaining the bioactivity.
  • The chemical changes to polar or charged residues were also helpful to stop the formation of Hydrophobic patches, which usually makes them poorly soluble.
  • For the delivery of the peptide drugs, Cell-Penetrating Peptides can be designed which in conjugation with the peptide can be used for drug delivery.

The 3 initial peptide sequences and the necessary modifications carried out on them to generate the final peptide inhibitor library are mentioned below-



1.1:  KEQKILQE (Without Modifications)

1.2:  KEQKILQEMMMMMMMMMMM

1.3:  KEQKISQEMMMMMMMMMMM

1.4:  MKEQKILQEMMMMMMMMMMMM

1.5:  MKEQKISQEMMMMMMMMMMMM

1.6:  MKEQKISQEMMMMMMMMMMM

2.1:  QTKEQKILQE (Without Modifications)

3.1:  QTKEQKILQETL (Without Modifications)

3.2:  QTKEQKILQETLMMMMMMMMM

3.3:  MMMMMQTKEQKILQETLMMM

3.4:  QTKEQKILQETLMMMMMMMMMM

3.5:  MMMMMMQTKEQKILQETLMMMM

The various parameters such as Instability Index, GRAVY ( The GRAVY value for a protein or peptide is calculated by adding the hydropathy values of each amino acid residue and dividing it by the total number of residues in the sequence or length of the sequence, Increasing positive scores indicates increasing hydrophobicity), Theoretical pI and Half-Life of the designed peptide inhibitors were checked with the ProtParam tool of ExPasy (https://web.expasy.org/protparam/). The values obtained for each peptide are mentioned in table 1.


After the in silico chemical modifications, the docking of the peptide inhibitors to NS5 was performed by Dr Malay Kumar Rana using HADDOCK. The active site residues (constraints for docking) were given based on literature and Model 1. 10 residues ranging from 262 to 272 of NS5 were used to define the active site and hence the docking of the peptide inhibitors was done to that part. The docking scores obtained from HADDOCK for each peptide are mentioned in table


Lower HADDOCK score indicates better binding of the peptide inhibitor to NS5


Table 1: Results from various in-silico characterizations performed on each peptide inhibitor


CODE Peptide Inhibitor Instability Index GRAVY Theoretical pI Half Life(in mammalian reticulocytes, in vitro) HADDOCK SCORE (Kcal/mol)
1.1  KEQKILQE 111.11 (U) -1.688 6.14 1.3 hrs -76.6+/-5.0
1.2  KEQKILQEMMMMMMMMMMM 37.42(S) 0.389 6.14 1.3hrs -74.6+/-5.0
1.3  KEQKISQEMMMMMMMMMMM 20.26(S) 0.147 6.14 1.3 hrs -74.4+/-5.3
1.4 MKEQKILQEMMMMMMMMMMMM 33.43(S) 0.533 5.90 30 hrs -66.0+/-4.2
1.5 MKEQKISQEMMMMMMMMMMMM 17.91(S) 0.314 5.90 30 hrs -66.2+/-2.8
1.6 MKEQKISQEMMMMMMMMMMM 19.74(S) 0.235 5.90 30 hrs -60.7+/-9.3
2.1 QTKEQKILQE 90.89(U) -1.770 6.14 0.8 hrs -54.6+/-8.0
3.1 QTKEQKILQETL 77.41(U) -1.217 6.14 0.8 hrs -59.1+/-2.4
3.2  QTKEQKILQETLMMMMMMMMM 37.55(S) 0.119 6.14 0.8 hrs -71.5+/-7.3
3.3 MMMMMQTKEQKILQETLMMM 38.04(S) 0.030 5.90 30 hrs -79.8+/-14.4
3.4  QTKEQKILQETLMMMMMMMMMM 34.99 (S) 0.200 6.14 0.8 hrs -60.7+/-2.3
3.5 MMMMMMQTKEQKILQETLMMMM 32.87(S 0.200 5.90 30 hrs -77.8+/-3.7

(Note- U stands for Unstable and S stands for stable, values below 40 are usually considered as stable. Lower the value of Instability Index higher is the stability )


Because of the constraints placed upon us by the COVID19 pandemic, wetlab work couldn’t be carried out. Hence, the final ranking of peptide inhibitors couldn’t be done, which is supposed to be done in-vitro using FRaPPe.






Figure: Collage showing structure of each peptide inhibitor docked to DENV NS5


References


1- Bruzzoni-Giovanelli, Heriberto, et al. "Interfering peptides targeting protein–protein interactions: the next generation of drugs?." Drug Discovery Today 23.2 (2018): 272-285.


2- Lee, Andy Chi-Lung, et al. "A comprehensive review on current advances in peptide drug development and design." International journal of molecular sciences 20.10 (2019): 2383.


3- London N, Raveh B, Movshovitz-Attias D, Schueler-Furman O. (2010). Can self-inhibitory peptides be derived from the interfaces of globular protein-protein interactions? Proteins, 78:3140-49.


4- Lyskov S, Chou FC, Conchúir SÓ, Der BS, Drew K, Kuroda D, Xu J, Weitzner BD, Renfrew PD, Sripakdeevong P, Borgo B, Havranek JJ, Kuhlman B, Kortemme T, Bonneau R, Gray JJ, Das R., "Serverification of Molecular Modeling Applications: The Rosetta Online Server That Includes Everyone (ROSIE)". PLoS One. 2013 May 22;8(5):e63906. doi: 10.1371/journal.pone.0063906. Print 2013.


5- Yuval Sedan, Orly Marcu,Sergey Lyskov, Ora Schueler-Furman Peptiderive server: derive peptide inhibitors from protein–protein interactions Nucleic Acids Research 2016; doi: 10.1093/nar/gkw385


6- Gasteiger E., Gattiker A., Hoogland C., Ivanyi I., Appel R.D., Bairoch A. ,ExPASy: the proteomics server for in-depth protein knowledge and analysis Nucleic Acids Res. 31:3784-3788(2003).



Epidemiology


Literature


Before we started to work on our project we needed to gather existing information on the spread of the dengue. Therefore we began a search to find relevant data on the disease


.

Some key factors which we looked into were:


  • Seroprevalence, which is a good indicator of endemicity, was found to be approximately 57% in India based on pooled estimates of seven studies.
  • In the 69 studies which adopted WHO 1997 dengue case classification, the pooled Case Fatality Ratio was 1.1%. The pooled CFR for 8 studies that used the WHO 2009 case definition was 1.6%.
  • Proportion of Severe Dengue Cases: The overall percentage of severe Dengue among laboratory-confirmed studies was 28.9%.
  • Distribution of Serotypes: The published studies from India indicated the circulation of all the four-dengue serotypes, with DEN-2 and DEN-3 being the more commonly reported serotypes.
  • We then analyzed the distribution of Dengue in India with respect to several correlated factors: Age of infected person, Geographical region, and seasonality of Dengue.

DENGUE - A CLOSER LOOK AT HOME


We looked further into government records to see how the number of dengue cases changed over time . As we can see in the plot below the number of cases shows an increase over the years [2009-2017].




Figure 1: Cases and deaths due to Dengue in India


But the yearly data obtained does not reveal the complete picture. The following plot time vs Reported Dengue cases take over a period of 13 months gives us a better idea of the yearly trend . As we can see the number of cases starts to increase rapidly around August until it peaks around October. This increase in cases can be associated with the monsoon season in India which leads to more breeding grounds for the mosquitoes.


How does this help us ?


Knowing that the distribution of dengue cases is not uniform over the span of 1 year can help the government design policies which are more efficient . Robust sanitising and cleaning operations around August can also significantly decrease the disease burden.




Figure 2: Monthly distribution of Dengue cases in India


India is a vast country. To see whether dengue cases were spread uniformly over the country we dived into government records again. We expected that higher population density would result in a higher number of cases but we found some exceptions. Kerala even though having a high population density had comparatively low cases. This is because of the strict measures taken up by the Kerala government to prevent the spread of the disease which includes indoor space spraying and maintaining environmental cleanliness.




Figure 3: Statewise distribution of Dengue cases in India


Well a much deeper understanding of the dynamics of the disease was required and hence we tried to model the disease burden of dengue.


Before we move on to the inferences we made let us look at the model in detail. The basic SIR model we all know and love ( the pandemic has made us all look at it one time or the other ) does not work for a disease like dengue. This is because dengue cannot spread via direct contact, there is a third party involved (i.e. the mosquitoes). Therefore the basic model has to be modified to include the mosquitoes, which makes things a bit more difficult as then we have to consider mosquito birth rate, death rate the data for which is not readily available. But even with these problems, we can still make interesting inferences.



Equations




Where-

  • 1/ah =21600 days which define the average human survival rate
  • 1/av =4 days which define the average mosquito survival rate
  • Sh = susceptible human Ih= Infected human, Rh=recovered human, Sv=suseptible vector and Ih=infected vector
  • N= total human population in a given area
  • n = total vector population in a given area
  • α = contact rate vector to human or can be said as (transmission probability * Bites per susceptible mosquito per day)
  • β = contact rate human to vector or (transmission probability of human to vector * Bites per infectious mosquito per day)
  • γ = recovery constant
  • πh = birth rate of human
  • πv = birth rate of mosquito
  • ϕh = death rate of human
  • So =initial susceptible, Io=initial infected , Ro
  • Svo= initial susceptible mosquitoes and Ivo= initial infected mosquitoes


THE INITIAL VALUES

The equations require some variables to be considered. In this section, we introduce the values that we considered for running the model.


  • Ih = 500
  • Rh0 = 200
  • Ivo = 300000
  • N=960000000 (based on 1996 values [4] )
  • n=10000000000 [ Considering on average ]
  • πh = 10
  • πv = 25
  • ϕh = 0.01
  • ah = 0.000046
  • ah = 0.25

Results


The calculated plot for the number of infected people perfectly fits with the available data for a gamma value of 0.75 .

Gamma(γ) = 0.75(optimum/Present condition)




Figure 4: This Graph represents the infected curve with the current scenario basis



Figure 5: The above SIR plot depicts all the curves with the current scenario (including susceptible , infected and recovered human population along with the susceptible and infected vector population)

Gamma(γ) = 0.2(max infection)




Figure 6: This Graph represents the infected curve with the extreme conditions where no measures are taken neither from govt. or people and hence lowest gamma value



Figure 7:The above SIR plot depicts all the curves with the this lowest gamma value.

Gamma(γ) = 1.0 (no infection)




Figure 8: This Graph represents the infected curve with strict measure from all the people an d govt. side with proper vaccines so that the disease doesn't spread: hence the infected curve is flat.



Figure 9: The above SIR plot depicts all the curves with the above conditions

This SIR plot depicts several features:


  • The infected peak will rise no less than 4 lakhs infected per year by the end of 2024-2025, which is a mammoth number.
  • The infected vector population is increasing in the upcoming years very significantly which is another cause for alarm.

The gamma value as we described earlier is the recovery constant which is the fraction of the infected people that recover from the disease. The following plot shows the maximum number of infected people reached by our defined population [check the initial variables section] versus the recovery constant. From here we can see that an increase in the gamma value brought on by new drugs and better treatment can bring down the maximum number of infected people and hence deaths significantly.




Figure 10: The Graph depicts the relation between Gamma and Infected people


Further Comments

With many overlapping symptoms, Dengue and COVID-19 coinfections in the current scenario could potentially worsen the severity of both the diseases. With a paucity of medical expertise in dealing with these, the scale of the challenge is unpredictable, but can burden already overloaded health systems.



References


1. Ganeshkumar P, Murhekar MV, Poornima V, Saravanakumar V, Sukumaran K, et al. (2018) Dengue infection in India: A systematic review and meta-analysis. PLOS Neglected Tropical Diseases 12(7): e0006618. doi:10.1371/journal.pntd.0006618


2. A Boutayeb, EH Twizell A model of dengue fever M Derouich, Biomed Eng Online 2003 Feb 19;2:4. ,doi: 10.1186/1475-925X-2-4


3. Alemu Geleta Wedajo, Boka Kumsa Bole, Purnachandra Rao Koya, “Analysis of SIR Mathematical Model for Malaria disease with the inclusion of Infected Immigrants”, IOSR Journal of Mathematics (IOSR-JM) e-ISSN: 2278-5728, p-ISSN: 2319-765X. Volume 14, Issue 5 Ver. I (Sep - Oct 2018), PP 10-21


4. PopulationPyramid.net, India-website severe


5. I. Ming,Pratchaya Tang,Ruisheng Chanprasopchai,Wang, SIR Model for Dengue Disease with Effect of Dengue Vaccination, Computational and mathematical methods in Medicine, Volume 2018 |Article ID 9861572 | https://doi.org/10.1155/2018/9861572


6. Jain, A.,Gupta, N., Srivastava, S., & Chaturvedi, U. C. (2012). Dengue in India. The Indian journal of medical research,2012 Sep;136(3):373-90.





Molecular Dynamics Simulations



Introduction

How do these two proteins, DENV NS5 and hSTAT2 interact with each other? This was a question that intrigued us. Till date there is not much structural data available on these proteins. The interaction of human NS5 and hSTAT2 is still elusive. We performed molecular dynamics simulation on the protein complex to understand the interaction. Molecular dynamic (MD) simulation is a powerful technique to get detailed information at atomic resolution. Therefore, we moved to MD simulation to study the behaviour of DENV NS5 and hSTAT2 in-silico.


Starting Sructure

For the MD simulation study, we took Model 1 (described in the Protein-protein Interaction Studies Section) as the complex structure for NS5 and STAT2 obtained from docking.


Simulation Details

Molecular Dynamics (MD) is a computational method in which we analyse the physical movements of atoms and molecules under the influence of various forces in the system. The physical movements are obtained by solving Newton’s equations of motion numerically for each atom at each time point iteratively. We used GROMACS molecular modelling package (www.gromacs.org) .


We used the recent united atom force field GROMOS96 54a7 to speed up the simulation. This version of the force field has improved ϕ, ψ torsional angle terms, a modified N-H, C=O repulsive terms and other improvements. All the hydrogen atoms were treated implicitly by united atom approximation except for the polar and aromatic ring hydrogen atoms, which were treated explicitly. The simulations were performed with an explicit solvent, using a simple point charge (SPC) water model. The complex was placed initially in the center of the cubic periodic box and the box size was chosen such that the distance between any protein atom and the closest box edge was at least 1 nm. The structure of the complex was energy minimized three times using the steepest descent algorithm. Following this: water molecules and ions were added to solvate the proteins and to attain charge neutrality. Then, the potential energy of the entire system was minimized using the steepest descent algorithm with a tolerance of 1000 kJ mol-1 nm-1. Subsequent to energy minimization, position restrained MD simulation was carried out under NVT condition for 200 ps and followed by NPT condition for 200 ps at each of the predefined temperatures. MD run was initiated, which was used for integrating the equations of motion. Simulations were run on a high-performance computer.


Several plots were obtained and analysed using tools available in GROMACS. Visual Molecular Dynamics (VMD) (http://www.ks.uiuc.edu/Research/vmd/) and PyMol software were used to visualize and analyse the files generated from the simulation.





<

Results and Discussion


(i) Radius of Gyration indicates Diffusive Motion of two Proteins



Figure 1(a): Radius of Gyration of the NS5-STAT2 system as a function of time.





Figure 1(b): Radius of gyration of NS5 as a function of time .





Figure 1(c): Radius of gyration of STAT2 as a function of time.



The radius of gyration (Rg) is a measure of how expanded or compact is the conformation. Since in this case, we are measuring Rg between two proteins that are apart initially, Rg would tell us about how close or far away these two proteins are (Figure 1a). We can say so because independent Rg values of the two proteins are not changing much (Figure 1b and 1c) while the system Rg is fluctuating between higher value and lower values. From the plot of the radius of gyration of the system (NS5 and STAT2) we can see that the radius of gyration is first increasing to 6.4 nm, then decreases to a lower value of 5.5 nm at about 5 ns. Subsequently, Rg reaches to 6 nm around 10 ns. The radius of gyration again decreases substantially to 5.2 nm at about 13 ns and 15.1 ns before returning to the initial state. Ignoring the minor fluctuations and looking through the overall trend it seems that, initially the two proteins were far from each other (in comparison to the state in which they were at the beginning of the simulation), then they are coming towards each other (reducing the radius of gyration) repeatedly. This observation suggests that the two proteins are undergoing diffusive motion.


(ii) Root Means Square Deviation indicates that Proteins are Undergoing Relaxation



Figure 2(a): RMSD plot for NS5 as a function of time.



Figure 2(b): RMSD plot for STAT2 as a function of time.


We generated the Root mean squared deviation (RMSD) plots for the individual proteins. The RMSD plot (for individual protein) serves as a method to measure the conformational changes associated with the protein over a given timescale of simulation in comparison to the initial state in the simulation. RMSD for NS5 plot shows an increase in RMSD values 0.6 nm at about 7 ns, after which there is not much change in RMSD values and is stable around that value. RMSD for STAT2 gradually increases and stabilizes at about 0.4 nm of RMSD at about 12 ns (Figure 2b). Such an increase in RMSD values is often seen in proteins when the simulation starts from the crystal structure. The protein undergoes relaxation in the added solvent environment which soon stabilizes and attains equilibration.


(iii) Root Mean Square Fluctuation (RMSF) indicates regions with High and Low Flexibility



Figure 3(a): RMSF plot for NS5



Figure 3(b): RMSF plot for STAT2


RMSF is a measure of how much each residue in a protein fluctuates over the time course of the simulation. From the RMSF plots of NS5 and STAT2, we can observe that each protein is undergoing residue fluctuation across the polypeptide chain. There are distinct regions on the protein which show greater than 0.3 nm of fluctuation while other residues show lower than 0.3 nm. The regions where RMSF values are exceeding 0.3 nm mostly correspond to the loop regions of the protein while lesser values of RMSF indicate structural rigidity. The residues such as LYS183, HIS187 and GLN188 of STAT2 protein which are known to interact with NS5 also show high flexibility. It could be because such flexible regions often participate in protein - protein interactions.


(iv)Hydrogen Bonds indicate the Structural Stability during the entire course of Simulation



Figure 4(a): Plot for Number of hydrogen bonds with time for NS5



Figure 4(b): Plot for number of hydrogen bonds with time for STAT2


In these plots we have shown how the number of hydrogen bonds in each protein is varying with respect to time (in the simulation). From the plot for NS5, it can be seen that initially for a very short time period, the number of hydrogen bonds increases steeply, but afterwards remains at a relatively constant value 710(approx) before decreasing, whereas from the plot for STAT2 it is observed that the number of hydrogen bonds continues to increase, stays at a somewhat constant value of 275. From this it can be concluded that, STAT2 and NS5 undergo equilibration during the initial period of the simulation and afterwards both the proteins stabilize and sample a characteristic number of hydrogen bonds corresponding to each of the proteins which is 710(approx) for NS5 and 275(approx) for STAT2.


(v) Distance between the two Proteins fluctuates between higher and lower values



Figure 5: Plot showing distance between the centre of mass of proteins with respect to time.


Similar to Rg plot the distance between the two proteins show increase and decrease repeatedly, which suggest the proteins are undergoing diffusive motion in the solvent.


(vi) Concluding remark and simulation video

We started the complex protein simulation of STAT2 and NS5. During the energy minimization, the proteins got separated. So when we started the production run the protein molecules were apart. We observed that independently these two proteins are stable by itself during the entire course of the simulation. We also observed that the loop region(as mentioned in protein-protein interaction studies, is involved in the interaction with NS5) is showing affinity towards NS5. Although there are a few regions on the protein which show higher flexibility when compared to loop regions, while the regions of lower flexibility indicate rigid structure, which in turn indicates that the proteins are stable. Both the proteins come closer and are seen to be parted intermittently during the entire course of simulation which is akin to diffusive motion observed by protein in the solvent medium. MD simulation has not converged so far. We need to perform it for a longer time to reach a meaningful conclusion. Diffusive motion toward each other is an indication that both the proteins will interact if simulated for a longer time.





(vii) Snapshots showing extent of protein interaction



Figure 6(a): Snapshot of the mds taken during the initial time of the simulation. Highlighted are the residues present in STAT2 which are found to interact with NS5.



Figure 6(b): Snapshot of the mds taken during the end of the simulation


Figure 6: Comparing both the figures 6-a and 6-b it can be clearly seen that the loop region of STAT2 has an affinity towards NS5


The protein structures that we obtained from PDB had some missing elements/links in it (coordinates for few amino acids were missing). It happens mostly for the proteins with several loop regions, as methods such as X-ray crystallography or cryo-electron microscopy are not able to resolve flexible loop and turn regions.This is also relevant in the NS5-STAT2 interaction that we decided to investigate as our proof of concept investigation. A specific loop region of hSTAT2 (181-200) which is considered to be involved in the interaction has such missing elements, which might influence direct correlations with the MDS part.

Future prospect: One way to solve this problem is to use the homology modelling structure we have generated (which don't have any such missing links) or to use some online servers to fill the missing elements. If this work is further followed up (fully or partially), then this change should be applied to refine the project.




References



1. Dalke, A., Humphrey, W., and Schulten, K., "VMD - Visual Molecular Dynamics", J. Molec. Graphics, 1996, vol. 14, pp. 33-38.


2. A. Sijbers, D. van der Spoel, E.J. Dijkstra, H. Bekker, H.J.C. Berendsen,H. Keegstra, R. van Drunen, S. Achterop et al., “Gromacs: A parallel computer for molecular dynamics simulations”; pp. 252–256 in Physics computing 92. Edited by R.A. de Groot and J. Nadrchal. World Scientific, Singapore, 1993.



©iGEM IISER Berhampur