Difference between revisions of "Team:TU Darmstadt/Model/Enzyme Modeling"

Revision as of 12:50, 10 October 2020




EreB CM-Modeling

By carrying out structure prediction calculations with the Rosetta comparative modelling application RosettaCM, we aim to create a precise 3D model of the enzyme EreB. RosettaCM is based on homology modelling: the target protein is compared to known crystal structures of proteins with high sequence homology. The structure of unaligned sequence regions that show no or low homology to the given template structures is generated using the Rosetta ab initio protocol. Low homology is a consequence of mismatches or gaps in the alignments. The ab initio protocol uses a library of nine-mer and three-mer fragments of known protein structures to predict possible folds. [1]
The active site of EreB is highly similar to that of other esterases, such as the succinoglycan biosynthesis protein deposited in the Protein Data Bank (PDB). This improves the accuracy of homology modelling, since the similar domains can be used as templates for the structure. Nevertheless, the highest sequence identity found to another PDB entry is only 25.90% (succinoglycan biosynthesis protein 2QGM, chain A). Since the structure prediction relies on the statistical Monte Carlo method, multiple modelling runs are necessary to obtain a precise structure.
Figure 1: EreB structure predicted with RosettaCM
Proteins with high sequence homology were found by searching the PDB with the EreB sequence using the NCBI BLAST application (blastp). Three protein structures were identified as related to EreB in sequence, and the query sequence was threaded onto them. Rosetta's partial thread application maps the templates' structural data onto the aligned residues of the target sequence to prepare the structure prediction run.
The succinoglycan biosynthesis protein from Bacillus cereus ATCC 14579, expressed in E. coli (2QGM_A: X-ray diffraction, 1.70 Å; 2RAD_A: X-ray diffraction, 2.75 Å), and the Q81BN2_BACCR protein, also from Bacillus cereus ATCC 14579 (3B55_A: X-ray diffraction, 2.30 Å), were used as templates.
The resulting threaded models are aligned in a single global frame and are then used to create a full-chain model of the protein's 3D structure by Monte Carlo sampling. Monte Carlo sampling uses repeated random sampling of candidate solutions to approximate the solution of a problem that is in principle deterministic, such as protein folding.
Fragment files were generated using the old Robetta fragment server. It outputs one three-mer and one nine-mer fragment file, built from PDB entries aligned to short stretches of the query sequence. The fragment files are then used as input to enhance the model's precision.
The structural changes are then scored using the Rosetta low-resolution energy function. This approach relies on the assumption that the desired 3D structure of the protein lies at the minimum of its free energy function. Rosetta's energy function takes into account hydrophobic interactions between non-polar residues, van der Waals interactions between buried atoms, and the strong size dependence of forming a cavity in the solvent to accommodate the folded protein. In this process, atom-atom interactions are calculated as Lennard-Jones potentials, hydrophobic interactions and the electrostatic desolvation of polar residues inside the molecule are described by implicit solvation models, and hydrogen bonds are described by an explicit hydrogen bonding potential. [2] [3] The calculated energy is then assessed by the Metropolis acceptance criterion: if ΔE < 0, the newly proposed structure is accepted; otherwise it is accepted with probability p = exp(−ΔE/kT), where kT is the temperature factor of the simulation.
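The acceptance rule described above can be sketched in a few lines of Python. This is a minimal illustration of the generic Metropolis criterion, not Rosetta's own implementation; the function name `metropolis_accept`, the parameter `kT` and its default value are assumptions made for this example.

```python
import math
import random


def metropolis_accept(delta_e, kT=1.0, rng=random):
    """Decide whether a proposed structure is accepted.

    delta_e: energy of the proposed structure minus the current energy.
    kT: temperature factor (hypothetical default, chosen for illustration).
    rng: any object with a random() method returning a float in [0, 1).
    """
    if delta_e < 0:
        # Downhill moves are always accepted.
        return True
    # Uphill moves are accepted with probability p = exp(-delta_e / kT).
    return rng.random() < math.exp(-delta_e / kT)


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 10000
    accepted = sum(metropolis_accept(1.0, kT=1.0, rng=rng) for _ in range(trials))
    # The accepted fraction should be close to exp(-1).
    print(accepted / trials, math.exp(-1.0))
```

Accepting some uphill moves is what allows the sampling to escape local energy minima instead of stopping at the first one it reaches.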
Structures are built using the Rosetta “fold tree”.[2] For this purpose, backbone and side chain conformations are represented in torsion space (bonded interactions are mostly treated with ideal bond lengths and angles). Additionally, the position of each residue is represented in Cartesian space. Torsion angles are changed according to the fragment files or the provided templates, and the positions of homologous segments are substituted, combining an ab initio approach with a homology modelling approach. This combination of template-derived segments in Cartesian space with torsion angles and residue positions derived from PDB fragments should ideally converge to the correctly folded protein topology. [3]
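To illustrate how the torsion and Cartesian representations relate, the following sketch computes a torsion (dihedral) angle from the Cartesian coordinates of four consecutive atoms. It is a generic geometric calculation and makes no claim about how Rosetta stores or updates its fold tree; the function name `dihedral_deg` and the example coordinates are assumptions made for this example.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates: the outer bonds are perpendicular to each other.
    print(dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Applied to the backbone atoms C(i-1), N, CA, C and N, CA, C, N(i+1) of a residue, the same calculation yields the φ and ψ angles shown in a Ramachandran plot.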
To resolve clashes, distorted peptide bonds and poor backbone hydrogen bond geometry, which often arise from this CM approach, further improvements are necessary. In a second step, the structure is improved by Monte Carlo replacement of backbone segments with either segments taken from the PDB that span the region and can be roughly superimposed on the selected residues, or segments from the template structures that superimpose on the complete segment. Afterwards, the structure's energy is minimized using a smoothed version of Rosetta's low-resolution energy function. In a third step, side chains are added and the structure is refined using a physically realistic energy function.[3]
Since comparative modelling is a statistical approach to protein structure prediction based on the Monte Carlo method and on comparison with related structures, a large number of models has to be generated to ensure that the predicted structure is as close as possible to the actual protein structure. In this context, 20000 structures were generated on the “Lichtenberg high performance computer” of TU Darmstadt. After the run had finished, the structures were sorted by their total score calculated by RosettaCM, and the best-scoring structure (S_17070.pdb) was used for further calculations. The complete run was analysed using the Biotite Python package and its superimposition and RMSD functions. [4]
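The selection step can be sketched as a simple sort. The example below uses the five total scores reported on this page for the best-scoring structures and assumes the usual Rosetta convention that a lower total score is better; the dictionary layout and the function name `rank_by_score` are assumptions made for this example, and reading the scores from the actual RosettaCM output is not shown.

```python
def rank_by_score(scores, n=None):
    """Return (model, score) pairs sorted from best (lowest) to worst score."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked if n is None else ranked[:n]


# Total scores of the five best models as listed on this page.
scores = {
    "S_17103": 4069.085,
    "S_00324": 4041.481,
    "S_12758": 4070.893,
    "S_17070": 4031.077,
    "S_09664": 4033.266,
}

if __name__ == "__main__":
    best_model, best_score = rank_by_score(scores, n=1)[0]
    print(best_model, best_score)
```

With 20000 models, the same function would be called on the full set of scores and the first entry taken as the input for the refinement step.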
Figure 2: The five best-scoring structures aligned in PyMOL (S_17070: 4031.077, S_09664: 4033.266, S_00324: 4041.481, S_17103: 4069.085, S_12758: 4070.893)
Figure 3: RMSD vs. total score plot of the RosettaCM run. 20000 structures were generated and analysed using the Biotite Python package.
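The RMSD is the root-mean-square deviation of atomic positions and represents the structural similarity of two molecules. After superimposition, it is calculated using the following formula:

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{y}_i \rVert^{2}}$$

Here N is the number of compared atoms, and x_i and y_i are the positions of atom i in the two structures.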
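As a minimal sketch, the RMSD of two coordinate sets can be computed as follows. The example assumes that both structures contain the same atoms in the same order and have already been superimposed; it is a plain Python illustration, not the Biotite implementation used for the analysis, and the function name `rmsd` and the example coordinates are assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for a, b in zip(coords_a, coords_b):
        # Squared Euclidean distance between corresponding atoms.
        squared_sum += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates: every atom is shifted by 1 Å along z.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(model, reference))
```

An RMSD of 0 means the two structures are identical at the compared atoms, and larger values indicate larger structural deviations.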
For structure refinement, an additional Relax run with an output of 100 structures was carried out to ensure realistic torsion angles. Relax is an all-atom structure refinement application that works in the structure's local conformational space. [5] Regions of high energy in the protein's structure are optimized under backbone and side chain restraints to minimize structural deviation. Optimization follows standard methods such as torsion-space side chain minimization, torsion-space backbone minimization, and re-sampling of side chain rotamers. [6]
To evaluate the obtained protein structure, Ramachandran plots were created with the Procheck web server; they can be compared with Ramachandran plots derived from a broad variety of protein crystal structures. To validate the model's accuracy, we checked whether the dihedral angles of the modelled secondary structure show the typical distribution.[7] The evaluation showed that 328 residues are located in the most favoured regions, 44 in the additional allowed regions and 2 in the disallowed regions. Glycine and proline residues were excluded because their dihedral angles do not follow the typical distribution. Thus 87.7% of the evaluated residues are located in the most favoured regions and 99.5% in allowed regions in total. Therefore, the structure is expected to be a good model of EreB's structure.
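The percentages follow directly from the residue counts given above, as this short calculation shows. It assumes that the three listed categories cover all evaluated residues, with glycine and proline excluded; the variable names are chosen for this example.

```python
# Residue counts from the Ramachandran evaluation reported above.
most_favoured = 328
additional_allowed = 44
disallowed = 2

# Assumption: these three categories cover all evaluated residues.
total = most_favoured + additional_allowed + disallowed

pct_most_favoured = 100 * most_favoured / total
pct_allowed = 100 * (most_favoured + additional_allowed) / total

if __name__ == "__main__":
    print(total, round(pct_most_favoured, 1), round(pct_allowed, 1))
```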
To summarize, we used the RosettaCM application to predict the structure of our target enzyme EreB by generating 20000 structures on the Lichtenberg server cluster. We then relaxed the best-scoring structure and validated its dihedral angles using a Ramachandran plot. For further investigations of enzyme stability, MD simulations will be performed on the obtained structure.