Difference between revisions of "Team:TU Darmstadt/Model/Enzyme Modeling"

Line 11: Line 11:
  
 
             <div class="sidenav">
 
             <div class="sidenav">
                 <a href="#Why">Why Rosetta?</a>
+
                 <a target = "_blank" href="#Why">Why Rosetta?</a>
                 <a href="#Docking">Laccase Docking</a>
+
                 <a target = "_blank" href="#Docking">Laccase Docking</a>
                 <a href="#EreB_CM">EreB CM-Modeling</a>
+
                 <a target = "_blank" href="#EreB_CM">EreB CM-Modeling</a>
                 <a href="#EreB_MD">EreB MD Simulation</a>
+
                 <a target = "_blank" href="#EreB_MD">EreB MD Simulation</a>
                 <a href="#fus">TasA fusion proteins</a>
+
                 <a target = "_blank" href="#fus">TasA fusion proteins</a>
  
 
             </div>
 
             </div>
Line 40: Line 40:
 
         solution of a broad range of applications by considering many different energy terms relevant for protein folding such as solvation, electrostatic effects and hydrogen bonding.  
 
         solution of a broad range of applications by considering many different energy terms relevant for protein folding such as solvation, electrostatic effects and hydrogen bonding.  
 
         It can be used to perform simulations starting from <b>designing macromolecular structures, interactions and RNA or fibril structures up to the de novo design of a fully functioning enzyme</b>!  
 
         It can be used to perform simulations starting from <b>designing macromolecular structures, interactions and RNA or fibril structures up to the de novo design of a fully functioning enzyme</b>!  
         It was originally developed by the David Baker Lab. <sup id="cite_ref-40"><a href="#cite_note-40">[1]</a></sup> Rosetta is <b>free for academic users</b> and a very powerful tool for the many problems that arise when working on a synthetic biology project.  
+
         It was originally developed by the David Baker Lab. <sup id="cite_ref-40"><a target = "_blank" href="#cite_note-40">[1]</a></sup> Rosetta is <b>free for academic users</b> and a very powerful tool for the many problems that arise when working on a synthetic biology project.  
 
         Although a lot of information is available in the Rosetta documentation, getting started with Rosetta to improve a project is very hard, especially for people without experience with console-based applications.  
 
         Although a lot of information is available in the Rosetta documentation, getting started with Rosetta to improve a project is very hard, especially for people without experience with console-based applications.  
 
         Nevertheless, Rosetta is an <b>amazing and multifaceted tool for synthetic biology</b>. To help with these initial difficulties we provide a <b>guide for Rosetta</b> on our Wiki. <br><br>  
 
         Nevertheless, Rosetta is an <b>amazing and multifaceted tool for synthetic biology</b>. To help with these initial difficulties we provide a <b>guide for Rosetta</b> on our Wiki. <br><br>  
Line 47: Line 47:
 
         binding affinity towards different pharmaceuticals or pollutants and many more.<br>  
 
         binding affinity towards different pharmaceuticals or pollutants and many more.<br>  
 
          
 
          
         <b> RosettaCM</b> was used to generate <b>structure predictions</b> of the azithromycin transforming enzyme <b>EreB</b> and <b> fusion proteins</b> consisting of matrix protein <b> TasA</b> and our enzymes <b>CueO, CotA and EreB</b> <sup id="cite_ref-41"><a href="#cite_note-41">[2]</a></sup>.<br>  
+
         <b> RosettaCM</b> was used to generate <b>structure predictions</b> of the azithromycin transforming enzyme <b>EreB</b> and <b> fusion proteins</b> consisting of matrix protein <b> TasA</b> and our enzymes <b>CueO, CotA and EreB</b> <sup id="cite_ref-41"><a target = "_blank" href="#cite_note-41">[2]</a></sup>.<br>  
 
          
 
          
         <b> Rosetta Ligand Docking</b> was used to <b>study the binding affinity</b> of various substrates towards the enzymes’ active sites<sup id="cite_ref-42"><a href="#cite_note-42">[3]</a></sup>.<br>  
+
         <b> Rosetta Ligand Docking</b> was used to <b>study the binding affinity</b> of various substrates towards the enzymes’ active sites<sup id="cite_ref-42"><a target = "_blank" href="#cite_note-42">[3]</a></sup>.<br>  
 
          
 
          
         <b> Protein Design</b> was used to<b> enhance the binding affinity</b> of our target molecules to the corresponding enzymes’ active site by introduction of mutations <sup id="cite_ref-43"><a href="#cite_note-43">[4]</a></sup>.<br><br>  
+
         <b> Protein Design</b> was used to<b> enhance the binding affinity</b> of our target molecules to the corresponding enzymes’ active site by introduction of mutations <sup id="cite_ref-43"><a target = "_blank" href="#cite_note-43">[4]</a></sup>.<br><br>  
 
          
 
          
         To find out more about the different aspects of modelling with Rosetta, you can read the articles covering these applications. If you want to use Rosetta yourself, you can use our <a href="https://2020.igem.org/Team:TU_Darmstadt/Model/Rosetta_Guide">Rosetta Guide</a> to get started with the program.  
+
         To find out more about the different aspects of modelling with Rosetta, you can read the articles covering these applications. If you want to use Rosetta yourself, you can use our <a target = "_blank" href="https://2020.igem.org/Team:TU_Darmstadt/Model/Rosetta_Guide">Rosetta Guide</a> to get started with the program.  
 
</div>
 
</div>
  
Line 81: Line 81:
 
             Structures of <b>unaligned sequences</b> showing no or low homology to the given template structures are generated using the <b>Rosetta <i>ab initio</i> protocol</b>.  
 
             Structures of <b>unaligned sequences</b> showing no or low homology to the given template structures are generated using the <b>Rosetta <i>ab initio</i> protocol</b>.  
 
             Low homology is a consequence of mismatches or gaps in the structures’ alignments.  
 
             Low homology is a consequence of mismatches or gaps in the structures’ alignments.  
             This protocol uses a library of nine-mer and three-mer fragments of known protein structures to predict possible folding. <sup id="cite_ref-1"><a href="#cite_note-1">[5]</a></sup>
+
             This protocol uses a library of nine-mer and three-mer fragments of known protein structures to predict possible folding. <sup id="cite_ref-1"><a target = "_blank" href="#cite_note-1">[5]</a></sup>
 
         <br>
 
         <br>
 
             EreB possesses <b>high similarity</b> in its active site with other esterases such as the Protein Data Bank (PDB) entry <b>succinoglycan biosynthesis protein</b>.  
 
             EreB possesses <b>high similarity</b> in its active site with other esterases such as the Protein Data Bank (PDB) entry <b>succinoglycan biosynthesis protein</b>.  
Line 96: Line 96:
 
         </div>
 
         </div>
 
         <div class="containertext">
 
         <div class="containertext">
             Proteins with high sequence homologies are found by blasting the EreB sequence against the PDB using the <a href="https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins">NCBI Blast application (blastp)</a>.  
+
             Proteins with high sequence homologies are found by blasting the EreB sequence against the PDB using the <a target = "_blank" href="https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins">NCBI Blast application (blastp)</a>.  
 
           <b>Three protein structures were identified as related in sequence to EreB</b> and were threaded onto the query sequence.  
 
           <b>Three protein structures were identified as related in sequence to EreB</b> and were threaded onto the query sequence.  
 
             Rosetta’s partial thread application assigns the templates’ structural data on the aligned sequences of the target structure to prepare the structure prediction run.
 
             Rosetta’s partial thread application assigns the templates’ structural data on the aligned sequences of the target structure to prepare the structure prediction run.
 
         </div>
 
         </div>
 
         <div class="containertext">
 
         <div class="containertext">
             Succinoglycan biosynthesis protein (<a href="https://www.rcsb.org/structure/2qgm">2QGM_A:</a> X-ray diffraction, 1.70 Å + <a href="https://www.rcsb.org/structure/2rad">2RAD_A:</a> X-ray diffraction, 2.75 Å) from <i>Bacillus cereus</i> ATCC 14579 expressed in <i>E. coli</i>  
+
             Succinoglycan biosynthesis protein (<a target = "_blank" href="https://www.rcsb.org/structure/2qgm">2QGM_A:</a> X-ray diffraction, 1.70 Å + <a target = "_blank" href="https://www.rcsb.org/structure/2rad">2RAD_A:</a> X-ray diffraction, 2.75 Å) from <i>Bacillus cereus</i> ATCC 14579 expressed in <i>E. coli</i>  
             and Q81BN2_BACCR protein also from <i>Bacillus cereus</i> ATCC 14579 (<a href="https://www.rcsb.org/structure/3b55">3B55_A:</a> X-ray diffraction, 2.30 Å) were used as templates.  
+
             and Q81BN2_BACCR protein also from <i>Bacillus cereus</i> ATCC 14579 (<a target = "_blank" href="https://www.rcsb.org/structure/3b55">3B55_A:</a> X-ray diffraction, 2.30 Å) were used as templates.  
 
         </div>
 
         </div>
 
         <div class="containertext">
 
         <div class="containertext">
Line 109: Line 109:
 
         </div>
 
         </div>
 
         <div class="containertext">
 
         <div class="containertext">
             Fragment files were generated using the old <a href="https://www.rcsb.org/structure/3b55"><i>Robetta</i> fragment server</a>.  
+
             Fragment files were generated using the old <a target = "_blank" href="https://www.rcsb.org/structure/3b55"><i>Robetta</i> fragment server</a>.  
 
             It outputs a <b>three- and nine-mer file</b> by aligning fragments to PDB entries. The fragment files are then used as input structures to enhance the model’s precision.
 
             It outputs a <b>three- and nine-mer file</b> by aligning fragments to PDB entries. The fragment files are then used as input structures to enhance the model’s precision.
 
         </div>
 
         </div>
Line 116: Line 116:
 
             Rosetta’s energy function takes <b>hydrophobic interactions</b> between non-polar residues, <b>van der Waals interactions</b> between buried atoms and the strong size dependence of <b>forming a cavity in the solvent</b> for accommodation of the folded protein into account.  
 
             Rosetta’s energy function takes <b>hydrophobic interactions</b> between non-polar residues, <b>van der Waals interactions</b> between buried atoms and the strong size dependence of <b>forming a cavity in the solvent</b> for accommodation of the folded protein into account.  
 
             In this process atom-atom-interactions are calculated as <b>Lennard-Jones potentials, hydrophobic interactions and the electrostatic desolvation</b> of polar residues  
 
             In this process atom-atom-interactions are calculated as <b>Lennard-Jones potentials, hydrophobic interactions and the electrostatic desolvation</b> of polar residues  
             inside the molecule as <b>implicit solvation models and an explicit hydrogen bonding potential</b> for hydrogen bonds. <sup id="cite_ref-2"><a href="#cite_note-2">[6]</a></sup> <sup id="cite_ref-41"><a href="#cite_note-41">[2]</a></sup>  
+
             inside the molecule as <b>implicit solvation models and an explicit hydrogen bonding potential</b> for hydrogen bonds. <sup id="cite_ref-41"><a target = "_blank" href="#cite_note-41">[2, </a></sup> <sup id="cite_ref-2"><a target = "_blank" href="#cite_note-2">6]</a></sup>  
 
             The calculated energy is then assessed by the <b>Metropolis acceptance criterion:</b> If ΔE &lt; 0 the structure is accepted; otherwise the newly proposed structure is accepted with probability p = exp(−ΔE/kT).
 
             The calculated energy is then assessed by the <b>Metropolis acceptance criterion:</b> If ΔE &lt; 0 the structure is accepted; otherwise the newly proposed structure is accepted with probability p = exp(−ΔE/kT).
 
         </div>
 
         </div>
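The acceptance rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Rosetta's implementation: it assumes the standard Metropolis form p = exp(−ΔE/kT), and the function name, the default `kt=1.0` and the injectable random source `rng` are hypothetical choices made for the example.

```python
import math
import random


def metropolis_accept(delta_e, kt=1.0, rng=random.random):
    """Decide whether a proposed move with energy change delta_e is accepted.

    kt is a hypothetical temperature factor; rng returns a float in [0, 1).
    """
    # Downhill moves (lower energy) are always accepted.
    if delta_e < 0:
        return True
    # Other moves are accepted with probability exp(-delta_e / kt).
    return rng() < math.exp(-delta_e / kt)
```

Passing a fixed `rng` (for example `lambda: 0.5`) makes the decision reproducible, which is convenient when checking the rule by hand.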
Line 125: Line 125:
 
         </div>
 
         </div>
 
         <div class="containertext">
 
         <div class="containertext">
             Structures are built using the <b>Rosetta “fold tree”</b>.<sup id="cite_ref-2"><a href="#cite_note-2">[6]</a></sup> For this purpose, backbone and side chain conformations are represented in <b>torsion space</b> (bonded interactions are mostly treated with ideal bond lengths and angles).  
+
             Structures are built using the <b>Rosetta “fold tree”</b>.<sup id="cite_ref-2"><a target = "_blank" href="#cite_note-2">[6]</a></sup> For this purpose, backbone and side chain conformations are represented in <b>torsion space</b> (bonded interactions are mostly treated with ideal bond lengths and angles).  
 
             Additionally, the position of each residue is represented in Cartesian space. Torsion angles are changed according to the fragment files or the provided templates and positions of homology fragments are substituted, combining an <i>ab initio</i> approach with a homology modelling approach.  
 
             Additionally, the position of each residue is represented in Cartesian space. Torsion angles are changed according to the fragment files or the provided templates and positions of homology fragments are substituted, combining an <i>ab initio</i> approach with a homology modelling approach.  
             This combination of <b>template derived fragments in Cartesian space</b> with <b>torsion angles and residue positions derived from fragments from the PDB database</b> should ideally <b>converge into the correctly folded protein topology</b>. <sup id="cite_ref-41"><a href="#cite_note-41">[2]</a></sup>   
+
             This combination of <b>template derived fragments in Cartesian space</b> with <b>torsion angles and residue positions derived from fragments from the PDB database</b> should ideally <b>converge into the correctly folded protein topology</b>. <sup id="cite_ref-41"><a target = "_blank" href="#cite_note-41">[2]</a></sup>   
 
         </div>
 
         </div>
 
         <div class="containertext">
 
         <div class="containertext">
 
             To resolve clashes, distorted peptide bonds and poor backbone hydrogen bond geometry that often arise from this CM approach, further improvements are necessary:  
 
             To resolve clashes, distorted peptide bonds and poor backbone hydrogen bond geometry that often arise from this CM approach, further improvements are necessary:  
 
             In a second step the structure is <b>improved by replacing backbone segments using a Monte Carlo method</b> with either <b>segments taken from the PDB that span the region</b> and can roughly be superimposed on the selected residues or segments from the <b>template structures that superimpose the complete segment</b>.  
 
             In a second step the structure is <b>improved by replacing backbone segments using a Monte Carlo method</b> with either <b>segments taken from the PDB that span the region</b> and can roughly be superimposed on the selected residues or segments from the <b>template structures that superimpose the complete segment</b>.  
             Afterwards the structure’s energy is minimized using a smoothed version of Rosetta’s low-resolution energy function. In a third step side chains are added and structure refinement is carried out using a physically realistic energy function.<sup id="cite_ref-41"><a href="#cite_note-41">[2]</a></sup>   
+
             Afterwards the structure’s energy is minimized using a smoothed version of Rosetta’s low-resolution energy function. In a third step side chains are added and structure refinement is carried out using a physically realistic energy function.<sup id="cite_ref-41"><a target = "_blank" href="#cite_note-41">[2]</a></sup>   
 
         </div>
 
         </div>
 
         <div class="TFcontainer">
 
         <div class="TFcontainer">
Line 138: Line 138:
 
             Since comparative modelling is a statistical approach to protein folding based on the Monte Carlo method and comparison to related structures, a <b>high number of structures has to be generated</b> to ensure the predicted structure is as close as possible to the actual protein structure.  
 
             Since comparative modelling is a statistical approach to protein folding based on the Monte Carlo method and comparison to related structures, a <b>high number of structures has to be generated</b> to ensure the predicted structure is as close as possible to the actual protein structure.  
 
             In this context 20,000 structures were generated using the “Lichtenberg high performance computer” of the TU Darmstadt. After the run finished, the structures were sorted by their total score calculated by RosettaCM and the best structure (S_17070.pdb) was used for further calculations.  
 
             In this context 20,000 structures were generated using the “Lichtenberg high performance computer” of the TU Darmstadt. After the run finished, the structures were sorted by their total score calculated by RosettaCM and the best structure (S_17070.pdb) was used for further calculations.  
             The total run was analysed using the Biotite Python package and its implemented superimpose and <b>RMSD feature</b>. <sup id="cite_ref-4"><a href="#cite_note-4">[7]</a></sup>   
+
             The total run was analysed using the Biotite Python package and its implemented superimpose and <b>RMSD feature</b>. <sup id="cite_ref-4"><a target = "_blank" href="#cite_note-4">[7]</a></sup>   
 
             </div>
 
             </div>
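The two analysis steps mentioned above, ranking models by total score and comparing them by RMSD, can be illustrated with a small pure-Python sketch. It does not use Biotite and does not reproduce the original analysis script: the function names, model names and scores are hypothetical, and the RMSD function assumes the two coordinate sets have already been superimposed. The only Rosetta-specific fact used is that a lower total score indicates a better model.

```python
import math


def best_model(total_scores):
    """Return the name of the model with the lowest (best) total score."""
    return min(total_scores, key=total_scores.get)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) tuples.

    Assumes both structures have the same atoms in the same order and
    have already been superimposed.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical example data, not values from the actual run.
scores = {"model_a": -310.5, "model_b": -325.1, "model_c": -298.0}
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
candidate = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
```

In a real workflow the coordinates would come from the generated PDB files and the superposition step would be done first, for example with the superimpose feature the text refers to.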
 
             <div class="containerimg" id=Chapter2>
 
             <div class="containerimg" id=Chapter2>
Line 158: Line 158:
 
         </div>
 
         </div>
 
         <div class="containertext">
 
         <div class="containertext">
             For structure refinement an <b>additional Relax</b> run was carried out with an output of 100 structures to <b>ensure realistic torsion angles</b>. Relax is an <b>all-atom structure refinement application</b> working in the structure’s local conformational space. <sup id="cite_ref-5"><a href="#cite_note-5">[8]</a></sup>     
+
             For structure refinement an <b>additional Relax</b> run was carried out with an output of 100 structures to <b>ensure realistic torsion angles</b>. Relax is an <b>all-atom structure refinement application</b> working in the structure’s local conformational space. <sup id="cite_ref-5"><a target = "_blank" href="#cite_note-5">[8]</a></sup>     
             Regions of high energy in the protein’s structure are optimized considering backbone and sidechain restraints to minimize structural deviation. Optimization follows common methods such as torsion-space sidechain minimization, torsion-space backbone minimization, and re-sampling of sidechain rotamers. <sup id="cite_ref-6"><a href="#cite_note-6">[9]</a></sup>   
+
             Regions of high energy in the protein’s structure are optimized considering backbone and sidechain restraints to minimize structural deviation. Optimization follows common methods such as torsion-space sidechain minimization, torsion-space backbone minimization, and re-sampling of sidechain rotamers. <sup id="cite_ref-6"><a target = "_blank" href="#cite_note-6">[9]</a></sup>   
 
         </div>
 
         </div>
 
         <div class="containertext">
 
         <div class="containertext">
 
             To evaluate the obtained protein structure <b>Ramachandran plots were created</b> and can be compared to Ramachandran plots generated using a broad variety of protein’s crystal structures using the Procheck webserver.  
 
             To evaluate the obtained protein structure <b>Ramachandran plots were created</b> and can be compared to Ramachandran plots generated using a broad variety of protein’s crystal structures using the Procheck webserver.  
             We checked whether the dihedral angles of the modelled secondary structure show the <b>typical distribution to validate the model's accuracy</b>.<sup id="cite_ref-7"><a href="#cite_note-7">[10]</a></sup>    The evaluation showed that 328 residues are located in the most favoured regions, 44 in the additional allowed regions and 2 residues in the disallowed regions.  
+
             We checked whether the dihedral angles of the modelled secondary structure show the <b>typical distribution to validate the model's accuracy</b>.<sup id="cite_ref-7"><a target = "_blank" href="#cite_note-7">[10]</a></sup>    The evaluation showed that 328 residues are located in the most favoured regions, 44 in the additional allowed regions and 2 residues in the disallowed regions.  
 
             Glycine and proline residues were excluded since they show no predictable dihedral angle distribution. <b>87.7% of the structure’s dihedral angles are located in the most favourable regions</b> and <b>99.5% in total in allowed regions</b>. Therefore, the structure is expected to be a good model of EreB’s crystal structure.  
 
             Glycine and proline residues were excluded since they show no predictable dihedral angle distribution. <b>87.7% of the structure’s dihedral angles are located in the most favourable regions</b> and <b>99.5% in total in allowed regions</b>. Therefore, the structure is expected to be a good model of EreB’s crystal structure.  
 
         </div>
 
         </div>
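The percentages quoted above follow directly from the residue counts. The short check below recomputes them, assuming that the three reported categories together make up all evaluated residues (glycine and proline excluded); the variable names are chosen for illustration only.

```python
# Residue counts reported for the Ramachandran evaluation.
most_favoured = 328
additionally_allowed = 44
disallowed = 2

# Assumption: these three categories cover every evaluated residue.
total = most_favoured + additionally_allowed + disallowed

pct_favoured = 100 * most_favoured / total
pct_allowed = 100 * (most_favoured + additionally_allowed) / total

print(f"most favoured: {pct_favoured:.1f}%")
print(f"allowed in total: {pct_allowed:.1f}%")
```

With 374 evaluated residues this reproduces the two figures given in the text.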
Line 237: Line 237:
 
         <b>Periodic boundary conditions</b> allow this transfer: The system is assumed to consist of multiple small systems that act identically, so when a molecule moves out of the simulated box it enters again on the other side,  
 
         <b>Periodic boundary conditions</b> allow this transfer: The system is assumed to consist of multiple small systems that act identically, so when a molecule moves out of the simulated box it enters again on the other side,  
 
         keeping the <b>particle number constant</b> and acting like a subsystem for simulation of a big system. By <b>proving stability of our predicted (fusion)proteins we can also forecast their functionality</b>:  
 
         keeping the <b>particle number constant</b> and acting like a subsystem for simulation of a big system. By <b>proving stability of our predicted (fusion)proteins we can also forecast their functionality</b>:  
         If the <b>residues catalysing the ester cleavage of EreB</b> form their already described <b>catalytic centre</b> and the active site is <b>accessible for azithromycin</b> the catalytic activity can most likely be assumed.<sup id="cite_ref-8"><a href="#cite_note-8">[11]</a></sup>   
+
         If the <b>residues catalysing the ester cleavage of EreB</b> form their already described <b>catalytic centre</b> and the active site is <b>accessible for azithromycin</b> the catalytic activity can most likely be assumed.<sup id="cite_ref-8"><a target = "_blank" href="#cite_note-8">[11]</a></sup>   
         Also, for the laccases catalytic activity can be assumed, if the copper sites are correctly folded to coordinate copper ions for oxidation catalysis and the substrate binding site equals the one of the obtained crystal structures.<sup id="cite_ref-9"><a href="#cite_note-9">[12]</a></sup>   
+
         Also, for the laccases catalytic activity can be assumed, if the copper sites are correctly folded to coordinate copper ions for oxidation catalysis and the substrate binding site equals the one of the obtained crystal structures.<sup id="cite_ref-9"><a target = "_blank" href="#cite_note-9">[12]</a></sup>   
 
     </div>  
 
     </div>  
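The re-entry behaviour of periodic boundary conditions can be shown with a minimal sketch. It assumes a rectangular box with its origin at (0, 0, 0); the function name and the example coordinates and box edge lengths (in nm) are illustrative and not taken from the simulation setup.

```python
def wrap_position(position, box):
    """Map a position back into a rectangular periodic box.

    position and box are (x, y, z) tuples; box holds the edge lengths.
    A particle leaving through one face re-enters through the opposite one.
    """
    # Python's % returns a value in [0, length) for a positive length,
    # which is exactly the wrapped coordinate.
    return tuple(x % length for x, length in zip(position, box))


# Illustrative values in nm: the particle left the box in x and y.
wrapped = wrap_position((8.5, -0.5, 2.0), (8.0, 5.425, 5.921))
```

Because every particle that leaves is mapped back inside, the particle number in the box stays constant, as described above.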
  
Line 250: Line 250:
 
         MD simulations comprise methods <b>calculating the forces exerted on all atoms of a biomolecular system</b>, mostly a target molecule solvated in water.  
 
         MD simulations comprise methods <b>calculating the forces exerted on all atoms of a biomolecular system</b>, mostly a target molecule solvated in water.  
 
         To this end, Newton’s equations of motion are used to predict the position and velocity of every atom in the system in small timesteps (femtoseconds).  
 
         To this end, Newton’s equations of motion are used to predict the position and velocity of every atom in the system in small timesteps (femtoseconds).  
         The forces are calculated using a force field, for example GROMOS or AMBER. “<b>Force fields are sets of potential functions and parametrized interactions that can be used to study physical systems</b>.”  <sup id="cite_ref-10"><a href="#cite_note-10">[13]</a></sup>  
+
         The forces are calculated using a force field, for example GROMOS or AMBER. “<b>Force fields are sets of potential functions and parametrized interactions that can be used to study physical systems</b>.”  <sup id="cite_ref-10"><a target = "_blank" href="#cite_note-10">[13]</a></sup>  
 
         They are derived from the equations of motion and therefore introduce time-dependency of the system. The force field consists of three types of interactions:<br>
 
         They are derived from the equations of motion and therefore introduce time-dependency of the system. The force field consists of three types of interactions:<br>
 
         1. <b>Bonded interactions</b> between 2, 3 or 4 particles including harmonic, cubic and Morse potentials for 2-particle systems and harmonic interactions for 3-particle systems <br>
 
         1. <b>Bonded interactions</b> between 2, 3 or 4 particles including harmonic, cubic and Morse potentials for 2-particle systems and harmonic interactions for 3-particle systems <br>
 
         2. <b>Nonbonded interactions</b> between different molecules: repulsion and dispersion are described by, e.g., a Lennard-Jones potential, and electrostatics by a Coulomb term. <br>
 
         2. <b>Nonbonded interactions</b> between different molecules: repulsion and dispersion are described by, e.g., a Lennard-Jones potential, and electrostatics by a Coulomb term. <br>
         3. <b>Special interactions defined by the position restraint</b> of a given system, e.g. distance restraints obtained from Nuclear Overhauser Effect data from high resolution NMR.<sup id="cite_ref-11"><a href="#cite_note-11">[14]</a></sup> <br>
+
         3. <b>Special interactions defined by the position restraint</b> of a given system, e.g. distance restraints obtained from Nuclear Overhauser Effect data from high resolution NMR.<sup id="cite_ref-11"><a target = "_blank" href="#cite_note-11">[14]</a></sup> <br>
 
     </div>
 
     </div>
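The nonbonded terms from the list above can be written down as a short sketch. It uses the common 12-6 form of the Lennard-Jones potential and a plain Coulomb term in units of kJ/mol, nm and elementary charges; the function names are hypothetical, the Coulomb constant is an approximate value, and real force fields add cut-offs and further corrections that are left out here.

```python
# Approximate electric conversion factor 1/(4*pi*eps0) in kJ mol^-1 nm e^-2.
COULOMB_CONSTANT = 138.935


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy at distance r (nm).

    epsilon is the well depth (kJ/mol), sigma the zero-crossing distance (nm).
    """
    sr6 = (sigma / r) ** 6
    # The r^-12 term models repulsion, the r^-6 term dispersion attraction.
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q_i, q_j):
    """Coulomb energy (kJ/mol) of two point charges (in e) at distance r (nm)."""
    return COULOMB_CONSTANT * q_i * q_j / r


def nonbonded_energy(r, epsilon, sigma, q_i, q_j):
    """Sum of the Lennard-Jones and Coulomb contributions for one atom pair."""
    return lennard_jones(r, epsilon, sigma) + coulomb(r, q_i, q_j)
```

At `r = sigma` the Lennard-Jones term is zero, and at `r = 2 ** (1 / 6) * sigma` it reaches its minimum of `-epsilon`, which is a quick sanity check for any parameter set.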
 
     <div style="display: flex;justify-content: center;">
 
     <div style="display: flex;justify-content: center;">
 
         <figure>
 
         <figure>
 
             <img  src="https://static.igem.org/mediawiki/2020/7/74/T--TU_Darmstadt--Charmmff.png" alt="figure" width="800">
 
             <img  src="https://static.igem.org/mediawiki/2020/7/74/T--TU_Darmstadt--Charmmff.png" alt="figure" width="800">
             <caption id="Figure#"><br><b>Figure 4:</b> Force field equation of the CHARMM27 force field. K<sub>b</sub>, K<sub>θ</sub>, K<sub>UB</sub>, K<sub>χ</sub>, and K<sub>φ</sub> are the bond, valence angle, Urey–Bradley, dihedral angle, and improper dihedral angle force constants, respectively; b, θ, S, χ and δ are the bond length, bond angle, Urey–Bradley 1,3 distance, dihedral torsion angle and improper dihedral angle. <sup id="cite_ref-12"><a href="#cite_note-12">[15]</a></sup></caption>
+
             <caption id="Figure#"><br><b>Figure 4:</b> Force field equation of the CHARMM27 force field. K<sub>b</sub>, K<sub>θ</sub>, K<sub>UB</sub>, K<sub>χ</sub>, and K<sub>φ</sub> are the bond, valence angle, Urey–Bradley, dihedral angle, and improper dihedral angle force constants, respectively; b, θ, S, χ and δ are the bond length, bond angle, Urey–Bradley 1,3 distance, dihedral torsion angle and improper dihedral angle. <sup id="cite_ref-12"><a target = "_blank" href="#cite_note-12">[15]</a></sup></caption>
 
         </figure>
 
         </figure>
 
     </div>
 
     </div>
 
     <div class="containertext">
 
     <div class="containertext">
         Molecular dynamics simulation considers Newtonian forces on every atom in the simulated system and can therefore deliver, <b>depending on the force field’s accuracy, precise results</b>.<sup id="cite_ref-13"><a href="#cite_note-13">[16]</a></sup>   
+
         Molecular dynamics simulation considers Newtonian forces on every atom in the simulated system and can therefore deliver, <b>depending on the force field’s accuracy, precise results</b>.<sup id="cite_ref-13"><a target = "_blank" href="#cite_note-13">[16]</a></sup>   
 
         Nevertheless, it is still an <b>approximation and thus connected to imprecisions</b>: The forces calculated are <b>cut after a defined distance</b> to limit required computing power.  
 
         Nevertheless, it is still an <b>approximation and thus connected to imprecisions</b>: The forces calculated are <b>cut after a defined distance</b> to limit required computing power.  
 
         Also, the force field calculates forces at the atomic level <b>without taking quantum mechanics into account</b>. This simplification is based on the <b>Born-Oppenheimer approximation</b>, which splits an atom’s energy into nuclear and electronic energy since the electrons’ dynamics do not directly influence the atomic nucleus.
 
         Also, the force field calculates forces at the atomic level <b>without taking quantum mechanics into account</b>. This simplification is based on the <b>Born-Oppenheimer approximation</b>, which splits an atom’s energy into nuclear and electronic energy since the electrons’ dynamics do not directly influence the atomic nucleus.
         The <b>nuclei’s kinetic energy</b> can then be taken into account by the <b>classical approximation of Newton’s laws of motion</b>. The quantum mechanical parts of this system, the electrons’ wave functions, are not considered in classical MD simulation. <sup id="cite_ref-14"><a href="#cite_note-14">[17]</a></sup></caption>
+
         The <b>nuclei’s kinetic energy</b> can then be taken into account by the <b>classical approximation of Newton’s laws of motion</b>. The quantum mechanical parts of this system, the electrons’ wave functions, are not considered in classical MD simulation. <sup id="cite_ref-14"><a target = "_blank" href="#cite_note-14">[17]</a></sup></caption>
 
         Thus, MD simulations are very <b>accurate methods for most large systems</b> but still based on approximations. Therefore, the obtained results are reliable but <b>always have to be double checked by experiment in the laboratory</b>.
 
         Thus, MD simulations are very <b>accurate methods for most large systems</b> but still based on approximations. Therefore, the obtained results are reliable but <b>always have to be double checked by experiment in the laboratory</b>.
 
     </div>
 
     </div>
Line 280: Line 280:
 
     <div class="containertext">
 
     <div class="containertext">
 
         The system used is a <b>cuboid box filled with explicit water molecules (TIP3P model)</b> with the target enzyme centred in the box. Also, <b>ions and counter ions</b> are added to simulate realistic conditions and balance the enzyme’s charge in solution due to (de)protonation.  
 
         The system used is a <b>cuboid box filled with explicit water molecules (TIP3P model)</b> with the target enzyme centred in the box. Also, <b>ions and counter ions</b> are added to simulate realistic conditions and balance the enzyme’s charge in solution due to (de)protonation.  
         The system topology was created using the <b>CHARMM27 forcefield</b> with the TIP3P water model, specifying a 3-site rigid water molecule with charges and Lennard-Jones parameters assigned to each of the 3 atoms. <sup id="cite_ref-15"><a href="#cite_note-15">[18]</a></sup><sup id="cite_ref-16"><a href="#cite_note-16">[19]</a></sup>   
+
         The system topology was created using the <b>CHARMM27 forcefield</b> with the TIP3P water model, specifying a 3-site rigid water molecule with charges and Lennard-Jones parameters assigned to each of the 3 atoms. <sup id="cite_ref-15"><a target = "_blank" href="#cite_note-15">[18,</a></sup><sup id="cite_ref-16"><a target = "_blank" href="#cite_note-16"> 19]</a></sup>   
 
         Afterwards a cuboid box with <b>at least 1.2 nm distance</b> between the protein and the box borders is created and filled with water molecules and ions countering the protein's charge (system size: 8.000 × 5.425 × 5.921 nm).
 
         Afterwards a cuboid box with <b>at least 1.2 nm distance</b> between the protein and the box borders is created and filled with water molecules and ions countering the protein's charge (system size: 8.000 × 5.425 × 5.921 nm).
 
         The <b>system's energy is then minimized</b>. Afterwards <b>NVT</b> (constant number of particles, volume and temperature) and <b>NPT</b> (constant number of particles, pressure and temperature) equilibration is done using a position restraint file to preserve the enzyme's structure while the temperature and pressure of the system equilibrate.
 
         The <b>system's energy is then minimized</b>. Afterwards <b>NVT</b> (constant number of particles, volume and temperature) and <b>NPT</b> (constant number of particles, pressure and temperature) equilibration is done using a position restraint file to preserve the enzyme's structure while the temperature and pressure of the system equilibrate.
 
         Both NVT and NPT are carried out for <b>100 ps</b> to ensure stabilization of the parameters. After the equilibration the main simulation can be started. The simulation was carried out for <b>100 ns with a total of 50,000,000 steps</b> (2 fs each).
 
         Both NVT and NPT are carried out for <b>100 ps</b> to ensure stabilization of the parameters. After the equilibration the main simulation can be started. The simulation was carried out for <b>100 ns with a total of 50,000,000 steps</b> (2 fs each).
         <sup id="cite_ref-17"><a href="#cite_note-17">[20]</a></sup><sup id="cite_ref-18"><a href="#cite_note-18">[21]</a></sup><sup id="cite_ref-19"><a href="#cite_note-19">[22]</a></sup>     
+
         <sup id="cite_ref-17"><a target = "_blank" href="#cite_note-17">[20,</a></sup><sup id="cite_ref-18"><a target = "_blank" href="#cite_note-18"> 21,</a></sup><sup id="cite_ref-19"><a target = "_blank" href="#cite_note-19"> 22]</a></sup>     
 
     </div>
 
     </div>
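The relation between simulated time, time step and step count stated above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `n_steps` is hypothetical, and using the 2 fs time step for the equilibration runs as well is an assumption, since the text only states it for the production run.

```python
def n_steps(total_time_ps: int, dt_fs: int) -> int:
    """Number of integration steps needed to cover total_time_ps with a time step of dt_fs."""
    total_time_fs = total_time_ps * 1000  # 1 ps = 1000 fs
    return total_time_fs // dt_fs


# Production run: 100 ns = 100,000 ps at 2 fs per step.
production_steps = n_steps(100_000, 2)

# NVT or NPT equilibration: 100 ps, assuming the same 2 fs time step.
equilibration_steps = n_steps(100, 2)

print(production_steps, equilibration_steps)
```

The production value reproduces the 50,000,000 steps quoted in the text.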
  
Line 295: Line 295:
 
         <b>RMSD</b> was calculated relative to both the <b>relaxed and the crystal structure</b>, <b>RMSF</b> was calculated for the <b>C alpha backbone atoms</b> and the <b>radius of gyration</b> for the <b>whole protein</b>. The specific value of convergence depends on the size of the protein that is subject to MD simulation.
 
         <b>RMSD</b> was calculated relative to both the <b>relaxed and the crystal structure</b>, <b>RMSF</b> was calculated for the <b>C alpha backbone atoms</b> and the <b>radius of gyration</b> for the <b>whole protein</b>. The specific value of convergence depends on the size of the protein that is subject to MD simulation.
 
         The RMSD and radius of gyration plots were analysed after the 100 ns simulation and show a clear trend of convergence.
 
         The RMSD and radius of gyration plots were analysed after the 100 ns simulation and show a clear trend of convergence.
         The RMSF plot also shows only <b>small movement at the residues essential for the catalytic process of EreB</b> (E43, H46, R55, R74, H285 and H288).<sup id="cite_ref-8"><a href="#cite_note-8">[11]</a></sup> <br> <br>   
+
         The RMSF plot also shows only <b>small movement at the residues essential for the catalytic process of EreB</b> (E43, H46, R55, R74, H285 and H288).<sup id="cite_ref-8"><a target = "_blank" href="#cite_note-8">[11]</a></sup> <br> <br>   
 
     </div>
 
     </div>
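To make the three quantities above concrete, the following is a minimal pure-Python sketch of how RMSD, RMSF and radius of gyration are defined. It is not the GROMACS implementation used for the analysis: the function names are hypothetical, coordinates are plain `(x, y, z)` tuples, and the frames are assumed to be already superimposed on the reference (no least-squares fitting is done here).

```python
import math


def rmsd(coords, ref):
    """Root-mean-square deviation between two equally long coordinate lists (no fitting)."""
    sq = sum((a - b) ** 2 for p, q in zip(coords, ref) for a, b in zip(p, q))
    return math.sqrt(sq / len(coords))


def radius_of_gyration(coords, masses):
    """Mass-weighted root-mean-square distance of the atoms from the centre of mass."""
    total = sum(masses)
    com = [sum(m * p[k] for p, m in zip(coords, masses)) / total for k in range(3)]
    sq = sum(m * sum((p[k] - com[k]) ** 2 for k in range(3))
             for p, m in zip(coords, masses))
    return math.sqrt(sq / total)


def rmsf(trajectory):
    """Per-atom fluctuation around the time-averaged position over all frames."""
    n_frames = len(trajectory)
    result = []
    for i in range(len(trajectory[0])):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msf = sum(sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
                  for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result


# Toy two-atom "protein" with made-up coordinates in nm.
reference = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
displaced = [(0.0, 0.0, 0.0), (2.0, 0.0, 1.0)]
print(rmsd(displaced, reference))
print(radius_of_gyration(reference, [1.0, 1.0]))
print(rmsf([reference, displaced]))
```

In this picture a converging RMSD or radius of gyration means that these values stop drifting over the frames of the trajectory, and a small RMSF at a residue means its atoms stay close to their average position.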
 
     <div style="display: flex;justify-content: center;">
 
     <div style="display: flex;justify-content: center;">
Line 306: Line 306:
 
         For further analysis <b>principal component analysis (PCA)</b> was performed on the simulation trajectory. PCA allows us to separate <b>global collective movement from local, fast movement</b> to further visualize and study the dynamics of a protein.
 
         For further analysis <b>principal component analysis (PCA)</b> was performed on the simulation trajectory. PCA allows us to separate <b>global collective movement from local, fast movement</b> to further visualize and study the dynamics of a protein.
 
         The GROMACS covar tool is used to calculate a <b>covariance matrix of the protein's atomic fluctuations</b> that can be diagonalized to create a set of eigenvalues and eigenvectors describing the protein's modes of fluctuation.
 
         The GROMACS covar tool is used to calculate a <b>covariance matrix of the protein's atomic fluctuations</b> that can be diagonalized to create a set of eigenvalues and eigenvectors describing the protein's modes of fluctuation.
         The covariance matrix of a protein describes how the fluctuation of each atom depends on the movements of the others. The eigenvectors with the largest eigenvalues represent the largest-amplitude correlated motions and are called <b>principal components or essential modes</b>.<sup id="cite_ref-20"><a href="#cite_note-20">[23]</a></sup>  
+
         The covariance matrix of a protein describes how the fluctuation of each atom depends on the movements of the others. The eigenvectors with the largest eigenvalues represent the largest-amplitude correlated motions and are called <b>principal components or essential modes</b>.<sup id="cite_ref-20"><a target = "_blank" href="#cite_note-20">[23]</a></sup>  
         The GROMACS anaeig tool can be used to visualize these principal components by <b>projecting the protein's trajectory onto the given eigenvectors</b> (here 1-4). <sup id="cite_ref-21"><a href="#cite_note-21">[24]</a></sup>  
+
         The GROMACS anaeig tool can be used to visualize these principal components by <b>projecting the protein's trajectory onto the given eigenvectors</b> (here 1-4). <sup id="cite_ref-21"><a target = "_blank" href="#cite_note-21">[24]</a></sup>  
 
         As shown in the animation the molecule shows strong internal movement, but the <b>catalytically important residues and the azithromycin binding pocket stay structurally preserved</b>, suggesting activity of the enzyme.  <br>
 
         As shown in the animation the molecule shows strong internal movement, but the <b>catalytically important residues and the azithromycin binding pocket stay structurally preserved</b>, suggesting activity of the enzyme.  <br>
 
         In conclusion we used an MD simulation run of 100 ns to <b>validate the structure determined by homology modelling</b> in a physical, time-dependent forcefield. The simulation was done in a box with at least 1.2 nm distance between the centred protein and the box borders. The system was minimized and equilibrated by NVT and NPT simulations for 100 ps each. We analysed the system's temperature and pressure afterwards, which showed small fluctuations about the expected values, so we were able to start the MD production run for 100 ns. Analysis of the production run showed converging RMSD and radius of gyration values as well as small RMSF values at the active site residues. Consequently, we validated the obtained structure as a <b>possible crystal structure of erythromycin esterase type II EreB</b>.  <br> <br>
 
         In conclusion we used an MD simulation run of 100 ns to <b>validate the structure determined by homology modelling</b> in a physical, time-dependent forcefield. The simulation was done in a box with at least 1.2 nm distance between the centred protein and the box borders. The system was minimized and equilibrated by NVT and NPT simulations for 100 ps each. We analysed the system's temperature and pressure afterwards, which showed small fluctuations about the expected values, so we were able to start the MD production run for 100 ns. Analysis of the production run showed converging RMSD and radius of gyration values as well as small RMSF values at the active site residues. Consequently, we validated the obtained structure as a <b>possible crystal structure of erythromycin esterase type II EreB</b>.  <br> <br>
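The idea behind the PCA step described above can be sketched in pure Python: build the covariance matrix of the coordinate fluctuations, extract the mode with the largest eigenvalue, and project the trajectory onto it. This is an illustration only, not what the GROMACS covar and anaeig tools do internally. The function names and the toy data are hypothetical, power iteration is swapped in for a full diagonalization, and the frames are assumed to be already fitted to a common reference.

```python
import math


def covariance_matrix(frames):
    """Covariance of the fluctuations; each frame is a flattened coordinate vector."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in frames]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    return cov, mean


def dominant_mode(cov, iterations=500):
    """Power iteration: eigenvalue and unit eigenvector of the largest-amplitude mode."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v


def project(frames, mean, mode):
    """Projection of every frame onto one mode (one value per frame)."""
    return [sum((f[j] - mean[j]) * mode[j] for j in range(len(mode))) for f in frames]


# Toy trajectory: one particle in 2D moving mostly along x (made-up numbers).
frames = [[-1.0, 0.0], [1.0, 0.0], [-2.0, 0.1], [2.0, -0.1]]
cov, mean = covariance_matrix(frames)
eigenvalue, mode = dominant_mode(cov)
print(eigenvalue, mode)
print(project(frames, mean, mode))
```

In the toy data almost all of the variance lies along x, so the dominant mode points along x. For a protein, the analogous mode is the collective motion that is visualized as the first principal component.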
Line 338: Line 338:
 
         biofilm</b> by fusing the enzymes to the matrix protein <b>TasA</b> (PDB entry 5OF2). Therefore, we are following the method described in Programmable and printable  
 
         biofilm</b> by fusing the enzymes to the matrix protein <b>TasA</b> (PDB entry 5OF2). Therefore, we are following the method described in Programmable and printable  
 
         <i>Bacillus subtilis</i> biofilms as engineered living materials by Huang et al. (2018), who fused it, by way of example, to various proteins or protein domains  
 
         <i>Bacillus subtilis</i> biofilms as engineered living materials by Huang et al. (2018), who fused it, by way of example, to various proteins or protein domains  
         like mCherry or MHETase to introduce new functions to the biofilm <sup id="cite_ref-429"><a href="#cite_note-429">[25]</a></sup>.  
+
         like mCherry or MHETase to introduce new functions to the biofilm <sup id="cite_ref-429"><a target = "_blank" href="#cite_note-429">[25]</a></sup>.  
         The exact methods for fusion protein construction are documented in our corresponding wiki text in the <a href = https://2020.igem.org/Team:TU_Darmstadt/Project/Biofilm#DisplayingEnzymes>biofilm</a> category.  
+
         The exact methods for fusion protein construction are documented in our corresponding wiki text in the <a target = "_blank" href = https://2020.igem.org/Team:TU_Darmstadt/Project/Biofilm#DisplayingEnzymes>biofilm</a> category.  
 
     </div>
 
     </div>
  
Line 345: Line 345:
 
     <div class = "containertext" id = "Chapter1">
 
     <div class = "containertext" id = "Chapter1">
 
         We also planned a fusion protein by <b>integration of TasA into a surface loop of CotA</b> as described in Engineering Bifunctional Laccase-Xylanase  
 
         We also planned a fusion protein by <b>integration of TasA into a surface loop of CotA</b> as described in Engineering Bifunctional Laccase-Xylanase  
         Chimeras for Improved Catalytic Performance by Ribeiro et al (2011)<sup id="cite_ref-430"><a href="#cite_note-430">[26]</a></sup>.  
+
         Chimeras for Improved Catalytic Performance by Ribeiro et al (2011)<sup id="cite_ref-430"><a target = "_blank" href="#cite_note-430">[26]</a></sup>.  
 
         Therefore, we had to move the signal peptide to the N-terminus of CotA. CM modeling indicates that the signal peptide is still accessible for cell export and both protein domains are correctly folded.  
 
         Therefore, we had to move the signal peptide to the N-terminus of CotA. CM modeling indicates that the signal peptide is still accessible for cell export and both protein domains are correctly folded.  
 
         Nevertheless, we discarded this approach, because <b>end-to-end linkage to TasA also promises functioning enzymes and is far more modular</b>.  
 
         Nevertheless, we discarded this approach, because <b>end-to-end linkage to TasA also promises functioning enzymes and is far more modular</b>.  
Line 362: Line 362:
 
     <div class = "TFcontainer">
 
     <div class = "TFcontainer">
 
     <div class = "containertext">
 
     <div class = "containertext">
         First, the fusion protein's structure was predicted using the <b><a href = https://2020.igem.org/Team:TU_Darmstadt/Model/Enzyme_Modeling#EreB_CM><i>RosettaCM</i></a> application</b>.  
+
         First, the fusion protein's structure was predicted using the <b><a target = "_blank" href = https://2020.igem.org/Team:TU_Darmstadt/Model/Enzyme_Modeling#EreB_CM><i>RosettaCM</i></a> application</b>.  
 
         Comparative modelling is a homology modelling approach that uses known structures with high sequence homology to determine the enzyme's 3D structure, combined with an <i>Ab-Initio</i> approach for regions that cannot be aligned.  
 
         Comparative modelling is a homology modelling approach that uses known structures with high sequence homology to determine the enzyme's 3D structure, combined with an <i>Ab-Initio</i> approach for regions that cannot be aligned.  
 
         Changes in secondary structure are randomly introduced and evaluated using the Rosetta Energy Function to approach the <b>protein's lowest free energy state</b>. Comparative modelling is an <b>excellent method for fusion proteins</b>, because the structures of both the enzymes  
 
         Changes in secondary structure are randomly introduced and evaluated using the Rosetta Energy Function to approach the <b>protein's lowest free energy state</b>. Comparative modelling is an <b>excellent method for fusion proteins</b>, because the structures of both the enzymes  
         (CueO: PDB 5B7E, CotA: PDB 1GSK, EreB) and TasA (PDB: 5OF2) were already determined and show <b>100% complementarity to the corresponding protein domains</b>. <sup id="cite_ref-431"><a href="#cite_note-431">[27]</a></sup>  
+
         (CueO: PDB 5B7E, CotA: PDB 1GSK, EreB) and TasA (PDB: 5OF2) were already determined and show <b>100% complementarity to the corresponding protein domains</b>. <sup id="cite_ref-431"><a target = "_blank" href="#cite_note-431">[27]</a></sup>  
         The domains' structures were aligned and threaded onto the fusion protein sequence using the Rosetta partial thread application. Fragment files were generated using the old <a href="https://www.rcsb.org/structure/3b55"><i>Robetta</i> fragment server</a>.  
+
         The domains' structures were aligned and threaded onto the fusion protein sequence using the Rosetta partial thread application. Fragment files were generated using the old <a target = "_blank" href="https://www.rcsb.org/structure/3b55"><i>Robetta</i> fragment server</a>.  
 
         It outputs a three- and nine-mer file by aligning fragments to PDB entries. The fragment files are then used as input structures to enhance the <i>Ab-Initio</i>  model’s precision.  
 
         It outputs a three- and nine-mer file by aligning fragments to PDB entries. The fragment files are then used as input structures to enhance the <i>Ab-Initio</i>  model’s precision.  
         Detailed information on the algorithm of <a href = https://2020.igem.org/Team:TU_Darmstadt/Model/Enzyme_Modeling#EreB_CM><i>RosettaCM</i></a> can be found in the modelling section for EreB. <br><br>
+
         Detailed information on the algorithm of <a target = "_blank" href = https://2020.igem.org/Team:TU_Darmstadt/Model/Enzyme_Modeling#EreB_CM><i>RosettaCM</i></a> can be found in the modelling section for EreB. <br><br>
 
     </div>
 
     </div>
 
<div class="containerimg" id=Chapter2>
 
<div class="containerimg" id=Chapter2>
Line 379: Line 379:
  
 
     <div class = "containertext">
 
     <div class = "containertext">
         To investigate functionality of the protein domains and structure stability of the fusion proteins <b><a href = https://2020.igem.org/Team:TU_Darmstadt/Model/Enzyme_Modeling#EreB_MD> MD simulations</a></b> were carried out using GROMACS and the Charmm27 forcefield with explicit TIP3P water.  
+
         To investigate functionality of the protein domains and structure stability of the fusion proteins <b><a target = "_blank" href = https://2020.igem.org/Team:TU_Darmstadt/Model/Enzyme_Modeling#EreB_MD> MD simulations</a></b> were carried out using GROMACS and the Charmm27 forcefield with explicit TIP3P water.  
 
         The forcefield calculates the forces acting on all atoms in the system, considering bonded, nonbonded and special interactions. MD is a useful tool to validate the protein's folding in an aqueous environment and to study the enzyme’s movements in a time-dependent physical forcefield governed by Newtonian equations of motion.  
 
         The forcefield calculates the forces acting on all atoms in the system, considering bonded, nonbonded and special interactions. MD is a useful tool to validate the protein's folding in an aqueous environment and to study the enzyme’s movements in a time-dependent physical forcefield governed by Newtonian equations of motion.  
 
         The <b>structure’s energy is minimized in a first step</b>. Afterwards the system gets equilibrated in a <b>NVT and NPT</b> simulation step. All of these steps are carried out using a <b>system restraint file</b> to maintain the target’s structure during the preparation steps.  
 
         The <b>structure’s energy is minimized in a first step</b>. Afterwards the system gets equilibrated in a <b>NVT and NPT</b> simulation step. All of these steps are carried out using a <b>system restraint file</b> to maintain the target’s structure during the preparation steps.  
Line 403: Line 403:
  
 
     <div class = "containertext">
 
     <div class = "containertext">
         For further analysis <b>principal component analysis (PCA)</b> was performed to analyse internal movement of the protein. With PCA we are able to separate global collective movement from local movement to study the enzyme's dynamics. The covariance matrix of the MD simulation run was generated and diagonalized using the GROMACS covar tool. The resulting eigenvectors were visualized using the GROMACS anaeig tool and are presented here as a 3D visualization of the first eigenvectors, which show the strongest protein fluctuation. As visible in the simulation, the protein shows internal movements of outer residues, but <b>functional internal domains stay structurally preserved</b>, as already visible in the RMSF graph. In particular, the azithromycin binding pocket and the residues E43, H46, R55, R74, H285 and H288 of the EreB enzyme domain (highlighted in the simulation), which are important for <b>catalytic activity, remain stable and show weak movement</b>. The most relevant principal mode (PM) shows an <b>increasing distance between the EreB and TasA protein domains</b> during the simulation. This way the <b>accessibility of both domains increases</b>, and the function of the linker peptide as a dynamic connection between both domains can be assumed. In summary, the PMs describing internal movements reinforce our method of immobilizing the transforming enzymes with the extrapolymeric matrix protein TasA and support our assumption of correct protein folding after fusion, based on the publication of Huang et al. (2018). <sup id="cite_ref-429"><a href="#cite_note-429">[25]</a></sup> <br><br>
+
         For further analysis <b>principal component analysis (PCA)</b> was performed to analyse internal movement of the protein. With PCA we are able to separate global collective movement from local movement to study the enzyme's dynamics. The covariance matrix of the MD simulation run was generated and diagonalized using the GROMACS covar tool. The resulting eigenvectors were visualized using the GROMACS anaeig tool and are presented here as a 3D visualization of the first eigenvectors, which show the strongest protein fluctuation. As visible in the simulation, the protein shows internal movements of outer residues, but <b>functional internal domains stay structurally preserved</b>, as already visible in the RMSF graph. In particular, the azithromycin binding pocket and the residues E43, H46, R55, R74, H285 and H288 of the EreB enzyme domain (highlighted in the simulation), which are important for <b>catalytic activity, remain stable and show weak movement</b>. The most relevant principal mode (PM) shows an <b>increasing distance between the EreB and TasA protein domains</b> during the simulation. This way the <b>accessibility of both domains increases</b>, and the function of the linker peptide as a dynamic connection between both domains can be assumed. In summary, the PMs describing internal movements reinforce our method of immobilizing the transforming enzymes with the extrapolymeric matrix protein TasA and support our assumption of correct protein folding after fusion, based on the publication of Huang et al. (2018). <sup id="cite_ref-429"><a target = "_blank" href="#cite_note-429">[25]</a></sup> <br><br>
 
     </div>
 
     </div>
  
Line 429: Line 429:
 
         <h4 style="text-align: left"> References</h4>
 
         <h4 style="text-align: left"> References</h4>
  
         <a  class="anchor" id="cite_note-40"></a><a class="referencestd" href=" https://doi.org/10.1016/S0076-6879(04)83004-0" target="_blank">1. CA. Rohl et al. Protein Structure Prediction Using Rosetta. Methods in Enzymology 2004, 383 66-93, https://doi.org/10.1016/S0076-6879(04)83004-0 </a>  
+
         <a  class="anchor" id="cite_note-40"></a><a class="referencestd" target = "_blank" href=" https://doi.org/10.1016/S0076-6879(04)83004-0" target="_blank">[1] CA. Rohl et al. Protein Structure Prediction Using Rosetta. Methods in Enzymology 2004, 383 66-93, https://doi.org/10.1016/S0076-6879(04)83004-0 </a>  
 
          
 
          
 
             <a  class="anchor" id="cite_note-41"></a>
 
             <a  class="anchor" id="cite_note-41"></a>
             <a class="referencestd" href="https://doi.org/10.1016/j.str.2013.08.005" target="_blank">2. Song Y et al. High-Resolution Comparative Modeling with RosettaCM. Structure 2013, 21 1735-1742, https://doi.org/10.1016/j.str.2013.08.005</a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1016/j.str.2013.08.005" target="_blank">[2] Song Y et al. High-Resolution Comparative Modeling with RosettaCM. Structure 2013, 21 1735-1742, https://doi.org/10.1016/j.str.2013.08.005</a>
  
 
          
 
          
         <a  class="anchor" id="cite_note-42"></a><a class="referencestd" href=" https://doi.org/10.1038/nprot.2013.074" target="_blank">3. SA. Combs et al. Small-molecule ligand docking into comparative models with Rosetta, Nature Protocols 2013, 8(7) 1277-98, doi: 10.1038/nprot.2013.074</a>  
+
         <a  class="anchor" id="cite_note-42"></a><a class="referencestd" target = "_blank" href=" https://doi.org/10.1038/nprot.2013.074" target="_blank">[3] SA. Combs et al. Small-molecule ligand docking into comparative models with Rosetta, Nature Protocols 2013, 8(7) 1277-98, doi: 10.1038/nprot.2013.074</a>  
 
          
 
          
 
         <a  class="anchor" id="cite_note-43"></a>  
 
         <a  class="anchor" id="cite_note-43"></a>  
         <a class="referencestd" href="https://doi.org/10.1007/978-1-4939-3569-7_4" target="_blank">4. Moretti R et al. Rosetta and the Design of Ligand Binding Sites. Methods Mol Biol. 2016; 1414: 47–62. doi: 10.1007/978-1-4939-3569-7_4</a>  
+
         <a class="referencestd" target = "_blank" href="https://doi.org/10.1007/978-1-4939-3569-7_4" target="_blank">[4] Moretti R et al. Rosetta and the Design of Ligand Binding Sites. Methods Mol Biol. 2016; 1414: 47–62. doi: 10.1007/978-1-4939-3569-7_4</a>  
  
 
             <a  class="anchor" id="cite_note-1"></a>
 
             <a  class="anchor" id="cite_note-1"></a>
             <a class="referencestd" href="https://doi.org/10.1002/prot.1170" target="_blank">5. Bonneau R et al. Rosetta in CASP4: Progress in ab initio protein structure prediction. Proteins 2001 doi: 10.1002/prot.1170</a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1002/prot.1170" target="_blank">[5] Bonneau R et al. Rosetta in CASP4: Progress in ab initio protein structure prediction. Proteins 2001 doi: 10.1002/prot.1170</a>
 
      
 
      
 
             <a  class="anchor" id="cite_note-2"></a>
 
             <a  class="anchor" id="cite_note-2"></a>
             <a class="referencestd" href="https://doi.org/10.1146/annurev.biochem.77.062906.171838" target="_blank">6. Das R. and Baker D. Macromolecular Modeling with Rosetta. Annual Review of Biochemistry 2008, 77:363-382, https://doi.org/10.1146/annurev.biochem.77.062906.171838</a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1146/annurev.biochem.77.062906.171838" target="_blank">[6] Das R. and Baker D. Macromolecular Modeling with Rosetta. Annual Review of Biochemistry 2008, 77:363-382, https://doi.org/10.1146/annurev.biochem.77.062906.171838</a>
  
 
             <a  class="anchor" id="cite_note-4"></a>
 
             <a  class="anchor" id="cite_note-4"></a>
             <a class="referencestd" href="https://doi.org/10.1186/s12859-018-2367-z" target="_blank">7. Kunzmann P. and Hamacher K. Biotite: a unifying open source computational biology framework in Python. BMC Bioinformatics 2018, 19 346, https://doi.org/10.1186/s12859-018-2367-z </a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1186/s12859-018-2367-z" target="_blank">[7] Kunzmann P. and Hamacher K. Biotite: a unifying open source computational biology framework in Python. BMC Bioinformatics 2018, 19 346, https://doi.org/10.1186/s12859-018-2367-z </a>
  
 
             <a  class="anchor" id="cite_note-5"></a>
 
             <a  class="anchor" id="cite_note-5"></a>
             <a class="referencestd" href="https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/relax" target="_blank">5. https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/relax</a>
+
             <a class="referencestd" target = "_blank" href="https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/relax" target="_blank">[8] https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/relax</a>
  
 
             <a  class="anchor" id="cite_note-6"></a>
 
             <a  class="anchor" id="cite_note-6"></a>
             <a class="referencestd" href="https://doi.org/10.1371/journal.pone.0059004" target="_blank">9. Nivon LG et al. A Pareto-Optimal Refinement Method for Protein Design Scaffolds. PLOS ONE 2013, https://doi.org/10.1371/journal.pone.0059004</a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1371/journal.pone.0059004" target="_blank">[9] Nivon LG et al. A Pareto-Optimal Refinement Method for Protein Design Scaffolds. PLOS ONE 2013, https://doi.org/10.1371/journal.pone.0059004</a>
  
 
             <a  class="anchor" id="cite_note-7"></a>
 
             <a  class="anchor" id="cite_note-7"></a>
             <a class="referencestd" href="https://doi.org/10.4137/DTI.S10219" target="_blank">10. Mbah AN et al. Drug Target Exploitable Structural Features of Adenylyl Cyclase Activity in Schistosoma mansoni. Drug Target Insights 2012, 6 41-58, doi: 10.4137/DTI.S10219</a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.4137/DTI.S10219" target="_blank">[10] Mbah AN et al. Drug Target Exploitable Structural Features of Adenylyl Cyclase Activity in Schistosoma mansoni. Drug Target Insights 2012, 6 41-58, doi: 10.4137/DTI.S10219</a>
  
 
<a  class="anchor" id="cite_note-8"></a>
 
<a  class="anchor" id="cite_note-8"></a>
             <a class="referencestd" href="https://doi.org/10.1021/bi201790u" target="_blank">11. Morar M. et al. Mechanism and Diversity of the Erythromycin Esterase Family of Enzymes. Biochemistry 2012, 51(8) 1740-51, doi: 10.1021/bi201790u</a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1021/bi201790u" target="_blank">[11] Morar M. et al. Mechanism and Diversity of the Erythromycin Esterase Family of Enzymes. Biochemistry 2012, 51(8) 1740-51, doi: 10.1021/bi201790u</a>
 
      
 
      
 
             <a  class="anchor" id="cite_note-9"></a>
 
             <a  class="anchor" id="cite_note-9"></a>
             <a class="referencestd" href="https://doi.org/10.1039/b800799c" target="_blank">12. Edward I. Solomon, Anthony J. Augustine and Jungjoo Yoon. O2 Reduction to H2O by the multicopper oxidases. Dalton Transactions 2008, 30, doi: 10.1039/b800799c </a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1039/b800799c" target="_blank">[12] Edward I. Solomon, Anthony J. Augustine and Jungjoo Yoon. O2 Reduction to H2O by the multicopper oxidases. Dalton Transactions 2008, 30, doi: 10.1039/b800799c </a>
  
 
             <a  class="anchor" id="cite_note-10"></a>
 
             <a  class="anchor" id="cite_note-10"></a>
             <a class="referencestd" href="http://manual.gromacs.org/documentation/2018/user-guide/terminology.html#gmx-force-field" target="_blank">10. http://manual.gromacs.org/documentation/2018/user-guide/terminology.html#gmx-force-field</a>
+
             <a class="referencestd" target = "_blank" href="http://manual.gromacs.org/documentation/2018/user-guide/terminology.html#gmx-force-field" target="_blank">[13] http://manual.gromacs.org/documentation/2018/user-guide/terminology.html#gmx-force-field</a>
  
 
             <a  class="anchor" id="cite_note-11"></a>
 
             <a  class="anchor" id="cite_note-11"></a>
             <a class="referencestd" href="https://doi.org/10.1002/jcc.20291" target="_blank">14. Van Der Spoel D et al. GROMACS: Fast, flexible, and free. Journal of Computational Chemistry 2005, 26(16):1701-18. doi:10.1002/jcc.20291</a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1002/jcc.20291" target="_blank">[14] Van Der Spoel D et al. GROMACS: Fast, flexible, and free. Journal of Computational Chemistry 2005, 26(16):1701-18. doi:10.1002/jcc.20291</a>
  
 
             <a  class="anchor" id="cite_note-12"></a>
 
             <a  class="anchor" id="cite_note-12"></a>
             <a class="referencestd" href="https://doi.org/10.1002/1097-0282(2000)56:4<257::AID-BIP10029>3.0.CO;2-W" target="_blank">15. MacKerell Jr. AD et al. Development and Current Status of the CHARMM Force Field for Nucleic Acids. Biopolymers 2001, 56(4) 257-265 https://doi.org/10.1002/1097-0282(2000)56:4<257::AID-BIP10029>3.0.CO;2-W.</a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1002/1097-0282(2000)56:4<257::AID-BIP10029>3.0.CO;2-W" target="_blank">[15] MacKerell Jr. AD et al. Development and Current Status of the CHARMM Force Field for Nucleic Acids. Biopolymers 2001, 56(4) 257-265 https://doi.org/10.1002/1097-0282(2000)56:4<257::AID-BIP10029>3.0.CO;2-W.</a>
  
 
             <a  class="anchor" id="cite_note-13"></a>
 
             <a  class="anchor" id="cite_note-13"></a>
             <a class="referencestd" href="https://doi.org/10.1016/j.sbi.2013.12.006" target="_blank">16. Piana et al Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations. Current Opinion in Structural Biology 2014, 24 98-105. https://doi.org/10.1016/j.sbi.2013.12.006</a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1016/j.sbi.2013.12.006" target="_blank">[16] Piana et al Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations. Current Opinion in Structural Biology 2014, 24 98-105. https://doi.org/10.1016/j.sbi.2013.12.006</a>
  
 
             <a  class="anchor" id="cite_note-14"></a>
 
             <a  class="anchor" id="cite_note-14"></a>
             <a class="referencestd" href="https://www.uni-muenster.de/Physik.TP/archive/fileadmin/lehre/TheorieAKkM/ws12/Schelte.pdf" target="_blank">17. https://www.uni-muenster.de/Physik.TP/archive/fileadmin/lehre/TheorieAKkM/ws12/Schelte.pdf</a>
+
             <a class="referencestd" target = "_blank" href="https://www.uni-muenster.de/Physik.TP/archive/fileadmin/lehre/TheorieAKkM/ws12/Schelte.pdf" target="_blank">[17] https://www.uni-muenster.de/Physik.TP/archive/fileadmin/lehre/TheorieAKkM/ws12/Schelte.pdf</a>
  
 
             <a  class="anchor" id="cite_note-15"></a>
 
             <a  class="anchor" id="cite_note-15"></a>
             <a class="referencestd" href="https://doi.org/10.1002/jcc.23354" target="_blank">18. Huang J. and MacKerell Jr AD. CHARMM36 all‐atom additive protein force field: Validation based on comparison to NMR data. Journal of Computational Chemistry 2013, DOI: 10.1002/jcc.23354</a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1002/jcc.23354" target="_blank">[18] Huang J. and MacKerell Jr AD. CHARMM36 all‐atom additive protein force field: Validation based on comparison to NMR data. Journal of Computational Chemistry 2013, DOI: 10.1002/jcc.23354</a>
  
 
             <a  class="anchor" id="cite_note-16"></a>
 
             <a  class="anchor" id="cite_note-16"></a>
             <a class="referencestd" href="http://gensoft.pasteur.fr/docs/lammps/12Dec2018/Howto_tip3p.html" target="_blank">19. http://gensoft.pasteur.fr/docs/lammps/12Dec2018/Howto_tip3p.html</a>
+
             <a class="referencestd" target = "_blank" href="http://gensoft.pasteur.fr/docs/lammps/12Dec2018/Howto_tip3p.html" target="_blank">[19] http://gensoft.pasteur.fr/docs/lammps/12Dec2018/Howto_tip3p.html</a>
  
 
             <a  class="anchor" id="cite_note-17"></a>
 
             <a  class="anchor" id="cite_note-17"></a>
             <a class="referencestd" href="http://www.bpc.uni-frankfurt.de/guentert/wiki/images/9/96/180618_TutorialMD.pdf" target="_blank">20. http://www.bpc.uni-frankfurt.de/guentert/wiki/images/9/96/180618_TutorialMD.pdf</a>
+
             <a class="referencestd" target = "_blank" href="http://www.bpc.uni-frankfurt.de/guentert/wiki/images/9/96/180618_TutorialMD.pdf" target="_blank">[20] http://www.bpc.uni-frankfurt.de/guentert/wiki/images/9/96/180618_TutorialMD.pdf</a>
  
 
             <a  class="anchor" id="cite_note-18"></a>
 
             <a  class="anchor" id="cite_note-18"></a>
             <a class="referencestd" href="http://www.mdtutorials.com/gmx/lysozyme/index.html" target="_blank">21. http://www.mdtutorials.com/gmx/lysozyme/index.html</a>
+
             <a class="referencestd" target = "_blank" href="http://www.mdtutorials.com/gmx/lysozyme/index.html" target="_blank">[21] http://www.mdtutorials.com/gmx/lysozyme/index.html</a>
  
 
             <a  class="anchor" id="cite_note-19"></a>
 
             <a  class="anchor" id="cite_note-19"></a>
             <a class="referencestd" href="https://doi.org/10.1002/chem.201905598" target="_blank">22. Zhang L. et al Engineering of Laccase CueO for Improved Electron Transfer in Bioelectrocatalysis by Semi-Rational Design. Chemistry A European Journal 2020, 26(22), DOI: 10.1002/chem.201905598</a>
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1002/chem.201905598" target="_blank">[22] Zhang L. et al Engineering of Laccase CueO for Improved Electron Transfer in Bioelectrocatalysis by Semi-Rational Design. Chemistry A European Journal 2020, 26(22), DOI: 10.1002/chem.201905598</a>
  
 
             <a class="anchor" id="cite_note-20"></a>  
 
             <a class="anchor" id="cite_note-20"></a>  
             <a class="referencestd" href="23. http://manual.gromacs.org/documentation/2019-rc1/onlinehelp/gmx-covar.html#gmx-covar" target="_blank">20. http://manual.gromacs.org/documentation/2019-rc1/onlinehelp/gmx-covar.html#gmx-covar </a>  
+
             <a class="referencestd" target = "_blank" href="23. http://manual.gromacs.org/documentation/2019-rc1/onlinehelp/gmx-covar.html#gmx-covar" target="_blank">[23] http://manual.gromacs.org/documentation/2019-rc1/onlinehelp/gmx-covar.html#gmx-covar </a>  
  
 
             <a class="anchor" id="cite_note-21"></a>  
 
             <a class="anchor" id="cite_note-21"></a>  
             <a class="referencestd" href="https://www3.mpibpc.mpg.de/groups/de_groot/compbio1/p4/index.html" target="_blank">24. B. de Groot. Practical 5: Principal components analysis</a>
+
             <a class="referencestd" target = "_blank" href="https://www3.mpibpc.mpg.de/groups/de_groot/compbio1/p4/index.html" target="_blank">[24] B. de Groot. Practical 5: Principal components analysis</a>
 
              
 
              
 
             <a  class="anchor" id="cite_note-429"></a>
 
             <a  class="anchor" id="cite_note-429"></a>
             <a class="referencestd" href="https://doi.org/10.1038/s41589-018-0169-2" target="_blank">25. Huang, J et al. Programmable and printable Bacillus subtilis biofilms as engineered living materials. Nature Chemical Biology 2018, doi: 10.1038/s41589-018-0169-2</a>  
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1038/s41589-018-0169-2" target="_blank">[25] Huang, J et al. Programmable and printable Bacillus subtilis biofilms as engineered living materials. Nature Chemical Biology 2018, doi: 10.1038/s41589-018-0169-2</a>  
  
 
             <a  class="anchor" id="cite_note-430"></a>
 
             <a  class="anchor" id="cite_note-430"></a>
             <a class="referencestd" href="https://doi.org/10.1074/jbc.M111.253419" target="_blank">26. LF. Ribeiro et al. Engineering bifunctional laccase-xylanase chimeras for improved catalytic performance. J Biol Chem. 2011, 286(50) 43026-38, doi: 10.1074/jbc.M111.253419</a>  
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1074/jbc.M111.253419" target="_blank">[26] LF. Ribeiro et al. Engineering bifunctional laccase-xylanase chimeras for improved catalytic performance. J Biol Chem. 2011, 286(50) 43026-38, doi: 10.1074/jbc.M111.253419</a>  
  
 
             <a  class="anchor" id="cite_note-431"></a>
 
             <a  class="anchor" id="cite_note-431"></a>
             <a class="referencestd" href="https://doi.org/10.1016/j.pnsc.2008.12.007 " target="_blank">27. J. Zhang et al. Design and optimization of a linker for fusion protein construction. Progress in Natural Science 2009, 19(10) 1197-1200, https://doi.org/10.1016/j.pnsc.2008.12.007</a>  
+
             <a class="referencestd" target = "_blank" href="https://doi.org/10.1016/j.pnsc.2008.12.007 " target="_blank">[27] J. Zhang et al. Design and optimization of a linker for fusion protein construction. Progress in Natural Science 2009, 19(10) 1197-1200, https://doi.org/10.1016/j.pnsc.2008.12.007</a>  
 
  </div>
 
  </div>
  

Revision as of 09:23, 24 October 2020




Why Rosetta?

Rosetta is a software suite capable of solving a multitude of computational macromolecular problems such as de novo protein design, enzyme design, ligand docking and structure prediction of biological macromolecules or macromolecular complexes. The Rosetta energy function enables relatively precise solutions for a broad range of applications by considering many different energy terms relevant for protein folding such as solvation, electrostatic effects and hydrogen bonding. Its applications range from the design of macromolecular structures and interactions, through RNA and fibril structures, up to the de novo design of a fully functioning enzyme! It was originally developed by the David Baker Lab. [1] Rosetta is free for academic users and a very powerful tool for the many problems that arise when working on a synthetic biology project. Although there is a lot of information available in the Rosetta Documentation, getting started with Rosetta is very hard, especially for people without experience with console-based applications. Nevertheless, Rosetta is an amazing and multifaceted tool for synthetic biology. To ease these starting difficulties, we provide a guide for Rosetta on our Wiki.

We used multiple Rosetta applications to model the properties of our enzymes. The collected data allows us to predict enzyme functionality when immobilized in our biofilm with TasA, binding affinity towards different pharmaceuticals or pollutants and many more.
RosettaCM was used to generate structure predictions of the azithromycin transforming enzyme EreB and fusion proteins consisting of matrix protein TasA and our enzymes CueO, CotA and EreB [2].
Rosetta Ligand Docking was used to study the binding affinity of various substrates towards the enzymes’ active sites [3].
Protein Design was used to enhance the binding affinity of our target molecules to the corresponding enzymes’ active site by introduction of mutations [4].

To find out more about the different aspects of modelling with Rosetta, you can read the articles covering these applications. If you want to use Rosetta on your own, you can use our Rosetta Guide to get started with the program.

Laccase Docking

EreB CM-Modeling

By carrying out structure prediction calculations with the Rosetta comparative modelling application RosettaCM, we aim to create a precise 3D model of the enzyme EreB. RosettaCM is based on homology modelling: the target protein is compared to known crystal structures of proteins with high sequence homology. Structures of unaligned sequence regions that show no or low homology to the given template structures are generated using the Rosetta ab initio protocol. Low homology is a consequence of mismatches or gaps in the alignments. The ab initio protocol uses a library of nine-mer and three-mer fragments of known protein structures to predict possible folds. [5]
EreB possesses high similarity in its active site to other esterases, such as the Protein Data Bank (PDB) entry succinoglycan biosynthesis protein. This enhances the accuracy of homology modelling, since the similar domains can be used as templates for the structure. Nevertheless, the highest structural homology found for another PDB entry is only 25.90% (succinoglycan biosynthesis protein 2QGM, chain A). Since the structure prediction relies on the statistical Monte Carlo method, multiple modelling runs are necessary to obtain a precise structure.
figure Figure 1: EreB structure model generated with RosettaCM
Proteins with high sequence homology were found by blasting the EreB sequence against the PDB using the NCBI BLAST application (blastp). Three protein structures were identified as sequentially related to EreB and were threaded onto the query sequence. Rosetta’s partial thread application assigns the templates’ structural data to the aligned sequences of the target structure to prepare the structure prediction run.
Succinoglycan biosynthesis protein (2QGM_A: X-ray diffraction, 1.70 Å and 2RAD_A: X-ray diffraction, 2.75 Å) from Bacillus cereus ATCC 14579 expressed in E. coli and Q81BN2_BACCR protein, also from Bacillus cereus ATCC 14579 (3B55_A: X-ray diffraction, 2.30 Å), were used as templates.
The resulting threaded models are aligned in a single global frame and are then used to create a full-chain model of the protein’s 3D structure by Monte Carlo sampling. Monte Carlo sampling relies on random sampling of variants to solve a problem that is deterministic in principle, for example protein folding.
Fragment files were generated using the old Robetta fragment server. It outputs a three- and nine-mer file by aligning fragments to PDB entries. The fragment files are then used as input structures to enhance the model’s precision.
The structural changes are then scored using the Rosetta low-resolution energy function. This approach relies on the fact that the desired protein’s 3D structure is expected to be the minimum of its free energy function. Rosetta’s energy function takes into account hydrophobic interactions between non-polar residues, van der Waals interactions between buried atoms and the strong size dependence of forming a cavity in the solvent to accommodate the folded protein. In this process, atom-atom interactions are calculated as Lennard-Jones potentials, hydrophobic interactions and the electrostatic desolvation of polar residues inside the molecule are treated with implicit solvation models, and hydrogen bonds are described by an explicit hydrogen bonding potential. [2, 6] The calculated energy is then assessed by the Metropolis acceptance criterion: if ΔE < 0 the structure is accepted, otherwise the newly proposed structure is accepted with probability p.
figure
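The Metropolis acceptance step described above can be sketched in a few lines of Python. This is a minimal illustration and not Rosetta code: the function names and the temperature factor kT are assumptions introduced here, and the acceptance probability for uphill moves is taken to be the standard Boltzmann factor p = exp(-ΔE / kT).

```python
import math
import random


def acceptance_probability(delta_e, kT=1.0):
    """Probability of accepting a move with energy change delta_e.

    kT is a hypothetical temperature factor in the same units as delta_e.
    """
    if delta_e < 0:
        return 1.0  # downhill moves are always accepted
    return math.exp(-delta_e / kT)  # standard Boltzmann factor for uphill moves


def metropolis_accept(delta_e, kT=1.0, rng=random):
    """Decide whether a newly proposed structure is accepted."""
    return rng.random() < acceptance_probability(delta_e, kT)
```

Repeating this decision over many proposed moves lets the sampling escape shallow local minima while still drifting towards low-energy structures.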
Structures are built using the Rosetta “fold tree”. [6] Backbone and side chain conformations are represented in torsion space (bonded interactions are mostly treated with ideal bond lengths and angles). Additionally, the position of each residue is represented in Cartesian space. Torsion angles are changed according to the fragment files or the provided templates, and positions of homologous fragments are substituted, combining an ab initio approach with a homology modelling approach. This combination of template-derived fragments in Cartesian space with torsion angles and residue positions derived from PDB fragments should ideally converge to the correctly folded protein topology. [2]
To resolve clashes, distorted peptide bonds and poor backbone hydrogen bond geometry that often arise from this CM approach, further improvements are necessary. In a second step, the structure is improved by replacing backbone segments via Monte Carlo sampling with either segments taken from the PDB that span the region and can roughly be superimposed on the selected residues, or segments from the template structures that superimpose on the complete segment. Afterwards, the structure’s energy is minimized using a smoothed version of Rosetta’s low-resolution energy function. In a third step, side chains are added and structure refinement is carried out using a physically realistic energy function. [2]
Since comparative modelling is a statistical approach to protein folding based on the Monte Carlo method and comparison to related structures, a high number of structures has to be generated to ensure the predicted structure is as close as possible to the actual protein structure. In this context, 20,000 structures were generated using the “Lichtenberg high performance computer” of the TU Darmstadt. After the run had finished, the structures were sorted by their total score calculated by RosettaCM and the best structure (S_17070.pdb) was used for further calculations. The total run was analysed using the Biotite Python package and its implemented superimpose and RMSD features. [7]
Figure 2: The 5 best-scoring structures aligned in PyMOL (S_17070: 4031.077, S_09664: 4033.266, S_00324: 4041.481, S_17103: 4069.085, S_12758: 4070.893)
The RMSD value is the root-mean-square deviation of atomic positions and represents the structural similarity of two molecules. It is calculated using the following formula:
figure figure
Figure 3: RMSD vs total score plot of the RosettaCM run. 20000 structures were generated and analyzed using the biotite python package.
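As a minimal sketch of the RMSD definition given above (the square root of the mean squared distance between corresponding atoms), the following pure-Python function may help. It assumes that both structures are already superimposed and list their atoms in the same order. The function name is chosen here for illustration and is not the Biotite API used for the actual analysis.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes both coordinate sets are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))
```

Identical structures give an RMSD of zero, and larger values indicate larger average displacement between corresponding atoms.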
For structure refinement, an additional Relax run with an output of 100 structures was carried out to ensure realistic torsion angles. Relax is an all-atom structure refinement application working in the structure’s local conformational space. [8] Regions of high energy in the protein’s structure are optimized under backbone and side chain restraints to minimize structural deviation. Optimization follows the usual methods such as torsion-space side chain minimization, torsion-space backbone minimization and re-sampling of side chain rotamers. [9]
To evaluate the obtained protein structure, Ramachandran plots were created with the Procheck webserver and can be compared to Ramachandran plots derived from a broad variety of protein crystal structures. We checked whether the dihedral angles of the modelled secondary structure show the typical distribution to validate the model's accuracy. [10] The evaluation showed that 328 residues are located in the most favoured regions, 44 in the additional allowed regions and 2 residues in the disallowed regions. Glycine and proline residues were excluded since they show no predictable dihedral angle distribution. Thus 87.7% of the structure’s dihedral angles are located in the most favoured regions and 99.5% in allowed regions in total. Therefore, the structure is expected to be a good model of EreB’s crystal structure.
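The percentages quoted above follow directly from the residue counts. The short calculation below reproduces them, assuming that the three listed categories account for all evaluated (non-glycine, non-proline) residues. The variable names are chosen here for illustration.

```python
# Residue counts from the Ramachandran evaluation (glycine and proline excluded).
most_favoured = 328
additionally_allowed = 44
disallowed = 2

# Assumption: these three categories cover all evaluated residues.
total = most_favoured + additionally_allowed + disallowed  # 374 residues

pct_most_favoured = 100 * most_favoured / total
pct_allowed_total = 100 * (most_favoured + additionally_allowed) / total

print(f"most favoured: {pct_most_favoured:.1f}%")
print(f"allowed in total: {pct_allowed_total:.1f}%")
```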
figure
figure
To summarize, we used the RosettaCM application to predict the structure of our target enzyme EreB by creating 20,000 structures on the Lichtenberg server cluster. We then relaxed the best-scoring structure and validated its dihedral angles using a Ramachandran plot. For further investigation of enzyme stability, MD simulations will be performed on the obtained structure.

EreB MD Simulation

Molecular dynamics (MD) simulation simulates the behaviour of a molecule in a small space that can be filled with solvent molecules (mostly H2O). It is therefore suited to study the stability of our enzymes dissolved in an aqueous environment. MD is a far more dynamic and physical approach than comparative modelling.
Comparative modelling only creates a momentary, static image of a dynamic biomolecule without simulating the molecule’s behaviour within a force field, even though dynamic motion of the protein is crucial for enzymatic activity. For structure evaluation, molecular dynamics offers a physical approach over a certain simulated period of time, in contrast to statistical approaches like homology modelling, which neglect dynamic physical forces between residues or interactions with solvent molecules. To validate the modelled structure of EreB and the fusion proteins of our enzymes with the matrix protein TasA, MD simulations were executed with the GROMACS (Groningen Machine for Chemical Simulations) software suite.
Although the structures obtained from our CM run were relaxed using the Rosetta Relax application, Monte Carlo based structure prediction tools do not always output fully relaxed protein structures and simulate the interactions with water molecules only implicitly. Water molecules are a main cause of the interactions that shape a protein’s structure, such as hydrophobic interactions, mostly between a protein’s inner residues, or hydrophilic interactions of the protein’s surface residues with water molecules. Interaction with water is also one of the primary driving forces of protein folding and consequently has to be considered for structure prediction or validation.
These interactions can be considered in MD simulation by providing a set of explicit water molecules interacting with the molecule and by physical, time-dependent calculation of the system’s forces using Newton’s equations of motion. To limit computing power, only a small system containing one protein and enough water to rule out interactions between different protein copies is created, and the results are transferred to a realistic system. Periodic boundary conditions allow this transfer: the system is assumed to consist of multiple small systems that behave identically, so when a molecule moves out of the simulated box it enters again on the other side, keeping the particle number constant and acting like a subsystem for the simulation of a big system. By proving the stability of our predicted (fusion) proteins we can also forecast their functionality: if the residues catalysing the ester cleavage of EreB form their already described catalytic centre and the active site is accessible for azithromycin, catalytic activity can most likely be assumed. [11] For the laccases, catalytic activity can likewise be assumed if the copper sites are correctly folded to coordinate copper ions for oxidation catalysis and the substrate binding site matches the one of the obtained crystal structures. [12]
figure
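The periodic boundary conditions described above can be illustrated with a small sketch for a rectangular box. The function names are hypothetical and this is not how GROMACS implements it internally. The box edge lengths are the system size reported for the EreB simulation.

```python
# Box edge lengths in nm, taken from the reported system size.
BOX = (8.000, 5.425, 5.921)


def wrap_position(position, box=BOX):
    """Map a position back into the box: a particle leaving on one side
    re-enters on the opposite side, so the particle number stays constant."""
    return tuple(x % length for x, length in zip(position, box))


def minimum_image(delta, box=BOX):
    """Shortest separation vector between two particles, taking the
    periodic copies of the box into account."""
    return tuple(d - length * round(d / length) for d, length in zip(delta, box))
```

With the minimum image convention, two particles near opposite faces of the box are treated as close neighbours, which is why the box must be large enough to keep the protein from interacting with its own periodic copy.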
The GROMACS software suite combines many tools for chemical and biochemical calculations. For MD simulation, an external force field is integrated into the application. MD simulations calculate the forces exerted on all atoms of a biomolecular system, mostly a target molecule solvated in water. Newtonian equations of motion are then used to predict the position and velocity of every atom in the system in small time steps (femtoseconds). The forces are calculated using a force field, for example GROMOS or AMBER. “Force fields are sets of potential functions and parametrized interactions that can be used to study physical systems.” [13] Combined with the equations of motion, they introduce the time dependency of the system. The force field consists of 3 types of interactions:
1. Bonded interactions between 2, 3 or 4 particles, including harmonic, cubic and Morse potentials for 2-particle systems and harmonic interactions for 3-particle systems
2. Nonbonded interactions between different molecules, described by a repulsion and dispersion term, e.g. the Lennard-Jones potential, and a Coulomb term.
3. Special interactions defined by the position restraints of a given system, e.g. distance restraints obtained from Nuclear Overhauser Effect data from high-resolution NMR. [14]
figure
Figure 4: Force field equation for the CHARMM27 force field. Kb, Kθ, KUB, Kχ, and Kφ are the bond, valence angle, Urey–Bradley, dihedral angle, and improper dihedral angle force constants, respectively; b, θ, S, χ and δ are the bond length, bond angle, Urey–Bradley 1,3 distance, dihedral torsion angle and improper dihedral angle. [15]
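To make the nonbonded part of such a force field more concrete, the sketch below evaluates a Lennard-Jones and a Coulomb term for a single atom pair. It uses the common 4ε[(σ/r)^12 − (σ/r)^6] form as an assumption, since individual force fields may parametrize the same potential differently. The function names and the rounded conversion constant are illustrative and not taken from a parameter file.

```python
# Approximate electric conversion factor 1/(4*pi*eps0) in kJ mol^-1 nm e^-2.
COULOMB_CONSTANT = 138.935


def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones energy (kJ/mol) at distance r (nm).

    epsilon: well depth in kJ/mol, sigma: zero-crossing distance in nm.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q_i, q_j, eps_r=1.0):
    """Coulomb energy (kJ/mol) of two point charges (in units of e) at distance r (nm)."""
    return COULOMB_CONSTANT * q_i * q_j / (eps_r * r)


def nonbonded_pair_energy(r, epsilon, sigma, q_i, q_j):
    """Sum of both nonbonded contributions for one atom pair."""
    return lennard_jones(r, epsilon, sigma) + coulomb(r, q_i, q_j)
```

The Lennard-Jones term is strongly repulsive at short distances and weakly attractive at longer ones, while the Coulomb term decays slowly with distance, which is one reason cut-offs for the calculated forces have to be chosen with care.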
Molecular dynamics simulation considers Newtonian forces on every atom in the simulated system and can therefore deliver precise results, depending on the force field’s accuracy. [16] Nevertheless, it is still an approximation and thus comes with imprecision: the calculated forces are cut off after a defined distance to limit the required computing power. Also, the force field calculates forces on the atomic level without taking quantum mechanics into account. This assumption is based on the Born-Oppenheimer approximation, which splits an atom’s energy into nuclear energy and electron energy, since the electrons’ dynamics do not directly influence the atomic nucleus. The nucleus’s kinetic energy can then be treated classically using Newton’s laws of motion. The quantum mechanical parts of this system, the electrons’ wave functions, are not considered in classical MD simulation. [17] Thus, MD simulations are very accurate methods for most large systems but are still based on approximations. Therefore, the obtained results are reliable but always have to be double-checked by experiments in the laboratory.
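How Newton’s equations of motion are advanced in small time steps can be sketched with a one-dimensional velocity Verlet integrator. This scheme is chosen here purely for illustration. The GROMACS default integrator is the closely related leap-frog scheme, and the harmonic test force below is a hypothetical stand-in for a real force field.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate position x and velocity v of one particle for n_steps steps.

    force: function returning the force at a given position.
    """
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half-step velocity update
        x = x + dt * v_half               # full-step position update
        f = force(x)                      # force at the new position
        v = v_half + 0.5 * dt * f / mass  # complete the velocity update
    return x, v


# Hypothetical test system: harmonic oscillator with unit mass and force constant.
x_end, v_end = velocity_verlet(1.0, 0.0, lambda x: -x, 1.0, 0.01, 1000)
total_energy = 0.5 * v_end ** 2 + 0.5 * x_end ** 2
```

For this test system the total energy stays close to its starting value of 0.5, which illustrates why small time steps keep the numerical trajectory physically meaningful.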
Responsive image
Figure 5: EreB structure after relaxation and equilibration in a cubic box filled with water and Na+ counter ions. Catalytically relevant residues are highlighted on the right and show functional positioning.
The system used is a box filled with explicit water molecules (TIP3P model) with the target enzyme centred in the box. Ions and counter ions are added to simulate realistic conditions and to balance the enzyme’s charge in solution due to (de)protonation. The system topology was created using the CHARMM27 force field with the TIP3P water model, which specifies a 3-site rigid water molecule with charges and Lennard-Jones parameters assigned to each of the 3 atoms. [18, 19] Afterwards, a cuboid box with at least 1.2 nm distance between its borders and the protein is created and filled with water molecules and ions countering the protein’s charge (system size: 8.000 × 5.425 × 5.921 nm). The system's energy is then minimized. Afterwards, NVT (constant number of particles, volume and temperature) and NPT (constant number of particles, pressure and temperature) equilibration is performed using a position restraint file to preserve the enzyme’s structure while temperature and pressure of the system equilibrate. Both NVT and NPT are carried out for 100 ps to ensure stabilization of the parameters. After the equilibration, the main simulation can be started. The simulation was carried out for 100 ns with a total of 50,000,000 steps (2 fs each). [20, 21, 22]
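The step counts and the box volume implied by these settings can be checked with a short calculation. The variable names are chosen here for illustration, and applying the same 2 fs time step to the equilibration runs is an assumption, since the text only states it for the production run.

```python
TIME_STEP_FS = 2          # integration time step in femtoseconds
PRODUCTION_NS = 100       # length of the production run in nanoseconds
EQUILIBRATION_PS = 100    # length of each NVT/NPT run in picoseconds

FS_PER_NS = 1_000_000
FS_PER_PS = 1_000

# Number of integration steps for the production run.
production_steps = PRODUCTION_NS * FS_PER_NS // TIME_STEP_FS

# Assumption: the equilibration runs use the same 2 fs time step.
equilibration_steps = EQUILIBRATION_PS * FS_PER_PS // TIME_STEP_FS

# Volume of the cuboid simulation box in nm^3.
box_nm = (8.000, 5.425, 5.921)
box_volume_nm3 = box_nm[0] * box_nm[1] * box_nm[2]

print(production_steps, equilibration_steps, round(box_volume_nm3, 1))
```

The production run therefore corresponds to 50,000,000 steps, matching the value stated above, and the box has a volume of roughly 257 nm³.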
Root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF) and radius of gyration (Rg) can be used to analyse and validate an MD simulation. These values give an important first insight into the structural stability of the structure obtained from CM modelling. Converging RMSD and radii of gyration can therefore be used as primary indicators for stable protein structures. RMSF describes the average deviation of a particle from its original position over time and therefore shows which regions of the protein are most dynamic and which domains are structurally preserved. The radius of gyration is the root-mean-square distance of the atoms of a protein from its centroid and can be used to analyse which regions of a protein are denatured or which regions show a high amount of secondary structure motifs. RMSD was calculated relative to both the relaxed and the crystal structure, RMSF was calculated for the C alpha backbone atoms and the radius of gyration for the whole protein. The specific value of convergence depends on the size of the protein that is subject to MD simulation. The RMSD and radius of gyration plots were analysed after the 100 ns simulation and show a clear trend of convergence. Also, the RMSF plot shows very small movement at the residues essential for the catalytic process of EreB: E43, H46, R55, R74, H285 and H288. [11]
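The radius of gyration and the per-atom fluctuation can be sketched as follows. This is a simplified illustration with hypothetical function names and not the GROMACS implementation: the radius of gyration is computed without mass weighting, and the fluctuation is measured about the time-averaged position of each atom, which is an assumption (a fixed reference structure could be used instead).

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of all atoms from their centroid (unweighted)."""
    n = len(coords)
    centroid = [sum(c[i] for c in coords) / n for i in range(3)]
    squared = sum(sum((c[i] - centroid[i]) ** 2 for i in range(3)) for c in coords)
    return math.sqrt(squared / n)


def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation about the time-averaged position.

    trajectory: list of frames, each frame a list of (x, y, z) per atom.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for atom in range(n_atoms):
        mean = [sum(frame[atom][i] for frame in trajectory) / n_frames for i in range(3)]
        msf = sum(
            sum((frame[atom][i] - mean[i]) ** 2 for i in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msf))
    return result
```

Atoms with a low RMSF, such as the catalytic residues discussed above, barely move during the simulation, while a stable radius of gyration indicates that the overall compactness of the protein is preserved.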

figure
Figure 6: A: RMSD plot of the EreB structure obtained by CM. The red plot represents RMSD and the black plot the RMSD calculated against the crystal structure. After 65 ns both the values start fluctuating about a nearly constant value suggesting convergence, a main indicator for a stable protein structure. B: Radius of gyration (Rg) plot. Convergence can be assumed as the value only fluctuates around a constant Rg of 2.44 nm. C: RMSF plot for every residue in the protein’s 3D structure. For RMSF determination the C alpha values were observed. The catalytically relevant residues E43, H46, R55, R74, H285 and H288 show low fluctuation, suggesting a preserved active site and enzymatic activity.
For further analysis, principal component analysis (PCA) was performed on the simulation data. PCA allows us to separate global collective movement from local, fast movement to further visualize and study the dynamics of a protein. The GROMACS covar tool is used to calculate a covariance matrix of the protein’s atomic fluctuations, which can be diagonalized to obtain a set of eigenvalues and eigenvectors describing the protein’s modes of fluctuation. The covariance matrix of a protein describes the covariance, i.e. the dependency of each fluctuation on the other movements. The eigenvectors with the largest eigenvalues represent the largest-amplitude correlated motions and are called principal components or essential modes. [23] The GROMACS anaeig tool can be used to visualize these principal components by projecting the protein’s trajectory onto the given eigenvectors (here 1-4). [24] As shown in the animation, the molecule shows strong internal movement, but the catalytically important residues and the azithromycin binding pocket stay structurally preserved, suggesting activity of the enzyme.
In conclusion, we used an MD simulation run of 100 ns to validate the structure determined by homology modelling in a physical, time-dependent force field. The simulation was done in a box with at least 1.2 nm distance between the centred protein and the box borders. The system was minimized and equilibrated by NVT and NPT simulations for 100 ps each. We analysed the system’s temperature and pressure afterwards, which showed small fluctuations about the expected values, so we were able to start the MD production run of 100 ns. Analysis of the production run showed converging RMSD and radius of gyration values as well as small RMSF values for the active site residues. Consequently, we validated the obtained structure as a possible crystal structure of erythromycin esterase type II EreB.
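The idea behind this analysis can be sketched without GROMACS. The example below builds a covariance matrix from flattened coordinate frames, extracts only the leading eigenvector by power iteration (a simplification of the full diagonalization performed by the covar tool) and projects the frames onto it. All function names are hypothetical, the covariance is normalized by the number of frames as an assumption, and power iteration can fail if the start vector happens to be orthogonal to the leading mode.

```python
import math


def covariance_matrix(frames):
    """frames: list of flattened coordinate vectors, one per trajectory frame."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    centered = [[f[i] - mean[i] for i in range(dim)] for f in frames]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    return mean, cov


def leading_mode(cov, iterations=500):
    """Largest eigenvalue and its eigenvector via power iteration."""
    dim = len(cov)
    vec = [1.0 / (i + 1) for i in range(dim)]  # arbitrary start vector
    for _ in range(iterations):
        new = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break
        vec = [x / norm for x in new]
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(dim))
                     for i in range(dim))
    return eigenvalue, vec


def project(frames, mean, mode):
    """Projection of each frame's fluctuation onto one mode."""
    return [sum((f[i] - mean[i]) * mode[i] for i in range(len(mode))) for f in frames]
```

The projections show how far each frame has moved along the dominant collective motion, which is the kind of information the anaeig visualization conveys for the first principal modes.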

1st PM: dark blue, 2nd PM: light blue 3rd PM: purple, 4th PM: light blue
Responsive image Responsive image
Figure 6: Visualization of the extremes of the first four essential modes. The covariance matrix was analysed using the GROMACS anaeig tool and compared to the CM-derived structure candidate S_17070.pdb. To limit computational cost, and because they allow good conclusions about the protein structure, only backbone atoms were considered. The first two principal modes (PMs) containing the strongest fluctuation are shown on the left. On the right, the 3rd and 4th PMs are displayed.

TasA fusion proteins

Our selected enzymes for pharmaceutical transformation are supposed to be embedded into the extracellular polymeric matrix of our B. subtilis biofilm by fusing the enzymes to the matrix protein TasA (PDB entry 5OF2). To this end, we follow the method described in Programmable and printable Bacillus subtilis biofilms as engineered living materials by Huang et al. (2018), who fused TasA to various example proteins or protein domains such as mCherry or MHETase to introduce new functions to the biofilm [25]. The exact method for fusion protein construction is documented in our corresponding wiki text in the biofilm category.
We also planned a fusion protein created by integrating TasA into a surface loop of CotA as described in Engineering Bifunctional Laccase-Xylanase Chimeras for Improved Catalytic Performance by Ribeiro et al. (2011) [26]. For this, we had to move the signal peptide to the N-terminus of CotA. CM modelling indicates that the signal peptide is still accessible for cell export and that both protein domains are correctly folded. Nevertheless, we discarded this approach, because end-to-end linkage to TasA also promises functioning enzymes and is far more modular. To integrate TasA into an enzyme’s sequence, we would always have to find surface loops that are not essential for the enzyme’s functionality, i.e. loops that are not required for correct protein folding and contain neither residues essential for substrate binding nor active site residues. This would contradict our idea of a modular biofilm to which new enzymes can easily be added.
figure Figure 7: TasA-CotA fusion protein. TasA was inserted into a surface loop as described by Ribeiro et al. The structure was aligned to the CotA and TasA crystal structures taken from the PDB. (signal peptide: darkblue, fusion protein CotA domain: deeppurple, fusion protein TasA domain: deepsalmon, CotA: turquoise, TasA: grey)
First, the fusion proteins’ structures were predicted using the RosettaCM application. Comparative modelling is a homology modelling approach that uses known structures with high sequence homology to determine the enzyme’s 3D structure, combined with an ab initio approach for regions that cannot be aligned. Changes in secondary structure are randomly introduced and evaluated using the Rosetta energy function to approach the protein’s lowest free energy state. Comparative modelling is an excellent method for fusion proteins, because the structures of both the enzymes (PDB: CueO: 5B7E, CotA: 1GSK; EreB) and TasA (PDB: 5OF2) were already determined and correspond 100% to the respective protein domains. [27] The domain structures were aligned and threaded onto the fusion protein sequence using the Rosetta partial thread application. Fragment files were generated using the old Robetta fragment server. It outputs a three-mer and a nine-mer file by aligning fragments to PDB entries. The fragment files are then used as input structures to enhance the precision of the ab initio model. Detailed information on the RosettaCM algorithm can be found in the modelling section for EreB.

figure Figure 8: EreB TasA fusion protein. TasA was fused N-terminally to EreB with a short linker peptide as described by Huang et al. Structure was obtained by homology modelling using RosettaCM.
To investigate the functionality of the protein domains and the structural stability of the fusion proteins, MD simulations were carried out using GROMACS and the CHARMM27 force field with explicit TIP3P water. The force field calculates the forces acting on all atoms in the system, considering bonded, nonbonded and special interactions. MD is a useful tool to validate the protein’s folding in an aqueous environment and to study the enzyme’s movements in a time-dependent physical force field using Newtonian equations of motion. The structure’s energy is minimized in a first step. Afterwards, the system is equilibrated in an NVT and an NPT simulation step. All of these steps are carried out using a restraint file to maintain the target’s structure during the preparation steps. After equilibration is finished, the main simulation of 100 ns can be started. If the MD simulations yield conserved tertiary structures of the protein domains, we assume that the immobilization mediated by the TasA matrix protein is functional. A preserved structure also indicates successful transformation of azithromycin, diclofenac and other substrates by our enzymes (CotA, CueO and EreB). All steps were carried out in the same way as in the previously described MD simulation of our EreB crystal structure candidate.

Figure 9: A: RMSD vs. score plot of the fusion protein. RMSD was calculated using the Python package Biotite, and the total score values output by Rosetta were used as score. B: Ramachandran plot of the dihedral angles of the best-scoring candidate. C: Ramachandran plot of the top 5 scoring structures.
Root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF) and radius of gyration (Rg) were plotted to analyse the simulation and validate the structure's stability. Converging RMSD and Rg are primary indicators of a stable protein structure. For the EreB-TasA fusion protein, both quantities fluctuate around a constant value, showing a clear trend of convergence. The RMSF analysis shows low deviation of the internal residues, especially those relevant for catalytic activity. This suggests stable protein domains and could thus signify preserved catalytic activity and enzyme immobilization, respectively. The EreB MD simulation already showed that the active site residues of the unfused EreB protein have low atomic fluctuation. Low Rg values also indicate a preserved, compact structure, so we can assume that the protein did not denature during the simulation.
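The three quantities discussed above can be illustrated with a minimal NumPy sketch. It is not the analysis pipeline used for the figures; it assumes that all frames are already superimposed onto the reference (no least-squares fit is performed here) and uses randomly generated coordinates in place of a real trajectory. All function and variable names are illustrative.

```python
import numpy as np

def rmsd(coords, reference):
    """RMSD between two (n_atoms, 3) coordinate sets that are already superimposed."""
    diff = coords - reference
    return np.sqrt((diff ** 2).sum(axis=1).mean())

def rmsf(trajectory):
    """Per-atom RMSF of a (n_frames, n_atoms, 3) trajectory around its mean structure."""
    mean_structure = trajectory.mean(axis=0)
    squared_dev = ((trajectory - mean_structure) ** 2).sum(axis=2)
    return np.sqrt(squared_dev.mean(axis=0))

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one (n_atoms, 3) frame."""
    if masses is None:
        masses = np.ones(len(coords))  # assumption: equal masses if none are given
    center = np.average(coords, axis=0, weights=masses)
    squared_dist = ((coords - center) ** 2).sum(axis=1)
    return np.sqrt(np.average(squared_dist, weights=masses))

# Synthetic stand-in for a trajectory: 100 frames of 50 atoms jittering around a reference.
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 3))
trajectory = reference + 0.05 * rng.normal(size=(100, 50, 3))

rmsd_series = np.array([rmsd(frame, reference) for frame in trajectory])
rg_series = np.array([radius_of_gyration(frame) for frame in trajectory])
rmsf_per_atom = rmsf(trajectory)

print("mean RMSD:", rmsd_series.mean())
print("mean Rg:", rg_series.mean())
print("max RMSF:", rmsf_per_atom.max())
```

In this picture, convergence means that rmsd_series and rg_series level off and fluctuate around a constant value, while rmsf_per_atom stays low for the residues of interest.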

Figure 10: A: RMSD plot of the TasA-EreB fusion protein structure obtained by CM. The red plot represents RMSD and the black plot the RMSD calculated against the crystal structure. After 65 ns both values start fluctuating around a nearly constant value, suggesting convergence, a main indicator of a stable protein structure. B: Radius of gyration (Rg) plot. Convergence can be assumed as the value only fluctuates around a constant Rg. C: RMSF plot for every residue in the protein's 3D structure. For RMSF determination the Cα atoms were observed.
For further analysis, principal component analysis (PCA) was performed to analyse the internal movement of the protein. PCA allows us to separate global collective movement from local movement in order to study the enzyme's dynamics. The covariance matrix of the MD simulation run was generated and diagonalized using the GROMACS covar tool. The resulting eigenvectors were analysed using the GROMACS anaeig tool and are presented here as 3D visualizations of the first eigenvectors, which show the strongest protein fluctuation. As visible in the simulation, the protein shows internal movements of outer residues, but the functional internal domains stay structurally preserved, as already seen in the RMSF graph. In particular, the azithromycin binding pocket and the residues E43, H46, R55, R74, H285 and H288 of the EreB enzyme domain (highlighted in the simulation), which are important for catalytic activity, remain stable and show only weak movement. The most relevant principal mode (PM) shows an increasing distance between the EreB and TasA protein domains during the simulation. This increases the accessibility of both domains and suggests that the linker peptide acts as a dynamic connection between them. In summary, the PMs describing internal movements support our method of immobilizing the transforming enzymes with the extracellular matrix protein TasA and support our assumption of correct protein folding after fusion, based on the publication of Huang et al. (2018). [25]
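The idea behind this analysis, building a covariance matrix of atomic positions and diagonalizing it, can be sketched in a few lines of NumPy. This is only an illustration of the principle and not a reimplementation of the GROMACS tools; it assumes the frames are already fitted to a reference structure and uses a synthetic two-atom trajectory in which one atom oscillates along x. All names are illustrative.

```python
import numpy as np

def principal_modes(trajectory):
    """Diagonalize the positional covariance of a (n_frames, n_atoms, 3) trajectory.

    Assumes the frames are already fitted to a common reference.
    Returns eigenvalues (descending) and the matching eigenvectors as columns.
    """
    n_frames = trajectory.shape[0]
    flat = trajectory.reshape(n_frames, -1)       # one row of 3N coordinates per frame
    centered = flat - flat.mean(axis=0)           # remove the average structure
    covariance = centered.T @ centered / n_frames
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]         # largest fluctuation first
    return eigenvalues[order], eigenvectors[:, order]

# Synthetic example: small noise everywhere, one collective motion of atom 1 along x.
rng = np.random.default_rng(1)
trajectory = 0.01 * rng.normal(size=(200, 2, 3))
trajectory[:, 1, 0] += np.sin(np.linspace(0.0, 4.0 * np.pi, 200))

eigenvalues, eigenvectors = principal_modes(trajectory)
explained = eigenvalues / eigenvalues.sum()

# Projection of every frame onto the first mode, e.g. to find its two extremes.
flat = trajectory.reshape(len(trajectory), -1)
projection = (flat - flat.mean(axis=0)) @ eigenvectors[:, 0]

print("variance explained by first mode:", explained[0])
print("frames at the extremes of mode 1:", projection.argmin(), projection.argmax())
```

The eigenvectors with the largest eigenvalues correspond to the principal modes, and the frames at the extremes of the projection are the kind of structures that are typically visualized for each mode.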

Figure 11: Visualization of the extremes of the first four essential modes. The covariance matrix was analysed using the GROMACS anaeig tool and compared to the CM-derived structure candidate S_0484.pdb. Only backbone atoms were considered. The TasA domain is coloured in light blue, the linker peptide in red and the EreB domain in blue. Top left: first (most relevant) PM. Top right: second PM containing. Bottom left: third PM. Bottom right: fourth PM.
We used an MD simulation run of 100 ns to validate the structure determined by Rosetta comparative modelling in a physical, time-dependent force field. A cubic box with a distance of at least 1.2 nm between the centred protein and the corners of the cube was used for the simulation run. The system was energy-minimized and then equilibrated by NVT and NPT simulations of 100 ps each. Temperature and pressure were analysed and showed only small fluctuations, so the system could be assumed to be stable, allowing the production run of 100 ns. The fusion protein's structure remained stable during the simulation, suggesting functionality of both protein domains, TasA and EreB, since they are structurally preserved and accessible. The signal peptide (TasA N-terminus) is also accessible, allowing transport of the fusion protein to the extracellular matrix to enable immobilization of the enzyme in our biofilm.
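As a rough illustration of the setup described above, the following Python sketch converts the stated run lengths into integration step counts and lists typical GROMACS system preparation commands. The 2 fs time step and all file names (fusion.pdb, processed.gro, and so on) are assumptions for illustration and are not taken from our run files. The commands are only assembled and printed, not executed, and further steps such as adding ions are omitted.

```python
def n_steps(duration_ps, timestep_fs=2.0):
    """Number of integration steps for a run of duration_ps picoseconds.

    The default time step of 2 fs is an assumption, not a value from the text.
    """
    return round(duration_ps * 1000.0 / timestep_fs)

BOX_DISTANCE_NM = 1.2  # minimum protein-to-box distance stated in the text

schedule = {
    "nvt": n_steps(100),             # 100 ps NVT equilibration
    "npt": n_steps(100),             # 100 ps NPT equilibration
    "production": n_steps(100_000),  # 100 ns production run
}

# Typical preparation commands; file names are hypothetical placeholders.
commands = [
    ["gmx", "pdb2gmx", "-f", "fusion.pdb", "-o", "processed.gro",
     "-ff", "charmm27", "-water", "tip3p"],
    ["gmx", "editconf", "-f", "processed.gro", "-o", "boxed.gro",
     "-c", "-d", str(BOX_DISTANCE_NM), "-bt", "cubic"],
    ["gmx", "solvate", "-cp", "boxed.gro", "-cs", "spc216.gro",
     "-o", "solvated.gro", "-p", "topol.top"],
]

for name, steps in schedule.items():
    print(f"{name}: nsteps = {steps}")
for command in commands:
    print(" ".join(command))
```

The step counts would go into the nsteps entry of the respective .mdp parameter files, and a different time step changes them proportionally.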

References

[1] Rohl CA et al. Protein Structure Prediction Using Rosetta. Methods in Enzymology 2004, 383 66-93, https://doi.org/10.1016/S0076-6879(04)83004-0
[2] Song Y et al. High-Resolution Comparative Modeling with RosettaCM. Structure 2013, 21 1735-1742, https://doi.org/10.1016/j.str.2013.08.005
[3] Combs SA et al. Small-molecule ligand docking into comparative models with Rosetta. Nature Protocols 2013, 8(7) 1277-98, doi: 10.1038/nprot.2013.074
[4] Moretti R et al. Rosetta and the Design of Ligand Binding Sites. Methods Mol Biol. 2016, 1414 47-62, doi: 10.1007/978-1-4939-3569-7_4
[5] Bonneau R et al. Rosetta in CASP4: Progress in ab initio protein structure prediction. Proteins 2001, doi: 10.1002/prot.1170
[6] Das R and Baker D. Macromolecular Modeling with Rosetta. Annual Review of Biochemistry 2008, 77 363-382, https://doi.org/10.1146/annurev.biochem.77.062906.171838
[7] Kunzmann P and Hamacher K. Biotite: a unifying open source computational biology framework in Python. BMC Bioinformatics 2018, 19 346, https://doi.org/10.1186/s12859-018-2367-z
[8] https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/relax
[9] Nivon LG et al. A Pareto-Optimal Refinement Method for Protein Design Scaffolds. PLOS ONE 2013, https://doi.org/10.1371/journal.pone.0059004
[10] Mbah AN et al. Drug Target Exploitable Structural Features of Adenylyl Cyclase Activity in Schistosoma mansoni. Drug Target Insights 2012, 6 41-58, doi: 10.4137/DTI.S10219
[11] Morar M et al. Mechanism and Diversity of the Erythromycin Esterase Family of Enzymes. Biochemistry 2012, 51(8) 1740-51, doi: 10.1021/bi201790u
[12] Solomon EI, Augustine AJ and Yoon J. O2 Reduction to H2O by the multicopper oxidases. Dalton Transactions 2008, 30, doi: 10.1039/b800799c
[13] http://manual.gromacs.org/documentation/2018/user-guide/terminology.html#gmx-force-field
[14] Van Der Spoel D et al. GROMACS: Fast, flexible, and free. Journal of Computational Chemistry 2005, 26(16) 1701-18, doi: 10.1002/jcc.20291
[15] MacKerell Jr. AD et al. Development and Current Status of the CHARMM Force Field for Nucleic Acids. Biopolymers 2001, 56(4) 257-265, https://doi.org/10.1002/1097-0282(2000)56:4<257::AID-BIP10029>3.0.CO;2-W
[16] Piana S et al. Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations. Current Opinion in Structural Biology 2014, 24 98-105, https://doi.org/10.1016/j.sbi.2013.12.006
[17] https://www.uni-muenster.de/Physik.TP/archive/fileadmin/lehre/TheorieAKkM/ws12/Schelte.pdf
[18] Huang J and MacKerell Jr. AD. CHARMM36 all-atom additive protein force field: Validation based on comparison to NMR data. Journal of Computational Chemistry 2013, doi: 10.1002/jcc.23354
[19] http://gensoft.pasteur.fr/docs/lammps/12Dec2018/Howto_tip3p.html
[20] http://www.bpc.uni-frankfurt.de/guentert/wiki/images/9/96/180618_TutorialMD.pdf
[21] http://www.mdtutorials.com/gmx/lysozyme/index.html
[22] Zhang L et al. Engineering of Laccase CueO for Improved Electron Transfer in Bioelectrocatalysis by Semi-Rational Design. Chemistry - A European Journal 2020, 26(22), doi: 10.1002/chem.201905598
[23] http://manual.gromacs.org/documentation/2019-rc1/onlinehelp/gmx-covar.html#gmx-covar
[24] de Groot B. Practical 5: Principal components analysis
[25] Huang J et al. Programmable and printable Bacillus subtilis biofilms as engineered living materials. Nature Chemical Biology 2018, doi: 10.1038/s41589-018-0169-2
[26] Ribeiro LF et al. Engineering bifunctional laccase-xylanase chimeras for improved catalytic performance. J Biol Chem. 2011, 286(50) 43026-38, doi: 10.1074/jbc.M111.253419
[27] Zhang J et al. Design and optimization of a linker for fusion protein construction. Progress in Natural Science 2009, 19(10) 1197-1200, https://doi.org/10.1016/j.pnsc.2008.12.007