EreB CM EreB MD Simulation Chapter 3 Chapter 4

EreB CM-Modeling

By carrying out structure prediction calculations using the Rosetta comparative modelling application RosettaCM we hope to create a precise 3D-model of the enzyme EreB. RosettaCM is based on homology modelling, comparing the protein structure to known crystal structures of proteins with a high sequence homology. Structure of unaligned sequences showing no or low homology to the given template structures are generated using the Rosetta ab initio protocol. Low homology is a consequence of mismatches or gaps in the structures’ alignments. This protocol uses a library of nine-mer and three-mer fragments of known protein structures to predict possible folding. ^[1]
EreB possesses high similarity in its active site with other esterases such as protein data bank (PDB) entry succinoglycan biosynthesis protein. This enhances the accuracy of the structure prediction method homology modelling since the similar domains can be used as a template for the structure. Nevertheless, the highest found structural homology of another PDB entry is 25,90% (Succinoglycan biosynthesis protein 2QGM, Chain A). Since the structure prediction relies on the statistical Monte Carlo method, multiple modelling runs are necessary to obtain a precise structure.

Figure 1: EreB crystal structure generated with *RosettaCM*

Proteins with high sequence homologies are found by blasting the EreB sequence against the PDB using the NCBI Blast application (blastp). 3 protein structures were identified to be sequentially related to EreB and were threaded onto the query sequence. Rosetta’s partial thread application assigns the templates’ structural data on the aligned sequences of the target structure to prepare the structure prediction run.

Succinoglycan biosynthesis protein (2QGM_A: X-ray diffraction, 1,70 Å + 2RAD_A: X-ray diffraction, 2,75 Å) by Bacillus cereus ATCC 14579 expressed in E. coli and Q81BN2_BACCR protein also from Bacillus cereus ATCC 14579 (3B55_A: X-ray diffraction, 2,30 Å) were used as templates.

The resulting threaded models are aligned in a single global frame and are then used to create a full chain model of the proteins 3D structure by Monte Carlo sampling. Monte Carlo sampling relies on random sampling of variants for solution of a problem that is deterministic, for example protein folding.

Fragment files were generated using the old Robetta fragment server. It outputs a three- and nine-mer file by aligning fragments to PDB entries. The fragment files are then used as input structures to enhance the model’s precision.

The structural changes are then scored using the Rosetta low resolution energy function. This approach relies on the fact that the desired proteins 3D structure is expected to be the minimum of its free energy function. Rosetta’s energy function takes hydrophobic interactions between non-polar residues, van der Waals interactions between buried atoms and the strong size dependence of forming a cavity in the solvent for accommodation of the folded protein in account. In this process atom-atom-interactions are calculated as Lennard-Jones potentials, hydrophobic interactions and the electrostatic desolvation of polar residues inside the molecule as implicit solvation models and an explicit hydrogen bonding potential for hydrogen bonds. ^[2] ^[3] The calculated energy is then assessed by the Metropolice acceptance criterion: If ΔE < 0 the structure is accepted, otherwise the newly proposed structure is accepted with probability p.

Structures are built using the Rosetta “fold tree”.^[2] Therefore, backbone and side chain conformation are displayed in a torsion space (Bonded interactions are mostly treated with ideal bond lengths and angles). Additionally, the position of each residue is displayed in a Cartesian space. Torsion angles are changed according to the fragment files or the provided templates and positions of homology fragments are substituted, combining an ab initio approach with a homology modelling approach. This combination of template derived fragments in Cartesian space with torsion angles and residue positions derived from fragments from the PDB database should ideally converge into the correctly folded protein topology. ^[3]

To solve clashes, distorted peptide bonds and poor backbone hydrogen bond geometry that often arises from this CM approach further improvements are necessary: In a second step the structure is improved by replacing backbone segments through Monte-Carlo-Method by either segments taken from the PDB that span the region and can roughly be superimposed on the selected residues or segments from the template structures that superimpose the complete segment. Afterwards the structure’s energy is minimized using a smoothed version of Rosetta’s low energy function. In a third step side chain residues are added and structure refinement is carried out using a physically realistic energy function.^[3]

Since Comparative modelling is a statistical approach for protein folding based on the Monte-Carlo-Method and comparison to related structures, a high number of structures has to be generated to ensure the predicted structure is as close as possible to the actual protein structures. In this context 20000 structures were generated using the “Lichtenberg high performance computer” of the TU Darmstadt. After the run finished the best structures were sorted by their total score calculated by RosettaCM and the best structure (S_17070.pdb) was used for further calculations. The total run was analysed using the Biotite Python package and its implemented superimpose and RMSD feature. ^[4]

Figure 1: 5 best scoring structures aligned in Pymol (S_17070: 4031.077, S_09664: 4033.266, S_00324: 4041.481, S_17103: 4069.085, S_12758: 4070.893)

RMSD values equals the Root-mean-square deviation of atomic positions to represent the structure similarity of two molecules. It is calculated using the following formula:

Figure 2: RMSD vs total score plot of the RosettaCM run. 20000 structures were generated and analyzed using the biotite python package.

For structure refinement an additional Relax run was carried out with an output of 100 structures to ensure realistic torsion angles. Relax is an all atom structure refinement application working in the structures local conformational space. ^[5] Regions of high energy in the proteins structure are optimized considering backbone and sidechain restraints to minimize structural derivation. Optimization follows usual methods as torsion-space sidechain minimization, torsion-space backbone minimization, and re-sampling of sidechain rotamers. ^[6]

To evaluate the obtained protein structure Ramachandran plots were created and can be compared to Ramachandran plots generated using a broad variety of protein’s crystal structures using the Procheck webserver. We checked whether the dihedral angles of the modelled secondary structure show the typical distribution to validate the model's accuracy.^[7] The evaluation showed that 328 residues are located in the most favoured regions, 44 in the additional allowed regions and 2 residues in the disallowed regions. Glycine and proline residues were excluded since they show no predictable dihedral angle distribution. 87,7% of the structures dihedral angles are located in the most favourable regions and 99,5% in total in allowed regions. Therefore, the structure is expected to be a good model of EreB’s crystal structure.

To summarize we used the RosettaCM application to predict the structure of our target enzyme EreB by creating 20000 structures on the Lichtenberg server cluster. We then relaxed the best scoring structure and validated it dihedral angles using a Ramachandran plot. For further investigations on enzyme stability MD simulation will be performed on the obtained structure.

EreB MD Simulation

Molecular dynamics simulation (MD) simulates the behaviour of a molecule in a small space that can be filled with solvent molecules (mostly H₂O). Therefore, it is suited to study the stability of our enzymes solved in aqueous environment. MD displays a far more dynamic and physical approach than comparative modelling.

Comparative modelling only creates a temporary, stationary image of a dynamic biomolecule without simulating the molecules behaviour within a force field even though dynamic motion of the protein is crucial for enzymatic activity. Molecular dynamics offers a physical approach over a certain simulated period of time for structure evaluation in contrast to statistical approaches like homology modelling neglecting dynamic physical forces between residues or interactions with solvent molecules. To validate the modelled structure of EreB and the fusion proteins of our enzymes with matrix protein TasA MD simulations were executed with the GROMACS (Groningen Machine for Chemical Simulations) software suit.

Although the structures obtained from our CM run were relaxed using the Rosetta Relax application, Monte Carlo based structure prediction tools do not always output fully relaxed protein structures and implicitly simulate the interactions with water molecules, Water molecules are a main part of the cause for highly important interactions responsible for the proteins structure such as hydrophobic interactions mostly between a protein’s inner residues or hydrophilic interactions of the proteins surface residue with water molecules. Also, interaction with water is one of the primary mechanisms for protein folding and consequently has to be considered for structure prediction or validation.

These interactions can be considered in MD simulation by providing a set of explicit water molecules interacting with the molecule and physical, time-dependant calculation of the systems’ forces using Newton’s equation of motion. To limit computing power only a small system containing one protein and enough water to negotiate interactions between different proteins is created and results are transferred to a realistic system. Periodic boundary conditions allow this transfer: The system is assumed to be consisting of multiple small systems that act identical, so when a molecule moves out of the simulated box it enters again on the other site, keeping the particle number constant and acting like a subsystem for simulation of a big system. By proving stability of our predicted (fusion)proteins we can also forecast their functionality: If the residues catalysing the ester cleavage of EreB form their already described catalytic centre and the active site is accessible for azithromycin the catalytic activity can most likely be assumed.^[8] Also, for the laccases catalytic activity can be assumed, if the copper sites are correctly folded to coordinate copper ions for oxidation catalysis and the substrate binding site equals the one of the obtained crystal structures.^[9]

The GROMACS software suit combines many tools for chemical and biochemical calculations. For MD simulation an external force field is integrated into the application. MD simulations contain methods calculating the forces exerted to all atoms of a biomolecular systems, mostly a target molecule solvated in water. Therefore, Newtonian equations of motions are used to predict the position and velocity of any atom in the system in small timesteps (femtoseconds). The forces are calculated using a force field, for example GROMOS or AMBER. “Force fields are sets of potential functions and parametrized interactions that can be used to study physical systems.” ^[10] They are derived from the equations of motion and therefore introduce time-dependency of the system. The force field consist of 3 types of interactions:
1. Bonded interactions between 2, 3 or 4 particles including harmonic, cubic and morse potentials for 2 particle systems, harmonic interactions for 3 particle systems
2. Nonbonded interactions between different molecules. The repulsion described by an exponential term, e. g. Lennard Jones potential, and a Coulomb term.
3. Special interactions defined by the position restraint of a given system, f. e. distance restraints obtained by Nuclear Overhauser Effect data from high resolution NMR.^[11]

Fig 1: Forcefield equation for Charmm27 forcefield. K_b, K_θ, K_UB, K_χ, and K_φ are the bond, valence angle, Urey–Bradley, dihedral angle, and im-proper dihedral angle force constants, respectively; b, θ, S, χ and δ are the bond length, bond angle, Urey–Bradley 1,3 distance, dihedral torsion angle and improper dihedral angle. ^[12]

Molecular dynamics simulation considers Newtonian forces on every atom in the simulated system and can therefore deliver, dependent of the force fields accuracy, precise results.^[13] Nevertheless, it is still an approximation and thus connected to imprecisions: The forces calculated are cut after a defined distance to limit required computing power. Also, the forcefield calculates forces on atomic levels without quantum mechanics taken into account. This adoption is based on the Born-Oppenheimer- MD approximation that splits an atom’s energy into core-energy and electron-energy since the electrons’ dynamics do not directly influence the atomic core. The core’s kinetic energy can then be taken into account by classical approximation of Newton’s law of movement. The quantum mechanical parts of this system, the electrons’ wave functions, are not considered for classical MD simulation. Thus, MD simulations are very accurate methods for most large systems but still based on approximations. Therefore, the obtained results are reliable but always have to be double checked by experiment in the laboratory.

Figure 2: EreB structure after relaxation and equilibration in a cubic box filled with water and Na⁺ counter ions. Catalytically relevant residues are highlighted on the right and show functional positioning.

The system used is a cubic box filled with explicit water molecules (TIP3 model) with the target enzyme centred in the box. Also, ions and counter ions are added to simulate realistic conditions and equilibrate the enzymes charge in solution due to (de)protonation. The system topology was created using the CHARMM27 forcefield with the TIP3P water model, specifying a 3-site rigid water molecule with charges and Lennard-Jones parameters assigned to each of the 3 atoms. ^[15]^[16] Afterwards a cuboid box with at least 1.2 nm distance of the borders to the protein is created and filled with water molecules and ions countering the proteins charge (system size: 8.000 5.425 5.921 (nm)). The system energy is then minimized. Afterwards NVT (constant number of particles, volume and temperature) and NPT (constant number of particles, pressure and temperature) equilibration is done using a position restraint file to keep the enzymes structure during equilibration of temperature and pressure of the system. Both NVT and NPT are carried out for 100 ps to ensure stabilization of the parameters. After the equilibration the main simulation can be started. The simulation was carried out for 100 ns with totally 50000000 steps (each 2 fs). ^[17]^[18]^[19]

References

1. Bonneu R et al. Rosetta in CASP4: Progress in ab initio protein structure prediction. Proteins 2001 doi: 10.1002/prot.1170 2. Das R. and Baker D. Macromolecular Modeling with Rosetta. Annual Review of Biochemistry 2008, 77:363-382, https://doi.org/10.1146/annurev.biochem.77.062906.171838 3. Song Y et al. High-Resolution Comparative Modeling with RosettaCM. Structure 2013, 21 1735-1742, https://doi.org/10.1016/j.str.2013.08.005 4. Kunzmann P. and Hamacher K. Biotite: a unifying open source computational biology framework in Python. BMC Bioinformatics 2018, 19 346, https://doi.org/10.1186/s12859-018-2367-z 5. https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/relax 6. Nivon LG et al. A Pareto-Optimal Refinement Method for Protein Design Scaffolds. PLOS ONE 2013, https://doi.org/10.1371/journal.pone.0059004 7. Mbah AN et al. Drug Target Exploitable Structural Features of Adenylyl Cyclase Activity in Schistosoma mansoni. Drug Target Insights 2012, 6 41-58, doi: 10.4137/DTI.S10219 8. Morar M. et al. Mechanism and Diversity of the Erythromycin Esterase Family of Enzymes. Biochemistry 2012, 51(8) 1740-51, doi: 10.1021/bi201790u 9. Edward I. Solomon, Anthony J. Augustine and Jungjoo Yoon O2 Reduction to H2O by the multicopper oxidases. Dalton Transport 2008, 30, doi: 10.1039/b800799c 10. http://manual.gromacs.org/documentation/2018/user-guide/terminology.html#gmx-force-field 11. Van Der Spoel D et al. GROMACS: Fast, flexible, and free. Journal of Computational Chemistry 2005, 26(16):1701-18. doi:10.1002/jcc.20291 12. MacKerell Jr. AD et al. Development and Current Status of the CHARMM Force Field for Nucleic Acids. Biopolymers 2001, 56(4) 257-265 https://doi.org/10.1002/1097-0282(2000)56:4<257::AID-BIP10029>3.0.CO;2-W. 13. Piana et al Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations. Current Opinion in Structural Biology 2014, 24 98-105. https://doi.org/10.1016/j.sbi.2013.12.006 14. https://www.uni-muenster.de/Physik.TP/archive/fileadmin/lehre/TheorieAKkM/ws12/Schelte.pdf 15. Huang J. and MacKerell Jr D. CHARMM36 all‐atom additive protein force field: Validation based on comparison to NMR data. Journal of computational Chemistry 2013, DOI: 10.1002/jcc.23354 16. http://gensoft.pasteur.fr/docs/lammps/12Dec2018/Howto_tip3p.html 17. http://www.bpc.uni-frankfurt.de/guentert/wiki/images/9/96/180618_TutorialMD.pdf 18. http://www.mdtutorials.com/gmx/lysozyme/index.html 19. Zhang L. et al Engineering of Laccase CueO for Improved Electron Transfer in Bioelectrocatalysis by Semi-Rational Design. Chemistry A European Journal 2020, 26(22), DOI: 10.1002/chem.201905598

Team:TU Darmstadt/Model/Enzyme Modeling

Enzyme Modeling

EreB CM-Modeling

EreB MD Simulation

References