Team:TU Darmstadt/Model/Enzyme Modeling

image/svg+xml - O O



Why Rosetta?

Rosetta is a software suite capable of solving a multitude of computational macromolecular problems such as de novo protein design, enzyme design, ligand docking and structure prediction of biological macromolecules or macromolecular complexes. The Rosetta energy functions enables relatively precise solution of a broad range of applications by considering many different energy terms relevant for protein folding such as solvation, electrostatic effects and hydrogen bonding. It can be used to perform simulations starting from designing macromolecular structures, interactions and RNA or fibril structures up to the de novo design of a fully functioning enzyme! It was originally developed by the David Baker Lab[1]. Rosetta is free for academic users and a very powerful tool for multiple problems that come along with elaborating a synthetic biology problem. Although there are lot of information available in the Rosetta Documentation, it is very hard for people that want to get started with Rosetta to improve their project, especially for people without experience with console-based applications. Nevertheless, Rosetta displays an amazing and multifaceted tool for synthetic biology. To counter the starting issues with the program we provide a guide for Rosetta on our Wiki.

We used multiple Rosetta applications to model the properties of our enzymes. The collected data allows us to predict enzyme functionality when immobilized in our biofilm with TasA, binding affinity towards different pharmaceuticals or pollutants and many more.
RosettaCM was used to generate structure predictions of the azithromycin transforming enzyme EreB and fusion proteins consisting of matrix protein TasA and our enzymes CotA and EreB[2].
Rosetta Ligand Docking was used to study the binding affinity of various substrates towards the enzymes active site[3].

To find out more about the different aspects of the modelling using Rosetta you can read the articles thematizing these applications. If you want to use Rosetta on your own you can use our Rosetta Guide to get started with the program.

Laccase Ligand-Docking

What Did We Want to Achieve?

As already mentioned in our Project Overview and Pharmaceutical degradation chapter, we want to degrade problematic pharmaceutical substances like diclofenac and thus reduce the toxicity of waste water. One of our special approaches was to use bacterial laccases instead of the more common used fungal ones[5]. It has already been shown that fungal laccases like from Trametes versicolor are able to degrade diclofenac[6]. However, the degradation respectively transformation of diclofenac has not yet been shown for bacterial laccases[7]. This means that it would be necessary to modify and optimize these bacterial enzymes to degrade diclofenac[8]. This is where our docking project comes in. We wanted to use computational methods to study the binding properties of the substances to be degraded. For this purpose we wanted to identify the exact interactions between bacterial enzyme and substance and compare them with the laccase from T. versicolor[7]. On the basis of this data the binding properties of our bacterial enzymes should be optimized and a foundation for targeted experiments in the laboratory should be laid.

Why Ligand-Docking With Rosetta?

As already mentioned, Rosetta is a software suite for macromolecular modeling. It was initially developed to predict protein folding and has since been greatly expanded to include dozens of other options. Thus, Rosetta has also been used to predict protein structures, perform protein-protein and protein-ligand docking, design novel proteins and redesign existing ones just to name a few. One of those options, the small-molecule ligand-docking applications, attempt to predict the protein or small-molecule binding free energy, as well as critical binding interactions. These predictions can provide structural information of a ligand-binding site and can guide the enzyme engineering process for optimal adaptation of the active site to the corresponding ligand, in our case for environmentally problematic pharmaceuticals like diclofenac[9].
We used the ligand docking implementation of Rosetta to determine how our ligand interacts with the active site of the corresponding laccase. This means we used the laccase CotA from Bacillus subtilis for our modelling experiments to determine the crucial interactions between the active site and the ligand in its energetically most favorable binding pose[10,11].

Our Approach

The basic concept of our approach was to use the docking protocol with a Monte Carlo algorithm. This algorithm moves and rotates the ligand into different conformations to find a possible lowest energy position. This results in a pool of differently well-fitting ligand conformations within the active site. We selected these lowest energy conformations and determined based on these the most important residues for the interaction as well as residues which only have a small or no contribution to the interaction. This is an important information, because residues which already have a positive effect on the ligand binding should not be changed. While the residues, which have no or even a negative contribution to the interaction, must be changed by the introduction of site-directed/targeted mutations. These side chains, which are available for mutations, form the basis for further enzyme engineering to optimize binding interactions. Hereinafter the model of the mutated laccase is used to create another pool of conformations. This continues for several iterations. The mutating process allows for diversity in the pool of models to be maintained, improving the protocols ability to find low energy structures and to optimize the active site for our purpose[9,10,11]. With this iterative process, it was possible to acquire an optimized model of the active site within CotA for the specific substrate of interest. In our project this was mostly diclofenac. We wanted to achieve this to provide a good starting point for our wet lab experiments and to further prove whether the generated modelling data fits the experimental data. Using this data, we could have quickly generated optimized CotA and CueO species through targeted experiments in the lab, that might have been able to catalyze a decent enzymatic turnover of diclofenac. We launched our modeling experiments with focus on the laccase CotA.

Application and Methodology

A schematic process of the Rosetta protocol we used for the ligand docking is shown in Figure 1. The protocol was performed as follows. After preparing the protein and ligand structures, the placement of the ligand into the binding pocket is optimized, followed by sidechain optimization. This process is repeated iteratively, and the proposed designs are sorted and filtered by affinity and hydrogen bonding. Parts of this procedure are explained in more detail below[10,12].
figure
Figure 1: Flowchart of the Rosetta ligand docking protocol. From the combined input coordinates of the protein and ligand, the position of the ligand is optimized. Next, residues in the protein/ligand interface are optimized for both identity and position. After several cycles of small molecule perturbation, sidechain rotamer sampling, Monte Carlo minimization (MCM) and a final gradient-based minimization of the protein to resolve any clashes (“high resolution redocking”), the final model is the output. Further optimization can occur by using the final models of one round of design as the input models of the next round. Illustration similar to R. Moretti et al.[10].

At first, we needed a crystal structure of our laccase CotA. The protein structure is fully identified and high-resolution crystal structures are available at the Protein Data Base (PDB). Therefore, it was not necessary to generate a homology model of the laccase and we could start directly with the preparation of the protein structures. We used the PDB structure 1GSK for CotA (figure 2). This structure weas prepared by the Rosetta command protein_prep. Through this the PDB file was stripped of any information other than the desired protein coordinates.


Figure 2: Ribbon structure of CotA (1GSK). Catalytic copper atoms and the residues involved in the catalytic process are shown as sticks and colored by heteroatom.

Conformer File

The next step was the preparation of the ligand molecules, the following example uses diclofenac. The 3D structure was downloaded from the Pubchem database or generated with the software tool Chemdraw[13,14]. This generated diclofenac.sdf file only contains one conformation of the ligand. To get a more accurate simulation, a conformer file was necessary to provide Rosetta with various possible conformations which diclofenac can take. Having various conformations simulates a flexible molecule and leads to more accurate models. For this reason, a library of conformations for diclofenac was generated outside of Rosetta. We used the free accessible tool BioChemicalLibrary (BCL) from the Meiler lab[15]. This tool created the conformation library and added hydrogen atoms to the now called diclofenac_conformers.sdf file.

Params File

The last file which needed to be prepared is the params file, a Rosetta parameter file that provides the necessary parameters to Rosetta on how to treat the ligand diclofenac. A params file was necessary for the ligand docking because the Rosetta force field had no parameters for custom small molecules. This necessary file for the docking process with the ligand diclofenac includes various information about the molecules chemical and geometric properties. Those are needed to determine its environment of interaction with other amino acid residues. The creation of the params file was done by the molfile_to_params.py script provided by Rosetta. As input the diclofenac_conformer and the cleaned protein file were used.
After that we manually positioned the ligand within the binding pocket by a visualization software like Pymol or Chimera. For the laccases CotA the binding pocket is located at the residues H491 and M502 in close proximity to the catalytic copper ions (figure 3). Placing the ligand near to the location of the suspected binding pocket helped Rosetta narrow its search region. In addition, we defined a starting point for the ligand with the StartFrom mover command for the exact same reason.

figure Figure 3: Ribbon structure of CotA (1GSK) with diclofenac. Manually positioned diclofenac in close proximity to the active site. Amino acid residues of CotA near diclofenac are shown in sticks and colored by heteroatoms.

Ligand-Docking XML

The dock.xml file tells Rosetta the type of sampling and scoring to do. It defines the scoring function and provides parameters for both low-resolution coarse sampling and high-resolution Monte Carlo sampling[9]. The xml script we used to run the ligand docking was oriented on the tutorials from Rosettacommons and Meiler Lab[9,10]. The complete script of the xml file is documented in the How to Rosetta Guide . The description of some main functions in their respective order are listed in table 1[9,12].

Table 1: Overview of the used Rosetta xml functions. For more details see RosettaCommons.
Function Explanation
ScoreFunction <SCOREFXNS> Defines functions that will be used for the socring process in filter and movers.
Ligand Areas <LIGAND_AREAS> describes parameters specific to each ligand.
Interface Builders <INTERFACE_BUILDERS> Describes how to choose residues that will be part of a protein-ligand interface.
Movemap Builders <MOVEMAP_BUILDERS> Constucts a movemap, a 2xN table of true/false values, where N is the number of residues your protein/ligand complex. The movemap builder
combines previously constructed backbone and side-chain interfaces. Further definitions are made via the
optione “cutoff; all_atom_mode; add_nbr_radius”
Scoring Grids <SCORINGGRIDS> Defines the grid size in which the scoring function is considered
Movers <MOVERS> defines virtual rooms/boxes in which the residues and ligands involved in the docking process are allowed to move
Protocols <PROTOCOLS> Defines the output

Results

At first we did the ligand docking of diclofenac with the laccase CotA. Preparations of the ligand and protein were done as explained above. For realization of the docking process we needed a working Rosetta xml script for our purpose. The script we worked out is shown in our Rosetta Guide. With this script, adjusted over many trials, we docked many thousand structures. As output we chose the Rosetta total score and the Rosetta interface score. We plotted these scores against the root-mean-square deviation of atomic positions (RMSD) from the ligand to allow a visible evaluation of the docked structures respectively poses[9]. The formula for calculation of the RMSD is shown below (formula 1).
$$ \mathrm {Formula \: 1:}\: \mathrm {RMSD(v,w)} = {\sqrt{{\frac{1}{N}\sum_{i=1}^{N}}\|\vec{v}_i-\vec{w}_i\|^2}} $$
An interface score against RMSD plot from Diclofenac with CotA is shown in figure 4. Based on these plots it was easier to identify low energy structures with good interactions with the enzyme. Ideally the data points of these plots should form a funnel like plot. The structures on the closing end are especially good structures. These structures are highlighted with by a red square in figure 4.
figure
Figure 4: Interface score RMSD plot of diclofenac docking with CotA. RMSD values of diclofenac in nm plotted against the Rosetta interface score. Every datapoint represents a single docked structure. The structures highlighted with a red box are especially good structures.

One of these diclofenac structures in CotA is shown in Figure 5. Interestingly, diclofenac was positioned here in a binding channel like structure and not directly near the active copper atom within the active site. This binding channel structure is flanked by two flexible loops of CotA. These loops could undergo a conformational change due to the binding of the substrate, which could hold the substrate in place or even bring it closer to the active site. The most dominant interaction in this model were with the residue’s threonine262 and threonine418. The threonine residues may interact with the secondary nitrogen and the carboxy group of diclofenac.

CotA Ribbon
A generic square placeholder image with rounded corners in a figure.
Figure 5: Diclofenac docking with CotA. One of the best low energy interface score structures of CotA with diclofenac. Residues of the active site and in diclofenac binding involved residues are shown with sticks and colored heteroatoms. Diclofenac binds into a cavity and interacts with T262 and T418.

For comparison, the same docking procedure was performed with diclofenac and the laccase of T. versicolor. A structure with top scoring was also selected (Fig. 6). In this model diclofenac binds closer to the active site of the laccase of T. versicolor. The substance also bound within a cavity of the enzyme and may interact by π-π-stacking with phenylalanine 163. Other interaction with the proline 163 and the carboxy group and aspartate 206 with the secondary nitrogen of diclofenac. The docking with the laccase from T. versicolor showed more possible interactions and a better positioning of diclofenac near the active site. This assumption was supported by the interface and total score values of both docking experiments (table 2).

T Versicolor Ribbon
T Versicolor Surface
Figure 6: Diclofenac docking with the laccase from T. versicolor. One of the best low energy interface score structures of T. versicolor laccase and diclofenac. Residues of the active site and in diclofenac binding involved residues are shown as sticks and with colored heteroatoms. Diclofenac binds into a cavity and interacts with P163, F265, D206.


The following table 2 also shows an overview of the different scoring values of the top scored structures from different substrates like 4-Nonylphenol, Ibuprofen and Carbamazepin with CotA. The scoring values of all these substrates were relatively similar. Only the scoring values for the laccase from T. versicolor were slightly better. But taking into account the positioning of the substance T. ersicolor is the superior laccase.

Table 2: Overview of the best interface and total score values from the docking experiments with CotA und T. versicolor.

Enzyme
   
Substance   
   
Best Interface Score   
   
Best Total Score   

CotA
   
4-Nonylphenol   
   
-11.014   
   
-1232.434   
   
Ibuprofen   
   
-12.170   
   
-1232.533   
   
Carbamazepin   
   
-12.633   
   
-1230.243   
   
Diclofenac   
   
-12.820   
   
-1230.488   

T. Versicolor
   
Diclofenac   
   
-13.266   
   
-1528.821   

It is also confirmed in our docking experiments, that T. versicolor binds diclofenac more optimal and thus may naturally degrade this substrate better. For a reliable confirmation of our results, we could further perform a Quantum Mechanical (QM) docking procedure[18]. But possible starting points for modifications could be shown. Consequently, we could start modifying our bacterial laccase by generating mutants with the aspartate and phenylalanine residues in similar loop region to the laccase of T. versicolor. This procedure could possibly improve the binding affinity of diclofenac to CotA und therefore improve the ability of degrading this toxic substance by enzymatic oxidation. The many advantages of bacterial enzymes outweigh the use of fungal enzymes. For example, bacterial enzymes are a lot easier to produce in a greater scale, do not have complex glycosylation patterns and are prone to modifications. For this reason, it is worth pursuing these approaches.


EreB Comparative Modeling

1. Implementation

By carrying out structure prediction calculations using the Rosetta comparative modelling (CM) application RosettaCM we hope to create a precise 3D-model of the enzyme EreB. RosettaCM is based on homology modelling, comparing the protein structure to known crystal structures of proteins with a high sequence homology. Structure of unaligned sequences showing no or low homology to the given template structures are generated using the Rosetta ab initio protocol. Low homology is a consequence of mismatches or gaps in the structures’ alignments. This protocol uses a library of nine-mer and three-mer fragments of known protein structures to predict possible folding[19].
EreB possesses high similarity in its active site with other esterases such as protein data bank (PDB) entry succinoglycan biosynthesis protein. This enhances the accuracy of the structure prediction method homology modelling since the similar domains can be used as a template for the structure. Nevertheless, the highest found structural homology of another PDB entry is 25,90% (Succinoglycan biosynthesis protein 2QGM, Chain A). Since the structure prediction relies on the statistical Monte Carlo method, multiple modelling runs are necessary to obtain a precise structure.
figure Figure 7: EreB crystal structure generated with RosettaCM
Proteins with high sequence homologies are found by blasting the EreB sequence against the PDB using the NCBI Blast application (blastp). 3 protein structures were identified to be sequentially related to EreB and were threaded onto the query sequence. Rosetta’s partial thread application assigns the templates’ structural data on the aligned sequences of the target structure to prepare the structure prediction run.
Succinoglycan biosynthesis protein (2QGM_A: X-ray diffraction, 1,70 Å + 2RAD_A: X-ray diffraction, 2,75 Å) by Bacillus cereus ATCC 14579 expressed in E. coli and Q81BN2_BACCR protein also from Bacillus cereus ATCC 14579 (3B55_A: X-ray diffraction, 2,30 Å) were used as templates.
The resulting threaded models are aligned in a single global frame and are then used to create a full chain model of the proteins 3D structure by Monte Carlo sampling. Monte Carlo sampling relies on random sampling of variants for solution of a problem that is deterministic, for example protein folding.
Fragment files were generated using the old Robetta fragment server. It outputs a three- and nine-mer file by aligning fragments to PDB entries. The fragment files are then used as input structures to enhance the model’s precision.
The structural changes are then scored using the Rosetta low resolution energy function. This approach relies on the fact that the desired proteins 3D structure is expected to be the minimum of its free energy function. Rosetta’s energy function takes hydrophobic interactions between non-polar residues, van der Waals interactions between buried atoms and the strong size dependence of forming a cavity in the solvent for accommodation of the folded protein in account. In this process atom-atom-interactions are calculated as Lennard-Jones potentials, hydrophobic interactions and the electrostatic desolvation of polar residues inside the molecule as implicit solvation models and an explicit hydrogen bonding potential for hydrogen bonds[2 ,20]. The calculated energy is then assessed by the Metropolis acceptance criterion: If ΔE < 0 the structure is accepted, otherwise the newly proposed structure is accepted with probability p.
figure
Formula 2: Metropolis acceptance criterion for structural changes

2. Rosetta algorithm

Structures are built using the Rosetta “fold tree”[20]. Therefore, backbone and side chain conformation are displayed in a torsion space (Bonded interactions are mostly treated with ideal bond lengths and angles). Additionally, the position of each residue is displayed in a Cartesian space. Torsion angles are changed according to the fragment files or the provided templates and positions of homology fragments are substituted, combining an ab initio approach with a homology modelling approach. This combination of template derived fragments in Cartesian space with torsion angles and residue positions derived from fragments from the PDB database should ideally converge into the correctly folded protein topology[2].
To solve clashes, distorted peptide bonds and poor backbone hydrogen bond geometry that often arises from this CM approach further improvements are necessary: In a second step the structure is improved by replacing backbone segments through Monte-Carlo-Method by either segments taken from the PDB that span the region and can roughly be superimposed on the selected residues or segments from the template structures that superimpose the complete segment. Afterwards the structure’s energy is minimized using a smoothed version of Rosetta’s low energy function. In a third step side chain residues are added and structure refinement is carried out using a physically realistic energy function[2].

3. Results and discussion

Since Comparative modelling is a statistical approach for protein folding based on the Monte-Carlo-Method and comparison to related structures, a high number of structures has to be generated to ensure the predicted structure is as close as possible to the actual protein structures. In this context 20000 structures were generated using the “Lichtenberg high performance computer” of the TU Darmstadt. After the run finished the best structures were sorted by their total score calculated by RosettaCM and the best structure (S_17070.pdb) was used for further calculations. The total run was analysed using the Biotite Python package and its implemented superimpose and RMSD feature[21].

Figure 8: 5 best scoring structures aligned in Pymol (S_17070: 4031.077, S_09664: 4033.266, S_00324: 4041.481, S_17103: 4069.085, S_12758: 4070.893)
RMSD values equals the Root-mean-square deviation of atomic positions to represent the structure similarity of two molecules. It is calculated using the following formula:
figure figure
Figure 9: RMSD vs total score plot of the RosettaCM run. 20000 structures were generated and analyzed using the biotite python package.
For structure refinement an additional Relax run was carried out with an output of 100 structures to ensure realistic torsion angles. Relax is an all atom structure refinement application working in the structures local conformational space[22]. Regions of high energy in the proteins structure are optimized considering backbone and sidechain restraints to minimize structural derivation. Optimization follows usual methods as torsion-space sidechain minimization, torsion-space backbone minimization, and re-sampling of sidechain rotamers[23].
To evaluate the obtained protein structure Ramachandran plots were created and can be compared to Ramachandran plots generated using a broad variety of protein’s crystal structures using the Procheck webserver. We checked whether the dihedral angles of the modelled secondary structure show the typical distribution to validate the model's accuracy.[24] The evaluation showed that 328 residues are located in the most favoured regions, 44 in the additional allowed regions and 2 residues in the disallowed regions. Glycine and proline residues were excluded since they show no predictable dihedral angle distribution. 87,7% of the structures dihedral angles are located in the most favourable regions and 99,5% in total in allowed regions. Therefore, the structure is expected to be a good model of EreB’s crystal structure.
figure
figure

4. Conclusion

To summarize we used the RosettaCM application to predict the structure of our target enzyme EreB by creating 20000 structures on the Lichtenberg server cluster. We then relaxed the best scoring structure and validated its dihedral angles using a Ramachandran plot. For further investigations on enzyme stability MD simulation will be performed with the obtained structure.

EreB MD Simulation

1. Implementation

Molecular dynamics simulation (MD) simulates the behaviour of a molecule in a small space that can be filled with solvent molecules (mostly H2O). Therefore, it is suited to study the stability of our enzymes solved in aqueous environment. MD displays a far more dynamic and physical approach than comparative modelling.
Comparative modelling only creates a temporary, stationary image of a dynamic biomolecule without simulating the molecules behaviour within a force field even though dynamic motion of the protein is crucial for enzymatic activity. Molecular dynamics offers a physical approach over a certain simulated period of time for structure evaluation in contrast to statistical approaches like homology modelling neglecting dynamic physical forces between residues or interactions with solvent molecules. To validate the modelled structure of EreB and the fusion proteins of our enzymes with matrix protein TasA MD simulations were executed with the GROMACS (Groningen Machine for Chemical Simulations) software suit.
Although the structures obtained from our CM run were relaxed using the Rosetta Relax application, Monte Carlo based structure prediction tools do not always output fully relaxed protein structures and implicitly simulate the interactions with water molecules, Water molecules are a main part of the cause for highly important interactions responsible for the proteins structure such as hydrophobic interactions mostly between a protein’s inner residues or hydrophilic interactions of the proteins surface residues with water molecules. Also, interaction with water is one of the primary mechanisms for protein folding and consequently has to be considered for structure prediction or validation.
These interactions can be considered in MD simulation by providing a set of explicit water molecules interacting with the molecule and physical, time-dependant calculation of the systems’ forces using Newton’s equation of motion. To limit computing power only a small system containing one protein and enough water to negotiate interactions between different proteins is created and results are transferred to a realistic system. Periodic boundary conditions allow this transfer: The system is assumed to be consisting of multiple small systems that act identical, so when a molecule moves out of the simulated box it enters again on the other site, keeping the particle number constant and acting like a subsystem for simulation of a big system. By proving stability of our predicted (fusion)proteins we can also forecast their functionality: If the residues catalysing the ester cleavage of EreB form their already described catalytic centre and the active site is accessible for azithromycin the catalytic activity can most likely be assumed[25]. Also, for the laccases catalytic activity can be assumed, if the copper sites are correctly folded to coordinate copper ions for oxidation catalysis and the substrate binding site equals the one of the obtained crystal structures[26].
figure
Formula 3: Newton's second law of motion

2. Force fields

The GROMACS software suit combines many tools for chemical and biochemical calculations. For MD simulation an external force field is integrated into the application. MD simulations contain methods calculating the forces exerted to all atoms of a biomolecular systems, mostly a target molecule solvated in water. Therefore, Newtonian equations of motions are used to predict the position and velocity of any atom in the system in small timesteps (femtoseconds). The forces are calculated using a force field, for example GROMOS or AMBER. “Force fields are sets of potential functions and parametrized interactions that can be used to study physical systems[27].” They are derived from the equations of motion and therefore introduce time-dependency of the system. The force field consist of 3 types of interactions:
1. Bonded interactions between 2, 3 or 4 particles including harmonic, cubic and morse potentials for 2 particle systems, harmonic interactions for 3 particle systems
2. Nonbonded interactions between different molecules. The repulsion described by an exponential term, e. g. Lennard Jones potential, and a Coulomb term.
3. Special interactions defined by the position restraint of a given system, f. e. distance restraints obtained by Nuclear Overhauser Effect data from high resolution NMR[28].
figure
Formula 4: Forcefield equation for Charmm27 forcefield. Kb, Kθ, KUB, Kχ, and Kφ are the bond, valence angle, Urey–Bradley, dihedral angle, and im-proper dihedral angle force constants, respectively; b, θ, S, χ and δ are the bond length, bond angle, Urey–Bradley 1,3 distance, dihedral torsion angle and improper dihedral angle[29].

3. MD: theoretical aspects

Molecular dynamics simulation considers Newtonian forces on every atom in the simulated system and can therefore deliver, dependent of the force fields accuracy, precise results.[30] Nevertheless, it is still an approximation and thus connected to imprecisions: The forces calculated are cut after a defined distance to limit required computing power. Also, the forcefield calculates forces on atomic levels without quantum mechanics taken into account. This adoption is based on the Born-Oppenheimer MD approximation that splits an atom’s energy into core-energy and electron-energy since the electrons’ dynamics do not directly influence the atomic core. The core’s kinetic energy can then be taken into account by classical approximation of Newton’s law of movement. The quantum mechanical parts of this system, the electrons’ wave functions, are not considered for classical MD simulation. [31] Thus, MD simulations are very accurate methods for most large systems but still based on approximations. Therefore, the obtained results are reliable but always have to be double checked by experiment in the laboratory.
Responsive image
Figure 10: EreB structure after relaxation and equilibration in a cubic box filled with water and Na+ counter ions. Catalytically relevant residues are highlighted on the right and show functional positioning.

4. Results and discussion

The system used is a cubic box filled with explicit water molecules (TIP3 model) with the target enzyme centred in the box. Also, ions and counter ions are added to simulate realistic conditions and equilibrate the enzymes charge in solution due to (de)protonation. The system topology was created using the CHARMM27 forcefield with the TIP3P water model, specifying a 3-site rigid water molecule with charges and Lennard-Jones parameters assigned to each of the 3 atoms [32,33]. Afterwards a cuboid box with at least 1.2 nm distance of the borders to the protein is created and filled with water molecules and ions countering the proteins charge (system size: 8.000 5.425 5.921 (nm)). The system's energy is then minimized. Afterwards NVT (constant number of particles, volume and temperature) and NPT (constant number of particles, pressure and temperature) equilibration is done using a position restraint file to keep the enzymes structure during equilibration of temperature and pressure of the system. Both NVT and NPT are carried out for 100 ps to ensure stabilization of the parameters. After the equilibration the main simulation can be started. The simulation was carried out for 100 ns with totally 50000000 steps (each 2 fs)[34,35,36].

RMSD, small root-mean-square fluctuation (RMSF) and radii of gyration can be used to analyse and validate a MD simulation. These values give an important first insight into the structural stability of the structure obtained from comparative modelling. Converging RMSD and radii of gyration can therefore be used as primary indicators for stable protein structures. RMSF displays the average derivation of particles over a time from its original position and therefore shows which regions of the protein show the highest dynamic and which domains are structurally preserved. Radius of gyration displays the root-mean-square distance from each atom of a protein to its centroid and can be used to analyse which regions of a protein are denatured or which regions show a high amount of secondary structure motifs. RMSD was calculated compared to both relaxed and crystal structure, RMSF was calculated for C alpha backbone atoms and gyration radius for the whole protein. The specific value of convergence depends to the size of the protein that is subject to MD simulation. The RMSD and gyration angle plots were analysed after the 100 ns simulation and show a clear trend of convergence. Also, the RMSF plot shows really small movement at the residues essential for the catalytic process of EreB E43, H46, R55, R74, H285 and H288[25].

For further analysis principle component analysis (PCA) was performed on the simulation logs. PCA allows us to filter global collective movement from local, fast movement to further visualize and study the dynamics of a protein. GROMACS covar tool is used to calculate a covariance matrix of the proteins atomic fluctuation that can be diagonalized to create a set of eigenvalues and eigenvectors describing the proteins modes of fluctuation. The covariation matrix of a protein describes the covariance meaning the dependency of each fluctuation to the other movements. Hereby, the eigenvectors represent the largest-amplitude correlated motions and are called principal components or essential modes[37]. The GROMACS anaeig tool can be used to visualize these principal components by projecting the proteins trajectory on the given eigenvectors (here 1-4)[38]. As shown in the animation the molecule shows strong internal movement, but the catalytically important residues and the azithromycin binding pocket stay structurally preserved, suggesting activity of the enzyme.
1st PM: dark blue, 2nd PM: light blue 3rd PM: purple, 4th PM: light blue
Responsive image Responsive image
Figure 12: Visualization of the first four essential modes' extremes. The covariance matrix was analysed using the GROMACS anaig tool and compared to the CM derived structure candidate S_17070.pdb. To limit computational dependendencys and because they allow good conclusions to the protein structures only backbone atoms were considered. The first two principal modes (PMs) containing the stongest fluctuation are shown on the left. On the right the 3rd and 4th PMs are displayed.

5. Conclusion

In conclusion we used a MD simulation run of 100 ns to validate the structure determined by homology modelling in a physical time-dependant forcefield. The simulation was done in a cubic box with at least 1.2 nm distance of the centred protein to the corners of the cube. The system was minimized and equilibrated by NVT and NPT simulations for both 100 ps. We analysed the systems temperature and pressure afterwards, which showed small fluctuation about the expected values. We were able to start the MD production run for 100 ns. Analysis of the production run showed converging RMSD and radii of gyration values as well as small RMSF values on the active site residues. Consequently, we validated the obtained structure as a possible crystal structure of erythromycin esterase type II EreB.

TasA fusion proteins

1. Implementation and comparative modelling

Our selected enzymes for pharmaceutical transformation are supposed to be embedded into the extracellular polymeric matrix of our B. subtilis biofilm by fusing the enzymes to the matrix protein TasA (PDB entry 5OF2). Therefore, we are following the method described in Programmable and printable Bacillus subtilis biofilms as engineered living materials by Huang et al. (2018), that fused it exemplary to various proteins or protein domains like mCherry or MHETase to introduce new functions to the biofilm [39]. The exact methods for fusion protein construction is documented in our corresponding wiki text in the biofilm category.
We also planned a fusion protein by integration of TasA into a surface loop of CotA as described in Engineering Bifunctional Laccase-Xylanase Chimeras for Improved Catalytic Performance by Ribeiro et al (2011)[40]. Therefore, we had to move the signal peptide to the N-terminus of CotA. CM indicates that the signal peptide is still accessible for cell export and both protein domains are correctly folded. Nevertheless, we discarded this approach, because end-to-end linkage to TasA also promises functioning enzymes and is far more modular. For integration of TasA into the enzymes sequence we always had to find surface loops that aren’t sufficient for the enzymes’ functionality by allowing correct protein folding, including residues essential for substrate binding or active site residues. This would contradict with our idea of a modular biofilm, to which new enzymes could easily be added.
figure
Figure 13: TasA CotA fusion protein. TasA was inserted in a surface loop as described by Ribeiro et al. The structure was aligned to CotA and TasA crystal structure taken from the PDB. (signal peptide: darkblue, fusion protein CotA domain: deeppurple, fusion TasA domain: deepsalomon, CotA: turqoise, TasA: grey)
First the fusion proteins structure was predicted using the RosettaCM application. Comparative modelling is a homology modelling approach that uses known structures with high sequence homologies to determine the enzymes 3D structure combined with an Ab-Initio approach for regions that cannot be aligned. Changes in secondary structure are randomly introduced and validated using the Rosetta Energy Function to approach the proteins lowest free energy state. Comparative modelling is an excellent method for fusion proteins, because both structures of the enzymes (PDB: CueO: 5B7E, CotA: 1GSK, EreB and TasA (PDB: 5OF2) were already determined and show 100% complementary to the corresponding protein domains[41]. The domains structures were aligned and threaded onto the fusion protein sequence using the Rosetta partial thread application. Fragment files were generated using the old Robetta fragment server. It outputs a three- and nine-mer file by aligning fragments to PDB entries. The fragment files are then used as input structures to enhance the Ab-Initio model’s precision. Detailed information on the algorithm of RosettaCM can be found in the modelling section for EreB.

figure
Figure 14: EreB TasA fusion protein. TasA was fused N-terminally to EreB with a short linker peptide as described by Huang et al. Structure was obtained by homology modelling using RosettaCM.

2. MD simulation

To investigate functionality of the protein domains and structure stability of the fusion proteins MD simulations were carried out using GROMACS and the Charmm27 forcefield with explicit TIP3P water. The forcefield calculates the forces working on all atoms in the system considering interactions such as bonded, nonbonded and special interactions. MD is a useful tool to validate the proteins folding in aqueous environment and study the enzyme’s movements in a time-dependent physical forcefield of Newtonian equations of motion. The structure’s energy is minimized in a first step. Afterwards the system gets equilibrated in a NVT and NPT simulation step. All of these steps are carried out using a system restraint file to maintain the target’s structure during the preparation steps. After equilibration is finished the main simulation for 100 ns can be started. If the MD simulations output conserved tertiary structures of the protein domains we assumed functionality of the immobilization induced by TasA matrix protein. Also, preserved structure indicates successful azithromycin or diclofenac etc. transformation by our enzymes (CotA, CueO and EreB). All steps were carried out equally to the previously described MD simulation of our EreB crystal structure candidate.

figure Figure 15: A: RMSD vs SCORE plot of the fusion protein. RMSD was calculated using the python package Biotite and total score values outputted by Rosetta were used as score. B: Ramachandran plot of the diheadreal angles of the best scoring candidate. C: Ramachandran plot of the top 5 highest scoring structures.
Root-mean-square deviation (RMSD), small root-mean-square fluctuation (RMSF) and radii of gyration (Rg) were plotted to analyse the simulation and validate the structure’s stability. Converging RMSD and Rg are primary indicators for stable protein structures. Analysis of the EreB-TasA fusion protein shows fluctuation of both values around a certain value showing a clear trend of convergence. RMSF analysis displays low derivation of the internal residues, especially those relevant for catalytic activity suggesting stable protein domains and could thus signify preserved catalytic activity and enzyme immobilization, respectively. It was already shown that the active side residues of unfused EreB protein show low atomic fluctuation in the EreB MD simulation. Also low Rg values are an indicator for preserved secondary structure, so we can assume that the protein did not denaturate during the simulation.

For further analysis principal component analysis (PCA) was performed to analyse internal movement of the protein. By PCA we are able to filter global collective movement from local movement to study the enzymes dynamics. The MD simulation run’s covariance matrix was generated and diagonalized using the GROMACS covar tool. The resulting eigenvectors were visualized using the GROMACS anaeig tool and are here presented as 3D visualization of the first eigenvectors showing strongest protein fluctuation. As visible in the simulation the protein shows internal movements of outer residues but functional internal domains stay structurally preserved as already visible in the RMSF graph. Especially the azithromycin binding pocket and residues E43, H46, R55, R74, H285 and H288 of EreB enzyme domain (highlighted in the simulation) important for catalytic activity remain stable and show weak movement. The most relevant principal mode (PM) shows increasing distance of the EreB and TasA protein domain during the simulation. This way accessibility of both domains increases and the function of the linker peptide as dynamic connection of both domains can be assumed. In summary the PMs describing internal movements show reinforces our method of immobilizing the transforming enzymes with extrapolymeric matrix protein TasA and suggests our assumption of correct protein folding after fusion based on the publication of Huang et al. (2018)[39].

Responsive image Responsive image
Responsive image Responsive image

Figure 17: Visualization of the first four essential modes' extremes. The covariance matrix was analysed using the GROMACS anaig tool and compared to the CM derived structure candidate S_0484.pdb. Only backbone atoms were considered. TasA domain is coloured in light blue, the linker peptide in red and the EreB domain in blue. Top left: Fist (most relevant) PM Top right: second PM containing Bottom left: third PM Bottom right fourth PM

3. Conclusion

We used an MD simulation run of 100 ns to validate the structure determined by Rosetta comparative modelling in a physical time-dependant forcefield. A cubic box with at least 1.2 nm distance of the centred protein to the corners of the cube was used for the simulation run. The system was minimized and equilibrated by NVT and NPT simulations for both 100 ps. The stability of temperature and pressure was analysed and showed small fluctuation so consistency of the values and stability of the system could be assumed allowing the production run for 100 ns. The fusion proteins structure remained stable during the simulation suggesting functionality of both protein domains TasA and EreB, since they are structurally preserved and accessible. The signalling peptide (TasA N-terminus) is also accessible allowing transportation of the fusion proteins extracellular matrix to enable immobilization of the enzyme in our biofilm.

References

[1] CA. Roth et al. Protein Structure Prediction Using Rosetta. Methods in Enzymology 2004, 382 66-93, https://doi.org/10.1016/S0076-6879(04)83004-0 [2] Song Y et al. High-Resolution Comparative Modeling with RosettaCM. Structure 2013, 21 1735-1742, https://doi.org/10.1016/j.str.2013.08.005 [3] SA. Combs et al. Small-molecule ligand docking into comparative models with Rosetta, Nature Protocol 2013, 8(7) 1277-98, doi: 10.1038/nprot.2013.074 [4] Moretti R et al. Rosetta and the Design of Ligand Binding Sites. Methods Mol Biol. 2016; 1414: 47–62. doi: 10.1007/978-1-4939-3569-7_4 [5] Mitra Naghdi et al., Removal of pharmaceutical compounds in water and wastewater using fungal oxidoreductase enzymes, Environmental Pollution, 2018, 234:190-213, doi: 10.1016/j.envpol.2017.11.060
[6] Sultan K. Alharbi et al., Degradation of diclofenac, trimethoprim, carbamazepine, and sulfamethoxazole by laccase from Trametes versicolor: Transformation products and toxicity of treated effluent, Biocatalysis and Biotransformation, 2019, doe: 10.1080/10242422.2019.1580268
[7] Leticia Arregui et al., Laccases: structure, function, and potential application in water bioremediation, Microbial Cell Factories, 2019, 18:200, doi: 10.1186/s12934-019-1248-0
[8] Isabel Pardo and Susana Camarero, Laccase engineering by rational and evolutionary design, Cellular and Molecular Life Sciences, 2014, doi: 10.1007/s00018-014-1824-8
[9] Steven A. Combs et al., Small-molecule ligand docking into comparative models with Rosetta, Nature Protocols, 2013, 8:1277–1298, doi:10.1038/nprot.2013.074
[10] Rocco Moretti et al., Rosetta and the Design of Ligand Binding Sites, Methods Molecular Biology, 2017, 1414:47-62, doi:10.1007/978-1-4939-3569-7_4
[11] Ian W. Davis and David Baker, RosettaLigand Docking with Full Ligand and Receptor Flexibility, Journal of Molecular Biology, 2009, 385:381-392, doi: doi:10.1016/j.jmb.2008.11.010
[12] Gordon Lemmon and Jens Meiler, RosettaLigand docking with flexible XML protocols, Methods Molecular Biology, 2012, 819:143-155, doi: 10.1007/978-1-61779-465-0_10
[13] https://pubchem.ncbi.nlm.nih.gov/, accessed on August 10th 2020
[14] https://www.perkinelmer.com/de/category/chemdraw, accessed on October 21th 2020
[15] http://www.meilerlab.org/index.php/bclcommons/show/b_apps_id/1, accessed on May 5th 2020
[16] https://www.rosettacommons.org/demos/latest/tutorials/ligand_docking/ligand_docking_tutorial, accessed on October 23th 2020
[17] http://www.meilerlab.org/index.php/rosetta-tutorials, accessed on May 5th 2020
[18] A. A. Adeniyi, M. E.S. Soliman, Implementing QM in docking calculations: is it a waste of computational time?, Drug Discovery Today, 2017, 22(8):1216-1223, doi: 10.1016/j.drudis.2017.06.012
[19] Bonneu R et al. Rosetta in CASP4: Progress in ab initio protein structure prediction. Proteins 2001 doi: 10.1002/prot.1170
[20] Das R. and Baker D. Macromolecular Modeling with Rosetta. Annual Review of Biochemistry 2008, 77:363-382, https://doi.org/10.1146/annurev.biochem.77.062906.171838
[21] Kunzmann P. and Hamacher K. Biotite: a unifying open source computational biology framework in Python. BMC Bioinformatics 2018, 19 346, https://doi.org/10.1186/s12859-018-2367-z
[22] https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/relax
[23] Nivon LG et al. A Pareto-Optimal Refinement Method for Protein Design Scaffolds. PLOS ONE 2013, https://doi.org/10.1371/journal.pone.0059004
[24] Mbah AN et al. Drug Target Exploitable Structural Features of Adenylyl Cyclase Activity in Schistosoma mansoni. Drug Target Insights 2012, 6 41-58, doi: 10.4137/DTI.S10219
[25] Morar M. et al. Mechanism and Diversity of the Erythromycin Esterase Family of Enzymes. Biochemistry 2012, 51(8) 1740-51, doi: 10.1021/bi201790u
[26] Edward I. Solomon, Anthony J. Augustine and Jungjoo Yoon O2 Reduction to H2O by the multicopper oxidases. Dalton Transport 2008, 30, doi: 10.1039/b800799c
[27] http://manual.gromacs.org/documentation/2018/user-guide/terminology.html#gmx-force-field
[28] Van Der Spoel D et al. GROMACS: Fast, flexible, and free. Journal of Computational Chemistry 2005, 26(16):1701-18. doi:10.1002/jcc.20291
[29] MacKerell Jr. AD et al. Development and Current Status of the CHARMM Force Field for Nucleic Acids. Biopolymers 2001, 56(4) 257-265 https://doi.org/10.1002/1097-0282(2000)56:4<257::AID-BIP10029>3.0.CO;2-W.
[30] Piana et al Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations. Current Opinion in Structural Biology 2014, 24 98-105. https://doi.org/10.1016/j.sbi.2013.12.006
[31] https://www.uni-muenster.de/Physik.TP/archive/fileadmin/lehre/TheorieAKkM/ws12/Schelte.pdf
[32] Huang J. and MacKerell Jr D. CHARMM36 all‐atom additive protein force field: Validation based on comparison to NMR data. Journal of computational Chemistry 2013, DOI: 10.1002/jcc.23354
[33] http://gensoft.pasteur.fr/docs/lammps/12Dec2018/Howto_tip3p.html
[34] http://www.bpc.uni-frankfurt.de/guentert/wiki/images/9/96/180618_TutorialMD.pdf
[35] http://www.mdtutorials.com/gmx/lysozyme/index.html
[36] Zhang L. et al Engineering of Laccase CueO for Improved Electron Transfer in Bioelectrocatalysis by Semi-Rational Design. Chemistry A European Journal 2020, 26(22), DOI: 10.1002/chem.201905598
[37] http://manual.gromacs.org/documentation/2019-rc1/onlinehelp/gmx-covar.html#gmx-covar
[38] B. de Groot. Practical 5: Principal components analysis
[39] Huang, J et al. Programmable and printable Bacillus subtilis biofilms as engineered living materials. Nature Chemical Biology 2018, doi: 10.1038/s41589-018-0169-2
[40] LF. Ribeiro et al. Engineering bifunctional laccase-xylanase chimeras for improved catalytic performance. J Biol Chem. 2011, 286(50) 43026-38, doi: 10.1074/jbc.M111.253419
[41] J. Zhang et al. Design and optimization of a linker for fusion protein construction. Progress in Natural Science 2009, 19(10) 1197-1200, https://doi.org/10.1016/j.pnsc.2008.12.007