When developing the sequence for the SEGI-8 protein, we first focused on the catalytic domain. We used the modified SEGI-8 domain because it has been shown to work with higher efficiency across a broader range of conditions. For the linker, we opted for the ApCel5A linker. It was selected for its flexibility and length, which have been shown to enhance protein efficiency on feedstocks that contain lignin. Finally, the wild type Endoglucanase I cellulose-binding module was selected to ensure that the synergy between the catalytic domain and the binding module was maintained.

For both the modified and the wild type enzyme, we predicted the three-dimensional structure using homology modelling. Starting from these structures, we ran molecular dynamics simulations. The simulations were then characterized with GausHaus to determine whether the changes increased variance in the enzyme dynamics. The modifications actually produced a net lowering of the variance in the enzyme, allowing the team to move forward confident in the modifications.


Endoglucanase I design

Endoglucanase plays a pivotal role in our system. It is responsible for the first step in breaking down cellulose into the fuel that powers our entire dream. In the name of efficiency, we sought to engineer an endoglucanase with an even greater impact on our system. In our hunt for efficient and exciting endoglucanases, we came across SEGI-8. SEGI-8 was generated through DNA shuffling and demonstrated a broader range of active conditions and higher activity than the wild type cellulase (Wang et al., 2017). These findings solidified our choice of SEGI-8 for the catalytic domain of our modified endoglucanase. Next came the task of selecting a linker region for the modified endoglucanase. For this, we turned to Wang et al. (2018), "Altering the linker in processive GH5 endoglucanase 1 modulates lignin binding and catalytic properties" (Biotechnology for Biofuels), in which multiple linkers were compared against wild type endoglucanase I. This gave us the ApCel5A linker, which was shown to increase the total amount of reducing sugars released and to remain competitive on cellulose sources that contain lignin. As we are not yet sure what feedstock our intended communities will use, we thought this inclusion would help broaden our system's use. Finally, for the CBM, we opted for the wild type variant to avoid confounding effects with the linker and catalytic domain. Together, these three components make up the final assembled protein.

Figure 1. Representation of cellulase componential structure.
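
To make the three-part layout concrete, here is a minimal Python sketch of assembling a chimeric sequence from its components and recording where each one starts and ends. The function name `assemble_chimera`, the N-to-C ordering (catalytic domain, linker, CBM), and the short amino acid strings are all assumptions for illustration; the strings are placeholders, not the real SEGI-8, ApCel5A, or CBM sequences.

```python
from typing import Dict, List, Tuple


def assemble_chimera(
    domains: List[Tuple[str, str]]
) -> Tuple[str, Dict[str, Tuple[int, int]]]:
    """Concatenate (name, sequence) pairs from N- to C-terminus.

    Returns the full sequence and, for each component, its 1-based
    inclusive residue range within the assembled protein.
    """
    sequence = ""
    boundaries: Dict[str, Tuple[int, int]] = {}
    for name, seq in domains:
        start = len(sequence) + 1
        sequence += seq
        boundaries[name] = (start, len(sequence))
    return sequence, boundaries


# Placeholder strings only: these are NOT the real domain sequences.
domains = [
    ("catalytic_domain", "MKAAAGGG"),
    ("linker", "PTPTPT"),
    ("cbm", "CGGQNW"),
]
chimera, ranges = assemble_chimera(domains)
print(chimera)
print(ranges)
```

Keeping the residue ranges alongside the sequence is convenient later, because the same boundaries can be reused when the catalytic domain has to be separated from the linker and binding module.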


How did we verify our changes?

Before we could begin any in silico analysis of the chimeric protein, we needed to generate structural predictions as a starting point. We ran homology modelling with MODELLER through Chimera for both the wild type protein and the modified endoglucanase. These models served as the starting structures for molecular dynamics. Molecular dynamics was completed using GROMACS version 18.02 on the University of Calgary ARC Computing Cluster. Each simulation was run for one nanosecond in spc216 water.

The molecular dynamics simulations were carried out by the team using the following general scheme, with commands in parentheses.
1. Convert PDB file to GRO file (gmx pdb2gmx)
2. Generate an empty box with 0.75 nanometers of extra room around the protein (gmx editconf)
3. Solvate the protein with water using the spc216 water approximation (gmx solvate)
4. Generate ions to ensure the system has neutral charge (gmx genion)
5. Perform energy minimization (gmx mdrun with the energy minimization mdp file)
6. Perform isothermal-isochoric equilibration (gmx mdrun with the isothermal-isochoric equilibration mdp file)
7. Perform isothermal-isobaric equilibration (gmx mdrun with the isothermal-isobaric equilibration mdp file)
8. Perform molecular dynamics (gmx mdrun with the molecular dynamics mdp file)
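
The scheme above can be sketched as a list of GROMACS command lines. This is a hedged Python sketch, not the team's actual script: the file names, the helper `build_gromacs_commands`, the `oplsaa` force field, and the use of position restraints during equilibration are assumptions that the text does not specify. It also spells out the `gmx grompp` preprocessing call that `gmx genion` and each `gmx mdrun` stage need in order to obtain a `.tpr` run input file. The sketch only builds and prints the commands; it does not run them.

```python
import shlex
from typing import List


def build_gromacs_commands(pdb: str, box_margin_nm: float = 0.75,
                           force_field: str = "oplsaa") -> List[List[str]]:
    """Return the command lines for the eight-step scheme, in order."""
    top = "topol.top"  # assumed topology file name
    cmds = [
        # 1. PDB -> GRO plus topology (force field choice is an assumption)
        ["gmx", "pdb2gmx", "-f", pdb, "-o", "processed.gro", "-p", top,
         "-ff", force_field, "-water", "spc"],
        # 2. Centre the protein in a box with the requested margin
        ["gmx", "editconf", "-f", "processed.gro", "-o", "boxed.gro",
         "-c", "-d", f"{box_margin_nm:g}"],
        # 3. Fill the box with water from the spc216 configuration
        ["gmx", "solvate", "-cp", "boxed.gro", "-cs", "spc216.gro",
         "-o", "solvated.gro", "-p", top],
        # 4. Neutralise the system (genion asks which group to replace)
        ["gmx", "grompp", "-f", "ions.mdp", "-c", "solvated.gro",
         "-p", top, "-o", "ions.tpr"],
        ["gmx", "genion", "-s", "ions.tpr", "-o", "ionized.gro",
         "-p", top, "-neutral"],
    ]
    # 5-8. Minimisation, NVT, NPT and production, each as grompp + mdrun
    prev = "ionized"
    for stage in ("em", "nvt", "npt", "md"):
        grompp = ["gmx", "grompp", "-f", f"{stage}.mdp", "-c", f"{prev}.gro",
                  "-p", top, "-o", f"{stage}.tpr"]
        if stage in ("nvt", "npt"):
            grompp += ["-r", f"{prev}.gro"]  # restraint reference (assumed)
        if stage in ("npt", "md"):
            grompp += ["-t", f"{prev}.cpt"]  # continue from checkpoint
        cmds.append(grompp)
        cmds.append(["gmx", "mdrun", "-deffnm", stage])
        prev = stage
    return cmds


for cmd in build_gromacs_commands("wild_type.pdb"):
    print(shlex.join(cmd))
```

Building the commands as argument lists keeps the sketch easy to inspect before anything is submitted to a cluster; each list could later be passed to `subprocess.run` once the mdp files exist.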

Running this scheme for both proteins resulted in the following two simulations:

Figure 2. Molecular dynamics simulation of the wild type endoglucanase (blue) solvated by water.

Above is the molecular dynamics simulation of wild type EG-1. The protein is shown in blue. The white and red streaks are water molecules, and the purple dots are ions added to keep the overall charge of the simulation neutral.

Figure 3. Molecular dynamics simulation of the modified endoglucanase (pink) solvated by water.

Next, we applied the same modelling scheme to our SEGI-8 protein. It is presented above in the same form, but the protein is shown in pink (magenta) instead of blue.

After molecular dynamics was completed, we had new structure files that better represented each protein in its dynamic state, along with a trajectory file depicting its dynamics. The structure files were compared by alignment to reveal changes in the protein structure. This was easier said than done, as the flexible tails were in wildly different conformations after simulation. We addressed this by aligning only the catalytic domains, without the linker and cellulose-binding module. Because the linker and cellulose-binding module were present during the simulations, the aligned catalytic domains still reflect their influence, but only indirectly. The catalytic domain of each protein was extracted from the solvent and cleaved from its tail. After all of the cleaning, the two domains were aligned with each other using the standard PyMOL alignment tool. The alignment can be seen below:

Figure 4. Alignment between the wild type catalytic domain (white) and the catalytic domain of SEGI-8 (orange).
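
The cleaning step that preceded this alignment (dropping solvent and ions, then cutting away the linker and binding module) can be sketched in plain Python working on fixed-column PDB records. This is an illustrative sketch under stated assumptions rather than the procedure the team actually used: the function names, the solvent and ion residue names, and the residue range in the example are hypothetical, and the real catalytic domain boundaries would have to come from the protein's own sequence.

```python
from typing import Iterable, List

# Residue names treated as solvent or ions (assumed; adjust to the system).
SOLVENT_AND_IONS = {"SOL", "HOH", "WAT", "NA", "CL"}


def extract_domain(pdb_lines: Iterable[str], first_res: int,
                   last_res: int) -> List[str]:
    """Keep protein atoms whose residue number is in [first_res, last_res]."""
    kept = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        res_name = line[17:20].strip()  # PDB columns 18-20
        res_seq = int(line[22:26])      # PDB columns 23-26
        # Filter by name first: water residue numbers can overlap the range.
        if res_name in SOLVENT_AND_IONS:
            continue
        if first_res <= res_seq <= last_res:
            kept.append(line.rstrip("\n"))
    return kept


def atom_line(serial: int, atom: str, res_name: str, res_seq: int) -> str:
    """Build a minimal fixed-column ATOM record for the demo below."""
    return "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}".format(
        "ATOM", serial, atom, "", res_name, "A", res_seq, "", 0.0, 0.0, 0.0)


# Toy input: one catalytic residue, one tail residue, one water molecule.
demo = [
    atom_line(1, "CA", "ALA", 5),
    atom_line(2, "CA", "GLY", 400),
    atom_line(3, "OW", "SOL", 10),
]
catalytic_only = extract_domain(demo, first_res=1, last_res=100)
print("\n".join(catalytic_only))
```

Once both proteins have been trimmed this way and written back to files, the two catalytic domains can be loaded into PyMOL and superimposed with its `align` command.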

From the alignment, we can interrogate the structural differences caused by the protein modifications. The main difference appears in the spacing of the alpha-helices. The alpha-helices of SEGI-8 appear more spread out and expanded than those of the wild type. This concerned our team, as the extra wiggle room could cause aberrant dynamics and lead to inhibited interaction. This led us to look deeper into the dynamics of the protein.

After the structural comparison, we sought to understand the effect the modifications had on the protein dynamics. This was achieved through measurements obtained via GausHaus, whose parameters elucidated the impact our modifications had on the dynamic properties of the endoglucanase. After running the dynamics of both SEGI-8 and the wild type through the script, we compared the variance within each protein's dynamics along each axis. These quantifications can be seen below:

Figure 5. Variance output of the GausHaus measurement.

These findings indicate that the introduction of SEGI-8's modifications led to a decrease in total variance. This was unexpected but welcome. We were therefore able to move forward with project development, confident that SEGI-8 would not throw a wrench into our end goal.
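
To illustrate what comparing variance along each axis can mean in practice, here is a small Python sketch. It is explicitly not the GausHaus implementation, whose internals are not described here. It assumes a trajectory is a list of frames, each frame a list of (x, y, z) atom positions that have already been superimposed to remove overall rotation and translation, and it reports the mean over atoms of each coordinate's variance over time. The function name and the toy trajectories are hypothetical.

```python
from statistics import pvariance
from typing import Dict, Sequence, Tuple

Frame = Sequence[Tuple[float, float, float]]


def per_axis_variance(trajectory: Sequence[Frame]) -> Dict[str, float]:
    """Mean over atoms of the population variance of x, y and z over time."""
    n_atoms = len(trajectory[0])
    totals = {"x": 0.0, "y": 0.0, "z": 0.0}
    for atom in range(n_atoms):
        for axis_index, axis in enumerate("xyz"):
            # Time series of one coordinate of one atom across all frames
            series = [frame[atom][axis_index] for frame in trajectory]
            totals[axis] += pvariance(series)
    return {axis: total / n_atoms for axis, total in totals.items()}


# Toy two-atom, two-frame trajectories (made-up numbers, not our data).
wild_type = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(2.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
]
modified = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
]
wt_var = per_axis_variance(wild_type)
mod_var = per_axis_variance(modified)
for axis in "xyz":
    print(axis, wt_var[axis], mod_var[axis])
```

A lower value for the modified protein on a given axis would correspond to the kind of reduced variance reported above, although the real measurement may weight or decompose the motion differently.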


Where did this model make an impact?

These models allowed us to move forward with the project with this modified endoglucanase as one of its cornerstones. The verification via dynamics meant that the team could dedicate the part design, time, laboratory supplies, and financial resources needed to get these enzymes rolling out to feed our beta-carotene system.

Along with this model's impact on the wet lab, it also had a great impact on our dry lab. The model provided a strong example and test case for GausHaus, allowing us to refine it and better implement it as measurement software. This in turn empowered our measurement and software development.


Step by Step

  1. Generate a sequence for your protein
  2. Use homology or ab initio modelling to develop a probable starting structure for both the wild type and modified protein
  3. Run molecular dynamics to generate a dynamically obtained structure and an accompanying trajectory file for both proteins
  4. Using PyMOL or Chimera, run an alignment on the dynamically obtained structures
  5. Use the GausHaus measurement script available on our GitHub to measure the impact of your modifications on the dynamics of the protein
  6. Interrogate and hypothesize on the cause and effect of the differences between your protein and the wild type
  7. Integrate your findings into your project design and see the physical impact your model can have.


R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL

The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC.

Wang, X., Rong, L., Wang, M., Pan, Y., Zhao, Y., & Tao, F. (2017). Improving the activity of endoglucanase I (EGI) from Saccharomyces cerevisiae by DNA shuffling. RSC Adv., 7(73), 46246-46256. doi:10.1039/c6ra26508a

Wang, Z., Zhang, T., Long, L., & Ding, S. (2018). Altering the linker in processive GH5 endoglucanase 1 modulates lignin binding and catalytic properties. Biotechnology for Biofuels, 11(332).