Team:TJUSLS China/Experiments

<!DOCTYPE html> Team members

Site prediction to improve thermostability
Rational prediction of disulfide bond sites
Build the structure of mutant
Evaluate the stability of mutants

Site prediction to improve thermostability

Consensus Finder


Consensus protein sequences are useful for numerous applications. Often, mutating a protein to be more like the consensus of homologs will often increase the stability of a protein, allowing it to function at higher temperatures, and have better soluble expression when expressed recombinantly in various hosts. "Consensus Finder" will help identify the consensus sequence and find potentially stabilizing mutations.

What it does

Consensus Finder will take given your given protein sequence, find similar sequences from the NCBI database, align them, remove redundant/highly similar sequences, trim alignments to the size of the original query, and analyze consensus. Output is trimmed alignment, consensus sequence, frequency and count tables for amino acids at each position, as well as a list of suggested mutations to consensus that may be stabilizing.

How to use

Choose File

Choose a file containing your protein sequence. Needs to be a plain text file in FASTA format containing the sequence of your protein of interest .

If you don’t have that handy, you can download a FASTA file for proteins from NCBI.


Press Submit and wait. Click the link to see if it’s done or wait for an email if you provided an email address. Usually takes 10-30 minutes, but sometimes takes longer.

The results page shows you a list of mutations to make which are likely to increase the stability of your protein



FireProt is a web server for an automated design of thermostable mutants. The design of thermostable mutants is based on the integration of structural and evolutionary information obtained from several bioinformatics databases and computational tools.

FireProt strategy combines energy- and evolution-based approaches together with several filters that accelerate the calculation by omitting potentially deleterious mutations. Within its workflow, FireProt integrates 16 computational tools, utilizing both sequence and structural information in the process. FireProt web server provides users with a one-stop-shop solution for the design of thermostable proteins, constructed by three distinct strategies: (i) evolution-based approach, utilizing back-to-consensus analysis; (ii) energy-based approach, using conservation, correlation and energy information and (iii) combined approach. Furthermore, the server allows users to include user-defined mutations into the constructed thermostable protein and generate corresponding sequences in the FASTA format. The results are visualized in the web browser in an intuitive and comprehensive way, allowing users to directly analyze the designed proteins.

How to use

structure selection

Select the type of input: PDB code or  PDB file

Select biological unit

Alternative option: Specify chains manually, once this box is checked, the "CHAIN SELECTION" section appears and you can choose the chains for which the FireProt calculation will be performed.


Optionally, you may specify a job title and an e-mail address for receiving the notification about finished calculation.



BLAST E-value:The Expectation value (E-value) describes the number of hits one can expect to see just by chance when searching a database of a particular size.

Minimal identity (%): All sequence homologs having sequence identity with the query protein less than the specified value will be excluded from the database.

Maximal identity(%): All sequence homologs having sequence identity with the query protein greater than the specified value will be excluded from the database.

Max. number of sequences: The maximum number of sequences in the sequence dataset. Sequence homologs identified by BLAST and filtered to fulfill the “Minimal/Maximal identity” thresholds are clustered by UCLAST. Representants of individual clusters are ordered based on the BLAST query coverage and the specified number of the top-ranked sequences are selected. The sequence dataset is used for the estimation of mutability of individual protein positions, analysis of correlated positions and identification of consensus residues. Replace the default value with 1000000 if you want to use all identified homologs


Majority threshold (%):All non-wild-type residues that occur at a given position in at least n% of the analyzed sequences will be assigned as the “majority consensus” residues (n= the specified value)

Minimal frequency(%): All residues that occur at a given position in at least f % of the analyzed sequences and, at the same time are at least r times more frequent at this position than the wild-type residue will be assigned as the “frequency ratio consensus” residues (f= the value specified by the “Minimal frequency” parameter, r= the value specified by the “Minimal ratio” parameter)

Minimal ratio: All residues that occur at a given position in at least f% of the analyzed sequences and, at the same time are at least r times more frequent at this position than the wild-type residue will be assigned as the “frequency ratio consensus” residues (f= the value specified by the “Minimal frequency” parameter, r= the value specified by the “Minimal ratio” parameter)s

Gap threshold (%): All positions of multiple sequence alignment with the gap occurrence above the specified threshold will be excluded from the analysis.


FoldX threshold (evolution approach): Cut-off used by FoldX in the evolution-based approach. Mutations with predicted ddG higher than this threshold will be discarded.

FoldX threshold (energy approach): Cut-off used by FoldX in the energy-based approach. Mutations with predicted ddG higher than this threshold will be omitted from further calculation.

Rosetta threshold (energy approach): Threshold used by Rosetta in the energy-based approach. Mutations with the predicted ddG lower than this value will be accepted as potentially stabilizing mutations.

Rosetta threshold (multipoint mutants): Threshold used by Rosetta when predicting the antagonistic effect of mutations. Mutations with the predicted ddG higher than this threshold will be noted as potentially antagonistic and will not occur together in the frame of one multiple-point mutant


Z-score threshold: Each two protein positions which have the average z-score greater than or equal to the specified threshold will be assigned as correlated positions. An average z-score is calculated from the results of all tools that returned a valid score. The full set tools for the prediction of correlated positions involves: ELSC, FreeContact, McBASC, Ml, aMlc, OMES and SCA


We can directly analyze multiple-point mutants designed by three distinct strategies: evolution-based approach, utilizing back-to-consensus analysis; (ii) energy-based approach, using (ii) stability hot spots by structural flexibility represented by flexible residues, (iii) stability hot spots by sequence consensus represented by positions which are frequently occupied by the same conservation, correlation and energy information and (iii) combined approach.

The "Results browser" page contains the information about the job, status of the calculation, links to the results and for downloading the data.

JOB INFO section

This section provides the basic information about the job and current status of the calculation (queued / running / successfully finished / finished with error).

Structure: The link to the RCSB PDB database or name of the uploaded PDB file

Title: The user-defined description of the job.

ID: The unique identifier of the job

REPORT section

This section provides a detailed information about the status of the calculation.

RESULTS section

This section provides the possibility to download the raw data from individual calculation steps in a zip archive and also redirect yourself to the result page for further analysis of the results



The freely available PoPMuSiC-2.1 web server is highly useful for identifying very rapidly a list of possibly relevant mutations with the desired stability properties, on which subsequent experimental studies can be focused. It can also be used to detect sequence regions corresponding to structural weaknesses, which could be functionally important or structurally delicate regions, with obvious applications in rational protein design.

What we should do is to submit our protein structure, the results will be given soon.


PositionScan mutates one amino acid to the other 20, or to selected ones and repairs the neighbour residues. The minimal PDBFile configuration file is: command=PositionScan pdb=PS.pdb. It can be run from the command line: FoldX --command=PositionScan --pdb=PS.pdb

Rational prediction of disulfide bond sites


Requirements for MODIP

There is a provision to run MODIP online, for proteins not present in DSDBASE or for new protein structure

1. Upload or Paste the Coordinates file in PDB format fields.

2. PDB file can be single chain or multi chain.

3. MODIP won't accept 'C-alpha only' proteins.


The output is displayed on to the screen with information such as total number of disulphides, number of modelled and native disulphides and grade distribution are given. followed by this detailed results will be given in table. Table gives the information about Residues in postion "i" and "j" which can accomadate disulphide bridge, and corresponding grade.

You can get the Mutant pdb file, where choosen pair of residues are changed to "Cysteine" and sulfur coordinates are fixed( Sulphur coordinates are not available for D grade disulphides).

Disulfide by Design 2


Disulfide by Design 2 (DbD2) is a redesigned and enhanced version of original DbD application (Dombkowski, 2003) for the rational design of disulfide bonds in proteins. For a given protein structural model, all residue pairs are rapidly assessed for proximity and geometry consistent with disulfide formation, assuming the residues were mutated to cysteines. The output displays residue pairs meeting the appropriate criteria. The input model will typically be a Protein Data Bank (PDB) structure for the protein of interest; however, structures developed through homology modeling may also be used. Engineered disulfides have proven useful for increasing the stability of proteins and to assist the investigation of protein dynamics and interactions. This software was written by Dr. Alan Dombkowski and Douglas Craig, and is based on algorithms created for disulfide identification in protein fold recognition methods. The Disulfide by Design algorithm has been successfully used for disulfide engineering in a wide variety of applications.

Getting started

load protein file

Try loading the sample PDB file by clicking on “Load Example PDB”

Run Analysis

Click on the “Run” button at the bottom of the Options Pane.

Evaluate Results

Inspect the analysis results. Find candidate residue pairs based on bond energy, B-factor, associated secondary structure, or 3-D considerations.

Build Mutant Protein

Select one or more residue pairs using the checkboxes under the ANALYSIS tab.

Click on “Create/View Mutant” to build the new disulfide bonds.

View & Save Results

Inspect the resulting model under the 3-D VIEWS tab. Click on “Save Mutant PDB” to save the new PDB file.

Build the structure of mutant

Swiss model

SWISS-MODEL is a web-based integrated service dedicated to protein structure homology modelling. It guides the user in building protein homology models at different levels of complexity.

Building a homology model comprises four main steps: (i) identification of structural template(s), (ii) alignment of target sequence and template structure(s), (iii) model-building, and (iv) model quality evaluation. These steps require specialised software and integrate up-to-date protein sequence and structure databases. Each of the above steps can be repeated interactively until a satisfying modelling result is achieved.

What you should do is input the protein sequence and click on the build model, and the results will be displayed in a few minutes. Finally, you can download the model.


This is the workhorse of FoldX mutation engine. This command ensures that whenever you are mutating a protein you always move the same neighbours in the WT and in the mutant producing for each mutant PDB a corresponding PDB for its WT (each mutation will move different neighbours and therefore you need different WT references). Build model requires a mutant-file to run. The minimal configuration file for BuildModel: command=BuildModel pdb=BM.pdb mutant-file=individual_list.text

It can be run using the command line: FoldX --command=BuildModel --pdb=BM.pdb --mutant-file=individual_list.text

The parameter numberOfRuns tells the algorithm how many times it should do the specified mutations. Normally it should be set to 1. However, sometimes it can be set to higher numbers to see if the algorithm has achieved convergence, or in other words if the solution offered is the optimal or a trapped solution. The option out-pdb is setted to true by default and can be use with commands as Pssm or PssmStability.

FoldX uses output-file as a tag to label different outputs from different commands in batch runs. After running BuildModel you'll get three files to look at. Given pdb=PDB.pdb the output files are.

Average_PDB_BM.fxout -> average energy of the different runs

Dif_PDB_BM.fxout -> energy difference between reference WT and the corresponding mutant

Raw_PDB_BM.fxout -> full energy decomposition for the WT references and the mutants

PdbList_PDB_BM.fxout -> names of all the mutants and the corresponding WT references

Produces a FOLD-X pdb file. The minimal PDBFile configuration file is: command=PDBFile pdb=PF.pdb. It can be run from the command line: FoldX --command=PDBFile --pdb=PF.pdb

It is highly recommended to repair your structures before you do any modelling with FoldX. RepairPDB identify those residues which have bad torsion angles, or VanderWaals' clashes, or total energy, and repairs them. The minimal configuration file for RepairPDB is: command=RepairPDB pdb=RP.pdb. It can be run from the command line: FoldX --command=RepairPDB --pdb=RP.pdb.

Evaluate the stability of mutants


ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can either be specified as a Swiss-Prot/TrEMBL accession number or ID, or in form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end position in two boxes. By default the complete sequence will be analyzed.


The parameters computed by ProtParam include the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity


In the learning model, iStable adopted the support vector machine as an integrator, while not just choosing the majority answer given by element predictors. Furthermore, the role of the sequence information played was analyzed in our model, and an 11-window size was determined. On the other hand, iStable is available with two different input types: structural and sequential. After training and cross-validation, iStable has better performance than all of the element predictors on several datasets. Under different classifications and conditions for validation, this study has also shown better overall performance in different types of secondary structures, relative solvent accessibility circumstances, and protein memberships in different superfamilies.

The trained and validated version of iStable provides an accurate approach for prediction of protein stability change


I-Mutant is a suite of Support Vector Machine based predictors integrated in an unique web server. It offers the opportunity to predict automatically protein changes upon single-site mutation starting from protein sequence alone or protein structure when available.

Firstly, you should choose whether you want to enter a protein sequence or a structure on the front page, then click enter. Next, you should choose the position that you want to mutate, the new residue, PH and temperature, then enter your email address and click sumit. Finally, you will receive your result in your email address.


B-factor, also known as "temperature Factor," is another indicator of protein flexibility. RMSF reflects the dynamic flexibility of proteins, b-factor reflects the static flexibility of proteins. The B-factor value is obtained from the X-ray diffraction crystal structure data of the protein, which reflects the dispersion degree of the electron layer of the atom. The B-factor value of an amino acid is obtained by averaging the B-factors of all the atoms that make up the amino acid.

B-fitter is one such software that takes atomic B-factor data from a protein's third-order crystal structure and, on average, takes all the b-factor values of its residues to measure the protein's flexibility.

Download B-FITTER

Execute B-FITTER.exe and select the pdb file that you want to use. The program automatically generates a rank list of 20 residues showing the highest B-factor, and an output file that appears in the same folder where the pdb file is located. The output file has the same name as the pdb file with the extension .out. This file can be opened with any text editor.



An important part of protein modification is the evaluation of mutants. Rosetta algorithm can predict, design and analyze proteins, provide a flexible library of functions for various biomolecular modeling tasks, and can reflect the stability of proteins in the form of ΔΔG.

DDG: Rosetta ddg_monomer

The purpose of this program is to predict stability changes (ΔΔG) in monomer proteins caused by point mutations. The program takes the wild-type crystal structure as an input (which must first be pre-minimized), and produces a structural model of point mutation. ΔΔG is given by Rosetta energy differences between wild-type and point-mutated structures.

More precisely, the wild-type and mutant structures each generated 50 models, with the most accurate ΔΔG as the difference between the average of the first three wild-type structures and the first three scoring points of the mutant structures.

Rosetta follows the convention that a negative ΔΔG value indicates increased stability, i.e., ΔΔG= mutant energy - wild-type energy.

ddG_monomer has been successfully applied in the stability engineering process of several proteins including limonene epoxide hydrolase and halogenated alkane dehalase. Recently, some studies have applied this software to improve the thermal stability of escherichia coli ketoalcohol enzyme, and evaluated its prediction accuracy, and finally found that the accuracy can reach 65%.


Calculates the DG to fold the proteins from their unfolded state. The minimal configuration file for Stability is: command=Stablity pdb=ST.pdb. It can be run from the command line: FoldX –command=Stablity –pdb=ST.pdb.


The first step:

Get and process PDB files:

It mainly deals with crystal water // ligands // ions // missing atoms/residues/side chains

The second step:

Initialization parameters:

1.Put the PDB and get the topology file with pdb2gmx

gmx pdb2gmx -ignh -ff amber99sb-ildn -f 5xjh.pdb -o 5xjh.gro -p -water tip3p

We get three output files: structure file 8.gro, topology file, and posre.itp.

-ignh: Ignore all hydrogen atoms in the input PDB file. Since this PDB file is generated by NMR and contains hydrogen atoms, use this option to ignore the hydrogen atoms in the file and use the uniform naming rules for hydrogen atoms in the GROMOS field to avoid inconsistent hydrogen names.

-ff: Specify the force field to use. We use the Amber99SB-ILDN force field. Other force fields can also be used, please choose according to your own system.

-f: Specifies the protein structure file that needs to be processed

-o: Specify a newly generated GROMACS filename (or some other file type, such as PDB)

-p: Specifies the filename of the newly generated topology that contains all the atoms and the parameters of the interactions between the atoms

- water: Specify water model. We use TIP3P water model. Other water models such as SPC/E can also be used, please choose according to your needs

Step 3: Create the mock box:

Next we use the editconf command to create the periodic mock boxes.

gmx editconf -f 5xjh.gro -o 5xjh-PBC.gro -bt cubic -d 1.2 -c

-f: Input protein structure

-o: Outputs a structure file with simulated box information

-bt: The rectangular box is used by default. The -BT option can be used to change the box type, such as octahedron, dodecahedron, etc

-d: The minimum distance between the protein and the simulated box in the X, Y, and Z directions

We initially created a rhombic dodecahedron box using the -bt option, because this box is nearly spherical and computatively most efficient. Later, because the box is too small, the protein movement beyond the boundary and broken, replaced with a cube box

The -d option sets the minimum distance in nm from the edge of the box, which determines the size of the box. In theory, in most systems, -d cannot be less than 0.9 nm. We used 1.2 nm.

Step 4: Fill the box with solvent and ions and minimize the energy to obtain the filled structure file fws-b4ion.gro.

gmx solvate -cp 5xjh.gro -cs spc216.gro -p -o 5xjh-b4ion.gro

-cp specifies systems that need to be filled with water molecules, with analog protein boxes,

-cs specifies that SPC water model is used for filling, spC216 is a uniform three-point water substructure of GROMACS,

-p modifies the topology file of the system to add the physical parameters of the corresponding water molecules

-o specifies the output file after filling the water molecule

Use GroMPP to process the energy minimization parameter file em-sol-pme.mdp as follows

We will use the TPR file (Ion.tpr) obtained in the previous step to add ions to the system. The compensating ions added are used to neutralize the net charge in the system. Our model has a + 2.00e net charge, so the number of anions added is greater than the number of cations.

The following genion command replaces some water molecules with ions.

gmx grompp -f em-sol-pme.mdp -c 5xjh-b4ion.gro -p -o ion.tpr -maxwarn 1

gmx genion -s ion.tpr -o 5xjh-b4em.gro -neutral -conc 0.15 -p

Select a group:13

The -conc option sets the desired ion concentration (0.15m in this case). The -g option specifies the name of the output log file. The -Norandom option places the ions according to the electrostatic potential rather than randomly (the default), but we use random placement here.

Select 13 (SOL), press enter, and the program will tell you that two solvent molecules have been replaced by chlorine ions. fws. top will contain NA and CL ions. Note that the order of the molecules appearing here must be the same as in the coordinate file.

gmx grompp-f em-sol-pme.mdp-c 5xjh-b4em. gro -p em-sol.tpr-r 5xjh-b4em.gro generates the tpr file again

Step 4: Operating energy minimization:

gmx mdrun -v -deffnm em-sol -nt 24

When energy minimization is over, you will see the following summary in the log file, indicating that the steepest descent converges

Steepest Descents:

Tolerance (Fmax) = 2.50000e+02

Number of steps = 5000

Step= 0, Dmax= 1.0e-02 nm, Epot= -6.33612e+04 Fmax= 3.90439e+05, atom= 3703

Step= 1, Dmax= 1.0e-02 nm, Epot= -7.90314e+04 Fmax= 1.08842e+05, atom= 3703


Step= 3353, Dmax= 1.9e-03 nm, Epot= -2.13599e+05 Fmax= 2.44805e+02, atom= 64

writing lowest energy coordinates.

Steepest Descents converged to Fmax < 250 in 3354 steps

Potential Energy = -2.1359919e+05

Maximum force = 2.4480508e+02 on atom 64

Norm of force = 8.9025946e+00

NOTE: 18 % of the run time was spent in pair search,you might want to increase nstlist (this has no effect on accuracy) gcq#101: "The World is a Friendly Place" (Magnapop)

Note: The default salt used by Genion is NaCl. If you need to use different cations and anions, you can use -pName and -nname to specify the names of anions and ions respectively (according to the set of ions in the corresponding field field in the.itP file), and you can also use -PN and -NN to specify the number of added anions, respectively.

Step 6: Location-restricted preequilibrium simulation

We now need to do a position-limiting simulation of the whole system, which is to relax the solvent and the ions while keeping the protein atoms where they are. Location-limiting simulations limit (or partially freeze) the position of the atoms in the macromolecules and allow the solvent molecules to move, as if immersing the macromolecules in water would further balance the system.

The relaxation time of water molecules is about 10ps, and large systems (large proteins or lipids) may require longer equilibrium times, so we need to conduct positional limiting simulations over 10ps (at least an order of magnitude longer). We will perform two stage location-constrained simulations: NVT ensemble balance of 100 ps and NPT ensemble balance of 100 ps. The temperature we used for the simulation was 300 K, which is close to room temperature for most experimental conditions. Some people do simulations at 310 kelvin, which is closer to body temperature or physiological temperature.

(For the parameter file OF NVT position restriction simulation, see the attachment for the contents of nvt-pr-md.mdp)

Preprocess the file and run the NVT simulation

gmx grompp -f nvt-pr-md. mdp -c em-sol.gro -p -o nvt-pr.tpr -r em-sol.gro

gmx mdrun -v -deffnm nvt-pr

(See the attachment for NPT position limiting simulation parameter file NPT PR Md.MDP)

Use the structure of the NVT simulation output as the initial structure preprocessing file for the NPT simulation, and then run the simulation

gmx grompp -f npt-pr-md.mdp -c nvt-pr.gro -p -o npt-pr.tpr -r nvt-pr.gro

gmx mdrun -v -deffnm npt-pr

Step 7: Simulate the finished product

Now that the system is pre-balanced, we can begin to simulate the final product.

(See the attachment for NPT noPR-Md.MDP parameter file used for NPT simulation)

Handling documents: gmx grompp -f npt-nopr-md.mdp -c npt-pr.gro -p -o npt-nopr.tpr -r npt-pr.gro

Product simulation: gmx mdrun -v -s npt-nopr.tpr -deffnm npt-nopr -nt 42

Various result files with the file name NPT - NOPR will be obtained after the simulation, among which the track file (TRR file) is the most important, which is the basis for analyzing the simulation results.

You can use the trjconv command to compress the TRR track file, which saves hard disk space, and the XTC file is faster for analysis. In fact, setting nstxtcout = 500 has made the resulting track file a little smaller. To display the trajectory, you might want to use the -pbc nojump option to keep all the atoms in the box.

gmx-5.x: gmx trjconv -f npt-nopr.trr -s npt-nopr.tpr -o npt-nopr.xtc -pbc nojump -ur compact -center

When prompted to select, select 0 both times.

After obtaining the xtc file, you can delete the trr file if speed and force are not required for subsequent analysis

Simulation results analysis Results

g_rms calculates the root mean square deviation RMSD

These programs are useful for measuring root mean square deviations in the structure. Use g_rmsto evaluate the deviation of the structure from the original starting structure over the course of thesimulation gmx-5.x: gmx rms -s npt-nopr.tpr -f npt-nopr.xtc -o bkbone-rmsd.xvg. When prompted, select the 4 (Backbone) program to perform a least squares fit to the structure and give the RMSD change over time.

g_rmsf calculates root mean square fluctuation RMSF and temperature factor

The g_rmsf program is used to calculate the root mean square fluctuation (RMSF) of the atomic position. The formula for the root mean square fluctuation is as follows: RMSF= ,where rt is the conformation at a certain moment, rref is the reference conformation, and T is the total time. Both RMSF and RMSD are the quantities that characterize structural changes. The equations of the two are similar and their functions are similar. The difference is that RMSD is the average of the total number of atoms, and RMSF is the average of time. In GROMACS, g_rms gets the curve with time, g_rmsf gets the curve with atomic number (you can use the -res option to get the curve with amino acid residues). Similar to g_covar, g_rmsf can also be used to calculate the average Structure, and the calculation is faster. In actual research, it is usually not necessary to calculate the average structure of the entire trajectory, only the average structure after equilibrium is calculated, and a relatively stable time range for the structure can be selected according to the RMSD diagram (calculated by g_rms) For example, to calculate the average structure within 500 ps, the following command can be used: gmx-5.x: gmx rmsf -s npt-nopr.tpr -f npt-nopr.xtc -b 500 -o rmsf.xvg -ox avg.pdb. When prompted, select 1 (Protein) *Note: The average structure is often rough and requires further energy minimization.

g_rmsf is used to calculate the B-factor of each residue, and the obtained B-factor can be compared with the b-factor of the X-ray crystal structure.

gmx-5.x: gmx rmsf -s npt-nopr.tpr -f npt-nopr.xtc -o rmsf.xvg -ox avg.pdb -res -oq bfac.pdb When prompted, select 1 Protein Note: you will get bfac .pdb load PyMOL, click Hide->everything>Show->cartoon>Color->spectrum->b-factors in turn, if you want to specify the maximum cutoff value of B-factor, you can use: Q = Xcmd.alter("all" , "q = b> Q and Q or b")spectrum q. Note: X is the maximum cutoff value of B-factor

Use g_gyrate to measure the radius of gyration

This quantity gives a measure of the“compactness” of the structure. This gives a measure of the mass of the atom(s) relative the centerof mass of the gmx gyrate -s npt-nopr.tpr -f npt-nopr.xtc -o gyrate.xvg. Select 1 protein.

Use g_sas to calculate solvent accessible surface area

The hydrophobicity of amino acid residues is an important factor affecting protein folding. Solvent accessible surface area (SASA) is an important parameter describing the hydrophobicity of proteins. g_sas can be used to calculate the solvent accessible surface area of proteins: gmx-5.x: gmx sasa -s npt-nopr.tpr -f npt-nopr.xtc -o area.xvg -or resarea.xvg -oa atomarea.xvg. Select 1 when prompted.

Use g_hbond to statistic hydrogen bond

The g_hbond program can be used to calculate the number of hydrogen bonds between molecules or groups and the distribution of hydrogen bond distances or angles during the simulation. gmx-5.x: gmx hbond -s npt-nopr.tpr -f npt-nopr.xtc -num fws_hnum.xvg. When prompted, select (Protein) and (protein). The default judgment criteria for hydrogen bonds in GROAMCS are as follows: r≤0.35nmα≤30∘ You can use the -r and -a options to set other thresholds. By default, g_hbond calculates donor receptors The distance rDA can be set to calculate the rHA distance with the -da no option.

Use g_covar for principal component analysis principal component analysis

The PCA (Principal Components Analysis) method can help us determine which motion modes contribute the most to the overall dynamics of the protein. In a system containing N atoms, there are 3N-6 possible internal motion modes (6 degrees of freedom are required to describe The external rotation and translation of the system). For PCA analysis, we are mainly concerned with the atoms on the protein backbone.

gmx-5.x: gmx covar -s npt-nopr.tpr -f npt-nopr.xtc -o eigenval.xvg -v eigenvect.trr -xpma covara.xpm

For both stacking and analysis, select Group 4 (Protein backbone) to use xpm2ps to make a picture of the atomic covariance matrix, where the -do option will output a configuration file for modifying the settings of the drawing (axis title, legend, etc.).gmx -5.x: gmx xpm2ps -f covara.xpm -o covara.eps -do covara.m2p Use ghostview (or PhotoShop) to view the resulting picture (gv covara.eps). Use xpm2ps -rainbow option to use other colors Scheme. In a typical situation, only a few eigenvectors (fluctuation patterns) have a significant contribution to the overall movement of the protein. To view the most important patterns (1), you can use the following command: gmx-5.x : gmx anaeig -v eigenvec.trr -s npt-nopr.tpr -f npt-nopr.xtc -first 1 -last 1 -nframes 100 -extr fws-ev1.pdbgmx-5.x: gmx anaeig -s npt-nopr .tpr -f npt-nopr.xtc -first 1 -last 2 -2d proj-1-2.xvg.


Autodock vina

In order to improve the thermal stability of PETase, we need to analyze the interaction forces in the protein. But after the combination of PET, the structure of PETase may change, and these structural changes may affect various forces in the protein. Therefore, in order for our analysis to be closer to the real situation, we need a PET and PETase Complex model. However, in the PDB database (RCSB), we did not find a ready-made model, so we need to build a model by molecular docking. In order to ensure the accuracy of this model, we refer to the parameter settings in a document: Structural insight into molecular mechanism of poly (ethylene terephthalate) degradation.


Before we do docking, we need to understand how computers do docking.

The biological basis of molecular docking

For docking, space matching and energy matching must be met. Spatial matching is a structural fit. There are three main aspects of energy matching, namely, the following conditions are met at the junction of the receptor (such as PETase) and the ligand (such as PET): the electrostatic interactions match each other (positive charge corresponds to negative charge), The hydrogen bonding interactions match each other (the hydrogen bond donor corresponds to the acceptor), and the hydrophobic interactions match each other (the hydrophobic region corresponds to the hydrophobic region). The determination of the binding strength of the receptor and the ligand depends on the free energy of binding. The software mentioned below is also based on this biological principle.

Docking with software

After trying several docking software, we chose autodock vina because it is easy to run, has high accuracy, and is not prone to bugs.

Next, the process of using autodock vina for docking is explained. First, the computer needs to hydrogenate the ligand and the receptor and calculate the charge respectively before docking, and then manually set the irreversible bond for the ligand and the flexible residue for the receptor. . After that, the most critical step is to delineate a box range on the receptor, so that the docking software program can search for the conformation of the ligand within the box range, and then score according to the different conformation, orientation, position and energy of the ligand, and finally Sort the results. The accuracy of Box settings will affect the accuracy of the docking results.