Team:TU Darmstadt/Model/Rosetta Guide


Rosetta Guide

Rosetta is a console-based software suite for solving a range of computational macromolecular problems, such as de novo protein design, enzyme design, ligand docking, and structure prediction of biological macromolecules and macromolecular complexes. Because it is hard for beginners to get started, we want to provide a guide for everyone taking their first steps with Rosetta. This guide closely follows the Rosetta Documentation, which can be confusing for beginners, so we added our own experience and some extra steps that may not be obvious to newcomers.

Installation

Unfortunately, Rosetta does not come with a graphical installer that sets up the program in a few clicks. Instead, you need to download the source code, unpack it in the location where you want to install Rosetta, and compile it with a C++ compiler. Rosetta can be built on Linux and macOS, but not natively on Windows. Luckily, Windows offers the Windows Subsystem for Linux, and you can easily download a distribution such as Ubuntu from the Microsoft Store. Alternatively, you can install virtual machine software such as VirtualBox and set up Linux manually.

After you have downloaded the source code, change to the directory that contains it using the "cd" command. Simply add the path you want to change to after "cd", for example "cd iGEM" if you want to enter a directory called "iGEM". To look at the contents of the directory you are currently in, type "ls". Once you have found your rosetta[releasenumber].tar.gz file, unpack it with the "tar" command and change to the rosetta[releasenumber]/main/source folder. An example of unpacking Rosetta is shown below.
#Assuming you downloaded the Rosetta source code to the directory "Software" on your C: drive and are working in the Windows Subsystem for Linux, you would unpack it like this:
>cd /mnt/c/Software
>tar -xf rosetta[releasenumber].tar.gz   #This may take a while
>cd rosetta[releasenumber]/main/source
#Replace [releasenumber] with the Rosetta version you downloaded
We recommend gcc as a compiler because it provides good performance and can easily be installed using apt-get. Rosetta is written in C++, so the C++ front end g++ is needed as well. After you have installed the compiler, you can continue with the main Rosetta installation.
#sudo grants superuser privileges, which are needed to install packages. You will be asked for your password before the installation begins.
>sudo apt-get install gcc g++
Rosetta provides a Python script that locates your installed compiler and then compiles the program automatically. Run the following command from the source folder of Rosetta.
#The -j flag defines the number of cores that g++ or MPICC can use to compile the source code; replace [number_of_cores] with that number. The more cores you use, the shorter the compilation takes. Keep in mind that compiling Rosetta takes a long time: on a single core it will take more than 24 h.
>./scons.py -j [number_of_cores] mode=release bin
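If you prefer not to look up the core count by hand, the compile command can be assembled in a few lines of Python. This is only a sketch: the helper name build_scons_command is our own and not part of Rosetta, and it assumes the command is later run from the main/source folder.

```python
import os


def build_scons_command(jobs=None, extras=None):
    """Return the scons call as an argument list (hypothetical helper)."""
    if jobs is None:
        # Use every core the operating system reports; fall back to one job.
        jobs = os.cpu_count() or 1
    command = ["./scons.py", "-j", str(jobs), "mode=release"]
    if extras:
        # For example extras="mpi" for a build with MPI support.
        command.append("extras=" + extras)
    command.append("bin")
    return command


if __name__ == "__main__":
    # Only print the command, so that nothing is compiled by accident.
    print(" ".join(build_scons_command()))
```

Printing the command first lets you check it before starting a build that may run for many hours.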
After the compilation has finished, a bin directory should have been created in your source folder. You can check whether the installation was successful by running one of Rosetta's many applications from the bin folder. To do this, type the path to the program into the command prompt. If you want extras such as MPI added to your installation, you can add an additional flag to your scons command. MPI is useful for big modelling runs because it allows multiple Rosetta tasks to run on multiple cores. You can see all options in the Rosetta documentation.
#Checking whether the installation was successful: if the error shown after starting the application comes from Rosetta itself, the installation succeeded.
>$PATH_TO_ROSETTA$/main/source/bin/AbinitioRelax.default.linuxgccrelease
ERROR: Error: can't read sequence! Use -in::file::fasta sequence.fasta or -in::file::native native.pdb!
ERROR:: Exit from: src/protocols/abinitio/AbrelaxApplication.cc line: 372
[ ERROR ]: Caught exception:
File: src/protocols/abinitio/AbrelaxApplication.cc:372
[ ERROR ] UtilityExitException
ERROR: Error: can't read sequence! Use -in::file::fasta sequence.fasta or -in::file::native native.pdb! 
#The internal Rosetta error shows us that Rosetta is installed.

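#Rosetta can be compiled with MPI support by adding the extras=mpi flag to your scons command ([number_of_cores] is the number of cores used for compiling):
>./scons.py -j [number_of_cores] mode=release extras=mpi bin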
Now you have successfully installed Rosetta and can continue with the tutorials either on our Wiki page, the Meiler Lab page or the Rosetta Documentation itself.

Rosetta CM

1. BLAST search and alignment

RosettaCM[1] is a powerful tool for structure prediction by comparative modelling. To model the structure of a protein sequence, you need to find 3D structures of proteins that are highly homologous to your target sequence. These structures can be found with BLAST applications such as NCBI blastp by searching the PDB database with your target sequence. The hits can then be aligned to your target sequence using the Clustal Omega web server. Afterwards, you have to convert the alignments to the Grishin format used by Rosetta. Below is an example of our target sequence EreB aligned with PDB entry 2RAD_A.
## WP_063844486 2rad.pdb
#
scores_from_program: 0
0 ---------------------------------------MRFEEWVKDKHIPFKLNHPDDNYDDFKPLRKIIGDTRVVALGENSHFIKEFFLLRHTLLRFFIEDLGFTTFAFEFGFAEGQIINNWIHGQGTDDEIGRFLKHFYYPEELKTTFLWLREYNKAA--KEKITFLGIDIPRNGGSYLPNMEIVHDFFRTADKEALHIIDDAFNIAKKIDYFSTSQAALNLHELTDSEKCRLTSQLARVKVRLEAMAPIHIEKYGIDKYETILHYANGMIYLDYNIQAMSGFISG----GGMQG-DMGAKDKYMADSVLWHLKNPQSEQKVIVVAHNAHIQKTPILYDGFLSCLPMGQRLKNAIGDDYMSLGITSYSGHTAALYPEVDTKYGFRVDNFQLQEPNE-----GSVEK------AISGCGVTNSFVFFRNIPEDLQSIPNMIRFDSIYMKAELEKAFDGIFQIEKSSVSEVVYE-------
0 MKKKIIIAIVASAITMTHFVGNTYADSKTEVSVTAPYNTNQIAKWLEAHAKPLKTTNPTASLNDLKPLKNMVGSASIVGLGEATHGAHEVFTMKHRIVKYLVSEKGFTNLVLEEGWDRALELDRYVLTGK--GNPSQHLTPVFKTKEMLDLLDWIRQYNANPKHKSKVRVIGMDIQSVNENVYNN---IIEYIKANNSKLLPRVEEKIKG---LIPVTK--DMNTFESLTKEEKEKYVLDAKTISALLEENKS----------YLN--GKSKEFAWIKQNARIIEQFTTMLATPPDKPADFYLKHDIAMYENAKWTEEH--L-GKTIVWGHNGHVSKTNML--SFIYPKVAGQHLAEYYGKRYVSIGTSVYEGQYNVKNSDGEFG---PYGTLKSDDPNSYNYIFGQVKKDQFFIDLRKANGVTKTWLN--EQHPIFAGITTEGPDIPKTVDISLGKAFDILVQIQKVSPSQVHQLEHHHHHH
#The first sequence is the target sequence, the second is the template sequence. Note that each sequence must be written on a single line and must start with a 0 followed by a space. Further information on the Grishin format can be found in the Rosetta Documentation.
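Writing such a file by hand is error-prone, so the conversion can be scripted. The following Python sketch reproduces the five-line layout of the example above from two already aligned sequences; the function name to_grishin is our own, and the sketch does not cover anything beyond what is shown in this guide.

```python
def to_grishin(target_name, template_name, target_aln, template_aln):
    """Format one pairwise alignment in the layout shown above (sketch)."""
    # Both rows come from the same alignment, so they must be equally long.
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    lines = [
        "## {} {}".format(target_name, template_name),
        "#",
        "scores_from_program: 0",
        "0 " + target_aln,   # target sequence on a single line
        "0 " + template_aln,  # template sequence on a single line
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy sequences, only to show the output format.
    print(to_grishin("target", "template.pdb", "MK-AL", "MKTAL"), end="")
```

The length check also catches a common copy-and-paste problem: word processors sometimes turn a double hyphen into a single long dash, which silently shortens one of the rows.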
Now you can use Rosetta's partial_thread application to thread the target sequence onto the structure of your template.
#Threading onto the template structures. This step is performed for every structure used as a template. You need to provide the template's structure as a PDB file, the target's sequence as a FASTA file, the alignment of the two and the path to the Rosetta database.
>$PATH_TO_ROSETTA$/main/source/bin/partial_thread.default.linuxgccrelease -database $PATH_TO_ROSETTA$/main/database/ -in:file:fasta "target_structure.fasta" -in:file:alignment "alignment.grishin" -in:file:template_pdb "template.pdb" -ignore_unrecognized_res
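Since the threading step is repeated for every template, it is convenient to generate the calls in a loop. The sketch below only builds the argument lists with the flags used in the command above; the function name and the example file names are placeholders of our own.

```python
def partial_thread_commands(rosetta_path, fasta, alignment, template_pdbs):
    """Build one partial_thread call per template structure (sketch)."""
    app = rosetta_path + "/main/source/bin/partial_thread.default.linuxgccrelease"
    database = rosetta_path + "/main/database/"
    commands = []
    for pdb in template_pdbs:
        commands.append([
            app,
            "-database", database,
            "-in:file:fasta", fasta,
            "-in:file:alignment", alignment,
            "-in:file:template_pdb", pdb,
            "-ignore_unrecognized_res",
        ])
    return commands


if __name__ == "__main__":
    # Placeholder paths and file names; adjust them to your own setup.
    for command in partial_thread_commands(
        "/path/to/rosetta", "EreB.fasta", "alignment.grishin",
        ["2qgm.pdb", "2rad.pdb", "3b55.pdb"],
    ):
        print(" ".join(command))
```

Each list can be passed to subprocess.run(command, check=True) once you have checked the printed commands.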

2. XML

To enhance the model's precision, structure fragments can be generated using the old Robetta web server. The main script for Rosetta is a so-called XML file, in which you list all the steps Rosetta should perform.
#This example XML script is similar to the one provided on the Rosetta Documentation site.
    
<ROSETTASCRIPTS>
    <TASKOPERATIONS>
    </TASKOPERATIONS>
    <SCOREFXNS>
        <ScoreFunction name="stage1" weights="score3" symmetric="0">
            <Reweight scoretype="atom_pair_constraint" weight="0.5"/>
        </ScoreFunction>
        <ScoreFunction name="stage2" weights="score4_smooth_cart" symmetric="0">
            <Reweight scoretype="atom_pair_constraint" weight="0.5"/>
        </ScoreFunction>
        <ScoreFunction name="fullatom" weights="talaris2013_cart" symmetric="0">
            <Reweight scoretype="atom_pair_constraint" weight="0.5"/>
        </ScoreFunction>
    </SCOREFXNS>
    <FILTERS>
    </FILTERS>
    <MOVERS>
        <Hybridize name="hybridize" stage1_scorefxn="stage1" stage2_scorefxn="stage2" fa_scorefxn="fullatom" batch="1" stage1_increase_cycles="1.0" stage2_increase_cycles="1.0" linmin_only="1">
            <Fragments three_mers="aat000_03_05.200_v1_3" nine_mers="aat000_09_05.200_v1_3"/>
            <Template pdb="2qgm.pdb.pdb" cst_file="AUTO" weight="1.000" />
            <Template pdb="2rad.pdb.pdb" cst_file="AUTO" weight="1.000"/>
            <Template pdb="3b55.pdb.pdb" cst_file="AUTO" weight="1.000"/>
        </Hybridize>
    </MOVERS>
    <APPLY_TO_POSE>
    </APPLY_TO_POSE>
    <PROTOCOLS>
        <Add mover="hybridize"/>
    </PROTOCOLS>
</ROSETTASCRIPTS>
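A run that starts and then stops because a template or fragment file is missing costs time, so it can help to list the files the script refers to beforehand. The following Python sketch reads the XML with the standard library and collects the file names from the Template and Fragments tags shown above; the function name is our own.

```python
import xml.etree.ElementTree as ET


def hybridize_inputs(xml_text):
    """Return (template PDBs, fragment files) named in the XML script."""
    root = ET.fromstring(xml_text)
    # Every <Template pdb="..."/> entry names one threaded template.
    templates = [t.get("pdb") for t in root.iter("Template") if t.get("pdb")]
    fragments = []
    for frag in root.iter("Fragments"):
        for key in ("three_mers", "nine_mers"):
            if frag.get(key):
                fragments.append(frag.get(key))
    return templates, fragments


if __name__ == "__main__":
    example = (
        '<ROSETTASCRIPTS><MOVERS><Hybridize name="hybridize">'
        '<Fragments three_mers="aat000_03_05.200_v1_3" nine_mers="aat000_09_05.200_v1_3"/>'
        '<Template pdb="2rad.pdb.pdb" cst_file="AUTO" weight="1.000"/>'
        "</Hybridize></MOVERS></ROSETTASCRIPTS>"
    )
    print(hybridize_inputs(example))
```

The returned names can then be compared with the contents of your working directory.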

3. Options

Additionally, you have to tell Rosetta where to find the XML file and the target sequence, and set other options, for example for the final structure relaxation. You can pick whatever options are suitable for your simulation and simply add them to a text file that you pass to the Rosetta application on the command line.
#This is an example of an options file. You have to add the values, such as file names, after the flags (a flag starts with a - followed by its name, e.g. -nstruct). You can also provide this information directly on the command line, but it is easier to keep it in an options file. Let's assume we named our file "options".
# i/o
-in:file:fasta EreB.fasta
-parser:protocol ereb_hybridization.xml
-nstruct 1
-database /mnt/c/Software/rosetta_src_2020.08.61146_bundle/main/database
#-seed_offset  	# seed offset if necessary

# relax options
-relax:minimize_bond_angles
-relax:minimize_bond_lengths
-relax:jump_move true
-default_max_cycles 200
-relax:min_type lbfgs_armijo_nonmonotone
-hybridize:stage1_probability 1.0
-restore_talaris_behavior

# reduce memory footprint
-chemical:exclude_patches LowerDNA  UpperDNA Cterm_amidation SpecialRotamer VirtualBB ShoveBB VirtualDNAPhosphate VirtualNTerm CTermConnect sc_orbitals pro_hydroxylated_case1 pro_hydroxylated_case2 ser_phosphorylated thr_phosphorylated  tyr_phosphorylated tyr_sulfated lys_dimethylated lys_monomethylated  lys_trimethylated lys_acetylated glu_carboxylated cys_acetylated tyr_diiodinated N_acetylated C_methylamidated MethylatedProteinCterm

#Now the main simulation can be started using the rosetta_scripts application, which reads the XML file and executes the steps it lists:
>$PATH_TO_ROSETTA$/main/source/bin/rosetta_scripts.linuxgccrelease @options -database $PATH_TO_ROSETTA$/main/database
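Options files tend to grow by copy and paste, and it is easy to lose track of what is in them. The following Python sketch reads a flat options file like the one above and reports flags that occur more than once. It assumes that "#" always starts a comment, so it would cut off a file name that contains this character; both function names are our own.

```python
def parse_flags(text):
    """Parse a flat options file into (flag, values) pairs (sketch)."""
    pairs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue  # skip blank and comment-only lines
        parts = line.split()
        pairs.append((parts[0], parts[1:]))
    return pairs


def duplicate_flags(pairs):
    """Return every flag that appears more than once, in order of appearance."""
    seen, dupes = set(), []
    for flag, _values in pairs:
        if flag in seen and flag not in dupes:
            dupes.append(flag)
        seen.add(flag)
    return dupes


if __name__ == "__main__":
    example = "# i/o\n-in:file:fasta EreB.fasta\n-nstruct 1\n-nstruct 5\n"
    print(duplicate_flags(parse_flags(example)))
```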

Rosetta Docking

The Rosetta docking tool is a powerful method for evaluating enzyme-ligand interactions. Here, we focus on the basic commands and files that are needed to perform small molecule docking simulations[2].
Before you are able to perform the docking itself, you will need to have six different files in your working directory:

  • enzyme.pdb: the PDB file of the enzyme that you want your ligand to dock to
  • ligand.pdb: the respective PDB file of your ligand
  • ligand.params: a parameter file that contains the information Rosetta needs to process the ligand
  • ligand_conformers.pdb: a conformational library of your ligand
  • options.txt: a text file from which Rosetta gathers all information about the simulation, such as input, output or the number of structures you want to create
  • dock.xml: an XML file that specifies the sequence of Rosetta functions that will act on the enzyme and the ligand

The options file as well as the xml-file can be modified in a multitude of ways to fit specific needs for individual docking runs.
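Before starting a run, you can let a small script confirm that all six files are present. This Python sketch uses exactly the file names from the list above; if you named your files differently, pass your own list. The function name is our own.

```python
from pathlib import Path

# File names exactly as listed in this guide.
REQUIRED_FILES = [
    "enzyme.pdb",
    "ligand.pdb",
    "ligand.params",
    "ligand_conformers.pdb",
    "options.txt",
    "dock.xml",
]


def missing_inputs(workdir, required=REQUIRED_FILES):
    """Return the required files that are not present in workdir."""
    folder = Path(workdir)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs(".")
    print("all inputs found" if not missing else "missing: " + ", ".join(missing))
```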

1. Preparing the Ligand

First of all, you need an SDF file of the ligand. It can be downloaded from the PDB or from most chemical databases. To create the conformer library, we used the conformer generator of the Meiler lab's BioChemical Library (BCL), which is freely available for academic users. Once BCL is installed on your system, the library can be created with
 
~path_to_bcl/bcl.exe molecule:ConformerGenerator -ensemble_filenames ligand.sdf -conformers_single_file ligand_conformers.sdf 

This gives you an SDF file that contains the generated conformations of the ligand. Now you can create the ligand.params file with Rosetta's molfile_to_params.py script:
 
~path_to_Rosetta/main/source/scripts/python/public/molfile_to_params.py -n ligand -p ligand --conformers-in-one-file ligand_conformers.sdf 
This gives you the ligand_conformers.pdb file, the ligand.pdb file and the ligand.params file.
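It is worth checking how many conformers the library actually contains before it is converted. In an SDF file every molecule record ends with a line consisting of "$$$$", so counting these lines gives the number of conformers. The following Python sketch does this; the function name is our own and the file name is the one used above.

```python
def count_sdf_records(sdf_text):
    """Count molecule records; each record ends with a '$$$$' line."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


if __name__ == "__main__":
    import sys

    # File name as used in this guide; pass another path as first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "ligand_conformers.sdf"
    try:
        with open(path) as handle:
            print(count_sdf_records(handle.read()), "conformers in", path)
    except FileNotFoundError:
        print("file not found:", path)
```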

2. Preparing the Enzyme

With Rosetta's clean_pdb.py script, you can fetch the enzyme structure directly from the PDB with only the necessary information, provided you know the enzyme's PDB code. The "A" option tells the script to process only chain A of the enzyme.
 
~path_to_Rosetta/tools/protein_tools/scripts/clean_pdb.py enzyme-code A 

3. Options

To find the input files, the XML file and other information, Rosetta needs an options text file. Since there are countless options that can be set, we only show the options file that we used. For further information on the possible options, please read the Rosetta Documentation.
 
#s: imports the protein and ligand PDB structures 
#extra_res_fa: imports the parameters for the ligand 
#path: specifies the output directory (if none is given, the structures are written to the working directory) 
#nstruct: specifies the number of structures that you want to create 

-in 
        -file 
                -s 'enzyme-code_A.pdb ligand.pdb' 
                -extra_res_fa ligand.params 
-out 
        -path 
                -pdb ~/path_to_output 
-nstruct number_of_structures 

#the packing options allow Rosetta to sample additional rotamers for the 
#protein side-chain angles chi 1 (ex1) and chi 2 (ex2) 
#no_optH false tells Rosetta to optimize hydrogen placements 
#flip_HNQ tells Rosetta to consider HIS, ASN and GLN hydrogen flips 
#ignore_ligand_chi prevents Rosetta from adding additional ligand rotamers 

-packing 
        -ex1 
        -ex2 
        -no_optH false 
        -flip_HNQ true 
        -ignore_ligand_chi true 

#parser:protocol locates the XML file for RosettaScripts 
 
-parser 
        -protocol dock.xml 

#jd2: only needed if you are using mpi to parallelize your simulations   

-jd2 
        -mpi_work_partition_job_distributor 

#overwrite allows Rosetta to write over previous structures and scores 

-overwrite 

#This flag restores certain parameters to previously published values 

-mistakes 
        -restore_pre_talaris_2013_behavior true 
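The indented layout above groups flags by namespace, while the options file in the RosettaCM section writes the same kind of flag on one line with colons (for example -in:file:fasta). To make the relation visible, the following Python sketch flattens an indented file into the one-line form. It is an illustration under the assumption that indentation nests namespaces in the way the layout above suggests, and it is not Rosetta's own parser; the function name is our own.

```python
def flatten_flags(text):
    """Turn indentation-nested flags into fully qualified -a:b:c flags."""
    entries = []
    for raw in text.expandtabs(8).splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        indent = len(raw) - len(raw.lstrip())
        entries.append((indent, stripped))

    flat, stack = [], []  # stack holds (indent, namespace) pairs
    for i, (indent, line) in enumerate(entries):
        while stack and stack[-1][0] >= indent:
            stack.pop()  # leave namespaces that are not parents of this line
        flag, _, value = line.partition(" ")
        value = value.strip()
        name = flag.lstrip("-")
        next_indent = entries[i + 1][0] if i + 1 < len(entries) else -1
        if not value and next_indent > indent:
            # A line without a value that is followed by deeper lines
            # is treated as a namespace such as -in or -file.
            stack.append((indent, name))
        else:
            full = ":".join([ns for _, ns in stack] + [name])
            flat.append(("-" + full + " " + value).strip())
    return flat


if __name__ == "__main__":
    example = "-parser\n        -protocol dock.xml\n-overwrite\n"
    print("\n".join(flatten_flags(example)))
```

For the parser block above this prints -parser:protocol dock.xml, the same spelling that the RosettaCM options file uses.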

4. XML file

This is the heart of the docking simulation: the script that tells Rosetta which actions to perform. Similar to the options-file you can alter the xml-file to fit your individual simulations and we will only display the file that we used.
 

<ROSETTASCRIPTS> 
               <SCOREFXNS>
                        <ScoreFunction name="ligand_soft_rep" weights="ligand_soft_rep"> 
                        </ScoreFunction> 
                        <ScoreFunction name="hard_rep" weights="ligand"> 
                        </ScoreFunction> 
               </SCOREFXNS>
               <LIGAND_AREAS> 
                        <LigandArea name="inhibitor_dock_sc" chain="X" cutoff="6.0" add_nbr_radius="true" all_atom_mode="false"/> 
                        <LigandArea name="inhibitor_final_sc" chain="X" cutoff="6.0" add_nbr_radius="true" all_atom_mode="false"/>
                        <LigandArea name="inhibitor_final_bb" chain="X" cutoff="7.0" add_nbr_radius="false" all_atom_mode="true" Calpha_restraints="0.3"/> 
               </LIGAND_AREAS>
               <INTERFACE_BUILDERS>
                        <InterfaceBuilder name="side_chain_for_docking" ligand_areas="inhibitor_dock_sc"/> 
                        <InterfaceBuilder name="side_chain_for_final" ligand_areas="inhibitor_final_sc"/>      
                        <InterfaceBuilder name="backbone" ligand_areas="inhibitor_final_bb" extension_window="3"/> 
               </INTERFACE_BUILDERS>
               <MOVEMAP_BUILDERS> 
                        <MoveMapBuilder name="docking" sc_interface="side_chain_for_docking" minimize_water="false"/> 
                        <MoveMapBuilder name="final" sc_interface="side_chain_for_final" bb_interface="backbone" minimize_water="false"/>
               </MOVEMAP_BUILDERS>
               <SCORINGGRIDS ligand_chain="X" width="15"> 
                        <ClassicGrid grid_name="classic" weight="1.0"/>
               </SCORINGGRIDS>
               <MOVERS> 
                        <Transform name="transform" chain="X" box_size="7.0" move_distance="0.2" angle="20" cycles="500" repeats="1" temperature="5"/> 
                        <HighResDocker name="high_res_docker" cycles="6" repack_every_Nth="3" scorefxn="ligand_soft_rep" movemap_builder="docking"/> 
                        <FinalMinimizer name="final" scorefxn="hard_rep" movemap_builder="final"/> 
                        <InterfaceScoreCalculator name="interface_score_calculator" chains="X" scorefxn="hard_rep"/> 
               </MOVERS>
               <PROTOCOLS> 
                        <Add mover_name="transform"/>
                        <Add mover_name="high_res_docker"/>
                        <Add mover_name="final"/> 
                        <Add mover_name="interface_score_calculator"/> 
               </PROTOCOLS> 
</ROSETTASCRIPTS> 
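When you start changing this file, a frequent mistake is to rename a mover in the MOVERS section but not in the PROTOCOLS section. The following Python sketch lists every mover that PROTOCOLS refers to but MOVERS does not define. It only inspects the tag and attribute names that appear in the script above; the function name is our own.

```python
import xml.etree.ElementTree as ET


def undefined_movers(xml_text):
    """Return movers used in PROTOCOLS that are not defined in MOVERS."""
    root = ET.fromstring(xml_text)
    movers = root.find("MOVERS")
    defined = {m.get("name") for m in movers} if movers is not None else set()
    used = []
    protocols = root.find("PROTOCOLS")
    if protocols is not None:
        for step in protocols.iter("Add"):
            # This guide uses both spellings: mover="..." and mover_name="...".
            name = step.get("mover_name") or step.get("mover")
            if name:
                used.append(name)
    return [name for name in used if name not in defined]


if __name__ == "__main__":
    example = (
        '<ROSETTASCRIPTS><MOVERS><Transform name="transform"/></MOVERS>'
        '<PROTOCOLS><Add mover_name="transform"/><Add mover_name="final"/></PROTOCOLS>'
        "</ROSETTASCRIPTS>"
    )
    print(undefined_movers(example))
```

An empty list means that every step in PROTOCOLS has a matching definition.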

5. Docking

Before you jump into the actual simulation, make sure to double-check all your inputs, for example by opening the structures in molecular visualization software such as PyMOL or Chimera. Be aware that Rosetta will not drastically alter the coordinates of the ligand, so the ligand already has to be placed at the site of the enzyme that you want to dock it to. If this is not the case, you can adjust the coordinates within the visualization software and export the new ligand file into your working directory.
Now that all the preparations are done, you can initiate your docking with:
 
~path_to_Rosetta/main/source/bin/rosetta_scripts.linuxgccrelease @options.txt 

Or, if you are using MPI (replace number_of_cores with the number of processes to start):
 
mpirun -np number_of_cores ~path_to_Rosetta/main/source/bin/rosetta_scripts.mpi.linuxgccrelease @options.txt 

6. Scores

The docking simulation gives you not only PDB files named enzyme-code_A_ligand_number-of-struc.pdb but also a score.sc file, which lists every structure together with its scores. The meaning of the individual scores is explained in the Rosetta documentation.
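With many structures, sorting the score file by hand quickly becomes tedious. The following Python sketch reads the lines that start with "SCORE:" (the first of them holds the column names) and ranks the structures by one column, lowest value first. The column name interface_delta_X used in the example is an assumption based on the ligand chain X in our XML file; check the header of your own score.sc for the exact names. The function names are our own.

```python
def read_scores(text):
    """Read score.sc content into a list of dicts keyed by column name."""
    header, rows = None, []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue  # skip lines that are not score lines
        fields = line.split()[1:]
        if header is None:
            header = fields  # the first SCORE: line names the columns
        else:
            rows.append(dict(zip(header, fields)))
    return rows


def rank_by(rows, column):
    """Sort structures by a numeric column, lowest (best) score first."""
    return sorted(rows, key=lambda row: float(row[column]))


if __name__ == "__main__":
    # Made-up numbers, only to show the file layout.
    example = (
        "SEQUENCE: \n"
        "SCORE: total_score interface_delta_X description\n"
        "SCORE: -100.5 -8.2 enzyme_A_ligand_0001\n"
        "SCORE: -120.0 -12.7 enzyme_A_ligand_0002\n"
    )
    for row in rank_by(read_scores(example), "interface_delta_X"):
        print(row["description"], row["interface_delta_X"])
```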