Team:IIT Roorkee/Software

PYOMANCER

Software

Illustration

Inspiration

Our project aims to create a novel engineered protein, based on R-type pyocins, that targets the Priority 1 ESKAPE pathogen A. baumannii.
Our team carried out an extensive literature review to identify exactly which pyocins and bacteriophages could be engineered together.
This process was cumbersome and time-consuming, and we realised there was a need for software that could automate it so that a first model could be built bioinformatically.
We came to know about TAU Israel’s TAIL OR SWIFT software, which was designed for a similar purpose. As we explored it, we realised that the user has to carry out the whole process in individual steps and needs a basic understanding of the science behind engineering this protein. We therefore decided to improve and automate the software further, using the Django web framework, so that anyone can model engineered proteins to fight resistant bacteria. After discussion with our instructors, we decided to integrate the REST APIs of Clustal Omega and JPred, which form the backbone of our software.

Project Description

TailScout is user-friendly software that can be used by pharmaceutical companies aiming to combat AMR with engineered pyocins, or by other iGEM teams building on this project. It is designed to increase users’ productivity by reducing the time spent designing the engineered pyocin bioinformatically, so that they can proceed quickly to wet-lab experiments.

By using our software, pharmacology labs and future iGEM teams will be able to produce stable 3D structures of the engineered proteins with detailed analysis.

The workflow is straightforward, and TailScout's simple GUI makes the complete process effortless:

Select Resistant Bacteria from the given list of Antimicrobial Resistant Pathogens.
TailScout detects the most lytic phage for the chosen bacteria.
The software produces the sequence of engineered pyocin.
The secondary structure file of the engineered pyocin is available for the user to download.

We have uploaded all of our source code to our GitHub. You can access the code there, and it should help future teams build on our work.

Features

Gene Detection
TailScout uses a comprehensive database of bacteriophage tail fiber sequences and a unique algorithm to detect the most lytic phage tail.
MSA and Sequence of Engineered Pyocin
TailScout gives the sequence of the protein by combining the lytic area of the phage with R-type pyocin.
Secondary Structure Prediction
Secondary structure prediction and detailed analysis of the stability of proteins.
OS Independent
Since TailScout is a web-based application, it does not require any specific operating system.

Comparison with Tail-or Swift

As we explored TAIL-OR-SWIFT, we realised that the user has to carry out the whole process in individual steps and needs a basic understanding of the science behind engineering this protein. We also found that Tail-or-Swift is a standalone application built entirely as a Python GUI, that versions are available only for Windows and Mac, and that its secondary structure prediction function requires the input file to be provided manually.
After analysing these shortcomings, we designed our web server to work on any type of system and automated the prediction function, so that the user only has to provide input at the start of the process.

Feature	Tail-Or-Swift	TailScout
Operating System	Windows, Mac	Any
Application Type	Computer Software	Web Application
Secondary Structure Prediction	Manual	Automatic

Why a Web Application and not Standalone Software

While studying the project that inspired us, we followed a design-thinking approach and centred our thought process on empathising and prototyping.
Keeping in mind the varied accessibility constraints that users face, we decided to build our software as a web application so that it can be used on different platforms.
This is more beneficial than regular desktop software, which requires users to install periodic updates and can face compatibility issues across operating systems.

Why Django and not Flask

With the scalability of the web app in mind, we chose Django over Flask.
Django provides its own ORM (object-relational mapping) and uses data models, while Flask does not ship with an ORM or data models of its own and relies on extensions for this.
Data models let us link database tables to classes in a programming language, so that rows can be handled as ordinary objects instead of through raw database queries.
Since our gene detection algorithm needs a phage database, we decided to go with Django.
Also, Flask is generally used for smaller web applications; as we plan to scale up in the future and would probably have to switch to Django then anyway, we chose Django from the start.
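
The idea of linking a database table to a class can be sketched with the Python standard library alone. This is not Django's API and not TailScout's actual schema: the `PhageTail` class, the `phage_tail` table and its columns are assumptions made for illustration. In Django, the same role would be played by a `models.Model` subclass and a queryset `filter` call.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class PhageTail:
    """One row of the hypothetical phage_tail table, seen as an object."""
    phage: str
    host: str
    sequence: str


class PhageTailTable:
    """Tiny hand-written mapping between PhageTail objects and a SQL table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS phage_tail "
            "(phage TEXT, host TEXT, sequence TEXT)"
        )

    def add(self, row: PhageTail) -> None:
        # Parameterised insert: the object's fields become column values.
        self.conn.execute(
            "INSERT INTO phage_tail VALUES (?, ?, ?)",
            (row.phage, row.host, row.sequence),
        )

    def filter_by_host(self, host: str) -> list:
        # Each returned row is turned back into a PhageTail object.
        cur = self.conn.execute(
            "SELECT phage, host, sequence FROM phage_tail "
            "WHERE host = ? ORDER BY phage",
            (host,),
        )
        return [PhageTail(*r) for r in cur.fetchall()]
```

Calling code only ever sees `PhageTail` objects, which is the convenience an ORM provides at a larger scale.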

Algorithms

Gene Name Detection :

An algorithm that searches for genes whose names contain ‘tail’, such as ‘tail fibre protein’ or ‘putative tail fibre’.
To build this algorithm, we first made a local database using Virus-Host DB and then a sequence file of all phage tails for some prominent resistant bacteria.
We wanted a sequence file containing all the phage tail sequences for a particular resistant bacterium, so we split the task into two sub-processes:
- The first function gets the list of all the phages of that bacterium
- The second function gets the corresponding sequences
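
The two-function split described above can be sketched as follows. This is a minimal illustration and not the code in our repository: the `GeneRecord` layout, the function names, the phage names and the short toy sequences are all assumptions, and a real run would read records from the local Virus-Host DB copy instead of an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical record: one annotated gene of one phage."""
    host: str
    phage: str
    gene_name: str
    sequence: str


# Toy data, invented for illustration only.
RECORDS = [
    GeneRecord("Acinetobacter baumannii", "phage_A", "tail fibre protein", "MSTNQ"),
    GeneRecord("Acinetobacter baumannii", "phage_A", "major capsid protein", "MKLVV"),
    GeneRecord("Acinetobacter baumannii", "phage_B", "putative tail fibre", "MAGTW"),
    GeneRecord("Pseudomonas aeruginosa", "phage_C", "Tail fiber protein", "MPLRS"),
]


def phages_for_host(records, host):
    """First step: list every phage recorded for the given bacterium."""
    return sorted({r.phage for r in records if r.host == host})


def tail_sequences(records, phages):
    """Second step: collect sequences of genes whose name contains 'tail'."""
    wanted = set(phages)
    return {
        (r.phage, r.gene_name): r.sequence
        for r in records
        if r.phage in wanted and "tail" in r.gene_name.lower()
    }
```

The name match is case-insensitive here, so annotations such as ‘Tail fiber protein’ would also be picked up; whether that is desirable depends on how the source database writes its gene names.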

Clustal Omega Algorithm [1] :

An algorithm that TailScout uses to determine the similarity between phage sequences.
Progressive aligners of the Clustal family start by computing a rough distance matrix between each pair of sequences.
In ClustalW these distances come from pairwise sequence alignment scores, computed using the pairwise alignment parameters for DNA and protein sequences; Clustal Omega instead uses fast k-tuple distances together with the mBed embedding so that it scales to large sequence sets.
Next, a guide tree is built from the distances (neighbour-joining with midpoint rooting in ClustalW, mBed-based clustering with UPGMA in Clustal Omega) and is used to generate a global alignment; in Clustal Omega the alignment step relies on profile HMMs (HHalign).
The guide tree serves as a rough template for clades that tend to share insertion and deletion features.
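
The first step, a rough all-against-all distance matrix, can be illustrated with a small sketch. The distance used here (one minus the Jaccard similarity of k-mer sets) is a simple stand-in chosen for readability, not the scoring that Clustal Omega itself uses, and the function names are hypothetical.

```python
from itertools import combinations


def kmers(seq, k=2):
    """Set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """Rough distance in [0, 1]: 1 minus the shared fraction of k-mers."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        # Both sequences are shorter than k: treat them as identical.
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def distance_matrix(seqs, k=2):
    """Symmetric nested dict of distances for a {name: sequence} mapping."""
    names = list(seqs)
    matrix = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = kmer_distance(seqs[x], seqs[y], k)
        matrix[x][y] = d
        matrix[y][x] = d
    return matrix
```

A guide tree would then be built by repeatedly joining the closest entries of this matrix; that clustering step is left out of the sketch.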

JPred Jnet neural network model for protein secondary structure prediction [2] :

Jnet is a neural network prediction algorithm that works on multiple sequence alignments together with PSI-BLAST and HMM profiles.
Consensus techniques are applied to the individual predictions to make the final secondary structure prediction more accurate.
Jnet can also predict two-state solvent exposure at 25%, 5%, and 0% relative exposure. Positions where the different prediction methods do not agree are marked as ‘no jury’ positions.
A separate network is applied to these positions, which improves the cross-validated accuracy. A reliability index indicates which residues are predicted with high confidence.
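
The jury idea can be illustrated with a short sketch. It only shows how agreement between per-residue predictions separates consensus positions from ‘no jury’ positions; it is not the Jnet implementation, and the function name, the `?` placeholder and the H/E/- state strings in the test data are assumptions for illustration.

```python
def jury_consensus(predictions):
    """Compare equal-length per-residue state strings position by position.

    Returns the consensus string, with '?' wherever the methods disagree,
    and the list of indices of those 'no jury' positions.
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("need at least one prediction, all of equal length")

    consensus = []
    no_jury = []
    for i, states in enumerate(zip(*predictions)):
        if len(set(states)) == 1:
            # Every method predicts the same state for this residue.
            consensus.append(states[0])
        else:
            # Disagreement: a separate predictor would decide this position.
            consensus.append("?")
            no_jury.append(i)
    return "".join(consensus), no_jury
```

In Jnet the flagged positions are passed to a separate network; in this sketch they are simply reported.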

Flowchart

Testing and Debugging

To check our code for bugs, we prepared a presentation giving the gist of the software and explaining how to run the corresponding code. Through our mentor, we reached out to professors, professionals and some seniors, who helped us debug the code and offered meaningful discussions that paved the way for TailScout.

Next Phase

Plans for the Next Phase of the Project

Increasing the number of multidrug-resistant bacteria in our phage library.
Analysing the results for the secondary and tertiary structure of the fusion protein.
Incorporating protein modelling of the pyocin-bacteriophage fusion protein.

References

[1] Madeira F, Park YM, Lee J, et al. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. 2019 Jul;47(W1):W636-W641. doi: 10.1093/nar/gkz268.
[2] Drozdetskiy A, Cole C, Procter J, Barton GJ. JPred4: a protein secondary structure prediction server. Nucleic Acids Res. 2015 Jul 1;43(W1):W389-94. doi: 10.1093/nar/gkv332. Epub 2015 Apr 16. PMID: 25883141; PMCID: PMC4489285.