Team:IIT Roorkee/Software

PYOMANCER

Software

Overview

Therapeutic strategies targeting antibiotic resistance can aim either to sensitize pathogens to conventional antibiotics or to develop novel antimicrobials to treat infections.

We have developed a machine learning approach called DARG (Detection of Antibiotic Resistant Genes), which uses a support vector machine (SVM)-based algorithm. The algorithm combines pan-genome information from pathogen strains with their resistance phenotypes to find genes that are important for conferring antibiotic resistance. The resulting list of genes includes previously validated target genes as well as novel hits uncovered by the algorithm. In prospective experiments, these genes can be knocked out to try to make a bacterial strain susceptible to a particular antibiotic and thereby validate their function.
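The general idea of ranking genes with an SVM can be illustrated with a minimal sketch. It assumes a gene presence/absence matrix built from the pan-genome (one row per strain, one column per gene) and labels of +1 for resistant and -1 for susceptible strains. It trains a linear SVM with Pegasos-style stochastic subgradient descent and ranks genes by their learned weights. The function names, the toy data, and the weight-ranking step are illustrative assumptions, not DARG's actual implementation.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos-style stochastic subgradient descent.

    X: one row per strain, one column per pan-genome gene (1 = present, 0 = absent).
    y: +1 for resistant strains, -1 for susceptible strains.
    A constant 1.0 feature is appended so the last weight acts as a bias term.
    """
    rows = [list(map(float, x)) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(rows)), len(rows)):  # one shuffled pass over strains
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink step from the L2 regularizer
            if margin < 1:  # hinge loss active: move toward classifying this strain correctly
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w

def rank_genes(genes, weights):
    """Sort genes by weight, largest first; zip() drops the trailing bias weight."""
    return sorted(zip(genes, weights), key=lambda gw: gw[1], reverse=True)

# Hypothetical toy data: geneA is present in every resistant strain and no susceptible one.
genes = ["geneA", "geneB", "geneC"]
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0],   # resistant strains
     [0, 1, 1], [0, 0, 1], [0, 1, 0]]   # susceptible strains
y = [1, 1, 1, -1, -1, -1]
ranking = rank_genes(genes, train_linear_svm(X, y))
print(ranking[0][0])  # geneA
```

Genes with large positive weights push a strain toward the "resistant" side of the decision boundary, which is why they are the natural candidates for knockout experiments. In practice a library SVM would replace the hand-written trainer; it is written out here only to keep the sketch dependency-free.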

We also developed TailScout, a web server for the assembly and secondary-structure prediction of novel Seekercins. Seekercins are novel protein-based antimicrobials, inspired by R-type pyocins and bacteriophages, that can target resistant pathogens. TailScout helps researchers design Seekercins specific to antimicrobial-resistant pathogens by detecting the best combination of pyocin and bacteriophage tail fiber. As its final result, the software produces the sequence of the engineered Seekercin, which can be ordered directly as a DNA product for cloning and further experiments.
Since iGEM promotes open science, all of our code is available on GitHub, making it easier for future iGEM teams to build on our work and embed it in new workflows.

Inspiration

Our project aims to create a novel engineered protein, based on R-type pyocins, that targets the Priority 1 ESKAPE pathogen A. baumannii.
Our team carried out an extensive literature review to identify which pyocin and bacteriophage strains could be engineered together.
This process was cumbersome and time-consuming, and we realised that software was needed to automate it so that a first model could be designed bioinformatically.
We learned of TAU Israel's Tail-or-Swift software, which was designed for a similar purpose. When we explored it, we found that the user had to carry out each step individually and needed a basic understanding of the science behind engineering this protein. We therefore decided to improve on and further automate this process by building a tool with the Django web framework, so that anyone can model engineered proteins to fight resistant bacteria. After discussion with our instructors, we decided to integrate the REST APIs of Clustal Omega and JPred, which form the backbone of our software.

Project Description

TailScout is a user-friendly tool for pharmaceutical companies working to combat AMR with engineered pyocins, and for iGEM teams who want to build on their projects. It is designed to save users time when designing engineered pyocins bioinformatically, so that they can move quickly to wet-lab experiments.

Using TailScout, pharmacology labs and future iGEM teams can obtain the sequence of an engineered pyocin together with its predicted secondary structure and a detailed analysis.

The workflow is straightforward, and TailScout's simple GUI makes the complete process effortless:

- Select a resistant bacterium from the list of antimicrobial-resistant pathogens.
- TailScout detects the most lytic phage for the chosen bacterium.
- The software produces the sequence of the engineered pyocin.
- The predicted secondary structure file of the engineered pyocin can be downloaded.

All of our source code is on our GitHub, where future teams can easily access it and build on our work.

Features

Tail Fiber Detection

TailScout searches a comprehensive database of bacteriophage tail fiber sequences and returns the most lytic phage tail fiber for the selected bacterium.

MSA and Sequence of the Engineered Pyocin

TailScout produces the sequence of the engineered protein by combining the lytic region of the phage tail fiber with the R-type pyocin.
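One way to picture the splicing step is to let the multiple sequence alignment choose the junction. The minimal sketch below assumes the pyocin tail fiber and the phage tail fiber have already been aligned (gaps written as `-`), takes the first run of `window` identical, gap-free columns as the fusion point, and joins the pyocin N-terminal part to the phage C-terminal part. The junction rule, the window size, and the function names are hypothetical; the source does not state how TailScout picks the fusion point.

```python
def find_junction(aln_pyocin, aln_phage, window=5):
    """Return the column just after the first run of `window` identical,
    gap-free alignment columns, or None if no such run exists."""
    if len(aln_pyocin) != len(aln_phage):
        raise ValueError("aligned sequences must have equal length")
    run = 0
    for col, (p, f) in enumerate(zip(aln_pyocin, aln_phage)):
        if p == f and p != "-":
            run += 1
            if run == window:
                return col + 1
        else:
            run = 0
    return None

def fuse(aln_pyocin, aln_phage, junction):
    """Pyocin residues before the junction + phage residues after it, gaps removed."""
    n_term = aln_pyocin[:junction].replace("-", "")
    c_term = aln_phage[junction:].replace("-", "")
    return n_term + c_term

# Hypothetical toy alignment (real tail fibers are hundreds of residues long).
pyo = "MSTLNAYIAKPPPP"
phg = "-GGQQAYIAKRRLL"
j = find_junction(pyo, phg)
print(j, fuse(pyo, phg, j))  # 10 MSTLNAYIAKRRLL
```

Splicing inside a conserved block keeps the local sequence context identical on both sides of the junction, which is one simple heuristic for avoiding a disruptive fusion point.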

Secondary Structure Prediction

TailScout predicts the secondary structure of the engineered pyocin and provides a detailed analysis of the protein's stability.
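The kind of summary a user might read off a three-state prediction can be sketched as follows. The sketch assumes the prediction is a string with one character per residue (`H` helix, `E` strand, `-` coil) and reports the fraction of residues in each state plus helix and strand segments of at least `min_len` residues. The function name and the `min_len` cut-off are hypothetical, and the stability analysis is not covered here.

```python
from itertools import groupby

def summarize_ss(pred, min_len=3):
    """Fraction of residues in each state, plus helix/strand segments of at least min_len residues."""
    n = len(pred)
    fractions = {state: pred.count(state) / n for state in "HE-"} if n else {}
    segments = []
    pos = 0
    for state, run in groupby(pred):
        length = len(list(run))
        if state in ("H", "E") and length >= min_len:
            segments.append((state, pos + 1, pos + length))  # 1-based, inclusive
        pos += length
    return fractions, segments

fractions, segments = summarize_ss("--HHHH--EEE-E")
print(segments)  # [('H', 3, 6), ('E', 9, 11)]
```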

OS Independent

Because TailScout is a web application, it runs in a browser and does not require any specific operating system.

Comparison with Tail-or-Swift

When we explored Tail-or-Swift, we found that the user had to carry out each step of the process individually and needed a basic understanding of the science behind engineering this protein. Tail-or-Swift is also a standalone application built entirely with a Python GUI, its releases are available only for Windows and macOS, and its secondary structure prediction requires the user to supply the input file manually.
We designed TailScout to address these shortcomings: it works on any system, and secondary structure prediction is automated, so the user only has to provide input at the start of the process.

	Tail-or-Swift	TailScout
Operating system	Windows, macOS	Any
Application type	Standalone desktop application (Python GUI)	REST API + web-based frontend
Secondary structure prediction	Manual (user supplies input file)	Automatic

Why a Web Application and Not Desktop Software

While developing our project, we followed a design-thinking approach and centred our process on empathising with users and prototyping.
Because users work on different devices and face different accessibility constraints, we built TailScout as a web application so that it can be used on any platform.
A web application also avoids the drawbacks of desktop software, which needs periodic updates on each user's machine and can run into compatibility issues across operating systems.

Why Django and Not Flask

With the scalability of the web app in mind, we chose Django over Flask.
Django ships with its own ORM (object-relational mapping) and data models, whereas Flask has no built-in ORM and relies on extensions such as Flask-SQLAlchemy for database access.
Data models map database tables to Python classes, so the application can work with database records as ordinary Python objects.
Because our tail fiber algorithm relies on a phage database, this built-in support made Django the better fit.
Flask is also typically chosen for smaller web applications; since we plan to scale TailScout up, starting with Django avoids having to migrate to it later.

Algorithms

Tail Fiber Detection:

The algorithm searches for genes whose names contain ‘tail’, such as ‘tail fiber protein’ or ‘putative tail fiber’.
To build it, we first created a local database from Virus-Host DB and then a sequence file of all phage tail proteins for several prominent resistant bacteria.
We wanted a sequence file containing all the phage tail sequences for a particular resistant bacterium, so we split the process into two sub-processes:
- The first function retrieves the list of all phages of that bacterium.
- The second function retrieves their tail sequences.
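The two functions can be sketched as follows. The sketch assumes the local Virus-Host DB extract has already been loaded into (phage name, host name) pairs and the phage gene annotations into simple records; the record layout, the host-prefix match, and all function names are hypothetical, not the Virus-Host DB file format or TailScout's code. As in the description, any product name containing ‘tail’ is kept, so tail tube or tail sheath proteins would also pass this filter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhageGene:
    phage: str     # phage name as stored in the local database
    product: str   # annotated protein name, e.g. "tail fiber protein"
    sequence: str  # amino-acid sequence

def get_phages(host, virus_host_pairs):
    """Function 1: names of all phages whose recorded host starts with `host`
    (so strain-level host names such as '... ATCC 19606' also match)."""
    host = host.lower()
    return sorted({phage for phage, h in virus_host_pairs if h.lower().startswith(host)})

def get_tail_sequences(phages, genes):
    """Function 2: genes of those phages whose product name contains 'tail'."""
    wanted = set(phages)
    return [g for g in genes if g.phage in wanted and "tail" in g.product.lower()]

def to_fasta(genes):
    """Write the selected genes as a FASTA-formatted string."""
    return "".join(f">{g.phage}|{g.product}\n{g.sequence}\n" for g in genes)

# Hypothetical toy data (names and sequences are placeholders).
pairs = [("Phage A1", "Acinetobacter baumannii"),
         ("Phage A2", "Acinetobacter baumannii ATCC 19606"),
         ("Phage P1", "Pseudomonas aeruginosa")]
genes = [PhageGene("Phage A1", "tail fiber protein", "MAAA"),
         PhageGene("Phage A1", "major capsid protein", "MCCC"),
         PhageGene("Phage A2", "putative tail fiber", "MTTT"),
         PhageGene("Phage P1", "tail fiber protein", "MPPP")]
print(to_fasta(get_tail_sequences(get_phages("Acinetobacter baumannii", pairs), genes)))
```

Keeping the two steps separate means the phage list for a host can be reused, for example to report which phages were considered, independently of the sequence extraction.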

Clustal Omega Algorithm [1]:

Clustal Omega is a multiple sequence alignment algorithm; in TailScout, it is used to determine the similarity between phage sequences.
The algorithm starts by estimating distances between pairs of sequences from fast k-tuple (short word) matches. By default it uses the mBed method, which compares each sequence with only a small set of seed sequences instead of computing the full all-against-all distance matrix.
It then clusters the sequences into a guide tree, which sets the order in which sequences and groups of sequences are progressively aligned using HMM profile-profile alignment (HHalign).
The guide tree serves as a rough template for clades that tend to share insertion and deletion features.
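To make the guide-tree idea concrete, here is a minimal sketch of two early stages: a rough distance from shared k-mers (short words) and a guide tree built by UPGMA clustering, written in Newick format. This is an illustration only, not Clustal Omega's implementation: the distance formula is a simplified stand-in for its k-tuple scores, UPGMA is used here for brevity, no seeding is done, and the progressive alignment step is omitted. TailScout itself calls the Clustal Omega web service rather than reimplementing it.

```python
from itertools import combinations

def kmer_set(seq, k=2):
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_distance(a, b, k=2):
    """Simplified distance: 1 - shared k-mers / k-mers of the shorter set."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 1.0
    return 1.0 - len(ka & kb) / min(len(ka), len(kb))

def distance_matrix(seqs, k=2):
    """Pairwise distances keyed by (name1, name2)."""
    return {(x, y): kmer_distance(seqs[x], seqs[y], k) for x, y in combinations(seqs, 2)}

def upgma(names, dist):
    """Naive UPGMA: repeatedly merge the closest pair of clusters; returns a Newick string."""
    d = {frozenset(pair): v for pair, v in dist.items()}
    sizes = {name: 1 for name in names}  # cluster label -> number of sequences in it
    while len(sizes) > 1:
        closest = min(d, key=d.get)
        a, b = sorted(closest)
        merged = f"({a},{b})"
        for c in sizes:
            if c not in closest:  # size-weighted average distance to the new cluster
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
    return next(iter(sizes)) + ";"

seqs = {"A": "MKTAYIAK", "B": "MKTAYIAR", "C": "GGGGPPPP"}
dist = distance_matrix(seqs)
print(upgma(list(seqs), dist))  # ((A,B),C);
```

The two near-identical sequences are joined first, so they would be aligned to each other before the outlier is added. This full-matrix approach grows quadratically with the number of sequences, which is the cost the mBed step is designed to avoid.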

JPred (Jnet Neural Network Model) for Protein Secondary Structure Prediction [2]:

Jnet is a neural-network algorithm that predicts protein secondary structure from a multiple sequence alignment, together with the PSI-BLAST and HMM profiles derived from it.
Consensus techniques combine the individual network predictions, making the final secondary structure prediction more accurate.
Jnet also predicts two-state solvent accessibility (buried or exposed) at 25%, 5%, and 0% relative exposure cut-offs. Positions where the different prediction methods disagree are marked as "no-jury" positions.
A separate network is applied for these positions, which improves the cross-validated accuracy. A reliability index indicates which residues are predicted with high confidence.
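The jury idea can be illustrated with a small sketch that combines per-residue predictions from several methods, takes a majority vote, and records the no-jury positions where they disagree. It assumes each prediction is an equal-length string over `H`, `E` and `-`. Jnet sends no-jury positions to a separate network; this sketch simply keeps the majority state there, so it shows the bookkeeping rather than Jnet's actual method.

```python
from collections import Counter

def jury_consensus(predictions):
    """Majority-vote consensus of equal-length 3-state predictions.

    Returns (consensus_string, no_jury_positions), with 0-based positions.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need one or more predictions of equal length")
    consensus, no_jury = [], []
    for i, column in enumerate(zip(*predictions)):
        counts = Counter(column)
        state, _ = counts.most_common(1)[0]  # ties go to the state seen first
        consensus.append(state)
        if len(counts) > 1:  # methods disagree at this residue
            no_jury.append(i)
    return "".join(consensus), no_jury

consensus, no_jury = jury_consensus(["HHHEE--", "HHHEE--", "HH-EEE-"])
print(consensus, no_jury)  # HHHEE-- [2, 5]
```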

Backend Workflow

Our institute provided AWS resources for deploying the web server.
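A core task of the backend is to submit a user's sequences to a remote alignment service and poll until the result is ready. A minimal sketch of that loop against the EMBL-EBI Job Dispatcher REST service for Clustal Omega [1] is shown below, using only the Python standard library. The endpoint paths (`run`, `status`, `result`), the `email`, `sequence` and `stype` parameters, the status strings, and the `aln-clustal_num` result type reflect our understanding of that service and should be checked against the EBI documentation; the function names and polling limits are hypothetical. The HTTP call is passed in as a parameter so the loop can be tested without network access.

```python
import time
import urllib.parse
import urllib.request

EBI_CLUSTALO = "https://www.ebi.ac.uk/Tools/services/rest/clustalo"

def http_call(method, url, data=None):
    """Plain HTTP helper: form-encodes `data` for POST requests and returns the body as text."""
    body = urllib.parse.urlencode(data).encode() if data else None
    req = urllib.request.Request(url, data=body, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

def run_clustalo(fasta, email, http=http_call, poll_seconds=5, max_polls=120):
    """Submit a protein FASTA string, poll the job status, and return the alignment text."""
    job_id = http("POST", f"{EBI_CLUSTALO}/run",
                  {"email": email, "sequence": fasta, "stype": "protein"}).strip()
    for _ in range(max_polls):
        status = http("GET", f"{EBI_CLUSTALO}/status/{job_id}").strip()
        if status == "FINISHED":
            return http("GET", f"{EBI_CLUSTALO}/result/{job_id}/aln-clustal_num")
        if status in ("ERROR", "FAILURE", "NOT_FOUND"):
            raise RuntimeError(f"Clustal Omega job {job_id} ended with status {status}")
        time.sleep(poll_seconds)  # job is still queued or running
    raise TimeoutError(f"Clustal Omega job {job_id} did not finish in time")
```

A real call would look like `run_clustalo(fasta_text, "you@example.org")`. A fixed polling interval keeps the sketch simple; in a web server, this loop would more likely run in a background worker so that user requests do not block while a job is queued.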

Testing and Debugging

To find bugs in our code, we prepared a presentation summarising the software and explaining how to run the corresponding code. Through our mentor, we reached out to professors, professionals, and senior students, who helped us debug the code and whose feedback and discussions shaped TailScout.

Future Directions

Adding more multidrug-resistant bacteria to our phage library.
Incorporating protein modelling of the pyocin-bacteriophage fusion protein.
Analysing the secondary and tertiary structures of the fusion protein.
Improving the retrieval of results and job status.

References

[1] Madeira F, Park YM, Lee J, et al. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. 2019 Jul;47(W1):W636-W641. doi: 10.1093/nar/gkz268.
[2] Drozdetskiy A, Cole C, Procter J, Barton GJ. JPred4: a protein secondary structure prediction server. Nucleic Acids Res. 2015 Jul 1;43(W1):W389-94. doi: 10.1093/nar/gkv332. PMID: 25883141; PMCID: PMC4489285.