Team:IIT Roorkee/ML Overview


Overview

DARG
(Detection of Antibiotic Resistant Genes)



About

Antibiotic resistance is a major threat to mankind, with an estimated 10 million deaths annually by the year 2050. Acinetobacter baumannii ranks among the most critical priority pathogens and is resistant to most of the antibiotics tested (approximately >70%), with the exception of colistin (ICMR Report). Most efforts to overcome resistance in A. baumannii have focused on understanding the biological mechanisms through wet-lab analysis. However, the availability of public datasets makes it important to also carry out this analysis with computational tools, especially machine learning algorithms.


Brief Description

We have developed a machine learning approach called DARG (Detection of Antibiotic Resistant Genes), which applies a class of machine learning algorithms called Support Vector Machines (SVMs) [1] to allele-strain data from the PATRIC database [2]. The approach is inspired by previous work on Mycobacterium tuberculosis [3] and on Escherichia coli, Pseudomonas aeruginosa, and Staphylococcus aureus [4]. The genomes of A. baumannii (the target pathogen) strains are annotated using the Prokka software [5] to build a pan-genome. We used SVMs to predict the resistance phenotype of each strain from the genes present in that strain, and then interpreted the trained SVM model to obtain the weights it assigns to different alleles when predicting resistance. We then selected the top alleles based on these weights and performed correlation and mutation analysis to assess the impact of mutations on the resistance phenotype of strains.
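The core idea of ranking alleles by their SVM weights can be illustrated with a minimal sketch. This is not the DARG implementation: the data below is synthetic, the function name `rank_alleles_by_svm_weight` and the allele names are hypothetical, and the choice of scikit-learn's `LinearSVC` with its default L2 penalty is an assumption made only for illustration. Each row of `X` stands for a strain and each column for the presence (1) or absence (0) of an allele.

```python
import numpy as np
from sklearn.svm import LinearSVC


def rank_alleles_by_svm_weight(X, y, allele_names, top_k=5, C=1.0):
    """Fit a linear SVM and return the top_k alleles by absolute weight.

    X: (n_strains, n_alleles) binary presence/absence matrix.
    y: (n_strains,) labels, 1 = resistant, 0 = susceptible.
    """
    # dual=False solves the primal problem, which is deterministic here.
    model = LinearSVC(C=C, dual=False, max_iter=10000)
    model.fit(X, y)
    weights = model.coef_.ravel()  # one weight per allele
    # Sort by magnitude: large |weight| means strong influence on the prediction.
    order = np.argsort(-np.abs(weights))[:top_k]
    return [(allele_names[i], float(weights[i])) for i in order]


# Synthetic example (assumption): resistance is driven entirely by allele_3.
rng = np.random.default_rng(0)
n_strains, n_alleles = 200, 30
X = rng.integers(0, 2, size=(n_strains, n_alleles))
y = X[:, 3].copy()
allele_names = [f"allele_{i}" for i in range(n_alleles)]

top_alleles = rank_alleles_by_svm_weight(X, y, allele_names, top_k=3)
for name, weight in top_alleles:
    print(f"{name}: {weight:+.3f}")
```

In this toy setting the allele that determines the label should receive the largest positive weight. On real data the weights would need to be checked for stability, for example across resampled training sets, before being interpreted.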


Requirements

  • A system with Python and the necessary libraries: numpy, pandas, matplotlib and scikit-learn (sklearn)
  • Ability to run and use the Prokka software
  • Knowledge of the PATRIC database
  • Basic understanding of Support Vector Machines and how they work
  • For more information, see our GitHub link


Brief Results

Implementation of our approach yielded three key results.

  • A list of top genes conferring resistance in A. baumannii strains to several well-known antibiotics
  • Correlation analysis between top alleles, to understand the impact of a mutation in one gene on another
  • Mutational analysis of the impact of mutations in genes on the resistance phenotype of strains
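As a rough illustration of the last two results, the sketch below computes pairwise correlations between a few alleles and compares the fraction of resistant strains with and without each allele. It is one possible reading of "correlation" and "mutational" analysis, not the team's actual procedure. The strain labels, allele names and the helper `resistance_rate_by_allele` are hypothetical, and the data is made up.

```python
import pandas as pd

# Hypothetical presence/absence table: rows are strains, columns are top alleles.
presence = pd.DataFrame(
    {
        "allele_A": [1, 1, 0, 0, 1, 0],
        "allele_B": [1, 1, 0, 0, 1, 0],
        "allele_C": [0, 1, 0, 1, 0, 1],
    },
    index=[f"strain_{i}" for i in range(6)],
)
# Hypothetical phenotype: 1 = resistant, 0 = susceptible.
resistant = pd.Series([1, 1, 0, 0, 1, 0], index=presence.index, name="resistant")

# Pearson correlation on 0/1 columns equals the phi coefficient,
# a simple measure of how often two alleles co-occur.
allele_corr = presence.corr()


def resistance_rate_by_allele(presence, resistant):
    """Fraction of resistant strains among those with and without each allele."""
    rows = {}
    for allele in presence.columns:
        has_allele = presence[allele] == 1
        rows[allele] = {
            "rate_with": resistant[has_allele].mean(),
            "rate_without": resistant[~has_allele].mean(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


rates = resistance_rate_by_allele(presence, resistant)
print(allele_corr.round(2))
print(rates.round(2))
```

A large gap between `rate_with` and `rate_without` suggests an association between an allele and resistance, but it does not establish causation on its own.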


References

  1. Cortes, C., Vapnik, V. 1995. Support-Vector Networks. Machine Learning, 20, pp. 273-297
  2. Wattam, A. R. et al. 2013. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Research, 42(Database issue), pp. D581-D591
  3. Kavvas, E. S. et al. 2018. Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome identifies genetic signatures of antibiotic resistance. Nature Communications, 9, 4306
  4. Hyun, J. C. et al. 2020. Machine learning with random subspace ensembles identifies antimicrobial resistance determinants from pan-genomes of three pathogens. PLOS Computational Biology, 16(3), e1007608
  5. Seemann, T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), pp. 2068-2069