Difference between revisions of "Team:IIT Roorkee/ML Overview"

 
(2 intermediate revisions by the same user not shown)
Line 135: Line 135:
 
         <br/><br/>
 
         <br/><br/>
  
         <h3 class="wiki-h wiki-h3">
+
          
          Inspiration:
+
          <a class="wiki-a" href="https://2018.igem.org/Team:Paris_Bettencourt/AMPDesigner_Overview">https://2018.igem.org/Team:Paris_Bettencourt/AMPDesigner_Overview</a>.
+
        </h3>
+
        <br/><br/>
+
 
+
 
         <h3 class="wiki-h wiki-h3 wiki-section-start" id="about">About</h3>
 
         <h3 class="wiki-h wiki-h3 wiki-section-start" id="about">About</h3>
 
         <p class="wiki-p">
 
         <p class="wiki-p">
 
           Antibiotic resistance is a major threat to mankind with an estimated 10 million deaths  
 
           Antibiotic resistance is a major threat to mankind with an estimated 10 million deaths  
 
           annually by the year 2050. <i>Acinetobacter baumannii</i> is the most critical priority pathogen  
 
           annually by the year 2050. <i>Acinetobacter baumannii</i> is the most critical priority pathogen  
           with 77% of its strains in India being carbapenem-resistant as per ICMR Report, 2017. The  
+
           and is resistant to most of the antibiotics tested (&gt;70% approx.) except
           majority of focus on overcoming the resistance of <i><i>A. Baumannii</i></i> has been on understanding  
+
          to colistin (<a class="wiki-a" href="https://main.icmr.nic.in/sites/default/files/reports/annual_report_amr_jan2017-18.pdf" target="_blank">ICMR Report</a>). The  
 +
           majority of focus on overcoming the resistance of <i><i>A. baumannii</i></i> has been on understanding  
 
           biological mechanisms using wet-lab analysis. But, the availability of public datasets  
 
           biological mechanisms using wet-lab analysis. But, the availability of public datasets  
 
           makes it crucial to carry analysis using computational tools especially machine learning algorithms.
 
           makes it crucial to carry analysis using computational tools especially machine learning algorithms.
 
         </p>
 
         </p>
         <br/><br/><br/><br/>
+
         <br/>
  
 
         <h3 class="wiki-h wiki-h3 wiki-section-start" id="description">Brief Description</h3>
 
         <h3 class="wiki-h wiki-h3 wiki-section-start" id="description">Brief Description</h3>
Line 158: Line 154:
 
           utilising a class of machine learning algorithms called Support Vector Machines (SVMs) [1]  
 
           utilising a class of machine learning algorithms called Support Vector Machines (SVMs) [1]  
 
           on allele-strain database PATRIC [2]. The approach is inspired by previous work in literature  
 
           on allele-strain database PATRIC [2]. The approach is inspired by previous work in literature  
           for Mycobacterium tuberculosis [3] and <i>Escherichia coli</i>, <i>Pseudomonas aeruginosa</i>, and Staphylococcus  
+
           for <i>Mycobacterium tuberculosis</i> [3] and <i>Escherichia coli</i>, <i>Pseudomonas aeruginosa</i>, and <i>Staphylococcus  
           aureus [4]. The genomes of A. baumanii (target pathogen) strains are annotated using Prokka software  
+
           aureus</i> [4]. The genomes of <i>A. baumanii</i> (target pathogen) strains are annotated using Prokka software  
 
           [5] to develop pan-genome. We used SVMs to predict phenotype of strains using knowledge of genes present  
 
           [5] to develop pan-genome. We used SVMs to predict phenotype of strains using knowledge of genes present  
 
           in a particular strain and interpreted the trained SVM model to calculate the weights given to different  
 
           in a particular strain and interpreted the trained SVM model to calculate the weights given to different  
 
           alleles while predicting resistance phenotype of strains. Following which, we selected top alleles based  
 
           alleles while predicting resistance phenotype of strains. Following which, we selected top alleles based  
           on these weights and performed correlation analysis along with analysing the impact of mutation on  
+
           on these weights and performed correlation and mutation analysis, analysing the impact of mutation on  
 
           resistance phenotype of strains.
 
           resistance phenotype of strains.
 
         </p>
 
         </p>
         <br/><br/><br/><br/>
+
         <br/>
 
+
  
 
         <h3 class="wiki-h wiki-h3 wiki-section-start" id="Requirements">Requirements</h3>
 
         <h3 class="wiki-h wiki-h3 wiki-section-start" id="Requirements">Requirements</h3>
Line 185: Line 180:
 
             </li>
 
             </li>
 
             <li>
 
             <li>
               For more information, check our github link
+
               For more information, check our <a class="wiki-a" href="https://github.com/TEAM-IGEM-IIT-ROORKEE/machine-learning-darg" target="_blank">github link</a>
 
             </li>
 
             </li>
 
           </ul>
 
           </ul>
 
         <p></p>
 
         <p></p>
         <br/><br/><br/><br/>
+
         <br/>
  
 
         <h3 class="wiki-h wiki-h3 wiki-section-start" id="Brief_Results">Brief Results</h3>
 
         <h3 class="wiki-h wiki-h3 wiki-section-start" id="Brief_Results">Brief Results</h3>
Line 196: Line 191:
 
           </p><ul class="wiki-ul">
 
           </p><ul class="wiki-ul">
 
             <li>
 
             <li>
               A list of top genes conferring resistance in <i><i>A. Baumannii</i></i> strains to several well-known antibiotics
+
               A list of top genes conferring resistance in <i><i>A. baumannii</i></i> strains to several well-known antibiotics
 
             </li>
 
             </li>
 
             <li>
 
             <li>
               Epistatic and correlation analysis between top alleles for understanding impact of mutation in one gene on the other
+
               Correlation analysis between top alleles for understanding impact of mutation in one gene on the other
 
             </li>
 
             </li>
 
             <li>
 
             <li>
               Analysed the impact of mutation in genes on the resistant phenotype of strain
+
               Mutational analysis, analysing the impact of mutation in genes on the resistant phenotype of strain
 
             </li>
 
             </li>
 
           </ul>
 
           </ul>
 
         <p></p>
 
         <p></p>
         <br/><br/><br/><br/>
+
         <br/>
 
+
  
 
         <h3 class="wiki-h wiki-h3 wiki-section-start" id="References">References</h3>
 
         <h3 class="wiki-h wiki-h3 wiki-section-start" id="References">References</h3>
 
         <p class="wiki-p">
 
         <p class="wiki-p">
 
           </p><ol class="wiki-ol">
 
           </p><ol class="wiki-ol">
 
+
             <li>Cortes, C., Vapnik, V. 1995. Support-Vector Networks. <i>Machine Learning</i>, 20, pp.273-297</li>
             <li>Support-Vector Networks, Cortes C., Vapnik V., Machine Learning, 20, 273-297 (1995), 
+
             <li>Wattam, A. R. et al. 2013. PATRIC, the bacterial bioinformatics database and analysis resource. <i>Nucleic Acids Research</i>, Database issue (42), pp.581-D591</li>
              <a class="wiki-a" href="https://doi.org/10.1007/BF00994018">DOI: 10.1007/BF00994018</a>
+
             <li>Kavvas, S. E. et al. 2018. Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome  
            </li>
+
               identifies genetic signatures of antibiotic resistance. <i>Nature Communications</i>, 9(4306)</li>
 
+
             <li>Hyun, J. C. et al. 2020. Machine learning with random subspace ensembles identifies antimicrobial  
             <li>PATRIC, the bacterial bioinformatics database and analysis resource, Wattam A. R. et al.,
+
               resistance determinants from pan-genomes of three pathogens. <i>PLOS Computational Biology</i>, 16(3), pp.e1007608</li>
              Nucleic Acids Research, Database issue (42), D581-D591 (2013),
+
             <li>Seemann, T. 2014. Prokka: rapid prokaryotic genome annotation. <i>Bioinformatics</i>, 30(14):2068-9</li>
              <a class="wiki-a" href="https://doi.org/10.1093/nar/gkt1099">DOI: 10.1093/nar/gkt1099</a>
+
         
            </li>
+
 
+
             <li>Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome  
+
               identifies genetic signatures of antibiotic resistance, Kavvas S. E. et al.,
+
              Nature Communications, 9:4306 (2018)
+
              <a class="wiki-a" href="https://doi.org/10.1038/s41467-018-06634-y">DOI: 10.1038/s41467-018-06634-y</a>
+
            </li>
+
 
+
             <li>Machine learning with random subspace ensembles identifies antimicrobial  
+
               resistance determinants from pan-genomes of three pathogens, PLOS  
+
              Computational Biology, 16(3): e1007608,
+
              <a class="wiki-a" href="https://doi.org/10.1371/journal.pcbi.1007608">DOI: 10.1371/journal.pcbi.1007608</a>
+
            </li>
+
 
+
             <li>Prokka: rapid prokaryotic genome annotation, Seemann T., Bioinformatics, 30(14):2068-9 (2014),
+
              <a class="wiki-a" href="https://doi.org/10.1093/bioinformatics/btu153">DOI: 10.1093/bioinformatics/btu153</a>
+
            </li>
+
 
+
 
           </ol>
 
           </ol>
         <p></p><br/><br/><br/><br/><br/>
+
         <p></p><br/>
  
  

Latest revision as of 17:21, 18 December 2020

PYOMANCER

Overview

DARG
(Detection of Antibiotic Resistant Genes)



About

Antibiotic resistance is a major threat to mankind, with an estimated 10 million deaths annually by the year 2050. Acinetobacter baumannii is the most critical priority pathogen and is resistant to most of the antibiotics tested (approx. >70%), with the exception of colistin (ICMR Report). Most efforts to overcome resistance in A. baumannii have focused on understanding biological mechanisms through wet-lab analysis. However, the availability of public datasets makes it crucial to carry out analysis using computational tools, especially machine learning algorithms.


Brief Description

We have developed a machine learning approach called DARG (Detection of Antibiotic Resistant Genes), which applies a class of machine learning algorithms called Support Vector Machines (SVMs) [1] to allele-strain data from the PATRIC database [2]. The approach is inspired by previous work in the literature on Mycobacterium tuberculosis [3] and on Escherichia coli, Pseudomonas aeruginosa, and Staphylococcus aureus [4]. The genomes of A. baumannii (the target pathogen) strains are annotated using the Prokka software [5] to build a pan-genome. We used SVMs to predict the phenotype of each strain from the genes present in that strain, and interpreted the trained SVM model to calculate the weights given to different alleles when predicting the resistance phenotype. We then selected the top alleles based on these weights and performed correlation and mutation analysis, examining the impact of mutations on the resistance phenotype of strains.
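
As a rough illustration of the weight-based ranking described above, the sketch below trains a linear SVM on a binary allele-presence matrix and ranks alleles by the absolute value of their learned weights. It is a minimal sketch under stated assumptions, not the DARG implementation: the toy data, the allele names, the function name rank_alleles, and the choice of scikit-learn's SVC with a linear kernel are all illustrative and do not come from the project itself.

```python
import numpy as np
from sklearn.svm import SVC


def rank_alleles(presence, resistant, allele_names, C=1.0):
    """Fit a linear SVM and rank alleles by absolute weight (hypothetical helper).

    presence     : (n_strains, n_alleles) array of 0/1 allele presence
    resistant    : (n_strains,) array of 0/1 phenotype labels
    allele_names : list of n_alleles labels
    """
    model = SVC(kernel="linear", C=C)
    model.fit(presence, resistant)
    # For a linear kernel, coef_ holds one weight per input feature (allele).
    weights = model.coef_.ravel()
    order = np.argsort(-np.abs(weights))
    ranking = [(allele_names[i], float(weights[i])) for i in order]
    return model, ranking


# Toy data (assumption): 200 synthetic strains, 6 synthetic alleles.
rng = np.random.default_rng(0)
n_strains, n_alleles = 200, 6
presence = rng.integers(0, 2, size=(n_strains, n_alleles))
allele_names = [f"allele_{i}" for i in range(n_alleles)]

# Toy rule (assumption): a strain is resistant exactly when allele_2 is present.
resistant = presence[:, 2]

model, ranking = rank_alleles(presence, resistant, allele_names)
train_accuracy = model.score(presence, resistant)

for name, weight in ranking:
    print(f"{name}: {weight:+.3f}")
print(f"training accuracy: {train_accuracy:.2f}")
```

In this toy setup the top-ranked entry is the allele that defines the phenotype. On real data, the ranking would need cross-validation and held-out testing before any biological interpretation.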


Requirements

  • A system with Python and the necessary libraries: numpy, pandas, matplotlib and sklearn
  • Ability to install and run the Prokka software
  • Familiarity with the PATRIC database
  • Basic understanding of Support Vector Machines and how they work
  • For more information, see our GitHub repository (https://github.com/TEAM-IGEM-IIT-ROORKEE/machine-learning-darg)
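
A quick way to check these prerequisites is sketched below. It assumes the libraries are importable under the names numpy, pandas, matplotlib and sklearn, and that Prokka is installed as a prokka executable on the PATH; the helper name check_requirements is hypothetical.

```python
import importlib.util
import shutil

REQUIRED_LIBRARIES = ("numpy", "pandas", "matplotlib", "sklearn")


def check_requirements(libraries=REQUIRED_LIBRARIES, tools=("prokka",)):
    """Return a dict mapping each library or tool name to True if it was found."""
    status = {}
    for name in libraries:
        # find_spec returns None when a top-level package is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH.
        status[tool] = shutil.which(tool) is not None
    return status


status = check_requirements()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```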


Brief Results

Implementation of our approach yielded three key results.

  • A list of top genes conferring resistance in A. baumannii strains to several well-known antibiotics
  • Correlation analysis between top alleles, to understand how a mutation in one gene relates to mutations in another
  • Mutational analysis of how mutations in genes affect the resistance phenotype of a strain
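
The correlation step can be sketched as follows. The text does not state which correlation measure was used, so this example assumes the Pearson correlation between binary allele-presence columns across strains; the toy matrix, the allele names, the threshold, and the helper names are illustrative only.

```python
import numpy as np


def allele_correlation(presence):
    """Pearson correlation between allele columns of a strains x alleles matrix."""
    # rowvar=False treats each column (allele) as a variable.
    # A column with no variation across strains would produce NaN entries.
    return np.corrcoef(presence, rowvar=False)


def correlated_pairs(corr, names, threshold=0.8):
    """List allele pairs whose absolute correlation meets the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Toy data (assumption): 6 strains (rows) and 3 top alleles (columns).
names = ["allele_a", "allele_b", "allele_c"]
presence = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
])

corr = allele_correlation(presence)
pairs = correlated_pairs(corr, names)
print(np.round(corr, 3))
print(pairs)
```

Here allele_a and allele_b always occur together, so they are reported as a strongly correlated pair, while allele_c is only weakly (negatively) correlated with both.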


References

  1. Cortes, C., Vapnik, V. 1995. Support-Vector Networks. Machine Learning, 20, pp.273-297
  2. Wattam, A. R. et al. 2013. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Research, 42(Database issue), pp.D581-D591
  3. Kavvas, E. S. et al. 2018. Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome identifies genetic signatures of antibiotic resistance. Nature Communications, 9, 4306
  4. Hyun, J. C. et al. 2020. Machine learning with random subspace ensembles identifies antimicrobial resistance determinants from pan-genomes of three pathogens. PLOS Computational Biology, 16(3), e1007608
  5. Seemann, T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), pp.2068-2069