Project Inspiration and Description
The discovery of an intra-tumoral bacterial microbiome has been one of the most exciting developments in oncology in recent years. Even more interesting, some of these bacteria have been found to upregulate tumor-suppressing gene pathways in human cells. This finding could allow us to use these bacteria as tumor-suppressing vehicles in a form of targeted bacteriotherapy and open more doors in precision medicine.
However, to verify the effectiveness of such methods, in vitro experiments must be conducted with the organism in question. There are two main ways to procure such an organism: purchase cultures of the microbe or generate synthetic microbes. The latter option has many intriguing possibilities, especially if the process of creating a synthetic bacterium becomes cheaper and more efficient. To create such cheap and usable bacteria for clinical trials, we must first synthesize the bacterial genome efficiently.
One way to do this is through machine learning, specifically a generative deep learning model. One of the most interesting models proposed in recent years is the Generative Adversarial Network (GAN), a framework built on two neural networks: a generator, which produces synthetic data, and a discriminator, which judges whether data is real or synthetic. The two play a minimax game in which the generator tries to fool the discriminator into classifying the generated data as real.
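For reference, the minimax game in the original GAN formulation is usually written as the value function below. Here $x$ is a real sample, $z$ is a noise vector drawn from a prior $p_z$, $G(z)$ is a generated sample, and $D(\cdot)$ is the discriminator's estimated probability that its input is real. This is the standard textbook objective, not necessarily the exact loss used in FBGAN or in our pipeline.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```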
One particular GAN framework is the Feedback Generative Adversarial Network (FBGAN) for DNA sequence generation [2]. This framework uses a functional analyzer network to score generated sequences and feeds the highest-scoring ones back into training, thus increasing the chances that the generator produces a correct synthetic sequence. While this method has been found to work well for sequences encoding fewer than 50 amino acids, its use for long sequences, such as bacterial genomes, has not been demonstrated.
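The feedback idea can be illustrated with a minimal sketch, assuming sequences are plain strings and the analyzer returns a score between 0 and 1. The names `feedback_step` and `gc_fraction` are hypothetical, and GC content is only a toy stand-in for a trained functional analyzer. In this sketch, high-scoring generated sequences replace the oldest entries in a fixed-size pool of training examples; the GAN training loop itself is not shown.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


def gc_fraction(seq: str) -> float:
    """Toy stand-in for the analyzer: fraction of G/C bases in the sequence."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def feedback_step(
    pool: Deque[str],
    generated: Iterable[str],
    analyzer: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Add generated sequences scoring at or above `threshold` to `pool`.

    `pool` is expected to be a bounded deque (maxlen set), so each accepted
    sequence evicts the oldest one. Returns the accepted sequences in order.
    """
    accepted = [seq for seq in generated if analyzer(seq) >= threshold]
    pool.extend(accepted)  # a deque with maxlen drops items from the left
    return accepted


if __name__ == "__main__":
    # Hypothetical data: a pool of three "real" sequences and one generated batch.
    pool = deque(["ATATAT", "ATGCAT", "TTTTAA"], maxlen=3)
    batch = ["GGCCGC", "ATATTA", "GCGCAT"]
    accepted = feedback_step(pool, batch, gc_fraction, threshold=0.6)
    print("accepted:", accepted)
    print("pool:", list(pool))
```

Using a bounded pool keeps the training set size constant, so the influence of high-scoring sequences grows gradually over successive feedback steps rather than all at once.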
We propose using FBGAN to generate genomes of microbes that we believe are tumor suppressing. Before training the GAN, however, we must decide which microbes should have their genomes synthetically generated. To solve this problem and to generalize beyond individual cases, we used RNA-seq input data for Gene Set Enrichment Analysis (GSEA) as a training set for a classification network, which decides whether the microbe in question affects a sufficient number of tumor-suppressing pathways.
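The "sufficient number of tumor-suppressing pathways" criterion could be expressed as a simple rule, for example to produce training labels. The sketch below is illustrative only and does not show the classification network. It assumes each pathway comes with a normalized enrichment score (NES) and an FDR value; the pathway names, the FDR cutoff of 0.25, the minimum count of 2, and the function names are all hypothetical choices.

```python
from typing import Dict, Set, Tuple

# Maps pathway name -> (normalized enrichment score, FDR q-value).
EnrichmentResults = Dict[str, Tuple[float, float]]


def count_upregulated_suppressors(
    results: EnrichmentResults,
    suppressor_pathways: Set[str],
    fdr_cutoff: float = 0.25,  # assumed cutoff, tune for the dataset
) -> int:
    """Count tumor-suppressing pathways that are significantly upregulated."""
    return sum(
        1
        for name, (nes, fdr) in results.items()
        if name in suppressor_pathways and nes > 0 and fdr < fdr_cutoff
    )


def is_candidate(
    results: EnrichmentResults,
    suppressor_pathways: Set[str],
    min_pathways: int = 2,  # assumed meaning of "sufficient number"
    fdr_cutoff: float = 0.25,
) -> bool:
    """Return True if enough tumor-suppressing pathways are upregulated."""
    count = count_upregulated_suppressors(results, suppressor_pathways, fdr_cutoff)
    return count >= min_pathways


if __name__ == "__main__":
    # Hypothetical enrichment results for one microbe.
    results = {
        "P53_PATHWAY": (1.8, 0.01),
        "APOPTOSIS": (1.2, 0.20),
        "MYC_TARGETS": (2.0, 0.01),   # not a suppressor pathway
        "DNA_REPAIR": (-1.5, 0.02),   # significant but downregulated
    }
    suppressors = {"P53_PATHWAY", "APOPTOSIS", "DNA_REPAIR"}
    print(is_candidate(results, suppressors))
```

A trained classifier would replace this fixed rule with a learned decision boundary, but a rule like this makes the labeling assumption explicit and easy to adjust.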
With this pipeline, which moves from classifying tumor suppression to generating novel synthetic bacterial genomes with a generative model, we hope to dramatically increase the amount of data available for making synthetic bacteria. In future research, we would like to see whether we can identify specific regions of the genome that affect tumor suppression and how common they are. If such commonalities are found, generating those genomic regions could be a promising area for future work. Ultimately, we might even be able to engineer machine-learning-generated, personalized bacteria that act as vehicles tailored to the medical needs of the patient. The possibilities, in short, are endless.
References
- [1] https://arxiv.org/abs/1712.06148
- [2] https://arxiv.org/abs/1804.01694