Abstract
Authors
Inspiration
- Filamentous fungi possess a special class of genetic components known as Biosynthetic Gene Clusters (BGCs). These BGCs have a few unique properties which make them worth exploring:
- They have an evolutionary history of horizontal gene transfer, causing many filamentous fungi species to have orthologous clusters. [1]
- The transcription factor genes regulating the BGC tend to be within the cluster as well. [2]
- They encode important secondary metabolites, such as antibiotics and environmental toxins. [3]
- Filamentous fungi genomes are understudied. In fact, an analysis of past iGEM teams showed that only 7.8% of them worked with fungi, and only 0.05% worked with fungi other than Saccharomyces cerevisiae.
- As bacteria continue to evolve antibiotic resistance, the race to discover new secondary metabolites becomes more crucial. Meanwhile, the catalog of genetic parts for studying filamentous fungi lacks the diversity needed to discover new secondary metabolites. [4]
Introduction
- Fungal secondary metabolites are produced by complicated pathways that require many genetic parts native to filamentous fungi genomes, and not all of these parts are included in the BGCs. Moving a functional BGC from filamentous fungi to another organism therefore requires moving many other parts along with it, which makes these fungal components very difficult to study outside of filamentous fungi [5]. For this reason, it is pertinent to the field of synthetic biology to build up filamentous fungi as a synthetic biology system by expanding its catalog of genetic parts.
- In order to find and characterize transcription factors and their respective binding sites, we utilize computational methods. Specifically, we developed bioinformatics software tools to optimize the parameters of a common motif-finding software, MEME.
- MEME is part of the MEME Suite, a collection of bioinformatics software designed to help scientists locate specific motifs in DNA sequences. MEME takes in a FASTA file and outputs a position weight matrix (PWM) that represents a specific motif. After running multiple verification tests to find the best parameters for MEME (such as promoter length and number of genes per cluster), we search the immense library of fungal genomes provided to us by the Joint Genome Institute. Then, using our optimized pipeline, we determine a set of our most significant motifs within the genus Aspergillus. We chose this genus as a first trial because it is currently one of the more well-studied genera in biology and because it contains an easily cultured BSL-1 species, Aspergillus niger.
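To make the idea of a position weight matrix concrete, here is a minimal sketch that builds a log-odds PWM from a handful of already aligned sites and scores a candidate sequence against it. It only illustrates what a PWM represents; it is not how MEME discovers motifs, and the function names, the uniform background, the pseudocount, and the toy sites are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def position_weight_matrix(sites, background=None, pseudocount=1.0):
    """Build a log2-odds PWM (one dict per column) from equal-length aligned sites."""
    if background is None:
        # Assumption: uniform order-0 background frequencies.
        background = {base: 0.25 for base in ALPHABET}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for col in range(width):
        # Start every count at the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[col].upper()] += 1
        total = sum(counts.values())
        pwm.append({
            base: math.log2((counts[base] / total) / background[base])
            for base in ALPHABET
        })
    return pwm


def score(pwm, sequence):
    """Sum the per-column log-odds for a sequence of the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, sequence.upper()))


# Toy aligned sites, invented for illustration (not real binding sites).
toy_sites = ["TCGGA", "TCGCA", "TCGGT", "TCGGA"]
toy_pwm = position_weight_matrix(toy_sites)
```

A sequence that resembles the aligned sites, such as `score(toy_pwm, "TCGGA")`, scores higher than an unrelated one such as `score(toy_pwm, "AAAAA")`.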
Methodology
- memescape: based on user specifications, memescape will generate a file of synthetic promoter sequences with closely specified parameters and a set of search parameters MEME will use to scan the file for motifs. The .csv output file from memescape will reveal the specific sets of file and search parameters that result in the best motif search results. These parameters are collected and used in the motifomatic software program in the UC-Davis software package.
- Collecting real data: 35 gigabytes of raw genomic data provided by the Joint Genome Institute were parsed into smaller, more easily processed components by first selecting only Aspergillus genomes. The annotated clusters in each genome were identified by the genomic coordinates provided. For each cluster, 1000 base pairs of promoter sequence were extracted upstream of every gene and formatted into a single FASTA file.
- motifomatic: To extract MEME outputs using the intended search parameters, this program accepts the promoter files we generated from the JGI data along with the parameters determined in memescape. The program can trim promoter sequences and then direct the MEME search before extracting the position weight matrix, consensus sequence, and indications of search success such as number of sites and E-value.
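The promoter extraction and trimming steps described above can be sketched as follows. This is not the team's actual code: the function names, the 1-based inclusive coordinate convention, and the choice to keep the bases nearest the gene start when trimming are assumptions for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N is kept as N)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def upstream_promoter(scaffold, gene_start, gene_end, strand, length=1000):
    """Return up to `length` bp upstream of a gene, read 5'->3' on the gene's strand.

    Assumption: coordinates are 1-based and inclusive, with gene_start <= gene_end.
    The result is shorter than `length` when the gene sits near a scaffold edge.
    """
    if strand == "+":
        begin = max(0, gene_start - 1 - length)
        return scaffold[begin:gene_start - 1]
    if strand == "-":
        end = min(len(scaffold), gene_end + length)
        return reverse_complement(scaffold[gene_end:end])
    raise ValueError("strand must be '+' or '-'")


def trim_promoter(promoter, size=400):
    """Keep the `size` bp closest to the gene start (the 3' end of the promoter)."""
    return promoter[-size:]


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy scaffold, invented for illustration.
toy_scaffold = "AAAACCCCGGGGTTTT"
plus_promoter = upstream_promoter(toy_scaffold, 9, 12, "+", length=4)
minus_promoter = upstream_promoter(toy_scaffold, 5, 8, "-", length=8)
```

For a gene on the minus strand, the upstream region lies past `gene_end` on the forward strand, so it is reverse-complemented to keep every promoter in the same orientation relative to its gene.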
Results and Conclusion
Proof of Concept: Before testing our workflow on clusters with unknown putative binding sites, we needed to establish proof of concept by accurately finding a known binding site in a well-researched cluster.
Aspergillus nidulans sterigmatocystin/aflatoxin cluster:
- 55 genes
- Putative TF: AflR
- Consensus Sequence of Binding Site: 'TCGSWNNSCGR'
Aspergillus fumigatus gliotoxin cluster:
- 24 genes
- Putative TF: GliZ
- Consensus Sequence of Binding Site: 'TCGGNNNCCGA'
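The consensus sequences above use IUPAC ambiguity codes (for example, S is C or G and N is any base). As a rough sketch of how such a consensus can be checked against a promoter, the code below expands each code into a regular-expression character class and reports forward-strand matches. The helper names and the test sequence are hypothetical, and this simple scan is separate from the MEME-based search used in the project.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[symbol] for symbol in consensus.upper())


def find_sites(consensus, sequence):
    """Return (0-based start, matched text) for forward-strand matches.

    A lookahead is used so that overlapping matches are also reported.
    """
    lookahead = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence.upper())]


# Toy promoter fragment, invented for illustration.
toy_promoter = "AATCGGATTCCGATT"
aflr_hits = find_sites("TCGSWNNSCGR", toy_promoter)
gliz_hits = find_sites("TCGGNNNCCGA", toy_promoter)
```

To cover both strands, the same scan could be repeated on the reverse complement of the sequence.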
Proof of concept established confidence in the chosen search parameters and promoter specifications below:
MEME model | Promoter size (bp) | Markov model order | Min width | Max width | Number of motifs |
---|---|---|---|---|---|
'Zoops' (zero or one per sequence) | 400 | 0 | 11 | 16 | 1 |
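As a sketch of how the settings in this table might map onto a MEME command line, the snippet below assembles the argument list without running it. The flag names follow the MEME 5.x command-line documentation and should be checked against the installed version; the file and directory names are placeholders, and `build_meme_command` is a hypothetical helper, not part of motifomatic. Promoter size is not a MEME option, so trimming to 400 bp is assumed to happen before the FASTA file is written.

```python
import shlex


def build_meme_command(fasta_path, out_dir, model="zoops", markov_order=0,
                       min_width=11, max_width=16, n_motifs=1):
    """Assemble a MEME argument list from the search parameters in the table."""
    return [
        "meme", fasta_path,
        "-dna",                               # nucleotide alphabet
        "-oc", out_dir,                       # output directory, overwritten if present
        "-mod", model,                        # zero or one occurrence per sequence
        "-markov_order", str(markov_order),   # order of the background Markov model
        "-minw", str(min_width),              # minimum motif width
        "-maxw", str(max_width),              # maximum motif width
        "-nmotifs", str(n_motifs),            # number of motifs to report
    ]


# Placeholder paths, invented for illustration.
command = build_meme_command("cluster_promoters.fasta", "meme_out")
print(shlex.join(command))
```

Keeping the command as a list means it can be passed to `subprocess.run(command, check=True)` on a machine where MEME is installed.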
Twelve initial clusters were chosen across multiple Aspergillus species, primarily for their large cluster size. These clusters and their results, together with the two proof-of-concept clusters, are as follows:
Cluster ID | Species | JGI Nickname | Cluster Type | Candidate TF (JGI Protein ID) | # Genes | E-value | Consensus Sequence | # Sites |
---|---|---|---|---|---|---|---|---|
1126949 | A. niger | Aspni7 | NRPS | 1126974, 1187367, 1223042 | 57 | 2200 | GMCAGCCRMGRASW | 12 |
1585 | A. bombycis | Aspbom1 | Terpene | 1586, 1590, 1599, 1607, 1627, 1646 | 72 | 0.00013 | WTYTHTTCYTTKTYY | 28 |
308983 | A. phoenicis | Aspph1 | Hybrid | 258609, 241529, 339635 | 64 | 3200 | ABKTCTGATKTCYKS | 7 |
363683 | A. piperis | Asppip1 | PKS non-reducing | 500956, 501041 | 60 | 48 | AGCAKSGWKGGGGKR | 13 |
3694* | A. nidulans | Aspnid1 | PKS-like | 3700 | 55 | 0.0062 | CTCGSTGRCCG | 16 |
375915 | A. steynii | Aspste1 | PKS non-reducing | 433424, 443651, 487283, 487301, 453185, 453189, 471414 | 82 | 2.10E-06 | TYTKBMHTNTYHYYC | 58 |
42025 | A. aculeatus | Aspac1 | NRPS | 42057 | 55 | 2.20E-09 | YYTCHYYCCYYYYCT | 36 |
520563 | A. versicolor | Aspve1 | NRPS | 26379, 81201, 39012, 69969 | 61 | 8.1 | AAMCMGHCCYCAWSG | 19 |
763569 | A. calidoustus | Aspcal1 | DMAT | 763580, 763602 | 72 | 130 | ATCCCCGMBCTGC | 11 |
768825 | A. calidoustus | Aspcal1 | PKS non-reducing | 768832, 768861, 768862, 768870 | 60 | 86 | VTATNBAVCACAAAA | 16 |
8033* | A. fumigatus | Aspfu1 | NRPS | 8035 | 24 | 9.90E-10 | TKYTCGGAKGCCGA | 14 |
8528 | A. bombycis | Aspbom1 | PKS non-reducing | 8562, 8565 | 52 | 0.068 | TTTMHYYTTTKYTWH | 21 |
8622 | A. niger | Aspni_NRRL3_1 | Terpene | 8625, 8635, 8652, 8671, 8673 | 60 | 0.013 | CDACCCCCAYYCTTG | 10 |
9827 | A. niger | Aspni_NRRL3_1 | PKS non-reducing | 9817, 9831, 9840, 9846 | 57 | 0.00034 | CYRTYYCCKCCTYCC | 20 |
* clusters used in proof of concept
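One simple way to read the table is to rank the clusters by E-value and keep those below a cutoff. The sketch below does this with the E-values copied from the table; the 0.05 cutoff is an assumption chosen for illustration, not a threshold stated by the team.

```python
# E-values copied from the results table, keyed by cluster ID.
e_values = {
    "1126949": 2200, "1585": 0.00013, "308983": 3200, "363683": 48,
    "3694": 0.0062, "375915": 2.1e-06, "42025": 2.2e-09, "520563": 8.1,
    "763569": 130, "768825": 86, "8033": 9.9e-10, "8528": 0.068,
    "8622": 0.013, "9827": 0.00034,
}


def clusters_below(e_values, cutoff=0.05):
    """Return cluster IDs with E-value below the cutoff, most significant first."""
    passing = [cluster for cluster, e in e_values.items() if e < cutoff]
    return sorted(passing, key=e_values.get)


ranked = clusters_below(e_values)
```

Under this assumed cutoff, both proof-of-concept clusters (3694 and 8033) pass.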
Future Directions
Experimental Design
- To validate our binding sites experimentally as well as computationally, we developed a proposed experimental procedure to carry out if our team gains access to a lab. It uses two plasmids, a Tet construct and a reporter construct, that would be transformed into cultures of the filamentous fungus Aspergillus niger. The first expression system, on the Tet construct, would provide inducible expression of the transcription factor associated with the binding site predicted by MEME [10]. The reporter construct would provide constitutive expression of the reporter gene sgfp, and that expression would change only when expression of the transcription factor of interest is turned on. To produce this change, the MEME-predicted binding site would be inserted into the constitutive ToxA promoter in front of sgfp [11], 10 bp downstream of the ToxA TATA box [12]. If the transcription factor of interest does bind to that site, we expect it to block the ToxA transcriptional machinery from continuing to the sgfp start codon. In any case, a change in sgfp expression that correlates with the concentration of the transcription factor of interest would support our hypothesis that the binding sites predicted by MEME are correct.
Computational Methods
- In addition to validating the currently discovered motifs with wet lab experimentation, we intend to create a more comprehensive and sophisticated method of scoring discovered motifs, both to further improve the accuracy of our predictions and to characterize the usefulness of each site.
- Our intent is to scale up and automate the workflow so that it can process the thousands of available clusters accurately and efficiently. Part of this process requires characterizing which clusters are not ideal for the proposed workflow and identifying orthologous clusters in other species whose promoters can be consolidated into a single longer promoter file, increasing the depth and quality of the promoter files and improving motif searches.
References
- [1] Rokas, Antonis, et al. “Biosynthetic Gene Clusters and the Evolution of Fungal Chemodiversity.” Natural Product Reports, vol. 37, no. 7, Royal Society of Chemistry, 2020, pp. 868–78, doi:10.1039/c9np00045c.
- [2] Brakhage, Axel A. “Regulation of Fungal Secondary Metabolism.” Nature Reviews Microbiology, vol. 11, no. 1, Jan. 2013, pp. 21–32, doi:10.1038/nrmicro2916.
- [3] Keller, Nancy P. “Fungal Secondary Metabolism: Regulation, Function and Drug Discovery.” Nature Reviews Microbiology, vol. 17, no. 3, Springer US, 2019, pp. 167–80, doi:10.1038/s41579-018-0121-1.
- [4] Livermore, David M. “The Need for New Antibiotics.” Clinical Microbiology and Infection, Supplement, vol. 10, no. 4, European Society of Clinical Infectious Diseases, 2004, pp. 1–9, doi:10.1111/j.1465-0691.2004.1004.x.
- [5] Awan, Ali R., et al. “Biosynthesis of the Antibiotic Nonribosomal Peptide Penicillin in Baker’s Yeast.” Nature Communications, vol. 8, no. May, Nature Publishing Group, 2017, pp. 1–8, doi:10.1038/ncomms15202.
- [7] Ehrlich, K. C., et al. “Binding of the C6-Zinc Cluster Protein, AFLR, to the Promoters of Aflatoxin Pathway Biosynthesis Genes in Aspergillus parasiticus.” Gene, vol. 230, no. 2, 1999, pp. 249–57, doi:10.1016/s0378-1119(99)00075-x.
- [8] Schoberle, Taylor J., et al. “A Novel C2H2 Transcription Factor That Regulates gliA Expression Interdependently with GliZ in Aspergillus fumigatus.” PLoS Genetics, vol. 10, no. 5, May 2014, e1004336, doi:10.1371/journal.pgen.1004336.
- [9] Bailey, Timothy L., and Charles Elkan. “Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers.” Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, California, 1994, pp. 28–36.
- [10] Vogt, Keith, et al. “Doxycycline-Regulated Gene Expression in the Opportunistic Fungal Pathogen Aspergillus fumigatus.” BMC Microbiology, vol. 5, 2005, pp. 1–11, doi:10.1186/1471-2180-5-1.
- [11] Lorang, J. M., et al. “Green Fluorescent Protein Is Lighting Up Fungal Biology.” Applied and Environmental Microbiology, vol. 67, no. 5, May 2001, p. 1987, doi:10.1128/AEM.67.5.1987-1994.2001.
- [12] Ciuffetti, L. M., et al. “A Single Gene Encodes a Selective Toxin Causal to the Development of Tan Spot of Wheat.” The Plant Cell, vol. 9, no. 2, 1997, pp. 135–44, doi:10.1105/tpc.9.2.135.
Acknowledgements
- UC Davis College of Biological Sciences Dean Winey for his generous donation
- UC Davis Genome Center/Marc Facciotti for their generous donation
- Mr. Mayer and Mrs. McDowell for hosting our presentations at Nevada Union High School
- Dr. Sirulnik for hosting our presentation at Saddleback College
- Dr. Asaf Salamov and Dr. Igor Grigoriev of JGI for allowing us to work with their fungal genomes and for answering our questions
- Dr. Mark Yarborough for speaking with us about bioethics
- Dr. C. Titus Brown for speaking with us about computational ethics
- Ms. Trina Kleist for speaking with us about effective communication for our science communication medal requirement
- Dr. Vasavada and Dr. Pierce of Marrone BioInnovations for discussing industry practices with us
- Dr. Amanda Fischer of Novozymes for answering our questions about industry and experimental design
- Cooper Houston of Fever Boys for creating the music in our presentation video