Abstract

Biosynthetic gene clusters (BGCs) are ordered groupings of genes that work collectively to produce unique secondary metabolites. Previous work with BGCs in filamentous fungi has led to the discovery of important metabolites such as penicillin. Most BGCs contain a cluster-specific transcription factor binding site (TFBS). Identifying this TFBS is crucial for expanding the genetic toolkit in filamentous fungi and aiding secondary metabolite discovery. Here, we propose a wholly computational approach for identifying TFBSs in BGCs, using publicly available motif-finding software, fungal genomes, and our custom Python programs. We validate our computational approach using two well-characterized TFBSs from the genus Aspergillus. Additionally, we propose a novel approach for the experimental verification of our computationally identified binding sites.

Authors

Lily Karim¹, Daniel LaBolle¹, Jason Hu¹, Noah Husband¹, Paul Osuna-Kleist¹, Alejandra Wilson¹, Marc Facciotti², Ian Korf², Andrew Yao², Mark Winey^§

¹iGEM Student Team Member, ²iGEM Team Mentor/Primary PI, ^§Faculty Sponsor, College of Biological Sciences Dean, University of California, Davis.

Inspiration

Filamentous fungi possess a special class of genetic components known as Biosynthetic Gene Clusters (BGCs). These BGCs have a few unique properties which make them worth exploring:

They have an evolutionary history of horizontal gene transfer, causing many filamentous fungi species to have orthologous clusters. [1]
The transcription factor genes regulating the BGC tend to be within the cluster as well. [2]
They encode important secondary metabolites, such as antibiotics and environmental toxins. [3]

Filamentous fungi genomes are understudied. In fact, an analysis of past iGEM teams showed that only 7.8% of them worked with fungi, and only 0.05% worked with fungi other than Saccharomyces cerevisiae.

As bacteria continue to evolve antibiotic resistance, the race to discover new secondary metabolites becomes a more crucial endeavor. Meanwhile, the catalog of genetic parts for studying filamentous fungi lacks the diversity needed to be able to discover new secondary metabolites. [4]

Introduction

Fungal secondary metabolites are produced by complicated pathways that require many genetic parts native to filamentous fungi genomes. Not all of these parts are included in the BGCs. Moving a functional BGC from filamentous fungi to another organism therefore requires moving many other parts along with it, which makes it very difficult to study these fungal components outside of filamentous fungi [5]. For this reason, it is important for the field of synthetic biology to build up filamentous fungi as a synthetic biology system by expanding its catalog of genetic parts.
In order to find and characterize transcription factors and their respective binding sites, we utilize computational methods. Specifically, we developed bioinformatics software tools to optimize the parameters of a common motif-finding software, MEME.
MEME is part of the MEME Suite, a collection of bioinformatics software designed to help scientists locate motifs in DNA sequences. MEME takes a FASTA file as input and outputs a position weight matrix (PWM) that represents a motif. After running multiple verification tests to find the best parameters for MEME (such as promoter length and number of genes per cluster), we search the extensive library of fungal genomes provided to us by the Joint Genome Institute (JGI). Then, using our optimized pipeline, we determine a set of our most significant motifs within the genus Aspergillus. We chose this genus as a first trial because it is currently one of the more well-studied genera in biology. It also contains an easily cultured BSL-1 species, Aspergillus niger.
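To make the idea of a PWM concrete, the following minimal sketch builds a position-specific probability matrix from a set of already aligned sites and reads off the most likely base at each position. This is not MEME's algorithm, which has to discover the sites itself; here the sites are given. The function names, the pseudocount, and the example sites are all hypothetical and chosen only for illustration.

```python
from collections import Counter

BASES = "ACGT"


def position_weight_matrix(sites, pseudocount=1.0):
    """Return one {base: probability} dict per column of equal-length sites."""
    if len({len(site) for site in sites}) != 1:
        raise ValueError("all sites must have the same length")
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        # The pseudocount keeps unseen bases from getting probability zero.
        total = len(column) + pseudocount * len(BASES)
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix


def most_likely_sequence(matrix):
    """Pick the highest-probability base in every column."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in matrix)


# Hypothetical aligned binding sites, for illustration only.
example_sites = ["TCGGAAACCGA", "TCGGTATCCGA", "TCGGCAACCGA", "TCGGAGACCGA"]
pwm = position_weight_matrix(example_sites)
print(most_likely_sequence(pwm))
```

Each column of the matrix sums to 1, so a column where one base dominates marks a strongly conserved position in the motif.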

Methodology

memescape: Based on user specifications, memescape generates a file of synthetic promoter sequences with closely specified parameters, along with a set of search parameters that MEME will use to scan the file for motifs. The .csv output file from memescape reveals which combinations of file and search parameters give the best motif search results. These parameters are then used in the motifomatic program in the UC Davis software package.
Collecting real data: 35 gigabytes of raw genomic data provided by the Joint Genome Institute were reduced to smaller, more easily processed components by first selecting only Aspergillus genomes. The annotated clusters in each genome were located using the genomic coordinates provided and, for each cluster, 1000 base pairs of promoter sequence upstream of every gene were extracted and written to a single FASTA file.
motifomatic: This program runs MEME with the intended search parameters and extracts its output. It accepts the promoter files we generated from the JGI data along with the parameters determined in memescape. The program can trim promoter sequences and then direct the MEME search before extracting the position weight matrix, consensus sequence, and indicators of search success such as the number of sites and the E-value.
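The promoter extraction and trimming steps described above can be sketched roughly as follows. This is an illustrative sketch, not the team's actual code: the function names are hypothetical, gene coordinates are assumed to be 0-based and end-exclusive, and trimming is assumed to keep the bases closest to the gene.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_promoter(chrom_seq, gene_start, gene_end, strand, length=1000):
    """Extract up to `length` bp upstream of a gene, read 5'->3' on the gene's strand.

    Assumes 0-based, end-exclusive coordinates on the forward strand.
    """
    if strand == "+":
        return chrom_seq[max(0, gene_start - length):gene_start]
    if strand == "-":
        # For a minus-strand gene, "upstream" lies after gene_end on the forward strand.
        return reverse_complement(chrom_seq[gene_end:gene_end + length])
    raise ValueError("strand must be '+' or '-'")


def trim_promoter(promoter, size=400):
    """Keep the `size` bases nearest the gene (an assumption about trimming direction)."""
    return promoter[max(0, len(promoter) - size):]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy chromosome and gene coordinates, for illustration only.
toy_chrom = "ACGTTGCAAGCTTAGC"
plus_promoter = upstream_promoter(toy_chrom, 8, 12, "+", length=4)
minus_promoter = upstream_promoter(toy_chrom, 6, 10, "-", length=4)
print(to_fasta([("gene_plus", plus_promoter), ("gene_minus", minus_promoter)]))
```

Handling the minus strand explicitly matters because a motif finder sees each promoter as a plain string; reading every promoter in the same orientation relative to its gene keeps site positions comparable across genes.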

Results and Conclusion

Proof of concept: Before testing our workflow on clusters whose binding sites are unknown, we needed to establish proof of concept by accurately recovering a known binding site from a well-researched cluster.

Aspergillus nidulans sterigmatocystin/aflatoxin cluster

55 genes
Putative TF: AflR
Consensus Sequence of Binding site: 'TCGSWNNSCGR'

Aspergillus fumigatus gliotoxin cluster:

24 genes
Putative TF: GliZ
Consensus Sequence of Binding Site: 'TCGGNNNCCGA'
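The consensus sequences above use IUPAC ambiguity codes (for example S = C or G, W = A or T, R = A or G, N = any base). As a small sketch of how such a consensus can be checked against a promoter, the code below converts a consensus into a regular expression and scans both strands. The helper names and the test sequences are hypothetical.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (S, W and N are their own complements).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus into a regular expression string."""
    return "".join(IUPAC[code] for code in consensus.upper())


def reverse_complement_iupac(consensus):
    """Reverse-complement a consensus, keeping ambiguity codes consistent."""
    return consensus.upper().translate(IUPAC_COMPLEMENT)[::-1]


def find_sites(consensus, sequence):
    """Return sorted (start, strand, matched_text) hits, overlaps included."""
    hits = []
    patterns = (("+", consensus), ("-", reverse_complement_iupac(consensus)))
    for strand, pattern in patterns:
        # A lookahead lets overlapping matches be reported.
        regex = re.compile(f"(?=({consensus_to_regex(pattern)}))")
        for match in regex.finditer(sequence.upper()):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)


# Hypothetical promoter fragments, for illustration only.
aflr_hits = find_sites("TCGSWNNSCGR", "GGTCGCAGGCCGAGG")
gliz_hits = find_sites("TCGGNNNCCGA", "AAATCGGACTCCGATT")
print(aflr_hits)
print(gliz_hits)
```

Note that TCGGNNNCCGA is its own reverse complement, so every GliZ-type site is reported on both strands.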

Proof of concept established confidence in the chosen search parameters and promoter specifications below:

MEME model	Promoter size	Markov model	Min width	Max width	Number of Motifs
'Zoops' (zero or one per sequence)	400	0	11	16	1
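As a rough illustration, the search parameters in the table could be passed to the MEME command-line program as sketched below. The flag names follow recent MEME Suite releases as we understand them and should be checked against `meme -h` for the installed version. The function name and file paths are hypothetical, and the promoter size of 400 is not a MEME option: it is assumed to be applied beforehand by trimming the promoter file.

```python
def build_meme_command(fasta_path, out_dir, model="zoops", markov_order=0,
                       min_width=11, max_width=16, n_motifs=1):
    """Assemble an argument list for MEME from the chosen search parameters."""
    return [
        "meme", fasta_path,
        "-dna",                              # nucleotide alphabet
        "-oc", out_dir,                      # output directory, overwritten if present
        "-mod", model,                       # zero or one occurrence per sequence
        "-markov_order", str(markov_order),  # order of the background model
        "-minw", str(min_width),
        "-maxw", str(max_width),
        "-nmotifs", str(n_motifs),
    ]


# Hypothetical file names, for illustration only.
command = build_meme_command("cluster_promoters_400bp.fa", "meme_out")
print(" ".join(command))

# To actually run the search (requires MEME to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when file names contain spaces.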

Twelve initial clusters were chosen across multiple Aspergillus species, primarily for their large cluster size. The clusters and their results are as follows:

Cluster ID	Species	JGI Nickname	Cluster Type	Candidate TF (JGI Protein ID)	# Genes	E-value	Consensus Sequence	# Sites
1126949	A. niger	Aspni7	NRPS	1126974, 1187367, 1223042	57	2200	GMCAGCCRMGRASW	12
1585	A. bombycis	Aspbom1	Terpene	1586, 1590, 1599, 1607, 1627, 1646	72	0.00013	WTYTHTTCYTTKTYY	28
308983	A. phoenicis	Aspph1	Hybrid	258609, 241529, 339635	64	3200	ABKTCTGATKTCYKS	7
363683	A. piperis	Asppip1	PKS non-reducing	500956, 501041	60	48	AGCAKSGWKGGGGKR	13
3694*	A. nidulans	Aspnid1	PKS-like	3700	55	0.0062	CTCGSTGRCCG	16
375915	A. steynii	Aspste1	PKS non-reducing	433424, 443651, 487283, 487301, 453185, 453189, 471414	82	2.10E-06	TYTKBMHTNTYHYYC	58
42025	A. aculeatus	Aspac1	NRPS	42057	55	2.20E-09	YYTCHYYCCYYYYCT	36
520563	A. versicolor	Aspve1	NRPS	26379, 81201, 39012, 69969	61	8.1	AAMCMGHCCYCAWSG	19
763569	A. calidoustus	Aspcal1	DMAT	763580, 763602	72	130	ATCCCCGMBCTGC	11
768825	A. calidoustus	Aspcal1	PKS non-reducing	768832, 768861, 768862, 768870	60	86	VTATNBAVCACAAAA	16
8033*	A. fumigatus	Aspfu1	NRPS	8035	24	9.90E-10	TKYTCGGAKGCCGA	14
8528	A. bombycis	Aspbom1	PKS non-reducing	8562, 8565	52	0.068	TTTMHYYTTTKYTWH	21
8622	A. niger	Aspni_NRRL3_1	Terpene	8625, 8635, 8652, 8671, 8673	60	0.013	CDACCCCCAYYCTTG	10
9827	A. niger	Aspni_NRRL3_1	PKS non-reducing	9817, 9831, 9840, 9846	57	0.00034	CYRTYYCCKCCTYCC	20

* clusters used in proof of concept

Future Directions

Experimental Design

To validate our binding sites experimentally as well as computationally, we developed an experimental procedure to carry out if our team gains access to a lab. It uses two plasmids, each carrying one expression system, that would be transformed into cultures of the filamentous fungus Aspergillus niger. The first expression system, on the Tet construct, would provide inducible expression of the transcription factor associated with the binding site predicted by MEME [10]. The reporter construct would provide constitutive expression of the reporter gene sgfp, and that expression would change only when expression of the transcription factor of interest is turned on. This change would be mediated by the MEME-predicted binding site, which would be inserted into the constitutive ToxA promoter in front of sgfp [11], 10 bp downstream of the ToxA TATA box [12]. If the transcription factor of interest does bind to that site, we expect it to block the ToxA transcriptional machinery from proceeding to the sgfp start codon. In any case, a change in sgfp expression that correlates with the concentration of the transcription factor of interest would support our hypothesis that the binding sites predicted by MEME are correct.
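The reporter promoter design can be sketched in silico as follows. This is only an illustration under several assumptions: the promoter string is a toy placeholder and not the real ToxA sequence, the TATA box is assumed to read TATAAA, "10 bp downstream" is taken to mean 10 bp after the last base of the TATA box, and a concrete instance of the predicted site is used because an IUPAC consensus cannot be synthesized directly.

```python
def insert_binding_site(promoter, site, tata_box="TATAAA", offset=10):
    """Insert `site` `offset` bp downstream of the first TATA box in `promoter`."""
    tata_start = promoter.find(tata_box)
    if tata_start == -1:
        raise ValueError("no TATA box found in promoter")
    insert_at = tata_start + len(tata_box) + offset
    if insert_at > len(promoter):
        raise ValueError("promoter too short for the requested offset")
    return promoter[:insert_at] + site + promoter[insert_at:]


# Toy placeholder promoter and a hypothetical concrete site, for illustration only.
toy_promoter = "GGGG" + "TATAAA" + "C" * 10 + "ATGAAA"
predicted_site = "TCGGAAACCGA"
reporter_promoter = insert_binding_site(toy_promoter, predicted_site)
print(reporter_promoter)
```

In practice the insertion position would be chosen against the annotated ToxA promoter, and the engineered sequence would need to be checked for unintended changes to other promoter elements.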

Computational Methods

In addition to validating the currently discovered motifs with wet-lab experimentation, we intend to create a more comprehensive and sophisticated method of scoring discovered motifs, to further improve the accuracy of our predictions and characterize the usefulness of each site.
We also intend to scale up and automate the workflow so that it can process the thousands of available clusters accurately and efficiently. Part of this process requires characterizing which clusters are not suited to the proposed workflow and identifying orthologous clusters in other species whose promoters can be consolidated into a single, larger promoter file for better motif searches. Future work will focus on identifying and consolidating cluster orthologs to increase the depth and quality of promoter files.

References

[1] Rokas, Antonis, et al. “Biosynthetic Gene Clusters and the Evolution of Fungal Chemodiversity.” Natural Product Reports, vol. 37, no. 7, Royal Society of Chemistry, 2020, pp. 868–78, doi:10.1039/c9np00045c.
[2] Brakhage, Axel A. “Regulation of Fungal Secondary Metabolism.” Nature Reviews Microbiology, vol. 11, no. 1, Jan. 2013, pp. 21–32, doi:10.1038/nrmicro2916.
[3] Keller, Nancy P. “Fungal Secondary Metabolism: Regulation, Function and Drug Discovery.” Nature Reviews Microbiology, vol. 17, no. 3, Springer US, 2019, pp. 167–80, doi:10.1038/s41579-018-0121-1.
[4] Livermore, David M. “The Need for New Antibiotics.” Clinical Microbiology and Infection, Supplement, vol. 10, no. 4, European Society of Clinical Infectious Diseases, 2004, pp. 1–9, doi:10.1111/j.1465-0691.2004.1004.x.
[5] Awan, Ali R., et al. “Biosynthesis of the Antibiotic Nonribosomal Peptide Penicillin in Baker’s Yeast.” Nature Communications, vol. 8, no. May, Nature Publishing Group, 2017, pp. 1–8, doi:10.1038/ncomms15202.
[7] Ehrlich, K C et al. “Binding of the C6-zinc cluster protein, AFLR, to the promoters of aflatoxin pathway biosynthesis genes in Aspergillus parasiticus.” Gene vol. 230,2 (1999): 249-57. doi:10.1016/s0378-1119(99)00075-x
[8] Schoberle, Taylor J et al. “A novel C2H2 transcription factor that regulates gliA expression interdependently with GliZ in Aspergillus fumigatus.” PLoS genetics vol. 10,5 e1004336. 1 May. 2014, doi:10.1371/journal.pgen.1004336
[9] Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.
[10] Vogt, Keith, et al. “Doxycycline-Regulated Gene Expression in the Opportunistic Fungal Pathogen Aspergillus Fumigatus.” BMC Microbiology, vol. 5, 2005, pp. 1–11, doi:10.1186/1471-2180-5-1.
[11] Lorang, J. M., et al. “Green Fluorescent Protein Is Lighting Up Fungal Biology.” Applied and Environmental Microbiology, vol. 67, no. 5, May 2001, p. 1987, doi:10.1128/AEM.67.5.1987-1994.2001.
[12] Ciuffetti, L M et al. “A single gene encodes a selective toxin causal to the development of tan spot of wheat.” The Plant cell vol. 9,2 (1997): 135-44. doi:10.1105/tpc.9.2.135

Acknowledgements

UC Davis College of Biological Sciences Dean Winey for his generous donation
UC Davis Genome Center/Marc Facciotti for their generous donation
Mr. Mayer and Mrs. McDowell for hosting our presentations at Nevada Union High School
Dr. Sirulnik for hosting our presentation at Saddleback College
Dr. Asaf Salamov and Dr. Igor Grigoriev of JGI for allowing us to work with their fungal genomes and for answering our questions
Dr. Mark Yarborough for speaking with us about bioethics
Dr. C. Titus Brown for speaking with us about computational ethics
Ms. Trina Kleist for speaking with us about effective communication for our science communication medal requirement
Dr. Vasavada and Dr. Pierce of Marrone BioInnovations for discussing industry practices with us
Dr. Amanda Fischer of Novozymes for answering our questions about industry and experimental design
Cooper Houston of Fever Boys for creating the music in our presentation video
