Proof of Concept
Validation with the aflatoxin biosynthetic gene cluster
- The aflatoxin cluster in Aspergillus nidulans has a cluster-specific transcription factor, AFLR, which is essential for activating expression of the aflatoxin biosynthetic genes. The AFLR binding-site consensus sequence, 5'-TCGSWNNSCGR-3', was determined using an electrophoretic mobility shift assay [2].
- As our proof of concept, we demonstrated that we could identify this binding site using only our computational pipeline.
- In our workflow, we first identified the Aspergillus nidulans aflatoxin cluster from among the 11,000 Aspergillus clusters in the JGI fungal sequence data. From our collection of promoter FASTA files (created by extracting 1,000 base pairs of upstream sequence and converting to plus strand), we identified the file for the aflatoxin cluster.
- We then used “motifomatic” to input the A. nidulans aflatoxin promoter file (3694.fa), along with our desired search parameters, into MEME.
- To match our chosen parameters for “motifomatic”, we trimmed our promoters to 400 base pairs upstream of the start site. Trimming assumes that every sequence has the same orientation relative to the start site, so it is essential that all promoter files are completely converted to plus-strand sequences. We set the MEME model to ‘zoops’ (zero or one occurrence per sequence) because we do not expect every promoter in the file to contain the motif [2]. We searched for only one motif, with the expectation that the highest-scoring motif represents our putative cluster motif. We set the minimum motif width to 11 and the maximum to 15, which establishes an expected motif length and prevents MEME from reporting overly long, unrealistic motifs. We also kept the default 0-order Markov model setting, since it plays little to no role in the success of our synthetic promoter sequences in memescape. Had our initial proof-of-concept results been inaccurate, we could have adjusted these parameters as needed.
- In practice, however, adjustments proved unnecessary, because the motif discovered by “motifomatic” was nearly identical to the expected binding site of AFLR. This showed that our software can recover a known binding site in silico.
- The expected consensus sequence was 5'-TCGSWNNSCGR-3', where S = C/G, W = A/T, N = A/C/G/T, and R = A/G [2].
- Our discovered motif was 5'-CTCGSTGRCCG-3'.
- When the first letter of the expected consensus was aligned with the second letter of the discovered motif, the two motifs were nearly identical, offset by one base pair. At each of the ten overlapping positions, the nucleotide code in the result was consistent with the expected code for that position. To further refine this proof of concept, one could reduce the trim parameter to 200, which might yield a more precise motif.
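The comparison between the expected consensus and the discovered motif can be reproduced with a short script. The following is a minimal sketch and is not part of “motifomatic”: the function names `compatible` and `best_offset` are hypothetical, and it assumes that two IUPAC codes "match" when they share at least one base, which is a looser test than requiring identical codes.

```python
# IUPAC nucleotide codes mapped to the set of bases each one allows.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def compatible(a, b):
    """Return True if two IUPAC codes share at least one base (assumed match rule)."""
    return bool(IUPAC[a] & IUPAC[b])


def best_offset(expected, found):
    """Slide `found` along `expected` without gaps.

    Returns (offset, overlap, matches) for the alignment with the most
    compatible positions. `offset` is the 0-based index in `found` that
    lines up with expected[0]; a negative offset means `found` starts
    inside `expected`. Ties are broken in favour of the longer overlap,
    then the smaller offset.
    """
    best = None
    for offset in range(-(len(expected) - 1), len(found)):
        # Collect the aligned pairs that fall inside both sequences.
        pairs = [
            (expected[i], found[i + offset])
            for i in range(len(expected))
            if 0 <= i + offset < len(found)
        ]
        matches = sum(compatible(a, b) for a, b in pairs)
        if best is None or (matches, len(pairs)) > (best[2], best[1]):
            best = (offset, len(pairs), matches)
    return best


if __name__ == "__main__":
    offset, overlap, matches = best_offset("TCGSWNNSCGR", "CTCGSTGRCCG")
    print(offset, overlap, matches)
```

For the two aflatoxin motifs this reports an offset of 1 with 10 overlapping positions, all 10 compatible, which corresponds to the one-base shift described above.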
Validation with the gliotoxin biosynthetic gene cluster
- To further test the workflow, we chose a second validation cluster from the literature and again recovered the known binding site.
- The second successful discovery was the expected binding site for the transcription factor GliZ in the Aspergillus fumigatus gliotoxin cluster. The proposed binding site for GliZ in this cluster is 5'-TCGGN3CCGA-3' [3], where N3 denotes any three nucleotides. With the same workflow and search parameters we used for aflatoxin, “motifomatic” ran MEME on the gliotoxin promoters and extracted the output consensus sequence 5'-TKYTCGGAKGCCGA-3'.
- When the fourth base of the output sequence was aligned with the first base of the expected site, the output was consistent with the expected site at every position, though it extended three bases further at the 5' end.
- With two successful examples, we felt confident enough to then move toward discovering completely unknown binding sites.
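Both validations used the same trimming step and MEME settings. The sketch below illustrates those settings under stated assumptions; it is not the actual “motifomatic” code. The helper names (`parse_fasta`, `trim_promoters`, `meme_command`) and the output file names are hypothetical, each promoter is assumed to be written 5' to 3' on the plus strand so that its last bases are the ones nearest the start site, and the MEME flag names follow recent MEME Suite releases and should be checked against `meme -h` for the installed version.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def trim_promoters(records, keep=400):
    """Keep the last `keep` bases of each promoter (keep must be positive).

    Assumption: sequences are plus-strand and end at the start site, so
    the 3' end of each record is the region nearest the gene.
    """
    return [(name, seq[-keep:]) for name, seq in records]


def meme_command(fasta_path, out_dir):
    """Build a MEME command line with the search settings described above."""
    return [
        "meme", fasta_path,
        "-dna",                 # nucleotide alphabet
        "-oc", out_dir,         # output directory, overwritten if present
        "-mod", "zoops",        # zero or one occurrence per sequence
        "-nmotifs", "1",        # report only the top-scoring motif
        "-minw", "11",          # minimum motif width
        "-maxw", "15",          # maximum motif width
        "-markov_order", "0",   # 0-order Markov model
    ]


if __name__ == "__main__":
    demo = ">gene1\nACGTACGTAC\n>gene2\nTTGCA\n"
    for name, seq in trim_promoters(parse_fasta(demo), keep=6):
        print(name, seq)
    # Hypothetical file names; pass the list to subprocess.run to execute.
    print(" ".join(meme_command("3694.trimmed.fa", "meme_out")))
```

Returning the command as a list keeps the sketch runnable without MEME installed and lets the trim length or motif widths be changed in one place if the parameters need adjusting.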
References
- [2] Ehrlich, K. C., et al. “Binding of the C6-Zinc Cluster Protein, AFLR, to the Promoters of Aflatoxin Pathway Biosynthesis Genes in Aspergillus parasiticus.” Gene, vol. 230, no. 2, 1999, pp. 249–57, doi:10.1016/S0378-1119(99)00075-X.
- [3] Schoberle, Taylor J., et al. “A Novel C2H2 Transcription Factor That Regulates gliA Expression Interdependently with GliZ in Aspergillus fumigatus.” PLoS Genetics, vol. 10, no. 5, 2014, doi:10.1371/journal.pgen.1004336.