From start to finish, our project relied on the ability to simulate T7 bacteriophage infection. Pinetree, a stochastic gene expression simulator, has previously been optimized to explore T7 gene expression, so it allows us to model T7 infection and draw conclusions from the overall gene expression. To run simulations, we created GenBank files and ran them through Pinetree for a specified time limit to generate a gene expression profile. We then analyzed these profiles to draw conclusions about three properties (lysis time, burst size, and GFP expression) of wild-type T7 bacteriophage and the mutant strains that we created.
Using Breseq to Generate T7 Bacteriophage Mutants
Breseq is a computational pipeline for identifying mutations in microbial genomes, and its gdtools toolkit can also apply mutations to a reference sequence. We used gdtools to apply mutations to the wild-type T7 bacteriophage genome. To engineer T7 bacteriophage, we created a GenomeDiff (.gd) file for each mutant that specifies the mutations to apply to the T7 genome (insertions, deletions, replacements, or any combination of the three). Gdtools uses the gd file to output the mutated genome sequence as a GenBank file. Pinetree then runs this GenBank file and reports output (protein count, ribosome density, etc.) at each time point in the simulation. When creating these mutations, we followed these guidelines:
- When removing a gene, we deleted the region corresponding to that gene's coding sequence (CDS).
- For genes whose CDS overlaps the next gene downstream, we ended the deletion 30 bp upstream of the start of the overlapping gene.
- For genes whose CDS overlaps the previous gene upstream, we started the deletion at the end of the previous gene.
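The guidelines above can be sketched as a small coordinate calculation. This is a Python illustration, not the code we ran. The function names are hypothetical, coordinates are assumed to be 1-based and inclusive, "at the end of the previous gene" is read as the first base after that gene, and "30 bp upstream" is read as the last deleted base sitting 30 bp before the overlapping gene's start. The helper that formats a deletion entry follows the GenomeDiff `DEL` field order as we understand it (type, id, parent ids, sequence id, position, size), so check it against the breseq documentation before use.

```python
def deletion_span(cds_start, cds_end, prev_end=None, next_start=None, margin=30):
    """Return (position, size) of the deletion for one gene.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    prev_end   -- last base of the previous gene, if it overlaps this CDS
    next_start -- first base of the next gene, if it overlaps this CDS
    """
    start, end = cds_start, cds_end
    # Upstream overlap: begin just after the previous gene ends.
    if prev_end is not None and prev_end >= cds_start:
        start = prev_end + 1
    # Downstream overlap: stop `margin` bp before the overlapping gene starts.
    if next_start is not None and next_start <= cds_end:
        end = next_start - margin
    if end < start:
        raise ValueError("nothing left to delete after trimming overlaps")
    return start, end - start + 1


def gd_del_line(entry_id, seq_id, position, size):
    """Format one tab-separated deletion entry (assumed GenomeDiff DEL layout)."""
    return "\t".join(["DEL", str(entry_id), ".", seq_id, str(position), str(size)])


# Hypothetical gene at 100..200 whose CDS overlaps a downstream gene starting at 190.
position, size = deletion_span(100, 200, next_start=190)
print(gd_del_line(1, "T7", position, size))
```

Keeping the coordinate logic separate from the file formatting makes it easy to check each deletion by hand before the gd file is applied to the genome.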
Using these guidelines, we created several T7 bacteriophage genomes that vary in the placement of the holin and GFP genes. Each genome, encoded in its own GenBank file, was then simulated in Pinetree to produce a gene expression profile, which we analyzed for GFP expression and used to calculate lysis time and burst size.
Speeding Up Data Collection through the Texas Advanced Computing Center
TACC (Texas Advanced Computing Center) is a supercomputing center at the University of Texas at Austin that provides computing resources and services to researchers across the United States. Within our project, we used TACC's computing services to run many simulations of a single designed T7 mutant, each with a different random seed. This allowed us to generate more data in a shorter period than running the simulations one after another.
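As a rough illustration of the seed sweep described above, the Python sketch below builds one command per replicate, each with its own seed and output file, so that a batch system can run them side by side. The script name `run_pinetree.py` and its `--genbank`, `--seed`, and `--output` flags are hypothetical placeholders for whatever wrapper script actually launches a simulation. They are not part of Pinetree or of any TACC tool.

```python
from pathlib import Path


def seed_jobs(genbank_path, n_replicates, first_seed=1, script="run_pinetree.py"):
    """Build one shell command per replicate of the same mutant genome.

    `script` and its flags are hypothetical; substitute the real launcher.
    """
    stem = Path(genbank_path).stem  # e.g. "holin_moved" from "mutants/holin_moved.gb"
    jobs = []
    for seed in range(first_seed, first_seed + n_replicates):
        output = f"{stem}_seed{seed}.tsv"  # one output file per seed
        jobs.append(
            f"python {script} --genbank {genbank_path} --seed {seed} --output {output}"
        )
    return jobs


# Three replicates of one hypothetical mutant genome file.
for command in seed_jobs("mutants/holin_moved.gb", 3):
    print(command)
```

Writing each replicate to a file named after its seed keeps the runs independent and makes any single run reproducible later.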
After Pinetree generated the gene expression profiles of our mutant T7 bacteriophages, we analyzed how our mutations affected each phage's lysis time, burst size (progeny production), and GFP expression. Analyzing GFP expression was straightforward because Pinetree directly outputs the quantity of GFP made during the infection. To analyze lysis time and burst size, however, we had to design our own calculators to interpret the data in the gene expression profile. We used RStudio to create and calibrate both calculators.
To create the burst size calculator, we first determined how many copies of each virion protein are needed to make a complete virion. Based on this information, we wrote a function that takes the tsv file output by Pinetree (which represents one infection within a cell) and calculates how many complete virions can be assembled, meaning virions that have all of the proteins they are supposed to contain. This number is limited by whichever protein is scarcest relative to its required copy number. The burst size calculator also takes the lysis time into account, since the lysis time determines the point in the infection at which the cell lyses and releases phage progeny.
To calibrate the lysis time calculator, we first determined the amounts of holin and lysin present at 1320 seconds in the wild type (the estimated lysis time for wild-type T7 in the Pinetree simulation). These amounts set how much holin and lysin are needed for an E. coli cell to lyse. Then, in a similar fashion to the burst size calculator, each mutant’s lysis time is the point at which holin and lysin reach these levels in the Pinetree simulation. Thus, by passing a Pinetree gene expression profile to these calculators, we derived the lysis time and burst size for each run.
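The two calculators can be sketched as follows. Our calculators were written in R, so this Python version only illustrates the logic described above. It assumes the Pinetree tsv has `time`, `species`, and `protein` columns. The species names, copy numbers, and toy counts are made-up placeholders, not measured T7 values.

```python
import csv
import io


def read_profile(tsv_text):
    """Parse tsv text into {time: {species: protein_count}}."""
    profile = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        profile.setdefault(float(row["time"]), {})[row["species"]] = float(row["protein"])
    return profile


def calibrate_thresholds(wt_profile, lysis_species, t_ref=1320.0):
    """Wild-type counts of the lysis proteins at the last time point <= t_ref."""
    t = max(tp for tp in wt_profile if tp <= t_ref)
    return {sp: wt_profile[t].get(sp, 0.0) for sp in lysis_species}


def lysis_time(profile, thresholds):
    """First time point at which every lysis protein reaches its threshold."""
    for t in sorted(profile):
        if all(profile[t].get(sp, 0.0) >= n for sp, n in thresholds.items()):
            return t
    return None  # thresholds never reached within the simulated time


def burst_size(profile, copies_per_virion, t_lysis):
    """Complete virions at lysis, limited by the scarcest protein."""
    counts = profile[t_lysis]
    return int(min(counts.get(sp, 0.0) // n for sp, n in copies_per_virion.items()))


# Toy profile with placeholder species and counts (not real T7 data).
ROWS = [(0, "holin", 0), (0, "lysin", 0), (0, "capsid", 0), (0, "tail", 0),
        (5, "holin", 10), (5, "lysin", 4), (5, "capsid", 500), (5, "tail", 20),
        (10, "holin", 25), (10, "lysin", 12), (10, "capsid", 900), (10, "tail", 40)]
TOY_TSV = "time\tspecies\tprotein\n" + "".join(f"{t}\t{s}\t{p}\n" for t, s, p in ROWS)

profile = read_profile(TOY_TSV)
thresholds = {"holin": 20, "lysin": 10}        # placeholder lysis thresholds
t_lysis = lysis_time(profile, thresholds)
print(t_lysis, burst_size(profile, {"capsid": 400, "tail": 10}, t_lysis))
```

In practice the thresholds would come from `calibrate_thresholds` applied to a wild-type run, and the same thresholds would then be reused for every mutant profile.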