Team:UC San Diego/Data



Data Acquisition: Raw whole-transcriptome RNA-sequencing data were downloaded from the TCGA legacy archive for tumor tissue from 428 head and neck squamous cell carcinoma (HNSC) patients and for 44 adjacent solid normal tissue samples.

Extraction of microbial reads and calculation of differential microbial abundance: Microbial read counts for each sample were inferred using the PathoScope2 computational framework [1], which aligns RNA-seq reads to reference genomes of bacterial taxa. PathoScope generates two output measures quantifying the bacterial species present in each sample. One measure, best guess, quantifies the relative abundance of each species, expressed as a percentage. The other, best hit, is the absolute integer read count assigned to each species in the sequencing data. Using the best hit matrices, we applied the Kruskal-Wallis test to identify microbes that are differentially abundant in HNSC samples versus adjacent normal samples.


Figure 1: Bacterial phyla found to be differentially abundant in HNSC samples relative to adjacent normal samples.
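The differential-abundance step can be illustrated with a minimal, dependency-free sketch of the Kruskal-Wallis test for two groups. The function names and the example counts are hypothetical and are not the team's actual pipeline; in practice a library routine such as `scipy.stats.kruskal` would likely be used instead. With two groups the H statistic has one degree of freedom, so the chi-square tail probability reduces to `erfc(sqrt(H / 2))`.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(tumor_counts, normal_counts):
    """Tie-corrected Kruskal-Wallis H and p-value for two groups (1 df)."""
    pooled = list(tumor_counts) + list(normal_counts)
    n = len(pooled)
    ranks = average_ranks(pooled)
    n_tumor = len(tumor_counts)
    groups = [ranks[:n_tumor], ranks[n_tumor:]]
    h = 12.0 / (n * (n + 1)) * sum(sum(g) ** 2 / len(g) for g in groups)
    h = max(h - 3 * (n + 1), 0.0)  # guard against tiny negative rounding error

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:  # every value identical, no evidence of a difference
        return 0.0, 1.0
    h /= correction
    p_value = math.erfc(math.sqrt(h / 2))  # chi-square survival function, 1 df
    return h, p_value


# Hypothetical best hit read counts for one microbe (illustrative values only).
tumor = [0, 0, 1, 0, 2]
normal = [5, 7, 6, 9, 8]
h_stat, p = kruskal_wallis_two_groups(tumor, normal)
print(f"H = {h_stat:.3f}, p = {p:.4f}")
```

Applied to a full best hit matrix, the same function would be called once per microbe, and the resulting p-values would normally be adjusted for multiple testing before calling a microbe differentially abundant.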

Association of differentially abundant microbes with survival: Survival analyses were performed using the Kaplan–Meier method, with microbe abundance treated as a binary variable based on the presence or absence of the microbe in tumor samples. Univariate Cox regression analysis was used to identify candidates that were significantly associated with patient survival (p < 0.05).


Figure 2: Kaplan–Meier survival plot with Cox regression analysis. For this microbe (Agrobacterium tumefaciens), greater abundance is associated with better patient survival.
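The survival comparison can be sketched as a plain-Python Kaplan–Meier (product-limit) estimator, computed separately for patients with and without a given microbe. This is an illustration under assumed inputs, not the team's code: the function names and the toy follow-up data are hypothetical, and the Cox regression step is not shown (a survival library such as lifelines would typically handle both).

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability), one entry per event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


def curves_by_presence(times, events, present):
    """Split patients on microbe presence (binary) and estimate each curve."""
    groups = {True: ([], []), False: ([], [])}
    for t, e, p in zip(times, events, present):
        groups[bool(p)][0].append(t)
        groups[bool(p)][1].append(e)
    return {
        ("present" if key else "absent"): kaplan_meier(ts, es)
        for key, (ts, es) in groups.items()
    }


# Hypothetical follow-up data (months), for illustration only.
times = [5, 8, 12, 20, 3, 6, 9, 15]
events = [0, 1, 0, 1, 1, 1, 1, 0]
present = [True, True, True, True, False, False, False, False]
for label, curve in curves_by_presence(times, events, present).items():
    print(label, [(t, round(s, 3)) for t, s in curve])
```

Censored patients reduce the number at risk without producing a step in the curve, which is why only event times appear in the output.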

Gene Set Enrichment Analysis: GSEA [2] was used to identify microbes associated with the dysregulation of biological pathways and signatures obtained from the Molecular Signatures Database (MSigDB). Specifically, canonical pathways (C2), oncogenic signatures (C6), and immunologic signatures (C7) were examined.


Figure 3: Example of a GSEA plot. Here, presence of this particular microbe is positively correlated with upregulation of the macrophage deactivation pathway.
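The curve in a GSEA plot is a running-sum statistic, and the enrichment score is its largest deviation from zero. The sketch below shows that calculation in simplified form. It assumes genes have already been ranked by a score such as their correlation with microbe presence; the function name, gene labels, and scores are hypothetical, and the permutation-based significance testing that GSEA performs is omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes : gene identifiers sorted from most positive to most negative score
    scores       : ranking metric for each gene, in the same order
    gene_set     : collection of gene identifiers forming the pathway or signature
    weight       : exponent applied to |score| when a gene in the set is encountered
    """
    gene_set = set(gene_set)
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    hit_total = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if n_hits == 0 or n_miss == 0 or hit_total == 0:
        raise ValueError("gene set must partially overlap the ranked list")

    running = 0.0
    best = 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** weight / hit_total  # step up at a set member
        else:
            running -= 1.0 / n_miss  # step down otherwise
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


# Hypothetical ranked list: positive scores mean higher expression when the
# microbe is present (illustrative values only).
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(genes, scores, {"g1", "g2"}))  # set at the top of the list
print(enrichment_score(genes, scores, {"g5", "g6"}))  # set at the bottom of the list
```

A positive score means the gene set is concentrated among genes that go up with the microbe, and a negative score means it is concentrated among genes that go down.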

Compiling a panel of tumor-suppressing microbes: From this series of computational analyses, we compiled a panel of 11 tumor-suppressing microbes. Each microbe was significantly more abundant in adjacent normal samples than in HNSC samples, was significantly associated with better patient survival in the Cox regression analysis, and was positively correlated with tumor-suppressing and immune-inducing pathways.


Figure 4: Examples of tumor-suppressing pathways and some of their genes.
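Compiling the panel amounts to keeping only the microbes that pass all three filters. A minimal sketch, assuming each analysis has already produced its own list of passing microbes; the function name and the placeholder microbe labels are hypothetical and do not reflect the actual panel.

```python
def compile_panel(enriched_in_normal, favorable_survival, favorable_pathways):
    """Return the microbes that satisfy all three criteria, sorted by name."""
    return sorted(
        set(enriched_in_normal) & set(favorable_survival) & set(favorable_pathways)
    )


# Placeholder labels standing in for the output of each analysis step.
enriched_in_normal = ["microbe_A", "microbe_B", "microbe_C", "microbe_D"]
favorable_survival = ["microbe_A", "microbe_C", "microbe_E"]
favorable_pathways = ["microbe_C", "microbe_A", "microbe_F"]

panel = compile_panel(enriched_in_normal, favorable_survival, favorable_pathways)
print(panel)
```

Using a set intersection makes the order of the three filters irrelevant, so each analysis can be run and revised independently.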


  • [1] Francis, O.E., et al., “Pathoscope: Species identification and strain attribution with unassembled sequencing data.” Genome Res, 2013
  • [2] Subramanian, A., et al., “Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles.” Proc Natl Acad Sci USA, 2005