Team:Chalmers-Gothenburg/Metagenomics

iGEM Chalmers Gothenburg 2020


Metagenomics
Here, you can read about our metagenomics analysis!

Introduction to metagenomics

We retrieved raw data from several metagenomic experiments and analyzed it to gain insight into the microbial biodegradation of plastic. We chose to analyze readily available data because we believed it very likely that microorganisms able to degrade elastane, or similar plastics, already exist in nature. Our goal was therefore to find new enzymes that can degrade plastic. We used HUMAnN2.0 [1] to quantify the abundance of gene families and metabolic pathways in microbial communities and thereby obtain a functional profile, that is, to find out what the microbial community can do. By studying what nature is already able to do, there is a chance we can improve the design of our project!

Metagenomics is the study of all the genetic material contained in a sample. This means that a complex community of cells and microorganisms is studied, rather than an individual organism. It is a powerful tool for generating hypotheses about microbial functions [2]. Shotgun DNA sequencing is used to capture all available DNA in a sample and consequently obtain the abundances of microorganisms, the genetic content and the functional potential of microbial communities [3]. It is important to distinguish between metagenomics based on shotgun sequencing and other methods such as 16S rRNA sequencing. The latter sequences specific organisms or marker genes instead of the total genomic content [3], so it is not strictly a metagenomic method. Since we are interested in the functionality of the microbial communities in different environments, we only retrieved data from studies that applied shotgun metagenomics. We quickly realized that very few studies have done a proper metagenomic analysis of environmental samples related to plastic waste where we expect spandex to be present. This probably means that there is huge potential for future research to discover new functions from microbial communities living in different environments.

Metagenomic data retrieval from environments across the world
Our metagenomic data was retrieved from five different studies [4]–[8], corresponding to five different environments. These environments represent different waste sources that we could expect to harbor plastics, such as landfills and wastewater; see figure 1 and the overview of the retrieved datasets in table 1. Our main reason for choosing these studies was that they used shotgun metagenomics, which sequences all the genomic material in a sample (see “Introduction to metagenomics”). The analysis of some datasets in our chosen studies had already revealed microbial pathways responsible for degrading waste. Our main goal was to compare the functional potential of the different waste sources, in terms of the presence or absence of specific pathways and enzymes. In this way, we expected to gain insight into the microbial degradation of plastic, given the extent to which plastic litters the globe and the huge contamination problem its disposal poses.

Figure 1. Data was retrieved from five different environments: a landfill in India, a landfill in the UK, the Gulf of Mexico, the North Pacific Ocean and a wastewater treatment site in the US.
Article | Environment | Dataset
J. Gupta et al. (2017) | Soil landfill, India | Link
E. Ransom-Jones et al. (2017) | Wastewater landfill, UK | Link
L.J. Pinnell, J.W. Turner (2019) | Marine, Gulf of Mexico | Link
J.A. Bryant et al. (2016) | Marine, North Pacific Ocean | Link
M.L. Petrovich et al. (2019) | Wastewater treatment site, US | Link
Table 1. An overview of the retrieved datasets from five different environments.


J. Gupta et al. (2017) used metagenomic analysis of soil samples from a landfill in New Delhi, India, to show that the microbial communities present, and their functional diversity, were responsible for waste degradation in the landfill [4]. E. Ransom-Jones et al. (2017) studied biomass-degrading communities in leachate microcosms in Wirral, United Kingdom. Leachate is liquid that has passed through a landfill, so their samples were recovered from water that had been passing through these landfills. It was evident that mechanisms of biomass degradation were present in the landfill microbiome [5].

Plastic debris is incredibly abundant in our oceans. L.J. Pinnell and J.W. Turner (2019) studied biofilms covering plastic in a marine environment. Their samples were recovered from the sediment-water interface of a coastal lagoon in the northern Gulf of Mexico [6]. Another area where plastic debris is a concern for our ecosystems is the North Pacific Ocean. During an expedition through the Pacific, J.A. Bryant et al. (2016) studied the concentration of plastic particles in surface seawater. Metagenomic sequencing of plastic-attached communities revealed bacteria that seemed to thrive better when inhabiting plastic near the surface [7]. Bacteria have also been studied in wastewater treatment processes. These contain a high density of viruses, which can impact microbial communities in aquatic systems [8]. M.L. Petrovich et al. (2019) retrieved their samples from two wastewater treatment sites, both by Lake Michigan in Wisconsin, US.

Data processing using FASTP and HUMAnN2.0
To ensure that our results are reproducible, we created a Conda environment to manage our packages. We installed the tools needed for our pipeline (fastp [9], HUMAnN2.0 [1], [10], MetaPhlAn2.0 [11]) in this environment. Conda makes sure that all the package versions are compatible with each other. Using it ensures that you can go back to the same software installation you had when you performed your analysis. This is important for reproducibility, since software tools may behave differently when they are updated or when the developers change their functionality. To read more about the reproducibility crisis in science and the changes that it entails, click here!

To process the data sampled from the five environments described above, we used the pipeline summarized in Figure 2. For a more detailed view of our pipeline, go to our GitHub.

First, fastq files were retrieved from the databases listed in table 1. Next, the fastq files were quality filtered using fastp. The filtered metagenomic sequences (DNA or RNA reads) were then used as input for HUMAnN2.0 [1], [10]. This tool identifies the known species within a community and aligns the reads to the pangenomes of those species (a pangenome is the entire gene set of all strains of a species). It then performs a translated search on the unclassified reads and finally computes and quantifies gene family and pathway abundances [1]. The aim of the tool is to describe the metabolic potential of a microbial community and to efficiently profile the presence of microbial pathways in a community from metagenomic sequencing data [10]. In the end, HUMAnN2.0 provides two outputs: a pathway abundance matrix and a gene abundance matrix. The tool quantifies genes and pathways in RPK units, which stands for reads per kilobase [12].
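To make the RPK unit concrete, here is a minimal sketch of the idea behind it: a read count is divided by the gene length expressed in kilobases, so that long genes do not look more abundant simply because they attract more reads. The function name and the numbers are hypothetical, and HUMAnN2.0 itself weights alignments in a more elaborate way, so this is only an illustration of the unit.

```python
def reads_per_kilobase(read_count, gene_length_bp):
    """Length-normalize a read count: reads divided by gene length in kilobases."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return read_count / (gene_length_bp / 1000.0)


# Hypothetical numbers: the same 150 reads hit a short and a long gene.
short_gene_rpk = reads_per_kilobase(150, 500)   # 500 bp gene
long_gene_rpk = reads_per_kilobase(150, 3000)   # 3000 bp gene

print(short_gene_rpk, long_gene_rpk)
```

With equal read counts, the shorter gene gets the higher RPK value, which is the intended effect of the length normalization.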

Figure 2. Overview of data preprocessing. Fastq files from five different environments were quality filtered using fastp before profiling the presence of microbial pathways in a community from metagenomic sequencing data. HUMAnN2.0 provides two output files: a gene abundance matrix and a pathway abundance matrix.
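The two preprocessing steps could be scripted roughly as follows. This is a sketch under stated assumptions, not our actual pipeline (which is on our GitHub): it assumes single-end reads, uses hypothetical file and directory names, and only relies on the basic `fastp -i/-o` and `humann2 --input/--output` options. The helper only builds the command lines, so it can be inspected without the tools being installed.

```python
from pathlib import Path


def build_pipeline_commands(raw_fastq, work_dir):
    """Return the fastp and humann2 command lines for one single-end sample."""
    raw = Path(raw_fastq)
    work = Path(work_dir)
    sample = raw.name.split(".")[0]  # e.g. "sampleA" from "sampleA.fastq"
    filtered = work / f"{sample}.filtered.fastq"

    # Step 1: quality filtering with fastp (input -> filtered output).
    fastp_cmd = ["fastp", "-i", str(raw), "-o", str(filtered)]
    # Step 2: functional profiling of the filtered reads with HUMAnN2.0.
    humann_cmd = ["humann2", "--input", str(filtered),
                  "--output", str(work / "humann2_out")]
    return fastp_cmd, humann_cmd


# Hypothetical sample; to actually run a step, pass the list to
# subprocess.run(cmd, check=True) inside the Conda environment.
fastp_cmd, humann_cmd = build_pipeline_commands("raw/sampleA.fastq", "work")
print(" ".join(fastp_cmd))
print(" ".join(humann_cmd))
```

Keeping the commands as lists makes it easy to loop over all samples of an environment and to log exactly what was run.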

Each pathway describes a series of chemical reactions catalyzed by several enzymes and reflects part of the functionality of the microbial community; for example, some pathways may show that the microbial community is able to degrade plastic. As the name implies, the pathway abundance matrix contains the abundance of reads that map to genes belonging to a specific biochemical pathway. Each reaction’s abundance is calculated as the sum of the abundances of all gene families that map to the reaction [10]. The pathway abundance is proportional to the number of complete copies of a pathway in a community [12].
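The summation step described above can be sketched in a few lines. The gene family names, reaction names and abundances below are made up for illustration, and the mapping from gene families to reactions is assumed to be given; the sketch covers only the reaction-level sum, not how HUMAnN2.0 combines reactions into a pathway abundance.

```python
from collections import defaultdict


def reaction_abundances(gene_family_rpk, family_to_reactions):
    """Sum gene family abundances (RPK) over the reactions they map to."""
    totals = defaultdict(float)
    for family, rpk in gene_family_rpk.items():
        # A gene family may map to zero, one or several reactions.
        for reaction in family_to_reactions.get(family, ()):
            totals[reaction] += rpk
    return dict(totals)


# Hypothetical input: three gene families, two reactions.
gene_family_rpk = {"famA": 12.0, "famB": 3.0, "famC": 5.0}
family_to_reactions = {"famA": ["rxn1"], "famB": ["rxn1", "rxn2"]}

print(reaction_abundances(gene_family_rpk, family_to_reactions))
```

Here `famC` maps to no reaction and therefore contributes to neither total, while `famB` contributes to both reactions.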

The gene abundance matrix contains the abundance of each gene family in the community. Gene families are clusters, at 90% similarity, of protein-coding sequences that are evolutionarily related and that often perform similar functions [13]. The unit for gene family abundance is RPK. It is calculated as the sum of the scores of all alignments for a gene family, where the alignment score is the number of matches to the reference gene for a specific sequence [12]. Running HUMAnN2.0 also creates a third matrix, the MetaPhlAn2 microbial abundance matrix.

Once we had our two desired outputs from HUMAnN2.0, the pathway abundance matrix was imported into R, a programming language used for statistical computing [14]. The pathway abundance matrix was merged with the metadata from each of the five environments, which contained additional information required for our analysis (see the datasets in table 1). The data was then processed to make it suitable for further analysis. In brief, individual feature counts (e.g. genes, microbes, pathways) were normalized by sequencing depth and log-transformed. Additional information on how these steps were carried out can be found on our GitHub, see link below.
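A minimal sketch of this normalization is given below, written in Python for illustration although our analysis was done in R. It assumes that "sequencing depth" is approximated by the total abundance of all features in a sample, and it uses a pseudocount of 1e-10, chosen here because the figures place zero at -10 on the log10 scale. Both choices are assumptions of the sketch, and the sample values are hypothetical.

```python
import math

PSEUDOCOUNT = 1e-10  # assumed value; avoids taking the logarithm of zero


def normalize_and_log(sample_counts, pseudocount=PSEUDOCOUNT):
    """Scale feature abundances by the sample total, then take log10."""
    depth = sum(sample_counts.values())
    if depth <= 0:
        raise ValueError("sample has no counts to normalize")
    return {
        feature: math.log10(value / depth + pseudocount)
        for feature, value in sample_counts.items()
    }


# Hypothetical sample with one absent pathway.
sample = {"P184-PWY": 30.0, "PWY-6435": 0.0, "PWY-7002": 70.0}
transformed = normalize_and_log(sample)
for pathway, value in transformed.items():
    print(pathway, round(value, 3))
```

A pathway that is absent from a sample ends up at log10 of the pseudocount, which is why absent pathways appear at the bottom of the plots instead of being dropped.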

Principal component analysis
To get a better overview of our data, we used principal component analysis (PCA), a technique used to visualize data. PCA reduces the number of dimensions of a dataset and thus makes it easier to interpret, while minimizing the loss of information. This is possible because PCA creates new uncorrelated variables, the principal components, that successively maximize variance [15]. Our set of metagenomic data from the different environments was very large and complex, so we needed to reduce its dimensionality while preserving as much variability (i.e. statistical information) as possible. By plotting the principal components PC1 and PC2, we could visualize the abundance distribution of our microbial pathways in the different environments. The PCA of our dataset from the five environments can be seen in figure 3. The first two principal components, PC1 and PC2, account for 12.26% and 5.81%, respectively, of the total variation in our dataset. The environments are shown in different colors: Gulf of Mexico (red), North Pacific Ocean (blue), India (green), UK (purple) and US (orange). Some of them overlap to some extent, which implies that they have similar abundance distributions in PC1 and PC2.

Figure 3. Principal component analysis. PC1 and PC2 account for 12.26% and 5.81%, respectively, of the total variation in our dataset. The lines represent the density of PC1 and PC2. Some environments overlap, which means that they have similar abundance distributions in PC1 and PC2.
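For readers who want to see how the principal components and the share of variance they explain are obtained, here is a small sketch using NumPy and a singular value decomposition. It is written in Python for illustration and is not the R code we used; the random input matrix (samples in rows, pathways in columns) is purely hypothetical.

```python
import numpy as np


def pca(matrix, n_components=2):
    """Return sample scores and explained-variance ratios for the top components."""
    X = np.asarray(matrix, dtype=float)
    centered = X - X.mean(axis=0)            # center each feature (column)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    explained = (S ** 2) / np.sum(S ** 2)              # variance share per PC
    return scores, explained[:n_components]


# Hypothetical data: 6 samples described by 4 log-transformed pathways.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 4))

scores, explained = pca(data)
print(scores.shape)   # one (PC1, PC2) coordinate pair per sample
print(explained)      # fraction of total variance captured by PC1 and PC2
```

Plotting the two columns of `scores` against each other, colored by environment, gives the kind of overview shown in figure 3.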

Pathways associated with plastic degradation
We extracted three pathways from our metagenomic data that are all associated with the degradation of plastic. We found them by searching MetaCyc for pathways responsible for plastic degradation. In total, we identified the following three pathways:

  • "P184-PWY: protocatechuate degradation I (meta-cleavage pathway)"
  • "PWY-6435: 4-hydroxybenzoate biosynthesis V"
  • "PWY-7002: 4-hydroxyacetophenone degradation"

The pathways listed above were all found as partial steps in the degradation of bphA. BphA is a chemical that is formed in the degradation of the hard segment of elastane. Additionally, the second pathway is similar to a step in the degradation of polyethylene terephthalate (PET), which is one of the plastic types constituting the soft segment of elastane, present in textiles.

Violin plots were created to visualize how abundant the three pathways associated with plastic degradation are in our studied environments. To do this, we plotted the abundance (normalized and log10-transformed) for each of our environments, see figures 4-6. Zero abundance appears at -10 in each figure because a small number was added before the transformation, since it is not possible to take the logarithm of zero. The jittered dots show that we have few samples from the landfill in India and more samples from the other environments.

Our first pathway associated with degradation of bphA, “P184-PWY: protocatechuate degradation I (meta-cleavage pathway)”, figure 4, is abundant in the North Pacific Ocean, in the wastewater from the landfill in the UK and in the wastewater treatment site in the US. There seems to be a modest abundance in the landfill in India. The pathway is not abundant at all in the Gulf of Mexico.

Figure 4. P184-PWY: protocatechuate degradation I (meta-cleavage pathway), associated with plastic degradation (bphA). Normalized abundance data plotted against five different environments. Zero abundance is at -10, so the pathway is absent in the Gulf of Mexico but abundant in the remaining environments.

Our second pathway, "PWY-6435: 4-hydroxybenzoate biosynthesis V", see figure 5, shows no abundance in either marine environment. Instead, this pathway seems more abundant in both wastewater environments. Here too, we have a modest abundance in the landfill in India.

Figure 5. PWY-6435: 4-hydroxybenzoate biosynthesis V, associated with plastic degradation (bphA). Normalized abundance data plotted against five different environments. Zero abundance is at -10, so the pathway is absent in both marine environments, while the remaining environments show some abundance of the pathway.

The third pathway associated with plastic degradation, "PWY-7002: 4-hydroxyacetophenone degradation", see figure 6, shows little abundance in our studied environments. The pathway is possibly abundant in the wastewater from the landfill in the UK.

Figure 6. PWY-7002: 4-hydroxyacetophenone degradation, associated with plastic degradation (bphA). Normalized abundance data plotted against five different environments. Zero abundance is at -10. There is barely any abundance of the pathway in the studied environments, except for the UK landfill, which possibly has some abundance of the pathway.


We then tried to examine the microbes contributing to each of the chosen pathways, to find out whether any microbe was overrepresented. We did this by looking for the microbes associated with the pathways in our pathway abundance matrix, but they were all marked as unclassified species. This means that the pangenomes of some organisms have not been mapped out yet.

Investigating enzymes included in the pathways associated with plastic degradation
Enzymes were picked out from each of the three pathways associated with plastic degradation ("P184-PWY: protocatechuate degradation I (meta-cleavage pathway)", "PWY-6435: 4-hydroxybenzoate biosynthesis V", "PWY-7002: 4-hydroxyacetophenone degradation"). The enzymes were found on MetaCyc, a database of metabolic pathways. Table 2 below lists the name of each enzyme, its UniRef ID, its EC number and the pathway in which the enzyme is included.

Enzyme | UniRef ID | EC number | Pathway
protocatechuate 4,5-dioxygenase | UniRef90_P22636 | EC:1.13.11.8 | P184-PWY: protocatechuate degradation I (meta-cleavage pathway)
2-hydroxy-4-carboxymuconate-6-semialdehyde dehydrogenase | UniRef90_Q9KWL3 | EC:1.1.1.312 | P184-PWY: protocatechuate degradation I (meta-cleavage pathway)
2-pyrone-4,6-dicarboxylate hydrolase | UniRef90_A0A024YVL4 | EC:3.1.1.57 | P184-PWY: protocatechuate degradation I (meta-cleavage pathway)
4-oxalomesaconate hydratase | UniRef90_A0A2U3Q460 | EC:4.2.1.83 | P184-PWY: protocatechuate degradation I (meta-cleavage pathway)
4-carboxy-4-hydroxy-2-oxoadipate aldolase | UniRef90_A0A089QEG1 | EC:4.1.3.17 | P184-PWY: protocatechuate degradation I (meta-cleavage pathway)
4-hydroxy-4-methyl-2-oxoglutarate aldolase | UniRef90_Q9AQI0 | EC:4.1.3.16 | P184-PWY: protocatechuate degradation I (meta-cleavage pathway)
4-coumarate coenzyme A ligase | UniRef90_Q42524 | EC:6.2.1.12 | PWY-6435: 4-hydroxybenzoate biosynthesis V
4-coumaroyl-CoA hydratase/aldolase / Enoyl-CoA hydratase | UniRef90_Q9P4U9 | EC:4.2.1.17 | PWY-6435: 4-hydroxybenzoate biosynthesis V
peroxisomal multifunctional enzyme type 2 | UniRef90_A0A0J8YSR7 | EC:1.1.1.35 | PWY-6435: 4-hydroxybenzoate biosynthesis V
3-oxoacyl CoA thiolase | UniRef90_P21151 | EC:2.3.1.16 | PWY-6435: 4-hydroxybenzoate biosynthesis V
4-hydroxybenzoyl-CoA thioesterase | UniRef90_Q7BI34 | EC:3.1.2.23 | PWY-6435: 4-hydroxybenzoate biosynthesis V
4-hydroxyacetophenone monooxygenase | UniRef90_Q93TJ5 | EC:1.14.13.84 | PWY-7002: 4-hydroxyacetophenone degradation
hydroquinone 1,2-dioxygenase | UniRef90_A0A5E7VGT5 | EC:1.13.11.66 | PWY-7002: 4-hydroxyacetophenone degradation
4-hydroxymuconic-semialdehyde dehydrogenase | UniRef90_A0A3G2US96 | EC:1.2.1.61 | PWY-7002: 4-hydroxyacetophenone degradation
maleylacetate reductase II | UniRef90_Q93T12 | EC:1.2.1.32 | PWY-7002: 4-hydroxyacetophenone degradation
Table 2. Enzymes found in the three pathways associated with plastic degradation ("P184-PWY: protocatechuate degradation I (meta-cleavage pathway)", "PWY-6435: 4-hydroxybenzoate biosynthesis V", "PWY-7002: 4-hydroxyacetophenone degradation"). The table lists the name of each enzyme, its UniRef ID, its EC number and the pathway in which the enzyme is included.


We compared the enzymes in these pathways against the enzymes present in our samples and obtained two matches, 4-hydroxy-4-methyl-2-oxoglutarate aldolase and 2-hydroxy-4-carboxymuconate-6-semialdehyde dehydrogenase, both part of the protocatechuate degradation pathway.
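Such a comparison boils down to intersecting two sets of UniRef90 IDs. The sketch below assumes that the row labels of the gene abundance matrix start with the UniRef90 ID, optionally followed by `: name` or by a `|`-separated taxon label; that label format and the example rows are assumptions made for illustration. The reference IDs are taken from table 2.

```python
def match_enzymes(reference_ids, gene_family_labels):
    """Return the reference UniRef90 IDs that occur among the gene family labels."""
    observed = set()
    for label in gene_family_labels:
        # Drop an optional "|taxon" suffix, then an optional ": name" suffix.
        family_id = label.split("|")[0].split(":")[0].strip()
        observed.add(family_id)
    return sorted(set(reference_ids) & observed)


# Three of the IDs from table 2.
reference_ids = ["UniRef90_P22636", "UniRef90_Q9KWL3", "UniRef90_Q9AQI0"]

# Hypothetical row labels from a gene abundance matrix.
gene_family_labels = [
    "UNMAPPED",
    "UniRef90_Q9AQI0",
    "UniRef90_Q9KWL3|unclassified",
    "UniRef90_unknown",
]

print(match_enzymes(reference_ids, gene_family_labels))
```

The same function can be reused for any other list of reference IDs, as long as the IDs are expressed at the same UniRef clustering level as the matrix.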

Figure 7. Abundance of 4-hydroxy-4-methyl-2-oxoglutarate aldolase across different samples. Each dot corresponds to one sample. ****: p <= 0.0001.

Figure 8. Abundance of 2-hydroxy-4-carboxymuconate-6-semialdehyde dehydrogenase across different environments. Each dot corresponds to one sample. ns: not significant.

We plotted the normalized abundance of these enzymes across the different environments and found that they are present in all the aquatic environments and significantly less present in soil. 4-hydroxy-4-methyl-2-oxoglutarate aldolase is significantly more abundant in wastewater than in the other environments. This is not unexpected, since one would expect a higher content of plastic in wastewater streams than in the ocean. No significant differences were found in the abundance of 2-hydroxy-4-carboxymuconate-6-semialdehyde dehydrogenase between the different environments.

Besides manually picking enzymes from pathways present in our microbial community, we queried PMBD, the Plastics Microbial Biodegradation Database [16], from which we obtained a list of enzymes that have been confirmed to play a role in the degradation of polyurethane, PET and PEG. We included PEG in this search because it is a plastic usually present in the soft segment of elastane molecules. For each enzyme on the list, we retrieved its UniRef90 ID [17], which identifies a cluster of enzymes sharing at least 90% identity. We then compared this list with our data to see whether any of the enzymes from the database are present in the microbial communities that were studied.

However, only one enzyme was a match: A9ZMOO, or polyethylene glycol dehydrogenase. We studied the distribution of this enzyme across the sampled environments.

Figure 9. Abundance of polyethylene glycol dehydrogenase in different environments. The abundance was normalized. Each dot represents a sample. **: p <= 0.01.

We can see that the enzyme is detected only in the environments that are related to water, and that it is highly abundant in both marine and wastewater environments. This could reflect the problem of microplastic contamination in the oceans, since PEG dehydrogenase seems to be more abundant in marine environments than in freshwater, although further research should be conducted to confirm this.

Limitations
Although metagenomics can be very useful for learning more about microbial activity and composition in different environments, it has its limitations, just like every other tool. One limitation is that it is genomics and not transcriptomics, meaning that one cannot tell whether the genes are actually expressed as proteins. Another limitation is that the quality of the results depends on the quality of the sequencing technique: it is possible that we cannot detect microorganisms present in very small amounts.

We used HUMAnN2.0 in our study, but HUMAnN3.0, an updated version, is now also available, albeit still in the alpha phase of development. Since larger pangenomes are mapped out in this newer version, it might reveal more information than we were able to find in this study and might be worth using in future research on our topic. Yet another limitation is that HUMAnN2.0 cannot assign new functions to a species [1], which means that if a certain function has not already been discovered (e.g. degradation of elastane), our analysis will not show it.

One last consideration is that our analysis relies heavily on preexisting knowledge. First, we are using raw data from other researchers. Secondly, many organisms that appeared in our analysis were unclassified. Thirdly, many of the genes that we found were unmapped, which means that we do not know their function. However, this is probably an indication that there is huge potential for new discoveries from profiling microbial communities!

Conclusions and future aspects
From the PCA, we can tell that some of our studied environments have similar functional content. The pathway abundance matrix and gene abundance matrix that we provide contain a large amount of information that can be used for more than just our purpose, for example in studies on a similar topic or in studies of other targets within the same environments. In addition, our analysis provides more detail about the community metagenomes of these environments and more information to help characterize the species within them.

Pathways associated with plastic degradation (more specifically of bphA and PET) were found to be abundant in some of our studied environments. We presumed that these environments contain enzymes possibly responsible for plastic degradation. To further test this hypothesis, we handpicked some of the enzymes present in these pathways and compared them with our results. We found that two enzymes from the protocatechuate degradation pathway are also present in most of our environments. Furthermore, we looked in our samples for enzymes from the PMBD database, which have been experimentally confirmed to play a role in plastic degradation. Our analysis revealed the presence of PEG dehydrogenase in marine and wastewater environments.

The first pathway (P184-PWY: protocatechuate degradation I (meta-cleavage pathway)) has a higher abundance in all environments except the Gulf of Mexico, which could imply that these environments have a higher plastic content. The second pathway (PWY-6435: 4-hydroxybenzoate biosynthesis V) is more abundant in the wastewater treatment sites than in the two marine environments, which can be expected since wastewater treatment sites most likely contain more plastic than the marine environments. There is, however, some abundance of the pathway in the oceans as well. This supports the view that our oceans are indeed contaminated by plastic, because our results show that plastic-degrading pathways are present there. The third pathway (PWY-7002: 4-hydroxyacetophenone degradation) shows some abundance in all environments, but it is unfortunately hard to draw further conclusions from this plot.

When examining the microbes in each pathway, the only information we found was that the pathways contained unclassified species. The pangenomes of some organisms have not been mapped out yet, and this is something future research could focus on. Also, as mentioned before, we used HUMAnN2.0, but who knows what information could be obtained with HUMAnN3.0? That version contains twice as many species pangenomes and three times as many gene families [18].



[1] E. A. Franzosa et al., “Species-level functional profiling of metagenomes and metatranscriptomes,” Nat. Methods, vol. 15, no. 11, pp. 962–968, Nov. 2018, doi: 10.1038/s41592-018-0176-y.
[2] T. Thomas, J. Gilbert, and F. Meyer, “Metagenomics - a guide from sampling to data analysis,” Microb. Inform. Exp., vol. 2, no. 1, p. 3, Dec. 2012, doi: 10.1186/2042-5783-2-3.
[3] C. Quince, A. W. Walker, J. T. Simpson, N. J. Loman, and N. Segata, “Shotgun metagenomics, from sampling to analysis,” Nature Biotechnology, vol. 35, no. 9. Nature Publishing Group, pp. 833–844, Sep. 12, 2017, doi: 10.1038/nbt.3935.
[4] J. Gupta, R. Rathour, M. Kumar, and I. S. Thakur, “Metagenomic analysis of microbial diversity in landfill lysimeter soil of Ghazipur landfill site, New Delhi, India,” Genome Announc., vol. 5, no. 42, Oct. 2017, doi: 10.1128/genomeA.01104-17.
[5] E. Ransom-Jones, A. J. McCarthy, S. Haldenby, J. Doonan, and J. E. McDonald, “Lignocellulose-Degrading Microbial Communities in Landfill Sites Represent a Repository of Unexplored Biomass-Degrading Diversity,” mSphere, vol. 2, no. 4, Aug. 2017, doi: 10.1128/msphere.00300-17.
[6] L. J. Pinnell and J. W. Turner, “Shotgun metagenomics reveals the benthic microbial community response to plastic and bioplastic in a coastal marine environment,” Front. Microbiol., vol. 10, no. JUN, p. 1252, Jun. 2019, doi: 10.3389/fmicb.2019.01252.
[7] J. A. Bryant et al., “Diversity and Activity of Communities Inhabiting Plastic Debris in the North Pacific Gyre,” mSystems, vol. 1, no. 3, pp. 24–40, Jun. 2016, doi: 10.1128/msystems.00024-16.
[8] M. L. Petrovich, S. Ben Maamar, E. M. Hartmann, B. T. Murphy, R. S. Poretsky, and G. F. Wells, “Viral composition and context in metagenomes from biofilm and suspended growth municipal wastewater treatment plants,” Microb. Biotechnol., vol. 12, no. 6, pp. 1324–1336, Nov. 2019, doi: 10.1111/1751-7915.13464.
[9] S. Chen, Y. Zhou, Y. Chen, and J. Gu, “Fastp: An ultra-fast all-in-one FASTQ preprocessor,” in Bioinformatics, Sep. 2018, vol. 34, no. 17, pp. i884–i890, doi: 10.1093/bioinformatics/bty560.
[10] “humann2 – The Huttenhower Lab.” https://huttenhower.sph.harvard.edu/humann (accessed Sep. 23, 2020).
[11] N. Segata, L. Waldron, A. Ballarini, V. Narasimhan, O. Jousson, and C. Huttenhower, “Metagenomic microbial community profiling using unique clade-specific marker genes,” Nat. Methods, vol. 9, no. 8, pp. 811–814, Aug. 2012, doi: 10.1038/nmeth.2066.
[12] “biobakery/humann: HUMAnN 2.0 is the next generation of HUMAnN 1.0 (HMP Unified Metabolic Analysis Network).” https://github.com/biobakery/humann?fbclid=IwAR2rwdxGgVyHPmaErv4UMvvYXpizPs53oRDbXEm2kzQ-TSs8guTAzFKLEAM#output-files (accessed Oct. 22, 2020).
[13] “Functional Analysis – NGS Analysis.” https://learn.gencore.bio.nyu.edu/metgenomics/shotgun-metagenomics/functional-analysis/ (accessed Aug. 28, 2020).
[14] “RStudio | Open source & professional software for data science teams - RStudio.” https://rstudio.com/ (accessed Aug. 28, 2020).
[15] I. T. Jolliffe and J. Cadima, “Principal component analysis: a review and recent developments,” Philos. Trans. R. Soc. A Math. Phys. Eng. Sci., vol. 374, no. 2065, p. 20150202, Apr. 2016, doi: 10.1098/rsta.2015.0202.
[16] Z. Gan and H. Zhang, “PMBD: a Comprehensive Plastics Microbial Biodegradation Database,” Database (Oxford)., vol. 2019, Jan. 2019, doi: 10.1093/database/baz119.
[17] “UniRef.” https://www.uniprot.org/help/uniref (accessed Oct. 21, 2020).
[18] “humann3 – The Huttenhower Lab.” https://huttenhower.sph.harvard.edu/humann3/ (accessed Oct. 21, 2020).