4/2/24, 10:43 PM Shallow Shotgun Metagenome Sequencing (SSMS) Analysis Demo Report MetaPhlAn/Marker gene-based approach file:///C:/Users/Novogene/Desktop/SSMS_Demo_MetaPhlAn2_EN/report1.html

1 Overview
2 Library construction and sequencing
3 Bioinformatic analysis
4 Analysis Results
5 References
6 Appendix

Shallow Shotgun Metagenome Sequencing (SSMS) Analysis Demo Report
MetaPhlAn/Marker gene-based approach

Contract ID: xxxxxxxx
Contract Name: xxxxxxxx
Batch ID: xxxxxxxx
Report Time: 2023-04-21

Reminder
This report shows only a selection of the results. See the Result folder for the details of all analysis contents. The result hyperlinks in the report are invalid until you receive the results. After you confirm the settlement, the hyperlink of the result directory in the report of the release file is valid.

1 Overview

Microorganisms are found in almost every habitat in nature, even in hostile environments such as the poles, deserts, geysers, rocks, and the deep sea. They are vital to human health, ecology, and other environments. Since Antonie van Leeuwenhoek first observed microorganisms under the microscope, they have been studied for centuries largely through pure cultures. Among the trillions of microbial species, only 0.1%~1% can be cultured, which severely limits the research and development of microbial diversity resources. The term 'metagenomics' was first used by Jo Handelsman's group, referring to the function-based analysis of mixed environmental DNA species.
More recently, metagenomics was defined by Chen and Pachter as "the application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species". It circumvents the need to isolate and culture the microorganisms in a sample and provides a way to study microorganisms that cannot be isolated and cultured. It therefore reflects the microbial composition and interactions within samples more accurately, and it also makes it possible to study metabolic pathways and gene functions at the molecular level.

Recent advances in next-generation sequencing (NGS) and bioinformatic analysis have made it possible to identify microbial species and their associated metabolic pathways precisely. The Human Microbiome Project (HMP, http://www.hmpdacc.org/) and the Earth Microbiome Project (EMP, http://www.earthmicrobiome.org/), together with NGS, have greatly advanced novel genome prediction, genetic association studies, pathogen identification, and clinical diagnostics.

Characterizing the bacterial composition and functional repertoire of microbiome samples is the most common application of metagenomics. Although deep whole-metagenome shotgun sequencing (WMS) provides high taxonomic resolution, it is generally cost-prohibitive for large longitudinal investigations. Until now, marker gene amplicon sequencing (e.g. 16S) has been the most widely used approach, and it is often combined with WMS to keep costs down. However, the accuracy of amplicon results and their consistency with WMS data have not been fully established, especially for complex microbiomes of known composition.
In contrast, shallow shotgun metagenome sequencing (SSMS), which uses a lower sequencing depth, yields results that closely resemble WMS data at both the genus and species levels, with much more accurate taxonomic assignments and functional predictions than the amplicon-based approach. It is therefore a better and more cost-efficient alternative for large-scale microbiome studies.

2 Library construction and sequencing

Collected raw material or extracted DNA was shipped refrigerated to our laboratory. Upon receipt, DNA was isolated from raw material. The quality of the extracted and delivered DNA samples was then assessed, and qualified DNA samples were used for library preparation. After library QC, the qualified libraries were sequenced on an Illumina NovaSeq 6000 instrument with a paired-end 150 bp (PE150) strategy, and the raw sequencing data were used for downstream bioinformatic analysis. Novogene implements a rigorous quality control system to ensure the accuracy and reliability of the results. Please see the SSMS experimental workflow below:

[1] [2] [1] [3]
Figure 2.1 Metagenomics Experimental Flow Chart

3 Bioinformatic analysis

(a) Data QC: Raw NGS data usually contain sequencing artifacts such as low-quality reads and contaminating reads, which significantly compromise downstream analysis. Therefore, low-quality reads and host sequences were removed to obtain clean data for reliable downstream bioinformatic analysis.
(b) Taxonomic annotation: MetaPhlAn was used to characterize the taxonomic composition of the shallow shotgun metagenome sequencing (SSMS) samples.

Figure 3.1 Bioinformatic analysis pipeline
Note: (1) Clustering analysis requires at least 3 samples. (2) Statistical analyses such as LEfSe require at least 3 replicates per group.
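The read-level part of the Data QC step can be illustrated with a minimal sketch that uses the default thresholds listed in Section 4.1 (Q-value ≤ 38 for more than 40 bp, more than 10 N bases, more than 15 bp of adapter overlap). This is not the pipeline's own code: the function names, the exact-match adapter search, and the example adapter string are assumptions made only for illustration, and host removal with bowtie2 is not covered.

```python
# Illustrative read filter. Thresholds are the defaults quoted in Section 4.1;
# everything else (names, adapter sequence, exact matching) is assumed.
LOW_Q_CUTOFF = 38          # a base counts as low quality when Q-value <= 38
MAX_LOW_Q_BASES = 40       # discard the read if more than 40 such bases
MAX_N_BASES = 10           # discard the read if more than 10 N bases
MAX_ADAPTER_OVERLAP = 15   # discard the read if adapter overlap exceeds 15 bp

# Hypothetical adapter used only for this example.
EXAMPLE_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"


def adapter_overlap(seq: str, adapter: str) -> int:
    """Return the length of adapter sequence found in the read.

    Checks for the full adapter anywhere in the read, then for the longest
    adapter prefix at the 3' end. Exact matching only; real trimmers
    usually tolerate mismatches.
    """
    if adapter in seq:
        return len(adapter)
    for k in range(min(len(seq), len(adapter)), 0, -1):
        if seq.endswith(adapter[:k]):
            return k
    return 0


def keep_read(seq: str, quals: list, adapter: str = EXAMPLE_ADAPTER) -> bool:
    """Return True if the read passes all three default filters."""
    low_q_bases = sum(1 for q in quals if q <= LOW_Q_CUTOFF)
    n_bases = seq.upper().count("N")
    return (
        low_q_bases <= MAX_LOW_Q_BASES
        and n_bases <= MAX_N_BASES
        and adapter_overlap(seq, adapter) <= MAX_ADAPTER_OVERLAP
    )


if __name__ == "__main__":
    read = "ACGT" * 37 + "AC"              # 150 bp, no N, no adapter
    print(keep_read(read, [40] * 150))     # passes all filters
```

For paired-end data, a natural extension would be to drop a pair when either mate fails, but the report does not state how pairs are handled.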
4 Analysis Results

4.1 Data Pre-processing

The data pre-processing protocol is as follows:
(1) Remove reads in which the number of low-quality bases (Q-value ≤ 38) exceeds a threshold (40 bp by default);
(2) Remove reads in which the number of N nucleotides exceeds a threshold (10 bp by default);
(3) Remove reads whose overlap with the adapter exceeds a threshold (15 bp by default);
(4) If the target community is associated with a host, Bowtie2 is used to filter out host-derived reads.

Table 4.1 Statistics for data pre-processing (page 1 of 4; 48 items in total)

#Sample  InsertSize(bp)  SeqStrategy  RawData   RawReads(#)  Low_Q  N_num  Adapter  Duplica
W0.1     350             (150:150)    6,190.08  41,267,172   0.00   0.32   1.85     0.00
W0.2     350             (150:150)    6,007.12  40,047,458   0.00   0.28   1.64     0.00
W0.4     350             (150:150)    6,266.32  41,775,466   0.00   0.27   1.49     0.00
W0.8     350             (150:150)    6,624.12  44,160,800   0.00   0.35   1.96     0.00
W0.11    350             (150:150)    6,559.13  43,727,502   0.00   0.30   2.53     0.00
W0.12    350             (150:150)    6,845.70  45,637,982   0.00   0.36   1.61     0.00
W0.21    350             (150:150)    6,126.25  40,841,656   0.00   0.27   2.76     0.00
W0.23    350             (150:150)    6,775.32  45,168,768   0.02   0.11   2.70     0.00
W0.24    350             (150:150)    6,430.04  42,866,960   0.02   0.10   1.86     0.00
W0.25    350             (150:150)    6,466.46  43,109,726   0.02   0.10   2.34     0.00
W0.26    350             (150:150)    6,526.52  43,510,144   0.02   0.10   1.95     0.00
W0.28    350             (150:150)    6,733.74  44,891,626   0.02   0.11   2.87     0.00
W0.29    350             (150:150)    6,463.57  43,090,490   0.02   0.10   1.93     0.00
W0.33    350             (150:150)    6,446.90  42,979,334   0.02   0.10   2.02     0.00
W0.34    350             (150:150)    5,994.38  39,962,526   0.00   0.32   1.79     0.00

Results Directory:
Qualified Data: result/01.CleanData/Sample_Name/*_350.fq1(2).gz
Nonhost Data:
result/01.CleanData/Sample_Name/*_350.nohost.fq1(2).gz
QC result: result/01.CleanData/total.QCstat.info.xls
QC result after filtering host: result/01.CleanData/total.*.NonHostQCstat.info.xls

4.2 taxonVisual

MetaPhlAn (Metagenomic Phylogenetic Analysis) is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn relies on unique clade-specific marker genes identified from ~17,000 reference genomes (~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic), allowing:
1. Unambiguous taxonomic assignments;
2. Accurate estimation of relative taxonomic abundance;
3. Species-level identification of bacteria, archaea, eukaryotes, and viruses;
4. Strain identification and tracking;
5. Faster analysis than existing methods.

4.2.1 Taxonomic relative abundance analysis

MetaPhlAn estimates the relative abundance of microbial cells by mapping reads against a reduced set of clade-specific marker sequences. These markers are computationally pre-selected from coding sequences so that they unequivocally identify specific microbial clades at the species level or higher and cover all main functional categories. The MetaPhlAn classifier compares each metagenomic [13]
read from a sample to this marker catalog to identify high-confidence matches. The classifier normalizes the total number of reads in each clade by the nucleotide length of its markers and reports the relative abundance of each taxonomic unit, taking into account any markers specific to subclades. Microbial reads belonging to clades with no sequenced genomes available are reported as an "unclassified" subclade of the closest ancestor with available sequence data. The abundance results can be visualized as bar charts and Krona plots.

Figure 4.1 Bar chart showing the relative taxonomic abundance of the 10 most abundant taxa across the samples
Notes: (a) Relative abundance of the top 10 phyla; (b) Relative abundance of the top 10 genera. The x-axis shows sample names and the y-axis shows relative abundance as a percentage. Each taxon is shown in a different colour.

Krona was used to explore the relative abundances and confidences within the complex hierarchies of metagenomic classifications interactively. Please see the representative figure(s) below:

Figure 4.2 Krona plots of the microbiome detected in representative metagenomic datasets (representative figure)
Note: In the figure, the circles represent the classification levels (kingdom, phylum, class, order, family, genus, and species) from inside to outside. The size of each sector represents the abundance of the corresponding taxon.

Results Directory
The Krona plot(s) are available at: result/03.Diversity/04.Krona/taxonomy.krona.html.
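The normalization described in Section 4.2.1 (reads per clade divided by the nucleotide length of that clade's markers, then scaled to relative abundance) can be illustrated with a small sketch. This is not MetaPhlAn's actual implementation: the function name, the input dictionaries, and the clade names are hypothetical, and the handling of subclade-specific markers and "unclassified" subclades is left out.

```python
def relative_abundance(clade_reads: dict, marker_length: dict) -> dict:
    """Length-normalize read counts per clade and scale them to percentages.

    clade_reads:   reads assigned to each clade's markers (hypothetical input)
    marker_length: total marker length in nucleotides for each clade
    """
    # Reads per marker nucleotide, so clades with longer marker sets
    # are not over-counted.
    coverage = {
        clade: clade_reads[clade] / marker_length[clade]
        for clade in clade_reads
    }
    total = sum(coverage.values())
    if total == 0:
        return {clade: 0.0 for clade in coverage}
    return {clade: 100.0 * cov / total for clade, cov in coverage.items()}


if __name__ == "__main__":
    # Made-up numbers: both clades recruit 300 reads, but clade_B has three
    # times as much marker sequence, so its normalized coverage is a third
    # of clade_A's.
    reads = {"clade_A": 300, "clade_B": 300}
    lengths = {"clade_A": 1000, "clade_B": 3000}
    for clade, pct in relative_abundance(reads, lengths).items():
        print(f"{clade}\t{pct:.1f}")
```

The point of the example is that equal raw read counts do not imply equal abundance once marker length is taken into account.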
The bar chart(s) from the relative abundance analysis are available at: result/03.Diversity/01.barplot, which includes the results for five taxonomic ranks (phylum, class, order, family, and genus).

4.2.2 Hierarchical cluster analysis based on taxonomic relative abundance

From the relative abundance tables at the different taxonomic levels, the top 35 genera were selected to create heat maps based on their relative abundance in each sample. The selected genera were then clustered by taxon, which makes the results easier to read and helps to identify taxa that are enriched together in particular samples. See the presented results in the following heatmap figure: [15]
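The selection and clustering steps described in this subsection can be sketched as follows. The report does not state the ranking criterion, the distance measure, or the linkage method, so the summed abundance ranking, Euclidean distance, and average linkage used here are assumptions, as are all names and numbers in the example.

```python
import math


def top_taxa(table: dict, n: int = 35) -> list:
    """Rank taxa by summed relative abundance across samples and keep the top n.

    table maps taxon -> list of relative abundances, one value per sample.
    """
    ranked = sorted(table, key=lambda taxon: sum(table[taxon]), reverse=True)
    return ranked[:n]


def euclidean(a: list, b: list) -> float:
    """Euclidean distance between two abundance profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(table: dict, taxa: list) -> list:
    """Naive agglomerative clustering of taxa; returns the merge history.

    Each entry is a pair of clusters (tuples of taxon names) that were joined.
    Quadratic pair search per step, which is fine for 35 taxa.
    """
    def cluster_distance(pair):
        left, right = pair
        dists = [euclidean(table[x], table[y]) for x in left for y in right]
        return sum(dists) / len(dists)

    clusters = [(taxon,) for taxon in taxa]
    merges = []
    while len(clusters) > 1:
        pairs = [
            (a, b)
            for i, a in enumerate(clusters)
            for b in clusters[i + 1:]
        ]
        best = min(pairs, key=cluster_distance)
        clusters = [c for c in clusters if c not in best] + [best[0] + best[1]]
        merges.append(best)
    return merges


if __name__ == "__main__":
    # Made-up genus-level table with three samples per genus.
    demo = {
        "genus_A": [10.0, 11.0, 50.0],
        "genus_B": [10.0, 12.0, 49.0],
        "genus_C": [80.0, 1.0, 2.0],
        "genus_D": [0.1, 0.1, 0.1],
    }
    selected = top_taxa(demo, n=3)          # genus_D is dropped
    for left, right in average_linkage(demo, selected):
        print(left, "+", right)
```

In practice the merge history would be turned into a dendrogram that orders the heatmap rows, a step that plotting libraries normally handle.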