Writing reference
This page summarizes materials that may be useful as a reference when you write and publish your paper.
Tips
The following content is an overview; detailed descriptions can be found in the corresponding chapters of Analysis Help.
Sample extraction
Experiment process
1. Sample testing. Select the appropriate testing method for quality control according to the sample and product requirements.
2. Sample fragmentation. Take a certain amount of genomic DNA and fragment it.
3. Fragment size selection. Select fragments from the fragmented sample using magnetic beads.
4. End repair, A-tailing, and adapter ligation. Prepare the reaction system and set the reaction program to repair the DNA ends and add an A base to the 3' end. Then prepare the adapter ligation reaction system and set the reaction program to ligate the adapters to the DNA.
5. PCR. Prepare the PCR reaction system and set the reaction program to amplify the product.
6. Library detection. Choose the appropriate detection method for library quality control according to the product requirements.
7. Circularization. After denaturing the PCR product into single strands, prepare the circularization reaction system and set the reaction program to obtain a single-stranded circular product, then digest the linear DNA molecules that have not been circularized.
8. Sequencing. The single-stranded circular DNA molecules are replicated by rolling circle replication into DNA nanoballs (DNBs), each containing multiple copies. The DNBs are loaded into the wells of the high-density DNA nanochip and sequenced using combinatorial probe-anchor synthesis (cPAS).
Library construction and sequencing
- Sample fragmentation. Take a certain amount of metagenomic DNA and shear it with a Covaris ultrasonicator.
- Fragment size selection. After fragmentation, select fragments with magnetic beads so that the sample bands are concentrated around 200-400 bp.
- End repair, A-tailing, and adapter ligation. Prepare the reaction system and incubate at a suitable temperature for a certain time to repair the ends of the double-stranded DNA and add an A (adenine) base to the 3' end. Then prepare the adapter ligation reaction system and incubate at a suitable temperature for a certain time to ligate the adapters to the DNA.
- PCR and product recovery. The PCR reaction system was prepared and the reaction program was set to amplify the ligated products. The amplified products were purified and recovered with magnetic beads.
- Circularization of the product. After the PCR product was denatured into single strands, the circularization reaction system was prepared, mixed thoroughly, and incubated at a suitable temperature for a certain time to obtain the single-stranded circular product. After the uncircularized linear DNA molecules were digested, the final library was obtained.
- Library detection. The concentration of the circularized product was measured before loading onto the sequencer.
Data analysis
Data filtering
The following steps were taken to obtain Clean Data from the raw sequencing data:
- Remove reads containing 10% or more of uncertain bases (N bases).
- Remove reads containing sequencing adapter sequences (regions with 15 or more bases aligned to adapter sequences).
- Remove reads containing 50% or more low-quality bases (bases with a quality score of Q<20).
- For samples from a host environment, to reduce interference from host sequences in subsequent analyses, an additional filtering step was added to remove sequences aligned to the host genome.
Software and versions used:
- Data filtering: SOAPnuke (v1.5.0) [1]
- Data alignment: Bowtie2 (v2.2.5) [2]
- Data processing: Samtools (v1.2)
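The read-level filters listed above can be illustrated with a minimal Python sketch. It is not SOAPnuke's implementation: the function names, the Phred+33 quality encoding, and the exact-match adapter check (any 15-base window of the adapter found verbatim in the read) are assumptions made for illustration only, and host-sequence removal is not covered.

```python
from typing import Optional


def phred33_scores(quality: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality]


def passes_filters(
    seq: str,
    quality: str,
    adapter: Optional[str] = None,
    max_n_frac: float = 0.10,      # drop reads with >= 10% N bases
    min_q: int = 20,               # bases with Q < 20 count as low quality
    max_lowq_frac: float = 0.50,   # drop reads with >= 50% low-quality bases
    min_adapter_match: int = 15,   # drop reads sharing >= 15 adapter bases
) -> bool:
    """Return True if the read survives all three filters."""
    if not seq or len(seq) != len(quality):
        return False

    # Filter 1: fraction of uncertain (N) bases.
    if seq.upper().count("N") / len(seq) >= max_n_frac:
        return False

    # Filter 2: adapter contamination (naive exact match, an assumption;
    # real tools usually tolerate mismatches).
    if adapter and len(adapter) >= min_adapter_match:
        for i in range(len(adapter) - min_adapter_match + 1):
            if adapter[i:i + min_adapter_match] in seq:
                return False

    # Filter 3: fraction of low-quality bases.
    scores = phred33_scores(quality)
    low_quality = sum(1 for q in scores if q < min_q)
    if low_quality / len(scores) >= max_lowq_frac:
        return False

    return True
```

For paired-end data, a pipeline would typically discard both mates when either one fails; that policy is likewise an assumption and is not stated in the text above.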
Metagenome assembly
The quality-controlled clean data were assembled de novo with MEGAHIT [3]. Assembled sequences shorter than 200 bp were filtered out.
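The length filter applied after assembly can be sketched as follows. This is a minimal illustration, assuming the contigs are available as FASTA text; the helper names `parse_fasta` and `filter_contigs` are hypothetical and are not part of MEGAHIT.

```python
from typing import Iterable, Iterator


def parse_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(
    records: Iterable[tuple[str, str]], min_len: int = 200
) -> list[tuple[str, str]]:
    """Keep contigs of at least min_len bases (shorter ones are dropped)."""
    return [(h, s) for h, s in records if len(s) >= min_len]
```

In practice the lines would come from an open file handle, for example `filter_contigs(parse_fasta(open(path)))`, where `path` points to the assembler's contig file.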
Gene prediction and abundance information
First, MetaGeneMark [4] was used for de novo prediction of metagenome genes, and then CD-HIT [5] was used to remove redundancy from the predicted genes of each sample. Sequences were clustered by similarity (identity threshold of 95% and coverage threshold of 90%): each sequence was either assigned to an existing cluster or used as the representative sequence of a new cluster, and all sequences were traversed to complete the clustering. Finally, Salmon [6] was used for quantification, and the resulting TPM value was taken as the normalized gene abundance. The TPM quantification formula is shown in the figure below:
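For orientation, TPM (transcripts per million) is conventionally defined as TPM_i = (r_i / l_i) / Σ_j (r_j / l_j) × 10^6, where r_i is the number of reads assigned to gene i and l_i is its length (Salmon uses an effective length). The sketch below computes this standard definition from precomputed counts and lengths; it does not reproduce Salmon's estimation procedure, and the function name `tpm` is hypothetical.

```python
from typing import Sequence


def tpm(read_counts: Sequence[float], lengths: Sequence[float]) -> list[float]:
    """Compute TPM values from per-gene read counts and (effective) lengths."""
    if len(read_counts) != len(lengths):
        raise ValueError("read_counts and lengths must have the same size")
    if any(length <= 0 for length in lengths):
        raise ValueError("gene lengths must be positive")

    # Length-normalized read rate for each gene.
    rates = [count / length for count, length in zip(read_counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)

    # Scale so that the values of one sample sum to one million.
    return [rate / total * 1e6 for rate in rates]
```

Because the values of every sample sum to one million, TPM abundances are comparable across samples with different sequencing depths.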
Gene function prediction
For non-redundant genes, the BLASTP function of Diamond [7] was generally used for functional annotation against the following databases: BacMet, CARD, KEGG, eggNOG, COG, Swiss-Prot, and CAZy.
- BacMet [8], Antibacterial Biocide and Metal Resistance Genes Database; version: 20180311
- CARD [9], The Comprehensive Antibiotic Resistance Database; version: 3.0.9
- KEGG [10], Kyoto Encyclopedia of Genes and Genomes; version: 101
- eggNOG [11], evolutionary genealogy of genes: Non-supervised Orthologous Groups; version: 5.0
- COG [12], Clusters of Orthologous Groups; version: 20201125
- Swiss-Prot [13]; version: release-2021_04
- CAZy [14], Carbohydrate-Active enZYmes Database; version: 20211013
Species annotation and species abundance calculation
Kraken2 was used with default parameters for species annotation, and Bracken was used to estimate species-level abundance in each metagenome sample from the Kraken2 classification results using a Bayesian algorithm. The UHGG human gut genome database [15] is used for human intestinal samples, and the Nt database (202011) is used for all other samples.
Species diversity analysis
Species Alpha diversity, including the Chao1, Shannon, and Simpson indices, was calculated in R. The Bray-Curtis distance [16] and the JSD distance [17] were also calculated to measure the differences between samples or groups, that is, Beta diversity [18], which reflects whether microbial communities differ significantly between samples (or groups).
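The diversity measures named above can be sketched in plain Python. The pipeline itself uses R, and each index has several variants in the literature, so the specific forms chosen here are assumptions: Shannon with the natural logarithm, Simpson as 1 - Σ p², the bias-corrected Chao1 estimator, and the JSD distance as the square root of the base-2 Jensen-Shannon divergence.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[float]) -> float:
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts: Sequence[float]) -> float:
    """Simpson index in the Gini-Simpson form, 1 - sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def jsd_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the Jensen-Shannon divergence (log base 2)."""
    pa = [x / sum(a) for x in a]
    pb = [y / sum(b) for y in b]
    mid = [(x + y) / 2 for x, y in zip(pa, pb)]

    def kl(p: Sequence[float], q: Sequence[float]) -> float:
        # Terms with p_i == 0 contribute nothing to the divergence.
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    divergence = 0.5 * kl(pa, mid) + 0.5 * kl(pb, mid)
    return math.sqrt(max(0.0, divergence))
```

The alpha-diversity functions take the abundance vector of a single sample, while `bray_curtis` and `jsd_distance` take two samples whose vectors list the same species in the same order.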
References
[1] Chen Y, Chen Y, Shi C, Huang Z, Zhang Y, Li S, Li Y, Ye J, Yu C, Li Z, Zhang X, Wang J, Yang H, Fang L, Chen Q. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data[J]. Gigascience. 2018 Jan 1;7(1):1-6. doi: 10.1093/gigascience/gix120. PMID: 29220494; PMCID: PMC5788068.
[2] Ben Langmead, Steven L Salzberg. Fast gapped-read alignment with Bowtie 2[J]. Nat Methods. 2012 Mar 4;9(4):357-9. doi: 10.1038/nmeth.1923.
[3] Dinghua Li, Chi-Man Liu, Ruibang Luo, Kunihiko Sadakane, Tak-Wah Lam. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph[J]. Bioinformatics. 2015 May 15;31(10):1674-6. doi: 10.1093/bioinformatics/btv033. Epub 2015 Jan 20.
[4] Wenhan Zhu, Alexandre Lomsadze, Mark Borodovsky. Ab initio gene identification in metagenomic sequences[J]. Nucleic Acids Res. 2010 Jul;38(12):e132. doi: 10.1093/nar/gkq275. Epub 2010 Apr 19.
[5] Limin Fu, Beifang Niu, Zhengwei Zhu, Sitao Wu, Weizhong Li. CD-HIT: accelerated for clustering the next-generation sequencing data[J]. Bioinformatics. 2012 Dec 1;28(23):3150-2. doi: 10.1093/bioinformatics/bts565. Epub 2012 Oct 11.
[6] Rob Patro, Geet Duggal, Michael I Love, Rafael A Irizarry, Carl Kingsford. Salmon provides fast and bias-aware quantification of transcript expression[J]. Nat Methods. 2017 Apr;14(4):417-419. doi: 10.1038/nmeth.4197. Epub 2017 Mar 6.
[7] Benjamin Buchfink, Chao Xie, Daniel H Huson. Fast and sensitive protein alignment using DIAMOND[J]. Nature Methods. 2015 Jan;12(1):59-60. doi: 10.1038/nmeth.3176.
[8] Chandan Pal, Johan Bengtsson-Palme, Christopher Rensing, Erik Kristiansson, D G Joakim Larsson. BacMet: antibacterial biocide and metal resistance genes database[J]. Nucleic Acids Res. 2014 Jan;42(Database issue):D737-43. doi: 10.1093/nar/gkt1252. Epub 2013 Dec 3.
[9] Baofeng Jia, Amogelang R Raphenya, Brian Alcock, Nicholas Waglechner, Peiyao Guo, Kara K Tsang, Briony A Lago, Biren M Dave, Sheldon Pereira, Arjun N Sharma, Sachin Doshi, Mélanie Courtot, Raymond Lo, Laura E Williams, Jonathan G Frye, Tariq Elsayegh, Daim Sardar, Erin L Westman, Andrew C Pawlowski, Timothy A Johnson, Fiona S L Brinkman, Gerard D Wright, Andrew G McArthur. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database[J]. Nucleic Acids Res. 2017 Jan 4;45(D1):D566-D573. doi: 10.1093/nar/gkw1004. Epub 2016 Oct 26.
[10] M Kanehisa, S Goto. KEGG: Kyoto encyclopedia of genes and genomes[J]. Nucleic Acids Res. 2000 Jan 1;28(1):27-30. doi: 10.1093/nar/28.1.27.
[11] Jaime Huerta-Cepas, Damian Szklarczyk, Davide Heller, Ana Hernández-Plaza, Sofia K Forslund, Helen Cook, Daniel R Mende, Ivica Letunic, Thomas Rattei, Lars J Jensen, Christian von Mering, Peer Bork. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses[J]. Nucleic Acids Res. 2019 Jan 8;47(D1):D309-D314. doi: 10.1093/nar/gky1085.
[12] Michael Y Galperin, Kira S Makarova, Yuri I Wolf, Eugene V Koonin. Expanded microbial genome coverage and improved protein family annotation in the COG database[J]. Nucleic Acids Res. 2015 Jan;43(Database issue):D261-9. doi: 10.1093/nar/gku1223. Epub 2014 Nov 26.
[13] Sylvain Poux, Cecilia N Arighi, Michele Magrane, Alex Bateman, Chih-Hsuan Wei, Zhiyong Lu, Emmanuel Boutet, Hema Bye-A-Jee, Maria Livia Famiglietti, Bernd Roechert, The UniProt Consortium. On expert curation and scalability: UniProtKB/Swiss-Prot as a case study[J]. Bioinformatics. 2017 Nov 1;33(21):3454-3460. doi: 10.1093/bioinformatics/btx439.
[14] Vincent Lombard, Hemalatha Golaconda Ramulu, Elodie Drula, Pedro M Coutinho, Bernard Henrissat. The carbohydrate-active enzymes database (CAZy) in 2013[J]. Nucleic Acids Res. 2014 Jan;42(Database issue):D490-5. doi: 10.1093/nar/gkt1178. Epub 2013 Nov 21.
[15] Alexandre Almeida, Stephen Nayfach, Miguel Boland, Francesco Strozzi, Martin Beracochea, Zhou Jason Shi, Katherine S Pollard, Ekaterina Sakharova, Donovan H Parks, Philip Hugenholtz, Nicola Segata, Nikos C Kyrpides, Robert D Finn. A unified catalog of 204,938 reference genomes from the human gut microbiome[J]. Nat Biotechnol. 2021 Jan;39(1):105-114. doi: 10.1038/s41587-020-0603-3. Epub 2020 Jul 20.
[16] Bray JR, Curtis JT. An ordination of the upland forest communities of southern Wisconsin[J]. Ecol Monogr. 1957;27:325-349.
[17] Majtey A P, Lamberti P W, Prato D P. Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states[J]. Physical Review A, 2005, 72(5): 052310.
[18] Robert H. Whittaker. Vegetation of the Siskiyou Mountains, Oregon and California[J]. Ecological Monographs. 1960 Jul 1;30:279-338. doi: 10.2307/1943563.