Analysis Pipeline
Data Filtering
The sequencing data were filtered with SOAPnuke [1] by (1) removing reads containing sequencing adapters; (2) removing reads in which more than 20% of bases are low quality (base quality less than or equal to 15); and (3) removing reads in which more than 5% of bases are unknown ('N'). The resulting clean reads were stored in FASTQ format.
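The three criteria above can be sketched as a per-read check. This is only an illustration of the stated thresholds, not SOAPnuke's implementation: the function names, the Phred+33 encoding, and the exact-substring adapter match are assumptions made for the example.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def passes_filter(seq, quals, adapter=None,
                  min_qual=15, max_low_ratio=0.20, max_n_ratio=0.05):
    """Return True if a read survives the three filtering criteria.

    seq     -- read bases as a string
    quals   -- list of integer base qualities, same length as seq
    adapter -- optional adapter sequence (hypothetical exact-match check;
               real tools typically use tolerant matching)
    """
    length = len(seq)
    if length == 0:
        return False
    # (1) Discard reads containing the adapter sequence.
    if adapter and adapter in seq:
        return False
    # (2) Discard reads where more than 20% of bases have quality <= 15.
    low_quality = sum(1 for q in quals if q <= min_qual)
    if low_quality / length > max_low_ratio:
        return False
    # (3) Discard reads where more than 5% of bases are 'N'.
    if seq.upper().count("N") / length > max_n_ratio:
        return False
    return True
```

Under these assumptions, a 20-base read with four low-quality bases (exactly 20%) is kept, while one with five is discarded, because the criteria remove reads only when the ratio exceeds the threshold.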
Structural Variation Detection
The clean reads were mapped to the reference genome using HISAT2 [2]. EricScript (v0.5.5-5) [3] and rMATS (v4.1.1) [4] were then used to detect fusion genes and differentially spliced genes (DSGs), respectively.
RNA Identification
Bowtie2 [5] was used to align the clean reads to the gene set.
Gene Quantification and Differential Expression Analysis
Gene expression levels were calculated with RSEM (v1.2.28) [6], yielding read counts, FPKM, and TPM. Differentially expressed gene (DEG) analysis was performed using DESeq2 [7] (or DEGseq [8] or PoissonDis [9]) with a Q value ≤ 0.05. The DEG heatmap was drawn with pheatmap [10] from the DEG analysis results.
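For readers unfamiliar with the two normalized units, the sketch below shows how FPKM and TPM relate to raw counts. It is a simplified illustration, not RSEM's procedure: RSEM works with expected counts and effective transcript lengths, whereas this example assumes plain counts and plain gene lengths in bases, and the function names are hypothetical.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of gene per million mapped fragments.

    counts  -- dict of gene -> fragment count
    lengths -- dict of gene -> gene length in bases
    """
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return {gene: c * 1e9 / (lengths[gene] * total) for gene, c in counts.items()}


def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = {gene: c / lengths[gene] for gene, c in counts.items()}
    denom = sum(rates.values())
    if denom == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: r * 1e6 / denom for gene, r in rates.items()}
```

TPM values always sum to one million within a sample, which makes them easier to compare across samples than FPKM.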
Gene Annotation
To gain insight into phenotypic changes, GO (http://www.geneontology.org/) and KEGG (https://www.kegg.jp/) enrichment analysis of the annotated differentially expressed genes was performed with phyper (https://en.wikipedia.org/wiki/Hypergeometric_distribution), which is based on the hypergeometric test. The significance levels of terms and pathways were corrected to Q values, with a stringent threshold (Q value ≤ 0.05) [11].
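The hypergeometric enrichment test can be illustrated with a small standard-library sketch. It computes the upper-tail probability of seeing at least `k` DEGs annotated to a term, which corresponds to R's `phyper(k - 1, M, N - M, n, lower.tail = FALSE)`. The function and argument names are chosen for this example, and the exact background and gene sets used in the pipeline are not specified in this section.

```python
from math import comb


def hypergeom_enrichment_p(N, M, n, k):
    """Upper-tail hypergeometric P value, P(X >= k).

    N -- number of annotated background genes
    M -- number of background genes annotated to the term or pathway
    n -- number of DEGs (genes drawn)
    k -- number of DEGs annotated to the term or pathway
    """
    upper = min(M, n)
    # Sum the probability mass for every outcome at least as extreme as k.
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

For example, with 10 background genes, 4 of them in a term, and 3 DEGs of which 2 fall in the term, `hypergeom_enrichment_p(10, 4, 3, 2)` gives 40/120, or one third. The resulting P values would then be passed to a multiple-testing correction to obtain Q values.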
Software
Software | Parameter | References | Source |
---|---|---|---|
SOAPnuke (v1.5.6) | -l 15 -q 0.2 -n 0.05 | [1] | https://github.com/BGI-flexlab/SOAPnuke |
HISAT2 (v2.2.1) | --sensitive --no-discordant --no-mixed -I 1 -X 1000 -p 8 | [2] | http://www.ccb.jhu.edu/software/hisat |
EricScript (v0.5.5-5) | Default | [3] | http://ericscript.sourceforge.net/ |
rMATS (v4.1.1) | Default | [4] | http://rnaseq-mats.sourceforge.net |
Bowtie2 (v2.4.4) | -q --sensitive --dpad 0 --gbar 99999999 --mp 1,1 --np 1 --score-min L,0,-0.1 -I 1 -X 1000 --no-mixed --no-discordant -p 1 -k 200 | [5] | http://bowtie-bio.sourceforge.net/index.shtml |
RSEM (v1.2.28) | --forward-prob 0 | [6] | http://deweylab.biostat.wisc.edu/rsem |
DESeq2 | Default | [7] | http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html |
DEGseq | Default | [8] | http://bioinfo.au.tsinghua.edu.cn/software/degseq/ |
pheatmap | Default | [10] | https://cran.r-project.org/web/packages/pheatmap/ |
qvalue | Default | [11] | https://bioconductor.org/packages/release/bioc/html/qvalue.html |
References
[1] Li R, Li Y, Kristiansen K, Wang J. (2008). SOAP: short oligonucleotide alignment program. Bioinformatics. 24(5):713-4.
[2] Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357-360 (2015).
[3] Matteo Benelli, Chiara Pescucci, Giuseppina Marseglia, Marco Severgnini, Francesca Torricelli, Alberto Magi, Discovering chimeric transcripts in paired-end RNA-seq data by using EricScript, Bioinformatics, Volume 28, Issue 24, December 2012, Pages 3232–3239.
[4] Shen, S. et al. rMATS: Robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc. Natl Acad. Sci. USA 111, E5593-E5601 (2014).
[5] Langmead, B. et al. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359 (2012).
[6] Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
[7] Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
[8] Wang L. et al. (2010). DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics, Jan 1;26(1):136-8.
[9] Audic, S. & Claverie, J. M. The significance of digital gene expression profiles. Genome Res. 7, 986-995 (1997).
[10] Raivo Kolde. Package ‘pheatmap’. 2019-01-04 13:50:12 UTC.
[11] Storey JD, Bass AJ, Dabney A, Robinson D (2021). qvalue: Q-value estimation for false discovery rate control. R package version 2.26.0, http://github.com/jdstorey/qvalue