Species Alpha Diversity
Description of species diversity within a community
Alpha diversity [1] (also written α diversity or α-diversity) concerns the species composition within a habitat (within-habitat diversity) or within a sample (within-sample diversity). It is one of the most important topics in microbial ecology analysis. Alpha diversity analysis calculates a series of diversity indices that reflect how many species a microbial community contains and how evenly they are distributed. Significance tests can then determine whether differences between samples from different habitats are significant.
- Community richness describes the number of species in an environmental sample.
- Community evenness describes how uniformly individuals are distributed among the species of the community, i.e., how similar their relative abundances are.
- Community diversity takes both the richness and the evenness of the community into account.
Alpha diversity can be calculated at either the gene level or the species level.
Richness
Assume that three communities, A, B and C, contain some or all of three species (Species1, Species2 and Species3), distributed as follows:
| | Species1 | Species2 | Species3 |
|---|---|---|---|
| Community A | Yes | - | Yes |
| Community B | Yes | Yes | - |
| Community C | Yes | Yes | Yes |
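“-” means the species is absent from the community. A and B therefore each contain two species while C contains three, so the richness of C is higher than that of A and B, which have equal richness. Indices that describe community richness include the Chao1 index [2] and the ACE index [3]; a larger index indicates higher richness.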
The Chao1 index is suitable for abundance data such as metagenomic gene abundance and species abundance. However, Chao1 is sensitive to low-abundance data: low-abundance species have a strong influence on the index. A larger Chao1 index indicates a larger estimated total number of species. It is calculated as:
$$S_{Chao1} = S_{obs} + \frac{F_1^2}{2F_2}$$
where
- $S_{obs}$: the number of species actually observed
- $F_1$: the number of species observed exactly once (singletons)
- $F_2$: the number of species observed exactly twice (doubletons)
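The Chao1 formula can be sketched in a few lines of Python, assuming the input is a list of raw integer counts per species from a single sample. The function name `chao1` is hypothetical, and the fallback used when there are no doubletons ($F_2 = 0$, where the classic formula divides by zero) is our own assumption; it is not necessarily what Dr. Tom does.

```python
from collections import Counter


def chao1(abundances):
    """Chao1 richness estimate from per-species counts (zero counts are ignored)."""
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)            # S_obs: number of observed species
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]      # F_1 singletons, F_2 doubletons
    if f2 == 0:
        # Assumption: with no doubletons, fall back to the bias-corrected form
        # S_obs + F_1 (F_1 - 1) / (2 (F_2 + 1)) to avoid dividing by zero.
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1 * f1 / (2.0 * f2)


# Hypothetical sample with 7 observed species: 3 singletons and 2 doubletons
print(chao1([1, 1, 1, 2, 2, 5, 10]))  # 7 + 3**2 / (2 * 2) = 9.25
```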
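The ACE (abundance-based coverage estimator) index is another estimator of species richness. The formula is:
$$S_{ACE} = S_{abund} + \frac{S_{rare}}{C_{ACE}} + \frac{F_1}{C_{ACE}}\gamma_{ACE}^2$$
with
$$C_{ACE} = 1 - \frac{F_1}{N_{rare}}, \qquad N_{rare} = \sum_{i=1}^{k} i F_i$$
$$\gamma_{ACE}^2 = \max\left[\frac{S_{rare}}{C_{ACE}} \cdot \frac{\sum_{i=1}^{k} i(i-1) F_i}{N_{rare}(N_{rare}-1)} - 1,\ 0\right]$$
in which
- $S_{abund}$: the number of high-abundance species (abundance above the low-abundance threshold $k$, commonly 10)
- $S_{rare}$: the number of low-abundance species (abundance less than or equal to $k$)
- $i$: an abundance level, i.e., the number of times a species is observed
- $F_1$: the number of species observed exactly once (singletons)
- $F_i$: the number of species observed exactly $i$ times
- $N_{rare}$: the total number of individuals belonging to low-abundance species
- $C_{ACE}$: the estimated sample coverage of low-abundance species
- $\gamma_{ACE}^2$: the estimated coefficient of variation of the low-abundance species' abundances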
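The ACE formula can be sketched as follows, assuming raw integer counts per species and a low-abundance threshold `rare_threshold` that defaults to 10 (a common choice, but an assumption here; Dr. Tom's threshold is not stated). The function name `ace` is hypothetical.

```python
from collections import Counter


def ace(abundances, rare_threshold=10):
    """ACE richness estimate from per-species counts (zero counts are ignored)."""
    counts = [a for a in abundances if a > 0]
    s_abund = sum(1 for a in counts if a > rare_threshold)   # S_abund
    rare = [a for a in counts if a <= rare_threshold]
    s_rare = len(rare)                                       # S_rare
    if s_rare == 0:
        return float(s_abund)
    freq = Counter(rare)                     # freq[i] = F_i for i = 1..k
    f1 = freq[1]
    n_rare = sum(i * f for i, f in freq.items())             # N_rare
    if f1 == n_rare:
        raise ValueError("ACE is undefined when every rare species is a singleton")
    c_ace = 1 - f1 / n_rare                                  # C_ACE
    sum_term = sum(i * (i - 1) * f for i, f in freq.items())
    gamma2 = max(s_rare / c_ace * sum_term / (n_rare * (n_rare - 1)) - 1, 0.0)
    return s_abund + s_rare / c_ace + f1 / c_ace * gamma2


# Hypothetical sample: five rare species and one abundant species (count 20)
print(round(ace([1, 1, 2, 3, 5, 20]), 4))  # 7.6545
```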
Evenness
Assume that communities C and D both contain two species, A and B, with the following abundances:
| | SpeciesA | SpeciesB |
|---|---|---|
| Community C | 5 | 5 |
| Community D | 2 | 8 |
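Both communities therefore have the same richness (two species), but the individuals in C are spread more evenly across the species than those in D, so C has higher evenness. Indices that describe evenness include Pielou's evenness [4] and Simpson's evenness [5].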
Pielou's evenness, also known as Shannon's evenness, is the ratio of the community's actual Shannon index to the maximum Shannon index attainable in a community with the same species richness. If all species have the same relative abundance, its value is 1.
$$J = \frac{H}{H_{max}}$$
where
- $H$: the Shannon index of the community
- $H_{max}$: the maximum Shannon index for the same richness, reached when all species have the same abundance
- $S$: the species richness of the community
- normally $H_{max} = \ln S$ (natural logarithm), and the index is then called Pielou_e
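Pielou's evenness can be checked against communities C and D with the sketch below. It assumes the natural logarithm (so $H_{max} = \ln S$); the function name `pielou_e` is hypothetical.

```python
import math


def pielou_e(abundances):
    """Pielou's evenness J = H / ln(S), using natural-log Shannon entropy."""
    counts = [a for a in abundances if a > 0]
    s = len(counts)
    if s < 2:
        raise ValueError("Pielou's evenness is undefined for fewer than two species")
    total = sum(counts)
    h = -sum((a / total) * math.log(a / total) for a in counts)  # Shannon index H
    return h / math.log(s)                                      # H_max = ln S


# Communities C and D from the evenness example
print(round(pielou_e([5, 5]), 4))  # 1.0
print(round(pielou_e([2, 8]), 4))  # 0.7219
```

Because $J$ is a ratio of two entropies computed with the same logarithm, the choice of base cancels out and does not change the result.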
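Simpson's evenness, also called equitability, is the ratio of the Simpson effective number of species (i.e., Simpson diversity expressed as a number of species) to the species richness.
$$E_{Simpson} = \frac{D_{eff}}{S}$$
where
- $D_{eff}$: the Simpson effective number of species, $D_{eff} = 1 / \sum_{i=1}^{S} p_i^2$, where $p_i$ is the relative abundance of the i-th species
- $S$: the species richness of the community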
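A minimal sketch of Simpson's evenness, assuming the effective number of species is the inverse of $\sum p_i^2$ as defined above. The function name `simpson_e` is hypothetical; communities C and D from the evenness example serve as input.

```python
def simpson_e(abundances):
    """Simpson's evenness: (1 / sum of p_i^2) / S."""
    counts = [a for a in abundances if a > 0]
    total = sum(counts)
    d = sum((a / total) ** 2 for a in counts)   # sum of squared relative abundances
    effective_species = 1.0 / d                  # Simpson effective number of species
    return effective_species / len(counts)       # divide by richness S


print(round(simpson_e([5, 5]), 4))  # 1.0
print(round(simpson_e([2, 8]), 4))  # 0.7353
```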
Diversity
The most commonly used diversity indices in metagenomic analysis are the Shannon index [6] and the Simpson index [5], both of which take the richness and the evenness of the community into account.
The Shannon index, also known as the Shannon entropy or the Shannon-Wiener index, takes both the richness and the evenness of the community into account. A larger Shannon index indicates higher richness, higher evenness, or both.
$$H = -\sum_{i=1}^{S_{obs}} \frac{n_i}{N} \ln \frac{n_i}{N}$$
where
- $S_{obs}$: the number of OTUs actually observed
- $n_i$: the number of sequences belonging to the i-th OTU
- $N$: the total number of sequences
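The Shannon formula translates directly into Python. This sketch assumes a list of sequence counts per OTU and the natural logarithm; the function name `shannon` and the example counts are hypothetical.

```python
import math


def shannon(otu_counts):
    """Shannon index H = -sum (n_i / N) * ln(n_i / N) over observed OTUs."""
    n_total = sum(otu_counts)                                 # N
    return -sum((n / n_total) * math.log(n / n_total)         # n_i / N per OTU
                for n in otu_counts if n > 0)


print(round(shannon([5, 5]), 4))        # 0.6931 (= ln 2)
print(round(shannon([2, 8]), 4))        # 0.5004
print(round(shannon([1, 1, 1, 1]), 4))  # 1.3863 (= ln 4)
```

With equal counts, H rises with the number of OTUs (ln 2 vs ln 4); with the same number of OTUs, H drops as the distribution becomes uneven (0.6931 vs 0.5004).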
The Simpson index is another index used to estimate the microbial diversity of a sample. It describes the probability that two individuals drawn successively at random (with replacement) from the community belong to the same species. Common and dominant species have a strong influence on the index, while low-abundance species have little impact on it. It is calculated as follows:
$$D = \sum_{i=1}^{S} p_i^2$$
where
- $p_i$: the relative abundance of the i-th species
- $S$: the number of species
The value of D calculated by this formula lies in [0, 1], and a larger value indicates lower diversity, which is counterintuitive. Therefore, 1 − D is often used as the Simpson index instead.
Metagenomic analysis in Dr. Tom uses the following Simpson formula:
$$\text{Simpson} = 1 - \sum_{i=1}^{S} p_i^2$$
where
- $p_i$: the relative abundance of the i-th species
- $S$: the number of species
Therefore, with this formula, a larger Simpson index indicates higher species diversity.
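The sketch below contrasts Simpson's original D with the 1 − D form, using communities C and D from the evenness example. The function names `simpson_d` and `simpson_index` are hypothetical; the input is assumed to be counts (or relative abundances) per species.

```python
def simpson_d(abundances):
    """Simpson's original D = sum of p_i^2 (larger D means lower diversity)."""
    total = sum(abundances)
    return sum((a / total) ** 2 for a in abundances)


def simpson_index(abundances):
    """Simpson index as 1 - D: larger values mean higher diversity."""
    return 1.0 - simpson_d(abundances)


print(round(simpson_index([5, 5]), 4))  # 0.5
print(round(simpson_index([2, 8]), 4))  # 0.32
```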
Hypothesis Test
Diversity indices describe the diversity of the microbial community within a sample, but determining whether diversity differs between samples requires a hypothesis test (often called a significance test). Commonly used parametric methods are the t-test and analysis of variance (ANOVA); commonly used nonparametric methods are the Wilcoxon and Kruskal-Wallis tests.
- Parametric test: assumes that the data follow a specific distribution (usually a normal distribution) and tests the parameters of that distribution, such as the mean and variance. When two groups are compared, the t-test is commonly used; when more than two groups are compared, ANOVA is commonly used.
- Nonparametric test: used when the distribution of the data cannot be determined. The observations are first ranked, and the test is performed on the ranks. This approach makes no assumptions about the data distribution, but it is less sensitive than a parametric test when the parametric assumptions hold. When two groups are compared, the Wilcoxon test is commonly used; when more than two groups are compared, the Kruskal-Wallis test is commonly used.
The null hypothesis of the significance test is that the diversity index does not differ between the compared samples. Conventionally, when the test gives p < 0.05, the null hypothesis is rejected and the difference in α diversity is considered significant.
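The rank-based procedure and the p < 0.05 decision can be sketched for more than two groups with a minimal pure-Python Kruskal-Wallis test. This illustration rests on our own assumptions: average ranks for ties, the usual tie correction, and a chi-square approximation with k − 1 degrees of freedom for the p-value. It is not necessarily how Dr. Tom implements the test, and in practice a library routine such as `scipy.stats.kruskal` would normally be used. The names `chi2_sf`, `average_ranks` and `kruskal_wallis`, and the input values, are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{j=0}^{df/2 - 1} y^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1 (erfc) and add the half-integer gamma terms
    total = math.erfc(math.sqrt(y))
    for j in range(df // 2):
        total += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return total


def average_ranks(values):
    """1-based ranks of values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Return (H, p) for two or more groups of alpha diversity values."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum of R_j^2 / n_j over groups, where R_j is the rank sum of group j
    acc, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        acc += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * acc - 3.0 * (n + 1)
    # Tie correction (undefined if every value is identical)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1.0 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical Shannon indices for three groups of samples
h, p = kruskal_wallis([1.2, 1.5, 1.9], [2.4, 2.6, 2.8], [3.1, 3.3, 3.6])
print(f"H = {h:.3f}, p = {p:.4f}, significant: {p < 0.05}")
# H = 7.200, p = 0.0273, significant: True
```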
Data Visualization
The Dr. Tom system uses boxplots to visualize the results of Alpha Diversity Analysis.
FAQ
Q: Which statistical tests are used to compare alpha diversity?
A: The test depends on the selected method (parametric or nonparametric) and the number of groups compared:
| | Parametric test | Nonparametric test |
|---|---|---|
| Groups = 2 | t-test | Wilcoxon |
| Groups > 2 | Analysis of variance (ANOVA) | Kruskal-Wallis |
References
[1] Whittaker, R. H. (1960). Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs, 30(3), 279–338. https://doi.org/10.2307/1943563
[2] Colwell, R. K., Mao, C. X., & Chang, J. (2004). Interpolating, Extrapolating, and Comparing Incidence-Based Species Accumulation Curves. Ecology, 85(10), 2717–2727. https://doi.org/10.1890/03-0557
[3] Chao, A., & Yang, M. C. K. (1993). Stopping Rules and Estimation for Recapture Debugging with Unequal Failure Rates. Biometrika, 80(1), 193–201. https://doi.org/10.1093/biomet/80.1.193
[4] Pielou, E. C. (1966). The Measurement of Diversity in Different Types of Biological Collections. Journal of Theoretical Biology, 13, 131–144. https://doi.org/10.1016/0022-5193(66)90013-0
[5] Simpson, E. H. (1949). Measurement of Diversity. Nature, 163(4148), 688–688. https://doi.org/10.1038/163688a0
[6] Shannon, C. E. (2001). A Mathematical Theory of Communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1), 3–55.