Species Alpha diversity

Dr.Tom


Description of species diversity within a community

Alpha diversity [1] (α diversity, α-diversity) describes the species composition within a habitat (within-habitat diversity) or within a sample (within-sample diversity), and it is one of the most important topics in microbial ecology analysis. Alpha diversity analysis calculates a series of diversity indices that reflect how many species a microbial community contains and how evenly they are distributed. Significance tests can then be used to determine whether the differences between samples from different habitats are significant.

  • Community richness describes the number of species in an environmental sample.
  • Community evenness describes how uniformly the species of a microbial community are represented, i.e. how similar their relative abundances are.
  • Community diversity takes both the richness and the evenness of the community into account.

Alpha diversity can be calculated at either the gene level or the species level.

Richness

Assume that three communities, A, B and C, contain some or all of three species (Species1, Species2 and Species3), distributed as follows:

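|             | Species1 | Species2 | Species3 |
|-------------|----------|----------|----------|
| Community A | Yes      | -        | Yes      |
| Community B | Yes      | Yes      | -        |
| Community C | Yes      | Yes      | Yes      |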

“-” means that the species is absent from the community, so $\text{richness}_C > \text{richness}_B = \text{richness}_A$. Indices describing community richness include the Chao1 index [2] and the ACE index [3]; a larger index indicates higher richness.
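Observed richness for the presence/absence example above is simply a count of the species marked as present. A minimal Python sketch, with the table encoded as a dictionary (the data structure and the function name `observed_richness` are illustrative choices, not part of Dr.Tom):

```python
# Presence/absence table from the richness example (True = species present).
communities = {
    "A": {"Species1": True, "Species2": False, "Species3": True},
    "B": {"Species1": True, "Species2": True, "Species3": False},
    "C": {"Species1": True, "Species2": True, "Species3": True},
}


def observed_richness(presence):
    """Count how many species are present in one community."""
    return sum(1 for present in presence.values() if present)


richness = {name: observed_richness(table) for name, table in communities.items()}
print(richness)  # {'A': 2, 'B': 2, 'C': 3}
```

Estimators such as Chao1 and ACE go further than this count: they use the abundances of rare species to estimate how many species were missed.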

The Chao1 index is suitable for abundance data such as metagenomic gene abundance and species abundance. The Chao1 estimator is sensitive to low-abundance data: rarely observed species (singletons and doubletons) have a strong influence on its value. A larger Chao1 index indicates a larger estimated total number of species.

$$S_{chao1} = S_{obs} + \frac{n_1(n_1-1)}{2(n_2+1)}$$

$S_{obs}$: the number of species actually observed

$n_1$: the number of species observed exactly once (singletons)

$n_2$: the number of species observed exactly twice (doubletons)
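A minimal Python sketch of the bias-corrected Chao1 formula above, assuming the input is a list of integer abundances, one per species (the function name `chao1` and the example counts are hypothetical):

```python
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1: S_obs + n1 * (n1 - 1) / (2 * (n2 + 1)).

    counts: integer abundance of each species (zeros are ignored).
    """
    observed = [c for c in counts if c > 0]
    freq = Counter(observed)   # freq[k] = number of species seen exactly k times
    s_obs = len(observed)      # S_obs
    n1 = freq[1]               # singletons
    n2 = freq[2]               # doubletons
    return s_obs + n1 * (n1 - 1) / (2 * (n2 + 1))


# Hypothetical abundances for six observed species.
print(chao1([1, 1, 1, 2, 5, 10]))  # 7.5
```

Here $S_{obs} = 6$, $n_1 = 3$ and $n_2 = 1$, so the estimate is $6 + \frac{3 \cdot 2}{2 \cdot 2} = 7.5$. Without singletons the correction term is zero and Chao1 equals the observed richness.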

The ACE (abundance-based coverage estimator) index is another richness estimator. Its formula is:

$$S_{ace} = S_{abund} + \frac{S_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}\gamma^2_{ace}$$

where

$$\gamma^2_{ace} = \max\left[\frac{S_{rare}}{C_{ace}} \cdot \frac{\sum_{i=1}^{10} i(i-1)F_i}{N_{rare}(N_{rare}-1)} - 1,\ 0\right]$$

$$N_{rare} = \sum_{i=1}^{10} i F_i$$

$$C_{ace} = 1 - \frac{F_1}{N_{rare}}$$

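$S_{abund}$: the number of abundant species (abundance above the low-abundance threshold, which is 10 in the formulas above)

$S_{rare}$: the number of rare species (abundance less than or equal to the low-abundance threshold)

$i$: an abundance value, i.e. the number of times a species is observed (1 to 10 for rare species)

$F_1$: the number of species observed exactly once (singletons)

$F_i$: the number of species observed exactly $i$ times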
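The ACE formulas above can be sketched in Python as follows. This assumes integer abundances per species, takes $F_i$ as the number of species observed exactly $i$ times (the standard ACE definition), and uses the rare-species threshold of 10 from the formulas. The function name `ace`, its `rare_threshold` parameter and the example counts are hypothetical, and other tools may handle edge cases differently (for example, all rare species being singletons, which makes $C_{ace} = 0$).

```python
from collections import Counter


def ace(counts, rare_threshold=10):
    """ACE richness estimate from per-species integer abundances (zeros ignored)."""
    observed = [c for c in counts if c > 0]
    rare = [c for c in observed if c <= rare_threshold]
    s_abund = len(observed) - len(rare)      # S_abund
    s_rare = len(rare)                       # S_rare
    if s_rare == 0:
        return float(s_abund)                # no rare species: nothing to correct

    freq = Counter(rare)                     # freq[i] = F_i for i <= rare_threshold
    f1 = freq[1]                             # F_1, rare singletons
    n_rare = sum(i * f for i, f in freq.items())  # N_rare = sum of i * F_i
    if f1 == n_rare:
        # Every rare individual is a singleton, so C_ace = 0 and ACE is undefined.
        raise ValueError("ACE is undefined when all rare species are singletons")

    c_ace = 1 - f1 / n_rare                  # C_ace, estimated sample coverage
    sum_term = sum(i * (i - 1) * f for i, f in freq.items())
    gamma_sq = max(s_rare / c_ace * sum_term / (n_rare * (n_rare - 1)) - 1, 0.0)
    return s_abund + s_rare / c_ace + f1 / c_ace * gamma_sq


# Hypothetical abundances: five rare species (<= 10) and two abundant ones.
print(round(ace([1, 1, 2, 3, 5, 12, 20]), 3))  # 8.655
```

In this example $N_{rare} = 12$ and $C_{ace} = 1 - 2/12$, so the rare species are scaled up from 5 to 6 and the $\gamma^2_{ace}$ term adds a little more for the uneven rare abundances.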

Evenness

Assume that communities C and D each contain two species, A and B, with the following abundances:

|             | Species A | Species B |
|-------------|-----------|-----------|
| Community C | 5         | 5         |
| Community D | 2         | 8         |

So $\text{richness}_C = \text{richness}_D$, but $\text{evenness}_C > \text{evenness}_D$. Indices describing evenness include Pielou's evenness [4] and Simpson's evenness [5].

Pielou's evenness, also known as Shannon's evenness, is the ratio of the observed Shannon index of a community to the maximum Shannon index attainable in a community with the same species richness. If all species have the same relative abundance, its value is 1.

$$J = \frac{H}{H_{max}} = \frac{H}{\log_x S}$$

$H$: Shannon index

$H_{max}$: the maximum Shannon index for the given richness, reached when all species have the same abundance

$S$: species richness of the community

$x$: the base of the logarithm; usually $x = e$, in which case the index is called Pielou_e
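Applied to communities C and D from the table above, Pielou's evenness can be sketched in Python as below. The function name `pielou_evenness` is hypothetical; the logarithm base defaults to $e$ (Pielou_e), and because $H$ and $H_{max}$ use the same base, the ratio does not depend on the base chosen.

```python
import math


def pielou_evenness(counts, base=math.e):
    """Pielou's evenness J = H / log_base(S) for per-species abundances."""
    observed = [c for c in counts if c > 0]
    s = len(observed)                     # species richness S
    if s < 2:
        raise ValueError("Pielou's evenness needs at least two observed species")
    total = sum(observed)
    # Shannon index H computed with the same logarithm base as H_max.
    h = -sum((c / total) * math.log(c / total, base) for c in observed)
    return h / math.log(s, base)


print(round(pielou_evenness([5, 5]), 3))  # community C: 1.0
print(round(pielou_evenness([2, 8]), 3))  # community D: 0.722
```

Both communities have two species, but only C reaches the maximum evenness of 1.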

Simpson's evenness, also called equitability, is the ratio of the Simpson effective number of species (the inverse Simpson diversity, $1/\sum p_i^2$) to species richness.

$$\text{equitability} = \frac{D_{ens}}{S}$$

$D_{ens}$: Simpson effective number of species

$S$: species richness of the community
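A minimal Python sketch of Simpson's evenness for communities C and D, assuming $D_{ens} = 1/\sum p_i^2$ (the function name `simpson_evenness` is hypothetical):

```python
def simpson_evenness(counts):
    """Simpson's evenness (equitability) = D_ens / S, with D_ens = 1 / sum(p_i^2)."""
    observed = [c for c in counts if c > 0]
    total = sum(observed)
    sum_p2 = sum((c / total) ** 2 for c in observed)
    d_ens = 1 / sum_p2            # Simpson effective number of species
    return d_ens / len(observed)  # divide by richness S


print(round(simpson_evenness([5, 5]), 3))  # community C: 1.0
print(round(simpson_evenness([2, 8]), 3))  # community D: 0.735
```

For community D, $\sum p_i^2 = 0.2^2 + 0.8^2 = 0.68$, so the effective number of species is about 1.47 out of 2 observed species.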

Diversity

The most commonly used diversity indices in metagenomic analysis are the Shannon index [6] and the Simpson index [5], both of which take the richness and the evenness of community species into account.

The Shannon index, also known as the Shannon entropy or the Shannon-Wiener index, accounts for both the richness and the evenness of the community. A larger Shannon index indicates that the sample contains more species, that they are more evenly distributed, or both.

$$H_{shannon} = -\sum_{i=1}^{S_{obs}} \frac{n_i}{N} \ln\frac{n_i}{N}$$

$S_{obs}$: the number of OTUs actually observed

$n_i$: the number of sequences belonging to the i-th OTU

$N$: the total number of sequences
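The Shannon formula above translates directly into Python. This is a minimal sketch using the natural logarithm, with hypothetical OTU counts and a hypothetical function name `shannon_index`:

```python
import math


def shannon_index(otu_counts):
    """H = -sum((n_i / N) * ln(n_i / N)) over observed OTUs."""
    observed = [n for n in otu_counts if n > 0]   # the S_obs observed OTUs
    total = sum(observed)                          # N, total number of sequences
    return -sum((n / total) * math.log(n / total) for n in observed)


# Hypothetical OTU tables with the same number of OTUs (4) and sequences (40).
even = [10, 10, 10, 10]
uneven = [37, 1, 1, 1]
print(round(shannon_index(even), 3))    # 1.386, i.e. ln(4)
print(round(shannon_index(uneven), 3))
```

Both tables have the same richness, so the lower value for `uneven` reflects its lower evenness; with all OTUs equally abundant, $H$ equals $\ln S_{obs}$.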

The Simpson index is another index used to estimate the microbial diversity of a sample. It describes the probability that two individuals drawn at random (with replacement) from a community belong to the same species. Common and dominant species have a large impact on the index, whereas low-abundance species have little influence on it. It is calculated as follows:

$$D = \sum_{i} p_i^2$$

$p_i$: the relative abundance of the i-th species

Calculated with this formula, the Simpson index ranges from 0 to 1, and a larger value indicates lower diversity. Because this is counterintuitive, $1 - D$ is often used as the Simpson index instead.

Metagenomic analysis in Dr.Tom uses the following Simpson formula:

$$S = 1 - \sum_{i} p_i^2$$

$p_i$: the relative abundance of the i-th species

Therefore, with this formula, a larger Simpson index indicates higher species diversity.
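To contrast the two forms, the Python sketch below computes both $D = \sum p_i^2$ and $1 - D$ for communities C and D from the evenness example. The function names `simpson_dominance` and `simpson_index` are hypothetical labels for the two quantities.

```python
def simpson_dominance(counts):
    """D = sum(p_i^2): probability that two random draws are the same species."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts if c > 0)


def simpson_index(counts):
    """1 - D: larger values mean higher diversity (the form used by Dr.Tom above)."""
    return 1 - simpson_dominance(counts)


for name, counts in {"C": [5, 5], "D": [2, 8]}.items():
    print(name, round(simpson_dominance(counts), 2), round(simpson_index(counts), 2))
# C 0.5 0.5
# D 0.68 0.32
```

The more even community C has the lower $D$ and the higher $1 - D$.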

Hypothesis Test

A diversity index describes the diversity of the microbial community within a single sample; determining whether diversity differs between samples requires a hypothesis test (often called a significance test). Commonly used parametric tests are the t-test and analysis of variance (ANOVA), and commonly used nonparametric tests are the Wilcoxon and Kruskal-Wallis tests.

  • Parametric test: assumes that the data follow a specific distribution (usually a normal distribution) and tests hypotheses about its parameters, such as the mean and variance. The t-test is commonly used when two groups are compared, and analysis of variance when more than two groups are compared.
  • Nonparametric test: used when the distribution of the data cannot be determined. All observations are first ranked, and the test is then performed on the ranks. This approach makes no assumptions about the data distribution, but when the assumptions of a parametric test do hold, it is less sensitive (less powerful) than that test. The Wilcoxon test is commonly used when two groups are compared, and the Kruskal-Wallis test when more than two groups are compared.
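The rank-based idea behind the nonparametric tests can be illustrated with the Kruskal-Wallis H statistic: pool all values, replace them by their ranks (tied values get the average rank), and compare the rank sums of the groups. The Python sketch below computes H without the tie correction; the function names and the example Shannon values are hypothetical. Turning H into a p-value requires the chi-square distribution with $k - 1$ degrees of freedom for $k$ groups, which in practice is usually left to a statistics library (for example `scipy.stats.kruskal`, which also applies a tie correction).

```python
def average_ranks(values):
    """1-based ranks of values; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [v for group in groups for v in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    weighted = 0.0
    offset = 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        weighted += rank_sum ** 2 / len(group)
        offset += len(group)
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical Shannon indices for three groups of three samples each.
groups = [[2.1, 2.3, 2.2], [2.8, 3.0, 2.9], [1.5, 1.6, 1.4]]
print(round(kruskal_wallis_h(groups), 3))  # 7.2
```

Here the rank sums are 15, 24 and 6, giving H = 7.2. With $k - 1 = 2$ degrees of freedom the chi-square tail probability is $e^{-H/2} \approx 0.027$, although the chi-square approximation is rough for groups this small.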

The null hypothesis $H_0$ of the significance test is that the diversity index does not differ between the groups being compared. A result of p < 0.05 is generally taken as grounds to reject the null hypothesis, in which case the α diversity difference between the groups is considered significant.

Data Visualization

The Dr.Tom system uses boxplots to visualize the results of alpha diversity analysis.

FAQ

Q: Which statistical tests are used to compare alpha diversity?

A: The test depends on the selected method (parametric or nonparametric) and the number of groups being compared:

|            | Parametric test      | Nonparametric test |
|------------|----------------------|--------------------|
| Groups = 2 | t-test               | Wilcoxon           |
| Groups > 2 | Analysis of variance | Kruskal-Wallis     |
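The selection rule in the table, together with the p < 0.05 decision described under Hypothesis Test, can be sketched in Python as below. The function names are hypothetical, and the SciPy function names returned alongside each test are suggestions for running that test in Python, not a description of how Dr.Tom implements it. For two independent groups, "Wilcoxon" is assumed here to mean the Wilcoxon rank-sum test.

```python
def choose_test(n_groups, parametric):
    """Return (test name, a SciPy function that implements it) per the FAQ table."""
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    if parametric:
        if n_groups == 2:
            return "t-test", "scipy.stats.ttest_ind"
        return "analysis of variance", "scipy.stats.f_oneway"
    if n_groups == 2:
        return "Wilcoxon rank-sum", "scipy.stats.ranksums"
    return "Kruskal-Wallis", "scipy.stats.kruskal"


def is_significant(p_value, alpha=0.05):
    """Reject H0 (no difference in alpha diversity) when p < alpha."""
    return p_value < alpha


print(choose_test(2, parametric=False))  # ('Wilcoxon rank-sum', 'scipy.stats.ranksums')
print(is_significant(0.03))              # True
```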

Reference


  1. Whittaker, R. H. (1960). Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs, 30(3), 279–338. https://doi.org/10.2307/1943563

  2. Colwell, R. K., Mao, C. X., & Chang, J. (2004). Interpolating, Extrapolating, and Comparing Incidence-Based Species Accumulation Curves. Ecology, 85(10), 2717–2727. https://doi.org/10.1890/03-0557

  3. Chao, A., & Yang, M. C. K. (1993). Stopping Rules and Estimation for Recapture Debugging with Unequal Failure Rates. Biometrika, 80(1), 193–201. https://doi.org/10.1093/biomet/80.1.193

  4. Pielou, E. C. (1966). The Measurement of Diversity in Different Types of Biological Collections. Journal of Theoretical Biology, 13, 131–144. https://doi.org/10.1016/0022-5193(66)90013-0

  5. Simpson, E. H. (1949). Measurement of Diversity. Nature, 163(4148), 688–688. https://doi.org/10.1038/163688a0

  6. Shannon, C. E. (2001). A Mathematical Theory of Communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1), 3–55.