Bacterial abundance quantification
Bacterial identification and quantification via 16Sseq: During 16S rDNA sequencing, two variable regions of the 16S rRNA gene, V3 and V4, will be analyzed to define the phylogenetic diversity of each sample. These variable regions are widely used for phylogenetic classification at the genus or species level in diverse microbial populations. Libraries for ~250 samples per run will be prepared as recommended by Illumina for 16S rRNA metagenomic sequencing on the MiSeq system (Illumina).
16Sseq data analysis: Briefly, Illumina’s 16S Metagenomics tool in BaseSpace will be used for sequence clustering and OTU filtering. The read counts of the classified organisms will be converted to percentages within each sample to show the relative representation of each organism. Tabular and graphical statistics on OTU alpha diversity (e.g., box plots) and on OTU beta diversity (e.g., principal component analysis (PCA)) will be created. Parametric and non-parametric tests on a single response variable (e.g., t-test, Mann-Whitney U test) and on multiple response variables (e.g., (distance-based) redundancy analysis, generalized linear models for multivariate abundance data) will be performed to test for differential abundance between groups. Enterotypes will be identified using clustering techniques such as hierarchical clustering and k-means.
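The conversion to percentages and a simple alpha-diversity statistic can be sketched as follows. This is a minimal illustration, not the BaseSpace implementation: the function names, the choice of the Shannon index as the alpha-diversity measure, and the read counts are all assumptions made for the example.

```python
import math


def relative_abundance(counts):
    """Convert raw read counts per taxon into percentages of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def shannon_index(counts):
    """Shannon diversity H' = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts.values())
    proportions = [n / total for n in counts.values() if n > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical read counts for one sample (illustrative values only).
sample = {"Bacteroides": 500, "Faecalibacterium": 300, "Escherichia": 200}

percentages = relative_abundance(sample)  # per-taxon share of the sample, in %
alpha = shannon_index(sample)             # one alpha-diversity value per sample
```

Computing one such value per sample yields the per-group distributions that the box plots and the single-response tests would then compare.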
Anticipated results and alternative strategies: It is possible that mucosal or fecal bacteria might bias our gene expression results. Thus, we will stratify all individuals into two subgroups according to the bacterial load found in their biopsy samples: individuals with high and with low bacterial load. In addition, we will stratify all individuals into two subgroups according to:
(a) fecal enterotypes (co-abundant genera),
(b) the relative abundance of any fecal phyla and genera, and
(c) the relative abundance of potentially beneficial fecal genera, e.g. Lactobacillus and Bifidobacterium, and of potentially pathogenic ones, e.g. genera of the Enterobacteriaceae and other gamma-Proteobacteria.
Should we find strong correlations between any of these subgroups and the genes we have selected as differentially expressed, we will re-analyze these genes within each such subgroup, comparing “histologically-healthy” and “CRC-prone” individuals. If the number of available individuals in a subgroup does not suffice, we may re-assess up to 20 of the differentially expressed genes by RT-qPCR, increasing the number of individuals to be assessed from 32 and 64 to up to 50 and 100, respectively.
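The two-subgroup stratification described above can be sketched as a simple split on one measured variable. The text does not specify a cutoff, so the median used here is an assumption, as are the function name, the individual identifiers, and the load values.

```python
from statistics import median


def stratify_by_median(value_by_individual):
    """Split individuals into 'high' and 'low' groups around the median value.

    The median cutoff is an assumption for illustration; values equal to the
    cutoff are assigned to the 'low' group.
    """
    cutoff = median(value_by_individual.values())
    groups = {"high": [], "low": []}
    for individual, value in value_by_individual.items():
        groups["high" if value > cutoff else "low"].append(individual)
    return cutoff, groups


# Hypothetical bacterial loads per biopsy (arbitrary units, illustrative only).
loads = {"P1": 0.5, "P2": 2.0, "P3": 1.0, "P4": 4.0}
cutoff, groups = stratify_by_median(loads)
```

The same function could be applied to the relative abundance of a given phylum or genus, so that each criterion yields its own pair of subgroups.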
Bacterial abundance relative to the host tissue via qPCR:
To target 16S rRNA genes, we are using universal (pan-bacterial) primers and primers specific to the following taxa that dominate the human colon: a) Firmicutes, b) Bacteroides, c) Proteobacteria, d) Actinobacteria. To normalize bacterial load to the host tissue sample, two reference genes, RPS13 and GAPDH, are used, because their copy number per genome/cell is fixed. In principle, the procedure follows the absolute quantification method, based on 16S rRNA gene copy numbers determined via RT-qPCR.
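The arithmetic behind absolute quantification and normalization to host tissue can be sketched as below. It assumes a linear standard curve of the form Cq = slope × log10(copies) + intercept for each primer pair, and it combines the two host reference genes by their geometric mean; both the combination rule and all numeric values are assumptions for illustration, and the function and key names are hypothetical.

```python
import math


def copies_from_cq(cq, slope, intercept):
    """Invert a standard curve Cq = slope * log10(copies) + intercept."""
    return 10 ** ((cq - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the curve slope; 1.0 means doubling every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def geometric_mean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def load_per_host_copy(cq_16s, curve_16s, host_cqs, host_curves):
    """16S rRNA gene copies per host reference gene copy in one sample."""
    bacterial = copies_from_cq(cq_16s, *curve_16s)
    host = [copies_from_cq(host_cqs[g], *host_curves[g]) for g in host_cqs]
    return bacterial / geometric_mean(host)


# Hypothetical standard curves as (slope, intercept) and measured Cq values.
curve_16s = (-3.32, 38.0)
host_curves = {"ref_A": (-3.32, 37.0), "ref_B": (-3.32, 37.0)}  # placeholder names
host_cqs = {"ref_A": 20.4, "ref_B": 20.4}

ratio = load_per_host_copy(24.72, curve_16s, host_cqs, host_curves)
```

Because the host reference genes are present at a fixed number per cell, this ratio is comparable across biopsies of different sizes, which is the point of the normalization.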