QuantSeq – The Cyprus Intestinal Health Study

3' mRNA quantification by next-generation sequencing will be performed using the QuantSeq 3' mRNA-Seq Library Prep Kit (Lexogen). QuantSeq is a highly strand-specific sequencing approach that reduces data analysis time and enables a higher level of multiplexing per run than standard mRNA sequencing methods.

Statistical power & transcriptomics analysis: Primary QuantSeq analysis covers all transcripts of all human genes and normalizes expression to arbitrary values in the range of 1 to 1000 Fragments Per Kilobase per Million reads (FPKM). To be selected as differentially expressed, a gene's value must differ at least 2-fold between the control and the target group (a ratio of 1:2 or more extreme). Accordingly, a statistical power analysis was performed to estimate the minimum number of human samples needed in total, as well as separately for the control (histologically healthy) and target (CRC-prone) groups, in order to ensure a confidence level of at least 95% (significance level of 0.05) and a statistical power of at least 80%, as is standard in the vast majority of scientific publications.
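As an illustration of the selection rule above, the 2-fold criterion can be sketched in Python. The function name, the gene names and the expression values are hypothetical and serve only to show the arithmetic; the actual analysis pipeline may apply the threshold differently (for example on log-transformed values).

```python
def is_differentially_expressed(control_value, target_value, min_fold=2.0):
    """True when the two values differ at least min_fold in either direction."""
    if control_value <= 0 or target_value <= 0:
        raise ValueError("expression values must be positive to form a ratio")
    ratio = target_value / control_value
    return ratio >= min_fold or ratio <= 1.0 / min_fold


# Hypothetical group-level FPKM values per gene: (control, target).
example_genes = {
    "GENE_A": (10.0, 25.0),  # 2.5-fold higher in the target group
    "GENE_B": (40.0, 50.0),  # 1.25-fold: below the threshold
    "GENE_C": (30.0, 12.0),  # 2.5-fold lower in the target group
}

selected = [name for name, (control, target) in example_genes.items()
            if is_differentially_expressed(control, target)]
print(selected)
```

The check is symmetric, so a gene is selected whether it is at least 2-fold higher or at least 2-fold lower in the target group.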

Given the small sample size involved, we assumed that non-parametric tests are appropriate. As the first part of our analysis, we will perform a Mann-Whitney U test, the non-parametric equivalent of the t-test for independent samples. For the power calculation, we used a mean of 0.002 for the control group and 0.003 to 0.004 (150% to 200% of the control mean, i.e. a 50% to 100% increase) for the target group, with a standard deviation of ±10%. Moreover, we assumed a 2:1 ratio between the sample sizes of the two groups. Given these assumptions, the required sample size is 96 people (64 and 32) for the independent comparison (control vs. target).
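A power calculation of this kind can be approximated by simulation. The Python sketch below estimates the power of a two-sided Mann-Whitney U test for given group sizes and means. It rests on several assumptions that are not stated in the text: values are drawn from normal distributions, the ±10% standard deviation is taken relative to each group's mean, the p-value uses the large-sample normal approximation without tie correction, and the 64 samples are assigned to the control group. The function names are illustrative. It shows the procedure only and is not an attempt to reproduce the figure of 96, since the software and distributional details behind that number are not given here.

```python
import random
from statistics import NormalDist


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, assumes no ties)."""
    n1, n2 = len(x), len(y)
    # Rank the pooled sample; label 0 marks values from x, label 1 values from y.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(rank for rank, (_, group) in enumerate(pooled, start=1)
                     if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulated_power(n_control, n_target, mean_control, mean_target,
                    sd_fraction, alpha=0.05, n_sim=2000, seed=1):
    """Fraction of simulated studies in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        control = [rng.gauss(mean_control, sd_fraction * mean_control)
                   for _ in range(n_control)]
        target = [rng.gauss(mean_target, sd_fraction * mean_target)
                  for _ in range(n_target)]
        if mann_whitney_p(control, target) < alpha:
            rejections += 1
    return rejections / n_sim


# Assumed mapping of the stated values: 64 control, 32 target, SD = 10% of the mean.
print(simulated_power(64, 32, 0.002, 0.003, 0.10))
```

Repeating the call over a range of group sizes and keeping the smallest size whose estimated power reaches 0.80 gives a simulation-based sample size under these assumptions.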

Our analysis will also involve paired comparisons of different regions within each subject's colon, using the Wilcoxon signed-rank test, the non-parametric equivalent of the t-test for paired samples. For this power calculation, we assumed a variability between different regions of about 50% on average, which gave a minimum sample size of 86 people (from both groups combined). In conclusion, based on the expected parameters and the statistical tests to be performed, a total sample of 100 people at the same 2:1 group ratio should suffice for all statistical tests.
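For the paired comparison, the Wilcoxon signed-rank test can be sketched as follows. This is a minimal illustration under stated assumptions: the p-value uses the large-sample normal approximation, zero differences are dropped, ties among the absolute differences are not corrected for, and the function name and the simulated regional values are hypothetical rather than study data.

```python
import random
from statistics import NormalDist


def wilcoxon_signed_rank_p(first, second):
    """Two-sided Wilcoxon signed-rank p-value (normal approximation, no tie correction)."""
    # Paired differences; pairs with no difference carry no information and are dropped.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank differences by absolute size, then sum the ranks of the positive ones.
    ranked = sorted(diffs, key=abs)
    w_plus = sum(rank for rank, d in enumerate(ranked, start=1) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = (n * (n + 1) * (2 * n + 1) / 24) ** 0.5
    z = (w_plus - mean_w) / sd_w
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical paired expression values for two colon regions in 30 subjects,
# with region B about 50% higher than region A plus measurement noise.
rng = random.Random(0)
region_a = [rng.gauss(1.0, 0.2) for _ in range(30)]
region_b = [a * 1.5 + rng.gauss(0.0, 0.1) for a in region_a]
print(wilcoxon_signed_rank_p(region_a, region_b))
```

Because each subject serves as their own control, the test works on within-subject differences, which is why the paired comparison can pool subjects from both groups.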

Anticipated results and alternative strategies: We anticipate that regenerative inflammation and DNA repair genes will be expressed at higher levels in the CRC-prone group. Taking two biopsies per location per individual ensures that sufficient sample material is available. If for any reason our QuantSeq analysis is unsatisfactory, or if it needs to be validated through alternative methods, we have three options: (a) perform RT-qPCR analysis of the 20 regenerative inflammation genes that we have already standardized for this purpose; (b) perform western blot or ELISA analysis of the protein fraction isolated from the first biopsies; (c) use the second biopsy per location per individual for cryosectioning and immunohistochemistry against regenerative inflammation proteins, using antibodies for mitosis markers (e.g. anti-Ki-67, anti-PCNA, anti-pH3) and/or for EGFR, JAK/STAT, TNFR and JNK pathway markers and/or for DNA repair proteins (e.g. anti-MLH1, anti-histone H2A.X).