
quantile-quantile plot (qqplot) of the p-values

The QQ plot compares the expected distribution of association test statistics (X-axis) across the million SNPs with the observed values (Y-axis). A deviation from the X=Y line along most of its length implies a consistent difference between cases and controls across the whole genome (suggesting a bias like the ones I’ve mentioned). A clean QQ plot (see below), on the other hand, should show a solid line matching X=Y until it sharply curves upward at the end (representing the small number of true associations among thousands of unassociated SNPs). The blue points in this figure show what’s left after removing the validated associations: most of that tail was, in fact, due to true disease variants, but more interesting results might still be lurking in the data.
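To make the reading of the plot concrete, here is a minimal sketch of how the plotted points could be computed and how a handful of strong associations lifts only the tail. It uses simulated P values rather than real data: the SNP counts, the value `1e-9` for the "validated" hits, and the helper name `qq_points` are all assumptions made for illustration, and the points are put on the usual -log10 scale.

```python
import math
import random


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale, least significant first."""
    n = len(p_values)
    observed = sorted(p_values, reverse=True)  # largest P value first
    points = []
    for i, p_obs in enumerate(observed):
        # Under the null, P values are uniform on (0, 1); the k-th smallest
        # of n is expected near (k - 0.5) / n. Here k = n - i.
        p_exp = (n - i - 0.5) / n
        points.append((-math.log10(p_exp), -math.log10(p_obs)))
    return points


random.seed(0)
# Hypothetical data: 10,000 unassociated SNPs plus 20 strong "validated" hits.
null_p = [1.0 - random.random() for _ in range(10_000)]  # values in (0, 1]
hit_p = [1e-9] * 20

with_hits = qq_points(null_p + hit_p)
without_hits = qq_points(null_p)

# Middle of the plot: observed tracks expected (the X=Y line).
mid_exp, mid_obs = with_hits[len(with_hits) // 2]
# End of the plot: the hits lift the observed values far above expected.
tail_exp, tail_obs = with_hits[-1]
print(f"middle: expected {mid_exp:.2f}, observed {mid_obs:.2f}")
print(f"tail:   expected {tail_exp:.2f}, observed {tail_obs:.2f}")
print(f"tail after removing hits: observed {without_hits[-1][1]:.2f}")
```

Passing the resulting pairs to any scatter-plot routine, together with the X=Y line, would reproduce the shape described above; dropping the simulated hits plays the role of the blue points.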


The QQ plot is a graphical representation of the deviation of the observed P values from the null hypothesis: the observed P values for each SNP are sorted from largest to smallest and plotted against the values expected under a theoretical χ² distribution. If the observed values correspond to the expected values, all points lie on or near the diagonal between the x-axis and the y-axis (null hypothesis: light gray line in Fig. 2b and c). If some observed P values are clearly more significant than expected under the null hypothesis, the points move towards the y-axis, as shown in Fig. 2b. If the observed values separate from the expected values early (Fig. 2c), many moderately significant P values are more significant than expected under the null hypothesis. This result is rarely due to thousands of true positives; more often, it is due to population stratification, that is, systematic differences in allele frequencies between subpopulations of the individuals investigated, which make a large number of P values smaller than expected from chance alone.
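The early separation described above is often summarised by a single number, the genomic inflation factor (commonly written λ): the median observed χ² statistic divided by the median expected under the null. The text above does not mention it, so treat the following as an illustrative sketch only. It assumes 1-degree-of-freedom tests, and it mimics stratification crudely by multiplying every simulated χ² statistic by a made-up factor of 1.3; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, median

STD_NORMAL = NormalDist()


def p_to_chi2(p):
    """Convert a two-sided P value in (0, 1] to a 1-df chi-square statistic."""
    z = STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile, z <= 0
    return z * z


def inflation_factor(p_values):
    """Median observed chi-square divided by the median expected under the null."""
    observed_median = median([p_to_chi2(p) for p in p_values])
    null_median = p_to_chi2(0.5)  # about 0.455 for 1 df
    return observed_median / null_median


def simulate_p(n, scale, rng):
    """Draw n P values whose chi-square statistics are inflated by `scale` (assumed model)."""
    p_values = []
    for _ in range(n):
        chi2 = scale * rng.gauss(0.0, 1.0) ** 2
        p_values.append(2.0 * STD_NORMAL.cdf(-math.sqrt(chi2)))
    return p_values


rng = random.Random(1)
lambda_null = inflation_factor(simulate_p(20_000, 1.0, rng))
lambda_strat = inflation_factor(simulate_p(20_000, 1.3, rng))
print(f"no inflation:       lambda = {lambda_null:.2f}")
print(f"uniformly inflated: lambda = {lambda_strat:.2f}")
```

Because the median is barely affected by a few very small P values, a value close to 1 is compatible with a plot that only curves at the end, while a value clearly above 1 corresponds to the early separation seen in Fig. 2c.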
