Introduction

Complex diseases such as diabetes, schizophrenia and cancer lay a heavy burden on public health, with frequencies over 1%. As they are caused by interactions among genes, an increasing number of studies (Hoh et al, 2001; Nelson et al, 2001; Ritchie et al, 2001; Cordell and Clayton, 2002) have employed multilocus methods to detect the causative genes. Despite the abundance of such studies, only 16–30% of the reports of significant associations have been consistently replicated (Page et al, 2003). One reason for the low replication rate is the failure in controlling for the effect of multiple testing (Lander and Kruglyak, 1995).

Multilocus methods search loci in combinations, which inevitably tests a great number of hypotheses simultaneously: a k-locus analysis of n loci would have Cnk locus-combinations to inspect. In addition to Cnk being a large and cumbersome number, the tests are correlated with each other. For example, locus-combination AB is obviously correlated with combination BC for they share a mutual locus B. Even when loci are completely independent, their combinations are still correlated. These characteristics complicate the control of the error rate in multilocus analyses. Appropriate adjustment for multiple testing is very important to prevent a flood of false-positive claims and to prevent true associations being overlooked.

A traditional criterion for multiple testing is the experiment-wise significant level (EWSL, αe), which is the probability that one or more hypotheses would be falsely rejected. To control the EWSL, the corresponding point-wise significant level (PWSL, αp), which is the probability that an individual hypothesis would be falsely rejected, should be calculated appropriately. For independent tests, the Bonferroni (or Sidak) correction provides a simple and accurate control: αp=1−(1−αe)1/M≈αe/M, where M is the number of tests. However, this adjustment is overly conservative for correlated tests. Lander and Botstein (1989) and Lander and Kruglyak (1995) offered analytical adjustments for correlated tests in single-locus analyses, but it is hard to apply them to multilocus analyses.

In practice, most researchers have therefore employed resampling-based methods to control the EWSL of correlated tests. Hoh et al (2001) and Ritchie et al (2001) used the permutation test; Nelson et al (2001) used the cross-validation test. Although resampling-based methods are accurate, they demand intensive computation, which constrains them to cases with a limited number of tests. Taking the permutation test as an example, Churchill and Doerge (1994) recommended at least 1000 shuffles to estimate a 0.05 EWSL and at least 10 000 shuffles to estimate a 0.01 EWSL.

An alternative criterion for multiple testing is the false discovery rate (FDR) (Benjamini and Hochberg, 1995), which is the expected proportion of falsely rejected hypotheses among all those rejected. When all hypotheses are true, the FDR is equal to the EWSL, but otherwise it is smaller. For independent tests, Benjamini and Hochberg (1995) and Benjamini and Liu (1999) proposed a step-up procedure and a step-down procedure to control the FDR. For correlated tests, Benjamini and Yekutieli (2001) proposed a simple but highly conservative procedure. Storey (2002) and Storey and Tibshirani (2003) proposed a variation of the FDR: the positive false discovery rate (pFDR), which is the conditional FDR given that at least one hypothesis is rejected. These authors gave direct estimates of the pFDR for independent tests, and suggested resampling-based methods such as the permutation test for correlated tests. Correlation among tests is still a problem for FDR methods.

In 2001, Cheverud proposed a new idea to adjust correlated tests as though they were independent according to an effective number (Meff) of independent tests. He also gave an estimation of the Meff for single-locus analyses. In Nyholt (2004)'s evaluation, the Meff method approximated the permutation test accurately, and was more than 100 times faster. It is highly desirable to apply this method to control the EWSL and FDR in multilocus analyses. However, our experience suggests that Cheverud's (2001) estimate of Meff is overly large, especially when there are a substantial number of moderately correlated tests. This situation is, unfortunately, typical of multilocus analyses, leading to overly conservative results.

In this study, we propose a more accurate estimate of Meff, and design Meff-based procedures to control the EWSL and FDR. An assessment based on both real and simulated data shows that the procedures can control the error rate accurately in multilocus analyses.

Approach

The Meff method

When loci are in linkage disequilibrium (LD), hypothesis tests in single-locus analyses are not independent. Cheverud (2001) proposed the method below to adjust the correlated tests:

Step 1: Calculate the correlation matrix for the loci.

Step2: Estimate the effective number (Meff) of independent tests from the eigenvalues of the correlation matrix with Equation (1), where M is the number of tests (equal to the number of loci in single-locus analyses) and λi (i=1,…, M) are the eigenvalues

Step 3: Adjust the test criteria as though there were Meff independent tests with the Sidak (1967) correction below

Step 4: Test the association between genotypes and phenotypes locus by locus. If the p-value of any test is lower than αp, the nonassociation hypothesis is rejected.

A new Meff equation

The key step of the Meff method is Step 2, estimating the Meff from the eigenvalues. As Cheverud (2001) explained, Equation (1) considers two extreme situations. When the tests are completely independent, the eigenvalues are all equal to 1 and Equation (1) gives Meff=M. When the tests are completely identical, the first eigenvalue is M, while all the others are zero, and Equation (1) gives Meff=1. Although Equation (1) gives correct estimates at the two extremes, it overestimates the Meff in other situations. If the M tests are composed of c (1⩽c⩽M) copies of M/c independent tests, the eigenvalues are Sequence (3)

According to Equation (1), Meff=M+1–c, but actually there are only M/c independent tests. The ratio r is

In multilocus analyses, M is large and c is moderate, so r tends to be considerably larger than 1.

Intuitively, c identical tests will result in an eigenvalue λi=c, while a test partially correlated with others will result in an eigenvalue 0<λi<1. Thus, an eigenvalue can be decomposed into two parts: the integral part and the nonintegral part. The integral part represents identical tests that should be counted as one in the Meff. The nonintegral part represents a partially correlated test that should be counted as a fractional number between 0 and 1. From this intuitive consideration, we propose Equation (5) to estimate the Meff:

where I(x⩾1) is the indicator function, which gives 1 when x⩾1 and gives 0 otherwise, and is the floor function, which gives the largest integer less than or equal to x. Thus, completely correlated tests will be counted as I(x⩾1) and partially correlated tests will be counted as . Equation (5) gives the right Meff for integral eigenvalue Sequence (6), where t is the true number of independent tests and c1, c2,…, ct are integers

Multilocus analyses

By regarding locus-combinations as ‘generalized loci’ (GLs), we can apply the Meff method to multilocus analyses. For example, if two loci have g1 and g2 genotypes, respectively, their combination can be regarded as a new ‘locus’ with g1g2 genotypes. Consequently, the calculation of the correlation coefficients in Step 1 needs adaptation. As GLs are multinomial variables, neither the Pearson correlation (used by Cheverud, 2001) nor the LD measure (used by Nyholt, 2004) is suitable.

Similar to the Pearson correlation for numerical variables, the correlation for multinomial variables can be defined as in Equation (7a), but the variance (Var) and the covariance (Cov) should be measured differently. Let X1 and X2 denote two GLs, that is, two multinomial variables. Their variance and covariance can be measured with the entropy and mutual information (Cover and Thomas, 1991), respectively, as in Equation (7b) and (7c).

where P(Xi) is the probability function of Xi and E[x] is the expectation of a random variable x.

Besides its compatibility with both single- and multilocus analysis, this correlation coefficient has a property:

where A, B and C are independent loci or GLs.

An Meff-based FDR procedure

Let R denote the number of rejected hypotheses and V the number of those falsely rejected. Benjamini and Hochberg (1995) defined the FDR as

where I(R>0) is the indicator function, which gives 1 when R>0 and gives 0 otherwise. The FDR is equal to the EWSL when all null hypotheses are true and is smaller otherwise. They also proposed the following step-up procedure to control the FDR for independent tests. Let p(1)⩽⋯⩽p(M) be the ordered p-values of M independent hypotheses H(1),…, H(M). To control the FDR at level q (usually 5%), hypotheses H(1),…, H(k) should be rejected, where

The ordered p-values are compared with an arithmetic sequence from q/M to q. The start of the sequence, q/M, ensures that the FDR is equal to the EWSL when all the hypotheses are true. The end of the sequence, q, ensures that the FDR is controlled at q when all the hypotheses are rejected. To involve the effect of correlation, we can replace the start q/M with q/Meff, keep the end as q, and consequently replace the step q/M with (q−q/Meff)/(M−1). The Meff-based step-up procedure is to reject H(1),…, H(k), where

Performance

Single-locus analyses

The Meff method has previously been applied to two single-locus analyses. Cheverud (2001) applied it to the LG/J and SM/J mouse data (Cheverud et al, 2001), and Nyholt (2004) applied it to the angiotensin-I-converting enzyme (ACE) gene data (Keavney et al, 1998). In these analyses, Cheverud (2001) and Nyholt (2004) derived two eigenvalue sequences from the two data sets, respectively. In this study, we calculated different Meff's with both Equations (1) and (5) from the two eigenvalue sequences, and compared the results. As shown in Table 1, the two equations gave similar Meff's, so they should have similar performance on the two data sets. Setting αe=5% for the Keavney data set, Nyholt (2004) obtained αp=0.015 from a permutation test with 50 000 shuffles, and obtained αp=0.011 with the Meff=4.59 from Equation (1). Using Meff=4.01 from Equation (5), we obtained αp=0.013, a value closer to the permutation test's 0.015 than the value of 0.011 obtained from Equation (1).

Table 1 Comparison between Equations (1) and (5) on two eigenvalue sequences

Multilocus analyses

To compare the performance of Equations (1) and (5) in multilocus analyses, we used four sub-data sets extracted from a schizophrenia data set (Xu et al, 2004). The schizophrenia data set concerns 12 genes of the dopamine pathway, COMT (7), ALDH3 (11), ADH (14), TYR (1), TPO (2), MAO (3), DAO (2), AOX1 (2), DDC (4), HDG (1), DBH (2) and GSTZ1 (1), with 83 cases and 108 controls. (The integers in the parentheses are the numbers of SNPs genotyped for each gene.) To investigate the performance on correlated loci, we extracted the SNPs of gene COMT, ALDH3 and ADH as three sub-data sets COMT, ALDH3 and ADH, respectively. The three genes were reported as potential effective SNPs by Xu et al (2004), and they have enough SNPs (more than five) to ensure the multiplicity of tests in the analyses. To investigate the performance on independent loci, we selected one SNP per gene as the fourth sub-data set and called it DOP (short for dopamine). For each data set, we performed single-, two- and three-locus analyses.

We used the permutation test as the golden standard to evaluate the performance. In each permutation shuffle, we applied the Meff method to control the EWSL (αe). As a result, we obtained a decision of whether or not to reject the nonassociation hypothesis. After repeating this procedure 3000 times, we counted the proportion of rejection (Pr). Ideally, Pr should be equal to the preset αe. The association between a GL and the phenotype was tested with a combination of the χ2 test and the Fisher's exact test (When Cochran (1954)'s condition was satisfied, the association was tested with the χ2 test. When the condition was not satisfied, it was tested with the Fisher's exact test. Cochran's condition: in a contingency table, every cell has an expected value no less than 1, and 80% of the cells have expected values of no less than 5. This algorithm was implemented in ACM algorithm 643 (Clarkson et al, 1993).)

Figure 1 shows the comparison between Pr and αe. In single-locus analyses, both Equations (1) and (5) controlled Pr at αe. In two- and three-locus analyses, the Meff's obtained from Equation (1) were much greater than those obtained from the permutation test (see Table 2), and consequently the Pr of Equation (1) was much lower than αe. In contrast, the Pr of Equation (5) approximated αe fairly well in most cases.

Figure 1
figure 1

Proportion of rejection vs EWSL. The EWSL (αe) was set at different levels, and then the proportion of rejection (Pr) was estimated with a permutation test with 3000 shuffles. Ideally, Pr should be equal to αe. The dashed lines are the result of the Meff from Equation (1), and the solid lines are the result of the Meff from Equation (5). All correlation matrices were calculated with Equation (7).

Table 2 Comparison between Equations (1) and (5) on the schizophrenia data

Effects on FDR

We performed two-locus analyses on simulated case–control data to evaluate the effects of the Meff on the FDR. The simulated data contains 10 genomic regions with five observed biallelic SNPs in each region. All SNPs are in the Hardy–Weinberg equilibrium and the minor alleles have the same frequency. SNPs in different regions are in linkage equilibrium, and SNPs in the same region are in LD with r2=0.8 (Hill and Robertson, 1968) between two successive SNPs. The disease is caused by the epistatic interaction of two unobserved loci, which hide in the first region and the second region, respectively. The LD (r2) between a causative loci and its nearest marker SNP is 0.8. As neither of the causative loci has a marginal effect, a two-locus analysis is necessary to detect the epistatic interaction. Consequently, there are C502=1225 SNP pairs to inspect, and only the 25 pairs composed of the loci from the first and second regions are associated with the disease. The data were simulated from the six epistasis models in Table 3 (Ritchie et al, 2003). For each model, 10 000 data sets were generated with 200 cases and 200 controls in each of them.

Table 3 Epistasis models of disease

Both Benjamini and Hochberg (1995)'s direct step-up procedure and the new Meff-based step-up procedure were applied to control the FDR. Although Benjamini and Yekutieli (2001) designed an FDR procedure for correlated multiple testing, we did not employ it because it is much more conservative than the direct step-up procedure. The Meff was estimated with Equation (5). As shown in Figure 2, the Meff-based step-up procedure controlled the FDR more accurately than the direct method, and was considerably more powerful.

Figure 2
figure 2

Comparison between the direct step-up procedure and the Meff -based step-up procedure. The FDR and power were estimated from 10 000 data sets, each with 200 cases and 200 controls. Ideally, the empirical FDR should be equal to q, which is the FDR level expected to be controlled at.

Discussion and conclusion

In 2001, Cheverud proposed the idea of adjusting correlated tests as if they were independent according to the effective number (Meff) of independent tests. Piepho (2001) has also proposed a quick method for computing approximate threshold levels that control the genome-wise type I error rate of tests for quantitative trait locus detection in interval mapping and composite interval mapping. In the multilocus analyses presented in this paper, our extended concept of Meff succeeded in controlling both the EWSL and the FDR. This result supports the use of Meff in error-rate control of correlated multiple testing. As a data-driven algorithm based on a correlation matrix, the Meff method can be applied to a wide range of problems. Nowadays, dense and linked markers are widely used in genetic study, and it is hard to model their genetic behavior precisely, especially in multilocus analyses. The diversity of populations also prohibits models obtained from one population from being applied to another. Avoiding the error of an imprecise model, the data-driven Meff method is suitable for different situations. In addition to the data-driven property, the Meff can be calculated for various types of data by generalizing the definition of correlation coefficient. With Equation (7), it can be applied to categorical data. With the Pearson correlation, it can be applied to numerical data such as microarray data. With King and Chinchilli (2001)'s generalized correlation coefficient, it can be applied to a mixture of numerical and categorical data.

Cheverud (2001) proposed Equation (1) to estimate the Meff. As he explained, Equation (1) considers two extreme situations: hypothesis tests are completely independent or they are completely identical. However, Equation (4) shows that Equation (1) tends to overestimate the Meff when there are a large number of moderately correlated tests. Unfortunately, in multilocus analyses, this situation is usually encountered, and Equation (1) will lead to conservative reports. In this paper, we propose Equation (5) to estimate the Meff more accurately. It considers a more general situation than the two extremes Equation (1) concerns, where tests are composed of copies of independent tests. Therefore, it outperforms Equation (1) in multilocus analyses. Figure 1 clearly shows that both Equations (1) and (5) perform excellently in single-locus analyses, but only the latter is effective in multilocus analyses. The higher performance of Equation (5) over Equation (1) is not because it better fits the correlation coefficient defined by Equation (7). As Equation (4) has shown, no matter what correlation coefficient is employed to calculate the correlation matrix, Equation (1) will overestimate the Meff for eigenvalue Sequence (3), while Equation (5) will give an appropriate estimate. We also used the Pearson correlation in single-locus analyses, but as shown in Table 2, its performance was similar to that of Equation (7).

If n tests need adjustment, the computation complexity of a permutation test is O(stn), where s is the number of shuffles and t is the time needed to test a hypothesis. The computation cost for the Meff method concentrates on eigenvalues' calculation whose complexity is O(mn3), where m is the time needed to multiply two real numbers. Theoretically, when n is large, the permutation test is more efficient than the Meff method. However, because m is tiny in comparison with st, the Meff method is much faster than the permutation test until n is very large. The largest n we encountered in this study was 1225 in Section Effects on FDR. It took about 10 000 s to repeat the calculation of the 1225 p-values 10 000 times, and took about 3 s to solve the eigenvalues with the numeric package Lapack. (The computation time was measured on a 2.8 GHz Pentium with Linux 2.6.10.) When there are about 1000 tests, the Meff method is at least 1000 times faster than the permutation test. This would meet the need of a genome scan with about 1000 SNPs per chromosome. As SNPs in different chromosomes are in linkage equilibrium, the correlation matrix of a single-locus analysis is diagonally blocked. We can calculate the Meff for each chromosome individually and then sum them up as the Meff for the scan.

Freed from the heavy computation burden, statisticians can realize many sophisticated multilocus methods. For example, we can split a classification tree's nodes according to the EWSL or the FDR rather than the PWSL. In a tree's growth procedure, a node splits when the p-value of the optimal variable exceeds a threshold. The more the variables are, the easier it is for the optimal variable to exceed the threshold, and the more easily the tree overfits. Zhang and Bonney (2000) considered this problem in splitting, but they arbitrarily set the PWSL<0.001 as the threshold. With the Meff method, the threshold can be estimated efficiently. If the threshold is calculated with the permutation test, the computation will be very demanding because every node needs thousands of shuffles.

Multiple testing is not only important in data analysis but also in experiment design. Neglecting the issue would lead to an underpowered design, while overemphasizing it would waste resources. Owing to the expense of genetic experiments, it is highly desirable to apply the Meff method to experiment designs. By calculating the Meff from a pilot study, we can estimate the appropriate PWSL and the sample size for an experiment involving multiple testing.