Research on Permutation Test
1. Introduction to the Permutation Test
Permutation tests are often used in practice to compare two populations, for example to check whether there is any difference between treatment groups. They are popular because the test statistic can be chosen freely and only minimal assumptions about the data distribution are needed. With a permutation test, we can choose a test statistic that is suited to the task at hand. For example, we can use the difference between the mean of treatment 1 and the mean of treatment 2, or we can use the sum of the observations in either treatment 1 or treatment 2. In addition, unlike parametric tests (e.g., the z-test and the t-test), a permutation test does not require the underlying distribution to be specified. It is therefore widely used to test the null hypothesis of no difference between treatment groups. In a permutation test, the distribution of the test statistic under the null hypothesis is obtained by calculating the test statistic for every possible rearrangement of the group labels on the observed data points.
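To make the choice of test statistic concrete, here is a minimal Python sketch of the two statistics mentioned above. The data values and the function names diff_of_means and sum_of_group are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical measurements for two treatment groups (illustrative only).
treatment_1 = [12.0, 15.0, 11.0]
treatment_2 = [9.0, 10.0, 8.0]


def diff_of_means(group_a, group_b):
    """Difference between the mean of group_a and the mean of group_b."""
    return mean(group_a) - mean(group_b)


def sum_of_group(group_a, group_b):
    """Sum of the observations in group_a; group_b is accepted but not used."""
    return sum(group_a)


print(diff_of_means(treatment_1, treatment_2))
print(sum_of_group(treatment_1, treatment_2))
```

Both functions take the same two arguments, so either one could be passed to the same permutation procedure without changing the rest of the code.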
2. Steps to conduct a permutation test
Step 1: Analyze the problem and identify the null hypothesis and alternative hypothesis
Step 2: Choose a test statistic and establish a rejection rule that will distinguish the null from the alternative
Step 3: Compute the test statistic for the original observations
Step 4: Rearrange the observations (reassign the group labels)
– Compute the test statistic for the rearranged data
– Compare the test statistic from the original observations with the ones from the rearranged data
Step 5: Draw a conclusion using the rejection rule from Step 2
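The steps above can be sketched in Python as follows. This is a minimal illustration and not a definitive implementation: it assumes a two-sided test, uses the difference of means as the test statistic, enumerates every way of splitting the pooled data into two groups of the original sizes, and uses a significance level of 0.05. The data values, the function name exact_permutation_test, and the significance level are all assumptions made for the example.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test using the difference of means.

    Returns the observed statistic and the fraction of rearrangements whose
    statistic is at least as extreme as the observed one (the p-value).
    """
    pooled = list(group_a) + list(group_b)
    n, r = len(pooled), len(group_a)

    # Step 3: test statistic for the original observations.
    observed = mean(group_a) - mean(group_b)

    # Step 4: rearrange the labels in every possible way.
    extreme = 0
    total = 0
    for idx in combinations(range(n), r):
        chosen = set(idx)
        new_a = [pooled[i] for i in idx]
        new_b = [pooled[i] for i in range(n) if i not in chosen]
        stat = mean(new_a) - mean(new_b)
        # Small tolerance guards against floating-point ties.
        if abs(stat) >= abs(observed) - 1e-12:
            extreme += 1
        total += 1

    return observed, extreme / total


# Hypothetical data for two treatment groups (illustrative only).
treatment_1 = [12.0, 15.0, 11.0]
treatment_2 = [9.0, 10.0, 8.0]

observed, p_value = exact_permutation_test(treatment_1, treatment_2)

# Step 5: apply the rejection rule (alpha = 0.05 is an assumed threshold).
alpha = 0.05
print("observed statistic:", observed)
print("p-value:", p_value)
print("reject null hypothesis:", p_value <= alpha)
```

With only three observations per group there are just 20 possible rearrangements, so the smallest two-sided p-value this example can produce is 2/20 = 0.1, and the null hypothesis cannot be rejected at the 0.05 level however large the observed difference is.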
3. Formula to be used:
Number of combinations, i.e. the number of distinct ways to assign r of the n pooled observations to one group: n! / [r! * (n - r)!]
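As a quick worked example, the formula can be evaluated directly in Python and checked against the standard library function math.comb. The values n = 6 and r = 3 are hypothetical and correspond to two groups of three observations each; the function name n_combinations is chosen only for this example.

```python
from math import comb, factorial


def n_combinations(n, r):
    """Number of ways to choose r items out of n: n! / [r! * (n - r)!]."""
    return factorial(n) // (factorial(r) * factorial(n - r))


n, r = 6, 3  # hypothetical: 6 pooled observations, 3 in the first group
print(n_combinations(n, r))  # 720 / (6 * 6) = 20

# The standard library provides the same quantity directly.
assert n_combinations(n, r) == comb(n, r)
```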
4. Advantages and disadvantages of the permutation test
1) Advantages:
– Flexibility in choosing the test statistic
– Provides an exact test of our hypothesis without requiring us to make any assumptions about the underlying model that generated the data (e.g., the assumption of normality in the t-test)
2) Disadvantages:
– The computational cost grows very rapidly with the sample size, because the number of possible rearrangements, n! / [r! * (n - r)!], increases quickly as n increases
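To give a feel for how quickly the computation grows, the short Python sketch below counts the rearrangements for a few sample sizes. It assumes two groups of equal size, which is a simplification made only for illustration; the function name n_rearrangements and the chosen sizes are likewise hypothetical.

```python
from math import comb


def n_rearrangements(n, r):
    """Number of distinct ways to assign r of n pooled observations to one group."""
    return comb(n, r)


# Equal group sizes are assumed here purely for illustration.
for per_group in (5, 10, 20, 30):
    n = 2 * per_group
    count = n_rearrangements(n, per_group)
    print(f"n = {n:2d}, r = {per_group:2d}: {count:,} rearrangements")
```

Going from 5 to 10 observations per group already raises the count from 252 to 184,756, and each rearrangement requires the test statistic to be recomputed.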
Resources:
[1] http://ww.w.msme.us/2008-2-3.pdf
[2] http://mcardle.oncology.wisc.edu/mstat/help/help/Notes-05.html