GRIMMEST: Granularity of test statistics
James Heathers and Nick Brown founded the field of granularity testing with their method for checking the consistency of means, the GRIM test. I then extended granularity testing to measures of variability, such as standard deviations, variances, and standard errors, with the GRIMMER test. The open question was whether granularity testing could be extended to test statistics such as the F statistic or the t statistic. Through my recent work I realized how this can be done.
Several test statistics can be calculated when you know only the sample sizes, means, and standard deviations. For example, it is fairly simple to calculate a one-way ANOVA or a two-sample t-test from these values. Even two-way ANOVAs can be recalculated from just these statistics (see here). However, because the reported values are rounded, it is not sufficient to simply plug them into these formulas.
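As a rough illustration of recalculating test statistics from summary data, the Python sketch below computes Student's two-sample t (pooled variance) and a one-way ANOVA F from sample sizes, means, and standard deviations. The function names `pooled_t` and `one_way_f` are hypothetical, and the choice of the pooled-variance t-test is an assumption; a given paper may instead have reported Welch's test.

```python
import math
from typing import Sequence


def pooled_t(n1: int, m1: float, sd1: float,
             n2: int, m2: float, sd2: float) -> float:
    """Student's two-sample t statistic (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se


def one_way_f(ns: Sequence[int], means: Sequence[float],
              sds: Sequence[float]) -> float:
    """One-way ANOVA F statistic from per-group sample sizes, means, and SDs."""
    k = len(ns)
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Between-group sum of squares uses only the group means.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group sum of squares is recovered from each group's SD.
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    return (ss_between / (k - 1)) / (ss_within / (total_n - k))


if __name__ == "__main__":
    # Hypothetical reported values for two groups of 10.
    t = pooled_t(10, 5.45, 1.20, 10, 4.10, 1.35)
    f = one_way_f([10, 10], [5.45, 4.10], [1.20, 1.35])
    print(t, f, t ** 2)  # with two groups, F equals t squared
```

With exactly two groups the one-way F equals the square of the pooled t, which makes a convenient sanity check on either formula.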
When a statistic is reported to two decimals, such as 5.45, the original value could have been anything from 5.445 to 5.455, depending on the rounding convention. As a result, it is necessary to add and subtract .005 before recalculating the test statistic. This does not have to be done haphazardly, however. When determining the upper limit of the possible test statistics, .005 should be subtracted from every standard deviation to make them smaller; when determining the lower limit, .005 should be added to every standard deviation. If only two groups are present, .005 should be added to the larger mean and subtracted from the smaller mean to get the largest possible test statistic, while the opposite should be done to get the smallest possible test statistic. When more than two groups are present, all combinations need to be tested, which can become tedious. For example, a 3×2 ANOVA has 6 groups, and each of the six means can be shifted down by .005, left unchanged, or shifted up by .005, giving 3^6 = 729 combinations that need to be checked.
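A minimal sketch of this range method, assuming values reported to two decimals: every standard deviation is shifted by .005 in the direction described above, and each mean is shifted down by .005, left unchanged, or shifted up by .005, so six groups produce 3^6 = 729 combinations. For simplicity the sketch treats all cells as groups of a one-way ANOVA; the same enumeration could wrap a two-way ANOVA calculation instead. `one_way_f` and `f_range` are hypothetical helpers, not an existing tool.

```python
import itertools
import math
from typing import Sequence, Tuple


def one_way_f(ns: Sequence[int], means: Sequence[float],
              sds: Sequence[float]) -> float:
    """One-way ANOVA F statistic from per-group sample sizes, means, and SDs."""
    k = len(ns)
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    return (ss_between / (k - 1)) / (ss_within / (total_n - k))


def f_range(ns: Sequence[int], means: Sequence[float], sds: Sequence[float],
            decimals: int = 2) -> Tuple[float, float]:
    """Smallest and largest F consistent with rounded means and SDs."""
    h = 0.5 * 10 ** (-decimals)  # .005 for values reported to two decimals
    sds_small = [max(sd - h, 0.0) for sd in sds]  # smaller SDs give a larger F
    sds_large = [sd + h for sd in sds]            # larger SDs give a smaller F
    lo, hi = math.inf, -math.inf
    # Each mean is shifted down by h, left as is, or shifted up by h,
    # so k groups produce 3**k combinations (729 for six groups).
    for shifts in itertools.product((-h, 0.0, h), repeat=len(means)):
        shifted = [m + s for m, s in zip(means, shifts)]
        hi = max(hi, one_way_f(ns, shifted, sds_small))
        lo = min(lo, one_way_f(ns, shifted, sds_large))
    return lo, hi


if __name__ == "__main__":
    # Hypothetical 3x2 design treated as six groups of 5.
    print(f_range([5] * 6, [1.10, 1.25, 1.40, 1.30, 1.05, 1.60], [0.50] * 6))
```

Because the between-group sum of squares is a convex function of the means, the maximum over the ±.005 box always falls on a corner of this grid; the minimum can in principle fall between grid points, so the lower bound is best treated as approximate when the means are very close together.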
This method works fairly well; however, it only provides a range of possible test statistics, not the exact test statistic. When the standard deviations or means are small, adding and subtracting .005 can produce a wide range of values. This range can be narrowed, or eliminated entirely, by taking advantage of granularity testing.
When you run the GRIM or GRIMMER test, the result is that the value is either consistent or inconsistent. When the value is consistent, it is not difficult to determine what the exact mean(s) or standard deviation(s) were. For example, with a sample size of 3 and data composed of whole numbers, the mean must be a multiple of 1/3, so one possible mean would be 1.333333... This would be rounded to 1.33 in the publication and pass the GRIM test. Although the reported number is 1.33, the only multiple of 1/3 that rounds to 1.33 is 4/3, so the actual value must have been 1.333333... As a result, when the means and standard deviations pass the GRIM and GRIMMER tests, their true values can be determined to high precision, and these high-precision values can then be used to recalculate the test statistic to high precision.
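The recovery step can be sketched in Python as a simplified version of the GRIM and GRIMMER logic, assuming integer data from a single item and the sample (n − 1) standard deviation. The helper names `rounds_to`, `grim_means`, and `grimmer_sds` are hypothetical, and this is not the published implementation of either test. Candidate means are integer sums divided by n; candidate SDs come from integer sums of squares Q, using the sample variance (Q − sum²/n)/(n − 1) plus a parity constraint that follows from x² having the same parity as x.

```python
import math
from typing import List


def rounds_to(value: float, reported: float, decimals: int = 2) -> bool:
    """True if value is within half a unit of the last reported decimal.

    Both boundaries are accepted because rounding conventions differ.
    """
    half = 0.5 * 10 ** (-decimals)
    return abs(value - reported) <= half + 1e-12


def grim_means(reported_mean: float, n: int, decimals: int = 2) -> List[float]:
    """Exact means (integer sum / n) that round to the reported mean."""
    half = 0.5 * 10 ** (-decimals)
    lo_sum = math.floor((reported_mean - half) * n)
    hi_sum = math.ceil((reported_mean + half) * n)
    return [s / n for s in range(lo_sum, hi_sum + 1)
            if rounds_to(s / n, reported_mean, decimals)]


def grimmer_sds(reported_sd: float, n: int, total: int,
                decimals: int = 2) -> List[float]:
    """Exact sample SDs that round to the reported SD, given the integer sum."""
    half = 0.5 * 10 ** (-decimals)
    # Sample variance = (Q - total**2 / n) / (n - 1), where Q = sum of squares.
    q_lo = math.floor(max(reported_sd - half, 0.0) ** 2 * (n - 1) + total ** 2 / n)
    q_hi = math.ceil((reported_sd + half) ** 2 * (n - 1) + total ** 2 / n)
    sds = []
    for q in range(q_lo, q_hi + 1):
        # x*x has the same parity as x, so Q and the sum share parity.
        if q % 2 != total % 2:
            continue
        var = (q - total ** 2 / n) / (n - 1)
        if var >= 0 and rounds_to(math.sqrt(var), reported_sd, decimals):
            sds.append(math.sqrt(var))
    return sds
```

For the example in the text, `grim_means(1.33, 3)` returns only 4/3. For the hypothetical data {1, 1, 2} (sum 4, reported SD 0.58), `grimmer_sds(0.58, 3, 4)` returns a single value, the square root of 1/3 (about 0.5774).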
As always with granularity testing, large sample sizes cause a problem. At larger sample sizes there can be multiple possible means and multiple possible standard deviations consistent with the reported values. In that case, each high-precision value has to be used to recalculate the test statistic, which results in several possible test statistics; this is still an improvement over the range method.
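One way to handle several candidates is sketched below, assuming a two-sample pooled-variance t-test. Each group contributes a list of (mean, SD) pairs rather than two independent lists, because the SDs that pass GRIMMER depend on which sum was recovered by GRIM. The function `candidate_t_stats` is a hypothetical helper.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def pooled_t(n1: int, m1: float, sd1: float,
             n2: int, m2: float, sd2: float) -> float:
    """Student's two-sample t statistic (pooled variance)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def candidate_t_stats(n1: int, pairs1: Sequence[Tuple[float, float]],
                      n2: int, pairs2: Sequence[Tuple[float, float]]) -> List[float]:
    """Recompute t for every combination of candidate (mean, SD) pairs."""
    stats = set()
    for (m1, sd1), (m2, sd2) in itertools.product(pairs1, pairs2):
        # Rounding to 10 places merges values that differ only by float noise.
        stats.add(round(pooled_t(n1, m1, sd1, n2, m2, sd2), 10))
    return sorted(stats)
```

A single pair per group yields one high-precision t statistic; several pairs yield the small set of possible values described above, which can then be compared with the reported statistic.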
The GRIMMEST test can be applied to any test statistic that can be recalculated from only the sample sizes, means, and standard deviations, variances, or standard errors. I do not currently have a web application set up for the GRIMMEST test, and I also haven't settled on what the acronym stands for. I just wanted to share the final frontier of granularity testing with the scientific community.