4 Conclusion

Stephen J. Kaputa and Katherine Jenny Thompson


The fundamental finding of Thompson and Sigman (2000) was that interpolation methods can produce stable median estimates for samples from positively skewed populations, but that the effectiveness of the interpolation depends heavily on both the width of the bins and their location in the sample. Their primary contribution was a data-dependent binning approach that used the distribution of each individual estimation cell.

Our approach to determining a decile estimation method for complex samples from a positively skewed population builds on these earlier findings. It recognizes both that data-dependent binning is a necessity and that the binning method selected must account for the positive skew of the distribution in order to produce the complete set of decile estimates. We considered three interpolation methods, each of which took a different approach to resolving the sparse data problem at the ninth decile (the 90th percentile) posed by the skewed distributions. Our empirical analysis showed that all of the studied approaches yielded complete sets of decile estimates with reasonable statistical properties, at least for the Survey of Construction (SOC). However, the properties of the corresponding MHS variance estimates were not as good and exhibited different patterns. At the U.S. level, our simulation results demonstrate consistently good statistical properties for decile estimation and variance estimation using the P95 transformation and 75 bins in one simulated population, in terms of estimate bias, MSE, bias of the variance estimates, and stability, although this combination rarely achieved a 90% coverage rate.
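To make the general idea of interpolation over data-dependent bins concrete, the following is a minimal Python sketch. It is not the P95 method evaluated in this paper. It assumes equal-width bins spanning the observed minimum and maximum of the sample, strictly positive survey weights, and linear interpolation within the bin that contains each decile. The function name `binned_deciles` and its parameters are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def binned_deciles(values, weights, n_bins=75):
    """Estimate the nine deciles by linear interpolation within bins.

    Assumption (illustration only): equal-width bins whose edges are taken
    from the data, running from the sample minimum to the sample maximum.
    Weights are assumed to be strictly positive.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [float(lo)] * 9
    width = (hi - lo) / n_bins

    # Accumulate the survey weight that falls in each bin.
    bin_weight = [0.0] * n_bins
    for v, w in zip(values, weights):
        idx = min(int((v - lo) / width), n_bins - 1)  # maximum goes in last bin
        bin_weight[idx] += w

    cum = list(accumulate(bin_weight))
    total = cum[-1]

    deciles = []
    for k in range(1, 10):
        target = total * k / 10
        # First bin whose cumulative weight reaches the target.
        j = bisect_left(cum, target)
        below = cum[j - 1] if j > 0 else 0.0
        # Fraction of the way through bin j, assuming weight is spread
        # uniformly within the bin.
        frac = (target - below) / bin_weight[j]
        deciles.append(lo + (j + frac) * width)
    return deciles
```

With a long right tail, equal-width bins over the full range leave most upper bins nearly empty, which is one way to see why the choice of bin width and location matters for the upper deciles.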

Of course, it is much more challenging to estimate a complete set of deciles than a single median, especially from positively skewed distributions. However, our recommended method appears to work quite well for most decile estimates and could certainly be modified to produce viable quartile estimates if the production decile estimates prove too unstable. In the meantime, the SOC program has decided to implement the P95 interpolation method and produce complete sets of decile estimates for selected annual characteristics in future reports.

Although we believe that our findings can be extended to other survey designs, we recognize that our research was conducted under extremely restrictive conditions, namely multi-stage cluster sampling from a highly skewed population, with a two-PSU per stratum design at the first stage. In other applications, interpolation with data-dependent bins could be combined with the variance estimator proposed by Woodruff (1952), as suggested by J.N.K. Rao. For surveys that publish decile estimates but are not well suited to BRR or MHS replication, our data-dependent binning and interpolation approach could be used in conjunction with a bootstrap replication method such as the Rao-Wu bootstrap (Rao and Wu 1988).
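As a rough illustration of how a bootstrap replication method could supply replicate weights for any weighted estimator, the sketch below uses the weight-rescaling form commonly associated with the Rao-Wu bootstrap. It assumes that n_h - 1 PSUs are drawn with replacement from the n_h sampled PSUs in each stratum, that every stratum has at least two PSUs, and that finite population corrections are ignored. The function names and arguments are hypothetical, and this procedure was not evaluated in our study.

```python
import random
from collections import defaultdict


def rao_wu_replicate_weights(strata, psus, weights, n_reps=200, seed=12345):
    """Return n_reps lists of rescaled bootstrap weights (one per replicate).

    Assumption: in each stratum h with n_h PSUs (n_h >= 2), draw n_h - 1
    PSUs with replacement and multiply each unit's weight by
    n_h / (n_h - 1) times the number of times its PSU was drawn.
    """
    rng = random.Random(seed)
    psus_by_stratum = defaultdict(set)
    for h, i in zip(strata, psus):
        psus_by_stratum[h].add(i)

    replicates = []
    for _ in range(n_reps):
        factor = {}
        for h, ids in psus_by_stratum.items():
            ids = sorted(ids)
            n_h = len(ids)
            draws = rng.choices(ids, k=n_h - 1)  # sampling with replacement
            for i in ids:
                factor[(h, i)] = n_h / (n_h - 1) * draws.count(i)
        replicates.append(
            [w * factor[(h, i)] for h, i, w in zip(strata, psus, weights)]
        )
    return replicates


def bootstrap_variance(estimator, values, weights, replicates):
    """Mean squared deviation of replicate estimates from the full-sample one."""
    full = estimator(values, weights)
    return sum(
        (estimator(values, rw) - full) ** 2 for rw in replicates
    ) / len(replicates)
```

Here `estimator` is any function of the values and a weight vector, for example one that returns a single interpolated decile, so the same replicate weights can be reused for each published decile.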

Acknowledgements

This report is released to inform interested parties of research and to encourage discussion. Any views expressed on statistical, methodological, or technical issues are those of the authors and not necessarily those of the U.S. Census Bureau. The authors acknowledge Erica Filipek, Bonnie Kegan, and Amy Newman-Smith for their valuable contributions to this research project. In addition, we thank Wan-Ying Chang, Laura Bechtel, Xijian Liu, J.N.K. Rao, the Associate Editor, and two anonymous referees for their helpful comments on earlier drafts of this manuscript.

References

Fay, R.E. (1989). Theory and application of replicate weighting for variance calculations. Proceedings of the Section on Survey Research Methods, American Statistical Association.

Judkins, D.R. (1990). Fay's method for variance estimation. Journal of Official Statistics, 6, 223-239.

Lienhard, S. (2004). Multivariate Lognormal Simulation with Correlation. http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=6426&objectType=File.

Rao, J.N.K., and Shao, J. (1996). On balanced half-sample variance estimation in stratified random sampling. Journal of the American Statistical Association, 91, 343-348.

Rao, J.N.K., and Shao, J. (1999). Modified balanced repeated replication for complex survey data. Biometrika, 86, 403-415.

Rao, J.N.K., and Wu, C.F.J. (1988). Re-sampling inference with complex survey data. Journal of the American Statistical Association, 83, 231-241.

Steel, P., and Fay, R.W. (1995). Variance estimation for finite populations with imputed data. Proceedings of the Section on Survey Research Methods, American Statistical Association.

Thompson, J.R. (2000). Simulation: A Modeler's Approach. New York: John Wiley & Sons, Inc., 87-110.

Thompson, K.J. (1998). Evaluation of Modified Half Sample Replication for Estimating Variances for the Survey of Construction (SOC). Technical Report #ESM9801, available upon request to the Office of Statistical Methods and Research for Economic Programs from the U.S. Census Bureau.

Thompson, K.J., and Sigman, R.S. (2000). Estimation and replicate variance estimation of median sales prices of sold houses. Survey Methodology, 26, 2, 153-162.

Woodruff, R.S. (1952). Confidence intervals for medians and other position measures. Journal of the American Statistical Association, 47, 635-646.
