Sample survey theory and methods: Past, present, and future directions
Section 4. The future

We can project a number of current situations into the future. Budgets will be tight and requests for products will expand. There will be demand for forecasts, and for improved access by users. There will be requests for statistics to be produced more rapidly and, naturally, with no compromise in quality. There will be pressure to bring estimates from different sources into agreement.

We expect faster computing to influence all aspects of the field. More complex edit and imputation algorithms will be developed. The time from collection to publication will be shortened. More complex analyses will be performed on survey data. Record linkage procedures will be improved. Data will be made available in different forms. Searchable databases where the user provides queries will become more common. The use of auxiliary data of all kinds, and in particular administrative data, will increase. Administrative data will be used both as auxiliary data and as the direct estimates for certain items. Citro (2014) gives examples of items where administrative data can be used to replace answers to questions in a questionnaire. Uses of auxiliary data where matching to collected data is imperfect will be a research area.

Modern communication methods and social media have resulted in vast quantities of data, much of it generated with short-term and poorly identified purposes. The term “Big Data” is not well defined, but most would agree that social media data are a part of Big Data. The AAPOR report on Big Data (2015) is an excellent analysis of the potential and the challenges associated with Big Data. Tam and Clarke (2015) and Pfeffermann (2015) discuss the issues from the perspective of a governmental statistical organization. As part of modern society, social media are of interest to social scientists in their own right. Therefore, indexes and summaries of these data are, and will be, produced. An example is the University of Michigan Social Media Job Loss Index. Sampling has a large role to play in the creation of products from these data.

A challenge is transforming some types of Big Data into a form useful as auxiliary data. One example is the use by Porter, Holan, Wikle and Cressie (2014) of Google Trends data on Spanish words as functional covariates in small area models that estimate state proportions of people speaking Spanish, with American Community Survey estimates as the dependent variables.
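As a rough sketch of the kind of small area model referred to here, the basic area-level (Fay-Herriot) model can be written as below. This is the standard textbook form, not the specific spatial, functional-covariate model of Porter et al. (2014). The notation is assumed for illustration: $\hat{\theta}_i$ is the direct survey estimate for area $i$, $\psi_i$ is its sampling variance (treated as known), and $x_i$ is a vector of auxiliary covariates, which in an application of this type might be low-dimensional summaries of a search-trend curve.

```latex
\begin{align*}
\hat{\theta}_i &= \theta_i + e_i, & e_i &\sim N(0, \psi_i), \\
\theta_i &= x_i^{\top}\beta + u_i, & u_i &\sim N(0, \sigma_u^2), \\
\tilde{\theta}_i &= \gamma_i \hat{\theta}_i + (1 - \gamma_i)\, x_i^{\top}\hat{\beta},
& \gamma_i &= \frac{\sigma_u^2}{\sigma_u^2 + \psi_i}.
\end{align*}
```

The predictor $\tilde{\theta}_i$ shrinks the direct estimate toward the regression prediction, with more shrinkage for areas whose direct estimates have large sampling variance. In practice $\sigma_u^2$ is replaced by an estimate.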

One of the often quoted advantages of samples relative to censuses is cost. The cost structure has changed with increased computing power and seems destined to continue to change. In the United States, the National Land Cover Database is a census of land cover (Han, Yang, Di and Mueller, 2012). Classification procedures are expected to improve so that use of such data as auxiliary data will increase. Data collection agencies will invest more in constructing improved auxiliary data files at the population level so that some data now collected on a sample basis will be collected at a population level. The same types of data development will continue for population and business statistics.

Of necessity, our discussion says little about data collection. The way in which data collection procedures have been modified with changing technology is perhaps more obvious than the link between technology and theory. For the links to theory see Bellhouse (2000). Computer-assisted data collection is the evolving standard. The use of geo-location technology can be expected to increase. It is safe to forecast the increased use of remote sensing and remote data collection devices. For example, it would be easy to incorporate physical data collected by something like the Apple Watch or Fitbit into a health study. Larger and less attractive monitoring devices are currently in use in physical activity surveys (van Remoortel, Giavedoni, Raste, Burtin, Louvaris, Gimeno-Santos, Langer, Glendenning, Hopkinson, Vogiatzis, Peterson, Wilson, Mann, Rabinovich, Puhan, Troosters and PROactive consortium, 2012).

Recent experience is that phone and personal interview data collection is becoming more and more difficult. Respondents are facing expanded organized data collection activities. The ubiquitous questionnaire on satisfaction for everything from medical services to toothpaste surely must affect an individual’s willingness to respond. It seems reasonable to forecast increased difficulty in obtaining cooperation for traditional methods of data collection. Associated with that trend will be increased study of the nature of non-respondents and of non-response. Likewise, efforts will be made to adapt data collection to the changing methods of communication.

Nonprobability samples have been a part of survey activity throughout the post-Neyman period. In particular, quota sampling is commonly used in marketing research and other areas for cost reasons (Sudman, 1966; 1976). Moser and Stuart (1953) and Stephan and McCarthy (1958) made early comparisons between quota sampling and probability sampling. Cochran (1977, page 136) says “The quota method seems likely to produce samples that are biased on characteristics such as income, education and occupation, although it often agrees with the probability samples on questions of opinion and attitude”. Use of procedures such as post-stratification and regression estimation in nonprobability samples has kept pace with their use in probability samples. The changing nature of human communication offers opportunities for both model-based and probability-based procedures. Because of cost structures, new methods such as web-based procedures will often be used first in nonprobability settings and for nongovernmental purposes.
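To make the post-stratification adjustment concrete, the following is a minimal sketch in Python. The function name, the cell labels, and the toy data are all hypothetical, and the population counts are assumed to be known from an external source such as a census. The adjustment removes bias only to the extent that, within each cell, the outcome is unrelated to the mechanism that brought units into the sample.

```python
from collections import defaultdict


def poststratified_mean(sample, population_counts):
    """Post-stratified estimate of a population mean.

    sample: iterable of (cell, y) pairs from the (possibly nonprobability) sample.
    population_counts: dict mapping each cell to its known population count N_h.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cell, y in sample:
        sums[cell] += y
        counts[cell] += 1

    # Every population cell needs at least one sampled unit;
    # otherwise the cell mean is undefined and cells must be collapsed.
    empty = [h for h in population_counts if counts[h] == 0]
    if empty:
        raise ValueError(f"no sample units in cells: {empty}")

    total_n = sum(population_counts.values())
    # Weight each sample cell mean by its known population share N_h / N.
    return sum(
        population_counts[h] * sums[h] / counts[h] for h in population_counts
    ) / total_n


# Hypothetical toy data: the sample over-represents the "young" cell.
toy_sample = [
    ("young", 1.0), ("young", 1.0), ("young", 0.0), ("young", 1.0),
    ("old", 0.0), ("old", 1.0),
]
toy_population = {"young": 400, "old": 600}

unadjusted = sum(y for _, y in toy_sample) / len(toy_sample)
adjusted = poststratified_mean(toy_sample, toy_population)
print(f"unadjusted mean: {unadjusted:.3f}, post-stratified mean: {adjusted:.3f}")
```

In the toy data the cell means are 0.75 and 0.5, so reweighting to the assumed population shares of 40% and 60% pulls the estimate down from the unadjusted sample mean.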

As matching procedures improve and as demand for detailed data increases, disclosure limitation procedures and associated research will receive increased attention.

Survey sampling is an application discipline, functioning in the current social, geographic, cultural, and technological world. To forecast how our field will be affected by social and cultural changes, even in the short run, is a challenge. Will the fact that one must assume that almost all of one’s public activity and a great deal of one’s private activity has the potential of being recorded lead to a more relaxed attitude in responding to questions? Will improved monitoring devices make respondents more willing to permit their physical activities to be monitored? Or will all of the incidental monitoring lead to a reaction against organized data collection? Will increased availability of results based on collected data have a positive or negative effect on data collection efforts? What is the impact of the various social media?

This discussion makes clear that factors external to our discipline will determine our future activities. We will be required to adapt in data collection, data processing, and data presentation and dissemination.

Acknowledgements

We thank Graham Kalton for comments and suggestions that led to improvements in the original draft. We thank the four discussants, Graham Kalton, Sharon Lohr, Danny Pfeffermann and Chris Skinner, for their supplements on the history, insightful observations on the present, and comments on the future of survey sampling. We elected not to prepare a rejoinder because we found much to appreciate and little basis for disagreement.

References

AAPOR Big Data Task Force (2015). AAPOR Report on Big Data. https://www.aapor.org/AAPOR_Main/media/MainSiteFiles/images/BigDataTaskForceReport_FINAL_2_12_15_b.pdf.

Anderson, R., Kasper, J. and Frankel, M. (1979). Total Survey Error: Applications to Improve Health Surveys. San Francisco, CA: Jossey-Bass.

Andridge, R.H., and Little, R.J. (2009). The use of sample weights in hot deck imputation. Journal of Official Statistics, 25, 21-36.

Antal, E., and Tillé, Y. (2011). A direct bootstrap method for complex sampling designs from a finite population. Journal of the American Statistical Association, 106, 534-543.

Battese, G.E., Harter, R.M. and Fuller, W.A. (1988). An error component model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83, 28-36.

Beaumont, J.-F., and Patak, Z. (2012). On the generalized bootstrap for sample surveys with special attention to Poisson sampling. International Statistical Review/Revue Internationale de Statistique, 80, 127-148.

Bellhouse, D.R. (1988). Systematic sampling. In Handbook of Statistics, (Eds., P.R. Krishnaiah and C.R. Rao), Elsevier, 6, 125-145.

Bellhouse, D.R. (2000). Survey sampling theory over the twentieth century and its relation to computing technology. Survey Methodology, 26, 1, 11-20. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2000001/article/5174-eng.pdf.

Bethlehem, J. (2009). The rise of survey sampling. Discussion Paper (09015), Statistics Netherlands, The Hague.

Bickel, P.J., and Freedman, D.A. (1984). Asymptotic normality and the bootstrap in stratified sampling. The Annals of Statistics, 12, 470-482.

Biemer, P.P., and Lyberg, L. (2003). Introduction to Survey Quality. New York: John Wiley & Sons, Inc.

Binder, D.A., and Roberts, G.A. (2003). Design-based and model-based methods for estimating model parameters. In Analysis of Survey Data, (Eds., R.L. Chambers and C.J. Skinner), Wiley, Chichester, UK, 29-48.

Bowley, A.L. (1926). Measurement of the precision attained in sampling. Bulletin of the International Statistical Institute, 22(1), 1-62.

Bowley, A.L. (1936). The application of sampling to economic and sociological problems. Journal of the American Statistical Association, 31, 474-480.

Brewer, K.R.W. (1963). Ratio estimation and finite populations: Some results deducible from the assumption of an underlying stochastic process. Australian Journal of Statistics, 5, 93-105.

Brewer, K.R.W. (2013). Three controversies in the history of survey sampling. Survey Methodology, 39, 2, 249-262. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2013002/article/11883-eng.pdf.

Brick, M.J. (2011). The future of survey sampling. Public Opinion Quarterly, 75, 872-888.

Brick, M.J., and Montaquila, J.M. (2009). Non response and weights. In Handbook of Statistics, (Eds., D. Pfeffermann and C.R. Rao), Elsevier, Amsterdam, 29A, 163-185.

Cassel, C.M., Särndal, C.-E. and Wretman, J.H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika, 63, 615-620.

Citro, C.F. (2014). From multiple modes for surveys to multiple data sources for estimates. Survey Methodology, 40, 2, 137-161. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2014002/article/14128-eng.pdf.

Cochran, W.G. (1953). Sampling Techniques. New York: John Wiley & Sons, Inc.

Cochran, W.G. (1977). Sampling Techniques, 3rd Edition. New York: John Wiley & Sons, Inc.

Dalenius, T. (1974). Ends and Means of Total Survey Design. Report in “Errors in Surveys”, Stockholm University.

Deming, W.E. (1944). On errors in surveys. American Sociological Review, 9, 359-369.

Deming, W.E. (1950). Some Theory of Sampling. New York: John Wiley & Sons, Inc.

Deming, W.E. (1953). On a probability mechanism to attain an economic balance between the resultant error of non-response and the bias of non-response. Journal of the American Statistical Association, 48, 743-772.

Deming, W.E., and Stephan, F.F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals of Mathematical Statistics, 11, 4, 427-444.

Deville, J.-C., and Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376-382.

Deville, J.-C., and Tillé, Y. (2004). Efficient balanced sampling: The cube method. Biometrika, 91, 893-912.

Dippo, C.S., Fay, R.E. and Morgenstein, D.H. (1984). Computing variances from complex samples with replicate weights. Proceedings of the American Statistical Association, Section on Survey Research Methods, 489-494.

DuMouchel, W.H., and Duncan, G.J. (1983). Using sample survey weights in multiple regression analysis of stratified samples. Journal of the American Statistical Association, 78, 535-543.

Durbin, J. (1958). Sampling theory for estimates based on fewer individuals than the number selected. Bulletin of the International Statistical Institute, 36, 113-119.

Durbin, J. (1959). A note on the application of Quenouille’s method of bias reduction to the estimation of ratios. Biometrika, 46, 477-480.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26.

Fay, R.E., and Herriot, R.A. (1979). Estimates of income for small places. An application of James-Stein procedures to census data. Journal of the American Statistical Association, 74, 366, 269-277.

Francisco, C.A., and Fuller, W.A. (1991). Estimation of quantiles with survey data. Annals of Statistics, 19, 454-469.

Fuller, W.A. (1975). Regression analysis for sample survey. Sankhyā, Series C, 37, 117-132.

Fuller, W.A. (1984). Least squares and related analyses for complex survey designs. Survey Methodology, 10, 1, 97-118. Paper available at http://www.statcan.gc.ca/pub/12-001-x/1984001/article/14352-eng.pdf.

Fuller, W.A. (2009a). Some design properties of a rejective sampling procedure. Biometrika, 96, 1-12.

Fuller, W.A. (2009b). Sampling Statistics. New York: John Wiley & Sons, Inc.

Fuller, W.A., and An, A.B. (1998). Regression adjustments for nonresponse. Journal of Indian Society of Agricultural Statistics, 51, 331-342.

Godambe, V.P. (1955). A unified theory of sampling from finite populations. Journal of the Royal Statistical Society, Series B, 17, 269-278.

Godambe, V.P. (1966). A new approach to sampling from finite populations. Journal of the Royal Statistical Society, Series B, 28, 310-328.

Gonzalez, M.E. (1973). Use and evaluation of synthetic estimates. Proceedings of the Social Statistics Section of the American Statistical Association, 33-36.

Groves, R.M. (1989). Survey Errors and Survey Costs. New York: John Wiley & Sons, Inc.

Groves, R.M. (2011). Three eras of survey research. Public Opinion Quarterly, 75, 861-871.

Groves, R.M., and Heeringa, S.G. (2006). Responsive designs for household surveys: Tools for actively controlling survey errors and costs. Journal of the Royal Statistical Society, Series A, 169, 439-457.

Groves, R.M., and Lyberg, L. (2010). Total survey error: Past, present and future. Public Opinion Quarterly, 74, 849-879.

Hájek, J. (1960). Limiting distributions in simple random sampling from a finite population. Publications of the Mathematical Institute of the Hungarian Academy, 5, 361-374.

Hájek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Annals of Mathematical Statistics, 35, 1491-1523.

Han, W., Yang, Z., Di, L. and Mueller, R. (2012). CropScape: A Web service based application for exploring and disseminating US conterminous geospatial cropland data products for decision support. Computers and Electronics in Agriculture, 84, 111-123.

Hansen, M.H., and Hurwitz, W.N. (1943). On the theory of sampling from finite populations. Annals of Mathematical Statistics, 14, 333-362.

Hansen, M.H., and Hurwitz, W.N. (1946). The problem of non-response in sample surveys. Journal of the American Statistical Association, 41, 517-529.

Hansen, M.H., Hurwitz, W.N. and Madow, W.G. (1953). Sample Survey Methods and Theory, Vols. I and II, New York: John Wiley & Sons, Inc.

Hansen, M.H., Madow, W.G. and Tepping, B.J. (1983). An evaluation of model-dependent and probability-sampling inferences in sample surveys. Journal of the American Statistical Association, 78, 776-793.

Hansen, M.H., Hurwitz, W.N., Marks, E.S. and Mauldin, W.P. (1951). Response errors in surveys. Journal of the American Statistical Association, 46, 147-190.

Hansen, M.H., Hurwitz, W.N., Nisselson, H. and Steinberg, J. (1955). The redesign of the census current population survey. Journal of the American Statistical Association, 50, 701-719.

Hartley, H.O. (1959). Analytical studies of survey data. In Volume in Honor of Corrado Gini, Instituto di Statistica, Rome, 1-32.

Hartley, H.O., and Rao, J.N.K. (1968). A new estimation theory for sample surveys. Biometrika, 55, 547-557.

Hidiroglou, M.A., Fuller, W.A. and Hickman, R.D. (1976). SUPER CARP. Statistical Laboratory, Survey Section, Iowa State University, Ames, IA.

Horvitz, D.G., and Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663-685.

Huang, E.T., and Fuller, W.A. (1978). Nonnegative regression estimation for sample survey data. Proceedings of the Social Statistics Section, American Statistical Association, 300-305.

Jagers, P. (1986). Post-stratification against bias in sampling. International Statistical Review/Revue Internationale de Statistique, 54, 159-167.

Kalton, G. (1983). Compensating for Missing Survey Data. Survey Research Center, University of Michigan, Ann Arbor, Michigan.

Kalton, G., and Kish, L. (1984). Some efficient random imputation methods. Communications in Statistics, A13, 1919-1939.

Kiaer, A. (1897). The representative method of statistical surveys (1976 English translation of the original Norwegian), Oslo, Central Bureau of Statistics of Norway.

Kim, J.K., and Fuller, W.A. (2004). Fractional hot deck imputation. Biometrika, 91, 559-578.

Kim, J.K., and Shao, J. (2013). Statistical Methods for Handling Incomplete Data. CRC Press, Boca Raton, FL.

Kish, L. (1995). The hundred years’ wars of survey sampling. Statistics in Transition, 2, 813-830.

Kish, L., and Frankel, M.R. (1970). Balanced repeated replications for standard errors. Journal of the American Statistical Association, 65, 1071-1094.

Kish, L., and Frankel, M.R. (1974). Inference from complex samples (with discussion). Journal of the Royal Statistical Society, Series B, 36, 1-37.

Korn, E.L., and Graubard, B.I. (1995). Analysis of large health surveys: Accounting for the sampling designs. Journal of the Royal Statistical Society, Series A, 158, 263-295.

Kreuter, F. (2013). Improving Surveys with Paradata. Hoboken: Wiley.

Krewski, D., and Rao, J.N.K. (1981). Inference from stratified samples: Properties of the linearization, jackknife, and balanced repeated replication methods. Annals of Statistics, 9, 1010-1019.

Little, R.J.A. (1982). Models for nonresponse in sample surveys. Journal of the American Statistical Association, 77, 237-250.

Little, R.J.A., and Rubin, D.B. (1987, 2014). Statistical Analysis with Missing Data. New York: John Wiley & Sons, Inc., (Second Edition 2014).

Madow, W.G. (1948). On the limiting distribution of estimates based on samples from finite universes. Annals of Mathematical Statistics, 19, 535-545.

Madow, W.G., and Madow, L.H. (1944). On the theory of systematic sampling, I. The Annals of Mathematical Statistics, 15, 1-24.

Madow, W.G., Nisselson, H. and Olkin, I. (Eds.) (1983). Incomplete Data in Sample Surveys, 1, 2, 3, New York: Academic Press.

Mahalanobis, P.C. (1939). A sample survey of the acreage under jute in Bengal. Sankhyā, 4, 511-531.

Mahalanobis, P.C. (1944). On large-scale sample surveys. Philosophical Transactions of the Royal Society of London, Series B, 231, 329-451.

Mahalanobis, P.C. (1946). Recent experiments in statistical sampling in the Indian Statistical Institute. Journal of the Royal Statistical Society, 109, 325-378.

McCarthy, P.J. (1966). Replication: An approach to the analysis of data from complex surveys. Vital and Health Statistics, Series 2, No. 14, National Center for Health Statistics, Public Health Service, Washington, DC.

McCarthy, P.J. (1969). Pseudoreplication: Further evaluation and application of the balanced half-sample technique. Vital and Health Statistics, Series 2, No. 31, National Center for Health Statistics, Public Health Service, Washington, DC.

McCarthy, P.J. (1969). Pseudoreplication: Half-samples. International Statistical Review/Revue Internationale de Statistique, 37, 239-264.

McCarthy, P.J., and Snowden, L.B. (1985). The bootstrap and finite population sampling. Vital Health Statistics, 2-95, Public Health Service Publication, 85-1369, U.S. Government Printing Office, Washington, D.C.

Narain, R.D. (1951). On sampling without replacement with varying probabilities. Journal of the Indian Society of Agricultural Statistics, 3, 169-174.

Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97, 558-625.

Owen, A.B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237-249.

Pfeffermann, D. (2015). Methodological issues and challenges in the production of official statistics. Journal of Survey Statistics and Methodology, 3, 425-483.

Pfeffermann, D., and Sverchkov, M. (1999). Parametric and semiparametric estimation of regression models fitted to survey data. Sankhyā, Series B, 61, 166-186.

Porter, A.T., Holan, S.H., Wikle, C.K. and Cressie, N. (2014). Spatial Fay-Herriot model for small area estimation with functional covariates. Spatial Statistics, 10, 27-42.

Purcell, N., and Kish, L. (1979). Estimation for small domains. Biometrics, 35, 365-384.

Rao, J.N.K. (2003). Small Area Estimation. Hoboken: Wiley.

Rao, J.N.K. (2005). Interplay between sample survey theory and practice: An appraisal. Survey Methodology, 31, 2, 117-138. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2005002/article/9040-eng.pdf.

Rao, J.N.K., and Molina, I. (2015). Small Area Estimation: Second Edition. Hoboken: Wiley.

Rao, J.N.K., and Scott, A.J. (1981). The analysis of categorical data from complex sample surveys: Chi-squared tests for goodness of fit and independence in two-way tables. Journal of the American Statistical Association, 76, 221-230.

Rao, J.N.K., and Scott, A.J. (1984). On chi-squared tests for multiway contingency tables with cell proportions estimated from survey data. Annals of Statistics, 12, 46-60.

Rao, J.N.K., and Shao, J. (1999). Modified balanced repeated replication for complex survey data. Biometrika, 86, 403-415.

Rao, J.N.K., and Wu, C.F.J. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83, 231-241.

Royall, R.M. (1968). An old approach to finite population sampling. Journal of the American Statistical Association, 63, 1269-1279.

Royall, R.M. (1970). On finite population sampling theory under certain linear regression models. Biometrika, 57, 377-387.

Rubin, D.B. (1974). Characterizing the estimation of parameters in incomplete data problems. Journal of the American Statistical Association, 69, 467-474.

Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581-590.

Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons, Inc.

Schaible, W.L. (Ed.) (1996). Indirect Estimators in U.S. Federal Programs. New York: Springer.

Sitter, R.R. (1992). A resampling procedure for complex survey data. Journal of the American Statistical Association, 87, 755-765.

Skinner, C.J. (1994). Sample models and weights. In Proceedings of the Survey Research Methods Section, American Statistical Association, 133-142.

Steinberg, J. (Ed.) (1979). Synthetic Estimates for Small Areas: Statistical Workshop Papers and Discussion. NIDA Research Monograph No. 24. U.S. Government Printing Office, Washington, D.C., U.S.A.

Stephan, F., and McCarthy, P.J. (1958). Sampling Opinions. New York: John Wiley & Sons, Inc.

Sudman, S. (1966). Probability sampling with quotas. Journal of the American Statistical Association, 61, 749-791.

Sudman, S. (1976). Applied Sampling. New York: Academic Press.

Sukhatme, P.V. (1954). Sampling Theory of Surveys with Applications. Iowa State College Press, Ames.

Tam, S.-M., and Clarke, F. (2015). Big data, official statistics and some initiatives by the Australian Bureau of Statistics. International Statistical Review/Revue Internationale de Statistique, 83, 436-448.

Tchuprow, A.A. (1923). On the mathematical expectation of the moments of frequency distributions in the case of correlated observations. Metron, 2, 461-493, 646-683.

Thomsen, I. (1973). A note on the efficiency of weighting subclass means to reduce the effects of non-response when analyzing survey data. Statistisk Tidskrift, 11, 278-283.

van Remoortel, H., Giavedoni, S., Raste, Y., Burtin, C., Louvaris, Z., Gimeno-Santos, E., Langer, D., Glendenning, A., Hopkinson, N.S., Vogiatzis, I., Peterson, B.T., Wilson, F., Mann, B., Rabinovich, R., Puhan, M.A., Troosters, T. and PROactive consortium (2012). Validity of activity monitors in health and chronic disease: A systematic review. International Journal of Behavioral Nutrition and Physical Activity, 9.

Wolter, K.M. (2007). Introduction to Variance Estimation. New York: Springer-Verlag.

Woodruff, R.S. (1952). Confidence intervals for medians and other position measures. Journal of the American Statistical Association, 47, 635-636.

Yates, F. (1948). Systematic sampling. Philosophical Transactions of the Royal Society of London, Series A, 241, 345-377.

Yates, F. (1949). Sampling Methods for Censuses and Surveys. Griffin, London.

