Sample survey theory and methods: Past, present, and future directions
Section 4. The future

We can project a number of current situations into the future. Budgets will be tight and requests for products will expand. There will be demand for forecasts, and for improved access by users. There will be requests for statistics to be produced more rapidly and, naturally, with no compromise in quality. There will be pressure to bring estimates from different sources into agreement.

We expect faster computing to influence all aspects of the field. More complex edit and imputation algorithms will be developed. The time from collection to publication will be shortened. More complex analyses will be performed on survey data. Record linkage procedures will be improved. Data will be made available in different forms. Searchable databases where the user provides queries will become more common. The use of auxiliary data of all kinds, and in particular administrative data, will increase. Administrative data will be used both as auxiliary data and as the direct estimates for certain items. Citro (2014) gives examples of items where administrative data can be used to replace answers to questions in a questionnaire. Uses of auxiliary data where matching to collected data is imperfect will be a research area.

Modern communication methods and social media have resulted in vast quantities of data, much of it generated for short-term and poorly identified purposes. The term “Big Data” is not well defined, but most would agree that social media data are a part of Big Data. The AAPOR report on Big Data (2015) is an excellent analysis of the potential and the challenges associated with Big Data. Tam and Clarke (2015) and Pfeffermann (2015) discuss the issues from the perspective of a governmental statistical organization. As part of modern society, social media are of interest to social scientists in their own right. Therefore, indexes and summaries of these data are, and will be, produced. An example is the University of Michigan Social Media Job Loss Index. Sampling has a large role to play in the creation of products from these data.

A challenge is transforming some types of Big Data into a form useful as auxiliary data. One example is Porter, Holan, Wikle and Cressie (2014), who used Google Trends data on Spanish words as functional covariates in small area models to estimate state proportions of people speaking Spanish, with American Community Survey estimates as the dependent variables.

One of the often quoted advantages of samples relative to censuses is cost. The cost structure has changed with increased computing power and seems destined to continue to change. In the United States, the National Land Cover Database is a census of land cover (Han, Yang, Di and Mueller, 2012). Classification procedures are expected to improve so that use of such data as auxiliary data will increase. Data collection agencies will invest more in constructing improved auxiliary data files at the population level so that some data now collected on a sample basis will be collected at a population level. The same types of data development will continue for population and business statistics.

Of necessity, our discussion has little on collection. The way in which data collection procedures have been modified with changing technology is perhaps more obvious than the link between technology and theory. For the links to theory see Bellhouse (2000). Computer-assisted data collection is the evolving standard. The use of geo-location technology can be expected to increase. It is safe to forecast the increased use of remote sensing and remote data collection devices. For example, it would be easy to incorporate physical data collected by something like the Apple Watch or Fitbit into a health study. Larger and less attractive monitoring devices are currently in use in physical activity surveys (van Remoortel, Giavedoni, Raste, Burtin, Louvaris, Gimeno-Santos, Langer, Glendenning, Hopkinson, Vogiatzis, Peterson, Wilson, Mann, Rabinovich, Puhan, Troosters and PROactive consortium, 2012).

The recent experience is that phone and personal interview data collection is becoming more and more difficult. Respondents are facing expanded organized data collection activities. The ubiquitous questionnaire on satisfaction for everything from medical services to toothpaste surely must impact an individual’s willingness to respond. It seems reasonable to forecast increased difficulty in obtaining cooperation for traditional methods of data collection. Associated with that trend will be increased study of the nature of non-respondents and of non-response. Likewise, efforts will be made to adapt data collection to the changing methods of communication.

Nonprobability samples have been a part of survey activity throughout the post-Neyman period. In particular, quota sampling is commonly used in marketing research and other areas for cost reasons (Sudman, 1966; 1976). Moser and Stuart (1953) and Stephan and McCarthy (1958) made early comparisons between quota sampling and probability sampling. Cochran (1977, page 136) says “The quota method seems likely to produce samples that are biased on characteristics such as income, education and occupation, although it often agrees with the probability samples on questions of opinion and attitude”. Use of procedures such as poststratification and regression estimation in nonprobability samples has kept pace with their use in probability samples. The changing nature of human communication offers opportunities for both model-based and probability-based procedures. Because of cost structures, new methods such as web-based procedures will often be used first in nonprobability settings and for nongovernmental purposes.
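
As an illustration of the kind of adjustment mentioned above, the following minimal sketch computes a poststratified mean from a sample of any design, given known population counts for each cell. The function name, the cell labels, and the numbers are hypothetical and are not taken from any of the cited studies. The adjustment can be expected to reduce bias only to the extent that sampled and unsampled units within a cell are similar on the study variable, which is a modeling assumption in the nonprobability setting.

```python
from collections import Counter


def poststratified_mean(sample, population_counts):
    """Return the poststratified estimate of the population mean of y.

    sample: list of (cell, y) pairs; the sample need not be a probability sample.
    population_counts: dict mapping each cell to its known population count N_h.
    """
    n_h = Counter(cell for cell, _ in sample)  # sample size in each cell
    sum_h = Counter()                          # sample total of y in each cell
    for cell, y in sample:
        sum_h[cell] += y

    empty = set(population_counts) - set(n_h)
    if empty:
        # A cell with no sampled units has no cell mean; cells would need collapsing.
        raise ValueError(f"no sample units in cells: {sorted(empty)}")

    total_n = sum(population_counts.values())
    # Weight each cell's sample mean by the cell's known population share.
    return sum(
        population_counts[h] * sum_h[h] / n_h[h] for h in population_counts
    ) / total_n


# Hypothetical quota-type sample that over-represents the "young" cell.
sample = [("young", 1), ("young", 1), ("young", 1), ("young", 0),
          ("old", 0), ("old", 1)]
population_counts = {"young": 400, "old": 600}

unadjusted = sum(y for _, y in sample) / len(sample)
adjusted = poststratified_mean(sample, population_counts)
```

In this hypothetical example the unadjusted mean is 4/6, while the poststratified estimate is (400 × 0.75 + 600 × 0.50)/1000 = 0.60, because the over-represented cell is weighted down to its population share.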

As matching procedures improve and as demand for detailed data increases, disclosure limitation procedures and associated research will receive increased attention.

Survey sampling is an application discipline, functioning in the current social, geographic, cultural, and technological world. To forecast how our field will be impacted by social and cultural changes, even in the short run, is a challenge. Will the fact that one must assume that almost all of one’s public activity and a great deal of one’s private activity has the potential of being recorded lead to a more relaxed attitude in responding to questions? Will improved monitoring devices make respondents more willing to permit their physical activities to be monitored? Or will all of the incidental monitoring lead to a reaction against organized data collection? Will increased availability of results based on collected data have a positive or negative effect on data collection efforts? What will be the impact of the various social media?

This discussion makes clear that factors external to our discipline will determine our future activities. We will be required to adapt in data collection, data processing, and data presentation-dissemination.


We thank Graham Kalton for comments and suggestions that led to improvements in the original draft. We thank the four discussants, Graham Kalton, Sharon Lohr, Danny Pfeffermann and Chris Skinner, for their supplements on the history, insightful observations on the present, and comments on the future of survey sampling. We elected not to prepare a rejoinder because we found much to appreciate and little basis for disagreement.


