1 Introduction

Barry Schouten, Melania Calinescu and Annemieke Luiten


In most surveys, all sample units receive the same treatment, and the same design features apply to all selected people and households. When auxiliary information is available from registry data or interviewer observations, survey designs may be tailored to optimize response rates, to reduce nonresponse selectivity or, more generally, to improve quality. Although a general terminology is lacking in the literature, such designs are usually referred to as adaptive survey designs.

With this paper, we aim to describe the basic ingredients of adaptive survey designs, to systematize these designs by providing a mathematical framework, to illustrate their potential to improve the efficiency of survey data collection, and to promote their use in survey practice.

Adaptive survey designs assume that different people or households may receive different treatments. These treatments are defined before the survey starts, but their allocation may also be updated using data observed during data collection. In other words, the allocation of treatments is based on data that are linked to the survey sample and on paradata. Paradata are data about the survey data collection process, e.g., interviewer observations about the neighborhood, the dwelling or the respondents, or the performance of the interviewers themselves. In this paper, paradata are used in the widest sense, as data that are observed during data collection and that are informative about the response behavior of sampled people and households.
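To fix ideas, the sketch below shows what such an allocation rule might look like in practice. It is a minimal illustration in our own notation: the strata, treatments and paradata fields are hypothetical and are not taken from the paper.

```python
# Minimal sketch of treatment allocation from linked frame data and paradata
# (hypothetical strata, treatments and fields; not the authors' actual design).

def allocate_treatment(unit):
    """Assign a follow-up treatment to a sample unit, combining frame data
    linked before data collection with paradata observed during it."""
    # Linked frame data, e.g., from a population register.
    age = unit["age"]
    household_size = unit["household_size"]
    # Paradata, e.g., the number of unsuccessful contact attempts so far.
    contact_attempts = unit["contact_attempts"]

    if contact_attempts >= 3:
        return "face-to-face follow-up"  # hard-to-contact units get interviewers
    if age < 35 and household_size <= 2:
        return "web"                     # cheap mode for likely web respondents
    return "telephone"

# Example: a unit that was hard to reach through cheaper modes.
sample_unit = {"age": 52, "household_size": 4, "contact_attempts": 3}
print(allocate_treatment(sample_unit))   # -> face-to-face follow-up
```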

A general introduction to adaptive survey designs is given by Wagner (2008). Adaptive survey designs find their origin in the medical statistics literature, where treatments are varied beforehand over patient groups but may also depend on the responses of patients, i.e., on measurements made during data collection. See, for example, Heyd and Carlin (1999), Murphy (2003) and Zajonc (2012).

A special case of an adaptive survey design is the responsive survey design. Responsive survey designs were introduced by Groves and Heeringa (2006). Like general adaptive survey designs, responsive survey designs may apply differential design features to sample units. The main distinction, however, is that responsive survey designs identify promising and effective treatments or design features during data collection. To do so, the data collection is divided into multiple design phases. A new phase employs the outcomes of randomized contrasts between sample units in previous phases to distinguish effective from ineffective treatments and to identify the costs associated with the treatments. Randomized contrasts are differences in response rates between subpopulations for randomly assigned design features; see, for example, Mohl and Laflamme (2007), Laflamme and Karaganis (2010), Phillips and Tabuchi (2009) and Peytchev, Riley, Rosen, Murphy and Lindblad (2010). The allocation of design features must be done in such a way that each phase reaches its phase capacity, i.e., the optimal trade-off between quality and costs. Responsive designs are motivated by survey settings where little is known about the sample beforehand and/or little information about the effectiveness of treatments is available from historic data. In these settings, multiple phases are needed and responsive designs are practical. From the second design phase onwards, however, the starting point of a responsive design is similar to survey settings where substantial prior information about sample units is available or where a survey is repeated many times; the only distinction is that in previous design phases part of the sample has already responded. In this paper, it is assumed that historic data are available, that effective treatments have been identified beforehand, and that it is specified which linked data and paradata will be used to adapt the design.
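To make the notion of a randomized contrast concrete, one illustrative formalization (the notation here is ours, not taken from the references above) writes it as the difference in observed response rates between two randomly assigned design features within a subpopulation:

$$\hat{\Delta}_g = \hat{\rho}_g(s_1) - \hat{\rho}_g(s_2),$$

where $\hat{\rho}_g(s)$ denotes the response rate in subpopulation $g$ under design feature $s$. A sizeable $\hat{\Delta}_g$, weighed against the cost difference between $s_1$ and $s_2$, marks $s_1$ as the more effective treatment for that subpopulation in subsequent phases.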

What is new in this paper? We make three contributions. First, we set up a general mathematical framework for optimizing response quality given cost constraints. Second, we explicitly allocate different design features to different sample units within this framework. Third, we propose to optimize quality indicators for nonresponse error. The last two contributions are by themselves not completely new. Simple adaptive survey designs are already applied; e.g., in the Dutch Labour Force Survey, larger households are not interviewed by web or telephone, and proxy reporting is only allowed by a member of the household core. Attempts to optimize survey design accounting for nonresponse error go at least as far back as Hartley and Monroe (1979). And there is a vast literature on optimizing the timing and number of contact calls in interviewer surveys, e.g., Kulka and Weeks (1988), Greenberg and Stokes (1990) and Kalsbeek, Botman, Massey and Liu (1994). What is new is the assembly of all the pieces into a general mathematical framework that abstracts from single design features and that allows general quality indicators to be applied. The main motivations for advancing such a framework are the strong pressure on survey costs and the rise of web as a survey mode. Web has a strong quality-cost differential; it is cheap but has low response rates and has different measurement properties than interviewer modes. As such, web challenges the trade-off between quality and costs. Although the survey literature has devoted considerable attention to trade-offs between the various survey errors, e.g., Lyberg, Biemer, Collins, de Leeuw, Dippo, Schwarz and Trewin (1997) and Dillman (2007), in survey practice there are still surprisingly few cases in which differential design features are investigated and implemented. With this paper, we hope to provide a stepping stone for future research and discussion into adaptive survey designs.
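Schematically, and anticipating the framework of Section 2, the type of optimization problem we have in mind can be stated as follows; the notation here is only indicative:

$$\max_{\phi} \; Q(\phi) \quad \text{subject to} \quad C(\phi) \le B,$$

where $\phi$ denotes an allocation of design features to sample units, $Q(\phi)$ a quality indicator for nonresponse error (for instance, a response rate or an indicator of nonresponse selectivity), $C(\phi)$ the expected data collection cost, and $B$ the available budget.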

In Section 2, we describe the theory and concepts behind adaptive survey designs. In Section 3, we present an example based on virtual survey data, and in Section 4, we discuss a simulation study based on real survey data. Finally, in Section 5, we end with a summary and discussion.

