5. Concluding Remarks

Brady T. West and Michael R. Elliott

This paper has considered frequentist and Bayesian methods for comparing the interviewer variance components for non-normally distributed survey items between two independent groups of survey interviewers. The methods are based on a flexible class of hierarchical generalized linear models (HGLMs) that allow the variance components for two mutually exclusive groups of interviewers to vary, and alternative inferential approaches based on those models. Results from a simulation study suggest that the two approaches have little empirical bias, comparable empirical MSE values and good coverage for moderate-to-large samples of interviewers and respondents. Analyses of real data from the U.S. National Survey of Family Growth (NSFG) suggest that inferences based on the two approaches tend to be quite similar. We find the similar performance of these two approaches to be good news for survey researchers, in that frequentists and Bayesians alike have tools available to them for analyzing this problem that will lead to similar conclusions.

There are some subtle distinctions between the two approaches that emerged in the analyses, mainly related to sample sizes and estimates of variance components that are extremely small or equal to zero. These issues warrant further discussion, given their implications for survey practice. The Bayesian approach illustrated here is capable of accommodating uncertainty in the estimation of variance components when forming credible sets and does not rely on asymptotic theory, but we found that inferences about differences in variance components between a number of different subgroups of NSFG interviewers (each of moderate size) did not vary from those that would be made using frequentist approaches. Whether or not we would see the same results for even smaller groups of interviewers requires future investigation; the simulation study presented in Section 3 suggested that neither method performs well in a context where two groups of 20 interviewers collect data from 10 respondents each. An initial application of these two methods to data from the first quarter of data collection in this cycle of the NSFG (with about 20 interviewers in each of two groups interviewing about 20 respondents each on average) yielded findings similar to those reported here for larger samples, with some evidence of the Bayesian approach being more conservative (West 2011).

In general, the Bayesian approach provides a more natural form of inference for this problem, indicating a range of values in which approximately 95% of the differences in variance components will fall. This may appeal to certain consumers of a given survey's products, as opposed to the simple p-value for a likelihood ratio test, which does not give users a sense of the range of possible differences. In the frequentist setting, the likelihood ratio test may be the only method of inference available if the pseudo maximum likelihood point estimate for one or more of the variance components is zero, with no corresponding standard error (preventing computation of Wald-type intervals). This situation was observed in both the simulations and the NSFG analyses, especially for groups with smaller samples of interviewers; given the reliance of likelihood ratio tests on asymptotic theory, the Bayesian approach may be a better choice for smaller samples. The performance of the Bayesian approach is not ideal, however, for very small samples, as illustrated in the simulation study in Section 3.
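To make this contrast concrete, a 95% credible interval for the difference in variance components can be read directly off the posterior draws. The following Python sketch is purely illustrative and not part of the original analyses: the gamma-distributed draws stand in for actual MCMC output, such as the draws of sigma.b0[,1]^2 and sigma.b0[,2]^2 produced by the Appendix code.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior draws of the interviewer variance components
# for two groups; in practice these would be MCMC draws of the squared
# sigma.b0 parameters from the Appendix code, not simulated values.
sigma2_g1 = rng.gamma(shape=2.0, scale=0.05, size=4000)
sigma2_g2 = rng.gamma(shape=2.0, scale=0.02, size=4000)

# Posterior draws of the difference in variance components.
vardiff = sigma2_g1 - sigma2_g2

# Posterior median and 95% equal-tailed credible interval: a range in
# which approximately 95% of the posterior mass for the difference falls.
lower, median, upper = np.percentile(vardiff, [2.5, 50.0, 97.5])
print(f"median difference = {median:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

An interval excluding zero would indicate a credible difference between the two groups, while also conveying the plausible magnitude of that difference, which a bare p-value does not.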

We noted two significant differences between subgroups of interviewers in the NSFG data, and in each of these cases, the group with the smaller variance had an estimated variance component set to zero (with no standard error computed) when using the frequentist approach. The resulting inferences based on these estimates (where likelihood values were computed using the estimates of zero for the subgroups in question when performing the likelihood ratio tests) agreed with the Bayesian approach. We remind readers using frequentist methods that small samples of interviewers or extremely small amounts of variance among interviewers for particular variables may lead to negative maximum likelihood estimates of variance components, which can be problematic for the interpretation of interviewer variance for individual groups. Some software procedures capable of fitting multilevel models (e.g., the gllamm procedure in Stata, or the lmer() function in R) constrain variance components to be greater than zero during estimation to prevent this problem, which can increase estimation times. Other software procedures (like GLIMMIX in SAS) will simply fix these negative estimates to be zero, and fail to compute an estimated standard error. While these variance components technically cannot be equal to zero, we suggest interpreting these findings as evidence that there is negligible variance among the interviewers in a particular group. Bates (2009) argues against the use of standard errors for making inferences about variance components in the frequentist setting, especially when variance components are close to zero, instead suggesting that the profiled deviance function should be used to visualize the precision of the estimates. Both this approach and the Wald approach to computing confidence intervals will still be limited by smaller samples.
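The homogeneity test underlying these comparisons reduces to a likelihood ratio statistic computed from two fitted models. The short Python sketch below (with hypothetical log-likelihood values, not output from the NSFG fits) shows the computation; it uses the identity that the upper tail of a 1-df chi-square at x equals erfc(sqrt(x/2)), which avoids any dependence beyond the standard library.

```python
import math

# Hypothetical maximized log-likelihoods from two fits: a model allowing
# separate interviewer variance components per group (full) and a model
# with a single common component (reduced), as produced, for example, by
# the COVTEST HOMOGENEITY statement in PROC GLIMMIX.
loglik_full = -1234.6
loglik_reduced = -1237.8

# Likelihood ratio statistic; equality of two variance components is an
# interior (non-boundary) hypothesis, so the reference distribution is
# chi-square with 1 degree of freedom.
lr_stat = 2.0 * (loglik_full - loglik_reduced)

# Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x/2)).
p_value = math.erfc(math.sqrt(lr_stat / 2.0))
print(f"LR statistic = {lr_stat:.2f}, p-value = {p_value:.4f}")
```

Note that this test remains valid even when one of the group-specific variance component estimates has been fixed to zero, since the likelihood can still be evaluated at that estimate.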

We do not see an empirical problem with using these zero estimates to perform the likelihood ratio tests demonstrated here for comparing groups of interviewers, given that Bayesian draws of the variance components in these groups would also be very small. However, in the case of estimating interviewer variance for single groups, examination of the sensitivity of Bayesian inferences to choices of different prior distributions for the variance components should be performed when variance components close to zero are expected, or the number of interviewers is relatively small (Browne and Draper 2006; Lambert et al. 2005). Furthermore, if survey researchers are interested in predicting random interviewer effects in the case where interviewer variance components are expected to be close to zero, both frequentist and Bayesian methods perform very poorly, and prediction is not recommended in this case (Singh et al. 1998, p. 390). See Savalei and Kolenikov (2008) for more discussion of the zero variance issue.
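The poor performance of random-effect prediction when variance components are near zero can be seen in a simplified linear-model analogue (an illustration under assumed values, not the HGLM setting of this paper): the best linear unbiased predictor shrinks each interviewer's observed mean deviation by the factor sigma_u^2 / (sigma_u^2 + sigma_e^2 / n), which collapses toward zero with the variance component.

```python
# Simplified linear-model illustration of why predicted random interviewer
# effects are uninformative when the interviewer variance component is
# near zero: the BLUP multiplies an interviewer's observed mean deviation
# by sigma2_u / (sigma2_u + sigma2_e / n).
def shrinkage_factor(sigma2_u, sigma2_e, n):
    """Shrinkage applied to an interviewer's observed mean deviation,
    where n is the number of respondents per interviewer."""
    return sigma2_u / (sigma2_u + sigma2_e / n)

sigma2_e = 1.0   # residual variance (assumed value)
n = 20           # respondents per interviewer (assumed value)

for sigma2_u in [0.5, 0.1, 0.01, 0.001]:
    f = shrinkage_factor(sigma2_u, sigma2_e, n)
    print(f"sigma2_u = {sigma2_u:>5}: predicted effect = {f:.3f} * observed deviation")
```

As the variance component shrinks, the predicted effects are pulled almost entirely to zero regardless of the observed data, consistent with the recommendation against prediction in this case.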

This study was certainly not without limitations. We acknowledge that the design of the NSFG, where interviewers are typically assigned to work in a single primary sampling area, did not allow for interpenetrated assignment of sampled cases to interviewers. As a result, disentangling interviewer effects from effects of the primary sampling areas is difficult. The methodologies illustrated in this paper can easily incorporate additional interviewer- or area-level covariates in an effort to "explain" variance among interviewers or areas due to observable covariates. The question of how to estimate interviewer variance in the presence of a strictly non-interpenetrated sample design needs more research in general, and we did not address this open question in this paper. As mentioned in Section 1, interpenetrated sample designs have been used in recent studies to disentangle interviewer and area effects. Future studies should examine the ability of the two approaches reviewed in this paper to detect differences in interviewer variance components when using cross-classified multilevel models that also include the effects of areas in an interpenetrated sample design.

On a similar note, we did not account for any of the complex sampling features of the NSFG (i.e., weighting or stratified cluster sampling) in the analyses. The theory that underlies the estimation of parameters in multilevel models in the presence of survey weights calls for weights for both the respondents and the higher-level clusters, which in this case would be interviewers (Rabe-Hesketh and Skrondal 2006; Pfeffermann, Skinner, Holmes, Goldstein and Rasbash 1998). The analyses presented here effectively assume that we have a sample of interviewers from some larger population that was selected with equal probability, and that all respondents within each interviewer had equal weight. Methods outlined by Gabler and Lahiri (2009) might prove useful for addressing this limitation, and analysts could also include fixed effects of survey weights or stratification codes in the models proposed here. We leave these extensions for future research.

Finally, this paper also did not consider another rich aspect of the Bayesian approach, in that posterior draws of the 87 random interviewer effects in the models were also generated by the BUGS Gibbs sampling algorithm. These draws would enable survey managers to make inferences about the effects specific interviewers are having on particular survey measures. Consistent and regular updating of these posterior distributions as data collection progresses would enable survey managers to intervene when the posterior distributions for particular interviewers suggest that these interviewers are having non-zero effects on the survey measures.
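One simple rule for such monitoring would flag any interviewer whose 95% credible interval excludes zero. The Python sketch below illustrates the idea on simulated draws; in practice, the matrix of effects would be the centered posterior draws of b0 from the BUGS output rather than simulated values, and both the number of draws and the flagging rule are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical posterior draws for 87 interviewer effects: a matrix of
# MCMC draws (rows) by interviewers (columns). In practice these would
# be the b0 draws from BUGS, centered at the fitted fixed effects.
n_draws, n_interviewers = 4000, 87
effects = rng.normal(0.0, 0.1, size=(n_draws, n_interviewers))
effects[:, 5] += 0.5  # one interviewer with a genuinely non-zero effect

# Flag interviewers whose 95% credible interval excludes zero.
lower = np.percentile(effects, 2.5, axis=0)
upper = np.percentile(effects, 97.5, axis=0)
flagged = np.where((lower > 0) | (upper < 0))[0]
print("Interviewers flagged for review:", flagged)
```

Recomputing these intervals as data collection progresses would give survey managers a running list of interviewers whose effects on key survey measures merit intervention.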

Acknowledgements

The authors are grateful for support from a contract with the National Center for Health Statistics that enabled the seventh cycle of the National Survey of Family Growth (contract 200-2000-07001).

Appendix

A.1 Example Code

We provide example code for fitting the types of models discussed in the paper using SAS PROC GLIMMIX below. In this code, PARITY and SEXMAIN are the count and binary variables, respectively, measured on NSFG respondents, FINAL_INT_ID is a final interviewer ID code, and INT_NVMARRIED is an indicator variable for whether or not an interviewer has never been married. The ASYCOV option will print asymptotic estimates of the variances and covariances of the estimated variance components.

/* marital status */
proc glimmix data = bayes.final_analysis asycov;
   class final_int_id int_nvmarried;
   model parity = int_nvmarried / dist = negbin link = log solution cl;
   random int / subject = final_int_id group = int_nvmarried;
   covtest homogeneity / cl (type = plr);
   nloptions tech = nrridg;
run;

proc glimmix data = bayes.final_analysis asycov;
   class final_int_id int_nvmarried;
   model sexmain (event = "1") = int_nvmarried / dist = binary link = logit solution cl;
   random int / subject = final_int_id group = int_nvmarried;
   covtest homogeneity / cl (type = plr);
   nloptions tech = nrridg;
run;

We also provide example WinBUGS code, called from the R software, for fitting the models using the Bayesian approach discussed in the paper. In this code, LOWAGE.G is an interviewer-level indicator (with 87 values) for being in the younger interviewer age group, and HIGHAGE.G is an indicator for being in the older group. The full code, including code creating the variables used below, is available from the authors upon request.

# load necessary packages for using BUGS from R
library(arm)
library(R2WinBUGS)

############# Parity Analyses

# BUGS file for Age Group and Parity (age_nb.bug)
model {
   for (i in 1:n){
      parity[i] ~ dpois(lambda[i])
      lambda[i] <- rho[i]*mu[i]
      log(mu[i]) <- b0[intid[i]]
      rho[i] ~ dgamma(alpha,alpha)
   }
   for (j in 1:J){
      b0[j] ~ dnorm(b0.hat[j], tau.b0[highage.g[j]+1])
      b0.hat[j] <- beta0 + beta1*lowage.g[j]
   }
   beta0 ~ dnorm(0,0.01)
   beta1 ~ dnorm(0,0.01)
   alpha <- exp(logalpha)
   logalpha ~ dnorm(0,0.01)
   for (k in 1:2){
      tau.b0[k] <- pow(sigma.b0[k], -2)
      sigma.b0[k] ~ dunif(0,10)
   }
}

# Simulations for Parity/Age Group model in BUGS
n <- length(parity)
J <- 87
age.data <- list("n", "J", "parity", "intid", "highage.g", "lowage.g")
age.inits <- function(){
   list(b0=rnorm(J), beta0=rnorm(1), beta1=rnorm(1), sigma.b0=runif(2), logalpha=rnorm(1))}
age.parameters <- c("b0", "beta0", "beta1", "sigma.b0", "alpha")
age.1 <- bugs(age.data, age.inits, age.parameters, "age_nb.bug", n.chains=3, n.iter=5000, debug=TRUE, bugs.directory="C:/Users/bwest/Desktop/winbugs14/WinBUGS14")
attach.bugs(age.1)

# for tables of results and inference
resultsmat <- cbind(numeric(6),numeric(6),numeric(6),numeric(6))
resultsmat[1,1] <- quantile(beta0,0.5)
resultsmat[1,2] <- sd(beta0)
resultsmat[1,3] <- quantile(beta0,0.025)
resultsmat[1,4] <- quantile(beta0,0.975)
resultsmat[2,1] <- quantile(beta1,0.5)
resultsmat[2,2] <- sd(beta1)
resultsmat[2,3] <- quantile(beta1,0.025)
resultsmat[2,4] <- quantile(beta1,0.975)
resultsmat[3,1] <- quantile(sigma.b0[,1]^2,0.5)
resultsmat[3,2] <- sd(sigma.b0[,1]^2)
resultsmat[3,3] <- quantile(sigma.b0[,1]^2,0.025)
resultsmat[3,4] <- quantile(sigma.b0[,1]^2,0.975)
resultsmat[4,1] <- quantile(sigma.b0[,2]^2,0.5)
resultsmat[4,2] <- sd(sigma.b0[,2]^2)
resultsmat[4,3] <- quantile(sigma.b0[,2]^2,0.025)
resultsmat[4,4] <- quantile(sigma.b0[,2]^2,0.975)
resultsmat[5,1] <- quantile(1/alpha,0.5)
resultsmat[5,2] <- sd(1/alpha)
resultsmat[5,3] <- quantile(1/alpha,0.025)
resultsmat[5,4] <- quantile(1/alpha,0.975)
vardiff <- sigma.b0[,1]^2 - sigma.b0[,2]^2
resultsmat[6,1] <- quantile(vardiff,0.5)
resultsmat[6,2] <- sd(vardiff)
resultsmat[6,3] <- quantile(vardiff,0.025)
resultsmat[6,4] <- quantile(vardiff,0.975)
resultsmat

############# Current Sexual Activity Analyses

# BUGS file for Age Group and Sexual Activity (age_bin.bug)
model {
   for (i in 1:n){
      sexmain[i] ~ dbern(p[i])
      logit(p[i]) <- b0[intid[i]]
   }
   for (j in 1:J){
      b0[j] ~ dnorm(b0.hat[j], tau.b0[highage.g[j]+1])
      b0.hat[j] <- beta0 + beta1*lowage.g[j]
   }
   beta0 ~ dnorm(0,0.01)
   beta1 ~ dnorm(0,0.01)
   for (k in 1:2){
      tau.b0[k] <- pow(sigma.b0[k], -2)
      sigma.b0[k] ~ dunif(0,10)
   }
}

# Simulations for Sexual Activity/Age Group model in BUGS
n <- length(sexmain)
J <- 87
age.data <- list("n", "J", "sexmain", "intid", "highage.g", "lowage.g")
age.inits <- function(){
   list(b0=rnorm(J), beta0=rnorm(1), beta1=rnorm(1), sigma.b0=runif(2))}
age.parameters <- c("b0", "beta0", "beta1", "sigma.b0")
age.1 <- bugs(age.data, age.inits, age.parameters, "age_bin.bug", n.chains=3, n.iter=5000, debug=TRUE, bugs.directory="C:/Users/bwest/Desktop/winbugs14/WinBUGS14")
attach.bugs(age.1)

References

Bates, D. (2009). Assessing the precision of estimates of variance components. Presentation to the Max Planck Institute for Ornithology, Seewiesen, July 21, 2009. Presentation can be downloaded from http://lme4.r-forge.r-project.org/slides/2009-07-21-Seewiesen/4PrecisionD.pdf.

Biemer, P.P. and Trewin, D. (1997). A review of measurement error effects on the analysis of survey data. Chapter 27 of Survey Measurement and Process Quality, Editors Lyberg, Biemer, Collins, de Leeuw, Dippo, Schwarz and Trewin. Wiley-Interscience, 603‑632.

Binder, D.A. (1983). On the variances of asymptotically normal estimators from complex surveys. International Statistical Review, 51, 279‑292.

Browne, W.J. and Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Analysis, 1(3), 473‑514.

BUGS, http://www.mrc-bsu.cam.ac.uk/bugs/welcome.html.

Carlin, B.P. and Louis, T.A. (2009). Bayesian Methods for Data Analysis. Chapman and Hall / CRC Press.

Chaloner, K. (1987). A Bayesian approach to the estimation of variance components for the unbalanced one-way random model. Technometrics, 29(3), 323‑337.

Collins, M. and Butcher, B. (1982). Interviewer and clustering effects in an attitude survey. Journal of the Market Research Society, 25, 39‑58.

Durham, C.A., Pardoe, I. and Vega, E. (2004). A methodology for evaluating how product characteristics impact choice in retail settings with many zero observations: An application to restaurant wine purchase. Journal of Agricultural and Resource Economics, 29(1), 112‑131.

Durrant, G.B., Groves, R.M., Staetsky, L. and Steele, F. (2010). Effects of interviewer attitudes and behaviors on refusal in household surveys. Public Opinion Quarterly, 74, 1‑36.

Faraway, J.J. (2006). Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Chapman and Hall / CRC Press: Boca Raton, FL.

Farrell, P.J. (2000). Bayesian inference for small area proportions. Sankhya: The Indian Journal of Statistics, Series B (1960‑2002), 62(3), 402‑416.

Fowler, F.J. and Mangione, T.W. (1990). Standardized Survey Interviewing: Minimizing Interviewer-Related Error. Newbury Park, CA: Sage.

Gabler, S. and Lahiri, P. (2009). On the definition and interpretation of interviewer variability for a complex sampling design. Survey Methodology, 35(1), 85‑99.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515‑533.

Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2004). Bayesian Data Analysis. Chapman and Hall / CRC Press.

Gelman, A. and Hill, J. (2007). Data Analysis using Regression and Multilevel / Hierarchical Models. Cambridge University Press.

Gelman, A. and Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457‑511.

Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337‑348.

Goldstein, H. (1995). Multilevel Statistical Models, Second Edition. Kendall’s Library of Statistics 3, Edward Arnold.

Groves, R.M. (2004). Chapter 8: The Interviewer as a Source of Survey Measurement Error. In Survey Errors and Survey Costs (2nd Edition). Wiley-Interscience.

Groves, R.M., Mosher, W.D., Lepkowski, J.M. and Kirgis, N.G. (2009). Planning and development of the continuous National Survey of Family Growth. National Center for Health Statistics. Vital and Health Statistics, 1(48).

Hilbe, J.M. (2007). Negative Binomial Regression. Cambridge University Press.

Hox, J. (1998). Multilevel modeling: When and why. In I. Balderjahn, R. Mathar and M. Schader (Eds.), Classification, Data Analysis, and Data Highways. New York: Springer-Verlag, 147‑154.

Kish, L. (1962). Studies of interviewer variance for attitudinal variables. Journal of the American Statistical Association, 57, 92‑115.

Lambert, P.C., Sutton, A.J., Burton, P.R., Abrams, K.R. and Jones, D.R. (2005). How vague is vague? A simulation study of the impact of the use of vague prior distributions in MCMC using WinBUGS. Statistics in Medicine, 24(15), 2401‑2428.

Lepkowski, J.M., Mosher, W.D., Davis, K.E., Groves, R.M. and Van Hoewyk, J. (2010). The 2006‑2010 National Survey of Family Growth: sample design and analysis of a continuous survey. National Center for Health Statistics, Vital and Health Statistics, 2(150), June 2010.

Lynn, P., Kaminska, O. and Goldstein, H. (2011). Panel attrition: how important is it to keep the same interviewer? ISER Working Paper Series, Paper 2011‑02.

Mahalanobis, P.C. (1946). Recent experiments in statistical sampling in the Indian Statistical Institute. Journal of the Royal Statistical Society, 109, 325‑378.

Mangione, T.W., Fowler, F.J. and Louis, T.A. (1992). Question characteristics and interviewer effects. Journal of Official Statistics, 8(3), 293‑307.

Molenberghs, G. and Verbeke, G. (2005). Models for Discrete Longitudinal Data. Springer-Verlag, Berlin.

O’Muircheartaigh, C. and Campanelli, P. (1998). The relative impact of interviewer effects and sample design effects on survey precision. Journal of the Royal Statistical Society, Series A, 161 (1), 63‑77.

O’Muircheartaigh, C. and Campanelli, P. (1999). A multilevel exploration of the role of interviewers in survey non-response. Journal of the Royal Statistical Society, Series A, 162(3), 437‑446.

Pfeffermann, D., Skinner, C.J., Holmes, D.J., Goldstein, H. and Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society, Series B, 60(1), 23‑40.

Pinheiro, J.C. and Chao, E.C. (2006). Efficient Laplacian and adaptive Gaussian quadrature algorithms for multilevel generalized linear mixed models. Journal of Computational and Graphical Statistics, 15, 58‑81.

Rabe-Hesketh, S. and Skrondal, A. (2006). Multilevel modeling of complex survey data. Journal of the Royal Statistical Society, Series A, 169, 805‑827.

Raudenbush, S.W. and Bryk, A.S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods. Sage Publications, Newbury Park, CA.

Rodriguez, G. and Goldman, N. (2001). Improved estimation procedures for multilevel models with binary response: a case-study. Journal of the Royal Statistical Society, Series A, 164(2), 339‑355.

SAS Institute, Inc. (2010). Online Documentation for the GLIMMIX Procedure.

Savalei, V. and Kolenikov, S. (2008). Constrained versus unconstrained estimation in structural equation modeling. Psychological Methods, 13(2), 150‑170.

Schaeffer, N.C., Dykema, J. and Maynard, D.W. (2010). Interviewers and interviewing. In Handbook of Survey Research, Second Edition, J.D. Wright and P.V. Marsden (Eds.). Bingley, U.K.: Emerald Group Publishing Limited.

Schnell, R. and Kreuter, F. (2005). Separating interviewer and sampling-point effects. Journal of Official Statistics, 21(3), 389‑410.

Schober, M. and Conrad, F. (1997). Does conversational interviewing reduce survey measurement error? Public Opinion Quarterly, 61, 576‑602.

Singh, A.C., Stukel, D.M. and Pfeffermann, D. (1998). Bayesian versus frequentist measures of error in small area estimation. Journal of the Royal Statistical Society, Series B, 60(2), 377‑396.

Ugarte, M.D., Goicoa, T. and Militino, A.F. (2009). Empirical Bayes and fully Bayes procedures to detect high-risk areas in disease mapping. Computational Statistics and Data Analysis, 53, 2938‑2949.

Van Tassell, C.P. and Van Vleck, L.D. (1996). Multiple-trait Gibbs sampler for animal models: Flexible programs for Bayesian and likelihood-based (co)variance component inference. Journal of Animal Science, 74, 2586‑2597.

Viechtbauer, W. (2007). Confidence intervals for the amount of heterogeneity in meta-analysis. Statistics in Medicine, 26, 37‑52.

West, B.T. (2011). Bayesian analysis of between-group differences in variance components in hierarchical generalized linear models. In JSM Proceedings, Survey Research Methods Section. Alexandria, VA: American Statistical Association, 1828‑1842.

West, B.T. and Galecki, A.T. (2011). An overview of current software procedures for fitting linear mixed models. The American Statistician, 65(4), 274‑282.

West, B.T., Kreuter, F. and Jaenichen, U. (2013). “Interviewer” effects in face-to-face surveys: A function of sampling, measurement error, or nonresponse? Journal of Official Statistics, 29(2), 277‑297.

West, B.T. and Olson, K. (2010). How much of interviewer variance is really nonresponse error variance? Public Opinion Quarterly, 74(5), 1004‑1026.

Zhang, D. and Lin, X. (2010). Variance component testing in generalized linear mixed models for longitudinal / clustered data and other related topics. Random Effect and Latent Variable Model Selection. Springer Lecture Notes in Statistics, Volume 192.
