5 Conclusions and future research
Iván A. Carrillo and Alan F. Karr
We have proposed a novel
approach to combining different cohorts of a longitudinal survey. The major
requirement of our method is that there is a cross-sectional survey weight for
each wave, or that one can be built from available information. This weight
should allow for statistical inference to the population of interest at the
corresponding wave. In that case, our method should perform better than usual
estimation procedures (where the auto-correlation is not incorporated) in many
practical situations, in particular when there is a high auto-correlation among
responses from the same subject.
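As a rough illustration of how wave-specific cross-sectional weights and a working auto-correlation can enter a single estimating equation, the following minimal sketch solves a weighted generalized estimating equation for a linear marginal model. The particular form of the estimating function (weights entering as W^{1/2} R^{-1} W^{1/2}), the AR(1) working correlation, the function names and the toy data are all assumptions made for this illustration; they are not taken from the estimator defined earlier in the paper.

```python
import numpy as np

def ar1_corr(waves, rho):
    """AR(1) working correlation for the waves at which one subject is observed."""
    waves = np.asarray(waves, dtype=float)
    return rho ** np.abs(waves[:, None] - waves[None, :])

def weighted_gee_linear(subjects, rho):
    """Solve sum_i X_i' W_i^{1/2} R_i^{-1} W_i^{1/2} (y_i - X_i beta) = 0.

    Each subject is a dict with the observed 'waves', the design matrix 'X',
    the responses 'y', and the cross-sectional weights 'w' (one per wave).
    This form of the estimating function is an assumption of the sketch.
    """
    p = np.asarray(subjects[0]["X"], dtype=float).shape[1]
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for s in subjects:
        X = np.asarray(s["X"], dtype=float)
        y = np.asarray(s["y"], dtype=float)
        sqrt_w = np.sqrt(np.asarray(s["w"], dtype=float))
        R_inv = np.linalg.inv(ar1_corr(s["waves"], rho))
        # M = W^{1/2} R^{-1} W^{1/2}, built without forming diagonal matrices
        M = sqrt_w[:, None] * R_inv * sqrt_w[None, :]
        lhs += X.T @ M @ X
        rhs += X.T @ M @ y
    return np.linalg.solve(lhs, rhs)

# Hypothetical toy data: three cohorts entering at waves 0, 1 and 2.
# Responses follow y = 2 + 3 * wave exactly, so beta should be (2, 3).
subjects = [
    {"waves": [0, 1, 2], "X": [[1, 0], [1, 1], [1, 2]], "y": [2, 5, 8], "w": [10, 12, 11]},
    {"waves": [1, 2], "X": [[1, 1], [1, 2]], "y": [5, 8], "w": [20, 18]},
    {"waves": [2], "X": [[1, 2]], "y": [8], "w": [30]},
]
beta = weighted_gee_linear(subjects, rho=0.6)
print(beta)
```

With `rho=0` the working correlation is the identity and the sketch reduces to ordinary weighted least squares on the pooled data, which corresponds to an estimation procedure that ignores the auto-correlation.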
Survey practitioners generally avoid using multiple survey weights whenever
possible. In the case of rotating panels, however, this approach is appealing
for at least two reasons. First, it allows all the available data to be used in
a clear and cohesive way in a single analysis procedure. Second, we have shown
how readily available cross-sectional survey weights can be used directly for
longitudinal analysis, without the need to develop, store, and distribute one
or more additional longitudinal weights.
Our method is directly
applicable to any kind of longitudinal survey as long as there are
cross-sectional survey weights available (or these can be created) at each
wave, and these weights represent the population of interest at the particular
wave.
The theory that we developed for the variance of the proposed estimator uses
the (cross-sectional) design weights, which are the inverse of the inclusion
probabilities. For the application to our salary model in the SDR, however, we
used the final (cross-sectional) survey weights, which are not the original
design weights but design weights adjusted in the usual way. This mismatch
requires further exploration.
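The distinction between design weights and final weights can be made concrete with a small numerical sketch. The inclusion probabilities, responses, post-strata and population counts below are hypothetical, and post-stratification is used only as one example of a usual weight adjustment; the actual SDR weighting involves further steps that are not reproduced here.

```python
import numpy as np

# Hypothetical sample at a single wave (all values are illustrative).
pi = np.array([0.10, 0.10, 0.25, 0.25, 0.50])   # inclusion probabilities
y = np.array([50.0, 60.0, 70.0, 80.0, 90.0])    # responses
post_stratum = np.array([0, 0, 1, 1, 1])        # post-stratum of each unit
known_counts = np.array([25.0, 12.0])           # assumed known population counts

# Design weights: the inverse of the inclusion probabilities.
design_w = 1.0 / pi

# Horvitz-Thompson estimate of the population total based on the design weights.
ht_total = np.sum(design_w * y)

# Post-stratification: rescale the design weights within each post-stratum so
# that the weighted counts match the known population counts.
est_counts = np.bincount(post_stratum, weights=design_w)
final_w = design_w * known_counts[post_stratum] / est_counts[post_stratum]

print(design_w)   # weights used in the variance theory
print(final_w)    # adjusted weights of the kind used in practice
```

After the adjustment the final weights are no longer inverse inclusion probabilities, which is the source of the mismatch between the theory and the application.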
Similarly, in our derivations of the variance we assumed that the cohorts were
independent. The SDR does not fully satisfy this assumption, for two reasons.
First, at any particular wave, the sample from the old cohorts is not selected
independently across cohorts: in order to reduce the number of strata, since
1991 the NSF has collapsed strata over year of degree receipt for the old
cohorts. Second, the post-stratification adjustments made to the design weights
do not condition on cohort either and, as a result, weights are shared across
cohorts. This sample selection scheme and weighting adjustment procedure violate
the independence across cohorts. Some additional calculations (included in the
Appendix) have shown that independence among cohorts is not a crucial
requirement for our variance estimation method to produce good approximations,
as explained in Section 3.3.1. In future research we plan to evaluate the
impact of this issue in more detail.
Acknowledgements
This research was
supported by NSF grant SRS-1019244 to the National Institute of Statistical
Sciences (NISS). Any opinions, findings, and conclusions or recommendations
expressed in this publication are those of the authors and do not necessarily
reflect the views of the National Science Foundation. The authors thank Paul
Biemer of RTI International, Stephen Cohen and Nirmala Kannankutty of the
National Center for Science and Engineering Statistics at NSF, and Criselda
Toto, formerly of NISS, for numerous insightful discussions during the
research. We are also grateful to the Associate Editor and two referees for
their useful suggestions.
Appendix - Proofs
To develop an expression for we first simplify Let for then we have:
where and let The two sums in are model-independent, and (in are two model-independent terms, and A and B
both have model-expectation zero; therefore, equation (3.9) follows.
We now develop the expression for the design variance of the estimating
function; we redefine and then
where, for line (A.1), we assume that the
(three) cohorts are design-independent. Now, where is, for a column vector a diagonal matrix with diagonal entries being
the elements of and
Similarly we can get and where and
Now, let us concentrate on letting we have:
Let us do each of the terms in (A.2) in turn,
beginning with we have:
where then
where
and this implies that
For we have:
where then,
where and line (A.3) is because, conditional on is constant and therefore the variance of that
component is zero. This means that:
We can, similarly, show that:
With similar calculations, we obtain the
corresponding expressions for and
Finally, we
sketch the development of an expression for without assuming independence among cohorts.
First, notice that can be written as:
letting and can be expanded as:
In this last expression, the first thing we
notice is that all the diagonal elements in all the covariance
terms are exactly equal to zero; this means that whether or not the cohorts are
independent of one another, expression (3.13) is exact for the variance terms.
To analyze the importance of the covariance terms, we concentrate on the term
in line (A.4); the conclusion for the other terms is the same; note that this
term can be written as:
Property 3.1 states that if the cohorts
are design-independent, all the covariance terms are exactly equal to zero. In
addition to that, from this last expression we conclude, trivially, that if the
waves are design-independent, all the covariance terms are equal to zero
too. This formula for the term in line (A.4) also implies that if the
individual weights do not vary greatly between consecutive waves, and there is
a high overlap between consecutive waves, the covariance terms are not too
large. Finally, if the overlap is small, it is reasonable to assume
design-independence between the waves, and then the covariance terms can be
safely approximated by zero.
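The role of the covariance terms discussed above can be summarized by the generic decomposition of the design variance of a sum. The notation here is hypothetical and introduced only for this illustration: \hat{U}_c denotes the contribution of cohort c to the estimating function and C the number of cohorts.

```latex
\operatorname{Var}_p\!\Big(\sum_{c=1}^{C} \hat{U}_c\Big)
  = \sum_{c=1}^{C} \operatorname{Var}_p\big(\hat{U}_c\big)
  + \sum_{c \neq c'} \operatorname{Cov}_p\big(\hat{U}_c, \hat{U}_{c'}\big)
```

When the cohorts are design-independent every covariance in the second sum vanishes and only the cohort-specific variance terms remain. Without independence, the approximation is good whenever the covariances are small relative to the variance terms.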