1 Introduction

Iván A. Carrillo and Alan F. Karr


The Survey of Doctorate Recipients (SDR) is a National Science Foundation (NSF) longitudinal survey whose design incorporates features of both repeated panels and rotating panels. The purpose of the survey is to study U.S. doctorate recipients in science, engineering, and health fields. It is conducted approximately every two years. A detailed description of the SDR can be found in NSF (2012). In this paper we restrict our attention to the data collected from 1995 through 2008 (seven waves).

At each wave a new cohort is selected. The new cohort consists of a sample of recent graduates (from the previous two years) selected from the Doctorate Records File, which is a database constructed mainly from the Survey of Earned Doctorates (http://www.nsf.gov/statistics/srvydoctorates/). Selected individuals remain in the sample, i.e., are interviewed every two years, until the age of 75, provided they are living in the U.S. during the survey reference week and are not institutionalized. However, not all sampled graduates satisfying these conditions are retained indefinitely. Some individuals, rather than entire cohorts, are dropped from the sample in order to a) make room for the new graduates in the new cohorts and b) maintain a relatively constant sample size across waves. In Section 2.2 we describe how the individuals to be dropped are selected.

Survey weights for cross-sectional analyses of the SDR are already available, but weights for longitudinal analyses are not. Rather than requiring a new longitudinal weight for all the data, the method proposed in this paper uses the existing cross-sectional weights for longitudinal analyses without discarding any data. We concentrate on estimating the parameters of statistical models of the effect of covariates on a response of interest, but the method can also be used to estimate finite population quantities (Carrillo and Karr 2012). We focus on analysis of the SDR, but our method is applicable to any fixed-panel, fixed-panel-plus-'births', repeated-panel, rotating-panel, split-panel, or refreshment sample survey, as long as each wave has a cross-sectional weight that represents the population of interest at that wave. See Smith, Lynn and Elliot (2009), Hirano, Imbens, Ridder and Rubin (2001), and Nevo (2003) for definitions of all these types of longitudinal sample designs.
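To make the idea of reusing cross-sectional weights concrete, the following is a minimal sketch and not necessarily the estimator developed in Section 3. It assumes a linear marginal mean with one covariate and a working-independence structure, and it pools all subject-wave observations, weighting each by the cross-sectional weight of its own wave. The data values, the record layout, and the function name `pooled_weighted_fit` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: one row per (subject, wave) observation.
# Fields: subject id, wave, cross-sectional weight at that wave, covariate x, response y.
# Subjects enter and leave at different waves, as in a multi-cohort design.
records = [
    (1, 1, 12.0, 0.0, 1.1),
    (1, 2, 15.0, 1.0, 2.9),
    (2, 1, 8.0, 0.5, 2.2),
    (3, 2, 20.0, 0.0, 0.8),
    (3, 3, 22.0, 2.0, 5.1),
    (4, 3, 9.0, 1.5, 3.9),
]


def pooled_weighted_fit(rows):
    """Solve sum_{i,t} w_it * (1, x_it)' * (y_it - b0 - b1 * x_it) = 0.

    Assumption: working independence across waves, so within-subject
    correlation is ignored in the point estimate.
    """
    sw = swx = swxx = swy = swxy = 0.0
    for _subject, _wave, w, x, y in rows:
        sw += w
        swx += w * x
        swxx += w * x * x
        swy += w * y
        swxy += w * x * y
    # Determinant of the 2x2 weighted normal equations.
    det = sw * swxx - swx * swx
    if det == 0.0:
        raise ValueError("covariate has no weighted variation")
    b0 = (swxx * swy - swx * swxy) / det
    b1 = (sw * swxy - swx * swy) / det
    return b0, b1


b0, b1 = pooled_weighted_fit(records)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

Every observation contributes, including those from subjects present at only one wave, which is the sense in which no data are ignored. The sketch covers point estimation only; variance estimation would have to account for the sampling design and is not shown.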

The SDR is a hybrid of repeated-panel and rotating-panel designs. It is not purely a repeated-panel design because some subjects are removed at each wave. It is not purely a rotating-panel design because entire panels (or cohorts) are not removed, only individuals; additionally, the composition of the finite population of interest changes over time, unlike in a rotating-panel survey.

Diggle, Heagerty, Liang and Zeger (2002) and Hedeker and Gibbons (2006) point out that longitudinal studies, unlike cross-sectional studies, make it possible to separate age effects (actual change within subjects over time) from cohort effects (differences between units at the beginning of the study period).
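One way to see this separation, sketched here along the lines of the decomposition discussed in Diggle et al. (2002), is to write a simple linear model in which the baseline covariate value and the within-subject change enter separately. The notation is an assumption made for illustration and is not the notation of this paper: $Y_{ij}$ is the response of subject $i$ at wave $j$, $x_{ij}$ is the covariate (for example, age), $\beta_C$ is the cohort (cross-sectional) effect, and $\beta_L$ is the age (longitudinal) effect.

```latex
% Cross-sectional data (one wave only): only \beta_C is identifiable.
Y_{i1} = \beta_0 + \beta_C \, x_{i1} + \varepsilon_{i1}

% Longitudinal data: baseline differences and within-subject change
% enter separately, so \beta_C and \beta_L can both be estimated.
Y_{ij} = \beta_0 + \beta_C \, x_{i1} + \beta_L \, (x_{ij} - x_{i1}) + \varepsilon_{ij}
```

With a single wave, $x_{ij} - x_{i1}$ is identically zero, so a cross-sectional analysis can represent the longitudinal effect only under the strong assumption that $\beta_C = \beta_L$.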

Hedeker and Gibbons (2006) also suggest that, since longitudinal studies allow for the measurement of time-varying explanatory variables (covariates), statistical inferences about the dynamic relationship between the outcome of interest (response) and these covariates are much stronger than those based on cross-sectional studies.

When we are interested in the marginal mean of a variable, possibly conditional on some covariates, and not in measuring change, a longitudinal study is not necessary; a cross-sectional study suffices. However, even in this case, a longitudinal study tends to be more powerful, because each subject serves as his or her own control for any unmeasured characteristics (Diggle et al. 2002).

Our approach differs from the existing alternatives in the literature, which have some limitations for the analysis of such data, and in particular for application to the SDR. For example, Berger (2004a) and Berger (2004b) go into detail about the estimation of change using rotating samples, but they assume that the composition of the finite population does not change over time, which is not the case for the SDR, nor for many other large-scale surveys. Also, the methodology proposed by Berger is not easily generalizable to more than two waves. Similarly, Qualité and Tillé (2008) assume that the finite population is fixed over time. Hirano et al. (2001) and Nevo (2003) present different estimation methods for a fixed-panel design with refreshment for attrition, but they too assume that the composition of the finite population is fixed over time.

A time series approach is utilized by McLaren and Steel (2000) and Steel and McLaren (2007) to estimate change and trend with survey data. Although their approach allows for the incorporation of within-subject association in the point estimates, they do not consider covariates in their models (beyond the implicit time covariates). Also, they only discuss the estimation of change for continuous variables.

Another alternative for analyzing longitudinal data is to fix the finite population of interest, except perhaps for deaths, which could be allowed. Studies of this kind are those in which data are available for only a single cohort. For example, Vieira and Skinner (2008), Carrillo, Chen and Wu (2010), and Carrillo, Chen and Wu (2011) show some alternatives for modeling with single-cohort survey data. However, to use these kinds of analyses with multi-cohort surveys, one needs to discard some (or many) of the available data, for example the data from subjects who are not common to all waves. An example of a weighting procedure of this type can be found in Ardilly and Lavallée (2007).

Finally, the approach of Larsen, Qing, Zhou and Foulkes (2011) is appealing, in principle, because it is the way survey practitioners generally proceed. An initial weight is adjusted, among other things for calibration to known totals, in this case totals by survey wave. Nonetheless, for rotating panels this method is still in its infancy, and it is not entirely clear how some of its steps should be carried out. For example, it is not clear what the initial weight should be: a constant weight, the earliest available weight, the average of the available weights for each case, or the latest available weight. Also, in the case of dropouts, which do occur in the SDR, the authors do not clarify how to carry out a nonresponse adjustment with this method. Moreover, it is not clear why a nonresponse adjustment for dropouts at, say, wave 4 should have any influence on the observations at wave 3, as this methodology permits because there is a single weight for each subject. Additionally, the authors mention that they estimated standard errors, but they do not indicate how to take into account all the features of the sampling design, such as changes over time in the stratification and weighting adjustment classes of the SDR. Our method, on the other hand, utilizes only cross-sectional weights and variance estimation methods, which have been studied thoroughly in the literature and are readily available for the SDR.

The rest of the paper is organized as follows. In Section 2 we describe the SDR design. In Section 3 we propose a novel approach for longitudinal analysis of marginal mean models with multi-cohort surveys. In Section 4 we present the application of the methodology to the SDR. Finally, we offer a few discussion points in Section 5.
