Survey Methodology

5 Conclusions and future research

Iván A. Carrillo and Alan F. Karr

We have proposed a novel approach to combining different cohorts of a longitudinal survey. The major requirement of our method is that there is a cross-sectional survey weight for each wave, or that one can be built from available information. This weight should allow for statistical inference to the population of interest at the corresponding wave. In that case, our method should perform better than usual estimation procedures (where the auto-correlation is not incorporated) in many practical situations, in particular when there is a high auto-correlation among responses from the same subject.

In general, survey practitioners avoid as much as possible the use of multiple survey weights. However, in the case of rotating panels this is an appealing approach for at least two reasons. On the one hand, it allows for the use of all the available data in a clear and cohesive way in a single analysis procedure. On the other hand, we have shown how readily available cross-sectional survey weights can be directly used for longitudinal analysis, without the need to develop, store, and distribute an additional longitudinal weight or weights.

Our method is directly applicable to any kind of longitudinal survey as long as there are cross-sectional survey weights available (or these can be created) at each wave, and these weights represent the population of interest at the particular wave.
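As a rough illustration of how cross-sectional weights can enter a single longitudinal analysis, the sketch below evaluates and solves a weighted estimating function of the form $Ψ_{s} (β) = N^{- 1} \sum_{i \in s} B_{i} W_{i} e_{i}$ used in the Appendix. Several things here are assumptions made only for the example and are not taken from the paper: an identity link (so $μ_{i} = X_{i} β$), an exchangeable working covariance, a weight of zero at waves where a subject is not in the sample, the function names `estimating_function` and `solve_beta`, and all of the toy data.

```python
import numpy as np


def estimating_function(beta, X, y, w, V_inv, N):
    """Psi_s(beta) = (1/N) * sum_i B_i W_i e_i, with B_i = X_i' V_i^{-1}.

    Assumes an identity link, so mu_i = X_i beta and d mu_i'/d beta = X_i'.
    """
    total = np.zeros(len(beta))
    for X_i, y_i, w_i, Vinv_i in zip(X, y, w, V_inv):
        e_i = y_i - X_i @ beta      # residual vector over the waves
        W_i = np.diag(w_i)          # one cross-sectional weight per wave
        total += X_i.T @ Vinv_i @ W_i @ e_i
    return total / N


def solve_beta(X, y, w, V_inv):
    """Root of the estimating function (closed form because the mean is linear)."""
    p = X[0].shape[1]
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for X_i, y_i, w_i, Vinv_i in zip(X, y, w, V_inv):
        M_i = X_i.T @ Vinv_i @ np.diag(w_i)
        lhs += M_i @ X_i
        rhs += M_i @ y_i
    return np.linalg.solve(lhs, rhs)


# Toy data (hypothetical): 8 subjects, 3 waves, intercept plus one covariate.
rng = np.random.default_rng(0)
n_subjects, n_waves = 8, 3
beta_true = np.array([1.0, 0.5])

rho = 0.6  # assumed exchangeable working correlation
V = (1 - rho) * np.eye(n_waves) + rho * np.ones((n_waves, n_waves))
V_inv = [np.linalg.inv(V)] * n_subjects

X = [np.column_stack([np.ones(n_waves), rng.normal(size=n_waves)])
     for _ in range(n_subjects)]
y = [X_i @ beta_true for X_i in X]  # noise-free, so the root is beta_true
w = [rng.uniform(1.0, 3.0, size=n_waves) for _ in range(n_subjects)]
for w_i in w[5:]:
    w_i[0] = 0.0  # a later cohort: not in the sample at wave 1 (assumption)

N = 100.0  # assumed population size; it only rescales Psi
beta_hat = solve_beta(X, y, w, V_inv)
psi = estimating_function(beta_hat, X, y, w, V_inv, N)
```

With noise-free responses the solver should recover `beta_true` and the estimating function should vanish at `beta_hat`, whatever weights are supplied; with real data the weights and the working covariance both affect the estimate.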

For the theory that we developed on the variance of the proposed estimator, we used the (cross-sectional) design weights $w_{i j}$, which are the inverses of the inclusion probabilities. For the application to our salary model in the SDR, however, we used the final (cross-sectional) survey weights, which are not the original design weights but design weights adjusted in the usual way. This mismatch requires further exploration.

Similarly, in our derivations of the variance, we assumed that the cohorts were independent. However, the SDR does not fully satisfy this assumption, for two reasons. First, at any particular wave, the selection of the sample from the old cohorts is not performed independently across cohorts: in order to reduce the number of strata, since 1991 the NSF has collapsed strata over year of degree receipt for the old cohorts. Second, the post-stratification adjustments made to the design weights do not condition on cohort either, and as a result, weights are shared across cohorts. This sample selection scheme and weighting adjustment procedure violate the independence across cohorts. Some additional calculations (included in the Appendix) have shown that independence among cohorts is not a crucial requirement for our variance estimation method to produce good approximations, as explained in Section 3.3.1. In future research we plan to evaluate the impact of this issue in more detail.

Acknowledgements

This research was supported by NSF grant SRS-1019244 to the National Institute of Statistical Sciences (NISS). Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation. The authors thank Paul Biemer of RTI International, Stephen Cohen and Nirmala Kannankutty of the National Center for Science and Engineering Statistics at NSF, and Criselda Toto, formerly of NISS, for numerous insightful discussions during the research. We are also grateful to the Associate Editor and two referees for their useful suggestions.

Appendix - Proofs

To develop an expression for $C_{ξ},$ we first simplify $Ψ_{s} (β) {Ψ^{'}}_{U} (β) .$ Let $F_{i (k)} = B_{i} I_{i} (U) e_{i (k \dots 3)}$ for $k = 1, 2, 3,$ then we have:

$N^{2} Ψ_{s} (β) Ψ'_{U} (β) = \sum_{i \in s} B_{i} W_{i} e_{i} \sum_{i \in U} e'_{i} I_{i} (U) B'_{i} = [\sum_{i \in s} B_{i} W_{i} e_{i}] [\sum_{i \in s} F'_{i (1)} + \sum_{i \notin s} F'_{i (1)}]$
$= \sum_{i \in s} B_{i} W_{i} e_{i} \sum_{i \in s} F'_{i (1)} + \sum_{i \in s} B_{i} W_{i} e_{i} \sum_{i \notin s} F'_{i (1)}$
$= \sum_{i \in s} B_{i} W_{i} e_{i} e'_{i} B'_{i} + \sum_{i \in s} \sum_{\begin{matrix} k \in s \\ k \neq i \end{matrix}} B_{i} W_{i} e_{i} e'_{k} I_{k} (U) B'_{k} + A,$

where $A = (\sum_{i \in s} B_{i} W_{i} e_{i}) (\sum_{i \notin s} F'_{i (1)})$, and let $B = \sum_{i \in s} \sum_{\begin{matrix} k \in s \\ k \neq i \end{matrix}} B_{i} W_{i} e_{i} e'_{k} I_{k} (U) B'_{k}$. The two sums in $A$ are model-independent, $e_{i}$ and $e'_{k}$ (in $B$) are two model-independent terms, and $A$ and $B$ both have model-expectation zero; therefore, $E_{ξ} [Ψ_{s} (β) Ψ'_{U} (β)] = N^{- 2} \sum_{i \in s} B_{i} W_{i} E_{ξ} [e_{i} e'_{i}] B'_{i} = N^{- 2} \sum_{i \in s} B_{i} W_{i} Σ_{i} B'_{i} = N^{- 1} {\hat{H}}_{Σ V} (β)$; equation (3.9) follows.

We now develop the expression for ${Var}_{p} [Ψ_{s} (β_{N})],$ the design variance of the estimating function; we redefine $B_{i} = {(\partial {μ^{'}}_{i} / \partial β) |}_{β = β_{N}} V_{i}^{- 1}$ and $e_{i} = y_{i} - μ_{i} (β_{N});$ then

$\begin{array}{l} \mathrm{Var}_{p} [Ψ_{s} (β_{N})] = \mathrm{Var}_{p} (\frac{1}{N} \sum_{i \in s} B_{i} W_{i} e_{i}) \\ = \frac{1}{N^{2}} \mathrm{Var}_{p} (\sum_{i \in s_{1 (1)}} B_{i} W_{i} e_{i} + \sum_{i \in s_{2 (2)}} B_{i} W_{i} e_{i} + \sum_{i \in s_{3 (3)}} B_{i} W_{i} e_{i}) \\ = \frac{1}{N^{2}} \mathrm{Var}_{p} (\sum_{i \in s_{1 (1)}} B_{i} W_{i} e_{i}) + \frac{1}{N^{2}} \mathrm{Var}_{p} (\sum_{i \in s_{2 (2)}} B_{i} W_{i} e_{i}) + \frac{1}{N^{2}} \mathrm{Var}_{p} (\sum_{i \in s_{3 (3)}} B_{i} W_{i} e_{i}) \quad (A.1) \\ = D_{(1)} + D_{(2)} + D_{(3)}, \end{array}$

where, for line (A.1), we assume that the (three) cohorts are design-independent. Now, $N^{2} D_{(1)} = \mathrm{Var}_{p} [\sum_{i \in U_{1 (1)}} B_{i} W_{i} \mathrm{Diag} \{I_{i (1)}\} e_{i}] = \mathrm{Var}_{p} [\sum_{i \in U_{1 (1)}} B_{i} W_{i} \mathrm{Diag} \{e_{i}\} I_{i (1)}]$, where $\mathrm{Diag} \{e\}$ is, for a column vector $e$, the diagonal matrix whose diagonal entries are the elements of $e$, and $I_{i (1)} = (I_{i} (s_{1 (1)}), I_{i} (s_{2 (1)}) I_{i} (s_{1 (1)}), I_{i} (s_{3 (1)}) I_{i} (s_{2 (1)}) I_{i} (s_{1 (1)}))'$.

Similarly we can get $N^{2} D_{(2)} = \mathrm{Var}_{p} [\sum_{i \in U_{2 (2)}} B_{i} W_{i} \mathrm{Diag} \{e_{i}\} I_{i (2)}]$ and $N^{2} D_{(3)} = \mathrm{Var}_{p} [\sum_{i \in U_{3 (3)}} B_{i} W_{i} \mathrm{Diag} \{e_{i}\} I_{i (3)}]$, where $I_{i (2)} = (0, I_{i} (s_{2 (2)}), I_{i} (s_{3 (2)}) I_{i} (s_{2 (2)}))'$ and $I_{i (3)} = (0, 0, I_{i} (s_{3 (3)}))'$.

Now, let us concentrate on $D_{(1)}$; letting $C_{i} = B_{i} W_{i} \mathrm{Diag} \{e_{i}\}$, we have:

$\begin{array}{l} N^{2} D_{(1)} = \mathrm{Var}_{p} (\sum_{i \in U_{1 (1)}} C_{i} I_{i (1)}) \\ = \mathrm{Var} \{E [\sum_{i \in U_{1 (1)}} C_{i} I_{i (1)} | s_{1 (1)}]\} + E \{\mathrm{Var} [\sum_{i \in U_{1 (1)}} C_{i} I_{i (1)} | s_{1 (1)}]\} \\ = \mathrm{Var} \{E [E (\sum_{i \in U_{1 (1)}} C_{i} I_{i (1)} | s_{2 (1)}, s_{1 (1)}) | s_{1 (1)}]\} \\ + E \{\mathrm{Var} [E (\sum_{i \in U_{1 (1)}} C_{i} I_{i (1)} | s_{2 (1)}, s_{1 (1)}) | s_{1 (1)}] + E [\mathrm{Var} (\sum_{i \in U_{1 (1)}} C_{i} I_{i (1)} | s_{2 (1)}, s_{1 (1)}) | s_{1 (1)}]\} \quad (A.2) \\ = N^{2} D_{(1) 1} + N^{2} D_{(1) 2} + N^{2} D_{(1) 3} . \end{array}$

We now take each of the terms in (A.2) in turn. Beginning with $N^{2} D_{(1) 1}$, we have:

$E (\sum_{i \in U_{1 (1)}} B_{i} W_{i} \mathrm{Diag} \{e_{i}\} I_{i (1)} | s_{2 (1)}, s_{1 (1)}) = \sum_{i \in U_{1 (1)}} B_{i} W_{i} \mathrm{Diag} \{e_{i}\} I_{i (1)}^{(1)},$

where $I_{i (1)}^{(1)} = {(I_{i} (s_{1 (1)}), I_{i} (s_{2 (1)}) I_{i} (s_{1 (1)}), π_{i 3 | s_{2 (1)}} I_{i} (s_{2 (1)}) I_{i} (s_{1 (1)}))}^{'},$ then

$\begin{array}{l} E [E (\sum_{i \in U_{1 (1)}} C_{i} I_{i (1)} | s_{2 (1)}, s_{1 (1)}) | s_{1 (1)}] = \sum_{i \in U_{1 (1)}} C_{i} I_{i (1)}^{(2)} = \sum_{i \in U_{1 (1)}} B_{i} I_{i}^{(1)} (U) \mathrm{Diag} \{I_{i (1)}^{(2)}\} e_{i} \\ = \sum_{i \in U_{1 (1)}} F_{i} [\frac{I_{i} (s_{1 (1)})}{π_{i 1}}, \frac{I_{i} (s_{1 (1)})}{π_{i 1}}, \frac{I_{i} (s_{1 (1)})}{π_{i 1}}]' \\ = \sum_{i \in U_{1 (1)}} F_{i} 1_{3} \frac{I_{i} (s_{1 (1)})}{π_{i 1}} = \sum_{i \in U_{1 (1)}} w_{i 1 (1)} F_{i (1)} I_{i} (s_{1 (1)}), \end{array}$

where $I_{i (1)}^{(2)} = (I_{i} (s_{1 (1)}), π_{i 2 | s_{1 (1)}} I_{i} (s_{1 (1)}), π_{i 3 | s_{2 (1)}} π_{i 2 | s_{1 (1)}} I_{i} (s_{1 (1)}))'$, $I_{i}^{(1)} (U) = \mathrm{diag} [I_{i} (U_{1}) / π_{i 1}, I_{i} (U_{2}) / (π_{i 1} π_{i 2 | s_{1 (1)}}), I_{i} (U_{3}) / (π_{i 1} π_{i 2 | s_{1 (1)}} π_{i 3 | s_{2 (1)}})]$, $F_{i} = B_{i} I_{i} (U) \mathrm{Diag} \{e_{i}\}$, and $1_{3} = (1,1,1)'$; this implies that $N^{2} D_{(1) 1} = \mathrm{Var} [\sum_{i \in U_{1 (1)}} w_{i 1} B_{i} I_{i} (U) e_{i} I_{i} (s_{1 (1)})] = \mathrm{Var} [\sum_{i \in s_{1 (1)}} w_{i 1} F_{i (1)}]$.

For $N^{2} D_{(1) 2},$ we have:

$\begin{array}{l} E (\sum_{i \in U_{1 (1)}} C_{i} I_{i (1)} | s_{2 (1)}, s_{1 (1)}) \\ = \sum_{i \in U_{1 (1)}} B_{i} W_{i} \mathrm{Diag} \{e_{i}\} I_{i (1)}^{(1)} = \sum_{i \in U_{1 (1)}} B_{i} I_{i}^{(1)} (U) \mathrm{Diag} \{I_{i (1)}^{(3)}\} e_{i} \\ = \sum_{i \in U_{1 (1)}} B_{i} I_{i} (U) \mathrm{Diag} \{e_{i}\} [\frac{I_{i} (s_{1 (1)})}{π_{i 1}}, \frac{I_{i} (s_{2 (1)}) I_{i} (s_{1 (1)})}{π_{i 1} π_{i 2 | s_{1 (1)}}}, \frac{I_{i} (s_{2 (1)}) I_{i} (s_{1 (1)})}{π_{i 1} π_{i 2 | s_{1 (1)}}}]' \\ = \sum_{i \in s_{1 (1)}} w_{i 2} B_{i} I_{i} (U) \mathrm{Diag} \{e_{i}\} [π_{i 2 | s_{1 (1)}}, I_{i} (s_{2 (1)}), I_{i} (s_{2 (1)})]', \end{array}$

where $I_{i (1)}^{(3)} = {(I_{i} (s_{1 (1)}), I_{i} (s_{2 (1)}) I_{i} (s_{1 (1)}), π_{i 3 | s_{2 (1)}} I_{i} (s_{2 (1)}) I_{i} (s_{1 (1)}))}^{'};$ then,

$\begin{array}{l} \mathrm{Var} [E (\sum_{i \in U_{1 (1)}} C_{i} I_{i (1)} | s_{2 (1)}, s_{1 (1)}) | s_{1 (1)}] \\ = \mathrm{Var} [\sum_{i \in s_{1 (1)}} w_{i 2} B_{i} I_{i} (U) \mathrm{Diag} \{e_{i}\} I_{i (1)}^{(4)} | s_{1 (1)}] \\ = \mathrm{Var} [\sum_{i \in s_{1 (1)}} w_{i 2} B_{i} I_{i} (U) \mathrm{Diag} \{e_{i}\} [0, I_{i} (s_{2 (1)}), I_{i} (s_{2 (1)})]' | s_{1 (1)}] \\ = \mathrm{Var} [\sum_{i \in s_{1 (1)}} w_{i 2} B_{i} I_{i} (U) \mathrm{Diag} \{e_{i}\} I_{i} (s_{2 (1)}) 1_{02} | s_{1 (1)}] \\ = \mathrm{Var} [\sum_{i \in s_{2 (1)}} w_{i 2} B_{i} I_{i} (U) e_{i (2 \dots 3)} | s_{1 (1)}], \quad (A.3) \end{array}$

where $I_{i (1)}^{(4)} = {[π_{i 2 | s_{1 (1)}}, I_{i} (s_{2 (1)}), I_{i} (s_{2 (1)})]}^{'}, 1_{02} = {(0,1,1)}^{'},$ and line (A.3) is because, conditional on $s_{1 (1)}, π_{i 2 | s_{1 (1)}}$ is constant and therefore the variance of that component is zero. This means that:

$\begin{array}{l} N^{2} D_{(1) 2} = E \{\mathrm{Var} [E (\sum_{i \in U_{1 (1)}} C_{i} I_{i (1)} | s_{2 (1)}, s_{1 (1)}) | s_{1 (1)}]\} \\ = E \{\mathrm{Var} [\sum_{i \in s_{2 (1)}} w_{i 2} F_{i (2)} | s_{1 (1)}]\} \\ = \mathrm{Var} [\sum_{i \in s_{2 (1)}} w_{i 2} F_{i (2)}] - \mathrm{Var} \{E [\sum_{i \in s_{2 (1)}} w_{i 2} F_{i (2)} | s_{1 (1)}]\} \\ = \mathrm{Var} [\sum_{i \in s_{2 (1)}} w_{i 2} F_{i (2)}] - \mathrm{Var} \{E [\sum_{i \in s_{2 (1)}} w_{i 2 | s_{1 (1)}} w_{i 1} F_{i (2)} | s_{1 (1)}]\} \\ = \mathrm{Var} [\sum_{i \in s_{2 (1)}} w_{i 2} F_{i (2)}] - \mathrm{Var} \{\sum_{i \in s_{1 (1)}} w_{i 1} F_{i (2)}\} . \end{array}$
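The last steps of this derivation combine the law of total variance with the fact that the wave-2 weighted sum has conditional expectation, given $s_{1 (1)}$, equal to the wave-1 weighted sum. As a sanity check of the resulting identity, $E \{\mathrm{Var} [X_{2} | s_{1}]\} = \mathrm{Var} [X_{2}] - \mathrm{Var} [X_{1}]$, the sketch below enumerates every outcome of a small two-phase design exactly. It assumes, purely for illustration, scalar values $y_{i}$ in place of the vectors $F_{i (2)}$ and Poisson (independent Bernoulli) selection at each phase; the function name `two_phase_moments` and the toy numbers are hypothetical and not from the paper.

```python
from itertools import product


def two_phase_moments(y, p1, p2):
    """Exact Var(X1), Var(X2) and E[Var(X2 | s1)] under two-phase Poisson sampling.

    X1 = sum over s1 of y_i / p1_i            (wave-1 weighted total)
    X2 = sum over s2 of y_i / (p1_i * p2_i)   (wave-2 weighted total, s2 within s1)
    """
    n = len(y)
    e_x1 = e_x1_sq = e_x2 = e_x2_sq = e_cond_var = 0.0
    for a in product((0, 1), repeat=n):          # a = phase-1 indicators
        prob_a = 1.0
        for a_i, p_i in zip(a, p1):
            prob_a *= p_i if a_i else 1.0 - p_i
        x1 = sum(a_i * y_i / p_i for a_i, y_i, p_i in zip(a, y, p1))

        m1 = m2 = 0.0  # conditional first and second moments of X2 given s1
        for b in product((0, 1), repeat=n):      # b = phase-2 indicators
            if any(b_i > a_i for a_i, b_i in zip(a, b)):
                continue                         # s2 must be a subset of s1
            prob_b = 1.0
            for a_i, b_i, q_i in zip(a, b, p2):
                if a_i:
                    prob_b *= q_i if b_i else 1.0 - q_i
            x2 = sum(b_i * y_i / (p_i * q_i)
                     for b_i, y_i, p_i, q_i in zip(b, y, p1, p2))
            m1 += prob_b * x2
            m2 += prob_b * x2 * x2

        e_x1 += prob_a * x1
        e_x1_sq += prob_a * x1 * x1
        e_x2 += prob_a * m1
        e_x2_sq += prob_a * m2
        e_cond_var += prob_a * (m2 - m1 * m1)

    var_x1 = e_x1_sq - e_x1 ** 2
    var_x2 = e_x2_sq - e_x2 ** 2
    return var_x1, var_x2, e_cond_var


y = (1.0, 2.0, 3.0)      # toy population values
p1 = (0.5, 0.4, 0.8)     # wave-1 inclusion probabilities
p2 = (0.7, 0.6, 0.9)     # wave-2 conditional inclusion probabilities
var_x1, var_x2, e_cond_var = two_phase_moments(y, p1, p2)
print(e_cond_var, var_x2 - var_x1)  # the two numbers should agree
```

The same reasoning, applied once more with a third phase, is what produces the four-term expression for $N^{2} D_{(1) 3}$.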

We can, similarly, show that:

$\begin{array}{l} N^{2} D_{(1) 3} = E \{E [\mathrm{Var} (\sum_{i \in U_{1 (1)}} B_{i} W_{i} \mathrm{Diag} \{e_{i}\} I_{i (1)} | s_{2 (1)}, s_{1 (1)}) | s_{1 (1)}]\} \\ = E \{E [\mathrm{Var} (\sum_{i \in s_{3 (1)}} w_{i 3} I_{i} (s_{2 (1)}) I_{i} (s_{1 (1)}) F_{i (3)} | s_{2 (1)}, s_{1 (1)}) | s_{1 (1)}]\} \\ = E \{\mathrm{Var} [\sum_{i \in s_{3 (1)}} w_{i 3} I_{i} (s_{2 (1)}) F_{i (3)} | s_{1 (1)}] - \mathrm{Var} [E (\sum_{i \in s_{3 (1)}} w_{i 3} I_{i} (s_{2 (1)}) F_{i (3)} | s_{2 (1)}, s_{1 (1)}) | s_{1 (1)}]\} \\ = E \{\mathrm{Var} [\sum_{i \in s_{3 (1)}} w_{i 3} F_{i (3)} | s_{1 (1)}] - \mathrm{Var} [\sum_{i \in s_{2 (1)}} w_{i 2} F_{i (3)} | s_{1 (1)}]\} \\ = \mathrm{Var} [\sum_{i \in s_{3 (1)}} w_{i 3} F_{i (3)}] - \mathrm{Var} [E (\sum_{i \in s_{3 (1)}} w_{i 3} F_{i (3)} | s_{1 (1)})] \\ - \mathrm{Var} [\sum_{i \in s_{2 (1)}} w_{i 2} F_{i (3)}] + \mathrm{Var} [E (\sum_{i \in s_{2 (1)}} w_{i 2} F_{i (3)} | s_{1 (1)})] \\ = \mathrm{Var} [\sum_{i \in s_{3 (1)}} w_{i 3} F_{i (3)}] - \mathrm{Var} [\sum_{i \in s_{1 (1)}} w_{i 1} F_{i (3)}] \\ - \mathrm{Var} [\sum_{i \in s_{2 (1)}} w_{i 2} F_{i (3)}] + \mathrm{Var} [\sum_{i \in s_{1 (1)}} w_{i 1} F_{i (3)}] . \end{array}$

With similar calculations, we obtain the corresponding expressions for $N^{2} D_{(2)}, N^{2} D_{(2) 2}, N^{2} D_{(2) 3},$ and $N^{2} D_{(3)} = N^{2} D_{(3) 3} .$

Finally, we sketch the development of an expression for ${Var}_{p} [Ψ_{s} (β_{N})]$ without assuming independence among cohorts. First, notice that $Ψ_{s} (β_{N})$ can be written as:

$\begin{array}{l} \sum_{i \in s_{1}} B_{i} I_{i} (U) \, \mathrm{diag} (w_{i 1}, w_{i 2}, w_{i 3}) \begin{bmatrix} e_{i 1} \\ e_{i 2} \\ e_{i 3} \end{bmatrix} + \sum_{i \in s_{2}} B_{i} I_{i} (U) \, \mathrm{diag} (0, w_{i 2}, w_{i 3}) \begin{bmatrix} 0 \\ e_{i 2} \\ e_{i 3} \end{bmatrix} \\ - \sum_{i \in s_{1}} B_{i} I_{i} (U) \, \mathrm{diag} (0, w_{i 2}, w_{i 3}) \begin{bmatrix} 0 \\ e_{i 2} \\ e_{i 3} \end{bmatrix} \\ + \sum_{i \in s_{3}} B_{i} I_{i} (U) \, \mathrm{diag} (0, 0, w_{i 3}) \begin{bmatrix} 0 \\ 0 \\ e_{i 3} \end{bmatrix} - \sum_{i \in s_{2}} B_{i} I_{i} (U) \, \mathrm{diag} (0, 0, w_{i 3}) \begin{bmatrix} 0 \\ 0 \\ e_{i 3} \end{bmatrix} \\ = \sum_{i \in s_{1}} w_{i 1} B_{i} I_{i} (U) e_{i} - \sum_{i \in s_{1}} w_{i 1} B_{i} I_{i} (U) \begin{bmatrix} 0 \\ e_{i 2} \\ e_{i 3} \end{bmatrix} + \sum_{i \in s_{2}} w_{i 2} B_{i} I_{i} (U) \begin{bmatrix} 0 \\ e_{i 2} \\ e_{i 3} \end{bmatrix} - \sum_{i \in s_{2}} w_{i 2} B_{i} I_{i} (U) \begin{bmatrix} 0 \\ 0 \\ e_{i 3} \end{bmatrix} \\ + \sum_{i \in s_{3}} w_{i 3} B_{i} I_{i} (U) \begin{bmatrix} 0 \\ 0 \\ e_{i 3} \end{bmatrix}; \end{array}$

letting $z_{i} = B_{i} I_{i} (U) e_{i}, z_{i (2 \dots 3)} = B_{i} I_{i} (U) {[0, e_{i 2}, e_{i 3}]}^{'},$ and $z_{i (3 \dots 3)} = B_{i} I_{i} (U) {[0,0, e_{i 3}]}^{'},$ ${Var}_{p} [Ψ_{s} (β_{N})]$ can be expanded as:

$\begin{array}{l} \mathrm{Var}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i}] + \mathrm{Var}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i (2 \dots 3)}] \\ + \mathrm{Var}_{p} [\sum_{i \in s_{2}} w_{i 2} z_{i (2 \dots 3)}] + \mathrm{Var}_{p} [\sum_{i \in s_{2}} w_{i 2} z_{i (3 \dots 3)}] \\ + \mathrm{Var}_{p} [\sum_{i \in s_{3}} w_{i 3} z_{i (3 \dots 3)}] - 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i}, \sum_{i \in s_{1}} w_{i 1} z_{i (2 \dots 3)}] \\ + 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i}, \sum_{i \in s_{2}} w_{i 2} z_{i (2 \dots 3)}] \\ - 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i}, \sum_{i \in s_{2}} w_{i 2} z_{i (3 \dots 3)}] + 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i}, \sum_{i \in s_{3}} w_{i 3} z_{i (3 \dots 3)}] \\ - 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i (2 \dots 3)}, \sum_{i \in s_{2}} w_{i 2} z_{i (2 \dots 3)}] + 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i (2 \dots 3)}, \sum_{i \in s_{2}} w_{i 2} z_{i (3 \dots 3)}] \\ - 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i (2 \dots 3)}, \sum_{i \in s_{3}} w_{i 3} z_{i (3 \dots 3)}] - 2 \mathrm{Cov}_{p} [\sum_{i \in s_{2}} w_{i 2} z_{i (2 \dots 3)}, \sum_{i \in s_{2}} w_{i 2} z_{i (3 \dots 3)}] \\ + 2 \mathrm{Cov}_{p} [\sum_{i \in s_{2}} w_{i 2} z_{i (2 \dots 3)}, \sum_{i \in s_{3}} w_{i 3} z_{i (3 \dots 3)}] - 2 \mathrm{Cov}_{p} [\sum_{i \in s_{2}} w_{i 2} z_{i (3 \dots 3)}, \sum_{i \in s_{3}} w_{i 3} z_{i (3 \dots 3)}] \\ = \mathrm{Var}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i}] + \mathrm{Var}_{p} [\sum_{i \in s_{2}} w_{i 2} z_{i (2 \dots 3)}] - \mathrm{Var}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i (2 \dots 3)}] \\ + \mathrm{Var}_{p} [\sum_{i \in s_{3}} w_{i 3} z_{i (3 \dots 3)}] - \mathrm{Var}_{p} [\sum_{i \in s_{2}} w_{i 2} z_{i (3 \dots 3)}] \\ + 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i (1 \dots 1)}, \sum_{i \in s_{2}} w_{i 2} z_{i (2 \dots 3)}] - 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i (1 \dots 1)}, \sum_{i \in s_{1}} w_{i 1} z_{i (2 \dots 3)}] \\ + 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i (1 \dots 2)}, \sum_{i \in s_{3}} w_{i 3} z_{i (3 \dots 3)}] - 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i (1 \dots 2)}, \sum_{i \in s_{2}} w_{i 2} z_{i (3 \dots 3)}] \quad (A.4) \\ + 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i (2 \dots 2)}, \sum_{i \in s_{2}} w_{i 2} z_{i (3 \dots 3)}] - 2 \mathrm{Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} z_{i (2 \dots 2)}, \sum_{i \in s_{3}} w_{i 3} z_{i (3 \dots 3)}] \\ + 2 \mathrm{Cov}_{p} [\sum_{i \in s_{2}} w_{i 2} z_{i (2 \dots 2)}, \sum_{i \in s_{3}} w_{i 3} z_{i (3 \dots 3)}] - 2 \mathrm{Cov}_{p} [\sum_{i \in s_{2}} w_{i 2} z_{i (2 \dots 2)}, \sum_{i \in s_{2}} w_{i 2} z_{i (3 \dots 3)}] . \end{array}$

In this last expression, the first thing we notice is that all the diagonal elements in all the covariance terms are exactly equal to zero; this means that whether or not the cohorts are independent of one another, expression (3.13) is exact for the variance terms.
To analyze the importance of the covariance terms, we concentrate on the term in line (A.4); the conclusion for the other terms is the same; note that this term can be written as:

$2 {Cov}_{p} [\sum_{i \in s_{1}} w_{i 1} \frac{\partial {μ^{'}}_{i}}{\partial β} V_{i}^{- 1} (\begin{matrix} e_{i 1} \\ e_{i 2} \\ 0 \end{matrix}), \sum_{i \in s_{3}} w_{i 3} \frac{\partial {μ^{'}}_{i}}{\partial β} V_{i}^{- 1} (\begin{matrix} 0 \\ 0 \\ e_{i 3} \end{matrix}) - \sum_{i \in s_{2}} w_{i 2} \frac{\partial {μ^{'}}_{i}}{\partial β} V_{i}^{- 1} (\begin{matrix} 0 \\ 0 \\ e_{i 3} \end{matrix})];$

Property 3.1 states that if the cohorts are design-independent, all the covariance terms are exactly equal to zero. In addition to that, from this last expression we conclude, trivially, that if the waves are design-independent, all the covariance terms are equal to zero too. This formula for the term in line (A.4) also implies that if the individual weights do not vary greatly between consecutive waves, and there is a high overlap between consecutive waves, the covariance terms are not too large. Finally, if the overlap is small, it is reasonable to assume design-independence between the waves, and then the covariance terms can be safely approximated by zero.

References

Ardilly, P., and Lavallée, P. (2007). Weighting in rotating samples: The SILC survey in France. Survey Methodology, 33, 2, 131-137.

Berger, Y.G. (2004a). Variance estimation for change: An evaluation based upon the 2000 Finnish Labour Force Survey. Proceedings. European Conference on Quality and Methodology in Official Statistics.

Berger, Y.G. (2004b). Variance estimation for measures of change in probability sampling. The Canadian Journal of Statistics, 32, 4, 451-467.

Binder, D.A. (1983). On the variances of asymptotically normal estimators from complex surveys. International Statistical Review, 51, 279-292.

Carrillo, I.A., Chen, J. and Wu, C. (2010). The pseudo-GEE approach to the analysis of longitudinal surveys. The Canadian Journal of Statistics, 38, 4, 540-554.

Carrillo, I.A., Chen, J. and Wu, C. (2011). A pseudo-GEE approach to analyzing longitudinal surveys under imputation for missing responses. Journal of Official Statistics, 27, 2, 255-277.

Carrillo, I.A., and Karr, A.F. (2011). Combining cohorts in longitudinal surveys. Technical Report 180, National Institute of Statistical Sciences, Research Triangle Park, NC. URL http://www.niss.org/sites/default/files/tr180.pdf.

Carrillo, I.A., and Karr, A.F. (2012). Estimating change with multi-cohort longitudinal surveys. In preparation.

Cox, B.G., Grigorian, K., Wang, R. and Harter, R. (2010). 2008 Survey of Doctorate Recipients Weighting Implementation Report, document prepared by the National Opinion Research Center (NORC) for the National Science Foundation (NSF).

Diggle, P., Heagerty, P., Liang, K.-Y. and Zeger, S. (2002). Analysis of Longitudinal Data, 2nd Edition. Oxford University Press, New York.

Hedeker, D., and Gibbons, R.D. (2006). Longitudinal Data Analysis. Wiley Series in Probability and Statistics. New Jersey: John Wiley & Sons, Inc., Hoboken.

Hirano, K., Imbens, G.W., Ridder, G. and Rubin, D.B. (2001). Combining panel data sets with attrition and refreshment samples. Econometrica, 69, 6, 1645-1659.

Hu, F., and Kalbfleisch, J.D. (2000). The estimating function bootstrap (Pkg: P449-495). The Canadian Journal of Statistics, 28, 3, 449-481.

Larsen, M.D., Qing, S., Zhou, B. and Foulkes, M.A. (2011). Calibration estimation and longitudinal survey weights: Application to the NSF Survey of Doctorate Recipients. In Proceedings of the Survey Research Method Section, American Statistical Association, 1360-1374.

Liang, K.-Y., and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.

Lohr, S. (2007). Recent developments in multiple frame surveys. In Joint Statistical Meeting of the American Statistical Association, 3257-3264.

McLaren, C.H., and Steel, D.G. (2000). The impact of different rotation patterns on the sampling variance of seasonally adjusted and trend estimates. Survey Methodology, 26, 2, 163-172.

National Science Foundation, National Center for Science and Engineering Statistics (2012). Survey of doctorate recipients. http://www.nsf.gov/statistics/srvydoctoratework/, accessed Feb. 09 2012.

Nevo, A. (2003). Using weights to adjust for sample selection when auxiliary information is available. Journal of Business & Economic Statistics, 21, 1, 43-52.

Qualité, L., and Tillé, Y. (2008). Variance estimation of changes in repeated surveys and its application to the Swiss survey of value added. Survey Methodology, 34, 2, 173-181.

Rao, J.N.K., and Wu, C. (2010). Pseudo-empirical likelihood inference for multiple frame surveys. Journal of the American Statistical Association, 105, 492, 1494-1503.

Roberts, G., Binder, D., Kovačević, M., Pantel, M. and Phillips, O. (2003). Using an estimating function bootstrap approach for obtaining variance estimates when modelling complex health survey data. Proceedings of the Survey Methods Section, Statistical Society of Canada, Halifax.

Robins, J.M., Rotnitzky, A. and Zhao, L.P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90, 106-121.

Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581-592.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

Smith, P., Lynn, P. and Elliot, D. (2009). Sample design for longitudinal surveys. In Methodology of Longitudinal Surveys, (Ed., P. Lynn). Wiley, Chichester, Chapter 2, 21-33.

Song, P.X.-K. (2007). Correlated Data Analysis: Modeling, Analytics, and Applications. Springer Series in Statistics. New York: Springer.

Steel, D., and McLaren, C. (2007). Design and analysis of repeated surveys. Keynote lecture. International Conference on Quality Management of Official Statistics, Korea.

Vieira, M.D.T. (2009). Analysis of Longitudinal Survey Data: Allowing for the Complex Survey Design in Covariance Structure Models. VDM Verlag.

Vieira, M.D.T., and Skinner, C.J. (2008). Estimating models for panel survey data under complex sampling. Journal of Official Statistics, 24, 3, 343-364.

Date modified: 2017-09-20
