Bayesian predictive inference of a proportion under a two-fold small area model with heterogeneous correlations
Section 1. Introduction

We assume that there are several small areas, each area consists of several clusters, and each cluster consists of a number of units (individuals). A random sample of clusters is taken from each area, and within each sampled cluster a random sample of units is taken. This is a two-fold sampling design; see Rao and Molina (2015). Under cluster sampling, the responses of units within a cluster are generally positively correlated, and this correlation can have a significant impact on inference. We consider this situation for binary responses; see Nandram (2015), who defined an intracluster correlation (between two units in the same cluster) and an intercluster correlation (between two units in two different clusters in the same area). Nandram (2015) assumes that the correlation remains constant over areas; we extend his model to accommodate the situation in which the correlations can differ across areas. We are interested in the finite population proportion for each area, and, like Nandram (2015), we use a hierarchical Bayesian model for this purpose.

Given such correlated data, a statistical problem arises from the intracluster correlation: it leads to a smaller effective sample size and therefore to larger variability in the estimates. Thus, when there is a clustering effect, analyses that assume independence of the units will generally produce p-values that are too small (i.e., rejection of the null hypothesis when it should not be rejected). Rao and Scott (1981, 1984) studied this problem and presented simple corrections to the standard chi-squared statistic for the test of independence in two-way contingency tables under a complex sample design (e.g., two-stage cluster sampling).
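To make the effective-sample-size argument concrete, the sketch below uses Kish's design effect for clusters of equal size m with a common intracluster correlation ρ, deff = 1 + (m - 1)ρ, and compares a naive normal-approximation test of a single proportion with one whose variance is inflated by deff. This is an illustrative assumption (equal cluster sizes, one common ρ, a large-sample z test); it is neither the Rao and Scott correction nor the model of this paper, and all function names are hypothetical.

```python
import math


def design_effect(m: int, rho: float) -> float:
    """Kish design effect for clusters of equal size m with intracluster correlation rho."""
    return 1.0 + (m - 1) * rho


def effective_sample_size(n: int, m: int, rho: float) -> float:
    """Number of independent units carrying the same information as n clustered units."""
    return n / design_effect(m, rho)


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value, P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def naive_and_adjusted_p(p_hat: float, p0: float, n: int, m: int, rho: float):
    """p-values for H0: p = p0, ignoring clustering (naive) and inflating the variance by deff."""
    se_naive = math.sqrt(p0 * (1.0 - p0) / n)  # assumes n independent Bernoulli units
    se_adj = se_naive * math.sqrt(design_effect(m, rho))  # variance inflated by deff
    z_naive = (p_hat - p0) / se_naive
    z_adj = (p_hat - p0) / se_adj
    return two_sided_p(z_naive), two_sided_p(z_adj)


if __name__ == "__main__":
    # Hypothetical survey: 40 clusters of 10 units each, rho = 0.1
    print(design_effect(10, 0.1))
    print(effective_sample_size(400, 10, 0.1))
    print(naive_and_adjusted_p(0.55, 0.5, 400, 10, 0.1))
```

With 400 units in 40 clusters of 10 and ρ = 0.1, deff = 1.9 and the effective sample size is about 210. For an observed proportion of 0.55 tested against 0.5, the naive two-sided p-value is about 0.046 (rejected at the 5% level), while the deff-adjusted p-value is about 0.15, which illustrates how ignoring the clustering can lead to spurious rejection.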

Nandram and Sedransk (1993) presented a hierarchical Bayesian model under two-stage cluster sampling, which is the design we have within each area in a two-fold sample design with binary responses. As a discrete analogue of the model for two-stage cluster sampling with normal data (Scott and Smith 1969), this model makes inference about the overall finite population proportion. Nandram (1998) extended this model to multinomial data; the extension can be viewed as a Bayesian analogue of the multinomial-Dirichlet model for cluster sampling (Brier 1980).

For two-fold modeling, there is only a limited number of studies of continuous response variables, and almost none of discrete (binary) data. Most analyses of two-fold models are based on the empirical Bayes framework. Fuller and Battese (1973) introduced one-fold and two-fold nested error regression models. Ghosh and Lahiri (1988) studied multistage sampling under posterior linearity using Bayes and empirical Bayes methods. Under two-stage and three-stage cluster sampling, estimation of regression models with a nested error structure and unequal error variances was further studied by Stukel and Rao (1997). Small area estimation under two-fold nested error regression models was also studied by Stukel and Rao (1999); see Rao and Molina (2015) for a review. Nandram (2015) proposed a hierarchical Bayesian model for binary data arising from a two-fold sample design.

Nandram (2015) showed that it is important to account for the sample design within each area. Specifically, similar to Rao and Scott (1981, 1984), he showed that if a model does not capture the two-stage cluster sample design within each small area, the results will be too optimistic; that is, the estimated variability will be too small. The point estimates can also differ when the two-stage cluster sample design is ignored. He also noted that in other situations the result can be the opposite. For example, under a stratified design, rather than a two-stage cluster sampling design, precision increases within each area (i.e., the design effect for each area is smaller than one). See Nandram, Bhatta, Sedransk and Bhadra (2013) for a Bayesian analysis of this problem.

To gain flexibility and generality over the two-fold hierarchical Bayesian model of Nandram (2015), we generalize it to incorporate unequal intracluster correlations. Our idea is to extend this model by adding a layer that allows the intracluster correlation to vary over areas in the two-fold sample design, and to compare the two-fold model with a homogeneous correlation (equal over areas) with the model with heterogeneous correlations (varying over areas). Like the homogeneous model, the heterogeneous model has weakly identified parameters. When a Markov chain Monte Carlo sampler is used to fit such a model, there can be long-range dependence, and monitoring the convergence of a Gibbs sampler becomes difficult. Nandram (2015) showed how to overcome the difficulty with these weakly identified parameters by using random draws. Similar random draws are discussed by Molina, Nandram and Rao (2014) and Toto and Nandram (2010), who avoided Markov chain Monte Carlo model fitting completely. Unfortunately, it is not simple to use random draws to fit the heterogeneous model, so we are forced to use the Gibbs sampler.

We use the blocked Gibbs sampler to fit our two-fold small area model, and we face two difficulties. First, the conditional posterior densities of the correlation parameters can be multimodal. Second, some parameters can be related in a complex manner. Both issues can cause problems for a Markov chain Monte Carlo sampler, leading to long-range dependence in the iterates. To help relieve these difficulties, we restrict the prior densities of the area parameters to be unimodal, and we use the blocked Gibbs sampler to draw groups of parameters simultaneously. Both strategies add complexity, but they yield samplers that perform much better.
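The benefit of drawing related parameters together can be seen on a toy target. The sketch below is not the sampler of this paper: it uses a standard bivariate normal with correlation r as a hypothetical stand-in for two strongly related parameters, and compares one-at-a-time Gibbs updates with a single block draw, measuring the lag-1 autocorrelation of the chain. The function names are illustrative only.

```python
import numpy as np


def componentwise_gibbs(r: float, n_iter: int, rng: np.random.Generator) -> np.ndarray:
    """One-at-a-time Gibbs for a standard bivariate normal with correlation r."""
    s = np.sqrt(1.0 - r**2)  # conditional standard deviation of x | y and of y | x
    x = y = 0.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        x = r * y + s * rng.standard_normal()  # x | y ~ N(r y, 1 - r^2)
        y = r * x + s * rng.standard_normal()  # y | x ~ N(r x, 1 - r^2)
        draws[t] = (x, y)
    return draws


def blocked_gibbs(r: float, n_iter: int, rng: np.random.Generator) -> np.ndarray:
    """Draw (x, y) jointly as one block; with a single block this is the full target."""
    cov = np.array([[1.0, r], [r, 1.0]])
    return rng.multivariate_normal(np.zeros(2), cov, size=n_iter)


def lag1_autocorr(z: np.ndarray) -> float:
    """Sample lag-1 autocorrelation of a chain."""
    z = z - z.mean()
    return float(np.dot(z[:-1], z[1:]) / np.dot(z, z))


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    r = 0.99
    print("componentwise:", lag1_autocorr(componentwise_gibbs(r, 20_000, rng)[:, 0]))
    print("blocked:      ", lag1_autocorr(blocked_gibbs(r, 20_000, rng)[:, 0]))
```

With r = 0.99 the componentwise chain for x behaves like an autoregression with coefficient r² ≈ 0.98, so successive iterates are nearly identical, whereas the block draws here are independent. In a realistic blocked Gibbs sampler each block is drawn conditionally on the other blocks, so the iterates remain dependent, but grouping strongly related parameters typically reduces that dependence.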

In summary, we extend Nandram (2015) to accommodate heterogeneous correlations. The model with heterogeneous correlations is desirable because assuming that the correlation does not vary with area when it actually does can lead to inaccurate results. This is an important contribution beyond Nandram (2015). However, we encounter three difficulties:

  1. The heterogeneous correlations introduce weakly identifiable parameters into the model.
  2. Unlike in Nandram (2015), Markov chain Monte Carlo methods are needed to fit the model.
  3. A unimodal restriction is imposed on the hyperparameters to help the sampler mix properly.

We construct an innovative griddy blocked Gibbs sampler to fit the model with heterogeneous correlations, and we test our model more extensively than Nandram (2015).
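A griddy Gibbs step replaces a univariate conditional density that cannot be sampled directly with a discrete approximation on a grid, which also copes with multimodal conditionals such as those of the correlation parameters. The sketch below is a simplified, piecewise-constant variant written for illustration, not the authors' sampler: the target is a hypothetical bimodal log density on (0, 1) standing in for the conditional posterior of one area's correlation, and the grid size of 200 is an arbitrary choice.

```python
import numpy as np


def griddy_draw(log_density, lo: float, hi: float, n_grid: int,
                rng: np.random.Generator) -> float:
    """One griddy Gibbs draw from a univariate density known up to a constant on (lo, hi).

    The density is evaluated at the midpoints of n_grid equal-width cells, a cell is
    chosen with probability proportional to those values, and the draw is placed
    uniformly within the chosen cell (a piecewise-constant approximation).
    """
    width = (hi - lo) / n_grid
    mids = lo + width * (np.arange(n_grid) + 0.5)
    logw = np.array([log_density(x) for x in mids])
    w = np.exp(logw - logw.max())  # subtract the max to avoid overflow/underflow
    w /= w.sum()
    k = rng.choice(n_grid, p=w)
    return float(mids[k] + width * (rng.random() - 0.5))


def bimodal_log_density(x: float) -> float:
    """Hypothetical bimodal target on (0, 1): equal mixture of two narrow bumps."""
    return float(np.logaddexp(-0.5 * ((x - 0.2) / 0.05) ** 2,
                              -0.5 * ((x - 0.7) / 0.05) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    draws = np.array([griddy_draw(bimodal_log_density, 0.0, 1.0, 200, rng)
                      for _ in range(5000)])
    print("share near first mode:", np.mean(draws < 0.45))
```

Inside a Gibbs sampler the grid would be re-evaluated at every iteration, because the conditional density changes with the current values of the other parameters; the grid range and the number of grid points are tuning choices that trade computing time for accuracy.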

In this paper we consider Bayesian predictive inference for the finite population proportions of a number of small areas when there is a cluster sample design within each area. Our main contribution is a hierarchical Bayesian model with unequal intracluster correlations, which we use to make posterior inference about the finite population proportion of each area. Section 2 gives a detailed description of the heterogeneous model. Specifically, for motivation and updating, we first give a brief review of the homogeneous model of Nandram (2015). We show that some parameters can be weakly identified, and we describe how to draw a random sample from the posterior distribution using the blocked Gibbs sampler. In Section 3, to compare the models with homogeneous and heterogeneous correlations, we present an illustrative example using the Third International Mathematics and Science Study (TIMSS) and a small-scale simulation study. Finally, Section 4 contains concluding remarks and directions for future research. Appendices A and B contain proofs and additional information.

