Comparison of unit level and area level small area estimators

1. Introduction

In recent years, model-based small area estimators have been widely used in practice to provide reliable indirect estimates for small areas. These estimators are based on explicit models that provide a link to related small areas through supplementary data such as census and administrative records. Small area models can be classified into two broad types: (i) unit level models, which relate the unit values of the study variable to unit-specific auxiliary variables; and (ii) area level models, which relate direct estimators of the study variable for the small area to the corresponding area-specific auxiliary variables. In general, area level models are used to improve the direct estimators if unit level data are not available.

The sampling set-up is as in Rao (2003). That is, a universe $U$ of size $N$ is split into $m$ non-overlapping small areas $U_i$ of size $N_i$, where $i = 1, \ldots, m$. Sampling is carried out in each small area using a probabilistic mechanism, resulting in samples $s_i$ of size $n_i$. The selection probability associated with each element $j = 1, \ldots, n_i$ selected in sample $s_i$ is denoted by $p_{ij}$. The resulting design weights are given by $w_{ij} = n_i^{-1} p_{ij}^{-1}$. In practice, these weights can be adjusted to account for non-response and/or auxiliary information. The resulting weights are known as the survey weights. In this paper, we assume full response to the survey and no adjustment of the weights for auxiliary information.

Direct area level estimates are obtained for each area using the survey weights and unit observations from the area. The survey design can be incorporated into small area models in different ways. In the area level case, direct design-based estimators are modeled directly and the survey variance of the associated direct estimator is introduced into the model via the design-based errors. In the unit level case, the observations can be weighted using the survey weights. A number of factors affect the success of using these estimators. Two important factors are whether the assumed model is correct and whether the variable of interest is correlated with the selection probabilities associated with the sampling process, that is, the informativeness of the sampling process. In this paper, we compare, via a simulation study, the impact of model misspecification and of the informativeness of the sampling design on two basic small area procedures, one based on the unit level and one on the area level, in terms of bias, estimated mean squared error and confidence interval coverage rates.

A sampling design is informative if the selection probabilities $p_{ij}$ are related to the variable of interest $y_{ij}$ even after conditioning on the covariates $\mathbf{x}_{ij}$. In such cases, we have informative sampling in the sense that the population model no longer holds for the sample. Pfeffermann and Sverchkov (2007) accounted for this possibility by adjusting the small area procedures. Verret, Rao and Hidiroglou (2015) simplified the procedure. In this paper, we do not adjust the small area procedures for informativeness, but study its impact.
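
To make the sampling set-up concrete, the following minimal Python sketch draws a with-replacement sample in a single small area and computes the design weights $w_{ij} = n_i^{-1} p_{ij}^{-1}$. It rests on several assumptions that are not stated in this section: $p_{ij}$ is treated as a per-draw selection probability proportional to a size measure, the direct estimate of the area total is taken to be the weighted sum of the sampled values, and the direct estimate of the area mean divides that sum by the sum of the weights. The function names, the size measures and the values of $y_{ij}$ are all hypothetical.

```python
import random


def ppswr_sample(p, n, rng):
    """Draw n unit indices with replacement; unit j is picked with probability p[j] per draw."""
    return rng.choices(range(len(p)), weights=p, k=n)


def design_weights(p, sample):
    """Design weights w_ij = 1 / (n_i * p_ij) for the sampled units."""
    n = len(sample)
    return [1.0 / (n * p[j]) for j in sample]


def direct_estimates(y, p, sample):
    """Weighted-sum estimate of the area total and weighted mean (assumed forms)."""
    w = design_weights(p, sample)
    total_hat = sum(wj * y[j] for wj, j in zip(w, sample))
    mean_hat = total_hat / sum(w)
    return total_hat, mean_hat


rng = random.Random(12345)

# Hypothetical small area with N_i = 6 units and a known size measure.
size = [10.0, 20.0, 30.0, 40.0, 50.0, 50.0]
p = [z / sum(size) for z in size]   # per-draw selection probabilities
y = [2.0 * z for z in size]         # illustrative values, proportional to size

sample = ppswr_sample(p, 3, rng)    # n_i = 3 draws with replacement
total_hat, mean_hat = direct_estimates(y, p, sample)
print(sample, total_hat, mean_hat)
```

Because the illustrative $y_{ij}$ are exactly proportional to $p_{ij}$, the weighted-sum estimate of the total reproduces the true area total for any sample, up to floating-point rounding. This is only a convenient check of the weights. It says nothing about informativeness, which concerns the relation between $p_{ij}$ and $y_{ij}$ after conditioning on $\mathbf{x}_{ij}$.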

The paper is structured as follows. The point estimators and associated mean squared error estimators for the unit level and area level models are described in Sections 2 and 3, respectively. The simulation and its results are described in Section 4. The simulation computes the point estimates and associated mean squared errors under a probability proportional to size with replacement (PPSWR) sampling scheme, varying two factors: (a) whether the assumed model is correct or incorrect; and (b) the degree of design informativeness, from non-significant to very significant. In Section 5, we give an example using data from Battese, Harter and Fuller (1988) that compares the unit level and area level estimates. Finally, conclusions resulting from this work are presented in Section 6.
