Sample allocation for efficient model-based small area estimation
Section 5. Concluding remarks

This research compared seven allocation solutions, categorized into three groups according to the auxiliary data needed to implement them. Equal and proportional allocation need the least auxiliary information: they are based only on the number of areas and the number of statistical units in each area. The Neyman, Bankier and NLP allocations are based on pre-set optimization criteria, and applying them presumes area-specific parameter information such as the standard deviation or CV of the study variable; the Bankier allocation additionally requires known area totals of at least one auxiliary variable. Because the study variable is unknown, it must be replaced with a suitable proxy or auxiliary variable before these three methods can be used. The number-based and parameter-based allocations share the feature that they are not based on any model, whereas the other three allocations utilize the underlying model in addition to number-based information.
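The information requirements of the number-based and parameter-based allocations can be illustrated with a minimal sketch. It uses the textbook forms of equal, proportional, Neyman and Bankier (power) allocation, not necessarily the exact variants implemented in this study. The function names, the power `q` and the example figures are illustrative assumptions; rounding, minimum area sample sizes and the constraint that an area sample cannot exceed the area population are ignored. The NLP allocation is omitted because it requires a nonlinear programming solver.

```python
from math import fsum


def scale_to_total(weights, n):
    """Split the total sample size n proportionally to non-negative weights (unrounded)."""
    total = fsum(weights)
    return [n * w / total for w in weights]


def equal_allocation(num_areas, n):
    """Number-based: needs only the number of areas."""
    return [n / num_areas] * num_areas


def proportional_allocation(N, n):
    """Number-based: needs the area population sizes N_d."""
    return scale_to_total(N, n)


def neyman_allocation(N, S, n):
    """Parameter-based: needs N_d and the area standard deviations S_d
    (in practice taken from a proxy or auxiliary variable)."""
    return scale_to_total([N_d * S_d for N_d, S_d in zip(N, S)], n)


def bankier_allocation(X, cv, n, q=0.5):
    """Parameter-based power allocation: needs area totals X_d of an auxiliary
    variable and area CVs; q is an assumed tuning power between 0 and 1."""
    return scale_to_total([(X_d ** q) * cv_d for X_d, cv_d in zip(X, cv)], n)


# Hypothetical three-area population, total sample size n = 100.
N = [100, 300, 600]
S = [3.0, 2.0, 1.0]
prop = proportional_allocation(N, 100)
ney = neyman_allocation(N, S, 100)
```

In this hypothetical example the Neyman allocation shifts sample away from the largest area toward the areas with larger standard deviations, which is why it cannot be applied without area-specific parameter information.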

On the basis of the empirical results, the performance of the model-based $g_1$-allocation can be regarded as the best among the allocations tested in this research. Equal and proportional allocation also performed well, but the model-assisted allocations and the parameter-based allocations performed clearly worse. The last three allocations were originally developed for direct design-based estimation, and their results can be understood from that point of view. Compared with the $g_1$-allocation, the MC-allocations are based on a different model, and this fact seems to affect their results.

One characteristic of the $g_1$-allocation is that the model and the estimation method are fixed when the sampling design is constructed, meaning that they are regarded as given preliminary information. This allocation, which is based on a unit-level linear mixed model and the EBLUP estimation method, needs only the homogeneity coefficient between areas, which is computed from the values of the auxiliary variable. In this respect, the $g_1$-allocation differs from the other allocations used in the comparison. The starting point for choosing the final estimation method is also different, because this allocation is focused on model-based estimation, not on direct design-based estimation using sampling weights. The choice of model-based estimation is also justified because it is commonly used in small area estimation.
On the other hand, the $g_1$-allocation enables the use of small sample sizes, because information can be borrowed between areas when the model is applied. This can be significant in quick surveys or in studies carried out by market research organizations, where a single measurement is expensive. However, it is important to examine the characteristics of the areas, especially the small areas, before the final sample sizes are determined.
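The borrowing of information can be made concrete with the standard leading term of the EBLUP mean squared error under the nested error unit-level model, as given in the small area estimation literature (for example Rao, 2003). The notation below is assumed rather than taken from this section, the finite population correction is ignored, and identifying the intra-area correlation $\rho$ with the homogeneity coefficient mentioned above is likewise an assumption. The block is a sketch of why small area sample sizes remain usable, not the allocation formula itself.

```latex
% Nested error unit-level model for unit k in area d (assumed standard form)
y_{dk} = \mathbf{x}_{dk}^{\top}\boldsymbol{\beta} + v_d + e_{dk},
\qquad v_d \sim N(0, \sigma_v^2), \qquad e_{dk} \sim N(0, \sigma_e^2)

% Leading MSE term of the EBLUP of the area mean, with area sample size n_d
g_{1d}(\sigma_v^2, \sigma_e^2) = (1 - \gamma_d)\,\sigma_v^2
  = \gamma_d \, \frac{\sigma_e^2}{n_d},
\qquad
\gamma_d = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_e^2 / n_d}

% The same shrinkage factor expressed through the intra-area correlation
\rho = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_e^2},
\qquad
\gamma_d = \frac{n_d \, \rho}{1 + (n_d - 1)\,\rho}
```

Because $g_{1d}$ never exceeds $\sigma_v^2$, however small $n_d$ is, whereas the variance of a direct estimator grows like $\sigma_e^2 / n_d$, the model-based estimator stays stable in areas with few sampled units, provided the model holds.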

As a recommendation, wider research would be justified to find out what advantages and disadvantages arise when the computing technique for producing area statistics is decided as early as the design of the research plan.

Acknowledgements

The authors thank the Editor, Associate Editor and two referees as well as Professor Risto Lehtonen for constructive comments and suggestions.

References

Bankier, M.D. (1988). Power allocations: Determining sample sizes for subnational areas. The American Statistician, 42, 174-177.

Choudhry, G.H., Rao, J.N.K. and Hidiroglou, M.A. (2012). On sample allocation for efficient domain estimation. Survey Methodology, 38, 1, 23-29. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2012001/article/11682-eng.pdf.

Costa, A., Satorra, A. and Ventura, E. (2004). Improving both domain and total area estimation by composition. SORT, 28(1), 69-86.

Falorsi, P.D., and Righi, P. (2008). A balanced sampling approach for multi-way stratification for small area estimation. Survey Methodology, 34, 2, 223-234. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2008002/article/10763-eng.pdf.

Keto, M., and Pahkinen, E. (2009). On sample allocation for effective EBLUP estimation of small area totals – “Experimental Allocation”. In Survey Sampling Methods in Economic and Social Research, (Eds., J. Wywial and W. Gamrot), 2010. Katowice: Katowice University of Economics.

Keto, M., and Pahkinen, E. (2014). On sample allocation for efficient small area estimation. Book of Abstracts. SAE 2014, Poland: Poznan University of Economics, page 50.

Longford, N.T. (2006). Sample size calculation for small-area estimation. Survey Methodology, 32, 1, 87-96. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2006001/article/9259-eng.pdf.

Molefe, W.B., and Clark, R.G. (2015). Model-assisted optimal allocation for planned domains using composite estimation. Survey Methodology, 41, 2, 377-387. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2015002/article/14230-eng.pdf.

Nissinen, K. (2009). Small Area Estimation with Linear Mixed Models from Unit-Level Panel and Rotating Panel Data. Ph.D. thesis, University of Jyväskylä, Department of Mathematics and Statistics, Report 117, https://jyx.jyu.fi/dspace/handle/123456789/21312.

Pfeffermann, D. (2013). New important developments in small area estimation. Statistical Science, 28, 40-68.

Rao, J.N.K. (2003). Small Area Estimation. Hoboken, New Jersey: John Wiley & Sons, Inc.

Tschuprow, A.A. (1923). On the mathematical expectation of the moments of frequency distributions in the case of correlated observations. Metron, Vol. 2, 3, 461-493; 4, 646-683.

