Joint determination of optimal stratification and sample allocation using genetic algorithm

Marco Ballin and Giulio Barcaroli¹

Abstract

This paper offers a solution to the problem of finding the optimal stratification of the available population frame, that is, the stratification that minimizes the cost of the sample required to satisfy precision constraints on a set of different target estimates. The solution is sought by exploring the universe of all possible stratifications obtainable by cross-classifying the categorical auxiliary variables available in the frame (continuous auxiliary variables can be transformed into categorical ones by means of suitable methods). The approach is therefore multivariate with respect to both target and auxiliary variables. The proposed algorithm is based on a non-deterministic evolutionary approach that makes use of the genetic algorithm paradigm. Its key feature is that each possible stratification is treated as an individual subject to evolution, whose fitness is given by the cost of the associated sample required to satisfy a set of precision constraints; this cost is calculated by applying the Bethel algorithm for multivariate allocation. This optimal stratification algorithm, implemented in an R package (SamplingStrata), has so far been applied to a number of current surveys at the Italian National Institute of Statistics: the results obtained always show significant improvements in the efficiency of the samples with respect to previously adopted stratifications.
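The idea of treating a stratification as an individual whose fitness is a sample cost can be illustrated with a minimal Python sketch. It is not the SamplingStrata implementation and makes several simplifying assumptions: there is a single target variable, so the Bethel multivariate allocation is replaced by univariate Neyman allocation; the cost is simply the total sample size; a floor of two units per stratum is imposed; and the atomic strata in `ATOMS`, together with the names `merge`, `sample_cost` and `evolve`, are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical atomic strata: (population size, mean, standard deviation)
# of one target variable. The values are illustrative only.
ATOMS = [(500, 10.0, 2.0), (300, 12.0, 2.5), (150, 40.0, 8.0),
         (40, 45.0, 9.0), (8, 200.0, 30.0), (2, 220.0, 35.0)]

def merge(atoms, genome):
    """Collapse atomic strata that share a label into (N, mean, sd) strata."""
    groups = {}
    for (n, mean, sd), label in zip(atoms, genome):
        g = groups.setdefault(label, [0, 0.0, 0.0])
        g[0] += n
        g[1] += n * mean                    # running total of the variable
        g[2] += n * (sd ** 2 + mean ** 2)   # running sum of squares
    strata = []
    for n, total, sumsq in groups.values():
        mean = total / n
        strata.append((n, mean, math.sqrt(max(sumsq / n - mean ** 2, 0.0))))
    return strata

def sample_cost(atoms, genome, cv=0.05, min_units=2):
    """Sample size needed to reach the target CV of the estimated mean,
    using Neyman allocation (a stand-in for the Bethel algorithm)."""
    strata = merge(atoms, genome)
    N = sum(n for n, _, _ in strata)
    ybar = sum(n * m for n, m, _ in strata) / N
    target_var = (cv * ybar) ** 2
    ws = sum(n / N * s for n, _, s in strata)
    ws2 = sum(n / N * s * s for n, _, s in strata)
    n_total = ws ** 2 / (target_var + ws2 / N)
    cost = 0
    for n, _, s in strata:
        n_h = math.ceil(n_total * (n / N * s) / ws) if ws > 0 else 0
        cost += min(n, max(min_units, n_h))  # floor per stratum, capped at N_h
    return cost

def evolve(atoms, pop_size=20, generations=50, mutation_rate=0.1, seed=0):
    """Genetic search over genomes; genome[i] is the stratum label of atom i."""
    rng = random.Random(seed)
    k = len(atoms)
    # Start from the unstratified solution plus random genomes.
    pop = [[0] * k] + [[rng.randrange(k) for _ in range(k)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda g: sample_cost(atoms, g))
        elite = pop[:pop_size // 4]          # elitism: best quarter survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, k)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.randrange(k) if rng.random() < mutation_rate else x
                     for x in child]         # random relabelling mutation
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda g: sample_cost(atoms, g))

best = evolve(ATOMS)
print(best, sample_cost(ATOMS, best))
```

Because the elite is carried over unchanged at each generation and the initial population contains the unstratified genome, the returned solution in this sketch can never cost more than using no stratification at all. The per-stratum floor is what makes the search non-trivial: without it, the finest stratification would generally be preferred.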

Key Words

Genetic algorithm; Optimal stratification; Sample design; Sample allocation; R package.

Table of contents

1 Introduction

2 Formalization of the optimization problem

3 Application of the genetic algorithm to the optimal stratification problem

4 An example based on the Iris flowers dataset

5 An application: the Italian Farm Structure Survey (FSS)

6 A further application: the Monthly survey on milk and milk products

7 Conclusions and future work

¹ Marco Ballin and Giulio Barcaroli, Istituto Nazionale di Statistica, via C. Balbo 16 - 00184 Roma (Italy). E-mail: ballin@istat.it, barcarol@istat.it.
