Integer programming formulations applied to optimal allocation in stratified sampling
5. Final remarks
In this paper we provided two new formulations that attain the global minimum in multivariate optimum allocation problems. These exact integer programming formulations can be implemented efficiently using off-the-shelf free software (namely the R package). In addition, the proposed formulations allow minimum sample sizes to be set per stratum, which is clearly of interest in practice, for example to avoid allocations with sample sizes less than 2, which would make variance estimation difficult. Such minimum sample sizes may be set at larger values (say 5, 10, 30 or some other number) to ensure that the samples are large enough to tolerate some nonresponse, or to ensure that estimation is feasible for each stratum when the strata are used as estimation domains.
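To make the idea of a global integer optimum with minimum stratum sample sizes concrete, the sketch below solves a tiny multivariate problem by exhaustive enumeration. It looks for the smallest total sample size whose allocation meets a coefficient-of-variation (CV) target for every survey variable, with each stratum receiving at least n_min units. This is not Formulation C or D of the paper, and enumeration is only viable for a handful of small strata. The function names, the data and the 5% CV targets are hypothetical, and the variance formula assumes stratified simple random sampling without replacement with the usual expansion estimator of a total.

```python
from itertools import product
from math import sqrt


def cv_of_total(n, N, mean, sd):
    """CV of the expansion estimator of a total under stratified SRS without replacement."""
    var = sum(Nh * Nh * (1 - nh / Nh) * sh * sh / nh for nh, Nh, sh in zip(n, N, sd))
    total = sum(Nh * yh for Nh, yh in zip(N, mean))
    return sqrt(var) / total


def exact_min_allocation(N, means, sds, cv_targets, n_min=2):
    """Smallest-total integer allocation meeting every CV target, by exhaustive search.

    means[j][h] and sds[j][h] are the mean and standard deviation of variable j in
    stratum h; each stratum gets between min(n_min, N_h) and N_h units.
    """
    ranges = [range(min(n_min, Nh), Nh + 1) for Nh in N]
    best = None
    for n in product(*ranges):
        if best is not None and sum(n) >= sum(best):
            continue  # cannot beat the incumbent total
        if all(cv_of_total(n, N, m, s) <= c for m, s, c in zip(means, sds, cv_targets)):
            best = n
    return best


# Hypothetical population: 3 strata, 2 survey variables.
N = [50, 30, 20]
means = [[10, 20, 40], [5, 8, 15]]
sds = [[4, 8, 15], [2, 3, 6]]
targets = [0.05, 0.05]

best = exact_min_allocation(N, means, sds, targets, n_min=2)
print(best, sum(best))
```

Raising n_min to 5, 10 or 30 only shrinks the search ranges, so the minimum-size requirement is imposed simply as a lower bound on each n_h.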
The proposed approach improves upon existing methods by tackling the allocation problem directly, dealing both with the non-linearity of either the objective function or the constraints and with the requirement that the solution provide only integer sample sizes for the strata. Previously existing methods in the literature tackle the problem with approaches that are either not guaranteed to reach the global optimum or produce real-valued allocations that must be rounded to integer values.
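As an illustration of solving for integer sample sizes directly rather than rounding a real-valued solution, the sketch below handles the simplest case: one survey variable and a fixed total sample size n, minimising the sum over h of N_h^2 S_h^2 (1/n_h - 1/N_h). Because each term is convex and decreasing in n_h, giving units one at a time to the stratum with the largest variance reduction (greedy marginal allocation) reaches the global integer optimum. This is a different, well-known technique for the univariate problem, not the paper's integer programming formulations; the function names and the stratum sizes and standard deviations are hypothetical.

```python
import heapq


def stratified_variance(alloc, N, S):
    """Variance of the estimated total: sum_h N_h^2 S_h^2 (1/n_h - 1/N_h)."""
    return sum(Nh ** 2 * Sh ** 2 * (1 / nh - 1 / Nh) for nh, Nh, Sh in zip(alloc, N, S))


def greedy_integer_allocation(n, N, S, n_min=2):
    """Integer allocation with sum(alloc) == n that minimises stratified_variance.

    Starts every stratum at min(n_min, N_h), then gives each remaining unit to the
    stratum whose next unit yields the largest gain N_h^2 S_h^2 / (n_h (n_h + 1)).
    """
    if n_min < 1:
        raise ValueError("n_min must be at least 1")
    alloc = [min(n_min, Nh) for Nh in N]
    if not sum(alloc) <= n <= sum(N):
        raise ValueError("n is incompatible with the stratum bounds")
    # heapq is a min-heap, so gains are stored negated to pop the largest first.
    heap = [(-((Nh * Sh) ** 2) / (nh * (nh + 1)), h)
            for h, (nh, Nh, Sh) in enumerate(zip(alloc, N, S)) if nh < Nh]
    heapq.heapify(heap)
    for _ in range(n - sum(alloc)):
        _, h = heapq.heappop(heap)
        alloc[h] += 1
        if alloc[h] < N[h]:  # stratum can still take another unit
            nh = alloc[h]
            heapq.heappush(heap, (-((N[h] * S[h]) ** 2) / (nh * (nh + 1)), h))
    return alloc


# Hypothetical strata: population sizes and standard deviations of one variable.
N = [400, 120, 30, 8]
S = [5, 12, 40, 90]
alloc = greedy_integer_allocation(60, N, S)
print(alloc, sum(alloc), round(stratified_variance(alloc, N, S)))
```

Unlike rounding a real-valued solution, the result always sums exactly to n and respects n_min <= n_h <= N_h.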
In practice, finding real-valued allocations is not a big problem unless the stratum population sizes are very small or the number of strata is very large. In the first case, sampling one unit more or less can change the sampling fractions considerably, which can have a large impact on the variances. In the second case, rounding the allocated sample sizes can change the total sample size. When all the stratum population sizes are relatively large and the number of strata is reasonable, rounding non-integer sample sizes will not create a problem.
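The two situations just described can be seen with a few hypothetical numbers. The sketch below computes a real-valued Neyman allocation, n_h = n N_h S_h / sum_k N_k S_k (a standard textbook allocation, used here only for illustration), rounds each n_h independently, and evaluates one stratum's contribution N_h^2 S_h^2 (1/n_h - 1/N_h) to the variance for two consecutive sample sizes. The stratum sizes, standard deviations, function names and n = 60 are made up.

```python
def neyman_allocation(n, N, S):
    """Real-valued Neyman allocation n_h = n N_h S_h / sum_k N_k S_k (stratum bounds ignored)."""
    weights = [Nh * Sh for Nh, Sh in zip(N, S)]
    total = sum(weights)
    return [n * w / total for w in weights]


def stratum_variance_term(nh, Nh, Sh):
    """Contribution N_h^2 S_h^2 (1/n_h - 1/N_h) of one stratum to the variance of the total."""
    return Nh ** 2 * Sh ** 2 * (1 / nh - 1 / Nh)


N = [400, 120, 30, 8]   # hypothetical stratum population sizes
S = [5, 12, 40, 90]     # hypothetical stratum standard deviations
real = neyman_allocation(60, N, S)
rounded = [round(x) for x in real]

print([f"{x:.2f}" for x in real])  # → ['22.39', '16.12', '13.43', '8.06']
print(rounded, sum(rounded))       # → [22, 16, 13, 8] 59

# Small-stratum case: one extra unit in the stratum with N_h = 8 changes its term sharply.
print(stratum_variance_term(2, 8, 90), stratum_variance_term(3, 8, 90))
```

Here the rounded allocation sums to 59 rather than 60, and the real-valued size for the smallest stratum (8.06) even exceeds its population size of 8. For that stratum, moving from 2 to 3 sampled units reduces its variance contribution from 194,400 to 108,000.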
In this paper we carried out some limited numerical work, aimed essentially at demonstrating the feasibility of the proposed approach. The results obtained using Formulation C of the proposed approach are comparable to those achieved using the Bethel method, while providing integer-valued allocations that correspond to the global optimum. However, given that only small differences were found between the two methods (BSSM and Bethel) in the applications considered, there may be little incentive to move to the BSSM method. The results obtained under Formulation D showed modest improvements over the textbook method used in the comparison.
Further research is needed to test the approach on larger problems and to assess its merits relative to other methods in other practical scenarios. An important advantage of the proposed approach is that both formulations can be implemented using off-the-shelf software, as indicated above.
Acknowledgements
This research was supported by FAPERJ, Research Grant E-26/111.947/2012.
Appendix A
Description of the survey populations considered in the numerical experiment
Table A1
Description of the populations
Population | Description | Survey variables
CoffeeFarms | Coffee farms in the state of Paraná, Brazil, from the 1996 Agricultural Census. | Number of coffee trees; Total farm area; Coffee production
SchoolsNortheast | Data from the 2012 census of schools, by school, for schools in the Northeast region of Brazil. | Number of classrooms; Number of employees
MunicSw | Information about Swiss municipalities from the package | Area of farming; Industrial area; Number of households; Population
Table A2
Stratification of the populations
Population | Stratification
CoffeeFarms | Stratified by the Number of Coffee Trees variable, using the Kozak algorithm available in the package.
SchoolsNortheast | Twelve strata were formed by crossing school type (4 classes) with school size, measured by the number of students (3 classes). School size stratification was performed using clustering algorithm within each school type.
MunicSw | This population is available from the package, and the strata correspond to regions of Switzerland.
Table A3
Number of strata, number of survey variables and total size for the survey populations considered
Population | Number of strata | Number of survey variables | Total size
CoffeeFarms | 3 | 3 | 20,472
SchoolsNortheast | 12 | 2 | 75,084
MunicSw | 7 | 4 | 2,896
Table A4
Population summaries per stratum
Stratum 1 | Stratum 2 | Stratum 3
17,821 | 2,440 | 211
4,291 | 26,688 | 218,712
22 | 84 | 488
2,671 | 13,204 | 129,033
2,873 | 15,541 | 193,366
69 | 262 | 583
4,611 | 24,704 | 200,447
Table A5
Population summaries per stratum
Stratum 1 | 82 | 45.1 | 54.0 | 309.2 | 24.9
Stratum 2 | 63 | 23.9 | 146.3 | 14.4 | 92.6
Stratum 3 | 7 | 80.9 | 700.4 | 29 | 342.5
Stratum 4 | 783 | 16.2 | 95.7 | 6.4 | 49.5
Stratum 5 | 2,676 | 10.9 | 57.7 | 21.6 | 23.7
Stratum 6 | 3,958 | 6.1 | 26.7 | 4.2 | 17.9
Stratum 7 | 2,172 | 13.6 | 76.8 | 5.7 | 27.9
Stratum 8 | 45,243 | 2.5 | 9.3 | 3 | 8.8
Stratum 9 | 9,674 | 7.7 | 38.0 | 3.2 | 17.9
Stratum 10 | 1,743 | 17.3 | 49.1 | 9.2 | 36.7
Stratum 11 | 8,445 | 7.3 | 15.3 | 4.1 | 13.5
Stratum 12 | 238 | 37.7 | 140.8 | 18.4 | 88.9
Table A6
Population summaries per stratum
Stratum 1 | Stratum 2 | Stratum 3 | Stratum 4 | Stratum 5 | Stratum 6 | Stratum 7
589 | 913 | 321 | 171 | 471 | 186 | 245
262.5 | 367.2 | 262.7 | 438.0 | 429.5 | 668.9 | 47.0
5.5 | 5.3 | 9.7 | 13.3 | 7.9 | 11.0 | 4.1
963.9 | 782.1 | 1,345.2 | 3,319.1 | 906.0 | 1,465.2 | 550.7
2,252.5 | 1,839.4 | 3,099.5 | 7,297.7 | 2,226.0 | 3,675.8 | 1,252.4
220.5 | 342.4 | 173.2 | 290.2 | 414.2 | 568.7 | 65.3
15.1 | 13.0 | 19.4 | 29.7 | 14.9 | 15.5 | 8.2
4,600.9 | 2,794.7 | 5,003.5 | 14,610.0 | 2,178.6 | 2,802.1 | 1,197.5
9,540.3 | 5,621.6 | 9,764.5 | 28,589.4 | 4,759.4 | 5,914.5 | 2,514.9
References
Ballin, M., and Barcaroli, G. (2008). Optimal stratification of sampling frames in a multivariate and multidomain sample design. Contributi ISTAT, 10.
Bazaraa, M.S., Sherali, H.D. and Shetty, C.M. (2006). Nonlinear Programming: Theory and Algorithms, Third Edition. New York: John Wiley & Sons, Inc.
Bethel, J. (1985). An optimum allocation algorithm for multivariate surveys. Proceedings of the Survey Research Methods Section, American Statistical Association, 209-212.
Bethel, J. (1989). Sample allocation in multivariate surveys. Survey Methodology, 15, 1, 47-57.
Chromy, J. (1987). Design optimization with multiple objectives. Proceedings of the Survey Research Methods Section, American Statistical Association, 194-199.
Cochran, W.G. (1977). Sampling Techniques, Third Edition. Wiley.
Day, C.D. (2010). A multi-objective evolutionary algorithm for multivariate optimal allocation. Proceedings of the Survey Research Methods Section, American Statistical Association.
Folks, J.L., and Antle, C.E. (1965). Optimum allocation of sampling units to strata when there are R responses of interest. Journal of the American Statistical Association, 60 (309), 225-233.
García, J.A.D., and Cortez, L.U. (2006). Optimum allocation in multivariate stratified sampling: Multi-objective programming. Comunicaciones del CIMAT, No. I-06-07/28-03-2006.
Huddleston, H.F., Claypool, P.L. and Hocking, R.R. (1970). Optimal sample allocation to strata using convex programming. Journal of the Royal Statistical Society, Series C, 19 (3).
Ismail, M.V., Nasser, K. and Ahmad, Q.S. (2011). Solution of a multivariate stratified sampling problem through Chebyshev's goal programming. Pakistan Journal of Statistics and Operation Research, 7, 1, 101-108.
Khan, M.G.M., and Ahsan, M.J. (2003). A note on optimum allocation in multivariate stratified sampling. The South Pacific Journal of Natural Science, 21, 91-95.
Khan, M.F., Ali, I. and Ahmad, Q.S. (2011). Chebyshev approximate solution to allocation problem in multiple objective surveys with random costs. American Journal of Computational Mathematics, 1, 247-251.
Khan, M.F., Ali, I., Raghav, Y.S. and Bari, A. (2012). Allocation in multivariate stratified surveys with non-linear random cost function. American Journal of Operations Research, 2, 100-105.
Kish, L. (1976). Optima and proxima in linear sample designs. Journal of the Royal Statistical Society, Series A, 139 (1), 80-95.
Kokan, A.R. (1963). Optimum allocation in multivariate surveys. Journal of the Royal Statistical Society, Series A, 126 (4), 557-565.
Kokan, A.R., and Khan, S. (1967). Optimum allocation in multivariate surveys: An analytical solution. Journal of the Royal Statistical Society, Series B, 29 (1), 115-125.
Kozak, M. (2006). Multivariate sample allocation: Application of random search method. Statistics in Transition, 7 (4), 889-900.
Land, A.H., and Doig, A.G. (1960). An automatic method for solving discrete programming problems. Econometrica, 28 (3), 497-520.
Lohr, S.L. (2010). Sampling: Design and Analysis, Second Edition. Brooks/Cole, Cengage Learning.
Luenberger, D.G., and Ye, Y. (2008). Linear and Nonlinear Programming, Third Edition. Springer.
Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.
Valliant, R., and Gentle, J.E. (1997). An application of mathematical programming to sample allocation. Computational Statistics & Data Analysis, 25, 337-360.
Wolsey, L.A. (1998). Integer Programming. Wiley-Interscience Series in Discrete Mathematics and Optimization.
Wolsey, L.A., and Nemhauser, G.L. (1999). Integer and Combinatorial Optimization. Wiley-Interscience Series in Discrete Mathematics and Optimization.
Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.
Copyright
Published by authority of the Minister responsible for Statistics Canada.