3 Application of the genetic algorithm to the optimal stratification problem

Marco Ballin and Giulio Barcaroli


In the GA setting, the optimal stratification problem can be represented as follows:

  • a given stratification is considered as an individual;
  • the genome of an individual is a vector whose dimension is given by the number $K$ of atomic strata; each position $i$ $(i = 1, \ldots, K)$ in the vector is associated with a given atomic stratum and contains an integer value $v_i$ $(1 \le v_i \le U)$, with $U \le K$, where $U$ is defined as the maximum number of strata in the final solution: if some elements of the vector have the same value, the corresponding atomic strata collapse into a new stratum identified by this value;
  • in this way, a stratification $P(\mathbf{v})$ can be identified by a vector $\mathbf{v} = [v_1, \ldots, v_K]$, where each value $v_i$ is positionally associated with the atomic stratum identified by the label $l_i$ and can assume any integer value in the interval $[1, U]$. The space of all potential stratifications (or partitions) $P(\mathbf{v})$ (the space of solutions) is given by all possible vectors $\mathbf{v}$;
  • the fitness function of an individual $P(\mathbf{v})$ is the value of the cost function $C\left(n_1, \ldots, n_{H_{P(\mathbf{v})}}\right) = C_0 + \sum_{h=1}^{H_{P(\mathbf{v})}} C_h n_h$, where the terms $C_0$ and $C_h$ are given constants, and the $n_1, \ldots, n_{H_{P(\mathbf{v})}}$ are calculated by applying the Bethel algorithm to the stratification, under precision constraints set on the target variables.
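The genome representation described above can be illustrated with a minimal Python sketch. The function name `decode_genome` and the labels `l1` to `l5` are hypothetical and are used only for illustration; they do not come from the paper or from any package.

```python
from collections import defaultdict

def decode_genome(genome, labels):
    """Group atomic-stratum labels by the value stored at their position."""
    if len(genome) != len(labels):
        raise ValueError("genome and labels must have the same length K")
    groups = defaultdict(list)
    for value, label in zip(genome, labels):
        groups[value].append(label)
    # Sort by group value so that the output order is deterministic.
    return {value: groups[value] for value in sorted(groups)}

labels = ["l1", "l2", "l3", "l4", "l5"]   # K = 5 atomic strata
genome = [2, 1, 2, 3, 1]                  # values in [1, U] with U = 3
partition = decode_genome(genome, labels)
print(partition)  # {1: ['l2', 'l5'], 2: ['l1', 'l3'], 3: ['l4']}
```

Atomic strata that share a value collapse into one aggregate stratum, so this genome yields three strata out of five atomic ones.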

It is worth noting that, if we set $C_0 = 0$ and $C_h = 1$ for all the atomic strata, then the value of the cost function simply coincides with the sample size required to satisfy the precision constraints.

Having defined a suitable representation of the domain of all possible solutions, and the fitness function to be calculated for each solution, we now describe how the GA operates.

Step 0: Creation of the initial generation of individuals

The first step consists in forming an initial set of different stratifications (the initial generation of individuals): on the basis of the value of the parameter size of the generations, $p$ different individuals are generated. This means that, for the $j^{\text{th}}$ individual, $K$ integer values (one for each element of the vector representing the genome) are randomly generated from a uniform distribution on the interval $[1, U]$. By fixing $U \le K$ we set an upper limit on the number of distinct aggregate strata.
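As a hedged illustration of Step 0, the following Python sketch draws $p$ genomes of length $K$ with values uniform on $[1, U]$. The function name `initial_generation` is hypothetical, and the sketch does not check that the generated individuals are all distinct.

```python
import random

def initial_generation(p, K, U, rng):
    """Return p genomes, each with K integers drawn uniformly from 1..U."""
    if not 1 <= U <= K:
        raise ValueError("U must satisfy 1 <= U <= K")
    return [[rng.randint(1, U) for _ in range(K)] for _ in range(p)]

rng = random.Random(42)  # seeded only to make the illustration repeatable
population = initial_generation(p=4, K=6, U=3, rng=rng)
for genome in population:
    # The number of distinct values is the number of aggregate strata.
    print(genome, "->", len(set(genome)), "distinct strata")
```

Whatever the random draw, no genome can encode more than $U$ aggregate strata, which is the upper limit mentioned above.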

Step 1: Evaluation of fitness for each individual in the population

For each individual in the population (that is, for each of the $p$ stratifications), its fitness is evaluated by calculating the total cost required to satisfy the precision constraints on the $G$ different estimates $\hat{T}_g$. In order to remove the dependence on the scale (or range) of the values of the $G$ target variables, instead of treating the constraints expressed in (2.7) as upper limits on the variances of the target variables, we set constraints on their coefficients of variation, $\mathrm{CV} = \sqrt{\mathrm{var}(\hat{T}_g)} / \hat{T}_g$. The evaluation is carried out by applying the Bethel algorithm, which requires as input, for each stratum of the current solution:

  • means and standard deviations of target variables;
  • cost of interviewing per unit;
  • population (number of units).

Each one of the above items is computed on the basis of corresponding values in the atomic strata.

Let us consider a particular partition $P(\mathbf{v})$ of $L$ determined by a given solution $\mathbf{v} = [v_1, \ldots, v_K]$. Let $D_i$ $(i = 1, 2, \ldots, Q_{P(\mathbf{v})})$ be one stratum in this partition. There are two possibilities:

  1. $D_i$ coincides with an atomic stratum $l_k$;
  2. $D_i = \{ l_j^i, \ldots, l_l^i \}$ is the result of the aggregation of a subset $\{ l_j^i, \ldots, l_l^i \}$ of the atomic strata.

 

In the first case, the means and variances of the target variables in the stratum are known. In the second case, the means and variances in $D_i$ can be calculated using the following formulas:

$$\bar{Y}_{g, D_i} = \frac{\sum_{l_k \in D_i} \bar{Y}_{g, l_k} N_{l_k}}{\sum_{l_k \in D_i} N_{l_k}} \qquad (3.1)$$

$$S^2_{g, D_i} = \left( \sum_{l_k \in D_i} N_{l_k} - 1 \right)^{-1} \left\{ \sum_{l_k \in D_i} \left( N_{l_k} - 1 \right) S^2_{g, l_k} + \sum_{l_k \in D_i} N_{l_k} \left( \bar{Y}_{g, l_k} - \bar{Y}_{g, D_i} \right)^2 \right\} \qquad (3.2)$$

where:

$\bar{Y}_{g, D_i}$ and $\bar{Y}_{g, l_k}$ are the mean values in aggregated stratum $D_i$ and atomic strata $l_k$;

$N_{l_k}$ is the number of units in atomic stratum $l_k$;

$S^2_{g, D_i}$ and $S^2_{g, l_k}$ are the variances in aggregated stratum $D_i$ and atomic strata $l_k$.

The expected cost of observing a unit in a given aggregate stratum is calculated by averaging the costs in each contributing atomic stratum, weighted by their population:

$$C_{D_i} = \frac{\sum_{l_k \in D_i} C_{l_k} N_{l_k}}{\sum_{l_k \in D_i} N_{l_k}} \qquad (3.3)$$

Finally, we can compute the population in any aggregate stratum as the sum of the units in the contributing atomic strata:

$$N_{D_i} = \sum_{l_k \in D_i} N_{l_k} \qquad (3.4)$$
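Formulas (3.1) to (3.4) can be checked numerically with a short Python sketch for a single target variable. The dictionary keys (`N`, `mean`, `var`, `cost`) and the helper names are hypothetical choices made for this illustration; the raw unit values are invented only to verify that the aggregated variance matches the variance computed directly on the pooled units.

```python
import math
import statistics

def aggregate_stratum(atoms):
    """Combine atomic strata into one aggregate stratum.

    Each atom is a dict with keys N (units), mean, var (S^2) and cost.
    """
    N = sum(a["N"] for a in atoms)                                   # (3.4)
    mean = sum(a["mean"] * a["N"] for a in atoms) / N                # (3.1)
    within = sum((a["N"] - 1) * a["var"] for a in atoms)
    between = sum(a["N"] * (a["mean"] - mean) ** 2 for a in atoms)
    var = (within + between) / (N - 1)                               # (3.2)
    cost = sum(a["cost"] * a["N"] for a in atoms) / N                # (3.3)
    return {"N": N, "mean": mean, "var": var, "cost": cost}

def summarize(values, cost):
    """Build an atomic-stratum record from raw unit values (illustrative helper)."""
    return {"N": len(values), "mean": statistics.mean(values),
            "var": statistics.variance(values), "cost": cost}

group_a = [1.0, 2.0, 3.0]
group_b = [10.0, 12.0, 14.0, 16.0]
merged = aggregate_stratum([summarize(group_a, 2.0), summarize(group_b, 4.0)])
pooled = group_a + group_b

print(merged)
# The aggregated variance agrees with the variance of the pooled units.
print(math.isclose(merged["var"], statistics.variance(pooled)))  # True
```

Because only counts, means, variances and costs of the atomic strata are needed, an aggregate stratum can be evaluated without going back to the unit-level data.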

So, for each potential solution, we are able to calculate dynamically all the information required to apply the optimal allocation algorithm, which produces the total cost

$$C\left(n_1, \ldots, n_{H_{P(\mathbf{v})}}\right) = C_0 + \sum_{h=1}^{H_{P(\mathbf{v})}} C_h n_h$$

that is, the fitness of the individual.
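To make the link between allocation and fitness concrete, here is a hedged Python sketch for the special case of a single target variable. It is not the Bethel algorithm: it swaps in the textbook closed-form cost-optimal allocation under one variance constraint, returns continuous (unrounded) sample sizes, and ignores the bounds $n_h \le N_h$ and any minimum number of units per stratum. All function names and the example figures are assumptions made for illustration.

```python
import math

def allocate_single_variable(strata, cv_target):
    """Continuous cost-optimal sample sizes for ONE target variable.

    Stand-in for the Bethel algorithm (which handles several variables).
    Each stratum is a dict with N, mean, var (S^2) and cost.
    """
    total = sum(s["N"] * s["mean"] for s in strata)
    v_max = (cv_target * total) ** 2          # allowed variance of the total
    a = sum(s["N"] * math.sqrt(s["var"] * s["cost"]) for s in strata)
    b = v_max + sum(s["N"] * s["var"] for s in strata)
    return [s["N"] * math.sqrt(s["var"] / s["cost"]) * a / b for s in strata]

def total_cost(strata, n, c0=0.0):
    """C = C_0 + sum_h C_h n_h, the fitness of the stratification."""
    return c0 + sum(s["cost"] * n_h for s, n_h in zip(strata, n))

def cv_of_total(strata, n):
    """CV of the stratified estimator of the total (with finite population correction)."""
    total = sum(s["N"] * s["mean"] for s in strata)
    var = sum(s["N"] ** 2 * (1 / n_h - 1 / s["N"]) * s["var"]
              for s, n_h in zip(strata, n))
    return math.sqrt(var) / total

# Hypothetical strata with unit costs, so that cost equals sample size.
strata = [
    {"N": 1000, "mean": 50.0, "var": 400.0, "cost": 1.0},
    {"N": 500, "mean": 200.0, "var": 10000.0, "cost": 1.0},
]
n = allocate_single_variable(strata, cv_target=0.05)
print("sample sizes:", n)
print("fitness (total cost):", total_cost(strata, n))
print("achieved CV:", cv_of_total(strata, n))
```

With $C_0 = 0$ and $C_h = 1$ the fitness equals the total sample size, and the achieved CV sits on the constraint, which is what a cost-minimizing allocation is expected to do.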

 Step 2: Breeding a new generation

Once the fitness of each individual has been evaluated, a proportion of them is selected to breed a new generation. Selection is fitness-based: fitter individuals (here, stratifications with a lower total cost) are more likely to be selected, while only a small proportion of less fit individuals is selected. This second component helps to keep the diversity of the generation large enough, preventing premature convergence on poor solutions. It is also possible to indicate the number of best individuals (expressed as a percentage of the size $p$ of the generation) that must in any case be carried over to the next generation (parameter elitism).
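The selection and elitism mechanisms can be sketched in Python as follows. The text does not specify how a cost is turned into a selection probability, so the inverse-cost weighting used here is an assumption, as are the function names; costs are assumed to be strictly positive, and the two parents are drawn with replacement, so they may coincide.

```python
import random

def select_elite(population, costs, elitism_rate):
    """Return copies of the lowest-cost individuals, kept unchanged for the next generation."""
    n_elite = round(elitism_rate * len(population))
    ranked = sorted(range(len(population)), key=lambda j: costs[j])
    return [list(population[j]) for j in ranked[:n_elite]]

def select_parents(population, costs, rng):
    """Draw two parents; a lower cost gives a higher chance of being drawn."""
    weights = [1.0 / c for c in costs]  # assumed weighting, not taken from the paper
    return rng.choices(population, weights=weights, k=2)

population = [[1, 1, 2], [1, 2, 2], [2, 1, 2], [1, 2, 1], [2, 2, 1]]
costs = [120.0, 95.0, 150.0, 101.0, 98.0]

elite = select_elite(population, costs, elitism_rate=0.4)
print(elite)  # [[1, 2, 2], [2, 2, 1]]

rng = random.Random(0)
parent_a, parent_b = select_parents(population, costs, rng)
print(parent_a, parent_b)
```

Weighted sampling still gives high-cost individuals a small chance of becoming parents, which is the diversity-preserving component mentioned above.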

The next generation will thus be composed of a number of individuals from the previous generation (the best ones), plus a number of "children", obtained by selecting and crossing "parents" from the current generation. In the GA approach, the genome of a "child" individual is formed using the crossover and mutation operators:

  • crossover: many crossover techniques exist for GAs, which use different data structures and different criteria for selecting chromosomes, but the general approach is to exchange a subset of chromosomes between two parents. In our implementation, once two parents have been selected with probability proportional to their fitness, a crossover point is generated at random. This crossover point is an integer belonging to the interval $[1, K]$. Let $c$ be the generated crossover point: the child individual is then formed by inheriting the first $c$ chromosomes from the first parent and the remaining $(K - c)$ chromosomes from the second parent;
  • mutation: given the probability that an arbitrary value in a genetic sequence will be changed from its original state (mutation chance), the GA draws, for each chromosome in the genome, a random value to decide whether the value will be changed or not.
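The two operators can be sketched in Python as below. The crossover follows the single-point rule described above; for mutation, the text does not say how the replacement value is chosen, so drawing it uniformly from $[1, U]$ (which may occasionally reproduce the old value) is an assumption. Function names are hypothetical.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: first c genes from parent_a, remaining K - c from parent_b."""
    K = len(parent_a)
    c = rng.randint(1, K)  # crossover point in [1, K]
    return parent_a[:c] + parent_b[c:]

def mutate(genome, mut_chance, U, rng):
    """Independently replace each gene with probability mut_chance."""
    return [rng.randint(1, U) if rng.random() < mut_chance else v
            for v in genome]

rng = random.Random(1)
parent_a = [1, 1, 1, 1, 1, 1]
parent_b = [2, 2, 2, 2, 2, 2]

child = crossover(parent_a, parent_b, rng)
print(child)  # a run of 1s followed by a (possibly empty) run of 2s

mutated = mutate(child, mut_chance=0.1, U=3, rng=rng)
print(mutated)
```

When $c = K$ the child is a copy of the first parent, so in that case only mutation can introduce a difference.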

By applying the above methods of crossover and mutation, a new individual is created which typically shares many of the characteristics of its "parents". New parents are selected to produce new children, and the process continues until a new generation of individuals (stratifications) of appropriate size is generated.

Step 3: Iteration and stopping criteria

Usually, the average fitness increases from one generation to the next. Steps 1 and 2 are repeated until a termination condition has been reached. Common terminating conditions are:

  1. the maximum number of iterations has been reached;
  2. a "plateau" has been reached, such that successive iterations no longer produce better results;
  3. a combination of the above.

In our case, the terminating condition can be considered as a combination of the above. In practice, the rule used is the maximum number of iterations, but this number is determined by analysing previous runs, in order to detect the "plateau" and be reasonably confident that additional iterations are unlikely to improve the final solution.
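One possible way to read a "plateau" off the trace of a previous run is sketched below in Python. It is a hypothetical helper, not part of any package: `history` is assumed to hold the best objective value found up to each iteration, with lower values being better, and `window` and `tol` are illustrative tuning choices.

```python
def plateau_start(history, window=3, tol=0.0):
    """Return the first iteration after which the best value stops improving.

    An iteration i starts a plateau if none of the next `window` values
    improves on history[i] by more than `tol`. Returns None if no plateau
    is found within the recorded history.
    """
    for i in range(len(history) - window):
        later_best = min(history[i + 1 : i + 1 + window])
        if history[i] - later_best <= tol:
            return i
    return None

# Hypothetical trace of the best objective value per iteration.
history = [100, 90, 85, 84, 84, 84, 84, 84]
print(plateau_start(history))  # 3
```

The index returned for earlier runs, plus some safety margin, could then be used as the maximum number of iterations for subsequent runs.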

Critical parameters of the optimal stratification algorithm

Here a distinction is made between the parameters that are common to any genetic algorithm and those that are specific to the particular problem to which it is applied, i.e., the optimal stratification of a population frame (the names of the parameters are those used in the R package SamplingStrata).

Among the first we list:

  • size of generation of individuals (pop);
  • number of iterations (iterations);
  • mutation chance (mut_chance);
  • elitism (elitism_rate).

Instead, the context parameters are:

  • minimum number of units per stratum (minnumstrat): the Bethel algorithm is forced to allocate in each stratum at least the number of units indicated by this parameter;
  • initial number of strata (initialStrata);
  • possibility to increase the maximum number of strata (addStrataFactor).

As for the first group, there are no strict rules to assign values to these parameters. Given a particular problem, it is suggested to carry out a number of trials in order to assess the sensitivity of the solutions to the values of the parameters.

It is important to take into account that parameters such as the size of the generation and elitism generally affect the speed of convergence rather than the final solution, provided that a "reasonable" number of iterations is set.

Whether the number of iterations is reasonable can be assessed by analysing the behaviour of the fitness function: if the values of this function are no longer decreasing after a certain number of iterations, it is reasonable to expect that increasing the number of iterations will not produce better results.

By contrast, the value of the mutation chance affects both the speed of convergence and the quality of the final solution: a high mutation chance helps to avoid local minima, at the cost of slower convergence.

Conversely, the parameters of the second group should be set on the basis of practical considerations, related to the characteristics and requirements of the survey under design.

As for the parameter minimum number of units per stratum, if an adequate number of observations in all strata is to be ensured (in order to take into account the expected non-response, the need to calculate sampling variance, fieldwork reasons, etc.), a value can be set higher than the default one (which is set to 2).

The parameter initial number of strata is very important. First of all, its value, if associated with a value of the parameter addStrataFactor equal to zero, determines the maximum acceptable number of strata in the final solution. This possibility may be useful not only for fieldwork reasons (if, for example, the number of strata is to be limited for organizational considerations), but especially because the final solution is very sensitive to the value of this parameter. Our experiments show that running the algorithm with different values of initialStrata, from low values up to the maximum given by the number of atomic strata, can yield very different solutions. It is possible to let the algorithm choose for us, in this way: we assign a low value to initialStrata, together with a high value of the parameter addStrataFactor. The parameter addStrataFactor (equal to 0 by default) is used to increase dynamically the value set by the parameter initialStrata: each time a mutation takes place, a random number between 0 and 1 is generated, and if it is greater than the quantity (1 - addStrataFactor), the maximum number of strata is increased by one unit. By manoeuvring these two parameters, there are different possibilities:

  1. for any given value of initialStrata, if addStrataFactor is set equal to 0, then the algorithm has to consider that value as a fixed limit, and all solutions to be explored will be characterised by that maximum number of strata;
  2. otherwise, if addStrataFactor is set to a value greater than 0, then the algorithm may explore solutions varying the number of strata, from an initial value given by initialStrata, up to a maximum number given by the number of atomic strata.
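The interplay between initialStrata and addStrataFactor can be illustrated with a short Python sketch. It is a simplified reading of the rule stated above, not the SamplingStrata implementation: the function name is hypothetical, the order in which the random numbers are drawn is an assumption, and a mutated position is assumed to receive a label drawn uniformly from 1 to the current maximum number of strata.

```python
import random

def mutate_with_growth(genome, mut_chance, max_strata, add_strata_factor, rng):
    """Mutate a genome, possibly raising the maximum number of strata.

    Each time a mutation takes place, a number in [0, 1) is drawn; if it
    exceeds (1 - add_strata_factor), max_strata grows by one, but never
    beyond the number of atomic strata (the genome length).
    """
    K = len(genome)
    child = list(genome)
    for i in range(K):
        if rng.random() < mut_chance:  # a mutation takes place here
            if rng.random() > 1.0 - add_strata_factor and max_strata < K:
                max_strata += 1
            child[i] = rng.randint(1, max_strata)  # assumed uniform redraw
    return child, max_strata

rng = random.Random(1)
genome = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]  # 10 atomic strata, 2 initial strata

# Case 1: add_strata_factor = 0, the initial number of strata is a fixed limit.
fixed_child, fixed_max = mutate_with_growth(genome, 0.5, 2, 0.0, rng)
print(fixed_max)

# Case 2: add_strata_factor > 0, the maximum may grow up to 10.
grown_child, grown_max = mutate_with_growth(genome, 0.5, 2, 0.5, rng)
print(grown_max)
```

With a factor of 0 the comparison can never succeed, because the drawn number is always below 1, so the limit stays fixed, which corresponds to the first case in the list above.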
