5. Non-optimal properties of existing methods

Sun Woong Kim, Steven G. Heeringa and Peter W. Solenberger

Previous | Next

As described in Section 4, the algorithms for controlled selection may be divided into two parts, manual algorithms before 1980s and computer-intensive algorithms since then. For large controlled selection problems with many cells, the latter class of algorithms may be preferred. But when the problem is small, the former can be easily used without the complexity of the latter. Therefore we would not say that the former is always inferior to the latter. More objective criteria for comparing them would be necessary, and the optimal solution may be adopted as one of the better criteria to compare their strengths or weaknesses.

As discussed by Jessen (1978, pages 375-376), the algorithms of Jessen (1970) aim to minimize the number of arrays in a solution set B MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaWefv 3ySLgzgjxyRrxDYbqeguuDJXwAKbIrYf2A0vNCaGqbbiqb=fa8czaa faaaaa@4488@ , and the algorithm of Jessen (1978) quite easily achieves that purpose relative to those of Jessen (1970). Thus his algorithms pursue “simplicity” in formulating a solution rather than an optimal solution.

The algorithm of Causey et al. (1985) may give a “partially” optimal solution. Other than the original problem, A MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam yqaaaa@38C8@ , it sequentially creates a small number of new controlled selection problems, and then as a solution it finds only one array B k ( B ) MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam OqamaaBaaaleaacaWGRbaabeaakiaayIW7caGGOaGaeyicI48efv3y SLgzgjxyRrxDYbqeguuDJXwAKbIrYf2A0vNCaGqbbiab=fa8cjaacM caaaa@4AD7@ to be nearest to each problem, starting with A MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam yqaaaa@38C8@ . Each problem is regarded as the transportation problem of Cox and Ernst (1982), which is formed by the objective function mimicking the behavior of

i = 1 R j = 1 C | b i j k a i j | p ,   k = 1 , , L ,   1 p < . ( 5.1 ) MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9 vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=x fr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaWaaabCaeaada aeWbqaamaaemaabaGaamOyamaaBaaaleaacaWGPbGaamOAaiaadUga aeqaaOGaeyOeI0IaamyyamaaBaaaleaacaWGPbGaamOAaaqabaaaki aawEa7caGLiWoadaahaaWcbeqaaiaadchaaaaabaGaamOAaiabg2da 9iaaigdaaeaacaWGdbaaniabggHiLdaaleaacaWGPbGaeyypa0JaaG ymaaqaaiaadkfaa0GaeyyeIuoakiaacYcacaqGGaGaam4Aaiabg2da 9Gqaaiaa=fdacaGGSaGaeSOjGSKaaiilaiaadYeacaGGSaGaaeiiai aa=fdacqGHKjYOcaWGWbGaeyipaWJaeyOhIuQaaiOlaiaaywW7caaM f8UaaGzbVlaaywW7caaMf8UaaGzbVlaaywW7caaMf8Uaaiikaiaaiw dacaGGUaGaaGymaiaacMcaaaa@6C9D@

Note that since function (5.1) violates the triangle inequality axiom (iii), it is not a distance function. It needs the inclusion of the p -th MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaaeaa aaaaaaa8qacaWGWbGaaeylaiaabshacaqGObaaaa@3BA8@ root to be a distance function. Also, each p ( B k ) MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaaeaa aaaaaaa8qacaWGWbWdamaabmaabaWdbiaadkeapaWaaSbaaSqaa8qa caWGRbaapaqabaaakiaawIcacaGLPaaaaaa@3CD9@  is calculated by a simple formula. In view of the optimality requirements given by R1 and R2 the Causey et al. algorithm has the following weaknesses: 1) Since other controlled selection problems in addition to the original problem A MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam yqaaaa@38C8@ are involved, it is difficult to obtain the solution consistently based on the closeness between the unique A MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam yqaaaa@38C8@ and every individual B k MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam OqamaaBaaaleaacaWGRbaabeaaaaa@39E5@ in B MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaWefv 3ySLgzgjxyRrxDYbqeguuDJXwAKbIrYf2A0vNCaGqbbiab=fa8cbaa @447C@ ; 2) The maximization of the probabilities of selection for the arrays nearest to A MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam yqaaaa@38C8@ is not guaranteed.

Winkler (2001) presented a modification of the method of Causey et al. (1985). Instead of using the transportation problem, he proposed integer linear programming, resulting in slight changes of the p ( B k ) . MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaaeaa aaaaaaa8qacaWGWbWdamaabmaabaWdbiaadkeapaWaaSbaaSqaa8qa caWGRbaapaqabaaakiaawIcacaGLPaaapeGaaiOlaaaa@3D9B@ Nevertheless, the Winkler (2001) algorithm is not free from the weaknesses of the Causey et al. (1985) method.

Adopting a network flow problem approach, the Huang and Lin (1998) algorithm imposes the additional subgroup constraints in A MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam yqaaaa@38C8@ , raised by Goodman and Kish (1950). However, it does not attain objectives R1 and R2, just as in Causey et al. (1985) and Winkler (2001), since a new network, instead of a new controlled selection problem, is generated at every iteration, an arbitrary B k ( B ) MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam OqamaaBaaaleaacaWGRbaabeaakiaaykW7daqadaqaaiabgIGioprr 1ngBPrMrYf2A0vNCaeHbfv3ySLgzGyKCHTgD1jhaiuqacqWFbaVqai aawIcacaGLPaaaaaa@4B01@ is obtained as a solution to the network, and p ( B k ) MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam iCamaabmqabaGaamOqamaaBaaaleaacaWGRbaabeaaaOGaayjkaiaa wMcaaaaa@3C6E@ is calculated by a simple formula.

In contrast, the LP algorithms proposed by Sitter and Skinner (1994) and Tiwari and Nigam (1998) use all possible arrays in B MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaWefv 3ySLgzgjxyRrxDYbqeguuDJXwAKbIrYf2A0vNCaGqbbiab=fa8cbaa @447C@ . Note that finding all those arrays is an important issue, and that p ( B k ) MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam iCamaabmaabaGaamOqamaaBaaaleaacaWGRbaabeaaaOGaayjkaiaa wMcaaaaa@3C6D@ for all possible arrays are simultaneously obtained by running the software for LP only once. The key idea underlying the algorithm of Sitter and Skinner (1994) is to use a “loss function” defined by

i = 1 R ( b i . k a i . ) 2 + j = 1 C ( b . j k a . j ) 2 . ( 5.2 ) MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaWaaa bCaeaadaqadaqaaiaadkgadaWgaaWcbaGaamyAaiaac6cacaWGRbaa beaakiabgkHiTiaadggadaWgaaWcbaGaamyAaiaac6caaeqaaaGcca GLOaGaayzkaaWaaWbaaSqabeaacaaIYaaaaaqaaiaadMgacqGH9aqp caaIXaaabaGaamOuaaqdcqGHris5aOGaey4kaSYaaabCaeaadaqada qaaiaadkgadaWgaaWcbaGaaiOlaiaadQgacaWGRbaabeaakiabgkHi TiaadggadaWgaaWcbaGaaiOlaiaadQgaaeqaaaGccaGLOaGaayzkaa WaaWbaaSqabeaacaaIYaaaaaqaaiaadQgacqGH9aqpcaaIXaaabaGa am4qaaqdcqGHris5aOGaaiOlaiaaywW7caaMf8UaaGzbVlaaywW7ca aMf8UaaGzbVlaaywW7caaMf8UaaiikaiaaiwdacaGGUaGaaGOmaiaa cMcaaaa@689D@

In terms of R1 and R2, their algorithm has the following disadvantages: 1) The closeness between A MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam yqaaaa@38C8@ and B k MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam OqamaaBaaaleaacaWGRbaabeaaaaa@39E5@ is not well captured by loss function (5.2). This is because it is not a distance function that satisfies axiom (iii), as the marginal totals are used, instead of the cell entries; 2) Loss function (5.2) is irrelevant to the maximization of the probabilities of selection over the arrays nearest to A MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam yqaaaa@38C8@ in Problems 2.1, 2.2, and 2.4, since it is always zero.

The LP method of Tiwari and Nigam (1998) can be used to reduce the selection probabilities of non-preferred arrays (e.g., arrays not containing the PSU corresponding to the cell i j = 23 MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbcvPDwzYbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKk Fr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam yAaiaadQgacqGH9aqpcaaIYaGaaG4maaaa@3C5E@ in Problem 2.1), which are initially determined by the samplers. For controlled selection problems with integer margins and without considering the non-preferred arrays, their method will give the same solutions as that of Sitter and Skinner (1994).

The solutions from these previous methods will be compared with those from the proposed method in Section 6, on several examples in Section 8.

Previous | Next

Date modified: