Adaptive rectangular sampling: An easy, incomplete, neighbourhood-free adaptive cluster sampling design

Section 1. Introduction

Adaptive cluster sampling (ACS) is an efficient design for rare and clustered populations (Thompson 1990; Thompson and Seber 1996). ACS was introduced for quadrat-based sampling, where the study area is usually partitioned into non-overlapping quadrats for sample selection. Depending on the situation, these are called “cells” or “secondary sampling units” (SSUs). In the first phase of the design, an initial sample is selected using one of the conventional designs, usually simple random sampling without replacement (SRSWOR). The term “conventional designs” (Thompson and Seber 1996) refers to designs in which the procedure for selecting the sample does not depend on any observation of the main variable, such as SRSWOR, stratified sampling and systematic sampling. If a rare event (a cell whose value meets the prespecified condition $C$, typically a threshold that the value must reach or exceed) is found after the initial sample is obtained, then sampling continues in the neighbourhood of that location with the hope of observing more rare events. The process of searching the neighbourhood is continued until no more rare events are found. This design has been shown to be useful for estimating the parameters of highly clustered and rare populations (Smith, Brown and Lo 2004). However, ACS has some disadvantages, including the following two: first, the final sample size is random and cannot be controlled in advance; second, the neighbourhood of every unit that satisfies the condition must be defined and followed.

To overcome the first problem, many designs, such as two-stage ACS and incomplete or restricted ACS (IACS), have been introduced.

Thompson (1991) introduced stratified ACS, and Salehi and Seber (1997) developed two-stage ACS. Two-stage ACS selects a fixed number of primary sampling units (PSUs) by SRSWOR in the first stage, and then a fixed number of SSUs in each selected PSU, also by SRSWOR, in the second stage. The condition for adaptation and the neighbourhood are defined in terms of secondary units (rather than primary units). Salehi and Seber considered two schemes, depending on whether the clusters are allowed to overlap primary-unit boundaries or not; the latter is more desirable for controlling the final sample size. Other related designs have also been introduced but are not discussed here because they are not relevant to this paper.

Salehi and Smith (2005) made an essential change to two-stage ACS, known as two-stage sequential sampling, and Brown et al. (2008) introduced an adaptive version of two-stage sequential sampling (ATS). In ATS, the allocation of second-stage effort among PSUs is based on preliminary information from the sampled PSUs. Additional survey effort is directed to PSUs where the SSUs in the initial sample meet a prespecified criterion, or condition $C$ (e.g., an individual from the rare population is present). More precisely, each selected PSU receives an additional SRSWOR sample whose size is $d$ times the number of units in its initial sample that satisfy condition $C$. ATS therefore almost overcomes the two problems, since the final sample size is limited and there is no need to define and follow the neighbourhood. However, ATS does not directly exploit the clustering of the population: the additional sample is a random sample of the whole respective PSU, and so ATS depends on the size, shape and location of the PSUs. Other developments of ATS have not changed its essential features, so they are not discussed here.
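The ATS allocation rule described above can be illustrated with a minimal Python sketch. It is not the implementation of Brown et al. (2008); the function name `ats_sample`, the dictionary layout of the PSUs and the rule that caps the additional sample at the number of unsampled SSUs left in a PSU are all assumptions made for illustration.

```python
import random


def ats_sample(psus, n_psu, n_initial, d, condition, rng=None):
    """Illustrative adaptive two-stage sequential sample.

    psus      : dict mapping a PSU label to the list of its SSU values
    n_psu     : number of PSUs selected by SRSWOR in the first stage
    n_initial : initial SRSWOR sample size inside each selected PSU
    d         : multiplier applied to the number of initial units meeting C
    condition : function returning True when an SSU value satisfies C
    """
    rng = rng or random.Random()
    selected = rng.sample(sorted(psus), n_psu)  # first stage: SRSWOR of PSUs
    result = {}
    for label in selected:
        values = psus[label]
        indices = list(range(len(values)))
        initial = rng.sample(indices, n_initial)  # second stage: SRSWOR of SSUs
        hits = sum(1 for i in initial if condition(values[i]))
        taken = set(initial)
        remaining = [i for i in indices if i not in taken]
        # Assumption: the additional sample cannot exceed the unsampled SSUs.
        n_extra = min(d * hits, len(remaining))
        additional = rng.sample(remaining, n_extra)  # drawn from the whole PSU
        result[label] = {"initial": initial, "additional": additional}
    return result
```

Because the additional units are drawn at random from the whole PSU, the sketch makes visible why the design does not use the spatial position of the units that met the condition.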

Turning to IACS, Brown and Manly (1998) suggested a restricted version of ACS to control final sample size. They put a limit on the final sample size prior to sampling by selecting the initial sample sequentially. Chao and Thompson (1999) and Su and Quinn (2003) imposed a restriction on the number of neighbourhood levels beyond each unit that satisfies the condition in the initial fixed-size sample. All the neighbours of the units that satisfy the condition in the initial sample are in the first neighbourhood level. In turn, all the neighbours that are to be added based on the units in the first neighbourhood level are considered to be in the second neighbourhood level, and so on. In brief, a cluster, as defined in conventional ACS, is allowed in IACS to be truncated at a predetermined distance from the unit in the initial sample that intersects it. These authors introduced a biased estimator for the population mean.
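The idea of neighbourhood levels can be made concrete with a short Python sketch of the expansion step on a grid of quadrats. It is only an illustration under stated assumptions: the function name `expand`, the parameter `max_levels` and the choice of a four-cell neighbourhood (the cells above, below, left and right) are hypothetical and are not taken from the cited papers. With `max_levels=None` the expansion continues until no more units satisfy the condition, as in conventional ACS; with a finite value the clusters are truncated.

```python
from collections import deque


def expand(grid, initial, condition, max_levels=None):
    """Return a dict mapping each sampled cell to its neighbourhood level.

    grid       : list of rows, each a list of cell values
    initial    : list of (row, col) cells in the initial sample (level 0)
    condition  : function returning True when a cell value satisfies C
    max_levels : largest neighbourhood level to add, or None for no limit
    """
    rows, cols = len(grid), len(grid[0])
    level = {cell: 0 for cell in initial}
    queue = deque(initial)
    while queue:
        r, c = queue.popleft()
        if not condition(grid[r][c]):
            continue  # unit does not satisfy C, so its neighbours are not added
        if max_levels is not None and level[(r, c)] >= max_levels:
            continue  # cluster truncated at the predetermined level
        # Assumed neighbourhood: the four cells sharing an edge with (r, c).
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in level:
                level[(nr, nc)] = level[(r, c)] + 1
                queue.append((nr, nc))
    return level


grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
full = expand(grid, [(1, 1)], lambda y: y >= 1)
restricted = expand(grid, [(1, 1)], lambda y: y >= 1, max_levels=1)
```

In this small example the restricted version stops after the first neighbourhood level, so `restricted` contains fewer cells than `full`, which is how the level limit bounds the final sample size.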

Interestingly, Yang (2011) and Yang et al. (2011) introduced an adaptive plot design to overcome the practical disadvantages of ACS, including the need to define and use the neighbourhood, especially in tree-sampling surveys. They aimed to improve the practicability of ACS while maintaining some of its major characteristics. According to Yang et al. (2011), “the plot design is based on a conditional plot expansion: a larger plot (by a pre-defined plot size factor) is installed at a sample point instead of the smaller initial plot if a pre-defined condition is fulfilled.” Their target population was a tree population, and their aim was to estimate its density (the number of objects per hectare). Their design was not planned for quadrat-based sampling. In addition, they assumed that they could survey an additional area (after they finished sampling) to calculate the inclusion probability of the required tree, but this would be impossible or costly in other surveys.

Chao and Thompson (1999) introduced IACS for the first time. This design enables ACS to function without measuring all members of a cluster. They introduced the design in graph-theory form. Because it uses neighbourhoods like ACS, the design is complicated to manage, and the calculation of inclusion probabilities is also complicated.

In this paper, a manageable version of ACS that combines positive aspects of the designs discussed above is proposed. Adaptive rectangular sampling (ARS) is a practical, efficient and easy-to-calculate adaptive design that is able to find rare events, does not need to follow the neighbourhood and controls the final sample size well.

ARS is introduced in Section 2 of the paper. Section 3 contains a real case study and some simulations, and Section 4 concludes the paper and provides a complete discussion of the advantages and disadvantages of the design.
