3 Construction of confidence bands
Hervé Cardot, Alain Dessertaine, Camelia Goga,
Étienne Josserand and Pauline Lardin
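Here we consider confidence bands for the mean curve $\mu$ that have the form
$$\left\{ \left[ \hat{\mu}(t) \pm c_{\alpha}\, \hat{\sigma}(t) \right],\ t \in [0,T] \right\},$$
where the value of the coefficient $c_{\alpha}$ is unknown and depends on the desired confidence level $1-\alpha$, and $\hat{\sigma}(t)$ is an estimator of the standard deviation of $\hat{\mu}(t)$. The calculation of $c_{\alpha}$ is based on the fact that, under certain hypotheses (Cardot et al. 2013), the process $Z_n(t) = \left(\hat{\mu}(t) - \mu(t)\right)/\hat{\sigma}(t)$, $t \in [0,T]$, converges toward a Gaussian process $Z$ in the space of continuous functions $C[0,T]$.
We then have
$$P\left( \mu(t) \in \left[ \hat{\mu}(t) \pm c_{\alpha}\, \hat{\sigma}(t) \right],\ \forall t \in [0,T] \right) \simeq P\left( \sup_{t \in [0,T]} |Z(t)| \le c_{\alpha} \right),$$
and it is therefore sufficient to determine $c_{\alpha}$, the quantile of order $1-\alpha$ of the real random variable $\sup_{t \in [0,T]} |Z(t)|$, to construct the confidence band completely.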
The distribution of the sup of Gaussian processes is known explicitly for only
a few specific cases, such as the Brownian motion.
We propose two approaches to determine the value of $c_{\alpha}$. The first is based on a direct
estimate of the standard deviation and the simulation of Gaussian processes.
The second, which does not require an estimator of the variance, is
based on resampling techniques in which both the standard deviation and the value
of $c_{\alpha}$ are obtained from bootstrap replications.
3.1 Construction of confidence bands by simulation of Gaussian processes
The steps of the algorithm are as follows:
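1. Draw a sample $s$ of size $n$ using sampling design $p(\cdot)$ and calculate the estimator $\hat{\mu}$ as well as the estimator of the covariance function $\hat{\gamma}(r,t)$, $r, t \in [0,T]$.
2. Simulate $M$ curves $Z_m$, $m = 1, \ldots, M$, of the same distribution as $Z(t) = \tilde{Z}(t)/\hat{\sigma}(t)$, where $\tilde{Z}$ is a Gaussian process of expectation 0 and of covariance function $\hat{\gamma}(r,t)$, and where $\hat{\sigma}(t) = \sqrt{\hat{\gamma}(t,t)}$.
3. Determine $c_{\alpha}$, the quantile of order $1-\alpha$ of the variables $\sup_{t \in [0,T]} |Z_m(t)|$, $m = 1, \ldots, M$.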
This algorithm, which is very fast and easy to
implement, has already been proposed in the context of i.i.d. observations by
Faraway (1997), Cuevas et al. (2006)
and Degras (2011) to construct confidence bands. A rigorous asymptotic
justification of this approach may be found in Cardot et al. (2013) for sampling in finite populations.
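Steps 2 and 3 of the simulation algorithm can be sketched as follows. This is only an illustration under stated assumptions: the curves are discretized on a common grid of $d$ time points, the estimated covariance of the mean estimator is supplied as a $d \times d$ matrix, and the function name `simulate_band_coefficient` and its parameters are hypothetical, not taken from the article.

```python
import numpy as np

def simulate_band_coefficient(gamma_hat, alpha=0.05, n_sim=10_000, seed=0):
    """Approximate the band coefficient by simulating Gaussian curves.

    gamma_hat : (d, d) array, assumed to hold the estimated covariance
                function evaluated on a grid of d time points.
    alpha     : one minus the desired confidence level.
    n_sim     : number M of simulated curves.
    """
    rng = np.random.default_rng(seed)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    d = gamma_hat.shape[0]

    # Pointwise standard deviation taken from the diagonal of the covariance.
    sigma_hat = np.sqrt(np.diag(gamma_hat))

    # Step 2: centred Gaussian curves with covariance gamma_hat,
    # standardised pointwise by sigma_hat.
    curves = rng.multivariate_normal(np.zeros(d), gamma_hat, size=n_sim)
    z = curves / sigma_hat

    # Step 3: quantile of order 1 - alpha of the sup-norms over the grid.
    sup_abs = np.abs(z).max(axis=1)
    return float(np.quantile(sup_abs, 1.0 - alpha))
```

On a grid, the supremum over $[0,T]$ is replaced by a maximum over the grid points, so the returned value is a discretized approximation. The band is then obtained as the estimated mean curve plus or minus the returned coefficient times the pointwise standard deviation.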
3.2 Construction of confidence bands by bootstrapping
In this work, we use the bootstrap method proposed by
Gross (1980) for SRSWOR sampling and the extensions proposed by Chauvet (2007)
for STRAT and πps designs. It is based on the following
principle: the sample $s$ is used to simulate a fictitious population $U^*$, from which we select a number of bootstrap
samples. The implementation of this algorithm is not straightforward when the
ratio $N/n$ is not an integer. Many variants have been proposed
in the literature to deal with the general case, and we decided to adopt the
one first proposed by Booth, Butler and Hall (1994) for the SRSWOR design.
Assume that a sample $s$ of size $n$ was selected using sampling design $p(\cdot)$, and let $\hat{\mu}$ be the estimator of $\mu$ calculated from $s$.
General bootstrap algorithm
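1. Duplicate each individual $k \in s$ $[1/\pi_k]$ times, where [.] designates the integer part. We complete the population thus obtained by selecting a sample in $s$ with an inclusion probability $\alpha_k = 1/\pi_k - [1/\pi_k]$. Let $Y_k^*$ be the value of the variable of interest in the pseudo-population $U^*$.
2. Draw $B$ samples $s_b^*$, $b = 1, \ldots, B$, of size $n$ in the pseudo-population $U^*$ using the sampling design $p$ with inclusion probabilities $\pi_k^*$ and calculate the corresponding estimators $\hat{\mu}_b^*$.
3. Estimate the function $\sigma(t)$ by the corrected empirical standard deviation of $\hat{\mu}_1^*, \ldots, \hat{\mu}_B^*$,
$$\hat{\sigma}^*(t) = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{\mu}_b^*(t) - \bar{\mu}^*(t) \right)^2}, \quad \text{where } \bar{\mu}^*(t) = \frac{1}{B} \sum_{b=1}^{B} \hat{\mu}_b^*(t).$$
4. Choose $c_{\alpha}$ as the quantile of order $1-\alpha$ of the variables $\sup_{t \in [0,T]} \left| \hat{\mu}_b^*(t) - \hat{\mu}(t) \right| / \hat{\sigma}^*(t)$, $b = 1, \ldots, B$.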
A technique similar to the one used in step 4 of the
algorithm was used by Bickel and Krieger (1989) to construct confidence bands
for a distribution function.
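The four steps of the general bootstrap algorithm can be illustrated in the simplest setting, simple random sampling without replacement, where every inclusion probability equals $n/N$ and the mean estimator reduces to the sample mean. The sketch below is an assumption-laden illustration, not the authors' implementation: curves are discretized on a common grid, the pseudo-population is completed by drawing $N - n[N/n]$ distinct units from the sample, and the name `bootstrap_band_srswor` is hypothetical.

```python
import numpy as np

def bootstrap_band_srswor(y_sample, N, alpha=0.05, n_boot=1000, seed=0):
    """Bootstrap band ingredients under SRSWOR (illustrative sketch).

    y_sample : (n, d) array of sampled curves on a common grid of d points.
    N        : population size.
    Returns (mu_hat, sigma_star, c_alpha).
    """
    rng = np.random.default_rng(seed)
    y_sample = np.asarray(y_sample, dtype=float)
    n, d = y_sample.shape
    mu_hat = y_sample.mean(axis=0)  # mean estimator under SRSWOR

    # Step 1: duplicate each unit [N/n] times, then complete the
    # pseudo-population to size N with distinct units drawn from the sample.
    reps = N // n
    extra = rng.choice(n, size=N - reps * n, replace=False)
    pseudo_idx = np.concatenate([np.repeat(np.arange(n), reps), extra])
    pseudo = y_sample[pseudo_idx]  # shape (N, d)

    # Step 2: B bootstrap samples of size n drawn without replacement.
    mu_star = np.empty((n_boot, d))
    for b in range(n_boot):
        idx = rng.choice(N, size=n, replace=False)
        mu_star[b] = pseudo[idx].mean(axis=0)

    # Step 3: corrected empirical standard deviation (denominator B - 1).
    sigma_star = mu_star.std(axis=0, ddof=1)

    # Step 4: quantile of the standardised sup-deviations from mu_hat.
    sup_stat = (np.abs(mu_star - mu_hat) / sigma_star).max(axis=1)
    c_alpha = float(np.quantile(sup_stat, 1.0 - alpha))
    return mu_hat, sigma_star, c_alpha
```

The band at each grid point is then `mu_hat - c_alpha * sigma_star` to `mu_hat + c_alpha * sigma_star`. The sketch assumes the bootstrap standard deviation is strictly positive at every grid point.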
For the SRSWOR design, the general bootstrap algorithm is applied with $\pi_k = n/N$. For the STRAT design, we
apply within each stratum $h$ the algorithm used for the SRSWOR design, with $\pi_k = n_h/N_h$. In this case, we are back to the algorithm
proposed by Booth et al. (1994).
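The stratified case can be sketched by running the SRSWOR resampling independently in each stratum and combining the stratum means with weights proportional to the stratum sizes. This is a minimal illustration: the stratum sample sizes and population sizes are assumed known, the weighting by stratum size is the usual stratified mean rather than a formula quoted from the article, and the names `pseudo_population` and `bootstrap_means_strat` are hypothetical.

```python
import numpy as np

def pseudo_population(y_h, N_h, rng):
    """Build a pseudo-population of size N_h from the stratum sample y_h."""
    n_h = y_h.shape[0]
    reps = N_h // n_h
    # Complete to size N_h with distinct units drawn from the sample.
    extra = rng.choice(n_h, size=N_h - reps * n_h, replace=False)
    idx = np.concatenate([np.repeat(np.arange(n_h), reps), extra])
    return y_h[idx]

def bootstrap_means_strat(samples, sizes, n_boot=500, seed=0):
    """Bootstrap replications of the stratified mean curve.

    samples : list of (n_h, d) arrays, one per stratum.
    sizes   : list of stratum population sizes N_h.
    Returns an (n_boot, d) array of bootstrap mean curves.
    """
    rng = np.random.default_rng(seed)
    samples = [np.asarray(y, dtype=float) for y in samples]
    N = sum(sizes)
    pseudo = [pseudo_population(y, N_h, rng) for y, N_h in zip(samples, sizes)]
    d = samples[0].shape[1]

    mu_star = np.zeros((n_boot, d))
    for b in range(n_boot):
        for y_h, u_h, N_h in zip(samples, pseudo, sizes):
            # SRSWOR of size n_h inside the stratum pseudo-population.
            idx = rng.choice(N_h, size=y_h.shape[0], replace=False)
            mu_star[b] += (N_h / N) * u_h[idx].mean(axis=0)
    return mu_star
```

The standard deviation and the band coefficient can then be computed from the returned replications exactly as in steps 3 and 4 of the general algorithm.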
The adaptation of the bootstrap algorithm to the πps design was proposed by Chauvet (2007). It
consists in selecting, during step 2 of the general algorithm, the sample $s^*$ in $U^*$ with modified inclusion probabilities $\pi_k^*$.
This change is necessary in order to comply with
the constraint of fixed size during resampling. The inclusion probabilities $\pi_k^*$ are also used to estimate $\mu$ in step 2 of the general algorithm. The
selection of a sample can be carried out
using the cube algorithm with the inclusion probabilities as the balancing variable. Under these conditions, it is
desirable to sort the population $U$ (or $U^*$) randomly before selecting $s$ (or $s^*$), in order to obtain a sampling design
close to maximum entropy (Chauvet 2007, Tillé 2011). Chauvet (2007) also gives
asymptotic results concerning the convergence of the variance estimator
obtained with the bootstrap for the πps design.
Finally, it is also possible to adapt this general
algorithm to estimate the variance function of the model-assisted estimator. In step 1 of the algorithm, we also calculate
the values of the auxiliary variable in the pseudo-population.
Using the fact that the linear-model-assisted estimator is a nonlinear function
of Horvitz-Thompson estimators, we calculate the bootstrapped value of over sample according to
where As Canty and Davison (1999) note, using the
total of the variable over the population instead of the pseudo-population yields better results, especially when this
variable has extreme values.