3 Construction of confidence bands

Hervé Cardot, Alain Dessertaine, Camelia Goga, Étienne Josserand and Pauline Lardin

We consider confidence bands for the mean curve $μ$ of the form

$ℙ (μ (t) \in [\hat{μ} (t) \pm c_{α} \hat{σ} (t)], \forall t \in [0, T]) = 1 - α, \qquad (3.1)$

where the coefficient $c_{α}$ is unknown and depends on the desired confidence level $1 - α$, and $\hat{σ} (t)$ is an estimator of the standard deviation of $\hat{μ} (t)$. The calculation of $c_{α}$ relies on the fact that, under certain assumptions (Cardot et al. 2013), the process

$Z (t) = (\hat{μ} (t) - μ (t)) / \hat{σ} (t), t \in [0, T],$

converges toward a Gaussian process in the space of continuous functions $C ([0, T])$ . We then have

$ℙ (\sup_{t \in [0, T]} | Z (t) | \leq c_{α}) = ℙ (μ (t) \in [\hat{μ} (t) \pm c_{α} \hat{σ} (t)], \forall t \in [0, T]) \qquad (3.2)$

and it is therefore sufficient to determine $c_{α}$, the quantile of order $1 - α$ of the real random variable $\sup_{t \in [0, T]} | Z (t) |$, to construct the confidence band completely. The distribution of the supremum of a Gaussian process is known explicitly in only a few specific cases, such as Brownian motion, so $c_{α}$ must in general be approximated numerically.

We propose two approaches to determine the value of $c_{α}$ . The first is based on a direct estimate of the standard deviation and the simulation of Gaussian processes $Z (t)$ . The second, which does not require having an estimator of the variance, is based on resampling techniques where both the standard deviation and the value of $c_{α}$ are obtained from bootstrap replications.

3.1 Construction of confidence bands by simulation of Gaussian processes

The steps of the algorithm are as follows:

1. Draw a sample $s$ of size $n$ using the sampling design $p$ and calculate the estimator $\hat{μ}$ as well as the estimator $\hat{γ} (r, t)$ of the covariance function $γ (r, t)$, $r, t \in [0, T]$.

2. Simulate $M$ curves $Z_{m}$, $m = 1, \dots, M$, with the same distribution as $Z$, where $Z$ is a Gaussian process with mean 0 and covariance function $ρ$ given by $ρ (r, t) = \hat{γ} (r, t) / {(\hat{γ} (r, r) \hat{γ} (t, t))}^{1 / 2}$, $r, t \in [0, T]$.

3. Determine $c_{α}$, the quantile of order $1 - α$ of the variables ${(\sup_{t \in [0, T]} | Z_{m} (t) |)}_{m = 1, \dots, M}$.

This algorithm, which is very fast and easy to implement, has already been proposed in the context of i.i.d. observations by Faraway (1997), Cuevas et al. (2006) and Degras (2011) to construct confidence bands. A rigorous asymptotic justification of this approach may be found in Cardot et al. (2013) for sampling in finite populations.
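The three steps above can be sketched as follows when the curves are observed on a finite time grid. This is a minimal illustration and not the authors' implementation: the function name `simulate_band_coefficient`, the argument `gamma_hat` (the estimated covariance function evaluated on the grid, as a matrix) and the use of an eigendecomposition to simulate the Gaussian vectors are all assumptions made for the example.

```python
import numpy as np

def simulate_band_coefficient(gamma_hat, alpha=0.05, n_sim=10000, seed=0):
    """Approximate c_alpha from a covariance matrix estimated on a time grid.

    gamma_hat : (T, T) array, hypothetical discretised estimate of gamma(r, t).
    """
    rng = np.random.default_rng(seed)
    gamma_hat = np.asarray(gamma_hat, dtype=float)

    # Step 2a: correlation function rho(r, t) = gamma(r, t) / sqrt(gamma(r, r) gamma(t, t)).
    sd = np.sqrt(np.diag(gamma_hat))
    rho = gamma_hat / np.outer(sd, sd)

    # Step 2b: simulate n_sim centred Gaussian vectors with covariance rho.
    # An eigendecomposition is used (an assumption of this sketch) because it
    # tolerates a rank-deficient estimate, unlike a Cholesky factorisation.
    eigval, eigvec = np.linalg.eigh(rho)
    eigval = np.clip(eigval, 0.0, None)
    root = eigvec * np.sqrt(eigval)          # root @ root.T equals rho
    z = rng.standard_normal((n_sim, rho.shape[0])) @ root.T

    # Step 3: empirical quantile of order 1 - alpha of sup_t |Z_m(t)|.
    sup_abs = np.abs(z).max(axis=1)
    return float(np.quantile(sup_abs, 1.0 - alpha))
```

With the returned coefficient `c`, the band on the grid would be `mu_hat ± c * np.sqrt(np.diag(gamma_hat))`, where `mu_hat` is the discretised estimator of the mean curve.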

3.2 Construction of confidence bands by bootstrapping

In this work, we use the bootstrap method proposed by Gross (1980) for SRSWOR sampling and the extensions proposed by Chauvet (2007) for STRAT and $π p s$ designs. It is based on the following principle: the sample $s$ is used to simulate a fictitious population $U^{*}$ from which we select a number of bootstrap samples. The implementation of this algorithm is not straightforward when the ratio $1 / π_{k}$ is not an integer. Many variants have been proposed in the literature to deal with the general case, and we decided to adopt the one first proposed by Booth, Butler and Hall (1994) for the SRSWOR design.

Assume that sample $s$ of size $n$ was selected using sampling design $p$ and let $\hat{μ}$ be the estimator of $μ$ calculated from $s .$

General bootstrap algorithm

1. Duplicate each individual $k \in s$ $[1 / π_{k}]$ times, where $[\cdot]$ denotes the integer part. Complete the population thus obtained by selecting a sample from $s$ with inclusion probabilities $α_{k} = 1 / π_{k} - [1 / π_{k}]$. Let $Y_{k}^{*}$, $k \in U^{*}$, be the values of the variable of interest in the pseudo-population.

2. Draw $M$ samples $s_{m}^{*},$ $m = 1, \dots, M$ of size $n$ in the pseudo-population $U^{*}$ using the sampling design $p^{*}$ with inclusion probabilities $π_{k}^{*}$ and calculate

${\hat{μ}}_{m}^{*} (t) = \frac{1}{N} \sum_{k \in s_{m}^{*}} \frac{Y_{k}^{*} (t)}{π_{k}^{*}}, \quad t \in [0, T] \text{ and } m = 1, \dots, M .$

3. Estimate the function $\hat{σ} (t)$ by the corrected empirical standard deviation of ${\hat{μ}}_{m}^{*} (t), m = 1, \dots, M,$

${\hat{σ}}^{2} (t) = \frac{1}{M - 1} \sum_{m = 1}^{M} {({\hat{μ}}_{m}^{*} (t) - {\hat{μ}}_{•}^{*} (t))}^{2},$

where

${\hat{μ}}_{•}^{*} (t) = \frac{1}{M} \sum_{m = 1}^{M} {\hat{μ}}_{m}^{*} (t) \quad \text{and} \quad t \in [0, T] .$

4. Choose $c_{α}$ as the quantile of order $1 - α$ of the variables

${(\sup_{t \in [0, T]} \frac{| {\hat{μ}}_{m}^{*} (t) - \hat{μ} (t) |}{\hat{σ} (t)})}_{m = 1,..., M} .$

A technique similar to the one used in step 4 of the algorithm was used by Bickel and Krieger (1989) to construct confidence bands for a distribution function.
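As an illustration, steps 1 to 4 of the general algorithm can be sketched for the simplest case, an SRSWOR design with $π_{k} = n / N$ and curves observed on a common time grid. The function name `srswor_bootstrap_band` and its arguments are hypothetical. The sketch also assumes that the completion in step 1 is done by drawing $N - n [N / n]$ units from $s$ without replacement, which gives each unit the inclusion probability $α_{k} = N / n - [N / n]$, and that $\hat{σ} (t) > 0$ at every grid point.

```python
import numpy as np

def srswor_bootstrap_band(y_sample, pop_size, alpha=0.05, n_boot=1000, seed=0):
    """Bootstrap band ingredients for an SRSWOR sample of discretised curves.

    y_sample : (n, T) array of sampled curves; pop_size : population size N.
    Returns (mu_hat, sigma_hat, c_alpha).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y_sample, dtype=float)
    n, n_grid = y.shape
    mu_hat = y.mean(axis=0)  # Horvitz-Thompson mean when pi_k = n / N

    # Step 1: duplicate each unit [N / n] times, then complete the
    # pseudo-population with N - n [N / n] units drawn from s without replacement.
    base = pop_size // n
    n_extra = pop_size - n * base
    extra = rng.choice(n, size=n_extra, replace=False)
    pseudo_idx = np.concatenate([np.repeat(np.arange(n), base), extra])
    y_star = y[pseudo_idx]  # pseudo-population U*, shape (N, T)

    # Step 2: draw M SRSWOR samples of size n from U* with pi*_k = n / N.
    pi_star = n / pop_size
    mu_star = np.empty((n_boot, n_grid))
    for m in range(n_boot):
        s_m = rng.choice(pop_size, size=n, replace=False)
        mu_star[m] = y_star[s_m].sum(axis=0) / pi_star / pop_size

    # Step 3: corrected empirical standard deviation over the replications.
    sigma_hat = mu_star.std(axis=0, ddof=1)

    # Step 4: quantile of order 1 - alpha of the sup of the studentised deviations.
    sup_dev = (np.abs(mu_star - mu_hat) / sigma_hat).max(axis=1)
    c_alpha = float(np.quantile(sup_dev, 1.0 - alpha))
    return mu_hat, sigma_hat, c_alpha
```

The band on the grid is then `mu_hat ± c_alpha * sigma_hat`. For a stratified design, the same function could be applied stratum by stratum, but combining the stratum results is not shown here.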

For the SRSWOR design, the general bootstrap algorithm is applied with $π_{k}^{*} = n / N$. For the STRAT design, we apply in each stratum $U_{h}$, $h = 1, \dots, H$, the algorithm used for the SRSWOR design with $π_{k}^{*} = n_{h} / N_{h}$, $k \in U_{h}$. In this case, we recover the algorithm proposed by Booth et al. (1994).

The adaptation of the bootstrap algorithm to the $π p s$ design was proposed by Chauvet (2007). It consists in selecting, during step 2 of the general algorithm, the sample $s^{*}$ in $U^{*}$ with inclusion probabilities

$π_{k}^{*} = \frac{n x_{k}}{\sum_{k \in U^{*}} x_{k}} .$

This change is necessary to comply with the fixed-size constraint during resampling. The inclusion probabilities $π_{k}^{*}$ are also used to compute ${\hat{μ}}_{m}^{*}$ in step 2 of the general algorithm. The selection of a $π p s$ sample can be carried out using the cube algorithm with the balancing variable $π$. In that case, it is desirable to sort the population $U$ (or $U^{*}$) randomly before selecting $s$ (or $s_{m}^{*}$) in order to obtain a sampling design close to maximum entropy (Chauvet 2007, Tillé 2011). Chauvet (2007) also gives asymptotic results on the convergence of the bootstrap variance estimator for the $π p s$ design.
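The computation of the probabilities $π_{k}^{*}$ and the selection of a fixed-size sample after a random sort can be sketched as below. The cube algorithm itself is not implemented here: as a plainly different and simpler stand-in, the example uses systematic sampling on a randomly sorted pseudo-population, which also respects the prescribed inclusion probabilities and the fixed size $n$. The function names and the check that no probability exceeds 1 are assumptions of this sketch.

```python
import numpy as np

def pips_inclusion_probabilities(x_star, n):
    """pi*_k = n x_k / sum of x over the pseudo-population U*."""
    x_star = np.asarray(x_star, dtype=float)
    pi_star = n * x_star / x_star.sum()
    if np.any(pi_star > 1.0):
        # Handling of very large units is outside the scope of this sketch.
        raise ValueError("some inclusion probabilities exceed 1")
    return pi_star

def randomized_systematic_sample(pi_star, rng):
    """Fixed-size sample with inclusion probabilities pi_star.

    Stand-in for the cube algorithm: random sort, then systematic sampling.
    Returns the indices of the selected units.
    """
    order = rng.permutation(len(pi_star))      # random sort of U*
    cum = np.cumsum(pi_star[order])
    n = int(round(cum[-1]))                    # sum of pi*_k is the sample size
    points = rng.uniform() + np.arange(n)      # one point per unit interval
    pos = np.searchsorted(cum, points, side="left")
    pos = np.minimum(pos, len(pi_star) - 1)    # guard against rounding at the end
    return order[pos]
```

A bootstrap replicate of the estimator would then use the selected units `s` as `(y_star[s] / pi_star[s, None]).sum(axis=0) / N`, where `y_star` and `N` are hypothetical names for the pseudo-population curves and the population size.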

Finally, it is also possible to adapt this general algorithm to estimate the variance function of the estimator ${\hat{μ}}_{M A} .$ In step 1 of the algorithm, we also calculate the values $x_{k}^{*}$ of $x_{k}$ in the pseudo-population $U^{*}$ . Using the fact that the linear-model-assisted estimator is a nonlinear function of Horvitz-Thompson estimators, we calculate the bootstrapped value ${\hat{μ}}_{M A}^{*}$ of ${\hat{μ}}_{M A}$ over sample $s_{m}^{*}$ according to

${\hat{μ}}_{M A}^{*} (t) = \frac{1}{N} \sum_{k \in s_{m}^{*}} \frac{Y_{k}^{*} (t) - x_{k}^{*'} {\hat{β}}^{*} (t)}{π_{k}^{*}} + \frac{1}{N} (\sum_{k \in U} x_{k}^{'}) {\hat{β}}^{*} (t)$

where ${\hat{β}}^{*} (t) = {(\sum_{k \in s_{m}^{*}} x_{k}^{*} x_{k}^{*'})}^{- 1} \sum_{k \in s_{m}^{*}} x_{k}^{*} Y_{k}^{*} (t)$. As Canty and Davison (1999) note, using the total of the variable $x_{k}$ over the population $U$ instead of the pseudo-population $U^{*}$ yields better results, especially when this variable has extreme values.
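For one bootstrap sample $s_{m}^{*}$, the bootstrapped model-assisted estimator can be computed on a time grid as in the sketch below. It follows the two formulas above term by term; the function name and the array layout (units in rows, grid points or covariates in columns) are assumptions made for the example, and the auxiliary total `x_total` is taken over $U$, not over $U^{*}$.

```python
import numpy as np

def model_assisted_bootstrap_estimate(y_star_s, x_star_s, pi_star_s, x_total, pop_size):
    """Bootstrapped model-assisted estimate of the mean curve for one sample s*_m.

    y_star_s  : (n, T) curves Y*_k of the units in s*_m
    x_star_s  : (n, p) auxiliary vectors x*_k of the same units
    pi_star_s : (n,) inclusion probabilities pi*_k
    x_total   : (p,) total of x_k over the population U
    pop_size  : population size N
    """
    y_star_s = np.asarray(y_star_s, dtype=float)
    x_star_s = np.asarray(x_star_s, dtype=float)
    pi_star_s = np.asarray(pi_star_s, dtype=float)
    x_total = np.asarray(x_total, dtype=float)

    # beta*(t) = (sum x* x*')^{-1} sum x* Y*(t), solved for all grid points at once.
    xtx = x_star_s.T @ x_star_s
    beta_star = np.linalg.solve(xtx, x_star_s.T @ y_star_s)   # shape (p, T)

    # Horvitz-Thompson mean of the residuals Y*_k(t) - x*_k' beta*(t).
    resid = y_star_s - x_star_s @ beta_star
    ht_resid = (resid / pi_star_s[:, None]).sum(axis=0) / pop_size

    # Regression term based on the known total of x over U.
    return ht_resid + x_total @ beta_star / pop_size
```

Repeating this computation over the $M$ bootstrap samples gives the replications from which the variance function of ${\hat{μ}}_{M A}$ can be estimated, as in steps 3 and 4 of the general algorithm.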


Date modified: 2017-09-20
