3. Theoretical results

Jianqiang C. Wang, Jean D. Opsomer and Haonan Wang

We begin by briefly describing the asymptotic analysis of the bagging estimators under a general sampling design from a finite population, i.e., the design-based setting. We do this under the usual increasing-population framework, where we consider an increasing sequence of nested populations, say $U_N$, $N = 1, 2, \ldots$, with finite population means $\boldsymbol{\mu}_N$. Associated with the sequence of populations is a sequence of sampling designs used to draw a random sample $A_N \subset U_N$ of size $n_N$, with associated inclusion probabilities $\pi_{iN}$. As is commonly done in the survey literature, we suppress the subscript $N$ in the sample $A$, the sample size $n$ and the inclusion probabilities $\pi_i$. For the sake of brevity, only design-based asymptotic results for bagging the differentiable estimator $\hat{\theta}_d$ and the non-differentiable estimator $\hat{\theta}_{nd}$ are provided. The formal assumptions under which the results are obtained and the theorems for differentiable and non-differentiable estimators are in Appendix A.1. The main result we are able to obtain in this design-based setting is that, if we start from a design-consistent estimator and let the number of bootstrap samples $k$ grow with $n$, the bagged versions of the estimators are also design consistent. This is clearly a key property of these estimators, since there would be no reason to consider them unless they satisfied this design consistency.

Unfortunately, the above design-based results are quite limited and, in particular, do not provide an asymptotic distribution with which one might perform inference, another highly desirable property of survey estimators. We therefore also consider a model-based setting, under which we are able to obtain an asymptotic variance approximation. In presenting model-based results, we assume that the sampling design selecting the original sample $A$ is an equal probability design, and that the population characteristics can be regarded as an $iid$ sample from a superpopulation distribution. Under this framework, the bagging estimator can be treated as a U-statistic, so we can apply the theory of U-statistics to obtain an asymptotic expansion of bagging estimators. The analysis parallels that of Bühlmann and Yu (2002) and Buja and Stuetzle (2006). For the current paper, we restrict ourselves to bootstrap samples of size $k$, where $k$ is bounded and fixed. Under this assumption, the bagging estimators can be regarded as fixed-degree U-statistics, for which asymptotic theory has been well developed. A more interesting case is when the resample size $k$ grows with the sample size $n$, which leads to infinite-degree U-statistics. Infinite-degree U-statistics have applications in studying the Kaplan-Meier estimator and $m$-out-of-$n$ bootstrap estimators; readers are referred to Frees (1989), Heilig (1997), Heilig and Nolan (2001) and the references therein for their statistical properties. Schick and Wefelmeyer (2004) studied the statistical properties of infinite-degree U-statistics constructed from moving averages of innovations in time series. The study of bagging estimators as infinite-degree U-statistics is outside the scope of the current paper, and hence we limit ourselves to the case of fixed and bounded bootstrap sample size in the model-based case.

We first consider bagged estimator (2.5). Under SRSWOR, estimator (2.5) can be simplified to

$$
\hat{\theta}_{nd} = \frac{1}{n} \sum_{i \in A} h\left( \mathbf{y}_i - \hat{\boldsymbol{\lambda}} \right)
$$

and the bagged version of $\hat{\theta}_{nd}$ is defined as

$$
\hat{\theta}_{nd,bag} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\binom{n-1}{k-1}} \sum_{A_b \ni i} h\left( \mathbf{y}_i - \hat{\boldsymbol{\lambda}}\left( \mathcal{Y}_b^* \right) \right) \qquad (3.1)
$$

where $\hat{\boldsymbol{\lambda}}(\mathcal{Y}_b^*)$ only depends on the resample $A_b$. For ease of presentation, we take $\hat{\boldsymbol{\lambda}}(\mathcal{Y}_b^*)$ as the sample mean. In this case, straightforward algebra reveals that

$$
\hat{\theta}_{nd,bag} = \frac{1}{\binom{n}{k}} \sum_{A_b \in \mathcal{A}} \left\{ \frac{1}{k} \sum_{i \in A_b} h\left( \frac{k-1}{k} \mathbf{y}_i - \frac{1}{k} \sum_{j \neq i} \mathbf{y}_j \right) \right\},
$$

where $\mathcal{A}$ is the collection of subsets of size $k$ from the set $\{1, 2, \ldots, n\}$. The estimator $\hat{\theta}_{nd,bag}$ is a degree-$k$ U-statistic with kernel

$$
g\left( \mathbf{y}_1, \ldots, \mathbf{y}_k \right) = \frac{1}{k} \sum_{i=1}^{k} h\left( \frac{k-1}{k} \mathbf{y}_i - \frac{1}{k} \sum_{\substack{j=1 \\ j \neq i}}^{k} \mathbf{y}_j \right)
$$

provided that $k$ remains finite.
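The agreement between (3.1) and the U-statistic form can be checked numerically on a small sample. The following is a minimal sketch rather than the authors' implementation: it assumes scalar observations, takes $\hat{\boldsymbol{\lambda}}$ to be the resample mean as in the text, and uses a hypothetical indicator function for $h$. The names `bagged_direct`, `kernel` and `bagged_ustat` are introduced here for illustration only, and full enumeration of subsets is feasible only for small $n$ and $k$.

```python
from itertools import combinations
from math import comb


def h(x):
    # Hypothetical non-differentiable h: indicator that x is at most zero.
    return 1.0 if x <= 0.0 else 0.0


def bagged_direct(y, k, h):
    """Form (3.1): for each unit i, average h(y_i - resample mean)
    over all size-k subsets (resamples without replacement) containing i."""
    n = len(y)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        inner = 0.0
        for rest in combinations(others, k - 1):
            resample_mean = (y[i] + sum(y[j] for j in rest)) / k
            inner += h(y[i] - resample_mean)
        total += inner / comb(n - 1, k - 1)
    return total / n


def kernel(yb, h):
    """Degree-k kernel g(y_1, ..., y_k) with the sample mean as lambda-hat."""
    k = len(yb)
    s = sum(yb)
    return sum(h((k - 1) / k * yi - (s - yi) / k) for yi in yb) / k


def bagged_ustat(y, k, h):
    """U-statistic form: average the kernel over all size-k subsets."""
    n = len(y)
    subsets = combinations(range(n), k)
    return sum(kernel([y[j] for j in idx], h) for idx in subsets) / comb(n, k)


if __name__ == "__main__":
    y = [0.3, 1.8, 2.2, 4.6, 5.9, 7.5]  # made-up illustrative data
    for k in (2, 3, 4):
        print(k, bagged_direct(y, k, h), bagged_ustat(y, k, h))
```

The two functions differ only in the order of summation (over units first, or over subsets first), which is the rearrangement behind the "straightforward algebra" mentioned above.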

One can see that the bagging estimator $\hat{\theta}_{nd,bag}$ is a symmetric statistic of $\mathbf{y}_i$, and standard theory on symmetric statistics (Lee 1990) applies. The results are stated in Theorem 1, with assumptions and proofs in Appendix A.2.

Theorem 1 Under Assumptions M.1-M.4 on the superpopulation distribution, sampling and resampling designs,   

$$
\mathrm{AV}\left( \hat{\theta}_{nd,bag} \right)^{-1/2} \left( \hat{\theta}_{nd,bag} - \theta_{nd,\infty} \right) \overset{p}{\to} N(0, 1), \qquad (3.2)
$$

where the limiting value $\theta_{nd,\infty} = \lim_{n \to \infty} \mathrm{E}\left[ h\left( \mathbf{y}_i - \hat{\boldsymbol{\lambda}} \right) \right]$, the asymptotic variance

$$
\mathrm{AV}\left( \hat{\theta}_{nd,bag} \right) = \frac{1}{n} \mathrm{Var}\left[ u(\mathbf{y}_i) \right] + \frac{(k-1)^2}{n} \mathrm{Var}\left[ v(\mathbf{y}_i) \right] + \frac{2(k-1)}{n} \mathrm{Cov}\left[ u(\mathbf{y}_i), v(\mathbf{y}_i) \right], \qquad (3.3)
$$

and

$$
\begin{aligned}
u(\mathbf{y}) &= \mathrm{E}\left[ h\left( \mathbf{y} - \hat{\boldsymbol{\lambda}}\left( \mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{k-1}, \mathbf{y} \right) \right) \right], \\
v(\mathbf{y}) &= \mathrm{E}\left[ h\left( \mathbf{y}_1 - \hat{\boldsymbol{\lambda}}\left( \mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{k-1}, \mathbf{y} \right) \right) \right].
\end{aligned}
$$
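The form of (3.3) can be motivated by the usual first-order projection of a fixed-degree U-statistic. The short derivation below is a sketch under that standard theory (see Lee 1990), not a reproduction of the proof in Appendix A.2; the symbol $g_1$ is introduced here for illustration and assumes that $\hat{\boldsymbol{\lambda}}$ is symmetric in its arguments. Conditioning the kernel on one argument and using the exchangeability of the remaining $k-1$ arguments gives

$$
g_1(\mathbf{y}) = \mathrm{E}\left[ g\left( \mathbf{y}, \mathbf{y}_2, \ldots, \mathbf{y}_k \right) \right] = \frac{1}{k} \left\{ u(\mathbf{y}) + (k-1)\, v(\mathbf{y}) \right\},
$$

and the leading term of the variance of a degree-$k$ U-statistic is then

$$
\frac{k^2}{n} \mathrm{Var}\left[ g_1(\mathbf{y}_i) \right] = \frac{1}{n} \mathrm{Var}\left[ u(\mathbf{y}_i) + (k-1)\, v(\mathbf{y}_i) \right] = \frac{1}{n} \mathrm{Var}\left[ u(\mathbf{y}_i) \right] + \frac{(k-1)^2}{n} \mathrm{Var}\left[ v(\mathbf{y}_i) \right] + \frac{2(k-1)}{n} \mathrm{Cov}\left[ u(\mathbf{y}_i), v(\mathbf{y}_i) \right],
$$

which matches the three terms of (3.3).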

As indicated by (3.3), the asymptotic variance of the bagging estimator depends on the unknown functions $u(\mathbf{y})$ and $v(\mathbf{y})$, which are expectations of $h(\cdot)$ with respect to the superpopulation distribution. In $u(\mathbf{y})$ and $v(\mathbf{y})$, $\hat{\boldsymbol{\lambda}}(\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{k-1}, \mathbf{y})$ is calculated from $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{k-1}$ together with an arbitrary vector $\mathbf{y}$. The expectation is with respect to the distribution of the $iid$ random vectors $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{k-1}$. This high-dimensional expectation is difficult to calculate and may not have an explicit expression in general. The exact form of $u(\cdot)$ and $v(\cdot)$ cannot be obtained, but it can be approximated via a resampling-based approach: because $u(\cdot)$ and $v(\cdot)$ are defined as expectations with respect to the superpopulation distribution, they can be approximated by the corresponding expectations with respect to the empirical distribution.

The model-based asymptotic variance can be estimated along with the process of bagging. We can calculate the integrands $h\left( \mathbf{y} - \hat{\boldsymbol{\lambda}}\left( \mathbf{y}_1^*, \mathbf{y}_2^*, \ldots, \mathbf{y}_{k-1}^*, \mathbf{y} \right) \right)$ and $h\left( \mathbf{y}_1 - \hat{\boldsymbol{\lambda}}\left( \mathbf{y}_1^*, \mathbf{y}_2^*, \ldots, \mathbf{y}_{k-1}^*, \mathbf{y} \right) \right)$ based on each bootstrap sample, with $\mathbf{y}$ denoting the point at which we want to evaluate $u(\cdot)$ and $v(\cdot)$, and $\mathbf{y}_1^*, \mathbf{y}_2^*, \ldots, \mathbf{y}_{k-1}^*$ denoting resampled values. We can then average each quantity over the bootstrap samples to approximate the expectation. Finally, the variance can be estimated by computing the sample variance of the approximated expectations evaluated at each of the sample points. For nonsmooth estimators like the ones we are dealing with, it is often recommended to use the smoothed bootstrap in variance approximation (Efron 1979; Davison and Hinkley 1997). We apply the smoothed bootstrap and add a small amount of noise to each resampled value to smooth the underlying function. The detailed algorithm will be explained in Section 5 through an example.
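The resampling steps described above can be sketched as follows. This is an illustrative outline under several assumptions and is not the algorithm of Section 5: observations are scalar, $\hat{\boldsymbol{\lambda}}$ is the mean, resampled values are drawn with replacement from the empirical distribution, the smoothing noise is Gaussian with a user-chosen standard deviation, and $\mathbf{y}_1$ in the second integrand is read as the first resampled value. The function `estimate_av`, its parameters, and the indicator `h` are hypothetical names introduced here.

```python
import random
from statistics import mean, variance


def h(x):
    # Hypothetical non-smooth h: indicator that x is at most zero.
    return 1.0 if x <= 0.0 else 0.0


def estimate_av(y, k, h, n_resamples=500, noise_sd=0.0, seed=0):
    """Plug-in estimate of the asymptotic variance (3.3); requires k >= 2.

    For each sample point y_eval, u and v are approximated by averaging the
    integrands over n_resamples sets of k-1 resampled (optionally smoothed)
    values; sample variances and covariance are then inserted into (3.3).
    """
    rng = random.Random(seed)
    n = len(y)
    u_hat, v_hat = [], []
    for y_eval in y:
        u_sum = 0.0
        v_sum = 0.0
        for _ in range(n_resamples):
            # k-1 resampled values, each perturbed by smoothing noise
            y_star = [rng.choice(y) + rng.gauss(0.0, noise_sd)
                      for _ in range(k - 1)]
            lam = (sum(y_star) + y_eval) / k  # lambda-hat as the mean
            u_sum += h(y_eval - lam)
            v_sum += h(y_star[0] - lam)
        u_hat.append(u_sum / n_resamples)
        v_hat.append(v_sum / n_resamples)
    mu, mv = mean(u_hat), mean(v_hat)
    cov_uv = sum((a - mu) * (b - mv) for a, b in zip(u_hat, v_hat)) / (n - 1)
    av = (variance(u_hat)
          + (k - 1) ** 2 * variance(v_hat)
          + 2 * (k - 1) * cov_uv) / n
    return av, u_hat, v_hat


if __name__ == "__main__":
    data_rng = random.Random(1)
    y = [data_rng.gauss(0.0, 1.0) for _ in range(30)]  # made-up data
    av, _, _ = estimate_av(y, k=4, h=h, n_resamples=200, noise_sd=0.1, seed=2)
    print("estimated asymptotic variance:", av)
```

Setting `noise_sd=0.0` gives the unsmoothed version; the appropriate amount of smoothing is a tuning choice that this sketch does not address.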

We now study the model-based result of bagging estimators defined by estimating equations (2.7). A special case in this framework is bagging sample quantiles, which was studied by Knight and Bassett (2002). Knight and Bassett (2002) considered both bootstrap and SRSWOR for resampling, and studied the effects of bagging on the remainder term in the Bahadur representation of quantiles (Bahadur 1966). We take a slightly different perspective and treat the bagging estimator as a U-statistic. Assumptions and proof are again in Appendix A.2. Note that Assumption M.5 requires that the non-differentiable estimating function have a smooth limit. In the next theorem, we provide linearization of the bagging estimating equation estimator and give an expression for the asymptotic variance.

Theorem 2 Under Assumptions M.1-M.3 and M.5, the following asymptotic result holds for the bagged estimating equation estimator (2.7),

$$\mathrm{AV}\left(\hat{\theta}_{ee,bag}\right)^{-1/2}\left(\hat{\theta}_{ee,bag} - \theta_{ee,\infty}\right) \overset{p}{\to} N(0,1), \qquad (3.4)$$

where $\theta_{ee,\infty}$ denotes the asymptotic limit of the population quantity $\theta_{ee}$, and the asymptotic variance of $\hat{\theta}_{ee,bag}$ is

$$\mathrm{AV}\left(\hat{\theta}_{ee,bag}\right) = \frac{k^{2}}{n}\,\mathrm{Var}\left[u\left(y_i\right)\right], \qquad (3.5)$$

and

$$u(y) = \mathrm{E}\,\inf\left\{\gamma : \frac{1}{k}\sum_{i=1}^{k-1}\psi\left(y_i - \gamma\right) + \frac{1}{k}\,\psi\left(y - \gamma\right) \ge 0\right\}. \qquad (3.6)$$

As we saw for the bagged estimator (3.1), the asymptotic results in Theorem 2 involve an unknown function. This function can again be computed using resampling that takes advantage of the available replicate samples.   
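
To make the resampling idea concrete, the following is a minimal sketch for the sample quantile special case, not the authors' implementation. It assumes $\psi(t) = 1\{t \le 0\} - \tau$, so that the infimum in (3.6) is the $\tau$-th sample quantile of the $k$ values; it approximates the expectation in (3.6) by averaging over with-replacement resamples of size $k-1$ drawn from the observed sample; and it plugs the results into (3.5). The function names, the number of replicates `n_rep`, the choice of with-replacement resampling and the simulated data are all assumptions of this sketch.

```python
import numpy as np

def ee_estimate(values, tau):
    """inf{gamma : mean(1{y_i <= gamma} - tau) >= 0}, i.e. the tau-th sample quantile."""
    ordered = np.sort(values)
    k = ordered.size
    ecdf = np.arange(1, k + 1) / k          # empirical CDF at each order statistic
    return ordered[np.argmax(ecdf >= tau)]  # first order statistic with ECDF >= tau

def estimate_u(y, sample, k, tau, n_rep, rng):
    """Monte Carlo approximation of u(y) in (3.6).

    Averages the estimate over resamples of size k - 1 with y appended.
    """
    draws = np.empty(n_rep)
    for r in range(n_rep):
        resampled = rng.choice(sample, size=k - 1, replace=True)
        draws[r] = ee_estimate(np.append(resampled, y), tau)
    return draws.mean()

def bagged_variance(sample, k, tau=0.5, n_rep=100, seed=0):
    """Plug-in version of (3.5): (k^2 / n) times the sample variance of u(y_i)."""
    rng = np.random.default_rng(seed)
    n = sample.size
    u_vals = np.array([estimate_u(y, sample, k, tau, n_rep, rng) for y in sample])
    return k**2 / n * u_vals.var(ddof=1)

rng = np.random.default_rng(1)
sample = rng.normal(size=40)  # illustrative data only
av_hat = bagged_variance(sample, k=5)
```

The Monte Carlo error in each approximated $u(y_i)$ adds to the sample variance, so in practice `n_rep` would need to be large enough for that extra noise to be negligible.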
