1 Introduction

Stephen J. Kaputa and Katherine Jenny Thompson


Developing viable decile estimates for positively skewed populations from complex survey data poses interesting challenges. The literature supports two different approaches to percentile estimation with complex survey data. The first (the "traditional" method) obtains decile estimates from the empirical cumulative distribution function, selecting the item value that corresponds to the sample percentile computed by summing the associated survey weights. This approach yields decile estimates that are "close to unbiased" but unstable. The alternative approach groups the continuous data into disjoint intervals (bins), then uses linear interpolation over the bin containing the decile. With appropriately defined bins, this approach also produces nearly unbiased decile estimates while improving their stability, at least for percentiles that are far from the tail of the distribution. For the upper percentiles, the bins often contain very few observations, with little or no uniformity within the bin. Hence, the reliability of the upper decile estimates (e.g., the 90th percentile or greater) is rarely comparable to that of the other deciles.
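The two approaches can be contrasted in a minimal Python sketch. It is an illustration under stated assumptions, not the estimators evaluated in this paper: the function names, the toy values and weights, and the fixed bin edges are all hypothetical, and the sketch ignores variance estimation entirely.

```python
import numpy as np


def weighted_ecdf_quantile(y, w, p):
    """'Traditional' estimate: the smallest item value whose cumulative
    share of the survey weights reaches p."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y)
    y, w = y[order], w[order]
    cum_share = np.cumsum(w) / w.sum()
    idx = int(np.searchsorted(cum_share, p, side="left"))
    return y[min(idx, len(y) - 1)]


def binned_quantile(y, w, p, edges):
    """Interpolated estimate: locate the bin whose cumulative weight share
    first reaches p, then interpolate linearly between its edges.
    Values outside [edges[0], edges[-1]] are ignored by np.histogram."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    edges = np.asarray(edges, dtype=float)
    bin_weight, _ = np.histogram(y, bins=edges, weights=w)
    cum_share = np.cumsum(bin_weight) / bin_weight.sum()
    k = min(int(np.searchsorted(cum_share, p, side="left")), len(bin_weight) - 1)
    below = cum_share[k - 1] if k > 0 else 0.0
    in_bin = cum_share[k] - below
    frac = (p - below) / in_bin if in_bin > 0 else 0.0
    return edges[k] + frac * (edges[k + 1] - edges[k])


# Hypothetical toy sample: item values and survey weights (assumed data).
values = [10, 20, 30, 40, 200]
weights = [3, 2, 2, 2, 2]
edges = [0, 25, 50, 250]  # assumed fixed bins, chosen only for illustration

for p in (0.5, 0.9):
    print(p, weighted_ecdf_quantile(values, weights, p),
          binned_quantile(values, weights, p, edges))
```

In this toy sample the last bin holds a single observation, so the interpolated 90th percentile is driven almost entirely by the choice of bin edges, which mirrors the reliability concern for upper deciles described above.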

Although interpolation is advantageous for developing stable estimates, developing an optimal set of bins for a given characteristic is not always easy. Often, the distribution changes over time, and the bin widths and locations should reflect this change in scale. For example, the average sales price of single-family homes in a geographic area may increase over time due to inflation, but the population of single-family homes in that area is still characterized by a skewed distribution, with a few expensive homes located in the tail. Many economic data programs share this trait. Consequently, developing a fixed set of bins for interpolation with an ongoing survey is unwise. To address this, Thompson and Sigman (2000) introduced a procedure for estimating medians from highly positively skewed population data. Their procedure uses interpolation over data-dependent intervals (bins), after scaling by the 75th percentile. That earlier paper examined the estimation and variance estimation properties of the considered methods, using modified half-sample (MHS) replication for variance estimation (Fay 1989; Judkins 1990).

This research extends the previous work to decile estimation methods using complex survey data sampled from a positively skewed population. We present three different interpolation methods along with the traditional decile estimation method (no bins) and evaluate each method both empirically, using residential housing data from the Survey of Construction (SOC) conducted by the U.S. Census Bureau, and via a simulation study. Our research was motivated by a recent request from SOC data users to estimate and publish complete sets of decile estimates for several housing characteristics. Thus, our research was conducted under two constraints: keeping the median estimates as reliable as those currently published, and using MHS replication for variance estimation.

Section 2 presents the candidate decile estimation methods and gives an overview of MHS replication. Section 3 evaluates these procedures using empirical and simulated SOC data. Finally, we conclude with recommendations in Section 4.

