# Statistics by subject – Statistical methods

## All (25 of 28 results)

• Articles and reports: 12-001-X201600214676
Description:

Winsorization procedures replace extreme values with less extreme values, effectively moving the original extreme values toward the center of the distribution. Winsorization therefore both detects and treats influential values. Mulry, Oliver and Kaputa (2014) compare the performance of the one-sided Winsorization method developed by Clark (1995) and described by Chambers, Kokic, Smith and Cruddas (2000) to the performance of M-estimation (Beaumont and Alavi 2004) in highly skewed business population data. One aspect of particular interest for methods that detect and treat influential values is the range of values designated as influential, called the detection region. The Clark Winsorization algorithm is easy to implement and can be extremely effective. However, the resultant detection region is highly dependent on the number of influential values in the sample, especially when the survey totals are expected to vary greatly by collection period. In this note, we examine the effect of the number and magnitude of influential values on the detection regions from Clark Winsorization using data simulated to realistically reflect the properties of the population for the Monthly Retail Trade Survey (MRTS) conducted by the U.S. Census Bureau. Estimates from the MRTS and other economic surveys are used in economic indicators, such as the Gross Domestic Product (GDP).

Release date: 2016-12-20
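A minimal Python sketch of one common form of one-sided (upper) winsorization follows. It assumes the cutoff K is already known; how Clark (1995) derives K from the sample, which is what determines the detection region, is not reproduced. The function names `winsorize_one_sided` and `detection_region` and the data are hypothetical.

```python
from typing import List, Sequence


def winsorize_one_sided(y: Sequence[float], w: Sequence[float], cutoff: float) -> List[float]:
    """Upper one-sided winsorization with a fixed cutoff K.

    Values at or below K are kept. A value above K becomes K + (y - K) / w,
    so that w * y* = y + (w - 1) * K: the unit still counts fully for itself,
    while the other w - 1 population units it represents are capped at K.
    """
    return [cutoff + (yi - cutoff) / wi if yi > cutoff else yi for yi, wi in zip(y, w)]


def detection_region(y: Sequence[float], cutoff: float) -> List[int]:
    """Indices of sample units treated as influential (value above K)."""
    return [i for i, yi in enumerate(y) if yi > cutoff]


# Hypothetical data: three sampled businesses, each with design weight 5.
y = [10.0, 50.0, 200.0]
w = [5.0, 5.0, 5.0]
K = 100.0  # assumed cutoff; Clark's method would derive K from the data

y_star = winsorize_one_sided(y, w, K)
winsorized_total = sum(wi * yi for wi, yi in zip(w, y_star))
print(y_star, winsorized_total, detection_region(y, K))
```

Here only the third unit falls in the detection region; its weighted contribution drops from 1,000 to 600, and the estimated total from 1,300 to 900.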

• Articles and reports: 12-001-X201600214664
Description:

This paper draws statistical inference for the finite population mean based on judgment post-stratified (JPS) samples. A JPS sample is obtained by first selecting a simple random sample and then stratifying the selected units into H judgment classes based on their relative positions (ranks) in a small set of size H. This yields a sample with random sample sizes in the judgment classes. The ranking can be performed either using auxiliary variables or by visual inspection to identify the ranks of the measured observations. The paper develops an unbiased estimator and constructs a confidence interval for the population mean. Since judgment ranks are random variables, by conditioning on the measured observations we construct Rao-Blackwellized estimators for the population mean. The paper shows that the Rao-Blackwellized estimators perform better than the usual JPS estimators. The proposed estimators are applied to 2012 United States Department of Agriculture census data.

Release date: 2016-12-20
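To make the JPS design concrete, here is a Python sketch that assigns judgment ranks using an auxiliary variable and computes the usual JPS estimator, i.e., the average of the non-empty judgment-class means. It is an illustration under stated assumptions, not the paper's finite-population unbiased estimator or its Rao-Blackwellized versions; `judgment_ranks`, `jps_mean` and the data are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def judgment_ranks(sample_x: Sequence[float], population_x: Sequence[float],
                   H: int, rng: random.Random) -> List[int]:
    """Give each measured unit a judgment rank in 1..H.

    For each unit, H - 1 comparison units are drawn from the population and
    the unit is ranked within that set of size H using an auxiliary variable x.
    """
    ranks = []
    for xi in sample_x:
        others = rng.sample(list(population_x), H - 1)
        ranks.append(1 + sum(1 for xo in others if xo < xi))
    return ranks


def jps_mean(y: Sequence[float], ranks: Sequence[int]) -> float:
    """Usual JPS estimator: the average of the non-empty judgment-class means."""
    classes: Dict[int, List[float]] = {}
    for yi, r in zip(y, ranks):
        classes.setdefault(r, []).append(yi)
    return mean(mean(values) for values in classes.values())


rng = random.Random(2016)
population_x = [float(v) for v in range(1, 101)]  # hypothetical auxiliary values
sample_idx = rng.sample(range(100), 6)             # simple random sample of size 6
sample_x = [population_x[i] for i in sample_idx]
sample_y = [2.0 * x + 1.0 for x in sample_x]      # hypothetical study variable
ranks = judgment_ranks(sample_x, population_x, H=3, rng=rng)
print(ranks, jps_mean(sample_y, ranks), mean(sample_y))
```

Because class sizes are random, some judgment classes can be empty; averaging only the non-empty class means is what keeps the estimator defined in that case.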

• Technical products: 11-522-X201700014741
Description:

Release date: 2016-03-24

• Articles and reports: 12-001-X201400214118
Description:

Bagging is a powerful computational method used to improve the performance of inefficient estimators. This article is a first exploration of the use of bagging in survey estimation, and we investigate the effects of bagging on non-differentiable survey estimators, including sample distribution functions and quantiles, among others. The theoretical properties of bagged survey estimators are investigated under both design-based and model-based regimes. In particular, we show the design consistency of the bagged estimators and obtain their asymptotic normality in the model-based context. The article describes how bagging for survey estimators can take advantage of replicates developed for survey variance estimation, providing an easy way for practitioners to apply bagging in existing surveys. A major remaining challenge in implementing bagging in the survey context is variance estimation for the bagged estimators themselves, and we explore two possible variance estimation approaches. Simulation experiments show that the proposed bagged estimator improves on the original estimator, and they compare the two variance estimation approaches.

Release date: 2014-12-19
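The idea of reusing variance-estimation replicates can be sketched in Python as follows: the bagged estimator is the average of a non-differentiable estimator (here a weighted median) recomputed under each set of replicate weights. The simple with-replacement bootstrap used to build the replicate weights assumes simple random sampling; a real survey would supply its own replicate weights. All function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence


def weighted_median(y: Sequence[float], w: Sequence[float]) -> float:
    """Smallest y whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(y, w))
    half = sum(w) / 2.0
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum >= half:
            return yi
    return pairs[-1][0]


def bootstrap_replicate_weights(w: Sequence[float], n_reps: int,
                                rng: random.Random) -> List[List[float]]:
    """Each replicate weight is w_i times the number of times unit i is
    redrawn in a with-replacement resample of size n (SRS assumption)."""
    n = len(w)
    reps = []
    for _ in range(n_reps):
        counts = [0] * n
        for _ in range(n):
            counts[rng.randrange(n)] += 1
        reps.append([wi * c for wi, c in zip(w, counts)])
    return reps


def bagged_median(y: Sequence[float], w: Sequence[float],
                  replicate_weights: Sequence[Sequence[float]]) -> float:
    """Bagged estimator: average of the weighted median over the replicates."""
    return mean(weighted_median(y, rw) for rw in replicate_weights)


rng = random.Random(42)
y = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0, 2.0, 8.0, 5.0]  # hypothetical sample
w = [10.0] * len(y)                                # equal design weights
reps = bootstrap_replicate_weights(w, n_reps=200, rng=rng)
print(weighted_median(y, w), bagged_median(y, w, reps))
```

Averaging over replicates smooths the step-function behaviour of the median, which is the source of the efficiency gain that bagging targets.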

• Technical products: 11-522-X201300014266
Description:

Monitors and self-reporting are two methods of measuring energy expended in physical activity; monitor devices typically have much smaller error variances than self-reports. The Physical Activity Measurement Survey was designed to compare the two procedures, using replicate observations on the same individual. The replicates make it possible to calibrate the self-report measurement to the monitor measurement and to estimate components of the measurement error variances. Estimates of the variance components of measurement error in monitor and self-report measurements of energy expenditure are given for females in the Physical Activity Measurement Survey.

Release date: 2014-10-31
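How replicates separate measurement error from true between-person variation can be sketched with a balanced one-way random-effects ANOVA in Python. This assumes the same number of replicates per person and a simple additive error model; it would be fitted separately to monitor and to self-report data, and it does not reproduce the calibration model in the paper. `variance_components` and the data are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def variance_components(replicates: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Balanced one-way random-effects ANOVA.

    Each inner sequence holds the k replicate measurements for one person.
    Returns (between-person variance, measurement-error variance).
    """
    k = len(replicates[0])
    if any(len(r) != k for r in replicates):
        raise ValueError("this sketch assumes the same number of replicates per person")
    error_var = mean(variance(r) for r in replicates)  # pooled within-person mean square
    person_means = [mean(r) for r in replicates]
    msb = k * variance(person_means)                  # between-person mean square
    between_var = max((msb - error_var) / k, 0.0)     # truncate negative estimates at 0
    return between_var, error_var


# Hypothetical data: two replicate measurements for each of three people.
between, error = variance_components([[1.0, 3.0], [2.0, 2.0], [6.0, 8.0]])
print(between, error)
```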

• Articles and reports: 82-003-X201301011873
Description:

A computer simulation model of physical activity was developed for the Canadian adult population using longitudinal data from the National Population Health Survey and cross-sectional data from the Canadian Community Health Survey. The model is based on the Population Health Model (POHEM) platform developed by Statistics Canada. This article presents an overview of POHEM and describes the additions that were made to create the physical activity module (POHEM-PA). These additions include changes in physical activity over time, and the relationship between physical activity levels and health-adjusted life expectancy, life expectancy and the onset of selected chronic conditions. Estimates from simulation projections are compared with nationally representative survey data to provide an indication of the validity of POHEM-PA.

Release date: 2013-10-16

• Articles and reports: 82-003-X201200111633
Description:

This paper explains the methodology for creating Geozones, area-based thresholds of population characteristics derived from census data that can be used to analyze social or economic differences in health and health service utilization.

Release date: 2012-03-21

• Articles and reports: 12-001-X201100211610
Description:

This paper discusses the three papers in the US Census Bureau special compilation.

Release date: 2011-12-21

• Articles and reports: 82-003-X201100211437
Description:

This article examines the internal consistency of the English and French versions of the Medical Outcomes Study social support scale in a sample of older adults. A second objective is to conduct a confirmatory factor analysis to assess the factor structure of the English and French versions of the scale. A third is to determine whether the items comprising the scale operate in the same way for English- and French-speaking respondents.

Release date: 2011-05-18
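Internal consistency is commonly summarized by Cronbach's alpha; the abstract does not name the statistic used, so its use here is an assumption. A minimal Python sketch, with a hypothetical function name and data:

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha from item score columns (one inner sequence per item).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    Requires at least two items and a non-constant total score.
    """
    k = len(items)
    n = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical scores of four respondents on a two-item scale.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```

Alpha rises toward 1 as items covary more strongly relative to their individual variances; it says nothing about factor structure, which is why the article adds a confirmatory factor analysis.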

• Articles and reports: 82-003-X201100111404
Description:

This study assesses three child-reported parenting behaviour scales (nurturance, rejection and monitoring) in the National Longitudinal Survey of Children and Youth.

Release date: 2011-02-16

• Articles and reports: 12-001-X200900211039
Description:

Propensity weighting is a procedure to adjust for unit nonresponse in surveys. One way to implement it is to divide the sampling weights by estimates of the probabilities that the sampled units respond to the survey. These estimates are typically obtained by fitting parametric models, such as logistic regression, and the resulting adjusted estimators may be biased when the specified parametric models are incorrect. To avoid misspecifying such a model, we consider nonparametric estimation of the response probabilities by local polynomial regression. We study the asymptotic properties of the resulting estimator under quasi-randomization. The practical behavior of the proposed nonresponse adjustment approach is evaluated on NHANES data.

Release date: 2009-12-23
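The adjustment can be sketched in Python using the simplest local polynomial fit, a degree-0 (local constant, Nadaraya-Watson) smoother with a Gaussian kernel, in place of whatever degree and kernel the paper uses. The fit here is unweighted, and the bandwidth, names and data are hypothetical.

```python
import math
from typing import List, Sequence


def kernel_response_probs(x: Sequence[float], responded: Sequence[bool],
                          bandwidth: float) -> List[float]:
    """Local-constant estimate of P(respond | x) at every sampled unit."""
    probs = []
    for x0 in x:
        k = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        num = sum(ki for ki, r in zip(k, responded) if r)
        probs.append(num / sum(k))
    return probs


def propensity_adjusted_weights(w: Sequence[float], x: Sequence[float],
                                responded: Sequence[bool], bandwidth: float) -> List[float]:
    """Divide respondents' design weights by their estimated response
    probabilities; nonrespondents get weight 0."""
    p = kernel_response_probs(x, responded, bandwidth)
    return [wi / pi if r else 0.0 for wi, pi, r in zip(w, p, responded)]


# Hypothetical data: an auxiliary variable x and a response indicator.
x = [0.0, 0.0, 10.0, 10.0]
responded = [True, False, True, True]
w = [1.0, 1.0, 1.0, 1.0]
print(propensity_adjusted_weights(w, x, responded, bandwidth=0.1))
```

A respondent's own kernel term is always positive, so its estimated probability is never zero and the division is safe.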

• Technical products: 11-522-X200800010968
Description:

Statistics Canada has embarked on a program of increasing and improving the use of imaging technology for paper survey questionnaires. The goal is to make the process an efficient, reliable and cost-effective method of capturing survey data. The objective is to continue using Optical Character Recognition (OCR) to capture the data from questionnaires, documents and faxes received, whilst improving the process integration and the Quality Assurance/Quality Control (QA/QC) of the data capture process. These improvements are discussed in this paper.

Release date: 2009-12-03

• Technical products: 11-536-X200900110810
Description:

Post-stratification is frequently used to improve the precision of survey estimators when categorical auxiliary information is available from sources outside the survey. In natural resource surveys, such information is often obtained from remote sensing data, classified into categories and displayed as pixel-based maps. These maps may be constructed based on classification models fitted to the sample data. Post-stratification of the sample data based on categories derived from the sample data ("endogenous post-stratification") violates several standard post-stratification assumptions, and has been generally considered invalid as a design-based estimation method. In this presentation, properties of the endogenous post-stratification estimator are derived for the case of a sample-fitted generalized linear model. Design consistency of the endogenous post-stratification estimator is established under mild conditions. Under a superpopulation model, consistency and asymptotic normality of the endogenous post-stratification estimator are established. Simulation experiments demonstrate that the practical effect of first fitting a model to the survey data before post-stratifying is small, even for relatively small sample sizes.

Release date: 2009-08-11
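The post-stratified estimator weights each post-stratum sample mean by its population share N_h / N. The Python sketch below makes the post-strata endogenous in a toy way: a threshold "fitted" to the sample (its median) classifies both the sampled units and every map pixel. This threshold rule is a hypothetical stand-in for the sample-fitted generalized linear model in the presentation, and all names and data are hypothetical.

```python
from collections import Counter
from statistics import mean, median
from typing import Dict, List, Sequence


def classify(x: float, threshold: float) -> str:
    """Toy stand-in for a sample-fitted classification model."""
    return "high" if x > threshold else "low"


def post_stratified_mean(y: Sequence[float], strata: Sequence[str],
                         pop_counts: Dict[str, int]) -> float:
    """Weight each post-stratum sample mean by its population share N_h / N."""
    N = sum(pop_counts.values())
    groups: Dict[str, List[float]] = {}
    for yi, h in zip(y, strata):
        groups.setdefault(h, []).append(yi)
    missing = set(pop_counts) - set(groups)
    if missing:
        raise ValueError(f"post-strata with no sampled units: {sorted(missing)}")
    return sum(pop_counts[h] / N * mean(groups[h]) for h in pop_counts)


sample_x = [0.2, 0.4, 0.6, 0.8]  # remote-sensing value at sampled sites
sample_y = [1.0, 3.0, 10.0, 12.0]  # ground-measured study variable
pixels_x = [0.1, 0.3, 0.45, 0.55, 0.7, 0.9, 0.95, 0.2, 0.35, 0.6]

threshold = median(sample_x)  # fitted to the sample, hence "endogenous"
strata = [classify(x, threshold) for x in sample_x]
pop_counts = Counter(classify(x, threshold) for x in pixels_x)
estimate = post_stratified_mean(sample_y, strata, pop_counts)
print(dict(pop_counts), estimate)
```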

• Technical products: 11-522-X200600110446
Description:

Immigrants have health advantages over native-born Canadians, but those advantages are threatened by specific risk situations. This study explores cardiovascular health outcomes in districts of Montréal classified by the proportion of immigrants in the population, using a principal component analysis. The first three components are immigration, degree of socio-economic disadvantage and degree of economic disadvantage. The incidence of myocardial infarction is lower in districts with large immigrant populations than in districts dominated by native-born Canadians. Mortality rates are associated with the degree of socio-economic disadvantage, while revascularization is associated with the proportion of seniors in the population.

Release date: 2008-03-17

• Technical products: 11-522-X200600110418
Description:

The current use of multilevel models to examine the effects of surrounding contexts on health outcomes attests to their value as a statistical method for analyzing grouped data. But the use of multilevel modeling with data from population-based surveys is often limited by the small number of cases per level-2 unit, prompting a recent trend in the neighborhood literature to apply cluster analysis techniques to address the problem of data sparseness. In this paper we use Monte Carlo simulations to investigate the effects of marginal group sizes and cluster analysis techniques on the validity of parameter estimates in both linear and non-linear multilevel models.

Release date: 2008-03-17

• Technical products: 11-522-X200600110441
Description:

How does one efficiently estimate sample size while building consensus among multiple investigators for multi-purpose projects? We present a template using common spreadsheet software to provide estimates of power, precision and financial costs under varying sampling scenarios, as used in the development of the Ontario Tobacco Survey. In addition to cost estimates, complex sample size formulae were nested within a spreadsheet to determine power and precision, incorporating user-defined design effects and loss to follow-up. Common spreadsheet software can be used in conjunction with complex formulae to enhance knowledge exchange between methodologists and stakeholders, in effect demystifying the "sample size black box".

Release date: 2008-03-17
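The precision side of such a template can be sketched in Python with the standard sample size formula for estimating a proportion under simple random sampling, inflated by a design effect and divided by the expected retention rate after loss to follow-up. The power and cost formulae of the Ontario Tobacco Survey template are not given in the abstract and are not reproduced; the function name is hypothetical.

```python
import math


def required_sample_size(p: float, margin: float, deff: float = 1.0,
                         loss_rate: float = 0.0, z: float = 1.96) -> int:
    """Initial sample size needed to estimate a proportion p within +/- margin.

    n0 = z^2 * p * (1 - p) / margin^2    (simple random sampling)
    n  = n0 * deff / (1 - loss_rate), rounded up to a whole respondent
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(n0 * deff / (1 - loss_rate))


print(required_sample_size(0.5, 0.05))                               # SRS, no attrition
print(required_sample_size(0.5, 0.05, deff=1.5, loss_rate=0.2))      # clustered design, 20% loss
```

Each scenario is a single row in the spreadsheet analogue: changing the design effect or expected attrition immediately shows the cost of the corresponding design decision.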

• Articles and reports: 12-001-X200700210495
Description:

The purpose of this work is to obtain reliable estimates in study domains when sample sizes may be very small and the sampling design stratum differs from the study domain. The population sizes of both the study domain and the sampling design stratum are also unknown. When parameter estimates are calculated for the study domains, the domain sample size must often be treated as random. We propose a new family of generalized linear mixed models with correlated random effects for the case of more than one unknown parameter. The proposed model estimates both the population size and the parameter of interest. General formulae for the full conditional distributions required for Markov chain Monte Carlo (MCMC) simulation are given for this framework, together with equations for Bayesian estimation and prediction in the study domains. We apply the methodology to the 1998 Missouri Turkey Hunting Survey, which stratified its sample by the hunter's place of residence, whereas estimates are required at the domain level, defined as the county in which the turkey hunter actually hunted.

Release date: 2008-01-03

• Articles and reports: 12-001-X20070019850
Description:

Auxiliary information is often used to improve the precision of survey estimators of finite population means and totals through ratio or linear regression estimation techniques. The resulting estimators have good theoretical and practical properties, including invariance, calibration and design consistency. However, ratio or linear models are not always good approximations to the true relationship between the auxiliary variables and the variable of interest in the survey, and efficiency is lost when the model is not appropriate. In this article, we explain how regression estimation can be extended to incorporate semiparametric regression models, in both simple and more complicated designs. While maintaining the good theoretical and practical properties of the linear models, semiparametric models are better able to capture complicated relationships between variables. This often results in substantial gains in efficiency. The applicability of the approach for complex designs using multiple types of auxiliary variables is illustrated by estimating several acidification-related characteristics for a survey of lakes in the Northeastern US.

Release date: 2007-06-28
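The general model-assisted form behind regression estimation is: predicted values summed over the population, plus a design-weighted sum of sample residuals. The Python sketch below plugs in any fitted function `m_hat`; a simple unweighted kernel smoother stands in for the article's semiparametric models, which are not reproduced. All names and data are hypothetical.

```python
import math
from typing import Callable, Sequence


def model_assisted_total(pop_x: Sequence[float], sample_x: Sequence[float],
                         sample_y: Sequence[float], weights: Sequence[float],
                         m_hat: Callable[[float], float]) -> float:
    """Sum of model predictions over the population plus a weighted
    correction for the sample residuals y - m_hat(x)."""
    prediction = sum(m_hat(x) for x in pop_x)
    correction = sum(w * (y - m_hat(x)) for w, x, y in zip(weights, sample_x, sample_y))
    return prediction + correction


def kernel_smoother(xs: Sequence[float], ys: Sequence[float],
                    bandwidth: float) -> Callable[[float], float]:
    """Nadaraya-Watson (local constant) fit returned as a function of x."""
    def m(x0: float) -> float:
        k = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        return sum(ki * yi for ki, yi in zip(k, ys)) / sum(k)
    return m


pop_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # auxiliary values known for every unit
sample_x = [1.0, 4.0]
sample_y = [2.0, 8.0]
weights = [3.0, 3.0]

ht = model_assisted_total(pop_x, sample_x, sample_y, weights, lambda x: 0.0)
fit = kernel_smoother(sample_x, sample_y, bandwidth=1.5)
print(ht, model_assisted_total(pop_x, sample_x, sample_y, weights, fit))
```

With `m_hat` identically zero the estimator reduces to the Horvitz-Thompson total; the better `m_hat` tracks the data, the smaller the residual correction and its variance.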

• Articles and reports: 12-001-X20050029052
Description:

Estimates of a sampling variance-covariance matrix are required in many statistical analyses, particularly for multilevel analysis. In univariate problems, functions relating the variance to the mean have been used to obtain variance estimates, pooling information across units or variables. We present variance and correlation functions for multivariate means of ordinal survey items, both for complete data and for data with structured non-response. Methods are also developed for assessing model fit, and for computing composite estimators that combine direct and model-based predictions. Survey data from the Consumer Assessments of Health Plans Study (CAHPS®) illustrate the application of the methodology.

Release date: 2006-02-17

• Articles and reports: 12-001-X20040016993
Description:

The weighting cell estimator corrects for unit nonresponse by dividing the sample into homogeneous groups (cells) and applying a ratio correction to the respondents within each cell. Previous studies of the statistical properties of weighting cell estimators have assumed that these cells correspond to known population cells with homogeneous characteristics. In this article, we study the properties of the weighting cell estimator under a response probability model that does not require correct specification of homogeneous population cells. Instead, we assume that the response probabilities are a smooth but otherwise unspecified function of a known auxiliary variable. Under this more general model, we study the robustness of the weighting cell estimator against model misspecification. We show that, even when the population cells are unknown, the estimator is consistent with respect to the sampling design and the response model. We describe the effect of the number of weighting cells on the asymptotic properties of the estimator. Simulation experiments explore the finite sample properties of the estimator. We conclude with some guidance on how to select the size and number of cells for practical implementation of weighting cell estimation when those cells cannot be specified a priori.

Release date: 2004-07-14
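The ratio correction applied within each weighting cell can be sketched in Python: respondents' weights are inflated by the ratio of the total weight of all sampled units in the cell to the total weight of the cell's respondents. The function name and data are hypothetical, and the sketch assumes every cell has at least one respondent (otherwise cells would need to be collapsed).

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def weighting_cell_adjust(w: Sequence[float], cells: Sequence[Hashable],
                          responded: Sequence[bool]) -> List[float]:
    """Within each cell, multiply respondents' weights by
    (total weight of sampled units) / (total weight of respondents)."""
    total: Dict[Hashable, float] = defaultdict(float)
    resp_total: Dict[Hashable, float] = defaultdict(float)
    for wi, c, r in zip(w, cells, responded):
        total[c] += wi
        if r:
            resp_total[c] += wi
    return [wi * total[c] / resp_total[c] if r else 0.0
            for wi, c, r in zip(w, cells, responded)]


w = [1.0, 1.0, 2.0, 3.0, 3.0]
cells = ["A", "A", "A", "B", "B"]
responded = [True, False, True, True, True]
print(weighting_cell_adjust(w, cells, responded))
```

The adjustment preserves the total weight within each cell, so the choice of cells determines which units "stand in" for the nonrespondents.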

• Articles and reports: 12-001-X20000025538
Description:

Cochran (1977, p. 374) proposed some ratio and regression estimators of the population mean using the Hansen and Hurwitz (1946) procedure of sub-sampling the non-respondents, assuming that the population mean of the auxiliary character is known. For the case where the population mean of the auxiliary character is not known in advance, some double (two-phase) sampling ratio and regression estimators are presented in this article. The performance of the proposed estimators is compared with that of the estimator proposed by Hansen and Hurwitz (1946).

Release date: 2001-02-28
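A Python sketch of the Hansen-Hurwitz estimator, which combines the respondent mean with the mean of a follow-up subsample of nonrespondents in proportions n1/n and n2/n, and of one common form of two-phase ratio estimator that multiplies the ratio of the Hansen-Hurwitz means of y and x by the first-phase mean of x. The exact estimators proposed in the article may differ; names and data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def hansen_hurwitz_mean(resp: Sequence[float], n_nonresp: int,
                        subsample: Sequence[float]) -> float:
    """(n1 * mean of respondents + n2 * mean of nonrespondent subsample) / n."""
    n1 = len(resp)
    n = n1 + n_nonresp
    return (n1 * mean(resp) + n_nonresp * mean(subsample)) / n


def two_phase_ratio_mean(y_resp: Sequence[float], y_sub: Sequence[float],
                         x_resp: Sequence[float], x_sub: Sequence[float],
                         n_nonresp: int, x_bar_first_phase: float) -> float:
    """Ratio estimator (y* / x*) * x', where x' is the first-phase mean of x,
    used because the population mean of x is unknown."""
    y_star = hansen_hurwitz_mean(y_resp, n_nonresp, y_sub)
    x_star = hansen_hurwitz_mean(x_resp, n_nonresp, x_sub)
    return y_star / x_star * x_bar_first_phase


# Hypothetical second-phase sample: 3 respondents, 2 nonrespondents,
# one of whom is followed up; the first-phase mean of x is 3.
print(hansen_hurwitz_mean([4.0, 5.0, 6.0], 2, [8.0]))
print(two_phase_ratio_mean([4.0, 5.0, 6.0], [8.0], [2.0, 2.0, 2.0], [4.0], 2, 3.0))
```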

• Technical products: 11-522-X19980015017
Description:

Longitudinal studies with repeated observations on individuals permit better characterizations of change and assessment of possible risk factors, but there has been little experience applying sophisticated models for longitudinal data to the complex survey setting. We present results from a comparison of different variance estimation methods for random effects models of change in cognitive function among older adults. The sample design is a stratified sample of people 65 and older, drawn as part of a community-based study designed to examine risk factors for dementia. The model summarizes the population heterogeneity in overall level and rate of change in cognitive function using random effects for intercept and slope. We discuss an unweighted regression including covariates for the stratification variables, a weighted regression, and bootstrapping; we also describe preliminary work on using balanced repeated replication and jackknife repeated replication.

Release date: 1999-10-22

• Articles and reports: 12-001-X19990014710
Description:

Most statistical offices use non-probability techniques to select the sample of commodities whose prices are collected for their Consumer Price Indexes. In the Netherlands, and in many other countries as well, these judgemental sampling methods come close to a kind of cut-off selection, in which a large part of the population (usually the items with the lowest expenditures) is deliberately left unobserved. This method obviously yields biased price index numbers. The question arises whether probability sampling would lead to better results in terms of mean square error. We considered simple random sampling, stratified sampling and systematic sampling proportional to expenditure. Monte Carlo simulations using scanner data on coffee, baby napkins and toilet paper were carried out to assess the performance of the four sampling designs. Perhaps surprisingly, cut-off selection is shown to be a successful strategy for item sampling in the consumer price index.

Release date: 1999-10-08
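Two of the designs compared can be sketched in Python: cut-off selection, which takes items in decreasing order of expenditure until a coverage share is reached, and systematic sampling with probability proportional to expenditure. The coverage level, random start, names and data are hypothetical; the study's own selection rules are not specified in the abstract.

```python
from typing import List, Sequence


def cutoff_sample(expenditure: Sequence[float], coverage: float) -> List[int]:
    """Take items by decreasing expenditure until their cumulative share of
    total expenditure reaches `coverage`; the rest are never observed."""
    order = sorted(range(len(expenditure)), key=lambda i: expenditure[i], reverse=True)
    target = coverage * sum(expenditure)
    chosen, cum = [], 0.0
    for i in order:
        if cum >= target:
            break
        chosen.append(i)
        cum += expenditure[i]
    return chosen


def systematic_pps(expenditure: Sequence[float], n: int, start: float) -> List[int]:
    """Systematic PPS: select the units whose cumulative-expenditure interval
    contains start + k * step, k = 0..n-1, with step = total / n and
    0 <= start < step (start would normally be drawn at random)."""
    step = sum(expenditure) / n
    points = [start + k * step for k in range(n)]
    chosen, cum, j = [], 0.0, 0
    for i, e in enumerate(expenditure):
        cum += e
        while j < n and points[j] < cum:
            chosen.append(i)  # a unit larger than `step` can be hit twice
            j += 1
    return chosen


expenditure = [5.0, 50.0, 15.0, 30.0]  # hypothetical item expenditures
print(cutoff_sample(expenditure, coverage=0.75))
print(systematic_pps(expenditure, n=2, start=10.0))
```

Cut-off selection is deterministic, which is the source of its bias; the systematic PPS design gives every item a known positive inclusion probability proportional to its expenditure.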

• Articles and reports: 12-001-X19970013101
Description:

In the main body of statistics, sampling is often disposed of by assuming a sampling process that selects independent and identically distributed (IID) random variables. Important techniques, such as regression and contingency table analysis, were developed largely in the IID world; hence, adjustments are needed to use them in complex survey settings. Rather than adjusting the analysis, however, the present formulation draws a second sample from the original sample. In this second sample, the first set of selections is inverted so as to yield, in the end, a simple random sample. Using this two-step process to draw a single simple random sample from the usually much larger complex sample would be inefficient, so multiple simple random samples are drawn and a method for basing inferences on them is developed. Not all original samples can be inverted, but many practical special cases covering a wide range of practices are discussed.

Release date: 1997-08-18
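For the special case of a stratified sample, inversion can be sketched in Python: draw the stratum counts m_h that a simple random sample of size m from the whole population would have (a multivariate hypergeometric draw), then take a simple random subsample of m_h units from each stratum's sample. The rejection step for m_h > n_h makes the result an exact simple random sample only when rejection is impossible (for example when m does not exceed the smallest n_h); otherwise it is an approximation. Building the full list of population labels is only practical for small populations. Names and data are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence


def inverse_sample_stratified(sample_by_stratum: Dict[str, Sequence[str]],
                              pop_sizes: Dict[str, int], m: int,
                              rng: random.Random,
                              max_tries: int = 1000) -> Optional[List[str]]:
    """Subsample a stratified sample so that it behaves like an SRS of size m."""
    labels = [h for h, N_h in pop_sizes.items() for _ in range(N_h)]
    for _ in range(max_tries):
        # Stratum counts of an SRS of m population units (hypergeometric draw).
        m_h = Counter(rng.sample(labels, m))
        if all(m_h[h] <= len(sample_by_stratum[h]) for h in m_h):
            sub: List[str] = []
            for h, k in m_h.items():
                sub.extend(rng.sample(list(sample_by_stratum[h]), k))
            return sub
    return None  # every draw required more units than were sampled


rng = random.Random(1)
sample_by_stratum = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
pop_sizes = {"A": 30, "B": 70}
draws = [inverse_sample_stratified(sample_by_stratum, pop_sizes, m=2, rng=rng)
         for _ in range(5)]
print(draws)
```

Repeating the draw many times and averaging an IID-based statistic over the subsamples is the way multiple inverted samples are used for inference.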

• Articles and reports: 12-001-X199400114432
Description:

Two sampling strategies for estimating the population mean in overlapping clusters with known population size were proposed by Singh (1988). In this paper, ratio estimators under these two strategies are studied assuming that the actual population size is unknown, which is the more realistic situation in sample surveys. The sampling efficiencies of the two strategies are compared and a numerical illustration is provided.

Release date: 1994-06-15

Data (0)

You may try:

Analysis (20)

## Analysis (20) (20 of 20 results)

• Articles and reports: 12-001-X201600214676
Description:

Winsorization procedures replace extreme values with less extreme values, effectively moving the original extreme values toward the center of the distribution. Winsorization therefore both detects and treats influential values. Mulry, Oliver and Kaputa (2014) compare the performance of the one-sided Winsorization method developed by Clark (1995) and described by Chambers, Kokic, Smith and Cruddas (2000) to the performance of M-estimation (Beaumont and Alavi 2004) in highly skewed business population data. One aspect of particular interest for methods that detect and treat influential values is the range of values designated as influential, called the detection region. The Clark Winsorization algorithm is easy to implement and can be extremely effective. However, the resultant detection region is highly dependent on the number of influential values in the sample, especially when the survey totals are expected to vary greatly by collection period. In this note, we examine the effect of the number and magnitude of influential values on the detection regions from Clark Winsorization using data simulated to realistically reflect the properties of the population for the Monthly Retail Trade Survey (MRTS) conducted by the U.S. Census Bureau. Estimates from the MRTS and other economic surveys are used in economic indicators, such as the Gross Domestic Product (GDP).

Release date: 2016-12-20

• Articles and reports: 12-001-X201600214664
Description:

This paper draws statistical inference for finite population mean based on judgment post stratified (JPS) samples. The JPS sample first selects a simple random sample and then stratifies the selected units into H judgment classes based on their relative positions (ranks) in a small set of size H. This leads to a sample with random sample sizes in judgment classes. Ranking process can be performed either using auxiliary variables or visual inspection to identify the ranks of the measured observations. The paper develops unbiased estimator and constructs confidence interval for population mean. Since judgment ranks are random variables, by conditioning on the measured observations we construct Rao-Blackwellized estimators for the population mean. The paper shows that Rao-Blackwellized estimators perform better than usual JPS estimators. The proposed estimators are applied to 2012 United States Department of Agriculture Census Data.

Release date: 2016-12-20

• Articles and reports: 12-001-X201400214118
Description:

Bagging is a powerful computational method used to improve the performance of inefficient estimators. This article is a first exploration of the use of bagging in survey estimation, and we investigate the effects of bagging on non-differentiable survey estimators including sample distribution functions and quantiles, among others. The theoretical properties of bagged survey estimators are investigated under both design-based and model-based regimes. In particular, we show the design consistency of the bagged estimators, and obtain the asymptotic normality of the estimators in the model-based context. The article describes how implementation of bagging for survey estimators can take advantage of replicates developed for survey variance estimation, providing an easy way for practitioners to apply bagging in existing surveys. A major remaining challenge in implementing bagging in the survey context is variance estimation for the bagged estimators themselves, and we explore two possible variance estimation approaches. Simulation experiments reveal the improvement of the proposed bagging estimator relative to the original estimator and compare the two variance estimation approaches.

Release date: 2014-12-19

• Articles and reports: 82-003-X201301011873
Description:

A computer simulation model of physical activity was developed for the Canadian adult population using longitudinal data from the National Population Health Survey and cross-sectional data from the Canadian Community Health Survey. The model is based on the Population Health Model (POHEM) platform developed by Statistics Canada. This article presents an overview of POHEM and describes the additions that were made to create the physical activity module (POHEM-PA). These additions include changes in physical activity over time, and the relationship between physical activity levels and health-adjusted life expectancy, life expectancy and the onset of selected chronic conditions. Estimates from simulation projections are compared with nationally representative survey data to provide an indication of the validity of POHEM-PA.

Release date: 2013-10-16

• Articles and reports: 82-003-X201200111633
Description:

This paper explains the methodology for creating Geozones, which are area-based thresholds of population characteristics derived from census data, which can be used in the analysis of social or economic differences in health and health service utilization.

Release date: 2012-03-21

• Articles and reports: 12-001-X201100211610
Description:

In this paper, a discussion of the three papers from the US Census Bureau special compilation is presented.

Release date: 2011-12-21

• Articles and reports: 82-003-X201100211437
Description:

This article examines the internal consistency of the English and French versions of the Medical Outcomes Study social support scale for a sample of older adults. The second objective is to conduct a confirmatory factor analysis to assess the factor structure of the English and French versions of the scale. A third purpose is to determine if the items comprising the scale operate in the same way for English- and French-speaking respondents.

Release date: 2011-05-18

• Articles and reports: 82-003-X201100111404
Description:

This study assesses three child-reported parenting behaviour scales (nurturance, rejection and monitoring) in the National Longitudinal Survey of Children and Youth.

Release date: 2011-02-16

• Articles and reports: 12-001-X200900211039
Description:

Propensity weighting is a procedure to adjust for unit nonresponse in surveys. A form of implementing this procedure consists of dividing the sampling weights by estimates of the probabilities that the sampled units respond to the survey. Typically, these estimates are obtained by fitting parametric models, such as logistic regression. The resulting adjusted estimators may become biased when the specified parametric models are incorrect. To avoid misspecifying such a model, we consider nonparametric estimation of the response probabilities by local polynomial regression. We study the asymptotic properties of the resulting estimator under quasi-randomization. The practical behavior of the proposed nonresponse adjustment approach is evaluated on NHANES data.

Release date: 2009-12-23

• Articles and reports: 12-001-X200700210495
Description:

The purpose of this work is to obtain reliable estimates in study domains when there are potentially very small sample sizes and the sampling design stratum differs from the study domain. The population sizes are unknown as well for both the study domain and the sampling design stratum. In calculating parameter estimates in the study domains, a random sample size is often necessary. We propose a new family of generalized linear mixed models with correlated random effects when there is more than one unknown parameter. The proposed model will estimate both the population size and the parameter of interest. General formulae for full conditional distributions required for Markov chain Monte Carlo (MCMC) simulations are given for this framework. Equations for Bayesian estimation and prediction at the study domains are also given. We apply the 1998 Missouri Turkey Hunting Survey, which stratified samples based on the hunter's place of residence and we require estimates at the domain level, defined as the county in which the turkey hunter actually hunted.

Release date: 2008-01-03

• Articles and reports: 12-001-X20070019850
Description:

Auxiliary information is often used to improve the precision of survey estimators of finite population means and totals through ratio or linear regression estimation techniques. Resulting estimators have good theoretical and practical properties, including invariance, calibration and design consistency. However, it is not always clear that ratio or linear models are good approximations to the true relationship between the auxiliary variables and the variable of interest in the survey, resulting in efficiency loss when the model is not appropriate. In this article, we explain how regression estimation can be extended to incorporate semiparametric regression models, in both simple and more complicated designs. While maintaining the good theoretical and practical properties of the linear models, semiparametric models are better able to capture complicated relationships between variables. This often results in substantial gains in efficiency. The applicability of the approach for complex designs using multiple types of auxiliary variables will be illustrated by estimating several acidification-related characteristics for a survey of lakes in the Northeastern US.

Release date: 2007-06-28

• Articles and reports: 12-001-X20050029052
Description:

Estimates of a sampling variance-covariance matrix are required in many statistical analyses, particularly for multilevel analysis. In univariate problems, functions relating the variance to the mean have been used to obtain variance estimates, pooling information across units or variables. We present variance and correlation functions for multivariate means of ordinal survey items, both for complete data and for data with structured non-response. Methods are also developed for assessing model fit, and for computing composite estimators that combine direct and model-based predictions. Survey data from the Consumer Assessments of Health Plans Study (CAHPS®) illustrate the application of the methodology.

Release date: 2006-02-17

• Articles and reports: 12-001-X20040016993
Description:

The weighting cell estimator corrects for unit nonresponse by dividing the sample into homogeneous groups (cells) and applying a ratio correction to the respondents within each cell. Previous studies of the statistical properties of weighting cell estimators have assumed that these cells correspond to known population cells with homogeneous characteristics. In this article, we study the properties of the weighting cell estimator under a response probability model that does not require correct specification of homogeneous population cells. Instead, we assume that the response probabilities are a smooth but otherwise unspecified function of a known auxiliary variable. Under this more general model, we study the robustness of the weighting cell estimator against model misspecification. We show that, even when the population cells are unknown, the estimator is consistent with respect to the sampling design and the response model. We describe the effect of the number of weighting cells on the asymptotic properties of the estimator. Simulation experiments explore the finite sample properties of the estimator. We conclude with some guidance on how to select the size and number of cells for practical implementation of weighting cell estimation when those cells cannot be specified a priori.

Release date: 2004-07-14
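
A minimal sketch of a generic weighting cell adjustment of the kind described above. It assumes design weights and cell labels are already known for every sampled unit. The function name and inputs are hypothetical, and the code is not the authors' implementation. Each respondent's weight is multiplied by the inverse of the weighted response rate in its cell, which is the "ratio correction" within each cell.

```python
from collections import defaultdict


def weighting_cell_total(weights, y, responded, cells):
    """Estimate a population total with a weighting-cell nonresponse adjustment.

    weights:   design weights of all sampled units
    y:         study values (entries for nonrespondents are ignored)
    responded: True if the unit responded, else False
    cells:     weighting-cell label of each sampled unit
    """
    w_all = defaultdict(float)   # weighted sample size per cell
    w_resp = defaultdict(float)  # weighted respondent count per cell
    for w, r, c in zip(weights, responded, cells):
        w_all[c] += w
        if r:
            w_resp[c] += w

    total = 0.0
    for w, yi, r, c in zip(weights, y, responded, cells):
        if r:
            # Ratio correction: divide by the weighted response rate of the cell.
            total += w * (w_all[c] / w_resp[c]) * yi
    return total


# Cell "a": everyone responds; cell "b": half of the weight is missing.
print(weighting_cell_total([10, 10, 10, 10], [1, 2, 3, None],
                           [True, True, True, False], ["a", "a", "b", "b"]))  # 90.0
```

The sketch takes the cells as given. The article's question of how many cells to form, and how, when they cannot be specified a priori, is not addressed here.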

• Articles and reports: 12-001-X20000025538
Description:

Cochran (1977, p. 374) proposed ratio and regression estimators of the population mean that use the Hansen and Hurwitz (1946) procedure of subsampling the nonrespondents, assuming that the population mean of the auxiliary character is known. For the case where the population mean of the auxiliary character is not known in advance, this article presents double (two-phase) sampling ratio and regression estimators. The performance of the proposed estimators is compared with that of the estimator proposed by Hansen and Hurwitz (1946).

Release date: 2001-02-28
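
To illustrate the ingredients named above, here is a textbook-style sketch. It computes Hansen-Hurwitz means for the study variable y and the auxiliary variable x, then forms a ratio estimator. In that ratio estimator, the unknown population mean of x is replaced by the mean from a larger first-phase sample. This is an assumed, generic combination and may differ from the specific estimators proposed in the article. All names are hypothetical.

```python
from statistics import fmean


def hansen_hurwitz_mean(resp, followup, n_resp, n_nonresp):
    """Hansen-Hurwitz (1946) mean: respondent mean and the mean of the
    follow-up subsample of nonrespondents, weighted by their sample shares."""
    n = n_resp + n_nonresp
    return (n_resp * fmean(resp) + n_nonresp * fmean(followup)) / n


def two_phase_ratio_mean(y_resp, y_follow, x_resp, x_follow, n_nonresp, x_phase1):
    """Ratio estimator of the mean of y in which the unknown population mean
    of the auxiliary variable x is replaced by its first-phase sample mean."""
    n_resp = len(y_resp)
    y_hh = hansen_hurwitz_mean(y_resp, y_follow, n_resp, n_nonresp)
    x_hh = hansen_hurwitz_mean(x_resp, x_follow, n_resp, n_nonresp)
    return y_hh / x_hh * fmean(x_phase1)


print(two_phase_ratio_mean(y_resp=[2, 4], y_follow=[6], x_resp=[1, 2],
                           x_follow=[3], n_nonresp=2, x_phase1=[1, 2, 3, 4, 5]))  # 6.0
```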

• Articles and reports: 12-001-X19990014710
Description:

Most statistical offices use non-probability techniques to select the sample of commodities for which prices are collected for their Consumer Price Indexes. In the Netherlands, and in many other countries as well, those judgemental sampling methods come close to some kind of cut-off selection, in which a large part of the population (usually the items with the lowest expenditures) is deliberately left unobserved. This method obviously yields biased price index numbers. The question arises whether probability sampling would lead to better results in terms of the mean square error. We have considered simple random sampling, stratified sampling and systematic sampling proportional to expenditure. Monte Carlo simulations using scanner data on coffee, baby's napkins and toilet paper were carried out to assess the performance of the four sampling designs (the three probability designs and cut-off selection). Surprisingly perhaps, cut-off selection is shown to be a successful strategy for item sampling in the consumer price index.

Release date: 1999-10-08
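
A minimal sketch of one simple cut-off rule consistent with the description above. Items are kept in decreasing order of expenditure until a chosen share of total expenditure is covered. The coverage threshold and the function name are hypothetical illustrations, not the selection rule used by any statistical office.

```python
def cutoff_sample(expenditures, coverage=0.8):
    """Indices of items kept by a simple cut-off rule.

    Items are taken in decreasing order of expenditure until their cumulative
    share of total expenditure reaches `coverage`; the remaining items, which
    have the lowest expenditures, are deliberately left unobserved.
    """
    total = sum(expenditures)
    order = sorted(range(len(expenditures)),
                   key=lambda i: expenditures[i], reverse=True)
    selected, covered = [], 0.0
    for i in order:
        if covered >= coverage * total:
            break
        selected.append(i)
        covered += expenditures[i]
    return sorted(selected)


print(cutoff_sample([50, 5, 30, 10, 5], coverage=0.75))  # [0, 2]
```

Because the excluded items never enter the sample, any price change specific to them is missed. That is the source of the bias mentioned above.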

• Articles and reports: 12-001-X19970013101
Description:

In the main body of statistics, sampling is often disposed of by assuming a sampling process that selects random variables that are independent and identically distributed (IID). Important techniques, such as regression and contingency table analysis, were developed largely in the IID world; hence, adjustments are needed to use them in complex survey settings. Rather than adjusting the analysis, however, the new element of the present formulation is to draw a second sample from the original sample. In this second sample, the first set of selections is inverted so as to yield, at the end, a simple random sample. Of course, using this two-step process to draw a single simple random sample from the usually much larger complex survey would be inefficient, so multiple simple random samples are drawn and a way of basing inferences on them is developed. Not all original samples can be inverted, but many practical special cases covering a wide range of practices are discussed.

Release date: 1997-08-18

• Articles and reports: 12-001-X199400114432
Description:

Two sampling strategies for estimating the population mean in overlapping clusters with known population size were proposed by Singh (1988). In this paper, ratio estimators under these two strategies are studied assuming that the actual population size is unknown, which is the more realistic situation in sample surveys. The sampling efficiencies of the two strategies are compared and a numerical illustration is provided.

Release date: 1994-06-15

• Articles and reports: 12-001-X199300114471
Description:

Binomial-Poisson and Poisson-Poisson sampling are introduced for use in forest sampling. Several estimators of the population total are discussed for these designs. Simulation comparisons of the properties of the estimators were made for three small forestry populations. A modification of the standard estimator used for Poisson sampling and a new estimator, called a modified Srivastava estimator, appear to be the most efficient. The latter, unfortunately, is badly biased for all three populations.

Release date: 1993-06-15
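
For orientation, here is a sketch of plain Poisson sampling and the Horvitz-Thompson estimator of the total, which is usually regarded as the standard estimator for this design. A frequently cited size-adjusted variant is also shown. It rescales the estimate by the ratio of expected to realized sample size. That variant is an assumption for illustration and is not necessarily the modification studied in the article. The Binomial-Poisson and Poisson-Poisson designs themselves are not reproduced.

```python
import random


def poisson_sample(pi, rng):
    """Poisson sampling: unit i enters the sample independently with probability pi[i]."""
    return [i for i, p in enumerate(pi) if rng.random() < p]


def ht_total(sample, y, pi):
    """Horvitz-Thompson estimator of the population total."""
    return sum(y[i] / pi[i] for i in sample)


def size_adjusted_total(sample, y, pi):
    """HT estimate rescaled by expected / realized sample size, one commonly
    cited way of damping the variability caused by the random sample size."""
    n = len(sample)
    if n == 0:
        return 0.0  # hypothetical convention for an empty sample
    return ht_total(sample, y, pi) * sum(pi) / n


y = [10, 20, 30, 40]
pi = [0.5, 0.5, 0.5, 0.5]
s = poisson_sample(pi, random.Random(1))
print(s, ht_total(s, y, pi), size_adjusted_total(s, y, pi))
```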

• Articles and reports: 12-001-X198700114513
Description:

Singh and Srivastava (1973) proposed a linear unbiased estimator of the population mean when sampling on successive occasions using several auxiliary variables whose known population means remain unchanged across occasions. In this paper, three composite estimators T_1, T_2 and T_3, each using an auxiliary variable whose known population mean changes from one occasion to the next, are presented for estimating the current population total. The proposed estimators are compared with the ordinary estimator T_0 and the usual successive sampling estimator T′ of the current population total, neither of which uses auxiliary information. We find that using auxiliary information in conjunction with successive sampling does not always produce a gain in efficiency over T_0 or T′. However, when applied to a survey of teak plantations to estimate the mean height of teak trees, T_1, T_2 and T_3 proved more efficient than T_0 and T′.

Release date: 1987-06-15

• Articles and reports: 12-001-X198500214401
Description:

This paper describes a method of producing current age/sex specific population estimates for small areas utilizing as inputs total population estimates, birth and death data and estimates of historical residual net migration. An evaluation based on the 1981 Census counts for census divisions and school districts in British Columbia is presented.

Release date: 1985-12-16


## Reference (8) (8 of 8 results)

• Technical products: 11-522-X201700014741
Description:

Release date: 2016-03-24

• Technical products: 11-522-X201300014266
Description:

Monitor devices and self-reports are two methods of measuring the energy expended in physical activity; monitors typically have much smaller error variances than self-reports. The Physical Activity Measurement Survey was designed to compare the two procedures using replicate observations on the same individual. The replicates permit calibrating the self-report measurement to the monitor measurement and make it possible to estimate components of the measurement error variances. Estimates of the variance components of measurement error in monitor and self-report energy expenditure are given for females in the Physical Activity Measurement Survey.

Release date: 2014-10-31
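
A simplified sketch of why replicates make the error variance estimable. It uses the classical model y_ij = x_i + e_ij with two replicates per person and independent, mean-zero errors, under which Var(y_i1 - y_i2) = 2σ_e². This single-instrument model is an assumption for illustration. The survey's actual analysis, including the calibration of self-reports to monitors, is richer. The function names are hypothetical.

```python
from statistics import fmean, variance


def error_variance(rep1, rep2):
    """Measurement-error variance from two replicates per person, assuming
    y_ij = x_i + e_ij with independent, mean-zero errors of common variance,
    so that Var(y_i1 - y_i2) = 2 * sigma_e^2."""
    return fmean([(a - b) ** 2 for a, b in zip(rep1, rep2)]) / 2


def true_value_variance(rep1, rep2):
    """Between-person variance of x: the variance of each person's replicate
    mean equals sigma_x^2 + sigma_e^2 / 2."""
    means = [(a + b) / 2 for a, b in zip(rep1, rep2)]
    return variance(means) - error_variance(rep1, rep2) / 2


rep1, rep2 = [10, 12, 14], [12, 12, 12]
print(error_variance(rep1, rep2), true_value_variance(rep1, rep2))
```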

• Technical products: 11-522-X200800010968
Description:

Statistics Canada has embarked on a program to increase and improve the use of imaging technology for paper survey questionnaires. The goal is to make the process an efficient, reliable and cost-effective method of capturing survey data. The objective is to continue using Optical Character Recognition (OCR) to capture data from questionnaires, documents and faxes received, while improving process integration and the Quality Assurance/Quality Control (QA/QC) of the data capture process. These improvements are discussed in this paper.

Release date: 2009-12-03

• Technical products: 11-536-X200900110810
Description:

Post-stratification is frequently used to improve the precision of survey estimators when categorical auxiliary information is available from sources outside the survey. In natural resource surveys, such information is often obtained from remote sensing data, classified into categories and displayed as pixel-based maps. These maps may be constructed based on classification models fitted to the sample data. Post-stratification of the sample data based on categories derived from the sample data ("endogenous post-stratification") violates several standard post-stratification assumptions, and has been generally considered invalid as a design-based estimation method. In this presentation, properties of the endogenous post-stratification estimator are derived for the case of a sample-fitted generalized linear model. Design consistency of the endogenous post-stratification estimator is established under mild conditions. Under a superpopulation model, consistency and asymptotic normality of the endogenous post-stratification estimator are established. Simulation experiments demonstrate that the practical effect of first fitting a model to the survey data before post-stratifying is small, even for relatively small sample sizes.

Release date: 2009-08-11
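
A minimal sketch of the ordinary post-stratified estimator of a mean. Each post-stratum sample mean is weighted by that post-stratum's known population share, for example its share of pixels in a classified map. In the endogenous case, the labels come from a model fitted to the same sample. Here that step is only mimicked by thresholding hypothetical fitted probabilities. The threshold, class names and shares are illustrative assumptions.

```python
from collections import defaultdict


def post_stratified_mean(y, labels, pop_shares):
    """Post-stratified estimator of a population mean.

    y:          sample values
    labels:     post-stratum of each sample unit (e.g. a map class)
    pop_shares: known population share of each post-stratum (sums to 1)
    Assumes every post-stratum contains at least one sample unit.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for yi, h in zip(y, labels):
        sums[h] += yi
        counts[h] += 1
    return sum(share * sums[h] / counts[h] for h, share in pop_shares.items())


# Endogenous variant: labels derived from a model fitted to the same sample,
# mimicked here by thresholding hypothetical fitted probabilities.
fitted = [0.9, 0.7, 0.2]
labels = ["forest" if p >= 0.5 else "nonforest" for p in fitted]
print(post_stratified_mean([1, 3, 10], labels, {"forest": 0.25, "nonforest": 0.75}))  # 8.0
```

In a real application, the population shares would also be obtained by applying the same fitted model to every map pixel, so they depend on the sample too. That dependence is what breaks the standard post-stratification assumptions.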

• Technical products: 11-522-X200600110446
Description:

Immigrants have health advantages over native-born Canadians, but those advantages are threatened by specific risk situations. This study explores cardiovascular health outcomes in districts of Montréal classified by the proportion of immigrants in the population, using a principal component analysis. The first three components are immigration, degree of socio-economic disadvantage and degree of economic disadvantage. The incidence of myocardial infarction is lower in districts with large immigrant populations than in districts dominated by native-born Canadians. Mortality rates are associated with the degree of socio-economic disadvantage, while revascularization is associated with the proportion of seniors in the population.

Release date: 2008-03-17

• Technical products: 11-522-X200600110418
Description:

The current use of multilevel models to examine the effects of surrounding contexts on health outcomes attests to their value as a statistical method for analyzing grouped data. However, the use of multilevel modeling with data from population-based surveys is often limited by the small number of cases per level-2 unit, prompting a recent trend in the neighborhood literature to apply cluster analysis techniques to address data sparseness. In this paper, we use Monte Carlo simulations to investigate the effects of marginal group sizes and cluster analysis techniques on the validity of parameter estimates in both linear and non-linear multilevel models.

Release date: 2008-03-17

• Technical products: 11-522-X200600110441
Description:

How can sample size be estimated efficiently while building consensus among multiple investigators for multi-purpose projects? We present a template using common spreadsheet software to provide estimates of power, precision and financial costs under varying sampling scenarios, as used in the development of the Ontario Tobacco Survey. In addition to cost estimates, complex sample size formulae were nested within a spreadsheet to determine power and precision, incorporating user-defined design effects and loss to follow-up. Common spreadsheet software can be used in conjunction with complex formulae to enhance knowledge exchange between methodologists and stakeholders, in effect demystifying the "sample size black box".

Release date: 2008-03-17
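
The precision side of such a template can be sketched with the textbook formula for estimating a proportion p within a margin of error e. The formula is n = z²p(1 - p)/e², inflated by a user-defined design effect and divided by the expected retention rate to allow for loss to follow-up. This generic formula is an assumption. The Ontario Tobacco Survey's actual spreadsheet formulae, power calculations and cost components are not given here. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(p, margin, deff=1.0, loss=0.0, confidence=0.95):
    """Sample size to estimate a proportion p within +/- margin.

    deff: design effect of the complex design relative to SRS
    loss: expected proportion lost to follow-up (0 <= loss < 1)
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n_srs = z ** 2 * p * (1 - p) / margin ** 2           # simple random sampling size
    return math.ceil(n_srs * deff / (1 - loss))          # inflate for design and attrition


print(required_sample_size(0.5, 0.05))                      # 385
print(required_sample_size(0.5, 0.05, deff=2.0, loss=0.2))  # 961
```

Multiplying the result by a per-case cost would give the kind of cost estimate the template also reports.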

• Technical products: 11-522-X19980015017
Description:

Longitudinal studies with repeated observations on individuals permit better characterizations of change and assessment of possible risk factors, but there has been little experience applying sophisticated models for longitudinal data to the complex survey setting. We present results from a comparison of different variance estimation methods for random effects models of change in cognitive function among older adults. The sample design is a stratified sample of people aged 65 and older, drawn as part of a community-based study designed to examine risk factors for dementia. The model summarizes the population heterogeneity in overall level and rate of change in cognitive function using random effects for intercept and slope. We discuss an unweighted regression including covariates for the stratification variables, a weighted regression, and bootstrapping; we also report preliminary work on using balanced repeated replication and jackknife repeated replication.

Release date: 1999-10-22
