Statistics by subject – Statistical methods

All (51) (25 of 51 results)

  • Technical products: 11-522-X201700014720
    Description:

    This paper is intended to give a brief overview of Statistics Canada’s involvement with open data. It will first discuss how the principles of open data are being adopted in the agency’s ongoing dissemination practices. It will then discuss the agency’s involvement in the whole-of-government open data initiative. This involvement is twofold: Statistics Canada is the major data contributor to the Government of Canada Open Data portal, but it also plays an important behind-the-scenes role as the service provider responsible for developing and maintaining the Open Data portal (which is now part of the wider Open Government portal).

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014734
    Description:

    Data protection and privacy are key challenges that need to be tackled with high priority in order to enable the use of Big Data in the production of Official Statistics. This was emphasized in 2013 by the Directors of National Statistical Institutes (NSIs) of the European Statistical System Committee (ESSC) in the Scheveningen Memorandum. The ESSC requested Eurostat and the NSIs to elaborate an action plan with a roadmap for following up the implementation of the Memorandum. At the Riga meeting on September 26, 2014, the ESSC endorsed the Big Data Action Plan and Roadmap 1.0 (BDAR) presented by the Eurostat Task Force on Big Data (TFBD) and agreed to integrate it into the ESS Vision 2020 portfolio. Eurostat also collaborates in this field with external partners such as the United Nations Economic Commission for Europe (UNECE). The big data project of the UNECE High-Level Group is an international project on the role of big data in the modernization of statistical production. It comprised four ‘task teams’ addressing different aspects of Big Data issues relevant for official statistics: Privacy, Partnerships, Sandbox, and Quality. The Privacy Task Team finished its work in 2014; it gave an overview of the existing tools for risk management regarding privacy issues, described how the risk of identification relates to Big Data characteristics, and drafted recommendations for National Statistical Offices (NSOs). It mainly concluded that extensions to existing frameworks, including the use of new technologies, were needed in order to deal with privacy risks related to the use of Big Data. The BDAR builds on the work achieved by the UNECE task teams. Specifically, it recognizes that a number of big data sources contain sensitive information, that their use for official statistics may induce negative perceptions among the general public and other stakeholders, and that this risk should be mitigated in the short to medium term. It proposes to launch multiple actions, such as an adequate review of the ethical principles governing the roles and activities of the NSIs and a strong communication strategy. The paper presents the different actions undertaken within the ESS and in collaboration with UNECE, as well as potential technical and legal solutions to be put in place in order to address the data protection and privacy risks in the use of Big Data for Official Statistics.

    Release date: 2016-03-24

  • Articles and reports: 82-003-X201501114243
    Description:

    A surveillance tool was developed to assess dietary intake collected by surveys in relation to Eating Well with Canada’s Food Guide (CFG). The tool classifies foods in the Canadian Nutrient File (CNF) according to how closely they reflect CFG. This article describes the validation exercise conducted to ensure that CNF foods determined to be “in line with CFG” were appropriately classified.

    Release date: 2015-11-18

  • Articles and reports: 12-001-X201500114200
    Description:

    We consider the observed best prediction (OBP; Jiang, Nguyen and Rao 2011) for small area estimation under the nested-error regression model, where both the mean and variance functions may be misspecified. We show via a simulation study that the OBP may significantly outperform the empirical best linear unbiased prediction (EBLUP) method not just in the overall mean squared prediction error (MSPE) but also in the area-specific MSPE for every one of the small areas. A bootstrap method is proposed for estimating the design-based area-specific MSPE, which is simple and always produces positive MSPE estimates. The performance of the proposed MSPE estimator is evaluated through a simulation study. An application to the Television School and Family Smoking Prevention and Cessation study is considered.

    Release date: 2015-06-29
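
    For readers unfamiliar with the setup, the nested-error regression model referred to above has the standard form sketched below; this is generic notation, not notation taken from the paper itself.

    ```latex
    % Nested-error regression model (standard form; notation assumed)
    \[
      y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + v_i + e_{ij},
      \qquad i = 1,\dots,m \ \text{(small areas)}, \quad j = 1,\dots,n_i,
    \]
    \[
      v_i \sim N(0,\sigma_v^2) \ \text{(area effects)}, \qquad
      e_{ij} \sim N(0,\sigma_e^2) \ \text{(unit-level errors)}.
    \]
    ```

    OBP and EBLUP target the same small area means under this model; they differ in how the regression coefficients and variance components are estimated, with OBP minimizing an observed prediction-error criterion rather than using likelihood-type fitting, which is what gives it an edge when the mean or variance function is misspecified.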

  • Technical products: 11-522-X201300014267
    Description:

    Statistics Sweden has, like many other National Statistical Institutes (NSIs), a long history of working with quality. More recently, the agency decided to start using a number of frameworks to address organizational, process and product quality. It is important to consider all three levels, since we know that the way we do things, e.g., when asking questions, affects product quality and therefore process quality is an important part of the quality concept. Further, organizational quality, i.e., systematically managing aspects such as training of staff and leadership, is fundamental for achieving process quality. Statistics Sweden uses EFQM (European Foundation for Quality Management) as a framework for organizational quality and ISO 20252 for market, opinion and social research as a standard for process quality. In April 2014, as the first National Statistical Institute, Statistics Sweden was certified according to the ISO 20252. One challenge that Statistics Sweden faced in 2011 was to systematically measure and monitor changes in product quality and to clearly present them to stakeholders. Together with external consultants, Paul Biemer and Dennis Trewin, Statistics Sweden developed a tool for this called ASPIRE (A System for Product Improvement, Review and Evaluation). To assure that quality is maintained and improved, Statistics Sweden has also built an organization for quality comprising a quality manager, quality coaches, and internal and external quality auditors. In this paper I will present the components of Statistics Sweden’s quality management system and some of the challenges we have faced.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014287
    Description:

    The purpose of the EpiNano program is to monitor workers who may be exposed to intentionally produced nanomaterials in France. This program is based both on industrial hygiene data collected in businesses for the purpose of gauging exposure to nanomaterials at workstations and on data from self-administered questionnaires completed by participants. These data will subsequently be matched with health data from national medical-administrative databases (passive monitoring of health events). Follow-up questionnaires will be sent regularly to participants. This paper describes the arrangements for optimizing data collection and matching.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014286
    Description:

    The Étude Longitudinale Française depuis l’Enfance (ELFE) [French longitudinal study from childhood on], which began in 2011, involves over 18,300 infants whose parents agreed to participate when they were in the maternity hospital. This cohort survey, which will track the children from birth to adulthood, covers the many aspects of their lives from the perspective of social science, health and environmental health. In randomly selected maternity hospitals, all infants in the target population, who were born on one of 25 days distributed across the four seasons, were chosen. This sample is the outcome of a non-standard sampling scheme that we call product sampling. In this survey, it takes the form of the cross-tabulation between two independent samples: a sampling of maternity hospitals and a sampling of days. While it is easy to imagine a cluster effect due to the sampling of maternity hospitals, one can also imagine a cluster effect due to the sampling of days. The scheme’s time dimension therefore cannot be ignored if the desired estimates are subject to daily or seasonal variation. While this non-standard scheme can be viewed as a particular kind of two-phase design, it needs to be defined within a more specific framework. Following a comparison of the product scheme with a conventional two-stage design, we propose variance estimators specially formulated for this sampling scheme. Our ideas are illustrated with a simulation study.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014260
    Description:

    The Survey of Employment, Payrolls and Hours (SEPH) produces monthly estimates and determines the month-to-month changes for variables such as employment, earnings and hours at detailed industrial levels for Canada, the provinces and territories. In order to improve the efficiency of collection activities for this survey, an electronic questionnaire (EQ) was introduced in the fall of 2012. Given the timeframe allowed for this transition as well as the production calendar of the survey, a conversion strategy was developed for the integration of this new mode. The goal of the strategy was to ensure a good adaptation of the collection environment and also to allow the implementation of a plan of analysis that would evaluate the impact of this change on the results of the survey. This paper will give an overview of the conversion strategy, the different adjustments that were made during the transition period and the results of various evaluations that were conducted. For example, the impact of the integration of the EQ on the collection process, the response rate and the follow-up rate will be presented. In addition, the effect that this new collection mode has on the survey estimates will also be discussed. More specifically, the results of a randomized experiment that was conducted in order to determine the presence of a mode effect will be presented.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014255
    Description:

    The Brazilian Network Information Center (NIC.br) has designed and carried out a pilot project to collect data from the Web in order to produce statistics about the characteristics of webpages. Studies on the characteristics and dimensions of the web require collecting and analyzing information from a dynamic and complex environment. The core idea was to collect data from a sample of webpages automatically, using software known as a web crawler. The motivation for this paper is to disseminate the methods and results of this study, as well as to show current developments related to sampling techniques in a dynamic environment.

    Release date: 2014-10-31

  • Articles and reports: 12-001-X201300211888
    Description:

    When the study variables are functional and storage capacities are limited or transmission costs are high, using survey techniques to select a portion of the observations of the population is an interesting alternative to using signal compression techniques. In this context of functional data, our focus in this study is on estimating the mean electricity consumption curve over a one-week period. We compare different estimation strategies that take account of a piece of auxiliary information such as the mean consumption for the previous period. The first strategy consists in using a simple random sampling design without replacement, then incorporating the auxiliary information into the estimator by introducing a functional linear model. The second approach consists in incorporating the auxiliary information into the sampling designs by considering unequal probability designs, such as stratified and pi designs. We then address the issue of constructing confidence bands for these estimators of the mean. When effective estimators of the covariance function are available and the mean estimator satisfies a functional central limit theorem, it is possible to use a fast technique for constructing confidence bands, based on the simulation of Gaussian processes. This approach is compared with bootstrap techniques that have been adapted to take account of the functional nature of the data.

    Release date: 2014-01-15
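
    As a point of reference for the strategies compared above, the basic design-based estimator of a mean curve is the Horvitz-Thompson form applied pointwise in time; this is a standard expression, not one quoted from the article.

    ```latex
    % Horvitz-Thompson estimator of the mean curve, pointwise in t
    \[
      \hat{\mu}(t) = \frac{1}{N} \sum_{k \in s} \frac{Y_k(t)}{\pi_k}.
    \]
    % Under simple random sampling without replacement, \pi_k = n/N and
    % \hat{\mu}(t) reduces to the sample mean curve.
    ```

    The strategies described in the abstract improve on this baseline either by adjusting the estimator with a functional linear model or by building the auxiliary information into the inclusion probabilities.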

  • Articles and reports: 82-003-X201100311534
    Description:

    Using data from the 2007 to 2009 Canadian Health Measures Survey, this study investigates the bias that exists when height, weight and body mass index are based on parent-reported values. Factors associated with reporting error are used to establish the feasibility of developing correction equations to adjust parent-reported estimates.

    Release date: 2011-08-17

  • Articles and reports: 82-003-X201100311533
    Description:

    This study compares the bias in self-reported height, weight and body mass index in the 2008 and 2005 Canadian Community Health Surveys and the 2007 to 2009 Canadian Health Measures Survey. The feasibility of using correction equations to adjust self-reported 2008 Canadian Community Health Survey values to more closely approximate measured values is assessed.

    Release date: 2011-08-17

  • Articles and reports: 12-001-X201000211378
    Description:

    One key to poverty alleviation or eradication in the third world is reliable information on the poor and their location, so that interventions and assistance can be effectively targeted to the neediest people. Small area estimation is one statistical technique that is used to monitor poverty and to decide on aid allocation in pursuit of the Millennium Development Goals. Elbers, Lanjouw and Lanjouw (ELL) (2003) proposed a small area estimation methodology for income-based or expenditure-based poverty measures, which is implemented by the World Bank in its poverty mapping projects via the involvement of the central statistical agencies in many third world countries, including Cambodia, Lao PDR, the Philippines, Thailand and Vietnam, and is incorporated into the World Bank software program PovMap. In this paper, the ELL methodology which consists of first modeling survey data and then applying that model to census information is presented and discussed with strong emphasis on the first phase, i.e., the fitting of regression models and on the estimated standard errors at the second phase. Other regression model fitting procedures such as the General Survey Regression (GSR) (as described in Lohr (1999) Chapter 11) and those used in existing small area estimation techniques: Pseudo-Empirical Best Linear Unbiased Prediction (Pseudo-EBLUP) approach (You and Rao 2002) and Iterative Weighted Estimating Equation (IWEE) method (You, Rao and Kovacevic 2003) are presented and compared with the ELL modeling strategy. The most significant difference between the ELL method and the other techniques is in the theoretical underpinning of the ELL model fitting procedure. An example based on the Philippines Family Income and Expenditure Survey is presented to show the differences in both the parameter estimates and their corresponding standard errors, and in the variance components generated from the different methods and the discussion is extended to the effect of these on the estimated accuracy of the final small area estimates themselves. The need for sound estimation of variance components, as well as regression estimates and estimates of their standard errors for small area estimation of poverty is emphasized.

    Release date: 2010-12-21
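
    As background, the unit-level model typically fitted in the first (survey) phase of an ELL-type analysis has the general form sketched below; the exact specification varies by application, so treat this as an illustration rather than the formulation used in the paper.

    ```latex
    % Illustrative ELL-type model: log welfare of household h in cluster c
    \[
      \ln y_{ch} = \mathbf{x}_{ch}^{\top}\boldsymbol{\beta} + \eta_c + \varepsilon_{ch}.
    \]
    % \eta_c: cluster (location) random effect, \varepsilon_{ch}: household error.
    ```

    The fitted model is then applied to census records carrying the same covariates, and simulated welfare values are aggregated into small area poverty measures; the estimated variance components drive much of the uncertainty in those measures, which is why the abstract stresses their sound estimation.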

  • Articles and reports: 12-001-X201000111244
    Description:

    This paper considers the problem of selecting nonparametric models for small area estimation, which recently have received much attention. We develop a procedure based on the idea of fence method (Jiang, Rao, Gu and Nguyen 2008) for selecting the mean function for the small areas from a class of approximating splines. Simulation results show impressive performance of the new procedure even when the number of small areas is fairly small. The method is applied to a hospital graft failure dataset for selecting a nonparametric Fay-Herriot type model.

    Release date: 2010-06-29
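
    For context, the classical Fay-Herriot model and the nonparametric variant alluded to above can be written as follows; this is a standard formulation, with the spline-based mean function shown only schematically.

    ```latex
    % Classical Fay-Herriot (area-level) model for area i
    \[
      \hat{\theta}_i = \theta_i + e_i, \qquad
      \theta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i.
    \]
    % Nonparametric variant: replace the linear mean by a spline function f
    \[
      \theta_i = f(x_i) + v_i, \qquad f \ \text{chosen from a class of approximating splines}.
    \]
    ```

    Roughly speaking, the fence method screens candidate specifications for f, retaining those whose measure of lack of fit lies within a data-driven "fence" of the best-fitting candidate, and then selects among the retained models.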

  • Articles and reports: 12-001-X200900211045
    Description:

    In analysis of sample survey data, degrees-of-freedom quantities are often used to assess the stability of design-based variance estimators. For example, these degrees-of-freedom values are used in construction of confidence intervals based on t distribution approximations; and of related t tests. In addition, a small degrees-of-freedom term provides a qualitative indication of the possible limitations of a given variance estimator in a specific application. Degrees-of-freedom calculations sometimes are based on forms of the Satterthwaite approximation. These Satterthwaite-based calculations depend primarily on the relative magnitudes of stratum-level variances. However, for designs involving a small number of primary units selected per stratum, standard stratum-level variance estimators provide limited information on the true stratum variances. For such cases, customary Satterthwaite-based calculations can be problematic, especially in analyses for subpopulations that are concentrated in a relatively small number of strata. To address this problem, this paper uses estimated within-primary-sample-unit (within PSU) variances to provide auxiliary information regarding the relative magnitudes of the overall stratum-level variances. Analytic results indicate that the resulting degrees-of-freedom estimator will be better than modified Satterthwaite-type estimators provided: (a) the overall stratum-level variances are approximately proportional to the corresponding within-stratum variances; and (b) the variances of the within-PSU variance estimators are relatively small. In addition, this paper develops errors-in-variables methods that can be used to check conditions (a) and (b) empirically. For these model checks, we develop simulation-based reference distributions, which differ substantially from reference distributions based on customary large-sample normal approximations. The proposed methods are applied to four variables from the U.S. Third National Health and Nutrition Examination Survey (NHANES III).

    Release date: 2009-12-23
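
    For reference, the customary Satterthwaite-based degrees-of-freedom calculation mentioned above has the standard form below for a variance estimator built as a weighted sum of stratum-level variances; the coefficients and notation are generic, not taken from the article.

    ```latex
    % Variance estimator as a weighted sum of estimated stratum variances
    \[
      \hat{V} = \sum_h a_h s_h^2.
    \]
    % Satterthwaite approximation to its degrees of freedom,
    % with d_h = n_h - 1 for n_h sampled primary units in stratum h
    \[
      \widehat{\mathrm{df}} = \frac{\bigl(\sum_h a_h s_h^2\bigr)^2}{\sum_h (a_h s_h^2)^2 / d_h}.
    \]
    ```

    With only a few primary units per stratum, each stratum-level variance estimate is itself very noisy, which is why the paper brings in estimated within-PSU variances as auxiliary information.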

  • Technical products: 11-522-X200800010954
    Description:

    Over the past year, Statistics Canada has been developing and testing a new way to monitor the performance of interviewers conducting computer-assisted personal interviews (CAPI). A formal process already exists for monitoring centralized telephone interviews. Monitors listen to telephone interviews as they take place to assess the interviewer's performance using pre-defined criteria and provide feedback to the interviewer on what was well done and what needs improvement. For the CAPI program, we have developed and are testing a pilot approach whereby interviews are digitally recorded and later a monitor listens to these recordings to assess the field interviewer's performance and provide feedback in order to help improve the quality of the data. In this paper, we will present an overview of the CAPI monitoring project at Statistics Canada by describing the CAPI monitoring methodology and the plans for implementation.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010946
    Description:

    In the mid-1990s, the first question testing unit was set up in the UK Office for National Statistics (ONS). The key objective of the unit was to develop and test the questions and questionnaire for the 2001 Census. Since the establishment of this unit, the area has been expanded into a Data Collection Methodology (DCM) Centre of Expertise, which now sits in the Methodology Directorate. The DCM centre has three branches which support DCM work for social surveys, business surveys, the Census and external organisations.

    Over the past ten years, DCM has achieved a variety of things. For example, it has introduced survey methodology involvement in the development and testing of business survey questions and questionnaires; introduced a mixed-method approach to the development of questions and questionnaires; developed and implemented standards, e.g., for the 2011 Census questionnaire and showcards; and developed and delivered DCM training events.

    This paper will provide an overview of data collection methodology at the ONS from the perspective of achievements and challenges. It will cover areas such as methods, staff (e.g. recruitment, development and field security), and integration with the survey process.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011006
    Description:

    The Office for National Statistics (ONS) has an obligation to measure and annually report on the burden that it places on businesses participating in its surveys. There are also targets for reduction of costs to businesses complying with government regulation as part of the 2005 Administrative Burdens Reduction Project (ABRP) coordinated by the Better Regulation Executive (BRE).

    Respondent burden is measured by looking at the economic costs to businesses. Over time the methodology for measuring this economic cost has changed with the most recent method being the development and piloting of a Standard Cost Model (SCM) approach.

    The SCM is commonly used in Europe and is focused on measuring objective administrative burdens for all government requests for information, e.g., tax returns and VAT, as well as survey participation. The method was therefore not developed specifically to measure statistical response burden. The SCM methodology is activity-based, meaning that the costs and time taken to fulfil requirements are broken down by activity.

    The SCM approach generally collects data using face-to-face interviews. The approach is therefore labour-intensive from both a collection and an analysis perspective, but it provides in-depth information. The approach developed and piloted at ONS uses paper self-completion questionnaires.

    The objective of this paper is to provide an overview of respondent burden reporting and targets; and to review the different methodologies that ONS has used to measure respondent burden from the perspectives of sampling, data collection, analysis and usability.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010995
    Description:

    In 1992, a paper entitled "The Optimum Time at which to Conduct Survey Interviews" sought to illustrate the economic benefits to market research organisations in structuring the calling pattern of interviewers in household surveys. The findings were based on the Welsh Inter Censal Survey in 1986. This paper now brings additional information on the calling patterns of interviewers from similar surveys in 1997 and 2006 to ascertain whether the calling patterns of interviewers have changed. The results also examine the importance of having a survey response that is representative of the population, and how efficient calling strategies can help achieve this.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010953
    Description:

    As survey researchers attempt to maintain traditionally high response rates, reluctant respondents have driven up data collection costs. This respondent reluctance may be related to the amount of time it takes to complete an interview in large-scale, multi-purpose surveys, such as the National Survey of Recent College Graduates (NSRCG). Recognizing that respondent burden or questionnaire length may contribute to lower response rates, in 2003, following several months of data collection under the standard data collection protocol, the NSRCG offered its nonrespondents monetary incentives about two months before the end of data collection. In conjunction with the incentive offer, the NSRCG also offered persistent nonrespondents an opportunity to complete a much-abbreviated interview consisting of a few critical items. The late respondents who completed the interviews as a result of the incentive and critical-items-only questionnaire offers may provide some insight into the issue of nonresponse bias and the likelihood that such interviewees would have remained survey nonrespondents if these refusal conversion efforts had not been made.

    In this paper, we define "reluctant respondents" as those who responded to the survey only after extra efforts were made beyond the ones initially planned in the standard data collection protocol. Specifically, reluctant respondents in the 2003 NSRCG are those who responded to the regular or shortened questionnaire following the incentive offer. Our conjecture was that the behavior of the reluctant respondents would be more like that of nonrespondents than of respondents to the surveys. This paper describes an investigation of reluctant respondents and the extent to which they are different from regular respondents. We compare different response groups on several key survey estimates. This comparison will expand our understanding of nonresponse bias in the NSRCG, and of the characteristics of nonrespondents themselves, thus providing a basis for changes in the NSRCG weighting system or estimation procedures in the future.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010996
    Description:

    In recent years, the use of paradata has become increasingly important to the management of collection activities at Statistics Canada. Particular attention has been paid to social surveys conducted over the phone, such as the Survey of Labour and Income Dynamics (SLID). For recent SLID data collections, the number of call attempts was capped at 40 calls. Investigations of the SLID Blaise Transaction History (BTH) files were undertaken to assess the impact of this cap. The purpose of the first study was to inform decisions on the capping of call attempts; the second study focused on the nature of nonresponse given the limit of 40 attempts.

    The use of paradata as auxiliary information for studying and accounting for survey nonresponse was also examined. Nonresponse adjustment models using different paradata variables gathered at the collection stage were compared to the current models based on available auxiliary information from the Labour Force Survey.

    Release date: 2009-12-03

  • Articles and reports: 12-001-X200800110607
    Description:

    Respondent incentives are increasingly used as a measure of combating falling response rates and resulting risks of nonresponse bias. Nonresponse in panel surveys is particularly problematic, since even low wave-on-wave nonresponse rates can lead to substantial cumulative losses; if nonresponse is differential, this may lead to increasing bias across waves. Although the effects of incentives have been studied extensively in cross-sectional contexts, little is known about cumulative effects across waves of a panel. We provide new evidence about the effects of continued incentive payments on attrition, bias and item nonresponse, using data from a large scale, multi-wave, mixed mode incentive experiment on a UK government panel survey of young people. In this study, incentives significantly reduced attrition, far outweighing negative effects on item response rates in terms of the amount of information collected by the survey per issued case. Incentives had proportionate effects on retention rates across a range of respondent characteristics and as a result did not reduce attrition bias in terms of those characteristics. The effects of incentives on retention rates were larger for unconditional than conditional incentives and larger in postal than telephone mode. Across waves, the effects on attrition decreased somewhat, although the effects on item nonresponse and the lack of effect on bias remained constant. The effects of incentives at later waves appeared to be independent of incentive treatments and mode of data collection at earlier waves.

    Release date: 2008-06-26

  • Technical products: 11-522-X200600110453
    Description:

    National Food and Nutrition Surveys provide critical information to support the understanding of the complex relationship between health and diet in the population. Many of these surveys use 24-hour recall methodology, which collects, at a detailed level, all food and beverages consumed over a day. Often it is the longer-term intake of foods and nutrients that is of interest, and a number of techniques are available that allow estimation of population usual intakes. These techniques require that at least one repeat 24-hour recall be collected from at least a subset of the population in order to estimate the intra-individual variability of intakes. Deciding on the number of individuals required to provide a repeat is an important step in the survey design: too few repeat individuals compromises the ability to estimate usual intakes, while large numbers of repeats are costly and pose an added burden on respondents. This paper looks at the statistical issues related to the number of repeat individuals, assessing the impact of the number of repeaters on the stability of, and uncertainty in, the estimate of intra-individual variability, and provides guidance on the required number of repeat responders.

    Release date: 2008-03-17
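
    The trade-off described above can be sketched with the usual measurement-error decomposition of repeat recalls; this is generic reasoning, not the paper's own derivation.

    ```latex
    % Single-day intake for person i on repeat day j
    \[
      y_{ij} = \mu_i + \epsilon_{ij}, \qquad
      \operatorname{Var}(y_{ij}) = \sigma_b^2 + \sigma_w^2.
    \]
    % Usual intake is \mu_i; averaging k repeat recalls per person gives
    \[
      \operatorname{Var}(\bar{y}_{i\cdot}) = \sigma_b^2 + \frac{\sigma_w^2}{k}.
    \]
    ```

    Estimating the within-person component requires at least some respondents with two or more recalls, and the precision of that estimate improves with the number of repeaters, which is the design question the paper addresses.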

  • Technical products: 11-522-X200600110413
    Description:

    The National Health and Nutrition Examination Survey (NHANES) has been conducted by the National Center for Health Statistics for over forty years. The survey collects information on the health and nutritional status of the United States population using in-person interviews and standardized physical examinations conducted in mobile examination centers. During the course of these forty years, numerous lessons have been learned about the conduct of a survey using direct physical measures. Examples of these "lessons learned" are described and provide a guide for other organizations and countries as they plan similar surveys.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110449
    Description:

    Traditionally, administrative hospital discharge databases have mainly been used for administrative purposes. Recently, health services researchers and population health researchers have been using these databases for a wide variety of studies, in particular on health care outcomes. Tools, such as comorbidity indexes, have been developed to facilitate such analyses. Every time the coding system for diagnoses and procedures is revised or a new one is developed, these comorbidity indexes need to be updated. These updates are important for maintaining consistency when trends are examined over time.

    Release date: 2008-03-17

Analysis (19)

  • Articles and reports: 12-001-X20070019850
    Description:

    Auxiliary information is often used to improve the precision of survey estimators of finite population means and totals through ratio or linear regression estimation techniques. Resulting estimators have good theoretical and practical properties, including invariance, calibration and design consistency. However, it is not always clear that ratio or linear models are good approximations to the true relationship between the auxiliary variables and the variable of interest in the survey, resulting in efficiency loss when the model is not appropriate. In this article, we explain how regression estimation can be extended to incorporate semiparametric regression models, in both simple and more complicated designs. While maintaining the good theoretical and practical properties of the linear models, semiparametric models are better able to capture complicated relationships between variables. This often results in substantial gains in efficiency. The applicability of the approach for complex designs using multiple types of auxiliary variables will be illustrated by estimating several acidification-related characteristics for a survey of lakes in the Northeastern US.

    Release date: 2007-06-28
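
    As a reference point for the extension described above, the generic model-assisted (difference) estimator of a population total is sketched below; with a linear working model it reduces to the familiar regression estimator, and the article's contribution is to let the working model be semiparametric. The notation is a standard sketch, not quoted from the article.

    ```latex
    % Model-assisted estimator of the total of y with fitted working model \hat{m}
    \[
      \hat{t}_y = \sum_{k \in U} \hat{m}(\mathbf{x}_k)
                + \sum_{k \in s} \frac{y_k - \hat{m}(\mathbf{x}_k)}{\pi_k}.
    \]
    % U: population, s: sample, \pi_k: inclusion probability of unit k.
    ```

    A flexible working model that tracks the true relationship shrinks the residuals in the second sum, which is where the efficiency gains come from.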

  • Articles and reports: 12-001-X20060029555
    Description:

    Researchers and policy makers often use data from nationally representative probability sample surveys. The number of topics covered by such surveys, and hence the amount of interviewing time involved, have typically increased over the years, resulting in increased costs and respondent burden. A potential solution to this problem is to carefully form subsets of the items in a survey and administer one such subset to each respondent. Designs of this type are called "split-questionnaire" designs or "matrix sampling" designs. The administration of only a subset of the survey items to each respondent in a matrix sampling design creates what can be considered missing data. Multiple imputation (Rubin 1987), a general-purpose approach developed for handling data with missing values, is appealing for the analysis of data from a matrix sample, because once the multiple imputations are created, data analysts can apply standard methods for analyzing complete data from a sample survey. This paper develops and evaluates a method for creating matrix sampling forms, each form containing a subset of items to be administered to randomly selected respondents. The method can be applied in complex settings, including situations in which skip patterns are present. Forms are created in such a way that each form includes items that are predictive of the excluded items, so that subsequent analyses based on multiple imputation can recover some of the information about the excluded items that would have been collected had there been no matrix sampling. The matrix sampling and multiple-imputation methods are evaluated using data from the National Health and Nutrition Examination Survey, one of many nationally representative probability sample surveys conducted by the National Center for Health Statistics, Centers for Disease Control and Prevention. The study demonstrates the feasibility of the approach applied to a major national health survey with complex structure, and it provides practical advice about appropriate items to include in matrix sampling designs in future surveys.

    Release date: 2006-12-21
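
    Once the multiple imputations are created, analysts typically combine them with Rubin's (1987) rules; the standard rules for a scalar estimand are shown below as a reminder, not as the paper's notation.

    ```latex
    % m completed datasets give estimates \hat{Q}^{(\ell)} with variance estimates \hat{U}^{(\ell)}
    \[
      \bar{Q} = \frac{1}{m} \sum_{\ell=1}^{m} \hat{Q}^{(\ell)}, \qquad
      \bar{U} = \frac{1}{m} \sum_{\ell=1}^{m} \hat{U}^{(\ell)}, \qquad
      B = \frac{1}{m-1} \sum_{\ell=1}^{m} \bigl(\hat{Q}^{(\ell)} - \bar{Q}\bigr)^2.
    \]
    % Total variance attached to the combined estimate
    \[
      T = \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B.
    \]
    ```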

  • Articles and reports: 12-001-X20050029044
    Description:

    Complete data methods for estimating the variances of survey estimates are biased when some data are imputed. This paper uses simulation to compare the performance of the model-assisted, the adjusted jackknife, and the multiple imputation methods for estimating the variance of a total when missing items have been imputed using hot deck imputation. The simulation studies the properties of the variance estimates for imputed estimates of totals for the full population and for domains from a single-stage disproportionate stratified sample design when underlying assumptions, such as unbiasedness of the point estimate and item responses being randomly missing within hot deck cells, do not hold. The variance estimators for full population estimates produce confidence intervals with coverage rates near the nominal level even under modest departures from the assumptions, but this finding does not apply for the domain estimates. Coverage is most sensitive to bias in the point estimates. As the simulation demonstrates, even if an imputation method gives almost unbiased estimates for the full population, estimates for domains may be very biased.

    Release date: 2006-02-17
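
    To make the setting concrete, the sketch below shows a minimal hot deck imputation within cells, roughly the kind of imputation whose variance implications the paper studies; the data frame, column names and cell definition are hypothetical, and this is not the simulation code used in the article.

    ```python
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)

    def hot_deck_impute(df, value_col, cell_col):
        """Fill missing values of value_col by drawing a random observed donor
        value from the same imputation cell (cell_col)."""
        df = df.copy()
        for _, idx in df.groupby(cell_col).groups.items():
            block = df.loc[idx, value_col]
            donors = block.dropna().to_numpy()
            missing = block.index[block.isna().to_numpy()]
            if len(donors) > 0 and len(missing) > 0:
                df.loc[missing, value_col] = rng.choice(donors, size=len(missing))
        return df

    # Tiny synthetic example: two imputation cells with some item nonresponse.
    data = pd.DataFrame({
        "cell": ["A"] * 5 + ["B"] * 5,
        "y":    [10, 12, np.nan, 11, np.nan, 30, np.nan, 28, 31, 29],
    })
    print(hot_deck_impute(data, "y", "cell"))
    ```

    A complete-data variance formula applied to the imputed column treats the drawn donor values as real observations, which is exactly the source of the bias that the three variance estimation methods above try to correct.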

  • Articles and reports: 12-001-X20030016611
    Description:

    Optimal and approximately optimal fixed-cost Bayesian sampling designs are considered for simultaneous estimation in independent homogeneous Poisson processes. General allocation formulae are developed for a basic Poisson-Gamma model and these are compared with more traditional allocation methods. Techniques for finding representative gamma priors under more general hierarchical models are also discussed. The techniques show that, in many practical situations, these gamma priors provide reasonable approximations to the hierarchical prior and Bayes risk. The methods developed are general enough to apply to a wide variety of models and are not limited to Poisson processes.

    Release date: 2003-07-31
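
    The conjugacy underlying a basic Poisson-Gamma model is worth recalling, since it is part of what makes closed-form allocation formulae tractable; the parameterization below (shape a, rate b) is an assumption for illustration, not the paper's notation.

    ```latex
    % Poisson counts with exposure t and a Gamma prior on the rate \lambda
    \[
      y \mid \lambda \sim \mathrm{Poisson}(\lambda t), \qquad
      \lambda \sim \mathrm{Gamma}(a, b).
    \]
    % Conjugate posterior and posterior mean
    \[
      \lambda \mid y \sim \mathrm{Gamma}(a + y,\; b + t), \qquad
      E[\lambda \mid y] = \frac{a + y}{b + t}.
    \]
    ```

    Fixed-cost allocation then amounts to dividing a budget of exposure (or observations) across the independent processes so as to minimize the total posterior risk.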

  • Articles and reports: 12-001-X19970013107
    Description:

    Often one of the key objectives of multi-purpose demographic surveys in the U.S. is to produce estimates for small domains of the population such as race, ethnicity, and income. Geographic-based oversampling is one of the techniques often considered for improving the reliability of the small domain statistics using block or block group information from the Bureau of the Census to identify areas where the small domains are concentrated. This paper reviews the issues involved in oversampling geographical areas in conjunction with household screening to improve the precision of small domain estimates. The results from an empirical evaluation of the variance reduction from geographic-based oversampling are given along with an assessment of the robustness of the sampling efficiency over time as information for stratification becomes out of date. The simultaneous oversampling of several small domains is also discussed.

    Release date: 1997-08-18

  • Articles and reports: 12-001-X19960022979
    Description:

    This paper empirically compares three estimation methods - regression, restricted regression, and principal person - used in a household survey of consumer expenditures. The three methods are applied to post-stratification which is important in many household surveys to adjust for under-coverage of the target population. Post-stratum population counts are typically available from an external census for numbers of persons but not for numbers of households. If household estimates are needed, a single weight must be assigned to each household while using the person counts for post-stratification. This is easily accomplished with regression estimators of totals or means by using person counts in each household's auxiliary data. Restricted regression estimation refines the weights by controlling extremes and can produce estimators with lower variance than Horvitz-Thompson estimators while still adhering to the population controls. The regression methods also allow controls to be used for both person-level counts and quantitative auxiliaries. With the principal person method, persons are classified into post-strata and person weights are ratio adjusted to achieve population control totals. This leads to each person in a household potentially having a different weight. The weight associated with the "principal person" is then selected as the household weight. We will compare estimated means from the three methods and their estimated standard errors for a number of expenditures from the Consumer Expenditure survey sponsored by the U.S. Bureau of Labor Statistics.

    Release date: 1997-01-30
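
    The post-stratification step common to all three methods above amounts to the standard ratio adjustment of design weights to external person counts; this generic form is given for orientation and is not taken from the paper.

    ```latex
    % Post-stratified weight for person k in post-stratum g(k)
    \[
      w_k = d_k \, \frac{N_{g(k)}}{\hat{N}_{g(k)}}, \qquad
      \hat{N}_g = \sum_{k \in s \cap g} d_k.
    \]
    % d_k: design weight; N_g: external (census) person count for post-stratum g.
    ```

    The difficulty the paper addresses is that these controls are person-level while a single weight is wanted for each household, which is where the regression, restricted regression and principal person approaches diverge.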

  • Articles and reports: 12-001-X19960022982
    Description:

    In work with sample surveys, we often use estimators of the variance components associated with sampling within and between primary sample units. For these applications, it can be important to have some indication of whether the variance component estimators are stable, i.e., have relatively low variance. This paper discusses several data-based measures of the stability of design-based variance component estimators and related quantities. The development emphasizes methods that can be applied to surveys with moderate or large numbers of strata and small numbers of primary sample units per stratum. We direct principal attention toward the design variance of a within-PSU variance estimator, and two related degrees-of-freedom terms. A simulation-based method allows one to assess whether an observed stability measure is consistent with standard assumptions regarding variance estimator stability. We also develop two sets of stability measures for design-based estimators of between-PSU variance components and the ratio of the overall variance to the within-PSU variance. The proposed methods are applied to interview and examination data from the U.S. Third National Health and Nutrition Examination Survey (NHANES III). These results indicate that the true stability properties may vary substantially across variables. In addition, for some variables, within-PSU variance estimators appear to be considerably less stable than one would anticipate from a simple count of secondary units within each stratum.

    Release date: 1997-01-30

  • Articles and reports: 12-001-X199500114416
    Description:

    Stanley Warner was widely known for the creation of the randomized response technique for asking sensitive questions in surveys. Over almost two decades he also formulated and developed statistical methodology for another problem, that of deriving balanced information in advocacy settings so that both positions regarding a policy issue can be fairly and adequately represented. We review this work, including two survey applications implemented by Warner in which he applied the methodology, and we set the ideas into the context of current methodological thinking.

    Release date: 1995-06-15
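
    For readers unfamiliar with it, Warner's original randomized response estimator works as follows; this is the textbook version, included only as background to the review described above.

    ```latex
    % Each respondent answers the sensitive statement with probability p and
    % its complement with probability 1-p (p \neq 1/2), so a "yes" occurs with probability
    \[
      \lambda = p\pi + (1 - p)(1 - \pi),
    \]
    % giving an unbiased estimator of the sensitive proportion \pi and, under
    % binomial sampling of n respondents, its variance
    \[
      \hat{\pi} = \frac{\hat{\lambda} - (1 - p)}{2p - 1}, \qquad
      \operatorname{Var}(\hat{\pi}) = \frac{\lambda(1 - \lambda)}{n\,(2p - 1)^2},
    \]
    ```

    where the estimated lambda is the observed proportion of "yes" answers among the n respondents.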

  • Articles and reports: 12-001-X199000114553
    Description:

    The National Farm Survey is a sample survey which produces annual estimates on a variety of subjects related to agriculture in Canada. The 1988 survey was conducted using a new sample design. This design involved multiple sampling frames and multivariate sampling techniques different from those of the previous design. This article first describes the strategy and methods used to develop the new sample design, then gives details on factors affecting the precision of the estimates. Finally, the performance of the new design is assessed using the 1988 survey results.

    Release date: 1990-06-15

  • Articles and reports: 12-001-X198900114573
    Description:

    The Census Bureau makes extensive use of administrative records information in its various economic programs. Although the volume of records processed annually is vast, even larger numbers will be received during the census years. Census Bureau mainframe computers perform quality control (QC) tabulations on the data; however, since such a large number of QC tables are needed and resources for programming are limited and costly, a comprehensive mainframe QC system is difficult to attain. Add to this the sensitive nature of the data and the potentially very negative ramifications from erroneous data, and the need becomes quite apparent for a sophisticated quality assurance system at the microcomputer level. Such a system is being developed by the Economic Surveys Division and will be in place for the 1987 administrative records data files. The automated quality assurance system integrates micro and mainframe computer technology. Administrative records data are received weekly and processed initially through mainframe QC programs. The mainframe output is transferred to a microcomputer and formatted specifically for importation to a spreadsheet program. Systematic quality verification occurs within the spreadsheet structure, as data review, error detection, and report generation are accomplished automatically. As a result of shifting processes from mainframe to microcomputer environments, the system eases the burden on the programming staff, increases the flexibility of the analytical staff, reduces processing costs on the mainframe, and provides a comprehensive quality assurance component for administrative records.

    Release date: 1989-06-15

Reference (32) (25 of 32 results)

  • Technical products: 11-522-X201700014720
    Description:

    This paper is intended to give a brief overview of Statistics Canada’s involvement with open data. It will first discuss how the principles of open data are being adopted in the agency’s ongoing dissemination practices. It will then discuss the agency’s involvement with the whole of government open data initiative. This involvement is twofold: Statistics Canada is the major data contributor to the Government of Canada Open Data portal, but also plays an important behind the scenes role as the service provider responsible for developing and maintaining the Open Data portal (which is now part of the wider Open Government portal.)

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014734
    Description:

    Data protection and privacy are key challenges that need to be tackled with high priority in order to enable the use of Big Data in the production of Official Statistics. This was emphasized in 2013 by the Directors of National Statistical Insitutes (NSIs) of the European Statistical System Committee (ESSC) in the Scheveningen Memorandum. The ESSC requested Eurostat and the NSIs to elaborate an action plan with a roadmap for following up the implementation of the Memorandum. At the Riga meeting on September 26, 2014, the ESSC endorsed the Big Data Action Plan and Roadmap 1.0 (BDAR) presented by the Eurostat Task Force on Big Data (TFBD) and agreed to integrate it into the ESS Vision 2020 portfolio. Eurostat also collaborates in this field with external partners such as the United Nations Economic Commission for Europe (UNECE). The big data project of the UNECE High-Level Group is an international project on the role of big data in the modernization of statistical production. It comprised four ‘task teams’ addressing different aspects of Big Data issues relevant for official statistics: Privacy, Partnerships, Sandbox, and Quality. The Privacy Task Team finished its work in 2014 and gave an overview of the existing tools for risk management regarding privacy issues, described how risk of identification relates to Big Data characteristics and drafted recommendations for National Statistical Offices (NSOs). It mainly concluded that extensions to existing frameworks, including use of new technologies were needed in order to deal with privacy risks related to the use of Big Data. The BDAR builds on the work achieved by the UNECE task teams. Specifically, it recognizes that a number of big data sources contain sensitive information, that their use for official statistics may induce negative perceptions with the general public and other stakeholders and that this risk should be mitigated in the short to medium term. It proposes to launch multiple actions like e.g., an adequate review on ethical principles governing the roles and activities of the NSIs and a strong communication strategy. The paper presents the different actions undertaken within the ESS and in collaboration with UNECE, as well as potential technical and legal solutions to be put in place in order to address the data protection and privacy risks in the use of Big Data for Official Statistics.

    Release date: 2016-03-24

  • Technical products: 11-522-X201300014267
    Description:

    Statistics Sweden has, like many other National Statistical Institutes (NSIs), a long history of working with quality. More recently, the agency decided to start using a number of frameworks to address organizational, process and product quality. It is important to consider all three levels, since we know that the way we do things, e.g., when asking questions, affects product quality, and therefore process quality is an important part of the quality concept. Further, organizational quality, i.e., systematically managing aspects such as training of staff and leadership, is fundamental for achieving process quality. Statistics Sweden uses EFQM (European Foundation for Quality Management) as a framework for organizational quality and ISO 20252 for market, opinion and social research as a standard for process quality. In April 2014, Statistics Sweden became the first National Statistical Institute to be certified according to ISO 20252. One challenge that Statistics Sweden faced in 2011 was to systematically measure and monitor changes in product quality and to clearly present them to stakeholders. Together with external consultants Paul Biemer and Dennis Trewin, Statistics Sweden developed a tool for this called ASPIRE (A System for Product Improvement, Review and Evaluation). To ensure that quality is maintained and improved, Statistics Sweden has also built an organization for quality comprising a quality manager, quality coaches, and internal and external quality auditors. In this paper I will present the components of Statistics Sweden’s quality management system and some of the challenges we have faced.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014287
    Description:

    The purpose of the EpiNano program is to monitor workers who may be exposed to intentionally produced nanomaterials in France. This program is based both on industrial hygiene data collected in businesses for the purpose of gauging exposure to nanomaterials at workstations and on data from self-administered questionnaires completed by participants. These data will subsequently be matched with health data from national medical-administrative databases (passive monitoring of health events). Follow-up questionnaires will be sent regularly to participants. This paper describes the arrangements for optimizing data collection and matching.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014286
    Description:

    The Étude Longitudinale Française depuis l’Enfance (ELFE) [French longitudinal study from childhood on], which began in 2011, involves over 18,300 infants whose parents agreed to participate when they were in the maternity hospital. This cohort survey, which will track the children from birth to adulthood, covers the many aspects of their lives from the perspective of social science, health and environmental health. In randomly selected maternity hospitals, all infants in the target population born on one of 25 days distributed across the four seasons were chosen. This sample is the outcome of a non-standard sampling scheme that we call product sampling. In this survey, it takes the form of the cross-tabulation between two independent samples: a sampling of maternity hospitals and a sampling of days. While it is easy to imagine a cluster effect due to the sampling of maternity hospitals, one can also imagine a cluster effect due to the sampling of days. The scheme’s time dimension therefore cannot be ignored if the desired estimates are subject to daily or seasonal variation. While this non-standard scheme can be viewed as a particular kind of two-phase design, it needs to be defined within a more specific framework. Following a comparison of the product scheme with a conventional two-stage design, we propose variance estimators specially formulated for this sampling scheme. Our ideas are illustrated with a simulation study.

    Release date: 2014-10-31
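
    To illustrate the product sampling idea described above (a sample of hospitals crossed with an independent sample of days), here is a small simulation sketch; the population figures, hospital sample size and the Horvitz-Thompson-style estimator are illustrative assumptions, not the paper’s estimators.

    ```python
    # Toy simulation of product sampling: an independent sample of hospitals is
    # crossed with an independent sample of days, and every birth in a sampled
    # hospital on a sampled day is observed.
    import random

    random.seed(1)
    N_HOSPITALS, N_DAYS = 500, 365
    n_hospitals, n_days = 50, 25  # 25 days as in ELFE; the hospital counts are invented

    # Invented population: expected births per hospital-day, with hospital and day effects.
    hospital_effect = {h: random.uniform(0.5, 1.5) for h in range(N_HOSPITALS)}
    day_effect = {d: random.uniform(0.8, 1.2) for d in range(N_DAYS)}
    births = {(h, d): hospital_effect[h] * day_effect[d] * 10
              for h in range(N_HOSPITALS) for d in range(N_DAYS)}

    sampled_hospitals = random.sample(range(N_HOSPITALS), n_hospitals)
    sampled_days = random.sample(range(N_DAYS), n_days)

    # Horvitz-Thompson-type estimate of total births: the inclusion probability of a
    # hospital-day cell is the product of the two independent inclusion probabilities.
    pi_cell = (n_hospitals / N_HOSPITALS) * (n_days / N_DAYS)
    estimate = sum(births[(h, d)] for h in sampled_hospitals for d in sampled_days) / pi_cell
    true_total = sum(births.values())
    print(f"estimated total births: {estimate:,.0f}  (true: {true_total:,.0f})")
    ```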

  • Technical products: 11-522-X201300014260
    Description:

    The Survey of Employment, Payrolls and Hours (SEPH) produces monthly estimates and determines the month-to-month changes for variables such as employment, earnings and hours at detailed industrial levels for Canada, the provinces and territories. In order to improve the efficiency of collection activities for this survey, an electronic questionnaire (EQ) was introduced in the fall of 2012. Given the timeframe allowed for this transition as well as the production calendar of the survey, a conversion strategy was developed for the integration of this new mode. The goal of the strategy was to ensure a good adaptation of the collection environment and also to allow the implementation of a plan of analysis that would evaluate the impact of this change on the results of the survey. This paper will give an overview of the conversion strategy, the different adjustments that were made during the transition period and the results of various evaluations that were conducted. For example, the impact of the integration of the EQ on the collection process, the response rate and the follow-up rate will be presented. In addition, the effect that this new collection mode has on the survey estimates will also be discussed. More specifically, the results of a randomized experiment that was conducted in order to determine the presence of a mode effect will be presented.

    Release date: 2014-10-31
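
    As a rough illustration of testing for a mode effect after random assignment, here is a toy two-sample comparison on simulated data; the real SEPH analysis is design-based and considerably more elaborate.

    ```python
    # Toy check for a mode effect: compare mean reported employment between units
    # randomly assigned to the electronic questionnaire (EQ) and to the old mode.
    # All data are simulated.
    import random
    from math import sqrt
    from statistics import mean, stdev

    random.seed(42)
    eq = [random.gauss(100, 15) for _ in range(400)]      # EQ responses (simulated)
    paper = [random.gauss(100, 15) for _ in range(400)]   # old-mode responses (simulated)

    diff = mean(eq) - mean(paper)
    se = sqrt(stdev(eq) ** 2 / len(eq) + stdev(paper) ** 2 / len(paper))
    print(f"mode difference: {diff:.2f}, z = {diff / se:.2f}")  # |z| > 2 suggests a mode effect
    ```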

  • Technical products: 11-522-X201300014255
    Description:

    The Brazilian Network Information Center (NIC.br) has designed and carried out a pilot project to collect data from the Web in order to produce statistics about the characteristics of webpages. Studies of the characteristics and dimensions of the web require collecting and analyzing information from a dynamic and complex environment. The core idea was to collect data automatically from a sample of webpages using software known as a web crawler. The motivation for this paper is to disseminate the methods and results of this study as well as to show current developments related to sampling techniques in a dynamic environment.

    Release date: 2014-10-31
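
    A minimal sketch of sampling webpages and tabulating one simple characteristic (page size); the seed URLs stand in for a real sampling frame, and the requests library is assumed to be available. A full crawler would also follow links and respect robots.txt.

    ```python
    # Fetch a simple random sample of pages from a (stand-in) frame of URLs and
    # record a basic characteristic of each page.
    import random
    import requests

    SEED_URLS = [  # hypothetical frame; the real study built its own frame of webpages
        "https://example.org/",
        "https://example.com/",
        "https://www.iana.org/domains/reserved",
    ]

    random.seed(0)
    sample = random.sample(SEED_URLS, k=2)  # simple random sample of pages

    for url in sample:
        try:
            response = requests.get(url, timeout=10)
            print(url, response.status_code, len(response.content), "bytes")
        except requests.RequestException as err:
            print(url, "failed:", err)
    ```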

  • Technical products: 11-522-X200800010954
    Description:

    Over the past year, Statistics Canada has been developing and testing a new way to monitor the performance of interviewers conducting computer-assisted personal interviews (CAPI). A formal process already exists for monitoring centralized telephone interviews. Monitors listen to telephone interviews as they take place to assess the interviewer's performance using pre-defined criteria and provide feedback to the interviewer on what was well done and what needs improvement. For the CAPI program, we have developed and are testing a pilot approach whereby interviews are digitally recorded and later a monitor listens to these recordings to assess the field interviewer's performance and provide feedback in order to help improve the quality of the data. In this paper, we will present an overview of the CAPI monitoring project at Statistics Canada by describing the CAPI monitoring methodology and the plans for implementation.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010946
    Description:

    In the mid 1990s the first question testing unit was set-up in the UK Office for National Statistics (ONS). The key objective of the unit was to develop and test the questions and questionnaire for the 2001 Census. Since the establishment of this unit the area has been expanded into a Data Collection Methodology (DCM) Centre of Expertise which now sits in the Methodology Directorate. The DCM centre has three branches which support DCM work for social surveys, business surveys, the Census and external organisations.

    In the past ten years DCM has achieved a variety of things. For example, it has introduced survey methodology involvement in the development and testing of business survey questions and questionnaires; introduced a mixed-method approach to the development of questions and questionnaires; developed and implemented standards (e.g., for the 2011 Census questionnaire and showcards); and developed and delivered DCM training events.

    This paper will provide an overview of data collection methodology at the ONS from the perspective of achievements and challenges. It will cover areas such as methods, staff (e.g. recruitment, development and field security), and integration with the survey process.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011006
    Description:

    The Office for National Statistics (ONS) has an obligation to measure and annually report on the burden that it places on businesses participating in its surveys. There are also targets for reduction of costs to businesses complying with government regulation as part of the 2005 Administrative Burdens Reduction Project (ABRP) coordinated by the Better Regulation Executive (BRE).

    Respondent burden is measured by looking at the economic costs to businesses. Over time, the methodology for measuring this economic cost has changed; the most recent step has been the development and piloting of a Standard Cost Model (SCM) approach.

    The SCM is commonly used in Europe and is focused on measuring objective administrative burdens for all government requests for information (e.g., tax returns, VAT) as well as survey participation. This method was therefore not developed specifically to measure statistical response burden. The SCM methodology is activity-based, meaning that the costs and time taken to fulfil requirements are broken down by activity.

    The SCM approach generally collects data using face-to-face interviews. The approach is therefore labour-intensive from both a collection and an analysis perspective, but it provides in-depth information. The approach developed and piloted at ONS uses paper self-completion questionnaires.

    The objective of this paper is to provide an overview of respondent burden reporting and targets, and to review the different methodologies that ONS has used to measure respondent burden from the perspectives of sampling, data collection, analysis and usability.

    Release date: 2009-12-03
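
    The activity-based logic of the SCM can be illustrated with a toy calculation (time per activity × hourly labour cost × frequency per year × number of businesses); all figures below are invented and are not ONS estimates.

    ```python
    # Toy Standard Cost Model calculation of annual compliance cost.
    activities = [
        # (activity, minutes per occurrence, occurrences per year)
        ("read guidance and gather records", 30, 4),
        ("complete questionnaire", 45, 4),
        ("check and return", 10, 4),
    ]
    hourly_wage = 25.0     # invented average hourly labour cost
    n_businesses = 10_000  # invented number of reporting businesses

    total = sum(minutes / 60 * hourly_wage * freq
                for _, minutes, freq in activities) * n_businesses
    print(f"estimated annual compliance cost: {total:,.0f} (currency units)")
    ```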

  • Technical products: 11-522-X200800010995
    Description:

    In 1992, a paper entitled "The Optimum Time at which to Conduct Survey Interviews" sought to illustrate the economic benefits to market research organisations of structuring the calling patterns of interviewers in household surveys. The findings were based on the Welsh Inter Censal Survey in 1986. This paper brings additional information on interviewer calling patterns from similar surveys in 1997 and 2006 to ascertain whether those patterns have changed. The results also examine the importance of having a survey response that is representative of the population, and how efficient calling strategies can help achieve this.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010953
    Description:

    As survey researchers attempt to maintain traditionally high response rates, reluctant respondents have led to increasing data collection costs. This respondent reluctance may be related to the amount of time it takes to complete an interview in large-scale, multi-purpose surveys, such as the National Survey of Recent College Graduates (NSRCG). Recognizing that respondent burden or questionnaire length may contribute to lower response rates, in 2003, following several months of data collection under the standard data collection protocol, the NSRCG offered its nonrespondents monetary incentives about two months before the end of the data collection. In conjunction with the incentive offer, the NSRCG also offered persistent nonrespondents an opportunity to complete a much-abbreviated interview consisting of a few critical items. The late respondents who completed the interviews as a result of the incentive and critical-items-only questionnaire offers may provide some insight into the issue of nonresponse bias and the likelihood that such interviewees would have remained survey nonrespondents if these refusal conversion efforts had not been made.

    In this paper, we define "reluctant respondents" as those who responded to the survey only after extra efforts were made beyond the ones initially planned in the standard data collection protocol. Specifically, reluctant respondents in the 2003 NSRCG are those who responded to the regular or shortened questionnaire following the incentive offer. Our conjecture was that the behavior of the reluctant respondents would be more like that of nonrespondents than of respondents to the surveys. This paper describes an investigation of reluctant respondents and the extent to which they are different from regular respondents. We compare different response groups on several key survey estimates. This comparison will expand our understanding of nonresponse bias in the NSRCG, and of the characteristics of nonrespondents themselves, thus providing a basis for changes in the NSRCG weighting system or estimation procedures in the future.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010996
    Description:

    In recent years, the use of paradata has become increasingly important to the management of collection activities at Statistics Canada. Particular attention has been paid to social surveys conducted over the phone, such as the Survey of Labour and Income Dynamics (SLID). For recent SLID data collections, the number of call attempts was capped at 40 calls. Investigations of the SLID Blaise Transaction History (BTH) files were undertaken to assess the impact of the cap on calls. The purpose of the first study was to inform decisions on the capping of call attempts; the second study focused on the nature of nonresponse given the limit of 40 attempts.

    The use of paradata as auxiliary information for studying and accounting for survey nonresponse was also examined. Nonresponse adjustment models using different paradata variables gathered at the collection stage were compared to the current models based on available auxiliary information from the Labour Force Survey.

    Release date: 2009-12-03
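
    The general idea of a nonresponse adjustment model built on paradata can be sketched as follows: fit a response-propensity model on call-history variables and inflate respondent weights by the inverse predicted propensity. The data, variables and model below are simulated assumptions, not the SLID models.

    ```python
    # Response-propensity adjustment sketch using simulated paradata.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 2_000
    calls = rng.integers(1, 41, size=n)              # number of call attempts (paradata)
    evening_calls = rng.integers(0, 10, size=n)      # attempts made in the evening
    responded = (rng.random(n) < 1 / (1 + np.exp(-(1.5 - 0.05 * calls)))).astype(int)

    X = np.column_stack([calls, evening_calls])
    model = LogisticRegression().fit(X, responded)
    propensity = model.predict_proba(X)[:, 1]

    base_weight = np.full(n, 50.0)                   # design weights (invented)
    adjusted = np.where(responded == 1, base_weight / propensity, 0.0)
    print("sum of adjusted respondent weights:", round(adjusted.sum()))
    ```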

  • Technical products: 11-522-X200600110453
    Description:

    National Food and Nutrition Surveys provide critical information to support the understanding of the complex relationship between health and diet in the population. Many of these surveys use a 24-hour recall methodology, which collects at a detailed level all foods and beverages consumed over a day. Often it is the longer-term intake of foods and nutrients that is of interest, and a number of techniques are available that allow estimation of population usual intakes. These techniques require that at least one repeat 24-hour recall be collected from at least a subset of the population in order to estimate the intra-individual variability of intakes. Deciding on the number of individuals required to provide a repeat is an important step in the survey design: too few repeat individuals compromise the ability to estimate usual intakes, but large numbers of repeats are costly and pose an added burden on respondents. This paper looks at the statistical issues related to the number of repeat individuals, assessing the impact of the number of repeaters on the stability and uncertainty of the estimate of intra-individual variability, and provides guidance on the required number of repeat responders.

    Release date: 2008-03-17
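
    The role of repeat recalls can be illustrated with a one-way variance decomposition that separates within-person (intra-individual) from between-person variability; the sketch below uses simulated intakes and is not the estimation method studied in the paper.

    ```python
    # Estimate intra-individual variance from repeat 24-hour recalls using a
    # one-way ANOVA decomposition. Intakes are simulated.
    import numpy as np

    rng = np.random.default_rng(2024)
    n_people, n_recalls = 300, 2
    usual = rng.normal(2000, 300, size=n_people)                  # usual daily intake (kcal)
    recalls = usual[:, None] + rng.normal(0, 500, size=(n_people, n_recalls))

    person_means = recalls.mean(axis=1)
    within_ms = ((recalls - person_means[:, None]) ** 2).sum() / (n_people * (n_recalls - 1))
    between_ms = n_recalls * ((person_means - person_means.mean()) ** 2).sum() / (n_people - 1)
    between_var = (between_ms - within_ms) / n_recalls            # between-person component

    print(f"within-person variance:  {within_ms:,.0f}")
    print(f"between-person variance: {between_var:,.0f}")
    ```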

  • Technical products: 11-522-X200600110413
    Description:

    The National Health and Nutrition Examination Survey (NHANES) has been conducted by the National Center for Health Statistics for over forty years. The survey collects information on the health and nutritional status of the United States population using in-person interviews and standardized physical examinations conducted in mobile examination centers. During the course of these forty years, numerous lessons have been learned about the conduct of a survey using direct physical measures. Examples of these "lessons learned" are described and provide a guide for other organizations and countries as they plan similar surveys.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110449
    Description:

    Traditionally, administrative hospital discharge databases have been used mainly for administrative purposes. Recently, health services researchers and population health researchers have been using these databases for a wide variety of studies, in particular on health care outcomes. Tools, such as comorbidity indexes, have been developed to facilitate this analysis. Every time the coding system for diagnoses and procedures is revised or a new one is developed, these comorbidity indexes need to be updated. These updates are important in maintaining consistency when trends are examined over time.

    Release date: 2008-03-17

  • Technical products: 11-522-X20050019482
    Description:

    Health studies linking the administrative hospital discharge database by person can be used to describe disease/procedure rates and trends by person, place and time; investigate outcomes of disease, procedures or risk factors; and illuminate hospital utilization. The power and challenges of this work will be illustrated with examples from work done at Statistics Canada.

    Release date: 2007-03-02

  • Technical products: 11-522-X20050019437
    Description:

    The explanatory information accompanying statistical data is called metadata, and its presence is essential for the correct understanding and interpretation of the data. This paper will report on the experience of Statistics Canada in the conceptualization, naming and organization of variables on which data are produced.

    Release date: 2007-03-02

  • Technical products: 11-522-X20040018738
    Description:

    This paper describes the efforts made during the 2001 UK Census to both maximise and measure the response in the hardest to count sectors of the population. It also discusses the research that will be undertaken for the 2011 UK Census.

    Release date: 2005-10-27

  • Technical products: 11-522-X20030017707
    Description:

    The paper discusses the structure and the quality measures that Eurostat uses to provide the European Union and the euro zone with seasonally adjusted economic series.

    Release date: 2005-01-26

  • Technical products: 11-522-X20030017710
    Description:

    This paper presents a probabilistic model that estimates the number of enterprises in different strata and applies logistic regression to estimate the probability that a company is active, based on a survey on existence.

    Release date: 2005-01-26
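
    A minimal sketch of the logistic regression step described above, estimating the probability that an enterprise is active from register covariates; the covariates and data are simulated assumptions, not the paper’s model.

    ```python
    # Model the probability that an enterprise is active with logistic regression
    # on (simulated) register covariates, then sum the probabilities by stratum.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(7)
    n = 1_000
    age_years = rng.integers(0, 30, size=n)          # age of the enterprise on the register
    has_vat_report = rng.integers(0, 2, size=n)      # filed a recent VAT report (0/1)
    logit = -1.0 + 0.05 * age_years + 2.0 * has_vat_report
    active = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # survey-on-existence outcome

    X = np.column_stack([age_years, has_vat_report])
    model = LogisticRegression().fit(X, active)
    p_active = model.predict_proba(X)[:, 1]
    print("estimated number of active enterprises in this stratum:", round(p_active.sum()))
    ```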

  • Technical products: 11-522-X20020016722
    Description:

    Colorectal cancer (CRC) is the second leading cause of cancer deaths in Canada. Randomized controlled trials (RCTs) have shown the efficacy of screening using faecal occult blood tests (FOBT). A comprehensive evaluation of the costs and consequences of CRC screening for the Canadian population is required before implementing such a program. This paper evaluates whether or not CRC screening is cost-effective. The results of these simulations will be provided to the Canadian National Committee on Colorectal Cancer Screening to help formulate national policy recommendations for CRC screening.

    Statistics Canada's Population Health Microsimulation Model was updated to incorporate a comprehensive CRC screening module based on Canadian data and RCT efficacy results. The module incorporated sensitivity and specificity of FOBT and colonoscopy, participation rates, incidence, staging, diagnostic and therapeutic options, disease progression, mortality and direct health care costs for different screening scenarios. Reproducing the mortality reduction observed in the Funen screening trial validated the model.

    Release date: 2004-09-13

  • Technical products: 11-522-X20020016732
    Description:

    Analysis of dose-response relationships has long been important in toxicology. More recently, this type of analysis has been employed to evaluate public education campaigns. The data that are collected in such evaluations are likely to come from standard household survey designs with all the usual complexities of multiple stages, stratification and variable selection probabilities. On a recent evaluation, a system was developed with the following features: categorization of doses into three or four levels, propensity scoring of dose selection and a new jack-knifed Jonckheere-Terpstra test for a monotone dose-response relationship. This system allows rapid production of tests for monotone dose-response relationships that are corrected both for sample design and for confounding. The focus of this paper will be the results of a Monte-Carlo simulation of the properties of the jack-knifed Jonckheere-Terpstra.

    Moreover, there is no experimental control over dosages, and the possibility of confounding variables must be considered. Standard regressions in WESVAR and SUDAAN could be used to determine whether there is a linear dose-response relationship while controlling for confounders, but such an approach obviously has low power to detect nonlinear but monotone dose-response relationships and is time-consuming to implement if there are a large number of possible outcomes of interest.

    Release date: 2004-09-13
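
    For reference, the basic (unweighted) Jonckheere-Terpstra statistic is the sum, over every ordered pair of dose groups, of Mann-Whitney-type counts; the sketch below computes it on simulated dose groups and omits the survey-weighted, jack-knifed version developed in the paper.

    ```python
    # Basic Jonckheere-Terpstra statistic for a monotone dose-response trend.
    import random

    random.seed(3)
    dose_groups = [
        [random.gauss(10 + shift, 3) for _ in range(30)]   # outcome rises with dose
        for shift in (0.0, 0.5, 1.0, 1.5)                  # four ordered dose levels
    ]

    jt = 0.0
    for i in range(len(dose_groups)):
        for j in range(i + 1, len(dose_groups)):
            for x in dose_groups[i]:
                for y in dose_groups[j]:
                    jt += 1.0 if y > x else (0.5 if y == x else 0.0)
    print("Jonckheere-Terpstra statistic:", jt)
    ```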

  • Technical products: 11-522-X20020016750
    Description:

    Analyses of data from social and economic surveys sometimes use generalized variance function models to approximate the design variance of point estimators of population means and proportions. Analysts may use the resulting standard error estimates to compute associated confidence intervals or test statistics for the means and proportions of interest. In comparison with design-based variance estimators computed directly from survey microdata, generalized variance function models have several potential advantages, as will be discussed in this paper, including operational simplicity; increased stability of standard errors; and, for cases involving public-use datasets, reduction of disclosure limitation problems arising from the public release of stratum and cluster indicators.

    These potential advantages, however, may be offset in part by several inferential issues. First, the properties of inferential statistics based on generalized variance functions (e.g., confidence interval coverage rates and widths) depend heavily on the relative empirical magnitudes of the components of variability associated, respectively, with:

    (a) the random selection of a subset of items used in estimation of the generalized variance function model;
    (b) the selection of sample units under a complex sample design;
    (c) the lack of fit of the generalized variance function model; and
    (d) the generation of a finite population under a superpopulation model.

    Second, under certain conditions, one may link each of components (a) through (d) with different empirical measures of the predictive adequacy of a generalized variance function model. Consequently, these measures of predictive adequacy can offer some insight into the extent to which a given generalized variance function model may be appropriate for inferential use in specific applications.

    Some of the proposed diagnostics are applied to data from the US Survey of Doctoral Recipients and the US Current Employment Survey. For the Survey of Doctoral Recipients, components (a), (c) and (d) are of principal concern. For the Current Employment Survey, components (b), (c) and (d) receive principal attention, and the availability of population microdata allows the development of especially detailed models for components (b) and (c).

    Release date: 2004-09-13
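
    A commonly used generalized variance function takes the form relvariance(x) = a + b/x, which is linear in its parameters and can be fitted by least squares across many published estimates; the sketch below uses simulated data and is only one possible GVF specification, not the models examined in the paper.

    ```python
    # Fit a classic GVF, relvar(x) = a + b / x, by ordinary least squares.
    import numpy as np

    rng = np.random.default_rng(11)
    estimates = rng.uniform(1_000, 500_000, size=40)              # point estimates for many items
    true_a, true_b = 0.0002, 150.0
    relvariances = true_a + true_b / estimates + rng.normal(0, 0.0002, size=40)

    # Linear in the parameters: regress relvariance on [1, 1/x].
    X = np.column_stack([np.ones_like(estimates), 1.0 / estimates])
    a_hat, b_hat = np.linalg.lstsq(X, relvariances, rcond=None)[0]

    x_new = 50_000.0
    se_new = x_new * np.sqrt(max(a_hat + b_hat / x_new, 0.0))     # implied standard error
    print(f"a = {a_hat:.6f}, b = {b_hat:.1f}, implied SE at x = {x_new:,.0f}: {se_new:,.0f}")
    ```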

  • Technical products: 11-522-X20020016738
    Description:

    Parental union dissolution has been on the rise in Canada for the last 30 years, and the nature and intensity of the contact that children maintain with their parents after the family has broken up is now an important issue. Until now, most research on this topic has been done using cross-sectional data. However, the arrangements that separating parents make concerning the physical and financial care of their children are far from static, evolving in response to a variety of changes in the lives of both biological parents, including those occurring as a result of the new conjugal unions mothers and fathers enter into.

    In this paper, we first determine how custody arrangements evolve through time and then examine changes in the frequency of contact that non-resident fathers maintain with their children. In both analyses, particular attention is given to the effect that a new union in the mother's or father's life has on the level of contact that children maintain with the non-custodial parent. We also examine how this varies depending on whether or not the new partner had children from a previous union, and on whether the mother's or father's new union is fertile. Prospective data from the two first waves of the National Longitudinal Survey of Children and Youth (NLSCY) will enable us to compare levels of contact both before and after family recomposition.

    Analyses are conducted using multinomial logit and probit models, and ordered logit and probit models, depending on the nature of the dependent variables. The observation of some of our dependent variables (e.g., the level of contact between non-resident fathers and their children) is dependent on a selection process (e.g., a father not residing with his child at Time 1 does not reside with the child at Time 2). In such cases, analyses are conducted using ordered probit models with selectivity. In all analyses, standard errors are adjusted to account for the sample design.

    Release date: 2004-09-13
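
    A minimal sketch of a multinomial logit for a categorical custody-arrangement outcome with simulated covariates; the selection-corrected ordered probit models and design-based standard errors described above are beyond this illustration, and all variable names and data are invented.

    ```python
    # Multinomial logit on a simulated custody outcome
    # (0 = mother only, 1 = shared, 2 = father only).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(5)
    n = 1_500
    mother_new_union = rng.integers(0, 2, size=n)       # mother repartnered (0/1)
    father_new_union = rng.integers(0, 2, size=n)       # father repartnered (0/1)
    child_age = rng.integers(0, 12, size=n)

    scores = np.column_stack([
        0.3 * mother_new_union + 0.1 * child_age,        # latent score: mother-only custody
        np.full(n, 0.2),                                 # latent score: shared custody
        0.4 * father_new_union,                          # latent score: father-only custody
    ]) + rng.gumbel(size=(n, 3))
    custody = scores.argmax(axis=1)                      # simulated outcome category

    X = np.column_stack([mother_new_union, father_new_union, child_age])
    model = LogisticRegression(max_iter=1000).fit(X, custody)
    print("predicted category shares:", np.bincount(model.predict(X), minlength=3) / n)
    ```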
