Statistics by subject – Statistical methods



All (25 of 116 results)

  • Articles and reports: 82-003-X201700614829
    Description:

    POHEM-BMI is a microsimulation tool that includes a model of adult body mass index (BMI) and a model of childhood BMI history. This overview describes the development of BMI prediction models for adults and of childhood BMI history, and compares projected BMI estimates with those from nationally representative survey data to establish validity.

    Release date: 2017-06-21

  • Articles and reports: 82-003-X201600414489
    Description:

    Using accelerometry data for children and youth aged 3 to 17 from the Canadian Health Measures Survey, the probability of adherence to physical activity guidelines is estimated as a conditional probability, given the number of active and inactive days, which is modelled with a Beta-binomial distribution.

    Release date: 2016-04-20

  • Technical products: 11-522-X201700014714
    Description:

    The Labour Market Development Agreements (LMDAs) between Canada and the provinces and territories fund labour market training and support services for Employment Insurance claimants. The objective of this paper is to discuss the improvements made over the years in the impact assessment methodology. The paper describes the LMDAs and past evaluation work, and discusses the drivers for making better use of large administrative data holdings. It then explains how the new approach made the evaluation less resource-intensive while producing results that are more relevant to policy development. The paper outlines the lessons learned from a methodological perspective and provides insight into ways of making this type of use of administrative data effective, especially in the context of large programs.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014724
    Description:

    At the Institut national de santé publique du Québec, the Quebec Integrated Chronic Disease Surveillance System (QICDSS) has been used daily for approximately four years. The benefits of this system are numerous for measuring the extent of diseases more accurately, evaluating the use of health services properly and identifying certain groups at risk. However, in recent months, various problems have arisen that have required a great deal of careful thought. The problems have affected various areas of activity, such as data linkage, data quality, coordinating multiple users and meeting legal obligations. The purpose of this presentation is to describe the main challenges associated with using QICDSS data and to present some possible solutions. In particular, this presentation discusses the processing of five data sources that not only come from five different providers, but also are not primarily intended for chronic disease surveillance. The varying quality of the data, both across files and within a given file, will also be discussed. Certain situations associated with the simultaneous use of the system by multiple users will also be examined. Examples will be given of analyses of large data sets that have caused problems. As well, a few challenges involving disclosure and the fulfillment of legal agreements will be briefly discussed.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014706
    Description:

    Over the last decade, Statistics Canada’s Producer Prices Division has expanded its service producer price indexes program and continued to improve its goods and construction producer price indexes program. While the majority of price indexes are based on traditional survey methods, efforts were made to increase the use of administrative data and alternative data sources in order to reduce burden on our respondents. This paper focuses mainly on producer price programs, but also provides information on the growing importance of alternative data sources at Statistics Canada. In addition, it presents the operational challenges and risks that statistical offices could face when relying more and more on third-party outputs. Finally, it presents the tools being developed to integrate alternative data while collecting metadata.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014738
    Description:

    In the standard design approach to missing observations, the construction of weight classes and calibration are used to adjust the design weights for the respondents in the sample. Here we use these adjusted weights to define a Dirichlet distribution which can be used to make inferences about the population. Examples show that the resulting procedures have better performance properties than the standard methods when the population is skewed.

    Release date: 2016-03-24
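One reading of the Dirichlet-based approach in the entry above can be sketched as follows. The respondent values, the adjusted weights, and the choice of using those weights directly as Dirichlet concentration parameters are all illustrative assumptions, not the authors' exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative respondent data (values and adjusted weights are made up).
y = np.array([12.0, 45.0, 8.0, 150.0, 30.0])   # study variable
w = np.array([1.2, 0.8, 1.5, 2.0, 1.0])        # nonresponse-adjusted weights

# Treat the adjusted weights as Dirichlet concentration parameters over
# respondents, then propagate the draws through the weighted-mean functional.
draws = rng.dirichlet(w, size=10_000)          # (10000, 5) probability vectors
means = draws @ y                              # posterior draws of the mean
est = means.mean()
lo, hi = np.percentile(means, [2.5, 97.5])     # credible interval for the mean
```

The spread of the posterior draws, rather than a design-based variance formula, then quantifies uncertainty about the population mean.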

  • Technical products: 11-522-X201700014754
    Description:

    Background: There is increasing interest in measuring and benchmarking health system performance. We compared Canada’s health system with other countries in the Organisation for Economic Co-operation and Development (OECD) on both the national and provincial levels, across 50 indicators of health system performance. This analysis can help provinces identify potential areas for improvement, considering an optimal comparator for international comparisons. Methods: OECD Health Data from 2013 was used to compare Canada’s results internationally. We also calculated provincial results for OECD’s indicators on health system performance, using OECD methodology. We normalized the indicator results to present multiple indicators on the same scale and compared them to the OECD average, 25th and 75th percentiles. Results: Presenting normalized values allows Canada’s results to be compared across multiple OECD indicators on the same scale. No country or province consistently has higher results than the others. For most indicators, Canadian results are similar to other countries, but there remain areas where Canada performs particularly well (i.e. smoking rates) or poorly (i.e. patient safety). These data were presented in an interactive eTool. Conclusion: Comparing Canada’s provinces internationally can highlight areas where improvement is needed, and help to identify potential strategies for improvement.

    Release date: 2016-03-24
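The normalization step described in the entry above can be sketched as below. The abstract does not state the exact normalization formula, so a z-score against the OECD distribution is assumed here purely for illustration, and the indicator values are invented.

```python
import numpy as np

# Hypothetical values of one indicator across a set of OECD countries.
values = np.array([4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4])

# Assumed normalization: z-score against the OECD distribution, so that
# indicators measured on different scales become comparable.
oecd_avg = values.mean()
z = (values - oecd_avg) / values.std()

# Comparison band used in the paper: the OECD 25th and 75th percentiles.
p25, p75 = np.percentile(values, [25, 75])
canada = values[0]                 # pretend the first entry is Canada
in_band = p25 <= canada <= p75     # inside the interquartile band?
```

Plotting the z-scores of many indicators on one axis, with the percentile band overlaid, reproduces the kind of display the eTool presents.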

  • Technical products: 11-522-X201700014729
    Description:

    The use of administrative datasets as a data source in official statistics has become much more common as there is a drive for more outputs to be produced more efficiently. Many outputs rely on linkage between two or more datasets, and this is often undertaken in a number of phases with different methods and rules. In these situations we would like to be able to assess the quality of the linkage, and this involves some re-assessment of both links and non-links. In this paper we discuss sampling approaches to obtain estimates of false negatives and false positives with reasonable control of both accuracy of estimates and cost. Approaches to stratification of links (non-links) to sample are evaluated using information from the 2011 England and Wales population census.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014740
    Description:

    In this paper, we discuss the impacts of Employment Benefit and Support Measures delivered in Canada under the Labour Market Development Agreements. We use rich linked longitudinal administrative data covering all LMDA participants from 2002 to 2005. We apply propensity score matching as in Blundell et al. (2002), Gerfin and Lechner (2002), and Sianesi (2004), and produce national incremental impact estimates using difference-in-differences and the kernel matching estimator (Heckman and Smith, 1999). The findings suggest that both Employment Assistance Services and employment benefits such as Skills Development and Targeted Wage Subsidies had positive effects on earnings and employment.

    Release date: 2016-03-24
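The matching-plus-difference-in-differences logic in the entry above can be sketched on simulated data. Everything below is invented for illustration: the data, the (assumed already fitted) propensity scores, and a simple nearest-neighbour match, not the paper's richer kernel-matching specification.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data with a known treatment effect of 1,500 on earnings growth.
n = 200
ps = rng.uniform(0.1, 0.9, n)                  # propensity scores (assumed fitted)
treated = rng.random(n) < ps                   # treatment assignment
pre = rng.normal(30_000, 5_000, n)             # pre-programme earnings
post = pre + rng.normal(1_000, 2_000, n) + 1_500 * treated

t_idx = np.where(treated)[0]
c_idx = np.where(~treated)[0]

# Nearest-neighbour match on the propensity score, with replacement.
gaps = np.abs(ps[c_idx][None, :] - ps[t_idx][:, None])
matches = c_idx[gaps.argmin(axis=1)]

# Difference-in-differences over matched pairs: change for the treated
# minus change for their matched controls.
did = ((post[t_idx] - pre[t_idx]) - (post[matches] - pre[matches])).mean()
```

With this simulated design, `did` recovers (up to sampling noise) the 1,500 effect built into the data, which is the point of the incremental-impact estimator.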

  • Technical products: 11-522-X201700014718
    Description:

    This study assessed whether starting participation in Employment Assistance Services (EAS) earlier after initiating an Employment Insurance (EI) claim leads to better impacts for unemployed individuals than participating later during the EI benefit period. As in Sianesi (2004) and Hujer and Thomsen (2010), the analysis relied on a stratified propensity score matching approach conditional on the discretized duration of unemployment until the program starts. The results showed that individuals who participated in EAS within the first four weeks after initiating an EI claim had the best impacts on earnings and incidence of employment while also experiencing reduced use of EI starting the second year post-program.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014727
    Description:

    Probability samples of near-universal frames of households and persons, administered standardized measures, yielding long multivariate data records, and analyzed with statistical procedures reflecting the design – these have been the cornerstones of the empirical social sciences for 75 years. That measurement structure has given the developed world almost all of what we know about our societies and their economies. The stored survey data form a unique historical record. We live now in a different data world than that in which the leadership of statistical agencies and the social sciences were raised. High-dimensional data are ubiquitously being produced from Internet search activities, mobile Internet devices, social media, sensors, retail store scanners, and other devices. Some estimate that these data sources are increasing in size at the rate of 40% per year. Together their sizes swamp that of the probability-based sample surveys. Further, the state of sample surveys in the developed world is not healthy. Falling rates of survey participation are linked with ever-inflated costs of data collection. Despite growing needs for information, the creation of new survey vehicles is hampered by strained budgets for official statistical agencies and social science funders. These combined observations are unprecedented challenges for the basic paradigm of inference in the social and economic sciences. This paper discusses alternative ways forward at this moment in history.

    Release date: 2016-03-24

  • Articles and reports: 82-003-X201600314338
    Description:

    This paper describes the methods and data used in the development and implementation of the POHEM-Neurological meta-model.

    Release date: 2016-03-16

  • Articles and reports: 82-003-X201500714205
    Description:

    Discrepancies between self-reported and objectively measured physical activity are well-known. For the purpose of validation, this study compares a new self-reported physical activity questionnaire with an existing one and with accelerometer data.

    Release date: 2015-07-15

  • Articles and reports: 12-001-X201400214113
    Description:

    Rotating panel surveys are used to calculate estimates of gross flows between two consecutive periods of measurement. This paper considers a general procedure for the estimation of gross flows when the rotating panel survey has been generated from a complex survey design with random nonresponse. A pseudo maximum likelihood approach is considered through a two-stage model of Markov chains for the allocation of individuals among the categories in the survey and for modeling for nonresponse.

    Release date: 2014-12-19
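The gross-flows target in the entry above is, at its simplest, a weighted transition matrix between two waves. The sketch below shows only that basic building block on invented data; the paper's pseudo-maximum-likelihood treatment of nonresponse through a two-stage Markov chain model is not reproduced here.

```python
import numpy as np

# Labour-force states; rows index the wave-1 state, columns the wave-2 state.
states = ["employed", "unemployed", "inactive"]

# Illustrative panel: states observed at two consecutive waves, with weights.
s1 = np.array([0, 0, 1, 2, 0, 1, 2, 0])                   # state at wave 1
s2 = np.array([0, 1, 1, 2, 0, 0, 2, 0])                   # state at wave 2
w = np.array([1.1, 0.9, 1.3, 1.0, 1.2, 0.8, 1.0, 1.1])    # survey weights

# Weighted gross-flow counts: accumulate each unit's weight into cell (s1, s2).
flows = np.zeros((3, 3))
np.add.at(flows, (s1, s2), w)

# Row-normalise to get estimated transition probabilities between waves.
trans = flows / flows.sum(axis=1, keepdims=True)
```

The `flows` matrix gives estimated gross flows (e.g. employed at wave 1, unemployed at wave 2); `trans` gives the corresponding Markov transition probabilities that the paper's model builds on.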

  • Technical products: 11-522-X201300014287
    Description:

    The purpose of the EpiNano program is to monitor workers who may be exposed to intentionally produced nanomaterials in France. This program is based both on industrial hygiene data collected in businesses for the purpose of gauging exposure to nanomaterials at workstations and on data from self-administered questionnaires completed by participants. These data will subsequently be matched with health data from national medical-administrative databases (passive monitoring of health events). Follow-up questionnaires will be sent regularly to participants. This paper describes the arrangements for optimizing data collection and matching.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014271
    Description:

    The purpose of this paper is to present the use of administrative records in the U.S. Census for Group Quarters, known as collective dwellings elsewhere. Group Quarters enumeration involves collecting data from such hard-to-access places as correctional facilities, skilled nursing facilities, and military barracks. We discuss benefits and constraints of using various sources of administrative records in constructing the Group Quarters frame for coverage improvement. This paper is a companion to the paper by Chun and Gan (2014), discussing the potential uses of administrative records in the Group Quarters enumeration.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014258
    Description:

    The National Fuel Consumption Survey (FCS) was created in 2013 and is a quarterly survey that is designed to analyze distance driven and fuel consumption for passenger cars and other vehicles weighing less than 4,500 kilograms. The sampling frame consists of vehicles extracted from the vehicle registration files, which are maintained by provincial ministries. For collection, FCS uses car chips for some of the sampled units to collect information about the trips and the fuel consumed. There are numerous advantages to using this new technology, for example, reduction in response burden, collection costs and effects on data quality. For the quarters in 2013, 95% of the sampled units were surveyed via paper questionnaires and 5% with car chips; in Q1 2014, 40% of sampled units were surveyed with car chips. This study outlines the methodology of the survey process, examines the advantages and challenges in processing and imputation for the two collection modes, presents some initial results and concludes with a summary of the lessons learned.

    Release date: 2014-10-31

  • Technical products: 12-002-X201400111901
    Description:

    This document is for analysts/researchers who are considering doing research with data from a survey where both survey weights and bootstrap weights are provided in the data files. This document gives directions, for some selected software packages, about how to get started in using survey weights and bootstrap weights for an analysis of survey data. We give brief directions for obtaining survey-weighted estimates, bootstrap variance estimates (and other desired error quantities) and some typical test statistics for each software package in turn. While these directions are provided just for the chosen examples, there will be information about the range of weighted and bootstrapped analyses that can be carried out by each software package.

    Release date: 2014-08-07
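The workflow in the entry above, computing a survey-weighted estimate and then a bootstrap variance from replicate weights, can be sketched generically. In real use the bootstrap weights ship precomputed in the data file; here they are simulated, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative microdata: analysis variable, main weights, and B replicate
# weight columns (simulated stand-ins for the file's bootstrap weights).
n, B = 100, 500
y = rng.normal(50, 10, n)                      # analysis variable
w = rng.uniform(0.5, 2.0, n)                   # main survey weights
bw = w[:, None] * rng.exponential(1.0, (n, B)) # replicate weights, shape (n, B)

# Full-sample weighted estimate of the mean.
theta = np.average(y, weights=w)

# Re-estimate with each replicate weight column.
theta_b = (bw * y[:, None]).sum(axis=0) / bw.sum(axis=0)

# Typical bootstrap variance: mean squared deviation of the replicate
# estimates from the full-sample estimate.
var_boot = np.mean((theta_b - theta) ** 2)
se_boot = np.sqrt(var_boot)
```

The same pattern (re-run the estimator once per replicate weight column, then average squared deviations) extends to ratios, regression coefficients, and other statistics the document covers.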

  • Articles and reports: 12-001-X201400111886
    Description:

    A Bayes linear estimator for a finite population is obtained from a two-stage regression model, specified only by the means and variances of some model parameters associated with each stage of the hierarchy. Many common design-based estimators found in the literature can be obtained as particular cases. A new ratio estimator is also proposed for the practical situation in which auxiliary information is available. The same Bayes linear approach is proposed for estimating proportions for multiple categorical data associated with finite population units, which is the main contribution of this work. A numerical example is provided to illustrate it.

    Release date: 2014-06-27

  • Articles and reports: 12-001-X201400114000
    Description:

    We have used the generalized linearization technique based on the concept of influence function, as Osier has done (Osier 2009), to estimate the variance of complex statistics such as Laeken indicators. Simulations conducted using the R language show that the use of Gaussian kernel estimation to estimate an income density function results in a strongly biased variance estimate. We are proposing two other density estimation methods that significantly reduce the observed bias. One of the methods has already been outlined by Deville (2000). The results published in this article will help to significantly improve the quality of information on the precision of certain Laeken indicators that are disseminated and compared internationally.

    Release date: 2014-06-27

  • Articles and reports: 12-001-X201300211888
    Description:

    When the study variables are functional and storage capacities are limited or transmission costs are high, using survey techniques to select a portion of the observations of the population is an interesting alternative to using signal compression techniques. In this context of functional data, our focus in this study is on estimating the mean electricity consumption curve over a one-week period. We compare different estimation strategies that take account of a piece of auxiliary information such as the mean consumption for the previous period. The first strategy consists in using a simple random sampling design without replacement, then incorporating the auxiliary information into the estimator by introducing a functional linear model. The second approach consists in incorporating the auxiliary information into the sampling designs by considering unequal probability designs, such as stratified and pi designs. We then address the issue of constructing confidence bands for these estimators of the mean. When effective estimators of the covariance function are available and the mean estimator satisfies a functional central limit theorem, it is possible to use a fast technique for constructing confidence bands, based on the simulation of Gaussian processes. This approach is compared with bootstrap techniques that have been adapted to take account of the functional nature of the data.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201200211752
    Description:

    Coca is a native bush from the Amazon rainforest from which cocaine, an illegal alkaloid, is extracted. Asking farmers about the extent of their coca cultivation areas is considered a sensitive question in remote coca growing regions in Peru. As a consequence, farmers tend not to participate in surveys, do not respond to the sensitive question(s), or underreport their individual coca cultivation areas. There is a political and policy concern in accurately and reliably measuring coca growing areas; therefore, survey methodologists need to determine how to encourage response and truthful reporting of sensitive questions related to coca growing. Specific survey strategies applied in our case study included establishment of trust with farmers, confidentiality assurance, matching interviewer-respondent characteristics, changing the format of the sensitive question(s), and non-enforcement of absolute isolation of respondents during the survey. The survey results were validated using satellite data. They suggest that farmers tend to underreport their coca areas, declaring only 35% to 40% of their true extent.

    Release date: 2012-12-19

  • Technical products: 11-522-X200800011006
    Description:

    The Office for National Statistics (ONS) has an obligation to measure and annually report on the burden that it places on businesses participating in its surveys. There are also targets for reduction of costs to businesses complying with government regulation as part of the 2005 Administrative Burdens Reduction Project (ABRP) coordinated by the Better Regulation Executive (BRE).

    Respondent burden is measured by looking at the economic costs to businesses. Over time the methodology for measuring this economic cost has changed with the most recent method being the development and piloting of a Standard Cost Model (SCM) approach.

    The SCM is commonly used in Europe and is focused on measuring objective administrative burdens arising from all government requests for information (e.g., tax returns, VAT) as well as survey participation. This method was therefore not specifically developed to measure statistical response burden. The SCM methodology is activity-based, meaning that the costs and time taken to fulfil requirements are broken down by activity.

    The SCM approach generally collects data using face-to-face interviews. The approach is therefore labour-intensive from both a collection and an analysis perspective, but provides in-depth information. The approach developed and piloted at ONS uses paper self-completion questionnaires.

    The objective of this paper is to provide an overview of respondent burden reporting and targets; and to review the different methodologies that ONS has used to measure respondent burden from the perspectives of sampling, data collection, analysis and usability.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010948
    Description:

    Past survey instruments, whether in the form of a paper questionnaire or telephone script, were their own documentation. Based on this, the ESRC Question Bank was created, providing free-access internet publication of questionnaires, enabling researchers to re-use questions, saving them trouble, whilst improving the comparability of their data with that collected by others. Today, however, as survey technology and computer programs have become more sophisticated, accurate comprehension of the latest questionnaires seems more difficult, particularly when each survey team uses its own conventions to document complex items in technical reports. This paper seeks to illustrate these problems and suggest preliminary standards of presentation to be used until the process can be automated.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010971
    Description:

    Keynote address

    Release date: 2009-12-03

Data (1 result)

  • Table: 53-222-X19980006587
    Description:

    The primary purpose of this article is to present new time series data and to demonstrate their analytical potential, not to provide a detailed analysis of these data. The analysis in section 5.2.4 will deal primarily with the trends of major variables dealing with domestic and transborder traffic.

    Release date: 2000-03-07

Analysis (25 of 57 results)

  • Articles and reports: 12-001-X200900110886
    Description:

    Interviewer variability is a major component of the variability of survey statistics. Different strategies related to question formatting, question phrasing, interviewer training, interviewer workload, interviewer experience and interviewer assignment are employed in an effort to reduce interviewer variability. The traditional formula for measuring interviewer variability, commonly referred to as the interviewer effect, is given by ieff := deff_int = 1 + (n̄_int − 1) ρ_int, where ρ_int and n̄_int are the intra-interviewer correlation and the simple average of the interviewer workloads, respectively. In this article, we provide a model-assisted justification of this well-known formula for equal-probability selection methods (epsem) with no spatial clustering in the sample and equal interviewer workloads. However, spatial clustering and unequal weighting are both very common in large-scale surveys. In the context of a complex sampling design, we obtain an appropriate formula for the interviewer variability that takes unequal probabilities of selection and spatial clustering into consideration. Our formula provides a more accurate assessment of interviewer effects and is thus helpful in allocating a more reasonable amount of funds to control interviewer variability. We also propose a decomposition of the overall effect into effects due to weighting, spatial clustering and interviewers. Such a decomposition is helpful in understanding ways to reduce total variance by different means.

    Release date: 2009-06-22
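The traditional interviewer-effect formula quoted in the entry above is easy to evaluate; the inputs below are hypothetical, chosen only to show how quickly a small intra-interviewer correlation inflates variance.

```python
# Worked example of ieff = 1 + (n_bar - 1) * rho (the classical formula
# quoted in the abstract). Both inputs are hypothetical.
rho_int = 0.02       # intra-interviewer correlation
n_bar_int = 50       # simple average of the interviewer workloads

ieff = 1 + (n_bar_int - 1) * rho_int
# 1 + 49 * 0.02 = 1.98: even a 2% correlation nearly doubles the variance
# of the estimate when each interviewer handles 50 cases.
```

This sensitivity to workload is why interviewer assignment strategies (smaller, more even workloads) are listed among the variance-reduction levers.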

  • Articles and reports: 12-001-X200900110884
    Description:

    The paper considers small domain estimation of the proportion of persons without health insurance for different minority groups. The small domains are cross-classified by age, sex and other demographic characteristics. Both hierarchical and empirical Bayes estimation methods are used. Also, second-order accurate approximations of the mean squared errors of the empirical Bayes estimators, and bias-corrected estimators of these mean squared errors, are provided. The general methodology is illustrated with estimates of the proportion of uninsured persons for several cross-sections of the Asian subpopulation.

    Release date: 2009-06-22

  • Articles and reports: 82-003-S200700010363
    Description:

    This overview describes the sampling strategy used to meet the collection and estimation requirements of the Canadian Health Measures Survey.

    Release date: 2007-12-05

  • Articles and reports: 12-001-X20060019256
    Description:

    In some situations the sample design of a survey is rather complex, consisting of fundamentally different designs in different domains. The design effect for estimates based upon the total sample is a weighted sum of the domain-specific design effects. We derive these weights under an appropriate model and illustrate their use with data from the European Social Survey (ESS).
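    The weighted-sum relationship can be sketched as follows; the proportions used as weights here are illustrative, not the model-derived weights from the article:

```python
def combined_deff(domain_deffs, domain_weights):
    # Overall design effect as a weighted sum of domain-specific design
    # effects; the weights are normalized to sum to one.
    total = sum(domain_weights)
    return sum(d * w / total for d, w in zip(domain_deffs, domain_weights))

# Two domains: an unclustered design (deff 1.2) covering 75% of the sample
# and a clustered one (deff 2.0) covering 25%:
print(combined_deff([1.2, 2.0], [0.75, 0.25]))
```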

    Release date: 2006-07-20

  • Articles and reports: 75-001-X200510613145
    Description:

    Changes in hours worked normally track employment changes very closely. Recently, however, employment has increased more than hours, resulting in an unprecedented gap. In effect, the average annual hours worked have decreased by the equivalent of two weeks. Many factors can affect the hours worked. Some are structural or cyclical - population aging, industrial shifts, the business cycle, natural disasters, legislative changes or personal preferences. Others are a result of the survey methodology. How have the various factors contributed to the recent drop in hours of work?

    Release date: 2005-09-21

  • Articles and reports: 12-001-X20050018087
    Description:

    In official statistics, the data editing process plays an important role in terms of timeliness, data accuracy and survey costs. Techniques for identifying and eliminating errors in the data must consider all of these aspects simultaneously. A frequent and pervasive systematic error in surveys collecting numerical data is the unity measure error, which strongly affects the timeliness, accuracy and cost of the editing and imputation phase. In this paper we propose a probabilistic formalisation of the problem based on finite mixture models. This setting allows us to deal with the problem in a multivariate context, and also provides a number of useful diagnostics for prioritising cases to be investigated more deeply through clerical review. Prioritising units is important for increasing data accuracy while avoiding time wasted following up units that are not truly critical.

    Release date: 2005-07-21

  • Articles and reports: 12-001-X20050018093
    Description:

    Kish's well-known expression for the design effect due to clustering, deff = 1 + (b − 1)ρ, is often used to inform sample design, with an approximation such as the mean cluster sample size b̄ used in place of b. If the design involves either weighting or variation in cluster sample sizes, this can be a poor approximation. In this article we discuss the sensitivity of the approximation to departures from its implicit assumptions and propose an alternative approximation.
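    For context, the two standard Kish components can be sketched as below; this is the textbook approximation being critiqued, not the alternative approximation proposed in the article:

```python
def deff_clustering(b_bar, rho):
    # Kish's design effect due to clustering, using the mean cluster
    # sample size b_bar in place of the individual cluster sizes b.
    return 1.0 + (b_bar - 1.0) * rho

def deff_weighting(weights):
    # Kish's design effect due to unequal weighting: 1 + cv^2 of the
    # weights, i.e. n * sum(w^2) / (sum(w))^2.
    n = len(weights)
    s = sum(weights)
    return n * sum(w * w for w in weights) / (s * s)

print(deff_clustering(10, 0.05))             # ≈ 1.45
print(deff_weighting([1.0, 1.0, 2.0, 2.0]))  # ≈ 1.11
```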

    Release date: 2005-07-21

  • Articles and reports: 12-001-X20040027749
    Description:

    A simple and practicable algorithm for constructing stratum boundaries in such a way that the coefficients of variation are equal in each stratum is derived for positively skewed populations. The new algorithm is shown to compare favourably with the cumulative root frequency method (Dalenius and Hodges 1957) and the Lavallée and Hidiroglou (1988) approximation method for estimating the optimum stratum boundaries.
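    For reference, the cumulative root frequency rule that the new algorithm is compared against can be sketched as follows (a simplified implementation; the bin count and example data are illustrative):

```python
import math

def cum_root_freq_boundaries(values, n_bins, n_strata):
    # Dalenius-Hodges rule: build a histogram, accumulate sqrt(frequency),
    # and place boundaries where the cumulative sqrt(f) scale is divided
    # into equal parts.
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    freq = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)
        freq[k] += 1
    cum, total = [], 0.0
    for f in freq:
        total += math.sqrt(f)
        cum.append(total)
    step = total / n_strata
    bounds, target = [], step
    for k, c in enumerate(cum):
        if c >= target and len(bounds) < n_strata - 1:
            bounds.append(lo + (k + 1) * width)
            target += step
    return bounds

# Split a positively skewed population into 3 strata using 20 bins:
pop = [x ** 2 for x in range(200)]
print(cum_root_freq_boundaries(pop, 20, 3))
```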

    Release date: 2005-02-03

  • Articles and reports: 12-001-X20040016997
    Description:

    Multilevel models are often fitted to survey data gathered with a complex multistage sampling design. However, if such a design is informative, in the sense that the inclusion probabilities depend on the response variable even after conditioning on the covariates, then standard maximum likelihood estimators are biased. In this paper, following the Pseudo Maximum Likelihood (PML) approach of Skinner (1989), we propose a probability-weighted estimation procedure for multilevel ordinal and binary models which eliminates the bias generated by the informativeness of the design. The reciprocals of the inclusion probabilities at each sampling stage are used to weight the log-likelihood function, and the weighted estimators obtained in this way are tested by means of a simulation study for the simple case of a binary random-intercept model with and without covariates. The variance estimators are obtained by a bootstrap procedure. The maximization of the weighted log-likelihood of the model is done with the NLMIXED procedure of SAS, which is based on adaptive Gaussian quadrature. The bootstrap estimation of variances is also implemented in the SAS environment.

    Release date: 2004-07-14

  • Articles and reports: 12-001-X20030016602
    Description:

    The Canadian Labour Force Survey (LFS) produces monthly direct estimates of the unemployment rate at the national and provincial levels. The LFS also releases unemployment estimates for subprovincial areas such as census metropolitan areas (CMAs) and census agglomerations (CAs). However, for some subprovincial areas, the direct estimates are not very reliable, since the sample size in some areas is quite small. In this paper, a cross-sectional and time-series model is used to borrow strength across areas and time periods to produce model-based unemployment rate estimates for CMAs and CAs. This model is a generalization of a widely used cross-sectional model in small area estimation and includes a random walk or AR(1) model for the random time component. Monthly Employment Insurance (EI) beneficiary data at the CMA or CA level are used as auxiliary covariates in the model. A hierarchical Bayes (HB) approach is employed and the Gibbs sampler is used to generate samples from the joint posterior distribution. Rao-Blackwellized estimators are obtained for the posterior means and posterior variances of the CMA/CA-level unemployment rates. The HB method smoothes the survey estimates and leads to a substantial reduction in standard errors. Based on the posterior distributions, Bayesian model fitting is also investigated in this paper.

    Release date: 2003-07-31

  • Articles and reports: 12-001-X20020016419
    Description:

    Since some individuals in a population may lack telephones, telephone surveys using random digit dialling within strata may result in asymptotically biased estimators of ratios. The impact of being unable to sample the non-telephone population is examined. We take into account the propensity that a household owns a telephone when proposing a post-stratified telephone-weighted estimator, which appears to perform better than the typical post-stratified estimator in terms of mean squared error. The coverage propensities are estimated using the Public Use Microdata Samples provided by the United States Census. Non-post-stratified estimators are considered when sample sizes are small. The asymptotic mean squared error of each estimator, along with its sample-based estimate, is derived. Real examples are analysed using the Public Use Microdata Samples. Other forms of nonresponse are not examined herein.

    Release date: 2002-07-05

  • Articles and reports: 12-001-X20010015855
    Description:

    The Canadian Labour Force Survey (LFS) is a monthly survey with a complex rotating panel design. After extensive studies, including the investigation of a number of alternative methods for exploiting the sample overlap to improve the quality of estimates, the LFS has chosen a composite estimation method which achieves this goal while satisfying practical constraints. In addition, for variables where there is a substantial gain in efficiency, the new time series tend to make more sense from a subject-matter perspective. This makes it easier to explain LFS estimates to users and the media. Because of the reduced variance under composite estimation, for some variables it is now possible to publish monthly estimates where only three-month moving averages were published in the past. In addition, a greater number of series can be successfully seasonally adjusted.

    Release date: 2001-08-22

  • Articles and reports: 12-001-X20010015858
    Description:

    The objective of this paper is to study and measure the change (from the initial to the final weight) which results from the procedure used to modify weights. A breakdown of the final weights is proposed in order to evaluate the relative impact of the nonresponse adjustment, the correction for poststratification and the interaction between these two adjustments. This measure of change is used as a tool for comparing the effectiveness of the various methods for adjusting for nonresponse, in particular the methods relying on the formation of Response Homogeneity Groups. The measure of change is examined through a simulation study, which uses data from a Statistics Canada longitudinal survey, the Survey of Labour and Income Dynamics. The measure of change is also applied to data obtained from a second longitudinal survey, the National Longitudinal Survey of Children and Youth.
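    The breakdown described above can be illustrated with per-unit adjustment factors; the function name and weight values below are hypothetical:

```python
def decompose_change(w_design, w_nr, w_final):
    # Factor the overall change w_final / w_design into a nonresponse
    # component (w_nr / w_design) and a poststratification component
    # (w_final / w_nr), per responding unit.
    a_nr = [nr / d for nr, d in zip(w_nr, w_design)]
    a_ps = [f / nr for f, nr in zip(w_final, w_nr)]
    return a_nr, a_ps

a_nr, a_ps = decompose_change([10.0, 20.0], [12.5, 25.0], [12.0, 26.0])
print(a_nr)  # [1.25, 1.25]
print(a_ps)  # [0.96, 1.04]
```

    The product a_nr × a_ps recovers the total change for each unit; summarizing the dispersion of these factors over the sample is one way to compare the effectiveness of different adjustment methods.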

    Release date: 2001-08-22

  • Articles and reports: 12-001-X20000015183
    Description:

    For surveys which involve more than one stage of data collection, one method recommended for adjusting weights for nonresponse (after the first stage of data collection) entails utilizing auxiliary variables (from previous stages of data collection) which are identified as predictors of nonresponse.

    Release date: 2000-08-30

  • Articles and reports: 12-001-X19990024879
    Description:

    Godambe and Thompson consider the problem of confidence intervals in survey sampling. They first review the use of estimating functions to obtain model robust pivotal quantities and associated confidence intervals, and then discuss the adaptation of this approach to the survey sampling context. Details are worked out for some more specific types of models, and an empirical comparison of this approach with more conventional methods is presented.

    Release date: 2000-03-01

  • Articles and reports: 12-001-X19990014718
    Description:

    In this short note, we demonstrate that the well-known formula for the design effect intuitively proposed by Kish has a model-based justification. The formula can be interpreted as a conservative value for the actual design effect.

    Release date: 1999-10-08

Reference (58) (25 of 58 results)

  • Technical products: 11-522-X201700014714
    Description:

    The Labour Market Development Agreements (LMDAs) between Canada and the provinces and territories fund labour market training and support services for Employment Insurance claimants. The objective of this paper is to discuss the improvements made over the years in the impact assessment methodology. The paper describes the LMDAs and past evaluation work, and discusses the drivers for making better use of large administrative data holdings. It then explains how the new approach made the evaluation less resource-intensive while producing results that are more relevant to policy development. The paper outlines the lessons learned from a methodological perspective and provides insight into ways of making this type of use of administrative data effective, especially in the context of large programs.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014724
    Description:

    At the Institut national de santé publique du Québec, the Quebec Integrated Chronic Disease Surveillance System (QICDSS) has been used daily for approximately four years. The benefits of this system are numerous for measuring the extent of diseases more accurately, evaluating the use of health services properly and identifying certain groups at risk. However, in the past months, various problems have arisen that have required a great deal of careful thought. The problems have affected various areas of activity, such as data linkage, data quality, coordinating multiple users and meeting legal obligations. The purpose of this presentation is to describe the main challenges associated with using QICDSS data and to present some possible solutions. In particular, this presentation discusses the processing of five data sources that not only come from five different sources, but also are not mainly used for chronic disease surveillance. The varying quality of the data, both across files and within a given file, will also be discussed. Certain situations associated with the simultaneous use of the system by multiple users will also be examined. Examples will be given of analyses of large data sets that have caused problems. As well, a few challenges involving disclosure and the fulfillment of legal agreements will be briefly discussed.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014706
    Description:

    Over the last decade, Statistics Canada’s Producer Prices Division has expanded its service producer price indexes program and continued to improve its goods and construction producer price indexes program. While the majority of price indexes are based on traditional survey methods, efforts were made to increase the use of administrative data and alternative data sources in order to reduce burden on our respondents. This paper focuses mainly on producer price programs, but also provides information on the growing importance of alternative data sources at Statistics Canada. In addition, it presents the operational challenges and risks that statistical offices could face when relying more and more on third-party outputs. Finally, it presents the tools being developed to integrate alternative data while collecting metadata.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014738
    Description:

    In the standard design approach to missing observations, the construction of weight classes and calibration are used to adjust the design weights for the respondents in the sample. Here we use these adjusted weights to define a Dirichlet distribution which can be used to make inferences about the population. Examples show that the resulting procedures have better performance properties than the standard methods when the population is skewed.
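    A sketch of the idea, in the spirit of a Bayesian bootstrap: treat the adjusted weights as Dirichlet parameters and propagate draws into the estimate of interest. The function and data below are illustrative, not the authors' exact procedure:

```python
import random

def dirichlet_posterior_means(y, adj_weights, draws=1000, seed=1):
    # Draw Dirichlet(adjusted weights) probability vectors via normalized
    # gamma variates, and compute the implied weighted mean of y for each
    # draw; the spread of these means reflects posterior uncertainty.
    rng = random.Random(seed)
    means = []
    for _ in range(draws):
        g = [rng.gammavariate(w, 1.0) for w in adj_weights]
        s = sum(g)
        means.append(sum(gi / s * yi for gi, yi in zip(g, y)))
    return means

post = dirichlet_posterior_means([1.0, 2.0, 3.0, 4.0], [1.5, 1.0, 1.0, 0.5])
print(min(post), max(post))  # every draw stays within the range of y
```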

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014754
    Description:

    Background: There is increasing interest in measuring and benchmarking health system performance. We compared Canada's health system with those of other countries in the Organisation for Economic Co-operation and Development (OECD), at both the national and provincial levels, across 50 indicators of health system performance. This analysis can help provinces identify potential areas for improvement and an appropriate comparator for international comparisons. Methods: OECD Health Data from 2013 were used to compare Canada's results internationally. We also calculated provincial results for the OECD's indicators of health system performance, using OECD methodology. We normalized the indicator results to present multiple indicators on the same scale and compared them to the OECD average and to the 25th and 75th percentiles. Results: The normalized values allow Canada's results to be compared across multiple OECD indicators on the same scale. No country or province consistently outperforms the others. For most indicators, Canadian results are similar to those of other countries, but there remain areas where Canada performs particularly well (e.g., smoking rates) or poorly (e.g., patient safety). The data were presented in an interactive eTool. Conclusion: Comparing Canada's provinces internationally can highlight areas where improvement is needed and help identify potential strategies for improvement.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014729
    Description:

    The use of administrative datasets as a data source in official statistics has become much more common as there is a drive for more outputs to be produced more efficiently. Many outputs rely on linkage between two or more datasets, and this is often undertaken in a number of phases with different methods and rules. In these situations we would like to be able to assess the quality of the linkage, and this involves some re-assessment of both links and non-links. In this paper we discuss sampling approaches to obtain estimates of false negatives and false positives with reasonable control of both accuracy of estimates and cost. Approaches to stratification of links (non-links) to sample are evaluated using information from the 2011 England and Wales population census.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014740
    Description:

    In this paper, we discuss the impacts of Employment Benefit and Support Measures delivered in Canada under the Labour Market Development Agreements. We use rich linked longitudinal administrative data covering all LMDA participants from 2002 to 2005. We apply propensity score matching, as in Blundell et al. (2002), Gerfin and Lechner (2002) and Sianesi (2004), and produce national incremental impact estimates using difference-in-differences and the kernel matching estimator (Heckman and Smith, 1999). The findings suggest that both Employment Assistance Services and employment benefits such as Skills Development and Targeted Wage Subsidies had positive effects on earnings and employment.
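    The core of a matched difference-in-differences estimator can be sketched as follows; the propensity scores and outcomes are made up, and the real analysis estimates the scores from covariates and also uses kernel matching rather than only nearest-neighbour matching:

```python
def did_nn_match(treated, controls):
    # Each unit is (propensity score, pre-program outcome, post-program
    # outcome). Match each treated unit to the control with the nearest
    # propensity score, then average the difference-in-differences.
    impacts = []
    for p_t, pre_t, post_t in treated:
        p_c, pre_c, post_c = min(controls, key=lambda c: abs(c[0] - p_t))
        impacts.append((post_t - pre_t) - (post_c - pre_c))
    return sum(impacts) / len(impacts)

treated = [(0.60, 100.0, 130.0), (0.40, 90.0, 115.0)]
controls = [(0.58, 95.0, 110.0), (0.42, 88.0, 100.0)]
print(did_nn_match(treated, controls))  # 14.0
```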

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014718
    Description:

    This study assessed whether starting participation in Employment Assistance Services (EAS) earlier after initiating an Employment Insurance (EI) claim leads to better impacts for unemployed individuals than participating later during the EI benefit period. As in Sianesi (2004) and Hujer and Thomsen (2010), the analysis relied on a stratified propensity score matching approach conditional on the discretized duration of unemployment until the program starts. The results showed that individuals who participated in EAS within the first four weeks after initiating an EI claim had the best impacts on earnings and incidence of employment while also experiencing reduced use of EI starting the second year post-program.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014727
    Description:

    Probability samples of near-universal frames of households and persons, administered standardized measures, yielding long multivariate data records, and analyzed with statistical procedures reflecting the design - these have been the cornerstones of the empirical social sciences for 75 years. That measurement structure has given the developed world almost all of what we know about our societies and their economies. The stored survey data form a unique historical record. We now live in a different data world than the one in which the leadership of statistical agencies and the social sciences was raised. High-dimensional data are ubiquitously produced by Internet search activities, mobile Internet devices, social media, sensors, retail store scanners and other devices. Some estimate that these data sources are increasing in size at a rate of 40% per year. Together, their sizes swamp that of probability-based sample surveys. Further, the state of sample surveys in the developed world is not healthy: falling rates of survey participation are linked with ever-inflating costs of data collection. Despite growing needs for information, the creation of new survey vehicles is hampered by strained budgets for official statistical agencies and social science funders. These combined observations pose unprecedented challenges for the basic paradigm of inference in the social and economic sciences. This paper discusses alternative ways forward at this moment in history.

    Release date: 2016-03-24

  • Technical products: 11-522-X201300014287
    Description:

    The purpose of the EpiNano program is to monitor workers who may be exposed to intentionally produced nanomaterials in France. This program is based both on industrial hygiene data collected in businesses for the purpose of gauging exposure to nanomaterials at workstations and on data from self-administered questionnaires completed by participants. These data will subsequently be matched with health data from national medical-administrative databases (passive monitoring of health events). Follow-up questionnaires will be sent regularly to participants. This paper describes the arrangements for optimizing data collection and matching.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014271
    Description:

    The purpose of this paper is to present the use of administrative records in the U.S. Census for Group Quarters, known elsewhere as collective dwellings. Group Quarters enumeration involves collecting data from such hard-to-access places as correctional facilities, skilled nursing facilities and military barracks. We discuss the benefits and constraints of using various sources of administrative records in constructing the Group Quarters frame for coverage improvement. This paper is a companion to the paper by Chun and Gan (2014), which discusses the potential uses of administrative records in Group Quarters enumeration.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014258
    Description:

    The National Fuel Consumption Survey (FCS) was created in 2013. It is a quarterly survey designed to analyze distance driven and fuel consumption for passenger cars and other vehicles weighing less than 4,500 kilograms. The sampling frame consists of vehicles extracted from the vehicle registration files maintained by provincial ministries. For collection, FCS uses car chips for part of the sampled units to collect information about trips and fuel consumed. There are numerous advantages to this new technology: for example, reduced response burden, lower collection costs and positive effects on data quality. For the 2013 quarters, 95% of the sampled units were surveyed via paper questionnaires and 5% with car chips; in Q1 2014, 40% of sampled units were surveyed with car chips. This study outlines the methodology of the survey process, examines the advantages and challenges in processing and imputation for the two collection modes, presents some initial results and concludes with a summary of the lessons learned.

    Release date: 2014-10-31

  • Technical products: 12-002-X201400111901
    Description:

    This document is for analysts/researchers who are considering doing research with data from a survey where both survey weights and bootstrap weights are provided in the data files. This document gives directions, for some selected software packages, about how to get started in using survey weights and bootstrap weights for an analysis of survey data. We give brief directions for obtaining survey-weighted estimates, bootstrap variance estimates (and other desired error quantities) and some typical test statistics for each software package in turn. While these directions are provided just for the chosen examples, there will be information about the range of weighted and bootstrapped analyses that can be carried out by each software package.
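    In any of these packages the underlying computation is the same; a language-neutral sketch for a weighted mean follows (hypothetical data, and the exact variance divisor or scaling depends on how the bootstrap weights were constructed):

```python
def bootstrap_variance(y, w_main, w_boot_sets):
    # Point estimate with the main survey weights, then re-estimate with
    # each set of bootstrap weights; the variance estimate is the average
    # squared deviation of the bootstrap estimates from the main estimate.
    def wmean(w):
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    theta = wmean(w_main)
    boots = [wmean(wb) for wb in w_boot_sets]
    var = sum((b - theta) ** 2 for b in boots) / len(boots)
    return theta, var

theta, var = bootstrap_variance(
    [1.0, 2.0, 3.0],
    [1.0, 1.0, 1.0],
    [[1.0, 1.0, 1.0], [2.0, 1.0, 0.0], [0.0, 1.0, 2.0]],
)
print(theta)  # 2.0
print(var)    # ≈ 0.296
```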

    Release date: 2014-08-07

  • Technical products: 11-522-X200800011006
    Description:

    The Office for National Statistics (ONS) has an obligation to measure and annually report on the burden that it places on businesses participating in its surveys. There are also targets for reduction of costs to businesses complying with government regulation as part of the 2005 Administrative Burdens Reduction Project (ABRP) coordinated by the Better Regulation Executive (BRE).

    Respondent burden is measured by looking at the economic costs to businesses. Over time the methodology for measuring this economic cost has changed with the most recent method being the development and piloting of a Standard Cost Model (SCM) approach.

    The SCM is commonly used in Europe and focuses on measuring objective administrative burdens for all government requests for information (e.g., tax returns, VAT) as well as survey participation. The method was therefore not specifically developed to measure statistical response burden. The SCM methodology is activity-based, meaning that the costs and time taken to fulfil requirements are broken down by activity.

    The SCM approach generally collects data through face-to-face interviews. It is therefore labour-intensive from both a collection and an analysis perspective, but provides in-depth information. The approach developed and piloted at ONS uses paper self-completion questionnaires.

    The objective of this paper is to provide an overview of respondent burden reporting and targets; and to review the different methodologies that ONS has used to measure respondent burden from the perspectives of sampling, data collection, analysis and usability.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010948
    Description:

    Past survey instruments, whether in the form of a paper questionnaire or telephone script, were their own documentation. Based on this, the ESRC Question Bank was created, providing free-access internet publication of questionnaires, enabling researchers to re-use questions, saving them trouble, whilst improving the comparability of their data with that collected by others. Today however, as survey technology and computer programs have become more sophisticated, accurate comprehension of the latest questionnaires seems more difficult, particularly when each survey team uses its own conventions to document complex items in technical reports. This paper seeks to illustrate these problems and suggest preliminary standards of presentation to be used until the process can be automated.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010971
    Description:

    Keynote address

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010991
    Description:

    In the evaluation of prospective survey designs, statistical agencies generally must consider a large number of design factors that may have a substantial impact on both survey costs and data quality. Assessments of trade-offs between cost and quality are often complicated by limitations on the amount of information available regarding fixed and marginal costs related to: instrument redesign and field testing; the number of primary sample units and sample elements included in the sample; assignment of instrument sections and collection modes to specific sample elements; and (for longitudinal surveys) the number and periodicity of interviews. Similarly, designers often have limited information on the impact of these design factors on data quality.

    This paper extends standard design-optimization approaches to account for uncertainty in the abovementioned components of cost and quality. Special attention is directed toward the level of precision required for cost and quality information to provide useful input into the design process; sensitivity of cost-quality trade-offs to changes in assumptions regarding functional forms; and implications for preliminary work focused on collection of cost and quality information. In addition, the paper considers distinctions between cost and quality components encountered in field testing and production work, respectively; incorporation of production-level cost and quality information into adaptive design work; as well as costs and operational risks arising from the collection of detailed cost and quality data during production work. The proposed methods are motivated by, and applied to, work with partitioned redesign of the interview and diary components of the U.S. Consumer Expenditure Survey.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010998
    Description:

    Businesses that have not responded to a mail survey are generally subject to intensive follow-up (IFU) by telephone or other means to obtain a response. As this contact is expensive, strategies are needed to optimise the approach to conducting IFU calls.

    This paper presents results from an investigation into the number and type of IFU contacts made for business surveys at the Australian Bureau of Statistics (ABS). It compares the effort expended on IFU with the response rates and contributions to key estimates for these surveys, and discusses possible uses of this type of information to make more optimal use of resources.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010978
    Description:

    Census developers and social researchers are at a critical juncture in determining collection modes of the future. Internet data collection is technically feasible, but the initial investment in hardware and software is costly. Given the great divide in computer knowledge and access, internet data collection is viable for some, but not for all. Therefore internet cannot fully replace the existing paper questionnaire - at least not in the near future.

    Canada, Australia and New Zealand are pioneers in internet data collection as an option for completing the census. This paper studies four driving forces behind this collection mode: 1) responding to social/public expectations; 2) longer term economic benefits; 3) improved data quality; and 4) improved coverage.

    Issues currently being faced are: 1) estimating internet uptake and maximizing benefits without undue risk; 2) designing a questionnaire for multiple modes; 3) producing multiple public communication approaches; and 4) gaining positive public reaction and trust in using the internet.

    This paper summarizes the countries' collective thinking and experiences on the benefits and limitation of internet data collection for a census of population and dwellings. It also provides an outline of where countries are heading in terms of internet data collection in the future.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010980
    Description:

    A census is the largest and possibly one of the most complex data collection operations undertaken by a government. Many of the challenges encountered are linked to the sheer size of the operation, when millions of dwellings need to be contacted and thousands of people must be mobilized to help in the data collection efforts. Statistics Canada is a world leader in its approaches to census data collection. New collection approaches were introduced with the 2006 Census, most notably an Internet response option, adding to the mail-out, telephone and face-to-face collection approaches. Such diversity in data collection methods requires an integrated approach to management to ensure quality and efficiency in an environment of declining survey response rates and a tighter fiscal framework. In preparing for its 2011 Census, Statistics Canada is putting in place a number of new systems and processes to actively manage field data collection operations. One of the key elements of the approach will be a Field Management System, which will allow the majority of field personnel to register enumeration progress in the field and to be informed in a very timely fashion of questionnaires received at the Data Operations Centre via Internet, mail or other channels, so that they can cease non-response follow-up efforts on those dwellings, eliminating unnecessary follow-up work.

    Release date: 2009-12-03

  • Technical products: 11-522-X200600110436
    Description:

    The 2006/2007 New Zealand Health Survey sample was designed to meet a range of objectives, the most challenging of which was to achieve sufficient precision for subpopulations of interest, particularly the indigenous Maori population. About 14% of New Zealand's population are Maori. This group is geographically clustered to some extent, but even so most Maori live in areas which have relatively low proportions of Maori, making it difficult to sample this population efficiently. Disproportionate sampling and screening were used to achieve sufficient sample size while maintaining low design effects.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110426
    Description:

    This paper describes the sample design used to satisfy the objectives and logistics of the Canadian Health Measures Survey. Among the challenges in developing the design were the need to select respondents close to clinics, the difficulty of achieving the desired sample size for young people, and subsampling for measures associated with exposure to environmental contaminants. The sample design contains solutions to those challenges: the establishment of collection sites, the use of more than one sample frame, and a respondent selection strategy.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110395
    Description:

    This study investigates factors associated with obesity in Canada and the U.S., using data from the 2002-03 Joint Canada/United States Survey of Health, a telephone survey conducted jointly by Statistics Canada and the U.S. National Center for Health Statistics. Essentially the same questionnaire was administered in both countries at the same time, yielding a data set that provided unprecedented comparability of national estimates from the two countries. Analysis of the empirical distributions of body mass index (BMI) shows that American women are appreciably heavier than Canadian women, whereas the distributions of BMI are almost identical for American and Canadian men. Factors that may account for the difference between women in the two countries are investigated.

    Release date: 2008-03-17

  • Technical products: 11-522-X200600110402
    Description:

    This paper explains how to append census area-level summary data to survey or administrative data. It uses examples from survey datasets held in Statistics Canada Research Data Centres, but the methods also apply to external datasets, including administrative datasets. Four examples illustrate common situations faced by researchers: (1) when the survey (or administrative) and census data both contain geographic identifiers at the same level, coded to the same year standard ("vintage") of census geography (for example, both have 2001 DA); (2) when the two files contain geographic identifiers of the same vintage, but at different levels of census geography (for example, 1996 EA in the survey, but 1996 CT in the census data); (3) when the two files contain data coded to different vintages of census geography (such as 1996 EA for the survey, but 2001 DA for the census); (4) when the survey data lack geographic identifiers, and those identifiers must first be generated from postal codes present on the file. The examples are shown using SAS syntax, but the principles apply to other programming languages and statistical packages.

    Release date: 2008-03-17
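
    The simplest of the four situations described above, case (1), amounts to a one-to-many left join on a shared geographic identifier. As a minimal illustrative sketch (in Python/pandas rather than the SAS used in the paper, and with hypothetical toy data — the column names `da_2001` and `median_income` are assumptions, not from the source), appending a DA-level census summary to survey records could look like:

    ```python
    import pandas as pd

    # Toy stand-in for survey microdata carrying a 2001 dissemination area (DA) code.
    survey = pd.DataFrame({
        "respondent_id": [1, 2, 3],
        "da_2001": ["35200001", "35200001", "35200042"],
    })

    # Toy stand-in for a census DA-level summary file of the same vintage.
    census_da = pd.DataFrame({
        "da_2001": ["35200001", "35200042"],
        "median_income": [52000, 61000],
    })

    # Left join: every respondent is kept, and the area-level summary value
    # is attached to each record sharing that DA code.
    merged = survey.merge(census_da, on="da_2001", how="left")
    print(merged["median_income"].tolist())  # → [52000, 52000, 61000]
    ```

    Cases (2) and (3) require an intermediate correspondence (concordance) file mapping one level or vintage of geography to another before the same join can be applied.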

  • Technical products: 11-522-X200600110444
    Description:

    General population health surveys often include only small samples of smokers, and few longitudinal studies specific to smoking have been carried out. We discuss the development of the Ontario Tobacco Survey (OTS), which combines rolling longitudinal and repeated cross-sectional components. The OTS began in July 2005, using random selection and data collection by telephone. Every 6 months, new samples of smokers and non-smokers provide data on smoking behaviours and attitudes. Smokers enter a panel study and are followed for changes in smoking influences and behaviour. The design is proving cost-effective in meeting sample requirements for multiple research objectives.

    Release date: 2008-03-17
