Statistical techniques

Results

All (24) (1 to 10 of 24 results)

  • Articles and reports: 12-001-X202300200005
    Description: Population undercoverage is one of the main hurdles faced by statistical analysis with non-probability survey samples. We discuss two typical scenarios of undercoverage, namely, stochastic undercoverage and deterministic undercoverage. We argue that existing estimation methods under the positivity assumption on the propensity scores (i.e., the participation probabilities) can be directly applied to handle the scenario of stochastic undercoverage. We explore strategies for mitigating biases in estimating the mean of the target population under deterministic undercoverage. In particular, we examine a split population approach based on a convex hull formulation, and construct estimators with reduced biases. A doubly robust estimator can be constructed if a follow-up subsample of the reference probability survey with measurements on the study variable becomes feasible. The performance of six competing estimators is investigated through a simulation study, and issues that require further investigation are briefly discussed.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300100002
    Description: We consider regression analysis in the context of data integration. To combine partial information from external sources, we employ the idea of model calibration, which introduces a “working” reduced model based on the observed covariates. The working reduced model is not necessarily correctly specified but can be a useful device to incorporate the partial information from the external data. The actual implementation is based on a novel application of the information projection and model calibration weighting. The proposed method is particularly attractive for combining information from several sources with different missing patterns. The proposed method is applied to a real data example combining survey data from the Korean National Health and Nutrition Examination Survey and big data from the National Health Insurance Sharing Service in Korea.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202000200005
    Description:

    In surveys, text answers from open-ended questions are important because they allow respondents to provide more information without constraints. When open-ended questions are classified automatically using supervised learning, the accuracy is often not high enough. Alternatively, a semi-automated classification strategy can be considered: answers in the easy-to-classify group are classified automatically, while answers in the hard-to-classify group are classified manually. This paper presents a semi-automated classification method for multi-label open-ended questions where text answers may be associated with multiple classes simultaneously. The proposed method effectively combines multiple probabilistic classifier chains while avoiding prohibitive computational costs. The performance evaluation on three different data sets demonstrates the effectiveness of the proposed method.

    Release date: 2020-12-15

  • Articles and reports: 82-622-X2015009
    Description:

    The Canadian Cancer Registry (CCR) represents a collaborative effort between Statistics Canada and the thirteen provincial and territorial cancer registries to create a single database to report annually on cancer incidence and survival at the national and jurisdictional level. While gains have been made to ensure high quality, standardized, and comparable data, the CCR currently lacks information on cancer treatment. The Canadian Council of Cancer Registries (CCCR) identified the need to capture treatment data at the national level as a key strategic priority for 2013/2014. Record linkage was identified as one possible approach to fill this information gap.

    The purpose of this study is to examine the feasibility of using record linkage to add cancer treatment information for selected cancers: breast, colorectal and prostate. The objectives are twofold: to assess the quality of the linkage processes and the validity of using linked data to estimate cancer treatment rates at the provincial level. The study is based on the Canadian Cancer Registry (2005 to 2008) linked to the Discharge Abstract Database (DAD) and the National Ambulatory Care Reporting System (NACRS) for four provinces (Ontario, Manitoba, Nova Scotia and Prince Edward Island). The linkage was proposed by Statistics Canada, the CCCR and the Canadian Institute for Health Information (CIHI). The linkage was approved and conducted at Statistics Canada.

    Release date: 2015-11-23

  • Articles and reports: 12-001-X201400114004
    Description:

    In 2009, two major surveys in the Governments Division of the U.S. Census Bureau were redesigned to reduce sample size, save resources, and improve the precision of the estimates (Cheng, Corcoran, Barth and Hogue 2009). The new design divides each of the traditional state by government-type strata with sufficiently many units into two sub-strata according to each governmental unit’s total payroll, in order to sample less from the sub-stratum with small size units. The model-assisted approach is adopted in estimating population totals. Regression estimators using auxiliary variables are obtained either within each created sub-stratum or within the original stratum by collapsing two sub-strata. A decision-based method was proposed in Cheng, Slud and Hogue (2010), applying a hypothesis test to decide which regression estimator is used within each original stratum. Consistency and asymptotic normality of these model-assisted estimators are established here, under a design-based or model-assisted asymptotic framework. Our asymptotic results also suggest two types of consistent variance estimators, one obtained by substituting unknown quantities in the asymptotic variances and the other by applying the bootstrap. The performance of all the estimators of totals and of their variance estimators is examined in some empirical studies. The U.S. Annual Survey of Public Employment and Payroll (ASPEP) is used to motivate and illustrate our study.

    Release date: 2014-06-27

  • Articles and reports: 82-003-X200900110795
    Geography: Canada
    Description:

    This article presents methods of combining cycles of the Canadian Community Health Survey and discusses issues to consider if these data are to be combined.

    Release date: 2009-02-18

  • Articles and reports: 82-003-X200800310681
    Geography: Canada
    Description:

    This article describes the methods used to link census data from the long-form questionnaire to mortality data, and reports simple findings for the major groups, defined by income, education, occupation, language and ethnicity, Aboriginal or visible minority status, and disability status.

    Release date: 2008-09-17

  • Articles and reports: 11-522-X200600110402
    Description:

    This paper explains how to append census area-level summary data to survey or administrative data. It uses examples from survey datasets present in Statistics Canada Research Data Centres, but the methods also apply to external datasets, including administrative datasets. Four examples illustrate common situations faced by researchers: (1) when the survey (or administrative) and census data both contain the same level of geographic identifiers, coded to the same year standard ("vintage") of census geography (for example, if both have 2001 DA); (2) when the two files contain geographic identifiers of the same vintage, but at different levels of census geography (for example, 1996 EA in the survey, but 1996 CT in the census data); (3) when the two files contain data coded to different vintages of census geography (such as 1996 EA for the survey, but 2001 DA for the census); (4) when the survey data are lacking in geographic identifiers, and those identifiers must first be generated from postal codes present on the file. The examples are shown using SAS syntax, but the principles apply to other programming languages or statistical packages.

    Release date: 2008-03-17

  • Surveys and statistical programs – Documentation: 68-514-X
    Description:

    Statistics Canada's approach to gathering and disseminating economic data has developed over several decades into a highly integrated system for collection and estimation that feeds the framework of the Canadian System of National Accounts.

    The key to this approach was the creation of the Unified Enterprise Survey (UES), the goal of which was to improve the consistency, coherence, breadth and depth of business survey data.

    The UES did so by bringing many of Statistics Canada's individual annual business surveys under a common framework. This framework included a single survey frame, a sample design framework, conceptual harmonization of survey content, means of using relevant administrative data, common data collection, processing and analysis tools, and a common data warehouse.

    Release date: 2006-11-20

  • Articles and reports: 12-002-X20060019254
    Description:

    This article explains how to append census area-level summary data to survey or administrative data. It uses examples from datasets present in Statistics Canada Research Data Centres, but the methods also apply to external datasets. Four examples illustrate common situations faced by researchers: (1) when the survey (or administrative) and census data both contain the same level of geographic identifiers, coded to the same year standard ("vintage") of census geography; (2) when the two files contain geographic identifiers of the same vintage, but at different levels of census geography; (3) when the two files contain data coded to different vintages of census geography; (4) when the survey data are lacking in geographic identifiers, and those identifiers must first be generated from postal codes present on the file. The examples are shown using SAS syntax, but the principles apply to other programming languages or statistical packages.

    Release date: 2006-07-18

Analysis (23) (1 to 10 of 23 results)

  • Articles and reports: 12-001-X20050018083
    Description:

    The advent of computerized record linkage methodology has facilitated the conduct of cohort mortality studies in which exposure data in one database are electronically linked with mortality data from another database. This, however, introduces linkage errors due to mismatching an individual from one database with a different individual from the other database. In this article, the impact of linkage errors on estimates of epidemiological indicators of risk, such as standardized mortality ratios and relative risk regression model parameters, is explored. It is shown that the observed and expected numbers of deaths are affected in opposite directions and, as a result, these indicators can be subject to bias and additional variability in the presence of linkage errors.

    Release date: 2005-07-21