Statistical techniques

Results

All (6 results)

  • Articles and reports: 11-522-X202100100017
    Description: The outbreak of the COVID-19 pandemic required the Government of Canada to provide relevant and timely information to support decision-making around a host of issues, including personal protective equipment (PPE) procurement and deployment. Our team built a compartmental epidemiological model from an existing code base to project PPE demand under a range of epidemiological scenarios. This model was further enhanced using data science techniques, which allowed for the rapid development and dissemination of model results to inform policy decisions.

    Key Words: COVID-19; SARS-CoV-2; Epidemiological model; Data science; Personal Protective Equipment (PPE); SEIR

    Release date: 2021-10-22

  • Articles and reports: 11-522-X202100100003
    Description:

    The increasing size and richness of digital data allow for modeling more complex relationships and interactions, which is the strong point of machine learning. Here we applied gradient boosting to the Dutch system of social statistical datasets to estimate transition probabilities into and out of poverty. Individual estimates are reasonable, but the main advantages of the approach in combination with SHAP and global surrogate models are the simultaneous ranking of hundreds of features by their importance, detailed insight into their relationship with the transition probabilities, and the data-driven identification of subpopulations with relatively high and low transition probabilities. In addition, we decompose the difference in feature importance between the general population and a subpopulation into a frequency effect and a feature effect. We caution against misinterpretation and discuss future directions.

    Key Words: Classification; Explainability; Gradient boosting; Life event; Risk factors; SHAP decomposition.

    Release date: 2021-10-15

  • Articles and reports: 82-622-X2015009
    Description:

    The Canadian Cancer Registry (CCR) represents a collaborative effort between Statistics Canada and the thirteen provincial and territorial cancer registries to create a single database to report annually on cancer incidence and survival at the national and jurisdictional level. While gains have been made to ensure high quality, standardized, and comparable data, the CCR currently lacks information on cancer treatment. The Canadian Council of Cancer Registries (CCCR) identified the need to capture treatment data at the national level as a key strategic priority for 2013/2014. Record linkage was identified as one possible approach to fill this information gap.

    The purpose of this study is to examine the feasibility of using record linkage to add cancer treatment information for selected cancers: breast, colorectal and prostate. The objectives are twofold: to assess the quality of the linkage processes and the validity of using linked data to estimate cancer treatment rates at the provincial level. The study is based on the Canadian Cancer Registry (2005 to 2008) linked to the Discharge Abstract Database (DAD) and the National Ambulatory Care Reporting System (NACRS) for four provinces (Ontario, Manitoba, Nova Scotia and Prince Edward Island). The linkage was proposed by Statistics Canada, the CCCR and the Canadian Institute for Health Information (CIHI). The linkage was approved and conducted at Statistics Canada.

    Release date: 2015-11-23

  • Articles and reports: 12-001-X201200111685
    Description:

    Survey data are often used to fit linear regression models. The values of covariates used in modeling are not controlled as they might be in an experiment. Thus, collinearity among the covariates is an inevitable problem in the analysis of survey data. Although many books and articles have described the collinearity problem and proposed strategies to understand, assess and handle its presence, the survey literature has not provided appropriate diagnostic tools to evaluate its impact on regression estimation when the survey complexities are considered. We have developed variance inflation factors (VIFs) that measure how much the variances of parameter estimators are inflated by non-orthogonal predictors. The VIFs are appropriate for survey-weighted regression estimators and account for complex design features, e.g., weights, clusters, and strata. Illustrations of these methods are given using a probability sample from a household survey of health and nutrition.

    Release date: 2012-06-27

  • Articles and reports: 12-001-X20050018085
    Description:

    Record linkage is a process of pairing records from two files and trying to select the pairs that belong to the same entity. The basic framework uses a match weight to measure the likelihood of a correct match and a decision rule to assign record pairs as "true" or "false" match pairs. Weight thresholds for selecting a record pair as matched or unmatched depend on the desired control over linkage errors. Current methods to determine the selection thresholds and estimate linkage errors can provide divergent results, depending on the type of linkage error and the approach to linkage. This paper presents a case study that uses existing linkage methods to link record pairs but a new simulation approach (SimRate) to help determine selection thresholds and estimate linkage errors. SimRate uses the observed distribution of data in matched and unmatched pairs to generate a large simulated set of record pairs, assigns a match weight to each pair based on specified match rules, and uses the weight curves of the simulated pairs for error estimation.

    Release date: 2005-07-21

  • Articles and reports: 11-522-X20030017712
    Description:

    This paper discusses variance estimation in the presence of imputations with an application to price index estimation, multiphase sampling, and the use of graphics in publications.

    Release date: 2005-01-26
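
The record linkage entry above (12-001-X20050018085) describes a match weight that measures the likelihood of a correct match, plus a decision rule based on weight thresholds. The sketch below illustrates that idea. It is not the SimRate method and not the authors' code. It assumes a Fellegi-Sunter style log-ratio weight, which the abstract does not name. The field names, the m- and u-probabilities, the two thresholds and the intermediate "possible match" outcome are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the paper):
#   m = P(field agrees | the two records belong to the same entity)
#   u = P(field agrees | the two records belong to different entities)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "postal_code": (0.80, 0.02),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum log2 agreement or disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)  # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Apply a threshold decision rule (thresholds are assumed values)."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


if __name__ == "__main__":
    a = {"surname": "Tremblay", "birth_year": 1970, "postal_code": "K1A0T6"}
    b = {"surname": "Tremblay", "birth_year": 1971, "postal_code": "K2P1L4"}
    w = match_weight(a, b)
    print(round(w, 2), classify(w))
```

Raising the upper threshold trades false matches for missed matches, which is the control over linkage errors that the abstract refers to. A simulation approach would estimate those error rates from generated record pairs rather than fixing the thresholds by hand.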