Data analysis


Results

All (14) (0 to 10 of 14 results)

  • Stats in brief: 89-20-00062023001
    Description: This course is intended for Government of Canada employees who would like to learn about evaluating the quality of data for a particular use. Whether you are a new employee interested in learning the basics, or an experienced subject matter expert looking to refresh your skills, this course is here to help.
    Release date: 2023-07-17

  • Articles and reports: 11-522-X202100100025
    Description:

    We propose a longitudinal analysis from a point of view connected to the organizational changes that have taken place in the Italian National Institute of Statistics (Istat) in recent years. In 2016 the Institute introduced a new Directorate, intending to standardize and generalize the business process of data collection according to the European standard of the GAMSO model. The paper discusses the pros and cons of this change from the perspective of survey participation. The ICT survey response rate analysis shows an increase of around 20% since the beginning of the new organization, and the paper focuses on the impact of the changes introduced with it. We focus on two specific subsets of respondents: the so-called "wanted", who have never answered an ICT survey or any other Istat survey, and the so-called "lost", who were included in the samples of two consecutive surveys and answered in the previous edition but not in the current one. The paper aims to illustrate how an efficient organization of data collection benefits survey results and what kinds of action should be taken to catch the attention of the "wanted". Finally, we apply a logistic model measuring the probability that an enterprise responding in 2018 (t-1) also answered in 2019 (t). Taken together, the analyses suggest some actions that could be taken to improve respondents' participation, data quality, and respondents' perception of official statistics.

    Key Words: data collection strategy, response rate, paradata, response burden, ICT Survey.

    Release date: 2021-10-29
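
The "wanted" and "lost" definitions in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Enterprise` record, its field names and the toy units are hypothetical, and the retention rate shown is a raw proportion, not the logistic model the paper fits.

```python
from dataclasses import dataclass


@dataclass
class Enterprise:
    # All field names are hypothetical; the paper does not describe its data layout.
    responded_any_earlier: bool  # answered any survey before the previous edition
    sampled_prev: bool           # in the sample of the previous edition (t-1)
    responded_prev: bool         # answered the previous edition
    sampled_curr: bool           # in the sample of the current edition (t)
    responded_curr: bool         # answered the current edition


def classify(e):
    """Label a unit as 'lost', 'wanted' or 'other' following the abstract's wording."""
    in_both_samples = e.sampled_prev and e.sampled_curr
    if in_both_samples and e.responded_prev and not e.responded_curr:
        return "lost"
    if not (e.responded_any_earlier or e.responded_prev or e.responded_curr):
        return "wanted"
    return "other"


def retention_rate(units):
    """Share of previous-edition respondents, sampled again, who answered again."""
    eligible = [e for e in units
                if e.sampled_prev and e.sampled_curr and e.responded_prev]
    if not eligible:
        return None
    return sum(e.responded_curr for e in eligible) / len(eligible)


# Made-up units, one per case of interest.
units = [
    Enterprise(False, True, True, True, False),   # answered at t-1, not at t
    Enterprise(False, True, False, True, False),  # has never answered
    Enterprise(True, True, True, True, True),     # answered both editions
    Enterprise(False, False, False, True, True),  # newly sampled, answered
]
labels = [classify(e) for e in units]
rate = retention_rate(units)
```

A logistic model such as the one mentioned in the abstract would replace the single proportion with a probability that depends on enterprise characteristics.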

  • Articles and reports: 11-633-X2018016
    Description:

    Record linkage has been identified as a potential mechanism to add treatment information to the Canadian Cancer Registry (CCR). The purpose of the Canadian Cancer Treatment Linkage Project (CCTLP) pilot is to add surgical treatment data to the CCR. The Discharge Abstract Database (DAD) and the National Ambulatory Care Reporting System (NACRS) were linked to the CCR, and surgical treatment data were extracted. The project was funded through the Cancer Data Development Initiative (CDDI) of the Canadian Partnership Against Cancer (CPAC).

    The CCTLP was developed as a feasibility study in which patient records from the CCR would be linked to surgical treatment records in the DAD and NACRS databases, maintained by the Canadian Institute for Health Information. The target cohort to whom surgical treatment data would be linked was patients aged 19 or older registered on the CCR (2010 through 2012). The linkage was completed in Statistics Canada’s Social Data Linkage Environment (SDLE).

    Release date: 2018-03-27

  • Articles and reports: 12-001-X201700114822
    Description:

    We use a Bayesian method to make inference about a finite population proportion when binary data are collected using a two-fold sample design from small areas. The two-fold sample design has a two-stage cluster sample design within each area. A former hierarchical Bayesian model assumes that for each area the first stage binary responses are independent Bernoulli distributions, and the probabilities have beta distributions which are parameterized by a mean and a correlation coefficient. The means vary with areas but the correlation is the same over areas. However, to gain some flexibility we have now extended this model to accommodate different correlations. The means and the correlations have independent beta distributions. We call the former model a homogeneous model and the new model a heterogeneous model. All hyperparameters have proper noninformative priors. An additional complexity is that some of the parameters are weakly identified, making it difficult to use a standard Gibbs sampler for computation. We have therefore used unimodal constraints for the beta prior distributions and a blocked Gibbs sampler to perform the computation. We have compared the heterogeneous and homogeneous models using an illustrative example and simulation study. As expected, the two-fold model with heterogeneous correlations is preferred.

    Release date: 2017-06-22

  • Articles and reports: 11-633-X2016002
    Description:

    Immigrants comprise an ever-increasing percentage of the Canadian population: more than 20%, the highest percentage among the G8 countries (Statistics Canada 2013a). This figure is expected to rise to 25% to 28% by 2031, when at least one in four people living in Canada will be foreign-born (Statistics Canada 2010).

    This report summarizes the linkage of the Immigrant Landing File (ILF) for all provinces and territories, excluding Quebec, to hospital data from the Discharge Abstract Database (DAD), a national database containing information about hospital inpatient and day-surgery events. A deterministic exact-matching approach was used to link data from the 1980-to-2006 ILF and from the DAD (2006/2007, 2007/2008 and 2008/2009) with the 2006 Census, which served as a “bridge” file. This was a secondary linkage in that it used linkage keys created in two previous projects (primary linkages) that separately linked the ILF and the DAD to the 2006 Census. The ILF–DAD linked data were validated by means of a representative sample of 2006 Census records containing immigrant information previously linked to the DAD.

    Release date: 2016-08-17
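
The secondary linkage described above, in which two files are joined through keys that each already shares with a "bridge" file, can be sketched as follows. This is a minimal illustration, assuming each primary linkage is available as a simple mapping to a census identifier; the function name, identifiers and toy records are hypothetical and do not reflect the actual file layouts.

```python
def link_via_bridge(ilf_to_census, dad_to_census):
    """Deterministic exact-match linkage of two files through a shared bridge key.

    ilf_to_census: dict mapping an immigrant-file id to a census id
    dad_to_census: list of (hospital-record id, census id) pairs
    Returns the sorted list of (immigrant-file id, hospital-record id) pairs.
    """
    # Index the first primary linkage by the bridge key.
    census_to_ilf = {}
    for ilf_id, census_id in ilf_to_census.items():
        census_to_ilf.setdefault(census_id, []).append(ilf_id)

    # A hospital record links only when its bridge key matches exactly.
    pairs = []
    for dad_id, census_id in dad_to_census:
        for ilf_id in census_to_ilf.get(census_id, []):
            pairs.append((ilf_id, dad_id))
    return sorted(pairs)


# Made-up keys: one person with two hospital events, one with none,
# one with a single event, and one hospital record with no immigrant match.
ilf_keys = {"ILF-1": "C-10", "ILF-2": "C-20", "ILF-3": "C-30"}
dad_keys = [("DAD-a", "C-10"), ("DAD-b", "C-10"),
            ("DAD-c", "C-40"), ("DAD-d", "C-30")]
linked = link_via_bridge(ilf_keys, dad_keys)
```

Because no new matching is performed, the quality of the result depends entirely on the two primary linkages, which is why the report validates the linked data against a separate sample.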

  • Articles and reports: 82-003-X201401014098
    Geography: Province or territory
    Description:

    This study compares registry and non-registry approaches to linking 2006 Census of Population data for Manitoba and Ontario to hospital data from the Discharge Abstract Database.

    Release date: 2014-10-15

  • Articles and reports: 12-001-X20050029048
    Description:

    We consider a problem in which an analysis is needed for categorical data from a single two-way table with partial classification (i.e., both item and unit nonresponses). We assume that this is the only information available. A Bayesian methodology permits modeling different patterns of missingness under ignorability and nonignorability assumptions. We construct a nonignorable nonresponse model which is obtained from the ignorable nonresponse model via a model expansion using a data-dependent prior; the nonignorable nonresponse model robustifies the ignorable nonresponse model. A multinomial-Dirichlet model, adjusted for the nonresponse, is used to estimate the cell probabilities, and a Bayes factor is used to test for association. We illustrate our methodology using data on bone mineral density and family income. A sensitivity analysis is used to assess the effects of the data-dependent prior. The ignorable and nonignorable nonresponse models are compared using a simulation study, and there are subtle differences between these models.

    Release date: 2006-02-17
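
For readers unfamiliar with a Bayes factor test for association under a multinomial-Dirichlet model, the sketch below computes one for a fully classified two-way table. It is a simplified illustration only: it assumes complete data and symmetric Dirichlet priors, so it omits the nonresponse adjustment and the data-dependent prior that are the subject of the paper. The function names and the example counts are made up.

```python
from math import lgamma


def log_multivariate_beta(alphas):
    """Log of the multivariate beta function B(alpha_1, ..., alpha_k)."""
    return sum(lgamma(a) for a in alphas) - lgamma(sum(alphas))


def log_marginal(counts, prior):
    """Log marginal likelihood of multinomial counts under a symmetric
    Dirichlet(prior) distribution, dropping the multinomial coefficient."""
    posterior = [n + prior for n in counts]
    return log_multivariate_beta(posterior) - log_multivariate_beta([prior] * len(counts))


def log_bayes_factor_association(table, prior=1.0):
    """Log Bayes factor of the saturated (association) model against independence."""
    cells = [n for row in table for n in row]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Independence factorises into separate row and column multinomials;
    # the dropped multinomial coefficient is common to both models and cancels.
    log_m_assoc = log_marginal(cells, prior)
    log_m_indep = log_marginal(row_totals, prior) + log_marginal(col_totals, prior)
    return log_m_assoc - log_m_indep


# Made-up tables: one strongly associated, one perfectly balanced.
strong = log_bayes_factor_association([[50, 0], [0, 50]])
balanced = log_bayes_factor_association([[25, 25], [25, 25]])
```

A positive value favours association and a negative value favours independence; with the uniform prior used here, the balanced table gives a mildly negative log Bayes factor.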

  • Articles and reports: 11-522-X20020016732
    Description:

    Analysis of dose-response relationships has long been important in toxicology. More recently, this type of analysis has been employed to evaluate public education campaigns. The data that are collected in such evaluations are likely to come from standard household survey designs with all the usual complexities of multiple stages, stratification and variable selection probabilities. On a recent evaluation, a system was developed with the following features: categorization of doses into three or four levels, propensity scoring of dose selection and a new jack-knifed Jonckheere-Terpstra test for a monotone dose-response relationship. This system allows rapid production of tests for monotone dose-response relationships that are corrected both for sample design and for confounding. The focus of this paper will be the results of a Monte-Carlo simulation of the properties of the jack-knifed Jonckheere-Terpstra.

    Moreover, there is no experimental control over dosages and the possibility of confounding variables must be considered. Standard regressions in WESVAR and SUDAAN could be used to determine if there is a linear dose-response relationship while controlling for confounders, but such an approach has low power to detect nonlinear but monotone dose-response relationships and is time-consuming to implement if there are a large number of possible outcomes of interest.

    Release date: 2004-09-13
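
As background for the test named above, the classical Jonckheere-Terpstra statistic for ordered dose groups can be computed as sketched below. This is the textbook, unweighted version under simple random sampling: it is not the jack-knifed, design-corrected variant the paper studies, and it includes no propensity adjustment. The function names and the data are illustrative, and the variance formula assumes no tied values.

```python
from math import sqrt


def jonckheere_terpstra(groups):
    """Classical JT statistic for groups listed in increasing dose order.

    Counts, over every pair of groups (lower dose, higher dose), the pairs of
    observations in which the higher-dose value is larger; ties count one half.
    """
    stat = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for x in groups[i]:
                for y in groups[j]:
                    if x < y:
                        stat += 1.0
                    elif x == y:
                        stat += 0.5
    return stat


def jt_null_moments(groups):
    """Mean and variance of the statistic under no dose effect (no ties assumed)."""
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(s * s for s in sizes)) / 4
    var = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72
    return mean, var


# Made-up outcomes for three dose levels (low, medium, high), no ties.
doses = [[1, 3, 5], [2, 6, 7], [4, 8, 9]]
jt = jonckheere_terpstra(doses)
mean, var = jt_null_moments(doses)
z = (jt - mean) / sqrt(var)  # large positive values suggest an increasing trend
```

With complex survey data, the null variance above would not be valid, which is the motivation for a replication-based variant such as the one the paper evaluates.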

  • Articles and reports: 11-522-X20020016739
    Description:

    The Labour Force Survey (LFS) was not designed to be a longitudinal survey. However, given that respondent households typically remain in the sample for six consecutive months, it is possible to reconstruct six-month fragments of longitudinal data from the monthly records of household members. Such longitudinal data (altogether consisting of millions of person-months of individual- and family-level data) are useful for analyses of monthly labour market dynamics over relatively long periods of time, 20 years or more.

    We make use of these data to estimate hazard functions describing transitions among the labour market states: self-employed, paid employee and not employed. Data on job tenure for the employed, and data on the date last worked for the not employed - together with the date of survey responses - permit the estimated models to include terms reflecting seasonality and macro-economic cycles, as well as the duration dependence of each type of transition. In addition, the LFS data permit spouse labour market activity and family composition variables to be included in the hazard models as time-varying covariates. The estimated hazard equations have been included in the LifePaths socio-economic microsimulation model. In this setting, the equations may be used to simulate lifetime employment activity from past, present and future birth cohorts. Cross-sectional simulation results have been used to validate these models by comparisons with census data from the period 1971 to 1996.

    Release date: 2004-09-13
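
The idea of turning six-month fragments into transition estimates can be illustrated with raw monthly transition rates. This is a minimal sketch, assuming each fragment is a list of monthly states for one person; the state codes ("SE", "PE", "NE") and the toy fragments are hypothetical, and the rates below ignore the covariates, seasonality and duration dependence that the paper's hazard models include.

```python
from collections import Counter


def monthly_transition_rates(fragments):
    """Empirical month-to-month transition rates between labour market states.

    fragments: list of state sequences, one per person, in month order.
    Returns a dict mapping (origin, destination) to the share of person-months
    spent in the origin state that were followed by the destination state.
    """
    at_risk = Counter()  # person-months observed in each origin state
    moves = Counter()    # observed (origin, destination) pairs
    for states in fragments:
        for origin, dest in zip(states, states[1:]):
            at_risk[origin] += 1
            moves[(origin, dest)] += 1
    return {pair: count / at_risk[pair[0]] for pair, count in moves.items()}


# Made-up fragments: SE = self-employed, PE = paid employee, NE = not employed.
fragments = [
    ["PE", "PE", "PE", "NE", "NE", "PE"],
    ["SE", "SE", "SE", "SE", "PE", "PE"],
    ["NE", "NE", "NE", "NE", "NE", "NE"],
]
rates = monthly_transition_rates(fragments)
```

Each rate is the discrete-time analogue of a hazard with no explanatory terms; a fitted hazard model would let these probabilities vary with tenure, calendar month and family circumstances.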

  • Articles and reports: 11-522-X20020016744
    Description:

    A developmental trajectory describes the course of a behaviour over age or time. This technical paper provides an overview of a semi-parametric, group-based method for analysing developmental trajectories. This methodology provides an alternative to assuming a homogeneous population of trajectories as is done in standard growth modelling.

    Four capabilities are described: (1) the capability to identify, rather than assume, distinctive groups of trajectories; (2) the capability to estimate the proportion of the population following each such trajectory group; (3) the capability to relate group membership probability to individual characteristics and circumstances; and (4) the capability to use the group membership probabilities for various other purposes, such as creating profiles of group members.

    In addition, two important extensions of the method are described: the capability to add time-varying covariates to trajectory models and the capability to estimate joint trajectory models of distinct but related behaviours. The former provides the statistical capacity for testing if a contemporary factor, such as an experimental intervention or a non-experimental event like pregnancy, deflects a pre-existing trajectory. The latter provides the capability to study the unfolding of distinct but related behaviours such as problematic childhood behaviour and adolescent drug abuse.

    Release date: 2004-09-13