Statistics by subject – Statistical methods


Other available resources to support your research.

Browse our central repository of key standard concepts, definitions, data sources and methods.

Data (8) (8 of 8 results)

  • Public use microdata: 89F0002X
    Description:

    The SPSD/M is a static microsimulation model designed to analyse financial interactions between governments and individuals in Canada. It can compute taxes paid to and cash transfers received from government. It comprises a database, a series of tax/transfer algorithms and models, analytical software and user documentation.

    Release date: 2016-12-05

  • Table: 53-500-X
    Description:

    This report presents the results of a pilot survey conducted by Statistics Canada to measure the fuel consumption of on-road motor vehicles registered in Canada. This study was carried out in connection with the Canadian Vehicle Survey (CVS) which collects information on road activity such as distance traveled, number of passengers and trip purpose.

    Release date: 2004-10-21

  • Table: 95F0495X2001012
    Description:

    This table contains information from the 2001 Census, presented according to the statistical area classification (SAC). The SAC groups census subdivisions according to whether they are a component of a census metropolitan area, a census agglomeration, a census metropolitan area and census agglomeration influenced zone (strong MIZ, moderate MIZ, weak MIZ or no MIZ) or of the territories (Northwest Territories, Nunavut and Yukon Territory). The SAC is used for data dissemination purposes.

    Data characteristics presented according to the SAC include age, visible minority groups, immigration, mother tongue, education, income, work and dwellings. Data are presented for Canada, provinces and territories. The data characteristics presented within this table may differ from those of other products in the "Profiles" series.

    Release date: 2004-02-27

  • Table: 53-222-X19980006587
    Description:

    The primary purpose of this article is to present new time series data and to demonstrate their analytical potential, not to provide a detailed analysis of these data. The analysis in section 5.2.4 deals primarily with the trends of major variables related to domestic and transborder traffic.

    Release date: 2000-03-07

  • Table: 75M0007X
    Description:

    The Absence from Work Survey was designed primarily to fulfill the objectives of Human Resources Development Canada, which sponsors the qualified wage loss replacement plan. The plan applies to employers who have their own private plans to cover employee wages lost due to sickness, accident and similar causes; employers who fall under the plan are granted a reduction in their premiums payable to the Unemployment Insurance Commission. The data generated from the responses to the supplement provide input for determining the rates of premium reductions for qualified employers.

    Although the Absence from Work Survey collects information on absences from work due to illness, accident or pregnancy, it does not provide a complete picture of people who have been absent from work for these reasons because the concepts and definitions have been developed specifically for the needs of the client. Absences in this survey are defined as being at least two weeks in length, and respondents are only asked the three reasons for their most recent absence and the one preceding it.

    Release date: 1999-06-29

  • Table: 82-567-X
    Description:

    The National Population Health Survey (NPHS) is designed to enhance the understanding of the processes affecting health. The survey collects cross-sectional as well as longitudinal data. In 1994/95 the survey interviewed a panel of 17,276 individuals, then returned to interview them a second time in 1996/97. The response rate for these individuals was 96% in 1996/97. Data collection from the panel will continue for up to two decades. For cross-sectional purposes, data were collected for a total of 81,000 household residents in all provinces (except people on Indian reserves or on Canadian Forces bases) in 1996/97.

    This overview illustrates the variety of information available by presenting data on perceived health, chronic conditions, injuries, repetitive strains, depression, smoking, alcohol consumption, physical activity, consultations with medical professionals, use of medications and use of alternative medicine.

    Release date: 1998-07-29

  • Table: 62-010-X19970023422
    Description:

    The current official time base of the Consumer Price Index (CPI) is 1986=100. This time base was first used when the CPI for June 1990 was released. Statistics Canada is about to convert all price index series to the time base 1992=100. As a result, all constant dollar series will be converted to 1992 dollars. The CPI will shift to the new time base when the CPI for January 1998 is released on February 27th, 1998.

    Release date: 1997-11-17
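
    Rebasing an index series, as described in the entry above, is straightforward arithmetic: each value on the old time base is divided by the index value of the new base period and multiplied by 100. A minimal Python sketch, using invented index values rather than actual CPI figures:

        # Hypothetical index values on the 1986=100 time base (illustrative only).
        cpi_1986_base = {1986: 100.0, 1992: 128.1, 1997: 138.0}

        # Rebase to 1992=100: divide by the 1992 value and scale to 100.
        base = cpi_1986_base[1992]
        cpi_1992_base = {year: round(100.0 * v / base, 1)
                         for year, v in cpi_1986_base.items()}
        print(cpi_1992_base)  # {1986: 78.1, 1992: 100.0, 1997: 107.7}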

  • Public use microdata: 89M0005X
    Description:

    The objective of this survey was to collect attitudinal, cognitive and behavioral information regarding drinking and driving.

    Release date: 1996-10-21

Analysis (838) (25 of 838 results)

  • The Daily
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2017-05-12

  • Articles and reports: 18-001-X2017002
    Description:

    This working paper presents a methodology to measure remoteness at the community level. The method takes into account some of the recent literature on the subject, as well as new computational opportunities provided by the integration of official statistics with data from non-official statistical sources. The approach used in the computations accounts for multiple points of access to services; it also establishes a continuum between communities with different transportation infrastructures and connectivity, while retaining the information on each community's transportation infrastructure in the database. In addition, a method to implement accessibility measures for selected services is outlined, and a sample of accessibility measures is computed.

    Release date: 2017-05-09

  • The Daily
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2017-04-21

  • The Daily
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2017-04-03

  • The Daily
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2017-03-16

  • Journals and periodicals: 11-633-X
    Description:

    Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.

    Release date: 2017-03-13

  • Articles and reports: 11-633-X2017006
    Description:

    This paper describes a method of imputing missing postal codes in a longitudinal database. The 1991 Canadian Census Health and Environment Cohort (CanCHEC), which contains information on individuals from the 1991 Census long-form questionnaire linked with T1 tax return files for the 1984-to-2011 period, is used to illustrate and validate the method. The cohort contains up to 28 consecutive fields for postal code of residence, but because of frequent gaps in postal code history, missing postal codes must be imputed. To validate the imputation method, two experiments were devised where 5% and 10% of all postal codes from a subset with full history were randomly removed and imputed.

    Release date: 2017-03-13
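
    The imputation rule used in the paper is not spelled out above, so the following Python sketch shows only a simple carry-forward baseline for gap-filling a postal code history; the codes and the eight-period history are invented:

        # Hypothetical postal code history with gaps (None = missing period).
        history = ["K1A0B1", None, None, "K1A0B1", "M5V2T6", None, "M5V2T6", None]

        def impute_carry_forward(history):
            """Fill each gap with the last observed code; back-fill leading gaps."""
            out, last = [], None
            for code in history:
                last = code if code is not None else last
                out.append(last)
            first = next((c for c in history if c is not None), None)
            return [first if c is None else c for c in out]

        print(impute_carry_forward(history))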

  • The Daily
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2017-03-09

  • The Daily
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2017-01-30

  • Articles and reports: 11-633-X2017005
    Description:

    Hospitalization rates are among the most commonly reported statistics related to health-care service use. The variety of methods for calculating confidence intervals for these and other health-related rates suggests a need to classify, compare and evaluate these methods. Zeno is a tool developed to calculate confidence intervals of rates based on several formulas available in the literature. This report describes the contents of the main sheet of the Zeno Tool and indicates which formulas are appropriate, based on users’ assumptions and scope of analysis.

    Release date: 2017-01-19
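
    Which formulas Zeno implements is not listed above, so as a purely illustrative example, here is one of the simplest rate confidence intervals in the literature, the normal approximation for a Poisson count, computed in Python with hypothetical figures:

        import math

        # Hypothetical counts: 250 hospitalizations over 180,000 person-years.
        events, person_years = 250, 180_000
        rate = events / person_years

        # Normal-approximation 95% CI for a Poisson rate.
        se = math.sqrt(events) / person_years
        lo, hi = rate - 1.96 * se, rate + 1.96 * se
        print(f"rate = {rate * 100_000:.1f} per 100,000 "
              f"(95% CI {lo * 100_000:.1f} to {hi * 100_000:.1f})")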

  • The Daily
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2016-12-23

  • Articles and reports: 82-003-X201601214687
    Description:

    This study describes record linkage of the Canadian Community Health Survey and the Canadian Mortality Database. The article explains the record linkage process and presents results about associations between health behaviours and mortality among a representative sample of Canadians.

    Release date: 2016-12-21

  • The Daily
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2016-12-20

  • Journals and periodicals: 12-001-X
    Description:

    The journal publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves.

    Release date: 2016-12-20

  • Articles and reports: 12-001-X201600214684
    Description:

    This paper introduces an incomplete adaptive cluster sampling design that is easy to implement, controls the sample size well, and does not need to follow the neighbourhood. In this design, an initial sample is first selected, using one of the conventional designs. If a cell satisfies a prespecified condition, a specified radius around the cell is sampled completely. The population mean is estimated using the π-estimator. If all the inclusion probabilities are known, then an unbiased π-estimator is available; if, depending on the situation, the inclusion probabilities are not known for some of the final sample units, then they are estimated. To estimate the inclusion probabilities, a biased estimator is constructed. However, the simulations show that if the sample size is large enough, the error of the inclusion probabilities is negligible, and the resulting π-estimator is almost unbiased. This design rivals adaptive cluster sampling because it controls the final sample size and is easy to manage. It rivals adaptive two-stage sequential sampling because it considers the cluster form of the population and reduces the cost of moving across the area. Using real data on a bird population and simulations, the paper compares the design with adaptive two-stage sequential sampling. The simulations show that the design has significant efficiency in comparison with its rival.

    Release date: 2016-12-20
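
    The π-estimator referred to above is the Horvitz-Thompson estimator: each sampled value is weighted by the inverse of its inclusion probability. A minimal Python sketch with invented values and probabilities:

        # Horvitz-Thompson (pi) estimator of a population total.
        def ht_total(values, incl_probs):
            return sum(y / p for y, p in zip(values, incl_probs))

        N = 100                          # hypothetical population size
        y = [12.0, 7.5, 30.2, 9.1]       # sampled values
        pi = [0.05, 0.05, 0.20, 0.05]    # unequal inclusion probabilities
        total_hat = ht_total(y, pi)
        print(total_hat, total_hat / N)  # estimated total and mean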

  • Articles and reports: 12-001-X201600214661
    Description:

    An example presented by Jean-Claude Deville in 2005 is subjected to three estimation methods: the method of moments, the maximum likelihood method, and generalized calibration. The three methods yield exactly the same results for the two non-response models. A discussion follows on how to choose the most appropriate model.

    Release date: 2016-12-20

  • Articles and reports: 12-001-X201600214677
    Description:

    How do we tell whether weighting adjustments reduce nonresponse bias? If a variable is measured for everyone in the selected sample, then the design weights can be used to calculate an approximately unbiased estimate of the population mean or total for that variable. A second estimate of the population mean or total can be calculated using the survey respondents only, with weights that have been adjusted for nonresponse. If the two estimates disagree, then there is evidence that the weight adjustments may not have removed the nonresponse bias for that variable. In this paper we develop the theoretical properties of linearization and jackknife variance estimators for evaluating the bias of an estimated population mean or total by comparing estimates calculated from overlapping subsets of the same data with different sets of weights, when poststratification or inverse propensity weighting is used for the nonresponse adjustments to the weights. We provide sufficient conditions on the population, sample, and response mechanism for the variance estimators to be consistent, and demonstrate their small-sample properties through a simulation study.

    Release date: 2016-12-20
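
    The diagnostic described above can be made concrete: compute the design-weighted full-sample estimate of a variable known for everyone, then the respondent-only estimate under nonresponse-adjusted weights, and inspect the gap. The Python sketch below uses a single adjustment class and invented data, far simpler than the poststratified and propensity-weighted adjustments analyzed in the paper:

        # d = design weights, x = frame variable, r = response indicator.
        d = [10.0, 10.0, 20.0, 20.0, 40.0]
        x = [ 3.0,  5.0,  4.0,  6.0,  2.0]
        r = [ 1,    0,    1,    1,    0 ]

        full = sum(di * xi for di, xi in zip(d, x)) / sum(d)

        # One adjustment class: inflate respondent weights so they sum to sum(d).
        adj = sum(d) / sum(di for di, ri in zip(d, r) if ri)
        w = [di * adj * ri for di, ri in zip(d, r)]
        resp = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

        print(full, resp, resp - full)  # a large gap signals residual bias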

  • Articles and reports: 12-001-X201600214662
    Description:

    Two-phase sampling designs are often used in surveys when the sampling frame contains little or no auxiliary information. In this note, we shed some light on the concept of invariance, which is often mentioned in the context of two-phase sampling designs. We define two types of invariant two-phase designs: strongly invariant and weakly invariant two-phase designs. Some examples are given. Finally, we describe the implications of strong and weak invariance from an inference point of view.

    Release date: 2016-12-20

  • Articles and reports: 12-001-X201600214664
    Description:

    This paper draws statistical inference for a finite population mean based on judgment post-stratified (JPS) samples. The JPS design first selects a simple random sample and then stratifies the selected units into H judgment classes based on their relative positions (ranks) in a small comparison set of size H. This leads to a sample with random sample sizes in the judgment classes. The ranking process can be performed using either auxiliary variables or visual inspection to identify the ranks of the measured observations. The paper develops an unbiased estimator and constructs a confidence interval for the population mean. Since judgment ranks are random variables, Rao-Blackwellized estimators for the population mean are constructed by conditioning on the measured observations. The paper shows that the Rao-Blackwellized estimators perform better than the usual JPS estimators. The proposed estimators are applied to 2012 United States Department of Agriculture census data.

    Release date: 2016-12-20
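
    A minimal version of the basic JPS point estimate described above: group the measured values by judgment rank and average the class means over the non-empty classes. The data and H below are invented, and the paper's unbiased and Rao-Blackwellized estimators go beyond this sketch:

        from collections import defaultdict

        def jps_mean(measured):
            """measured: (y, rank) pairs, rank in 1..H from judgment ranking.
            Averages the per-class means over the non-empty judgment classes."""
            classes = defaultdict(list)
            for y, rank in measured:
                classes[rank].append(y)
            means = [sum(v) / len(v) for v in classes.values()]
            return sum(means) / len(means)

        # Hypothetical sample with H = 3 judgment classes.
        print(jps_mean([(4.1, 1), (5.0, 2), (7.2, 3), (3.8, 1), (6.9, 3)]))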

  • Articles and reports: 12-001-X201600214676
    Description:

    Winsorization procedures replace extreme values with less extreme values, effectively moving the original extreme values toward the center of the distribution. Winsorization therefore both detects and treats influential values. Mulry, Oliver and Kaputa (2014) compare the performance of the one-sided Winsorization method developed by Clark (1995) and described by Chambers, Kokic, Smith and Cruddas (2000) to the performance of M-estimation (Beaumont and Alavi 2004) in highly skewed business population data. One aspect of particular interest for methods that detect and treat influential values is the range of values designated as influential, called the detection region. The Clark Winsorization algorithm is easy to implement and can be extremely effective. However, the resultant detection region is highly dependent on the number of influential values in the sample, especially when the survey totals are expected to vary greatly by collection period. In this note, we examine the effect of the number and magnitude of influential values on the detection regions from Clark Winsorization using data simulated to realistically reflect the properties of the population for the Monthly Retail Trade Survey (MRTS) conducted by the U.S. Census Bureau. Estimates from the MRTS and other economic surveys are used in economic indicators, such as the Gross Domestic Product (GDP).

    Release date: 2016-12-20
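
    A minimal sketch of the one-sided rule described above: values beyond a cutoff are pulled back to the cutoff. Clark's contribution is how the cutoff (the boundary of the detection region) is derived, which is not reproduced here; the sales figures and cutoff are invented:

        def winsorize_one_sided(values, cutoff):
            """Replace values above the cutoff with the cutoff itself."""
            return [min(v, cutoff) for v in values]

        # Hypothetical skewed monthly sales with one influential value.
        sales = [12, 15, 9, 22, 18, 940]
        print(winsorize_one_sided(sales, cutoff=200))  # [12, 15, 9, 22, 18, 200]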

  • Articles and reports: 12-001-X201600214660
    Description:

    In an economic survey of a sample of enterprises, occupations are randomly selected from a list until a number r of occupations in a local unit has been identified. This is an inverse sampling problem for which we are proposing a few solutions. Simple designs with and without replacement are processed using negative binomial distributions and negative hypergeometric distributions. We also propose estimators for when the units are selected with unequal probabilities, with or without replacement.

    Release date: 2016-12-20
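
    In the with-replacement case above, the number of draws needed to reach r successes is negative binomial, and (r - 1)/(n - 1) is the classical unbiased estimator of the success probability. A Python simulation sketch (the without-replacement and unequal-probability cases treated in the paper are not covered):

        import random

        def draws_until(p, r, rng):
            """Sample with replacement until r successes; return total draws n."""
            n = successes = 0
            while successes < r:
                n += 1
                successes += rng.random() < p
            return n

        rng = random.Random(42)
        p_true, r = 0.3, 10
        est = [(r - 1) / (draws_until(p_true, r, rng) - 1) for _ in range(10_000)]
        print(sum(est) / len(est))  # close to the true p of 0.3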

  • Articles and reports: 12-001-X201600214663
    Description:

    We present theoretical evidence that efforts during data collection to balance the survey response with respect to selected auxiliary variables will improve the chances for low nonresponse bias in the estimates that are ultimately produced by calibrated weighting. One of our results shows that the variance of the bias – measured here as the deviation of the calibration estimator from the (unrealized) full-sample unbiased estimator – decreases linearly as a function of the response imbalance that we assume measured and controlled continuously over the data collection period. An attractive prospect is thus a lower risk of bias if one can manage the data collection to get low imbalance. The theoretical results are validated in a simulation study with real data from an Estonian household survey.

    Release date: 2016-12-20

  • The Daily
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2016-12-05

  • The Daily
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2016-11-10

  • Articles and reports: 11-633-X2016004
    Description:

    Understanding the importance of the dynamic entry process in the Canadian economy involves measuring the amount and size of firm entry. The paper presents estimates of the importance of firm entry in Canada. It uses the database underlying the Longitudinal Employment Analysis Program (LEAP), which has produced measures of firm entry and exit since 1988. This paper discusses the methodology used to estimate entry and exit, the issues that had to be resolved and the reasons for choosing the particular solutions that were adopted. It then presents measures that are derived from LEAP. Finally, it analyzes the sensitivity of the estimates associated with LEAP to alternative methods of estimating entry and exit.

    Release date: 2016-11-10

Reference (698) (25 of 698 results)

  • Index and guides: 98-500-X
    Description:

    Provides information that enables users to effectively use, apply and interpret data from the Census of Population. Each guide contains definitions and explanations on census concepts as well as a data quality and historical comparability section. Additional information will be included for specific variables to help users better understand the concepts and questions used in the census.

    Release date: 2017-05-03

  • Index and guides: 98-500-X2016001
    Description:

    "This guide focuses on the following topic: Structural Type of Dwelling and Collectives variables. Provides information that enables users to effectively use, apply and interpret data from the 2016 Census. Each guide contains definitions and explanations on census concepts, talks about changes made to the 2016 Census, data quality and historical comparability, as well as comparison with other data sources. Additional information will be included for specific variables to help general users better understand the concepts and questions used in the census. "

    Release date: 2017-05-03

  • Technical products: 12-586-X
    Description:

    The Quality Assurance Framework (QAF) serves as the highest-level governance tool for quality management at Statistics Canada. The QAF gives an overview of the quality management and risk mitigation strategies used by the Agency’s program areas. The QAF is used in conjunction with Statistics Canada management practices, such as those described in the Quality Guidelines.

    Release date: 2017-04-21

  • Technical products: 91-621-X2017001
    Release date: 2017-01-25

  • Technical products: 12-206-X
    Description:

    This report summarizes the achievements of the research and development program sponsored by the three methodology divisions of Statistics Canada. The program covers research and development activities in statistical methods with potentially broad application in the Agency's survey programs that would not otherwise have been carried out during the provision of methodology services to those programs. It also includes tasks that provide client support in applying past successful developments, in order to promote the use of the results of research and development work.

    Release date: 2016-08-22

  • Technical products: 75F0002M
    Description:

    This series provides detailed documentation on income developments, including survey design issues, data quality evaluation and exploratory research.

    Release date: 2016-07-08

  • Technical products: 75F0002M2016003
    Description:

    Periodically, income statistics are updated to reflect the most recent population estimates from the Census. Accordingly, with the release of the 2014 data from the Canadian Income Survey, Statistics Canada has revised estimates for 2006 to 2013 using new population totals from the 2011 Census. This paper provides unrevised estimates alongside revised estimates for key income series, indicating where the revisions were significant.

    Release date: 2016-07-08

  • Technical products: 84-538-X
    Description:

    This document presents the methodology underlying the production of the life tables for Canada, provinces and territories, from reference period 2005 to 2007 and onward.

    Release date: 2016-05-19

  • Technical products: 11-522-X
    Description:

    Since 1984, an annual international symposium on methodological issues has been sponsored by Statistics Canada. Proceedings have been available since 1987.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014729
    Description:

    The use of administrative datasets as a data source in official statistics has become much more common as there is a drive for more outputs to be produced more efficiently. Many outputs rely on linkage between two or more datasets, and this is often undertaken in a number of phases with different methods and rules. In these situations we would like to be able to assess the quality of the linkage, and this involves some re-assessment of both links and non-links. In this paper we discuss sampling approaches to obtain estimates of false negatives and false positives with reasonable control of both accuracy of estimates and cost. Approaches to stratification of links (non-links) to sample are evaluated using information from the 2011 England and Wales population census.

    Release date: 2016-03-24
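
    One stratified approach consistent with the entry above: partition the declared links by match score, clerically review a sample from each stratum, and weight the observed error rates by stratum size. All counts in this Python sketch are hypothetical:

        # stratum: (N_h links in stratum, n_h reviewed, false links found)
        strata = {
            "high score": (90_000, 200,  1),
            "mid score":  ( 8_000, 200, 14),
            "low score":  ( 2_000, 200, 57),
        }
        N = sum(Nh for Nh, _, _ in strata.values())
        fp_rate = sum(Nh * e / nh for Nh, nh, e in strata.values()) / N
        print(f"estimated false-positive rate: {fp_rate:.4f}")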

  • Technical products: 11-522-X201700014749
    Description:

    As part of the Tourism Statistics Program redesign, Statistics Canada is developing the National Travel Survey (NTS) to collect travel information from Canadian travellers. This new survey will replace the Travel Survey of Residents of Canada and the Canadian resident component of the International Travel Survey. The NTS will take advantage of Statistics Canada’s common sampling frames and common processing tools while maximizing the use of administrative data. This paper discusses the potential uses of administrative data such as Passport Canada files, Canada Border Service Agency files and Canada Revenue Agency files, to increase the efficiency of the NTS sample design.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014758
    Description:

    "Several Canadian jurisdictions including Ontario are using patient-based healthcare data in their funding models. These initiatives can influence the quality of this data both positively and negatively as people tend to pay more attention to the data and its quality when financial decisions are based upon it. Ontario’s funding formula uses data from several national databases housed at the Canadian Institute for Health Information (CIHI). These databases provide information on patient activity and clinical status across the continuum of care. As funding models may influence coding behaviour, CIHI is collaborating with the Ontario Ministry of Health and Long-Term Care to assess and monitor the quality of this data. CIHI is using data mining software and modelling techniques (that are often associated with “big data”) to identify data anomalies across multiple factors. The models identify what the “typical” clinical coding patterns are for key patient groups (for example, patients seen in special care units or discharged to home care), so that outliers can be identified, where patients do not fit the expected pattern. A key component of the modelling is segmenting the data based on patient, provider and hospital characteristics to take into account key differences in the delivery of health care and patient populations across the province. CIHI’s analysis identified several hospitals with coding practices that appear to be changing or significantly different from their peer group. Further investigation is required to understand why these differences exist and to develop appropriate strategies to mitigate variations. "

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014741
    Description:

    Statistics Canada’s mandate includes producing statistical data to shed light on current business issues. The linking of business records is an important aspect of the development, production, evaluation and analysis of these statistical data. As record linkage can intrude on one’s privacy, Statistics Canada uses it only when the public good is clear and outweighs the intrusion. Record linkage is experiencing a revival triggered by a greater use of administrative data in many statistical programs. There are many challenges to business record linkage. For example, many administrative files do not have common identifiers, information is recorded in non-standardized formats, information contains typographical errors, administrative data files are usually large, and the evaluation of multiple record pairings makes exhaustive comparison impractical and sometimes impossible. Because of the importance of record linkage and the challenges associated with it, Statistics Canada has been developing a record linkage standard to help users optimize their business record linkage process. For example, this process includes building on a record linkage blocking strategy that reduces the number of record pairs to compare and match, making use of Statistics Canada’s internal software to conduct deterministic and probabilistic matching, and creating standard business name and address fields on Statistics Canada’s Business Register. This article gives an overview of the business record linkage methodology and looks at various economic projects that use record linkage at Statistics Canada, including projects in the National Accounts, International Trade, Agriculture and the Business Register.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014723
    Description:

    The U.S. Census Bureau is researching uses of administrative records in survey and decennial operations in order to reduce costs and respondent burden while preserving data quality. One potential use of administrative records is to utilize the data when race and Hispanic origin responses are missing. When federal and third party administrative records are compiled, race and Hispanic origin responses are not always the same for an individual across different administrative records sources. We explore different sets of business rules used to assign one race and one Hispanic response when these responses are discrepant across sources. We also describe the characteristics of individuals with matching, non-matching, and missing race and Hispanic origin data across several demographic, household, and contextual variables. We find that minorities, especially Hispanics, are more likely to have non-matching Hispanic origin and race responses in administrative records than in the 2010 Census. Hispanics are less likely to have missing Hispanic origin data but more likely to have missing race data in administrative records. Non-Hispanic Asians and non-Hispanic Pacific Islanders are more likely to have missing race and Hispanic origin data in administrative records. Younger individuals, renters, individuals living in households with two or more people, individuals who responded to the census in the nonresponse follow-up operation, and individuals residing in urban areas are more likely to have non-matching race and Hispanic origin responses.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014751
    Description:

    Practically all major retailers use scanners to record the information on their transactions with clients (consumers). These data normally include the product code, a brief description, the price and the quantity sold. This is an extremely relevant data source for statistical programs such as Statistics Canada’s Consumer Price Index (CPI), one of Canada’s most important economic indicators. Using scanner data could improve the quality of the CPI by increasing the number of prices used in calculations, expanding geographic coverage and including the quantities sold, among other things, while lowering data collection costs. However, using these data presents many challenges. An examination of scanner data from a first retailer revealed a high rate of change in product identification codes over a one-year period. The effects of these changes pose challenges from a product classification and estimate quality perspective. This article focuses on the issues associated with acquiring, classifying and examining these data to assess their quality for use in the CPI.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014752
    Description:

    This paper presents a new price index method for processing electronic transaction (scanner) data. Price indices are calculated as a ratio of a turnover index and a weighted quantity index. Product weights of quantities sold are computed from the deflated prices of each month in the current publication year. New products can be timely incorporated without price imputations, so that all transactions can be processed. Product weights are monthly updated and are used to calculate direct indices with respect to a fixed base month. Price indices are free of chain drift by this construction. The results are robust under departures from the methodological choices. The method is part of the Dutch CPI since January 2016, when it was first applied to mobile phones.

    Release date: 2016-03-24
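
    The core identity above, a price index computed as a turnover index divided by a weighted quantity index, can be shown in toy form. With base-period-price weights the quantity index below is of the Laspeyres type, which makes the resulting price index Paasche-type; the deflated, monthly-updated weights of the published method are not reproduced, and the products and figures are invented:

        # (price, quantity) per product in the base and current months.
        base = {"phoneA": (400.0, 100), "phoneB": (250.0, 300)}
        curr = {"phoneA": (380.0, 120), "phoneB": (260.0, 280)}

        base_turnover = sum(p * q for p, q in base.values())
        turnover_index = sum(p * q for p, q in curr.values()) / base_turnover
        quantity_index = sum(base[k][0] * curr[k][1] for k in base) / base_turnover
        print(100 * turnover_index / quantity_index)  # price index, base = 100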

  • Technical products: 11-522-X201700014738
    Description:

    In the standard design approach to missing observations, the construction of weight classes and calibration are used to adjust the design weights for the respondents in the sample. Here we use these adjusted weights to define a Dirichlet distribution which can be used to make inferences about the population. Examples show that the resulting procedures have better performance properties than the standard methods when the population is skewed.

    Release date: 2016-03-24
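
    One concrete reading of the entry above: treat the adjusted weights as the parameters of a Dirichlet distribution and simulate weighted means, in the style of a Bayesian bootstrap. The paper's exact construction may scale the weights differently, and the data in this Python sketch are invented:

        import random

        rng = random.Random(1)
        y = [3.2, 4.1, 5.6, 2.9, 4.8]  # respondent values
        w = [1.5, 1.0, 2.0, 1.5, 1.0]  # nonresponse-adjusted weights

        # Dirichlet(w) draws via normalized Gamma variates; each draw yields
        # a weighted mean, giving a posterior sample for the population mean.
        draws = []
        for _ in range(5_000):
            g = [rng.gammavariate(wi, 1.0) for wi in w]
            draws.append(sum(gi * yi for gi, yi in zip(g, y)) / sum(g))
        draws.sort()
        print(draws[2_500], draws[125], draws[-126])  # median and ~95% interval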

  • Technical products: 11-522-X201700014755
    Description:

    The National Children’s Study Vanguard Study was a pilot epidemiological cohort study of children and their parents. Measures were to be taken from pre-pregnancy until adulthood. The use of extant data was planned to supplement direct data collection from the respondents. Our paper outlines a strategy for cataloging and evaluating extant data sources for use with large-scale longitudinal studies. Through our review we selected five evaluation factors to guide researchers through available data sources: 1) relevance, 2) timeliness, 3) spatiality, 4) accessibility and 5) accuracy.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014735
    Description:

    Microdata dissemination normally requires that data reduction and modification methods be applied, and the degree to which these methods are applied depends on the control methods that will be required to access and use the data. An approach that is in some circumstances more suitable for accessing data for statistical purposes is secure computation, which involves computing analytic functions on encrypted data without the need to decrypt the underlying source data to run a statistical analysis. This approach also allows multiple sites to contribute data while providing strong privacy guarantees. This way the data can be pooled and contributors can compute analytic functions without any party revealing its inputs. We explain how secure computation can be applied in practical contexts, with some theoretical results and real healthcare examples.

    Release date: 2016-03-24
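
    Production secure computation relies on cryptographic protocols; the Python toy below illustrates only the underlying additive-secret-sharing idea behind a secure sum, where each site splits its private value into random shares and only the total is recoverable:

        import random

        rng = random.Random(7)
        M = 2**61 - 1  # arithmetic modulo a large prime

        def shares(value, n):
            """Split value into n random shares that sum to value mod M."""
            parts = [rng.randrange(M) for _ in range(n - 1)]
            parts.append((value - sum(parts)) % M)
            return parts

        site_values = [120, 340, 95]  # private inputs, one per site
        all_shares = [shares(v, 3) for v in site_values]
        partials = [sum(s[j] for s in all_shares) % M for j in range(3)]
        print(sum(partials) % M)  # 555: the sum, with no single value revealed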

  • Technical products: 11-522-X201700014745
    Description:

    In the design of surveys, a number of parameters, such as contact propensities, participation propensities and costs per sample unit, play a decisive role. In ongoing surveys, these survey design parameters are usually estimated from previous experience and updated gradually with new experience. In new surveys, these parameters are estimated from expert opinion and experience with similar surveys. Although survey institutes have a fair amount of expertise and experience, the postulation, estimation and updating of survey design parameters is rarely done in a systematic way. This paper presents a Bayesian framework to include and update prior knowledge and expert opinion about the parameters. This framework is set in the context of adaptive survey designs in which different population units may receive different treatment given quality and cost objectives. For this type of survey, the accuracy of design parameters becomes even more crucial to effective design decisions. The framework allows for a Bayesian analysis of the performance of a survey during data collection and in between waves of a survey. We demonstrate the Bayesian analysis using a realistic simulation study.

    Release date: 2016-03-24
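
    For a single design parameter such as a contact propensity, the Bayesian updating described above can be as simple as a conjugate Beta-Binomial step. The prior and counts in this sketch are invented, and the paper's framework covers much more (adaptive treatments, costs, multiple parameters):

        # Prior Beta(a, b) encodes expert opinion (about a 60% contact rate).
        a, b = 12.0, 8.0
        contacts, attempts = 143, 260  # hypothetical current-wave figures

        a_post, b_post = a + contacts, b + (attempts - contacts)
        print(f"posterior mean contact propensity: "
              f"{a_post / (a_post + b_post):.3f}")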

  • Technical products: 11-522-X201700014706
    Description:

    Over the last decade, Statistics Canada’s Producer Prices Division has expanded its service producer price indexes program and continued to improve its goods and construction producer price indexes program. While the majority of price indexes are based on traditional survey methods, efforts were made to increase the use of administrative data and alternative data sources in order to reduce burden on our respondents. This paper focuses mainly on producer price programs, but also provides information on the growing importance of alternative data sources at Statistics Canada. In addition, it presents the operational challenges and risks that statistical offices could face when relying more and more on third-party outputs. Finally, it presents the tools being developed to integrate alternative data while collecting metadata.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014722
    Description:

    The U.S. Census Bureau is researching ways to incorporate administrative data in decennial census and survey operations. Critical to this work is an understanding of the coverage of the population by administrative records. Using federal and third party administrative data linked to the American Community Survey (ACS), we evaluate the extent to which administrative records provide data on foreign-born individuals in the ACS and employ multinomial logistic regression techniques to evaluate characteristics of those who are in administrative records relative to those who are not. We find that overall, administrative records provide high coverage of foreign-born individuals in our sample for whom a match can be determined. The odds of being in administrative records are found to be tied to the processes of immigrant assimilation – naturalization, higher English proficiency, educational attainment, and full-time employment are associated with greater odds of being in administrative records. These findings suggest that as immigrants adapt and integrate into U.S. society, they are more likely to be involved in government and commercial processes and programs for which we are including data. We further explore administrative records coverage for the two largest race/ethnic groups in our sample – Hispanic and non-Hispanic single-race Asian foreign born, finding again that characteristics related to assimilation are associated with administrative records coverage for both groups. However, we observe that neighborhood context impacts Hispanics and Asians differently.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014733
    Description:

    The social value of data collections is dramatically enhanced by the broad dissemination of research files and the resulting increase in scientific productivity. Currently, most studies are designed with a focus on collecting information that is analytically useful and accurate, with little forethought as to how it will be shared. Both the literature and practice also presume that disclosure analysis will take place after data collection. But to produce public-use data of the highest analytical utility for the largest user group, disclosure risk must be considered at the beginning of the research process. Drawing upon economic and statistical decision-theoretic frameworks and survey methodology research, this study seeks to enhance the scientific productivity of shared research data by describing how disclosure risk can be addressed in the earliest stages of research with the formulation of "safe designs" and "disclosure simulations", where an applied statistical approach has been taken in: (1) developing and validating models that predict the composition of survey data under different sampling designs; (2) selecting and/or developing measures and methods used in the assessments of disclosure risk, analytical utility, and disclosure survey costs that are best suited for evaluating sampling and database designs; and (3) conducting simulations to gather estimates of risk, utility, and cost for studies with a wide range of sampling and database design characteristics.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014728
    Description:

    Record linkage joins together two or more sources. The product of record linkage is a file with one record per individual containing all the information about the individual from the multiple files. The problem is difficult when a unique identification key is not available, there are errors in some variables, some data are missing, and files are large. Probabilistic record linkage computes a probability that records on different files pertain to a single individual. Some true links are given low probabilities of matching, whereas some non-links are given high probabilities. Errors in linkage designations can cause bias in analyses based on the composite database. The SEER cancer registries contain information on breast cancer cases in their registry areas. A diagnostic test based on the Oncotype DX assay, performed by Genomic Health, Inc. (GHI), is often performed for certain types of breast cancers. Record linkage using personally identifiable information was conducted to associate Oncotype DX assay results with SEER cancer registry information. The software Link Plus was used to generate a score describing the similarity of records and to identify the apparent best match of SEER cancer registry individuals to the GHI database. Clerical review was used to check samples of likely matches, possible matches, and unlikely matches. Models are proposed for jointly modeling the record linkage process and subsequent statistical analysis in this and other applications.

    Release date: 2016-03-24
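
    Similarity scores of the kind mentioned above are, in the Fellegi-Sunter tradition, sums of log agreement weights across compared fields. The m/u probabilities in this Python sketch are made up, and it is not Link Plus's actual implementation:

        import math

        # field: (m = P(agree | true match), u = P(agree | non-match))
        fields = {
            "last_name":   (0.95, 0.010),
            "birth_year":  (0.98, 0.050),
            "postal_code": (0.90, 0.002),
        }

        def score(agreements):
            """Sum log2 agreement/disagreement weights over compared fields."""
            s = 0.0
            for f, agree in agreements.items():
                m, u = fields[f]
                s += math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))
            return s

        print(score({"last_name": True, "birth_year": True, "postal_code": False}))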

  • Technical products: 11-522-X201700014713
    Description:

    Big data is a term that means different things to different people. To some, it means datasets so large that our traditional processing and analytic systems can no longer accommodate them. To others, it simply means taking advantage of existing datasets of all sizes and finding ways to merge them with the goal of generating new insights. The former view poses a number of important challenges to traditional market, opinion, and social research. In either case, there are implications for the future of surveys that are only beginning to be explored.

    Release date: 2016-03-24
