Statistics by subject – Statistical methods

All (657) (25 of 657 results)

  • Technical products: 84-538-X
    Description:

    This document presents the methodology underlying the production of the life tables for Canada, provinces and territories, from reference period 1980/1982 and onward.

    Release date: 2017-11-16

  • Technical products: 12-206-X
    Description:

    This report summarizes the achievements of the program sponsored by the three methodology divisions of Statistics Canada. This program covers research and development activities in statistical methods with potentially broad application in the Agency's survey programs, which would not otherwise have been carried out during the provision of methodology services to those survey programs. The achievements also include tasks that provided client support in the application of past successful developments in order to promote the utilization of the results of research and development work.

    Release date: 2017-11-03

  • Technical products: 12-586-X
    Description:

    The Quality Assurance Framework (QAF) serves as the highest-level governance tool for quality management at Statistics Canada. The QAF gives an overview of the quality management and risk mitigation strategies used by the Agency’s program areas. The QAF is used in conjunction with Statistics Canada management practices, such as those described in the Quality Guidelines.

    Release date: 2017-04-21

  • Technical products: 91-621-X2017001
    Release date: 2017-01-25

  • Technical products: 75F0002M
    Description:

    This series provides detailed documentation on income developments, including survey design issues, data quality evaluation and exploratory research.

    Release date: 2016-07-08

  • Technical products: 75F0002M2016003
    Description:

    Periodically, income statistics are updated to reflect the most recent population estimates from the Census. Accordingly, with the release of the 2014 data from the Canadian Income Survey, Statistics Canada has revised estimates for 2006 to 2013 using new population totals from the 2011 Census. This paper provides unrevised estimates alongside revised estimates for key income series, indicating where the revisions were significant.

    Release date: 2016-07-08

  • Technical products: 11-522-X
    Description:

    Since 1984, an annual international symposium on methodological issues has been sponsored by Statistics Canada. Proceedings have been available since 1987.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014718
    Description:

    This study assessed whether starting participation in Employment Assistance Services (EAS) earlier after initiating an Employment Insurance (EI) claim leads to better impacts for unemployed individuals than participating later during the EI benefit period. As in Sianesi (2004) and Hujer and Thomsen (2010), the analysis relied on a stratified propensity score matching approach conditional on the discretized duration of unemployment until the program starts. The results showed that individuals who participated in EAS within the first four weeks after initiating an EI claim had the best impacts on earnings and incidence of employment while also experiencing reduced use of EI starting the second year post-program.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014750
    Description:

    The Educational Master File (EMF) system was built to allow the analysis of educational programs in Canada. At the core of the system are administrative files that record all of the registrations to post-secondary and apprenticeship programs in Canada. New administrative files become available on an annual basis. Once a new file becomes available, a first round of processing is performed, which includes linkage to other administrative records. This linkage yields information that can improve the quality of the file, allows further linkages to other data describing labour market outcomes, and is the first step in adding the file to the EMF. Once part of the EMF, information from the file can be included in cross-sectional and longitudinal projects, to study academic pathways and labour market outcomes after graduation. The EMF currently consists of data from 2005 to 2013, but it evolves as new data become available. This paper gives an overview of the mechanisms used to build the EMF, with a focus on the structure of the final system and some of its analytical potential.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014727
    Description:

    "Probability samples of near-universal frames of households and persons, administered standardized measures, yielding long multivariate data records, and analyzed with statistical procedures reflecting the design – these have been the cornerstones of the empirical social sciences for 75 years. That measurement structure has given the developed world almost all of what we know about our societies and their economies. The stored survey data form a unique historical record. We live now in a different data world than that in which the leadership of statistical agencies and the social sciences were raised. High-dimensional data are ubiquitously being produced from Internet search activities, mobile Internet devices, social media, sensors, retail store scanners, and other devices. Some estimate that these data sources are increasing in size at the rate of 40% per year. Together their sizes swamp that of the probability-based sample surveys. Further, the state of sample surveys in the developed world is not healthy. Falling rates of survey participation are linked with ever-inflated costs of data collection. Despite growing needs for information, the creation of new survey vehicles is hampered by strained budgets for official statistical agencies and social science funders. These combined observations are unprecedented challenges for the basic paradigm of inference in the social and economic sciences. This paper discusses alternative ways forward at this moment in history."

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014734
    Description:

    Data protection and privacy are key challenges that need to be tackled with high priority in order to enable the use of Big Data in the production of Official Statistics. This was emphasized in 2013 by the Directors of National Statistical Institutes (NSIs) of the European Statistical System Committee (ESSC) in the Scheveningen Memorandum. The ESSC requested Eurostat and the NSIs to elaborate an action plan with a roadmap for following up the implementation of the Memorandum. At the Riga meeting on September 26, 2014, the ESSC endorsed the Big Data Action Plan and Roadmap 1.0 (BDAR) presented by the Eurostat Task Force on Big Data (TFBD) and agreed to integrate it into the ESS Vision 2020 portfolio. Eurostat also collaborates in this field with external partners such as the United Nations Economic Commission for Europe (UNECE). The big data project of the UNECE High-Level Group is an international project on the role of big data in the modernization of statistical production. It comprised four ‘task teams’ addressing different aspects of Big Data issues relevant for official statistics: Privacy, Partnerships, Sandbox, and Quality. The Privacy Task Team finished its work in 2014 and gave an overview of the existing tools for risk management regarding privacy issues, described how risk of identification relates to Big Data characteristics and drafted recommendations for National Statistical Offices (NSOs). It mainly concluded that extensions to existing frameworks, including use of new technologies, were needed in order to deal with privacy risks related to the use of Big Data. The BDAR builds on the work achieved by the UNECE task teams. Specifically, it recognizes that a number of big data sources contain sensitive information, that their use for official statistics may induce negative perceptions with the general public and other stakeholders and that this risk should be mitigated in the short to medium term. It proposes to launch multiple actions, e.g., an adequate review of the ethical principles governing the roles and activities of the NSIs and a strong communication strategy. The paper presents the different actions undertaken within the ESS and in collaboration with UNECE, as well as potential technical and legal solutions to be put in place in order to address the data protection and privacy risks in the use of Big Data for Official Statistics.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014747
    Description:

    The Longitudinal Immigration Database (IMDB) combines the Immigrant Landing File (ILF) with annual tax files. This record linkage is performed using a tax filer database. The ILF includes all immigrants who have landed in Canada since 1980. In looking to enhance the IMDB, the possibility of adding temporary residents (TR) and immigrants who landed between 1952 and 1979 (PRE80) was studied. Adding this information would give a more complete picture of the immigrant population living in Canada. To integrate the TR and PRE80 files into the IMDB, record linkages between these two files and the tax filer database were performed. This exercise was challenging in part due to the presence of duplicates in the files and conflicting links between the different record linkages.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014709
    Description:

    Traffic congestion is not limited to large cities but is also becoming a problem in medium-size cities and on roads going through cities. Among a large variety of congestion measures, six were selected for their ease of aggregation and their capacity to use the instantaneous information from the CVUS-light component in 2014. Of the selected measures, the Index of Congestion is potentially the only one that is not biased. This measure is used to illustrate different dimensions of congestion on the road network.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014711
    Description:

    After the 2010 Census, the U.S. Census Bureau conducted two separate research projects matching survey data to databases. One study matched to the third-party database Accurint, and the other matched to U.S. Postal Service National Change of Address (NCOA) files. In both projects, we evaluated response error in reported move dates by comparing the self-reported move date to records in the database. We encountered similar challenges in the two projects. This paper discusses our experience using “big data” as a comparison source for survey data and our lessons learned for future projects similar to the ones we conducted.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014757
    Description:

    The Unified Brazilian Health System (SUS) was created in 1988 and, with the aim of organizing the health information systems and databases already in use, a unified databank (DataSUS) was created in 1991. DataSUS files are freely available via Internet. Access and visualization of such data is done through a limited number of customized tables and simple diagrams, which do not entirely meet the needs of health managers and other users for a flexible and easy-to-use tool that can tackle different aspects of health which are relevant to their purposes of knowledge-seeking and decision-making. We propose the interactive monthly generation of synthetic epidemiological reports, which are not only easily accessible but also easy to interpret and understand. Emphasis is put on data visualization through more informative diagrams and maps.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014731
    Description:

    Our study describes various factors that are of concern when evaluating disclosure risk of contextualized microdata and some of the empirical steps that are involved in their assessment. Utilizing synthetic sets of survey respondents, we illustrate how different postulates shape the assessment of risk when considering: (1) estimated probabilities that unidentified geographic areas are represented within a survey; (2) the number of people in the population who share the same personal and contextual identifiers as a respondent; and (3) the anticipated amount of coverage error in census population counts and extant files that provide identifying information (like names and addresses).

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014745
    Description:

    In the design of surveys a number of parameters like contact propensities, participation propensities and costs per sample unit play a decisive role. In on-going surveys, these survey design parameters are usually estimated from previous experience and updated gradually with new experience. In new surveys, these parameters are estimated from expert opinion and experience with similar surveys. Although survey institutes have a fair expertise and experience, the postulation, estimation and updating of survey design parameters is rarely done in a systematic way. This paper presents a Bayesian framework to include and update prior knowledge and expert opinion about the parameters. This framework is set in the context of adaptive survey designs in which different population units may receive different treatment given quality and cost objectives. For this type of survey, the accuracy of design parameters becomes even more crucial to effective design decisions. The framework allows for a Bayesian analysis of the performance of a survey during data collection and in between waves of a survey. We demonstrate the Bayesian analysis using a realistic simulation study.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014722
    Description:

    The U.S. Census Bureau is researching ways to incorporate administrative data in decennial census and survey operations. Critical to this work is an understanding of the coverage of the population by administrative records. Using federal and third party administrative data linked to the American Community Survey (ACS), we evaluate the extent to which administrative records provide data on foreign-born individuals in the ACS and employ multinomial logistic regression techniques to evaluate characteristics of those who are in administrative records relative to those who are not. We find that overall, administrative records provide high coverage of foreign-born individuals in our sample for whom a match can be determined. The odds of being in administrative records are found to be tied to the processes of immigrant assimilation – naturalization, higher English proficiency, educational attainment, and full-time employment are associated with greater odds of being in administrative records. These findings suggest that as immigrants adapt and integrate into U.S. society, they are more likely to be involved in government and commercial processes and programs for which we are including data. We further explore administrative records coverage for the two largest race/ethnic groups in our sample – Hispanic and non-Hispanic single-race Asian foreign born, finding again that characteristics related to assimilation are associated with administrative records coverage for both groups. However, we observe that neighborhood context impacts Hispanics and Asians differently.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014735
    Description:

    Microdata dissemination normally requires that data reduction and modification methods be applied, and the degree to which these methods are applied depends on the control methods that will be required to access and use the data. An approach that is in some circumstances more suitable for accessing data for statistical purposes is secure computation, which involves computing analytic functions on encrypted data without the need to decrypt the underlying source data to run a statistical analysis. This approach also allows multiple sites to contribute data while providing strong privacy guarantees. This way the data can be pooled and contributors can compute analytic functions without either party knowing the other's inputs. We explain how secure computation can be applied in practical contexts, with some theoretical results and real healthcare examples.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014726
    Description:

    Internal migration is one of the components of population growth estimated at Statistics Canada. It is estimated by comparing individuals’ addresses at the beginning and end of a given period. The Canada Child Tax Benefit and T1 Family File are the primary data sources used. Address quality and coverage of more mobile subpopulations are crucial to producing high-quality estimates. The purpose of this article is to present the results of evaluations of these elements using access to more tax data sources at Statistics Canada.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014751
    Description:

    Practically all major retailers use scanners to record the information on their transactions with clients (consumers). These data normally include the product code, a brief description, the price and the quantity sold. This is an extremely relevant data source for statistical programs such as Statistics Canada’s Consumer Price Index (CPI), one of Canada’s most important economic indicators. Using scanner data could improve the quality of the CPI by increasing the number of prices used in calculations, expanding geographic coverage and including the quantities sold, among other things, while lowering data collection costs. However, using these data presents many challenges. An examination of scanner data from a first retailer revealed a high rate of change in product identification codes over a one-year period. The effects of these changes pose challenges from a product classification and estimate quality perspective. This article focuses on the issues associated with acquiring, classifying and examining these data to assess their quality for use in the CPI.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014716
    Description:

    Administrative data, depending on its source and original purpose, can be considered a more reliable source of information than survey-collected data. It does not require a respondent to be present and understand question wording, and it is not limited by the respondent’s ability to recall events retrospectively. This paper compares selected survey data, such as demographic variables, from the Longitudinal and International Study of Adults (LISA) to various administrative sources for which LISA has linkage agreements in place. The agreement between data sources, and some factors that might affect it, are analyzed for various aspects of the survey.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014728
    Description:

    Record linkage joins together two or more sources. The product of record linkage is a file with one record per individual containing all the information about the individual from the multiple files. The problem is difficult when a unique identification key is not available, there are errors in some variables, some data are missing, and files are large. Probabilistic record linkage computes a probability that records from different files pertain to a single individual. Some true links are given low probabilities of matching, whereas some non-links are given high probabilities. Errors in linkage designations can cause bias in analyses based on the composite database. The SEER cancer registries contain information on breast cancer cases in their registry areas. A diagnostic test based on the Oncotype DX assay, performed by Genomic Health, Inc. (GHI), is often performed for certain types of breast cancers. Record linkage using personally identifiable information was conducted to associate Oncotype DX assay results with SEER cancer registry information. The software Link Plus was used to generate a score describing the similarity of records and to identify the apparent best match of SEER cancer registry individuals to the GHI database. Clerical review was used to check samples of likely matches, possible matches, and unlikely matches. Models are proposed for jointly modeling the record linkage process and subsequent statistical analysis in this and other applications.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014704
    Description:

    We identify several research areas and topics for methodological research in official statistics. We argue why these are important, and why these are the most important ones for official statistics. We describe the main topics in these research areas and sketch what seem to be the most promising ways to address them. Here we focus on: (i) Quality of National accounts, in particular the rate of growth of GNI; (ii) Big data, in particular how to create representative estimates and how to make the most of big data when this is difficult or impossible. We also touch upon: (i) Increasing timeliness of preliminary and final statistical estimates; (ii) Statistical analysis, in particular of complex and coherent phenomena. These topics are elements in the present Strategic Methodological Research Program that has recently been adopted at Statistics Netherlands.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014740
    Description:

    In this paper, we discuss the impacts of Employment Benefit and Support Measures delivered in Canada under the Labour Market Development Agreements (LMDA). We use rich linked longitudinal administrative data covering all LMDA participants from 2002 to 2005. We apply propensity score matching as in Blundell et al. (2002), Gerfin and Lechner (2002), and Sianesi (2004), and produce the national incremental impact estimates using difference-in-differences and the kernel matching estimator (Heckman and Smith, 1999). The findings suggest that both Employment Assistance Services and employment benefits such as Skills Development and Targeted Wage Subsidies had positive effects on earnings and employment.

    Release date: 2016-03-24

Reference (657)

Reference (657) (25 of 657 results)

  • Technical products: 84-538-X
    Description:

    This document presents the methodology underlying the production of the life tables for Canada, provinces and territories, from reference period 1980/1982 and onward.

    Release date: 2017-11-16

  • Technical products: 12-206-X
    Description:

    This report summarizes the achievements program sponsored by the three methodology divisions of Statistics Canada. This program covers research and development activities in statistical methods with potentially broad application in the Agency's survey programs, which would not otherwise have been carried out during the provision of methodology services to those survey programs. They also include tasks that provided client support in the application of past successful developments in order to promote the utilization of the results of research and development work.

    Release date: 2017-11-03

  • Technical products: 12-586-X
    Description:

    The Quality Assurance Framework (QAF) serves as the highest-level governance tool for quality management at Statistics Canada. The QAF gives an overview of the quality management and risk mitigation strategies used by the Agency’s program areas. The QAF is used in conjunction with Statistics Canada management practices, such as those described in the Quality Guidelines.

    Release date: 2017-04-21

  • Technical products: 91-621-X2017001
    Release date: 2017-01-25

  • Technical products: 75F0002M
    Description:

    This series provides detailed documentation on income developments, including survey design issues, data quality evaluation and exploratory research.

    Release date: 2016-07-08

  • Technical products: 75F0002M2016003
    Description:

    Periodically, income statistics are updated to reflect the most recent population estimates from the Census. Accordingly, with the release of the 2014 data from the Canadian Income Survey, Statistics Canada has revised estimates for 2006 to 2013 using new population totals from the 2011 Census. This paper provides unrevised estimates alongside revised estimates for key income series, indicating where the revisions were significant.

    Release date: 2016-07-08

  • Technical products: 11-522-X
    Description:

    Since 1984, an annual international symposium on methodological issues has been sponsored by Statistics Canada. Proceedings have been available since 1987.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014718
    Description:

    This study assessed whether starting participation in Employment Assistance Services (EAS) earlier after initiating an Employment Insurance (EI) claim leads to better impacts for unemployed individuals than participating later during the EI benefit period. As in Sianesi (2004) and Hujer and Thomsen (2010), the analysis relied on a stratified propensity score matching approach conditional on the discretized duration of unemployment until the program starts. The results showed that individuals who participated in EAS within the first four weeks after initiating an EI claim had the best impacts on earnings and incidence of employment while also experiencing reduced use of EI starting the second year post-program.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014750
    Description:

    The Educational Master File (EMF) system was built to allow the analysis of educational programs in Canada. At the core of the system are administrative files that record all of the registrations to post-secondary and apprenticeship programs in Canada. New administrative files become available on an annual basis. Once a new file becomes available, a first round of processing is performed, which includes linkage to other administrative records. This linkage yields information that can improve the quality of the file, it allows further linkages to other data describing labour market outcomes, and it’s the first step in adding the file to the EMF. Once part of the EMF, information from the file can be included in cross-sectional and longitudinal projects, to study academic pathways and labour market outcomes after graduation. The EMF currently consists of data from 2005 to 2013, but it evolves as new data become available. This paper gives an overview of the mechanisms used to build the EMF, with focus on the structure of the final system and some of its analytical potential.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014727
    Description:

    "Probability samples of near-universal frames of households and persons, administered standardized measures, yielding long multivariate data records, and analyzed with statistical procedures reflecting the design – these have been the cornerstones of the empirical social sciences for 75 years. That measurement structure have given the developed world almost all of what we know about our societies and their economies. The stored survey data form a unique historical record. We live now in a different data world than that in which the leadership of statistical agencies and the social sciences were raised. High-dimensional data are ubiquitously being produced from Internet search activities, mobile Internet devices, social media, sensors, retail store scanners, and other devices. Some estimate that these data sources are increasing in size at the rate of 40% per year. Together their sizes swamp that of the probability-based sample surveys. Further, the state of sample surveys in the developed world is not healthy. Falling rates of survey participation are linked with ever-inflated costs of data collection. Despite growing needs for information, the creation of new survey vehicles is hampered by strained budgets for official statistical agencies and social science funders. These combined observations are unprecedented challenges for the basic paradigm of inference in the social and economic sciences. This paper discusses alternative ways forward at this moment in history. "

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014734
    Description:

    Data protection and privacy are key challenges that need to be tackled with high priority in order to enable the use of Big Data in the production of Official Statistics. This was emphasized in 2013 by the Directors of National Statistical Insitutes (NSIs) of the European Statistical System Committee (ESSC) in the Scheveningen Memorandum. The ESSC requested Eurostat and the NSIs to elaborate an action plan with a roadmap for following up the implementation of the Memorandum. At the Riga meeting on September 26, 2014, the ESSC endorsed the Big Data Action Plan and Roadmap 1.0 (BDAR) presented by the Eurostat Task Force on Big Data (TFBD) and agreed to integrate it into the ESS Vision 2020 portfolio. Eurostat also collaborates in this field with external partners such as the United Nations Economic Commission for Europe (UNECE). The big data project of the UNECE High-Level Group is an international project on the role of big data in the modernization of statistical production. It comprised four ‘task teams’ addressing different aspects of Big Data issues relevant for official statistics: Privacy, Partnerships, Sandbox, and Quality. The Privacy Task Team finished its work in 2014 and gave an overview of the existing tools for risk management regarding privacy issues, described how risk of identification relates to Big Data characteristics and drafted recommendations for National Statistical Offices (NSOs). It mainly concluded that extensions to existing frameworks, including use of new technologies were needed in order to deal with privacy risks related to the use of Big Data. The BDAR builds on the work achieved by the UNECE task teams. Specifically, it recognizes that a number of big data sources contain sensitive information, that their use for official statistics may induce negative perceptions with the general public and other stakeholders and that this risk should be mitigated in the short to medium term. 
It proposes to launch multiple actions like e.g., an adequate review on ethical principles governing the roles and activities of the NSIs and a strong communication strategy. The paper presents the different actions undertaken within the ESS and in collaboration with UNECE, as well as potential technical and legal solutions to be put in place in order to address the data protection and privacy risks in the use of Big Data for Official Statistics.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014747
    Description:

    The Longitudinal Immigration Database (IMDB) combines the Immigrant Landing File (ILF) with annual tax files. This record linkage is performed using a tax filer database. The ILF includes all immigrants who have landed in Canada since 1980. In looking to enhance the IMDB, the possibility of adding temporary residents (TR) and immigrants who landed between 1952 and 1979 (PRE80) was studied. Adding this information would give a more complete picture of the immigrant population living in Canada. To integrate the TR and PRE80 files into the IMDB, record linkages between these two files and the tax filer database were performed. This exercise was challenging in part due to the presence of duplicates in the files and conflicting links between the different record linkages.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014709
    Description:

    Traffic congestion is not limited to large cities but is also becoming a problem in medium-size cities and on roads going through cities. Among a large variety of congestion measures, six were selected for their ease of aggregation and their capacity to use the instantaneous information from the CVUS-light component in 2014. Of the selected measures, the Index of Congestion is potentially the only one that is not biased. This measure is used to illustrate different dimensions of congestion on the road network.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014711
    Description:

    After the 2010 Census, the U.S. Census Bureau conducted two separate research projects matching survey data to databases. One study matched to the third-party database Accurint, and the other matched to U.S. Postal Service National Change of Address (NCOA) files. In both projects, we evaluated response error in reported move dates by comparing the self-reported move date to records in the database. We encountered similar challenges in the two projects. This paper discusses our experience using “big data” as a comparison source for survey data and our lessons learned for future projects similar to the ones we conducted.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014757
    Description:

    The Unified Brazilian Health System (SUS) was created in 1988 and, with the aim of organizing the health information systems and databases already in use, a unified databank (DataSUS) was created in 1991. DataSUS files are freely available via the Internet. Access to and visualization of such data are provided through a limited number of customized tables and simple diagrams, which do not entirely meet the needs of health managers and other users for a flexible and easy-to-use tool that can tackle the different aspects of health that are relevant to their knowledge-seeking and decision-making. We propose the interactive monthly generation of synthetic epidemiological reports, which are not only easily accessible but also easy to interpret and understand. Emphasis is put on data visualization through more informative diagrams and maps.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014731
    Description:

    Our study describes various factors that are of concern when evaluating disclosure risk of contextualized microdata and some of the empirical steps that are involved in their assessment. Utilizing synthetic sets of survey respondents, we illustrate how different postulates shape the assessment of risk when considering: (1) estimated probabilities that unidentified geographic areas are represented within a survey; (2) the number of people in the population who share the same personal and contextual identifiers as a respondent; and (3) the anticipated amount of coverage error in census population counts and extant files that provide identifying information (like names and addresses).

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014745
    Description:

    In the design of surveys a number of parameters like contact propensities, participation propensities and costs per sample unit play a decisive role. In on-going surveys, these survey design parameters are usually estimated from previous experience and updated gradually with new experience. In new surveys, these parameters are estimated from expert opinion and experience with similar surveys. Although survey institutes have a fair expertise and experience, the postulation, estimation and updating of survey design parameters is rarely done in a systematic way. This paper presents a Bayesian framework to include and update prior knowledge and expert opinion about the parameters. This framework is set in the context of adaptive survey designs in which different population units may receive different treatment given quality and cost objectives. For this type of survey, the accuracy of design parameters becomes even more crucial to effective design decisions. The framework allows for a Bayesian analysis of the performance of a survey during data collection and in between waves of a survey. We demonstrate the Bayesian analysis using a realistic simulation study.
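
    As a rough illustration of updating prior knowledge about a survey design parameter, the sketch below applies a conjugate Beta-Binomial update to a contact propensity. This is a minimal sketch and not the framework of the paper: the `BetaPrior` class, the prior values and the fieldwork counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) belief about a propensity between 0 and 1."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, failures: int) -> "BetaPrior":
        # Conjugate update: add observed successes and failures to the prior counts.
        return BetaPrior(self.alpha + successes, self.beta + failures)


# Hypothetical expert opinion: contact propensity near 0.6, worth about 10 cases.
prior = BetaPrior(alpha=6.0, beta=4.0)

# Hypothetical fieldwork so far: 30 contact attempts, 12 of them successful.
posterior = prior.update(successes=12, failures=18)

print(f"prior mean     = {prior.mean:.3f}")
print(f"posterior mean = {posterior.mean:.3f}")
```

    The same update can be repeated between waves, with each posterior serving as the prior for the next round of data collection.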

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014722
    Description:

    The U.S. Census Bureau is researching ways to incorporate administrative data in decennial census and survey operations. Critical to this work is an understanding of the coverage of the population by administrative records. Using federal and third party administrative data linked to the American Community Survey (ACS), we evaluate the extent to which administrative records provide data on foreign-born individuals in the ACS and employ multinomial logistic regression techniques to evaluate characteristics of those who are in administrative records relative to those who are not. We find that overall, administrative records provide high coverage of foreign-born individuals in our sample for whom a match can be determined. The odds of being in administrative records are found to be tied to the processes of immigrant assimilation – naturalization, higher English proficiency, educational attainment, and full-time employment are associated with greater odds of being in administrative records. These findings suggest that as immigrants adapt and integrate into U.S. society, they are more likely to be involved in government and commercial processes and programs for which we are including data. We further explore administrative records coverage for the two largest race/ethnic groups in our sample – Hispanic and non-Hispanic single-race Asian foreign born, finding again that characteristics related to assimilation are associated with administrative records coverage for both groups. However, we observe that neighborhood context impacts Hispanics and Asians differently.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014735
    Description:

    Microdata dissemination normally requires that data reduction and modification methods be applied, and the degree to which these methods are applied depends on the control methods that will be required to access and use the data. An approach that is in some circumstances more suitable for accessing data for statistical purposes is secure computation, which involves computing analytic functions on encrypted data without the need to decrypt the underlying source data to run a statistical analysis. This approach also allows multiple sites to contribute data while providing strong privacy guarantees. This way the data can be pooled and contributors can compute analytic functions without any party learning the others' inputs. We explain how secure computation can be applied in practical contexts, with some theoretical results and real healthcare examples.
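
    To make the idea of pooling data without revealing inputs concrete, the sketch below uses additive secret sharing, one simple technique from the secure multi-party computation family. It is not necessarily the protocol used in the paper; the modulus, function names and site counts are assumptions chosen for illustration.

```python
import secrets

# Assumed public modulus; it must be larger than any possible total.
MODULUS = 2**61 - 1


def make_shares(value: int, n_parties: int) -> list[int]:
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values: list[int]) -> int:
    """Simulate all sites in one process; returns the total of their inputs."""
    n = len(private_values)
    # Each site splits its own value and sends share j to site j.
    all_shares = [make_shares(v, n) for v in private_values]
    # Each site adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # The published partial sums give the total without exposing any single input.
    return sum(partial_sums) % MODULUS


site_counts = [120, 75, 310]  # hypothetical per-site patient counts
total = secure_sum(site_counts)
print(total)
```

    Each individual share is uniformly random, so a site that sees only the shares sent to it learns nothing about another site's count.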

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014726
    Description:

    Internal migration is one of the components of population growth estimated at Statistics Canada. It is estimated by comparing individuals’ addresses at the beginning and end of a given period. The Canada Child Tax Benefit and T1 Family File are the primary data sources used. Address quality and coverage of more mobile subpopulations are crucial to producing high-quality estimates. The purpose of this article is to present the results of evaluations of these elements using access to more tax data sources at Statistics Canada.
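
    The comparison of addresses at the beginning and end of a period can be sketched as follows. The person identifiers, region codes and function names are hypothetical, and the sketch ignores the address quality and coverage issues that the article evaluates.

```python
from collections import Counter, defaultdict


def migration_flows(start: dict[int, str], end: dict[int, str]) -> Counter:
    """Count moves between regions for people observed at both dates."""
    flows = Counter()
    for person_id, origin in start.items():
        destination = end.get(person_id)
        # People missing at the end of the period cannot be classified.
        if destination is not None and destination != origin:
            flows[(origin, destination)] += 1
    return flows


def net_migration(flows: Counter) -> dict[str, int]:
    """Net migration per region: arrivals minus departures."""
    net = defaultdict(int)
    for (origin, destination), count in flows.items():
        net[origin] -= count
        net[destination] += count
    return dict(net)


# Hypothetical region of residence at the start and end of the period.
start_addresses = {1: "ON", 2: "ON", 3: "QC", 4: "BC", 5: "AB"}
end_addresses = {1: "AB", 2: "ON", 3: "ON", 4: "BC", 6: "QC"}

flows = migration_flows(start_addresses, end_addresses)
net = net_migration(flows)
print(dict(flows))
print(net)
```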

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014751
    Description:

    Practically all major retailers use scanners to record the information on their transactions with clients (consumers). These data normally include the product code, a brief description, the price and the quantity sold. This is an extremely relevant data source for statistical programs such as Statistics Canada’s Consumer Price Index (CPI), one of Canada’s most important economic indicators. Using scanner data could improve the quality of the CPI by increasing the number of prices used in calculations, expanding geographic coverage and including the quantities sold, among other things, while lowering data collection costs. However, using these data presents many challenges. An examination of scanner data from a first retailer revealed a high rate of change in product identification codes over a one-year period. The effects of these changes pose challenges from a product classification and estimate quality perspective. This article focuses on the issues associated with acquiring, classifying and examining these data to assess their quality for use in the CPI.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014716
    Description:

    Administrative data, depending on its source and original purpose, can be considered a more reliable source of information than survey-collected data. It does not require a respondent to be present and understand question wording, and it is not limited by the respondent’s ability to recall events retrospectively. This paper compares selected survey data, such as demographic variables, from the Longitudinal and International Study of Adults (LISA) to various administrative sources for which LISA has linkage agreements in place. The agreement between data sources, and some factors that might affect it, are analyzed for various aspects of the survey.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014728
    Description:

    Record linkage joins together two or more sources. The product of record linkage is a file with one record per individual containing all the information about the individual from the multiple files. The problem is difficult when a unique identification key is not available, there are errors in some variables, some data are missing, and files are large. Probabilistic record linkage computes a probability that records from different files pertain to a single individual. Some true links are given low probabilities of matching, whereas some non-links are given high probabilities. Errors in linkage designations can cause bias in analyses based on the composite database. The SEER cancer registries contain information on breast cancer cases in their registry areas. A diagnostic test based on the Oncotype DX assay, performed by Genomic Health, Inc. (GHI), is often performed for certain types of breast cancers. Record linkage using personal identifiable information was conducted to associate Oncotype DX assay results with SEER cancer registry information. The software Link Plus was used to generate a score describing the similarity of records and to identify the apparent best match of SEER cancer registry individuals to the GHI database. Clerical review was used to check samples of likely matches, possible matches, and unlikely matches. Models are proposed for jointly modeling the record linkage process and subsequent statistical analysis in this and other applications.
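
    One common way to score a candidate record pair is the Fellegi-Sunter approach, sketched below. It is offered only as an illustration of probabilistic linkage and is not a description of how Link Plus or the proposed models work. The field names and the m- and u-probabilities are hypothetical, and fields are assumed to agree or disagree independently of one another.

```python
import math

# Hypothetical parameters per comparison field:
# m = P(field agrees | the two records are a true match)
# u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "zip_code": (0.90, 0.05),
}


def match_weight(agreements: dict[str, bool]) -> float:
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def match_probability(weight: float, prior_match_rate: float) -> float:
    """Posterior probability of a match, given a weight and a prior match rate."""
    prior_odds = prior_match_rate / (1 - prior_match_rate)
    posterior_odds = prior_odds * 2.0 ** weight
    return posterior_odds / (1 + posterior_odds)


full = match_weight({"surname": True, "birth_date": True, "zip_code": True})
partial = match_weight({"surname": True, "birth_date": False, "zip_code": True})
print(round(full, 2), round(partial, 2))
print(round(match_probability(full, prior_match_rate=0.001), 4))
```

    Pairs with intermediate weights are the ones that would typically be sent to clerical review.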

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014704
    Description:

    We identify several research areas and topics for methodological research in official statistics. We explain why these are important, and why they are the most important ones for official statistics. We describe the main topics in these research areas and sketch what seem to be the most promising ways to address them. Here we focus on: (i) quality of national accounts, in particular the rate of growth of GNI; (ii) big data, in particular how to create representative estimates and how to make the most of big data when this is difficult or impossible. We also touch upon: (i) increasing timeliness of preliminary and final statistical estimates; (ii) statistical analysis, in particular of complex and coherent phenomena. These topics are elements in the present Strategic Methodological Research Program that has recently been adopted at Statistics Netherlands.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014740
    Description:

    In this paper, we discuss the impacts of Employment Benefit and Support Measures delivered in Canada under the Labour Market Development Agreements (LMDA). We use rich linked longitudinal administrative data covering all LMDA participants from 2002 to 2005. We apply propensity score matching as in Blundell et al. (2002), Gerfin and Lechner (2002), and Sianesi (2004), and produce the national incremental impact estimates using difference-in-differences and the kernel matching estimator (Heckman and Smith, 1999). The findings suggest that both Employment Assistance Services and employment benefits such as Skills Development and Targeted Wage Subsidies had positive effects on earnings and employment.
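
    The combination of kernel matching and difference-in-differences can be sketched as follows. This is a simplified illustration under stated assumptions, not the estimator specification used in the paper: propensity scores are taken as already estimated, a Gaussian kernel with an arbitrary bandwidth is assumed, and all data values are invented.

```python
import math

# Each unit is (propensity_score, outcome_before, outcome_after).
Unit = tuple[float, float, float]


def gaussian_kernel(distance: float, bandwidth: float) -> float:
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def kernel_matching_did(
    treated: list[Unit], comparison: list[Unit], bandwidth: float = 0.05
) -> float:
    """Average effect on participants via kernel-weighted difference-in-differences."""
    effects = []
    for p_t, before_t, after_t in treated:
        weights = [gaussian_kernel(p_t - p_c, bandwidth) for p_c, _, _ in comparison]
        total_weight = sum(weights)
        if total_weight == 0.0:
            continue  # no comparison unit close enough in propensity score
        # Kernel-weighted average change in outcome among comparison units.
        counterfactual_change = sum(
            w * (after_c - before_c)
            for w, (_, before_c, after_c) in zip(weights, comparison)
        ) / total_weight
        # Participant's own change minus the matched counterfactual change.
        effects.append((after_t - before_t) - counterfactual_change)
    if not effects:
        raise ValueError("no participant could be matched")
    return sum(effects) / len(effects)


# Hypothetical annual earnings before and after participation.
participants = [(0.60, 20000.0, 26000.0), (0.40, 18000.0, 21000.0)]
non_participants = [
    (0.58, 21000.0, 23000.0),
    (0.42, 17000.0, 19000.0),
    (0.50, 19000.0, 21000.0),
]

impact = kernel_matching_did(participants, non_participants)
print(round(impact, 2))
```

    Differencing each unit's outcomes before and after participation removes time-invariant differences between participants and non-participants that matching on the propensity score alone would not capture.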

    Release date: 2016-03-24
