Statistics by subject – Statistical methods


All (169) (25 of 169 results)

  • Articles and reports: 12-001-X20030016610
    Description:

    In the presence of item nonresponse, unweighted imputation methods are often used in practice, but they generally lead to biased estimators under uniform response within imputation classes. Following Skinner and Rao (2002), we propose a bias-adjusted estimator of a population mean under unweighted ratio imputation and random hot-deck imputation and derive linearization variance estimators. A small simulation study is conducted to study the performance of the methods in terms of bias and mean square error. Relative bias and relative stability of the variance estimators are also studied. (An illustrative sketch of random hot-deck imputation follows this entry.)

    Release date: 2003-07-31
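
    The random hot-deck scheme above is easy to state in code. A minimal sketch with toy data and illustrative names, not the authors' implementation: each missing value is filled by copying a randomly chosen respondent from the same imputation class, and the completed data are averaged.

        import numpy as np

        rng = np.random.default_rng(123)

        def random_hot_deck(y, responded, cells):
            # Fill each missing y by copying a random respondent (donor)
            # from the same imputation cell, drawn with replacement.
            y = y.astype(float).copy()
            for c in np.unique(cells):
                in_cell = cells == c
                donors = y[in_cell & responded]
                recipients = in_cell & ~responded
                if donors.size and recipients.any():
                    y[recipients] = rng.choice(donors, size=recipients.sum())
            return y

        # Toy simulation: true values exist, but ~30% are treated as missing.
        n = 1_000
        cells = rng.integers(0, 2, n)
        y = rng.normal(50 + 10 * cells, 5, n)
        responded = rng.random(n) < 0.7

        y_imputed = random_hot_deck(y, responded, cells)
        print(y_imputed.mean())   # imputed-data estimate of the mean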

  • Articles and reports: 12-001-X20030016600
    Description:

    International comparability of Official Statistics is important for domestic uses within any country. But international comparability also matters for the international uses of statistics, in particular the development and monitoring of global policies and the assessment of economic and social development throughout the world. Additionally, statistics are used by international agencies and bilateral technical assistance programmes to monitor the impact of technical assistance.

    The first part of this paper describes how statistical indicators are used by the United Nations and other agencies. The framework of statistical indicators for these purposes is described and some issues concerning the choice and quality of these indicators are identified.

    In the past there has been considerable methodological research in support of Official Statistics, particularly by the strongest National Statistical Offices and some academics. This has established the basic methodologies for Official Statistics and has led to considerable developments and quality improvements over time. Much has been achieved. However, the focus has, to an extent, been on national uses of Official Statistics. These developments have, of course, benefited the international uses, and some specific developments have also occurred. There is, however, a need to foster more methodological development on the international requirements. In the second part of this paper a number of examples illustrate this need.

    Release date: 2003-07-31

  • Articles and reports: 12-001-X20030016613
    Description:

    The Illinois Department of Employment Security is using small domain estimation techniques to estimate employment at the county or industry divisional level. The estimator is a standard synthetic estimator, based on the ability to match Current Employment Statistics sample data to ES202 administrative records and an assumed model relationship between the two data sources. This paper is a case study that reviews the steps taken to evaluate the appropriateness of the model and the difficulties encountered in linking the two data sources.

    Release date: 2003-07-31

  • Articles and reports: 12-001-X20020026428
    Description:

    The analysis of survey data from different geographical areas where the data from each area are polychotomous can be easily performed using hierarchical Bayesian models, even if there are small cell counts in some of these areas. However, there are difficulties when the survey data have missing information in the form of non-response, especially when the characteristics of the respondents differ from those of the non-respondents. We use the selection approach for estimation when there are non-respondents because it permits inference for all the parameters. Specifically, we describe a hierarchical Bayesian model to analyse multinomial non-ignorable non-response data from different geographical areas, some of which may be small. For the model, we use a Dirichlet prior density for the multinomial probabilities and a beta prior density for the response probabilities. This permits a 'borrowing of strength' from the data of larger areas to improve the reliability of the estimates of the model parameters corresponding to the smaller areas. Because the joint posterior density of all the parameters is complex, inference is sampling-based and Markov chain Monte Carlo methods are used. We apply our method to an analysis of body mass index (BMI) data from the third National Health and Nutrition Examination Survey (NHANES III). For simplicity, the BMI is categorized into 3 natural levels, for each of 8 age-race-sex domains and 34 counties. We assess the performance of our model using the NHANES III data and simulated examples, which show that our model works reasonably well. (The 'borrowing of strength' component is sketched after this entry.)

    Release date: 2003-01-29
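
    The borrowing-of-strength idea above can be illustrated with the conjugate Dirichlet-multinomial piece alone. The paper's full model additionally handles non-ignorable non-response by MCMC, which this toy sketch (made-up counts, illustrative shrinkage constant kappa) does not attempt.

        import numpy as np

        rng = np.random.default_rng(7)

        # Category counts (e.g., 3 BMI levels) for three areas; the first is small.
        counts = np.array([[4, 2, 1],
                           [120, 90, 40],
                           [300, 210, 110]])

        # Prior mean taken from the pooled distribution; kappa sets the shrinkage.
        pooled = counts.sum(axis=0) / counts.sum()
        kappa = 25.0
        alpha = kappa * pooled

        # Conjugacy: the posterior for each area is Dirichlet(alpha + counts).
        post_mean = (counts + alpha) / (counts + alpha).sum(axis=1, keepdims=True)
        print(post_mean.round(3))   # the small area is pulled toward the pooled shares

        # Posterior draws, e.g., for an interval estimate in the small area:
        draws = rng.dirichlet(counts[0] + alpha, size=2_000)
        print(np.percentile(draws[:, 0], [2.5, 97.5]).round(3))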

  • Technical products: 11-522-X20010016293
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    This paper presents the Second Summit of the Americas Regional Education Indicators Project (PRIE), whose basic goal is to develop a set of comparable indicators for the Americas. This project is led by the Ministry of Education of Chile and has been developed in response to the countries' need to improve their information systems and statistics. The countries need to construct reliable and relevant indicators to support decisions in education, both within their individual countries and in the region as a whole. The first part of the paper analyses the importance of statistics and indicators in supporting educational policies and programs, and describes the present state of the information and statistics systems in these countries. It also discusses the major problems the countries face and reviews their experience in participating in other education indicator projects or programs, such as the INES Program, the WEI Project, MERCOSUR and CREMIS. The second part of the paper examines PRIE's technical co-operation program, its purpose and its implementation. It also emphasizes how technical co-operation responds to the needs of the countries and supports them in filling the gaps in available and reliable data.

    Release date: 2002-09-12

  • Technical products: 11-522-X20010016233
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    From January 2000, the data collection method of the Finnish Consumer Survey was changed from a Labour Force Survey panel design mode to an independent survey. All interviews are now carried out centrally from Statistics Finland's Computer Assisted Telephone Interview (CATI) Centre. There have been suggestions that the new survey mode has been influencing the respondents' answers. This paper analyses the extent of obvious changes in the results of the Finnish Consumer Survey. This is accomplished with the help of a pilot survey. Furthermore, this paper studies the interviewer's role in the data collection process. The analysis is based on cross-tabulations, chi-square tests and multinomial logit models. It shows that the new survey method produces more optimistic estimations and expectations concerning economic matters than the old method did.

    Release date: 2002-09-12

  • Technical products: 11-522-X20010016265
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    Several key strategies contributed to the success of the United States' Census 2000. This paper describes the strategy for the Address Building Process, which incorporated numerous address lists and update activities. The Field Interview Process created close to 900,000 jobs that needed to be filled; two key strategies used to achieve this are also described. The Formal Quality Control Process established principles to guide the quality assurance (QA) programs. These programs are presented, as are examples of their implementation. The Coverage Measurement and Correction Process was used to increase census accuracy through the use of statistical methods. The steps taken to ensure the accuracy and quality of the Accuracy and Coverage Evaluation (ACE) are described, and the preliminary estimates of the undercount are reported.

    Release date: 2002-09-12

  • Technical products: 11-522-X20010016282
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    The Discharge Abstract Database (DAD) is one of the key data holdings held by the Canadian Institute for Health Information (CIHI). The institute is a national, not-for-profit organization, which plays a critical role in the development of Canada's health information system. The DAD contains acute care discharge data from most Canadian hospitals. The data generated are essential for determining, for example, the number and types of procedures and the length of hospital stays. CIHI is conducting the first national data quality study of selected clinical and administrative data from the DAD. This study is evaluating and measuring the accuracy of the DAD by returning to the original data sources and comparing this information with what exists in the CIHI database, in order to identify any discrepancies and their associated reasons. This paper describes the DAD data quality study and some preliminary findings. The findings are also briefly compared with another similar study. In conclusion, the paper discusses subsequent steps for the study and how the findings from the first year are contributing to improvements in the quality of the DAD.

    Release date: 2002-09-12

  • Technical products: 11-522-X20010016303
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    In large-scale surveys, it is almost guaranteed that some level of non-response will occur. Generally, statistical agencies use imputation to treat item non-response. A common preliminary step to imputation is the formation of imputation cells. In this article, the formation of these cells is studied using two methods. The first is similar to that of Eltinge and Yansaneh (1997) in the case of weighting cells, and the second is the method currently used in the Canadian Labour Force Survey. Using Labour Force data, simulation studies are performed to test the impact of the response rate, the response mechanism, and constraints on the quality of the point estimator under both methods. (A propensity-quantile sketch of cell formation follows this entry.)

    Release date: 2002-09-12
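
    A generic way to form such cells, sketched here in the spirit of propensity-score cells rather than as the article's exact procedure, is to cut estimated response propensities at quantiles so that units with similar response behaviour share a cell. X, responded and the cell count are illustrative inputs.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def propensity_cells(X, responded, n_cells=5):
            # Estimate each unit's probability of responding, then group
            # units into quantile bands of that estimated propensity.
            phat = (LogisticRegression(max_iter=1_000)
                    .fit(X, responded)
                    .predict_proba(X)[:, 1])
            edges = np.quantile(phat, np.linspace(0, 1, n_cells + 1))
            # Interior edges split the sample into n_cells labelled 0..n_cells-1.
            return np.searchsorted(edges[1:-1], phat, side="right")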

  • Technical products: 11-522-X20010016247
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    This paper describes joint research by the Office for National Statistics (ONS) and Southampton University regarding the evaluation of several different approaches to the local estimation of International Labour Office (ILO) unemployment. The need to compare estimators with different underlying assumptions has led to a focus on evaluation methods that are (partly at least) model-independent. Model-fit diagnostics that have been considered include: various residual procedures, cross-validation, predictive validation, consistency with marginals, and consistency with direct estimates within single cells. These diagnostics have been used to compare different model-based estimators with each other and with direct estimators.

    Release date: 2002-09-12

  • Technical products: 11-522-X20010016258
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    To fill statistical gaps in the areas of health determinants, health status and health system usage by the Canadian population at the health region levels (sub-provincial areas or regions of interest to health authorities), Statistics Canada established a new survey called the Canadian Community Health Survey (CCHS). The CCHS consists of two separate components: a regional survey in the first year and a provincial survey in the second year. The main purpose of the regional survey, for which collection took place between September 2000 and October 2001, was to produce cross-sectional estimates for 136 health regions in Canada, based on a sample of more than 134,000 respondents. This article focuses on the various measures taken at the time of data collection to ensure a high level of quality for this large-scale survey.

    Release date: 2002-09-12

  • Technical products: 11-522-X20010016298
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    This paper discusses the Office for National Statistics' (ONS) approach to developing systematic quality measurements and reporting methods. It is presented against the background of European developments and the growing demand for quality measurement. Measuring the quality of statistics presents considerable practical and methodological challenges. The paper describes the main building blocks to be used for the new quality measure program, and includes specific examples. Working with other national statistical institutions; and developing an enhanced measurement framework, output measurements, and reporting procedures, are all vital ingredients in achieving recognition of the ONS as a quality organization.

    Release date: 2002-09-12

  • Technical products: 11-522-X20010016279
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    Rather than having to rely on traditional measures of survey quality, such as response rates, the Social Survey Division of the U.K. Office for National Statistics has been looking for alternative ways to report on quality. In order to achieve this, all the processes involved throughout the lifetime of a survey, from sampling and questionnaire design through to production of the finished report, have been mapped out. Having done this, we have been able to find quality indicators for many of these processes. By using this approach, we hope to be able to appraise any changes to our processes as well as to inform our customers of the quality of the work we carry out.

    Release date: 2002-09-12

  • Technical products: 11-522-X20010016288
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    The upcoming 2002 U.S. Economic Census will give businesses the option of submitting their data on paper or by electronic media. If reporting electronically, they may report via Windows-based Computerized Self-Administered Questionnaires (CSAQs). The U.S. Census Bureau will offer electronic reporting for over 650 different forms to all respondents. The U.S. Census Bureau has assembled a cross-divisional team to develop an electronic forms style guide, outlining the design standards to use in electronic form creation and ensuring that the quality of the form designs will be consistent throughout.

    The purpose of a style guide is to foster consistency among the various analysts who may be working on different pieces of a software development project (in this case, a CSAQ). The team determined that the style guide should include standards for layout and screen design, navigation, graphics, edit capabilities, additional help, feedback, audit trails, and accessibility for disabled users.

    Members of the team signed up to develop various sections of the style guide. The team met weekly to discuss and review the sections. Members of the team also conducted usability tests on edits, and subject-matter employees provided recommendations to upper management. Team members conducted usability testing on prototype forms with actual respondents. The team called in subject-matter experts as necessary to assist in making decisions about particular forms where the constraints of the electronic medium required changes to the paper form.

    The style guide will become the standard for all CSAQs for the 2002 Economic Census, which will ensure consistency across the survey programs.

    Release date: 2002-09-12

  • Technical products: 11-522-X20010016283
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    The accurate recording of patients' Indigenous status in hospital separations data is critical to analyses of health service use by Aboriginal and Torres Strait Islander Australians, who have relatively poor health. However, the accuracy of these data is not well understood. In 1998, a methodology for assessing the data accuracy was piloted in 11 public hospitals. Data were collected for 8,267 patients using a personal interview, and compared with the corresponding, routinely collected data. Among the 11 hospitals, the proportion of patients correctly recorded as Indigenous ranged from 55% to 100%. Overall, hospitals with high proportions of Indigenous persons in their catchment areas reported more accurate data. The methodology has since been used to assess data quality in hospitals in two Australian states and to promote best-practice data collection.

    Release date: 2002-09-12

  • Technical products: 11-522-X20010016257
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    This paper describes the methods used to increase the response rates of a generalist and specialist physician survey. The means of delivery (regular versus priority mail) and notification of inclusion in a draw for cash prizes were randomized using a 2 x 2 factorial design. While neither priority delivery nor notification of a cash prize sufficiently overcame whatever obstacles exist in this population, both approaches had a positive, though limited, effect on the response rate of the physicians. However, a subsequent mailing of a prepaid cash incentive (delivered by courier) was particularly effective in increasing the representation of the generalist subsample.

    Release date: 2002-09-12

  • Articles and reports: 12-001-X20010026091
    Description:

    The theory of double sampling is usually presented under the assumption that one of the samples is nested within the other. This type of sampling is called two-phase sampling. The first-phase sample provides auxiliary information (x) that is relatively inexpensive to obtain, whereas the second-phase sample provides the variable of interest (y). The auxiliary information can then be used, for example: (a) to stratify the second-phase sample; (b) to improve the estimate using a difference, ratio or regression estimator; or (c) to draw a sub-sample of non-respondent units. However, it is not necessary for one of the samples to be nested in the other or selected from the same frame. The case of non-nested double sampling is dealt with in passing in the classical works on sampling (Des Raj 1968, Cochran 1977). This method is now used in several national statistical agencies. This paper consolidates double sampling by presenting it in a unified manner. Several examples of surveys used at Statistics Canada illustrate this unification. (A sketch of the nested two-phase ratio estimator follows this entry.)

    Release date: 2002-02-28
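
    For the nested case described above, the classical two-phase ratio estimator of a mean is a one-liner. A sketch with illustrative names: the cheap auxiliary x is observed on the large first-phase sample, while both x and y are observed on the second-phase subsample.

        import numpy as np

        def two_phase_ratio_mean(x_phase1, x_phase2, y_phase2):
            # Scale the subsample mean of y by how the subsample's x mean
            # compares with the (more precise) first-phase x mean.
            return y_phase2.mean() * x_phase1.mean() / x_phase2.mean()

    Difference and regression variants replace the multiplicative correction with an additive or fitted one; the structure is the same.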

  • Articles and reports: 12-001-X20000025531
    Description:

    Information from list and area sampling frames is combined to obtain efficient estimates of population size and totals. We consider the case where the probabilities of inclusion on the list frames are heterogeneous and are modeled as a function of covariates. We adapt and modify the methodology of Huggins (1989) and Alho (1990) for modeling auxiliary variables in capture-recapture studies using a logistic regression model. We present results from a simulation study which compares various estimators of frame size and population totals using the logistic regression approach to modeling heterogeneous inclusion probabilities. (A Horvitz-Thompson-type size estimate is sketched after this entry.)

    Release date: 2001-02-28
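
    Once the heterogeneous inclusion probabilities have been modelled (for example by logistic regressions on the covariates), the size estimate itself is a Horvitz-Thompson-type sum over the observed units. A sketch assuming independent frames and already-fitted probabilities; this is illustrative, not the authors' estimator.

        import numpy as np

        def dual_frame_size(p_list, p_area):
            # For each unit observed on at least one frame, the probability
            # of being seen at all is 1 - P(missed by both frames).
            pi_seen = 1.0 - (1.0 - p_list) * (1.0 - p_area)
            # Horvitz-Thompson-type estimate of the population size.
            return np.sum(1.0 / pi_seen)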

  • Articles and reports: 81-003-X19990045145
    Description:

    This paper examines the characteristics of young people who responded to the 1991 School Leavers Survey (SLS), but who subsequently failed to respond to the 1995 School Leavers Follow-up Survey (SLF).

    Release date: 2000-09-01

  • Articles and reports: 12-001-X20000015183
    Description:

    For surveys which involve more than one stage of data collection, one method recommended for adjusting weights for nonresponse (after the first stage of data collection) entails utilizing auxiliary variables (from previous stages of data collection) which are identified as predictors of nonresponse.

    Release date: 2000-08-30

  • Technical products: 11-522-X19990015662
    Description:

    As the availability of both health utilization and outcome information becomes increasingly important to health care researchers and policy makers, the ability to link person-specific health data becomes a critical objective. This type of linkage of population-based administrative health databases has been realized in British Columbia. The database was created by constructing an historical file of all persons registered with the health care system, and then by probabilistically linking various program files to this 'coordinating' file. The first phase of development included the linkage of hospital discharge data, physician billing data, continuing care data, data about drug costs for the elderly, births data and deaths data. The second phase of development has seen the addition of data sources external to the Ministry of Health, including cancer incidence data, workers' compensation data, and income assistance data.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015652
    Description:

    Objective: To create an occupational surveillance system by collecting, linking, evaluating and disseminating data relating to occupation and mortality with the ultimate aim of reducing or preventing excess risk among workers and the general population.

    Release date: 2000-03-02

  • Technical products: 11-522-X19990015678
    Description:

    A population needs-based health care resource allocation model was developed and applied using age, sex and health status of populations to measure population need for health care in Ontario. To develop the model, provincial data on self-assessed health and health service utilization by age and sex from 62,413 respondents to the 1990 Ontario Health Survey (OHS) were used in combination with provincial health care expenditure data for the fiscal year 1995/96 by age and sex. The model was limited to the services that were covered in the OHS (general practitioner, specialist physician, optometry, physiotherapy, chiropractic and acute hospital). The distribution of utilization and expenditures between age-sex-health status categories was used to establish appropriate health care resource shares for each age-sex-health status combination. These resource shares were then applied to geographic populations using age, sex and health status data from the OHS together with more recent population estimates to determine the needs-based health care resource allocation for each area. Total dollar allocations were restricted to sum to the 1995/96 provincial budget and were compared with 1995/96 allocations to determine the extent to which Ontario allocations are consistent with the relative needs of the area populations.

    Release date: 2000-03-02

  • Technical products: 11-522-X19980015032
    Description:

    The objective of this research project is to examine the long-term consequences of being raised in a single-parent household. We examine the impact of parental separation or divorce on the adult labour market behaviour of children ten to fifteen years after the event. In particular, we relate the family income and household characteristics of a cohort of individuals who were 16 to 19 years of age in 1982 to their labour market earnings, reliance on social transfers (UI and Income Assistance), and marital/fertility outcomes during the early 1990s, when they were in their late 20s and early 30s. Our data are based upon the linked income tax records developed by us at Statistics Canada, the Survey of Labour and Income Dynamics, and the National Longitudinal Survey of Children and Youth.

    Release date: 1999-10-22

  • Technical products: 11-522-X19980015026
    Description:

    The purpose of the present study is to utilize panel data from the Current Population Survey (CPS) to examine the effects of unit nonresponse. Because most nonrespondents to the CPS are respondents during at least one month-in-sample, data from other months can be used to compare the characteristics of complete respondents and panel nonrespondents and to evaluate nonresponse adjustment procedures. In the current paper we present analyses utilizing CPS panel data to illustrate the effects of unit nonresponse. After adjusting for nonresponse, additional comparisons are also made to evaluate the effects of nonresponse adjustment. The implications of the findings and suggestions for further research are discussed.

    Release date: 1999-10-22

Data (1) (1 result)

  • Table: 62-010-X19970023422
    Description:

    The current official time base of the Consumer Price Index (CPI) is 1986=100. This time base was first used when the CPI for June 1990 was released. Statistics Canada is about to convert all price index series to the time base 1992=100. As a result, all constant dollar series will be converted to 1992 dollars. The CPI will shift to the new time base when the CPI for January 1998 is released on February 27th, 1998.

    Release date: 1997-11-17

Analysis (81) (25 of 81 results)

  • Articles and reports: 82-003-X201700614829
    Description:

    POHEM-BMI is a microsimulation tool that includes a model of adult body mass index (BMI) and a model of childhood BMI history. This overview describes the development of BMI prediction models for adults and of childhood BMI history, and compares projected BMI estimates with those from nationally representative survey data to establish validity.

    Release date: 2017-06-21

  • Articles and reports: 12-001-X201600214662
    Description:

    Two-phase sampling designs are often used in surveys when the sampling frame contains little or no auxiliary information. In this note, we shed some light on the concept of invariance, which is often mentioned in the context of two-phase sampling designs. We define two types of invariant two-phase designs: strongly invariant and weakly invariant two-phase designs. Some examples are given. Finally, we describe the implications of strong and weak invariance from an inference point of view.

    Release date: 2016-12-20

  • Articles and reports: 12-001-X201600114540
    Description:

    In this paper, we compare the EBLUP and pseudo-EBLUP estimators for small area estimation under the nested error regression model, and three area-level model-based estimators using the Fay-Herriot model. We conduct a design-based simulation study to compare the model-based estimators for unit-level and area-level models under informative and non-informative sampling. In particular, we are interested in the confidence interval coverage rates of the unit-level and area-level estimators. We also compare the estimators when the model is misspecified. Our simulation results show that estimators based on the unit-level model perform better than those based on the area-level model. The pseudo-EBLUP estimator is the best among the unit-level and area-level estimators. (The underlying Fay-Herriot composite is sketched after this entry.)

    Release date: 2016-06-22
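
    For reference, the area-level composite at the heart of the Fay-Herriot comparisons above is short once the variance components are in hand. Estimating the model variance sigma2_v is the hard part and is omitted from this sketch; all names are illustrative.

        import numpy as np

        def fh_eblup(direct, X, beta, sigma2_v, psi):
            # Shrink each direct estimate toward the synthetic estimate
            # x_i' beta with weight gamma_i = sigma2_v / (sigma2_v + psi_i),
            # where psi_i is the sampling variance of the direct estimate.
            gamma = sigma2_v / (sigma2_v + psi)
            return gamma * direct + (1.0 - gamma) * (X @ beta)

    Areas with noisy direct estimates (large psi_i) lean on the model; areas with precise direct estimates keep them.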

  • Articles and reports: 12-001-X201600114543
    Description:

    The regression estimator is extensively used in practice because it can improve the reliability of estimated parameters of interest such as means or totals. It uses control totals of variables, known at the population level, that are included in the regression setup. In this paper, we investigate the properties of the regression estimator that uses control totals estimated from the sample as well as those known at the population level. This estimator is compared, both theoretically and via a simulation study, to regression estimators that strictly use the known totals. (A sketch of the GREG estimator with known totals follows this entry.)

    Release date: 2016-06-22
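
    A compact sketch of the regression (GREG) estimator of a total under known control totals, assuming design weights w. The code is illustrative; the case the paper studies replaces x_totals with totals estimated from the sample.

        import numpy as np

        def greg_total(y, X, w, x_totals):
            # Survey-weighted least squares coefficients.
            b = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
            # Horvitz-Thompson total plus a regression correction through
            # the gap between known totals and their weighted estimates.
            return w @ y + (x_totals - w @ X) @ b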

  • Articles and reports: 12-001-X201500214229
    Description:

    Self-weighting estimation through equal probability selection methods (epsem) is desirable for variance efficiency. Traditionally, the epsem property for (one phase) two stage designs for estimating population-level parameters is realized by using each primary sampling unit (PSU) population count as the measure of size for PSU selection along with equal sample size allocation per PSU under simple random sampling (SRS) of elementary units. However, when self-weighting estimates are desired for parameters corresponding to multiple domains under a pre-specified sample allocation to domains, Folsom, Potter and Williams (1987) showed that a composite measure of size can be used to select PSUs to obtain epsem designs when besides domain-level PSU counts (i.e., distribution of domain population over PSUs), frame-level domain identifiers for elementary units are also assumed to be available. The term depsem-A will be used to denote such (one phase) two stage designs to obtain domain-level epsem estimation. Folsom et al. also considered two phase two stage designs when domain-level PSU counts are unknown, but whole PSU counts are known. For these designs (to be termed depsem-B) with PSUs selected proportional to the usual size measure (i.e., the total PSU count) at the first stage, all elementary units within each selected PSU are first screened for classification into domains in the first phase of data collection before SRS selection at the second stage. Domain-stratified samples are then selected within PSUs with suitably chosen domain sampling rates such that the desired domain sample sizes are achieved and the resulting design is self-weighting. In this paper, we first present a simple justification of composite measures of size for the depsem-A design and of the domain sampling rates for the depsem-B design. Then, for depsem-A and -B designs, we propose generalizations, first to cases where frame-level domain identifiers for elementary units are not available and domain-level PSU counts are only approximately known from alternative sources, and second to cases where PSU size measures are pre-specified based on other practical and desirable considerations of over- and under-sampling of certain domains. We also present a further generalization in the presence of subsampling of elementary units and nonresponse within selected PSUs at the first phase before selecting phase two elementary units from domains within each selected PSU. This final generalization of depsem-B is illustrated for an area sample of housing units.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500214248
    Description:

    Unit level population models are often used in model-based small area estimation of totals and means, but the models may not hold for the sample if the sampling design is informative for the model. As a result, standard methods, assuming that the model holds for the sample, can lead to biased estimators. We study alternative methods that use a suitable function of the unit selection probability as an additional auxiliary variable in the sample model. We report the results of a simulation study on the bias and mean squared error (MSE) of the proposed estimators of small area means and on the relative bias of the associated MSE estimators, using informative sampling schemes to generate the samples. Alternative methods, based on modeling the conditional expectation of the design weight as a function of the model covariates and the response, are also included in the simulation study.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500214236
    Description:

    We propose a model-assisted extension of weighting design-effect measures. We develop a summary-level statistic for different variables of interest, in single-stage sampling and under calibration weight adjustments. Our proposed design-effect measure captures the joint effects of a non-epsem sampling design, unequal weights produced using calibration adjustments, and the strength of the association between an analysis variable and the auxiliaries used in calibration. We compare our proposed measure to existing design-effect measures in simulations using variables like those collected in establishment surveys and telephone surveys of households. (Kish's classical weighting design effect, the baseline such measures extend, is sketched after this entry.)

    Release date: 2015-12-17
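
    The classical baseline that such measures extend is Kish's unequal-weighting design effect: one plus the relative variance of the weights. The paper's measure additionally accounts for calibration and the y-x association, which this sketch does not.

        import numpy as np

        def kish_weighting_deff(w):
            # n * sum(w^2) / (sum w)^2, equivalently 1 + cv(w)^2.
            w = np.asarray(w, dtype=float)
            return 1.0 + w.var() / w.mean() ** 2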

  • Articles and reports: 12-001-X201500114199
    Description:

    In business surveys, it is not unusual to collect economic variables with highly skewed distributions. In this context, winsorization is often used to treat the problem of influential values. This technique requires the determination of a constant that corresponds to the threshold above which large values are reduced. In this paper, we consider a method of determining the constant that involves minimizing the largest estimated conditional bias in the sample. In the context of domain estimation, we also propose a method of ensuring consistency between the domain-level winsorized estimates and the population-level winsorized estimate. The results of two simulation studies suggest that the proposed methods lead to winsorized estimators with good bias and relative efficiency properties. (A threshold-search sketch follows this entry.)

    Release date: 2015-06-29
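
    Winsorization itself is a one-liner; the substantive question is the threshold. The sketch below runs a crude grid search against a bootstrap MSE proxy on made-up skewed data. This is an illustration only: the article's criterion, minimizing the largest estimated conditional bias, is not reproduced here.

        import numpy as np

        rng = np.random.default_rng(3)

        def winsorized_mean(y, w, K):
            # Weighted mean after reducing values above threshold K to K.
            return np.average(np.minimum(y, K), weights=w)

        def mse_proxy(y, w, K, n_boot=500):
            # Bootstrap MSE of the winsorized mean around the unwinsorized
            # full-sample mean (a stand-in for the unknown population mean).
            target = np.average(y, weights=w)
            n = len(y)
            reps = np.empty(n_boot)
            for b in range(n_boot):
                idx = rng.integers(0, n, n)
                reps[b] = winsorized_mean(y[idx], w[idx], K)
            return np.mean((reps - target) ** 2)

        # Skewed toy data and candidate thresholds at upper quantiles.
        w = rng.uniform(1, 3, 400)
        y = rng.lognormal(mean=3, sigma=1.2, size=400)
        grid = np.quantile(y, [0.90, 0.95, 0.975, 0.99, 1.00])
        K_best = min(grid, key=lambda K: mse_proxy(y, w, K))
        print(K_best)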

  • Articles and reports: 12-001-X201400214119
    Description:

    When considering sample stratification by several variables, we often face the case where the expected number of sample units to be selected in each stratum is very small and the total number of units to be selected is smaller than the total number of strata. These stratified sample designs are represented by tabular arrays of real numbers, called controlled selection problems, and are beyond the reach of conventional methods of allocation. Many algorithms for solving these problems have been studied over about 60 years, beginning with Goodman and Kish (1950). Those developed more recently are especially computer intensive and always find solutions. However, one question remains unanswered: in what sense are the solutions to a controlled selection problem obtained from those algorithms optimal? We introduce a general concept of optimal solutions and propose a new controlled selection algorithm based on typical distance functions to achieve them. This algorithm can be easily performed using new SAS-based software. This study focuses on two-way stratification designs. The controlled selection solutions from the new algorithm are compared with those from existing algorithms, using several examples. The new algorithm successfully obtains robust solutions to two-way controlled selection problems that meet the optimality criteria.

    Release date: 2014-12-19

  • Articles and reports: 82-003-X201401014098
    Description:

    This study compares registry and non-registry approaches to linking 2006 Census of Population data for Manitoba and Ontario to Hospital data from the Discharge Abstract Database.

    Release date: 2014-10-15

  • Articles and reports: 12-001-X201400114004
    Description:

    In 2009, two major surveys in the Governments Division of the U.S. Census Bureau were redesigned to reduce sample size, save resources, and improve the precision of the estimates (Cheng, Corcoran, Barth and Hogue 2009). The new design divides each of the traditional state by government-type strata with sufficiently many units into two sub-strata according to each governmental unit’s total payroll, in order to sample less from the sub-stratum with small size units. The model-assisted approach is adopted in estimating population totals. Regression estimators using auxiliary variables are obtained either within each created sub-stratum or within the original stratum by collapsing two sub-strata. A decision-based method was proposed in Cheng, Slud and Hogue (2010), applying a hypothesis test to decide which regression estimator is used within each original stratum. Consistency and asymptotic normality of these model-assisted estimators are established here, under a design-based or model-assisted asymptotic framework. Our asymptotic results also suggest two types of consistent variance estimators, one obtained by substituting unknown quantities in the asymptotic variances and the other by applying the bootstrap. The performance of all the estimators of totals and of their variance estimators are examined in some empirical studies. The U.S. Annual Survey of Public Employment and Payroll (ASPEP) is used to motivate and illustrate our study.

    Release date: 2014-06-27

  • Articles and reports: 12-001-X201300211885
    Description:

    Web surveys are generally associated with low response rates. Common suggestions in textbooks on Web survey research highlight the importance of the welcome screen in encouraging respondents to take part. The importance of this screen has been empirically demonstrated in research showing that most respondents break off at the welcome screen. However, there has been little research on the effect of the design of this screen on the breakoff rate. In a study conducted at the University of Konstanz, three experimental treatments were added to a survey of the first-year student population (2,629 students) to assess the impact of different design features of this screen on breakoff rates. The experiments varied the background color of the welcome screen, the task duration promised on this first screen, and the length of the information explaining the respondents' privacy rights. The analyses show that the longer the stated duration and the more extensive the privacy information on the welcome screen, the fewer respondents started and completed the survey. The use of a different background color, however, did not produce the expected significant difference.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300211887
    Description:

    Multi-level models are extensively used for analyzing survey data with the design hierarchy matching the model hierarchy. We propose a unified approach, based on a design-weighted log composite likelihood, for two-level models that leads to design-model consistent estimators of the model parameters even when the within cluster sample sizes are small provided the number of sample clusters is large. This method can handle both linear and generalized linear two-level models and it requires level 2 and level 1 inclusion probabilities and level 1 joint inclusion probabilities, where level 2 represents a cluster and level 1 an element within a cluster. Results of a simulation study demonstrating superior performance of the proposed method relative to existing methods under informative sampling are also reported.

    Release date: 2014-01-15

  • Articles and reports: 12-001-X201300211869
    Description:

    The house price index compiled by Statistics Netherlands relies on the Sale Price Appraisal Ratio (SPAR) method. The SPAR method combines selling prices with prior government assessments of properties. This paper outlines an alternative approach in which the appraisals serve as auxiliary information in a generalized regression (GREG) framework. An application on Dutch data demonstrates that, although the GREG index is much smoother than the ratio of sample means, it is very similar to the SPAR series. To explain this result, we show that the SPAR index is an estimator of our more general GREG index and in practice is almost as efficient. (The SPAR computation is sketched after this entry.)

    Release date: 2014-01-15
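
    The SPAR computation is short enough to state directly. Argument names are illustrative; the paper's contribution is interpreting this ratio as a GREG-type estimator, not the formula itself.

        import numpy as np

        def spar_index(prices_t, appraisals_t, prices_0, appraisals_0):
            # Value-weighted sale-to-appraisal ratio in period t relative to
            # the base period, with government appraisals as the common
            # reference for the (different) houses sold in each period.
            ratio_t = prices_t.sum() / appraisals_t.sum()
            ratio_0 = prices_0.sum() / appraisals_0.sum()
            return ratio_t / ratio_0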

  • Articles and reports: 12-001-X201300111830
    Description:

    We consider two different self-benchmarking methods for the estimation of small area means based on the Fay-Herriot (FH) area level model: the method of You and Rao (2002) applied to the FH model and the method of Wang, Fuller and Qu (2008) based on augmented models. We derive an estimator of the mean squared prediction error (MSPE) of the You-Rao (YR) estimator of a small area mean that, under the true model, is correct to second-order terms. We report the results of a simulation study on the relative bias of the MSPE estimator of the YR estimator and the MSPE estimator of the Wang, Fuller and Qu (WFQ) estimator obtained under an augmented model. We also study the MSPE and the estimators of MSPE for the YR and WFQ estimators obtained under a misspecified model.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201200111682
    Description:

    Sample allocation issues are studied in the context of estimating sub-population (stratum or domain) means as well as the aggregate population mean under stratified simple random sampling. A non-linear programming method is used to obtain "optimal" sample allocation to strata that minimizes the total sample size subject to specified tolerances on the coefficient of variation of the estimators of strata means and the population mean. The resulting total sample size is then used to determine sample allocations for the methods of Costa, Satorra and Ventura (2004) based on compromise allocation and Longford (2006) based on specified "inferential priorities". In addition, we study sample allocation to strata when reliability requirements for domains, cutting across strata, are also specified. Performance of the three methods is studied using data from Statistics Canada's Monthly Retail Trade Survey (MRTS) of single establishments.

    Release date: 2012-06-27

  • Articles and reports: 82-003-X201200111625
    Description:

    This study compares estimates of the prevalence of cigarette smoking based on self-report with estimates based on urinary cotinine concentrations. The data are from the 2007 to 2009 Canadian Health Measures Survey, which included self-reported smoking status and the first nationally representative measures of urinary cotinine.

    Release date: 2012-02-15

  • Articles and reports: 12-001-X201000211378
    Description:

    One key to poverty alleviation or eradication in the third world is reliable information on the poor and their location, so that interventions and assistance can be effectively targeted to the neediest people. Small area estimation is one statistical technique that is used to monitor poverty and to decide on aid allocation in pursuit of the Millennium Development Goals. Elbers, Lanjouw and Lanjouw (ELL) (2003) proposed a small area estimation methodology for income-based or expenditure-based poverty measures, which is implemented by the World Bank in its poverty mapping projects via the involvement of the central statistical agencies in many third world countries, including Cambodia, Lao PDR, the Philippines, Thailand and Vietnam, and is incorporated into the World Bank software program PovMap. In this paper, the ELL methodology, which consists of first modeling survey data and then applying that model to census information, is presented and discussed with strong emphasis on the first phase, i.e., the fitting of regression models, and on the estimated standard errors at the second phase. Other regression model fitting procedures, such as the General Survey Regression (GSR) (as described in Lohr (1999), Chapter 11) and those used in existing small area estimation techniques, namely the Pseudo-Empirical Best Linear Unbiased Prediction (Pseudo-EBLUP) approach (You and Rao 2002) and the Iterative Weighted Estimating Equation (IWEE) method (You, Rao and Kovacevic 2003), are presented and compared with the ELL modeling strategy. The most significant difference between the ELL method and the other techniques is in the theoretical underpinning of the ELL model fitting procedure. An example based on the Philippines Family Income and Expenditure Survey is presented to show the differences in the parameter estimates and their corresponding standard errors, and in the variance components generated from the different methods, and the discussion is extended to the effect of these on the estimated accuracy of the final small area estimates. The need for sound estimation of variance components, as well as of regression estimates and their standard errors, for small area estimation of poverty is emphasized. (A stylized simulation sketch of the ELL idea follows this entry.)

    Release date: 2010-12-21
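
    A stylized version of the ELL two-phase idea, fit on survey data and simulate on census data, can be sketched as below. The real method models cluster effects and heteroskedastic errors; this toy version uses i.i.d. normal errors and made-up variable names.

        import numpy as np

        def ell_poverty_rate(X_survey, y_survey, X_census, z, n_sim=200, seed=0):
            # Phase 1: fit a linear model of log welfare on the survey.
            rng = np.random.default_rng(seed)
            log_y = np.log(y_survey)
            beta, *_ = np.linalg.lstsq(X_survey, log_y, rcond=None)
            resid = log_y - X_survey @ beta
            sigma = resid.std(ddof=X_survey.shape[1])
            # Phase 2: repeatedly simulate census welfare from the fitted
            # model and average the headcount below the poverty line z.
            rates = np.empty(n_sim)
            for s in range(n_sim):
                e = rng.normal(0.0, sigma, len(X_census))
                y_sim = np.exp(X_census @ beta + e)
                rates[s] = np.mean(y_sim < z)
            return rates.mean(), rates.std()   # point estimate, simulation SE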

  • Articles and reports: 12-001-X201000211385
    Description:

    In this short note, we show that simple random sampling without replacement and Bernoulli sampling have approximately the same entropy when the population size is large. An empirical example is given as an illustration. (A numerical check follows this entry.)

    Release date: 2010-12-21
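
    The comparison is easy to check numerically. SRSWOR of size n from N is uniform over all C(N, n) subsets, so its entropy is log C(N, n); Bernoulli sampling with inclusion probability p makes N independent decisions, each contributing the binary entropy of p. With p = n/N the two agree up to a lower-order term.

        import math

        def entropy_srswor(N, n):
            # log of the number of equally likely samples, C(N, n).
            return math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)

        def entropy_bernoulli(N, p):
            # N independent inclusion decisions, each with entropy H(p).
            return N * (-p * math.log(p) - (1 - p) * math.log(1 - p))

        for N in (100, 10_000, 1_000_000):
            n = N // 10
            ratio = entropy_srswor(N, n) / entropy_bernoulli(N, n / N)
            print(N, round(ratio, 4))   # ratio tends to 1 as N grows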

  • Articles and reports: 12-001-X201000211384
    Description:

    The current economic downturn in the US could challenge costly strategies in survey operations. In the Behavioral Risk Factor Surveillance System (BRFSS), ending the monthly data collection at 31 days could be a less costly alternative. However, this could exclude a portion of interviews completed after 31 days (late responders), whose characteristics could differ in many respects from those of respondents who completed the survey within 31 days (early responders). We examined whether there are differences between early and late responders in demographics, health-care coverage, general health status, health risk behaviors, and chronic disease conditions or illnesses. We used 2007 BRFSS data, where a representative sample of the noninstitutionalized adult U.S. population was selected using a random digit dialing method. Late responders were significantly more likely to be male, to report race/ethnicity as Hispanic, to have annual income higher than $50,000, to be younger than 45 years of age, to have less than a high school education, to have health-care coverage, and to report good health; they were significantly less likely to report hypertension, diabetes, or being obese. The observed differences between early and late responders are unlikely to have much influence on national and state-level estimates. As the proportion of late responders may increase in the future, their impact on surveillance estimates should be examined before they are excluded from analysis. Analyses of late responders alone should combine several years of data to produce reliable estimates.

    Release date: 2010-12-21

  • Articles and reports: 12-001-X201000111246
    Description:

    Many surveys employ weight adjustment procedures to reduce nonresponse bias. These adjustments make use of available auxiliary data. This paper addresses the issue of jackknife variance estimation for estimators that have been adjusted for nonresponse. Using the reverse approach for variance estimation proposed by Fay (1991) and Shao and Steel (1999), we study the effect of not re-calculating the nonresponse weight adjustment within each jackknife replicate. We show that the resulting 'shortcut' jackknife variance estimator tends to overestimate the true variance of point estimators in the case of several weight adjustment procedures used in practice. These theoretical results are confirmed through a simulation study where we compare the shortcut jackknife variance estimator with the full jackknife variance estimator obtained by re-calculating the nonresponse weight adjustment within each jackknife replicate. (A small sketch contrasting the shortcut and full jackknife follows this entry.)

    Release date: 2010-06-29
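
    The contrast studied above can be made concrete with a weighting-class adjustment and a delete-one jackknife. This is an illustrative sketch that ignores edge cases (such as a cell losing all its respondents in a replicate): recompute=True re-does the adjustment in every replicate, and recompute=False is the shortcut.

        import numpy as np

        def adjusted_weights(w, responded, cells):
            # Respondents in a cell absorb the weight of that cell's
            # nonrespondents; nonrespondents get weight zero.
            wa = np.zeros_like(w, dtype=float)
            for c in np.unique(cells):
                m = cells == c
                r = m & responded
                wa[r] = w[r] * w[m].sum() / w[r].sum()
            return wa

        def jackknife_variance(y, w, responded, cells, recompute):
            n = len(y)
            y = np.where(responded, y, 0.0)   # nonrespondent y never used
            wa_full = adjusted_weights(w, responded, cells)
            theta = wa_full @ y / wa_full.sum()
            reps = np.empty(n)
            for i in range(n):
                keep = np.arange(n) != i
                wa = (adjusted_weights(w[keep], responded[keep], cells[keep])
                      if recompute else wa_full[keep])
                reps[i] = wa @ y[keep] / wa.sum()
            return (n - 1) / n * np.sum((reps - theta) ** 2)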

  • Articles and reports: 12-001-X200900110885
    Description:

    Peaks in the spectrum of a stationary process are indicative of the presence of stochastic periodic phenomena, such as a stochastic seasonal effect. This work proposes to measure and test for the presence of such spectral peaks via assessing their aggregate slope and convexity. Our method is developed nonparametrically, and thus may be useful during a preliminary analysis of a series. The technique is also useful for detecting the presence of residual seasonality in seasonally adjusted data. The diagnostic is investigated through simulation and an extensive case study using data from the U.S. Census Bureau and the Organization for Economic Co-operation and Development (OECD).

    Release date: 2009-06-22

  • Articles and reports: 12-001-X200800110616
    Description:

    With complete multivariate data, the BACON algorithm (Billor, Hadi and Velleman 2000) yields a robust estimate of the covariance matrix. The corresponding Mahalanobis distance may be used for multivariate outlier detection. When items are missing, the EM algorithm is a convenient way to estimate the covariance matrix at each iteration step of the BACON algorithm. In finite population sampling, the EM algorithm must be enhanced to estimate the covariance matrix of the population rather than of the sample. A version of the EM algorithm for survey data following a multivariate normal model, the EEM algorithm (Estimated Expectation Maximization), is proposed. The combination of the two algorithms, the BACON-EEM algorithm, is applied to two datasets and compared with alternative methods. (The complete-data BACON step is sketched after this entry.)

    Release date: 2008-06-26
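
    A simplified version of the complete-data BACON step looks like the sketch below. It omits the published small-sample correction factor and numerical safeguards; the starting rule (points near the coordinatewise median) and the chi-square cutoff are common choices.

        import numpy as np
        from scipy.stats import chi2

        def bacon_outliers(X, alpha=0.05, m=None, max_iter=50):
            # Start from a small basic subset, then repeatedly refit the
            # mean/covariance on the subset and admit every point whose
            # squared Mahalanobis distance is below the cutoff, until stable.
            n, p = X.shape
            m = m or min(n, 5 * p)
            d0 = np.linalg.norm(X - np.median(X, axis=0), axis=1)
            mask = np.zeros(n, dtype=bool)
            mask[np.argsort(d0)[:m]] = True
            cutoff = chi2.ppf(1 - alpha / n, df=p)
            for _ in range(max_iter):
                mu = X[mask].mean(axis=0)
                S_inv = np.linalg.inv(np.cov(X[mask], rowvar=False))
                diff = X - mu
                d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
                new_mask = d2 < cutoff
                if np.array_equal(new_mask, mask):
                    break
                mask = new_mask
            return ~mask   # True where a point is flagged as an outlier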

  • Articles and reports: 12-001-X200700210495
    Description:

    The purpose of this work is to obtain reliable estimates in study domains when there are potentially very small sample sizes and the sampling design stratum differs from the study domain. The population sizes are unknown for both the study domain and the sampling design stratum. In calculating parameter estimates in the study domains, a random sample size is often necessary. We propose a new family of generalized linear mixed models with correlated random effects when there is more than one unknown parameter. The proposed model will estimate both the population size and the parameter of interest. General formulae for the full conditional distributions required for Markov chain Monte Carlo (MCMC) simulations are given for this framework. Equations for Bayesian estimation and prediction at the study domains are also given. We apply our method to the 1998 Missouri Turkey Hunting Survey, which stratified samples based on the hunter's place of residence; we require estimates at the domain level, defined as the county in which the turkey hunter actually hunted.

    Release date: 2008-01-03

  • Articles and reports: 12-001-X200700210498
    Description:

    In this paper we describe a methodology for combining a convenience sample with a probability sample in order to produce an estimator with a smaller mean squared error (MSE) than estimators based on only the probability sample. We then explore the properties of the resulting composite estimator, a linear combination of the convenience and probability sample estimators with weights that are a function of bias. We discuss the estimator's properties in the context of web-based convenience sampling. Our analysis demonstrates that the use of a convenience sample to supplement a probability sample for improvements in the MSE of estimation may be practical only under limited circumstances. First, the remaining bias of the estimator based on the convenience sample must be quite small, equivalent to no more than 0.1 of the outcome's population standard deviation. For a dichotomous outcome, this implies a bias of no more than five percentage points at 50 percent prevalence and no more than three percentage points at 10 percent prevalence. Second, the probability sample should contain at least 1,000-10,000 observations for adequate estimation of the bias of the convenience sample estimator. Third, it must be inexpensive and feasible to collect at least thousands (and probably tens of thousands) of web-based convenience observations. The conclusions about the limited usefulness of convenience samples with estimator bias of more than 0.1 standard deviations also apply to direct use of estimators based on that sample. (A generic sketch of an MSE-minimizing composite follows this entry.)

    Release date: 2008-01-03
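
    One standard form of such a composite, assuming independent estimators and an unbiased probability-sample estimate, picks the weight that minimizes the combination's mean squared error. This is a generic sketch, not the authors' exact weighting.

        import numpy as np

        def composite_estimate(y_prob, var_prob, y_conv, var_conv, bias_conv):
            # MSE of lam*y_prob + (1-lam)*y_conv is
            # lam^2*var_prob + (1-lam)^2*(var_conv + bias_conv^2);
            # minimizing over lam gives the weight below on the
            # probability-sample estimate. It grows with the bias.
            lam = (var_conv + bias_conv**2) / (var_prob + var_conv + bias_conv**2)
            return lam * y_prob + (1.0 - lam) * y_conv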

Reference (87) (25 of 87 results)

  • Technical products: 11-522-X201700014714
    Description:

    The Labour Market Development Agreements (LMDAs) between Canada and the provinces and territories fund labour market training and support services to Employment Insurance claimants. The objective of this paper is to discuss the improvements over the years in the impact assessment methodology. The paper describes the LMDAs and past evaluation work and discusses the drivers to make better use of large administrative data holdings. It then explains how the new approach made the evaluation less resource-intensive, while results are more relevant to policy development. The paper outlines the lessons learned from a methodological perspective and provides insight into ways for making this type of use of administrative data effective, especially in the context of large programs.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014742
    Description:

    This paper describes the Quick Match System (QMS), an in-house application designed to match business microdata records, and the methods used to link the United States Patent and Trademark Office (USPTO) dataset to Statistics Canada’s Business Register (BR) for the period from 2000 to 2011. The paper illustrates the record-linkage framework and outlines the techniques used to prepare and classify each record and evaluate the match results. The USPTO dataset consisted of 41,619 U.S. patents granted to 14,162 distinct Canadian entities. The record-linkage process matched the names, city, province and postal codes of the patent assignees in the USPTO dataset with those of businesses in the January editions of the Generic Survey Universe File (GSUF) from the BR for the same reference period. As the vast majority of individual patent assignees are not engaged in commercial activity to provide taxable property or services, they tend not to appear in the BR. The relatively poor match rate of 24.5% among individuals, compared to 84.7% among institutions, reflects this tendency. Although the 8,844 individual patent assignees outnumbered the 5,318 institutions, the institutions accounted for 73.0% of the patents, compared to 27.0% held by individuals. Consequently, this study and its conclusions focus primarily on institutional patent assignees. The linkage of the USPTO institutions to the BR is significant because it provides access to business micro-level data on firm characteristics, employment, revenue, assets and liabilities. In addition, the retrieval of robust administrative identifiers enables subsequent linkage to other survey and administrative data sources. The integrated dataset will support direct and comparative analytical studies on the performance of Canadian institutions that obtained patents in the United States between 2000 and 2011.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014719
    Description:

    Open Data initiatives are transforming how governments and other public institutions interact and provide services to their constituents. They increase transparency and value to citizens, reduce inefficiencies and barriers to information, enable data-driven applications that improve public service delivery, and provide public data that can stimulate innovative business opportunities. As one of the first international organizations to adopt an open data policy, the World Bank has been providing guidance and technical expertise to developing countries that are considering or designing their own initiatives. This presentation will give an overview of developments in open data at the international level along with current and future experiences, challenges, and opportunities. Mr. Herzog will discuss the rationales that are leading governments to embrace open data, the demonstrated benefits to both the public and private sectors, the range of different approaches that governments are taking, and the availability of tools for policymakers, with special emphasis on the roles and perspectives of National Statistics Offices within a government-wide initiative.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014711
    Description:

    After the 2010 Census, the U.S. Census Bureau conducted two separate research projects matching survey data to databases. One study matched to the third-party database Accurint, and the other matched to U.S. Postal Service National Change of Address (NCOA) files. In both projects, we evaluated response error in reported move dates by comparing the self-reported move date to records in the database. We encountered similar challenges in the two projects. This paper discusses our experience using “big data” as a comparison source for survey data and our lessons learned for future projects similar to the ones we conducted.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014729
    Description:

    The use of administrative datasets as a data source in official statistics has become much more common, as there is a drive for more outputs to be produced more efficiently. Many outputs rely on linkage between two or more datasets, and this is often undertaken in a number of phases with different methods and rules. In these situations, we would like to be able to assess the quality of the linkage, and this involves some re-assessment of both links and non-links. In this paper we discuss sampling approaches to obtain estimates of false negatives and false positives with reasonable control of both the accuracy of the estimates and the cost. Approaches to stratifying the links (and non-links) to be sampled are evaluated using information from the 2011 England and Wales population census.
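
    As a simplified illustration of the sampling idea (the strata, counts and review outcomes below are invented), the number of false positives among accepted links can be estimated from clerically reviewed samples within strata:

    import math

    # Each stratum: N = links in stratum, n = links sampled for review,
    # fp = sampled links judged false on review. All figures are invented.
    strata = [
        {"N": 50000, "n": 200, "fp": 2},    # high-confidence links
        {"N": 8000,  "n": 200, "fp": 14},   # medium-confidence links
        {"N": 1500,  "n": 150, "fp": 45},   # low-confidence links
    ]

    total_fp, var = 0.0, 0.0
    for s in strata:
        p = s["fp"] / s["n"]                # stratum false-positive rate
        total_fp += s["N"] * p
        # Variance of the stratum total under simple random sampling without replacement
        var += s["N"]**2 * (1 - s["n"] / s["N"]) * p * (1 - p) / (s["n"] - 1)

    print(f"estimated false positives: {total_fp:.0f} +/- {1.96 * math.sqrt(var):.0f}")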

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014749
    Description:

    As part of the Tourism Statistics Program redesign, Statistics Canada is developing the National Travel Survey (NTS) to collect travel information from Canadian travellers. This new survey will replace the Travel Survey of Residents of Canada and the Canadian resident component of the International Travel Survey. The NTS will take advantage of Statistics Canada’s common sampling frames and common processing tools while maximizing the use of administrative data. This paper discusses the potential uses of administrative data such as Passport Canada files, Canada Border Service Agency files and Canada Revenue Agency files, to increase the efficiency of the NTS sample design.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014726
    Description:

    Internal migration is one of the components of population growth estimated at Statistics Canada. It is estimated by comparing individuals’ addresses at the beginning and end of a given period. The Canada Child Tax Benefit and T1 Family File are the primary data sources used. Address quality and coverage of the more mobile subpopulations are crucial to producing high-quality estimates. The purpose of this article is to present the results of evaluations of these elements, drawing on expanded access to tax data sources at Statistics Canada.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014743
    Description:

    Probabilistic linkage is susceptible to linkage errors such as false positives and false negatives. In many cases, these errors can be reliably measured through clerical reviews, i.e., the visual inspection of a sample of record pairs to determine whether they are true matches. A framework is described to carry out such reviews effectively, based on a probabilistic sample of pairs, repeated independent reviews of the same pairs, and latent class analysis to account for clerical errors.
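
    The paper's framework is richer than this, but a minimal sketch of the latent class idea, assuming a fixed number K of exchangeable reviews per sampled pair and conditional independence of verdicts given the true match status:

    import numpy as np

    def latent_class_em(k, K, iters=200):
        """EM for a two-class latent class model of repeated clerical reviews.

        k[i] is the number of 'match' verdicts for sampled pair i out of K
        independent reviews. Returns (pi, s, t): the prevalence of true
        matches, reviewer sensitivity, and the rate at which reviewers
        call a true non-match a match.
        """
        k = np.asarray(k, dtype=float)
        pi, s, t = 0.5, 0.8, 0.2                        # crude starting values
        for _ in range(iters):
            like_m = s**k * (1 - s)**(K - k)            # P(verdicts | true match)
            like_u = t**k * (1 - t)**(K - k)            # P(verdicts | true non-match)
            g = pi * like_m / (pi * like_m + (1 - pi) * like_u)   # E-step
            pi = g.mean()                               # M-step updates
            s = (g * k).sum() / (K * g.sum())
            t = ((1 - g) * k).sum() / (K * (1 - g).sum())
        return pi, s, t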

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014740
    Description:

    In this paper, we discuss the impacts of the Employment Benefit and Support Measures delivered in Canada under the Labour Market Development Agreements. We use rich linked longitudinal administrative data covering all LMDA participants from 2002 to 2005. We apply propensity score matching, as in Blundell et al. (2002), Gerfin and Lechner (2002) and Sianesi (2004), and produce national incremental impact estimates using difference-in-differences and the kernel matching estimator (Heckman and Smith, 1999). The findings suggest that both Employment Assistance Services and employment benefits such as Skills Development and Targeted Wage Subsidies had positive effects on earnings and employment.
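
    A minimal sketch of difference-in-differences combined with kernel matching on an estimated propensity score (a generic implementation under simplifying assumptions, not the evaluation's actual code; variable names are illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def did_kernel_matching(X, treated, y_pre, y_post, bandwidth=0.05):
        """Difference-in-differences with Epanechnikov kernel matching on the
        estimated propensity score; returns the impact on the treated."""
        X, treated = np.asarray(X, dtype=float), np.asarray(treated)
        dy = np.asarray(y_post, dtype=float) - np.asarray(y_pre, dtype=float)

        ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
        ps_t, dy_t = ps[treated == 1], dy[treated == 1]
        ps_c, dy_c = ps[treated == 0], dy[treated == 0]

        effects = []
        for p_i, dy_i in zip(ps_t, dy_t):
            u = (ps_c - p_i) / bandwidth
            w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov
            if w.sum() > 0:                 # skip treated cases off common support
                effects.append(dy_i - np.average(dy_c, weights=w))
        return float(np.mean(effects))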

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014718
    Description:

    This study assessed whether starting participation in Employment Assistance Services (EAS) earlier after initiating an Employment Insurance (EI) claim leads to better impacts for unemployed individuals than participating later during the EI benefit period. As in Sianesi (2004) and Hujer and Thomsen (2010), the analysis relied on a stratified propensity score matching approach conditional on the discretized duration of unemployment before the program starts. The results showed that individuals who participated in EAS within the first four weeks after initiating an EI claim had the strongest impacts on earnings and incidence of employment, and also made less use of EI from the second year post-program onward.
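
    A toy sketch of matching stratified on discretized unemployment duration (the bin edges, variables and nearest-neighbour rule are illustrative assumptions, not the study's exact specification):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def stratified_psm(X, treated, y, duration_weeks, bins=(4, 8, 12, 26)):
        """Nearest-neighbour propensity matching within strata of discretized
        unemployment duration before program start; returns the mean
        treated-minus-matched-control outcome gap."""
        X, treated, y = np.asarray(X, float), np.asarray(treated), np.asarray(y, float)
        stratum = np.digitize(duration_weeks, bins)     # discretize the duration

        gaps = []
        for s in np.unique(stratum):
            m = stratum == s
            if treated[m].min() == treated[m].max():    # stratum lacks both groups
                continue
            ps = LogisticRegression(max_iter=1000).fit(X[m], treated[m]).predict_proba(X[m])[:, 1]
            pt, yt = ps[treated[m] == 1], y[m][treated[m] == 1]
            pc, yc = ps[treated[m] == 0], y[m][treated[m] == 0]
            for p_i, y_i in zip(pt, yt):
                j = np.argmin(np.abs(pc - p_i))         # nearest control on the score
                gaps.append(y_i - yc[j])
        return float(np.mean(gaps))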

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014716
    Description:

    Administrative data, depending on its source and original purpose, can be considered a more reliable source of information than survey-collected data. It does not require a respondent to be present and understand question wording, and it is not limited by the respondent’s ability to recall events retrospectively. This paper compares selected survey data, such as demographic variables, from the Longitudinal and International Study of Adults (LISA) to various administrative sources for which LISA has linkage agreements in place. The agreement between data sources, and some factors that might affect it, are analyzed for various aspects of the survey.
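
    As a simple illustration of how agreement between two sources can be quantified (the values below are invented, not LISA data), raw agreement can be reported alongside Cohen's kappa, which discounts agreement expected by chance:

    from collections import Counter

    def agreement(survey_values, admin_values):
        """Raw agreement and Cohen's kappa between two categorical sources."""
        pairs = list(zip(survey_values, admin_values))
        n = len(pairs)
        p_obs = sum(a == b for a, b in pairs) / n       # observed agreement

        s_freq = Counter(a for a, _ in pairs)
        a_freq = Counter(b for _, b in pairs)
        categories = set(s_freq) | set(a_freq)
        p_exp = sum(s_freq[c] * a_freq[c] for c in categories) / n**2  # chance agreement

        return p_obs, (p_obs - p_exp) / (1 - p_exp)

    # e.g. agreement(["M", "F", "F", "M"], ["M", "F", "M", "M"]) -> (0.75, 0.5)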

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014725
    Description:

    Tax data are being used more and more to measure and analyze the population and its characteristics. One of the issues raised by the growing use of this type of data relates to the definition of the concept of place of residence. While the census uses the traditional concept of place of residence, tax data provide information based on the mailing address of tax filers. Using record linkage between the census, the National Household Survey and tax data from the T1 Family File, this study examines the level of consistency in place of residence between these two sources and the characteristics associated with it.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014732
    Description:

    The Institute for Employment Research (IAB) is the research unit of the German Federal Employment Agency. Via the Research Data Centre (FDZ) at the IAB, administrative and survey data on individuals and establishments are provided to researchers. In cooperation with the Institute for the Study of Labor (IZA), the FDZ has implemented the Job Submission Application (JoSuA) environment, which enables researchers to submit jobs for remote data execution through a custom-built web interface. Moreover, the JoSuA environment distinguishes two types of user-generated output files, which allows for faster and more efficient disclosure review services.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014755
    Description:

    The National Children’s Study Vanguard Study was a pilot epidemiological cohort study of children and their parents. Measures were to be taken from pre-pregnancy until adulthood. The use of extant data was planned to supplement direct data collection from the respondents. Our paper outlines a strategy for cataloging and evaluating extant data sources for use with large-scale longitudinal studies. Through our review, we selected five evaluation factors to guide researchers through available data sources: 1) relevance, 2) timeliness, 3) spatiality, 4) accessibility and 5) accuracy.

    Release date: 2016-03-24

  • Technical products: 11-522-X201700014746
    Description:

    Paradata research has focused on identifying opportunities for strategic improvement in data collection that could be operationally viable and lead to enhancements in quality or cost efficiency. To that end, Statistics Canada has developed and implemented a responsive collection design (RCD) strategy for computer-assisted telephone interview (CATI) household surveys to maximize quality and efficiency and to potentially reduce costs. RCD is an adaptive approach to survey data collection that uses information available prior to and during data collection to adjust the collection strategy for the remaining in-progress cases. In practice, the survey managers monitor and analyze collection progress against a predetermined set of indicators for two purposes: to identify critical data-collection milestones that require significant changes to the collection approach and to adjust collection strategies to make the most efficient use of remaining available resources. In the RCD context, numerous considerations come into play when determining which aspects of data collection to adjust and how to adjust them. Paradata sources play a key role in the planning, development and implementation of active management for RCD surveys. Since 2009, Statistics Canada has conducted several RCD surveys. This paper describes Statistics Canada’s experiences in implementing and monitoring this type of survey.
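
    Statistics Canada's production indicators and rules are more elaborate than anything shown here; as a toy sketch of the mid-collection adjustment idea, assuming case-level response propensities refreshed from paradata, one might rank in-progress cases for the next call block as follows:

    def prioritize_cases(cases, capacity):
        """Rank in-progress cases for the next call block.

        Each case carries a response propensity refreshed from paradata and
        the number of calls already made; the scoring rule is a stand-in,
        not a production rule.
        """
        def score(case):
            # Favour likely responders; discount cases already heavily worked.
            return case["propensity"] / (1 + case["calls_made"])

        active = [c for c in cases if not c["complete"]]
        return sorted(active, key=score, reverse=True)[:capacity]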

    Release date: 2016-03-24

  • Technical products: 11-522-X201300014256
    Description:

    The American Community Survey (ACS) added an Internet data collection mode as part of a sequential mode design in 2013. The ACS currently uses a single web application for all Internet respondents, regardless of whether they respond on a personal computer or on a mobile device. As market penetration of mobile devices increases, however, more survey respondents are using tablets and smartphones to take surveys that are designed for personal computers. Using mobile devices to complete these surveys may be more difficult for respondents and this difficulty may translate to reduced data quality if respondents become frustrated or cannot navigate around usability issues. This study uses several indicators to compare data quality across computers, tablets, and smartphones and also compares the demographic characteristics of respondents that use each type of device.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014278
    Description:

    In January and February 2014, Statistics Canada conducted a test aimed at measuring the effectiveness of different collection strategies for an online self-reporting survey. Sampled units were contacted using mailed introductory letters and asked to complete the online survey without any interviewer contact. The objectives of this test were to measure the take-up rates for completing an online survey and to profile the respondents and non-respondents. Different samples and letters were tested to determine the relative effectiveness of the different approaches. The results of this project will be used to inform various social surveys that are preparing to include an internet response option. The paper will present the general methodology of the test as well as the results observed from collection and the analysis of respondent profiles.

    Release date: 2014-10-31

  • Technical products: 11-522-X201300014276
    Description:

    In France, budget restrictions are making it more difficult to hire casual interviewers to deal with collection problems. As a result, it has become necessary to adhere to a predetermined annual work quota. For surveys of the National Institute of Statistics and Economic Studies (INSEE), which use a master sample, problems arise when an interviewer is on extended leave throughout the entire collection period of a survey. When that occurs, an area may cease to be covered by the survey, and this effectively generates a bias. In response to this new problem, we have implemented two methods, depending on when the problem is identified.

    If an area is ‘abandoned’ before or at the very beginning of collection, we carry out a ‘sub-allocation’ procedure. The procedure involves interviewing a minimum number of households in each collection area at the expense of other areas in which no collection problems have been identified. The idea is to minimize the dispersion of weights while meeting collection targets.

    If an area is ‘abandoned’ during collection, we prioritize the remaining surveys. Prioritization is based on a representativeness indicator (R indicator) that measures the degree of similarity between a sample and the base population. The goal of this prioritization process during collection is to get as close as possible to equal response probability for respondents. The R indicator is based on the dispersion of the estimated response probabilities of the sampled households, and it is composed of partial R indicators that measure representativeness variable by variable. These R indicators are tools that we can use to analyze collection by isolating underrepresented population groups. We can increase collection efforts for groups that have been identified beforehand.

    In the oral presentation, we covered these two points concisely. By contrast, this paper deals exclusively with the first point: sub-allocation. Prioritization is being implemented for the first time at INSEE for the assets survey, and it will be covered in a specific paper by A. Rebecq.
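
    For readers unfamiliar with the R indicator, a minimal sketch of its usual sample-based form, R = 1 - 2S(rho-hat), with propensities fitted by logistic regression (the model and the category-level partial indicator below follow the standard literature, not necessarily INSEE's exact implementation):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def r_indicator(X, responded):
        """Sample-based R indicator: 1 - 2 * S(rho_hat), where S is the standard
        deviation of response propensities estimated by logistic regression."""
        rho = LogisticRegression(max_iter=1000).fit(X, responded).predict_proba(X)[:, 1]
        return 1.0 - 2.0 * rho.std(ddof=1), rho

    def partial_r(rho, groups):
        """Variable-level partial indicator: between-category dispersion of mean
        propensities; larger values flag stronger representativeness loss."""
        rho, groups = np.asarray(rho), np.asarray(groups)
        overall = rho.mean()
        between = sum((groups == g).mean() * (rho[groups == g].mean() - overall) ** 2
                      for g in np.unique(groups))
        return float(np.sqrt(between))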

    Release date: 2014-10-31

  • Technical products: 11-522-X200800010973
    Description:

    The Canadian Community Health Survey (CCHS) provides timely estimates of health information at the sub-provincial level. We explore two main issues that prevented us from using physical activity data from CCHS cycle 3.1 (2005) as part of the Profile of Women's Health in Manitoba. CCHS uses the term 'moderate' to describe physical effort that meets Canadian minimum guidelines, whereas elsewhere 'moderate' conversely describes sub-minimal levels of activity. A Manitoba survey of physical activity interrogates a wider variety of activities to measure respondents' daily energy expenditure. We found the latter survey better suited to our needs and likely a better measure of women's daily physical activity and health.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010964
    Description:

    Statistics Netherlands (SN) has been using electronic questionnaires for business surveys since the early nineties. Some years ago, SN decided to invest in large-scale use of electronic questionnaires. The big yearly production survey of about 80 000 forms, divided over many different economic activity areas, was redesigned using a metadata-driven approach. The resulting system is able to generate non-intelligent personalized PDF forms and intelligent personalized Blaise forms. The Blaise forms are used by a new tool in the Blaise system that respondents can download from the SN website to run the questionnaire off-line. Essential to the system is the SN house style for paper and electronic forms. The flexibility of the new tool gave the questionnaire designers the possibility to implement a user-friendly form that follows this house style.

    Part of the implementation is an audit trail that offers insight into the way respondents operate the questionnaire program. The entered data, including the audit trail, can be transferred to SN via encrypted e-mail or through the internet. The paper will give an outline of the overall system architecture and the role of Blaise in the system. It will also describe the results of several years of using the system and some results of the analysis of the audit trail.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010948
    Description:

    Past survey instruments, whether in the form of a paper questionnaire or a telephone script, were their own documentation. On this basis the ESRC Question Bank was created, providing free-access internet publication of questionnaires, enabling researchers to re-use questions and thereby save effort, whilst improving the comparability of their data with that collected by others. Today, however, as survey technology and computer programs have become more sophisticated, accurate comprehension of the latest questionnaires seems more difficult, particularly when each survey team uses its own conventions to document complex items in technical reports. This paper seeks to illustrate these problems and to suggest preliminary standards of presentation to be used until the process can be automated.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011010
    Description:

    The Survey of Employment, Payrolls and Hours (SEPH) is a monthly survey using two sources of data: a census of payroll deduction (PD7) forms (administrative data) and a survey of business establishments. This paper focuses on the processing of the administrative data, from the weekly receipt of data from the Canada Revenue Agency to the production of monthly estimates produced by SEPH.

    The edit and imputation methods used to process the administrative data have been revised in the last several years. The goals of this redesign were primarily to improve data quality and to increase consistency with another administrative data source (T4), which is a benchmark measure for Statistics Canada's System of National Accounts. An additional goal was to ensure that the new process would be easier to understand and to modify if needed. As a result, a new processing module was developed to edit and impute PD7 forms before their data are aggregated to the monthly level.

    This paper presents an overview of both the current and new processes, including a description of challenges that we faced during development. Improved quality is demonstrated both conceptually (by presenting examples of PD7 forms and their treatment under the old and new systems) and quantitatively (by comparison to T4 data).

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010956
    Description:

    The use of Computer Audio-Recorded Interviewing (CARI) as a tool to identify interview falsification is quickly growing in survey research (Biemer, 2000, 2003; Thissen, 2007). Similarly, survey researchers are starting to expand the usefulness of CARI by combining recordings with coding to address data quality (Herget, 2001; Hansen, 2005; McGee, 2007). This paper presents results from a study, included as part of the establishment-based National Center for Health Statistics' National Home and Hospice Care Survey (NHHCS), which used CARI behavior coding and CARI-specific paradata to: 1) identify and correct problematic interviewer behavior or question issues early in the data collection period, before either could negatively affect data quality; and 2) identify ways to diminish measurement error in future implementations of the NHHCS. During the first 9 weeks of the 30-week field period, CARI recorded a subset of questions from the NHHCS application for all interviewers. Recordings were linked with the interview application and output, and then coded in one of two modes: Code by Interviewer or Code by Question. The Code by Interviewer method provided visibility into problems specific to an interviewer as well as more generalized problems potentially applicable to all interviewers. The Code by Question method yielded data that spoke to the understandability of the questions and other response problems. In this mode, coders coded multiple implementations of the same question across multiple interviewers. Using the Code by Question approach, researchers identified issues with three key survey questions in the first few weeks of data collection and provided guidance to interviewers on how to handle those questions as data collection continued. Results from coding the audio recordings (which were linked with the survey application and output) will inform question wording and interviewer training in the next implementation of the NHHCS, and will guide future enhancement of CARI and the coding system.

    Release date: 2009-12-03

  • Technical products: 11-522-X200800010999
    Description:

    The choice of the number of call attempts in a telephone survey is an important decision. A large number of call attempts makes the data collection costly and time-consuming, while a small number of attempts decreases the response set from which conclusions are drawn and increases the variance. The decision can also affect the nonresponse bias. In this paper we study the effects of the number of call attempts on the nonresponse rate and the nonresponse bias in two surveys conducted by Statistics Sweden: the Labour Force Survey (LFS) and Household Finances (HF).

    Using paradata, we calculate the response rate as a function of the number of call attempts. To estimate the nonresponse bias, we use register variables for which observations are available for both respondents and nonrespondents. We also calculate estimates of some real survey parameters as functions of the number of call attempts. The results indicate that it is possible to reduce the current number of call attempts without increasing the nonresponse bias.
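
    A minimal sketch of the kind of calculation described (the variable names and the register-based bias measure are illustrative assumptions):

    import numpy as np

    def rate_and_bias_by_cap(attempt_of_response, register_value, max_attempts=10):
        """Response rate and estimated nonresponse bias as functions of a cap on
        call attempts. attempt_of_response[i] is the call on which unit i
        responded (np.inf for final nonrespondents); register_value[i] is known
        for respondents and nonrespondents alike."""
        a = np.asarray(attempt_of_response, dtype=float)
        x = np.asarray(register_value, dtype=float)
        full_mean = x.mean()                    # register mean over the whole sample
        for cap in range(1, max_attempts + 1):
            resp = a <= cap
            if not resp.any():
                continue
            bias = x[resp].mean() - full_mean   # bias of the respondent mean
            print(f"cap={cap:2d}  response rate={resp.mean():6.1%}  est. bias={bias:+.3f}")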

    Release date: 2009-12-03

  • Technical products: 11-522-X200800011008
    Description:

    In one sense, a questionnaire is never complete. Test results, paradata and research findings constantly provide reasons to update and improve the questionnaire. In addition, establishments change over time and questions need to be updated accordingly. In reality, it doesn't always work like this. At Statistics Sweden there are several examples of questionnaires that were designed at one point in time and rarely improved later on. However, we are currently trying to shift the perspective on questionnaire design from a linear to a cyclic one. We are developing a cyclic model in which the questionnaire can be improved continuously in multiple rounds. In this presentation, we will discuss this model and how we work with it.

    Release date: 2009-12-03
