Methodology and data quality

Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please "contact us" to request a format other than those available.

Introduction

This section describes the methodology of the Annual Motor Carriers of Freight (AMCF) Survey of Small For-hire Carriers and Owner Operators. It covers the target and survey populations, the sample design, and the data processing and estimation methods. The quality of the data presented in this publication is also discussed, and quality indicators are given for some key statistics.

This information will provide the user with a better understanding of the strengths and limitations of the data, and how they can be effectively used and analyzed. The information may be of particular importance when making comparisons with data from other surveys or sources of information, and in drawing conclusions regarding changes over time, differences between geographic areas and differences among sub-groups of the target population.

Several terms used in this chapter are defined below:

Target population: all units (e.g. carriers) for which the information is required.

Survey population: all units (e.g. carriers) for which the survey can realistically provide information. The survey population may differ from the target population due to the operational difficulty of identifying all the units that belong to the target population.

Survey frame: a list of all units in the survey population, together with classification information for each unit (e.g. industrial, geographical and size). This list is used for sample design and selection.

Stratification: a non-overlapping partition of the survey population into relatively homogeneous groups with respect to certain characteristics such as geographical and industrial classification, size, etc. These groups are called strata and are used for sample allocation and selection.

Sampling weight: a raising factor attached to each sampled unit so that estimates for the population can be obtained from a sample. The basic concept of the sampling weight can be explained using the representation rate. For example, if 2 units are selected at random out of 10 population units, each selected unit represents 5 units in the population, including itself, and is given a sampling weight of 5. With a complex sample design the sampling weight is more complicated to calculate, but it is still equal to the number of units in the population that the sampled unit represents.
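For the simple random sampling case in the example, the weight is the inverse of the selection probability n/N. The short Python sketch below illustrates only that simple case; it is not the survey's own weighting procedure, which involves two sampling phases and calibration. The function name is hypothetical.

```python
def sampling_weight(population_size: int, sample_size: int) -> float:
    """Weight of a unit drawn by simple random sampling without replacement.

    Each sampled unit is selected with probability n / N, so its weight is
    N / n: the number of population units it represents, itself included.
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    return population_size / sample_size


# The example from the text: 2 units selected at random out of 10.
w = sampling_weight(10, 2)
print(w)      # 5.0
print(2 * w)  # 10.0: the weights of the sample add up to the population size
```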

Data sources and methodology

Survey objectives

The objective of this survey is to obtain information on small for-hire carriers and owner operators in terms of their structure and performance on an annual basis.

Populations

Target population

The target population includes all Canadian-domiciled for-hire motor carriers (companies) with annual operating revenues greater than or equal to $30 thousand and less than $1 million, as well as all Canadian-domiciled owner operators with annual operating revenues of $30 thousand and more. Courier and messenger services are not covered by this survey, nor are private carriers.

Survey population

The 2003 survey population consists of all companies on Statistics Canada's Business Register, the Central Frame Data Base (CFDB), that are classified as trucking companies and have annual gross business income greater than or equal to $30 thousand and less than $1 million, or annual gross business income of $1 million and more if these companies are known to be owner operators. In addition, the survey population includes some companies from an administrative file of 2003 tax filers that are classified as trucking companies, have annual operating revenues greater than or equal to $30 thousand and less than $1 million, and are not yet found on the Business Register.

Sample design

The number of trucking companies on the survey frame is large and for that reason a sample of them is selected to represent the population. The survey uses a two-phase sample design, where a large first-phase sample is selected and the second-phase sample is drawn as a sub-sample from these companies. The design of the sampling procedures used in each phase is described below.

First-phase

The first-phase sample is drawn independently for incorporated and unincorporated companies. The unincorporated companies on the Business Register are classified by province or territory of domicile, North American Industry Classification System (NAICS) code, and size (measured by annual gross business income). Companies with the same classification form a stratum within which a first-phase sample is selected. All incorporated companies on the survey frame are included in the first-phase sample.
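The first-phase rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the text does not say how units are selected within a stratum, so simple random sampling with a single hypothetical sampling fraction is assumed, and the field names (`incorporated`, `province`, `naics`, `size_class`) are invented for the example. Incorporated companies are taken with certainty, so each represents only itself.

```python
import random
from collections import defaultdict


def first_phase_sample(units, sampling_fraction, seed=0):
    """Hypothetical first-phase selection; returns {unit id: first-phase weight}.

    Incorporated units are all selected (weight 1). Unincorporated units are
    grouped into strata by (province, NAICS code, size class), and a simple
    random sample of about `sampling_fraction` of each stratum is drawn.
    """
    rng = random.Random(seed)
    weights = {}
    strata = defaultdict(list)
    for u in units:
        if u["incorporated"]:
            weights[u["id"]] = 1.0  # take-all: the unit represents only itself
        else:
            strata[(u["province"], u["naics"], u["size_class"])].append(u["id"])
    for ids in strata.values():
        n = max(1, round(sampling_fraction * len(ids)))
        for uid in rng.sample(ids, n):
            weights[uid] = len(ids) / n  # N_h / n_h
    return weights


# Toy frame: 3 incorporated carriers, 6 unincorporated in Ontario, 4 in Quebec.
frame = [{"id": f"inc{i}", "incorporated": True, "province": "ON",
          "naics": "484", "size_class": 1} for i in range(3)]
frame += [{"id": f"unc{i}", "incorporated": False,
           "province": "ON" if i < 6 else "QC", "naics": "484", "size_class": 1}
          for i in range(10)]
w1 = first_phase_sample(frame, sampling_fraction=0.5)
print(len(w1), sum(w1.values()))  # 8 13.0
```

The weights of the selected units add up to the 13 units on the toy frame, which is the property the representation-rate definition of the sampling weight implies.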

Second-phase

The second-phase sample is a sub-sample of the first-phase sample. The companies included in the first-phase sample are again classified using province/territory of domicile, NAICS code, and size. Companies with the same classification form a stratum within which a second-phase sample is selected.
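In a two-phase design, a unit in the second-phase sample represents the first-phase units of its stratum, and each of those in turn represents population units. A standard way to express this is the double expansion weight: the first-phase weight multiplied by m_h / r_h, where m_h is the number of first-phase units in second-phase stratum h and r_h is the number sub-sampled. The sketch below assumes that form; the survey's final weights are also calibrated, which is not shown, and all names are hypothetical.

```python
from collections import Counter


def double_expansion_weights(first_phase_weights, phase2_stratum, selected):
    """Hypothetical two-phase (double expansion) weights.

    first_phase_weights: {unit id: first-phase weight}
    phase2_stratum: {unit id: second-phase stratum} for every first-phase unit
    selected: ids drawn into the second-phase sub-sample
    The overall weight is the first-phase weight times m_h / r_h, where m_h is
    the number of first-phase units in stratum h and r_h the number sub-sampled.
    """
    m = Counter(phase2_stratum[uid] for uid in first_phase_weights)
    r = Counter(phase2_stratum[uid] for uid in selected)
    missing = set(m) - set(r)
    if missing:
        raise ValueError(f"no second-phase units in strata {missing}")
    return {uid: first_phase_weights[uid] * m[phase2_stratum[uid]] / r[phase2_stratum[uid]]
            for uid in selected}


w1 = {"a": 2.0, "b": 2.0, "c": 2.0, "d": 2.0, "e": 1.0}
strata = {"a": "ON", "b": "ON", "c": "ON", "d": "ON", "e": "QC"}
w = double_expansion_weights(w1, strata, selected={"a", "c", "e"})
print(w["a"], w["c"], w["e"])  # 4.0 4.0 1.0
```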

The overall size of the second-phase sample and its allocation among strata are determined so as to satisfy precision requirements for provincial estimates under given cost constraints. The coefficient of variation is used as the measure of precision. The sample size and the estimated population size by province or territory of domicile are given in Table 21.
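The coefficient of variation (CV) is the standard error of an estimate divided by the estimate itself. As a simplified illustration, the sketch below uses the textbook variance formula for an estimated total under single-phase stratified simple random sampling without replacement. It ignores the second phase, calibration and imputation, so it is not the variance estimator actually used for this survey; the function name and toy values are hypothetical.

```python
import math
from statistics import mean, variance


def stratified_total_cv(strata):
    """Estimated total and its CV (%) under stratified simple random sampling.

    strata: list of (N_h, values_h) pairs, with N_h the stratum population size
    and values_h the reported values for the n_h sampled units.
    Uses the textbook variance sum of N_h^2 * (1 - n_h/N_h) * s_h^2 / n_h.
    """
    total = 0.0
    var = 0.0
    for N_h, values in strata:
        n_h = len(values)
        total += N_h * mean(values)
        if n_h == N_h:
            continue  # take-all stratum: no sampling variance
        if n_h < 2:
            raise ValueError("need at least 2 sampled units to estimate s_h^2")
        var += N_h ** 2 * (1 - n_h / N_h) * variance(values) / n_h
    return total, 100 * math.sqrt(var) / total


# One sampled stratum (4 of 100 carriers) and one take-all stratum (3 of 3).
total, cv = stratified_total_cv([(100, [10, 12, 14, 16]), (3, [50, 60, 70])])
print(total, round(cv, 2))  # 1480.0 8.55
```

The take-all stratum contributes to the total but not to the variance, which is why allocating more sample to highly variable strata is an effective way to meet a CV target.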

Data collection and processing

During the collection period, financial data are obtained for all units in the first-phase sample from an administrative file of tax filers. The companies included in the second-phase sample are contacted by telephone interview to collect operational data.

The survey data are checked for errors and inconsistencies. Erroneous or missing values are replaced with consistent values (that is, they are imputed) using Statistics Canada's Generalized Edit and Imputation System. The system imputes data using imputation rules that select representative data from another carrier. The data are then verified by subject matter specialists.
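The internal rules of the Generalized Edit and Imputation System are not described here. As an illustration of the general idea of taking representative data from another carrier, the sketch below shows one common form, nearest-neighbour donor imputation: a missing value is copied from the responding carrier in the same group whose auxiliary value (here revenue) is closest. The function, field names and matching variable are all assumptions made for the example.

```python
def donor_impute(records, target, aux, group):
    """Fill missing `target` values from the nearest respondent in the same group.

    The donor is the respondent in the same `group` whose `aux` value is
    closest to the recipient's. Missing values are None. Returns new records
    with an added boolean "imputed" flag.
    """
    donors = [r for r in records if r[target] is not None]
    out = []
    for r in records:
        if r[target] is not None:
            out.append({**r, "imputed": False})
            continue
        pool = [d for d in donors if d[group] == r[group]]
        if not pool:
            raise ValueError(f"no donor for record {r}")
        donor = min(pool, key=lambda d: abs(d[aux] - r[aux]))
        out.append({**r, target: donor[target], "imputed": True})
    return out


carriers = [
    {"id": 1, "province": "ON", "revenue": 100_000, "fuel_litres": 20_000},
    {"id": 2, "province": "ON", "revenue": 300_000, "fuel_litres": 55_000},
    {"id": 3, "province": "ON", "revenue": 280_000, "fuel_litres": None},
]
filled = donor_impute(carriers, target="fuel_litres", aux="revenue", group="province")
print(filled[2]["fuel_litres"], filled[2]["imputed"])  # 55000 True
```

Keeping an explicit "imputed" flag makes it possible to measure afterwards how much of each estimate comes from imputed values.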

Estimation

Since only a sample of carriers was contacted for the AMCF survey, the individual values are weighted to represent the whole industry within the scope of the survey. The value of each carrier is multiplied by the weight for that carrier, and then the weighted data from all sampled carriers belonging to a given estimation domain (e.g. Ontario) are summed to obtain the estimate.
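The weighted sum described above can be sketched in a few lines of Python. The record layout, the use of province as the estimation domain, and the toy weights are assumptions for illustration only.

```python
from collections import defaultdict


def domain_totals(sample, variable, domain="province"):
    """Weighted total of `variable` for each estimation domain."""
    totals = defaultdict(float)
    for unit in sample:
        # Each carrier's value is multiplied by its weight, then summed by domain.
        totals[unit[domain]] += unit["weight"] * unit[variable]
    return dict(totals)


sample = [
    {"province": "ON", "weight": 4.0, "revenue": 150_000},
    {"province": "ON", "weight": 2.5, "revenue": 400_000},
    {"province": "QC", "weight": 3.0, "revenue": 200_000},
]
print(domain_totals(sample, "revenue"))  # {'ON': 1600000.0, 'QC': 600000.0}
```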

Reference period

The reference period for collection purposes is the firm's own 12-month accounting period whose year-end occurred on any date from April 1, 2003 to March 31, 2004, inclusive.
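The inclusive date window amounts to a simple check on each firm's fiscal year-end. The constant and function names below are hypothetical; only the two dates come from the text.

```python
from datetime import date

# Reference window for the 2003 survey, as stated in the text (inclusive).
REFERENCE_START = date(2003, 4, 1)
REFERENCE_END = date(2004, 3, 31)


def in_reference_period(fiscal_year_end: date) -> bool:
    """True if a firm's 12-month accounting period ends inside the window."""
    return REFERENCE_START <= fiscal_year_end <= REFERENCE_END


print(in_reference_period(date(2003, 12, 31)))  # True
print(in_reference_period(date(2004, 3, 31)))   # True (end date is inclusive)
print(in_reference_period(date(2004, 4, 1)))    # False
```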

Data quality

Every survey is subject to error. While considerable effort is made to ensure a high standard throughout all survey operations, the resulting estimates are inevitably subject to a certain degree of error. The total survey error is defined as the difference between the survey estimate and the true population value that the estimate aims to measure. The total survey error consists of two types of error: sampling error and non-sampling error. Sampling error arises because only a sample, rather than the whole population, is surveyed. Non-sampling errors arise for reasons other than sampling. These two types of error are explained further below.

Sampling error

The Annual Motor Carriers of Freight Survey of Small For-hire Carriers and Owner Operators is subject to sampling error. When a sample is selected from a population, estimates based on the sample data will not in general be exactly the same as what would be obtained from a census of that population.

The difference between the estimates from a sample survey and a census conducted under the same conditions is referred to as the sampling error. Factors such as the sample size, the sample design, the variability of the population characteristic under study and the estimation method affect the sampling error. In general, a larger sample size produces a smaller sampling error. If the population is very heterogeneous, as the trucking industry is, a large sample size is needed to obtain a reliable estimate.
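The roles of sample size and heterogeneity can be seen in the textbook case of a simple random sample of n units drawn without replacement from N units, with the total estimated as N times the sample mean. This is a simplified illustration, not the variance formula for this survey's two-phase design. Here S^2 is the population variance of the characteristic and Y its true total:

```latex
\operatorname{Var}(\hat{Y}) = N^{2}\left(1 - \frac{n}{N}\right)\frac{S^{2}}{n},
\qquad
\operatorname{CV}(\hat{Y}) = \frac{\sqrt{\operatorname{Var}(\hat{Y})}}{Y}
```

The variance falls as n grows, rises with the heterogeneity S^2, and reaches zero when n = N, that is, when the sample becomes a census.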

Non-sampling errors

The sampling error is only one component of the total survey error. Errors arising from all other sources, in any phase of a survey, are called non-sampling errors. For example, non-sampling error can arise when a respondent provides incorrect information or does not answer certain questions, when a unit in the target population is omitted or covered more than once, when a unit that is out of scope for the survey is included by mistake, or when errors occur in data processing, such as coding and capture errors.

The effects of some of the non-sampling errors will cancel out over a large number of observations, but systematically occurring errors (i.e. those that do not tend to cancel) will contribute to a bias in the estimates. For example, if carriers consistently tend to under-report their revenues, then the resulting estimate of the total revenues will be below the true population total.
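A toy calculation makes the distinction concrete. The revenues, the offsetting errors and the 5% under-reporting rate below are all invented for illustration; integer arithmetic is used so the totals are exact.

```python
# Toy true revenues for four carriers (hypothetical figures, in dollars).
true_revenue = [100_000, 200_000, 300_000, 400_000]

# Non-systematic reporting errors: some carriers over-report, others under-report.
offsetting_errors = [8_000, -8_000, 5_000, -5_000]
reported_offsetting = [y + e for y, e in zip(true_revenue, offsetting_errors)]

# Systematic error: every carrier reports 5% less than its true revenue (assumed rate).
reported_systematic = [y * 95 // 100 for y in true_revenue]

print(sum(true_revenue))         # 1000000
print(sum(reported_offsetting))  # 1000000: the errors cancel in the total
print(sum(reported_systematic))  # 950000: the total is biased downward
```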

As the sample size becomes closer to the population size, the sampling error component of the total survey error is expected to decrease. However, this is not necessarily true for the non-sampling error component.

In general, non-sampling errors are difficult to evaluate and special studies must be conducted to estimate them. However, certain measures such as imputation rates are easily obtained and can be used as indicators for portions of the non-sampling errors. Different types of non-sampling error together with their associated measures are discussed below.

Coverage errors

Coverage errors arise when the survey frame does not adequately cover the target population. As a result, certain units belonging to the target population are either excluded (undercoverage) or counted more than once (overcoverage). In addition, out-of-scope units may be present in the survey frame (overcoverage). Errors in the NAICS code on the survey frame may also result in either over- or undercoverage of the trucking industry.

Response errors

Response errors occur when a respondent provides incorrect information because of misinterpretation of the survey questions or lack of correct information, gives wrong information by mistake, or is reluctant to disclose the correct information. Large response errors are likely to be caught during editing; others may go undetected.

Non-response errors

Non-response errors can occur when a respondent does not respond at all (total non-response) or responds only to some questions (partial non-response). These errors can have a serious effect if non-respondents are systematically different from respondents in survey characteristics and/or the non-response rate is high.

Processing errors

Apart from coverage, response and non-response errors as described above, errors that occur during the processing of the data constitute another component of the non-sampling error. Processing errors can arise in data capture, coding, transcription, imputation, outlier detection and treatment, and other types of data handling.

A coding error occurs when a field is coded erroneously because of misinterpretation of coding procedures or poor judgment (e.g. errors in NAICS coding). A data capture error occurs when data are misinterpreted or keyed incorrectly. For this survey, errors in financial data can occur when the data are being transcribed from the tax returns.

Once data are coded and captured, they are subject to editing and imputation of missing or erroneous values. The quality of the data depends on the amount of imputation and the difference between the imputed and the true, but unknown, values. Using invalid assumptions when developing the imputation system could result in bias in the imputed data.

The non-sampling error as a whole is only one part of the total survey error, but its contribution may be important. To minimize the effect of this type of error, a quality assurance program is carried out for this survey. For instance, various quality assurance procedures are applied at the data capture step. The data editing procedures identify some inconsistencies in the data, and the imputation procedures correct the inconsistencies identified.

Some measures of data quality

This section presents some indicators of the data quality of the Annual Motor Carriers of Freight Survey of Small For-hire Carriers and Owner Operators, as shown in Table 22. To assist the user in evaluating the potential effect of non-response and imputation, relative imputation rates for key characteristics (number of employees, fuel consumed, and operating revenues) are presented. The relative imputation rate is defined as the proportion of the corresponding published estimate that is accounted for by imputed data. For example, assume that the total published estimate is $25 million, composed of $20 million from non-imputed data and $5 million from imputed data. Then the relative imputation rate is 0.2 ($5 million divided by $25 million), or 20%. All else being equal, the lower the relative imputation rate, the more reliable the published estimate.
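Because published estimates are weighted sums, the relative imputation rate can be computed as the weighted contribution of imputed values divided by the full weighted estimate. The sketch below assumes each record carries a weight and a boolean "imputed" flag (hypothetical field names) and reproduces the $5 million out of $25 million example.

```python
def relative_imputation_rate(records, variable):
    """Share of the weighted estimate of `variable` that comes from imputed values."""
    estimate = sum(r["weight"] * r[variable] for r in records)
    imputed = sum(r["weight"] * r[variable] for r in records if r["imputed"])
    return imputed / estimate


# The example from the text: $20 million reported plus $5 million imputed.
records = [
    {"weight": 1.0, "revenue": 20_000_000, "imputed": False},
    {"weight": 1.0, "revenue": 5_000_000, "imputed": True},
]
print(relative_imputation_rate(records, "revenue"))  # 0.2
```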

A total response rate is also provided. This rate is defined as the number of carriers that responded to the survey divided by the total number of in-scope units in the sample.
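The response rate definition translates directly into code. The sketch assumes each sampled unit has boolean `in_scope` and `responded` flags (hypothetical names) and counts only in-scope respondents in the numerator.

```python
def response_rate(sample):
    """Responding in-scope carriers divided by all in-scope sampled units."""
    in_scope = [u for u in sample if u["in_scope"]]
    if not in_scope:
        raise ValueError("no in-scope units in the sample")
    return sum(u["responded"] for u in in_scope) / len(in_scope)


sample = [
    {"id": 1, "in_scope": True, "responded": True},
    {"id": 2, "in_scope": True, "responded": True},
    {"id": 3, "in_scope": True, "responded": False},
    {"id": 4, "in_scope": True, "responded": True},
    {"id": 5, "in_scope": False, "responded": False},  # e.g. no longer in trucking
]
print(response_rate(sample))  # 0.75
```

Excluding out-of-scope units from the denominator keeps frame errors (a coverage problem) from being counted as non-response.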

As a measure of the sampling error, estimated coefficients of variation (CV) for some variables are presented in Table 22. CVs for other estimates may be obtained from the Transportation Division upon request. Note that the provided CV estimates do not consider the fact that some of the data were imputed and thus may underestimate the true CVs. The CV and the relative imputation rate should be considered simultaneously to make an assessment of the reliability of an estimate.

The quality of the estimates is classified as follows:

Excellent: CV between 0.01% and 4.99%
Very good: CV between 5.00% and 9.99%
Good: CV between 10.00% and 14.99%
Acceptable: CV between 15.00% and 24.99%
Caution: CV between 25.00% and 34.99%
Unreliable: CV of 35.00% or more
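The classification can be applied mechanically to a CV expressed in percent. The sketch below assumes CVs are reported to two decimal places, so each class can be taken to run up to the next lower bound; it also treats any CV below 5.00% as Excellent, although the published range starts at 0.01%. The function name is hypothetical.

```python
def quality_rating(cv_percent: float) -> str:
    """Quality class for a CV expressed in percent."""
    if cv_percent < 0:
        raise ValueError("a CV cannot be negative")
    bands = [(5.0, "Excellent"), (10.0, "Very good"), (15.0, "Good"),
             (25.0, "Acceptable"), (35.0, "Caution")]
    for upper, label in bands:
        if cv_percent < upper:  # each class runs up to x.99%, i.e. below `upper`
            return label
    return "Unreliable"


for cv in (3.2, 8.55, 14.99, 25.0, 40.0):
    print(cv, quality_rating(cv))
```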

Comparability of data

For the 1999 reference year, changes were made to the derivation of certain financial variables from the administrative tax files. The variables for which historical comparison may be affected are transportation revenues, fuel expenses, owner-operator expenses and miscellaneous expenses.

Effective for the 1998 reference year, the survey underwent a major redesign. The redesign involved major changes to the frame creation process, the sample design and the estimation strategy. The frame for 1998 was created from Statistics Canada's Business Register. The sample design at the second phase has been optimized, and the estimation strategy now uses a calibration approach to make use of information that became available after the sample had been drawn (e.g. an updated frame is used in calculating the estimation weights to make use of updates to the Business Register since the sample was drawn). The overall effect of these changes is improved reliability of the survey estimates. In particular, this improved reliability comes from i) improved coverage of the target population (which results in an increase in the estimated number of in-scope companies), and ii) improved precision of the survey estimates (i.e. lower coefficients of variation).
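The text does not specify which calibration estimator is used. As a hedged illustration of the idea, the sketch below uses post-stratification, the simplest calibration estimator: within each group, design weights are rescaled so that they add up to the count of units on the updated frame. Group labels, counts and field names are hypothetical.

```python
from collections import defaultdict


def poststratify(sample, frame_counts):
    """Rescale weights so that, within each group, they sum to the updated frame count.

    sample: list of dicts with "group" and "weight" keys.
    frame_counts: {group: number of units on the updated frame}.
    """
    weighted = defaultdict(float)
    for unit in sample:
        weighted[unit["group"]] += unit["weight"]
    return [{**unit, "weight": unit["weight"] * frame_counts[unit["group"]] / weighted[unit["group"]]}
            for unit in sample]


# Design weights reflect the frame at sampling time; the updated frame has
# 12 units in group A (was 10) and 3 in group B (was 4).
sample = [
    {"id": 1, "group": "A", "weight": 5.0},
    {"id": 2, "group": "A", "weight": 5.0},
    {"id": 3, "group": "B", "weight": 4.0},
]
print([u["weight"] for u in poststratify(sample, {"A": 12, "B": 3})])  # [6.0, 6.0, 3.0]
```

More general calibration methods adjust weights to several auxiliary totals at once, but the principle of matching known frame information is the same.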

In order to improve the combined coverage of the trucking industry by the AMCF survey and its complement, the Quarterly Motor Carriers of Freight (QMCF) survey, a group of large companies has been included in the AMCF since the 1998 reference year. The QMCF defines its survey population using the annual gross business income from the Business Register, which is not identical to the true annual operating revenues. As a result, some companies in the QMCF target population (those with annual gross business income less than $1 million and actual annual operating revenues over $1 million) are not included in the QMCF survey population. Although these companies do not fall into the target population of the AMCF, it was decided to include them in the AMCF survey population to improve the overall industry coverage of the two surveys. The effect is the addition of a small number of large companies to the AMCF survey population. Although the number of such companies is small, their large size leads to a noticeable increase in the survey estimates. To facilitate historical comparisons, separate domain estimates are produced for i) companies with actual operating revenue less than $1 million and ii) companies with actual operating revenue greater than or equal to $1 million.

From 1995 to 1997, the survey covered for-hire carriers and owner operators with annual operating revenues greater than or equal to $30 thousand and less than $1 million. Starting with 1998, owner operators with annual operating revenues greater than or equal to $1 million are also covered.

From 1990 to 1995, the survey covered for-hire carriers and owner operators with annual operating revenues greater than or equal to $25 thousand and less than $1 million.

The survey data prior to the 1990 survey covered for-hire carriers with annual operating revenues of $100 thousand or more. Owner operators were not included in the 1989 and preceding surveys.