Methodology
Data concepts and definitions
This section explains the basic methodology used to estimate the number of exporters by industry (NAICS), exporter size, province of residence, destination and number of employees (for 2007 only). Essentially, there are two fundamental parts involved in this process: the data linkage process and the estimation of the unlinked documents.
Statistics Canada obtains trade data from two main sources: U.S. Customs documents and Canada Border Services Agency (CBSA) documents.
In 1990, a Memorandum of Understanding (MOU) was signed between Canada and the United States to exchange import data. Through this MOU, each country obtains a comprehensive list of exports to the other country. This is currently the largest source of export data in Canada. All remaining data on Canadian commodity exports destined for consumption in countries other than the United States are obtained from CBSA documents. The data from the two different sources are processed differently during the linkage process.
Exports to the United States: According to the Exporter Register Database, exports to the United States accounted for almost 79% of the value of Canada’s annual domestic exports in 2007 (Table 4-2 and Table 4-3). Each U.S. Customs document contains a vendor identification (ID) code. This code is constructed using the name and address of the Canadian exporter.
For each vendor ID code, it is necessary to:
The duplication problem arises because the descriptive information (namely, vendor name and address) is not a standardized field on the U.S. Customs document.
For example, the municipality of "ST JOHNS" (as it is written in the StatCan municipality library) in Newfoundland (standardized province is 10) has been reported in a number of ways, including "Saint Johns", "St. Johns", "St. John's", "Saint John", "Saint Jean" and "St Jean", while the province has been reported as "Newfoundland", "Nfld", "Terre-Neuve", and "TN".
This makes any automated linkage exercise very difficult, because each different spelling or listing is considered a different item. So, an initial automated processing of the file is performed using the Postal Address Analysis System at Statistics Canada. This generalized application attempts to rearrange a freeform address into standardized positioned components.
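The kind of standardization described here can be illustrated with a minimal sketch. It is not the Postal Address Analysis System; it only assumes a small, hypothetical alias table built from the variants quoted above, keyed by the standardized province code so that a spelling such as "Saint John" is only mapped to "ST JOHNS" when the province has already been resolved to 10.

```python
import re

# Hypothetical alias tables built from the reported variants quoted in the text.
# The real StatCan municipality library is not reproduced here.
PROVINCE_ALIASES = {
    "NEWFOUNDLAND": "10",
    "NFLD": "10",
    "TERRE-NEUVE": "10",
    "TN": "10",
}
MUNICIPALITY_ALIASES = {
    ("10", "SAINT JOHNS"): "ST JOHNS",
    ("10", "ST JOHNS"): "ST JOHNS",
    ("10", "SAINT JOHN"): "ST JOHNS",
    ("10", "SAINT JEAN"): "ST JOHNS",
    ("10", "ST JEAN"): "ST JOHNS",
}


def _clean(text: str) -> str:
    """Uppercase, drop periods and apostrophes, and collapse whitespace."""
    text = re.sub(r"[.'’]", "", text.upper())
    return " ".join(text.split())


def standardize(municipality: str, province: str) -> tuple:
    """Return (standard municipality, standard province code).

    Raises KeyError when a spelling is not in the alias tables, which is
    where manual review would be needed in this sketch.
    """
    province_code = PROVINCE_ALIASES[_clean(province)]
    standard_name = MUNICIPALITY_ALIASES[(province_code, _clean(municipality))]
    return standard_name, province_code


result = standardize("St. John's", "Nfld")
```

With these tables, every spelling listed above collapses to the same pair, so later linkage steps compare one item instead of several.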
Exports to destinations other than the United States: According to the Exporter Register Database, exports to non-U.S. destinations accounted for about 21% of the value of Canada’s total domestic exports in 2007 (Table 4-1 and Table 4-3).
An exporter ID code is attached to each record. Unlike documents for exports to the United States, the exporter ID code can come from various sources. The exporter ID can be a payroll deduction number, a Customs and Excise number or, since 1997, a business number. However, in many cases, the exporter ID field is not completed. In such instances, a ‘dummy’ StatCan code is assigned, and then the name and address information is captured and stored. Each of the previously mentioned codes also has a repository of names and addresses.
For each exporter ID code, it is necessary to:
As with exports to the United States, the present descriptive information (name and address) is not standardized. Again, an initial automated processing of the file is performed using the Postal Address Analysis System.
After the standardizing and unduplication processes are completed, it is then possible to aggregate exports by unique exporter at the location level.
This process delivers a concordance file containing many initial ID codes for U.S. and non-U.S. destinations linked to one standardized exporter ID.
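As a rough illustration of how such a concordance could be used, the sketch below maps several initial ID codes to one standardized exporter ID and then aggregates export values by unique exporter. All ID codes, values and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical concordance: initial ID code -> standardized exporter ID.
concordance = {
    "US-V001": "EXP-1",
    "US-V002": "EXP-1",    # same exporter, address spelled differently
    "CBSA-BN77": "EXP-1",  # same exporter, non-U.S. document
    "US-V003": "EXP-2",
}

# Hypothetical document records: (initial ID code, export value in dollars).
documents = [
    ("US-V001", 40_000),
    ("US-V002", 10_000),
    ("CBSA-BN77", 25_000),
    ("US-V003", 5_000),
]


def aggregate_by_exporter(documents, concordance):
    """Sum export values per standardized exporter ID."""
    totals = defaultdict(int)
    for initial_id, value in documents:
        totals[concordance[initial_id]] += value
    return dict(totals)


totals = aggregate_by_exporter(documents, concordance)
```

In this example three initial codes resolve to a single exporter, so four documents yield two unique exporters.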
The final step is to ensure a proper linkage between the Business Register and the new file of exporters created for the Exporter Register Database.
Non-residents: Where feasible, exports by non-residents are allocated to their Canadian subsidiaries. For example, if a U.S. corporation is listed as the exporter of record on the Customs documentation for a given domestic export from Canada, then the corporation’s Canadian subsidiary, not the U.S. establishment, will be linked as the exporter. When no Canadian subsidiary exists, the non-resident’s documents are considered unlinked, and the corresponding Canadian exporters are estimated during the estimation process.
A relatively small but significant portion of the documents was not successfully linked to the Business Register. Therefore, based on the linked portion alone, the number of exporters underestimates the true size of the exporting community.
Moreover, the linked portion cannot provide consistent estimates when the linkage rate changes over time. This is the case for exports to countries other than the United States, where the proportion of unlinked documents shrank from an average of about 45% between 1993 and 1995 to around 10% between 1996 and 2007. By contrast, coverage for U.S. destinations was high and relatively constant from 1993 to 2007 (Table 11).
The number of exporting establishments and the value of their exports were estimated for the unlinked portion, in order to provide a more complete and reliable picture of the exporting community.
The estimation methodology first uses the patterns of the linked portion to provide estimates for the unlinked portion, and then follows these steps:
First, for 1997 to 1999, the estimated total value of non-captured documents is distributed to commodities, provinces and destinations, for inclusion in the estimates as part of the unlinked portion. These non-captured documents show exports of less than $10,000 in value to non-U.S. destinations. This is done using the distribution of the value observed in similar recorded transactions within the linked portion of exports to non-U.S. destinations. All documents were captured in 2007 regardless of destination or export value.
Second, the export value of the unlinked portion is distributed by NAICS industry, exporter size and employment (for 2007 only) based on observed patterns in the linked portion. For example, in the Fruit and other vegetable farms industry, if the export values of apples in documents of $30,000 to $100,000 have been equally reported by establishments of two sizes ($30,000 to $99,999 and $100,000 to $999,999) in the linked portion, then the value of the exported apples in an unlinked $50,000 document would be distributed equally between these two exporter sizes in this industry.
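The apple example amounts to a proportional allocation. The sketch below assumes a hypothetical linked-portion pattern in which the two exporter sizes report equal values; the dollar totals and the function name are illustrative, not taken from the database.

```python
def distribute(value, linked_values_by_class):
    """Split an unlinked value across classes in proportion to linked values."""
    total = sum(linked_values_by_class.values())
    return {
        cls: value * linked_value / total
        for cls, linked_value in linked_values_by_class.items()
    }


# Hypothetical linked-portion values of apple exports in documents of
# $30,000 to $100,000, by exporter size class.
linked_pattern = {
    "$30,000 to $99,999": 1_200_000,
    "$100,000 to $999,999": 1_200_000,
}

# An unlinked $50,000 document is split according to the linked pattern.
allocation = distribute(50_000, linked_pattern)
```

Because the linked values are equal in this example, each size class receives $25,000 of the unlinked document.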
Third, the province of origin reported on the unlinked documents is used to approximate the province of residency of the exporters.
Fourth, the destination reported on the unlinked portion by NAICS industry, size and employee class (for 2007 only) is distributed to various trading area combinations based on the linked patterns. For example, exports to Japan of $30,000 to $100,000 from the Fruit and other vegetable farms industry would be equally distributed to ‘Japan only’ and ‘Japan and Mexico’, if this were the pattern observed in the linked portion. This is necessary because an exporter can export to multiple countries. Therefore, summing the number of exporters by destination will not yield an accurate number of exporters. The distribution by trading area combination tries to split exports by ‘unique exporters’, where the sum of exporters by these trading area combinations equals the total number of exporters.
It is assumed that the average export value per establishment should be the same for a given industry, size and employee class (for 2007 only) across provinces and destinations. The geometric mean formula has been used to compute this average because of the uneven distribution of exports by establishment: there is a much greater number of smaller exporting establishments than larger ones.
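The report's exact formula is not reproduced in this section. Assuming the standard geometric mean (the n-th root of the product of n values), the following sketch shows why it suits a skewed distribution: a single large exporter pulls the arithmetic mean far more than the geometric mean. The export values are hypothetical.

```python
import statistics

# Hypothetical export values for establishments in one group:
# several small exporters and one much larger one.
exports = [1_000, 10_000, 100_000]

arithmetic = statistics.mean(exports)            # dominated by the largest value
geometric = statistics.geometric_mean(exports)   # closer to the typical exporter
```

Here the arithmetic mean is 37,000, while the geometric mean is about 10,000, which is nearer to what a typical establishment in the group exports.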
To obtain counts of exporting establishments, divide the exports (sorted by NAICS industry, size, province and trading area combination, as well as by employee class for 2007 only) by the average export value per establishment and size. Estimates of the population counts by destination are obtained by adding all the trading area combinations for each destination in which the unlinked portion is involved. For example, for Japan, to obtain the total number of unlinked exporters of size $30,000 to $100,000 for the Fruit and other vegetable farms industry, add the count of ‘Japan only’ plus ‘Japan and Mexico’; for Mexico, add ‘Mexico only’ plus ‘Japan and Mexico’. In this way, the exporter exporting to both Mexico and Japan is counted as exporting to both countries.
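A minimal sketch of this counting step follows, for a single industry, size and province cell. The export values, the average per establishment and the function names are hypothetical, and the rounding uses Python's built-in `round`, which may differ from the rounding rule actually applied.

```python
# Hypothetical unlinked export values by trading area combination.
unlinked_exports = {
    frozenset({"Japan"}): 200_000,            # 'Japan only'
    frozenset({"Japan", "Mexico"}): 100_000,  # 'Japan and Mexico'
    frozenset({"Mexico"}): 150_000,           # 'Mexico only'
}
average_per_establishment = 50_000  # hypothetical average for this cell


def counts_by_combination(exports, average):
    """Estimated establishments per combination: exports / average, rounded."""
    return {combo: round(value / average) for combo, value in exports.items()}


def counts_by_destination(combo_counts):
    """Add every combination that includes a destination to that destination."""
    totals = {}
    for combo, count in combo_counts.items():
        for destination in combo:
            totals[destination] = totals.get(destination, 0) + count
    return totals


combo_counts = counts_by_combination(unlinked_exports, average_per_establishment)
destination_counts = counts_by_destination(combo_counts)
unique_exporters = sum(combo_counts.values())
```

In this example the combinations give 4, 2 and 3 establishments, for 9 unique exporters, while the destination counts are 6 for Japan and 5 for Mexico. The destination counts sum to 11 because the two establishments exporting to both countries are counted under each.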
This methodology is applied at an aggregation level that balances homogeneity of the aggregates and reliability (minimum of observations). The most detailed level of industry classification available for establishments was the six-digit NAICS. To ensure a minimum number of exporters in the linked portion, establishments were aggregated to the four-digit NAICS level (or higher in some cases) to form 137 industry classes.
The exporter size, employment class (for 2007 only) and destination categories used in the tables of this publication were the same as those used for aggregation. The province and territory categories were used without aggregation. At this level of aggregation, estimated counts were rounded to the closest integer value.
In cases where unlinked documents did not have the corresponding patterns in the linked portion at the detailed level, the closest pattern available was used. For example, if the linked establishments did not export apples, then the exports of ‘unlinked apples’ were distributed according to the distribution of a more aggregated HS (Harmonized Commodity Description and Coding System) class for apples.
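One way to picture this fallback is a lookup that moves to a coarser HS level until a linked pattern is found. The sketch below assumes that "more aggregated" means shortening the HS code two digits at a time (six digits, then four, then two); the codes, shares and function name are illustrative assumptions rather than the procedure actually used.

```python
# Hypothetical linked-portion patterns keyed by HS code at any level of detail.
# Shares by exporter size are illustrative only.
linked_patterns = {
    "0808": {"$30,000 to $99,999": 0.5, "$100,000 to $999,999": 0.5},
}


def closest_pattern(hs_code, patterns):
    """Return (matched code, pattern), shortening the code two digits at a time."""
    code = hs_code
    while code:
        if code in patterns:
            return code, patterns[code]
        code = code[:-2]  # move up one level of aggregation
    raise KeyError(f"no linked pattern at any level for HS code {hs_code}")


# No pattern exists for the six-digit apple code assumed here ("080810"),
# so the four-digit class that contains it is used instead.
matched_code, pattern = closest_pattern("080810", linked_patterns)
```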
Results
The estimated counts for the unlinked portion represent 4% of the total number of exporters from 1996 to 2002, and 8% from 2003 to 2007. This is similar to the proportions of unlinked documents over the same periods. The proportion of unlinked value is only about 2% from 1993 to 2001, and about 5% from 2003 to 2007. This reflects the fact that low-value documents are more likely to be unlinked, so the unlinked portion is more likely to be associated with smaller establishments with a lower average value of exports.
Potential sources of error
The unique nature of the source data in the Exporter Register Database lends itself to unique potential sources of error. The following are the most prominent sources of error:
Linkage Rates
The most appropriate data quality measure for these data is the linkage rate of the population. For the period 1993 to 2007, these rates indicate that, on average, 96% of the documents and 98% of the export value destined for the United States were linked to a valid establishment. Similarly, for the same period, on average 83% of the Customs documents and 93% of the value bound for non-U.S. destinations were linked. Table 11 highlights the annual linkage rates.
There are two main sources of error to consider:
The main problem with these estimates relates to biases in the linked portion patterns. The most important bias stems from the assumption that the average export value per establishment is the same in both the linked and unlinked portions. This assumption means that the unlinked documents are not related to establishments already in the linked portion. However, an unknown proportion of unlinked documents are indeed related to linked establishments. This implies that the number of establishments corresponding to the unlinked portion is overestimated.
This overestimation is not believed to be too serious and is partially offset by a second source of bias. The larger establishments tend to be matched more effectively to the Business Register.
This increases the average exports per establishment in the linked portion, and thus creates a downward bias in the population estimates. This was more prevalent in the period 1993 to 1995 for low-value export documents to non-U.S. destinations.
If the observed exports per establishment in the linked portion vary widely between establishments within the same group, the resulting estimates are likely to be less reliable. Therefore, the variance of the population estimates is directly related to the variance of the exports per establishment within establishment groupings. For 2007, the coefficient of variation of exports (after logarithmic transformation) by industry, exporter size, employee class, province and destination was less than 1% for 97% of the groups.
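The exact definition used for this coefficient of variation is not given here. A minimal sketch, assuming it is the sample standard deviation of the natural logarithm of exports divided by the mean of those logarithms, could look like this; the export values are hypothetical.

```python
import math
import statistics


def log_cv(exports):
    """Coefficient of variation of log-transformed export values.

    Assumes natural logarithms and the sample standard deviation;
    the definition actually used may differ.
    """
    logs = [math.log(value) for value in exports]
    return statistics.stdev(logs) / statistics.mean(logs)


# Hypothetical exports per establishment within one group.
group_exports = [40_000, 50_000, 60_000]
cv = log_cv(group_exports)
```

The logarithmic transformation compresses the scale, so even groups whose exports differ noticeably in dollars produce a small coefficient of variation.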
Statistics Canada’s Business Register is a central repository of information on businesses operating in Canada. It is used as the principal frame for most of Statistics Canada’s economic statistical programs, including the Exporter Register Database. The Business Register provides consistent and standardized data at the establishment and enterprise levels for each year under consideration.
The standardized business classification model developed at Statistics Canada comprises a four-level hierarchy of statistical entities:
As in previous editions of this report, the statistical unit used in the Exporter Register Database is the statistical establishment, which represents a unit of production, such as a factory, plant or a head office. A statistical enterprise represents the sum of the statistical establishments under its control.
The industry of the exporting establishment may sometimes be different from the industry of the enterprise. Although this publication attributes exports to the industry of the exporting establishment, data are also given for the top 50 enterprises that export.
This publication conforms to the North American Industry Classification System (NAICS). NAICS is an industry classification system developed by the statistical agencies of Canada, Mexico and the United States. It provides common definitions of the industrial structure of the three countries and a common statistical framework to facilitate the analysis of the three economies.
The Exporter Register Database provides time-series statistics on exporting establishments and enterprises. Using the Business Register to link statistical entities through time is a complex task because of the frequency of re-organizations, mergers and takeovers, which often impact only the structure of the enterprise and leave the structure of the establishment unaffected. A new enterprise identifier is not always created when the structure of an enterprise changes. Therefore, the most recent structure is allocated throughout the period 1993 to 2007 in the Exporter Register Database.
As an example, consider two hypothetical enterprises called ABC and YYZ. Enterprise YYZ began exporting in 1993 and was taken over by ABC in 1998. During the takeover, ABC transferred its own business identifier to YYZ. The Exporter Register Database looks at the most recent data year available on the Business Register and transfers this information to the Exporter Register Database for all years under consideration. In 2007, YYZ is no longer on the Business Register; only ABC exists. Suppose that ABC also began exporting in 1993. Throughout the time series, ABC would now replace YYZ.
Technically, both enterprises co-existed for a period (1993 to 1997); however, because of the data refreshment on the Exporter Register Database in 2007, only one enterprise (ABC) is recorded as existing from 1993 to 2007.
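The effect of this refresh can be sketched with the ABC and YYZ example. The establishment identifiers, record layout and function name below are hypothetical; the point is only that the latest enterprise structure is applied to every year while the establishment identifiers stay unchanged.

```python
# Hypothetical structure in the most recent year: establishment -> enterprise.
latest_structure = {
    "EST-1": "ABC",
    "EST-2": "ABC",  # belonged to YYZ before the 1998 takeover
}

# Hypothetical historical records: (year, establishment, enterprise at the time).
history = [
    (1993, "EST-1", "ABC"),
    (1993, "EST-2", "YYZ"),
    (1998, "EST-2", "ABC"),
    (2007, "EST-1", "ABC"),
]


def apply_latest_structure(history, latest_structure):
    """Replace each record's enterprise with the one in the latest structure."""
    return [
        (year, establishment, latest_structure[establishment])
        for year, establishment, _ in history
    ]


refreshed = apply_latest_structure(history, latest_structure)
enterprises_1993 = {enterprise for year, _, enterprise in refreshed if year == 1993}
establishments_1993 = {est for year, est, _ in refreshed if year == 1993}
```

After the refresh, 1993 shows one enterprise (ABC) but still two establishments, which is why counts at the establishment level are unaffected by the takeover.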
It is important to note that this situation occurs only at the enterprise level: the establishment identifier number does not usually change during mergers or takeovers. This is one reason why the establishment level was selected to measure the exporter population.
Another reason for using the establishment as the main statistical unit of measure is that it allows estimation at the provincial/territorial level. An enterprise often operates several establishments. These establishments can be located in more than one province/territory. Since a single establishment operates from one province or territory only, deriving provincial/territorial estimates at the establishment level is more meaningful.
Merchandise trade transactions for a given year include domestically produced exports as well as re-exports. The Exporter Register Database includes only the value of domestically produced exports and covers more than 95% of these domestic exports. The remaining share not covered can be attributed to the following:
For comparative purposes, Table 4-1 contains the Exporter Register Database value totals and the International Trade Division (ITD) published totals for domestic export values. Table 12 outlines a list of the commodities not covered by the Exporter Register Database.
Canadian export transactions valued at less than $2,000 to non-U.S. destinations are not required to be reported to the Canada Border Services Agency (CBSA). Therefore, these transactions do not appear in Statistics Canada’s export statistics and, as a result, do not appear in the Exporter Register Database.
The Exporter Register Database currently disseminates data on the number of exporters and the value of exports by industry grouping, exporter size, province of residence, and destination of export. In this edition of the Register, exporters are also grouped by their employment size (for 2007 only). Multidimensional tables at aggregated levels are also available. Despite aggregation, not all data in this format can be released because of confidentiality issues. Some descriptive background information on each of these dimensions follows.
Note that an establishment can export to different destinations and can, therefore, be counted in more than one destination. For this reason, the population counts shown in Tables 3-2 and 3-3 do not always add up. For example, adding the exporters who export to U.S. destinations to the exporters who export to non-U.S. destinations will not give the total number of exporters. However, summing the exporter counts in the three aggregates (U.S. only, non-U.S. only, and both U.S. and non-U.S.) will yield the total number of exporters.
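The difference between overlapping destination counts and the three mutually exclusive aggregates can be checked with a small sketch. The establishments and their destinations are hypothetical.

```python
from collections import Counter

# Hypothetical establishments and the destinations they export to.
destinations = {
    "EST-1": {"United States"},
    "EST-2": {"United States", "Japan"},
    "EST-3": {"Japan", "Mexico"},
}


def classify(dests):
    """Assign an establishment to exactly one of the three aggregates."""
    to_us = "United States" in dests
    to_other = any(d != "United States" for d in dests)
    if to_us and to_other:
        return "both U.S. and non-U.S."
    return "U.S. only" if to_us else "non-U.S. only"


aggregates = Counter(classify(d) for d in destinations.values())

# Overlapping counts: an establishment can appear in both.
us_exporters = sum("United States" in d for d in destinations.values())
non_us_exporters = sum(
    any(x != "United States" for x in d) for d in destinations.values()
)
```

Here the U.S. and non-U.S. counts are 2 each, which sum to 4, while the three aggregates contain one establishment each and sum to the true total of 3.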