Statistics Canada

Methodology, data concepts and definitions

Methodology
Data concepts and definitions

Methodology

This section explains the basic methodology used to estimate the number of exporters by industry (NAICS), exporter size, province of residence, destination and number of employees (for 2007 only). Essentially, there are two fundamental parts involved in this process: the data linkage process and the estimation of the unlinked documents.

Data linkage process

Statistics Canada obtains trade data from two main sources: U.S. Customs documents and Canada Border Services Agency (CBSA) documents.

In 1990, a Memorandum of Understanding (MOU) was signed between Canada and the United States to exchange import data. Through this MOU, each country obtains a comprehensive list of exports to the other country. This is currently the largest source of export data in Canada. All remaining data on Canadian commodity exports destined for consumption in countries other than the United States are obtained from CBSA documents. The data from the two different sources are processed differently during the linkage process.

Step 1. Validate the exporter.

Exports to the United States: According to the Exporter Register Database, exports to the United States accounted for almost 79% of the value of Canada’s annual domestic exports in 2007 (Table 4-2 and Table 4-3). Each U.S. Customs document contains a vendor identification (ID) code. This code is constructed using the name and address of the Canadian exporter.

For each vendor ID code, it is necessary to:

  • Standardize: Each initial vendor ID code is assigned two codes. The first is a revised/standardized municipality, based on the Statistics Canada (StatCan) municipality library. The second is a revised/standardized province code (two-digit StatCan numeric code); and
  • Un-duplicate: Each initial vendor ID code (for a unique exporter and location) is linked to a single standard identification code for each vendor.

The duplication problem arises because the descriptive information (namely, vendor name and address) is not entered in standardized fields on the U.S. Customs document.

For example, the municipality of "ST JOHNS" (as it is written in the StatCan municipality library) in Newfoundland (standardized province is 10) has been reported in a number of ways, including "Saint Johns", "St. Johns", "St. John's", "Saint John", "Saint Jean" and "St Jean", while the province has been reported as "Newfoundland", "Nfld", "Terre-Neuve", and "TN".

This makes any automated linkage exercise very difficult, because each different spelling or listing is treated as a different item. The file is therefore first processed automatically using the Postal Address Analysis System at Statistics Canada, a generalized application that attempts to rearrange a freeform address into standardized positioned components.
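The standardization idea can be illustrated with a small sketch. It assumes a simple rule set (uppercase, drop punctuation, abbreviate SAINT to ST) followed by a lookup in alias tables. The tables and the functions `normalize` and `standardize` are hypothetical; they are not the StatCan municipality library or the Postal Address Analysis System.

```python
import re
from typing import Optional, Tuple

# Hypothetical alias tables, for illustration only.
PROVINCE_ALIASES = {
    "NEWFOUNDLAND": 10,
    "NFLD": 10,
    "TERRE NEUVE": 10,
    "TN": 10,
}

# Keyed by (normalized name, province code) so that a variant such as
# "ST JOHN" is rewritten to "ST JOHNS" only when reported with province 10.
MUNICIPALITY_ALIASES = {
    ("ST JOHN", 10): "ST JOHNS",
    ("ST JEAN", 10): "ST JOHNS",
}


def normalize(text: str) -> str:
    """Uppercase, drop apostrophes and periods, turn hyphens into spaces,
    abbreviate SAINT to ST and collapse repeated whitespace."""
    text = text.upper().replace("'", "").replace(".", "").replace("-", " ")
    text = re.sub(r"\bSAINT\b", "ST", text)
    return " ".join(text.split())


def standardize(municipality: str, province: str) -> Tuple[str, Optional[int]]:
    """Return a (standardized municipality, province code) pair."""
    province_code = PROVINCE_ALIASES.get(normalize(province))
    name = normalize(municipality)
    name = MUNICIPALITY_ALIASES.get((name, province_code), name)
    return name, province_code


print(standardize("St. John's", "Terre-Neuve"))  # ('ST JOHNS', 10)
```

Keying the municipality aliases on the province as well as the name means a variant is rewritten only in the province where it was observed: "Saint John" reported with a New Brunswick address is left as ST JOHN rather than being folded into ST JOHNS.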

Exports to destinations other than the United States: According to the Exporter Register Database, exports to non-U.S. destinations accounted for about 21% of the value of Canada’s total domestic exports in 2007 (Table 4-1 and Table 4-3).

Within each record, an exporter ID code is attached. Unlike documents for exports to the United States, the exporter ID code can come from various sources. The exporter ID can be a payroll deduction number, a Customs and Excise number or, since 1997, a business number. However, in many cases, the exporter ID field is not completed. In such instances, a ‘dummy’ StatCan code is assigned, and then the name and address information is captured and stored. Each of the previously mentioned codes also has a repository of names and addresses.

For each exporter ID code, it is necessary to:

  • Standardize: Each initial exporter ID code is assigned a revised/standardized municipality, based on the StatCan municipality library, and a revised/standardized province (two-digit StatCan numeric code); and
  • Un-duplicate: Each initial exporter ID code for a single exporter and location is linked to a unique revised exporter ID code.

As with exports to the United States, the descriptive information (name and address) on these documents is not standardized. Again, the file is first processed automatically using the Postal Address Analysis System.

Step 2. Link exports to U.S. destinations and exports to non-U.S. destinations by name and address of the exporter.

After the standardizing and unduplication processes are completed, it is then possible to aggregate exports by unique exporter at the location level.

This process delivers a concordance file containing many initial ID codes for U.S. and non-U.S. destinations linked to one standardized exporter ID.
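A minimal sketch of how a concordance file could be used to aggregate exports by unique exporter is shown below. The ID formats, the table `CONCORDANCE` and the function `aggregate_by_exporter` are hypothetical. Documents whose initial ID has no concordance entry are kept apart, because they would belong to the unlinked portion.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical concordance: initial ID codes from U.S. vendor IDs and
# non-U.S. exporter IDs, each mapped to one standardized exporter ID.
CONCORDANCE: Dict[str, str] = {
    "US-VENDOR-A1": "EXP001",
    "US-VENDOR-A2": "EXP001",   # same exporter, name spelled differently
    "CA-BN-0001": "EXP001",     # same exporter on a non-U.S. document
    "CA-DUMMY-0042": "EXP002",  # exporter identified by a dummy code
}


def aggregate_by_exporter(
    documents: List[Tuple[str, int]], concordance: Dict[str, str]
) -> Tuple[Dict[str, int], int]:
    """Sum document values by standardized exporter ID.

    Values of documents whose initial ID is not in the concordance are
    accumulated separately as unmatched value."""
    totals: Dict[str, int] = defaultdict(int)
    unmatched = 0
    for initial_id, value in documents:
        exporter_id = concordance.get(initial_id)
        if exporter_id is None:
            unmatched += value
        else:
            totals[exporter_id] += value
    return dict(totals), unmatched


docs = [("US-VENDOR-A1", 40_000), ("US-VENDOR-A2", 25_000),
        ("CA-BN-0001", 10_000), ("CA-DUMMY-0042", 35_000),
        ("US-VENDOR-ZZ", 5_000)]
print(aggregate_by_exporter(docs, CONCORDANCE))
# ({'EXP001': 75000, 'EXP002': 35000}, 5000)
```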

Step 3. Link unduplicated exporter information.

The final step is to ensure a proper linkage between the Business Register and the new file of exporters created for the Exporter Register Database.

Non-residents: Where feasible, exports by non-residents are allocated to their Canadian subsidiaries. When no Canadian subsidiary exists, the exports of non-residents are treated as unlinked, and the corresponding Canadian exporters are estimated during the estimation process. For example, if a U.S. corporation is listed as the exporter of record on the Customs documentation for a given domestic export from Canada, then the corporation’s Canadian subsidiary, not the U.S. establishment, will be linked as the exporter.

Estimation of the unlinked portion

A relatively small but significant portion of the documents was not successfully linked to the Business Register. Therefore, based on the linked portion alone, the number of exporters underestimates the true size of the exporting community.

Moreover, the linked portion cannot provide consistent estimates when the linkage rate changes over time. This is the case for exports to countries other than the United States, where the proportion of unlinked documents shrank from an average of about 45% between 1993 and 1995 down to around 10% between 1996 and 2007. By contrast, coverage for U.S. destinations was high and relatively constant from 1993 to 2007 (Table 11).

The number of exporting establishments and the value of their exports were estimated for the unlinked portion, in order to provide a more complete and reliable picture of the exporting community.

The estimation methodology uses the patterns observed in the linked portion to produce estimates for the unlinked portion, following these steps:

Step 1. Estimate the export value of the unlinked portion by North American Industry Classification System (NAICS) industry, exporter size, employee class (for 2007 only), province and trading area.

First, for 1997 to 1999, the estimated total value of non-captured documents is distributed to commodities, provinces and destinations, for inclusion in the estimates as part of the unlinked portion. These non-captured documents are those for exports to non-U.S. destinations valued at less than $10,000. The distribution uses the value observed in similar recorded transactions within the linked portion of exports to non-U.S. destinations. In 2007, all documents were captured regardless of destination or export value.

Second, the export value of the unlinked portion is distributed by NAICS industry, exporter size and employee class (for 2007 only) based on patterns observed in the linked portion. For example, in the Fruit and other vegetable farms industry, if the export value of apples in documents of $30,000 to $100,000 has been reported in equal shares by establishments of two sizes ($30,000 to $99,999 and $100,000 to $999,999) in the linked portion, then the value of the apples exported in an unlinked $50,000 document would be split equally between these two exporter sizes in this industry.
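The proportional split in the apple example can be sketched as follows. This is a minimal illustration that assumes the linked pattern is simply the share of linked export value in each category; the function name `distribute`, the category labels and the amounts are illustrative.

```python
from typing import Dict


def distribute(unlinked_value: float, linked_pattern: Dict[str, float]) -> Dict[str, float]:
    """Split an unlinked export value across categories in proportion to the
    export values observed for those categories in the linked portion."""
    total = sum(linked_pattern.values())
    if total == 0:
        raise ValueError("no linked pattern to distribute against")
    return {category: unlinked_value * value / total
            for category, value in linked_pattern.items()}


# Linked apple exports in $30,000 to $100,000 documents, by exporter size
# (equal shares, as in the example; amounts are illustrative).
linked = {"$30,000 to $99,999": 120_000.0, "$100,000 to $999,999": 120_000.0}
split = distribute(50_000.0, linked)
print(split)  # {'$30,000 to $99,999': 25000.0, '$100,000 to $999,999': 25000.0}
```

The same function applies to any dimension (exporter size, employee class) because only the relative weights of the linked pattern matter.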

Third, the province of origin reported on the unlinked documents is used to approximate the province of residency of the exporters.

Fourth, the destinations reported on the unlinked documents are distributed, by NAICS industry, size and employee class (for 2007 only), to trading area combinations based on the patterns in the linked portion. For example, exports to Japan of $30,000 to $100,000 from the Fruit and other vegetable farms industry would be split equally between ‘Japan only’ and ‘Japan and Mexico’ if this were the pattern observed in the linked portion. This is necessary because an exporter can export to multiple countries, so summing the number of exporters by destination will not yield an accurate total. Distributing by trading area combination splits exports by ‘unique exporters’, so that the sum of exporters across trading area combinations equals the total number of exporters.
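Why trading area combinations are needed can be shown with three illustrative linked exporters. Counting by destination counts an exporter once for each country it ships to. Counting by combination, represented here as a `frozenset` of destinations, counts each exporter exactly once. The exporter names and destinations are hypothetical.

```python
from collections import Counter
from typing import Dict, Set

# Illustrative linked exporters and the destinations each one ships to.
exporters: Dict[str, Set[str]] = {
    "A": {"Japan"},
    "B": {"Japan", "Mexico"},
    "C": {"Mexico"},
}

# Counting by destination: an exporter appears once per destination.
by_destination = Counter(d for dests in exporters.values() for d in dests)

# Counting by trading area combination: each exporter appears exactly once.
by_combination = Counter(frozenset(dests) for dests in exporters.values())

print(sum(by_destination.values()))  # 4, which overstates the 3 exporters
print(sum(by_combination.values()))  # 3, one per unique exporter
```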

Step 2. Calculate the average exports per establishment for each industry, exporter size and employee class (for 2007 only) in the linked portion.

It is assumed that this average should be the same for a given industry, size and employee class (for 2007 only) across provinces and destinations. The geometric mean formula has been used because of the uneven distribution of exports by establishment. Namely, there is a much greater number of smaller exporting establishments than larger ones.
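The effect of using a geometric rather than an arithmetic mean can be seen in a small sketch. The geometric mean is computed as the exponential of the mean of the natural logarithms; the export figures are illustrative, and the sketch does not reproduce the exact formula or grouping used for the publication. Python’s standard library also provides `statistics.geometric_mean` (Python 3.8 and later).

```python
import math
from statistics import mean
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """exp of the mean of the natural logs; all values must be positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Illustrative exports of four establishments in one group: three small
# exporters and one much larger one.
exports = [30_000, 40_000, 50_000, 2_000_000]
print(round(mean(exports)))            # 530000
print(round(geometric_mean(exports)))  # roughly 105,000, far below the arithmetic mean
```

With a skewed group like this one, the arithmetic mean is pulled up by the single large exporter, so dividing unlinked value by it would understate the number of (mostly small) unlinked establishments.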

Step 3. Estimate the number of exporters by NAICS industry, size, employee class (for 2007 only), province and destination.

To obtain counts of exporting establishments, divide the exports in each cell (NAICS industry, size, province and trading area combination, as well as employee class for 2007 only) by the average export value per establishment for the corresponding industry and size (and, for 2007, employee class). Estimates of the population counts by destination are obtained by adding all the trading area combinations that include that destination. For example, to obtain the total number of unlinked exporters of size $30,000 to $100,000 in the Fruit and other vegetable farms industry that export to Japan, add the count for ‘Japan only’ to the count for ‘Japan and Mexico’; for Mexico, add ‘Mexico only’ and ‘Japan and Mexico’. In this way, an exporter shipping to both Mexico and Japan is counted under each country.
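Step 3 can be sketched for a single industry and size cell. The sketch assumes one average export value for the cell (in practice this would be the Step 2 geometric mean for that industry, size and, for 2007, employee class); the values and function names are illustrative.

```python
from typing import Dict, FrozenSet

Combination = FrozenSet[str]


def estimate_counts(values: Dict[Combination, float],
                    average_export: float) -> Dict[Combination, int]:
    """Establishments = export value / average export per establishment,
    rounded to the nearest integer (Python's round() sends exact halves
    to the even integer)."""
    return {combo: round(v / average_export) for combo, v in values.items()}


def count_for_destination(counts: Dict[Combination, int], destination: str) -> int:
    """Add every trading area combination that includes the destination."""
    return sum(n for combo, n in counts.items() if destination in combo)


JAPAN, MEXICO = "Japan", "Mexico"
unlinked_values = {
    frozenset({JAPAN}): 400_000.0,
    frozenset({JAPAN, MEXICO}): 200_000.0,
    frozenset({MEXICO}): 300_000.0,
}
counts = estimate_counts(unlinked_values, average_export=100_000.0)
print(count_for_destination(counts, JAPAN))   # 6: 'Japan only' 4 + 'Japan and Mexico' 2
print(count_for_destination(counts, MEXICO))  # 5: 'Mexico only' 3 + 'Japan and Mexico' 2
print(sum(counts.values()))                   # 9 unique exporters
```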

This methodology is applied at an aggregation level that balances the homogeneity of the aggregates against their reliability (a minimum number of observations). The most detailed industry classification available for establishments was the six-digit NAICS. To ensure a minimum number of exporters in the linked portion, establishments were aggregated to the four-digit NAICS level (or higher in some cases) to form 137 industry classes.
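The roll-up from six-digit NAICS codes to broader industry classes can be sketched as below. The source does not state the minimum number of observations, so `MIN_OBS = 5` is a hypothetical value, and falling back from four digits to three digits is an assumed rule; the codes are illustrative.

```python
from collections import Counter

MIN_OBS = 5  # hypothetical minimum number of linked exporters per class


def industry_class(naics6: str, linked_counts4: Counter, min_obs: int = MIN_OBS) -> str:
    """Use the four-digit NAICS group if it has enough linked exporters,
    otherwise fall back to the broader three-digit group."""
    code4 = naics6[:4]
    return code4 if linked_counts4[code4] >= min_obs else naics6[:3]


# Illustrative six-digit codes of linked establishments.
linked_naics = ["111330"] * 6 + ["111420"] * 2
linked_counts4 = Counter(code[:4] for code in linked_naics)
print(industry_class("111330", linked_counts4))  # '1113' (6 linked exporters)
print(industry_class("111420", linked_counts4))  # '111' (only 2, rolled up)
```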

The exporter size, employment class (for 2007 only) and destination categories used in the tables of this publication were the same as those used for aggregation. The province and territory categories were used without aggregation. At this level of aggregation, estimated counts were rounded to the closest integer value.

In cases where unlinked documents had no corresponding pattern in the linked portion at the detailed level, the closest available pattern was used. For example, if no linked establishment exported apples, the ‘unlinked apples’ exports were distributed according to the pattern of a more aggregated Harmonized Commodity Description and Coding System (HS) class that includes apples.
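The fallback to a more aggregated HS class can be sketched as a search from the most detailed prefix to broader ones. This assumes HS codes are stored as digit strings and that patterns exist at the 6-, 4- and 2-digit levels; the function `closest_pattern`, the code `080810` and the pattern shares are illustrative.

```python
from typing import Dict, Optional, Tuple

Pattern = Dict[str, float]


def closest_pattern(hs_code: str, patterns: Dict[str, Pattern],
                    levels: Tuple[int, ...] = (6, 4, 2)) -> Optional[Tuple[str, Pattern]]:
    """Return the most detailed HS prefix that has a linked pattern,
    or None if no level has one."""
    for digits in levels:
        key = hs_code[:digits]
        if key in patterns:
            return key, patterns[key]
    return None


# Illustrative linked patterns: none at the six-digit apple level,
# one at the broader four-digit heading.
patterns = {"0808": {"$30,000 to $99,999": 0.5, "$100,000 to $999,999": 0.5}}
print(closest_pattern("080810", patterns))
# ('0808', {'$30,000 to $99,999': 0.5, '$100,000 to $999,999': 0.5})
```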

Results

The estimated counts for the unlinked portion represent 4% of the total number of exporters from 1996 to 2002, and 8% from 2003 to 2007. This is similar to the proportions of unlinked documents over the same periods. The proportion of unlinked value is only about 2% from 1993 to 2001, and about 5% from 2003 to 2007. This reflects the fact that low-value documents are more likely to be unlinked and, therefore, are more likely to be associated with smaller establishments with a lower average value of exports.

Potential sources of error

Because of the nature of its source data, the Exporter Register Database is subject to some specific potential sources of error. The most prominent are the following:

  • Incorrect classification of commodities
  • Incorrect identification of destination or origin (a trade misallocation: for example, some exports are reported as going to the United States when in fact they are only travelling through the United States on their way to another country)
  • Trade undercoverage (occurs when exporting establishments do not file export documents)
  • Incorrect valuation of exports
  • Data capture errors
  • Incorrect data linkages (owing to clerical errors or poorly reported information).

Linkage rates

The most appropriate data quality measure for these data is the linkage rate of the population. For the period 1993 to 2007, these rates indicate that, on average, 96% of the documents and 98% of the export value destined for the United States were linked to a valid establishment. For the same period, on average 83% of the Customs documents and 93% of the value bound for non-U.S. destinations were linked. Table 11 shows the annual linkage rates.
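The two linkage rates (by documents and by value) can be computed as in the sketch below. The `Document` record and the figures are illustrative. They show why the value rate tends to exceed the document rate when unlinked documents are mostly low-value.

```python
from typing import List, NamedTuple, Tuple


class Document(NamedTuple):
    value: float
    linked: bool


def linkage_rates(docs: List[Document]) -> Tuple[float, float]:
    """Return (share of documents linked, share of export value linked)."""
    n_linked = sum(1 for d in docs if d.linked)
    value_linked = sum(d.value for d in docs if d.linked)
    total_value = sum(d.value for d in docs)
    return n_linked / len(docs), value_linked / total_value


# Illustrative batch: the one unlinked document is also the smallest.
docs = [Document(500_000, True), Document(250_000, True),
        Document(200_000, True), Document(50_000, False)]
doc_rate, value_rate = linkage_rates(docs)
print(f"{doc_rate:.0%} of documents, {value_rate:.0%} of value")  # 75% of documents, 95% of value
```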

Data quality of unlinked establishments

There are two main sources of error to consider:

Biases

The main problem with these estimates relates to biases in the patterns of the linked portion. The most important bias stems from the assumption that the average export value per establishment is the same in the linked and unlinked portions. This assumption treats the unlinked documents as belonging to establishments that are not already in the linked portion. However, an unknown proportion of unlinked documents do belong to linked establishments, so the number of establishments corresponding to the unlinked portion is overestimated.

This overestimation is not believed to be serious, and it is partially offset by a second source of bias: larger establishments tend to be matched more effectively to the Business Register. This raises the average exports per establishment in the linked portion and thus creates a downward bias in the population estimates. This bias was more prevalent in the period 1993 to 1995 for low-value export documents to non-U.S. destinations.

Variance

If the observed exports per establishment in the linked portion vary widely between establishments within the same group, the resulting estimates are likely to be less reliable. The variance of the population estimates is therefore directly related to the variance of exports per establishment within establishment groupings. For 2007, the coefficient of variation of exports (after logarithmic transformation) by industry, exporter size, employee class, province and destination was less than 1% for 97% of the groups.
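A coefficient of variation after logarithmic transformation can be sketched as the standard deviation of the log exports divided by their mean. The source does not specify the logarithm base, the currency unit or whether a sample or population standard deviation was used. The sketch assumes natural logs of dollar values and the sample standard deviation, and the group values are illustrative.

```python
import math
import statistics
from typing import Sequence


def log_cv(exports: Sequence[float]) -> float:
    """Coefficient of variation (sample standard deviation / mean) of the
    natural logarithms of exports per establishment in one group."""
    logs = [math.log(x) for x in exports]
    return statistics.stdev(logs) / statistics.mean(logs)


group = [40_000, 50_000, 60_000]  # illustrative exports for one group, in dollars
print(f"{log_cv(group):.1%}")  # 1.9%
```

Because the logarithm shifts with the unit of measure, the result depends on that choice. Expressing exports in thousands of dollars would lower every log by ln(1000), reducing the mean and raising the coefficient of variation while leaving the standard deviation unchanged.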

Data concepts and definitions

Statistical units of measure

Statistics Canada’s Business Register is a central repository of information on businesses operating in Canada. It is used as the principal frame for most of Statistics Canada’s economic statistical programs, including the Exporter Register Database. The Business Register provides consistent and standardized data at the establishment and enterprise levels for each year under consideration.

The standardized business classification model developed at Statistics Canada comprises a four-level hierarchy of statistical entities:

  • Enterprise: the top of the hierarchy, which is associated with a complete (consolidated) set of financial statements;
  • Company: the level at which operating profit can be measured;
  • Establishment: the level at which the accounting data required to measure production are available (principal inputs, revenues, wages, etc.); and
  • Location: the bottom of the hierarchy, which requires only the number of employees for delineation.

As in previous editions of this report, the statistical unit used in the Exporter Register Database is the statistical establishment, which represents a unit of production, such as a factory, plant or a head office. A statistical enterprise represents the sum of the statistical establishments under its control.

The industry of the exporting establishment may sometimes be different from the industry of the enterprise. Although this publication attributes exports to the industry of the exporting establishment, data are also given for the top 50 enterprises that export.

This publication conforms to the North American Industry Classification System (NAICS). NAICS is an industry classification system developed by the statistical agencies of Canada, Mexico and the United States. It provides common definitions of the industrial structure of the three countries and a common statistical framework to facilitate the analysis of the three economies.

The Exporter Register Database provides time-series statistics on exporting establishments and enterprises. Using the Business Register to link statistical entities through time is a complex task because of the frequency of reorganizations, mergers and takeovers, which often affect only the structure of the enterprise and leave the structure of the establishment unaffected. A new enterprise identifier is not always created when the structure of an enterprise changes. Therefore, the most recent structure is applied to the entire period from 1993 to 2007 in the Exporter Register Database.

As an example, consider two hypothetical enterprises called ABC and YYZ. Enterprise YYZ began exporting in 1993 and was taken over by ABC in 1998. During the takeover, ABC transferred its own business identifier to YYZ. The Exporter Register Database looks at the most recent data year available on the Business Register and transfers this information to the Exporter Register Database for all years under consideration. In 2007, YYZ is no longer on the Business Register; only ABC exists. Suppose that ABC also began exporting in 1993. Throughout the time series, ABC would now replace YYZ.

Technically, both enterprises co-existed for a period (1993 to 1997); however, because of the data refreshment on the Exporter Register Database in 2007, only one enterprise (ABC) is recorded as existing from 1993 to 2007.
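The restatement in the ABC and YYZ example can be sketched as a relabelling of every historical record with the establishment’s enterprise from the most recent Business Register year. The record layout, identifiers and values are hypothetical. The establishment identifiers are left untouched, which mirrors the reason given below for measuring the population at the establishment level.

```python
from typing import Dict, List, NamedTuple


class ExportRecord(NamedTuple):
    year: int
    establishment: str
    enterprise: str
    value: float


def restate_enterprises(history: List[ExportRecord],
                        latest_enterprise: Dict[str, str]) -> List[ExportRecord]:
    """Replace the enterprise on every record with the establishment's
    enterprise on the most recent Business Register year."""
    return [r._replace(enterprise=latest_enterprise[r.establishment]) for r in history]


history = [
    ExportRecord(1995, "EST-YYZ", "YYZ", 80_000.0),
    ExportRecord(1995, "EST-ABC", "ABC", 120_000.0),
    ExportRecord(2007, "EST-YYZ", "ABC", 90_000.0),
]
latest = {"EST-YYZ": "ABC", "EST-ABC": "ABC"}  # 2007 structure
restated = restate_enterprises(history, latest)
print({r.enterprise for r in restated if r.year == 1995})  # {'ABC'}
```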

It is important to note that this situation occurs only at the enterprise level: the establishment identifier number does not usually change during mergers or takeovers. This is one reason why the establishment level was selected to measure the exporter population.

Another reason for using the establishment as the main statistical unit of measure is that it allows estimation at the provincial/territorial level. An enterprise often operates several establishments. These establishments can be located in more than one province/territory. Since a single establishment operates from one province or territory only, deriving provincial/territorial estimates at the establishment level is more meaningful.

Coverage of the Exporter Register Database

Merchandise trade transactions for a given year include domestically produced exports as well as re-exports.1 The Exporter Register Database includes only the value of domestically produced exports and covers more than 95% of these domestic exports. The remaining share not covered can be attributed to the following:

  • Very small exporters: Establishments with annual exports of less than $30,000 in every year from 1993 to 2007 are outside the scope of the Exporter Register Database. Small exporters can be difficult to identify, track and classify on the business frame because their exports are infrequent or their source documents are of low quality. Many of these exporters are unincorporated businesses, individuals or institutions whose export patterns are irregular and difficult to monitor.
  • Special trade transactions: Merchandise exports are a record of commodities that cross the border. Exporters range from large multinational corporations to individuals sending personal effects to another country. The objective of the Exporter Register Database is to identify Canadian establishments that export, so it is important to remove all data unrelated to business activity. One way to do this is to eliminate all commodities that would most likely be exported by individuals for personal, non-business use. These commodities are identified mainly in Chapter 99 of the Harmonized Commodity Description and Coding System used by the International Trade Division.
  • Confidential transactions: Transactions that are allocated to Chapter 99 are not included in the Exporter Register Database.
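The two exclusions above can be expressed as simple filters. This is a sketch under stated assumptions: the scope test keeps an establishment that reached $30,000 in at least one year from 1993 to 2007, and the Chapter 99 test treats any HS code beginning with "99" as excluded. The source notes that special trade commodities are identified mainly, not only, in Chapter 99 (Table 12 lists the commodities not covered). The function names are hypothetical.

```python
from typing import Dict

THRESHOLD = 30_000  # dollars of domestic exports


def in_scope(annual_exports: Dict[int, float]) -> bool:
    """Keep an establishment that reached the threshold in at least one year
    from 1993 to 2007; exclude it only if it was below it in every year."""
    return any(annual_exports.get(year, 0) >= THRESHOLD for year in range(1993, 2008))


def is_chapter_99(hs_code: str) -> bool:
    """Commodities coded to HS Chapter 99 (mainly special and confidential
    transactions) are excluded from the database."""
    return hs_code.startswith("99")


print(in_scope({1993: 10_000, 2004: 45_000}))  # True
print(in_scope({2000: 29_999, 2001: 5_000}))   # False
print(is_chapter_99("9904.00"))                # True
```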

For comparative purposes, Table 4-1 contains the Exporter Register Database value totals and the domestic export totals published by the International Trade Division (ITD). Table 12 lists the commodities not covered by the Exporter Register Database.

Non-reported trade

Canadian export transactions to non-U.S. destinations valued at less than $2,000 are not required to be reported to the Canada Border Services Agency (CBSA). These transactions therefore do not appear in Statistics Canada’s export statistics and, as a result, are not included in the Exporter Register Database.

Existing dimensions of the Exporter Register Database

The Exporter Register Database currently disseminates data on the number of exporters and the value of exports by industry grouping, exporter size, province of residence, and destination of export. In this edition of the Register, exporters are also grouped by their employment size (for 2007 only). Multidimensional tables at aggregated levels are also available. Despite aggregation, not all data in this format can be released because of confidentiality issues. Some descriptive background information on each of these dimensions follows.

  • Industrial classification

    The Exporter Register Database classifies exporters by the North American Industry Classification System (NAICS). The original version of the Exporter Register Database classified exporters by the Standard Industrial Classification for Establishments (SIC-E), which is based on products and relates to the producer, not the exporter. The NAICS system is a comprehensive system encompassing all economic activities of the establishment under consideration.

    To illustrate, consider an enterprise ABC that is composed of two separate establishments (situated in different provinces). One establishment (a plant) only produces goods, whereas the other establishment (a wholesaler) only distributes them. Each establishment has its own NAICS code. If the distributing establishment always acts as the exporter for ABC, then this will be the establishment included in the Exporter Register Database and the exports will be attributed to the wholesale trade NAICS code. This helps explain why the Wholesale trade industry accounts for such a significant share of exports: 13% of total value and 22% of exporting establishments in 2007 (Table 1-1 and Table 2-1).

    A similar phenomenon holds for the Business Services industry. One reason why this industry accounted for 5% of the total value of exports and 9% of exporting establishments in 2007 stems from corporate head offices being listed as the exporter of record (Table 1-1 and Table 2-1). If a corporate head office reports the domestic export, then the NAICS code for the head office (a business services code) is attributed to that exporter.

    The Exporter Register Database covers trade in domestically produced merchandise, but does not include trade in services. However, if a service-producing establishment (e.g., a consultant) exported goods (e.g., computer equipment), then this establishment (and the value of the goods exported) would be included on the Exporter Register Database, yet the NAICS code would be a business services code.
  • Exporter size

    This concept is a key variable in the analysis of the exporting community, given the high proportion of exports by a small proportion of exporters. Each exporting establishment has been assigned to a size class according to the value of its total domestic exports (and employment for 2007 only). Since the ‘exporter size’ variable refers only to the value of the establishment’s exports, it is possible to have a large producer in terms of employment classified as a small exporter in terms of the value of exports.
  • Employment size

    The number of exporting establishments and the value of their exports are also grouped according to employment counts for 2007 only.
  • Province of residence

    The term ‘province of residence’ represents the province/territory where the exporting establishment is located. ‘Province of origin’ represents the province/territory where the commodities under consideration are grown, extracted, processed or manufactured.

    Statistics Canada’s International Trade Division reports merchandise trade statistics by province of origin. The Exporter Register Database reports exports by province of residence of the exporting establishment. Because the exporter is identified, commodities can be classified according to the residence of the exporter rather than the origin of the manufacturer or producer. This is important because manufacturing a commodity is a different activity from exporting it.

    For example, suppose a commodity is manufactured in Ontario and exported by an establishment located in Nova Scotia. Ontario would be the province of origin reported on the Customs document, even though the exporter resides in Nova Scotia. The exporter’s province of residence is obtained from the Customs document. Often, the same establishment performs the production and exporting activities. However, when these activities are separated and located in different provinces/territories, the province of origin and province of residence do not coincide.

    Table 13 shows that Quebec, Ontario and Alberta have higher percentage shares of total value of exports by province of residence than by province of origin. This indicates that these provinces had slightly more commodity-exporting activities than commodity-producing activities. This may be attributed to wholesaling industries and the activities of head offices. The opposite holds true for Newfoundland and Labrador, Prince Edward Island, Nova Scotia, Manitoba, Saskatchewan and British Columbia (including the territories); they showed slightly higher production values than export values. New Brunswick demonstrated no significant difference between the two measures.
  • Destination

    The destination countries or states (of the United States) indicated on Customs documents are used to allocate an establishment’s exports. Specific destinations were aggregated into five U.S. regions and five country groupings. These groupings are further aggregated into U.S., non-U.S., U.S. only, non-U.S. only, both U.S. and non-U.S., and a total of all countries. The detailed breakdowns of these destination groupings are as follows:
    1. U.S. Groupings:
      • Eastern Seaboard: Connecticut, Delaware, District of Columbia, Maine, Maryland, Massachusetts, New Hampshire, New Jersey, New York, North Carolina, Pennsylvania, Rhode Island, Vermont, Virginia, West Virginia
      • Industrial Heartland: Illinois, Indiana, Kentucky, Michigan, Ohio, Wisconsin
      • Midwest: Colorado, Iowa, Idaho, Kansas, Minnesota, Missouri, Montana, North Dakota, Nebraska, New Mexico, Oklahoma, South Dakota, Texas, Utah, Wyoming
      • Southeast: Alabama, Arkansas, Florida, Georgia, Louisiana, Mississippi, Puerto Rico, South Carolina, Tennessee, U.S. Virgin Islands
      • West: Arizona, Alaska, California, Hawaii, Oregon, Nevada, Washington
    2. Non-U.S. Grouping:
      • European Union: Austria, Belgium, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, San Marino, Slovakia, Slovenia, Spain, Sweden, United Kingdom
      • South America: Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Falkland Islands, French Guiana, Guyana, Peru, Paraguay, Suriname, Uruguay, Venezuela
      • Other: this category comprises 203 countries that are not already listed in the above mentioned categories.

Note that an establishment can export to several destinations and can therefore be counted under more than one destination. For this reason, the population counts shown in tables 3-2 and 3-3 do not always add up to the total. For example, adding the number of exporters to U.S. destinations to the number of exporters to non-U.S. destinations will not give the total number of exporters. However, summing exporter counts across the three aggregates (U.S. only, non-U.S. only, and both U.S. and non-U.S.) yields the total number of exporters.
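The difference between the overlapping U.S./non-U.S. counts and the mutually exclusive aggregates can be sketched as follows. This assumes each exporter’s destinations are held as a non-empty set of country names, with all U.S. states collapsed to "United States"; the exporters and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Set

US = "United States"


def destination_aggregate(destinations: Set[str]) -> str:
    """Classify an exporter into one of three mutually exclusive aggregates."""
    if not destinations:
        raise ValueError("an exporter must have at least one destination")
    to_us = US in destinations
    to_other = bool(destinations - {US})
    if to_us and to_other:
        return "both U.S. and non-U.S."
    return "U.S. only" if to_us else "non-U.S. only"


exporters: Dict[str, Set[str]] = {
    "A": {US},
    "B": {US, "Japan"},
    "C": {"Japan", "Mexico"},
}
aggregates = Counter(destination_aggregate(d) for d in exporters.values())
us_total = sum(1 for d in exporters.values() if US in d)          # 2
non_us_total = sum(1 for d in exporters.values() if d - {US})     # 2
print(us_total + non_us_total)   # 4, counts exporter B twice
print(sum(aggregates.values()))  # 3, the true number of exporters
```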


Note

  1. Re-exports represent commodities imported into Canada and exported to another country without being materially transformed. This includes foreign goods withdrawn for export from bonded customs warehouses. This definition does not apply to commodities of United States origin that return to the United States from Canada without being transformed; these goods are coded to HS 9904.00.