The Small Business Profiles present selected revenue, expense, profit and balance sheet items as well as financial ratios and employment data on small business in Canada.
Data release – February 6, 2009
The profiles are available from Industry Canada, which provides the data through Performance Plus on its website: http://www.sme.ic.gc.ca
The target population consists of small businesses, defined as those with annual revenue between $30,000 and $5,000,000. The information is presented by industry using the North American Industry Classification System (NAICS) to the six-digit level.
This is a sample survey with a cross-sectional design.
T1 sample design (unincorporated businesses)
For unincorporated businesses, the target population consists of paper filers and e-filers. This component of the estimates is derived from T1 e-filer information for businesses with revenue between $30,000 and $5 million, submitted to the Canada Revenue Agency and maintained by the Tax Data Division of Statistics Canada. For reference year 2006, a total of 536,481 T1 e-file returns were used.
T2 universe (incorporated businesses)
This component of the estimates is derived from T2 incorporated balance sheet and income statement information submitted to the Canada Revenue Agency and maintained by the Tax Data Division of Statistics Canada. Business reports of 862,394 enterprises, processed as of January 2009, are included in the 2006 T2 estimates.
Data are extracted from administrative files.
The 2006 Profiles are produced using information extracted from tax returns submitted to the Canada Revenue Agency (CRA) for the 2006 tax year. The Tax Data Division of Statistics Canada maintains files that provide income and expenses from self-employment for unincorporated businesses, as well as General Index of Financial Information (GIFI) schedules (balance sheets and income statements) for incorporated businesses.
Once the data are collected and captured by CRA, they are sent to Statistics Canada and run through edit programs, which identify errors, inconsistencies and extreme values. Data that fail to meet predetermined criteria are referred to analysts for appropriate action. At this stage of processing, all industries are handled in a similar fashion. A second set of edits is also applied to the data after capture to ensure that basic inconsistencies, such as sub-totals not adding to totals, do not appear.
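The sub-total consistency edit mentioned above can be illustrated with a minimal Python sketch. The field names, the structure mapping and the rounding tolerance are hypothetical and chosen only for illustration; the actual edit programs are not described in this document.

```python
def failed_totals(record, structure, tolerance=0.5):
    """Return the total fields whose components do not add up.

    `structure` maps each total field to the list of its component fields.
    `tolerance` (an assumed value) absorbs rounding differences.
    """
    failures = []
    for total_field, part_fields in structure.items():
        parts_sum = sum(record[f] for f in part_fields)
        if abs(record[total_field] - parts_sum) > tolerance:
            failures.append(total_field)
    return failures


# Hypothetical record layout: one total with three components.
structure = {"total_expenses": ["wages", "rent", "other_expenses"]}
good = {"wages": 120_000, "rent": 30_000, "other_expenses": 50_000,
        "total_expenses": 200_000}
bad = dict(good, total_expenses=250_000)

print(failed_totals(good, structure))  # []
print(failed_totals(bad, structure))   # ['total_expenses']
```

A record that returns a non-empty list would be a candidate for referral to an analyst or for imputation.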
Imputation is the process whereby records with missing data (recipient records) have values assigned based on the data of records with more complete data (donor records). Imputation is done using the "nearest neighbour" method: using matching variables, the donor record most like the recipient record is identified and the information from this donor record is used. The matching variables are usually industry, total revenue and total expenses. Imputation is used in two cases: when a data point reported by a business is judged "extreme", or when a business fails to itemize all or part of the information.
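A minimal Python sketch of nearest-neighbour donor imputation follows. It assumes an exact match on industry and a Euclidean distance on total revenue and total expenses, and it copies the donor value unchanged; the distance measure, the field names and the copy rule are all assumptions, since the text does not specify them.

```python
import math


def nearest_donor(recipient, donors):
    """Return the same-industry donor closest to the recipient on
    total revenue and total expenses, or None if no donor matches."""
    candidates = [d for d in donors if d["industry"] == recipient["industry"]]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda d: math.hypot(
            d["total_revenue"] - recipient["total_revenue"],
            d["total_expenses"] - recipient["total_expenses"],
        ),
    )


def impute(recipient, donors, fields):
    """Fill missing (None) fields of the recipient from its nearest donor."""
    donor = nearest_donor(recipient, donors)
    filled = dict(recipient)
    if donor is None:
        return filled
    for field in fields:
        if filled.get(field) is None:
            filled[field] = donor[field]
    return filled


# Hypothetical records; "wages" is the item the recipient did not itemize.
donors = [
    {"industry": "722511", "total_revenue": 400_000, "total_expenses": 350_000, "wages": 120_000},
    {"industry": "722511", "total_revenue": 900_000, "total_expenses": 800_000, "wages": 300_000},
    {"industry": "238220", "total_revenue": 410_000, "total_expenses": 355_000, "wages": 90_000},
]
recipient = {"industry": "722511", "total_revenue": 420_000,
             "total_expenses": 360_000, "wages": None}

result = impute(recipient, donors, ["wages"])
print(result["wages"])  # 120000
```

The third donor is closer in revenue and expenses but is excluded because it belongs to a different industry.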
After edit and imputation have been completed, estimation methods are used to relate the sample to the population. Each record is allocated a weight according to its probability of selection in the sample. The weight reflects the proportion of the population actually observed in the sample. Estimates deemed of unacceptable quality, or which violate confidentiality rules, are identified and removed. Using the weights, values are calculated for each variable in each industry, area and revenue grouping combination.
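As an illustration of the weighting step, the Python sketch below computes a weighted average for each industry, area and revenue grouping. It assumes the weight is the inverse of the selection probability, which is the usual design weight but is not stated explicitly here; the field names are hypothetical.

```python
from collections import defaultdict


def weighted_means(records, variable):
    """Weighted mean of `variable` per (industry, area, revenue_group) cell."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for r in records:
        weight = 1.0 / r["selection_prob"]  # assumed inverse-probability weight
        cell = (r["industry"], r["area"], r["revenue_group"])
        totals[cell] += weight * r[variable]
        weights[cell] += weight
    return {cell: totals[cell] / weights[cell] for cell in totals}


# Two hypothetical sampled records in the same cell.
records = [
    {"industry": "722511", "area": "ON", "revenue_group": "lower",
     "selection_prob": 0.5, "wages": 100.0},
    {"industry": "722511", "area": "ON", "revenue_group": "lower",
     "selection_prob": 0.25, "wages": 160.0},
]

means = weighted_means(records, "wages")
print(means[("722511", "ON", "lower")])  # 140.0
```

The second record represents four population units against two for the first, so it pulls the average toward its own value.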
Half and quartile boundaries
The half and quartile boundaries are calculated for each business type by industry and by area. The businesses are ranked from lowest to highest operating revenue. The sample weights are re-scaled so that the sum of the weights equals one. The half boundary will be the total revenue value from the record that lies exactly on the mid-point (0.50) of the re-scaled weights. The quartile boundaries will be the total revenue values from the records which lie on the 0.25, 0.50 and 0.75 points of the re-scaled weights. Average data for the expense, balance sheet and other variables are then calculated using only those businesses allocated to each half or quartile group and the original sample weights.
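The boundary calculation can be sketched in Python as follows. Because a record rarely falls exactly on a cut point, this sketch takes the first record whose cumulative re-scaled weight reaches or exceeds the point; that tie rule and the field names are assumptions.

```python
def weighted_boundaries(records, points=(0.25, 0.50, 0.75)):
    """Return the revenue value at each cut point of the re-scaled weights."""
    ordered = sorted(records, key=lambda r: r["revenue"])
    total_weight = sum(r["weight"] for r in ordered)
    boundaries = []
    for point in points:
        cumulative = 0.0
        for r in ordered:
            cumulative += r["weight"] / total_weight  # re-scaled weight
            if cumulative >= point:
                boundaries.append(r["revenue"])
                break
    return boundaries


# Hypothetical sample for one business type, industry and area.
sample = [
    {"revenue": 300_000, "weight": 3},
    {"revenue": 50_000, "weight": 1},
    {"revenue": 900_000, "weight": 1},
    {"revenue": 80_000, "weight": 2},
    {"revenue": 120_000, "weight": 1},
]

print(weighted_boundaries(sample))                  # [80000, 120000, 300000]
print(weighted_boundaries(sample, points=(0.50,)))  # [120000]
```

Averages for each half or quartile group would then be computed from the records falling between consecutive boundaries, using the original rather than the re-scaled weights.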
Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
All data are subject to confidentiality restrictions prior to release. For the Profiles, the following rules are used to determine whether data meet confidentiality restrictions.
A) A profile will not be published if the sample size is less than 5 records.
B) A half or quartile will not be published if there are fewer than 4 sample records falling into the half or quartile.
C) If a half or quartile is suppressed, the corresponding half or quartile is also suppressed.
Half and quartile data suppressed for confidentiality appear as a lowercase x in the corresponding cell of a profile. The associated revenue ranges are also shown as a lowercase x.
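Rules A and B, together with the lowercase x display convention, can be sketched in Python as below. Rule C is left out because the text does not define which half or quartile counts as the "corresponding" one. The function name and data layout are hypothetical.

```python
MIN_PROFILE_RECORDS = 5  # rule A: fewer than 5 records, no profile
MIN_GROUP_RECORDS = 4    # rule B: fewer than 4 records, group suppressed


def render_groups(sample_size, groups):
    """Apply rules A and B to one profile.

    `groups` is a list of (record_count, value) pairs, one per half or
    quartile. Returns None when the whole profile is withheld, otherwise
    the display values with "x" in place of suppressed groups.
    """
    if sample_size < MIN_PROFILE_RECORDS:
        return None
    return [value if count >= MIN_GROUP_RECORDS else "x"
            for count, value in groups]


print(render_groups(4, [(2, 10.0), (2, 20.0)]))   # None
print(render_groups(10, [(3, 1.5), (7, 2.5)]))    # ['x', 2.5]
```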
While considerable effort has been made to ensure high standards throughout all stages of collection and processing, the resulting estimates are inevitably subject to a certain degree of non-sampling error. Non-sampling error is not related to sampling and may occur for various reasons. Population coverage errors and mistakes in recording, coding and processing data are examples of non-sampling errors.
Non-sampling errors are controlled through a careful design of the processing, the use of a minimal number of simple concepts and consistency checks.
Sampling error can be measured by the standard error (or standard deviation) of the estimate. The coefficient of variation (CV) is the estimated standard error expressed as a percentage of the survey estimate. Estimates with smaller CVs are more reliable than estimates with larger CVs.
Code    CV Range (%)          Description
A       Less than 5.01        Good
B       5.01 to 15.00         Satisfactory
C       15.01 to 33.33        Poor, use with caution
F       Greater than 33.33    Suppressed
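The CV calculation and the quality codes above can be sketched in Python as follows. The sketch assumes the CV is rounded to two decimal places before it is classified, which is what the gaps between the published ranges suggest; the function names are hypothetical.

```python
def cv_percent(standard_error, estimate):
    """Standard error expressed as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100.0 * standard_error / abs(estimate)


def quality_code(cv):
    """Map a CV (in percent) to its quality code."""
    cv = round(cv, 2)  # assumed two-decimal rounding
    if cv <= 5.00:
        return "A"  # Good
    if cv <= 15.00:
        return "B"  # Satisfactory
    if cv <= 33.33:
        return "C"  # Poor, use with caution
    return "F"      # Suppressed


cv = cv_percent(2.0, 50.0)
print(cv, quality_code(cv))  # 4.0 A
```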