Data quality, concepts and methodology

1- Survey methodology

The 2009 survey

The 2009 survey collected data for four years:

  1. 2008, for which the data are expected to be final;
  2. 2009, for which the data are expected to be close to final;
  3. 2010, for which the data are planned expenditures; and
  4. 2011, for which the data are a forecast of spending intentions.

Administrative data are not available for 2010 and 2011. Forecasts of planned expenditures and spending intentions for the firms covered by administrative data are therefore made by applying the percentage increase or decrease, by industry, reported by the surveyed firms to the administrative data.
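The forecasting step described above can be sketched as follows. This is a minimal illustration only: the function name, the industry labels and all figures are hypothetical, and the actual production system may differ.

```python
def forecast_admin_expenditure(base_value, industry_pct_change):
    """Project an administrative-data value forward by the percentage
    change that surveyed firms in the same industry reported."""
    return base_value * (1 + industry_pct_change / 100.0)


# Hypothetical percentage changes (2009 to 2010) reported by surveyed firms.
pct_change_2010 = {"manufacturing": 4.0, "services": -2.5}

# Hypothetical 2009 administrative (SR&ED) expenditures, in $ thousands.
admin_2009 = [
    {"firm": "A", "industry": "manufacturing", "rd_2009": 800.0},
    {"firm": "B", "industry": "services", "rd_2009": 400.0},
]

# Forecast of 2010 planned expenditures for each administrative record.
forecast_2010 = {
    rec["firm"]: forecast_admin_expenditure(
        rec["rd_2009"], pct_change_2010[rec["industry"]]
    )
    for rec in admin_2009
}
```

The same function could be applied a second time with the 2011 percentage changes to obtain spending intentions.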

The 2009 survey was mailed out in August 2010. The largest performers by industry group were selected, along with a random sample of small and medium R&D performers. Particulars are elaborated below.

The mailing list of companies was made up of firms which had reported R&D in the previous surveys, firms claiming an R&D income tax incentive for 2009, firms reported by government respondents as R&D contractors or grantees for 2009 to 2010, firms reported by other companies as funding or performing R&D, and firms identified in some other way, such as through newspaper or journal articles or provincial directories. These larger performing and/or funding companies received the Research and Development in Canadian Industry questionnaire, covering R&D performing expenditures for 2008, 2009, 2010 and 2011.

Upcoming and recent changes to survey methodology

The Research and Development in Canadian Industry (RDCI) survey is undergoing an ongoing process of change. Its methodology has changed several times over the past few reference years, and further changes are planned. These changes are listed below by the reference year in which they were or are being implemented.

Changes implemented for the 2008 reference year

Data users are advised that the RDCI was formally linked to the Business Register (BR) for reference year 2008. The BR is the survey frame for all industry-based surveys. As part of the linking process some statistical entities which were treated as enterprises for the RDCI universe are in fact companies on the BR (/concepts/units-unites-eng.htm). Steps were taken to ensure consistency of the data at the industry level, but there was some impact on the distribution of R&D expenditures and personnel at the industry level.

This change also had some impact on the count of R&D firms at the provincial level, as information about the structure of the enterprise has been used to allocate R&D expenditures as reported through administrative data across multiple provinces where applicable. Previously, the expenditures and personnel were reported in one province only, based on the province in the address from which the tax records were filed.

Survey sample methodology for 2008 and 2009 reference years

For reference years 2008 and 2009, the survey sample methodology was revised to improve the quality of forecast estimates at the industry level. The entire population of known R&D performing enterprises and firms which fund or purchase technologies was sorted by NAICS-based industrial categories (link to BSMD report) and then divided into the following groups:

  1. Special entities, which were placed on a "must-take" list. These entities included industrial non-profit organizations, known R&D performers that do not file Scientific Research and Experimental Development (SR&ED) tax credit applications, and technology purchasers or vendors.
  2. The largest R&D performers in each industrial category, which were placed on the "take-all" list. These large firms cover about two-thirds of R&D expenditures in the given industry group.
  3. Mid-size R&D performers in each industrial category, which were placed on the "take-some" list, meaning that units were randomly selected within each industrial category.
  4. The smallest R&D performers in each industrial category, which were placed on a "take-none" list and excluded from the sample to reduce their response burden. These firms continue to be included in our tabulations, as their R&D data are imputed using Canada Revenue Agency (CRA) administrative data from the SR&ED program.
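The grouping within one industrial category can be sketched roughly as follows. Only the two-thirds coverage target for the take-all list comes from the text. The take-none cut-off, the sampling rate and the function name are hypothetical values chosen for illustration, and the must-take list (assigned by entity type, not size) is left out.

```python
import random


def stratify(firms, take_all_share=2 / 3, take_none_below=100.0,
             sample_rate=0.5, seed=0):
    """Split one industry's firms into take-all, take-some and take-none.

    firms: list of (name, rd_expenditure) tuples for one industrial category.
    take_none_below and sample_rate are illustrative assumptions.
    """
    ranked = sorted(firms, key=lambda f: f[1], reverse=True)
    total = sum(rd for _, rd in ranked)

    # Take-all: largest performers until about two-thirds of R&D is covered.
    take_all, rest, covered = [], [], 0.0
    for name, rd in ranked:
        if covered < take_all_share * total:
            take_all.append(name)
            covered += rd
        else:
            rest.append((name, rd))

    # Take-none: smallest performers, left to administrative data.
    take_none = [name for name, rd in rest if rd < take_none_below]

    # Take-some: random selection among the remaining mid-size performers.
    mid_size = [name for name, rd in rest if rd >= take_none_below]
    n = max(1, round(len(mid_size) * sample_rate)) if mid_size else 0
    take_some_sample = random.Random(seed).sample(mid_size, n)

    return {
        "take_all": take_all,
        "take_some_sample": take_some_sample,
        "take_none": take_none,
    }
```

For example, with firms spending 1000, 500, 300, 150 and 50, the first two cover more than two-thirds of the total and form the take-all list, the firm at 50 falls on the take-none list, and one of the two mid-size firms is drawn at random.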

Changes implemented for the 2007 reference year

For reference year 2007, all companies believed to be performing or funding one and a half million dollars or more of R&D were sent a questionnaire. The mailing list of companies was made up of firms which had reported R&D in the previous survey, firms claiming an R&D income tax incentive for 2007, firms reported by government respondents as R&D contractors or grantees for 2007 to 2008, firms reported by other companies as funding or performing R&D, and firms identified in some other way, such as through newspaper or journal articles or provincial directories. These larger performing and/or funding companies received the Research and Development in Canadian Industry questionnaire, covering R&D performing expenditures for 2006, 2007, 2008 and 2009.

Changes implemented for the 2006 reference year

To relieve respondent burden, the survey threshold was raised from one million dollars to one and a half million dollars in survey year 2006, thereby reducing the number of surveyed firms. Firms below the threshold continue to be included in our tabulations, as their R&D data are imputed using CRA administrative data from the SR&ED program.

To improve data quality for two of the survey's classification variables - Revenues in Canada and Number of Employees in Canada - administrative sources were used to replace missing or inconsistent data.

Beginning with reference year 2006, Canada Revenue Agency (CRA) Payroll Deductions (PD7) total employment data were used to replace missing or inconsistent total employment data for survey years 2001 through the current survey year. Payroll Deductions data are monthly; an annual average is therefore calculated from the CRA monthly data for all business enterprises that reported having one or more employees in at least one of the twelve months of the tax year.
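A minimal sketch of the annual averaging is given below. The text does not say whether the average is taken over all twelve months or only over months with employees; averaging over twelve months is an assumption here, and the function name is hypothetical.

```python
def annual_average_employment(monthly_counts):
    """Average twelve monthly payroll-deduction employee counts.

    Returns None when no month shows at least one employee, because the
    calculation applies only to enterprises with one or more employees in
    at least one month. Dividing by twelve (not by the number of months
    with employees) is an assumption.
    """
    if len(monthly_counts) != 12:
        raise ValueError("expected twelve monthly counts")
    if not any(count >= 1 for count in monthly_counts):
        return None
    return sum(monthly_counts) / 12
```

Under this assumption, an enterprise with 10 employees for six months and none for the other six would be assigned an annual average of 5.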

Changes implemented for the 2005 reference year

Beginning with reference year 2005, revenue figures for the SR&ED tax filers were adjusted to reflect corporate income tax data for the corresponding filer. These are T2 corporate income tax data mapped to the Statistics Canada Chart of Accounts (COA) classification, by firm, from Tax Data Division. The variable COA4 comprises (Total) Revenue for firms. COA4 values were used to fill in missing total revenue data from reference year 1997 through the current year. Inconsistent reported total revenue data were also examined by subject matter experts with reference to COA4 data. Within the publication, the revisions have affected the revenue size groups. It is believed the revisions have substantially improved the quality of the revenue variable.

2008 Canada Revenue Agency (CRA) changes to the Scientific Research and Experimental Development (SR&ED) tax forms

In 2008, the Canada Revenue Agency (CRA) introduced new tax forms for applicants to the Scientific Research and Experimental Development (SR&ED) investment tax credit program. These changes have affected the data produced from the Research and Development in Canadian Industry (RDCI) survey. The new forms went into effect in November 2008. SR&ED applicants were given the option of using either the new or the old forms for financial years ending in 2008. Copies of the new and old SR&ED tax forms are available on the CRA's website (http://www.cra-arc.gc.ca/E/pbg/tf/t661/README.htm).

The CRA changes that impact data continuity include:

  1. The federal budget of February 25, 2008 provided for a change in SR&ED qualified expenditures for wages and salaries for R&D activities performed outside Canada directly by employees of the applicant, subject to these conditions: "the employee who performed the SR&ED work was a resident of Canada at the time the expense was incurred; the SR&ED work carried on by the employees outside Canada was an integral part and solely in support of the SR&ED work for a project carried on in Canada; and salary or wages paid were not subject to income or profits tax from another country." (Guide to Form T661 – Scientific Research and Experimental Development (SR&ED) Expenditures Claim, http://www.cra-arc.gc.ca/E/pub/tg/t4088/t4088-11e.htm, accessed December 09, 2008).
  2. Data on the nature of R&D are no longer available.
  3. Data on the area of specialization of R&D activities (biotechnology, software development, and environmental protection) are no longer available.
  4. The SR&ED form does not clearly require R&D personnel to be reported in full-time equivalents, which may affect related tables.

Other changes to the SR&ED forms which affected data processing for the 2008 reference year are:

  1. R&D expenditures are by project rather than program.
  2. Selected type of R&D activity by project is included.
  3. Science type has been added.
  4. Type of location used for R&D has been added.

For 2008 R&D expenditures, SR&ED tax data were processed from two forms; data availability for 2008 is therefore limited compared with previous years.

The survey's history

Data on R&D in the business enterprise sector, covering commercially oriented enterprises (privately or publicly owned), industrial non-profit organizations and trade associations, have been collected since 1955. Until 1969, the survey was biennial. From 1970 to 1981, all known performing or funding companies of industrial R&D were surveyed for odd-numbered years and a sample, including the leading performers, was surveyed for even-numbered years. From 1982 to 1991, a full survey was conducted annually.

Because of reductions in the science and technology program, only the top 100 R&D performers (accounting for 64% of all industrial R&D) were surveyed for the 1992 and 1994 reference years. However, as a result of a cost-sharing agreement with the province of Quebec, the 1992 and 1994 industrial R&D survey results also included small firms having R&D activities in the province of Quebec.

Prior to 1997, Statistics Canada surveyed all firms that performed or funded R&D in Canada. Virtually all of these firms also provided information to CRA in order to claim tax benefits under the Scientific Research and Experimental Development (SR&ED) tax incentive program. In an effort to reduce respondent burden, Statistics Canada stopped surveying the small performing and funding companies (those with less than $1 million of R&D in Canada) and now imputes their R&D data using CRA administrative data from the SR&ED tax incentive program. In the 2006 survey year, this threshold was raised to $1.5 million, further reducing respondent burden.

When first implemented, this administrative data initiative resulted in an understatement of the total value of intramural expenditure and of the total number of R&D personnel. Under the current tax regulations, firms must file their application to the SR&ED program within 18 months of expenditure. Once claims are submitted, they are processed and forwarded to Statistics Canada. As a result, data may not arrive for up to two years after the expenditures are incurred. To remedy the situation, an imputation system was subsequently put into place to impute values for outstanding administrative data. This system first confirms that the company is active using Statistics Canada's extensive Business Register, and then applies an imputation based on industry trends.

Recent developments in R&D spending are important economic signals, desired promptly by a variety of users. Because the small imputation of outstanding CRA data does not seriously influence overall trends, the R&D data are published as soon as possible after the survey is conducted, and revised in subsequent publications.

Data quality

One of the problems in a survey of this type is ensuring that the quality of the data is satisfactory. It cannot be expected that all firms funding R&D will be surveyed, will respond and will report correctly. Sources of information such as federal government grant and contract lists help in identifying firms and editing returns, but complete coverage cannot be assured. This is especially true for the smaller companies in the service industries. In addition, the term R&D can be misinterpreted in spite of survey guidelines.

Different interpretations of the definition of R&D also result in discrepancies between federal government reporting of funds to industry (the business enterprise sector) for R&D and industry's reporting of such funds. For example, a federal government department may regard a contract to industry for the building of a prototype (e.g., communications satellite) as R&D. The contractors and subcontractors, however, may only use a portion of the R&D contract and even that portion may not be reported because the contract is considered as part of the firm's "routine" contract work. Differences may also arise for contracts awarded to industry for services or equipment required for a government in-house project which are reported by the federal sponsor as industrial R&D contracts. Therefore, the totals for R&D grants and contracts from the federal government to industry shown in this publication do not agree with those reported in Federal Science Activities, 2009/2010, (Catalogue no. 88-204-X).

Other notes

The business enterprise sector is the only sector in which data are not collected on R&D in the social sciences and humanities.

In this survey, the sampling unit is the enterprise while the reporting unit may, in some cases, be the company. The survey is designed to reflect the structure of the enterprise as it appears on the Business Register and the structure of the enterprise as it reports its R&D activities (including reporting R&D expenditures for the SR&ED tax incentive program). This procedure creates a problem when classifying data by industry. An enterprise can only be assigned to one industry although that enterprise may have companies or establishments in several industries. The assignment is based on the activity from which the firm derived the greatest portion of its income. Thus, comparisons between R&D data collected at the enterprise or company level and other data collected at the establishment level, such as "census value added", may be misleading. Since industrial R&D is highly concentrated, the use of the company/enterprise as the main reporting unit also means that classification cannot be very detailed, to avoid disclosing individual company data.
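The single-industry assignment rule described above amounts to picking the activity with the largest share of income. A minimal sketch, with a hypothetical function name and invented income figures:

```python
def assign_principal_industry(income_by_industry):
    """Return the industry from which the enterprise derives the greatest
    portion of its income. If two industries tie, the first one listed
    wins; the source does not say how ties are handled."""
    if not income_by_industry:
        raise ValueError("no income reported")
    return max(income_by_industry, key=income_by_industry.get)


# Hypothetical enterprise active in two industries (income in $ millions).
enterprise_income = {"petroleum refining": 60.0, "oil and gas extraction": 40.0}
principal = assign_principal_industry(enterprise_income)
```

All of this enterprise's R&D would be tabulated under petroleum refining, even if part of it relates to extraction, which is why comparisons with establishment-level data can mislead.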

The survey response

The response for the 2009 "base year" survey is shown below.

For 2009, the response rate was 62%. Survey questionnaires were mailed to 1,985 firms: 1,114 were returned; 162 indicated no research and development activity; 7 were out of business and 10 were duplicates.
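The source does not state how the 62% figure was derived. One reading consistent with the published numbers, offered here as an assumption, is that the firms reporting no R&D activity, those out of business and the duplicates are treated as out of scope and removed from the denominator:

```python
mailed = 1985
returned = 1114
no_rd_activity = 162
out_of_business = 7
duplicates = 10

# Assumption: out-of-scope units are removed from the denominator only.
in_scope = mailed - (no_rd_activity + out_of_business + duplicates)
response_rate = returned / in_scope

print(f"{in_scope} in-scope firms, response rate {response_rate:.0%}")
# prints: 1806 in-scope firms, response rate 62%
```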

An additional 18,301 firms were added to the survey universe from the 2009 Scientific Research & Experimental Development tax incentive program data.

Interpretation of R&D

Generally speaking, industrial R&D is intended to result in an invention which may subsequently become a technological innovation. An essential requirement is that the outcome of the work is uncertain, i.e., that the possibility of obtaining a given technical objective cannot be known in advance on the basis of current knowledge or experience. Hence much of the work done by scientists and engineers is not R&D, since they are primarily engaged in "routine" production, engineering, quality control or testing. Although they apply scientific or engineering principles their work is not directed towards the discovery of new knowledge or the development of new products and processes. However, work elements which are not considered R&D by themselves but which directly support R&D projects, should be included with R&D in these cases. Examples of such work elements are design and engineering, shop work, computer programming, and secretarial work.

If the primary objective is to make further technical improvements to the product or process, then the work comes within the definition of R&D. If however, the product, process or approach is substantially set and the primary objective is to develop markets, to do pre-production planning or to get a production or control system working smoothly, then the activity can no longer be considered as part of R&D even though it could be regarded as an important part of the total innovation process. Thus, the design, construction and testing of prototypes, models and pilot plants are part of R&D. But, when necessary modifications have been made and testing has been satisfactorily completed, the boundary of R&D has been reached. Hence, the costs of tooling (design and try-out), construction drawings and manufacturing blueprints, and production start-up are not included in development costs.

Pilot plants may be included in development only if the main purpose is to acquire experience and compile data. As soon as they begin operating as normal production units, their costs can no longer be attributed to R&D. Similarly, once the original prototype has been found satisfactory, the cost of other "prototypes" built to meet a special need or fill a very small order is not to be considered as part of R&D.

Reliability of the data

There are two main sources of error: sampling errors and non-sampling errors. Each comprises several different types of error, which are described below.

Non-sampling errors

The four main types of non-sampling error are:

  1. Coverage error
  2. Measurement error
  3. Non-response error
  4. Processing error

Coverage

"Coverage errors are introduced whenever the sampling frame...does not adequately represent the target population at the time of the survey." 1  They "consist of omissions, erroneous inclusions, duplications and misclassifications of units in the survey frame." 2 

Survey questionnaires are sent to all known and suspected large R&D performing and/or funding companies, i.e., those believed to have the largest R&D expenditures within their industry group.

Administrative data are used for the remaining R&D performing or funding companies which are not included in the questionnaire coverage. Companies have up to 18 months after their fiscal year end to claim a tax credit for their R&D expenditures. Underreporting due to this time lag is estimated to be less than 8%, and is largely corrected by imputation based on industry trends for all known performers who have not yet submitted their claim.

Measurement

"Measurement error is the difference between the recorded response to a question and the 'true' value… One of the main causes of measurement error is misunderstanding on the part of the respondent or interviewer." 3 

As a result of a reconciliation of federal and industrial accounts of government grants and contracts, we think that industrial R&D performance estimates may be slightly low. This is caused by the non-reporting of industrial R&D funded by contract. Such work is sometimes not distinguishable from non-R&D contract work.

The accuracy of companies' estimates of future expenditures has also been a problem in the past, particularly in the wells and petroleum products industries.

Non-response

"Non-response occurs when information required for a survey unit is missing. This could happen because the unit cannot be contacted, because the unit is unable to provide the information requested, or because the unit refuses to cooperate in the survey." 4 

Non-response is a potential problem in three areas. One is the estimate of R&D expenditures two years past the base year. If no response is provided, editing rules are applied and a response is imputed based on the response of a similar firm in the same industry group.
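Imputation from a similar firm can be sketched as a simple donor method. The source does not define "similar"; here it is assumed to mean the responding firm in the same industry group with the closest base-year R&D expenditure, whose reported growth ratio is then applied. The function and field names are hypothetical.

```python
def impute_from_donor(recipient, donors):
    """Impute a missing forecast from the most similar responding firm.

    recipient: dict with 'industry' and 'base_rd'.
    donors: list of dicts with 'industry', 'base_rd' and 'forecast_rd'.
    Similarity by closest base-year expenditure is an assumption.
    """
    candidates = [
        d for d in donors
        if d["industry"] == recipient["industry"] and d["base_rd"] > 0
    ]
    if not candidates:
        return None  # no suitable donor in this industry group
    donor = min(candidates, key=lambda d: abs(d["base_rd"] - recipient["base_rd"]))
    # Apply the donor's forecast-to-base ratio to the recipient's base value.
    return recipient["base_rd"] * donor["forecast_rd"] / donor["base_rd"]
```

Using a ratio instead of the donor's raw value keeps the imputed figure in proportion to the recipient's own size.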

The second involves the administrative data used for the smaller R&D performers. These represent 20% of all R&D performed by businesses. Certain information is not asked of them. However, the missing data are imputed from the replies of the sampled performers in the same industry.

Failure of surveyed companies to reply is the third type of non-response. We believe non-response error to be minor; it may result in a slight under-estimation of R&D expenditures.

Processing

"Processing errors can occur during data coding, data capture, editing or imputation… Coding entails either assigning a code or comparing a response to a set of codes and selecting the one that best describes the response … Data capture errors result when the data are not entered into the computer exactly as they appear on the questionnaire… Editing is the application of checks to identify missing, invalid or inconsistent entries that point to data records that are potentially in error. Imputation is a process used to determine and assign replacement values to resolve problems of missing, invalid or inconsistent data." 5 

Processing errors are often monitored and controlled using quality control techniques.

Data capture

"The data capture operation in a census or survey consists of converting the data received on questionnaires (e.g., respondent answers) to a machine readable format." 6 

All data received from respondents are captured into a database application for further processing.

Significant uncorrected data capture errors are unlikely because of the examination of numerous tables and listings prepared for data validation and analysis before publication tables are created.

Edit and imputation

"The edit procedure usually consists of: (i) checking each field of every record to ascertain whether it contains a valid code or entry; (ii) checking codes or entries in certain predetermined combinations of fields to ascertain whether codes or entries are consistent with one another... The imputation procedure consists of changing values in some of the fields in records which failed the edit rules with a view to ensuring that the resultant data records satisfy all edit rules." 7 

Although there are a number of edits, all cases of failed edit checks are corrected after review. Automatic imputations are made for the administrative SR&ED tax data portion of the universe as well as for non-response and invalid response within the questionnaire portion of the universe.
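The two kinds of edit check in the quoted definition, a validity check on each field and a consistency check across fields, can be illustrated as follows. The field names and the rule that current plus capital expenditures must equal the total are hypothetical examples, not the survey's actual edit rules.

```python
def run_edits(record):
    """Return a list of edit failures for one questionnaire record."""
    failures = []

    # (i) Validity edit: each field must hold a non-negative number.
    for field in ("current_exp", "capital_exp", "total_exp"):
        value = record.get(field)
        if not isinstance(value, (int, float)) or value < 0:
            failures.append(f"invalid:{field}")

    # (ii) Consistency edit: run only when every field is valid.
    if not failures:
        if record["current_exp"] + record["capital_exp"] != record["total_exp"]:
            failures.append("inconsistent:total_exp")

    return failures
```

Records that return a non-empty list would be the ones sent for review or imputation.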

Sampling

"Sampling error (is) defined as the error that results from estimating a population characteristic by measuring a portion of the population rather than the entire population." 8 

Although a complete enumeration is carried out of known and suspected R&D performing and/or funding companies, records received from the administrative data do not provide as much information as does the sampled universe. Certain data are imputed for records from the administrative file based on the patterns of survey response in the same industry.

2- Technical notes

Data availability

Data for the reference year 2009 are available for all tables with the exception of counts of companies.

In the even years prior to 1982 and for 1992 and 1994, the estimation procedures did not permit the preparation of tables based on revenue size, employment size, sources of funds and country of control of companies.

Regional data on research and development (R&D) expenditures and personnel are only available for 1977, 1979 and 1981 to 2009.

Terminology

The following terminology is used within the publication:

Performing company: The organization which carried out the R&D. In the case of a consolidated return, the performing company could include several companies. It also includes divisions of an enterprise which send separate returns, and organizations such as industrial non-profit organizations.

Related companies: Includes parent, subsidiary and other affiliated companies. In the case where a consolidated return is submitted, "related companies" would exclude companies included in the consolidation.

R&D contracts for other companies: R&D contract work performed by the reporting company for other companies.

Federal grants: Federal R&D grants and the R&D portion of any other federal grants; it excludes funds or tax credits for R&D tax incentives.

Federal contracts: Federal R&D contracts and the R&D portion of any other federal contracts.

Provincial sources: Provincial R&D grants and contracts, and the R&D portion of any provincial grants and contracts; it excludes funds or tax credits for R&D tax incentives.

Other Canadian sources: Includes funds from universities and from levels of government other than federal and provincial.

Intramural expenditures: Expenditures for R&D work performed within the reporting company, including work financed by others.

Current intramural expenditures: Labour costs, fringe benefits and other current costs for R&D, including non-capital purchases of materials, supplies and equipment but excluding capital depreciation. Current intramural expenditures also include contracts for services required to carry out R&D (e.g. contracts awarded for drilling needed for heavy oil R&D).

Capital expenditures: Expenditures on fixed assets used in the R&D program, classified into land, buildings, and equipment.

Revenues: Revenues resulting from the sale of products and services (after deducting sales and excise taxes), and other revenues such as those generated from investment and rentals.

Non-commercial firms: R&D performers without a directly affiliated Canadian commercial base. Included are industrial non-profit organizations and trade associations, R&D performed by consortia, and R&D performed by non-residents without associated commercial enterprises and funded principally from abroad.

Country of control: In most cases of foreign control, the country of control is the country of residence of the ultimate foreign controlling parent corporation, family, trust, estate or related group. Each subsidiary within the global enterprise is assigned the same country of control as its parent. A company whose voting rights are equally owned by Canadian-controlled and foreign-controlled corporations is Canadian-controlled. If two foreign-controlled corporations jointly own an equal amount of the voting rights of a Canadian resident company, the country of control is assigned according to an order of precedence based on their aggregate level of foreign control in Canada. For example, the United States takes precedence over all other foreign countries because it has the highest level of aggregate foreign control in Canada.

R&D personnel: Calculated in full-time equivalent (FTE). R&D may be carried out by persons who work solely on R&D projects or by persons who devote only part of their time to R&D, and the balance to other activities such as testing, quality control and production engineering. To arrive at the total effort devoted to R&D in terms of person-years, it is necessary to estimate the full-time equivalent of these persons working only part-time in R&D.

Full-time equivalent (FTE) = number of persons who work solely on R&D projects + estimate of time of persons working only part of their time on R&D.

Example calculation:

If out of five scientists engaged in R&D work, one works solely on R&D projects and the remaining four devote only one quarter of their working time to R&D, then: FTE = 1 + 1/4 + 1/4 + 1/4 + 1/4 = 2 scientists.
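The same calculation can be written as a short function. The function name is hypothetical, and exact fractions are used so that the quarter-time shares add up without rounding error.

```python
from fractions import Fraction


def full_time_equivalent(full_time_count, part_time_shares):
    """FTE = persons working solely on R&D + sum of the R&D time shares
    of persons working only part of their time on R&D."""
    return full_time_count + sum(part_time_shares)


# One scientist solely on R&D, four scientists each at one quarter time.
fte = full_time_equivalent(1, [Fraction(1, 4)] * 4)
print(fte)  # prints: 2
```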

Federal government funds for industrial R&D: Federal support consists of grants and contracts for R&D to be performed by business enterprises. Taxes foregone as a result of income tax incentives for R&D are not considered direct government support and are not attributed to the federal government.

Industrial classification

The North American Industry Classification System (NAICS) is the standard industrial classification system used for presenting R&D expenditures data for the business enterprise sector. There are limitations to its use. One important limitation is due to enterprises with activities in more than one industry (e.g., companies which both refine petroleum and extract oil). Another is caused by the concentration of R&D activity among a few enterprises: to prevent disclosure of individual respondents' data, NAICS codes may be combined to provide sufficient observations for publication.

A third problem is that the classification, chosen to represent general industrial activity, may not be entirely suitable for identifying companies chosen only for their involvement in R&D.

There are some restrictions on the application of the NAICS. For example, large R&D performing companies that are classified as "holding companies" are assigned to the principal industrial activity of the enterprise.

The R&D activities of other sectors such as the federal government, provincial governments, higher education, and private non-profit organizations are covered in other reports.

3- Definitions

Research and development

For the purpose of this survey, research and development (R&D) is systematic investigation carried out in the natural and engineering sciences by means of experiment or analysis to achieve a scientific or technological advance.

Research is original investigation undertaken on a systematic basis to gain new knowledge.

Development is the application of research findings or other scientific knowledge for the creation of new or significantly improved products or processes. If successful, development will usually result in devices or processes which represent an improvement in the "state of the art" and are likely to be patentable.

Example:

The investigation of electrical conduction in crystals was research. The application of this knowledge to the creation of a new amplifying device - the transistor - was development. The application of the device to the construction of new electrical circuits for television receivers was development. The formulation of new plastic cases for a television receiver is design, not development.

Research and development may be carried out either by a permanent R&D unit (e.g., R&D division) or by a unit generally engaged in a non-R&D activity such as engineering or production. In the first case, the R&D unit may spend part of its time on routine testing, troubleshooting or other activities which should not be included in R&D. In the second, only the R&D portion of the unit's total activity should be considered.

Research and development should be considered to be "Scientific Research and Experimental Development" as defined in Section 37, Regulation 2900 of the Income Tax Act; this section specifically excludes the following:

  1. market research, sales promotion,
  2. quality control or routine analysis and testing of materials, devices or products,
  3. research in the social sciences or the humanities,
  4. prospecting, exploring or drilling for or producing minerals, petroleum or natural gas,
  5. the commercial production of a new or improved material, device or product or the commercial use of a new or improved process,
  6. style changes, or routine data collection.

Note:

Although the definition of "Scientific Research and Experimental Development" is considered to be the same as R&D, certain expenditures for scientific research cannot be claimed for income tax purposes (e.g., land, building). All expenditures attributable to R&D are included in this report.