Supplement to Statistics Canada's Generic Privacy Impact Assessment related to web-scraping activities for the Consumer Price Index

Date: June 2020

Program manager: Director, Consumer Prices Division

Reference to Personal Information Bank

Not applicable. No personal information is retrievable by an individual's name or other direct identifier.

Description of statistical activity

The Consumer Price Index (CPI) is a key economic indicator and is essential for measuring the Canadian economy. The CPI represents changes in prices as experienced by Canadian consumers. It measures price changes by comparing, through time, the cost of a fixed basket of goods and services.
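
As a rough illustration of the fixed-basket comparison described above, the following Python sketch prices the same basket in two periods and expresses the current cost relative to the base cost. The items, quantities and prices are invented for illustration, and the simple fixed-quantity formula is an assumption; it is not a description of the methodology Statistics Canada actually uses.

```python
def fixed_basket_index(quantities, base_prices, current_prices):
    """Cost of the same basket now, relative to its base-period cost (base = 100)."""
    base_cost = sum(qty * base_prices[item] for item, qty in quantities.items())
    current_cost = sum(qty * current_prices[item] for item, qty in quantities.items())
    return 100.0 * current_cost / base_cost


# Hypothetical basket: quantities stay fixed, only prices change.
basket = {"bread": 50, "milk": 100, "transit_pass": 12}
base_prices = {"bread": 3.00, "milk": 2.00, "transit_pass": 100.00}
current_prices = {"bread": 3.30, "milk": 2.10, "transit_pass": 104.00}

index = fixed_basket_index(basket, base_prices, current_prices)
print(f"Index: {index:.1f}")
```

Because the quantities are held constant, any movement in the index reflects price change only.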

The Consumer Prices Program needs to integrate web-scraped data into the production of the CPI in order to keep this key economic indicator relevant by reflecting the prices of products and services bought by Canadians. As more consumers choose to shop online, it is critical that the products and prices offered online be accurately reflected in the compilation of the CPI. The collection and integration of web-scraped data also reduce response burden and collection costs.

Web scraping is carried out using an automated program. The program can only capture variables that have been pre-identified, and it rejects any miscellaneous information that has not been pre-identified. Statistics Canada will not circumvent security measures in place on any site to retrieve data that would otherwise be inaccessible.
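
The idea of capturing only pre-identified variables can be sketched as an allow-list filter. This is a minimal Python illustration under stated assumptions: the field names, the sample record and the function `keep_preidentified` are hypothetical and do not describe the program Statistics Canada actually runs.

```python
# Hypothetical allow-list of pre-identified variables.
ALLOWED_FIELDS = ("product_name", "price", "unit", "retailer", "scrape_date")


def keep_preidentified(record):
    """Return a copy of the record containing only allow-listed fields."""
    return {field: record[field] for field in ALLOWED_FIELDS if field in record}


# Hypothetical scraped record containing one field that was never pre-identified.
raw = {
    "product_name": "Whole milk, 2 L",
    "price": "4.99",
    "unit": "2 L",
    "contact_email": "manager@example.com",
}

clean = keep_preidentified(raw)
print(clean)
```

Anything not on the list, such as the contact field in this example, is dropped before the record is stored.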

The CPI program does not collect, create or use personal information. Any inadvertent collection of personal information is unlikely, given the nature of the websites of interest, and the type of information offered on these sites. As such, this activity is not considered privacy-invasive. However, should Statistics Canada inadvertently collect personal information during this web-scraping activity, this information will be immediately deleted and destroyed.

Reason for supplement

The Generic Privacy Impact Assessment (PIA) addresses most of the privacy and security risks related to statistical activities conducted by Statistics Canada.

The purpose of this supplement is to address any privacy risks associated with the inadvertent collection of personal information, such as employee contact information, during web-scraping activities for the CPI program. If applicable, any personal information inadvertently collected will be destroyed.

Necessity and Proportionality

The CPI program does not collect, create or use personal information. Any inadvertent collection of limited personal information during the web-scraping activity can be assessed against the four-part test proposed by the Office of the Privacy Commissioner of Canada:

  1. Necessity: In order to produce an accurate CPI, and to ensure the CPI reflects the consumption patterns of Canadians, it is necessary to obtain data for online products and services through new methods such as web scraping. For example, more consumers are choosing to shop online, and some online products are not available in stores. Scraping information on these products and services from the web allows Statistics Canada to capture a more accurate and comprehensive picture of the goods and services Canadians are consuming.

  2. Effectiveness: Web scraping will replace or complement traditional collection methods to support the CPI program. The data are obtained directly from retailers' websites, thereby keeping the CPI relevant while reducing business response burden and collection costs, as well as the need to have interviewers in stores.

  3. Proportionality: The CPI program does not require any personal information or personal identifiers. The data collected by web scraping are advertised prices and product information that are publicly available on retailers' websites. The data will be used primarily to enhance coverage and, to a lesser extent, to replace what is currently collected by price collectors in stores.
    This activity will augment coverage and continue to allow for a high-quality Consumer Price Index that can be used by external stakeholders, including the public. The CPI also has a number of specific applications outlined below:

    1. It is used to escalate a given dollar value, over time, to preserve the purchasing power of that value. Thus, the CPI is widely used to adjust contracted payments, such as wages, rents, leases and child or spousal support allowances.

    2. It is used as a deflator of various economic aggregates, to obtain constant dollar estimates of income, or personal expenditure estimates at constant prices.

    3. It is used to set and monitor the implementation of economic and monetary policy.

    4. Business analysts and economists use the CPI for economic analysis and research on various issues.

  4. Alternatives: Alternative methods of collection for the CPI program were considered, including direct collection from retailers and manual extraction of information from websites. Direct collection would require retailers to provide Statistics Canada, in file format, with the price and product description information available on their websites. This method would impose additional burden and costs on retailers, negatively affect the timeliness and quality of the CPI, and increase costs for Statistics Canada. Manual extraction of data is not cost-effective, and therefore not feasible, given the number of products available on retailers' websites.
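
The escalation and deflation uses of the CPI listed under Proportionality come down to simple ratio arithmetic. The Python sketch below illustrates both; the index levels and dollar amounts are invented for illustration and are not published CPI values.

```python
def escalate(amount, cpi_start, cpi_end):
    """Adjust a dollar amount so it keeps its purchasing power between two periods."""
    return amount * cpi_end / cpi_start


def deflate(nominal, cpi_period, cpi_base=100.0):
    """Express a nominal amount in constant (base-period) dollars."""
    return nominal * cpi_base / cpi_period


# Hypothetical index levels: 125.0 when a contract was signed, 137.5 today.
adjusted_rent = escalate(1000.00, cpi_start=125.0, cpi_end=137.5)
real_income = deflate(55000.00, cpi_period=137.5)

print(f"Escalated rent: {adjusted_rent:.2f}")
print(f"Income in base-period dollars: {real_income:.2f}")
```

In this example the index rose by 10%, so a payment indexed to the CPI rises by the same proportion, while an income measured in current dollars is scaled down to be comparable with base-period dollars.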

Mitigation factors

As no personal information is being collected and as the information sought is available publicly and used widely by Canadians, there are no additional safeguards required for this work. The privacy safeguards identified in the Generic Privacy Impact Assessment address any possible risks.

The program used to conduct the web scraping does not allow for the capture of any personal information. Any personal information that is inadvertently collected would be destroyed immediately.

Conclusion

This assessment did not identify any privacy risks that cannot be managed using existing safeguards.

Formal approval

This Supplementary Privacy Impact Assessment has been reviewed and recommended for approval by Statistics Canada's Chief Privacy Officer, Director General for Modern Statistical Methods and Data Science, and Assistant Chief Statistician for Economic Statistics. The Chief Statistician of Canada has the authority for section 10 of the Privacy Act for Statistics Canada, and is responsible for the Agency's operations, including the program area mentioned in this Supplementary Privacy Impact Assessment.

Eric Rancourt
Director General
Modern Statistical Methods and Data Science

Linda Howatson-Leo
Chief Privacy Officer

Greg Peterson
Assistant Chief Statistician
Economic Statistics

Anil Arora
Chief Statistician of Canada
