Web scraping

In the information age, it is more important than ever to provide Canadians with reliable and timely data to enable informed decision-making.

Statistics Canada is using web scraping to gather data efficiently. Web scraping is a process by which information is collected and copied from the Internet for analysis.

The use of web scraping is part of a broader effort to reduce burden on businesses and organizations while continuing to provide high-quality, timely data in a cost-effective manner.

In the spirit of openness and transparency, Statistics Canada is committed to respecting the following best practices when conducting web scraping.

Statistics Canada will:

  • Transparency
    • carry out web scraping activities in a transparent, consistent and ethical manner;
    • notify the relevant companies that web scraping activities will be taking place;
    • publish the results of the web scraping activities on its website;
    • conduct all web scraping activities on Statistics Canada authorized computer equipment connected to its highly secure networks and secure the data on encrypted servers.
  • Ethics
    • use web scraped data appropriately and responsibly in statistical programs in order to facilitate fulfilment of its mandate;
    • collect only data available to the public from businesses and organizations for use in its statistical and research programs;
    • take steps to minimize burden on the websites, such as scraping during off-peak hours and only as needed, and coordinating data requirements across statistical programs to avoid duplicating efforts;
    • use an application programming interface (API) when possible in lieu of web scraping;
    • limit collection to only what is necessary and proportional for the production of the required statistical outputs.
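
As an illustration only, the burden-minimizing practice above (scraping during off-peak hours and only as needed) could be sketched in code as follows. This is a minimal sketch under stated assumptions, not a description of Statistics Canada's actual tooling: the off-peak window, the minimum interval between requests, and the function names `is_off_peak` and `seconds_to_wait` are all hypothetical and do not come from this page.

```python
from datetime import datetime, time, timedelta
from typing import Optional

# Assumed off-peak window (hypothetical; this page names no specific hours).
OFF_PEAK_START = time(1, 0)  # 01:00, local time of the target website
OFF_PEAK_END = time(5, 0)    # 05:00, local time of the target website

# Assumed minimum pause between two requests to the same site (hypothetical).
MIN_INTERVAL = timedelta(seconds=10)


def is_off_peak(now: datetime) -> bool:
    """Return True when `now` falls inside the assumed off-peak window."""
    return OFF_PEAK_START <= now.time() < OFF_PEAK_END


def seconds_to_wait(last_request: Optional[datetime], now: datetime) -> float:
    """Return how many seconds to pause before the next request to a site."""
    if last_request is None:
        # No earlier request to this site, so no pause is needed.
        return 0.0
    remaining = MIN_INTERVAL - (now - last_request)
    # Never return a negative wait once the interval has already elapsed.
    return max(remaining.total_seconds(), 0.0)
```

A collection job built on this sketch would call `is_off_peak` before starting a run and `seconds_to_wait` before each request, so that requests to a given website are both confined to quiet hours and spaced apart.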

Statistics Canada will not:

  • scrape personal information about individuals from any website;
  • scrape personal information that could establish a profile of individuals;
  • resell web scraped data or use them for commercial purposes;
  • scrape any information that will not be used to produce statistical outputs.
