Response Rate for Sawmills, production of lumber (softwood and hardwood) by Geography 2020

Table 1: Response Rate for Sawmills, production of lumber (softwood and hardwood) by Geography
Quantities produced (M.ft. b.m)
Geography Month
202001 202002 202003 202004 202005 202006 202007 202008 202009 202010 202011 202012
Canada 0.83 0.79 0.79 0.80 0.80 0.80 0.82 0.82 0.82 0.82 0.78 0.78
Newfoundland and Labrador 0.87 0.90 0.90 0.89 0.89 0.87 0.85 0.87 0.88 0.89 0.90 0.89
Prince Edward Island 0.46 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.21 0.05
Nova Scotia 0.33 0.34 0.56 0.54 0.75 0.50 0.36 0.60 0.77 0.96 0.80 0.61
New Brunswick 0.90 0.91 0.82 0.87 0.92 0.91 0.95 0.89 0.85 0.80 0.92 0.83
Quebec 0.77 0.75 0.73 0.78 0.76 0.78 0.68 0.79 0.77 0.79 0.81 0.78
Ontario 0.84 0.65 0.69 0.77 0.86 0.79 0.85 0.83 0.87 0.84 0.86 0.87
Manitoba 0.92 0.89 0.88 0.91 0.00 0.00 0.92 0.91 0.93 0.90 0.92 0.00
Saskatchewan 0.66 0.66 0.90 0.61 0.65 0.73 0.75 0.76 0.75 0.76 0.77 0.78
Alberta 0.82 0.81 0.80 0.81 0.80 0.80 0.83 0.82 0.83 0.66 0.59 0.71
British Columbia 0.88 0.85 0.85 0.82 0.79 0.83 0.89 0.83 0.83 0.91 0.80 0.80
British Columbia Coast 0.85 0.66 0.82 0.86 0.91 0.88 0.81 0.84 0.79 0.89 0.37 0.32
British Columbia Interior 0.89 0.87 0.86 0.82 0.78 0.82 0.90 0.83 0.84 0.91 0.84 0.84
Northern Interior, British Columbia 0.95 0.93 0.86 0.86 0.85 0.86 0.97 0.88 0.85 0.97 0.90 0.89
Southern Interior, British Columbia 0.82 0.81 0.84 0.77 0.70 0.78 0.80 0.77 0.83 0.83 0.78 0.79

Record linkage results per province – 2019

Record linkage results per province
Linkage March 2021 Release
Linkage Rate (Tablenote 1) False Discovery Rate (Tablenote 2) False Negative Error Rate (Tablenote 3)
%
Nova Scotia
  Census (individuals) 87.8 < 0.5 < 1.0
  Tax & Social Insurance Registry (individuals) 94.2 < 0.5 < 2.5
  Business Register (non-individuals) 95.6 < 0.5 < 2.0
New Brunswick
  Census (individuals) 84.9 < 3.0 < 3.0
  Tax & Social Insurance Registry (individuals) 92.3 < 2.5 < 6.0
  Business Register (non-individuals) 95.3 < 2.0 < 2.0
Ontario
  Census (individuals) 93.2 < 1.0 < 0.5
  Tax & Social Insurance Registry (individuals) 98.2 < 0.5 < 0.5
  Business Register (non-individuals) 97.4 < 2.0 < 1.0
British Columbia
  Census (individuals) 90.5 < 1.0 < 2.0
  Tax & Social Insurance Registry (individuals) 96.8 < 1.0 < 1.5
  Business Register (non-individuals) 97.3 < 6.0 < 1.5
Tablenote 1

Linkage Rate: The linkage rate is calculated as the percentage of owner records with accepted links to the database shown. It is the denominator for the false discovery rate (FDR). While it is not a data quality indicator alone, in addition to the FDR and the false negative error rate (FNR) it provides a complete picture of the overall linkage quality.


Tablenote 2

False Discovery Rate (FDR): The FDR is calculated as the percentage of records with false links among records with accepted links (i.e., a record with a false link is a record that was linked incorrectly).


Tablenote 3

False Negative error Rate (FNR): The FNR is calculated as the percentage of records with true links which were not found in the linkage process (i.e., records that were not linked when they should have been).

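As an illustration of how the three indicators defined in the tablenotes relate to one another, here is a minimal Python sketch. The function name and all counts are hypothetical and are not taken from the release. In practice, the numbers of false links and missed links are themselves estimates.

```python
def linkage_quality(n_records, n_accepted, n_false_links, n_missed_links):
    """Return (linkage rate, FDR, FNR) as percentages.

    All arguments are hypothetical counts:
      n_records      - owner records submitted to the linkage
      n_accepted     - records with an accepted link
      n_false_links  - records with an accepted link that is incorrect
      n_missed_links - records that should have been linked but were not
    """
    # Linkage rate: accepted links as a share of all owner records.
    linkage_rate = 100.0 * n_accepted / n_records
    # FDR: false links as a share of accepted links.
    fdr = 100.0 * n_false_links / n_accepted
    # FNR: missed links as a share of all records that have a true link.
    n_true_found = n_accepted - n_false_links
    fnr = 100.0 * n_missed_links / (n_true_found + n_missed_links)
    return linkage_rate, fdr, fnr


rate, fdr, fnr = linkage_quality(
    n_records=1000, n_accepted=900, n_false_links=9, n_missed_links=9
)
print(f"linkage rate {rate:.1f}%, FDR {fdr:.1f}%, FNR {fnr:.1f}%")
```

The sketch makes explicit that the accepted-link count is the denominator of the FDR, while the FNR is measured against the records that truly have a link.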

2021 Census: Collective dwellings

Getting started

Why are we conducting this survey?

Thank you for taking a few minutes to participate in the 2021 Census. The information you provide is converted into statistics used by communities, businesses and governments to plan services and make informed decisions about employment, education, health care, market development and more.

Your answers are collected under the authority of the Statistics Act and kept strictly confidential. By law, all residents living in facilities and establishments must be included in the 2021 Census.

Statistics Canada makes use of existing sources of information such as immigration, income tax and benefits data to ensure the least amount of burden is placed on respondents.

Make sure you count all residents into Canada's statistical portrait, and complete the census questionnaire today.

Your information may also be used by Statistics Canada for other statistical and research purposes.

Your participation in this survey is required under the authority of the Statistics Act.

Other important information

Authorization to collect this information

Data are collected under the authority of the Statistics Act, Revised Statutes of Canada, 1985, Chapter S-19.

Confidentiality

By law, Statistics Canada is prohibited from releasing any information it collects that could identify any person, business, or organization, unless consent has been given by the respondent, or as permitted by the Statistics Act. Statistics Canada will use the information from this survey for statistical purposes only.

Record linkages

To enhance the data from this survey and to reduce the reporting burden, Statistics Canada may combine the acquired data with information from other surveys or from administrative sources.

Facility and contact information

1. Verify or provide the facility name and correct where needed.

  • Facility name

2. Verify or provide the contact information of the designated facility contact person for this questionnaire and correct where needed.

Note: The designated contact person is the person who should receive this questionnaire. The designated contact person may not always be the one who actually completes the questionnaire.

  • First name
  • Last name
  • Title
  • Preferred language of communication
  • Mailing address (number and street)
  • City
  • Province, territory or state
  • Postal code or ZIP (Zone Improvement Plan) code
    Example: A9A 9A9 or 12345-1234
  • Country
  • Email address
    Example: user@example.gov.ca
  • Telephone number (including area code)
    Example: 123-123-1234
  • Extension number (if applicable)

3. Is this the current civic address of this facility?

Note: If the address below is missing or incomplete, please answer No and provide the complete address.

  • Yes
  • No
    • Please enter the correct civic address of this facility.
      • Civic Number
      • Suffix
      • Street Name
      • Type
      • Direction
      • City
      • Province/territory

Facility information

The following questions will help us better understand the nature of this facility.

1. Does this facility allow for persons to stay overnight?

This includes patients, residents, clients, staff, owners, managers and their family members.

  • Yes
  • No
    • If No, exit survey.

Dwelling types

2. Which of the following best describes this facility?

Note: Press the help button (?) for additional information.

  • Hospital
    • If selected, go to Q5.
      • Is this facility licensed as a hospital?
        • Yes
        • No
  • Nursing home or residence for senior citizens
    • What levels of service are provided to residents at this facility?
      • Extended health care services (professional health care monitoring, nursing care and supervision 24 hours a day). Residents are not independent in most activities of daily living.
      • Support services or assisted living services (meals, housekeeping, laundry, supervision of medication, assistance bathing or dressing, etc.), but no extended health care services. Residents are independent in most activities of daily living.
      • Extended health care services to some residents, but only support services or assisted living services to other residents.
        e.g., a facility that is a mix of both a nursing home and a residence for senior citizens.
      • No care or services are provided to residents
        • If selected, exit survey.
  • Residential care facility, such as a group home for persons with disabilities or addictions
    • Is this facility for:
      Select all that apply.
      • Primarily children or minors
      • Persons with psychological disabilities
      • Persons with an addiction
      • Persons with physical challenges or disabilities
      • Persons who are developmentally delayed
      • Persons with other disabilities or addictions
        • Specify the other types of persons at this facility:
  • Shelter
    • Who does this shelter primarily serve?
      • Homeless persons
      • Ex-inmates in halfway houses
      • Abused women and their children
      • Refugees and asylum seekers
      • Other persons
        • Specify who this facility primarily serves
  • Correctional or custodial facility
    • What type of facility is this?
      • Federal correctional facility
      • Young offenders' facility
      • Provincial or territorial custodial facility
      • Jail or police lock-up facility
  • Lodging or rooming house
    • If selected, exit survey.
  • Religious establishment such as a convent, monastery or seminary
  • Hutterite colony
    • If selected, exit survey.
  • Establishment with temporary accommodation services such as a hotel, campground, YMCA/YWCA, Ronald McDonald House or hostel
    • What type of establishment is this?
      • Hotel, motel or tourist establishment
      • Campground or park
      • Another establishment with temporary accommodations services such as a YMCA/YWCA, Ronald McDonald House or hostel
  • Other establishment such as a school residence, military base, work camp or vessel
    • What type of establishment is this?
      • A school residence or training centre residence
      • A military base
      • A commercial vessel
      • A work camp
      • A government vessel
      • Other type of establishment
  • None of the above
    • If selected, go to Q3.

Go to Q6, unless otherwise specified.

You previously selected "none of the above". The following questions will help determine if this facility should be included in this survey or not.

3. Does this facility provide any care or services to its occupants?

  • Yes
  • No
    • Does this facility share any common amenities such as a kitchen or bathrooms with its occupants?
      • Yes
      • No
        • If no, exit survey.

4. According to the information provided, this facility meets the requirements and must be included in this survey. Please indicate which category best describes this facility.

Note: Press the help button (?) for additional information.

  • Hospital
    • If selected, go to Q5.
      • Is this facility licensed as a hospital?
        • Yes
        • No
  • Nursing home or residence for senior citizens
    • What levels of service are provided to residents at this facility?
      • Extended health care services (professional health care monitoring, nursing care and supervision 24 hours a day). Residents are not independent in most activities of daily living.
      • Support services or assisted living services (meals, housekeeping, laundry, supervision of medication, assistance bathing or dressing, etc.), but no extended health care services. Residents are independent in most activities of daily living.
      • Extended health care services to some residents, but only support services or assisted living services to other residents.
        e.g., a facility that is a mix of both a nursing home and a residence for senior citizens.
  • Residential care facility, such as a group home for persons with disabilities or addictions
    • Is this facility for:
      Select all that apply.
      • Primarily children or minors
      • Persons with psychological disabilities
      • Persons with an addiction
      • Persons with physical challenges or disabilities
      • Persons who are developmentally delayed
      • Persons with other disabilities or addictions
        • Specify the other types of persons at this facility
  • Shelter
    • Who does this shelter primarily serve?
      • Homeless persons
      • Ex-inmates in halfway houses
      • Abused women and their children
      • Refugees and asylum seekers
      • Other persons
        • Specify who this facility primarily serves
  • Correctional or custodial facility
    • What type of facility is this?
      • Federal correctional facility
      • Young offenders' facility
      • Provincial or territorial custodial facility
      • Jail or police lock-up facility
  • Lodging or rooming house
    • If selected, exit survey.
  • Religious establishment such as a convent, monastery or seminary
  • Hutterite colony
    • If selected, exit survey.
  • Establishment with temporary accommodation services such as a hotel, campground, YMCA/YWCA, Ronald McDonald House or hostel
    • What type of establishment is this?
      • Hotel, motel or tourist establishment
      • Campground or park
      • Another establishment with temporary accommodations services such as a YMCA/YWCA, Ronald McDonald House or hostel
  • Other establishment such as a school residence, military base, work camp or vessel
    • What type of establishment is this?
      • A school residence or training centre residence
      • A military base
      • A commercial vessel
      • A work camp
      • A government vessel
      • Other type of establishment

Go to Q6, unless otherwise specified.

5. What services are provided at this facility?

  • Short-term care
  • Long-term care
  • Both short-term and long-term care

Maximum capacity

6. What is the maximum number of persons who can stay overnight at this facility?

If the number of persons is unknown, provide the number of rooms. If the exact number is unknown, enter your best estimate.

  • Maximum number
    • If 0, exit survey.
  • Persons or rooms
    • If establishment with temporary accommodation services or other establishment was selected in Q2 or Q4, go to Q34.

Residential information

All residents of shelters should be counted in this question.

The census counts people at the place where they usually live.

7. How many people are living at this address and consider it to be their main residence, even if they are temporarily away?

This includes persons who have lived at this facility for 6 months or more (admitted on or before November 11, 2020) or those for which the facility is their only residence in Canada (i.e. they have no other residence).

Include live-in staff, owners and managers.

Exclude any persons who live in a private dwelling attached to your facility.

Note: Press the help button (?) for additional information.

  • Number of persons
    • If response is 0, skip Q12 to Q14.

8. Is there anyone who is staying at this address temporarily and who has their main residence located at another address in Canada?

  • Yes
    • Number of persons
  • No

9. Is there anyone who is staying at this address and who is a resident of another country visiting Canada?

  • Yes
    • Number of persons
  • No

Private dwelling attached to the facility

10. Other than this facility, are there any private dwellings at this address?

A private dwelling is a separate set of living quarters with a private entrance either from outside the building or from a common hall, lobby, vestibule or stairway inside the building. The entrance to the dwelling must be one that can be used without passing through the living quarters of other persons.

Include only dwellings:

  • with the same civic address as the facility, but with a different apartment or unit number
  • that are not part of the commercial, institutional or communal purpose of the facility (i.e. persons in these dwellings do not receive any care or services from the facility)

Exclude any dwellings occupied by live-in employees, owners and managers.

  • Yes
    • Number of dwellings
      • If 0, skip Q11 and Q15.
  • No
    • If no, skip Q11 and Q15.

11. Indicate the number of occupants who are currently living in private dwellings attached to this facility.

Exclude live-in staff, owners and managers.

Include only persons who do not receive any care or services from this facility.

  • Number of occupants
    • If 0, skip Q15.

File containing information about residents

As part of the Canadian Census of Population, Statistics Canada is collecting the following information from residents in facilities such as yours.

This includes:

  • name
  • other address if main residence is elsewhere in Canada
  • date of admission
  • date of birth
  • age
  • sex at birth
  • gender
  • legal marital status
  • common-law status
  • status at facility and/or relationship between persons living in the same unit
  • languages
  • Canadian military experience

12. Does this facility keep electronic records that contain some or all of this information for all persons that consider this facility to be their main residence (i.e. those who have lived here for 6 months or more (admitted on or before November 11, 2020) or those for which this facility is their only residence in Canada)?

Include all persons indicated in question 7, i.e. usual residents, live-in staff, owners and managers.

  • Yes
    • If yes, skip Q14.
  • No
    • If no, go to Q14.

Attach files

13. Please attach your records in any file format, even if some of the required information is not available.

Note: After 2 hours of inactivity, your session will time out. You will not be able to access any unsaved information. Please click on Save and finish later below before proceeding in order to save your responses up to this point. You will be asked to create a password and a security question and then you will be given the option to resume the questionnaire.

To attach files

  • Press the Attach files button.
  • Choose the file to attach. Multiple files can be attached.

Note:

  • Each file must not exceed 5MB.
  • The attachments combined must not exceed 50MB.
  • The name and size of each file attached will be displayed on the page.
  • Attach files

14. In the questionnaire provided below, please respond to each census question to the best of your ability, for all persons that consider this facility to be their main residence (i.e., those who have lived here for 6 months or more (admitted on or before November 11, 2020) or those for which this facility is their only residence in Canada).

Include all persons indicated in question 7, i.e., usual residents, live-in staff, owners and managers.

Note: After 2 hours of inactivity, your session will time out. You will not be able to access any unsaved information. Please click on Save and finish later below before proceeding in order to save your responses up to this point. You will be asked to create a password and a security question and then you will be given the option to resume the questionnaire.

To complete the questionnaire

Download the questionnaire, complete it and save the final version to your computer.

To attach and submit your completed questionnaire

Return to this page and follow the instructions below. Once your files are attached, use the navigation buttons to submit your questionnaire.

To attach files

  • Press the Attach files button.
  • Choose the file to attach. Multiple files can be attached.

Note:

  • Each file must not exceed 5MB.
  • The attachments combined must not exceed 50MB.
  • The name and size of each file attached will be displayed on the page.
  • Attach files

Files containing information about the occupants of units attached to the facility

As part of the Canadian Census of Population, Statistics Canada is collecting the following information from all Canadians, including people living in private dwellings attached to this facility.

This includes:

  • name
  • date of birth
  • age
  • sex at birth
  • gender
  • legal marital status
  • common-law status
  • relationship between persons living in the same unit
  • languages
  • Canadian military experience

15. In the questionnaire provided below, please respond to each census question to the best of your ability, for all persons living in a private dwelling that has the same civic number as this facility, but does not receive any care or services from your facility.

Include all persons indicated in question 11, i.e., only persons who do not receive any care or services from this facility.

Exclude live-in staff, owners and managers.

Note: Apartment or unit number must be different.

Note: After 2 hours of inactivity, your session will time out. You will not be able to access any unsaved information. Please click on Save and finish later below before proceeding in order to save your responses up to this point. You will be asked to create a password and a security question and then you will be given the option to resume the questionnaire.

To complete the questionnaire

Download the questionnaire, complete it and save the final version to your computer.

To attach and submit your completed questionnaire

Return to this page and follow the instructions below. Once your files are attached, use the navigation buttons to submit your questionnaire.

To attach files

  • Press the Attach files button.
  • Choose the file to attach. Multiple files can be attached.

Note:

  • Each file must not exceed 5MB.
  • The attachments combined must not exceed 50MB.
  • The name and size of each file attached will be displayed on the page.
  • Attach file
    • If establishment with temporary accommodation services or other establishment was NOT selected in Q2 or Q4, go to Q39.

Residential information

34. The census counts people at the place where they usually live.

Do you know how many people live at this address and consider it to be their only residence in Canada (i.e., they have no other residence)?

Include live-in staff, owners and managers.

Note: Press the help button (?) for additional information.

  • Yes
    • Number of people
      • Go to Q36.
  • No

35. You previously reported that the number of persons who usually live at this address is unknown. Please provide an estimated number of people for which this establishment is their only residence in Canada.

Include live-in staff, owners and managers.

Note: Press the help button (?) for additional information.

  • Estimated number of people
  • or Don't know
    • If don't know, exit survey.

Private dwellings attached to the facility

36. Other than this establishment, are there any private dwellings at this address?

A private dwelling is a separate set of living quarters with a private entrance either from outside the building or from a common hall, lobby, vestibule or stairway inside the building. The entrance to the dwelling must be one that can be used without passing through the living quarters of other persons.

Include only dwellings:

  • with the same civic address as the establishment, but with a different apartment or unit number
  • that are not part of the commercial, institutional or communal purpose of the establishment (i.e., persons in these dwellings do not receive any services from the establishment nor share certain common facilities, such as kitchen or bathroom, with the occupants of this establishment).

Exclude any dwellings occupied by live-in employees, owners and managers.

  • Yes
    • Number of dwellings
      • If 0, go to Q39.
  • No
    • If no, go to Q39.

37. Indicate the unit or apartment number for each private dwelling.

The number of units entered on this page should match the number specified in question 36.

  • Private dwelling #
    • Unit/apartment number

Units

38. For each unit number, indicate whether the unit is currently occupied or unoccupied.

  • Private dwelling #
    • Occupied
      • Number of occupants
    • Unoccupied

Feedback

39. Do you have any comments about this questionnaire?

Please use this section if you have concerns, suggestions or comments to make about:

  • the steps to follow or the content of this questionnaire (e.g., a question that was difficult to understand or to answer)
  • the characteristics of the online questionnaire (e.g., the navigation, the online help, the design, the format, the size of the text)
  • any technical issues encountered.
  • Enter your comments

Wholesale Trade Survey (monthly): CVs for total sales by geography - January 2021

Monthly Wholesale Trade Survey - CVs for Total sales by geography
Geography Month
202001 202002 202003 202004 202005 202006 202007 202008 202009 202010 202011 202012 202101
percentage
Canada 0.7 0.7 0.6 0.8 0.8 0.7 0.7 0.7 0.7 0.5 0.6 0.6 0.7
Newfoundland and Labrador 0.7 0.3 1.2 0.7 0.5 0.1 0.2 0.4 0.3 0.3 0.4 0.4 0.7
Prince Edward Island 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Nova Scotia 2.6 2.0 2.8 3.3 4.0 2.3 1.5 1.8 1.7 2.4 3.4 7.5 1.5
New Brunswick 2.6 1.2 1.3 2.1 3.3 1.9 2.1 4.2 3.4 2.6 2.8 2.9 3.7
Quebec 1.4 2.1 1.6 2.4 2.0 1.9 1.8 2.1 2.0 1.5 1.5 1.7 1.8
Ontario 1.2 0.9 1.0 1.2 1.1 1.1 1.1 0.9 0.9 0.8 0.9 0.8 1.2
Manitoba 1.3 0.8 1.0 2.9 2.8 1.2 1.2 1.8 2.3 1.4 1.4 1.8 1.1
Saskatchewan 0.5 0.6 0.5 1.2 0.7 0.7 1.1 1.6 0.6 0.8 0.7 0.9 0.9
Alberta 1.0 0.9 1.2 2.9 2.9 2.3 2.3 1.8 3.3 1.3 1.1 1.7 1.1
British Columbia 1.3 1.6 1.5 1.3 1.7 1.6 1.3 1.9 1.8 1.4 1.4 1.4 1.4
Yukon Territory 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Northwest Territories 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Nunavut 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Monthly Survey of Manufacturing: National Level CVs by Characteristic - January 2021

National Level CVs by Characteristic
Month Sales of goods manufactured Raw materials and components inventories Goods / work in process inventories Finished goods manufactured inventories Unfilled Orders
%
January 2020 0.64 0.99 1.26 1.32 1.10
February 2020 0.63 1.02 1.22 1.36 1.08
March 2020 0.68 0.99 1.17 1.41 1.10
April 2020 0.87 0.99 1.20 1.41 1.10
May 2020 0.80 1.04 1.13 1.37 1.06
June 2020 0.69 1.05 1.19 1.38 1.06
July 2020 0.69 1.02 1.15 1.43 1.10
August 2020 0.64 1.05 1.23 1.50 1.20
September 2020 0.67 1.05 1.22 1.54 1.20
October 2020 0.69 1.02 1.18 1.53 1.15
November 2020 0.72 1.08 1.20 1.46 1.33
December 2020 0.69 1.04 1.22 1.45 1.37
January 2021 0.78 1.00 1.23 1.56 1.49

Greenhouse Detection with Remote Sensing and Machine Learning: Phase One

By: Stan Hatko, Statistics Canada

A modernization effort is underway at Statistics Canada to replace agricultural surveys with more innovative data collection methods. A key part of this modernization is the use of remote sensing classification methods for land use mapping and building detection from satellite imagery.

Currently, Statistics Canada conducts the Census of Agriculture every five years to collect information on topics such as population, yields, technology and agricultural greenhouse use in Canada. Data scientists have been teaming up with subject matter experts to modernize the collection of these data, as traditional methods are not sustainable in the long term. Innovative methods are needed to ensure the agency can continue to produce new information for the agriculture sector in an efficient manner. This project will allow the agency to make data available in a more timely manner and reduce the response burden for agricultural operators.

This project explores the machine learning techniques used to detect the total area of greenhouses in Canada from satellite imagery.

Satellite imagery

This project used RapidEye satellite images which have 5-metre pixel resolution (that is, each pixel is a 5 m by 5 m square) with 5 spectral bands.

Figure 1 - Graphical representation of spectral bands for RapidEye satellite imagery

A graphical representation of the spectral range of each band in a RapidEye output image: (1) blue (440 – 510 nm), (2) green (520 – 590 nm), (3) red (630 – 685 nm), (4) red edge (690 – 730 nm), and (5) near infrared (760 – 850 nm).

 

This imagery was chosen due to its relative availability and cost. Lower resolution imagery is not always adequate to detect greenhouses, and higher resolution imagery would have proven prohibitively expensive given the total area required to cover the Canadian agricultural sector.

Labelled shape data

For certain sites the subject matter experts labelled data in the form of Shapefiles indicating which areas correspond to greenhouses. This was done manually by looking at extremely high resolution satellite and aerial imagery (using Google Earth Pro and similar software), and highlighting the area corresponding to greenhouses.

These labelled data had two roles:

  • Training data (from certain sites) to build a machine learning classifier to determine the area covered by greenhouses.
  • Testing data (from other sites) to evaluate the performance of the classifier.

Labelled data from Leamington, Ontario; Niagara, Ontario; and Fraser Valley, British Columbia were produced. Certain sites were chosen as training sites (like Leamington West), while others were chosen to be testing sites (like Leamington East).

Figure 2 is an example of RapidEye imagery of a region together with the greenhouse labelling file.

Figure 2 - The five spectral bands and greenhouse indicator based on Shapefiles for one area of interest

A comparison of each of the five spectral bands against the Shapefiles of labelled greenhouses.

 

The labelled data were broken down into sites and sub-sites to train and validate the machine learning model. The training sites were:

  • Leamington West
  • Niagara North: N1, N1a, N3
  • Fraser South: S1, S2, S3, S4, S5

The validation sites used to test the model were:

  • Leamington East
  • Niagara South: S1, S2
  • Fraser North: N2, N3, N5

Machine learning methodology

For each point, the data scientists needed to predict whether it corresponded to a greenhouse, along with the predicted probability of that point being a greenhouse.

For prediction, given a point, a window of specified size was taken around the point. The data in this window were fed to the classifier, which then attempted to predict whether the central point is a greenhouse. The window around the point provided additional context to help the classifier make this determination.

Figure 3 - The classifier

A grid representing an input window that evaluates pixels in a source image to try and classify them as greenhouses or not.

The classifier needs to determine if the central dark point corresponds to a greenhouse, based on the highlighted area around that point.

 

This process was repeated for every point in the image (except near borders), resulting in a map of the area predicted to be covered by greenhouses.
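The windowed prediction described above can be sketched with NumPy as follows. This is an illustrative sketch only: the function names are hypothetical, the image is random stand-in data rather than RapidEye imagery, and the classifier is a placeholder that returns the window mean instead of a trained model. The half-width of 10 (a 21 x 21 window over 5 bands) matches the window size reported for the final architecture.

```python
import numpy as np


def extract_window(image, row, col, half):
    """Return the (2*half+1) x (2*half+1) window centred on (row, col).

    `image` is assumed to have shape (height, width, bands).
    """
    return image[row - half:row + half + 1, col - half:col + half + 1, :]


def interior_points(height, width, half):
    """Yield every (row, col) whose full window fits inside the image."""
    for row in range(half, height - half):
        for col in range(half, width - half):
            yield row, col


def predict_map(image, half, classifier):
    """Apply `classifier` to the window around every interior point.

    Points too close to the border are left as NaN.
    """
    height, width, _ = image.shape
    prob = np.full((height, width), np.nan)
    for row, col in interior_points(height, width, half):
        prob[row, col] = classifier(extract_window(image, row, col, half))
    return prob


# Random stand-in for a small 5-band image (not real satellite data).
rng = np.random.default_rng(0)
image = rng.random((40, 50, 5))
half = 10

# Placeholder classifier: a real one would return a greenhouse probability.
placeholder_classifier = lambda window: float(window.mean())

prob_map = predict_map(image, half, placeholder_classifier)
print(extract_window(image, 20, 25, half).shape)  # (21, 21, 5)
```

Looping over points in Python is slow for full scenes. It is used here only because it mirrors the description point by point.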

For training, a sample of many such points (with the window around each point) was taken and fed in (with the label) to construct the model. The training set size was also increased by applying various transformations, for instance rotating the input image by different angles for different points.
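A minimal sketch of this kind of augmentation is shown below. It assumes rotations by multiples of 90 degrees only, which need no interpolation. The angles actually used in the project are not stated, and the function name and toy window are hypothetical.

```python
import numpy as np


def augment_with_rotations(window, label):
    """Return four (window, label) pairs: rotations by 0, 90, 180 and 270 degrees.

    The rotation is applied to the two spatial axes only, so the spectral
    bands of each pixel stay together and the central pixel stays central.
    """
    return [(np.rot90(window, k=k, axes=(0, 1)), label) for k in range(4)]


# Toy 3 x 3 window with 2 bands, labelled as a greenhouse (1).
window = np.arange(3 * 3 * 2).reshape(3, 3, 2)
samples = augment_with_rotations(window, label=1)
print(len(samples))  # 4
```

Because the label describes the central point, rotating the window around that point leaves the label unchanged.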

Initial work and transition to cloud

Originally the work was done on a Statistics Canada internal system with 8 CPU cores and 16 GB of RAM. Several algorithms were tested for the classifier, including support vector machines, random forests, multilayer perceptron, and multilayer perceptron with principal component analysis (PCA).

The best results were obtained with PCA and multilayer perceptron, resulting in an F1 Score of 0.89 to 0.90 for Leamington East. Various system limitations were reached during this work, such as a lack of a dedicated Graphics Processing Unit (GPU). The GPU is required to efficiently train more complex models involving convolutional neural networks.
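For reference, the F1 score is the harmonic mean of precision and recall. The short sketch below computes it for a handful of made-up pixel labels (1 for greenhouse, 0 otherwise). The data are hypothetical and unrelated to the Leamington East results.

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels, where 1 is the positive (greenhouse) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight pixels.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)
print(score)
```

Unlike plain accuracy, the F1 score ignores true negatives. This matters when greenhouses cover only a small fraction of the pixels in a scene.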

The public cloud was explored as an option as there were no sensitive data for this project. The project was transferred to the Microsoft Azure cloud, on a system with 112 GB of RAM, large amounts of storage and a very powerful NVIDIA V100 GPU. The Microsoft Azure Storage Explorer software was used to transfer data to and from the storage account.

Convolutional neural networks

Convolutional neural networks (ConvNets) incorporate the concepts of locality (the neighbourhood around a point in the image is important) and translation invariance (the same features are useful everywhere) into a neural network. Architectures based on these concepts have been considered state-of-the-art in image recognition for several years.

A layer in a basic ConvNet works as follows:

  • Around each point in the image or previous layer, a small window (for instance, 3x3) is taken.
  • The data in that window are multiplied by a matrix, to which the activation is applied (a bias can be added as well).
  • This process is repeated for every point in the image (or previous layer), to obtain the new layer. The same matrix is used each time.

This is equivalent to multiplying by a large sparse matrix, with certain weights tied to the same values, followed by the activation.
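The three steps above can be illustrated with a small sketch. It is a simplification under stated assumptions: a single input channel and a single filter (so the "matrix" reduces to one 3x3 set of weights), no padding at the borders, and ReLU as the activation. The function names are hypothetical.

```python
def relu(x):
    """Rectified linear unit: negative values become zero."""
    return x if x > 0.0 else 0.0

def conv_layer(image, kernel, bias=0.0):
    """Apply the same kernel at every interior point, then the activation."""
    k = len(kernel)
    half = k // 2
    height, width = len(image), len(image[0])
    output = []
    for row in range(half, height - half):
        out_row = []
        for col in range(half, width - half):
            total = bias
            # Weighted sum over the small window around (row, col).
            for i in range(k):
                for j in range(k):
                    total += kernel[i][j] * image[row - half + i][col - half + j]
            out_row.append(relu(total))
        output.append(out_row)
    return output
```

Because the same `kernel` is reused at every position, the number of weights does not grow with the image size, which is the weight tying mentioned above.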

Diagram explaining how convolutional neural networks work
Figure 4 - Diagram explaining how convolutional neural networks work

Many different architectures based on ConvNets are possible. This project tested the following options:

  • Simple ConvNet: Apply convolutional layers in sequence (output of layer is input to next), followed by fully connected.
  • ResNet: Apply convolutional layer with same size output, and add to original (so input of next layer is sum of original and this layer). Can repeat this for many layers. Has been used to train extremely deep networks.
  • DenseNet: Apply convolutional layer, and append outputs to original as new channels. Each layer adds new channels, which can be useful features.
  • Custom branched architecture: Crop central part of window, and apply one convolutional network. Take the whole image, and apply another network (with more dimensionality reduction based on pooling layers). Merge both at the end in fully connected layer. This allows the user to focus in on the part near the central point, while getting some context around it.
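The difference between the ResNet and DenseNet options comes down to how a layer's output is combined with its input. The toy sketch below shows this for the channel values at a single point; `layer` is any hypothetical function that maps a list of channel values to a new list, standing in for a convolutional layer.

```python
def residual_block(channels, layer):
    """ResNet-style: add the layer output to the input, element by element."""
    out = layer(channels)  # must have the same length as the input
    return [x + y for x, y in zip(channels, out)]

def dense_block(channels, layer):
    """DenseNet-style: append the layer output to the input as new channels."""
    return channels + layer(channels)
```

The residual form keeps the number of channels fixed, while the dense form grows it with every layer.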

The data scientists used the custom branched architecture for this project, as shown in figure 5.

Diagram of the convolutional neural network architecture chosen for this project
Description for Figure 5 - Diagram of the convolutional neural network architecture chosen for this project
  • The input has a window size of 10 around the central point (square of size 21 x 21), with the 5 spectral bands from RapidEye.
  • A convolutional layer with 64 filters, kernel size 3, and stride 1 is applied. Batch normalization is applied, followed by the ReLU (rectified linear unit) nonlinearity.
  • The output of the above is then split into two parts, one that focuses on the central region and one that considers a larger context window with down sampling.
  • For the first path (the ‘focus' path), the following is performed:
    • A window of size 5 around the central point is taken, and that part is subsetted (an 11 x 11 square centered around the central point).
    • A convolutional layer with 64 filters, kernel size 3, and stride 1 is applied. This is followed by batch normalization and the ReLU nonlinearity.
    • A convolutional layer with 64 filters, kernel size 3, and stride 1 is applied. This is followed by batch normalization and the ReLU nonlinearity.
  • For the second path (the ‘surround' path), the following is performed:
    • A convolutional layer with 64 filters, kernel size 3, and stride 1 is applied. This is followed by batch normalization and the ReLU nonlinearity.
    • Max-pooling of size 2 is applied.
    • A convolutional layer with 64 filters, kernel size 3, and stride 1 is applied. This is followed by batch normalization and the ReLU nonlinearity.
  • The output for both of the above paths is flattened and concatenated.
  • A dense layer with 128 units is applied, followed by batch normalization and the ReLU nonlinearity.
  • A dense layer with 64 units is applied, followed by batch normalization and the ReLU nonlinearity.
  • The output layer with a single linear output is used, followed by the sigmoid function to produce a probability.
  • For prediction, the above output is used as-is for the predicted probability of the point being a greenhouse. A threshold of 0.5 is used for discrete prediction (if greater than 0.5, the point is classified as a greenhouse; otherwise it is not). For training, the binary cross entropy loss is used with the above as the predicted value and the shapefile label as the ground truth label.
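To make the branching easier to follow, the sketch below traces the tensor shapes through the layers listed above. Two details are assumptions, since the description does not state them: convolutions are taken to use "same" padding (so stride 1 keeps the spatial size), and max-pooling is taken to use a stride equal to its size with any odd remainder dropped. The helper names are hypothetical.

```python
def conv_same(shape, filters):
    """Shape after a stride-1 convolution, assuming 'same' padding."""
    height, width, _ = shape
    return (height, width, filters)

def crop_centre(shape, half_size):
    """Shape after keeping a window of `half_size` around the central point."""
    _, _, channels = shape
    side = 2 * half_size + 1
    return (side, side, channels)

def max_pool(shape, size):
    """Shape after max-pooling, assuming stride == size and floor division."""
    height, width, channels = shape
    return (height // size, width // size, channels)

def flat_size(shape):
    """Number of values after flattening."""
    height, width, channels = shape
    return height * width * channels

def trace_shapes():
    shapes = {"input": (21, 21, 5)}                  # window size 10, 5 bands
    shapes["stem"] = conv_same(shapes["input"], 64)  # shared first layer
    focus = crop_centre(shapes["stem"], 5)           # 11 x 11 centre crop
    focus = conv_same(conv_same(focus, 64), 64)      # two conv layers
    shapes["focus"] = focus
    surround = conv_same(shapes["stem"], 64)
    surround = max_pool(surround, 2)                 # down sampling
    surround = conv_same(surround, 64)
    shapes["surround"] = surround
    shapes["concatenated"] = (flat_size(focus) + flat_size(surround),)
    shapes["dense_1"] = (128,)
    shapes["dense_2"] = (64,)
    shapes["output"] = (1,)
    return shapes
```

Under these assumptions the focus path contributes 11 x 11 x 64 values and the surround path 10 x 10 x 64 values to the concatenated vector; other padding choices would change these numbers.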

For optimization, the ADAM optimizer was used with a learning rate of 10^-5. A mini-batch size of 5,000 was used, and the training was done for 50 epochs.
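The output step and the training loss from the architecture description can be written out in a few lines. This is a generic sketch of the standard formulas, not the project's code; the function names and the clipping constant `eps` are illustrative choices.

```python
import math

def sigmoid(z):
    """Map the single linear output to a probability (numerically stable)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_label(probability, threshold=0.5):
    """1 (greenhouse) if the probability is greater than the threshold, else 0."""
    return 1 if probability > threshold else 0

def binary_cross_entropy(probabilities, labels, eps=1e-12):
    """Mean binary cross entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

During training, the optimizer adjusts the network weights to reduce this loss averaged over each mini-batch.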

Results

After the model was trained, it was tested on each of the validation sites in Leamington East, Niagara South, and Fraser North. The results are summarized in the table below.

Table 1 - Numerical performance results for greenhouse detection (per-pixel quality measures)
Region                     Leamington East   Fraser N2   Fraser N3   Fraser N5  Niagara S1  Niagara S2
Count Unknown                       338443      292149      292149      246299      388479      388479
Count True Negative (TN)          14320042    12347479    12350813     8608499    24597241    24598805
Count False Positive (FP)             9984        1069        1875        2337        2143        2411
Count False Negative (FN)             6880         957        1069        5474        3248        1049
Count True Positive (TP)            138315        8346        4094        5041        8889        9256
Accuracy                          0.998835    0.999836    0.999762    0.999094    0.999781    0.999859
Precision                         0.932677    0.886458    0.685877    0.683247    0.805747    0.793349
Recall                            0.952615     0.89713     0.79295     0.47941    0.732389    0.898205
F1                                0.942541    0.891762    0.735537    0.563461    0.767318    0.842527
AUROC                             0.999508    0.999728    0.998477    0.962959    0.977933    0.999949
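The accuracy, precision, recall and F1 rows follow from the four counts using the standard definitions. The sketch below recomputes them for Leamington East; it assumes the "Unknown" pixels are excluded from all measures, which is consistent with the published values. The function name is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Per-pixel quality measures from the confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted greenhouse pixels that are correct
    recall = tp / (tp + fn)     # share of true greenhouse pixels that were found
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Counts taken from the Leamington East column of Table 1.
leamington_east = classification_metrics(tp=138315, tn=14320042, fp=9984, fn=6880)
```

Because true negatives dominate every site, accuracy is close to 1 everywhere; precision, recall and F1 are the more informative rows when comparing sites.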

For Leamington, the result obtained was very good: the greenhouses were picked up well and there were few false positives. The number of misclassified points (FP and FN) was much smaller than either of the correct classes (TN and TP). This area has the best overall F1 score at slightly over 0.94.

A spatial representation of Leamington East Results
Description for Figure 6 - Leamington East Results A spatial representation of the classification of detected items as True Positive, True Negative, False Positive, False Negative, or unknown.
 

For Niagara, the results were generally good: most of the greenhouse area was predicted correctly. There was a false positive greenhouse below left of the detected greenhouses in Niagara S1 (Figure 7). This corresponds to a river-coastal area. Originally this false positive was significantly larger, but increasing the sample size for a coastal urban area (with a fairly straight coastline) significantly reduced the size and also helped with some other areas. If more coastline images were added to the training set (with different river beds, etc.) this error may be further reduced.

A spatial representation of Niagara S1 greenhouse results
Description for Figure 7 - Niagara S1 greenhouse results A spatial representation of the classification of detected items as True Positive, True Negative, False Positive, False Negative, or unknown.
 
A spatial representation of Niagara S2 greenhouse results
Description for Figure 8 - Niagara S2 greenhouse results A spatial representation of the classification of detected items as True Positive, True Negative, False Positive, False Negative, or unknown.
 

For Fraser, the results varied depending on the area. For Fraser N2 (Figure 9) the results were good. The results were not as good for Fraser N3 (Figure 10), as a cluster of small greenhouses to the right of the detected greenhouses was missed (along with some false positives). For Fraser N5 (Figure 11) a significant number of greenhouses were missed. Experimentation so far has not improved the results for Fraser. To improve these results, the team would need to investigate what type of greenhouses these are, whether additional areas containing these types of greenhouses can be added to the training set, and even whether this type of greenhouse can be detected from the 5m satellite images.

A spatial representation of Fraser N2 greenhouse results
Description for Figure 9 - Fraser N2 greenhouse results A spatial representation of the classification of detected items as True Positive, True Negative, False Positive, False Negative, or unknown.
 
A spatial representation of Fraser N3 greenhouse results
Description for Figure 10 - Fraser N3 greenhouse results A spatial representation of the classification of detected items as True Positive, True Negative, False Positive, False Negative, or unknown.
 
A spatial representation of Fraser N5 greenhouse results
Description for Figure 11 - Fraser N5 greenhouse result A spatial representation of the classification of detected items as True Positive, True Negative, False Positive, False Negative, or unknown.
 

Conclusion

Overall, convolutional neural networks were successfully used to detect greenhouses from satellite images in multiple areas. This was particularly true in the areas of Leamington, Niagara, and Fraser. Other areas are still showing low prediction levels for greenhouses. Additionally, there are still issues with small greenhouses in all three areas of interest, which were not large enough to be detected in the 5m RapidEye satellite imagery. These challenges could be solved by higher resolution aerial acquisitions.

The next phase of this project will explore greenhouse detection from higher resolution aerial images. Different methodologies are used when working with higher resolution aerial imagery, for instance the use of UNet-based image segmentation architectures to identify areas corresponding to greenhouses, which we look forward to exploring in a future article.


Writing a Satellite Imaging Pipeline, Twice: A Success Story

By: Blair Drummond, Statistics Canada

Statistics Canada is modernizing agriculture data collection by using satellite imagery to predict crop growth. The data scientists were faced with a series of challenges throughout this project, including the seemingly prohibitive expense of scaling the project to production requirements (despite promising initial results). They rose to the occasion by considering all options, including non-obvious ones, and experienced first-hand the value in having a diversely skilled team.

A team of Statistics Canada data scientists, working with experts from the agency's agriculture program, created a successful machine-learning proof-of-concept. They implemented a neural network which achieved 95% accuracy in predicting what crop was growing in a quarter-section (160 acre lot), using freely available satellite imagery. This represented a great opportunity for StatCan's Agriculture program, because the use of satellite imagery offers a way to get mid-season estimates, or even near real-time estimates, and the new approach helps to reduce the response burden for agricultural operators who were required to fill out surveys regularly.

But there was just one problem. The implementation produced by the proof-of-concept had a preprocessing step that extracted pixel data from the Landsat 8 satellite images and applied some transformations. On a single image, this process took approximately one day to complete, used over 100 gigabytes of RAM, and the cost was estimated at $50 per image on a basic cloud desktop Virtual Machine (VM). The training set consisted of seven years and three provinces' worth of data, or around 1,600 images in total. If everything worked the first time using available public cloud infrastructure, the project would cost $80,000 (1,600 images at $50 each), before even getting to the step of training the model.

This cost would have been prohibitive for this project, which was still in the experimentation phase. It was not clear if the model would work well on a larger scale than this experiment, and the cost to develop a new model had this substantial hurdle in front of it. The data scientists were fairly confident that they could find a model that would work, but they had to make it more economical. Unfazed, they set out on a small proof-of-concept to see if they could get over this hurdle and make the preprocessing steps economically viable.

The proof-of-concept: the first three pipelines

When the data scientists started the pipeline experiment for this project in the fall of 2019, the cloud was still relatively new to them, Data Analytics as a Service (DAaaS) was a young project, and the team was familiarizing themselves with what was newly available. The proof-of-concept was certainly aimed at solving a particular problem, but it was also a division-wide experiment to figure out how to navigate the cloud. For that reason, they took this on as a joint project with the DAaaS team and cloud solution architects. The aim was to get experience with a range of different technologies, including Azure Batch, Azure Machine Learning, and Kubernetes on the DAaaS platform.

Relative cost of implementing evaluated services
Figure 1 - Cost/image

The Azure Batch and Azure Machine Learning solutions were to be implemented by a team who had expertise in those, in close collaboration with the Data Science Division (DScD). The DScD also worked with the DAaaS team to see what the DAaaS platform could offer. Each team was handed the same set of code and a couple of test images, and over the span of four months they worked to implement their solutions.

At the end of the implementation period, each approach was analyzed to produce cost-estimates for the processing of a single image (Figure 1).

The pipelines ran on different VM types chosen by the architect. The Azure solutions used low-priority instances, and the Kubernetes solution pricing is based on three-year reserved pricing (which is more expensive than low-priority). The real question became: why such huge differences in cost?

Comparing apples and oranges

The difference was the code. While each team received the same code at the start with the objective of parallelizing it, the Azure Machine Learning and Azure Batch solutions encouraged slightly different approaches to that parallelization. The small changes to the code led to significant differences in outcome. This was not really about the pipeline technology itself; all things being equal the performance would have been comparable, but one implementation bypassed a serious performance issue whereas another did not touch that part of the code.

For example, one of the issues with the original implementation was the way that it parallelized processing of an image. In its original form, it split up what needed to be extracted from the image into 30 groups and then it created 30 parallel processes, each handling a share of the image. On the surface, this was a great idea; but unfortunately there was a complexity. The extraction algorithm needed to load a large geographic data file into memory, as well as the image itself, which combined to about 3 GB of RAM. This would be fine for one process, but since the processes do not share memory, doing this across 30 processes in parallel ballooned the RAM usage to 90 GB. In addition, all processes wrote many small files to disk in the extraction process, and the parallel disk writes slowed down the program substantially. This first implementation used a large number of resources, and took longer than it should have because it was stuck writing data to disk.

This was one area where Azure Machine Learning and Azure Batch diverged. The Azure Batch solution made it convenient to parallelize at the level of those groups within the image to extract, and so those processes were split up across different machines. The RAM was much more manageable, and the processes did not compete while writing to the disk. It was less natural to do this in Azure Machine Learning, and so by no fault of its own, it appeared to have much worse performance.
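The memory arithmetic behind this difference is simple enough to write down. In the sketch below, the 30 groups and roughly 3 GB per process come from the article; the number of machines is a hypothetical parameter, since the article does not say how many machines the Azure Batch solution used.

```python
import math

def peak_ram_per_machine_gb(n_groups, n_machines, gb_per_process):
    """Peak RAM on the busiest machine when each process loads its own copy."""
    processes_per_machine = math.ceil(n_groups / n_machines)
    return processes_per_machine * gb_per_process

# Original layout: all 30 processes on one machine.
single_machine = peak_ram_per_machine_gb(30, 1, 3)
# One group per machine (hypothetical layout): each machine holds one copy.
one_group_each = peak_ram_per_machine_gb(30, 30, 3)
```

Spreading the groups across machines does not reduce the total memory used, but it keeps each machine's footprint small and removes the contention from parallel disk writes on a single disk.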

In contrast, on the DAaaS / Kubernetes implementation, the data scientists carefully read through, rewrote and re-architected components, and, before even touching the pipeline, had an extraction process which:

  • Used 6 GB of RAM per image, not 100 GB
  • Ran in under 40 minutes, not several hours
  • Used 6 CPUs, not 30+
CPU usage over time
Description for Figure 2 - CPU usage over time

The processing of three batches of fifteen images each, 45 images in total. Each colour is one image being extracted and the CPU usage over time.

RAM usage
Description for Figure 3 - RAM usage

The processing of three batches of fifteen images each, 45 images in total. Each colour is one image being extracted and the RAM usage over time.

Before even getting to the pipeline, the problem was reduced from one that required large clusters to one that would run on a mid-tier laptop. Since the pipeline no longer needed to constrain its imagination to making one image economical, it was possible to move on to the stage of making the processing of a season's worth of images simple, manageable and automated.

How do we know this now? Why was this not noticed?

The team focused on the Azure Batch and Azure Machine Learning solutions did not have the benefit of a mandate that included redesigning code. They were explicitly not supposed to be modifying the code, as that would have had potential methodological impacts and had little to do with the Cloud solutions or proofs-of-technology. It simply did not fall under their purview.

On the other hand, the DScD team had just hired someone specifically to invest in their data engineering capacity, so in addition to working on a proof-of-technology with DAaaS, the team was also undergoing an effort to develop more expertise in this area. The data engineer undertook a thorough code-review, and had the benefit of having the original author of the code in-house to answer questions. They simply had more freedom to solve the problem in a range of ways by having resources in-house, which led to more efficient outcomes as well as new insights.

It is important to note that without this analysis and review of the code, not only would they not have had this new solution, but they also would not have known why the two Azure solutions had such performance differences! It was only clear after review what caused the differences between the three solutions.

Lessons learned

The team did not compare apples with apples in this situation, and it was a far more informative experience as a result. What they inadvertently compared was really

  • what happens when you try to Lift-and-Shift an existing application?

with

  • what can you achieve with an analysis on the existing application?

The former is basically a platform/infrastructure question, and the latter is an engineering/application question. What was uncovered in this proof-of-concept was that while the platform is a necessary context, without which this exercise would have been impossible, the real value was delivered by the core engineering (Footnote 1), and it was the work invested in the application itself that differentiated the outcomes.

The team was wiser for the experience and learned about where to focus their expertise. Because of this project, and similar experiences, they were able to make strategic decisions in the division about how and where to grow capacity. In the year since the experiment, the DScD's efforts to boost data engineering skills across the division have paid dividends on many projects.

Choosing a pipeline technology and what went wrong the first time

In the previous section it was demonstrated that the underlying pipeline technology was not really what affected the efficiency or cost. So what drove the decisions? And what mistakes were made the first time?

As alluded to earlier, the DScD's business value is in developing models and applications, not in provisioning or maintaining infrastructure. In addition, the division is only a small part of a much larger organization, and it is important that they keep their technology strategy aligned with the organization and leverage the horizontal services offered by solutions like DAaaS.

For the DScD, it was an easy decision. Aligning with and working with DAaaS enables the DScD's work by letting them focus on the things that bring business value to clients, and working with the DAaaS platform helps the DAaaS team build up a robust and flexible platform that meets customer needs—for the DScD, for Statistics Canada, for external partners and ultimately, for Canadians.

Going with the DAaaS / Kubernetes solution was obvious in the end. They implemented a fully automated pipeline which queried the United States Geological Survey Application Programming Interface for the images, downloaded and processed them, and did all of this automatically and in a version-controlled and artifact-driven way. The project had a successful close-out, and both the team and the clients were very happy with this solution.

Unfortunately, the pipeline ran on a specific piece of pipeline software, and about a month later, that software was moved to a new licensing model that caused a rethink of its use. It was found that continued use of the software was impossible (as purchasing the software was not deemed a viable option). As a result, the rewrite was inevitable anyway.

The rewrite and the benefit of hindsight

In many ways, the rewrite gave the team a chance to revisit and simplify the original implementation. The first pipeline had many small components which each tackled one function, and then the pipeline code orchestrated them.

The breakdown of each component in the original pipeline and how they flowed into each other
Description for Figure 4 - The original pipeline

The breakdown of each component in the original pipeline and how they flowed into each other:

A flow chart with the following labelled boxes: Trackframe Shapefiles > Trackframe Tables > Quarter Sections > ML process, Daily timer > Events > Images, Landsat API.

In the revised pipeline, using Kubeflow Pipelines, the complexity was shifted out of the pipeline orchestration and into the application code itself.

While this sounds counter-intuitive, the reality is that the pipeline is no more or less complicated whether the logic is encoded into the ‘components' of the pipeline, or in the fabric which stitches them together. The difference is that more people know regular Python or R than know pipeline orchestration code, so it is simpler and more maintainable (for projects like this) to not be over-zealous with the pipeline code. As a result, from the Kubeflow Pipelines side, the pipeline looks like this:

The Kubeflow pipeline flowchart
Description for Figure 5 - The Kubeflow pipeline

The updated pipeline combines many of the original components into a single process to identify images of interest, then Kubeflow creates parallel processes to handle each one.

A circle labelled "Get Image Name" points toward seven boxes labelled "QS Extraction."

It simply gets the list of images that it needs to process that day, fans out in parallel across the Kubernetes cluster, and extracts each image separately. The component itself is encapsulated in a Docker image, which keeps the component portable and makes it easy to test and deploy. The pipeline orchestration code is about 20 lines of Python.
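The fan-out pattern described above can be sketched in plain Python. This is not Kubeflow Pipelines code: a thread pool stands in for the Kubernetes cluster, and both functions are hypothetical placeholders for the "Get Image Name" and "QS Extraction" steps in Figure 5.

```python
from concurrent.futures import ThreadPoolExecutor

def get_image_names(day):
    """Hypothetical stand-in: list the images that need processing for `day`."""
    return [f"{day}-image-{i}" for i in range(7)]

def extract_quarter_sections(image_name):
    """Hypothetical stand-in for the containerized extraction component."""
    return f"extracted:{image_name}"

def run_daily_pipeline(day, max_workers=4):
    """Get the day's image list, then extract each image independently."""
    names = get_image_names(day)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each image is handled separately; map() returns results in input order.
        return list(pool.map(extract_quarter_sections, names))
```

Keeping the orchestration this thin is what pushes the real logic into the extraction component, where it can be tested like any other Python or R code.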

The Pipeline flowchart
Description for Figure 6 - The Pipeline

A flow chart with the following labelled boxes: Remote Landsat 8 Blob Storage > Download image > Get image id X using Trackframe table > cache > Image cache in MinIO

WRS2 Grid in MinIO > Not in cache download if needed > Clip image to Trackframe > cache > Trackframes in MinIO

Not in cache > Get QSTRM in this projection > cache > Projected QSTRM files in MinIO

Original QSTRM in MinIO > Not in cache / download unprojected QSTRM > Project QSTRM

The data flow within the extractor is still a little complicated, but it is easily and efficiently managed within the extractor, using an S3 Bucket (implemented by MinIO) as a storage location and cache.
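The "use the cache, otherwise compute and store" logic in Figure 6 can be sketched as follows. An in-memory class stands in for the S3-compatible bucket so that the example runs on its own; the class, the key layout and the `download` callable are all hypothetical, and a real implementation would call an S3 client instead.

```python
class ObjectStore:
    """In-memory stand-in for an S3-compatible bucket such as MinIO."""

    def __init__(self):
        self._objects = {}

    def exists(self, key):
        return key in self._objects

    def get(self, key):
        return self._objects[key]

    def put(self, key, data):
        self._objects[key] = data

def cached(store, key, compute):
    """Return the object at `key`, computing and storing it on a cache miss."""
    if not store.exists(key):
        store.put(key, compute())
    return store.get(key)

def clipped_image(store, image_id, download):
    """Download (if needed), then clip (if needed), caching both results."""
    raw = cached(store, f"images/{image_id}", lambda: download(image_id))
    # The clipping itself is represented by a placeholder string here.
    return cached(store, f"trackframes/{image_id}", lambda: f"clipped({raw})")
```

With this pattern, re-running the extractor for an image that has already been processed skips both the download and the clipping.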

The team was happy with the result, and successfully processed 1,600 images without issue.

The end?

This is not actually the end of the story as the last chapter has yet to be written. With the new pipeline, the project will be expanded and transitioned to production, and soon the team will begin training the new model.

We look forward to sharing our developments in future articles as the project moves ahead. Who knows, you might get the chance to read about a fancy new neural network that can figure out what is growing from space!

 