
Use of Machine Learning for Crop Yield Prediction

By: Kenneth Chu, Statistics Canada

The Data Science Division (DScD) at Statistics Canada recently completed a research project for the Field Crop Reporting Series (FCRS) Footnote 1 on the use of machine learning techniques (more precisely, supervised regression techniques) for early-season crop yield prediction.

The project objective was to investigate whether machine learning techniques could be used to improve the precision of the existing crop yield prediction method (referred to as the Baseline method).

The project faced two key challenges: (1) how to incorporate any prediction technique (machine learning or otherwise) into the FCRS production environment in a methodologically sound way, and (2) how to evaluate any prediction method meaningfully within the FCRS production context.

For (1), the rolling window forward validation Footnote 2 protocol (originally designed for supervised learning on time series data) was adapted to safeguard against temporal information leakage. For (2), the team opted to perform testing by examining the actual series of prediction errors that would have resulted had the candidate method been deployed in past production cycles.

Motivation

Traditionally, the FCRS publishes annual crop yield estimates at the end of each reference year (shortly after harvest). In addition, full-year crop yield predictions are published several times during the reference year. Farms are contacted in March, June, July, September and November for data collection, resulting in a heavy response burden for farm operators.

In 2019, for the province of Manitoba, a model-based method—essentially, variable selection via LASSO (Least Absolute Shrinkage and Selection Operator), followed by robust linear regression—was introduced to generate the July predictions based on longitudinal satellite observations of local vegetation levels as well as region-level weather measurements. This allowed the removal of the question about crop yield prediction from the Manitoba FCRS July questionnaire, reducing the response burden.
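
The production implementation is not reproduced here; the following is a minimal, self-contained sketch of the two-step approach just described (LASSO-based variable selection followed by a robust linear fit), on synthetic data and with placeholder settings.

```python
# Minimal sketch of the two-step approach: LASSO variable selection, then a robust
# linear regression on the selected variables. Data and settings are synthetic
# placeholders, not the FCRS production configuration.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.normal(size=(n, p))                                   # stand-ins for NDVI/weather features
y = 40 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=2.0, size=n)  # synthetic yield

# Step 1: variable selection via LASSO (penalty chosen here by cross-validation).
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)                        # indices of retained features

# Step 2: robust linear regression on the selected variables (Huber loss).
X_sel = sm.add_constant(X[:, selected])
robust_fit = sm.RLM(y, X_sel, M=sm.robust.norms.HuberT()).fit()
print(selected, robust_fit.params)
```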

Core regression technique: XGBoost with linear base learner

A number of prediction techniques were examined, including random forests, support vector machines, elastic-net regularized generalized linear models, and multilayer perceptrons. Accuracy and computation time considerations led us to focus attention on XGBoost Footnote 3 with a linear base learner.
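
As a rough illustration (not the production model or its tuned hyperparameters), an XGBoost regressor with a linear base learner can be set up as follows, assuming the xgboost Python package:

```python
# Sketch of XGBoost with a linear base learner (booster="gblinear").
# Hyperparameter values are placeholders, not the tuned FCRS values.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 20))                          # stand-in features
y_train = X_train @ rng.normal(size=20) + rng.normal(scale=0.5, size=500)

model = xgb.XGBRegressor(
    booster="gblinear",     # linear base learner instead of the default trees
    n_estimators=100,       # number of boosting rounds
    learning_rate=0.3,
    reg_alpha=0.0,          # L1 penalty on the weights
    reg_lambda=1.0,         # L2 penalty on the weights
)
model.fit(X_train, y_train)
predictions = model.predict(rng.normal(size=(5, 20)))
```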

Rolling Window Forward Validation to prevent temporal information leakage

The main contribution of the research project is the adaptation of rolling window forward validation (RWFV) Footnote 2 as a hyperparameter tuning protocol. RWFV is a special case of forward validation Footnote 2, a family of validation protocols designed to prevent temporal information leakage for supervised learning based on time series data.

Suppose you are training a prediction model for deployment in production cycle 2021. The following schematic illustrates a rolling window forward validation scheme with a training window of five years and a validation window of three years.

Description - Figure 1 Example of a rolling window forward validation scheme. This figure depicts, as an example, a rolling window forward validation scheme with a training window of five years and a validation window of three years. A validation scheme of this type is used to determine the optimal hyperparameter configuration to use when training the actual prediction model to be deployed in production.

The blue box at the bottom represents production cycle 2021, and the five white boxes to its left indicate that a training window of five years is being used. This means that the training data for production cycle 2021 will be those from the five years strictly and immediately prior (2016 to 2020). For validation, i.e., hyperparameter tuning for production cycle 2021, the three black boxes above the blue box indicate our choice of a three-year validation window.

The RWFV protocol is used to choose the optimal configuration from the hyperparameter search space, as follows:

  • Fix temporarily an arbitrary candidate hyperparameter configuration from the search space.
  • Use that configuration to train a model for validation year 2020, using data from the five years immediately prior: 2015 to 2019.
  • Use the resulting trained model to make predictions for validation year 2020, and compute the corresponding parcel-level prediction errors for 2020.
  • Aggregate the parcel-level prediction errors down to an appropriate single numeric performance metric.
  • Repeat for the two other validation years (2018 and 2019).

Averaging the performance metrics across the validation years 2018, 2019 and 2020 yields a single numeric performance metric (validation error) for the temporarily fixed hyperparameter configuration.

Next, this procedure is repeated for all candidate hyperparameter configurations in the hyperparameter search space. The optimized configuration to be deployed in production is the one that yields the best aggregated performance metric. This is rolling window forward validation, or more precisely, our adaptation of it to the crop yield prediction context.

Note that the above protocol respects the operational constraint that, for production cycle 2021, the trained prediction model must have been trained and validated on data from strictly preceding years; in other words, the protocol prevents temporal information leakage.
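
As a concrete illustration, the tuning loop described above might be sketched as follows; the column names, the window lengths passed in, and the RMSE error metric are illustrative assumptions, not the production specification.

```python
# Minimal sketch of rolling window forward validation (RWFV) for hyperparameter tuning.
# Assumes yearly observations in a pandas DataFrame with a "year" column, a "yield"
# target and feature columns; window lengths and the error metric are illustrative.
import numpy as np
import pandas as pd
import xgboost as xgb

def rwfv_error(df, feature_cols, params, production_year=2021,
               train_window=5, validation_window=3):
    """Average validation error for one hyperparameter configuration."""
    errors = []
    for val_year in range(production_year - validation_window, production_year):
        train = df[(df["year"] >= val_year - train_window) & (df["year"] < val_year)]
        valid = df[df["year"] == val_year]
        model = xgb.XGBRegressor(booster="gblinear", **params)
        model.fit(train[feature_cols], train["yield"])
        pred = model.predict(valid[feature_cols])
        # Aggregate parcel-level errors into a single metric (RMSE used here).
        errors.append(np.sqrt(np.mean((valid["yield"].to_numpy() - pred) ** 2)))
    return float(np.mean(errors))

# Grid search over candidate configurations; the best configuration is then retrained
# on the five years immediately preceding the production cycle and deployed, e.g.:
# best = min(search_grid, key=lambda p: rwfv_error(df, feature_cols, p))
```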

Production-pertinent testing via prediction error series from virtual production cycles

To evaluate—in a way most pertinent to the production context of the FCRS—the performance of the aforementioned prediction strategy based on XGBoost(Linear) and RWFV, the data scientists computed the series of prediction errors that would have resulted had the strategy actually been deployed for past production cycles. In other words, these prediction errors of virtual past production cycles were regarded as estimates of the generalization error within the statistical production context of the FCRS.

The following schematic illustrates the prediction error series of the virtual production cycles:

Description - Figure 2 Prediction error series of virtual production cycles. Virtual production cycles are run for past reference years, as described in Figure 1. Since the actual crop yield data are already known for past production cycles, the actual prediction errors had the proposed prediction strategy been actually deployed for past production cycles (represented by orange boxes) can be computed. The resulting series of prediction errors for past production cycles is used to assess the accuracy and stability of the proposed crop yield prediction strategy.

Now repeat, for each past virtual production cycle (represented by an orange box), what was just described for the blue box. The difference is this: for the blue box, namely the current production cycle, it is not yet possible to compute the prediction errors at the time of crop yield prediction (in July), since the current growing season has not ended. For the past virtual production cycles (the orange boxes), however, it is possible.
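
Reusing the imports and the rwfv_error helper from the previous sketch, the error series over past virtual production cycles could be assembled along these lines (again purely illustrative):

```python
# Illustrative sketch: prediction error series over past virtual production cycles.
# For each past year, hyperparameters are re-tuned with RWFV using only strictly
# earlier years, a model is trained, and the realized prediction error is recorded.
def virtual_production_errors(df, feature_cols, search_grid, years, train_window=5):
    error_series = {}
    for prod_year in years:
        # Tune on data strictly before prod_year (see rwfv_error above).
        best = min(search_grid,
                   key=lambda p: rwfv_error(df, feature_cols, p, production_year=prod_year))
        train = df[(df["year"] >= prod_year - train_window) & (df["year"] < prod_year)]
        test = df[df["year"] == prod_year]      # actual yields are known for past years
        model = xgb.XGBRegressor(booster="gblinear", **best)
        model.fit(train[feature_cols], train["yield"])
        pred = model.predict(test[feature_cols])
        error_series[prod_year] = float(np.sqrt(np.mean(
            (test["yield"].to_numpy() - pred) ** 2)))
    return error_series      # one error per past production cycle, ready for plotting
```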

These prediction errors in virtual past production cycles can be illustrated in the following plot:

Description - Figure 3 Graphical comparison of the XGBoost(Linear)/RWFV prediction strategy against the Baseline strategy. The red line is the mock production error series of the Baseline strategy, while the orange is that of the XGBoost(Linear)/RWFV strategy. The latter strategy exhibits consistently smaller prediction errors across consecutive virtual past production cycles.

The red line illustrates the Baseline model prediction errors, while the orange line illustrates the XGBoost/RWFV strategy prediction errors. The gray lines illustrate the prediction errors for each of the candidate hyperparameter configurations in our chosen search grid (which contains 196 configurations).

The XGBoost/RWFV prediction strategy exhibited smaller prediction errors than the Baseline method, consistently over consecutive historical production runs.

Currently, the proposed strategy is in the final pre-production testing phase, to be jointly conducted by subject matter experts and the agricultural program’s methodologists.

The importance of evaluating protocols

The team chose not to use a more familiar validation method, such as hold-out validation or cross-validation, nor a generic generalization error estimate, such as the prediction error on a test data set held aside at the outset.

These decisions were taken based on our determination that our proposed validation protocol and choice of generalization error estimates (RWFV and virtual production cycle prediction error series, respectively) would be much more relevant and appropriate given the production context of the FCRS.

Methodologists and machine learning practitioners are encouraged to evaluate carefully whether generic validation protocols or evaluation metrics are indeed appropriate for their use cases at hand, and if not, seek alternatives that are more relevant and meaningful within the given context. For more information about this project, please email statcan.dsnfps-rsdfpf.statcan@statcan.gc.ca.



Wholesale Trade Survey (monthly): CVs for total sales by geography - October 2020

Monthly Wholesale Trade Survey - CVs for Total sales by geography
Geography / Month
2019-10 2019-11 2019-12 2020-01 2020-02 2020-03 2020-04 2020-05 2020-06 2020-07 2020-08 2020-09 2020-10
coefficient of variation (%)
Canada 0.6 0.6 0.8 0.7 0.7 0.6 0.8 0.8 0.7 0.7 0.7 0.7 0.5
Newfoundland and Labrador 0.4 0.3 0.2 0.7 0.3 1.2 0.7 0.5 0.1 0.2 0.4 0.3 0.3
Prince Edward Island 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Nova Scotia 2.1 2.2 6.8 2.6 2.0 2.8 3.3 4.0 2.3 1.5 1.8 1.7 2.5
New Brunswick 1.4 3.8 1.7 2.6 1.2 1.3 2.1 3.3 1.9 2.1 4.2 3.4 2.8
Quebec 1.7 1.7 2.2 1.4 2.1 1.6 2.4 2.0 1.9 1.8 2.1 2.0 1.5
Ontario 1.0 0.8 1.2 1.2 0.9 1.0 1.2 1.1 1.1 1.1 0.9 0.9 0.8
Manitoba 1.7 0.9 2.6 1.3 0.8 1.0 2.9 2.8 1.2 1.2 1.8 2.3 1.7
Saskatchewan 0.7 1.0 0.7 0.5 0.6 0.5 1.2 0.7 0.7 1.1 1.6 0.6 0.8
Alberta 1.3 1.4 1.1 1.0 0.9 1.2 2.9 2.9 2.3 2.3 1.8 3.3 1.3
British Columbia 1.1 1.5 1.4 1.3 1.6 1.5 1.3 1.7 1.6 1.3 1.9 1.8 1.4
Yukon Territory 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Northwest Territories 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Nunavut 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Differences between the Annual Survey of Manufacturing Industries and the Monthly Survey of Manufacturing

The Annual Survey of Manufacturing and Logging Industries (ASML) measures both revenues from goods manufactured and total revenues. When comparing with the sales of goods manufactured variable from the Monthly Survey of Manufacturing (MSM), users should use the first concept, revenues from goods manufactured from the ASML. Total revenues published by the ASML measure a broader concept, as they include revenues from activities other than manufacturing; for example, goods purchased for resale, and investment and interest revenues. Total revenues from the ASML therefore cannot be compared with sales of goods manufactured published by the MSM.

The two surveys answer different user needs. The Monthly Survey of Manufacturing is built to provide an indicator on the state of the manufacturing sector and track monthly changes, i.e. provide information on the trend, while the Annual Survey of Manufacturing and Logging Industries is built to paint a detailed picture on the total dollar values of the industries, i.e. to provide information on the levels.

In order to provide information on trend that is not altered by changes in the sample, the sample of the Monthly Survey of Manufacturing is redrawn every five years, while the sample of the Annual Survey of Manufacturing and Logging Industries is renewed every year.

Both surveys are subject to revisions; however, the two surveys will not produce identical results, mainly because of methodological differences. For example, there are differences in sampling strategies (as described above), in reporting periods (respondents to the annual survey may report for a fiscal year that differs from the January-to-December calendar year), in auxiliary data sources (the MSM uses GST data, while the ASML uses T2 tax data for imputation and calibration), and in imputation methods (for a particular record, the MSM may use historical imputation while the ASML may use a donor to impute, or vice versa).

For more information on data sources and methodology please visit the following links:

Annual Survey of Manufacturing and Logging Industries (ASML)

Monthly Survey of Manufacturing (MSM)

Inter-city indexes of price differentials of consumer goods and services, 2020

Methodology

Inter-city indexes of price differentials of consumer goods and services show estimates of price differences between 15 Canadian cities in all provinces and territories, as of October 2019. These estimates are based on a selection of products (goods and services) purchased by consumers in each of the 15 cities.

In order to produce optimal inter-city indexes, product comparisons were initially made by pairing cities that are in close geographic proximity. The resulting price level comparisons were then extended to include comparisons between all of the cities, using a chaining procedure. The following initial pairings were used:

St. John's, Newfoundland and Labrador / Halifax, Nova Scotia
Charlottetown-Summerside, Prince Edward Island / Halifax, Nova Scotia
Saint John, New Brunswick / Halifax, Nova Scotia
Halifax, Nova Scotia / Ottawa, Ontario
Montréal, Quebec / Toronto, Ontario
Ottawa, Ontario / Toronto, Ontario
Toronto, Ontario / Winnipeg, Manitoba
Regina, Saskatchewan / Winnipeg, Manitoba
Edmonton, Alberta / Winnipeg, Manitoba
Vancouver, British Columbia / Edmonton, Alberta
Calgary, Alberta / Edmonton, Alberta
Whitehorse, Yukon / Edmonton, Alberta
Yellowknife, Northwest Territories / Edmonton, Alberta
Iqaluit, Nunavut / Yellowknife, Northwest Territories
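
The published estimates are produced with proper bilateral index-number methods; purely as a toy illustration of the chaining step, a price-level ratio between two cities that were not paired directly can be obtained by multiplying ratios along a chain of pairings (all figures below are made up):

```python
# Toy illustration of the chaining idea: bilateral price-level ratios are available
# only for the initial city pairings; a ratio between two non-paired cities is
# obtained by multiplying ratios along a chain of pairings. Numbers are made up.
pair_ratio = {
    ("St. John's", "Halifax"): 0.98,   # hypothetical: St. John's prices / Halifax prices
    ("Halifax", "Ottawa"): 1.03,       # hypothetical: Halifax prices / Ottawa prices
}

# Chained comparison of St. John's with Ottawa via Halifax:
st_johns_vs_ottawa = pair_ratio[("St. John's", "Halifax")] * pair_ratio[("Halifax", "Ottawa")]
print(round(st_johns_vs_ottawa, 3))    # ~1.009: St. John's about 0.9% above Ottawa in this toy case
```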

Reliable inter-city price comparisons require that the selected products be very similar across cities. This ensures that the variation in index levels between cities is due to pure price differences and not to differences in the attributes of the products, such as size and/or quality.

Within each city pair, product price quotes were matched on the basis of detailed descriptions. Whenever possible, products were matched by brand, quantity and with some regard for the comparability of retail outlets from which they were selected.

Additionally, the target prices for this study are final prices and as such, include all sales taxes and levies applied to consumer products within a city. This can be an important source of variation when explaining differences in inter-city price levels.

It should be noted that price data for the inter-city indexes are drawn from the sample of monthly price data collected for the Consumer Price Index (CPI). Given that the CPI sample is optimized to produce accurate price comparisons through time, and not across regions, the number of matched price quotes between cities can be small. It should also be noted that, especially in periods when prices are highly volatile, the timing of the product price comparison can significantly affect city-to-city price relationships.

The weights used to aggregate the food indexes within a city are based on the combined consumption expenditures of households living in the 15 cities tracked. As such, one set of weights is used for all 15 cities for the food indexes. Because only the food major component index and its selected sub-groups are published for Iqaluit, the weights used to aggregate the non-food product indexes within a city are based on the combined consumption expenditures of households living in the other 14 cities tracked. Currently, 2017 expenditures are used to derive the weights. These expenditures are expressed in October 2019 prices.

The inter-city index for a particular city is compared to the weighted average of all 15 cities, which is equal to 100. For example, an index value of 102 for a particular city means that prices for the measured commodities are 2% higher than the weighted, combined city average.
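
In symbols (an illustrative formulation, not the exact published estimator): if $P_c$ denotes the measured price level for city $c$ and $w_c$ the expenditure weights summing to one across the 15 cities, then

I_c = 100 \times \frac{P_c}{\sum_{k=1}^{15} w_k P_k},

so an index of $I_c = 102$ corresponds to prices about 2% above the weighted all-city average.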

These estimates should not be interpreted as a measure of differences in the cost of living between cities. The indexes provide price comparisons for a selection of products only, and are not meant to give an exhaustive comparison of all goods and services purchased by consumers. Additionally, the shelter price concept used for these indexes is not conducive to making cost-of-living type comparisons between cities (see below).

Additional Information on Shelter

Shelter prices were absent from the inter-city index program prior to 1999 because of methodological and conceptual issues associated with their measurement. The diverse nature of shelter means that accurate matches between cities are often difficult to make.

To account for some of these difficulties, a rental equivalence approach is used to construct the inter-city price indexes for owned accommodation. Such an approach uses market rents as an approximation to the cost of the shelter services consumed by homeowners. It is important to note that this approach may not be suitable for the needs of all users. For instance, since the rental equivalence approach does not represent an out-of-pocket expenditure, the indexes should not be used for measuring differences in the purchasing power of homeowners across cities.

The relatively small size of the housing market in Whitehorse and Yellowknife makes it difficult to construct reliable price indexes for rented accommodation and owned accommodation. To compensate, housing information is collected using different pricing frequencies and collection methods than in the rest of the country. Consequently, users should exercise caution when using the indexes for rented accommodation and owned accommodation for these two cities.

Monthly Survey of Manufacturing: National Level CVs by Characteristic - October 2020

Text table 1: National Level CVs by Characteristic
Month Sales of goods manufactured Raw materials and components inventories Goods / work in process inventories Finished goods manufactured inventories Unfilled Orders
%
October 2019 0.61 0.93 1.13 1.41 1.11
November 2019 0.59 0.95 1.17 1.37 1.12
December 2019 0.58 0.98 1.16 1.39 1.06
January 2020 0.64 0.99 1.26 1.32 1.10
February 2020 0.63 1.02 1.22 1.36 1.08
March 2020 0.68 0.99 1.17 1.41 1.10
April 2020 0.87 0.99 1.20 1.41 1.10
May 2020 0.80 1.04 1.13 1.37 1.06
June 2020 0.69 1.05 1.19 1.38 1.06
July 2020 0.69 1.02 1.15 1.43 1.10
August 2020 0.64 1.04 1.23 1.49 1.48
September 2020 0.74 1.06 1.20 1.53 1.41
October 2020 0.75 1.02 1.15 1.51 1.27

Notice of changes to the Manufacturing and Logging variant of NAPCS Canada 2017 version 2.0

December 11, 2020 (Previous notice)

The Variant of NAPCS Canada version 2.0 – Manufacturing and Logging was updated on December 11, 2020, to help the Annual Survey of Manufacturing and Logging Industries (ASML) program improve the measurement of the use and production of plastic in the manufacturing industries. The updated variant was renamed Variant of NAPCS Canada version 2.0 – Manufacturing and Logging Rev.1 (for Revision 1). Four variant codes have been expanded to twelve codes, and titles have changed for five other codes, as shown below:

Annual Survey of Manufacturing and Logging Industries (ASML)
Old ASML variant Code Old ASML variant English Title Updated ASML variant Code Updated ASML variant English Title GSIM Type of Change
28111110 Polyester resins 28111111 Polyethylene terephthalate (PET) resins RC4.1 - Breakdown
28111110 Polyester resins 28111112 Other thermoplastic polyester resins RC4.1 - Breakdown
28111210 Polyethylene, low-density 28111210 Low-density polyethylene resins VC2 - Name change
28111220 Polyethylene, linear low-density 28111220 Linear low-density polyethylene resins VC2 - Name change
28111230 Polyethylene, high-density 28111230 High-density polyethylene resins VC2 - Name change
28111410 Acrylonitrile-butadiene-styrene 28111410 Acrylonitrile-butadiene-styrene resins VC2 - Name change
28111420 Polyvinyl chloride 28111420 Polyvinyl chloride resins VC2 - Name change
28111430 All other thermoplastic resins 28111431 Polypropylene resins RC4.2 - Split off
28111430 All other thermoplastic resins 28111432 Thermoplastic polyurethane resins RC4.2 - Split off
28111430 All other thermoplastic resins 28111433 Polyamide (nylon) resins RC4.2 - Split off
28111430 All other thermoplastic resins 28111434 All other thermoplastic resins, n.e.c. RC4.2 - Split off
28111510 Phenol-formaldehyde resins 28111511 Phenolic resins RC4.1 - Breakdown
28111510 Phenol-formaldehyde resins 28111512 Urea formaldehyde resins RC4.1 - Breakdown
28111510 Phenol-formaldehyde resins 28111513 All other formaldehyde based resins RC4.1 - Breakdown
28111610 Other thermosetting resins 28111611 Unsaturated polyester (thermosetting) resins RC4.2 - Split off
28111610 Other thermosetting resins 28111612 Thermosetting  polyurethane resins RC4.2 - Split off
28111610 Other thermosetting resins 28111613 Other thermosetting resins, n.e.c. RC4.2 - Split off

Description of changes in the classification, including Codes, Titles, Classes, Subclasses and Detailed categories (Based on GSIM)


Labour Force Survey Expert Panel

The social and economic impacts of COVID-19 have fuelled an extraordinary demand for timely, high-quality data on the health of Canada's people, society and economy. In response to this demand, Statistics Canada has enhanced many of its programs, including its Labour Force Survey (LFS), with the creation of the Labour Force Survey (LFS) Expert Panel.

Comprised of national and international experts from government, academia and non-governmental agencies, the LFS Expert Panel will provide independent advice and guidance on one of Statistics Canada's most important statistical programs.

To ensure that these enhancements result in a deeper and broader understanding of evolving market conditions, the Panel will:

  • Provide strategic advice to Statistics Canada on strategies to engage with respondents and encourage participation in the Labour Force Survey;
  • Provide expert advice to Statistics Canada on the analysis of Labour Force Survey data; and,
  • Act as a liaison on data quality with the broader community of LFS data users.

Panel members, listed below, were selected to reflect a wide range of expertise and experience, including the use of LFS data and international experience in the management of similar large-scale statistical programs. With the help of these experts, the LFS will provide even better data and insights to Canadians on evolving labour market conditions in our country.

Internal membership

Chair: Lynn Barr-Telford, Assistant Chief Statistician, Social, Health and Labour Statistics

Secretary: Josée Bégin, Director General, Labour Market, Education and Socioeconomic Well-being Statistics Branch

Agency subject matter: Centre for Labour Market Information, Modern Statistical Methods and Data Science Branch, and Collection and Regional Services Branch.

External membership

John L. Eltinge

Assistant Director, Research and Methodology, United States Census Bureau

John Eltinge is the U.S. Census Bureau Liaison to Statistics Canada's LFS Expert Panel. Mr. Eltinge is the Assistant Director for Research and Methodology at the United States Census Bureau.  Before 2016, he served as the Associate Commissioner for Survey Methods Research at the Bureau of Labor Statistics (BLS). Prior to that, he served as a senior mathematical statistician at BLS, and an associate professor with tenure in the Department of Statistics at Texas A&M University.  He gave the Roger Herriot Memorial Lecture on Innovation in the Federal Statistical System; and was previously the President of the Washington Statistical Society, the overall chair of the 2003 Joint Statistical Meetings, an associate editor for The American Statistician, and an associate editor for the Applications and Case Studies Section of Journal of the American Statistical Association. In addition, at the 2018 Joint Statistical Meetings, he presented the annual plenary Deming Memorial Lecture, "Improving the Quality and Value of Statistical Information: 14 Questions on Management". A webcast of this lecture is available through: Plenary Session Webcasts

His research interests include the following: data quality; design optimization; integration of multiple data sources; imputation; time series; and small domain estimation.

Mr. Eltinge holds a Ph.D. from the Department of Statistics at Iowa State University; is a fellow of the American Statistical Association; an editor of the Harvard Data Science Review; an associate editor for Journal of Official Statistics and for Survey Methodology Journal; and a member of the Federal Committee on Statistical Methodology.

Howard Ramos

Professor, Sociology Department, Western University

Howard Ramos is a professor at Western University as well as the Chair of the Department of Sociology. He investigates issues of social justice and social change and has published four books and over 50 articles and chapters on social movements, human rights, Indigenous issues, environmental advocacy, urban change, economic and tourism development, technology, ethnicity, race, immigration, and equity, diversity and inclusion in higher education. Dr. Ramos has worked with a wide range of advocacy and community organizations and is committed to knowledge translation and evidence-based policy.

Karyne B. Charbonneau

Director, Prices, Labour and Housing Division, Canadian Economic Analysis Department, Bank of Canada

Karyne B. Charbonneau is the Director of the Canadian Economic Analysis (CEA) Department's Prices, Labour and Housing division. She is primarily responsible for analysis of the near-term evolution of the labour market and inflation.

Ms. Charbonneau joined the Bank of Canada in 2013 as a Senior Economist in the International Economic Analysis Department. Prior to occupying her current role, she was a Policy Advisor in CEA and provided guidance on the impact of trade policy changes on the Canadian economy.

Her research focuses on applied econometrics, international trade and labour economics. She received her PhD in economics from Princeton University.

Thomas Storring

Director of Economics and Statistics, Nova Scotia Department of Finance and Treasury Board

Thomas Storring is the Director of Economics and Statistics for the Nova Scotia Department of Finance and Treasury Board. His work focuses on macroeconomic conditions in the Province: how macroeconomics affects the government's fiscal choices and how government decisions affect the economy. He is the focal point for Statistics Canada within the Province of Nova Scotia, and advises Statistics Canada on the Province's needs and priorities for the national statistical system.

Mr. Storring has worked as an economist for over 20 years in provincial finance departments in Ontario and Nova Scotia, as well as for J. D. Irving, Limited. Over the last 10 years, Thomas has taught at both Saint Mary's University and Dalhousie University, lecturing on money and banking, public finance, statistics, principles of economics, as well as global economics for Saint Mary's MBA program. He completed his undergraduate degree in economics at Acadia University and received his Master's in economics at the University of Oxford.

Mikal Skuterud

Professor, Economics Department, University of Waterloo

Mikal Skuterud is a full-time Professor in the Department of Economics at the University of Waterloo and is affiliated with the Canadian Labour Economics Forum (CLEF) and the Institute of Labor Economics (IZA). He received his Master's degree in Economics from the University of British Columbia and his PhD in Economics from McMaster University.

His research interests include: the labour market integration of immigrants, labour market policies that influence hours of work, and the economics of trade unions. His work has appeared in the American Economic Review, the Journal of Labor Economics, and the Canadian Journal of Economics and has received national media coverage in the New York Times and the Globe and Mail.

Bjorn Jarvis

Program Manager, Labour Surveys Branch, Australian Bureau of Statistics

Bjorn Jarvis is the head of the Labour Surveys Branch at the Australian Bureau of Statistics, which comprises the Labour Force Survey and related household surveys, and employer (establishment/business) surveys. In this role, he has overseen innovative transformation of Labour Force Survey methods, analysis and communication. This role included managing the impacts of COVID-19 on Labour Force statistics. Over his 16 years in official statistics, Bjorn has held a broad range of survey and administrative statistics roles in the labour and population statistics programs. He is a highly regarded survey statistician and communicator, with deep connections to the labour statistics user community in Australia.

Angella MacEwen

Senior Economist, Canadian Union of Public Employees, Broadbent Institute

Angella MacEwen is a Senior Economist at the Canadian Union of Public Employees, a policy fellow with the Broadbent Institute and a member of the National Stakeholder Advisory Panel (NSAP) at the Labour Market Information Council (LMIC). Her primary focus is understanding precarity and inequality in the Canadian labour market and evaluating policy solutions proposed for these issues. She also studies the impacts of Canadian economic and social policy on workers, especially climate policy and international trade and investment treaties. Ms. MacEwen writes a quarterly publication, Economy at Work, which aims to communicate current economic issues to a broad audience. She holds an MA in Economics from Dalhousie University.


Variant of NAPCS Canada 2017 Version 2.0 - Manufacturing and Logging - Background information

Status

This variant of the North American Product Classification System (NAPCS) Canada 2017 V2.0 was approved as a departmental standard on October 16, 2017. It replaces the NAPCS 2017 Version 1.0 Manufacturing and Logging variant.

The Annual Survey of Manufacturing and Logging Industries (ASML) is a survey of the manufacturing and logging industries in Canada. It is intended to cover all establishments primarily engaged in manufacturing and logging activities as well as some sales offices and warehouses which support these establishments.

The details collected include principal industrial statistics (such as revenue, salaries and wages, cost of materials and supplies used, cost of energy and water utility, inventories, etc.), as well as information about the commodities produced and consumed. Data collected by the ASML industries help measure the production of Canada's industrial and primary resource sectors, as well as provide an indication of the well-being of each industry covered by the survey and its contribution to the Canadian and Provincial economy.

Within Statistics Canada, the data are used by the Canadian System of National Accounts, the Monthly Survey of Manufacturing and the Prices programs. The data are also used by the business community, trade associations, federal and provincial departments, as well as international organizations and associations to profile the manufacturing and logging industries, undertake market studies, forecast demand and develop trade and tariff policies.

The manufacturing variant was created to capture additional detail on products that NAPCS Canada 2017 Version 1.0 would otherwise not have collected; adding an extra (eighth) digit to the classification allows this additional detail to be collected. In NAPCS Canada 2017 Version 2.0, some of those eight-digit variant codes were brought up to the standard seven-digit level. Those products include tobacco (see NAPCS Canada code 212112 - Cigars, chewing and smoking tobacco), chemicals (see codes 26321 - Petrochemicals, 27113 - Basic organic chemicals, n.e.c., 27211 - Ammonia and chemical fertilizers and 28111 - Plastic resins), cement (see code 46511 - Cement) and asphalt (see code 26211 - Asphalt (except natural) and asphalt products). The eight-digit variant codes remaining in Version 2.0 are for wood products, such as codes under NAPCS Canada 1451221 - Fuel products of waste wood, 157112 - Waste and scrap of wood, 24122 - Reconstituted wood products, 24124 - Other sawmill products, and treated wood products, and 462134 - Other wood millwork products.

Changes to the Variant of NAPCS Canada 2017 Version 2.0 - Manufacturing and Logging

The Variant of NAPCS Canada version 2.0 – Manufacturing and Logging was updated on December 11, 2020, to help the Annual Survey of Manufacturing and Logging Industries (ASML) program improve the measurement of the use and production of plastic in the manufacturing industries. The updated variant was renamed Variant of NAPCS Canada version 2.0 – Manufacturing and Logging Rev.1 (for Revision 1). Four variant codes have been expanded to twelve codes, and titles have changed for five other codes, as shown below:

Old ASML variant Code Old ASML variant English Title Updated ASML variant Code Updated ASML variant English Title GSIM Type of Change
28111110 Polyester resins 28111111 Polyethylene terephthalate (PET) resins RC4.1 - Breakdown
28111110 Polyester resins 28111112 Other thermoplastic polyester resins RC4.1 - Breakdown
28111210 Polyethylene, low-density 28111210 Low-density polyethylene resins VC2 - Name change
28111220 Polyethylene, linear low-density 28111220 Linear low-density polyethylene resins VC2 - Name change
28111230 Polyethylene, high-density 28111230 High-density polyethylene resins VC2 - Name change
28111410 Acrylonitrile-butadiene-styrene 28111410 Acrylonitrile-butadiene-styrene resins VC2 - Name change
28111420 Polyvinyl chloride 28111420 Polyvinyl chloride resins VC2 - Name change
28111430 All other thermoplastic resins 28111431 Polypropylene resins RC4.2 - Split off
28111430 All other thermoplastic resins 28111432 Thermoplastic polyurethane resins RC4.2 - Split off
28111430 All other thermoplastic resins 28111433 Polyamide (nylon) resins RC4.2 - Split off
28111430 All other thermoplastic resins 28111434 All other thermoplastic resins, n.e.c. RC4.2 - Split off
28111510 Phenol-formaldehyde resins 28111511 Phenolic resins RC4.1 - Breakdown
28111510 Phenol-formaldehyde resins 28111512 Urea formaldehyde resins RC4.1 - Breakdown
28111510 Phenol-formaldehyde resins 28111513 All other formaldehyde based resins RC4.1 - Breakdown
28111610 Other thermosetting resins 28111611 Unsaturated polyester (thermosetting) resins RC4.2 - Split off
28111610 Other thermosetting resins 28111612 Thermosetting  polyurethane resins RC4.2 - Split off
28111610 Other thermosetting resins 28111613 Other thermosetting resins, n.e.c. RC4.2 - Split off

Description of changes in the classification, including Codes, Titles, Classes, Subclasses and Detailed categories (Based on GSIM)

Hierarchical structure

The structure of the NAPCS Canada 2017 variant for Manufacturing and Logging is hierarchical. It is composed of five levels.

  • level 1: group (three-digit standard codes)
  • level 2: class (five-digit standard codes)
  • level 3: subclass (six-digit standard codes)
  • level 4: detail (seven-digit standard codes)
  • level 5: detail (eight-digit variant codes)

Video - Introduction to Raster Data (Part 1): Processing and Visualizing Single-Band Rasters

Catalogue number: 89200005

Issue number: 2020019

Release date: December 1, 2020

QGIS Demo 19

Introduction to Raster Data (Part 1): Processing and Visualizing Single-Band Rasters - Video transcript

(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Introduction to Raster Data (Part 1): Processing and Visualizing Single-Band Rasters")

So in today's tutorial we'll introduce using raster data in QGIS - specifically focussing on single-band rasters. These rasters depict changes in a single continuous variable, such as precipitation, slope or elevation. Among the most common single-band rasters are Digital Elevation Models, or DEMs for short, which show changes in height above sea level. We'll cover some generic raster functions, such as Merge and Reproject; discuss the parameters for their visualization, and then in a follow-up demo cover some DEM-specific tools and the Raster Calculator. This will provide you with foundational skills for processing, combining and visualizing single-band rasters. Raster datasets epitomize the finer resolution and powerful analyses that are possible with publicly available datasets, with resolutions of 15 to 30 metres being common.

As established, for a selection of files in the Browser Panel we can right-click and Add Selected Layers to load them simultaneously into the Layers Panel. The boundaries between the DEMs are pronounced, resulting in their splotchy appearance. And this is because visualization is tailored to their specific value ranges, which vary significantly due to the mountainous terrain.

To address this we can merge the DEMs into one file, which will create a uniform value range for visualization. In the Processing Toolbox, search for and open the Merge tool under GDAL - Raster miscellaneous. Click Select All in the Multiple Selection box for the Input Layers. Since we want the full value range to be used in visualization, leave the Grab pseudocolour from first layer parameter unchecked. The separate band parameter applies to composite rasters, such as satellite imagery. Since we are using single-band rasters, we'll leave it unchecked. NoData values often relate to the cells at the edge of rasters, and may appear as a black perimeter. Here we'll leave the NoData values as default, as well as the compression parameters. So run with a temporary output file, since the merged file is around 1 gigabyte. And the process takes around 6 minutes to complete. If one tool fails, there are often alternatives. For example, here we could use r.patch, a GRASS tool, to merge the rasters.
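
(The demo uses the QGIS interface. For readers who prefer scripting, a rough equivalent of the merge step using the GDAL Python bindings might look like the sketch below; file names are placeholders.)

```python
# Rough scripted counterpart of the Merge step, using the GDAL Python bindings.
# File paths are hypothetical placeholders.
from osgeo import gdal

tiles = ["dem_tile_1.tif", "dem_tile_2.tif", "dem_tile_3.tif"]

# Build a lightweight virtual mosaic, then materialize it as a single GeoTIFF.
vrt = gdal.BuildVRT("merged_dem.vrt", tiles)
gdal.Translate("merged_dem.tif", vrt, creationOptions=["COMPRESS=LZW"])
vrt = None   # close the dataset so everything is flushed to disk
```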

Once complete, the merged file appears like this. The visualization is a marked improvement, with the full value range being used in its rendering.

Now we'll use the Warp (reproject) tool to transform the projection and coordinate reference system. In general, it is best practice to avoid projecting rasters due to potential adverse effects on cell alignments or values but in this case we want to use a projected system for spatial analysis. There are a few additional parameters to specify within the tool. This includes the Source CRS, selecting NAD 83 (CSRS), a geographic coordinate system, from the drop-down. Then we can specify the Target coordinate reference system to transform to, opening the system selector and entering 26911 for NAD83 UTM Zone 11 N, which corresponds with the current location. Change the Resampling method to Bilinear, as Nearest Neighbour is better suited to thematic rasters. Once again we'll leave the NoData values unset, as we'll define them in our output raster. Leave the georeferenced units as-is to use the source layer's resolution. And we'll leave all other parameters with defaults and once again save to a temporary output file. And the tool takes roughly 12 minutes to complete.
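
(A rough scripted counterpart of the Warp step, again using the GDAL Python bindings. EPSG:26911 is the target stated in the demo; EPSG:4617 for NAD83(CSRS) geographic is an assumption on our part, and file names are placeholders.)

```python
# Rough scripted counterpart of the Warp (reproject) step with the GDAL Python bindings.
from osgeo import gdal

gdal.Warp(
    "merged_dem_utm11.tif",
    "merged_dem.tif",
    srcSRS="EPSG:4617",          # assumed code for NAD83(CSRS) geographic
    dstSRS="EPSG:26911",         # NAD83 / UTM zone 11N, as named in the demo
    resampleAlg="bilinear",      # bilinear suits continuous data such as elevation
    # dstNodata=0,               # the demo instead sets 0 as NoData later, in Layer Properties
)
```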

As we can see, there were some effects on cell alignment and values, with the warped DEM containing slightly different value ranges. To address this open the Layer Properties box and within the Histogram tab, click Compute Values to assess the distribution of raster data values. Zooming in, based on the distribution, our minimum value is similar to the merged DEM, with 0 being a NoData Value. Clicking on the Hand icon we can interactively select the minimum value in the Histogram, or enter it manually – matching it to the minimum value in the merged DEM: 552. In the Transparency tab, we'll enter 0 as the NoData value to remove the black perimeter around the warped DEM. The min and max values could also be defined in the Symbology tab. So as we can see, the available Layer Properties Box tabs for rasters partly overlap with those of vector datasets. Clicking OK, removing the NoData value and adjusting the minimum has improved the visualization of the warped DEM.

To export a raster to a new dataset we can apply the same procedures applied to vectors. Right-click the raster, Export and click Save as to open the Save Raster Layer As box. We can select the file Format from the drop-down – with GeoTIFF being the most common, and we can optionally create a virtual raster or VRT file. This links the source datasets and applied processes, reducing processing times, file sizes as well as providing other processing advantages. For this format, a subfolder needs to be specified, which will also be the filename. Otherwise, provide an output filename and directory here entering PmBCDEM for projected merged British Columbia digital elevation model.

We can also specify the output cell-size, the source resolution being 15 m by 15 m, or 225 square metres per pixel. Finer resolutions result in larger file sizes, so there is a trade-off: reducing the total file size requires a coarser resolution. Alternatively, we can specify the number of rows and columns for the output raster, which will adjust the cell-size accordingly. Here we'll use the source resolution. There are also compression options and a parameter to build pyramids, which we'll cover momentarily with a separate tool. To remove any known NoData values or unrealistic values in the exported raster we could expand and check the NoData values box, click the Plus icon and enter a value range such as -9999 to -1. After clicking OK, the permanent file will be created and added to the Layers Panel.

You may have noticed the longer rendering times for the rasters. This is because the source resolution is being used regardless of the canvas scale. To improve it we can run the output through the Build Pyramids tool, which will create multiple coarser resolution versions of the input, which are then applied for rendering based on the canvas scale. We can then specify the Resampling method and whether the pyramids should be created internally within the DEM file or as an external .ovr file for GeoTiffs. As you can see, clicking Run, this significantly improves rendering times within the Canvas, as we zoom in and out and change the canvas location.
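
(Scripted counterpart of the Build Pyramids step with the GDAL Python bindings; the file name is a placeholder.)

```python
# Rough scripted counterpart of the Build Pyramids step: precompute coarser overview
# levels so the raster renders quickly at small map scales. Path is a placeholder.
from osgeo import gdal

ds = gdal.Open("merged_dem_utm11.tif", gdal.GA_Update)
ds.BuildOverviews("AVERAGE", [2, 4, 8, 16, 32])   # averaging suits continuous elevation data
ds = None   # close the dataset so the overviews are written
```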

Now let's discuss raster visualization. Once again we'll repeat defining the NoData Value within our permanent raster – entering 0. The data distribution in the Histogram tab is the same as our temporary reprojected file. And finally in the Symbology tab we'll match the minimum value to the merged raster. Clicking apply, the visualization of the DEM now matches that of the temporary merged and reprojected DEMs.

The Symbology tab, as with vector datasets, is used for visualization. The Render type drop-down is equivalent to the Style drop-down, where we can apply different visualization schemes. For the single band rasters - Singleband gray - is the default, but we can also apply Singleband pseudocolour and, specifically for the DEMs, Hillshade. Paletted / Unique is used for thematic rasters and multi-band is used for composite imagery to assign bands to the visible spectrum for analysis and visualization.

There are various Contrast enhancement options in the drop-down. We'll leave it with the default - Stretch to Min/Max. Expanding the Min/Max Value Settings we can specify how the value ranges are applied in rendering. So switch to cumulative count cut. This enhances the contrast and brightness between cells, using values between the 2nd and 98th percentiles. So change the values to 0.5% and 99.5%. And this reduces the contrast, since we are using a larger range of the data – resulting in fewer values falling outside of the minimum and maximum values. Conversely, using a smaller distribution of the total values, entering 5.0% and 95.0%, once again intensifies the contrast and brightness. We could also define the range in standard deviations. So changing it to 5 standard deviations, the contrast and brightness are noticeably reduced. Conversely, using 0.5 has the opposite effect, with more of the DEM values occurring outside of that range. Switch back to Cumulative Count Cut with default values and click Apply.
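
(The cumulative count cut simply stretches the display range between two percentiles of the cell values rather than the absolute minimum and maximum; a small NumPy illustration on synthetic values:)

```python
# Small illustration of the "cumulative count cut" idea: stretch the display range
# between two percentiles of the cell values instead of the absolute min/max.
import numpy as np

values = np.random.default_rng(0).normal(loc=1500, scale=300, size=10_000)  # synthetic cell values
lo, hi = np.percentile(values, [2, 98])                 # the default 2% - 98% cut
stretched = np.clip((values - lo) / (hi - lo), 0, 1)    # normalized display values
```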

The Statistics Extent determines the raster values used based on the canvas. By default they apply to the Whole Raster, meaning zooming in or out produces no change in rendering. Alternatively we could switch to the Current extent to optimize visualization for a specific location and scale, or select Updated extent for a dynamic visualization. Now as we change the scale and location of the canvas, the values and visualization of the DEM are adjusted accordingly.

Finally, the Color Rendering drop-down of the symbology tab can be used to fine-tune the visualization. Now let's switch to a Pseudocolour style in the Render type drop-down. Apply the Red-Yellow-Green ramp from the expanded All Ramps side-bar. Reopen the colour ramp, and click Invert Colour Ramp. The particular Interpolation method can be specified as linear, discrete or exact, which varies according to the intended use. We'll leave the other settings as default, clicking Apply and OK. So this is another common visualization for DEMs, with red showing mountain peaks and greens visualizing valley bottoms. Ensure you have a permanent file of the DEM saved, which we'll use in Part 2 of this demo, and then save the project file with a distinctive name.

(The words: "For comments or questions about this video, GIS tools or other Statistics Canada products or services, please contact us: statcan.sisagrequestssrsrequetesag.statcan@canada.ca" appear on screen.)

(Canada wordmark appears.)