Version Control with Git for Analytics Professionals

By: Collin Brown, Statistics Canada

Analytics and data science workflows are becoming more complex than ever before—there are more data to analyze; computing resources continue to become cheaper; and there has been a surge in the availability of open source software.

For these reasons and more, there has been a significant uptake in programming by analytics professionals who do not have a classical computer science background. These advances have allowed analytics professionals to expand the scope of their work, perform new tasks and leverage these tools to deliver more value.

However, this rapid adoption of programming by analytics professionals introduces new complexities and exacerbates old ones. In classical computer science workflows, such as software development, many tools and techniques have been rigorously developed over decades to accommodate this complexity.

As more analytics professionals integrate programming and open source software into their work, they may benefit significantly from also adopting some of the best practices from computer science that allow for the management of complex analytics workflows.

When should analytics professionals leverage tools and techniques to manage complexity? Consider the problem of source code version control. In particular, how can multiple analytics professionals work on the same code base without conflicting with one another, and how can they quickly revert to previous versions of the code?

Leveraging Git for version control

Even if you are not familiar with the details of Git, the following scenario will demonstrate the benefits of such a tool.

Imagine there is a small team of analytics professionals making use of Git (a powerful tool typically used in software engineering) and GCCode (a Government of Canada internal instance of GitLab).

The three analytics professionals—Jane, John and Janice—create a monthly report that involves producing descriptive statistics and estimating some model parameters. The code they use to implement this analysis is written in Python, and the datasets they perform their analysis on are posted to a shared file system location that they all have access to. They must produce the report on the same day that the new dataset is received and, afterwards, send it to their senior management for review.

The team uses GCCode to centrally manage their source code and documentation written in GitLab Flavoured Markdown. They use a pared-down version of a successful Git branching model to ensure there are no conflicts when they each push code to the repository. The team uses a peer review approach to pull requests (PRs), meaning that someone other than the person who submitted the PR must review and approve the changes implemented in the PR.

This month is unusual; with little notice, the team is informed by their supervisor that there will be a change in the format that one of the datasets is received in. This format change is significant and requires non-trivial changes to the team’s codebase. In particular, once the changes are made, the code will support data preprocessing in the new format, but will no longer accommodate the old format.

The three employees quickly delegate responsibilities to incorporate the necessary changes to the codebase:

  • Jane will write the new piece of code required to accommodate the new data format
  • John will write automated tests that verify the correctness of Jane’s code
  • Janice will update the documentation to describe the data format changes.

The team has been following good version control practices, so the main branch of their GCCode repository is up to date and correctly implements the analysis required to produce the previous months’ reports.

Jane, John, and Janice begin by pulling from the GCCode repository to make sure each of their local repositories is up to date. Once this step is done, they each checkout a new branch from the main branch. Since the team is small, they choose to omit much of the overhead presented in a successful branching model and just checkout their own branches directly from the main branch.
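This setup step can be sketched with Git commands. This is an illustrative sketch only: the file and branch names are invented, and a scratch repository (created in a temporary folder) stands in for the team's clone of the GCCode remote so that the sequence is self-contained.

```shell
# Create a scratch repository standing in for the team's local clone
# (in practice this would be `git clone` followed by `git pull`).
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email "jane@example.ca"
git config user.name "Jane"
echo "print('monthly report')" > analysis.py
git add analysis.py
git commit -qm "Analysis code for the previous month's report"

# Each member checks out their own branch directly from main:
git checkout -qb new-data-format        # Jane's branch (name is illustrative)
git branch --show-current               # prints: new-data-format
```

John and Janice would do the same with their own branch names; because each branch starts from the same up-to-date main, their work proceeds in parallel without conflicts.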

Description - Figure 1 Example of three employees interacting with a remote Git repository. There is a box at the top of the diagram representing a remote repository. Below this, there are three boxes side-by-side representing local repositories of each of the three employees. For each box, there is a figure showing the employees’ branch off of main, which is represented as a series of circles, where each circle is a commit on the employees’ branch. Arrows pointing to/from the remote and local repositories show that employees push to and pull from the remote repository to keep their changes in sync with the remote. Finally, the remote has a figure showing all three employees’ branches off of main put together in a single diagram, indicating that the work of all three employees is happening in parallel and the work of each employee is not conflicting with the work of the others.

The three go about their work on their local workstations, committing their changes as they go while following good commit practices. By the end of the business day, they push their branches to the remote repository. At this point, the remote repository has three new branches that are each several commits ahead of the main branch. Each of the three assigns another to be their peer reviewer, and the next day the team approves changes and merges each member’s branch to main.

Description - Figure 2 Example of three branches merging back into the main branch via pull request. There is a circle representing the most recent commit of the main branch at the point when each of the three employees’ branches are created off of main. There are now three branches that each employee has worked on in parallel to implement their workflow, without conflicting with the work of the others. Each branch has several consecutive circles representing commits made. At the right side of the figure, the three parallel branches converge into a second circle representing the head of the new main branch after all three employees’ branches have been merged.

On the day that the report must be generated, they run their new code, successfully generate the report from the new data, and send it to their senior management.

Later that day, they receive an urgent request asking them to reproduce the previous three months’ reports for audit purposes. Given that the code has changed to accommodate the new data format, the current code is no longer compatible with the previous datasets.

Git to the rescue!

Fortunately, the team is using Git to manage their codebase. Because of this, they can check out the commit just before their changes were made, temporarily reverting the state of the working folder to what it was before the format change. With the old code restored, they can retroactively produce the three reports using the previous three months’ data. Finally, they can check out the most recent commit of the main branch again, returning to the new codebase that accommodates the format change going forward.
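The recovery sequence can be sketched as follows. Again this is a self-contained illustration: a scratch repository, invented file contents, and invented commit messages stand in for the team's real codebase.

```shell
# Scratch repository standing in for the team's codebase.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email "team@example.ca"
git config user.name "Team"

echo "parse OLD data format" > preprocess.py
git add . && git commit -qm "Support the old data format"
old=$(git rev-parse HEAD)            # remember the commit just before the changes

echo "parse NEW data format" > preprocess.py
git add . && git commit -qm "Support the new data format"

# Temporarily revert the working folder to regenerate the old reports:
git checkout -q "$old"
cat preprocess.py                    # prints: parse OLD data format

# ...run the old code against the previous months' datasets here...

# Return to the latest code on main:
git checkout -q main
cat preprocess.py                    # prints: parse NEW data format
```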

Even though the team described above is performing an analytics workflow, they were able to leverage Git to prevent a situation that otherwise may have been very inconvenient and time-consuming.

Learn more about Git

Would your work benefit from using the practices described above? Are you unfamiliar with Git? Here are a few resources to get you started:

  • The first half of IBM’s How Does Git Work provides a mental model for how Git works, and introduces many of the technical terms of Git and how they relate to that model.
  • This article about a successful git branching model provides a guide on how to perform collaborative workflows using a branching model and a framework that can be adjusted to suit particular needs.
  • The Git book provides a very detailed review of the mechanics of how Git works. It is broken down by section, so you can review whichever portion(s) are most relevant to your current use case.

What’s next?

Applying version control to one’s source code is just one of many computer science-inspired practices that can be applied to analytics and data science workflows.

In addition to versioning source code, many data science and analytics professionals may find themselves benefiting from data versioning (see Data Version Control for an implementation of this concept) or model versioning (e.g. see MLflow model versioning).

Outside of versioning, there are many other computer science practices that analytics professionals can make use of, such as automated testing, adhering to coding standards (e.g. Python’s PEP 8 style guide), and using environment and package management tools (e.g. pip and virtual environments in Python).
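As a minimal sketch of the environment management practice mentioned above (the folder name .venv is only a common convention, not a requirement):

```shell
# Create an isolated Python environment in the project folder and activate it;
# packages installed with pip then go into .venv rather than system-wide.
python3 -m venv .venv
. .venv/bin/activate
python -c "import sys; print(sys.prefix)"   # now points inside .venv
# pip install <packages needed by the analysis>
```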

These resources are a great place to start as you begin to discover how complexity management practices from computer science can be used to improve data science and analytics workflows!


Use of Machine Learning for Crop Yield Prediction

By: Kenneth Chu, Statistics Canada

The Data Science Division (DScD) at Statistics Canada recently completed a research project for the Field Crop Reporting Series (FCRS) Footnote 1 on the use of machine learning techniques (more precisely, supervised regression techniques) for early-season crop yield prediction.

The project objective was to investigate whether machine learning techniques could be used to improve the precision of the existing crop yield prediction method (referred to as the Baseline method).

The project faced two key challenges: (1) how to incorporate any prediction technique (machine learning or otherwise) into the FCRS production environment in a methodologically sound way, and (2) how to evaluate any prediction method meaningfully within the FCRS production context.

For (1), the rolling window forward validation Footnote 2 protocol (originally designed for supervised learning on time series data) was adapted to safeguard against temporal information leakage. For (2), the team opted to perform testing by examining the actual series of prediction errors that would have resulted had the method been deployed in past production cycles.

Motivation

Traditionally, the FCRS publishes annual crop yield estimates at the end of each reference year (shortly after harvest). In addition, full-year crop yield predictions are published several times during the reference year. Farms are contacted in March, June, July, September and November for data collection, resulting in a heavy response burden for farm operators.

In 2019, for the province of Manitoba, a model-based method—essentially, variable selection via LASSO (Least Absolute Shrinkage and Selection Operator), followed by robust linear regression—was introduced to generate the July predictions based on longitudinal satellite observations of local vegetation levels as well as region-level weather measurements. This allowed the removal of the question about crop yield prediction from the Manitoba FCRS July questionnaire, reducing the response burden.

Core regression technique: XGBoost with linear base learner

A number of prediction techniques were examined, including: random forests, support vector machines, elastic-net regularized generalized linear models, and multilayer perceptrons. Accuracy and computation time considerations led us to focus attention on XGBoost Footnote 3 with linear base learner.

Rolling Window Forward Validation to prevent temporal information leakage

The main contribution of the research project is the adaptation of rolling window forward validation (RWFV) Footnote 2 as hyperparameter tuning protocol. RWFV is a special case of forward validation Footnote 2, a family of validation protocols designed to prevent temporal information leakage for supervised learning based on time series data.

Suppose you are training a prediction model for deployment in production cycle 2021. The following schematic illustrates a rolling window forward validation scheme with a training window of five years and a validation window of three years.

Description - Figure 1 Example of a rolling window forward validation scheme. This figure depicts, as an example, a rolling window forward validation scheme with a training window of five years and a validation window of three years. A validation scheme of this type is used to determine the optimal hyperparameter configuration to use when training the actual prediction model to be deployed in production.

The blue box at the bottom represents the production cycle 2021 and the five white boxes to its left correspond to the fact that a training window of five years is being used. This means that the training data for production cycle 2021 will be those from the five years strictly and immediately prior (2016 to 2020). For validation, or hyperparameter tuning for production cycle 2021, the three black boxes above the blue box correspond to our choice that the validation window is three years.

The RWFV protocol is used to choose the optimal configuration from the hyperparameter search space, as follows:

  • Temporarily fix an arbitrary candidate hyperparameter configuration from the search space.
  • Use that configuration to train a model for validation year 2020 using data from the five immediately preceding years: 2015 to 2019.
  • Use the resulting trained model to make predictions for validation year 2020, and compute the parcel-level prediction errors for 2020.
  • Aggregate the parcel-level prediction errors into a single numeric performance metric.
  • Repeat for the two other validation years (2018 and 2019).

Averaging the performance metrics across the validation years 2018, 2019 and 2020, the result is a single numeric performance metric/validation error for the temporarily fixed hyperparameter configuration.

Next, repeat this for every candidate hyperparameter configuration in the hyperparameter search space. The optimal configuration, to actually be deployed in production, is the one that yields the best averaged performance metric. This is rolling window forward validation, or more precisely, our adaptation of it to the crop yield prediction context.
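The tuning loop above can be sketched in Python. This is a toy illustration, not the production pipeline: the yearly yields are simulated, and a shrunken-mean predictor stands in for XGBoost with a linear base learner; only the RWFV structure (a five-year training window and a three-year validation window) mirrors the protocol described.

```python
import random
import statistics

random.seed(0)
# Simulated parcel-level yields by year, standing in for the real data.
data = {year: [random.gauss(52, 5) for _ in range(40)]
        for year in range(2011, 2021)}

def train(train_years, shrink):
    # Stand-in learner: predict a shrunken pooled mean of the training yields.
    pooled = [v for y in train_years for v in data[y]]
    return shrink * statistics.mean(pooled)

def rwfv_error(shrink, validation_years=(2018, 2019, 2020), window=5):
    yearly = []
    for v in validation_years:
        # Train only on the five years strictly preceding the validation
        # year, so no information leaks from the "future" into the model.
        pred = train(range(v - window, v), shrink)
        yearly.append(statistics.mean((pred - obs) ** 2 for obs in data[v]))
    # Average the per-year metrics into a single validation error.
    return statistics.mean(yearly)

# Repeat for every candidate configuration; deploy the best one.
grid = [0.9, 1.0, 1.1]
best = min(grid, key=rwfv_error)
```

The same skeleton applies with a real learner: only `train` and the error metric change, while the windowed loop that enforces the no-leakage constraint stays the same.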

Note that the above protocol respects the operational constraint that, for production cycle 2021, the trained prediction model must have been trained and validated on data from strictly preceding years; in other words, the protocol prevents temporal information leakage.

Production-pertinent testing via prediction error series from virtual production cycles

To evaluate—in a way most pertinent to the production context of the FCRS—the performance of the aforementioned prediction strategy based on XGBoost(Linear) and RWFV, the data scientists computed the series of prediction errors that would have resulted had the strategy actually been deployed for past production cycles. In other words, these prediction errors of virtual past production cycles were regarded as estimates of the generalization error within the statistical production context of the FCRS.

The following schematic illustrates the prediction error series of the virtual production cycles:

Description - Figure 2 Prediction error series of virtual production cycles. Virtual production cycles are run for past reference years, as described in Figure 1. Since the actual crop yield data are already known for past production cycles, the actual prediction errors had the proposed prediction strategy been actually deployed for past production cycles (represented by orange boxes) can be computed. The resulting series of prediction errors for past production cycles is used to assess the accuracy and stability of the proposed crop yield prediction strategy.

Now repeat, for each past virtual production cycle (represented by an orange box), what was just described for the blue box. The difference is the following: for the blue box, namely the current production cycle, it is NOT yet possible to compute the prediction errors at the time of crop yield prediction (in July), since the current growing season has not ended. However, for the past virtual production cycles (the orange boxes), it is possible.
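A compact sketch of the virtual production cycles, under the same toy assumptions (simulated yields and a pooled-mean stand-in for the tuned, trained model):

```python
import random
import statistics

random.seed(1)
# Simulated parcel-level yields by year.
data = {year: [random.gauss(50, 5) for _ in range(30)]
        for year in range(2010, 2021)}

def predict(cycle_year, window=5):
    # In production this would be the tuned, trained model for that cycle;
    # a pooled mean over the five-year training window stands in for it.
    pooled = [v for y in range(cycle_year - window, cycle_year)
              for v in data[y]]
    return statistics.mean(pooled)

# For past (virtual) cycles the realized yields are already known, so the
# error production would have seen can be computed after the fact.
error_series = {
    cycle: statistics.mean((predict(cycle) - obs) ** 2 for obs in data[cycle])
    for cycle in range(2016, 2021)
}
```

The resulting series, one error per past cycle, is what the plots below compare against the Baseline strategy.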

These prediction errors in virtual past production cycles can be illustrated in the following plot:

Description - Figure 3 Graphical comparison of the XGBoost(Linear)/RWFV prediction strategy against the Baseline strategy. The red line is the mock production error series of the Baseline strategy, while the orange is that of the XGBoost(Linear)/RWFV strategy. The latter strategy exhibits consistently smaller prediction errors across consecutive virtual past production cycles.

The red line illustrates the Baseline model prediction errors, while the orange line illustrates the XGBoost/RWFV strategy prediction errors. The gray lines illustrate the prediction errors for each of the candidate hyperparameter configurations in our chosen search grid (which contains 196 configurations).

The XGBoost/RWFV prediction strategy exhibited smaller prediction errors than the Baseline method, consistently over consecutive historical production runs.

Currently, the proposed strategy is in the final pre-production testing phase, to be jointly conducted by subject matter experts and the agricultural program’s methodologists.

The importance of evaluating protocols

The team chose not to use a more familiar validation method such as hold-out or cross validation, nor a generic generalization error estimate such as prediction error on a testing data set kept aside at the beginning.

These decisions were based on our determination that the proposed validation protocol and choice of generalization error estimates (RWFV and the virtual production cycle prediction error series, respectively) would be much more relevant and appropriate given the production context of the FCRS.

Methodologists and machine learning practitioners are encouraged to evaluate carefully whether generic validation protocols or evaluation metrics are indeed appropriate for their use cases at hand, and if not, seek alternatives that are more relevant and meaningful within the given context. For more information about this project, please email statcan.dsnfps-rsdfpf.statcan@statcan.gc.ca.

References


Wholesale Trade Survey (monthly): CVs for total sales by geography - October 2020

Monthly Wholesale Trade Survey - CVs for Total sales by geography
Geography, followed by the monthly CV of total sales (%) from October 2019 (201910) to October 2020 (202010)
Canada 0.6 0.6 0.8 0.7 0.7 0.6 0.8 0.8 0.7 0.7 0.7 0.7 0.5
Newfoundland and Labrador 0.4 0.3 0.2 0.7 0.3 1.2 0.7 0.5 0.1 0.2 0.4 0.3 0.3
Prince Edward Island 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Nova Scotia 2.1 2.2 6.8 2.6 2.0 2.8 3.3 4.0 2.3 1.5 1.8 1.7 2.5
New Brunswick 1.4 3.8 1.7 2.6 1.2 1.3 2.1 3.3 1.9 2.1 4.2 3.4 2.8
Quebec 1.7 1.7 2.2 1.4 2.1 1.6 2.4 2.0 1.9 1.8 2.1 2.0 1.5
Ontario 1.0 0.8 1.2 1.2 0.9 1.0 1.2 1.1 1.1 1.1 0.9 0.9 0.8
Manitoba 1.7 0.9 2.6 1.3 0.8 1.0 2.9 2.8 1.2 1.2 1.8 2.3 1.7
Saskatchewan 0.7 1.0 0.7 0.5 0.6 0.5 1.2 0.7 0.7 1.1 1.6 0.6 0.8
Alberta 1.3 1.4 1.1 1.0 0.9 1.2 2.9 2.9 2.3 2.3 1.8 3.3 1.3
British Columbia 1.1 1.5 1.4 1.3 1.6 1.5 1.3 1.7 1.6 1.3 1.9 1.8 1.4
Yukon Territory 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Northwest Territories 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Nunavut 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Differences between the Annual Survey of Manufacturing Industries and the Monthly Survey of Manufacturing

The Annual Survey of Manufacturing and Logging Industries (ASML) measures both revenues from goods manufactured and total revenues. When comparing with the sales of goods manufactured variable from the Monthly Survey of Manufacturing (MSM), users should use the first concept, revenues from goods manufactured from the ASML. Total revenues from the ASML measures a broader concept, as it includes revenues from activities other than manufacturing; for example, goods purchased for resale, and investment and interest revenues. Total revenues from the ASML therefore cannot be compared to sales of goods manufactured published by the MSM.

The two surveys answer different user needs. The Monthly Survey of Manufacturing is built to provide an indicator on the state of the manufacturing sector and track monthly changes, i.e. provide information on the trend, while the Annual Survey of Manufacturing and Logging Industries is built to paint a detailed picture on the total dollar values of the industries, i.e. to provide information on the levels.

In order to provide information on trend that is not altered by changes in the sample, the sample of the Monthly Survey of Manufacturing is redrawn every five years, while the sample of the Annual Survey of Manufacturing and Logging Industries is renewed every year.

Both surveys are subject to revisions; however, the two surveys will not produce identical results, mainly because of methodological differences. For example, there are differences in sampling strategies (as described above), in respondents reporting on the annual survey for a fiscal year that differs from the January-to-December calendar year, in auxiliary data sources (the MSM uses GST data, while the ASML uses T2 tax data for imputation and calibration), and in imputation methods (for a particular record, the MSM may use historical imputation, while the ASML may use a donor to impute, or vice versa).

For more information on data sources and methodology please visit the following links:

Annual Survey of Manufacturing and Logging Industries (ASML)

Monthly Survey of Manufacturing (MSM)

Inter-city indexes of price differentials of consumer goods and services, 2020

Methodology

Inter-city indexes of price differentials of consumer goods and services show estimates of price differences between 15 Canadian cities in all provinces and territories, as of October 2019. These estimates are based on a selection of products (goods and services) purchased by consumers in each of the 15 cities.

In order to produce optimal inter-city indexes, product comparisons were initially made by pairing cities that are in close geographic proximity. The resulting price level comparisons were then extended to include comparisons between all of the cities, using a chaining procedure. The following initial pairings were used:

  • St. John's, Newfoundland and Labrador with Halifax, Nova Scotia
  • Charlottetown-Summerside, Prince Edward Island with Halifax, Nova Scotia
  • Saint John, New Brunswick with Halifax, Nova Scotia
  • Halifax, Nova Scotia with Ottawa, Ontario
  • Montréal, Quebec with Toronto, Ontario
  • Ottawa, Ontario with Toronto, Ontario
  • Toronto, Ontario with Winnipeg, Manitoba
  • Regina, Saskatchewan with Winnipeg, Manitoba
  • Edmonton, Alberta with Winnipeg, Manitoba
  • Vancouver, British Columbia with Edmonton, Alberta
  • Calgary, Alberta with Edmonton, Alberta
  • Whitehorse, Yukon with Edmonton, Alberta
  • Yellowknife, Northwest Territories with Edmonton, Alberta
  • Iqaluit, Nunavut with Yellowknife, Northwest Territories
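The chaining procedure can be made concrete with a hypothetical numeric example: direct price relatives exist only for the paired cities, and a comparison between two cities that were not paired directly is obtained by multiplying relatives along a chain of pairings. The numbers below are invented for illustration.

```python
# Invented price relatives for two of the city pairings above:
# a value of 1.03 means prices in the first city are 3% above the second.
relative = {
    ("Toronto", "Ottawa"): 1.03,
    ("Ottawa", "Halifax"): 0.98,
}

# Chaining: a Toronto-to-Halifax comparison is the product of the
# relatives along the path Toronto -> Ottawa -> Halifax.
toronto_vs_halifax = (relative[("Toronto", "Ottawa")]
                      * relative[("Ottawa", "Halifax")])   # 1.0094
```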

Reliable inter-city price comparisons require that the selected products be very similar across cities. This ensures that the variation in index levels between cities is due to pure price differences and not to differences in the attributes of the products, such as size and/or quality.

Within each city pair, product price quotes were matched on the basis of detailed descriptions. Whenever possible, products were matched by brand, quantity and with some regard for the comparability of retail outlets from which they were selected.

Additionally, the target prices for this study are final prices and as such, include all sales taxes and levies applied to consumer products within a city. This can be an important source of variation when explaining differences in inter-city price levels.

It should be noted that price data for the inter-city indexes are drawn from the sample of monthly price data collected for the Consumer Price Index (CPI). Given that the CPI sample is optimized to produce accurate price comparisons through time, and not across regions, the number of matched price quotes between cities can be small. It should also be noted that, especially in periods when prices are highly volatile, the timing of the product price comparison can significantly affect city-to-city price relationships.

The weights used to aggregate the food indexes within a city are based on the combined consumption expenditures of households living in the 15 cities tracked; as such, one set of weights is used for all 15 cities for the food indexes. Because Iqaluit has only the food major component index and its selected sub-groups published, the weights used to aggregate the non-food product indexes within a city are based on the combined consumption expenditures of households living in the other 14 cities tracked. Currently, 2017 expenditures are used to derive the weights. These expenditures are expressed in October 2019 prices.

The inter-city index for a particular city is compared to the weighted average of all 15 cities, which is equal to 100. For example, an index value of 102 for a particular city means that prices for the measured commodities are 2% higher than the weighted, combined city average.
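A small worked example of this interpretation, using made-up price levels and expenditure weights for three hypothetical cities:

```python
# Made-up average price levels and expenditure weights for three cities.
prices = {"A": 10.20, "B": 9.80, "C": 10.00}
weights = {"A": 0.5, "B": 0.3, "C": 0.2}

# The weighted all-city average price level is scaled to an index of 100.
avg = sum(prices[c] * weights[c] for c in prices)
index = {c: round(100 * prices[c] / avg, 1) for c in prices}
# City A's index of about 101.6 means its measured prices are roughly
# 1.6% above the weighted combined-city average.
```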

These estimates should not be interpreted as a measure of differences in the cost of living between cities. The indexes provide price comparisons for a selection of products only, and are not meant to give an exhaustive comparison of all goods and services purchased by consumers. Additionally, the shelter price concept used for these indexes is not conducive to making cost-of-living type comparisons between cities (see below).

Additional Information on Shelter

Shelter prices were absent from the inter-city index program prior to 1999 because of methodological and conceptual issues associated with their measurement. The diverse nature of shelter means that accurate matches between cities are often difficult to make.

To account for some of these difficulties, a rental equivalence approach is used to construct the inter-city price indexes for owned accommodation. Such an approach uses market rents as an approximation to the cost of the shelter services consumed by homeowners. It is important to note that this approach may not be suitable for the needs of all users. For instance, since the rental equivalence approach does not represent an out-of-pocket expenditure, the indexes should not be used for measuring differences in the purchasing power of homeowners across cities.

The relatively small size of the housing market in Whitehorse and Yellowknife makes it difficult to construct reliable price indexes for rented accommodation and owned accommodation. To compensate, housing information is collected using different pricing frequencies and collection methods than in the rest of the country. Consequently, users should exercise caution when using the indexes for rented accommodation and owned accommodation for these two cities.

Monthly Survey of Manufacturing: National Level CVs by Characteristic - October 2020

Text table 1: National Level CVs by Characteristic
Month Sales of goods manufactured Raw materials and components inventories Goods / work in process inventories Finished goods manufactured inventories Unfilled Orders
%
October 2019 0.61 0.93 1.13 1.41 1.11
November 2019 0.59 0.95 1.17 1.37 1.12
December 2019 0.58 0.98 1.16 1.39 1.06
January 2020 0.64 0.99 1.26 1.32 1.10
February 2020 0.63 1.02 1.22 1.36 1.08
March 2020 0.68 0.99 1.17 1.41 1.10
April 2020 0.87 0.99 1.20 1.41 1.10
May 2020 0.80 1.04 1.13 1.37 1.06
June 2020 0.69 1.05 1.19 1.38 1.06
July 2020 0.69 1.02 1.15 1.43 1.10
August 2020 0.64 1.04 1.23 1.49 1.48
September 2020 0.74 1.06 1.20 1.53 1.41
October 2020 0.75 1.02 1.15 1.51 1.27

Notice of changes to the Manufacturing and Logging variant of NAPCS Canada 2017 version 2.0

December 11, 2020 (Previous notice)

The Variant of NAPCS Canada version 2.0 – Manufacturing and Logging was updated on December 11, 2020, to help the Annual Survey of Manufacturing and Logging Industries (ASML) program improve the measurement of the use and production of plastic in the manufacturing industries. The updated variant was renamed Variant of NAPCS Canada version 2.0 – Manufacturing and Logging Rev.1 (for Revision 1). Four (4) variant codes have been expanded to twelve (12) codes, along with title changes to five (5) other codes, as shown below:

Annual Survey of Manufacturing and Logging Industries (ASML)
Old ASML variant Code Old ASML variant English Title Updated ASML variant Code Updated ASML variant English Title GSIM Type of Change
28111110 Polyester resins 28111111 Polyethylene terephthalate (PET) resins RC4.1 - Breakdown
28111110 Polyester resins 28111112 Other thermoplastic polyester resins RC4.1 - Breakdown
28111210 Polyethylene, low-density 28111210 Low-density polyethylene resins VC2 - Name change
28111220 Polyethylene, linear low-density 28111220 Linear low-density polyethylene resins VC2 - Name change
28111230 Polyethylene, high-density 28111230 High-density polyethylene resins VC2 - Name change
28111410 Acrylonitrile-butadiene-styrene 28111410 Acrylonitrile-butadiene-styrene resins VC2 - Name change
28111420 Polyvinyl chloride 28111420 Polyvinyl chloride resins VC2 - Name change
28111430 All other thermoplastic resins 28111431 Polypropylene resins RC4.2 - Split off
28111430 All other thermoplastic resins 28111432 Thermoplastic polyurethane resins RC4.2 - Split off
28111430 All other thermoplastic resins 28111433 Polyamide (nylon) resins RC4.2 - Split off
28111430 All other thermoplastic resins 28111434 All other thermoplastic resins, n.e.c. RC4.2 - Split off
28111510 Phenol-formaldehyde resins 28111511 Phenolic resins RC4.1 - Breakdown
28111510 Phenol-formaldehyde resins 28111512 Urea formaldehyde resins RC4.1 - Breakdown
28111510 Phenol-formaldehyde resins 28111513 All other formaldehyde based resins RC4.1 - Breakdown
28111610 Other thermosetting resins 28111611 Unsaturated polyester (thermosetting) resins RC4.2 - Split off
28111610 Other thermosetting resins 28111612 Thermosetting polyurethane resins RC4.2 - Split off
28111610 Other thermosetting resins 28111613 Other thermosetting resins, n.e.c. RC4.2 - Split off

Description of changes in the classification, including Codes, Titles, Classes, Subclasses and Detailed categories (Based on GSIM)


Labour Force Survey Expert Panel

The social and economic impacts of COVID-19 have fuelled an extraordinary demand for timely, high-quality data on the health of Canada's people, society and economy. In response to this demand, Statistics Canada has enhanced many of its programs, including its Labour Force Survey (LFS), with the creation of the Labour Force Survey (LFS) Expert Panel.

Composed of national and international experts from government, academia and non-governmental agencies, the LFS Expert Panel will provide independent advice and guidance on one of Statistics Canada's most important statistical programs.

To ensure that these enhancements result in a deeper and broader understanding of evolving market conditions, the Panel will:

  • Provide strategic advice to Statistics Canada on strategies to engage with respondents and encourage participation in the Labour Force Survey;
  • Provide expert advice to Statistics Canada on the analysis of Labour Force Survey data; and,
  • Act as a liaison on data quality with the broader community of LFS data users.

Panel members, listed below, were selected to reflect a wide range of expertise and experience, including in the use of LFS data and in the management of similar large-scale statistical programs internationally. With the help of these experts, the LFS will provide even better data and insights to Canadians on evolving labour market conditions in our country.

Internal membership

Chair: Lynn Barr-Telford, Assistant Chief Statistician, Social, Health and Labour Statistics

Secretary: Josée Bégin, Director General, Labour Market, Education and Socioeconomic Well-being Statistics Branch

Agency subject matter: Centre for Labour Market Information, Modern Statistical Methods and Data Science Branch, and Collection and Regional Services Branch.

External membership

John L. Eltinge

Assistant Director, Research and Methodology, United States Census Bureau

John Eltinge is the U.S. Census Bureau Liaison to Statistics Canada's LFS Expert Panel. Mr. Eltinge is the Assistant Director for Research and Methodology at the United States Census Bureau. Before 2016, he served as the Associate Commissioner for Survey Methods Research at the Bureau of Labor Statistics (BLS). Prior to that, he served as a senior mathematical statistician at BLS, and an associate professor with tenure in the Department of Statistics at Texas A&M University. He gave the Roger Herriot Memorial Lecture on Innovation in the Federal Statistical System; and was previously the President of the Washington Statistical Society, the overall chair of the 2003 Joint Statistical Meetings, an associate editor for The American Statistician, and an associate editor for the Applications and Case Studies Section of the Journal of the American Statistical Association. In addition, at the 2018 Joint Statistical Meetings, he presented the annual plenary Deming Memorial Lecture, "Improving the Quality and Value of Statistical Information: 14 Questions on Management". A webcast of this lecture is available through the Plenary Session Webcasts page.

His research interests include the following: data quality; design optimization; integration of multiple data sources; imputation; time series; and small domain estimation.

Mr. Eltinge holds a Ph.D. from the Department of Statistics at Iowa State University; is a fellow of the American Statistical Association; an editor of the Harvard Data Science Review; an associate editor for Journal of Official Statistics and for Survey Methodology Journal; and a member of the Federal Committee on Statistical Methodology.

Howard Ramos

Professor, Sociology Department, Western University

Howard Ramos is a professor at Western University as well as the Chair of the Department of Sociology. He investigates issues of social justice and social change and has published four books and over 50 articles and chapters on social movements, human rights, Indigenous issues, environmental advocacy, urban change, economic and tourism development, technology, ethnicity, race, immigration, and equity, diversity and inclusion in higher education. Dr. Ramos has worked with a wide range of advocacy and community organizations and is committed to knowledge translation and evidence-based policy.

Karyne B. Charbonneau

Director, Prices, Labour and Housing Division, Canadian Economic Analysis Department, Bank of Canada

Karyne B. Charbonneau is the Director of the Canadian Economic Analysis (CEA) Department's Prices, Labour and Housing division. She is primarily responsible for analysis of the near-term evolution of the labour market and inflation.

Ms. Charbonneau joined the Bank of Canada in 2013 as a Senior Economist in the International Economic Analysis Department. Prior to occupying her current role, she was a Policy Advisor in CEA and provided guidance on the impact of trade policy changes on the Canadian economy.

Her research focuses on applied econometrics, international trade and labour economics. She received her PhD in economics from Princeton University.

Thomas Storring

Director of Economics and Statistics, Nova Scotia Department of Finance and Treasury Board

Thomas Storring is the Director of Economics and Statistics for the Nova Scotia Department of Finance and Treasury Board. His work focuses on macroeconomic conditions in the Province: how macroeconomics affects the government's fiscal choices and how government decisions affect the economy. He is the focal point for Statistics Canada within the Province of Nova Scotia, and advises Statistics Canada on the Province's needs and priorities for the national statistical system.

Mr. Storring has worked as an economist for over 20 years in provincial finance departments in Ontario and Nova Scotia, as well as for J. D. Irving, Limited. Over the last 10 years, Thomas has taught at both Saint Mary's University and Dalhousie University, lecturing on money and banking, public finance, statistics, principles of economics, as well as global economics for Saint Mary's MBA program. He completed his undergraduate degree in economics at Acadia University and received his Master's in economics at the University of Oxford.

Mikal Skuterud

Professor, Economics Department, University of Waterloo

Mikal Skuterud is a full-time Professor in the Department of Economics at the University of Waterloo and is affiliated with the Canadian Labour Economics Forum (CLEF) and the Institute of Labor Economics (IZA). He received his Master's degree in Economics from the University of British Columbia and his PhD in Economics from McMaster University.

His research interests include: the labour market integration of immigrants, labour market policies that influence hours of work, and the economics of trade unions. His work has appeared in the American Economic Review, the Journal of Labor Economics, and the Canadian Journal of Economics and has received national media coverage in the New York Times and the Globe and Mail.

Bjorn Jarvis

Program Manager, Labour Surveys Branch, Australian Bureau of Statistics

Bjorn Jarvis is the head of the Labour Surveys Branch at the Australian Bureau of Statistics, which comprises the Labour Force Survey and related household surveys, and employer (establishment/business) surveys. In this role, he has overseen innovative transformation of Labour Force Survey methods, analysis and communication. This role included managing the impacts of COVID-19 on Labour Force statistics. Over his 16 years in official statistics, Bjorn has held a broad range of survey and administrative statistics roles in the labour and population statistics programs. He is a highly regarded survey statistician and communicator, with deep connections to the labour statistics user community in Australia.

Angella MacEwen

Senior Economist, Canadian Union of Public Employees, Broadbent Institute

Angella MacEwen is a Senior Economist at the Canadian Union of Public Employees, a policy fellow with the Broadbent Institute and a member of the National Stakeholder Advisory Panel (NSAP) at the Labour Market Information Council (LMIC). Her primary focus is understanding precarity and inequality in the Canadian labour market and evaluating policy solutions proposed for these issues. She also studies the impacts of Canadian economic and social policy on workers, especially climate policy and international trade and investment treaties. Ms. MacEwen writes a quarterly publication, Economy at Work, which aims to communicate current economic issues to a broad audience. She holds an MA in Economics from Dalhousie University.


Variant of NAPCS Canada 2017 Version 2.0 - Manufacturing and Logging - Background information

Status

This variant of the North American Product Classification System (NAPCS) Canada 2017 V2.0 was approved as a departmental standard on October 16, 2017. It replaces the NAPCS 2017 Version 1.0 Manufacturing and Logging variant.

The Annual Survey of Manufacturing and Logging Industries (ASML) is a survey of the manufacturing and logging industries in Canada. It is intended to cover all establishments primarily engaged in manufacturing and logging activities as well as some sales offices and warehouses which support these establishments.

The details collected include principal industrial statistics (such as revenue, salaries and wages, cost of materials and supplies used, cost of energy and water utilities, inventories, etc.), as well as information about the commodities produced and consumed. Data collected by the ASML help measure the production of Canada's industrial and primary resource sectors, as well as provide an indication of the well-being of each industry covered by the survey and its contribution to the Canadian and provincial economy.

Within Statistics Canada, the data are used by the Canadian System of National Accounts, the Monthly Survey of Manufacturing and the Prices programs. The data are also used by the business community, trade associations, federal and provincial departments, as well as international organizations and associations to profile the manufacturing and logging industries, undertake market studies, forecast demand and develop trade and tariff policies.

The manufacturing variant was created to capture additional details on products that NAPCS Canada 2017 Version 1.0 would otherwise not have collected. By adding an extra (eighth) digit to the classification, additional detail can be collected. In NAPCS Canada 2017 Version 2.0, some of those eight-digit variant codes were brought up to the standard seven-digit level. Those products include tobacco (see NAPCS Canada code 212112 - Cigars, chewing and smoking tobacco), chemicals (see codes 26321 - Petrochemicals, 27113 - Basic organic chemicals, n.e.c., 27211 - Ammonia and chemical fertilizers and 28111 - Plastic resins), cement (see code 46511 - Cement) and asphalt (see code 26211 - Asphalt (except natural) and asphalt products). The eight-digit variant codes remaining in Version 2.0 are for wood products, such as codes under NAPCS Canada 1451221 - Fuel products of waste wood, 157112 - Waste and scrap of wood, 24122 - Reconstituted wood products, 24124 - Other sawmill products, and treated wood products, and 462134 - Other wood millwork products.

Changes to the Variant of NAPCS Canada 2017 Version 2.0 - Manufacturing and Logging

The Variant of NAPCS Canada Version 2.0 – Manufacturing and Logging was updated on December 11, 2020, to help the Annual Survey of Manufacturing and Logging Industries (ASML) program improve the measurement of the use and production of plastic in the manufacturing industries. The updated variant was renamed Variant of NAPCS Canada Version 2.0 – Manufacturing and Logging Rev.1 (for Revision 1). Four variant codes have been expanded to twelve codes, along with title changes to five other codes, as shown below:

| Old ASML variant Code | Old ASML variant English Title | Updated ASML variant Code | Updated ASML variant English Title | GSIM Type of Change |
| --- | --- | --- | --- | --- |
| 28111110 | Polyester resins | 28111111 | Polyethylene terephthalate (PET) resins | RC4.1 - Breakdown |
| 28111110 | Polyester resins | 28111112 | Other thermoplastic polyester resins | RC4.1 - Breakdown |
| 28111210 | Polyethylene, low-density | 28111210 | Low-density polyethylene resins | VC2 - Name change |
| 28111220 | Polyethylene, linear low-density | 28111220 | Linear low-density polyethylene resins | VC2 - Name change |
| 28111230 | Polyethylene, high-density | 28111230 | High-density polyethylene resins | VC2 - Name change |
| 28111410 | Acrylonitrile-butadiene-styrene | 28111410 | Acrylonitrile-butadiene-styrene resins | VC2 - Name change |
| 28111420 | Polyvinyl chloride | 28111420 | Polyvinyl chloride resins | VC2 - Name change |
| 28111430 | All other thermoplastic resins | 28111431 | Polypropylene resins | RC4.2 - Split off |
| 28111430 | All other thermoplastic resins | 28111432 | Thermoplastic polyurethane resins | RC4.2 - Split off |
| 28111430 | All other thermoplastic resins | 28111433 | Polyamide (nylon) resins | RC4.2 - Split off |
| 28111430 | All other thermoplastic resins | 28111434 | All other thermoplastic resins, n.e.c. | RC4.2 - Split off |
| 28111510 | Phenol-formaldehyde resins | 28111511 | Phenolic resins | RC4.1 - Breakdown |
| 28111510 | Phenol-formaldehyde resins | 28111512 | Urea formaldehyde resins | RC4.1 - Breakdown |
| 28111510 | Phenol-formaldehyde resins | 28111513 | All other formaldehyde based resins | RC4.1 - Breakdown |
| 28111610 | Other thermosetting resins | 28111611 | Unsaturated polyester (thermosetting) resins | RC4.2 - Split off |
| 28111610 | Other thermosetting resins | 28111612 | Thermosetting polyurethane resins | RC4.2 - Split off |
| 28111610 | Other thermosetting resins | 28111613 | Other thermosetting resins, n.e.c. | RC4.2 - Split off |
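The concordance above maps each old variant code to one or more updated codes (RC4 changes create several new codes from one old code, while VC2 name changes keep the code). As an illustrative sketch only, assuming one wanted to apply this recoding programmatically, it can be represented as a simple lookup table; the dictionary below is hand-built from a few rows of the published concordance, and the function name is hypothetical:

```python
# Old 8-digit ASML variant code -> list of updated variant codes,
# transcribed from selected rows of the concordance table above.
RECODES = {
    "28111110": ["28111111", "28111112"],                          # RC4.1 - Breakdown
    "28111210": ["28111210"],                                      # VC2 - Name change (code unchanged)
    "28111430": ["28111431", "28111432", "28111433", "28111434"],  # RC4.2 - Split off
    "28111510": ["28111511", "28111512", "28111513"],              # RC4.1 - Breakdown
    "28111610": ["28111611", "28111612", "28111613"],              # RC4.2 - Split off
}

def successors(old_code: str) -> list[str]:
    """Return the updated variant codes for an old ASML variant code.

    Codes not affected by the revision map to themselves.
    """
    return RECODES.get(old_code, [old_code])
```

Note that a one-to-many mapping like this cannot be inverted row by row: statistics collected under an old split-off code cannot be allocated to its successors without additional information.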

Description of changes in the classification, including Codes, Titles, Classes, Subclasses and Detailed categories (Based on GSIM)

Hierarchical structure

The structure of the NAPCS Canada 2017 variant for Manufacturing and Logging is hierarchical. It is composed of five levels.

  • level 1: group (three-digit standard codes)
  • level 2: class (five-digit standard codes)
  • level 3: subclass (six-digit standard codes)
  • level 4: detail (seven-digit standard codes)
  • level 5: detail (eight-digit variant codes)
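Because each level of the hierarchy corresponds to a fixed number of digits, the level of any code can be derived from its length alone. As a minimal sketch of that rule (the function name is illustrative, not part of the classification):

```python
# Level names keyed by code length, per the five-level structure above.
LEVELS = {
    3: "group",
    5: "class",
    6: "subclass",
    7: "detail (standard)",
    8: "detail (variant)",
}

def napcs_level(code: str) -> str:
    """Return the hierarchical level of a NAPCS Manufacturing and
    Logging variant code, based on its number of digits."""
    if not code.isdigit() or len(code) not in LEVELS:
        raise ValueError(f"unexpected NAPCS code: {code!r}")
    return LEVELS[len(code)]
```

For example, the standard code 28111 is a class, while the eight-digit 28111610 is a variant-level detail code.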