Announcements

October 2019

Business data scientist challenge

Business Data Scientist Challenge 2019/2020

Nominal gross domestic product (GDP) by industry

Statistics Canada continuously works to provide high-quality, relevant and timely data on economic and social developments in Canada. The Agency's data table "Multifactor productivity, value-added, capital input and labour input in the aggregate business sector and major sub-sectors, by industry" (36-10-0208-01) provides important indicators of production efficiency and business performance in Canadian industries. Among the table's variables, nominal GDP is important for the estimation of multifactor productivity growth. It is also used as the base from which income shares and other inputs are calculated. While the data for most variables in this table are available up to the most recent reference year, the nominal GDP data by industry are available only with a 3-year lag.

The challenge

This year's challenge is to use publicly available data sources, and/or apply data analytics or analysis techniques to generate timely estimates of nominal GDP at the same level of industry as in the data table 36-10-0208-01. Of particular interest are: measuring nominal GDP using publicly available data sources, methods for producing more timely estimates, and methods for benchmarking.

Note that publicly available data include Statistics Canada data tables such as multifactor productivity and related variables by industry, real GDP by industry, GDP implicit price indexes, the industrial product price index, and other GDP- or price-related tables.
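One way the tables listed above could be combined is sketched below. It relies on the definition of an implicit price index (nominal GDP divided by real GDP, times 100) and carries the last published nominal level forward using growth in real GDP times the price index. The function name and all numbers are hypothetical illustrations, not Statistics Canada data or an endorsed method.

```python
def extrapolate_nominal_gdp(benchmark_nominal, real_gdp, price_index):
    """Carry a published nominal GDP level forward for one industry.

    benchmark_nominal: last published nominal GDP (same period as element 0).
    real_gdp, price_index: equal-length series starting at the benchmark period.
    """
    if len(real_gdp) != len(price_index):
        raise ValueError("series must have the same length")
    # Real GDP times the price index is proportional to nominal GDP,
    # so its growth since the benchmark period scales the benchmark level.
    base = real_gdp[0] * price_index[0]
    return [benchmark_nominal * (q * p) / base
            for q, p in zip(real_gdp, price_index)]


# Hypothetical figures: benchmark period followed by two later periods.
estimates = extrapolate_nominal_gdp(
    200.0,
    real_gdp=[100.0, 102.0, 105.0],
    price_index=[110.0, 112.2, 115.0],
)
```

Anchoring the series to the published level is a simple form of benchmarking: the first estimate reproduces the published value, and later estimates inherit any error in the real GDP or price series.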

Submissions to the challenge can include models for prediction, applications of machine learning, data visualizations, interactive tools or dashboards, web scraping or data gathering tools, text mining or any other analytical technique or process available. The use of publicly available data sources must be done in accordance with applicable laws.

Eligibility

The challenge is open to graduate and senior undergraduate students from a Canadian university, or Canadian/Permanent Resident graduate and senior undergraduate students at a foreign university.

Applicants may be enrolled in one of the following fields: Economics, Data Science, Computer Science, Mathematics and Statistics.

Submissions may be made by teams of 1 to 3 eligible students.

Registration (deadline March 1, 2020)

To register for the challenge, each team must submit:

  • a letter of support from a faculty member that is no longer than 500 words and includes the names of all team members, and
  • a signed consent form for each team member.

Please submit your registration by email to: statcan.cder-cdre.statcan@canada.ca, with the subject line "Attn: Data Challenge Registration (YOUR TEAM NAME)".

Teams can register at any time until the registration deadline on March 1, 2020.

Following registration, an email containing the dataset as well as background material will be sent to applicants.

Submissions and deadline

Registered teams must send their submission by email to: statcan.cder-cdre.statcan@canada.ca, with the subject line "Attn: Data Challenge Submission (YOUR TEAM NAME)".

Deadline for submissions is March 31, 2020.

Submissions must consist of:

  • an output (e.g. a model, a visualization or a dashboard);
  • code that is thoroughly documented and runs without manual intervention once data are loaded; and
  • a report of at most 1,500 words that may contain up to 4 tables and up to 4 charts/visualizations.

There are no restrictions on the type of software that may be used.

Please note that only finalists will be contacted.

Evaluation, winner selection and presentation of results

Submissions will be evaluated by a panel of Statistics Canada staff and will be judged according to their novelty, usability, accuracy or ability to derive value added from the contest data.

Winners will be announced by May 31, 2020.

A representative from the winning team will be invited to present their results to Statistics Canada and have their submission publicized on the Agency's website. All copyright and intellectual property of submissions reverts to Statistics Canada. Please note that Statistics Canada reserves the right to decline to announce a winner if too few submissions are received.

Questions and contact information

Questions about the challenge can be sent by email to: statcan.cder-cdre.statcan@canada.ca (subject line "Attn: Data competition questions").

An email list of all participants will be compiled, and responses to general questions will be shared with all participants.


May 2019

Business data scientist challenge

Winners of the Business Data Challenge 2018/2019

Statistics Canada is pleased to announce the winners of the Business Data Scientist Challenge for 2018/2019.

The winners are Nicolas Leblanc, Mindy Lin and Jasper Zhu from the University of Waterloo!

This year, the challenge received numerous high-quality submissions from across Canada. The challenge asked participants to evaluate quarterly business dynamics measures (entry, exit, firm openings, firm closings, number of firms). Teams were provided with the contest dataset, which could be used on its own or merged with other data sources.

For their entry, Nicolas, Mindy and Jasper chose to focus on business dynamics for Canada, and combined the contest dataset with information from the Labour Force Survey and the Quarterly Survey of Financial Statements. They added a linear trend as well as categorical variables for each of the four quarters. They wrote an R program that uses this information to predict the number of entries, exits, openings, closings and active firms.
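The trend and quarter variables described above can be sketched as follows. The winning entry was written in R; this Python version is only an illustration, and the function name and the choice to drop the first-quarter dummy (so the columns are not collinear with the intercept) are assumptions rather than details from the submission.

```python
def seasonal_design(n_obs, first_quarter=1):
    """Build rows of [intercept, trend, Q2, Q3, Q4] for consecutive quarters."""
    rows = []
    for t in range(n_obs):
        quarter = (first_quarter - 1 + t) % 4 + 1  # cycles through 1, 2, 3, 4
        # Quarter 1 is the reference category, so it gets no dummy column.
        dummies = [1 if quarter == q else 0 for q in (2, 3, 4)]
        rows.append([1, t + 1] + dummies)
    return rows


# Six quarters starting in Q3: Q3, Q4, Q1, Q2, Q3, Q4.
design = seasonal_design(6, first_quarter=3)
```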

The predictions are based on two models, one for variable selection and one for time series prediction. To select the relevant variables for prediction, Nicolas, Mindy and Jasper used a negative binomial generalized linear model with LASSO from the glmnet package. Using the cv.glmnet function, they determined which variables are most relevant for predicting each business dynamics measure.
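To show how a LASSO penalty selects variables, here is a minimal sketch that swaps in a simpler setting: a squared-error (Gaussian) LASSO fitted by coordinate descent with a hand-picked penalty. It is not the negative binomial model or the cross-validated penalty choice used in the winning entry, and the function names and toy data are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimise (1/2n) * sum of squared residuals + penalty * sum(|b_j|).

    X is a list of rows. Columns and y are assumed centred (no intercept),
    and no column may be all zeros.
    """
    n, n_features = len(X), len(X[0])
    coefs = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):
            rho = 0.0    # covariance of feature j with the partial residual
            scale = 0.0  # sum of squares of feature j
            for row, target in zip(X, y):
                fitted_without_j = sum(
                    b * v for k, (b, v) in enumerate(zip(coefs, row)) if k != j
                )
                rho += row[j] * (target - fitted_without_j)
                scale += row[j] ** 2
            coefs[j] = soft_threshold(rho / n, penalty) / (scale / n)
    return coefs


# Toy data: y depends strongly on x1 and only weakly on x2.
x1 = [1, -1] * 4
x2 = [1, 1, -1, -1] * 2
X = [[a, b] for a, b in zip(x1, x2)]
y = [3 * a + 0.1 * b for a, b in zip(x1, x2)]
coefs = lasso_coordinate_descent(X, y, penalty=0.5)
```

With this penalty the weak regressor's coefficient is set exactly to zero while the strong one is shrunk but kept, which is the behaviour that makes LASSO usable for variable selection.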

They then applied a generalized linear model for count data from the tscount package. The tsglm function was used to predict the number of entries, exits, openings, closings and firm counts. The time series model includes two additional parameters: the number of lags of the dependent variable and the number of lags of the regressors. These parameters were determined by a grid search, and the predictions were placed into an R Shiny app that visualizes the original time series as well as the predictions.
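The two lag parameters and the grid search can be pictured with the sketch below. It only builds the lagged data for each candidate pair and delegates model fitting to a caller-supplied scoring function; the function names, the toy series and the `score` argument are hypothetical and do not reproduce the tsglm interface.

```python
from itertools import product


def lagged_rows(y, x, p, q):
    """Pair each y[t] with p lags of y and q lags of regressor x."""
    start = max(p, q)  # first period for which all lags are available
    rows = []
    for t in range(start, len(y)):
        features = [y[t - k] for k in range(1, p + 1)]
        features += [x[t - k] for k in range(1, q + 1)]
        rows.append((features, y[t]))
    return rows


def grid_search(y, x, p_values, q_values, score):
    """Return the (p, q) pair whose lagged data set gets the lowest score.

    score is supplied by the caller, e.g. a function that fits a model
    on the rows and returns an out-of-sample prediction error.
    """
    return min(
        product(p_values, q_values),
        key=lambda pq: score(lagged_rows(y, x, pq[0], pq[1])),
    )


# Toy series: two lags of y and one lag of x.
rows = lagged_rows([10, 12, 11, 13, 14], [1, 2, 3, 4, 5], p=2, q=1)
```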

Thank you to everyone who entered and congratulations once again to Nicolas, Mindy and Jasper.


Upcoming business data scientist challenge

The details of the Statistics Canada Business Data Scientist Challenge for 2019-2020 will be announced in September 2019, so keep an eye out and tell all the students you know.

The Business Data Scientist Challenge is based on a business dataset available from Statistics Canada. The dataset and the goal of the challenge change over time. Using the dataset, teams of up to three persons are asked to apply analytical techniques so that an audience can gain insight or understanding of an economic phenomenon.

Depending on the type of challenge, teams can use forecasts or predictive models, applications of machine learning, data visualizations, interactive tools or dashboards, web scraping or data gathering tools, text mining or any other analytical technique or process available. Publicly available data sources may be combined with the challenge dataset, but this must be done in accordance with applicable laws.

Submissions and eligibility

The competition is open to graduate students and senior undergraduate students from a Canadian university or Canadian/Permanent Resident graduate students and senior undergraduate students at a foreign university. Applicants may be enrolled in one of the following fields: Economics, Data Science, Computer Science, Mathematics and Statistics. Submissions may be made by teams of up to three persons.

Teams may use Python, R, SAS or Stata. Other software may be used depending on approval from Statistics Canada.

Contact information

Please address questions to Attn: Data competition questions at statcan.cder-cdre.statcan@canada.ca.


November 2018

Business data scientist challenge

Statistics Canada continuously works to improve the relevance and timeliness of its data. One area of progress is the publication of data series on business performance such as entry, exit and employment re-allocation. These data inform about a dynamic process within the Canadian economy whereby new firms enter the market, successful firms grow, and unsuccessful firms shrink or exit the market. While informative, these data are only available with a multi-year lag.

To produce more timely estimates for entry and exit, a set of experimental quarterly estimates was created. These estimates cover entry up to the most recent quarter, but exits continue to be estimated with a lag of up to 7 quarters.

Challenge

The challenge is to apply data analytics or analysis techniques to increase our understanding of the state of entry and exit in the Canadian economy. Of particular interest are methods for producing more timely estimates for exits, methods for identifying and accounting for aberrant observations such as leverage points or outliers, and methods for incorporating unclassified firms into analysis.
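As one illustration of flagging aberrant observations, the sketch below uses the modified z-score based on the median and the median absolute deviation (MAD), with the commonly used cutoff of 3.5. This is only one possible approach chosen for illustration; it flags unusual values of a single series and does not detect leverage points, which depend on the regressors. The function name and the sample counts are hypothetical.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds threshold in absolute value."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values equal the median; the score is undefined.
        return [False] * len(values)
    # 0.6745 rescales the MAD to match the standard deviation under normality.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Hypothetical quarterly exit counts with one aberrant quarter.
flags = flag_outliers([100, 102, 98, 101, 99, 250, 103])
```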

Submissions can include forecasts or predictive models, applications of machine learning, data visualizations, interactive tools or dashboards, web scraping or data gathering tools, text mining or any other analytical technique or process available. Publicly available data sources may be combined with the entry and exit data, but this must be done in accordance with applicable laws.

Submissions and eligibility

The competition is open to graduate students and senior undergraduate students from a Canadian university or Canadian/Permanent Resident graduate students and senior undergraduate students at a foreign university. Applicants may be enrolled in one of the following fields: Economics, Data Science, Computer Science, Mathematics and Statistics. Submissions may be made by teams of up to 3 people.

To register for the competition, applicants must submit a letter of support from a faculty member (not to be longer than half a page) and the names of team members.

Contest entries must consist of an output (e.g. a model, a visualization or a dashboard) as well as:

  • code that is thoroughly documented and runs without manual intervention once data are loaded; and
  • a report, if necessary, of at most 1,500 words that may contain up to 4 tables and up to 4 charts/visualizations.

There are no restrictions on the type of software that may be used.

Contest entry and deadline

To register for the contest, please submit an email indicating the team, its members and a short (half page) letter of support from a faculty member to Attn: Data competition registration at statcan.cder-cdre.statcan@canada.ca. The registration can be made at any time up to March 1st, 2019. Following registration, an email containing the dataset as well as background material will be sent to participants.

The deadline for submissions is March 31, 2019. Please submit the entry, the project code and, if needed, a short report to Attn: Data competition submission at statcan.cder-cdre.statcan@canada.ca. Only finalists will be contacted.

Presentation of results

Applications will be evaluated by a panel of Statistics Canada staff. Applications will be judged according to their novelty, usability, accuracy or ability to derive value added from the contest data.

A representative of the winning team will be invited to present their results to Statistics Canada and have their submission publicized by Statistics Canada. All copyright and intellectual property of submissions reverts to Statistics Canada. Statistics Canada reserves the right to decline to announce a winner if too few entries are received. Winners will be announced by May 31st, 2019.

Contact information

Questions can be addressed to Attn: Data competition questions at statcan.cder-cdre.statcan@canada.ca.

An email list of all participants will be compiled, and responses to general questions will be shared with all participants.


September 2015

Remote access to WES

Researchers may apply for remote access to the Workplace Employee Survey (WES) through a service offered by CDER. Access is granted under certain conditions and remote access to the data is offered on a cost recovery basis. When applying for remote access, justification is needed as to why access to the data through an RDC is not being sought.

Once a project is approved, researchers are provided with synthetic data with which they develop and test their computer programs (in SAS or Stata). Researchers then transmit their programs to a CDER analyst via a dedicated e-mail address. The analyst runs the programs on secure data servers, vets the outputs to ensure they meet disclosure and confidentiality requirements, and returns the vetted outputs to the researcher via e-mail.

Note: The researcher is fully and entirely responsible for developing and testing his/her programs before submitting them to be run. Statistics Canada does not provide programming assistance, offer support for the use of software, or make modifications to the programs that are submitted. If a program does not run properly, or if the researcher submits too many complex programs, the researcher must modify the programs and re-submit.

For more information, contact CDER at statcan.cder-cdre.statcan@canada.ca.


May 2015

The Workplace Survey is now available at CDER!
