Winners of the Business Data Scientist Challenge 2018/2019
Statistics Canada is pleased to announce the winners of the Business Data Scientist Challenge for 2018/2019.
The winners are Nicolas Leblanc, Mindy Lin and Jasper Zhu from the University of Waterloo!
This year, the challenge received numerous high-quality submissions from across Canada. The challenge asked participants to evaluate quarterly business dynamics measures (entry, exit, firm openings, firm closings, number of firms). Teams were provided with the contest dataset, which could be used on its own or merged with other data sources.
For their entry, Nicolas, Mindy and Jasper chose to focus on business dynamics for Canada, and combined the contest dataset with information from the Labour Force Survey and the Quarterly Survey of Financial Statements. They added a linear trend as well as categorical variables for each of the four quarters. They wrote an R program that uses this information to predict the number of entries, exits, openings, closings and active firms.
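The trend and quarter indicators described above can be sketched as follows. This is a minimal Python illustration, not the winners' R program. The function name `trend_and_quarter_features` is hypothetical, and treating the first quarter as the baseline (so that only three indicator columns are created) is an assumption made here to avoid redundancy with an intercept.

```python
def trend_and_quarter_features(quarters):
    """Build a linear trend and quarter indicators for a quarterly series.

    `quarters` lists the quarter number (1-4) of each observation in time
    order. Quarter 1 is the baseline (an assumption of this sketch), so
    only Q2-Q4 get indicator columns.
    """
    rows = []
    for t, q in enumerate(quarters, start=1):
        if q not in (1, 2, 3, 4):
            raise ValueError(f"quarter must be 1-4, got {q}")
        rows.append({
            "trend": t,          # 1, 2, 3, ... one step per quarter
            "q2": int(q == 2),   # 1 if the observation falls in Q2
            "q3": int(q == 3),
            "q4": int(q == 4),
        })
    return rows


# Example: a series that starts in the third quarter of a year.
features = trend_and_quarter_features([3, 4, 1, 2])
```

Each row can then be joined to the other predictors for the same quarter before modelling.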
The predictions are based on two models: one for variable selection and one for time series prediction. To select the relevant variables, Nicolas, Mindy and Jasper fitted a negative binomial generalized linear model with a LASSO penalty using the glmnet package. Using the cv.glmnet function, they determined which variables are most relevant for predicting each business dynamics measure.
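To give a feel for why a LASSO penalty performs variable selection, the sketch below uses a deliberately simplified case: with squared-error loss and orthonormal predictors, the LASSO solution is the least-squares coefficient passed through a soft-threshold, so small coefficients become exactly zero. This is not the negative binomial model or the glmnet routine used by the winners. The coefficient names and values, the penalty `lam` and the helper `select_variables` are all made up for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def select_variables(coefs, lam):
    """Return the names whose coefficient survives soft-thresholding."""
    return [name for name, b in coefs.items() if soft_threshold(b, lam) != 0.0]


# Hypothetical unpenalized coefficients for three candidate predictors.
unpenalized = {"employment": 0.8, "operating_revenue": -0.05, "trend": 0.3}
kept = select_variables(unpenalized, lam=0.1)
```

In practice the penalty is not fixed by hand; the text above notes that cross-validation (cv.glmnet) was used in the winning entry.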
They then applied a generalized linear model for count data from the tscount package. The tsglm function was used to predict the number of entries, exits, openings, closings and firm counts. The time series model includes two additional parameters: the number of lags of the dependent variable and the number of lags of the regressors. These parameters were chosen by grid search, and the predictions were placed into an R Shiny app that visualizes the original time series alongside the predictions.
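A grid search over the two lag settings can be sketched as below. This is a generic Python illustration under stated assumptions, not the winners' code: the source does not say which criterion was minimized, so `score` is a placeholder for whatever error measure is used (for example, an out-of-sample forecast error from a fitted model), and `toy_score` is a made-up stand-in so the example runs on its own.

```python
from itertools import product


def grid_search(y_lag_options, x_lag_options, score):
    """Try every (y_lags, x_lags) pair and keep the one with the lowest score."""
    best_params, best_score = None, None
    for y_lags, x_lags in product(y_lag_options, x_lag_options):
        s = score(y_lags, x_lags)
        if best_score is None or s < best_score:
            best_params, best_score = (y_lags, x_lags), s
    return best_params, best_score


def toy_score(y_lags, x_lags):
    # Stand-in for a real forecast error; it is smallest at (2, 1) by construction.
    return (y_lags - 2) ** 2 + (x_lags - 1) ** 2


best_params, best_score = grid_search(range(1, 5), range(0, 4), toy_score)
```

With a real model, `score` would fit the model for the given lag settings on a training window and return its error on held-out quarters.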
Thank you to everyone who entered and congratulations once again to Nicolas, Mindy and Jasper.
Upcoming business data scientist challenge
The details of the Statistics Canada Business Data Scientist Challenge for 2019-2020 will be announced in September 2019, so keep an eye out and tell all the students you know.
The Business Data Scientist Challenge is based on a business dataset available from Statistics Canada. The dataset and the goal of the challenge change over time. Using the dataset, teams of up to three persons are asked to apply analytical techniques so that an audience can gain insight or understanding of an economic phenomenon.
Depending on the type of challenge, teams can use forecasts or predictive models, applications of machine learning, data visualizations, interactive tools or dashboards, web scraping or data gathering tools, text mining or any other analytical technique or process available. Publicly available data sources may be combined with the challenge dataset, but this must be done in accordance with applicable laws.
Submissions and eligibility
The competition is open to graduate students and senior undergraduate students from a Canadian university or Canadian/Permanent Resident graduate students and senior undergraduate students at a foreign university. Applicants may be enrolled in one of the following fields: Economics, Data Science, Computer Science, Mathematics and Statistics. Submissions may be made by teams of up to three persons.
Teams may use Python, R, SAS or Stata. Other software may be used depending on approval from Statistics Canada.
Please address questions to Attn: Data competition questions at email@example.com.
Business data scientist challenge
Statistics Canada continuously works to improve the relevance and timeliness of its data. One area of progress is the publication of data series on business performance such as entry, exit and employment re-allocation. These data describe a dynamic process within the Canadian economy whereby new firms enter the market, successful firms grow, and unsuccessful firms shrink or exit the market. While informative, these data are only available with a multi-year lag.
To produce more timely estimates for entry and exit, a set of experimental quarterly estimates was created. These estimates cover entry up to the most recent quarter, but exits continue to be estimated with a lag of up to 7 quarters.
The challenge is to apply data analytics or analysis techniques to increase our understanding of the state of entry and exit in the Canadian economy. Of particular interest are methods for producing more timely estimates for exits, methods for identifying and accounting for aberrant observations such as leverage points or outliers, and methods for incorporating unclassified firms into analysis.
Submissions can include forecasts or predictive models, applications of machine learning, data visualizations, interactive tools or dashboards, web scraping or data gathering tools, text mining or any other analytical technique or process available. Publicly available data sources may be combined with the entry and exit data, but this must be done in accordance with applicable laws.
Submissions and eligibility
The competition is open to graduate students and senior undergraduate students from a Canadian university or Canadian/Permanent Resident graduate students and senior undergraduate students at a foreign university. Applicants may be enrolled in one of the following fields: Economics, Data Science, Computer Science, Mathematics and Statistics. Submissions may be made by teams of up to 3 people.
To register for the competition, applicants must submit a letter of support from a faculty member (not to be longer than half a page) and the names of team members.
Contest entries must consist of an output (e.g., a model, a visualization or a dashboard) as well as:
- Code which is thoroughly documented and runs without manual intervention once data are loaded;
- A report, if necessary, that has a maximum of 1,500 words and may contain up to 4 tables and up to 4 charts/visualizations.
There are no restrictions on the type of software that may be used.
Contest entry and deadline
To register for the contest, please submit an email indicating the team, its members and a short (half page) letter of support from a faculty member to Attn: Data competition registration at firstname.lastname@example.org. The registration can be made at any time up to March 1st, 2019. Following registration, an email containing the dataset as well as background material will be sent to participants.
The deadline for submission is March 31st, 2019. Please submit the entry, the project code and, if needed, a short report to Attn: Data competition submission at email@example.com by that date. Only finalists will be contacted.
Presentation of results
Applications will be evaluated by a panel of Statistics Canada staff. Applications will be judged according to their novelty, usability, accuracy or ability to derive value added from the contest data.
A representative of the winning team will be invited to present their results to Statistics Canada and have their submission publicized by Statistics Canada. All copyright and intellectual property of submissions reverts to Statistics Canada. Statistics Canada reserves the right to decline to announce a winner if too few entries are received. Winners will be announced by May 31st, 2019.
Questions can be addressed to Attn: Data competition questions at firstname.lastname@example.org.
An email list of all participants will be compiled, and responses to general questions will be shared with all participants.
Remote access to WES
Researchers may apply for remote access to the Workplace and Employee Survey (WES) through a service offered by CDER. Access is granted under certain conditions, and remote access to the data is offered on a cost recovery basis. Applicants must justify why they are not seeking access to the data through an RDC.
Once a project is approved, researchers are provided with synthetic data with which they develop and test their computer programs (in SAS or Stata). Researchers then transmit their programs to a CDER analyst via a dedicated e-mail address. The analyst runs the programs on secure data servers, vets the outputs to ensure they meet disclosure and confidentiality requirements, and returns the vetted outputs to the researcher via e-mail.
Note: The researcher is fully and entirely responsible for developing and testing his/her programs before submitting them to be run. Statistics Canada does not provide programming assistance, offer support for the use of software, or make modifications to the programs that are submitted. If a program does not run properly, or if the researcher submits too many complex programs, the researcher must modify the programs and re-submit.
For more information, contact CDER at email@example.com.
The Workplace Survey is now available at CDER!