Business data scientist challenge
Statistics Canada continuously works to improve the relevance and timeliness of its data. One area of progress is the publication of data series on business performance such as entry, exit and employment re-allocation. These data describe a dynamic process within the Canadian economy whereby new firms enter the market, successful firms grow, and unsuccessful firms shrink or exit the market. While informative, these data are only available with a multi-year lag.
To produce more timely estimates for entry and exit, a set of experimental quarterly estimates was created. These estimates cover entry up to the most recent quarter, but exits continue to be estimated with a lag of up to 7 quarters.
The challenge is to apply data analytics or analysis techniques to increase our understanding of the state of entry and exit in the Canadian economy. Of particular interest are methods for producing more timely estimates for exits, methods for identifying and accounting for aberrant observations such as leverage points or outliers, and methods for incorporating unclassified firms into analysis.
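As a purely illustrative sketch of one way to flag aberrant observations such as those mentioned above, the following Python snippet applies a modified z-score based on the median absolute deviation (MAD). It is not a method prescribed by the contest: the function name `flag_outliers`, the threshold of 3.5 and the sample figures are all assumptions made for the example, and the figures are invented rather than drawn from the contest data.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the positions of observations with a large modified z-score.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. The threshold of 3.5 is a common rule of
    thumb, assumed here for illustration.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: the score is undefined, so flag nothing.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical quarterly exit counts, invented for illustration only.
exit_counts = [410, 395, 402, 420, 398, 905, 407, 415]
print(flag_outliers(exit_counts))  # position 5 (the value 905) is flagged
```

A median-based rule is used in this sketch because, unlike the mean and standard deviation, the median and MAD are not pulled toward the extreme values they are meant to detect.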
Submissions can include forecasts or predictive models, applications of machine learning, data visualizations, interactive tools or dashboards, web scraping or data gathering tools, text mining or any other analytical technique or process available. Publicly available data sources may be combined with the entry and exit data, but this must be done in accordance with applicable laws.
Submissions and eligibility
The competition is open to graduate students and senior undergraduate students at a Canadian university, and to Canadian citizens or permanent residents who are graduate students or senior undergraduate students at a foreign university. Applicants may be enrolled in one of the following fields: Economics, Data Science, Computer Science, Mathematics and Statistics. Submissions may be made by teams of up to 3 people.
To register for the competition, applicants must submit a letter of support from a faculty member (not to be longer than half a page) and the names of team members.
Contest entries must consist of an output (e.g., a model, a visualization or a dashboard) as well as:
- Code that is thoroughly documented and runs without manual intervention once data are loaded; and
- A report, if necessary, of at most 1,500 words, which may contain up to 4 tables and up to 4 charts/visualizations.
There are no restrictions on the type of software that may be used.
Contest entry and deadline
To register for the contest, please send an email naming the team and its members, together with a short (half-page) letter of support from a faculty member, to Attn: Data competition registration at email@example.com. Registration is open until March 1st, 2019. Following registration, an email containing the dataset and background material will be sent to participants.
The deadline for submission is March 31st, 2019. Please submit the entry, the project code and, if needed, a short report to Attn: Data competition submission at firstname.lastname@example.org. Only finalists will be contacted.
Presentation of results
Applications will be evaluated by a panel of Statistics Canada staff. Applications will be judged according to their novelty, usability, accuracy or ability to derive value added from the contest data.
A representative of the winning team will be invited to present their results to Statistics Canada and have their submission publicized by Statistics Canada. All copyright and intellectual property of submissions reverts to Statistics Canada. Statistics Canada reserves the right to decline to announce a winner if too few entries are received. Winners will be announced by May 31st, 2019.
Questions can be addressed to Attn: Data competition questions at email@example.com.
An email list of all participants will be compiled, and responses to general questions will be shared with all participants.
Remote access to WES
Researchers may apply for remote access to the Workplace and Employee Survey (WES) through a service offered by CDER. Access is granted under certain conditions and is offered on a cost-recovery basis. Applicants must justify why they are not seeking access to the data through an RDC.
Once a project is approved, researchers are provided with synthetic data with which they develop and test their computer programs (in SAS or Stata). Researchers then transmit their programs to a CDER analyst via a dedicated e-mail address. The analyst runs the programs on secure data servers, vets the outputs to ensure they meet disclosure and confidentiality requirements, and returns the vetted outputs to the researcher via e-mail.
Note: The researcher is fully and entirely responsible for developing and testing their programs before submitting them to be run. Statistics Canada does not provide programming assistance, offer support for the use of software, or make modifications to the programs that are submitted. If a program does not run properly, or if the researcher submits too many complex programs, the researcher must modify the programs and re-submit them.
For more information, contact CDER at firstname.lastname@example.org.
The Workplace Survey is now available at CDER!