Co-op student explores the power of Big Data

by Kathrin Knorr, Simon Fraser University

Editor's note: The following is an edited version of an article originally featured in Simon Fraser University (SFU)'s The Co-op Close-up series. The article was modified and translated by the Data Science Network for the Federal Public Service, and reproduced here with permission from SFU.

The article features Mihir Gajjar, a co-op student working in the Data Science Division at Statistics Canada. He completed a Bachelor of Technology in Information and Communication Technology at Ahmedabad University, India. He recently completed the Professional Master's program in Computer Science at SFU. The article also features his previous supervisor at Statistics Canada, Meredith Thomas.

SFU: Can you tell us about Statistics Canada? What is it like working there?

Mihir Gajjar: I have been working in the amazing Data Science Division at Statistics Canada. In this division, data scientists work with subject-matter analysts, methodologists, and IT specialists to develop big data-processing, machine-learning, and AI (artificial intelligence) strategies.

For me, there are several highlights of the work culture at Statistics Canada, such as the daily scrum meetings with the supervisor and team members, where we prioritize the day's work and discuss other important issues. I also like the agile development approach used for most projects: each project has a lifespan of four months, after which it is ready for deployment. We also have weekly machine-learning technical seminars where we learn about advancements in the field and discuss relevant research papers.

SFU: Can you tell us a bit about the project(s) you are working on in your co-op position?

Mihir Gajjar, student at Simon Fraser University (Master's in Computer Science program) and co-op student with the Data Science Division.

Photo: D. Taiwo.

Mihir Gajjar: At Statistics Canada, analysts spend a lot of time searching for information about enterprises. With the amount of news growing exponentially, it becomes difficult to manually track all the published information. The project I am working on seeks to automate the tasks of detecting events of interest from news articles and extracting their attributes.

For example, events of interest that are related to enterprises might include mergers and acquisitions, equity markets, and branch openings, whereas event attributes are things like dates and locations of said events. Ultimately, my work allows economic analysts to spend less time on information searches and devote more time to analysis. This multidisciplinary work is a collaboration between teams, including portfolio and accounts managers, methodologists, and other data scientists.

The main technical tasks include finding similarities between articles for ranking, removing duplicates, and text summarization. The goal is to provide subject-matter experts with a dashboard to support the detection and tracking of desired events over a specified time span.
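As an illustration only, the similarity, ranking, and duplicate-removal tasks described above can be sketched with a simple bag-of-words cosine similarity. This is a minimal sketch and not the method used at Statistics Canada, which the interview does not describe; the function names, the 0.9 threshold, and the sample headlines are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 to 1.0)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_near_duplicates(articles, threshold=0.9):
    """Keep an article only if it is below the threshold against every kept one."""
    kept, kept_vectors = [], []
    for article in articles:
        vector = Counter(tokenize(article))
        if all(cosine_similarity(vector, seen) < threshold for seen in kept_vectors):
            kept.append(article)
            kept_vectors.append(vector)
    return kept


def rank_by_similarity(query, articles):
    """Return articles ordered from most to least similar to the query text."""
    query_vector = Counter(tokenize(query))
    return sorted(
        articles,
        key=lambda art: cosine_similarity(query_vector, Counter(tokenize(art))),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical sample headlines, invented for this sketch.
    sample = [
        "Acme Corp acquires Widget Inc in a merger deal.",
        "Acme Corp acquires Widget Inc in a merger deal!",
        "New bakery branch opens downtown.",
    ]
    unique = drop_near_duplicates(sample)
    for article in rank_by_similarity("merger deal", unique):
        print(article)
```

A production system working on millions of articles would more likely rely on weighted term vectors or learned embeddings with an approximate nearest-neighbour index, since the pairwise comparison here grows quadratically with the number of articles.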

The data for our models consist of 1.5 million news articles from the Dow Jones Data News and Analytics Platform and NewsDesk, a shared governmental system. After exploratory data analysis and basic text pre-processing, the articles were used to train various machine-learning models.

SFU: How did the Big Data program prepare you for your co-op position?

Mihir Gajjar: SFU's Big Data program provided me with theoretical as well as practical hands-on experience through lectures and a project-based learning environment. Subjects like machine learning helped me to develop a solid theoretical base while the practical assignments and group projects allowed me to implement the concepts and try out new tools and technologies.

Along with sound technical knowledge, the program equipped me with essential skills, such as working in a team, communicating and sharing ideas with other people, giving presentations, critical thinking, technical writing, and time management.

SFU: What are your most valuable takeaways from this co-op experience?

Mihir Gajjar: Through the project I have been working on, I learned a lot about the practical aspects of working as a data scientist. Part of the project involved extracting data using an external company's Application Programming Interface (API), which meant weekly meetings with its development team. This helped me learn how to think analytically and design questions that aid in understanding the quality and depth of the data. I also learned the importance of fully understanding the user's needs in order to develop a product that meets them.

Working at Statistics Canada gave me exposure to real-world data science projects and taught me how to create and execute a technical plan to achieve the desired goals. This is my first time working as a data scientist, and the experience has improved my skills and made me feel confident about working in this role as I move forward in my career.

SFU: What do employers say about our students?

Meredith Thomas, Chief, Data Science Division: Mihir is a great fit for this work environment, as he is always open to learning new approaches in technology, and works well independently or in a team setting. Partnered with a senior data scientist, Mihir continues to grow in his time here at Statistics Canada, moving from Natural Language Processing projects to image processing projects with enthusiasm and focus. He is a valued member of our team.
