Evaluation of the Census of Agriculture and Innovation in the Agriculture Statistics Program

Evaluation Report

March 2020

The report in short

The Agriculture Statistics Program (ASP) comprises an integrated set of components, including crop and livestock surveys, farm economic statistics, agri-environmental statistics, tax and other administrative data, research and analysis, remote sensing and the Census of Agriculture (CEAG). The statistical information produced by the CEAG is unique in its ability to provide a comprehensive snapshot of the industry and its people, as well as small area data, both of which are instrumental not only to the agricultural industry, but also for meeting the data requirements of environmental programs, health programs, trade and crisis management. ASP statistical information is used by a wide range of organizations, including different levels of government, not-for-profit and private organizations, academic institutions, and individual Canadians.

This evaluation was conducted by Statistics Canada in accordance with the Treasury Board Secretariat's Policy on Results (2016) and Statistics Canada's Risk-Based Audit and Evaluation Plan (2019/2020 to 2023/2024). The main objective of the evaluation was to provide a neutral, evidence-based assessment of the 2016 CEAG dissemination strategy, and of the design and delivery of the CEAG migration to the Integrated Business Statistics Program (IBSP). The evaluation also assessed projects in the broader ASP, with a focus on projects supporting Statistics Canada's modernization initiative.

The evaluation methodology consisted of a document review, administrative reviews and key informant interviews with Statistics Canada professionals working in the Agriculture Division and other relevant divisions. Additionally, interviews were conducted with key users and partners external to Statistics Canada. The triangulation of these data collection methods was used to arrive at the overall evaluation findings.

Key findings and recommendations

Census of Agriculture dissemination strategy

CEAG data are used by a wide range of organizations to understand and monitor trends, formulate advice on policies and programs, and address requests from stakeholders. The majority of interviewees were satisfied with the dissemination of the 2016 CEAG and noted it was an improvement over 2011. Data tables were identified as the most used product while other products were relevant but less useful. In terms of timeliness, interviewees were satisfied with the release of the first set of tables (farm operator data - one year after Census Day); however, the timeframe for releasing the remaining two sets of data tables affected their usefulness (2.5 years after Census Day for the last data table release with socioeconomic data). They also noted that there were gaps in cross-analysis with non-agricultural sectors and in emerging sectors. Finally, web tools were not being used because of a lack of guidance on how to use them and how to interpret the data.

The Assistant Chief Statistician (ACS), Economic Statistics (Field 5), should ensure that:

Recommendation 1

For the 2021 CEAG, the Agriculture Division explore ways to improve the timeliness of the last two sets of data tables (historical and socioeconomic data) and increase cross-analysis with non-agricultural sectors.

Recommendation 2

Web tools include guidance on how to use them and how to interpret data from them. A proactive approach to launching new tools should be taken. Webinars were identified as an effective channel, and the use of other channels would allow for even wider coverage.

Census of Agriculture migration to the Integrated Business Statistics Program

The CEAG migration to the IBSP was proceeding as planned at the time of the evaluation. The transition phase was complete and the integration phase was well underway. Governance structures were in place and deliverables and schedules were being managed effectively. Efforts to resolve issues, such as those related to incompatibilities between the Collection Management Portal (CMP) and the IBSP, and the availability of tools and capacity to support data quality assessments, were continuing. The start of the production phase will bring additional risks as new resources become involved and time pressures increase.

The ACS, Field 5, should ensure that:

Recommendation 3

Unresolved issues for the migration to the IBSP, including incompatibilities between the IBSP and the CMP as well as the IBSP processing capacity, are addressed prior to the production phase.

Recommendation 4

Significant risks during the production phase, particularly with regard to data quality assessments and the exercising of roles and responsibilities, are monitored and mitigated.

Projects supporting the modernization initiative

All five projects reviewed were aligned with the modernization pillars and expected results. Most of the projects focused on increasing the use of data from alternative sources and integrating data. The evaluation found that while governance structures existed and regular monitoring was taking place, project management practices could be strengthened. For example, clearly defined measurable outcomes were often missing; best practices were not being systematically documented, shared or leveraged; and risk management was ad hoc in some cases. Project management is perceived to be time- and resource-consuming in an environment focused on expediency.

The ACS, Field 5, should ensure that:

Recommendation 5

Planning processes for future projects falling outside the scope of the Departmental Project Management FrameworkFootnote 1 include an initial assessment that takes into account elements such as risk, materiality, public visibility and interdependencies. The assessment should then be used to determine the appropriate level of oversight and project management.

Recommendation 6

Processes and tools for documenting and sharing best practices are implemented, and lessons learned from other organizations (internal and external) are leveraged.

What is covered

The evaluation was conducted in accordance with the Treasury Board Secretariat's Policy on Results (2016) and Statistics Canada's Integrated Risk-Based Audit and Evaluation Plan (2019/2020 to 2023/2024). In support of decision making, accountability, and improvement, the main objective of the evaluation was to provide a neutral, evidence-based assessment of the 2016 Census of Agriculture (CEAG) dissemination strategy, and of the design and delivery of the CEAG migration to the Integrated Business Statistics Program (IBSP)Footnote 2. The evaluation also assessed projects in the broader Agriculture Statistics Program (ASP), with a focus on projects supporting Statistics Canada's modernization initiative.

The Agriculture Statistics Program

The mandate of the ASP is to provide economic and social statistics pertaining to the characteristics and performance of the Canadian agriculture sector and its people. It aligns with section 22 of the Statistics Act, which stipulates that Statistics Canada shall "collect, compile, analyse, abstract and publish statistics in relation to all or any of the following matters in Canada: (a) population, (b) agriculture." It also aligns with section 20Footnote 3 of the Statistics Act, which requires Statistics Canada to conduct a CEAG. A CEAG has been conducted nationally and concurrently with the Census of Population since 1951Footnote 4.

According to the ASP Performance Information Profile, the ASP provides data to support and evaluate the fulfillment of requirements or objectives contained in other legislation such as the Farm Products Agencies Act, the Agricultural Products Marketing Act, and the Pest Control Products Act. The ASP also supplies the Canadian System of Macroeconomic Accounts with data required under the Federal-Provincial Fiscal Arrangements Regulations and the International Monetary Fund's Special Data Dissemination Standard.

The ASP includes an integrated set of components: crop and livestock surveys, farm economic statistics, agri-environmental statistics, tax and other administrative data, research and analysis, remote sensing, and the CEAG.

The Census of Agriculture

The CEAG collects data on the state of all agricultural operations in CanadaFootnote 5, including farms, ranches, dairies, greenhouses, and orchards. The information is used to develop a statistical portrait of Canada's farms and agricultural operators. Typically, data are collected on the size of agricultural operations, land tenure, land use, crop area harvested, irrigation, livestock numbers, labour, and other agricultural inputs. Its "whole farm" approach to capturing data directly from agricultural producers provides a comprehensive count of the major commodities of the industry and its people, and a range of information on emerging crops, farm finances, and uses of technologies in agricultural operations.

The objectives of the CEAG are:

  • to maintain an accurate and complete list of all farms and types of farms for the purpose of ensuring optimal survey sampling - at the lowest cost and response burden - through categorization of farms by type and sizeFootnote 6
  • to provide comprehensive agriculture information for detailed geographic areas such as counties - information for which there is no other source and that is critical to formulating and monitoring programs and policies related to the environment, health, and crisis management for all levels of government
  • to provide measurement of rare or emerging commodities, which is essential for disease control and trade issues
  • to provide critical input for managing federal and provincial government expenditures in the agriculture sector.

The Agriculture Division of the Agriculture, Energy and Environment Statistics Branch is responsible for the ASP. The division has many long-standing strategic partnerships with key stakeholders and data users, including federal departments and agencies, provincial and territorial agriculture ministries, local and regional governments, farmers' associations, the agriculture industry, universities, and researchers. The division has established forums to obtain feedback on emerging issues and needs. These include the Advisory Committee on Agriculture and Agri-Food Statistics and the Federal-Provincial-Territorial Committee on Agriculture Statistics. Internal governance bodies such as the CEAG Steering Committee are also in place to help direct and monitor implementation.

The Evaluation

The scope of the evaluation was established based on meetings and interviews with divisions involved in the ASP. The following areas were identified for review:

Evaluation issues, Evaluation questions
2016 CEAG dissemination strategy: To what extent did the 2016 CEAG dissemination strategy address the needs of key users in the following areas?
  • Timeframe of releases (i.e., for all releases, between each release)
  • Coverage and level of detail
  • Types and formats of products
  • Cross-analysis with non-agricultural sectors
  • Access to data
Design and delivery (CEAG migration to the IBSP): To what extent are governance structures for collection and processing (migration to the IBSP) designed to contribute to an effective and efficient delivery of the 2021 CEAG?
ASP projectsFootnote 7 supporting the modernization initiative: To what extent are there effective governance, planning and project management practices in place to support modernization projects within the ASP?

Guided by a utilization-focused evaluation approach, the following quantitative and qualitative collection methods were used:

 
Administrative reviews

Review of ASP administrative data on activities, outputs and results.

 
Document review

Review of internal agency strategic documents.

Key informant interviews (external) n=28

Semi-structured interviews with key users from federal departments, provincial and local governments, farm associations, private sector organizations and research institutions.

Key informant interviews (internal) n=14

Semi-structured interviews with individuals working in the Agriculture Division and partner divisions.

Four main limitations were identified, and mitigation strategies were employed:

Limitations, Mitigation strategies
Because of the large number of users and partners using the data, the perspectives gathered through external interviews may not be fully representative. External interviewees were selected using specific criteria to maximize the strategic reach of the interviews: organizations of different types, from a wide range of locations across Canada, that use CEAG data extensively. Evaluators were able to find consistent overall patterns.
Key informant interviews carry a risk of self-reporting bias, which occurs when individuals reporting on their own activities portray themselves in a more positive light. By seeking information from a broad range of stakeholders involved in the ASP, including the CEAG migration to the IBSP (e.g., the main groups involved, multiple levels within groups), evaluators were able to find consistent overall patterns.
Limited documentation was available on the projects sampled for the evaluation. Key staff working on ASP projects were interviewed and a strategy to gather additional documents during the interview sessions was put in place. Additional interviews were conducted, as needed, to fill the gaps.
The scope of the evaluation related to innovation reflected only a select number of topics (i.e., alignment, project management) rather than the full spectrum of factors which may have an impact. The evaluation methodology was conducted in such a way that other topics related to innovation could be identified and considered.

What we learned

1.1 2016 Census of Agriculture dissemination strategy

Evaluation question

To what extent did the 2016 CEAG dissemination strategy address the needs of key users in the following areas?

  • Timeframe of releases (i.e., for all releases, between each release)
  • Coverage and level of detail
  • Types and formats of products
  • Cross-analysis with non-agricultural sectors
  • Access to data

Summary

To inform the 2021 CEAG dissemination strategy, the evaluation assessed the extent to which the 2016 dissemination strategy addressed the needs of key users in different areas. The majority of users considered the 2016 CEAG an improvement compared with the 2011 CEAG and were satisfied with the overall approach taken. However, the evaluation found some areas for improvement, particularly with regard to the timeframe of releases, coverage, and guidance on web tools.

Census of Agriculture data are used for multiple purposes with data tables being the product of choice

CEAG data are used by organizations to portray the agriculture sector in their jurisdiction or sector of the economy. For provincial government departments, this portrait allows them to understand trends within their province and to compare them with other jurisdictions. Subprovincial data are also available for analysis of smaller geographic areas. For farm associations, the data allow them to monitor trends within their area of interest. Overall, CEAG statistical information is used for identifying and monitoring trends, providing advice on policies and programs, addressing requests or questions from various stakeholders, and informing internal or public communications.

A large majority of external interviewees mentioned that, in general, the 2016 CEAG products and statistical information shed light on the issues that were important for their organization. The evaluation found that the data tables from Statistics Canada's website were the products of greatest utility to users. In particular, the Farm and Farm Operator Data tables were identified as the products most used. This was especially true for organizations that had the internal capacity to conduct their own analysis. The analytical products and The Daily releases were identified as being less useful, but still relevant since they provided a different and objective perspective on specific topics. This was true for other products as well (e.g., maps, infographics): interviewees responded that they used them only occasionally or rarely but still believed they were useful. Finally, a number of provincial users also mentioned that they received a file containing CEAG statistical information, which facilitated their own analyses.

Table 1: Use of Census of Agriculture products
Products Extensively Occasionally Rarely Don't know
Data tables from the website 17 6 1 0
The Daily releases 9 5 10 0
Boundary files 7 6 11 0
Analytical products 5 12 7 0
Thematic maps 4 8 12 0
Infographics 3 9 12 0
Dynamic web application 3 6 14 1

Besides CEAG statistical information, a majority of users mentioned that they consulted additional sources of information, either from Statistics Canada or other national and international organizations, to fill gaps. This included information on commodity prices, imports and exports of agricultural products, land values, and interest rates. Users consulted international sources to compare data with other countries (e.g., United States and Australia) or to assess global market demand for certain agricultural commodities (e.g., livestock, crops, etc.).

Historical and socioeconomic data tables wanted sooner

Three data table releases took place for the 2016 CEAG: Farm and Farm Operator Data (May 10, 2017 - one year after Census Day); select historical data (December 11, 2017 – approximately one and a half years after Census Day); and a socioeconomic portrait of the farm population (November 27, 2018 – approximately two and a half years after Census Day). It should be noted that the tool used to create the socioeconomic portrait of the farm population was not part of the original scope for the 2016 CEAG but was added later - thus the reason for the relatively late release.

The majority of interviewees believed the time lapse to receive the first set of data tables was satisfactory. While they would have welcomed an earlier release, they recognized the level of effort required to produce the information and felt the time lapse was reasonable given the quality of information they received. However, overall, interviewees believed that the time lapse between Census Day and the final data table releases, specifically for the socioeconomic tables, affected the usefulness of the statistical information. In particular, organizations developing policies or programs targeting young farmers, specific population groups, or educational advancements would have benefited from timelier data.

Figure 1: Dissemination schedule (refer to Appendix B for additional details)

May 10, 2016

Launch of the 2016 Census of Agriculture and the 2016 Census of Population.

May 10, 2017

The first set of products for the 2016 CEAG was released, including a Daily release, farm and farm operator data (47 data tables), provincial and territorial trends (11 analytical products) and provincial reference maps (34 maps).

May - June 2017

A series of weekly analytical articles were released covering different topics, including an infographic titled 150 Years of Canadian Agriculture.

September - November 2017

A boundary file and analytical products were released.

December 11, 2017

Select historical data were released.

December 2017 – April 2018

A number of maps along with an analytical article were released.

November 27, 2018

The Agriculture Stats Hub, a dynamic web application, was released along with a Daily article, 13 data tables and 3 infographics. The application provided a socioeconomic overview of the farm population by linking agricultural and population data.

December 2018 – March 2019

A number of analytical articles were released.

July 3, 2019

Last release from the 2016 CEAG

Table 2: Satisfaction with releases
How satisfied are you with the following? Satisfied Somewhat satisfied Not satisfied Unsure
Time lapse between Census Day and first release 15 5 3 1
Time lapse between each release 13 5 2 4
Time lapse between Census Day and release of all data 6 11 4 3

Some interest in preliminary estimates, so long as differences are small

Users were asked about the possibility of releasing preliminary estimates for specific high-level variables. The estimates would differ from the final data released; however, no specific examples of variables were provided to interviewees for consideration.

Half of the interviewees were not interested, with a large proportion advising against it. Several explained that the release of preliminary estimates would create confusion within their organizations and they would be required to explain the differences between the preliminary and final data. Most interviewees noted that any policy decision-making and trend analysis would continue to be based solely on final data.

Those who were either "very interested" or "slightly interested" indicated that the difference between the estimates and the final data would need to be small, otherwise they would prefer the status quo.

Some gaps remain

The Agriculture Division has several mechanisms in place to identify information gaps, including regular pre-census cycle consultations, the Federal-Provincial-Territorial Committee on Agriculture Statistics, the Advisory Committee on Agriculture Statistics, and engagement with national farm organizations. Based on these mechanisms, the CEAG builds on the content approved for the previous cycle to better address new and emerging agricultural activities. In addition, projects recently implemented by the Agriculture Division, particularly the Agriculture-Zero project, have filled several gaps (e.g., temporary foreign workers data).

The majority of interviewees were satisfied with the diversity of topics and themes covered. However, a number of information gaps were identified, particularly regarding emerging operations and fast-growing sectors such as organic farming. Additional statistical information and further analysis were also requested on farm succession, labour (e.g., foreign and contract workers), pesticide use, and new land use categories (e.g., loss of land to urbanization). Coverage of additional variables over time (i.e., historical data) was also identified as a need. Finally, all interviewees wanted more granular data, although they recognized there are limitations related to confidentiality.

Table 3: Coverage
How satisfied are you with the following? Satisfied Somewhat satisfied Not satisfied Unsure
Types of agricultural operations covered 15 9 0 0
Number of topics or themes covered in each release 15 5 0 4
Cross-analysis with other topics and other agricultural surveys 10 7 2 5

Interviewees also wanted additional cross-cutting analysis between the agricultural sector and other sectors. For the 2016 CEAG, analysis with non-agricultural data, such as technology, innovation, and socioeconomic issues, was provided to users. This approach was highly regarded by those interviewed, but they wanted more. Evidence suggests that there is a growing appetite for cross-cutting analysis in areas such as technology, farm profitability, demographic shifts, transportation, and the environment.

Increased guidance on tools is needed

Two web tools were released for the 2016 CEAG: boundary files and the Agriculture Stats Hub. The evaluation found relatively low use of these two products when compared with other products such as data tables. Although some interviewees used the boundary files, the majority rarely did. Similarly, few interviewees used the Agriculture Stats Hub. A lack of guidance on how to use the tools and how to interpret the data was noted as a key impediment. The lengthy timeframe for releasing socioeconomic data, which included the Agriculture Stats Hub, was also identified as a factor that limited the use of the Hub.

Although the use of existing web tools for the 2016 CEAG was somewhat limited, a majority of interviewees were interested in having additional web-based tools, such as interactive maps, custom table building, and query tools, which would allow for increased customization of products. As data tables were the product most used, tools attached to the tables would greatly benefit users. However, guidance and support must accompany the tools, and a more active approach to launching the tools is recommended.

More prominent communication of methodological information would be useful

Although methodological information is generally available, some interviewees noted that it would be useful to have it more prominently displayed in the products that are released, either in The Daily or as footnotes in the data tables. For example, since definitions used by Statistics Canada may differ from definitions used by farmer associations (e.g., how farm operator counts are calculated), information to explain the differences would be helpful.

Users were aware of releases, and data were accessible

The evaluation found a high level of satisfaction with the accessibility of statistical information, even though Statistics Canada's website was identified as a challenge. A high level of satisfaction was also reported for any custom data received. Interviewees were highly satisfied with the time lapse between first contact with Statistics Canada and the delivery of the product, the quality of the product, and the level of detail provided.

In terms of awareness of releases, the majority of interviewees stated that they were informed far enough in advance and were satisfied with the channels used. Most interviewees identified reminder emails as the most effective channel for being kept informed about releases.

Table 4: Notification of releases
Best way to be informed of releases Number of Respondents
Reminder emails 20
Calendar invites 7
Webinars 5
Social media posts 3

In addition, those who participated in webinars were very satisfied since the webinars provided additional information on the data available and major trends observed. Webinars were identified as opportunities to raise awareness of the products and data that will be available and to facilitate interpretation of the data and the use of the web tools.

1.2 Design and delivery: Census of Agriculture migration to the Integrated Business Statistics Program

Evaluation question

To what extent are governance structures for collection and processing (migration to the IBSP) designed to contribute to an effective and efficient delivery of the 2021 CEAG?

Summary

The evaluation assessed whether the governance structures associated with the CEAG's migration to the IBSP - including roles and responsibilities, interdependencies, and project management practices - will contribute to an effective and efficient delivery of the 2021 CEAG. The evaluation found some areas of risk that could have a negative impact on the delivery of the 2021 CEAG.

Migrating the Census of Agriculture to the Integrated Business Statistics Program is expected to create benefits

At the time of the evaluation, the Agriculture Division had already successfully migrated all of its surveys to the IBSP. The last component to be migrated is the CEAG; migration work began in fiscal year 2018/2019 and is expected to continue until fiscal year 2022/2023. The IBSP migrations are conducted in three phases: transition (defining program-specific requirements), integration (development and testing activities), and production (collection and processing tasks are implemented through the IBSP).

Because of its five-year cycle, the CEAG is considered an ever-migrating component to the IBSP. Similar to the divisional surveys that have already been migrated, it is expected that migration of the CEAG to the IBSP will create specific benefits for the CEAG:

  • reduced number of systems for collection, processing and storage through the adoption of common tools and statistical methods
  • facilitated integration and harmonization of data with all programs in the IBSP, including agriculture surveys
  • increased corporate support for systems, particularly when significant changes occur (e.g., cloud technology)
  • a more targeted approach for collection (i.e., follow-up operations) through the IBSP's Quality Indicators and Measures of Impact (QIMI) feature.

Roles and responsibilities are a risk during the production phase

For previous CEAG cycles, the Agriculture Division was responsible for designing, planning, implementing, and managing all required tasks, such as content determination, collection, processingFootnote 8, data quality assessmentFootnote 9, and dissemination. The migration to the IBSP for the 2021 cycle will change the governance of collection and processing tasks (and associated roles and responsibilities) because the Enterprise Statistics Division (ESD) is responsible for managing the IBSP.Footnote 10

The shift of processing responsibilities to ESD affects the CEAG team since it will now act only in an advisory capacity for this task, rather than being fully responsible for it. The same structures for the overall management of the CEAG will remain within the Agriculture Division, while the migration to the IBSP brings in governance structures already established within ESD.Footnote 11

The evaluation found that the early part of the transition phase was challenging for the CEAG team as they were not familiar with the implications of migrating the processing activities to a different system run by another division. The CEAG's management team and ESD were key in resolving early challenges in the transition. In particular, both groups showed leadership in explaining potential benefits and impacts of the migration while ensuring that roles and responsibilities were well communicated and understood. Governance structures are also adequate. The leadership demonstrated during the transition phase facilitated the start of the second phase of the project – the integration phase.

The evaluation found that there are concerns regarding roles and responsibilities during the production phase as new individuals, such as subject-matter experts within the Agriculture Division and the IBSP production team within ESD, become involved while others leave the project as the integration phase ends. Based on previous survey migrations to the IBSP, the roles and responsibilities during the production phase are typically less clear than during previous phases. To help with this, a good practice identified during interviews is the involvement of production staff during the integration phase to help build continuity and understanding – this took place when the surveys conducted by the ASP were migrated to the IBSP. With the CEAG, most of the divisional staff participating in the integration phase are also part of the team for the production phase.

ESD's Change Management Committee, which is responsible for the triage of required changes, will be involved during the production phase. Consideration of escalation processes is required when multiple committees (i.e., the CEAG Steering Committee, the IBSP Project Management Team and the Change Management Committee) are involved in the decision-making process, particularly during crunch periods typically observed in the production phase. Although the change management process has been defined and cross-membership within committees and working groups was identified as a mitigating factor, the risk of ineffective and inefficient decision-making because of an increased number of governing bodies remains.

Deliverables and associated schedules are well-managed

The migration of the CEAG to the IBSP is managed by the existing working groups. So far, for the transition and integration phases, effective practices were in place to manage deliverables, associated schedules, and outstanding issues. The transition phase, led by ESD in collaboration with the Agriculture Division, worked as planned. Activities for the integration phase, which were being implemented at the time of the evaluation, were also working as planned. The current IBSP integration schedule is seen as robust and includes the first set of processing activities. The schedule for the production phase is in place and is reviewed and updated regularly. While deliverables, schedules, and outstanding issues are being managed effectively, the differentiation between outstanding issues and risks inherent to the migration, particularly for the production phase, has yet to be clearly articulated.

In addition to the IBSP migration schedules, two other schedules come into play. As collection for the 2021 Census of Population and the 2021 CEAG are conducted in parallel, the Census of Population schedule is a crucial element for the development and implementation of the CEAG's internal schedule. The three schedules have varying levels of flexibility: the Census of Population schedule is inflexible, the CEAG schedule is flexible, and the IBSP schedule, given its focus on collection and processing, is considered moderately flexible. The Agriculture Division is the main conduit for the alignment of all schedules. Requirements from the Census of Population schedule are assessed on a continuous basis, and discussions are held with ESD, as needed, to modify the IBSP schedule and the CEAG schedule. At the time of the evaluation, no major changes to the schedules were required, but it is expected that shifts will occur during the production phase. JIRA, the system used for change management (e.g., outstanding issues, schedules, deliverables) by the CEAG, the IBSP, and the Census of Population, is seen as an effective tool.

Incompatibilities between the Collection Management Portal and the Integrated Business Statistics Program to be resolved

A unique approach for data collection will be used for the 2021 CEAG - different from the one used for the 2016 CEAG and different from other Statistics Canada surveys that have migrated to the IBSP. For the 2016 CEAG, collection was under the responsibility of the Agriculture Division and took place through the Collection Management Portal (CMP) – a shared collection platform with the Census of Population. The CEAG team was responsible for monitoring collection and the management of follow-up operations. Because of synchronicity with the Census of Population, the CMP will continue to be used for CEAG collection in 2021.

Surveys under the IBSP (which are business-focused in nature) are typically collected through a different platform, the Business Collection Portal. New and unique linkages between the CMP and the IBSP need to be designed, tested, and operationalized for the 2021 CEAG collection operations. Links were still under development at the time of the evaluation. Although some functionalities are now operational, there is still development work to be done. For example, paradata from the CMP (e.g., information related to the collection process, such as attempts to contact someone, comments provided to an interviewer, completion rate) were not compatible with the IBSP at the time of the evaluation. Although work is being done to resolve the issue, the incompatibility of CMP paradata would disable the IBSP's QIMI feature, which allows for a targeted process for follow-up operations (i.e., prioritizing follow-up operations to target units that have the most effect on the data). Because QIMI is an effective tool used to support data quality, the 2021 CEAG data quality assessment strategy would need to be adapted should it not be available. There is a risk that some of the linkages between the CMP and the IBSP will not be fully developed or tested in time for the production phase.
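To illustrate the kind of targeted follow-up that QIMI enables, the sketch below ranks outstanding units by their estimated contribution to a key aggregate so that follow-up effort goes to the units with the most effect on the data. It is a simplified, hypothetical illustration only: the unit structure, the scoring rule and all names are assumptions, not the IBSP's actual QIMI implementation.

```python
# Hypothetical illustration (not the IBSP's actual QIMI logic): rank
# non-responding units by their estimated contribution to a key aggregate so
# that follow-up effort targets the units with the most effect on the data.

from dataclasses import dataclass

@dataclass
class Unit:
    unit_id: str
    responded: bool
    expected_value: float  # e.g., an imputed or historical value for a key variable

def prioritize_follow_up(units, total_estimate):
    """Return non-respondents sorted by their share of the aggregate estimate."""
    pending = [u for u in units if not u.responded]
    scored = [(u, u.expected_value / total_estimate) for u in pending]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    units = [
        Unit("A", responded=False, expected_value=12000.0),
        Unit("B", responded=True, expected_value=300.0),
        Unit("C", responded=False, expected_value=450.0),
    ]
    total = sum(u.expected_value for u in units)
    for unit, score in prioritize_follow_up(units, total):
        print(f"{unit.unit_id}: impact score {score:.2%}")
```

In this toy example, unit "A" would be contacted first because its expected contribution to the aggregate is largest; the actual QIMI feature applies this general idea within the IBSP.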

Integrated Business Statistics Program processing capacity and available tools will affect data quality assessment activities

Data quality assessment activities will remain under the responsibility of the Agriculture Division. While the IBSP is designed for data collection and processing, it also includes features supporting data quality assessments, such as QIMI, rolling estimates, inclusion of non-response variances, and values attributed by imputation. However, given the volume of data with the CEAG, some validation processes will not be usable, and alternative tools outside the IBSP will need to be developed and tested. The use of alternative tools will require a reconfiguration of the data quality assessment strategy.

Although the generation of rolling estimates is seen as an important step to ensuring data quality, concerns were raised about the IBSP's processing capacity. In previous cycles, the CEAG team was able to impute, run, and analyze data at a higher rate than is currently possible under the IBSP. As data change between the generation of a rolling estimate and its completion, there is a concern that subject-matter experts will be validating outdated data. Although the IBSP's processing capacity has improved since the start of the CEAG migration and further improvements are expected, concerns remain.

Lessons learned from past migrations to the IBSP suggest that the level of effort required for data quality assessments is either similar to or greater than what is typical. A number of large surveys that have migrated to the IBSP in the past have encountered delays during the production phase because of challenges associated with the data quality assessment task.Footnote 12 At the time of the evaluation, the data quality assessment strategy was being developed.

Migration will benefit from an extended timeframe and experience

Migration activities started in fiscal year 2018/2019 and will continue until 2022/2023. Testing activities will also continue, as needed, during collection. The extended timeframe available for testing (because of the CEAG's five-year cycle) will allow for additional testing activitiesFootnote 13 (i.e., with simulated data, actual data from the 2016 CEAG, and data from content tests conducted in 2019). However, the production phase will be implemented with real 2021 data, with no options available for parallel testing.

Finally, since approximately 30 surveys from the Agriculture Division have already migrated to the IBSP, expertise has been built within the division and ESD. Knowledge gained from previous experience will contribute to the successful migration of the CEAG.

Additional pressures may affect the migration

A risk that could affect the CEAG's migration to the IBSP is the move to cloud technology. Although there are no specific scheduled implementation dates for Statistics Canada programs, any move of IBSP components to cloud technology during the production phase, when most testing will have been completed, would affect the migration. At the time of the evaluation, this topic was still under discussion.

Another element that is noted for every CEAG cycle is the timeliness of content approval. Any changes to content will affect various elements, including the questionnaire and systems.

1.3 Agriculture Statistics Program projects supporting the modernization initiative

Evaluation question

To what extent are there effective governance, planning and project management practices in place to support modernization projects within the ASP?

Summary

The evaluation reviewed a sample of ongoing and completed projects undertaken within the ASP to examine their relationship to Statistics Canada's modernization pillars and expected resultsFootnote 14, and to identify areas for improvement regarding governance, planning and project managementFootnote 15 practices. The evaluation found that the projects were aligned with the modernization initiative and that governance is in place, but project management practices could be improved.Footnote 16

Projects are aligned with the modernization pillars and expected results

Statistics Canada's modernization initiative supports a vision for a data-driven society and economy. The modernization of Statistics Canada's workplace culture and its approach to collecting and producing statistics will result in "greater and faster access to needed statistical products for Canadians."Footnote 17 Five modernization pillars along with expected results have been articulated to guide the modernization initiative (Figure 2).

Figure 2: Statistics Canada modernization initiative

The Vision: A Data-driven Society and Economy

Modernizing Statistics Canada's workplace culture and its approach to collecting and producing statistics will result in greater and faster access to needed statistical products for Canadians. Specifically, the initiative and its projects will:

  • Ensure more timely and responsive statistics – Ensuring Canadians have the data they need when they need it!
  • Provide leadership in stewardship of the Government of Canada's data asset: Improve and increase alignment and collaboration with counterparts at all levels of government as well as private sector and regulatory bodies to create a whole of government, integrated approach to collection, sharing, analysis and use of data
  • Raise the awareness of Statistics Canada's data and provide seamless access
  • Develop and release more granular statistics to ensure Canadians have the detailed information they need to make the best possible decisions.
The Pillars:

User-Centric Delivery Service:

  • Users have the information/data they need, when they need it, in the way they want to access it, with the tools and knowledge to make full use of it.
  • User-centric focus is embedded in Statistics Canada’s culture.

Leading-edge Methods and Data Integration:

  • Access to new or untapped data modifies the role of surveys.
  • Greater reliance on modelling and integration capacity through R&D environment.

Statistical Capacity Building and Leadership:

  • Whole of government, integrated approach to collection, sharing, analysis and use of data.
  • Statistics Canada is the leader in identifying, building and fostering savvy information and critical analysis skills beyond our own perimeters.

Sharing and Collaboration:

  • Programs and services are delivered using a coordinated approach with partners and stakeholders.
  • Partnerships allow for open sharing of data, expertise and best practices.
  • Barriers to accessing data are removed.

Modern Workforce and Flexible Workplace:

  • Organization is agile, flexible and responsive to client needs.
  • Have the talent and environment required to fulfill our current business needs and be open and nimble to continue to position ourselves for the future.
Expected Outcome

Modern and Flexible Operations: Reduced costs to industry, streamlined internal processes and improved efficiency/support of existing and new activities.

Most of the projects examined focus on increasing the use of administrative data and integrating data into centralized systems. The evaluation selected a sample of projects through an objective methodology using the following criteria: level of priority for the ASP, budget, expected impact (e.g., data users, respondents, data quality, and costs) and the perceived contribution to modernization. Additional criteria, such as length, start date, and project stage, were also considered. Based on this methodology, five projects were selected:

  • The Agriculture-Zero (Ag-Zero) project is a seven-year project that received funding commencing in fiscal year 2019/2020. It is designed to reduce response burden by replacing survey data with data from other sources. The purpose of Ag-Zero is to undertake multiple pilot projects involving the acquisition and extensive use of satellite imagery, scanner and other administrative data, and models to serve as inputs to the ASP in place of direct data collection from farmers. The project aims to reduce response burden on farmers to as close to zero as possible by 2026, while maintaining the amount and quality of information available.

    The project adopts a "collect once, use multiple times" approach. Administrative data will be used to directly replace survey data, to model estimates that are currently generated using survey data, and to produce value-added statistical products for stakeholders. Under the umbrella of Ag-Zero, a series of subprojects is planned to be implemented over the seven-year periodFootnote 18; at the time of the evaluation, three had been initiated. The following two were selected for review:

    • Pig Traceability uses administrative data to model estimates of pig inventories and has the potential to replace biannual survey estimates with near real-time estimates. The source data are pig trace data collected under the Health of Animals Act.
    • In-season Weekly Yield Estimates uses a combination of satellite imagery, administrative data from crop insurance corporations, and modelling to create in-season estimates of crop yields and area.
  • The Agriculture Taxation Data Program (ATDP) is being redesigned to move from a survey-based to a census-based activity that uses tax records to estimate a range of financial variables including revenues, expenses, and income. The redesign will support the replacement of financial data in the CEAG.
  • The Agriculture Data Integration (Ag-DI) project will integrate agriculture commodity surveys that require processing outside the IBSP into the existing Farm Income and Prices Section Data Integration (FIPS-DI) system. The system will combine data from over 100 sources to produce aggregate integrated data for the System of National Accounts. The project will involve the integration of a multitude of spreadsheets and other systems into one common divisional tool. The project will also update the formulas in FIPS-DI to accept the naming convention used by the IBSP or other data sources loaded directly into FIPS-DI. It is expected that cross-analysis between datasets will be facilitated, particularly once the CEAG is migrated to the IBSP.

Table 5: Overview of the innovative projects selected, and alignment with modernization pillars
Project Timeframe Alignment with modernization pillars
Ag-Zero
Budget: $2.8M
Start: 2019/2020
Length: 7 years
Stage: Planning

Leading-edge methods & data integration: This project involves the use of new sources of data and new methods for collecting data. Extensive use of modelling, machine learning, and data integration are also featured.

Sharing and collaboration: A key element of this project involves the establishment and maintenance of mutually beneficial partnerships with other federal departments and industry associations.

User-centric service delivery: This project is expected to yield improvements in data quality and timeliness of data releases, as well as offer opportunities for new products.

Pig Traceability (Ag-Zero subproject)
Start: 2018/2019
Length: 1 to 3 years
Stage: Execution

In-Season Weekly Yield Estimates (Ag-Zero subproject)
Start: 2019/2020
Length: 1 to 3 years
Stage: Initiation
Redesign of the ATDP
Budget: $1M (approx.)Footnote 19
Start: 2015/2016
Length: 3 to 5 years
Stage: Close-out

User-centric service delivery: Consultations were held with Agriculture and Agri-Food Canada (AAFC) on priorities for the project. AAFC is the primary client and sponsor of the project.

Leading-edge methods and data integration: This project relied heavily on modelling and the integration of agriculture data (CEAG) and tax data.

Ag-DI
Budget: $696K (approx.)
Start: 2015/2016
Length: 3 to 5 years
Stage: Execution
Leading-edge methods and data integration: This project features data integration from an operational point of view. The integration will affect efficiency, data quality, and coherence of the data. It will also enable further cross-analysis opportunities.

Governance is in place

Overall, the evaluation found that governance structures are in place to support the projects. Similarly, schedules are developed and regular meetings take place to monitor progress, budgets and outstanding issues.

The projects employ different governance structures. Ag-Zero is monitored under the Departmental Project Management Framework (DPMF)Footnote 20 and started on April 1, 2019. The first three years of the project are funded by a modernization investment submission, while the final four years will be self-funded with savings realized through the first set of subprojects. Some of the subprojects under the Ag-Zero umbrella have additional dedicated funding.

For Ag-Zero, as required under the DPMF guidelines, detailed project planning documentation is in place, including a project charter, a Project Complexity and Risk Assessment, and an IT Development Plan. Monthly dashboards are provided to the Departmental Project Management Office (DPMO), reporting on various aspects including timelines, deliverables, expenditures, and risks. Within the division, a governance structure exists that includes working groups, divisional management, and the CEAG Steering Committee. Ag-Zero subprojects are managed through the same governance structure. They are discussed within the division, and updates on elements such as deliverables, risks, and schedules are rolled up into the Ag-Zero monthly dashboard, as needed.

The Ag-DI project is also a DPMF project. It is small in scope, with one resource working full time and no non-salary investment. Oversight and reporting are via the standard governance structure for the Agriculture Division and the DPMF.

Project management for the ATDP takes place via the regular divisional governance structure; it is not a DPMF project. Evidence indicates that project management has improved over time (e.g., budget planning, schedules, assumptions, governance, roles and responsibilities) and that at the time of the evaluation, adequate governance was in place and the project was on track to meet its overall objectives.

Risk assessments are conducted on an ad hoc basis

Risks for Ag-Zero as a whole were identified at the outset of the project and are monitored every month as per DPMF requirements. Risks at the subproject level are meant to be rolled up to inform risk management at the Ag-Zero project level. While project-specific risks are identified and entered into JIRA during regular team lead meetings, there is little evidence that initial risk assessments were conducted for the subprojects. As Ag-Zero is the sum of its subprojects, informal risk management at the subproject level limits the effectiveness of risk management.

For example, the interruption of reliable access to administrative data (short-term or long-term) has been identified as a risk for the Ag-Zero project overall. The division has developed mitigation and contingency options, including assessing the feasibility and practicality of remaining "survey ready" should this risk materialize. Because the risk has not been fully assessed at the subproject level, the management of this risk is limited. Similarly, risk management for other non-DPMF activities is taking place on an ad hoc, informal basis.

Quantifiable objectives and targets are missing

The projects examined have the potential to advance innovation in important areas such as data collection, processing, analysis, and dissemination. The evaluation found that clearly defined, quantifiable expected outcomes have not been articulated in most cases. There is a general understanding of what types of positive effects these projects "might" generate, but there are few specific objectives that quantify the expected level of improvement in areas such as data quality, cost efficiency, response burden, timeliness, or relevance.

For example, while it is generally assumed that the integration of data from alternative sources will eventually lead to savings in data collection costs, there are no documented expectations for what the level of savings will be and when they will be realized. This is especially true for the subprojects under the Ag-Zero umbrella. The Ag-Zero project, which has a hybrid funding scheme (i.e., approved funding during the first three years, and self-funding for the remaining four years), does not have a clear plan to identify and measure returns on investment.Footnote 21

Finally, the measurement of returns on investment should be thorough and comprehensive. For example, when data from alternative sources are acquired in exchange for some type of service (such as data cleaning or preparation), the associated cost of the service must be considered. Non-payment for the administrative data does not mean they are free; there is still a cost for the "quid pro quo" service that must be accounted for. Similarly, associated costs for remaining "survey ready" while using administrative data (i.e., the mitigation strategy implemented for the risk associated with the accessibility of administrative data) should be accounted for.
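To make the accounting described above concrete, the sketch below computes a net return for a hypothetical subproject, treating the in-kind service provided in exchange for administrative data and the cost of remaining "survey ready" as offsets against avoided collection costs. All figures and names are invented for illustration; they are not drawn from the projects reviewed.

```python
# Hypothetical net-return calculation for a project that replaces survey
# collection with administrative data. All figures are invented.

avoided_collection_cost = 250_000  # survey collection cost no longer incurred per cycle
quality_timeliness_value = 40_000  # notional value assigned to quality and timeliness gains

in_kind_service_cost = 60_000      # data cleaning/preparation provided in exchange for the data
survey_ready_cost = 35_000         # cost of maintaining the capacity to revert to a survey
development_cost = 90_000          # one-time development and testing cost

net_return = (avoided_collection_cost + quality_timeliness_value) - (
    in_kind_service_cost + survey_ready_cost + development_cost
)
print(f"Net return for the hypothetical subproject: ${net_return:,}")  # $105,000
```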

The establishment of overall performance indicators for projects and for key milestones during the timeline of the project is critical for monitoring the progress of the work and, ultimately, for measuring the return to the agency for the initiative. The return can be in the form of data quality improvements, cost reduction, reduction of response burden, improvements to data access and availability, or any other improvement realized by the agency.

Best practices could be better leveraged

The evaluation found little evidence that best practices and lessons learned from the projects are being shared (or were planned to be shared) outside of the division; nor did it appear that the projects took advantage of experiences acquired by other divisions.Footnote 22 Lessons learned and best practices were not being documented. Instead, they were being deferred until there was "more time."

While minimal effort was made in this regard, staff recognized the importance of sharing with and learning from others, and acknowledged that the sharing and use of best practices could be improved. Staff were also aware of channels for this purpose, such as the Innovation Radar and the Economic Statistics Forum. In November 2019, the division provided an overview of the Geospatial Statistics Framework (a system built to view and analyze geospatial data) at the Economic Statistics Forum.

When asked about ways to enhance information sharing, interviewees provided a number of suggestions:

  • encourage the use of existing corporate mechanisms such as the Innovation Radar
  • develop a user-friendly, open corporate platform housing more detailed information about initiatives, organized by theme and including contact information
  • involve partner areas such as the Finance, Planning and Procurement Branch, the Informatics Branch, and the Modern Statistical Methods and Data Science Branch (which support different projects for sound statistical approaches) at the outset of a new initiative, since these groups have a corporate perspective on innovative projects.

Focus is on expediency

The level of project management typically reflects several factors, including risk, materiality and interdependencies. The evidence suggests that timeliness in delivering results is given the highest priority for the projects and that project management is viewed as a time-consuming, onerous task that slows things down. As such, minimal effort is placed on activities such as conducting formal risk assessments, identifying quantifiable goals, undertaking cost-benefit analyses, and sharing best practices (as well as learning from the experiences of other divisions). An appropriate balance is missing.

How to improve the program

2016 Census of Agriculture dissemination strategy

The Assistant Chief Statistician (ACS), Economic Statistics (Field 5), should ensure that:

Recommendation 1:

For the 2021 CEAG, the Agriculture Division explore ways to improve the timeliness of the last two sets of data tables (historical and socioeconomic data) and increase cross-analysis with non-agricultural sectors.

Recommendation 2:

Web tools include guidance on how to use them and how to interpret data from them. A proactive approach to launching new tools should be taken. Webinars were identified as an effective channel, and the use of other channels would allow for even wider coverage.

Design and delivery: Census of Agriculture migration to the Integrated Business Statistics Program

The ACS, Field 5, should ensure that:

Recommendation 3:

Unresolved issues for the migration to the IBSP, including incompatibilities between the IBSP and the CMP as well as the IBSP processing capacity, are addressed prior to the production phase.

Recommendation 4:

Significant risks during the production phase, particularly with regard to data quality assessments and the exercising of roles and responsibilities, are monitored and mitigated.

Agriculture Statistics Program projects supporting the modernization initiative

The ACS, Field 5, should ensure that:

Recommendation 5:

Planning processes for future projects falling outside the scope of the Departmental Project Management Framework include an initial assessment that takes into account elements such as risk, materiality, public visibility and interdependencies. The assessment should then be used to determine the appropriate level of oversight and project management.

Recommendation 6:

Processes and tools for documenting and sharing of best practices are implemented and lessons learned from other organizations (internal and external) are leveraged.

Management response and action plan

Recommendation 1:

For the 2021 CEAG, the Agriculture Division explore ways to improve the timeliness of the last two sets of data tables (historical data, and socio-economic data) and increase cross-analysis with non-agricultural sectors.

Management response

Management agrees with the recommendation.

For the 2016 Census of Agriculture, no funding was provided for the creation and release of the socioeconomic portrait of the farm population; as such, it was not part of the original scope for the 2016 dissemination plan but was added later. The tool used to create the socioeconomic dataset from the 2016 CEAG (dealing specifically with the linkage between the Censuses of Agriculture and Population) is in scope as a deliverable for the 2021 Census of Agriculture.

The 2021 CEAG dissemination strategy and release schedule will be presented to the CEAG steering committee for review and approval. Related processes for the release of selected historical farm and farm operator data will also be reviewed and the timeline for releases will be adjusted based on feedback from the Federal Provincial Territorial partners (key users of the data).

The Agriculture Division has already taken steps to increase cross-sectoral analysis with non-agricultural sectors, including the infographics on:

  1. Which came first: The chicken or the egg? Poultry and eggs in Canada
  2. Thanksgiving: Around the Harvest Table.

The CEAG will continue to build on this initiative by developing cross-sectoral infographics, analytical studies, Daily releases and interactive data visualization for the 2021 CEAG data release.

Deliverables and timelines

The Assistant Chief Statistician, Economic Statistics (Field 5) will ensure the delivery of:

  1. The approved Dissemination strategy (December 2020)
  2. A proposal for cross-analysis such as Infographics, analytical studies, and Daily releases integrating CEAG data with data from other sectors (March 2021)
  3. A proposal for new interactive visualization tools within the Agriculture Stats Hub (March 2021).

Recommendation 2:

Web tools include guidance on how to use them and how to interpret data from them. A proactive approach to launching new tools should be taken. Webinars were identified as an effective channel, and the use of other channels would allow for even wider coverage.

Management response

Management agrees with the recommendation.

YouTube tutorial videos on how to use Statistics Canada geographic boundary files with open source GIS software (QGIS) have been produced and added to the Agriculture and Food portal.

The CEAG will create "How to" instructions and demos on how to use the interactive visualization web tools. The "How to" instructions will be available within each tool and the demos will be presented to data users in a series of webinars planned for the 2021 CEAG releases.

Deliverables and timelines

The Assistant Chief Statistician, Economic Statistics (Field 5) will ensure the delivery of:

  1. A proposal for new interactive visualization tools within the Agriculture Stats Hub, with integral "How to use" instructions and webinar demos (March 2021).

Recommendation 3:

Unresolved issues for the migration to the IBSP, including incompatibilities between the IBSP and the CMP as well as the IBSP processing capacity, are addressed prior to the production phase.

Management response

Management agrees with the recommendation.

The CEAG will continue to work with partners to identify relevant and emerging issues related to the migration to the IBSP during the integrated testing commencing June 2020. Issues will be captured in JIRA and major risks entered in the CEAG risk register. Consolidated risks and issues will be tracked and actioned in project plan documentation.

The integrated testing will take place over several months. All relevant and emerging issues must be resolved by December 2020 to ensure the readiness of production activities.

Issues and risks will be monitored through the CEAG Steering Committee.

Deliverables and timelines

The Assistant Chief Statistician, Economic Statistics (Field 5) will ensure that relevant and emerging IBSP issues and risks are tracked consistent with the DPMF (December 2020).

Recommendation 4:

Significant risks during the production phase, particularly with regard to data quality assessments and the exercising of roles and responsibilities, are monitored and mitigated.

Management response

Management agrees with the recommendation.

A tabletop exercise will be conducted to identify potential gaps in the processes in place (including risk management) for the production phase. Information gathered during the exercise will be used to inform plans and develop potential contingencies. Results will be presented to the CEAG Steering Committee.

The CEAG will engage the IBSP and all its stakeholders ("SWAT" team) in convening meetings to communicate relevant and emerging issues and risks during the production phase and to find resolutions. Roles and responsibilities will be formally documented and presented at the CEAG Steering committee.

The SWAT team will be ready for the production phase.

Deliverables and timelines

The Assistant Chief Statistician, Economic Statistics (Field 5) will ensure the delivery of:

  1. The results from the tabletop exercise (December 2020)
  2. The CEAG "SWAT" team with documented roles and responsibilities (March 2021).

Recommendation 5:

Planning processes for future projects falling outside the scope of the Departmental Project Management Framework include an initial assessment that takes into account elements such as risk, materiality, public visibility and interdependencies. The assessment should then be used to determine the appropriate level of oversight and project management.

Management response

Management agrees with the recommendation.

A new process will be implemented (for both subprojects under AG-Zero and non-DPMF projects) that will require the development of a project plan prior to the launching of a new project. The plan will include among other things: an initial assessment of the issues and risks (and mitigation strategies); a description of the methodology and assumptions; the identification of interdependencies and expected outcomes; and communication plans. The monitoring of projects will take place through existing governance mechanisms. Finally, existing projects already underway will be subject retroactively to the new process.

Where relevant, the plans will be used to update the DPMF project issues and risks register and the DPMF Project Plan.

Deliverables and timelines

The Assistant Chief Statistician, Economic Statistics (Field 5) will ensure the delivery of a new project plan process (June 2020).

Recommendation 6:

Processes and tools for documenting and sharing of best practices are implemented and lessons learned from other organizations (internal and external) are leveraged.

Management response

Management agrees with the recommendation.

The Agriculture Division has already shared lessons learned and best practices through various mechanisms including:

  1. a presentation at AAFC on producing crop yield estimates using earth observation and administrative data on March 14, 2019
  2. a presentation at the Economic Statistics Forum on November 12th, 2019
  3. a presentation at AAFC on February 7th, 2020, on Predicting the Number of Employees using Tax Data.

As part of the new project plan process outlined previously, the Agriculture Division will leverage lessons learned from other organizations where applicable. In addition, as part of ongoing monitoring, lessons learned and best practices from projects will be documented.

Deliverables and timelines

The Assistant Chief Statistician, Economic Statistics (Field 5) will ensure the delivery of:

  1. A systematic approach to share and document lessons learned (December 2020)
  2. A presentation(s) at conferences such as the Economic Statistics Forum (March 2021)
  3. An article(s) in @StatCan or the Modernization bulletin (March 2021)
  4. A presentation(s) at AAFC (March 2021).

Appendix A: Integrated Business Statistics Program (IBSP)

The IBSP provides a standardized framework for surveys with common methodologies for collection and processing. Through standardization and use of corporate services and generalized systems, the program optimizes the processes involved in the production of statistical outputs; improves governance across all areas involved in statistical data output, particularly for change management; and modernizes the data processing infrastructure. This is achieved by balancing the development of a coherent standardized model with the maintenance of flexible program-specific requirements. It is expected that the IBSP surveys will use:

  • the Business Register (BR) as a common frame;
  • harmonized concepts and content for questionnaires;
  • electronic data collection as the principal mode of collection;
  • shared common sampling, collection and processing methodologies;
  • common tools for data editing and analysis; and
  • the tax data universe for estimating financial information.

Appendix B: List of products released (2016 CEAG)

Table 6: List of products released (2016 CEAG)
Columns: Date of release; Title of product (including link); Type of product; Time lapse between collection and release, in days (years); Time lapse since previous release, in days
May 10, 2017 The Daily: 2016 Census of Agriculture Statistics Canada's official release bulletin 365
(1 year)
N/A
Farm and Farm Operator Data (CANSIM tables 004-0200 to 004-0246) Data table
Provincial and territorial trends (NL; PE; NS; NB; QC; ON; MB; SK; AB; BC; YT/NT) Analytical product
Reference maps: Provinces Map
May 17, 2017 A portrait of a 21st century agricultural operation Analytical product 367 2
May 24, 2017 Production efficiency and prices drive trends in livestock Analytical product 374 7
May 31, 2017 Seeding decisions harvest opportunities for Canadian farm operators Analytical product 381 7
June 7, 2017 Leveraging technology and market opportunities in a diverse horticulture industry Analytical product 388 7
June 14, 2017 Farmers are adapting to evolving markets Analytical product 395 7
June 21, 2017 Growing opportunity through innovation in agriculture Analytical product 402 7
June 27, 2017 150 Years of Canadian Agriculture Infographic 408 6
September 13, 2017 Agricultural Ecumene Boundary File Boundary file 486 78
November 20, 2017 Canadian Agriculture at a Glance: Other livestock and poultry in Canada Analytical product 554
(~1.5 years)
68
December 6, 2017 Canadian Agriculture at a Glance: Dairy goats in Ontario: a growing industry Analytical product 570 16
December 11, 2017 Selected Historical Data from the Census of Agriculture (CANSIM Tables 004-0001 to 004-0017)   Data table 575 5
December 13, 2017 Agricultural operation characteristics Map 577 2
January 25, 2018 Land use, land tenure and management practices Map 620 43
February 22, 2018 Crops - Hay and field crops Map 648 28
March 22, 2018 Canadian Agriculture at a Glance: Innovation and healthy living propel growth in certain other crops Analytical product 676 28
April 5, 2018 Crops - Vegetables (excluding greenhouse vegetables), fruits, berries and nuts, greenhouse products and other crops Map 690 14
April 26, 2018 Livestock, poultry, bees and characteristics of farm operators Map 711
(~2 years)
21
November 27, 2018 The Daily: The socioeconomic portrait of Canada's evolving farm population, 2016 Statistics Canada's official release bulletin 926
(~2.5 years)
215
Agriculture-Population Linkage Data (The socioeconomic portrait of Canada's evolving farm population, 2016) (13 Data Tables) Data table
Socioeconomic overview of the farm population - The Agriculture Stats Hub Dynamic web application (Agriculture-Population Data Linkage)
Canadian farm operators: An educational portrait Infographic
The socioeconomic portrait of Canada's evolving farm population Infographic
Canada's immigrant farm population Infographic
December 13, 2018 Canadian Agriculture at a Glance: Female and young farm operators represent a new era of Canadian farmers Analytical product 942 16
January 17, 2019 Canadian Agriculture at a Glance: Aboriginal peoples and agriculture in 2016: A portrait Analytical product 977 35
March 21, 2019 Canadian Agriculture at a Glance: The educational advancement of Canadian farm operators Analytical product 1040
(~3 years)
63
July 3, 2019 Canadian Agriculture at a Glance: The changing face of the immigrant farm operator Analytical product 1144
(~3 years)
104

Appendix C: Governance and management structures (Census of Agriculture and the Integrated Business Statistics Program)

Overall management of the Census of Agriculture:

  • CEAG Working Group (WG) for overall management of the CEAG (monthly meetings): chaired by the Assistant Director (AD) and Chief, and includes Chiefs from other relevant areas (e.g., methodology, IT and unit heads)
  • CEAG Management Team for day-to-day management of the CEAG (weekly meetings): includes the same members as the CEAG WG, but is also extended to other staff involved
  • CEAG Steering Committee:Footnote 23 an overarching advisory and decision-making function (monthly meetings)
  • Other WGs and committees for various functions (e.g., Census of Population/CEAG WG, Collection WG, Advisory Committee on Agriculture and Agri-Food Statistics, Federal-Provincial-Territorial Committee on Agriculture Statistics.)

Governance structures already established within the Enterprise Statistics Division (ESD) for the IBSP:

  • IBSP Transition/Integration/Production WGs: chaired by ESD, and includes the CEAG and all other partners such as the Operations and Integration Division (OID), the Collection Planning and Research Division (CPRD), as well as methodology and IT (bi-weekly meetings), to support the transition, integration, and production phases;
  • IBSP Project Management Team:Footnote 24 an overarching advisory and decision-making function that includes directors general, directors and ADs involved in the IBSP migrations
  • Change Management Committee: involved only during the production phase, it will be responsible for overseeing change management during production (e.g., if the schedule needs to be changed, the Committee will triage the request to the different stakeholders involved.)

Appendix D: Innovation Maturity Survey

In 2018, Statistics Canada conducted a survey to measure the innovation maturity level of the agency across 6 attributes:Footnote 25

  • Client expectations - incorporating the expectations and needs of clients in the design and development of innovative services and policies
  • Strategic alignment - articulating clear innovation strategies that are aligned with the organization's priorities and mandate
  • Internal activities - building the right capabilities aligned with the innovation strategies
  • External activities - collaborating across the whole of government and with external partners to co-innovate policies, services and programs
  • Organization - fostering the right organizational elements to drive innovation performance at optimal cost
  • Culture - aligning the innovation goals, cultural attributes, and behaviours with the innovation strategies

The Agriculture Division had maturity levels higher than those for Statistics Canada as a whole and for the Economic Statistics Field overall.

Figure 3: Results from the Innovation Maturity Survey (5 point scale)Footnote 26
Description for Figure 3 - Results from the Innovation Maturity Survey (5 point scale)

The figure depicts the results of the Statistics Canada Innovation Maturity Survey for 4 different groups (Statistics Canada; Economic Statistics Field; Agriculture, Energy and Environment Statistics Branch; and Agriculture Division). Six different attributes were used: Client expectations; Strategic alignment; Internal activities; External activities; Organization; and Culture. Overall maturity was also assessed.

Results from the Innovation Maturity Survey (5 point scale)
Attribute Statistics Canada Economic Statistics Field Agriculture, Energy and Environment Statistics Branch Agriculture Division
Overall maturity 1.98 2.01 2.15 2.35
Client Expectations 2.16 2.34 2.55 2.84
Strategic Alignment 1.98 1.95 2.03 2.52
Internal activities 2.11 2.08 2.22 2.38
External activities 1.47 1.57 1.73 1.66
Organization 1.90 1.90 1.97 2.11
Culture 2.26 2.24 2.40 2.56

Video - Join by Attributes (Part 2): One to Many Joins

Catalogue number: 89200005

Issue number: 2020013

Release date: November 20, 2020

QGIS Demo 13

Join by Attributes (Part 2) - One to Many Joins - Video transcript

(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Join by Attributes (Part 2) - One to Many Joins")

Hello everyone, following up from the previous tutorial on one-to-one join by attributes, today we'll discuss the second type – the one-to-many join, where there are many rows or features for one corresponding feature geometry. We'll demonstrate these procedures using the Farms Classified by Total Farm Capital table, which includes the numbers of farms reporting in six separate capital categories for each corresponding boundary. The table also requires some formatting in an external editor, including extracting the join information, which is currently not unusual when performing joins or integrating Statistics Canada tables with vector data.

So within the table we'll begin by moving the first Census Agricultural Region name down to align with the first capital category and delete the row it was in. Then we'll copy the main table to a new workbook to continue editing. So now we must fill the Census Agricultural Region names for each capital category. So under Find and Select, we can click Go to Special and check Blanks. In the Formula Bar at the top we'll then define the cell used to fill the first blank cell and hit Ctrl and Enter to auto-fill the formula down the entire column.

Next we want to extract the Census Agricultural Region Unique Identifiers found within the square brackets of the "Geography" column. So to do so we can apply this formula, which will find and extract characters within square brackets. I've included this and other helpful formulas for extracting join information within the video description. So after pasting the formula we can use the tab in the bottom-right corner of the cell to apply the formula for the entire column.

Now we want to remove the CAR prefix so we'll use the RIGHT formula, specifying the cell and the number of characters to take, in this case 9.

The final procedure to isolate the Census Agricultural Region Unique Identifiers is to take the first four values from the LEFT.

To extract the join information for finer census geographies such as subdivisions, we would actually take the first three digits on the left and the last four digits on the right and concatenate them together, with the procedures otherwise being the same.

Now we'll copy the extracted IDs and paste them as values in the adjacent column. We can then delete the previous three columns with formulas. And the final step is to provide abbreviated field names less than 10 characters in length as we learned in the previous video. I'm adding FN in front of 2011 and 2016 to distinguish that these are farm numbers being reported within these columns. Then we'll save our table first as an Excel workbook, which we'll call FarmCap, and then we'll resave it as a comma-separated values file selected from the Save As Type drop-down. Saving directly to .csv could've adversely affected our table formatting. So now we can close the program, and click Don't Save.
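
The same table preparation can also be scripted outside of a spreadsheet editor. The following is a minimal pandas sketch of the steps above; the file name, column names and the bracketed CAR identifier pattern are assumptions based on the demo narration rather than an exact description of the published table.

  import pandas as pd

  # Read the downloaded table (file and column names are illustrative).
  df = pd.read_excel("FarmCapital.xlsx")

  # Fill the Census Agricultural Region name down to every capital category row
  # (the equivalent of Go To Special > Blanks with Ctrl + Enter).
  df["Geography"] = df["Geography"].ffill()

  # Extract the text inside the square brackets, drop the three-character CAR
  # prefix and keep the first four digits as the CARUID join field.
  df["CARUID"] = (
      df["Geography"]
      .str.extract(r"\[(.*?)\]")[0]
      .str[-9:]
      .str[:4]
  )

  # Abbreviate field names to under 10 characters for the shapefile join.
  df = df.rename(columns={"2011": "FN2011", "2016": "FN2016"})

  df.to_csv("FarmCap.csv", index=False)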

Now within QGIS we can refresh our browser panel and load in the formatted table along with our Census Agricultural Regions boundary file.

Rather than using the Joins tab which results in a more complex workflow, we'll go to the Processing Toolbox, and search Join Value - opening the Join Attributes by Field Value tool. Despite the slightly different appearance and format the tool contains the exact same parameters as the Joins tab. We specify the two layers that we'd like to join, here the Census Agricultural Regions boundary file and the formatted farm capital table – and the two fields with common entries used to link the datasets together – in this case both being CARUID. We can also specify which fields to add from our second layer. The main distinguishing feature of the tool is the ability to select the One-To-Many from the Join Type drop-down.

So we'll now save it to a permanent file – calling it JFarmCap – in our Joins folder. Run the tool and once complete refresh the Browser Panel and load the layer.
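
The same join can be run from the QGIS Python console through the processing framework. This is a hedged sketch only: the layer paths and field names are illustrative, and the parameter names reflect the Join Attributes by Field Value algorithm as it appears in recent QGIS 3 releases.

  import processing

  # METHOD = 0 creates one output feature for every matching table row
  # (the one-to-many join); METHOD = 1 would keep only the first match.
  processing.run("native:joinattributestable", {
      "INPUT": "CensusAgriculturalRegions.shp",   # boundary file
      "FIELD": "CARUID",                          # field in the boundary layer
      "INPUT_2": "FarmCap.csv",                   # formatted farm capital table
      "FIELD_2": "CARUID",                        # matching field in the table
      "FIELDS_TO_COPY": [],                       # empty list copies all fields
      "METHOD": 0,
      "DISCARD_NONMATCHING": False,
      "PREFIX": "",
      "OUTPUT": "Joins/JFarmCap.gpkg",
  })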

So we'll isolate the Census Agricultural Regions in Manitoba for a smaller area of analysis. With our selection we can then calculate the difference in the farm numbers reporting in each capital category between the two census years using the Field Calculator. So I'll call the field D-Farm-16-11 and expanding the Fields and Values drop-down we'll subtract the number of farms in 2016 from those in 2011.

So now we can apply a graduated symbology to our calculated difference field. I created a symbology file earlier for this purpose which I'll load from File. Natural Breaks (Jenks) was the mode used to establish the ranges for visualization, and the same field was also used to label the features. It's important that we are applying the visualization to this layer, as it contains the full range of values which might not be replicated by individual capital categories.

Now within the Processing Toolbox, the Split Vector Layer tool can be used to create many separate layers from a vector according to a unique ID. So if we left it as is, using the CARUID – each census agricultural region would be output as a separate file. However, since we're interested in examining the changing number of farms in each capital category - we'll apply it to the Farm Capital Class field. We'll create a new directory within our Joins folder for the output files, calling it Split Farm Capital. Ensuring it's only applied to the selected features we can then click Run.
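
Scripted, the equivalent call is a single processing step. The field name and output folder below are assumptions taken from the narration; note that running the algorithm on a file path processes all features, so the Manitoba-only selection described above would need to be saved out or passed explicitly.

  import processing

  # One output layer is written to the folder for each distinct value
  # of the farm capital class field.
  processing.run("native:splitvectorlayer", {
      "INPUT": "Joins/JFarmCap.gpkg",
      "FIELD": "FarmCapCls",        # illustrative name for the capital category field
      "OUTPUT": "Joins/Split_Farm_Capital",
  })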

Once complete, we can now expand the directory and for a selection of layers in the Browser Panel we can right-click and select Add Layers.

So with the symbology already applied to our original joined vector, we can simply right-click copy style, and selecting all style categories. Now we can select and right-click the split layers and click paste style. So this enables a uniform visualization, with the complete value ranges, to be rapidly applied across multiple layers.

I've loaded and ordered the split layers in ascending capital ranges in the Prepared Layers group. So now if we toggle through the separate Farm Capital Categories we can see a broad trend. In general there's a decline in the numbers of farms reporting in smaller capital categories – and as we approach larger capital categories, there's a general increase in the number of farms reporting. This is an established trend within the agricultural sector.

So joining table and vector data is a powerful way to integrate tabulated variables in a geospatial format, facilitating the analysis and visualization of spatial and temporal variations in a wide range of reported variables. With the skills developed in this demo, users should be able to: extract or match join information from datasets, perform one-to-one and one-to-many joins between tables and vector datasets, and visualize joined variables and examine the relations with the graphics tools.

Although we only joined one table to the vector datasets for each of these use cases, these procedures can be repeated to combine multiple tables and explore relations between variables at a common level. Apply these skills to datasets of interest to you. So stay tuned for the next tutorial, in which we'll cover creating maps in QGIS.

And on one final note - Statistics Canada is increasingly releasing datasets in table and spatial formats on the Federal Geospatial Platform, such as Median Income after Tax and Farm Operator datasets. They can be readily loaded into QGIS and used to examine trends at multiple scales. So download these datasets when available to facilitate using Statistics Canada data in QGIS.

(The words: "For comments or questions about this video, GIS tools or other Statistics Canada products or services, please contact us:
statcan.sisagrequestssrsrequetesag.statcan@canada.ca" appear on screen.)

(Canada wordmark appears.)

Retail Trade Survey (Monthly): CVs for Total sales by geography - July 2020

CVs for Total sales by geography - July 2020
Table summary
This table displays the results of the Retail Trade Survey (Monthly): CVs for Total sales by geography - July 2020. The information is grouped by Geography (appearing as row headers), Month and Percent (appearing as column headers).
Geography; CV for 202007 (percent)
Canada 0.7
Newfoundland and Labrador 1.3
Prince Edward Island 0.9
Nova Scotia 1.9
New Brunswick 1.8
Quebec 1.5
Ontario 1.4
Manitoba 1.6
Saskatchewan 1.4
Alberta 1.4
British Columbia 1.6
Yukon Territory 1.3
Northwest Territories 0.4
Nunavut 2.2

Video - Join by Attributes (Part 1): One-to-One Joins

Catalogue number: 89200005

Issue number: 2020012

Release date: November 20, 2020

QGIS Demo 12

Join by Attributes (Part 1) - One-to-One Joins - Video transcript

(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Join by Attributes (Part 1) - One-to-One Joins")

So in this tutorial we'll introduce joining datasets by attributes, specifically linking tables to vector datasets for analysis and visualization. This is a powerful way to examine tabulated variables, linking them to vector geometries via common entries - in this case a column in the table and a matching field within the vector. There are two types of attribute joins, with slightly different procedures. Today we'll cover the first, the one-to-one join, where there is one row for each corresponding feature or geometry. For a successful join the entries must match perfectly. Thus, numeric identifiers are best due to complications with text such as special characters, spacing and case-sensitivities. Copying and matching entries between datasets is another method to improve the likelihood of a successful join.

For the tutorial we'll use the Population and Dwelling Highlight tables - downloaded previously. They are ideally formatted, as the join information is readily available and boundary changes between census collections have been accounted for – requiring no external formatting.

So we can load table datasets in to QGIS using the established procedures, double left clicking or dragging and dropping into the Layers Panel. So with the tables and corresponding boundaries loaded we'll open the Layer Property box of the Census Division layer which we'll use to demonstrate the procedures. We can use the Joins tab to perform one-to-one joins and to create our join click the Plus icon.

So the Join layer is the dataset that we'd like to join. And the join field is the specific column or field used to link the datasets. While the target field is the corresponding field within the vector containing matching entries, in this case the Unique Census Division Identifier.

We'll also check the following boxes, and specify the fields to join – as otherwise all fields are joined by default. Specifically we'll add the population and total private dwelling counts and percent change fields, as well as the land area, population density and two population rank fields. We'll remove the custom prefix to retain the original column names.
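
The same one-to-one join can be set up programmatically with the PyQGIS join API. In the sketch below the layer names and field names are assumptions standing in for the Census Division boundary layer and the Population and Dwelling Highlight table, and the method names reflect recent QGIS 3 releases.

  from qgis.core import QgsProject, QgsVectorLayerJoinInfo

  cd_layer = QgsProject.instance().mapLayersByName("CensusDivisions")[0]
  pop_table = QgsProject.instance().mapLayersByName("PopulationHighlights")[0]

  join = QgsVectorLayerJoinInfo()
  join.setJoinLayer(pop_table)
  join.setJoinFieldName("CDUID")      # join field in the table
  join.setTargetFieldName("CDUID")    # matching field in the vector layer
  join.setJoinFieldNamesSubset(["Pop2016", "Pop2011", "PopPctChg"])  # optional field subset
  join.setPrefix("")                  # drop the default field-name prefix
  cd_layer.addJoin(join)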

Now in the Source Fields tab we can see that the columns are joined temporarily and have been misattributed as text field types – a default in QGIS. Additionally, the joined field names exceed the limits of the file format we're using. So to permanently join the datasets we would have to export the dataset to a new layer. But to accomplish all three tasks at once we'll use the Refactor Fields tool.

In the Processing toolbox we can search the tool and double-left click to open it. So we'll close the tool description. And using the drop-downs we can change the field types for the joined columns. Specifically we'll use integer for the count variables, which will be whole numbers, while for the percent change field, land area and population density fields we'll use double - the equivalent for the decimal number field type. Then we need to specify the parameters – using a length of 12 and, for our double columns, a precision of 2. Finally we'll rename our fields to abbreviated headers less than 10 characters in length – the limit of the shapefile format we're using. So feel free to use abbreviations that are most interpretable to you.

Now we'll save the file - providing it an output directory and name. I am storing it in the Joins demo folder and calling it JPopCD for Joined Population - Census Divisions. Once complete, refresh the Browser panel and load the joined dataset into the Layers Panel.

So with our fields correctly attributed - we can now visualize the joined variables, applying a graduated symbology to the numeric fields – specifically in this case we'll use the percent population changes field. We'll select an appropriate colour ramp for visualization and for the Census Division layer we can use Pretty Breaks to establish the value ranges for visualization – clicking OK.

Now we can examine the spatial variations in the joined variables across Divisions within Canada.

We recommend practicing these procedures on your own by repeating with the Census Subdivisions layer to familiarize yourself with the workflow. Once again specify the Join layer and the join information in this case using the Census Subdivision unique identifier to link the datasets.

For the demo, I've loaded a Joined Subdivision layer I created earlier. Due to the data distribution at this level Quantile (Equal Count) was the method used to establish the break values for visualization. But toggling back and forth between the layers demonstrates how these procedures enable variables of interest to be assessed at multiple scales relatively quickly and easily.

And on that note - if we want to determine trends at a broader level we can use the Aggregate tool. So we specify the layer to be aggregated, and then the Group by Expression drop-down enables us to select a field to use in aggregating both the geometries and attributes of the layer. Since we are gonna use the Provin…Unique Provincial Identifier, the first three Census Division fields are redundant – so we'll remove them. We can specify the operator applied in aggregating the fields from the drop-down. So we'll use First Value for the first two text fields and for the percent changes we'll use Mean. Finally we'll remove the Population Rank fields. And we'll save this layer in the same directory calling it JAgPrPop for Joined Aggregated Provincial Populations. Once it's complete we'll refresh the Browser Panel once again loading the layer.

And now we'll recalculate the percent population change field with the Field Calculator. So we'll add a bracket, subtract the Population in 2011 from that in 2016, close the bracket, divide it by the baseline, in this case the population in 2011, and multiply by 100. Once again we can use the symbology tab to visualize the variables at three separate levels.
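
The same recalculation can be scripted with a QGIS expression. In this sketch the layer and field names (including the percent change field) are assumptions; the expression simply mirrors the one built in the Field Calculator.

  from qgis.core import (QgsProject, QgsExpression, QgsExpressionContext,
                         QgsExpressionContextUtils, edit)

  layer = QgsProject.instance().mapLayersByName("JAgPrPop")[0]
  expr = QgsExpression('("Pop2016" - "Pop2011") / "Pop2011" * 100')
  context = QgsExpressionContext()
  context.appendScopes(QgsExpressionContextUtils.globalProjectLayerScopes(layer))

  pct_idx = layer.fields().indexOf("PctChg")   # field being recalculated
  with edit(layer):
      for feature in layer.getFeatures():
          context.setFeature(feature)
          layer.changeAttributeValue(feature.id(), pct_idx, expr.evaluate(context))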

So the final item I'd like to discuss is the Graphics drop-down in the Processing Toolbox, which can be used to quickly assess data distributions, such as the feature counts in different categories using histograms and boxplots, or variable relations between joined variables – using the Scatterplot tool. So we could specify the 2016 Population as our independent variable and the Total Private Dwellings in the same year as the dependent. Once run, we can click on the Hyperlink in the Results Viewer to examine the relation. Unsurprisingly there is a very strong positive relationship between the total population and number of private dwellings.

So that concludes the one-to-one join - when there is one entry for each corresponding feature - using the Joins tab. The procedures can be applied to join tables or link two or more vector datasets together - enabling the visualization of variables of interest. We also learned how to apply the Refactor Fields tool to alter field types of joined data and use the Aggregate tool to examine trends at broader levels. We could iterate these procedures to combine multiple variables for examining changing variable relations between locations or over time. In the next demo we'll examine how to perform a one-to-many join.

(The words: "For comments or questions about this video, GIS tools or other Statistics Canada products or services, please contact us: statcan.sisagrequestssrsrequetesag.statcan@canada.ca" appear on screen.)

(Canada wordmark appears.)

Wholesale Trade Survey (monthly): CVs for total sales by geography - July 2020

Wholesale Trade Survey (monthly): CVs for total sales by geography - July 2020
Geography; monthly CVs in percentage for: 201907, 201908, 201909, 201910, 201911, 201912, 202001, 202002, 202003, 202004, 202005, 202006, 202007
Canada 1.4 1.2 1.3 1.3 1.1 1.5 1.5 1.3 1.3 1.6 0.8 0.7 0.7
Newfoundland and Labrador 0.8 0.7 0.6 0.7 0.7 0.3 1.4 0.5 2.3 1.2 0.5 0.1 0.2
Prince Edward Island 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Nova Scotia 2.6 4.7 4.8 4.2 4.9 13.0 5.0 3.8 5.3 6.2 4.0 2.3 1.5
New Brunswick 2.9 3.4 2.3 2.8 5.5 3.6 4.9 2.4 2.1 3.3 3.3 1.9 1.4
Quebec 3.2 3.2 3.3 3.4 3.1 3.5 3.0 3.7 3.1 4.6 2.0 1.9 1.7
Ontario 2.4 1.8 1.9 2.0 1.7 2.4 2.4 1.8 2.1 2.3 1.1 1.1 1.1
Manitoba 1.9 2.0 2.1 3.3 1.8 5.1 2.7 1.6 1.9 5.8 2.8 1.2 1.2
Saskatchewan 1.6 2.2 1.7 1.4 1.9 1.4 1.0 1.1 0.9 2.4 0.7 0.7 1.1
Alberta 1.8 1.8 3.4 2.6 2.4 2.0 2.0 1.8 2.4 4.8 2.9 2.3 2.4
British Columbia 2.1 2.7 2.9 2.3 3.0 2.6 2.5 3.2 3.1 2.6 1.7 1.6 1.3
Yukon Territory 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Northwest Territories 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Nunavut 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Video - Best Practices, Tips and Altering Defaults in QGIS

Catalogue number: 89200005

Issue number: 2020011

Release date: November 19, 2020

QGIS Demo 11

Best Practices, Tips and Altering Defaults in QGIS - Video transcript

(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Best Practices, Tips and Altering Defaults in QGIS")

Today we’ll introduce some tips and best practices for using QGIS, covering topics such as file management, optimizing workflows and accessing, running and troubleshooting common issues with processing tools. We’ll also briefly discuss changing program defaults, enabling some customization of the interface and treatment of data according to individual needs. These best practices and tips will help avoid frustration and facilitate processing, analyzing and sharing spatial data products and visualizations with others.

So the first tips concern file and directory management. Like many programs QGIS uses absolute file paths by default to link layers to project files. Therefore, if a directory or file is moved or renamed, the new path must be provided when reopening the project; otherwise the affected layers are discarded. So select the layer and click Browse – navigating to the new directory or filename, and for a shapefile select the .shp component of the layer.

So as noted, all spatial data should be in a common directory – here the Geospatial Data folder - using additional subdirectories and distinctive file-names for further organization. It is still best practice to avoid spaces and special characters in filenames or directories – as this can complicate saving or loading files. So substitute spaces with underscores or dashes as required. Finally, using GIS it’s easy to rapidly create multiple files – so ensure to manage your directories judiciously.

The optimal file format for a dataset depends on the intended use. Shapefiles help quickly share layers with others for analysis, visualization and editing – while geodatabase and geopackage files enable layers of different geometry types to be stored in a single file; with the original layers locked from editing unlike Shapefiles – there are no limits on field name lengths. The Package Layers tool can be used to create a geopackage, where we could then combine the points from the Grain Elevators layer, lines from the road segments and polygons from the projected Manitoba census subdivisions. After saving to a permanent file, we could then load the layers like a file geodatabase. And there are a variety of other formats – such as KML for loading and displaying a vector layer in Google Earth. In general, use the Format drop-down in the Save Vector Layer As box to change to the desired file format. There are many sources detailing the applications, advantages and disadvantages of major formats, which can be consulted in determining the best format for your data.
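
As an illustration of the geopackage workflow just described, the Package Layers tool can also be called from the Python console. The layer paths and output name here are illustrative only.

  import processing

  processing.run("native:package", {
      "LAYERS": ["Grain_Elevators.shp", "Road_Segments.shp",
                 "MB_Census_Subdivisions_UTM14.shp"],   # points, lines and polygons
      "OUTPUT": "Geospatial_Data/Manitoba_layers.gpkg",
      "OVERWRITE": False,        # add to an existing geopackage rather than replacing it
      "SAVE_STYLES": True,       # carry layer symbology into the geopackage
  })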

And to improve rendering times for large vector datasets we can use the Create a Spatial Index tool from either the toolbox or the Source Tab of the Layer Properties Box. The raster equivalent is Build Overviews - creating coarser resolution versions of the input for rapid rendering at broader extents.
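
For instance, a spatial index can be added to a large vector layer in a single call (the path is illustrative); the raster-side Build Overviews tool is run in the same way from the GDAL group of the Toolbox.

  import processing

  # Creates the index alongside the vector layer (e.g., a .qix file for shapefiles).
  processing.run("native:createspatialindex",
                 {"INPUT": "Geospatial_Data/Road_Segments.shp"})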

The next tips relate to GIS workflows. Ultimately, there are many ways to complete the same task in GIS. So the shortest workflow, in number of steps, intermediary outputs or processing times, which achieves the same result is the best workflow. Comparing these expressions – which produce the same selection - the second expression is better – as it avoids repeating the field name and operator for each attribute of interest. So apply these principles to your own workflows – whether it is the specific tools applied, the order in which they’re implemented or, as just shown, the way that an expression or code is written.
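
As a purely hypothetical illustration of that principle (the field name and values below are not from the demo data), both expressions select the same features, but the second avoids repeating the field name and operator for each attribute of interest.

  from qgis.core import QgsProject

  layer = QgsProject.instance().mapLayersByName("ProvincialBoundaries")[0]

  # Verbose: field name and operator repeated for every value.
  verbose = '"PRNAME" = \'Manitoba\' OR "PRNAME" = \'Saskatchewan\' OR "PRNAME" = \'Alberta\''

  # Concise: one field reference, one operator, one list of values.
  concise = '"PRNAME" IN (\'Manitoba\', \'Saskatchewan\', \'Alberta\')'

  # Either string works in Select by Expression; from the console:
  layer.selectByExpression(concise)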

So QGIS tools can be accessed from the menu-bar drop-downs or from the Processing Toolbox. Note there is some mutual exclusion in the available tools – such as the Check Geometries Core Plugin in the Vector drop-down or the additional GDAL, SAGA and GRASS tool set, as well as user-created models and processing scripts in the Toolbox. I find that the Toolbox is the fastest and easiest way to isolate available tools using the Search bar, as this will also return additional or alternative tools that may be relevant for your workflow. If needed, use the descriptions on the right side of a tool to help parametrize it. Note that the parameters can vary depending upon the specific source of the tool. For example, the QGIS Slope tool has just two parameters, for the Digital Elevation Model and Z factor, while the GDAL Slope tool contains additional parameters such as expressing slope in percentages vs degrees. The appearance of tools can also vary according to the location that they’re accessed from. So, for example, opening the Select by Expression tool from the Toolbox is markedly different in its appearance from that on the Attribute Toolbar – lacking the central drop-downs to help us construct our expressions.

The next item is on spatial properties. As noted, when using multiple layers in QGIS, the projection, datum and coordinate reference system should be uniform. Although QGIS re-projects layers on-the-fly for visualization to the Project Coordinate Reference System – established by the first loaded layer - it does not resolve these differing properties for processing and analysis. For spatial analysis, use a Projected Coordinate Reference System – tailoring the selected system to the required precision for your analytical needs.

Conversely, due to the potential effects on cell alignments and values, rasters should not be re-projected unless necessary, such as for spatial analysis or integrating multiple rasters from different sources. In these cases, the alignment and resolution of cells should also match - which can be accomplished using the Align Rasters tool. Select the input layers, output file name and resampling method. The coarser resolution raster should be used as the Reference layer. And as we can see, the positions of pixels compared against the original raster have been slightly shifted, but toggling on the aligned DEM we can see that their cells are aligned, which we could then process and analyze further as required. Similarly, when sampling raster layers ensure that the minimum distance between points is greater than the resolution of cells to avoid violating assumptions of statistical independence.

The next tips concern running processing tools. Most tools can be run on single layers or Run as a Batch Process for multiple inputs. However, when run as a Batch Process – temporary layers and Selected Features Only are not available. The Multiple Selection box can help rapidly select layers of interest, and where possible we can copy and paste parameters to reduce manual inputs. To store intermediary layers we can create a temporary directory which can be deleted after processing – as I did to re-project layers to WGS UTM Zone 14, with the Scratch folder. Provided the layers are named with the desired filename we can just add a prefix and use the Autofill Settings, Fill with Parameter Values to automate the output filenames.

Alternatively, for vector processing we can enable the Edit in Place function in the Toolbox. This enables input layers to be modified without creating new layers. So we could re-project layers, or here take the AOI layer and Rotate Features by 180 degrees. We can use the Undo function to revert to the original inputs as needed. Another option is to create a process model, defining inputs and algorithms for repeated tasks, such as this one here which reprojects and clips a layer to a common coordinate reference system and extent. We could then double left-click it in the Toolbox to run it individually or as a batch process in standardizing the spatial properties and the extent of analysis. We’ll cover the Process Modeler in a later demo.

So most QGIS tools are run in the Background – meaning that other tasks can be completed while processing tools are running. This is not necessarily applied to GRASS or SAGA tools. So be patient – even when the program appears frozen - often tools are still running and will complete given the required processing time. However, there is no auto-save in QGIS – so be sure to save edits to layers, visualizations and project files frequently, especially prior to running processing-intensive tools. And if QGIS crashes while using a processing tool, the Toolbox Icon may disappear from the Attribute Toolbar when the program is reopened. Since it’s a core plugin, it can be reloaded from the Manage and Install Plugins box, opened from the Plugins drop-down. We can then check the Processing box off and on again to have the icon reappear.

The Plugins are another key component of QGIS, integrating user-created functions. And they can be installed and updated directly from this window when connected to the internet or loaded from a compressed folder if downloaded from the Online Repository. Note that non-core plugins may rely on additional dependencies and can also become deprecated between QGIS versions – in which case they are listed in red.

Now let’s quickly discuss editing defaults within QGIS. To do so, expand the Settings drop-down and select Options. Note that any changes made here apply to all project files, and require restarting the program to take effect.

Within the General Tab, we can alter the interface language - specifying the language and locale – here having selected Canadian French. As we can see this translates most aspects of the interface, including tools and outputs accordingly. Back in the General Tab, below are additional defaults on system prompts and project parameters. In the Coordinate Reference System tab we can change the default Coordinate Reference System. We’ll leave it as WGS84, as this is the most widely used Geographic Coordinate Reference System. We can also alter how the coordinate reference system is established when loading layers – using either the Default, Prompting for each Layer or using the Project Coordinate Reference System.

In the Data Sources tab we can alter the behaviour and formatting of the attribute table. We can specify which features are shown, the default view as either form or table, and the defaults for copying the table. So, the default here includes Well Known Text which are the coordinates for the geometries of each feature. And this enables tables to be processed and analyzed externally, and reloaded in a spatial file format. However, if no further analysis in GIS was required or the data could be rejoined via another means such as unique identifiers - we could switch to plain text, no geometry to reduce times in exporting the table.

Rendering provides information on the defaults for visualizing vector and raster layers, such as geometry simplification for vectors and default rendering styles for rasters. The next four tabs enable edits to the selection and other map interaction tool colours, pre-defined colours and scales, and parameters for feature delineations.

Within the Processing tab, we can select the default file formats for raster and vector layers, how to address invalid geometries in a vector – here leaving it in its default - as well as the displayed information when running tools and the default output folder. In the Menus drop-down, we can customize the tools listed in the menu-bar drop-downs and on the toolbars. So to add it to the menu-bar, copy the Menu Path syntax from a tool already added and paste it to a tool of interest. Then to add it to a toolbar simply provide an icon and check the “Add button in toolbar” box. So here I created a custom toolbar with Geoprocessing tools, including Extract by Location – using the Snipping tool to extract icons from the toolbox. The toolbar can then be accessed once QGIS is restarted – here being shown in the French interface.

The Project Properties box contains similar parameters - but are specific to the active project file. It can be opened by clicking on the Project Coordinate Reference System button in the bottom right corner of the interface. Within the General tab, we can switch the Save Paths from Absolute to Relative for saving layers, which will reduce complications when sharing project files and directories with others. We can also specify default visualizations for different geometry types. And within the Relation tab we can establish layer relations, with the Referencing layer containing ‘many’ entries - such as the Census Subdivision layer - and the referenced layer containing one matching entry – here using the Census Division layer - and linking them by the census division identifier field.

Finally, let’s discuss some common problems and resolutions for processing layers. Most resolutions link back to the best practices we’ve discussed. The first thing to do is to consult the Log tab for targeting your trouble-shooting initiatives. For example, if it returns Invalid Geometries – run the layers through a cleaning tool such as Fix Geometries - and then rerun through the tool of interest with the fixed output. If errors persist tools such as Check Validity and Topology Checker can help identify errors, which can then be resolved with more advanced cleaning tools such as v.clean and Check Geometries. There are also case-specific tools such as Delete Holes and Remove Null Geometries, which can be applied as required. Less favourable is altering the default settings for Invalid Filtering to Ignore - since it does not address underlying issues and may yield inconsistencies in the outputs and analysis.
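
As a hedged sketch of that first resolution (the paths are illustrative), the layer can be run through Fix Geometries and the repaired output fed back into the tool that failed.

  import processing

  # Writes a repaired copy of the layer; use this output in the failing tool.
  processing.run("native:fixgeometries", {
      "INPUT": "Geospatial_Data/CensusSubdivisions.shp",
      "OUTPUT": "Geospatial_Data/CensusSubdivisions_fixed.shp",
  })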

If the Log tab indicates a layer or folder cannot be found, ensure once again there are no spaces or special characters in the directories, subdirectories or filenames.

Inconsistencies in projections of input layers can also produce failures. And the differences will be shown by the differing EPSG codes after the layer names – in which case simply re-project the layers to the same system. If a geoprocessing error is returned, this may indicate that layers may differ in their type – specifically as single or multi-part, which relates to the number of features and corresponding entries in the attribute table. In this case, simply use the Multipart to Single Part or Promote to Multi-part tools to ensure conformity between the layers.
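 
For example, the single-part conversion can be scripted in one call (paths illustrative); the Promote to Multipart algorithm works the same way in the other direction.

  import processing

  processing.run("native:multiparttosingleparts", {
      "INPUT": "Geospatial_Data/CensusSubdivisions.shp",
      "OUTPUT": "Geospatial_Data/CensusSubdivisions_single.shp",
  })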

Finally, similar issues can occur with tools that require conformity or have constraints on accepted field types or file formats of input layers.

If related to differing field types we can use the Refactor Fields tool to ensure that the field types are the same. Otherwise, differences in common fields between layers can cause Join Attributes by Field Values, Merge and other tools to fail. Within the tool we can specify the field types, and length and precision parameters. In addition to linking layers together, it can also be used to correctly attribute a field type based on its content – such as changing a string field type with numeric variables to integer or double for use in the field calculator, interpolation tools or applying a graduated symbology.

If pertaining to the accepted geometry types: there’s a variety of geometry conversion tools to switch to the desired type. Some relevant tools include Buffer to generate polygons from lines or points, Polygons to Lines or Points to Path for Lines, and Centroids and Extract Vertices to extract points. Some layers may require additional formatting to convert successfully. And broadly, Polygonise and Rasterize tools can be used for converting between raster and vector formats.

If pertaining to the vector format: Use the Export – Save As box to change to the desired file format, such as enabling file geodatabase layers to be edited and processed.

Otherwise, use a comparable tool within the Processing Toolbox. And if substitutes also fail, this indicates that the issue likely lies with the input datasets. However, we can also troubleshoot online, exploring GIS forums and other online documentation. Seldom will you be the first to encounter an issue, and these particular resources are fantastic means to identify any issues or known bugs being reported, and ultimately resolve any issues you may encounter.

And finally we can explore and install plugins as substitutes to perform a task of interest.

So using these best practices will facilitate navigating, loading, editing and visualizing multiple geospatial datasets in QGIS. Apply these practices to minimize potential errors, frustrations or repeating processes when using QGIS. As with any program save edits to layers, symbology styles and the project file frequently to avoid information loss should the program close unexpectedly.

(The words: "For comments or questions about this video, GIS tools or other Statistics Canada products or services, please contact us: statcan.sisagrequestssrsrequetesag.statcan@canada.ca" appear on screen.)

(Canada wordmark appears.)

Environment and Energy Statistics Division
Energy Section

This guide is designed to assist you as you complete the
2021 Monthly Electricity Supply and Disposition Survey.

Help Line: 1-877-604-7828 (TTY: 1-866-753-7083)

Confidentiality

Statistics Canada is prohibited by law from releasing any information it collects which could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Statistics Canada will use the information from this survey for statistical purposes.

A – Reporting Instructions

Please report information for the month indicated on the front of the questionnaire, and return it within 10 days of receipt.

Please complete all sections as applicable.

If the information requested is unknown, please provide your best estimate.

This guide is designed to assist you as you complete the Monthly Electricity Supply and Disposition Survey. If you need more information, please call 1-877-604-7828.

B – Electricity Generation Method

Combustible fuels: see section C

Nuclear: Electricity generated at an electric power plant whose turbines are driven by steam generated in a reactor by heat from the fission of nuclear fuel.

Hydro: Electric power generated from a plant in which the turbine generators are driven by flowing water.

Tidal: Electric power generated from a plant in which turbine generators are driven from tidal movements.

Wind: A power plant in which the prime mover is a wind turbine. Electric power is generated by the conversion of wind power into mechanical energy.

Solar: Electricity created using Photovoltaic (PV) technology which converts sunlight into electricity OR electricity created using solar thermal technology where sunlight heats a liquid or gas to drive a turbine or engine.

Wave: Electricity generated from mechanical energy derived from wave motion.

Geothermal: Electricity generated from heat emitted from within the earth's crust, usually in the form of hot water or steam.

Other non-combustible sources: This includes sources such as waste heat, steam, and steam purchased from another company. Specify in the space provided.

C – Combustible fuels

Coal: A readily combustible, black or brownish-black rock-like substance, whose composition, including inherent moisture, consists of more than 50% by weight and 70% by volume of carbonaceous material. It is formed from plant remains that have been compacted, hardened, chemically altered and metamorphosed by heat and pressure over geologic time without access to air.

Natural gas: A mixture of hydrocarbons (principally methane) and small quantities of various non-hydrocarbons existing in the gaseous phase or in solution with crude oil in underground reservoirs.

Petroleum: This covers both naturally occurring unprocessed crude oil and petroleum products that are made up of refined crude oil and used as a fuel source (i.e., crude oil, synthetic crude oil, natural gas liquids, naphtha, kerosene, jet fuel, gasoline, diesel, and fuel oil; excludes Petroleum coke, bitumen and other oil products not specified).

Other non-renewable combustible fuels: This includes fuels such as propane, orimulsion, petroleum coke, coke oven gas, ethanol and any other type of non-renewable combustible fuels not otherwise identified on the questionnaire. Specify in the space provided.

Wood and wood waste: Wood and wood energy used as fuel, including round wood (cord wood), lignin, wood scraps from furniture and window frame manufacturing, wood chips, bark, sawdust, shavings, lumber rejects, forest residues, charcoal and pulp waste from the operation of pulp mills, sawmills and plywood mills.

Spent pulping liquor (Black liquor): A recycled by-product formed during the pulping of wood in the paper-making process. It is primarily made up of lignin and other wood constituents, and chemicals that are by-products of the manufacture of chemical pulp. It is burned as fuel or in a recovery boiler which produces steam which can be used to produce electricity.

Methane (Landfill gas): A biogas composed principally of methane and carbon dioxide produced by anaerobic digestion of landfill waste.

Municipal and other waste: Wastes (liquids or solids) produced by households, industry, hospitals and others (examples: paper, cardboard, rubber, leather, natural textiles, wood, brush, grass clippings, kitchen waste and sewage sludge).

Other type of Biomass: Any other type of biomass not otherwise identified on the questionnaire. This includes fuels such as food waste/food processing residues, used diapers, and biogases – example, gas produced from anaerobic digesters. Specify in the space provided.

D – Receipts of electricity from the U.S.A.

If applicable, please report the total quantity of electricity (MWh) and Canadian dollar value (thousands of dollars) this business imported/purchased from the United States.

E – Receipts of electricity from within Canada

If applicable, please report the total quantities of electricity (MWh) and total dollar value (thousands of dollars) purchased or received from within the province and/or from other provinces (e.g., other utilities/producers, transmitters, distributors).

F – Total Supply

This is the sum of Total Generation, Total Receipts from United States, Total Receipts from other Provinces and Total Receipts from Within Province. The Total Supply number must equal the Total Disposition number.
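
For respondents who compile these figures programmatically, the balance described above can be verified before submission. The short Python sketch below is illustrative only and is not part of the survey; the field names are hypothetical.

```python
# Field names are illustrative, not the questionnaire's identifiers; quantities are in MWh.
def check_supply_balance(record: dict, tolerance: float = 0.5) -> bool:
    """Check that Total Supply equals the sum of its components and Total Disposition."""
    total_supply = (
        record["total_generation"]
        + record["receipts_from_us"]
        + record["receipts_from_other_provinces"]
        + record["receipts_within_province"]
    )
    return (abs(total_supply - record["total_supply"]) <= tolerance
            and abs(total_supply - record["total_disposition"]) <= tolerance)

report = {
    "total_generation": 120_000,
    "receipts_from_us": 5_000,
    "receipts_from_other_provinces": 2_500,
    "receipts_within_province": 1_500,
    "total_supply": 129_000,
    "total_disposition": 129_000,
}
print(check_supply_balance(report))   # True
```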

G – Deliveries of electricity to the U.S.A.

If applicable, please report the total quantity of electricity (MWh) and Canadian dollar value (thousands of dollars) this business exported/sold to the United States.

H – Deliveries of electricity within Canada

If applicable, please report the total quantity of electricity (MWh) and total dollar value (thousands of dollars) your company sold to other domestic companies, by province or territory.

I – Unallocated and/or losses

Include

  • transmission losses
  • adjustments
  • "unaccounted for" amounts which are subject to variation because of cyclical billing
  • losses in the main generator transformers and the electrical energy absorbed by the generating auxiliaries

Thank you for your participation.

Data stewardship: An introduction

Catalogue number: 892000062020013

Release date: September 23, 2020 Updated: June 9, 2022

By the end of this video, you should understand what data stewardship is, how it differs from data governance, why it is important, what the main roles of data stewards are, and what the expected outcomes of a data stewardship program are.

Data journey step: Foundation
Data competency: Data gathering
Audience: Basic
Suggested prerequisites: N/A
Length: 8:26
Cost: Free

Watch the video

Data stewardship: An introduction - Transcript

(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Data stewardship: An introduction")

Data stewardship: Data governance in action

Data stewardship is often described as data governance in action. This video introduces you to the fundamental aspects of data stewardship.

Learning goals

This video is intended for learners who wish to get a basic understanding of data stewardship. No previous knowledge is required. By the end of this video, you'll be able to answer the following questions:

  • What is data stewardship?
  • What's the difference between data governance and data stewardship?
  • Why is data stewardship important?
  • What are the main roles of data stewards and what are the expected outcomes of a data stewardship program?

Steps of a data journey

(Diagram of the Steps of the data journey: Step 1 - define, find, gather; Step 2 - explore, clean, describe; Step 3 - analyze, model; Step 4 - tell the story. The data journey is supported by a foundation of stewardship, metadata, standards and quality.)

This diagram is a visual representation of the data journey from collecting the data to cleaning, exploring, describing and understanding the data to analyzing the data, and lastly to communicating with others the story the data tell. Data governance and actionable data governance in the form of data stewardship principles cover all the steps of the data journey, also called the data lifecycle.

 

What is data stewardship?

Before discussing data stewardship, it's important to briefly introduce data governance and describe the link between the two. Data governance is often described as the exercise of decision making and authority for data-related matters. Data governance includes policies, directives and regulations on data, data privacy and data security, and the assignment of roles and responsibilities to ensure continuous data quality and data management improvement. Data stewardship is often described as data governance in action. Data stewardship includes the management and oversight of data to ensure fitness for use and compliance with policies, directives and regulations.

What is the difference between data governance and data stewardship? Data governance

Data governance is strategic and involves: creating an organizational structure that's responsible for managing governance decisions; creating a multidisciplinary and coordinated team of stewards to govern the data; defining the uses and purpose of the data and the principles by which they will be handled; establishing a plan to communicate the policies that govern the data; and defining the roles and responsibilities for those who oversee data governance.

What is the difference between data governance and data stewardship? Data stewardship

Data stewardship is operational and involves: identifying what data are critical and documenting the allowable values of the data; defining operational procedures to meet the requirements defined by organizational policies regarding the creation, collection, storage, use of, and denial of access to data; documenting data sources, which involves using a system for recording where data come from; establishing thresholds or acceptable levels for the quality and usability of the organization's data; ensuring compliance with data management and interoperability standards that enable data linkage and allow computer systems to communicate with each other; adding and managing metadata that describe the data; and resolving any issues that arise related to the organization's data.

Why is data stewardship so important?

The rapid increase of data and data providers is often referred to as the data revolution or the data explosion. This increase in volume and variety of data presents many opportunities for organizations to develop more output in the form of data, information and insights. However, there are also growing concerns with data privacy and security, since some of these data contain identifiable information. With the increase in volume, variety and speed at which data can be created, users expect more data provided in or near real time and at ever-increasing levels of detail. There's a growing need in many organizations to increase data sharing and data interoperability in order to use data assets to their full potential. Proper data management and stewardship have never been more important.

What is the role of a data steward?

A data steward is accountable for the organization's data assets and must know where the data assets reside throughout their life cycle, what their measure of quality is, and how they are protected against associated risks. Data stewards are responsible for defining and implementing policies and procedures for the day-to-day operation and administrative management of systems and data, including the intake, storage, processing and transmission of data to internal and external systems.

Data steward activities

The primary roles of data stewards vary between organizations, but most data stewards are directly involved in the following activities.

  • Data lifecycle management, from obtaining data to data deletion: this includes protocols, processes and rules for data storage, access, archiving and deletion.
  • Data protection and privacy: this includes ensuring the use of masking or de-identification techniques to protect identifiable information.
  • Data quality: this includes adherence to data quality frameworks to ensure the data meet the needs of the users.
  • Interoperability standards: this is the use of data standards, vocabularies, taxonomies and ontologies to permit data reuse and sharing.
  • Training: this ensures everyone in the organization understands the role of the data steward.
  • Communication: this includes the creation of reports on the state of data asset management.
  • Policy instrument implementation: this involves ensuring that data adhere to all organizational policies, directives and guidelines throughout their life cycle.
  • Data access management and security: this includes adherence to access privileges and protocols that are based on roles and right to know.

What does data stewardship look like?

When done successfully, data stewardship ensures overall data management is fully aligned with an organization's corporate strategy and supports organizational performance. Sound data stewardship also includes repeatable and automated business processes and well-established roles and accountabilities for those responsible for data, and it ensures that business rules are adhered to and that metrics and audits are used to continuously improve data quality and effective data stewardship.

Expected outcomes

The expected outcomes of a data stewardship program are: Greater trust in information; Greater understanding of the data needed to make critical business decisions because of accurate terms and definitions; Adherence to best practices, protocols, rules and standards leading to greater efficiency; Consistent results across lines of business, and less time spent finding data, creating reports, verifying results, investigating anomalies and explaining inconsistencies; More consistent, findable, and defendable data and information leading to maintained public trust.

Goals of data stewardship

The goals of data stewardship and a data stewardship program are to:

  • Support high quality and optimized data use;
  • Facilitate data discoverability and accessibility;
  • Help set common data definitions, standards and policies to support interoperability;
  • Reduce the time spent finding data, verifying results or identifying inconsistencies;
  • Help eliminate duplication in the acquisition and storage of data;
  • Support effective data governance and strategies.

Recap of key points

Data governance is strategic and involves creating an infrastructure for looking after data in a responsible way. Data stewardship is data governance in action. In other words, data stewardship involves the day-to-day activities of gathering, storing, processing and sharing data. Data stewardship is important as we use and are held accountable for the protection of greater volumes of data.

(The Canada Wordmark appears.)

What did you think?

Please give us feedback so we can better provide content that suits our users' needs.

Data Accuracy and Validation: Methods to ensure the quality of data

Catalogue number: 892000062020008

Release date: September 23, 2020 Updated: November 25, 2021

Accuracy is one of the six dimensions of Data Quality used at Statistics Canada. Accurate data correctly describe the phenomena they were designed to measure or represent.

Before we use data we should explore it to learn about the variables and concepts, and also to discover if there are errors, inconsistencies or gaps in the data. This video looks at ways to explore the accuracy of data.

Data journey step: Explore, clean, describe
Data competency:
  • Data discovery
  • Data cleaning
  • Data quality evaluation
Audience: Basic
Suggested prerequisites: N/A
Length: 10:29
Cost: Free

Watch the video

Data Accuracy and Validation: Methods to ensure the quality of data - Transcript

(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Data Accuracy and Validation: Methods to ensure the quality of data")

Data Accuracy and Validation: Methods to ensure the quality of data

Assessing the accuracy of data is an important part of the analytical process.

Learning goals

Accuracy is one of the six dimensions of data quality used at Statistics Canada. Accuracy refers to how well the data reflects the truth or what actually happened. In this video, we will present methods to describe accuracy in terms of validity and correctness. We will also discuss methods to validate and check the accuracy of data values.

Steps of a data journey

(Diagram of the Steps of the data journey: Step 1 - define, find, gather; Step 2 - explore, clean, describe; Step 3 - analyze, model; Step 4 - tell the story. The data journey is supported by a foundation of stewardship, metadata, standards and quality.)

This diagram is a visual representation of the steps involved in turning data into knowledge.

Step 2: Explore, clean and describe

(Diagram of the Steps of the data journey with an emphasis on Step 2 - explore, clean, describe.)

Accurate data correctly describe the phenomena they were designed to measure or represent. Before we use data, we should explore it to learn about the variables and concepts, and also to discover if there are errors, inconsistencies or gaps in the data. This video looks at ways to explore the accuracy of data.

What does it mean for data to be accurate?

What does it mean for data to be accurate? Accurate data is a reflection of reality. In other words, the data values are valid: not blank or missing, and within a valid range. Accurate data is also correct. First, let's look at the concept of valid data. One method for exploring the validity of data is to do what we call a VIMO analysis. VIMO is an acronym for valid, invalid, missing and outlier data values.

Invalid values

(Table of values on screen listing household ID, their respective spending on food and total spending on housing. One of the table cells contains the word "blue" rather than a dollar amount.)

On the previous slide, we defined valid data as being not blank or missing, and within a valid range of values. Invalid data, on the other hand, has values that are impossible. An example would be a variable that should have a dollar amount, such as spending on housing, having the value blue. That makes no sense.
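
As an illustration of this kind of validity check, the following Python sketch, with hypothetical column names and made-up values, flags entries in a dollar-amount column that cannot be interpreted as numbers, such as the word "blue".

```python
# Hypothetical column names and made-up values.
import pandas as pd

df = pd.DataFrame({
    "household_id": [1, 2, 3],
    "spending_on_housing": ["12000", "blue", "9500"],
})

# Coerce to numeric: anything that cannot be read as a number becomes NaN.
as_numeric = pd.to_numeric(df["spending_on_housing"], errors="coerce")

# Flag entries that are present but not interpretable as dollar amounts.
invalid_rows = df[as_numeric.isna() & df["spending_on_housing"].notna()]
print(invalid_rows)   # the row whose value is "blue"
```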

Missing values

(Table of values on screen listing household ID, their respective spending on food and total spending on housing. One of the table cells is empty and does not contain a dollar amount.)

Missing values are where the variable is left blank. For example, we would expect either a 0 or a number for the value of total expenses.

Outlier values

(Table of values on screen listing the name of individuals, their respective occupation and age. One individual is listed as being 103 years old and another as being 301 years old.)

Outlier values are extremely small or extremely large compared to what we would expect. Some outlier values are actually true. For example, a person's age could be 103 years, although this is quite rare. Other times, outlier values are also invalid, such as the value of 301 for a living person's age in years.

VIMO analysis

One way to do a VIMO analysis is to produce a frequency distribution of key variables and look at the proportion of valid, invalid, missing and outlier values. What proportion of valid values is acceptable? Is it 100%? Or something lower? Look at the range of values for key variables. Ignoring the missing and invalid values for a moment, is the range and distribution of values realistic? Where the values are invalid or missing, is it easy to tell if they should actually be 0, or are they not applicable? Or should there be some other value? Another way to explore the validity of data is to use data visualization techniques, such as plotting the data on an axis. This is a straightforward way to quickly detect whether there are patterns or anomalies in the data. There are software tools to detect outlier values and do data visualization. Remember that not all unusual values are necessarily wrong.
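
To make the idea of a VIMO tally concrete, here is a minimal Python sketch using made-up data. The variable name and the ten-times-spread rule used to flag outliers are arbitrary illustrations, not recommended thresholds.

```python
# A minimal VIMO tally sketch with made-up data; "spending" is a hypothetical variable.
import pandas as pd

values = pd.Series(["12000", "blue", None, "9500", "11000", "250000"], name="spending")

numeric = pd.to_numeric(values, errors="coerce")   # non-numeric entries become NaN
missing = values.isna()                            # truly blank entries
invalid = values.notna() & numeric.isna()          # present but not interpretable ("blue")
median = numeric.median()
spread = (numeric - median).abs().median()         # crude, robust spread estimate
outlier = numeric.notna() & ((numeric - median).abs() > 10 * spread)  # arbitrary rule
valid = ~(missing | invalid | outlier)

print(pd.Series({
    "valid": valid.mean(),      # proportion of values in each VIMO category
    "invalid": invalid.mean(),
    "missing": missing.mean(),
    "outlier": outlier.mean(),
}))
```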

Example: Detecting invalid values

(Diagram of a bar chart presenting the number of footwear items sold online. The listed types of footwear are, from the left: Winter boots; Rubber boots; Sandals; Running shoes; Umbrellas.)

In this made-up example, we use a bar chart, which is a very simple data visualization method, to look at the frequency distribution of the types of footwear sold online. The heights of the bars look to be all within the same range. However, we notice on the horizontal axis that one of the bars is for umbrellas. You can't wear umbrellas on your feet; this is invalid. Further investigation is needed to figure out whether the data in the bar actually represent some other type of footwear and the label umbrella was erroneously assigned, or whether somehow the count of umbrellas got into the chart of footwear sales by accident.

Example: Detecting missing values

(Table on screen presenting a data distribution for Apples (A), Oranges (O) and Bananas (B). The following columns represent the count values at 0 (A=0; O=0; B=1), 3 (A=1; O=0; B=0), 5 (A=0; O=2; B=0), 8 (A=0; O=0; B=2). The last column represents the count of missing values (A=5; O=7; B=6).)

In this example, we created a frequency distribution table of the values for three variables: apples, oranges and bananas. The column on the far right shows how many times there were missing values for each of these three variables. Remember that missing values are not the same as values equal to 0. In this example, there are a lot of missing values relative to the number of non-missing values, so we would probably want to try to fill them in before using these data.

Example: correcting missing values

(Text on screen: There are many missing values in this table. Some are easy to fill by adding or subtracting; others we cannot fill without making some assumptions or finding additional information.)

(Table on screen presenting data values for the same table presented in the previous slide, where the columns represent the Row, Apples, Oranges and Total fruit (TF). The values are as listed: Row 1 (A=3; O=5; TF=-); Row 2 (A=-; O=5; TF=8); Row 3 (A=-; O=-; TF=0); Row 4 (A=-; O=-; TF=8).)

Following through with the missing values detected on the previous slide, here we see how we could correct them in this table of actual data values. We see where the missing values are. In the first row, it's easy to see that if we have three apples and five oranges, the missing value for the total number of fruit should be 8. Similarly, it's not hard to determine that the missing number of apples in the second row is 3. However, in the third row, the 0 could be correct, in which case the missing values for apples and oranges should also be 0. However, if the 0 total is wrong, then we don't know what the value of any of the three variables should be. In the fourth row, if the total is indeed 8, then we do not have enough information to know what the values for apples and oranges should be. We only know that they're between zero and eight.
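
The same reasoning can be automated. The Python sketch below uses the values from the slide and fills a gap only when exactly one of the three values in a row is missing, leaving the ambiguous rows untouched.

```python
# A sketch using the values from the slide; the rule is apples + oranges = total fruit.
rows = [
    {"apples": 3,    "oranges": 5,    "total": None},   # Row 1: total can be derived (8)
    {"apples": None, "oranges": 5,    "total": 8},      # Row 2: apples can be derived (3)
    {"apples": None, "oranges": None, "total": 0},      # Row 3: ambiguous
    {"apples": None, "oranges": None, "total": 8},      # Row 4: not enough information
]

for row in rows:
    if sum(v is None for v in row.values()) == 1:       # exactly one value missing
        if row["total"] is None:
            row["total"] = row["apples"] + row["oranges"]
        elif row["apples"] is None:
            row["apples"] = row["total"] - row["oranges"]
        else:
            row["oranges"] = row["total"] - row["apples"]

print(rows)   # rows 1 and 2 are now complete; rows 3 and 4 still have gaps
```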

Example: detecting outlier values

(Scatter plot on screen with random dots where all but one red dot are approximately aligned. Two trend lines are added to represent said linearity.)

(Text on screen: This value (red dot) is further from all the other data values than we would expect.)

In this made-up example, the data points, represented by the green and red dots, have been plotted on a horizontal and vertical axis. Two different methods have been used to estimate the central tendency of the data values; those are represented by the red and blue lines. Most of the data values fall on or near both of the fitted lines. However, the red point is way off the lines. It's an outlier value. Further investigation is needed to determine what makes this data point so different and what should be done with it. Some outlier values are correct even though they are unusual.
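
One simple way to do this programmatically is to fit a line and flag points that sit unusually far from it. The following Python sketch uses made-up data and a median-based (robust) score so that the outlier itself does not hide the cut-off; the 3.5 threshold is an illustrative choice, not a rule.

```python
# A made-up example: flag a point that sits far from a fitted trend line.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 30.0, 16.1])   # one suspicious value

slope, intercept = np.polyfit(x, y, deg=1)      # simple least-squares line
residuals = y - (slope * x + intercept)         # distance of each point from the line

# Median absolute deviation; 1.4826 scales it to be comparable to a standard deviation.
mad = np.median(np.abs(residuals - np.median(residuals)))
score = np.abs(residuals - np.median(residuals)) / (1.4826 * mad)

print(np.where(score > 3.5)[0])   # index of the point far from the line (here, index 6)
```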

Exploring the correctness of data

(Text on screen: Micro data: for example, a list of people with their occupation and date of birth. Macro data: less detailed, like zooming out with a camera. For example, macro data produced from a list of people with their occupation and date of birth could be counts of people by age categories and by occupational groups. Micro data are more granular than macro data, at a more detailed level.)

We said earlier that accurate data is both valid and correct. We looked at the VIMO analysis as a way to explore the validity of data. Now let's focus on the correctness of data. But first, we need to differentiate between looking at individual data values, or microdata, and looking at those values summarized up to a higher level, or macro data. Microdata are more granular than macro data, at a more detailed level.

Exploring correctness of data

(Text on screen: Example 2: a 12-year-old has a Master's degree in biology, is married and is employed by the University of Manitoba. Does this make sense?)

One method to explore the correctness of data is to compare it to other related information. We could look at the reasonableness of values across a single data record. Are there variables that should make sense together? For example, if there is a total and the parts that make up that total, is the sum correct? Another example is to look at a person's current age and compare that to the highest level of education attained, marital status or employment status. Does it make sense?

We could also look for conformity with standards. For example, in Canada, the first letter of the postal code is determined by which province the address is in: all postal codes in Newfoundland and Labrador start with A, all postal codes in Nova Scotia start with B, and so on. If this is not the case, then one of the pieces of information is incorrect.
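
A check like this is easy to automate. The Python sketch below uses hypothetical records and lists only a few provinces for illustration.

```python
# Hypothetical records; only a few provinces are listed for illustration.
FIRST_LETTER_TO_PROVINCE = {
    "A": "Newfoundland and Labrador",
    "B": "Nova Scotia",
    "C": "Prince Edward Island",
    "E": "New Brunswick",
}

records = [
    {"postal_code": "A1C 5S7", "province": "Newfoundland and Labrador"},
    {"postal_code": "B3H 4R2", "province": "New Brunswick"},   # inconsistent
]

for record in records:
    expected = FIRST_LETTER_TO_PROVINCE.get(record["postal_code"][0])
    if expected is not None and expected != record["province"]:
        print("Inconsistent record:", record)
```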

(To answer these questions it is necessary to have reliable "facts" about the real world.)

Yet another way to explore correctness is to compare what's in the data with what's happening in the real world. You could calculate summary statistics, such as totals and averages for car sales across Canada, and compare across provinces or through time. Do the numbers make sense? Does the auto industry track these numbers, and how do your numbers compare to theirs?

Tips for exploring correctness of data: Part 1

Here are some tips to make the comparisons easier. Before trying to compare data values, put them into a common format. The 12th of June 2018 will look different if the month is listed first in one case and the day is listed first in another. As well as using standard formats, use standard abbreviations, concepts and definitions to the extent possible. For example, in Canada we have a standard two letter code for the names of all the provinces and territories.
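
As a small illustration of putting dates into a common format, the Python sketch below parses the same date written month-first and day-first and converts both to ISO 8601 (YYYY-MM-DD).

```python
# Both strings refer to 12 June 2018, written month-first and day-first.
from datetime import datetime

month_first = datetime.strptime("06/12/2018", "%m/%d/%Y")
day_first = datetime.strptime("12/06/2018", "%d/%m/%Y")

# Converting both to ISO 8601 (YYYY-MM-DD) makes them directly comparable.
print(month_first.date().isoformat())   # 2018-06-12
print(day_first.date().isoformat())     # 2018-06-12
print(month_first == day_first)         # True
```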

Tips for exploring correctness of data: Part 2

Using data visualization is a great way to spot anomalies in data. Before you get started, think about what level of incorrectness you can tolerate in the data and what's adequate for your purpose. Once you find discrepancies, use automation to correct errors in an efficient, consistent and objective manner.

Describing accuracy of data

(Text on screen: Document Clearly: The level of accuracy in terms of validity and correctness of the data once you have finished exploring and cleaning the data. This documentation could be of interest to: Those who will use the data and to those who will be responsible for exploring, cleaning and describing other similar data.)

Before using the data or passing it to stakeholders who will use the data, be sure to describe the accuracy of the data. The documentation describing the data is sometimes referred to as metadata. Document the methods you used to explore the validity and correctness of the data, as well as the methods you use to clean or improve the data. This is what users of the data need to know so they can use it responsibly.

Recap of key points

This video presented the basic concepts of accuracy and data validation. VIMO analysis recommends the use of frequency distributions of key variables to assess the proportion of valid, invalid, missing and outlier values. Data visualization techniques, the use of common formats and automation help to ensure efficient, correct results. In addition, clear documentation is essential to gain insight into the methods used to explore and validate the data.

(The Canada Wordmark appears.)

What did you think?

Please give us feedback so we can better provide content that suits our users' needs.

Gathering Data: Things to consider before gathering data

Catalogue number: 892000062020005

Release date: September 23, 2020 Updated: November 25, 2021

By the end of this video, you should understand how to determine what data you need, where to find data, how to gather data (whether from existing sources or by doing a survey) and how to keep data safe.

Note that data gathering is usually called "data collection" when conducting a survey.

Data journey step: Define, find, gather
Data competency: Data gathering
Audience: Basic
Suggested prerequisites: N/A
Length: 6:10
Cost: Free

Watch the video

Gathering Data: Things to Consider Before Gathering Data - Transcript

(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Gathering Data: Things to Consider Before Gathering Data")

Gathering data: things to consider before gathering data

Data gathering involves first determining what data you need, then where to find it, how to get it, and how to keep it safe. This video introduces you to things you should consider when gathering data.

Learning goals

By the end of this video you should understand how to determine what data you need, where to find it, how to gather data, whether from existing sources, or by doing a survey, and how to keep it safe. Note that data gathering is usually called data collection.

Steps of a data journey

(Diagram of the Steps of the data journey: Step 1 - define, find, gather; Step 2 - explore, clean, describe; Step 3 - analyze, model; Step 4 - tell the story. The data journey is supported by a foundation of stewardship, metadata, standards and quality.)

This diagram is a visual representation of the data journey from collecting the data to exploring, cleaning, describing and understanding the data, to analyzing the data, and lastly to communicating with others the story the data tell.

Step 1: Find, gather and protect

(Diagram of the Steps of the data journey with an emphasis on Step 1 - Find, gather, protect.)

(Text on screen: Showing relationship between two things)

Looking into how to gather data is part of the find, gather and protect step of the data journey. Some data are gathered for statistical or research purposes; in other situations, data are gathered for regulatory purposes or to provide an individualized service to Canadians. No matter what the purpose for data gathering, the aspects to consider are similar.

Determining what data you need

The first thing to consider before gathering data is to fully articulate what questions you're trying to answer. Who do you want to draw conclusions about? Is it all Canadians or all businesses in a certain sector of the economy? This is the target population. Next, what's the individual unit you want to look at? Is it a person, family, household or a business? This is called the unit of observation.

What is the time frame you want to look at? Do you want to look at only one period of time, or do you want to have data for multiple time periods? Also, what level of quality do you need in the data? When looking at different data sources, consider how and for what purpose the data were created.

Will it support the level of analysis that you want to do? What characteristics or attributes are you interested in? Are they all available on a single data source, or will you have to use two or more different data sources? It's important to know at the outset what you're looking for and then to assess all potential data sources against these criteria.

Where to find data

When deciding which sources to use, the first place to look for data is open sources. The Government of Canada has a wealth of data available to all Canadians in the open data portal. Statistics Canada has public use microdata files, aggregated data products and many data products free for download. Online sources are also an option.

Other data sources are also available, but with some restrictions on who can use them or at a cost. Statistics Canada offers researchers access to data through research data centres. Statistics Canada also offers remote access to data under certain conditions and constraints. Service providers, such as Internet and power companies, offer data products, sometimes for a fee. If no existing data will meet your needs, you can do a survey to collect new data. We want to emphasize that doing a survey should be a last resort. It's by far the most costly and complex option for gathering data. To learn more about how to do a survey, please refer to the course Surveys from Start to Finish, course code 10H0085, on the Statistics Canada website.

How to gather data

The first step in gathering data is to prepare a plan. The plan should cover which data source or sources will be used and all the steps to acquire the data. For example, what are the steps if there's a protocol that must be followed? Is it necessary to negotiate with the data owner? Estimate the time it will take to get the data and the cost, both in terms of fees, if any, and storage costs. Take into account the skill set required for gathering the data. The plan could include a business case to explain a request for funding. The data might be structured, meaning it's already in some sort of database or format where the variables are separated, or it might be unstructured, such as sensor data or web-scraped data that will require some manipulation to put it into a usable format. For more information about data, see the video on types of data.

No matter where the data come from, the quality of the data needs to be monitored throughout the gathering process to ensure that anomalies are responded to. Once the data are gathered, the next steps are to explore, clean and describe the data. For more about these steps, see the videos for the explore, clean and describe step of the data journey.

Keeping data safe

When you gather data, you need to consider the following: privacy, by collecting only the information that is needed to reach your objective; security, by keeping data safe from unauthorized access and use; confidentiality, by not releasing information that could directly or indirectly identify information sources; and transparency in your processes. Consult your organization's policies and guidelines to ensure that you're meeting privacy and security requirements.

Canada has municipal, provincial, territorial and national jurisdictions that govern privacy and security requirements. Consult these as well as your organization's privacy and security policies and guidelines as they relate to your data gathering exercise.

Recap of key points

Data gathering involves first articulating what questions you're trying to answer. Next, look for existing open source data. If you can't find what you need there, try existing sources that have some restrictions. As a last resort, do a survey to collect new data. Make a plan for all the steps in gathering data. Be sure to protect the privacy and security of the data.

(The Canada Wordmark appears.)

What did you think?

Please give us feedback so we can better provide content that suits our users' needs.