Chapter 4.3: Management and access to metadata
National statistical organizations (NSOs) are responsible for providing information to users about the nature and characteristics of published data, and for assisting users in assessing the quality of the data and the suitability of the data for their own purposes. That information, commonly called metadata, is provided to ensure an understanding of the components of the data, including concepts, variables and classifications, underlying statistical methods, and key aspects of data quality.
By proactively providing access to metadata, NSOs ensure that their users understand all the attributes of statistics (interpretability), including data limitations, so they are able to make informed decisions. Providing access to accurate, up-to-date metadata is also a way to ensure that NSOs maintain the trust and confidence of the public with regard to data quality, thus building on the credibility and the reputation of their organization.
In recent years, metadata management and data access have been guided by two important principles:
- Metadata are critical throughout the entire survey process: They are a driver in the survey process, starting at the data collection phase, ensuring that information about data is consistently reused throughout the entire data collection, processing, analysis and dissemination cycle (for additional details refer to Chapter 3.1: The Corporate Business Architecture and Chapter 3.3: Enhancing how surveys are conducted); and
- Interoperability: Common models and tools are used to manage metadata throughout the survey cycle, ensuring that metadata can be accessed, used and imported as content at each stage in the survey process.
Strategies and tools
As Statistics Canada's experience has shown, two important mechanisms can greatly facilitate users' access to metadata: (1) the creation of an integrated metadatabase to store and centralize all public metadata; and (2) an adequate governance structure. The following sections describe how this was achieved in Canada.
1. The Integrated Metadatabase: a repository of information about surveys and programs
Creating a metadata repository does not come naturally; it is often seen as an extra step added to the data production process. In Canada, this process started in 2000. Statistics Canada's Integrated Metadatabase is a critical tool for data users: it gives them access to the central corporate repository of statistical metadata, where they can find accurate information on survey methodology, variables and data quality. Statistics Canada has developed the information included in the database in accordance with international standards, such as those set by the Statistical Data and Metadata eXchange (SDMX), the International Organization for Standardization (ISO) and the Data Documentation Initiative (DDI).
Public metadata for all surveys and statistical programs are accessible on the Statistics Canada website through a module called Definitions, Data Sources and Methods. Within that module, data users can easily find the following information:
- a description of each of Statistics Canada's surveys and statistical programs, including information on data sources and methodology
- questionnaires used to collect data for each survey or statistical program
- definitions of standard concepts, variables and classifications (also called structured metadata)
- data quality considerations and indicators.
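The four kinds of information listed above can be pictured as the fields of a single record per survey or statistical program. The following is a minimal sketch only: the class name, field names and example values are assumptions made for illustration and do not describe the actual structure of the Integrated Metadatabase.

```python
from dataclasses import dataclass, field


@dataclass
class SurveyMetadata:
    """Hypothetical public metadata record for one survey or statistical program."""
    name: str
    description: str  # covers data sources and methodology
    questionnaires: list[str] = field(default_factory=list)
    definitions: dict[str, str] = field(default_factory=dict)  # structured metadata
    quality_notes: list[str] = field(default_factory=list)

    def missing_components(self) -> list[str]:
        """Return the names of the components that are still empty."""
        components = {
            "description": self.description,
            "questionnaires": self.questionnaires,
            "definitions": self.definitions,
            "quality_notes": self.quality_notes,
        }
        return [label for label, value in components.items() if not value]


# Invented example record, for illustration only.
record = SurveyMetadata(
    name="Example Household Survey",
    description="Monthly sample survey of households; data collected by interview.",
    definitions={"household": "Person or group of persons occupying the same dwelling."},
)
print(record.missing_components())  # ['questionnaires', 'quality_notes']
```

A check such as `missing_components` is one way a central repository could flag incomplete records before they are published.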
At Statistics Canada, metadata are managed by a specific program area (the Standards Division). However, experts from each survey and statistical program area are responsible for supplying the metadata for their surveys or statistical programs and for following a template and guidelines developed by the individual standards programs.
Metadata for surveys and statistical programs are now being integrated into the survey prescription process for data collection, with the result that basic survey and program metadata are now accessible on the Statistics Canada website at the start of data collection as well as at the time of data release. This has the benefit of informing survey respondents about the collection activity in which they are participating, thus encouraging higher levels of participation. This initiative highlights the importance of publishing metadata throughout the survey cycle, not only during the dissemination phase.
The maintenance of a statistical metadatabase is part of Statistics Canada's Corporate Business Architecture program, which includes, among its principles, the creation and maintenance of metadata-driven processes that can produce and use metadata information throughout the survey cycle (for more details refer to Chapter 3.1: Corporate Business Architecture).
2. Governance structure
It is important for NSOs to identify a specific program that will be in charge of centralized metadata. As mentioned earlier, this program area should be responsible for the development and maintenance of a central metadata repository where standards and approved deviations are to be documented. To the maximum extent possible, internationally recognized standard classifications, concepts, variables and definitions should be adopted. Deviations are acceptable only if they are essential to the proper description and measurement of the country's specific realities.
Statistics Canada has a mature and effective governance and management structure that ensures proper management of metadata. Responsibilities are shared among three entities: the Standards Division, the Methods and Standards Committee, and the program areas.
The mandate of the Standards Division is to develop, maintain and communicate statistical standards, to promote and monitor their implementation under the terms of the Policy on Standards, and to provide guidance on their interpretation. The Standards Division also has a mandate to develop, maintain and disseminate statistical metadata for surveys and statistical programs, under the terms of the Policy on Informing Users of Data Quality and Methodology.
The Policy on Standards mandates the use of standard names and definitions for populations, statistical units, concepts, variables and classifications in statistical programs, and provides a framework for reviewing, documenting, authorizing, and monitoring these in Statistics Canada's programs. This policy dictates that, where departmental standards have been issued, program areas must follow those standards unless a specific exemption has been obtained under the provisions of the policy. The policy outlines separate processes and roles and responsibilities related to the following:
- Creating and defining standard names and definitions;
- Obtaining approval for creating and registering new departmental standards; and
- Obtaining an exemption where, in exceptional circumstances, an existing standard cannot be adopted.
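The rule described above (follow the departmental standard unless an exemption has been obtained) can be sketched as a simple check. This is an illustrative sketch under stated assumptions: the registries, identifiers, program names and the function `check_compliance` are all hypothetical and are not taken from the Policy on Standards or from any Statistics Canada system.

```python
# Hypothetical registry: concept -> identifier of the departmental standard.
DEPARTMENTAL_STANDARDS = {
    "age_group": "AGE-STD-1",
    "industry": "IND-STD-1",
}

# Hypothetical approved exemptions, recorded as (program, concept) pairs.
APPROVED_EXEMPTIONS = {("Survey A", "industry")}


def check_compliance(program: str, definitions_used: dict[str, str]) -> list[str]:
    """Return the concepts for which a program deviates from a departmental
    standard without an approved exemption."""
    problems = []
    for concept, definition in definitions_used.items():
        standard = DEPARTMENTAL_STANDARDS.get(concept)
        if standard is None:
            continue  # no departmental standard has been issued for this concept
        deviates = definition != standard
        exempt = (program, concept) in APPROVED_EXEMPTIONS
        if deviates and not exempt:
            problems.append(concept)
    return problems


used_by_a = {"age_group": "AGE-LOCAL", "industry": "IND-LOCAL", "dwelling_type": "DW-LOCAL"}
print(check_compliance("Survey A", used_by_a))  # ['age_group']
```

In this sketch, "Survey A" is flagged only for `age_group`: its `industry` deviation is covered by an exemption, and no standard exists for `dwelling_type`.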
The Policy on Informing Users of Data Quality and Methodology identifies the type of information to be made available to data users as part of the agency's disseminated data products. This policy governs all statistical data and analytical results disseminated by Statistics Canada. It also provides guidelines and examples for the type of information to be published, and identifies the Integrated Metadatabase as the corporate metadata repository.
The Standards Division is guided mainly by the Methods and Standards Committee. This committee, composed of directors from across the agency, is chaired by an assistant chief statistician. The role of this committee is to:
- manage corporate metadata
- assist and advise on the development and application of statistical standards and metadata within the agency's programs
- approve the adoption of statistical concepts, variables and classifications as departmental standards while ensuring comparability and compliance with international standards
- approve exemptions to the departmental standards where appropriate
- initiate periodic reports on the state of compliance with the Policy on Standards, and initiate a review of the policy and accompanying standards when deemed necessary
- advise on the development and use of sound statistical methods
- provide guidance on priorities for statistical research and innovation
- act as the focal point for the review and monitoring of corporate data quality practices and issues.
The Methods and Standards Committee reports directly to the Executive Management Board, which is chaired by the Chief Statistician of Canada.
Finally, the program areas are responsible for documenting the metadata according to the requirements prescribed in the Policy on Informing Users of Data Quality and Methodology.
Key success factors
Metadata management and access at Statistics Canada are successful for a number of reasons.
Firstly, the creation, storage, management and dissemination of metadata are governed by clear mandatory requirements explicitly stated in Statistics Canada policies, and overseen by the Methods and Standards Committee. Clear policy requirements and support from senior executives from across the agency provide a framework for ensuring quality in the development, documentation and dissemination of Statistics Canada's metadata. They also ensure that employees are engaged and aware of the importance of maintaining and disseminating current, accurate, relevant and interpretable metadata.
Secondly, the use of a corporate central metadata repository, the Integrated Metadatabase, means that there is a primary source for disseminated metadata at Statistics Canada. This ensures that compliance with policies and guidelines, including international guidelines, can be carefully monitored and that, when changes are required, they can be implemented across all program areas, in an efficient and coordinated way.
Thirdly, metadata are integrated into the survey prescription process: metadata on survey programs are published on Statistics Canada's website before data collection starts. As a result, metadata have become a standardized part of the survey process from the start to the finish of the survey cycle, not only at the dissemination stage.
With current metadata increasingly available to the public via the agency's website, Statistics Canada has been able to ensure the quality of the information it publishes and to respond to the needs of its clients in an effective way. In turn, clients have come to expect high-quality metadata, which they use on a regular basis to interpret and contextualize their data and analyses.
Challenges and looking ahead
The creation and dissemination of comprehensive metadata are not without challenges.
Resource requirements are an issue, both in terms of creating metadata within program areas and of maintaining and disseminating metadata via Statistics Canada's Integrated Metadatabase. The number of concepts, variables and classifications used across Statistics Canada's surveys and statistical programs is in the thousands. Furthermore, it can sometimes be difficult to find consensus on their nomenclature across different program areas, which may use similar, but not identical, versions of the same variable.
It is important, therefore, to constantly reinforce, to all staff, the positive impact that good metadata have on the efficiency and coherence of statistical production and on users' ability to properly understand and use the data. Fostering interoperability, maintaining compliance with international standards, and developing and implementing a consistent metadata approach can only enhance the relevance and effectiveness of official statistics.
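The difficulty noted above, where program areas use similar but not identical versions of the same variable, is the kind of problem that a simple similarity check can help surface for human review. The sketch below uses Python's standard `difflib` module; the function name, the threshold of 0.85 and the variable names are assumptions chosen for illustration, not a description of any tool used at Statistics Canada.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_names(names: list[str], threshold: float = 0.85) -> list[tuple[str, str]]:
    """Return pairs of distinct names whose similarity ratio meets the threshold."""
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


# Invented variable names, for illustration only.
variables = ["Age of respondent", "Household income", "Age of respondents"]
print(similar_names(variables))
```

A check like this only proposes candidates: deciding whether two names really denote the same variable, and which name should become the standard, remains a matter for subject-matter experts and the governance process.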
Statistics Canada (2004). Policy on Standards, Ottawa. Internal document. Accessible on demand.
Statistics Canada (2000). Policy on Informing Users of Data Quality and Methodology, Ottawa. Internal document. Accessible on demand.
Statistics Canada. Statistics Canada's website. Consulted on the 11th of March 2016 and retrieved from http://www.statcan.gc.ca/eng/concepts/index?MM=
Statistics Canada. Statistical Data and Metadata Exchange, Ottawa. Internal document. Accessible on demand.