Data quality in six dimensions - Transcript
(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Data quality in six dimensions")
Data quality in six dimensions: Evaluating and ensuring quality
We are exposed to data every day, for example in news stories, weather reports and advertising. But how do we know whether these data are of good quality? In this video, you will be introduced to the fundamentals of data quality, which can be summed up in six dimensions, or six different ways to think about quality. You will also learn how each dimension can be used to evaluate the quality of data.
By the end of this video you will have learned about basic quality concepts, data quality expressed as six dimensions and the interactions between these dimensions. This video is intended for learners who want to acquire a basic understanding of data quality. No previous knowledge is required.
Steps in the data journey
(Diagram of the Steps of the data journey: Step 1 - Find, gather, protect; Step 2 - explore, clean, describe; Step 3 - analyze, model; Step 4 - tell the story. The data journey is supported by a foundation of stewardship, metadata, standards and quality.)
(Text on screen: The steps in the data journey are supported by a foundation of stewardship, metadata, standards and quality)
This diagram is a visual representation of the data journey, from collecting the data, to cleaning, exploring, describing and understanding the data, to analyzing the data, and lastly to communicating to others the story the data tell. You will notice that data quality does not fall under one specific step in the process. It is instead something that is important throughout the entire data journey.
(Diagram of the six dimensions of data: Relevance, Accuracy, Timeliness, Interpretability, Coherence, Accessibility)
The six dimensions of data quality are: Relevance, Accuracy, Timeliness, Interpretability, Coherence, Accessibility. Each dimension will be examined separately over the next few slides.
The relevance of data or statistical information reflects the degree to which it meets the needs of data users and stakeholders. To test a data product for relevance, you should ask yourself: does this information matter? At Statistics Canada, it is our responsibility to provide Canadians with information that matters. In other words, is it useful in building policy? Does it aid in long-term planning? Does it fill an existing data gap? Can it promote new initiatives that would benefit Canadians? Does it help improve services? What questions would you ask to test the relevance of your data?
Accurate data give a true reflection of reality. Ask yourself if what is being measured is in line with what is actually true.
Timeliness is the delay between the time when the data are meaningful and when they are available. For example, school bus authorities need up-to-date weather forecasts very early in the morning to make good decisions about whether to cancel school buses. Likewise, parents need to know about school bus cancellations before they head to work. Timeliness is closely related to accuracy and relevance.
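As a minimal sketch of this definition, timeliness can be expressed as the gap between the moment the data describe and the moment they become available. The function name, the timestamps and the decision deadline below are hypothetical, chosen only to illustrate the school bus example.

```python
from datetime import datetime, timedelta


def timeliness_delay(reference_time: datetime, release_time: datetime) -> timedelta:
    """Return the delay between when data are meaningful and when they are available."""
    return release_time - reference_time


# Hypothetical example: a weather report describing conditions at 5:00 a.m.
# is released at 5:20 a.m., and the bus authority must decide by 6:00 a.m.
report_describes = datetime(2024, 1, 15, 5, 0)
report_released = datetime(2024, 1, 15, 5, 20)
decision_deadline = datetime(2024, 1, 15, 6, 0)

delay = timeliness_delay(report_describes, report_released)
usable_for_decision = report_released <= decision_deadline

print(delay)                # 0:20:00
print(usable_for_decision)  # True
```

The shorter the delay, and the earlier the release falls before the decision that depends on it, the more timely the data are for that use.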
Information that people can't understand, or can easily misunderstand, has no value and could even be misleading. To avoid such misunderstandings, supplementary information or documentation, called metadata, should always be provided with any data set, as it allows users to interpret the data properly.
Coherence can be split into two concepts: consistency and commonality. Consistency means using the same concepts, definitions and methods over time. Commonality means using the same or similar concepts, definitions and methods across different statistical programs. If there's good consistency and good commonality, then it is easier to compare results from different studies or track how they stayed the same or changed over time. With regard to data quality, coherence is the ability to make comparisons across cities, regions, time periods, etc.
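One way to picture consistency and commonality is to compare the metadata that describe two data sets. The sketch below is only an illustration: the function name, the field names (`concept`, `definition`, `method`) and the sample records are hypothetical, not taken from any actual statistical program.

```python
def coherence_differences(meta_a: dict, meta_b: dict,
                          fields=("concept", "definition", "method")) -> list:
    """List the metadata fields on which two data sets disagree."""
    return [field for field in fields if meta_a.get(field) != meta_b.get(field)]


# Hypothetical metadata for the same indicator published in two different years.
survey_earlier = {
    "concept": "employment rate",
    "definition": "employed / population aged 15 and over",
    "method": "household survey",
}
survey_later = {
    "concept": "employment rate",
    "definition": "employed / population aged 15 to 64",
    "method": "household survey",
}

differences = coherence_differences(survey_earlier, survey_later)
print(differences)  # ['definition']
```

An empty list would suggest the two data sets can be compared directly. Any field that is listed is a reason to check the documentation before comparing results.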
The final dimension of quality is Accessibility, which means that people are aware of and have access to the data. When determining whether data are accessible, make sure they are:
Organized: a system or a catalog is in place to allow users to locate all available data.
Available: once the location of a data source has been determined, a consistent means of accessing these data must also be provided.
Accountable: a data producer must be accountable for assisting users experiencing difficulty or dissatisfaction with any aspect of data access.
Affordable: what good are the most reliable data if you can't afford to use them?
Applying the dimensions of quality
Imagine that you own a pizza shop and you are considering expanding your business by opening a second location in the Toronto area. What kind of data could help you make your decision, and where might you find such information?
(Text on screen: The types of questions that could help you expand in Toronto are: What kind of data could help you to decide whether to open a second location? Where might you find such information at a relatively low cost? How could you ensure that the data are accurate, timely, interpretable and coherent?)
Opening a second location in Toronto would require social and economic data about the city, including neighborhood profiles; business expansion and location assistance; employee data and household spending habits; grants, incentives and rebates; festivals, events, parks and beaches; and municipal development plans.
Being able to access reliable data helps inform your decision of whether to open a second location and to assess its potential growth over time. Ideally this information would be well organized and readily available at little to no cost from reliable open data sources such as the federal government's open data site, the City of Toronto's open data portal, the Ontario Ministry of Finance and newspapers.
What makes these sites so accessible? They have many features, including open by default, menu driven, an apps gallery, open government licenses, an open data inventory, an application programming interface (API) and content in both official languages (federal and provincial sites).
(Text on screen: Access to the aforementioned portals: the federal government's Open Data site: https://open.canada.ca/; the Ontario Ministry of Finance: https://www.fin.gov.ca/; the City of Toronto's Open Data Portal: https://www.toronto.ca/)
Accurate data allow you to make precise calculations about your expected costs and earnings, as well as about the potential success of any new restaurant operation. The success of your new restaurant operation will depend on you preparing accurate financial projections based on solid research, planning and good quality data.
(Table titled: Historical and projected population by census division, selected years - reference scenario)
Data tend to be of greater value when they are released at a consistent, favorable or useful time. The release of projected population data by region gives restaurateurs a sense of which areas are likely to experience population growth.
There are several ways in which these open data sites make it easier to understand and interpret their data. They apply a structured, standardized format or user-friendly interfaces. They provide the user with a consistent way to access, view, and understand the data. They incorporate a variety of data into a single visualization tool to make them easy to interpret.
Documentation and supplementary information are readily available to help provide context around the datasets. Notes, footnotes, and sources appear within the table. The site makes use of data visualization tools, tables, infographics and charts, which make it easier to interpret the data.
(Image of the Socioeconomic highlights from the 2016 census for the Scarborough centre (Toronto Ward 21))
Comparative measures of employment rates, income levels, and education are important indicators of economic outlook and the potential success of any new restaurant operation. The City of Toronto's open data portal has predefined views with built-in coherence analysis. Each view allows the user to compare the data for each ward with those for the entire city, as well as those for other wards, in a single visualization tool.
Summary of key points
Data can be a very powerful decision-making tool, but when used improperly they can be misleading. By applying the six dimensions of quality, you can choose a high-quality data source that's right for your needs.
An acceptable level of quality can be achieved by ensuring that there is a good balance among all six dimensions: relevance, accuracy, timeliness, interpretability, coherence and accessibility.
(The Canada Wordmark appears.)