FAIR data principles: What is FAIR?

Catalogue number: 892000062022002

Release date: May 24, 2022

This video will break down what it means to be FAIR in terms of data and metadata, and how each pillar of FAIR serves to guide data users and producers alike, as they navigate their way through the data journey, in order to gain maximum, long-term value.

Data journey step
Data competency
  • Data stewardship
  • Metadata creation and use
Suggested prerequisites

Watch the video

FAIR data principles: What is FAIR? - Transcript

(The Statistics Canada symbol and Canada wordmark appear on screen with the title: “FAIR data principles: What is FAIR?”)

(Text on screen: FAIR data principles: What is FAIR? Delivering insight through data for a better Canada)

This video will break down what it means to be FAIR in terms of data and metadata, and how each pillar of FAIR serves to guide users and producers alike, as they navigate their way through the data journey, in order to gain maximum, long-term value.

In this video, you will learn the answers to the following questions:

  • What are FAIR data principles?
  • Why are FAIR data important?

This diagram is a visual representation of the data journey, from collecting the data, to exploring, cleaning, describing and understanding the data, to analyzing the data and, lastly, to communicating to others the story the data tell.

FAIR data principles are relevant throughout every step of the data journey.

FAIR data means data that are Findable, meaning that unique identifiers and metadata are used to help locate data quickly and efficiently.

It also means the data are Accessible, that they are available with the appropriate permissions and that metadata are freely available and can be accessed in a standardized way.

FAIR data are also Interoperable, in that by using standards, machine-readable data are exchanged and yield outputs for use in a readable and useful format.

All of this ensures the data are Reusable, meaning that metadata exist to describe the source, origin and destination of data and their usages in a standardized way, enabling the meaningful reuse of data over time and across disciplines. Let’s break that down a little…

The ultimate goal of FAIR is to use these principles as a set of guidelines for anyone wishing to enhance the reusability of their data. This is done by ensuring the data are Findable, Accessible, Interoperable and Reusable.

Data and metadata that include unique identifiers help us search data catalogues to find information. For example, something as simple as “current weather in Whitehorse”, when typed into an internet search engine will yield multiple URLs. These URLs, or webpage links, are each made up of a string of unique identifiers which have been registered in the search engine’s data catalogue. And as a result, when clicked, these URLs will bring you to where you need to be in order to find the information you are looking for.
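As a minimal sketch of this idea, assume a hypothetical in-memory catalogue. The identifiers, titles and keywords below are invented for illustration and are not real catalogue records. A unique identifier gives an exact lookup, while descriptive metadata supports a keyword search:

```python
# Hypothetical catalogue: every identifier and record here is invented for illustration.
CATALOGUE = {
    "example:weather-whitehorse-current": {
        "title": "Current weather in Whitehorse",
        "keywords": {"weather", "whitehorse", "current"},
    },
    "example:weather-whitehorse-historical": {
        "title": "Historical weather in Whitehorse",
        "keywords": {"weather", "whitehorse", "historical"},
    },
    "example:weather-ottawa-current": {
        "title": "Current weather in Ottawa",
        "keywords": {"weather", "ottawa", "current"},
    },
}

# Small words that carry no search meaning in this toy example.
STOP_WORDS = {"in", "the", "of"}


def find_by_identifier(identifier):
    """Return the metadata record for a unique identifier, or None if unknown."""
    return CATALOGUE.get(identifier)


def search(query):
    """Return the identifiers whose keywords contain every meaningful query word."""
    words = set(query.lower().split()) - STOP_WORDS
    return sorted(
        identifier
        for identifier, record in CATALOGUE.items()
        if words <= record["keywords"]
    )
```

Here, search("current weather in Whitehorse") matches only the record whose metadata contain all three keywords, and the identifier it returns can then be passed to find_by_identifier to retrieve that record.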

Once you have found your desired data through that unique identifier, in this case, the URL that offers to show you the weather in Whitehorse, you need to access them. Sometimes sources are freely available and sometimes, when you click on a link, you might be asked for the appropriate permissions, such as a user name and/or password. In the event you do not have the appropriate permissions, information or metadata should be freely available to explain to you what the data contain and how data might be accessed.
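The split between open metadata and permissioned data could be sketched as follows. This is an illustration only: the dataset entry, its field names and the single made-up observation are assumptions, not a real access system.

```python
# Hypothetical dataset entry; the field names and the observation are made up.
DATASET = {
    "metadata": {
        "title": "Historical weather in Whitehorse",
        "description": "Daily temperature observations.",
        "how_to_access": "Request an account from the data provider.",
    },
    "restricted": True,
    "data": [{"date": "2022-01-01", "temp_c": -20.0}],
}


def get_metadata(dataset):
    """Metadata are returned to anyone, with no permission check."""
    return dataset["metadata"]


def get_data(dataset, user_is_authorized):
    """Data are returned only when the user holds the appropriate permissions."""
    if dataset["restricted"] and not user_is_authorized:
        # The error message tells the user how access might be obtained.
        raise PermissionError(dataset["metadata"]["how_to_access"])
    return dataset["data"]
```

Even when get_data refuses a request, get_metadata still explains what the data contain and how they might be accessed.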

After you have access to the data, in this case, the current weather in Whitehorse, you might be interested to see if today’s weather is on par with previous years, or if it is currently colder or warmer than average. For that, you might want access to a file that possesses historical data. The way in which that file, located at point A, is formatted must be understood and readable in order to be used by point B, your personal computer. This requires the exchange and interpretation of machine-readable information.

Machine-readable information includes the use of standardized:

  • Vocabularies, to provide a consistent way of describing data such as geographic names or numerical codes
  • Formats and applications including HTML, .CSV, JSON and others
  • APIs (Application Programming Interfaces), which allow one piece of software to freely and openly communicate with another
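To illustrate why standardized formats matter, the sketch below reads the same records from CSV and from JSON using only Python's standard library. The column names and temperature values are made up for the example. Because both formats are standardized, either one can be turned into the same structure at point B:

```python
import csv
import io
import json

# Made-up sample values, used only to show the two formats side by side.
CSV_TEXT = "city,date,temp_c\nWhitehorse,2022-01-01,-20.5\nWhitehorse,2022-01-02,-18.0\n"
JSON_TEXT = (
    '[{"city": "Whitehorse", "date": "2022-01-01", "temp_c": -20.5},'
    ' {"city": "Whitehorse", "date": "2022-01-02", "temp_c": -18.0}]'
)


def read_csv(text):
    """Parse CSV text into a list of records with numeric temperatures."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"city": r["city"], "date": r["date"], "temp_c": float(r["temp_c"])}
        for r in rows
    ]


def read_json(text):
    """Parse JSON text into the same list-of-records structure."""
    return json.loads(text)
```

Note that CSV carries every value as text, so the reader has to know that temp_c is a number. This is the kind of detail that a shared vocabulary or accompanying metadata would normally state.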

In order to feel comfortable reusing data, you need to know the origins of the data or where they came from, where they have been, and how they have been used in the past. This is called Provenance. Provenance is information about the source of the data (there could be more than one) relative to where you are within a particular process. For example, if you are tasked with one step in the process, then provenance could be the list of all the people or machines that handled or manipulated the data before you. Lineage would then list all the transformations that occurred throughout those processes, like which records have been changed and how, which variables have been renamed, etc. Together, provenance and lineage help us understand how the data came to be in their current form.

Metadata containing rich descriptions of provenance and lineage help to encourage:

  • Understanding where data have come from and what methodologies have been employed to produce them
  • Understanding the quality of the final product or the pedigree of its sources by detailing its relevance, completeness, accuracy, reputation and integrity

Together, provenance and lineage provide the complete traceability of where data have resided and what processes have been performed on them over the course of their life, making them easier and safer to reuse.
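One way to picture this traceability is a small record that travels with the data. The sketch below is an assumption made for illustration: the class, its field names and the helper function are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetHistory:
    """Hypothetical history record; the field names are assumptions, not a standard."""

    sources: list                                  # provenance: where the data came from
    handlers: list = field(default_factory=list)   # provenance: who or what handled the data
    lineage: list = field(default_factory=list)    # lineage: transformations, in order

    def record_step(self, handler, transformation):
        """Log one processing step: who performed it and what changed."""
        self.handlers.append(handler)
        self.lineage.append(transformation)


def rename_variable(rows, old, new, history, handler):
    """Rename a variable in every row and log the change in the history."""
    renamed = [{(new if k == old else k): v for k, v in row.items()} for row in rows]
    history.record_step(handler, f"renamed variable '{old}' to '{new}'")
    return renamed
```

After a call such as rename_variable(rows, "temp", "temp_c", history, "cleaning script"), the history lists both the handler and the transformation, so a later user can see how the data reached their current form.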

So, back to our example of historical weather data for Whitehorse. First, you found the data, accessed them and then used them on your device of choice. Rich descriptions of the data that include information on how the data have been transformed and any data usage licensing now provide you with the needed information to combine these data with other data in order to reuse them based on your needs. Meaning, after accessing historical data for other cities, over a certain time frame, you can rank and compare Whitehorse to a set of other cities, in terms of being colder or warmer than average this year.
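The comparison described above can be sketched in a few lines. All temperatures below are made-up placeholders rather than real observations, and "City B" and "City C" stand in for whichever other cities you choose to include:

```python
# All temperatures are made-up placeholders, not real observations.
# Each entry maps a city to (this year's mean in °C, historical mean in °C).
OBSERVATIONS = {
    "Whitehorse": (-1.0, -0.5),
    "City B": (7.5, 6.5),
    "City C": (8.0, 7.5),
}


def anomalies(observations):
    """Return each city's difference from its historical average (positive means warmer)."""
    return {
        city: current - average
        for city, (current, average) in observations.items()
    }


def rank_by_anomaly(observations):
    """Rank cities from most warmer-than-average to most colder-than-average."""
    diffs = anomalies(observations)
    return sorted(diffs, key=diffs.get, reverse=True)
```

With these placeholder values, Whitehorse has a negative difference and so ranks last, meaning it is the only city in the set that is colder than its own average.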

Now that the video is almost over, time for a knowledge check! How much do you remember about FAIR data? I’ll read the question aloud. Then after, pause the video while you make your selection. APIs (Application Programming Interfaces), which allow one piece of software to freely and openly communicate data with another, are an example of which FAIR principle?

  1. Findable
  2. Accessible
  3. Interoperable
  4. Reusable

The correct answer is 3 – Interoperability. APIs are an example of interoperability in that they facilitate the exchange and interpretation of machine-readable information from point A to point B.

FAIR data principles ensure data are:

  • Findable
  • Accessible
  • Interoperable
  • Reusable

FAIR data principles are important because they can be used as a guideline for anyone wishing to enhance the reusability of their data or wishing to develop a new reusable data product.

(The Canada Wordmark appears.)
