What is Data? An Introduction to Data Terminology and Concepts - Transcript
(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "What is Data? An Introduction to Data Terminology and Concepts")
What's data? An introduction to data terminology and concepts
This video will introduce some basic terminology and concepts related to data.
The data terminology and concepts covered in this video are datasets, databases, data protection, data variables, micro and macro data, and statistical information. No previous knowledge is necessary.
Steps of a data journey
(Diagram of the Steps of the data journey: Step 1 - Find, gather, protect; Step 2 - explore, clean, describe; Step 3 - analyze, model; Step 4 - tell the story. The data journey is supported by a foundation of stewardship, metadata, standards and quality.)
The data journey represents the steps data goes through on its way to telling a story. We will try to answer the question "What is data?" by considering data at these different steps. First, let's consider what data is in the context of finding, gathering and protecting it.
What is data?
Data are facts or figures about an object or phenomenon. Objects simply exist and phenomena happen. People create data. We measure, count, observe, and describe the world around us. We record what we find using symbols and images. This is what data is.
Where does data come from?
Where do data come from? Data come from everywhere. For example, doctors gather data about our health and wellbeing, stores gather data about our purchases, surveys gather data about our habits, and scientists gather data about climate conditions such as temperature and wind speed. This last kind is sometimes referred to as Earth observation data. These are just a few examples. In the digital age, data is literally all around us.
How is data organized?
Data can be organized in structured formats such as tables, graphs, or maps. Data can also be unstructured, for example text and documents.
Custodians of data have a responsibility to be good stewards and protect the privacy, confidentiality and security of personal identifiable information.
Personal identifiable information includes any information that could directly or indirectly identify an individual person, business, or organization.
Step 2: Explore, clean and describe
(Diagram of the Steps of the data journey with an emphasis on Step 2 - explore, clean, describe.)
Now let's consider what data is at the next step in the data journey. Once we have data, we're curious to explore it. If we find errors in the data, we try to correct them. What does data look like for this to happen?
Datasets and databases
Data is often organized in a table with rows and columns. In electronic format, this is known as a dataset. A dataset that is organized for a particular purpose, for example hospital registrations, is called a database. There are software packages for managing databases, such as Oracle, SQL and Microsoft Access.
A relational database is an organized collection of datasets that relate to each other through key values.
For example, a relational database about the school system could have one dataset listing schools, another dataset of classes within schools, and another dataset of students within classes.
There's a way to connect up all the datasets in a relational database. In this example, an identification variable for the school could be on all three datasets so that you could find all the classes and all the students related to a particular school.
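As a rough illustration of the school example, here is a minimal Python sketch. The school names, class labels, student names and the field name school_id are all made up for this example. Plain lists stand in for the three datasets, whereas a real relational database would be managed by database software.

```python
# Hypothetical example data: three small datasets that share a school_id key.
schools = [
    {"school_id": 1, "name": "Maple Elementary"},
    {"school_id": 2, "name": "Cedar High"},
]
classes = [
    {"school_id": 1, "class_id": "1A"},
    {"school_id": 1, "class_id": "2A"},
    {"school_id": 2, "class_id": "9B"},
]
students = [
    {"school_id": 1, "class_id": "1A", "student": "Ana"},
    {"school_id": 1, "class_id": "2A", "student": "Ben"},
    {"school_id": 2, "class_id": "9B", "student": "Chloe"},
    {"school_id": 2, "class_id": "9B", "student": "Dev"},
]


def rows_for_school(dataset, school_id):
    """Return every row in a dataset whose key value matches the school."""
    return [row for row in dataset if row["school_id"] == school_id]


def school_summary(school_id):
    """Collect the classes and students related to one school."""
    return {
        "classes": [r["class_id"] for r in rows_for_school(classes, school_id)],
        "students": [r["student"] for r in rows_for_school(students, school_id)],
    }


print(school_summary(1))
```

Because the same identification variable appears on all three datasets, one key value is enough to pull together everything related to a particular school.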
What's inside a dataset?
The actual data inside a dataset or database are arranged in variables. Some of the variables represent the measurements, counts, observations, or descriptions that we talked about earlier. Other variables serve to identify to whom or what those measurements, counts, observations, or descriptions pertain. Data where one record or row represents one unit of observation is called microdata. It's highly recommended to explore and clean microdata before doing any sort of analysis on it or using it for any other purpose.
To do this, basic statistical methods are applied to microdata variables. For more information, see the videos on central tendency and dispersion.
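To give a feel for what exploring a microdata variable can involve, here is a small Python sketch using the standard statistics module. The records and the variable name age are invented for illustration. Dropping missing values is just one simple cleaning choice, not a recommended rule.

```python
import statistics

# Hypothetical microdata: one record per person; one age is missing.
records = [
    {"person_id": 1, "age": 34},
    {"person_id": 2, "age": 29},
    {"person_id": 3, "age": None},  # missing value found while exploring
    {"person_id": 4, "age": 41},
    {"person_id": 5, "age": 29},
    {"person_id": 6, "age": 52},
]


def describe_variable(rows, variable):
    """Summarize one variable after setting aside missing values."""
    values = [row[variable] for row in rows if row[variable] is not None]
    return {
        "n_missing": len(rows) - len(values),
        "mean": statistics.mean(values),      # central tendency
        "median": statistics.median(values),  # central tendency
        "stdev": statistics.stdev(values),    # dispersion (sample)
        "min": min(values),
        "max": max(values),
    }


summary = describe_variable(records, "age")
print(summary)
```

A summary like this makes it easier to spot missing values or implausible minimums and maximums before any analysis is done.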
Step 3: Analyze and model
(Diagram of the Steps of the data journey with an emphasis on Step 3 - analyze, model.)
To discover relationships between variables or to look for trends through time, we do analysis on cleaned microdata. Other terms for doing analysis are modeling, deriving inference and data analytics. To learn more about data analysis, see the Analysis 101 video series.
Different states of data
Here's a handy way to summarize the different states that data can be in. Microdata refers to a dataset where one record represents one unit of observation. Microdata are the basic building block whether you're using data to provide services, enforce regulations, answer research questions, or build policy. Macro data refers to a dataset where records have been rolled up or aggregated together.
Statistical analysis or data analytics can be performed on cleaned microdata or macro data. Metadata is the documentation or information that provides context. It makes it easier to use the data appropriately.
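The roll-up from microdata to macro data can be sketched in a few lines of Python. The records, the region labels and the income values are made up for this example. Grouping by region is only one of many possible ways to aggregate.

```python
from collections import defaultdict

# Hypothetical microdata: one record per unit of observation (a person).
microdata = [
    {"person_id": 1, "region": "East", "income": 40000},
    {"person_id": 2, "region": "East", "income": 60000},
    {"person_id": 3, "region": "West", "income": 55000},
]


def aggregate_by_region(records):
    """Roll person-level records up into one macro data record per region."""
    groups = defaultdict(lambda: {"people": 0, "total_income": 0})
    for record in records:
        group = groups[record["region"]]
        group["people"] += 1
        group["total_income"] += record["income"]
    return [
        {
            "region": region,
            "people": group["people"],
            "average_income": group["total_income"] / group["people"],
        }
        for region, group in sorted(groups.items())
    ]


macro_data = aggregate_by_region(microdata)
print(macro_data)
```

In the macro data, individual people are no longer visible. Each record describes a group.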
Step 4: Tell the story
(Diagram of the Steps of the data journey with an emphasis on Step 4 - Tell the story.)
Applying statistical analysis or data analytics to data is a way to produce statistical information. The last step in the data journey is to tell the story that emerges from statistical information.
Statistical information looks quite different from the original data on which it is based. It has been synthesized and transformed to reveal meaning or intelligence that is difficult to discern in the microdata. The statistical information that comes from analysis and modeling is easier for people to understand if it's presented in some sort of story. The story could be told through a research paper, an infographic, a media article, a data visualization, or some combination of these and other data presentation methods.
Recap of key points
Data are facts or figures about an object or phenomenon. Data variables are stored in a dataset. Custodians have a responsibility to protect the privacy, confidentiality and security of identifiable data. Clean micro or macro data is analyzed using statistical analysis or data analytics to produce statistical information. The data story is told from the statistical information.
To learn more about data, check out the videos called "The data journey", "Types of data" and "Gather data".
(The Canada Wordmark appears.)