"A variable is a characteristic of a statistical unit being observed that may assume more than one of a set of values to which a numerical measure or a category from a classification can be assigned."
In the above definition, the key components are:
- statistical unit being observed,
- numerical measure, and
- category from a classification.
These are the standard components used in this information package to name and structure variables. A statistical organization publishing data has to adopt a standard way to name and structure the variables to which the data relate. Users must be able to recognize the same structure underlying variable names, whichever sub-division of the organization produces the data and whatever subject area is being studied. For managing the information about the data (the metadata) published by the organization, a standard naming convention and structure are needed so that the metadata can be stored efficiently in a central database, extracted quickly, and searched easily by users.
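To illustrate why a shared structure helps with storage and search, here is a minimal sketch of a central metadata store that indexes every variable by its standard components. The record layout, the class names `VariableMetadata` and `MetadataRegistry`, and the "<characteristic> of <statistical unit>" naming pattern are all hypothetical, chosen for illustration only; they are not the organization's actual database design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMetadata:
    """One variable described by its standard components (hypothetical record layout)."""
    statistical_unit: str                  # what is observed, e.g. "Households"
    characteristic: str                    # what is measured, e.g. "Total income"
    classifications: tuple[str, ...] = ()  # classifications used to break down the data

    @property
    def name(self) -> str:
        # Hypothetical convention: "<characteristic> of <statistical unit>"
        return f"{self.characteristic} of {self.statistical_unit}"


class MetadataRegistry:
    """Hypothetical central store that indexes every variable by its components."""

    def __init__(self) -> None:
        self._by_unit: dict[str, list[VariableMetadata]] = {}
        self._by_characteristic: dict[str, list[VariableMetadata]] = {}

    def register(self, var: VariableMetadata) -> None:
        # Every variable is stored under the same components, so one index per
        # component answers searches regardless of the subject area.
        self._by_unit.setdefault(var.statistical_unit.lower(), []).append(var)
        self._by_characteristic.setdefault(var.characteristic.lower(), []).append(var)

    def find_by_unit(self, unit: str) -> list[VariableMetadata]:
        return list(self._by_unit.get(unit.lower(), []))

    def find_by_characteristic(self, characteristic: str) -> list[VariableMetadata]:
        return list(self._by_characteristic.get(characteristic.lower(), []))


registry = MetadataRegistry()
registry.register(VariableMetadata("Households", "Total income", ("Income source",)))
registry.register(VariableMetadata("Persons", "Total income"))
registry.register(VariableMetadata("Enterprises", "Revenue", ("NAICS",)))

print([v.name for v in registry.find_by_characteristic("total income")])
# ['Total income of Households', 'Total income of Persons']
```

Because names are derived from the components rather than typed freely, two sub-divisions describing the same characteristic of the same unit end up with the same name. A search by either component finds both.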
The naming convention and structure referred to above are adapted from the standard of the International Organization for Standardization (ISO), ISO/IEC 11179, Information technology - Metadata registries (MDR). This standard is being adopted by an increasing number of national statistical organizations.
How the structure is applied
When it is decided that a statistical program will produce data to illuminate a certain subject area, the responsible analysts have to determine:
- which statistical unit(s) will be observed, e.g. persons or households, etc. in the case of a social statistics program, or business establishments or enterprises in the case of a business statistics program; then,
- which characteristics of these statistical units will be measured, e.g. revenue or expenses, etc. Sometimes the characteristic is simply the occurrence of the statistical unit (e.g. a count of persons, in which case the characteristic measured is the unit's existence); then
- most often, the statistical program will produce data not only for the total of the units being observed and for the overall characteristic being considered, but also for sub-categories of the statistical unit and of the characteristic. For example, in the case of household income, data are produced for different categories of income, e.g. wages, pensions, etc., as well as for different categories of households, e.g. households with one income earner, with two income earners, etc. Statistical organizations call these categories 'classes within classifications'. For coherence of the data published by the various sub-divisions of a statistical organization, and even by different statistical organizations, standard classifications are created; these usually comprise the most frequently used categorizations of characteristics and observation units. For example, the three North American countries have developed the North American Industry Classification System (NAICS) so that they publish data for the same industry sub-categories of statistical units. Finally, the analysts have to decide
- which unit of measure will be used to express the numerical values, e.g. in the case of income, it could be current Canadian dollars, constant 1997 dollars, etc.
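The decisions listed above can be gathered into a single structure: the statistical unit, the characteristic, optional classes for each, and the unit of measure. The following is a minimal sketch of that structure, assuming a hypothetical naming pattern "<characteristic> (<class>) of <unit> (<class>), in <unit of measure>". The class names and the pattern are illustrative only and are not prescribed by ISO/IEC 11179.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassificationClass:
    """A class within a classification, e.g. 'Wages' within 'Income source'."""
    classification: str
    class_name: str


@dataclass(frozen=True)
class Variable:
    """A variable assembled from the analysts' decisions (hypothetical layout)."""
    statistical_unit: str   # which unit is observed
    characteristic: str     # which characteristic is measured
    unit_of_measure: str    # how the numerical values are expressed
    characteristic_class: Optional[ClassificationClass] = None  # sub-category of the characteristic
    unit_class: Optional[ClassificationClass] = None            # sub-category of the statistical unit

    def name(self) -> str:
        # Hypothetical pattern:
        # "<characteristic> (<class>) of <unit> (<class>), in <unit of measure>"
        characteristic = self.characteristic
        if self.characteristic_class is not None:
            characteristic += f" ({self.characteristic_class.class_name})"
        unit = self.statistical_unit
        if self.unit_class is not None:
            unit += f" ({self.unit_class.class_name})"
        return f"{characteristic} of {unit}, in {self.unit_of_measure}"


wages_two_earners = Variable(
    statistical_unit="Households",
    characteristic="Income",
    unit_of_measure="current Canadian dollars",
    characteristic_class=ClassificationClass("Income source", "Wages"),
    unit_class=ClassificationClass("Number of income earners", "Two income earners"),
)
print(wages_two_earners.name())
# Income (Wages) of Households (Two income earners), in current Canadian dollars
```

Keeping each class tied to its classification makes the class easy to trace. For example, the label "Wages" always records that it comes from the "Income source" classification, which is what allows different programs to reuse the same standard classifications.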
How to read time series statistical tables using the ISO components
Imagine a time series table for Canada where the column headings are the reference periods and the row headings contain the name of the general characteristic being measured for the statistical unit being observed, e.g. "Total income of all Households". The variable documentation you are consulting defines the characteristic being measured and the statistical unit being observed. The cells along each row contain the numerical values, expressed in the unit of measure indicated in the variable documentation.
In most cases, the data in the table will be broken down by geographic areas within Canada, e.g. provinces and territories, or Census Metropolitan Areas, etc. The variable documentation informs users of this geographic breakdown. In most cases, the value of the general characteristic will also be broken down by sub-categories of the characteristic and/or of the statistical unit, in other words by classes within classifications, e.g. classes of income sources or classes of industries. The variable documentation always informs users of the classes of the specific classification(s) used to detail the data in the table. The names of these classes and groups of classes appear as row headings in the table.
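Reading such a table amounts to locating a row by its geography and class, then a column by its reference period. The following is a minimal sketch of that lookup, assuming a hypothetical in-memory `TimeSeriesTable` class. The numbers in the example are dummy placeholders, not real statistics.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RowHeading:
    geography: str   # e.g. a province or territory
    class_name: str  # class within the classification, or "Total"


@dataclass
class TimeSeriesTable:
    """Hypothetical view of one variable's table: row headings are
    (geography, class) pairs, column headings are reference periods."""
    variable_name: str
    unit_of_measure: str
    periods: list[str]
    rows: dict[RowHeading, list[float]] = field(default_factory=dict)

    def add_row(self, geography: str, class_name: str, values: list[float]) -> None:
        if len(values) != len(self.periods):
            raise ValueError("expected one value per reference period")
        self.rows[RowHeading(geography, class_name)] = list(values)

    def cell(self, geography: str, class_name: str, period: str) -> float:
        # Find the row by its headings, then the column by its reference period.
        return self.rows[RowHeading(geography, class_name)][self.periods.index(period)]

    def classes(self, geography: str) -> list[str]:
        # Row headings (classes) available for one geographic area, in insertion order.
        return [h.class_name for h in self.rows if h.geography == geography]


# Dummy placeholder values, not real statistics.
table = TimeSeriesTable("Total income of all Households", "current Canadian dollars", ["2000", "2001"])
table.add_row("Ontario", "Total", [10.0, 11.0])
table.add_row("Ontario", "Wages", [6.0, 7.0])
table.add_row("Ontario", "Pensions", [4.0, 4.0])
print(table.cell("Ontario", "Wages", "2001"))  # 7.0
print(table.classes("Ontario"))                # ['Total', 'Wages', 'Pensions']
```

The table object keeps the variable name and unit of measure, mirroring the point that a cell value only makes sense together with the variable documentation.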