Minister's message

The Honourable Navdeep Bains
Minister of Innovation, Science and Industry

It is our pleasure to present the 2019–20 Departmental Results Report for Statistics Canada. As the agency continues to mobilize industry and the research community to confront the COVID-19 pandemic, the various organizations in the Innovation, Science and Economic Development (ISED) portfolio have coordinated their efforts to position Canada as a global innovation leader and shape an inclusive economy for all Canadians.

As part of the ISED portfolio, Statistics Canada continued to deliver high-quality, timely and accessible data while protecting the privacy and confidentiality of Canadians' information. It also advanced its modernization agenda to keep pace with our data-driven economy and society. In 2019–20, Statistics Canada worked with experts from around the world to strengthen its strict privacy and confidentiality measures, while providing Canadians with the data and insights they need to make informed decisions.

We are pleased to see Statistics Canada's innovative approaches to providing more relevant and timely data, while reducing the response burden on Canadians, to help people make evidence-based decisions that drive Canada's economy and society forward.

Statistics Canada continued to engage and collaborate with its partners from a wide range of sectors both at home and abroad. The agency worked with partners from all three levels of government, Indigenous groups, industry leaders and more to raise awareness of its modernization agenda and to better understand and deliver on information needs.

These are just a few examples of Statistics Canada's work on behalf of Canadians. We invite you to read this report to learn more about how Statistics Canada is collaborating with its partners, disseminating more disaggregated data, modernizing its programs with leading-edge methods and technologies, and supporting government priorities to deliver insight through data for a better Canada.

 

Current and Historical Countries and Areas of Interest 2019

The material on current and historical countries and areas of interest is presented here to help users who compile longitudinal data on countries and areas of interest assign those data to the correct current or formerly used country names and codes. A start date and an end date define the period of validity of each country name and code; current names and codes have no end date. This list, which compiles changes to country names and codes since 1970, is based on information gathered from the following sources:

  • current and previous editions of the standard ISO 3166-1, Codes for the representation of names of countries and their subdivisions – Part 1: Country codes;
  • ISO 3166-3, Codes for the representation of names of countries and their subdivisions – Part 3: Code for formerly used names of countries;
  • newsletters related to ISO 3166-1 and ISO 3166-3; and
  • current and previous editions of the United Nations Standard Country or Area Codes for Statistical Use.
Current and Historical Countries and Areas of Interest - CSV Version (CSV, 22.66 KB)
Current and Historical Countries and Areas of Interest (first 10 of 364 entries)

Code   Country or area of interest   Start  End    Remarks
11124  Canada                        1970
11304  Greenland                     1970
11666  Saint Pierre and Miquelon     1970
11840  United States                 1970   2019   Renamed as United States of America.
11840  United States of America      2019          Formerly known as United States.
12084  Belize                        1973          Formerly known as British Honduras.
12084  British Honduras              1970   1973   Now known as Belize.
12188  Costa Rica                    1970
12222  El Salvador                   1970
12320  Guatemala                     1970
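The start and end years in the table above are what make longitudinal assignment possible: for a given code and reference year, exactly one name should be valid. The following is a minimal sketch using four rows copied from the table. The names `Entry` and `name_in_year` are hypothetical, and it assumes the end year is exclusive, so that in a year of change (for example 2019 for code 11840) the new name applies; the table itself gives only years, so users should choose the boundary rule that suits their data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    code: str
    name: str
    start: int
    end: Optional[int]  # None when the name is still current


# Four rows copied from the table above.
ENTRIES = [
    Entry("11840", "United States", 1970, 2019),
    Entry("11840", "United States of America", 2019, None),
    Entry("12084", "British Honduras", 1970, 1973),
    Entry("12084", "Belize", 1973, None),
]


def name_in_year(code: str, year: int, entries=ENTRIES) -> Optional[str]:
    """Return the name valid for `code` in `year`, or None if no entry covers it.

    Assumption: the end year is exclusive, so a renamed entry takes over
    in the year the change happened.
    """
    for entry in entries:
        if entry.code == code and entry.start <= year and (entry.end is None or year < entry.end):
            return entry.name
    return None


print(name_in_year("12084", 1972))  # British Honduras
print(name_in_year("11840", 2019))  # United States of America
```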
Footnote 1

The official name of Bolivia is Plurinational State of Bolivia.

Footnote 2

China excludes Hong Kong and Macao.

Footnote 3

The full name of Hong Kong is the Hong Kong Special Administrative Region of China.

Footnote 4

The official name of Iran is Islamic Republic of Iran.

Footnote 5

Ireland is also referred to as the Republic of Ireland.

Footnote 6

The official name of North Korea is Democratic People's Republic of Korea.

Footnote 7

The official name of South Korea is Republic of Korea.

Footnote 8

The official name of Kosovo is Republic of Kosovo.

Footnote 9

The official name of Laos is Lao People's Democratic Republic.

Footnote 10

The full name of Macao is the Macao Special Administrative Region of China.

Footnote 11

The official name of Moldova is Republic of Moldova.

Footnote 12

Serbia excludes Kosovo.

Footnote 13

The full name of Sudan is the Republic of the Sudan.

Footnote 14

The official name of Syria is Syrian Arab Republic.

Footnote 15

The official name of Tanzania is United Republic of Tanzania.

Footnote 16

The official name of the United Kingdom is the United Kingdom of Great Britain and Northern Ireland. The United Kingdom includes Scotland, Wales, England and Northern Ireland (it excludes the Isle of Man, the Channel Islands and the British Overseas Territories).

Footnote 17

The official name of Venezuela is Bolivarian Republic of Venezuela.

Footnote 18

West Bank and Gaza Strip are the territories referred to in the Declaration of Principles, signed by Israel and the Palestine Liberation Organization in 1993. This entry also includes responses of "Palestine".

Differences between SCCAI 2019 and ISO 3166-1:2013

Differences between SCCAI 2019 and ISO 3166-1:2013 - CSV Version (CSV, 1.11 KB)
Differences between SCCAI 2019 and ISO 3166-1:2013

SCCAI 2019                       ISO 3166-1:2013
Bolivia                          Bolivia, Plurinational State of
Congo, Republic of the           Congo
Holy See (Vatican City State)    Holy See
Iran                             Iran, Islamic Republic of
Korea, North                     Korea, Democratic People's Republic of
Korea, South                     Korea, Republic of
Kosovo (Footnote 1)              (not listed)
Laos                             Lao People's Democratic Republic
Moldova                          Moldova, Republic of
Sark (Footnote 2)                (not listed)
South Africa, Republic of        South Africa
Syria                            Syrian Arab Republic
Taiwan                           Taiwan, Province of China
Tanzania                         Tanzania, United Republic of
United Kingdom                   United Kingdom of Great Britain and Northern Ireland
Venezuela                        Venezuela, Bolivarian Republic of
Virgin Islands, United States    Virgin Islands, U.S.
West Bank and Gaza Strip         State of Palestine
Footnote 1

Kosovo was recognized as a country by Canada in 2008. Kosovo is not included in the current version of ISO 3166-1 but has been included in the SCCAI since 2009.

Footnote 2

Sark is an area of interest listed by the United Nations Statistics Division (Source: Standard Country or Area Codes for Statistical Use [accessed January 15, 2020]).

Standard Classification of Countries and Areas of Interest (SCCAI) 2019 - Introduction

The Standard Classification of Countries and Areas of Interest (SCCAI) 2019 was developed to increase coherence of the list of countries used within Statistics Canada and to be more consistent with Government of Canada norms. This list of countries and areas includes those for which statistical data are compiled. To satisfy the broadest possible range of applications, all entities in the list are mutually exclusive. For instance, China, Hong Kong and Macao are considered as separate entities for the purpose of this classification. This list of countries and areas of interest forms the base level of the classification and applies to both economic and social statistics.

There are 251 countries or areas in the SCCAI 2019, including the 249 countries or areas found in the international standard ISO 3166-1:2013 (Footnote 1). The two additional entries in the SCCAI that are not in the ISO list are Kosovo, which was recognized as a country by Canada in 2008, and Sark, which was recognized as an area by the United Nations in 2011. The names of countries or areas are their short forms and not necessarily their full names. They are based on the short names used in the ISO standard and were modified both to reflect Canadian norms and to follow specific naming rules adopted for the Canadian list. The modifications to reflect Canadian norms were based on the Global Affairs Canada website and on responses to the place of birth question in the 2016 Census of Population. The specific naming rules adopted for the Canadian list are:

  1. Use of short form of country names wherever practicable and/or to avoid confusion;
  2. Use of commas for sorting in alphabetical order; and
  3. Use of long form of country names to avoid confusion.

These changes to the names have resulted in differences between SCCAI 2019 and ISO 3166-1:2013.

The SCCAI provides a list of the names of countries or areas of interest in order of their corresponding five-digit SCCAI code. The first two digits in the SCCAI code correspond to the hierarchical structure in the Countries and Areas of Interest for Social Statistics - SCCAI 2019, while the last three digits represent the United Nations numeric codes (NUM-3) for countries or areas. Also included are internationally used three-digit numerical codes, two-character alpha codes and three-character alpha codes.

In addition, historical revisions of countries and areas of interest are available to users who compile longitudinal data on countries and areas of interest.

Code description

Code: Five-digit numerical code defined in the variant for social statistics, Countries and Areas of Interest for Social Statistics (SCCAI).

Num-3: Three-digit numerical code defined by the United Nations.

Alpha-2: Two-character alpha code defined by the International Organization for Standardization (ISO).

Alpha-3: Three-character alpha code defined by the International Organization for Standardization (ISO).
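The four codes described above can be held together in one record and checked against each other, since the last three digits of the five-digit SCCAI code are the United Nations Num-3 code. The sketch below is illustrative only: the class name `CountryCodes` and its field names are hypothetical and are not part of the published CSV layout. It is shown with the codes for Canada (SCCAI 11124, Num-3 124, Alpha-2 CA, Alpha-3 CAN).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryCodes:
    """Hypothetical record holding the codes listed for one SCCAI entry."""

    sccai: str   # five-digit SCCAI code, e.g. "11124"
    num3: str    # three-digit United Nations numeric code, e.g. "124"
    alpha2: str  # two-character ISO alpha code, e.g. "CA"
    alpha3: str  # three-character ISO alpha code, e.g. "CAN"

    def __post_init__(self) -> None:
        if len(self.sccai) != 5 or not self.sccai.isdigit():
            raise ValueError(f"SCCAI code must be five digits: {self.sccai!r}")
        if len(self.num3) != 3 or not self.num3.isdigit():
            raise ValueError(f"Num-3 must be three digits: {self.num3!r}")
        # The last three digits of the SCCAI code are the Num-3 code.
        if self.sccai[2:] != self.num3:
            raise ValueError(f"{self.sccai} does not end in Num-3 code {self.num3}")
        if len(self.alpha2) != 2 or not (self.alpha2.isalpha() and self.alpha2.isupper()):
            raise ValueError(f"Alpha-2 must be two capital letters: {self.alpha2!r}")
        if len(self.alpha3) != 3 or not (self.alpha3.isalpha() and self.alpha3.isupper()):
            raise ValueError(f"Alpha-3 must be three capital letters: {self.alpha3!r}")


canada = CountryCodes(sccai="11124", num3="124", alpha2="CA", alpha3="CAN")
```

Pairing the SCCAI code of one entry with the Num-3 code of another (for example 11124 with 840) raises a `ValueError`, which catches a common merge error early.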

Relation to previous version

This is a revision of the previous Standard Classification of Countries and Areas of Interest (SCCAI) 2018. This standard includes a classification variant, Countries and Areas of Interest for Social Statistics - SCCAI 2019, that provides standard groupings of countries and associated codes for publication purposes. Usage notes are included with the standard.

Conformity to relevant internationally recognized standards

This standard is compatible with the list of countries or areas included in the International Standard for country codes ISO 3166-1, except for the recognition of Kosovo and Sark. The coding structure follows the ISO coding structure, with countries having three-digit numeric codes, two-character alpha codes, and three-character alpha codes. New codes were assigned to Kosovo and Sark based on an ISO clause which places codes at the disposal of users who need to add further names of countries or areas of interest to their list.

The 249 countries and areas in the ISO 3166-1:2013 (Footnote 1) standard include 240 of the 241 countries or areas for which statistical data are compiled by the Statistics Division of the United Nations Secretariat (Footnote 2). The entity Channel Islands, which groups together Guernsey, Jersey and Sark, is an intermediate level that is not retained in ISO or the SCCAI.

Countries and Areas of Interest for Social Statistics - SCCAI 2019 - Introduction

The variant Countries and Areas of Interest for Social Statistics – SCCAI 2019 was developed to create groupings of countries to enable the production of integrated statistics when publishing social statistics data. These groupings are based on those used historically for Statistics Canada's Census of Population place of birth variable.

The variant Countries and Areas of Interest for Social Statistics – SCCAI 2019 has three levels: 6 geographical macro-regions, 19 geographical sub-regions and 251 countries or areas of interest. This variant is defined in terms of countries or areas of interest in the Standard Classification of Countries and Areas of Interest (SCCAI) 2019. The hierarchical structure of the classification shows the relationship between the three levels of the classification variant.

  • Level 1: Geographical macro-regions
  • Level 2: Geographical sub-regions
  • Level 3: Countries and areas of interest

This variant has a coding structure built on the United Nations numeric code (NUM-3) for countries or areas. This three-digit code is preceded by two numeric digits, where the first digit represents the macro-region and the second digit identifies the sub-region within the macro-region. Together, this five-digit code forms the SCCAI code.
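Because the macro-region and sub-region are encoded in the first two digits, counts tabulated at the country level can be rolled up to either higher level by taking a prefix of the code. A minimal sketch follows; the function names `split_sccai` and `roll_up` and the counts are hypothetical. Sub-regions are keyed on the first two digits rather than on the second digit alone, because the sub-region digit only has meaning within its macro-region.

```python
from collections import Counter


def split_sccai(code: str) -> tuple[str, str, str]:
    """Split a five-digit SCCAI code into (macro-region, sub-region, Num-3)."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"not a five-digit SCCAI code: {code!r}")
    return code[0], code[1], code[2:]


def roll_up(counts: dict[str, int], level: int) -> dict[str, int]:
    """Sum counts keyed by SCCAI code to level 1 (macro-region) or level 2 (sub-region).

    Level 1 keys are the first digit; level 2 keys are the first two digits.
    """
    if level not in (1, 2):
        raise ValueError("level must be 1 (macro-region) or 2 (sub-region)")
    totals: Counter = Counter()
    for code, count in counts.items():
        split_sccai(code)  # reject malformed codes before aggregating
        totals[code[:level]] += count
    return dict(totals)


# Hypothetical counts, e.g. respondents by place of birth.
counts = {"11124": 5, "11840": 3, "12084": 2}
print(split_sccai("12084"))  # ('1', '2', '084')
print(roll_up(counts, 2))    # {'11': 8, '12': 2}
print(roll_up(counts, 1))    # {'1': 10}
```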

Relation to previous version

This is a revision of the previous Standard Classification of Countries and Areas of Interest (SCCAI) 2018. This standard includes a classification variant, Countries and Areas of Interest for Social Statistics – SCCAI 2019, that provides standard groupings of countries and associated codes for publication purposes. Usage notes are included with the standard.

Conformity to relevant internationally recognized standards

This standard is compatible with the composition of geographical macro-regions and geographical sub-regions of the United Nations Statistics Division (Footnote 1); however, some differences remain. The main differences are:

  • in this standard, Bermuda is included within the sub-region Caribbean and Bermuda, whereas it is part of Northern America in the United Nations groupings
  • the macro-region Oceania is not further divided into sub-regions in SCCAI
  • West Central Asia and the Middle East is one sub-region in SCCAI, while most of the countries found here are in two sub-regions, Central Asia and Western Asia, in the United Nations groupings
  • Estonia, Latvia and Lithuania are included within the sub-region of Eastern Europe in SCCAI, whereas they are part of Northern Europe in the United Nations groupings
  • some French sub-region names used by the United Nations have been modified in SCCAI to make them simpler and more consistent with Canadian convention.

Standard Classification of Countries and Areas of Interest (SCCAI) 2019

Status

This standard was approved as a departmental standard on September 15, 2020.

2019 version of SCCAI

The Standard Classification of Countries and Areas of Interest (SCCAI) 2019 is based on the international standard for country codes ISO 3166-1:2013 (Footnote 1). All changes made as of December 31, 2019, are included in this 2019 version of the SCCAI. The list was also updated for consistency with Government of Canada norms.

In addition to the list of countries and areas of interest, a classification variant for social statistics data is included. The hierarchical structure of the classification shows the relationship between these geographic areas.

HTML format

CSV format

PDF format

Concordances and documentation on changes

Variants of SCCAI


Erratum:
After Statistics Canada's 2019–20 Departmental Results Report was tabled in Parliament and published online, it was determined that a correction to the results achieved table was needed. For the indicator "Number of statistical products available on the website", the result was erroneously reported as 38,042 but should have been 37,254.

Catalogue no. 11-628-X
ISSN 2368-1160

2019 Annual Electric Power Generating Stations Survey

Integrated Business Statistics Program (IBSP)

Reporting guide

This guide is designed to assist you as you complete the 2019 Annual Electric Power Generating Stations Survey. If you need more information, please call the Statistics Canada Help Line at the number below.

Your answers are confidential.

Statistics Canada is prohibited by law from releasing any information it collects which could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act.

Statistics Canada will use information from this survey for statistical purposes.

Help Line: 1-877-604-7828

Definitions

Legal Name
The legal name is the name recognized by law; it is the name under which the business or organization can be sued and is liable for the debts it incurs. In the case of a corporation, it is the legal name as fixed by its charter or the statute by which the corporation was created.

Modifications to the legal name should only be done to correct a spelling error or typo.

To report the legal name of a different legal entity, do not modify this name; instead, in question 3, select 'Not currently operational', choose the applicable reason and provide the legal name of the other entity along with any other requested information.

Operating Name
The operating name is the name by which the business or organization is commonly known, if different from its legal name. The operating name is synonymous with trade name.

Current main activity of the business or organization

The North American Industry Classification System (NAICS) is an industry classification system developed by the statistical agencies of Canada, Mexico and the United States. Created against the background of the North American Free Trade Agreement, it is designed to provide common definitions of the industrial structure of the three countries and a common statistical framework to facilitate the analysis of the three economies. NAICS is based on supply-side or production-oriented principles, to ensure that industrial data, classified to NAICS, are suitable for the analysis of production-related issues such as industrial performance.

The target entities for which NAICS is designed are businesses and other organizations engaged in the production of goods and services. They include farms, incorporated and unincorporated businesses and government business enterprises. They also include government institutions and agencies engaged in the production of marketed and non-marketed services, as well as organizations such as professional associations and unions and charitable or non-profit organizations and the employees of households.

The associated NAICS should reflect those activities conducted by the business or organizational unit(s) targeted by this questionnaire only, and which can be identified by the specified legal and operating name. The main activity is the activity which most defines the targeted business or organization's main purpose or reason for existence. For a business or organization that is for-profit, it is normally the activity that generates the majority of the revenue for the entity.

The NAICS classification contains a limited number of activity classifications; the associated classification might be applicable for this business or organization even if it is not exactly how you would describe this business or organization's main activity.

Please note that any modifications to the main activity made through your response to this question might not be reflected before subsequent questionnaires are sent; as a result, those questionnaires may not contain this updated information.

Commission year
Indicates the first year the turbine became active.

Electricity generation capacity
Indicates the maximum potential generation capacity of the turbine and not the actual generation value.

Principal fuel
Indicate the "primary" fuel used at this station. If this is a co-generation facility and the steam turbine is operated using recaptured waste heat, please indicate "Other fuels (waste heat)" as the fuel source.

Station
A station refers to a facility which generates electricity. Many stations contain multiple turbines.

Station detail
Indicate the station unit ID name and/or unit number, the commission year of the unit, the unit's capacity and the total capacity of the station (report capacity in kilowatts).

Station latitude and longitude
If known, please indicate as applicable.

Station name
Each station should be reported separately, as applicable. Indicate the name of the station. Also indicate the provincial location of each station.

Status
If this station is a standby facility (a unit whose operation is not part of the planned load), please indicate accordingly.

Turbine
A machine used for converting mechanical power to electrical power, typically through a rotor. In the case of solar electricity generating stations, a turbine refers to a solar array, composed of photovoltaic cells.

Turbine ID
Indicates the identifier associated with an individual turbine.

Turbine type
Indicate which type of station is present - Combustion Turbine, Hydraulic (Hydro) Turbine, Internal Combustion Turbine, Solar, Nuclear Steam Turbine, Conventional Steam Turbine, Tidal Power Turbine or Wind Power Turbine.

Water source
In the case of hydro stations, name the river or lake utilized.

Thank you for your participation.

The Open Database of Cultural and Art Facilities (ODCAF)
Metadata document: concepts, methodology and data quality

Version 1.0

Data Exploration and Integration Lab (DEIL)
Centre for Special Business Projects (CSBP)

October 2, 2020

Table of Contents

  1. Overview
  2. Data Sources
  3. Reference Period
  4. Target Population
  5. Compilation Methodology
  6. Database Coverage
  7. Data Quality
  8. Data Dictionary
  9. Contact Us

1. Overview

This experimental Open Database of Cultural and Art Facilities (ODCAF) is one of a number of datasets being created as part of the Linkable Open Data Environment (LODE). The LODE is an exploratory initiative of the Data Exploration and Integration Lab (DEIL) at Statistics Canada. It aims to enhance the use, accessibility and harmonization of open data from authoritative sources by providing a collection of datasets released under a single licence, as well as open-source code to link these datasets together. This initiative is also meant to explore open data for official statistics and to support geospatial research across various domains. The LODE datasets and code are available through the Statistics Canada website at: Linkable Open Data Environment.

The ODCAF is a database of cultural and art facilities released as open data. Data sources include various levels of government within Canada (Footnote 1) and professional associations. This document details the process of collecting, compiling, and standardizing the individual datasets of cultural and art facilities that were used to create the ODCAF. The ODCAF is made available under the Open Government Licence – Canada.

In its current version (Version 1.0), the ODCAF contains approximately 8,000 individual records. The database is expected to be updated periodically as new open datasets become available. The ODCAF is provided as a compressed comma-separated values (CSV) file.

2. Data Sources

Multiple data sources were used to create the ODCAF. The sources used are detailed in a 'Data Sources' CSV file located within the zipped data folder available for download on the ODCAF webpage. The links to the original datasets, licences or terms of use, attribution statements and additional notes are also included in the Data Sources CSV file. For further information on the individual licences, users should consult the information provided directly on the open data portals of the various data providers. In addition to openly licensed databases, the ODCAF also includes a publicly available listing of cultural and art facilities.

The distinction between open and other publicly available data is based on the licensing terms (explicit or implicit) attached to each source dataset used. Open data licences permit, in varying degrees, use for any lawful purpose, redistribution (re-sharing), and modification and repackaging of the data. However, open data licences can impose some restrictions, such as attribution of the original source, share-alike (re-sharing only under like conditions) and no commercial use. Examples of open data licences are Creative Commons, MIT, GPLv3 and the Open Government Licence – Canada. In general, no warranty is expressed and the provider stipulates only minor conditions.

Publicly available data that are not open data might be associated with proprietary licensing or terms of use that may restrict some of the aspects that would otherwise be permitted under open data licensing.

3. Reference Period

When this is known, the Data Sources CSV provides either the update frequency or the date each underlying dataset was last updated by the provider (this information was collected when the dataset was accessed for this project). The Data Sources CSV also provides the date each dataset used in the ODCAF was downloaded or provided by the organization that is the source of the data. Data were gathered between January 2020 and July 2020. Users are cautioned that the download date should not be used as an indication of the reference date of the data. For specific information on the reference dates of the source datasets, users should contact the relevant data providers directly.

4. Target Population

For the purposes of the ODCAF database, cultural and art facilities are facilities wherein the primary activity is of a cultural nature or is related to the arts. The target population includes only brick and mortar cultural and art facilities that offer programs or services to the general public.

In terms of the North American Industry Classification System (NAICS), the facilities in the ODCAF are primarily in the following sub-sectors:

  • 711 - Performing arts, spectator sports and related industries
  • 712 - Heritage institutions

Facilities are included when their primary activities have a cultural or arts character, regardless of the source of funding, private or public status, operator type, location or other attributes. However, facilities that are not open to the general public and those that are primarily commercial in nature are not included. Thus, a theatre that offered ballet performances would be in scope, while a ballet school that offered training and performances only to paying students would not.

5. Compilation Methodology

This section provides an overview of the processing done to compile the ODCAF.

Data Standardization and Cleaning

The first processing step in compiling the ODCAF database consisted of reformatting the source data to CSV format and mapping the original dataset attributes to standard variable (field) names. This was done using a version of the custom OpenTabulate software developed by the LODE team. A data dictionary of the variables used is provided in section 8.
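The reformatting and mapping step described above amounts to renaming each provider's columns to a common set of field names. The sketch below uses only the Python standard library and is not OpenTabulate itself; the provider columns, the standard field names and the function name `standardize` are all hypothetical (the actual ODCAF field names are those in the data dictionary).

```python
import csv
import io


def standardize(source_csv: str, field_map: dict[str, str]) -> list[dict[str, str]]:
    """Rename a provider's columns to standard field names.

    Columns not listed in field_map are dropped; standard fields whose source
    column is missing are filled with an empty string.
    """
    reader = csv.DictReader(io.StringIO(source_csv))
    rows = []
    for row in reader:
        # Trivial cleaning only: strip surrounding whitespace from each value.
        rows.append({std: (row.get(src) or "").strip() for src, std in field_map.items()})
    return rows


# Hypothetical provider file and mapping.
source = "Site Name,Addr,Prov,Notes\nRiverside Gallery ,12 Main St,ON,open weekends\n"
mapping = {
    "Site Name": "facility_name",
    "Addr": "source_address",
    "Prov": "province",
    "Type": "source_facility_type",  # absent from this provider's file
}
print(standardize(source, mapping))
```

Keeping one mapping per provider means a new source only needs a new dictionary, not new code.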

Because the source datasets use different classification systems and data attributes, and standardizing them required several processing steps, errors may have been introduced.

The methods used in each step of the data cleaning process, and their limitations, are described below. Trivial cleaning techniques, such as the removal of whitespace characters and punctuation, are not discussed.

Address Parsing

The libpostal address parser, an open-source natural language processing library for parsing addresses, was used to split concatenated address strings into strings corresponding to address variables, such as street name and street number. Occasionally, addresses were split incorrectly because the original address was formatted unconventionally. While effort was made to identify and correct these entries in the final database, some incorrectly parsed entries may remain undetected. The exceptions are entries whose street number consists of two numbers separated by a hyphen or a space. Entries of this form usually indicate either that the address parser incorrectly parsed a numbered street name (e.g., "123 100 ave" is parsed into the street number "123 100" and the street name "ave") or that a unit was not identified correctly (as in "3-100 main st"). Numbers of this form are separated automatically: if the street name is a variant of the word "street" or "avenue", the rightmost number is prepended to the street name; otherwise, the leftmost number is appended to the unit column.

A limited number of entries were edited manually when it was clear that the parsing had not been done correctly. An example is addresses with hyphenated numbers such as "1035-55 street nw", which may have been interpreted as having a civic number of "1035-55" and a street name of "street nw", rather than a civic number of 1035 and a street name of "55 street nw". While effort was made to ensure that the results are correct, the scripts used to process and parse the addresses may have introduced other, undetected errors. Should any such errors subsequently be reported to or detected by the LODE team, they will be corrected in future versions of the ODCAF.
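The automatic rule for two-part street numbers can be sketched as follows. This is a minimal illustration, not the LODE team's script: the function name `split_street_number`, the dictionary keys, and the set of spellings counted as "a variant of street or avenue" are all assumptions.

```python
import re

# Hypothetical set of spellings treated as a variant of "street" or "avenue".
STREET_VARIANTS = {"street", "st", "str", "avenue", "ave", "av"}

# Two numbers separated by a single hyphen or space, e.g. "123 100" or "3-100".
TWO_NUMBERS = re.compile(r"(\d+)[- ](\d+)")


def split_street_number(number: str, street: str, unit: str = "") -> dict:
    """Separate a two-part street number produced by the address parser."""
    match = TWO_NUMBERS.fullmatch(number.strip())
    if match is None:
        # Ordinary street number: leave the record unchanged.
        return {"unit": unit, "number": number, "street": street}
    left, right = match.groups()
    if street.strip().lower().rstrip(".") in STREET_VARIANTS:
        # Numbered street: "123 100" + "ave" -> number "123", street "100 ave".
        return {"unit": unit, "number": left, "street": f"{right} {street.strip()}"}
    # Otherwise the left number is a unit: "3-100" + "main st" -> unit "3", number "100".
    return {"unit": f"{unit} {left}".strip(), "number": right, "street": street}


print(split_street_number("123 100", "ave"))
print(split_street_number("3-100", "main st"))
```

Under this sketch, "1035-55" with the street name "street nw" falls through to the unit branch (unit 1035, number 55), because "street nw" is not an exact variant; cases like this are presumably the ones the text says were corrected by hand.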

Removal of Duplicates

Duplicates were removed using both literal and fuzzy string matching on the facility name and street name, conditioned on the street number and province; that is, two facilities are compared only if their street numbers and provinces agree. The fuzzy comparison uses the Python package FuzzyWuzzy (Footnote 2), which returns a similarity score between 0 and 100 for two strings, where a score of 100 indicates that the shorter string is a substring of the longer string. The threshold score at which an entry is marked as a duplicate was chosen empirically.

If two entries contained identical street number and province information, their street names and facility names were compared. When these were nearly identical (defined as the sum of the similarity scores for the facility names and street names being at least 195 out of a possible 200), the entries were marked as duplicates. Recognized duplicates were deleted without manual intervention. The threshold was set close to the maximum score to minimize the removal of false positives. When duplicates were found, whichever record contained more non-empty fields was retained. In total, 2,435 duplicates were removed.
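The duplicate-removal procedure described above can be sketched in a few functions. To keep the example self-contained, it swaps FuzzyWuzzy for a standard-library stand-in built on `difflib.SequenceMatcher` that mimics the described partial matching (a score of 100 when the shorter string occurs inside the longer one); its scores will not match FuzzyWuzzy's exactly. The function names, record keys and sample records are hypothetical; the 195 threshold and the "more non-empty fields" rule come from the text.

```python
from difflib import SequenceMatcher


def partial_score(a: str, b: str) -> int:
    """0-100 similarity; 100 when the shorter string occurs inside the longer one.

    Stand-in for FuzzyWuzzy: the shorter string is compared with every
    same-length window of the longer one and the best ratio is kept.
    """
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    if not short:
        return 0
    best = 0.0
    for i in range(len(long_) - len(short) + 1):
        window = long_[i : i + len(short)]
        best = max(best, SequenceMatcher(None, short, window).ratio())
    return round(100 * best)


def is_duplicate(a: dict, b: dict, threshold: int = 195) -> bool:
    # Compare only facilities whose street number and province agree.
    if (a["street_no"], a["province"]) != (b["street_no"], b["province"]):
        return False
    total = partial_score(a["name"], b["name"]) + partial_score(a["street_name"], b["street_name"])
    return total >= threshold


def filled(record: dict) -> int:
    """Number of non-empty fields in a record."""
    return sum(1 for value in record.values() if value)


def deduplicate(records: list[dict]) -> list[dict]:
    kept: list[dict] = []
    for record in records:
        for i, other in enumerate(kept):
            if is_duplicate(record, other):
                # Keep whichever record has more non-empty fields.
                if filled(record) > filled(other):
                    kept[i] = record
                break
        else:
            kept.append(record)
    return kept


# Hypothetical records.
records = [
    {"name": "Riverside Art Gallery", "street_no": "12", "street_name": "main st", "province": "ON", "postal_code": ""},
    {"name": "The Riverside Art Gallery", "street_no": "12", "street_name": "main st", "province": "ON", "postal_code": "A1A 1A1"},
    {"name": "Riverside Museum", "street_no": "40", "street_name": "main st", "province": "ON", "postal_code": ""},
]
result = deduplicate(records)
print(len(result))  # 2
```

The pairwise loop is quadratic; grouping records by (street number, province) first would limit comparisons to records that can actually match.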

Identification of Invalid Entries

Two filters were applied to the data after the address parsing stage. They captured entries with invalid postal code or province code information and wrote them to a file separate from the database for further processing. Most of these entries were corrected manually and added back into the database. These two filters were chosen for their ability to detect potential errors in postal codes and province codes.
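The two filters above can be approximated by format checks on the postal code and the two-letter province or territory code. This sketch is an illustration under stated assumptions, not the filters actually used: the function names and record keys are hypothetical, a missing postal code is assumed to be acceptable, and a missing province code is assumed to be invalid.

```python
import re

# The 13 two-letter province and territory abbreviations.
PROVINCE_CODES = {"AB", "BC", "MB", "NB", "NL", "NS", "NT", "NU", "ON", "PE", "QC", "SK", "YT"}

# Canadian postal code "A1A 1A1" with an optional space. D, F, I, O, Q and U are
# never used, and W and Z do not appear as the first letter.
POSTAL_CODE = re.compile(r"[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z] ?\d[ABCEGHJ-NPRSTV-Z]\d")


def passes_filters(record: dict) -> bool:
    postal = record.get("postal_code", "").strip().upper()
    province = record.get("province", "").strip().upper()
    # Assumption: a missing postal code is not treated as invalid.
    postal_ok = postal == "" or POSTAL_CODE.fullmatch(postal) is not None
    return postal_ok and province in PROVINCE_CODES


def split_invalid(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate records that fail either filter so they can be reviewed by hand."""
    valid, invalid = [], []
    for record in records:
        (valid if passes_filters(record) else invalid).append(record)
    return valid, invalid
```

Records in the second list would then be written to a separate file, reviewed and, once corrected, merged back.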

Other Data Cleaning Steps

  • Data entry formatting (removal of excess whitespace and punctuation), and removal of postal codes and province/territory names.
  • During processing, separation of entries with an incorrectly formatted postal code or two-letter province/territory code from the cleaned data, followed by manual editing.
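The formatting step, together with the conversion of province/territory names to two-letter codes noted in the data dictionary, could be sketched as below. The name lookup (English names plus "Québec") and the interpretation of "excess punctuation" as stray commas, semicolons and colons at either end are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical lookup of province/territory names to two-letter codes.
PROVINCE_CODES = {
    "alberta": "AB", "british columbia": "BC", "manitoba": "MB",
    "new brunswick": "NB", "newfoundland and labrador": "NL", "nova scotia": "NS",
    "northwest territories": "NT", "nunavut": "NU", "ontario": "ON",
    "prince edward island": "PE", "quebec": "QC", "québec": "QC",
    "saskatchewan": "SK", "yukon": "YT",
}
VALID_CODES = set(PROVINCE_CODES.values())

# Assumption: "excess punctuation" means stray commas, semicolons and colons at either end.
EXCESS_PUNCTUATION = ",;: "


def clean_text(value: str) -> str:
    """Collapse runs of whitespace and strip stray punctuation from both ends."""
    return re.sub(r"\s+", " ", value).strip(EXCESS_PUNCTUATION)


def to_province_code(value: str) -> Optional[str]:
    """Return a two-letter code, or None so the entry can be set aside for manual editing."""
    cleaned = clean_text(value)
    if cleaned.upper() in VALID_CODES:
        return cleaned.upper()
    return PROVINCE_CODES.get(cleaned.lower())
```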

Selection of Record to Retain in Case of Duplicates

In some instances, a facility was present in more than one source. In such cases, the record with the most information available was retained. Where information between sources did not match, validation tools were used to decide which to retain.

Classification Used and Assignment of Cultural and Art Facility Type

The original data sources use a variety of standards, classifications and nomenclature to describe the type of cultural and art facility. Unfortunately, there is no classification for cultural and art facilities in Canada that is used universally. The following classification of cultural and art facilities is used for Version 1.0 of the ODCAF:

  • Arts or cultural centre: Establishments primarily engaged in promoting culture and arts
  • Artist: Individual artists engaged in creating artistic works
  • Festival site: Sites on which arts or cultural festivals are held
  • Gallery: Establishments primarily engaged in the display of artistic works
  • Heritage or historic site: Sites of cultural, artistic, or historic significance
  • Library or archive: Establishments primarily engaged in the display, curation, and sharing of mainly written material, such as manuscripts and periodicals, and of other items such as maps or images
  • Miscellaneous: Establishments associated in some way with promoting or providing culture or arts that do not fall into any of the above categories
  • Museum: Establishments primarily engaged in the display, curation, and sharing of collections of artifacts, fine arts, and other objects of artistic, cultural, or historical importance
  • Theatre/performance and concert hall: Establishments primarily engaged in the public performance of artistic or cultural works

The classification is intended to have broad categories that distinguish the major types of facilities while still allowing source-specific facility types to be mapped accurately. Facility types are determined from source-specific facility types and source coverage metadata. Assignments are made using keywords and validated afterwards, with manual changes made where needed. Where facilities were classified based on source metadata, this was done on a case-by-case basis.
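Keyword-based assignment with a flag for manual validation could look like the sketch below. The keyword lists are hypothetical (the ODCAF keywords are not published in this document); the example source types are the ones quoted later in this document. A source type matching no category, or more than one, is sent to review. The Artist category is omitted because a bare "artist" keyword would also match terms such as "artist-run centre".

```python
# Hypothetical keyword lists for eight of the nine ODCAF categories.
KEYWORDS = [
    ("Museum", ("museum", "musée")),
    ("Gallery", ("gallery", "galerie")),
    ("Library or archive", ("library", "archive", "bibliothèque")),
    ("Theatre/performance and concert hall", ("theatre", "theater", "concert hall", "opera")),
    ("Heritage or historic site", ("heritage", "historic", "patrimoine")),
    ("Festival site", ("festival",)),
    ("Arts or cultural centre", ("cultural centre", "arts centre", "centre d'art")),
]


def assign_type(source_type: str, facility_name: str = "") -> tuple[str, bool]:
    """Return (ODCAF facility type, needs_manual_review)."""
    text = f"{source_type} {facility_name}".lower()
    matches = [category for category, words in KEYWORDS if any(w in text for w in words)]
    if len(matches) == 1:
        return matches[0], False
    # No match, or conflicting matches: default to Miscellaneous and flag for validation.
    return "Miscellaneous", True
```

For example, `assign_type("community library")` returns `("Library or archive", False)`, while `assign_type("Historic museum")` matches two categories and is flagged.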

Geocoding and Determination of Census Subdivision

In general, the data included in the ODCAF are those available from the original sources, without imputation. The exceptions are the geocoding and the imputation of CSD names and categories, discussed below.

Census subdivision (CSD) names (Footnote 3) were derived from two different attributes in the data.

The first attribute is the geographic coordinates, namely latitude and longitude. These are placed into the corresponding CSDs by linking the coordinate points to the CSD polygons through a spatial join operation using the Python package GeoPandas (Footnote 4).
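At its core, the spatial join described above tests whether each point falls inside a CSD polygon. GeoPandas' `sjoin` does this for whole tables (with spatial indexing); the pure-Python ray-casting sketch below only illustrates the underlying test. It assumes simple polygons without holes or multiple parts, with points and polygons in the same coordinate reference system. The CSDUIDs used in the usage check are made up.

```python
from typing import Optional

Polygon = list[tuple[float, float]]  # (longitude, latitude) vertices, no holes


def point_in_polygon(lon: float, lat: float, polygon: Polygon) -> bool:
    """Ray casting: toggle on each polygon edge crossed by a ray running east of the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_csd(lon: float, lat: float, csd_polygons: dict[str, Polygon]) -> Optional[str]:
    """Return the CSDUID of the first polygon containing the point, or None."""
    for csduid, polygon in csd_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return csduid
    return None
```

With two adjacent unit squares keyed "1111111" and "2222222", a point at (0.5, 0.5) is assigned to the first and a point at (1.5, 0.5) to the second. Points lying exactly on a shared boundary are ambiguous under any point-in-polygon rule.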

The second attribute is the city name: each facility's municipality name was matched literally against a list of CSD names. City names with at least ten entries that did not receive a CSD name through this process were manually assigned one using Place Names in GeoSuite.
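The city-name step described above, including the set-aside of frequent unmatched names, might be sketched as follows. Two assumptions are made for illustration: matching ignores case and surrounding whitespace, and a CSD already assigned from coordinates is kept. The city names in the usage check are invented.

```python
from collections import Counter


def match_city_to_csd(records: list[dict], csd_names: list[str], min_count: int = 10) -> list[str]:
    """Fill CSD_Name by exact city-name matching.

    Returns the unmatched city names that occur at least `min_count` times,
    i.e. the ones that would be assigned a CSD manually.
    """
    lookup = {name.strip().lower(): name for name in csd_names}
    unmatched: Counter = Counter()
    for rec in records:
        if rec.get("CSD_Name"):
            continue  # assumption: a CSD already assigned from coordinates is kept
        city = (rec.get("City") or "").strip()
        csd = lookup.get(city.lower())
        if csd:
            rec["CSD_Name"] = csd
        elif city:
            unmatched[city] += 1
    return sorted(city for city, count in unmatched.items() if count >= min_count)
```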

Geocoding was carried out for some sources that provide address data but no geo-coordinates. Latitude and longitude were determined and validated using online tools. A subset of the source-provided geo-coordinates was also validated online. Some coordinates from the original sources were removed when it was determined that they had been derived from postal codes or other aggregate geographic areas rather than from street addresses.

While efforts have been made to ensure the accuracy of geo-coordinates, no guarantees are implied, and errors and inaccuracies are possible.

Inclusion in the ODCAF of Facility Type Provided in Source Datasets

The facility types as provided in the data sources (e.g., exhibition or cultural centre, community library, centre d'art, etc.) are also included in the ODCAF without any modification, reassignment, or mapping to a uniform classification.

6. Database Coverage

The ODCAF current version (Version 1.0) database as provided contains approximately 8,000 cultural and art facilities.

Because the total number of cultural and art facilities in the country is not known with a reasonable degree of certainty, the coverage obtained with the sources used was not assessed quantitatively. However, many of the sources purport to list all facilities of a certain type within a jurisdiction, so coverage within these facility types and jurisdictions would be expected to be fairly complete. Conversely, facilities of a certain category that were omitted from a source may be missing from the database, unless they were obtained from a different source.

7. Data Quality

All cultural and art facility data in the ODCAF were collected from government data sources, either from open data portals or publicly available webpages. In general, other than the processing required to harmonize the different sources into one database, the underlying datasets were taken "as is." The accuracy and completeness of the information are therefore largely a function of the source datasets used.

Classifying facilities

Assignment of facility type was largely based on facility types provided by source datasets. In instances where facility type was either unclear or not defined by the source, facility type was classified based on further research or using meta-information, such as name of dataset.

Removing duplicates

Some source datasets overlap; for example, datasets that cover only a particular type of arts or cultural facility for an entire province may overlap with data provided for specific towns. Although deduplication techniques were used, some duplicates may remain. Modifying the deduplication methods to seek out the remaining duplicates would generate numerous false positives, which would require additional manual intervention. Further details are available in the sub-section Removal of Duplicates above.

Correcting invalid entries

A few entries with erroneous province/territory names and postal codes were detected and manually corrected. Further details on how these entries were identified are given in the sub-section Identification of Invalid Entries above.

Address parsing

Natural language processing methods were used to parse address strings and separate them into address variables, such as street number and postal code (the latter is removed from the final released database). These methods are well regarded for their performance and accuracy, but like all statistical learning methods, they have limitations: poorly or unconventionally formatted addresses may be parsed incorrectly. At this stage, no further integration with other address sources was attempted; hence, although address records are generally expected to be correct, residual errors may be present in the current version of the database.

8. Data Dictionary

The data dictionary below describes the variables of the ODCAF.

Arts and cultural facilities variables

Variable – Index

Name: Index
Format: String
Source: Internally generated during data processing
Description: Unique number automatically generated during data processing

Variable – Facility Name

Name: Facility_Name
Format: String
Source: Provided as is from original data
Description: Cultural or arts facility name

Variable – Source Facility Type

Name: Source_Facility_Type
Format: String
Source: Provided as is from original data
Description: Facility type chosen by data provider

Variable – ODCAF Facility Type

Name: ODCAF_Facility_Type
Format: String
Source: Imputed from source data or metadata
Description: Facility type assigned from the nine ODCAF categories

Location Variables

Variable – Unit Number

Name: Unit
Format: String
Source: Parsed from a full address string or provided as is
Description: Civic unit or suite number

Variable – Street Number

Name: Street_No
Format: String
Source: Parsed from a full address string or provided as is
Description: Civic street number

Variable – Street Name

Name: Street_Name
Format: String
Source: Parsed from a full address string or provided as is
Description: Civic street name

Variable – City

Name: City
Format: String
Source: Parsed from a full address string or provided as is
Description: City or municipality name (certain records may list the neighbourhood name)

Variable – Province/Territory

Name: Prov_Terr
Format: String
Source: Converted to internationally approved two-letter codes after parsing from a full address string, provided as is, or indicated by providers
Description: Province or territory (two-letter code)

Variable – Province Unique Identifier

Name: PRUID
Format: Integer
Source: Converted from province code
Description: Province unique identifier

Variable – CSD Name

Name: CSD_Name
Format: String
Source: Imputed from geographic coordinates and city names using GeoSuite 2016
Description: Census subdivision name

Variable – CSD Unique Identifier

Name: CSDUID
Format: Integer
Source: Imputed from either geographic coordinates or CSD name using GeoSuite 2016
Description: Census subdivision unique identifier

Variable – Longitude

Name: Longitude
Format: Float
Source: Provided as is from original data
Description: Longitude

Variable – Latitude

Name: Latitude
Format: Float
Source: Provided as is from original data
Description: Latitude

Variable – Data Provider

Name: Data_Provider
Format: String
Source: Created based on origins of input dataset
Description: Name of the entity that provided the dataset
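The PRUID variable ("Converted from province code") and its relationship to CSDUID can be illustrated with the sketch below. The numeric codes are Statistics Canada's Standard Geographical Classification (SGC) province/territory codes. A CSDUID is a seven-digit SGC code whose first two digits are the province code, so a simple consistency check between Prov_Terr and CSDUID follows. The function names are hypothetical.

```python
# SGC province/territory codes keyed by two-letter abbreviation.
PRUID = {
    "NL": 10, "PE": 11, "NS": 12, "NB": 13, "QC": 24, "ON": 35, "MB": 46,
    "SK": 47, "AB": 48, "BC": 59, "YT": 60, "NT": 61, "NU": 62,
}


def pruid_for(prov_terr: str) -> int:
    """Convert a two-letter province/territory code to its PRUID."""
    return PRUID[prov_terr.strip().upper()]


def csduid_consistent(prov_terr: str, csduid: int) -> bool:
    """A CSDUID is PR (2 digits) + CD (2 digits) + CSD (3 digits); check the PR part."""
    return 1_000_000 <= csduid <= 9_999_999 and csduid // 100_000 == pruid_for(prov_terr)
```

A check of this kind can catch records whose imputed CSD lies in a different province from the one given in the address.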

9. Contact Us

The LODE open databases are subject to ongoing improvement. To provide information on additions, updates, corrections or omissions, or to request more information, please contact us at statcan.lode-ecdo.statcan@statcan.gc.ca. Please include the title of the open database in the subject line of the email.


The Open Database of Cultural and Art Facilities

Catalogue number: 21260001
Issue number: 2020001

The Open Database of Cultural and Art Facilities (ODCAF) is a collection of open data containing the names, types, and locations of cultural and art facilities across Canada. It is released under the Open Government Licence - Canada.

The ODCAF compiles open and publicly available data on cultural and art facilities across Canada. Data sources include provincial/territorial governments, municipal governments, and professional associations. This database aims to provide enhanced access to a harmonized listing of cultural and art facilities across Canada by making it available as open data. This database is a component of the Linkable Open Data Environment (LODE).

Data sources and methodology

The inputs for the ODCAF are datasets whose sources include provincial, territorial and municipal governments, and professional associations. These datasets were available either under one of various types of open data licence (e.g., on an open government portal) or as publicly available data. Details of the sources used are available in a 'Data Sources' table located within the downloadable zipped ODCAF folder.

The data sources used do not share a uniform classification system. The ODCAF harmonizes facility type by assigning one of nine types to each facility, based on the facility type provided in the source data and on additional research carried out for that purpose.

The facility types used in the ODCAF are: arts or cultural centre, artist, festival site, gallery, heritage or historic site, library or archive, museum, theatre/performance and concert hall, and miscellaneous.

The ODCAF does not claim exhaustive coverage and may not contain all facilities in scope for the current version. Facility type classification and geolocation errors are also possible, although efforts have been made to minimize them. While all ODCAF data are released on the same date, the date as of which the data are current depends on the update dates of the sources used.

A subset of the geo-coordinates available in the source data was validated online and updated as needed. When latitude and longitude were not available, geocoding was performed for some sources using the street addresses in the source data.

Deduplication was performed to remove duplicate records where sources overlapped in coverage.

This first version of the database (Version 1.0) contains approximately 8,000 records. Data were collected by accessing sources between January 2020 and July 2020.

The variables included in the ODCAF are as follows:

  • Index
  • Facility Name
  • Source Facility Type
  • ODCAF Facility Type
  • Provider
  • Unit
  • Street Number
  • Street Name
  • Postal Code
  • City
  • Province or Territory
  • Source-Format Street Address
  • Census Subdivision Name
  • Census Subdivision Unique Identifier
  • Province or Territory Unique Identifier
  • Latitude
  • Longitude

For more information on how the addresses and variables were compiled, see the metadata that accompanies the ODCAF.

Downloading the ODCAF

For ease of download, the ODCAF is provided as a compressed comma-separated values (CSV) file.
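A minimal sketch for loading the downloaded file in Python with only the standard library is shown below. The archive's member names and text encoding are not stated here, so both are assumptions: the function reads the first `.csv` member unless a name is passed (the archive also contains a 'Data Sources' table, so passing the member name explicitly is safer), and it assumes UTF-8, with or without a byte-order mark.

```python
import csv
import io
import zipfile
from typing import Optional


def load_odcaf(zip_source, member: Optional[str] = None, encoding: str = "utf-8-sig") -> list[dict]:
    """Read one CSV file from a zip archive (path or file object) into a list of row dictionaries."""
    with zipfile.ZipFile(zip_source) as archive:
        if member is None:
            csv_members = [n for n in archive.namelist() if n.lower().endswith(".csv")]
            if not csv_members:
                raise ValueError("no CSV file found in the archive")
            member = csv_members[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.DictReader(text))
```

All values are returned as strings; numeric columns such as Latitude, Longitude, PRUID and CSDUID would need converting before use.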

Visualizing the ODCAF

The ODCAF content is available for visualization on a map using the Linkable Open Data Environment Viewer.
