Section 7: Accessing and Citing DLI Data

  1. What is in the DLI Collection
  2. Accessing DLI Data
  3. The DLI EFT Site
  4. The DLI Nesstar
  5. WDS
  6. DLI Mirror Site
  7. Secondary Data Distributors
  8. Other Sources of Data
  9. Citing Data

What is in the DLI Collection

The DLI Collection is composed primarily of standard products produced by Statistics Canada. These products include Public Use Microdata Files (PUMF), CD-ROM products, spreadsheets and databases. The DLI also provides access to a special collection of data products, including sample files from the Discharge Abstract Database (DAD) from the Canadian Institute for Health Information (CIHI), Postal CodeOM data products, and the Social Policy Simulation Database and Model (SPSD/M).

List of DLI Collection holdings

The DLI Products page, in the portion of the DLI website restricted to DLI members, lists the surveys and products available in the DLI Collection. This one-page list can be given to potential users when promoting the DLI at your institution.

Survey acronyms

Both the collection on the EFT site and the DLI Products page (in the restricted access portion of the website) are organized by survey record number, with the associated acronym (the abbreviation for the survey). Many of the questions posed on the dlilist refer to data sets by their acronym. It is therefore helpful for DLI Contacts and DLI alternates to be familiar with the acronyms used at Statistics Canada.

Surveys without products

Not all surveys produce data products. For example, microdata files for business surveys are not available because of the small sample sizes of those surveys. Author divisions have the discretion to produce products or not.

The Special Surveys Division at Statistics Canada often provides a public use microdata file (PUMF) once it has completed survey work for its client. As the product then becomes part of the Statistics Canada collection, the DLI can request a copy for its collection.

However, many divisions do not create PUMFs as these are costly to produce and must be vetted by the Microdata Release Committee (Statistics Canada's confidentiality control for microdata files). Some divisions prefer to create standard tables available in CANSIM and charge retrieval fees for more in-depth requests. Although the data may be freely available, cost recovery charges apply to the analyst's time.


Accessing DLI Data

The DLI Collection can be accessed through the Electronic File Transfer (EFT) service, Nesstar, or the DLI's Web Data Server (WDS), all available through the restricted access portion of the DLI website.

The difference between the DLI EFT, Nesstar and the WDS

The DLI EFT is the repository for the entire DLI collection. It houses StatCan standard data products, Postal CodeOM data products, DAD files, and the SPSD/M products. Nesstar is a data access tool for the Public Use Microdata Files in the collection. Finally, the WDS houses the DLI's aggregate data and geographic files.


The DLI EFT Site

About the EFT site

The DLI EFT site is the primary method used to disseminate the DLI Collection. To ensure the absolute protection of data files, the EFT requires that each user have their own unique user ID and password.

The EFT service supports the standard FTP protocol for sending and receiving files, with FTPS support expected in 2017. DLI contacts will need an FTP application such as WS_FTP or FileZilla to access the EFT site. The software used to download files from the FTP site varies with institutional preferences.

Understanding the DLI EFT directory structure

Although not always intuitive to the first-time user, the EFT directory structure is quite logical. The DLI team recently consolidated its data holdings into five safes, with the contents listed below. Please note that some DLI contacts may not be able to view all of these safes, because different data are covered by different licences.

EFT directory structure

Safe name              Contents
MAD_PUMF_FMGD_DAM      Survey PUMFs and metadata
MAD_DLI_IDD_DAM        DLI annual reports, DLI training materials, CD-ROM data products, Geography files, Census files and more
MAD_PCCF_FCCP_DAM      Postal Code Conversion Files (PCCF), Postal Code Conversion Files Plus (PCCF+) and Postal Code Federal Riding Files (PCFRF)
MAD_CIHI_ICIS_DAM      Discharge Abstract Database (DAD) from the Canadian Institute for Health Information (CIHI)
MAD_SPSDM_BDMSPS_DAM   Social Policy Simulation Database and Model (SPSD/M) files

MAD_PUMF_FMGD_DAM

The initial screen upon browsing the MAD_PUMF_FMGD_DAM safe provides a listing of survey folders, organized according to their survey record number and corresponding survey acronym. The DLI team has begun using this naming convention both to conserve server space and to harmonize the two official languages. Data under the MAD_PUMF_FMGD_DAM safe is made available under the Licence agreement for Public Use Microdata Files (PUMF) (Appendix 1 of the DLI Licence agreement).

Survey folders

Once inside a survey folder, the breakdown is the same as before. If a survey has been collected more than once, each year is usually contained in a separate subdirectory. The secondary level within the survey separates the data (data) from the documentation (doc). The readme file for the survey is also found at this level. The data folder provides a zipped file with the data. The data can take the form of microdata in ASCII, SPSS, Stata, and SAS formats. The documentation folder includes the metadata, that is, the information necessary to interpret and understand the microdata.

For example:

/MAD_PUMF_FMGD_DAM/Root/
    /3250_APS_EAPA
        /1991
        /2001
        /2001-Children
        /2006
            /age-06-14
            /age-15+
                /data
                /doc
                lisezeapa2006-age-15+.txt
                readaps2006-age-15+.txt
    /3251_PALS_EPLA
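
The layout in this example can also be written as a small path-building helper. The Python sketch below is illustrative only. The function name and its parameters are hypothetical, and it assumes that every survey follows the safe/Root/survey/year/[subgroup]/data-or-doc pattern shown for 3250_APS_EAPA, which should be confirmed against the readme files.

```python
from pathlib import PurePosixPath
from typing import Optional

# Root of the PUMF safe, as shown in the example listing.
SAFE_ROOT = PurePosixPath("/MAD_PUMF_FMGD_DAM/Root")


def survey_path(survey_folder: str, year: str, part: str,
                subgroup: Optional[str] = None) -> str:
    """Build the expected path to a survey's data or doc folder.

    survey_folder is the folder name as listed on the EFT site
    (record number plus acronyms, e.g. "3250_APS_EAPA").
    subgroup is an optional extra level such as "age-15+".
    """
    if part not in ("data", "doc"):
        raise ValueError("part must be 'data' or 'doc'")
    path = SAFE_ROOT / survey_folder / year
    if subgroup is not None:
        path = path / subgroup
    return str(path / part)


print(survey_path("3250_APS_EAPA", "2006", "data", subgroup="age-15+"))
```

Because folder depth varies from survey to survey, a helper like this is only a starting point; the readme at each level remains the authoritative description of the folder.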

With respect to the GSS folders, please consult the EFT key entitled Readme-Key_Lisezmoi-clé.xls. It makes special note of the GSS cycles (the far right column indicates the associated cycles).

Example:

GSS Cycle 27 SI (SDDS 5024)
GSS Cycle 27 GVP (SDDS 4430)

MAD_DLI_IDD_DAM

The initial screen upon browsing the MAD_DLI_IDD_DAM safe provides a listing of the folders available to DLI contacts. Data within the MAD_DLI_IDD_DAM safe is made available under the DLI Licence agreement and disseminated under the terms and conditions of the Statistics Canada Open Licence Agreement. Users can expect to find folders for the following materials:

Census folders

The Census folders, for the Census of Population and the Census of Agriculture, are initially broken down by Census year. The secondary level of breakdown varies. For Census years prior to 2001, the information is broken down by the level of geography sought, and the topic files are then provided for that level of geography.

Starting in 2001, the files are organized by topic (21 topics in 2001). The levels of geography are not sorted and the user must make good use of the readme files to identify the data desired.

Geography folder

The Geography folder is initially broken down by Census year. The secondary level of breakdown identifies the type of information sought. For example, a user may be seeking reference maps, boundary files, specific products, etc. The readme once again becomes a critical tool to navigate the folder.

Mirror-site-files-dliftp.library.ualberta.ca folder

Provides a list of product titles available from the DLI Mirror site hosted at the University of Alberta.

Reports folder

The Reports folder contains materials of particular interest to DLI contacts: the External Advisory Committee (EAC) Bi-Annual report, DLI updates, and meeting minutes for both the EAC and the Professional Development (PD) Committee. In addition, users can find the images of the new DLI graphic identifier.

Other-Autres folder (data tables, CD-ROM products, etc)

The Other-Autres folder contains all additional products associated with Statistics Canada microdata, excluding PUMFs. The folder lists these products by survey record number or catalogue number and the corresponding survey or product acronym. The DLI team has begun using this naming convention both to conserve server space and to harmonize the two official languages. Users seeking clarity on the record numbers or acronyms can consult the Excel workbook housed in the folder (other-products_autres-produits.xls), which serves as a key to the nomenclature. CD-ROM products are available for download in zipped format. At times, the contents of a CD-ROM will need to be downloaded, unzipped and then burned onto a CD-ROM (this will be noted in the file's readme). Many of these products have proprietary structures and must be run from a CD rather than from a hard drive.

MAD_CIHI_ICIS_DAM

The CIHI safe contains sample files from the Discharge Abstract Database (DAD). Data from 2009 – 2015 is currently available in clearly labeled subfolders. Unlike other products available in the DLI Collection, DAD files are separate from the Statistics Canada Open Licence Agreement. Data under the MAD_CIHI_ICIS_DAM safe is made available under the License Agreement for the Discharge Abstract Database (DAD) Research Analytic Files from the Canadian Institute for Health Information (CIHI) (Appendix 3 of the DLI Licence agreement).

MAD_PCCF_FCCP_DAM

The PCCF safe is initially broken down by Census year. The secondary level of breakdown identifies the postal code data product:

Postal CodeOM Conversion File (folder: pccf-fccp)

Postal CodesOM by Federal Ridings File (folder: pcfrf-fcpcef)

Postal CodeOM Conversion File Plus (folder: ppcf-fccp-plus)

Within each subfolder, a readme file provides a description of the product and a high-level summary of changes to the product (e.g., starting with June 2013, the PCCF is only available as a standard package for Canada, no longer at the province level, and is updated and released annually rather than semi-annually). The readme file also lists the title of the product (e.g., PCCF August 2015), the release date (e.g., February 12, 2016), the frequency of release and the directory. Unlike other products available in the DLI Collection, PCCF files are separate from the Statistics Canada Open Licence Agreement. Data under the MAD_PCCF_FCCP_DAM safe is made available under the End-use Licence Agreement for Postal CodeOM Conversion File, Postal CodesOM by Federal Ridings File and Postal CodeOM Conversion File Plus ("data product") (Appendix 2 of the DLI Licence agreement).

MAD_SPSDM_BDMSPS_DAM

New for 2016, the DLI team is pleased to announce the creation of the Social Policy Simulation Database and Model (SPSD/M) safe. Subfolders are labeled by version, each containing its own install files and instructions. Please consult the readme files housed within the folders for more information. Unlike other products available in the DLI Collection, SPSD/M files are separate from the Statistics Canada Open Licence Agreement. Access to the files requires a separate agreement, the Licence Agreement for the Social Policy Simulation Database and Model (SPSD/M) (Appendix 4 of the DLI Licence agreement).

The naming convention of files

Files located on the EFT site follow a common naming convention. When files are received from the author division, they are renamed to fit the DLI naming convention. The name of a file produced by the subject matter division may therefore differ from the name of the same file on the DLI EFT site.

Files are initially named according to the survey acronym followed by the year or cycle of the survey and finally the type of document.

Documents and their extensions

For example:

  • User Guide for the 2003 Household Internet Use Survey: hius2003gid.pdf
  • Questionnaire for the 2009 Survey of Household Spending: shs2009que.pdf

If a file is updated or replaced by the subject matter division, an additional suffix identifying the version number is added to the name. For example, aes1984gidv2.pdf would represent an update to the originally published document.
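
The convention described here (acronym, then year, then document type, then an optional version suffix) can be illustrated with a short parser. The Python sketch below is an assumption-laden illustration, not a DLI tool: the function and class names are hypothetical, it handles only four-digit years (cycle-based names are not covered), and it treats whatever letters follow the year as the document type without checking them against an official list.

```python
import re
from dataclasses import dataclass
from typing import Optional

# acronym + 4-digit year + document type + optional "v<number>" + extension
_NAME = re.compile(
    r"^(?P<acronym>[a-z]+)"
    r"(?P<year>\d{4})"
    r"(?P<doctype>[a-z]+?)"
    r"(?:v(?P<version>\d+))?"
    r"\.(?P<ext>[a-z0-9]+)$"
)


@dataclass
class DliFileName:
    acronym: str
    year: int
    doctype: str
    version: Optional[int]  # None when the name carries no version suffix
    extension: str


def parse_dli_name(filename: str) -> DliFileName:
    """Split a file name into the parts named in the convention."""
    match = _NAME.match(filename.lower())
    if match is None:
        raise ValueError(f"name does not fit the assumed pattern: {filename}")
    version = match.group("version")
    return DliFileName(
        acronym=match.group("acronym"),
        year=int(match.group("year")),
        doctype=match.group("doctype"),
        version=int(version) if version is not None else None,
        extension=match.group("ext"),
    )


print(parse_dli_name("aes1984gidv2.pdf"))
```

Applied to the examples in this section, hius2003gid.pdf and shs2009que.pdf parse with no version, while aes1984gidv2.pdf parses with version 2.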

The readme files

The readme file is a quick reference guide to the DLI EFT site. Once in a folder (survey, Census, Geography, etc.), the readme file provides a breakdown of the contents of the folder. This includes not only the file names, but also longer titles in order for a user to identify the file sought. The readme file also includes the size and length of the data file for PUMFs in order to perform a quick verification that the file was successfully transferred from the EFT to the user's computer.

Retrieving files from the DLI EFT site

Using FTP software, connect and log in to the DLI EFT site. The hostname is eftftp-ptftef.statcan.gc.ca; the user ID and password are provided by the DLI.

If you are using WinSCP (secure shell software), some of the default settings may cause you to experience difficulties when connecting to the DLI EFT site. If you have any problems, please contact the DLI unit.

When you have located the file(s) you wish to download, mark the files and transfer them to your computer. Make sure to designate the receiving folder on your computer before initiating the transfer.

Another tip is to set the Transfer Mode based on the type of file you are transferring. A good idea is to set your default to Auto so that the program selects the right transfer mode based on the file extension. As a general rule, all files should be downloaded in Binary mode except for files with the following extensions, which are text files: .txt, .sps, .sas and .dat.
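
For contacts who script their downloads rather than use a graphical FTP client, the same rule can be sketched with Python's standard ftplib module. This is a minimal sketch under stated assumptions: the function names are hypothetical, the exception list is taken to mean text (ASCII) mode transfers, and the connection details in the closing comment use placeholders for the credentials and path.

```python
from ftplib import FTP
from pathlib import Path, PurePosixPath

# Extensions the guide lists as exceptions to Binary mode.
TEXT_EXTENSIONS = {".txt", ".sps", ".sas", ".dat"}


def transfer_mode(filename: str) -> str:
    """Return "ascii" for the listed text extensions, otherwise "binary"."""
    suffix = PurePosixPath(filename).suffix.lower()
    return "ascii" if suffix in TEXT_EXTENSIONS else "binary"


def download(ftp: FTP, remote_path: str, local_dir: Path) -> Path:
    """Download one file, choosing the transfer mode from its extension."""
    local_dir.mkdir(parents=True, exist_ok=True)
    target = local_dir / PurePosixPath(remote_path).name
    if transfer_mode(remote_path) == "ascii":
        # retrlines delivers each line without its line ending.
        with open(target, "w", encoding=ftp.encoding, newline="") as out:
            ftp.retrlines(f"RETR {remote_path}",
                          lambda line: out.write(line + "\n"))
    else:
        with open(target, "wb") as out:
            ftp.retrbinary(f"RETR {remote_path}", out.write)
    return target


# Example use (not run here). USER_ID, PASSWORD and the remote path are
# placeholders; the credentials are the ones issued by the DLI.
#     with FTP("eftftp-ptftef.statcan.gc.ca") as ftp:
#         ftp.login(USER_ID, PASSWORD)
#         download(ftp, "/path/to/file.zip", Path("downloads"))
```

Text-mode downloads written this way end each line with a single newline character, so the byte size of a downloaded text file may differ from the size reported on the server.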

*Please note: DLI Contacts cannot delete files from the DLI EFT site.

Once you have downloaded the data files, decompress (unzip) them as necessary. The documentation may also be zipped at times, in which case it will need unzipping as well.

Confirm that the file transfer for PUMFs was successful by opening the documentation and running Maxline to verify the record length and size of the data file. Maxline is available in the Utilities folder on the EFT site.
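
Where Maxline is not at hand, a similar check can be approximated in a few lines of Python. The sketch below is not Maxline and does not claim to reproduce its output. It is a hypothetical helper that reports the file size in bytes, the number of records and the longest record length, so these can be compared by hand with the values in the readme file.

```python
from pathlib import Path
from typing import Tuple


def file_stats(path: Path) -> Tuple[int, int, int]:
    """Return (size in bytes, number of records, longest record length).

    A record is one line of the data file; line-ending characters
    are not counted in the record length.
    """
    size = path.stat().st_size
    records = 0
    longest = 0
    with open(path, "rb") as handle:
        for raw in handle:
            records += 1
            longest = max(longest, len(raw.rstrip(b"\r\n")))
    return size, records, longest
```

A call such as file_stats(Path("survey.dat")), where the file name is a placeholder, returns the three values for comparison. A mismatch in record length or size suggests the file was truncated or transferred in the wrong mode.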


The DLI Nesstar

Nesstar is a web-based data exploration, extraction and analysis tool. It lets you search for survey variables across the collection, and supports basic tabulation and analysis online. It also allows for the downloading of the PUMF files into statistical software for further analysis.

About Nesstar

As part of an international standardization effort related to social science data, the DLI is currently preparing DDI-compliant XML-based survey files. Datasets are 'tagged' according to internationally recognized documentation schema, which allows for detailed and structured information searches. The interpretability of the XML format and precise encoding ensure the preservation of data and metadata over time, and promote data sharing and accessibility within the DDI community. Nesstar Publisher is the data management program used to generate DDI files, and Nesstar is the system used for data dissemination. Authorized users can view the data and metadata through Nesstar WebView (restricted access portion of the DLI Website). Please see the DLI contact at your institution for further information.

Accessing Nesstar

When an institution signs a licence with the DLI, its IP addresses are registered and all students, researchers, and faculty have access to the site. To connect remotely, your institution will need to register the site with its proxy service.

Retrieving files from Nesstar

Nesstar enables users to browse, visualize, manipulate and extract social survey data. The platform is composed of two primary catalogues, each in English and in French:

  • Statistics Canada Public Use Microdata Files (PUMF)
    English catalogue of public use microdata files, which contain anonymized, non-aggregated data. Users can download the data file and associated metadata.
  • Statistics Canada metadata for Master Files (RDC)
    English catalogue of detailed microdata files, which contains most of the original information collected during the survey interview. Users can only browse or search the survey metadata.

Users can search surveys or statistical products across both the public use microdata files (PUMF) and the master files, search DLI variables, and browse survey datasets directly in Nesstar. To download a data file, click on the Download icon. Nesstar supports various file formats for downloading, including SPSS portable files, Stata, SAS, delimited and comma-separated values. The downloaded data file is a ZIP archive and needs to be extracted using an unzipping tool such as WinZip, PKUNZIP or equivalent. Consult the Nesstar help guide for more information.
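
Extraction can also be scripted. As one equivalent to a desktop unzipping tool, the Python sketch below uses the standard zipfile module. The function name is hypothetical, and it assumes the download is an ordinary, unencrypted ZIP archive.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_archive(archive: Path, destination: Path) -> List[str]:
    """Extract a ZIP archive and return the names of its members."""
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as bundle:
        names = bundle.namelist()
        bundle.extractall(destination)
    return names
```

The returned list makes it easy to confirm that both the data file and its accompanying metadata were included in the download.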


The WDS

The WDS (Web Data Server), or Beyond 20/20, is a web-based, multidimensional table viewer that allows users to research aggregate data. Aggregate data are statistical summaries organized in a specific data file structure which permits further computer analysis (for data processing). Aggregate data are produced to provide access to data that cannot be released as microdata. The WDS is used to manipulate this aggregate data by allowing the user to create tables, charts and graphs, and to reshape data files to fit the specific needs of the researcher. The WDS is designed to be user friendly: you do not need any knowledge of SPSS, SAS, or other statistical software.

Accessing the WDS

When an institution signs a licence with the DLI, its IP addresses are registered and all students, researchers, and faculty have access to the site. To connect remotely, your institution will need to register the site with its proxy service.


DLI Mirror Site

Data from other producers

The DLI Collection houses data products produced by Statistics Canada only. In the past, data products from other producers were accepted in the DLI Collection. However, due to language requirements and confidentiality rules at Statistics Canada, these products can no longer be housed within the DLI.

These products, such as the Canadian Heart Health Survey and the Youth Smoking Survey, are available to DLI Contacts and their Designates through the DLI Mirror site at the University of Alberta. For access credentials, please contact the DLI unit.

Requesting data not found in the DLI collection

If, while browsing the Statistics Canada website, you identify a product that you feel should be part of the DLI Collection (e.g., it is not part of the DSP or the DLI and the nature of the product is consistent with DLI guidelines), please place your request on the dlilist. The DLI Unit will assess the request and contact the author division as required. Please note that the DLI does not subsidize requests for cost-recovery custom tabulations.


Secondary Data Distributors

CANSIM on CHASS (institutional membership required)

CANSIM on CHASS is available through the University of Toronto's CHASS (Computing in the Humanities and Social Sciences) section. Institutions must be a member of CHASS to access CANSIM on CHASS.

CHASS (Computing in the Humanities and Social Sciences) (institutional membership required)

CHASS is a computing facility within the Faculty of Arts and Science, University of Toronto. It offers, among other things, a collection of social sciences and general interest databases (e.g., Canadian Census, CANSIM, IMF, World Bank Tables, etc.).

Microdata Analysis and Subsetting with SDA on CHASS (institutional membership required)

SDA @ CHASS is a set of programs for the documentation and web-based analysis of survey data. SDA also has procedures for creating customized subsets of datasets.

Abacus Dataverse Network (institutional membership required)

The Abacus Dataverse Network is the research data repository of the British Columbia Research Libraries' Data Services, a collaboration involving the Data Libraries at Simon Fraser University (SFU), the University of British Columbia (UBC), the University of Northern British Columbia (UNBC) and the University of Victoria (UVic).

ODESI - Ontario Data Documentation, Extraction Service and Infrastructure (institutional membership required)

ODESI is a digital repository for social science data, including DLI data as well as a range of public opinion polls. It is a web-based data exploration, extraction and analysis tool created by the Ontario Council of University Libraries (OCUL), and is available to authorized users from Ontario universities.  Metadata is openly available to the world.

Scholars Portal Dataverse (institutional membership required)

The Scholars Portal Dataverse is a repository primarily for research data collected by researchers and organizations affiliated with Ontario universities, although anyone in the world is welcome to use Scholars Portal Dataverse to deposit, share, and archive data.

Données statistiques et géographiques

Developed by the Quebec university libraries, the Données statistiques et géographiques site enables access to DLI geographic products and aggregate data. Access to data is restricted to students, researchers and professors at the participating universities.


Other Sources of Data

CANSIM (Canadian Socio-economic Information Management System)

CANSIM is Statistics Canada's key socio-economic database. Updated daily, it provides fast and easy access to a large range of the most up-to-date statistics available in Canada. Alternative access to CANSIM data is available through secondary distributors such as CANSIM on CHASS.

The Research Data Centres (RDC) Program

The Research Data Centres (RDCs) provide researchers with direct access to a wide range of population and household surveys, as well as administrative microdata files in a secure university setting. The centres are staffed by Statistics Canada employees and are accessible only to researchers with approved projects who have been sworn in under the Statistics Act as 'deemed employees.' RDCs are located throughout the country, so researchers do not have to travel to Ottawa to access Statistics Canada microdata. In some situations access fees may be charged.

The Real Time Remote Access (RTRA) system

The RTRA system is an on-line remote access service allowing users to run SAS programs in real time against microdata files located in a central and secure location. Researchers using the RTRA system do not gain direct access to the microdata and cannot view the content of the microdata file. Instead, users submit SAS programs to extract results in the form of frequency tables. As RTRA researchers cannot view the microdata, becoming a deemed employee of Statistics Canada is not necessary. There is a subscription fee to obtain access to the RTRA service, and there is no requirement to submit proposals.

Open Data - Government of Canada

Search open data that is relevant to Canadians, learn how to work with datasets, and see what people have done with open data across the country.

Canadian Century Research Infrastructure

The CCRI is a pan-Canadian, multi-disciplinary and multi-institutional effort to develop a set of interrelated databases centered on data from Canadian censuses from 1911 to 1951. A gateway website is hosted at the University of Alberta, providing access to microdata, a geographical framework constructed to enable the location, selection, aggregation, and analysis of census data, and contextual data.

ICPSR – Inter-university Consortium for Political and Social Research

Located at the University of Michigan, ICPSR maintains and provides access to a vast archive of social science data for research and instruction, and offers training in quantitative methods to facilitate effective data use.


Citing Data

The importance of citing data

There are good reasons to develop the habit of including clear, complete references in your document or bibliography. Bibliographic references are important when you are using the data or ideas of others in your written work. References credit your sources and permit your readers to find those sources. Additional information about the importance of citing sources is available in the Why we cite section of How to Cite Statistics Canada Products.

How to Cite Statistics Canada Products

The How to Cite Statistics Canada Products guide has been developed for authors, editors, researchers, academics, students, librarians and data librarians. It describes, in three steps, how to build your reference when citing Statistics Canada products.

This guide began as a project of Statistics Canada's Data Liberation Initiative. The editors acknowledge the work of Gaëtan Drolet from Université Laval (Quebec), a valuable collaborator, who undertook the initial development of this guide to address the needs of his peers.