Statistics Canada

Types of data collection

Data can be collected in three main ways: censuses, sample surveys, and administrative data. Each has advantages and disadvantages. As a student, you may be required to collect data at some point, and the method you choose will depend on a number of factors.

Census

A census refers to data collection about every unit in a group or population. If you collected data about the height of everyone in your class, that would be regarded as a class census. There are various reasons why a census may or may not be chosen as the method of data collection:

Advantages (+)

Sampling variance is zero: There is no sampling variability attributed to the statistic because it is calculated using data from the entire population.

Detail: Detailed information about small sub-groups of the population can be made available.

Disadvantages (–)

Cost: In terms of money, conducting a census for a large population can be very expensive.

Time: A census generally takes longer to conduct than a sample survey.

Response burden: Information needs to be received from every member of the target population.

Control: A census of a large population is such a huge undertaking that it makes it difficult to keep every single operation under the same level of scrutiny and control.
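The class census described above can be sketched in a few lines of Python. This is only an illustration: the student names, seating rows and heights are invented, and the helper names census_mean and subgroup_mean are hypothetical. Because every unit is measured, the class mean is a single fixed value, and even a small sub-group (here, one row of the classroom) can be reported in full.

```python
from statistics import mean

# Hypothetical class of 10 students: (name, classroom row, height in cm).
# All values are invented for illustration.
students = [
    ("Ana", "front", 150), ("Ben", "front", 162), ("Chloe", "front", 158),
    ("Dev", "middle", 171), ("Emma", "middle", 165), ("Finn", "middle", 168),
    ("Gia", "middle", 160), ("Hugo", "back", 175), ("Ivy", "back", 169),
    ("Jon", "back", 172),
]

def census_mean(units):
    """Mean height computed from every unit in the group."""
    return mean(height for _, _, height in units)

def subgroup_mean(units, row):
    """Mean height for one sub-group, using every unit that belongs to it."""
    return mean(height for _, r, height in units if r == row)

class_mean = census_mean(students)              # no sampling involved
back_row_mean = subgroup_mean(students, "back")  # detail for a small sub-group
print(class_mean, back_row_mean)
```

Running the calculation again on the same class always gives the same result, which is what "sampling variance is zero" means in practice.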

Example 1: The Census

Sample survey

In a sample survey, only part of the total population is approached for data. If you collected data about the height of 10 students in a class of 30, that would be a sample survey of the class rather than a census. Reasons one may or may not choose to use a sample survey include:

Advantages (+)

Cost: A sample survey costs less than a census because data are collected from only part of a group.

Time: Results are obtained far more quickly from a sample survey than from a census, because fewer units are contacted and there is less to process.

Response burden: Fewer people have to respond in the sample.

Control: The smaller scale of this operation allows for better monitoring and quality control.

Disadvantages (–)

Sampling variance is non-zero: Estimates may be less precise because they are based on a sample of the population rather than on the entire population. A different sample would give a somewhat different result.

Detail: The sample may not be large enough to produce information about small population sub-groups or small geographical areas.
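To see why sampling variance is non-zero, the sketch below repeatedly draws a simple random sample of 10 students from a class of 30 and compares each sample mean with the census mean. The heights are made-up values (150 cm to 179 cm), the seed is arbitrary, and the function name sample_mean is hypothetical. It illustrates the idea only and does not represent any particular survey design.

```python
import random
from statistics import mean, pvariance

rng = random.Random(42)  # arbitrary fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of all 30 students in a class.
population = [150 + i for i in range(30)]
census_mean = mean(population)  # fixed value: every unit is included

def sample_mean(pop, n, rng):
    """Mean height from a simple random sample of n units, drawn without replacement."""
    return mean(rng.sample(pop, n))

# Draw many samples of 10 students and record the estimate from each one.
sample_means = [sample_mean(population, 10, rng) for _ in range(1000)]

# The estimates differ from sample to sample; their spread is the sampling variance.
sampling_variance = pvariance(sample_means)

print("census mean:", census_mean)
print("lowest and highest sample mean:", min(sample_means), max(sample_means))
print("variance of the sample means:", round(sampling_variance, 2))
```

Each sample of 10 gives a slightly different estimate, although on average the estimates sit close to the census mean. Drawing larger samples would narrow the spread, at the price of more cost and response burden.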

Example 2: A sample survey

Administrative data

Administrative data are collected as a result of an organization's day-to-day operations. Examples include data on births, deaths, marriages, divorces and car registrations. For example, prior to being issued a marriage license, a couple must provide the registrar with information about their age, sex, birthplace, address and previous marital status. These administrative files can be used later as a substitute for a sample survey or a census.

Advantages (+)

Sampling variance is zero: There is no sampling variability attributed to the statistic because it is calculated using data from every unit on file rather than from a sample.

Time series: Data are collected on an ongoing basis, allowing for trend analysis.

Simplicity: Administrative data may eliminate the need to design a census or survey and the associated work.

Response burden: Since the data are already collected, there is no additional burden on the respondents.

Disadvantages (–)

Flexibility: Unlike survey data, administrative data may be limited to the items that are essential for administration.

Population: Data are limited to the population on whom the administrative records are kept.

Change over time: Definitions are created to serve specific administrative purposes, and they often change over time. The statistician must check whether the definitions used in a file have changed before comparing data from different periods.

Concepts and definitions: The definitions are established by those who create and manage the file for their own purposes. For example, income definitions may not include everything a user expects to see.

Data quality: The emphasis placed on data quality may differ from organization to organization. This may be evident when someone relies on data collected from another organization.
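As a rough illustration of the time-series advantage listed above, the sketch below counts records per year in a small, invented extract of marriage registrations and computes the year-over-year change. The record layout and field names (year, previous_status) are assumptions made for this example and do not describe any real registry file.

```python
from collections import Counter

# Hypothetical registry extract: one record per marriage licence issued.
# Field names and values are invented for illustration.
records = [
    {"year": 2019, "previous_status": "single"},
    {"year": 2019, "previous_status": "divorced"},
    {"year": 2019, "previous_status": "single"},
    {"year": 2020, "previous_status": "single"},
    {"year": 2021, "previous_status": "widowed"},
    {"year": 2021, "previous_status": "single"},
]

# Records accumulate as part of day-to-day operations, so a yearly count
# is available without designing a separate survey.
counts = Counter(record["year"] for record in records)
years = sorted(counts)

# Year-over-year change; this assumes the years on file are consecutive.
changes = {year: counts[year] - counts[year - 1] for year in years[1:]}

for year in years:
    print(year, counts[year], changes.get(year, "n/a"))
```

The same sketch also shows the limitations noted above: only the items the registrar chose to record are available, and only for couples who appear in the file.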

Example 3: Administrative data