Data can be collected using three main methods: censuses, sample surveys, and administrative data. Each has advantages and disadvantages. As students, you may be required to collect data at some time. The method you choose will depend on a number of factors.
A census refers to data collection about every unit in a group or population. If you collected data about the height of everyone in your class, that would be regarded as a class census. There are various reasons why a census may or may not be chosen as the method of data collection:
Sampling variance is zero: There is no sampling variability attributed to the statistic because it is calculated using data from the entire population.
Detail: Detailed information about small sub-groups of the population can be made available.
Cost: In terms of money, conducting a census for a large population can be very expensive.
Time: A census generally takes longer to conduct than a sample survey.
Response burden: Information needs to be received from every member of the target population.
Control: A census of a large population is such a huge undertaking that it makes it difficult to keep every single operation under the same level of scrutiny and control.
In a sample survey, only part of the total population is approached for data. If you collected data about the height of 10 students in a class of 30, that would be a sample survey of the class rather than a census. Reasons one may or may not choose to use a sample survey include:
Cost: A sample survey costs less than a census because data are collected from only part of a group.
Time: Results are obtained far more quickly for a sample survey than for a census. Fewer units are contacted and less data needs to be processed.
Response burden: Fewer people have to respond in the sample.
Control: The smaller scale of this operation allows for better monitoring and quality control.
Sampling variance is non-zero: The estimates may not be as precise because they are calculated from a sample rather than from the total population, so a different sample could give a different result.
Detail: The sample may not be large enough to produce information about small population sub-groups or small geographical areas.
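The difference between zero and non-zero sampling variance can be illustrated with the class example used above. The following is a minimal sketch, not a prescribed method: the heights are randomly generated stand-ins for real measurements, and the function names `census_mean` and `sample_means` are hypothetical names chosen for this illustration.

```python
import random
import statistics

def census_mean(heights):
    """Mean height computed from every unit in the population (a census)."""
    return statistics.mean(heights)

def sample_means(heights, sample_size, draws, seed=0):
    """Mean height from several independent simple random samples."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(heights, sample_size))
        for _ in range(draws)
    ]

# Made-up heights (in cm) for a class of 30 students; illustrative only.
data_rng = random.Random(42)
class_heights = [round(data_rng.gauss(165, 8), 1) for _ in range(30)]

# Repeating the census always gives the same statistic.
census_results = [census_mean(class_heights) for _ in range(5)]

# Repeating a sample survey of 10 students gives a different estimate each time.
survey_results = sample_means(class_heights, sample_size=10, draws=5)

print("Census means:", census_results)
print("Sample means:", survey_results)
print("Variance across censuses:", statistics.pvariance(census_results))
print("Variance across samples: ", statistics.pvariance(survey_results))
```

Because the census uses all 30 heights every time, its results do not vary from one repetition to the next. The sample means differ from draw to draw, which is the sampling variability described above.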
Administrative data are collected as a result of an organization's day-to-day operations. Examples include data on births, deaths, marriages, divorces and car registrations. For example, prior to being issued a marriage license, a couple must provide the registrar with information about their age, sex, birthplace, address and previous marital status. These administrative files can be used later as a substitute for a sample survey or a census. Reasons one may or may not choose to use administrative data include:
Sampling variance is zero: There is no sampling variability attributed to the statistic because it is calculated using data from the entire population on whom records are kept.
Time series: Data are collected on an ongoing basis, allowing for trend analysis.
Simplicity: Administrative data may eliminate the need to design a census or survey and the associated work.
Response burden: Since the data are already collected, there is no additional burden on the respondents.
Flexibility: Data items may be limited to essential administrative information, whereas a survey can ask questions designed for the user's purpose.
Population: Data are limited to the population on whom the administrative records are kept.
Change over time: Definitions are created to serve specific purposes, but they often change and evolve over time. The statistician must be aware that the definitions used in these files may change.
Concepts and definitions: The definitions are established by those who create and manage the file for their own purposes. For example, income definitions may not include everything a user expects to see.
Data quality: The emphasis placed on data quality may differ from organization to organization. This may become evident when someone relies on data collected by another organization.
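As a rough illustration of the time series advantage, administrative records that accumulate year after year can be counted by year to look for a trend. The sketch below assumes a hypothetical list of marriage registration records. The field names and all values are invented for illustration and do not come from any real registry.

```python
from collections import Counter

# Hypothetical marriage registration records; values are invented.
records = [
    {"year": 2019, "age": 29, "birthplace": "A"},
    {"year": 2019, "age": 34, "birthplace": "B"},
    {"year": 2020, "age": 41, "birthplace": "A"},
    {"year": 2021, "age": 27, "birthplace": "C"},
    {"year": 2021, "age": 31, "birthplace": "A"},
    {"year": 2021, "age": 38, "birthplace": "B"},
]

def registrations_per_year(records):
    """Count records by year, returned in chronological order."""
    counts = Counter(record["year"] for record in records)
    return dict(sorted(counts.items()))

for year, count in registrations_per_year(records).items():
    print(year, count)
```

The counts describe only the people who appear in the file, and only the items the registrar chose to record, which reflects the population and flexibility limitations listed above.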