3.2 Sampling
3.2.3 Non-probability sampling

Non-probability sampling is a method of selecting units from a population using a subjective (i.e. non-random) method. Since non-probability sampling does not require a complete survey frame, it is a fast, easy and inexpensive way of obtaining data. However, in order to draw conclusions about the population from the sample, one must assume that the sample is representative of the population. This assumption is often risky in non-probability sampling because it is difficult to assess whether it holds. In addition, since elements are chosen arbitrarily, there is no way to estimate the probability of any one element being included in the sample. Also, no assurance is given that each item has a chance of being included, making it impossible either to estimate sampling variability or to identify possible bias.

In general, official statistical agencies around the world have used probability sampling as their preferred tool to meet information needs about a population of interest. In the last few years, however, there has been research on how to apply non-probability sampling in official statistics, and the use of other data sources has been increasingly explored. There are five key reasons behind this trend:

  • the decline in response rates in probability surveys;
  • the high cost of data collection;
  • the increased burden on respondents;
  • the desire for access to real-time statistics; and
  • the surge of non-probability data sources such as web surveys and social media.

Some have suggested the possibility of a shift in the paradigm and traditional approach to statistics. However, data from non-probability sources have a few challenges with respect to data quality, including the potential presence of participation and selection bias. Therefore, data collected using non-probability sampling should be used with extra caution.

The commonly used non-probability sampling methods include the following.

Convenience or haphazard sampling

Units are selected in an arbitrary manner with little or no planning involved. Haphazard sampling assumes that the population units are all alike, so that any unit may be chosen for the sample. An example of haphazard sampling is the vox pop survey, where the interviewer selects any person who happens to walk by. Unfortunately, unless the population units are truly similar, selection is subject to the biases of the interviewer and depends on whoever happened to walk by at the time of sampling.

Volunteer sampling

In this method, all respondents are volunteers. Generally, volunteers must be screened so as to get a set of characteristics suitable for the purposes of the survey (e.g. individuals with a particular disease). This method can be subject to large selection biases, but is sometimes necessary. For example, for ethical reasons, volunteers with particular medical conditions may have to be solicited for some medical experiments.

Another example of volunteer sampling is callers to a radio or television show, when an issue is discussed and listeners are invited to call in to express their opinions. Only the people who care strongly enough about the subject one way or another tend to respond. The silent majority does not typically respond, resulting in a large selection bias. Volunteer sampling is often used to select individuals for focus groups or in-depth interviews (i.e. for qualitative testing, where no attempt is made to generalize to the whole population).
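
The call-in example can be illustrated with a small sketch. The audience, the opinion scale and the rule that only listeners with strong views phone in are all invented assumptions, chosen only to show how self-selection can distort an estimate.

```python
# Hypothetical audience of 100 listeners. Each opinion score runs from
# -2 (strongly against) to +2 (strongly in favour). All counts are invented.
audience = [-2] * 5 + [-1] * 10 + [0] * 50 + [1] * 30 + [2] * 5


def calls_in(opinion):
    """Assumed response rule: only listeners with strong views phone in."""
    return abs(opinion) == 2


def share_against(opinions):
    """Proportion of listeners whose opinion score is negative."""
    return sum(1 for o in opinions if o < 0) / len(opinions)


# The volunteer sample consists only of those who chose to call.
callers = [o for o in audience if calls_in(o)]

print(f"Against, whole audience: {share_against(audience):.0%}")
print(f"Against, callers only:   {share_against(callers):.0%}")
```

Under these assumptions, 15% of the audience is against the proposal, but half of the callers are. The silent majority in the middle never appears in the sample.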

Judgment sampling

With this method, sampling is done based on previous ideas of population composition and behaviour. An expert with knowledge of the population decides which units in the population should be sampled. In other words, the expert purposely selects what is considered to be a representative sample. Judgment sampling is subject to the researcher’s biases and is perhaps even more biased than haphazard sampling.

Since any preconceptions the researcher has are reflected in the sample, large biases can be introduced if these preconceptions are inaccurate. However, judgment sampling can be useful in exploratory studies, for example in selecting members for focus groups or in-depth interviews to test specific aspects of a questionnaire.

Quota sampling

This is one of the most common forms of non-probability sampling. Sampling is done until a specific number of units (quotas) for various subpopulations have been selected. Quota sampling is a means for satisfying sample size objectives for the subpopulations.

The quotas may be based on population proportions. For example, if there are 100 men and 100 women in the population and a sample of 20 is to be drawn, 10 men and 10 women may be interviewed. Quota sampling can be considered preferable to other forms of non-probability sampling (e.g. judgment sampling) because it forces the inclusion of members of different subpopulations.
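
The proportional allocation in this example can be sketched as follows. The function name `proportional_quotas` and the largest-remainder rounding rule are assumptions made for illustration; the text does not prescribe how fractional quotas should be rounded.

```python
def proportional_quotas(group_sizes, sample_size):
    """Allocate sample_size across groups in proportion to population size.

    Largest-remainder rounding is an assumption made here so that the
    quotas always add up to sample_size.
    """
    total = sum(group_sizes.values())
    exact = {g: sample_size * n / total for g, n in group_sizes.items()}
    # Start from the whole-number part of each exact share.
    quotas = {g: int(x) for g, x in exact.items()}
    leftover = sample_size - sum(quotas.values())
    # Give the remaining units to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - quotas[g], reverse=True)
    for g in by_remainder[:leftover]:
        quotas[g] += 1
    return quotas


print(proportional_quotas({"men": 100, "women": 100}, 20))
# {'men': 10, 'women': 10}
```

The allocation only fixes how many units are needed per subpopulation. It says nothing about which units are chosen, which is where quota sampling departs from probability sampling.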

Quota sampling is somewhat similar to stratified sampling, which is a probability sampling method, in that similar units are grouped together. However, it differs in how the units are selected. In probability sampling, the units are selected randomly, while in quota sampling a non-random method is used: it is usually left up to the interviewer to decide who is sampled. Contacted units that are unwilling to participate are simply replaced by units that are willing, which in effect ignores nonresponse bias. Market researchers often use quota sampling (particularly for telephone surveys) instead of stratified sampling to survey individuals with particular socio-economic profiles. This is because, compared with stratified sampling, quota sampling is relatively inexpensive and easy to administer, and it has the desirable property of satisfying population proportions. However, it disguises potentially significant selection bias.
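
The difference in selection mechanism can be made concrete with a rough sketch. The frame, the `willing` flag and both function names are hypothetical. Here the quota interviewer simply takes the first willing people encountered, while the stratified design draws at random within each group and keeps refusals in the selected sample so that nonresponse remains visible.

```python
import random


def quota_sample(arrivals, quotas):
    """Fill quotas in arrival order; unwilling units are skipped and replaced."""
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        group = person["group"]
        if remaining.get(group, 0) > 0 and person["willing"]:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota has been met
    return sample


def stratified_sample(frame, quotas, rng):
    """Draw a simple random sample of the required size within each stratum."""
    sample = []
    for group, n in quotas.items():
        stratum = [p for p in frame if p["group"] == group]
        sample.extend(rng.sample(stratum, n))
    return sample


# Hypothetical frame: 100 men, then 100 women; every third person would refuse.
frame = [
    {"id": i, "group": "men" if i < 100 else "women", "willing": i % 3 != 0}
    for i in range(200)
]
quotas = {"men": 10, "women": 10}

by_quota = quota_sample(frame, quotas)
by_strata = stratified_sample(frame, quotas, random.Random(0))

print("Quota sample ids:     ", [p["id"] for p in by_quota])
print("Stratified sample ids:", sorted(p["id"] for p in by_strata))
print("Refusals selected by stratified design:",
      sum(1 for p in by_strata if not p["willing"]))
```

Both samples satisfy the same quotas, but the quota sample contains only willing people met early on, and no record is left of the refusals that were replaced.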

As with all other non-probability sample designs, in order to make inferences about the population, it is necessary to assume that persons selected are similar to those not selected. Such strong assumptions are rarely valid.

Snowball or network sampling

Suppose a researcher wishes to find rare individuals in the population, and already knows of the existence of some of these individuals and how to contact them. One approach is to contact those individuals and simply ask them if they know anyone like themselves, then contact those people, and so on. The sample grows like a snowball rolling down a hill, ideally until it includes virtually everybody with that characteristic. Snowball sampling is useful for rare or hard-to-reach populations, such as people with disabilities, homeless people, drug users, or other persons who may not belong to an organised group, as well as groups such as musicians, painters or poets who are not readily identified on a survey list frame. However, some individuals or subgroups may have no chance of being sampled. Generalizing the conclusions to the whole population requires assumptions that are usually not met.
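
The referral process can be sketched as a breadth-first traversal of a contact network. The network, the names and the `max_waves` parameter below are hypothetical and serve only to illustrate the mechanism.

```python
from collections import deque


def snowball(referrals, seeds, max_waves=None):
    """Recruit wave by wave: each recruited person names further contacts."""
    recruited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        person, wave = queue.popleft()
        if max_waves is not None and wave >= max_waves:
            continue  # stop asking for referrals after the last allowed wave
        for contact in referrals.get(person, []):
            if contact not in recruited:
                recruited.add(contact)
                queue.append((contact, wave + 1))
    return recruited


# Hypothetical referral network: who each person would name if asked.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
    "E": ["F"],  # E and F know only each other
    "F": ["E"],
}

sample = snowball(referrals, seeds=["A"])
print(sorted(sample))                   # ['A', 'B', 'C', 'D']
print(sorted(set(referrals) - sample))  # ['E', 'F']
```

In this sketch E and F are never reached from the seed A, which illustrates how individuals who are not connected to the initial contacts have no chance of being sampled.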

Crowdsourcing

Crowdsourcing has been defined slightly differently by researchers from various areas. Despite the multiplicity of definitions for crowdsourcing, one constant has been the broadcasting of a problem to the public, and an open call for contributions to help solve the problem. Members of the public submit solutions that are then owned by the entity (e.g. individuals, companies, or organizations) that originally broadcast the problem. Crowdsourcing channels experts’ desire to solve a problem and then freely shares the answer with everyone.

As part of Statistics Canada’s modernization, crowdsourcing has become an innovative way to collect valuable information for statistical purposes. By using crowdsourcing as the only collection method, surveys can be executed quickly with reduced cost and response burden. To better understand the challenges associated with crowdsourcing and to ensure that the results are of good quality, methods are being developed to compare and validate the data with other sources of complementary data. A couple of examples are outlined below.

  • As part of the OpenStreetMap (OSM) pilot project, which was completed in March 2018, crowdsourced geographic information was collected by mapping the building footprints in the Ottawa, Ontario and Gatineau, Quebec areas. The network and experience of this pilot project helped to launch the Building Canada 2020 initiative (BC2020), aimed at mapping all building footprints of Canada on OSM by the year 2020.
  • During the COVID-19 pandemic, Statistics Canada developed a series of initiatives to generate data and analysis quickly and effectively via crowdsourcing to help fill the data gaps on the economic and social impact of COVID-19 on Canadians. For example, the survey Impacts of COVID-19 on Canadians collected data from April 3 to 9, 2020. Close to 200,000 people living in Canada voluntarily answered the survey, which focused on behaviour and attitudes related to COVID-19. A series of results was then released over the following weeks.

Web panels

A web panel (or online or internet panel) could be defined as an access panel of people willing to respond to web questionnaires. It contains a sample of potential respondents who declare that they will cooperate for future data collection if selected. A web panel survey is a survey utilizing samples from web panels.

Web panels can be seen as sampling frames for web panel surveys. All persons in the panels must have up-to-date e-mail addresses. Recruitment for web panels can be done in different ways. Respondents can be sourced from offline channels: telephone, TV ads, radio ads, ads in newspapers and magazines, addressed letters, outdoor posters, customer registers, etc. Respondents can also be sourced from online channels: e-mails, websites, banners, community sites, member programs, etc. Often, many channels are used in order to achieve the necessary diversity. After recruitment, a profile survey is conducted to collect information on the new panel participants. Panels can be either probability-based or self-recruited. In practice, the distinction between these two may not be very important if the nonresponse rate is very high for the probability-based panels. Sometimes incentives, such as gift cards or souvenirs, are used to attract people and boost response rates. Web panels are often used for marketing research or pilot studies.

During the COVID-19 pandemic, Statistics Canada developed a new web panel survey, the Canadian Perspectives Survey Series (CPSS), to get timely information about how Canadians are coping with COVID-19. More than 4,600 people in the 10 provinces responded to this survey between March 29 and April 3, 2020. Unlike most web panels, CPSS is a probabilistic panel based on the Labour Force Survey (LFS), as some respondents agreed to complete short online questionnaires following their participation in the LFS. CPSS enables Statistics Canada to collect important information from Canadians more efficiently, more rapidly and at a lower cost, compared with traditional survey methods.

Advantages and disadvantages of non-probability sampling

Advantages
  • Quick and convenient
    As a general rule, non-probability samples can be constituted quickly, which allows the survey to be launched, executed and finished in less time.
  • Inexpensive
    It usually takes an interviewer only a few hours to conduct such a survey. As well, non-probability samples are generally not spread out geographically, so travelling expenses for interviewers are low. In web panels or crowdsourcing, no interviewers are necessary. Tracing and persuading non-respondents is either not required or less demanding.
  • Reduced respondent burden
    In the case of volunteer sampling or crowdsourcing, respondents volunteer to participate in the survey without being solicited personally.
Disadvantages
  • Selection bias
    Making inferences about the population requires strong assumptions about the similarity between the sample and the population, even though the respondents are self-selected. Due to the selection bias present in all non-probability samples, these are often dangerous assumptions to make. When results are to be generalized to the whole population, probability sampling should be performed instead.
  • Noncoverage (undercoverage) bias
    Since some units in the population can have no chance of being included in the sample, noncoverage bias results. For example, people without the internet at home might never be selected for a web panel and may differ from those with the internet.
  • Difficulty of assessing the quality
    It is impossible to determine the probability that a unit in the population is selected for the sample, so reliable estimates and estimates of sampling error cannot be computed.
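
The noncoverage example above can be made concrete with a small numerical sketch. The population, the variable (weekly hours of television) and all values are invented assumptions. The sketch compares the mean over the whole population with the mean over the part that a web panel could reach.

```python
# Hypothetical population of 100 people: (has_internet_at_home, weekly_tv_hours).
# All values are invented purely for illustration.
population = [(True, 10)] * 80 + [(False, 25)] * 20


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


covered = [hours for online, hours in population if online]
not_covered = [hours for online, hours in population if not online]

true_mean = mean([hours for _, hours in population])
panel_mean = mean(covered)  # the best a web panel could estimate
bias = panel_mean - true_mean

# The bias equals the uncovered share times the difference between the
# covered and uncovered means.
uncovered_share = len(not_covered) / len(population)
decomposed = uncovered_share * (mean(covered) - mean(not_covered))

print(f"Population mean:   {true_mean:.1f} hours")
print(f"Web panel mean:    {panel_mean:.1f} hours")
print(f"Noncoverage bias:  {bias:.1f} hours")
print(f"Share x difference: {decomposed:.1f} hours")
```

Under these assumptions the bias vanishes only if the uncovered group is empty or has the same mean as the covered group. Neither condition can be checked from the panel data alone.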
