Annex C
Adult Literacy and Life Skills Survey Methodology

Survey methodology
Assessment design
Target population and sample frame
Sample design
Sample size
Data collection
Data processing
Scoring of tasks
Survey response and weighting

Survey methodology

Each participating country was required to design and implement the Adult Literacy and Life Skills (ALL) survey according to the standards provided in the document 'Standards and Guidelines for the Design and Implementation of the Adult Literacy and Life Skills Survey.' These ALL standards established the minimum survey design and implementation requirements for the following project areas:

  1. Survey planning
  2. Target population
  3. Method of data collection
  4. Sample frame
  5. Sample design
  6. Sample selection
  7. Literacy assessment design
  8. Background questionnaire
  9. Task booklets
  10. Instrument requirements to facilitate data processing
  11. Data collection
  12. Respondent contact strategy
  13. Response rate strategy
  14. Interviewer hiring, training, supervision
  15. Data capture
  16. Coding
  17. Scoring
  18. ALL data file format and editing
  19. Weighting
  20. Estimation
  21. Confidentiality
  22. Survey documentation
  23. Pilot Survey

Assessment design

The participating countries, with the exception of the state of Nuevo Leon in Mexico, implemented an ALL assessment design. Nuevo Leon assessed literacy using the International Adult Literacy Survey (IALS) assessment instruments.

In both ALL and IALS, a Balanced Incomplete Block (BIB) assessment design was used to measure the skill domains. Under the BIB design, the full set of assessment tasks was organized into smaller sets of tasks, or blocks. Each block contained assessment items from one of the skill domains and covered a wide range of difficulty, i.e., from easy to difficult. The blocks of items were organized into task booklets according to the BIB design. Individual respondents were not required to complete the entire set of tasks; instead, each respondent was randomly administered one of the task booklets.

ALL assessment

The ALL psychometric assessment consisted of the domains Prose, Document, Numeracy, and Problem Solving. The assessment included four 30-minute blocks of Literacy items (i.e., Prose and Document Literacy), two 30-minute blocks of Numeracy items, and two 30-minute blocks of Problem-Solving items.

A four-domain ALL assessment was implemented in Australia, Bermuda, Canada, Hungary, Italy, the Netherlands, New Zealand, Norway, and the French and German language regions of Switzerland. The United States and the Italian language region of Switzerland carried out a three-domain ALL assessment that excluded the Problem Solving domain. In addition to the assessment domains mentioned above, these participating countries assessed the use of information and communication technology via survey questions incorporated in the ALL Background Questionnaire.

The blocks of assessment items were organized into 28 task booklets in the case of the four-domain assessment and into 18 task booklets for the three-domain assessment. The assessment blocks were distributed to the task booklets according to a BIB design whereby each task booklet contained two blocks of items. The task booklets were randomly distributed amongst the selected sample. In addition, the data collection activity was closely monitored in order to obtain approximately the same number of complete cases for each task booklet, except for two task booklets in the three-domain assessment containing only Numeracy items that required a larger number of complete cases.
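
To illustrate the booklet construction, the short Python sketch below shows how pairing eight half-hour blocks two at a time yields 28 distinct two-block booklets, and how one booklet would then be randomly administered to each respondent. The block labels, and the assumption that every possible pair of blocks forms a booklet, are illustrative only and do not reproduce the actual ALL pairing scheme.

# Illustrative sketch only: pairing 8 assumed half-hour blocks two at a time
# yields C(8, 2) = 28 two-block booklets; one booklet is then assigned at
# random to each sampled respondent.
import itertools
import random

# Assumed blocks for the four-domain design:
# 4 literacy (prose + document), 2 numeracy, 2 problem solving.
blocks = ["L1", "L2", "L3", "L4", "N1", "N2", "P1", "P2"]

# Every unordered pair of blocks forms one booklet.
booklets = list(itertools.combinations(blocks, 2))
assert len(booklets) == 28

def assign_booklet(rng):
    """Randomly administer one of the 28 booklets to a respondent."""
    return rng.choice(booklets)

rng = random.Random(42)
print([assign_booklet(rng) for _ in range(5)])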

IALS assessment

The state of Nuevo Leon, Mexico carried out an IALS assessment. The IALS assessment consisted of three literacy domains: Prose, Document, and Quantitative. In addition, the ALL Background Questionnaire was used in Nuevo Leon. The use of information and communication technology was assessed via survey questions incorporated in the ALL Background Questionnaire.

IALS employed seven task booklets with three blocks of items per booklet. The task booklets were randomly distributed amongst the selected sample. In addition, the data collection activity was monitored in order to obtain approximately the same number of complete cases for each task booklet.

Target population and sample frame

Each participating country designed a sample to be representative of its civilian, non-institutionalized population aged 16 to 65 (inclusive).

Countries were also at liberty to include adults over the age of 65 in the sample, provided that a suggested minimum sample size requirement was satisfied for the 16 to 65 age group. Canada opted to include adults over the age of 65 in its target population. All remaining countries restricted the target population to the 16 to 65 age group.

Exclusions from the target population for practical operational reasons were acceptable provided a country's survey population did not differ from the target population by more than five per cent, i.e., provided exclusions due to undercoverage amounted to no more than five per cent of the target population. All countries indicated that this five per cent requirement was satisfied.

Each country chose or developed a sample frame to cover the target population. The following table shows the sample frame and the target population exclusions for each country:

Table C.1 Sample frame and target population exclusions

Sample design

Each participating country was required to use a probability sample representative of the national population aged 16 to 65. Of course, the available sampling frames and resources varied from one country to another. Therefore, the particular probability sample design to be used was left to the discretion of each country. Each country's proposed sample design was reviewed by Statistics Canada to ensure that the sample design standards and guidelines were satisfied.

Each country's sample design is summarized below. The sample size and response rate for each country can be found in the section following this one.

Australia

The sample was based on the population master sample, which is the standard household survey design used by the Australian Bureau of Statistics (ABS). The population master sample, redesigned and selected once every 5 years, is a stratified, multi-stage cluster sample design. Stratification is based on 8 states/territories and 17 area types within each state/territory. Area types are based on part of state (i.e., state capital city or balance of state), region, population density and remoteness.

The ALL sample included four stages of sampling. The first stage sampling units were Census Collection Districts (CDs), the second stage sampling units were blocks (which are small areas within CDs), the third stage sampling units were clusters of dwellings, and the final stage units were the eligible household members.

The ALL sample was allocated proportionally to the standard ABS household survey sample. As in the standard ABS household surveys, the sample was allocated within states to ensure equal probability of selection for all households in a state. The allocation of the sample between states was a compromise between accurate national estimates and usable estimates for the smaller states; as such, the probability of selection differed between states.

The first stage of selection involved selecting CDs systematically from an ordered list, with probability proportional to size (PPS) and without replacement. The list of CDs was ordered using 'serpentine ordering,' a method of ranking CDs intended to maximize the geographical distance between selected CDs and thereby increase the heterogeneity of individual samples. The second stage of selection was a PPS selection of one block without replacement from each selected CD. In the third stage, a cluster of dwellings in the block was selected using systematic equal probability sampling. At the final stage, one person within the selected household was randomly selected from the list of in-scope household members.
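
The Python sketch below illustrates the general technique of systematic PPS selection from an ordered list as described above; it is not the ABS production algorithm, the CD sizes are invented, and the list is assumed to be already in serpentine order.

# Systematic probability-proportional-to-size (PPS) selection from an ordered
# list. Units whose size exceeds the sampling interval would need separate
# (certainty) treatment in a real design.
import random
from itertools import accumulate

def systematic_pps(sizes, n, rng):
    """Return indices of n units selected with probability proportional to size."""
    total = sum(sizes)
    interval = total / n                      # sampling interval in size units
    start = rng.uniform(0, interval)          # random start within the first interval
    targets = [start + k * interval for k in range(n)]
    cum = list(accumulate(sizes))             # cumulative size boundaries
    selected, idx = [], 0
    for t in targets:
        while cum[idx] <= t:                  # advance to the unit covering point t
            idx += 1
        selected.append(idx)
    return selected

rng = random.Random(7)
cd_sizes = [120, 340, 95, 210, 180, 400, 150, 275, 90, 310]   # dwellings per CD (invented)
print(systematic_pps(cd_sizes, n=3, rng=rng))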

Bermuda

A two-stage stratified probability design was employed. In stage one Bermuda's Land Valuation List of dwellings was stratified by parish, i.e., geographic region. Within each parish, a random sample of dwellings was selected with probability proportional to the number of parish dwellings. At stage two, one eligible respondent was selected using a Kish-type person selection grid.
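
As a rough illustration of a Kish-type selection grid, the Python sketch below lists the eligible household members in a fixed order and uses a random number pre-assigned to the household to pick one member with equal probability. Real Kish grids use printed look-up tables keyed to the questionnaire; the household roster and random number here are hypothetical.

# Simplified Kish-type selection: members are listed in a fixed, pre-defined
# order and a pre-assigned random number determines which row is interviewed.
def kish_select(eligible_members, assigned_random):
    """Pick one eligible member given a random number in [0, 1) assigned to the household."""
    n = len(eligible_members)
    row = int(assigned_random * n)        # maps [0, 1) evenly onto rows 0..n-1
    return eligible_members[row]

household = ["Alice (52)", "Bob (48)", "Chris (19)"]   # listed oldest to youngest
print(kish_select(household, assigned_random=0.73))    # -> "Chris (19)"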

Canada

A stratified multi-stage probability sample design was used to select the sample from the Census Frame. The sample was designed to yield separate samples for the two Canadian official languages, English and French. In addition, Canada increased the sample size in order to produce estimates for a number of population subgroups. Provincial ministries and other organizations sponsored supplementary samples to increase the base or to target specific subpopulations such as youth (ages 16 to 24 in Québec and 16 to 29 in British Columbia), adults aged 25 to 64 in Québec, linguistic minorities (English in Québec and French elsewhere), recent and established immigrants, urban aboriginals, and residents of the northern territories.

In each of Canada's ten provinces the Census Frame was further stratified into an urban stratum and a rural stratum. The urban stratum was restricted to urban centers of a particular size, as determined from the previous census. The remainder of the survey frame was delineated into primary sampling units (PSUs) by Statistics Canada's Generalised Area Delineation System (GArDS). The PSUs were created to contain a sufficient population in terms of the number of dwellings within a limited area of reasonable compactness. In addition, the Census Frame was ordered within each geographic region by highest level of education prior to sample selection, thus ensuring a representation across the range of educational backgrounds.

Within the urban stratum, two stages of sampling were used. In the first stage, households were selected systematically with probability proportional to size. During the second stage, a simple random sample algorithm was used by the CAPI application to select an individual from the eligible household adults. Three stages were used to select the sample in the rural stratum. In the first stage, Primary Sampling Units were selected with probability proportional to population size. The second and third stages for the rural stratum repeated the same methodology employed in the two-stage selection for the urban stratum.

Hungary

A stratified two-stage sample design was employed to yield a sample of persons selected with probability proportional to population size (PPS).

The population was stratified into seven regions and twenty counties. This stratification took into consideration the regional and county demographic characteristics and other conditions (e.g. rate of active and inactive population, unemployment rate) that varied from one region to another. In each county, the population was further stratified into three types of settlements: city, town, and village. Subsequently, the sample was selected in two stages:

  • Stage 1:  a PPS sample of settlements,
  • Stage 2:  a random selection of addresses from the settlements selected at stage 1. The list of addresses in each selected settlement was obtained from the Ministry of Interior files from the 2001 Census, the most up-to-date and precise data for the population of Hungary at the time of sample selection. The addresses to be contacted for interview were selected from these files.

Italy

A stratified three-stage probability design was used to select a sample using municipal polling lists. Italy was stratified geographically into 22 regions. In general the sample was allocated proportionally to the 22 regions. However, the regions Piemonte, Veneto, Toscana, Campania, and Trento were oversampled to satisfy an objective to produce separate estimates in these five regions.

At the first stage, municipalities were the primary sampling units. Within each geographic region the municipalities were stratified, based on the municipality population size, into self-representing units and non-self-representing units. The self-representing units, i.e., the larger municipalities and metropolitan municipalities, were selected with certainty in the sample. In the non-self-representing stratum in each region, two municipalities were selected with a probability proportional to the target population size. In total, 256 municipalities were selected from the self-representing and non-self-representing strata.

The second stage of the sample design defined 'sex sub-lists' as the secondary sampling unit. The polling list for each selected municipality comprised a number of sub-lists that were stratified by gender, referred to as 'sex sub-lists.' The polling list included the household address of Italian residents aged 18 to 65. The same number of sex sub-lists was systematically selected for each gender. A total of 1,326 sex sub-lists (663 in the male stratum and 663 in the female stratum) were selected.

At the third stage of sample design, a sample of 18 to 65 year old individuals was systematically selected from the secondary sampling units. Subsequently, at the household contact phase, all 16 to 17 year olds living in the household of a selected 18 to 65 year old were included in the sample.

Netherlands

The sample design in the Netherlands was a stratified, multi-stage systematic cluster design.

In the first stage, the country was stratified into 4 regions: North, East, West, and South. Within each stratum, a sample of municipalities was selected with probability proportional to municipality population size. This was achieved by ordering the municipalities within a stratum by population size and by systematically selecting the sample of municipalities using a random starting point and a fixed sampling interval. The population data were based on the municipality data, Gemeentelijke Basis Administratie (GBA), from the national statistical office, Centraal Bureau voor de Statistiek (CBS).

In the second stage, within each selected municipality a systematic sample of postal code areas was drawn. The company, Experian, provided information about credit score (i.e. the percentage of households having debts within a postal code area) and purchasing power for the postal code areas (6 digits). The postal code areas were ordered by credit score and then by purchasing power. From a random starting point and with a fixed sampling interval (in terms of households), the households were drawn.

In the third stage, one household was randomly selected within each selected postal code area. The household information came from the Experian database (based on CENDRIS, the current owner of the Post Office central database), which is updated on a monthly basis.

In the fourth stage one eligible individual within the selected household was randomly selected.

New Zealand

The sample design was a stratified probability design with three stages of sampling: replicate, dwelling, and household member. The population was categorized into three strata: a main stratum (everyone aged 16 to 65 eligible), a Māori and Pacific stratum (only Māori and Pacific people eligible), and a Pacific stratum (only Pacific people eligible).

(a) Stage 1:  The Replicate

From the 38,000 meshblocks which formed the basis of New Zealand's 2001 Census of Population and Dwellings, those with 9 or fewer dwellings were eliminated, leaving 32,115 meshblocks with 10 or more dwellings. The coverage of permanent private dwellings was 98.6 per cent. The probability of selection for each meshblock was proportional to the number of dwellings in the meshblock. A total of 896 meshblocks were selected, and subsequently allocated to 32 replicates made up of 28 meshblocks per replicate. Each replicate contained meshblocks distributed north to south in approximately the same manner, and was thus a mini national probability sample.

(b) Stage 2:  The Dwelling

For the main stratum, dwellings were selected as follows. The sample interval was derived for each meshblock as the number of dwellings in the meshblock divided by 15. The sample interval thus differed according to the size of the meshblock. Beginning from a randomised starting point, interviewers selected dwellings according to the meshblock's sample interval.
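
A small Python sketch of this rule is given below: the sampling interval is the meshblock's dwelling count divided by 15, and dwellings are then selected from a randomised start at that interval. The dwelling count in the example is hypothetical.

# Stage-2 dwelling selection sketch: interval = dwellings / 15, random start,
# then every interval-th dwelling (positions are 1-based).
import random

def select_dwellings(n_dwellings, rng, target=15):
    interval = n_dwellings / target
    point = rng.uniform(0, interval)
    positions = []
    while point < n_dwellings:
        positions.append(int(point) + 1)
        point += interval
    return positions

rng = random.Random(3)
print(select_dwellings(n_dwellings=47, rng=rng))   # roughly 15 selected dwellings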

In addition to the dwellings in the main stratum, up to an additional 21 dwellings per meshblock were also sampled for the Māori and Pacific, and the Pacific strata. In 4 of these dwellings, residents of either Māori or Pacific ethnicity were eligible for selection. In the remaining 17 dwellings only residents of Pacific ethnicity were eligible. The sample interval was 1 for these dwellings once the main stratum dwellings were set aside.

(c) Stage 3:  The Respondent

For the main stratum, one person per household was selected from all eligible household members using a Kish grid. For the two ethnic strata, the ethnicity of the household members (Māori or Pacific for stratum two, Pacific for stratum three) was an additional eligibility criterion prior to selection using the Kish grid.

Norway

The sample was selected from the 2002 version of the Norwegian Register of Education using a two-stage probability sample design.

The design created 363 Primary Sampling Units (PSUs) from the 435 municipalities in Norway. These PSUs were grouped into 109 geographical strata. Thirty-eight strata each consisted of a single PSU, a municipality with a population of 25,000 or more. At the first stage of sample selection, each of these 38 PSUs was included in the sample with certainty. The remaining municipalities were allocated to 79 strata. The variables used for stratification of these municipalities were industrial structure, number of inhabitants, centrality, communication structures, commuting patterns, trade areas and (local) media coverage. One PSU was selected with probability proportional to size from each of these 79 strata.

The second stage of the sample design involved the selection of a sample of individuals from each sampled PSU. Each selected PSU was stratified by three education levels defined by the Education Register. The sample size for each selected PSU was determined by allocating the overall sample size to the selected PSUs in proportion to the target population size. The PSU sample was then allocated with 30 per cent from the low-education group, 40 per cent from the medium-education group and 30 per cent from the high-education group. Individuals whose education level was not recorded in the Education Register (84,318 persons) were excluded from the sampling.
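
The within-PSU allocation amounts to a simple percentage split, shown in the short Python sketch below for a hypothetical PSU sample of 200 persons.

# 30 / 40 / 30 per cent split of a PSU's sample across the three education strata.
# For sample sizes where the rounded parts do not sum exactly, a real
# allocation would add a final adjustment.
def allocate_by_education(psu_sample_size):
    shares = {"low": 0.30, "medium": 0.40, "high": 0.30}
    return {group: round(psu_sample_size * share) for group, share in shares.items()}

print(allocate_by_education(200))   # {'low': 60, 'medium': 80, 'high': 60}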

Nuevo Leon, Mexico

The sample design was a stratified probability design with two stages of sampling within each stratum.

The 51 municipalities in Nuevo Leon were grouped geographically into three strata: Stratum 1 – Census Metropolitan Area of Monterrey, consisting of 9 municipalities; Stratum 2 – the municipalities of Linares and Sabinas Hidalgo; Stratum 3 – the remaining 40 municipalities of Nuevo Leon. The initial sample was allocated to the three strata proportional to the number of dwellings in each stratum.

At the first stage of sample selection, in each stratum a simple random sample of households was selected. The second sampling stage consisted of selecting one person belonging to the target population from each selected household using a Kish-type person selection grid.

Switzerland

The sample design was a stratified probability design with two stages of sampling. Separate estimates were required for Switzerland's three language regions (i.e., German, French, Italian); thus, the three language regions formed the primary strata. Within the language regions, the population was further stratified into the metropolitan areas represented by the cantons of Geneva and Zurich and the rest of the language regions. At the first stage of sampling, in each stratum a systematic sample of households was drawn from a list of private telephone numbers. In the second stage, a single person belonging to the target population was selected from each household using a Kish-type person selection grid.

United States

A stratified multi-stage probability sample design was employed in the United States.

The first stage of sampling consisted of selecting a sample of 60 primary sampling units (PSUs) from a total of 1,883 PSUs that were formed using a single county or a group of contiguous counties, depending on the population size and the area covered by a county or counties. The PSUs were stratified on the basis of the social and economic characteristics of the population, as reported in the 2000 Census. The following characteristics were used to stratify the PSUs: region of the country, whether or not the PSU is a Metropolitan Statistical Area (MSA), population size, percentage of African-American residents, percentage of Hispanic residents, and per capita income. The largest PSUs in terms of a population size cut-off were included in the sample with certainty. For the remaining PSUs, one PSU per stratum was selected with probability proportional to the population size.

At the second sampling stage, a total of 505 geographic segments were systematically selected with probability proportionate to population size from the sampled PSUs. Segments consist of area blocks (as defined by Census 2000) or combinations of two or more nearby blocks. They were formed to satisfy criteria based on population size and geographic proximity.

The third stage of sampling involved the listing of the dwellings in the selected segments, and the subsequent selection of a random sample of dwellings. An equal number of dwellings was selected from each sampled segment.

At the fourth and final stage of sampling, one eligible person was randomly selected within households with fewer than four eligible adults. In households with four or more eligible persons, two adults were randomly selected.
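
This final-stage rule can be summarized in a few lines of Python; the household rosters below are hypothetical.

# One adult is selected at random in households with fewer than four eligible
# adults; two adults are selected in households with four or more.
import random

def select_respondents(eligible_adults, rng):
    k = 1 if len(eligible_adults) < 4 else 2
    return rng.sample(eligible_adults, k)

rng = random.Random(11)
print(select_respondents(["A", "B", "C"], rng))             # one adult selected
print(select_respondents(["A", "B", "C", "D", "E"], rng))   # two adults selected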

Sample size

A sample size of 5,400 completed cases in each official language was recommended for each country that was implementing the full ALL psychometric assessment (i.e., comprising the domains Prose and Document Literacy, Numeracy, and Problem-Solving).

A sample size of 3,420 complete cases in each official language was recommended if the Problem Solving domain was excluded from the ALL assessment.

A sample size of 3,000 complete cases was recommended for the state of Nuevo Leon, Mexico, which assessed literacy skills with the psychometric task booklets of the International Adult Literacy Survey (IALS).

Table C.2 shows the final number of respondents (complete cases) for each participating country's assessment language(s).

Table C.2 Sample size by assessment language

Data collection

The ALL survey design combined educational testing techniques with those of household survey research to measure literacy and provide the information necessary to make these measures meaningful. Respondents were first asked a series of questions to obtain background and demographic information on educational attainment, literacy practices at home and at work, labour force information, information and communication technology use, adult education participation and literacy self-assessment.

Once the background questionnaire had been completed, the interviewer presented a booklet containing six simple tasks (the core tasks). Respondents who passed the core tasks were given a much larger variety of tasks drawn from a pool of items grouped into blocks; each booklet contained two blocks representing about 45 items. No time limit was imposed on respondents, and they were urged to try each item in their booklet. Respondents were given maximum leeway to demonstrate their skill levels, even if their measured skills were minimal.

Data collection for the ALL project took place during the years 2002 to 2008, depending on the country. Table C.3 presents the collection periods for each participating country.

Table C.3 Survey collection period

To ensure high quality data, the ALL Survey Administration Guidelines specified that each country should work with a reputable data collection agency or firm, preferably one with its own professional, experienced interviewers, and that the manner in which these interviewers were paid should encourage maximum response. Interviews were conducted in the home in a neutral, non-pressured manner. Interviewer training and supervision were to be provided, emphasizing the selection of one person per household (if applicable), the selection of one of the 28 main task booklets (if applicable), the scoring of the core task booklet, and the assignment of status codes. Finally, interviewers' work was to be supervised through frequent quality checks at the beginning of data collection and less frequent checks as collection progressed, with help available to interviewers throughout the data collection period.

The ALL survey took several precautions against non-response bias, as specified in the ALL Administration Guidelines. Interviewers were specifically instructed to return several times to non-respondent households in order to obtain as many responses as possible. In addition, all countries were asked to ensure that the address information provided to interviewers was as complete as possible, in order to reduce potential household identification problems.

Countries were asked to complete a debriefing questionnaire after the Main study in order to demonstrate that the guidelines had been followed, as well as to identify any collection problems they had encountered. Table C.4 presents information about interviews derived from this questionnaire.

Table C.4 Interviewer information

Data processing

As a condition of their participation in the ALL study, countries were required to capture and process their survey data files using procedures to ensure logical consistency and acceptable levels of data capture error. Specifically, countries were advised to conduct complete verification of the captured scores (i.e. enter each record twice) in order to minimize error rates. Because the process of accurately capturing the task scores is essential to high data quality, 100 per cent keystroke verification was required.
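
As an illustration of 100 per cent keystroke verification, the Python sketch below compares two independently keyed versions of the same records field by field and flags every disagreement for re-checking against the paper booklet. The record structure and score codes are invented.

# Double-entry comparison: every mismatch between the two capture passes is
# reported as (record index, field, first value, second value).
def verify_double_entry(first_pass, second_pass):
    mismatches = []
    for i, (a, b) in enumerate(zip(first_pass, second_pass)):
        for field in a:
            if a[field] != b.get(field):
                mismatches.append((i, field, a[field], b.get(field)))
    return mismatches

pass1 = [{"case_id": "001", "item_1": 1, "item_2": 0},
         {"case_id": "002", "item_1": 7, "item_2": 1}]
pass2 = [{"case_id": "001", "item_1": 1, "item_2": 0},
         {"case_id": "002", "item_1": 1, "item_2": 1}]
print(verify_double_entry(pass1, pass2))   # [(1, 'item_1', 7, 1)]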

Each country was also responsible for coding the industry, occupation, and education variables using standard coding schemes such as the International Standard Industrial Classification (ISIC), the International Standard Classification of Occupations (ISCO) and the International Standard Classification of Education (ISCED). Coding schemes were provided by Statistics Canada for all open-ended items, and countries were given specific instructions about the coding of such items.

In order to facilitate comparability in data analysis, each ALL country was required to map its national dataset into a highly structured, standardized record layout. In addition to specifying the position, format and length of each field, the international record layout included a description of each variable and indicated the categories and codes to be provided for that variable. Upon receiving a country's file, Statistics Canada performed a series of range checks to ensure compliance with the prescribed record layout format. Flow edits and consistency edits were also run on each country's file. When anomalies were detected in a country's file, the country was notified of the problem and asked to resolve the edit issues and to submit a cleaned file.
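
The Python sketch below shows the kind of range check implied by the record layout: each variable's value must be one of the codes prescribed for it. The layout fragment, variable names and codes are invented for illustration; the real layout also fixed field positions and lengths.

# Range check against a (hypothetical) fragment of the international record layout.
layout = {
    "GENDER": {1, 2},                          # allowed codes
    "AGE":    set(range(16, 66)),              # 16 to 65 inclusive
    "A_Q01":  {1, 2, 3, 4, 96, 97, 98, 99},    # substantive and reserved codes
}

def range_check(record):
    """Return the variables whose values fall outside the prescribed codes."""
    return [var for var, allowed in layout.items() if record.get(var) not in allowed]

record = {"GENDER": 2, "AGE": 71, "A_Q01": 3}
print(range_check(record))   # ['AGE'] -- flagged for the country to resolve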

Scoring of tasks

Persons charged with scoring in each country received intensive training in scoring responses to the open-ended items using the ALL scoring manual. They were also provided with a tool for capturing responses to the closed-format questions. Table C.5 provides a summary of the scoring operations.

Table C.5 Scoring operations summary

To aid in maintaining scoring accuracy and comparability between countries, the ALL survey introduced the use of an electronic bulletin board, where countries could post their scoring questions and receive scoring decisions from the domain experts. This information could be viewed by all countries so that scoring could be adjusted.

To further ensure quality, each country's scoring was monitored in two ways.

First, within each country, at least 20 per cent of the task booklets had to be re-scored. The intra-country rescoring guidelines called for rescoring a larger portion of booklets at the beginning of the scoring process in order to identify and rectify as many scoring problems as possible. In a second phase, a smaller portion of the next third of the booklets was to be rescored; the last phase, viewed as a quality monitoring measure, involved rescoring a smaller portion of booklets at regular intervals until the end of the rescoring activities. The two sets of scores had to match with at least 95 per cent accuracy before the next step of processing could begin; in fact, most of the intra-country scoring reliabilities were above 95 per cent. Where errors occurred, the country was required to go back to the booklets and rescore all the questions with problems and all the tasks that belonged to a problem scorer.
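
The reliability check itself is a straightforward agreement calculation, sketched below in Python with invented score codes: the original and rescored item scores are compared and the percentage of matches is tested against the 95 per cent threshold.

# Per cent agreement between the original and rescored item scores.
def percent_agreement(original, rescored):
    assert len(original) == len(rescored)
    matches = sum(o == r for o, r in zip(original, rescored))
    return 100.0 * matches / len(original)

orig_scores = [1, 0, 1, 1, 7, 0, 1, 1, 1, 0]   # e.g. 1 = correct, 0 = incorrect, 7 = omitted
rescored    = [1, 0, 1, 1, 7, 0, 1, 0, 1, 0]
agreement = percent_agreement(orig_scores, rescored)
print(f"{agreement:.1f}% agreement; 95% threshold met: {agreement >= 95.0}")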

Second, an international re-score was performed. The main goal of the re-score was to verify that no country scored consistently differently from another.

For Bermuda, Canada, Italy, Norway, Nuevo Leon (Mexico), Switzerland, and the United States, each country had 10 per cent of its sample re-scored by scorers in another country. For example, a sample of task booklets from the United States was re-scored by the persons who had scored Canadian English booklets, and vice versa. Inter-country score reliabilities were calculated by Statistics Canada and the results were evaluated by the Educational Testing Service based in Princeton. Again, strict accuracy was demanded: a 90 per cent correspondence was required before the scores were deemed acceptable. Any problems detected had to be re-scored.

For Australia, Hungary, the Netherlands, and New Zealand, each country was required to score a standard set of 400 Canadian English booklets. Inter-country score reliabilities were calculated by Statistics Canada and the results were evaluated by the Educational Testing Service.

Table C.6 displays the achieved levels of inter-country score agreement for each domain.

Table C.6 Scoring – per cent reliability by domain

Survey response and weighting

The following table summarizes the sample sizes and response rates for each participating country.

Table C.7 Sample size and response rate summary

Each participating country in ALL used a multi-stage probability sample design with stratification and unequal probabilities of respondent selection. Furthermore, there was a need to compensate for the non-response that occurred at varying levels. Therefore, the estimation of population parameters and their associated standard errors depends on the survey weights.

All participating countries used the same general procedure for calculating the survey weights. However, each country developed the survey weights according to its particular probability sample design.

In general, two types of weights were calculated by each country: population weights, which are required for the production of population estimates, and jackknife replicate weights, which are used to derive the corresponding standard errors.

Population weights

For each respondent record the population weight was created by first calculating the theoretical or sample design weight. Then a base sample weight was derived by mathematically adjusting the theoretical weight for non-response. The base weight is the fundamental weight that can be used to produce population estimates. However, in order to ensure that the sample weights were consistent with a country's known population totals (i.e., benchmark totals) for key characteristics, the base sample weights were ratio-adjusted to the benchmark totals.
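
A minimal Python sketch of this three-step construction is given below: a design weight, a non-response adjustment within an adjustment cell, and a ratio adjustment to a benchmark total. The respondents, adjustment cells and benchmark counts are all hypothetical; real ALL weighting followed each country's own design.

# Each entry: (design weight, responded?, benchmark cell).
sample = [
    (250.0, True,  "men 16-25"),
    (250.0, False, "men 16-25"),
    (300.0, True,  "men 16-25"),
    (400.0, True,  "women 16-25"),
    (400.0, True,  "women 16-25"),
    (350.0, False, "women 16-25"),
]

# Non-response adjustment within each cell:
# base weight = design weight * (sum of all design weights / sum of respondent design weights).
cells = {}
for w, responded, cell in sample:
    tot, resp = cells.get(cell, (0.0, 0.0))
    cells[cell] = (tot + w, resp + w if responded else resp)

base_weights = {}
for i, (w, responded, cell) in enumerate(sample):
    if responded:
        tot, resp = cells[cell]
        base_weights[i] = w * tot / resp

# Ratio adjustment: scale base weights so they sum to known benchmark totals.
benchmarks = {"men 16-25": 1000.0, "women 16-25": 1400.0}
cell_sums = {}
for i, bw in base_weights.items():
    cell_sums[sample[i][2]] = cell_sums.get(sample[i][2], 0.0) + bw

final_weights = {i: bw * benchmarks[sample[i][2]] / cell_sums[sample[i][2]]
                 for i, bw in base_weights.items()}
print(final_weights)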

Table C.8 provides the benchmark variables for each country and the source of the benchmark population counts.

Jackknife weights

It was recommended that 10 to 30 jackknife replicate weights be developed for use in determining the standard errors of the survey estimates. Switzerland produced 15 jackknife replicate weights. The remaining countries produced 30 jackknife replicate weights.
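
To show how replicate weights translate into a standard error, the Python sketch below applies the common delete-one-group jackknife factor (R - 1)/R to a weighted mean. The exact replication scheme and variance factor used by each country may differ, and the data and replicate weights here are invented.

# Jackknife standard error of a weighted mean from replicate weights.
import math

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def jackknife_se(values, full_weights, replicate_weights):
    theta = weighted_mean(values, full_weights)
    r = len(replicate_weights)
    variance = (r - 1) / r * sum(
        (weighted_mean(values, rw) - theta) ** 2 for rw in replicate_weights
    )
    return math.sqrt(variance)

# Tiny made-up example: 4 respondents, 3 replicate weight sets.
scores  = [280.0, 310.0, 250.0, 295.0]
weights = [100.0, 120.0,  90.0, 110.0]
replicates = [
    [  0.0, 157.5, 118.1, 144.4],   # respondent 1 dropped, others scaled to the full total
    [140.0,   0.0, 126.0, 154.0],   # respondent 2 dropped
    [127.3, 152.7,   0.0, 140.0],   # respondent 3 dropped
]
print(round(jackknife_se(scores, weights, replicates), 2))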

Table C.8 Benchmark variables by country

Contributors

Owen Power, Statistics Canada
Carrie Munroe, Statistics Canada
Sylvie Grenier, Statistics Canada
