10 Data sources and data development

The early research on immigration issues helped guide data development. Statistics Canada, in co-operation with the relevant policy agencies, has developed a suite of datasets used in the types of analysis described above.

10.1 Census of Population, the mainstay of immigration research in Canada

The need for large samples when focusing on successive cohorts of entering immigrants drives analysts to large-sample data sources, most notably the Census of Population. The census remains the most important data source for immigration research in Canada. Like all datasets, it has both advantages and disadvantages.

10.1.1 Advantages of the census for immigration research

Sample size is obviously the reason many people use the census. The 20% sample of Canadians, with detailed information on education, occupation, earnings and family income, geographic mobility and neighbourhood of residence, has been the mainstay for immigration research in Canada over the past few decades. It allows the researcher to focus on successive entering immigrant cohorts, an essential part of such research.
Recent changes to the census allow the country where the highest level of education was received to be identified. Since economic returns to education for immigrants differ significantly depending upon the country where it is received, this is important.
Parental place of birth—added to the census in 2001—allows for the analysis of the outcomes of the children of immigrants for the first time.
Research on the impact of immigration on neighbourhoods is facilitated by the fact that the most commonly used 'neighbourhood' definition—the census tract—is longitudinally consistent from one census to the next. This is a tremendous advantage. One can study changes in the ethnic and immigrant composition of neighbourhoods over decades, and also look at the correlation with other neighbourhood outcomes.
One of the most important events to encourage and facilitate immigration analysis in Statistics Canada has been the creation of easy-to-use, 'flat,' SAS or STATA files from the census. In earlier periods, census data were available to analysts only through relatively difficult-to-use hierarchical software that produced only tables. Following the 2001 Census, easy-to-use flat files formatted for use with popular statistical analysis packages have allowed researchers to exploit the census data themselves. This change has been very important in facilitating the immigration research reported here.

10.1.2 Some issues with the census data

Language ability (in French or English) may be one of the most important determinants of economic and social integration. However, measures on almost all surveys do not capture it well. Variables such as mother tongue and language spoken at work are available, but they do not provide a good measure of ability. The variables are self-reported, and they are not designed to be measures of language ability. The lack of a reliable measure of language ability is one of the major data shortcomings in immigration research.
The use of 'synthetic cohorts' from the census has allowed researchers to develop a picture of the earnings trajectory with years since immigration. For example, immigrants aged from 30 to 34 in one census are assumed to be the same population as those aged from 35 to 39 in the next census, five years later. Based on such an assumption, earnings, employment, poverty and other trajectories are developed (see Chart 1). However, recent research has shown that, in fact, a significant proportion (perhaps one quarter) of immigrants leave Canada within the first few years of their arrival. Little is known yet of the characteristics of these leavers and how they compare with the stayers. However, some 'selection' effects almost certainly exist in the synthetic cohort trajectories produced from the census. Whether this is a positive or negative selection, or its extent, is as yet unknown.
Immigrant class (skilled economic, family, refugee, etc.) is an important determinant of various outcomes, but it cannot be identified on the census. It is unlikely that self-reporting would provide reliable data.
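
The synthetic cohort approach described above can be illustrated with a minimal sketch. The function name, record layout and figures below are hypothetical and are not drawn from census files; the sketch only shows how an age band in one cross-section is matched to the band five years older in the next, on the assumption that both represent the same population.

```python
from statistics import mean


def synthetic_cohort(cross_sections, start_year, start_age_band):
    """Follow one age band forward across successive cross-sections.

    cross_sections: dict mapping census year -> list of (age, earnings) tuples
                    (hypothetical layout, for illustration only).
    start_age_band: inclusive (low, high) ages in start_year.
    Returns a list of (year, age_band, mean_earnings) tuples.
    """
    low, high = start_age_band
    trajectory = []
    for year in sorted(cross_sections):
        if year < start_year:
            continue
        # The cohort has aged by the number of years elapsed since start_year.
        shift = year - start_year
        band = (low + shift, high + shift)
        values = [earn for age, earn in cross_sections[year]
                  if band[0] <= age <= band[1]]
        if values:
            trajectory.append((year, band, mean(values)))
    return trajectory


# Made-up records: two cross-sections taken five years apart.
census = {
    1996: [(31, 20000), (33, 24000), (50, 40000)],
    2001: [(36, 30000), (38, 34000), (32, 21000)],
}
trajectory = synthetic_cohort(census, 1996, (30, 34))
```

Nothing in this calculation checks that the people observed in the later cross-section are the same people observed in the earlier one. If some members of the cohort have left the country in between, the later mean is computed over stayers only, which is how the selection effect noted above enters the trajectory.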

10.2 Other on-going surveys used in immigration research

There are a host of other on-going surveys to which researchers turn when conducting immigration research in Canada. They include:

• The Survey of Labour and Income Dynamics (SLID)

SLID is a longitudinal survey that focuses primarily on income and labour market outcomes of the adult population, but it has an immigrant identifier, which allows some immigrant research. At the heart of the survey's objectives is the understanding of the economic well-being of Canadians (and immigrants), and how the population reacts and adjusts to economic 'shocks.'

This longitudinal survey of 30,000 households was first conducted in 1993, and it is in many ways similar to the German and British household panel surveys. However, it has a narrower content focus, a larger sample size and a shorter panel length (six years) than these European panel surveys. Also, the survey is cross-sectionally representative, allowing for the production of the official annual income statistics. Statistics Canada is currently exploring the possibility of replacing SLID with a household panel survey that more closely resembles, in design and content, those in existence in some European countries, the United States and Australia. Such a move would enhance the opportunity for internationally comparative research. It is as yet unknown whether a new Canadian household panel survey will be implemented. It depends, as always, upon funding and the degree of support from policy agencies.

SLID supports some analysis of immigrant economic assimilation issues, but it does not have the sample or statistical reliability to concentrate on particular entering cohorts, such as was done in the research described earlier.

• The International Adult Literacy and Skills Survey (IALSS)

The 2003 IALSS is the Canadian component of the Adult Literacy and Life Skills Survey. The main purpose of IALSS was to assess how well adults used printed information to function in society. IALSS data include background information and psychometric results of respondents' proficiency in four skill domains: prose literacy, document literacy, numeracy and problem-solving. The survey is capable of supporting important research on the effect of literacy and numeracy skills on earnings outcomes of both the Canadian born and immigrants (as well as the gap between the two). Citizenship and Immigration Canada funded an oversampling of immigrants to allow more recent immigrants to be differentiated from earlier immigrants. The survey has supported some very important immigrant research. Some of the results were reported earlier in this paper.

• The National Population Health Survey (NPHS)

The NPHS is a longitudinal survey of 17,276 persons of all ages, and it was initiated in 1994. These same persons are interviewed every two years over a period of 18 years. The objectives of the NPHS are to aid in the development of public policy by providing information on the health status of the population, understanding the determinants of health, and increasing the understanding of the relationship between health status and health care utilization. An immigrant identifier supports both research on the health dimensions of the immigrant population and comparisons with the Canadian born.

• The National Longitudinal Survey of Children and Youth (NLSCY)

The NLSCY is a long-term study of Canadian children that follows their development and well-being from birth to early adulthood. The NLSCY is designed to collect information about factors influencing a child's social, emotional and behavioural development and to monitor the impact of these factors on the child's development over time. Although its immigrant sample is very small, it does allow some basic comparisons of childhood outcomes of immigrants and the Canadian born, and of the trajectory of childhood outcome 'gaps' as children age. As with some of the other surveys, sample size is of concern when addressing issues surrounding the assimilation of immigrant children.

10.3 The development of new data sources to support immigration research

In spite of the existence of surveys such as those mentioned above, the early research suggested that there were a number of data gaps. As a result, the following data sources, largely longitudinal, were developed.

10.3.1 The development of the Longitudinal Immigration Database

As noted, the census remains the primary source of data for immigration research. But it is conducted only once every five years, and outcomes for entering cohorts can change over such a period. Furthermore, the availability of true longitudinal data (not synthetic cohorts) would enhance the power of the analysis. Hence, another source of longitudinal data with very large samples of immigrants was required. This requirement was met through the creation of two data sources that are based mainly on taxation data. The first was the Longitudinal Immigration Database (IMDB).

The IMDB was created jointly by Statistics Canada and Citizenship and Immigration Canada; it was funded in part by a consortium of users, including provincial governments. This file merges immigrant landing records with taxation records. The former provide detailed information on immigrant characteristics; the latter give detailed longitudinal information, on employment earnings in particular. Given the near-universal coverage of tax files (almost complete coverage of the population in many age groups), this data source allows detailed tracking of earnings trajectories of entering cohorts of immigrants from the early 1980s to 2005. The IMDB was created in large part to provide the data necessary to evaluate outcomes for immigrants in different immigrant classes and program changes implemented by Citizenship and Immigration Canada.
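
The idea of merging landing records with taxation records can be sketched in a few lines. This is an illustration only: the source does not describe how the actual linkage is performed, and the field names (person_id, landing_year, immigrant_class, tax_year, earnings) and the values are hypothetical.

```python
def link_landing_to_tax(landing_records, tax_records):
    """Attach a yearly earnings history to each landing record.

    Both inputs are lists of dicts sharing a hypothetical 'person_id' key.
    Earnings are re-indexed by years since landing so that different
    entering cohorts can be compared at the same point after arrival.
    """
    # Group tax records by person: person_id -> {tax_year: earnings}
    earnings_by_person = {}
    for rec in tax_records:
        history = earnings_by_person.setdefault(rec["person_id"], {})
        history[rec["tax_year"]] = rec["earnings"]

    linked = []
    for person in landing_records:
        history = earnings_by_person.get(person["person_id"], {})
        since_landing = {
            year - person["landing_year"]: amount
            for year, amount in sorted(history.items())
            if year >= person["landing_year"]
        }
        linked.append({**person, "earnings_by_years_since_landing": since_landing})
    return linked


# Made-up example records.
landing = [{"person_id": 1, "landing_year": 2000, "immigrant_class": "skilled"}]
tax = [
    {"person_id": 1, "tax_year": 2001, "earnings": 25000},
    {"person_id": 1, "tax_year": 2003, "earnings": 31000},
    {"person_id": 2, "tax_year": 2001, "earnings": 50000},  # no landing record
]
linked = link_landing_to_tax(landing, tax)
```

Because the output is driven by the landing records, a tax filer with no landing record (person 2 in the example) never appears in the linked file.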

But the IMDB also has its shortcomings. Most notably, there is no comparison group (no data on the Canadian born); therefore, the earnings gaps between immigrants and the Canadian born cannot be assessed. Furthermore, the data are for individuals only; families are not formed on this file. Hence, economic welfare issues such as low-income levels cannot be measured. To overcome this, yet another administrative data source was created.

10.3.2 Linking of the Longitudinal Immigration Database to the Longitudinal Administrative Databank

The Longitudinal Administrative Databank (LAD) was an already existing longitudinal data source, covering 20% of the Canadian population, and based on taxation data. It does allow families to be formed, and it covers the period from 1982 to the present. LAD supports numerous kinds of analyses, such as the effect of divorce on economic outcomes for men and women, the intergenerational income mobility of Canadians, poverty dynamics, entry and exit from social assistance and other government programs, and the 'brain drain.' However, until recently it was not possible to identify immigrants on this file, which precluded potentially important immigrant research. The linking of LAD with the IMDB file, with its detailed information on immigrants and immigrant identifier, overcame this problem. Due to the very large sample, this data source supported more recent work (since the 2001 Census) on the economic assimilation patterns of immigrants entering Canada. It also has supported research on low-income dynamics among immigrants, the use of government transfers (social assistance, employment insurance, etc.) by immigrants, the onward migration of immigrants and other topics.

10.3.3 The development of the Longitudinal Survey of Immigrants to Canada

As useful as the previously described administrative longitudinal sources are, they suffer from a drawback common to almost all administrative databases: they contain a limited number of covariates. The administrative data have substantial detail on immigrants, but they lack key information on the Canadian born, such as education and occupation. Furthermore, being based on taxation data, they cannot support research on the social integration of immigrants. To overcome these and other data gaps, Citizenship and Immigration Canada, along with Statistics Canada, created a true longitudinal sample survey. The Longitudinal Survey of Immigrants to Canada (LSIC) followed a single entering cohort of immigrants (those entering in 2000) for four years, with interviews at six months, two years and four years after entry.

Topics covered in the LSIC include housing, education, foreign credential recognition, employment, health, values and attitudes, the development and use of social networks, income, and perceptions of settlement in Canada. This survey, with an initial sample of 12,000 immigrants, continues to be exploited by analysts inside and outside of Statistics Canada, particularly those concerned with social integration issues.

10.3.4 The development of the Ethnic Diversity Survey

In 2002, Canadian Heritage and Statistics Canada initiated the Ethnic Diversity Survey (EDS), which focused on issues relating to the rapidly changing cultural diversity in Canada. While not strictly an immigrant survey, the EDS has provided many opportunities for immigration research. It used the 2001 Census as the survey frame. It was designed to better understand how people's backgrounds affect their participation in the social, economic and cultural life of Canada.

Topics covered include ethnic ancestry, ethnic identity, place of birth, visible-minority status, religion, religious participation, knowledge of languages, family background, family interaction, social networks, civic participation, interaction with society, attitudes, satisfaction with life, trust and socioeconomic activities. The sample of approximately 57,000 individuals was stratified so as to provide large samples for the ethnic groups whose mother tongue tends to be a language other than English.

10.3.5 Recent changes to existing surveys

Other steps have been taken to improve the data availability for immigration research. Most notably, last year an immigrant identifier was added to the monthly Labour Force Survey. This move will allow data on immigrant outcomes to be more current.

Faced with the demand for improved and expanded longitudinal and cross-sectional data on immigrant outcomes, Statistics Canada, along with its partners in the three most relevant policy agencies (Citizenship and Immigration Canada, Canadian Heritage, and Human Resources and Skills Development Canada), responded in a significant manner.