4 Estimation method

Consider now an immigrant i who arrived in year c (a member of arrival cohort c) at the age of j. The earnings of this person in year t can be described with a fair degree of flexibility by

(4)

Formula - Long Description available

where μ_cjt is the mean log earnings in each cjt cell. Equation (4) is the first-stage estimation equation that extracts the individual earnings component from the earnings dynamics of the arrival cohort. A two-stage approach is standard in the literature on earnings inequality and earnings instability; however, in some studies ŷ_cjit are obtained by regressing log-earnings on an age polynomial (Haider 2001; Beach, Finnie and Gray 2003; Morissette and Ostrovsky 2005). The approach above appears more flexible in the context of this study.
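A minimal sketch of this first stage, assuming it amounts to subtracting the cjt cell mean from each person's log earnings (an interpretation of the text, since the formula itself is not reproduced here). The record layout and the function name `first_stage_residuals` are hypothetical.

```python
from collections import defaultdict

def first_stage_residuals(records):
    """Remove the (cohort, arrival age, year) cell mean from log earnings.

    records: list of dicts with keys 'cohort', 'age', 'year', 'person',
    'log_earn' (a hypothetical layout chosen for illustration).
    Returns (cell_means, residuals); residuals is keyed by
    (cohort, age, person, year).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        cell = (r["cohort"], r["age"], r["year"])
        sums[cell] += r["log_earn"]
        counts[cell] += 1
    cell_means = {cell: sums[cell] / counts[cell] for cell in sums}

    residuals = {}
    for r in records:
        cell = (r["cohort"], r["age"], r["year"])
        key = (r["cohort"], r["age"], r["person"], r["year"])
        residuals[key] = r["log_earn"] - cell_means[cell]
    return cell_means, residuals
```

By construction the residuals average to zero within every cjt cell, so they carry only the individual component of earnings.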

After obtaining ŷ_cjit from the first-stage regression, the variance of ŷ_cjit can be decomposed into 'between' and 'within' components. In the descriptive part of this study, it is simply assumed (as in Beach, Finnie and Gray 2003; Morissette and Ostrovsky 2005) that y_cjit = ȳ_cji + v_cjit, and both variance components are computed following the formulas in Johnston (1984).³
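The decomposition can be illustrated with a short sketch. The exact formulas of footnote 3 are not shown in this text, so the estimators below are an assumption: one common textbook form for an unbalanced panel, in which the within component is the pooled variance around each person's own mean, and the between component is the variance of person means less the transitory noise those means still contain. The function name and data layout are hypothetical.

```python
def decompose_variance(panel):
    """Split residual variance into between and within components.

    panel: dict mapping person -> list of residual log earnings; the
    lists may differ in length (unbalanced panel). Persons with fewer
    than two observations are dropped because they carry no
    within-person information.
    """
    people = {p: ys for p, ys in panel.items() if len(ys) >= 2}
    n = len(people)
    total_obs = sum(len(ys) for ys in people.values())
    means = {p: sum(ys) / len(ys) for p, ys in people.items()}

    # Within: squared deviations from each person's own mean.
    within_ss = sum((y - means[p]) ** 2 for p, ys in people.items() for y in ys)
    within = within_ss / (total_obs - n)

    # Between: variance of person means, corrected for the share of
    # transitory variance that remains in a mean over t_bar periods.
    grand = sum(means.values()) / n
    var_of_means = sum((m - grand) ** 2 for m in means.values()) / (n - 1)
    t_bar = total_obs / n
    between = var_of_means - within / t_bar
    return between, within
```

Under this reading, the between component corresponds to inequality (persistent differences across persons) and the within component to instability (year-to-year fluctuation around a person's own mean).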

As different arrival cohorts are observed for a different number of years (for instance, the 1980-to-1982 cohort is observed for 22 years, while the 1998-to-2000 cohort is observed for only 4 years), it would be difficult to make a cross-cohort comparison of inequality and instability if calculations were made for all t's in which a cohort is observed. To make results comparable across cohorts, the decomposition is computed for a fixed number of post-arrival periods: t=4 (all cohorts), t=7 (all cohorts except that of 1998 to 2000) and t=10 (all cohorts except those of 1995 to 1997 and 1998 to 2000). For instance, if t=4, then the variance for the 1980-to-1982 arrival cohort is computed based on 1983, 1984, 1985 and 1986; the variance for the 1983-to-1985 arrival cohort is computed based on 1986, 1987, 1988 and 1989; and so on. The resulting panels are unbalanced: four-year panels also include those who were present for only two or three periods; similarly, seven-year panels include those who were observed for five or six periods, and 10-year panels include those who were observed for eight or nine periods.
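The window rule can be checked with a few lines of code. This sketch assumes three-year arrival cohorts starting in 1980 and data ending in 2004; the intermediate cohorts (1986 to 1988, 1989 to 1991, 1992 to 1994) are inferred from that pattern rather than listed in the text, and the names below are hypothetical.

```python
LAST_YEAR = 2004  # assumed last year of data (22 years from 1983)

# Seven three-year arrival cohorts: (1980, 1982), (1983, 1985), ..., (1998, 2000).
COHORTS = [(start, start + 2) for start in range(1980, 1999, 3)]

def window_years(cohort, length):
    """Calendar years used for a panel of `length` post-arrival periods."""
    first = cohort[1] + 1  # first full year after the arrival window
    return list(range(first, first + length))

def eligible_cohorts(length):
    """Cohorts observed long enough to fill a panel of `length` years."""
    return [c for c in COHORTS if c[1] + length <= LAST_YEAR]
```

For example, `window_years((1980, 1982), 4)` gives 1983 through 1986, and `eligible_cohorts(10)` drops the 1995-to-1997 and 1998-to-2000 cohorts, matching the description above.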

As mentioned in the introduction, the goal of this study is not only to document immigrant earnings inequality and earnings instability but also to analyse their potential causes, in particular the role of pre-arrival education, language ability and country of birth. The effects of these variables can be estimated by adding control variables to the first-stage equation, re-estimating y_cjit and using the new estimates of y_cjit in the second stage. More specifically, Equation (4) takes the following form

(5)

Formula - Long Description available

where X_cji is foreign education measured by the years of schooling, L_cji is a set of dummy variables reflecting the ability to speak either official language or both, and B_cji is the set of dummies related to the place of birth. A model that includes either X_cji, L_cji, B_cji, or the full set can be estimated. Hence, we can not only compare measures of earnings inequality and earnings instability across different arrival cohorts and arrival ages but also see the degree to which the earnings inequality and instability of each cohort are influenced by these variables. In the context of the Canadian immigrant selection process based on a point system⁴ that rewards foreign education and the ability to speak one of the official Canadian languages, such analysis may be particularly useful.

Although this is a very simple and intuitive method of analysing inequality and instability, it has obvious drawbacks. First, and most importantly, it does not allow for over-time changes in either permanent or transitory components. Second, it does not allow for the heterogeneity in earnings growth, as opposed to the heterogeneity in the levels of earnings. Finally, it ignores serial correlation in the transitory component. Hence, we will consider a more flexible model, similar to the models in Haider (2001) and Baker and Solon (2003).

We proceed as follows. Similar to (2), individual earnings of the members of the c-th arrival cohort who were j years old at arrival are assumed to follow

(6)

Formula - Long Description available

where u_cjit = u_cji,t−1 + r_cjit and ε_cjit = ρε_cji,t−1 + λ_t v_cjit. Hence, total experience is broken down into two components: (1) 'Canadian experience,' t_c, which is the same for all members of the c-th arrival cohort, and (2) potential foreign experience Z_cji, simply defined as the age at arrival minus 25.
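The two error processes can be traced with a small sketch: a random walk for the permanent shock u and an AR(1) process with period-specific loadings λ_t for the transitory shock ε. The shocks are passed in explicitly so the recursion is easy to follow; the function name and the zero starting values are assumptions made for illustration.

```python
def simulate_components(r_shocks, v_shocks, rho, loadings, u0=0.0, eps0=0.0):
    """Apply u_t = u_{t-1} + r_t and eps_t = rho * eps_{t-1} + lambda_t * v_t.

    r_shocks, v_shocks, loadings: equal-length sequences, one entry per year.
    u0, eps0: hypothetical starting values (zero by default).
    Returns a list of (u_t, eps_t) pairs.
    """
    u, eps = u0, eps0
    path = []
    for r, v, lam in zip(r_shocks, v_shocks, loadings):
        u = u + r                   # permanent shocks accumulate
        eps = rho * eps + lam * v   # transitory shocks decay at rate rho
        path.append((u, eps))
    return path
```

With `rho=0.5`, a single transitory shock of 1 in the first year decays to 0.5 and then 0.25, while every permanent shock stays in u indefinitely.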

From the residuals in (4), a sample auto-covariance matrix is constructed for each cohort and arrival age. For instance, for those who arrived during the 1980-to-1982 period at the age of 30, this will be a 22×22 matrix (t=1983, 1984,…, 2004); for those who arrived during the 1995-to-1997 period at the age of 30, this will be a 7×7 matrix (t=1998, 1999,…, 2004). The size of the matrix will depend on both c and j. Since there are seven arrival cohorts, for j ∈ [25,49] there will be 7×25=175 auto-covariance matrices Ω_cj in total, which will produce 13,615 sample moments.
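A minimal sketch of how one such matrix and its unique elements could be built from first-stage residuals. It assumes the residuals have mean zero within each cell and, because panels are unbalanced, uses for each pair of years only the persons observed in both; that pairwise treatment and the function names are assumptions, not details given in the text.

```python
def sample_autocovariance(residuals, years):
    """Sample auto-covariance matrix for one cohort and arrival age.

    residuals: dict person -> dict year -> residual log earnings
    (assumed mean zero). Entry (t, s) averages the products over the
    persons observed in both years; it is None if nobody is.
    """
    size = len(years)
    omega = [[None] * size for _ in range(size)]
    for a, t in enumerate(years):
        for b, s in enumerate(years):
            prods = [obs[t] * obs[s] for obs in residuals.values()
                     if t in obs and s in obs]
            omega[a][b] = sum(prods) / len(prods) if prods else None
    return omega

def unique_elements(omega):
    """Lower triangle (diagonal included) of a symmetric matrix, row by row."""
    return [omega[a][b] for a in range(len(omega)) for b in range(a + 1)]
```

An M×M matrix contributes M(M+1)/2 unique moments, so a 22×22 matrix yields 253 and a 7×7 matrix yields 28. The overall total depends on the size of every Ω_cj, which varies with both c and j.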

Let ω_cj be a vector of unique elements of Ω_cj,

Formula - Long Description available

where M×M is the size of each Ω_cj matrix, which depends on c and j. All ω_cj can be stacked into a single vector Ω so that each diagonal element ω_cjt in Ω_cj can be written as

(7)

Formula - Long Description available

and each off-diagonal element ω_cjts as

(8)

Formula - Long Description available

The variance of the transitory component ε_cjit = ρε_cji,t−1 + λ_t v_cjit takes the form of

(9)

Formula - Long Description available

and the covariance takes the form of

(10)

Formula - Long Description available
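The moments implied by the AR(1) transitory process can be sketched numerically. Assuming each innovation v_t is uncorrelated with past values of ε, the definition of ε gives Var(ε_t) = ρ²·Var(ε_{t−1}) + λ_t²·σ²_v,t and Cov(ε_t, ε_{t−s}) = ρ^s·Var(ε_{t−s}). The function name, the explicit starting variance `var0` and the per-period list of innovation variances are assumptions made for illustration.

```python
def transitory_moments(rho, loadings, sigma2_v, var0=0.0):
    """Covariance matrix of an AR(1) process with time-varying loadings.

    loadings: lambda_t for each period; sigma2_v: innovation variance for
    each period (same length); var0: hypothetical variance of the
    transitory component before the first period.
    """
    variances = []
    prev = var0
    for lam, s2 in zip(loadings, sigma2_v):
        prev = rho ** 2 * prev + lam ** 2 * s2  # variance recursion
        variances.append(prev)

    periods = len(variances)
    cov = [[0.0] * periods for _ in range(periods)]
    for t in range(periods):
        for s in range(periods):
            earlier, lag = min(t, s), abs(t - s)
            # Covariance decays geometrically from the earlier period's variance.
            cov[t][s] = rho ** lag * variances[earlier]
    return cov
```

With `rho=0.5`, unit loadings and unit innovation variances over two periods, the variances are 1.0 and 1.25 and the covariance between the two periods is 0.5.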

As in Baker and Solon (2003), σ²_v can be modelled as a quadratic or quartic function of t and Z_cj. In particular, it may be written as

(11)

Formula - Long Description available

Assuming that Ω^* = f(t,s,Z;θ) is the population analog of Ω, we can now estimate the set of model parameters

Formula - Long Description available

by the generalized method of moments (GMM) using 13,615 sample moments corresponding to the 13,615 elements in Ω

(12)

Formula - Long Description available

The parameters in (12) can be estimated using a GMM minimum distance estimator that chooses an optimal set of parameter estimates θ by minimizing

(13)

Formula - Long Description available

Haider (2001) and Baker and Solon (2003) point out the advantages of using an identity matrix as a weighting matrix in place of W (see also Altonji and Segal 1996, Clark 1996). One particular source of efficiency loss in an equal-weighted minimum distance estimator is that it ignores the fact that ω_cj elements of Ω are based on a different number of observations. A more efficient estimator may be obtained if sample moments are weighted in proportion to the size of each cj cell. The estimation results in this study are based on a minimum-distance estimator that uses both an identity matrix as a weighting matrix and a weighting matrix that weights the sample moments according to their sample sizes.
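The difference between the two weighting schemes can be seen in a toy example. The sketch below restricts the weighting matrix to a diagonal and uses a deliberately trivial model in which every moment equals a single parameter θ, so that the minimizer has a closed form; the model in this study is nonlinear in θ and would require a numerical optimizer. All names are hypothetical.

```python
def md_objective(sample_moments, model_moments, weights):
    """Minimum-distance criterion with a diagonal weighting matrix."""
    return sum(w * (m - f) ** 2
               for m, f, w in zip(sample_moments, model_moments, weights))

def fit_constant(sample_moments, weights):
    """Minimizer of md_objective when the model implies one value theta
    for every moment: the weighted mean of the sample moments."""
    return sum(w * m for m, w in zip(sample_moments, weights)) / sum(weights)

# Two sample moments computed from cells of different sizes.
moments = [1.0, 3.0]
cell_sizes = [100, 300]

theta_identity = fit_constant(moments, [1, 1])     # equal weights
theta_weighted = fit_constant(moments, cell_sizes)  # weights proportional to cell size
```

Equal weighting places θ midway between the two moments, while size weighting pulls it toward the moment based on more observations.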

It can be seen from (7) that setting p₁₉₈₃=1 (t=0) identifies σ²_α in a model with a single σ²_α parameter in the growth term. In a full model with cohort-specific parameters

Formula - Long Description available

in the growth term, it is assumed that Formula - Long Description available

Formula - Long Description available

where t* is the first loading factor for the cohort to which i belongs. For instance, for the 1980-to-1982 cohort t*=1983; for the 1983-to-1985 cohort t*=1986; and so on. A diagonal element in Ω_cjt can now be expressed as

Formula - Long Description available

Hence, assuming Z_cj = 0, the permanent variance component for the 1980-to-1982 cohort in year 1983 (t=0) is

Formula - Long Description available

; for the 1983-to-1985 cohort it is

Formula - Long Description available

and so on. Put otherwise, all

Formula - Long Description available

'absorb' the first loading factor for the cohorts they represent. The estimates of

Formula - Long Description available

can be used instead of Formula - Long Description available

Formula - Long Description available

to construct cohort-specific profiles of immigrant earnings inequality.

³ The within variance component is computed according to

Formula - Long Description available

and the between variance component can be computed according to

Formula - Long Description available

⁴ The 'point system' introduced in Canada in 1967 rewards applicants with extra points for a higher education level, knowledge of official languages (English or French) and younger age.