This appendix reports the results of checks that compare the findings of this study with those obtained from the YITS-B dataset, described earlier as probably the other best data source for studying persistence in PSE in Canada and the basis of other work undertaken by the authors (Finnie and Qiu (2008)). We also suggest other checks that might be carried out, including two file linkages that could be very interesting.
The general strengths and weaknesses of the YITS and PSIS datasets have been mentioned in the main part of the paper. These stem mainly from the general characteristics of administrative data (the PSIS) and survey data (the YITS-B), and relate to coverage, attrition, and the variables available. Furthermore, in this case the YITS-B is a national level dataset, whereas the PSIS used in this analysis is limited to Atlantic Canada.
Appendix Table A.4.1 (below) shows the first year transition rates based on the PSIS, as presented above, as well as a set of YITS results that are roughly comparable to those reported in Finnie and Qiu (2008), but adjusted to be more directly comparable to the PSIS.
In particular, the YITS analysis has been restricted to students attending institutions in the Atlantic region (rather than anywhere in Canada), and the definition of switching has been changed so that within-institution program changers, defined as switchers in the original YITS analysis, are re-classified as continuers. That is, we consider graduating from, continuing in, or leaving a given institution as the dynamics of interest, as has been done in the PSIS analysis, rather than graduating from, continuing in, or leaving the initial program, as was done in the authors’ original YITS work.
Furthermore, moves to institutions outside of the Atlantic region, defined as switchers in the YITS analysis, are now re-classified as leavers for the sake of these comparisons. This classification is, strictly speaking, incorrect, but it is adopted deliberately so as to match the treatment in the PSIS analysis, which captures students in the Atlantic region only (meaning that those who continue their studies but leave the region to do so are registered as leavers).
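The two re-classification rules just described can be summarized in a minimal sketch. The status labels, argument names, and province codes below are illustrative assumptions, not variables taken from the YITS or PSIS files.

```python
# Hypothetical labels and argument names; the actual YITS variables are not
# given in the text. Province codes are the standard two-letter abbreviations
# for the four Atlantic provinces.
ATLANTIC_PROVINCES = {"NL", "PE", "NS", "NB"}


def harmonize_yits_status(original_status, same_institution, destination_province):
    """Map an original YITS first-year status onto the PSIS-style definition.

    original_status: "graduate", "continuer", "switcher" or "leaver"
    same_institution: True if a switcher changed program within the institution
    destination_province: province of the new institution (None if not applicable)
    """
    if original_status != "switcher":
        # Graduates, continuers and leavers keep their original status.
        return original_status
    if same_institution:
        # Within-institution program changers are treated as continuers.
        return "continuer"
    if destination_province not in ATLANTIC_PROVINCES:
        # Moves out of the Atlantic region look like leaving in the PSIS.
        return "leaver"
    # Moves between Atlantic institutions remain switchers.
    return "switcher"
```

Under these assumptions, a program change within one institution becomes "continuer", a move to an institution in, say, Ontario becomes "leaver", and a move between two Atlantic institutions stays "switcher".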
There remain, though, other differences in the two datasets, and as a result in any related calculations, that need to be understood. First, the YITS analysis is based on the first PSE program taken by those individuals included in the YITS sample, who were age 18 to 20 in 2000. This could have occurred at any point over the period covered by the data, starting in 1996 (identified retroactively in the first interview) and continuing through the just-available Cycle IV survey carried out in 2006. As a result, the emphasis on first programs in the YITS analysis is less ambiguous than in the PSIS analysis, the age spread is a little different, and so is the period covered by the analysis.
Secondly, the YITS is subject to sample response bias, as discussed in the main part of the paper, whereas the PSIS should not be since it includes all individuals in all PSE programs (albeit in Atlantic Canada only) over the period covered. We might thus expect leaving rates, in particular, to be understated in the YITS, since leavers would likely be harder to follow over time, resulting in these dynamics being missed and the related transition estimates being biased accordingly.
But in addition to this response bias, a substantial number of PSE profiles are difficult to classify in the YITS due to contradictory information given across surveys (e.g., the individual said he or she was in PSE at the end of one cycle, but at the next interview claimed that had not been the case), and estimated persistence rates vary considerably depending on the particular treatment of these cases.
Finally, the sample sizes for Atlantic Canada are small in the YITS, and the persistence estimates are commensurately subject to wider variances.
For these and other related reasons we should, therefore, not expect the two sets of results to be identical. The question is: are they close, and do their differences correspond to what we might expect? And can they point us in the direction of other checks that could be carried out?
The first year continuing rates are in fact very close in the two sets of results. For bachelor’s level students they are 79.8 percent in the PSIS and 81.2 percent in the YITS, while at the college level they are 52.6 and 50.4 percent in the PSIS and YITS respectively. The number of graduates also agrees fairly closely: negligible at the bachelor’s level and 24 to 27 percent for college students. This similarity of findings is reassuring for both analyses.
The leaver and switcher rates, however, differ a little more. First year leaving rates are 15.1 percent in the PSIS and 10.5 percent in the YITS among bachelor’s students, although they are a much closer 22.6 percent versus 20.4 percent among college students. This conforms to our expectation of possibly lower leaving rates in the YITS due to its likely response/attrition bias as discussed above and elsewhere in the paper. And the fact that these differences are greater among bachelor’s students than college students might be driven by the former being a more mobile group for whom going away to school (for example) – and hence perhaps also moving after leaving school and therefore being lost from the sample – is more common than is the case for college students.
Switcher rates are, conversely, a little higher in the YITS relative to the PSIS among bachelor’s students: 5.1 percent in the PSIS versus 7.8 percent in the YITS (while they are everywhere low for college students). The reason for this difference is less obvious.
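The size and direction of these gaps can be tabulated directly from the rates quoted in the text. The sketch below uses only those figures; the dictionary layout and key names are illustrative, and college switching and graduation rates are omitted because exact values by dataset are not given in the text.

```python
# First-year transition rates (percent) as quoted in the text.
rates = {
    "bachelor": {
        "PSIS": {"continue": 79.8, "leave": 15.1, "switch": 5.1},
        "YITS": {"continue": 81.2, "leave": 10.5, "switch": 7.8},
    },
    "college": {
        "PSIS": {"continue": 52.6, "leave": 22.6},
        "YITS": {"continue": 50.4, "leave": 20.4},
    },
}

# Gap in percentage points, YITS minus PSIS, for each level and outcome.
gaps = {
    level: {
        outcome: round(by_source["YITS"][outcome] - by_source["PSIS"][outcome], 1)
        for outcome in by_source["PSIS"]
    }
    for level, by_source in rates.items()
}

for level, by_outcome in gaps.items():
    print(level, by_outcome)
# bachelor {'continue': 1.4, 'leave': -4.6, 'switch': 2.7}
# college {'continue': -2.2, 'leave': -2.2}
```

The largest gap is thus the 4.6 percentage point lower leaving rate for bachelor’s students in the YITS, with the 2.7 point higher switching rate accounting for a good part of the offset.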
One concern we had was that the PSIS was in fact not picking up all switchers, perhaps because it was not catching all students from one year to the next when they moved between institutions. In particular, if a student moved from one institution to another and was not linked across those two years by the record matching methods employed by Statistics Canada, the student would be counted as a leaver rather than a switcher in our analysis, which could in theory help explain the differences in both leaver and switcher rates: leaver rates in the PSIS being higher, and switcher rates lower.
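The mechanism behind this concern can be made concrete with a minimal sketch. The function and argument names are hypothetical and do not represent the record matching methods actually used by Statistics Canada; graduates are ignored for simplicity.

```python
def first_year_status(institution_year1, linked_institution_year2):
    """Classify a student from the institution attended in year 1 and the
    institution found for the same (linked) person in year 2, if any.

    linked_institution_year2 is None when no year-2 record is linked to the
    student, whether because the student truly left or the link was missed.
    """
    if linked_institution_year2 is None:
        return "leaver"
    if linked_institution_year2 == institution_year1:
        return "continuer"
    return "switcher"


# A student who moves from one institution to another...
status_if_linked = first_year_status("Institution A", "Institution B")
# ...is counted as a leaver if the year-2 record is not linked to them.
status_if_link_missed = first_year_status("Institution A", None)

print(status_if_linked, status_if_link_missed)  # switcher leaver
```

Each missed link of this kind would move one student from the switcher count to the leaver count, which is the direction of the PSIS-YITS differences noted above.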
Presented with these findings, Statistics Canada took our concerns to heart and rechecked their linkage programs (including extensive checks of the underlying micro records). They concluded once again that the record matching exercise used to create the longitudinal file employed in the analysis did indeed identify the desired linkages, and that the problem just described (i.e., missing linkages as students moved institutions) was not likely to be the source of the observed differences as hypothesized.
Given the power of the individual identifiers on the file (SINs, full name and birth date information) and Statistics Canada’s generally excellent track record in making such linkages based on their years of experience in doing so using a variety of different datasets across the bureau, it would probably have been surprising had missed linkages in fact been a major problem. The checks carried out affirm that supposition. At this point, therefore, we conclude that the YITS-PSIS differences are not explained by the sort of longitudinal matching problems that had been suggested.
Assuming that the individuals for whom Statistics Canada has received records have in fact been correctly linked, this leaves the possibility of there being incomplete reporting on the part of at least some institutions in at least some years. If, however, this was an erratic reporting error over time, such as some individuals being missed in some years but not others at a given institution either because they were not reported at that institution or the entire institution did not report, we would expect continuing rates to be lower and leaving rates higher in the PSIS, with no clear implications for switching rates. But what we actually find is similar continuing rates, higher leaving rates, and lower switching rates in the PSIS. So this would not seem to explain the problem, although a variety of different biases might be trading off against each other, including the underlying response bias that one suspects has to affect the YITS results to at least some degree.
In the absence of any obvious explanations for the observed differences in switching rates – except that the lower leaver rates in the YITS can perhaps be at least partly explained by survey response bias – we conclude that while some differences remain between the YITS and PSIS results, the estimated persistence rates are generally close enough not to cause us to doubt the quality of the PSIS data or the analysis that has been carried out with those data in any fundamental way.
Let us now consider returning rates among leavers. First year returning rates among bachelor’s students are, at 20.0 percent (Tables 13 and 14), considerably lower than the 35.6 percent first year rate found with the YITS for all of Canada reported in Finnie and Qiu (2008). But the detailed breakdowns are interesting, and possibly revealing in terms of identifying the potential sources of the differences in findings between the two datasets.
The number who return to the same institution is very similar in the two analyses: 12.5 percent in the YITS as compared to the 11.9 percent found here with the PSIS. But we find considerably lower rates in the PSIS data among others: those who move institutions, including moves to institutions out of the original province.
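The breakdown can be worked through with the figures quoted above. The sketch assumes that the overall returning rate is the sum of returns to the same institution and returns elsewhere, so that the "other" component is obtained by subtraction; the variable names are illustrative, and the PSIS figures refer to Atlantic Canada while the YITS figures refer to all of Canada.

```python
# First-year returning rates among bachelor's leavers (percent), from the text.
returning = {
    "PSIS": {"total": 20.0, "same_institution": 11.9},  # Atlantic Canada
    "YITS": {"total": 35.6, "same_institution": 12.5},  # all of Canada
}


def other_institution_rate(r):
    """Returns to a different institution, assumed to be total minus same-institution."""
    return round(r["total"] - r["same_institution"], 1)


psis_other = other_institution_rate(returning["PSIS"])  # 8.1
yits_other = other_institution_rate(returning["YITS"])  # 23.1

gap_total = round(returning["YITS"]["total"] - returning["PSIS"]["total"], 1)
gap_same = round(
    returning["YITS"]["same_institution"] - returning["PSIS"]["same_institution"], 1
)
gap_other = round(yits_other - psis_other, 1)

print(gap_total, gap_same, gap_other)  # 15.6 0.6 15.0
```

Under this assumption, almost all of the 15.6 percentage point gap in overall returning rates (15.0 points) comes from returns to a different institution, and only 0.6 points from returns to the same institution.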
One reason for these differences would be that the YITS data are for all of Canada (breaking out the results for this dynamic for Atlantic Canada is not possible due to the limited sample sizes in the YITS), along with the other fundamental differences in the YITS data as described above (i.e., they are limited to individuals in their first programs, etc.).
Another reason for the differences in the two sets of findings would be that students who move to enter new programs in institutions outside of Atlantic Canada are – as mentioned earlier – not captured in the PSIS, and this may be a significant group among leavers/returners. After all, those who leave PSE and then return might be expected to be a generally geographically mobile group given the instability in this other part of their lives. (Again, the relevant numbers cannot be determined with any accuracy in the YITS due to its more limited sample sizes in a situation where relatively few individuals are involved overall.)
Otherwise put, the PSIS numbers understate the number who return to PSE after leaving to the degree these returners are doing so out of the Atlantic region. This said, the YITS findings would themselves (again) likely be subject to sample bias. So what we observe is the result of these, and any other, potential data limitations and problems.
Resolving this issue will, like the general overestimation of leavers and underestimation of switchers that is inherent in the Atlantic-only nature of the PSIS, require an expansion of the file to include data from the other provinces to which Atlantic Canada students move when they return to school (thus affecting returning rates), as much as when they switch from one program to another (thus affecting leaving and switching rates).
Another means of checking the PSIS would be to link it to other data sets and to directly compare students’ PSE profiles in the two different sources. One such possibility would be an actual PSIS-YITS linkage. Since the PSIS is essentially a census of all PSE students in Atlantic Canada, it should be possible to find all those individuals in the YITS in the PSIS whenever they are located in any of those four provinces over the relevant period.
The PSE profiles of the PSIS-YITS students could thus be tracked independently in the two data sets to see where any differences in persistence profiles result, and why. Sample sizes would be limited due to the relatively small number of individuals from Atlantic Canada in the YITS (since it is a national level survey-based dataset), but such an exercise might nevertheless be revealing, and at least indicative of the potential sources of the differences in results across the two datasets, including those discussed above.
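A comparison of this kind could be organized as a simple cross-tabulation of the status recorded for each linked individual in the two sources. The sketch below is only illustrative: the identifiers and statuses are invented toy values, not data from either file, and the function name is hypothetical.

```python
from collections import Counter


def cross_tabulate(psis_status, yits_status):
    """Count (PSIS status, YITS status) pairs for individuals found in both sources.

    Both arguments map a person identifier to a first-year status.
    """
    linked_ids = psis_status.keys() & yits_status.keys()
    return Counter((psis_status[i], yits_status[i]) for i in linked_ids)


# Invented toy records, for illustration only.
psis = {"p1": "leaver", "p2": "continuer", "p3": "switcher", "p4": "leaver"}
yits = {"p1": "switcher", "p2": "continuer", "p3": "switcher", "p5": "leaver"}

table = cross_tabulate(psis, yits)

# Pairs where the two sources disagree are the cases worth examining.
disagreements = {pair: n for pair, n in table.items() if pair[0] != pair[1]}
print(disagreements)  # {('leaver', 'switcher'): 1}
```

Off-diagonal cells such as ("leaver", "switcher") would identify individuals recorded as leavers in the PSIS but as switchers in the YITS, which is the pattern the discussion above suggests looking for.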
No proposal for such a linkage has yet been initiated, but if a good case could be made for it, such a project could at least in principle be undertaken. The purpose of the linkage could be kept restricted to providing checks of the PSIS, or could be pushed further to include having the linked file made available for analysis if the benefits of doing so could be successfully argued.
Another interesting possibility would be to link the PSIS to Statistics Canada’s Longitudinal Administrative Database, or LAD. The LAD is a longitudinal database constructed from individuals’ tax files which includes information on participation in PSE based on students’ declarations of the available tax credits. The LAD covers a random 20 percent of the population, meaning that 20 percent of those in the PSIS could, in principle, be linked to the LAD, but reasonable sample sizes would still result.
With a LAD-PSIS linkage, individuals could – comparable to the PSIS-YITS linkage just discussed – be followed jointly in the PSIS and the LAD and their PSE profiles compared. Again, the concerns raised above could be addressed. Do some of those identified as PSE leavers in the PSIS in fact continue in their studies, but go undetected because their records are not linked across years as is required to capture that dynamic? What is the extent of the hole left in the PSIS due to the restriction of its coverage to Atlantic Canada, and how many individuals in fact leave the Atlantic region while continuing their PSE studies and should therefore be classified as switchers rather than leavers? And what about those who return to school either directly, after leaving a program without graduating, or after graduating? In this way, essentially all the uncertainties that now exist with respect to the PSIS relating to these dynamics could be checked.
There is in fact already a record linkage proposal underway at Statistics Canada for a match of the LAD to the Atlantic Canada PSIS. This proposal was initially launched by the MESA project on PSE in which the authors are involved, supported by the Centre for Education Statistics at Statistics Canada, along with the Small Area and Administrative Data Division where the LAD are kept. Additional outside support for this linkage would, however (as in the case of the PSIS-YITS linkage discussed above), significantly bolster its chances of being accepted since the public benefits of the linkage – as might be argued by external partners – have to be adequately demonstrated.
A LAD-PSIS linkage would – incidentally to the purposes of the research being presented here but fundamental to the overall benefit of the LAD-PSIS linkage as originally conceived – have the additional benefit of allowing us to attach the longitudinal-based family background information available in the LAD to any persistence analysis carried out with the PSIS. Family income, family type, where the person lived before pursuing PSE, and other such information are some of the variables that could be added to the analysis.
In addition, individuals included in any LAD-PSIS linkage could continue to be tracked in the years after they leave PSE through their (LAD-based) tax files, thus opening up the possibility of linking PSE experiences to later outcomes, including labour market experiences, demographic profiles (marriage and child bearing), savings, and more. These are in fact the main objectives of the LAD-PSIS linkage as originally conceived, while using the LAD to help verify the PSIS data is a more recent idea. All of these purposes would be served were the linkage made and the linked file made available to i) check the PSIS and ii) use the linked file for analysis.