Customized duration data construction: An example of deriving unemployment insurance variables using SPSS


Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived.

by Kailing Shen¹

Abstract
Introduction
Outline of Constructing UI Spells from SLID
General procedures of duration data manipulation
Conclusion

Abstract

Developed initially for the author's research on Unemployment Insurance (UI), this article summarizes a set of procedures for constructing customized duration data using SPSS software and the Survey of Labour and Income Dynamics (SLID). These procedures can be used to merge, deduct, or match multiple duration datasets.

1. Introduction

The Survey of Labour and Income Dynamics (SLID) is one of the most important Canadian panel datasets for labour market studies, a field in which the impact of unemployment insurance (UI) policy has remained a central issue. However, using SLID for UI studies poses one major data challenge: the derivation of UI spells² and UI variables.³

This paper outlines the data construction procedures I developed while using SLID for UI studies. It also provides a general set of procedures for constructing customized duration data. Although the examples here are closely tied to UI and labour market studies, the intuitions behind them should be useful in other areas as well.
 
In section 2, I first outline the overall process of deriving UI spells. Section 3 then presents a set of programming procedures for duration data manipulation. Finally, section 4 concludes by discussing the possibility of applying these techniques in computer languages other than SPSS.

2. Outline of Constructing UI Spells from SLID

Some explanations are in order here:

First, the SLIDret application is provided by Statistics Canada for retrieving SLID data, which are organized as a relational database. Each SLIDret query must specify its unit of analysis: person, person-job, person-job absence, and so on. Since all of these types of data are used here, multiple queries are needed. In addition, each query is limited in the number of variables it can include, so several smaller queries are often easier to obtain than one large one.⁴

Second, all the spells referred to above are defined in calendar terms, by a start date and an end date. To 'merge' spells means to combine the sets of calendar dates covered by multiple spells (their union), while to 'deduct' means to remove one set of dates from another where the two overlap.
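As a concrete illustration of the two terms, the following is a minimal Python sketch, not part of the original SPSS procedures, that represents a spell as the set of calendar dates it covers, assuming both start and end dates are inclusive. The helper names `days`, `merge`, and `deduct` are hypothetical. Expanding spells into individual days is easy to read but memory-hungry; the SPSS procedures in section 3 work with start and end dates only.

```python
from datetime import date, timedelta


def days(start: date, end: date) -> set:
    """All calendar dates covered by the inclusive spell [start, end]."""
    return {start + timedelta(days=n) for n in range((end - start).days + 1)}


def merge(*spells: tuple) -> set:
    """'Merge': the union of the dates covered by several spells."""
    covered = set()
    for start, end in spells:
        covered |= days(start, end)
    return covered


def deduct(base: set, removed: set) -> set:
    """'Deduct': the dates in base that are not also in removed."""
    return base - removed


# A job spell for January 1996 with an absence from the 10th to the 14th.
job = merge((date(1996, 1, 1), date(1996, 1, 31)))
absence = merge((date(1996, 1, 10), date(1996, 1, 14)))
working = deduct(job, absence)
print(len(working))  # 26 working days remain
```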

Third, several, but not all, institutional features of the Canadian unemployment insurance program are taken into account. Specifically, spells not eligible for UI benefits are excluded from the Observation Windows (OWs), and Paid Working Spells (PWSs) separated by less than 15 days are connected, to match the two-week waiting period that applies to any initial UI benefit payment period.
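The connection rule for PWSs can be sketched in Python as follows, for one person's spells. How the 15-day separation is counted is an assumption here: the gap is taken as the number of uncovered calendar days between one spell's end date and the next spell's start date, so spells are joined when that gap is 14 days or fewer. The function name `connect_spells` and its `max_gap` parameter are hypothetical.

```python
from datetime import date
from typing import List, Tuple

Spell = Tuple[date, date]  # (start, end), both inclusive


def connect_spells(spells: List[Spell], max_gap: int = 14) -> List[Spell]:
    """Join one person's spells when the gap between them is short.

    Assumption: the gap is the number of uncovered calendar days strictly
    between one spell's end and the next spell's start, so "separated by
    less than 15 days" becomes gap <= 14. Overlapping or adjacent spells
    (gap <= 0) are joined as well.
    """
    connected: List[Spell] = []
    for start, end in sorted(spells):
        if connected:
            prev_start, prev_end = connected[-1]
            gap = (start - prev_end).days - 1
            if gap <= max_gap:
                connected[-1] = (prev_start, max(prev_end, end))
                continue
        connected.append((start, end))
    return connected


pws = [
    (date(1997, 3, 1), date(1997, 5, 31)),
    (date(1997, 6, 10), date(1997, 8, 15)),   # 9 uncovered days: joined
    (date(1997, 9, 15), date(1997, 12, 31)),  # 30 uncovered days: separate
]
print(connect_spells(pws))
```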

Last, SLID does have an individual UI region variable, but it is defined uniformly according to the UI region boundaries of June 1996. Since 1993, three different sets of boundaries have taken effect: in July 1994, June 1996, and July 2000. Therefore, I use postal code data to recover the UI region applicable at each point in time.
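Once region definitions are available for each boundary set, the first step at each point in time is to decide which set applies. A minimal Python sketch of that lookup follows. It assumes each set takes effect on the first day of its month (the text gives only the month), returns None for dates before July 1994 because the text does not name the boundaries in force then, and does not show the postal-code-to-region mapping itself. The names `BOUNDARY_SETS` and `boundary_set_on` are hypothetical.

```python
import bisect
from datetime import date
from typing import Optional

# Dates on which each UI region boundary set named in the text took effect.
# Assumption: the first day of the stated month.
BOUNDARY_SETS = [
    (date(1994, 7, 1), "July 1994"),
    (date(1996, 6, 1), "June 1996"),
    (date(2000, 7, 1), "July 2000"),
]
_EFFECTIVE = [start for start, _ in BOUNDARY_SETS]


def boundary_set_on(day: date) -> Optional[str]:
    """Label of the boundary set in effect on `day`, or None if earlier
    than the first set listed (those boundaries are not named in the text)."""
    i = bisect.bisect_right(_EFFECTIVE, day) - 1
    return BOUNDARY_SETS[i][1] if i >= 0 else None


print(boundary_set_on(date(1998, 3, 15)))  # June 1996
```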

Overall, the derivation of UI variables rests closely on the construction of OWs and of UI-eligible spells, that is, PWSs. Once the spells are constructed, each worker's weekly UI treatment variables can be derived in a straightforward way by matching these spells with the UI unemployment rates in effect over time.
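The matching of a spell with unemployment rates can be sketched in Python as below. This is an illustration under several assumptions: week X of a spell is counted from the spell's start date, the rate for week X is the one in effect on the first day of that week, and the rates for the worker's UI region arrive as a date-sorted list of (effective date, rate) pairs. The function name `weekly_rates` and the rate values in the usage lines are made up for illustration.

```python
import bisect
from datetime import date, timedelta
from typing import List, Tuple


def weekly_rates(start: date, end: date,
                 rate_changes: List[Tuple[date, float]]) -> List[Tuple[int, float]]:
    """Return (X, rate) for each week X = 1, 2, ... of the spell [start, end].

    Assumptions: weeks are counted from the spell's start date, and the rate
    for week X is the one in effect on the first day of that week.
    rate_changes holds (effective_date, rate) pairs sorted by date.
    """
    effective = [d for d, _ in rate_changes]
    out = []
    week, first_day = 1, start
    while first_day <= end:
        i = bisect.bisect_right(effective, first_day) - 1
        if i < 0:
            raise ValueError(f"no rate in effect on {first_day}")
        out.append((week, rate_changes[i][1]))
        week, first_day = week + 1, first_day + timedelta(weeks=1)
    return out


# Made-up rates for one region, for illustration only.
rates = [(date(1996, 12, 8), 9.5), (date(1997, 1, 12), 10.1)]
print(weekly_rates(date(1997, 1, 1), date(1997, 1, 20), rates))
```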

3. General procedures of duration data manipulation

Many of the steps outlined in section 2 involve duration data manipulation, such as merging, deduction, and matching (that is, creating a correspondence between two logically independent or dependent sets of duration data, such as OWs and PWSs, or Paid Job Working Spells (PJWs) and PWSs). These operations allow researchers to make fuller use of micro-panel data by creating customized spells. Unfortunately, little documentation exists in this area. The following are some related programming procedures I developed in SPSS.

Merging spells with no overlaps

Suppose each record in file a.sav has three fields: personid, startdate, and enddate, and we know that, for each personid, the periods covered by different records do not overlap. We want to merge the spells within each personid so that two spells, A and B, are merged if and only if spell B's startdate is the day after spell A's enddate. This can be done as follows:

Example 1

/*==step one: reshape the input data set==*/.
get file='a.sav'.
varstocases /make date from startdate enddate /index=datef.
/* the index is 1 for startdate and 2 for enddate; recode it to 1 and -1*/.
compute datef=(3-2*datef).
execute.
/* now each record has 3 fields:*/.
/* personid date datef (1 if start date; -1 if end date)*/.
/*==step two: calculate lead and lag of date==*/.
/* datef(d) puts the start of a one-day spell before its end*/.
sort cases by personid date datef(d).
split file separate by personid.
create /d_lag=lag(date 1) /d_lead=lead(date 1).
split file off.
/*==step three: select subset of dates ==*/.
/* drop a start date that is the day after the preceding end date,*/.
/* and an end date that is the day before the next start date*/.
compute fs=1.
if(datef=1 & date=d_lag+time.days(1)) fs=0.
if(datef=-1 & date=d_lead-time.days(1)) fs=0.
select if(fs=1).
execute.
/*==step four: generate spell id for output data set ==*/.
split file separate by personid.
date O 1 2.
split file off.
compute spellid=cycle_.
compute dateindex=obs_.
execute.
/*==step five: reshape the output dataset ==*/.
casestovars /id=personid spellid /index=dateindex.
rename variables (date.1 date.2=strdate enddate).
save outfile='merged_a.sav' /keep=personid spellid strdate enddate.
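For readers working outside SPSS, the same merging logic can be sketched in Python. Instead of reshaping to one record per date and comparing with lag and lead values, this sketch sorts each person's spells by start date and extends the previous spell whenever the next one starts the following day. It assumes, as in Example 1, that spells do not overlap within a person and that dates are inclusive; the function name `merge_adjacent` is hypothetical.

```python
from datetime import date, timedelta
from itertools import groupby


def merge_adjacent(records):
    """Merge spells that start the day after the previous spell ends.

    records: (personid, startdate, enddate) tuples with inclusive dates and,
    within each person, no overlapping spells (as assumed in Example 1).
    Returns (personid, spellid, startdate, enddate), numbering spells from 1
    within each person.
    """
    out = []
    # Sorting plays the role of SORT CASES; groupby that of SPLIT FILE.
    for pid, group in groupby(sorted(records), key=lambda r: r[0]):
        merged = []
        for _, start, end in group:
            # Compare with the previous spell's end date (the lag in SPSS).
            if merged and start == merged[-1][1] + timedelta(days=1):
                merged[-1][1] = end
            else:
                merged.append([start, end])
        out.extend((pid, k, s, e) for k, (s, e) in enumerate(merged, start=1))
    return out


a = [
    (1, date(1995, 1, 1), date(1995, 3, 31)),
    (1, date(1995, 4, 1), date(1995, 6, 30)),
    (1, date(1995, 8, 1), date(1995, 8, 31)),
    (2, date(1996, 2, 1), date(1996, 2, 1)),
]
for row in merge_adjacent(a):
    print(row)
```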

Merging spells with overlaps

More generally, steps two and three can be modified to merge spells that may overlap. This is done by calculating the number of active spells at each critical date, that is, each date on which some spell starts or ends.

Figure 1 illustrates this procedure. The thin line segments are the initial spells and the thick ones are the merged spells. Each initial spell is transformed into a pair of signed flags: +1 at its start date and -1 at its end date. The merged spells are then constructed from the time points where the cumulative sum of the flags rises from 0 (a merged spell starts) or falls back to 0 (a merged spell ends).

Figure 1 Spells time line

Example 2

/*==step one: reshape the input data set (same as step one of Example 1)==*/.
/*==step two: count the active spells at each date, then take lag and lead==*/.
sort cases by personid date datef.
aggregate outfile='temp.sav' /break=personid date /sumf=sum(datef).
get file='temp.sav'.
split file separate by personid.
create /csumf=csum(sumf).
create /cf_lag=lag(csumf 1) /cf_lead=lead(csumf 1).
split file off.
/* csumf is the number of spells still active at each date*/.
/* (spells ending on that date are no longer counted)*/.
/* cf_lag is the preceding date's number of active spells*/.
/* cf_lead is the succeeding date's number of active spells*/.
/*==step three: select subset of dates ==*/.
/* keep dates where the count becomes positive after 0 (or after no earlier date) or falls to 0*/.
/* an isolated one-day spell nets to 0 on its date and is dropped by the second rule*/.
compute fs=1.
if(csumf>0 & cf_lag>0) fs=0.
if(csumf=0 & (missing(cf_lag)|(cf_lag=0))) fs=0.
if(csumf<0 & cf_lead<=0) fs=0.
select if(fs=1).
execute.
/*==step four: generate spell id for output data set (same as Example 1)==*/.
/*==step five: reshape the output dataset (same as Example 1)==*/.
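The signed-flag method of Figure 1 can also be sketched in Python for one person's spells. Note one difference from Example 2: there the -1 flag sits on the end date itself, so an isolated one-day spell nets to zero and is dropped by the second selection rule, and spells that are merely adjacent are kept apart. This sketch is a variant that places the -1 on the day after each end date, so the running sum equals the number of spells covering each date; as a result, one-day spells survive and adjacent spells are joined, as in Example 1. The function name `merge_overlapping` is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def merge_overlapping(spells):
    """Merge one person's possibly overlapping spells (inclusive dates).

    Variant of the signed-flag method: +1 on each start date and -1 on the
    day after each end date, so the running sum is the number of spells
    covering each date.
    """
    one_day = timedelta(days=1)
    flags = defaultdict(int)
    for start, end in spells:
        flags[start] += 1
        flags[end + one_day] -= 1
    merged, active, opened = [], 0, None
    for day in sorted(flags):
        before, active = active, active + flags[day]
        if before == 0 and active > 0:      # count rises from 0: a merged spell starts
            opened = day
        elif before > 0 and active == 0:    # count falls to 0: it ended the day before
            merged.append((opened, day - one_day))
    return merged


spells = [
    (date(1997, 1, 1), date(1997, 1, 5)),
    (date(1997, 1, 3), date(1997, 1, 8)),
    (date(1997, 1, 20), date(1997, 1, 20)),
]
print(merge_overlapping(spells))
```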

Deducting one type of spell from another

Deduction is needed, for example, when we want to create job working spells from job spells and job absence spells. This is done by taking the start and end of each job absence spell as the end and start of job working spells.⁵ The following example shows how to deduct the spells in b.sav from those in a.sav.

Example 3

/*==step one: reshape the input data sets==*/.
get file='a.sav'.
compute flaga=1.
add files /file=* /file='b.sav'.
if(missing(flaga)) flaga=0.
execute.
varstocases /make date from startdate enddate /index=datef.
compute datef=(3-2*datef).
/* with inclusive dates, work stops the day before an absence starts*/.
/* and resumes the day after it ends*/.
if(flaga=0 & datef=1) date=date-time.days(1).
if(flaga=0 & datef=-1) date=date+time.days(1).
/* an absence start becomes an end flag, and its end a start flag*/.
if(flaga=0) datef=-datef.
execute.
/*==step two: sort the dates and generate spell id for output data set ==*/.
sort cases by personid date datef(d).
split file separate by personid.
date O 1 2.
split file off.
compute spellid=cycle_.
compute dateindex=obs_.
execute.
/*==step three: reshape the output dataset ==*/.
casestovars /id=personid spellid /index=dateindex.
rename variables (date.1 date.2=strdate enddate).
save outfile='deduct_b from a.sav' /keep=personid spellid strdate enddate.
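A more general deduction, for one job spell and any number of absence spells, can be sketched in Python by walking through the absences in date order. Assuming inclusive dates, each absence from s to e ends the current working piece on the day before s and resumes work on the day after e. Unlike Example 3, this sketch tolerates absences that overlap each other or extend past the edges of the job spell; with several job spells per person, each absence would first need to be matched to its job, as in Example 4. The function name `deduct_spells` is hypothetical.

```python
from datetime import date, timedelta


def deduct_spells(job, absences):
    """Remove absence spells from one job spell; all dates inclusive.

    job: (start, end); absences: (start, end) pairs for the same person.
    Returns the job working spells that remain, in date order.
    """
    one_day = timedelta(days=1)
    job_start, job_end = job
    pieces, cursor = [], job_start  # cursor: first day not yet accounted for
    for a_start, a_end in sorted(absences):
        if a_end < cursor or a_start > job_end:
            continue  # nothing left of the job to remove here
        if a_start > cursor:
            pieces.append((cursor, a_start - one_day))  # work until the day before
        cursor = max(cursor, a_end + one_day)           # resume the day after
        if cursor > job_end:
            break
    if cursor <= job_end:
        pieces.append((cursor, job_end))
    return pieces


job = (date(1996, 1, 1), date(1996, 1, 31))
absences = [(date(1996, 1, 10), date(1996, 1, 14)), (date(1996, 1, 25), date(1996, 2, 3))]
print(deduct_spells(job, absences))
```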

Matching two types of spells

Matching is needed, for example, to know which jobs a worker held in each employment spell, or which initial spells correspond to each merged spell in Figure 1. The following example shows how to match the spells in b.sav to those in a.sav, by assigning each b spell the spellid of the a spell that contains its start date.

Example 4

/*==step one: reshape the input data sets==*/.
/* a.sav: personid spellid startdate enddate, no overlaps within a person*/.
/* b.sav: personid startdate enddate*/.
get file='a.sav'.
compute flaga=1.
execute.
varstocases /make startdate from startdate enddate /index=datef.
add files /file=* /file='b.sav'.
if(missing(flaga)) flaga=0.
/* +spellid at the start of an a spell, -spellid at its end, 0 for b spells*/.
if(flaga=1) datef=(3-2*datef)*spellid.
if(flaga=0) datef=0.
execute.
/*==step two: sort the merged data and generate the linking id==*/.
sort cases by personid(a) startdate(a) datef(d).
split file separate by personid.
create /cspellid=csum(datef).
split file off.
/*==step three: select the proper records and save==*/.
/* cspellid is the id of the a spell containing each b start date (0 if none)*/.
select if(flaga=0).
compute spellid=cspellid.
execute.
save outfile='match_b to a.sav' /keep=personid spellid startdate enddate.
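The same matching rule can be sketched in Python with a binary search over the start dates of one person's a spells, in place of the cumulative sum of signed spell ids. It assumes, as Example 4 requires, that a spells do not overlap; a b spell whose start date falls in no a spell gets spellid 0, which is also the value cspellid takes in Example 4 in that case. The function name `match_spells` is hypothetical.

```python
import bisect
from datetime import date


def match_spells(a_spells, b_spells):
    """Give each b spell the spellid of the a spell containing its start date.

    a_spells: (spellid, start, end) for one person, inclusive dates, no overlaps.
    b_spells: (start, end) for the same person.
    Returns (spellid, start, end); spellid is 0 when no a spell contains the
    b start date.
    """
    a_sorted = sorted(a_spells, key=lambda s: s[1])
    a_starts = [s[1] for s in a_sorted]
    out = []
    for b_start, b_end in b_spells:
        # Last a spell starting on or before b_start (ties count as inside).
        i = bisect.bisect_right(a_starts, b_start) - 1
        spellid = a_sorted[i][0] if i >= 0 and a_sorted[i][2] >= b_start else 0
        out.append((spellid, b_start, b_end))
    return out


a = [(1, date(1995, 1, 1), date(1995, 6, 30)), (2, date(1995, 8, 1), date(1995, 12, 31))]
b = [(date(1995, 3, 1), date(1995, 3, 31)),
     (date(1995, 7, 10), date(1995, 7, 20)),
     (date(1995, 12, 31), date(1996, 1, 5))]
print(match_spells(a, b))
```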

4. Conclusion

This article outlines how to construct UI variables from SLID data and provides a set of procedures for manipulating duration data in SPSS. The ability of SPSS to restructure a dataset and to compute lag and lead values is critical here. Implementing these ideas in other computer languages, such as SAS or STATA, therefore depends on finding equivalent techniques.


Notes

  1. I thank Darren Lauzon for helpful comments and suggestions.
  2. Unfortunately, the weekly vector of labour force status information readily available in SLID is not useful for deriving UI spells, because UI concerns only paid employment, while SLID's definition of employment includes self-employment and unpaid work as well as paid employment.
  3. The derivation of UI spells and that of UI variables are closely related, because UI variables are always defined relative to a spell: at week X of some UI employment or unemployment spell. Consequently, the precision of the UI variables depends closely on that of the derived UI spells.
  4. This could be a serious problem for monthly variables. For example, the schooling flag alone has 72 variables (6 years × 12 months). I created a separate query for this variable only, to accommodate the limitation of SLIDret.
  5. The technique shown here applies only to the simplest case and is valid only if: 1) each person has only one spell in a.sav; and 2) for each person, the spells in b.sav lie within that person's spell in a.sav. Otherwise, the procedure has to be modified by applying the matching technique shown in Example 4.