Statistics Canada

Methodology

Data source: The Incident-Based Uniform Crime Reporting Survey
Study population
Population at risk
Selection, matching, and weighting of UCR2 records

Data source: The Incident-Based Uniform Crime Reporting Survey

This report uses data from the Incident-Based Uniform Crime Reporting Survey, or "UCR2", which captures detailed information about each incident recorded by the information systems of participating police services. The main limitation of the UCR2 as a data source is that it does not yet cover all of Canada. It is being phased in over time, as police services in Canada are able to modify their records management systems to meet the enhanced information requirements of this survey. Since its inception in 1988, the UCR2 has grown to cover 122 police services and detachments in 9 provinces, which together accounted for 71% of the population of Canada in 2005. However, in order to study the development of delinquency from 1995 to 2005, this study uses data from a subset of police services in 6 provinces that have reported consistently to the UCR2 since 1995.[1] The selected police services accounted for approximately 52% of the population of Canada during that period.

The police services included in this study are concentrated in the province of Québec and in urban Ontario. Of the 61 police services and detachments included in the study, 39 are in Québec, and together they cover close to 100% of that province's population. Ten are municipal police services in Ontario cities, covering 51% of the population of that province. The remaining 12 respondents are municipal police services in cities and towns in New Brunswick (5 services; 16% of the provincial population), Saskatchewan (3 services; 42%), Alberta (3 services; 54%), and British Columbia (1 service; 14%). However, counts of police services give a misleading picture of the geographical distribution of the data, because many of the Québec services police relatively small populations. In terms of 2005 population, the parts of Canada covered by this report are distributed as follows: Québec 47%, Ontario 35%, Alberta 11%, British Columbia 4%, Saskatchewan 3%, and New Brunswick 1% (the figures do not sum to 100% because of rounding). The findings presented in this report therefore apply mainly to Québec and urban Ontario, and to a lesser extent to certain cities and towns in New Brunswick, Saskatchewan, Alberta, and British Columbia. Nevertheless, the overall youth crime rate and the distribution of types of recorded youth crime in the areas covered by this study do not differ substantially from those reported for Canada as a whole. This issue is discussed above, under the heading "Delinquency in the Study Population and the National Population".

The UCR2 Survey has one record for each person identified by police as "chargeable" in relation to a criminal incident.[2] A "chargeable" person[3] is defined as one who "has been identified by police as being involved in a criminal incident and against whom an information [i.e. charge] could be laid as a result of sufficient evidence/information" (Canadian Centre for Justice Statistics, 2004: 80). A variable on this record indicates whether or not the chargeable person was actually charged. All chargeable children and young persons, whether charged or not, are included in this study. The UCR2 Survey allows up to four violations to be coded for each incident. If more than four violations occur in an incident, the four most serious are selected and recorded in order of seriousness. Unless otherwise noted, the analyses in this report that involve the type of offence committed are based on the most serious violation recorded for the incident. As a result, less serious offences may be under-represented.

Study population

The study begins in 1995 because, before that year, the UCR2 covered less than 50% of the population of Canada. Data for 11 calendar years (1995 to 2005) were therefore available. However, only 10 years of data were used for each individual. Each person was tracked from his or her birthday in 1995 to the day before his or her birthday in 2005, so that every individual in a cohort would be observed over the same age range. The cohort born in 1987 was selected for analysis because its members had their 18th birthday in 2005. It was therefore possible to track them to the end of the period during which they were legally defined as "young persons", that is, to the day before their 18th birthday. The earliest birthday from which they could be tracked was their 8th birthday, which occurred during 1995.

Other research has established that very little recorded crime is committed by persons under 8 years of age. Study of the 1987 birth cohort alone was therefore expected to capture the great majority of recorded crime involving children and young persons. Nevertheless, it was considered desirable to study the development of criminal behaviour from the earliest possible age. Therefore, persons born in 1990 were also included in the study population, so that observation could begin at the 5th birthday.[4] This also provided a substantial period of "overlap", from the 8th birthday to the day before the 15th birthday, when data were available for both cohorts. Over this period, results calculated separately for the two cohorts could be compared to verify them.

Population at risk[5]

The prevalence of, or participation in, recorded criminal behaviour is usually expressed in the criminal careers literature in one of three ways. Age-specific prevalence is the proportion of the cohort who committed an offence at a given age. Age-specific cumulative prevalence is the proportion who ever committed an offence up to a given age. Overall or lifetime prevalence is the proportion who ever committed an offence during the period of observation. Calculating these estimates requires both the number of persons who exhibited the behaviour and the number of persons at risk of exhibiting it (the eligible population at risk).
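
The three prevalence measures can be illustrated with a minimal Python sketch. The offence ages, person identifiers, and the fixed cohort size of 100 are hypothetical, and using one fixed denominator is a simplifying assumption. Each measure counts persons, not offences, so a person with several offences at the same age is counted only once.

```python
from typing import Dict, List

# Hypothetical toy data (not from the report): ages, in completed years, at which
# each person in one birth cohort had a recorded offence. IDs are illustrative.
offence_ages: Dict[str, List[int]] = {
    "p1": [12, 12, 15],
    "p2": [16],
    "p3": [9, 17],
}
COHORT_SIZE = 100  # assumed fixed population at risk, for illustration only


def age_specific_prevalence(ages: Dict[str, List[int]], age: int, n: int) -> float:
    """Proportion of the cohort with at least one recorded offence at `age`."""
    return sum(1 for a in ages.values() if age in a) / n


def cumulative_prevalence(ages: Dict[str, List[int]], age: int, n: int) -> float:
    """Proportion of the cohort with a recorded offence at or before `age`."""
    return sum(1 for a in ages.values() if a and min(a) <= age) / n


def overall_prevalence(ages: Dict[str, List[int]], n: int) -> float:
    """Proportion of the cohort with any recorded offence during observation."""
    return sum(1 for a in ages.values() if a) / n


print(age_specific_prevalence(offence_ages, 12, COHORT_SIZE))  # p1 only
print(cumulative_prevalence(offence_ages, 12, COHORT_SIZE))    # p1 and p3
print(overall_prevalence(offence_ages, COHORT_SIZE))           # all three persons
```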

Using the UCR2 data, it is not possible to track exactly the same group of individuals over the 10-year period of observation. Each year, some individuals move into or out of Canada or the parts of Canada included in the study, or move between the provinces under study. As a result, the offenders included in this study, and the larger population from which they are drawn, include persons who were present in the study area for less than the entire 10-year period of observation. Consequently, the exact total eligible populations at risk of committing a recorded offence cannot be determined. However, Statistics Canada population data for each age and sex in the selected provinces can be used to approximate the populations at risk.

As a result of net migration, the populations of the 1987 and 1990 birth cohorts in the parts of Canada included in the study grew by a small but steady amount between 1995 and 2005. The 1987 birth cohort increased from 179,000 eight-year-olds in 1995 to 209,000 eighteen-year-olds in 2005. This represents an average annual compounded increase of 1.6%, or an overall increase of 17%. The 1990 birth cohort increased from 193,000 five-year-olds in 1995 to 216,000 fifteen-year-olds in 2005. This represents an average annual compounded increase of 1.1%, or an overall increase of 12%.
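
The growth figures can be checked with a short calculation. The average annual compounded rate r satisfies N_2005 = N_1995 × (1 + r)^10, so r = (N_2005 / N_1995)^(1/10) − 1. The sketch below uses only the cohort sizes quoted in the text, and the function name is illustrative.

```python
def growth(start: float, end: float, years: int) -> tuple:
    """Return (overall % increase, average annual compounded % increase)."""
    ratio = end / start
    overall = (ratio - 1) * 100
    # r such that start * (1 + r) ** years == end
    annual = (ratio ** (1 / years) - 1) * 100
    return overall, annual


# Cohort sizes quoted in the text for 1995 and 2005, ten annual intervals apart.
for label, start, end in [("1987 cohort", 179_000, 209_000),
                          ("1990 cohort", 193_000, 216_000)]:
    overall, annual = growth(start, end, 10)
    print(f"{label}: overall {overall:.0f}%, annual {annual:.1f}%")
```

Run as written, the code prints an overall increase of 17% and an annual rate of 1.6% for the 1987 cohort, and 12% and 1.1% for the 1990 cohort. These values agree with the figures in the text.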

Age-specific prevalence rates are calculated using the estimated population of males and females of the corresponding age in each year. Changes in cohort size therefore cause no difficulty for these rates, because gains and losses through migration or death are reflected in each year's denominator. For overall prevalence, however, the changing denominator (the size of the total eligible population at risk) is problematic, because a single figure must be chosen to represent a population whose size changed over the period.

For the purpose of estimating overall prevalence, the study uses the largest approximate population in its calculations. This is the number of 18-year-olds in 2005 for the 1987 birth cohort, and the number of 15-year-olds in 2005 for the 1990 birth cohort. This approach accounts not only for the stable component of the original cohort, but also for its net growth over time. The same approach is used in the companion report (Carrington et al., 2005). Lee (1999) used a similar approach and rationale to determine the total eligible population when presenting overall prevalence estimates in a study of youth crime trends in British Columbia for four separate cohorts. An alternative, and less desirable, method uses the number of live births in the cohort's birth year as an approximation of cohort size throughout the period under study (see Prime et al., 2001, for an example).
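
The two kinds of denominator can be contrasted in a minimal sketch. All counts below are hypothetical, and only three ages are shown. Age-specific rates divide by the population estimate for the year in which the cohort reached each age. Overall prevalence divides the number of distinct offenders by the largest yearly estimate.

```python
from typing import Dict

# Hypothetical yearly population estimates for one birth cohort, keyed by age
# (illustrative numbers only; the cohort grows slightly through net migration).
population_by_age: Dict[int, int] = {8: 1_000, 9: 1_010, 10: 1_025}
# Hypothetical counts of distinct persons with a recorded offence at each age.
offenders_by_age: Dict[int, int] = {8: 5, 9: 8, 10: 12}
DISTINCT_OFFENDERS = 20  # persons with any offence over ages 8-10 (hypothetical)


def age_specific_rates(offenders: Dict[int, int],
                       population: Dict[int, int]) -> Dict[int, float]:
    # Each age is divided by the estimate for the year in which the cohort was
    # that age, so migration and deaths are reflected in every denominator.
    return {age: offenders[age] / population[age] for age in offenders}


def overall_prevalence(distinct_offenders: int, population: Dict[int, int]) -> float:
    # A single denominator is needed. As in the report, take the largest yearly
    # estimate; for a cohort that grows steadily, this is the final year's.
    return distinct_offenders / max(population.values())


print(age_specific_rates(offenders_by_age, population_by_age))
print(overall_prevalence(DISTINCT_OFFENDERS, population_by_age))
```

In this toy example, suppose a live-births figure of 980 were used instead. That denominator is smaller than the grown cohort, so the overall estimate would rise from 20/1025 to 20/980.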

Selection, matching, and weighting of UCR2 records

After the police services were selected as described above, all records were selected that pertained to offences committed during 1995 to 2005 by two groups. The first was individuals born in 1987, during the period from the 8th birthday to the day before the 18th birthday. The second was individuals born in 1990, during the period from the 5th birthday to the day before the 15th birthday. Birthdays were used as the start and end dates for offence selection so that each individual would be observed "at risk" of committing crimes for the same length of time and over the same age range (though not over exactly the same calendar period). Provincial and municipal offences were not included. A detailed breakdown of the offences that were included is given in Appendix Table A.1.
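
The birthday-based date rule for selecting offences can be sketched as follows. This is a minimal illustration that assumes exact birth dates are available as Python `date` objects. The function names are hypothetical, and the other selection criteria (police service, offence type) are not modelled.

```python
from datetime import date, timedelta

# Ages bounding the observation window for each birth cohort, from the text:
# the 1987 cohort from the 8th birthday and the 1990 cohort from the 5th
# birthday, each until the day before the birthday ten years later.
COHORT_AGES = {1987: (8, 18), 1990: (5, 15)}


def observation_window(birth: date) -> tuple:
    """Inclusive (first_day, last_day) during which offences are selected.

    Uses date.replace, which would fail for a 29 February birth date; neither
    1987 nor 1990 is a leap year, so that case cannot arise for these cohorts.
    Raises KeyError for birth years outside the two study cohorts.
    """
    start_age, end_age = COHORT_AGES[birth.year]
    first_day = birth.replace(year=birth.year + start_age)
    last_day = birth.replace(year=birth.year + end_age) - timedelta(days=1)
    return first_day, last_day


def offence_selected(birth: date, offence: date) -> bool:
    """True if the offence date falls inside the person's observation window."""
    first_day, last_day = observation_window(birth)
    return first_day <= offence <= last_day


print(observation_window(date(1987, 6, 15)))  # 8th birthday to day before 18th
```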

Records pertaining to the same individual were linked together to create the individual's delinquent career, which is the unit of analysis. This was not straightforward, since the UCR2 has no unique person identifier. Records were matched on province, name, date of birth, and sex. This raises the issue of potential false positive matches, because different people can share the same name, date of birth, and sex. Furthermore, the accused person's name is not recorded as such in the UCR2. It is encoded as a 4-character Russell Soundex code, which is not unique, since many names are encoded with the same Soundex code.[6] Matching on the Soundex code, date of birth, and sex could therefore produce false positive matches, in which records for different people are erroneously treated as pertaining to a single person. The result would be an underestimate of the number of different offenders and an overestimate of the number of incidents in their careers.
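
The example below shows why Soundex codes are not unique. It implements a common (American) variant of Soundex: the first letter, then up to three digits for consonant groups, padded with zeros. This is an assumption, since the exact rules used to encode names in the UCR2 may differ (see Armstrong, 2000). The `match_key` helper and the province label are hypothetical and simply mirror the matching fields named in the text.

```python
# Digit groups of a common (American) Soundex variant; the UCR2 rules may differ.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_DIGIT = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(name: str) -> str:
    """Four-character code: first letter plus up to three consonant digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _DIGIT.get(letters[0], "")
    for c in letters[1:]:
        digit = _DIGIT.get(c, "")  # vowels, Y, H, W and accented letters get ""
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        # A vowel resets the run, so repeated digits separated by a vowel are
        # both coded; H and W do not reset it, so such repeats are merged.
        if c not in "HW":
            prev = digit
    return code.ljust(4, "0")


def match_key(province: str, name: str, birth_date: str, sex: str) -> tuple:
    """Hypothetical linkage key built from the fields named in the text."""
    return (province, soundex(name), birth_date, sex)


print(soundex("Smith"), soundex("Smyth"), soundex("Schmidt"))  # all S530
```

Smith, Smyth, and Schmidt all encode to S530. Two different people with these names, the same birth date, and the same sex in one province would therefore be linked as a single "individual". In the other direction, a misspelling of the first letter ("Nelson" versus "Kelson" gives N425 versus K425) splits one person's records and produces a false negative.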

False negatives, where two records that should be matched are not, are also a potential problem in record matching. A false negative could occur if police records contained more than one name for the same person. For example, a person might have changed his or her name during the observation period or used an alias, or the name might have been misspelled. An incorrect record of the date of birth or sex would also result in a false negative. So would offences committed by the same person in more than one province, since all matching was done within provinces. Matching could have been done across all the provinces in the study population, in order to keep intact careers that crossed provincial boundaries. However, this would have made the problem of false positives worse by increasing the size of the "pool" of persons being matched (see below for the relationship between the size of the pool and the probability of false positives).

An analysis of the probability of false positive matches was conducted by methodologists at Statistics Canada by determining the rate of occurrence of each Soundex code in the populations of the provinces in the study, using electronic telephone directories. This enabled the calculation, for each Soundex code, of the expected rate of false positives, when it was used for matching in combination with birth date and sex.  Soundex codes vary greatly in their vulnerability to false positive matches, since some encode very common names and others do not. 

The probability of false positives increases with the number of records being matched. That number is roughly proportional both to the population of the geographical area within which matching is done and to the number of years of data being matched. There would be many false positives if records for many years for all of Canada were matched, and few or none if records for only a few years within one town were matched. In a study such as the present one, the number of years of data being matched is fixed at 10. The "match quality" or "match efficiency" (i.e. non-vulnerability to false positives) of a Soundex code therefore depends both on how common the names it encodes are and on the population of the area within which matching is done.
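
The effect of pool size can be illustrated with a simplified probability model. This model is an assumption made for illustration and is not the calculation performed by Statistics Canada. It assumes that everyone in the pool was born in the same year, with birth dates spread evenly over its 365 days, and that each member has the index person's sex with probability 1/2. It also assumes that names are drawn independently, so a member's Soundex code equals the index code with probability `code_freq`. The function returns the probability that no other pool member shares the index person's key, which serves here only as a rough proxy for match efficiency.

```python
def expected_unique_share(code_freq: float, pool_size: int) -> float:
    """Probability that no other person in the pool shares an index person's key.

    Illustrative model only (an assumption, not the Statistics Canada method):
    uniform birth dates within one year, sex matching with probability 1/2,
    and independent names with Soundex frequency `code_freq`.
    """
    p_same_key = code_freq * (1 / 365) * 0.5
    return (1 - p_same_key) ** (pool_size - 1)


# A rare and a common code, matched within a small and a large pool (hypothetical).
for freq in (0.0001, 0.01):
    for pool in (1_000, 100_000):
        print(f"freq={freq}, pool={pool}: {expected_unique_share(freq, pool):.4f}")
```

Under this model, the proxy falls as the pool grows and as the code becomes more common. This is the pattern the text describes, even though the numbers themselves are only illustrative.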

On the basis of this analysis, four match quality codes were defined according to the match efficiency of each Soundex code:

  • 0 – The code is rare enough that match efficiency is 99% or better.
  • 1 – 95% to 99% match efficiency.
  • 2 – 90% to 95% match efficiency.
  • 3 – Less than 90% match efficiency.

"Match efficiency" refers to the expected absence of false positives; e.g. 99% match efficiency means that 1% of matches are expected to be false positives, and "99% or better" means that 1% or fewer false positive matches are expected.
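
The four categories amount to a simple threshold rule, sketched below with a hypothetical function name. The category wording overlaps at 99%, 95%, and 90%. The sketch resolves this with an assumption: exactly 99% counts as code 0, exactly 95% as code 1 (consistent with the "95% or better" rule for codes 0 and 1), and exactly 90% as code 2.

```python
def match_quality_code(efficiency: float) -> int:
    """Map an expected match efficiency (a proportion from 0 to 1) to a code 0-3.

    Boundary handling is an assumption read from the category wording.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be a proportion between 0 and 1")
    if efficiency >= 0.99:
        return 0  # 1% or fewer false positive matches expected
    if efficiency >= 0.95:
        return 1
    if efficiency >= 0.90:
        return 2
    return 3


print([match_quality_code(e) for e in (0.995, 0.97, 0.92, 0.85)])
```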

Records (and therefore persons) whose Soundex codes had a match efficiency of 95% or better (i.e. match quality codes of 0 or 1) were deemed to have an acceptable expected rate of false positive matches. The actual, rather than expected, match efficiency of records whose Soundex codes had a match quality code of 2 was also assessed. To do this, records within the same province that had the same Soundex code, date of birth, and sex were linked, regardless of the match efficiency of the Soundex code. Duncan's multiple-range tests were then performed separately for each province. They tested whether the mean numbers of records per aggregated career differed significantly among "individuals"[7] whose Soundex codes had match quality codes of 0, 1, and 2. Suppose Soundex codes with a match quality code of 2 were more vulnerable to false positive matches than those with codes of 0 or 1, in practice as well as in principle. Matching on them would then be more likely to aggregate records pertaining to different individuals, resulting in a higher mean number of records per "individual" career.
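
The statistic being compared can be computed as follows: link records on the matching key, then take the mean number of records per aggregated "individual" within each province and match quality code. The records, province label, and codes below are hypothetical toy values. Duncan's multiple-range test itself is not implemented in this sketch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical UCR2-style records (not real data): province, Soundex code, date
# of birth, sex, and the match quality code of that Soundex code in that province.
records = [
    ("SK", "B620", "1990-04-02", "M", 0),
    ("SK", "B620", "1990-04-02", "M", 0),
    ("SK", "K530", "1990-11-19", "F", 0),
    ("SK", "L000", "1990-07-08", "M", 2),
    ("SK", "L000", "1990-07-08", "M", 2),
    ("SK", "L000", "1990-07-08", "M", 2),
]


def mean_records_per_individual(rows):
    """Mean records per aggregated "individual", by (province, quality code)."""
    counts = defaultdict(int)  # number of records per linkage key
    quality = {}
    for province, sx, dob, sex, q in rows:
        key = (province, sx, dob, sex)
        counts[key] += 1
        quality[key] = q
    groups = defaultdict(list)
    for key, n in counts.items():
        groups[(key[0], quality[key])].append(n)
    return {group: mean(ns) for group, ns in groups.items()}


print(mean_records_per_individual(records))
```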

For individuals born in 1987, the analysis found no province in which "individuals" whose Soundex codes had a match quality code of 2 had a significantly different mean number of records from "individuals" whose Soundex codes had match quality codes of 0 or 1. For individuals born in 1990, a significant difference was found only in Saskatchewan. There, the mean numbers of records per "individual" with match quality codes of 0, 1, and 2 were 1.7, 1.8, and 2.9, respectively. On the basis of this analysis, records with a match quality code of 0, 1, or 2 were retained in all provinces for persons born in 1987. For persons born in 1990, they were retained in all provinces except Saskatchewan, where records with a match quality code of 2 were eliminated. All records with a match quality code of 3 were eliminated for both birth cohorts.
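
The resulting retention rule can be written as a small predicate. The function name and the two-letter province label ("SK" for Saskatchewan) are hypothetical conventions chosen for this sketch.

```python
def retain_record(quality_code: int, province: str, birth_year: int) -> bool:
    """Record-retention rule described in the text (hypothetical encoding).

    `province` uses two-letter labels such as "SK" for Saskatchewan, and
    `quality_code` is the 0-3 match quality code of the record's Soundex code.
    """
    if quality_code not in (0, 1, 2, 3):
        raise ValueError("match quality code must be 0, 1, 2, or 3")
    if quality_code == 3:
        return False  # eliminated for both birth cohorts in every province
    if quality_code == 2 and province == "SK" and birth_year == 1990:
        return False  # the one case where code 2 failed the empirical check
    return True


print(retain_record(2, "SK", 1987), retain_record(2, "SK", 1990))
```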

The rationale for simply eliminating records whose Soundex codes are unacceptably vulnerable to false positive matches is that Soundex codes, which represent persons' names, are presumed to be unbiased with respect to criminal behaviour when used as a record selection criterion. A person with a common name such as John Smith, whose Soundex code would probably have a match quality code of 3, is no more or less likely to have a criminal career, or a career with particular characteristics, than a person with an uncommon name and a Soundex match quality code of 0. The records with Soundex match quality codes of 0, 1, and 2 are thus presumed to form a subset that is representative of the entire population with respect to the phenomenon under study (criminal behaviour).

Records with the same province, Soundex code, date of birth, and sex were then aggregated into person (career) records. The deleted records were those with a match quality code of 3, together with those with a code of 2 in Saskatchewan for persons born in 1990. To compensate for these deletions, each person record was assigned a weight equal to the inverse of the applicable selection ratio (the proportion of records retained). All numbers of persons cited in the report are therefore based on selected subsets of records, weighted to reproduce the original number of records.
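
The aggregation and weighting step can be sketched as follows. The report does not spell out the strata within which selection ratios were computed. This sketch assumes one ratio per province and birth year, equal to retained records divided by all records, and assigns every person in a stratum the inverse of that ratio. The record layout, names, and example rows are hypothetical. The example `keep` rule only drops code 3 and ignores the Saskatchewan exception for brevity.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

# (province, Soundex code, ISO birth date "YYYY-MM-DD", sex, match quality code)
Record = Tuple[str, str, str, str, int]


def person_weights(records: Iterable[Record],
                   keep: Callable[[int, str, int], bool]) -> Dict[tuple, float]:
    """Aggregate retained records into persons with inverse-selection weights."""
    total: Counter = Counter()
    kept: Counter = Counter()
    person_stratum: Dict[tuple, tuple] = {}
    for province, sx, dob, sex, quality in records:
        birth_year = int(dob[:4])
        stratum = (province, birth_year)  # assumed stratification
        total[stratum] += 1
        if keep(quality, province, birth_year):
            kept[stratum] += 1
            # Records sharing the linkage key collapse into one person (career).
            person_stratum[(province, sx, dob, sex)] = stratum
    # Weight = all records / retained records in the person's stratum.
    return {person: total[s] / kept[s] for person, s in person_stratum.items()}


# Hypothetical example: four records in one stratum, one of them with code 3.
rows = [
    ("ON", "B620", "1987-04-02", "M", 0),
    ("ON", "B620", "1987-04-02", "M", 0),
    ("ON", "S530", "1987-09-30", "M", 3),
    ("ON", "K530", "1987-01-12", "F", 1),
]
weights = person_weights(rows, keep=lambda q, prov, year: q != 3)
print(weights)  # two persons, each weighted 4 / 3
```

In the example, the three retained records multiplied by the weight of 4/3 give back the original four records in the stratum. This is the property the text describes.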


Notes

  1. In fact, the number and identities of the police services included in the study changed slightly each year, because of mergers and closings of individual police services, particularly in the province of Québec. However, the selection process was done in such a way that the geographic areas which were under the jurisdiction of the police services included in the study remained approximately constant over the period of observation.
  2. The concept of the criminal incident used in the UCR2 is similar to the criminal event in criminology: that is, it occurs at a particular time in a particular place (Kennedy and Sacco, 1996). Normally, it is an activity captured in one police occurrence report. However, the Canadian Centre for Justice Statistics (2006) uses a definition of the incident which may differ in certain circumstances from the criminal event or the police occurrence: for example, traffic and non-traffic violations are scored as separate incidents even if they are perpetrated by the same offender at the same time and place. On the other hand, if the same offence is committed repeatedly against the same victim by the same offender over a long period of time, and only comes to the attention of the police at one point in time, that is recorded as one incident.
  3. The term used by the Canadian Centre for Justice Statistics is "charged/suspect-chargeable (accused)".
  4. The UCR2 records crime by alleged offenders as young as 3 years of age, but examination of data for 3- and 4-year-olds born in 1992 indicated that there were so few of them that any conclusions concerning them would be unreliable.
  5. This section is based on the corresponding section of the companion report (Carrington et al., 2005).
  6. See Armstrong (2000) for details of the Soundex code, and a discussion of the issues surrounding its use in record matching.
  7. The quotation marks are used because these "individuals" may instead be more than one person, erroneously aggregated together.