Methodology

Factor analysis results

The factor analysis was performed using SPSS software. Factor analysis was preferred over principal components analysis because the aim was to reveal latent factors (Costello and Osborne 2005). Since certain variables did not have normal distributions, principal axis factoring was selected as the extraction method. Lastly, a direct oblimin rotation was applied to clarify the factor structure while allowing the factors to be partially correlated, which corresponds more closely to the phenomena observed in the social sciences (ibid.).

In this study, 23 census variables were chosen for the factor analysis. The purpose of this selection process was to avoid redundancy of information between variables and to represent the main environmental characteristics identified by previous work for their association with crime (Fitzgerald et al. 2004; Savoie et al. 2006; Wallace et al. 2006; Kitchen 2006; Andresen and Brantingham 2007; Savoie 2008).

Six factors were identified. They were, in the order of extraction, residential mobility, young population, ethno-cultural diversity, socio-economic disadvantage, aging dwellings and commercial activity. It should be noted that the order of extraction depends on the variance of the entire data set and not on the importance of one factor in the differentiation of the DAs or the statistical association with crime rates.

The results show that inequalities in income (measured by such variables as property value, proportion of low-income families or median personal income) do not constitute a specific dimension of the residential space in Saskatoon. Instead, these inequalities have many facets that are represented in this study by three distinct factors: residential mobility, socio-economic disadvantage and aging dwellings. The same holds true for the proportion of Aboriginal peoples in a DA: it is higher in neighbourhoods with high socio-economic disadvantage, high residential mobility and a high proportion of aging dwellings.

Although the coefficients are relatively weak, most of the correlations between the factors are statistically significant (Table 7). These correlations point to both the specificity of each factor and the interdependence among the factors.

Table 7
Pearson's correlation coefficient of factors, City of Saskatoon, 2001

The relative contribution of the variables to the factors is displayed in Table 8. The higher the absolute value, the stronger the contribution: a high negative value indicates a strong contribution, just as a high positive value does. In addition, high values of opposite sign mean that there are contrasts within a factor. For example, DAs with high residential mobility show high proportions of renters but low proportions of persons who had not changed address in the previous five years, and vice versa. Lastly, the direction (+/-) of the contribution has only a relative meaning, that is, a negative value as opposed to a positive value.

Table 8
Contribution to factors matrix, City of Saskatoon, 2001

Data sources

Incident-based Uniform Crime Reporting Survey

The Incident-based Uniform Crime Reporting Survey (UCR2) collects detailed information on individual criminal incidents reported to the police, including characteristics of incidents, accused people and victims.

The UCR2 Survey allows a maximum of four offences per criminal incident to be recorded in the database. The selected offences are classified according to their level of seriousness, which is related to the maximum sentence that can be imposed under the Criminal Code.

Analyses of major offence categories (violent offences, property offences, other offences) undertaken in this report are based on the most serious offence in each incident, as are the crime rates published annually by the Canadian Centre for Justice Statistics (CCJS). In this type of classification, a higher priority is given to violent offences than to non-violent offences. As a result, less serious offences may be under-represented when only the most serious offence is considered.

When the analysis is focused on individual offence types, all incidents in which the offence is reported are included, whatever the seriousness or the ranking of the offence in the incident. In this study, incidents with assault, motor vehicle theft, break and enter, mischief, other theft and shoplifting are calculated that way. This method provides a more complete spatial representation of the different types of individual offences.
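The two counting rules described above can be contrasted in a minimal sketch. The seriousness ranking and the sample incidents below are hypothetical and serve only to illustrate the logic; the actual UCR2 ranking is tied to maximum Criminal Code sentences and is not reproduced here.

```python
from collections import Counter

# Hypothetical seriousness ranking (lower number = more serious).
# The real UCR2 ranking is not reproduced here.
SERIOUSNESS = {"robbery": 1, "assault": 2, "break and enter": 3,
               "motor vehicle theft": 4, "mischief": 5}

def most_serious(offences):
    """Return the single offence that represents the incident."""
    return min(offences, key=SERIOUSNESS.__getitem__)

def count_most_serious(incidents):
    """One count per incident, attributed to its most serious offence."""
    return Counter(most_serious(offences) for offences in incidents)

def count_any_offence(incidents):
    """One count per incident for every distinct offence type it contains."""
    counts = Counter()
    for offences in incidents:
        counts.update(set(offences))
    return counts

# Hypothetical incidents, each with at most four recorded offences.
incidents = [
    ["robbery", "assault"],
    ["break and enter", "mischief"],
    ["mischief"],
    ["assault", "mischief"],
]
by_mso = count_most_serious(incidents)
by_any = count_any_offence(incidents)
```

In this made-up sample, mischief appears in three incidents but is the most serious offence in only one of them, which illustrates why less serious offences can be under-represented under the first rule.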

This report includes most Criminal Code offences and all offences under the Controlled Drugs and Substances Act, but it excludes offences under other federal and provincial statutes and municipal by-laws. Also excluded are Criminal Code offences for which there is either no expected pattern of spatial distribution or a lack of information about the actual location of the offence. For example, administrative offences including bail violations, failure to appear and breaches of probation are typically reported at court locations; threatening or harassing phone calls are often reported at the receiving end of the call; and impaired driving offences may be more likely to be related to the location of apprehension (for example, apprehensions resulting from roadside stop programs).

Violent incidents include murder, attempted murder, sexual assault, assault, violations resulting in the deprivation of freedom, robbery, extortion, criminal harassment, uttering threats against a person, explosives causing death or bodily harm and other violations against the person.

Property incidents include arson, break and enter, theft over $5,000, theft of $5,000 and under, motor vehicle theft, possession of stolen goods, fraud and mischief.

Census of population

The Census of Population provides the population and dwelling counts not only for Canada but also for each province and territory, and for smaller geographic units, such as cities or districts within cities. The census also provides information about Canada's demographic, social and economic characteristics.

The detailed socio-economic data used in this report are derived from the long form of the census, which is completed by a 20% sample of households.

The Census of Population is conducted by Statistics Canada every five years, most recently in 2006. To achieve the highest degree of compatibility between neighbourhood characteristics derived from the census and crime information, this report draws on police data from 2001 and census data from the same year. When the Saskatoon study was conducted, detailed data from the 2006 Census on population characteristics, in particular on individuals' income, were not yet available at the neighbourhood level.

Definition of variables

Dwellings built before 1961: Percentage of dwellings built before 1961.

Dwellings built after 1990: Percentage of dwellings built after 1990.

Dwellings needing major repairs: Percentage of dwellings needing major repairs (plumbing or electrical wiring, structural repairs to walls, floors or ceilings, etc.).

Rooms per dwelling: Average number of rooms per dwelling.

Average value of dwellings: Average dollar amount expected by the owner if the dwelling were to be sold.

Renters: Percentage of residents living in a dwelling not occupied by the owner.

Population aged less than 15 years: Percentage of residents less than 15 years of age.

Population aged 65 years and over: Percentage of residents aged 65 years and over.

Population living alone: Percentage of residents living alone.

Lone-parent families: Percentage of residents living in a lone-parent family.

Movers (1 year): Percentage of residents aged 1 year and over who were living at a different address one year earlier.

Stayers (5 years): Percentage of residents aged 5 years and over who were living at the same address five years earlier.

Automobile to work: Percentage of residents who worked in the year preceding the Census, who indicated having a fixed workplace and who drove a car as their principal means of transportation between home and place of work.

No high school diploma: Percentage of residents aged 20 years and over without a high school diploma.

University degree: Percentage of residents aged 20 years and over with a university degree.

Average years of schooling: Sum of the years of schooling at the elementary, high school, university and college levels (average for residents aged 15 years and over).

Unemployment rate: Unemployed residents expressed as a percentage of the labour force.

Average employment income: Average income from wages and salaries of residents aged 15 years and over with an income and working full-time.

Median employment income: Median income from wages and salaries of residents aged 15 years and over with an income and working full-time. The median is the 50th percentile, which divides the cases considered into two equal halves.

Government transfers: Percentage of total income composed of government transfers (benefits from Employment Insurance; benefits from Canada or Quebec Pension Plan; Canada Child Tax benefits; benefits from Social Assistance; etc.).

Low-income: Percentage of residents who are part of an economic family that spends 20 percentage points more of its income than the average family on food, shelter and clothing.

Aboriginal peoples: Percentage of residents who reported identifying with one Aboriginal group.

Recent immigrants: Percentage of residents who immigrated to Canada from 1991 to 2001.

Visible minority: Percentage of non-Aboriginal residents belonging to visible minorities.

Construction, manufacturing, transportation and warehousing: Number of workers from sectors 23, 31-33 and 48-49 of the North American Industry Classification System (NAICS) who work in the spatial unit.

Wholesale trade: Number of workers from NAICS sector 41 who work in the spatial unit.

Retail trade: Number of workers from NAICS sector 44-45 who work in the spatial unit.

Other services: Number of workers from NAICS sectors 51, 52, 53, 54, 55, 56, 71, 72 and 81 who work in the spatial unit.

Health care, social assistance and educational services: Number of workers from NAICS sectors 61 and 62 who work in the spatial unit.

Public administration: Number of workers from NAICS sector 91 who work in the spatial unit.

Density of workers in retail trade: Number of workers per km² from NAICS sector 44-45 who work in the spatial unit.

Density of workers in accommodation and food services: Number of workers per km² from NAICS sector 72 who work in the spatial unit.

Geocoding

Geocoding is the process of matching a particular address with a geographic location on the earth's surface. In this report, the address corresponds to the location of an incident that was reported to the police, after aggregation to the block-face level, that is, to one side of a city block between two consecutive intersections. This is done by matching records in two databases, one containing a list of addresses, the other containing information about the street network and the address range within a given block. The geocoding tool matches the address with its unique position in the street network. As the street network is geo-referenced (located in geographic space with reference to a co-ordinate system), it is possible to generate longitude and latitude values (or X and Y values) for each criminal incident. Where the incident location does not correspond to an address, geocoding is performed by creating a point at, for example, the intersection of two streets or the middle of a public park. The X and Y values in the criminal incident database provide the spatial component that allows points to be mapped relative to the street or neighbourhood in which the incidents occurred.

In 2001, the UCR2 Survey did not lend itself to collecting information on the geographic location of criminal incidents. For the purposes of this report, the Saskatoon Police Service sent the CCJS the addresses of the selected incidents that were reported and entered in the UCR2 database in 2001. The CCJS resolved this information into a set of geographical co-ordinates (X and Y) for each address. These co-ordinates were rolled up to the mid-point of a block-face in the case of specific addresses, and to intersection points in the case of streets and parks.
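The address-matching step can be sketched as follows. This is a simplified illustration, not the tool used by the CCJS: the `BlockFace` structure, the parity rule for the two sides of a street and the sample street network are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class BlockFace:
    """Hypothetical street-network record for one side of a city block."""
    street: str
    low: int       # lowest civic number on this side of the block
    high: int      # highest civic number on this side of the block
    parity: int    # 0 = even-numbered side, 1 = odd-numbered side
    start: tuple   # (x, y) of the first intersection
    end: tuple     # (x, y) of the second intersection

    def midpoint(self):
        return ((self.start[0] + self.end[0]) / 2,
                (self.start[1] + self.end[1]) / 2)

def geocode(number, street, network):
    """Return the block-face midpoint for an address, or None if unmatched."""
    for face in network:
        if (face.street == street.upper()
                and face.low <= number <= face.high
                and number % 2 == face.parity):
            return face.midpoint()
    return None

# Hypothetical street network with co-ordinates in metres.
network = [
    BlockFace("MAIN ST", 100, 198, 0, (0.0, 0.0), (120.0, 0.0)),
    BlockFace("MAIN ST", 101, 199, 1, (0.0, 0.0), (120.0, 0.0)),
    BlockFace("MAIN ST", 200, 298, 0, (120.0, 0.0), (240.0, 0.0)),
]
hit = geocode(142, "Main St", network)       # falls in the 100-198 even range
miss = geocode(142, "Unknown Ave", network)  # no matching street: fails geocoding
```

An address that cannot be matched to any range returns `None`, which corresponds to the incidents that failed geocoding because their location information was too vague.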

The Saskatoon Police Service provided 23,711 selected incidents for 2001. Geocoding was successful for 92.5% of them. Incidents that failed geocoding contained location information that was too vague, such as a bus number or a street without a civic number. The small percentage of incidents that failed geocoding did not bias offence patterns: each offence type accounts for the same proportion of overall crime before and after geocoding.

Mapping techniques

Kernel analysis is a method for making sense of a spatial distribution of crime data. This method makes it possible to examine criminal incident point data across neighbourhood boundaries and to identify areas where these incidents are concentrated. The goal of kernel analysis is to estimate how the density of events varies across a study area based on a point pattern. Kernel estimation was originally developed to estimate probability density from a sample of observations (Bailey and Gatrell 1995). In its application to spatial data, kernel analysis produces a smooth map of density values, where the density of each place corresponds to the concentration of points in a given area.

In kernel estimation, a fine grid is overlaid on the study area. Distances are measured from the centre of a grid cell to each observation that falls within a predefined region of influence known as a bandwidth. Each observation contributes to the density value of that grid cell based on its distance from the centre of the cell. Nearby observations are given more weight in the density calculation than those farther away. In this study, the grid cell size is 50 square metres and the search radius (bandwidth) used is 500 metres.
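The grid-based calculation can be illustrated with a short sketch. It assumes a quartic kernel, which is a common choice but is not stated in this report, and it treats the grid cell as a square with 50-metre sides, which is also an assumption. The point co-ordinates are hypothetical.

```python
import math

def quartic_weight(distance, bandwidth):
    """Quartic kernel (assumed): weight falls smoothly to zero at the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    u = distance / bandwidth
    return (3.0 / (math.pi * bandwidth ** 2)) * (1.0 - u * u) ** 2

def kernel_density(points, x_min, y_min, n_cols, n_rows, cell, bandwidth):
    """Return a row-major grid of density values (events per square metre)."""
    grid = []
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * cell          # y of the cell centre
        values = []
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * cell      # x of the cell centre
            values.append(sum(
                quartic_weight(math.hypot(px - cx, py - cy), bandwidth)
                for px, py in points))
        grid.append(values)
    return grid

# Hypothetical incident locations in metres: a cluster of three and one isolated point.
points = [(700.0, 700.0), (720.0, 710.0), (680.0, 695.0), (1400.0, 1400.0)]
grid = kernel_density(points, 0.0, 0.0, n_cols=40, n_rows=40,
                      cell=50.0, bandwidth=500.0)

# Highest density value and the grid position where it occurs.
peak, peak_row, peak_col = max(
    (grid[r][c], r, c) for r in range(40) for c in range(40))
```

The highest density value falls in a cell next to the cluster of three points, and cells farther than 500 metres from every incident receive a density of zero.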

The product of the kernel estimation method is a simple dot matrix (raster image) displaying contours of varying density. Contour loops define the boundaries of hotspot areas. Hotspots may be irregular in shape, and they are not limited by neighbourhood or other boundaries. This method of analysis was applied using the Spatial Analyst software of the Environmental Systems Research Institute.

Spatial units

Ecological studies such as those conducted in crime-mapping projects require a sufficiently large number of geographic units or neighbourhoods for the modelling of data to be effective and reliable. In previous studies, the geographic units used were locally-determined natural neighbourhoods (Winnipeg and Regina studies), census tracts (Montréal, Edmonton and Halifax) or dissemination areas (Thunder Bay).

The dissemination areas (DAs) of the 2001 Census were retained for the multivariate analyses in Section 3. The DA is the smallest standardized spatial unit for which census data are disseminated.

DAs are small areas consisting of one or more blocks, with boundaries delimited by intersecting streets generally enclosing 400 to 700 residents. DAs must meet various delineation criteria designed to maximize their usefulness, including the following: DA boundaries respect the boundaries of census sub-divisions and census tracts (CT); DA boundaries follow roads as well as railways, water features and power transmission lines, where these features form part of the boundaries of census sub-divisions or CTs.

Only 328 of the 335 Saskatoon DAs are included in the analyses because average dwelling values were not available for the remaining seven, which compromised the calculation of the factor scores. These seven DAs are scattered over the City of Saskatoon territory (Map 1). A total of 438 incidents were reported in these DAs (2% of all geocoded incidents). They have a collective crime rate of 63 incidents per 1,000 residents and workers, which is lower than Saskatoon's average of 72.
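The crime rates quoted above are expressed per 1,000 residents and workers. A minimal sketch of that calculation follows; the function name and the counts for the sample DA are hypothetical.

```python
def crime_rate(incidents, residents, workers, per=1000):
    """Incidents per `per` residents and workers; None if the area has neither."""
    at_risk = residents + workers
    if at_risk == 0:
        return None
    return per * incidents / at_risk

# Hypothetical DA: 36 incidents, 450 residents and 50 workers.
rate = crime_rate(36, 450, 50)
empty = crime_rate(0, 0, 0)
```

Using residents plus workers as the denominator keeps the rate meaningful in areas with few residents but many workers.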

The official neighbourhoods of the City of Saskatoon were recreated by aggregation of census blocks. In a few exceptional locations, the neighbourhood boundaries as defined by the block-level aggregation do not exactly correspond to the official boundaries. These spatial units were used to develop the neighbourhood portraits discussed in Section 2.

Multivariate analysis

Ordinary least squares (OLS) regression is used to examine the distribution of violent and property crime rates as a function of the set of explanatory factors. This method requires a continuous outcome variable that has a normal distribution. As a number of the variables studied here do not have normal distributions, it was necessary to apply normalizing transformations to them, including the crime variables. The choice of transformation was informed by the results of a Kolmogorov-Smirnov normality test. The variables were transformed as follows:

No transformation: Population aged less than 15 years; Median employment income; Density of workers in retail trade; Density of workers in accommodation and food services; Stayers (5 years); Rooms per dwelling.

Natural logarithm: Government transfers; Population living alone.

Square root: Dwellings built before 1961; Dwellings built after 1990; Dwellings needing major repairs; Average value of dwellings; Population aged 65 years and over; Visible minority; Lone-parent families; Unemployment rate; Low-income; Renters; Recent immigrants; Movers (1 year); Aboriginal peoples; Violent crime rate; Property crime rate; Assault rate; Motor vehicle theft rate; Break and enter rate; Mischief rate; Shoplifting rate; Other theft rate.
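One way a normality test can inform the choice among these transformations is sketched below. The rule used here (keep the transformation with the smallest Kolmogorov-Smirnov distance from a normal curve fitted to the sample) is an assumption for illustration, not the documented procedure, and the sample data are constructed rather than real.

```python
import math
import statistics

def ks_statistic(values):
    """KS distance between the sample and a normal curve with the sample's mean and SD."""
    xs = sorted(values)
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))
        # Compare the fitted CDF with the empirical CDF just before and at x.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d

TRANSFORMS = {
    "none": lambda v: v,
    "natural log": math.log,    # requires strictly positive values
    "square root": math.sqrt,   # requires non-negative values
}

def rank_transforms(values):
    """Return the KS statistic obtained under each candidate transformation."""
    return {name: ks_statistic([f(v) for v in values])
            for name, f in TRANSFORMS.items()}

# Hypothetical right-skewed variable, built so that its square root is bell-shaped.
normal = statistics.NormalDist()
n = 99
sample = [(5.0 + normal.inv_cdf((i + 0.5) / n)) ** 2 for i in range(n)]
scores = rank_transforms(sample)
best = min(scores, key=scores.get)
```

Because the mean and standard deviation are estimated from the same sample, the statistic is suitable for comparing transformations, but standard Kolmogorov-Smirnov critical values would not apply to it directly.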

The regression models were developed using a stepwise procedure. This method consists of a series of multiple regressions in which, at each stage, the variable that accounts for the most remaining variance is added and any variables that have become superfluous are eliminated.

The standardized regression coefficients provide a means of assessing the relative importance of the different predictor variables in the multiple regression models. The coefficients indicate the expected change, in standard deviation units, of the dependent variable per one standard deviation unit increase in the independent variable, after controlling for the other variables. The coefficients generally fall between -1 and +1, with values closer to 0 indicating a weaker contribution to the explanation of the dependent variable.
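The link between an ordinary slope and its standardized counterpart can be shown with a single predictor, where the standardized coefficient equals the slope multiplied by the ratio of the two standard deviations. The data below are hypothetical, and the single-predictor case is a simplification of the multiple regression models used in this report.

```python
import statistics

def slope(x, y):
    """Least squares slope of y on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardized_coefficient(x, y):
    """Expected change in y, in SD units, per one SD increase in x."""
    return slope(x, y) * statistics.stdev(x) / statistics.stdev(y)

x = [-1.2, -0.4, 0.1, 0.6, 1.5, 2.0]   # hypothetical factor scores
y = [3.1, 4.0, 4.2, 5.5, 6.1, 7.4]     # hypothetical transformed crime rates
beta = standardized_coefficient(x, y)
```

With one predictor, the standardized coefficient is identical to Pearson's correlation coefficient, which is why it is bounded by -1 and +1 in that case.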

Another aspect that must be taken into account in spatial analysis of data, such as crime data, is spatial autocorrelation (see "Spatial autocorrelation"). The presence of spatial autocorrelation is detected in the residuals of the OLS regression models for assault rate (Moran's I of 0.11; p<0.01), motor vehicle theft rate (Moran's I of 0.09; p<0.01), break and enter rate (Moran's I of 0.17; p<0.01), mischief rate (Moran's I of 0.07; p<0.05) and other theft rate (Moran's I of 0.09; p<0.01).

Spatial autocorrelation (by Krista Collins)

Data measured over a two-dimensional study area, such as the geocoding of criminal incidents, are often affected by the properties of the location in which they reside. If adjacent observations are affected by the same location properties, the observations will not be independent of one another. This lack of independence must be accounted for in the data analysis to produce accurate and unbiased results. This is accomplished through spatial modelling of data and is important for any dataset where there is a potential effect of location.

It is known that crime is not evenly distributed across cities and that it is concentrated in particular areas known as hot-spots. An initial indication that there might be a location effect in crime data can be seen by examining a map of crime density in city neighbourhoods. There could be a positive effect, where areas with high crime rates are surrounded by other areas with high crime rates and areas with low crime rates are adjacent to other areas with low crime rates. A negative location effect results from areas of low crime being surrounded by areas with high crime and vice versa. Either scenario indicates some sort of spatial structure or spatial dependence in the data, signifying that the neighbourhoods have an influence on each other. If the spatial structure of the data is not explained by the variables in the regression model, then there will be spatial effects in the model error terms. This phenomenon, which is known as spatial autocorrelation, violates the assumptions made in a standard regression analysis. The location effects must instead be accounted for in the multivariate model, to ensure accurate estimation of the regression coefficients and their associated variances.

For the purpose of spatial modelling, a definition of what constitutes neighbouring locations needs to be specified. In this analysis, regions are defined as neighbours if their boundaries share a common border or vertex (first-order queen contiguity). The neighbourhood structure defines which locations have a potential influence on each other, the neighbours, and rules out any potential influence of regions that are not considered to be neighbours. The neighbourhood structure is used to test for spatial autocorrelation and to specify the spatial component in the autoregressive spatial model.
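First-order queen contiguity can be sketched with a simplified rule: two regions are neighbours when their boundaries have at least one vertex in common. This assumes that shared borders are digitized with identical vertices on both sides, which holds for the hypothetical unit squares below but would need a geometry library for real DA boundaries.

```python
from itertools import combinations

def queen_neighbours(regions):
    """regions: dict of region id -> list of (x, y) boundary vertices."""
    neighbours = {rid: set() for rid in regions}
    for a, b in combinations(regions, 2):
        # A shared vertex covers both shared borders and corner contacts.
        if set(regions[a]) & set(regions[b]):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours

def unit_square(col, row):
    """Vertices of a hypothetical square region at grid position (col, row)."""
    return [(col, row), (col + 1, row), (col + 1, row + 1), (col, row + 1)]

# Hypothetical layout:   D E
#                        A B C
regions = {
    "A": unit_square(0, 0), "B": unit_square(1, 0), "C": unit_square(2, 0),
    "D": unit_square(0, 1), "E": unit_square(1, 1),
}
w = queen_neighbours(regions)
```

In this layout, A and E touch only at a corner and still count as neighbours, whereas A and C do not touch and are excluded.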

The basic process of modelling spatial data is to first fit a standard least squares regression model to the data and then test the error terms for the presence of spatial autocorrelation. This is done by a statistical test called Moran's I, which tests whether the error terms are randomly distributed over the study area. The value of the Moran's I statistic ranges from 1 to -1. A value approaching 1 indicates the presence of positive spatial autocorrelation, where regions with large error terms are adjacent to other areas with large error terms. A negative value near -1 indicates the presence of negative spatial autocorrelation, where regions with large error terms are neighbouring regions with small error terms. A value near zero indicates the absence of spatial autocorrelation. The significance of Moran's I statistic is determined by a random permutation approach, where a significant result indicates that there is spatial autocorrelation in the model error terms.
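The Moran's I calculation and its permutation test can be sketched as follows. The sketch assumes binary contiguity weights and a one-sided test for positive autocorrelation, and it uses a hypothetical 4 by 4 grid of residuals; the report's own computations were done in a custom SAS program whose details are not given.

```python
import random

def morans_i(values, neighbours):
    """Moran's I with binary weights; values and neighbours are keyed by region id."""
    ids = list(values)
    n = len(ids)
    mean = sum(values.values()) / n
    dev = {i: values[i] - mean for i in ids}
    cross = sum(dev[i] * dev[j] for i in ids for j in neighbours[i])
    s0 = sum(len(neighbours[i]) for i in ids)   # total number of neighbour links
    ss = sum(d * d for d in dev.values())
    return (n / s0) * (cross / ss)

def permutation_p_value(values, neighbours, n_perm=999, seed=0):
    """Share of random reallocations with Moran's I at least as large as observed."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbours)
    ids = list(values)
    pool = [values[i] for i in ids]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        if morans_i(dict(zip(ids, pool)), neighbours) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

def grid_queen(n_cols, n_rows):
    """Queen contiguity on a regular grid of cells (hypothetical study area)."""
    return {(c, r): [(c + dc, r + dr)
                     for dc in (-1, 0, 1) for dr in (-1, 0, 1)
                     if (dc, dr) != (0, 0)
                     and 0 <= c + dc < n_cols and 0 <= r + dr < n_rows]
            for r in range(n_rows) for c in range(n_cols)}

neighbours = grid_queen(4, 4)
# Hypothetical residuals: positive in the two left columns, negative in the two right.
residuals = {(c, r): (1.0 if c < 2 else -1.0) for (c, r) in neighbours}
i_obs, p_value = permutation_p_value(residuals, neighbours)
```

Because the positive and negative residuals form two solid blocks, the statistic is clearly positive and very few random reallocations of the same values reach it.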

When spatial autocorrelation is detected in the residuals from a standard least squares regression model, a spatial model must be fit to the data instead. The spatial model provides the same analysis of the neighbourhood characteristics as the least squares model but adjusts for the spatial effects. This can be done in one of two ways: by adding an extra term to represent the effect of neighbouring locations or by modelling a spatial process in the error terms. In the former model, called the spatial lag model, a direct effect of the crime rate in neighbouring locations is assumed. In this case the average value from all neighbouring locations, termed the spatial lag, is added to the regression model to represent the direct effect of the neighbouring regions. The other model, termed the spatial error model, assumes the relationship between crime rates in adjacent neighbourhoods is the result of the same relationship of the explanatory variables in the adjacent neighbourhoods. Thus the spatial autocorrelation, detected in the standard regression model, is the result of spatially autocorrelated variables not present in the model. To determine the appropriate type of spatial model to use for any given dataset, the data are empirically tested to determine the structure of the spatial dependency.
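The spatial lag term described above, the average value across each region's neighbours, can be computed as in the sketch below. The neighbour lists and crime rates are hypothetical, and the estimation of the spatial lag model itself is not shown.

```python
def spatial_lag(values, neighbours):
    """Average value among each region's neighbours (row-standardized weights)."""
    lag = {}
    for region, nbrs in neighbours.items():
        lag[region] = sum(values[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
    return lag

# Hypothetical queen-contiguity neighbour lists for five regions.
neighbours = {
    "A": ["B", "D", "E"],
    "B": ["A", "C", "D", "E"],
    "C": ["B", "E"],
    "D": ["A", "B", "E"],
    "E": ["A", "B", "C", "D"],
}
# Hypothetical crime rates per 1,000 residents and workers.
rates = {"A": 60.0, "B": 90.0, "C": 30.0, "D": 75.0, "E": 45.0}
lagged = spatial_lag(rates, neighbours)
```

In a spatial lag model, this lagged crime rate enters the regression as an additional explanatory term alongside the neighbourhood characteristics.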

The results from a spatial regression analysis are essentially the same as those from other multivariate regression analyses. The regression coefficients represent the change in the crime rate for a unit change in the variable, when all other variables are held constant. Since the variables representing the neighbourhood characteristics are standardised, the size of their regression coefficients denotes their relative contribution to the prediction of crime. The spatial lag and spatial error regression coefficients, however, cannot be explained in the same way. The spatial lag coefficient in part represents the effect of neighbouring locations but also accounts for some of the measurement error in using administrative units to define the neighbourhoods. Thus there is no direct interpretation of the spatial lag coefficient. Similarly, the spatial error coefficient represents a nuisance parameter in the model and has no direct interpretation. Rather, the spatial term is only retained in the model to make the other results accurate.

The overall fit of the spatial models is assessed by the squared correlation between the observed crime rate in each neighbourhood and the values predicted using the spatial model. This squared correlation is equivalent to the coefficient of determination (R²), commonly used in standard regression models, where it represents the proportion of the variation explained by the regression model. However, in the presence of spatial autocorrelation the squared correlation between the observed and fitted values does not have the same interpretation. Rather, it represents the relative fit of the model. A value of 1 would represent a perfect fit of the model and values near zero indicate a poor predictive power of the model.
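The fit measure described above is simply the square of Pearson's correlation between observed and fitted values. A minimal sketch with hypothetical values:

```python
import statistics

def squared_correlation(observed, fitted):
    """Squared Pearson correlation between observed and model-predicted values."""
    mo, mf = statistics.fmean(observed), statistics.fmean(fitted)
    cov = sum((o - mo) * (f - mf) for o, f in zip(observed, fitted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_f = sum((f - mf) ** 2 for f in fitted)
    return cov * cov / (var_o * var_f)

observed = [1.0, 2.0, 3.0, 4.0]       # hypothetical observed crime rates
fitted_good = [1.0, 2.0, 3.0, 4.0]    # hypothetical perfect predictions
fitted_poor = [2.0, 1.0, 4.0, 3.0]    # hypothetical weaker predictions
fit_good = squared_correlation(observed, fitted_good)
fit_poor = squared_correlation(observed, fitted_poor)
```

The first set of fitted values reproduces the observations exactly and yields a value of 1, while the second yields a lower value, indicating a weaker relative fit.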

To ensure the spatial autocorrelation has been adequately accounted for in the model, the residuals from the spatial model are tested for the presence of spatial autocorrelation. This is done using Lagrange Multiplier tests, which test for the presence of spatial error dependence in the spatial lag model and for a missing spatial lag variable in the spatial error model. If the statistical test is not significant, it indicates the spatial dependence in the data has been accounted for in the model. In this study, the estimates of spatial autocorrelation and the autoregressive spatial models were computed using a custom-made SAS program.