Methodology

Data sources
Geocoding
Definition of neighbourhoods
Description of variables
Multivariate analysis
Spatial autocorrelation
Normalization techniques

Data sources

Incident-based Uniform Crime Reporting Survey

The Incident-Based Uniform Crime Reporting Survey (UCR2) collects detailed information on individual criminal incidents reported to the police, including characteristics of incidents, accused people and victims.

The UCR2 Survey allows a maximum of four offences per criminal incident to be recorded in the database. The selected offences are classified according to their level of seriousness, which is related to the maximum sentence that can be imposed under the Criminal Code.

Analyses of major offence categories (violent offences, property offences, drug-related offences and other Criminal Code offences) undertaken in this report are based on the most serious offence in each incident, as are the crime rates published annually by the Canadian Centre for Justice Statistics (CCJS). In this type of classification, a higher priority is given to violent offences than to non-violent offences. As a result, less serious offences may be under-represented when only the most serious offence is considered.

The majority of analyses in this report are based on major offence categories, such as violent offences and property offences, and take into account only the most serious offence in each incident. However, when the analysis is focused on individual offence types, all incidents in which the offence is reported are included, whatever the seriousness or the ranking of the offence in the incident. This method provides a more complete spatial representation of the different types of individual offences.
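The two counting rules described above can be sketched as follows. This is a minimal illustration rather than the CCJS implementation: the `Offence` record, the offence codes and the `seriousness_rank` values (1 = most serious) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_OFFENCES_PER_INCIDENT = 4  # the UCR2 Survey records at most four offences per incident


@dataclass(frozen=True)
class Offence:
    code: str              # hypothetical offence label
    category: str          # e.g. "violent" or "property"
    seriousness_rank: int  # hypothetical ranking: 1 = most serious


def most_serious_offence(incident: List[Offence]) -> Offence:
    """Return the offence with the best (lowest) seriousness rank in one incident."""
    recorded = incident[:MAX_OFFENCES_PER_INCIDENT]
    return min(recorded, key=lambda o: o.seriousness_rank)


def count_by_category(incidents: List[List[Offence]]) -> Dict[str, int]:
    """Major-category counts: each incident counts once, under its most serious offence."""
    counts: Dict[str, int] = {}
    for incident in incidents:
        category = most_serious_offence(incident).category
        counts[category] = counts.get(category, 0) + 1
    return counts


def count_offence_type(incidents: List[List[Offence]], code: str) -> int:
    """Individual-offence counts: every incident reporting the offence is included."""
    return sum(
        1
        for incident in incidents
        if any(o.code == code for o in incident[:MAX_OFFENCES_PER_INCIDENT])
    )


incidents = [
    [Offence("assault", "violent", 2), Offence("theft_under", "property", 6)],
    [Offence("theft_under", "property", 6)],
]
print(count_by_category(incidents))
print(count_offence_type(incidents, "theft_under"))
```

In this example the theft in the first incident is hidden by the category count but recovered by the offence-type count, which is the under-representation the text describes.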

This report includes most Criminal Code offences and all offences under the Controlled Drugs and Substances Act, but it excludes offences under other federal and provincial statutes and municipal by-laws. Also excluded are Criminal Code offences for which there is either no expected pattern of spatial distribution or a lack of information about the actual location of the offence. For example, administrative offences including bail violations, failure to appear and breaches of probation are typically reported at court locations; threatening or harassing phone calls are often reported at the receiving end of the call; and impaired driving offences may be more likely to be related to the location of apprehension (for example, apprehensions resulting from roadside stop programs).

Census of population

The Census of Population provides the population and dwelling counts not only for Canada but also for each province and territory, and for smaller geographic units, such as cities or districts within cities. The census also provides information about Canada's demographic, social and economic characteristics.

The detailed socio-economic data used in this report are derived from the long form of the census, which is completed by a 20% sample of households. These data exclude the institutional population, that is, people living in hospitals, nursing homes, prisons and other institutions.

The Census of Population is conducted by Statistics Canada every five years, most recently in 2006. To achieve the highest degree of compatibility between neighbourhood characteristics derived from the census and crime information, this report draws on police data from 2001 and census data from the same year. When the Edmonton, Halifax and Thunder Bay studies were conducted, detailed data from the 2006 Census on population characteristics, in particular on individuals' income, were not yet available at the neighbourhood level.

Land use data

Land use data were used to calculate the proportions of neighbourhoods with commercial, multi-family residential and single-family residential zoning. Land use data show the actual use of urban lands, whereas zoning data reflect planned and legislated use. Land use parcels were aggregated to the neighbourhood level to calculate proportions.

  • Edmonton
    Zoning data for the City of Edmonton include several categories that do not correspond to the categories used in other cities. For example, the West Edmonton Mall is classified under "site specific development control provision," a category that is not found in the other cities studied and cannot be matched with any of their categories. To deal with this situation, the variable "retail trade worker" from the 2001 Census was used as an indicator of commercial land use.
  • Halifax
    Zoning data come from the services and geographical information systems of the Halifax Regional Municipality.
  • Thunder Bay
    The zoning data come from the Planning Division of the City of Thunder Bay.

Geocoding

Geocoding is the process of matching a particular address with a geographic location on the earth's surface. In this report, the address corresponds to the location of an incident that was reported to the police, after aggregation to the block-face level, that is, to one side of a city block between two consecutive intersections. Geocoding is done by matching records in two databases, one containing a list of addresses, the other containing information about the street network and the address range within a given block. The geocoding tool matches the address with its unique position in the street network. As the street network is geo-referenced (located in geographic space with reference to a co-ordinate system), it is possible to generate longitude and latitude values (or X and Y values) for each criminal incident. Where the incident location does not correspond to an address, geocoding is performed by creating a point on, say, an intersection of two streets, a subway station or the middle of a public park. X and Y values in the criminal incident database provide the spatial component that allows points to be mapped, relative to the street or neighbourhood in which they occurred.

In 2001, the UCR2 Survey did not lend itself to collecting information on the geographic location of criminal incidents. For the purposes of this report, the Edmonton, Halifax and Thunder Bay police services sent the CCJS the addresses of the incidents selected, reported and entered in the UCR2 database in 2001 and 2003. This information was resolved by the CCJS into a set of geographical co-ordinates (X and Y) for each address. These co-ordinates were rolled up to the mid-point of a block-face in the case of specific addresses, and to intersection points in the case of streets, parks and subway stations. All addresses of criminal incidents that were reported more than five times but failed the automated geocoding process were geocoded manually so as to represent crime concentrations as accurately as possible. The low percentage of incidents that failed geocoding did not create a bias in offence trends. Incidents that failed geocoding contained information that was too vague, such as a bus number or the trans-Canada registration. In fact, geocoded offences and offences prior to geocoding both account for the same proportion of overall crime.
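The address-range matching and the roll-up to the block-face mid-point can be sketched as follows. This is an illustration under stated assumptions, not the tool used by the CCJS: the `BlockFace` record, the street-name normalization and the even/odd side convention are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class BlockFace:
    street: str   # normalized street name, e.g. "MAIN ST"
    low: int      # lowest civic number on this side of the block
    high: int     # highest civic number on this side of the block
    parity: int   # 0 = even-numbered side, 1 = odd-numbered side (assumed convention)
    start: Point  # X, Y of the intersection at one end of the block
    end: Point    # X, Y of the intersection at the other end


def geocode_to_block_face(number: int, street: str,
                          faces: List[BlockFace]) -> Optional[Point]:
    """Match an address to its block-face and return the block-face mid-point."""
    name = street.strip().upper()
    for face in faces:
        in_range = face.low <= number <= face.high
        if face.street == name and in_range and number % 2 == face.parity:
            return ((face.start[0] + face.end[0]) / 2.0,
                    (face.start[1] + face.end[1]) / 2.0)
    return None  # no match: the address fails automated geocoding


faces = [BlockFace("MAIN ST", 100, 198, 0, (0.0, 0.0), (100.0, 0.0))]
print(geocode_to_block_face(120, "Main St", faces))
print(geocode_to_block_face(121, "Main St", faces))
```

Addresses that return `None` correspond to the records that would be reviewed manually when they recur often enough.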

The Edmonton Police Service sent more than 58,800 selected incidents for 2001 and more than 69,700 for 2003. Geocoding was successful in more than 93% of the 2001 data and more than 92% of the 2003 data.

For its part, the Halifax Police Service sent more than 22,600 selected incidents in 2001. Geocoding achieved a success rate of 92%.

The Thunder Bay Police Service sent nearly 7,000 selected incidents in 2001 and more than 7,300 in 2003, of which respectively 98% and 93% were geocoded.

Mapping techniques

In this report, the method of representing crime and the other aspects analysed consists of a constellation of points, where each point corresponds to a criminal incident or a residential address of an accused. This method shows high-density crime locations or 'hot spots.'

Mapping hot spots: Kernel analysis

Kernel analysis is an alternative method of making sense of the spatial distribution of crime data. This method makes it possible to examine criminal incident point data across neighbourhood boundaries and to see natural distributions and the areas where these incidents are concentrated. The goal of kernel analysis is to estimate how the density of events varies across a study area based on a point pattern. Kernel estimation was originally developed to estimate probability density from a sample of observations (Bailey and Gatrell 1995). In its application to spatial data, kernel analysis produces a smooth map of density values, where the density of each place corresponds to the concentration of points in a given area.

In kernel estimation, a fine grid is overlaid on the study area. Distances are measured from the centre of a grid cell to each observation that falls within a predefined region of influence known as a bandwidth. Each observation contributes to the density value of that grid cell based on its distance from the centre of the cell. Nearby observations are given more weight in the density calculation than those farther away. In this study, the grid cell size is 100 square metres in Edmonton and Halifax. The search radius (bandwidth) used is 1,000 metres in Edmonton and Halifax; the larger the search radius, the smoother the image produced. Because the City of Thunder Bay is smaller, the grid cell is set at 50 square metres and the search radius is 500 metres.
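The grid-and-bandwidth calculation can be sketched in a few lines. The weighting function is an assumption: the text does not name the kernel, so a quartic (biweight) kernel is used here for illustration. The function and parameter names are hypothetical, and `cell_size` is taken to be the side length of a square cell.

```python
import math
from typing import List, Sequence, Tuple


def kernel_density(points: Sequence[Tuple[float, float]],
                   x_min: float, y_min: float,
                   n_cols: int, n_rows: int,
                   cell_size: float, radius: float) -> List[List[float]]:
    """Estimate point density at the centre of every grid cell.

    Each point within `radius` of a cell centre adds a weight that
    decreases with distance (quartic kernel, an assumed choice).
    """
    scale = 3.0 / (math.pi * radius ** 2)  # makes each point's kernel integrate to 1
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * cell_size
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * cell_size
            total = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < radius ** 2:  # only points inside the search radius count
                    total += scale * (1.0 - d2 / radius ** 2) ** 2
            grid[row][col] = total
    return grid


# One incident in the middle of a 3 x 3 grid of 100 m cells, 150 m search radius.
density = kernel_density([(150.0, 150.0)], 0.0, 0.0, 3, 3, 100.0, 150.0)
for row in density:
    print(["%.2e" % value for value in row])
```

A larger `radius` spreads each point over more cells, which is why a larger search radius produces a smoother surface.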

The product of the kernel estimation method is a simple dot matrix (raster image) displaying contours of varying density. Contour loops define the boundaries of hot spot areas. Hot spots may be irregular in shape, and they are not limited by neighbourhood or other boundaries. This method of analysis was applied using the Spatial Analyst software of the Environmental Systems Research Institute.

The dual kernel method is also used in this study to examine the distribution of two variables simultaneously. Use of the dual kernel serves to standardize the distribution of crime based on the population at risk (the sum of the number of persons who reside or work in a neighbourhood). The dual kernel is obtained by calculating the ratio of crime density values to population at risk density values. To avoid having the concentration of a small number of incidents represented as a crime hot spot, an adjustment is made for areas where both the crime and population at risk densities are low, because such areas would otherwise show artificially high values.
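A minimal sketch of the dual kernel ratio follows. The form of the low-density adjustment is not specified in the text, so the version below simply masks cells whose population at risk density falls under a hypothetical threshold (`min_pop_density`); the actual adjustment may differ.

```python
from typing import List, Optional


def dual_kernel(crime_density: List[List[float]],
                pop_density: List[List[float]],
                min_pop_density: float) -> List[List[Optional[float]]]:
    """Cell-by-cell ratio of crime density to population at risk density.

    Cells with a population at risk density below `min_pop_density`
    (an assumed cut-off) are returned as None instead of a ratio, so that
    a handful of incidents in a sparsely used area is not shown as a hot spot.
    """
    result = []
    for crime_row, pop_row in zip(crime_density, pop_density):
        result.append([
            crime / pop if pop >= min_pop_density else None
            for crime, pop in zip(crime_row, pop_row)
        ])
    return result


ratio = dual_kernel([[2.0, 1.0]], [[4.0, 0.001]], min_pop_density=0.01)
print(ratio)
```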

Definition of neighbourhoods

Ecological studies such as those conducted in crime-mapping projects require a sufficiently large number of geographic units or neighbourhoods for the modelling of data to be effective and reliable. In previous studies, the geographic units used were locally determined natural neighbourhoods (Winnipeg and Regina studies) or census tracts (Montréal).

In the analyses of Edmonton and Halifax, a 'neighbourhood' is defined as a census tract (CT), and the terms 'census tracts' and 'neighbourhoods' are used interchangeably. CTs are delineated by Statistics Canada in conjunction with a committee of local experts (e.g., planners, social workers, health care workers and educators). The initial rules for delineation, in order of priority, are as follows:

  1. The CT boundaries should follow permanent and easily recognizable physical features.
  2. The population of the CT should be from 2,500 to 8,000 people, preferably averaging around 4,000.
  3. CTs should be as homogeneous as possible with respect to socio-economic characteristics.

CTs are also used in many other studies, and this makes it possible to add layers of supplementary information (health, education, economic factors, etc.) for an integrated approach toward prevention in neighbourhoods with a number of risk factors.

For reasons of data confidentiality and reliability, Statistics Canada requires that when using individual, family or household income data, the population size for any Canadian geographic area being considered must be at least 250 people living in at least 40 private households. As a result, in Edmonton, only 147 of the 160 CTs are included. A map is appended showing the coverage of the 147 CTs over the territory of the City of Edmonton. In Halifax, each of the 51 census tracts has a sufficient population to be included in the study. A map is appended showing the coverage of the 51 CTs making up the city of Halifax.
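The inclusion rule for income data can be expressed as a simple filter. The function name and the example figures are hypothetical; only the two thresholds come from the text.

```python
MIN_POPULATION = 250         # minimum population stated in the rule
MIN_PRIVATE_HOUSEHOLDS = 40  # minimum number of private households stated in the rule


def meets_income_data_rule(population: int, private_households: int) -> bool:
    """True when an area is large enough for income data to be used."""
    return (population >= MIN_POPULATION
            and private_households >= MIN_PRIVATE_HOUSEHOLDS)


# Hypothetical areas: (identifier, population, private households)
areas = [("A", 4200, 1700), ("B", 180, 75), ("C", 300, 35)]
included = [name for name, pop, hh in areas if meets_income_data_rule(pop, hh)]
print(included)
```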

In the analyses of Thunder Bay, the number of CTs (30) available for analysis was insufficient. The dissemination areas (DAs) of the 2001 Census were therefore retained. The DA is the smallest standardized spatial unit for which data were disseminated.

DAs are small areas consisting of one or more blocks, with boundaries delimited by intersecting streets generally enclosing 400 to 700 residents. DAs must meet various delineation criteria designed to maximize their usefulness, including the following: DA boundaries respect the boundaries of census sub-divisions and census tracts; DA boundaries follow roads as well as railways, water features and power transmission lines, where these features form part of the boundaries of census sub-divisions or CTs.

In the analyses of Thunder Bay, neighbourhoods' demographic and socio-economic characteristics are the characteristics of the population of DAs in the 2001 Census. The terms 'DA' and 'natural neighbourhood' are therefore interchangeable.

Only 207 of 209 Thunder Bay DAs are included in the analyses because of confidentiality and reliability rules, as explained previously. A map is appended showing the coverage of the 207 DAs.

Description of variables

Crime variables and population at risk

Usually crime rates are calculated by examining the distribution of incidents based on the residential population of a given area. This method produces good results at the urban, provincial and national levels, but presents challenges when spatial components of interest, like neighbourhoods, are small and have low residential populations.

The distribution of criminal incidents across urban areas is often concentrated in or near the city centre, where residential populations are relatively low, but where there are high concentrations of people working or engaging in other activities. Rates based on residential population alone will artificially inflate the crime rates in these urban core neighbourhoods, as the total population at risk in these areas has not been taken into account. 

To more accurately gauge the risk of crime in neighbourhoods, crime rates are based on the population at risk. An approximation of the population at risk is obtained by adding the number of workers and the number of residents in each neighbourhood. Rates based on these combined populations more closely approximate the total number of people at risk of experiencing crime. This report uses the approach taken in the Winnipeg research project (Fitzgerald 2004). Table 1.1, Table 2.1 and Table 3.1 show rates based on the residential population and the population at risk in the areas.

  • Violent offence rates per 1,000 residents and workers. Violent offences include homicide, attempted murder, sexual assault, assault, violations resulting in the deprivation of freedom, robbery, extortion, criminal harassment, uttering threats, explosives causing death or bodily harm, and other violent crimes.
  • Property offence rates per 1,000 residents and workers. Property offences include arson, breaking and entering, theft $5,000 and under, theft over $5,000, vehicle theft, possession of stolen goods, fraud and mischief.
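The difference between the two denominators can be illustrated with a short calculation. The figures are invented for the example and the function name is hypothetical.

```python
def rate_per_1000(incidents: int, residents: int, workers: int = 0) -> float:
    """Crime rate per 1,000 population.

    With workers=0 the rate is based on the residential population only;
    otherwise it is based on the population at risk (residents + workers).
    """
    population = residents + workers
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * incidents / population


# Hypothetical downtown neighbourhood: few residents, many workers.
print(rate_per_1000(50, residents=500))                # residential population only
print(rate_per_1000(50, residents=500, workers=4500))  # population at risk
```

In this invented case the residential rate is ten times the population at risk rate, which shows how core neighbourhoods can appear artificially crime-prone when workers are ignored.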

2001 Census of population variables

Population characteristic variables

  • Males aged 15 to 24 as a percentage of the total neighbourhood population. This age group is at highest risk of offending. In Canada in 2001, people aged 15 to 24 represented 14% of the total population, but accounted for 46% of the people accused of property crimes and 31% of those accused of violent crime.
  • Percentage of the neighbourhood population that is 65 years and older. Results from the General Social Survey (GSS) on victimization suggest that national rates of criminal victimization among the elderly are relatively low compared to the population as a whole, although elderly people report feeling less safe (Gannon and Mihorean 2005).
  • Percentage of single people in the neighbourhood, defined as single people aged 15 and older who have never been married. According to the 2004 GSS, single people are more at risk of experiencing violence. This situation is partly due to the fact that single people tend to participate more often in evening activities and are generally younger, and both these factors are strongly linked to a higher risk of victimization. In 2004, people who participated in at least 30 evening activities every month also had the highest rates of violent victimization (174 per 1,000 population). This rate was 4 times higher than that noted for people participating in fewer than 10 evening activities per month (44 incidents per 1,000 population).
  • Percentage of the neighbourhood population immigrating to Canada from 1991 to 2001. Initially, immigration may hinder integration into society; however, this drawback is lessened as the length of residence in the country increases (Breton 2003). Recent immigrants' social participation may be more limited, and consequently, they may not be able to benefit to the same extent from social capital or from relationships within the community. Numerous studies have demonstrated links between reduced levels of social participation and increased levels of crime (Morenoff, Sampson and Raudenbush 2001; Sampson, Raudenbush and Earls 1997; Sampson 1997).
  • Percentage of visible minority residents in the neighbourhood. Members of visible minorities "are people, other than Aboriginal peoples, who are non-Caucasian in race or non-white in colour." In 2002, according to the Ethnic Diversity Survey, roughly 9% of Canadians who reported being victims of crime in the previous five years believed that the offence perpetrated against them could be considered a hate crime. Members of visible minorities were 1.5 times more likely than non-members of visible minorities to have been a victim of a hate crime (20 per 1,000 population and 13 per 1,000 population, respectively) (Silver, Mihorean and Taylor-Butts 2004).
  • Percentage of people with an Aboriginal identity living in the neighbourhood. Includes people who reported identifying with at least one Aboriginal group, that is North American Indian, Métis or Inuit (Eskimo), who reported being a Treaty Indian or a Registered Indian as defined by the Indian Act of Canada, or who reported they were members of an Indian Band or First Nation. The Aboriginal population in Canada is over-represented with respect to victimization and offending (Statistics Canada 2001a). Thus, according to the most recent cycle of the GSS, Aboriginal people were three times more likely than non-Aboriginals to have been a victim of a violent incident (319 compared to 101 per 1,000 population), even when other factors such as age, sex and income were taken into account (Gannon and Mihorean 2005).
  • Percentage of lone-parent families among economic families living in private households. Although the after-tax income of lone-parent families is increasing in Canada, these families continue to be among the lowest income earners (Statistics Canada 2001b), and they are concentrated in the more disadvantaged areas of the city. Additionally, an increase in labour force participation among female lone-parents from 65% in 1995 to 82% in 2001 may be linked to decreased guardianship or supervision in neighbourhoods, which has been associated with higher crime rates (Cohen and Felson 1979).
  • Percentage of people who have moved. Includes people who, on Census Day, resided at an address other than the one where they were living one year earlier. According to the 2004 GSS, people who have occupied their residence for only a short time are more likely to have their household victimized (317 incidents per 1,000 households) than those who have lived there for 10 years (196). Residential mobility has been associated with higher crime rates through reduced guardianship or social involvement that is more typical of frequent movers. Studies of American cities also indicate that streets where neighbours know each other or feel responsible for their community have significantly lower rates of violent crime than those where social interaction is lower (Block 1979; Sampson 1993).

Dwelling characteristic variables

  • Percentage of dwellings constructed before 1961. In combination with other variables related to signs of physical decay within urban neighbourhoods, the age of urban buildings may be associated with higher crime rates through a perception of increased physical disorder (Kelling and Coles 1998).
  • Percentage of dwellings in need of major repairs. Refers to whether, in the judgement of the respondent, the dwelling requires any repairs (excluding desirable remodelling or additions). Major repairs refer to the repair of defective plumbing or electrical wiring, structural repairs to walls, floors or ceilings, etc. This variable may similarly be associated with higher crime rates through the perception of increased physical disorder in the neighbourhood (Kelling and Coles 1998).
  • Percentage of households spending more than 30% of total household income on shelter, including both owner-occupied and tenant-occupied households. This is a measure of housing affordability. The 30% threshold is based on research indicating that when the shelter costs of low-income households exceed 30% of their income, their consumption of other life necessities is reduced. Shelter expenses include payments for electricity, oil, gas, coal, wood or other fuels, water and other municipal services, mortgage payments, property taxes, condominium fees and rent. Decreased housing affordability within a neighbourhood is another indicator of socio-economic disadvantage.
  • Percentage of owner-occupied dwellings in the neighbourhood. Collective dwellings are excluded from both the numerator and denominator. Renters have the highest victimization rates among households. In 2004, the victimization rate for renters was 267 incidents per 1,000 households, compared to 242 for owners (Gannon and Mihorean 2005). Greater proportions of owner-occupied housing in a neighbourhood are linked to increased residential stability, social interaction among neighbours and a collective commitment to the neighbourhood. The 2003 GSS results show that people living in a neighbourhood for less than one year are less likely to know their neighbours (Schellenberg 2004).

Socio-economic variables

The results of research projects involving spatial analysis have shown major differences between the socio-economic characteristics of high-crime neighbourhoods and those of lower-crime neighbourhoods. High-crime neighbourhoods were characterized by reduced access to socio-economic resources (Fitzgerald, Wisener and Savoie 2004; Savoie, Bédard and Collins, 2006). A number of American studies have also demonstrated that inequality of socio-economic resources between neighbourhoods in American cities is strongly associated with the spatial distribution of crime (Morenoff, Sampson and Raudenbush 2001). In the present study, the following socio-economic variables are used:

  • Percentage of total income consisting of government transfer payments, including employment insurance benefits; Old Age Security benefits, including the Guaranteed Income Supplement and the spouse's allowance; net federal supplements; Canada and Quebec pension plan benefits; the Canada Child Tax Benefit; New Brunswick, Quebec, Alberta and British Columbia family allowances; the goods and services tax credit; workers' compensation benefits; social assistance; and provincial or territorial refundable tax credits.
  • Percentage of neighbourhood residents aged 20 and older without a high school diploma.
  • Percentage of neighbourhood residents aged 20 and older who have obtained a bachelor's degree.
  • Percentage of neighbourhood population in private households with low income in 2000. Low income refers to private households that spend 20% more of their disposable income than the average private household on food, shelter and clothing. Statistics Canada's low-income cut-offs (LICOs) are income thresholds that vary according to family and community size. Although LICOs are often referred to as poverty lines, they have no official status as such.
  • Neighbourhood unemployment rate for population aged 15 and older participating in the labour force.
  • Median household income in thousands of dollars or the dollar amount above and below which half the cases fall, namely the 50th percentile. Low household income increases the risk of violent victimization, while high income increases the risk of household victimization (Gannon and Mihorean 2005). It may be that potential thieves are more attracted to higher-income households since their members probably own more property or property of greater perceived value.

City land use variables

  • Commercial zoning: the proportion of square area within a neighbourhood zoned for commercial land use. Types of land use falling under commercial zoning include stores, supermarkets, discount stores, furniture stores, banks, hotels, motels, restaurants, service garages, service stations, full-service auto dealers, car washes, residential/commercial split properties and commercial offices.
    • In Edmonton, commercial zoning is represented by the number of workers in the retail trade industry, sectors 44 and 45 of the 1997 North American Industry Classification System (NAICS).
    • In Halifax, commercial zoning includes categories B, C-1, C-2, C-2A, C-2B, C-2C, C-2D, C-3, C-3A, C-4, C-6, CCDD, CGB, CHWY, CMC, CSC, CR-1, CR-2, DB, K, HZ, SC_MF1 and W, as defined by the City of Halifax.
    • In Thunder Bay, commercial zoning includes categories CBD, RC1, RC2, SC, NC1, NC2, CG1, CG2, CSG, HC and SPC, as defined by the City of Thunder Bay.
  • Multi-family residential zoning: the proportion of square area within a neighbourhood zoned for multi-family, two-family (duplex) or transitional dwellings, which include short- and longer-term subsidized housing for those in need.
    • In Edmonton, multi-family residential zoning is represented by the percentage of dwellings in the census tract that are contained within an apartment building.
    • In Halifax, multi-family residential zoning includes categories BSCDD, CDD, DN, K, R-2, R-2A, R-2AM, R-2P, R-2T, R-3, R-4, RMU, RTH, RTU, TH and WFCDD, as defined by the City of Halifax.
    • In Thunder Bay, multi-family residential zoning includes categories R2, R2A, RM1, RM2A, RM2B and RM3, as defined by the City of Thunder Bay.
  • Single-family residential zoning: the proportion of square area within a neighbourhood zoned for single-family dwellings.
    • In Edmonton, single-family residential zoning is represented by the percentage of dwellings in the CT that are considered single detached houses.
    • In Halifax, single-family residential zoning includes categories BSCDD, BWCDD, CDD, H, HCR, K, MU-1, R-1, R-1M, R-2A, RA-1, RA-2, RA-3, RA-4, RB-1, RB-2, RB-3, RCDD, RDD, RR, RSU, T, V-1, V-3 and V-4, as defined by the City of Halifax.
    • In Thunder Bay, single-family residential zoning includes categories RE, RS, R1, R1A, RMH, RF1, RF2 and CR, as defined by the City of Thunder Bay.
  • Institutional zoning: the proportion of the square area within a neighbourhood consisting of buildings or public spaces such as parks, schools, hospitals and other government buildings.
    • In Halifax, institutional zoning includes categories AF, CFB, D-1, K, P, P-2, P-3, PK, POS, P_SI, RPK, RR, S, SI, TR, U-1, U-2 and W, as defined by the City of Halifax.
    • In Thunder Bay, institutional zoning includes categories NIN, CIN and MIN, as defined by the City of Thunder Bay.
  • Industrial zoning: the proportion of the square area within a neighbourhood consisting of industrial spaces.
    • In Halifax, industrial zoning includes categories C-5, CD-2, CD-3, F-1, I-1, I-2, I-3, I-4, IHI, IHO, ILI, M, P5, W and WFCDD, as defined by the City of Halifax.
    • In Thunder Bay, industrial zoning includes categories FI, SI, LI, LIP, HI, EI, HRI, RR, PBP and GBP, as defined by the City of Thunder Bay.
  • Open space: the proportion of the square area within a neighbourhood consisting of spaces without any major buildings.
    • In Thunder Bay, open spaces include categories RU, OS, AIR, US, HL and FD, as defined by the City of Thunder Bay.
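The aggregation of parcels into neighbourhood-level proportions can be sketched as follows. The parcel records and the `land_use_proportions` function are hypothetical; the small lookup table uses four of the Thunder Bay zoning categories listed above and is deliberately incomplete.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Partial lookup built from Thunder Bay categories named in the text.
ZONE_TO_CLASS = {
    "CBD": "commercial",
    "RM1": "multi_family",
    "R1": "single_family",
    "LI": "industrial",
}


def land_use_proportions(
    parcels: Iterable[Tuple[str, str, float]]
) -> Dict[str, Dict[str, float]]:
    """Share of each neighbourhood's parcel area falling in each land use class.

    Each parcel is (neighbourhood id, zoning category, area).
    Categories missing from the lookup are grouped under "other".
    """
    area = defaultdict(lambda: defaultdict(float))
    for neighbourhood, zone, parcel_area in parcels:
        area[neighbourhood][ZONE_TO_CLASS.get(zone, "other")] += parcel_area
    proportions = {}
    for neighbourhood, by_class in area.items():
        total = sum(by_class.values())
        proportions[neighbourhood] = {
            land_class: value / total for land_class, value in by_class.items()
        }
    return proportions


parcels = [("DA-1", "CBD", 25.0), ("DA-1", "R1", 50.0), ("DA-1", "R1", 25.0)]
print(land_use_proportions(parcels))
```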

Multivariate analysis

Ordinary least squares (OLS) regression is used to examine the distribution of violent and property crime rates as a function of the set of explanatory factors. The use of this method requires a continuous or quantitative outcome variable that has a normal distribution. As a number of variables studied here do not have normal distributions, it was necessary to submit the crime variables to normalizing transformations. Most of the explanatory variables (neighbourhood characteristics) were also transformed so that they would exhibit a normal distribution. The combination of variables and the associated normalization techniques are included in the Methodology section.

The regression models were developed using a stepwise procedure. This method consists of a series of multiple regressions such that, at each stage, the variable that accounts for the largest share of the remaining variance is added and any superfluous variables are eliminated.
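The entry step of such a procedure can be sketched as follows. This is a simplified illustration, not the procedure used for the report: it covers forward entry only, omits the elimination of superfluous variables, and uses a hypothetical minimum gain in explained variance (`min_gain`) in place of the significance tests that statistical packages normally apply.

```python
from typing import Dict, List, Sequence, Tuple


def _centre(values: Sequence[float]) -> List[float]:
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _partial_out(v: Sequence[float], q: Sequence[float]) -> List[float]:
    """Remove from v the part that is linearly explained by q."""
    coef = _dot(v, q) / _dot(q, q)
    return [a - coef * b for a, b in zip(v, q)]


def forward_select(predictors: Dict[str, Sequence[float]],
                   y: Sequence[float],
                   min_gain: float = 0.01) -> List[Tuple[str, float]]:
    """Add, one at a time, the predictor explaining most of the remaining variance.

    Returns (name, gain in R-squared) pairs in order of entry.
    """
    y_res = _centre(y)
    ss_total = _dot(y_res, y_res)
    remaining = {name: _centre(col) for name, col in predictors.items()}
    selected = []
    while remaining:
        best_name, best_gain = None, 0.0
        for name, col in remaining.items():
            norm = _dot(col, col)
            if norm <= 1e-12:  # nothing left that is independent of chosen variables
                continue
            gain = _dot(y_res, col) ** 2 / (norm * ss_total)
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None or best_gain < min_gain:
            break
        q = remaining.pop(best_name)
        selected.append((best_name, best_gain))
        # Keep only what the chosen variable does not already explain.
        y_res = _partial_out(y_res, q)
        remaining = {name: _partial_out(col, q) for name, col in remaining.items()}
    return selected


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, -1.0, -1.0, 1.0]
y = [2.1, 3.9, 5.9, 8.1]  # equals 2 * x1 + 0.1 * x2
print(forward_select({"x1": x1, "x2": x2}, y))
```

With the default threshold only `x1` enters, because `x2` adds far less than one percentage point of explained variance.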

The standardized regression coefficients provide a means of assessing the relative importance of the different predictor variables in the multiple regression models. The coefficients indicate the expected change, in standard deviation units, of the dependent variable per one standard deviation unit increase in the independent variable, after controlling for the other variables. The coefficients typically range from -1 to +1, with values closer to 0 indicating a weaker contribution to the explanation of the dependent variable.
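The relationship between an unstandardized slope and its standardized counterpart is a rescaling by the two standard deviations. The sketch below illustrates this with invented data; the function name is hypothetical.

```python
import statistics
from typing import Sequence


def standardized_coefficient(slope: float,
                             x: Sequence[float],
                             y: Sequence[float]) -> float:
    """Convert an unstandardized slope into standard deviation units.

    beta* = slope * sd(x) / sd(y)
    """
    return slope * statistics.stdev(x) / statistics.stdev(y)


# Invented example: y = 2 * x exactly, so the unstandardized slope is 2.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(standardized_coefficient(2.0, x, y))
```

With a single predictor the standardized coefficient equals the correlation between the two variables, which is why a perfect linear relationship yields a value of 1 here.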

Many neighbourhood characteristics in this study are closely correlated with each other; they convey essentially the same information (the correlation matrix is located in the appendix). This situation reflects the close links between many structural factors that are individually linked to crime (Land, McCall and Cohen 1990). Because this multicollinearity is likely to distort the results of the models, variance inflation factors (VIFs) are used to measure the multicollinearity between all independent variables in the regression models. A VIF greater than 10 indicates possible multicollinearity problems in a regression model (Montgomery, Peck and Vining 2001). Variables that register a VIF of 10 or more are, therefore, eliminated from the final models. Finally, because members of visible minorities and new immigrants account for only 2% of the population, these variables were not included in the multivariate models.
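The VIF of a variable is 1 / (1 - R²), where R² comes from regressing that variable on all the other independent variables. The sketch below computes it from first principles with a small least-squares solver; the helper names and the example data are hypothetical, and a statistical package would normally be used instead.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """R-squared of an OLS regression of y on the given columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(rows[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns: Sequence[Sequence[float]]) -> List[float]:
    """Variance inflation factor of each column given all the others."""
    factors = []
    for j, col in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        r2 = r_squared(others, col)
        factors.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return factors


independent = vif([[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]])
collinear = vif([[1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.9, 5.0]])
print(independent, collinear)
```

The first pair of variables is uncorrelated, so both VIFs are 1; the second pair is nearly identical, so both VIFs are far above the threshold of 10.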

Another aspect that must be taken into account in the spatial analysis of data such as crime data is spatial autocorrelation (see Spatial autocorrelation text box). Spatial autocorrelation is detected in the residuals of the OLS regression models for Edmonton: the Moran's I statistic is 0.12 (p<0.005) for violent crime and 0.19 (p<0.001) for property crime. Therefore, in modelling relationships between neighbourhoods, it is appropriate to take their relative geographic position into account, and a spatial autoregressive model is required.
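Moran's I can be computed directly from a set of values (here, regression residuals) and a neighbour weights matrix. The sketch below uses invented values for four areas arranged in a row and does not compute the associated p-value; the function name is hypothetical.

```python
from typing import List, Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Moran's I for values observed on areas linked by a weights matrix.

    weights[i][j] is non-zero when areas i and j are neighbours.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Four areas in a row: each area neighbours the one before and after it.
W: List[List[float]] = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, 5.0, 5.0], W)    # similar values side by side
alternating = morans_i([1.0, 5.0, 1.0, 5.0], W)  # dissimilar values side by side
print(clustered, alternating)
```

Clustered values give a positive statistic and alternating values a negative one, matching the positive and negative location effects described in the text box.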

              Text box 1

              Spatial autocorrelation

              Data measured over a two-dimensional study area, such as geocoded criminal incidents, are often affected by the properties of the locations where they are recorded. If adjacent observations are affected by the same location properties, the observations will not be independent of one another. This lack of independence must be accounted for in the data analysis to produce accurate and unbiased results. This is accomplished through spatial modelling of the data and is important for any dataset in which location may have an effect.

              Crime is known to be unevenly distributed across cities and to be concentrated in particular areas known as hot spots. This indication of a location effect can be seen by examining a map of crime density in city neighbourhoods. A positive location effect occurs when areas with high crime rates are surrounded by other areas with high crime rates and areas with low crime rates are adjacent to other areas with low crime rates. A negative location effect occurs when areas of low crime are surrounded by areas of high crime and vice versa. Either scenario indicates some form of spatial structure or spatial dependence in the data, signifying that the neighbourhoods influence each other. If the spatial structure of the data is not explained by the variables in the regression model, there will be spatial effects in the model error terms. This phenomenon, known as spatial autocorrelation, violates the assumption of independent errors made in a standard regression analysis. The location effects must therefore be accounted for in the multivariate model to ensure accurate estimation of the regression coefficients and their associated variances.

              For the purpose of spatial modelling, a definition of what constitutes neighbouring locations needs to be specified. In this analysis, regions are defined as neighbours if their boundaries share a common border or touch at a vertex (a structure often called queen contiguity). The neighbourhood structure defines which locations have a potential influence on each other, the neighbours, and rules out any potential influence of regions that are not considered to be neighbours. It is used both to test for spatial autocorrelation and to specify the spatial component in the autoregressive spatial model.
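The contiguity rule described above can be illustrated on a simplified layout. The sketch below assumes regions arranged as cells of a regular grid, which is a stand-in for the irregular neighbourhood polygons used in the study; the function name and the cell numbering are hypothetical.

```python
def queen_neighbours(n_rows, n_cols):
    """Map each grid cell to the cells that share an edge or a corner with it.

    Cells are numbered row by row, starting at 0 in the top-left corner.
    """
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0):
                        continue  # a cell is not its own neighbour
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        adjacent.append(rr * n_cols + cc)
            neighbours[r * n_cols + c] = adjacent
    return neighbours

grid = queen_neighbours(3, 3)
print(grid[0])       # corner cell: three neighbours
print(len(grid[4]))  # centre cell: eight neighbours
```

Any pair of cells absent from this mapping is treated as having no direct influence on each other, which matches the role the text gives to the neighbourhood structure.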

              The basic process of modelling spatial data is first to fit a standard least squares regression model to the data and then to test the error terms for the presence of spatial autocorrelation. This is done with a statistical test called Moran's I, which tests whether the error terms are randomly distributed over the study area. The value of the Moran's I statistic ranges from -1 to 1. A value approaching 1 indicates positive spatial autocorrelation, where regions with large error terms are adjacent to other regions with large error terms. A value near -1 indicates negative spatial autocorrelation, where regions with large error terms neighbour regions with small error terms. A value near zero indicates the absence of spatial autocorrelation. The significance of the Moran's I statistic is determined by a random permutation approach; a significant result indicates that there is spatial autocorrelation in the model error terms.
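A minimal sketch of the test described above follows. It assumes binary contiguity weights and a one-sided pseudo p-value for positive autocorrelation; the source states only that a random permutation approach is used, so the weighting scheme, the number of permutations and the example values (standing in for regression residuals) are assumptions.

```python
import random

def morans_i(values, neighbours):
    """Moran's I with binary weights (1 for each neighbouring pair, 0 otherwise)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(len(adj) for adj in neighbours.values())
    cross = sum(dev[i] * dev[j] for i, adj in neighbours.items() for j in adj)
    return (n / total_weight) * cross / sum(d * d for d in dev)

def permutation_test(values, neighbours, n_perm=999, seed=0):
    """Return Moran's I and a one-sided pseudo p-value from random relabelling."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbours)
    shuffled = list(values)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbours) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)

# Six hypothetical regions in a row; each region neighbours the ones beside it.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
clustered = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]    # similar values side by side
alternating = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0]  # dissimilar values side by side

i_clustered, p_clustered = permutation_test(clustered, chain)
i_alternating, p_alternating = permutation_test(alternating, chain)
print(i_clustered, p_clustered)
print(i_alternating, p_alternating)
```

The clustered pattern gives a positive statistic (about 0.6) and the alternating pattern gives a value of about -1, matching the two cases described in the text.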

              When spatial autocorrelation is detected in the residuals from a standard least squares regression model, a spatial model must be fitted to the data instead. The spatial model provides the same analysis of the neighbourhood characteristics as the least squares model but adjusts for the spatial effects. This can be done in one of two ways: by adding an extra term to represent the effect of neighbouring locations or by modelling a spatial process in the error terms. The first model, called the spatial lag model, assumes a direct effect of the crime rate in neighbouring locations. In this case the average value from all neighbouring locations, termed the spatial lag, is added to the regression model to represent the direct effect of the neighbouring regions. The second model, termed the spatial error model, assumes that the relationship between crime rates in adjacent neighbourhoods arises because the explanatory variables are themselves related across adjacent neighbourhoods. The spatial autocorrelation detected in the standard regression model is then the result of spatially autocorrelated variables that are not present in the model. To determine the appropriate type of spatial model for a given dataset, the data are empirically tested to determine the structure of the spatial dependency.
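The spatial lag term described above, the average value over a region's neighbours, can be sketched as follows. In the conventional notation (not used in the source), the spatial lag model is often written y = ρWy + Xβ + ε and the spatial error model as y = Xβ + u with u = λWu + ε, where W is the row-standardized neighbour matrix. The function name and the data below are hypothetical.

```python
def spatial_lag(values, neighbours):
    """Average of the neighbouring values for each region (row-standardized W times y).

    Regions without neighbours receive a lag of 0.0, a common convention
    that is assumed here rather than taken from the source.
    """
    lag = []
    for i in range(len(values)):
        adjacent = neighbours.get(i, [])
        if adjacent:
            lag.append(sum(values[j] for j in adjacent) / len(adjacent))
        else:
            lag.append(0.0)
    return lag

# Four hypothetical regions in a row with illustrative crime rates.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
crime_rate = [10.0, 20.0, 30.0, 40.0]

lagged = spatial_lag(crime_rate, chain)
print(lagged)  # [20.0, 20.0, 30.0, 30.0]
```

In a spatial lag model this lagged variable enters the regression as an additional term; because it depends on the dependent variable itself, such models are normally estimated with methods other than ordinary least squares.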

              The results from a spatial regression analysis are interpreted in essentially the same way as those from other multivariate regression analyses. The regression coefficients represent the change in the crime rate for a unit change in the variable, when all other variables are held constant. Since the variables representing the neighbourhood characteristics are standardized, the size of their regression coefficients denotes their relative contribution to the prediction of crime. The spatial lag and spatial error regression coefficients, however, cannot be explained in the same way. The spatial lag coefficient in part represents the effect of neighbouring locations but also accounts for some of the measurement error in using administrative units to define the neighbourhoods. Thus there is no direct interpretation of the spatial lag coefficient. Similarly, the spatial error coefficient represents a nuisance parameter in the model and has no direct interpretation. Rather, the spatial term is retained in the model only to make the other results accurate.

              The overall fit of the spatial models is assessed by the squared correlation between the observed crime rate in each neighbourhood and the values predicted using the spatial model. This squared correlation is equivalent to the coefficient of determination (R²) commonly used in standard regression models, where it represents the proportion of the variation explained by the regression model. However, in the presence of spatial autocorrelation the squared correlation between the observed and fitted values does not have the same interpretation. Rather, it represents the relative fit of the model. A value of 1 would represent a perfect fit of the model, and values near zero indicate poor predictive power.
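The fit measure described above can be computed as in the following minimal sketch. The observed and fitted values are hypothetical and do not come from the study's models.

```python
from math import sqrt

def squared_correlation(observed, fitted):
    """Squared Pearson correlation between observed and fitted values."""
    n = len(observed)
    mean_o, mean_f = sum(observed) / n, sum(fitted) / n
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, fitted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_f = sum((f - mean_f) ** 2 for f in fitted)
    return (cov / sqrt(var_o * var_f)) ** 2

observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.0, 3.0, 2.0, 4.0]   # hypothetical predictions from a spatial model
rescaled = [2.0, 4.0, 6.0, 8.0]  # wrong scale, but perfectly ordered

fit = squared_correlation(observed, fitted)
fit_rescaled = squared_correlation(observed, rescaled)
print(round(fit, 2), round(fit_rescaled, 2))
```

The second case shows why the measure is best read as a relative fit: predictions that are perfectly correlated with the observations score 1 even when their scale is off.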

              To ensure the spatial autocorrelation has been adequately accounted for in the model, the residuals from the spatial model are tested for the presence of spatial autocorrelation. This is done using Lagrange Multiplier tests, which test for the presence of spatial error dependence in the spatial lag model and for a missing spatial lag variable in the spatial error model. If the statistical test is not significant, it indicates the spatial dependence in the data has been accounted for in the model.

              Normalization techniques

              Edmonton

              No transformation

              Natural logarithm

              Square root

              Population aged 15 and under

              Violent crime rate

              Property crime rate

              Population at risk density

              Ratio of male to female

              Dwellings in need of major repairs

              Lone-parent families

              Population aged 65 and over

              Unemployment rate

              Owner-occupied households

              Aboriginals

              Recent immigrants (since 1991)

              Without high school diploma (20 years and over)

              Recent movers (1 year)

              Visible minority population

               

              With university diploma (20 years and over)

              Households spending 30% or more of income on housing

               

              Median employment income

              Part of government transfers in total income

               

              Average value of dwelling

               

              Median household income

              Persons in low-income households

               

              Persons living alone

               

               

              Dwellings built before 1961

               

               

              Dwellings built after 1990

               

               

              Single persons, never married

               

               

              Young men (18 to 24)

               

               

              Workers in retail trade

               

               

              Multiple-family zoning

               

               

              Single-family zoning

               

              Halifax

              No transformation

              Natural logarithm

              Square root

               

              All variables included in the study

               

              Thunder Bay

              No transformation

              Natural logarithm

              Square root

              Open space zoning

              Property crime rate

              Violent crime rate

              Single-family zoning

              Single persons, never married

              Population aged 65 and over

              Multiple-family zoning

              Unemployment rate

              Persons living alone

              Industrial zoning

               

              Recent movers (1 year)

              Institutional zoning

               

              Aboriginal

              Commercial zoning

               

              Without high school diploma (20 years and over)

              Ratio of male to female

               

              Population aged under 15

               

              With university diploma (20 years and over)

              Lone-parent families

               

              Owner-occupied households

               

              Median employment income

              Dwellings in need of major repairs

               

              Part of government transfers in total income

              Dwellings built before 1961

               

              Dwellings built after 1990

               

              Persons in low-income households

              Professional occupation

               

              Households spending 30% or more of income on housing
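The three treatments named in the tables above (no transformation, natural logarithm, square root) can be illustrated with a minimal sketch. The source does not state the criterion used to choose a transformation for each variable; the example below assumes the aim is to reduce skewness, and the data values are hypothetical. The natural logarithm requires strictly positive values.

```python
from math import log, sqrt

def skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed variable, such as a rate with one extreme neighbourhood.
raw = [1.0, 2.0, 4.0, 8.0, 100.0]
rooted = [sqrt(v) for v in raw]
logged = [log(v) for v in raw]

for label, series in (("raw", raw), ("square root", rooted), ("natural log", logged)):
    print(label, round(skewness(series), 2))
```

For this example the square root reduces the skewness slightly and the natural logarithm reduces it more, which is one reason a stronger transformation may be chosen for more heavily skewed variables.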