Use of administrative data
Administrative records are data collected for the purpose of carrying out various non-statistical programs. For example, administrative records are maintained to regulate the flow of goods and people across borders, to respond to the legal requirements of registering particular events such as births and deaths, and to administer benefits such as pensions or obligations such as taxation (for individuals or for businesses). As such, the records are collected with a specific decision-making purpose in mind, and so the identity of the unit corresponding to a given record is crucial. In contrast, in the case of statistical records, on which no action concerning an individual or a business is intended or even allowed, the identity of individuals/businesses is of no interest once the database has been finalized.
Using administrative records presents a number of advantages to a statistical agency and to analysts. Demands for statistics on all aspects of our lives, our society and our economy continue to grow. These demands often occur in a climate of tight budgetary constraints. Statistical agencies also share with many respondents a growing concern over the mounting burden of response to surveys. Respondents may also react negatively if they feel they have already provided similar information (e.g. revenue) to administrative programs and surveys. Administrative records, because they already exist, do not incur additional cost for data collection nor do they impose a further burden on respondents. Advancements in technology have permitted statistical agencies to overcome many of the limitations caused by processing large datasets. For all these reasons, administrative records are being used increasingly for statistical purposes.
Statistical uses of administrative records include: (i) use for survey frames, directly as the frame or to supplement/update an existing frame, (ii) replacement of data collection (e.g. use of taxation data for small businesses in lieu of seeking survey data for them), (iii) use in editing and imputation, (iv) direct tabulation, (v) indirect use in estimation (e.g. as auxiliary information in calibration estimation, benchmarking or calendarisation), and (vi) survey evaluation, including data confrontation (e.g. comparison of survey estimates with estimates from a related administrative program).
On the other hand, one must be careful in using administrative data, as there are a number of limitations to be aware of, including (i) the level or the lack of quality control over the data, (ii) the possibility of having missing items or missing records (an incomplete file), (iii) differences in concepts, which might lead to bias as well as coverage problems, (iv) the timeliness of the data (because collection is outside the statistical agency's control, external events may prevent part or all of the data from being received on time), and (v) the cost that comes with administrative data: for instance, computer systems are needed to clean and complete the data in order to make them useful. For a discussion of the advantages and disadvantages of using administrative data, see Lavallée (2000).
It is Statistics Canada's guiding principle to use administrative records whenever they present a cost-effective alternative to direct data collection. As with any data acquisition program, consideration of the use of administrative records for statistical purposes is a matter of balancing the costs and benefits. In many cases, using administrative records avoids further data collection costs and respondent burden, provided the coverage and the conceptual framework of the administrative data are compatible with the target population. In other situations, there may be costs incurred through paying for the data capture or providing some service in exchange. Depending on the use, it is often valuable to combine an administrative source with another source of information.
The use of administrative records may raise concerns about the privacy of the information in the public domain. These concerns are even more important when the administrative records are linked to other sources of data. The Policy on Informing Survey Respondents (Statistics Canada, 1998) requires that Statistics Canada provide all respondents with information such as the purpose of the survey, the confidentiality protection measures, the record linkage plans and the identity of the parties to any agreements to share the information provided by those respondents. Record linkage must be in compliance with the Agency's Policy on Record Linkage (Statistics Canada, 2008). In particular, all requests for record linkage must be submitted to the Confidentiality and Legislation Committee and approved by the Policy Committee. Requests are usually approved for specific uses only. However, in certain cases, some data requests are approved for recurring or continuous uses.
The use of administrative data usually requires the statistical agency to implement only a subset of the survey steps discussed in the other sections, because many of those steps (e.g. direct collection and data capture) are performed by the administrative organization. As a result, guidelines additional to those presented elsewhere are needed to suggest ways to compensate for any differences in the quality goals of the source organization. For instance, an extensive edit and imputation program might have to be developed in order to achieve the level of quality required for the intended uses of the data.
One must keep in mind the fundamental reason for the existence of these administrative records: they are the result of an administrative program that was put in place for administrative reasons. Often, the statistical uses of these records were unknown when the program was implemented and the statistical agency often has limited impact in the development of the program. For that reason, any decisions related to the use of administrative records must be preceded by an assessment of such records in terms of their coverage, content, concepts and definitions, the quality assurance and control procedures put in place by the administrative program to ensure their quality, the frequency of the data, the timeliness in receiving the data by the statistical agency and the stability of the program over time. Obviously, the cost of obtaining the administrative records is also a key factor in the decision to use such records.
The administrative program
Maintain a continuing liaison with the provider of administrative records. Liaising with the provider is necessary at the beginning of the use of administrative records. However, it is even more important to keep in close contact with the supplier at all times so that the statistical agency is not surprised by any changes, and can even influence them. Giving the supplier feedback, in the form of statistical information and of weaknesses found in the data, can be of value to the supplier and can lead to a strengthening of the administrative source.
Understand the context under which the administrative organization created the administrative program (e.g. legislation, objectives, and needs). This context has a profound impact on (i) the universe covered, (ii) the contents, (iii) the concepts and definitions used, (iv) the frequency and timeliness, (v) the quality of the recorded information, and (vi) the stability over time. Pay special attention to the consistency of the concepts and data quality when there are multiple sources of administrative data, for example when each province manages its own program.
Keep in mind that if the information provided to the administrative source can cause gains or losses to individuals or businesses, the information supplied may be biased, which can lead to unexpected coverage problems and biased estimates. Special studies may be needed in order to assess and understand these sources of error.
Many of the guidelines in other sections are applicable to administrative records. Sampling and data capture guidelines will be relevant if administrative records exist only on paper and have to be coded and captured. These guidelines will also be of value for administrative data available in electronic form, including EDR (electronic data reporting). Note that these data, because they exist in electronic form, may be inherently less stable and subject to additional errors arising from data treatment and transmission processes at source. Editing and dissemination guidelines apply to all cases where a file of individual administrative records is obtained or created for subsequent processing and analysis.
Collaborate with the designers of new or redesigned administrative systems. This can help in building statistical requirements into administrative systems from the start. Such opportunities are rare, but when they happen, the eventual statistical value of the statistical agency's participation can far exceed the time and work expended on the exercise.
Study each data item in the administrative records that is planned to be used for statistical purposes. Investigate quality. Understand the concepts, definitions and procedures underlying collection and processing by the administrative organisation. Some of the items might be of very poor quality and thus might not be useful. For example, the quality of classification coding (e.g. occupation, industrial activity, geography) might not be sufficient for some statistical uses or might limit its use.
Keep in mind that the longevity of the source of administrative data and its continued scope is usually entirely in the hands of the administrative organization. The administrative considerations that originally dictated the concepts, definitions, coverage, frequency, timeliness and other attributes of the administrative program may, over time, undergo changes that distort time series derived from the administrative source. Be aware of such changes, and manage their impact on the statistical program.
Implement continuous or periodic assessment of incoming data quality. Assurance that data quality is being maintained is important because the statistical agency does not control the data collection process. This assessment may consist of implementing additional safeguards and controls (e.g. the use of statistical quality control methods and procedures, and edit rules) when receiving the data, comparisons with other sources or sample follow-up studies. A good practice is to provide feedback to the administrative source to assist them in improving their data.
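As a rough illustration of the edit-rule safeguards mentioned above, the following sketch computes an item missing rate and edit failure rates for an incoming batch. The field names (`revenue`, `province`), the two edit rules and the sample records are hypothetical and would be replaced by the rules appropriate to the actual administrative file.

```python
from typing import Callable

# Hypothetical edit rules: each returns True when a record passes the edit.
EDIT_RULES: dict[str, Callable[[dict], bool]] = {
    # A missing revenue is counted separately, so it passes this edit.
    "revenue_non_negative": lambda r: r["revenue"] is None or r["revenue"] >= 0,
    # Illustrative list of valid codes only.
    "province_code_known": lambda r: r["province"] in {"ON", "QC", "BC"},
}


def assess_batch(records: list[dict]) -> dict[str, float]:
    """Return the revenue missing rate and the failure rate of each edit rule."""
    n = len(records)
    if n == 0:
        raise ValueError("empty batch")
    report = {"revenue_missing_rate": sum(r["revenue"] is None for r in records) / n}
    for name, rule in EDIT_RULES.items():
        report[name + "_failure_rate"] = sum(not rule(r) for r in records) / n
    return report


# Made-up incoming batch for illustration.
batch = [
    {"id": "A1", "revenue": 1200.0, "province": "ON"},
    {"id": "A2", "revenue": None, "province": "QC"},
    {"id": "A3", "revenue": -50.0, "province": "XX"},
    {"id": "A4", "revenue": 300.0, "province": "BC"},
]
report = assess_batch(batch)
```

Tracking such rates from one delivery to the next gives a simple signal that something may have changed at the source and is worth raising with the provider.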
Consider the privacy implications of the publication of information from administrative records. Although the Statistics Act provides Statistics Canada with the authority to access administrative records for statistical purposes, this use may not have been foreseen by the original suppliers of information (Statistics Canada, 2005). Therefore, programs should be prepared to explain and justify the public value and innocuous nature of this secondary use.
Administrative information is sometimes used to replace a set of questions that would otherwise be asked of the respondent. In this instance, permission from the respondent may have to be obtained. Follow the Policy on Informing Survey Respondents (Statistics Canada, 1998) in this regard. When consent is not obtained, put collection procedures in place for the equivalent survey questions to be asked of the respondents.
Administrative data often has information about specific people or businesses. Any data release from Statistics Canada is subject to the Statistics Act confidentiality provision, even when the data itself is already available in the public domain. Therefore, the guidelines for disclosure control should be considered when preparing any data analysis for release, including the release of administrative data.
Like data collected by means of a survey, administrative data are also subject to partial and total nonresponse. In some instances, the lack of timeliness in obtaining all administrative data introduces greater nonresponse. Some nonresponse guidelines will thus apply. Unless nonrespondents can be followed up and responses obtained, develop an imputation or a weight-adjustment procedure to deal with this nonresponse. Administrative sources are sometimes outdated. Therefore, as part of the imputation process, give special attention to the identification of active and/or inactive units. Some imputation or transformation (e.g. calendarization) may also be required in cases where some of the units report the data at a different frequency (e.g. weekly or quarterly) than the one desired (e.g. monthly).
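To make the idea of calendarization concrete, here is a minimal sketch that spreads a value reported for an arbitrary period over calendar months in proportion to the number of days. It assumes the reported quantity accrues uniformly over the period, which is a strong simplification; production calendarization methods are generally more elaborate. The function name and the figures are illustrative only.

```python
import calendar
from datetime import date, timedelta


def calendarize_to_months(start: date, end: date, value: float) -> dict[tuple[int, int], float]:
    """Spread `value`, reported for [start, end], over months pro rata by days.

    Assumption: the reported quantity accrues uniformly over the period.
    """
    total_days = (end - start).days + 1
    if total_days <= 0:
        raise ValueError("end must not precede start")
    allocation = {}
    current = start
    while current <= end:
        last_day = calendar.monthrange(current.year, current.month)[1]
        # Last day of this month that still falls inside the reporting period.
        month_end = min(date(current.year, current.month, last_day), end)
        days = (month_end - current).days + 1
        allocation[(current.year, current.month)] = value * days / total_days
        current = month_end + timedelta(days=1)
    return allocation


# A quarterly report converted to monthly values (made-up figures).
monthly = calendarize_to_months(date(2023, 1, 1), date(2023, 3, 31), 9000.0)

# A reporting period that does not line up with calendar months.
offset = calendarize_to_months(date(2023, 1, 16), date(2023, 2, 14), 3000.0)
```

The allocation always sums to the reported value, so totals over the full period are preserved while the monthly split depends entirely on the uniformity assumption.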
When record linkage of administrative records is necessary (e.g. for tracing respondents, for supplementing survey data, or for data analysis), conform to the Agency's Policy on Record Linkage (Statistics Canada, 2008). Privacy concerns that may arise when a single administrative record source is used are multiplied when linkage is made to other sources. In such cases, the subjects may not be aware that information supplied on two separate occasions is being combined. The Policy on Record Linkage is designed to ensure that the public value of each record linkage truly outweighs any intrusion on privacy that it represents.
It is not always easy to combine an administrative source with another source of information. This is especially true when a common matching key for both sources is not available and record linkage techniques are used. In this case, select the type of linkage methodology (e.g. exact matching or statistical matching) in accordance with the objectives of the statistical program. When the purpose is frame creation and maintenance, or data editing, exact matching should be used. In the case of imputation or weighting, exact matching should be used, but statistical matching can also be sufficient. When the sources are linked for performing some data analyses that are impossible otherwise, consider statistical matching, i.e. the matching of records with similar statistical properties (see Cox and Boruch, 1988; Kovacevic, 1999).
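The difference between the two linkage approaches can be sketched as follows. Exact matching pairs records that refer to the same unit through a shared identifier, while statistical matching pairs records that merely resemble each other on common variables. The nearest-neighbour distance rule used here is only one possible form of statistical matching, and all field names (`bn`, `employees`, `revenue`, `rd_spend`) and values are hypothetical. The sketch also assumes the matching variables are on comparable scales.

```python
def exact_match(survey: list[dict], admin: list[dict], key: str) -> list[tuple[dict, dict]]:
    """Pair records that share the same value of `key` (a common identifier)."""
    index = {rec[key]: rec for rec in admin}
    return [(s, index[s[key]]) for s in survey if s[key] in index]


def statistical_match(recipients: list[dict], donors: list[dict],
                      variables: list[str]) -> list[tuple[dict, dict]]:
    """Pair each recipient with the donor closest on the shared variables.

    Illustrative nearest-neighbour rule using Euclidean distance; the paired
    records are similar units, not necessarily the same unit.
    """
    def distance(a: dict, b: dict) -> float:
        return sum((a[v] - b[v]) ** 2 for v in variables) ** 0.5

    return [(r, min(donors, key=lambda d: distance(r, d))) for r in recipients]


# Made-up records for illustration.
survey = [
    {"bn": "100", "employees": 5, "revenue": 200.0},
    {"bn": "200", "employees": 50, "revenue": 4000.0},
]
admin = [
    {"bn": "100", "tax_revenue": 210.0},
    {"bn": "300", "tax_revenue": 999.0},
]
donors = [
    {"employees": 4, "revenue": 180.0, "rd_spend": 10.0},
    {"employees": 60, "revenue": 4200.0, "rd_spend": 500.0},
]

exact_pairs = exact_match(survey, admin, key="bn")
similar_pairs = statistical_match(survey, donors, ["employees", "revenue"])
```

Only one survey record finds an exact partner, whereas every survey record receives a statistical match. This is why exact matching suits frame maintenance and editing, where the identity of the unit matters, and statistical matching suits analyses that need only plausible joint information.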
When record linkage is to be performed, make appropriate use of existing software. Statistics Canada's Generalized Record Linkage Software is but one example of a number of well-documented packages.
When data from more than one administrative source are combined, pay additional attention to reconcile potential differences in their concepts, definitions, reference dates, coverage, and the data quality standards applied at each data source. Examples are education data sources, health and crime reports, and registries of births, marriages, licenses, and registered vehicles, which are provided by various organizations and government agencies.
Some administrative data are longitudinal in nature (e.g. income tax, goods and services tax). When records from different reference periods are linked, the resulting files are very rich data mines for researchers. Remain especially vigilant when creating such longitudinal and person-oriented databases, as their use raises very serious privacy concerns. Use the unit identifier with care, as a unit may change identifiers over time. Track such changes to ensure proper temporal data analysis. In some instances, the same unit may have two or more identifiers for the same reference period, thus introducing duplication in the administrative file. If this occurs, develop an unduplication mechanism.
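One possible shape for an unduplication mechanism is sketched below. It assumes a crosswalk of alternate or superseded identifiers is available (here the hypothetical `id_crosswalk`) and that, when several records remain for the same unit and reference period, the most recently received one should be kept. Both assumptions are illustrative; the appropriate rule depends on the administrative program.

```python
def unduplicate(records: list[dict], id_crosswalk: dict[str, str]) -> list[dict]:
    """Keep one record per (canonical unit, reference period).

    `id_crosswalk` maps alternate or superseded identifiers to a canonical one.
    Assumed tie-break rule: the most recently received record wins
    (`received` is an ISO-format date string, so string comparison is valid).
    """
    kept: dict[tuple[str, str], dict] = {}
    for rec in records:
        unit = id_crosswalk.get(rec["id"], rec["id"])
        key = (unit, rec["period"])
        if key not in kept or rec["received"] > kept[key]["received"]:
            kept[key] = {**rec, "id": unit}
    return list(kept.values())


# Made-up file in which unit B1 also appears under identifier B2 for 2023.
records = [
    {"id": "B1", "period": "2023", "received": "2024-03-01", "income": 100.0},
    {"id": "B2", "period": "2023", "received": "2024-04-15", "income": 105.0},
    {"id": "B1", "period": "2022", "received": "2023-03-02", "income": 90.0},
    {"id": "C9", "period": "2023", "received": "2024-02-10", "income": 40.0},
]
clean = unduplicate(records, id_crosswalk={"B2": "B1"})
```

Rewriting every record to its canonical identifier also keeps the unit traceable across reference periods, which supports the temporal analysis described above.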
Document the nature and quality of the administrative data once assessed. Documentation helps statisticians decide the uses for which the administrative data are best suited. Choose appropriate methodologies for the statistical program based on administrative data and inform users of the methodology and data quality.
Main quality elements: relevance, accuracy, timeliness, coherence
Do the data elements that are being captured in the administrative system reflect the concepts and definitions of the data user? Although it is often less expensive to mine administrative data than to collect the information via a survey, the analytical goals must be met with the administrative data in order for it to be a useful endeavour. Indicate the source, vintage, and how well definitions and classifications match to the survey data, and to the needs of data users.
Administrative data often does not go through the same edits that survey data does. Some edits are usually performed by the administrative organization, but their nature and purposes are usually different from those of the statistical agency. As a result, data quality can be an issue when using administrative sources for statistical purposes, particularly with no or limited ability to recontact the originator of the information. Additionally, sampled administrative data may not adhere to any standard sampling scheme, introducing possible biases and making the calculation of sampling error difficult. Finally, if the administrative data are used as a frame in addition to or in place of another one obtained from data collection, it may not be possible to analyze the issues of coverage and nonresponse. On the positive side, many administrative data sources are censuses, meaning that there will be no sampling error in the estimates obtained from them. Indicate the contribution to key estimates from administrative data. If used as a frame, report the imputation rate for item or complete nonresponse and explain how the imputation was performed. If the administrative data are simply summed to produce an estimate, include an estimate of the loss of precision due to imputation. If administrative data make up part of the estimate, the rest being accounted for by survey data, report the portion of the frame covered by the administrative data as well as the portion of the estimate. Produce a response rate combining both the administrative portion and the survey portion as explained in Trépanier et al. (2005).
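Some of the indicators listed above can be computed directly from the unit-level file behind an estimate, as in the following sketch. It does not reproduce the combined response rate of Trépanier et al. (2005), which should be taken from that paper. The record layout (`value`, `weight`, `source`, `imputed`) is hypothetical, and the share of units is used here as a stand-in for the portion of the frame covered by administrative data.

```python
def quality_indicators(units: list[dict]) -> dict[str, float]:
    """Summarize how administrative and imputed data contribute to a total.

    Each unit is a dict with hypothetical keys: `value`, `weight`,
    `source` ("admin" or "survey") and `imputed` (bool).
    """
    if not units:
        raise ValueError("no units")

    def weighted_total(subset: list[dict]) -> float:
        return sum(u["weight"] * u["value"] for u in subset)

    estimate = weighted_total(units)
    if estimate == 0:
        raise ValueError("estimate is zero; shares are undefined")
    admin = [u for u in units if u["source"] == "admin"]
    imputed = [u for u in units if u["imputed"]]
    return {
        "estimate": estimate,
        "admin_share_of_units": len(admin) / len(units),
        "admin_share_of_estimate": weighted_total(admin) / estimate,
        "imputation_rate": len(imputed) / len(units),
        "imputed_share_of_estimate": weighted_total(imputed) / estimate,
    }


# Made-up unit file: two administrative units, two surveyed units.
units = [
    {"value": 100.0, "weight": 1.0, "source": "admin", "imputed": False},
    {"value": 50.0, "weight": 1.0, "source": "admin", "imputed": True},
    {"value": 200.0, "weight": 2.0, "source": "survey", "imputed": False},
    {"value": 50.0, "weight": 1.0, "source": "survey", "imputed": False},
]
indicators = quality_indicators(units)
```

Reporting both the unit-based and the estimate-based shares is useful because a small number of large units can account for most of an estimate.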
Timeliness is a serious consideration for administrative data. It is common for this type of data to be unavailable until well after the reference period. In the case of using administrative data for a frame, it may be well out-of-date by the time it can be used. Additionally, if administrative data are integrated with survey data, it is important that the administrative data be as timely as the survey data: otherwise the entire process can be held up. Conversely, there are some cases where administrative systems are maintained in real time, making extraction of information from them timelier than performing a separate survey. Indicate the vintage of any administrative data used. Explain the assumptions that are made regarding the use of outdated administrative data.
Coherence is another significant consideration with administrative data. This type of data is typically captured for another purpose and, as a result, will not necessarily mesh with already-defined concepts that might exist on other statistical holdings. This can be true in the case of concepts and definitions, and even in the sense of coverage and sample design. Administrative data might cover only a portion of the target population, making it problematic to use, or a sampling strategy may have been employed making the calculation of survey weights difficult to perform. There are cases where survey designers should have input into the design of the administrative systems, which can greatly increase the coherence of the data. List any exclusion that may complicate comparisons with other data. Indicators may include a measure of the target population not covered.
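A measure of the target population not covered can be as simple as the following sketch, which compares the identifiers on a reference list of the target population with those on the administrative file. It assumes such a reference list exists and that identifiers are comparable across the two sources; the function name and the identifiers are hypothetical.

```python
def coverage_indicators(target_ids: list[str], admin_ids: list[str]) -> dict[str, object]:
    """Compare a target population list with an administrative file.

    Undercoverage: target units absent from the administrative file.
    Overcoverage: administrative units outside the target population.
    """
    target, admin = set(target_ids), set(admin_ids)
    if not target or not admin:
        raise ValueError("both lists must be non-empty")
    not_covered = target - admin
    out_of_scope = admin - target
    return {
        "undercoverage_rate": len(not_covered) / len(target),
        "overcoverage_rate": len(out_of_scope) / len(admin),
        "not_covered": sorted(not_covered),
        "out_of_scope": sorted(out_of_scope),
    }


# Made-up identifiers for illustration.
coverage = coverage_indicators(
    target_ids=["U1", "U2", "U3", "U4", "U5"],
    admin_ids=["U1", "U2", "U3", "U9"],
)
```

Listing the units behind each rate, and not only the rates, helps document the exclusions that may complicate comparisons with other data.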
Babyak, C. 2007. "Challenges in Collecting Police-Reported Crime Data." ICES-III, Proceedings of the Third International Conference on Establishment Surveys, Survey Methods for Businesses, Farms, and Institutions. Montreal, Quebec. June 18-21, 2007. p. 959–966.
Brackstone, G.J. 1987. "Issues in the use of administrative records for statistical purposes." Survey Methodology. Vol. 13. p. 29–43.
Brion, P.H. 2007. "Redesigning French Structural Business Statistics, Using More Administrative Data." ICES-III, Proceedings of the Third International Conference on Establishment Surveys, Survey Methods for Businesses, Farms, and Institutions. Montreal, Quebec. June 18-21, 2007.
Cox, L.H. and R.F. Boruch. 1988. "Record linkage, privacy and statistical policy." Journal of Official Statistics. Vol. 4, no. 1. p. 3–16.
Haziza, D., G. Kuromi, and J. Bérubé. 2007. "Sampling and Estimation in the Presence of Tax Data in Business Surveys at Statistics Canada." ICES-III, Proceedings of the Third International Conference on Establishment Surveys, Survey Methods for Businesses, Farms, and Institutions. Montreal, Quebec. June 18-21, 2007.
Kovacevic, M. 1999. "Record linkage and statistical matching – they aren't the same!" SSC Liaison. Vol. 13, no. 3. p. 24–29.
Lavallée, P. 2000. "Combining Survey and Administrative Data: Discussion Paper." ICES-II, Proceedings of the Second International Conference on Establishment Surveys, Survey Methods for Businesses, Farms, and Institutions. Buffalo, New York. June 17-21, 2000. p. 841–844.
Lavallée, P. 2005. "Quality Indicators when Combining Survey Data and Administrative Data." Proceedings of the XXII International Methodology Symposium. Statistics Canada. Ottawa, Ontario. October 25-28, 2005.
McKenzie, R. 2007. "A Statistical Architecture for Economic Statistics." ICES-III, Proceedings of the Third International Conference on Establishment Surveys, Survey Methods for Businesses, Farms, and Institutions. Montreal, Quebec. June 18-21, 2007.
Michaud, S., D. Dolson, D. Adams, and M. Renaud. 1995. "Combining administrative and survey data to reduce respondent burden in longitudinal surveys." Proceedings of the Section on Survey Research Methods. American Statistical Association. p. 11–20.
Penneck, S. 2007. "The Future of Using Administrative Data Sources for Statistical Purposes." ICES-III, Proceedings of the Third International Conference on Establishment Surveys, Survey Methods for Businesses, Farms, and Institutions. Montreal, Quebec. June 18-21, 2007.
Statistics Canada. 1998. "Policy on Informing Survey Respondents." Statistics Canada Policy Manual. Section 1.1. Last updated March 4, 2009.
Statistics Canada. 2005. "The Statistics Act." Ottawa, Ontario. /about-apercu/act-loi-eng.htm
Statistics Canada. 2008. "Policy on Record Linkage." Statistics Canada Policy Manual. Section 4.1. Last updated March 4, 2009.
Trépanier, J., C. Julien, and J. Kovar. 2005. "Reporting Response Rates when Survey and Administrative Data are Combined." Proceedings of the Federal Committee on Statistical Methodology Research Conference. Arlington, Virginia. November 14-16, 2005.
Wallgren, A. and B. Wallgren. 2007. Register-based Statistics: Administrative Data for Statistical Purposes. New York. John Wiley and Sons, 258 p.