Scope and purpose
Data analysis is the process of transforming raw data into usable
information that is often presented in the form of a published analytical
article. The basic steps in the analytic process consist of identifying
an issue, asking meaningful questions, developing answers to the questions
through examination and interpretation of data, and communicating the message
to the reader.
Analytical results can underscore the usefulness of data sources by shedding
light on issues. Some Statistics Canada programs even depend on analytical
output as a major data product because, for confidentiality reasons, it
is not possible to release the microdata to the public. In recent years,
emphasis has been placed on increasing the amount of relevant analysis
done within the Agency using Statistics Canada data.
Data analysis also has an important role as part of the survey development
and revision process. It can have a crucial impact on data quality by
helping to identify problems related to data quality and by influencing future
improvements to the survey process. Analysis is essential for understanding
results from previous surveys and pilot studies, for planning new statistical
activities, for providing information on data gaps, for designing surveys,
and for formulating quality objectives.
Principles
A statistical agency is concerned with the relevance and usefulness to
users of the information contained in its data. Analysis is the principal
tool for obtaining information from the data. Analysis results may be
categorized into two general types: (a) descriptive results, which relate
to the survey population at the time that the data were collected, for
example, the median income in the year that the population was surveyed;
and (b) analytical results, which relate to a population that often goes
beyond the population actually surveyed, for example, the chance of
someone having a particular chronic disease.
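A descriptive result such as the median income of the surveyed population is normally estimated with the survey weights, so that each respondent counts for the number of population units it represents. The following is a minimal sketch, not a Statistics Canada procedure: it assumes one positive final weight per respondent and uses one common definition of the weighted median (the smallest value at which the cumulative weight reaches half of the total). The function name and the income figures are hypothetical.

```python
from typing import Sequence


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Smallest value whose cumulative weight reaches half of the total weight.

    Hypothetical helper: `weights` are assumed to be positive final survey
    weights, one per respondent.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    # Sort respondents by the variable of interest, keeping weights aligned.
    pairs = sorted(zip(values, weights))
    half_total = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half_total:
            return value
    return pairs[-1][0]  # guard against floating-point shortfall


# Hypothetical incomes of four respondents and their survey weights.
incomes = [28_000, 41_000, 55_000, 90_000]
weights = [120.0, 300.0, 150.0, 30.0]
print(weighted_median(incomes, weights))  # 41000
```

Ignoring the weights would give a different answer here (the unweighted median of the four incomes is 48,000), which is why descriptive estimates for the survey population normally use them.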
To be effective, the analyst needs to know the audience and the issues
of concern (both current and those likely to emerge in the future) when
identifying topics and suitable ways to present results. Study of background
information allows the analyst to choose appropriate data sources and
statistical methods. Any conclusions presented in an analytical study,
including those that can affect public policy, must be supported by
the data being analyzed.
Guidelines
- Ensure that the data are appropriate for the analysis to be carried
out. This requires investigation of a wide range of details such as
whether the survey population sufficiently approximates
the target population of the analysis, whether the variables and their
concepts and definitions are relevant to the study, whether the longitudinal
or cross-sectional nature of the survey is appropriate for the analysis,
whether the sample size in the study domain is sufficient to obtain
meaningful results and whether the ascertained quality of the data from
the survey supports these results.
- If more than one data source is being used for the analysis, investigate
whether the sources are consistent and how they may be appropriately
combined.
- Consider whether imputed values should be included in the analysis
and, if so, how they should be handled (see the section on Imputation).
- Consider how unit and/or item nonresponse should be handled in the
analysis.
- Choose an analytical method that is appropriate for the question being
investigated.
- When making comparisons between two groups of individuals, businesses,
or other units, control for extraneous factors. If statistical tests reveal
significant differences between the groups, consider alternative plausible
explanations for those differences.
- Since most analyses are based on observational studies rather than
on controlled experiments, avoid drawing conclusions about causality.
- Use diagnostic techniques to assess the analytical model.
- Beware of focusing on short-term trends without inspecting them in
light of medium- and long-term trends. Frequently, short-term trends
are merely minor fluctuations around a more important medium- and/or
long-term trend.
- Where possible, avoid arbitrary time reference points, such as the
change from last year to this year. Instead, use meaningful points of
reference, such as the last major turning point for economic data, generation-to-generation
differences for demographic statistics, and legislative changes for
social statistics.
- Consult with experts both on the subject matter and on the statistical
methods.
- Analytical methods that ignore the survey design can be useful, provided
the model being assumed in the analysis is correct. However, alternative
methods that incorporate the sample design information, frequently called
design-based methods, will generally be effective even when some aspects
of the model are incorrectly specified. Assess whether the survey design
information can be incorporated into the analysis and, if so, how this
should be done. Having determined the appropriate analytical method,
investigate the software choices that are available to apply the method.
[See Binder and Roberts (2001) for a definition of ignorable survey
designs, and Binder and Roberts (2003) and Skinner, Holt and Smith (1989)
for discussion of ignoring the survey design. See Statistics Canada
(2003a), Chambers and Skinner (2003), Korn and Graubard (1999), Lehtonen
and Pahkinen (1995), Lohr (1999), Thomas (1993), and Skinner, Holt and
Smith (1989) for a number of examples showing the benefits of design-based
analytical methods.]
- Before beginning to write, prepare an outline of the article. When
preparing the outline, consider such questions as: “What issue
am I addressing? What data am I using? Can I eliminate any irrelevant
data? What analytical methods are appropriate? What results do I want
to highlight? What are my interesting findings?”
- Focus the article on the important variables and topics. Trying to
be too comprehensive will often interfere with a strong story line.
- Arrange ideas in a logical order and in order of relevance or importance.
Use headings, sub-headings and sidebars to strengthen the organization
of the article.
- Keep the language as simple as the subject permits. Depending on the
targeted audience for the article, some loss of precision may sometimes
be an acceptable tradeoff for more readable text.
- Use graphs in addition to text and tables to communicate the message.
Use headings that capture the meaning (e.g., “Women’s earnings
still trail men’s”) in preference to traditional chart titles
(e.g., “Income by age and sex”). Always help readers understand
the information in the tables and charts by discussing it in the text.
- When tables are used, take care that the overall format contributes
to the clarity of the data in the tables and prevents misinterpretation.
This includes spacing; the wording, placement and appearance of titles;
row and column headings and other labeling.
- Explain rounding practices or procedures. In the presentation of rounded
data, do not use more significant digits than are consistent with the
accuracy of the data.
- When presenting details about rates, be careful to distinguish between
percentage change and change in percentage points. Define the base used
for rates.
- Ensure that all references are accurate and are referred to in the
text.
- Check for errors in the article. Check details such as the consistency
of figures used in the text, tables and charts, the accuracy of external
data, and simple arithmetic.
- Ensure that the intentions stated in the introduction are fulfilled
by the rest of the article. Make sure that the conclusions are consistent
with the evidence.
- Have the article reviewed by at least two other persons. Where appropriate,
verify the quality of the translation.
- As a good practice, consider giving a presentation of the analysis
results. This is another kind of peer review that can help improve the
article. Always do a dry run of presentations involving external audiences.
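For the guideline on whether the sample size in the study domain is sufficient, a rough first check is the coefficient of variation (CV) of the domain estimate. This sketch assumes simple random sampling within the domain and ignores the finite population correction and any design effect, so under a typical complex design it will understate the variance. The function names and figures are hypothetical.

```python
import math


def cv_of_proportion(p: float, n: int) -> float:
    """CV of an estimated proportion p from n sampled units, assuming simple random sampling."""
    if not 0.0 < p < 1.0 or n <= 0:
        raise ValueError("p must lie strictly between 0 and 1, and n must be positive")
    standard_error = math.sqrt(p * (1.0 - p) / n)
    return standard_error / p


def min_sample_for_cv(p: float, target_cv: float) -> int:
    """Smallest domain sample size whose CV is at most target_cv (same SRS assumption).

    Solves CV**2 = (1 - p) / (p * n) for n; the small offset keeps
    floating-point error from pushing an exact integer up by one.
    """
    n = (1.0 - p) / (p * target_cv ** 2)
    return math.ceil(n - 1e-9)


# Hypothetical domain where about 20% of units have the characteristic.
print(round(cv_of_proportion(0.20, 150), 3))  # 0.163
print(min_sample_for_cv(0.20, 0.10))          # 400
```

With 150 sampled units the CV is about 16%; reaching 10% would need roughly 400 units under these assumptions, and a design effect greater than 1 would raise that figure further.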
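For the guideline on handling unit nonresponse, one common approach among several (imputation and calibration are others) is a weighting-class adjustment: within each class, respondents' weights are inflated so that they also represent the nonrespondents. This sketch assumes that nonresponse is random within each class; the class labels, the record layout and the function name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SampledUnit:
    adjustment_class: str  # hypothetical class, e.g. region or size group
    design_weight: float
    responded: bool


def nonresponse_adjusted_weights(units: list[SampledUnit]) -> list[float]:
    """Weighting-class adjustment for unit nonresponse.

    Within each class, respondents' weights are multiplied by
    (total design weight of the class) / (design weight of its respondents);
    nonrespondents receive weight 0.
    """
    class_total: dict[str, float] = defaultdict(float)
    class_respondents: dict[str, float] = defaultdict(float)
    for u in units:
        class_total[u.adjustment_class] += u.design_weight
        if u.responded:
            class_respondents[u.adjustment_class] += u.design_weight

    # A class with no respondents cannot be adjusted on its own.
    for cls in class_total:
        if class_respondents[cls] == 0.0:
            raise ValueError(f"class {cls!r} has no respondents; collapse it with another class")

    return [
        u.design_weight * class_total[u.adjustment_class] / class_respondents[u.adjustment_class]
        if u.responded else 0.0
        for u in units
    ]


sample = [
    SampledUnit("urban", 10.0, True), SampledUnit("urban", 10.0, False),
    SampledUnit("rural", 25.0, True), SampledUnit("rural", 25.0, True),
    SampledUnit("rural", 25.0, False),
]
print(nonresponse_adjusted_weights(sample))  # [20.0, 0.0, 37.5, 37.5, 0.0]
```

The adjusted weights still add up to the total design weight of the sample (95 here), but only the respondents carry it.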
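For the guideline on controlling for extraneous factors when comparing groups, direct standardization is one simple way to control a single factor such as age: each group's age-specific rates are applied to a common standard population. The figures below are invented for illustration; regression modelling is the more general tool when several factors must be controlled at once.

```python
def crude_rate(cases: list[float], population: list[float]) -> float:
    """Overall rate, ignoring the group's age structure."""
    return sum(cases) / sum(population)


def directly_standardized_rate(
    cases: list[float], population: list[float], standard_shares: list[float]
) -> float:
    """Weight each age-specific rate by the standard population's share of that age group."""
    if abs(sum(standard_shares) - 1.0) > 1e-9:
        raise ValueError("standard_shares must sum to 1")
    return sum((c / p) * s for c, p, s in zip(cases, population, standard_shares))


# Hypothetical groups with identical age-specific rates (1% young, 5% old)
# but very different age structures.
group_a = {"cases": [9, 5], "population": [900, 100]}
group_b = {"cases": [1, 45], "population": [100, 900]}
standard = [0.5, 0.5]  # assumed standard population: half young, half old

for name, g in [("A", group_a), ("B", group_b)]:
    print(name,
          round(crude_rate(g["cases"], g["population"]), 3),
          round(directly_standardized_rate(g["cases"], g["population"], standard), 3))
# A 0.014 0.03
# B 0.046 0.03
```

The crude rates differ by a factor of more than three, yet the standardized rates are equal: the apparent difference is entirely due to age structure.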
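For the guideline on short-term versus medium- and long-term trends, setting the latest movement against a smoothed series shows whether it is more than a minor fluctuation. This sketch uses a simple trailing moving average purely as an illustration; the window length and the quarterly values are assumptions, and official time series are generally analyzed with dedicated seasonal-adjustment and trend-cycle methods instead.

```python
def moving_average(series: list[float], window: int) -> list[float]:
    """Trailing moving average; the first (window - 1) points have no value."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(series))]


# Hypothetical quarterly index with an upward trend and some noise.
quarterly = [100.0, 103.0, 101.0, 105.0, 104.0, 108.0, 110.0, 108.0]
latest_change = quarterly[-1] - quarterly[-2]
trend = moving_average(quarterly, 4)
print(latest_change)          # -2.0
print(trend[-1] - trend[-2])  # 0.75
```

The index fell in the latest quarter, but the four-quarter average is still rising, so reporting the drop on its own would misrepresent the underlying trend.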
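For the guideline on incorporating the survey design, the difference between ignoring and using the design already shows up in a simple mean. In the hypothetical sample below, one stratum was oversampled, so its units carry small weights; the unweighted mean treats every respondent alike, while the weighted (ratio) estimator uses the design weights. Only point estimates are shown: design-consistent variance estimation, for example by the linearization approach of Binder (1983), needs further design information such as strata and clusters.

```python
def unweighted_mean(values: list[float]) -> float:
    """Mean that ignores the survey design."""
    return sum(values) / len(values)


def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Ratio (Hajek) estimator of the population mean using design weights."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: small firms undersampled (weight 40), large firms
# oversampled (weight 5). Each weight is the inverse of the selection probability.
values = [10.0, 12.0, 50.0, 54.0, 52.0, 48.0]
weights = [40.0, 40.0, 5.0, 5.0, 5.0, 5.0]

print(round(unweighted_mean(values), 2))         # 37.67
print(round(weighted_mean(values, weights), 2))  # 19.0
```

The unweighted mean is pulled towards the oversampled stratum; the weighted estimate reflects the population composition implied by the design.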
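For the guideline on rounding, rounding to a stated number of significant digits, rather than a fixed number of decimal places, keeps presented figures in line with their accuracy. A minimal sketch with a hypothetical helper; how many digits a particular estimate supports remains a judgment based on its quality, which this function does not decide.

```python
import math


def round_significant(x: float, digits: int) -> float:
    """Round x to the given number of significant digits.

    Uses Python's built-in round(), which rounds exact ties to even;
    that convention is itself a rounding practice worth documenting.
    """
    if digits < 1:
        raise ValueError("digits must be at least 1")
    if x == 0:
        return 0.0
    # Position of the leading digit: 123456 -> 5, 0.012345 -> -2.
    leading = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - leading)


print(round_significant(123_456.0, 3))  # 123000.0
print(round_significant(0.012345, 2))   # 0.012
```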
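For the guideline on rates, the distinction between a percentage change and a change in percentage points is easiest to see with numbers. The counts and base below are hypothetical, and the base (the denominator of the rate) is stated explicitly.

```python
def rate_per_100(count: float, base: float) -> float:
    """Rate expressed as a percentage of its base population."""
    return 100.0 * count / base


def percentage_point_change(old_rate: float, new_rate: float) -> float:
    """Absolute difference between two rates, in percentage points."""
    return new_rate - old_rate


def percentage_change(old_rate: float, new_rate: float) -> float:
    """Relative change in the rate, in per cent of the old rate."""
    return 100.0 * (new_rate - old_rate) / old_rate


# Hypothetical: unemployed persons as a share of the labour force (the base).
old = rate_per_100(count=50_000, base=500_000)  # 10.0
new = rate_per_100(count=60_000, base=500_000)  # 12.0
print(percentage_point_change(old, new))  # 2.0 (percentage points)
print(percentage_change(old, new))        # 20.0 (per cent)
```

A move from 10% to 12% is therefore "up 2 percentage points" or "up 20%", and the text should say which one is meant.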
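For the guideline on checking figures and simple arithmetic, one routine check is that published components add up to their totals. Because each figure is rounded independently, a sum of k rounded parts can differ from a rounded total by up to (k + 1) / 2 rounding units without any error having been made; only larger gaps are flagged. The table layout, figures and function name are hypothetical.

```python
def inconsistent_totals(
    table: dict[str, tuple[list[float], float]], rounding_unit: float
) -> list[str]:
    """Labels of rows whose components do not add up to the reported total.

    Each published figure may be off by up to half a rounding unit, so a row
    with k parts is allowed a discrepancy of (k + 1) * rounding_unit / 2.
    """
    flagged = []
    for label, (parts, total) in table.items():
        allowed = (len(parts) + 1) * rounding_unit / 2
        if abs(sum(parts) - total) > allowed:
            flagged.append(label)
    return flagged


# Hypothetical published table (thousands of persons, rounded to 0.1).
table = {
    "Canada": ([612.4, 1_034.9, 388.1], 2_035.4),
    "Region X": ([120.3, 88.8, 40.1], 251.2),
}
print(inconsistent_totals(table, rounding_unit=0.1))  # ['Region X']
```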
References
Binder, D.A. (1983). On the variances of asymptotically normal estimators
from complex surveys. International Statistical Review,
51, 279-292.
Binder, D.A. and Roberts, G. (2001). Can informative designs be ignorable?
Newsletter of the Survey Research Methods Section, American Statistical
Association, Issue 12.
Binder, D.A. and Roberts, G.R. (2003). Design based methods for estimating
model parameters. In Analysis of Survey Data, R.L. Chambers
and C.J. Skinner (eds.), Wiley, Chichester, 29-48.
Chambers, R.L. and Skinner, C.J. (eds.) (2003). Analysis of Survey
Data. Wiley, Chichester.
Korn, E.L. and Graubard, B.I. (1999). Analysis of Health Surveys.
Wiley, New York.
Lehtonen, R. and Pahkinen, E.J. (1995). Practical Methods for
Design and Analysis of Complex Surveys. Wiley, Chichester.
Lohr, S.L. (1999). Sampling: Design and Analysis. Duxbury
Press.
Skinner, C.J., Holt, D. and Smith, T.M.F. (1989). Analysis of
Complex Surveys. Wiley, Chichester.
Statistics Canada (1995). Policy
on the Review of Information Products. Policy Manual,
2.5.
Statistics Canada (2001a). Guidelines on Writing Analytical Articles.
Communications Division.
Statistics Canada (2003a). Analysis Handbook. Prepared
by the Data Analysis Resource Centre, Methodology Branch.
Statistics Canada (2003e). The Official Style Guide. Editorial Services,
Communications Division. See http://icn-rci.statcan.ca/10/10d/10d_000_e.htm
(STC intranet site). Publication updated regularly.
Thomas, D.R. (1993). Inference using complex data from surveys and experiments.
Canadian Psychology, 34, 415-431.