3 Data gathering and processing
3.1 Planning
At first glance, conducting a survey might appear to be simply asking questions and compiling the answers to obtain statistics. However, it’s important to follow precise steps so that the survey results will provide accurate and useful information.
To begin, the following questions should be addressed:
- Why is this survey being conducted?
- Whom will the collected information be about?
- What do I need to know?
- How will the information be used?
- How accurate and timely does the information have to be?
To design a survey, many decisions have to be made about the following issues, which will be covered in detail in the present section.
- Survey objectives
- Target population
- Data requirements
- Choosing the type of collection
- Minimizing error
- Sample size
- Analysis plan
- Questionnaire design
- Data collection methods
- Data processing plan
- Quality control
- Analysis and dissemination of results
Survey objectives
A survey plan begins with objectives that describe why the survey is being done and which population has to be reached to carry it out. The survey objectives tell a lot about the data that need to be collected. The objectives also help determine the target population.
Imagine that Ridgemont High School’s student council wants to survey students to get information that would help in planning the graduation prom. From this general goal, you can refine objectives. Let’s say that the survey objectives are
- To gather information from students in order to determine the factors that will make the prom a success. (The criteria of “success” are that the largest possible number of students will attend the prom and that it will fulfill their expectations.)
- To obtain useful data that will help the prom organizing committee.
The survey plan shows how the objectives will be reached by clearly describing the target population, the data requirements and the variables to be measured, as well as looking at the questions and possible answers and how the data will be processed and analyzed.
Target population
If a survey’s objective is to collect information from students, for example, then asking the question “which students?” will help to define the target population.
In the example described previously, the prom organizing committee will probably want to question only students who will be graduating this year, that is, those in the last year of high school (Grade 12). If some of the Grade 12 students are studying part-time and don’t intend to graduate this year, they need not be consulted. The target population would therefore be defined as “the full-time Grade 12 graduating students of Ridgemont High School.”
Sometimes the target population, that is, the population for which information is required, and the survey population, the population actually being covered by the survey, differ for practical reasons. Ideally, the two populations should be very similar. It is important to note that conclusions based on the survey results will apply only to the survey population.
In our example, some of the full-time Grade 12 graduating students might be away from school at the time of the survey. Since it would be too difficult to reach them, they would not be part of the survey population, although they are part of the target population.
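The difference between the two populations can be sketched in a few lines of Python. The roster below is entirely hypothetical: the field names and values are invented for illustration, and the filters simply apply the definitions used in the example.

```python
# Hypothetical roster: every field name and value here is invented for illustration.
roster = [
    {"name": "Student A", "grade": 12, "full_time": True,  "graduating": True,  "reachable": True},
    {"name": "Student B", "grade": 12, "full_time": False, "graduating": False, "reachable": True},
    {"name": "Student C", "grade": 12, "full_time": True,  "graduating": True,  "reachable": False},
    {"name": "Student D", "grade": 11, "full_time": True,  "graduating": False, "reachable": True},
]


def in_target_population(student):
    """Full-time Grade 12 students who are graduating this year."""
    return student["grade"] == 12 and student["full_time"] and student["graduating"]


# Target population: everyone the committee wants information about.
target_population = [s for s in roster if in_target_population(s)]

# Survey population: the part of the target population that can actually be reached.
survey_population = [s for s in target_population if s["reachable"]]

print("Target population:", [s["name"] for s in target_population])
print("Survey population:", [s["name"] for s in survey_population])
```

Here Student C belongs to the target population but not to the survey population, so any conclusions drawn from the survey would not strictly apply to that student.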
It is also possible that some of the survey concepts and methods that are used may be considered inappropriate for certain segments of the population. For example, consider a survey of post-secondary graduates where the objective is to determine if the graduates found jobs and, if so, what types of jobs. In this case, you might exclude graduates coming from specialized schools such as military schools. These types of graduates would be reasonably assured of securing employment in their field. The target population would therefore be those who graduated from universities, colleges and trade schools.
It may also be necessary to impose geographic limits that will exclude some members of the target population, as some regions may be too difficult or expensive to reach. For example, a business that is doing a survey using in-person interviews may wish to use a sample of the target population living in a densely populated area in order to minimize travel costs.
Data requirements
To determine what kinds of data to collect, ask, “What exactly do we want to know?” and “How will the collected information be used?”
In our example, the organizing committee might consider the following questions:
- Do we need to know the number of students who intend to go to the prom? (This number might also be established from ticket sales.)
- If we ask students whether they intend to go to the prom, should we ask anything in particular of those who don’t intend to go? (By better understanding their reasons for not going, it might be possible to plan activities that interest them and so influence them to change their minds!)
- When asking about student preferences concerning the prom, what aspects should we consider? Probably elements such as
  - the cost of tickets,
  - the music,
  - the type of refreshments,
  - the day of the week,
  - the venue or location.
- Are there any other factors to consider? Would the students like to have a photographer available? Does everyone want to have a meal before the dance or do some students want just the dance?
- Are students interested in having security guards at the entrance of the venue? What type of transportation would students like to use to get to and from the prom? (The rental of a bus from a central location might be considered.)
When planning a survey, it’s tempting to want to collect as much information as possible. However, the more questions are asked, the longer the survey takes and the more it costs. It’s important to ask: “Do we really need this information?” while considering the time and resources needed to test the questionnaire, process the data and analyze the results.
Another aspect to take into account is the burden the survey imposes on respondents. Keeping that burden low helps ensure the survey is not seen as a nuisance. Response burden is affected by
- the number of questions asked,
- the intrusiveness of the questions,
- the number of times the respondent is contacted (for the same survey or for many surveys),
- the detail of information requested (for example, if asked for a precise income figure, respondents need to consult their official documents, but if asked to choose between five different income ranges, they can answer more easily),
- the time it takes to complete the survey.
Choosing the type of data collection
The level of precision and detail sought and the resources available will determine the choice of the type of data collection. The advantages and disadvantages of the different types of collection are presented in the section on types of data.
In our example, the organizing committee may decide to do a census of all the graduating students or to survey only a sample of that group.
The type of collection chosen often depends on the budget available. Costs are one of the main justifications for choosing to conduct a sample survey instead of a census. With sample surveys, it is possible to obtain valuable results with a relatively small sample of the target population. For example, if you need information on all Canadian citizens over 15 years of age, a survey of a small number of these (1,000 or 2,000 depending on the data requirements) might provide adequate results.
Another advantage of using a sample survey is its rapid turnaround: investigators can produce information soon after they have identified the need for it. For example, if an organization wants to measure the public awareness created through an advertising campaign, it should conduct a survey shortly after the campaign is undertaken. Since using a sample of the target population requires a smaller scale of operations, it reduces the data collection and processing time, while allowing more time for planning and quality control.
Minimizing error
When planning a survey, you must be aware of potential sources of error and try to reduce them as much as possible.
In a sample survey, the variation that exists between different samples causes uncertainty called sampling error. For example, let’s say you are estimating the average distance between home and school for students in your class of 25 from a sample of 5 persons. Your estimate will depend on which 5 students are sampled. If all 5 sampled students live very close to the school, the results will not be representative of the whole class. It’s the variation from one sample to another that causes the sampling error.
As a general rule, the more people surveyed, the smaller the sampling error will be. It is often possible to estimate the sampling error associated with a particular sampling plan, and then to try to minimize it.
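The class-of-25 example can be explored with a short Python sketch. The distances below are hypothetical, invented for illustration. Because the class is small, the code can list every possible sample, compute the average distance for each one, and show how much the estimate moves from sample to sample.

```python
from itertools import combinations
from statistics import pstdev

# Hypothetical home-to-school distances (km) for a class of 25 students.
distances = [0.4, 0.6, 0.8, 1.0, 1.1, 1.3, 1.5, 1.8, 2.0, 2.2,
             2.5, 2.7, 3.0, 3.3, 3.6, 4.0, 4.4, 4.9, 5.5, 6.0,
             6.8, 7.5, 8.3, 9.6, 12.0]


def sample_means(values, n):
    """Average of every possible sample of size n drawn without replacement."""
    return [sum(sample) / n for sample in combinations(values, n)]


class_mean = sum(distances) / len(distances)
means_n3 = sample_means(distances, 3)   # all 2,300 possible samples of 3
means_n5 = sample_means(distances, 5)   # all 53,130 possible samples of 5

# The spread of the sample averages is a direct picture of the sampling error.
spread_n3 = pstdev(means_n3)
spread_n5 = pstdev(means_n5)

print(f"True class average: {class_mean:.2f} km")
print(f"Samples of 5 give estimates from {min(means_n5):.2f} to {max(means_n5):.2f} km")
print(f"Spread of estimates, samples of 3: {spread_n3:.2f} km")
print(f"Spread of estimates, samples of 5: {spread_n5:.2f} km")
```

The lowest estimate comes from the sample made up of the five students who live closest to the school, which is exactly the unrepresentative case described above. The spread is smaller for samples of 5 than for samples of 3, in line with the general rule.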
By choosing to do a census, you can avoid errors related to sample variation, but not the other sources of errors, called non-sampling errors. For example, a question might be asked in a way that encourages a certain answer or an error might be made while processing the data or calculating a percentage for a table of results. These types of error must be avoided as much as possible by paying attention to quality control throughout every step of the survey process.
These two types of errors will be discussed in greater detail in the section devoted to estimation.
Sample size
Since every sample survey is different, there are no hard and fast rules for determining sample size. The deciding factors are time, cost, operational constraints and the desired precision of the results. Evaluate each of these issues and you will be in a better position to decide the sample size. Also, consider what level of error in the sample is acceptable. If one characteristic is central to fulfilling the survey objectives and it varies a lot in the population, the sample size will need to be bigger to obtain the specified level of precision.
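To see how precision and variability pull on the sample size, here is a rough Python sketch. It is not a rule from this section: it assumes a simple random sample, a yes/no characteristic, and a common textbook approximation with a finite population correction. The function name, the default values and the population of 300 students are all assumptions made for illustration, and time, cost and operational constraints are left out entirely.

```python
import math


def required_sample_size(population_size, margin_of_error=0.05,
                         proportion=0.5, z_score=1.96):
    """Approximate sample size for estimating a proportion.

    Assumes simple random sampling. A proportion of 0.5 is the most
    variable case; a z_score of 1.96 corresponds to 95% confidence.
    """
    # Sample size that would be needed for a very large population.
    n0 = (z_score ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Finite population correction: a small population needs fewer respondents.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


# Hypothetical graduating class of 300 students.
print(required_sample_size(300))                         # most variable case
print(required_sample_size(300, margin_of_error=0.10))   # less precision required
print(required_sample_size(300, proportion=0.1))         # less variability
```

Asking for less precision, or measuring a characteristic that varies less in the population, lowers the sample size the formula suggests.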
Analysis plan
After identifying all the elements or variables to be measured and preparing the sample design, the next step is the analysis plan, which means deciding what the results tables will look like. In other words, you need to plan the tables that you will create for the survey variables. These tables will not yet contain any data, but will show any cross-tabulations you want to make.
In our example, the organizing committee might plan results tables showing the number and percentage for each survey variable (for example, the number and percentage of students who prefer location A to location B for the prom). Some tables could also present cross-tabulations such as “Preferred music by gender.”
These tables help you verify whether the questions you are considering will allow you to reach the survey objectives. They illustrate concretely how the collected information will be used and whether it will adequately measure what you want to know.
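An empty results table of this kind can be drafted in Python before any data exist. The answer categories below are hypothetical, and the two helper functions are illustrative names rather than part of any survey tool.

```python
# Hypothetical answer categories; a real questionnaire would define its own.
MUSIC_CHOICES = ["Pop", "Rock", "Hip hop", "Other"]
GENDER_CHOICES = ["Female", "Male", "Another gender"]


def empty_cross_tab(row_labels, column_labels):
    """Build a table shell: one cell per row/column pair, with no data yet."""
    return {row: {col: None for col in column_labels} for row in row_labels}


def render(table, column_labels, title):
    """Lay the table out as text, showing '--' for cells that are still empty."""
    col_width = max(len(label) for label in column_labels) + 2
    row_width = max(len(row) for row in table) + 2
    header = " " * row_width + "".join(col.ljust(col_width) for col in column_labels)
    lines = [title, header]
    for row, cells in table.items():
        shown = ["--" if cells[col] is None else str(cells[col]) for col in column_labels]
        lines.append(row.ljust(row_width) + "".join(s.ljust(col_width) for s in shown))
    return "\n".join(lines)


shell = empty_cross_tab(MUSIC_CHOICES, GENDER_CHOICES)
print(render(shell, GENDER_CHOICES, "Preferred music by gender"))
```

Every row and column in the shell has to be fed by a question on the questionnaire. A cell that no question can fill points to a gap in the data requirements.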
Questionnaire design
The questionnaire design is based on the survey’s data requirements and analysis plan. To formulate the questions, it can be helpful to consult the people who will be using the results. You can also consult subject matter experts or look at questions from other surveys on similar topics or themes. Good practices for questionnaire design and testing will be presented in the data collection subsection.
Data collection methods
The choice of the method used to gather the required data will have a direct impact on cost, human and material resources, and the time needed to carry out the survey and to assess the quality of the data. A first option is the interview, which can be face-to-face or by telephone, with or without computer assistance. The second option is the self-completed questionnaire, which can be paper or online.
Personal interviews are administered by a trained interviewer and can have either a structured or unstructured line of questioning. When done by telephone, questions are structured in a formal interview schedule.
The self-completed questionnaire must be highly structured, as respondents will not have as much help as they would have with an interviewer. It can be returned by mail, through a drop-off system or completed online.
In our example, the organizing committee may opt for a personal interview administered by interviewers who fill out an electronic questionnaire in a spreadsheet program. The interviewer would use a laptop computer to enter the students’ answers into the spreadsheet during the interviews. If some students are concerned about the confidentiality of their answers, the interviewer could give them the option of entering their answers themselves. Such an option, however, might cause more errors and compromise the quality of the collected data, which in turn could increase the time needed for data processing.
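One way to limit entry errors, whoever is typing, is to check each answer against the allowed choices before it is saved. The following Python sketch assumes a hypothetical three-field questionnaire and uses an in-memory CSV buffer in place of the spreadsheet. The field names and allowed answers are invented for illustration.

```python
import csv
import io

# Hypothetical questionnaire: field names and allowed answers are invented.
ALLOWED = {
    "attending": {"yes", "no", "undecided"},
    "preferred_day": {"friday", "saturday"},
}
FIELDS = ["student_id", "attending", "preferred_day"]


def validate(answer_row):
    """Return a list of problems found in one row of answers (empty if none)."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = answer_row.get(field, "").strip().lower()
        if value not in allowed:
            problems.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    return problems


def record(writer, answer_row):
    """Save the row only if it passes validation; always return the problems."""
    problems = validate(answer_row)
    if not problems:
        writer.writerow({f: answer_row[f].strip().lower() for f in FIELDS})
    return problems


buffer = io.StringIO()  # stands in for the spreadsheet file
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()

ok = record(writer, {"student_id": "001", "attending": "Yes", "preferred_day": "Saturday"})
bad = record(writer, {"student_id": "002", "attending": "maybe", "preferred_day": "Saturday"})

print(buffer.getvalue())
print("Problems with the second entry:", bad)
```

Rejecting an invalid answer at entry time means it can be corrected while the student is still present, instead of being discovered later during data processing.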
Data processing plan
This step deals with transforming the questionnaire responses into output. The tasks involved in data processing include coding, capture, editing, imputation and the creation of derived variables. In short, the aim of this step is to produce a file of data that is free of invalid and missing values and that can be used for estimation and data analysis.
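The tasks named above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the raw answers, the field names, the edit rule (a ticket price cannot be negative), the use of the median as the imputed value, and the price bands. Real surveys choose their own rules and often use more refined imputation methods.

```python
from statistics import median

# Hypothetical captured answers: field names and values are invented.
raw_responses = [
    {"id": 1, "attending": "Yes", "max_ticket_price": "40"},
    {"id": 2, "attending": "no",  "max_ticket_price": ""},
    {"id": 3, "attending": "YES", "max_ticket_price": "-5"},
    {"id": 4, "attending": "Yes", "max_ticket_price": "60"},
    {"id": 5, "attending": "No",  "max_ticket_price": "50"},
]

ATTENDING_CODES = {"yes": 1, "no": 0}


def code_and_edit(response):
    """Coding: turn text answers into codes. Editing: blank out invalid values."""
    price_text = response["max_ticket_price"].strip()
    price = float(price_text) if price_text else None
    if price is not None and price < 0:  # a negative price fails the edit rule
        price = None
    return {
        "id": response["id"],
        "attending": ATTENDING_CODES.get(response["attending"].strip().lower()),
        "max_ticket_price": price,
    }


def impute_median(records, field):
    """Imputation: replace missing values with the median of the reported ones."""
    fill = median(r[field] for r in records if r[field] is not None)
    result = []
    for r in records:
        was_missing = r[field] is None
        value = fill if was_missing else r[field]
        # Keep a flag so imputed values can be told apart from reported ones.
        result.append({**r, field: value, field + "_imputed": was_missing})
    return result


def add_price_band(record):
    """Derived variable: group the price into two broad bands."""
    band = "under 50" if record["max_ticket_price"] < 50 else "50 or more"
    return {**record, "price_band": band}


edited = [code_and_edit(r) for r in raw_responses]
imputed = impute_median(edited, "max_ticket_price")
clean = [add_price_band(r) for r in imputed]

for row in clean:
    print(row)
```

The resulting file has no invalid or missing prices, and the flag column records which values were imputed rather than reported.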
Quality control
This process aims to identify errors and verify results. No matter how much planning and testing goes into a survey, something unexpected will happen. As a result, no survey is ever perfect. Quality control tasks are required to minimize non-sampling errors introduced during various stages of the survey. These tasks include interviewer training, computer program testing, follow-up of non-respondents, and spot-checks of collected responses and output data. Statistical quality-control programs ensure that error levels are kept to a minimum.
Analysis and dissemination of results
After planning data collection and processing, look ahead to the final steps in analyzing and disseminating the results:
- organizing the data using frequency distribution tables;
- summarizing the data using measures of central tendency and measures of dispersion;
- displaying the data through different graph types; and
- writing up the survey’s findings and then disseminating them to the public.
In our example, members of the prom organizing committee might share the tasks of organizing and analyzing the data, then writing up the conclusions. Decisions about the prom venue, ticket price, type of music, etc. would then be based on these findings. By publishing highlights of the survey in the school newspaper, the student council might demonstrate that its decisions about the prom are based on expectations expressed by the students.
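The first two steps in the list can be sketched with Python’s standard library. The answers below are hypothetical, invented for illustration. The sketch treats them as a census of the group, so it uses the population standard deviation; for a sample, `statistics.stdev` would be the usual choice.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev

# Hypothetical survey answers, invented for illustration.
preferred_venue = ["Hotel", "Gym", "Hotel", "Community hall",
                   "Hotel", "Gym", "Hotel", "Gym"]
max_ticket_price = [40, 50, 50, 60, 45, 50, 70, 35]


def frequency_table(answers):
    """Return (category, count, percentage) rows, most frequent first."""
    counts = Counter(answers)
    total = len(answers)
    return [(category, count, round(100 * count / total, 1))
            for category, count in counts.most_common()]


# Organizing the data: a frequency distribution table.
for category, count, percent in frequency_table(preferred_venue):
    print(f"{category:<16}{count:>3}{percent:>7}%")

# Summarizing the data: measures of central tendency.
print("Mean price:  ", mean(max_ticket_price))
print("Median price:", median(max_ticket_price))
print("Mode price:  ", mode(max_ticket_price))

# Summarizing the data: measures of dispersion.
print("Range:", max(max_ticket_price) - min(max_ticket_price))
print("Standard deviation:", round(pstdev(max_ticket_price), 2))
```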
The sections on data exploration and data visualization will present some of these steps in more detail.