Statistics Canada

Statistics: Power from Data!
Glossary

Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please contact us to request a format other than those available.

The definitions below provide information for those who have questions about statistics but who do not need highly technical explanations.

The definitions provided here are, in some cases, oversimplifications of highly complex concepts. For more technical definitions, see Statistics Canada's dictionaries and definitions.

A

Aboriginal peoples

Persons who are North American Indian, Métis or Inuit (Eskimo).

Accessibility

Accessibility reflects the availability of information from the holdings of the agency. It takes into account the suitability of the format in which the information is available; the media of dissemination; the availability of metadata (descriptive text); and whether the user has reasonable opportunity to know it is available and how to access it. For users, the affordability of the information in relation to its value to them is also an aspect of this characteristic.

Accuracy

The extent to which the results of a calculation or reading of an instrument approach the true values of the calculated or measured quantities and are free from error. (See also: precision.)

Administrative by-product

Data available from the information recorded in administrative records, applications, reports, etc.

Age–sex pyramid

A graph designed to represent the age structure of a population. It consists of two horizontal histogram graphs joined together.

B

Baby boomer

Generally, those persons born following World War II between the years 1946 and 1966.

Balance of payments

The balance of payments covers all economic transactions between Canadian residents and non-residents. It includes the current account and the capital and financial account. Source: ARCHIVED - The Daily-Friday, May 28, 2004

Bar graph

A diagram that compares bars of the same width but of different lengths according to the statistics or data they represent. Bar graphs are horizontal. (A vertical bar graph is called a column graph.)

Batch keying

One of the oldest methods of data capture. It uses a computer keyboard to type in the data. This process is very practical for high-volume entry where fast production is a requirement. No editing procedures are necessary but there must be a high degree of confidence in the editing program. Also, validity and range edits need to be implemented to ensure quality keying. This does not mean the data are being re-edited, but if a field is numeric and alpha characters are entered instead, the error will be flagged. This approach can be beneficial when used for large surveys with many questions and edits.

Bias

In estimating the value of a parameter of a probability distribution, the bias is the difference between the expected value of the estimator and the true value of the parameter.

C

Capital and financial account

The capital and financial account mainly comprises transactions in financial instruments. Financial assets and liabilities with non-residents are presented under three functional classes: direct investment, portfolio investment and other investment. These investments belong either to Canadian residents (Canadian assets) or to foreign residents (Canadian liabilities). Transactions resulting in a capital inflow are presented as positive values, while capital outflows from Canada are shown as negative values. Source: ARCHIVED - The Daily-Friday, May 28, 2004

Categorical data

Consists of data that can be grouped by specific categories (also known as qualitative variables). Categorical variables may have categories that are naturally ordered (ordinal variables) or have no natural order (nominal variables). For example, the variable "height" is ordinal because it contains the categories "short", "average" and "tall" which are naturally ordered according to ascending height. On the other hand, variables such as "sex" and "hair colour", which have no natural category order, are examples of nominal variables.

Census

The collection of information about all units in a population, sometimes also called a 100% sample survey. (When capitalized, "Census" usually refers to the national Census of Population.)

Central processing unit

The Central Processing Unit (CPU) is the heart of a computer system. It can be small enough to hold in your hand but can contain millions of logic circuits.

Central tendency

A measure of location of the middle or the centre of a distribution. Central tendency can refer to a wide variety of measures such as mean, median and mode. The mean is the most commonly used measure of central tendency.
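
As a minimal sketch, the three measures named above can be computed with Python's standard statistics module. The list of values is hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical observations, used only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # arithmetic average of the values
median = statistics.median(values)  # average of the two middle values (even count)
mode = statistics.mode(values)      # most frequently occurring observation
```

With an even number of observations, the median falls between the two middle values, so it need not be one of the observed values.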

Characteristic

A property which helps to differentiate between items of a given population. This differentiation may be either qualitative or quantitative.

Class intervals

If a variable has a large number of values, it is easier to present the data by grouping the values into class intervals (e.g., age of the population presented as age groups such as 0 to 4, 5 to 9, 10 to 14, 15 to 19, etc.) rather than presenting all of the values together. This makes it easier to see the trends in the data.
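
The grouping described above might be sketched as follows. The function name, the five-year width and the list of ages are assumptions made for illustration only.

```python
from collections import Counter

def age_group(age, width=5):
    """Return the label of the class interval that contains `age`."""
    lower = (age // width) * width
    return f"{lower} to {lower + width - 1}"

# Hypothetical ages, used only for illustration.
ages = [0, 3, 4, 7, 9, 12, 13, 14, 18]

# Count how many observations fall into each class interval.
counts = Counter(age_group(a) for a in ages)
```

Here `counts` maps each age-group label to the number of people in that group, which is the grouped form of the original nine values.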

Classification

The organized representation of a given population into homogeneous categories.

Cluster

A set of units grouped together on the basis of some well-defined criteria. The cluster may be an existing grouping of the population such as a city block, or hospital; or conceptual such as the area covered by a grid imposed on a map.

Coding

A process for converting questionnaire information into numbers or symbols to facilitate subsequent data processing operations. Sometimes, this involves interpreting responses and classifying them into predetermined results.

Coefficient of determination

A measure of how much the variability of one given variable depends on its relationship with another given variable. It is calculated by squaring the value of the linear correlation, r. For example, in a linear model, a correlation of 0.80 (r = 0.80) would mean that r², the coefficient of determination, would equal 0.64. Therefore, 64% of the variability in the Y values could be predicted based on the relationship with the X values.
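
The arithmetic in this example can be checked with a short sketch. The helper names below are hypothetical, and the correlation is computed with the usual Pearson formula, which is an assumption about what "linear correlation" means here.

```python
import math

def pearson_r(xs, ys):
    """Linear (Pearson) correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def coefficient_of_determination(r):
    """Square of the linear correlation r."""
    return r ** 2
```

Calling `coefficient_of_determination(0.80)` gives 0.64 up to floating-point rounding, matching the example in the definition.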

Coefficient of variation

A measure of dispersion calculated by dividing the standard deviation of a distribution by its mean. In survey estimation, the term also refers to the standard error of an estimate, expressed as a ratio or percentage of the estimate.
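
A minimal sketch of the first calculation follows. It assumes the population standard deviation is intended; a sample standard deviation (statistics.stdev) would give a slightly different result.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form assumed)."""
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical data: mean is 5 and population standard deviation is 2.
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
```

Multiplying the result by 100 expresses it as a percentage.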

Coherence

Coherence reflects the degree to which the data and information from a single statistical program are brought together with other data and information and are logically connected and complete. Fully coherent data are consistent internally, over time, and across products and programs. Where applicable, the concepts and target populations used or presented are logically distinguished from similar concepts and target populations or from commonly used notions or terminology.

Cold deck

An imputation method that makes use of a fixed set of values, which covers all of the data items. These values can be constructed with the use of historical data, subject-matter expertise, etc. A 'perfect' questionnaire is created in order to meet complete or partial imputation requirements.

Column graph

A vertical bar graph.

Common-law

Two people of the opposite or of the same sex who live together as a couple, but who are not legally married to each other.

Confidence interval

An estimate using a range of values (an interval) to predict the expected value of an unknown parameter, accompanied by a specific level of confidence, or probability, that the estimate will be correct (i.e. that the interval will in fact contain the true value of the parameter).
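
As an illustration only, a confidence interval for a mean can be sketched as below. The function name is hypothetical, and the use of a normal approximation is an assumption; small samples are often handled with a t distribution instead.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean, based on the sample standard deviation.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean - margin, mean + margin

# Hypothetical sample, used only for illustration.
low, high = mean_confidence_interval([10, 12, 11, 13, 9, 11, 12, 10])
```

The half-width of the interval, `margin`, plays the role of the margin of error for the estimate.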

Confidentiality

In a confidential survey, the privacy of information provided by individual respondents is maintained, and information about the individual respondents cannot be derived from the published results.

Consistency edits

Compare different answers from the same record to ensure that they are coherent with one another. For example, if a person is declared to be in the 0 to 14 age group, but also claims that he or she is retired, there is a consistency problem between the two answers. Interfield edits are another form of a consistency edit. These edits verify that if a figure is reported in one section, a corresponding figure is reported in another.
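
The age and retirement example might be expressed as a simple rule. The field names and the record layout are assumptions made for this sketch.

```python
def consistency_errors(record):
    """Return descriptions of answers in one record that contradict each other."""
    errors = []
    # A respondent in the 0 to 14 age group should not be reported as retired.
    if record.get("age_group") == "0 to 14" and record.get("retired"):
        errors.append("retired respondent in the 0 to 14 age group")
    return errors

# Hypothetical records, used only for illustration.
flagged = consistency_errors({"age_group": "0 to 14", "retired": True})
clean = consistency_errors({"age_group": "65 and over", "retired": True})
```

A real edit program would hold many such rules; an empty list means the record passed every rule that was checked.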

Constant dollars

Dollars of a particular base year, which are adjusted (by inflation or deflation) to show changes in the purchasing power of the dollar (See also current dollars). The base year must always be stated. Note that the terms "uninflated dollars" and "deflated dollars" are used as synonyms for "constant dollars" in government publications.
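
One common way to make this adjustment is to scale an amount by the ratio of a price index in the base year to the index in the year of the expenditure. The sketch below assumes such an index is available; the function name and index values are hypothetical.

```python
def to_constant_dollars(current_amount, index_in_year, index_in_base_year):
    """Express an amount in base-year dollars using a price index."""
    return current_amount * index_in_base_year / index_in_year

# Hypothetical case: prices rose 20% since the base year (index 100 to 120),
# so $120 in current dollars buys what $100 bought in the base year.
constant = to_constant_dollars(120.0, index_in_year=120.0, index_in_base_year=100.0)
```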

Continuous variable

A numeric variable which can assume an infinite number of real values. For example, age, distance and temperature are considered continuous variables because they can take any value within a range; an individual can walk 3.642531... km, for instance.

Correlation coefficient

A measure showing to which extent two variables vary in an interdependent way.

Cumulative frequency

Determines the number of observations that lie at or below a particular value. It is calculated by adding each frequency to the sum of its predecessors in a stem and leaf plot or frequency distribution table.

Cumulative percentage

Calculated by dividing the cumulative frequency by the number of observations and multiplying it by 100. The last value will always be equal to 100%. This allows for easier comparison of the data.
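
A minimal sketch of cumulative frequency and cumulative percentage, using a hypothetical list of frequencies for four classes:

```python
from itertools import accumulate

def cumulative_frequencies(frequencies):
    """Running total of the frequencies, in the order given."""
    return list(accumulate(frequencies))

def cumulative_percentages(frequencies):
    """Cumulative frequency divided by the number of observations, times 100."""
    running = cumulative_frequencies(frequencies)
    total = running[-1]
    return [100 * c / total for c in running]

# Hypothetical frequencies for four classes (20 observations in all).
frequencies = [2, 5, 8, 5]
cum_freq = cumulative_frequencies(frequencies)
cum_pct = cumulative_percentages(frequencies)
```

The last cumulative frequency equals the number of observations, which is why the last cumulative percentage is always 100.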

Current account

The current account covers transactions on goods, services, investment income and current transfers. Transactions in exports and interest income are examples of receipts, while imports and interest expense are payments. The balance from these transactions determines if Canada's current account is in surplus or deficit.
Source: ARCHIVED - The Daily-Friday, May 28, 2004

Current dollars

Dollars which express the cost of items in terms of the year in which the expenditure occurs. Note that "current dollars" are also known as "budget-year dollars" or "inflated dollars", as opposed to "deflated" or "constant dollars".

D

Data

Facts or figures from which conclusions can be drawn.

Database

An organized and sorted list of facts or information; usually generated by a computer.

Data capture

The process of putting responses into a machine-readable medium.

Data coding

Raw data entered into a computer may need to be coded. This is done by labeling each of the data items with an abbreviated code (usually numerical), to make the manipulation of the data easier.

Data editing

A process that ensures survey data are accurate, complete and consistent. (See also: Imputation).

Data input

A process (e.g., scanning, paper, magnetic tapes, cards, etc.) used to enter data.

Data item

The smallest piece of information that can be obtained from a survey or Census.

Data processing

A process that converts raw data into machine readable form, then sorts, edits, manipulates and presents the data in order to create information.

Data quality

A degree or level of confidence that the data and statistical information are "fit for use". The particular issues of quality or fitness for use that must be addressed by Statistics Canada can be summarized as relevance, accuracy, timeliness, accessibility, interpretability and coherence.

Data set

Any grouping of data which has a common theme or similar attributes.

Data storage

The capacity of a computer to store information, as well as the components of the computer in which such information is stored (e.g., magnetic tape, diskette, CD-ROM, etc.).

Decennial

An event recurring every ten years.

Decennial census (Canada)

Censuses are held at the beginning of each decade, in years ending with the number 1. (See also: Quinquennial census)

Discrete variable

A numeric variable that takes only a finite number of real values (e.g., X can equal only 1, 3, 5 and 1,000).

Dissolved census subdivision

The boundaries and names of census subdivisions can change from one census to the next because of annexations, dissolutions and incorporations. These changes can result in the "dissolution" of various census subdivisions. A dissolved census subdivision is a community that existed on January 1, 1996, but which no longer existed on January 1, 2001, the 2001 Census geographic reference date. The concept of "Census Subdivision - Previous Census" has been established to provide a means of tabulating current census data for census subdivisions as they were delineated for the previous (1996) census. A "best fit" linkage was established between blocks for the 2001 Census and census subdivisions for the 1996 Census. This linkage ensures that data from the current census can be tabulated for the communities from the previous census.

Dispersion

Describes how much the observations vary around the central tendency.

Dot graph

A two-dimensional diagram that indicates two variables as a series of dots, mainly used to show the correlation between the two variables.

Duplication edits

Examine one full record at a time. These types of edits check for duplicated records, making certain that a respondent or a survey item has only been recorded once. A duplication edit also checks to ensure that the respondent does not appear on the survey universe more than once, especially if there has been a name change. Finally, it ensures that the data have been entered into the system only once.
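
A check for records entered more than once could be sketched as follows. The key name `respondent_id` is an assumption; a real survey might also match on names or addresses to catch name changes.

```python
from collections import Counter

def duplicated_ids(records, key="respondent_id"):
    """Return the identifiers that appear in more than one record."""
    counts = Counter(record[key] for record in records)
    return sorted(identifier for identifier, n in counts.items() if n > 1)

# Hypothetical records, used only for illustration.
records = [
    {"respondent_id": 101},
    {"respondent_id": 102},
    {"respondent_id": 101},
]
duplicates = duplicated_ids(records)
```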

E

Editing

See data editing.

Enumeration area (EA)

The geographical area canvassed by one interviewer in the Canadian Census of Population. EAs are units which are well defined and readily identifiable on maps, but are unique to a particular Census.

Estimation

Drawing larger conclusions from a sample to predict some characteristic or trend for the whole population.

Estimator

In imputation, a method that uses information from other questions or from other answers (from the current cycle or a previous cycle) and, through mathematical operations, derives a plausible value for the missing or incorrect field.

Ethnic origin

Refers to the ethnic or cultural group(s) to which the respondent's ancestors belong. Ethnic or cultural origin refers to the ethnic "roots" or ancestral background of the population and should not be confused with citizenship or nationality.

Exclusive

When the occurrence of one event automatically excludes the possibility of another event occurring at the same time, the results are exclusive. For example, round and square are exclusive terms since an object cannot be both at once.

Exhaustive

When a set of events comprises all possible occurrences of a reference set the results are exhaustive. For example, the list of age groups 0 to 19, 20 to 34, 35 to 59 and 60 years and over is exhaustive because it covers the whole spectrum of possible ages for members of the population.

F

Focus group

An interviewing technique whereby respondents are interviewed in a group setting. It is used to stimulate the respondents to talk freely, encourage the free expression of ideas or explore attitudes and feelings about a subject. It is often used to guide the design of a questionnaire based on the respondents' reaction to the subject matter and the issues raised during the discussion. It is also referred to as a group interview or group discussion.

Formula

An equation or mathematical rule.

Frame

A list, map, or conceptual specification of the units comprising the survey population from which respondents can be selected. For example, a telephone or city directory, or a list of members of a particular association or group.

Frequency

The number of times an event or item occurs in a data set.

Frequency distribution

A chart or table showing how often each value or range of values of a variable appears in a data set.

Frequency polygon

A graph formed by joining the midpoints of histogram column tops. (See also: histogram.)

Frequency table

A table presenting statistical data by putting together the values of a characteristic along with the number of times each value appears in the data set.

G

Graph

Data represented in a pictorial form (e.g., bar graph, line graph, circle graph/pie chart, histogram, pictograph, etc.).

Grouped frequency distribution

The relationship between the values of a characteristic and their frequencies when those values are grouped into class intervals.

Grouped variable

A set of data which has been grouped or classified, according to some common qualitative or quantitative characteristics.

H

Haphazard sampling

A sample selection based on convenience or availability.

Histogram

A graph that consists of a series of columns, each having a class interval as its base and frequency of occurrence as its height.

Historical edits

Are used to compare survey answers in current and previous surveys. For example, any dramatic changes since the last survey will be flagged. The ratios and calculations are also compared, and any percentage variance that falls outside the established limits will be noted and questioned.

Home language

The language spoken most often at home by the individual at the time of the Census.

Hot deck

Uses other records as 'donors' in order to answer the question (or set of questions) that needs imputation. The donor can be randomly selected from a pool of donors with the same set of predetermined characteristics. For example, if a questionnaire has been returned with the yearly income missing, then we could determine donor characteristics as records with the same province, same occupation and same amount of experience as the respondent from the survey requiring imputation. A list of possible donors matching this criteria is created and one of them is randomly selected. Once a donor is found, the donor response (in this case, the yearly income) replaces the missing or invalid response.
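
The donor selection described above might be sketched as follows. The function name, field names and records are hypothetical, and a missing answer is assumed to be stored as None.

```python
import random

def hot_deck_impute(recipient, records, match_keys, target, rng=random):
    """Return a randomly chosen donor's value for `target`, or None if no donor matches."""
    donors = [
        r for r in records
        if r is not recipient
        and r.get(target) is not None  # a donor must have a valid answer
        and all(r.get(k) == recipient.get(k) for k in match_keys)
    ]
    if not donors:
        return None
    return rng.choice(donors)[target]

# Hypothetical records; the second one is missing its yearly income.
records = [
    {"province": "ON", "occupation": "nurse", "experience": 5, "income": 60000},
    {"province": "ON", "occupation": "nurse", "experience": 5, "income": None},
    {"province": "QC", "occupation": "nurse", "experience": 5, "income": 58000},
]
imputed_income = hot_deck_impute(
    records[1], records, ["province", "occupation", "experience"], "income"
)
```

Only the first record shares the recipient's province, occupation and experience, so it is the only possible donor in this small example. With a larger pool, the choice among matching donors is random.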

I

Imputation

Replacing either missing or invalid data with accepted data. Normally performed in accordance with predetermined decision rules. It is often combined with data editing.

Index

A mathematical device or number which is used to express the observation (e.g., price level, volume of trade, relative amount, etc.) of a given period, in comparison with that of a base period. For example, a cost-of-living index.

Industry

A grouping of producers or service-providers assembled on the basis of the homogeneity of their products or services.

Inferential statistics

The statistical methods used for inferring population values from obtained sample values.

Information

Data that have been recorded, classified, organized, related or interpreted within a framework so that meaning emerges.

Input device

A tool such as tape, cards, keyboard, diskette, CD-ROM, light pen, scanner, digital camera, etc., used to input data into a computer.

Interactive capture

Often referred to as intelligent keying. Usually, captured data are edited before they are imputed. However, this method combines data capture and data editing in one function. Although interactive capture is slower, it is a very effective approach to use when there is a lot of interdependency between questions. This process requires knowledge of editing procedures, as the errors need to be corrected right away. Interactive capture also reduces the number of documents handled, as the edits are made directly on a computer.

Internet

A network that links computers all over the world by satellite and telephone, connecting users with service networks such as e-mail and the World Wide Web.

Interpretability

Interpretability reflects the ease with which the user may understand, properly use and analyse the data or information. The adequacy of the definitions of concepts, target populations, variables and terminology underlying the data, as well as the information on any limitations of the data, largely determines the degree of interpretability.

Interquartile range

The difference between the upper and lower quartiles (Q3–Q1) of a data set. This range is used as a measure of data spread: spanning 50% of a data set and eliminating the influence of outliers (the highest and lowest quarters of a data set are removed).
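
A minimal sketch using Python's standard statistics module follows. Several conventions exist for locating quartiles, so the default method of statistics.quantiles used here is one assumption among several reasonable ones, and other tools may give slightly different values.

```python
import statistics

def interquartile_range(values):
    """Upper quartile minus lower quartile (Q3 - Q1)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data sets: the second replaces the largest value with an outlier.
iqr_plain = interquartile_range([1, 2, 3, 4, 5, 6, 7])
iqr_outlier = interquartile_range([1, 2, 3, 4, 5, 6, 700])
```

Both calls return the same interquartile range, which illustrates why this measure is not influenced by a single extreme value.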

Interval

A set of numbers which consists of those that are greater than one fixed number and less than another; it may also include one or both end numbers. For example, the interval 1.5 ≤ x < 3 consists of all numbers that are equal to or greater than 1.5 and less than 3. Note that the number 3 is excluded from this interval.

Intranet

A private, internal Internet with the privacy and security of an in-house system. Intranets can be connected to the public Internet via secure gateways, which have "firewalls" to prevent unwanted external access to internal systems.

J

Judgment sampling

A sample chosen on the assumption that personal judgment and expertise can be the basis of selecting units that are typical or representative of the population of interest.

K

Knowledge of official languages

An individual's ability to conduct a conversation in English only, in French only, in both English and French or in neither of the official languages of Canada.

L

Labour force

Refers to the labour market activity of the population 15 years of age and over, excluding institutional residents, in the week containing the 15th day of the month prior to Census Day. Respondents are classified as either employed or unemployed. The remainder of the working–age population is classified as not in the labour force.

Labour force participation rate

Total labour force expressed as a percentage of the population aged 15 and over. The participation rate for a particular group (for example, women aged 25 years and over) is expressed as a percentage of the population for that group.

Line graph

A graph in which successive points representing the value of a variable at selected values of the independent variable are connected by straight lines (e.g., unemployment rates among youth over the last ten years).

Linear correlation

A measure of how well data points fit a straight line. When all the points fall on the line it is called a perfect correlation. When the points are scattered all over the graph there is no correlation.

Local area network (LAN)

A communications network that serves users within a confined geographical area. It is made up of servers, workstations, a network operating system and a communications link.

M

Magnetic recordings

Allows for both reading and writing capabilities. This method may be used in areas where data security is important. The largest application for this type of data capture is the PIN number found on automatic bank cards.

Mainframe

A computer with extensive capabilities and resources to which other computers may be connected so that they can share facilities.

Margin of error

Relative figure that may be expressed as a percentage and is calculated using the sampling error of an estimate. It is used to build a confidence interval for that estimate.

Marital status

The state of being legally married (but not separated), in a common-law union, separated (but still legally married), divorced, widowed or never married (single).

Mean

The most common measure of central tendency, the mean is the arithmetic average of a set of numbers.

Median

The value of the middle item when the data are arranged from lowest to highest; a measure of central tendency. If there is an even number of observations, the median is the average of the two middle observations. The median is the point at which half of the data lie above it and half below.

Methodology

A set of research methods and techniques applied to a particular field of study. At Statistics Canada, methodology refers to survey methodology.

Metropolitan area

Statistics Canada has created groupings of municipalities, or Census subdivisions, in order to encompass the area under the influence of a major urban centre. Specific guidelines are used to group municipalities that are closely interconnected due to people working in one municipality and living in another. The resulting geographical units are called Census metropolitan areas.

Miscellaneous edits

Fall in the range of special-reporting arrangements; dynamic edits particular to the survey; correct classification checks; changes to physical addresses, locations and/or contacts; and legibility edits (i.e., making sure the figures or symbols are recognizable and easy to read).

Midrange computer

Covers a very broad range between high-end personal computers and mainframes. Formerly called "minicomputers", which used dumb terminals connected to centralized systems, most midrange computers today function as servers in a client/server configuration.

Mode

The observation that occurs most frequently in a data set; a measure of central tendency.

Mother tongue

The first language learned at home during childhood which is still understood by the individual at the time of the Census.

Multi-purpose surveys

Survey objectives call for the measurement of many characteristics. For example, a survey on farm expenditures might want to determine more than the overall costs of running a farm. One might want to discover the cost of farm equipment, wages, loans, seeds, feed, etc.

Multi-stage sampling

The process of selecting a sample in two or more successive stages. It involves a hierarchy of different types of units. Each "first-stage" unit is potentially divisible into "second-stage" units and so on.

N

Negative correlation

In a negative correlation, the two variables tend to go in opposite directions. As one variable increases, the other variable decreases. Therefore, it can also be called an inverse relationship.

Nominal variable

Type of categorical variable that describes a name, label or category with no natural order. For example, there is no natural order in listing different types of school subjects: "History" does not have to follow "Biology". These subjects can be placed in any order.

Non-probability sampling

A sample selected by a non-probability method. For example, a scheme whereby units are selected purposefully would yield a non-random sample.

Non-random sample

See non-probability sampling.

Non-response

The situation that occurs when information from sampling units is unavailable for one reason or another. For example, the respondent is unavailable, refuses to answer or refuses to take part in the interview.

Non-response errors

Errors occurring due to non-interviews or non-responses to a specific question on a questionnaire.

Non-sampling errors

Errors caused by factors other than sampling. For example, errors in coverage, response errors, non-response errors, faulty questionnaires, interviewer recording errors, processing errors, etc.

Normal distribution

Often just called the bell curve or bell-shaped curve. Most of the scores in this graph accumulate around the middle. The mean, median and mode are all equal, and the scores at either end of the distribution occur less often. For example, a curve representing the results of an intelligence test would have the greatest number of people in the middle, around the 'average' intelligence range, while the number of people decreases as the scores get farther away on either side of the average, giving the curve its shape and name.

Numeric variable

A quantitative variable that describes a numerically measured value (e.g., age or number of people in a household). These variables can be either continuous or discrete.

O

Observation

Data collected for a given variable.

Ogive

The curve on a cumulative frequency distribution graph. Note that not all distribution curves have the ogive form. It is therefore better to confine the term to the normal or nearly-normal distribution.

Optical character readers

Optical character readers, or bar-code scanners, are able to recognize alphabetic or numeric characters. These readers scan lines and translate them into the program.

Ordinal variable

A type of categorical variable: an ordinal variable is one that has a natural ordering of its possible values, but the distances between the values are undefined. Ordinal variables usually have categorical scales. For example, when asking people to choose between Excellent, Good, Fair and Poor to rate something, the answer is only a category but there is a natural ordering in those categories.

Outliers

In a set of data, a value so far removed from other values in the distribution that its presence cannot be attributed to the random combination of chance causes.

P

Parameter

Parameters are unknown, quantitative measures (e.g., total revenue, mean revenue, total yield or number of unemployed people) for the entire population or for specified domains which are of interest to the investigator.

Percentage frequency

The frequency of each value or class interval expressed as a percentage of the total number of observations. Derived by multiplying each of the relative frequency values by 100.

Percentiles

The proportion of values in a distribution that a specific value is greater than or equal to. For example, if you received a mark of 95% on a math test and this mark was greater than or equal to the marks of 88% of students then you would be in the 88th percentile.
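
The test-mark example can be reproduced with a short sketch. The function name and the class of 25 marks are hypothetical.

```python
def percentile_rank(value, values):
    """Percentage of values that `value` is greater than or equal to."""
    at_or_below = sum(1 for v in values if v <= value)
    return 100 * at_or_below / len(values)

# Hypothetical class of 25 students: 21 marks of 70, one of 95, three of 99.
marks = [70] * 21 + [95] + [99] * 3
rank = percentile_rank(95, marks)
```

A mark of 95 is greater than or equal to 22 of the 25 marks, which places it in the 88th percentile.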

Personal computer

A general-purpose, single-user microcomputer designed to be operated by one person at a time.

Pictograph

A chart giving statistics in pictorial form. For example, using a dollar symbol in increasing sizes to represent the increase in purchasing power over time.

Pie chart (Circle graph)

A circular chart that provides a visual concept of a whole (100% = 360 degrees). The pie is divided into slices, each corresponding to a category of the variable represented (e.g. each age group). The size of the slices is proportional to the percentage of the corresponding category.

Population

The complete group of units to which survey results are to apply. (These units may be persons, animals, objects, businesses, trips, etc.)

Population pyramids

See age-sex pyramid.

Positive correlation

In a positive correlation, the two variables tend to move in the same direction. When one variable increases, the other variable also increases.

Precision

Precision is a measure of similarity. The same surveys conducted more than once should have the same or similar results. The closer the results from each repetition of the survey, the more precise they are.

Probability sampling

A sampling method in which every member of the population has a known, non-zero chance of being selected. Also called random sampling, because of the random way of selecting individuals to ensure an unbiased representation of the whole population.

Processing errors

Errors that occur during any of the processes performed in transferring data from questionnaires, control sheets, etc., into sets of tabulations and estimates.

Programming

Process of producing a set of instructions to make a computer perform a particular activity.

Pyramids

See age-sex pyramid.

Q

Quality control

The set of operations required to ensure that error levels, introduced as a result of a survey operation, are controlled within specified levels.

Quartiles

In order to determine the interquartile range, a data set is divided into four equal parts. Each separating value is called a quartile (the first, the second, etc.). The second quartile is also known as the median.

Questionnaire

A series of questions designed to elicit information on one or more topics from a respondent.

Quinquennial Census (Canada)

Used to describe Censuses taken at mid-decade, in years ending in the number 6. For example, Statistics Canada conducted quinquennial Censuses in 1976, 1986, 1996. (See also: decennial)

Quota sampling

A procedure where the number of respondents in each of several categories is specified in advance and the final selection of respondents is left to the interviewer who proceeds until the quota for each category is filled.

R

Random error

The errors that are unpredictable in an estimate. These errors tend to cancel out in a large sample, as opposed to systematic errors that keep adding up because they all go in the same direction.

Random rounding

A method whereby all figures in a tabulation, including totals, are randomly rounded (either up or down) to a multiple of "5" or in some cases "10". This technique provides protection against direct, residual or negative disclosure of the actual data, while preserving the usefulness of the data to the greatest extent possible.
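
A minimal sketch of random rounding to a multiple of 5 might look like the following. The rule for choosing up or down is an assumption: here a figure is rounded up with probability equal to its remainder divided by the base, so that the rounded figure equals the original on average. The function name and the sample counts are hypothetical.

```python
import random

def random_round(value, base=5, rng=random):
    """Randomly round a non-negative whole number up or down to a multiple of base."""
    remainder = value % base
    if remainder == 0:
        return value  # already a multiple of the base, left unchanged
    lower = value - remainder
    # Assumption: round up with probability remainder/base, otherwise round down.
    if rng.random() < remainder / base:
        return lower + base
    return lower

rng = random.Random(0)  # seeded generator so the run can be repeated
counts = [12, 7, 20, 3]
rounded = [random_round(c, 5, rng) for c in counts]
print(rounded)
```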

Random sampling

See probability sampling.

Range

The full distance over which results vary along a number line. The exclusive range is the difference between the largest and smallest results in a data set, and the inclusive range is the difference between the upper real limit of the highest interval and the lower real limit of the lowest interval.
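
The two kinds of range can be sketched as follows. The sketch assumes that results are recorded to the nearest unit, so the real limits extend half a unit beyond the smallest and largest recorded values; the function names and data are hypothetical.

```python
def exclusive_range(values):
    """Difference between the largest and smallest results."""
    return max(values) - min(values)

def inclusive_range(values, unit=1):
    """Difference between the upper real limit and the lower real limit.

    Assumption: values are recorded to the nearest `unit`, so each real
    limit lies half a unit beyond the recorded value.
    """
    upper_real_limit = max(values) + unit / 2
    lower_real_limit = min(values) - unit / 2
    return upper_real_limit - lower_real_limit

results = [3, 7, 12, 5]
print(exclusive_range(results))  # 12 - 3 = 9
print(inclusive_range(results))  # 12.5 - 2.5 = 10.0
```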

Range edits

Are similar to validity edits in that they look at one field at a time. The purpose of this type of edit is to ensure that the values, ratios and calculations fall within the pre-established limits.

Ratio

A proportional relationship between two different numbers or quantities, or in mathematics a quotient of two numbers or expressions, arrived at by dividing one by the other.

Raw data

Information that has not yet been organized, formatted, or analysed.

Regression

A statistical method which tries to predict the value of a characteristic by studying its relationship with one or more other characteristics. This relationship is expressed through the means of a regression equation. (See also regression model).

Regression equation

An equation whereby one unknown variable can be predicted using the given value of one or more other variables. For example, the equation Y = a + bX provides the estimated value for Y when the value for X is known. (See also regression and regression model).
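
As an illustration, the values of a and b in Y = a + bX can be estimated from paired observations and then used to predict Y. The sketch below uses ordinary least squares, which is an assumption (the definition above does not say how a and b are obtained); the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Estimate a (intercept) and b (slope) by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(a, b, x):
    """Estimated value of Y for a known value of X."""
    return a + b * x

# Hypothetical data that lie exactly on the line Y = 1 + 2X.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
estimate = predict(a, b, 5)
print(a, b, estimate)  # 1.0 2.0 11.0
```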

Regression model

A statistical model used to depict the relationship of a dependent variable to one or more independent variables. These models have a wide variety of forms and degrees of complexity. (See also regression and regression equation).

Relative frequency

The frequency (expressed as a proportion of a whole) of each value or class interval observed in a data set for a particular variable. Calculated by dividing the frequency by the number of observations.
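
The calculation described above can be sketched in a few lines. The function name and the sample data are hypothetical.

```python
from collections import Counter

def relative_frequencies(values):
    """Frequency of each value divided by the number of observations."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

ages = [18, 19, 18, 20, 18, 19, 20, 18]
rel = relative_frequencies(ages)
print(rel)  # {18: 0.5, 19: 0.25, 20: 0.25}
```

The relative frequencies of all values in a data set add up to 1.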

Relevance

The relevance of data or of statistical information is a qualitative assessment of the value contributed by these data. Value is characterized by the degree to which the data or information serve the purposes for which they were produced and sought out by users. Value is further characterized by the merit of these purposes in terms of the mandate of the agency, legislated requirements and the opportunity cost to produce the data or information.

Response error

The difference between the true answer to a question and the respondent's answer. It may be caused by the respondent, the interviewer, the questionnaire, the survey procedure or the interaction between the respondent and the interviewer.

S

S

The mathematical symbol for standard deviation.

S²

The mathematical symbol for variance.

Sample design

A set of specifications that describe population, frame, survey units, sample size, sample selection and estimation method in detail.

Sampling fluctuation

The extent to which a statistic takes on different values with different samples. That is, it refers to how much the statistic's value fluctuates from sample to sample.

Sample survey

A collection of information from only part of a population.

Sampling error

An error which arises because the data are collected from a part, rather than the whole of the population. It is usually measurable from the sample data in the case of probability sampling.

Sampling variation

The variation shown by different samples of the same size from the same population.

Scale

A graded line divided into successive values, which may be graphical, descriptive or numerical, used in reporting assessments. For graphs the scale is the subdivision of each axis. The scale may be numerical or categorical. (See also: sample graph.)

School attendance

Refers to either full-time or part-time (day or evening) attendance at school, college or university during the eight-month period between September and May. Attendance is counted only for courses which could be used as credit towards a certificate, diploma or degree.

Seasonal adjustment

A statistical technique used to remove the effect of normal seasonal fluctuations in data so underlying trends become more evident. Economic statistics that are subject to seasonal influence are sometimes presented with the calculated effect of the seasons removed.

Semi-quartile range

Computed as one half the difference between the 75th percentile (Q3) and the 25th percentile (Q1). The formula is (Q3–Q1)÷2. Since half of the values lie between Q3 and Q1, the semi-quartile range is one half the distance needed to cover one half of the values.
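
A short sketch of the formula (Q3–Q1)÷2 follows. Several conventions exist for locating quartiles, and they can give slightly different values; the use of Python's `statistics.quantiles` with `method="inclusive"` is an assumption made for this illustration, and the data are hypothetical.

```python
import statistics

def semi_quartile_range(values):
    """Half the distance between the third and first quartiles."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 and Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)                 # 3.0 5.0 7.0
print(semi_quartile_range(data))  # 2.0
```

Here the second quartile, 5.0, is also the median of the data set.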

Σ (sigma)

A mathematical symbol for adding the values to which it applies.

Simple random sampling

A basic probability selection scheme in which each sample has an equal chance of being selected.
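
A minimal sketch, assuming selection without replacement from a complete list of units: Python's `random.sample` draws a sample of a given size so that every possible sample of that size is equally likely. The function name, the seed and the list of units are hypothetical.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that each possible sample is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(units, n)

households = list(range(1, 101))  # hypothetical list of 100 numbered units
chosen = simple_random_sample(households, 10, seed=42)
print(sorted(chosen))
```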

Skewed

Asymmetric distribution of the data values. The values on one side of the distribution curve tend to extend further from the "middle" than the values on the other side.

Software

There are two types of software: system software, which controls the operation of the computer (e.g., Windows and DOS), and application software (e.g., Word, Excel, MS Access and Lotus).

Standard deviation

The square root of the variance. Standard deviation measures the spread or dispersion around the mean of a data set. It is the most widely used measure of spread.

Statistic

A function that will produce a numerical figure whose value may vary with different outcomes of an experiment (e.g. with different samples). For example, the mean or median of a sample, etc.

Statistical edits

Look at the entire set of data. This type of edit is performed only after all other edits have been applied and the data have been corrected. The data are compiled and all extreme values, suspicious data and outliers are rejected.

Statistics

A type of information obtained through mathematical operations on numerical data.

Statistics: Power from Data!

A product from Statistics Canada that will assist readers in getting the most from statistics.

Statistics, the study of

Construed as singular, it is the field of study that collects and arranges numerical facts or data that relate to human affairs or natural phenomena.

Stem and leaf plot

A semi-graphical method used to represent numerical data, in which the first (leftmost) digit of each data value is a stem and the rest of the digits of the number are the leaves. A stem and leaf plot shows all the data values from a sample set.
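
The construction can be sketched for whole numbers from 0 to 99, where the tens digit is the stem and the units digit is the leaf. This restriction, the function name and the data are assumptions made for the illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers from 0 to 99 by stem (tens digit) with their leaves (units digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

scores = [23, 25, 31, 38, 38, 42, 47, 29]
plot = stem_and_leaf(scores)
for stem in sorted(plot):
    # One row per stem, with its leaves listed in increasing order.
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Every data value can be read back from the plot: stem 3 with leaves 1, 8 and 8 stands for 31, 38 and 38.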

Stratified sampling

A sampling procedure in which the population is divided into homogeneous subgroups or strata and the selection of samples is done independently in each stratum.
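
A minimal sketch of the procedure: units are first grouped into strata, then a random sample is drawn separately within each stratum. Drawing the same number of units from every stratum is an assumption made for simplicity (other allocations are possible); the function name, parameters and data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Draw an independent random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # divide the population into strata
    sample = {}
    for name, members in strata.items():
        k = min(n_per_stratum, len(members))   # cannot draw more than the stratum holds
        sample[name] = rng.sample(members, k)  # selection within this stratum only
    return sample

people = [("Ana", "urban"), ("Ben", "rural"), ("Cy", "urban"),
          ("Di", "rural"), ("Eve", "urban"), ("Flo", "rural")]
chosen = stratified_sample(people, stratum_of=lambda p: p[1], n_per_stratum=2, seed=1)
print(chosen)
```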

Substitution

Relies on the availability of comparable data. Imputed data can be extracted from the respondent's record from a previous cycle of the survey, or the imputed data can be taken from the respondent's alternative source file (e.g. administrative files or other survey files for the same respondent). This is often difficult to do because, in many cases, there is no other information available than the information provided in the current survey.

Survey

The collection of information about characteristics of interest from some or all units of a population, using well-defined concepts, methods and procedures, and the compilation of such information into a useful summary form.

Survey units

For the purpose of sample selection, the population should be divisible into a finite number of distinct, non-overlapping and identifiable units, so that each member of the population belongs to only one survey sampling unit, to be surveyed only once.

Systematic sampling

The selection of units from a list using a selection interval (k) so that every k'th element on the list, following a random start between 1 and k, is included in the sample. For example, if k were to equal 6, and the random start were 2, then the sample would be 2, 8, 14, 20, etc.
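
The example above can be reproduced with a short sketch. The list size of 24 is an assumption chosen so that the output matches the first four positions given above; the function name and parameters are hypothetical.

```python
import random

def systematic_sample(list_size, k, start=None, rng=random):
    """Positions of every k'th element on a list numbered 1 to list_size."""
    if start is None:
        start = rng.randint(1, k)  # random start between 1 and k, inclusive
    return list(range(start, list_size + 1, k))

positions = systematic_sample(list_size=24, k=6, start=2)
print(positions)  # [2, 8, 14, 20]
```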

Systems analysis

The process of breaking down a data processing problem into functional components to determine the best method of handling the problem.

T

Tally chart

Used to record data from an experiment, count the occurrences of an event and develop frequency distribution tables.

Timeliness

Timeliness of information reflects the length of time between the information's availability and the event or phenomenon it describes. Timeliness must be considered in the context of the time period that permits the information to be of value and still be acted upon. Typically, timeliness can affect the reliability of the information.

Tree diagram

A branching diagram that shows all possible combinations or outcomes.

U

Ungrouped frequency distribution

A frequency distribution of numerical data where the raw data is not grouped.

Ungrouped variable

A set of data that has not been grouped or classified and is instead a listing of individual observed values (e.g., single years of age).

V

Validity edits

Look at one question field or cell at a time. They check to ensure the record identifiers, invalid characters and values have been accounted for; essential fields have been completed (e.g., no quantity field is left blank where a number is required); specified units of measure have been properly used; and the reporting time is within the specified limits.

Variable

A characteristic that may assume more than one set of values to which a numerical measure can be assigned (e.g., income, age and weight).

Variance

A measure of spread, calculated as the average squared deviation of each number from the mean of a data set. The term variance also refers to the sampling variation.
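
The calculation can be sketched as follows, together with the standard deviation (the square root of the variance). The sketch divides by the number of values, which matches the "average squared deviation" wording above; estimates based on a sample often divide by one less than the number of values instead. The function names and data are hypothetical.

```python
import math

def variance(values):
    """Average squared deviation of each value from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]      # mean is 5
print(variance(data))                # 4.0
print(standard_deviation(data))      # 2.0
```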

Vital statistics

Statistics relating to births, deaths, marriages, health, and disease.

Volunteer Sampling

Samples that consist of people who have volunteered their services, knowing that the process will be lengthy or demanding, and even perhaps unpleasant. These volunteers have often been found to have favourable or at least neutral attitudes, whereas the general population tends to hold a wider range of attitudes on topics of interest.

W

Wide area network (WAN)

A communications network that covers a wide geographical area, such as a province or country. It is different from a Local area network (LAN) contained within a building or complex, and the Metropolitan area network (MAN) which generally covers a city or suburb.

X

x̄ (x with a bar above)

Mathematical symbol for the mean of a data set.

X-axis

The horizontal number line on a graph (the Cartesian co-ordinate plane).

Y

Y-axis

The vertical number line on a graph (the Cartesian co-ordinate plane).