When is dynamic microsimulation the appropriate simulation approach?

Whenever we study the dynamics of a system made up of smaller-scale units, microsimulation is a possible simulation approach – but when is it worth the trouble of creating thousands or millions of micro units? In this section we give three answers to this question, the first focusing on population heterogeneity, the second on the difficulty of aggregating behavioural relations, and the third on individual histories.

Population heterogeneity

Microsimulation is the preferred modeling choice if individuals are different, if differences matter, and if there are too many possible combinations of considered characteristics to split the population into a manageable number of groups.

Most of classical macroeconomic theory is based on the assumption that the household sector can be represented by one representative agent. Individuals are assumed to be identical or, in the case of overlapping-generations models, to differ only by age (each cohort is represented by one representative agent). However, such an approach is not applicable whenever finer-grained distributions matter. Imagine we are interested in studying the sustainability and distributional impact of a tax-benefit system. If there is only one representative individual and the tax-benefit system is balanced, this average person will receive in benefits and services what she pays for through taxes and social insurance contributions (with some of her work hours spent administering the system). To model tax revenues, we have to account for the heterogeneity of the population: if income taxes are progressive, tax revenues depend not only on total income but also on its distribution. When designing a tax reform, we usually aim at distributing burdens differently. We have to represent the heterogeneity of the population in the model to identify the winners and losers of a reform.
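The dependence of revenue on the distribution, not just the total, can be illustrated with a minimal sketch. The two-bracket schedule below is entirely hypothetical (a 10% rate up to a 30,000 threshold and 40% above it); two populations with identical total income yield different revenues simply because income is spread differently across individuals.

```python
# Hypothetical two-bracket progressive tax: same total income, different
# distributions, different total revenue.

def tax(income, threshold=30_000, low_rate=0.10, high_rate=0.40):
    """Tax liability: low_rate below the threshold, high_rate above it."""
    return low_rate * min(income, threshold) + high_rate * max(income - threshold, 0.0)

equal_incomes   = [25_000, 25_000, 25_000, 25_000]   # total income = 100,000
unequal_incomes = [ 5_000, 10_000, 25_000, 60_000]   # total income = 100,000

print(sum(tax(y) for y in equal_incomes))    # 10000.0
print(sum(tax(y) for y in unequal_incomes))  # 19000.0
```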

Microsimulation is not the only modeling choice when dealing with heterogeneity. The alternative is to group people by combinations of relevant characteristics instead of representing each person individually. This is done in cell-based models. The two approaches have a direct analogy in how data are stored: a set of individual records versus a cross-classification table in which each cell corresponds to a combination of characteristics. A population census can serve as an example. If we were only interested in age and sex breakdowns, a census could be conducted by counting the individuals with each combination of characteristics. The whole census could be displayed in a single table stored as a spreadsheet. However, if we were to add characteristics beyond age and sex, the number of table cells would grow exponentially, making this approach increasingly impractical. For example, 12 variables with 6 levels each would force us to group our population into more than 2 billion cells (6^12 = 2,176,782,336). We would quickly end up with more cells than people. In the presence of continuous variables (e.g. income) the grouping approach cannot avoid losing information, since the data would have to be banded (e.g. into income classes). The solution is to keep the characteristics of each person in an individual record – the questionnaire – and ultimately a database row.
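The combinatorial explosion is easy to reproduce; the short sketch below simply computes the cell counts for the example above (6 levels per variable).

```python
# Number of cells in a cross-classification table grows exponentially with
# the number of variables (6 levels per variable, as in the example above).
levels = 6
for n_variables in (2, 4, 8, 12):
    print(n_variables, "variables:", levels ** n_variables, "cells")
# 12 variables: 2,176,782,336 cells -- far more cells than people.
```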

These two types of data representation (a cross-classification table versus a set of individual records) correspond to the two types of dynamic simulation. In cell-based models, we update a table; in microsimulation models, we change the characteristics of every single record (and create a new record at each birth event). In the first case we have to find formulas for how the occupancy of each cell changes over time; in the second we have to model individual changes over time. Both approaches aim at modeling the same processes but on different levels. Modeling on the macro level might save us a lot of work, but it is only possible under restrictive conditions, since the individual behavioural relations themselves need to be aggregated – and this is not always possible. Where it is not, no formula exists for how the occupancy of each cell changes over time.

Contrasting microsimulation with cell-based models is fruitful for understanding the microsimulation approach. In the following we develop this comparison further using population projections as an example. With a cell-based approach, if we are only interested in total population numbers by age, updating an aggregated table (a population pyramid) requires only a few pieces of information: age-specific fertility rates, age-specific mortality rates, and the age distribution in the previous period. In the absence of migration, the population of age x in period t is the surviving population of age x-1 in period t-1. For a given mortality assumption, we can directly calculate the expected future population size of age x. With a microsimulation approach, survival corresponds to an individual probability (or rate, if we model in continuous time). An assumption that 95% of an age group will still be alive in a year results in a stochastic process at the micro level: individuals can be either alive or dead. We draw a random number between 0 and 1 – if it lies below the 0.95 threshold, the simulated person survives. Such an exercise is called Monte Carlo simulation. Due to this random element, each simulation experiment will produce a slightly different aggregate outcome, converging to the expected value as we increase the simulated population size. This difference in aggregate results is called Monte Carlo variation, a typical attribute of microsimulation.
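The contrast can be sketched in a few lines of code. The example below is illustrative and restricted to a single age group with the 95% survival probability used above: the cell-based update scales a cell count deterministically, while the microsimulation draws survival person by person, so repeated runs differ by Monte Carlo variation that shrinks as the simulated population grows.

```python
import random

SURVIVAL_PROB = 0.95  # hypothetical one-year survival probability of one age group

def cell_based_survivors(count):
    # Deterministic macro update: expected number of survivors in the cell.
    return SURVIVAL_PROB * count

def microsimulated_survivors(count, rng):
    # Monte Carlo micro update: one random draw per simulated person.
    return sum(1 for _ in range(count) if rng.random() < SURVIVAL_PROB)

rng = random.Random(1)
for population in (100, 10_000, 1_000_000):
    expected = cell_based_survivors(population)
    simulated = microsimulated_survivors(population, rng)
    # Relative deviation from the expected value shrinks with population size.
    print(population, expected, simulated, abs(simulated - expected) / population)
```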

The problem of aggregation

Microsimulation is the appropriate modeling choice if behaviours are complex at the macro level but better understood at the micro level.

Many behaviours are modeled much more easily at the micro level, as this is where decisions are taken and where tax rules are defined. In many cases, behaviours are also more stable at the micro level, where there is no interference from composition effects. However, even complete stability at the micro level does not automatically translate into stability at the macro level. For example, in the study of educational attainment, one of the best predictors of educational decisions is the parents' education. So if we observe an educational expansion – e.g. increasing graduation rates – at the population level, the reason is not necessarily a change in micro behaviour; it can lie entirely in the changing composition of the parents' generation.
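A pure composition effect of this kind can be demonstrated with a minimal sketch. All probabilities and shares below are invented for illustration: the graduation probability conditional on the parents' education is held fixed, yet the aggregate graduation rate rises solely because the parents' generation becomes better educated.

```python
# Stable micro behaviour: graduation probability given parents' education
# (hypothetical values).
GRADUATION_GIVEN_PARENT = {"low": 0.3, "high": 0.8}

def aggregate_graduation_rate(parent_shares):
    # Macro outcome = micro behaviour weighted by the population composition.
    return sum(share * GRADUATION_GIVEN_PARENT[level]
               for level, share in parent_shares.items())

cohort_a = {"low": 0.7, "high": 0.3}  # earlier cohort: fewer highly educated parents
cohort_b = {"low": 0.4, "high": 0.6}  # later cohort: more highly educated parents

print(aggregate_graduation_rate(cohort_a))  # 0.45
print(aggregate_graduation_rate(cohort_b))  # 0.60
```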

Tax and social security regulations tie liabilities and benefits in a non-linear way to individual and family characteristics, which impedes aggregating their operations. Again, there is no formula that directly calculates the effect of a reform or the sustainability of a system, even if distributive issues are ignored. To calculate total tax revenues, we need to know the composition of the population by income (because taxes are progressive), by family characteristics (dependent children and spouses), and by all other characteristics that affect the calculation of individual tax liability. Using microsimulation, we can model such a system at any level of detail at the micro level and then aggregate the individual taxes, contributions and benefits.
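The micro-then-aggregate logic can be sketched as follows. The schedule is the same hypothetical two-bracket tax as before, extended by an invented per-child allowance; the records are made up. The point is only that rules are applied record by record and the totals are obtained by summation afterwards.

```python
from dataclasses import dataclass

@dataclass
class Person:
    income: float
    n_children: int

def tax_liability(p, threshold=30_000.0, low=0.10, high=0.40, child_allowance=5_000.0):
    # Non-linear rule tied to family characteristics: allowance per dependent child,
    # then a two-bracket progressive schedule (all parameters hypothetical).
    taxable = max(p.income - child_allowance * p.n_children, 0.0)
    return low * min(taxable, threshold) + high * max(taxable - threshold, 0.0)

population = [Person(20_000, 0), Person(45_000, 2), Person(80_000, 1)]

# Total revenue is obtained by aggregating individual liabilities.
total_revenue = sum(tax_liability(p) for p in population)
print(total_revenue)
```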

Individual histories

Microsimulation is the only modeling choice if individual histories matter, i.e. when processes possess memory.

School dropout is influenced by previous dropout experiences, mortality by smoking histories, old-age pensions by individual contribution histories, and unemployment by previous unemployment spells and their durations. Processes of interest in the social sciences are frequently of this type, i.e. they have a memory: events that occurred in the past can have a direct influence on what happens in the future. This impedes the use of cell-based models because once a cell is entered, all information on previous cell membership is lost. In such cases, microsimulation becomes the only available modeling option.
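A minimal sketch of such a process with memory is given below, using the unemployment example; all probabilities are hypothetical. The risk of entering a new unemployment spell rises with the number of spells already experienced, so the simulation must carry each individual's history forward – exactly the information a cell-based model discards when only the current state is stored.

```python
import random

def unemployment_risk(previous_spells):
    # Hypothetical rule: baseline risk rises with each past spell, capped at 50%.
    return min(0.05 + 0.10 * previous_spells, 0.50)

def simulate_career(years, rng):
    # Each individual's accumulated history feeds back into future transitions.
    spells = 0
    for _ in range(years):
        if rng.random() < unemployment_risk(spells):
            spells += 1
    return spells

rng = random.Random(42)
print([simulate_career(40, rng) for _ in range(5)])  # spell counts for five simulated careers
```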