This report provides the background, general methods and results of an extended approach to using remote sensing, agroclimatic and provincial crop insurance data to model reliable early-season and mid-season crop yield estimates as part of the Field Crop Reporting Series at Statistics Canada. This work builds on the original modelling process, which replaced the crop yield estimation from the September Farm Survey in 2016.
2. General methodology for crop yield modelling
The extended methodology for modelling crop yield was developed and tested on the crops grown in Manitoba. Statistics Canada was able to adapt the existing yield model as a result of an agreement with the Manitoba Agricultural Services Corporation (MASC), under which historical and current-year crop insurance data at the parcel level were provided, in confidence, to Statistics Canada to assist in the modelling of current-year crop yields.
3. Data sources used in the model
The modelling methodology used three data sources: 1) the coarse resolution satellite data used as part of Statistics Canada's Crop Condition Assessment Program; 2) agroclimatic data; and 3) MASC crop insurance data. The first two data sets, along with their processing and extraction methods, are described in greater detail in the report referenced earlier. Only the changes incorporated in this extended yield model are highlighted in this report.
3.1 Normalized Difference Vegetation Index
A spectral vegetation index, the Normalized Difference Vegetation Index (NDVI), was used as a surrogate for the photosynthetic potential of crops. One main difference between the original yield model and the extended yield model is the use of Moderate Resolution Imaging Spectroradiometer (MODIS) imagery to calculate the NDVI values. The previous satellite dataset had a pixel resolution of 1 kilometre, whereas the MODIS data have a pixel resolution of 250 metres. The finer resolution of the MODIS data provides a 16-fold increase in the number of image pixels compared to the previous satellite imagery, which is essential for the geographic level required by this methodology. Both NDVI datasets are released weekly throughout the growing season (mid-April to mid-October) via Statistics Canada's Crop Condition Assessment Program (CCAP). MODIS data go back to the year 2000.
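As a brief illustration of the two quantities mentioned above, the sketch below computes NDVI using its standard definition, (NIR - Red) / (NIR + Red), and the pixel-count ratio implied by moving from 1 km to 250 m pixels. The function names and the reflectance values are hypothetical and are not taken from the Crop Condition Assessment Program.

```python
def ndvi(nir: float, red: float) -> float:
    """Standard NDVI from near-infrared and red reflectance values."""
    if nir + red == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / (nir + red)


def pixel_count_ratio(coarse_m: float, fine_m: float) -> float:
    """Number of fine pixels covering the area of one coarse pixel."""
    return (coarse_m / fine_m) ** 2


# Hypothetical reflectance values for a vegetated pixel.
print(ndvi(nir=0.50, red=0.10))

# One 1 km pixel covers the same area as (1000 / 250)^2 pixels of 250 m.
print(pixel_count_ratio(1000, 250))
```

The ratio is squared because resolution improves along both image axes, which is where the 16-fold figure comes from.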
3.2 Agroclimatic indexes
The station-based daily temperature and precipitation data used in the model were provided by Environment Canada and other partner institutions and were used to generate the climate-based predictors. To form a manageable array of potential crop yield predictors, Agriculture and Agri-Food Canada aggregated the daily agroclimatic indexes into monthly sums and means for the months of May to August and provided the aggregated data to Statistics Canada for use within the yield model.
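The aggregation step described above can be sketched as follows. This is only an assumed illustration: the actual indexes, units and processing used by Agriculture and Agri-Food Canada are not given in this report, so the record layout (date, mean temperature, precipitation) and the function name `monthly_aggregates` are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_aggregates(daily, months=(5, 6, 7, 8)):
    """Aggregate daily records into monthly precipitation sums and temperature means.

    daily: iterable of (date, mean_temp_c, precip_mm) tuples (hypothetical layout).
    Only the months May to August are kept by default.
    """
    buckets = defaultdict(lambda: {"precip": 0.0, "temps": []})
    for day, temp_c, precip_mm in daily:
        if day.month in months:
            buckets[day.month]["precip"] += precip_mm
            buckets[day.month]["temps"].append(temp_c)
    return {
        month: {
            "precip_sum_mm": b["precip"],
            "temp_mean_c": sum(b["temps"]) / len(b["temps"]),
        }
        for month, b in sorted(buckets.items())
    }


# Made-up daily records; the September record falls outside May to August.
sample = [
    (date(2018, 5, 1), 10.0, 2.0),
    (date(2018, 5, 2), 14.0, 0.0),
    (date(2018, 6, 1), 18.0, 5.5),
    (date(2018, 9, 1), 12.0, 1.0),
]
result = monthly_aggregates(sample)
print(result)
```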
3.3 Crop Insurance Data
Historical crop insurance data from 2000 to 2018 were provided to Statistics Canada by MASC. This dataset included the seeded area, harvested area and yield at the parcel level for all crops insured by farm operators within Manitoba. The 2019 dataset included only the seeded area at the parcel level for all crops insured. These data are of excellent quality and aided substantially in the development of the extended yield model.
4. Modelling Survey Yields
4.1 Modelling Methods
The crop yields are estimated through the use of a robust multivariate linear model. The model is constructed using the historical relationships between the yields reported to MASC by the farm operator at the end of the growing season (the dependent variable in the model) and the NDVI and agroclimatic measurements taken at different times during the growing season as well as a temporal variable to account for overall changes in yield throughout the years. Data from the previous ten growing seasons are used in deriving the model.
The unit used in the model is the individual parcel as reported to MASC. The largest such parcels are at the quarter section level, representing 160 acres of land. The model estimates the crop yield for an individual parcel of land. This yield is then weighted according to the amount of seeded land in the parcel as reported at the start of the growing season to MASC by the farm operator to derive an initial estimate of yield at the provincial level. Only parcels with at least 145 acres of a single crop are used in the model. An additional adjustment is then made to the modelled estimate to account for smaller parcels. See the section titled Adjustment to Initial Estimates for more details.
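The weighting step described above amounts to an area-weighted mean of the parcel-level predictions. The sketch below is a minimal illustration under that assumption; the parcel values and the name `provincial_yield` are hypothetical, and the later adjustment for smaller parcels is not included.

```python
MIN_ACRES = 145  # minimum single-crop acreage for a parcel to be used in the model


def provincial_yield(parcels):
    """Area-weighted mean of predicted parcel yields.

    parcels: list of (seeded_acres, predicted_yield) tuples (hypothetical layout).
    Parcels under MIN_ACRES are excluded, as in the modelling step.
    """
    eligible = [(acres, y) for acres, y in parcels if acres >= MIN_ACRES]
    total_area = sum(acres for acres, _ in eligible)
    if total_area == 0:
        raise ValueError("no eligible parcels")
    return sum(acres * y for acres, y in eligible) / total_area


# Made-up parcels: the 80-acre parcel is dropped before weighting.
parcels = [(160, 50.0), (150, 40.0), (80, 70.0)]
print(provincial_yield(parcels))  # (160*50 + 150*40) / 310
```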
Due to the vast number of NDVI and agroclimatic readings available, it was unreasonable to include all variables in the model. The variables to be retained in the model were selected using the GLMSELECT procedure in SAS with the LASSO (Least Absolute Shrinkage and Selection Operator) option. With a few exceptions, a minimum of five variables was required for the model, and the final set of variables to be retained was determined using the Schwarz Bayesian Information Criterion.
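To illustrate how an information criterion can pick among candidate models, the sketch below uses a common least-squares form of the Schwarz criterion, n ln(SSE/n) + p ln(n). Whether this matches exactly what the SAS procedure computes is an assumption, and the sums of squared errors, sample size and parameter counts are made up. It does not reproduce the LASSO path itself, only the final comparison step.

```python
import math


def sbc(sse: float, n_obs: int, n_params: int) -> float:
    """Schwarz Bayesian criterion for a least-squares fit (smaller is better)."""
    return n_obs * math.log(sse / n_obs) + n_params * math.log(n_obs)


# Hypothetical candidates along a selection path: parameter count -> SSE.
# Candidates start at five to mirror the minimum number of variables required.
candidates = {5: 420.0, 6: 400.0, 7: 398.0}
n_obs = 200

scores = {p: sbc(sse, n_obs, p) for p, sse in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print(best)
```

In this made-up case the sixth variable reduces the error enough to justify its penalty, while the seventh does not.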
More data points are available for the more common crops. For the most common crops, individual models were constructed at the sub-provincial ecological level and then aggregated to the provincial level. There are seven such ecological regions in Manitoba, derived from clusters of ecodistricts (Terrestrial ecodistricts of Canada). Crop yields are influenced by factors such as climate, physiography and soil type, which these regions are well suited to characterize. For less common crops, the model was constructed at the provincial level. See Appendix A for a listing of each crop and the geographic level at which the model was constructed.
4.2 Adjustment to Initial Estimates
The model was constructed using insured parcels of land where a single crop of at least 145 acres was seeded. This allows an accurate assignment of NDVI to the crop being grown. The results coming directly from the model represent the estimated yield from these parcels. Thus, parcels of land which either were not insured or had less than 145 acres of a single crop were not directly included in the modelled estimates.
In the case of uninsured parcels of land, no adjustment was made to the estimates to account for them. This assumes that the yield from uninsured parcels of land is similar to that from the insured parcels with at least 145 acres of a single crop. The amount of uninsured land has been estimated by comparing area information from the 2016, 2017 and 2018 MASC files with area estimates from the Statistics Canada Field Crop Reporting Series by individual crop. See Appendix B for the estimates of the percentage of uninsured crops.
In the case of parcels of land which had less than 145 acres of a single crop, the historical crop insurance information can be used to compare the yields by crop from parcels with at least 145 acres to those with less. The estimates coming directly from the model can then be adjusted by the ratio of these two values, weighted by the area seeded in the current year. The ratio was calculated using ten years of observations from historical crop insurance files. See Appendix C for the average ratio of the yield of the two sizes of fields by crop and the standard error of these ratios over the past ten years.
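One plausible reading of this adjustment is sketched below: the modelled yield applies as-is to the area in large parcels and is scaled by the small-to-large yield ratio for the area in small parcels, with the two combined by seeded area. The exact formula used is not given in this report, so this form, the function name and all input values are assumptions.

```python
def adjust_for_small_parcels(model_yield, ratio_small_to_large,
                             large_area, small_area):
    """Blend the modelled yield (large parcels) with a ratio-scaled yield
    (small parcels), weighting by current-year seeded area."""
    total_area = large_area + small_area
    if total_area <= 0:
        raise ValueError("total seeded area must be positive")
    weight = (large_area + ratio_small_to_large * small_area) / total_area
    return model_yield * weight


# Hypothetical inputs: small parcels historically yield 92% of large ones
# and make up 20% of the seeded area.
adjusted = adjust_for_small_parcels(
    model_yield=50.0,
    ratio_small_to_large=0.92,
    large_area=800_000,
    small_area=200_000,
)
print(adjusted)
```

With a ratio of exactly 1, the adjustment leaves the modelled yield unchanged.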
5. Comparison of Modelled Crop Yields with Other Yield Indicators
During the development cycle of the model, numerous model parameters were studied. This included the number of years of historical data to use in the model, the level at which the model was constructed (provincial or ecological level), the manner in which the predictor variables were selected for the model, the definition of a parcel of land, the methods for adjusting the values coming from the model to better represent the entire population of crop producers and other factors.
One way of evaluating the success of the model was to compare the results with other statistics which measure crop yield. For the purpose of this study, two data sources were examined:
- The crop yields reported to MASC by the farm operators at the end of the year
- The results from Statistics Canada's Field Crop Reporting Series surveys.
Statistics Canada measures crop yields at three points in the growing season:
- Early-season estimates, using a survey which takes place in July, referred to as the July survey
- Mid-season estimates, using a survey which takes place in early September, referred to as the September survey
- End-of-season estimates, using a survey which takes place in late October.
As part of its evaluation, the crop yield model was used to produce both early-season and mid-season crop yield estimates.
Nine sets of early-season and mid-season crop yield estimates (representing crop years 2010 to 2018) were produced using the model and compared, by crop where possible, to the results from the other data sources. A relative difference, measured against the yield estimate from the crop insurance files, was calculated for both the survey and modelled values:
$$\text{Relative difference}_{i} = \frac{\text{Yield}_{\text{method } i} - \text{Yield}_{\text{crop insurance}}}{\text{Yield}_{\text{crop insurance}}} \times 100\%$$
where method i is either the modelled yield estimate or the survey yield estimate.
Three statistics based on the relative difference were produced for each crop:
- The average relative difference across the nine years
- The average of the absolute value of the relative difference across the nine years
- The maximum absolute relative difference across the nine years.
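The three statistics listed above can be computed as in the following sketch. It assumes the relative difference is expressed as a percentage of the crop insurance yield, with the insurance yield subtracted from the estimate; the yields shown are made up and the function names are hypothetical.

```python
def relative_difference(estimate: float, insurance_yield: float) -> float:
    """Percentage difference of an estimate from the crop insurance yield."""
    return 100.0 * (estimate - insurance_yield) / insurance_yield


def summarize(estimates, insurance_yields):
    """Mean, mean absolute and maximum absolute relative difference across years."""
    diffs = [relative_difference(e, r) for e, r in zip(estimates, insurance_yields)]
    abs_diffs = [abs(d) for d in diffs]
    return {
        "mean": sum(diffs) / len(diffs),
        "mean_abs": sum(abs_diffs) / len(abs_diffs),
        "max_abs": max(abs_diffs),
    }


# Made-up yields for three years: one overestimate, one underestimate, one exact.
modelled = [55.0, 45.0, 50.0]
insured = [50.0, 50.0, 50.0]
stats = summarize(modelled, insured)
print(stats)
```

In this example the mean relative difference is zero even though two of the three years are off by 10%, which is why the absolute and maximum measures are reported alongside the average.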
The results are presented in Table 1 for early-season estimates and those for mid-season are presented in Table 2.
The September survey estimates only cover the 2010 to 2015 period since the survey was cancelled in 2016.
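Table 1: Early-season estimates, relative difference from crop insurance yields
|Crop||2018 Seeded Area (acres)||Average Relative Difference: July survey (%)||Average Relative Difference: Modelled values (%)||Average Absolute Relative Difference: July survey (%)||Average Absolute Relative Difference: Modelled values (%)||Maximum Absolute Relative Difference: July survey (%)||Maximum Absolute Relative Difference: Modelled values (%)|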
|Corn for grain||421,000||-12.7||5.5||13.2||10.1||22.0||30.6|
|Beans, dry coloured||105,800||-8.8||7.2||10.4||13.9||22.7||32.3|
|Rye, fall remaining||42,300||0.5||-2.0||15.0||14.8||34.6||24.8|
|Beans, dry white||30,100||-13.2||3.0||17.0||16.8||35.1||32.0|
|Wheat, Canada Western Red Spring||2,590,000||-12.6||-6.3||12.6||9.7||28.1||27.2|
|Wheat, other spring||20,900||-20.3||-2.8||25.6||15.8||68.4||31.1|
|Wheat, Canada Prairie Spring Red and Canada Prairie Spring White||51,000||-6.1||-9.0||18.4||15.3||42.0||33.6|
|Wheat, winter remaining||41,000||-1.9||2.5||6.1||7.2||10.2||19.8|
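Table 2: Mid-season estimates, relative difference from crop insurance yields
|Crop||2018 Seeded Area (acres)||Average Relative Difference: September survey (%)||Average Relative Difference: Modelled values (%)||Average Absolute Relative Difference: September survey (%)||Average Absolute Relative Difference: Modelled values (%)||Maximum Absolute Relative Difference: September survey (%)||Maximum Absolute Relative Difference: Modelled values (%)|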
|Corn for grain||421,000||-15.3||2.4||15.3||11.5||28.6||32.1|
|Beans, dry coloured||105,800||...||4.9||...||8.5||...||22.5|
|Rye, fall remaining||42,300||1.9||-2.3||11.7||17.4||20.3||27.0|
|Beans, dry white||30,100||...||-4.9||...||11.3||...||33.0|
|Wheat, Canada Western Red Spring||2,590,000||-9.3||-8.3||9.3||8.6||17.1||25.2|
|Wheat, other spring||20,900||-9.3||-3.8||9.3||13.7||17.1||32.5|
|Wheat, Canada Prairie Spring Red and Canada Prairie Spring White||51,000||-9.3||-4.2||9.3||11.3||17.1||30.9|
|Wheat, winter remaining||41,000||-0.9||1.6||2.9||9.7||5.9||23.9|
Overall, the model compares favourably to the results of the July and September surveys, especially for the more common crops in Manitoba such as canola, spring wheat and soybeans. The average relative differences and average absolute relative differences from the model are, in general, smaller in magnitude than those from the survey.
The relative difference itself shows the percentage difference between the model and the crop insurance values. On average the relative differences are quite small in many cases, but it is important to note that a small average can result from a large positive value offsetting a similarly large negative value. The absolute values show that there may be an important difference between the two estimates at any one point in time. The maximum shows that this difference can be quite large in some situations.
6. Data Quality Indicator – the Coefficient of Variation
Since the parcel-level yield values are estimates from a model, they are subject to error. One indicator of the degree of possible error, and therefore of the uncertainty in the estimates, is the coefficient of variation (CV). In the case of the yield model, the variability is measured using the standard error of the individual predicted values, that is, the error in the prediction at the parcel level. Note that the modelled CVs are calculated differently from survey CVs, so the two are not directly comparable. The model CVs can be considered a conservative estimate of the true variability, that is, an upper bound: the true CV may be lower. Table 3 shows the average CVs for each crop from the 2010-2018 tests.
Table 3: Average CVs of the modelled estimates by crop, 2010 to 2018
|Crop||Early-season Estimate Average CV (%)||Mid-season Estimate Average CV (%)|
|Corn for grain||14.3||14.6|
|Beans, dry coloured||24.1||24.4|
|Rye, fall remaining||27.9||27.5|
|Beans, dry white||21.9||23.0|
|Wheat, Canada Western Red Spring||17.8||18.1|
|Wheat, other spring||18.9||19.0|
|Wheat, Canada Prairie Spring Red and Canada Prairie Spring White||27.1||25.8|
|Wheat, winter remaining||17.4||17.6|
Note: The CVs do not account for any variability related to the adjustment factor for smaller parcels.
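For reference, a CV expresses a standard error as a percentage of the estimate it belongs to. The sketch below shows only this general definition; how the parcel-level prediction errors are combined into a provincial standard error is not described in this report, and the input values are hypothetical.

```python
def coefficient_of_variation(estimate: float, standard_error: float) -> float:
    """CV in percent: standard error relative to the size of the estimate."""
    if estimate == 0:
        raise ValueError("estimate must be non-zero")
    return 100.0 * standard_error / abs(estimate)


# Hypothetical yield estimate of 50 units per acre with a standard error of 8.9.
cv = coefficient_of_variation(50.0, 8.9)
print(cv)
```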
7. Release Criteria
A set of rules was established to determine which modelled yields are of an acceptable level of quality to publish. These rules are based on the success of the robust multivariate linear model and the resulting CV calculated for each modelled estimate. These rules are applied to each crop and differ slightly depending on whether the model was constructed at the provincial or ecological level.
Firstly, it is possible that an estimate may not be generated by the model. This is most likely to occur with rare crops. It may happen because there are fewer than 50 parcels of land in the historical database with which to build the model or because no mathematical solution can be found to run the model.
Secondly, if the CV of the provincial estimate from the model is greater than 35%, the estimate is not published.
Finally, there is an additional rule for crops which are modelled at the ecological level. The provincial estimate for a crop will not be published if the total seeded area from ecological regions that do not meet the previous set of conditions exceeds 10% of the provincial seeded area for the crop. In such cases, the model may be rerun at the provincial level to obtain a provincial estimate.
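The three rules above can be expressed as a simple decision function. The sketch below is one interpretation: in particular, it assumes that an ecological region "does not meet the previous set of conditions" when it has fewer than 50 historical parcels, produced no estimate, or has a CV above 35%. The function names and the input layout are hypothetical.

```python
MIN_PARCELS = 50              # minimum historical parcels needed to build a model
MAX_CV = 35.0                 # maximum publishable CV, in percent
MAX_FAILED_AREA_SHARE = 0.10  # maximum share of seeded area in failing regions


def region_ok(n_parcels, cv):
    """A region passes if a model could be built and its CV is acceptable."""
    return n_parcels >= MIN_PARCELS and cv is not None and cv <= MAX_CV


def publishable(provincial_cv, regions=None):
    """Decide whether a provincial estimate can be published.

    provincial_cv: CV in percent, or None if no estimate was produced.
    regions: optional list of (seeded_acres, n_parcels, cv) tuples for crops
             modelled at the ecological level (cv is None if no estimate).
    """
    if provincial_cv is None or provincial_cv > MAX_CV:
        return False
    if regions:
        total = sum(acres for acres, _, _ in regions)
        failed = sum(acres for acres, n, cv in regions if not region_ok(n, cv))
        if total > 0 and failed / total > MAX_FAILED_AREA_SHARE:
            return False
    return True


# Hypothetical crop: 20% of the seeded area lies in a region with no usable model.
print(publishable(17.8, [(800, 120, 20.0), (200, 30, None)]))
```

A crop that fails only the regional rule could then be passed back for a provincial-level model run, as the text describes.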
Appendix A: Geographic Level at which the Crop Yield Models were Constructed
|Crop||Geographic Level of Modelling|
|Canola (rapeseed)||Ecological regions|
|Corn for grain||Ecological regions|
|Beans, dry coloured||Ecological regions|
|Rye, fall remaining||Provincial|
|Beans, dry white||Ecological regions|
|Wheat, Canada Western Red Spring||Ecological regions|
|Wheat, other spring||Ecological regions|
|Wheat, Canada Prairie Spring Red and Canada Prairie Spring White||Provincial|
|Wheat, winter remaining||Ecological regions|
Appendix B: Percentage of Crop Area not insured through Crop Insurance
|Crop||Estimated Percentage of Uninsured Crop Area (%)|
|Corn for grain||8.5|
|Beans, dry coloured||8.8|
|Rye, fall remaining||9.5|
|Beans, dry white||8.8|
|Wheat, Canada Western Red Spring||6.0|
|Wheat, other spring||6.0|
|Wheat, Canada Prairie Spring Red and Canada Prairie Spring White||6.0|
|Wheat, winter remaining||15.4|
Appendix C: Ratio of the Yields between Small and Large Parcels of Insured Land
|Crop||Estimated Ratio of Yield of Smaller to Larger Fields||Variance|
|Corn for grain||0.99||0.0007|
|Beans, dry coloured||1.04||0.0035|
|Rye, fall remaining||0.92||0.0013|
|Beans, dry white||1.00||0.0045|
|Wheat, Canada Western Red Spring||0.92||0.0003|
|Wheat, other spring||0.95||0.0053|
|Wheat, Canada Prairie Spring Red and Canada Prairie Spring White||1.05||0.0295|
|Wheat, winter remaining||0.93||0.0007|