Computer vision models: seed classification project

By AI Lab, Canadian Food Inspection Agency

Introduction

The AI Lab team at the Canadian Food Inspection Agency (CFIA) is composed of a diverse group of experts, including data scientists, software developers, and graduate researchers, all working together to provide innovative solutions for the advancement of Canadian society. By collaborating with members from inter-departmental branches of government, the AI Lab leverages state-of-the-art machine learning algorithms to provide data-driven solutions to real-world problems and drive positive change.

At the CFIA's AI Lab, we harness the full potential of deep learning models. Our dedicated team of Data Scientists leverages the power of this transformative technology and develops customised solutions tailored to meet the specific needs of our clients.

In this article, we motivate the need for computer vision models for the automatic classification of seed species. We demonstrate how our custom models have achieved promising results using "real-world" seed images and describe our future directions for deploying a user-friendly SeedID application.

At the CFIA AI Lab, we strive not only to push the frontiers of science by leveraging cutting-edge models, but also to render these services accessible to others and to foster knowledge sharing, for the continued advancement of Canadian society.

Computer vision

To understand how image classification models work, we first define what exactly computer vision tasks aim to address.

What is computer vision?

Computer vision models are fundamentally trying to solve what are mathematically referred to as ill-posed problems. They seek to answer the question: what gave rise to the image?

As humans, we do this naturally. When photons enter our eyes, our brain is able to process the different patterns of light enabling us to infer the physical world in front of us. In the context of computer vision, we are trying to replicate our innate human ability of visual perception through mathematical algorithms. Successful computer vision models could then be used to address questions related to:

  • Object categorisation: the ability to classify objects in an image scene or recognise someone's face in pictures
  • Scene and context categorisation: the ability to understand what is going on in an image through its components (e.g. indoor/outdoor, traffic/no traffic, etc.)
  • Qualitative spatial information: the ability to qualitatively describe objects in an image, such as a rigid moving object (e.g. bus), a non-rigid moving object (e.g. flag), a vertical/horizontal/slanted object, etc.

Yet, while these appear to be simple tasks, computers still have difficulties in accurately interpreting and understanding our complex world.

Why is computer vision so hard?

To understand why computers seemingly struggle to perform these tasks, we must first consider what an image is.

Figure 1

Are you able to describe what this image is from these values?

Description - Figure 1

This image shows a brown and white pixelated image of a person's face, with white pixels against a brown background. Next to the image, there is a zoomed-in view showing the pixel values corresponding to a small patch of the original image.

An image is a set of numbers, typically with three colour channels: Red, Green, and Blue. In order to derive any meaning from these values, the computer must perform what is known as image reconstruction. In its most simplified form, we can mathematically express this idea through an inverse function:

x = F⁻¹(y)

Where:

y represents the data measurements (i.e. pixel values).
x represents the reconstructed image recovered from the measurements, y.
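
To make this concrete, the following is a minimal sketch in Python (using NumPy and Pillow, which are illustrative choices rather than tools prescribed in this article) showing that, to a computer, an image is nothing more than a grid of numbers:

```python
import numpy as np
from PIL import Image  # Pillow is an assumption; any imaging library works

# Create (or load) an image and inspect its raw measurements, y.
img = Image.new("RGB", (64, 64), color=(142, 98, 57))  # stand-in for a real photo
y = np.asarray(img)

print(y.shape)   # (64, 64, 3): one value per pixel for each of the R, G, B channels
print(y[0, 0])   # [142  98  57]: the three colour values of a single pixel

# All a computer "sees" is this grid of numbers; recovering the scene x that
# produced them is the inverse problem x = F⁻¹(y).
```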

However, it turns out solving this inverse problem is harder than expected due to its ill-posed nature.

What is an ill-posed problem?

When an image is captured, there is an inherent loss of information as the 3D world gets projected onto a 2D plane. Even for us, collapsing the spatial information we get from the physical world can make it difficult to discern what we are looking at in photos.

Figure 2

Michelangelo (1475-1564). Occlusion caused by different viewpoints can make it difficult to recognise the same person.

Description - Figure 2

The image shows three views of the same dark, rough sculpted figure, rotated in each frame. One view appears to show deep thought, while the other two suggest contemplation. The details of the faces are well-defined. Because different parts of the face are occluded in each view, the same figure can look like a different person from frame to frame.

Figure 3

Bottom of soda cans. Different orientations can make it impossible to identify what is contained in the can.

Description - Figure 3

The image shows five metal cans, four of them with a different patch of colour on the lid. The colours are blue, green, red, and yellow. The cans are arranged on a countertop made of a dark surface, such as granite or concrete.

Figure 4

The Yale Face Database. Variations in lighting can make it difficult to recognise the same person (recall: all computers "see" are pixel values).

Description - Figure 4

The image shows two photographs of the same face captured under different lighting conditions, resulting in two different perceived expressions. In the left frame the man has a neutral facial expression, whereas in the right frame he appears serious and angry.

Figure 5

Rick Scuteri-USA TODAY Sports. Different scales can make it difficult to understand context from images.

Description - Figure 5

The image shows the same scene at four different scales. The first image contains only what looks like the eye of a bird. The second image contains the head and neck of a goose. The third image shows the entire animal, and the fourth image shows a man standing in front of the bird, pointing in a direction.

Figure 6

Different photos of chairs. Intra-class variation can make it difficult to categorise objects (we can discern a chair through its functional aspect)

Description - Figure 6

The image shows five different chairs. The first is a red chair with a wooden frame. The second is a black leather swivel chair. The third looks like an unconventional artistic chair. The fourth looks like a minimalist office chair, and the last looks like a bench.

It can be difficult to recognise objects in 2D pictures due to possible ill-posed properties, such as:

  • Lack of uniqueness: Several objects can give rise to the same measurement.
  • Uncertainty: Noise (e.g. blurring, pixelation, physical damage) in photos can make it difficult or impossible to reconstruct and identify an image.
  • Inconsistency: slight changes in images (e.g. different viewpoints, different lighting, different scales, etc.) can make it challenging to solve for the solution, x, from available data points, y.

While computer vision tasks may, at first glance, appear superficial, the underlying problem they are trying to address is quite challenging!

Next, we will discuss some deep learning-driven solutions for tackling computer vision problems.

Convolutional Neural Networks (CNNs)

Figure 7

Graphical representation of a convolutional neural network (CNN) architecture for image recognition. (Hoeser and Kuenzer, 2020)

Description - Figure 7

This is a diagram of a convolutional neural network (ConvNet) architecture. The network consists of several layers, including an input layer, a convolutional layer, a pooling layer, and an output layer. The input layer takes in an image and passes it through the convolutional layer, which applies a set of filters to the image to extract features. The pooling layer reduces the size of the image by applying a pooling operation to the output of the convolutional layer. The output layer processes the image and produces a final output. The network is trained using a dataset of images and their corresponding labels.

Convolutional Neural Networks (CNNs) are a type of algorithm that has been highly successful in solving many computer vision problems like those previously described. In order to classify or identify objects in images, a CNN model first learns to recognize simple features in the images, such as edges, corners, and textures. It does this by applying different filters to the image. These filters help the network focus on specific patterns. As the model learns, it starts recognizing more complex features, combining the simple features it learned in the previous step to create more abstract and meaningful representations. Finally, the CNN uses the learned features to classify images into the classes it has been trained on.
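
As an illustration of this layered structure, here is a minimal CNN sketch in PyTorch. The framework, layer sizes, and class count are illustrative assumptions, not the architecture of our production models:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # filters detect edges/textures
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling shrinks the feature map
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper filters combine simple features
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # After two 2x2 pools, a 224x224 input becomes a 32-channel 56x56 map.
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = TinyCNN()
logits = model(torch.randn(1, 3, 224, 224))  # one random RGB image
print(logits.shape)                          # torch.Size([1, 9]): one score per class
```

Stacking convolution and pooling layers in this way is what lets the network progress from simple features in early layers to abstract representations in later ones.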

Figure 8

Evolution of CNN architectures and their accuracy, for image recognition tasks from 2012 to 2019. (Hoeser and Kuenzer, 2020).

Description - Figure 8

The image shows a plot of different CNN architectures and models from 2012 to 2019. Each neural network is depicted as a circle, with the size of the circle corresponding to the size of the network in terms of number of parameters.

The first CNN was proposed by Yann LeCun in 1989 (LeCun, 1989) for the recognition of handwritten digits. Since then, CNNs have evolved significantly, driven by advancements in both model architecture and available computing power. To this day, CNNs continue to prove themselves as powerful architectures for various recognition and data analysis tasks.

Vision Transformers (ViTs)

Vision Transformers (ViTs) are a recent development in the field of computer vision that apply the concept of transformers, originally designed for natural language processing tasks, to visual data. Instead of treating an image as a 2D object, Vision Transformers view an image as a sequence of patches, similar to how transformers treat a sentence as a sequence of words.

Figure 9

An overview of a ViT as illustrated in An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Since the publication of the original ViT, numerous variations and flavours have been proposed and studied.

Description - Figure 9

The image shows a diagram of the ViT architecture. The input image is split into patches, and each patch is fed into the neural network. The network consists of a transformer encoder block followed by an MLP head that produces the classification.

The process starts by splitting an image into a grid of patches. Each patch is then flattened into a sequence of pixel vectors. Positional encodings are added to retain the positional information, as is done in transformers for language tasks. The transformed input is then processed through multiple layers of transformer encoders to create a model capable of understanding complex visual data.
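
The patch-splitting and embedding step described above can be sketched in a few lines of PyTorch. The image size, patch size, and embedding width below match the ViT-Base configuration from the original paper, but the code itself is an illustrative sketch rather than the reference implementation:

```python
import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)  # a batch containing one random RGB image
patch_size = 16

# Split the image into a grid of 16x16 patches, then flatten each patch
# into a vector of its pixel values.
patches = image.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, -1, 3 * patch_size * patch_size)
print(patches.shape)                 # (1, 196, 768): a 14x14 grid of flattened patches

# Project each flattened patch to an embedding and add learnable positional
# encodings so the model retains where each patch came from.
embed = nn.Linear(3 * patch_size * patch_size, 768)
pos = nn.Parameter(torch.zeros(1, patches.shape[1], 768))
tokens = embed(patches) + pos        # the sequence of patch tokens
# `tokens` would then pass through the stack of transformer encoder layers.
```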

Just as Convolutional Neural Networks (CNNs) learn to identify patterns and features in an image through convolutional layers, Vision Transformers identify patterns by focusing on the relationships between patches in an image. They essentially learn to weigh the importance of different patches in relation to others to make accurate classifications. The ViT model was first introduced by the Google Brain team in a 2020 paper. While CNNs dominated the field of computer vision for years, the introduction of Vision Transformers demonstrated that methods developed for natural language processing could also be used for image classification tasks, often with superior results.

One significant advantage of Vision Transformers is that, unlike CNNs, they do not have a built-in assumption of spatial locality and shift invariance. This means they are better suited for tasks where global understanding of an image is required, or where small shifts can drastically change the meaning of an image.

However, ViTs typically require a larger amount of data and compute resources compared to CNNs. This factor has led to a trend of hybrid models that combine both CNNs and transformers to harness the strengths of both architectures.

Seed classification

Background:

Canada's multi-billion-dollar seed and grain industry has established a global reputation in the production, processing, and exportation of premium-grade seeds for planting and grains for food across a diverse range of crops. This success is built on Canada's commitment to innovation and the development of advanced technologies, allowing for the delivery of high-quality products, backed by diagnostic certification, that meet both national and international standards.

Naturally, a collaboration was formed between a research group from the Seed Science and Technology Section and the AI Lab of the CFIA to maintain Canada's role as a reputable leader in the global seed and grain industries and their associated testing services.

Background: Quality Control

The seed quality of a crop is reflected in a grading report, whereby the final grade indicates how well a seed lot conforms with Canada's Seeds Regulations to meet minimum quality standards. Factors used to determine crop quality include contamination by weed seeds listed in Canada's Weed Seeds Order, purity analysis, germination, and disease. While germination testing indicates potential field performance, assessing physical purity is essential in ensuring that the crop contains a high proportion of the desired seeds and is free from contaminants, such as prohibited and regulated species, other crop seeds, or other weed seeds. Seed inspection plays an important role in preventing the spread of the prohibited and regulated species listed in the Weed Seeds Order.

Canada is one of the biggest production bases for the global food supply, exporting large quantities of grains such as wheat, canola, lentils, and flax. To meet phytosanitary certification requirements and access a wide range of foreign markets, analysis of the weed seeds regulated by importing destinations is in high demand, with quick turnaround times and frequently changing requirements. Expanding testing capacity for weed seeds requires the support of advanced technologies, as traditional methods struggle to keep up with these demands.

Motivation

Presently, the evaluation of a crop's quality is done manually by human experts. However, this process is tedious and time-consuming. At the AI Lab, we leverage advanced computer vision models to automatically classify seed species from images, rendering this process more efficient and reliable.

This project aims to develop and deploy a powerful computer vision pipeline for seed species classification. By automating this classification process, we are able to streamline and accelerate the assessment of crop quality. We build upon advanced algorithms and deep learning techniques while ensuring an unbiased and efficient evaluation of crop quality, paving the way for improved agricultural practices.

Project #1: Multispectral Imaging and Analysis

In this project, we employ a custom computer vision model to assess content purity by identifying and distinguishing desired seed species from undesired ones.

We successfully recover and identify contamination by three different weed species in a screening mixture of wheat samples.

Our model is customised to accept unique high-resolution, 19-channel multispectral image inputs and achieves greater than 95% accuracy on held-out testing data.

We further explored our model's potential to classify new species by injecting five new canola species into the dataset and observing similar results. These encouraging findings highlight our model's potential for continued use even as new seed species are introduced.
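
While this article does not detail the model internals, one common way to handle a non-RGB channel count is to widen the first convolutional layer of a standard backbone. The sketch below (PyTorch and a ResNet-18 backbone, both illustrative assumptions) shows the general idea for 19-channel input and the nine species listed next:

```python
import torch
import torch.nn as nn
from torchvision import models

# Replace the first convolution of a standard backbone so it expects
# 19 input channels instead of the usual 3 (RGB).
backbone = models.resnet18(weights=None)
backbone.conv1 = nn.Conv2d(19, 64, kernel_size=7, stride=2, padding=3, bias=False)
backbone.fc = nn.Linear(backbone.fc.in_features, 9)  # nine species in this project

x = torch.randn(2, 19, 224, 224)  # two synthetic 19-channel multispectral images
print(backbone(x).shape)          # torch.Size([2, 9]): one score per species
```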

Our model was trained to classify the following species:

  • Three different thistle (weed) species:
    • Cirsium arvense (regulated species)
    • Carduus nutans (similar to the regulated species)
    • Cirsium vulgare (similar to the regulated species)
  • Six crop seeds:
    • Triticum aestivum subspecies aestivum
    • Brassica napus subspecies napus
    • Brassica juncea
    • Brassica juncea (yellow type)
    • Brassica rapa subspecies oleifera
    • Brassica rapa subspecies oleifera (brown type)

Our model was able to correctly identify each seed species with an accuracy of over 95%.

Moreover, when the three thistle seed species were mixed into the wheat screening, the model achieved an average accuracy of 99.64% across 360 seeds. This demonstrated the model's robustness and ability to classify new images.

Finally, we introduced five new canola species and types and evaluated our model's performance. Preliminary results from this experiment showed a ~93% accuracy on the testing data.

Project #2: Digital Microscope RGB Imaging and Analysis

In this project, we employ a 2-step process to identify a total of 15 different seed species of regulatory significance that are morphologically challenging to distinguish, across varying magnification levels.

First, a seed segmentation model is used to identify each instance of a seed in the image. Then, a classification model classifies each seed species instance.

We perform multiple ablation studies by training on one magnification profile and then testing on seeds from a different magnification set. We show promising preliminary results of over 90% accuracy across magnification levels.
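
A minimal sketch of this 2-step pipeline is shown below. The `segment_seeds` and `classify_seed` functions are hypothetical placeholders, as the article does not name the underlying segmentation and classification models:

```python
from typing import List
import numpy as np

def segment_seeds(image: np.ndarray) -> List[np.ndarray]:
    """Step 1: detect each seed instance and return its cropped region.
    Placeholder: pretends every image contains exactly two seeds."""
    h, w = image.shape[:2]
    return [image[: h // 2, : w // 2], image[h // 2 :, w // 2 :]]

def classify_seed(crop: np.ndarray) -> str:
    """Step 2: assign a species label to one seed crop. Placeholder label."""
    return "Cirsium arvense"

def identify_species(image: np.ndarray) -> List[str]:
    # Segmentation runs once per image; classification runs once per seed.
    return [classify_seed(crop) for crop in segment_seeds(image)]

print(identify_species(np.zeros((512, 512, 3))))  # ['Cirsium arvense', 'Cirsium arvense']
```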

Three different magnification levels were provided for the following 15 species:

  • Ambrosia artemisiifolia
  • Ambrosia trifida
  • Ambrosia psilostachya
  • Brassica juncea
  • Brassica napus
  • Bromus hordeaceus
  • Bromus japonicus
  • Bromus secalinus
  • Carduus nutans
  • Cirsium arvense
  • Cirsium vulgare
  • Lolium temulentum
  • Solanum carolinense
  • Solanum nigrum
  • Solanum rostratum

A mix of the 15 different species was imaged at varying magnification levels. The magnification level was denoted by the total number of seed instances present in the image: either 1, 2, 6, 8, or 15 seeds per image.

In order to establish a standardised image registration protocol, we independently trained separate models on subsets of the data at each magnification, then evaluated each model's performance on a reserved test set spanning all magnification levels.
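
This ablation protocol can be summarised with the following sketch, in which `train_model` and `evaluate` are hypothetical placeholders for the real training and evaluation routines, and the reported numbers are random stand-ins:

```python
import random

MAGNIFICATIONS = [1, 2, 6, 8, 15]  # seeds per image, as defined above

def train_model(magnification: int) -> str:
    return f"model_trained_on_{magnification}"   # placeholder for real training

def evaluate(model: str, magnification: int) -> float:
    return round(random.uniform(0.90, 1.00), 3)  # placeholder accuracy

# Train one model per magnification level.
trained = {m: train_model(m) for m in MAGNIFICATIONS}

# Evaluate every trained model on every magnification level; the off-diagonal
# entries measure how well a model transfers to unseen magnifications.
results = {
    (train_mag, test_mag): evaluate(trained[train_mag], test_mag)
    for train_mag in MAGNIFICATIONS
    for test_mag in MAGNIFICATIONS
}
print(results[(1, 15)])  # e.g. accuracy of the 1-seed model on 15-seed images
```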

Preliminary results demonstrated the model's ability to correctly identify seed species across magnifications with over 90% accuracy.

This revealed the model's potential to accurately classify previously unseen data at varying magnification levels.

Throughout our experiments, we tried and tested different methodologies and models.

Advanced models with hierarchical feature representations, such as Swin Transformers, fared much better and proved to be less perturbed by the magnification and zoom level.

Discussion + Challenges

Automatic seed classification is a challenging task. Training a machine learning model to classify seeds poses several challenges due to the inherent heterogeneity within and between different species. Consequently, large datasets are required to effectively train a model to learn species-specific features. Additionally, the high degree of similarity among species within some genera makes it challenging for even human experts to differentiate between closely related intra-genus species. Furthermore, the quality of image acquisition can also impact the performance of seed classification models, as low-quality images can result in the loss of important information necessary for accurate classification.

To address these challenges and improve model robustness, data augmentation techniques were applied as part of the preprocessing steps. Affine transformations, such as scaling and translating images, were used to increase the sample size, while Gaussian noise was added to increase variation and improve generalization on unseen data, preventing overfitting on the training data.
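
A minimal sketch of such an augmentation pipeline is shown below, using torchvision (an illustrative choice; the parameter values are likewise illustrative, not the ones used in our experiments):

```python
import torch
from torchvision import transforms
from PIL import Image

augment = transforms.Compose([
    # Affine transformations: random rotation, translation, and scaling.
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.8, 1.2)),
    transforms.ToTensor(),
    # Additive Gaussian noise to increase variation and curb overfitting.
    transforms.Lambda(lambda x: x + 0.05 * torch.randn_like(x)),
])

img = Image.new("RGB", (224, 224))  # stand-in for a real seed image
augmented = augment(img)
print(augmented.shape)              # torch.Size([3, 224, 224])
```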

Selecting the appropriate model architecture was crucial in achieving the desired outcome. A model may fail to produce accurate results if end users do not adhere to a standardized protocol, particularly when given data that falls outside the expected distribution. Therefore, it was imperative to consider various data sources and utilize a model that can effectively generalize across domains to ensure accurate seed classification.

Conclusion

The seed classification project is an example of the successful and ongoing collaboration between the AI Lab and the Seed Science group at the CFIA. By pooling their respective knowledge and expertise, both teams contribute to the advancement of Canada's seed and grain industries. The seed classification project showcases how leveraging advanced machine learning tools has the potential to significantly enhance the accuracy and efficiency of evaluating seed and grain quality in compliance with seed and plant protection regulations, ultimately benefiting the agricultural industry, consumers, Canadian biosecurity, and food safety.

As Data Scientists, we recognise the importance of open-source collaboration, and we are committed to upholding the principles of open science. Our objective is to promote transparency and engagement through open sharing with the public.

By making our application available, we invite fellow researchers, seed experts, and developers to contribute to its further improvement and customisation. This collaborative approach fosters innovation, allowing the community to collectively enhance the capabilities of the SeedID application and address specific domain requirements.

Meet the Data Scientist

If you have any questions about my article or would like to discuss this further, I invite you to Meet the Data Scientist, an event where authors meet the readers, present their topic and discuss their findings.

Register for the Meet the Data Scientist event. We hope to see you there!

MS Teams – link will be provided to the registrants by email

Subscribe to the Data Science Network for the Federal Public Service newsletter to keep up with the latest data science news.
