- Last updated on 2004.12.24 by Frans Janssens
Checklist of the Collembola:
Image Analysis, Morphometry and Classification of scanned Collembola samples applied to Specimen Identification

Frans Janssens, Department of Biology, University of Antwerp (RUCA), Antwerp, B-2020, Belgium
Frank B. Dazzo, Department of Microbiology & Molecular Genetics, Michigan State University, East Lansing, Michigan 48824-4320, USA


Identifying specimens of Collembola is a very time-consuming task. Especially in ecological studies, where the number of specimens to be investigated can be very large, the effort needed to identify all specimens is usually too high to produce the final results within the relatively short time frame of the study.
The automatic counting of specimens of the same species, using digital image analysis techniques, has been done with success (Krogh, Johansen & Holmstrup, 1998:201-205). The commercial application LemnaTec Collembol counts springtails automatically after the soil sample has been suspended in stained water. Besides the number, the size of each springtail is quantified, which gives additional information on the number and size of each generation and on the individual quality of the specimens.
Further automation of the identification process might relieve researchers of the tedious task of handling the many specimens manually.


Image processing and analysis tools have now advanced to such an extent that it might become feasible to apply them to the classification and identification of specimen samples. In the experimental approach described in this work, commonly available equipment and software are used to process samples of specimens in order to produce a report in which the specimens are classified according to predefined criteria.

Overview of the process


Preparing the specimens
Prepare a sample of specimens, extracted with a Tullgren or Berlese funnel, in a petri dish. Use a clear glass dish with a flat bottom. The specimens are put in ethanol + 10% glycerol. The addition of glycerol improves the distribution of the individual specimens in the petri dish and in this way prevents specimens from lying across one another. Also make sure to keep the liquid level in the dish low: just as high as the specimens are thick. The surface tension can then be used to spread the specimens evenly across the dish. The problem remains that specimens can still touch each other. This is solved by applying an erosion filter during image (pre)processing.
Fig.1. A scanned sample of Collembola specimens
Scanning the specimens sample
Initially, a low resolution scan (Fig.1) of a sample is used to evaluate the feasibility of the image processing and analysis technique for specimen identification purposes. The advantage of the low resolution is that the image files are relatively small. Since a lot of processing is to be done on the images, the small size reduces processing time, so that results are available for evaluation quickly. The disadvantage of a low resolution is that a lot of information is lost in the scanned image. With less information in the image, fewer criteria are available to classify the specimens.

The resolution used to make the scan is 300 x 300 dpi (dots per inch). This means that the smallest morphological feature recognisable in the image is about 85 micron. A higher resolution will give more detailed information, which the classification logic can use to produce a more accurate classification. The scan is saved as a bilevel (black and white) image in the OS/2 or Windows Bitmap file format (BMP). The bilevel image keeps the file relatively small. The BMP format is an uncompressed format, which guarantees that no information is lost while saving the scan.
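The figure of about 85 micron follows directly from the scan resolution: one inch is 25.4 mm, so at 300 dpi each pixel covers 25400/300 micron. A minimal sketch of this arithmetic is given below; the function name and the extra resolutions in the loop are illustrative only and do not come from the experiment.

```python
MICRON_PER_INCH = 25400.0  # 1 inch = 25.4 mm = 25400 micron


def pixel_size_micron(dpi):
    """Return the width of one scanned pixel in micron for a given resolution."""
    return MICRON_PER_INCH / dpi


# 300 dpi is the resolution used in the text; 600 and 1200 dpi are
# hypothetical higher resolutions shown for comparison only.
for dpi in (300, 600, 1200):
    print(dpi, "dpi ->", round(pixel_size_micron(dpi), 1), "micron per pixel")
```

Doubling the resolution halves the pixel size, but it also quadruples the number of pixels to be processed.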

Filtering the scanned image
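For y=1 to Inbmp.Height-2
  For x=1 to Inbmp.Width-2
    ' pixa is the current pixel; pixb, pixc and pixd are the other pixels of the mask
    if (pixa.Lum=0)                                  ' pixa is foreground (black)
      if (pixa.Lum+pixb.Lum+pixc.Lum+pixd.Lum>764)   ' the other 3 pixels are white (3 x 255 = 765)
        ' remove pixa by making it background (white)
      end if
    end if
  Next x
Next y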
Tab.1. The binary erosion algorithm
The scanned image contains a lot of noise. This noise has to be removed by preprocessing the image before the image analysis itself can start. Several off-the-shelf filters were evaluated, but none produced the desired result: the noise filter should not deform the images of the specimens themselves. Eventually the fully programmable filter for BMP formatted image files, 'BMP Wizard' by Andrea Benoni, was used. This filter can be programmed to manipulate the image down to pixel level. The filter is programmed with a simplified binary erosion algorithm (see pseudocode in Tab.1). Erosion is one of the basic operators of what is called mathematical morphology. Mathematical morphology operators concentrate on the task of reducing imaging information. The erosion process causes all isolated points and small objects to disappear. The structuring element (mask) used in the erosion is a square of 4 pixels. As the mask slides over the image, every foreground (black) pixel that touches 3 background (white) pixels within the mask is removed. This operation is repeated until no more pixels can be removed. The result of this repeated binary erosion is that all isolated dots, small objects and small protrusions of larger objects are removed (Fig. 2).
Fig.2. The eroded scan
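The idea of repeatedly removing foreground pixels that touch three background pixels can be sketched in Python as follows. This is not the BMP Wizard program: the exact layout of the mask is not given in the text, so the sketch assumes that the four edge neighbours (up, down, left, right) of each pixel are inspected and that pixels outside the image count as background. All function names are illustrative.

```python
WHITE, BLACK = 255, 0  # background and foreground luminance of the bilevel scan


def erode_once(img):
    """One pass: clear every black pixel with at least 3 white edge neighbours."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # write to a copy so a pass reads a fixed image
    removed = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != BLACK:
                continue
            white = 0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                # Assumption: pixels outside the image are treated as background.
                if not (0 <= ny < h and 0 <= nx < w) or img[ny][nx] == WHITE:
                    white += 1
            if white >= 3:
                out[y][x] = WHITE
                removed += 1
    return out, removed


def erode_until_stable(img):
    """Repeat the pass until no more pixels can be removed."""
    removed = 1
    while removed:
        img, removed = erode_once(img)
    return img


def from_text(rows):
    return [[BLACK if c == "#" else WHITE for c in row] for row in rows]


def to_text(img):
    return ["".join("#" if p == BLACK else "." for p in row) for row in img]


# Made-up test image: a 3x3 body, a one-pixel-wide protrusion and an isolated dot.
sample = from_text([
    "......#.",
    ".###....",
    ".######.",
    ".###....",
    "........",
])
for line in to_text(erode_until_stable(sample)):
    print(line)
```

In this toy image the isolated dot disappears in the first pass and the protrusion is eaten away from its tip, one pixel per pass, while the 3x3 body is left untouched.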
Fig.2a. All appendages removed
Fig.2b. Combined image to show removed parts

Applying another filter to the eroded image removes all appendages from the bodies of the specimens (Fig.2a). In our experiment, all processes narrower than 3 pixels are considered to be appendages. The combined image in Fig.2b shows in red what the filter has removed.
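The text does not say which filter was used to remove the appendages. One standard operation that removes every part of an object narrower than 3 pixels is a morphological opening (an erosion followed by a dilation) with a 3x3 square structuring element. The sketch below uses that opening as an assumed stand-in; it works on a binary image in which 1 is foreground and 0 is background, and the function names are illustrative.

```python
NEIGHBOURHOOD = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def erode3(img):
    """Keep a pixel only if the whole 3x3 square centred on it is foreground."""
    h, w = len(img), len(img[0])
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy, dx in NEIGHBOURHOOD) else 0
             for x in range(w)] for y in range(h)]


def dilate3(img):
    """Set a pixel if any pixel of the 3x3 square centred on it is foreground."""
    h, w = len(img), len(img[0])
    return [[1 if any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy, dx in NEIGHBOURHOOD) else 0
             for x in range(w)] for y in range(h)]


def remove_appendages(img):
    """Return (body, removed): the opened image and the pixels the opening dropped."""
    body = dilate3(erode3(img))
    removed = [[1 if img[y][x] and not body[y][x] else 0
                for x in range(len(img[0]))] for y in range(len(img))]
    return body, removed


# Made-up object: a 5x5 body with a 2-pixel-wide appendage on its right side.
rows = [
    "..........",
    ".#####....",
    ".#####....",
    ".########.",
    ".########.",
    ".#####....",
    "..........",
]
image = [[1 if c == "#" else 0 for c in row] for row in rows]
body, removed = remove_appendages(image)
for row in body:
    print("".join("#" if p else "." for p in row))
print("appendage pixels removed:", sum(map(sum, removed)))
```

The second return value plays the role of the red overlay in the combined image: it marks exactly those pixels that were present before the filter and absent after it.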
Object analysis of the filtered image
Object analysis and classification (also called 'blob analysis') is performed with the UTHSCSA Image Tool. Each object can be analysed using measurements such as: area, perimeter, compactness, roundness, elongation, major axis, minor axis, gray level, etc.
Fig.3. Some zoomed-in traced objects

Finding the objects using automatic thresholding of the image is the first step of the analysis. The find objects operation identifies isolated regions in the current image. Automatic thresholding produces reproducible results. Automatic object selection ignores objects that touch any edge of the image: since such border objects are most likely incomplete, analysing them further does not make sense. Objects smaller than a predefined minimum size can also be discarded by automatic object selection. The objects in the image are extracted by tracing their outlines. Each traced object is numbered and counted, and the identified objects are marked on the original image with annotated outlines and identification numbers. This is useful for interpreting the results of further analysis functions. The 8 times zoomed-in part (Fig.3) contains 9 traced objects, numbered as follows: 4, 7, 14, 15, 17, 19, 20, 21, and 22. Based on their size, the objects can be grouped in 4 categories, from small to large: (20), (4,7,15,17,19,21), (22), and (14). Obviously, object 20 is too small compared with the others. It can be discarded because it is probably an artefact of the imaging process or a sample contamination (dust, sand, etc.).
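The find objects step can be illustrated with a small connected-component search. The sketch below is not ImageTool's implementation; it assumes a binary image (1 = foreground), 8-connectivity between pixels, and a hypothetical parameter min_area for the minimum object size.

```python
from collections import deque


def find_objects(img, min_area=1):
    """Return the pixel lists of 8-connected objects, in scan order.

    Objects touching any image edge and objects smaller than min_area are dropped.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y0 in range(h):
        for x0 in range(w):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            pixels, touches_edge = [], False
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:  # breadth-first flood fill of one object
                y, x = queue.popleft()
                pixels.append((y, x))
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_edge = True
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if not touches_edge and len(pixels) >= min_area:
                objects.append(pixels)
    return objects


# Made-up image: one object on the border, one single-pixel speck, two real objects.
rows = [
    "##......",
    "##..##..",
    "....##..",
    ".#......",
    "....###.",
    "........",
]
grid = [[1 if c == "#" else 0 for c in row] for row in rows]
objects = find_objects(grid, min_area=2)
for number, pixels in enumerate(objects, start=1):
    print("object", number, "area", len(pixels))
```

The position of each object in the returned list can serve as its identification number, and the length of its pixel list is its area in pixels.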

The object analysis process extracts the dimensional features of identified objects in an image. With UTHSCSA ImageTool, 19 different attributes of an object can be computed. Since a bilevel scanned image is used, not all attributes are relevant (e.g. all attributes related to gray scale images are ignored). The relevant attributes are defined as follows:

Object | Area | Perimeter | Major Axis Length | Minor Axis Length | Elongation | Roundness | Feret Diameter | Compactness
Tab.2. Relevant morphometric measurements of the objects in Fig.3
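The derived shape attributes can be computed from the four basic measurements (area, perimeter, major and minor axis length). The sketch below assumes the definitions commonly documented for these measurements: elongation as the ratio of major to minor axis length, roundness as 4 x pi x area / perimeter^2, Feret diameter as the diameter of a circle with the same area, and compactness as that diameter divided by the major axis length. These should be checked against the ImageTool documentation before the values are compared with Tab.2.

```python
import math


def shape_descriptors(area, perimeter, major_axis, minor_axis):
    """Derive shape attributes from basic measurements (assumed definitions)."""
    feret = math.sqrt(4.0 * area / math.pi)  # diameter of a circle of equal area
    return {
        "elongation": major_axis / minor_axis,
        "roundness": 4.0 * math.pi * area / perimeter ** 2,
        "feret_diameter": feret,
        "compactness": feret / major_axis,
    }


# Sanity check on an ideal circle of radius 10 pixels (not a measured object):
# elongation, roundness and compactness should all equal 1 for this shape.
r = 10.0
circle = shape_descriptors(area=math.pi * r * r, perimeter=2 * math.pi * r,
                           major_axis=2 * r, minor_axis=2 * r)
print(circle)
```

With these definitions a globular object scores close to 1 on roundness and compactness, while a long, stretched object has a high elongation and low roundness.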

The complete image contains 82 objects in total. Measurement results of the 19 morphometric parameters are available for all 82 objects.

The object analysis procedure also has to be applied to the image with removed appendages (Fig.2a). Comparing both sets of measurements gives an indication of the relative length of the appendages. To be completed.

Classification of the analysed objects
With UTHSCSA ImageTool, the object classification process can classify the objects in an image based upon a single criterion. Any of the classifiable attributes, such as area, compactness, elongation, feret diameter, major axis length, minor axis length, perimeter and roundness can be used to classify the objects into different groups.
Supervised classification
A more object-oriented classification of the specimens, as opposed to the default feature-oriented classification of UTHSCSA ImageTool, can be performed with the Center for Microbial Ecology Image Analysis System (CMEIAS), which is a UTHSCSA ImageTool plug-in.
CMEIAS 1.27 is a free, scientific software tool of computer-assisted microscopy and digital image analysis originally intended for use in microbiological research and education. It was developed by a team of microbial ecologists and computer scientists at the Michigan State University Center for Microbial Ecology to perform a semi-automatic morphotype classification of the microbes present in digital images of microbial populations and communities. CMEIAS 1.27 operates within the free program UTHSCSA ImageTool Ver. 1.27 on a PC running Windows NT 4.0/2000/ME/XP.
To perform a morphotype classification using CMEIAS 1.27 in ImageTool, the operator first finds the objects of interest in the image by using a thresholding procedure, then conducts an Object Analysis to extract various size and shape measurements from each microbe present, and finally uses these Object Analysis data to perform an Object Classification that automatically assigns the appropriate morphotype to each microbe found. This object classification procedure uses a series of pattern recognition algorithms optimized for 11 major microbial morphotypes represented by 98% of the genera described in the 9th Edition of Bergey's Manual of Determinative Bacteriology. Extensive testing using large ground truth data sets indicates that CMEIAS performs with an overall morphotype classification accuracy of 97% on properly edited images.
Fig.4. CMEIAS v1.27 supervised morphotype classification

Dr Frank Dazzo of MSU read with interest about our initial 1999 morphometric experiments, in which digital image analysis with UTHSCSA ImageTool was applied to Collembola specimens. Curious how the CMEIAS morphotype classifier would perform on the same images, he applied CMEIAS to the image of the scanned petri dish sample of Collembola specimens in Fig.2. The result of the classification is illustrated in the pseudocoloured image of Fig.4. Depending on their shape, the specimens are classified as regular rods (13, blue), curved rods (3, magenta), U-shaped rods (1, pink), prosthecates (3, yellow), clubs (2, green), and rudimentary branched rods (2, gray), corresponding to the major microbial morphotypes as currently defined by CMEIAS. Note that the small Collembola specimens in the original image could not be classified because of their size, so they were removed manually.
This preliminary test result is quite promising, and Dr Frank Dazzo is prepared to develop the pattern recognition algorithms for the major morphotypes of Collembola.
Contact: Frank B. Dazzo, Professor of Microbiology, with questions and comments.

Unsupervised classification
An unsupervised clustering or classification procedure could be used to determine the ranges of the classification criterion for each class. In unsupervised clustering, a given collection of samples is classified according to a criterion function. The set of samples is partitioned into disjoint subsets. Each subset represents a cluster, with samples in the same cluster being somehow more similar than samples in different clusters. Hierarchical clustering is typically applied in biological taxonomy, where individual specimens are hierarchically grouped into species, species into genera, genera into families, and so on. Agglomerative (bottom-up) procedures start with singleton clusters and successively merge clusters. Divisive (top-down) procedures start with one cluster containing all samples and successively split clusters.
1. start with each sample = its own singleton cluster
2. stop if criterion function is satisfied
3. merge nearest distinct clusters pairwise
4. loop to 2
Fig.5. Basic Agglomerative Hierarchical Clustering
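The four steps of Fig.5 can be sketched in a few lines of Python. The sketch makes two assumptions that are not in the text: the criterion function is simply "a requested number of clusters has been reached", and the distance between two clusters is the distance between their closest members (single linkage). One pair of clusters is merged per loop iteration. The sample values are made up for illustration.

```python
import math


def single_link(a, b):
    """Distance between two clusters: the distance of their closest members."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(samples, n_clusters):
    clusters = [[s] for s in samples]          # 1. every sample is a singleton
    while len(clusters) > n_clusters:          # 2. stop when the criterion is met
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: single_link(clusters[ij[0]],
                                                     clusters[ij[1]]))
        merged = clusters.pop(j)               # 3. merge the nearest pair (j > i)
        clusters[i].extend(merged)
    return clusters                            # 4. the while loop is the "loop to 2"


# Made-up two-feature samples, e.g. (elongation, relative size) per object.
samples = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.2, 4.8), (9.0, 1.0)]
for cluster in agglomerate(samples, n_clusters=3):
    print(cluster)
```

Features measured on very different scales (for example area in pixels and roundness between 0 and 1) would normally be rescaled before the distances are computed, otherwise the largest-valued feature dominates the clustering.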

A classification is defined by specifying the ranges of the classification criterion for each cluster. Once a classification scheme is defined, the classification process will classify the objects based upon this scheme. The classification process basically can provide 3 different types of information: it can report statistics on the classifications themselves, it can report on the objects, and it can display an image in which objects are colored by their classification.
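A range-based classification scheme of this kind can be sketched as follows. The class names, the ranges and the object areas are hypothetical values chosen for illustration; they are not measurements from the scanned sample.

```python
from collections import Counter


def classify(value, scheme):
    """Return the name of the first class whose [low, high) range contains value."""
    for name, low, high in scheme:
        if low <= value < high:
            return name
    return "unclassified"


# Hypothetical scheme on a single criterion (area in pixels).
scheme = [
    ("small", 0, 50),
    ("medium", 50, 200),
    ("large", 200, float("inf")),
]

# Hypothetical areas, keyed by object number.
areas = {1: 12, 2: 75, 3: 90, 4: 260, 5: 140}

labels = {obj: classify(area, scheme) for obj, area in areas.items()}
stats = Counter(labels.values())

print("per object:", labels)        # report on the objects
print("per class:", dict(stats))    # statistics on the classification
```

The labels dictionary is also what would be needed to colour each object by its class in a pseudocoloured output image.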
To be completed...

Autoclustering techniques applied to the feature extraction matrix might help to obtain a kind of pseudotaxonomic classification of the specimens. E.g. for Collembola it should be feasible to at least classify the specimens in the two main groups: Arthropleona (with a long, stretched body) and Symphypleona (with a more globular body). Classifying the specimens on the basis of the measured features of both the images of the complete specimens (Fig.2) and the postprocessed images of specimens without appendages (Fig.2a) should allow further classification of the arthropleon Collembola into poduromorphs (typically with short antennae, legs and furca) and entomobryomorphs (typically with long antennae, legs and furca). To be completed.
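As a purely hypothetical illustration of such a pseudotaxonomic rule, the sketch below combines the elongation of the body with the share of the object's area that disappears when the appendages are removed. Both thresholds are arbitrary placeholders, not values derived from the experiment, and using area as the measure of appendage size is an assumption; suitable criteria and ranges would have to come from the clustering of real measurements.

```python
def appendage_fraction(area_full, area_body):
    """Share of the object's area that the appendage filter removed."""
    return (area_full - area_body) / area_full


def pseudo_taxon(elongation, area_full, area_body,
                 max_globular_elongation=1.5,       # placeholder threshold
                 min_long_appendage_fraction=0.25): # placeholder threshold
    """Assign a coarse group from body shape and relative appendage size."""
    if elongation <= max_globular_elongation:
        return "Symphypleona"
    if appendage_fraction(area_full, area_body) >= min_long_appendage_fraction:
        return "Arthropleona (entomobryomorph)"
    return "Arthropleona (poduromorph)"


# Made-up measurements: (elongation, area with appendages, area without).
for features in [(1.2, 100, 90), (3.0, 100, 60), (3.0, 100, 95)]:
    print(features, "->", pseudo_taxon(*features))
```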

Commercial Applications

The standardised reproduction test with Collembola (Folsomia candida) (ISO 11267:1999):
LemnaTec Collembol counts the springtails automatically after the soil has been suspended in stained water. Besides the number, the size of each springtail is quantified, which gives additional information on the number and size of each generation and on the individual quality of the specimens.
Contact: Dirk Vandenhirtz, CEO, LemnaTec.


I thank Veselin Pizurica for his advice on using the mathematical morphology erosion operator.