Automated plant stage labelling of herbarium samples in the family Brassicaceae

Authors

Matteo Jucker Riva, Lindsey Viann Parkinson

Supervisors

Badru Stanicki, Albin Plathottathil, Barry Sunderland

Purpose

The goal is to identify trends in the phenology of Brassicaceae by analyzing existing digitized preserved specimens. The model automatically labels images of Brassicaceae specimens as flowering, fruiting, both, or none. It could also be applied to newly collected samples.
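The four specimen-level labels can be derived from the structures found in an image. The following is a minimal sketch of that mapping, not the repository's own code; the function name `label_specimen` and the class names "flower" and "fruit" are assumptions made for illustration.

```python
def label_specimen(detected_classes):
    """Collapse per-structure detections into one specimen-level label.

    `detected_classes` is any iterable of class names; "flower" and "fruit"
    are assumed names, not necessarily those used by the trained model.
    """
    found = set(detected_classes)
    has_flower = "flower" in found
    has_fruit = "fruit" in found
    if has_flower and has_fruit:
        return "both"
    if has_flower:
        return "flowering"
    if has_fruit:
        return "fruiting"
    return "none"


# A specimen with two flowers and one fruit detected:
print(label_specimen(["flower", "fruit", "flower"]))  # prints "both"
```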

Data

The Herbaria Z+ZT and ZSS currently provide digital access to a total of 277,548 specimens, all published under a CC BY 4.0 licence and accessible through the online portal: https://www.herbarien.uzh.ch/en/belegsuche.html

Our study was limited to Brassicaceae samples collected in Valais, comprising approximately 6,000 images. The images are available from the herbarium online portal. Some photos have also been included in the repo so that the model can be run end to end.

Requirements

The requirements.txt file lists all the necessary Python packages.

TODO: hyperlink to requirements file

We ran the model using Google Colab. Some packages and dependencies may work better in Colab compared to other environments.

How to work with this repo

IMPORTANT: To make the model work, please download:

Mask_RCNN model files from akTwelve's repo, and add them to the src folder under the name "Mask_RCNN"
Model weights from the following address: model weights. Place them in the src folder under the name "model_weights"
OPTIONAL: the annotated dataset from the following links: train, test
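Before running anything, it can help to confirm that the two required downloads are in place. The sketch below only checks for the names given above inside the src folder; the function name `missing_downloads` is hypothetical and the script is not part of the repository.

```python
from pathlib import Path

# Names the downloads are expected to have inside src/, as listed above.
REQUIRED = ("Mask_RCNN", "model_weights")


def missing_downloads(project_root):
    """Return the required names that are absent from <project_root>/src."""
    src = Path(project_root) / "src"
    return [name for name in REQUIRED if not (src / name).exists()]


if __name__ == "__main__":
    missing = missing_downloads(".")
    if missing:
        print("Missing from src/: " + ", ".join(missing))
    else:
        print("All required downloads found.")
```

Run it from the main project folder; an empty result means both items were found.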

This repo is a customised version of the Mask_RCNN port for tensorflow/keras built by Allen Kelly. The src folder contains the main classes and functions needed to run this version of the model (herbaria.py) and the pretrained weights of the best-performing version of the model (model_weights). The files for interacting with and using the model are explained below.

Command line interface

In the main project folder, preprocess_images.py and run_detection.py provide a CLI (command line interface) for easy access to the important functions of the model. For example, run_detection.py can be run from the terminal as follows:

cd path/to/project
python3 run_detection.py -h  (shows an explanation of all the parameters)
python3 run_detection.py --input "path/to/images" --output "path/to/output/folder"  (example only; other parameters are available)

Preprocess_images

1_resize_images.ipynb
Resizes images, scales annotations appropriately, and removes segments that have too few points
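As an illustration of the annotation step, the sketch below rescales polygon annotations to a new image size and drops segments with too few points. It is not the notebook's code: the polygon format (a list of (x, y) tuples), the function name `scale_polygons`, and the `min_points=3` threshold are all assumptions.

```python
def scale_polygons(polygons, old_size, new_size, min_points=3):
    """Rescale polygons from old_size to new_size, both given as (width, height).

    Polygons with fewer than `min_points` vertices are dropped; the default
    of 3 is an assumed threshold, not one taken from the notebook.
    """
    old_w, old_h = old_size
    new_w, new_h = new_size
    sx, sy = new_w / old_w, new_h / old_h
    kept = []
    for poly in polygons:
        if len(poly) < min_points:
            continue  # too few points to describe a usable segment
        kept.append([(x * sx, y * sy) for x, y in poly])
    return kept


polygons = [
    [(0, 0), (100, 0), (100, 200)],  # a valid triangle
    [(5, 5), (6, 6)],                # only two points, will be removed
]
print(scale_polygons(polygons, old_size=(1000, 2000), new_size=(500, 1000)))
# prints [[(0.0, 0.0), (50.0, 0.0), (50.0, 100.0)]]
```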

2_Train_and_Inference.ipynb
Trains Mask R-CNN (Matterport's implementation) on our dataset, then uses the trained weights to run inference on new images

3_Detect_Flowers_and_Fruits.ipynb
Demonstrates how to use the trained model to detect fruits and flowers in herbarium sample pictures. For simplicity, you can instead run the detect_brassicas script from the command line

4_Evaluation_analysis.ipynb
Allows one to get predictions from a trained model and helps with understanding the results

Experimental Notebooks

This folder contains two notebooks from our earlier experiments.

InceptionResNetV2_classification.ipynb
Attempts to classify each image as a whole as fruiting, flowering, both, or neither. It lacks a proper function to balance the classes, which could improve results. When we stopped experimenting, we were getting an F1 score of approximately 0.45.
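One common way to balance classes is to weight each class inversely to its frequency. The sketch below shows that heuristic (weight = n_samples / (n_classes * class_count)); it is not something the notebook implements, the label counts are invented for the example, and the function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Invented, deliberately imbalanced label list (12 images, 4 classes).
labels = ["flowering"] * 6 + ["fruiting"] * 2 + ["both"] * 1 + ["neither"] * 3
print(balanced_class_weights(labels))
# prints {'flowering': 0.5, 'fruiting': 1.5, 'both': 3.0, 'neither': 1.0}
```

Rare classes receive larger weights, so their errors count more during training. Frameworks that accept per-class weights usually expect them keyed by integer class index rather than by name.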

UNet_segmentation.ipynb
Before we tried MaskRCNN, versions of UNet gave our best results. We think the UNet models have the potential to be as effective as MaskRCNN with further experimentation. We had trouble converting the masks created by the model into useful classification metrics.
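One possible route from a segmentation mask to an image-level classification is to count the pixels assigned to each class and apply a minimum-area threshold. The sketch below illustrates that idea only; the class encoding (0 = background, 1 = flower, 2 = fruit), the `min_pixels` value, and the function name `structures_present` are assumptions, not details of the notebook.

```python
def structures_present(mask, class_ids=None, min_pixels=50):
    """Report which classes cover at least `min_pixels` pixels in a 2D mask.

    `mask` is a 2D grid (rows of integer class ids). The default encoding
    {"flower": 1, "fruit": 2} and the threshold are assumed values.
    """
    class_ids = class_ids or {"flower": 1, "fruit": 2}
    name_by_id = {cid: name for name, cid in class_ids.items()}
    counts = {name: 0 for name in class_ids}
    for row in mask:
        for value in row:
            name = name_by_id.get(int(value))
            if name is not None:
                counts[name] += 1
    return {name: n >= min_pixels for name, n in counts.items()}


# Toy 100x100 mask: a 10x10 flower patch and a 2x2 fruit speck.
mask = [[0] * 100 for _ in range(100)]
for r in range(10, 20):
    for c in range(10, 20):
        mask[r][c] = 1
for r in range(50, 52):
    for c in range(50, 52):
        mask[r][c] = 2

print(structures_present(mask))  # prints {'flower': True, 'fruit': False}
```

The threshold suppresses tiny spurious regions, here the 4-pixel fruit speck. The pure-Python loop is slow on full-resolution images, where a vectorised count would be preferable.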

Preprocessing images

Image mask annotations were created with Datatorch.io and are provided here as a JSON file.

Running the model

We implemented a MaskRCNN model based on the work of akTwelve and Matterport.

Example Results

TODO add images to repo

Currently, the model picks up 34% of reproductive structures. Of those, it classifies 67% correctly as flowers or fruits.
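Read together, the two figures imply an end-to-end rate: the share of all reproductive structures that are both detected and correctly classified. The short calculation below assumes the 67% applies only to the detected structures, as the wording suggests; the function name `end_to_end_rate` is hypothetical.

```python
def end_to_end_rate(detection_rate, class_accuracy):
    """Fraction of all structures that are both detected and correctly classified."""
    return detection_rate * class_accuracy


rate = end_to_end_rate(0.34, 0.67)
print(f"{rate:.1%}")  # prints "22.8%"
```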

We believe the Mask RCNN model can be improved significantly with further parameter tuning. Even so, the model can already run through images and pull out those with characteristics useful in phenological studies, which saves time in the initial data search.

eth-library-lab/herbaria--plant-labeling

Folders and files

Latest commit

History

Repository files navigation

Automated plant stage labelling of herbarium samples in the family Brassicaceae

Authors

Supervisors

Purpose

Data

Requirements

How to work with this repo

Command line interface

Experimental Notebooks

Preprocessing images

Running the model

Example Results

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages