project-agml / agml

AgML is a centralized framework for agricultural machine learning. AgML provides access to public agricultural datasets for common agricultural deep learning tasks, with standard benchmarks and pretrained models, as well as the ability to generate synthetic data and annotations.

License: Apache License 2.0

Python 98.58% Shell 0.27% CMake 0.16% C++ 0.99%

deep-learning agriculture pytorch dataset image-classification object-detection semantic-segmentation computer-vision synthetic-data

agml's Introduction

👨🏿‍💻👩🏽‍💻🌈🪴 Want to join the AI Institute for Food Systems team and help lead AgML development? 🪴🌈👩🏼‍💻👨🏻‍💻

We're looking to hire a postdoc with both Python library development and ML experience. Send your resume and GitHub profile link to [email protected]!

Overview

AgML is a comprehensive library for agricultural machine learning. Currently, AgML provides access to a wealth of public agricultural datasets for common agricultural deep learning tasks. In the future, AgML will provide ag-specific ML functionality related to data, training, and evaluation. Here's a conceptual diagram of the overall framework.

AgML supports both the TensorFlow and PyTorch machine learning frameworks.

Installation

To install the latest release of AgML, run the following command:

pip install agml

Quick Start

AgML is designed for easy usage of agricultural data in a variety of formats. You can start off by using the AgMLDataLoader to download and load a dataset into a container:

import agml

loader = agml.data.AgMLDataLoader('apple_flower_segmentation')

You can then use the built-in processing methods to prepare the loader for your training and evaluation pipelines. These include, but are not limited to, batching data, shuffling data, splitting data into training, validation, and test sets, and applying transforms.

import albumentations as A

# Batch the dataset into collections of 8 pieces of data:
loader.batch(8)

# Shuffle the data:
loader.shuffle()

# Apply transforms to the input images and output annotation masks:
loader.mask_to_channel_basis()
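loader.transform(
    # RandomContrast is deprecated in recent albumentations releases;
    # brightness_limit = 0 keeps this a contrast-only augmentation.
    transform = A.RandomBrightnessContrast(brightness_limit = 0),
    dual_transform = A.Compose([A.RandomRotate90()])
)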

# Split the data into train/val/test sets.
loader.split(train = 0.8, val = 0.1, test = 0.1)

The split datasets can be accessed using loader.train_data, loader.val_data, and loader.test_data. Any further processing applied to the main loader is propagated to the split datasets until the split attributes are accessed; after that, you need to apply processing to each split loader independently. You can also toggle processing on and off using the loader.eval(), loader.reset_preprocessing(), and loader.disable_preprocessing() methods.
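The fractional train/val/test split works by partitioning the dataset's samples. The plain-Python sketch below illustrates that idea. `split_indices` is a hypothetical helper, not AgML's implementation. It assumes the three fractions sum to 1 and that any rounding remainder goes to the test set.

```python
import random


def split_indices(num_items, train=0.8, val=0.1, test=0.1, seed=0):
    """Shuffle item indices and partition them into train/val/test index lists.

    Hypothetical helper for illustration only; it is not part of AgML.
    """
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)  # same seed -> same split on every run
    n_train = int(num_items * train)
    n_val = int(num_items * val)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]  # any rounding remainder ends up here
    return train_idx, val_idx, test_idx


# 148 items, matching the image count listed for apple_flower_segmentation.
train_idx, val_idx, test_idx = split_indices(148, train=0.8, val=0.1, test=0.1)
```

Fixing the seed keeps the split reproducible across runs, so the same images stay in the test set between experiments.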

You can visualize data using the agml.viz module, which supports several types of visualization for different data types:

# Disable processing and batching for the test data:
test_ds = loader.test_data
test_ds.batch(None)
test_ds.reset_preprocessing()

# Visualize the image and mask side-by-side:
agml.viz.visualize_image_and_mask(test_ds[0])

# Visualize the mask overlaid onto the image:
agml.viz.visualize_overlaid_masks(test_ds[0])

AgML supports both the TensorFlow and PyTorch libraries as backends, and provides functionality to export your loaders to native TensorFlow and PyTorch formats when you want to use them in a training pipeline. This includes exporting the AgMLDataLoader to a tf.data.Dataset or torch.utils.data.DataLoader, as well as converting the data within the AgMLDataLoader itself, so that you keep access to its core functionality.

# Export the loader as a `tf.data.Dataset`:
train_ds = loader.train_data.export_tensorflow()

# Convert to PyTorch tensors without exporting.
train_ds = loader.train_data
train_ds.as_torch_dataset()

You're now ready to use AgML to train your own models! AgML also comes with a training module that enables quick-start training of standard deep learning models on agricultural datasets. Training a grape detection model is as simple as the following code:

import agml
import agml.models

import albumentations as A

loader = agml.data.AgMLDataLoader('grape_detection_californiaday')
loader.split(train = 0.8, val = 0.1, test = 0.1)
processor = agml.models.preprocessing.EfficientDetPreprocessor(
    image_size = 512, augmentation = [A.HorizontalFlip(p=0.5)]
)
loader.transform(processor)

model = agml.models.DetectionModel(num_classes=loader.num_classes)

model.run_training(loader)

Public Dataset Listing

Dataset	Task	Number of Images
bean_disease_uganda	Image Classification	1295
carrot_weeds_germany	Semantic Segmentation	60
plant_seedlings_aarhus	Image Classification	5539
soybean_weed_uav_brazil	Image Classification	15336
sugarcane_damage_usa	Image Classification	153
crop_weeds_greece	Image Classification	508
sugarbeet_weed_segmentation	Semantic Segmentation	1931
rangeland_weeds_australia	Image Classification	17509
fruit_detection_worldwide	Object Detection	565
leaf_counting_denmark	Image Classification	9372
apple_detection_usa	Object Detection	2290
mango_detection_australia	Object Detection	1730
apple_flower_segmentation	Semantic Segmentation	148
apple_segmentation_minnesota	Semantic Segmentation	670
rice_seedling_segmentation	Semantic Segmentation	224
plant_village_classification	Image Classification	55448
autonomous_greenhouse_regression	Image Regression	389
grape_detection_syntheticday	Object Detection	448
grape_detection_californiaday	Object Detection	126
grape_detection_californianight	Object Detection	150
guava_disease_pakistan	Image Classification	306
apple_detection_spain	Object Detection	967
apple_detection_drone_brazil	Object Detection	689
plant_doc_classification	Image Classification	2598
plant_doc_detection	Object Detection	2598
wheat_head_counting	Object Detection	6512
peachpear_flower_segmentation	Semantic Segmentation	42
red_grapes_and_leaves_segmentation	Semantic Segmentation	258
white_grapes_and_leaves_segmentation	Semantic Segmentation	273
ghai_romaine_detection	Object Detection	500
ghai_green_cabbage_detection	Object Detection	500
ghai_iceberg_lettuce_detection	Object Detection	500
riseholme_strawberry_classification_2021	Image Classification	3520
ghai_broccoli_detection	Object Detection	500
bean_synthetic_earlygrowth_aerial	Semantic Segmentation	2500
ghai_strawberry_fruit_detection	Object Detection	500
vegann_multicrop_presence_segmentation	Semantic Segmentation	3775

Usage Information

Using Public Agricultural Data

AgML aims to provide easy access to a range of existing public agricultural datasets. The core of AgML's public data pipeline is the AgMLDataLoader. You can use the AgMLDataLoader or agml.data.download_public_dataset() to download a dataset locally; on future runs, it is loaded automatically from disk. From there, the data within the loader can be split into train/val/test sets, batched, augmented and transformed, and converted into a training-ready dataset (including batching, tensor conversion, and image formatting).
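Downloading once and loading from disk afterwards is a standard caching pattern. A minimal sketch of that pattern follows. It assumes a hypothetical `download_fn` callable and a `.complete` marker file. It is not AgML's actual download code, and AgML's real cache location and layout may differ.

```python
import tempfile
from pathlib import Path


def load_or_download(name, cache_dir, download_fn):
    """Return the local directory for dataset `name`, downloading only if needed.

    `download_fn(name, target_dir)` is a hypothetical stand-in for a real
    download routine. The `.complete` marker is written only after it returns,
    so an interrupted download is retried on the next call.
    """
    target = Path(cache_dir) / name
    marker = target / ".complete"
    if not marker.exists():
        target.mkdir(parents=True, exist_ok=True)
        download_fn(name, target)
        marker.touch()
    return target


calls = []


def fake_download(name, target_dir):
    # Stand-in for a real download: record the call and write a placeholder file.
    calls.append(name)
    (target_dir / "README.txt").write_text(f"placeholder for {name}\n")


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_download("apple_flower_segmentation", tmp, fake_download)
    second = load_or_download("apple_flower_segmentation", tmp, fake_download)
    same_path = first == second
    has_file = (first / "README.txt").exists()
```

The second call finds the marker and skips `download_fn` entirely, which is the "loaded automatically from disk on future runs" behaviour.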

To see the various ways in which you can use AgML datasets in your training pipelines, check out the example notebook.

Annotation Formats

A core aim of AgML is to provide datasets in a standardized format, so that multiple datasets can be combined into a single training pipeline. To this end, we provide annotations in the following formats:

Image Classification: Image-To-Label-Number
Object Detection: COCO JSON
Semantic Segmentation: Dense Pixel-Wise
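In the dense pixel-wise format, each mask pixel holds an integer class ID. The Quick Start's loader.mask_to_channel_basis() call appears, from its name, to convert such masks into one binary channel per class. The plain-Python sketch below shows that kind of one-hot conversion as an illustration only. `mask_to_channels` is hypothetical. The channels-first nested-list layout is also an assumption; real loaders typically work on NumPy arrays, and the channel axis may differ.

```python
def mask_to_channels(mask, num_classes):
    """Convert an H x W mask of integer class IDs into num_classes binary H x W channels.

    Hypothetical helper; returns a channels-first nested list (C x H x W).
    """
    channels = [[[0] * len(row) for row in mask] for _ in range(num_classes)]
    for y, row in enumerate(mask):
        for x, class_id in enumerate(row):
            if not 0 <= class_id < num_classes:
                raise ValueError(f"class id {class_id} is out of range")
            channels[class_id][y][x] = 1  # exactly one channel is set per pixel
    return channels


mask = [
    [0, 0, 1],
    [0, 2, 1],
]
one_hot = mask_to_channels(mask, num_classes=3)
```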

Contributions

We welcome contributions! If you would like to contribute a new feature, fix an issue that you've noticed, or even just mention a bug or feature that you would like to see implemented, please don't hesitate to use the Issues tab to bring it to our attention. See the contributing guidelines for more information.

Funding

This project is partly funded by the National AI Institute for Food Systems (AIFS).


agml's Issues

Zero matrices as synthetic outputs

In your AgML-Synthetic.ipynb, I ran the following lines:

loader = agml.data.AgMLDataLoader.helios('tomato_sample')
_ = agml.viz.visualize_image_and_mask(loader[0])

This shows a pair of images: the right one appears to be a correct segmentation result, while the left one is entirely black.

loader[0][0] is simply a zero matrix: np.all(loader[0][0] == 0) returns True. I have tried this several times and always get the same result.

Image utils references np without import

Trying to instantiate a model class like DetectionModel gives NameError: name 'np' is not defined

Errors In Changing Annotation Types

Hey agml team,

I'd like to generate bounding boxes instead of segmentations, so I changed opt.annotation_type to agml.synthetic.AnnotationType.object_detection. However, I got the following error when generating the images:

Thanks!

When will the AgML Crop Detection Generalizability Challenge open?

Is it possible to put our pretrained model in this project?

Nice project for agriculture. I believe it can be useful for the community.

Would it be possible to add our pretrained model and related resources to this project?

We trained a ViT-Large model on PlantCLEF and then fine-tuned it on 13 plant disease recognition datasets, one plant growth stage classification dataset, and one weed detection dataset. The experiments show good performance.

Details can be found in our paper and code.

We would like to contribute our model here; please let us know if that is possible.

Issue with installation of Helios

I have attached the generated log file. Thanks in advance for your help.
.helios_compilation_log-20240409-154451.log

Failed to initialize GLFW in Synthetic output

I was running the AgML-Synthetic.ipynb notebook in the examples folder and encountered this error from the following command:
generator.generate(name = 'tomato_sample2', num_images = 3, clear_existing_files = True)

I have checked generator.py in the synthetic folder, but I can't tell which subprocess caused the -6 return code.

I am running this under Ubuntu 20.04 on WSL2 on Windows 11. I am not sure whether my platform is causing this issue.

Always Getting Plain Black Images in Annotation Files

Hey,

I'm working through AgML-ActiveVision.ipynb and trying to generate some images. After the run completes, the images in the annotation files are entirely black, with no visible information. Did I do something wrong, or are they expected to be black for some reason?

Thanks!

DeepWeeds has 9 classes only

In the rangeland_weeds_australia dataset in the AgML repo, the class names are [no_weeds, chinee_apple, lantana, parkinsonia, parthenium, prickly_acacia, rubber_vine, siam_weed, snake_weed, negative]. However, the actual dataset does not have a "no_weeds" class, so the total number of classes is 9, not 10.
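One way to catch this kind of mismatch is to compare the declared class list against the labels that actually occur in the data. The sketch below is hypothetical. `unused_classes` is not an AgML function, and `toy_labels` is made-up data, not the real rangeland_weeds_australia labels.

```python
from collections import Counter


def unused_classes(class_names, labels):
    """Return declared class names whose index never appears in `labels`.

    `labels` holds integer indices into `class_names`. Hypothetical helper.
    """
    counts = Counter(labels)
    out_of_range = [label for label in counts if not 0 <= label < len(class_names)]
    if out_of_range:
        raise ValueError(f"labels outside the class list: {sorted(out_of_range)}")
    return [name for index, name in enumerate(class_names) if counts[index] == 0]


class_names = [
    "no_weeds", "chinee_apple", "lantana", "parkinsonia", "parthenium",
    "prickly_acacia", "rubber_vine", "siam_weed", "snake_weed", "negative",
]
# Toy labels for illustration only: every class except index 0 occurs.
toy_labels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 3]
missing = unused_classes(class_names, toy_labels)
```

A non-empty result flags declared classes with no samples. Those are candidates for removal from the class mapping.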

PermissionError when running agml.synthetic.reinstall_helios()

I am running AgML-Synthetic.ipynb in /AgML/examples and encountered this issue:

I found a previous issue whose solution was to run:

agml.synthetic.reinstall_helios()

So I added a code cell above the failing cell to run that code, but I encountered another permission error:

I am using a conda environment as the kernel, on Windows 11.

Datasets on plant diseases, pests detection & miscellaneous

PlantVillage-Dataset : https://github.com/spMohanty/PlantVillage-Dataset
Rice Leaf Diseases Dataset : https://archive.ics.uci.edu/ml/datasets/Rice+Leaf+Diseases
Plant Disease Symptoms : https://www.digipathos-rep.cnptia.embrapa.br/
Plant Diseases Dataset : https://www.kaggle.com/vipoooool/new-plant-diseases-dataset/
PlantVillage dataset : https://github.com/MarkoArsenovic/DeepLearning_PlantDiseases
Northern leaf blight : https://osf.io/p67rz/
Insects : http://www.nbair.res.in/insectpests/pestsearch.php
Crop : http://www.icgroupcas.cn/website_bchtk/index.html
PlantDoc dataset : https://github.com/pratikkayal/PlantDoc-Object-Detection-Dataset
Northern Leaf Blight (NLB) dataset for Maize : https://bisque.cyverse.org/client_service/browser?resource=/data_service/dataset
Apple leaf disease : https://www.kaggle.com/c/plantpathology-2020-fgvc7
Insect Pest Recognition : https://github.com/xpwu95/IP102
Tomato pest: https://data.mendeley.com/datasets/s62zm6djd2/1
LEM+ dataset: https://doi.org/10.1016/j.dib.2020.106553
Soybean images dataset: https://doi.org/10.1016/j.dib.2021.107756
fortunella margarita images : https://doi.org/10.1016/j.dib.2021.107293
Vegetable crops early stage of growth : https://doi.org/10.1016/j.dib.2022.108035
Pomegranate fruits: https://doi.org/10.1016/j.dib.2021.107249
Food crops and weed images: https://doi.org/10.1016/j.dib.2020.105833
FruitNet: Indian fruits image dataset: https://doi.org/10.1016/j.dib.2021.107686
Fuji apple: https://doi.org/10.1016/j.dib.2021.107629
Soybean seed data: https://doi.org/10.1016/j.dib.2018.12.090
Cassava whitefly count: https://doi.org/10.1016/j.dib.2022.107911
Downy mildew symptoms on Merlot grape variety: https://doi.org/10.1016/j.dib.2021.107250
Chinese medicinal blossoms: https://doi.org/10.1016/j.dib.2021.107655
Arabica coffee leaf images dataset: https://doi.org/10.1016/j.dib.2021.107142
Dataset of necrotized cassava root: https://doi.org/10.1016/j.dib.2020.106170
Medjool dates: https://doi.org/10.1016/j.dib.2021.107116

Encountered an error when attempting to compile Helios with CMake.

I created a conda environment with python == 3.9.16 and agml == 0.4.7.
I am executing the following cell in /AgML/examples/AgML-Synthetic.ipynb

and ran into this error.
This is my CMake version:

Community `models` contributions API

We're opening up this issue regarding how to enable easy, yet high quality, model contributions to AgML. This was raised initially in issue #32. If you are interested in contributing to this discussion and code development, let's have this conversation below.

FileNotFoundError: "... /style_tomato_sample.xml"

I was following your "AgML-Synthetic.ipynb" example.

I ran the code below:

generator.generate(name = 'tomato_sample', num_images = 3)

and the following error occurred:

FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_2880/388478152.py in <module>
      1 # Generate the data.
----> 2 generator.generate(name = 'tomato_sample', num_images = 3)

~/anaconda3/envs/agml/lib/python3.7/site-packages/agml/synthetic/generator.py in generate(self, name, num_images, output_dir, convert_data)
    355         xml_options = self._convert_options_to_xml()
    356         xml_file_base = f"style_{name}.xml"
--> 357         xml_options.write(os.path.join(XML_PATH, xml_file_base))
    358         xml_options.write(os.path.join(metadata_dir, xml_file_base))
    359 

~/anaconda3/envs/agml/lib/python3.7/xml/etree/ElementTree.py in write(self, file_or_filename, encoding, xml_declaration, default_namespace, method, short_empty_elements)
    758                 encoding = "us-ascii"
    759         enc_lower = encoding.lower()
--> 760         with _get_writer(file_or_filename, enc_lower) as write:
    761             if method == "xml" and (xml_declaration or
    762                     (xml_declaration is None and

~/anaconda3/envs/agml/lib/python3.7/contextlib.py in __enter__(self)
    110         del self.args, self.kwds, self.func
    111         try:
--> 112             return next(self.gen)
    113         except StopIteration:
    114             raise RuntimeError("generator didn't yield") from None

~/anaconda3/envs/agml/lib/python3.7/xml/etree/ElementTree.py in _get_writer(file_or_filename, encoding)
    795         else:
    796             file = open(file_or_filename, "w", encoding=encoding,
--> 797                         errors="xmlcharrefreplace")
    798         with file:
    799             yield file.write

FileNotFoundError: [Errno 2] No such file or directory: '/home/username/anaconda3/envs/agml/lib/python3.7/site-packages/agml/_helios/Helios/projects/SyntheticImageAnnotation/xml/style_tomato_sample.xml'

No "~/.agml/helios_config.json" file error

Hi.

The code below
pprint(agml.synthetic.available_canopies())

generates the following error

FileNotFoundError: [Errno 2] No such file or directory: /home/username/.agml/helios_config.json

Should I have installed Helios separately?

Error "Existing installation of Helios not found."

Hey, I keep getting an error in this line of code:
pprint(agml.synthetic.available_canopies())

Could you please take a look at it? Thanks!

ECCV 2022 DEGA-CV Challenge

When will you release the dataset?

AgML Crop Detection Generalizability Challenge
AgML Syn2Real Crop Detection Data Efficiency Challenge

Problems in visualizing grape_detection_syntheticday

On this dataset, I can see the bounding boxes with viz.visualize_image_and_boxes, but the other methods for showing masks (visualize_image_and_mask, overlay_segmentation_masks, visualize_overlaid_masks) raise errors. The problem could be that coco_info["segmentation"] is a 3D array and not a 2D one.
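For reference, COCO stores a polygon segmentation as a list of polygons, where each polygon is a flat list of alternating x and y coordinates, i.e. two levels of nesting. If the hypothesis above is right and each polygon instead carries an extra level (a list of [x, y] points), flattening it back is straightforward. The sketch assumes that specific shape and plain Python lists. A NumPy array could first be converted with .tolist(). `normalize_polygons` is hypothetical and not part of agml.viz.

```python
def normalize_polygons(segmentation):
    """Return COCO-style flat polygons: [[x1, y1, x2, y2, ...], ...].

    Accepts either that form or polygons made of [x, y] point pairs
    (one extra nesting level). Hypothetical helper for illustration.
    """
    polygons = []
    for polygon in segmentation:
        if polygon and isinstance(polygon[0], (list, tuple)):
            # Point-pair form: [[x1, y1], [x2, y2], ...] -> [x1, y1, x2, y2, ...]
            flat = [coord for point in polygon for coord in point]
        else:
            flat = list(polygon)
        if len(flat) % 2 != 0 or len(flat) < 6:
            raise ValueError("a polygon needs an even number of coordinates (at least 3 points)")
        polygons.append(flat)
    return polygons
```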

Update README

README needs some updating. I think we could include a very light version of the AgML-data.ipynb within the README.

Warning and error with visualize

When working on GoogleColab in the section "Using native TensorFlow and PyTorch datasets", I received:

A warning when running _ = viz.visualize_image_and_mask(image, annotation):

[AgML] 11-12-2021 03:25:02 WARNING - matplotlib.image: Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

An error when running _ = viz.visualize_image_and_boxes(next(iter(ds))):

TypeError: Got multiple types for coordinates: [<class 'numpy.int32'>, <class 'numpy.int32'>, <class 'numpy.int32'>, <class 'numpy.int32'>]

Everything else worked pretty well!
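The TypeError suggests that the drawing routine rejects NumPy integer scalars (numpy.int32) as coordinates. A common workaround is to cast box values to built-in Python ints before drawing. This is an untested assumption about the cause, not a confirmed fix. The sketch assumes COCO-style [x, y, width, height] boxes. `to_corner_points` is hypothetical.

```python
def to_corner_points(bbox):
    """Convert a COCO-style [x, y, width, height] box into two integer corner points.

    int() turns NumPy integer or float scalars into built-in Python ints;
    float values are truncated toward zero. Hypothetical helper.
    """
    x, y, width, height = bbox
    top_left = (int(x), int(y))
    bottom_right = (int(x + width), int(y + height))
    return top_left, bottom_right
```

Computing the far corner before casting avoids compounding truncation errors from casting x and width separately.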

Selecting data from PyTorch Dataset leads to dimension mismatch error

After converting an AgML dataset to a Torch dataset, selecting data gives an error. It seems like this error comes from the conversion between numpy arrays and torch tensors in tftorch:

https://github.com/plant-ai-biophysics-lab/AgML/blob/8fff70422885fe31a4849f150fa2c584bcb721a5/agml/backend/tftorch.py#L144

In the case of the apple flower segmentation dataset, the traceback suggests that the issue is with the annotation array, perhaps because it's a 2D array?

In the case of the bean disease and apple detection datasets (I wanted to try classification and object detection examples), it seems like I'm getting the issue with the image itself.

Bean disease:

Apple Detection:

Generation of multiple annotations at once

For synthesis, I'd like to suggest a feature to generate multiple types of annotation (bounding boxes, instance/class segmentation, etc.) per image at once. If I've understood correctly, the current version of AgML can only generate one type per run, so I need to synthesize again if I need another type. This costs time, and the synthesized environment may change between runs.

OSError: Encountered an error when generating synthetic data. Process returned code -6.

I encountered the error below when running:

generator = agml.synthetic.HeliosDataGenerator(opt)
generator.generate(name = 'AV-GOblet6', num_images = 1)

It mentions something about graphics. I was connecting from my local machine to a remote server via Jupyter Notebook. Could this setup matter? Thanks in advance for your help!

Loading XML file: /home/username/anaconda3/envs/agml/lib/python3.8/site-packages/agml/_helios/Helios/projects/SyntheticImageAnnotation/xml/style_AV-GOblet6.xml...done.
Reading XML file: /home/username/anaconda3/envs/agml/lib/python3.8/site-packages/agml/_helios/Helios/projects/SyntheticImageAnnotation/xml/style_AV-GOblet6.xml...Building canopy of goblet trellis grapevine...done.
Canopy consists of 1304 leaves and 942970 total primitives.
Ground geometry...done.
Ground consists of 1 total primitives.
done.
/home/username/.agml/synthetic/AV-GOblet6/image0Rendering RGB image containing 942.971K primitives...Initializing graphics...Failed to initialize graphics.
Common causes for this error:
-- OSX
  - Is XQuartz installed (xquartz.org) and configured as the default X11 window handler?  When running the visualizer, XQuartz should automatically open and appear in the dock, indicating it is working.
-- Linux
  - Are you running this program remotely via SSH? Remote X11 graphics along with OpenGL are not natively supported.  Installing and using VirtualGL is a good solution for this (virtualgl.org).
terminate called after throwing an instance of 'int'
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[13], line 5
      2 generator = agml.synthetic.HeliosDataGenerator(opt)
      4 # Generate the data.
----> 5 generator.generate(name = 'AV-GOblet6', num_images = 1)

File ~/anaconda3/envs/agml/lib/python3.8/site-packages/agml/synthetic/generator.py:410, in HeliosDataGenerator.generate(self, name, num_images, output_dir, convert_data, clear_existing_files)
    405         raise OSError(f"Encountered an error when generating synthetic "
    406                       f"data. Process returned code {process.returncode}, "
    407                       f"suggesting that the program ran out of memory. Try "
    408                       f"passing a smaller environment for generation.")
    409     else:
--> 410         raise OSError(f"Encountered an error when generating synthetic "
    411                       f"data. Process returned code {process.returncode}.")
    413 # Convert the dataset format.
    414 if convert_data:

OSError: Encountered an error when generating synthetic data. Process returned code -6.

Community `data` contributions API

We're opening up this issue regarding how to enable easy, yet high quality, data contributions to AgML. This was raised initially in Issue 15. If you are interested in contributing to this discussion and code development, let's have this conversation below.

Fatal error LiDAR.h No such file or directory

I updated from agml=0.4.6 to agml=0.5.0 and encountered "fatal error: LiDAR.h: No such file or directory" when running "agml.synthetic.reinstall_helios()".

I tried "pip install lidar", but that didn't fix the error.

I am using WSL2 on Windows 11.

Changing the paths of saved synthetic images

I would like to generate synthetic images under a certain directory (e.g., './syn_data'). I have tried using:

agml.backend.set_synthetic_save_path('./syn_data') before generation, but it still saves to ~/.agml/synthetic. Is there a particular way to use it correctly?