Comments (44)

tkupek avatar tkupek commented on May 13, 2024 7

@gau-nernst I have an approach that bypasses the need for a temp file and initializes the COCOeval class with the detections and targets directly.
To the best of my knowledge, this works with the current torchmetrics functionality.
Let me upload my rough draft in the evening so we can discuss the approach together. If we agree, we can improve the code to make it acceptable for the torchmetrics package.

SkafteNicki avatar SkafteNicki commented on May 13, 2024 6

Reopening, as the current implementation in torchmetrics is for information retrieval, and this issue asks for an implementation for object detection.

OrthoDex avatar OrthoDex commented on May 13, 2024 3

Hello! First time contributor here. I'd like to take a shot at this!

tkupek avatar tkupek commented on May 13, 2024 3

I have a draft for a pycocotools wrapper lying around. If you want, we can use it as the basis for a PR to discuss?

tkupek avatar tkupek commented on May 13, 2024 3

So this is my approach:
https://github.com/tkupek/pytorchlightning-metrics/blob/mean-average-precision/torchmetrics/image/map.py

It takes the preds and targets and converts them into COCO format in memory, then initializes the COCOeval class and runs the evaluation.
At my company this works smoothly, with only a minor performance impact, even for multi-GPU training.

Incomplete TODO list:

  • check if COCOeval can handle tensors to avoid .cpu() calls
  • standardize MAPMetricResults to include all evaluation results
  • refactor some code parts (e.g. join the get_coco_target and get_coco_preds methods)
  • add unit tests and documentation in torchmetrics format

@zhiqwang @gau-nernst @SkafteNicki if you agree with this approach, I can try to finalize the draft next week (or we can work on it together).
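
For reference, here is a minimal sketch of the in-memory pattern (an illustration of the idea only, not the linked draft; the image and annotation values are made up):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground truth assembled directly in memory instead of being read from a JSON file.
coco_gt = COCO()
coco_gt.dataset = {
    "images": [{"id": 0}],
    "categories": [{"id": 1}],
    "annotations": [
        {"id": 1, "image_id": 0, "category_id": 1,
         "bbox": [10, 10, 50, 50], "area": 2500, "iscrowd": 0},
    ],
}
coco_gt.createIndex()

# Detections as a plain list of dicts; COCO.loadRes accepts this directly.
detections = [{"image_id": 0, "category_id": 1, "bbox": [12, 11, 48, 50], "score": 0.9}]
coco_dt = coco_gt.loadRes(detections)

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
map_value = coco_eval.stats[0]  # AP @ IoU=0.50:0.95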

Kshitij09 avatar Kshitij09 commented on May 13, 2024 2

I recently managed to implement a minimal version of Kaggle's Mean Average Precision metric. The difference is in the calculation itself. You can find the details here:

https://www.kaggle.com/c/global-wheat-detection/overview/evaluation

My kernel: pytorch-mean-absolute-precision-calculation

With slight modifications to the mAP formula, I suppose we would be able to integrate this metric into pytorch-lightning (since we already have average-precision implemented).

I've already written the logic to map predicted boxes to ground-truth ones (taking their respective scores into consideration, as sketched below), so have a look at the kernel and let me know if you find any issues.
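
As a rough illustration of that matching step, a hedged sketch using torchvision.ops.box_iou (not the kernel's exact code; boxes are assumed to be in (x1, y1, x2, y2) format):

import torch
from torchvision.ops import box_iou

def match_predictions(pred_boxes, pred_scores, gt_boxes, iou_threshold=0.5):
    # Greedily match predictions (highest score first) to unused ground-truth boxes
    # and return (true positives, false positives, false negatives) at the threshold.
    if gt_boxes.numel() == 0:
        return 0, pred_boxes.shape[0], 0
    order = pred_scores.argsort(descending=True)
    ious = box_iou(pred_boxes[order], gt_boxes)  # [num_preds, num_gt]
    gt_used = torch.zeros(gt_boxes.shape[0], dtype=torch.bool)
    tp = 0
    for row in ious:
        row = row.masked_fill(gt_used, 0.0)  # ignore already-matched ground truths
        best_iou, best_gt = row.max(dim=0)
        if best_iou >= iou_threshold:
            gt_used[best_gt] = True
            tp += 1
    fp = pred_boxes.shape[0] - tp
    fn = gt_boxes.shape[0] - tp
    return tp, fp, fn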

briankosw avatar briankosw commented on May 13, 2024 2

Hi, I would love to take this on. @SkafteNicki could you assign it to me?

awaelchli avatar awaelchli commented on May 13, 2024 1

would be great to have that metric!

SkafteNicki avatar SkafteNicki commented on May 13, 2024 1

@jspaezp I am fine with only having a functional interface (at least to begin with).
Sorry about not referencing this before; some work got forgotten in the transfer.

SkafteNicki avatar SkafteNicki commented on May 13, 2024 1

@tkupek Looks great to me. Please send a PR to us with what you have, then we can begin to review/comment/add to your code.

twsl avatar twsl commented on May 13, 2024 1

I'd like to add that this would be beneficial for object detection and segmentation as well.

tkupek avatar tkupek commented on May 13, 2024 1

@senarvi Thank you for your comment.
I think this is reasonable and have changed the dict keys to your suggestion (boxes, scores, labels).

The (y1, x1, y2, x2) format was an error in the docstring; we actually use (x1, y1, x2, y2). Fixed that, too.

@SkafteNicki @Borda where are we on the issues with DDP and adjusting the tests? Any time to look at this yet? I'm happy to help but I need some hints.

NumesSanguis avatar NumesSanguis commented on May 13, 2024

Not sure yet, but it looks like AveragePrecision (https://pytorch-lightning.readthedocs.io/en/stable/metrics.html) seems to be acting like mAP, but they didn't add "Mean" to the name?
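
If I read it correctly it computes per-class AP for classification; averaging the per-class values would give a "mean" AP in that sense. A hedged sketch (exact signatures and return types vary across torchmetrics/Lightning versions):

import torch
from torchmetrics.functional import average_precision

preds = torch.softmax(torch.randn(10, 5), dim=-1)  # [N, num_classes] probabilities
target = torch.randint(0, 5, (10,))                # [N] class indices

ap_per_class = average_precision(preds, target, num_classes=5)
if isinstance(ap_per_class, list):  # older versions return one AP tensor per class
    mean_ap = torch.stack(ap_per_class).mean()
else:                               # newer versions can already return the macro average
    mean_ap = ap_per_class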

SkafteNicki avatar SkafteNicki commented on May 13, 2024

@briankosw thank you for wanting to contribute!
Please ping me in your PR.

Vichoko avatar Vichoko commented on May 13, 2024

We need this! 🤩

For PyTorch Lightning modules focused on multi-class classification, it would be useful to obtain the mAP metric over the test_dataloader samples in a standard way that works with DDP, with only minor changes to the architecture, because this metric is commonly used to draw conclusions about the representation-learning side of deep learning architectures without using any thresholds or max functions to make hard decisions on the predictions.

Most classifiers have a pair of fully-connected layers that produce class-wise likelihood predictions. The output of the last layer before the FC layers could be used for a retrieval task over the test dataset, and mAP could be calculated over this.
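
A hedged sketch of that workflow with the retrieval-style mAP already in torchmetrics (the embedding/similarity part is purely illustrative):

import torch
from torchmetrics import RetrievalMAP

# Illustrative only: "embeddings" taken from the layer before the FC head.
emb = torch.nn.functional.normalize(torch.randn(8, 128), dim=-1)  # [N, D]
labels = torch.randint(0, 3, (8,))                                # [N]

sim = emb @ emb.t()                                     # cosine similarities
mask = ~torch.eye(8, dtype=torch.bool)                  # drop self-matches
preds = sim[mask]                                       # relevance score per (query, doc) pair
target = (labels[:, None] == labels[None, :])[mask]     # relevant if same class
indexes = torch.arange(8)[:, None].expand(8, 8)[mask]   # query id for each pair

retrieval_map = RetrievalMAP()
retrieval_map.update(preds, target, indexes=indexes)
print(retrieval_map.compute())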

jspaezp avatar jspaezp commented on May 13, 2024

@Borda I can give this a go.

But I am wondering what input format would be the most consistent with the rest of the package ...
Most of what is implemented in the package directly takes tensors of shape [N, C, ...] ... and what scope would you like for it ...

So directly ...

  1. Should it support only pre-aligned inputs? EDIT: I don't think so ... but supporting box inputs requires this to be implemented
    • (preds = [N,C] , targets = [N,C] )
  2. Should it support bounding boxes? If so, in what format (xxyy, xyxy, xwyh, xyhw ...)? Should each update be for a single "group" (each image, for instance) or should it take a grouping argument? (A signature sketch for this follows the list.)
    • (preds: float = [N1,C] , targets: long/bool = [N2,C] , pred_boxes = [x,x,y,y], target_boxes = [x,x,y,y], pred_grouping: long = [N1], target_grouping: long = [N2], iou_cutoff = 0.5); where the values in the grouping are in the range [0, NUM_GROUPS] so only boxes in the same group are compared with each other.
  3. Should it support masks?
    • Option 1: (preds: float = [N,C] , targets: long/bool = [N,C] , pred_masks = [N,H,W], target_masks = [N,H,W], grouping = [N], iou_cutoff = 0.5, mask_threshold = 0.5)
    • Option 2: (preds: float = [N1,C,H,W] , targets: long/bool = [N2,C,H,W], pred_grouping: long = [N1], target_grouping: long = [N2], iou_cutoff = 0.5, mask_threshold = 0.5)
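
The signature sketch referenced in point 2 above, purely illustrative; all names are placeholders, not an existing API:

from torch import Tensor

def mean_average_precision(
    preds: Tensor,            # [N1, C] class scores per predicted box
    targets: Tensor,          # [N2] or [N2, C] ground-truth labels
    pred_boxes: Tensor,       # [N1, 4] boxes (format would need to be agreed on, e.g. xyxy)
    target_boxes: Tensor,     # [N2, 4]
    pred_grouping: Tensor,    # [N1] image index per predicted box
    target_grouping: Tensor,  # [N2] image index per ground-truth box
    iou_cutoff: float = 0.5,
) -> Tensor:
    # Only boxes whose grouping indices match (i.e. boxes from the same image)
    # would be compared with each other.
    raise NotImplementedError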

On the other hand ... the COCO implementation has, for each image, a set of positive and negative labels and ignores the rest ... should that be supported?

lmk what you think
Kindest wishes,
Sebastian

edit: leaving here for reference

lucadiliello avatar lucadiliello commented on May 13, 2024

mAP was added to PL some months ago and is now part of the torchmetrics package.

Borda avatar Borda commented on May 13, 2024

added with 60c0eec

jspaezp avatar jspaezp commented on May 13, 2024

Just FYI, I have been working on an implementation over here https://github.com/jspaezp/metrics/tree/feature-spatial-average-precision but it is not quite ready for a PR.

Right now it needs ... (this is more a personal checklist before I send the PR)

  • DDP testing
  • Documentation (it is there, it just needs some argument updates, moving it up to the classes, and making it more user-friendly)
  • "Translating" the tests to be consistent with the rest of the testing suite (right now they are more for quick-and-dirty TDD than for production consistency)
  • A fair amount of code cleanup/reorganization so the code order makes logical sense.

Any suggestions before I send the PR would be great, but not expected.
Kindest wishes,
Sebastian

SkafteNicki avatar SkafteNicki commented on May 13, 2024

@jspaezp if you are going to send a PR, please also take a look at this PR: Lightning-AI/pytorch-lightning#4564 (it got closed after we moved from lightning to torchmetrics).
It was basically a nearly finished PR with a working mAP implementation, but it just needs a bit more work on the testing aspects.

jspaezp avatar jspaezp commented on May 13, 2024

@SkafteNicki thanks for letting me know! I feel like that implementation would be really hard to convert to a class metric due to how "monolithic" it is, but I can definitely work on it.

Best,
Sebastian
(btw: it would have been nice to have that PR referenced in this issue before this time)

patricieni avatar patricieni commented on May 13, 2024

Hello everyone! What is the status of this PR, is someone actively working on it?

From the prior PR in Lightning-AI/pytorch-lightning#4564 it doesn't seem that it would work for DDP, which seems to be the main challenge when implementing this as a metric.

Has anyone tried first implementing this metric as a subclass of torchmetrics.Metric? Our team is currently working on this.

zhiqwang avatar zhiqwang commented on May 13, 2024

Hi, as discussed in #190, perhaps adding a wrapper of pycocotools is also an option?

Borda avatar Borda commented on May 13, 2024

Perhaps adding a wrapper of pycocotools is also an option?

yep, that sounds good to me 🐰
cc: @SkafteNicki @maximsch2

SkafteNicki avatar SkafteNicki commented on May 13, 2024

@tkupek sounds good. Feel free to send a PR :]

SkafteNicki avatar SkafteNicki commented on May 13, 2024

@tkupek have you had time to take a look at this?

tkupek avatar tkupek commented on May 13, 2024

Sorry, not yet. I will have a look at it during my vacation in two weeks.

gau-nernst avatar gau-nernst commented on May 13, 2024

Hello, I was also looking for a way to evaluate my object detector in PyTorch Lightning. In the end I implemented some simple logic to make predictions in validation_step(), gather them in validation_epoch_end(), write them to a temp file and run pycocotools. This probably won't work with multi-GPU training.

I tried implementing this as a torchmetrics.Metric subclass before but was not successful; I was not sure why. I can try again.

Just a note on the technical details of Metric: can a metric state be a list of lists, where each element is a list of Tensors? The documentation seems to imply that a metric state can only be a torch.Tensor or a list of torch.Tensor.

Also, is it safe to move tensors to the CPU inside the compute() method?
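
On the state question: as far as I can tell, the docs only allow a Tensor or a list of Tensors per state, so one workaround is a flat list per field rather than a list of lists. A minimal sketch of that pattern (a rough illustration, not a working mAP metric):

import torch
from torch import Tensor
from torchmetrics import Metric

class BoxStateMetric(Metric):
    def __init__(self):
        super().__init__()
        # One flat list of tensors per field; on sync the lists from all
        # processes should be gathered and concatenated.
        self.add_state("pred_boxes", default=[], dist_reduce_fx=None)
        self.add_state("pred_scores", default=[], dist_reduce_fx=None)
        self.add_state("gt_boxes", default=[], dist_reduce_fx=None)

    def update(self, pred_boxes: Tensor, pred_scores: Tensor, gt_boxes: Tensor) -> None:
        self.pred_boxes.append(pred_boxes)
        self.pred_scores.append(pred_scores)
        self.gt_boxes.append(gt_boxes)

    def compute(self) -> Tensor:
        # Moving tensors to the CPU here (e.g. to call pycocotools) should be
        # fine, since compute() runs after the states have been synced.
        boxes_cpu = [b.cpu() for b in self.pred_boxes]
        # ... run the actual mAP computation on the gathered lists ...
        return torch.tensor(0.0)  # placeholder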

zhiqwang avatar zhiqwang commented on May 13, 2024

Hi @gau-nernst ,

I've also written a torchmetrics.Metric wrapper of pycocotools here (currently it only supports a single GPU). It can be used with the lightning-assembly, but it still lacks numerical unit tests and multi-GPU support.

tkupek avatar tkupek commented on May 13, 2024

The main problem with your approach (moving to CPU, writing to a file) will be performance and the lack of multi-GPU training support.

gau-nernst avatar gau-nernst commented on May 13, 2024

Hey @zhiqwang,

Awesome work. I see it's based on the torchvision/detectron2 evaluation code. From what I understand from your code, it requires a COCO ground-truth annotations file. I think if we add object detection mAP to torchmetrics, it should be agnostic to the dataset format and be computable from a set of ground-truth and predicted boxes only.

@tkupek Yes, these are the obvious problems with my approach. If we want performance, the logic should be implemented in torch rather than using pycocotools. Some people have done this, I believe. But the question is whether a torch implementation will be accurate and consistent with pycocotools.

For multi-GPU training, if torchmetrics synchronizes target and predicted boxes across GPUs, it should be okay to transfer the results to CPU to calculate mAP using pycocotools?

zhiqwang avatar zhiqwang commented on May 13, 2024

I see it's based on the torchvision/detectron2 evaluation code. From what I understand from your code, it requires a COCO ground-truth annotations file. I think if we add object detection mAP to torchmetrics, it should be agnostic to the dataset format and be computable from a set of ground-truth and predicted boxes only.

@gau-nernst Yep, I agree with you, the design and functionality of this module still requires some deliberation here.

For multi-GPU training, if torchmetrics synchronizes target and predicted boxes across GPUs, it should be okay to transfer the results to CPU to calculate mAP using pycocotools?

I guess it is possible; it seems that PyTorch 1.9 provides a better interface for synchronizing arbitrary picklable data in multi-GPU mode with torch.distributed.all_gather_object, and it has been applied in torchvision: pytorch/vision#3857
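
A minimal sketch of that interface (assumes an initialized process group; the detection payload is just an example):

import torch.distributed as dist

def gather_detections(local_detections):
    # Gather arbitrary picklable per-rank objects (e.g. lists of dicts with
    # boxes/scores/labels) onto every rank, then flatten into one list.
    world_size = dist.get_world_size()
    gathered = [None] * world_size
    dist.all_gather_object(gathered, local_detections)
    return [det for rank_dets in gathered for det in rank_dets]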

SkafteNicki avatar SkafteNicki commented on May 13, 2024

It is clear that you are all more expert than me on this specific metric, but let me get this straight:
do we need, from the torchmetrics side, to support gathering of arbitrary objects if we want this metric to work, or is there some trivial way around it?

SkafteNicki avatar SkafteNicki commented on May 13, 2024

If anyone has any opinion on the input format that preds and target should have in this metric, please comment on #467

tkupek avatar tkupek commented on May 13, 2024

I had a look at how TensorFlow handles this, and it seems like a dict is most common:
https://github.com/tensorflow/models/blob/master/research/object_detection/metrics/coco_evaluation.py

They use this for the groundtruth (target):

groundtruth_dict = {
  "groundtruth_boxes": [ymin, xmin, ymax, xmax],  # [num_boxes, 4], one row per box
  "groundtruth_classes": [num_boxes]              # class id per box
}

and this for predictions (preds):

detections_dict = {
  "detection_boxes": [ymin, xmin, ymax, xmax],  # [num_boxes, 4], one row per box
  "detection_scores": [num_boxes],              # confidence per box
  "detection_classes": [num_boxes]              # class id per box
}

I find this intuitive and would go the same way. One could discuss whether we want different dict keys for groundtruth and detections (but I think we should standardize things more).

The COCOeval class needs a list of values anyway, but I guess it would be more consistent for the torchmetrics API to expect fields of type Tensor? -> I drafted a version that allows torch.Tensor, np.ndarray and List.

Interesting sidenote: They also use a pycocotools wrapper.

tkupek avatar tkupek commented on May 13, 2024

@twsl we are implementing this for object detection (bounding boxes). You can have a look at the PR.
Do you know if pycocotools works for segmentation as well? Maybe you even have an example?

twsl avatar twsl commented on May 13, 2024

Yes, I do think so.
Actually, something similar based on IoU is already available here.
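
For reference, COCOeval also accepts iouType="segm" as long as the masks are RLE-encoded; a minimal sketch of encoding a binary mask for a detection dict (illustrative values only):

import numpy as np
from pycocotools import mask as mask_utils

binary_mask = np.zeros((480, 640), dtype=np.uint8)
binary_mask[100:200, 150:300] = 1

# pycocotools expects Fortran-contiguous uint8 arrays for RLE encoding.
rle = mask_utils.encode(np.asfortranarray(binary_mask))
rle["counts"] = rle["counts"].decode("utf-8")  # make it JSON-serializable

detection = {"image_id": 0, "category_id": 1, "segmentation": rle, "score": 0.9}
# ... then evaluate with COCOeval(coco_gt, coco_dt, iouType="segm") as for boxes.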

tkupek avatar tkupek commented on May 13, 2024

@SkafteNicki do you think this needs to be a part of this PR or can we make this an additional improvement?

SkafteNicki avatar SkafteNicki commented on May 13, 2024

@tkupek, no, let's try to get the metric merged as it is right now.
@twsl feel free to open a new issue with the proposed enhancements once this one is closed :]

senarvi avatar senarvi commented on May 13, 2024

PyTorch Lightning Bolts includes two object detection models. They take a list of dictionaries containing:

  • "boxes": [num_boxes, 4] the ground truth boxes in (x1, y1, x2, y2) format
  • "labels": [num_boxes] the class label for each ground truth box

The detections contain:

  • "boxes": [num_boxes, 4] predicted boxes in (x1, y1, x2, y2) format
  • "scores": [num_boxes] detection confidences
  • "labels": [num_boxes] the predicted class label for each box

Torchvision ops, for example NMS, consistently take "boxes" and "scores" in the same format.

It would be convenient to standardize the keys and use the same keys as the models use. (We can also change the models to use "classes" instead of "labels" if that's better.) Most importantly, I would prefer to use the same format for the boxes, (x1, y1, x2, y2). @tkupek why do you think (y1, x1, y2, x2) is intuitive? I haven't seen this format being used before. Also note that torchvision.ops.box_convert() supports the formats "xyxy" and "xywh", but not "yxyx".
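
To make the proposal concrete, a hedged sketch of what an update() call with those keys could look like (the metric call itself is hypothetical; torchvision.ops.box_convert is shown for getting boxes into xyxy):

import torch
from torchvision.ops import box_convert

# Boxes in (x, y, w, h) can be converted to the proposed (x1, y1, x2, y2) format.
xywh = torch.tensor([[10.0, 20.0, 30.0, 40.0]])
xyxy = box_convert(xywh, in_fmt="xywh", out_fmt="xyxy")

preds = [{
    "boxes": xyxy,                  # [num_boxes, 4] in (x1, y1, x2, y2)
    "scores": torch.tensor([0.9]),  # [num_boxes]
    "labels": torch.tensor([1]),    # [num_boxes]
}]
target = [{
    "boxes": torch.tensor([[12.0, 21.0, 38.0, 58.0]]),
    "labels": torch.tensor([1]),
}]

# metric.update(preds, target)  # hypothetical metric following this format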

twsl avatar twsl commented on May 13, 2024

I'm considering converting the COCOeval class to pytorch in order to make the mAP, mAR calculation a bit faster and to enable calculation on GPU. Would love to get some feedback.

senarvi avatar senarvi commented on May 13, 2024

I'm considering converting the COCOeval class to pytorch in order to make the mAP, mAR calculation a bit faster and to enable calculation on GPU. Would love to get some feedback.

Using the COCOeval-based metric is slow. I haven't checked whether it's the GPU-CPU transfer or whether the operation itself is just slow. If you can make it faster, that would be great, but I have a feeling that it's going to take some effort to utilize the GPU efficiently and still keep the implementation correct.

tkupek avatar tkupek commented on May 13, 2024

I'm considering converting the COCOeval class to pytorch in order to make the mAP, mAR calculation a bit faster and to enable calculation on GPU. Would love to get some feedback.

I agree with @senarvi, it would be great but is probably not trivial. Feel free to give it a try based on the existing PR and let us know what you learn 🙂

tkupek avatar tkupek commented on May 13, 2024

The PR looks quite complete to me. Can we finalize it?
#467
