🐛 Bug (when working in `torch.float64`)


Explicitly initializing tensors in `float32` in MeanMetric goes against `torch.set_default_dtype`, leading to numerical errors (torchmetrics, 6 comments, closed)

viktor-ktorvi commented on June 7, 2024

Explicitly initializing tensors in `float32` in MeanMetric goes against `torch.set_default_dtype`, leading to numerical errors


Comments (6)

SkafteNicki commented on June 7, 2024

@viktor-ktorvi I had a bit of time to look at it, and it should be fixed in PR #2366.


SkafteNicki commented on June 7, 2024

Hi @viktor-ktorvi, thanks for getting back. I was too quick on the trigger and did not properly verify that everything was in order; sorry about that. I have now created #2386, which should be correct. The PR compares with exact equality (`==`) instead of `torch.allclose`, as you suggested. Tensors should now be kept in whatever dtype the metric is initialized with.


github-actions commented on June 7, 2024

Hi! Thanks for your contribution, great first issue!


Borda commented on June 7, 2024

So honoring the default dtype, if one is set, would help, right? Mind sending a fix PR? 🐰


viktor-ktorvi commented on June 7, 2024

Something like that.

I could try; it would be my first time, but I'll give it a go.


viktor-ktorvi commented on June 7, 2024

Hi,

I just found the time to check this. I've upgraded and the issue still isn't fixed.

What's wrong

The example I gave still doesn't work as expected.

Why the test passed

The implemented test passes because it uses `torch.allclose`. However, the values should not merely be close; they should be exactly equal. There's no reason for the two to differ if all calculations are performed in the same dtype.

Let me demonstrate why it's wrong:

import torch
from torchmetrics.aggregation import MeanMetric

torch.set_default_dtype(torch.float64)

metric = MeanMetric()

values = torch.randn(10000)
metric.update(values)
result = metric.compute()

actual_mean = values.mean()

print(f"{result} = Result\n{actual_mean} = Actual mean")

print(f"\nAll close = {torch.allclose(result, actual_mean, atol=1e-12)}")
print(f"Exactly equal = {result == actual_mean}")

Output:

-0.0041637580871582034 = Result
-0.004163758971599815 = Actual mean

All close = True
Exactly equal = False
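The gap between `Result` and `Actual mean` is consistent with the inputs passing through `float32` somewhere along the way. The stdlib-only sketch below is an illustration, not torchmetrics code: it emulates a `float32` round trip with `struct` (the `to_float32` helper is hypothetical) and mirrors the criterion documented for `torch.allclose`, `|a - b| <= atol + rtol * |b|`, in a hypothetical scalar `allclose` helper. The lossy mean passes the tolerance check but fails exact equality.

```python
import math
import random
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE float64) to the nearest float32 and widen it back."""
    return struct.unpack("f", struct.pack("f", x))[0]


def allclose(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-12) -> bool:
    """Scalar version of the torch.allclose criterion: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


random.seed(0)
values = [random.random() for _ in range(10_000)]

# Reference: every step stays in float64.
exact_mean = math.fsum(values) / len(values)

# Lossy path: inputs and result are rounded to float32, as a hard-coded cast would do.
lossy_mean = to_float32(math.fsum(to_float32(v) for v in values) / len(values))

print(f"{lossy_mean!r} = lossy (float32) mean")
print(f"{exact_mean!r} = float64 mean")
print("All close     =", allclose(lossy_mean, exact_mean))
print("Exactly equal =", lossy_mean == exact_mean)
```

Replacing `to_float32` with the identity function makes both paths perform the same float64 computation, so the two means become bit-for-bit identical. That is the property an `==` check enforces and an `allclose` check hides.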

Motivation

It might feel like I'm nitpicking, but these sorts of errors add up in complex problem formulations. For context, I'm working on approximating optimization problems with ML, and in my particular case, when casting from float64 to float32 and recalculating the values, the equality and inequality constraints are no longer satisfied and the objective function is off.
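To make the constraint argument concrete, here is a stdlib-only sketch with a hypothetical equality constraint `x + y == 1`. A point that satisfies it exactly in float64 violates it after a float32 round trip, emulated here with `struct` as a stand-in for a tensor cast (an assumption, not the author's setup).

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE float64) to the nearest float32 and widen it back."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical feasible point for the equality constraint x + y == 1, built in float64.
x = 0.1
y = 1.0 - x

residual_64 = (x + y) - 1.0                          # constraint evaluated in float64
residual_32 = (to_float32(x) + to_float32(y)) - 1.0  # same check after a float32 round trip

print(f"float64 residual: {residual_64!r}")
print(f"float32 round-trip residual: {residual_32!r}")
```

The float64 residual is exactly zero, while the round-tripped one is about -2.2e-8 (exactly `-3 * 2**-27`). A solver tolerance tighter than that would report the constraint as violated.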

How to fix

I've narrowed it down to the `_cast_and_nan_check_input` function, which is called in `update` (line 564). At the end of `_cast_and_nan_check_input` (line 104), `x.float()` is called, which explicitly casts to `float32`, so that would need changing.

Additionally, there are many statements like the following, where the dtype is set explicitly (e.g., line 79 or line 559):

if not isinstance(x, Tensor):
    x = torch.as_tensor(x, dtype=torch.float32, device=self.device)
if weight is not None and not isinstance(weight, Tensor):
    weight = torch.as_tensor(weight, dtype=torch.float32, device=self.device)

Each of those would need `dtype=torch.get_default_dtype()` instead.
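The proposed change boils down to "cast to the configured default, not to a hard-coded `float32`". As an analogy only (not PyTorch or torchmetrics code), the sketch below uses the stdlib `array` module, where typecode `"f"` stores C floats (float32 on common platforms) and `"d"` stores C doubles (float64). `DEFAULT_TYPECODE`, `set_default_typecode`, `cast_input_hardcoded` and `cast_input` are hypothetical stand-ins for `torch.set_default_dtype`, `torch.get_default_dtype` and the casting step described above.

```python
from array import array
from typing import Iterable, Optional

# Hypothetical module-level default, playing the role of torch.get_default_dtype().
DEFAULT_TYPECODE = "d"  # "d" = C double (float64)


def set_default_typecode(typecode: str) -> None:
    """Hypothetical analogue of torch.set_default_dtype."""
    global DEFAULT_TYPECODE
    DEFAULT_TYPECODE = typecode


def cast_input_hardcoded(values: Iterable[float]) -> array:
    """Mirrors the reported behaviour: always stores float32, ignoring the default."""
    return array("f", values)


def cast_input(values: Iterable[float], typecode: Optional[str] = None) -> array:
    """Uses an explicit typecode if given, otherwise the default read at call time."""
    return array(typecode or DEFAULT_TYPECODE, values)


data = [0.1, 0.2, 0.3]
print(list(cast_input_hardcoded(data)) == data)  # float32 storage rounds the values
print(list(cast_input(data)) == data)            # default "d" keeps them bit-for-bit
```

Reading the default at call time, rather than fixing one precision in the code, means a later `set_default_typecode` call takes effect, which is the behaviour the issue expects from `torch.set_default_dtype`.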

Finally, the test needs to check for exact equality, i.e., `result == compare_function(values)`, instead of using `torch.allclose`.

Thanks for your time! @SkafteNicki @Borda

