Hey musl. Some people on Discord recently asked whether <a class="co

I documented it now. Thanks musl! <bl

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Any update on this <a class="user-mention notranslate" data-hovercard-type="user" data

All architectures with `rgb_mean = (0.5, 0.5, 0.5)` are incompatible about neosr HOT 9 CLOSED

RunDevelopment commented on August 20, 2024

All architectures with `rgb_mean = (0.5, 0.5, 0.5)` are incompatible

from neosr.

Comments (9)

muslll commented on August 20, 2024 1

I documented it now. Please send an example code of how you think it would be better to add such tensor for detection and I can add it to the modified networks. Would something like this work for you?

self.register_buffer("neosr", torch.zeros(1))

Just to make it clear: this change to the rgb mean values was needed to improve stability. Several datasets were giving unstable training, and after debugging for several hours I found that the original values was the cause of it. The reason is that the default values were copied directly from SwinIR code, and they used ImageNet1k rgb mean values. This means that datasets that do not have values close to these (none, not even the classic dataset used in SISR research, DIV2k) will cause extreme values, making training unstable.
So this was not a 'random change' at all, it was made for a reason. The alternative to this solution would be to remove normalization altogether, which would completely break all models. Simply changing it to neutral values was the best way I found while keeping compatibility.

from neosr.

RunDevelopment commented on August 20, 2024 1

Will the current official implementation still work with a model that has this new tensor?

As Joey said, if the official arch code load the model with strict=True, then the additional tensor will prevent the model from loading. This is the intended effect, because a model with this tensor is incompatible.

I also want to point out that only parameters that differ from the values in the official arch will be stored. So e.g. an ATD model with rgb_mean=(0.4488, 0.4371, 0.4040) will not have the extra tensor, and so official arch code is guaranteed to load it in strict mode.

from neosr.

RunDevelopment commented on August 20, 2024

I documented it now.

Thanks musl!

Just to make it clear: this change to the rgb mean values was needed to improve stability.

You have already made numerous improvements that objectively make these architectures better. I'm not suggesting for you to stop doing that, we simply need to find a way to make it possible for spandrel and other to detect these changes.

Please send an example code of how you think it would be better to add such tensor for detection and I can add it to the modified networks.

Some on EE suggested an interesting solution that would ensure that we'll never have this problem again: put all changes behind new hyperparameters and store those extra hyperparameters in the model.

So we make rgb_mean a hyperparameter and store it inside a dict that contains all hyperparameters that relate to modifications you made. We then JSON serialize that dict and put inside an uint8 tensor that we store in the model (let's called this tensor neosr_params for now). Spandrel and others can then look for this tensor and apply the same changes as neosr. Of course, if neosr doesn't make any modification, we don't need to store neosr_params at all.

This would also allow you to add a "compatibility mode" to neosr. While neosr use the best hyperparameters by default (so e.g. rgb_mean=(0.5, ...)), compatibility mode would revert back to the hyperparameters the original arch uses.

I suggest this solution, because it will make it easier for us both. When you add a new modification, you just need to add a new hyperparameter and add it to a dict. No more tensors for each new parameters (e.g. SPAN no_norm). We can then detect the change and implement the modification. But most importantly, we can detect that a change has been made even if we don't support this change. This will allow spandrel to reject loading models that it don't support.

What do you think of doing something like this?

As for code: I would implement a decorator to make this feature easy to use. Like this:

@ARCH_REGISTRY.register()
@modifications
class atd(nn.Module):
    defaults_for_modifications = {"rgb_mean": (0.4488, 0.4371, 0.4040)}

    def __init__(self, embed_dim=210, ..., rgb_mean=(0.5, 0.5, 0.5)):
        # normal init

The modifications decorator would then change the constructor to store modifications in neosr_params if they differ from the default values defined in defaults_for_modifications. So it wouldn't store anything if you create an ATD model with rgb_mean=(0.4488, 0.4371, 0.4040).

Implementing @modifications also shouldn't be difficult. It's similar to @store_hyperparameters I made for spandrel, so we can use it as a reference.

from neosr.

adegerard commented on August 20, 2024

So we make rgb_mean a hyperparameter and store it inside a dict that contains all hyperparameters that relate to modifications you made. We then JSON serialize that dict and put inside an uint8 tensor that we store in the model (let's called this tensor neosr_params for now). Spandrel and others can then look for this tensor and apply the same changes as neosr. Of course, if neosr doesn't make any modification, we don't need to store neosr_params at all.

why using a tensor to store metadata you can directly save in the state_dict?

Mixing metadata and tensors is not a good idea.

for example:

            metadata = {
                'datetime': datetime.strptime(
                    time.strftime("%Y-%m-%dT%H:%M:%S%z", time.localtime()),
                    '%Y-%m-%dT%H:%M:%S%z').isoformat(),
                'an_int': 3,
                'a_tuple': (2,3,4),
                'a_dict': dict(a=1,b=2,c=3)
            }
            state_dict[f'metadata'] = json.dumps(metadata)

from neosr.

RunDevelopment commented on August 20, 2024

why using a tensor to store metadata you can directly save in the state_dict?

safetensors.

Mixing metadata and tensors is not a good idea.

Why?

from neosr.

adegerard commented on August 20, 2024

safetensors

ok. there is a _ _ metadata _ _ field in the file format but I can understand that's easier to implement a single solution for both pth and saftetensors

Why?

Will the current official implementation still work with a model that has this new tensor?

from neosr.

joeyballentine commented on August 20, 2024

Will the current official implementation still work with a model that has this new tensor?

So long as you turn strict mode off, it would just discard the extra data. The actual changes themselves would break it, but that's gonna happen regardless since, well, the arch was changed

from neosr.

muslll commented on August 20, 2024

@RunDevelopment sorry for taking so long to answer.

As for code: I would implement a decorator to make this feature easy to use. Like this:

I made a commit here, please take a look. What would you suggest? I didn't modify the original decorator, feel free to suggest anything. Honestly, as long as the solution is not too convoluted, you can implement this as you see fit.

ps: ignore the other changes, check the end of the commit

from neosr.

muslll commented on August 20, 2024

Any update on this @RunDevelopment ?

from neosr.

All architectures with `rgb_mean = (0.5, 0.5, 0.5)` are incompatible about neosr HOT 9 CLOSED

Comments (9)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent