Giter Club home page Giter Club logo

Comments (9)

muslll avatar muslll commented on August 20, 2024 1

I documented it now. Please send an example code of how you think it would be better to add such tensor for detection and I can add it to the modified networks. Would something like this work for you?

self.register_buffer("neosr", torch.zeros(1))

Just to make it clear: this change to the rgb mean values was needed to improve stability. Several datasets were giving unstable training, and after debugging for several hours I found that the original values was the cause of it. The reason is that the default values were copied directly from SwinIR code, and they used ImageNet1k rgb mean values. This means that datasets that do not have values close to these (none, not even the classic dataset used in SISR research, DIV2k) will cause extreme values, making training unstable.
So this was not a 'random change' at all, it was made for a reason. The alternative to this solution would be to remove normalization altogether, which would completely break all models. Simply changing it to neutral values was the best way I found while keeping compatibility.

from neosr.

RunDevelopment avatar RunDevelopment commented on August 20, 2024 1

Will the current official implementation still work with a model that has this new tensor?

As Joey said, if the official arch code load the model with strict=True, then the additional tensor will prevent the model from loading. This is the intended effect, because a model with this tensor is incompatible.

I also want to point out that only parameters that differ from the values in the official arch will be stored. So e.g. an ATD model with rgb_mean=(0.4488, 0.4371, 0.4040) will not have the extra tensor, and so official arch code is guaranteed to load it in strict mode.

from neosr.

RunDevelopment avatar RunDevelopment commented on August 20, 2024

I documented it now.

Thanks musl!

Just to make it clear: this change to the rgb mean values was needed to improve stability.

You have already made numerous improvements that objectively make these architectures better. I'm not suggesting for you to stop doing that, we simply need to find a way to make it possible for spandrel and other to detect these changes.

Please send an example code of how you think it would be better to add such tensor for detection and I can add it to the modified networks.

Some on EE suggested an interesting solution that would ensure that we'll never have this problem again: put all changes behind new hyperparameters and store those extra hyperparameters in the model.

So we make rgb_mean a hyperparameter and store it inside a dict that contains all hyperparameters that relate to modifications you made. We then JSON serialize that dict and put inside an uint8 tensor that we store in the model (let's called this tensor neosr_params for now). Spandrel and others can then look for this tensor and apply the same changes as neosr. Of course, if neosr doesn't make any modification, we don't need to store neosr_params at all.

This would also allow you to add a "compatibility mode" to neosr. While neosr use the best hyperparameters by default (so e.g. rgb_mean=(0.5, ...)), compatibility mode would revert back to the hyperparameters the original arch uses.

I suggest this solution, because it will make it easier for us both. When you add a new modification, you just need to add a new hyperparameter and add it to a dict. No more tensors for each new parameters (e.g. SPAN no_norm). We can then detect the change and implement the modification. But most importantly, we can detect that a change has been made even if we don't support this change. This will allow spandrel to reject loading models that it don't support.

What do you think of doing something like this?


As for code: I would implement a decorator to make this feature easy to use. Like this:

@ARCH_REGISTRY.register()
@modifications
class atd(nn.Module):
    defaults_for_modifications = {"rgb_mean": (0.4488, 0.4371, 0.4040)}

    def __init__(self, embed_dim=210, ..., rgb_mean=(0.5, 0.5, 0.5)):
        # normal init

The modifications decorator would then change the constructor to store modifications in neosr_params if they differ from the default values defined in defaults_for_modifications. So it wouldn't store anything if you create an ATD model with rgb_mean=(0.4488, 0.4371, 0.4040).

Implementing @modifications also shouldn't be difficult. It's similar to @store_hyperparameters I made for spandrel, so we can use it as a reference.

from neosr.

adegerard avatar adegerard commented on August 20, 2024

So we make rgb_mean a hyperparameter and store it inside a dict that contains all hyperparameters that relate to modifications you made. We then JSON serialize that dict and put inside an uint8 tensor that we store in the model (let's called this tensor neosr_params for now). Spandrel and others can then look for this tensor and apply the same changes as neosr. Of course, if neosr doesn't make any modification, we don't need to store neosr_params at all.

why using a tensor to store metadata you can directly save in the state_dict?
image
Mixing metadata and tensors is not a good idea.

for example:

            metadata = {
                'datetime': datetime.strptime(
                    time.strftime("%Y-%m-%dT%H:%M:%S%z", time.localtime()),
                    '%Y-%m-%dT%H:%M:%S%z').isoformat(),
                'an_int': 3,
                'a_tuple': (2,3,4),
                'a_dict': dict(a=1,b=2,c=3)
            }
            state_dict[f'metadata'] = json.dumps(metadata)

from neosr.

RunDevelopment avatar RunDevelopment commented on August 20, 2024

why using a tensor to store metadata you can directly save in the state_dict?

safetensors.

Mixing metadata and tensors is not a good idea.

Why?

from neosr.

adegerard avatar adegerard commented on August 20, 2024

safetensors

ok. there is a _ _ metadata _ _ field in the file format but I can understand that's easier to implement a single solution for both pth and saftetensors

Why?

Will the current official implementation still work with a model that has this new tensor?

from neosr.

joeyballentine avatar joeyballentine commented on August 20, 2024

Will the current official implementation still work with a model that has this new tensor?

So long as you turn strict mode off, it would just discard the extra data. The actual changes themselves would break it, but that's gonna happen regardless since, well, the arch was changed

from neosr.

muslll avatar muslll commented on August 20, 2024

@RunDevelopment sorry for taking so long to answer.

As for code: I would implement a decorator to make this feature easy to use. Like this:

I made a commit here, please take a look. What would you suggest? I didn't modify the original decorator, feel free to suggest anything. Honestly, as long as the solution is not too convoluted, you can implement this as you see fit.

ps: ignore the other changes, check the end of the commit

from neosr.

muslll avatar muslll commented on August 20, 2024

Any update on this @RunDevelopment ?

from neosr.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.