Comments (20)

simonlui commented on June 13, 2024

Any updates on this issue?

alexsin368 commented on June 13, 2024

@simonlui I just picked up your issue and will work on reproducing it first

simonlui commented on June 13, 2024

Let me know if you need help setting up the Docker images, but it should be fairly self-explanatory. If you use the base Dockerfile, you will get the currently working version using IPEX 2.0.120+xpu, while the Dockerfile.latest file builds a version using IPEX 2.1.10+xpu, which doesn't work.

alexsin368 commented on June 13, 2024

I'm running into an issue trying to build the Docker image. I've set my proxy settings, but it's complaining that the public key is not available. See the attached log.

docker_build_error.txt

simonlui commented on June 13, 2024

Sorry for the late reply. The issue seems to be with the Intel GPG key you retrieved; I believe the command that fetches and installs it didn't run correctly. Can you clear your build cache and rebuild the image to see what you get? I rebuilt the image from scratch a little while ago here in the United States, where I live, and could access and use the GPG key without issue. That tells me this is not an outage in fetching the GPG key, which was an issue a month ago. I'm not sure whether your geographic region, HTTPS proxy settings, or a corporate firewall/policy is preventing access, but I do find it weird that you aren't able to reach your own company's domain for something like this. I pull the key using the following line in the Dockerfile, which I derived from Intel's own documentation for oneAPI installation on APT-based Linux systems, in Step 3 here. This is the line I believe you are running into an issue with.

RUN no_proxy=$no_proxy wget --progress=dot:giga -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
    | gpg --dearmor | tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null && \
    echo 'deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main' \
    | tee /etc/apt/sources.list.d/oneAPI.list

Besides a few added arguments, these should be verbatim the commands from that page, chained together. Maybe try without the proxy? Just for reference, I use the following when I build my image locally: docker build -t ipex-arc-comfy:latest -f Dockerfile.latest .
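
If the key fetch itself is suspect, one quick sanity check (my suggestion, not something from the Dockerfile) is to inspect the keyring file produced by that RUN step, for example from a shell inside the partially built image:

gpg --show-keys /usr/share/keyrings/oneapi-archive-keyring.gpg

If that prints nothing, the wget | gpg pipeline produced an empty or corrupt file, which would match apt complaining that the public key is not available.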


Unrelated to the above, but there are a few other pieces of information I want to mention because I forgot to include them in my initial bug report, and I believe you will find them helpful in solving this bug. I built a container using the following arguments on my Linux system to generate the bug report's output.

docker run -it --device /dev/dri --network=host \
    -v /home/simonlui/Code_Repositories/ComfyUI:/ComfyUI:Z \
    -v /home/simonlui/Code_Repositories/models:/models:Z \
    -v deps:/deps \
    -v huggingface:/root/.cache/huggingface \
    --security-opt=label=disable \
    --name comfy-server \
    -e ComfyArgs="--listen --disable-ipex-optimize" \
    localhost/ipex-arc-comfy:latest

Obviously, adapt the above to your system, but I believe you should be able to replicate the issue using the same arguments I did.

If you want to experiment with the image and installation inside a container, you need to override the startup.sh entrypoint script I set. You can do this by adding --entrypoint bash before the localhost/ipex-arc-comfy:latest argument in the run command above and omitting -e ComfyArgs="--listen --disable-ipex-optimize"; a full example is shown below. You can still run ComfyUI manually by running the following lines inside such a container:

source /deps/venv/bin/activate
python3 /ComfyUI/main.py --listen --disable-ipex-optimize

But you should be able to quit out to the container's bash shell using Ctrl-C and do whatever you need, instead of the normally intended usage of just starting and stopping the container.
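
For completeness, the override version of my run command from above would look like this (same arguments, minus ComfyArgs, with the entrypoint swapped):

docker run -it --device /dev/dri --network=host \
    -v /home/simonlui/Code_Repositories/ComfyUI:/ComfyUI:Z \
    -v /home/simonlui/Code_Repositories/models:/models:Z \
    -v deps:/deps \
    -v huggingface:/root/.cache/huggingface \
    --security-opt=label=disable \
    --name comfy-server \
    --entrypoint bash \
    localhost/ipex-arc-comfy:latest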

alexsin368 commented on June 13, 2024

The only system I have available with an Arc A770 GPU at this time is based outside of the US. Because I'm in the US and need a VPN to access it, there are proxy settings to configure. I resolved that, but I may have hit a firewall issue. I'm currently working with my team to resolve this before I can proceed with building the Docker image. This is not an issue on US-based systems.

And thank you for the additional notes. Your sample docker run command will help me run mine with changes to the arguments.

KerfuffleV2 commented on June 13, 2024

The fact that it is taking weeks to locate an Intel GPU at Intel is really not inspiring much confidence, and AMD is coming out with a cheap 16GB VRAM card this month. My A770 is still in the return window until the end of the month, and I am seriously considering taking advantage of that. I got this card to do SD, and right now it is a serious struggle. Is there a reason to believe it's going to get better?

simonlui commented on June 13, 2024

The fact that it is taking weeks to locate an Intel GPU at Intel is really not inspiring much confidence, and AMD is coming out with a cheap 16GB VRAM card this month. My A770 is still in the return window until the end of the month, and I am seriously considering taking advantage of that. I got this card to do SD, and right now it is a serious struggle. Is there a reason to believe it's going to get better?

You can decide whatever you want to do, but there are paths forward. This is only an issue with the latest code and ComfyUI. With the previous version of IPEX, 2.0.120+xpu, SD works correctly with ComfyUI, and quite a few extensions do too. If you must have the latest version, it should work without issue on other frontends like stable-diffusion-webui and SD.Next, which patched in workarounds for IPEX.

KerfuffleV2 commented on June 13, 2024

You can decide whatever you want to do, but there are paths forward. This is only an issue with the latest code and ComfyUI. With the previous version of IPEX, 2.0.120+xpu, SD works correctly with ComfyUI, and quite a few extensions do too. If you must have the latest version, it should work without issue on other frontends like stable-diffusion-webui and SD.Next, which patched in workarounds for IPEX.

Yes, I know it's possible to get it working currently, after a fashion. I am actually maintaining my own set of patches based on SD.Next's whole CUDA emulation layer: comfyanonymous/ComfyUI#476 (comment) - so it is possible to hack it into a working state (with no help from Intel, as far as I know), but that is obviously not an ideal scenario.

I think we have different definitions of "path forward". You are saying the path forward is to stay in the past and just run an old version. Also, the community hacked together something that works around the issues this time but what about the next?

I really don't think I am unique in wanting to know that my hardware is going to continue to see support, and that problems will be fixed or responded to in a relatively timely way. Running old versions and losing access to performance/security/QoL improvements is not a solution.

simonlui commented on June 13, 2024

I think we have different definitions of "path forward". You are saying the path forward is to stay in the past and just run an old version. Also, the community hacked together something that works around the issues this time but what about the next?

I would not have filed this issue the very day IPEX 2.1.10+xpu released if I didn't have a vested interest in getting it fixed properly. However, with things still in the "fixing" phase, by "path forward" I meant being able to still run Stable Diffusion in this scenario, albeit with a less-than-ideal setup and with workarounds and hacks. I wasn't aware whether you were stuck at that part, so it's great you know how to get it working. But I do agree with you that the status quo is not acceptable going forward and should be fixed before the next release of IPEX.

I really don't think I am unique in wanting to know that my hardware is going to continue to see support, and that problems will be fixed or responded to in a relatively timely way. Running old versions and losing access to performance/security/QoL improvements is not a solution.

Given how Intel's GPUs have launched so far, with Intel a new player in the field, and knowing where they are prioritizing things, I can't say I am surprised at the speed at which this is getting addressed, and I will say it is frustrating for me too. But the facts also mean that users are going to have to put up with hassle on various fronts like this; people who have reviewed these cards have said as much, and various other issues have popped up about these GPUs. Again, you can decide what you want to do based on that and express displeasure at it, but I don't believe that helps solve the issue. I actually want to file another enhancement issue about the root cause: most Stable Diffusion front-ends need to implement a workaround layer for operations that should be natively supported in IPEX but are not.

alexsin368 commented on June 13, 2024

Hi @simonlui, thanks for your patience. The team has been supporting issues that came in over the last month, and I just came back from a business trip. Now that I'm back, I will be prioritizing your issue and will get back to you with an update soon.

The first step of any debugging effort is to reproduce the issue, which includes gathering the exact hardware and software resources. We don't have access to your setup, which means there is additional overhead before the issue can be reproduced. If you have an easier way to reproduce the issue, that would be most helpful and would speed up the time to resolve it.

simonlui commented on June 13, 2024

If you have an easier way to reproduce the issue, that would be most helpful and would speed up the time to resolve it.

The main blocker seems to be your access to hardware, and I don't know if you have resolved your GPG key access issues yet. Technically, you should be able to reproduce the issue on any GPU in the Arc Alchemist family: the sample workflow in ComfyUI that reproduces this error is a simple SD 1.5 workflow, which should run on just about anything with an Arc family graphics card and IPEX, so even an A310 would work here. If you have to be stricter, you could use a GPU with the same chip, which I think includes the A770 8GB, the A750, and the A580. The Flex 170 is the only other card Intel produces that gets reasonably close to the A770 16GB, since it uses the same chip and memory configuration but different firmware, so you could try that if you have access to one.

intel-ravig commented on June 13, 2024

@simonlui - I took over this ticket recently and have been able to reproduce the issue.
After some debugging and tracing, I suspect the issue originates from the "forward" function in the CrossAttention class.

q, k, v all have a dtype of torch.float16.
But the line "self.to_out=....." sometimes generates torch.float32, which ultimately throws the dtype mismatch. [dtype initialization???]

Just for kicks, I made an addition in /deps_latest/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py:

def forward(self, input: Tensor) -> Tensor:
    print(self.weight.dtype, input.dtype)  # log every weight/input dtype pair
    # Work around the mismatch by casting the input to the weight's dtype.
    if self.weight.dtype != input.dtype:
        input = input.to(self.weight.dtype)
    return F.linear(input, self.weight, self.bias)

After making the above change, the model does run, but I see numerous data-type mismatches further down the line.
Since torch.float32 is the default, I suspect the dtype is not flowing into functions properly or is getting set to the default. Can you look over your "to_out" function and see what is going on?

In the meantime, I will talk to the IPEX engineering team and bring this to their attention.

simonlui commented on June 13, 2024

After some debugging and tracing, I suspect the issue originates from the "forward" function in the CrossAttention class.

q, k, v all have a dtype of torch.float16.
But the line "self.to_out=....." sometimes generates torch.float32, which ultimately throws the dtype mismatch. [dtype initialization???]
After making the above change, the model does run, but I see numerous data-type mismatches further down the line.

This makes sense, as the backtrace does complain about this, but the strange thing is that ComfyUI somehow does work with the previous version of IPEX if you run my non-latest Docker image, which sets that up. This seems like either a restriction that was put in or a regression from a change in IPEX. This may belong in a different issue, and please tell me if you think so, but currently other IPEX integrations carry workarounds for functions IPEX does not implement correctly, so everything needed to run Stable Diffusion works there. Those workarounds shouldn't need to exist if IPEX were doing everything correctly. ComfyUI's implementation worked with the last version of IPEX without those workarounds because some things, like the text encoder, were not being run on the GPU; if you run ComfyUI with --gpu-only, it never worked after a certain point. The expectation was that this arrangement, although fragile, should have kept running on the latest version of IPEX, but it no longer does.

Since torch.float32 is the default, I suspect the dtype is not flowing into functions properly or is getting set to the default. Can you look over your "to_out" function and see what is going on?

It depends on the model used, but the default Stable Diffusion checkpoint ComfyUI tells you to download for the default workflow, from RunwayML's release, is FP16. Also, ComfyUI is not my application; it is an open-source frontend for Stable Diffusion at https://github.com/comfyanonymous/ComfyUI. The code you are referring to is at https://github.com/comfyanonymous/ComfyUI/blob/d76a04b6ea61306349861a7c4657567507385947/comfy/ldm/modules/attention.py#L382, so I hope that helps you track down this issue.
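
To save a click, the failing pattern reduces to roughly the following (a simplified sketch of the kind of code involved, not ComfyUI's actual implementation; dimensions are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):  # simplified stand-in for ComfyUI's class
    def __init__(self, dim=320, heads=8):
        super().__init__()
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, dtype=torch.float16)
        self.to_k = nn.Linear(dim, dim, dtype=torch.float16)
        self.to_v = nn.Linear(dim, dim, dtype=torch.float16)
        self.to_out = nn.Linear(dim, dim, dtype=torch.float16)

    def forward(self, x):
        b, n, dim = x.shape
        d = dim // self.heads
        q = self.to_q(x).view(b, n, self.heads, d).transpose(1, 2)
        k = self.to_k(x).view(b, n, self.heads, d).transpose(1, 2)
        v = self.to_v(x).view(b, n, self.heads, d).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, n, dim)
        # If sdpa promoted its output to float32, this float16 Linear is
        # where the dtype-mismatch RuntimeError from the backtrace fires.
        return self.to_out(out)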

intel-ravig commented on June 13, 2024

Thanks @simonlui. I do understand that the newer version does not work, but note that both PyTorch and IPEX were upgraded.

Some comments here:

a. Have you verified the same setup with a CUDA device and the latest PyTorch?

b. Yes, the function "torch.nn.functional.scaled_dot_product_attention" is in question. Please refer to the similar issue tagged here: pytorch/pytorch#110213. Perhaps it is related, pointing towards a PyTorch difference rather than an IPEX one?

c. I tested fp16 models of CompVis/stablediffusionv1.4 and https://huggingface.co/runwayml/stable-diffusion-v1-5 on the latest released PyTorch and IPEX, and they both worked fine.
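
For what it's worth, a Diffusers-based check along those lines might look like this (a sketch assuming the standard text-to-image pipeline; not the exact script that was used):

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401, registers the xpu device
from diffusers import StableDiffusionPipeline

# Load the fp16 weights and move the whole pipeline to the Arc GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("xpu")
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("out.png")

If this passes while the ComfyUI workflow fails, that points at a difference in the frontend code path rather than the op itself.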

Disty0 commented on June 13, 2024

This is an autocast issue rather than an sdpa issue. sdpa is working as intended; autocast should have caught it before it hit sdpa.

There are more autocast issues I have worked around here:
https://github.com/vladmandic/automatic/blob/dev/modules/intel/ipex/hijacks.py#L105
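
These hijacks are thin monkey-patches that coerce dtypes around the broken ops. A minimal sketch of the idea (not the actual hijacks.py code) looks like this:

import torch

_original_sdpa = torch.nn.functional.scaled_dot_product_attention

def _patched_sdpa(query, key, value, *args, **kwargs):
    # Cast the result back to the query's dtype in case the backend
    # promoted it (e.g. float16 inputs coming back as float32).
    return _original_sdpa(query, key, value, *args, **kwargs).to(query.dtype)

torch.nn.functional.scaled_dot_product_attention = _patched_sdpa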

simonlui commented on June 13, 2024

Sorry, I didn't have time to check and reply to this until now.

a.) The Docker arrangement works with an Nvidia GPU on another machine I had access to, but that was using WSL2 through Windows.
b.) Probably not; ComfyUI has had PyTorch 2.1 support from comfyanonymous/ComfyUI@48242be onwards, with the usage of xformers removed.
c.) I don't doubt that it may work if you use the model directly or via something like Diffusers. However, this is a Stable Diffusion frontend issue with ComfyUI, which doesn't use Diffusers to run these models, and it did work with the older versions of IPEX and PyTorch. I agree with @Disty0 that this may not be the fault of the SDPA, and that something else, like autocasting of types, may not be working correctly.

Disty0 commented on June 13, 2024

scaled_dot_product_attention returns float32 when float16 inputs are used with IPEX; bfloat16 inputs return bfloat16 as expected. This might be related to the ComfyUI issue. Autocast should catch this, though.

Diffusers also catches it with this line:
https://github.com/huggingface/diffusers/blob/v0.26.2/src/diffusers/models/attention_processor.py#L1249
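
A minimal reproducer of that promotion might look like this (a sketch; it assumes an XPU device is visible, and the tensor shapes are arbitrary):

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401, registers the xpu device

q = torch.randn(1, 8, 77, 64, dtype=torch.float16, device="xpu")
k = torch.randn(1, 8, 77, 64, dtype=torch.float16, device="xpu")
v = torch.randn(1, 8, 77, 64, dtype=torch.float16, device="xpu")

out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
print(out.dtype)  # reportedly torch.float32 here; torch.float16 is expected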

simonlui commented on June 13, 2024

Have there been any updates on this issue? Judging by the prior release cycles for IPEX on XPU, a new version will be coming soon, so I would like to know whether this has been addressed or whether we will need to keep working around it. Thanks.

intel-ravig commented on June 13, 2024

@simonlui - This task is on our engineering team's priority list and they are actively working on it. However, it will likely not be fixed in the upcoming release, so the workarounds on the application side will be needed for a while longer.
