Comments (8)
Same here with the VSCodium extension https://github.com/Venthe/vscode-fauxpilot, set up with the model py-model.
This gives:
fauxpilot-copilot_proxy-1 | [StatusCode.UNAVAILABLE] failed to connect to all addresses
fauxpilot-copilot_proxy-1 | WARNING: Model 'py-model' is not available. Please ensure that `model` is set to either 'fastertransformer' or 'py-model' depending on your installation
fauxpilot-copilot_proxy-1 | Returned completion in 2.1827220916748047 ms
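One way to narrow this down is to take the editor extension out of the loop and query the proxy directly. A minimal sketch, assuming the default port 5000 and the OpenAI-style completions route documented in the FauxPilot README (the payload fields mirror what the proxy's error message above expects; adjust host/port to your setup):

```python
# Minimal sketch: query the FauxPilot proxy directly, bypassing the
# editor extension, to see whether the backend itself is failing.
import requests

resp = requests.post(
    "http://localhost:5000/v1/engines/codegen/completions",
    json={
        "model": "py-model",       # or "fastertransformer"
        "prompt": "def hello():\n",
        "max_tokens": 16,
        "temperature": 0.1,
    },
    timeout=120,
)
print(resp.status_code, resp.json())
```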
Neither fastertransformer nor py-model works here either.
In my case, disabling the http proxy in the container resolves the problem!
Could you please give more context on how you did that, @becxer?
In my case, because the environment is restricted by a proxy for security reasons, I had set an internal proxy for building the Dockerfiles, like:
ENV http_proxy xxx.xx.xx.xx
But I forgot to remove this after the pip package installation. So I reset those ENV lines after the `pip install` step, in both proxy.Dockerfile and triton.Dockerfile:
ENV http_proxy ""
And it works.
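If you're debugging something similar: gRPC and most HTTP clients tend to honor the standard proxy environment variables, so a stray http_proxy baked into the image can plausibly produce exactly the "failed to connect to all addresses" error above. A quick hedged check, run inside the copilot_proxy container (e.g. via docker exec; the variable list is just the usual suspects):

```python
# Sanity check: leftover proxy variables baked in at build time will
# show up here and can interfere with the proxy -> Triton connection
# inside the compose network.
import os

for var in ("http_proxy", "https_proxy", "grpc_proxy", "no_proxy",
            "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY"):
    print(f"{var}={os.environ.get(var)!r}")
```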
OK, if I understand correctly, in your case it was a non-standard configuration. That's not my case; I installed FauxPilot straight away, with no custom proxy setup.
same problem
Same issue on WSL2.
fastertransformer works fine, but it can only run the 350M model, whereas I could run the 2B one with the Python backend.
$ uname -a
Linux puter 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ python3 --version
Python 3.10.6
$ docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.50                 Driver Version: 531.79       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf           Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1070 Ti     On  | 00000000:01:00.0  On |                  N/A |
| 29%   61C    P5               24W / 180W|    946MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       333      G   /Xwayland                                   N/A    |
|    0   N/A  N/A      3198      G   /kitty                                      N/A    |
+---------------------------------------------------------------------------------------+
edit: so okay, apparently it does do a download, but it gives you no feedback about it whatsoever. You can see it by answering yes to the cache question and running `watch du -lh` on the cache directory, waiting until the size stops increasing and the tmp file has been extracted. The launch script should also end with a bunch of "started" logs.
However, the issue still persisted. So I made sure everything was stopped with `docker compose down -v`, followed by `docker system prune -a`, and reran `setup.sh` to force a redownload and rebuild of everything, just in case something was being cached somewhere it shouldn't be.
The issue still persists, though, so I'm pretty sure it just doesn't work in its current state.
edit2 ("fixed", very big quotes):
- okay so apparently if you are not specifically using the fauxpilot vscode extension and explicitely setting it to use py-model it will just default to fastertransformer
so thats one isssue
you can get around it by editing copilot_proxy/utils/codegen.py
and adding data["model"] = "py-model"
right under the def generate
function (currently line 75 on main branch)
that takes care of one issue i was having
- the default timeout configured seems to be 30 seconds
on my gtx 1070ti the 350M fastermodel answers in like less than 1 second and the py-model answers in like 20 seconds for the exact same completion
so when i was trying the 2B (thinking that i could reliably run it since it only consumes 4GB of vram) model it would pretty much timeout every time silently and return 0 completions (even though the call shows it "succeeds" a little bit after 30 seconds)
so sure you can increase the the timeout probably but like seing how the py-model is "slower" by something like a factor of 20 (so yeah that certainly is slower lmao) probably makes it not worth it to even try to be honest and i havent bothered looking how to increase this timeout but i guess maybe for specific use cases there could be a point in using less vram to have this response time
maybe
for at least somebody out there
probably
🤷
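For reference, the edit from the first bullet would look roughly like this. A sketch only: the `data["model"]` assignment is the actual workaround from above, while the function signature and surrounding body are assumptions standing in for FauxPilot's real code:

```python
# copilot_proxy/utils/codegen.py -- sketch of the workaround above.
# Only the data["model"] line is the actual change; the signature and
# the rest of the body stand in for the existing code.
def generate(self, data):
    # Force the Python backend: clients that don't know about the
    # py-model setting send "fastertransformer" (or nothing) by default.
    data["model"] = "py-model"
    # ... rest of the original generate() body, unchanged ...
```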
anyways, tldr:
- mount the cache and watch its path with `watch du -h` to see when it's actually done downloading everything; or don't, and just wait until you get the "started GRPCInferenceService", "started HTTPService", and "started Metrics Service" log lines once it's done
- edit copilot_proxy/utils/codegen.py and add `data["model"] = "py-model"` right under the `def generate` line (currently line 75 on the main branch) if you're not using the specific VSCode FauxPilot extension with the py-model option configured
- there also seems to be some sort of timeout mechanism that I can't be bothered to look into, tbh (given that running the models on the Python backend is at the very least ~20 times slower; if you're lucky, more like 1000× in practice)

Also, even after doing all of this, I can only get it to actually give me suggestions in the FauxPilot VSCode extension, and that's without changing the fastertransformer setting to py-model. So in theory, unless the answer format is completely different with py-model for whatever reason, this should be working. Maybe I'm missing something, but at least it's not issue 1, 2, or 3 mentioned above (I think).
I couldn't get it to work with nvim Copilot, at least.
Now I'm getting a bunch of
fauxpilot-triton-1 | The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
fauxpilot-triton-1 | Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
spammed all over, but I really don't think I will be putting any more effort into getting the Python models running, so I leave the rest to someone else.
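For what it's worth, those two lines are standard Hugging Face transformers warnings and are usually harmless for open-ended generation. The conventional fix is to pass the attention mask and a pad token explicitly to generate(); a minimal sketch using a generic GPT-2 checkpoint (an assumption here, not FauxPilot's actual backend code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; the py-model backend uses its own checkpoint.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("def hello():", return_tensors="pt")
output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # addresses the first warning
    pad_token_id=tokenizer.eos_token_id,      # addresses the second warning
    max_new_tokens=16,
)
print(tokenizer.decode(output[0]))
```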
last edit: also, I was being extremely generous with the factor of 20. After checking again, fastertransformer answers in ~50 ms, and calls with the exact same prompt on py-model take ~50 seconds, so that's a factor of 1000. (I'm not even sure why it's included in the setup script, as it will lead people to believe it's an "okay" tradeoff, since you consume half the VRAM for the same number of parameters as the fastertransformer model; but anyhoo, that's not my place to say, and surely this will be useful to at least a single person, maybe, probably.)
Maybe just make it clearer in the setup script that by "slower" you mean by a factor of ~1000.
So, actual tldr: don't bother trying to run py-model, unless you have infinite time.
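If anyone wants to reproduce that comparison, a rough sketch against the proxy, assuming the same default port 5000 and OpenAI-style route as in the earlier sketch (timings will obviously vary with hardware and model size):

```python
# Rough wall-clock comparison of the two backends through the proxy.
# Not a benchmark, just enough to see the ~ms vs ~tens-of-seconds gap
# described above.
import time

import requests

def time_backend(model: str) -> float:
    t0 = time.time()
    requests.post(
        "http://localhost:5000/v1/engines/codegen/completions",
        json={"model": model, "prompt": "def fib(n):\n", "max_tokens": 16},
        timeout=300,
    )
    return time.time() - t0

for m in ("fastertransformer", "py-model"):
    print(f"{m}: {time_backend(m):.2f}s")
```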