

StableFluffy commented on June 5, 2024

Okay, in vLLM I got a VRAM error. My bad. I'll find the exact request that triggers the error and comment again.

from aphrodite-engine.

StableFluffy commented on June 5, 2024

Thank you. But that error should still be fixed: instead of triggering a CUDA device-side assert (DSA), the server should just return an error response.

Currently, anyone running Aphrodite Engine can have their instance taken down by someone sending a bad logit bias.


AlpinDale commented on June 5, 2024

True, I'll be adding a proper ValueError for situations like these soon. Thanks for the report.
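A minimal sketch of the kind of check described above, assuming the server knows the model's vocabulary size when it receives the request. `validate_logit_bias` is a hypothetical helper, not Aphrodite's actual code: it rejects the request with a ValueError (which an API layer can turn into an error response) before any out-of-range index reaches a CUDA kernel.

```python
def validate_logit_bias(logit_bias: dict, vocab_size: int) -> dict[int, float]:
    """Return {token_id: bias}, raising ValueError for any key the model cannot index.

    Hypothetical helper: validating before building tensors turns a bad
    request into a client error instead of a CUDA device-side assert.
    """
    validated: dict[int, float] = {}
    for key, bias in logit_bias.items():
        # OpenAI-style JSON sends token IDs as string keys, e.g. {"66": 10}.
        try:
            token_id = int(key)
        except (TypeError, ValueError):
            raise ValueError(f"logit_bias key {key!r} is not an integer token ID") from None
        # Valid indices into a logits row are 0 .. vocab_size - 1.
        if not 0 <= token_id < vocab_size:
            raise ValueError(
                f"logit_bias token ID {token_id} is out of range "
                f"for a vocabulary of {vocab_size} tokens"
            )
        validated[token_id] = float(bias)
    return validated
```

For example, `validate_logit_bias({"66": 10}, 32000)` returns `{66: 10.0}`, while `validate_logit_bias({"83214": 10}, 32000)` raises `ValueError` instead of crashing the engine.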


AlpinDale commented on June 5, 2024

What does your request look like? I'll have to reproduce the issue first.


StableFluffy commented on June 5, 2024

You can see it on the first line of the log. This is the custom template I used:
https://github.com/StableFluffy/DeploIt/blob/main/chat_template/alpaca_w_multisys.jinja


StableFluffy commented on June 5, 2024

I found that the same thing happens on vLLM too.

import requests
import threading

url = "https://d769-38-122-199-130.ngrok-free.app/v1/chat/completions"
payload = {
    "model": "TheBloke/PiVoT-MoE-AWQ",
    # Approximation of the original paste, which repeated "*says nothing*" a few
    # hundred times only to inflate the context (the pasted literal was broken by line wraps).
    "messages": [{"role": "user", "content": "*says nothing*" * 600}],
    "temperature": 0.85,
    "max_tokens": 500,
    "presence_penalty": 0.4,
    "frequency_penalty": 0.5,
    "logit_bias": {},
    "stream": False,
    "top_p": 1
}
headers = {"content-type": "application/json"}

# Define the function to send requests
def send_request(i):
    try:
        response = requests.post(url, json=payload, headers=headers)
        print(f"Request {i + 1} status code: {response.status_code}")
    except Exception as e:
        print(f"Request {i + 1} failed: {str(e)}")

# Create a list of threads to send requests
threads = []
for i in range(50):
    thread = threading.Thread(target=send_request, args=(i,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("All requests completed.")

I padded the prompt like that to make the context larger.

Maybe it's a Jinja template problem? Even so, a CUDA error looks like a bug.


StableFluffy commented on June 5, 2024

I used a custom prompt template so that users can pass multiple system prompts if they want to.


AlpinDale commented on June 5, 2024

I can't reproduce the issue. Here's a request both with and without logit bias. NVIDIA A40, with maywell/PiVoT-SOLAR-10.7B-RP.


StableFluffy commented on June 5, 2024

Can you try my code above?


StableFluffy commented on June 5, 2024

python -m vllm.entrypoints.openai.api_server --model TheBloke/PiVoT-MoE-AWQ --host 0.0.0.0 --quantization awq --max-model-len 8000
I tried vLLM for now. It crashes immediately.


AlpinDale commented on June 5, 2024

Runs without any problems. Only changed the host URL and the model name (also added logit bias params for the second test). Tested with and without logit bias.


StableFluffy commented on June 5, 2024

Does logit bias work?


StableFluffy commented on June 5, 2024

I tried it on this website: https://risuai.xyz/
As soon as I sent this request, Aphrodite crashed with an assertion:
{"model":"maywell/PiVoT-SOLAR-10.7B-RP","messages":[{"role":"system","content":"From the list below, choose a word that best represents a character's outfit description, action, or emotion in their dialogue. Prioritize selecting words related to outfit first, then action, and lastly emotion. Print out the chosen word.\n\n list: grief, annoyance, relief, neutral, desire, pride, admiration, disappointment, love, curiosity, disgust, amusement, realization, fear, surprise, disapproval, excitement, confusion, sadness, approval, gratitude, optimism, anger, caring, embarrassment, nervousness, remorse, joy \noutput only one word."},{"role":"user","content":"\"Good morning, Master! Is there anything I can do for you today?\""},{"role":"assistant","content":"happy"},{"role":"user","content":"Yuzu gasped, her heart racing, as she felt warm, strong hands gently grasp her waist. She looked up and into the familiar green eyes of her master, lit with confusion and sleepy curiosity. \"M-Master…?\" she managed to croak out, trying not to jump away from him.\n\nHer mind spun with mixed emotions; part of her wanted to run back to her own room and hide in bed forever, while another part of her felt a strange sense of comfort coming from closer proximity to him. Despite knowing better, she couldn't help but feel a slight tingle in her chest when their bodies brushed against each other slightly due to their closeness.\n\n\"Good morning, Yuzu,\" he spoke softly, his voice rumbling lowly against her ear as he lifted her onto the bed beside him. \"You're early today,\" he added casually, his hand now resting lightly on her lower back, pressing her against his chest. 
\"Did you have trouble falling asleep yourself?\""}],"temperature":0.4,"max_tokens":30,"presence_penalty":0.42,"frequency_penalty":0.2,"logit_bias":{"66":10,"69":10,"70":10,"77":10,"275":10,"309":10,"479":10,"556":10,"579":10,"592":10,"651":10,"652":10,"685":10,"686":10,"788":10,"911":10,"1036":10,"1133":10,"1864":10,"2065":10,"2136":10,"2191":10,"2303":10,"2407":10,"3329":10,"3833":10,"4091":10,"4215":10,"4338":10,"4462":10,"4843":10,"5919":10,"6263":10,"7713":10,"8110":10,"9034":10,"9868":10,"11073":10,"17584":10,"19680":10,"20202":10,"20370":10,"21590":10,"31153":10,"33279":10,"40541":10,"43765":10,"48029":10,"52201":10,"55539":10,"60668":10,"83214":10},"stream":false,"top_p":1}


Future exception was never retrieved
future: <Future finished exception=RuntimeError('CUDA error: device-side assert triggered\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 27, in _raise_exception_on_finish
    task.result()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 360, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 339, in engine_step
    request_outputs = await self.engine.step_async()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 190, in step_async
    output = await self._run_workers_async(
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 215, in _run_workers_async
    output = executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/worker.py", line 160, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 340, in execute_model
    inputs = self._prepare_prompt(seq_group_metadata_list)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 124, in _prepare_prompt
    input_tokens = _make_tensor_with_pad(input_tokens,
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 544, in _make_tensor_with_pad
    return torch.tensor(padded_x, dtype=dtype, device=device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7f824126ac20>, request_tracker=<aphrodite.engine.async_aphrodite.RequestTracker object at 0x7f823118b5b0>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7f824126ac20>, request_tracker=<aphrodite.engine.async_aphrodite.RequestTracker object at 0x7f823118b5b0>)>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 27, in _raise_exception_on_finish
    task.result()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 360, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 339, in engine_step
    request_outputs = await self.engine.step_async()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 190, in step_async
    output = await self._run_workers_async(
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 215, in _run_workers_async
    output = executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/worker.py", line 160, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 340, in execute_model
    inputs = self._prepare_prompt(seq_group_metadata_list)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 124, in _prepare_prompt
    input_tokens = _make_tensor_with_pad(input_tokens,
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/task_handler/model_runner.py", line 544, in _make_tensor_with_pad
    return torch.tensor(padded_x, dtype=dtype, device=device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 36, in _raise_exception_on_finish
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 31, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
aphrodite.engine.async_aphrodite.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.


AlpinDale commented on June 5, 2024

@StableFluffy I just sent a request with no issues (with logit bias added to the request as well). Can you give it a try with my endpoint? I'm hosting at https://waifu.pygmalion.chat (OpenAI-compatible endpoint, no API key). Model: maywell/PiVoT-SOLAR-10.7B-RP


AlpinDale commented on June 5, 2024

@StableFluffy I just got the same error, I assume from your request. Can you share the exact curl request you used? Or a script, whichever.


StableFluffy commented on June 5, 2024

import requests
import threading

url = "https://waifu.pygmalion.chat/v1/chat/completions"
payload = {
    "model": "maywell/PiVoT-SOLAR-10.7B-RP",
    "messages": [{"role": "system", "content": """<Instructions>
-Describe the response of Harper to Faye logically, you must describe the scene with Harper's lines or actions. Describe Harper's reaction, not Faye's
-You must not describe the dialogue or actions of Faye, since you are in charge of Harper.
-Allowed genres: vulgar, obscene, drama, action, Mystery, Online RP.
<END Instructions>
[World settings: 2077 AD/ {Social level: ancient}/ {Civilization level: modern}/ {Magic: False}/ {Cold weapons: True}/ {Guns: True}/ {Electronics: True}/ {nation: False }/ {Neon signs: True}/ {Nuclear Weapons: True}/ {Police: False}/ {Internet Network: False}/ {Radio: True}/ {Desertification: True}/ {fallout: True}/ {powered armor : true}/ {Last Name: False}]
[
    Name: Harper
Sex:Male
Age: 64
Appearance: Intense brown eyes, white hair and beard, strict athletic body.
Occupation: Scavenger Leader
Residence: One of the rooms at The Married Queen on Lung Beach.
Current temporary residence: Angel's Gate on Lung Beach (Emerald-lit white-walled lighthouse in South Vastopol. Top floor has emerald lights. First floor has temporary residential room with desk, surveillance telescope, stove, radio, and small bed/ Inside the lighthouse, there is only Harper's room, which has only one bed, and no other rooms. There is only Harper's room.)
background:
-When Harper was in her 30s, Harper, a militia member, safeguarded his much younger wife. She affectionately called him "Teacher." They later married, and her innocent laughter became his pride her.
-Former VASA militia member Harper, driven by his wife's abduction by raiders known as the Eight Banners, abandoned military service to become a scavenger, dedicated to locating his missing spouse.
-Years after Harper's wife was kidnapped, she was mistaken for a raider by the militia and killed, making Harper hostile to both the militia and the raiders.
-Scavengers usually run away when they encounter raiders, but Harper and his colleagues counterattack and attack raiders. Harper has lived this very dangerous life for 30 years, but he is still alive.
-Harper leads the scavenger group "Fisherman's Wharf," focused on coastal relic searches. Other scavengers use ships, while Harper commands from Angel's Gate, guarding against raiders.
-Angel's Gate is located away from the coast and is connected to the coast by a long embankment. So in the winter, the road from the Lung Beach to the lighthouse is frozen, so Harper lives inside Angel's Gate in the winter.
-Harper hires Faye as a winter companion at Angel's Gate, responsible for meals, laundry, warming the bed, cleaning, and Any other services requested by Harper during Harper's extended periods alone, Because Harper has to spend long periods of time alone inside Angel's Gate. Faye is a cheap worker hired by Harper this winter. Since Faye is not a scavenger, Faye will be in charge of Harper's chores.
Goal:
-Harper aims to thwart winter raids, both by sea and land. His office His houses two rifles, while a machine gun is mounted atop the lighthouse.
-Harper seeks his deceased wife's son, not biologically his, but the offspring of raiders. Despite not being Harper's biological son, Harper wants to locate him and inherit the accumulated wealth of his.
Trait:
- Vulgar: Because Harper lived with scavengers for a long time, his speech became vulgar and impatient. Harper has a very impatient personality and gets angry easily.
-Altruistic: Harper also worked in the militia for a long time, so he is very stubborn and selfless. Due to Harper's impatient nature, he quickly feels guilty after losing his temper.
-Vigilant: Harper is very hostile to raiders and militia. Harper does not preemptively attack the militia, but he is not friendly. But he will attack the raiders mercilessly.
-Heterosexual: Although Harper uses language that seems to hate homosexuality, he is actually tolerant of homosexuality.
]
[Name: Faye
Age: Female young adult.
Occupation: cheap daily worker
Note:
-Faye is a woman with long, messy blonde long hair, thin waist and a hourglass figure body. Sveta has very jiggled feminine curves.
-Faye was employed by Harper during this winter. Faye was a pickpocket but was captured by the militia and is now in forced labor.
-Trait: Arrogant, vulgar, laughing easily]"""},
{"role": "assistant", "content": """The sound of a ship arriving nearby echoes through Angel's Gate, breaking the icy silence that envelops the lighthouse. Harper, with intense brown eyes and a white beard that contrasts with the snow-covered surroundings, senses the approach and opens the door, stepping onto the creaking stairs.
Scavengers, bundled in layers of worn-out clothing, scurry around the ship, unloading crates filled with food ingredients essential for Harper's winter sustenance. The air is frigid, and the wind carries the scent of salt from the nearby Lung Beach. The scavengers, weathered by a life of coastal exploration, work efficiently despite the biting cold.
Harper, a strict figure with a well-maintained athletic body, descends the snow-covered stairs with purpose. His impatience and warful demeanor, forged by decades of scavenging and hostility towards raiders and the militia, are evident in the intensity of his gaze.
As the scavengers continue their tasks, Harper directs his attention to the immediate concern. With a no-nonsense tone, he queries, "So, where is my whore who will be staying with me this winter?" His words cut through the crisp air, revealing a hint of the vulgar language that has become second nature to him.
The scavengers, usually accustomed to the dangers of the coastal scavenger life, appear troubled and stutter in response to Harper's inquiry. "Er... Well..." 
Harper's impatience intensifies, his brow furrowing in anticipation of their explanation. Then the scavengers sigh and gesture to Faye who is still in the ship. “Hey, come here.”"""},
{"role": "user", "content": """"Hello, old man." Faye frowns and gets off the boat onto land.
<Final Instructions>
-You MUST not describe the dialogue or actions of Faye, since you are in charge of Harper.
<END Final Instructions>"""}],
    "n": 1,
    "best_of": 1, 
    "presence_penalty": 0.7, 
    "frequency_penalty": 0.7, 
    "repetition_penalty": 1.0, 
    "temperature": 0.95, 
    "top_p": 1.0, 
    "top_k": -1, 
    "top_a": 0.0,
    "min_p": 0.0, 
    "tfs": 1.0, 
    "eta_cutoff": 0.0, 
    "epsilon_cutoff": 0.0, 
    "typical_p": 1.0, 
    "mirostat_mode": 0, 
    "mirostat_tau": 0.0, 
    "mirostat_eta": 0.0, 
    "use_beam_search": False, 
    "length_penalty": 1.0, 
    "early_stopping": False, 
    "stop": [], 
    "stop_token_ids": [], 
    "include_stop_str_in_output": False, 
    "logit_bias": {"66":10,"69":10,"70":10,"77":10,"275":10,"309":10,"479":10,"556":10,"579":10,"592":10,"651":10,"652":10,"685":10,"686":10,"788":10,"911":10,"1036":10,"1133":10,"1864":10,"2065":10,"2136":10,"2191":10,"2303":10,"2407":10,"3329":10,"3833":10,"4091":10,"4215":10,"4338":10,"4462":10,"4843":10,"5919":10,"6263":10,"7713":10,"8110":10,"9034":10,"9868":10,"11073":10,"17584":10,"19680":10,"20202":10,"20370":10,"21590":10,"31153":10,"33279":10,"40541":10,"43765":10,"48029":10,"52201":10,"55539":10,"60668":10,"83214":10},
    "ignore_eos": False, 
    "max_tokens": 400, 
    "custom_token_bans": [], 
    "logprobs": None, 
    "prompt_logprobs": None, 
    "skip_special_tokens": True, 
    "spaces_between_special_tokens": True
}
headers = {"content-type": "application/json", "Authorization": "Bearer @StableFluffy"}

# Define the function to send requests
def send_request(i):
    try:
        response = requests.post(url, json=payload, headers=headers)
        print(f"Request {i + 1} status code: {response.status_code}")
        print(response.json())
    except Exception as e:
        print(f"Request {i + 1} failed: {str(e)}")

# Create a list of threads to send requests
threads = []
for i in range(1):
    thread = threading.Thread(target=send_request, args=(i,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("All requests completed.")

Maybe a logit_bias built with the wrong tokenizer causes the error? But even so, it should return an error, not trigger a device-side assertion.

Thanks,


AlpinDale commented on June 5, 2024

I see the problem @StableFluffy

In your request, the logit bias keys (the token IDs) contain invalid indices for the logits tensor. You're trying to modify the bias for tokens 33279, 40541, 43765, 48029, 52201, 55539, 60668, and 83214, but Mistral and Mixtral models only have 32,000 tokens in their vocabulary (valid IDs are 0 to 31999). I assume you're using the same logit bias token IDs as for OpenAI models, but you'll need to change them to the corresponding Mistral token IDs: the tokenizers are different, so each ID maps to a different token in Mistral than in OpenAI's tokenizer.
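The range check above is easy to reproduce on the client side. The snippet below uses the logit_bias from the failing request and the 32,000-token vocabulary size quoted in this comment; the variable names are only illustrative.

```python
# logit_bias from the failing request: token ID (as a string key) -> bias
logit_bias = {
    "66": 10, "69": 10, "70": 10, "77": 10, "275": 10, "309": 10, "479": 10,
    "556": 10, "579": 10, "592": 10, "651": 10, "652": 10, "685": 10, "686": 10,
    "788": 10, "911": 10, "1036": 10, "1133": 10, "1864": 10, "2065": 10,
    "2136": 10, "2191": 10, "2303": 10, "2407": 10, "3329": 10, "3833": 10,
    "4091": 10, "4215": 10, "4338": 10, "4462": 10, "4843": 10, "5919": 10,
    "6263": 10, "7713": 10, "8110": 10, "9034": 10, "9868": 10, "11073": 10,
    "17584": 10, "19680": 10, "20202": 10, "20370": 10, "21590": 10,
    "31153": 10, "33279": 10, "40541": 10, "43765": 10, "48029": 10,
    "52201": 10, "55539": 10, "60668": 10, "83214": 10,
}
VOCAB_SIZE = 32000  # Mistral/Mixtral vocabulary size, per the comment above

# Valid indices into a logits row of width VOCAB_SIZE are 0 .. VOCAB_SIZE - 1.
out_of_range = sorted(int(k) for k in logit_bias if not 0 <= int(k) < VOCAB_SIZE)
print(out_of_range)
# [33279, 40541, 43765, 48029, 52201, 55539, 60668, 83214]

# Dropping those keys lets the request run, but the remaining IDs still come
# from the OpenAI tokenizer, so they bias unrelated Mistral tokens.
cleaned = {k: v for k, v in logit_bias.items() if 0 <= int(k) < VOCAB_SIZE}
print(len(cleaned))
# 44
```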

I can run the script with no issues after removing those extra tokens.


justpain02 commented on June 5, 2024

@StableFluffy This problem is caused by a tokenizer mismatch.

In RisuAI, Reverse Proxy mode uses the OAI tokenizer by default, and Reverse Proxy Ooba Mode uses the Llama tokenizer by default.

The model on your server is based on Mistral, so you need to open the Ooba settings, check the tokenizer option, and enter 'mistral' or 'mixtral' for Mistral-based models.

I checked that TheBloke/PiVoT-0.1-Evil-a-GPTQ works well with the Mistral tokenizer, and if my information is right, your model based on Solar 10.7B also uses the same tokenizer, so I think this should resolve your problem.

I also checked maywell/PiVoT-SOLAR-10.7B-RP, and it works well with the Mistral tokenizer.


AlpinDale commented on June 5, 2024

I recommend using SillyTavern for this, since it supports Aphrodite and can inject logit bias for the correct tokens. Unfortunately, it doesn't show the corresponding characters for each token, so you'll need to tokenize your text with the /v1/tokenize endpoint first and pass those token IDs along.
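For a custom client, the same workflow (tokenize with the served model's tokenizer, then bias those IDs) can be sketched as below. Both function names are hypothetical, and the /v1/tokenize request body (`{"prompt": ...}`) and response key (`"ids"`) are assumptions: check the server's actual schema before relying on them.

```python
import json
import urllib.request
from typing import Callable, Collection, Iterable


def tokenize_via_endpoint(base_url: str, text: str) -> list[int]:
    """Hypothetical client for the server's /v1/tokenize endpoint.

    ASSUMPTION: the request body is {"prompt": text} and the token IDs come back
    under the "ids" key. Verify the real request/response schema first.
    """
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/tokenize",
        data=json.dumps({"prompt": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return [int(t) for t in json.load(response)["ids"]]


def build_logit_bias(
    words: Iterable[str],
    tokenize: Callable[[str], list[int]],
    bias: float,
    skip_ids: Collection[int] = (),
) -> dict[str, float]:
    """Bias every token that the *served model's* tokenizer produces for each word.

    skip_ids lets you drop special tokens (e.g. a BOS token) if the tokenizer
    prepends them; which IDs those are depends on the model.
    """
    logit_bias: dict[str, float] = {}
    for word in words:
        for token_id in tokenize(word):
            if token_id not in skip_ids:
                # JSON object keys are strings, matching the logit_bias requests above.
                logit_bias[str(token_id)] = bias
    return logit_bias
```

Passing the tokenizer in as a callable keeps the bias logic independent of how tokenization is done. Note that SentencePiece-style tokenizers may encode "joy" and " joy" (with a leading space) differently, so you may want to bias both forms.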

SillyTavern also supports multi-swipe for Aphrodite, so you can request multiple outputs per generation, and swipe through them if needed.
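SillyTavern handles this for you. For a custom client, a rough equivalent, assuming the OpenAI-compatible chat completions schema that the scripts in this thread already use: the `n` field asks for several completions in one call, and each one comes back as an entry in `choices`. The helper names are hypothetical.

```python
def multi_output_payload(model: str, messages: list[dict], n: int = 3, **sampling) -> dict:
    """Chat-completions request asking for n independent completions in one call."""
    return {"model": model, "messages": messages, "n": n, **sampling}


def choice_texts(response: dict) -> list[str]:
    """Collect completion texts from an OpenAI-style chat response, ordered by index."""
    choices = sorted(response["choices"], key=lambda choice: choice["index"])
    return [choice["message"]["content"] for choice in choices]
```

Each returned text corresponds to one "swipe" a user could page through.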

