arihanv / shush
Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and makes requests to it via a NextJS app
Home Page: https://shush.arihanv.com
License: MIT License
Hi. I'm afraid I'm a complete novice when it comes to coding, so the process of getting this up and running was pretty laborious.
I tried the method described on the main GitHub page, using Bun, and eventually got the frontend running on http://localhost:3000/tryit. However, it doesn't display the server information from Modal. With your sample site, it looks like this: [screenshot]
Whereas mine looks like this: [screenshot]
And when I try to send the file regardless, I get a "failed to send file" error. That's neither here nor there, though; all of that is just to say it made me try the second method, the one described here as "deploy on Modal".
When I get to the curl portion, if I use the following command:
curl -i -X POST -H "Content-Type: application/octet-stream" --data-binary @"D:\ac3.mp3" "https://<modal_org_name>--whisper-v3-entrypoint.modal.run?audio=ac3.mp3"
after the file is uploaded, I get the following error in Modal itself:
Traceback (most recent call last):
File "/pkg/modal/_container_entrypoint.py", line 342, in handle_input_exception
yield
File "/pkg/modal/_container_entrypoint.py", line 460, in run_inputs
res = imp_fun.fun(*args, **kwargs)
File "/root/modal_app.py", line 77, in generate
output = self.pipe(
File "/usr/local/lib/python3.9/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 357, in __call__
return super().__call__(inputs, **kwargs)
File "/usr/local/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1132, in __call__
return next(
File "/usr/local/lib/python3.9/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
item = next(self.iterator)
File "/usr/local/lib/python3.9/site-packages/transformers/pipelines/pt_utils.py", line 266, in __next__
processed = self.infer(next(self.iterator), **self.params)
File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
data.append(next(self.dataset_iter))
File "/usr/local/lib/python3.9/site-packages/transformers/pipelines/pt_utils.py", line 183, in __next__
processed = next(self.subiterator)
File "/usr/local/lib/python3.9/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 434, in preprocess
inputs = ffmpeg_read(inputs, self.feature_extractor.sampling_rate)
File "/usr/local/lib/python3.9/site-packages/transformers/pipelines/audio_utils.py", line 41, in ffmpeg_read
raise ValueError(
ValueError: Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted. If reading from a remote URL, ensure that the URL is the full address to download the audio file.
And Modal does process the same file using your sample site.
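The ValueError above means the bytes that reached ffmpeg were not a decodable audio stream. Since the hosted demo processes the same file, one thing worth trying (an untested sketch, assuming the endpoint also accepts multipart form uploads, as other reports here use) is sending the file as form data instead of a raw octet-stream body:

```shell
# Hypothetical: replace <modal_org_name> with your Modal org/workspace name.
# -F sends the file as multipart/form-data under the field name "audio",
# letting curl set the Content-Type and boundary for you.
curl -i -X POST \
  -F "audio=@D:/ac3.mp3" \
  "https://<modal_org_name>--whisper-v3-entrypoint.modal.run"
```

On Windows, forward slashes in the file path also sidestep shell-quoting problems with backslashes.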
I get this error on Modal
2023-12-28T21:27:45+0000 Canceling remaining unfinished task <Task pending name='Task-15' coro=<asgi_app_wrapper.<locals>.fn.<locals>.fetch_data_in() running at /pkg/modal/_asgi.py:30> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f5d96a1ce20>()]> cb=[set.discard()]>
Just trying to set this up on Modal and going through the README.
On issuing modal deploy shush.py
I hit the error
NotFoundError: No secret named huggingface - you can add secrets to your account at https://modal.com/secrets
Do I just need a Hugging Face account as well?
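The error just means the app looks up a Modal secret named huggingface, which doesn't exist in your workspace yet. Assuming the secret only needs to carry a Hugging Face access token (the exact key name the app reads is an assumption here), it can be created from the Modal CLI rather than the web UI:

```shell
# Create a Modal secret named "huggingface" holding a Hugging Face token.
# <your_hf_token> is a placeholder: generate a token in your Hugging Face
# account settings, then run:
modal secret create huggingface HF_TOKEN=<your_hf_token>
```

So yes: you need a Hugging Face account to obtain the token, plus the secret registered under that exact name.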
After deploying to modal and trying to run this command:
curl -X POST -F "audio=@<filename>" https://<org>--whisper-v3-entrypoint.modal.run
The console returns {"detail":"Not Found"}
Checking the logs on Modal, I see "404 Not Found" errors.
Trying the web app on localhost does not work either.
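A {"detail":"Not Found"} response with 404s in the logs usually means the request path doesn't match any deployed route, rather than that the app is down. One way to check (a sketch, assuming the entrypoint file is named as in the other reports here) is to redeploy and compare URLs:

```shell
# Redeploy and note the *.modal.run URL Modal prints for the web endpoint;
# POST to exactly that URL, with no extra path segments appended.
modal deploy shush.py
```

The frontend failing on localhost is consistent with this: if its configured endpoint URL is wrong, every request it makes will 404 the same way.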
I'm using this project since it seems like a user-friendly way to achieve the latencies described here: https://github.com/Vaibhavs10/insanely-fast-whisper. But I'm trying a 27-minute mp3 that seems to have been processing for 20+ minutes, so what can I do to debug or go faster?
Some logs I'm seeing on Modal:
CUDA Version 12.1.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md
Received a request from Address(host='50.35.125.15', port=50528)
Request finished with status 200. (execution time: 48.7 ms, first-byte latency: 189.7 ms)
Received a request from Address(host='50.35.125.15', port=50528)
Request finished with status 200. (execution time: 46.0 ms, first-byte latency: 212.2 ms)
Received a request from Address(host='50.35.125.15', port=50528)
Request finished with status 200. (execution time: 23.4 ms, first-byte latency: 126.9 ms)
Received a request from Address(host='50.35.125.15', port=50528)
Request finished with status 200. (execution time: 32.9 ms, first-byte latency: 146.4 ms)
Received a request from Address(host='50.35.125.15', port=50528)
Request finished with status 200. (execution time: 28.2 ms, first-byte latency: 121.6 ms)
Received a request from Address(host='50.35.125.15', port=50528)
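One thing worth checking in the logs above is the "NVIDIA Driver was not detected" warning: if the container really is running without a GPU, transcription would fall back to CPU and 20+ minutes for a 27-minute file becomes plausible. Separately, the latencies in insanely-fast-whisper come from chunked batching (the transformers ASR pipeline's chunk_length_s and batch_size arguments); a rough back-of-envelope with hypothetical settings shows why that matters for long audio:

```python
import math

# Hypothetical settings: 30 s chunks, 24 chunks per GPU batch
# (values in the spirit of insanely-fast-whisper's defaults).
audio_seconds = 27 * 60          # the 27-minute mp3 from the report
chunk_length_s = 30
batch_size = 24

chunks = math.ceil(audio_seconds / chunk_length_s)
batches = math.ceil(chunks / batch_size)

print(chunks)   # -> 54 windows of 30 s to transcribe
print(batches)  # -> 3 GPU forward passes if the windows are batched
```

Sequentially, those 54 windows mean 54 forward passes; batched, only 3. If the deployment isn't passing chunking/batching options to the pipeline (or isn't on a GPU at all), that alone would explain the slowdown.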