Speech-to-Image
Video Tutorial: https://youtu.be/uNfiu5k6RQk
Setup:
- Download and install ComfyUI: https://github.com/comfyanonymous/ComfyUI
- pip install faster-whisper (https://github.com/SYSTRAN/faster-whisper)
- Navigate to your ComfyUI directory
- git clone https://github.com/pydn/ComfyUI-to-Python-Extension.git
- Navigate to the ComfyUI-to-Python-Extension folder and install requirements
- pip install -r requirements.txt
- Navigate to the ComfyUI-to-Python-Extension folder (if you are not already there)
- Download the zip or git clone https://github.com/All-About-AI-YouTube/speech-to-image.git
- pip install torch pyaudio (for torch, pick the install command for your platform at https://pytorch.org/get-started/locally/)
- Download an SD model (e.g. https://civitai.com/models/139562/realvisxl-v30-turbo?modelVersionId=272378)
- Place the model in ComfyUI/models/checkpoints
- Adjust parameters in workflow_api2.py (see video)
- Set your SD model name on line 157
- Set your image save path in display.py
- Run workflow_api2.py
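
Before running, it can help to confirm that the pieces from the steps above are in place. The following is a minimal sketch of such a check and is not part of the speech-to-image repo. The function names (missing_modules, check_setup), the "ComfyUI" folder path, and the checkpoint file name are all hypothetical placeholders, and it assumes the packages import as faster_whisper, torch, and pyaudio.

```python
import importlib.util
from pathlib import Path

# Import names assumed for the packages installed during setup.
REQUIRED_MODULES = ("faster_whisper", "torch", "pyaudio")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def check_setup(comfy_dir, model_file):
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    comfy = Path(comfy_dir)
    problems = [f"Python module not installed: {n}" for n in missing_modules()]

    # The extension is cloned inside the ComfyUI directory.
    ext = comfy / "ComfyUI-to-Python-Extension"
    if not ext.is_dir():
        problems.append(f"missing folder: {ext}")

    # The SD model is expected under models/checkpoints.
    ckpt = comfy / "models" / "checkpoints" / model_file
    if not ckpt.is_file():
        problems.append(f"missing checkpoint: {ckpt}")

    return problems


if __name__ == "__main__":
    # Hypothetical values: replace with your ComfyUI path and model file name.
    for problem in check_setup("ComfyUI", "your_model.safetensors"):
        print(problem)
```

If the script prints nothing, the modules, the extension folder, and the checkpoint file were all found. It does not check the parameters inside workflow_api2.py or the save path in display.py.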