
vocode-python's Introduction

  vocode

Build voice-based LLM apps in minutes

Vocode is an open source library that makes it easy to build voice-based LLM apps. Using Vocode, you can build real-time streaming conversations with LLMs and deploy them to phone calls, Zoom meetings, and more. You can also build personal assistants or apps like voice-based chess. Vocode provides easy abstractions and integrations so that everything you need is in a single library.

We're actively looking for community maintainers, so please reach out if interested!

⭐️ Features

Check out our React SDK here!

🫂 Contribution and Roadmap

We're an open source project and are extremely open to contributors adding new features, integrations, and documentation! Please don't hesitate to reach out and get started building with us.

For more information on contributing, see our Contribution Guide.

And check out our Roadmap.

We'd love to talk to you on Discord about new ideas and contributing!

🚀 Quickstart

pip install 'vocode'
import asyncio
import logging
import signal
from vocode.streaming.streaming_conversation import StreamingConversation
from vocode.helpers import create_streaming_microphone_input_and_speaker_output
from vocode.streaming.transcriber import *
from vocode.streaming.agent import *
from vocode.streaming.synthesizer import *
from vocode.streaming.models.transcriber import *
from vocode.streaming.models.agent import *
from vocode.streaming.models.synthesizer import *
from vocode.streaming.models.message import BaseMessage
import vocode

# these can also be set as environment variables
vocode.setenv(
    OPENAI_API_KEY="<your OpenAI key>",
    DEEPGRAM_API_KEY="<your Deepgram key>",
    AZURE_SPEECH_KEY="<your Azure key>",
    AZURE_SPEECH_REGION="<your Azure region>",
)


logging.basicConfig()
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)


async def main():
    (
        microphone_input,
        speaker_output,
    ) = create_streaming_microphone_input_and_speaker_output(
        use_default_devices=False,
        logger=logger,
        use_blocking_speaker_output=True
    )

    conversation = StreamingConversation(
        output_device=speaker_output,
        transcriber=DeepgramTranscriber(
            DeepgramTranscriberConfig.from_input_device(
                microphone_input,
                endpointing_config=PunctuationEndpointingConfig(),
            )
        ),
        agent=ChatGPTAgent(
            ChatGPTAgentConfig(
                initial_message=BaseMessage(text="What up"),
                prompt_preamble="""The AI is having a pleasant conversation about life""",
            )
        ),
        synthesizer=AzureSynthesizer(
            AzureSynthesizerConfig.from_output_device(speaker_output)
        ),
        logger=logger,
    )
    await conversation.start()
    print("Conversation started, press Ctrl+C to end")
    signal.signal(
        signal.SIGINT, lambda _0, _1: asyncio.create_task(conversation.terminate())
    )
    while conversation.is_active():
        chunk = await microphone_input.get_audio()
        conversation.receive_audio(chunk)


if __name__ == "__main__":
    asyncio.run(main())

📞 Phone call quickstarts

🌱 Documentation

docs.vocode.dev

vocode-python's People

Contributors

adnaans, ajar98, anunayajoshi, arpagon, atlemichaelselberg, cl3m, dantenoguez, divst3r, eliothsu, hhousen, jusgu, kian1354, kshah707, lefant, m-ods, macwilk, mysteriousagent, osilverstein, reuben, ripperdoc, rynld, sethgw, seungjin-vocode, shahafabileah, shobhitsrivastava, skirdey, stephenhandley, t4533n, vladcuciureanu, zaptrem


vocode-python's Issues

Telephone server install

I got stuck on an issue when setting up https://github.com/vocodedev/vocode-python/blob/main/examples/telephony_app.py
It runs, but I only hear the initial "What's up" message and nothing else. Then the server goes down after 15 seconds, which looks like an idle-timeout constant.
This happens on both Linux and Windows.
Used this config:

telephony_server = TelephonyServer(
    base_url=BASE_URL,
    config_manager=config_manager,
    inbound_call_configs=[
        InboundCallConfig(
            url="/inbound_call",
            agent_config=ChatGPTAgentConfig(
                initial_message=BaseMessage(text="Hello! What's up?"),
                prompt_preamble="Have a pleasant conversation about life",
                generate_responses=True,
            ),
            twilio_config=TwilioConfig(
                account_sid="XX...",
                auth_token="XX...",
            ),
            synthesizer_config=GTTSSynthesizerConfig.from_telephone_output_device(),
        )
    ],
    logger=logger,
)

language support across Transcribers/Synthesizers

Here's a list of our transcribers/synthesizers and their non-English language support. We'd love for folks to pick up the transcribers/synthesizers they enjoy using and add this feature!

(The ones that folks use the most are emphasized.)

                              Streaming            Turn-based
AssemblyAI Transcriber                             n/a
Deepgram Transcriber                               n/a
Google Transcriber            ❌ (in progress!)    n/a
RevAI Transcriber                                  n/a
Whisper.cpp Transcriber                            n/a
Whisper Transcriber                                n/a
Azure Synthesizer
Coqui Synthesizer
ElevenLabs Synthesizer
Rime Synthesizer
Play.ht Synthesizer
gTTS Synthesizer
StreamElements Synthesizer
Coqui TTS Synthesizer
Google Synthesizer                                 n/a

Waiting on hold during phone calls

Hi,

Does anyone have any recommendations for a speech-to-text service that can handle waiting on hold when calling a phone tree? Specifically, it needs the ability to properly recognize where there is background music as opposed to an operator speaking.

Best,
J

add a cost estimate tool/section

very nice work!

Assume a use case where n customers are called to get some feedback on something.

It would be nice to have a cost estimate model, e.g. how much it would cost to call a thousand customers who each stay on the line for 2 minutes on average, across the different cloud services.

certificate verify failed: unable to get local issuer certificate

I'm getting a weird certificate issue on my M1 Mac with Python 3:

File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/websockets/legacy/client.py", line 663, in await_impl
_transport, _protocol = await self._create_connection()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 1103, in create_connection
transport, protocol = await self._create_connection_transport(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 1133, in _create_connection_transport
await waiter
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/sslproto.py", line 534, in data_received
ssldata, appdata = self._sslpipe.feed_ssldata(data)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/sslproto.py", line 188, in feed_ssldata
self._sslobj.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/ssl.py", line 975, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)

ERROR:asyncio:Task was destroyed but it is pending

DEBUG:main:Generating response for transcription
DEBUG:main:Sent chunk 0 with size 8000
DEBUG:main:Sent chunk 1 with size 8000
DEBUG:main:Sent chunk 2 with size 8000
DEBUG:main:sending interrupt
DEBUG:main:Interrupting synthesis
DEBUG:main:Human started speaking
DEBUG:main:Sent chunk 3 with size 8000
DEBUG:main:Interrupted, stopping text to speech after 4 chunks
DEBUG:main:Message sent: Well, as an AI language model, I don't have physical sensations, so I don't get -
DEBUG:main:Got transcription:  Okay, confidence: 0.0
DEBUG:main:Generating response for transcription
DEBUG:main:Sent chunk 0 with size 8000
DEBUG:main:Sent chunk 1 with size 8000
DEBUG:main:Sent chunk 2 with size 3043
DEBUG:main:Message sent: Is there anything else I can help you with?
DEBUG:main:sending interrupt
DEBUG:main:Human started speaking
DEBUG:main:Got transcription:  complete your last sentence., confidence: 0.99902344
DEBUG:main:Generating response for transcription
ERROR:asyncio:Task was destroyed but it is pending!
task: <Task pending name='Task-1476' coro=<StreamingConversation.send_messages_to_stream_async.<locals>.send_to_call() running at /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocode/streaming/streaming_conversation.py:200> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[_chain_future.<locals>._call_set_state() at /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/futures.py:394]>
DEBUG:main:Sent chunk 0 with size 8000
DEBUG:main:Sent chunk 1 with size 8000
DEBUG:main:Sent chunk 2 with size 8000
DEBUG:main:Sent chunk 3 with size 1304
DEBUG:main:Message sent: any sensations or emotions like humans do.

The async task unexpectedly died. This could be a race condition: if multiple asynchronous tasks access shared resources at the same time, it can lead to races and errors. It could also be improper exception handling: if an asynchronous task raises an exception that is not properly handled, the program can crash or hang.
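If the race-condition theory is right, the standard fix is to serialize access to shared state with an asyncio.Lock. A minimal, self-contained sketch (generic asyncio, not vocode internals):

```python
import asyncio

class ChunkCounter:
    """Shared state touched by several tasks; the lock serializes updates."""

    def __init__(self) -> None:
        self.sent = 0
        self._lock = asyncio.Lock()

    async def record_chunk(self) -> int:
        async with self._lock:
            # the read-modify-write below is now atomic with respect to other tasks
            current = self.sent
            await asyncio.sleep(0)  # yield point where a race could otherwise occur
            self.sent = current + 1
            return self.sent

async def main() -> int:
    counter = ChunkCounter()
    await asyncio.gather(*(counter.record_chunk() for _ in range(100)))
    return counter.sent
```

Without the lock, the yield point inside the read-modify-write lets another task interleave and lose updates; with it, all 100 increments land.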

[EPD-105] Make BaseConfigManager async

Currently all the methods on BaseConfigManager (and all its subclasses) are synchronous. That's an issue because, for example, the RedisConfigManager will block the event loop when a call ends or starts, which blocks all other ongoing tasks (e.g. running the agent, sending audio to the transcriber, or accepting new incoming HTTP calls in the FastAPI server).

Progress: telephony app, streaming conversation, client backend, and the langchain demo are tested and working

From SyncLinear.com | EPD-105
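A minimal sketch of what an async-first config manager could look like, using an in-memory dict in place of Redis. The class and method names here are illustrative assumptions, not the actual vocode API; a Redis-backed version would await a non-blocking Redis client instead of touching a dict:

```python
import asyncio
from typing import Optional

class AsyncInMemoryConfigManager:
    """Async config manager sketch: every method is awaitable, so callers
    never block the event loop. A Redis-backed variant would make the same
    awaited calls against an async Redis client."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}
        self._lock = asyncio.Lock()

    async def save_config(self, conversation_id: str, config: dict) -> None:
        async with self._lock:
            self._store[conversation_id] = config

    async def get_config(self, conversation_id: str) -> Optional[dict]:
        async with self._lock:
            return self._store.get(conversation_id)

    async def delete_config(self, conversation_id: str) -> None:
        async with self._lock:
            self._store.pop(conversation_id, None)
```

Because every method is a coroutine, call start/end bookkeeping yields to the loop instead of stalling the agent, transcriber, and HTTP handlers.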

Dockerize the client backend app

Folks who aren't familiar with Python environments struggle to set up the client_backend.py code; set this up with a Dockerfile.

Websockets sometimes fail

2023-04-03 15:34:55,012 - vocode.streaming.hosted_streaming_conversation - INFO - Listening...press Ctrl+C to stop
Traceback (most recent call last):
  File "C:\Users\Josh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\websockets\legacy\protocol.py", line 968, in transfer_data
    message = await self.read_message()
  File "C:\Users\Josh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\websockets\legacy\protocol.py", line 1038, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
  File "C:\Users\Josh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\websockets\legacy\protocol.py", line 1113, in read_data_frame
    frame = await self.read_frame(max_size)
  File "C:\Users\Josh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\websockets\legacy\protocol.py", line 1170, in read_frame
    frame = await Frame.read(
  File "C:\Users\Josh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\websockets\legacy\framing.py", line 69, in read
    data = await reader(2)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\asyncio\streams.py", line 705, in readexactly
    raise exceptions.IncompleteReadError(incomplete, n)
asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of 2 expected bytes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Josh\vocode\vocode-quickstart-external-host.py", line 51, in <module>
    asyncio.run(conversation.start())
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "C:\Users\Josh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\vocode\streaming\hosted_streaming_conversation.py", line 102, in start
    return await asyncio.gather(sender(ws), receiver(ws))
  File "C:\Users\Josh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\vocode\streaming\hosted_streaming_conversation.py", line 96, in receiver
    async for msg in ws:
  File "C:\Users\Josh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\websockets\legacy\protocol.py", line 497, in __aiter__
    yield await self.recv()
  File "C:\Users\Josh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\websockets\legacy\protocol.py", line 568, in recv
    await self.ensure_open()
  File "C:\Users\Josh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: no close frame received or sent

VectorDB

It would be great if there were support for querying vector databases such as Pinecone.
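For illustration only: the core operation such an integration would expose is nearest-neighbor search over embedding vectors. A dependency-free sketch of that lookup (a real integration would call Pinecone's client rather than this in-memory index):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    scored = sorted(
        docs.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]
```

An agent integration would embed the transcript, call this lookup (or the vector DB equivalent), and feed the retrieved documents into the prompt.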

Add native zoom integration

Add a native zoom integration to the list of I/O devices. features:

  • Join zoom meetings
  • Different participation modes (listening, participating, mute/unmute control)

Running locally always results in an error after a while

So, I'm playing around with the Python demo now (using headphones), and just as with the React demo it throws an error after a few questions and answers. I've never been able to talk to it for longer than a few minutes, which really is a bummer. Anyone else noticing this? The error is related to ws.send(AudioMessage.from_bytes(data).json()) and ends with websockets.exceptions.ConnectionClosedError: no close frame received or sent.

The full error is as follows:

    return await asyncio.gather(sender(ws), receiver(ws))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/vocode/streaming/hosted_streaming_conversation.py", line 86, in sender
    await ws.send(AudioMessage.from_bytes(data).json())
  File "/opt/homebrew/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 647, in send
    await self.write_frame(True, opcode, data)
  File "/opt/homebrew/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 1214, in write_frame
    await self.drain()
  File "/opt/homebrew/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 1203, in drain
    await self.ensure_open()
  File "/opt/homebrew/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 944, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: no close frame received or sent

Another one I'm getting after a while ends with this:

File "/opt/homebrew/lib/python3.11/site-packages/vocode/streaming/hosted_streaming_conversation.py", line 102, in start
    return await asyncio.gather(sender(ws), receiver(ws))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/vocode/streaming/hosted_streaming_conversation.py", line 86, in sender
    await ws.send(AudioMessage.from_bytes(data).json())
  File "/opt/homebrew/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 635, in send
    await self.ensure_open()
  File "/opt/homebrew/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 953, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1011 (unexpected error) keepalive ping timeout; no close frame received

[EPD-123] restart websocket when it dies in client_backend

Websockets are slightly finicky and may fall over from time to time. This is particularly relevant in the client_backend code (and to some extent the telephony_app code, though we've seen fewer bug reports there), since both host websockets.

When the websocket falls over and the conversation is still active, we need a mechanism to restart the connection.

EPD-123
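One generic shape for the restart mechanism: a supervisor loop with exponential backoff that re-runs the connection handler until the conversation ends. The run_connection and is_active callables are injected, so this sketch is library-agnostic and not tied to vocode's actual classes:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def supervise_connection(
    run_connection,          # async callable that returns when the socket closes
    is_active,               # callable: True while the conversation should continue
    base_delay: float = 0.5,
    max_delay: float = 10.0,
) -> None:
    """Re-run run_connection until is_active() is False, backing off on failure."""
    delay = base_delay
    while is_active():
        try:
            await run_connection()
            delay = base_delay  # clean return: reset the backoff
        except Exception:
            logger.exception("websocket died, reconnecting in %.1fs", delay)
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)
```

The real version would also need to re-attach the transcriber/synthesizer streams to the fresh socket after each reconnect.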

[Feature request] Add Gaze Correction Support for LLM and Voice Embedding Library

A ChatGPT-4-generated description follows:

As the LLM and voice embedding library continue to grow, it would be a valuable enhancement to add gaze correction support similar to NVIDIA's eye correction. This feature would improve the user experience by ensuring that eye contact is maintained during video calls or other communication scenarios, making interactions feel more natural and engaging.

Gaze correction can help address the common issue of users looking at their screens instead of the camera during video calls, which can create the impression of a lack of eye contact. Implementing this feature would not only make our library more robust but also cater to a wider range of use cases in various communication platforms.

In order to accomplish this, we will need to:

  1. Research NVIDIA's eye correction implementation and identify the best practices and techniques used.
  2. Evaluate the feasibility of integrating gaze correction into the existing library architecture.
  3. Design and implement the gaze correction feature, ensuring compatibility with the current library.
  4. Test and optimize the performance of the gaze correction feature across different devices and scenarios.
  5. Update documentation to include instructions on how to use the new gaze correction feature.

By adding gaze correction support to our library, we can provide a more immersive and interactive experience for users, further setting our library apart from competitors.

incorrect text encoding for swedish output running ChatGPTAgentConfig with generate_responses=True

I noticed an issue with encoding when I set generate_responses=True in ChatGPTAgentConfig: the text coming from ChatGPT seems to be encoded incorrectly.

DEBUG:alfrid_python.vocode_examples.telephony_app:Human started speaking
DEBUG:alfrid_python.vocode_examples.telephony_app:Got transcription:  Hur många planeter finns I ett vårt provsystem?, confidence: 0.8232422
DEBUG:alfrid_python.vocode_examples.telephony_app:Generating response for transcription
DEBUG:alfrid_python.vocode_examples.telephony_app:Sent chunk 0 with size 8000

...

DEBUG:alfrid_python.vocode_examples.telephony_app:Sent chunk 9 with size 8000
DEBUG:alfrid_python.vocode_examples.telephony_app:Sent chunk 10 with size 1635
DEBUG:alfrid_python.vocode_examples.telephony_app:Message sent: Det finns åtta planeter som ingår i vårt solsystem: Merkurius, Venus, jorden, Mars, Jupiter, Saturnus, Uranus och Neptunus.

Det finns åtta planeter som ingår i vårt
should be
Det finns åtta planeter som ingår i vårt solsystem

If I run with generate_responses=False, the text comes through correctly.

I am running a self-hosted telephony server with Deepgram transcriber (configured for swedish language), chat gpt agent and azure synthesizer.
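One plausible cause of this symptom (unconfirmed for this report) is decoding a streamed UTF-8 byte stream chunk by chunk: a two-byte character like å can straddle a chunk boundary, and calling .decode() on each chunk separately then corrupts or truncates the text at exactly such a character. The robust pattern is an incremental decoder:

```python
import codecs

def decode_stream(chunks) -> str:
    """Decode UTF-8 byte chunks without breaking multi-byte characters
    that straddle a chunk boundary."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    out = []
    for chunk in chunks:
        out.append(decoder.decode(chunk))
    out.append(decoder.decode(b"", final=True))
    return "".join(out)

# "vårt" encoded and split in the middle of the two-byte å:
data = "vårt solsystem".encode("utf-8")
chunks = [data[:2], data[2:]]  # b'v\xc3' + b'\xa5rt solsystem'
assert decode_stream(chunks) == "vårt solsystem"
```

If the streaming agent path decodes each response delta independently, a buffer like this around the chunk boundary would explain why non-ASCII Swedish text breaks only with generate_responses=True.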

Errors getting swallowed

As I have been building my own agents, when they error, they largely do so silently. If I take them out of the vocode setup, and just run them solo, they show their errors. Is this something to do with asyncio? Does it swallow errors that show up in child processes?
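Likely yes: an exception raised inside an asyncio Task is stored on the task and only surfaces when the task is awaited (or, as a warning, when it is garbage-collected). Attaching a done-callback that logs failures as soon as they happen is a common mitigation; this is a generic sketch, not vocode internals (the failures list is just for demonstration):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)
failures: list[BaseException] = []  # demonstration only: collects surfaced errors

def log_task_errors(task: asyncio.Task) -> None:
    """Done-callback: surface an exception the moment the task finishes."""
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        failures.append(exc)
        logger.error("task %s crashed: %r", task.get_name(), exc)

async def broken_agent() -> None:
    raise RuntimeError("agent blew up")

async def main() -> None:
    task = asyncio.create_task(broken_agent(), name="agent")
    task.add_done_callback(log_task_errors)
    await asyncio.sleep(0)   # let the task run; note we never await it directly
    await asyncio.sleep(0)   # let the done-callback fire
```

Wrapping every create_task call in a helper that attaches such a callback makes agent errors loud instead of silent.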

package issue?

ModuleNotFoundError: No module named 'vocode.streaming'; 'vocode' is not a package

End-to-end latency react demo / Support for real-time synthesizers

Hi,

The React demo had too much latency for my use cases. By latency, I mean the time between someone saying something and that person hearing the first word of the answer. Was the React demo using the streaming API on the backend? I could not find the source code for the React demo's backend.

Also, is there a way with the current synthesizers to support real-time / streaming text-to-speech services such as Polly? My understanding is that none of the text-to-speech synthesizers in vocode.turn_based.synthesizer support a streaming API like Polly's. Please let me know if that's not true.

Best,
J

add turn-based Google Synthesizer

Currently we only have a streaming Google Synthesizer; there should also be a turn-based one that outputs an AudioSegment, in vocode/turn_based/synthesizer/google_synthesizer.py.

keyword-based interrupts

Right now, when the human begins speaking (and allow_human_to_cut_off_bot is True), the bot gets cut off. However, there are some use cases where it makes sense to only allow interrupts based on a wake word (think "Alexa").
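A sketch of how a wake-word gate could sit in front of the cut-off decision. This is purely illustrative: should_interrupt and its parameters are hypothetical names, not existing vocode hooks:

```python
WAKE_WORDS = ("alexa", "hey bot")  # illustrative wake words

def should_interrupt(transcript: str, allow_cut_off: bool, wake_words=WAKE_WORDS) -> bool:
    """Only treat human speech as an interrupt when it starts with a wake word.

    With no wake words configured, fall back to the current behavior of
    interrupting whenever allow_cut_off is True.
    """
    if not allow_cut_off:
        return False
    if not wake_words:
        return True
    lowered = transcript.strip().lower()
    return any(lowered.startswith(w) for w in wake_words)
```

The conversation loop would call this on each partial transcript before firing its interrupt path, instead of interrupting on any detected speech.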

information on data encoding

Hi, I want to send the received voice audio to Discord. Can you tell me what the encoding of the received audio is, and what input encoding will be required?
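Assuming a Twilio-style telephony setup, the incoming audio is typically 8 kHz mono μ-law (G.711), while Discord expects 48 kHz 16-bit PCM before Opus encoding; please verify against your own transcriber/output config. As a dependency-free illustration, the standard G.711 μ-law expansion looks like this (you would still need to resample 8 kHz to 48 kHz afterwards):

```python
def ulaw_to_pcm16(data: bytes) -> bytes:
    """Expand 8-bit G.711 mu-law samples to 16-bit little-endian PCM."""
    out = bytearray()
    for byte in data:
        u = ~byte & 0xFF
        t = ((u & 0x0F) << 3) + 0x84          # mantissa plus bias
        t <<= (u & 0x70) >> 4                 # apply exponent segment
        sample = (0x84 - t) if (u & 0x80) else (t - 0x84)
        out += sample.to_bytes(2, "little", signed=True)
    return bytes(out)
```

This is the textbook G.711 decode table in code form; for production use, a library decoder and resampler is the safer route.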

[Feature Request]: concurrent speech synthesis

Right now, it appears as though chunks are sent for synthesis one after another, each one blocking until the previous one has been played. This introduces pauses between sentences during playback, due to the lag in getting the audio rendered. Is there a setting (or could one be added) to send them for synthesis as soon as they are available, so there is no lag?
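The usual fix is a producer/consumer pipeline: synthesize chunk n+1 while chunk n is playing, decoupled by an asyncio.Queue. A generic sketch under the assumption that synthesis and playback are both awaitable; synthesize and play here are placeholder callables, not vocode APIs:

```python
import asyncio

async def run_pipeline(sentences, synthesize, play) -> None:
    """Overlap synthesis and playback: while one chunk plays, the next
    is already being rendered."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)  # small buffer of ready audio

    async def producer():
        for sentence in sentences:
            audio = await synthesize(sentence)
            await queue.put(audio)
        await queue.put(None)  # sentinel: no more audio

    async def consumer():
        while True:
            audio = await queue.get()
            if audio is None:
                break
            await play(audio)

    await asyncio.gather(producer(), consumer())
```

Playback order is preserved by the queue, so interruption handling can stay where it is; only the synthesis latency is hidden behind the currently playing chunk.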

Websocket user implemented seems to be unimplemented

Irony in the title aside, it looks like the websockets-based user-implemented agent referenced in these files:

https://github.com/vocodedev/vocode-python/blob/3a84fd02938e7e52edfe862a80a0749caf1e7aaf/docs/create-your-own-agent.mdx

class WebSocketUserImplementedAgentConfig(

is missing from the factory:

return RESTfulUserImplementedAgent(agent_config=agent_config, logger=logger)

Or am I missing something?

Adding support for Hugging Face Hub models

Would it be possible to support speech-to-text inference via models on the Hugging Face Hub? For that matter, there are LLMs and text-to-speech models as well so adding support for an HF token would be useful!
