shipbit / wingman-ai Goto Github PK

License: GNU General Public License v3.0

Python 84.19% Makefile 0.01% JavaScript 0.05% Cython 8.79% C++ 0.15% C 5.51% XSLT 1.17% CMake 0.12%

wingman-ai's Introduction

Wingman AI Core

Wingman AI allows you to use your voice to talk to various AI providers and LLMs, process your conversations, and ultimately trigger actions such as pressing buttons or reading answers. Our Wingmen are like characters and your interface to this world, and you can easily control their behavior and characteristics, even if you're not a developer.

1.5.0 Showreel:

Release trailer:

In-depth tutorial:

AI is complex and it scares people. It's also not just ChatGPT. We want to make it as easy as possible for you to get started. That's what Wingman AI is all about. It's a framework that allows you to build your own Wingmen and use them in your games and programs.

The idea is simple, but the possibilities are endless. For example, you could:

Role play with an AI while playing for more immersion. Have air traffic control (ATC) in Star Citizen or Flight Simulator. Talk to Shadowheart in Baldur's Gate 3 and have her respond in her own (cloned) voice.
Get live data such as trade information, build guides, or wiki content and have it read to you in-game by a character and voice you control.
Execute keystrokes in games/applications and create complex macros. Trigger them in natural conversations with no need for exact phrases. The AI understands the context of your dialog and is quite smart in recognizing your intent. Say "It's raining! I can't see a thing!" and have it trigger a command you simply named WipeVisors.
Automate tasks on your computer
improve accessibility
... and much more

Features

Since version 2.0, Wingman AI Core acts as a "backend" API (using FastAPI and Pydantic) with the following features:

Push-to-talk or voice activation to capture user audio
OpenAI text generation and function calling
Speech-to-text providers (STT) for transcription:
- OpenAI Whisper
- OpenAI Whisper via Azure
- Azure Speech
- whispercpp (local)
Text-to-speech (TTS) providers:
- OpenAI TTS
- Azure TTS
- Elevenlabs
- Edge TTS (free)
- XVASynth (local)
Sound effects that work with every supported TTS provider
Multilingual by default
Command recording & execution (keyboard & mouse)
- AI-powered: OpenAI decides when to execute commands based on user input. Users don't need to say exact phrases.
- Instant activation: Users can (almost) instantly trigger commands by saying exact phrases.
- Optional: Predetermined responses
Custom Wingman support: Developers can easily plug-in their own Python scripts with custom implementations
Skills that can do almost anything. Think Alexa... but better.
directory/file-based configuration for different use cases (e.g. games) and Wingmen. No database needed.
Wingman AI Core exposes a lot of its functionality via REST services (with an OpenAPI/Swagger spec) and can send and receive messages from clients, games etc. using WebSockets.

We (Team ShipBit) offer an additional client with a neat GUI that you can use to configure everything in Wingman AI Core.

Is this a "Star Citizen" thing?

No, it is not! We presented an early prototype of Wingman AI in Star Citizen on YouTube, which caused a lot of excitement and interest in the community. Star Citizen is a great game, we love it and it has a lot of interesting use-cases for Wingmen but it's not the only game we play and not the core of our interest. We're also not affiliated with CIG or Star Citizen in any way.

The video that started it all:

Wingman AI is an external, universal tool that you can run alongside any game or program. As such, it does not currently interact directly with Star Citizen or any other game, other than its ability to trigger system-wide keystrokes, which of course can have an effect on the game. However, if you find a way to interact with a game, either through an API or by reading the game's memory, you could - in theory - use it to directly trigger in-game actions or feed your models with live data. This is not the focus of Wingman AI, though.

Who is this for?

The project is intended for two different groups of users:

Developers

If you're a developer, you can just clone the repository and start building your own Wingmen. We try to keep the codebase as open and hackable as possible, with lots of hooks and extension points. The base classes you'll need are well documented, and we're happy to help you get started. We also provide a development guide to help you witht the setup. Wingman AI Core is currently 100% written in Python.

Gamers & other interested people

If you're not a developer, you can start with pre-built Wingmen from us or from the community and adapt them to your needs. Since version 2, we offer an eay-to-use client for Windows that you can use to cofigure every single detail of your Wingmen. It also handles multiple configurations and offers system-wide settings like audio device selection.

Providers & cost

Wingman AI Core is free but the AI providers you'll be using might not be. We know that this is a big concern for many people, so we are offering "Wingman Pro" which is a subscription-based service with a flat fee for all the AI providers you need (and additional GUI features). This way, you won't have to worry about intransparent "pay-per-use" costs.

Check out the pricing and features here: Wingman AI Pro

Wingman AI also supports local providers that you have to setup on your own but can then use and connect with our client for free:

Other providers

You can also use your own API key to use the following services:

OpenAI

Our Wingmen use OpenAI's APIs and they charge by usage. That means: You don't pay a flat subscription fee, but rather for each call you make to their APIs. You can find more information about the APIs and their pricing on the OpenAI website. You will need to create your API key:

Navigate to openai.com and click on "Try ChatGPT".
Choose "Sign-Up" and create an account.
(if you get an error, go back to openai.com)
Click "Login".
Fill in your personal information and verify your phone number.
Select API. You don't need ChatGPT Plus to use Wingman AI.
(Go to "Settings > Limits" and set a low soft and hard "usage limit" for your API key. We recommend this to avoid unexpected costs. $5 is fine for now)
Go to "Billing" and add a payment method.
Select "API Key" from the menu on the left and create one. Copy it! If you forget it, you can always create a new one.

ElevenLabs

You don't have to use ElevenLabs as TTS provider, but their voices are great. You can also clone your own with less than 5 minutes of sample audio, e.g. your friend, an actor or a recording of an NPC in your game.

They have a free tier with a limited number of characters generated per month so you can try it out first. You can find more information on their pricing page.

Signing up is very similar to OpenAI: Create your account, set up your payment method, and create an API key.

Edge TTS (Free)

Microsoft Edge TTS is actually free and you don't need an API key to use it. However, it's not as "good" as the others in terms of quality. Their voices are split by language, so the same voice can't speak different languages - you have to choose a new voice for the new language instead. Wingman does this for you, but it's still "Windows TTS" and not as good as the other providers.

Are local LLMs replacing OpenAI supported?

Wingman AI exposes the base_url property that the OpenAI Python client uses. So if you have a plug-in replacement for OpenAI's client, you can easily connect it to Wingman AI Core. You can also write your own custom Wingman that uses your local LLM.

Integrating specific LLMs oder models is currently not on our (ShipBit) priority list as explained here and we do not offer live support for it. Check out or Discord server if you're interested in local LLMs - there is a vibrant community discussing and testing different solutions and if we ever find one that satisfies our requirements, we might consider supporting it officially.

Installing Wingman AI

Windows

Download the installer of the latest version from wingman-ai.com.
Install it to a directory of your choice and start the client Wingman AI.exe.
- The client will will auto-start Wingman AI Core.exe in the background

If that doesn't work for some reason, try starting Wingman AI Core.exe manually and check the terminal or your logs directory for errors.

If you're a developer, you can also run from source. This way you can preview our latest changes on the develop branch and debug the code.

MacOS

Wingman runs well on MacOS. While we don't offer a precompiled package for it, you can run it from source. Note that the TTS provider XVASynth is Windows-only and therefore not supported on MacOS.

Linux

Linux is not officially supported but some of our community members were able to run it anyways. Check out their documentation.

Who are these Wingmen?

Our default Wingmen serve as examples and starting points for your own Wingmen, and you can easily reconfigure them using the client. You can also add your own Wingmen.

Computer & ATC

Our first two default Wingmen are using OpenAI's APIs. The basic process is as follows:

Your speech is transcribed by the configured TTS provider.
The transcript is then sent as text to the GPT-3.5 Turbo API, which responds with a text and maybe function calls.
Wingman AI Core executes function calls which equals a command execution.
The response is then read out to you by the configured TTS provider.
Clients connected to Wingman AI Core are notified about progress and changes live and display them in the UI.

Talking to a Wingman is like chatting with ChatGPT. This means that you can customize their behavior by giving them a context (or system) prompt as starting point for your conversation. You can also just tell them how to behave and they will remember that during your conversation. ATC and Computer use very different prompts, so they behave very differently.

The magic happens when you configure commands or key bindings. GPT will then try to match your request with the configured commands and execute them for you. It will automatically choose the best matching command based only on its name, so make sure you give it a good one (e.g. RequestLandingPermission).

More information about the API can be found in the OpenAI API documentation.

StarHead

StarHead is where it gets really interesting. This Wingman is tailored to Star Citizen and uses the StarHead API to enrich your gaming experience with external data. It is a showcase of how to build specialized Wingmen for specific use-cases and scenarios. Simply ask StarHead for the best trade route, and it will prompt you for your ship, location, and budget. It will then call the StarHead API and read the result back to you.

Like all of our OpenAI Wingmen, it will remember the conversation history and you can ask follow-up questions. For example, you can ask what the starting point of the route, or what the next stop is. You can also ask for the best trade route from a different location or with a different ship.

StarHead is a community project that aims to provide a platform for Star Citizen players to share their knowledge and experience. At the moment it is mainly focused on the trading aspect of Star Citizen. With a huge database of trade items, shop inventories and prices, it allows you to find the best trade routes and make the most profit. A large community of players is constantly working to keep the data up to date.

For updates and more information, visit the StarHead website or follow @KNEBEL on

Noteworthy community projects

UEXCorp by @JayMatthew: A former Custom Wingman, now Skill that utilizes the UEX Corp API to pull live data for Star Citizen. Think StarHead on steroids.
Clippy by @teddybear082: A tribute Skill to the sketchy Microsoft assistant we all used to hate.
WebSearch by @teddybear082: A Skill that can pull data from websites (and quote the sources) for you.

Can I configure Wingman AI Core without using your client?

Yes, you can! You can edit all the configs in your %APP_DATA%\ShipBit\WingmanAI\[version] directory.

The YAML configs are very indentation-sensitive, so please be careful. We recommend using VSCode with the YAML extension to edit them.

There is no hot reloading, so you have to restart Wingman AI Core after you made changes to the configs.

Directory/file-based configuration

Use these naming conventions to create different configurations for different games or scenarios:

any subdirectory in your config dir is a "configuration" or use case. Do not use special characters.
- _[name] (underscore): marks the default configuration that is launched on start, e.g. _Star Citizen.
Inside of a configuration directory, you can create different wingmen by adding [name].yaml files. Do not use special characters.
- .[name].yaml (dot): marks the Wingman as "hidden" and skips it in the UI and on start, e.g. .Computer.yaml.
- [name].png (image): Sets an avatar for the Wingman in the client, e.g. StarHead.png.

There are a couple of other files and directories in the config directory that you can use to configure Wingman AI.

defaults.yaml - contains the default settings for all Wingmen. This is merged with the settings of the individual Wingmen at runtime. Specific wingman settings always override the defaults. Once a wingman is saved using the client, it contains all the settings it needs to run and will no longer fallback to the defaults.
settings.yaml - contains user settings like the selected audio input and output devices
secrets.yaml - contains the API keys for different providers.

Access secrets in code by using secret_keeper.py. You can access everything else with config_manager.py.

Does it support my language?

Wingman supports all languages that OpenAI (or your configured AI provider) supports. Setting this up in Wingman is really easy:

Find the context setting for the Wingman you want to change.

Now add a simple sentence to the context prompt: Always answer in the language I'm using to talk to you. or something like Always answer in Portuguese.

The cool thing is that you can now trigger commands in the language of your choice without changing/translating the name of the commands - the AI will do that for you.

Also note that depending on your TTS provider, you might have to pick a voice that can actually speak your desired language or you'll end up with something really funny (like an American voice trying to speak German).

Develop with Wingman AI

Are you ready to build your own Wingman or implement new features to the framework?

Please follow our guides to setup your dev environment:

If you want to read some code first and understand how it all works, we recommend you start here (in this order):

http://127.0.0.1:8000/docs - The OpenAPI (ex: Swagger) spec
wingman_core.py - most of the public API endpoints that Wingman AI exposes
The config files in %APP_DATA%\ShipBit\WingmanAI\[version] to get an idea of what's configurable.
Wingman.py - the base class for all Wingmen
OpenAIWingman.py - derived from Wingman, using all the providers
Tower.py - the factory that creates Wingmen

If you're planning to develop a major feature or new integration, please contact us on Discord first and let us know what you're up to. We'll be happy to help you get started and make sure your work isn't wasted because we're already working on something similar.

Acknowledgements

Thank you so much for your support. We really appreciate it!

Open Source community

Wingman makes use of other Open Source projects internally (without modifying them in any way). We would like to thank their creators for their great work and contributions to the Open Source community.

azure-cognitiveservices-speech - Proprietary license, Microsoft
edge-tts - GPL-3.0
elevenlabslib - MIT, © 2018 The Python Packaging Authority
FastAPI - MIT, © 2018 Sebastián Ramírez
numpy - BSD 3, © 2005-2023 NumPy Developers
openai - Apache-2.0
packaging - Apache/BSD, © Donald Stufft and individual contributors
pedalboard - GPL-3.0, © 2021-2023 Spotify AB
platformdirs - MIT, © 2010-202x plaformdirs developers
pydantic - MIT, © 2017 to present Pydantic Services Inc. and individual contributors
pydirectinput-rgx - MIT, © 2022 [email protected], 2020 Ben Johnson
pyinstaller - extended GPL 2.0, © 2010-2023 PyInstaller Development Team
PyYAML - MIT, © 2017-2021 Ingy döt Net, 2006-2016 Kirill Simonov
scipy - BSD 3, © 2001-2002 Enthought, Inc. 2003-2023, SciPy Developers
sounddevice - MIT, © 2015-2023 Matthias Geier
soundfile - BSD 3, © 2013 Bastian Bechtold
uvicorn - BSD 3, © 2017-presen, Encode OSS Ltd. All rights reserved.

Individual persons

This list will inevitably remain incomplete. If you miss your name here, please let us know in Discord or via Patreon.

Special thanks

JayMatthew aka SawPsyder, @teddybear082 and @Thaendril for outstanding moderation in Discord, constant feedback and valuable Core & Skill contributions
@lugia19 for developing and improving the amazing elevenlabslib.
Knebel who helped us kickstart Wingman AI by showing it on stream and grants us access to the StarHead API for Star Citizen.
@Zatecc from UEX Corp who supports our community developers and Wingmen with live trading data for Star Citizen using the UEX Corp API.

Commanders (Patreons)

To our greatest Patreon supporters we say: o7 Commanders!

Premium Donators (Patreons)

The Announcer
Weyland
Morthius
Grobi
Paradox
Gopalfreak aka Rockhound
Averus

Wingmen (Patreons)

Ira Robinson aka Serene/BlindDadDoes, Zenith, DiVille, [Hiwada], Hades aka Architeutes, Raziel317, CptToastey, NeyMR AKA EagleOne (Capt.Epic), a Bit Brutal, AlexeiX, Dragon Aura, Perry-x-Rhodan, DoublarThackery, SilentCid, Bytebool, Exaust A.K.A Nikoyevitch, Tycoon3000, N.T.G, Jolan97, Greywolfe, Dayel Ostraco aka BlakeSlate, Nielsjuh01, Manasy, Sierra-Noble, Simon says aka Asgard, JillyTheSnail, [Admiral-Chaos aka Darth-Igi], The Don, Tristan Import Error, Munkey the pirate, Norman Pham aka meMidgety, meenie, Tilawan, Mr. Moo42, Geekdomo, Jenpai, Blitz, Aaron Sadler, SleeperActual, parawho, HypeMunkey, Huniken, SuperTruck, [NozDog], Skipster [Skipster Actual], Fredek, Ruls-23, Dexonist, Captain Manga

wingman-ai's People

Contributors

Stargazers

Watchers

wingman-ai's Issues

Website timing out

The supposed website https://www.wingman-ai.com/ is timing out for me

Star-head not working for me. The other wingmen work fine.

Don't know if this is a temporary thing due to an api being down or something, but I get this response when trying to request anything from star-head.

Running from source as per my previous issue was adviced. Didn't work with the .exe either btw.

Björn

Exception in thread Thread-7 (run_async_process):
Traceback (most recent call last):
  File "C:\Users\bjorn\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "C:\Users\bjorn\AppData\Local\Programs\Python\Python310\lib\threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\bjorn\Documents\wingman-ai\main.py", line 91, in run_async_process
    loop.run_until_complete(
  File "C:\Users\bjorn\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 641, in run_until_complete
    return future.result()
  File "C:\Users\bjorn\Documents\wingman-ai\wingmen\wingman.py", line 165, in process
    process_result, instant_response = await self._get_response_for_transcript(
  File "C:\Users\bjorn\Documents\wingman-ai\wingmen\open_ai_wingman.py", line 154, in _get_response_for_transcript
    instant_response = await self._handle_tool_calls(tool_calls)
  File "C:\Users\bjorn\Documents\wingman-ai\wingmen\open_ai_wingman.py", line 272, in _handle_tool_calls
    ) = await self._execute_command_by_function_call(
  File "C:\Users\bjorn\Documents\wingman-ai\wingmen\star_head_wingman.py", line 102, in _execute_command_by_function_call
    function_response = self._get_best_trading_route(**function_args)
  File "C:\Users\bjorn\Documents\wingman-ai\wingmen\star_head_wingman.py", line 138, in _get_best_trading_route
    cargo, qd = self._get_ship_details(ship)
  File "C:\Users\bjorn\Documents\wingman-ai\wingmen\star_head_wingman.py", line 193, in _get_ship_details
    qd = next(
  File "C:\Users\bjorn\Documents\wingman-ai\wingmen\star_head_wingman.py", line 197, in <genexpr>
    for item in loadout.get("data")
AttributeError: 'NoneType' object has no attribute 'get'

Auto-update mechanism

The client already checks if there is an updated version available. It would be great if it could also self-update itself after a prompt.

Refactor WingmanBase classes

use events and pass more context-specific params and status infos
- simplify virtual method calling and overriding
- Give derived classes a way to use base functionality (trigger keypresses) but also do own stuff elegantly
write more documentation

WingmanAI does not set its working dir correctly when exe is launched from startmenu

When the WingmanAICore.exe is launched from start menu (not a shortcut, but the exe) it runs into errors trying to create the mic recording and folder:

Traceback (most recent call last):
  File "pynput\_util\__init__.py", line 228, in inner
  File "pynput\keyboard\_win32.py", line 290, in _process
  File "pynput\_util\__init__.py", line 144, in inner
  File "main.py", line 58, in on_release
  File "services\audio_recorder.py", line 49, in stop_recording
    os.makedirs("audio_output")
  File "<frozen os>", line 225, in makedirs
PermissionError: [WinError 5] Zugriff verweigert: 'audio_output'

Probably on start, it should just switch to the dir of the exe?
In my uexcorp wingman (because I also create files), I solved it with this code:
os.path.dirname(os.path.abspath(sys.argv[0]))
This gives me the WingmanAICore.exe path reliably.

So either adjust all creations of folders or switch dir on start?
Not sure about the best solution, but you will figure it out 🙂

Discord Bug report: https://discord.com/channels/1173573578604687360/1185572510075453471

Can't un-default renamed Wingman

Seems like you can't "unstar" a Wingman for Voice Activation if the Wingman is a renamed ex-default Wingman.

AudioPlayer "cannot reshape" error (again)

According to several users, the error can still occur but it doesn't do anything and Wingman just keeps running after clicking it away. As we are unable to find the root cause of this and it doesn't destroy anything, we should just try-catch-log the error and hide it from the user.

Implement Vision API

Implement and showcase the use of Vision API in Wingman AI.

Pair-Bug in remember_messages functionality

I noticed an increase in reports and messages with this error.
Invalid parameter: messages with role 'tool' must be a response to a preceeding message with 'tool_calls'.

So I decided to dive a bit deeper in the code. It seems like the origin is the remember_messages functionality.

It "thinks" in pairs - but this is not correct.
So this is what it suspects the data to look like:

{'role': 'user', 'content': '...'}
ChatCompletionMessage(content="...", role='assistant', function_call=None, tool_calls=None)

But if a request resulted in one or multiple functions beeing triggered, it looks like this:

{'role': 'user', 'content': '...'}
ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_pvoBrd5vbS5RgS4cMIkHZsKI', function=Function(arguments='{}', name='PowerShields'), type='function'), ChatCompletionMessageToolCall(id='call_HXZ2ka96gOohkPjN9JSqcGPM', function=Function(arguments='{}', name='PowerEngines'), type='function'), ChatCompletionMessageToolCall(id='call_Xu2l7sz7DX9stIZLpyumtKSH', function=Function(arguments='{}', name='PowerShip'), type='function'), ChatCompletionMessageToolCall(id='call_M1kGfaHCkMF951PeVUdDRoif', function=Function(arguments='{}', name='InitiateStartSequence'), type='function')])
{'role': 'tool', 'content': '', 'tool_call_id': 'call_pvoBrd5vbS5RgS4cMIkHZsKI', 'name': 'PowerShields'}
{'role': 'tool', 'content': '', 'tool_call_id': 'call_HXZ2ka96gOohkPjN9JSqcGPM', 'name': 'PowerEngines'}
{'role': 'tool', 'content': '', 'tool_call_id': 'call_Xu2l7sz7DX9stIZLpyumtKSH', 'name': 'PowerShip'}
{'role': 'tool', 'content': '', 'tool_call_id': 'call_M1kGfaHCkMF951PeVUdDRoif', 'name': 'InitiateStartSequence'}

And when in this example a 'role': 'user' element is deleted, a 'role': 'tool' element will be at the start at the history. And thats where openai says invlaid_request_error.

So it should rather count elements with 'role': 'user', than think in pairs.

So "_cleanup_conversation_history" method in open_ai_wingman.py needs a rework. And probably at every point in code, where the "pairs" are printed out, as its currently not correct.

Linked to Discord Bug report: https://discord.com/channels/1173573578604687360/1183085077022908456

Client: Command list does not scroll

Check keyboard localisation in pydirectinput

Quoted from Discord:

I have been unable to make any command using the num# keys work. Everything is correctly ordered and aligned, the AI seems to think it's working, but the keypress is not registered. Pressing manually works as expected.

My guess is that pydirectinput might not be aware of localisation and non-English keyboard layouts. This might not just affect num-keys.

Whitescreen after login in 2 times

logged in with google, chose my name in the app, then used the app it worked perfectly, then when I clicked on my account again, it showed my name again and well when I continued the app is just completely white now and wont do anything, I even reinstalled it.

ESC hotkey to stop playback

Quick and natural way to stop playback

Language setting for STT is not visible if voice activation is on

Implement Edge TTS Example

Default/logically deleted templates get recreated

It should work like that:

on start, iterate src templates as tpl and check if it's already in the config dir
also consider "_tpl" and ".tpl" <- that's the part that's broken
If you find any variant of it, skip copying else copy it
(later)
Iterate all the default _tpls (should only ever be one)
if for some reason, you find multiple, just take the first one (and log)

So _Star Citizen gets copied all the time because it doesn't "detect" that "Star Citizen" or ".Star Citizen" already exists. Then it finds 2 defaults: _Star Citizen and _Xul. It always takes SC because it's lexically the first one. You'll find a message about this in the logs.

Global elevenlabs voice config overrides Wingman elevenlabs config | Breaks app

If a global voice name is supplied for the elevenlabs config object, voice: Adam, and a later elevenlabs config object has a voice object specified voice: id: <id>, it breaks the app trying to merge an item with a string.

Tower.py - line 32. Not exact area but starting point

def __instantiate_wingmen(self) -> list[Wingman]:
...
            merged_config = self.__merge_configs(global_config, wingman_config) <- HERE
            class_config = merged_config.get("class")

            wingman = None

Local LLM support in Wingman: Currently more of a "community task" - here's why!

I'm not a fan of OpenAI and their products and am not willing to use or support them.
Please add support for other, by now also well established, APIs too.
Like Text Generation WebUI ( https://github.com/oobabooga/text-generation-webui ) and
KoboldCPP ( https://github.com/LostRuins/koboldcpp ).
Both of them are able to be run locally on a users System and great at what they do - depending on the users hardware specifications.

Text Generation WebUI recently even implemented an OpenAI compatible API, so it might be pretty easy to implement them.
It would be a blast to see Wingman-AI supporting it soon as I'm eager to start testing it.

Optimize response streaming for lower latency

Quoted from Discord user meenie:

To help with the pretty large delay, it would be ideal to stream the reply from GPT-4 into TTS and then start streaming the TTS audio as it's being generated. I haven't jumped into the codebase yet to verify. If you aren't doing that, I think I might be able to help.

We are already utilising streams but should check if we really do it everywhere we can. The biggest delay seems to come from TTS as there is a noticeable delay before the voice output starts. Double-check if we're doing this right.

Build a GUI to edit Wingman config

Editing a yaml file might be too much for some users. Build a GUI that allows you to see and change all relevant config entries visually.

Skills need an unload/on_skill_removed hook

Skills can launch threads etc. but if the user removes the skill, there is currently no way to stop these threads and clean up.

Validate Wingman on start and let them fail gracefully

Currently, the entire programm crashes/closes if there's a Wingman with a faulty config or missing required config entries.

Add a function to Wingman base that derived classes can override to validate required params (like xyz_api_key).
Tower should try/catch this function and disable a Wingman if the validation fails
Wingman status should be shown in the UI.

On dev wingman, not having remember messages in config.yaml throws error

INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
Recording started (nms)
Recording stopped (nms)
Hey, look at this. What do you think I should do here?
Exception in thread Thread-2 (run_async_process):
Traceback (most recent call last):
  File "C:\Users\Willi\anaconda3\envs\wingman\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "C:\Users\Willi\anaconda3\envs\wingman\Lib\threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Willi\Documents\GitHub\wingman-ai\wingman_core.py", line 163, in run_async_process
    loop.run_until_complete(
  File "C:\Users\Willi\anaconda3\envs\wingman\Lib\asyncio\base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\Willi\Documents\GitHub\wingman-ai\wingmen\wingman.py", line 208, in process
    process_result, instant_response = await self._get_response_for_transcript(
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Willi\Documents\GitHub\wingman-ai\wingmen\open_ai_wingman.py", line 213, in _get_response_for_transcript
    self._add_user_message(transcript)
  File "C:\Users\Willi\Documents\GitHub\wingman-ai\wingmen\open_ai_wingman.py", line 389, in _add_user_message
    self._cleanup_conversation_history()
  File "C:\Users\Willi\Documents\GitHub\wingman-ai\wingmen\open_ai_wingman.py", line 394, in _cleanup_conversation_history
    remember_messages = self.config.features.remember_messages - 1
                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

I'm not sure of the exact architecture here but I think either: (a) a default has to be set for config.features.remember_messages if the variable is not set by the user or openai-wingman.py at line 394 needs to check if config.features.remember_messages is None before doing anything, if so, return 0 before any other calculations.

add 11Labs

Wrap the 11Labs API as service and build a custom Wingman that utilized it

Implement API call example

GPT fallback for not recognized instant commands

If you create instant commands, and don't get the phrase 100% correct, it should fallback to gpt and send the normal request, hoping, that gpt understands and calles it.

Prompt for API_KEY if none exists and move it to dedicated config file

The OpenAI API key (and other provider-specific later) should not reside in our general config.yaml as people share their wingmen setups regularly and sometimes forget to use their keys. The key should reside in a dedicated config file instead.

Wingman should prompt new users on first run for the API key so that they can just paste it in the terminal instead of having to open the config file and paste it there.

Open link (client) to OS default browser

when clicking a link, it open the sandboxed browser and no way back, have to close and restart the client. Can we set to open to windows default browser?

Detect config changes

... and restart the Wingman process afterwards

Let users override OpenAI base_url in the config

We should let users override the base_url property of the OpenAI client

to be able to switch if if they're using a proxy
can hook plug-in local LLMs like Llama

Command actions row drag&drop does not work in Tauri

Action rows in the command config can be drag&dropped to reorder them but somehow this does not work in the Tauri/release version. Tauri is probably suppressing the drag&drop behavior implemented in the browser.

Add ShipBit or Wingman icon as AppIcon

Dat app needs some swag!

Add timer commands to `OpenAiWingman` class

Timer commands are like delayed functions. Instead of doing an action right now, the AI would respond now and start a timer, running for a certain amount of time, which triggers another command when finished.
This would allow the AI to "remind the user in 1h to take a break", or "turn the lights back on in 5 minutes".

It still would not solves "reminding the user to say happy birthday to mum, every year on her birthday", but it's a start.

Wrong language detected and wrong response language

When I speak in English, I get a Chinese response.

Provide releases using GitHub actions and sign the damn exe

We already have a GitHub action that bundles and builds releases using pyinstaller. There are a couple of things that need clarification:

OV or EV certificate to sign the Windows executable?
How to best add ffmpeg to the release packages? bash script with a curl command? Add to zip? Installer?
Can we provide symlinks inside of our release zip pointing to /_internal/config.yaml and /_internal/wingman?

If somebody knows more about these topics, please contact me. Thank you!

Avast identifies virus

Avast seems to think there is a virus in WingmanAI.exe:

Edge TTS language detection does not work with pure azure configurations

The automated edge_tts language detection does not work with azure-only configurated wingmen, as it ignores the azure configuration and always uses direct OpenAI calls.

For example comparing __ask_gpt_for_locale and _summarize_function_calls methods in open_ai_wingman.py shows the missing azure "overwrite".

Discord bug report: https://discord.com/channels/1173573578604687360/1188640947358146600

Push to talk not work in Game

When I'm in the game and press the Push to talk buttons, the Wingman AI Tool doesn't respond

Can't get a .Vue or a .sh file to work with Wingman

Seems to be that .vue(Vue 2 Framework) or .sh(Linux bash) files do not work with Wingman. The script is just running over them, and not doing a thing. Is this something that has not been built into your system to work with these files, or have we broken it? We are running it in VSCode

Instant responses for non-instant commands not working properly

Follow up on this discord thread: https://discord.com/channels/1173573578604687360/1192112862030729246

The _get_response_for_transcript returns None, instant_response if responses are given in the command definition but the command isn't an instant command.

File: open_ai_wingman.py
Method: _get_response_for_transcript

if tool_calls:
   instant_response = await self._handle_tool_calls(tool_calls)
   if instant_response:
       return None, instant_response

This code snippet also calls _handle_tool_calls which calls _execute_command_by_function_call and in which the instant response is already played back to the user with the _play_to_user function.

The problem is now, that the return values (None, instant_response) of our method above (_get_response_for_transcript) are used in the process function.

File: wingman.py
Method: process

process_result, instant_response = await self._get_response_for_transcript(
    transcript, locale
)

and

await self._play_to_user(str(process_result))

So regardless of the instant response already getting played back to the user, it tries to also play back a None value to the user.
Which:

stops the current playback
plays "None"

Possible solution: Add an if-block to not play back None values.

if process_result:
    await self._play_to_user(str(process_result))

Check unreadable chars in ai context

config_eel.txt

Uploaded config as a txt since github doesnt support yaml

This is the file in question. You will notice that the context for the board-computer is fine, but it throws the error seen above. I rewrote the context and it worked.

For situational context: This is a config someone from the discord sent me to test/fix. So the file came from him. He stated to be using a regular keyboard, so we don't know what character could have been an issue.

Adding more properties to the apikeys.yaml seems to throw an exception in main.py

Adding more properties to the apikeys.yaml seems to throw an exception in main.py and exit the program.

# line 97 main.py
apikeys = get_or_create_api_keys()
for key, value in apikeys.items():
    config[key]["api_key"] = value.get("api_key")

We are adding another apikey for uexcorp in the apikeys.yaml. Does config[key]["api_key"] need the key to be set before writing to it? For this case config[ "uex-corp" ]["api_key"]. I assume not because it should just make a new key on its own, but im not sure how python handles that.

Make GPT model configurable

The GPT model used for conversational calls should be configurable/overridable in the config. gpt-4 as default is fine but people might want to fall back to cheaper models or add new ones when they're released.

The relevant code for this is in OpenAiWingman and the config setting should be placed unter openai in the general config section.