karashiiro / texttotalk

Chat TTS plugin for Dalamud. Has support for triggers/exclusions, several TTS providers, and more!

License: MIT License

C# 93.58% PLSQL 6.42%

ffxiv tts triggers websocket polly-voice dalamud-plugin uberduck-ai microsoft-cognitive-services

texttotalk's Introduction

TextToTalk

Chat TTS plugin for Dalamud. Has support for triggers/exclusions, several TTS providers, and more!

Commands

/tttconfig: Opens the configuration window.
/canceltts: Cancel all queued TTS messages.
/toggletts: Turns TTS on or off.
/disabletts: Turns TTS off.
/enabletts: Turns TTS on.

Lexicons

TextToTalk supports custom lexicons to modify how words are pronounced. For more information, please join our community lexicons discussion.

Direct links to information will be added here eventually.

Supported TTS providers

System (Windows)
AWS Polly
Azure (Microsoft Cognitive Services)
Uberduck
WebSocket

WebSocket interfacing

TextToTalk can optionally open a WebSocket server over which it serves messages. There are currently two JSON-format message types (see IpcMessage). The // comments in the examples below are explanatory only:

TTS prompt:

{
  "Type": "Say",
  "Payload": "Firstname Lastname says something",
  // Will replace the logged-in player's name with {{FULL_NAME}}, {{FIRST_NAME}}, or {{LAST_NAME}} as appropriate.
  // Does not currently apply to players other than the logged-in player.
  "PayloadTemplate": "{{FULL_NAME}} says something",
  "Voice": {
    "Name": "Gender"
  },
  "Speaker": "Firstname Lastname",
  // or "AddonTalk", or "AddonBattleTalk"
  "Source": "Chat",
  "StuttersRemoved": false,
  // or null, for non-NPCs
  "NpcId": 1000115,
  // Refer to https://dalamud.dev/api/Dalamud.Game.Text/Enums/XivChatType
  "ChatType": 10,
  // Refer to https://dalamud.dev/api/Dalamud/Enums/ClientLanguage
  "Language": "English"
}
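
A client that wants to personalize a message itself could fill in the placeholders of PayloadTemplate. The following is a minimal Python sketch, not code from the plugin: the helper name `render_template` is hypothetical, and it assumes a two-part "Firstname Lastname" character name.

```python
def render_template(template: str, full_name: str) -> str:
    """Fill {{FULL_NAME}}, {{FIRST_NAME}} and {{LAST_NAME}} in a PayloadTemplate.

    Hypothetical helper; assumes the name has the form "Firstname Lastname".
    """
    first, _, last = full_name.partition(" ")
    replacements = {
        "{{FULL_NAME}}": full_name,
        "{{FIRST_NAME}}": first,
        "{{LAST_NAME}}": last,
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    return template


print(render_template("{{FULL_NAME}} says something", "Firstname Lastname"))
# prints: Firstname Lastname says something
```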

TTS cancel:

{
  "Type": "Cancel",
  "Payload": "",
  "PayloadTemplate": "",
  "Voice": null,
  "Speaker": null,
  // or "Chat", "AddonTalk", or "AddonBattleTalk"
  "Source": "None",
  "StuttersRemoved": false,
  "NpcId": null,
  "ChatType": null,
  "Language": null
}
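
On the receiving side, a client mostly needs to branch on the Type field. The sketch below shows one way to do that in Python using only the standard library. It is an illustration under assumptions, not the plugin's own client: the function name `handle_message` is hypothetical, the WebSocket transport is left out entirely, and the messages are assumed to arrive as plain JSON without the // comments shown above.

```python
import json
from typing import Optional


def handle_message(raw: str) -> Optional[str]:
    """Return the text to speak for a "Say" message, or None for "Cancel".

    Hypothetical helper; assumes comment-free JSON with the fields shown above.
    """
    message = json.loads(raw)
    kind = message.get("Type")
    if kind == "Say":
        return message.get("Payload", "")
    if kind == "Cancel":
        return None
    raise ValueError(f"unknown message type: {kind!r}")


say = '{"Type": "Say", "Payload": "Firstname Lastname says something", "Source": "Chat"}'
cancel = '{"Type": "Cancel", "Payload": "", "Source": "None"}'
print(handle_message(say))     # prints: Firstname Lastname says something
print(handle_message(cancel))  # prints: None
```

A real client would call a function like this from whatever WebSocket library it uses, passing each received text frame as `raw`.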

Screenshots

Development

Refer to the wiki for dev documentation.

texttotalk's Issues

Don't say [Name says] on contiguous messages after the first

Stops working after logged out to the main screen. (Test version)

Title.

Getting logged out by the reintroduced anti-AFK timer means you have to restart the game or disable and re-enable the plugin to get TTS back.

Separate voices for genders/characters

Some sort of system that allows different voices to be saved and assigned to different genders, or maybe even characters.

Websocket Server - Connection problems and static port

Tried using python and keep getting a 501 response so I tried using WebsocketSharp and I'm getting a "|Fatal|WebSocket.connect:0|WebSocketSharp.WebSocketException: Not a WebSocket handshake response" on connection.

Plan is to use Coqui TTS as a back-end.

Do you have a sample client that you used for testing?

IBM Watson Text To Speech API support

IBM Watson has an extremely natural-sounding TTS service that I use daily. It would be nice to have IBM Watson capability in the TextToTalk plugin.

Watson's TTS API Docs can be found here: https://cloud.ibm.com/apidocs/text-to-speech

Error generating speech for dialogue with <sigh>

https://i.imgur.com/QVQ2Q0A.png
https://i.imgur.com/nmEr2JK.png

Remove "x says:" from NPC dialogue

At least as an option, this seems like a good accessibility customization for quests without voice acting.

Clear TTS queue when disabling TTS through the keybind or chat commands

Clear TTS queue when disabling TTS through keybind toggle or the chat commands /toggletts and /disabletts

Amazon Polly crash

Hello, first of all I would like to thank you for this plugin; it's absolutely amazing. I seem to be having issues with Amazon Polly text to speech: it crashes after I try to enable Polly. I tried reinstalling the addon, and as soon as I try to switch to Polly it just crashes.

Ungendered voice selected isn't used

I'm having an issue where the ungendered voice I selected isn't properly registered and (I'm assuming) the default US English speaker voice is used instead.

See this vid for example. For some reason, the game treats the report on the table as a male speaker, but that's probably how they assigned gender in the game? Not that I have any issue with that. Anyway, you can see around 0:44 where an unknown speaker enters the scene and, even though I selected Takumi, a JP voice, for ungendered in the config, an English voice was played.

TTT sometimes stops working and game freezes

Hello,

I've uninstalled the TTT plugin and removed the settings, but I still have the problem.
Sometimes, after one sentence is read successfully, the TTT plugin stops working.
Then, if I try to disable it or exit the game, the game freezes.

I never had the problem before, but I've switched to Windows 11 recently, on a new computer.
I'm using the standard Windows voices.
It seems to happen when I alt-tab on Chrome.

EDIT:
When I type /xldebug, open the log window in verbose mode, and then launch a dialog, I get the following message:
"Unhandled SetStringChunkType: 16" a lot of times, then the same message with 29, 19, 41, 19, etc.

EDIT 2:
after uninstalling all plugins and installing TTT fresh, it seems to be more stable now. So I suppose that there was a conflict with another plugin or setting.

Support for Additional Voices

Version: 1909 Windows 10

I'm wondering if this plugin pulls from the available voices in control panel on Windows 10 or is limited to Zira / David when using Canadian English (EN) as the default language. In my control panel I have a couple extra voices available that were installed from a third party but they don't show up in the addon

Amazon Polly settings gone

I copied over the wrong user key, and after it failed to authenticate I was unable to fix it. All Polly settings seem to have gone away. I have manually deleted the config files and completely reinstalled Dalamud, but haven't been able to get the Polly settings to show up again.

Often, but not always repeating

This seems to be happening more and more often, maybe one in three it will repeat the text it recently spoke. It will usually start from the beginning of the dialog tree as opposed to where I currently am.

I also don't have that translating plugin installed, which I think I saw mentioned before.

Disable "Character Name says:" completely

Perhaps this is a rare use case, but I would really appreciate if there was an option to remove "Character Name says:" completely, even for the first time a character speaks.

NPCs with singular lines won't trigger TTS if you speak to them multiple times in a row

Community lexicons

Migrated to #60 - please continue there!

I don't use lexicons myself, so I don't have one I'm maintaining, but if anyone else has lexicons they're willing to share I'd appreciate it if they could drop a link so I can provide them to anyone who wants them and doesn't know how to make them themselves. Alternatively, feel free to post them in the #preset-sharing channel in the goat place Discord, and I'll relink them somewhere here.

TTS reads out #H before colored text in NPC Dialogue boxes

Long "—" used in FFXIV dialogue leads to individual pronunciation of characters.

As the title suggests, the "—" character, also known as an em dash (at least I think that's what is used), makes TTS read each character individually rather than reading the words in front of and after the "—". So something like

behold Great King moogle — first of his name etc. sounds really silly.

Would there be any way to make "—" be interpreted as "-" instead? "-" is treated as a normal dash and doesn't cause words to be read as individual characters.
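
The requested substitution amounts to a single text replacement before the string reaches the TTS backend. Here is a minimal Python sketch of that idea; the function name `normalize_dashes` is hypothetical, and whether the result is pronounced better depends on the backend in use.

```python
def normalize_dashes(text: str) -> str:
    """Replace em dashes (U+2014) with an ASCII hyphen before synthesis."""
    return text.replace("\u2014", "-")


print(normalize_dashes("Great King Moogle \u2014 first of his name"))
# prints: Great King Moogle - first of his name
```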

Configuration not saving

Every time I log out and back in, I have to go into TextToTalk and reselect the channels I want. Uninstalling the plugin and reinstalling gives the same behavior and everything gets set to defaults.

(PLS) Longer graphemes do not take precedence

This may be related to #46, but it seemed different enough to open a new issue.

When matching entries in a lexicon file (for the System backend, at least), TTT appears to give priority to shorter grapheme matches. This prevents longer matches from working entirely.

E.g., given two entries, one with <grapheme>Ixal</grapheme> and the other with <grapheme>Ixali</grapheme>, when given the string "Ixali", TTT matches the first and ignores the second. The backend then sees the result as Ixal and i, and pronounces it as ['ɪk.sɑːl 'aɪ].

This is true regardless of the order of lexemes in the lexicon file. It seems like LexiconManager sorts the entries and then applies them in the order of shortest to longest.

FWIW, this seems to run counter to a couple guidelines in the PLS specification at https://www.w3.org/TR/pronunciation-lexicon/#AppC:

Precedence should be given to the retrieval of lexemes having a <grapheme> element whose content exactly matches the longest possible sequence of consecutive tokens. Thus, a lexeme for "they'll" should have precedence over a lexeme for "they" given the input "they'll".

Lexical retrieval should be performed by the bias of tokens rather than characters. Thus, a lexeme for "do" should not match the beginning of "done".

The current implementation in LexiconManager doesn't appear to bother with any tokenization at the moment, so that might be worth pursuing. How exactly tokenization is typically implemented for speech synthesis is a bit beyond my depth, though.
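
To make the two quoted guidelines concrete, here is a minimal Python sketch of longest-match-first, whole-token replacement. It is not how LexiconManager works and not a full PLS implementation: the function name `apply_lexicon` is hypothetical, "token" is approximated as a run bounded by non-word characters, and the replacement strings are placeholders rather than real phonemes.

```python
import re


def apply_lexicon(text: str, lexicon: dict) -> str:
    """Replace whole-token graphemes, preferring the longest grapheme.

    Hypothetical helper; token boundaries are approximated with \\w lookarounds.
    """
    if not lexicon:
        return text
    # Longest graphemes first, so "they'll" is tried before "they".
    graphemes = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(g) for g in graphemes)
    # The lookarounds stop "do" from matching the start of "done".
    matcher = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)")
    return matcher.sub(lambda m: lexicon[m.group(0)], text)


lexicon = {
    "Ixal": "IK-sahl",        # placeholder output, not IPA
    "Ixali": "ik-SAH-lee",
    "they": "THEY",
    "they'll": "THEYLL",
    "do": "DOO",
}
print(apply_lexicon("The Ixali and the Ixal", lexicon))
# prints: The ik-SAH-lee and the IK-sahl
```

With this ordering, "Ixali" is never split into "Ixal" plus "i", which is the behavior the alias workaround below tries to emulate.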

Either way, I've found a workaround with the current version. Because aliases are applied before phonemes, you can use an alias to replace the longer grapheme with a string that doesn't match the shorter one, and then create a separate lexeme that matches that alias to the correct phoneme, like so:

  <lexeme>
    <grapheme>Ixali</grapheme>
    <grapheme>ixali</grapheme>
    <alias>I_x_a_l_i</alias>
  </lexeme>
  <lexeme>
    <grapheme>I_x_a_l_i</grapheme>
    <phoneme>ɪkˈsɑːli</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Ixal</grapheme>
    <grapheme>ixal</grapheme>
    <phoneme>ɪksɑːl</phoneme>
  </lexeme>

This is super hacky, though.

Option to speak current dialogue bubble immediately upon TTS enable

This is a bit of a nitpick, but I love this plugin so much not to give more feedback.

TextToTalk used to speak the current NPC dialogue that is currently on the screen when I use the keyboard shortcut to enable TTS. This functionality, unfortunately, got removed in this commit.

Was it sometimes annoying hearing TTS NPC dialogue from an old cutscene I had 15 minutes ago? Sure, I guess, but to me, it's more disappointing that, if I forget to enable TTS before an NPC conversation, then I have to manually read the first line of dialogue before TTS kicks in for the next line of dialogue.

Furthermore, even if I never forget to enable TTS before NPC dialogue, I don't know which NPC dialogue will be voice acted before it starts, so, before this commit, I had previously disabled TTS just in case there was voice acting, and if not, then I'd enable it with the shortcut.

However, if there is imminent functionality to automatically disable TTS during voiced cutscenes as described in this issue, my issue described here is more or less a non-issue, as I would have no reason to ever manually disable TTS.

More German Voices for Amazon Polly

At the moment the only German Voice in the option list for Amazon Polly is "Vicki".

According to this site there are two more German voices "Hans" and "Marlene". Could you please enable those in the option list as well?

I took a look into the AWSSDK.Polly.dll binary that you use and the strings "Hans" and "Marlene" can be found inside, so I hope it is just a change in how the TTS plugin uses the Amazon Polly API.

Nested Replacements (w priority system) for lexicon

Would allow for priority for replacements, allowing the name Y'shtola to have priority over 's

Tbh this isn't that important since the main usage of the lexicons being used is names, not recreating a whole language, so the only reason this would be helpful is for 's's after a name

Incompatibility with Chat Translator. Repeats text twice

Text is repeated twice on channels with Chat Translator enabled

'TTS disabled' message seems to be broken

I'm not entirely certain if it's just for me, but the 'TTS disabled' echo message seems to not be working right now.

Option to disable during voiced cutscenes

See title. For this to happen we need some list of voiced cutscenes and a method of detection.

Voice Unlocker Error

Error when trying to use the built in Voice unlocker, when clicking both Manual tutorial and also Enable all system voices on version 1.9.4 and 1.9.7. Other versions not tested.

There is nothing showing an issue in output or dalamud logs

Assign different voices/voice presets to different names in chat

See title, this needs a preset system first, and then a configurable mapping of name->preset.

Addon Crashes Launcher

If the addon is loaded the game crashes; at least 15 more people are reporting that issue on the XIVLauncher Discord.

1.8.0.1 Bugs

-Deleting plugin in plugin installer runs into an error

/echo chat doesn't trigger TTS when it's supposed to.

.NET SpeechSynthesizer bugs

-with custom lexicon, sometimes voice gets stuck on one voice, and changing the voice crashes the game.

one time this happened the game kept crashing until I deleted my config. (voice didn't get stuck after that) - Haven't Reproduced the game crashing, but sometimes the voice still does get stuck even when changing it. I think it might have something to do with lexicons and gendered voice presets not playing nice. (can force the voice I want by enabling gendered presets and putting them all on the preset I'm currently using)

Listed are issues that are fixed by deleting and reinstalling the plugin, or re-starting the game - Haven't had time to see if these have been fixed in 1.8.4.0 :

with a working custom lexicon selected after some time, TTS can't read anything out loud at all anymore.
custom lexicon sometimes stops being used and uses default pronunciation.

FIXED ~~-when using <phoneme> in the lexicon.xml TTS can't read anything out loud at all.~~ Phonemes are working with some bugs (not tested for all the bugs yet)

FIXED ~~-also seems like European voices don't use the lexicon pronunciation when xml:lang="en", only when it's set to xml:lang="en-GB"~~

FIXED ~~- first time selecting a lexicon file, pronunciation isn't used.~~

FIXED ~~- deleting and re-selecting lexicons to use updated version of same lexicon.xml file won't use new pronunciation.~~

text with <3 and >/////< basically special characters is not read

It used to work. Now it skips these messages entirely and doesn't read them at all, and I'm missing out on a lot while RPing. Is it possible to have it read special characters again, at least for the <3?

All Speech is interrupted by new dialogue box text and chat box messages when "Cancel the current speech when new text is available or text is advanced" is enabled.

The following bug occurs when "Cancel the current speech when new text is available or text is advanced" is enabled.

All text from dialogue boxes and triggered chat messages interrupt the queue and cancels current speech. Newest triggered chat box text or dialogue text is played.

Ideally this feature should only cancel dialogue text when new dialogue text is available or advanced.

A possible solution to this is to label dialogue text in the queue as dialogue, and when new dialogue text is available or text is advanced, clear the text labeled, "dialogue". Then, move triggered chat messages up the queue so they aren't lost.
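
The labeling idea can be sketched as a queue whose entries remember where they came from, so that cancelling dialogue leaves chat messages in place. This is an illustrative Python sketch only: the class and method names are hypothetical, the source labels are borrowed from the Source field of the WebSocket messages, and speech that is already playing is not modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class QueuedSpeech:
    source: str  # e.g. "Chat", "AddonTalk", or "AddonBattleTalk"
    text: str


class SpeechQueue:
    """Hypothetical queue that can cancel one source without touching the others."""

    def __init__(self) -> None:
        self._items = deque()

    def enqueue(self, source: str, text: str) -> None:
        self._items.append(QueuedSpeech(source, text))

    def cancel_source(self, source: str) -> None:
        # Keep every item from other sources, preserving their order.
        self._items = deque(i for i in self._items if i.source != source)

    def pending(self) -> list:
        return [item.text for item in self._items]


queue = SpeechQueue()
queue.enqueue("AddonTalk", "dialogue line 1")
queue.enqueue("Chat", "hello from chat")
queue.enqueue("AddonTalk", "dialogue line 2")
queue.cancel_source("AddonTalk")      # the dialogue box was advanced
queue.enqueue("AddonTalk", "dialogue line 3")
print(queue.pending())
# prints: ['hello from chat', 'dialogue line 3']
```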

I think people probably only want dialogue messages to interrupt dialogue messages, and everything should not interrupt each other.

Here is a recording using /echo and a dialogue box as an example of chat messages interrupting dialogue, dialogue interrupting chat messages, and chat messages interrupting chat messages.

https://streamable.com/8rpyoh

Feel free to reach out if you need any clarification! this one was a bit wordy because it was hard to explain 😅

Stop reading current NPC dialogue box when dialogue box is advanced or closed

It would be great if TextToTalk would cancel reading the previous dialogue box when advancing to the next dialogue, and also when a text box is closed when you're done talking to an NPC. video example, clicking text box closed once I've finished reading it, it continues, and when a new textbox opens it has to wait for the previous voice to finish: https://streamable.com/saeger

Similarly, when toggling TextToTalk to disabled, it would be a nice addition to clear the TTS queue and cancel the current TTS! example : when disabling TTS, the voice continues https://streamable.com/8d72l1

It can be annoying when skipping through a bunch of unimportant text from a NPC, that the text drones on in the background which needs to be waited through, or manually cancelled.

(Also side note: Thank you so much for adding the Dialogue box reading! It's been making the non-voiced cutscenes and their accompanying missions so much easier to follow, and they feel more important! Thanks for making such a great plugin!)

enabling TTS sometimes reads an old dialogue box that isn't open anymore.

The TTS will talk as soon as something pops up on screen, even if it means talking over itself.

Amazon Polly's Neural Engine voice options not listed...at first

Amazon Polly's Neural Engine voice options are not listed, and when I tried "standard" engine and picked voices, and then later switched back to the "neural" engine, the default neural engine voice that had worked before no longer works, and I'm still stuck with whichever "standard" engine voices that I had picked.

Gendered speakers being read at same time

If you advance through multiple dialog boxes that included different gendered speakers (even with gendered speakers option off), the dialog for the different speakers will be read at the same time.

Read current text when Enabling TTS while dialogue box is open, don't read current text when enabling TTS while dialogue box is closed.

In the XLDev Addon Inspector it is possible to see if the dialogue box is visible or not, so it should be possible to detect if the dialogue box is open.
I think that if the dialogue window is open, enabling TTS through the keybind shortcut or chat commands should start reading the currently open dialogue box.
If the dialogue window is not open when enabling the TTS through the keybind or chat command, it should not read the current dialogue.

I hope that all makes sense! feel free to reach out for clarification if I said it weird!
Still really enjoying this plugin! it's been great for the Hildebrand missions!

User defined profiles with each their own hotkey?

Normally, I'd prefer to hear TTS for unvoiced NPC dialogue and maybe status effect changes on myself, but sometimes, when I'm just hanging out in a city chatting occasionally, I'd like to hear TTS from a number of additional sources. I'm not always paying close attention to the chatbox.

I think it would be great as a user, to be able to create separate profiles with different TTS settings, each having its own hotkey to switch to it.

As it is now, the hotkey is just a simple toggle, so if each profile's hotkey also worked as a toggle, it would probably need to disable any other previously active profile.

I do already enjoy FFXIV a lot more with this plugin as is. Thank you for creating it!

Localization

There should be it

Amazon Polly Voices no longer work with current release

They used to work, but after this update they no longer work.

TTS repeated twice in most dialogues

I am using testing version 1.8.0.4 and many of the dialogues are read twice. I am using the Amazon Polly Neural Engine, and it's only free for 1 million characters; I'm not sure whether repeated speech is counted or not.

Add user-configurable pronunciation lexicons

https://docs.microsoft.com/en-us/dotnet/api/system.speech.synthesis.speechsynthesizer.addlexicon?view=netframework-4.8

Needs to generate a lexicon from user configuration, write it to a temporary file? and then load it dynamically.
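
The generate-and-write step could look roughly like the following. This is a Python sketch of the idea rather than the plugin's C# implementation: the function name `write_lexicon` and the shape of the `entries` mapping are assumptions, and loading the resulting file into the synthesizer (the AddLexicon call linked above) is not shown.

```python
import tempfile
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Pronunciation Lexicon Specification 1.0.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def write_lexicon(entries: dict, lang: str = "en-US") -> str:
    """Write grapheme -> IPA phoneme pairs to a temporary PLS file.

    Hypothetical helper; returns the path of the file it created.
    """
    ET.register_namespace("", PLS_NS)
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, phoneme in entries.items():
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    # delete=False so the file survives until the synthesizer has loaded it.
    with tempfile.NamedTemporaryFile("wb", suffix=".pls", delete=False) as handle:
        ET.ElementTree(root).write(handle, encoding="utf-8", xml_declaration=True)
        return handle.name


path = write_lexicon({"Ixali": "ɪkˈsɑːli"})
print(path)
```

The caller is responsible for deleting the temporary file once it is no longer needed.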

Amazon Polly Voices Rate Not Working

Amazon Polly Voices work, but changing the rate in the slider doesn't affect the voice speed for them. Is it possible to have rate supported for third party 64-bit voices?

Configurable playback speeds for Polly backend

See title, this isn't easily accomplished in NAudio on its own. I'll probably use Soundtouch for this.

Read quest text.

I have an idea for an accessibility feature where this addon could read from the quest text window directly for the non voice acted quests.
Right now, there is an option to enable npc dialogue, however the text is printed to chat after skipping to next line making it hard to follow.
A hotkey could be added and it would read on press so it doesn't automatically read everything.
I found where the text is stored if that is of any help.

Awkward pronunciation of some abbreviations with Amazon Polly

I'm using the Amazon Polly voices, which are great, but sometimes it extrapolates abbreviations like "nin" to mean "Nine Inch Nails" or "res" to mean "residential", and it throws me for a loop every time I hear it. It would be pretty sweet to be able to disable that and/or be able to set our own custom pronunciations for words.

I don't know how the backend of this is set up, but assuming it's an Amazon Polly issue, then this issue isn't really a bug with this plugin but more of a feature request to be able to better utilize Amazon Polly.

I'll still love and use this plugin regardless.

TextToTalk Error

After installing the plugin and attempting to open the configuration window by clicking on the config button, I get this error popup:
https://i.imgur.com/F4T0atz.png

I tried restarting the game but didn't work. Please advise.