We need a separate tab or view that the players can see. Ideally, it's a fullscreen gallery that is updated live as images are generated. It also shouldn't be so intrusive that players can't easily switch tabs to Owlbear Rodeo or Roll20.
Every time you prompt ChatGPT through the API, it has no memory of previous prompts. This sucks. We can get around this by creating "memories" with structured directories and text files. For example, if we want a feature that lets you have dialogue with a specific character in your campaign, we can keep a main summary text file that lists notable information and events in chronological order. Next to each event, there will be a reference to a "memory" text file that expands on that individual event, sort of like a foreign key in SQL. It might make sense to use SQLite in the future, but I'm keeping it basic for the PoC.
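Here is a minimal sketch of how the summary file and its memory references could be read. The `[memory: path]` marker, the file names, and the function names are all assumptions made up for illustration; the PoC may settle on a different layout.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed (hypothetical) summary line format:
#   "<event text> [memory: <path relative to the memory root>]"
MEMORY_REF = re.compile(r"\[memory:\s*([^\]]+)\]")


def load_summary(summary_path: Path) -> List[Tuple[str, Optional[str]]]:
    """Return (event, memory_reference_or_None) pairs in file order."""
    events = []
    for line in summary_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        match = MEMORY_REF.search(line)
        ref = match.group(1).strip() if match else None
        event = MEMORY_REF.sub("", line).strip()
        events.append((event, ref))
    return events


def expand_memory(root: Path, ref: str) -> str:
    """Follow a reference to the memory file that expands on one event."""
    return (root / ref).read_text(encoding="utf-8")
```

The reference plays the role of the foreign key: the summary stays short enough to scan in one prompt, and a memory file is only read when its event turns out to be relevant.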
User interacts with dialogue interface -> an engineered prompt plus the user input figures out who the player is trying to talk to and queries the memories for references to that character -> ideally it finds the character and scans the summary for mentions of them -> any additional granular information in the separate memory files is pulled in -> one or more prompts are automatically run in the background to summarize the relevant information the final prompt needs alongside the player dialogue -> an in-character reply is sent back
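The flow above can be sketched roughly as follows. This is only an outline under assumptions: `ask` is a hypothetical stand-in for whatever function sends one prompt to the API and returns the text, the prompt wording is placeholder text, and summary lines are assumed to reference memories with a `[key]` marker.

```python
from typing import Callable, Dict, List

# Hypothetical: sends a single prompt and returns the model's reply text.
Ask = Callable[[str], str]


def dialogue_cycle(
    user_input: str,
    summary: List[str],
    memories: Dict[str, str],
    ask: Ask,
) -> str:
    # Step 1: figure out who the player is trying to talk to.
    character = ask(
        "Name the character the player is addressing. "
        "Reply with the name only.\n\nPlayer: " + user_input
    ).strip()

    # Step 2: scan the summary for lines that mention that character.
    relevant = [
        line for line in summary
        if character and character.lower() in line.lower()
    ]

    # Step 3: pull in the granular memories those lines reference.
    details = [
        text for key, text in memories.items()
        if any(f"[{key}]" in line for line in relevant)
    ]

    # Step 4: condense everything into notes for the final prompt.
    notes = ask(
        "Summarize what matters for the reply:\n"
        + "\n".join(relevant + details)
    )

    # Step 5: ask for the in-character reply.
    return ask(
        f"You are {character}. Background: {notes}\n"
        f"Player says: {user_input}\nReply in character."
    )
```

Even this stripped-down version makes three API calls per player message, and the summarizing step may need to run several times if the memories are long.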
This will require a lot of prompts for each "cycle" of information retrieval. We also need to keep every prompt below 3000 words so we don't go past the API's max token amount of 4096.
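A rough guard for the word budget could look like the sketch below. The 3000-word figure comes from the note above (at a rule of thumb of about 0.75 words per token, 4096 tokens is roughly 3072 words); counting whitespace-separated words is only an approximation of real token counts, and the function names are made up for illustration. If the token limit also has to cover the model's reply, the budget should be set lower.

```python
MAX_WORDS = 3000  # budget from the notes; an approximation, not a token count


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def fits_budget(prompt: str, max_words: int = MAX_WORDS) -> bool:
    """True if the prompt stays strictly below the word budget."""
    return word_count(prompt) < max_words


def trim_to_budget(prompt: str, max_words: int = MAX_WORDS) -> str:
    """Drop trailing words until the prompt is below the budget.

    Note: this collapses runs of whitespace into single spaces.
    """
    words = prompt.split()
    return " ".join(words[: max_words - 1])
```

Trimming from the end is the simplest choice but not necessarily the right one; for dialogue it may be better to summarize older material instead of cutting it off.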
Ideally, users aren't constantly switching in and out of our tab. We want them to be able to generate the prompt, generate the image, and view it all in the same place.
The idea is that there's a separate output box and button for a natural language description of whatever the input is. For example, you could say "an opened treasure chest". The output would describe the way the chest looked and the gold, jewels, and other treasures inside it, as if you were reading it in a book.
This will require an entirely different prompt to ChatGPT than our SD prompt generator. It may be best to have it pretend to be a famous fantasy author.
We would then have another button that feeds that natural language description into our original Stable Diffusion prompt generator for a (hopefully) even better image prompt.
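The two buttons could map onto two chained calls, sketched here. As before, `ask` is a hypothetical stand-in for the function that sends one prompt to ChatGPT, and both prompt templates are placeholder wording, not our actual engineered prompts.

```python
from typing import Callable

# Hypothetical: sends a single prompt and returns the model's reply text.
Ask = Callable[[str], str]


def describe(subject: str, ask: Ask) -> str:
    """First button: book-style description of the input."""
    return ask(
        "You are a famous fantasy author. Describe the following "
        "as it would appear in a novel:\n" + subject
    )


def to_sd_prompt(description: str, ask: Ask) -> str:
    """Second button: feed the description to the SD prompt generator."""
    return ask(
        "Turn this description into a comma-separated "
        "Stable Diffusion prompt:\n" + description
    )
```

Keeping the two steps as separate functions means the description can be shown in its own output box before the user decides whether to turn it into an image prompt.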
If you want your images to have a consistent style, you may want to have "photorealistic" or "anime" as positive tags that are added to the beginning of the generated prompt. The same goes for negative tags, which steer the image away from certain styles or concepts.
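As a small sketch of how that could work, the helper below prepends the positive tags to the generated prompt and joins the negative tags into a separate negative prompt. The function name and the comma-separated format are assumptions for illustration.

```python
from typing import Sequence, Tuple


def apply_style(
    prompt: str,
    positive_tags: Sequence[str] = (),
    negative_tags: Sequence[str] = (),
) -> Tuple[str, str]:
    """Return (positive_prompt, negative_prompt) with style tags applied."""
    # Positive tags go at the beginning of the generated prompt.
    positive = ", ".join([*positive_tags, prompt])
    # Negative tags form their own comma-separated prompt.
    negative = ", ".join(negative_tags)
    return positive, negative


positive, negative = apply_style(
    "an opened treasure chest",
    positive_tags=["photorealistic"],
    negative_tags=["anime", "cartoon"],
)
```

Storing the tag lists once per campaign would keep every image in the session on the same style without the user retyping them.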