sandbox-conversant-lib's People

Contributors

dependabot[bot], eddotman, egrefen, harish-garg, irombie, kyliehe616, lusmoura, madelinehjenkins, marcodel13, marinamachado, neilatcohere, yichern

sandbox-conversant-lib's Issues

Add support for session management in chat histories

Right now, chatbots have a single chat history object for 1-on-1 dialogue, which doesn't lead to any obvious pathway for managing multiple sessions.

Giving a bot access to multiple chat histories (e.g., by keying the dictionary on a session ID first) would let a single bot instance manage multiple chats when exposed through some interface.

(This happens to work currently in streamlit because each session instantiates a new bot, but this shouldn't be the only option.)
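
A minimal sketch of what a session-keyed history layer could look like (the ChatHistory and SessionedChatbot names and the dict shape are illustrative assumptions, not conversant's actual API):

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChatHistory:
    # Hypothetical stand-in for a single 1-on-1 dialogue log.
    turns: List[Dict[str, str]] = field(default_factory=list)


class SessionedChatbot:
    """Keeps one history per session key so a single bot instance can serve many chats."""

    def __init__(self) -> None:
        self.histories: Dict[str, ChatHistory] = defaultdict(ChatHistory)

    def reply(self, session_id: str, user_message: str) -> str:
        history = self.histories[session_id]
        # Build the prompt from this session's history and call the model here.
        bot_message = "placeholder reply"
        history.turns.append({"user": user_message, "bot": bot_message})
        return bot_message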

docs: improve readme to make it clearer how to deploy on streamlit

Our docs talk about adding an API key to secrets.toml, but this is only relevant for local development and not for hosted streamlit apps. We could improve the README by clarifying the distinction, and either providing information for secrets management on streamlit or directing users to the relevant docs on streamlit.

See this discord discussion: https://discord.com/channels/954421988141711382/954431217560879134/1040301959149797496
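
For reference, the app-side code is the same in both cases; only where the secret lives differs. A minimal sketch (the COHERE_API_KEY name is an assumption):

import cohere
import streamlit as st

# Locally, Streamlit reads this value from .streamlit/secrets.toml; on a hosted
# Streamlit app, the same key is set through the app's Secrets settings instead
# of a committed secrets.toml file.
co = cohere.Client(st.secrets["COHERE_API_KEY"])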

Removing caching from github actions workflows

Our workflows do some caching of Poetry environments to speed up actions. This uses a ton of cache space though -- each env is > 100 MB!

It's probably worth it to skip the caching in exchange for a bit more runtime when running tests and building docs.

AC:

  • workflows still run correctly
  • caching is removed from the YAML files that define the workflows

refactor: could make searching code easier to understand

similarities = [
    self._measure_similarity(embedded_query, d.embedding)
    for d in self.documents
]
max_similarity = max(similarities)
if max_similarity < threshold:
    logging.warning(
        f"Max search similarity {max_similarity} below threshold {threshold}; "
        "no document returned."
    )
    return None
logging.info(f"Search result found for query: {query}")
nearest_idx = similarities.index(max_similarity)
nearest_doc = self.documents[nearest_idx]
return nearest_doc

Right now there are some list comprehensions & reliance on list order preservation that make this hard to interpret.

The code could be made more explicit:

# Search for the most similar doc
most_similar_document = None
top_similarity_score = 0
for doc in self.documents:
    similarity_score = self._measure_similarity(embedded_query, doc.embedding)
    if similarity_score > top_similarity_score:
        top_similarity_score = similarity_score
        most_similar_document = doc

# Make sure it's similar enough
if top_similarity_score < threshold:
    logging.warning(
        f"Max search similarity {top_similarity_score} below threshold {threshold}; "
        "no document returned."
    )
    return None

logging.info(f"Search result found for query: {query}")
return most_similar_document

Shoutout to @mkozakov and Jacob-GK for flagging this!

demo_utils.launch_streamlit does not work

# Imports are omitted in the original report; these module paths are assumptions
# based on the repo layout at the time.
import cohere

from conversant.prompt_chatbot import PromptChatbot
from conversant.prompts.chat_prompt import ChatPrompt
from conversant.utils import demo_utils

co = cohere.Client("API KEY")
shakespeare_config = {
    "preamble": "Below is a conversation between Shakespeare and a Literature Student.",
    "example_separator": "<CONVERSATION>\n",
    "headers": {
        "user": "Literature Student",
        "bot": "William Shakespeare",
    },
    "examples": [
        [
            {
                "user": "Who are you?",
                "bot": "Mine own nameth is Shakespeare, and I speaketh in riddles.",
            },
        ]
    ],
}
shakespeare_bot = PromptChatbot(
    client=co,
    prompt=ChatPrompt.from_dict(shakespeare_config),
    persona_name="william-shakespeare",
)
demo_utils.launch_streamlit(shakespeare_bot)

Error raised:
TypeError: cannot pickle '_queue.SimpleQueue' object

This is because in cohere 2.8.0 the client holds a ThreadPoolExecutor, which is not serializable (https://github.com/cohere-ai/cohere-python/blob/1dd3194b24705aa38e58bb17194325fee6989e24/cohere/client.py#L43).
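
A minimal sketch of the underlying failure, with a hypothetical ClientLike class standing in for the cohere client:

import pickle
from concurrent.futures import ThreadPoolExecutor


class ClientLike:
    # Hypothetical stand-in: any object holding a ThreadPoolExecutor (whose work
    # queue is a _queue.SimpleQueue under the hood) cannot be pickled.
    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)


try:
    pickle.dumps(ClientLike())
except TypeError as e:
    print(e)  # e.g. "cannot pickle '_queue.SimpleQueue' object"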

Release next version of package on PyPI

We'd like to release the latest version of conversant on PyPI and in doing so upgrade the PyPI release version to 0.1.2. See #4 for reference and be aware of the following gotchas (via @eddotman):

  1. Don't forget to update the changelog.
  2. Make sure pyproject.toml has fields like the link to the GitHub repo and the README; otherwise the PyPI release won't pick up these details.
  3. Beyond that, it's just poetry build and poetry publish, which is pretty nice!

Streamline persona injection into the demo streamlit

While personas are already customizable through a few routes, the streamlit app is missing an option to customize the directory that personas are loaded from. Adding that option would make it easier for users to load a new persona into the prepackaged streamlit app.

Update README with a nice example that includes screencaps & add schematic details

The best READMEs have a concise code example, and a screencap so that users can visually understand at a glance why they should care about this repo / tooling.

This example should be thorough enough that it makes the interface very explicit -- i.e., when you import this library, what do you actually need to import / call / configure / instantiate? What are some reasonable "advanced" configs when using this library? And what is intentionally under the hood?

We also need to show users how to add their custom persona and see it in a demo.

We would also like to add details about how this works schematically, such that developers can easily understand not only how to run the demo, but also how to navigate and ultimately contribute to the code.

Additional contributors to the text above: @eddotman and @yichern

Elegant handling of prompt overflow errors and warnings in Streamlit

Following up on #2, we should also handle the specific case of the demo Streamlit.

In general, it should work fine to raise errors / warnings as appropriate if the prompt becomes too large. The Streamlit demo could handle these errors elegantly, (1) for a better user experience and (2) to show users an example of how such errors might be handled in general.
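
A minimal sketch of what that could look like in the demo, with a hypothetical PromptOverflowError and reply() standing in for conversant's actual names:

import streamlit as st

MAX_PROMPT_CHARS = 2000  # stand-in for the model's context limit


class PromptOverflowError(Exception):
    """Hypothetical error raised when the assembled prompt exceeds the context window."""


def reply(user_input: str) -> str:
    # Placeholder for the chatbot's reply call.
    if len(user_input) > MAX_PROMPT_CHARS:
        raise PromptOverflowError("prompt too large")
    return f"(bot reply to: {user_input})"


user_input = st.text_input("Message")
if user_input:
    try:
        st.write(reply(user_input))
    except PromptOverflowError:
        st.warning(
            "The conversation no longer fits the model's context window; "
            "consider trimming earlier turns or starting a new chat."
        )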

demo_utils.launch_streamlit unable to work on local install of `conversant`

The files of conversant that are installed by pip are only from sandbox-conversant-lib/conversant.

demo_utils.launch_streamlit requires app/streamlit_example.py to be available.

Code to replicate:
pip install conversant

# Imports are omitted in the original report; these module paths are assumptions
# based on the repo layout at the time.
import cohere

from conversant.prompt_chatbot import PromptChatbot
from conversant.prompts.chat_prompt import ChatPrompt
from conversant.utils import demo_utils

co = cohere.Client("API KEY")
shakespeare_config = {
    "preamble": "Below is a conversation between Shakespeare and a Literature Student.",
    "example_separator": "<CONVERSATION>\n",
    "headers": {
        "user": "Literature Student",
        "bot": "William Shakespeare",
    },
    "examples": [
        [
            {
                "user": "Who are you?",
                "bot": "Mine own nameth is Shakespeare, and I speaketh in riddles.",
            },
        ]
    ],
}
shakespeare_bot = PromptChatbot(
    client=co,
    prompt=ChatPrompt.from_dict(shakespeare_config),
    persona_name="william-shakespeare",
)
demo_utils.launch_streamlit(shakespeare_bot)

Error:

Usage: streamlit run [OPTIONS] TARGET [ARGS]...
Try 'streamlit run --help' for help.

Error: Invalid value: File does not exist: /app/streamlit_example.py

One possible solution is to move app/ into conversant (with a rename), such that streamlit_example.py is available in the pip installed version of conversant as conversant/demo/streamlit_example.py.
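
A sketch of how launch_streamlit could then locate the bundled app (the conversant/demo/streamlit_example.py path is the proposal above, not the current layout):

import os

import conversant


def bundled_app_path() -> str:
    # Resolve the packaged Streamlit script relative to the installed package.
    package_dir = os.path.dirname(conversant.__file__)
    return os.path.join(package_dir, "demo", "streamlit_example.py")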

Rename `StartPrompt` to `ConversationPrompt`

StartPrompt doesn't suggest that it is a prompt used for prompting an LLM for dialogue. With other prompts like RewritePrompt included, it is important that each prompt's name makes its purpose clear.

PyPI release disallows direct git repo dependencies

We'll need to think of a workaround for the st-chat modifications we've made, since we can't point to a direct git repo as a dependency if we want to release conversant on PyPI.

We could make a PR back to st-chat as one solution 👀

In the interim: A very easy & quick fix is to update the README to add back in the note that the package is not yet on PyPI.

Add pre-commit hook to check for API keys

Rather than manually ensuring that API keys have not been committed, we could do this programmatically via pre-commit hooks. However, due to the diversity in API key format, there does not appear to be an off-the-shelf solution.

Creating new hooks is possible, but in order to make this work with the pre-commit package we would need to make a new git repo for this detect-api-key hook (source). This could be useful to teams beyond ConvAI.

Generally speaking, it's possible to create pre-commit hooks by simply writing a shell script, but integrating with pre-commit seems to require a repo that is an installable package or exposes an executable.
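
A rough sketch of what such a hook script could look like (the regex patterns are illustrative assumptions and would need to be tuned to the key formats in use):

#!/usr/bin/env python
"""Sketch of a detect-api-key pre-commit hook."""
import re
import subprocess
import sys
from typing import List

# Very rough heuristic: long secret-looking literals assigned to key/token-like names.
SUSPICIOUS = re.compile(
    r"(api[_-]?key|secret|token)\s*[=:]\s*['\"][A-Za-z0-9_\-]{20,}['\"]", re.I
)


def staged_files() -> List[str]:
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [f for f in out.stdout.splitlines() if f]


def main() -> int:
    flagged = []
    for path in staged_files():
        try:
            text = open(path, encoding="utf-8", errors="ignore").read()
        except OSError:
            continue
        if SUSPICIOUS.search(text):
            flagged.append(path)
    if flagged:
        print("Possible API keys found in:", *flagged, sep="\n  ")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())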

Better documentation of parameters for `chatbot_config` and `client_config`

In configure_chatbot

def configure_chatbot(self, chatbot_config: Dict = {}) -> None:
    """Configures chatbot options.

    Args:
        chatbot_config (Dict, optional): Updates self.chatbot_config. Defaults
            to {}.
    """

and configure_client

def configure_client(self, client_config: Dict = {}) -> None:
    """Configures client options.

    Args:
        client_config (Dict, optional): Updates self.client_config. Defaults to {}.
    """

We should document more precisely which keys each of these config dicts accepts and what they control.
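
For example, a docstring along these lines would help (assuming client_config is forwarded to the generation call; the listed keys are illustrative, not the confirmed schema):

from typing import Dict


class PromptChatbot:  # docstring sketch only, not the real class
    def configure_client(self, client_config: Dict = {}) -> None:
        """Configures client options.

        Args:
            client_config (Dict, optional): Updates self.client_config, which is
                forwarded to the generation call. Illustrative keys (confirm
                against the actual schema):
                    "model" (str): Name of the generation model to use.
                    "max_tokens" (int): Maximum number of tokens to generate.
                Defaults to {}.
        """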

Rename `headers` to `variables` or something similar

From #6 (comment) and #6 (comment)

I'm also not very settled on the naming of headers because headers seems to suggest something like "Math Teacher Bot" rather than "bot" (i.e., the value of a field, not the name of the field itself). Would a better way to name it be variables (so: variable value and variable name)? This is probably better implemented in a separate PR.

re: headers -- what about header_names or header_keys? In any case, agreed that it is better done in a refactor PR later IMO.

Do we want to rename headers to variables?

`pdoc` doesn't detect all submodules

Looking at the docs website, there are some submodules not detected by pdoc. I think this is related to the way we've set up __all__ in the __init__.py files, but I haven't thought more deeply about it yet. 🤔
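
pdoc generally follows what a package's __init__.py exposes, so one candidate fix is to import the submodules explicitly and list them in __all__. A sketch (the submodule names are assumptions):

# conversant/__init__.py (sketch)
from conversant import prompt_chatbot, prompts, utils  # noqa: F401

__all__ = ["prompt_chatbot", "prompts", "utils"]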
