sandbox-conversant-lib's People

Contributors

dependabot[bot], eddotman, egrefen, harish-garg, irombie, kyliehe616, lusmoura, madelinehjenkins, marcodel13, marinamachado, neilatcohere, yichern

sandbox-conversant-lib's Issues

Add support for session management in chat histories

Right now, chatbots have a single chat history object for 1-on-1 dialogue, which doesn't lead to any obvious pathway for managing multiple sessions.

Giving a bot access to multiple chat histories (e.g., by keying the dictionary on a session ID first) would let a single bot instance manage multiple chats when exposed through some interface.

(This happens to work currently in streamlit because each session instantiates a new bot, but this shouldn't be the only option.)
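
A minimal sketch of what a session-keyed history layer could look like (the ChatHistory and SessionedChatbot names and the dict shape are illustrative assumptions, not conversant's actual API):

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChatHistory:
    # Hypothetical stand-in for a single 1-on-1 dialogue log.
    turns: List[Dict[str, str]] = field(default_factory=list)


class SessionedChatbot:
    """Keeps one history per session key so a single bot instance can serve many chats."""

    def __init__(self) -> None:
        self.histories: Dict[str, ChatHistory] = defaultdict(ChatHistory)

    def reply(self, session_id: str, user_message: str) -> str:
        history = self.histories[session_id]
        # Build the prompt from this session's history and call the model here.
        bot_message = "placeholder reply"
        history.turns.append({"user": user_message, "bot": bot_message})
        return bot_message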

docs: improve readme to make it clearer how to deploy on streamlit

Our docs talk about adding an API key to secrets.toml, but this is only relevant for local development and not for hosted streamlit apps. We could improve the README by clarifying the distinction, and either providing information for secrets management on streamlit or directing users to the relevant docs on streamlit.

See this discord discussion: https://discord.com/channels/954421988141711382/954431217560879134/1040301959149797496
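
For reference, the app-side code is the same in both cases; only where the secret lives differs. A minimal sketch (the COHERE_API_KEY name is an assumption):

import cohere
import streamlit as st

# Locally, Streamlit reads this value from .streamlit/secrets.toml; on a hosted
# Streamlit app, the same key is set through the app's Secrets settings instead
# of a committed secrets.toml file.
co = cohere.Client(st.secrets["COHERE_API_KEY"])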

Removing caching from github actions workflows

Our workflows do some caching of Poetry environments to speed up actions. This uses a ton of cache space though -- each env is > 100 MB!

It's probably worth it to skip the caching in exchange for a bit more runtime when running tests and building docs.

AC:

  • workflows still run correctly
  • caching is removed from the YAML files that define the workflows

refactor: could make searching code easier to understand

similarities = [
    self._measure_similarity(embedded_query, d.embedding)
    for d in self.documents
]
max_similarity = max(similarities)
if max_similarity < threshold:
    logging.warning(
        f"Max search similarity {max_similarity} below threshold {threshold}; "
        "no document returned."
    )
    return None
logging.info(f"Search result found for query: {query}")
nearest_idx = similarities.index(max_similarity)
nearest_doc = self.documents[nearest_idx]
return nearest_doc

Right now there are some list comprehensions & reliance on list order preservation that make this hard to interpret.

The code could be made more explicit:

# Search for the most similar doc
most_similar_document = None
top_similarity_score = 0
for doc in self.documents:
    similarity_score = self._measure_similarity(embedded_query, doc.embedding)
    if similarity_score > top_similarity_score:
        top_similarity_score = similarity_score
        most_similar_document = doc

# Make sure it's similar enough
if top_similarity_score < threshold:
    logging.warning(
        f"Max search similarity {top_similarity_score} below threshold {threshold}; "
        "no document returned."
    )
    return None

logging.info(f"Search result found for query: {query}")
return most_similar_document

Shoutout to @mkozakov and Jacob-GK for flagging this!

demo_utils.launch_streamlit does not work

# Imports are omitted in the original report; these module paths are assumptions
# based on the repo layout at the time.
import cohere

from conversant.prompt_chatbot import PromptChatbot
from conversant.prompts.chat_prompt import ChatPrompt
from conversant.utils import demo_utils

co = cohere.Client("API KEY")
shakespeare_config = {
    "preamble": "Below is a conversation between Shakespeare and a Literature Student.",
    "example_separator": "<CONVERSATION>\n",
    "headers": {
        "user": "Literature Student",
        "bot": "William Shakespeare",
    },
    "examples": [
        [
            {
                "user": "Who are you?",
                "bot": "Mine own nameth is Shakespeare, and I speaketh in riddles.",
            },
        ]
    ],
}
shakespeare_bot = PromptChatbot(
    client=co,
    prompt=ChatPrompt.from_dict(shakespeare_config),
    persona_name="william-shakespeare",
)
demo_utils.launch_streamlit(shakespeare_bot)

Error raised:
TypeError: cannot pickle '_queue.SimpleQueue' object

This is because in cohere 2.8.0 the client holds a ThreadPoolExecutor, which is not serializable (https://github.com/cohere-ai/cohere-python/blob/1dd3194b24705aa38e58bb17194325fee6989e24/cohere/client.py#L43).
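
A minimal sketch of the underlying failure, with a hypothetical ClientLike class standing in for the cohere client:

import pickle
from concurrent.futures import ThreadPoolExecutor


class ClientLike:
    # Hypothetical stand-in: any object holding a ThreadPoolExecutor (whose work
    # queue is a _queue.SimpleQueue under the hood) cannot be pickled.
    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)


try:
    pickle.dumps(ClientLike())
except TypeError as e:
    print(e)  # e.g. "cannot pickle '_queue.SimpleQueue' object"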

Release next version of package on PyPI

We'd like to release the latest version of conversant on PyPI and in doing so upgrade the PyPI release version to 0.1.2. See #4 for reference and be aware of the following gotchas (via @eddotman):

  1. Don't forget to update the changelog.
  2. Make sure pyproject.toml has fields like the link to the GitHub repo and the README; otherwise the PyPI release won't pick up these details.
  3. Beyond that, it's just poetry build and poetry publish, which is pretty nice!

Streamline persona injection into the demo streamlit

While personas are already customizable through a few routes, the streamlit app is missing an option to customize the directory that personas are loaded from. Adding that option would make it easier for users to load a new persona into the prepackaged streamlit app.

Update README with a nice example that includes screencaps & add schematic details

The best READMEs have a concise code example, and a screencap so that users can visually understand at a glance why they should care about this repo / tooling.

This example should be thorough enough that it makes the interface very explicit -- i.e., when you import this library, what do you actually need to import / call / configure / instantiate? What are some reasonable "advanced" configs when using this library? And what is intentionally under the hood?

We also need to show users how to add their custom persona and see it in a demo.

We would also like to add details about how this works schematically, such that developers can easily understand not only how to run the demo, but also how to navigate and ultimately contribute to the code.

Additional contributors to the text above: @eddotman and @yichern

Elegant handling of prompt overflow errors and warnings in Streamlit

Following up on #2, we should also handle the specific case of the demo Streamlit.

In general, it should work fine to raise errors / warnings as appropriate if the prompt becomes too large. The Streamlit demo could handle these errors elegantly, (1) for a better user experience and (2) to show users an example of how such errors might be handled in general.
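
A minimal sketch of what that could look like in the demo, with a hypothetical PromptOverflowError and reply() standing in for conversant's actual names:

import streamlit as st

MAX_PROMPT_CHARS = 2000  # stand-in for the model's context limit


class PromptOverflowError(Exception):
    """Hypothetical error raised when the assembled prompt exceeds the context window."""


def reply(user_input: str) -> str:
    # Placeholder for the chatbot's reply call.
    if len(user_input) > MAX_PROMPT_CHARS:
        raise PromptOverflowError("prompt too large")
    return f"(bot reply to: {user_input})"


user_input = st.text_input("Message")
if user_input:
    try:
        st.write(reply(user_input))
    except PromptOverflowError:
        st.warning(
            "The conversation no longer fits the model's context window; "
            "consider trimming earlier turns or starting a new chat."
        )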

demo_utils.launch_streamlit unable to work on local install of `conversant`

The files of conversant that are installed by pip are only from sandbox-conversant-lib/conversant.

demo_utils.launch_streamlit requires app/streamlit_example.py to be available.

Code to replicate:
pip install conversant

# Imports are omitted in the original report; these module paths are assumptions
# based on the repo layout at the time.
import cohere

from conversant.prompt_chatbot import PromptChatbot
from conversant.prompts.chat_prompt import ChatPrompt
from conversant.utils import demo_utils

co = cohere.Client("API KEY")
shakespeare_config = {
    "preamble": "Below is a conversation between Shakespeare and a Literature Student.",
    "example_separator": "<CONVERSATION>\n",
    "headers": {
        "user": "Literature Student",
        "bot": "William Shakespeare",
    },
    "examples": [
        [
            {
                "user": "Who are you?",
                "bot": "Mine own nameth is Shakespeare, and I speaketh in riddles.",
            },
        ]
    ],
}
shakespeare_bot = PromptChatbot(
    client=co,
    prompt=ChatPrompt.from_dict(shakespeare_config),
    persona_name="william-shakespeare",
)
demo_utils.launch_streamlit(shakespeare_bot)

Error:

Usage: streamlit run [OPTIONS] TARGET [ARGS]...
Try 'streamlit run --help' for help.

Error: Invalid value: File does not exist: /app/streamlit_example.py

One possible solution is to move app/ into conversant (with a rename), such that streamlit_example.py is available in the pip installed version of conversant as conversant/demo/streamlit_example.py.
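
A sketch of how launch_streamlit could then locate the bundled app (the conversant/demo/streamlit_example.py path is the proposal above, not the current layout):

import os

import conversant


def bundled_app_path() -> str:
    # Resolve the packaged Streamlit script relative to the installed package.
    package_dir = os.path.dirname(conversant.__file__)
    return os.path.join(package_dir, "demo", "streamlit_example.py")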

Rename `StartPrompt` to `ConversationPrompt`

StartPrompt doesn't suggest that it is a prompt used for prompting an LLM for dialogue. With other prompts like RewritePrompt included, it is important that each prompt's name makes its purpose clear.

PyPI release disallows direct git repo dependencies

We'll need to think of a workaround for the st-chat modifications we've made, since we can't point to a direct git repo as a dependency if we want to release conversant on PyPI.

We could make a PR back to st-chat as one solution 👀

In the interim: A very easy & quick fix is to update the README to add back in the note that the package is not yet on PyPI.

Add pre-commit hook to check for API keys

Rather than manually ensuring that API keys have not been committed, we could do this programmatically via pre-commit hooks. However, due to the diversity in API key format, there does not appear to be an off-the-shelf solution.

Creating new hooks is possible, but in order to make this work with the pre-commit package we would need to make a new git repo for this detect-api-key hook (source). This could be useful to teams beyond ConvAI.

Generally speaking, it's possible to create pre-commit hooks by simply writing a shell script, but integrating with pre-commit seems to require a repo that is an installable package or exposes an executable.
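
A rough sketch of what such a hook script could look like (the regex patterns are illustrative assumptions and would need to be tuned to the key formats in use):

#!/usr/bin/env python
"""Sketch of a detect-api-key pre-commit hook."""
import re
import subprocess
import sys
from typing import List

# Very rough heuristic: long secret-looking literals assigned to key/token-like names.
SUSPICIOUS = re.compile(
    r"(api[_-]?key|secret|token)\s*[=:]\s*['\"][A-Za-z0-9_\-]{20,}['\"]", re.I
)


def staged_files() -> List[str]:
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [f for f in out.stdout.splitlines() if f]


def main() -> int:
    flagged = []
    for path in staged_files():
        try:
            text = open(path, encoding="utf-8", errors="ignore").read()
        except OSError:
            continue
        if SUSPICIOUS.search(text):
            flagged.append(path)
    if flagged:
        print("Possible API keys found in:", *flagged, sep="\n  ")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())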

Better documentation of parameters for `chatbot_config` and `client_config`

In configure_chatbot

def configure_chatbot(self, chatbot_config: Dict = {}) -> None:
    """Configures chatbot options.

    Args:
        chatbot_config (Dict, optional): Updates self.chatbot_config. Defaults
            to {}.
    """

and configure_client

def configure_client(self, client_config: Dict = {}) -> None:
    """Configures client options.

    Args:
        client_config (Dict, optional): Updates self.client_config. Defaults to {}.
    """

We should document more precisely which keys each of these config dicts accepts and what they control.
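
For example, a docstring along these lines would help (assuming client_config is forwarded to the generation call; the listed keys are illustrative, not the confirmed schema):

from typing import Dict


class PromptChatbot:  # docstring sketch only, not the real class
    def configure_client(self, client_config: Dict = {}) -> None:
        """Configures client options.

        Args:
            client_config (Dict, optional): Updates self.client_config, which is
                forwarded to the generation call. Illustrative keys (confirm
                against the actual schema):
                    "model" (str): Name of the generation model to use.
                    "max_tokens" (int): Maximum number of tokens to generate.
                Defaults to {}.
        """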

Rename `headers` to `variables` or something similar

From #6 (comment) and #6 (comment)

I'm also not very settled on the naming of headers because headers seems to suggest something like "Math Teacher Bot" rather than "bot" (i.e., the value of a field, not the name of the field itself). Would a better way to name it be variables (so: variable value and variable name)? This is probably better implemented in a separate PR.

re: headers -- what about header_names or header_keys? In any case, agreed that it is better done in a refactor PR later IMO.

Do we want to rename headers to variables?

`pdoc` doesn't detect all submodules

Looking at the docs website, there are some submodules not detected by pdoc. I think this is related to the way we've set up __all__ in the __init__.py files, but I haven't thought more deeply about it yet. 🤔
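
pdoc generally follows what a package's __init__.py exposes, so one candidate fix is to import the submodules explicitly and list them in __all__. A sketch (the submodule names are assumptions):

# conversant/__init__.py (sketch)
from conversant import prompt_chatbot, prompts, utils  # noqa: F401

__all__ = ["prompt_chatbot", "prompts", "utils"]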
