cheshire-cat-ai / core

Production ready AI assistant framework
Home Page: https://cheshirecat.ai
License: GNU General Public License v3.0
The BE should expose the available embedder providers from the /settings/embedder endpoint. The FE will fetch the relevant information and allow the user to select and customise the embedder using a JSON form.
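A minimal sketch of what the settings payload could look like, using only the standard library. The provider names, field names, and response shape here are assumptions for illustration, not the project's actual configuration:

```python
# Hypothetical registry of embedder providers; each entry is a JSON schema
# the FE can render as a form. Names and fields are illustrative only.
EMBEDDER_SCHEMAS = {
    "EmbedderOpenAIConfig": {
        "title": "OpenAI Embedder",
        "type": "object",
        "properties": {
            "openai_api_key": {"type": "string"},
            "model": {"type": "string", "default": "text-embedding-ada-002"},
        },
        "required": ["openai_api_key"],
    },
    "EmbedderFakeConfig": {
        "title": "Fake Embedder (for tests)",
        "type": "object",
        "properties": {"size": {"type": "integer", "default": 128}},
    },
}

def get_embedder_settings():
    """Shape of a possible GET /settings/embedder response:
    one JSON schema per available provider, for the FE's JSON form."""
    return {
        "status": "success",
        "schemas": EMBEDDER_SCHEMAS,
        "allowed_configurations": sorted(EMBEDDER_SCHEMAS),
    }
```

In the real backend this would live behind a FastAPI route; the FE would pick one `allowed_configuration` and POST the filled-in form back.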
When the cat retrieves relevant episodic memories (things user said in the past), it is important to:
If there is an embedder change, the VectorStore will not be compatible because:
So whenever the embedder changes:
Or, if we want to preserve old memories, prepend the embedder name to the collection name.
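The "prepend embedder name" idea can be sketched in a few lines. The naming scheme below is an assumption, just to show the mechanism: each embedder gets its own collection, so switching embedders never mixes vectors of different dimensionality, and old memories survive the switch.

```python
# Hypothetical collection-naming helper: one collection per embedder, so an
# embedder change simply points the cat at a fresh (or previous) collection.
def collection_name(embedder_name: str, base: str = "episodic") -> str:
    safe = embedder_name.lower().replace(" ", "_")
    return f"{safe}_{base}"

name = collection_name("OpenAIEmbeddings")  # "openaiembeddings_episodic"
```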
Everything you say or upload to the cat is vectorized and stored in a vector db.
It should therefore be possible to upload and download documents in batch mode, for example from an ETL task.
Make ingestion run as a standalone library, so memories can be ingested from an external process.
It could be interesting to save the input/output not just in the database but also divided by user.
This way, the UI could show previous chats and per-user settings.
There is https://pypi.org/project/fastapi-users/ which already uses SQLAlchemy and adds the various endpoints.
When I start the container, it shows:
frontend | ➜ Local:   http://localhost:3000/
frontend | ➜ Network: http://172.19.0.3:3000/
But the IP address should be instead 192.168.1.4
Mac M1 - OSX 13.2.1
We want to enable Markdown support in our CheshireCat project. By design, most language models use Markdown in their responses, so we will be editing the MessageBox component using the Remark library located at https://github.com/remarkjs/remark. This will enable us to support markdown responses from the CheshireCat.
The gh-pages branch contains the sources for the mkdocs documentation.
GitHub Pages is already active and pointing to the site folder in the branch from this address
Can you configure a pipeline to automatically build these docs when there is a push?
Before docs, let's have a few pages for the project directly on github pages.
Sorry, I was trying to understand the project's objective, but I couldn't find any description in the README.
Just some thoughts on how the code could be reorganized:

- Rename the web folder to backend (this way the difference from the frontend folder is clear)
- looking_glass.py having a function called CheshireCat is not clear
- Clean main.py to just execute the backend (and initialize stuff), moving the endpoints to routes.py
- Add a version.txt, so the version is not hardcoded
- Organize the endpoints differently: a routes folder with 2 files (for now): settings.py and base.py (or a better name)
- Divide CheshireCat into 2 different files: one is the cat, the other is just the bootstrap (the db and so on)
Also, thinking about the plugin stuff: everything in the code that calls external frameworks/libraries (like langchain) should be handled by a dedicated class, so it is easier to extend or change. https://github.com/pieroit/cheshire-cat/blob/main/web/cat/looking_glass.py#L61
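A minimal sketch of that dedicated-class idea, assuming dependency injection. The class and method names are hypothetical; in the real code the concrete subclass would wrap a langchain LLM instead of the stand-in used here:

```python
# Hypothetical adapter: the rest of the codebase talks to LanguageModel,
# never to langchain directly, so swapping frameworks touches one class.
class LanguageModel:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError

class EchoModel(LanguageModel):
    # Stand-in backend for the sketch; a real one would wrap a langchain LLM.
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class CheshireCat:
    def __init__(self, llm: LanguageModel):
        self.llm = llm  # injected, so plugins and tests can replace it

    def reply(self, user_message: str) -> str:
        return self.llm.complete(user_message)

cat = CheshireCat(EchoModel())
print(cat.reply("hi"))  # -> echo: hi
```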
Probably the files in the config folder should explain their purpose better. I think it would be better if they were in the db folder, as they are just the models for that data.
We want to display the reasoning behind a response from the CheshireCat, to provide users with greater transparency and insight into the decision-making process. To do this, we will leverage the Sidebar component to present the content of the reasoning object, which is already sent from the backend. The reasoning object is defined as follows:
{
  "input": "What is Python?",
  "episodic_memory": [
    {
      "page_content": "it is for fictional purposes ",
      "lookup_str": "",
      "metadata": {
        "source": "user",
        "when": 1680432386.7730486,
        "text": "it is for fictional purposes "
      },
      "lookup_index": 0,
      "score": 0.5044264793395996
    },
    {
      "page_content": "Write a 400 words post on how Ai is going to change the world",
      "lookup_str": "",
      "metadata": {
        "source": "user",
        "when": 1680432337.0415337,
        "text": "Write a 400 words post on how Ai is going to change the world"
      },
      "lookup_index": 0,
      "score": 0.5165414810180664
    },
    {
      "page_content": "write the introduction of a novel that talks about how the world has been taken over by the AI",
      "lookup_str": "",
      "metadata": {
        "source": "user",
        "when": 1680432429.13744,
        "text": "write the introduction of a novel that talks about how the world has been taken over by the AI"
      },
      "lookup_index": 0,
      "score": 0.5386247634887695
    },
    {
      "page_content": "nice, write a 670 words paragraph on who Ai will take over humanity",
      "lookup_str": "",
      "metadata": {
        "source": "user",
        "when": 1680432365.206487,
        "text": "nice, write a 670 words paragraph on who Ai will take over humanity"
      },
      "lookup_index": 0,
      "score": 0.5566583275794983
    },
    {
      "page_content": "I am the Cheshire Cat",
      "lookup_str": "",
      "metadata": {
        "who": "cheshire-cat",
        "when": 1679948291.703731,
        "text": "I am the Cheshire Cat"
      },
      "lookup_index": 0,
      "score": 0.564825177192688
    }
  ],
  "declarative_memory": [
    {
      "page_content": "I am the Cheshire Cat",
      "lookup_str": "",
      "metadata": {
        "who": "cheshire-cat",
        "when": 1679948292.8870578,
        "text": "I am the Cheshire Cat"
      },
      "lookup_index": 0,
      "score": 0.564825177192688
    }
  ],
  "chat_history": "",
  "output": "Python is a programming language used in various applications such as web development, data analysis, machine learning, and artificial intelligence.",
  "intermediate_steps": []
}
Let's postpone a full-fledged user management as proposed in #62 and go for a simple token auth:

- AUTH_TOKEN in the .env
- if no token is set, all endpoints are public
- otherwise, requests must carry Authorization: Bearer <auth_token> in the header

Currently, the user can only add textual content to the cat by manually typing it in, which can be tedious and time-consuming. Therefore, we need to add a new feature that allows the user to add large amounts of textual content to the cat through a file uploader.
To implement this feature, we need to create an HTTP POST request that sends the selected file to the REST endpoint that accepts files. We also need to wait for the API response to ensure that the file was uploaded successfully. Once the file is uploaded, we will be able to process the content and add it to the cat.
This feature will provide users with an easy and efficient way to add textual content to the cat, which will improve the overall user experience
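The client side of this can be sketched with nothing but the standard library: build the multipart/form-data body an HTTP POST would carry, then send it to the upload endpoint. The endpoint path and field name are assumptions here; in practice a frontend would use FormData (or `requests.post(..., files=...)` in Python) rather than hand-rolling the body:

```python
import uuid

def build_multipart(filename: str, content: bytes, field: str = "file"):
    """Build a multipart/form-data body by hand (stdlib only): the payload a
    POST to a hypothetical file-upload REST endpoint would carry."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: text/plain\r\n\r\n"
    ).encode() + content + f"\r\n--{boundary}--\r\n".encode()
    content_type = f"multipart/form-data; boundary={boundary}"
    return body, content_type

body, ctype = build_multipart("notes.txt", b"hello cat")
```

The FE would then wait for the API response to confirm the upload succeeded before telling the user the content was ingested.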
Qdrant is great but a little overkill for getting started with the Cat.
Systems like Django support any SQL db but start by simply shipping sqlite.
@umbertogriffo already introduced sqlite as a table db (merging soon!) and we'll do the same with the vector DB.
I checked out both FAISS and annoy and they do not support assigning metadata to vectors (which are essential to connect embeddings with symbolic stuff).
Langchain allows for a FAISS+Docstore combination but it has to be loaded and saved to disk manually, otherwise it only lives in memory. Search FAISS VectorStore here:
https://langchain.readthedocs.io/en/latest/reference/modules/vectorstore.html
Let's substitute Qdrant with FAISS while staying in the langchain constructs.
P.S.: compatibility of these pieces (LLM, Embedder, VectorStore) with langchain is a plus, because the APIs are already well designed and help develop a solid plugin system. Let's stick to them as much as possible ;)
P.P.S.: conversation started in #23
Currently, it is not possible to upload documents with the .md extension, even though they are supported by the backend.
https://lysvz.localtonet.com/
Something went wrong while connecting to the server. Please try again later
Huggingface is the most popular hub for LLMs. Given the open source philosophy of the project I think integration and collaboration with the HuggingFace community could be great
[Front-end only] Link the available documentation link as well as the GitHub profile
Cohere exposes LLMs for free through their API. I think this could be very beneficial, both for fast and free iteration during development and for users who cannot afford the OpenAI API.
Hi there!
I have enjoyed looking at this project and appreciate your effort in making it functional and easy to use. However, I have noticed that there is no way to set up a local environment using tools like pipenv, which can help manage dependencies outside the Docker container.
I think it would be beneficial to have this option available because it would make it easier for developers to work on the project without relying solely on the Docker container. Additionally, it would allow for greater flexibility regarding the tools and versions of packages that developers can use.
Furthermore, I believe introducing code auto-reformatting, a PEP 8 checker (flake8), and auto-sorted imports (isort) with pre-commit would be a good idea. This would help ensure that the codebase remains consistent and maintainable over time.
If you agree, I can introduce all of them in a PR.
Default prompts in chains and agents are in english.
OPTION 1: localization
There should be a language detector to classify user input and to load default prompts in the appropriate language.
(localized content should be organized in a similar manner as in CMSs like WordPress)
OPTION 2: wrapper
Run the language detector, translate user input in english using a translation model, run the pipelines/agents/chains in english and then translate back the final response back to user language.
Let's go with option 2, as LLMs are weak in languages other than English.
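Option 2 can be sketched as a thin pipeline around the English-only agent. Everything below is a stub: `detect_language` and `translate` stand in for real models (e.g. a HuggingFace translation model), and the tiny dictionary exists only so the sketch runs:

```python
# Sketch of OPTION 2: detect language, translate to English, run the agent,
# translate the reply back. All three helpers are hypothetical placeholders.
def detect_language(text: str) -> str:
    return "it" if "ciao" in text.lower() else "en"

def translate(text: str, source: str, target: str) -> str:
    demo = {("ciao", "it", "en"): "hello", ("hello", "en", "it"): "ciao"}
    return demo.get((text.lower(), source, target), text)

def run_agent_in_english(text: str) -> str:
    return "hello"  # stand-in for the English-only pipelines/agents/chains

def answer(user_input: str) -> str:
    lang = detect_language(user_input)
    english_in = user_input if lang == "en" else translate(user_input, lang, "en")
    english_out = run_agent_in_english(english_in)
    return english_out if lang == "en" else translate(english_out, "en", lang)
```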
Input bar for messages should autofocus as soon as a cat message is received
Hi, sorry if I am missing something; after the last pull I got this error.
The console throws this error:
web | ERROR: Exception in ASGI application
web | Traceback (most recent call last):
web | File "/usr/local/lib/python3.9/site-packages/starlette/datastructures.py", line 702, in __getattr__
web | return self._state[key]
web | KeyError: 'ccat'
web |
web | During handling of the above exception, another exception occurred:
web |
web | Traceback (most recent call last):
web | File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/websockets/websockets_impl.py", line 238, in run_asgi
web | result = await self.app(self.scope, self.asgi_receive, self.asgi_send)
web | File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
web | return await self.app(scope, receive, send)
web | File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 271, in __call__
web | await super().__call__(scope, receive, send)
web | File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 118, in __call__
web | await self.middleware_stack(scope, receive, send)
web | File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 149, in __call__
web | await self.app(scope, receive, send)
web | File "/usr/local/lib/python3.9/site-packages/starlette/middleware/cors.py", line 76, in __call__
web | await self.app(scope, receive, send)
web | File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
web | raise exc
web | File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
web | await self.app(scope, receive, sender)
web | File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
web | raise e
web | File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
web | await self.app(scope, receive, send)
web | File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 706, in __call__
web | await route.handle(scope, receive, send)
web | File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 341, in handle
web | await self.app(scope, receive, send)
web | File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 82, in app
web | await func(session)
web | File "/app/./cat/routes/websocket.py", line 14, in websocket_endpoint
web | ccat = websocket.app.state.ccat
web | File "/usr/local/lib/python3.9/site-packages/starlette/datastructures.py", line 705, in __getattr__
web | raise AttributeError(message.format(self.__class__.__name__, key))
web | AttributeError: 'State' object has no attribute 'ccat'
And web console
Do you think it is feasible to define an "authoritativeness" attribute for uploaded documents, a score between 0 and 1 where 1 is the highest authoritativeness, and to steer the Cat's choices in formulating responses by prioritizing the most authoritative sources?
This would allow the Cat, given equally relevant available content, to choose the one with the higher score.
It could be useful to weigh reliable sources against less reliable ones.
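One simple way to do this is to blend the vector-similarity score with the stored authoritativeness at retrieval time. The sketch below is an assumption about how such a re-ranking could work; the 50/50 weighting and the metadata key are arbitrary choices for illustration:

```python
# Sketch: combine vector similarity with a per-document authoritativeness
# score in [0, 1] stored in metadata. The weighting here is arbitrary.
def rerank(hits, weight=0.5):
    """hits: list of (similarity, metadata) pairs, similarity in [0, 1]."""
    def combined(hit):
        similarity, metadata = hit
        authority = metadata.get("authoritativeness", 0.5)  # neutral default
        return (1 - weight) * similarity + weight * authority
    return sorted(hits, key=combined, reverse=True)

hits = [(0.80, {"authoritativeness": 0.1}), (0.75, {"authoritativeness": 0.9})]
top = rerank(hits)[0]  # the slightly less similar but far more reliable doc wins
```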
Currently, the CSS modules naming strategy used by VITE is not very flexible, which can make it difficult for external contributors and plugin creators to customise the CSS classes. This can be especially problematic when trying to apply custom styling or override existing styles
Given that we are using VITE as a bundler, we can take advantage of its built-in support for CSS modules and its various configuration options. Specifically, we can explore the css.modules option in VITE, which provides several options for customizing the CSS modules behavior, including the ability to specify custom naming conventions and post-processing steps.
For more information on how to configure CSS modules in VITE, see the official VITE documentation.
It's useful to have inside CheshireCat a way to summarize text/documents before saving embeddings.
We can add a new hook in the default plugin and make it available in CheshireCat class. Something like this maybe:
def load_plugins(self):
    ...
    self.embedder = ...
    self.summarizer = self.mad_hatter.execute_hook("get_language_summarizer", self)
    ...
We can maybe use Hugging Face for the default one? Let me know :)
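A hypothetical implementation of such a hook could return any `str -> str` callable. The naive word-truncating summarizer below is only a stand-in for a real model (e.g. a Hugging Face summarization pipeline), so the sketch stays self-contained:

```python
# Hypothetical plugin-side implementation of the proposed hook: it receives
# the cat instance and returns a summarizer callable.
def get_language_summarizer(cat):
    def summarize(text: str, max_words: int = 50) -> str:
        # Stand-in "summary": truncate; a real plugin would call a model here.
        words = text.split()
        return text if len(words) <= max_words else " ".join(words[:max_words]) + " ..."
    return summarize

summarizer = get_language_summarizer(cat=None)
short = summarizer("word " * 200)
```

The cat would then call `self.summarizer(document_text)` before computing and saving embeddings.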
At the moment the frontend codebase works with .env files, which is not suitable for the current monolithic architecture.
Hence, we must remove the .env files from the frontend repository and modify the codebase to reflect this modification.
Even though this approach is not optimal in the long run, it is necessary for the MVP version to keep the application as basic yet functional as possible.
The cat should be able to guide the user on how to extend and hack the system.
This info can be inserted in declarative memory and retrieved in conversation (HyDE should be enough)
Cat running in a Virtualbox VM ubuntu-20.04.6-live-server-amd64.iso, hosted by a Windows 10 Enterprise PC
Web interface starts on :3000
Message "Getting Ready" and then red banner telling "Something went wrong while connecting to the server. Please try again later"
The only warning message on the log window is
web | /usr/local/lib/python3.9/site-packages/langchain/llms/openai.py:608: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`
web | warnings.warn(
I'm not sure if it has anything to do with the problem.
Maybe it is something super simple, but the message is not very helpful. For example, I'm not sure which server is not reachable. A more detailed message, at least in the log, with server name/address and port, would help in troubleshooting.
Use langchain routines more deeply to keep the prompt at a limited length (CombineDocumentsChain etc.).
Summarization may also be appropriate when documents are uploaded.
We are piggybacking on a langchain adapter in order to support Cohere.
The default model, large, for text generation hallucinates:
Also, on a separate notebook:
The BE should expose the available language model providers from the /settings/llm endpoint. The FE will fetch the relevant information and allow the user to select and customise the language model using a JSON form. Once the user saves, the information should be sent to the BE.
So right now in the output in the shell we have:
Prompt after formatting:
You will be given a sentence.
If the sentence is a question, convert it to a plausible answer. If the sentence does not contain an question, repeat the sentence as is without adding anything to it.
Examples:
- what furniture there is in my room? --> In my room there is a bed, a guardrobe and a desk with my computer
- where did you go today --> today I was at school
- I like ice cream --> I like ice cream
- how old is Jack --> Jack is 20 years old
- Does pineapple belong on pizza? -->
This can be confusing; with a flag (in the .env file) it could be possible to print different content.
It could also create an error.log with the Python errors, and more.
/home/www/cheshire-cat/web/env/lib/python3.11/site-packages/langchain/llms/openai.py:608: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`
warnings.warn(
As another example:
This is a conversation between a human and an intelligent robot cat that passes the Turing test.
The cat is curious and talks like the Cheshire Cat from Alice's adventures in wonderland.
The cat replies are based on the Context provided below.
Context of things the Human said in the past:
- I am the Cheshire Cat
Context of documents containing relevant information:
- I am the Cheshire Cat
Conversation until now:
Human: What's up?
What would the AI reply? Answer concisely to the user needs as best you can, according to the provided recent conversation and relevant context.
If Context is not enough, you have access to the following tools:
> my_shoes: Retrieves information about shoes
> my_shoes_color: Retrieves color of shoes
To use a tool, please use the following format:
This kind of output should be shown in the UI and not in the server log.
As a user, I want to be able to input my own OpenAI API key at app startup once and for future sessions.
This will require creating:
Once the API key is saved, the backend should retrieve it from the database when needed so that the key is never exposed to the front end app
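The save-once-and-retrieve-server-side flow can be sketched with the stdlib sqlite3 module. The table and column names are hypothetical; the point is that the key is written once and only ever read on the backend:

```python
import sqlite3

# Sketch: the FE posts the key once, the BE stores it, and later reads it
# server-side when calling OpenAI, so the key is never sent back to the FE.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT)")

def save_api_key(key: str):
    db.execute(
        "INSERT INTO settings VALUES ('openai_api_key', ?) "
        "ON CONFLICT(name) DO UPDATE SET value = excluded.value",
        (key,),
    )

def get_api_key() -> str:
    row = db.execute(
        "SELECT value FROM settings WHERE name = 'openai_api_key'"
    ).fetchone()
    return row[0] if row else ""

save_api_key("sk-test")
```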
The Cat features a langchain ConversationalAgent which covers most easy use cases:
At the moment this solution presents several limits:
Here is a roadmap to improve the Agent:
1 - have a more agile and resilient default agent (choosing among [these](https://python.langchain.com/docs/modules/agents.html))
2 - have a pluggable agent - inserting a few hooks into CustomAgent, making it multiprompt and working on dictionaries
3 - having a hook to totally override the cat Agent in case a dev is brave enough (same as we do with LLM and embedders)
@nicola-corbellini @sirius-0 let's tackle this
As a user, I would like to interact with the Cheshire Cat using voice.
This requires adding voice input capability to our web application.
The recorded audio will then be transcribed into text using the browser's Web Speech API.
The resulting text will be sent to the backend for processing.
To provide a smooth user experience, we aim to implement a WhatsApp-like UX interaction for the Cheshire Cat.
This will allow users to easily interact with the cat using their voice directly and see the corresponding text responses in real-time
Leave this to core contributors
The backend should provide a list of available plugins through the /plugins endpoint. This list will include the plugin's name, description, and a unique id. The unique id may simply be the name of the folder that the plugin is stored in. To allow end users to define plugin metadata, the suggested approach is to have a non-mandatory plugin.json file stored in each plugin's directory where the user can define both name and description (as well as future metadata such as the JSON schema of the configuration).
// plugin.json
{
  "name": "MyCustomPlugin",
  "description": "Makes the cat cool af"
}
If the plugin.json file is not defined, then the backend should default to values derived from the folder name.
A possible response from the /plugins endpoint will then be:
[
  {
    id: "cool-plugin",
    name: "MyCustomPlugin",
    description: "Makes the cat cool af"
  },
]
The front end should fetch the list of available plugins and display them under the /plugins route as a read-only list. Create a new pluginsSlice using redux and follow the defined best practice on how to handle async states. At the moment, no interaction is scheduled.
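The discovery logic described above can be sketched with the standard library: scan the plugin folders, read the optional plugin.json, and fall back to the folder name. The directory layout below is created only for the demo; in the real backend the plugins folder path would come from configuration:

```python
import json
import os
import tempfile

# Sketch of /plugins backend logic: optional plugin.json per folder,
# id always derived from the folder name.
def list_plugins(plugins_dir: str):
    plugins = []
    for folder in sorted(os.listdir(plugins_dir)):
        path = os.path.join(plugins_dir, folder)
        if not os.path.isdir(path):
            continue
        meta = {"id": folder, "name": folder, "description": ""}
        manifest = os.path.join(path, "plugin.json")
        if os.path.isfile(manifest):
            with open(manifest) as f:
                meta.update(json.load(f))
            meta["id"] = folder  # the id always stays the folder name
        plugins.append(meta)
    return plugins

# Tiny demo: one plugin with a manifest, one without.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "cool-plugin"))
with open(os.path.join(root, "cool-plugin", "plugin.json"), "w") as f:
    json.dump({"name": "MyCustomPlugin", "description": "Makes the cat cool af"}, f)
os.makedirs(os.path.join(root, "bare-plugin"))
plugins = list_plugins(root)
```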
Could it be useful to enable uploading CSV files?
Agent prompting should be customizable from API endpoints (and after that, from the user interface).
The first time you open the cat there is a set of possible prompts.
At later interactions it would be interesting to see frequent prompts (which are relevant to users)
Hi Piero, thanks for this project... it is wonderful.
Look, I have the following error in the UI.
But in the cmd terminal I see the correct answer.
The terminal throws the following error:
web | Traceback (most recent call last):
web | File "/app/./cat/main.py", line 57, in websocket_endpoint
web | cat_message = cheshire_cat(user_message)
web | File "/app/./cat/looking_glass.py", line 167, in __call__
web | episodic_memory_content = self.recall_memories_from_embedding(
web | File "/app/./cat/looking_glass.py", line 114, in recall_memories_from_embedding
web | memories = self.memory[collection].similarity_search_with_score_by_vector(
web | File "/usr/local/lib/python3.9/site-packages/langchain/vectorstores/faiss.py", line 151, in similarity_search_with_score_by_vector
web | scores, indices = self.index.search(np.array([embedding], dtype=np.float32), k)
web | File "/usr/local/lib/python3.9/site-packages/faiss/class_wrappers.py", line 329, in replacement_search
web | assert d == self.d
web | AssertionError
And GPT-4 says:
The error you are experiencing appears to be an AssertionError caused by a discrepancy in the dimension of the embedding vectors in the application you are using. Here's an explanation of the error in detail:
The solution to this issue will depend on the underlying cause of the discrepancy in the dimensions of the embedding vectors. Here are some ideas for troubleshooting the issue:
If after investigating these aspects you still cannot resolve the issue, consider reaching out to the developers of the application or the Faiss library for assistance or to report a possible bug.
After playing with the cat for a while, looks like the context memory gets full.
The frontend message is
"Something went wrong while sending your message. Please try refreshing the page"
but the backend log reports
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 12341 tokens. Please reduce the length of the messages.
I am not deep enough into the code yet to propose a pull request, but from the user perspective my suggestions are:
We moved to FAISS in #39 to make setup easier, but (my bad!) file-based vector stores do not allow prefiltering on metadata.
This means that the Cat cannot filter memories by metadata, which is key in many use cases.
With FAISS we are forced to get a lot of nearest neighbors in the hope they contain the correct metadata, then filter.
With Qdrant the neighbor search can be directly metadata-driven.
Let's bring back Qdrant (in its own container) as in earlier versions of the Cat.
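The FAISS workaround described above can be sketched without faiss itself: over-fetch nearest neighbors, then filter on metadata client-side. With Qdrant, the equivalent filter runs inside the search. The data and oversampling factor below are illustrative:

```python
# Over-fetch then filter: what we're forced to do with a file-based index.
def search_then_filter(neighbors, wanted_source, k=2, oversample=4):
    """neighbors: (score, metadata) pairs already sorted by similarity,
    as a nearest-neighbor index would return them."""
    candidates = neighbors[: k * oversample]  # over-fetch
    hits = [n for n in candidates if n[1].get("source") == wanted_source]
    return hits[:k]  # may return fewer than k -- the core weakness

neighbors = [
    (0.9, {"source": "user"}),
    (0.8, {"source": "document"}),
    (0.7, {"source": "user"}),
    (0.6, {"source": "user"}),
]
top = search_then_filter(neighbors, "user")
```

If the wanted metadata is rare, even a large oversample can miss it entirely, which is exactly why metadata-driven search inside the store is preferable.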
Anything you say or upload to the cat is vectorized and stored in a vector db.
It should be possible to read and delete memories via endpoints, since any vector has metadata on its source.