
arxivchatguru's Introduction

ArXiv ChatGuru: Exploring Conversational Scientific Literature 📖

Welcome to ArXiv ChatGuru. This tool harnesses LangChain and Redis to make ArXiv's vast collection of scientific papers more interactive. The goal is to make accessing and understanding research easier and more engaging, and also to teach how Retrieval Augmented Generation (RAG) systems work.

📖 How it Works

The diagram below shows how ArXiv ChatGuru works. The user submits a topic, which is used to retrieve relevant papers from ArXiv. These papers are chunked into smaller pieces, and embeddings are generated for each chunk. The embeddings are stored in Redis, which serves as the vector database. The user can then ask questions about the papers retrieved for their topic, and the system returns the most relevant answer.

(Reference architecture diagram)
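The flow above can be sketched with a tiny, dependency-free analogue. This is purely illustrative: the real app uses OpenAI embeddings and Redis vector search, while the bag-of-words "embedding" and in-memory list below are stand-ins.

```python
from collections import Counter
import math

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. "Papers" retrieved for a topic, already chunked into pieces
chunks = [
    "transformers use attention to weigh token relationships",
    "convolutional networks excel at image recognition tasks",
    "redis supports vector similarity search over embeddings",
]

# 2. Embed and store each chunk (the real app stores these in Redis)
store = [(chunk, embed(chunk)) for chunk in chunks]

# 3. Retrieve the chunk most similar to the user's question
question = "how does attention work in transformers"
best_chunk, _ = max(store, key=lambda item: cosine(embed(question), item[1]))
print(best_chunk)  # the attention chunk
```

In the real system, step 3's result is stuffed into the prompt sent to the LLM rather than returned directly.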

🛠 Components

  1. LangChain's ArXiv Loader: Efficiently pull scientific literature directly from ArXiv.
  2. Chunking + Embedding: Using LangChain, we segment lengthy papers into manageable pieces (currently chunked rather arbitrarily) and generate embeddings for each.
  3. Redis: Demonstrating fast and efficient vector storage, indexing, and retrieval for RAG.
  4. RetrievalQA: Building on LangChain's RetrievalQA and OpenAI models, users can write queries about papers retrieved by the topic they submit.
  5. Python Libraries: Making use of tools such as redisvl, LangChain, Streamlit, etc.
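A minimal illustration of the chunking step (item 2): fixed-size pieces with overlap, so a sentence cut at a boundary still appears intact in at least one chunk. This is a simplified analogue of LangChain's text splitters, not the app's actual code.

```python
def chunk_text(text, size=40, overlap=10):
    """Split text into fixed-size character chunks with overlap.

    Overlap means the tail of each chunk is repeated at the head of
    the next one, so context cut at a boundary isn't lost entirely.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

paper = "RAG systems retrieve relevant context before generation." * 3
pieces = chunk_text(paper, size=40, overlap=10)
print(len(pieces), "chunks")
```

Each chunk would then be embedded individually before being written to the vector store.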

💡 Learning Outcomes with ArXiv ChatGuru

  • Context Window Exploration: Learn about the importance of context window size and how it influences interaction results.
  • Vector Distance Insights: Understand the role of vector distance in context retrieval for RAG and see how adjustments can change response specificity.
  • Document Retrieval Dynamics: Observe how the number of documents retrieved influences the performance of a RAG system.
  • Using Redis as a Vector DB and Semantic Cache: Learn how to use Redis both as a vector database and as a semantic cache for RAG systems.
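The semantic-cache idea in the last bullet can be sketched without Redis: cache answers keyed by query embedding, and serve a hit when a new query is close enough to a cached one. redisvl provides this for real; the bag-of-words vectors and threshold below are purely illustrative.

```python
from collections import Counter
import math

def embed(text):
    """Toy 'embedding': bag-of-words term counts (illustrative only)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Return a cached answer when a new query is 'close enough'
    to a previously answered one, skipping the LLM call entirely."""
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query):
        q = embed(query)
        for vec, answer in self.entries:
            if cosine(q, vec) >= self.threshold:
                return answer  # cache hit
        return None            # cache miss: caller must query the LLM

    def put(self, query, answer):
        self.entries.append((embed(query), answer))

cache = SemanticCache(threshold=0.8)
cache.put("what is attention in transformers", "Attention weighs token relationships.")
print(cache.get("tell me what is attention in transformers"))  # near-duplicate: hit
print(cache.get("how do convolutional networks work"))         # unrelated: miss (None)
```

The payoff is that paraphrased repeats of a question never reach the LLM, cutting both latency and API cost.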

Note: This is not a production application; it's a learning tool more than anything. We're using Streamlit to make it easy to interact with, but the app isn't meant to scale. It's meant to help you understand how RAG systems work and how they can make scientific literature more interactive. We will continue to improve it over time.

🌟 If you love what we're doing, give us a star! Contributions and feedback are always welcome. 🌌🔭

Up Next

What we want to do next (ideas welcome!):

  • Pin stable versions of dependencies
  • Filters for Year, Author, etc.
  • More efficient chunking
  • More efficient embedding for semantic cache
  • Chat history and conversational memory (with LangChain)

Run the App

Run Locally

  1. First, clone this repo and cd into it.

    $ git clone https://github.com/RedisVentures/ArxivChatGuru.git && cd ArxivChatGuru
  2. Create your env file:

    $ cp .env.template .env

    Fill out the values; most importantly, set your OPENAI_API_KEY.

  3. Install dependencies: You should have Python 3.7+ installed and a virtual environment set up.

    $ pip install -r requirements.txt
  4. Run the app:

    $ streamlit run App.py
  5. Navigate to:

    http://localhost:8501/
    

Docker Compose

First, clone the repo like above.

  1. Create your env file:

    $ cp .env.template .env

    Fill out the values; most importantly, set your OPENAI_API_KEY.

  2. Run with docker compose:

    $ docker compose up

    Add the -d option to run the processes in the background if you wish.

    Issues with dependencies? Try force-building with --no-cache:

    $ docker compose build --no-cache
    
  3. Navigate to:

    http://localhost:8080/
    

arxivchatguru's People

Contributors

antonum, luisalrp, spartee, tylerhutcherson


arxivchatguru's Issues

Support Conversational Memory and multiple conversations

Users should be able to have multiple conversations, and within each conversation ArXiv ChatGuru should remember the context of the conversation.

This will involve changing from the RetrievalQA chain currently in use to one within LangChain that supports this. For multiple conversations, we should think about separate indices.
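The memory mechanism being requested can be sketched without LangChain: keep per-conversation history and fold it into each prompt, so the model can resolve references like "it" or "that paper". LangChain's conversational retrieval chains add question rewriting on top of this; the class below is illustrative, not LangChain's API.

```python
class ConversationMemory:
    """Keep (question, answer) turns and fold them into the next prompt."""
    def __init__(self):
        self.turns = []

    def add(self, question, answer):
        self.turns.append((question, answer))

    def build_prompt(self, question, context):
        # Prior turns go first so the model sees the conversation so far
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)
        return (
            f"Chat history:\n{history}\n\n"
            f"Retrieved context:\n{context}\n\n"
            f"Question: {question}\nAnswer:"
        )

memory = ConversationMemory()
memory.add("What is RAG?", "Retrieval Augmented Generation.")
prompt = memory.build_prompt("How does it use Redis?", "Redis stores the embeddings.")
print(prompt)
```

For multiple conversations, a dict of such memories keyed by conversation id (each possibly paired with its own index) would keep contexts separate.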

Can the vectorstore be extended incrementally?

Hi, in the current logic the vectorstore either reads an existing index or creates a new one. Can it support incrementally adding documents on top of the existing index, so that the vectorstore grows over time?

def create_vectorstore() -> Redis:
    """Create the Redis vectorstore."""

    embeddings = get_embeddings()

    try:
        vectorstore = Redis.from_existing_index(
            embedding=embeddings,
            index_name=INDEX_NAME,
            redis_url=REDIS_URL
        )
        return vectorstore
    except Exception:
        # Index doesn't exist yet; fall through and create it below
        pass

    # Load Redis with documents
    documents = get_documents()
    vectorstore = Redis.from_documents(
        documents=documents,
        embedding=embeddings,
        index_name=INDEX_NAME,
        redis_url=REDIS_URL
    )
    return vectorstore
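In principle yes: LangChain vectorstores expose an `add_documents` method, so when `from_existing_index` succeeds the store can be extended rather than recreated. A dependency-free sketch of the get-or-create-then-extend pattern (the in-memory class below is a stand-in for the Redis vectorstore, not its real implementation):

```python
class InMemoryVectorStore:
    """Stand-in for the Redis vectorstore, with LangChain-style methods."""
    _indices = {}  # index_name -> list of stored documents

    def __init__(self, index_name):
        self.index_name = index_name

    @classmethod
    def from_existing_index(cls, index_name):
        if index_name not in cls._indices:
            raise ValueError(f"Index {index_name!r} does not exist")
        return cls(index_name)

    @classmethod
    def from_documents(cls, documents, index_name):
        cls._indices[index_name] = list(documents)
        return cls(index_name)

    def add_documents(self, documents):
        # Extend the existing index instead of recreating it
        self._indices[self.index_name].extend(documents)

def get_or_create(index_name, documents):
    try:
        store = InMemoryVectorStore.from_existing_index(index_name)
        store.add_documents(documents)  # superimpose the new batch
    except ValueError:
        store = InMemoryVectorStore.from_documents(documents, index_name)
    return store

get_or_create("papers", ["doc1"])
store = get_or_create("papers", ["doc2"])
print(InMemoryVectorStore._indices["papers"])  # ['doc1', 'doc2']
```

Applied to `create_vectorstore`, this would mean calling `vectorstore.add_documents(get_documents())` in the success path instead of returning the existing index untouched.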

Need an explanation of how Redis search helps here

@tylerhutcherson Can you briefly explain what is happening, or share a reference link? My understanding so far: at app startup it loads the CSV into Redis, and when the user queries, it looks into Redis, constructs some tokens [this part is not clear], and fires a request to OpenAI for the response. How does Redis search help here?
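Roughly: Redis isn't just a plain key-value store here — its vector search finds the stored chunks closest to the query's embedding, and those chunks become the context portion of the prompt sent to OpenAI. A sketch of the prompt-construction step the question asks about (the function and wording are illustrative, not the app's exact code):

```python
def build_prompt(question, retrieved_chunks):
    """The 'tokens' sent to OpenAI are just this prompt: the question
    plus the top-k chunks Redis returned by vector similarity search."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = ["Chunk about attention.", "Chunk about Redis vectors."]
print(build_prompt("What is attention?", chunks))
```

Without the Redis search step, the model would have to answer from its training data alone; with it, the answer is grounded in the retrieved papers.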

ArXiv ChatGuru

Explains any research paper in accessible language.

Just send the title of the paper you need, and the neural network will extract all the data from the ArXiv portal, digest it, and give you a short, understandable summary. If that's still not enough, there is a chat where you can ask any questions on the topic.

Mathematics, physics, data science, and any other complex topics can now be mastered by anyone.

Redis ConnectionError

Dear sir
Please assist with this exception. Thank you.

redis.exceptions.ConnectionError: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app).
Traceback:
  File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
  File "/mount/src/arxivchatguru/app/app.py", line 117, in <module>
    create_arxiv_index(st.session_state['arxiv_topic'], st.session_state['num_papers'], prompt)
  File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 212, in wrapper
    return cached_func(*args, **kwargs)
  File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 241, in __call__
    return self._get_or_create_cached_value(args, kwargs)
  File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 267, in _get_or_create_cached_value
    return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
  File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 321, in _handle_cache_miss
    computed_value = self._info.func(*func_args, **func_kwargs)
  File "/mount/src/arxivchatguru/app/app.py", line 26, in create_arxiv_index
    arxiv_db = get_vectorstore(arxiv_documents)
  File "/mount/src/arxivchatguru/app/qna/db.py", line 43, in get_vectorstore
    vectorstore = RedisVDB.from_documents(
  File "/home/adminuser/venv/lib/python3.9/site-packages/langchain/schema/vectorstore.py", line 438, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/home/adminuser/venv/lib/python3.9/site-packages/langchain/vectorstores/redis/base.py", line 497, in from_texts
    instance, _ = cls.from_texts_return_keys(
  File "/home/adminuser/venv/lib/python3.9/site-packages/langchain/vectorstores/redis/base.py", line 414, in from_texts_return_keys
    instance = cls(
  File "/home/adminuser/venv/lib/python3.9/site-packages/langchain/vectorstores/redis/base.py", line 280, in __init__
    check_redis_module_exist(redis_client, REDIS_REQUIRED_MODULES)
  File "/home/adminuser/venv/lib/python3.9/site-packages/langchain/utilities/redis.py", line 49, in check_redis_module_exist
    installed_modules = client.module_list()
  File "/home/adminuser/venv/lib/python3.9/site-packages/redis/commands/core.py", line 5885, in module_list
    return self.execute_command("MODULE LIST")
  File "/home/adminuser/venv/lib/python3.9/site-packages/redis/client.py", line 533, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/home/adminuser/venv/lib/python3.9/site-packages/redis/connection.py", line 1086, in get_connection
    connection.connect()
  File "/home/adminuser/venv/lib/python3.9/site-packages/redis/connection.py", line 270, in connect
    raise ConnectionError(self._error_message(e))

docker compose up error!! help

arxivchatguru-redis-1 | 9:M 23 Oct 2023 05:59:20.357 * <redisgears_2> Failed loading RedisAI API.
arxivchatguru-redis-1 | 9:M 23 Oct 2023 05:59:20.357 * <redisgears_2> RedisGears v2.0.13, sha='c7993a0bb6f8e1a0a4b65cf44136bd7147967769', build_type='release', built_for='Linux-ubuntu22.04.x86_64'.
arxivchatguru-redis-1 |
arxivchatguru-redis-1 | 9:M 23 Oct 2023 05:59:20.360 * <redisgears_2> Registered backend: js.
arxivchatguru-redis-1 | 9:M 23 Oct 2023 05:59:20.360 * Module 'redisgears_2' loaded from /opt/redis-stack/lib/redisgears.so
arxivchatguru-redis-1 | 9:M 23 Oct 2023 05:59:20.360 * Server initialized
arxivchatguru-redis-1 | 9:M 23 Oct 2023 05:59:20.360 * Ready to accept connections tcp
streamlit | Usage: streamlit run [OPTIONS] TARGET [ARGS]...
streamlit | Try 'streamlit run --help' for help.
streamlit |
streamlit |
streamlit | Error: Invalid value: File does not exist: App.py
streamlit exited with code 2

Re-enable use of other LLMs

In the previous refactor we took out

  • Azure
  • HuggingFace

We should re-enable these and add

  • local model support (llama, etc)

redis error

Hi, I'm receiving this error:

| redis.exceptions.ConnectionError: Error -2 connecting to your_redis_instance.your_region.redisenterprise.cache.azure.net:10000. Name or service not known.

Based on my understanding, this error suggests that a Redis setup is required in Azure. However, I couldn't find any information about this infrastructure in the project's readme. I'm also curious about the presence of a Redis Docker container in the setup and how it relates to the Azure setup. I mean, it should work with the local Redis and shouldn't need an Azure one, right?

Regards.
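You're right that no Azure instance should be needed. The app connects wherever REDIS_URL points (it's read from the env file), so a likely fix — not verified against your exact setup — is to aim it at a local Redis Stack instead of the Azure hostname left in the configuration:

```shell
# In .env: point the app at a local Redis instead of the Azure endpoint
REDIS_URL=redis://localhost:6379

# If you aren't using the bundled docker compose, run Redis Stack yourself
# (the search module it ships is needed for vector search):
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack-server:latest
```

With docker compose, the bundled Redis container should already serve this role and no Azure resources are involved.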

Connection error 10061

Windows 10, running locally. The error appears after submitting a chat prompt.

Full text:

2023-10-25 01:46:49.561 Uncaught app exception
Traceback (most recent call last):
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 263, in _get_or_create_cached_value
    cached_result = cache.read_result(value_key)
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\streamlit\runtime\caching\cache_resource_api.py", line 500, in read_result
    raise CacheKeyNotFoundError()
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 311, in _handle_cache_miss
    cached_result = cache.read_result(value_key)
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\streamlit\runtime\caching\cache_resource_api.py", line 500, in read_result
    raise CacheKeyNotFoundError()
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\redis\connection.py", line 264, in connect
    sock = self.retry.call_with_retry(
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\redis\retry.py", line 46, in call_with_retry
    return do()
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\redis\connection.py", line 265, in <lambda>
    lambda: self._connect(), lambda error: self.disconnect(error)
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\redis\connection.py", line 628, in _connect
    raise err
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\redis\connection.py", line 616, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 541, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\exore\Desktop\ArxivChatGuru\app\app.py", line 116, in <module>
    create_arxiv_index(st.session_state['arxiv_topic'], st.session_state['num_papers'], prompt)
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 211, in wrapper
    return cached_func(*args, **kwargs)
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 240, in __call__
    return self._get_or_create_cached_value(args, kwargs)
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 266, in _get_or_create_cached_value
    return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 320, in _handle_cache_miss
    computed_value = self._info.func(*func_args, **func_kwargs)
  File "C:\Users\exore\Desktop\ArxivChatGuru\app\app.py", line 26, in create_arxiv_index
    arxiv_db = get_vectorstore(arxiv_documents)
  File "C:\Users\exore\Desktop\ArxivChatGuru\app\qna\db.py", line 43, in get_vectorstore
    vectorstore = RedisVDB.from_documents(
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\langchain\schema\vectorstore.py", line 438, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\langchain\vectorstores\redis\base.py", line 497, in from_texts
    instance, _ = cls.from_texts_return_keys(
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\langchain\vectorstores\redis\base.py", line 414, in from_texts_return_keys
    instance = cls(
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\langchain\vectorstores\redis\base.py", line 280, in __init__
    check_redis_module_exist(redis_client, REDIS_REQUIRED_MODULES)
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\langchain\utilities\redis.py", line 49, in check_redis_module_exist
    installed_modules = client.module_list()
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\redis\commands\core.py", line 5885, in module_list
    return self.execute_command("MODULE LIST")
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\redis\client.py", line 533, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\redis\connection.py", line 1087, in get_connection
    connection.connect()
  File "C:\Users\exore\anaconda3\envs\ArxivChatGuru\lib\site-packages\redis\connection.py", line 270, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 10061 connecting to localhost:6379. No connection could be made because the target machine actively refused it.
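Error 10061 generally means nothing is listening on localhost:6379, i.e. Redis isn't running. A likely fix on Windows with Docker Desktop (a suggestion, not verified against this exact setup):

```shell
# Start a local Redis Stack (its search module is required for vector search)
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack-server:latest

# Verify the container is up before relaunching the Streamlit app
docker ps --filter name=redis-stack
```

If Redis runs elsewhere, adjust REDIS_URL in .env to match.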

Support more LLM controls

Right now the app only supports changing the number of tokens. Given that this is largely meant to be an app for learning how to use vector databases and LLMs with LangChain, we should support parameters like temperature, and also allow changing the underlying prompt.

api access error

I have placed my key in the .env.template file and yet received this error:
openai.error.AuthenticationError: Incorrect API key provided: ADD_YOUR*****HERE.

FIX: cp command was missed
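As the issue title suggests, the missing step is the copy: the app reads variables from .env, not .env.template, which is why the placeholder key was sent to OpenAI. For example:

```shell
# Create the real env file from the template, then edit the copy
cp .env.template .env

# In .env, replace the placeholder with your actual key:
# OPENAI_API_KEY=sk-...
```

Leave .env.template untouched so it keeps serving as the committed template.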
