gabrielchua / ragxplorer

972.0 11.0 88.0 1.09 MB

Open-source tool to visualise your RAG 🔮

License: MIT License

Python 7.47% Jupyter Notebook 92.53%

llm python rag streamlit visualization interactive

ragxplorer's Introduction

🔥 AI Events in Singapore 🇸🇬

ragxplorer's People

Contributors

Stargazers

Watchers

Forkers

alexfilothodoros thefig06 aiwaldoh baha-arfaoui danikagupta gryhkn vipmaha1 cesarcalvocobo polya20 codeaudit oxfordoutlander fly2fire ipriyaaanshu humphreymburu aniruddha-adhikary bytequester awellis auxon manu87ds borhanmukto bnassivet sri-awadh chandan0000 abhiwins jiaweilinx rhinojosa simjak thevhd menonpg n8bwert liric24 arrietafernando cobusgreyling techthiyanes aairom a-govil ai-jie01 pas-mllr marcoschaarbr joeaelkhoury eltociear mbrukman eputnam77 pengjinning djoguns aicodehunt kekewind yanxg abhishekmani12 tedsecretsource ayunillariy saidimu lexsf quanticoi ruanwz krishnagopika jgalego jonnyb111 geekcheng dan-s-mueller mentordotgit qinwentu startime-h jrseow-ro mohamedeladib isayahj sundogs8603 allthingsllm glareone edwardburns datakult0r jarekmor balakreshnan karthikrchandran carcruz97 vince-lam ibrahimroshdy doctorslimm david-ver4 dupsys ahmetinceelli jacky68147527 ego miladansari csujeong

ragxplorer's Issues

Add a link or button to reset the form so you can upload a new document after analyzing an initial document

Is your feature request related to a problem? Please describe.
Currently, once you upload and analyze a document, you have to refresh the application to analyze a new document. This is a request to add a button or link to the form so you can more easily analyze a new document.

Describe the solution you'd like
Add a link to the upper left-hand corner (in the header) to go back to the main form.

Describe alternatives you've considered
Display the whole form below the graph.
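
One way the reset could work, sketched under assumptions: the app is a Streamlit script that keeps the uploaded document and results in `st.session_state`. The key names in `FORM_KEYS` and both helper functions are hypothetical, not taken from the project, and `st.rerun()` needs a recent Streamlit release (older releases expose `st.experimental_rerun()` instead).

```python
from typing import Iterable, MutableMapping

# Hypothetical session keys; the real app may store its state under other names.
FORM_KEYS = ("uploaded_file", "query", "chart_data")


def reset_form_state(state: MutableMapping, keys: Iterable[str] = FORM_KEYS) -> None:
    """Remove form-related entries so the upload form is rendered again."""
    for key in keys:
        if key in state:
            del state[key]


def render_reset_button() -> None:
    """Draw a reset button; meant to be called from the app's header area."""
    import streamlit as st  # imported lazily so reset_form_state stays testable

    if st.button("Analyze a new document"):
        reset_form_state(st.session_state)
        # Assumption: recent Streamlit. Older versions use st.experimental_rerun().
        st.rerun()
```

Keeping the state cleanup in a plain function means it can be checked with an ordinary dict, without starting the app.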

Data Privacy

Details

Please help us understand whether PDFs uploaded to the hosted web app are stored somewhere or deleted as soon as the session ends.

Retrieved IDs unrelated to user query

Description

RAGxplorer returns seemingly random results for a query (see image). This happens because chromadb returns documents ordered lexicographically by ID ("1", "10", "100" and so on) rather than numerically ("0", "1", "2").

Configuration

Document: All Amazon Shareholder Letters.pdf
Embedding Model: all-MiniLM-L6-v2
Chunk Size: 256
Chunk Overlap: 0
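
The ordering problem described above can be reproduced without chromadb. This is a minimal sketch that only shows why string IDs sort as "1", "10", "11" and how looking results up by ID, rather than by position, avoids the mismatch. The variable names and the retrieved IDs are illustrative, and this is not necessarily the fix adopted in the project.

```python
ids = [str(i) for i in range(12)]

# Strings are compared character by character, so "10" sorts before "2".
lexicographic = sorted(ids)
# Sorting by integer value restores the insertion order.
numeric = sorted(ids, key=int)

# Safer than relying on position: map each ID to the row it actually occupies.
# `returned_ids` stands in for the IDs a vector store hands back in its own order.
returned_ids = lexicographic
position_of = {doc_id: pos for pos, doc_id in enumerate(returned_ids)}
retrieved = ["3", "10"]  # hypothetical IDs retrieved for a query
retrieved_positions = [position_of[doc_id] for doc_id in retrieved]

print(lexicographic[:4])  # ['0', '1', '10', '11']
print(numeric[:4])        # ['0', '1', '2', '3']
```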

Write Tests

The graph has low accessibility (dark blue dots on a black background)

Describe the bug
The contrast ratio of the graph is low, which makes it hard to read and reduces accessibility.

To Reproduce
Steps to reproduce the behavior:

Upload a file
Enter search terms and press Enter
The graph displays on a black background with dark blue dots representing the vectors

Expected behavior
The contrast ratio between the dots and the background should be high enough to make the distribution easy to see.
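
To make "contrast ratio" measurable, the WCAG 2.x formula can be computed directly. This is a sketch, not project code: `#00008B` is an assumed stand-in for the plot's dark blue, since the issue does not give the actual colour values.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a colour given as '#RRGGBB'."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1 (identical colours) to 21 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Assumed colours: dark blue markers on a black background.
print(round(contrast_ratio("#00008B", "#000000"), 2))
```

WCAG 2.1 asks for at least 3:1 for meaningful graphical elements, which gives a concrete target when choosing a new marker colour.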

Add more dimensionality reduction techniques

Is your feature request related to a problem? Please describe.

The package currently only applies UMAP dimensionality reduction. More dimensionality reduction techniques, like t-SNE and PCA, could be added to ragxplorer to improve functionality.

Describe the solution you'd like

Add t-SNE and PCA dimensionality reduction techniques to the package by updating the projections.py and ragxplorer.py scripts.

An additional parameter of dim_reduct will be added to the load_pdf and visualize_query methods. This parameter will have a default argument of UMAP and can also take t-SNE and PCA as inputs.
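
A rough sketch of how such a `dim_reduct` switch might be dispatched. The function name `make_reducer` and its `n_components` and `random_state` parameters are assumptions for illustration; only the `dim_reduct` name and its three values come from the proposal. It relies on `umap-learn` and `scikit-learn` being installed for the respective branches.

```python
def make_reducer(dim_reduct: str = "UMAP", n_components: int = 2, random_state: int = 0):
    """Return an unfitted reducer exposing fit_transform() for the chosen technique."""
    # Accept "t-SNE", "tsne", "TSNE" and so on.
    method = dim_reduct.strip().lower().replace("-", "")
    if method == "umap":
        from umap import UMAP  # provided by the umap-learn package

        return UMAP(n_components=n_components, random_state=random_state)
    if method == "tsne":
        from sklearn.manifold import TSNE

        return TSNE(n_components=n_components, random_state=random_state)
    if method == "pca":
        from sklearn.decomposition import PCA

        return PCA(n_components=n_components, random_state=random_state)
    raise ValueError(f"Unsupported dim_reduct {dim_reduct!r}; expected UMAP, t-SNE or PCA")
```

One design point to keep in mind: UMAP and PCA can project new points with `transform()` after fitting, whereas scikit-learn's `TSNE` only offers `fit_transform()`, so a query embedding would have to be projected together with the document chunks.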

Feedback and Suggestions to Improve this Project

First and foremost, I want to express my heartfelt thanks to all of you for showing interest in this project. It's incredibly humbling and exciting to see others taking notice of something I built.

As this is my first time writing code that's being used by others, I am keenly aware that there's a lot I can learn and many ways in which the project can be improved. That's where I need your help!

I'm looking for suggestions on how best to carry this project forward and organize the code more effectively. If you have any ideas, best practices, or tips, please don't hesitate to share. Your insights will be invaluable in making this a better and more user-friendly project.

I also ask for your patience and understanding regarding the current state of the code. I'm aware that it may not be up to professional standards yet, and I'm fully committed to learning and improving. Any constructive feedback or advice in this regard would be greatly appreciated.

Please feel free to post your suggestions, feedback, or any questions you might have as responses to this issue. I'm looking forward to reading your input and engaging in discussions that can lead to the betterment of this project.

Install, nothing worked

When I ran

pip install -r xxxxxx

nothing worked:

ERROR: Could not find a version that satisfies the requirement pysqlite3-binary (from versions: none)
ERROR: No matching distribution found for pysqlite3-binary
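
One plausible cause, assuming the install ran on macOS or Windows: as far as I know, `pysqlite3-binary` ships wheels only for Linux, so pip finds no matching distribution elsewhere. A requirements entry such as `pysqlite3-binary; sys_platform == "linux"` would skip it on other platforms. The sketch below shows the matching runtime side: use `pysqlite3` when it is installed and fall back to the standard library otherwise. The helper name `load_sqlite` is hypothetical.

```python
import importlib
import sys


def load_sqlite():
    """Return a sqlite3-compatible module, preferring pysqlite3 when available."""
    try:
        # Swap in pysqlite3 so later `import sqlite3` statements pick it up.
        sys.modules["sqlite3"] = importlib.import_module("pysqlite3")
    except ImportError:
        # Not installed (for example on macOS or Windows): keep the stdlib module.
        pass
    import sqlite3

    return sqlite3


sqlite = load_sqlite()
print(sqlite.sqlite_version)  # version of the SQLite library actually in use
```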

Sweep: RAGxplorer Demo link in README not working

Details

README.txt: The demo URL link is not working. Replace it with a working URL.

Checklist

Modify README.md ✓ 79dd9e0
Running GitHub Actions for README.md ✓

Create tutorials

Jupyter Notebook style

Pinecone support

Re-Build Streamlit App

Is your feature request related to a problem? Please describe.

With the next version, the streamlit app has been removed.

Describe the solution you'd like

To add back the streamlit app

Describe alternatives you've considered

To have the streamlit app in another repo

Adding support for other Sentence Transformers

https://github.com/gabrielchua/RAGxplorer/blob/fd2c3706df667437bd29b53e5419bfccf77312b9/ragxplorer/ragxplorer.py#L81C44-L81C80

Can we add an option to use other sentence transformers via the SentenceTransformerEmbeddingFunction class from chromadb.utils.embedding_functions?
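
A minimal sketch of what that option could look like, assuming `chromadb` and `sentence-transformers` are installed. The wrapper `build_embedding_function` is a hypothetical helper, not part of RAGxplorer; it only forwards a model name to the chromadb class mentioned above.

```python
def build_embedding_function(model_name: str = "all-MiniLM-L6-v2"):
    """Create a chromadb embedding function for a Sentence Transformers model."""
    # Lazy import: chromadb and sentence-transformers are only needed when called.
    from chromadb.utils import embedding_functions

    return embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)
```

A caller could then request another model, for example `build_embedding_function("all-mpnet-base-v2")`, while the default keeps the current behaviour.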

Add ability to save visualization data and connect to existing database.

Is your feature request related to a problem? Please describe.
Having to load a PDF every time you use the tool can take a long time. It is also useful to look at multiple PDFs. Furthermore, since UMAP is stochastic, having the ability to reproduce results would be helpful for those performing studies.

Describe the solution you'd like
Add functionality to connect to existing chromadb databases.
Add export functionality for the UMAP function and projections, which can be re-read in.
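
As an illustration of the export half of this request, projections can be written to disk together with the random seed that produced them. This is a sketch with hypothetical function names and file layout, not the approach used in the fork mentioned below. For the other half, chromadb's `PersistentClient(path=...)` can reopen a database stored on disk.

```python
import json
from pathlib import Path
from typing import Dict, List, Sequence, Tuple


def save_projections(
    path: str,
    ids: Sequence[str],
    coords: Sequence[Sequence[float]],
    random_state: int,
) -> None:
    """Write 2D projections, keyed by chunk ID, plus the seed used to create them."""
    payload = {
        "random_state": random_state,
        "projections": {doc_id: [float(v) for v in xy] for doc_id, xy in zip(ids, coords)},
    }
    Path(path).write_text(json.dumps(payload))


def load_projections(path: str) -> Tuple[int, Dict[str, List[float]]]:
    """Read back the seed and the ID-to-coordinates mapping."""
    payload = json.loads(Path(path).read_text())
    return payload["random_state"], payload["projections"]
```

Saving the coordinates covers re-plotting. Projecting new queries into the same space would also require persisting the fitted reducer itself, and fixing `random_state` when fitting UMAP helps make reruns repeatable.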

Describe alternatives you've considered
I have created a forked version of this repository which performs these tasks: https://github.com/dsmueller3760/RAGxplorer/tree/load_db (under load_db branch).

Additional context
N/A

Add support for huggingface embedding models

Is your feature request related to a problem? Please describe.
A user currently cannot add their own embedding models.

Describe the solution you'd like
Integrate with Hugging Face so that users can supply their own embedding models.

Additional context

See embeddings.py in the experiment branch
This seems promising: https://docs.trychroma.com/embeddings/hugging-face
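
Following the Chroma documentation linked above, a hedged sketch of how a user-chosen Hugging Face model might be plugged in. The helper name and the `HF_API_KEY` environment variable are assumptions made for this example; `HuggingFaceEmbeddingFunction` is the chromadb class, which calls the hosted Hugging Face API and therefore needs an API key.

```python
import os
from typing import Optional


def build_hf_embedding_function(
    model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
    api_key: Optional[str] = None,
):
    """Create a chromadb embedding function backed by the Hugging Face API."""
    # HF_API_KEY is a hypothetical variable name chosen for this sketch.
    api_key = api_key or os.environ.get("HF_API_KEY")
    if not api_key:
        raise RuntimeError("Provide api_key or set the HF_API_KEY environment variable")

    # Lazy import: chromadb is only required once a key is available.
    from chromadb.utils import embedding_functions

    return embedding_functions.HuggingFaceEmbeddingFunction(
        api_key=api_key, model_name=model_name
    )
```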
