gabrielchua / ragxplorer
Open-source tool to visualise your RAG 🔮
License: MIT License
Is your feature request related to a problem? Please describe.
Currently, once you upload and analyze a document, you have to refresh the application to analyze a new document. This is a request to add a button or link to the form so you can more easily analyze a new document.
Describe the solution you'd like
Add a link to the upper left-hand corner (in the header) to go back to the main form.
Describe alternatives you've considered
Display the whole form below the graph.
Are PDFs uploaded to the hosted web app stored somewhere, or are they deleted as soon as the session ends?
RAGxplorer returns seemingly random results to a query (see image). This happens because chromadb returns documents ordered lexicographically by ID ("1", "10", "100" and so on) rather than numerically ("0", "1", "2").
all-MiniLM-L6-v2
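The ordering problem described above can be reproduced without chromadb. The sketch below assumes the IDs are decimal strings, as in the report; `reorder_by_numeric_id` is a hypothetical helper, not part of RAGxplorer or chromadb, that restores numeric order before the documents are matched to their projections.

```python
def reorder_by_numeric_id(ids, documents):
    """Return (ids, documents) reordered so the string IDs ascend numerically.

    Assumes every ID is a decimal string such as "0", "1", "10".
    """
    pairs = sorted(zip(ids, documents), key=lambda pair: int(pair[0]))
    sorted_ids = [doc_id for doc_id, _ in pairs]
    sorted_docs = [doc for _, doc in pairs]
    return sorted_ids, sorted_docs


# String IDs sort lexicographically by default, which interleaves them.
ids = [str(i) for i in range(12)]
lexicographic = sorted(ids)          # "0", "1", "10", "11", "2", ...
documents = [f"chunk {doc_id}" for doc_id in lexicographic]

fixed_ids, fixed_docs = reorder_by_numeric_id(lexicographic, documents)
print(lexicographic)
print(fixed_ids)
```

An alternative that avoids re-sorting is to zero-pad the IDs when the chunks are first added (for example `f"{i:06d}"`), so that lexicographic and numeric order coincide.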
Describe the bug
The contrast ratio of the graph is somewhat low, which reduces accessibility.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The contrast ratio between the dark blue dots and the background should be high enough that the distribution is easy to see.
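One way to make the contrast request measurable is the WCAG contrast ratio between the marker colour and the plot background. The sketch below implements that formula; the two hex colours in the usage lines are hypothetical placeholders, since the report does not state the colours the plot actually uses.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a colour given as '#rrggbb'."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (identical colours) to 21 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical colours: a dark blue marker on a dark background.
marker, background = "#1f3b73", "#0e1117"
print(f"{contrast_ratio(marker, background):.2f}:1")
```

WCAG 2.1 asks for at least 3:1 for graphical objects, which could serve as a target when choosing the marker and background colours.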
Is your feature request related to a problem? Please describe.
The package currently only applies UMAP dimensionality reduction. More dimensionality reduction techniques, like t-SNE and PCA, could be added to ragxplorer to improve functionality.
Describe the solution you'd like
Add t-SNE and PCA dimensionality reduction techniques to the package by updating the projections.py and ragxplorer.py scripts.
An additional parameter, dim_reduct, will be added to the load_pdf and visualize_query methods. This parameter will default to UMAP and can also take t-SNE and PCA as inputs.
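A minimal sketch of how the proposed dim_reduct parameter could select a reducer is shown below. The function name `make_reducer` and its other arguments are hypothetical and are not taken from projections.py; the sketch assumes scikit-learn and umap-learn are installed, and imports them lazily so that only the chosen backend is required.

```python
def make_reducer(dim_reduct="UMAP", n_components=2, random_state=0):
    """Return an unfitted reducer for the requested technique.

    Accepts "UMAP", "t-SNE" or "PCA" (case-insensitive). Every returned
    object exposes fit_transform(embeddings).
    """
    method = dim_reduct.lower().replace("-", "")
    if method == "umap":
        import umap  # provided by the umap-learn package
        return umap.UMAP(n_components=n_components, random_state=random_state)
    if method == "tsne":
        from sklearn.manifold import TSNE
        return TSNE(n_components=n_components, random_state=random_state)
    if method == "pca":
        from sklearn.decomposition import PCA
        return PCA(n_components=n_components, random_state=random_state)
    raise ValueError(
        f"Unknown dim_reduct {dim_reduct!r}; expected 'UMAP', 't-SNE' or 'PCA'."
    )
```

With the libraries installed, `make_reducer("PCA").fit_transform(embeddings)` would return a two-column array. One design point to keep in mind: scikit-learn's TSNE offers fit_transform but no transform for unseen points, so a query embedding would have to be projected together with the document embeddings rather than after the fit.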
First and foremost, I want to express my heartfelt thanks to all of you for showing interest in this project. It's incredibly humbling and exciting to see others taking notice of something I built.
As this is my first time writing code that's being used by others, I am keenly aware that there's a lot I can learn and many ways in which the project can be improved. That's where I need your help!
I'm looking for suggestions on how best to carry this project forward and organize the code more effectively. If you have any ideas, best practices, or tips, please don't hesitate to share. Your insights will be invaluable in making this a better and more user-friendly project.
I also ask for your patience and understanding regarding the current state of the code. I'm aware that it may not be up to professional standards yet, and I'm fully committed to learning and improving. Any constructive feedback or advice in this regard would be greatly appreciated.
Please feel free to post your suggestions, feedback, or any questions you might have as responses to this issue. I'm looking forward to reading your input and engaging in discussions that can lead to the betterment of this project.
When I ran
pip install -r xxxxxx
the installation failed:
ERROR: Could not find a version that satisfies the requirement pysqlite3-binary (from versions: none)
ERROR: No matching distribution found for pysqlite3-binary
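The report does not say which platform was used. One possible cause, stated here as an assumption, is that pysqlite3-binary publishes no wheel for that platform. If the package is only there to give chromadb a newer SQLite build, which is also an assumption about this project, the swap can be made optional so that the application still starts when the package is absent:

```python
import sys

try:
    # pysqlite3-binary installs a module named pysqlite3 that bundles SQLite.
    import pysqlite3

    # Make later "import sqlite3" statements resolve to the bundled build.
    sys.modules["sqlite3"] = pysqlite3
except ImportError:
    # Package not installed or unavailable here: use the standard library.
    pass

import sqlite3

print("SQLite version in use:", sqlite3.sqlite_version)
```

This only helps when the standard-library SQLite is recent enough for the other dependencies; otherwise the installation error needs to be solved on the platform itself.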
Jupyter Notebook style
Is your feature request related to a problem? Please describe.
Describe the solution you'd like
Describe alternatives you've considered
Can we add an option to use other sentence transformers through the SentenceTransformerEmbeddingFunction class from chromadb.utils.embedding_functions?
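A rough sketch of what such an option might look like follows. `make_embedding_function` and its `factory` argument are hypothetical names introduced for illustration; the only chromadb call used is SentenceTransformerEmbeddingFunction with its model_name argument, imported lazily so the sketch loads without chromadb installed.

```python
def make_embedding_function(model_name="all-MiniLM-L6-v2", factory=None):
    """Build an embedding function for the given sentence-transformers model.

    factory is a hypothetical hook: any callable accepting model_name=...
    It defaults to chromadb's SentenceTransformerEmbeddingFunction.
    """
    if not model_name or not model_name.strip():
        raise ValueError("model_name must be a non-empty string")
    if factory is None:
        # Lazy import: chromadb is only needed when no factory is supplied.
        from chromadb.utils import embedding_functions

        factory = embedding_functions.SentenceTransformerEmbeddingFunction
    return factory(model_name=model_name.strip())
```

With chromadb and sentence-transformers installed, `make_embedding_function("all-mpnet-base-v2")` would return an object that can be passed as the embedding_function of a collection. The same model must be used for both the documents and the query, otherwise the projected points are not comparable.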
Is your feature request related to a problem? Please describe.
Having to load a PDF every time you use the tool can take a long time. It is also useful to look at multiple PDFs. Furthermore, since UMAP is stochastic, having the ability to reproduce results would be helpful for those performing studies.
Describe the solution you'd like
Add functionality to connect to existing chromadb databases.
Add export functionality for the fitted UMAP transform and its projections, so that both can be loaded again later.
Describe alternatives you've considered
I have created a forked version of this repository which performs these tasks: https://github.com/dsmueller3760/RAGxplorer/tree/load_db (under load_db branch).
Additional context
N/A
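The export part of this request can be sketched with the standard library alone. The helpers below are hypothetical and are not taken from the fork linked above; they store the 2-D projections together with the random seed, on the assumption that the UMAP fit was seeded (umap-learn accepts a random_state argument), so that a later session can either reload the points or repeat the fit.

```python
import json
from pathlib import Path


def save_projections(path, ids, projections, random_state):
    """Write chunk IDs, their 2-D coordinates and the seed to a JSON file."""
    payload = {
        "random_state": random_state,
        "ids": list(ids),
        "projections": [[float(x) for x in point] for point in projections],
    }
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_projections(path):
    """Read back (ids, projections, random_state) from a saved file."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return payload["ids"], payload["projections"], payload["random_state"]
```

A typical round trip would call `save_projections("projections.json", ids, points, random_state=42)` after the first analysis and `load_projections("projections.json")` in later sessions, skipping both the PDF load and the UMAP fit.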
Is your feature request related to a problem? Please describe.
A user currently cannot add their own embedding models.
Describe the solution you'd like
Integrate with Hugging Face.
Additional context
embeddings.py in the experiment branch