Comments (15)
Well it's not working very well with tinyllama, but regardless :)
from lumos.
Upping to Phi2 seemed a bit better fwiw
I was just doing quick checks like "what did user x say?" on a page of comments, and it was not getting things correct. TBH, I'd have to compare results against a SoTA embedding/LLM pairing to get more calibrated expectations for specific queries like that.
In any case, I think being able to set an embedding model to a smaller model for responsiveness could be a useful thing.
250ms vs 100ms per chunk is substantial.
nomic flies!
Disjoint musings:

- You could leave it in there, just disabled/hidden, I suppose, if you want to save the work. More generally, I'm a fan of feature flags, since branches bit-rot.
- Given this is a developer tool, and people need to build it anyway, `process.env.LUMOS_EMBEDDING_MODEL` ought to suffice for people who just want to try out different embedding models.
- You could also call out to users to weigh in at the relevant Ollama issue.
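The `process.env` idea could be sketched roughly like this (illustrative only: aside from `LUMOS_EMBEDDING_MODEL` itself, the names here are made up, and a Webpack-style build would inline the env var at build time):

```typescript
// Hypothetical sketch: choose the embedding model from a build-time env var,
// falling back to the main chat model when no override is set.
// DEFAULT_MODEL and getEmbeddingModel are illustrative names, not Lumos code.
const DEFAULT_MODEL = "llama2";

function getEmbeddingModel(env: Record<string, string | undefined>): string {
  // A bundler (e.g. Webpack DefinePlugin) would substitute process.env
  // values at build time, so this stays a plain string lookup.
  return env.LUMOS_EMBEDDING_MODEL ?? DEFAULT_MODEL;
}

console.log(getEmbeddingModel({ LUMOS_EMBEDDING_MODEL: "nomic-embed-text" }));
console.log(getEmbeddingModel({}));
```

Since the extension has to be built locally anyway, this gives experimenters a switch without adding anything to the UI.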
This is potentially relevant:
ollama/ollama#2848
> Well it's not working very well with tinyllama, but regardless :)
Interesting idea. By "not working very well", do you mean the retrieval/search results were bad and resulted in a bad response overall?
Got it, thanks for clarifying. I'm wondering if this approach in combination with some retrieval/search optimization could make a difference. I haven't looked into it too deeply yet though.
> retrieval/search optimization
I don't have any real experience with RAG yet, so I've "got nothing"
I assume you meant something more like keyword search to more quickly find the relevant chunks?
I wonder if you could develop some kind of special query syntax for that, shall we say, mode?
Which makes me further wonder if you'd ever use a combination of "classical" search techniques along with vector similarity?
One or the other, or both, and how that would inform said syntax.
The tricky thing about this, compared to "normal" RAG, is the desire (requirement?) for quick responses. Typically all the embedding is done well before, right? Other than shared embeddings (non-trivial technical/political challenge) or keyword/stem-word search, I'm not sure what you can do.
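The keyword/stem-word route mused about above could be as simple as a token-overlap pre-filter that narrows the page's chunks before (or instead of) any embedding call. A minimal sketch, with all names and scoring made up for illustration:

```typescript
// Hypothetical sketch: rank chunks by how many query tokens they contain.
// Cheap enough to run per keystroke; no model call needed.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Keep the top-k chunks by query-token overlap.
function keywordFilter(query: string, chunks: string[], k = 3): string[] {
  const queryTokens = tokenize(query);
  return chunks
    .map((chunk) => {
      const chunkTokens = tokenize(chunk);
      let overlap = 0;
      for (const w of queryTokens) if (chunkTokens.has(w)) overlap++;
      return { chunk, overlap };
    })
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

For a query like "what did alice say", a chunk containing "alice" would surface immediately, which is exactly the quick-lookup case where waiting on per-chunk embeddings hurts.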
https://ollama.com/library/nomic-embed-text
https://ollama.com/library/all-minilm
I just gave nomic a quick test. Lightning fast! I'm tempted to just hardcode it (and fall back to the main model if it's not available). I'm hesitant to expose a separate configuration for the embedding model because of option fatigue. What do you think?
> Which makes me further wonder if you'd ever use a combination of "classical" search techniques along with vector similarity?
Separately, I'm working on adding a "classical" keyword search (and hybrid search) to the RAG workflow. Check this out: #101.
There will be a few other small improvements to the RAG implementation as well.
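For the hybrid idea, one common shape is a weighted blend of vector cosine similarity and a normalized keyword score. This is just an illustrative sketch; PR #101 may well use a different scheme (e.g. BM25 with reciprocal rank fusion), and every name here is made up:

```typescript
// Hypothetical sketch of hybrid scoring for RAG retrieval.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Blend the two signals: alpha = 1 is pure vector search,
// alpha = 0 is pure keyword search. Both inputs assumed in [0, 1].
function hybridScore(vectorSim: number, keywordScore: number, alpha = 0.5): number {
  return alpha * vectorSim + (1 - alpha) * keywordScore;
}
```

The `alpha` knob is one answer to the syntax question above: rather than a special query mode, a single weight decides how much each technique contributes per query.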
> option fatigue
You could just go with `process.env` to start with if that's a concern. That would allow folks to customize without needing to manage branches. Ollama only has two embedding models at the moment, but what about later?
Here's an open PR with the functionality to switch the embedding model: #105
After testing, I'm finding that it's actually quite slow to switch between models. Ollama only stores 1 model in memory, so every prompt requires unloading the previous model and reloading the embedding model, and then the opposite immediately. There's an open issue in the Ollama repo (ollama/ollama#976) addressing this (and a few closed ones with workarounds). I'm not sure where this is on the priority list for them.
I'm not sure if I'll merge the PR. Net-net, it doesn't seem like a significant improvement to the user experience (yet).
Ollama v0.1.28 has a bug fix to stop Ollama from hanging when switching models. I'll test this out with my open PR.