Comments (15)
Well it's not working very well with tinyllama, but regardless :)
from lumos.
Upping to Phi2 seemed a bit better fwiw
I was just doing quick checks like "what did user x say?" on a page of comments, and it was not getting things correct. TBH, I'd have to compare results against a SoTA embedding/LLM pairing to get more calibrated expectations for specific queries like that.
In any case, I think being able to set an embedding model to a smaller model for responsiveness could be a useful thing.
250ms vs 100ms per chunk is substantial.
nomic flies!
Disjoint musings:

- You could leave it in there, just disabled/hidden, I suppose, if you want to save the work. More generally, I'm a fan of feature flags, since branches bit-rot.
- Given this is a developer tool, and people need to build it anyway, `process.env.LUMOS_EMBEDDING_MODEL` ought to suffice for people who just want to try out different embedding models.
- You could also call out to users to weigh in at the relevant Ollama issue.
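The `process.env` idea could be sketched roughly like this (illustrative only: aside from `LUMOS_EMBEDDING_MODEL` itself, the names here are made up, and a Webpack-style build would inline the env var at build time):

```typescript
// Hypothetical sketch: choose the embedding model from a build-time env var,
// falling back to the main chat model when no override is set.
// DEFAULT_MODEL and getEmbeddingModel are illustrative names, not Lumos code.
const DEFAULT_MODEL = "llama2";

function getEmbeddingModel(env: Record<string, string | undefined>): string {
  // A bundler (e.g. Webpack DefinePlugin) would substitute process.env
  // values at build time, so this stays a plain string lookup.
  return env.LUMOS_EMBEDDING_MODEL ?? DEFAULT_MODEL;
}

console.log(getEmbeddingModel({ LUMOS_EMBEDDING_MODEL: "nomic-embed-text" }));
console.log(getEmbeddingModel({}));
```

Since the extension has to be built locally anyway, this gives experimenters a switch without adding anything to the UI.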
This is potentially relevant:
ollama/ollama#2848
> Well it's not working very well with tinyllama, but regardless :)
Interesting idea. By "not working very well", do you mean the retrieval/search results were bad and resulted in a bad response overall?
Got it, thanks for clarifying. I'm wondering if this approach in combination with some retrieval/search optimization could make a difference. I haven't looked into it too deeply yet though.
> retrieval/search optimization
I don't have any real experience with RAG yet, so I've "got nothing"
I assume you meant something more like keyword search to more quickly find the relevant chunks?
I wonder if you could develop some kind of special query syntax for that, shall we say, mode?
Which makes me further wonder if you'd ever use a combination of "classical" search techniques along with vector similarity?
One or the other, or both, and how that would inform said syntax.
The tricky thing about this, compared to "normal" RAG, is the desire (requirement?) for quick responses. Typically all the embedding is done well before, right? Other than shared embeddings (non-trivial technical/political challenge) or keyword/stem-word search, I'm not sure what you can do.
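The keyword/stem-word route mused about above could be as simple as a token-overlap pre-filter that narrows the page's chunks before (or instead of) any embedding call. A minimal sketch, with all names and scoring made up for illustration:

```typescript
// Hypothetical sketch: rank chunks by how many query tokens they contain.
// Cheap enough to run per keystroke; no model call needed.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Keep the top-k chunks by query-token overlap.
function keywordFilter(query: string, chunks: string[], k = 3): string[] {
  const queryTokens = tokenize(query);
  return chunks
    .map((chunk) => {
      const chunkTokens = tokenize(chunk);
      let overlap = 0;
      for (const w of queryTokens) if (chunkTokens.has(w)) overlap++;
      return { chunk, overlap };
    })
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

For a query like "what did alice say", a chunk containing "alice" would surface immediately, which is exactly the quick-lookup case where waiting on per-chunk embeddings hurts.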
https://ollama.com/library/nomic-embed-text
https://ollama.com/library/all-minilm
I just gave nomic a quick test. Lightning fast! I'm tempted to just hardcode it (and fall back to the main model if it's not available). I'm hesitant to expose a separate configuration for the embedding model because of option fatigue. What do you think?
> Which makes me further wonder if you'd ever use a combination of "classical" search techniques along with vector similarity?
Separately, I'm working on adding a "classical" keyword search (and hybrid search) to the RAG workflow. Check this out: #101.
There will be a few other small improvements to the RAG implementation as well.
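For the hybrid idea, one common shape is a weighted blend of vector cosine similarity and a normalized keyword score. This is just an illustrative sketch; PR #101 may well use a different scheme (e.g. BM25 with reciprocal rank fusion), and every name here is made up:

```typescript
// Hypothetical sketch of hybrid scoring for RAG retrieval.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Blend the two signals: alpha = 1 is pure vector search,
// alpha = 0 is pure keyword search. Both inputs assumed in [0, 1].
function hybridScore(vectorSim: number, keywordScore: number, alpha = 0.5): number {
  return alpha * vectorSim + (1 - alpha) * keywordScore;
}
```

The `alpha` knob is one answer to the syntax question above: rather than a special query mode, a single weight decides how much each technique contributes per query.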
> option fatigue
You could just go with `process.env` to start with if that's a concern. That would allow folks to customize without needing to manage branches. Ollama only has two embedding models at the moment, but what about later?
Here's an open PR with the functionality to switch the embedding model: #105
After testing, I'm finding that it's actually quite slow to switch between models. Ollama only stores 1 model in memory, so every prompt requires unloading the previous model and reloading the embedding model, and then the opposite immediately. There's an open issue in the Ollama repo (ollama/ollama#976) addressing this (and a few closed ones with workarounds). I'm not sure where this is on the priority list for them.
I'm not sure if I'll merge the PR. Net-net, it doesn't seem like a significant improvement to the user experience (yet).
Ollama v0.1.28 has a bug fix to stop Ollama from hanging when switching models. I'll test this out with my open PR.