
Comments (15)

sublimator commented on June 1, 2024

Well it's not working very well with tinyllama, but regardless :)


sublimator commented on June 1, 2024

Upping to Phi2 seemed a bit better fwiw


sublimator commented on June 1, 2024

Hi @andrewnguonly

I was just doing quick checks like "what did user x say?" on a page of comments and it wasn't getting things right. TBH, I'd have to compare results against a SoTA embedding/LLM pairing to get more calibrated expectations for specific queries like that.

In any case, I think being able to set an embedding model to a smaller model for responsiveness could be a useful thing.

250ms vs 100ms per chunk is substantial. On a page that splits into, say, 50 chunks, that's roughly 12.5s vs 5s of embedding before a response can even start.


sublimator commented on June 1, 2024

[screenshot]

nomic flies!


sublimator commented on June 1, 2024

Disjoint musings:

  1. You could leave it in there, just disabled/hidden, I suppose, if you want to save the work.
    More generally: I'm a fan of feature flags, since branches bit-rot.

  2. Given this is a developer tool, and people need to build it anyway, process.env.LUMOS_EMBEDDING_MODEL ought to suffice for people who just want to try out different embedding models (see the sketch after this list).

  3. You could also somehow call out to users to weigh in at the relevant Ollama issue.
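
A minimal sketch of what I mean by (2), assuming Lumos keeps using LangChain's OllamaEmbeddings (the fallback model name and the bundler detail are my assumptions, not how Lumos is actually wired up):

```typescript
import { OllamaEmbeddings } from "@langchain/community/embeddings/ollama";

// process.env.* gets inlined at build time by the bundler, so no runtime config UI is needed.
// LUMOS_EMBEDDING_MODEL is the env var suggested above; "llama2" is just an example fallback.
const embeddingModel = process.env.LUMOS_EMBEDDING_MODEL ?? "llama2";

const embeddings = new OllamaEmbeddings({
  model: embeddingModel,
  baseUrl: "http://localhost:11434", // default Ollama endpoint
});
```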


sublimator commented on June 1, 2024

This is potentially relevant:
ollama/ollama#2848


andrewnguonly commented on June 1, 2024

Well it's not working very well with tinyllama, but regardless :)

Interesting idea. By "not working very well", do you mean the retrieval/search results were bad and resulted in a bad response overall?


andrewnguonly commented on June 1, 2024

Got it, thanks for clarifying. I'm wondering if this approach in combination with some retrieval/search optimization could make a difference. I haven't looked into it too deeply yet though.


sublimator commented on June 1, 2024

retrieval/search optimization

I don't have any real experience with RAG yet, so I've "got nothing."
I assume you meant something more like keyword search to more quickly find the relevant chunks?

I wonder if you could develop some kind of special query syntax for that, shall we say, mode?

Which makes me further wonder if you'd ever use a combination of "classical" search techniques along with vector similarity?

One or the other, or both, and how that would inform said syntax.
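
To make that musing concrete, here's a rough sketch of one way the two result lists could be combined (reciprocal rank fusion; purely illustrative, not anything Lumos does today):

```typescript
// Combine a keyword-ranked list and a vector-similarity-ranked list of chunk IDs
// using reciprocal rank fusion (RRF). k dampens the advantage of top-ranked items.
function reciprocalRankFusion(
  keywordRanked: string[],
  vectorRanked: string[],
  k = 60,
): string[] {
  const scores = new Map<string, number>();
  for (const ranked of [keywordRanked, vectorRanked]) {
    ranked.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// e.g. reciprocalRankFusion(["c3", "c1"], ["c1", "c2"]) puts c1 first,
// since it appears in both lists.
```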


sublimator commented on June 1, 2024

The tricky thing about this, compared to "normal" RAG, is the desire (requirement?) for quick responses. Typically all the embedding is done ahead of time, right? Other than shared embeddings (a non-trivial technical/political challenge) or keyword/stem-word search, I'm not sure what else you can do.


sublimator commented on June 1, 2024

https://ollama.com/library/nomic-embed-text
https://ollama.com/library/all-minilm


andrewnguonly commented on June 1, 2024

I just gave nomic a quick test. Lightning fast! I'm tempted to just hardcode it (and fall back to the main model if it's not available). I'm hesitant to expose a separate configuration for the embedding model because of option fatigue. What do you think?
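
Roughly what I have in mind for the fallback (sketch only; /api/tags is Ollama's list-models endpoint, and the model names here are just examples):

```typescript
// Prefer nomic-embed-text if it's already pulled locally; otherwise fall back to the main model.
async function pickEmbeddingModel(mainModel: string): Promise<string> {
  const preferred = "nomic-embed-text";
  try {
    const res = await fetch("http://localhost:11434/api/tags");
    const { models } = (await res.json()) as { models: { name: string }[] };
    if (models.some((m) => m.name.startsWith(preferred))) {
      return preferred;
    }
  } catch {
    // Ollama unreachable or unexpected response: fall through to the main model.
  }
  return mainModel;
}
```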

Which makes me further wonder if you'd ever use a combination of "classical" search techniques along with vector similarity?

Separately, I'm working on adding a "classical" keyword search (and hybrid search) to the RAG workflow. Check this out: #101.

There will be a few other small improvements to the RAG implementation as well.


sublimator commented on June 1, 2024

option fatigue.

You could just go with process.env to start with if that's a concern.
That would allow folks to customize without needing to manage branches.

Ollama only has 2 embedding models atm, but what about later?


andrewnguonly commented on June 1, 2024

Here's an open PR with the functionality to switch the embedding model: #105

After testing, I'm finding that it's actually quite slow to switch between models. Ollama only keeps 1 model in memory, so every prompt requires unloading the LLM and loading the embedding model, then immediately swapping back. There's an open issue in the Ollama repo (ollama/ollama#976) addressing this (and a few closed ones with workarounds). I'm not sure where this is on their priority list.

I'm not sure if I'll merge the PR. Net-net, it doesn't seem like a significant improvement to the user experience (yet).


andrewnguonly commented on June 1, 2024

Ollama v0.1.28 has a bug fix to stop Ollama from hanging when switching models. I'll test this out with my open PR.

