Lumos

A RAG LLM co-pilot for browsing the web, powered by local LLMs.

Screenshot of Lumos

This Chrome extension is powered by Ollama. Inference is done on your local machine without any remote server support. However, due to security constraints in the Chrome extension platform, the app does rely on local server support to run the LLM. This app is inspired by the Chrome extension example provided by the Web LLM project and the local LLM examples provided by LangChain.

Lumos. Nox. Lumos. Nox.

Use Cases

  • Summarize long threads on issue tracking sites, forums, and social media sites.
  • Summarize news articles.
  • Ask questions about reviews on business and product pages.
  • Ask questions about long, technical documentation.
  • ... what else?

Ollama Server

A local Ollama server is needed for the embedding database and LLM inference. Download and install Ollama and the CLI from the Ollama website.

Pull Image

Example:

ollama pull llama2
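
If you plan to use a separate embedding model (see Lumos Options below), pull it as well. For example, using the nomic-embed-text model mentioned in the options section:

ollama pull nomic-embed-text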

Start Server

Example:

OLLAMA_ORIGINS=chrome-extension://* ollama serve

Terminal output:

2023/11/19 20:55:16 images.go:799: total blobs: 6
2023/11/19 20:55:16 images.go:806: total unused blobs removed: 0
2023/11/19 20:55:16 routes.go:777: Listening on 127.0.0.1:11434 (version 0.1.10)

Note: The environment variable OLLAMA_ORIGINS must be set to chrome-extension://* to allow requests originating from the Chrome extension. The following error will occur in the Chrome extension if OLLAMA_ORIGINS is not set properly.

Access to fetch at 'http://localhost:11434/api/tags' from origin 'chrome-extension://<extension_id>' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
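
To confirm the server is reachable and that CORS is configured, you can hit the /api/tags endpoint manually (a quick sanity check, not part of Lumos itself):

curl http://localhost:11434/api/tags
curl -i -H "Origin: chrome-extension://<extension_id>" http://localhost:11434/api/tags

If OLLAMA_ORIGINS is set correctly, the second response should include an Access-Control-Allow-Origin header.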

macOS

Run launchctl setenv to set OLLAMA_ORIGINS.

launchctl setenv OLLAMA_ORIGINS "chrome-extension://*"

See the Ollama documentation on setting environment variables on Mac for more details.

Docker

The Ollama server can also be run in a Docker container. The container should have the OLLAMA_ORIGINS environment variable set to chrome-extension://*.

Run docker run with the -e flag to set the OLLAMA_ORIGINS environment variable:

docker run -e OLLAMA_ORIGINS="chrome-extension://*" -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
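
Once the container is running, the model still needs to be pulled inside it. For example, with the container named ollama as above:

docker exec -it ollama ollama pull llama2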

Chrome Extension

In the project directory, you can run:

npm test

Launches the test runner in the interactive watch mode.
See the section about running tests for more information.

npm run lint

Runs eslint and prettier on src and __tests__ files.

npm run build

Builds the app for production to the dist folder.
It correctly bundles React in production mode and optimizes the build for the best performance.

The build is minified and the filenames include the hashes.
Your app is ready to be deployed!

See the section about deployment for more information.

Load Unpacked Extension (Install)

Follow Chrome's developer guide to load the unpacked extension (point it at the dist folder produced by the build):

https://developer.chrome.com/docs/extensions/mv3/getstarted/development-basics/#load-unpacked

Keyboard Shortcut

Create a keyboard shortcut to make Lumos easily accessible.

  1. Navigate to chrome://extensions/shortcuts.
  2. Configure a shortcut for Lumos (Activate the extension). For example, ⌘L (command key + L).

Releases

If you don't have npm installed, you can download the pre-built extension package from the Releases page.

Lumos Options

Right-click on the extension icon and select Options to access the extension's Options page.

  • Ollama Model: Select desired model (e.g. llama2)
  • Ollama Embedding Model: Select desired embedding model (e.g. nomic-embed-text). Caution: Using a different embedding model requires Ollama to swap models, which may incur undesired latency in the app. This is a known limitation in Ollama and may be improved in the future.
  • Ollama Host: Select desired host (defaults to http://localhost:11434)
  • Vector Store TTL (minutes): Number of minutes to store a URL's content in the vector store cache (see the expiry sketch after this list).
  • Content Parser Config: Lumos's default content parser will extract all text content between a page's <body></body> tags. To customize the content parser, add an entry to the configuration.
  • Enable/Disable Tools: Enable or disable individual tools. If a tool is enabled, a custom prefix trigger (e.g. "calc:") can be specified to override the app's internal prompt classification mechanism.
  • Enable/Disable Dark Arts: 😈
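
A minimal sketch of the Vector Store TTL behavior described above, assuming a simple in-memory cache keyed by URL (identifiers like vectorStoreCache are illustrative, not Lumos's actual code):

interface CachedStore {
  store: unknown;    // vector store built from the page's content
  createdAt: number; // epoch milliseconds when the store was created
}

const vectorStoreCache = new Map<string, CachedStore>();

// Return the cached vector store for a URL, or undefined if it is missing or expired.
const getCachedStore = (url: string, ttlMinutes: number): unknown | undefined => {
  const entry = vectorStoreCache.get(url);
  if (!entry) return undefined;
  if (Date.now() - entry.createdAt > ttlMinutes * 60 * 1000) {
    vectorStoreCache.delete(url); // expired: evict and force re-embedding
    return undefined;
  }
  return entry.store;
};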

Content Parser Config

Each URL path can have its own content parser. The content parser config for the longest URL path will be matched.

  • chunkSize: Number of characters per chunk when indexing page content into the RAG vectorstore (see the splitter sketch after this list)
  • chunkOverlap: Number of characters of overlap between adjacent chunks
  • selectors: document.querySelector() queries to perform to retrieve page content
  • selectorsAll: document.querySelectorAll() queries to perform to retrieve page content
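
Lumos is built on LangChain (noted in the introduction), and chunkSize/chunkOverlap map naturally onto a text splitter. A minimal sketch of how these values might be applied, assuming LangChain's RecursiveCharacterTextSplitter (the import path can vary by LangChain version, and this is not necessarily Lumos's exact code):

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Split extracted page content into overlapping chunks for embedding.
const splitPageContent = async (
  pageText: string,
  chunkSize: number,
  chunkOverlap: number,
): Promise<string[]> => {
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize, chunkOverlap });
  return splitter.splitText(pageText);
};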

For example, given the following config, if the URL path of the current tab is domain.com/path1/subpath1/subsubpath1, then the config for domain.com/path1/subpath1 will be used (i.e. chunkSize=600).

{
  "domain.com/path1/subpath1": {
    "chunkSize": 600,
    "chunkOverlap": 200,
    "selectors": [
      "#id"
    ],
    "selectorsAll": []
  },
  "domain.com/path1": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [
      ".className"
    ],
    "selectorsAll": []
  }
}
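
A minimal sketch of the longest-path matching described above (function and variable names are illustrative, not Lumos's actual implementation):

// Pick the config whose path key is the longest one contained in the current tab's
// URL path, falling back to the "default" entry when nothing matches.
const matchConfig = (
  urlPath: string,
  configs: Record<string, { chunkSize: number; chunkOverlap: number }>,
) => {
  const match = Object.keys(configs)
    .filter((key) => key !== "default" && urlPath.includes(key))
    .sort((a, b) => b.length - a.length)[0];
  return match ? configs[match] : configs["default"];
};

// matchConfig("domain.com/path1/subpath1/subsubpath1", config) returns the
// "domain.com/path1/subpath1" entry (chunkSize=600).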

See the docs for How to Create a Custom Content Parser. See the documentation for querySelector() and querySelectorAll() for the full range of supported queries.

Example:

{
  "default": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [
      "body"
    ],
    "selectorsAll": []
  },
  "medium.com": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [
      "article"
    ],
    "selectorsAll": []
  },
  "reddit.com": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [],
    "selectorsAll": [
      "shreddit-comment"
    ]
  },
  "stackoverflow.com": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [
      "#question-header",
      "#mainbar"
    ],
    "selectorsAll": []
  },
  "wikipedia.org": {
    "chunkSize": 2000,
    "chunkOverlap": 500,
    "selectors": [
      "#bodyContent"
    ],
    "selectorsAll": []
  },
  "yelp.com": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [
      "#location-and-hours",
      "#reviews"
    ],
    "selectorsAll": []
  }
}

Highlighted Content

Alternatively, if content is highlighted on a page, the highlighted text will be parsed instead of the content produced by the content parser configuration.

Note: Content that is highlighted will not be cached in the vector store cache. Each subsequent prompt containing highlighted content will generate new embeddings.

Shortcuts

  • cmd + c: Copy last message to clipboard.
  • cmd + j: Toggle Disable content parsing checkbox.
  • cmd + k: Clear all messages.
  • cmd + ;: Open/close Chat History panel.
  • ctrl + c: Cancel request (LLM request/streaming or embeddings generation).
  • ctrl + x: Remove file attachment.
  • ctrl + r: Regenerate last LLM response.

Multimodal

Lumos supports multimodal models! Images that are present on the current page will be downloaded and bound to the model for prompting. See the project documentation for details and examples.

File Attachments

File attachments can be uploaded to Lumos. The contents of a file will be parsed and processed through Lumos's RAG workflow (similar to processing page content). If a file's extension type is not explicitly listed below, its contents will be parsed as plain text by default.

Supported extension types:

  • .csv
  • .json
  • .pdf
  • any plain text file format (.txt, .md, .py, etc.)

Note: If an attachment is present, page content will not be parsed. Remove the file attachment to resume parsing page content.

Image Files

Image files will be processed through Lumos's multimodal workflow (requires multimodal model).

Supported image types:

  • .jpeg, .jpg
  • .png

Tools (Experimental)

Lumos invokes Tools automatically based on the provided prompt. See the project documentation for details and examples.

Reading


lumos's Issues

Separate (generally smaller) embeddings model?

Using tinyllama

[GIN] 2024/02/10 - 12:33:07 | 200 |   51.673542ms |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2024/02/10 - 12:33:07 | 200 |    51.70225ms |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2024/02/10 - 12:33:07 | 200 |   51.951042ms |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2024/02/10 - 12:33:07 | 200 |   43.755125ms |       127.0.0.1 | POST     "/api/embeddings"

Maybe you can "get away with" using a smaller model for quick embeddings to make things a bit more responsive ??

"Connection" indicator

Played with this again. I did a git reset --hard to origin to update (a bit mindlessly, oops), and of course it overwrote DEFAULT_MODEL. (Oh, I see there's a GUI for that now in Options.)

It would be nice to have an indicator somewhere for:

  • Is Ollama alive? Does it need starting? Is it responding at all?
  • Is the model available? (I guess on-the-fly model config is a whole other issue.)
  • Do origins need configuring? 403 (or whatever) Forbidden responses?

I guess you could even use the browser action icon.

The thing is, it just seems to quietly go about doing nothing when Ollama is not running.

Re: models:

I have various quants of the same model. I don't know how often people would actually do that, but if they do, maybe the tag should be shown as a discriminator?

Make the popover longer (?)

This is more of a suggestion: I feel like the popover (or popup in Arc) is a bit short.
Maybe make it at least resizable?

In this example I can barely see the conversation.


Thank you,

Function calling support?

Can it do function calling? It would automate so much stuff if it can. Please close this if it already can do that.

I have played with function calling on ChatGPT and tried to make a local ChatGPT-based tool. But I can't just let ChatGPT go to pages and do research for me (it would be too expensive).

With function calling, Lumos would be able to answer any question by sending a request to the appropriate tool.

Getting HTTP 400 errors on /api/embeddings

Hey! Thanks for your time on this POC.

It's not working for me. Waiting for instructions to debug this. The installation is not user friendly.

Context

  • Apple M1 Pro

  • Sonoma 14.1.1

  • 16GB RAM

  • Ollama working well:

ollama -v
ollama version 0.1.12
OLLAMA_ORIGINS=chrome-extension://* ollama serve
  • Extension built OK.
  • Extension installed in chrome OK:
    • Site access: This extension can read and change your data on sites. You can control which sites the extension can access.
    • Automatically allow access on the following sites: ENABLED.
    • Allow in Incognito: DISABLED
    • Allow access to file URLs: ENABLED
    • Collect errors: ENABLED.
    • Source: Unpacked extension Loaded from: ~/Lumos/dist

Error

[GIN] 2023/11/30 - 11:55:01 | 400 |     174.833µs |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2023/11/30 - 11:55:03 | 400 |     411.917µs |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2023/11/30 - 11:55:06 | 400 |     285.208µs |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2023/11/30 - 11:55:11 | 400 |     375.584µs |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2023/11/30 - 11:55:19 | 400 |     382.666µs |       127.0.0.1 | POST     "/api/embeddings"

Blocked API Access due to CORS Policy - Disable CORS Checking for Specific Request.

When I try to use the extension, I receive the following error. I have Ollama running locally and can query it from Emacs and receive responses.

Access to fetch at 'http://127.0.0.1:11434/api/embeddings' from origin 'chrome-extension://asfdasfasdfs' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

I fed this to Llama and it suggested the following fixes. However, I don't see where in background.ts the fetch call is for me to add a no-cors mode.

The error message you encountered is related to the Cross-Origin Resource Sharing (CORS) policy in web browsers. The browser is blocking access to the http://127.0.0.1:11434/api/embeddings URL because the request originates from a different domain than the one that served the HTML document.

The error message specifically states that there is no Access-Control-Allow-Origin header present in the response to the preflight request (which is a request made by the browser to check if the server supports CORS). As a result, the browser is blocking the request from proceeding.

To fix this issue, you have two options:

  1. Add the Access-Control-Allow-Origin header to the server response. This can be done by adding the following line to the server-side code that handles the API request:

     header('Access-Control-Allow-Origin: *');

     This will allow the browser to make requests to the API from any origin.

  2. Disable CORS checking for the specific API request by setting the mode parameter to 'no-cors' in the fetch() function. This can be done like this:

     fetch('http://127.0.0.1:11434/api/embeddings', {
       mode: 'no-cors'
     })

     This will disable CORS checking for the specific API request, allowing it to proceed even though there is no Access-Control-Allow-Origin header present in the response.

Note that disabling CORS checking can be a security risk, as it allows requests from any origin to access the API. You should only use this option when you have verified that the API is being accessed from a trusted source.

Ollama started service successfully but browser did not respond

First of all, thank you for your excellent work, but I have found some issues with my local deployment. My Docker backend has successfully brought up the local model service. Access via curl from the command line works fine, but when I set OLLAMA_BASE_URL and OLLAMA_MODEL in script/background.ts, the plugin does not respond in the browser.

Specifically, I run Ollama inside Docker to serve the model locally. Accessing it with curl from outside Docker returns results normally, but after setting the two parameters in the plugin, the plugin does not respond in the browser.

Refactor RAG workflow

  • skip embedding if page content is already embedded
  • add configurable search parameters

On long pages it seems to get stuck

On long pages it seems to halt (e.g. https://news.ycombinator.com/item?id=39190468)


Maybe fixed in new versions

It might be nice to have some indication of the amount of work it's doing, like a progress bar or something. I mean, you know how many chunks it needs to embed, right?

I don't know the feasibility, but I'm wondering if you can do embedding in parallel somehow? I suppose with an mmap'd model shared by multiple processes it could be? But that's more of an Ollama question, perhaps.

Thanks

Add LICENSE

As the title says, can you add the License for this awesome project?

Embeddings cache regression (^h^h^h confusion)

I just updated to commit 72439bf but it seems like there is a regression?


The TTL is 60 minutes but it seems like it's requesting a series of embeddings for each query.

Ok, so I uninstalled it, then reinstalled it, in case my chrome storage options got in a wonky state somehow.

It's then not showing the connection indicator (which I /was/ seeing at first!) for the model 404.


So, back to the embeddings, I've removed/installed. Once I select a model in options hopefully we are good?

Lots of embeddings (long page).

Hrmmm, it definitely seems like it's calling the embedding endpoint many times for each query. I could have sworn you were caching; that's what the TTL means, right?

Oh, it's not cached when isHighlightedContent:

          chrome.runtime.sendMessage({
            prompt: prompt,
            skipRAG: false,
            chunkSize: config.chunkSize,
            chunkOverlap: config.chunkOverlap,
            url: activeTabUrl.toString(),
            skipCache: isHighlightedContent,
            imageURLs: imageURLs,
          });

Based on:

const getHighlightedContent = (): string => {
  const selection = window.getSelection();
  return selection ? selection.toString().trim() : "";
};

Oh, I see! I guess it's a bit complicated to use the cache easily, eh?

Hrmmm, there are other optimizations you could do, but compared to creating completely new embeddings, how about a simple linear search over the highlighted string to see if it contains any of the chunks that would otherwise be returned by the configured parser (i.e. the "canonical" chunks)?

Implement custom content chunking for domains

Each domain should have its own chunkSize and chunkOverlap values. These values should be passed to the background script for processing.

Also, investigate whether it's useful for each domain to have its own vectorstore retrieval config (e.g. number of documents to return).
