Lumos

A RAG LLM co-pilot for browsing the web, powered by local LLMs.

Screenshot of Lumos

This Chrome extension is powered by Ollama. Inference is done on your local machine without any remote server support. However, due to security constraints in the Chrome extension platform, the app does rely on local server support to run the LLM. This app is inspired by the Chrome extension example provided by the Web LLM project and the local LLM examples provided by LangChain.

Lumos. Nox. Lumos. Nox.

Use Cases

  • Summarize long threads on issue tracking sites, forums, and social media sites.
  • Summarize news articles.
  • Ask questions about reviews on business and product pages.
  • Ask questions about long, technical documentation.
  • ... what else?

Ollama Server

A local Ollama server is needed for the embedding database and LLM inference. Download and install Ollama and the CLI from the Ollama website.

Pull Image

Example:

ollama pull llama2
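
If you plan to use a separate embedding model (see Lumos Options below), pull it as well. For example, using the nomic-embed-text model mentioned in the options section:

ollama pull nomic-embed-text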

Start Server

Example:

OLLAMA_ORIGINS=chrome-extension://* ollama serve

Terminal output:

2023/11/19 20:55:16 images.go:799: total blobs: 6
2023/11/19 20:55:16 images.go:806: total unused blobs removed: 0
2023/11/19 20:55:16 routes.go:777: Listening on 127.0.0.1:11434 (version 0.1.10)

Note: The environment variable OLLAMA_ORIGINS must be set to chrome-extension://* to allow requests originating from the Chrome extension. The following error will occur in the Chrome extension if OLLAMA_ORIGINS is not set properly.

Access to fetch at 'http://localhost:11434/api/tags' from origin 'chrome-extension://<extension_id>' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
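
To confirm the server is reachable and that CORS is configured, you can hit the /api/tags endpoint manually (a quick sanity check, not part of Lumos itself):

curl http://localhost:11434/api/tags
curl -i -H "Origin: chrome-extension://<extension_id>" http://localhost:11434/api/tags

If OLLAMA_ORIGINS is set correctly, the second response should include an Access-Control-Allow-Origin header.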

macOS

Run launchctl setenv to set OLLAMA_ORIGINS.

launchctl setenv OLLAMA_ORIGINS "chrome-extension://*"

See the Ollama documentation on setting environment variables on Mac for more details.

Docker

The Ollama server can also be run in a Docker container. The container should have the OLLAMA_ORIGINS environment variable set to chrome-extension://*.

Run docker run with the -e flag to set the OLLAMA_ORIGINS environment variable:

docker run -e OLLAMA_ORIGINS="chrome-extension://*" -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
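
Once the container is running, the model still needs to be pulled inside it. For example, with the container named ollama as above:

docker exec -it ollama ollama pull llama2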

Chrome Extension

In the project directory, you can run:

npm test

Launches the test runner in the interactive watch mode.
See the section about running tests for more information.

npm run lint

Runs eslint and prettier on src and __tests__ files.

npm run build

Builds the app for production to the dist folder.
It correctly bundles React in production mode and optimizes the build for the best performance.

The build is minified and the filenames include the hashes.
Your app is ready to be deployed!

See the section about deployment for more information.

Load Unpacked Extension (Install)

Follow Chrome's developer guide to load the unpacked extension (point it at the dist folder produced by the build):

https://developer.chrome.com/docs/extensions/mv3/getstarted/development-basics/#load-unpacked

Keyboard Shortcut

Create a keyboard shortcut to make Lumos easily accessible.

  1. Navigate to chrome://extensions/shortcuts.
  2. Configure a shortcut for Lumos (Activate the extension). For example, ⌘L (command key + L).

Releases

If you don't have npm installed, you can download the pre-built extension package from the Releases page.

Lumos Options

Right-click on the extension icon and select Options to access the extension's Options page.

  • Ollama Model: Select desired model (e.g. llama2)
  • Ollama Embedding Model: Select desired embedding model (e.g. nomic-embed-text). Caution: Using a different embedding model requires Ollama to swap models, which may incur undesired latency in the app. This is a known limitation in Ollama and may be improved in the future.
  • Ollama Host: Select desired host (defaults to http://localhost:11434)
  • Vector Store TTL (minutes): Number of minutes to store a URL's content in the vector store cache (see the expiry sketch after this list).
  • Content Parser Config: Lumos's default content parser will extract all text content between a page's <body></body> tags. To customize the content parser, add an entry to the configuration.
  • Enable/Disable Tools: Enable or disable individual tools. If a tool is enabled, a custom prefix trigger (e.g. "calc:") can be specified to override the app's internal prompt classification mechanism.
  • Enable/Disable Dark Arts: 😈
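
A minimal sketch of the Vector Store TTL behavior described above, assuming a simple in-memory cache keyed by URL (identifiers like vectorStoreCache are illustrative, not Lumos's actual code):

interface CachedStore {
  store: unknown;    // vector store built from the page's content
  createdAt: number; // epoch milliseconds when the store was created
}

const vectorStoreCache = new Map<string, CachedStore>();

// Return the cached vector store for a URL, or undefined if it is missing or expired.
const getCachedStore = (url: string, ttlMinutes: number): unknown | undefined => {
  const entry = vectorStoreCache.get(url);
  if (!entry) return undefined;
  if (Date.now() - entry.createdAt > ttlMinutes * 60 * 1000) {
    vectorStoreCache.delete(url); // expired: evict and force re-embedding
    return undefined;
  }
  return entry.store;
};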

Content Parser Config

Each URL path can have its own content parser. The content parser config for the longest URL path will be matched.

  • chunkSize: Number of characters per chunk when indexing page content into the RAG vectorstore (see the splitter sketch after this list)
  • chunkOverlap: Number of characters of overlap between adjacent chunks
  • selectors: document.querySelector() queries to perform to retrieve page content
  • selectorsAll: document.querySelectorAll() queries to perform to retrieve page content
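
Lumos is built on LangChain (noted in the introduction), and chunkSize/chunkOverlap map naturally onto a text splitter. A minimal sketch of how these values might be applied, assuming LangChain's RecursiveCharacterTextSplitter (the import path can vary by LangChain version, and this is not necessarily Lumos's exact code):

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Split extracted page content into overlapping chunks for embedding.
const splitPageContent = async (
  pageText: string,
  chunkSize: number,
  chunkOverlap: number,
): Promise<string[]> => {
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize, chunkOverlap });
  return splitter.splitText(pageText);
};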

For example, given the following config, if the URL path of the current tab is domain.com/path1/subpath1/subsubpath1, then the config for domain.com/path1/subpath1 will be used (i.e. chunkSize=600).

{
  "domain.com/path1/subpath1": {
    "chunkSize": 600,
    "chunkOverlap": 200,
    "selectors": [
      "#id"
    ],
    "selectorsAll": []
  },
  "domain.com/path1": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [
      ".className"
    ],
    "selectorsAll": []
  }
}
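
A minimal sketch of the longest-path matching described above (function and variable names are illustrative, not Lumos's actual implementation):

// Pick the config whose path key is the longest one contained in the current tab's
// URL path, falling back to the "default" entry when nothing matches.
const matchConfig = (
  urlPath: string,
  configs: Record<string, { chunkSize: number; chunkOverlap: number }>,
) => {
  const match = Object.keys(configs)
    .filter((key) => key !== "default" && urlPath.includes(key))
    .sort((a, b) => b.length - a.length)[0];
  return match ? configs[match] : configs["default"];
};

// matchConfig("domain.com/path1/subpath1/subsubpath1", config) returns the
// "domain.com/path1/subpath1" entry (chunkSize=600).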

See the docs for How to Create a Custom Content Parser. See the documentation for querySelector() and querySelectorAll() for the full range of supported queries.

Example:

{
  "default": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [
      "body"
    ],
    "selectorsAll": []
  },
  "medium.com": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [
      "article"
    ],
    "selectorsAll": []
  },
  "reddit.com": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [],
    "selectorsAll": [
      "shreddit-comment"
    ]
  },
  "stackoverflow.com": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [
      "#question-header",
      "#mainbar"
    ],
    "selectorsAll": []
  },
  "wikipedia.org": {
    "chunkSize": 2000,
    "chunkOverlap": 500,
    "selectors": [
      "#bodyContent"
    ],
    "selectorsAll": []
  },
  "yelp.com": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [
      "#location-and-hours",
      "#reviews"
    ],
    "selectorsAll": []
  }
}

Highlighted Content

Alternatively, if content is highlighted on a page, the highlighted text will be parsed instead of the content produced by the content parser configuration.

Note: Content that is highlighted will not be cached in the vector store cache. Each subsequent prompt containing highlighted content will generate new embeddings.

Shortcuts

  • cmd + c: Copy last message to clipboard.
  • cmd + j: Toggle Disable content parsing checkbox.
  • cmd + k: Clear all messages.
  • cmd + ;: Open/close Chat History panel.
  • ctrl + c: Cancel request (LLM request/streaming or embeddings generation).
  • ctrl + x: Remove file attachment.
  • ctrl + r: Regenerate last LLM response.

Multimodal

Lumos supports multimodal models! Images that are present on the current page will be downloaded and bound to the model for prompting. See the project documentation for details and examples.

File Attachments

File attachments can be uploaded to Lumos. The contents of a file will be parsed and processed through Lumos's RAG workflow (similar to processing page content). If a file's extension type is not explicitly listed below, its contents will be parsed as plain text by default.

Supported extension types:

  • .csv
  • .json
  • .pdf
  • any plain text file format (.txt, .md, .py, etc.)

Note: If an attachment is present, page content will not be parsed. Remove the file attachment to resume parsing page content.

Image Files

Image files will be processed through Lumos's multimodal workflow (requires multimodal model).

Supported image types:

  • .jpeg, .jpg
  • .png

Tools (Experimental)

Lumos invokes Tools automatically based on the provided prompt. See the project documentation for details and examples.

Reading


lumos's Issues

Separate (generally smaller) embeddings model?

Using tinyllama

[GIN] 2024/02/10 - 12:33:07 | 200 |   51.673542ms |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2024/02/10 - 12:33:07 | 200 |    51.70225ms |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2024/02/10 - 12:33:07 | 200 |   51.951042ms |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2024/02/10 - 12:33:07 | 200 |   43.755125ms |       127.0.0.1 | POST     "/api/embeddings"

Maybe you can "get away with" using a smaller model for quick embeddings to make things a bit more responsive ??

"Connection" indicator

Played with this again. I did a git reset --hard to origin to update (a bit mindlessly, oops), and of course it overwrote DEFAULT_MODEL. (Oh, I see there's a GUI for that now in Options.)

It would be nice to have an indicator somewhere for:

  • Is Ollama alive? Does it need starting? Is it responding at all?
  • Is the model available? (I guess on-the-fly model config is a whole other issue.)
  • Do origins need configuring? 403 (or whatever) Forbidden responses?

I guess you could even use the browser action icon.

The thing is, it just seems to quietly go about doing nothing when Ollama is not running.

Re: models:

I have various quants of the same model. I don't know how often people would actually do that, but if they do, maybe the tag should be shown as a discriminator?

Make the popover longer (?)

This is more of a suggestion: I feel like the popover (or popup in Arc) is a bit short.
Maybe make it at least resizable?

In this example I can barely see the conversation.


Thank you,

Function calling support?

Can it do function calling? It would automate so much stuff if it can. Please close this if it already can do that.

I have played with function calling on ChatGPT and tried to make a local ChatGPT-based tool. But I can't just let ChatGPT go to pages and do research for me (it would be too expensive).

With function calling, Lumos would be able to answer any question by sending a request to the appropriate tool.

Getting HTTP 400 errors on /api/embeddings

Hey! Thanks for your time on this POC.

It's not working for me. Waiting for instructions to debug this. The installation is not user friendly.

Context

  • Apple M1 Pro

  • Sonoma 14.1.1

  • 16GB RAM

  • Ollama working well:

ollama -v
ollama version 0.1.12
OLLAMA_ORIGINS=chrome-extension://* ollama serve
  • Extension built OK.
  • Extension installed in chrome OK:
    • Site access: This extension can read and change your data on sites. You can control which sites the extension can access.
    • Automatically allow access on the following sites: ENABLED.
    • Allow in Incognito: DISABLED
    • Allow access to file URLs: ENABLED
    • Collect errors: ENABLED.
    • Source: Unpacked extension Loaded from: ~/Lumos/dist

Error

[GIN] 2023/11/30 - 11:55:01 | 400 |     174.833µs |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2023/11/30 - 11:55:03 | 400 |     411.917µs |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2023/11/30 - 11:55:06 | 400 |     285.208µs |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2023/11/30 - 11:55:11 | 400 |     375.584µs |       127.0.0.1 | POST     "/api/embeddings"
[GIN] 2023/11/30 - 11:55:19 | 400 |     382.666µs |       127.0.0.1 | POST     "/api/embeddings"

Blocked API Access due to CORS Policy - Disable CORS Checking for Specific Request.

When I try to use the extension, I receive the following error. I have Ollama running locally and can query it from Emacs and receive responses.

Access to fetch at 'http://127.0.0.1:11434/api/embeddings' from origin 'chrome-extension://asfdasfasdfs' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

I fed this to Llama and it suggested the following fixes. However, I don't see where in background.ts the fetch call is for me to add a no-cors mode.

The error message you encountered is related to the Cross-Origin Resource Sharing (CORS) policy in web browsers. The browser is blocking access to the http://127.0.0.1:11434/api/embeddings URL because the request originates from a different domain than the one that served the HTML document.

The error message specifically states that there is no Access-Control-Allow-Origin header present in the response to the preflight request (which is a request made by the browser to check if the server supports CORS). As a result, the browser is blocking the request from proceeding.

To fix this issue, you have two options:

  1. Add the Access-Control-Allow-Origin header to the server response. This can be done by adding the following line to the server-side code that handles the API request:

     header('Access-Control-Allow-Origin: *');

     This will allow the browser to make requests to the API from any origin.

  2. Disable CORS checking for the specific API request by setting the mode parameter to 'no-cors' in the fetch() function. This can be done like this:

     fetch('http://127.0.0.1:11434/api/embeddings', {
       mode: 'no-cors'
     })

     This will disable CORS checking for the specific API request, allowing it to proceed even though there is no Access-Control-Allow-Origin header present in the response.

Note that disabling CORS checking can be a security risk, as it allows requests from any origin to access the API. You should only use this option when you have verified that the API is being accessed from a trusted source.

Ollama started service successfully but browser did not respond

First of all, thank you for your excellent work, but I have found some issues with my local deployment. My Docker backend has successfully brought up the local model service. Access via curl from the command line works fine, but when I set OLLAMA_BASE_URL and OLLAMA_MODEL in script/background.ts, the plugin does not respond in the browser.

Specifically, I run Ollama inside Docker to serve the model locally. Accessing it with curl from outside Docker returns results normally, but after setting the two parameters in the plugin, the plugin does not respond in the browser.

Refactor RAG workflow

  • skip embedding if page content is already embedded
  • add configurable search parameters

On long pages it seems to get stuck

On long pages it seems to halt (e.g. https://news.ycombinator.com/item?id=39190468)


Maybe fixed in new versions

It might be nice to have some indication of the amount of work it's doing, like a progress bar or something. I mean, you know how many chunks it needs to embed, right?

I don't know the feasibility, but I'm wondering if you can do embedding in parallel somehow? I suppose with an mmap'd model shared by multiple processes it could be? But that's more of an Ollama question, perhaps.

Thanks

Add LICENSE

As the title says, can you add the License for this awesome project?

Embeddings cache regression (^h^h^h confusion)

I just updated to commit 72439bf but it seems like there is a regression?


The TTL is 60 minutes but it seems like it's requesting a series of embeddings for each query.

Ok, so I uninstalled it, then reinstalled it, in case my chrome storage options got in a wonky state somehow.

It's then not showing the connection indicator (which I /was/ seeing at first!) for the model 404.


So, back to the embeddings, I've removed/installed. Once I select a model in options hopefully we are good?

Lots of embeddings (long page).

Hrmmm, it definitely seems like it's calling the embedding endpoint many times for each query. I could have sworn you were caching; that's what the TTL means, right?

Oh, it's not cached when isHighlightedContent:

          chrome.runtime.sendMessage({
            prompt: prompt,
            skipRAG: false,
            chunkSize: config.chunkSize,
            chunkOverlap: config.chunkOverlap,
            url: activeTabUrl.toString(),
            skipCache: isHighlightedContent,
            imageURLs: imageURLs,
          });

Based on:

const getHighlightedContent = (): string => {
  const selection = window.getSelection();
  return selection ? selection.toString().trim() : "";
};

Oh, I see! I guess it's a bit complicated to use the cache easily, eh?

Hrmmm, there are other optimizations you could do, but compared to creating completely new embeddings, how about a simple linear search over the highlighted string to see if it contains any of the chunks that would otherwise be returned by the configured parser (i.e. the "canonical" chunks)?

Implement custom content chunking for domains

Each domain should have its own chunkSize and chunkOverlap values. These values should be passed to the background script for processing.

Also, investigate whether it's useful for each domain to have its own vectorstore retrieval config (e.g. number of documents to return).
