sylv / marver


Turn your messy media archive into a personal streaming service, photo viewer, and searchable library.

License: GNU Affero General Public License v3.0

HTML 0.18% JavaScript 5.22% TypeScript 80.44% CSS 0.52% Rust 13.64%

marver's Issues

Map view

Show photo GPS locations on a map so you can see where your photos were taken.

Pigeon Maps would be a good fit for this.
k-means clustering can group nearby photos into clusters, which split into smaller clusters as you zoom in.
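
A minimal sketch of the clustering idea, assuming photos are plain `{ lat, lon }` points. The `clusterCountForZoom` rule and the evenly spaced seeding are hypothetical choices made for illustration, and distance is measured in raw degrees, which ignores longitude wrap-around.

```typescript
type Point = { lat: number; lon: number };

// Squared distance in degrees. Good enough for a sketch, not for photos near
// the poles or across the antimeridian.
function dist2(a: Point, b: Point): number {
  return (a.lat - b.lat) ** 2 + (a.lon - b.lon) ** 2;
}

// Plain k-means (Lloyd's algorithm) with deterministic seeding.
function kMeans(points: Point[], k: number, iterations = 20): Point[][] {
  if (points.length === 0 || k <= 0) return [];
  const count = Math.min(k, points.length);
  // Seed centroids with evenly spaced input points (assumption: no k-means++).
  let centroids: Point[] = Array.from(
    { length: count },
    (_, i) => points[Math.floor((i * points.length) / count)],
  );
  let groups: Point[][] = [];
  for (let iter = 0; iter < iterations; iter++) {
    // Assignment step: each point joins its nearest centroid.
    groups = centroids.map(() => []);
    for (const p of points) {
      let best = 0;
      for (let c = 1; c < count; c++) {
        if (dist2(p, centroids[c]) < dist2(p, centroids[best])) best = c;
      }
      groups[best].push(p);
    }
    // Update step: move each centroid to the mean of its group.
    centroids = groups.map((g, i) =>
      g.length === 0
        ? centroids[i]
        : {
            lat: g.reduce((s, p) => s + p.lat, 0) / g.length,
            lon: g.reduce((s, p) => s + p.lon, 0) / g.length,
          },
    );
  }
  return groups.filter((g) => g.length > 0);
}

// Hypothetical rule: double the number of clusters with each zoom level.
function clusterCountForZoom(zoom: number, totalPhotos: number): number {
  return Math.max(1, Math.min(totalPhotos, 2 ** Math.max(0, zoom - 2)));
}
```

Calling `kMeans(photos, clusterCountForZoom(zoom, photos.length))` on each zoom change would give groups that shrink as the user zooms in.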

Template metadata extraction

Let the user provide "templates", which are applied to files as a faster alternative to LLM extraction or other methods.

{title} ({year}) [imdbId-{imdbId}]

which is then converted into a regex. Each property comes with its own regex pattern that is substituted in, e.g.

/^[0-9A-Za-z_ +]+ \([0-9]{4}\) \[imdbId-tt[0-9]{4,12}\]$/

It would probably have to be set up per directory, because I don't think it would be generic enough to apply without false positives.
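
A rough sketch of the template-to-regex conversion. The `PROPERTY_PATTERNS` table and the `compileTemplate` name are hypothetical; the patterns are taken from the example regex above, and each property becomes a named capture group so the values can be read back out.

```typescript
// Hypothetical per-property patterns, copied from the example regex.
const PROPERTY_PATTERNS = new Map<string, string>([
  ["title", "[0-9A-Za-z_ +]+"],
  ["year", "[0-9]{4}"],
  ["imdbId", "tt[0-9]{4,12}"],
]);

// Escape characters that have a special meaning in a regex.
function escapeRegex(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function compileTemplate(template: string): RegExp {
  // Splitting on a capturing group leaves literals at even indices and
  // property names at odd indices.
  const parts = template.split(/\{(\w+)\}/);
  const seen = new Set<string>();
  const source = parts
    .map((part, i) => {
      if (i % 2 === 0) return escapeRegex(part);
      const pattern = PROPERTY_PATTERNS.get(part);
      if (pattern === undefined) {
        throw new Error(`Unknown template property: ${part}`);
      }
      // Group names must be unique, so a repeated property is turned into a
      // backreference to its first occurrence.
      if (seen.has(part)) return `\\k<${part}>`;
      seen.add(part);
      return `(?<${part}>${pattern})`;
    })
    .join("");
  return new RegExp(`^${source}$`);
}
```

With this, `compileTemplate("{title} ({year}) [imdbId-{imdbId}]").exec(name)?.groups` would hold `title`, `year` and `imdbId` for a matching file name, and `exec` returns `null` otherwise.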

LLM integration

During LLM extraction, feed in multiple samples and let the LLM determine if files in that directory can be extracted using a template instead. If they can, it returns the template and that is run instead. If they can't, the LLM extracts it as normal.

This would require batching multiple files into one LLM call so it can see whether there are patterns in the directory structure. Batching will be necessary for other things anyway (more accurate gallery index extraction, etc.) but isn't possible with the current setup.
It would significantly speed up LLM extraction compared to running per file, and would also cost significantly less when using external platforms like OpenAI.
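
One way the batching step could look, as a sketch only: group paths by directory and keep a few samples per directory to send in a single call. `sampleByDirectory` and the default sample size are hypothetical, and paths are assumed to use `/` separators.

```typescript
// Collect up to `sampleSize` file paths per directory, in input order.
function sampleByDirectory(
  paths: string[],
  sampleSize = 5,
): Map<string, string[]> {
  const byDir = new Map<string, string[]>();
  for (const p of paths) {
    const slash = p.lastIndexOf("/");
    const dir = slash === -1 ? "." : p.slice(0, slash);
    const samples = byDir.get(dir) ?? [];
    if (samples.length < sampleSize) samples.push(p);
    byDir.set(dir, samples);
  }
  return byDir;
}
```

Each map entry would then become one LLM request that either returns a template for the whole directory or falls back to per-file extraction.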

Document support

Detect scanned receipts
Preview support for PDFs and other documents
Auto-detect images of documents and treat them as documents
- OCR should allow converting images of documents to PDFs, which are much more useful

Detect file moves

Moving files or changing them slightly should carry over metadata if the files are mostly the same.

Support two cases: the file path changing while the contents stay the same (same size + perceptual hash + maybe SHA-256 hash), and the file extension changing while the contents stay the same (same dimensions + perceptual hash). This lets people reorganise their library and transcode files to a new format without losing metadata.
For images, the width+height should match, or the perceptual hash should match.
For videos, the duration, height+width or perceptual hash of most frames should match.
For other types, don't copy tags unless explicitly told to.
For all types, it should only happen if the old file was deleted and the new file was added in the time since the last scan.
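
The matching rules could be sketched roughly as below. This is one strict reading of the rules, not a settled design: it requires both matching dimensions and a close perceptual hash for images, uses a single hash per video instead of per-frame hashes, and all names and thresholds (`FileFingerprint`, `maxHashDistance`, the one-second duration tolerance) are hypothetical. Perceptual hashes are assumed to be equal-length hex strings.

```typescript
interface FileFingerprint {
  path: string;
  kind: "image" | "video" | "other";
  size: number;
  sha256?: string;
  perceptualHash?: string; // hex string, e.g. a 64-bit hash as 16 hex chars
  width?: number;
  height?: number;
  durationSeconds?: number;
}

// Number of differing bits between two equal-length hex strings.
function hammingDistance(a: string, b: string): number {
  if (a.length !== b.length) return Infinity;
  let bits = 0;
  for (let i = 0; i < a.length; i++) {
    let x = parseInt(a.charAt(i), 16) ^ parseInt(b.charAt(i), 16);
    while (x > 0) {
      bits += x & 1;
      x >>= 1;
    }
  }
  return bits;
}

// `oldFile` must have been deleted and `newFile` added since the last scan;
// the caller is expected to enforce that before calling this.
function shouldCopyMetadata(
  oldFile: FileFingerprint,
  newFile: FileFingerprint,
  opts = { maxHashDistance: 4, allowOther: false },
): boolean {
  if (oldFile.kind !== newFile.kind) return false;
  const identical =
    oldFile.size === newFile.size &&
    oldFile.sha256 !== undefined &&
    oldFile.sha256 === newFile.sha256;
  // Other types: only exact copies, and only when explicitly enabled.
  if (oldFile.kind === "other") return opts.allowOther && identical;
  if (identical) return true;

  const hashesClose =
    oldFile.perceptualHash !== undefined &&
    newFile.perceptualHash !== undefined &&
    hammingDistance(oldFile.perceptualHash, newFile.perceptualHash) <=
      opts.maxHashDistance;
  const sameDimensions =
    oldFile.width !== undefined &&
    oldFile.width === newFile.width &&
    oldFile.height === newFile.height;
  if (oldFile.kind === "image") return sameDimensions && hashesClose;

  const sameDuration =
    oldFile.durationSeconds !== undefined &&
    newFile.durationSeconds !== undefined &&
    Math.abs(oldFile.durationSeconds - newFile.durationSeconds) <= 1;
  return sameDuration && sameDimensions && hashesClose;
}
```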

Query builder

Let users query their data using raw SQL.
An LLM can generate queries from the user's input. For example, "show me all photos taken between 2010 and 2015, with Ryan in them" could query by EXIF metadata and filter to photos with faces that match Ryan. Opening a read-only SQLite connection means nothing can break.
Permissions would be a nightmare, but an admin-only version would still be useful.
The UI would also be a nightmare, since results need to be displayed in the expected format. Maybe just let the LLM pick the appropriate view format: give it options like generic, image_gallery, tv_shows, movies, documents, and have each one display the results differently.
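
A small sketch of the "LLM picks the view format" idea, assuming the model is asked to reply with JSON of the form `{"sql": "...", "view": "..."}`. That reply shape and the `parseGeneratedQuery` name are assumptions; the view names are the ones listed above, and anything unrecognised falls back to `generic`.

```typescript
const VIEW_FORMATS = [
  "generic",
  "image_gallery",
  "tv_shows",
  "movies",
  "documents",
] as const;
type ViewFormat = (typeof VIEW_FORMATS)[number];

interface GeneratedQuery {
  sql: string;
  view: ViewFormat;
}

// Validate the (assumed) JSON reply from the LLM before running anything.
function parseGeneratedQuery(raw: string): GeneratedQuery {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const { sql, view } = parsed as { sql?: unknown; view?: unknown };
  if (typeof sql !== "string" || sql.trim() === "") {
    throw new Error("Missing SQL in LLM reply");
  }
  // Unknown or missing view names fall back to the generic table view.
  const known = VIEW_FORMATS.find((format) => format === view);
  return { sql, view: known ?? "generic" };
}
```

The returned `sql` would still only ever be run on the read-only connection.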

Use sponsorblock data

If a file is detected as a YouTube video, import SponsorBlock data and store it as segments for that video so ads can be easily skipped.
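
A sketch of how imported segments might be stored and used. As far as I know, SponsorBlock's public API exposes `GET https://sponsor.ajay.app/api/skipSegments?videoID=...`, returning entries with a `segment: [start, end]` pair, a `UUID` and a `category`; only those fields are assumed here, and the network call itself is left out. `StoredSegment`, `toStoredSegments` and `skipTarget` are hypothetical names.

```typescript
// The subset of a SponsorBlock skipSegments entry this sketch relies on.
interface SponsorBlockEntry {
  segment: [number, number]; // start and end, in seconds
  UUID: string;
  category: string;
}

interface StoredSegment {
  videoId: string;
  startSeconds: number;
  endSeconds: number;
  category: string;
  sourceId: string;
}

// Convert API entries to rows for storage, dropping empty or inverted ranges.
function toStoredSegments(
  videoId: string,
  entries: SponsorBlockEntry[],
): StoredSegment[] {
  return entries
    .filter((e) => e.segment[1] > e.segment[0])
    .map((e) => ({
      videoId,
      startSeconds: e.segment[0],
      endSeconds: e.segment[1],
      category: e.category,
      sourceId: e.UUID,
    }))
    .sort((a, b) => a.startSeconds - b.startSeconds);
}

// If playback time `t` falls inside a segment, return where to jump to.
function skipTarget(segments: StoredSegment[], t: number): number | null {
  const hit = segments.find((s) => t >= s.startSeconds && t < s.endSeconds);
  return hit === undefined ? null : hit.endSeconds;
}
```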

Dump python

Dockerizing Python is hard.

Face detection
- Currently using InsightFace, which is based on ONNX. ort (https://github.com/pykeio/ort) in Rust might work, but it would require reimplementing a lot of things.
Text recognition
- Currently using docTR, though results are not great. PaddleOCR might be better, but even in Python it's a nightmare. There are ports to ONNX which could work, but again, it would require reimplementing a load of stuff.
CLIP
- Probably the easiest. Also possible to use ONNX.

Smart Tags

Let users pick images that represent a tag, and images that do not. Using CLIP vectors we can then do a fairly naive similarity check to decide whether an image belongs to the tag, by seeing how close it is to the "in this tag" group and how close it is to the "not in this tag" group.

The user could configure a similarity required to match, and a similarity required to not match.
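
A naive version of that check might look like the sketch below. It assumes CLIP embeddings are plain number arrays and compares against the nearest example in each group; whether to use the nearest example or a group average is an open choice, and `classify` and both default thresholds are made up for illustration.

```typescript
type Decision = "match" | "no-match" | "unsure";

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Similarity to the closest example in a group (-Infinity for an empty group).
function nearest(embedding: number[], group: number[][]): number {
  return group.reduce(
    (best, example) => Math.max(best, cosineSimilarity(embedding, example)),
    -Infinity,
  );
}

function classify(
  embedding: number[],
  positives: number[][],
  negatives: number[][],
  opts = { matchThreshold: 0.8, rejectThreshold: 0.8 },
): Decision {
  const pos = nearest(embedding, positives);
  const neg = nearest(embedding, negatives);
  // Closer to the "in this tag" group and above the user's match threshold.
  if (pos >= opts.matchThreshold && pos > neg) return "match";
  // Closer to the "not in this tag" group and above the reject threshold.
  if (neg >= opts.rejectThreshold && neg >= pos) return "no-match";
  return "unsure";
}
```

Images that come back as `"unsure"` are the natural candidates to surface first in the setup UI.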

The user should be able to upload additional images to either list directly.
There should be a UI that shows which images will be picked for the tag during setup, ideally with the weakest matches sorted first so the user can use them to improve the edge cases.
Configurable similarity for fine tuning, with sane defaults
Ability to import/export smart tags to a file, so they can be shared and improved by the community.
If a tag is removed from an image manually, automatically add it to the "does not contain" set and vice versa for adding. This should be configurable.
Show in the tag list whether a tag was added through the smart tag system with a little icon beside it.
