marver's People

Contributors

sylv

marver's Issues

Map view

Show GPS locations on a map so you can see where your photos were taken.

  • Pigeon Maps would be a good fit for this.
  • k-means clustering can be used to group nearby photos into clusters, which get smaller as you zoom in.
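A minimal sketch of the clustering idea, assuming photo locations are available as (latitude, longitude) pairs. The function names, the fixed random seed, and the `2 ** zoom` rule for choosing the number of clusters are all hypothetical choices made for illustration. Plain Euclidean distance on raw coordinates is only an approximation that ignores the curvature of the earth.

```python
import math
import random


def centroid_of(group):
    """Mean position of a non-empty list of (lat, lon) pairs."""
    return (sum(lat for lat, _ in group) / len(group),
            sum(lon for _, lon in group) / len(group))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on (lat, lon) pairs. Returns a list of (centroid, members)."""
    if not points:
        return []
    k = min(k, len(points))
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            groups[nearest].append(point)
        # Update step: move each centroid to the mean of its members.
        updated = [centroid_of(g) if g else centroids[i] for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return [(c, g) for c, g in zip(centroids, groups) if g]


def clusters_for_zoom(points, zoom):
    """Hypothetical rule: allow twice as many clusters per zoom level."""
    return kmeans(points, k=min(len(points), 2 ** zoom))
```

At zoom 0 everything collapses into one marker, and each further zoom level allows more (and therefore smaller) clusters until every photo stands alone.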

Template metadata extraction

Let the user provide "templates", which are applied to files as a faster alternative to LLM extraction or other methods. For example, the template

{title} ({year}) [imdbId-{imdbId}]

is converted into a regex. Each property comes with its own regex pattern that is substituted in, e.g.

/^[0-9A-Za-z_ +]+ \([0-9]{4}\) \[imdbId-tt[0-9]{4,12}\]$/

It would probably have to be set up per directory, because it is unlikely to be generic enough to apply everywhere without false positives.
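The template-to-regex conversion could be sketched as follows. The per-property patterns are copied from the example regex above; the `PROPERTY_PATTERNS` table, the function name, and the use of named groups are assumptions made for illustration, not the project's actual design.

```python
import re

# Hypothetical per-property patterns, taken from the example regex above.
PROPERTY_PATTERNS = {
    "title": r"[0-9A-Za-z_ +]+",
    "year": r"[0-9]{4}",
    "imdbId": r"tt[0-9]{4,12}",
}

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def compile_template(template):
    """Turn a template like '{title} ({year})' into an anchored regex."""
    parts = []
    seen = set()
    position = 0
    for match in PLACEHOLDER.finditer(template):
        # Literal text between placeholders is escaped so it matches itself.
        parts.append(re.escape(template[position:match.start()]))
        name = match.group(1)
        if name in seen:
            # A repeated property must match the same text again.
            parts.append(f"(?P={name})")
        else:
            parts.append(f"(?P<{name}>{PROPERTY_PATTERNS[name]})")
            seen.add(name)
        position = match.end()
    parts.append(re.escape(template[position:]))
    return re.compile("^" + "".join(parts) + "$")


# The file name below is a made-up sample.
pattern = compile_template("{title} ({year}) [imdbId-{imdbId}]")
result = pattern.match("Example Movie (2001) [imdbId-tt1234567]")
print(result.groupdict() if result else "no match")
```

Anchoring with `^` and `$` means a file that only partially resembles the template does not match, which helps with the false-positive concern.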

LLM integration

During LLM extraction, feed in multiple samples and let the LLM determine if files in that directory can be extracted using a template instead. If they can, it returns the template and that is run instead. If they can't, the LLM extracts it as normal.

  • This would require batching multiple files into one LLM call so it can see whether there are patterns in the directory structure. Batching will be necessary for other things anyway (more accurate gallery index extraction, etc.) but isn't possible with the current setup.
  • This would significantly speed up LLM extraction compared to running it per file, and would also cost significantly less when using external platforms like OpenAI.
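One way the batching step might look, as a rough sketch: group the files by parent directory and cut each group into fixed-size batches, so that each LLM call sees several siblings at once. The function name and the default batch size are hypothetical, and the LLM call itself is left out.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def batch_by_directory(paths, batch_size=20):
    """Group file paths by parent directory, then split each group into batches.

    Returns a list of (directory, files) tuples in a stable, sorted order.
    """
    by_directory = defaultdict(list)
    for path in paths:
        by_directory[str(PurePosixPath(path).parent)].append(path)

    batches = []
    for directory in sorted(by_directory):
        files = sorted(by_directory[directory])
        for start in range(0, len(files), batch_size):
            batches.append((directory, files[start:start + batch_size]))
    return batches
```

Each batch could then be sent as one request, with the model asked either to return a template for the whole batch or to extract metadata per file.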

Document support

  • Detect scanned receipts
  • Preview support for PDFs and other documents
  • Auto-detect images of documents and treat them as documents
    • OCR should make it possible to convert images of documents to PDFs, which are much more useful

Detect file moves

Moving or changing files slightly should copy over metadata if the files are mostly the same.

  • Support two cases: the file path changing but the contents staying the same (same size + perceptual hash + maybe SHA-256 hash), and the file extension changing but the contents remaining the same (same dimensions + perceptual hash). This lets people organise their library and transcode files to a new format without losing metadata.
  • For images, the width+height should match, or the perceptual hash should match.
  • For videos, the duration, height+width or perceptual hash of most frames should match.
  • For other types, don't copy tags unless explicitly told to.
  • For all types, it should only happen if the old file was deleted and the new file was added in the time since the last scan.
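The matching rules for images could be sketched like this. It assumes each scan yields records with a size, SHA-256 digest, 64-bit perceptual hash and dimensions; the `FileRecord` type, the function names and the Hamming-distance tolerance of 4 bits are all hypothetical. The move case here uses size plus SHA-256, and the transcode case uses dimensions plus a near-identical perceptual hash.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    path: str
    size: int
    sha256: str
    phash: int  # perceptual hash stored as an integer
    width: Optional[int] = None
    height: Optional[int] = None


def hamming(a, b):
    """Number of differing bits between two perceptual hashes."""
    return bin(a ^ b).count("1")


def is_same_file(old, new, max_phash_distance=4):
    # Case 1: moved or renamed, contents byte-for-byte identical.
    if old.size == new.size and old.sha256 == new.sha256:
        return True
    # Case 2: transcoded, so bytes differ but the picture is the same.
    same_dimensions = (old.width is not None
                       and (old.width, old.height) == (new.width, new.height))
    return same_dimensions and hamming(old.phash, new.phash) <= max_phash_distance


def match_moves(deleted, added):
    """Pair files deleted since the last scan with files added since then."""
    moves = {}
    remaining = list(added)
    for old in deleted:
        for new in remaining:
            if is_same_file(old, new):
                moves[old.path] = new.path
                remaining.remove(new)  # each new file inherits metadata once
                break
    return moves
```

Only files from the deleted and added sets of the same scan are compared, which matches the rule that nothing is copied unless the old file disappeared and the new one appeared since the last scan.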

Query builder

  • Let users query their data using raw SQL.
  • An LLM can be used to generate queries from the user's input. For example, "show me all photos taken between 2010 and 2015, with Ryan in them" could query by EXIF metadata and filter to photos with faces that match Ryan. Opening a read-only SQLite connection means nothing can break.
  • Would be a nightmare to do permissions for, but admin-only would still be useful.
  • Would also be a nightmare to do UI for, since the results have to be displayed in the expected format. Maybe just let the LLM pick the appropriate view format: give it options like generic, image_gallery, tv_shows, movies and documents, with each one displaying the results in a different way.
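The read-only connection can be illustrated with Python's standard `sqlite3` module, which accepts SQLite URI filenames; `mode=ro` is SQLite's own read-only flag. The helper names and the row limit are hypothetical, and a path containing characters such as `?` or `#` would need URL-quoting first.

```python
import sqlite3


def open_read_only(db_path):
    """Open an existing SQLite database so that writes are rejected."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def run_query(db_path, sql, limit=1000):
    """Run a user- or LLM-supplied SELECT and return (column names, rows)."""
    connection = open_read_only(db_path)
    try:
        cursor = connection.execute(sql)
        columns = [description[0] for description in cursor.description]
        return columns, cursor.fetchmany(limit)
    finally:
        connection.close()
```

Any statement that tries to modify the database fails with `sqlite3.OperationalError`, so a bad generated query can return wrong results but cannot damage the library.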

Use sponsorblock data

If a file is detected as a YouTube video, import SponsorBlock data and store it as segments for that video so ads can be easily skipped.

Dump python

Dockerizing Python is hard.

  • Face detection
    • Currently using insightface, which is based on ONNX. ort (https://github.com/pykeio/ort) in Rust might work, but it would require reimplementing a lot of things.
  • Text recognition
    • Currently using doctr, though the results are not great. PaddleOCR might be better, but even in Python it's a nightmare. There are ports to ONNX which could work, but again, it would require reimplementing a load of stuff.
  • CLIP
    • Probably the easiest to port. ONNX is also an option here.

Smart Tags

Let users pick images that represent a tag, and images that do not. Using CLIP vectors we can then do a fairly naive check: compare how similar an image is to the "in this tag" group and to the "not in this tag" group, and use that to decide whether the image belongs in the tag.

The user could configure a similarity required to match, and a similarity required to not match.

  • The user should be able to upload additional images to either list directly
  • There should be UI that shows which images will be picked for the tag during setup, ideally with the weakest matches sorted first so the user can use them to improve the edge cases.
  • Configurable similarity for fine tuning, with sane defaults
  • Ability to import/export smart tags to a file, so they can be shared and improved by the community.
  • If a tag is removed from an image manually, automatically add it to the "does not contain" set and vice versa for adding. This should be configurable.
  • Show in the tag list whether a tag was added through the smart tag system with a little icon beside it.
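A naive version of the similarity check might look like the sketch below. It assumes CLIP embeddings are available as plain lists of floats; averaging the cosine similarity over each example set, the three-way result, and the 0.8 default thresholds are all hypothetical choices and would need tuning against real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_similarity(vector, examples):
    """Average similarity of one vector to a set of example vectors."""
    if not examples:
        return 0.0
    return sum(cosine(vector, example) for example in examples) / len(examples)


def classify(vector, positives, negatives, match_threshold=0.8, reject_threshold=0.8):
    """Return 'match', 'reject' or 'unsure' for one image embedding."""
    positive_score = mean_similarity(vector, positives)
    negative_score = mean_similarity(vector, negatives)
    if positive_score >= match_threshold and positive_score > negative_score:
        return "match"
    if negative_score >= reject_threshold and negative_score >= positive_score:
        return "reject"
    return "unsure"
```

The "unsure" results are the natural candidates to show first in the setup UI, since labelling them is what improves the edge cases.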
