emby-dedupe's People

Contributors: dependabot[bot], troykelly

emby-dedupe's Issues

Missing concurrency

# Ensures that only the latest edge workflow is running
concurrency:
  group: edge-workflow
  cancel-in-progress: true

No protection for missing keys

  File "/app/./dedupe.py", line 1279, in <module>
    main()
  File "/app/./dedupe.py", line 1245, in main
    decisions = process_duplicate_groups(client, base_url, duplicates)
  File "/app/./dedupe.py", line 908, in process_duplicate_groups
    decision = determine_items_to_delete(group, items_details)
  File "/app/./dedupe.py", line 814, in determine_items_to_delete
    rated_items = rate_media_items(all_items_details)
  File "/app/./dedupe.py", line 839, in rate_media_items
    (s for s in item["MediaStreams"] if s["Type"] == "Video"), None
KeyError: 'MediaStreams'
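
One possible guard for the lookup in the traceback above, as a minimal sketch. The helper name `first_video_stream` is hypothetical and not part of `dedupe.py`. It assumes an item without `MediaStreams` should be treated as having no video stream rather than aborting the run.

```python
from typing import Any, Optional


def first_video_stream(item: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the first video stream of an item, or None if there is none."""
    # .get() avoids the KeyError when "MediaStreams" is absent;
    # "or []" also covers an explicit null value.
    streams = item.get("MediaStreams") or []
    # s.get("Type") tolerates stream entries that lack a "Type" key.
    return next((s for s in streams if s.get("Type") == "Video"), None)
```

The caller would then need to decide what a `None` result means for rating, for example skipping the item or giving it the lowest score.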

`warn` is deprecated

/app/./dedupe.py:114: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(f"Request failed: {exc}. Retrying...")
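
A sketch of the replacement call, assuming the logger is a standard `logging.Logger`. The logger name `EmbyDedupe` is taken from the log output elsewhere on this page, and the wrapper function `log_retry` is hypothetical, used only to make the example self-contained.

```python
import logging

logger = logging.getLogger("EmbyDedupe")


def log_retry(exc: Exception) -> None:
    """Log a retry notice without triggering the DeprecationWarning."""
    # logger.warning is the supported method; logger.warn is a deprecated alias.
    # Passing exc as an argument defers string formatting to the logging module.
    logger.warning("Request failed: %s. Retrying...", exc)
```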

Improve Media Selection Logic for Duplicate Resolution

Challenge Description

Our EmbyDedupe script identifies duplicates purely based on media metadata available from Emby's API. It then selects one media item to keep and marks the rest for deletion. The logic behind the selection of the 'best' copy among duplicates primarily considers basic attributes such as resolution and bitrate. However, this approach could be improved by adopting a more sophisticated multi-criteria rating system encompassing a broader range of media quality metrics such as codec efficiency, audio quality, frame rate, HDR presence, etc.

Objective

The goal is to enhance the duplicate resolution logic to make a more informed decision when selecting the highest quality media item to retain. This will necessitate devising a weighted scoring system where each media attribute contributes to a composite 'quality score' for each media item. The item with the highest score would be presumed to be of the best quality and retained, while the others would be marked for deletion.

Criteria for Quality Assessment

Key criteria to be considered in the scoring system should include, but not be limited to:

  • Resolution: Both width and height dimensions.
  • Video Codec Efficiency: Efficiency of video codecs such as H.264, HEVC (H.265), VP9, and AV1.
  • Audio Quality: Channel count, audio codec type, and bitrate.
  • File Size: Generally, larger file sizes suggest higher quality, but this should be weighted less heavily than other criteria to account for codec efficiency.
  • Frame Rate: Actual frame rate information from the media, with a preference for higher rates.
  • HDR Presence: Whether the video is HDR, which generally improves viewing quality.

Other factors may also be considered where relevant, such as the colour depth, the presence of subtitles, and multiple language tracks.
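
The criteria above could be combined into a composite score roughly as follows. This is only a sketch: the weights, the codec ranking, the normalisation caps (4K, 8 channels, 60 fps, 50 GB) and the `MediaFeatures` fields are all placeholder assumptions for discussion, not values taken from the script or from Emby's API.

```python
from dataclasses import dataclass

# Hypothetical weights; the real values are still to be agreed on.
WEIGHTS = {
    "resolution": 0.35,
    "video_codec": 0.20,
    "audio": 0.15,
    "frame_rate": 0.10,
    "hdr": 0.15,
    "file_size": 0.05,  # deliberately the smallest weight
}

# Hypothetical relative efficiency of the codecs named in the list above.
CODEC_EFFICIENCY = {"h264": 0.5, "vp9": 0.75, "hevc": 0.8, "av1": 1.0}


@dataclass
class MediaFeatures:
    width: int
    height: int
    video_codec: str
    audio_channels: int
    frame_rate: float
    hdr: bool
    size_bytes: int


def quality_score(m: MediaFeatures) -> float:
    """Combine normalised sub-scores (each in 0..1) into a weighted sum."""
    sub = {
        "resolution": min(m.width * m.height / (3840 * 2160), 1.0),
        "video_codec": CODEC_EFFICIENCY.get(m.video_codec.lower(), 0.0),
        "audio": min(m.audio_channels / 8, 1.0),
        "frame_rate": min(m.frame_rate / 60, 1.0),
        "hdr": 1.0 if m.hdr else 0.0,
        "file_size": min(m.size_bytes / 50e9, 1.0),
    }
    return sum(WEIGHTS[name] * sub[name] for name in WEIGHTS)


def pick_best(items: list[MediaFeatures]) -> MediaFeatures:
    """Return the item with the highest composite score."""
    return max(items, key=quality_score)
```

Normalising each sub-score to the range 0..1 before weighting keeps one attribute with large raw numbers, such as file size in bytes, from dominating the total.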

Discussion Points

Before we implement these changes, we need to address several considerations:

  1. Determining the appropriate weight for each criterion based on its significance towards perceived media quality.
  2. Ensuring the system is flexible enough to handle future updates or new media attributes.
  3. Evaluating the computational complexity of the new selection logic and its impact on the script's performance, especially when dealing with large libraries.
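
On the flexibility point, one option is to keep criteria in a registry so that a new attribute can be added without touching the scoring loop. The sketch below is an assumption about how that might look. The item keys (`hdr`, `subtitles`), the weights and the function names are hypothetical. Scoring stays linear in the number of items and criteria, so the cost per duplicate group should be small next to the API fetch.

```python
from typing import Any, Callable

# Each criterion maps an item (a plain dict here) to a score in 0..1.
Criterion = Callable[[dict[str, Any]], float]

# name -> (weight, scoring function); all names and weights are hypothetical.
CRITERIA: dict[str, tuple[float, Criterion]] = {}


def register(name: str, weight: float, fn: Criterion) -> None:
    """Add or replace a criterion without changing the scoring code."""
    CRITERIA[name] = (weight, fn)


def score(item: dict[str, Any]) -> float:
    """Weighted average of all registered criteria, in 0..1."""
    total_weight = sum(weight for weight, _ in CRITERIA.values())
    if total_weight == 0:
        return 0.0
    return sum(weight * fn(item) for weight, fn in CRITERIA.values()) / total_weight


register("hdr", 2.0, lambda item: 1.0 if item.get("hdr") else 0.0)
register("subtitles", 1.0, lambda item: 1.0 if item.get("subtitles") else 0.0)
```

Dividing by the total weight means the weights do not have to sum to one, which makes it easier to add a criterion later.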

I would appreciate feedback and thoughts on the proposed changes, including any additional criteria that might be relevant or potential pitfalls we should be aware of. Let's fine-tune our approach to establish a robust logic for media selection that satisfies our need for high-quality content.

Action Items

  • Discuss and finalize the criteria and their respective weights for the quality assessment formula.
  • Update the determine_items_to_delete function to incorporate the new weighted scoring system.
  • Test and validate the new media selection logic to ensure its accuracy and efficiency.
  • Document the changes and their rationale for future reference and maintenance.

Processing duplicates fails with `An unexpected error occurred: list index out of range`

~ docker run -it --rm \
  -e DEDUPE_EMBY_HOST="https://emby.example.com" \
  -e DEDUPE_EMBY_API_KEY="0000000000000000000000000" \
  -e DEDUPE_EMBY_LIBRARY="TV shows" \
  -e DEDUPE_DOIT="true" \
  -e DEDUPE_EMBY_USERNAME="user" \
  -e DEDUPE_EMBY_PASSWORD="secret\!PA55W0RD" \
  ghcr.io/troykelly/emby-dedupe:edge
Fetching media items: 100%|█████████| 109993/109993 [1:11:27<00:00, 25.66item/s]
Building sets: 100%|██████████████████| 64497/64497 [00:00<00:00, 718285.65item/s]
Grouping duplicates: 100%|████████████| 38007/38007 [00:00<00:00, 892569.50item/s]
Processing duplicate groups:   0%|          | 28/11768 [00:00<04:25, 44.28group/s]
2023-11-29 10:03:47,159 - EmbyDedupe - ERROR - An unexpected error occurred: list index out of range
Processing duplicate groups:   0%|          | 30/11768 [00:00<05:48, 33.67group/s]
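
The output above does not show which line raised the error. One way to surface it, sketched under the assumption that groups are processed in a simple loop, is to catch the exception per group and log it with a traceback. `process_groups_safely` and `handle_group` are hypothetical names, not functions from `dedupe.py`.

```python
import logging

logger = logging.getLogger("EmbyDedupe")


def process_groups_safely(groups, handle_group):
    """Apply handle_group to each group, logging failures with a traceback."""
    decisions, failed = [], []
    for index, group in enumerate(groups):
        try:
            decisions.append(handle_group(group))
        except (IndexError, KeyError):
            # logger.exception logs at ERROR level and appends the traceback,
            # which identifies the line that raised the error.
            logger.exception("Skipping duplicate group %d", index)
            failed.append(index)
    return decisions, failed
```

Skipping a failed group rather than aborting is a design choice. With `DEDUPE_DOIT="true"` it may be preferable to stop on the first failure instead.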
