troykelly / emby-dedupe
Deduplicates Emby media
License: Apache License 2.0
➜ ~ docker run -it --rm \
-e DEDUPE_EMBY_HOST="https://emby.example.com" \
-e DEDUPE_EMBY_API_KEY="0000000000000000000000000" \
-e DEDUPE_EMBY_LIBRARY="TV shows" \
-e DEDUPE_DOIT="true" \
-e DEDUPE_EMBY_USERNAME="user" \
-e DEDUPE_EMBY_PASSWORD="secret\!PA55W0RD" \
ghcr.io/troykelly/emby-dedupe:edge
Fetching media items: 100%|█████████| 109993/109993 [1:11:27<00:00, 25.66item/s]
Building sets: 100%|██████████████████| 64497/64497 [00:00<00:00, 718285.65item/s]
Grouping duplicates: 100%|████████████| 38007/38007 [00:00<00:00, 892569.50item/s]
Processing duplicate groups: 0%| | 28/11768 [00:00<04:25, 44.28group/s]
2023-11-29 10:03:47,159 - EmbyDedupe - ERROR - An unexpected error occurred: list index out of range
Processing duplicate groups: 0%| | 30/11768 [00:00<05:48, 33.67group/s]
Line 643 in 7eac635
Our EmbyDedupe script identifies duplicates purely from the media metadata available through Emby's API. It then selects one media item to keep and marks the rest for deletion. The logic that selects the 'best' copy among duplicates mainly considers basic attributes such as resolution and bitrate. A multi-criteria rating system that covers a broader range of media quality metrics, such as codec efficiency, audio quality, frame rate, and HDR presence, would improve on this.
The goal is to enhance the duplicate resolution logic so that it makes a more informed decision when selecting the media item to retain. This requires a weighted scoring system in which each media attribute contributes to a composite 'quality score' for each media item. The item with the highest score is presumed to be of the best quality and is retained, while the others are marked for deletion.
Key criteria to be considered in the scoring system should include, but not be limited to:
Other factors may also be considered where relevant, such as the colour depth, the presence of subtitles, and multiple language tracks.
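A weighted scoring system of the kind described above could be sketched roughly as follows. This is a minimal illustration and not the project's implementation: the weights, the codec ranking, the normalisation caps, and the helper names `quality_score` and `pick_best` are all hypothetical, and the stream field names other than `MediaStreams` and `Type` (`Height`, `BitRate`, `Codec`, `VideoRange`) are assumptions about the metadata Emby returns.

```python
from typing import Any, Dict, List, Optional

# Hypothetical weights; the proposal does not specify any values.
WEIGHTS = {"resolution": 0.4, "bitrate": 0.3, "codec": 0.2, "hdr": 0.1}

# Hypothetical codec efficiency ranking (higher is better).
CODEC_SCORES = {"av1": 1.0, "hevc": 0.9, "h264": 0.6, "mpeg4": 0.3}

# Assumed caps used to normalise raw values into the range 0..1.
MAX_HEIGHT = 2160
MAX_BITRATE = 40_000_000


def first_video_stream(item: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first video stream of an item, or None if there is none."""
    streams = item.get("MediaStreams") or []
    return next((s for s in streams if s.get("Type") == "Video"), None)


def quality_score(item: Dict[str, Any]) -> float:
    """Combine several video attributes into one composite score in 0..1."""
    video = first_video_stream(item)
    if video is None:
        return 0.0  # No video metadata: rank the item last.

    resolution = min((video.get("Height") or 0) / MAX_HEIGHT, 1.0)
    bitrate = min((video.get("BitRate") or 0) / MAX_BITRATE, 1.0)
    codec = CODEC_SCORES.get((video.get("Codec") or "").lower(), 0.0)
    hdr = 1.0 if (video.get("VideoRange") or "").upper() == "HDR" else 0.0

    return (
        WEIGHTS["resolution"] * resolution
        + WEIGHTS["bitrate"] * bitrate
        + WEIGHTS["codec"] * codec
        + WEIGHTS["hdr"] * hdr
    )


def pick_best(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the item with the highest composite score."""
    return max(items, key=quality_score)
```

Normalising each attribute before weighting keeps a large raw value such as bitrate from swamping the other criteria, and the weights can then be tuned independently of the units involved.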
Before we implement these changes, we need to address several considerations:
I would appreciate feedback and thoughts on the proposed changes, including any additional criteria that might be relevant or potential pitfalls we should be aware of. Let's fine-tune our approach to establish a robust logic for media selection that satisfies our need for high-quality content.
determine_items_to_delete function to incorporate the new weighted scoring system.
/app/./dedupe.py:114: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(f"Request failed: {exc}. Retrying...")
File "/app/./dedupe.py", line 1279, in <module>
main()
File "/app/./dedupe.py", line 1245, in main
decisions = process_duplicate_groups(client, base_url, duplicates)
File "/app/./dedupe.py", line 908, in process_duplicate_groups
decision = determine_items_to_delete(group, items_details)
File "/app/./dedupe.py", line 814, in determine_items_to_delete
rated_items = rate_media_items(all_items_details)
File "/app/./dedupe.py", line 839, in rate_media_items
(s for s in item["MediaStreams"] if s["Type"] == "Video"), None
KeyError: 'MediaStreams'
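The traceback above suggests that some items returned by the API carry no `MediaStreams` key at all. One way to guard against that, sketched here under assumptions, is to look the key up with `dict.get` and to set aside items that have no video stream before rating them. The helper names `get_video_stream` and `split_ratable` and the `Id` field are hypothetical and do not come from `dedupe.py`; the sketch also uses `logger.warning`, the replacement that the deprecation warning in the log recommends.

```python
import logging
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger("EmbyDedupe")


def get_video_stream(item: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first video stream, tolerating a missing 'MediaStreams' key."""
    streams = item.get("MediaStreams") or []
    return next((s for s in streams if s.get("Type") == "Video"), None)


def split_ratable(
    items: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate items that can be rated from those lacking video metadata."""
    ratable: List[Dict[str, Any]] = []
    skipped: List[Dict[str, Any]] = []
    for item in items:
        if get_video_stream(item) is None:
            # 'Id' is an assumed field name, used only for the log message.
            logger.warning(
                "Item %s has no video stream metadata; skipping",
                item.get("Id", "<unknown>"),
            )
            skipped.append(item)
        else:
            ratable.append(item)
    return ratable, skipped
```

Whether skipped items should be kept, reported, or rated with a fallback score is a design decision this sketch leaves open.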
# Ensures that only the latest edge workflow is running
concurrency:
  group: edge-workflow
  cancel-in-progress: true