laughingman77 / video_list_csv

Recursively create a CSV with details of all movie and TV media in a directory. Useful for cataloguing archive disks. This is tailored towards Jellyfin, Plex and Kodi style collections.

License: Apache License 2.0

Shell 100.00%

archive csv jellyfin bash-script filesystem video home-theater kodi plex bash posix spreadsheet

video_list_csv

Overview

This script recursively scans a directory and creates a CSV with details of all Movie and TV media in that directory and its sub-directories. It is aimed at Home Theatre enthusiasts and their archives. However, it can be used for any directory containing video media.

The script uses ffprobe or mediainfo to scan the library. It can automatically detect which libraries are present, or you can define what package you want to use as a CLI argument or globally in the .env file.
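The automatic selection described above could be sketched roughly as follows. This is not the script's actual code: `first_available` is a hypothetical helper name, and the sketch only assumes that "detecting" a scanner means finding its binary on the PATH.

```sh
#!/bin/sh
# Hypothetical helper: print the first candidate command found on PATH,
# or return 1 if none of them exist.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer ffprobe, fall back to mediainfo, otherwise report the problem.
if scanner=$(first_available ffprobe mediainfo); then
    echo "Using scanner: $scanner"
else
    echo "Neither ffprobe nor mediainfo was found" >&2
fi
```

Listing the candidates in order of preference keeps the fallback rule in one place.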

This is a POSIX-compliant shell script. It should work on Linux and macOS systems. Windows users should be able to run it under WSL (see How to run .sh or Shell Script file in Windows 11/10), although this is untested.

The script assumes that you have separate archives for TV and movies and automatically detects what kind of archive it is scanning and renders the appropriate archive list. You can also define what kind of list you want as a CLI argument or globally in the .env file.

You can define which columns appear in the CSV, and their order, in the settings file. An extra Sort column is automatically appended after the other columns to allow proper sorting (no other column is 100% reliable for this).

The resultant CSV contains summary data of Disk Size, Disk Space used and Disk Space free. This allows you to see free disk space after other files in the archive are taken into account.
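For illustration, figures like these can be read with the POSIX `df` utility. The sketch below is an assumption about one way to do it, not the script's implementation, and `disk_summary` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: print "size used free" (in 1024-byte units) for the
# file system that holds the given path.
disk_summary() {
    # -P selects the portable output format and -k selects 1024-byte units;
    # the values of interest are columns 2 to 4 of the second line.
    # Assumes the file system name in column 1 contains no spaces.
    df -Pk "$1" | awk 'NR == 2 { print $2, $3, $4 }'
}

disk_summary .
```

Because the numbers come from the file system rather than from the listed videos, the free space reflects everything stored on the disk.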

Note: The scan will be fastest if you have the media info in the filename.

Note: The automatic detection is based on the format of the first video filename that the script parses. If you have extras in your archive, you may want to specify the archive type as a CLI option: -t {tv,movie}.

Note: ffprobe is much faster than mediainfo, but at present it cannot fetch HDR10 or HDR10+ definitions.

Disclaimer

This script is intended for people who want to maintain an archive of legitimately backed up or original videos. However, it can be configured to list the Release type of a video, which is specific to pirated media (see Pirated movie release types). We do not condone piracy in any way; it is against the law, and this feature has only been added for completeness and nerdiness.

Installation

Requirements

git
jq
ffprobe or mediainfo

To use `ffprobe` (default scanner; on Debian/Ubuntu it is provided by the ffmpeg package)

sudo apt install git jq ffmpeg

To use `mediainfo`

sudo apt install git jq mediainfo

Clone the repository

git clone git@github.com:laughingman77/video_list_csv.git

Configuration

The .env file contains the configuration for various options in the script. The example.env contains all of the default settings. Copy example.env to .env and, if needed, configure .env to your requirements:

cd video_list_csv && cp example.env .env

Usage

Copy the Movie Archive spreadsheet into your home directory.
In your new spreadsheet, duplicate the Archive Template sheet, and give it a meaningful name.

Run the script:

sh video_list_csv.sh /path/to/archive/dir/ > ~/archive.csv

or, if the script is executable:

./video_list_csv.sh /path/to/archive/dir/ > ~/archive.csv

Import archive.csv into your spreadsheet program.
Copy the cells from the imported CSV data and paste it into your archive sheet at cell A4.
Select the cells for the media data and sort by the Sort column.
Format the sheet to your preference.

CLI Options

CLI options allow you to override the values in .env:

-a, --trim-release-type Trim any Release type words from the Edition column (0 or 1).
-b, --trim-resolution Trim any Resolution words from the Edition column (0 or 1).
-h, -?, --help Display the help text.
-i, --default-stream Display only the default streams for audio and video (0 or 1).
-s, --scanner Set the scanner program (ffprobe or mediainfo).
-t, --type Set the archive type (tv or movie).
-f, --force Force detect the media metadata from the file (0 or 1).
-d, --detect Detect the media metadata if not in the filename (0 or 1).
-e, --season Display season only when episode is #1 (0 or 1).
-r, --series Display series only when season is #1 and episode is #1 (0 or 1).
-x, --movie_columns Define the Movie columns.
-z, --tv_columns Define the TV columns.

.env options

scanner: (ffprobe, mediainfo) Select the preferred scanning program globally. If not set, ffprobe takes precedence, falling back to mediainfo if ffprobe is not detected.
type: (tv or movie) Set the archive media type globally.
detect_if_not_in_filename: (0 or 1) If the video/audio formats or resolution are not found in the filename, then automatically detect them.
trim_release_type: (0 or 1) Trim any Release type words from the Edition column.
trim_resolution: (0 or 1) Trim any Resolution words from the Edition column.
default_stream: (0 or 1) Only display the default streams (reverts to displaying all streams if no stream is set as default). This affects the Audio, Video and Resolution columns.
force_detect: (0 or 1) Force detection of the video streams on all videos (this will override detect_if_not_in_filename and ignore any values found in the filename for the Resolution/Video/Audio columns).
display_season_for_1: (0 or 1) Only extract the season number if the episode is 01; this makes a TV list more readable.
display_series_for_1: (0 or 1) Only extract the series name if the season and episode are both 01; this makes a TV list more readable.
tv_columns: TV archive columns to render, and their order.
movie_columns: Movie archive columns to render, and their order.

Columns config

By configuring the tv_columns and movie_columns, you can dictate which columns are rendered and in what order.

The column names are separated by the | character.

The possible columns are:

Title: (Only for Movies) the Movie title.
Edition: (Only for Movies) the release edition, e.g. Director's Cut, Cinematic Cut, Special Edition, Unrated, Uncut, etc.
Series: (Only for TV series) the TV series title.
Season: (Only for TV series) the TV series season.
Episode: (Only for TV series) the TV series episode.
Year: Release year
Resolution: Video resolution (480p, 720p, 1080p, 2160p, etc.)
Video: The video codec and colouration of the stream/s, e.g. DV, AVC, HEVC, HDR10+, etc.
Audio: The audio codec, channel layout and language
Subtitles: (not in the default configuration) The list of subtitle stream/s.
Release Type: (not in the default configuration) Pirated release type - NOT recommended
Size (GB): File size in GB
Size (MB): File size in MB
Size (KB): File size in KB
Size (B): File size in B
Filename: Filename
Full Path: Absolute filepath and filename (this includes the mount path if the archive disk is an external disk)
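As a rough illustration of the | separated format, the sketch below splits a column definition into individual names. The value of `movie_columns` shown here is only an example (the real defaults live in example.env), and `split_columns` is a hypothetical helper, not part of the script.

```sh
#!/bin/sh
# Example value only; see example.env for the real defaults.
movie_columns='Title|Edition|Year|Resolution|Video|Audio|Size (GB)|Filename'

# Hypothetical helper: print one column name per line.
# The function body runs in a subshell, so the IFS and globbing changes
# do not leak into the caller.
split_columns() (
    IFS='|'
    set -f    # no filename expansion while the string is split
    for column in $1; do
        printf '%s\n' "$column"
    done
)

split_columns "$movie_columns"
```

Splitting only on | means names that contain spaces or parentheses, such as Size (GB), stay intact.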

Directory and Filenames

The script is designed for the directory and filenaming structure of Jellyfin, Plex and Kodi.

The script assumes a separator of space or period between words in the filename, and will do its best to detect items. Hyphens are not treated as separators, because doing so produced too many false positives.

All TV episodes should be in the format of S[0-9]{2}E[0-9]{2} (case-insensitive), examples:

S01E01
s01e01
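A minimal sketch of how the episode pattern above can be tested, and how it might drive the tv/movie decision. The function names are hypothetical and the script's own detection logic may differ.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the name contains an SxxEyy marker
# (case-insensitive), as described above.
is_tv_episode() {
    printf '%s\n' "$1" | grep -Eiq 'S[0-9]{2}E[0-9]{2}'
}

# Hypothetical helper: classify a filename as "tv" or "movie".
detect_type() {
    if is_tv_episode "$1"; then
        echo tv
    else
        echo movie
    fi
}

detect_type 'TV.Show.Name.S01E01.foo.bar.mkv'
detect_type 'Blade Runner (1982).mkv'
```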

Multiple audio/video streams

If the script falls back to probing the video file:

If there is only one stream, it will list only the codec, as if it were in the filename, e.g.:
```
"AVC DV HDR10+ (en)"
```
If there are multiple streams, it will list each stream number and its codec in a comma-separated list, e.g.:
```
"stream_1: DTS 5.1 (en), stream_2: AC3 2.0 (za)"
```
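The two cases above can be sketched with a small awk filter that reads one stream description per line. `format_streams` is a hypothetical name and this is not necessarily how the script builds the cell.

```sh
#!/bin/sh
# Hypothetical helper: read one stream description per line on stdin.
# A single stream is printed as-is; several streams are numbered and
# joined into one comma-separated line.
format_streams() {
    awk '
        { lines[NR] = $0 }
        END {
            if (NR == 1) { print lines[1]; exit }
            for (i = 1; i <= NR; i++) {
                printf "%sstream_%d: %s", (i > 1 ? ", " : ""), i, lines[i]
            }
            if (NR > 0) print ""
        }'
}

printf 'DTS 5.1 (en)\nAC3 2.0 (za)\n' | format_streams
```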

Testing

Locally

A script has been created to manually lint all files. Run:

cd video_list_csv
./test.sh

Expected output:

$ ./test.sh 
.env does not exist, generating the default .env...
Checking ./archive_list.sh
OK
Checking ./includes/ffprobe.sh
OK
Checking ./includes/functions.sh
OK
Checking ./includes/archive_list.sh
OK
Checking ./includes/mediainfo.sh
OK
Checking ./includes/progressbar.sh
OK
Checking ./video_list_csv.sh
OK
Checking ./test.sh
OK

Simulate GitHub Actions locally

CI/CD linting is implemented using GitHub Actions. You can run the pipelines locally, using nektos/act:

apt install act
cd video_list_csv
sudo act

This should give output similar to:

...
| beginning shell linting...
| not excluding any dirs
| finding and linting all shell scripts/files via shellcheck...
| [PASS]: shellcheck - successfully linted: ./ffprobe.sh
| [PASS]: shellcheck - successfully linted: ./archive_list.sh
| [PASS]: shellcheck - successfully linted: ./test.sh
| [PASS]: shellcheck - successfully linted: ./mediainfo.sh
| [PASS]: shellcheck - successfully linted: ./progressbar.sh
| finding and linting all files with shell shebangs via shellcheck...
| looking for subdirectories of bin directories that are not usable via PATH...
| looking for programs in PATH that have a filename suffix
| done
...

Thanks To

Awesome online applications used in development and testing:

JSONLint: https://jsonlint.com/
JSON Pretty Print: https://jsonformatter.org/json-pretty-print
jq kung fu: https://jqkungfu.com/

Technical experts:

Progressbar inspiration: https://github.com/albertomosconi/posixbar
Parse command line options for a shell script (POSIX): https://gist.github.com/deshion/10d3cb5f88a21671e17a
Pseudo arrays: https://gist.github.com/biiont/290341b29657c0bb2df6
Padding a string: https://stackoverflow.com/a/74964817
Validation of dependencies: https://stackoverflow.com/questions/592620/how-can-i-check-if-a-program-exists-from-a-bash-script
Line count in a variable: https://unix.stackexchange.com/questions/482893/how-to-posix-ly-count-the-number-of-lines-in-a-string-variable
Suppress Permission Denied messages: https://stackoverflow.com/questions/762348/how-can-i-exclude-all-permission-denied-messages-from-find

video_list_csv's Issues

Version generates quite a few false positives, does not support Plex or Kodi formats and misses out non-standard cut naming

Problem

We are not capturing all edition labels and generating quite a few false positives. We should also capture Kodi and Plex standards.

Plex correctly points out that version and edition are different

Versions all represent the same release of an item. So, you can have multiple versions (1080p vs 480p, HEVC vs H.264, MP4 vs MKV) of The Empire Strikes Back, but they’re all for the same theatrical release of the movie.

Editions represent different releases of an item. So, the “theatrical release” vs the “Special Edition” of The Empire Strikes Back. Or “Theatrical” vs “Director’s Cut” vs “Final Cut” of Blade Runner. Editions would also be appropriate for a 2D vs 3D version of a movie.

Plex

@see https://support.plex.tv/articles/200381043-multi-version-movies/
@see https://support.plex.tv/articles/multiple-editions/

title (year) {edition-Edition Name}
i.e. Blade Runner (1982) {edition-Director's Cut}.mkv

Note that Plex also adds the imdb and tmdb codes in curly brackets:

title (year) {imdb-tt0372784}.mp4
title (year) {tmdb-272}.mp4

Jellyfin

@see https://jellyfin.org/docs/general/server/media/movies/
uses everything after " - ":

title (year) - Edition Name
i.e. Blade Runner (1982) - The Final Cut

Kodi

@see https://kodi.wiki/view/Video_versions
uses everything after " - ":

title (year) - Edition Name
i.e. Blade Runner (1982) - The Final Cut [bluray]

Solution

Change Version column name to Edition.

Change the regex and regex order:

Search for '.* {edition-...} .*' (plex)
Search for '.* (\d{4}) - ...' (kodi & Jellyfin)
Search for '.* Director's Cut|Theatrical Cut|Extended Cut .*' (fallback)
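The proposed search order could be sketched as below. This is an illustration under stated assumptions: `extract_edition` is a hypothetical function, it expects the name without its file extension, and the fallback list contains only the three cuts named above.

```sh
#!/bin/sh
# Hypothetical helper: print the edition found in a movie name, or an
# empty line if none is found. The name is expected without extension.
extract_edition() {
    name=$1
    # 1. Plex: {edition-Edition Name}
    edition=$(printf '%s\n' "$name" | sed -n 's/.*{edition-\([^}]*\)}.*/\1/p')
    # 2. Kodi and Jellyfin: everything after "(year) - "
    if [ -z "$edition" ]; then
        edition=$(printf '%s\n' "$name" | sed -n 's/.*([0-9]\{4\}) - \(.*\)$/\1/p')
    fi
    # 3. Fallback: well-known cut names anywhere in the name
    if [ -z "$edition" ]; then
        case $name in
            *"Director's Cut"*) edition="Director's Cut" ;;
            *"Theatrical Cut"*) edition="Theatrical Cut" ;;
            *"Extended Cut"*)   edition="Extended Cut" ;;
        esac
    fi
    printf '%s\n' "$edition"
}

extract_edition "Blade Runner (1982) {edition-Director's Cut}"
extract_edition "Blade Runner (1982) - The Final Cut"
```

Trying the most specific format first means a Plex tag is never mistaken for free text after a hyphen.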

Video and Audio columns can take multiple seconds to parse if reading mediainfo

This is probably because all of the parsing is done in jq, which needs optimising.

Trim release type from Edition column

In some cases, the user may want to list the release type in the filename (see Edition column: after " - " for Jellyfin and inside {} for Plex).

This can lead to useless duplication of data.

The Edition column should strip Release type text from the field. It may be useful to set up a function that returns only the regex for release type words, giving a single point of code and easier maintenance.

The script does not work on Arch linux

It appears that the code is not fully POSIX compliant yet.

Arch links sh to the bash processing environment. This leads to issues undetected on Ubuntu:

grep: warning: stray \ before white space
grep: warning: stray \ before :
grep: warning: stray \ before -

In addition, the progressbar does not display correctly on Arch.

This script should work in sh and dash as well as bash environments.
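Warnings of this kind are typically emitted by recent GNU grep releases when a pattern escapes a character that needs no escaping. One possible remedy, sketched here with a hypothetical `has_edition_separator` helper rather than the script's real patterns, is to use bracket expressions instead of backslash escapes.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the name contains a " - " or " : "
# style separator. Bracket expressions avoid patterns such as '\ \-\ ',
# which trigger "stray \" warnings.
has_edition_separator() {
    printf '%s\n' "$1" | grep -Eq '[[:space:]][-:][[:space:]]'
}

if has_edition_separator 'Blade Runner (1982) - The Final Cut'; then
    echo 'separator found'
fi
```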

Create a new Video column to support different versions and display codecs used

Plex allows for multiple versions of a video (alongside Edition), e.g.:

Blade Runner (1982).4k.hevc {edition-Director's Cut}.mkv

We should also search for the following keywords after the date (4 digits):

avc
- avc
- h 264
- h264
- mpeg-4 part 10
hevc
- hevc
- h 265
- h265
- mpeg-h part 2
vvc
- h266
- h.266
mpeg-4
- mpeg4
- mpeg 4
mpeg-2
- mpeg2
- mpeg 2
av-1
- av1
- av-1
10-bit
- 10bit
- 10 bit
dc
vc-1
hdr
hdr10
hdr10+
vp9
divx
mjpeg
3d

Add CLI args for all the .env args

It would be very convenient to have CLI args for all of the .env args.

This would allow one-off changing of parameters for a single run, and as a side effect make the scripts self documenting.

Ensure the script is portable (posix systems, sh/zsh/fish/dash/etc shells)

The script was created on Ubuntu (bash shell).

It is untested on other POSIX systems and shells. Testing needs to be done on other systems and shells, and the syntax may need updating to ensure maximum portability.

Some very good comments in https://unix.stackexchange.com/questions/555099/what-is-the-most-portable-shell-and-relevant-best-practices-to-follow

Detect DV, HDR10, HDR10+

mediainfo and ffprobe are currently unable to detect these. This means that if they are not in the filename, then they are never found.

As soon as a way of detecting these is found, it should be implemented

Handle multiple Video and Audio streams

Currently, the script assumes a single Audio or Video stream, and looks at only the first.

After optimisation - fetching the full ffprobe result once and using jq to extract the required value from the stream/s, we now get the problem of potential multiple values (separated by newline). e.g.:

AVC
MJPEG
MJPEG

These need to be represented in a sensible manner in a single cell in the CSV/spreadsheet, and also paired with matching columns, such as Resolution|Video.

Add a column for Subtitles

Combine detect_resolution, detect_video_codec & detect_audio_codec into one setting

The individual settings are too fine-grained. For readability and simpler configuration, these should be combined into a single .env variable:

detect_if_not_in_filename

[optimisation] Allow the code to use ffprobe OR mediainfo

Proposal

Allow the user to specify the media scanning package or allow automatic selection in the .env file. If the config is not set, then fallback to automatic detection.

If the package specified does not exist, then fail gracefully.

If automatic, then prefer ffprobe, and fall back to mediainfo if ffprobe is not present. If neither package is available, exit gracefully.

Rationale:

mediainfo can take a long time to return results, compared to ffprobe. Sample results and times on a troublesome file:

$ time ffprobe -v quiet -print_format json  -show_format -show_streams sample_video.mkv
...
real	0m0.295s
user	0m0.040s
sys	0m0.075s

$  time mediainfo  --Output=JSON sample_video.mkv
...
real	0m23.114s
user	0m0.646s
sys	0m1.221s

The file was local (external hard disk). Repeated tests on the same file showed similar timings. The time differences (0.295s vs 23.114s) are huge!

Considerations

There may be regression on the colour sampling (this seems difficult in ffprobe)
It may be possible to reduce the mediainfo scan time by specifying only the information we need
Are there solutions other than ffprobe or mediainfo?
The fetching of the JSON data will need to be abstracted into functions, which will add a "little" extra processing time. But looking at the difference between ffprobe and mediainfo, who cares?
There appears to be no time difference between JSON and text format outputs, so this is not a factor.
There are reports of inconsistencies in the data between ffprobe and mediainfo. Which is the most reliable?
It might be useful to allow an override of the config in the CLI, e.g. ./archive_list.sh --scanner=ffprobe /path/to/dir/

Deprecate archive_list.sh

The entrypoint script archive_list.sh has been replaced by video_list_csv.sh.

archive_list.sh should be deleted three months after it was marked as deprecated (end of April 2024).

Handle extras movie files

We need to handle:

Single file extras (suffixed with " - ...")
Directories containing specific extras
extras for TV Shows
extras for TV seasons

In addition, what would the column structure look like?
Should the user be given a configuration option to group extras under movie/series/season?

Create a new Audio column to support different versions and display codecs used

Option to display only the default streams

In some cases, the Video and Audio columns can become quite cluttered with multiple streams. Not all users want the full details; some only want to see the default stream.

A user should be able to define what level of information they want, with a simple config item in the .env.

Title column for special filenames should be prefixed with the movie name

Currently we are getting bad Title for trailers, etc:

trailer

These should be:

<dir_name> - trailer
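A minimal sketch of the proposed prefixing. `special_title` is a hypothetical function, and the list of special names (trailer, sample, theme) is an assumption made for illustration only.

```sh
#!/bin/sh
# Hypothetical helper: print the Title for a video path. Special
# filenames are prefixed with the name of the containing directory.
special_title() {
    path=$1
    file=$(basename "$path")
    title=${file%.*}    # strip the file extension
    case $title in
        trailer|sample|theme)    # assumed list of special names
            parent=$(basename "$(dirname "$path")")
            title="$parent - $title"
            ;;
    esac
    printf '%s\n' "$title"
}

special_title '/media/Blade Runner (1982)/trailer.mkv'
```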

HT media servers usually propagate the date for TV shows from the containing directory

We are currently searching for Year in the filename. While this is fine, we should also look in the containing directory, because this seems to be the standard...

├── TV Show Name (date)
│   ├── Season 1
│   │   ├── TV.Show.Name.S01E01.foo.bar.mkv
│   │   ├── ...
│   ├── Season 2
│   │   ├──...
│   ├── ...

Look for the date in the filename. If it is not found there, search the directory name two levels up.
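The lookup order could be sketched as follows. Both function names are hypothetical, and the sketch assumes the year is written in parentheses, as in "TV Show Name (date)"; bare years in a filename would need a different pattern.

```sh
#!/bin/sh
# Hypothetical helper: print the last "(YYYY)" year found in a string,
# or nothing if there is none.
year_in() {
    printf '%s\n' "$1" | sed -n 's/.*(\([0-9]\{4\}\)).*/\1/p'
}

# Hypothetical helper: look in the filename first, then in the directory
# two levels up (file -> Season N -> TV Show Name (date)).
find_year() {
    path=$1
    year=$(year_in "$(basename "$path")")
    if [ -z "$year" ]; then
        show_dir=$(dirname "$(dirname "$path")")
        year=$(year_in "$(basename "$show_dir")")
    fi
    printf '%s\n' "$year"
}

find_year '/media/TV Show Name (2019)/Season 1/TV.Show.Name.S01E01.foo.bar.mkv'
```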

Plex, Emby, Jellyfin and Kodi all have their own forks to implement this. It may be worth investigating integrating one of their packages somehow.

{
      "index": 0,
      "codec_name": "hevc",
      "codec_long_name": "H.265 / HEVC (High Efficiency Video Coding)",
      "profile": "Main 10",
      "codec_type": "video",
      "codec_tag_string": "[0][0][0][0]",
      "codec_tag": "0x0000",
      "width": 3840,
      "height": 2160,
      "coded_width": 3840,
      "coded_height": 2160,
      "closed_captions": 0,
      "film_grain": 0,
      "has_b_frames": 1,
      "sample_aspect_ratio": "1:1",
      "display_aspect_ratio": "16:9",
      "pix_fmt": "yuv420p10le",
      "level": 153,
      "color_range": "tv",
      "color_space": "bt2020nc",
      "color_transfer": "smpte2084",
      "color_primaries": "bt2020",
      "chroma_location": "topleft",
      "refs": 1,
      "r_frame_rate": "24000/1001",
      "avg_frame_rate": "24000/1001",
      "time_base": "1/1000",
      "start_pts": 0,
      "start_time": "0.000000",
      "extradata_size": 798,
      "disposition": {
        "default": 0,
        "dub": 0,
        "original": 0,
        "comment": 0,
        "lyrics": 0,
        "karaoke": 0,
        "forced": 0,
        "hearing_impaired": 0,
        "visual_impaired": 0,
        "clean_effects": 0,
        "attached_pic": 0,
        "timed_thumbnails": 0,
        "non_diegetic": 0,
        "captions": 0,
        "descriptions": 0,
        "metadata": 0,
        "dependent": 0,
        "still_image": 0
      },
      "tags": {
        "language": "eng",
        "BPS-eng": "64464572",
        "DURATION-eng": "01:34:05.640000000",
        "NUMBER_OF_FRAMES-eng": "135360",
        "NUMBER_OF_BYTES-eng": "45492971164",
        "SOURCE_ID-eng": "001011",
        "_STATISTICS_WRITING_APP-eng": "MakeMKV v1.17.6 linux(x64-release)",
        "_STATISTICS_WRITING_DATE_UTC-eng": "2024-03-31 01:08:10",
        "_STATISTICS_TAGS-eng": "BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES SOURCE_ID"
      },
      "side_data_list": [
        {
          "side_data_type": "DOVI configuration record",
          "dv_version_major": 1,
          "dv_version_minor": 0,
          "dv_profile": 7,
          "dv_level": 6,
          "rpu_present_flag": 1,
          "el_present_flag": 1,
          "bl_present_flag": 1,
          "dv_bl_signal_compatibility_id": 6
        }
      ]
    }
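Fields like those in the stream above could, in principle, be inspected with jq. The sketch below is an assumption, not the script's method: `classify_hdr` is a hypothetical function, it treats a "DOVI configuration record" side data entry as Dolby Vision, and it treats the smpte2084 transfer function as HDR10, which is a simplification. It does not attempt to detect HDR10+.

```sh
#!/bin/sh
# Hypothetical helper: read ONE ffprobe video stream object (JSON) on
# stdin and print a coarse dynamic range label.
classify_hdr() {
    jq -r '
        if any(.side_data_list[]?; .side_data_type == "DOVI configuration record")
        then "DV"
        elif .color_transfer == "smpte2084"
        then "HDR10"
        else "SDR"
        end'
}

# Usage example; skipped when jq is not installed.
if command -v jq >/dev/null 2>&1; then
    printf '%s' '{"codec_name":"hevc","color_transfer":"smpte2084"}' | classify_hdr
fi
```

The `[]?` form keeps the filter from failing on streams that have no side_data_list at all.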

Move archive_list.sh into the includes/ directory

Having an entry script name that is different to the repository name is a little confusing.

Tasks:

Move archive_list.sh into /includes/
Create a wrapper script in the root directory: video_list_csv.sh
Clone video_list_csv.sh into archive_list.sh in the root directory, for backwards compatibility.
- This will be deprecated at a later date