fredmorcos / tvrank

Movies & series ranking

License: MIT License

Rust 100.00%

movies series imdb tvdb tmdb

tvrank's Introduction

`TVrank`: A Rust library and command-line utility for ranking movies and series

TVrank is a library and command-line utility written in Rust for querying and ranking information about movies and series. It can be used to query a single title or scan media directories.

Currently, TVrank only supports IMDB's TSV dumps, which it automatically downloads, caches and periodically updates. More work is required to support and cache live-query services like TMDB and TVDB.

The in-memory database is reasonably fast and its on-disk persistent cache format reasonably efficient.

The library's documentation is badly lacking but there is an example on how to use it.

For now, the TVrank command-line utility works well and is fast enough to replace, for example, searching for a title through DuckDuckGo with something like !imdb TITLE. If you still want to see the IMDB page for a title, TVrank prints a direct link with each search result so that it can be opened from the terminal.

Note that TVrank depends on the flate2 crate for decompression of IMDB TSV dumps. flate2 is extremely slow when built in debug mode, so it is recommended to always run TVrank in release mode unless there are good reasons not to. By default, release mode is built with debugging information enabled for convenience during development.

Usage

For information on how to use the library, see below.

The TVrank command-line interface has a few modes accessible through the use of sub-commands:

search "KEYWORDS..." to search by keywords.
search "KEYWORDS... (YYYY)" to search by keywords in a specific year.
search "TITLE (YYYY)" --exact to search for an exact title in a specific year.
search "TITLE" --exact to search for an exact title (-e also means exact).
scan-movies and scan-series to make batch queries based on directory scans.
mark to mark a directory with a title information file (tvrank.json).

Examples

To search for a specific title:

$ tvrank search "the great gatsby (2013)" -e

To search for all titles containing "the", "great" and "gatsby" in the year 2013:

$ tvrank search "the great gatsby (2013)"

To search based on keywords:

$ tvrank search "the great gatsby"

To search based on an exact title:

$ tvrank search "the great gatsby" -e

To query a series directory:

$ tvrank scan-series <SERIES_MEDIA_DIR>

Also, by default TVrank will sort by rating, year and title. To instead sort by year, rating and title, --sort-by-year can be passed before any sub-command:

$ tvrank --sort-by-year search "house of cards"
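
As a rough illustration of the two orderings, the sketch below sorts a list of entries using tuple keys. The Entry struct and both function names are hypothetical, and the direction of each key (higher ratings and later years first, titles ascending) is an assumption, not TVrank's documented behaviour.

```rust
use std::cmp::Reverse;

/// Hypothetical, simplified record used only for this sketch.
#[derive(Debug, Clone)]
struct Entry {
    title: String,
    year: u16,
    rating: u8, // assumed to be on a 0-100 scale
}

/// Default order: rating, then year, then title.
/// Assumption: higher ratings and later years come first.
fn sort_by_rating(entries: &mut [Entry]) {
    entries.sort_by(|a, b| {
        (Reverse(a.rating), Reverse(a.year), &a.title)
            .cmp(&(Reverse(b.rating), Reverse(b.year), &b.title))
    });
}

/// Order used with --sort-by-year: year, then rating, then title.
fn sort_by_year(entries: &mut [Entry]) {
    entries.sort_by(|a, b| {
        (Reverse(a.year), Reverse(a.rating), &a.title)
            .cmp(&(Reverse(b.year), Reverse(b.rating), &b.title))
    });
}
```

Comparing tuples keeps the tie-breaking rules in one place: the second key is only consulted when the first is equal, and so on.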

You can also limit the output of movies and series to the top N entries:

$ tvrank search "the great gatsby" --top 2

You can change the output format to json or yaml:

$ tvrank search "the great gatsby" --output json

Batch Queries

TVrank can recursively scan directories and print out information about titles it finds. This is achieved using the scan-movies and scan-series subcommands.

Movie Batch Queries

TVrank expects movie directories to be under a top-level movies media directory (herein called movies), as follows:

movies
├── ...
├── 127 Hours (2010)
├── 12 Mighty Orphans (2021)
├── 12 Monkeys (1995)
├── 12 Years a Slave (2013)
├── 13 Hours The Secret Soldiers of Benghazi (2016)
├── ...

Movie sub-directories are expected to follow the TITLE (YYYY) format where the TITLE matches either the primary or original movie title.

If a movie sub-directory does not adhere to this format, TVrank will recursively search it for more titles. An example of that is as follows:

movies
├── ...
├── The Naked Gun
│   ├── The Naked Gun (1988)
│   ├── The Naked Gun 2½ The Smell of Fear (1991)
│   └── The Naked Gun 33 1-3 The Final Insult (1994)
├── ...
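
One way to recognise the TITLE (YYYY) convention is sketched below. The function name parse_title_and_year is hypothetical and this is not TVrank's actual parser. A None result corresponds to the case described above, where the directory would be searched recursively instead of being treated as a title.

```rust
/// Splits a directory name of the form `TITLE (YYYY)` into title and year.
/// Returns `None` when the name does not follow that format.
fn parse_title_and_year(name: &str) -> Option<(&str, u16)> {
    // The name must end with a closing parenthesis...
    let rest = name.trim().strip_suffix(')')?;
    // ...and the last opening parenthesis separates title from year.
    let (title, year) = rest.rsplit_once('(')?;
    let title = title.trim_end();
    // Require a non-empty title and exactly four ASCII digits for the year.
    if title.is_empty() || year.len() != 4 || !year.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some((title, year.parse().ok()?))
}
```

With this sketch, "12 Monkeys (1995)" yields a title and a year, while "The Naked Gun" yields None and would therefore be descended into.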

Series Batch Queries

TVrank also expects series directories to be under a top-level series media directory (herein called series) following either TITLE or TITLE (YYYY) format. The TITLE (YYYY) format can be used to easily disambiguate similarly-titled series. Examples:

series
├── ...
├── House of Cards (1990)
├── Killing Eve
├── Kingdom (2019)
├── ...

Handling Ambiguity in Batch Queries

Sometimes it is impossible to distinguish between titles just from their original/primary title and release year, because multiple movies or series were released in the same year under exactly the same title.

To handle this issue, TVrank supports the ability to explicitly provide title information files (called tvrank.json) under the corresponding title directory. These files are detected when using the scan-movies and scan-series sub-commands and are used for exact identification using the title's unique ID.

A tvrank.json file looks like this:

{
  "imdb": {
    "id": "ttXXXXXXXX"
  }
}

where "ttXXXXXXXX" is the IMDB title id shown under the IMDB ID column or available as part of the IMDB URL of a title.

You can ask TVrank to write the title information (tvrank.json) file for you by using the mark sub-command and passing it the title's directory and ID that you would like to write.

$ tvrank mark "movies/The Great Gatsby (2013)" tt1343092

This will result in a file called movies/The Great Gatsby (2013)/tvrank.json containing the following information:

{
  "imdb": {
    "id": "tt1343092"
  }
}

If a tvrank.json file already exists, TVrank will refuse to overwrite it. To force overwriting it, the --force flag can be used.
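
The refuse-unless-forced behaviour maps naturally onto std::fs::OpenOptions. The sketch below is an assumption about how such a command could be written, not TVrank's implementation. The name write_title_info is hypothetical, and the ID is assumed to have been validated already, so no JSON escaping is done.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Writes `tvrank.json` into `dir` for the given IMDB ID.
/// Without `force`, an existing file is left untouched and an
/// `AlreadyExists` error is returned.
fn write_title_info(dir: &Path, imdb_id: &str, force: bool) -> io::Result<()> {
    let mut options = OpenOptions::new();
    options.write(true);
    if force {
        // Overwrite whatever is there.
        options.create(true).truncate(true);
    } else {
        // Fail if the file already exists.
        options.create_new(true);
    }
    let mut file = options.open(dir.join("tvrank.json"))?;
    writeln!(file, "{{\n  \"imdb\": {{\n    \"id\": \"{}\"\n  }}\n}}", imdb_id)
}
```

Using create_new leaves the existence check to the operating system, which avoids a race between checking for the file and creating it.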

Verbosity

To print out more information about what the application is doing, use -v before any sub-command. Multiple occurrences of -v on the command-line will increase the verbosity level:

$ tvrank -vvv --sort-by-year search "city of god"

The following options can be given before or after the sub-command. When an option is given in both positions, the one after the sub-command takes precedence.

--verbose
--sort-by-year
--force-update
--top <N>
--color
--output [table|json|yaml]

To find help, see the help sub-command:

$ tvrank help
$ tvrank help search
$ tvrank help scan-series
$ tvrank help scan-movies

Screencast

Please note that the screencast is slightly outdated. Please use the sub-commands described above instead of what is shown in the screencast.

Disabling Colors

By default, TVrank displays some of the content with color. However, it supports the NO_COLOR environment variable. When NO_COLOR is set, TVrank will not use color in its output. This can also be overridden by passing the --color argument on the command-line:

NO_COLOR=1 tvrank search "the great gatsby"           # Without colors
NO_COLOR=1 tvrank search "the great gatsby" --color   # With colors
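
The decision described above can be summarised in a few lines. This is a minimal sketch with hypothetical function names, not the code TVrank uses. It follows the wording above and treats any set value of NO_COLOR as a request to disable color.

```rust
use std::env;

/// Decides whether output should be colorized.
/// `no_color_set` says whether the NO_COLOR environment variable is set;
/// `color_flag` is true when `--color` was passed on the command line.
fn use_color(no_color_set: bool, color_flag: bool) -> bool {
    // The explicit flag wins over the environment variable.
    color_flag || !no_color_set
}

/// Same decision, reading NO_COLOR from the process environment.
fn use_color_from_env(color_flag: bool) -> bool {
    use_color(env::var_os("NO_COLOR").is_some(), color_flag)
}
```

Keeping the pure decision separate from the environment lookup makes the rule easy to test without modifying the process environment.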

Installation

It is recommended to use the pre-built releases.

From source

Installing TVrank from this repository's sources requires Cargo, a Rust compiler and a toolchain to be available. Once those are ready and the repository's contents are cloned, a simple build and install through cargo should suffice:

$ git clone https://github.com/fredmorcos/tvrank
$ cd tvrank
$ cargo install --path cli

From Crates.io

Installing TVrank from Crates.io also requires Cargo, a Rust compiler and a toolchain to be available. Once those are ready, a simple build and install using cargo should suffice:

$ cargo install tvrank-cli

Using the library

Add the dependency to your Cargo.toml:

[dependencies]
tvrank = "0.8"

Or, using cargo add:

$ cargo add tvrank

Include the Imdb type:

use tvrank::imdb::{Imdb, ImdbQuery};
use tvrank::utils::search::SearchString;

Create a directory for the cache using the tempfile crate, then create the database service. The closure passed to the service constructor is a callback for progress updates; it is an FnMut so that it can, for example, mutate a progress bar object.

let cache_dir = tempfile::Builder::new().prefix("tvrank_").tempdir()?;
let imdb = Imdb::new(cache_dir.path(), false, |_, _| {})?;

Afterwards, one can query the database using either imdb.by_id(...), imdb.by_title(...), imdb.by_title_and_year(...) or imdb.by_keywords(...), and print out some information about the results.

let title = "city of god";
let year = 2002;

println!("Matches for {} and {:?}:", title, year);

let search_string = SearchString::try_from(title)?;
for title in imdb.by_title_and_year(&search_string, year, ImdbQuery::Movies)? {
  let id = title.title_id();

  println!("ID: {}", id);
  println!("Primary name: {}", title.primary_title());
  if let Some(original_title) = title.original_title() {
    println!("Original name: {}", original_title);
  } else {
    println!("Original name: N/A");
  }

  if let Some((rating, votes)) = title.rating() {
    println!("Rating: {}/100 ({} votes)", rating, votes);
  } else {
    println!("Rating: N/A");
  }

  if let Some(runtime) = title.runtime() {
    println!("Runtime: {}", humantime::format_duration(runtime));
  } else {
    println!("Runtime: N/A");
  }

  println!("Genres: {}", title.genres());
  println!("--");
}

See the query.rs example under the lib/examples/query directory for a fully-functioning version of the above.

tvrank's Issues

Improved search function

The search function should allow for partial and/or incomplete matches. For example:

a search for "the a team" should yield "The A-Team" 2010 movie or the 1983 series "The A-Team"
a search for "equilib" should identify at least two movies titled "Equilibrium" (from 2002 and 2017)
ideally, the "strength" of the match should be customisable via flags (-e exact/strong, -d default, -t tentative/weak)

Support YAML output

Support results output in YAML format, as an example:

---
  movies:
    - primary title: ...
      original title: ...
      ...
    - primary title: ...
      ...
  series:
    - primary title: ...
      ...

Support alternative REST-based services

Currently TVrank does not support REST-based services at all, so some infrastructure and design work will be needed for that. Some of the services that could be used are:

An IMDB API
OMDB
The TVDB
TMDB

Turn project into a workspace

Turn the project into a workspace. This will help us with a few things:

Split dependencies between the TVrank library and command-line binary.
Split dependencies for tests (e.g. indoc, tempfile).
In the future, be able to add different binaries (e.g. different GUI implementations).

Share rustup and cargo caches between CI workflow jobs

Share the rustup installation directory, cargo installation directory and cargo dependency cache between CI workflow jobs to avoid re-downloading and re-building them.

Find alternatives to `actions-rs/toolchain` and `actions-rs/cargo`

GitHub is deprecating NodeJS 12 actions, and both actions-rs/toolchain and actions-rs/cargo are stuck on it. Their last releases were in 2020 and 2019, respectively. The projects are probably dead and we should figure out whether there are alternatives.

Add doctests to the internal `TVrank` library

Also see #25.

Add a `mark` subcommand to write the `tvrank.json` file for a directory entry

It is currently possible to add a file called tvrank.json under a title's directory - when using the movies-dir or series-dir subcommands - to force the use of title information (TitleInfo) like the IMDB ID when other pieces of information like title and year are ambiguous.

Writing that file by hand is a bit annoying. TVrank should have a subcommand called mark which takes a title's directory and an IMDB ID and writes the tvrank.json for the user.

Add tests to ensure that queries with "the" as part of another word continue to work correctly

Add tests to ensure that #49 (which was fixed in #50) does not happen again.

TVrank cannot handle the cancellation of database downloads

TVrank leaves behind semi-complete files when the database downloads are cancelled. The current workaround is to delete $XDG_CACHE_HOME/tvrank/* and re-run TVrank and wait for the downloads to complete.

TVrank should instead detect that the user is canceling while downloads are running, and clean up after itself: either by deleting whatever has been downloaded and processed so far, or finding a way to resume from that during the next run.
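
One common way to keep half-written files from being mistaken for complete ones is to write to a temporary name and rename only on success. The sketch below shows that technique with hypothetical function names. It is not the approach TVrank takes, and it does not handle signals: if the process is killed, a .part file remains, but it is clearly distinguishable and can be deleted on the next run.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Copies everything from `source` into a new file at `path`.
fn write_all_to(source: &mut impl Read, path: &Path) -> io::Result<u64> {
    let mut file = File::create(path)?;
    let copied = io::copy(source, &mut file)?;
    file.sync_all()?;
    Ok(copied)
}

/// Streams `source` into `<dest>.part` and renames it to `dest` only when
/// the whole stream was written. On failure the partial file is removed.
fn store_atomically(source: &mut impl Read, dest: &Path) -> io::Result<u64> {
    let partial = dest.with_extension("part");
    match write_all_to(source, &partial) {
        Ok(copied) => {
            fs::rename(&partial, dest)?;
            Ok(copied)
        }
        Err(err) => {
            // Best-effort cleanup; the original error is what matters.
            let _ = fs::remove_file(&partial);
            Err(err)
        }
    }
}
```

With this layout, the destination path either does not exist or holds a complete file, so a later run never has to guess whether a cached file is usable.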

Separate workflows to speed up CI

Workflows can be separated into:

Lint (Linux)
Documentation (Linux)
Build and test (Linux, Windows, MacOS)

And also look at the release workflow and whether it can be split.

Additionally, it makes sense to share the cargo dependency cache between workflow jobs to avoid re-downloading and re-building them.

`TitleID`'s `try_from` impl should fail when there are trailing non-numeric characters

Currently the atoi implementation will consume the input as long as there are digits, and will return successfully when either a non-digit character is reached or the end of the input is reached.

This means that invalid IDs like tt1234abc are still accepted by TVrank as tt1234. TVrank should instead reject such IDs.
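
A stricter parser could look like the following sketch. The function name parse_title_id and the u32 return type are assumptions made for illustration; the real TitleID type and its try_from implementation are not shown here.

```rust
/// Parses an IMDB title ID such as `tt1343092` into its numeric part.
/// Unlike a lenient `atoi`-style parser, any non-digit character after
/// the `tt` prefix makes the whole ID invalid.
fn parse_title_id(id: &str) -> Option<u32> {
    let digits = id.strip_prefix("tt")?;
    // Reject empty input and anything that is not purely ASCII digits,
    // including a leading `+` that `str::parse` would otherwise accept.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Fails (returns None) if the number does not fit in a u32.
    digits.parse().ok()
}
```

Checking the whole remainder before converting means tt1234abc is rejected outright instead of being silently truncated to tt1234.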

Document the internal TVrank library

A good start for documentation would be the title*.rs and utils.rs files.
Then, in order: genre.rs, ratings.rs, error.rs, db.rs, service.rs and mod.rs.

Fix speed reporting, and change spinner to progress bar

Currently, when downloading the IMDB database files, TVrank shows a spinner with an unreasonably high download rate. The reason behind this is two-fold:

IMDB database files are gzip compressed.
For efficiency reasons, TVrank uses a custom binary on-disk database format that is different from the IMDB database format.

The files are downloaded, unzipped, parsed and converted, then written to disk in a streaming fashion where each of those functions streams into the next one. The spinner is shown, along with an incorrect download rate, because progress is measured against the uncompressed data rather than against the compressed file that is actually being downloaded.

Updating the progress object between the download and the decompression streams would allow the display of an accurate download rate, and would enable the use of a progress bar instead of a spinner.
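
A reader adapter is one way to observe bytes before they reach the decompressor. The sketch below uses only the standard library; ProgressReader and its callback are hypothetical names, and in a real pipeline the gzip decoder would wrap this reader rather than the in-memory slice used for testing.

```rust
use std::io::{self, Read};

/// Wraps a reader and reports the size of every chunk read from it.
/// Placed between the download stream and the decompressor, it sees
/// compressed bytes, which is what a download rate should be based on.
struct ProgressReader<R, F> {
    inner: R,
    on_bytes: F,
}

impl<R: Read, F: FnMut(usize)> ProgressReader<R, F> {
    fn new(inner: R, on_bytes: F) -> Self {
        Self { inner, on_bytes }
    }
}

impl<R: Read, F: FnMut(usize)> Read for ProgressReader<R, F> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // Report only bytes that were actually read.
        (self.on_bytes)(n);
        Ok(n)
    }
}
```

Because the compressed size is usually known up front from the download itself, counting at this point would also give a progress bar a meaningful total.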

Support custom IMDb database sources

Add support for providing different database sources to download the IMDb title database than what is currently hard-coded.

Move the `progress` module to `utils` or `utils::io`

The progress module should be moved under either utils or utils::io.

Combined series and movie search

An option to simultaneously search both movie and series databases for a given title would be very helpful.

Add tests for the internal TVrank library

Currently the internal TVrank library isn't very thoroughly tested.

A good start for adding tests would be the title*.rs and utils.rs files.
Then, in order: genre.rs, ratings.rs, error.rs, db.rs, service.rs and mod.rs.

Support the display of only the top N results

A command-line option like --results 3 or --top 3 could be used to only show the top 3 search results of movies and series.

Document the code in the `TVrank` binary

Code comments and documentation are non-existent, and some parts of the binary's code are not always clear.

Searching for "the weather man" or "the godfather" reveals no results

As the title says. Searching for "weather man" works fine and shows titles called "The Weather Man", but searching for "the weather man" returns no series or movie matches:

>  tvrank title "the weather man"
No movie matches found for `weather, the, man`
No series matches found for `weather, the, man`
Total time: 399ms 941us 178ns

>  tvrank title "the weatherman"
No movie matches found for `the, weatherman`
No series matches found for `the, weatherman`
Total time: 396ms 281us 867ns

It also doesn't seem to be the "the" in the search keywords causing the problem:

tvrank title "amazing spider man"
Found 30 movie matches for `spider, amazing, man`:
...
Found 2 series matches for `spider, amazing, man`:

tvrank title "the amazing spider man"
Found 27 movie matches for `amazing, the, man, spider`:
...
Found 1 series match for `amazing, the, man, spider`:

Move from `structopt` to `clap`

structopt is now in maintenance mode and TVrank should move to clap instead which has incorporated almost all of structopt's features.

As clap v3 is now out, and the structopt features are integrated into (almost as-is), structopt is now in maintenance mode: no new feature will be added.

https://docs.rs/structopt/latest/structopt/

The `mark` sub-command should issue a warning when the directory's title/year does not match the given ID

Update the screencast showcase on the README file

The screencast on the README file is quite outdated and should be updated.

Evaluate `tabled` as a potential replacement for printing tables

tabled: https://github.com/zhiburt/tabled/

Respect the type of output (e.g. pipe, file, stdout)

TVrank should respect the type of output and print contents accordingly. As an example, when printing to a terminal, colors and tables should be rendered by default as usual. But when printing to a file or a pipe, contents should be printed in a way that is in line with other UNIX utilities (one line per entry).

One example is to print out the contents in tab-separated values.
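
A possible shape for this, assuming Rust 1.70 or later for std::io::IsTerminal, is sketched below. The function names and the (title, year, rating) row layout are hypothetical and chosen only for illustration.

```rust
use std::io::{self, IsTerminal, Write};

/// Returns true when stdout is attached to a terminal, in which case
/// tables and colors could be rendered as usual.
fn stdout_is_terminal() -> bool {
    io::stdout().is_terminal()
}

/// Writes one tab-separated line per (title, year, rating) entry,
/// the format suggested for pipes and files.
fn write_tsv(out: &mut impl Write, rows: &[(&str, u16, u8)]) -> io::Result<()> {
    for (title, year, rating) in rows {
        writeln!(out, "{}\t{}\t{}", title, year, rating)?;
    }
    Ok(())
}
```

A caller would check stdout_is_terminal() once and pick either the table renderer or write_tsv, so that tools such as cut or sort can consume the piped output directly.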

Add tests for the `TVrank` binary

Tests for the binary are non-existent.

Make the TVrank command-line interface more convenient

Currently the TVrank command-line interface offers "application-wide" parameters like --force-update and --sort-by-year, which means that they cannot be used after a subcommand is specified. It would be great to be able to use them as part of subcommands to make the interface more convenient.

Example

Currently, passing --sort-by-year looks like so:

tvrank --sort-by-year title "foo" --exact

It should be possible to pass it as follows:

tvrank title "foo" --exact --sort-by-year

Searching for movie titles with 2 characters (e.g. Up) displays all existing movies and TV series

./tvrank -vvvv title 'Up'
[2022-01-24T20:21:17Z DEBUG tvrank] Cache directory: /Users/arpad.kosorus/Library/Caches/com.fredmorcos.Fred-Morcos.tvrank
[2022-01-24T20:21:17Z DEBUG tvrank] Created cache directory
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database exists and is less than a month old
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] Read IMDB database file in 55ms 443us 23ns
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] Parsed IMDB database in 500ms 305us 915ns
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database (thread 0) contains 317372 movies and 43280 series (360652 entries)
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database (thread 1) contains 317771 movies and 43229 series (361000 entries)
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database (thread 2) contains 314989 movies and 44911 series (359900 entries)
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database (thread 3) contains 314109 movies and 42491 series (356600 entries)
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database (thread 4) contains 310729 movies and 44271 series (355000 entries)
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database (thread 5) contains 315007 movies and 43293 series (358300 entries)
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database contains 1889977 movies and 261475 series (2151452 entries)
[2022-01-24T20:21:17Z DEBUG tvrank] Loaded IMDB database in 556ms 27us 476ns
[2022-01-24T20:21:17Z DEBUG tvrank] Could not parse title and year from `Up`
[2022-01-24T20:21:17Z DEBUG tvrank] Going to use `Up` as keywords for search query
[2022-01-24T20:21:17Z DEBUG tvrank] Keywords: []
Found 1889977 movie matches for ``:

Support partial keyword matches

A search for equilib should return matches like the following (as an example):

Equilibrium
The Equilibrium

This requires keyword indexing: #3
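
To make the intended behaviour concrete, the sketch below treats each keyword as a case-insensitive prefix of some word in the title. The function name is hypothetical, and this is a plain linear check on one title, not the keyword index that the issue refers to.

```rust
/// True when every keyword in `query` is a prefix of some word in `title`.
/// Comparison is case-insensitive; an empty query matches nothing.
fn matches_partially(title: &str, query: &str) -> bool {
    let title_words: Vec<String> = title.split_whitespace().map(str::to_lowercase).collect();
    let keywords: Vec<String> = query.split_whitespace().map(str::to_lowercase).collect();
    !keywords.is_empty()
        && keywords
            .iter()
            .all(|k| title_words.iter().any(|w| w.starts_with(k.as_str())))
}
```

With this rule, equilib matches both "Equilibrium" and "The Equilibrium". An index over title words would be needed to make the same check fast across a whole database.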

Support keyword-based searching

Currently, a search for perks wallflower returns nothing, but should return:

Perks of being a Wallflower
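
The expected matching rule can be sketched as follows: split both the title and the query into lowercase words and require every keyword to appear in the title. The function names are hypothetical and the word-splitting rule (any non-alphanumeric character is a separator) is an assumption, not TVrank's implementation.

```rust
/// Splits text into lowercase alphanumeric words.
fn words(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// True when every keyword of `query` appears as a whole word in `title`.
/// An empty keyword list matches nothing rather than everything.
fn matches_keywords(title: &str, query: &str) -> bool {
    let title_words = words(title);
    let keywords = words(query);
    !keywords.is_empty() && keywords.iter().all(|k| title_words.contains(k))
}
```

Lowercasing both sides makes the match independent of letter case, and rejecting an empty keyword list avoids a query that accidentally matches the entire database.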

Additionally, a command-line parameter (e.g. --color) should be added to override the NO_COLOR environment variable.

Ideally, --color should take one of the following values:

on (the default) means that color and unicode art are output only when stdout is a terminal.
off means to never output color and unicode art.
always means to always output color and unicode art even when stdout is not a terminal.

Uppercase keywords retrieve no results - MacOS Monterey 12.1

I tried multiple movie titles using uppercase letters and got no results:

./tvrank -vvvv title "Coach Carter"
[2022-01-24T19:31:01Z DEBUG tvrank] Cache directory: /Users/arpad.kosorus/Library/Caches/com.fredmorcos.Fred-Morcos.tvrank
[2022-01-24T19:31:01Z DEBUG tvrank] Created cache directory
[2022-01-24T19:31:01Z DEBUG tvrank::imdb::service] IMDB database exists and is less than a month old
[2022-01-24T19:31:01Z DEBUG tvrank::imdb::service] Read IMDB database file in 46ms 547us 294ns
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] Parsed IMDB database in 481ms 166us 453ns
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database (thread 0) contains 314751 movies and 43549 series (358300 entries)
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database (thread 1) contains 318222 movies and 43130 series (361352 entries)
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database (thread 2) contains 314007 movies and 44693 series (358700 entries)
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database (thread 3) contains 312181 movies and 43319 series (355500 entries)
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database (thread 4) contains 311612 movies and 43588 series (355200 entries)
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database (thread 5) contains 319204 movies and 43196 series (362400 entries)
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database contains 1889977 movies and 261475 series (2151452 entries)
[2022-01-24T19:31:02Z DEBUG tvrank] Loaded IMDB database in 527ms 926us 13ns
[2022-01-24T19:31:02Z DEBUG tvrank] Could not parse title and year from `Coach Carter`
[2022-01-24T19:31:02Z DEBUG tvrank] Going to use `Coach Carter` as keywords for search query
[2022-01-24T19:31:02Z DEBUG tvrank] Keywords: ["Carter", "Coach"]
No movie matches found for `Carter Coach`
No series matches found for `Carter Coach`
[2022-01-24T19:31:02Z DEBUG tvrank] IMDB query took 162ms 335us 936ns
Total time: 690ms 616us 749ns

Support JSON output

Support results output in JSON format, as an example:

{
  "movies": [
    ...
  ],
  "series": [
    ...
  ]
}