
tvrank's Introduction

TVrank: A Rust library and command-line utility for ranking movies and series


TVrank is a library and command-line utility written in Rust for querying and ranking information about movies and series. It can be used to query a single title or scan media directories.

Currently, TVrank only supports IMDB's TSV dumps which it automatically downloads, caches and periodically updates. More work is required to be able to support and cache live-query services like TMDB and TVDB.

The in-memory database is reasonably fast and its on-disk persistent cache format reasonably efficient.

The library's documentation is badly lacking but there is an example on how to use it.

For now, the command-line utility works well and is fast enough to replace, for example, searching for a title through DuckDuckGo with something like !imdb TITLE. If you still want to see the IMDB page for a title, TVrank prints a direct link for each search result so it can be opened straight from the terminal.

Note that TVrank depends on the flate2 crate for decompression of IMDB TSV dumps. flate2 is extremely slow when built in debug mode, so it is recommended to always run TVrank in release mode unless there are good reasons not to. By default, release mode is built with debugging information enabled for convenience during development.

Usage

For information on how to use the library, see below.

The TVrank command-line interface has a few modes accessible through the use of sub-commands:

  • search "KEYWORDS..." to search by keywords.
  • search "KEYWORDS... (YYYY)" to search by keywords in a specific year.
  • search "TITLE (YYYY)" --exact to search for an exact title in a specific year.
  • search "TITLE" --exact to search for an exact title (-e also means exact).
  • scan-movies and scan-series to make batch queries based on directory scans.
  • mark to mark a directory with a title information file (tvrank.json).

Examples

To search for a specific title:

$ tvrank search "the great gatsby (2013)" -e

To search for all titles containing "the", "great" and "gatsby" in the year 2013:

$ tvrank search "the great gatsby (2013)"

To search based on keywords:

$ tvrank search "the great gatsby"

To search based on an exact title:

$ tvrank search "the great gatsby" -e

To query a series directory:

$ tvrank scan-series <SERIES_MEDIA_DIR>

Also, by default TVrank will sort by rating, year and title. To instead sort by year, rating and title, --sort-by-year can be passed before any sub-command:

$ tvrank --sort-by-year search "house of cards"
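
The two orderings can be read as lexicographic comparisons that differ only in which key comes first. The following is a minimal sketch under stated assumptions, not TVrank's implementation: the `Entry` struct is hypothetical, and the choice to put higher ratings and newer years first is an assumption.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a search result; not TVrank's actual type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    title: String,
    year: u16,
    rating: u8, // assumed to be on a 0..=100 scale
}

/// Default ordering: rating, then year, then title.
/// Assumption: higher ratings and newer years come first.
fn by_rating(a: &Entry, b: &Entry) -> Ordering {
    b.rating
        .cmp(&a.rating)
        .then_with(|| b.year.cmp(&a.year))
        .then_with(|| a.title.cmp(&b.title))
}

/// `--sort-by-year` ordering: year, then rating, then title.
fn by_year(a: &Entry, b: &Entry) -> Ordering {
    b.year
        .cmp(&a.year)
        .then_with(|| b.rating.cmp(&a.rating))
        .then_with(|| a.title.cmp(&b.title))
}

/// Sorts in place using one of the two orderings.
fn sort_entries(entries: &mut [Entry], sort_by_year: bool) {
    if sort_by_year {
        entries.sort_by(by_year);
    } else {
        entries.sort_by(by_rating);
    }
}
```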

You can also limit the output of movies and series to the top N entries:

$ tvrank search "the great gatsby" --top 2

You can change the output format to json or yaml:

$ tvrank search "the great gatsby" --output json

Batch Queries

TVrank can recursively scan directories and print out information about titles it finds. This is achieved using the scan-movies and scan-series subcommands.

Movie Batch Queries

TVrank expects movie directories to be under a top-level movies media directory (herein called movies), as follows:

movies
├── ...
├── 127 Hours (2010)
├── 12 Mighty Orphans (2021)
├── 12 Monkeys (1995)
├── 12 Years a Slave (2013)
├── 13 Hours The Secret Soldiers of Benghazi (2016)
├── ...

Movie sub-directories are expected to follow the TITLE (YYYY) format where the TITLE matches either the primary or original movie title.
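
As an illustration of the TITLE (YYYY) convention, a directory name can be split into its two parts with a few string operations. This is a sketch only; the function name `parse_title_and_year` is hypothetical and TVrank's own parser may behave differently on edge cases.

```rust
/// Splits a directory name of the form `TITLE (YYYY)` into title and year.
/// Returns `None` when the name does not end in a parenthesised 4-digit year.
fn parse_title_and_year(name: &str) -> Option<(&str, u16)> {
    let name = name.trim();
    // The name must end with `)` and contain a matching `(` before the year.
    let rest = name.strip_suffix(')')?;
    let (title, year) = rest.rsplit_once('(')?;
    if year.len() != 4 || !year.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let title = title.trim_end();
    if title.is_empty() {
        return None;
    }
    Some((title, year.parse::<u16>().ok()?))
}
```

With this sketch, `parse_title_and_year("12 Monkeys (1995)")` yields the title `12 Monkeys` and the year `1995`, while a name without a year such as `The Naked Gun` yields `None`.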

If a movie sub-directory does not adhere to this format, TVrank will recursively search it for more titles. An example of that is as follows:

movies
├── ...
├── The Naked Gun
│   ├── The Naked Gun (1988)
│   ├── The Naked Gun 2½ The Smell of Fear (1991)
│   └── The Naked Gun 33 1-3 The Final Insult (1994)
├── ...
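
The recursive behaviour described above can be sketched with the standard library alone. The function below is an illustration, not TVrank's scanner: `collect_title_dirs` and the caller-supplied `is_title` predicate are hypothetical names, and symlinks or unreadable directories are not given special treatment.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Walks `root` and collects every directory whose name satisfies `is_title`.
/// Directories that do not match are descended into instead.
fn collect_title_dirs(
    root: &Path,
    is_title: &dyn Fn(&str) -> bool,
    found: &mut Vec<PathBuf>,
) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        if !path.is_dir() {
            continue; // plain files are ignored in this sketch
        }
        // Non-UTF-8 names are treated as non-matching.
        let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
        if is_title(name) {
            found.push(path);
        } else {
            collect_title_dirs(&path, is_title, found)?;
        }
    }
    Ok(())
}
```

Applied to the tree above with a predicate that accepts names ending in a year, `The Naked Gun` itself is skipped and its three sub-directories are collected.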

Series Batch Queries

TVrank also expects series directories to be under a top-level series media directory (herein called series) following either TITLE or TITLE (YYYY) format. The TITLE (YYYY) format can be used to easily disambiguate similarly-titled series. Examples:

series
├── ...
├── House of Cards (1990)
├── Killing Eve
├── Kingdom (2019)
├── ...

Handling Ambiguity in Batch Queries

Sometimes it is impossible to distinguish between titles just from their original/primary title and release year, because multiple movies or series may be released in the same year under exactly the same title.

To handle this issue, TVrank supports the ability to explicitly provide title information files (called tvrank.json) under the corresponding title directory. These files are detected when using the scan-movies and scan-series sub-commands and are used for exact identification using the title's unique ID.

A tvrank.json file looks like this:

{
  "imdb": {
    "id": "ttXXXXXXXX"
  }
}

where "ttXXXXXXXX" is the IMDB title id shown under the IMDB ID column or available as part of the IMDB URL of a title.

You can ask TVrank to write the title information (tvrank.json) file for you by using the mark sub-command and passing it the title's directory and ID that you would like to write.

tvrank mark "movies/The Great Gatsby (2013)" tt1343092

This will result in a file called movies/The Great Gatsby (2013)/tvrank.json containing the following information:

{
  "imdb": {
    "id": "tt1343092"
  }
}

If a tvrank.json file already exists, TVrank will refuse to overwrite it. To force overwriting it, the --force flag can be used.
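
One way to get this refuse-unless-forced behaviour is the standard library's `create_new` open option, which fails if the file already exists. The sketch below is an assumption about how such a command could be written, not the code behind `mark`; `write_title_info` is a hypothetical name.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Writes a `tvrank.json`-style file under `dir` for the given IMDB ID.
/// Without `force`, `create_new` makes the open fail with `AlreadyExists`
/// when the file is already there, so nothing is overwritten.
fn write_title_info(dir: &Path, id: &str, force: bool) -> io::Result<()> {
    let path = dir.join("tvrank.json");
    let mut file = if force {
        File::create(&path)? // truncates any existing file
    } else {
        OpenOptions::new().write(true).create_new(true).open(&path)?
    };
    // The ID is assumed to be a plain `tt...` identifier that needs no escaping.
    writeln!(file, "{{\n  \"imdb\": {{\n    \"id\": \"{}\"\n  }}\n}}", id)
}
```

Checking for existence and creating the file in a single system call avoids a race between a separate "does it exist?" test and the write.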

Verbosity

To print out more information about what the application is doing, use -v before any sub-command. Multiple occurrences of -v on the command-line will increase the verbosity level:

$ tvrank -vvv --sort-by-year search "city of god"

The following options can be given before or after the sub-command. If an option appears in both positions, the occurrence after the sub-command takes precedence.

--verbose
--sort-by-year
--force-update
--top <N>
--color
--output [table|json|yaml]

To find help, see the help sub-command:

$ tvrank help
$ tvrank help search
$ tvrank help scan-series
$ tvrank help scan-movies

Screencast

Please note that the screencast is slightly outdated. Please use the sub-commands described above instead of what is shown in the screencast.

Disabling Colors

By default, TVrank displays some of the content with color. However, it supports the NO_COLOR environment variable. When NO_COLOR is set, TVrank will not use color in its output. This can also be overridden by passing the --color argument on the command-line:

NO_COLOR=1 tvrank search "the great gatsby"           # Without colors
NO_COLOR=1 tvrank search "the great gatsby" --color   # With colors
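
The rule described above reduces to a small boolean decision. The sketch below is illustrative only: `use_color` and `no_color_from_env` are hypothetical helpers, and treating an empty `NO_COLOR` as unset follows the no-color.org convention rather than anything stated in this README.

```rust
use std::env;

/// Decides whether to colour output.
/// `color_flag` stands for the `--color` switch, which overrides `NO_COLOR`.
fn use_color(no_color_set: bool, color_flag: bool) -> bool {
    color_flag || !no_color_set
}

/// Reads `NO_COLOR` from the environment.
/// Assumption: the variable counts as set when it is present and not empty.
fn no_color_from_env() -> bool {
    env::var_os("NO_COLOR").map_or(false, |v| !v.is_empty())
}
```

A caller would combine the two as `use_color(no_color_from_env(), color_flag)`.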

Installation

It is recommended to use the pre-built releases.

From source

Installing TVrank from this repository's sources requires Cargo, a Rust compiler and a toolchain to be available. Once those are ready and the repository's contents are cloned, a simple build and install through cargo should suffice:

$ git clone https://github.com/fredmorcos/tvrank
$ cd tvrank
$ cargo install --path cli

From Crates.io

Installing TVrank from Crates.io also requires Cargo, a Rust compiler and a toolchain to be available. Once those are ready, a simple build and install using cargo should suffice:

$ cargo install tvrank-cli

Using the library

Add the dependency to your Cargo.toml:

[dependencies]
tvrank = "0.8"

Or, using cargo add:

$ cargo add tvrank

Include the Imdb type:

use tvrank::imdb::{Imdb, ImdbQuery};
use tvrank::utils::search::SearchString;

Create a directory for the cache using the tempfile crate then create the database service. The closure passed to the service constructor is a callback for progress updates and is a FnMut to be able to e.g. mutate a progress bar object.

let cache_dir = tempfile::Builder::new().prefix("tvrank_").tempdir()?;
let imdb = Imdb::new(cache_dir.path(), false, |_, _| {})?;
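
To see why the callback is an FnMut, consider a closure that updates a progress object it has captured. The sketch below uses a hypothetical `run_with_progress` function as a stand-in; it is not `Imdb::new`, and the meaning of the two callback arguments here is an assumption.

```rust
/// Hypothetical stand-in for a long-running operation that reports progress.
/// The first argument is an optional total, the second the bytes just handled.
fn run_with_progress(chunks: &[u64], mut on_progress: impl FnMut(Option<u64>, u64)) {
    let total: u64 = chunks.iter().sum();
    for &chunk in chunks {
        on_progress(Some(total), chunk);
    }
}

/// Minimal "progress bar" that only tracks a position.
struct Bar {
    position: u64,
}

fn demo() -> u64 {
    let mut bar = Bar { position: 0 };
    // The closure mutates `bar`, so it implements `FnMut` but not `Fn`.
    run_with_progress(&[10, 20, 30], |_total, delta| bar.position += delta);
    bar.position
}
```

Had the parameter been declared as `Fn`, the closure in `demo` would not compile, because it needs mutable access to `bar`.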

Afterwards, one can query the database using either imdb.by_id(...), imdb.by_title(...), imdb.by_title_and_year(...) or imdb.by_keywords(...), and print out some information about the results.

let title = "city of god";
let year = 2002;

println!("Matches for {} and {:?}:", title, year);

let search_string = SearchString::try_from(title)?;
for title in imdb.by_title_and_year(&search_string, year, ImdbQuery::Movies)? {
  let id = title.title_id();

  println!("ID: {}", id);
  println!("Primary name: {}", title.primary_title());
  if let Some(original_title) = title.original_title() {
    println!("Original name: {}", original_title);
  } else {
    println!("Original name: N/A");
  }

  if let Some((rating, votes)) = title.rating() {
    println!("Rating: {}/100 ({} votes)", rating, votes);
  } else {
    println!("Rating: N/A");
  }

  if let Some(runtime) = title.runtime() {
    println!("Runtime: {}", humantime::format_duration(runtime));
  } else {
    println!("Runtime: N/A");
  }

  println!("Genres: {}", title.genres());
  println!("--");
}

See the query.rs example under the lib/examples/query directory for a fully-functioning version of the above.

tvrank's People

Contributors

caglaryucekaya, dependabot[bot], dpolivaev, fredmorcos, henningholmde, koushik-ms, olsi-b

tvrank's Issues

Improved search function

The search function should allow for partial and/or incomplete matches. For example:

  • a search for "the a team" should yield "The A-Team" 2010 movie or the 1983 series "The A-Team"
  • a search for "equilib" should identify at least two movies titled "Equilibrium" (from 2002 and 2017)
  • ideally, the "strength" of the match should be customisable via flags (-e exact/strong, -d default, -t tentative/weak)

Support YAML output

Support results output in YAML format, as an example:

---
  movies:
    - primary title: ...
      original title: ...
      ...
    - primary title: ...
      ...
  series:
    - primary title: ...
      ...

Turn project into a workspace

Turn the project into a workspace. This will help us with a few things:

  • Split dependencies between the TVrank library and command-line binary.
  • Split dependencies for tests (e.g. indoc, tempfile).
  • In the future, be able to add different binaries (e.g. different GUI implementations).

Add a `mark` subcommand to write the `tvrank.json` file for a directory entry

It is currently possible to add a file called tvrank.json under a title's directory - when using the movies-dir or series-dir subcommands - to force the use of title information (TitleInfo) like the IMDB ID when other pieces of information like title and year are ambiguous.

Writing that file by hand is a bit annoying. TVrank should have a subcommand called mark which takes a title's directory and an IMDB ID and writes the tvrank.json for the user.

TVrank cannot handle the cancellation of database downloads

TVrank leaves behind semi-complete files when the database downloads are cancelled. The current workaround is to delete $XDG_CACHE_HOME/tvrank/* and re-run TVrank and wait for the downloads to complete.

TVrank should instead detect that the user is canceling while downloads are running, and clean up after itself: either by deleting whatever has been downloaded and processed so far, or finding a way to resume from that during the next run.
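
A common way to avoid leaving half-written files under their final name is to write to a temporary sibling file and rename it once the stream is complete. The sketch below shows that idea with the standard library; `write_atomically` and the `.part` extension are hypothetical choices, and catching Ctrl-C itself would additionally need a signal handler, which is not shown.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

/// Streams `source` into `dest` via a sibling `.part` file.
/// The final name only appears after the whole stream has been written;
/// if copying fails, the partial file is removed.
fn write_atomically(mut source: impl Read, dest: &Path) -> io::Result<()> {
    let partial = dest.with_extension("part");
    let result = (|| -> io::Result<()> {
        let mut file = File::create(&partial)?;
        io::copy(&mut source, &mut file)?;
        file.flush()?;
        file.sync_all()
    })();
    match result {
        Ok(()) => fs::rename(&partial, dest),
        Err(err) => {
            // Best-effort cleanup; the original error is what gets reported.
            let _ = fs::remove_file(&partial);
            Err(err)
        }
    }
}
```

If the process is killed outright, a stale `.part` file can still remain, but it can be recognised by its name and deleted on the next run.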

Separate workflows to speed up CI

Workflows can be separated into:

  • Lint (Linux)
  • Documentation (Linux)
  • Build and test (Linux, Windows, MacOS)

And also look at the release workflow and whether it can be split.

Additionally, it makes sense to share the cargo dependency cache between workflow jobs to avoid re-downloading and re-building them.

Document the internal TVrank library

A good start for documentation would be the title*.rs and utils.rs files.
Then, in order: genre.rs, ratings.rs, error.rs, db.rs, service.rs and mod.rs.

Fix speed reporting, and change spinner to progress bar

Currently, when downloading the IMDB database files, TVrank shows a spinner with an unreasonably high download rate. The reason behind this is two-fold:

  1. IMDB database files are gzip compressed.
  2. For efficiency reasons, TVrank uses a custom binary on-disk database format that is different from the IMDB database format.

The files are downloaded, unzipped, parsed and converted, then written to disk in a streaming fashion where each of those functions streams into the next one. The spinner is shown - along with an incorrect download rate - because the information is not in relation to the original compressed file size, but in relation to the uncompressed file size.

Updating the progress object between the download and the decompression streams would allow the display of an accurate download rate, and would enable the use of a progress bar instead of a spinner.
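
One way to report progress in terms of compressed bytes is a thin `Read` wrapper that sits between the download stream and the decompressor. The sketch below is an assumption about how this could look, not the fix that was applied; `CountingReader` is a hypothetical name.

```rust
use std::io::{self, Read};

/// Wraps a reader and reports the number of bytes that pass through it.
/// Placed in front of the decompressor, the count reflects compressed
/// bytes, i.e. what was actually downloaded.
struct CountingReader<R, F> {
    inner: R,
    on_bytes: F,
}

impl<R: Read, F: FnMut(u64)> Read for CountingReader<R, F> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        (self.on_bytes)(n as u64);
        Ok(n)
    }
}
```

The wrapped reader could then be handed to a gzip decoder such as flate2's `GzDecoder`, so that the progress callback fires before decompression rather than after it.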

Add tests for the internal TVrank library

Currently the internal TVrank library isn't very thoroughly tested.

A good start for adding tests would be the title*.rs and utils.rs files.
Then, in order: genre.rs, ratings.rs, error.rs, db.rs, service.rs and mod.rs.

Searching for "the weather man" or "the godfather" reveals no results

As the title says. Searching for "weather man" works fine and shows titles called "The Weather Man", but searching for "the weather man" returns no series or movie matches:

>  tvrank title "the weather man"
No movie matches found for `weather, the, man`
No series matches found for `weather, the, man`
Total time: 399ms 941us 178ns
>  tvrank title "the weatherman"
No movie matches found for `the, weatherman`
No series matches found for `the, weatherman`
Total time: 396ms 281us 867ns

It also doesn't seem to be the "the" in the search keywords causing the problem:

tvrank title "amazing spider man"
Found 30 movie matches for `spider, amazing, man`:
...
Found 2 series matches for `spider, amazing, man`:
tvrank title "the amazing spider man"
Found 27 movie matches for `amazing, the, man, spider`:
...
Found 1 series match for `amazing, the, man, spider`:

Respect the type of output (e.g. pipe, file, stdout)

TVrank should respect the type of output and print contents accordingly. As an example, when printing to a terminal, colors and tables should be rendered by default as usual. But when printing to a file or a pipe, contents should be printed in a way that is in line with other UNIX utilities (one line per entry).

One example is to print out the contents in tab-separated values.
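
As a rough illustration of this proposal, the standard library's `IsTerminal` trait (Rust 1.70 or newer) can tell whether stdout is a terminal, and a plain writer can emit one tab-separated line per entry. The `Row` type and function names below are hypothetical and the column choice is an assumption.

```rust
use std::io::{self, IsTerminal, Write};

/// Hypothetical row type; the fields are placeholders, not TVrank's own.
struct Row<'a> {
    title: &'a str,
    year: u16,
    rating: u8,
}

/// Writes one tab-separated line per entry, the form suggested for
/// pipes and files.
fn write_tsv(rows: &[Row], mut out: impl Write) -> io::Result<()> {
    for row in rows {
        writeln!(out, "{}\t{}\t{}", row.title, row.year, row.rating)?;
    }
    Ok(())
}

/// True when stdout is an interactive terminal.
fn stdout_is_terminal() -> bool {
    io::stdout().is_terminal()
}
```

A caller would render the usual table when `stdout_is_terminal()` is true and fall back to `write_tsv` otherwise.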

Make the TVrank command-line interface more convenient

Currently the TVrank command-line interface offers "application-wide" parameters like --force-update and --sort-by-year, which means that they cannot be used after a subcommand is specified. It would be great to be able to use them as part of subcommands to make the interface more convenient.

Example

Currently, passing --sort-by-year looks like so:

tvrank --sort-by-year title "foo" --exact

It should be possible to pass it as follows:

tvrank title "foo" --exact --sort-by-year

Searching for a two-character movie title (e.g. Up) displays all existing movies and TV series

./tvrank -vvvv title 'Up'
[2022-01-24T20:21:17Z DEBUG tvrank] Cache directory: /Users/arpad.kosorus/Library/Caches/com.fredmorcos.Fred-Morcos.tvrank
[2022-01-24T20:21:17Z DEBUG tvrank] Created cache directory
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database exists and is less than a month old
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] Read IMDB database file in 55ms 443us 23ns
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] Parsed IMDB database in 500ms 305us 915ns
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database (thread 0) contains 317372 movies and 43280 series (360652 entries)
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database (thread 1) contains 317771 movies and 43229 series (361000 entries)
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database (thread 2) contains 314989 movies and 44911 series (359900 entries)
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database (thread 3) contains 314109 movies and 42491 series (356600 entries)
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database (thread 4) contains 310729 movies and 44271 series (355000 entries)
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database (thread 5) contains 315007 movies and 43293 series (358300 entries)
[2022-01-24T20:21:17Z DEBUG tvrank::imdb::service] IMDB database contains 1889977 movies and 261475 series (2151452 entries)
[2022-01-24T20:21:17Z DEBUG tvrank] Loaded IMDB database in 556ms 27us 476ns
[2022-01-24T20:21:17Z DEBUG tvrank] Could not parse title and year from `Up`
[2022-01-24T20:21:17Z DEBUG tvrank] Going to use `Up` as keywords for search query
[2022-01-24T20:21:17Z DEBUG tvrank] Keywords: []
Found 1889977 movie matches for ``:


Support partial keyword matches

A search for equilib should return matches like the following (as an example):

  • Equilibrium
  • The Equilibrium

This requires keyword indexing: #3
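
A sorted keyword index makes prefix lookups cheap, because all keys sharing a prefix are adjacent. The sketch below illustrates the idea with a `BTreeMap`; it is not the design adopted in the referenced issue, and `build_index` and `by_prefix` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Maps each lowercase keyword to the indices of the titles containing it.
fn build_index(titles: &[&str]) -> BTreeMap<String, Vec<usize>> {
    let mut index: BTreeMap<String, Vec<usize>> = BTreeMap::new();
    for (i, title) in titles.iter().enumerate() {
        for word in title.split_whitespace() {
            let ids = index.entry(word.to_lowercase()).or_default();
            // Avoid listing a title twice when a word repeats in it.
            if ids.last() != Some(&i) {
                ids.push(i);
            }
        }
    }
    index
}

/// Returns the indices of titles that have a keyword starting with `prefix`.
/// Keys are sorted, so all matches form one contiguous range.
fn by_prefix(index: &BTreeMap<String, Vec<usize>>, prefix: &str) -> Vec<usize> {
    let prefix = prefix.to_lowercase();
    let mut hits: Vec<usize> = index
        .range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(prefix.as_str()))
        .flat_map(|(_, ids)| ids.iter().copied())
        .collect();
    hits.sort_unstable();
    hits.dedup();
    hits
}
```

With the titles "Equilibrium" and "The Equilibrium" indexed, a lookup for `equilib` returns both.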

TVrank should respect the `NO_COLOR` environment variable

When the NO_COLOR environment variable is set, TVrank should refrain from displaying colors (and perhaps even displaying unicode art for tables).

See https://no-color.org/ for more information.

Additionally, a command-line parameter (e.g. --color) should be added to override the NO_COLOR environment variable.

Ideally, --color should take one of the following values:

  • on (the default) means that color and unicode art is output only when stdout is a terminal.
  • off means to never output color and unicode art.
  • always means to always output color and unicode art even when stdout is not a terminal.

Uppercase keywords retrieve no results - MacOS Monterey 12.1

I tried multiple movie titles using uppercase letters and got no results:

./tvrank -vvvv title "Coach Carter"
[2022-01-24T19:31:01Z DEBUG tvrank] Cache directory: /Users/arpad.kosorus/Library/Caches/com.fredmorcos.Fred-Morcos.tvrank
[2022-01-24T19:31:01Z DEBUG tvrank] Created cache directory
[2022-01-24T19:31:01Z DEBUG tvrank::imdb::service] IMDB database exists and is less than a month old
[2022-01-24T19:31:01Z DEBUG tvrank::imdb::service] Read IMDB database file in 46ms 547us 294ns
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] Parsed IMDB database in 481ms 166us 453ns
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database (thread 0) contains 314751 movies and 43549 series (358300 entries)
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database (thread 1) contains 318222 movies and 43130 series (361352 entries)
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database (thread 2) contains 314007 movies and 44693 series (358700 entries)
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database (thread 3) contains 312181 movies and 43319 series (355500 entries)
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database (thread 4) contains 311612 movies and 43588 series (355200 entries)
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database (thread 5) contains 319204 movies and 43196 series (362400 entries)
[2022-01-24T19:31:02Z DEBUG tvrank::imdb::service] IMDB database contains 1889977 movies and 261475 series (2151452 entries)
[2022-01-24T19:31:02Z DEBUG tvrank] Loaded IMDB database in 527ms 926us 13ns
[2022-01-24T19:31:02Z DEBUG tvrank] Could not parse title and year from `Coach Carter`
[2022-01-24T19:31:02Z DEBUG tvrank] Going to use `Coach Carter` as keywords for search query
[2022-01-24T19:31:02Z DEBUG tvrank] Keywords: ["Carter", "Coach"]
No movie matches found for `Carter Coach`
No series matches found for `Carter Coach`
[2022-01-24T19:31:02Z DEBUG tvrank] IMDB query took 162ms 335us 936ns
Total time: 690ms 616us 749ns
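
The log above shows the keywords keeping their original case (`["Carter", "Coach"]`), which suggests that a case mismatch with the indexed titles is the cause. A minimal sketch of case-insensitive keyword matching follows; the function names are hypothetical and this is not necessarily how the issue was resolved.

```rust
/// Splits a query into lowercase keywords so that matching does not depend
/// on the case the user typed. Sorting and deduplication are cosmetic.
fn normalize_keywords(query: &str) -> Vec<String> {
    let mut words: Vec<String> = query
        .split_whitespace()
        .map(str::to_lowercase)
        .collect();
    words.sort();
    words.dedup();
    words
}

/// True when every keyword occurs as a word of `title`, ignoring case.
fn title_matches(title: &str, keywords: &[String]) -> bool {
    let title_words = normalize_keywords(title);
    keywords.iter().all(|k| title_words.contains(k))
}
```

Normalising both the query and the title with the same function keeps the two sides consistent.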

Support JSON output

Support results output in JSON format, as an example:

{
  "movies": [
    ...
  ],
  "series": [
    ...
  ]
}
