oscar-data-download's Introduction

This script downloads data from Oscar movies based on a list of awards nominees.

This project is a work in progress; expect changes and instability.

The script expects an array containing one object per award, in the following format:

  {
    shortName,
    winner,
    nominees: [
      { title },
      { title },
      { title },
      { title },
      { title },
      ...
    ],
  }

Here is an example:

  {
    shortName: "filme",
    winner: "CODA",
    nominees: [
      { title: "Belfast" },
      { title: "CODA" },
      { title: "Don't Look Up" },
      { title: "Drive My Car" },
      { title: "Dune" },
      { title: "King Richard" },
      { title: "Licorice Pizza" },
      { title: "Nightmare Alley" },
      { title: "The Power of the Dog" },
      { title: "West Side Story" },
    ],
  }
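
As a rough aid for preparing the input, the format above could be checked with something like the sketch below. The `validateAward` helper is hypothetical and not part of this script, and the rule that the winner must be one of the listed nominees is an assumption inferred from the example.

```javascript
// Hypothetical helper: checks one award entry against the format shown above.
// Returns a list of problems; an empty list means the entry looks fine.
function validateAward(award) {
  if (award === null || typeof award !== "object") {
    return ["award must be an object"];
  }
  const errors = [];
  if (typeof award.shortName !== "string" || award.shortName === "") {
    errors.push("shortName must be a non-empty string");
  }
  if (!Array.isArray(award.nominees) || award.nominees.length === 0) {
    errors.push("nominees must be a non-empty array");
    return errors;
  }
  const titles = [];
  award.nominees.forEach((nominee, i) => {
    if (!nominee || typeof nominee.title !== "string" || nominee.title === "") {
      errors.push(`nominees[${i}].title must be a non-empty string`);
    } else {
      titles.push(nominee.title);
    }
  });
  // Assumption: the winner is always one of the listed nominees.
  if (typeof award.winner !== "string" || !titles.includes(award.winner)) {
    errors.push("winner must match the title of one nominee");
  }
  return errors;
}

// Shortened version of the example above.
const awards = [
  {
    shortName: "filme",
    winner: "CODA",
    nominees: [{ title: "Belfast" }, { title: "CODA" }],
  },
];
const problems = awards.flatMap((award) => validateAward(award));
console.log(problems.length === 0 ? "input looks valid" : problems);
```

Running a check like this before the download starts would catch a misspelled winner early, instead of failing halfway through the lookups.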

The script uses data from two sources: the OMDb API and the IMDb datasets.

You need a valid API key for the OMDb API, and you need to download the following IMDb datasets:

  • title.akas.tsv.gz: contains the translated titles of the movies. After decompressing the file, you can keep only your region with ripgrep, for example Brazil: rg -e "\tBR\t" title.akas.tsv > titles.akas.br.tsv
  • title.basics.tsv.gz: contains data for every IMDb title. To keep only movies, decompress the file and run: rg -e "tt[0-9]*\tmovie" title.basics.tsv > movies.tsv
  • name.basics.tsv.gz: contains information about all crew members.

If you filter with ripgrep, the header line is dropped from the output, so you need to add it back manually.
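
If you would rather not restore the header by hand, the same filtering can be done in Node.js while always keeping the first line. This is only a sketch: `filterTsvLines`, `filterTsvFile`, `isMovie` and `isBrazilianAka` are hypothetical names, and the column positions assume the standard IMDb dataset headers (`titleType` is the second column of title.basics, `region` is the fourth column of title.akas). Unlike the ripgrep patterns, these predicates check one specific column.

```javascript
const fs = require("fs");
const readline = require("readline");
const { once } = require("events");

// Keeps the first line (the header) plus every later line whose columns
// are accepted by `keep`.
function filterTsvLines(lines, keep) {
  return lines.filter((line, index) => index === 0 || keep(line.split("\t")));
}

// Streaming variant for the real files, which are too large to load at once.
async function filterTsvFile(inputPath, outputPath, keep) {
  const rl = readline.createInterface({
    input: fs.createReadStream(inputPath),
    crlfDelay: Infinity,
  });
  const out = fs.createWriteStream(outputPath);
  let isHeader = true;
  for await (const line of rl) {
    if (isHeader || keep(line.split("\t"))) {
      // Wait for the output buffer to drain when it fills up.
      if (!out.write(line + "\n")) {
        await once(out, "drain");
      }
    }
    isHeader = false;
  }
  out.end();
  await once(out, "finish");
}

// Assumed column layout (from the IMDb dataset headers):
//   title.basics: tconst, titleType, primaryTitle, ...
//   title.akas:   titleId, ordering, title, region, ...
const isMovie = (cols) => cols[1] === "movie";
const isBrazilianAka = (cols) => cols[3] === "BR";

// Made-up rows in the shape of title.basics.tsv, for illustration only.
const sample = [
  "tconst\ttitleType\tprimaryTitle",
  "tt0000000\tshort\tSample Short",
  "tt9999999\tmovie\tSample Movie",
];
console.log(filterTsvLines(sample, isMovie).join("\n"));
```

For the real files, a call such as `filterTsvFile("title.basics.tsv", "movies.tsv", isMovie)` would produce the same output file as the ripgrep command, with the header already in place.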

After filtering, the files should total around 700 MB (695 MB of which comes from name.basics). Suggestions on how to filter these files further and reduce their size are welcome.

You need to import those datasets into a MongoDB database. MongoDB Compass can easily import from TSV files. The following database structure is expected:

-> imdb           // Database
|-> akas          // Collection from title.akas data
|-> movies        // Collection from title.basics data
|-> names         // Collection from name.basics data
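
To give an idea of how this structure might be queried, here is a minimal sketch that looks up a nominee in the `movies` and `akas` collections and builds an OMDb request for it. It is not the script's actual code. The helper names are hypothetical, the field names (`tconst`, `titleType`, `primaryTitle`, `titleId`, `title`, `region`) assume the collections kept the original IMDb column headers on import, and the way the two sources are combined is an assumption.

```javascript
// Hypothetical filter for the `movies` collection (title.basics data).
function movieFilter(title) {
  return { primaryTitle: title, titleType: "movie" };
}

// Hypothetical filter for the `akas` collection (title.akas data).
function akaFilter(titleId, region) {
  return { titleId, region };
}

// Builds an OMDb request URL that looks a title up by its IMDb ID.
function omdbUrl(apiKey, imdbId) {
  const url = new URL("https://www.omdbapi.com/");
  url.searchParams.set("apikey", apiKey);
  url.searchParams.set("i", imdbId);
  return url.toString();
}

// `db` is a handle to the "imdb" database from the official `mongodb` driver.
async function lookupNominee(db, title, region) {
  const movie = await db.collection("movies").findOne(movieFilter(title));
  if (!movie) {
    return null;
  }
  const aka = await db
    .collection("akas")
    .findOne(akaFilter(movie.tconst, region));
  return {
    imdbId: movie.tconst,
    title,
    localTitle: aka ? aka.title : null,
  };
}

// Usage sketch, inside an async function (requires the `mongodb` package
// and a running server; the connection string is a placeholder):
//   const { MongoClient } = require("mongodb");
//   const client = new MongoClient("mongodb://localhost:27017");
//   await client.connect();
//   const nominee = await lookupNominee(client.db("imdb"), "CODA", "BR");
//   if (nominee) console.log(omdbUrl("YOUR_API_KEY", nominee.imdbId));
//   await client.close();
```

Note that a lookup by title alone is ambiguous when several movies share a name (remakes, for instance), so a real implementation would probably also need to match on the release year.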

Contributors

ralacerda
