
juzershakir / investigate_tmdb_movies


Investigating a dataset that contains information about 10,000+ movies collected from The Movie Database (TMDb)

Home Page: http://nbviewer.jupyter.org/github/JuzerShakir/Investigate_TMDb_Movies/blob/master/report.ipynb

udacity udacity-machine-learning-nanodegree udacity-data-analyst-nanodegree movie-database python pandas numpy matplotlib seaborn data-analysis


My third project in the Machine Learning Foundation Nanodegree

Investigate a Dataset

Project: Investigate TMDb Movie Database




Description

About the project

In this project, I had to choose one of four datasets to investigate. The project instructions provide a document with links and information about the datasets available for this project.

I chose the TMDb Movie Data for my investigation in this project.

What needs to be done

For this project, I will conduct my own data analysis and create a file to share my findings. I will start by taking a look at the dataset and brainstorming what questions I could answer with it. Then I will use pandas and NumPy to answer the questions I am most interested in, and create a report sharing the answers. I was not required to use inferential statistics or machine learning to complete this project, but I will make it clear in my communications that my findings are tentative. The project is open-ended: there is no single right answer.

Why this project

In this project, I'll go through the data analysis process and see how everything fits together. I'll use the Python libraries NumPy, pandas, and Matplotlib, which make writing data analysis code in Python a lot easier. These are also skills that employers look for!


Data

Files

This project contains two files and two folders:

  • data.csv : The dataset file containing the 10k+ movie entries that I worked on.
  • report.ipynb : The Jupyter Notebook in which the dataset is investigated.
  • export/ : Folder containing HTML and PDF exports of the notebook.
  • plots/ : Contains images of all the plots displayed in report.ipynb.
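A minimal sketch of loading data.csv with pandas and taking a first look at it. The column names used here (original_title, runtime, budget, revenue, release_year) are assumptions about the file's header, not taken from this README, and the tiny inline sample only stands in for the real file so the snippet runs on its own. The zero-budget check reflects a guess that a budget of 0 may mean "unknown" rather than a real value.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for data.csv; the column names
# are assumed to match the TMDb export used in this project.
SAMPLE_CSV = """original_title,runtime,budget,revenue,release_year
Movie A,120,1000000,5000000,2010
Movie B,95,0,0,2012
"""


def load_movies(source) -> pd.DataFrame:
    """Read the movie CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


def first_look(df: pd.DataFrame) -> dict:
    """Summarise size, missing values and zero budgets (possibly unknown values)."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing_values": int(df.isna().sum().sum()),
        "zero_budget_rows": int((df["budget"] == 0).sum()),
    }


movies = load_movies(io.StringIO(SAMPLE_CSV))
print(first_look(movies))
```

With the real file, the same helper would be called as load_movies("data.csv") from the project directory.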

Dataset file

This dataset contains information about 10,000 movies collected from The Movie Database (TMDb), including title, cast, director, runtime, budget, revenue and release year, among other fields.

  • Certain columns, like ‘cast’ and ‘genres’, contain multiple values separated by pipe (|) characters.
  • There are some odd characters in the ‘cast’ column. They are not a concern for this analysis, so I left them as is.
  • The final two columns ending with “_adj” show the budget and revenue of the associated movie in terms of 2010 dollars, accounting for inflation over time.
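The two column quirks above can be handled with a few lines of pandas. This is a minimal sketch, assuming the columns are named genres, budget_adj and revenue_adj (inferred from the description above; worth checking against the file): splitting on the pipe and exploding gives one row per genre, and subtracting the two _adj columns gives a profit in 2010 dollars. The rows are invented, and profit_adj is a hypothetical derived column, not part of the dataset.

```python
import pandas as pd

# Hypothetical rows; only the '|'-separated format and the "_adj" suffix
# come from the dataset description.
movies = pd.DataFrame(
    {
        "original_title": ["Movie A", "Movie B", "Movie C"],
        "genres": ["Action|Adventure", "Drama", "Action|Drama"],
        "budget_adj": [100.0, 50.0, 80.0],
        "revenue_adj": [300.0, 40.0, 200.0],
    }
)


def genre_counts(df: pd.DataFrame) -> pd.Series:
    """Count movies per genre by splitting the '|'-separated 'genres' column."""
    # str.split turns each cell into a list; explode yields one row per genre.
    genres = df["genres"].dropna().str.split("|").explode()
    return genres.value_counts()


def add_adjusted_profit(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with an inflation-adjusted profit column (2010 dollars)."""
    out = df.copy()
    out["profit_adj"] = out["revenue_adj"] - out["budget_adj"]
    return out


print(genre_counts(movies))
print(add_adjusted_profit(movies)[["original_title", "profit_adj"]])
```

Returning a copy keeps the original DataFrame untouched, which makes it easier to re-run notebook cells out of order.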

Loading Project

Requirements

This project requires Python 3 and the following Python libraries installed:

  • NumPy
  • pandas
  • Matplotlib
  • Seaborn

You will also need software installed that can run and execute a Jupyter Notebook.

If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already includes the above packages and more.
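Before launching the notebook, a short check like the one below can confirm that the libraries are importable. It is a sketch only: the package list is an assumption based on the project's topics (pandas, NumPy, Matplotlib, Seaborn), not an official requirements file from the repository.

```python
import importlib.util

# Package names assumed from the project topics; adjust if report.ipynb
# imports anything else.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

importlib.util.find_spec only locates a module without importing it, so the check stays fast even for heavy libraries.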

Execution

In a terminal or command window, navigate to the top-level project directory Investigate_TMDb_Movies/ (the one that contains this README) and run one of the following commands (the 'ipython notebook' form is deprecated in current IPython releases):

ipython notebook report.ipynb

or

jupyter notebook report.ipynb

or, if you have JupyterLab installed:

jupyter lab

This will open the Jupyter/IPython Notebook software and the project file or folder in your browser.


Conclusion

What I learned

  • The steps involved in a typical data analysis process.
  • How to pose questions that can be answered with a given dataset, and then answer them.
  • How to investigate problems in a dataset and wrangle the data into a usable format.
  • Practice communicating the results of an analysis.
  • How to use vectorized operations in NumPy and pandas to speed up data analysis code.
  • Familiarity with pandas Series and DataFrame objects, which make accessing data more convenient.
  • Last but not least, how to use Matplotlib and Seaborn to produce plots showing findings.
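To illustrate the point about vectorized operations, here is a minimal sketch comparing a Python loop with a vectorized pandas operation on a hypothetical runtime Series (the values are invented, not taken from the dataset). Both produce the same numbers, but the vectorized version hands the arithmetic to NumPy instead of iterating in Python.

```python
import numpy as np
import pandas as pd

# Hypothetical runtimes in minutes, made up for illustration.
runtimes = pd.Series([120, 95, 142, 88, 101], name="runtime")


def hours_with_loop(series: pd.Series) -> list:
    """Convert minutes to hours one element at a time (slow on large data)."""
    result = []
    for minutes in series:
        result.append(minutes / 60)
    return result


def hours_vectorized(series: pd.Series) -> pd.Series:
    """Convert minutes to hours in a single vectorized operation."""
    return series / 60


loop_result = hours_with_loop(runtimes)
vector_result = hours_vectorized(runtimes)

# Same values either way; the vectorized form avoids the Python-level loop.
assert np.allclose(loop_result, vector_result.to_numpy())
print(vector_result.round(2).tolist())
```

On a handful of rows the difference is negligible; the benefit shows up on columns with thousands of entries, such as the 10k+ movies here.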

Evaluation

My project was reviewed by a Udacity reviewer against the Investigating a Dataset rubric. All criteria in the rubric must meet specifications for me to pass.

Results

My project review by a Udacity reviewer

