big-data-analysis's Introduction

DAT 560M: Big Data and Cloud Computing Fall 2022

This repository contains the final project for the Big Data course, which analyzes a large movie dataset.

Final Project: Keys to financial success in the movie industry

Data Source: The Movies Dataset from Kaggle (more than 2 GB)
Project Objective: Identify which characteristics are common to financially successful movies
Framework: Data processing and analysis with PySpark; visualization with Tableau. The feature analysis covers genre, popularity, runtime, and rating.

Course Objectives:

An introduction to the storage, retrieval, analysis, and display of data sets so large and complex that traditional data processing and analysis applications cannot readily be used. Topics include big data management, data architectures for hosting big data, big data retrieval languages, parallel computing methods, big data analytical methods, and data visualization.

Software: Depending on the technology available, we may use Bash, Cloudera (through an AWS server), Python, Tableau, and Amazon AWS tools. Bash tools, Python, and Cloudera are available, and instructions for them are provided.

Course Instructor: Hossein Amini

Repository: zixiao-wu / big-data-analysis
