This project started with three source files: movie information from Kaggle, ratings information from MovieLens, and movie information from Wikipedia. Each file was read into a function that cleaned the data and deleted bad or unneeded data; the results were then merged to create two tables in a PostgreSQL database. The process of populating the database was timed, and the result is shown below:
Now that the database has been created, it is much easier to query the movie data for a group of movies or a single movie.
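As a rough illustration of the clean, merge, timed load, and query steps described above, here is a minimal sketch. It is not the project's actual code: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server, and the sample rows, column names (`movie_id`, `title`, `director`, `rating`), and helper functions (`clean`, `merge_movies`, `load`) are all hypothetical.

```python
import sqlite3
import time

# Hypothetical sample rows standing in for the three source files.
kaggle_rows = [
    {"movie_id": 1, "title": "Example Movie A"},
    {"movie_id": 2, "title": ""},  # bad row: missing title
    {"movie_id": 3, "title": "Example Movie C"},
]
wiki_rows = [
    {"movie_id": 1, "director": "Director X"},
    {"movie_id": 3, "director": "Director Y"},
]
rating_rows = [
    {"movie_id": 1, "user_id": 10, "rating": 4.0},
    {"movie_id": 1, "user_id": 11, "rating": 3.0},
    {"movie_id": 3, "user_id": 10, "rating": None},  # bad row: missing rating
]


def clean(rows, required):
    """Keep only rows in which every required field is present and non-empty."""
    return [r for r in rows if all(r.get(k) not in (None, "") for k in required)]


def merge_movies(kaggle, wiki):
    """Inner-join the two movie sources on the shared movie_id key."""
    wiki_by_id = {w["movie_id"]: w for w in wiki}
    return [
        {**k, **wiki_by_id[k["movie_id"]]}
        for k in kaggle
        if k["movie_id"] in wiki_by_id
    ]


def load(conn, movies, ratings):
    """Create the two tables, insert the rows, and return the elapsed seconds."""
    start = time.perf_counter()
    conn.execute(
        "CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT, director TEXT)"
    )
    conn.execute(
        "CREATE TABLE ratings (movie_id INTEGER, user_id INTEGER, rating REAL)"
    )
    conn.executemany(
        "INSERT INTO movies VALUES (:movie_id, :title, :director)", movies
    )
    conn.executemany(
        "INSERT INTO ratings VALUES (:movie_id, :user_id, :rating)", ratings
    )
    conn.commit()
    return time.perf_counter() - start


# Clean each source, merge the movie sources, then load both tables.
movies = merge_movies(clean(kaggle_rows, ["title"]), clean(wiki_rows, ["director"]))
ratings = clean(rating_rows, ["rating"])

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the PostgreSQL database
elapsed = load(conn, movies, ratings)
print(f"Loaded {len(movies)} movies and {len(ratings)} ratings in {elapsed:.4f} s")

# Query a single movie together with its average rating.
single = conn.execute(
    "SELECT m.title, AVG(r.rating) FROM movies AS m "
    "JOIN ratings AS r ON r.movie_id = m.movie_id "
    "WHERE m.movie_id = ? GROUP BY m.title",
    (1,),
).fetchone()

# Query a group of movies (here, every title in alphabetical order).
group = conn.execute("SELECT title FROM movies ORDER BY title").fetchall()
```

With a real PostgreSQL database, the same pattern would apply through a PostgreSQL driver in place of `sqlite3`, and the timing would depend heavily on the size of the ratings file.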
Submitted by Lisa K Wagner (08/15/2010)