Giter Club home page Giter Club logo

popularbandpairs's Introduction

PopularBandPairs

This was done as part of software engineering interview...

The assignment is:

The attached text file contains the favorite musical artists of 1000 users from Some Popular Music Review Website. Each line is a list of up to 50 artists, formatted as follows:

Radiohead,Pulp,Morrissey,Delays,Stereophonics,Blur,Suede,Sleeper,The La's,Super Furry Animals,Iggy Pop\n Band of Horses,Smashing Pumpkins,The Velvet Underground,Radiohead,The Decemberists,Morrissey,Television\n

Write a program that, using this file as input, produces an output file containing a list of pairs of artists which appear TOGETHER in at least fifty different lists. For example, in the above sample, Radiohead and Morrissey appear together twice, but every other pair appears only once. Your solution should be a cvs, with each row being a pair. For example:

Morrissey,Radiohead\n


Optimization Notes:

* File coding needing to be UTF-8. JVM requires argument -Dfile.encoding=UTF-8
* Band names are converted to band ids. Processing strings is expensive. Strings are converted to integers. All processing is done on the integers. For display the integer is translated back to its string band name.
* Also show use a shadow table to improve look up of band id given band name. The performance gained is worth the extra memory and processing used.
* Hashmaps are used to represent tables. Hashmaps allow for unique ids assigned by the programmer similar to database tables. Hashmaps are also limited in size only by memory and provide efficient id based storing and retrieving of data.
* For UserLike I used an array. This requires possible looping through empty cells where users liked less than 50 bands. However, the trade off in speed of inserting, retrieving from an arraylist made this extra looping trival.

Algorithm
Create three tables (hashmap)

Bands ->  : int    unique id called bandId Prime Number
            string bandName
         
UserLikes -> int userId
             array of bandIds user likes
             
PairsLiked -> int unique id called semiprime
              int band 1
              int band 2
              int freq
             
1) Read in each line from the file inserting the user id and bandId into the UserLikes table.
   If band name is not yet in Bands table, add it. bandIds are prime numbers.
   
2) Loop through each user
      A) Loop through each combination of bands that user likes
      B) create semiprime of band id
      c) check semiprime in list of existing pairs. Add combination to list either by adding a new combination or incrementing frequency of an existing combination
      
3) Save to file only those pairs that appear N times.

Assumptions

* Since each user can list up to 50 bands and we have 1,000 users, we have a possibility of 50,000 unique bands
* Each user will like a band only once
* Band names are case sensitive so REM and Rem are two separate bands
* Different spellings of band names counts as a new band so REM and R.E.M. are two separate bands
* Order to the band names in the pairs is not matter.

popularbandpairs's People

Contributors

jrmmba8314 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.