Novum Pharma Data Analysis Challenge

My Solution:

https://github.com/roshin8/novos-data-challenge/blob/master/Novum-Pharma-Data-Analysis-Challenge.ipynb

The Challenge

Show us something interesting using the provided dataset. You should leverage Python, along with any other tools or programming languages of your choice, so long as they are available for free (you may assume we have Microsoft Office). The use of additional, external datasets is welcome, but not required.

Examples of interesting analyses include (but are not limited to):

- How effective is occupation as a predictor of whether a person will
  enjoy an action film?
- Are there generalizations that can be made regarding opinions of films
  released at different points in the viewer's life?
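
As a rough illustration of the first example question, the sketch below compares how often each occupation rates action films favorably. The tiny in-memory tables are synthetic stand-ins rather than real MovieLens rows, and the names `action_like_rate` and `LIKE_THRESHOLD`, as well as the choice of a rating of 4 or higher to mean "enjoyed", are assumptions made for this example only.

```python
from collections import defaultdict

# Synthetic stand-ins (NOT real dataset rows) for the user, film, and rating tables.
occupation_by_user = {1: "engineer", 2: "artist", 3: "engineer"}
is_action_by_item = {10: True, 20: False, 30: True}
ratings = [  # (user_id, item_id, rating on a 1-5 scale)
    (1, 10, 5), (1, 20, 2), (2, 10, 3), (2, 30, 2), (3, 30, 4),
]

# Assumption: a rating of 4 or 5 counts as "enjoyed".
LIKE_THRESHOLD = 4


def action_like_rate(ratings, occupation_by_user, is_action_by_item):
    """Return, per occupation, the share of action-film ratings >= LIKE_THRESHOLD."""
    liked = defaultdict(int)
    total = defaultdict(int)
    for user_id, item_id, rating in ratings:
        if not is_action_by_item.get(item_id, False):
            continue  # skip films that are not tagged as action
        occupation = occupation_by_user[user_id]
        total[occupation] += 1
        liked[occupation] += rating >= LIKE_THRESHOLD
    return {occupation: liked[occupation] / total[occupation] for occupation in total}


rates = action_like_rate(ratings, occupation_by_user, is_action_by_item)
```

A gap in like rates between occupations is only descriptive. Judging how well occupation predicts enjoyment would additionally need something like a model evaluated on held-out ratings.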

How you choose to convey your findings is completely up to you. Examples include: a Word document containing a written account of the analysis, a PowerPoint slide deck, a website, a Jupyter Notebook, etc.

The purpose of this challenge is to demonstrate your:

- Programming technique, focusing on data manipulation
- Ability to think through a problem from start to finish, and complete the work
- Ability to identify and present accurate findings in a clear, simple manner

About the dataset

The dataset you will be using is the MovieLens 100K Dataset. This dataset contains 100,000 movie ratings from 943 users on 1,682 films. It was compiled by the University of Minnesota during a seven-month period from September 19th, 1997 through April 22nd, 1998. More information on the dataset can be found in the README, located here: http://files.grouplens.org/datasets/movielens/ml-100k-README.txt

The data can be downloaded through the following url: http://files.grouplens.org/datasets/movielens/ml-100k.zip
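
A minimal sketch of reading the ratings file, assuming the layout described in the dataset README: `u.data` is tab-separated with the columns user id, item id, rating, and timestamp. The `Rating` tuple and `parse_ratings` helper are hypothetical names for this example, and the sample rows are synthetic, not taken from the dataset.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record type mirroring the four u.data columns.
Rating = namedtuple("Rating", "user_id item_id rating timestamp")


def parse_ratings(lines):
    """Parse tab-separated rows of: user id, item id, rating, timestamp."""
    reader = csv.reader(lines, delimiter="\t")
    return [Rating(int(u), int(i), int(r), int(t)) for u, i, r, t in reader]


# Synthetic rows in the u.data layout (NOT real rows from the dataset).
sample = io.StringIO(
    "1\t10\t4\t880000000\n"
    "2\t10\t2\t880000100\n"
    "1\t20\t5\t880000200\n"
)
ratings = parse_ratings(sample)
mean_rating = sum(r.rating for r in ratings) / len(ratings)
```

To run this against the real data, pass an open file handle instead of `sample`, for example `open("ml-100k/u.data", newline="")`, assuming the zip was extracted into an `ml-100k` directory.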

Time limit

You have either 48 or 72 hours to complete your analysis, depending on whether you choose to use the weekend or weeknights, respectively. This short time period is not meant to intimidate; rather, it is meant to reflect our expectations for the depth of analysis in this challenge. We're not expecting academic-level research, just enough to make us say, "Hmm, that's interesting." The total time spent on this challenge should be around 6 to 8 hours, though more or less is fine.

Delivering your results

Before the end of your time limit, please reply to this email with one of the following:

- (Preferred) A link to a GitHub repo containing all findings, presentation
  materials, and code used in this challenge. If you choose this method, 
  please fork this repo and complete your work in the forked repo.
- A zip file (no larger than 25MB) containing all findings, presentation
  materials, and code used in this challenge.

Additionally, your zip file/repo should contain a file called "README.md" which explains where your findings can be found and how to reproduce your analysis.

Post-analysis questions

On your interview day, you will meet with a member of the Novos Growth team to discuss your analysis. Be prepared to answer the following questions during this time:

- If given more time, what else would you do?
- Why and how did you use the tools you used?
- How would your approach change if the data set contained 1 billion
  ratings? Which tools would you use?
- Let’s say that you wanted to reproduce your analysis on a regular
  basis (say, once a week in perpetuity). How would you accomplish this?
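
One possible direction for the scale question, offered as a sketch rather than a recommended answer: aggregate in a single streaming pass so that memory grows with the number of distinct films instead of the number of ratings. The function name `streaming_item_means` is hypothetical, the input is assumed to use the tab-separated `u.data` layout, and the sample rows are synthetic.

```python
import io
from collections import defaultdict


def streaming_item_means(lines):
    """Compute each film's mean rating in one pass without holding all rows in memory."""
    count = defaultdict(int)
    total = defaultdict(int)
    for line in lines:
        # Assumed layout: user id, item id, rating, timestamp (tab-separated).
        _user, item, rating, _timestamp = line.rstrip("\n").split("\t")
        count[item] += 1
        total[item] += int(rating)
    return {item: total[item] / count[item] for item in count}


# Synthetic rows (NOT real dataset rows) standing in for a much larger file.
sample = io.StringIO(
    "1\t10\t4\t880000000\n"
    "2\t10\t2\t880000100\n"
    "1\t20\t5\t880000200\n"
)
means = streaming_item_means(sample)
```

At a billion rows a distributed or out-of-core tool would likely replace this loop, but the same count-and-sum aggregation carries over because partial results from separate chunks can be added together.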

Questions or bug fixes

If you have any questions during your completion of the challenge or find any bugs along the way, please create an issue for this repo and tag it appropriately. A member of the team will respond as soon as possible.

Good Luck!
