
kill_the_marvel_universe's Introduction

Kill the Marvel Universe Exercise

Please note that this exercise focuses on data analysis and data science. I'm deliberately neglecting code quality so I can experiment with new tools; this is not a production-friendly workflow.

Plan of attack:

  • (DONE) Build a simple scraper with grequests for the character API. Comic IDs are all we need, so save them into characters.json (check that the comic ID limit doesn't break anything)

  • (DONE) Write a simple reporter to output all characters alphabetically

  • (DONE) Write a simple reporter to output the top 10 heroes by number of comics

  • (DONE) Build a networkx graph that uses comic IDs to connect heroes

  • (DONE) Use networkx to calculate centrality and get top 10

  • (DONE) Render graph to show character connections

  • (DONE) Calculate influence with logic similar to a previous project, using nx.neighbors to get the basic influence of a character and the surrounding characters to n levels. The results look similar to the degree algorithm already implemented, so the graph needs investigating to make sure it's built correctly

  • (DONE) After analysing the data, the results look like they could be correct, but the initial API call limits the comic relations returned for each character to 20. My notes say I checked this, so I will have to go through the characters with > 20 relations and make additional calls to get their extra relations

  • (DONE) Added comic API scraping; after loading all the comic data, the results were much closer to what I expected

  • Clean up project, remove redundant code, write tests and update coverage

  • Reduce the amount of comic data / character data stored by the API scraper

  • Highlight high-influence characters on the graph and remove low-influence ones to make it easier to digest

  • Use community to work out the communities in the Marvel universe, then find the most connected community bridges that cover the most communities (might be very similar to the betweenness centrality already calculated)

  • Show the reasons for targeting community links, based on research showing that they are the ones who need to be vaccinated first in epidemics to fight them.

  • Add instructions on how to run the crawler and the different algorithms on the data
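
The graph-building and ranking steps in the plan above could look roughly like the sketch below. It is not the project's actual code: the sample data, the helper names (build_graph, top_n, influence) and the edge weight are assumptions made for illustration. The influence helper also swaps the nx.neighbors approach for networkx's single_source_shortest_path_length with a cutoff, which counts every character within n hops.

```python
from collections import defaultdict
from itertools import combinations

import networkx as nx

# Hypothetical sample data: character name -> IDs of comics they appear in.
characters = {
    "Spider-Man": [1, 2, 3],
    "Iron Man": [2, 3, 4],
    "Hulk": [4],
    "Thor": [5],
}


def build_graph(char_comics):
    """Connect two characters whenever they share at least one comic ID."""
    comic_to_chars = defaultdict(set)
    for name, comic_ids in char_comics.items():
        for comic_id in comic_ids:
            comic_to_chars[comic_id].add(name)

    graph = nx.Graph()
    graph.add_nodes_from(char_comics)  # keep characters with no shared comics
    for names in comic_to_chars.values():
        for a, b in combinations(sorted(names), 2):
            # The weight (an assumption) counts how many comics the pair shares.
            if graph.has_edge(a, b):
                graph[a][b]["weight"] += 1
            else:
                graph.add_edge(a, b, weight=1)
    return graph


def top_n(scores, n=10):
    """Return the n highest-scoring (name, score) pairs."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


def influence(graph, node, levels=2):
    """Count the characters reachable from node within `levels` hops."""
    reachable = nx.single_source_shortest_path_length(graph, node, cutoff=levels)
    return len(reachable) - 1  # exclude the node itself


graph = build_graph(characters)
ranking = top_n(nx.degree_centrality(graph))
```

With one hop, this influence count equals a character's degree, which may be why a neighbour-based influence score can end up looking similar to degree centrality.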

Usage:

Algorithm options:

  • degree_centrality
  • closeness_centrality
  • eigenvector_centrality
  • katz_centrality
  • betweenness_centrality

$ marvel_reporter.py --algorithm $ALGORITHM -show_graph

e.g.

$ marvel_reporter.py --algorithm eigenvector_centrality
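
As a rough illustration of how an --algorithm value could be mapped to a networkx function, here is a minimal sketch. It is not the source of marvel_reporter.py: the ALGORITHMS table, parse_args, rank and the demo graph are hypothetical names, the default value is an assumption, and the show-graph flag is left out.

```python
import argparse

import networkx as nx

# Assumed mapping from the option names listed above to networkx functions.
ALGORITHMS = {
    "degree_centrality": nx.degree_centrality,
    "closeness_centrality": nx.closeness_centrality,
    "eigenvector_centrality": nx.eigenvector_centrality,
    "katz_centrality": nx.katz_centrality,
    "betweenness_centrality": nx.betweenness_centrality,
}


def parse_args(argv=None):
    """Parse the command line; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Rank characters by centrality")
    parser.add_argument(
        "--algorithm",
        choices=sorted(ALGORITHMS),
        default="degree_centrality",  # assumed default, not from the project
    )
    return parser.parse_args(argv)


def rank(graph, algorithm, n=10):
    """Score every node with the chosen algorithm and return the top n."""
    scores = ALGORITHMS[algorithm](graph)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


# Tiny hypothetical graph: B sits between A, C and D.
demo = nx.Graph([("A", "B"), ("B", "C"), ("B", "D")])
args = parse_args(["--algorithm", "betweenness_centrality"])
top = rank(demo, args.algorithm, n=1)
```

Keeping the mapping in one dictionary means the list of valid option names and the dispatch logic cannot drift apart.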
