
manga_recom's Introduction

Background

Sales in the Japanese manga market rose 10.3% in 2021 to reach 675.9 billion yen. This pushed manga to 40.4% of all domestic publishing sales in Japan, crossing the 40% mark for the first time (AnimeHunch).

In America, manga reportedly made up 76.71% of all adult fiction graphic novel sales in 2021 (AnimeNewsNetwork).

With more than 100,000 titles to choose from, entering the world of manga can be daunting. Even veterans always have more titles to discover.

Objective

MyAnimeList (MAL) is the world's largest Anime and Manga community.

Its community has hundreds of thousands of users who actively rate, share and talk about manga titles.

The objective of this project is to build a user-based manga recommender system by utilizing the rating lists of users on MAL.

The completed app can be accessed here.

Data Dictionary

Feature              Description
title                manga title
mal_id               ID number of title on MAL
url                  URL to title on MAL
image                URL to cover image of title on MAL
synopsis             synopsis of title
combined title       combination of title and its synonyms
Drama, Action, etc.  one column per genre, with a value of either 0 or 1

Methodology

  1. Scrape a list of usernames using the Jikan API
  • Rather than scraping by user ID number, scrape members of the largest MAL user community, "Recommendation Club"
  • Assume users in communities are more active in rating titles
  • Since this club was created in 2010 and is still active in 2022, assume it has a good spread of old and new users
  2. Use the list of usernames to scrape each user's list of manga ratings
  3. Conduct EDA on the data
  • MAL ratings are on a scale of 0 to 10
  • Almost half the ratings were 0 scores
  • Removed all 0 scores from the data; almost 2,000 users had only 0 scores in their rating list
  4. Model the data using the scikit-surprise library
  5. Score the models
  • Parameters selected for this project are k = 10 and a score threshold of >= 7/10 for a recommendation to be considered relevant
  • The scores are the average over 5 folds of cross-validation
  • The baseline model was random recommendations: it randomly recommends 10 titles to each user, and the scores are calculated from these recommendations
  6. Select the best-performing model
  • Baseline Only was the best-performing model, with a precision@10 score of 0.909 and a recall@10 score of 0.862
  7. Build and deploy the recommender system using Streamlit
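
The cleaning step in the EDA stage above (dropping 0 scores, which leaves some users with no ratings at all) could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the row layout `(username, mal_id, score)`, the sample data, and the function names are all assumptions.

```python
# Hypothetical rows in the form (username, mal_id, score).
# Real data would come from the scraped rating lists.
raw_ratings = [
    ("alice", 2, 9),
    ("alice", 13, 0),
    ("bob", 2, 0),
    ("bob", 25, 0),
    ("carol", 13, 7),
]


def drop_zero_scores(rows):
    """Keep only rows whose score is not 0."""
    return [row for row in rows if row[2] != 0]


def users_lost(before, after):
    """Return users who had rows before cleaning but none afterwards."""
    users_before = {user for user, _, _ in before}
    users_after = {user for user, _, _ in after}
    return sorted(users_before - users_after)


clean_ratings = drop_zero_scores(raw_ratings)
dropped_users = users_lost(raw_ratings, clean_ratings)
print(len(raw_ratings), "->", len(clean_ratings), "ratings; users removed:", dropped_users)
```

Users whose lists contained only 0 scores drop out as a side effect of the filter, which matches the observation that almost 2,000 users had nothing but 0 scores.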

Key Findings

Model Selection

Model Performance

  • The scores above are calculated with a relevance threshold of >= 7/10, at k = 10
  • The scores are the average over 5 folds of cross-validation
  • The Baseline Only model performed best; its recall and precision scores are almost 1% higher than those of the second-best model
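
For readers unfamiliar with the metrics, the sketch below shows one common way to compute precision@k and recall@k per user with k = 10 and a threshold of 7. It is an assumption about how such scores can be computed, not necessarily the exact procedure used in this project: the input format `(user, estimated_rating, actual_rating)` and the choice to return 0.0 when a denominator is zero are illustrative.

```python
from collections import defaultdict


def precision_recall_at_k(predictions, k=10, threshold=7.0):
    """Return per-user precision@k and recall@k.

    predictions: iterable of (user, estimated_rating, actual_rating).
    An item is "relevant" if its actual rating >= threshold and
    "recommended" if it is in the user's top k and its estimate >= threshold.
    """
    by_user = defaultdict(list)
    for user, estimated, actual in predictions:
        by_user[user].append((estimated, actual))

    precisions, recalls = {}, {}
    for user, pairs in by_user.items():
        # Rank the user's items by estimated rating, highest first.
        pairs.sort(key=lambda pair: pair[0], reverse=True)
        top_k = pairs[:k]

        n_relevant = sum(actual >= threshold for _, actual in pairs)
        n_recommended = sum(est >= threshold for est, _ in top_k)
        n_hits = sum(
            est >= threshold and actual >= threshold for est, actual in top_k
        )

        # Convention assumed here: an undefined ratio counts as 0.0.
        precisions[user] = n_hits / n_recommended if n_recommended else 0.0
        recalls[user] = n_hits / n_relevant if n_relevant else 0.0
    return precisions, recalls


# Hypothetical predictions for a single user, evaluated at k = 3 for brevity.
sample = [
    ("u1", 8.5, 9),
    ("u1", 7.9, 6),
    ("u1", 7.2, 8),
    ("u1", 6.0, 7),
    ("u1", 5.0, 3),
]
precisions, recalls = precision_recall_at_k(sample, k=3, threshold=7.0)
```

Averaging the per-user values, and then averaging across folds, gives a single precision@10 and recall@10 figure per model.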

Baseline Random Recommendations

  • The machine learning models all vastly outperformed the baseline model that gives random recommendations
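
As an illustration of the random baseline, the sketch below draws 10 distinct titles per user from the catalogue. The function name, the fixed seed, and the stand-in catalogue of integer IDs are assumptions for the example; how the project scored these recommendations is not shown here.

```python
import random


def random_recommendations(users, catalog, k=10, seed=42):
    """Recommend k distinct, randomly chosen titles to each user."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return {user: rng.sample(catalog, k) for user in users}


# Stand-in IDs for a catalogue of roughly 2,900 titles.
catalog = list(range(1, 2901))
recs = random_recommendations(["alice", "bob"], catalog)
```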

Streamlit

Streamlit App

  • Link to Streamlit App
  • Checkbox to toggle on/off the showing of recommendations with adult genres
  • User inputs genre preference for the recommendations generated. Only recommendations containing at least 1 of the selected genres will be shown.
  • User inputs favorite manga titles. These titles will be automatically given the maximum rating of 10.
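
The filtering behaviour described above can be sketched in plain Python, independent of the Streamlit widgets. This is a hedged illustration only: the `ADULT_GENRES` set, the dictionary layout of each title, the function names, and the choice to show everything when no genre is selected are assumptions, not the app's actual implementation.

```python
# Assumption: which genres count as "adult" is defined by the app; these are placeholders.
ADULT_GENRES = {"Erotica", "Hentai"}


def favourites_to_ratings(user_id, favourite_ids, max_rating=10):
    """Turn a user's favourite titles into rating rows at the maximum score."""
    return [(user_id, mal_id, max_rating) for mal_id in favourite_ids]


def filter_recommendations(titles, selected_genres, show_adult=False):
    """Keep titles that match the adult toggle and at least one selected genre.

    Each title is a dict with a "genres" mapping of genre name -> 0 or 1,
    mirroring the one-column-per-genre layout in the data dictionary.
    """
    wanted = set(selected_genres)
    kept = []
    for title in titles:
        genres = {name for name, flag in title["genres"].items() if flag == 1}
        if not show_adult and genres & ADULT_GENRES:
            continue  # hide adult titles unless the checkbox is ticked
        if wanted and not (genres & wanted):
            continue  # require at least one of the selected genres
        kept.append(title)
    return kept


# Hypothetical recommendations.
candidates = [
    {"title": "A", "genres": {"Action": 1, "Drama": 0, "Hentai": 0}},
    {"title": "B", "genres": {"Action": 0, "Drama": 1, "Hentai": 0}},
    {"title": "C", "genres": {"Action": 1, "Drama": 0, "Hentai": 1}},
]
shown = filter_recommendations(candidates, ["Action"], show_adult=False)
new_rows = favourites_to_ratings("new_user", [2, 13])
```

Keeping the filter as a pure function makes it easy to test separately from the UI code that reads the checkbox and genre selection.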

Limitations of Recommender System

  • The recommender system can only recommend manga titles that are present in its data (~2,900 titles).
  • Data was scraped in March 2022, so newer titles are not included.
  • Data was scraped from the largest user community on MAL, "Recommendation Club", so the ratings will not be reflective of all manga readers.
