The Japanese manga market grew by 10.3% in 2021 to reach 675.9 billion yen in sales. This pushed manga to 40.4% of all domestic publishing sales in Japan, crossing the 40% mark for the first time (AnimeHunch).
In America, manga was reported to make up 76.71% of all adult fiction graphic novel sales in 2021 (AnimeNewsNetwork).
With more than 100,000 titles to choose from, entering the world of manga can be daunting. Even veterans always have more titles to discover.
MyAnimeList (MAL) is the world's largest anime and manga community.
Hundreds of thousands of its users actively rate, share, and discuss manga titles.
The objective of this project is to build a user-based manga recommender system by utilizing the rating lists of users on MAL.
The completed app can be accessed here.
Feature | Description |
---|---|
title | manga title |
mal_id | ID number of title on MAL |
url | URL to title on MAL |
image | URL to cover image of title on MAL |
synopsis | synopsis of title |
combined title | combination of title and its synonyms |
Drama, Action etc. | each genre has its own column, with a value of either 0 or 1 |
- Scrape a list of usernames using Jikan API
- Rather than scraping by user ID number, scrape members from the largest MAL user community called "Recommendation Club"
- Assume users in communities are more active in rating titles
- Since this club was created in 2010 and is still active in 2022, assume it has a good spread of old and new users
- Use the list of usernames to scrape their respective lists of manga ratings
- Note that Jikan API is deprecating this function in May 2022
- Conduct EDA on data
- MAL ratings are on a scale of 0 to 10
- Almost half the ratings were 0 scores
- Removed all 0 scores from data; almost 2,000 users only had 0 scores in their rating list
- Data Modeling using scikit-surprise library
- Details on models used can be found in the library documentation
- Scoring models
- Parameters selected for this project are k = 10 and a score threshold of >= 7/10 for a recommendation to be considered relevant
- The scores are averaged over 5 folds of cross validation
- The baseline was a random recommender: it recommends 10 random titles to each user, and the scores are calculated from these recommendations
- Select the best performing model
- Baseline Only (`BaselineOnly` in scikit-surprise, not to be confused with the random baseline above) was the best performing model, with a precision@10 score of 0.909 and a recall@10 score of 0.862
- Build and deploy the recommender system using Streamlit
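The cleaning step described in the EDA above (dropping 0 scores, which also removes users whose lists contain only 0 scores) can be sketched as follows. This is a minimal illustration, not the project's actual code: the `(user, title, score)` row layout, the sample rows, and the function name `drop_zero_scores` are assumptions made for the example.

```python
def drop_zero_scores(ratings):
    """Keep only (user, title, score) rows whose score is not 0."""
    return [row for row in ratings if row[2] != 0]


# Hypothetical sample data; real rows would come from the scraped rating lists.
ratings = [
    ("user_a", "Title X", 9),
    ("user_a", "Title Y", 0),
    ("user_b", "Title X", 0),
    ("user_b", "Title Z", 0),
    ("user_c", "Title Z", 7),
]

cleaned = drop_zero_scores(ratings)

# Users whose every score was 0 disappear from the data entirely.
users_before = {user for user, _, _ in ratings}
users_after = {user for user, _, _ in cleaned}
dropped_users = users_before - users_after

print(cleaned)
print(dropped_users)
```

Counting `dropped_users` on the real data is one way to arrive at a figure like the "almost 2,000 users" mentioned above.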
- The scores above are calculated with a relevant score threshold of >= 7/10, at k = 10
- The scores are averaged over 5 folds of cross validation
- The Baseline Only model performed the best, and its recall and precision scores are almost 1% higher than those of the second-best model
- The machine learning models all vastly outperformed the baseline model that gives random recommendations
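For readers unfamiliar with precision@k and recall@k, the sketch below shows one common per-user definition using the threshold (>= 7) and k (10) stated above. It is an illustration under assumptions: the exact computation used in this project is not shown here, and the input layout of `(user, estimated_rating, true_rating)` tuples and the function name are hypothetical.

```python
from collections import defaultdict


def precision_recall_at_k(predictions, k=10, threshold=7.0):
    """Return per-user precision@k and recall@k.

    predictions: iterable of (user, estimated_rating, true_rating).
    An item is "relevant" if its true rating >= threshold and
    "recommended" if it is in the user's top k by estimated rating
    and its estimated rating >= threshold.
    """
    by_user = defaultdict(list)
    for user, est, true in predictions:
        by_user[user].append((est, true))

    precisions, recalls = {}, {}
    for user, pairs in by_user.items():
        # Rank the user's items by estimated rating, highest first.
        pairs.sort(key=lambda p: p[0], reverse=True)
        top_k = pairs[:k]

        n_relevant = sum(true >= threshold for _, true in pairs)
        n_recommended = sum(est >= threshold for est, _ in top_k)
        n_hits = sum(est >= threshold and true >= threshold for est, true in top_k)

        # Define the score as 0 when the denominator is 0.
        precisions[user] = n_hits / n_recommended if n_recommended else 0.0
        recalls[user] = n_hits / n_relevant if n_relevant else 0.0
    return precisions, recalls


# Hypothetical predictions for a single user, evaluated at k = 2.
sample = [("u1", 8.5, 9), ("u1", 7.2, 6), ("u1", 6.0, 8), ("u1", 5.0, 3)]
precisions, recalls = precision_recall_at_k(sample, k=2, threshold=7.0)
print(precisions["u1"], recalls["u1"])
```

In this toy case, two items are recommended and one of them is relevant, while two items are relevant overall, so both scores are 0.5. Averaging the per-user values, and then averaging across folds, gives a single figure per model.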
- Link to Streamlit App
- Checkbox to toggle on/off the showing of recommendations with adult genres
- User inputs genre preference for the recommendations generated. Only recommendations containing at least 1 of the selected genres will be shown.
- User inputs favorite manga titles. These titles will be automatically given the maximum rating of 10.
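The app options above can be sketched in plain Python as follows. This is a hedged illustration rather than the deployed app's code: the `ADULT_GENRES` set, the dictionary layout of each title, and both function names are assumptions, and showing everything when no genre is selected is a choice made only for this example.

```python
# Hypothetical set of genres treated as adult; the app's actual list is not given.
ADULT_GENRES = {"Hentai", "Erotica"}


def filter_recommendations(titles, selected_genres, show_adult=False):
    """Filter recommended titles by the adult toggle and genre preference.

    titles: list of dicts with a "title" string and a "genres" dict that
    maps each genre name to 0 or 1, mirroring the one-column-per-genre data.
    """
    wanted = set(selected_genres)
    results = []
    for item in titles:
        genres = {name for name, flag in item["genres"].items() if flag == 1}
        if not show_adult and genres & ADULT_GENRES:
            continue  # adult titles are hidden unless the checkbox is on
        if wanted and not (genres & wanted):
            continue  # needs at least 1 of the selected genres
        results.append(item["title"])
    return results


def favorites_to_ratings(favorite_titles, max_rating=10):
    """Give every favorite title the maximum rating."""
    return {title: max_rating for title in favorite_titles}


# Hypothetical recommendations.
recs = [
    {"title": "Title A", "genres": {"Action": 1, "Drama": 0, "Hentai": 0}},
    {"title": "Title B", "genres": {"Action": 0, "Drama": 1, "Hentai": 1}},
    {"title": "Title C", "genres": {"Action": 0, "Drama": 0, "Hentai": 0}},
]

print(filter_recommendations(recs, ["Action", "Drama"]))
print(filter_recommendations(recs, ["Action", "Drama"], show_adult=True))
print(favorites_to_ratings(["Title A"]))
```

The ratings produced by `favorites_to_ratings` stand in for the new user's rating list, which is what a user-based recommender needs as input.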
- The recommender system can only recommend manga titles that are present in its data (~2,900 titles).
- Data was scraped in March 2022, so newer titles are not included.
- Data was scraped from the largest user community on MAL, called "Recommendation Club", so the ratings will not be reflective of all manga readers.