boostcampaitech4recsys1 / level1_bookratingprediction_recsys-level1-recsys-02
level1_bookratingprediction_recsys-level1-recsys-02 created by GitHub Classroom
An error shows up (error message)
Parameter tuning for the baseline NCF model
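I wondered whether the image data was actually clean, so I took a closer look.
I didn't check it rigorously, but I found 41,802 images of size (1, 1).
With roughly 150,000 books in total, about a quarter of the image data appears to carry no information.
Apart from the 1×1 images, the remaining images vary somewhat in size but don't seem to have any major problems.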
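```python
from collections import defaultdict
import pandas as pd
from PIL import Image

books = pd.read_csv('books.csv')
# Count cover images per (width, height) size; close each file after reading
d = defaultdict(int)
for path in books['img_path']:
    with Image.open(path) as img:
        d[img.size] += 1
print(d[(1, 1)])  # number of 1x1 images
```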
You can get a rough check with the code above.
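A minimal sketch for flagging the 1×1 covers so they can be dropped or replaced before image features are used. `is_placeholder_image` and the `img_is_placeholder` column are hypothetical names, and treating exactly (1, 1) as "no image" is an assumption based on the size counts.

```python
from PIL import Image


def is_placeholder_image(path):
    """Return True if the image at `path` is a 1x1 placeholder.

    Hypothetical helper: assumes every (1, 1) image carries no cover
    information. `path` may be a filename or a binary file object.
    """
    with Image.open(path) as img:  # Image.open reads the header lazily, enough for .size
        return img.size == (1, 1)


# Usage with the `books` DataFrame loaded earlier (hypothetical column name):
# books['img_is_placeholder'] = books['img_path'].map(is_placeholder_image)
# print(books['img_is_placeholder'].sum())
```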
```python
groupings = {
    'Fiction': ['fiction'],  # too broad, so it goes at the very top
    'Literature & Poem': ['liter', 'poem', 'poetry'],
    'Science & Math': ['science', 'math', 'logy'],  # 'science' and 'logy' match too broadly, so keep them near the top
    'Parenting & Relationships': ['baby', 'parent', 'family', 'tionship', 'brother', 'sister'],  # fairly large group
    'Medical Books': ['medi', 'psycho'],  # 'psy' could be split out further
    'Animal & Nature': ['animal', 'ecolo', 'plant', 'nature'],
    'Arts & Photography': ['art', 'photo'],  # 'art' is a substring of too many words
    'Biographies & Memoirs': ['biog', 'memo'],
    'Business & Money': ['busi', 'money', 'econo'],
    'Calendars': ['calen'],
    'Children\'s Books': ['child', 'baby'],
    'Christian Books & Bibles': ['christi', 'bible'],  # 'christi', not 'christ', so 'Christmas' doesn't match
    'Comics & Graphic Novels': ['comics', 'graphic novel'],
    'Computers & Technology': ['computer', 'techno', 'archi'],
    'Cookbooks, Food & Wine': ['cook'],
    'Crafts, Hobbies & Home': ['crafts'],
    'Education & Teaching': ['educa', 'teach'],
    'Engineering & Transportation': ['engine', 'transp'],
    'Health, Fitness & Dieting': ['health', 'fitness', 'diet'],
    'History': ['histo'],
    'Humor & Entertainment': ['humor', 'entertai', 'comed', 'game'],
    'Law': ['law'],
    'LGBTQ+ Books': ['lesbian', 'gay', 'bisex'],
    'Mystery, Thriller & Suspense': ['myste', 'thril', 'suspen'],
    'Politics & Social Sciences': ['politic', 'social'],
    'Reference': ['reference'],
    'Religion & Spirituality': ['religi'],
    'Romance': ['romance'],
    'Science Fiction & Fantasy': ['science fiction', 'fantasy'],
    'Self-Help': ['self'],  # every category containing 'self' turned out to be self-help related
    'Sports & Outdoors': ['exerc', 'sport', 'outdoor'],
    'Teen & Young Adult': ['teen', 'adol', 'juven'],  # the word 'nonfiction' only appears in youth-related categories
    'Test Preparation': ['test', 'school', 'examina'],
    'Travel': ['travel'],
}
```
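One way to apply `groupings`, sketched under an assumption inferred from the comments ("too broad, so it goes at the very top", and 'science fiction' listed after 'fiction'): groups are scanned in dict order and a later match overrides an earlier one, so broad groups near the top act as fallbacks. `map_category_high`, the raw `category` column, and the toy rows are hypothetical.

```python
import pandas as pd


def map_category_high(category, groupings):
    """Return the high-level group for one raw category string.

    Hypothetical helper: groups are scanned in dict order and a later
    match overrides an earlier one, so broad groups near the top
    ('Fiction', 'Science & Math') only win when nothing more specific
    further down ('Science Fiction & Fantasy') matches.
    """
    if not isinstance(category, str):
        return 'Unclassified'
    text = category.lower()  # keywords are lowercase substrings
    result = 'Unclassified'
    for group, keywords in groupings.items():
        if any(keyword in text for keyword in keywords):
            result = group
    return result


# Toy example with a subset of `groupings` and made-up rows
demo_groupings = {
    'Fiction': ['fiction'],
    'Science & Math': ['science', 'math', 'logy'],
    'Science Fiction & Fantasy': ['science fiction', 'fantasy'],
}
demo = pd.DataFrame({
    'isbn': ['0001', '0002', '0003', '0004'],
    'category': ["['Fiction']", "['Science Fiction']", "['Biology']", None],
})
demo['category_high'] = demo['category'].apply(
    lambda c: map_category_high(c, demo_groupings)
)
# Books per high-level category, i.e. a `count` column like the one filtered on below
demo['count'] = demo.groupby('category_high')['isbn'].transform('count')
print(demo[['category', 'category_high', 'count']])
```

With the full table, `books['category'].apply(lambda c: map_category_high(c, groupings))` would fill `category_high` the same way; whether this matches the team's actual assignment logic is not confirmed by the notes.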
```python
for i in range(len(books)):  # move 5,033 rows into 'Unclassified'
    if books.at[i, 'count'] < 10:
        books.at[i, 'category_high'] = 'Unclassified'
books_count = books.groupby('category_high').count()['isbn'].to_dict()  # number of ISBNs per category_high
for i in range(len(books)):
    books.at[i, 'count'] = books_count[books['category_high'][i]]  # recount: success if 'Unclassified' grew by 5,033
```
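The two loops can also be written with vectorized pandas operations, which avoids row-by-row `.at` access over roughly 150k rows. A sketch assuming the same `count`, `category_high`, and `isbn` columns; `merge_rare_categories` and the toy data are hypothetical, and the final check mirrors the "Unclassified should grow by 5,033" idea from the comment.

```python
import pandas as pd


def merge_rare_categories(books, min_count=10):
    """Relabel rows whose category has fewer than `min_count` books as
    'Unclassified', then recompute the per-category `count` column."""
    books = books.copy()
    rare = books['count'] < min_count
    books.loc[rare, 'category_high'] = 'Unclassified'
    # Number of ISBNs in each row's (possibly new) category
    books['count'] = books.groupby('category_high')['isbn'].transform('count')
    return books


# Toy data: 12 books in 'A', 3 in 'B', 2 already 'Unclassified'
toy = pd.DataFrame({
    'isbn': [str(i) for i in range(17)],
    'category_high': ['A'] * 12 + ['B'] * 3 + ['Unclassified'] * 2,
})
toy['count'] = toy.groupby('category_high')['isbn'].transform('count')

merged = merge_rare_categories(toy)
before = (toy['category_high'] == 'Unclassified').sum()
after = (merged['category_high'] == 'Unclassified').sum()
moved = ((toy['count'] < 10) & (toy['category_high'] != 'Unclassified')).sum()
print(after - before == moved)  # sanity check: growth equals the rows relabeled
```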
```
english    134405
german       6706
franch       3405
espanol      3399
others       1655
Name: isbn_country, dtype: int64
```
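How `isbn_country` was derived isn't shown. One plausible approach, assumed here, is to read the ISBN-10 registration-group prefix: 0 and 1 are English-language, 2 French-language, 3 German-language, and 84 is Spain. `isbn_language_group` and its labels are hypothetical and only a few prefixes are covered; everything else falls into 'others'. ISBNs should be kept as strings (e.g. `dtype=str` when reading the CSV), since parsing them as integers would drop leading zeros.

```python
import pandas as pd


def isbn_language_group(isbn):
    """Map an ISBN-10 to a coarse language group from its registration-group prefix.

    Hypothetical helper covering only a few prefixes:
    0/1 English, 2 French, 3 German, 84 Spain.
    """
    isbn = str(isbn).strip()
    if isbn[:1] in ('0', '1'):
        return 'english'
    if isbn[:1] == '2':
        return 'french'
    if isbn[:1] == '3':
        return 'german'
    if isbn[:2] == '84':
        return 'spanish'
    return 'others'


# Made-up sample ISBNs, one per branch
isbns = pd.Series(['0002005018', '1552041778', '2070423204',
                   '3442353866', '8445071416', '9681500830'])
print(isbns.map(isbn_language_group).value_counts())
```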
```
usa              45301
canada            6538
anycountry        5154
germany           3609
unitedkingdom     3148
australia         1821
spain             1692
france             829
Name: location_country, dtype: int64
```
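Likewise, `location_country` looks like the last comma-separated part of a "city, state, country" location string, lowercased with spaces removed (which would explain `unitedkingdom`). A sketch under that assumption; `extract_country`, the input format, and the sample strings are hypothetical, and an empty country part becomes `None`.

```python
import pandas as pd


def extract_country(location):
    """Return the last comma-separated token of a location string,
    lowercased with spaces removed, or None if it is missing or empty."""
    if not isinstance(location, str):
        return None
    country = location.split(',')[-1].strip().lower().replace(' ', '')
    return country or None


# Made-up location strings
locations = pd.Series([
    'timmins, ontario, canada',
    'london, england, united kingdom',
    'seattle, washington, usa',
    'berlin, ,',
    None,
])
print(locations.map(extract_country).value_counts())  # missing countries are dropped
```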