level2-nlp-datacentric-nlp-09's Introduction

Team Members and Roles

  • 전현욱
    • Team lead, Label Error Detection, G2P Noise
  • 곽수연
    • Special character and Hanja processing, Back Translation
  • 김가영
    • Semantic Similarity Analysis
  • 김신우
    • Data Augmentation
  • 안윤주
    • Text Keyword Extraction

Project Period

2024.01.24 10:00 ~ 2024.02.01 19:00

Project Overview

  • Performing a given task through reading comprehension and analysis of natural language requires understanding the topic of the text. The KLUE Topic Classification benchmark asks a model to determine a news article's topic from its headline. Each sample is labeled with one of seven topics: lifestyle & culture, sports, world, politics, economy, IT/science, or society.
  • In keeping with the Data-Centric goal, this project must improve performance solely by modifying the given dataset, without any changes to the baseline model.

🥥 Project Structure

  • Train data: 7,000 samples
  • Test data: 47,785 samples

Dataset Structure

| Column | Description |
|--------|-------------|
| ID     | Unique identifier of the data sample |
| text   | Headline of the Yonhap News article to be classified. Korean text that may also contain some English words, Hanja (Chinese characters), and similar |
| target | Integer-encoded label |
| url    | URL of the sample's news article (source) |
| date   | Date and time the sample's news article was written |
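The `text` column mixes Korean with some English and Hanja, so it can help to measure how often each script appears before deciding how to clean the headlines. The following is a minimal, hypothetical sketch: `script_profile` and `has_hanja` are illustrative names and do not come from the repository's notebooks. It classifies characters by Unicode block. Hangul Syllables span U+AC00 to U+D7A3. Hanja is matched against CJK Unified Ideographs plus Extension A, and CJK Compatibility Ideographs are not covered. ASCII letters serve as a rough proxy for English.

```python
import re
from typing import Dict

# Assumed Unicode ranges for a rough per-character script classifier.
HANGUL_RE = re.compile(r"[\uAC00-\uD7A3]")              # Hangul Syllables
HANJA_RE = re.compile(r"[\u3400-\u4DBF\u4E00-\u9FFF]")  # CJK Ext. A + Unified Ideographs
LATIN_RE = re.compile(r"[A-Za-z]")                      # ASCII letters (proxy for English)


def script_profile(text: str) -> Dict[str, int]:
    """Count Hangul, Hanja and Latin characters in one headline."""
    return {
        "hangul": len(HANGUL_RE.findall(text)),
        "hanja": len(HANJA_RE.findall(text)),
        "latin": len(LATIN_RE.findall(text)),
    }


def has_hanja(text: str) -> bool:
    """Return True if the headline contains at least one Hanja character."""
    return HANJA_RE.search(text) is not None


if __name__ == "__main__":
    headline = "美 대선 D-30"  # made-up example, not taken from the dataset
    print(script_profile(headline))  # {'hangul': 2, 'hanja': 1, 'latin': 1}
    print(has_hanja(headline))       # True
```

Running `has_hanja` over the whole `text` column gives a quick count of how many headlines would be affected by any Hanja-handling rule.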

Label Classes

| id | Topic |
|----|-------|
| 0  | IT/Science |
| 1  | Economy |
| 2  | Society |
| 3  | Lifestyle & Culture |
| 4  | World |
| 5  | Sports |
| 6  | Politics |
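The id-to-topic mapping in the label table can be written as a dictionary, which is handy for checking label balance or decoding predictions. This is a minimal sketch. `ID2LABEL`, `LABEL2ID`, and `label_distribution` are hypothetical names, and the English topic names are translations rather than the dataset's own strings.

```python
from collections import Counter
from typing import Dict, Iterable

# target id -> topic, in the order given by the label table (English translations).
ID2LABEL: Dict[int, str] = {
    0: "IT/Science",
    1: "Economy",
    2: "Society",
    3: "Lifestyle & Culture",
    4: "World",
    5: "Sports",
    6: "Politics",
}
# Reverse mapping: topic name -> target id.
LABEL2ID: Dict[str, int] = {name: idx for idx, name in ID2LABEL.items()}


def label_distribution(targets: Iterable[int]) -> Dict[str, int]:
    """Count samples per topic, listing all seven topics (zero if absent)."""
    counts = Counter(targets)
    unknown = set(counts) - set(ID2LABEL)
    if unknown:
        raise ValueError(f"unknown target ids: {sorted(unknown)}")
    return {ID2LABEL[i]: counts.get(i, 0) for i in sorted(ID2LABEL)}
```

Every topic appears in the result even when its count is zero, so missing classes are easy to spot.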

Evaluation Metrics

  • macro F1 score: the unweighted mean of the per-class F1 scores
  • accuracy
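Both metrics can be computed from scratch in a few lines. This is a sketch, not the competition's official scorer. `macro_f1` computes each class's F1 as 2·TP / (2·TP + FP + FN), which equals 2PR / (P + R). It then averages over the classes that occur in either the gold or the predicted labels. That is the label set scikit-learn's `f1_score(..., average="macro")` uses by default.

```python
from typing import Sequence


def _check(y_true: Sequence[int], y_pred: Sequence[int]) -> None:
    """Validate that gold and predicted labels are non-empty and aligned."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the gold label."""
    _check(y_true, y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over labels seen in y_true or y_pred."""
    _check(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Denominator is > 0 because c occurs in at least one of the two lists.
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(f1_scores) / len(labels)


if __name__ == "__main__":
    gold = [0, 0, 1, 1, 2]
    pred = [0, 1, 1, 1, 2]
    print(accuracy(gold, pred))            # 0.8
    print(round(macro_f1(gold, pred), 4))  # 0.8222
```

Every class contributes equally to macro F1. As a result, errors concentrated in a small class lower it more than they lower accuracy.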

🤿 Model Used

  • klue/bert-base (fixed)

👒 Folder Structure

.
|-- README.md
|-- Special_character_check.ipynb
|-- back_translation.ipynb
|-- category_per_cnt.ipynb
|-- category_word_add.ipynb
|-- data
|   |-- culture.txt
|   |-- economy.txt
|   |-- it_science.txt
|   |-- politics.txt
|   |-- society.txt
|   |-- sport.txt
|   |-- train_special_characters.csv
|   `-- world.txt
|-- error_detection.ipynb
|-- functions.py
|-- g2pk.ipynb
|-- hanja.ipynb
|-- kmeans.ipynb
|-- sentence_similarty.py
|-- special_character.ipynb
`-- wrap-up_report.pdf

🍸 Leaderboard

|         | F1     | Accuracy |
|---------|--------|----------|
| Public  | 0.8454 | 0.8484   |
| Private | 0.8414 | 0.8443   |

Contributors

  • github-classroom[bot]
  • gusdnr122997
