🏆 Formula Recognition: To be Modeler and Beyond!

Task Description

Subject

The task in this competition was to convert images of mathematical expressions into LaTeX-formatted text. LaTeX is a format for writing papers and technical documents and is widely used in the natural sciences. Unlike general optical character recognition (OCR), formula recognition requires multi-line recognition.

Unlike an ordinary sentence, a mathematical expression requires understanding multi-dimensional relationships, such as the numerator and denominator of a fraction or the range notation of a limit. Formula recognition can therefore be viewed as an OCR problem that uses multi-line recognition rather than the usual single-line recognition, and in that respect it is a task distinct from conventional OCR.

Data

  • Training data: 50,000 printed formula images and 50,000 handwritten formula images, 100,000 formula images in total

  • Test data: 6,000 printed formula images and 6,000 handwritten formula images

Metric

  • Evaluation metric: 0.9 × sentence accuracy + 0.1 × (1 - word error rate)

  • Sentence accuracy: the share of predicted formulas that exactly match the ground truth.

  • Word error rate (WER): measures how many word insertions, deletions, and substitutions are needed in total to edit the prediction so that it matches the ground truth.
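The metric above can be sketched in a few lines of Python. This is a minimal illustration, not the competition's official scorer: the function names are hypothetical, formulas are assumed to be space-separated token strings, and WER is assumed to follow the conventional definition (total edit distance divided by the total number of ground-truth tokens).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference token
                            curr[j - 1] + 1,      # add a hypothesis token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def sentence_accuracy(refs, hyps):
    """Fraction of predictions whose tokens match the ground truth exactly."""
    return sum(r.split() == h.split() for r, h in zip(refs, hyps)) / len(refs)


def word_error_rate(refs, hyps):
    """Total edits divided by total ground-truth tokens (assumed normalization)."""
    ref_tokens = [r.split() for r in refs]
    hyp_tokens = [h.split() for h in hyps]
    edits = sum(edit_distance(r, h) for r, h in zip(ref_tokens, hyp_tokens))
    return edits / sum(len(r) for r in ref_tokens)


def final_score(refs, hyps):
    """0.9 x sentence accuracy + 0.1 x (1 - WER), as stated in the README."""
    return 0.9 * sentence_accuracy(refs, hyps) + 0.1 * (1 - word_error_rate(refs, hyps))


# Made-up example: one exact match, one prediction with a single wrong token.
refs = ["x ^ { 2 } + 1", r"\frac { a } { b }"]
hyps = ["x ^ { 2 } + 1", r"\frac { a } { c }"]
print(round(final_score(refs, hyps), 4))  # about 0.5429
```

Because sentence accuracy carries 90% of the weight, a single wrong token costs far more through the exact-match term than through WER.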

Project Result

  • 1st place out of 12 teams

  • Public LB Score: 0.8574 / Private LB Score: 0.6288

  • The slides for the first-place solution presentation are available here.

  • Example of formula recognition results

Installation

# clone repository
git clone https://github.com/bcaitech1/p4-fr-sorry-math-but-love-you.git

# install necessary tools
pip install -r requirements.txt

Structure

Dataset

[dataset]/
├── gt.txt
├── tokens.txt
└── images/
    ├── *.jpg
    ├── ...
    └── *.jpg

Code

[code]
├── configs/ # configuration files
├── data_tools/ # modules for dataset
├── networks/ # modules for model architecture
├── postprocessing/ # modules for postprocessing during inference
├── schedulers/ # scheduler for learning rate, teacher forcing ratio
├── utils/ # useful utilities
├── inference_modules/ # modules for inference
├── train_modules/ # modules for train
├── README.md
├── requirements.txt
├── train.py
└── inference.py

Command Line Interface

Train

Training with a single optimizer

$ python train.py --train_type single_opt --config_file './configs/EfficientSATRN.yaml'

Training with separate optimizers for the encoder and the decoder

$ python train.py --train_type dual_opt --config_file './configs/EfficientSATRN.yaml'

Training with Weights & Biases logging

$ python train.py --train_type single_opt --project_name <PROJECTNAME> --exp_name <EXPNAME> --config_file './configs/EfficientSATRN.yaml'

Arguments

train_type (str): training mode
  • 'single_opt': trains with a single optimizer.
  • 'dual_opt': trains with separate optimizers assigned to the encoder and the decoder.
config_file (str): path to the configuration file of the model to train
  • Model configurations differ by architecture; examples are available here.
  • The trainable models are EfficientSATRN, EfficientASTER, and SwinTRN.
project_name (str): (optional) project name to use when logging training with Weights & Biases
exp_name (str): (optional) experiment name to use when logging training with Weights & Biases

Inference

Single-model inference

$ python inference.py --inference_type single --checkpoint <MODELPATH.pth>

Ensemble inference

$ python inference.py --inference_type ensemble --checkpoint <MODEL1PATH.pth> <MODEL2PATH.pth> ...
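The two invocations above differ only in how many paths follow --checkpoint. As an illustration of how a flag can accept one or more values, here is a minimal argparse sketch. It is not the repository's actual parser: making both flags required and rejecting several checkpoints in single mode are assumptions made for this example.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative parser, not the repository's own")
    parser.add_argument("--inference_type", choices=["single", "ensemble"], required=True)
    # nargs="+" collects one or more space-separated paths into a list.
    parser.add_argument("--checkpoint", nargs="+", required=True)
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule for this sketch: single-model inference takes exactly one checkpoint.
    if args.inference_type == "single" and len(args.checkpoint) != 1:
        parser.error("single inference expects exactly one checkpoint")
    return args


args = parse(["--inference_type", "ensemble", "--checkpoint", "model_a.pth", "model_b.pth"])
print(args.checkpoint)  # ['model_a.pth', 'model_b.pth']
```

With `nargs="+"`, the checkpoint value is always a list, so the same downstream code can iterate over it in both single and ensemble mode.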

Arguments

inference_type (str): inference mode
  • single: loads a single model and runs inference.
  • ensemble: loads multiple models and runs ensemble inference.
checkpoint (str): path of the model to load
  • For ensemble inference, list the model paths as follows.

    --checkpoint <MODELPATH_1.pth> <MODELPATH_2.pth> <MODELPATH_3.pth> ...
max_sequence (int): maximum length of the generated formula sequence (default: 230)
batch_size (int): batch size (default: 32)
decode_type (str): decoding method
  • 'greedy': decodes with greedy decoding.
  • 'beam': decodes with beam search.
decoding_manager (bool): whether to use DecodingManager
tokens_path (str): path to the token file
  • NOTE: used only when DecodingManager is enabled.
max_cache (int): number of batches of encoder outputs to store temporarily during ensemble ('ensemble') inference
  • NOTE: a higher value speeds up inference but temporarily takes up more storage space.
file_path (str): path to the data to run inference on
output_dir (str): path of the directory in which to save inference results (default: './result/')
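To illustrate the difference between the 'greedy' and 'beam' options of decode_type, the sketch below runs both strategies on a toy next-token table. It is not the repository's decoder: `toy_step`, the probability table, and the `beam_width` parameter are hypothetical, and `max_len` plays the role that max_sequence plays in the real script.

```python
import math

EOS = "<eos>"

# Hypothetical next-token probabilities, keyed by the tokens generated so far.
TABLE = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def toy_step(prefix):
    """Return log-probabilities of the next token; unknown prefixes end the sequence."""
    probs = TABLE.get(prefix, {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


def greedy_decode(step_fn, max_len):
    """Pick the single most likely token at every step."""
    seq, score = [], 0.0
    for _ in range(max_len):
        log_probs = step_fn(tuple(seq))
        token = max(log_probs, key=log_probs.get)
        score += log_probs[token]
        if token == EOS:
            break
        seq.append(token)
    return seq, score


def beam_decode(step_fn, max_len, beam_width=2):
    """Keep the beam_width best unfinished prefixes at every step."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, lp in step_fn(prefix).items():
                if token == EOS:
                    finished.append((prefix, score + lp))
                else:
                    candidates.append((prefix + (token,), score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if not beams:
            break
    best = max(finished + beams, key=lambda c: c[1])
    return list(best[0]), best[1]


print(greedy_decode(toy_step, max_len=5)[0])  # ['a', 'x']
print(beam_decode(toy_step, max_len=5)[0])    # ['b', 'x']
```

Greedy decoding commits to "a" because it is the likeliest first token, while beam search keeps "b" alive and finds the sequence with the higher overall probability (0.36 versus 0.33). The price is roughly beam_width times more decoder calls per step.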

Team SMBLY

  • 고지형
  • 김준철
  • 김형민
  • 송누리
  • 이주영
  • 최준구
