VQA_NIA

0. Experimental Environment

Ubuntu 20.04.4 LTS

Python 3.8.10

CUDA 11.6

GPU : NVIDIA A100 80GB x 4

CPU : AMD EPYC 7352 24-Core Processor x 48

Memory : 126G

Package version setup

pip install -r requirements.txt

1. Directory Structure

VQA_NIA
┣━━━ data
┃      ┣━━━ image
┃      ┃       ┣━━━ image1
┃      ┃       ┣━━━ image2
┃      ┃       ┗━━━ ...
┃      ┗━━━ data.csv
┣━━━ main.py
┣━━━ model.py
┣━━━ train.py
┣━━━ util.py
┣━━━ focal.py
┣━━━ vqa_dataset.py
┣━━━ download.py
┣━━━ inference.py
┣━━━ inference_each.py
┗━━━ requirements.txt

If you downloaded image.zip from Google Drive, place the image directory under VQA_NIA/data/.
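The placement step above can be scripted. The following is a minimal sketch, assuming that image.zip contains a top-level image/ folder (the archive layout is not documented here). The helper names extract_images and check_layout are hypothetical and are not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_images(zip_path, project_root):
    """Unpack the archive into <project_root>/data/.

    Assumption: the archive has a top-level "image/" folder, so the
    files end up in <project_root>/data/image/.
    """
    data_dir = Path(project_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
    return data_dir / "image"


def check_layout(project_root):
    """Return the expected data paths that do not exist yet."""
    root = Path(project_root)
    expected = [root / "data" / "image", root / "data" / "data.csv"]
    return [str(path) for path in expected if not path.exists()]
```

Calling check_layout("VQA_NIA") after extraction should return an empty list once both data/image and data/data.csv are in place.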

2. Training

Basic usage

python main.py --[options]

Options

  • train_data : str, choices ['A', 'B', 'all'] (default : 'all')
    • A : type-A question data
    • B : type-B question data
    • all : all data (type-A + type-B questions)
  • n_epoch : int (default : 50)
    • Number of epochs
  • lr : float (default : 3e-5)
    • Learning rate
  • batch_size : int (default : 512)
    • Batch size
  • max_token : int (default : 50)
    • Maximum token length
  • use_transformer_layer : store_true
    • When set, adds a Transformer Encoder Layer after the multimodal representations are fused
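For reference, the options above map naturally onto an argparse parser. This is a sketch of how such a parser could look, not the actual contents of main.py. The function name build_parser and the help strings are assumptions; the option names, types, choices, and defaults are taken from the list above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented training options (sketch)."""
    parser = argparse.ArgumentParser(description="VQA training options (sketch)")
    parser.add_argument("--train_data", type=str, choices=["A", "B", "all"],
                        default="all", help="which question data to train on")
    parser.add_argument("--n_epoch", type=int, default=50,
                        help="number of epochs")
    parser.add_argument("--lr", type=float, default=3e-5,
                        help="learning rate")
    parser.add_argument("--batch_size", type=int, default=512,
                        help="batch size")
    parser.add_argument("--max_token", type=int, default=50,
                        help="maximum token length")
    # store_true: the flag is False unless it is passed on the command line
    parser.add_argument("--use_transformer_layer", action="store_true",
                        help="add a Transformer Encoder Layer after fusion")
    return parser
```

Options that are not passed keep their defaults, so a command that sets only n_epoch and batch_size still trains on all data with a learning rate of 3e-5.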

Example

python main.py --n_epoch 30 --batch_size 128 --use_transformer_layer

3. Inference on Individual Samples

Basic usage

python inference_each.py --[options]

Options

  • model_path : str (default : "./results/all_1228_1734/infer_model_57.15.pt")
    • Path to the saved model

Example

python inference_each.py --model_path ./results/all_1228_1734/infer_model_57.15.pt

4. Inference on All Data (Checking Test Performance)

Basic usage

python inference.py --[options]

Options

  • model_path : str (default : "./results/all_1228_1734/infer_model_57.15.pt")
    • Path to the saved model
  • infer_data : str, choices ['all', 'abstract', 'triple', 'vqa'] (default : 'all')
    • Type of data to run inference on

Example

python inference.py --model_path ./results/all_1228_1734/infer_model_57.15.pt --infer_data vqa

5. Model Description

Question Feature Extractor : XLM-Roberta-base

Image Feature Extractor : ResNet50 (timm)



Base model

  • XLM-Roberta
    • Takes the question as input and outputs its representation
  • ResNet50
    • Takes the image as input and outputs its representation

With the Transformer encoder layer added
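
The two variants described in this section can be sketched in PyTorch as follows. This is an illustration under explicit assumptions, not the code in model.py. The class name FusionVQA, the element-wise product fusion, the mean pooling, the linear classification head, and the parameters hidden_dim, nhead, and n_answers are all hypothetical. The encoders are passed in, so any module with the stated output shapes can be used.

```python
import torch
import torch.nn as nn


class FusionVQA(nn.Module):
    """Sketch: fuse question and image representations, then classify.

    Assumed interfaces (not confirmed by the repository):
      text_encoder(tokens)  -> (batch, seq_len, text_dim)
      image_encoder(images) -> (batch, image_dim)
    """

    def __init__(self, text_encoder, image_encoder, text_dim, image_dim,
                 hidden_dim=512, n_answers=1000, nhead=8,
                 use_transformer_layer=False):
        super().__init__()
        self.text_encoder = text_encoder
        self.image_encoder = image_encoder
        # Project both modalities to a shared size before fusing them.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Optional extra layer applied after fusion.
        self.extra_layer = (
            nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=nhead,
                                       batch_first=True)
            if use_transformer_layer else None
        )
        self.classifier = nn.Linear(hidden_dim, n_answers)

    def forward(self, question_tokens, images):
        q = self.text_proj(self.text_encoder(question_tokens))  # (B, L, H)
        v = self.image_proj(self.image_encoder(images))         # (B, H)
        # Assumed fusion: element-wise product, broadcast over tokens.
        fused = q * v.unsqueeze(1)                               # (B, L, H)
        if self.extra_layer is not None:
            # Padding masks are omitted here for brevity.
            fused = self.extra_layer(fused)
        # Mean-pool over tokens, then score each candidate answer.
        return self.classifier(fused.mean(dim=1))                # (B, n_answers)
```

With the real extractors, text_dim would be 768 (the hidden size of XLM-RoBERTa-base) and image_dim would be 2048 (the pooled ResNet50 feature size, which timm returns when the model is created with num_classes=0). The text encoder would also need a small wrapper that returns the last hidden states as a tensor.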

Contributors

mjkmain
