concept-based-demonstration-selection's Issues

Can't reproduce metrics

Hello. Thank you for the great work and for sharing the code. I was trying to reproduce your scores on the 'sst2' dataset reported in the paper. I used the learned concepts from the Google Drive link (the tune-train folder) and ran the following script:

TRAIN_METHOD=direct
TEST_METHOD=direct
LR=1e-2
N_PREFIX=10
DATASET=glue-sst2
TRAIN_TASK=tune
SPLIT=train
MODEL=gpt2-large
TRAIN_SIZE=100
STEP=100000
K=4
DIFFICULTY=concept_calibrated
CUDA_VISIBLE_DEVICES=5 python test.py\
    --dataset $DATASET\
    --gpt $MODEL\
    --method $TEST_METHOD\
    --test_batch_size 16\
    --out_dir out/$MODEL\
    --k $K\
    --embedding_dir embeddings/\
    --use_demonstrations\
    --concept_temperature 50\
    --similarity_temperature 0.1\
    --train_size $TRAIN_SIZE\
    --difficulty $DIFFICULTY\
    --n_prefix_tokens $N_PREFIX\
    --concept_dir $DIFFICULTY-$K/gpt2-large/$TRAIN_TASK-$SPLIT-$TRAIN_SIZE/$DATASET-$TRAIN_METHOD-prefix=$N_PREFIX-lr=$LR-$STEP\
    --prefix_embed_file checkpoints/gpt2-large/$TRAIN_TASK-$SPLIT/prefix={$N_PREFIX}-{$TRAIN_METHOD}-lr={$LR}-initByVocab/soft_embeddings-$STEP.pt\
    --prior easiest\
    --reorder\
    # --prior most_similar\

I obtained the following final metrics, averaged over 5 seeds on 1 target task: Macro-F1: 37.6 ± 5.7, Accuracy: 50.6 ± 1.0, whereas the reported score is 86.1. What could be the reason for such a huge discrepancy?

I've checked that test_prefix.sh produces the same scores as those reported as "optimal" in your paper, so I suspect the problem is not with the learned soft tokens.
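
One quick way to rule out a silent path mismatch between the downloaded concepts and what the script resolves is to print the constructed paths and check that they exist before launching. The sketch below is purely illustrative and not part of the repository; the values simply mirror the shell variables in the script above, including the literal braces that {$VAR} produces in bash.

# Illustrative sanity check, not from the repository: confirm that the
# concept directory and prefix-embedding checkpoint the script points at
# actually exist on disk.
import os

MODEL, DATASET, TRAIN_METHOD = "gpt2-large", "glue-sst2", "direct"
TRAIN_TASK, SPLIT, TRAIN_SIZE = "tune", "train", 100
N_PREFIX, LR, STEP, K, DIFFICULTY = 10, "1e-2", 100000, 4, "concept_calibrated"

concept_dir = (f"{DIFFICULTY}-{K}/{MODEL}/{TRAIN_TASK}-{SPLIT}-{TRAIN_SIZE}/"
               f"{DATASET}-{TRAIN_METHOD}-prefix={N_PREFIX}-lr={LR}-{STEP}")
# Note: in bash, {$N_PREFIX} expands to a literal "{10}", so the braces
# below intentionally mirror what the shell command produces.
prefix_embed_file = (f"checkpoints/{MODEL}/{TRAIN_TASK}-{SPLIT}/"
                     f"prefix={{{N_PREFIX}}}-{{{TRAIN_METHOD}}}-lr={{{LR}}}-initByVocab/"
                     f"soft_embeddings-{STEP}.pt")

for path in (concept_dir, prefix_embed_file):
    print(f"{path}: {'found' if os.path.exists(path) else 'MISSING'}")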

Select demonstrations for a new dataset

Hello. I would like to use your method to select the best demonstrations for ICL on the TREC dataset. However, as far as I can see, it is not included in any of the task sets for which checkpoints are provided (yet your code includes preprocessing for this dataset). What is the proper way to apply your method to a new dataset? Does it necessarily require pretraining the prefix model from scratch? If so, how should tasks be selected and combined into a task suite?

Too long execution time

I ran exactly the same train.sh file with the tasks in tune. It takes 5 hours to execute 100 training steps for sst2 on a single A100, which does not look normal. The only warning is "The NVIDIA driver on your system is too old". Have you encountered such long execution times? Thanks for your help.
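
For what it's worth, that warning can mean PyTorch fails to initialize CUDA and everything silently runs on the CPU, which would explain a 5-hour run. A minimal check, independent of the repository:

# Quick diagnostic, not part of the repository: verify that PyTorch can
# actually see and use the GPU before suspecting the training code itself.
import torch

print("PyTorch version:", torch.__version__)
print("Built against CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    # Likely the driver issue: computations will run on the CPU instead.
    print("CUDA is unavailable on this machine.")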

Clarification Needed Regarding Demonstration Selection in LLaMA 2 7b Model

Hi there,

Firstly, I appreciate your effort in sharing the code. I've been attempting to reproduce the results, but I've encountered a challenge that requires some clarification.

In the paper, it's mentioned that demonstrations are selected based on the probability of generating the concept tokens when (X, Y) is given as input. This approach seems reasonable for language models like GPT-2, where the word embeddings and the LM head are tied. However, I'm uncertain how this process is adapted for language models that do not tie the word embeddings and the LM head.

Specifically, in models where the LM-head parameters have not been pre-trained to generate concept tokens, how is the selection of demonstrations handled? Since the model has never been explicitly trained on this task, the LM-head rows corresponding to concept-token generation are untrained.
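
To make the question concrete, here is a minimal, hypothetical sketch of what concept-likelihood scoring could look like. This is not the repository's code; the names are illustrative, and it assumes the concept tokens have already been added to the vocabulary (with the embedding matrix resized accordingly). The final indexing step is exactly where an untied LM head matters, because the output rows selected by the concept-token ids are the parameters that were never trained.

# Hypothetical sketch: score a candidate demonstration (X, Y) by the
# log-likelihood of the learned concept tokens. Not the repository's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2-large"  # any causal LM; LLaMA-2 has an untied LM head
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def score_demonstration(x_text, y_text, concept_token_ids):
    """Sum of log P(concept token i | X, Y, concept tokens < i)."""
    input_ids = tokenizer(x_text + " " + y_text, return_tensors="pt").input_ids
    concept = torch.tensor([concept_token_ids])
    full = torch.cat([input_ids, concept], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full).logits, dim=-1)
    offset = input_ids.shape[1]
    score = 0.0
    for i, tok in enumerate(concept_token_ids):
        # The prediction for the i-th concept token comes from the position
        # just before it; indexing the vocabulary dimension at `tok` reads
        # the LM-head row for that concept token -- the row that stays
        # untrained when embeddings and LM head are not tied.
        score += log_probs[0, offset + i - 1, tok].item()
    return score

With tied embeddings (as in GPT-2), training the concept tokens' input embeddings also determines these output rows; with untied models, those rows would stay at their initialization unless they are trained as well, which is the crux of my question.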

I'd appreciate any insights or guidance on how to address this issue.

Thank you in advance for your help!

Best regards,
WJMacro.
