Hello. Thank you for the great work and for sharing the code. I was trying to reproduce the scores on the 'sst2' dataset reported in the paper, using the learned concepts from the Google Drive link (folder tune-train) and running the following script:
TRAIN_METHOD=direct
TEST_METHOD=direct
LR=1e-2
N_PREFIX=10
DATASET=glue-sst2
TRAIN_TASK=tune
SPLIT=train
MODEL=gpt2-large
TRAIN_SIZE=100
STEP=100000
K=4
DIFFICULTY=concept_calibrated
CUDA_VISIBLE_DEVICES=5 python test.py \
    --dataset $DATASET \
    --gpt $MODEL \
    --method $TEST_METHOD \
    --test_batch_size 16 \
    --out_dir out/$MODEL \
    --k $K \
    --embedding_dir embeddings/ \
    --use_demonstrations \
    --concept_temperature 50 \
    --similarity_temperature 0.1 \
    --train_size $TRAIN_SIZE \
    --difficulty $DIFFICULTY \
    --n_prefix_tokens $N_PREFIX \
    --concept_dir $DIFFICULTY-$K/gpt2-large/$TRAIN_TASK-$SPLIT-$TRAIN_SIZE/$DATASET-$TRAIN_METHOD-prefix=$N_PREFIX-lr=$LR-$STEP \
    --prefix_embed_file checkpoints/gpt2-large/$TRAIN_TASK-$SPLIT/prefix={$N_PREFIX}-{$TRAIN_METHOD}-lr={$LR}-initByVocab/soft_embeddings-$STEP.pt \
    --prior easiest \
    --reorder
    # --prior most_similar
I've verified that test_prefix.sh reproduces the scores reported as "optimal" in your paper, so I don't think the problem lies with the learned soft tokens themselves.
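One detail I'm unsure about and wanted to flag: in the --prefix_embed_file path, patterns like `{$N_PREFIX}` expand in bash with literal braces (e.g. `prefix={10}`), whereas `${N_PREFIX}` expands to the bare value. If the training script saved checkpoints without braces in the directory name, the path would not resolve. A minimal illustration of the difference (using the same variable values as above):

```shell
# Illustration only: {$VAR} keeps literal braces, ${VAR} does not.
LR=1e-2
N_PREFIX=10
echo "prefix={$N_PREFIX}-lr={$LR}"   # -> prefix={10}-lr={1e-2}
echo "prefix=${N_PREFIX}-lr=${LR}"   # -> prefix=10-lr=1e-2
```

If the checkpoint directories on disk were created by a training script using the same `{$VAR}` pattern, this is harmless; otherwise it could explain a silent path mismatch.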