Comments (4)
Two things to take note of here:
- Ensure you're not using cross-attention between documents.
- Ensure you're tokenizing at the character level, not at the subword or word level.
Thanks
from long-range-arena.
Thanks for replying.
-
I am also using a two-tower style model:
token_out_0 = self.model(input_ids_0, mask_0)
token_out_1 = self.model(input_ids_1, mask_1)
seq_scores = self.seq_classifer(token_out_0, token_out_1)
Within self.seq_classifer, the following is computed:
X_0 = pooling(token_out_0, self.pooling_mode)
X_1 = pooling(token_out_1, self.pooling_mode)
seq_scores = self.mlpblock(torch.cat([X_0, X_1, X_0 * X_1, X_0 - X_1], dim = -1))
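The concatenation [X_0, X_1, X_0 * X_1, X_0 - X_1] quadruples the hidden dimension before the MLP head. A minimal framework-agnostic sketch of that combination (NumPy here with mean pooling assumed; `pool` and `match_features` are hypothetical names, not the LRA API):

```python
import numpy as np

def pool(token_out, mode="mean"):
    # token_out: (seq_len, hidden) -> (hidden,) by averaging over the sequence
    if mode == "mean":
        return token_out.mean(axis=0)
    raise ValueError(f"unsupported pooling mode: {mode}")

def match_features(token_out_0, token_out_1):
    # build the [X0, X1, X0*X1, X0-X1] feature vector fed to the MLP head
    x0, x1 = pool(token_out_0), pool(token_out_1)
    return np.concatenate([x0, x1, x0 * x1, x0 - x1], axis=-1)

# two encoded documents: sequence length 4000, hidden size 64
t0 = np.random.randn(4000, 64)
t1 = np.random.randn(4000, 64)
feats = match_features(t0, t1)  # shape (4 * 64,) = (256,)
```

Note that each tower is pooled independently, so the two documents only interact through this element-wise combination, consistent with the "no cross-attention between documents" advice above.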
-
I use input_pipeline.get_matching_datasets to generate the data, with the tokenizer set to "char":
train_ds, eval_ds, test_ds, encoder = input_pipeline.get_matching_datasets(
    n_devices=1, task_name=None, data_dir="../../lra_release/lra_release/tsv_data/",
    batch_size=1, fixed_vocab=None, max_length=4000, tokenizer="char",
    vocab_file_path=None)
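For reference, character-level tokenization here simply means one id per character (byte), padded out to max_length. A rough sketch of what a tokenizer="char" encoder does (hypothetical helper, not the actual input_pipeline code; the id offset and padding conventions are assumptions):

```python
def char_encode(text, max_length=4000, pad_id=0):
    # one id per byte, offset by 1 so that 0 can be reserved for padding
    ids = [b + 1 for b in text.encode("utf-8")][:max_length]
    mask = [1] * len(ids) + [0] * (max_length - len(ids))
    ids = ids + [pad_id] * (max_length - len(ids))
    return ids, mask

ids, mask = char_encode("hello")
```

The mask distinguishes real characters from padding, matching the mask_0 / mask_1 inputs in the two-tower forward pass above.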
@mlpen How many training steps and warmup steps did you use? The config says 5K training steps and 8K warmup steps, which seems odd, since the warmup would outlast training.
That's because we used some default FLAX code and only did a cursory sweep of hyperparameters (hyperparameter sweeps were not within the scope of the paper). Some other folks have found that training longer leads to better performance, so I recommend looking at works like https://arxiv.org/abs/2106.01540 and following their setup. Thanks :)
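With a linear warmup, 8K warmup steps against 5K total training steps would indeed mean the learning rate never reaches its peak. A quick sketch of that effect (hypothetical schedule function; the actual FLAX config may use a different schedule shape):

```python
def lr_at_step(step, base_lr=1e-3, warmup_steps=8000):
    # linear warmup to base_lr, then constant
    return base_lr * min(1.0, step / warmup_steps)

# after all 5K training steps, the LR is still only 5/8 of base_lr
final_lr = lr_at_step(5000)
```

So under these default settings the model effectively trains entirely inside the warmup ramp, which is one reason longer training schedules have been reported to help.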