
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ⚖️ 🏆 🧑‍🎓 👩‍⚖️

LexGLUE Graphic

📣 🚨 Important Notice related to the EUR-LEX dataset (Fixed) 🐛 👈

There was a major bug in the HuggingFace data loader for the EUR-LEX task, which affected the label list under consideration in the training script. In the original experiments for the reported leaderboard we used custom data loaders, and we then built and released the HuggingFace dataset and data loader without noticing this “stealthy” bug. In other words, the leaderboard results are reliable.

The πŸ› has been already fixed, so you can continue developing models seamlessly. Make sure to update the HF Datasets library and clear the cache, in case there are cached versions of the dataset:

pip install --upgrade datasets
rm -rf  ~/.cache/huggingface/datasets/lex_glue
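Alternatively, if you prefer to force a fresh copy from Python, something like the following should work (a sketch, assuming a datasets version that accepts string download modes):

from datasets import load_dataset

# Re-download EUR-LEX instead of reusing a stale cached copy
dataset = load_dataset("lex_glue", "eurlex", download_mode="force_redownload")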

Thanks to @JamesLYC88 for digging up the 🐛, and sorry for the inconvenience! 🤗

Dataset Summary

Inspired by the recent widespread use of the GLUE multi-task benchmark NLP dataset (Wang et al., 2018), the subsequent more difficult SuperGLUE (Wang et al., 2019), other previous multi-task NLP benchmarks (Conneau and Kiela, 2018; McCann et al., 2018), and similar initiatives in other domains (Peng et al., 2019), we introduce LexGLUE, a benchmark dataset to evaluate the performance of NLP methods on legal tasks. LexGLUE is based on seven existing legal NLP datasets, selected using criteria largely drawn from SuperGLUE.

We anticipate that more datasets, tasks, and languages will be added in later versions of LexGLUE. As more legal NLP datasets become available, we also plan to favor datasets checked thoroughly for validity (scores reflecting real-life performance), annotation quality, statistical power, and social bias (Bowman and Dahl, 2021).

As in GLUE and SuperGLUE (Wang et al., 2019), one of our goals is to push towards generic (or foundation) models that can cope with multiple NLP tasks, in our case legal NLP tasks, possibly with limited task-specific fine-tuning. Another goal is to provide a convenient and informative entry point for NLP researchers and practitioners wishing to explore or develop methods for legal NLP. With these goals in mind, the datasets we include in LexGLUE and the tasks they address have been simplified in several ways, discussed below, to make it easier for newcomers and generic models to address all tasks. We provide Python APIs integrated with Hugging Face (Wolf et al., 2020; Lhoest et al., 2021) to easily import all the datasets, experiment with them, and evaluate their performance.

By unifying and facilitating access to a set of law-related datasets and tasks, we hope to attract not only more NLP experts, but also more interdisciplinary researchers (e.g., law doctoral students willing to take NLP courses). More broadly, we hope LexGLUE will speed up the adoption and transparent evaluation of new legal NLP methods and approaches in the commercial sector too. Indeed, there have been many commercial press releases in the legal-tech industry, but almost no independent evaluation of the claimed performance of the various machine learning and NLP-based offerings. A standard, publicly available benchmark would also allay concerns of undue influence in predictive models, including the use of metadata which the relevant law expressly disregards.

If you participate, use the LexGLUE benchmark, or our experimentation library, please cite:

Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, and Nikolaos Aletras. LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. 2022. In the Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Dublin, Ireland.

@inproceedings{chalkidis-etal-2022-lexglue,
    title = "{L}ex{GLUE}: A Benchmark Dataset for Legal Language Understanding in {E}nglish",
    author = "Chalkidis, Ilias  and
      Jana, Abhik  and
      Hartung, Dirk  and
      Bommarito, Michael  and
      Androutsopoulos, Ion  and
      Katz, Daniel  and
      Aletras, Nikolaos",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.297",
    pages = "4310--4330",
}

Supported Tasks

| Dataset | Source | Sub-domain | Task Type | Train/Dev/Test Instances | Classes |
|---------|--------|------------|-----------|--------------------------|---------|
| ECtHR (Task A) | Chalkidis et al. (2019) | ECHR | Multi-label classification | 9,000/1,000/1,000 | 10+1 |
| ECtHR (Task B) | Chalkidis et al. (2021a) | ECHR | Multi-label classification | 9,000/1,000/1,000 | 10+1 |
| SCOTUS | Spaeth et al. (2020) | US Law | Multi-class classification | 5,000/1,400/1,400 | 14 |
| EUR-LEX | Chalkidis et al. (2021b) | EU Law | Multi-label classification | 55,000/5,000/5,000 | 100 |
| LEDGAR | Tuggener et al. (2020) | Contracts | Multi-class classification | 60,000/10,000/10,000 | 100 |
| UNFAIR-ToS | Lippi et al. (2019) | Contracts | Multi-label classification | 5,532/2,275/1,607 | 8+1 |
| CaseHOLD | Zheng et al. (2021) | US Law | Multiple choice QA | 45,000/3,900/3,900 | n/a |

ECtHR (Task A)

The European Court of Human Rights (ECtHR) hears allegations that a state has breached human rights provisions of the European Convention on Human Rights (ECHR). For each case, the dataset provides a list of factual paragraphs (facts) from the case description. Each case is mapped to the articles of the ECHR that were violated (if any).
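To get a feel for the task format, here is a minimal sketch (assuming the standard lex_glue field names, text and labels):

from datasets import load_dataset

ecthr_a = load_dataset("lex_glue", "ecthr_a")
example = ecthr_a["train"][0]
print(len(example["text"]))  # number of factual paragraphs in the case
print(example["labels"])     # ids of violated ECHR articles (possibly empty)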

ECtHR (Task B)

The European Court of Human Rights (ECtHR) hears allegations that a state has breached human rights provisions of the European Convention on Human Rights (ECHR). For each case, the dataset provides a list of factual paragraphs (facts) from the case description. Each case is mapped to the articles of the ECHR that were allegedly violated (considered by the court).

SCOTUS

The US Supreme Court (SCOTUS) is the highest federal court in the United States of America and generally hears only the most controversial or otherwise complex cases which have not been sufficiently well resolved by lower courts. This is a single-label multi-class classification task, where, given a document (court opinion), the task is to predict the relevant issue area. The 14 issue areas cluster 278 issues whose focus is on the subject matter of the controversy (dispute).

EUR-LEX

European Union (EU) legislation is published on the EUR-Lex portal. All EU laws are annotated by the EU's Publications Office with multiple concepts from the EuroVoc thesaurus, a multilingual thesaurus maintained by the Publications Office. The current version of EuroVoc contains more than 7k concepts referring to various activities of the EU and its Member States (e.g., economics, health-care, trade). Given a document, the task is to predict its EuroVoc labels (concepts).
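Since each document carries a list of EuroVoc label ids, multi-label classifiers typically consume them as multi-hot vectors; a minimal sketch (assuming the standard lex_glue labels field and the 100-concept LexGLUE label set):

from datasets import load_dataset
from sklearn.preprocessing import MultiLabelBinarizer

eurlex = load_dataset("lex_glue", "eurlex")
# Turn per-document label-id lists into a (num_documents, 100) multi-hot matrix
mlb = MultiLabelBinarizer(classes=list(range(100)))
y_train = mlb.fit_transform(eurlex["train"]["labels"])
print(y_train.shape)  # (55000, 100)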

LEDGAR

The LEDGAR dataset targets contract provision (paragraph) classification. The contract provisions come from contracts obtained from US Securities and Exchange Commission (SEC) filings, which are publicly available from EDGAR. Each label represents the single main topic (theme) of the corresponding contract provision.

UNFAIR-ToS

The UNFAIR-ToS dataset contains 50 Terms of Service (ToS) from on-line platforms (e.g., YouTube, Ebay, Facebook). The dataset has been annotated at the sentence level with 8 types of unfair contractual terms, i.e., sentences that potentially violate user rights under European consumer law.

CaseHOLD

The CaseHOLD (Case Holdings on Legal Decisions) dataset includes multiple choice questions about holdings of US court cases from the Harvard Law Library case law corpus. Holdings are short summaries of legal rulings that accompany referenced decisions relevant to the present case. The input consists of an excerpt (or prompt) from a court decision containing a reference to a particular case, where the holding statement is masked out. The model must identify the correct (masked) holding statement from a selection of five choices.
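As a sketch (assuming the standard lex_glue field names), each example pairs the excerpt with five candidate holdings and the index of the correct one:

from datasets import load_dataset

case_hold = load_dataset("lex_glue", "case_hold")
example = case_hold["train"][0]
print(example["context"][:200])  # excerpt with the holding masked out
print(len(example["endings"]))   # the 5 candidate holding statements
print(example["label"])          # index of the correct ending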

Leaderboard

Averaged LexGLUE Scores

We report the arithmetic, harmonic, and geometric mean across tasks, following Shavrina and Malykh (2021). We acknowledge that the use of scores aggregated over tasks has been criticized in general NLU benchmarks (e.g., GLUE), as tasks differ in training set size, complexity, and evaluation metrics. We believe that the use of a standard common metric (F1) across tasks and averaging with the harmonic mean alleviate this issue.
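For concreteness, here is a minimal sketch of the three aggregation schemes (the scores are illustrative placeholders, one per task, not actual leaderboard numbers):

import statistics

scores = [77.8, 79.8, 75.5, 67.9, 88.6, 95.8, 74.4]  # one F1 value per task

print(round(statistics.mean(scores), 1))            # arithmetic mean
print(round(statistics.harmonic_mean(scores), 1))   # harmonic mean
print(round(statistics.geometric_mean(scores), 1))  # geometric mean (Python >= 3.8)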

| Model | Arithmetic (μ-F1 / m-F1) | Harmonic (μ-F1 / m-F1) | Geometric (μ-F1 / m-F1) |
|-------|--------------------------|------------------------|-------------------------|
| BERT | 77.8 / 69.5 | 76.7 / 68.2 | 77.2 / 68.8 |
| RoBERTa | 77.8 / 68.7 | 76.8 / 67.5 | 77.3 / 68.1 |
| RoBERTa (Large) | 79.4 / 70.8 | 78.4 / 69.1 | 78.9 / 70.0 |
| DeBERTa | 78.3 / 69.7 | 77.4 / 68.5 | 77.8 / 69.1 |
| Longformer | 78.5 / 70.5 | 77.5 / 69.5 | 78.0 / 70.0 |
| BigBird | 78.2 / 69.6 | 77.2 / 68.5 | 77.7 / 69.0 |
| Legal-BERT | 79.8 / 72.0 | 78.9 / 70.8 | 79.3 / 71.4 |
| CaseLaw-BERT | 79.4 / 70.9 | 78.5 / 69.7 | 78.9 / 70.3 |

Task-wise LexGLUE scores

Large-sized (:older_man:) Models [1]

All cells report μ-F1 / m-F1.

| Model | ECtHR A | ECtHR B | SCOTUS | EUR-LEX | LEDGAR | UNFAIR-ToS | CaseHOLD |
|-------|---------|---------|--------|---------|--------|------------|----------|
| RoBERTa | 73.8 / 67.6 | 79.8 / 71.6 | 75.5 / 66.3 | 67.9 / 50.3 | 88.6 / 83.6 | 95.8 / 81.6 | 74.4 |

[1] Results reported by Chalkidis et al. (2021). All large-sized transformer-based models follow the same specifications (L=24, H=1024, A=16).

Medium-sized (:man:) Models [2]

All cells report μ-F1 / m-F1.

| Model | ECtHR A | ECtHR B | SCOTUS | EUR-LEX | LEDGAR | UNFAIR-ToS | CaseHOLD |
|-------|---------|---------|--------|---------|--------|------------|----------|
| TFIDF+SVM | 62.6 / 48.9 | 73.0 / 63.8 | 74.0 / 64.4 | 63.4 / 47.9 | 87.0 / 81.4 | 94.7 / 75.0 | 22.4 |
| BERT | 71.2 / 63.6 | 79.7 / 73.4 | 68.3 / 58.3 | 71.4 / 57.2 | 87.6 / 81.8 | 95.6 / 81.3 | 70.8 |
| RoBERTa | 69.2 / 59.0 | 77.3 / 68.9 | 71.6 / 62.0 | 71.9 / 57.9 | 87.9 / 82.3 | 95.2 / 79.2 | 71.4 |
| DeBERTa | 70.0 / 60.8 | 78.8 / 71.0 | 71.1 / 62.7 | 72.1 / 57.4 | 88.2 / 83.1 | 95.5 / 80.3 | 72.6 |
| Longformer | 69.9 / 64.7 | 79.4 / 71.7 | 72.9 / 64.0 | 71.6 / 57.7 | 88.2 / 83.0 | 95.5 / 80.9 | 71.9 |
| BigBird | 70.0 / 62.9 | 78.8 / 70.9 | 72.8 / 62.0 | 71.5 / 56.8 | 87.8 / 82.6 | 95.7 / 81.3 | 70.8 |
| Legal-BERT | 70.0 / 64.0 | 80.4 / 74.7 | 76.4 / 66.5 | 72.1 / 57.4 | 88.2 / 83.0 | 96.0 / 83.0 | 75.3 |
| CaseLaw-BERT | 69.8 / 62.9 | 78.8 / 70.3 | 76.6 / 65.9 | 70.7 / 56.6 | 88.3 / 83.0 | 96.0 / 82.3 | 75.4 |

[2] Results reported by Chalkidis et al. (2021). All medium-sized transformer-based models follow the same specifications (L=12, H=768, A=12).

Small-sized (:baby:) Models [3]

All cells report μ-F1 / m-F1.

| Model | ECtHR A | ECtHR B | SCOTUS | EUR-LEX | LEDGAR | UNFAIR-ToS | CaseHOLD |
|-------|---------|---------|--------|---------|--------|------------|----------|
| BERT-Tiny | n/a | n/a | 62.8 / 40.9 | 65.5 / 27.5 | 83.9 / 74.7 | 94.3 / 11.1 | 68.3 |
| Mini-LM (v2) | n/a | n/a | 60.8 / 45.5 | 62.2 / 35.6 | 86.7 / 79.6 | 93.9 / 13.2 | 71.3 |
| Distil-BERT | n/a | n/a | 67.0 / 55.9 | 66.0 / 51.5 | 87.5 / 81.5 | 97.1 / 79.4 | 68.6 |
| Legal-BERT | n/a | n/a | 75.6 / 68.5 | 73.4 / 54.4 | 87.8 / 81.4 | 97.1 / 76.3 | 74.7 |

[3] Results reported by Atreya Shankar (@atreyasha) 🤗 🥳. More details (e.g., validation scores, log files) are provided here. The small-sized models' specifications are:

Frequently Asked Questions (FAQ)

Where are the datasets?

We provide access to LexGLUE on Hugging Face Datasets (Lhoest et al., 2021) at https://huggingface.co/datasets/lex_glue.

For example, to load the SCOTUS (Spaeth et al., 2020) dataset, you simply install the datasets Python library and then make the following call:

from datasets import load_dataset 
dataset = load_dataset("lex_glue", "scotus")
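The call returns a DatasetDict with train/validation/test splits, which you can inspect directly:

print(dataset)              # splits and their sizes
print(dataset["train"][0])  # a single example with its text and label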

How to run experiments?

Furthermore, to make reproducing the results for the already examined models or future models even easier, we release our code in this repository. The folder /experiments contains Python scripts, relying on the Hugging Face Transformers library, to run and evaluate any Transformer-based model (e.g., BERT, RoBERTa, LegalBERT, and their hierarchical variants, as well as Longformer and BigBird). We also provide bash scripts in the folder /scripts to replicate the experiments for each dataset with 5 random seeds, as we did for the reported results of the original leaderboard.

Make sure that all required packages are installed:

torch>=1.9.0
transformers>=4.9.0
scikit-learn>=0.24.1
tqdm>=4.61.1
numpy>=1.20.1
datasets>=1.12.1
nltk>=3.5
scipy>=1.6.3
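For convenience, the pinned versions above can be installed in one go:

pip install "torch>=1.9.0" "transformers>=4.9.0" "scikit-learn>=0.24.1" "tqdm>=4.61.1" "numpy>=1.20.1" "datasets>=1.12.1" "nltk>=3.5" "scipy>=1.6.3"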

For example, to replicate the results for RoBERTa (Liu et al., 2019) on UNFAIR-ToS (Lippi et al., 2019), you have to configure the relevant bash script (run_unfair_tos.sh):

> nano run_unfair_tos.sh
GPU_NUMBER=1
MODEL_NAME='roberta-base'
LOWER_CASE='False'
BATCH_SIZE=8
ACCUMULATION_STEPS=1
TASK='unfair_tos'

and then run it:

> sh run_unfair_tos.sh

Note: The bash scripts make use of two HF arguments/parameters (--fp16, --fp16_full_eval), which are only applicable (working) when there are available (and correctly configured) NVIDIA GPUs on the machine (server or cluster), and torch is correctly configured to use these compute resources.

So, in case you don't have such resources, just delete these two arguments from the scripts to train models with standard fp32 precision. In case you do have such resources, make sure to correctly install the NVIDIA CUDA drivers, and also correctly install torch so it can identify these resources (consider this page to figure out the appropriate steps: https://pytorch.org/get-started/locally/).
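A quick sanity check that torch actually sees a GPU before keeping the two fp16 arguments:

import torch

print(torch.cuda.is_available())  # True if a usable CUDA device is detected
print(torch.cuda.device_count())  # number of visible GPUs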

I don't have the resources to run all these Muppets. What can I do?

You can use Google Colab with GPU acceleration for free online (https://colab.research.google.com).

  • Set Up a new notebook (https://colab.research.google.com) and git clone the project.
  • Navigate to Edit β†’ Notebook Settings and select GPU from the Hardware Accelerator drop-down. You will probably get assigned with an NVIDIA Tesla K80 12GB.
  • You will also have to decrease the batch size and increase the accumulation steps for hierarchical models.

But this is also an interesting open problem (Efficient NLP); please consider using lighter (smaller/faster) pre-trained models, like:

, or non transformer-based neural models, like:

, or even non neural models, like:

  • Bag of Word (BoW) models using TF-IDF representations (e.g., SVM, Random Forest),
  • The eXtreme Gradient Boosting (XGBoost) of Chen and Guestrin (2016),

and report back the results. We are curious!
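As a starting point, here is a minimal sketch of such a BoW baseline on a single-label task like SCOTUS (an untuned illustration, not the TFIDF+SVM pipeline behind the leaderboard numbers):

from datasets import load_dataset
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

scotus = load_dataset("lex_glue", "scotus")
clf = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=50000)),  # BoW features with TF-IDF weights
    ("svm", LinearSVC()),
])
clf.fit(scotus["train"]["text"], scotus["train"]["label"])
print(clf.score(scotus["test"]["text"], scotus["test"]["label"]))  # test accuracy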

How to participate?

We are currently still lacking some technical infrastructure, e.g., an integrated submission environment comprising automated evaluation and an automatically updated leaderboard. We plan to develop the necessary publicly available web infrastructure and extend the public infrastructure of LexGLUE in the near future.

In the meantime, we ask participants to re-use and expand our code to submit new results, if possible, and open a new discussion (submission) in our repository (https://github.com/coastalcph/lex-glue/discussions/new?category=new-results) presenting their results, providing the auto-generated result logs and the relevant publication (or pre-print), if available, accompanied by a pull request including the code amendments needed to reproduce their experiments. Upon reviewing your results, we'll update the public leaderboard accordingly.

I want to re-load fine-tuned HierBERT models. How can I do this?

You can re-load fine-tuned HierBERT models following our example Python script "Re-load HierBERT models".

I still have open questions...

Please post your question in the Discussions section or contact the corresponding author via e-mail.

Credits

Thanks to @JamesLYC88 and @danigoju for digging up 🐛s!


lex-glue's Issues

Hierarchical bert

Hello! Thank you for starting this project.

I have a small question about the hierbert model (HierarchicalBert).
You use it to:
replace flat BERT encoder with hierarchical BERT encoder.

The hierarchy isn't about the labels/classes (classes could belong to a hierarchical tree), right? The hierarchy you mention relates to the text/token segments in a document, i.e., you consider that a document is not just one big plaintext but a list of text segments, and you give that information to the model?

Thank you for any information.

Number of Target Fields in the SCOTUS dataset on HuggingFace

The SCOTUS dataset available as part of the LexGLUE corpus is documented as having 14 classes. Upon verification over the HuggingFace SCOTUS dataset, we only get 13 classes with the following code.

from datasets import load_dataset  # !pip install datasets
import numpy as np

scotus = load_dataset('lex_glue', 'scotus')
labels = list(scotus['train']['label'])
classes = np.unique(labels)
print(classes, len(classes))

scotus = load_dataset('lex_glue', 'scotus')
labels = list(scotus['test']['label'])
classes = np.unique(labels)
print(classes, len(classes))

The results display only 13 unique classes instead of 14, as shown below.

image

Is there an issue with how we're extracting the data? If so, we'd greatly appreciate any help.
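(One quick check, for what it's worth: whether the 14th class simply never occurs in an individual split, by taking the union of labels over all three splits; validation is the assumed name of the dev split.)

all_labels = set(scotus['train']['label']) | set(scotus['validation']['label']) | set(scotus['test']['label'])
print(sorted(all_labels), len(all_labels))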

Hyper-parameters of DeBERTa for EUR-LEX

Hi, my reproduced results for EUR-LEX are quite far from the reported ones. Could you provide the hyper-parameters of DeBERTa for EUR-LEX? And which version of DeBERTa is used, V2/V3, Base/Large?

Looking forward to your reply. Thanks!

Bug in running the file

I'm running the following script: CUDA_VISIBLE_DEVICES=4 python3 -i /net/scratch/jasonhu/legal_dec-sum/lex-glue/experiments/ecthr.py --model_name_or_path 'bert-base-uncased' --do_lower_case 'True' --task 'ecthr_a' --output_dir logs/'ecthr_a'/'bert-base-uncased'/seed_1 --do_train --do_eval --do_pred --overwrite_output_dir --load_best_model_at_end --metric_for_best_model micro-f1 --greater_is_better True --evaluation_strategy epoch --save_strategy epoch --save_total_limit 5 --num_train_epochs 20 --learning_rate 3e-5 --per_device_train_batch_size 2 --per_device_eval_batch_size 2 --seed 1 --gradient_accumulation_steps 4 --eval_accumulation_steps 4

And then the following bug occurs:
image

I tried many ways to solve it but failed. Any idea how to tackle this problem? Thanks!

Results for `legal-bert-small`

Hi @iliaschalkidis,

As mentioned previously, I have been running some experiments on LexGLUE benchmarks and will soon be finishing with the runs for legal-bert-small. It is mentioned that it would be useful to report the results of this smaller model.

Should I just post the results here, or would you prefer another medium?

Fast tokenizer for CaseHOLD

FYI: reglab/casehold#2

The bug with the fast tokenizer should be fixed now, so it is possible to use it.

tokenizer = AutoTokenizer.from_pretrained(
    model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
    cache_dir=model_args.cache_dir,
    # Default fast tokenizer is buggy on CaseHOLD task, switch to legacy tokenizer
    use_fast=False,
)
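With the upstream fix, the use_fast=False workaround should no longer be necessary, so presumably the call can be simplified to use the fast tokenizer again:

tokenizer = AutoTokenizer.from_pretrained(
    model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
    cache_dir=model_args.cache_dir,
    use_fast=True,  # fast tokenizer fixed per reglab/casehold#2
)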

Bias in TF-IDF + SVM results for SCOTUS

You have probably realised that the results of the TF-IDF+SVM approach for SCOTUS are pretty high; I think they are biased. The test metrics appear to be computed after retraining the Pipeline on the combined training and validation sets, while the other language models are fine-tuned on the training set only. This is because sklearn.model_selection.GridSearchCV has its "refit" parameter set to True by default, which results in a biased comparison.

Training on only the training set, with the best hyperparameters found on the validation set, the micro-F1 score is closer to 74.0 and the macro-F1 to 64.4.

Reference:

gs_clf = GridSearchCV(text_clf, parameters, cv=val_split, n_jobs=32, verbose=4)
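A minimal sketch of the unbiased protocol described above, reusing text_clf, parameters, and val_split from the quoted line (the *_texts/*_labels variables are hypothetical placeholders):

from sklearn.model_selection import GridSearchCV

# Tune on the predefined validation split, but do NOT refit on train+val
gs_clf = GridSearchCV(text_clf, parameters, cv=val_split, n_jobs=32, verbose=4, refit=False)
gs_clf.fit(train_plus_val_texts, train_plus_val_labels)

# Refit manually with the best hyper-parameters, on the training split only
best_clf = text_clf.set_params(**gs_clf.best_params_)
best_clf.fit(train_texts, train_labels)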

Script and results on eurlex

Hello! Thanks for this great repository. I have tried experiments on many of its subtasks and it works beautifully.

Now the problem is, when I try to reproduce the results on EUR-LEX using run_eurlex.sh, it fails to give results similar to (or anywhere near) the ones in the paper.

                       VALIDATION                                      | TEST
bert-base-uncased: MICRO-F1: 69.7      Β± 0.1  MACRO-F1: 32.8   Β± 0.4   | MICRO-F1: 63.1       MACRO-F1: 30.8

(I tried changing the model to legal-base-uncased, and changing the number of epochs from 2 to 20, but these attempts failed too.)

Can you help to have a look into this and give some suggestions?

A more detailed log for one of the 5 seeds is as follows:

...
[INFO|trainer.py:1419] 2022-06-27 05:09:06,003 >> ***** Running training *****
[INFO|trainer.py:1420] 2022-06-27 05:09:06,003 >>   Num examples = 55000
[INFO|trainer.py:1421] 2022-06-27 05:09:06,003 >>   Num Epochs = 2
[INFO|trainer.py:1422] 2022-06-27 05:09:06,003 >>   Instantaneous batch size per device = 8
[INFO|trainer.py:1423] 2022-06-27 05:09:06,003 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1424] 2022-06-27 05:09:06,003 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1425] 2022-06-27 05:09:06,003 >>   Total optimization steps = 13750
{'loss': 0.1809, 'learning_rate': 2.890909090909091e-05, 'epoch': 0.07}
{'loss': 0.1112, 'learning_rate': 2.7818181818181818e-05, 'epoch': 0.15}
{'loss': 0.0966, 'learning_rate': 2.6727272727272728e-05, 'epoch': 0.22}
{'loss': 0.0857, 'learning_rate': 2.5636363636363635e-05, 'epoch': 0.29}
{'loss': 0.0784, 'learning_rate': 2.454545454545455e-05, 'epoch': 0.36}
{'loss': 0.072, 'learning_rate': 2.3454545454545456e-05, 'epoch': 0.44}
{'loss': 0.0676, 'learning_rate': 2.2363636363636366e-05, 'epoch': 0.51}
{'loss': 0.0663, 'learning_rate': 2.1272727272727273e-05, 'epoch': 0.58}
{'loss': 0.0632, 'learning_rate': 2.0181818181818183e-05, 'epoch': 0.65}
{'loss': 0.0603, 'learning_rate': 1.909090909090909e-05, 'epoch': 0.73}
{'loss': 0.0593, 'learning_rate': 1.8e-05, 'epoch': 0.8}
{'loss': 0.0571, 'learning_rate': 1.6909090909090907e-05, 'epoch': 0.87}
{'loss': 0.0551, 'learning_rate': 1.5818181818181818e-05, 'epoch': 0.95}
 50%|████████████            | 6875/13750 [14:19<14:12,  8.07it/s]
[INFO|trainer.py:622] 2022-06-27 05:23:25,910 >> The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
[INFO|trainer.py:2590] 2022-06-27 05:23:25,913 >> ***** Running Evaluation *****
[INFO|trainer.py:2592] 2022-06-27 05:23:25,914 >>   Num examples = 5000
[INFO|trainer.py:2595] 2022-06-27 05:23:25,914 >>   Batch size = 8
{'eval_loss': 0.06690910458564758, 'eval_macro-f1': 0.26581931249101237, 'eval_micro-f1': 0.6573569918647109, 'eval_runtime': 25.2148, 'eval_samples_per_second': 198.296, 'eval_steps_per_second': 24.787, 'epoch': 1.0}
[INFO|trainer.py:2340] 2022-06-27 05:23:51,131 >> Saving model checkpoint to logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/checkpoint-6875
[INFO|configuration_utils.py:446] 2022-06-27 05:23:51,134 >> Configuration saved in logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/checkpoint-6875/config.json
[INFO|modeling_utils.py:1542] 2022-06-27 05:23:52,343 >> Model weights saved in logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/checkpoint-6875/pytorch_model.bin
[INFO|tokenization_utils_base.py:2108] 2022-06-27 05:23:52,345 >> tokenizer config file saved in logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/checkpoint-6875/tokenizer_config.json
[INFO|tokenization_utils_base.py:2114] 2022-06-27 05:23:52,346 >> Special tokens file saved in logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/checkpoint-6875/special_tokens_map.json
{'loss': 0.0546, 'learning_rate': 1.4727272727272728e-05, 'epoch': 1.02}
{'loss': 0.0531, 'learning_rate': 1.3636363636363637e-05, 'epoch': 1.09}
{'loss': 0.0518, 'learning_rate': 1.2545454545454545e-05, 'epoch': 1.16}
{'loss': 0.0521, 'learning_rate': 1.1454545454545455e-05, 'epoch': 1.24}
{'loss': 0.0497, 'learning_rate': 1.0363636363636364e-05, 'epoch': 1.31}
{'loss': 0.0481, 'learning_rate': 9.272727272727273e-06, 'epoch': 1.38}
{'loss': 0.0487, 'learning_rate': 8.181818181818181e-06, 'epoch': 1.45}
{'loss': 0.0488, 'learning_rate': 7.090909090909091e-06, 'epoch': 1.53}
{'loss': 0.0477, 'learning_rate': 6e-06, 'epoch': 1.6}
{'loss': 0.0476, 'learning_rate': 4.90909090909091e-06, 'epoch': 1.67}
{'loss': 0.047, 'learning_rate': 3.818181818181818e-06, 'epoch': 1.75}
{'loss': 0.0471, 'learning_rate': 2.7294545454545455e-06, 'epoch': 1.82}
{'loss': 0.0462, 'learning_rate': 1.6385454545454545e-06, 'epoch': 1.89}
{'loss': 0.0466, 'learning_rate': 5.476363636363636e-07, 'epoch': 1.96}
100%|████████████████████████| 13750/13750 [29:11<00:00,  7.96it/s]
[INFO|trainer.py:622] 2022-06-27 05:38:17,987 >> The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
[INFO|trainer.py:2590] 2022-06-27 05:38:17,989 >> ***** Running Evaluation *****
[INFO|trainer.py:2592] 2022-06-27 05:38:17,989 >>   Num examples = 5000
[INFO|trainer.py:2595] 2022-06-27 05:38:17,989 >>   Batch size = 8
{'eval_loss': 0.06163998320698738, 'eval_macro-f1': 0.3223906812379972, 'eval_micro-f1': 0.6903704623792815, 'eval_runtime': 24.2671, 'eval_samples_per_second': 206.041, 'eval_steps_per_second': 25.755, 'epoch': 2.0}
100%|████████████████████████| 13750/13750 [29:36<00:00,  7.96it/s]
[INFO|trainer.py:2340] 2022-06-27 05:38:42,258 >> Saving model checkpoint to logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/checkpoint-13750
[INFO|configuration_utils.py:446] 2022-06-27 05:38:42,261 >> Configuration saved in logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/checkpoint-13750/config.json
[INFO|modeling_utils.py:1542] 2022-06-27 05:38:43,511 >> Model weights saved in logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/checkpoint-13750/pytorch_model.bin
[INFO|tokenization_utils_base.py:2108] 2022-06-27 05:38:43,513 >> tokenizer config file saved in logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/checkpoint-13750/tokenizer_config.json
[INFO|tokenization_utils_base.py:2114] 2022-06-27 05:38:43,513 >> Special tokens file saved in logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/checkpoint-13750/special_tokens_map.json
[INFO|trainer.py:1662] 2022-06-27 05:38:46,057 >>

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1727] 2022-06-27 05:38:46,057 >> Loading best model from logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/checkpoint-13750 (score: 0.6903704623792815).
{'train_runtime': 1781.228, 'train_samples_per_second': 61.755, 'train_steps_per_second': 7.719, 'train_loss': 0.06421310944990678, 'epoch': 2.0}
100%|████████████████████████| 13750/13750 [29:41<00:00,  7.72it/s]
[INFO|trainer.py:2340] 2022-06-27 05:38:47,236 >> Saving model checkpoint to logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5
[INFO|configuration_utils.py:446] 2022-06-27 05:38:47,261 >> Configuration saved in logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/config.json
[INFO|modeling_utils.py:1542] 2022-06-27 05:38:48,560 >> Model weights saved in logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/pytorch_model.bin
[INFO|tokenization_utils_base.py:2108] 2022-06-27 05:38:48,562 >> tokenizer config file saved in logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/tokenizer_config.json
[INFO|tokenization_utils_base.py:2114] 2022-06-27 05:38:48,563 >> Special tokens file saved in logs/062605_eurlex_original/eurlex/bert-base-uncased/seed_5/special_tokens_map.json
***** train metrics *****
  epoch                    =        2.0
  train_loss               =     0.0642
  train_runtime            = 0:29:41.22
  train_samples            =      55000
  train_samples_per_second =     61.755
  train_steps_per_second   =      7.719
06/27/2022 05:38:48 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:622] 2022-06-27 05:38:48,611 >> The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
[INFO|trainer.py:2590] 2022-06-27 05:38:48,620 >> ***** Running Evaluation *****
[INFO|trainer.py:2592] 2022-06-27 05:38:48,620 >>   Num examples = 5000
[INFO|trainer.py:2595] 2022-06-27 05:38:48,620 >>   Batch size = 8
100%|████████████████████████| 625/625 [00:22<00:00, 27.85it/s]
***** eval metrics *****
  epoch                   =        2.0
  eval_loss               =     0.0616
  eval_macro-f1           =     0.3224
  eval_micro-f1           =     0.6904
  eval_runtime            = 0:00:22.48
  eval_samples            =       5000
  eval_samples_per_second =    222.372
  eval_steps_per_second   =     27.796
06/27/2022 05:39:11 - INFO - __main__ - *** Predict ***
[INFO|trainer.py:622] 2022-06-27 05:39:11,101 >> The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
[INFO|trainer.py:2590] 2022-06-27 05:39:11,106 >> ***** Running Prediction *****
[INFO|trainer.py:2592] 2022-06-27 05:39:11,106 >>   Num examples = 5000
[INFO|trainer.py:2595] 2022-06-27 05:39:11,106 >>   Batch size = 8
100%|████████████████████████| 625/625 [00:22<00:00, 27.64it/s]
***** predict metrics *****
  predict_loss               =     0.0712
  predict_macro-f1           =     0.2969
  predict_micro-f1           =     0.6196
  predict_runtime            = 0:00:22.44
  predict_samples            =       5000
  predict_samples_per_second =    222.741
  predict_steps_per_second   =     27.843
...

Shell script for `ecthr_b`

Hello all, thank you for releasing this repository!

I am currently working on reproducing some of the results in this repository. In the README, benchmarked results are presented for all tasks, including ECtHR Task A and ECtHR Task B. However, the shell script run_ecthr.sh only encodes one task, namely ecthr_a:

TASK='ecthr_a'

Is there a reason for this, or is it implied that the script should be run a second time after changing the TASK variable to ecthr_b?

Key Error while loading case_hold dataset

Hi,

I am facing a problem while loading the case_hold dataset. It reports KeyError: 'question'. Could you please look into it or give me some advice on loading that dataset?

Thank you very much!

dataset = load_dataset("lex_glue", "case_hold", revision="1.15.1")

Downloading and preparing dataset lex_glue/case_hold (download: 29.01 MiB, generated: 255.06 MiB, post-processed: Unknown size, total: 284.08 MiB) to C:\XXXXX
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Anaconda3\lib\site-packages\datasets\load.py", line 1632, in load_dataset
    builder_instance.download_and_prepare(
  File "D:\Anaconda3\lib\site-packages\datasets\builder.py", line 607, in download_and_prepare
    self._download_and_prepare(
  File "D:\Anaconda3\lib\site-packages\datasets\builder.py", line 697, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "D:\Anaconda3\lib\site-packages\datasets\builder.py", line 1103, in _prepare_split
    example = self.info.features.encode_example(record)
  File "D:\Anaconda3\lib\site-packages\datasets\features\features.py", line 1033, in encode_example
    return encode_nested_example(self, example)
  File "D:\Anaconda3\lib\site-packages\datasets\features\features.py", line 808, in encode_nested_example
    return {
  File "D:\Anaconda3\lib\site-packages\datasets\features\features.py", line 808, in <dictcomp>
    return {
  File "D:\Anaconda3\lib\site-packages\datasets\utils\py_utils.py", line 108, in zip_dict
    yield key, tuple(d[key] for d in dicts)
  File "D:\Anaconda3\lib\site-packages\datasets\utils\py_utils.py", line 108, in <genexpr>
    yield key, tuple(d[key] for d in dicts)
KeyError: 'question'

Loading Hierarchical models

Hi, I used the scripts and everything worked fine; I was able to train the models without any trouble.
The results shown during testing after training are also coherent.

But the issue (at the end of the message) occurred when I tried to load the model in order to predict on other samples.
It is not possible to load the model because there is a mismatch between the names of the layers expected and the layers in the file. As we can see in the error message (at the end), there are double occurrences of "encoder" in some layer names of the saved file. When loading, the model does not use those layer names.

This problem happens with the ECtHR (A & B) and SCOTUS tasks (maybe others too) with BERT models; it seems that the issue occurs when using the hierarchical variant. When not using the hierarchical variant, we don't have any problem loading the models after saving them, but the results are not as good as they should be.

Do you have the same issue? I am using Ubuntu 20.04 with Python 3.8.

[WARNING|modeling_utils.py:1501] 2022-03-28 18:01:33,192 >> Some weights of the model checkpoint at /home/X/Xs/lex-glue/seed_1 were not used when initializing BertForSequenceClassification: ['bert.encoder.encoder.layer.4.attention.self.query.weight', 'bert.seg_encoder.layers.1.self_attn.out_proj.weight', 'bert.encoder.encoder.layer.8.attention.self.query.bias', 'bert.seg_encoder.layers.1.norm2.weight', 'bert.encoder.encoder.layer.10.output.dense.bias', 'bert.encoder.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.0.attention.self.key.weight', 'bert.encoder.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.seg_encoder.layers.1.self_attn.out_proj.bias', 'bert.seg_encoder.layers.0.norm1.bias', 'bert.encoder.encoder.layer.10.attention.self.query.bias', 'bert.encoder.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.encoder.layer.7.attention.self.value.weight', 'bert.seg_encoder.layers.1.norm2.bias', 'bert.encoder.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.embeddings.token_type_embeddings.weight', 'bert.encoder.embeddings.word_embeddings.weight', 'bert.encoder.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.encoder.layer.11.intermediate.dense.weight', 'bert.seg_encoder.layers.0.self_attn.out_proj.bias', 'bert.seg_encoder.layers.1.norm1.weight', 'bert.encoder.encoder.layer.10.output.dense.weight', 'bert.seg_encoder.layers.0.norm1.weight', 'bert.encoder.encoder.layer.8.attention.self.value.weight', 'bert.encoder.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.embeddings.LayerNorm.weight', 'bert.encoder.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.9.output.dense.weight', 'bert.encoder.encoder.layer.6.output.dense.weight', 'bert.encoder.encoder.layer.2.output.dense.weight', 'bert.encoder.encoder.layer.11.output.LayerNorm.bias', 'bert.seg_encoder.layers.1.norm1.bias', 'bert.encoder.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.encoder.layer.11.attention.self.key.bias', 'bert.encoder.encoder.layer.3.attention.output.dense.bias', 'bert.seg_encoder.layers.0.linear2.weight', 'bert.encoder.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.3.attention.self.query.weight', 'bert.encoder.encoder.layer.3.output.dense.weight', 'bert.seg_encoder.norm.weight', 'bert.encoder.encoder.layer.8.output.dense.bias', 'bert.seg_encoder.layers.1.linear2.weight', 'bert.encoder.embeddings.position_ids', 'bert.encoder.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.encoder.layer.4.attention.self.key.weight', 'bert.encoder.encoder.layer.3.attention.self.query.bias', 'bert.encoder.encoder.layer.1.output.LayerNorm.bias', 'bert.encoder.encoder.layer.10.attention.self.key.weight', 'bert.encoder.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.encoder.layer.1.attention.self.query.bias', 'bert.encoder.encoder.layer.10.attention.self.value.weight', 'bert.encoder.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.encoder.layer.0.intermediate.dense.weight', 'bert.encoder.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.encoder.layer.5.output.dense.bias', 'bert.encoder.encoder.layer.9.attention.output.LayerNorm.bias', 
'bert.encoder.encoder.layer.2.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.1.intermediate.dense.weight', 'bert.encoder.encoder.layer.9.attention.self.key.weight', 'bert.encoder.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.encoder.layer.8.attention.self.key.bias', 'bert.encoder.encoder.layer.4.attention.self.value.weight', 'bert.encoder.encoder.layer.3.attention.self.value.bias', 'bert.encoder.encoder.layer.9.attention.self.value.bias', 'bert.encoder.encoder.layer.9.attention.self.key.bias', 'bert.encoder.encoder.layer.0.attention.self.value.weight', 'bert.encoder.encoder.layer.7.output.dense.weight', 'bert.encoder.encoder.layer.7.attention.self.query.weight', 'bert.seg_encoder.layers.0.self_attn.in_proj_weight', 'bert.encoder.encoder.layer.6.attention.self.value.weight', 'bert.encoder.encoder.layer.11.attention.self.query.bias', 'bert.seg_encoder.layers.0.self_attn.out_proj.weight', 'bert.encoder.encoder.layer.2.output.dense.bias', 'bert.seg_encoder.layers.1.self_attn.in_proj_weight', 'bert.seg_encoder.layers.1.linear2.bias', 'bert.encoder.encoder.layer.0.attention.self.key.bias', 'bert.encoder.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.encoder.layer.4.attention.self.value.bias', 'bert.seg_encoder.layers.0.self_attn.in_proj_bias', 'bert.encoder.encoder.layer.6.attention.self.query.bias', 'bert.encoder.embeddings.position_embeddings.weight', 'bert.encoder.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.pooler.dense.weight', 'bert.encoder.encoder.layer.2.output.LayerNorm.bias', 'bert.encoder.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.encoder.layer.1.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.5.attention.self.query.weight', 'bert.encoder.encoder.layer.1.attention.output.dense.weight', 'bert.encoder.encoder.layer.1.output.dense.weight', 'bert.encoder.encoder.layer.0.output.dense.weight', 'bert.encoder.encoder.layer.3.attention.self.key.weight', 'bert.encoder.encoder.layer.2.attention.self.value.weight', 'bert.encoder.encoder.layer.5.attention.self.query.bias', 'bert.encoder.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.encoder.layer.9.attention.self.query.bias', 'bert.encoder.encoder.layer.1.attention.self.key.bias', 'bert.encoder.encoder.layer.7.attention.self.key.bias', 'bert.encoder.encoder.layer.11.attention.self.value.bias', 'bert.encoder.encoder.layer.1.attention.self.query.weight', 'bert.encoder.encoder.layer.1.attention.output.dense.bias', 'bert.encoder.encoder.layer.9.attention.self.query.weight', 'bert.encoder.encoder.layer.5.output.dense.weight', 'bert.encoder.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.1.attention.self.value.bias', 'bert.seg_encoder.layers.1.self_attn.in_proj_bias', 'bert.encoder.encoder.layer.3.attention.self.value.weight', 'bert.encoder.encoder.layer.11.output.dense.weight', 'bert.encoder.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.encoder.layer.0.output.LayerNorm.bias', 'bert.seg_encoder.layers.0.linear1.bias', 'bert.encoder.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.encoder.layer.0.attention.self.query.weight', 'bert.encoder.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.embeddings.LayerNorm.bias', 
'bert.encoder.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.encoder.layer.6.attention.self.key.bias', 'bert.encoder.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.encoder.layer.8.attention.self.value.bias', 'bert.encoder.encoder.layer.11.output.dense.bias', 'bert.encoder.encoder.layer.11.intermediate.dense.bias', 'bert.seg_encoder.norm.bias', 'bert.encoder.encoder.layer.1.attention.self.value.weight', 'bert.encoder.encoder.layer.0.output.LayerNorm.weight', 'bert.encoder.encoder.layer.7.attention.self.query.bias', 'bert.encoder.encoder.layer.10.attention.self.query.weight', 'bert.encoder.encoder.layer.0.attention.output.LayerNorm.weight', 'bert.seg_encoder.layers.1.linear1.weight', 'bert.encoder.encoder.layer.0.attention.self.value.bias', 'bert.encoder.encoder.layer.3.attention.self.key.bias', 'bert.encoder.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.encoder.layer.2.attention.output.dense.weight', 'bert.encoder.encoder.layer.7.attention.self.key.weight', 'bert.encoder.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.1.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.4.output.dense.weight', 'bert.encoder.encoder.layer.7.attention.self.value.bias', 'bert.encoder.encoder.layer.7.output.dense.bias', 'bert.encoder.encoder.layer.5.attention.self.value.bias', 'bert.encoder.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.encoder.layer.10.intermediate.dense.bias', 'bert.seg_encoder.layers.0.linear2.bias', 'bert.seg_encoder.layers.0.linear1.weight', 'bert.encoder.encoder.layer.11.attention.self.query.weight', 'bert.encoder.encoder.layer.2.attention.self.query.weight', 'bert.encoder.encoder.layer.5.attention.self.value.weight', 'bert.encoder.encoder.layer.4.output.dense.bias', 'bert.encoder.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.encoder.layer.0.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.11.attention.self.value.weight', 'bert.encoder.encoder.layer.5.attention.self.key.bias', 'bert.encoder.encoder.layer.11.attention.self.key.weight', 'bert.encoder.encoder.layer.2.intermediate.dense.weight', 'bert.encoder.encoder.layer.1.output.dense.bias', 'bert.encoder.encoder.layer.2.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.encoder.layer.6.attention.self.key.weight', 'bert.encoder.encoder.layer.2.attention.output.dense.bias', 'bert.encoder.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.2.attention.self.key.weight', 'bert.encoder.pooler.dense.bias', 'bert.encoder.encoder.layer.2.attention.self.query.bias', 'bert.encoder.encoder.layer.0.output.dense.bias', 'bert.encoder.encoder.layer.6.attention.self.query.weight', 'bert.encoder.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.encoder.layer.0.attention.output.dense.bias', 'bert.encoder.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.0.attention.self.query.bias', 'bert.encoder.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.encoder.layer.5.output.LayerNorm.bias', 
'bert.encoder.encoder.layer.8.attention.self.query.weight', 'bert.encoder.encoder.layer.0.intermediate.dense.bias', 'bert.encoder.encoder.layer.8.output.dense.weight', 'bert.encoder.encoder.layer.10.attention.self.value.bias', 'bert.encoder.encoder.layer.3.attention.output.dense.weight', 'bert.seg_encoder.layers.0.norm2.bias', 'bert.encoder.encoder.layer.9.attention.self.value.weight', 'bert.encoder.encoder.layer.8.attention.self.key.weight', 'bert.encoder.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.0.attention.output.dense.weight', 'bert.encoder.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.encoder.layer.9.output.dense.bias', 'bert.encoder.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.encoder.layer.1.output.LayerNorm.weight', 'bert.encoder.encoder.layer.6.output.dense.bias', 'bert.encoder.encoder.layer.1.attention.self.key.weight', 'bert.encoder.encoder.layer.5.attention.output.dense.bias', 'bert.seg_pos_embeddings.weight', 'bert.encoder.encoder.layer.2.attention.self.key.bias', 'bert.encoder.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.encoder.layer.2.intermediate.dense.bias', 'bert.encoder.encoder.layer.3.output.dense.bias', 'bert.encoder.encoder.layer.10.attention.self.key.bias', 'bert.encoder.encoder.layer.1.intermediate.dense.bias', 'bert.encoder.encoder.layer.9.output.LayerNorm.bias', 'bert.seg_encoder.layers.0.norm2.weight', 'bert.encoder.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.encoder.layer.4.attention.self.query.bias', 'bert.encoder.encoder.layer.5.attention.self.key.weight', 'bert.encoder.encoder.layer.6.attention.self.value.bias', 'bert.seg_encoder.layers.1.linear1.bias', 'bert.encoder.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.encoder.layer.2.attention.self.value.bias', 'bert.encoder.encoder.layer.4.attention.self.key.bias', 'bert.encoder.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.encoder.layer.7.attention.output.LayerNorm.bias']
This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:1512] 2022-03-28 18:01:33,192 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at /home/X/X/lex-glue/seed_1 and are newly initialized: ['bert.encoder.layer.0.output.LayerNorm.weight', 'bert.encoder.layer.4.output.dense.weight', 'bert.embeddings.LayerNorm.bias', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.self.value.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.2.output.dense.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.self.query.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.2.intermediate.dense.weight', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.0.intermediate.dense.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.2.attention.output.dense.bias', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.0.attention.self.key.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.1.attention.output.dense.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.0.attention.output.dense.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.1.intermediate.dense.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.2.attention.output.dense.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.1.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.0.output.LayerNorm.bias', 
'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.2.output.dense.weight', 'bert.encoder.layer.2.attention.self.query.bias', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.2.attention.self.query.weight', 'bert.encoder.layer.0.output.dense.weight', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.pooler.dense.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.1.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.1.output.dense.weight', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.0.attention.output.dense.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.2.attention.self.value.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.2.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.1.attention.self.key.bias', 'bert.encoder.layer.1.output.LayerNorm.bias', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.1.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.0.intermediate.dense.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.1.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.embeddings.position_embeddings.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.0.output.dense.bias', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.0.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.key.bias', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.1.attention.self.key.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.1.output.dense.bias', 'bert.encoder.layer.1.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.1.attention.output.dense.bias', 
'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.2.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.embeddings.token_type_embeddings.weight', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.self.query.weight', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.2.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.1.attention.self.value.bias', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.2.intermediate.dense.bias', 'bert.embeddings.word_embeddings.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.0.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.2.output.LayerNorm.bias', 'bert.embeddings.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.value.weight', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.1.attention.output.LayerNorm.bias', 'bert.encoder.layer.1.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.0.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.pooler.dense.bias'] <

About loading ecthr_a

Hi,

I tried run_ecthr.sh, but it failed to load the dataset.

The error comes from line 236 of experiments/ecthr.py:

train_dataset = load_dataset("lex_glue", name=data_args.task, split="train", data_dir='data', cache_dir=model_args.cache_dir)

Error info:

Traceback (most recent call last):
  File "main_ecthr.py", line 505, in <module>
    main()
  File "main_ecthr.py", line 236, in main
    train_dataset = load_dataset("lex_glue", name=data_args.task, split="train", data_dir='data', cache_dir=model_args.cache_dir)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 1723, in load_dataset
    builder_instance = load_dataset_builder(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 1500, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 1168, in dataset_module_factory
    return LocalDatasetModuleFactoryWithoutScript(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 691, in get_module
    else get_data_patterns_locally(base_path)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/data_files.py", line 451, in get_data_patterns_locally
    raise FileNotFoundError(f"The directory at {base_path} doesn't contain any data file") from None
FileNotFoundError: The directory at lex_glue/data doesn't contain any data file

If I delete data_dir='data', the error becomes:

08/02/2022 11:48:43 - INFO - datasets.data_files - Some files matched the pattern 'lex_glue/**[-._ 0-9/]train[-._ 0-9]*' at /workspace/MaxPlain/lex_glue but don't have valid data file extensions: [PosixPath('/workspace/MaxPlain/lex_glue/statistics/report_train_time.py')]
Traceback (most recent call last):
  File "main_ecthr.py", line 505, in <module>
    main()
  File "main_ecthr.py", line 236, in main
    train_dataset = load_dataset("lex_glue", name=data_args.task, split="train", cache_dir=model_args.cache_dir)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 1723, in load_dataset
    builder_instance = load_dataset_builder(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 1500, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 1168, in dataset_module_factory
    return LocalDatasetModuleFactoryWithoutScript(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 695, in get_module
    data_files = DataFilesDict.from_local_or_remote(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/data_files.py", line 786, in from_local_or_remote
    DataFilesList.from_local_or_remote(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/data_files.py", line 754, in from_local_or_remote
    data_files = resolve_patterns_locally_or_by_urls(base_path, patterns, allowed_extensions)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/data_files.py", line 359, in resolve_patterns_locally_or_by_urls
    raise FileNotFoundError(error_msg)
FileNotFoundError: Unable to resolve any data file that matches '['**[-._ 0-9/]train[-._ 0-9]*', 'train[-._ 0-9]*', '**[-._ 0-9/]training[-._ 0-9]*', 'training[-._ 0-9]*']' at /workspace/MaxPlain/lex_glue with any supported extension ['csv', 'tsv', 'json', 'jsonl', 'parquet', 'txt', 'blp', 'bmp', 'dib', 'bufr', 'cur', 'pcx', 'dcx', 'dds', 'ps', 'eps', 'fit', 'fits', 'fli', 'flc', 'ftc', 'ftu', 'gbr', 'gif', 'grib', 'h5', 'hdf', 'png', 'apng', 'jp2', 'j2k', 'jpc', 'jpf', 'jpx', 'j2c', 'icns', 'ico', 'im', 'iim', 'tif', 'tiff', 'jfif', 'jpe', 'jpg', 'jpeg', 'mpg', 'mpeg', 'msp', 'pcd', 'pxr', 'pbm', 'pgm', 'ppm', 'pnm', 'psd', 'bw', 'rgb', 'rgba', 'sgi', 'ras', 'tga', 'icb', 'vda', 'vst', 'webp', 'wmf', 'emf', 'xbm', 'xpm', 'BLP', 'BMP', 'DIB', 'BUFR', 'CUR', 'PCX', 'DCX', 'DDS', 'PS', 'EPS', 'FIT', 'FITS', 'FLI', 'FLC', 'FTC', 'FTU', 'GBR', 'GIF', 'GRIB', 'H5', 'HDF', 'PNG', 'APNG', 'JP2', 'J2K', 'JPC', 'JPF', 'JPX', 'J2C', 'ICNS', 'ICO', 'IM', 'IIM', 'TIF', 'TIFF', 'JFIF', 'JPE', 'JPG', 'JPEG', 'MPG', 'MPEG', 'MSP', 'PCD', 'PXR', 'PBM', 'PGM', 'PPM', 'PNM', 'PSD', 'BW', 'RGB', 'RGBA', 'SGI', 'RAS', 'TGA', 'ICB', 'VDA', 'VST', 'WEBP', 'WMF', 'EMF', 'XBM', 'XPM', 'zip']

Is there anything wrong with how I am loading the dataset?
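
For what it's worth, a minimal sketch that avoids both errors, assuming the Hub-hosted lex_glue loader is what the script intends (judging from the traceback, passing data_dir routes the call through LocalDatasetModuleFactoryWithoutScript, which scans for raw data files instead of using the builder script):

from datasets import load_dataset

# Dropping data_dir lets the lex_glue builder script resolve its own data
# files instead of scanning a local "data" folder for raw files.
train_dataset = load_dataset("lex_glue", name="ecthr_a", split="train")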

LexGlue datasets

Are the datasets available right now?

When I run the code:

from datasets import load_dataset 
dataset = load_dataset("lex_glue", "scotus")

I get the following error:

Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/1.12.1/datasets/lex_glue/lex_glue.py
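
One hedged workaround, assuming the failure comes from an old datasets release resolving the loader from the stale 1.12.1 script URL (or from a network hiccup): upgrade the library and retry.

# Upgrade first, e.g.: pip install --upgrade datasets
from datasets import load_dataset

# With a current client, the lex_glue loader is resolved from a live
# location on the Hugging Face Hub rather than the pinned 1.12.1 URL.
dataset = load_dataset("lex_glue", "scotus")
print(dataset)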

Cannot find the file "compute_avg_scores.py" in directory 'statistics'

Hi,
I ran the SCOTUS task with the 'run_scotus.sh' script, but it fails because the script's last line cannot find the file "compute_avg_scores.py": python statistics/compute_avg_scores.py --dataset ${TASK}
How can I solve this problem?
Thank you very much!

Bug with fp16 in the run_ecthr.sh experiment

When I run the run_ecthr.sh script in the experiments folder, the following error occurs:

Traceback (most recent call last):
  File "main_ecthr.py", line 505, in <module>
    main()
  File "main_ecthr.py", line 454, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/trainer.py", line 1498, in train
    return inner_training_loop(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/trainer.py", line 1832, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/trainer.py", line 2038, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/trainer.py", line 2758, in evaluate
    output = eval_loop(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/trainer.py", line 2936, in evaluation_loop
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/trainer.py", line 3177, in prediction_step
    loss, outputs = self.compute_loss(model, inputs, return_outputs=True)
  File "/workspace/MaxPlain/lexglue/experiments/trainer.py", line 8, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1556, in forward
    outputs = self.bert(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/MaxPlain/lexglue/models/hierbert.py", line 100, in forward
    seg_encoder_outputs = self.seg_encoder(encoder_outputs)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 238, in forward
    output = mod(output, src_mask=mask, src_key_padding_mask=src_key_padding_mask)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 437, in forward
    return torch._transformer_encoder_layer_fwd(
RuntimeError: expected scalar type Half but found Float

I tried to debug it and found that it may be due to the trainer failing to put the model in dtype=torch.float16.
I also tried evaluation on its own; it fails and reports the same error.

# Evaluation
if training_args.do_eval:
    logger.info("*** Evaluate ***")
    metrics = trainer.evaluate(eval_dataset=eval_dataset)

    max_eval_samples = data_args.max_eval_samples if data_args.max_eval_samples is not None else len(eval_dataset)
    metrics["eval_samples"] = min(max_eval_samples, len(eval_dataset))

    trainer.log_metrics("eval", metrics)
    trainer.save_metrics("eval", metrics)

After removing --fp16 and --fp16_full_eval from run_ecthr.sh, it works as expected.
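
For anyone who needs to keep mixed precision, one possible workaround (an assumption on my part, not the maintainers' fix) is to run the segment-level encoder outside of autocast: the fused fast path in torch.nn.TransformerEncoderLayer (torch._transformer_encoder_layer_fwd) apparently requires a single consistent dtype, which --fp16 breaks when half-precision activations meet float32 parameters.

import torch

# Hypothetical patch sketch for the seg_encoder call in models/hierbert.py:
# run the segment-level transformer in full precision so the fused kernel
# sees matching dtypes, then cast back for the rest of the fp16 graph.
def encode_segments(seg_encoder, encoder_outputs):
    orig_dtype = encoder_outputs.dtype
    with torch.cuda.amp.autocast(enabled=False):
        seg_encoder_outputs = seg_encoder(encoder_outputs.float())
    return seg_encoder_outputs.to(orig_dtype)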

scotus: ValueError: expected sequence of length 64 at dim 2 (got 128)

Hi, thanks for the awesome repo!

I have encountered an issue when running the scripts for scotus.

[INFO|trainer.py:1164] 2022-05-31 13:09:08,068 >> ***** Running training *****
[INFO|trainer.py:1165] 2022-05-31 13:09:08,068 >> Num examples = 100
[INFO|trainer.py:1166] 2022-05-31 13:09:08,068 >> Num Epochs = 10
[INFO|trainer.py:1167] 2022-05-31 13:09:08,068 >> Instantaneous batch size per device = 8
[INFO|trainer.py:1168] 2022-05-31 13:09:08,068 >> Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:1169] 2022-05-31 13:09:08,068 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1170] 2022-05-31 13:09:08,068 >> Total optimization steps = 20
  0%|          | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/cooelf/lex-glue-main/scotus.py", line 490, in <module>
    main()
  File "/home/cooelf/lex-glue-main/scotus.py", line 439, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/cooelf/.local/lib/python3.7/site-packages/transformers/trainer.py", line 1254, in train
    for step, inputs in enumerate(epoch_iterator):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/home/cooelf/.local/lib/python3.7/site-packages/transformers/data/data_collator.py", line 81, in default_data_collator
    batch[k] = torch.tensor([f[k] for f in features])
ValueError: expected sequence of length 64 at dim 2 (got 128)

Process finished with exit code 1

It seems to be a problem with the data processing. I have checked the dimensions of the features but could not find anything strange.

Could you give some hints on how to solve it?

Thanks!
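
One way to narrow this down (a debugging sketch, not part of the repo): default_data_collator builds each batch with torch.tensor([f[k] for f in features]), which raises exactly this ValueError when examples carry different inner sequence lengths, for instance if stale cached features tokenized with an older max_seq_length get mixed with fresh ones. A quick check for ragged shapes:

import torch

# Hypothetical helper: report any example whose per-key tensor shape differs
# from the first example's; a 64-vs-128 mismatch at some dim is exactly what
# makes torch.tensor() fail inside the collator.
def find_ragged(features, keys=("input_ids", "attention_mask")):
    for key in keys:
        ref = torch.tensor(features[0][key]).shape
        for i, f in enumerate(features):
            shape = torch.tensor(f[key]).shape
            if shape != ref:
                print(f"{key}: example {i} has shape {tuple(shape)}, expected {tuple(ref)}")

If shapes do differ, regenerating the preprocessed features from a clean cache is a plausible next step.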

compute_avg_scores.py

Sorry, I cannot find the "compute_avg_scores.py" file in "statistics":
"python statistics/compute_avg_scores.py --dataset ${TASK}"
