nlpaueb / multi-eurlex

MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer

Python 100.00%

multi-eurlex's Issues

Mapping for concepts from dataset to description json file

Hi, when I load the dataset with load_dataset('multi_eurlex', language='en', label_level='level_1'), the labels in the samples are all small integers from 0 to around 20, so I assume they have been re-indexed sequentially.

But in the eurovoc_concepts file, the level_1 labels still have the form 100XXX. Could you please specify the mapping between the two? (Same question for level_2 and level_3.)

Or is there any other way to interpret the labels?

Thank you very much.
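
A minimal sketch of how such integer labels could be decoded, assuming the loader exposes an ordered list of the original concept IDs in which position i corresponds to integer label i. With the Hugging Face datasets library such a list is usually available as dataset.features['labels'].feature.names, but whether that holds for this dataset is an assumption here. The function name decode_labels, the concept IDs, and the descriptions below are made up for illustration and are not taken from the eurovoc_concepts file.

```python
from typing import Dict, List, Tuple


def decode_labels(
    label_indices: List[int],
    id_names: List[str],
    descriptors: Dict[str, str],
) -> List[Tuple[int, str, str]]:
    """Map re-indexed integer labels back to concept IDs and descriptions.

    id_names is assumed to be ordered so that id_names[i] is the original
    concept ID behind integer label i.
    """
    decoded = []
    for idx in label_indices:
        concept_id = id_names[idx]
        # Fall back to a placeholder if the ID is missing from the descriptor file.
        description = descriptors.get(concept_id, "<unknown>")
        decoded.append((idx, concept_id, description))
    return decoded


# Hypothetical toy data: these IDs and descriptions are placeholders only.
id_names = ["100001", "100002", "100003"]
descriptors = {
    "100001": "example concept A",
    "100002": "example concept B",
    "100003": "example concept C",
}

for idx, concept_id, description in decode_labels([0, 2], id_names, descriptors):
    print(f"label {idx} -> concept {concept_id}: {description}")
```

The same lookup would apply to level_2 and level_3, provided each label level comes with its own ordered ID list.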

Key error for "all" labels

Hi,

I get a KeyError when I try to load the data with its original labels.

The code I used is load_dataset('multi_eurlex', language='en', label_level='all')

The error message is KeyError: 'all'

Could you please look into this?

Thank you very much.

Evaluation data loading

Hey,
Thanks for releasing this code. I believe there is a small bug in it.
In experiments/trainer.py, line 111, the following line of code

dataset=eval_dataset['validation'][:eval_samples if eval_samples else len(eval_dataset)]

must be replaced with

dataset=eval_dataset['validation'][:eval_samples if eval_samples else len(eval_dataset['validation'])]

I believe len(eval_dataset) evaluates to 3, since it counts the splits in the dataset rather than the examples in the validation split. Please let me know whether this fix is correct.
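
The difference can be illustrated with a small stand-in, assuming the dataset behaves like a dictionary keyed by split name. The plain lists below are toy data rather than real dataset objects, and take_validation is a hypothetical helper, not a function from the repository.

```python
# Toy stand-in for a dataset with three splits; each split is a plain list.
eval_dataset = {
    "train": list(range(10)),
    "validation": list(range(5)),
    "test": list(range(5)),
}


def take_validation(dataset, eval_samples=None):
    """Return the first eval_samples validation items, or all of them if unset."""
    split = dataset["validation"]
    return split[: eval_samples if eval_samples else len(split)]


# Original expression: len(eval_dataset) counts the splits, not the examples,
# so the slice silently truncates the validation data.
buggy = eval_dataset["validation"][: len(eval_dataset)]

# Proposed fix: take the length of the validation split itself.
fixed = take_validation(eval_dataset)

print(len(eval_dataset), len(buggy), len(fixed))
```

With the original expression and no eval_samples set, evaluation would silently run on only as many examples as there are splits.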

Replication of results

Hello, thanks for sharing the code. I would like to know the argument values needed to reproduce the results in the paper "LEGAL-BERT: The Muppets straight out of Law School". I ran the code with its default settings (eval_langs was set to ['en']), which resulted in very poor performance. I am attaching the log file with this message (MULTI-EURLEX_2022_01_19_05_01_19.txt). Thanks.

Reproducibility of the experiments

Hello,

Thank you for your research and your code, which are very interesting. I'm trying to reproduce the results from your paper. I ran

python3 trainer.py --bert_path 'camembert-base' --native_bert True --batch_size 1 --train_lang 'fr' --eval_langs 'fr' --label_level 'level_3'

but got

NotImplementedError: Model type camembert is not supported for adaptation.

Thank you in advance for any reply.
