shijie-wu / neural-transducer

This repo contains a set of neural transducers (e.g., sequence-to-sequence models), focusing on character-level tasks.

License: MIT License

Makefile 0.07% Shell 9.48% Python 81.00% C 9.46%

character-level-transduction sequence-to-sequence transducers

neural-transducer's Introduction

Neural Transducer

This repo contains a set of neural transducers (e.g., sequence-to-sequence models), focusing on character-level tasks. It powers the following papers and shared tasks.

Tiago Pimentel, Maria Ryskina, Sabrina J. Mielke, Shijie Wu, Eleanor Chodroff, Brian Leonard, Garrett Nicolai, Yustinus Ghanggo Ate, Salam Khalifa, Nizar Habash, Charbel El-Khaissi, Omer Goldman, Michael Gasser, William Lane, Matt Coler, Arturo Oncevay, Jaime Rafael Montoya Samame, Gema Celeste Silva Villegas, Adam Ek, Jean-Philippe Bernardy, Andrey Shcherbakov, Aziyana Bayyr-ool, Karina Sheifer, Sofya Ganieva, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Andrew Krizhanovsky, Natalia Krizhanovsky, Clara Vania, Sardana Ivanova, Aelita Salchak, Christopher Straughn, Zoey Liu, Jonathan North Washington, Duygu Ataman, Witold Kieraś, Marcin Woliński, Totok Suhardijanto, Niklas Stoehr, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Richard J. Hatcher, Emily Prud'hommeaux, Ritesh Kumar, Mans Hulden, Botond Barta, Dorina Lakatos, Gábor Szolnok, Judit Ács, Mohit Raj, David Yarowsky, Ryan Cotterell, Ben Ambridge, and Ekaterina Vylomova. SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages. SIGMORPHON. 2021. (Experiments Detail)
Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, and Mans Hulden. SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection. SIGMORPHON. 2020. (Experiments Detail)
Shijie Wu, Ryan Cotterell, and Mans Hulden. Applying the Transformer to Character-level Transduction. EACL. 2021. (Experiments Detail)
Arya D McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Miikka Silfverberg, Sebastian J Mielke, Jeffrey Heinz, Ryan Cotterell, and Mans Hulden. The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection. SIGMORPHON. 2019. (Experiments Detail)
Shijie Wu, Ryan Cotterell, and Timothy J O'Donnell. Morphological Irregularity Correlates with Frequency. ACL. 2019. (Experiments Detail)
Shijie Wu, and Ryan Cotterell. Exact Hard Monotonic Attention for Character-Level Transduction. ACL. 2019. (Experiments Detail)
Chaitanya Malaviya*, Shijie Wu*, and Ryan Cotterell. A Simple Joint Model for Improved Contextual Neural Lemmatization. NAACL. 2019. (Experiments Detail)
Shijie Wu, Pamela Shapiro, and Ryan Cotterell. Hard Non-Monotonic Attention for Character-Level Transduction. EMNLP. 2018. (Experiments Detail)

Miscellaneous

Environment (conda): environment.yml
Pre-commit check: pre-commit run --all-files
Compile: make

License

MIT

neural-transducer's Issues

Data set for g2p

Could you provide the dataset you used in the g2p experiments? I am wondering how you split the dictionary into training, dev, and test sets; knowing the split would make it easier to compare the performance of different models.

Error for creating conda env

Hi Shijie,

I got the following error:

C:\research\neural-transducer-master>conda env create --file environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - zlib==1.2.11=h516909a_1010
  - mkl_random==1.2.1=py39ha9443f7_2
  - mkl-service==2.3.0=py39h27cfd23_1
  - virtualenv==20.4.4=py39hf3d152e_0
  - ld_impl_linux-64==2.35.1=hea4e1c9_2
  - cffi==1.14.5=py39he32792d_0
  - libstdcxx-ng==9.3.0=h6de172a_19
  - readline==8.1=h46c0cb4_0
  - _libgcc_mutex==0.1=conda_forge
  - libgcc-ng==9.3.0=h2828fa1_19
  - libuv==1.41.0=h7f98852_0
  - tbb==2021.2.0=h4bd325d_0
  - python==3.9.2=hffdb5ce_0_cpython
  - pre-commit==2.12.1=py39hf3d152e_0
  - pytorch==1.8.1=py3.9_cuda11.1_cudnn8.0.5_0
  - ncurses==6.2=h58526e2_4
  - mkl==2021.2.0=h726a3e6_389
  - ninja==1.10.2=h4bd325d_0
  - _openmp_mutex==4.5=1_llvm
  - ca-certificates==2020.12.5=ha878542_0
  - numpy-base==1.20.1=py39h7d8b39e_0
  - pyyaml==5.4.1=py39h3811e60_0
  - cudatoolkit==11.1.1=h6406543_8
  - mkl_fft==1.3.0=py39h42c9631_2
  - yaml==0.2.5=h516909a_0
  - llvm-openmp==11.1.0=h4bd325d_1
  - sqlite==3.35.5=h74cdb3f_0
  - libffi==3.3=h58526e2_2
  - editdistance-s==1.0.0=py39h1a9c180_1
  - tk==8.6.10=h21135ba_1
  - jedi==0.18.0=py39hf3d152e_2
  - xz==5.2.5=h516909a_1
  - certifi==2020.12.5=py39hf3d152e_1
  - numpy==1.20.1=py39h93e21f0_0
  - setuptools==49.6.0=py39hf3d152e_3
  - ipython==7.22.0=py39hef51801_0
  - openssl==1.1.1k=h7f98852_0

It would be really helpful if you could give me some hints to solve this issue. :)
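
A note on the error above: several of the listed packages (for example ld_impl_linux-64, libgcc-ng, libstdcxx-ng) and the build strings attached to the others appear to be Linux-specific, while the prompt shows a Windows path, so conda cannot resolve these exact pins there. One possible workaround, sketched below under the assumption that dropping the build strings is acceptable for your purposes, is to rewrite the contents of environment.yml without them. The helpers strip_build and strip_builds are made up for illustration and are not part of the repo; Linux-only packages would still have to be removed by hand, and the resulting environment may not match the authors' setup exactly.

```python
import re

# Matches a pinned conda dependency such as "  - zlib=1.2.11=h516909a_1010"
# (or the "==" form printed in the error message) and captures the part
# before the trailing build string.
_PINNED = re.compile(r"^(\s*-\s*[A-Za-z0-9_.\-]+==?[^=\s]+)=[^=\s]+\s*$")


def strip_build(line: str) -> str:
    """Drop the build string from one dependency line; leave other lines as-is."""
    match = _PINNED.match(line)
    return match.group(1) if match else line.rstrip("\n")


def strip_builds(text: str) -> str:
    """Apply strip_build to every line of an environment file's contents."""
    return "\n".join(strip_build(line) for line in text.splitlines()) + "\n"


if __name__ == "__main__":
    # Hypothetical three-line excerpt, not the repo's real environment.yml.
    sample = "dependencies:\n  - zlib=1.2.11=h516909a_1010\n  - pip\n"
    print(strip_builds(sample), end="")
```

Lines without a build string (section headers, unpinned packages, the pip sub-list) pass through unchanged, so the output can be saved to a new file and tried with conda env create --file.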

encode() missing 1 required positional argument: 'src_mask' - Beam Decode

Hi, Thank you for making the code open-source!

I am trying to train a g2p based model with beam-decoding. Unfortunately, I am getting the following error. Please refer to the logs below for complete details.

FYI, the code works fine with greedy decoding. Kindly advise.

(base) [aagarwal@ip-0A000427 neural-transducer]$ python src/train.py --train data/100hrs-youtube.train --dev data/100hrs-youtube.dev --test data/100hrs-youtube.test --epochs 100 --dataset g2p --arch transformer --model models/v2-beam-search-decoding/v2 --decode beam
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: seed - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: train - ['data/100hrs-youtube.train']
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: dev - ['data/100hrs-youtube.dev']
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: test - ['data/100hrs-youtube.test']
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: model - 'models/v2-beam-search-decoding/v2'
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: load - ''
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: bs - 20
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: epochs - 100
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_steps - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: warmup_steps - 4000
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: total_eval - -1
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: optimizer - <Optimizer.adam: 'adam'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: scheduler - <Scheduler.reducewhenstuck: 'reducewhenstuck'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: lr - 0.001
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: min_lr - 1e-05
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: momentum - 0.9
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: beta1 - 0.9
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: beta2 - 0.999
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: estop - 1e-08
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: cooldown - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: patience - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: discount_factor - 0.5
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_norm - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: gpuid - []
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: loglevel - 'info'
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: saveall - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: shuffle - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: cleanup_anyway - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: dataset - <Data.g2p: 'g2p'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_seq_len - 128
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_decode_len - 128
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: init - ''
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: dropout - 0.2
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: embed_dim - 100
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: nb_heads - 4
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: src_layer - 1
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: trg_layer - 1
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: src_hs - 200
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: trg_hs - 200
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: label_smooth - 0.0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: tie_trg_embed - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: arch - <Arch.transformer: 'transformer'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: nb_sample - 2
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: wid_siz - 11
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: indtag - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: decode - <Decode.beam: 'beam'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: mono - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: bestacc - False
INFO - 10/18/20 14:31:37 - 0:00:00 - src vocab size 45
INFO - 10/18/20 14:31:37 - 0:00:00 - trg vocab size 44
INFO - 10/18/20 14:31:37 - 0:00:00 - src vocab ['<PAD>', '<s>', '<\\s>', '<UNK>', '"b', '"g', '"h', '"i', '"j', '"k', '"m', '"n', '"s', '"z', "'", 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'ß', 'ä', 'ö', 'ü']
INFO - 10/18/20 14:31:37 - 0:00:00 - trg vocab ['<PAD>', '<s>', '<\\s>', '<UNK>', "'", ',"', '-', '.', '\\', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '¨', 'ß', 'ä', 'ç', 'è', 'é', 'ö', 'ü', 'ș']
INFO - 10/18/20 14:31:37 - 0:00:00 - model: Transformer(
                                       (src_embed): Embedding(45, 100, padding_idx=0)
                                       (trg_embed): Embedding(44, 100, padding_idx=0)
                                       (position_embed): SinusoidalPositionalEmbedding()
                                       (encoder): TransformerEncoder(
                                         (layers): ModuleList(
                                           (0): TransformerEncoderLayer(
                                             (self_attn): MultiheadAttention(
                                               (out_proj): _LinearWithBias(in_features=100, out_features=100, bias=True)
                                             )
                                             (linear1): Linear(in_features=100, out_features=200, bias=True)
                                             (dropout): Dropout(p=0.2, inplace=False)
                                             (linear2): Linear(in_features=200, out_features=100, bias=True)
                                             (norm1): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                             (norm2): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                             (activation_dropout): Dropout(p=0.2, inplace=False)
                                           )
                                         )
                                         (norm): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                       )
                                       (decoder): TransformerDecoder(
                                         (layers): ModuleList(
                                           (0): TransformerDecoderLayer(
                                             (self_attn): MultiheadAttention(
                                               (out_proj): _LinearWithBias(in_features=100, out_features=100, bias=True)
                                             )
                                             (multihead_attn): MultiheadAttention(
                                               (out_proj): _LinearWithBias(in_features=100, out_features=100, bias=True)
                                             )
                                             (linear1): Linear(in_features=100, out_features=200, bias=True)
                                             (dropout): Dropout(p=0.2, inplace=False)
                                             (linear2): Linear(in_features=200, out_features=100, bias=True)
                                             (norm1): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                             (norm2): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                             (norm3): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                             (activation_dropout): Dropout(p=0.2, inplace=False)
                                           )
                                         )
                                         (norm): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                       )
                                       (final_out): Linear(in_features=100, out_features=44, bias=True)
                                       (dropout): Dropout(p=0.2, inplace=False)
                                     )
INFO - 10/18/20 14:31:37 - 0:00:00 - number of parameter 216544
INFO - 10/18/20 14:31:37 - 0:00:00 - maximum training 269700 steps (100 epochs)
INFO - 10/18/20 14:31:37 - 0:00:00 - evaluate every 1 epochs
INFO - 10/18/20 14:31:37 - 0:00:00 - At 0-th epoch with lr 0.001000.
100%|| 2697/2697 [01:10<00:00, 38.40it/s]
INFO - 10/18/20 14:32:47 - 0:01:11 - Running average train loss is 1.5452647511058266 at epoch 0
INFO - 10/18/20 14:32:47 - 0:01:11 - At 1-th epoch with lr 0.001000.
100%|| 2697/2697 [01:06<00:00, 40.65it/s]
INFO - 10/18/20 14:33:54 - 0:02:17 - Running average train loss is 1.218658867061779 at epoch 1
100%|| 338/338 [00:02<00:00, 128.70it/s]
INFO - 10/18/20 14:33:56 - 0:02:19 - Average dev loss is 0.9772854196073035 at epoch 1
  0%|| 0/6741 [00:00<?, ?it/s]
Exception ignored in: <generator object StandardG2P.read_file at 0x2af3a8d8b3d0>
RuntimeError: generator ignored GeneratorExit
Traceback (most recent call last):
  File "src/train.py", line 350, in <module>
    main()
  File "src/train.py", line 346, in main
    trainer.run(start_epoch, decode_fn=decode_fn)
  File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/trainer.py", line 373, in run
    eval_res = self.evaluate(DEV, epoch_idx, decode_fn)
  File "src/train.py", line 255, in evaluate
    decode_fn)
  File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/util.py", line 194, in evaluate_all
    pred, _ = decode_fn(model, src)
  File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/decoding.py", line 64, in __call__
    trg_eos=self.trg_eos)
  File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/decoding.py", line 364, in decode_beam_search
    enc_hs = transducer.encode(src_sentence)
TypeError: encode() missing 1 required positional argument: 'src_mask'
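
Reading the traceback: decode_beam_search calls transducer.encode(src_sentence) with a single argument, while the model's encode() also expects src_mask, which would explain why only the beam path fails. The sketch below reproduces that mismatch with a toy class and shows one way a caller could build a padding mask and pass it along. ToyTransducer, padding_mask, and encode_for_beam are hypothetical names, not functions from the repo, and the mask convention (1.0 for real tokens, 0.0 for padding) is an assumption; the repo may use a different one. PAD_IDX = 0 follows the vocab printed in the log, where '<PAD>' comes first.

```python
PAD_IDX = 0  # '<PAD>' is the first entry of the vocab printed in the log


def padding_mask(src_batch):
    """Return 1.0 for real tokens and 0.0 for padding (assumed convention)."""
    return [[0.0 if tok == PAD_IDX else 1.0 for tok in seq] for seq in src_batch]


class ToyTransducer:
    """Hypothetical stand-in whose encode() requires src_mask, as in the traceback."""

    def encode(self, src_batch, src_mask):
        # Toy "encoding": multiply each token id by its mask value,
        # so padded positions become 0.0.
        return [
            [tok * m for tok, m in zip(seq, mask)]
            for seq, mask in zip(src_batch, src_mask)
        ]


def encode_for_beam(transducer, src_batch):
    """Build the mask before encoding instead of calling encode(src_batch) alone."""
    return transducer.encode(src_batch, padding_mask(src_batch))


if __name__ == "__main__":
    batch = [[5, 6, PAD_IDX], [7, PAD_IDX, PAD_IDX]]
    try:
        ToyTransducer().encode(batch)  # same call shape as in the traceback
    except TypeError as err:
        print("reproduced:", err)
    print(encode_for_beam(ToyTransducer(), batch))
```

Whether the real fix belongs in the decoding function or in encode() itself is a design decision for the maintainers; the sketch only illustrates the shape of the mismatch.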

Strange results running the ACL2019 irregularity example code

hi! I've been playing around with the repo, the code is very nicely organized. but I have a question: I've run the code in example/irregularity-acl19 exactly as shown in the README, and I'm confused by the numbers I'm getting. I ran it on the English UniMorph data following the README, and also on German UniMorph as I'm working with German right now.

according to the README, the output (i.e. in model/unimorph/large/monotag-hmm/{lang}-{fold}.decode.{split}.tsv) contains p(inflected form|lemma, tags) / len(inflected form). I assume this is in the loss column in the TSVs, as that's the only column that makes sense.

here's the distribution of values I get in that column. N is number of predicted forms overall, across all folds and dev/test splits, and N(p > 1) is the number of predicted forms where the listed value for p(inflected form|lemma, tags) / len(inflected form) > 1. the results are split by whether the model correctly predicted the target form.

Lang  Prediction correct?  N       N (p > 1)  mean(p)  min(p)  max(p)
ENG   Yes                  95861   0          0.0023   1e-7    0.2833
ENG   No                   5437    1524       0.9874   5e-2    22.3135
DEU   Yes                  318311  0          0.0027   1e-7    0.2891
DEU   No                   28423   7770       0.8299   5e-2    23.2277

the main thing that confuses me is that the model systematically assigns higher probabilities to forms it gets wrong. (it also looks like there might be a bug somewhere if 28% of the incorrectly predicted forms in each language are assigned a probability greater than one.)

going by the paper, the degree of irregularity metric i should be calculated as -log(p / (1 - p)). applying that to the results above, the average i for words the model got right is 9.6 (ENG) and 11.0 (DEU), while for words it predicted wrong (excluding forms with p > 1, where i is undefined), the average i is 0.7 (ENG, DEU).

this seems completely at odds with the analysis described in the paper. I'm wondering if I've misunderstood something, or run the example incorrectly? any ideas what's going on here?
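
For reference, the metric quoted above can be written as a short function. The sketch below computes -log(p / (1 - p)) and rejects values outside (0, 1), which is where the p > 1 rows break down. The second helper rests on an assumption that is not confirmed anywhere in this thread: that the loss column might hold a length-normalized negative log-likelihood rather than a probability. If that were the case, values above 1 would be ordinary losses, and lower values for correct predictions would be expected. Both function names are illustrative, not taken from the repo.

```python
import math


def irregularity(p: float) -> float:
    """Degree of irregularity -log(p / (1 - p)), defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError(f"p must lie strictly between 0 and 1, got {p}")
    return math.log((1.0 - p) / p)  # same value as -log(p / (1 - p))


def prob_from_avg_nll(avg_nll: float, length: int) -> float:
    """ASSUMPTION: treat the column as -log p(form) / len(form) and recover p."""
    return math.exp(-avg_nll * length)


if __name__ == "__main__":
    print(irregularity(0.5))            # log-odds of an even chance
    print(prob_from_avg_nll(0.0, 5))    # zero loss corresponds to p = 1
```

Checking a handful of rows by hand against the model's actual log-probabilities would be one way to tell which reading of the column is right.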
