Comments (6)
Never mind, I saw someone else's comment about marian-vocab. It might be useful to add that to the documentation somewhere? Thanks!
from opus-mt-train.
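For anyone landing here: marian-vocab reads tokenized text on stdin and writes a YAML token-to-id map to stdout. As a rough illustration of the kind of mapping it produces, here is a toy Python sketch (the reserved `</s>`/`<unk>` ids follow Marian's convention; the function name and data are made up for illustration, this is not the real tool):

```python
from collections import Counter

def build_marian_style_vocab(lines, max_size=32000):
    """Toy approximation of marian-vocab's output: reserved tokens first,
    then remaining tokens by descending frequency, as a token -> id map."""
    counts = Counter(tok for line in lines for tok in line.split())
    vocab = {"</s>": 0, "<unk>": 1}  # Marian's reserved entries
    for tok, _ in counts.most_common():
        if tok in vocab:
            continue
        if len(vocab) >= max_size:
            break
        vocab[tok] = len(vocab)
    return vocab

# Concatenating source and target text before building gives one shared vocab:
src = ["das ist ein test", "noch ein test"]
trg = ["this is a test", "another test"]
vocab = build_marian_style_vocab(src + trg)
print(len(vocab))  # 11: two reserved tokens + nine unique tokens
```

The point is simply that both language sides go into a single shared vocab, which is why the combined size can be much larger than either side alone.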
Actually, I'm going to reopen this as I'm still confused. I successfully created a vocab.yml file by concatenating the source and target vocabs and passing them to marian-vocab, but when I try to convert my packaged model to Huggingface I get an error (the converter's assertion message is formatted as `Original vocab size {opus_state.cfg['vocab_size']} and new vocab size {len(tokenizer.encoder)} mismatched`):

AssertionError: Original vocab size 32001 and new vocab size 61724 mismatched

Is there something I'm missing here? Should I ask this question over at Huggingface?
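The mismatch is what the message says: the model was trained with a 32k SentencePiece vocab (32001 entries), while the concatenated source+target vocab has 61724 entries. A minimal sketch of the check the converter performs (function and variable names here are illustrative, not the actual Huggingface converter code):

```python
def check_vocab_size(model_vocab_size, vocab):
    """Mimic the converter's assertion: the number of entries in vocab.yml
    must equal the vocab_size recorded in the model's config."""
    assert model_vocab_size == len(vocab), (
        f"Original vocab size {model_vocab_size} and "
        f"new vocab size {len(vocab)} mismatched"
    )

# The trained model's config says 32001, but a vocab built by concatenating
# the source and target vocabs can easily be much larger:
trained_size = 32001                                 # from the model config
merged_vocab = {f"tok{i}": i for i in range(61724)}  # hypothetical merged vocab

try:
    check_vocab_size(trained_size, merged_vocab)
except AssertionError as err:
    print(err)  # Original vocab size 32001 and new vocab size 61724 mismatched
```

So the fix has to come from the data side: the vocab file handed to the converter must be the one the model was actually trained with, not a larger merged one.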
Looked at the commits and figured out that the old .yml vocab can be produced by using the USE_SPM_VOCAB=0 flag when creating the data. Does that mean that, for now, I should train all models with that flag if I want to port them over to Huggingface?
Yes, sorry, this is what you need for now, but I will talk to the people at Huggingface about also supporting the plain-text vocab files that are taken from the SentencePiece models. It's a bit of a moving target.
Great, thanks. Happy to follow up with them or help with the porting if I can. Feel free to leave this issue up or close it if you think it's not directly relevant for this project, up to you.
Changed it now to have USE_SPM_VOCAB=0 as the default. That seems more backward compatible with everything ...
Related Issues (20)
- What's the dataset used for training opus-mt-en-de
- Language Code Difference
- What is tatoeba-langtune?
- Preprocessing Script Question
- Korean Finetuning
- Multilingual Tuned Model Translating everything to "sssssssss"
- What could cause widely varying inference time when using pre-trained opus-mt-en-fr model with python transformers library?
- Wrong tokenizer/vocab for the 'Helsinki-NLP/opus-mt-tc-big-en-ko' model
- How to translate from english to Japan?
- Using OPUS-MT with DeepSpeed
- update Dockerfile.gpu--fixed
- different sizes of dictionaries in different models
- Reproduced crash on Opus-mt-en-de model using string "J" and "J-10"
- Unable to find current origin/master revision in submodule path
- Hyperparameters used for pretrained models?
- how to train our dataset
- Unbelievably High BLEU scores from finetuning...
- Data for Brazilian Portuguese
- Lack of transparency on used training data. - Does finetuning make sense?
- preprocess.sh [: ==: unary operator expected