
sonar's Introduction

SONAR

[Paper] [Demo]

We introduce SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. It substantially outperforms existing sentence embeddings such as LASER3 and LaBSE on the xsim and xsim++ multilingual similarity search tasks.

Speech segments can be embedded in the same SONAR embedding space using language-specific speech encoders trained in a teacher-student setting on speech transcription data. We also provide a single text decoder, which allows us to perform text-to-text and speech-to-text machine translation, including for zero-shot language and modality combinations.

SONAR stands for Sentence-level multimOdal and laNguage-Agnostic Representations

The full list of supported languages (along with download links) can be found below.

SONAR Architecture:


Text results


Speech results


Installing

You can install SONAR with pip install sonar-space. Note that there is another sonar package on pip that IS NOT this project; make sure to use sonar-space in your dependencies.

If you want to install SONAR manually, you can install it locally. SONAR depends mainly on fairseq2 and can be installed using the following commands (tested with python=3.8):

pip install --upgrade pip
pip install -e .

If fairseq2 does not provide a build for your machine, check the readme of that project to build it locally.

Usage

fairseq2 will automatically download models into your $TORCH_HOME/hub directory upon using the commands below.
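
If you want the checkpoints stored elsewhere, overriding $TORCH_HOME before the first import should redirect the downloads. A minimal sketch under that assumption (the cache path below is a placeholder):

import os

# Placeholder path (an assumption for this sketch); must be set before the
# first import of sonar/fairseq2 so the download manager picks it up.
os.environ["TORCH_HOME"] = "/path/to/model/cache"

from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline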

Compute text sentence embeddings with SONAR:

from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline
t2vec_model = TextToEmbeddingModelPipeline(encoder="text_sonar_basic_encoder",
                                           tokenizer="text_sonar_basic_encoder")
sentences = ['My name is SONAR.', 'I can embed the sentences into vectorial space.']
embeddings = t2vec_model.predict(sentences, source_lang="eng_Latn")
print(embeddings.shape)
# torch.Size([2, 1024])
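
Since SONAR embeddings live in a shared multilingual space, sentences in different languages can be compared directly. A minimal sketch (not part of the official API) using cosine similarity:

import torch.nn.functional as F

emb_en = t2vec_model.predict(["My name is SONAR."], source_lang="eng_Latn")
emb_fr = t2vec_model.predict(["Je m'appelle SONAR."], source_lang="fra_Latn")
# Semantically equivalent sentences should score close to 1.0.
print(F.cosine_similarity(emb_en, emb_fr).item())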

Reconstruct text from SONAR embeddings

from sonar.inference_pipelines.text import EmbeddingToTextModelPipeline
vec2text_model = EmbeddingToTextModelPipeline(decoder="text_sonar_basic_decoder",
                                              tokenizer="text_sonar_basic_encoder")
reconstructed = vec2text_model.predict(embeddings, target_lang="eng_Latn", max_seq_len=512)
# max_seq_len is a keyword argument passed to the fairseq2 BeamSearchSeq2SeqGenerator.
print(reconstructed)
# ['My name is SONAR.', 'I can embed the sentences into vector space.']
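
Because the encoder and decoder are separate pipelines, a precomputed embedding can be decoded into any supported language without re-encoding the source. A small sketch reusing the two pipelines above (the French output shown is only indicative):

embeddings = t2vec_model.predict(["My name is SONAR."], source_lang="eng_Latn")
# Decode the same embedding into French instead of English.
print(vec2text_model.predict(embeddings, target_lang="fra_Latn", max_seq_len=512))
# e.g. ['Mon nom est SONAR.']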

Translate text with SONAR

from sonar.inference_pipelines.text import TextToTextModelPipeline
t2t_model = TextToTextModelPipeline(encoder="text_sonar_basic_encoder",
                                    decoder="text_sonar_basic_decoder",
                                    tokenizer="text_sonar_basic_encoder")  # tokenizer is attached to both encoder and decoder cards

sentences = ['My name is SONAR.', 'I can embed the sentences into vectorial space.']
t2t_model.predict(sentences, source_lang="eng_Latn", target_lang="fra_Latn")
# ['Mon nom est SONAR.', "Je peux intégrer les phrases dans l'espace vectoriel."]

Compute speech sentence embeddings with SONAR

from sonar.inference_pipelines.speech import SpeechToEmbeddingModelPipeline
s2vec_model = SpeechToEmbeddingModelPipeline(encoder="sonar_speech_encoder_eng")

s2vec_model.predict(["./tests/integration_tests/data/audio_files/audio_1.wav",
                     "./tests/integration_tests/data/audio_files/audio_2.wav"]).shape
# torch.Size([2, 1024])
import torchaudio
inp, sr = torchaudio.load("./tests/integration_tests/data/audio_files/audio_1.wav")
assert sr == 16000, "Sample rate should be 16kHz"

s2vec_model.predict([inp]).shape
# torch.Size([1, 1024])
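
Speech and text encoders target the same embedding space, so a speech embedding can be compared directly with a text embedding. A sketch reusing t2vec_model from the text section above (the transcript is taken from the speech-to-text example below):

import torch.nn.functional as F

speech_emb = s2vec_model.predict(["./tests/integration_tests/data/audio_files/audio_1.wav"])
text_emb = t2vec_model.predict(["Television reports show white smoke coming from the plant."],
                               source_lang="eng_Latn")
# High cosine similarity suggests the audio and transcript encode the same meaning.
print(F.cosine_similarity(speech_emb, text_emb).item())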

Speech-to-text translation with SONAR

from sonar.inference_pipelines.speech import SpeechToTextModelPipeline

s2t_model = SpeechToTextModelPipeline(encoder="sonar_speech_encoder_eng",
                                      decoder="text_sonar_basic_decoder",
                                      tokenizer="text_sonar_basic_decoder")

import torchaudio
inp, sr = torchaudio.load("./tests/integration_tests/data/audio_files/audio_1.wav")
assert sr == 16000, "Sample rate should be 16kHz"

# passing loaded audio files
s2t_model.predict([inp], target_lang="eng_Latn")
# ['Television reports show white smoke coming from the plant.']

# passing multiple wav files 
s2t_model.predict(["./tests/integration_tests/data/audio_files/audio_1.wav",
                   "./tests/integration_tests/data/audio_files/audio_2.wav"], target_lang="eng_Latn")
# ['Television reports show white smoke coming from the plant.',
# 'These couples may choose to make an adoption plan for their baby.']
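
The speech pipelines also accept a device argument (the SpeechToEmbeddingModelPipeline signature in the issue reports below shows encoder, device, and fbank_dtype parameters), so inference can run on GPU. A sketch assuming SpeechToTextModelPipeline exposes the same parameter:

import torch

s2t_model_gpu = SpeechToTextModelPipeline(encoder="sonar_speech_encoder_eng",
                                          decoder="text_sonar_basic_decoder",
                                          tokenizer="text_sonar_basic_decoder",
                                          device=torch.device("cuda"))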

Predicting sentence similarity with BLASER 2.0 models

BLASER 2.0 is a family of models for automatic evaluation of machine translation quality based on SONAR embeddings. They predict cross-lingual semantic similarity between the translation and the source (optionally, also using a reference translation).

from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline
from sonar.models.blaser.loader import load_blaser_model

blaser_ref = load_blaser_model("blaser_2_0_ref").eval()
blaser_qe = load_blaser_model("blaser_2_0_qe").eval()
text_embedder = TextToEmbeddingModelPipeline(encoder="text_sonar_basic_encoder", tokenizer="text_sonar_basic_encoder")

src_embs = text_embedder.predict(["Le chat s'assit sur le tapis."], source_lang="fra_Latn")
ref_embs = text_embedder.predict(["The cat sat on the mat."], source_lang="eng_Latn")
mt_embs = text_embedder.predict(["The cat sat down on the carpet."], source_lang="eng_Latn")

print(blaser_ref(src=src_embs, ref=ref_embs, mt=mt_embs).item())  # 4.688
print(blaser_qe(src=src_embs, mt=mt_embs).item())  # 4.708
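
The BLASER models operate on batches, so several candidate translations can be scored against one source in a single call. A minimal sketch reusing the models and embeddings above (the candidate sentences are made up for illustration):

candidates = ["The cat sat down on the carpet.", "The dog sat on the mat."]
cand_embs = text_embedder.predict(candidates, source_lang="eng_Latn")
# Repeat the single source embedding to match the candidate batch size.
scores = blaser_qe(src=src_embs.repeat(len(candidates), 1), mt=cand_embs)
print(scores.squeeze(-1).tolist())  # higher = closer to the source meaning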

Detailed model cards with more examples: facebook/blaser-2.0-ref, facebook/blaser-2.0-qe.

Demo notebooks

More complete demo notebooks are available in the repository.

Supported languages and download links

The SONAR text encoder & decoder support 200 languages. SONAR speech encoders support 37 languages.

Available text encoders/decoders
model link
encoder download
decoder download
finetuned decoder download
tokenizer download

All 200 languages from the No Language Left Behind project are supported.
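
Language codes follow the NLLB convention: an ISO 639-3 language code joined with an ISO 15924 script code (for example, English is eng_Latn). If you work with two-letter codes, a small hand-rolled mapping is one option; the helper below is hypothetical, not part of the SONAR API:

# Hypothetical helper (not part of the SONAR API): map ISO 639-1 codes to
# NLLB-style codes for a few common languages.
TWO_LETTER_TO_SONAR = {
    "en": "eng_Latn",
    "de": "deu_Latn",
    "fr": "fra_Latn",
    "pt": "por_Latn",
}
print(TWO_LETTER_TO_SONAR["en"])  # 'eng_Latn'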

Available speech encoders
lang_code language link
arb modern standard arabic download
asm assamese download
bel belarusian download
ben bengali download
bos bosnian download
bul bulgarian download
cat catalan download
ces czech download
cmn mandarin chinese download
cym welsh download
dan danish download
deu german download
est estonian download
fin finnish download
fra french download
guj gujarati download
heb hebrew download
hin hindi download
hrv croatian download
ind indonesian download
ita italian download
jpn japanese download
kan kannada download
kor korean download
lao lao download
lit lithuanian download
lvs standard latvian download
mal malayalam download
mar marathi download
mkd macedonian download
mlt maltese download
npi nepali download
nld dutch download
ory odia download
pan punjabi download
pes western persian download
pol polish download
por portuguese download
ron romanian download
rus russian download
slk slovak download
slv slovenian download
snd sindhi download
srp serbian download
spa spanish download
swe swedish download
swh swahili download
tam tamil download
tel telugu download
tgl tagalog download
tha thai download
tur turkish download
ukr ukrainian download
urd urdu download
uzn northern uzbek download
vie vietnamese download
yue yue chinese download

Citation Information

Please cite the paper when referencing the SONAR embedding space, encoders and decoders as:

@misc{Duquenne:2023:sonar_arxiv,
  author = {Paul-Ambroise Duquenne and Holger Schwenk and Benoit Sagot},
  title = {{SONAR:} Sentence-Level Multimodal and Language-Agnostic Representations},
  publisher = {arXiv},
  year = {2023},
  url = {https://arxiv.org/abs/2308.11466},
}

Contributing

See the CONTRIBUTING file for how to help out.

License

SONAR code is released under the MIT license (see CODE_LICENSE).

Some of the SONAR models are released under the same MIT license, BUT BEWARE: some of them are released under a non-commercial license (see NC_MODEL_LICENSE). Please refer to LICENSE for the details.

sonar's Issues

Possible language-specific alignment issue with Alternative Spelling or Capitalization Rules, for future improvement of the SONAR cross-lingual vector space and the BLASER quality measure

Problem description, for possible scientific research, with more details: Alternative Spelling rules in some languages for benchmarking embedding models

Colab to reproduce (.ipynb and .py) with a quick SONAR and BLASER test:
SONAR_BLASER_Alternative_Spelling_or_Capitalization_Rules_TEST.zip

For SONAR and BLASER 2.0, we can observe a decrease (sometimes significant) in the similarity metric for words/sentences written with Alternative Spelling or Capitalization Rules:

  • Word-level results (EN-DE, same EN-DE word in Alternative Spelling)

  • Word-level results (DE-DE word in Alternative Spelling)

  • Word-level results (EN-DE test Capitalization Rules)

  • Sentence-level results (EN-DE, same EN-DE with German Alternative Spelling and Capitalization Rules)

  • Sentence-level results (one German language, sentence with words written in Alternative Spelling)

Error downloading Mandarin speech encoder

Code to reproduce:

import torch
from sonar.inference_pipelines.speech import SpeechToEmbeddingModelPipeline
s2vec_model = SpeechToEmbeddingModelPipeline(encoder="sonar_speech_encoder_cmn", device=torch.device("cuda"))

Expected behavior: the code works.
Actual behavior: HTTP Error 403: Forbidden.

Full traceback:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 s2vec_model = SpeechToEmbeddingModelPipeline(encoder="sonar_speech_encoder_cmn", device=torch.device("cuda"))

File ~/.conda/envs/external-sonar/lib/python3.10/site-packages/sonar/inference_pipelines/speech.py:394, in SpeechToEmbeddingModelPipeline.__init__(self, encoder, device, fbank_dtype)
    391 super().__init__(fbank_dtype)
    393 if isinstance(encoder, str):
--> 394     encoder = load_sonar_speech_model(encoder, device=device, progress=False)
    395 self.model = encoder.to(device).eval()

File ~/.conda/envs/external-sonar/lib/python3.10/site-packages/fairseq2/models/utils/model_loader.py:182, in ModelLoader.__call__(self, model_name_or_card, force, progress, device, dtype)
    179 # Load the checkpoint.
    180 uri = card.field("checkpoint").as_uri()
--> 182 pathname = self.download_manager.download_checkpoint(
    183     uri, card.name, force=force, progress=progress
    184 )
    186 checkpoint = load_checkpoint(
    187     pathname,
    188     card.name,
    189     map_location="cpu",
    190     converter=partial(self._upgrade_checkpoint, config=config),
    191 )
    193 try:
    194     # Try to construct the model on the meta device.

File ~/.conda/envs/external-sonar/lib/python3.10/site-packages/fairseq2/assets/download_manager.py:119, in DefaultAssetDownloadManager.download_checkpoint(self, uri, model_name, checkpoint_name, shard_idx, force, progress)
    115     display_name = f"{display_name} (shard {shard_idx})"
    117 pathname = self._get_pathname(uri, sub_dir="checkpoints")
--> 119 self._download_file(uri, pathname, display_name, force, progress)
    121 return pathname

File ~/.conda/envs/external-sonar/lib/python3.10/site-packages/fairseq2/assets/download_manager.py:223, in DefaultAssetDownloadManager._download_file(self, uri, pathname, display_name, force, progress)
    221     response = urlopen(uri)
    222 except HTTPError as ex:
--> 223     raise_connection_error(ex)
    225 with response, NamedTemporaryFile(delete=False, dir=pathname.parent) as fp:
    226     headers = response.info()

File ~/.conda/envs/external-sonar/lib/python3.10/site-packages/fairseq2/assets/download_manager.py:221, in DefaultAssetDownloadManager._download_file(self, uri, pathname, display_name, force, progress)
    218     _print_progress(f"Downloading the {display_name}...")
    220 try:
--> 221     response = urlopen(uri)
    222 except HTTPError as ex:
    223     raise_connection_error(ex)

File ~/.conda/envs/external-sonar/lib/python3.10/urllib/request.py:216, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    214 else:
    215     opener = _opener
--> 216 return opener.open(url, data, timeout)

File ~/.conda/envs/external-sonar/lib/python3.10/urllib/request.py:525, in OpenerDirector.open(self, fullurl, data, timeout)
    523 for processor in self.process_response.get(protocol, []):
    524     meth = getattr(processor, meth_name)
--> 525     response = meth(req, response)
    527 return response

File ~/.conda/envs/external-sonar/lib/python3.10/urllib/request.py:634, in HTTPErrorProcessor.http_response(self, request, response)
    631 # According to RFC 2616, "2xx" code indicates that the client's
    632 # request was successfully received, understood, and accepted.
    633 if not (200 <= code < 300):
--> 634     response = self.parent.error(
    635         'http', request, response, code, msg, hdrs)
    637 return response

File ~/.conda/envs/external-sonar/lib/python3.10/urllib/request.py:563, in OpenerDirector.error(self, proto, *args)
    561 if http_err:
    562     args = (dict, 'default', 'http_error_default') + orig_args
--> 563     return self._call_chain(*args)

File ~/.conda/envs/external-sonar/lib/python3.10/urllib/request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    494 for handler in handlers:
    495     func = getattr(handler, meth_name)
--> 496     result = func(*args)
    497     if result is not None:
    498         return result

File ~/.conda/envs/external-sonar/lib/python3.10/urllib/request.py:643, in HTTPDefaultErrorHandler.http_error_default(self, req, fp, code, msg, hdrs)
    642 def http_error_default(self, req, fp, code, msg, hdrs):
--> 643     raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: HTTP Error 403: Forbidden

Training on lower precision

Hi, great work done here!
Have you tried training or running inference with the models at lower precision? What is the performance loss from doing so?

Finetuning Speech Encoders further

Hi,

I tried finetuning the Swahili speech encoder, but the performance only increases to 9.6 BLEU from a base BLEU score of 7.5 with your already finetuned encoder. I finetuned the speech encoder for 5 epochs with augmented data. I am not willing to try more epochs, as the performance increase is not what I had imagined. I finetuned with about 30 hours of data. The MSE loss in the last epoch was 1.5e-6. Is there a different approach that might help achieve a better BLEU?

Also, where is the finetuned decoder checkpoint that, as I read in the paper, does well for Swahili? When I try to use it, I get the error ValueError: The input sequence length must be less than or equal to the maximum sequence length (512), but is 513 instead, which I do not get with the normal decoder. All my audio clips are 30 seconds or shorter.

Thank you for your time!

Language Code Mappings [Text & Speech]

Hi Team,

Is there a clear mapping between languages in the two-letter format (e.g. en, de, fr, pt, ...) and the format used by SONAR? Is there a conversion script somewhere, or a clear mapping and explanation of the language codes?

In particular, it seems there is a speech format:
https://github.com/facebookresearch/SONAR/blob/main/sonar/cards/sonar_speech_encoder.yaml

And there is a text format:
https://github.com/facebookresearch/SONAR/blob/main/sonar/cards/text_sonar_basic_encoder.yaml

Thank you.

RuntimeError at Ray Serve Deployment: Mismatched Devices

When running the model under Ray Serve, I encountered a RuntimeError suggesting a device mismatch ("Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!").

Error logs:

(ServeReplica:default_FastAPIDeployment pid=3985235) ERROR 2023-08-25 08:10:39,337 default_FastAPIDeployment default_FastAPIDeployment#IxFVYy KOFVbAbqev /embedding default replica.py:636 - Request failed due to RayTaskError(DataPipelineError):
(ServeReplica:default_FastAPIDeployment pid=3985235) Traceback (most recent call last):
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 633, in invoke_single
(ServeReplica:default_FastAPIDeployment pid=3985235)     result = await method_to_call(*request_args, **request_kwargs)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/ray/serve/_private/http_util.py", line 411, in __call__
(ServeReplica:default_FastAPIDeployment pid=3985235)     await self._asgi_app(
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/fastapi/applications.py", line 289, in __call__
(ServeReplica:default_FastAPIDeployment pid=3985235)     await super().__call__(scope, receive, send)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
(ServeReplica:default_FastAPIDeployment pid=3985235)     await self.middleware_stack(scope, receive, send)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
(ServeReplica:default_FastAPIDeployment pid=3985235)     raise exc
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
(ServeReplica:default_FastAPIDeployment pid=3985235)     await self.app(scope, receive, _send)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
(ServeReplica:default_FastAPIDeployment pid=3985235)     raise exc
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
(ServeReplica:default_FastAPIDeployment pid=3985235)     await self.app(scope, receive, sender)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
(ServeReplica:default_FastAPIDeployment pid=3985235)     raise e
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
(ServeReplica:default_FastAPIDeployment pid=3985235)     await self.app(scope, receive, send)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
(ServeReplica:default_FastAPIDeployment pid=3985235)     await route.handle(scope, receive, send)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
(ServeReplica:default_FastAPIDeployment pid=3985235)     await self.app(scope, receive, send)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
(ServeReplica:default_FastAPIDeployment pid=3985235)     response = await func(request)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/fastapi/routing.py", line 273, in app
(ServeReplica:default_FastAPIDeployment pid=3985235)     raw_response = await run_endpoint_function(
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/fastapi/routing.py", line 190, in run_endpoint_function
(ServeReplica:default_FastAPIDeployment pid=3985235)     return await dependant.call(**values)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/Github/sonar_testing/source/deployments/fast_api.py", line 119, in encode_sentences
(ServeReplica:default_FastAPIDeployment pid=3985235)     embeddings = ray.get(ref)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeReplica:default_FastAPIDeployment pid=3985235)     return fn(*args, **kwargs)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeReplica:default_FastAPIDeployment pid=3985235)     return func(*args, **kwargs)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/ray/_private/worker.py", line 2524, in get
(ServeReplica:default_FastAPIDeployment pid=3985235)     raise value.as_instanceof_cause()
(ServeReplica:default_FastAPIDeployment pid=3985235) ray.exceptions.RayTaskError(DataPipelineError): ray::ServeReplica:default_SentenceEncoder.handle_request() (pid=3985222, ip=192.168.4.101)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
(ServeReplica:default_FastAPIDeployment pid=3985235)     return forward_call(*args, **kwargs)
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/sonar/models/sonar_text/model.py", line 112, in forward
(ServeReplica:default_FastAPIDeployment pid=3985235)     sentence_embeddings = self.sentence_embedding_pooling(
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/sonar/models/sonar_text/model.py", line 96, in sentence_embedding_pooling
(ServeReplica:default_FastAPIDeployment pid=3985235)     sentence_embedding = torch.einsum(
(ServeReplica:default_FastAPIDeployment pid=3985235)   File "/data/share/user/simon.choi/.virtualenv/sonar_testing/lib/python3.10/site-packages/torch/functional.py", line 378, in einsum
(ServeReplica:default_FastAPIDeployment pid=3985235)     return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
(ServeReplica:default_FastAPIDeployment pid=3985235) RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Fix:
It can be fixed by specifying the device when creating padding_mask at source/models/sonar_text/model.py:80:

if padding_mask is None:
    padding_mask = torch.zeros(seqs.shape[:2], device=seqs.device)

How to Finetune with X->English data?

I have a dataset with audio in language X and corresponding English translations. Should I finetune the encoder to match the vector space of the encoded English text, or should I finetune the decoder after freezing the X audio encoder parameters?

Thank you for your response!

embedding -> text pipeline

Thanks for your amazing work on this project.
Curious if you plan to create a simple wrapper for an embedding-to-text model pipeline? Basically a decoder-only pipeline that leverages precomputed embeddings to translate into a variety of languages, rather than having to re-create the embeddings with the text-to-text pipeline over and over again.

Thanks!

change max seq len

Is there a way to change max_seq_len from 514 to 1024, for example?
Alternatively, is there a way to compute the current sequence length of a text, to avoid overly long inputs? The tokenizer doesn't have a method to return tokens.
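
One possible workaround for the length check, reusing the t2vec_model pipeline from the usage section above and assuming it exposes its fairseq2 tokenizer as a tokenizer attribute (an assumption; check your installed version):

# Hedged sketch: count tokens before embedding, assuming the pipeline exposes
# its fairseq2 tokenizer as `tokenizer` (an assumption; check your version).
token_encoder = t2vec_model.tokenizer.create_encoder(lang="eng_Latn")
n_tokens = len(token_encoder("Some possibly long input text."))
print(n_tokens)  # consider splitting inputs whose token count approaches 512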

[INPUT] Text (or Speech) Length of Blaser 2.0

For translation quality estimation with BLASER 2.0, I think there is no limit on the text (or speech) length. However, from my personal perspective, I do not think the estimate will be accurate if the text (or speech) is too long.

So, what text length and speech length (for source, reference, and hypothesis) do you recommend?
