rosewang2008 / language_modeling_via_stochastic_processes
Language modeling via stochastic processes. Oral @ ICLR 2022.
Hi Rose and other authors,
I found your work quite interesting, but I'm confused about some details and hope you can help clarify them:
Are there instructions on how to reproduce the results (i.e., the numbers in the tables) in the paper? I can understand that you cannot share the large trained models, but could you release scripts and instructions for reproducing the numbers with small models?
For example, it's not clear to me how you calculated the length mismatch in Table 3. In Appendix E (Wikisection) you said "The length mismatch in % used in Table 3 is calculated with respect to the training set lengths", but in your code, at lines 308-311 of language_modeling_via_stochastic_processes/transformers/examples/pytorch/text-generation/generation_metrics.py, the statistics used for Wikisection match neither the training nor the test statistics. Which split did you actually use? Besides, how did you calculate the exact numbers? Did you compare the absolute difference between the average section lengths of each section type, or did you compare the absolute difference between corresponding examples (since each generation has a corresponding ground truth from which the starting latent variables come) and then take the average? Did you average over all section types to produce a single number? And in the forced long generation, how did you compute the mismatch?
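To make the two interpretations concrete, here is a minimal sketch with made-up per-example section lengths (the numbers are purely illustrative, not from the paper or code):

```python
# Hypothetical per-example section lengths for one section type:
# ground-truth lengths vs. lengths of the corresponding generations.
gt_lengths = [120, 80, 100, 95]
gen_lengths = [110, 90, 100, 85]

# Interpretation A: compare the *average* lengths of the section type.
avg_gt = sum(gt_lengths) / len(gt_lengths)
avg_gen = sum(gen_lengths) / len(gen_lengths)
mismatch_avg = abs(avg_gen - avg_gt) / avg_gt * 100

# Interpretation B: compare each generation to its corresponding ground
# truth, then average the per-example relative differences.
mismatch_per_ex = sum(
    abs(gen - gt) / gt for gen, gt in zip(gen_lengths, gt_lengths)
) / len(gt_lengths) * 100

print(mismatch_avg, mismatch_per_ex)
```

The two interpretations can give quite different numbers (here roughly 2.5% vs. 7.8%), which is why knowing which one was used matters for reproducing Table 3.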
As another example, it is not clear whether the provided generation setting is for the experiments in Table 3 or Table 4: since no_eos was set in the command, it looks like the forced long generation setting, but Table 4 doesn't use the Wikisection dataset. Can you clarify which experiment this generation command is for?
Lastly, I noticed a seeming inconsistency between your paper and code. In the paper (Appendix C.1) you said "We first sample a start and end latent, z_0 ∼ p(z_0), z_T ∼ p(z_T), where p(z_0), p(z_T) are calculated as the density estimates over the training dataset." In the code, however, you only sample z_T from the Gaussian estimated on the training set; z_0 is taken directly from the encoded first ground-truth sentence: https://github.com/rosewang2008/language_modeling_via_stochastic_processes/blob/main/language_modeling_via_stochastic_processes/transformers/examples/pytorch/text-generation/run_decoding_from_embeddings.py#L371 In fact, the p(z_0) estimated at this line is never used: https://github.com/rosewang2008/language_modeling_via_stochastic_processes/blob/main/language_modeling_via_stochastic_processes/transformers/examples/pytorch/text-generation/run_decoding_from_embeddings.py#L292
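To illustrate the difference being asked about, here is a hypothetical 1-d sketch (the variable names and the stand-in data are mine, not the repo's):

```python
import random

random.seed(0)
# Stand-in for the encoded first sentences of the training set (1-d latents
# for simplicity; the real z_0 is a vector).
train_z0s = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Paper (Appendix C.1): fit a density estimate over the training z_0s and
# sample a fresh start latent from it.
mu = sum(train_z0s) / len(train_z0s)
var = sum((z - mu) ** 2 for z in train_z0s) / len(train_z0s)
z0_sampled = random.gauss(mu, var ** 0.5)

# Code (run_decoding_from_embeddings.py#L371): take the encoded first
# ground-truth sentence directly, so the fitted p(z_0) is never used.
z0_from_gt = train_z0s[0]  # stand-in for encoder(first ground-truth sentence)
```

The first variant generates bridges from novel start points; the second pins every generation to its ground-truth start, which changes what the evaluation measures.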
Looking forward to your reply!
Thanks,
Yuntian
It seems that ROC Stories uses a different setup, text infilling, but I can't find code for infilling in this repository. Am I missing something, or is the code not part of this repo? Thank you in advance!
It seems to me that this line should be changed to if 'tm' in self.name (using self.start_conversation and self.end_conversation) to split the training and test sets.
Also, the paper text "We pin our trajectory to the start and end latent, and run the Brownian bridge using Equation ??" contains a broken equation reference.
Please check it, thanks!
I read your paper recently and found it really impressive. As I didn't register for ICLR, I want to ask a simple (maybe embarrassing) question here.
I thought the function d() in the loss you proposed computes the distance between the hidden vector of x_t and µ_t (which lies on the line from z_0 to z_T). So does a greater distance between z_t and µ_t make d() bigger? If so, how does the contrastive loss you proposed work? I thought the objective of the loss is to increase d(z_t, µ_t) and to decrease d(z', µ_t).
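For what it's worth, here is a minimal sketch of my reading (not necessarily the authors' exact formulation): if d is taken as a *negative* squared distance, then a z_t closer to µ_t gives a larger d, and the softmax-style objective pulls the positive toward µ_t while pushing negatives away:

```python
import math

def d(z, mu, sigma2=1.0):
    # negative squared distance: larger (closer to 0) when z is near mu
    return -((z - mu) ** 2) / (2 * sigma2)

def contrastive_loss(z_pos, negatives, mu_t):
    # minimize -log softmax: -(d(z_pos, mu_t) - log sum_z' exp(d(z', mu_t)))
    logits = [d(z_pos, mu_t)] + [d(zn, mu_t) for zn in negatives]
    log_denom = math.log(sum(math.exp(l) for l in logits))
    return -(d(z_pos, mu_t) - log_denom)

mu_t = 0.0
# the loss is smaller when the positive sits near mu_t than far from it
near = contrastive_loss(0.1, negatives=[2.0, -1.5], mu_t=mu_t)
far = contrastive_loss(1.9, negatives=[2.0, -1.5], mu_t=mu_t)
print(near < far)  # True
```

Under this sign convention there is no contradiction: maximizing d(z_t, µ_t) relative to the negatives is the same as minimizing the distance of z_t to µ_t.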
Hi, how should I set the path2huggenface variable in constant.py?
Thanks a lot!
Thank you very much for open-sourcing your work.
For these two lines, when I run them, I find that the attribute A of the dataset cannot be found, and the corresponding data_params.dt cannot be found in the brownian_bridge.yaml file.
I don't know if I'm doing something wrong; I look forward to your answer.
Hello,
Thanks for the amazing work. I run into this error when using a larger batch size to train the decoder:
File "/data/khalifam/envs/tc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
File "/data/khalifam/tc/language_modeling_via_stochastic_processes/transformers/src/transformers/models/gpt2/modeling_time_gpt2.py", line 1163, in forward
    transformer_outputs = self.transformer(
File "/data/khalifam/envs/tc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
    result = forward_call(*input, **kwargs)
File "/data/khalifam/tc/language_modeling_via_stochastic_processes/transformers/src/transformers/models/gpt2/modeling_time_gpt2.py", line 991, in forward
    outputs = block(
File "/data/khalifam/envs/tc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
File "/data/khalifam/tc/language_modeling_via_stochastic_processes/transformers/src/transformers/models/gpt2/modeling_time_gpt2.py", line 321, in forward
    attn_outputs = self.attn(
File "/data/khalifam/envs/tc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
File "/data/khalifam/tc/language_modeling_via_stochastic_processes/transformers/src/transformers/models/gpt2/modeling_time_gpt2.py", line 247, in forward
    query, key, value = self.c_attn(hidden_states).split(self.split_size, dim=2)
ValueError: not enough values to unpack (expected 3, got 1)
I assume the code only works with bsz=1, but I wanted to make sure.
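One torch-free way to see how this unpack can fail (a hypothetical sketch of the mechanics, not a diagnosis of the actual bug): tensor.split(split_size, dim=2) yields one chunk per split_size slice of that dimension, so q, k, v only unpack cleanly if the last dimension is exactly 3 * split_size.

```python
def n_split_chunks(dim_size, split_size):
    # number of chunks torch's Tensor.split would return along one dimension
    return dim_size // split_size + (1 if dim_size % split_size else 0)

embed_dim = 768
# expected case: c_attn projects to 3 * embed_dim, so split gives (q, k, v)
assert n_split_chunks(3 * embed_dim, embed_dim) == 3

# if the hidden states reach attention with only embed_dim in the last
# dimension (e.g., a batching/shape mix-up upstream), split returns a
# single chunk and the three-way unpack raises
try:
    q, k, v = [None] * n_split_chunks(embed_dim, embed_dim)
except ValueError as e:
    print(e)  # not enough values to unpack (expected 3, got 1)
```

So the error message suggests the c_attn output reaching line 247 has the wrong last-dimension size when the batch size exceeds 1.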
Thanks for open-sourcing your code!
However, there seems to be a problem: in run_decoding_from_embeddings.py, line 91 reads x_tp1 = x_t.
This code doesn't work; should the assignment be reversed?
"datasets=2.0.0" → "datasets==2.0.0"
Hi Rose,
The issue occurred during the testing phase. The error message reads:
ValueError: 'self.log(test_loss, [3.5329943])' was called, but 'ndarray' values cannot be logged
and it was raised from pytorch_lightning/core/lightning.py.
Do you have any ideas?
Thanks so much!
Zhecheng
Thank you for your excellent work, but I still have a question. As shown in a figure of your paper (link below), in the latent space, for a positive triplet (z0, zt, zT) and a negative sample z', the encoder pulls zt close to the expected embedding µt and pushes z' away from µt. As I understand it, µt is generated from z0, zT, and time t, which is independent of the time of the negative sample z'. In other words, when calculating the negative-sample part of the loss function, the negative sample's own time step is not used.
https://github.com/rosewang2008/language_modeling_via_stochastic_processes/blob/main/images/encoder.png
However, in the code, I found that the time t' of the negative sample itself is used (see t=self.t[idx]), even when this time is greater than T for some triplets in the same batch. That is equivalent to using z0, zT, and a time t' greater than T to estimate µt' and push the negative away from it. I want to know whether this approach differs from the paper and whether it is reasonable. I look forward to your answers, thank you.
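A hypothetical 1-d sketch of the two readings (using the bridge mean µt as linear interpolation between z0 and zT, per the paper; the numbers are illustrative):

```python
def mu(z0, zT, t, T):
    # Brownian bridge mean: linear interpolation between z0 and zT
    return (1 - t / T) * z0 + (t / T) * zT

z0, zT, T = 0.0, 10.0, 10
t_pos, t_neg = 3, 12  # a negative's time can exceed T when drawn across the batch

# Figure reading: the negative is compared against mu at the *positive's*
# time t, independent of the negative's own time.
mu_paper = mu(z0, zT, t_pos, T)   # 3.0, inside the bridge

# Code reading (t = self.t[idx]): the negative's own time t' is used,
# which extrapolates past zT when t' > T.
mu_code = mu(z0, zT, t_neg, T)    # 12.0, beyond the endpoint zT

print(mu_paper, mu_code)
```

As the sketch shows, when t' > T the "expected embedding" the negative is pushed away from lies outside the bridge segment entirely, which is the behavior the question asks about.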
Hi Rose,
The recent commit changed x_tp1=x_tp to x_tp=x_tp1 (cb3d345), but are your results based on this new commit or the old one? I also got different numbers on Wikisection compared to your paper (based on the version before that commit), as brought up by another person in the latest post of #7. (Just to make sure: the results in the paper are based on GPT2-small, which is called gpt2 in Hugging Face, not GPT2-large/xl, right?)
Besides, I have a question about simulate_brownian_bridge: in this function, x_tp1 = x_t * (1 - dt/(1-t)) + (dt/(1-t)) * B_T + noise, but why is dt fixed at 0.05? According to the Brownian bridge process, shouldn't this be either x_tp1 = x_0 * (1 - t) + t * B_T + noise (if you use the older version, always interpolating between x_0 and x_T), or x_tp1 = x_t * (1 - 1/(T - num_samples)) + 1/(T - num_samples) * B_T + noise (if you use the newer version, interpolating between x_t and x_T)? And why is the noise term fixed rather than depending on t and T as in Equation 1 of the paper?
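For reference, here is a sketch of one discrete bridge step as I understand the standard process (my formulation, not the repo's exact code): conditioning on x_t and the endpoint B_T, the noise variance depends on t and T and shrinks to zero as t approaches T, so the path pins to B_T exactly.

```python
import random

def bridge_step(x_t, B_T, t, T, dt, rng):
    # Conditional mean: pull x_t toward the endpoint B_T.
    drift = x_t * (1 - dt / (T - t)) + (dt / (T - t)) * B_T
    # Conditional variance dt * (T - t - dt) / (T - t): largest mid-bridge,
    # zero on the final step (max() guards against float round-off).
    var = max(dt * (T - t - dt) / (T - t), 0.0)
    return drift + rng.gauss(0.0, var ** 0.5)

rng = random.Random(0)
x, T, dt = 0.0, 1.0, 0.05
B_T = 5.0
n_steps = round(T / dt)
for i in range(n_steps):
    x = bridge_step(x, B_T, i * dt, T, dt, rng)
print(abs(x - B_T))  # ~0: the final step has zero variance, pinning x to B_T
```

With a time-dependent variance like this, a fixed noise scale would indeed disagree with Equation 1's t(T-t)/T behavior, which is what the question points out.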
Lastly, I wonder if it's possible for you to share one trained model setting (Wikisection, TC-32), since that would address issue #7 as well. I understand it's hard to share big files, but Google Drive allows uploading large files, and you could remove optimizer.pt to make the checkpoint smaller.
Thanks,
Yuntian