pdlan / oscar
Code for ICML 2021 paper: How could Neural Networks understand Programs?
Home Page: https://arxiv.org/pdf/2105.04297
The dataset link https://1drv.ms/u/s!AjYwgux2zLgMiAhYpoCU3jLu20Z6?e=XR52y9 is no longer available. The page says "this item might not exist or is no longer available".
Hi, I am trying to replicate the clone classification task with the poj104 dataset, but I get an error when I run the process shell script inside the process-poj-clone-detection folder.
The script fails at 5_json_to_rawtext.py because the files inst_dict.txt and state_dict.txt, which should be located in ../data-bin/pretrain, are never created.
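A quick way to confirm which inputs are absent before running step 5 is a small pre-flight check. This is only a sketch: the file names and the ../data-bin/pretrain directory come from the report above, while the helper name missing_inputs is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the report above; the helper itself is hypothetical.
REQUIRED_FILES = ("inst_dict.txt", "state_dict.txt")


def missing_inputs(data_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from data_dir."""
    base = Path(data_dir)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs("../data-bin/pretrain")
    if missing:
        print("Missing before 5_json_to_rawtext.py:", ", ".join(missing))
    else:
        print("Both dictionary files are present.")
```

If both files are reported missing, the likely cause is that the step which produces them has not been run yet, rather than a bug in step 5 itself.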
Hi, I'm running pre-training with the provided scripts and hit the error below while executing ./model/scripts/pretrain.sh.
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/oscar/model/train.py", line 309, in distributed_main
main(args, init_distributed=True)
File "/oscar/model/train.py", line 51, in main
model = task.build_model(args)
File "/oscar/model/fairseq/tasks/ir_masked_lm.py", line 210, in build_model
model = models.build_model(args, self)
File "/oscar/model/fairseq/models/__init__.py", line 45, in build_model
return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
File "/oscar/model/fairseq/models/irbert/model.py", line 86, in build_model
encoder = IRBertEncoder(args, task.instruction_dictionary, task.state_dictionary)
File "/oscar/model/fairseq/models/irbert/model.py", line 263, in __init__
copy_weights(self.sentence_encoder, self.sentence_encoder_momentum)
NameError: name 'copy_weights' is not defined
The copy_weights function seems to be missing from this repository.
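To check whether a name such as copy_weights is defined anywhere in a checkout, rather than only imported or called, the source tree can be scanned with Python's ast module. This is a sketch; find_definitions is a hypothetical helper, not something shipped with the repository.

```python
import ast
from pathlib import Path


def find_definitions(root, name):
    """Yield (path, line) for every function or class named `name` under root."""
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8", errors="replace"))
        except (SyntaxError, ValueError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            is_def = isinstance(
                node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
            )
            if is_def and node.name == name:
                yield str(path), node.lineno
```

Calling list(find_definitions("model/fairseq", "copy_weights")) from the repository root returns an empty list if no definition exists, which would be consistent with the NameError in the traceback.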
Hi! I am trying to build the image on a server without internet access, so I modified the Dockerfile and solved some problems, but this one is too hard for me. Could you help me? Thank you.
Current default time zone: 'Etc/UTC'
Local time is now: Sun Nov 20 22:20:26 UTC 2022.
Universal Time is now: Sun Nov 20 22:20:26 UTC 2022.
Run 'dpkg-reconfigure tzdata' if you wish to change it.
Setting up systemd-sysv (245.4-4ubuntu3.18) ...
Setting up libelf1:amd64 (0.176-1.1build1) ...
Setting up libicu66:amd64 (66.1-2ubuntu2.1) ...
Setting up libglib2.0-0:amd64 (2.64.6-1~ubuntu20.04.4) ...
Setting up libtinfo6:amd64 (6.2-0ubuntu2) ...
Setting up libproxy1v5:amd64 (0.4.15-10ubuntu1.2) ...
Setting up glib-networking-services (2.64.2-1ubuntu0.1) ...
Setting up distro-info-data (0.43ubuntu1.11) ...
Setting up cmake-data (3.16.3-1ubuntu1) ...
Setting up libstemmer0d:amd64 (0+svn585-2) ...
Setting up librtmp1:amd64 (2.4+20151223.gitfa8646d.1-2build1) ...
Setting up libpackagekit-glib2-18:amd64 (1.1.13-2ubuntu1.1) ...
Setting up libbsd0:amd64 (0.10.0-1) ...
Setting up libkrb5support0:amd64 (1.17-6ubuntu4.1) ...
Setting up ucf (3.0038+nmu1) ...
Setting up libgirepository-1.0-1:amd64 (1.64.1-1~ubuntu20.04.1) ...
Setting up libxml2:amd64 (2.9.10+dfsg-5ubuntu0.20.04.4) ...
Setting up libmagic-mgc (1:5.38-4) ...
Setting up uuid-runtime (2.34-0.1ubuntu9.3) ...
Adding group `uuidd' (GID 105) ...
Done.
Warning: The home dir /run/uuidd you specified can't be accessed: No such file or directory
Adding system user `uuidd' (UID 104) ...
Adding new user `uuidd' (UID 104) with group `uuidd' ...
Not creating home directory `/run/uuidd'.
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Created symlink /etc/systemd/system/sockets.target.wants/uuidd.socket → /lib/systemd/system/uuidd.socket.
Setting up libmagic1:amd64 (1:5.38-4) ...
Setting up librhash0:amd64 (1.3.9-1) ...
Setting up libcbor0.6:amd64 (0.6.0-0ubuntu1) ...
Setting up libyaml-0-2:amd64 (0.2.2-1) ...
Setting up gir1.2-glib-2.0:amd64 (1.64.1-1~ubuntu20.04.1) ...
Setting up libglib2.0-data (2.64.6-1~ubuntu20.04.4) ...
Setting up krb5-locales (1.17-6ubuntu4.1) ...
Setting up publicsuffix (20200303.0012-1) ...
Setting up libfido2-1:amd64 (1.3.1-1ubuntu2) ...
Setting up wget (1.20.3-1ubuntu2) ...
Setting up libdconf1:amd64 (0.36.0-1) ...
Setting up libcrypt-dev:amd64 (1:4.4.10-10ubuntu4) ...
Setting up dmsetup (2:1.02.167-1ubuntu1) ...
Setting up shared-mime-info (1.15-1) ...
Setting up gir1.2-packagekitglib-1.0 (1.1.13-2ubuntu1.1) ...
Setting up libc-dev-bin (2.31-0ubuntu9.9) ...
Setting up libxdmcp6:amd64 (1:1.1.3-0ubuntu1) ...
Setting up libkeyutils1:amd64 (1.6-6ubuntu1.1) ...
Setting up libglib2.0-bin (2.64.6-1~ubuntu20.04.4) ...
Setting up libc6-dev:amd64 (2.31-0ubuntu9.9) ...
Setting up xdg-user-dirs (0.17-2ubuntu1) ...
Setting up libx11-data (2:1.6.9-2ubuntu1.2) ...
Setting up libxau6:amd64 (1:1.0.9-0ubuntu1) ...
Setting up libmpdec2:amd64 (2.4.2-3) ...
Setting up libpolkit-gobject-1-0:amd64 (0.105-26ubuntu1.3) ...
Setting up libdbus-1-3:amd64 (1.12.16-2ubuntu2.3) ...
Setting up libreadline8:amd64 (8.0-4) ...
Setting up libjsoncpp1:amd64 (1.7.4-3.1ubuntu2) ...
Setting up libedit2:amd64 (3.1-20191231-1) ...
Setting up libk5crypto3:amd64 (1.17-6ubuntu4.1) ...
Setting up less (551-1ubuntu0.1) ...
Setting up libgstreamer1.0-0:amd64 (1.16.3-0ubuntu1.1) ...
Setcap worked! gst-ptp-helper is not suid!
Setting up libarchive13:amd64 (3.4.0-2ubuntu1.2) ...
Setting up libpolkit-agent-1-0:amd64 (0.105-26ubuntu1.3) ...
Setting up libncursesw6:amd64 (6.2-0ubuntu2) ...
Setting up file (1:5.38-4) ...
Setting up libkrb5-3:amd64 (1.17-6ubuntu4.1) ...
Setting up dbus (1.12.16-2ubuntu2.3) ...
Setting up libxcb1:amd64 (1.14-2) ...
Setting up libpython3.8-stdlib:amd64 (3.8.10-0ubuntu1~20.04.5) ...
Setting up libpython3-stdlib:amd64 (3.8.2-0ubuntu2) ...
Setting up libpam-systemd:amd64 (245.4-4ubuntu3.18) ...
Setting up policykit-1 (0.105-26ubuntu1.3) ...
Setting up python3.8 (3.8.10-0ubuntu1~20.04.5) ...
Setting up libx11-6:amd64 (2:1.6.9-2ubuntu1.2) ...
Setting up libxmuu1:amd64 (2:1.1.3-0ubuntu1) ...
Setting up dbus-user-session (1.12.16-2ubuntu2.3) ...
Setting up libgssapi-krb5-2:amd64 (1.17-6ubuntu4.1) ...
Setting up libssh-4:amd64 (0.9.3-2ubuntu2.2) ...
Setting up openssh-client (1:8.2p1-4ubuntu0.5) ...
Setting up libxext6:amd64 (2:1.3.4-0ubuntu1) ...
Setting up python3 (3.8.2-0ubuntu2) ...
Setting up dconf-service (0.36.0-1) ...
Setting up libcurl3-gnutls:amd64 (7.68.0-1ubuntu2.14) ...
Setting up python3-idna (2.8-1) ...
Setting up libcurl4:amd64 (7.68.0-1ubuntu2.14) ...
Setting up python3-six (1.14.0-2) ...
Setting up python3-certifi (2019.11.28-1) ...
Setting up python3-pkg-resources (45.2.0-1) ...
Setting up python3-gi (3.36.0-1) ...
Setting up lsb-release (11.1.0ubuntu2) ...
Setting up xauth (1:1.1-0ubuntu1) ...
Setting up python3-chardet (3.0.4-4build1) ...
Setting up python3-urllib3 (1.25.8-2ubuntu0.1) ...
Setting up cmake (3.16.3-1ubuntu1) ...
Setting up dconf-gsettings-backend:amd64 (0.36.0-1) ...
Setting up git (1:2.25.1-1ubuntu3.6) ...
Setting up python3-distro-info (0.23ubuntu1) ...
Setting up python3-apt (2.0.0ubuntu0.20.04.8) ...
Setting up python3-dbus (1.2.16-1build1) ...
Setting up gsettings-desktop-schemas (3.36.0-1ubuntu1) ...
Setting up glib-networking:amd64 (2.64.2-1ubuntu0.1) ...
Setting up unattended-upgrades (2.3ubuntu0.3) ...
Creating config file /etc/apt/apt.conf.d/20auto-upgrades with new version
Creating config file /etc/apt/apt.conf.d/50unattended-upgrades with new version
Created symlink /etc/systemd/system/multi-user.target.wants/unattended-upgrades.service → /lib/systemd/system/unattended-upgrades.service.
Setting up python3-requests (2.22.0-2ubuntu1) ...
Setting up python3-software-properties (0.99.9.8) ...
Setting up networkd-dispatcher (2.1-2~ubuntu20.04.3) ...
Created symlink /etc/systemd/system/multi-user.target.wants/networkd-dispatcher.service → /lib/systemd/system/networkd-dispatcher.service.
Setting up python3-requests-unixsocket (0.2.0-2) ...
Setting up libsoup2.4-1:amd64 (2.70.0-1) ...
Setting up libappstream4:amd64 (0.12.10-2) ...
Setting up packagekit (1.1.13-2ubuntu1.1) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of force-reload.
Failed to open connection to "system" message bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
Created symlink /etc/systemd/user/sockets.target.wants/pk-debconf-helper.socket → /usr/lib/systemd/user/pk-debconf-helper.socket.
Setting up software-properties-common (0.99.9.8) ...
Setting up packagekit-tools (1.1.13-2ubuntu1.1) ...
Processing triggers for systemd (245.4-4ubuntu3.18) ...
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
Processing triggers for dbus (1.12.16-2ubuntu2.3) ...
The command '/bin/sh -c apt-get update && apt-get install -y git cmake uuid-runtime lsb-release wget software-properties-common && wget --quiet https://golang.org/dl/go1.16.6.linux-amd64.tar.gz -O ~/go.tar.gz && tar xzf ~/go.tar.gz -C /opt/ && ln -s /opt/go/bin/go /usr/local/bin/go && rm ~/go.tar.gz' returned a non-zero code: 4
Here is the new Dockerfile.
The main change is the added proxy env lines.
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
ENV DEBIAN_FRONTEND noninteractive
env http_proxy "http://59.69.106.68:808"
env https_proxy "http://59.69.106.68:808"
env ftp_proxy "http://59.69.106.68:808"
ADD sources.list /etc/apt
RUN apt-get update && \
    apt-get install -y git cmake uuid-runtime lsb-release wget software-properties-common && \
    wget --quiet https://golang.org/dl/go1.16.6.linux-amd64.tar.gz -O ~/go.tar.gz && \
    tar xzf ~/go.tar.gz -C /opt/ && \
    ln -s /opt/go/bin/go /usr/local/bin/go && \
    rm ~/go.tar.gz
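The build log above ends with "returned a non-zero code: 4". In an && chain the shell stops at the first failing command and reports that command's exit status. Assuming wget is the step that failed (the apt-get output runs through to its final "Processing triggers" lines), the table below, copied from the "Exit Status" section of the wget manual, gives one reading of that number. The helper name explain_wget_status is hypothetical.

```python
# Exit statuses as documented in the wget manual ("Exit Status" section).
WGET_EXIT_STATUS = {
    0: "No problems occurred",
    1: "Generic error code",
    2: "Parse error (command line options, .wgetrc or .netrc)",
    3: "File I/O error",
    4: "Network failure",
    5: "SSL verification failure",
    6: "Username/password authentication failure",
    7: "Protocol errors",
    8: "Server issued an error response",
}


def explain_wget_status(code):
    """Return the documented meaning of a wget exit status, or a fallback string."""
    return WGET_EXIT_STATUS.get(code, f"Unknown wget exit status {code}")


print(explain_wget_status(4))  # prints: Network failure
```

Under that assumption the download of the Go tarball could not reach the network, which fits a server without direct internet access and points at the proxy settings rather than at the package installation.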
Hello! Do you have plans to upload source code of the baseline models (especially the 'BinaryAI')? Thanks a lot.
Hi! Thanks for open-sourcing your great work.
I'd like to use OSCAR to embed binaries, but I couldn't find the pre-trained BERT model, so I am pre-training it myself with the provided scripts.
I executed OSCAR/process-pretrain-data/process.sh in the given Docker image and got the error message below.
It seems the irexp_transformer_sentence_encoder file is missing from the fairseq/modules directory.
Could you help resolve this? Thanks!
Traceback (most recent call last):
File "preprocess.py", line 13, in <module>
from fairseq import options, tasks, utils
File "/oscar/model/fairseq/__init__.py", line 9, in <module>
import fairseq.criterions # noqa
File "/oscar/model/fairseq/criterions/__init__.py", line 24, in <module>
importlib.import_module('fairseq.criterions.' + module)
File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/oscar/model/fairseq/criterions/sentence_ranking.py", line 11, in <module>
from fairseq import utils
File "/oscar/model/fairseq/utils.py", line 20, in <module>
from fairseq.modules import gelu, gelu_accurate
File "/oscar/model/fairseq/modules/__init__.py", line 28, in <module>
from .irexp_transformer_sentence_encoder import IRTransformerSentenceEncoderNoMiddleLayers
ModuleNotFoundError: No module named 'fairseq.modules.irexp_transformer_sentence_encoder'
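The traceback shows fairseq/modules/__init__.py importing a sibling module that is not on disk. To list every such gap in one pass instead of hitting them one error at a time, the package's __init__.py can be parsed and each relative import compared with the directory contents. This is a sketch; missing_relative_imports is a hypothetical helper, and it only recognises plain .py modules and sub-packages, so compiled extension modules would be reported as missing too.

```python
import ast
from pathlib import Path


def missing_relative_imports(init_file):
    """List `from .x import ...` targets in a package __init__.py that have
    neither x.py nor x/__init__.py next to it."""
    init_path = Path(init_file)
    package_dir = init_path.parent
    tree = ast.parse(init_path.read_text(encoding="utf-8"))
    missing = []
    for node in ast.walk(tree):
        # level == 1 means one leading dot: an import from the same package
        if isinstance(node, ast.ImportFrom) and node.level == 1 and node.module:
            top = node.module.split(".")[0]
            has_module = (package_dir / f"{top}.py").is_file()
            has_package = (package_dir / top / "__init__.py").is_file()
            if not (has_module or has_package) and top not in missing:
                missing.append(top)
    return missing
```

Running missing_relative_imports("model/fairseq/modules/__init__.py") from the repository root would be expected to include irexp_transformer_sentence_encoder if the file is indeed absent from the checkout.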
Hi,
Thanks for your excellent work. But I encountered an issue when running ./scripts/bindiff.sh
Training environment:
torch 1.10.0+cu111
torchaudio 0.10.0+cu111
torchvision 0.11.0+cu111
GPU: RTX 2080ti
RAM: 11GB
@register_criterion('poj_similarity')
class PojSimilarityLoss(FairseqCriterion):
    def __init__(self, args, task):
        super().__init__(args, task)
        self.inst_padding_idx = task.instruction_dictionary.pad()
        self.state_padding_idx = task.state_dictionary.pad()
        self.task = task
        self.args = args

    def forward(self, model, sample, reduce=True, train=True):
        no_state = self.args.no_state
        no_pce = self.args.no_pce
        pooling = self.args.use_pooling
        output = model(**sample['net_input'], masked_tokens=None, features_only=True, moco_head=False,
                       moco_head_only_proj=False, lm_head=False, classification_head_name=None,
                       has_state=not no_state, has_pce=not no_pce, pooling_instruction=pooling)
after changing this to
#pooling = self.args.use_pooling
pooling = self.args.no_pooling
got another error:
2. multiple values for keyword "has_pce"
File "/mnt/g/Projects/OSCAR/model/fairseq/models/irbert/model.py", line 92, in forward
x, extra = self.decoder(src, features_only, return_all_hiddens, moco_head=moco_head, has_state=has_state,
TypeError: IRBertEncoder object got multiple values for keyword argument 'has_pce'
after removing this keyword, got another error:
3. got multiple values for keyword argument 'pooling_instruction'
File "/mnt/g/Projects/OSCAR/model/fairseq/models/irbert/model.py", line 92, in forward
x, extra = self.decoder(src, features_only, return_all_hiddens, moco_head=moco_head, has_state=has_state,
TypeError: IRBertEncoder object got multiple values for keyword argument 'pooling_instruction'
after removing this, got another error too:
add_(Tensor other, *, Number alpha) (Triggered internally at ../torch/csrc/utils/python_arg_parser.cpp:1050.)
exp_avg.mul_(beta1).add_(1 - beta1, grad)
Traceback (most recent call last):
File "train.py", line 356, in <module>
cli_main()
File "train.py", line 321, in cli_main
main(args)
File "train.py", line 95, in main
train(args, trainer, task, epoch_itr)
File "train.py", line 139, in train
log_output = trainer.train_step(samples)
File "/OSCAR/model/fairseq/trainer.py", line 346, in train_step
raise e
File "/Projects/OSCAR/model/fairseq/trainer.py", line 309, in train_step
loss, sample_size, logging_output = self.task.train_step(
File "/OSCAR/model/fairseq/tasks/fairseq_task.py", line 248, in train_step
optimizer.backward(loss)
File "/OSCAR/model/fairseq/optim/fp16_optimizer.py", line 103, in backward
loss.backward()
File "/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
I don't know how to fix this. Can you tell me what I missed? Thanks in advance for your time.
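The two "got multiple values for keyword argument" errors reported above usually mean that the same keyword reaches a call twice: once written out explicitly and once more inside a forwarded **kwargs. The sketch below reproduces that pattern with hypothetical stand-in functions. Their signatures are assumptions for illustration and are not the repository's actual forward methods.

```python
def encoder_forward(src, has_state=True, has_pce=True, pooling_instruction=False):
    """Hypothetical stand-in for the encoder call; not the repository's signature."""
    return has_state, has_pce, pooling_instruction


def model_forward(src, has_state=True, **kwargs):
    # Bug pattern: has_pce is written out here AND may also arrive inside **kwargs.
    return encoder_forward(src, has_state=has_state, has_pce=True, **kwargs)


def model_forward_fixed(src, has_state=True, **kwargs):
    # Let the caller's value win; fall back to a default only when it is absent.
    kwargs.setdefault("has_pce", True)
    return encoder_forward(src, has_state=has_state, **kwargs)


try:
    model_forward("ir", has_pce=False)
except TypeError as err:
    print("TypeError:", err)

print(model_forward_fixed("ir", has_pce=False))
```

If the repository code follows this pattern, deleting the keyword at the call site only moves the clash to the next duplicated keyword, which matches the sequence of errors described. Making the intermediate function forward each keyword exactly once avoids the chain.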
Hi,
I built the dependencies for the docker image that you provided, but when I run make in /oscar/bin, I get:
cd ../irlexer && go build && cp irlexer ../bin
main.go:14:2: cannot find package "github.com/ianlancetaylor/demangle" in any of:
/usr/lib/go-1.10/src/github.com/ianlancetaylor/demangle (from $GOROOT)
/root/go/src/github.com/ianlancetaylor/demangle (from $GOPATH)
main.go:15:2: cannot find package "github.com/llir/ll" in any of:
/usr/lib/go-1.10/src/github.com/llir/ll (from $GOROOT)
/root/go/src/github.com/llir/ll (from $GOPATH)
Makefile:5: recipe for target 'irlexer' failed
make: *** [irlexer] Error 1
Best,
Jesse
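The search paths in the error mention /usr/lib/go-1.10, which suggests that the go command resolves to a system Go 1.10 rather than the go1.16.6 tarball named in the Dockerfile command quoted elsewhere on this page. Module support first shipped in Go 1.11, so if irlexer relies on a go.mod file (an assumption; the report does not confirm it), Go 1.10 would look for the packages only under GOROOT and GOPATH, as the messages show. A small sketch for checking the toolchain version, with hypothetical helper names:

```python
import re


def parse_go_version(text):
    """Extract (major, minor) from `go version` output or a path like /usr/lib/go-1.10."""
    match = re.search(r"go-?(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no Go version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def supports_modules(version):
    """Go modules were introduced in Go 1.11."""
    return version >= (1, 11)


for sample in ("/usr/lib/go-1.10/src", "go version go1.16.6 linux/amd64"):
    version = parse_go_version(sample)
    print(sample, "->", version, "modules:", supports_modules(version))
```

Feeding the real output of go version into parse_go_version inside the container shows which toolchain make actually picks up.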
Hi. I was trying to train the models with the given scripts and dataset, but it took more time than I expected.
So, if you don't mind, could you share your pre-trained models? I probably do not have enough GPU resources to achieve a reasonable training time.