jetbrains-research / psiminer
A Tool for Mining Rich Abstract Syntax Trees from Code
License: Apache License 2.0
Does it provide parsing errors given some buggy code?
While evaluating the code2seq model on the projects apache__hbase and wildfly__wildfly from the test part of the java-med dataset, preprocessed with psiminer (see the config), I got errors:
wildfly__wildfly
Global seed set to 7
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Testing: 0it [00:00, ?it/s][W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Testing: 93%|████████████████████████████████████████████████████████████████▍ | 14/15 [00:13<00:00, 1.14it/s]Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/ubuntu/fine-tuning/fine-tuning-ml-models/scripts/test_all.py", line 42, in <module>
test_all(args.dataset, args.model, args.results)
File "/home/ubuntu/fine-tuning/fine-tuning-ml-models/scripts/test_all.py", line 25, in test_all
metrics = test_single(model_path, os.path.join(PREPROCESSED_DATASETS_DIR, project_name))
File "/home/ubuntu/fine-tuning/fine-tuning-ml-models/scripts/test_single.py", line 21, in test_single
results = test(model_path, project_path, batch_size=1)
File "dependencies/code2seq_repo/code2seq/test.py", line 57, in test
return trainer.test(model, datamodule=data_module)
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 581, in test
results = self._run(model)
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
self.dispatch()
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 795, in dispatch
self.accelerator.start_evaluating(self)
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 99, in start_evaluating
self.training_type_plugin.start_evaluating(trainer)
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 148, in start_evaluating
self._results = trainer.run_stage()
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 806, in run_stage
return self.run_evaluate()
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1049, in run_evaluate
eval_loop_results = self.run_evaluation()
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 954, in run_evaluation
for batch_idx, batch in enumerate(dataloader):
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/torch/_utils.py", line 425, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "dependencies/code2seq_repo/code2seq/dataset/path_context_dataset.py", line 82, in __getitem__
splitted_contexts = [self._split_context(str_contexts[i]) for i in context_indexes]
File "dependencies/code2seq_repo/code2seq/dataset/path_context_dataset.py", line 82, in <listcomp>
splitted_contexts = [self._split_context(str_contexts[i]) for i in context_indexes]
File "dependencies/code2seq_repo/code2seq/dataset/path_context_dataset.py", line 51, in _split_context
from_token, path_nodes, to_token = context.split(",")
ValueError: not enough values to unpack (expected 3, got 2)
Exception ignored in: <function tqdm.__del__ at 0x7ff4422eeaf0>
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/tqdm/std.py", line 1122, in __del__
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/tqdm/std.py", line 1335, in close
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/tqdm/std.py", line 1514, in display
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/tqdm/std.py", line 1125, in __repr__
File "/home/ubuntu/anaconda3/envs/fine-tuning-env/lib/python3.8/site-packages/tqdm/std.py", line 1475, in format_dict
TypeError: cannot unpack non-iterable NoneType object
It seems that part of a path context is missing. Note that it is also the last line of the .c2s file, so the preprocessed file looks incomplete.
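To locate the truncated entry before running evaluation, a .c2s file can be scanned for contexts that do not split into exactly three comma-separated parts, which is the condition that fails in `_split_context` in the traceback. This is only a minimal sketch: the function name `find_malformed_contexts` is hypothetical, and the line layout (a label followed by whitespace-separated contexts) is an assumption about the file format.

```python
from typing import Iterable, Iterator, Tuple


def find_malformed_contexts(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, context) for each context without exactly 3 parts."""
    for line_number, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            # Skip blank lines; they carry neither a label nor contexts.
            continue
        # Assumed layout: "<label> <context> <context> ...".
        _label, *contexts = parts
        for context in contexts:
            # A well-formed context is "from_token,path_nodes,to_token".
            if len(context.split(",")) != 3:
                yield line_number, context
```

An open file object can be passed directly as `lines`, for example `with open(path) as f: print(list(find_malformed_contexts(f)))`, where `path` points to the .c2s file in question.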
As a part of mining CodeSearchNet graphs (#31), we should be able to mine JS data. It should include:
As a part of mining CodeSearchNet graphs (#31), we should be able to mine PHP data. It should include:
The java-med dataset has a file, train/stanfordnlp__CoreNLP/src/edu/stanford/nlp/process/PTBLexer.java,
containing 76704 lines of code. IDEA parses this file as a single PsiPlainText
element, and psiminer does the same. I think the miner should skip such files with a warning.
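Until the miner handles this itself, one possible workaround is to filter out very large files before mining. The sketch below is not part of psiminer and does not check for PsiPlainText; it uses a plain line-count heuristic instead. The names `select_minable_files` and `MAX_LINES` and the threshold value are all hypothetical.

```python
import warnings
from pathlib import Path
from typing import List

# Hypothetical threshold; choose a value that suits the dataset.
MAX_LINES = 10_000


def count_lines(path: Path) -> int:
    """Count lines without loading the whole file into memory."""
    with path.open("rb") as f:
        return sum(1 for _ in f)


def select_minable_files(root: Path, max_lines: int = MAX_LINES) -> List[Path]:
    """Return .java files under root, skipping oversized ones with a warning."""
    kept = []
    for path in sorted(Path(root).rglob("*.java")):
        n_lines = count_lines(path)
        if n_lines > max_lines:
            warnings.warn(f"Skipping {path}: {n_lines} lines exceeds {max_lines}")
            continue
        kept.append(path)
    return kept
```

With this threshold, a file of 76704 lines such as PTBLexer.java would be skipped and reported through a warning rather than mined as a single element.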
As a part of mining CodeSearchNet graphs (#31), we should be able to mine Python data. It should include:
!!!Actions required!!!
We have published the current plugin-utilities lib version from the master branch to the Space Maven repository, and in a week we plan to modify the API (so the master branch will become invalid). Please replace the git-based installation of the plugin-utilities lib in your build.gradle.kts with a standard dependency declaration by adding the Space repository and declaring the required dependencies:
repositories {
    maven("https://packages.jetbrains.team/maven/p/big-code/bigcode")
}
dependencies {
    implementation("org.jetbrains.research:plugin-utilities-core:1.0")
}
As a part of mining CodeSearchNet graphs (#31), we should be able to mine Go data. It should include:
As a part of mining CodeSearchNet graphs (#31), we should be able to mine Ruby data. It should include:
Hello, thanks for a great tool! On this line, the message in the exception should be "resolve types", not "remove comments".
Hi All,
I downloaded the source code and opened it with IDEA version 2020.3.
The build starts but fails. The log is:
A problem occurred configuring root project 'psiminer'.
Could not resolve all artifacts for configuration ':classpath'.
Could not resolve org.jetbrains.intellij.plugins:gradle-intellij-plugin:1.1.4.
Required by:
project : > org.jetbrains.intellij:org.jetbrains.intellij.gradle.plugin:1.1.4
> No matching variant of org.jetbrains.intellij.plugins:gradle-intellij-plugin:1.1.4 was found. The consumer was configured to find a runtime of a library compatible with Java 8, packaged as a jar, and its dependencies declared externally, as well as attribute 'org.gradle.plugin.api-version' with value '7.0.1' but:
- Variant 'apiElements' capability org.jetbrains.intellij.plugins:gradle-intellij-plugin:1.1.4 declares a library, packaged as a jar, and its dependencies declared externally:
- Incompatible because this component declares an API of a component compatible with Java 11 and the consumer needed a runtime of a component compatible with Java 8
- Other compatible attribute:
- Doesn't say anything about org.gradle.plugin.api-version (required '7.0.1')
- Variant 'runtimeElements' capability org.jetbrains.intellij.plugins:gradle-intellij-plugin:1.1.4 declares a runtime of a library, packaged as a jar, and its dependencies declared externally:
- Incompatible because this component declares a component compatible with Java 11 and the consumer needed a component compatible with Java 8
- Other compatible attribute:
- Doesn't say anything about org.gradle.plugin.api-version (required '7.0.1')
It seems that it fails to download dependencies.
Could anyone help?
Thanks!