ycachy / codee
Codee
Hi, I have some questions about running the source code, which I am testing for academic purposes.
Question 1.
Looking at your source code for tensor-embeddings, line 2 asks me to add the ida_ouput path. However, my understanding from the paper was that token embedding and basic block embedding are performed with ANGR and MATLAB.
Do I really need IDA to run the source code?
Can't I just run ANGR and MATLAB without IDA?
Question 2.
The source code consists of a total of three components (1. token-level embedding, 2. basic-block level embedding, 3. function-embedding).
Where is the binary code search module that finally finds binary similarities?
Question 3.
As written in the paper, the order of execution is: 1. token-level embedding -> transfer the output of step 1 -> 2. basic block embedding -> transfer the output of step 2 -> 3. function embedding -> check the result of step 3.
Is it correct to execute in the above order?
I would also like to ask questions about the source code implementation, but the paper does not list an email address, so I would appreciate it if you could share one for a personal inquiry.
Sorry to bother you.
Could you please explain the differences between the functions buildAndTraining_skipgram() and buildAndTraining()?
In buildAndTraining(), train_inputs = tf.placeholder(tf.int32, shape=[batch_size, 2, 5]) differs from the usual approach.
In buildAndTraining_skipgram(), train_inputs = tf.placeholder(tf.int32, shape=[batch_size, 5]) matches the usual approach, so I don't understand what buildAndTraining() is used for.
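One possible reading of the two shapes, offered as an assumption and not as the author's confirmed design: if each instruction is encoded as 5 token IDs, a [batch_size, 5] input holds one instruction per example, while a [batch_size, 2, 5] input holds two instructions per example (for instance the previous and the next instruction around a target). The sketch below uses plain Python lists and hypothetical helper names to show how the two batch layouts would differ; it is not taken from the Codee source.

```python
# Hypothetical illustration: every instruction is a list of 5 token IDs.

def single_instruction_batch(instructions, batch_size):
    """Batch shaped [batch_size, 5]: one instruction per example."""
    return [list(ins) for ins in instructions[:batch_size]]


def context_pair_batch(instructions, batch_size):
    """Batch shaped [batch_size, 2, 5]: previous and next instruction per target."""
    batch = []
    for i in range(1, len(instructions) - 1):
        if len(batch) == batch_size:
            break
        batch.append([list(instructions[i - 1]), list(instructions[i + 1])])
    return batch


def shape(nested):
    """Return the dimensions of a non-empty, regularly nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


# Six made-up instructions, each with 5 token IDs.
instructions = [[i, i + 1, i + 2, i + 3, i + 4] for i in range(6)]
flat = single_instruction_batch(instructions, batch_size=4)
paired = context_pair_batch(instructions, batch_size=4)
print(shape(flat), shape(paired))
```

If this reading is right, the extra axis of size 2 would simply carry the two context instructions; only the author can confirm whether that is what buildAndTraining() actually expects.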
Hi, thanks for your work and source code.
I have one question:
After running "embedding.m", I got a huge tensor of shape (kcompress * program_num * functions_num). I have many programs and many functions, but I don't know how to match the functions to their embeddings. How can I get the index of a given function in the tensor?
Sorry for my bad English.
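After running the whole pipeline, I get a huge tensor of dimension (kcompress * program_num * functions_num), but the tensor reading and compression steps carry no information such as program names or function names. If I want to retrieve a specific function and its corresponding embedding vector, how should I do that?
The epidemic situation is severe; I wish you good health and safety. Looking forward to your reply!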
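A minimal sketch of one way to keep track of positions, assuming the tensor is filled in a fixed order (programs along the second axis, functions along the third). The helper names, program names, and function names below are hypothetical and do not come from the Codee repository; the idea is only to record the (program, function) order at the moment the data is written, so that a name can later be mapped back to its indices.

```python
def build_function_index(programs):
    """Map (program, function) names to (p, f) positions and back.

    programs: dict of program name -> ordered list of function names,
    in the same order used when the tensor was filled (an assumption).
    """
    index = {}
    reverse = {}
    for p, (program, functions) in enumerate(programs.items()):
        for f, function in enumerate(functions):
            index[(program, function)] = (p, f)
            reverse[(p, f)] = (program, function)
    return index, reverse


def lookup_embedding(tensor, index, program, function):
    """Read the kcompress-long vector for one function.

    tensor is a nested list shaped [kcompress][program_num][functions_num].
    """
    p, f = index[(program, function)]
    return [tensor[k][p][f] for k in range(len(tensor))]


# Made-up names and a toy tensor with kcompress=3, program_num=2, functions_num=2.
programs = {"prog_a": ["func_x", "func_y"], "prog_b": ["func_z", "func_w"]}
tensor = [[[k * 100 + p * 10 + f for f in range(2)] for p in range(2)] for k in range(3)]

index, reverse = build_function_index(programs)
vector = lookup_embedding(tensor, index, "prog_b", "func_w")
print(vector)
```

The mapping would need to be saved next to the tensor (for example as a small text file), since the tensor itself stores only numbers.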
Hi @ycachy,
Is the dataset available for experimentation, along with the ground truth needed to compute precision and recall scores?
It would be great if you could provide it.
Thanks.
Excuse me,
the readme.md says "Run tensor embedding/Embedding.py",
but I cannot find that file.
Does that mean Tembedding.m? If not, how can I get the "tensor embedding"?
Thanks so much.
Hi, first of all thank you for your contribution; this article is of great reference value.
I have one question:
Why did you implement the skip-gram algorithm yourself instead of using the implementation in gensim?
Can you explain the difference between them?
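For readers comparing a hand-written skip-gram with a library one, the step that both share is the generation of (center, context) training pairs from a token sequence. The sketch below is a generic, minimal version of that step in plain Python; it is not code from the Codee repository or from gensim, and the instruction mnemonics are only sample tokens.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


tokens = ["mov", "add", "cmp", "jne"]
pairs = skipgram_pairs(tokens, window=1)
print(pairs)
```

A custom implementation typically differs from a library in how tokens are defined and fed in (for example, multi-slot instructions instead of single words), not in this pair-generation idea itself.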
DeepBinDiff trains online, and its diffing algorithm makes it very slow on larger files.
Does "Codee" work like DeepBinDiff?