Comments (12)

SpirinEgor commented on June 20, 2024

You can find this function (cut_into_segments) in the commode-utils package, which you can install with pip.

from embeddings-for-trees.

SamraMehboob commented on June 20, 2024

Hi,
Can you please help me with this?
Thanks.

SpirinEgor commented on June 20, 2024

Hi!

.ckpt files are model checkpoints that store the weights.
If you want to extract embeddings, you need to write some code :)

  1. First of all, load the checkpoint. See this example.
  2. Since you don't need the decoder, extract the embedding and encoder modules:

     node_emb = model._TreeLSTM2Seq__embedding
     tree_enc = model._TreeLSTM2Seq__encoder

     Sorry for the inconvenient code; I plan to provide cleaner interfaces.
  3. Retrieve the tree embeddings:

     # Inside the model these modules are self.__embedding and self.__encoder
     batched_trees.ndata["x"] = node_emb(batched_trees)
     encoded_nodes = tree_enc(batched_trees)
     batched_encoded_nodes, mask = cut_into_segments(
         encoded_nodes, batched_trees.batch_num_nodes(), False
     )

  4. Now you have embeddings of all nodes in each tree; you can aggregate them, for example by taking the mean.
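The mean aggregation from the last step could be sketched like this. It is a minimal sketch, not code from the repository: mean_pool_trees is a hypothetical helper, and it assumes the padded tensor has the layout [batch size; max tree size; hidden size]. To avoid depending on the exact mask convention returned by cut_into_segments, it rebuilds the set of valid positions from the per-tree node counts (what batched_trees.batch_num_nodes() provides).

```python
import torch


def mean_pool_trees(padded_nodes: torch.Tensor, tree_sizes: torch.Tensor) -> torch.Tensor:
    """Average node vectors per tree, ignoring padded positions.

    padded_nodes: [batch size; max tree size; hidden size] (assumed layout)
    tree_sizes:   [batch size], number of real nodes in each tree
    """
    max_size = padded_nodes.shape[1]
    # valid[i, j] is True when position j holds a real node of tree i
    positions = torch.arange(max_size, device=padded_nodes.device)
    valid = positions.unsqueeze(0) < tree_sizes.unsqueeze(1)
    # Zero out padding, sum over nodes, divide by the real node count
    summed = (padded_nodes * valid.unsqueeze(-1)).sum(dim=1)
    return summed / tree_sizes.unsqueeze(1).clamp(min=1)


# Toy batch: two trees with 3 and 1 nodes, hidden size 2
padded = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0], [0.0, 0.0], [0.0, 0.0]],
])
sizes = torch.tensor([3, 1])
tree_vectors = mean_pool_trees(padded, sizes)  # one vector per tree
```

Each row of tree_vectors is then a single fixed-size embedding for one tree. Mean pooling is only one option; taking the root vector or max pooling would be equally easy to swap in.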

SamraMehboob commented on June 20, 2024

Thanks a lot for the detailed explanation. I'll try the steps above and let you know.

SamraMehboob commented on June 20, 2024

Sorry to disturb you again.

batched_encoded_nodes, mask = cut_into_segments(
encoded_nodes, batched_trees.batch_num_nodes(), False
)

Where should I put this code snippet?

SpirinEgor commented on June 20, 2024

Put it right after computing encoded_nodes.
To speed up computation, all trees in the batch are collated into a single tree. This function splits that tree back into its parts, so you get the encoded nodes for each tree in the batch plus a mask for slicing them properly.
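To make the idea concrete, here is a rough sketch of what such a splitting step does. It is not the library's implementation: split_into_padded_segments is a hypothetical stand-in written with plain PyTorch, and the mask convention (True at padded positions) is an assumption chosen for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def split_into_padded_segments(flat_nodes, tree_sizes):
    """Split [total nodes; hidden size] into [batch size; max tree size; hidden size]."""
    # One chunk per tree, in the order the trees were batched
    segments = list(torch.split(flat_nodes, tree_sizes.tolist(), dim=0))
    padded = pad_sequence(segments, batch_first=True)  # shorter trees are zero-padded
    # Assumed convention: True marks padded positions
    positions = torch.arange(padded.shape[1])
    pad_mask = positions.unsqueeze(0) >= tree_sizes.unsqueeze(1)
    return padded, pad_mask


# Five encoded nodes belonging to two trees of sizes 3 and 2
flat = torch.arange(10, dtype=torch.float32).reshape(5, 2)
sizes = torch.tensor([3, 2])
padded, pad_mask = split_into_padded_segments(flat, sizes)
```

Here padded has one row of node vectors per tree, and pad_mask flags the single padded slot at the end of the second, shorter tree.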

SamraMehboob commented on June 20, 2024

The cut_into_segments function is undefined. Do I need to write it myself, or is it provided by some library?

SamraMehboob commented on June 20, 2024

Thank you.

SamraMehboob commented on June 20, 2024

Hi,
I created vector embeddings for Java files.
Basically, I want to train a model for a program repair task: it takes the embeddings of a buggy line as input and learns to generate the fixed line as output.
I gave a buggy line as input and created embeddings, as shown below. But I'm confused about how to convert these vectors back into source code. Which value in the vectors corresponds to which token?

Sorry for this silly question, but if you can provide some pointers, it will be really helpful.
Thanks.

image

SpirinEgor commented on June 20, 2024

Wow, it is quite surprising to me that this even works :)
I use PyTorch as the framework to train the model, and you are using TensorFlow functions to operate on torch tensors...

Answering your questions:

  • You can see that encoded_nodes is passed into the decoder module. This module generates a sequence of length output_length. output_logits is a tensor with the shape [sequence length; batch size; vocab size]. So, if you want to generate code, you should: (1) create a vocabulary of the tokens that can be generated (the label vocabulary), and (2) pass the required output length to the forward method.
  • The shape of batched_encoded_nodes should be something like [batch size; size of max tree in the batch]. The first vector corresponds to the root, and the following vectors correspond to the other nodes of the tree in the order in which they are defined.
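Turning decoder output into tokens could look roughly like this. It is a sketch under assumptions: the logits are hand-built placeholders with the [sequence length; batch size; vocab size] layout described above, id_to_token is a made-up label vocabulary, and greedy argmax decoding is just one simple choice rather than anything the repository prescribes.

```python
import torch

# Hypothetical label vocabulary: index -> token
id_to_token = ["<PAD>", "<SOS>", "<EOS>", "int", "x", "=", "0", ";"]


def greedy_decode(output_logits, id_to_token, eos_token="<EOS>"):
    """Pick the most likely token at each step and stop at the first EOS."""
    # [seq len; batch size; vocab size] -> [batch size; seq len]
    token_ids = output_logits.argmax(dim=-1).transpose(0, 1)
    decoded = []
    for row in token_ids.tolist():
        tokens = []
        for idx in row:
            token = id_to_token[idx]
            if token == eos_token:
                break
            tokens.append(token)
        decoded.append(tokens)
    return decoded


# Placeholder logits for a batch of one sequence ending in <EOS>
target_ids = torch.tensor([3, 4, 5, 6, 7, 2])
one_hot = torch.nn.functional.one_hot(target_ids, num_classes=len(id_to_token))
output_logits = one_hot.float().unsqueeze(1)  # [6; 1; 8]
decoded = greedy_decode(output_logits, id_to_token)
```

The individual values inside an embedding vector do not map to tokens on their own; tokens only appear after a decoder turns the encoded nodes into logits over the label vocabulary, as in this sketch.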

SamraMehboob commented on June 20, 2024

Alright, thanks for the help. I'll try the points above.

SpirinEgor commented on June 20, 2024

I'm closing this due to inactivity, but feel free to reopen it or create a new issue if you have any questions!
