lucidrains / point-transformer-pytorch
Implementation of the Point Transformer layer, in Pytorch
License: MIT License
Hi!
Thanks for sharing the great work! I'm confused about how you implemented trilinear interpolation on point clouds in the transition up block, since trilinear interpolation is a method defined on 3-dimensional regular grids.
Thank you!
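For context, "trilinear" in the transition up block most likely refers to PointNet++-style feature propagation: inverse-distance-weighted interpolation over the three nearest coarse points, rather than grid-based trilinear interpolation. A minimal sketch under that assumption (all names below are illustrative, not from the repo):

import torch

def knn_interpolate(coarse_pos, coarse_feats, dense_pos, k = 3, eps = 1e-8):
    # coarse_pos:   (m, 3) positions of the downsampled point set
    # coarse_feats: (m, d) features living on the coarse points
    # dense_pos:    (n, 3) positions to interpolate onto
    dist = torch.cdist(dense_pos, coarse_pos)           # (n, m) pairwise distances
    knn_dist, knn_idx = dist.topk(k, largest = False)   # (n, k) nearest coarse points
    weights = 1.0 / (knn_dist + eps)                    # inverse-distance weights
    weights = weights / weights.sum(dim = -1, keepdim = True)
    neighbor_feats = coarse_feats[knn_idx]              # (n, k, d) gather neighbor features
    return (weights.unsqueeze(-1) * neighbor_feats).sum(dim = 1)   # (n, d)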
Firstly, thanks for your awesome work!
I'm confused: if vector attention can modulate individual feature channels, the attention map should have an axis whose dimension equals the feature dimension of the corresponding value (V). Based on Eq. (2) in the paper, if the shape of query, key and value is [batch_size, num_points, num_dim], the shape of the vector attention map should be [batch_size, num_points, num_points, num_dim].
Looking forward to your reply!
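If it helps, here is a minimal shape sketch (illustrative only, not the repo's code) of vector attention with queries, keys and values of shape (b, n, d): the attention map is indeed per-channel, i.e. (b, n, n, d), rather than the (b, n, n) map of scalar attention.

import torch

b, n, d = 2, 5, 8
q = torch.randn(b, n, d)
k = torch.randn(b, n, d)
v = torch.randn(b, n, d)

rel = q.unsqueeze(2) - k.unsqueeze(1)           # (b, n, n, d) pairwise relations
attn = rel.softmax(dim = -2)                    # per-channel attention over the key axis
out = (attn * v.unsqueeze(1)).sum(dim = -2)     # (b, n, d) aggregated output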
Hi,
Thanks for this contribution. In the implementation of attn_mlp, the first linear layer increases the dimension. Is this standard practice? I did not find any details about it in the paper. The paper also does not describe the use of a mask; is this again some standard practice for attention layers?
Thanks!!
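For reference, the pattern being asked about looks roughly like the sketch below: an expand-then-project MLP whose hidden width is dim * attn_mlp_hidden_mult, the same trick used in transformer feed-forward blocks (the exact layers in the repo may differ):

import torch.nn as nn

dim = 7
attn_mlp_hidden_mult = 4

attn_mlp = nn.Sequential(
    nn.Linear(dim, dim * attn_mlp_hidden_mult),   # widen
    nn.ReLU(),
    nn.Linear(dim * attn_mlp_hidden_mult, dim)    # project back
)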
No one can reproduce the performance reported in your original paper. Please post your pre-trained model or your original code. Otherwise, we must question your academic ethics!
Great job! I have a question about the number of points in the point cloud. Do you have any suggestions for dealing with point clouds that vary in size? As far as I know, point cloud models are usually applied to ShapeNet, which contains point clouds with 2048 points each. What can we do if the number of points is not constant?
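One common workaround, which the mask argument of this layer supports, is to pad every cloud in a batch to a common length and mask out the padding. A minimal sketch:

import torch
from point_transformer_pytorch import PointTransformerLayer

layer = PointTransformerLayer(dim = 7, pos_mlp_hidden_dim = 64, attn_mlp_hidden_mult = 4)

sizes = [1500, 2048]                      # two clouds with different point counts
max_n = max(sizes)

feats = torch.zeros(len(sizes), max_n, 7)
pos = torch.zeros(len(sizes), max_n, 3)
mask = torch.zeros(len(sizes), max_n).bool()

for i, n in enumerate(sizes):
    feats[i, :n] = torch.randn(n, 7)
    pos[i, :n] = torch.randn(n, 3)
    mask[i, :n] = True                    # only the first n points are real

out = layer(feats, pos, mask = mask)      # padded points are ignored via the mask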
I wrote some wrapper code to turn this layer into a full transformer and I can't seem to figure out what is going wrong. The following works:
import torch
from torch import nn, einsum
import x_transformers
from point_transformer_pytorch import PointTransformerLayer

layer = PointTransformerLayer(
    dim = 7,
    pos_mlp_hidden_dim = 64,
    attn_mlp_hidden_mult = 4,
    num_neighbors = 16  # only the 16 nearest neighbors would be attended to for each point
)

feats = torch.randn(1, 5, 7)
pos = torch.randn(1, 5, 3)
mask = torch.ones(1, 5).bool()

y = layer(feats, pos, mask = mask)
However, this doesn't work:
import torch
from torch import nn, einsum
import x_transformers
from point_transformer_pytorch import PointTransformerLayer

class PointTransformer(nn.Module):
    def __init__(self, feats, mask, neighbors = 16, layers = 5, dimension = 5):
        super().__init__()
        self.feats = feats
        self.mask = mask
        self.neighbors = neighbors
        self.layers = []
        for _ in range(layers):
            self.layers.append(PointTransformerLayer(
                dim = dimension,
                pos_mlp_hidden_dim = 64,
                attn_mlp_hidden_mult = 4,
                num_neighbors = self.neighbors
            ))

    def forward(self, pos):
        curr_pos = pos
        for layer in self.layers:
            print(curr_pos)
            curr_pos = layer(self.feats, pos, self.mask)
            print("----")
        return curr_pos

model = PointTransformer(feats, mask)
model(pos)
The error I'm getting is mat1 and mat2 shapes cannot be multiplied (5x7 and 5x15)
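A likely culprit (my reading, not a confirmed diagnosis): dimension defaults to 5, but feats has feature dimension 7, so the layer's input projection expects 5 channels and receives 7 (the 5x15 operand is consistent with a fused projection of width dim * 3 = 15). The sketch below fixes the dimension and also stores the sub-layers in nn.ModuleList so their parameters are registered:

import torch
from torch import nn
from point_transformer_pytorch import PointTransformerLayer

class PointTransformer(nn.Module):
    def __init__(self, dim = 7, neighbors = 16, depth = 5):
        super().__init__()
        self.layers = nn.ModuleList([
            PointTransformerLayer(
                dim = dim,
                pos_mlp_hidden_dim = 64,
                attn_mlp_hidden_mult = 4,
                num_neighbors = neighbors
            ) for _ in range(depth)
        ])

    def forward(self, feats, pos, mask = None):
        # update the features layer by layer; positions stay fixed
        for layer in self.layers:
            feats = layer(feats, pos, mask = mask)
        return feats

feats = torch.randn(1, 5, 7)
pos = torch.randn(1, 5, 3)
mask = torch.ones(1, 5).bool()

model = PointTransformer(dim = 7)
out = model(feats, pos, mask = mask)   # (1, 5, 7)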
Can you provide the test file? I want to visualize the generated prediction file.
Dear Authors,
In your paper you wrote:
"The layer is invariant to permutation and cardinality and is thus inherently suited to point cloud processing."
I do not understand this statement, because your PointTransformerLayer
https://github.com/lucidrains/point-transformer-pytorch/blob/main/point_transformer_pytorch/point_transformer_pytorch.py#L31
requires the dim parameter at initialization, so it always expects dim elements in the input.
What if a point cloud has dim+1 points?
Thank you in advance.
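Possibly helpful context (my own reading, not the authors'): dim is the per-point feature width, while the number of points is a free axis, so the same layer accepts clouds of any cardinality. A quick check:

import torch
from point_transformer_pytorch import PointTransformerLayer

layer = PointTransformerLayer(dim = 7, pos_mlp_hidden_dim = 64, attn_mlp_hidden_mult = 4)

for n in (8, 9):                      # two clouds with different cardinalities
    feats = torch.randn(1, n, 7)      # 7 is the feature width, not the point count
    pos = torch.randn(1, n, 3)
    out = layer(feats, pos)
    print(out.shape)                  # (1, n, 7): the point axis is unconstrained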
It seems that the implementation of the multi-head point transformer produces scalar attention scores for each head.
Hi!
Thanks for bringing the transformer into the point cloud task. Would you share the full code of the model and the pre-trained weights for a segmentation task such as S3DIS? Thank you!
I'm not sure whether I used the point transformer correctly: I implemented just one block for training, with (x, pos) each of shape [16, 2048, 3] on every GPU, and was then informed that my GPU ran out of memory (11.77 GB total capacity).
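That is plausible with full attention: a per-channel (vector) attention map scales as batch × n × n × dim, which for n = 2048 and batch size 16 is enormous. A hedged sketch of the usual mitigation, restricting each point to its k nearest neighbors via num_neighbors:

import torch
from point_transformer_pytorch import PointTransformerLayer

# attending only to the 16 nearest neighbors shrinks the attention
# tensor from roughly (b, n, n, d) to (b, n, 16, d)
layer = PointTransformerLayer(
    dim = 3,
    pos_mlp_hidden_dim = 64,
    attn_mlp_hidden_mult = 4,
    num_neighbors = 16
)

x = torch.randn(16, 2048, 3)
pos = torch.randn(16, 2048, 3)
out = layer(x, pos)   # far smaller peak memory than full attention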