gap's Introduction

MartinXM

Hi there 👋

gap's People

Contributors

martinxm

gap's Issues

About Bone modality

Thanks for the great work!

When I trained the bone modality following the README, I found that the modality was still joint. In the config, "bone" is set to False, and the training command

    CUDA_VISIBLE_DEVICES=0,1 python main_multipart_ntu.py --config config/nturgbd120-cross-subject/lst_bone.yaml --model model.ctrgcn.Model_lst_4part_bone --work-dir work_dir/ntu120/csub/lst_bone --device 0 1

does not override this setting.
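If the feeder really is loading joint data, the likely fix is to set bone: True in the train/test feeder args of lst_bone.yaml; this is an assumption based on the usual CTR-GCN-style config layout, not something the repo docs confirm. For reference, the bone modality is typically derived from joint data as each joint's offset from its parent, roughly as in this sketch (the pair list here is hypothetical; the real one lives in the repo's graph/bone definition):

    import numpy as np

    # Hypothetical (child, parent) joint-index pairs, for illustration only;
    # take the actual skeleton pairs from the repo's graph definition.
    bone_pairs = [(1, 0), (2, 1), (3, 2)]

    def joints_to_bones(joints, pairs=bone_pairs):
        """joints: (C, T, V, M) array of joint coordinates.
        Returns bone vectors (child minus parent) of the same shape."""
        bones = np.zeros_like(joints)
        for child, parent in pairs:
            bones[:, :, child] = joints[:, :, child] - joints[:, :, parent]
        return bones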

About NW-UCLA

Hello, when I tried to use CTR-GCN to train on NW-UCLA, I got 95.5% accuracy, 1% lower than the result reported for CTR-GCN.
Did you encounter the same problem in your experiments?

About Ablation Study

Hi, thanks for your wonderful work, but I have some questions about the ablation study. I tried to run training with the same settings described in the 'Implementation Details' to see the influence of the partition strategies, but I got a different result, so I suspect my model settings are wrong. I hope you can help me correct them. Thank you!
I use the model Model_lst_4part in training, with a modified TCN_GCN_unit that applies single-scale temporal convolution. The code is shown below.

class TCN_GCN_unit(nn.Module):
    def __init__(self, in_channels, out_channels, A, stride=1, residual=True, adaptive=True, kernel_size=5, dilations=[1, 2]):
        super(TCN_GCN_unit, self).__init__()
        self.gcn1 = unit_gcn(in_channels, out_channels, A, adaptive=adaptive)
        # multi-scale temporal convolution replaced with a single-scale unit_tcn
        # self.tcn1 = MultiScale_TemporalConv(out_channels, out_channels, kernel_size=kernel_size, stride=stride, dilations=dilations,
        #                                     residual=False)
        self.tcn1 = unit_tcn(out_channels, out_channels, kernel_size=kernel_size, stride=stride)
        self.relu = nn.ReLU(inplace=True)
        if not residual:
            self.residual = lambda x: 0
        elif (in_channels == out_channels) and (stride == 1):
            self.residual = lambda x: x
        else:
            self.residual = unit_tcn(in_channels, out_channels, kernel_size=1, stride=stride)

    def forward(self, x):
        # unchanged from the original unit: GCN -> TCN plus the residual branch
        y = self.relu(self.tcn1(self.gcn1(x)) + self.residual(x))
        return y

When using the 4-part partition strategy, I use the code in main_multipart_ntu.py without any modification.

When using the global partition strategy, I also apply the model above, but only use the return values self.fc(x) and feature_dict during training. I add num_text_aug = 1 in the forward pass of every epoch to make sure that only the encoded global information contributes to the contrastive loss.

for batch_idx, (data, label, index) in enumerate(process):
    self.global_step += 1
    with torch.no_grad():
        data = data.float().cuda(self.output_device)
    timer['dataloader'] += self.split_time()
    self.optimizer.zero_grad()

    # forward
    with torch.cuda.amp.autocast():
        output, feature_dict, logit_scale, part_feature_list = self.model(data)

        # here is what I added
        num_text_aug = 1

        label_g = gen_label(label)
        label = label.long().cuda(self.output_device)
        loss_te_list = []
        for ind in range(num_text_aug):
            if ind > 0:
                text_id = np.ones(len(label), dtype=np.int8) * ind
                texts = torch.stack([text_dict[j][i, :] for i, j in zip(label, text_id)])
                texts = texts.cuda(self.output_device)

            else:

                texts = list()
                for i in range(len(label)):
                    text_len = len(text_list[label[i]])
                    text_id = np.random.randint(text_len, size=1)
                    text_item = text_list[label[i]][text_id.item()]
                    texts.append(text_item)
                texts = torch.cat(texts).cuda(self.output_device)

I trained both models for 110 epochs. The best accuracies of the global and 4-part strategies are 0.9045 and 0.9027 respectively, not around 85% as the paper reports. I know it seems odd that the global strategy even outperforms the 4-part strategy in my experiment, so I suspect there are some mistakes in my implementation. I hope you can help me. Looking forward to your reply, and thank you very much!

About pre-trained models

Thanks for the great work!

I'm working on a project that could greatly benefit from the use of your pre-trained weights as described in your paper. Would it be possible for me to access or obtain the pre-trained weights to use in my research? I would greatly appreciate your assistance.

Thank you for your time and consideration.

Sincerely.

some questions about code

I want to switch the backbone to ST-GCN, but this approach does not seem to be effective. I suspect it might be because I haven't trained CLIP. Can you tell me how CLIP is trained?
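For context, in CLIP-style pipelines the pretrained text encoder is usually kept frozen, and only the skeleton encoder is trained to align with its text features through a contrastive loss. Below is a minimal sketch of that general objective, assuming L2-normalized (N, D) feature batches; it illustrates the technique, not this repo's exact loss (which appears to use gen_label targets, as in the training loop quoted above):

    import torch
    import torch.nn.functional as F

    def clip_style_loss(skel_feat, text_feat, logit_scale):
        """Symmetric InfoNCE between skeleton and text features.
        skel_feat, text_feat: (N, D) tensors; logit_scale: learned scalar.
        A sketch of the general CLIP objective, not the repo's exact loss."""
        skel_feat = F.normalize(skel_feat, dim=-1)
        text_feat = F.normalize(text_feat, dim=-1)
        logits = logit_scale * skel_feat @ text_feat.t()  # (N, N) similarity matrix
        targets = torch.arange(logits.size(0), device=logits.device)
        # matching skeleton/text pairs sit on the diagonal
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.t(), targets)) / 2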

2D data set use case possibility?

Hi, thanks for sharing your work; I am interested in trying another application. I am aware that the dataset is based on 3D coordinates, but I have a 2D pose dataset. Would it be possible to incorporate that data and train this model using transfer learning? If so, how can I do it? Thanks.
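One common workaround, offered here as an assumption rather than something the repo documents, is to zero-pad the missing depth channel so 2D poses match the (3, T, V, M) input layout the model expects, and then fine-tune from the 3D-pretrained weights:

    import numpy as np

    def lift_2d_to_3d(pose2d):
        """pose2d: (2, T, V, M) array of 2D joint coordinates.
        Returns a (3, T, V, M) array with a zero depth channel, matching the
        3-channel input layout. Fine-tuning is still needed, since the
        pretrained weights were learned from real 3D coordinates."""
        zeros = np.zeros_like(pose2d[:1])
        return np.concatenate([pose2d, zeros], axis=0)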
