gap's Introduction

MartinXM

Hi there 👋

gap's People

Contributors

martinxm

gap's Issues

About Bone modality

Thanks for the great work!

When I trained the bone modality following the README, I found that the modality was still joint. In the config, "bone" is set to False, and the training command

    CUDA_VISIBLE_DEVICES=0,1 python main_multipart_ntu.py --config config/nturgbd120-cross-subject/lst_bone.yaml --model model.ctrgcn.Model_lst_4part_bone --work-dir work_dir/ntu120/csub/lst_bone --device 0 1

does not override this setting.
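If the feeder really is loading joint data, the likely fix is to set bone: True in the train/test feeder args of lst_bone.yaml; this is an assumption based on the usual CTR-GCN-style config layout, not something the repo docs confirm. For reference, the bone modality is typically derived from joint data as each joint's offset from its parent, roughly as in this sketch (the pair list here is hypothetical; the real one lives in the repo's graph/bone definition):

    import numpy as np

    # Hypothetical (child, parent) joint-index pairs, for illustration only;
    # take the actual skeleton pairs from the repo's graph definition.
    bone_pairs = [(1, 0), (2, 1), (3, 2)]

    def joints_to_bones(joints, pairs=bone_pairs):
        """joints: (C, T, V, M) array of joint coordinates.
        Returns bone vectors (child minus parent) of the same shape."""
        bones = np.zeros_like(joints)
        for child, parent in pairs:
            bones[:, :, child] = joints[:, :, child] - joints[:, :, parent]
        return bones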

About NW-UCLA

Hello, when I tried to use CTR-GCN to train on NW-UCLA, I got 95.5% accuracy, 1% lower than the result reported for CTR-GCN.
Did you encounter the same problem in your experiments?

About Ablation Study

Hi, thanks for your wonderful work, but I have some questions about the ablation study. I tried to run training with the same settings described in the 'Implementation Details' to see the influence of the partition strategies, but I got a different result, so I suspect my model settings are wrong. I hope you can help me correct them. Thank you!
I use the model Model_lst_4part in training, with a modified TCN_GCN_unit that applies single-scale temporal convolution. The code is shown below.

class TCN_GCN_unit(nn.Module):
    def __init__(self, in_channels, out_channels, A, stride=1, residual=True, adaptive=True, kernel_size=5, dilations=[1, 2]):
        super(TCN_GCN_unit, self).__init__()
        self.gcn1 = unit_gcn(in_channels, out_channels, A, adaptive=adaptive)
        # multi-scale temporal convolution replaced with a single-scale unit_tcn
        # self.tcn1 = MultiScale_TemporalConv(out_channels, out_channels, kernel_size=kernel_size, stride=stride, dilations=dilations,
        #                                     residual=False)
        self.tcn1 = unit_tcn(out_channels, out_channels, kernel_size=kernel_size, stride=stride)
        self.relu = nn.ReLU(inplace=True)
        if not residual:
            self.residual = lambda x: 0
        elif (in_channels == out_channels) and (stride == 1):
            self.residual = lambda x: x
        else:
            self.residual = unit_tcn(in_channels, out_channels, kernel_size=1, stride=stride)

    def forward(self, x):
        # unchanged from the original unit: GCN -> TCN plus the residual branch
        y = self.relu(self.tcn1(self.gcn1(x)) + self.residual(x))
        return y

When using the 4-part partition strategy, I use the code in main_multipart_ntu.py without any modification.

When using the global partition strategy, I also apply the model above, but only use the return values self.fc(x) and feature_dict during training. I add num_text_aug = 1 in the forward pass of every epoch to make sure that only the encoded global information contributes to the contrastive loss.

for batch_idx, (data, label, index) in enumerate(process):
    self.global_step += 1
    with torch.no_grad():
        data = data.float().cuda(self.output_device)
    timer['dataloader'] += self.split_time()
    self.optimizer.zero_grad()

    # forward
    with torch.cuda.amp.autocast():
        output, feature_dict, logit_scale, part_feature_list = self.model(data)

        # here is what I added
        num_text_aug = 1

        label_g = gen_label(label)
        label = label.long().cuda(self.output_device)
        loss_te_list = []
        for ind in range(num_text_aug):
            if ind > 0:
                text_id = np.ones(len(label), dtype=np.int8) * ind
                texts = torch.stack([text_dict[j][i, :] for i, j in zip(label, text_id)])
                texts = texts.cuda(self.output_device)

            else:

                texts = list()
                for i in range(len(label)):
                    text_len = len(text_list[label[i]])
                    text_id = np.random.randint(text_len, size=1)
                    text_item = text_list[label[i]][text_id.item()]
                    texts.append(text_item)
                texts = torch.cat(texts).cuda(self.output_device)

I trained both models for 110 epochs. The best accuracies of the global and 4-part strategies are 0.9045 and 0.9027 respectively, not around 85% as the paper reports. I know it seems odd that the global strategy even outperforms the 4-part strategy in my experiment, so I suspect there are some mistakes in my implementation. I hope you can help me. Looking forward to your reply, and thank you very much!

About pre-trained models

Thanks for the great work!

I'm working on a project that could greatly benefit from the use of your pre-trained weights as described in your paper. Would it be possible for me to access or obtain the pre-trained weights to use in my research? I would greatly appreciate your assistance.

Thank you for your time and consideration.

Sincerely.

some questions about code

I want to switch the backbone to ST-GCN, but this approach does not seem to be effective. I suspect it might be because I haven't trained CLIP. Can you tell me how CLIP is trained?
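For context, in CLIP-style pipelines the pretrained text encoder is usually kept frozen, and only the skeleton encoder is trained to align with its text features through a contrastive loss. Below is a minimal sketch of that general objective, assuming L2-normalized (N, D) feature batches; it illustrates the technique, not this repo's exact loss (which appears to use gen_label targets, as in the training loop quoted above):

    import torch
    import torch.nn.functional as F

    def clip_style_loss(skel_feat, text_feat, logit_scale):
        """Symmetric InfoNCE between skeleton and text features.
        skel_feat, text_feat: (N, D) tensors; logit_scale: learned scalar.
        A sketch of the general CLIP objective, not the repo's exact loss."""
        skel_feat = F.normalize(skel_feat, dim=-1)
        text_feat = F.normalize(text_feat, dim=-1)
        logits = logit_scale * skel_feat @ text_feat.t()  # (N, N) similarity matrix
        targets = torch.arange(logits.size(0), device=logits.device)
        # matching skeleton/text pairs sit on the diagonal
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.t(), targets)) / 2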

2D data set use case possibility?

Hi, thanks for sharing your work; I am interested in trying another application. I am aware that the dataset is based on 3D coordinates, but I have a 2D pose dataset. Would it be possible to incorporate that data and train this model using transfer learning? If so, how can I do it? Thanks.
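One common workaround, offered here as an assumption rather than something the repo documents, is to zero-pad the missing depth channel so 2D poses match the (3, T, V, M) input layout the model expects, and then fine-tune from the 3D-pretrained weights:

    import numpy as np

    def lift_2d_to_3d(pose2d):
        """pose2d: (2, T, V, M) array of 2D joint coordinates.
        Returns a (3, T, V, M) array with a zero depth channel, matching the
        3-channel input layout. Fine-tuning is still needed, since the
        pretrained weights were learned from real 3D coordinates."""
        zeros = np.zeros_like(pose2d[:1])
        return np.concatenate([pose2d, zeros], axis=0)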
