neuralnetwork-viterbi's People

Contributors

alexanderrichard

neuralnetwork-viterbi's Issues

MPII Cooking Dataset

Hi everyone,
I'm trying to use this decoder on the MPII Cooking Dataset.
This dataset is much bigger than the Breakfast dataset:

  • The longest video is 71K frames long.
  • The FPS is ~30.
  • The number of actions is 64.
  • The dimensionality of the provided features is 4000.

In order to make training work, I modified the parameters as follows (sketched below):

  • frame_sampling = 440 in train.py
  • buffered_frame_ratio = 10 in train.py
  • batch_size = 256 in train.py
  • n_old_frames = min(int(7000), self.buffer.n_frames()) in network.py
  • window_size = 50 in network.py
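
For reference, a rough sketch of these changes; the parameter names are taken from the list above, and the comments only reflect my understanding of what each value controls:

# train.py -- modified hyperparameters
frame_sampling = 440       # coarser frame grid for the Viterbi decoding during training
buffered_frame_ratio = 10  # controls how many frames end up in the frame buffer
batch_size = 256           # mini-batch size for the frame-wise network

# network.py -- modified values
n_old_frames = min(int(7000), self.buffer.n_frames())  # cap on replayed buffer frames per update
window_size = 50                                        # temporal window around each sampled frame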

After 2 days and 4 hours I was still waiting for the training to finish, so I decided to abort.

Now I'm planning to:

  1. Reduce the number of frames per action.
  2. Increase the window stride; currently the window stops at every frame (a rough sketch of points 1 and 2 follows this list).
  3. Reduce the feature dimensionality, perhaps by creating Fisher vectors.
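
A rough sketch of points 1 and 2, assuming the features are stored as (dim x n_frames) numpy arrays; the stride value is only illustrative:

import numpy as np

def subsample_features(features, stride=15):
    # Keep every `stride`-th frame to reduce the number of frames per video,
    # which amounts to increasing the window stride at the input level.
    # features: np.ndarray of shape (dim, n_frames)
    # returns:  np.ndarray of shape (dim, ceil(n_frames / stride))
    return features[:, ::stride]

Any frame-level ground truth used for evaluation would need the same subsampling so that predictions and labels stay aligned.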

I was also wondering whether this type of dataset fits the purpose of this decoder, or whether, since the actions are too long, I should find another way. Any ideas, suggestions, or criticism are really appreciated!

Results on 50 Salads

Hello Alex,

I have trained on 50 Salads split 1 for 3K iterations and the test accuracy is 0.386362. Does that look reasonable to you, or do I have to tune the parameters (assuming the repo's parameters are set for the Breakfast data)? Please advise. Thanks.

Breakfast: frame accuracy: 0.375708

Evaluate 252 video files...
frame accuracy: 0.375708

Hello, I downloaded the dataset you provided and ran your code, but I only get 0.375708 on the Breakfast dataset. By the way, is this segmentation accuracy or alignment accuracy?

Regarding Viterbi decoding

Hi Alex,
I have two more questions regarding the Viterbi decoding.

  1. For the auxiliary function Q(t,l,c,h), you distinguish two cases:
     (1) When frame t stays in the same label, you multiply the function by the frame score of that label, which is line 90 in utils/viterbi.py:

score = hyp.score + self.frame_score(frame_scores, t, label)

     (2) When frame t moves on to the next label, you multiply the function by the frame score of the new label, the length model score (the Poisson probability of the segment that just ended), and also the grammar score (which just checks whether the new label is a valid successor), as in line 98:

score = hyp.score + self.frame_score(frame_scores, t, label) + self.length_model.score(length, label) + self.grammar.score(context, new_label)

Therefore, the frame score here should be self.frame_score(frame_scores, t, new_label) instead of self.frame_score(frame_scores, t, label), right? Please forgive me if I have misunderstood the auxiliary function.

  2. In the traceback function, I guess you want to add all the missing frames (the remainder from sampling every 30 frames) to the last segment (which always has the SIL label 0), as in lines 127-128:

segments[0].length += n_frames - len(labels) # append length of missing frames
labels += [hyp.traceback.label] * (n_frames - len(labels)) # append labels for missing frames

However, you add the number of missing frames to the length of the first segment and then append the labels of the missing frames to the end of labels. In this case, the returned labels and segments do not correspond to each other in the first and last segment. (This actually has no big effect, as the segments are never used during training.) A sketch of both suggested corrections is given below.
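
To make the two points concrete, here is how I would change the two spots, reusing the variable names from the lines quoted above (this is only my suggestion, not necessarily the intended behaviour):

# (1) line 98: when a segment boundary is crossed, score frame t under the new label
score = hyp.score + self.frame_score(frame_scores, t, new_label) + self.length_model.score(length, label) + self.grammar.score(context, new_label)

# (2) lines 127-128: extend the last segment (always SIL), which also receives the labels of the missing frames
segments[-1].length += n_frames - len(labels)               # append length of missing frames to the last segment
labels += [hyp.traceback.label] * (n_frames - len(labels))  # append labels for missing frames (unchanged)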

Regards

Features for Hollywood Extended?

Hello,

Can you also provide the extracted features for the Hollywood Extended dataset as you did for the other two datasets?

Thank you.
