antoine77340 / youtube-8m-willow

Kaggle Youtube 8M WILLOW approach

License: Apache License 2.0

Python 100.00%

youtube-8m-willow's Issues

Error in file_averaging.py

You mention running final_averaging.py, but I guess you meant file_averaging.py.

$ python file_averaging.py
Traceback (most recent call last):
File "file_averaging.py", line 64, in <module>
avg = read_models(model_weights1)
NameError: name 'model_weights1' is not defined

InvalidArgumentError: Name: , Context feature 'video_id' is required but could not be found.

I downloaded 1/100 of the frame-level features and ran train.py. However, I get the following error:

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Name: , Context feature 'video_id' is required but could not be found.
[[Node: train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample = ParseSingleSequenceExample[Ncontext_dense=1, Ncontext_sparse=1, Nfeature_list_dense=1, Nfeature_list_sparse=0, Tcontext_dense=[DT_STRING], context_dense_shapes=[[]], context_sparse_types=[DT_INT64], feature_list_dense_shapes=[[]], feature_list_dense_types=[DT_STRING], feature_list_sparse_types=[], _device="/job:localhost/replica:0/task:0/cpu:0"](train_input/ReaderReadV2_2:1, train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample/feature_list_dense_missing_assumed_empty, train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample/context_sparse_keys_0, train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample/context_dense_keys_0, train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample/feature_list_dense_keys_0, train_input/ParseSingleSequenceExample_2/Const, train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample/debug_name)]]
[[Node: train_input/shuffle_batch_join/cond_2/random_shuffle_queue_EnqueueMany/_98 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_90_train_input/shuffle_batch_join/cond_2/random_shuffle_queue_EnqueueMany", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]

Caused by op u'train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample', defined at:
File "train.py", line 638, in <module>
app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 626, in main
FLAGS.export_model_steps).run(start_new_model=FLAGS.start_new_model)
File "train.py", line 353, in run
saver = self.build_model(self.model, self.reader)
File "train.py", line 524, in build_model
num_epochs=FLAGS.num_epochs)
File "train.py", line 236, in build_graph
num_epochs=num_epochs))
File "train.py", line 164, in get_input_data_tensors
reader.prepare_reader(filename_queue) for _ in range(num_readers)
File "/media/ResearchProject/deeplearning/code/Youtube-8M-WILLOW/readers.py", line 212, in prepare_reader
max_quantized_value, min_quantized_value)
File "/media/ResearchProject/deeplearning/code/Youtube-8M-WILLOW/readers.py", line 224, in prepare_serialized_examples
for feature_name in self.feature_names
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/parsing_ops.py", line 780, in parse_single_sequence_example
feature_list_dense_defaults, example_name, name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/parsing_ops.py", line 977, in _parse_single_sequence_example_raw
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_parsing_ops.py", line 287, in _parse_single_sequence_example
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Name: , Context feature 'video_id' is required but could not be found.

How can I solve this problem?

Could I map a video to Euclidean space?

Is it possible to use it for video retrieval?

Always crashes after hours of training

Training always crashes after running for several hours, and I cannot figure out why. Can you help me work this out? I'm using TensorFlow 1.3.
Here is the log:

INFO:tensorflow:/job:master/task:0: training step 88648| Hit@1: 0.91 PERR: 0.79 GAP: 0.84 Loss: 5.15765
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.DataLossError'>, truncated record at 72435675
[[Node: train_input/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input/TFRecordReaderV2_1, train_input/input_producer)]]
[[Node: train_input/SparseToDense_1/_407 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_33_train_input/SparseToDense_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
INFO:tensorflow:/job:master/task:0: training step 88649| Hit@1: 0.82 PERR: 0.67 GAP: 0.81 Loss: 5.31512
INFO:tensorflow:gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe/model.ckpt-88649 is not in all_model_checkpoint_paths. Manually adding it.
INFO:tensorflow:/job:master/task:0: Exporting the model at step 88649 to gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe/export/step_88649.
2017-08-26 07:09:39.978262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1055] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:02:00.0, compute capability: 3.5)
INFO:tensorflow:Restoring parameters from gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe/model.ckpt-88649
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe/export/step_88649/saved_model.pb
error:Tensor("map/while/Minimum:0", shape=(), dtype=int32) Tensor("map/while/Minimum_2:0", shape=(), dtype=int32)
Traceback (most recent call last):
File "train.py", line 638, in <module>
app.run()
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 626, in main
FLAGS.export_model_steps).run(start_new_model=FLAGS.start_new_model)
File "train.py", line 428, in run
task_as_string(self.task))
File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
enqueue_callable()
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1229, in _single_operation_run
target_list_as_strings, status, None)
File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
c_api.TF_GetCode(status))
tensorflow.python.framework.errors_impl.DataLossError: truncated record at 72435675
[[Node: train_input/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input/TFRecordReaderV2_1, train_input/input_producer)]]
[[Node: train_input/SparseToDense_1/_407 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_33_train_input/SparseToDense_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
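
A DataLossError of this kind usually points at an incomplete .tfrecord file, for example a download that stopped early. The following is a minimal sketch, not part of the repository, of how the input files could be scanned for a truncated record before training. It relies only on the TFRecord framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload) and does not verify the CRCs. The function name find_truncated_record is hypothetical.

```python
import os
import struct


def find_truncated_record(path):
    """Return the byte offset of the first incomplete record, or None if every record is whole."""
    size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as f:
        while offset < size:
            f.seek(offset)
            header = f.read(12)  # uint64 payload length + uint32 CRC of that length
            if len(header) < 12:
                return offset
            (length,) = struct.unpack("<Q", header[:8])
            end = offset + 12 + length + 4  # header + payload + uint32 CRC of the payload
            if end > size:
                return offset  # the file ends before this record does
            offset = end
    return None
```

Running this over every file matched by --train_data_pattern and re-downloading any file for which it returns an offset would be one way to rule out damaged inputs.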

Got different results from inference.py and eval.py

Hello Antoine,

I've applied your code to my own dataset, but something strange happened. Given the same input data and the same model checkpoint, I got different results from eval.py and inference.py. Here are some examples:
From eval.py, numpy.argmax(predictions_val, axis=1) gives:
[17 18 17 18 17 17 17 17 8 17 17 17 0 18 18 17 17 0 17 17 0 17 17 17
17 17 10 17 19 17 17 17 19 10 17 17 17 9 17 7 17 17 17 17 9 17 17 17
10 10 10 0 17 17 17 8 17 17 0 17 17 17 17 17 17 17 17 17 12 17 17 17
17 17 14 17 17 10 17 0 17 16 17 17 17 17 9 0 17 17 17 17 17 17 17 0
17 0 17 17 17 17 17 17 17 0 1 3 5 17 17 9 17 17 10 17 17 17 17 9
17 17 17 10 17 17 17 17]
But from inference.py, numpy.argmax(predictions_val, axis=1) gives:
[11 8 4 18 6 10 10 6 10 0 10 8 0 18 8 10 10 0 10 8 0 8 10 18
8 8 10 10 19 17 18 10 19 10 10 14 9 10 10 8 0 10 10 11 10 3 11 9
10 10 10 0 18 8 10 8 10 0 0 10 7 10 8 0 9 0 10 10 6 10 10 3
10 0 10 6 11 10 10 0 10 10 1 9 0 11 4 3 0 6 0 17 6 17 10 18
7 0 6 7 0 7 10 0 10 0 1 0 16 6 0 9 10 5 10 7 0 11 10 0
0 6 10 10 1 0 6 3]

They are totally different. Have you run into this as well? Looking forward to your reply. Thanks in advance.

Sylvia

How to reduce the model size?

The trained model is over 1 GB. How can I reduce its size?

inference error using the pretrained model

I downloaded the pretrained model zip file. It contains these files:

Did I miss any files when downloading?

In this line:
latest_checkpoint = tf.train.latest_checkpoint(train_dir)
latest_checkpoint is None, so the following code raises the error:
if latest_checkpoint is None:
    raise Exception("unable to find a checkpoint at location: %s" % train_dir)

Can anyone please help. Many thanks!!!

I also tried to freeze the model, but since my freezing code cannot find the model either, that fails too.
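
tf.train.latest_checkpoint does not look at the model.ckpt-* files directly. It reads a small text file named checkpoint inside train_dir and returns None when that file is absent. Whether the downloaded zip lacks this file is an assumption here, since the file list is not shown. If it does, a sketch like the following could recreate the file. The helper name write_checkpoint_state is hypothetical, and the prefix model.ckpt-310001 is taken from the logs in other reports on this page.

```python
import os


def write_checkpoint_state(train_dir, checkpoint_prefix):
    """Write the text file named 'checkpoint' that points at a checkpoint prefix."""
    state_path = os.path.join(train_dir, "checkpoint")
    with open(state_path, "w") as f:
        # Both entries use the prefix without the .index/.meta/.data-* suffix.
        f.write('model_checkpoint_path: "%s"\n' % checkpoint_prefix)
        f.write('all_model_checkpoint_paths: "%s"\n' % checkpoint_prefix)
    return state_path
```

For example, write_checkpoint_state(train_dir, "model.ckpt-310001") would be called once with the directory that holds the unzipped files, before running inference.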

Have you found the PCA matrix used in YouTube-8M?

Would it be possible to compute the PCA matrix ourselves? Thanks!

Google audio set

Hello @antoine77340 ,

I wanted to know whether it would be viable to run your models on the Google AudioSet data.

The data is stored as tfrecords. The number of classes in the data is 527. What changes do you think I should make in the script/ architecture to get your models running?

Thanks in advance.

How to

Where is the model?

How to fine-tune your pretrained model

Hi, could you tell me how to modify the code to fine-tune the pretrained model you provided (gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe)? My own dataset has only 12 classes.
Thanks.

A question about NetVLAD

In NetVLAD,

        activation = tf.reshape(activation, [-1, self.max_frames, self.cluster_size])

        a_sum = tf.reduce_sum(activation,-2,keep_dims=True) # [num_video, 1, 64]

        cluster_weights2 = tf.get_variable("cluster_weights2",
            [1,self.feature_size, self.cluster_size], # 1x1024x64
            initializer = tf.random_normal_initializer(stddev=1 / math.sqrt(self.feature_size)))
        
        a = tf.multiply(a_sum,cluster_weights2) # what is the shape of a?
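
tf.multiply is element-wise and follows NumPy-style broadcasting, so the shape of a can be worked out from the two operand shapes alone: a dimension of size 1 is stretched to match the other operand. The sketch below reproduces that rule in plain Python for shapes of equal rank. The function broadcast_shape and the batch size of 80 are illustrative, not taken from the repository.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the result shape of an element-wise op on two shapes of equal rank."""
    if len(shape_a) != len(shape_b):
        raise ValueError("this sketch only handles shapes of equal rank")
    result = []
    for dim_a, dim_b in zip(shape_a, shape_b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError("incompatible dimensions: %d and %d" % (dim_a, dim_b))
    return tuple(result)


num_videos, feature_size, cluster_size = 80, 1024, 64  # illustrative values
a_sum_shape = (num_videos, 1, cluster_size)               # after reduce_sum over frames
cluster_weights2_shape = (1, feature_size, cluster_size)  # 1 x 1024 x 64
a_shape = broadcast_shape(a_sum_shape, cluster_weights2_shape)
print(a_shape)  # (80, 1024, 64)
```

Under these assumptions a has shape [num_video, feature_size, cluster_size]: every cluster's summed activation scales the matching column of cluster_weights2.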

TypeError: Value passed to parameter 'shape' has DataType float32 not in list of allowed values: int32, int64

I get this error when running train.py. Please help me resolve it.
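
The report does not include a traceback, so the cause can only be guessed. One common way to hit this TypeError is running Python 2 code under Python 3, where / between two integers yields a float that then ends up inside a shape argument. The sketch below only illustrates that difference; feature_size and groups are made-up names and values, not variables from train.py.

```python
feature_size = 1152  # illustrative value
groups = 8           # illustrative value

float_dim = feature_size / groups   # true division: a float under Python 3
int_dim = feature_size // groups    # floor division: always an int

# Under Python 3 this prints "float int"; only int_dim is usable as a shape entry.
print(type(float_dim).__name__, type(int_dim).__name__)
```

If this is the cause, replacing / with // wherever a shape is computed should remove the error.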

Problem in running pre-trained model

When I run the pretrained model using the inference command, the script gets stuck at the last line of the following log:

INFO:tensorflow:number of input files: 13
INFO:tensorflow:loading meta-graph: F:/project_27_06_2018/yt8m/v2/models/video/sample_model/model.ckpt-310001.meta
WARNING:tensorflow:The saved meta_graph is possibly from an older release:
'local_variables' collection should be of type 'byte_list', but instead is of type 'node_list'.
WARNING:tensorflow:The saved meta_graph is possibly from an older release:
'model_variables' collection should be of type 'byte_list', but instead is of type 'node_list'.
INFO:tensorflow:restoring variables from F:/project_27_06_2018/yt8m/v2/models/video/sample_model/model.ckpt-310001
INFO:tensorflow:Restoring parameters from F:/project_27_06_2018/yt8m/v2/models/video/sample_model/model.ckpt-310001

I use the following command to run the inference script :

runfile('F:/project_27_06_2018/yt8m/code/inference.py', args='--output_file="F:/project_27_06_2018/yt8m/output/test-lstm-0002-val-150-random.csv" --input_data_pattern="F:/project_27_06_2018/yt8m/v2/video/validate*.tfrecord" --model="F:/project_27_06_2018/yt8m/v2/models/video/sample_model/model.ckpt-310001.data-00000-of-00001" --train_dir="F:/project_27_06_2018/yt8m/v2/models/video/sample_model/" --frame_features=False --feature_names="rgb,audio" --feature_sizes="1024,128" --batch_size=1024 --base_learning_rate=0.0002 --iterations=150 --lstm_random_sequence=True --run_once=True --top_k=50', wdir='F:/project_27_06_2018/yt8m/code')

What am I doing wrong?

Inference on new videos?

Hi, thanks for sharing the great work. Have you figured out how to use this model for any given videos, i.e., have you figured out how to extract the features from RGB frames into the required tfrecords? Thanks!

About the validation

Hi, can you tell me about the validation part? There is no description of validation in your code. Can I just use eval.py to evaluate the trained model on the validation data, the same way you use inference.py, as follows? Or can you give me an example?

python eval.py --eval_data_pattern="$path_to_features/validatea*.tfrecord" --model=NetVLADModelLF --train_dir=gatedlightvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe --frame_features=True --feature_names="rgb,audio" --feature_sizes="1024,128" --batch_size=1024 --base_learning_rate=0.0002 --netvlad_cluster_size=256 --netvlad_hidden_size=1024 --moe_l2=1e-6 --iterations=300 --learning_rate_decay=0.8 --netvlad_relu=False --gating=True --moe_prob_gating=True --lightvlad=True --run_once=True --top_k=50

Inference error: Feature list 'audio' is required but could not be found.

You are probably trying to process video-level features, whereas I think his code is for frame-level features. I don't know whether it would work to change --feature_names="rgb,audio" to --feature_names="mean_rgb,mean_audio", because those are the names the video-level features are supposed to have.

Originally posted by @estathop in #16 (comment)

Context feature 'video_id' is required but could not be found.

Hello @antoine77340

I am trying to run the GRU model in Antoine's code on the new version of the data, and this is the error I get when I run it:
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Name: , Context feature 'video_id' is required but could not be found.

The code runs on the older version of the data (version1), but does not run on the newer version.

This is the command that I ran:
python train.py --train_data_pattern=${HOME}/yt8m/v2/frame/train*.tfrecord --model=GruModel --train_dir ~/yt8m/v2/models/frame/GRU_model2 --frame_features=True --feature_names="audio" --feature_sizes="128" --batch_size=128 --base_learning_rate=0.0002 --gru_cells=1200 --learning_rate_decay=0.9 --moe_l2=1e-6 --max_step=300000 --start_new_model

NOTE: I am running it on the audio features only. The GRU model is able to handle this (the code runs on the older data), so this should not be an issue.

I need to run the models on the newer data only. Please suggest changes to the code, so that the model can run.

Thanks in advance.
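
One likely explanation, stated here as an assumption rather than something the report confirms, is that the newer records store the video identifier under the context key id, while the reader in readers.py asks parse_single_sequence_example for video_id. The sketch below shows the selection logic only; pick_id_key is a hypothetical helper, and its result would be used as the key in the context_features dictionary.

```python
def pick_id_key(context_keys):
    """Return the context key that holds the video identifier, or None if neither name is present."""
    for candidate in ("id", "video_id"):
        if candidate in context_keys:
            return candidate
    return None


print(pick_id_key({"id", "labels"}))        # id
print(pick_id_key({"video_id", "labels"}))  # video_id
```

If the assumption holds, changing the single "video_id" key in the reader to "id" would be the smallest change that lets the same code read the newer data.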

Feature list 'audio' is required but could not be found.

Hi, I want to use your code to run experiments on AudioSet (audio only, 128-dimensional frame-level features).

But I have a problem with your NetVLADModelLF model:
Command:
python train.py --train_data_pattern='/vol/vssp/msos/yx/audioset/audioset_v1_embeddings/bal_train/*.tfrecord' --model=NetVLADModelLF --train_dir=gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe --frame_features=True --feature_names='audio_embedding' --feature_sizes='128' --batch_size=80 --base_learning_rate=0.0002 --netvlad_cluster_size=256 --netvlad_hidden_size=128 --moe_l2=1e-6 --iterations=300 --learning_rate_decay=0.8 --netvlad_relu=False --gating=True --moe_prob_gating=True --max_step=700000

Error: 2017-07-24 15:44:44.093022: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Name: , Feature list 'audio' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?

But I can run LstmModel or GruModel successfully:
Command:
python train.py --train_data_pattern='/vol/vssp/msos/yx/audioset/audioset_v1_embeddings/bal_train/*.tfrecord' --frame_features=True --model=LstmModel --feature_names='audio_embedding' --feature_sizes='128' --train_dir=tmp_model/frame_level_lstm_model_bal

The feature name in AudioSet is 'audio_embedding' (passed as --feature_names='audio_embedding'), so I think there might be a feature-name error in your NetVLADModelLF definition. Could you give me some help with it?

Thanks a lot.

Didn't anyone ever notice this small bug? Or is it too small to mention?

In frame_level_models.py, line 102, an integer flag is defined with a float default:

flags.DEFINE_integer("fv_coupling_factor", 0.01,
                     "Coupling factor")

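The mismatch reported above is that a flag declared as an integer is given the default 0.01, which is not an integer. The sketch below shows the same flag declared with a float type, using argparse from the standard library as a stand-in for the TensorFlow flags module. In the repository itself the corresponding change would presumably be flags.DEFINE_float, which is a suggestion and not a confirmed fix.

```python
import argparse

parser = argparse.ArgumentParser()
# A fractional default needs a float-typed flag; an integer flag cannot represent 0.01.
parser.add_argument("--fv_coupling_factor", type=float, default=0.01,
                    help="Coupling factor")

args = parser.parse_args([])
print(args.fv_coupling_factor)  # 0.01

args = parser.parse_args(["--fv_coupling_factor", "0.5"])
print(args.fv_coupling_factor)  # 0.5
```

With a float type, both the default and values passed on the command line keep their fractional part.
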
Not found: Key tower/fully_connected/biases not found in checkpoint

I downloaded your pretrained model and tried to evaluate it on 10 videos from the training sample, but when I run this command, the error below follows. I can't figure out why this is happening. Everything else is installed normally.

python eval.py --eval_data_pattern=/home/estathop/features/train*.tfrecord --train_dir=/home/estathop/Documents/pretrained/public/ --run_once=True
/home/estathop/tensorflow/local/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
tensorflow version: 1.9.0-dev20180502
INFO:tensorflow:Using batch size of 1024 for evaluation.
INFO:tensorflow:number of evaluation files: 43
INFO:tensorflow:built evaluation graph
2018-05-04 15:33:19.482097: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-04 15:33:19.573605: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-04 15:33:19.573963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1349] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.20GiB
2018-05-04 15:33:19.573978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1428] Adding visible gpu devices: 0
2018-05-04 15:33:19.760748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-04 15:33:19.760790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:922] 0
2018-05-04 15:33:19.760795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:935] 0: N
2018-05-04 15:33:19.760999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1046] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6955 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Loading checkpoint for eval: /home/estathop/Documents/pretrained/public/model.ckpt-310001
INFO:tensorflow:Restoring parameters from /home/estathop/Documents/pretrained/public/model.ckpt-310001
2018-05-04 15:33:19.884136: W tensorflow/core/framework/op_kernel.cc:1290] CtxFailure at save_restore_v2_ops.cc:184: Not found: Key tower/fully_connected/biases not found in checkpoint
Traceback (most recent call last):
File "eval.py", line 332, in <module>
app.run()
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "eval.py", line 328, in main
evaluate()
File "eval.py", line 320, in evaluate
last_global_step_val)
File "eval.py", line 197, in evaluation_loop
saver.restore(sess, latest_checkpoint)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1746, in restore
six.reraise(exception_type, exception_value, exception_traceback)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1734, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key tower/fully_connected/biases not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op u'save/RestoreV2', defined at:
File "eval.py", line 332, in <module>
app.run()
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "eval.py", line 328, in main
evaluate()
File "eval.py", line 309, in evaluate
saver = tf.train.Saver(tf.global_variables())
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1266, in __init__
self.build()
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1278, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1315, in _build
build_save=build_save, build_restore=build_restore)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 763, in _build_internal
restore_sequentially, reshape)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 400, in _AddRestoreOps
restore_sequentially)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 814, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3408, in create_op
op_def=op_def)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1734, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): Key tower/fully_connected/biases not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

NetVLAD 256 out of memory on titan x 12 GB

Hi,
I am running this model but get a CUDA out-of-memory error when training. How did you get around this?
Thank you.

Question about map.

The MAP is very low. Is that normal?

Did your team place first in the competition?

Prediction results are the same in inference

I use the pretrained model to predict on my own data without any training.
The model is NetVLADModelLF.

I extract only RGB features from my own data and construct the input tensor:

for i in xrange(data_num):
    feature = data[i]
    # Pad to 300 frames x 1152 dims; only frame 0 and its first 1024 dims are filled.
    pad_feature = np.zeros([300, 1152])
    pad_feature[0, :1024] = feature
    video_batch = pad_feature[np.newaxis, :, :].astype(np.float32)
    # The model is told that each video has exactly one valid frame.
    num_frames_batch = np.array([1], dtype=np.int32)
    predictions = sess.run([predictions_tensor],
                           feed_dict={input_tensor: video_batch, num_frames_tensor: num_frames_batch})

The predictions are the same for every input.
I also tried constructing a 300-frame input, but the problem remains.

Did I do the inference correctly?

Problem in inference

Hi, glad to tell you that Google has released the PCA matrix. I now want to write a predict function to test original videos with your model.
But I have run into a problem: when I set batch_size to 1, the results are strange. Could you please tell me the reason?

About descriptors in NetVLAD

Hi @antoine77340
In the paper "Learnable pooling with Context Gating for video classification", on page 4, in the NetVLAD aggregation section, could you please tell me where the descriptor xi comes from?
In the original VLAD, the descriptors are produced by SIFT on each frame.
Here, does xi mean each frame-level feature (dimension = 1024)?
Thanks !
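
To make the question concrete, here is a minimal pure-Python sketch of NetVLAD aggregation, written under the assumption that each xi is the feature vector of one frame. It soft-assigns every descriptor to K clusters with a softmax and sums the weighted residuals to the cluster centers. It omits the normalization steps, returns the result indexed as [cluster][dimension], and is not the repository's implementation; all names are illustrative.

```python
import math


def netvlad(frames, weights, biases, centers):
    """Aggregate per-frame descriptors into a K x D matrix of soft-assigned residuals.

    frames:  list of N descriptors, each a list of D floats (assumed: one per frame)
    weights: list of K weight vectors (D floats each) used for the soft assignment
    biases:  list of K floats
    centers: list of K cluster centers (D floats each)
    """
    num_clusters = len(centers)
    dim = len(centers[0])
    vlad = [[0.0] * dim for _ in range(num_clusters)]
    for x in frames:
        logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
                  for w, b in zip(weights, biases)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        for k in range(num_clusters):
            a_k = exps[k] / total  # soft assignment of x to cluster k
            for j in range(dim):
                vlad[k][j] += a_k * (x[j] - centers[k][j])
    return vlad
```

With a single cluster centered at the origin the soft assignment is always 1, so the output is simply the sum of the frame descriptors, which is a quick sanity check of the aggregation.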

Issues when test video/frame feature

Hi, @antoine77340. I have downloaded the YouTube-8M dataset and used the video/frame test folder to test your pretrained model. But I get an error when testing. The error output is as follows:
INFO:tensorflow:number of input files: 4096
INFO:tensorflow:loading meta-graph: pretrainedmodel/model.ckpt-310001.meta
INFO:tensorflow:restoring variables from pretrainedmodel/model.ckpt-310001
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, ../YT8M/youtube-8m/features/validatelN.tfrecord
[[Node: train_input/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input/TFRecordReaderV2_1, train_input/input_producer)]]

Caused by op u'train_input/ReaderReadV2_1', defined at:
File "inference.py", line 203, in <module>
app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "inference.py", line 199, in main
FLAGS.output_file, FLAGS.batch_size, FLAGS.top_k)
File "inference.py", line 128, in inference
saver = tf.train.import_meta_graph(meta_graph_location, clear_devices=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1577, in import_meta_graph
**kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/meta_graph.py", line 498, in import_scoped_meta_graph
producer_op_list=producer_op_list)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 287, in import_graph_def
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
self._traceback = _extract_stack()

NotFoundError (see above for traceback): ../YT8M/youtube-8m/features/validatelN.tfrecord
[[Node: train_input/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input/TFRecordReaderV2_1, train_input/input_producer)]]

In addition, I started the test with the following command:
python inference.py --output_file=test_video_v1.csv --input_data_pattern="video_test/test*.tfrecord" --model=NetVLADModelLF --train_dir=pretrainedmodel --frame_features=false --batch_size=1024 --base_learning_rate=0.0002 --netvlad_cluster_size=256 --netvlad_hidden_size=1024 --moe_l2=1e-6 --iterations=300 --learning_rate_decay=0.8 --netvlad_relu=False --gating=True --moe_prob_gating=True --run_once=True --top_k=50

Looking forward to your reply, thank you!

Inference error

I get an error when running

python inference.py --output_file=test-lstm-0002-val-150-random.csv --input_data_pattern="/data/dataset/yt8m/video-level/test/test*.tfrecord" --model=LstmModel --train_dir="/data/yt8m/public" --frame_features=True --feature_names="rgb,audio" --feature_sizes="1024,128" --batch_size=1024 --base_learning_rate=0.0002 --iterations=150 --lstm_random_sequence=True --run_once=True --top_k=50

Traceback (most recent call last):
File "inference.py", line 206, in <module>
app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "inference.py", line 202, in main
FLAGS.output_file, FLAGS.batch_size, FLAGS.top_k)
File "inference.py", line 175, in inference
coord.join(threads)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
enqueue_callable()
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1259, in _single_operation_run
None)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Name: , Feature list 'audio' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?

Consultation on training questions

1. Can I use video-level features to train your model?
2. Can I use only RGB (no audio) frame-level features to train your model?
Thanks.

Training time

How long does training the model (gatednetfvLF-128k-1024-80-0002-300iter-norelu-basic-gatedmoe) take with only one GPU?

Request your trained model

Congratulations on winning, and thanks for sharing your work. I'm very interested in the YouTube-8M Large-Scale Video Understanding project. I would like to test your team's model on my own videos. Can you send me your trained model? My email: [email protected]. Thanks.

Mean average precision greater than 1

Hi
I have used these functions to calculate mean average precision (mAP) for a subset of Google AudioSet. I have run many training jobs. I have now changed my features and started a new training run, but in some epochs mAP is greater than 1. I don't know what the problem is. Other metrics such as hit_at_one or AUC (area under the curve) are normal. How can I solve it?
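
As a point of comparison, the sketch below is a plain reference computation of average precision for one ranked list with binary labels. It is not the function used in the report. Each precision term is at most 1 and the sum is divided by the number of positives, so the result cannot exceed 1. A value above 1 therefore suggests that one of these assumptions is broken somewhere, for example a normalizing positive count that is smaller than the number of positives actually ranked, though that is only a guess.

```python
def average_precision(scores, labels):
    """Average precision for one ranked list; labels must be 0 or 1."""
    num_positives = sum(labels)
    if num_positives == 0:
        return 0.0
    # Rank items from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += float(hits) / rank  # precision at this positive's rank
    return precision_sum / num_positives
```

Running both implementations on the same scores and labels for an epoch where mAP exceeds 1 would show whether the inputs or the metric code are at fault.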