youtube-8m-willow's People

Contributors

antoine77340

youtube-8m-willow's Issues

Error in file_averaging.py

You say to run final_averaging.py, but I guess you meant file_averaging.py. Running it gives:

$ python file_averaging.py
Traceback (most recent call last):
File "file_averaging.py", line 64, in <module>
avg = read_models(model_weights1)
NameError: name 'model_weights1' is not defined
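The NameError suggests the script expects the user to define model_weights1, presumably as the list of prediction files and their weights. That is a guess, because the script itself is not shown here. For reference, here is a minimal sketch of weighted averaging of Kaggle-style prediction files. It assumes each file has a "VideoId,LabelConfidencePairs" header and space-separated "label confidence" pairs. The function names are hypothetical, and this is not the repository's file_averaging.py:

```python
import csv
import io
from collections import defaultdict


def parse_submission(text):
    """Parse 'VideoId,LabelConfidencePairs' CSV text into {video_id: {label: score}}."""
    preds = {}
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    for video_id, pairs in reader:
        tokens = pairs.split()
        # tokens alternate: label, confidence, label, confidence, ...
        preds[video_id] = {int(tokens[i]): float(tokens[i + 1])
                           for i in range(0, len(tokens), 2)}
    return preds


def average_submissions(submissions, weights, top_k=20):
    """Weighted average of parsed submissions; a label missing from a file counts as 0."""
    total = float(sum(weights))
    merged = defaultdict(lambda: defaultdict(float))
    for preds, weight in zip(submissions, weights):
        for video_id, scores in preds.items():
            for label, score in scores.items():
                merged[video_id][label] += weight * score / total
    # Keep the top_k labels per video, highest averaged score first.
    return {vid: sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
            for vid, scores in merged.items()}
```

With two files that give video v1 the scores {0: 0.9, 3: 0.2} and {0: 0.5, 5: 0.4}, equal weights yield label 0 at 0.7, then label 5, then label 3.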

InvalidArgumentError: Name: , Context feature 'video_id' is required but could not be found.

I downloaded 1/100 of the frame-level features and ran train.py, but I got the following error:

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Name: , Context feature 'video_id' is required but could not be found.
[[Node: train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample = ParseSingleSequenceExample[Ncontext_dense=1, Ncontext_sparse=1, Nfeature_list_dense=1, Nfeature_list_sparse=0, Tcontext_dense=[DT_STRING], context_dense_shapes=[[]], context_sparse_types=[DT_INT64], feature_list_dense_shapes=[[]], feature_list_dense_types=[DT_STRING], feature_list_sparse_types=[], _device="/job:localhost/replica:0/task:0/cpu:0"](train_input/ReaderReadV2_2:1, train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample/feature_list_dense_missing_assumed_empty, train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample/context_sparse_keys_0, train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample/context_dense_keys_0, train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample/feature_list_dense_keys_0, train_input/ParseSingleSequenceExample_2/Const, train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample/debug_name)]]
[[Node: train_input/shuffle_batch_join/cond_2/random_shuffle_queue_EnqueueMany/_98 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_90_train_input/shuffle_batch_join/cond_2/random_shuffle_queue_EnqueueMany", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op u'train_input/ParseSingleSequenceExample_2/ParseSingleSequenceExample', defined at:
File "train.py", line 638, in <module>
app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 626, in main
FLAGS.export_model_steps).run(start_new_model=FLAGS.start_new_model)
File "train.py", line 353, in run
saver = self.build_model(self.model, self.reader)
File "train.py", line 524, in build_model
num_epochs=FLAGS.num_epochs)
File "train.py", line 236, in build_graph
num_epochs=num_epochs))
File "train.py", line 164, in get_input_data_tensors
reader.prepare_reader(filename_queue) for _ in range(num_readers)
File "/media/ResearchProject/deeplearning/code/Youtube-8M-WILLOW/readers.py", line 212, in prepare_reader
max_quantized_value, min_quantized_value)
File "/media/ResearchProject/deeplearning/code/Youtube-8M-WILLOW/readers.py", line 224, in prepare_serialized_examples
for feature_name in self.feature_names
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/parsing_ops.py", line 780, in parse_single_sequence_example
feature_list_dense_defaults, example_name, name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/parsing_ops.py", line 977, in _parse_single_sequence_example_raw
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_parsing_ops.py", line 287, in _parse_single_sequence_example
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Name: , Context feature 'video_id' is required but could not be found.

How can I solve this problem?

Always crashes after hours of training

I have run into a problem: after training the model for several hours, it crashes, and I cannot figure out why. Can you help me figure this out? I'm using TensorFlow 1.3.
Here is the log:

INFO:tensorflow:/job:master/task:0: training step 88648| Hit@1: 0.91 PERR: 0.79 GAP: 0.84 Loss: 5.15765
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.DataLossError'>, truncated record at 72435675
[[Node: train_input/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input/TFRecordReaderV2_1, train_input/input_producer)]]
[[Node: train_input/SparseToDense_1/_407 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_33_train_input/SparseToDense_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
INFO:tensorflow:/job:master/task:0: training step 88649| Hit@1: 0.82 PERR: 0.67 GAP: 0.81 Loss: 5.31512
INFO:tensorflow:gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe/model.ckpt-88649 is not in all_model_checkpoint_paths. Manually adding it.
INFO:tensorflow:/job:master/task:0: Exporting the model at step 88649 to gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe/export/step_88649.
2017-08-26 07:09:39.978262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1055] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:02:00.0, compute capability: 3.5)
INFO:tensorflow:Restoring parameters from gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe/model.ckpt-88649
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe/export/step_88649/saved_model.pb
error:Tensor("map/while/Minimum:0", shape=(), dtype=int32) Tensor("map/while/Minimum_2:0", shape=(), dtype=int32)
Traceback (most recent call last):
File "train.py", line 638, in <module>
app.run()
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 626, in main
FLAGS.export_model_steps).run(start_new_model=FLAGS.start_new_model)
File "train.py", line 428, in run
task_as_string(self.task))
File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
enqueue_callable()
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1229, in _single_operation_run
target_list_as_strings, status, None)
File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/data1/wangchao16/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
c_api.TF_GetCode(status))
tensorflow.python.framework.errors_impl.DataLossError: truncated record at 72435675
[[Node: train_input/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input/TFRecordReaderV2_1, train_input/input_producer)]]
[[Node: train_input/SparseToDense_1/_407 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_33_train_input/SparseToDense_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
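"truncated record at 72435675" means the reader reached the end of a file partway through a record. That usually points to one .tfrecord file that was not downloaded completely, although this log alone does not prove it. TFRecord framing is simple: an 8-byte little-endian length, a 4-byte masked CRC of the length, the payload, then a 4-byte masked CRC of the payload. The following sketch walks that framing in plain Python to locate the bad file. It does NOT verify the CRC32C checksums, so it only catches truncation, not corruption:

```python
import struct


def check_tfrecord_framing(path):
    """Walk a TFRecord file and report the first truncated record, if any.

    Only the length framing is checked; CRC32C checksums are NOT verified.
    Returns (num_complete_records, byte_offset_of_truncated_record_or_None).
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            header = f.read(12)  # uint64 length + uint32 masked CRC of the length
            if not header:
                return count, None  # clean end of file
            if len(header) < 12:
                return count, offset  # file ends inside a record header
            (length,) = struct.unpack("<Q", header[:8])
            body = f.read(length + 4)  # payload + uint32 masked CRC of the payload
            if len(body) < length + 4:
                return count, offset  # file ends inside a record payload
            count += 1
```

Running it over every file matched by the training pattern (for example with glob.glob) and printing the ones with a non-None offset should show which file needs to be downloaded again.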

Got different results from inference.py and eval.py

Hello Antoine,

I've applied your code to my own dataset, but something strange happened. With the same input data and the same model checkpoint, I got different results from eval.py and inference.py. Here are some examples:
From eval.py, numpy.argmax(predictions_val, axis=1) gives:
[17 18 17 18 17 17 17 17 8 17 17 17 0 18 18 17 17 0 17 17 0 17 17 17
17 17 10 17 19 17 17 17 19 10 17 17 17 9 17 7 17 17 17 17 9 17 17 17
10 10 10 0 17 17 17 8 17 17 0 17 17 17 17 17 17 17 17 17 12 17 17 17
17 17 14 17 17 10 17 0 17 16 17 17 17 17 9 0 17 17 17 17 17 17 17 0
17 0 17 17 17 17 17 17 17 0 1 3 5 17 17 9 17 17 10 17 17 17 17 9
17 17 17 10 17 17 17 17]
But from inference.py, numpy.argmax(predictions_val, axis=1) gives:
[11 8 4 18 6 10 10 6 10 0 10 8 0 18 8 10 10 0 10 8 0 8 10 18
8 8 10 10 19 17 18 10 19 10 10 14 9 10 10 8 0 10 10 11 10 3 11 9
10 10 10 0 18 8 10 8 10 0 0 10 7 10 8 0 9 0 10 10 6 10 10 3
10 0 10 6 11 10 10 0 10 10 1 9 0 11 4 3 0 6 0 17 6 17 10 18
7 0 6 7 0 7 10 0 10 0 1 0 16 6 0 9 10 5 10 7 0 11 10 0
0 6 10 10 1 0 6 3]

They are totally different. Have you tried this yourself? Looking forward to your reply. Thanks in advance.

Sylvia

inference error using the pretrained model

I downloaded the pretrained model zip file; it contains these files:
[image: screenshot of the files in the zip]
Did I miss any files when downloading?

On this line:

        latest_checkpoint = tf.train.latest_checkpoint(train_dir)

latest_checkpoint is None, so the following code raises this error:

        if latest_checkpoint is None:
          raise Exception("unable to find a checkpoint at location: %s" % train_dir)

Can anyone please help. Many thanks!!!

I also tried to freeze the model, but since my freezing code cannot find the checkpoint either, I cannot even do that.
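tf.train.latest_checkpoint(train_dir) does not scan for model.ckpt-* files. It reads a small text file named checkpoint inside train_dir and returns None when that file is missing. If the zip contains the model.ckpt-310001.* files (.index, .data-*, .meta) but not that index file, recreating it may be enough. The following is a minimal sketch under that assumption (TF V2 checkpoint naming). The helper name is hypothetical:

```python
import os
import re


def write_checkpoint_state(train_dir):
    """Create the 'checkpoint' index file that tf.train.latest_checkpoint reads.

    Finds '<prefix>.index' files (TF V2 checkpoint format) and points the
    index at the one with the highest global step. Returns that prefix.
    """
    steps = {}
    for name in os.listdir(train_dir):
        match = re.match(r"^(model\.ckpt-(\d+))\.index$", name)
        if match:
            steps[int(match.group(2))] = match.group(1)
    if not steps:
        raise IOError("no model.ckpt-*.index files in %s" % train_dir)
    prefix = steps[max(steps)]
    with open(os.path.join(train_dir, "checkpoint"), "w") as f:
        # Text-format CheckpointState; relative paths resolve against train_dir.
        f.write('model_checkpoint_path: "%s"\n' % prefix)
        f.write('all_model_checkpoint_paths: "%s"\n' % prefix)
    return prefix
```

After running it once on the unzipped directory, tf.train.latest_checkpoint should return the path of the newest model.ckpt-* prefix in that directory.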

Google AudioSet

Hello @antoine77340,

I wanted to know whether it would be viable to run your models on the Google AudioSet data.

The data is stored as TFRecords and has 527 classes. What changes do you think I should make to the scripts/architecture to get your models running?

Thanks in advance.
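One change that is likely needed is the number of classes into which labels are densified. As far as I know, the YouTube-8M starter readers take a num_classes argument sized for the YouTube-8M vocabulary. For 527 AudioSet classes, the dense label vector is built as follows. This is a NumPy sketch of the idea, not the repository's reader code; the helper name is hypothetical:

```python
import numpy as np

AUDIOSET_NUM_CLASSES = 527  # from the issue above


def labels_to_multi_hot(label_ids, num_classes=AUDIOSET_NUM_CLASSES):
    """Turn a list of integer label ids into a dense 0/1 float vector."""
    label_ids = np.asarray(label_ids, dtype=np.int64)
    if label_ids.size and (label_ids.min() < 0 or label_ids.max() >= num_classes):
        raise ValueError("label id out of range for %d classes" % num_classes)
    dense = np.zeros(num_classes, dtype=np.float32)
    dense[label_ids] = 1.0  # several labels per clip are allowed
    return dense
```

The feature names and sizes passed on the command line also have to match what is stored in the AudioSet records.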

How to fine-tune your pretrained model

Hi, could you tell me how to modify the code to fine-tune the pretrained model you provided (gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe)? My own dataset has only 12 classes.
Thanks.
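A common TF 1.x approach is to restore every variable except the final classifier, whose shape depends on the number of classes, and to initialize that part fresh for the 12 new classes. This is a general technique, not something the repository documents. The helper below only does the name filtering. The scope prefix used in the example is hypothetical, so print [v.op.name for v in tf.global_variables()] to find the real one. Optimizer slot variables (e.g. Adam's) share the prefix of the variable they belong to, so they are filtered out too:

```python
def split_for_finetuning(variable_names, reinit_prefixes):
    """Split variable names into (restore, reinitialize) lists.

    Variables whose name starts with one of reinit_prefixes (e.g. the final
    classifier, whose shape depends on the number of classes) are left out of
    the restore list so they can be freshly initialized for the new task.
    """
    restore, reinit = [], []
    for name in variable_names:
        if any(name.startswith(prefix) for prefix in reinit_prefixes):
            reinit.append(name)
        else:
            restore.append(name)
    return restore, reinit
```

In a TF 1.x graph built for 12 classes, the restore list would feed tf.train.Saver(var_list=[v for v in tf.global_variables() if v.op.name in set(restore)]). Run tf.global_variables_initializer() first, then call saver.restore(sess, checkpoint_path).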

A question about NetVLAD

In NetVLAD,

        activation = tf.reshape(activation, [-1, self.max_frames, self.cluster_size])

        a_sum = tf.reduce_sum(activation,-2,keep_dims=True) # [num_video, 1, 64]

        cluster_weights2 = tf.get_variable("cluster_weights2",
            [1,self.feature_size, self.cluster_size], # 1x1024x64
            initializer = tf.random_normal_initializer(stddev=1 / math.sqrt(self.feature_size)))
        
        a = tf.multiply(a_sum,cluster_weights2) # what is the shape of a?
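By broadcasting, a_sum of shape [num_video, 1, 64] times cluster_weights2 of shape [1, 1024, 64] gives a of shape [num_video, 1024, 64]. For each video and cluster k, this is the centre c_k scaled by the total soft assignment to k. The NumPy sketch below uses small stand-in sizes. It assumes the snippet goes on to subtract a from the soft-assigned sum of frame features, as in the standard NetVLAD formulation. It checks the shape and confirms that the result equals the explicit sum of residuals a_k(x_i) * (x_i - c_k):

```python
import numpy as np

# Small stand-in sizes: N videos, M frames, D feature dims, K clusters.
N, M, D, K = 2, 5, 4, 3
rng = np.random.RandomState(0)

x = rng.randn(N, M, D)                 # frame-level features x_i
activation = rng.rand(N, M, K)         # soft assignments a_k(x_i)
cluster_weights2 = rng.randn(1, D, K)  # plays the role of the centres c_k (one per column)

a_sum = activation.sum(axis=-2, keepdims=True)  # [N, 1, K]
a = a_sum * cluster_weights2                    # broadcasts to [N, D, K]

# First VLAD term: sum_i a_k(x_i) * x_i, computed as x^T @ activation.
vlad = np.matmul(np.transpose(x, (0, 2, 1)), activation) - a  # [N, D, K]

# The same quantity written as an explicit sum of residuals x_i - c_k.
residuals = x[:, :, :, None] - cluster_weights2[None]  # [N, M, D, K]
explicit = np.einsum("nmk,nmdk->ndk", activation, residuals)
```

So a is the term that turns the weighted sum of features into a weighted sum of residuals, without materializing the [N, M, D, K] tensor.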

Problem in running pre-trained model

When I run the pretrained model using the inference command, the script gets stuck after the last line of the following log:

INFO:tensorflow:number of input files: 13
INFO:tensorflow:loading meta-graph: F:/project_27_06_2018/yt8m/v2/models/video/sample_model/model.ckpt-310001.meta
WARNING:tensorflow:The saved meta_graph is possibly from an older release:
'local_variables' collection should be of type 'byte_list', but instead is of type 'node_list'.
WARNING:tensorflow:The saved meta_graph is possibly from an older release:
'model_variables' collection should be of type 'byte_list', but instead is of type 'node_list'.
INFO:tensorflow:restoring variables from F:/project_27_06_2018/yt8m/v2/models/video/sample_model/model.ckpt-310001
INFO:tensorflow:Restoring parameters from F:/project_27_06_2018/yt8m/v2/models/video/sample_model/model.ckpt-310001

I use the following command to run the inference script:

runfile('F:/project_27_06_2018/yt8m/code/inference.py', args='--output_file="F:/project_27_06_2018/yt8m/output/test-lstm-0002-val-150-random.csv" --input_data_pattern="F:/project_27_06_2018/yt8m/v2/video/validate*.tfrecord" --model="F:/project_27_06_2018/yt8m/v2/models/video/sample_model/model.ckpt-310001.data-00000-of-00001" --train_dir="F:/project_27_06_2018/yt8m/v2/models/video/sample_model/" --frame_features=False --feature_names="rgb,audio" --feature_sizes="1024,128" --batch_size=1024 --base_learning_rate=0.0002 --iterations=150 --lstm_random_sequence=True --run_once=True --top_k=50', wdir='F:/project_27_06_2018/yt8m/code')

What am I doing wrong?

Inference on new videos?

Hi, thanks for sharing the great work. Have you figured out how to use this model for any given videos, i.e., have you figured out how to extract the features from RGB frames into the required tfrecords? Thanks!
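As far as I understand the YouTube-8M starter code, frame features come from an ImageNet-pretrained Inception network and are reduced to 1024 dimensions with PCA. They are then clipped to [-2, 2] and stored as 8-bit values, and the readers dequantize them. New videos therefore have to go through the same steps before the model sees them. The following NumPy sketch covers only the last quantize/dequantize step; the [-2, 2] range is an assumption taken from that code:

```python
import numpy as np

MIN_Q, MAX_Q = -2.0, 2.0  # assumed quantization range


def quantize(features, min_q=MIN_Q, max_q=MAX_Q):
    """Clip float features to [min_q, max_q] and map them to uint8 codes 0..255."""
    clipped = np.clip(features, min_q, max_q)
    codes = np.round((clipped - min_q) * (255.0 / (max_q - min_q)))
    return codes.astype(np.uint8)


def dequantize(codes, min_q=MIN_Q, max_q=MAX_Q):
    """Approximate inverse of quantize(): codes * scalar + bias."""
    q_range = max_q - min_q
    scalar = q_range / 255.0
    bias = (q_range / 512.0) + min_q
    return codes.astype(np.float32) * scalar + bias
```

The round trip loses at most about one quantization step (4/255, roughly 0.016) per value. Values outside [-2, 2] are clipped.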

About the validation

Hi, can you tell me about the validation step? There is no description of validation in your code. Can I just use eval.py to evaluate the trained model on the validation data, the same way you use inference.py, as follows? Or could you give me an example?

python eval.py --eval_data_pattern="$path_to_features/validatea*.tfrecord" --model=NetVLADModelLF --train_dir=gatedlightvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe --frame_features=True --feature_names="rgb,audio" --feature_sizes="1024,128" --batch_size=1024 --base_learning_rate=0.0002 --netvlad_cluster_size=256 --netvlad_hidden_size=1024 --moe_l2=1e-6 --iterations=300 --learning_rate_decay=0.8 --netvlad_relu=False --gating=True --moe_prob_gating=True --lightvlad=True --run_once=True --top_k=50

Context feature 'video_id' is required but could not be found.

Hello @antoine77340

I am trying to run the GRU model from Antoine's code on the new version of the data, and this is the error I get when I run it:
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Name: , Context feature 'video_id' is required but could not be found.

The code runs on the older version of the data (version 1), but not on the newer version.

This is the code that I ran:
python train.py --train_data_pattern=${HOME}/yt8m/v2/frame/train*.tfrecord --model=GruModel --train_dir ~/yt8m/v2/models/frame/GRU_model2 --frame_features=True --feature_names="audio" --feature_sizes="128" --batch_size=128 --base_learning_rate=0.0002 --gru_cells=1200 --learning_rate_decay=0.9 --moe_l2=1e-6 --max_step=300000 --start_new_model

NOTE: I am running it on the audio features only. The GRU model is able to handle this (the code runs on the older data), so this should not be an issue.

I need to run the models on the newer data only. Please suggest changes to the code so that the model can run.

Thanks in advance.

Feature list 'audio' is required but could not be found.

Hi, I want to use your code to run experiments on AudioSet (audio only, 128-dim frame-level features).

But I have a problem with your NetVLADModelLF model:
Command:
python train.py --train_data_pattern='/vol/vssp/msos/yx/audioset/audioset_v1_embeddings/bal_train/*.tfrecord' --model=NetVLADModelLF --train_dir=gatednetvladLF-256k-1024-80-0002-300iter-norelu-basic-gatedmoe --frame_features=True --feature_names='audio_embedding' --feature_sizes='128' --batch_size=80 --base_learning_rate=0.0002 --netvlad_cluster_size=256 --netvlad_hidden_size=128 --moe_l2=1e-6 --iterations=300 --learning_rate_decay=0.8 --netvlad_relu=False --gating=True --moe_prob_gating=True --max_step=700000

Error: 2017-07-24 15:44:44.093022: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Name: , Feature list 'audio' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?

But I can run LstmModel or GruModel successfully:
command:
python train.py --train_data_pattern='/vol/vssp/msos/yx/audioset/audioset_v1_embeddings/bal_train/*.tfrecord' --frame_features=True --model=LstmModel --feature_names='audio_embedding' --feature_sizes='128' --train_dir=tmp_model/frame_level_lstm_model_bal

The feature name in AudioSet is 'audio_embedding' (--feature_names='audio_embedding'), so I think there might be a feature-name error in your NetVLADModelLF definition. Could you help me with it?

Thanks a lot.

Not found: Key tower/fully_connected/biases not found in checkpoint

I downloaded your pretrained model and tried to evaluate it on 10 videos from the training sample, but running this command produces an error, and I can't find out why. Everything else is installed normally.

python eval.py --eval_data_pattern=/home/estathop/features/train*.tfrecord --train_dir=/home/estathop/Documents/pretrained/public/ --run_once=True
/home/estathop/tensorflow/local/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
tensorflow version: 1.9.0-dev20180502
INFO:tensorflow:Using batch size of 1024 for evaluation.
INFO:tensorflow:number of evaluation files: 43
INFO:tensorflow:built evaluation graph
2018-05-04 15:33:19.482097: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-04 15:33:19.573605: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-04 15:33:19.573963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1349] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.20GiB
2018-05-04 15:33:19.573978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1428] Adding visible gpu devices: 0
2018-05-04 15:33:19.760748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-04 15:33:19.760790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:922] 0
2018-05-04 15:33:19.760795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:935] 0: N
2018-05-04 15:33:19.760999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1046] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6955 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Loading checkpoint for eval: /home/estathop/Documents/pretrained/public/model.ckpt-310001
INFO:tensorflow:Restoring parameters from /home/estathop/Documents/pretrained/public/model.ckpt-310001
2018-05-04 15:33:19.884136: W tensorflow/core/framework/op_kernel.cc:1290] CtxFailure at save_restore_v2_ops.cc:184: Not found: Key tower/fully_connected/biases not found in checkpoint
Traceback (most recent call last):
File "eval.py", line 332, in <module>
app.run()
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "eval.py", line 328, in main
evaluate()
File "eval.py", line 320, in evaluate
last_global_step_val)
File "eval.py", line 197, in evaluation_loop
saver.restore(sess, latest_checkpoint)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1746, in restore
six.reraise(exception_type, exception_value, exception_traceback)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1734, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key tower/fully_connected/biases not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op u'save/RestoreV2', defined at:
File "eval.py", line 332, in <module>
app.run()
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "eval.py", line 328, in main
evaluate()
File "eval.py", line 309, in evaluate
saver = tf.train.Saver(tf.global_variables())
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1266, in __init__
self.build()
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1278, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1315, in _build
build_save=build_save, build_restore=build_restore)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 763, in _build_internal
restore_sequentially, reshape)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 400, in _AddRestoreOps
restore_sequentially)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 814, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3408, in create_op
op_def=op_def)
File "/home/estathop/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1734, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): Key tower/fully_connected/biases not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
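The missing key tower/fully_connected/biases suggests that eval.py built a different graph from the model stored in the checkpoint. The command above passes none of the model flags used for training (--model, --frame_features, --feature_names, and so on), so eval.py presumably fell back to its default model. That is an inference from the log, not something verified. One way to confirm it is to compare variable names. In recent TF 1.x releases, the graph side is [v.op.name for v in tf.global_variables()] and the checkpoint side is [name for name, _ in tf.train.list_variables(checkpoint_path)]. The comparison itself is plain Python, and the helper names are hypothetical:

```python
from collections import Counter


def diff_variable_names(graph_names, checkpoint_names):
    """Compare the variables a graph expects with those a checkpoint holds.

    Returns (missing_from_checkpoint, unused_in_graph), both sorted.
    """
    graph_set, ckpt_set = set(graph_names), set(checkpoint_names)
    return sorted(graph_set - ckpt_set), sorted(ckpt_set - graph_set)


def scope_summary(names, depth=2):
    """Count names by their first `depth` scope components, e.g. 'tower/fully_connected'."""
    return Counter("/".join(name.split("/")[:depth]) for name in names)
```

If scope_summary of the missing names shows a single small scope while the unused checkpoint names span the NetVLAD and gating scopes, the eval command is probably building the wrong model. In that case, rerunning it with the same model flags as training should fix it.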

Question about MAP

Here is the evaluation result:
INFO:tensorflow:epoch/eval number 300001 | Avg_Hit@1: 0.848 | Avg_PERR: 0.726 | MAP: 0.077 | GAP: 0.8038 | Avg_Loss: 5.594060

MAP is very low. Is that normal?
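GAP and MAP measure different things. As far as I know, GAP on YouTube-8M pools the top-20 predictions of every video into one ranked list, while MAP computes one AP per class and averages them, so rare classes count as much as frequent ones. How classes with no positives in the evaluated files are treated also matters a lot. This sketch shows the gap between skipping such classes and counting them as AP = 0. It does not verify what eval.py actually does:

```python
import numpy as np


def average_precision(scores, labels):
    """Non-interpolated AP of one class: mean precision at each positive's rank."""
    order = np.argsort(-np.asarray(scores), kind="stable")
    hits = np.asarray(labels)[order].astype(bool)
    if not hits.any():
        return None  # AP is undefined when the class has no positives
    precision_at_k = np.cumsum(hits, dtype=float) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())


def mean_average_precision(scores, labels, empty_as_zero=False):
    """mAP over classes (columns of a [num_videos, num_classes] array).

    Classes with no positives are skipped, or counted as AP = 0 when
    empty_as_zero is True.
    """
    aps = []
    for c in range(scores.shape[1]):
        ap = average_precision(scores[:, c], labels[:, c])
        if ap is None:
            if empty_as_zero:
                aps.append(0.0)
        else:
            aps.append(ap)
    return float(np.mean(aps)) if aps else 0.0
```

With thousands of classes and an evaluation subset in which many of them never occur, the empty_as_zero variant can be far lower than the other one, even for a good model.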

Prediction results are the same in inference

I am using the pretrained model to predict on my own data, without any training.
The model is NetVLADModelLF.

I extract only RGB features from my own data and construct the input tensor:

for i in xrange(data_num): 
        feature = data[i]
        pad_feature = np.zeros([300, 1152])
        pad_feature[0, :1024] = feature
        video_batch = pad_feature[np.newaxis, :, :].astype(np.float32)
        num_frames_batch = np.array([1], dtype=np.int32)
        predictions = sess.run([predictions_tensor],
                        feed_dict={input_tensor: video_batch, num_frames_tensor: num_frames_batch})

The predictions are the same for every input.
I also tried constructing a 300-frame input, but the problem remains.

Did I do the inference correctly?
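Two things in the snippet above differ from the training input and are worth ruling out, although neither is confirmed as the cause. First, only frame 0 is filled, so the model sees a one-frame video. Second, the 128 audio dimensions are always zero, a pattern the model never saw with feature_names="rgb,audio". The features also need the same PCA and quantization as the training data. The following sketch packs real per-frame features into the [1, 300, 1152] layout used above. The helper name is hypothetical, and the sizes are taken from the snippet:

```python
import numpy as np

MAX_FRAMES = 300                # as in the snippet above
RGB_DIM, AUDIO_DIM = 1024, 128  # feature_sizes="1024,128"


def build_frame_batch(rgb_frames, audio_frames=None, max_frames=MAX_FRAMES):
    """Pack one video's per-frame features into a [1, max_frames, 1152] batch.

    rgb_frames: [num_frames, 1024]; audio_frames: [num_frames, 128] or None
    (zeros are used then). Frames beyond max_frames are dropped and the rest
    is zero-padded. Returns (video_batch, num_frames_batch).
    """
    rgb = np.atleast_2d(np.asarray(rgb_frames, dtype=np.float32))[:max_frames]
    num_frames = rgb.shape[0]
    if audio_frames is None:
        audio = np.zeros((num_frames, AUDIO_DIM), dtype=np.float32)
    else:
        audio = np.atleast_2d(np.asarray(audio_frames, dtype=np.float32))[:num_frames]
    batch = np.zeros((1, max_frames, RGB_DIM + AUDIO_DIM), dtype=np.float32)
    batch[0, :num_frames, :RGB_DIM] = rgb    # rgb first, matching "rgb,audio"
    batch[0, :num_frames, RGB_DIM:] = audio  # then audio
    return batch, np.array([num_frames], dtype=np.int32)
```

The two returned arrays would be fed to input_tensor and num_frames_tensor in place of video_batch and num_frames_batch.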

Problem in inference

Hi, glad to tell you that Google has released the PCA matrix, so now I want to write a prediction function to test raw videos with your model.
But I have run into a problem: when I set batch_size to 1, the results are strange. Could you please tell me why?

About descriptors in NetVLAD

Hi @antoine77340
In the paper "Learnable pooling with Context Gating for video classification".
On page 4, in the NetVLAD aggregation section: could you please tell me where the descriptors x_i come from?
In the original VLAD, they are SIFT descriptors computed on each frame.
Here, does x_i denote each frame-level feature (dimension 1024)?
Thanks !

Issues when test video/frame feature

Hi @antoine77340. I have downloaded the YouTube-8M dataset and used the video/frame test folders to test your pretrained model, but I get an error during testing. The error output is as follows:
INFO:tensorflow:number of input files: 4096
INFO:tensorflow:loading meta-graph: pretrainedmodel/model.ckpt-310001.meta
INFO:tensorflow:restoring variables from pretrainedmodel/model.ckpt-310001
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, ../YT8M/youtube-8m/features/validatelN.tfrecord
[[Node: train_input/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input/TFRecordReaderV2_1, train_input/input_producer)]]

Caused by op u'train_input/ReaderReadV2_1', defined at:
File "inference.py", line 203, in <module>
app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "inference.py", line 199, in main
FLAGS.output_file, FLAGS.batch_size, FLAGS.top_k)
File "inference.py", line 128, in inference
saver = tf.train.import_meta_graph(meta_graph_location, clear_devices=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1577, in import_meta_graph
**kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/meta_graph.py", line 498, in import_scoped_meta_graph
producer_op_list=producer_op_list)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 287, in import_graph_def
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
self._traceback = _extract_stack()

NotFoundError (see above for traceback): ../YT8M/youtube-8m/features/validatelN.tfrecord
[[Node: train_input/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input/TFRecordReaderV2_1, train_input/input_producer)]]

In addition, this is the command I used to start testing:
python inference.py --output_file=test_video_v1.csv --input_data_pattern="video_test/test*.tfrecord" --model=NetVLADModelLF --train_dir=pretrainedmodel --frame_features=false --batch_size=1024 --base_learning_rate=0.0002 --netvlad_cluster_size=256 --netvlad_hidden_size=1024 --moe_l2=1e-6 --iterations=300 --learning_rate_decay=0.8 --netvlad_relu=False --gating=True --moe_prob_gating=True --run_once=True --top_k=50

Looking forward to your reply, thank you!

Inference error

I get an error when running:

python inference.py --output_file=test-lstm-0002-val-150-random.csv --input_data_pattern="/data/dataset/yt8m/video-level/test/test*.tfrecord" --model=LstmModel --train_dir="/data/yt8m/public" --frame_features=True --feature_names="rgb,audio" --feature_sizes="1024,128" --batch_size=1024 --base_learning_rate=0.0002 --iterations=150 --lstm_random_sequence=True --run_once=True --top_k=50

Traceback (most recent call last):
File "inference.py", line 206, in <module>
app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "inference.py", line 202, in main
FLAGS.output_file, FLAGS.batch_size, FLAGS.top_k)
File "inference.py", line 175, in inference
coord.join(threads)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
enqueue_callable()
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1259, in _single_operation_run
None)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Name: , Feature list 'audio' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?

Questions about training

1. Can I use video-level features to train your model?
2. Can I train your model using only the RGB frame-level features (no audio)?
Thanks.

Training time

How long does the model (gatednetfvLF-128k-1024-80-0002-300iter-norelu-basic-gatedmoe) take to train on a single GPU?

Request your trained model

Congratulations on your win, and thanks for sharing your work. I'm very interested in the YouTube-8M Large-Scale Video Understanding project. Now I want to test your team's model on my own videos. Could you send me your trained model? My email: [email protected]. Thanks.

Mean average precision greater than 1!

Hi,
I have used these functions to calculate mean average precision (mAP) on a subset of Google AudioSet, and I have run many training jobs with them. Now I have changed my features and started a new training run, but in some epochs mAP is greater than 1! I don't know what the problem is. Other metrics, such as hit_at_one and AUC (area under the curve), look normal. How can I solve this?
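A correctly computed AP always lies in [0, 1]. It is the sum of the precisions at each positive's rank divided by the number of positives, and each precision is at most 1. The issue does not show which code path produced values above 1, so the cause here is unknown. One mechanism worth ruling out is a denominator that is smaller than the number of positives actually present in the ranked list. This can happen, for example, when the positive count comes from a different or truncated label set. The following sketch makes that explicit; the num_positives parameter is illustrative:

```python
import numpy as np


def average_precision(scores, labels, num_positives=None):
    """AP = (sum of precision at each positive's rank) / number of positives.

    num_positives defaults to the positives actually present in labels.
    Passing a smaller number inflates the result, which is one way an
    "average precision" can exceed 1.
    """
    order = np.argsort(-np.asarray(scores), kind="stable")
    hits = np.asarray(labels)[order].astype(bool)
    if num_positives is None:
        num_positives = int(hits.sum())
    if num_positives == 0:
        return 0.0
    precision_at_k = np.cumsum(hits, dtype=float) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].sum() / num_positives)
```

For scores [0.9, 0.8, 0.7, 0.6] with labels [1, 0, 1, 0], the AP is (1 + 2/3) / 2 = 5/6. Telling the function there is only one positive gives 5/3 instead. Asserting 0 <= ap <= 1 per class is a cheap way to catch such a mismatch early.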
