
tfmatch's People

Contributors

lzx551402

tfmatch's Issues

Model validation

How was the model validated before running on the test sets? I see that you can set is_training=False to run on a separate held out set, but this causes multiple errors when running ASLFeat. Thank you in advance!

training error

Hi @lzx551402, thanks for your great work!
When I run train_aslfeat_base.sh, an error occurs:

OverflowError: cannot fit 'int' into an index-sized integer

My environment is Python 3.7.7, and the other packages are installed according to your requirements.

Pytorch porting failed

Dear @zjhthu @lzx551402, thanks for your great work and for sharing the code!
I'm trying to reimplement your TF code in PyTorch, but I fail to reach the same accuracy/loss.
Almost all of the code seems the same, and I actually get the same result over one iteration with the same input when doing a kind of unit test.
My testing sequence is as follows:

[ Testing ASLFeat Forward part ]

  1. By running sess.run(...), I dump the numpy arrays net_input0,1 / depth0,1 / K0,1 / rel_pose / dense_feat_map / sum_det_score_map from the TF version.
  2. I convert the above numpy inputs to PyTorch tensors (permuting NHWC->NCHW for the net_inputs and depths, see the snippet below), and I get a very similar sum_det_score_map (conv1+conv3+conv6 after calling peakiness_score).
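
For concreteness, the conversion in step 2 looks roughly like this (a minimal sketch with placeholder file names, not my exact script):

import numpy as np
import torch

# arrays dumped from sess.run(...) in the TF version (placeholder paths)
net_input0_np = np.load('net_input0.npy')   # (N, H, W, C)
depth0_np = np.load('depth0.npy')           # (N, H, W, 1)

# images and depths need NHWC -> NCHW for the PyTorch model;
# K0, rel_pose etc. keep their original layout
net_input0 = torch.from_numpy(net_input0_np).permute(0, 3, 1, 2).contiguous()
depth0 = torch.from_numpy(depth0_np).permute(0, 3, 1, 2).contiguous()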

[ Testing Loss part ]

  1. I save TF's inputs (pos0, pos1, dense_feat_map0, dense_feat_map1, score_map0, score_map1) for make_detector_loss(), run both the TF and PT versions, and get exactly the same loss/accuracy.

But when I run PT training, the loss doesn't drop below 0.6, and
I checked the "moving_instance_max" values of the conv1, conv3 and conv8 inputs in peakiness_score; surprisingly, the values evolve very differently.

TF version (from the beginning to step 100000):
conv1 => starts at 6 and keeps growing to almost 36 - 38
conv3 => starts at 12 and keeps growing to almost 100 - 103
conv8 => starts at 4 and keeps growing to around 12 - 14

PT version (beginning -> step 1000 (peak) -> step 100000):
conv1 => from 6 to almost 12 - 15, then stops growing and decreases again to 1
conv3 => from 12 to almost 10 - 13, then stops growing and decreases again to 1
conv8 => from 4 to around 4 - 7, then stops growing and decreases again to 0.xx

I used the same ExpLR scheduler, as below:

import torch.optim as optim

# SGD with momentum and exponential LR decay (gamma is applied on every scheduler.step() call)
optimizer = optim.SGD(model.parameters(),
                      momentum=0.9, lr=0.1, weight_decay=0.0001)
scheduler_expLR = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99999)

Can you guess why the feature maps grow differently?

Plus, you use tf.control_dependencies([assign_moving_average(moving_instance_max, instance_max, decay)]) together with "reuse" so that the same variable is used across batches.

I think this is for keeping a moving average of the input's max value, which is then used to normalize the growing feature maps.
I simply update the moving average like this:

# (tc is an alias for torch)
instance_max = tc.max(inputs)  # tf.reduce_max(inputs)
# on the first update, replace the initial value (1.0) with the current max
if float(self.moving_instance_max[idx]) == 1.0:
    self.moving_instance_max[idx] = instance_max
self.moving_instance_max[idx] = (
    self.moving_instance_max[idx] * decay + instance_max * (1 - decay))

I think this may be one of the reasons my results differ from yours.
Could you explain the moving-average part in more detail?
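
For reference, my current understanding of the TF pattern is roughly the following simplified sketch (my own code, not your actual implementation):

import tensorflow as tf
from tensorflow.python.training.moving_averages import assign_moving_average

def moving_max_normalize(inputs, name, decay=0.99, is_training=True):
    # non-trainable scalar holding the running max, shared across calls via reuse
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        moving_instance_max = tf.get_variable(
            'moving_instance_max', shape=(),
            initializer=tf.ones_initializer(), trainable=False)
    if is_training:
        instance_max = tf.reduce_max(inputs)
        # force the moving-average update to run before the variable is read
        with tf.control_dependencies([assign_moving_average(
                moving_instance_max, instance_max, decay, zero_debias=False)]):
            return inputs / moving_instance_max
    return inputs / moving_instance_max

Is that roughly what the TF code does, or am I missing something?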

Thank you very much ~!!!

softmax problem in ContextDesc

Hi,
In the ContextDesc paper, I saw
[screenshot of equation (5) from the paper]
In equation (5), s is softmax(2 - D), but the code at line 170 of loss.py, listed below,
softmax_row = tf.nn.softmax(log_scale * dist_mat, axis=2)
does not subtract dist_mat from 2. Is there something wrong, or am I misunderstanding it?
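
One thing I checked myself (a quick standalone snippet, not the repo code): softmax is invariant to adding the same constant to every entry along the softmax axis, so the constant 2 by itself cannot change the result; what matters is only the sign in front of dist_mat (i.e. whether it acts as a distance or a similarity).

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

log_scale = 10.0
dist_mat = np.random.rand(1, 4, 4)  # stand-in for a pairwise distance matrix

a = softmax(log_scale * (2. - dist_mat), axis=2)
b = softmax(-log_scale * dist_mat, axis=2)
print(np.allclose(a, b))  # True: the constant 2 drops out, only the sign matters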

Thanks

train custom dataset

Nice work! Is it possible to train ASLFeat on a custom dataset that only consists of image pairs (without camera.txt or other info)? I find that the ASLFeat loss calculation needs depth, rel_pose and other information that I don't have in my dataset.

phototourism data not optional?

I downloaded the required GL3D part of the training set and ran the training for the first time to generate the matches.

The dataset preparation fails when trying to access the tourism_0001 folder, which is not part of the core training set:

2020-07-02 11:22:54.930044: W Prepare match sets upon request.
 50% (542 of 1073) |##########          | Elapsed Time: 6:58:28 ETA:   0:49:24
Traceback (most recent call last):
  File "train.py", line 141, in <module>
    tf.compat.v1.app.run()
  File "[...]/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "[...]/absl/app.py", line 299, in run
    _run_main(main, args)
  File "[...]/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "train.py", line 132, in main
    regenerate=FLAGS.regenerate, is_training=FLAGS.is_training, data_split=FLAGS.data_split)
  File "[...]/tfmatch/preprocess.py", line 101, in prepare_match_sets
    visualize=False, global_img_list=global_img_list)
  File "[...]/tfmatch/tools/io.py", line 142, in parse_corr_to_match_set
    matches = read_corr(input_corr)
  File "[...]/tfmatch/tools/io.py", line 118, in read_corr
    with open(file_path, 'rb') as fin:
FileNotFoundError: [Errno 2] No such file or directory: '[...]/GL3D/data/tourism_0001/geolabel/corr.bin'
100% (1073 of 1073) |###################| Elapsed Time: 6:58:29 Time:  6:58:29

Is the phototourism data required for training?
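
If it is not required, would it be reasonable to simply filter out datasets whose corr.bin is missing before the match-set generation? For example (a hypothetical sketch with my own names, not code from the repo; the dataset list itself comes from the GL3D list file):

import os

def filter_datasets(gl3d_root, dataset_names):
    # keep only GL3D datasets that actually ship geolabel/corr.bin,
    # e.g. skip the phototourism (tourism_*) folders that were not downloaded
    kept = []
    for name in dataset_names:
        corr_path = os.path.join(gl3d_root, 'data', name, 'geolabel', 'corr.bin')
        if os.path.exists(corr_path):
            kept.append(name)
        else:
            print('Skipping %s (missing %s)' % (name, corr_path))
    return kept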

accuracy calculation mistake

for i in range(batch_size):
    if corr_weight is not None:
        loss += tf.reduce_sum(tot_loss[i][inlier_mask[i]]) / \
            (tf.reduce_sum(corr_weight[i][inlier_mask[i]]) + 1e-6)
    else:
        loss += tf.reduce_mean(tot_loss[i][inlier_mask[i]])
    cnt_err_row = tf.count_nonzero(
        err_row[i][inlier_mask[i]], dtype=tf.float32)
    cnt_err_col = tf.count_nonzero(
        err_col[i][inlier_mask[i]], dtype=tf.float32)
    tot_err = cnt_err_row + cnt_err_col
    accuracy += 1. - \
        tf.math.divide_no_nan(tot_err, tf.cast(
            inlier_num[i], tf.float32)) / batch_size / 2.

matched_mask = tf.logical_and(tf.equal(err_row, 0), tf.equal(err_col, 0))
matched_mask = tf.logical_and(matched_mask, inlier_mask)

loss /= batch_size
accuracy /= batch_size

As far as I can tell, there may be a small mistake in the accuracy calculation: the per-sample term is already divided by batch_size inside the loop, and then accuracy is divided by batch_size again afterwards, so I don't think the accuracy should be divided by batch_size as it is in your code.
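
If I read it correctly, a possible fix (just my guess) would be to drop the extra division inside the loop and keep only the averaging at the end:

# inside the loop
accuracy += 1. - \
    tf.math.divide_no_nan(tot_err, tf.cast(inlier_num[i], tf.float32)) / 2.

# after the loop (unchanged)
accuracy /= batch_size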

Mask in losses.py

Hi Zixin,

I noticed that you apply a mask (line 134 of losses.py) with a threshold of 0.008. May I know the rule used to select this value?

Best,
Bing

The training set of ASLFeat

Hi @lzx551402, thanks for your great work.
You mentioned that your ASLFeat v2.0 is trained on GL3D. I'm curious about the composition of this training set.

  1. For example, have you added the tourism dataset to the training set?
  2. Is this ASLFeat robust to illumination/weather/season variations? Have you evaluated that?
  3. I also want to ask what kind of improvements could be made to ASLFeat, e.g. in runtime or performance.

Best, looking forward to your reply.

hseq_eval accuracy

Hi @lzx551402 @zjhthu, thanks for your great work!

I evaluated my own trained model on HPatches Sequences and got a recall of 65.39/72.91 on the i/v sequences.
I followed the "Training ContextDesc" instructions in the readme.
I extract the regional features with the following code:
import cv2
import numpy as np
# get_model is provided by the repo

def load_imgs(img_paths):
    rgb_list = []
    img_name_list = []
    for img_path in img_paths:
        img = cv2.imread(img_path)
        img = cv2.resize(img, (448, 448), interpolation=cv2.INTER_CUBIC)
        rgb_list.append(img)
        img_name_list.append(img_path.split('/')[-1].split('.')[0])
    return rgb_list, img_name_list

def extract_regional_features(rgb_list, img_name_list, model_path, save_path):
    model = get_model('reg_model')(model_path)
    for i, val in enumerate(rgb_list):
        reg_feat = model.run_test_data(val)
        reg_feat.astype(np.float32).tofile(save_path + img_name_list[i] + '.bin')

All other parts are the same as in the readme. The TF version is 1.14.0. Did I make any mistake?

Loss does not decrease

Hi, I encountered a problem: the loss does not decrease.
Specifically, the loss remains at 0.5 from the 100th iteration up to 360,000 iterations, and the results do not improve after training. The metrics are always worse than the ones you report (avg_precision=0.71, avg_MMA=0.69), even though I use the same code and data as you.
What do you think could be the cause of this problem?

batch_size and the unstable loss

@lzx551402 @zjhthu Hi, thanks a lot for your great work.
I've trained ASLFeat with circle loss, and when I look at the loss in TensorBoard I find that it decreases unstably from step 5K to 400K. I also noticed that the default batch_size is 2.

  1. Did you run any experiments on the batch_size or other hyperparameters?
  2. Would the loss be more stable, or would I get a better model, if I increased the batch_size?
  3. Which base model did you use when training the dcn with circle loss? The 380K-step one, the 100K-step one, or another?

Best regards! Looking forward to your reply.
