Hi, what exactly are the first and last dimensions of strides for? In the sec

A question about strides in conv and pooling layers about tensorflow-tutorials HOT 1 CLOSED

hvass-labs commented on May 19, 2024

from tensorflow-tutorials.

Comments (1)

Hvass-Labs commented on May 19, 2024

I know it can be frustrating to learn these things. But it would be best to ask this type of question on StackOverflow because it is a general question related to TensorFlow, and the answer could have benefited more people.

You also don't mention which tutorial this is from. It looks like Tutorial #2, but it has been 6 months since I wrote that one, so it is a bit much to expect me to clearly remember the details of that tutorial.

In general:

A 4-dim tensor has shape [batch, height, width, channel]. For example, we could have a tensor with shape [10, 80, 120, 3], which means the batch has 10 images, each of which is 80 pixels high and 120 pixels wide, with 3 channels (e.g. RGB colours).

The word 'stride' is similar in meaning to a step-size. It specifies how much the index is incremented in each of those dimensions when we move the convolutional filters across the input tensor. The first and last strides have to be 1. This might seem like a strange design choice by the TensorFlow team, because there is no need to have a parameter that you cannot change. Anyway, you must set them to 1, which means that we increment the indices by 1 for the batch-number and by 1 for the colour-channel.

If you were to set the last one to 3, then you would increment the index for the colour-channel by 3 every time you move the convolutional filter, which means that you would only use one of the colour-channels and skip the other 2.
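To make the step-size idea concrete, here is a small sketch in plain Python (not TensorFlow) that lists which indices are visited along one dimension for a given stride. The helper name `visited_indices` is made up for this illustration, and it only mirrors the hypothetical above; it is not how TensorFlow handles the stride internally.

```python
def visited_indices(size, stride):
    """Indices visited along one dimension when stepping by `stride`."""
    return list(range(0, size, stride))


num_channels = 3  # e.g. RGB

# Stride 1 in the channel dimension: every colour-channel is used.
print(visited_indices(num_channels, 1))

# Hypothetical stride 3 in the channel dimension: only the first
# colour-channel is visited and the other two are skipped.
print(visited_indices(num_channels, 3))
```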

So the other numbers in the stride, e.g. [1, H, W, 1], would mean that after calculating each convolution (i.e. dot-product of the conv-filter with a small part of the input tensor), we move the conv-filter H pixels in the 2nd dimension (height) and W pixels in the 3rd dimension (width) of the input tensor. Then we calculate another dot-product with the same conv-filter for the new position in the input tensor.
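The movement of the filter can be sketched in plain Python for a single image with a single channel and no padding. This is only a rough illustration of the idea, not TensorFlow's implementation; the function name `conv2d_valid` and the toy 4x4 image are assumptions made up for the example.

```python
def conv2d_valid(image, kernel, stride_h=1, stride_w=1):
    """Slide `kernel` over `image` (both 2-D lists) without padding."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    output = []
    # Move the filter's top-left corner down by stride_h rows at a time ...
    for top in range(0, img_h - k_h + 1, stride_h):
        row = []
        # ... and right by stride_w columns at a time.
        for left in range(0, img_w - k_w + 1, stride_w):
            # Dot-product of the filter with the patch it currently covers.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 single-channel "image" with pixel values 0..15 (made-up data).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
kernel = [[1, 1], [1, 1]]  # 2x2 filter that simply sums the patch

out_stride1 = conv2d_valid(image, kernel, 1, 1)  # filter moves 1 pixel at a time
out_stride2 = conv2d_valid(image, kernel, 2, 2)  # filter moves 2 pixels at a time

print(len(out_stride1), len(out_stride1[0]))
print(len(out_stride2), len(out_stride2[0]))
```

With a stride of 2 the filter visits fewer positions, so the output is smaller than with a stride of 1. That is why a stride larger than 1 can be used for downsampling.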

Why are H and W set to 1 in this tutorial? Because we instead use max-pooling to downsample the images. As I recall, one of the exercises in Tutorial #2 or #3 is to replace the max-pooling with a stride in the conv-layer and see if that changes the results.
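As a rough check of why the two approaches downsample by the same amount, the output sizes can be computed by hand. The sketch below assumes 'SAME' padding (where the output size is the input size divided by the stride, rounded up), a 28-pixel input, and max-pooling with a stride of 2. Those values are assumptions for illustration and may not match the tutorial exactly; the helper name is also made up.

```python
import math


def same_padding_output_size(input_size, stride):
    """Output size along one dimension with 'SAME' padding: ceil(input / stride)."""
    return math.ceil(input_size / stride)


image_size = 28  # assumption: a 28x28 input image

# Option 1: conv-layer with stride 1, followed by max-pooling with stride 2.
after_conv = same_padding_output_size(image_size, stride=1)
after_pool = same_padding_output_size(after_conv, stride=2)

# Option 2: conv-layer with stride 2 and no max-pooling.
after_strided_conv = same_padding_output_size(image_size, stride=2)

print(after_conv, after_pool, after_strided_conv)
```

Both options end up with the same spatial size, although the values in the output differ: max-pooling keeps the largest value in each window, while a strided convolution simply skips positions.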

It might be useful to watch the video for Tutorial #2 again and also try to do the exercises.

I would also suggest adding print-statements to the tutorial, so you can see the shape of the tensors that are being passed around.

In the future, please post general questions about TensorFlow on StackOverflow so it can help more people.
