Comments (4)
Hi. Thanks for the confirmation! Oh I see, I was not aware that the norm was to use only a single frame per segment. Sure, although it is difficult for me right now, if I come up with a way to improve the current strategy, I'll definitely raise a PR. Thanks again.
from video-dataset-loading-pytorch.
Hi. Good question.
So, the equation is:
segment_duration = (record.num_frames - self.frames_per_segment + 1) // self.num_segments
If you don't subtract frames_per_segment and add 1, then an IndexOutOfBounds error can occur later.
In the case where frames_per_segment=1, the term - self.frames_per_segment + 1 obviously makes no difference, because we are subtracting 1 and adding 1. So this only matters when frames_per_segment > 1.
Some Context
When you use frames_per_segment > 1, a random start index is sampled for each segment, and then, starting from each start index, frames_per_segment consecutive frames are loaded and returned. The function _sample_indices does not return the indices of all frames to be loaded, only the start index of each segment's frames_per_segment frames.
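To make that distinction concrete, here is a minimal sketch of how start indices expand into the frames that actually get loaded. The helper expand_start_indices is a hypothetical name used only for illustration; it is not a function from the repository.

```python
def expand_start_indices(start_indices, frames_per_segment):
    """Hypothetical helper: turn each segment's start index into the list of
    consecutive frame indices that would be loaded for that segment."""
    return [list(range(start, start + frames_per_segment))
            for start in start_indices]


# One start index per segment, two consecutive frames per segment.
print(expand_start_indices([0, 2, 4], frames_per_segment=2))
# [[0, 1], [2, 3], [4, 5]]
```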
An Example
num_segments = 3
frames_per_segment = 2
num_frames = 6
frame_indices = [0, 1, 2, 3, 4, 5]
We cannot use 5 as a start index, because starting from 5 we would need to take frames_per_segment=2 frames, which would be the frames at index 5 and 6, and index 6 is out of bounds. If you do not apply - self.frames_per_segment + 1, the function will sometimes return index 5 as the start index for the third segment. Applying - self.frames_per_segment + 1 is not a perfect solution, but it works.
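The effect of the correction in this example can be checked with a small sketch. It assumes that segment i draws its start index from the range [i * segment_duration, (i + 1) * segment_duration) and that segment_duration is at least 1. The function largest_start_index is a hypothetical name for illustration and is not part of the repository.

```python
def largest_start_index(num_frames, num_segments, frames_per_segment,
                        apply_correction=True):
    """Largest start index the sampling can produce, assuming segment i draws
    its start from [i * segment_duration, (i + 1) * segment_duration)."""
    usable = num_frames
    if apply_correction:
        usable = num_frames - frames_per_segment + 1
    segment_duration = usable // num_segments
    return num_segments * segment_duration - 1


for apply_correction in (False, True):
    start = largest_start_index(6, 3, 2, apply_correction)
    last_frame = start + 2 - 1  # last frame index read for that segment
    print(apply_correction, start, last_frame, last_frame < 6)
# False 5 6 False
# True 2 3 True
```

Without the correction the worst case reads frame 6, which does not exist. With it, the worst case stays inside the video.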
Thank you for the detailed explanation! I understand its purpose now, but my doubt persists.
In the example you have given, the segments would be [0,1], [2,3] and [4,5], so the start index can be 0, 2 or 4. However, segment_duration = (record.num_frames - self.frames_per_segment + 1) // self.num_segments gives segment_duration=1. Should this not be 2, since each segment [0,1], [2,3], [4,5] has 2 frames?
Let's say segment_duration=1. Now _sample_indices always returns offsets=[0,1,2]. With frames_per_segment = 2, this spans frames [0,1,2,3], so frames 4 and 5 will never be used. Is this how it is supposed to work?
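These numbers can be checked with a short simulation. This is only a sketch: it assumes each segment's start is i * segment_duration plus a random offset in [0, segment_duration), and it does not call the repository's _sample_indices.

```python
import random

num_segments, frames_per_segment, num_frames = 3, 2, 6
segment_duration = (num_frames - frames_per_segment + 1) // num_segments  # 5 // 3 == 1

used = set()
for _ in range(1000):
    # Assumed sampling rule: segment i starts at i * segment_duration plus a
    # random offset in [0, segment_duration).
    offsets = [i * segment_duration + random.randrange(segment_duration)
               for i in range(num_segments)]
    for start in offsets:
        used.update(range(start, start + frames_per_segment))

unused = sorted(set(range(num_frames)) - used)
print(segment_duration, sorted(used), unused)
# 1 [0, 1, 2, 3] [4, 5]
```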
Thanks again!
Yes, you are right that this is how it works, even though it is not the best behavior. The original code repository that I adapted this repository from implemented this sub-optimal behavior, and because most people only use a single frame per segment, I did not pay much attention to improving it. In my own experiments, I also only ever use a single frame per segment.
However, you are very welcome to create a pull request and suggest an improvement for this. If it is suitable, I am happy to merge it!