Skip to content

Commit 0e3155e

Browse files
authored
Update README.md
1 parent 97f4f0b commit 0e3155e

File tree

1 file changed

+3
-3
lines changed

1 file changed

+3
-3
lines changed

README.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -120,7 +120,7 @@ the `imagefile_template` parameter as "img_{:05d}.jpg", is all that it takes to
120120

121121
### 3. Video Frame Sampling Method
122122
When loading a video, only a number of its frames are loaded. They are chosen in the following way:
123-
1. The frame indices [1,N] are divided into NUM_SEGMENTS even segments. From each segment, FRAMES_PER_SEGMENT consecutive indices are chosen at random.
123+
1. The frame indices [1,N] are divided into NUM_SEGMENTS even segments. From each segment, a random start-index is sampled from which FRAMES_PER_SEGMENT consecutive indices are loaded.
124124
This results in NUM_SEGMENTS*FRAMES_PER_SEGMENT chosen indices, whose frames are loaded as PIL images and put into a list and returned when calling
125125
`dataset[i]`.
126126
![alt text](https://github.com/RaivoKoot/images/blob/main/Sparse_Temporal_Sampling.jpg "Sparse-Temporal-Sampling-Strategy")
@@ -141,8 +141,8 @@ As of `torchvision 0.8.0`, all torchvision transforms can now also operate on ba
141141
on the batch identically on all images of the batch. Therefore, any torchvision transform can be used here to apply video-uniform preprocessing and augmentation.
142142

143143
REMEMBER:
144-
Pytorch transforms are applied to individual dataset samples (in this case a video frame PIL list, or a frame tensor after `imglist_totensor()`) before
145-
batching. So, any transforms used here must expect its input to be a frame tensor of shape `FRAMES x CHANNELS x HEIGHT x WIDTH` or a list of PIL images if `imglist_totensor()` is not used.
144+
Pytorch transforms are applied to individual dataset samples (in this case a video frame PIL list, or a frame tensor after `ImglistToTensor()`) before
145+
batching. So, any transforms used here must expect its input to be a frame tensor of shape `FRAMES x CHANNELS x HEIGHT x WIDTH` or a list of PIL images if `ImglistToTensor()` is not used.
146146
### 6. Conclusion
147147
A proper code-based explanation on how to use VideoFrameDataset for training is provided in `demo.py`
148148

0 commit comments

Comments
 (0)