- [1. Requirements](#1-requirements)
- [2. Custom Dataset](#2-custom-dataset)
- [3. Video Frame Sampling Method](#3-video-frame-sampling-method)
- [4. Alternate Video Frame Sampling Methods](#4-alternate-video-frame-sampling-methods)
- [5. Using VideoFrameDataset for Training](#5-using-videoframedataset-for-training)
- [6. Conclusion](#6-conclusion)
- [7. Upcoming Features](#7-upcoming-features)
- [8. Acknowledgements](#8-acknowledgements)

### 1. Requirements
### 3. Video Frame Sampling Method
When loading a video, only a subset of its frames is loaded. They are chosen in the following way:
1. The frame indices [1, N] are divided into NUM_SEGMENTS even segments. From each segment, a random start index is sampled, from which FRAMES_PER_SEGMENT consecutive indices are loaded.
This results in NUM_SEGMENTS*FRAMES_PER_SEGMENT chosen indices, whose frames are loaded as PIL images, put into a list, and returned when the dataset is indexed.
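
To make the scheme concrete, below is a minimal sketch of the index selection in plain Python/NumPy, using 0-based indices for simplicity. It illustrates the sampling described above; the helper function and its signature are hypothetical, not the library's actual implementation.

```python
import numpy as np

def sample_frame_indices(num_frames, num_segments, frames_per_segment):
    # Hypothetical helper illustrating sparse temporal sampling;
    # not the library's actual code.
    # Split the usable start-index range into `num_segments` even segments.
    segment_length = (num_frames - frames_per_segment + 1) // num_segments
    segment_starts = np.arange(num_segments) * segment_length
    # Sample one random start index inside each segment ...
    random_offsets = np.random.randint(segment_length, size=num_segments)
    starts = segment_starts + random_offsets
    # ... and expand each start into `frames_per_segment` consecutive indices.
    return (starts[:, None] + np.arange(frames_per_segment)).reshape(-1)

# Example: a 60-frame video with NUM_SEGMENTS=3 and FRAMES_PER_SEGMENT=2
# yields 3 * 2 = 6 indices, e.g. array([ 7,  8, 22, 23, 49, 50]).
print(sample_frame_indices(num_frames=60, num_segments=3, frames_per_segment=2))
```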

### 4. Alternate Video Frame Sampling Methods
If you do not want to use sparse temporal sampling and instead want to sample a single N-frame continuous
clip from a video, this is possible. Set `NUM_SEGMENTS=1` and `FRAMES_PER_SEGMENT=N`. Because VideoFrameDataset
will choose a random start index per segment and take `FRAMES_PER_SEGMENT` consecutive frames from each sampled start
index, this will result in a single N-frame continuous clip per video. An example of this is in `demo.py`.
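
As a sketch of that configuration: the constructor argument names below (`root_path`, `annotationfile_path`, `num_segments`, `frames_per_segment`) are assumptions based on the repository's demo and may differ, and the paths are placeholders.

```python
from video_dataset import VideoFrameDataset

# Single-clip setup sketch; argument names are assumed, paths are placeholders.
dataset = VideoFrameDataset(
    root_path='videos_root',                # directory containing the frame folders
    annotationfile_path='annotations.txt',  # annotation file (placeholder name)
    num_segments=1,                         # a single segment ...
    frames_per_segment=16,                  # ... of 16 consecutive frames
    imagefile_template='img_{:05d}.jpg',
)

frames, label = dataset[0]  # `frames` is a list of 16 consecutive PIL images
```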

### 5. Using VideoFrameDataset for Training
As demonstrated in `demo.py`, we can use PyTorch's `torch.utils.data.DataLoader` class with VideoFrameDataset to take care of shuffling, batching, and more.
To turn the lists of PIL images returned by VideoFrameDataset into tensors, the transform `video_dataset.ImglistToTensor()` can be supplied
as the `transform` parameter to VideoFrameDataset. This turns a list of N PIL images into a batch of images/frames of shape `N x CHANNELS x HEIGHT x WIDTH`.
We can further chain preprocessing and augmentation functions that act on batches of images onto the end of `ImglistToTensor()`.

As of `torchvision 0.8.0`, all torchvision transforms can also operate on batches of images, and they apply deterministic or random transformations
identically to all images of a batch. Therefore, any torchvision transform can be used here to apply video-uniform preprocessing and augmentation.

REMEMBER:
PyTorch transforms are applied to individual dataset samples (in this case, a list of video frame PIL images, or a frame tensor after `ImglistToTensor()`) before
batching. So, any transform used here must expect its input to be a frame tensor of shape `FRAMES x CHANNELS x HEIGHT x WIDTH`, or a list of PIL images if `ImglistToTensor()` is not used.
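
Putting the pieces together, a minimal training-input sketch might look as follows. As above, the `VideoFrameDataset` constructor arguments and paths are assumptions rather than the library's confirmed API; `ImglistToTensor` and the batch-capable torchvision transforms behave as described in this section.

```python
import torch
from torchvision import transforms
from video_dataset import VideoFrameDataset, ImglistToTensor

# Chain ImglistToTensor with batch-capable torchvision transforms
# (torchvision >= 0.8.0), so every frame of a clip is preprocessed identically.
preprocess = transforms.Compose([
    ImglistToTensor(),           # list of PIL images -> FRAMES x C x H x W tensor
    transforms.Resize(128),      # resize all frames of the clip
    transforms.CenterCrop(112),  # crop all frames identically
])

dataset = VideoFrameDataset(
    root_path='videos_root',                # placeholder path
    annotationfile_path='annotations.txt',  # placeholder path
    num_segments=3,
    frames_per_segment=2,
    imagefile_template='img_{:05d}.jpg',
    transform=preprocess,
)

dataloader = torch.utils.data.DataLoader(dataset, batch_size=4,
                                         shuffle=True, num_workers=2)

for video_batch, labels in dataloader:
    # video_batch has shape BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH
    print(video_batch.shape, labels.shape)
    break
```
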
### 6. Conclusion
A proper code-based explanation of how to use VideoFrameDataset for training is provided in `demo.py`.

### 7. Upcoming Features
- [x] Add a demo for sampling a single continuous-frame clip from videos.
- [ ] Add support for arbitrary labels that are more than just a single integer.
- [ ] Add support for specifying START_FRAME and END_FRAME for a video instead of NUM_FRAMES.

### 8. Acknowledgements
We thank the authors of TSN for their [codebase](https://github.com/yjxiong/tsn-pytorch), from which we took VideoFrameDataset and adapted it.