- [1. Requirements](#1-requirements)
- [2. Custom Dataset](#2-custom-dataset)
- [3. Video Frame Sampling Method](#3-video-frame-sampling-method)
- [4. Alternate Video Frame Sampling Methods](#4-alternate-video-frame-sampling-methods)
- [5. Using VideoFrameDataset for Training](#5-using-videoframedataset-for-training)
- [6. Conclusion](#6-conclusion)
- [7. Acknowledgements](#7-acknowledgements)
### 1. Requirements
When loading a video, only a number of its frames are loaded. They are chosen in the following way:
1. The frame indices [1, N] are divided into `NUM_SEGMENTS` even segments. From each segment, `FRAMES_PER_SEGMENT` consecutive indices are chosen at random.

This results in `NUM_SEGMENTS * FRAMES_PER_SEGMENT` chosen indices, whose frames are loaded as PIL images, put into a list, and returned when calling `dataset[i]`.
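The sampling scheme above can be sketched as follows (a minimal illustration, not the library's actual implementation; the function name is hypothetical):

```python
import random

def sparse_sample_indices(num_frames, num_segments, frames_per_segment):
    # Split the frame indices [0, num_frames) into num_segments even segments,
    # then pick frames_per_segment consecutive indices at a random offset
    # inside each segment.
    segment_length = num_frames // num_segments
    indices = []
    for segment in range(num_segments):
        segment_start = segment * segment_length
        max_offset = max(0, segment_length - frames_per_segment)
        start = segment_start + random.randint(0, max_offset)
        indices.extend(range(start, start + frames_per_segment))
    return indices  # NUM_SEGMENTS * FRAMES_PER_SEGMENT indices in total
```

Because only these few indices are decoded, a long video costs no more to load than a short one.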
### 4. Alternate Video Frame Sampling Methods
If you do not want to use sparse temporal sampling and instead want to sample a single N-frame continuous clip from a video, this is possible. Set `NUM_SEGMENTS=1` and `FRAMES_PER_SEGMENT=N`. Because VideoFrameDataset will choose a random start index per segment and take `FRAMES_PER_SEGMENT` continuous frames from each sampled start index, this results in a single N-frame continuous clip per video. An example of this is in `demo.py`.
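With a single segment, sampling degenerates to one random start index followed by N consecutive frames. A sketch (an illustrative helper, not part of the library):

```python
import random

def continuous_clip_indices(num_frames, clip_length):
    # With NUM_SEGMENTS=1 the whole video is one segment, so a single
    # random start index is drawn and clip_length consecutive frames follow.
    start = random.randint(0, num_frames - clip_length)
    return list(range(start, start + clip_length))
```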
### 5. Using VideoFrameDataset for training
As demonstrated in `demo.py`, we can use PyTorch's `torch.utils.data.DataLoader` class with VideoFrameDataset to take care of shuffling, batching, and more.
To turn the lists of PIL images returned by VideoFrameDataset into tensors, the transform `video_dataset.ImglistToTensor()` can be supplied
as the `transform` parameter to VideoFrameDataset. This turns a list of N PIL images into a batch of images/frames of shape `N x CHANNELS x HEIGHT x WIDTH`.
We can further chain preprocessing and augmentation functions that act on batches of images onto the end of `ImglistToTensor()`.
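A minimal sketch of the DataLoader pattern, using a dummy in-memory stand-in for VideoFrameDataset (the tensor shapes follow the description above; the dataset itself is a placeholder, not the real class):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: each item is (frames, label), where frames already
# has the N x CHANNELS x HEIGHT x WIDTH layout produced by ImglistToTensor.
dummy_clips = torch.zeros(8, 5, 3, 64, 64)         # 8 clips of 5 frames each
dummy_labels = torch.zeros(8, dtype=torch.long)
dataset = TensorDataset(dummy_clips, dummy_labels)

# DataLoader takes care of shuffling and batching; each batch has shape
# BATCH x N x CHANNELS x HEIGHT x WIDTH.
loader = DataLoader(dataset, batch_size=2, shuffle=True)
video_batch, labels = next(iter(loader))           # 2 x 5 x 3 x 64 x 64
```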
As of `torchvision 0.8.0`, all torchvision transforms can now also operate on batches of images, and they apply deterministic or random transformations
on the batch identically on all images of the batch. Therefore, any torchvision transform can be used here to apply video-uniform preprocessing and augmentation.
### 6. Conclusion
A proper code-based explanation of how to use VideoFrameDataset for training is provided in `demo.py`.
### 7. Acknowledgements
We thank the authors of TSN for their [codebase](https://github.com/yjxiong/tsn-pytorch), from which we took VideoFrameDataset and adapted it.