If you find the code useful, please star the repository.
If you are completely unfamiliar with loading datasets in PyTorch using `torch.utils.data.Dataset` and `torch.utils.data.DataLoader`, I recommend
getting familiar with these first through [this tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html).
The VideoFrameDataset class serves to `easily`, `efficiently` and `effectively` load video samples from video datasets in PyTorch.
1) Easily because this dataset class can be used with custom datasets with minimum effort and no modification. The class merely expects the
video dataset to have a certain structure on disk and expects a .txt annotation file that enumerates each video sample. Details on this
can be found below and at `https://video-dataset-loading-pytorch.readthedocs.io/en/latest/VideoDataset.html`.
2) Efficiently because the video loading pipeline that this class implements is very fast. This minimizes GPU waiting time during training by eliminating input bottlenecks
that can otherwise slow down training severalfold.
3) Effectively because the implemented sampling strategy for video frames is very strong. Video training using the entire sequence of
frames in a video is usually too memory- and compute-intensive, so only a sparse subset of frames spread evenly across the whole video is loaded.
This approach has shown to be very effective and is taken from Temporal Segment Networks (see the citation below).
In conjunction with PyTorch's DataLoader, the VideoFrameDataset class returns video batch tensors of size `BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH`.
For a demo, visit `demo.py`.
### QuickDemo (demo.py)
```python
import os

root = os.path.join(os.getcwd(), 'demo_dataset')  # Folder in which all videos lie in a specific structure
```
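The rest of the demo boils down to constructing the dataset and drawing a sample from it. The sketch below illustrates this under stated assumptions: the import path `video_dataset`, the annotation file name `annotations.txt`, and the constructor argument names (`root_path`, `annotationfile_path`, `num_segments`, `frames_per_segment`) are illustrative, based on the sampling parameters described in section 3, not confirmed signatures.

```python
import os

from video_dataset import VideoFrameDataset  # assumed import path

root = os.path.join(os.getcwd(), 'demo_dataset')
annotation_file = os.path.join(root, 'annotations.txt')  # assumed annotation file name

# Hypothetical constructor call; argument names are illustrative.
dataset = VideoFrameDataset(
    root_path=root,
    annotationfile_path=annotation_file,
    num_segments=5,        # NUM_SEGMENTS from section 3
    frames_per_segment=1,  # FRAMES_PER_SEGMENT from section 3
)

# Per section 3, indexing the dataset yields the chosen frames as a list of
# PIL images, together with the sample's integer label.
frames, label = dataset[0]
```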
- [3. Video Frame Sampling Method](#3-video-frame-sampling-method)
- [4. Alternate Video Frame Sampling Methods](#4-alternate-video-frame-sampling-methods)
- [5. Using VideoFrameDataset for Training](#5-using-videoframedataset-for-training)
- [6. Conclusion](#6-conclusion)
- [7. Upcoming Features](#7-upcoming-features)
- [8. Acknowledgements](#8-acknowledgements)
### 1. Requirements
```
torch
torchvision
pillow
```

### 3. Video Frame Sampling Method
When loading a video, only a number of its frames are loaded. They are chosen in the following way:
1. The frame indices [1,N] are divided into NUM_SEGMENTS even segments. From each segment, FRAMES_PER_SEGMENT consecutive indices are chosen at random.
This results in NUM_SEGMENTS*FRAMES_PER_SEGMENT chosen indices, whose frames are loaded as PIL images, put into a list, and returned when the dataset is indexed; a sketch of this index computation follows below.
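To make the segment logic concrete, here is a minimal, illustrative sketch of the index computation. It is not the library's actual code; it uses 0-based indices and assumes each segment is at least `FRAMES_PER_SEGMENT` frames long.

```python
import random

def sample_frame_indices(num_frames, num_segments, frames_per_segment):
    """Sketch of sparse temporal sampling: split [0, num_frames) into
    num_segments even segments and pick frames_per_segment consecutive
    indices from a random start within each segment."""
    segment_length = num_frames // num_segments
    indices = []
    for segment in range(num_segments):
        segment_start = segment * segment_length
        # Latest start that still fits frames_per_segment consecutive frames.
        latest_start = segment_start + segment_length - frames_per_segment
        start = random.randint(segment_start, latest_start)
        indices.extend(range(start, start + frames_per_segment))
    return indices

# A 300-frame video with NUM_SEGMENTS=5 and FRAMES_PER_SEGMENT=1 gives
# five indices spread evenly across the whole video.
print(sample_frame_indices(300, 5, 1))
```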
### 4. Alternate Video Frame Sampling Methods
If you do not want to use sparse temporal sampling and instead want to sample a single N-frame continuous
clip from a video, this is possible. Set `NUM_SEGMENTS=1` and `FRAMES_PER_SEGMENT=N`. Because VideoFrameDataset
will choose a random start index per segment and take `FRAMES_PER_SEGMENT` continuous frames from each sampled start
index, this will result in a single N-frame continuous clip per video. An example of this is in `demo.py`.
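Under the same assumed constructor arguments as in the sketches above, drawing a single continuous 9-frame clip per video would look like this:

```python
# Hypothetical configuration: one segment covering the whole video, from
# which 9 consecutive frames are taken at a random start index, i.e. a
# single continuous 9-frame clip per video.
dataset = VideoFrameDataset(
    root_path=root,
    annotationfile_path=annotation_file,
    num_segments=1,        # NUM_SEGMENTS = 1
    frames_per_segment=9,  # FRAMES_PER_SEGMENT = N = 9
)
```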
### 5. Using VideoFrameDataset for Training
As demonstrated in `demo.py`, we can use PyTorch's `torch.utils.data.DataLoader` class with VideoFrameDataset to take care of shuffling, batching, and more.
We can further chain preprocessing and augmentation functions that act on batches of images.
As of `torchvision 0.8.0`, all torchvision transforms can now also operate on batches of images, and they apply deterministic or random transformations
to the batch identically across all images of the batch. Therefore, any torchvision transform can be used here to apply video-uniform preprocessing and augmentation.
REMEMBER:
PyTorch transforms are applied to individual dataset samples (in this case a list of video-frame PIL images, or a frame tensor after `imglist_totensor()`) before
batching. So, any transform used here must expect its input to be a frame tensor of shape `FRAMES x CHANNELS x HEIGHT x WIDTH`, or a list of PIL images if `imglist_totensor()` is not used.
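Putting sections 3 to 5 together, a training-ready pipeline might look like the sketch below. `imglist_totensor()` is named in the text above; everything else — the `transform` constructor argument, passing `imglist_totensor` as the first stage of a `transforms.Compose`, and the specific `DataLoader` settings — is an assumption for illustration:

```python
import torch
from torchvision import transforms

# Assumed composition: turn the list of PIL images into a
# FRAMES x CHANNELS x HEIGHT x WIDTH tensor first, then apply
# batch-capable torchvision transforms uniformly to every frame.
preprocess = transforms.Compose([
    imglist_totensor,           # named above; assumed usable as a composable callable
    transforms.Resize(128),     # applied identically to all frames of a clip
    transforms.CenterCrop(112),
])

dataset = VideoFrameDataset(
    root_path=root,
    annotationfile_path=annotation_file,
    num_segments=5,
    frames_per_segment=1,
    transform=preprocess,       # assumed constructor argument
)

# DataLoader takes care of shuffling and batching; each batch is a tensor of
# size BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH, as described above.
dataloader = torch.utils.data.DataLoader(dataset, batch_size=4, shuffle=True, num_workers=2)

for clips, labels in dataloader:
    pass  # training step goes here
```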
### 6. Conclusion
A proper code-based explanation of how to use VideoFrameDataset for training is provided in `demo.py`.
### 7. Upcoming Features
- [x] Add demo for sampling a single continuous-frame clip from videos.
- [ ] Add support for arbitrary labels that are more than just a single integer.
- [ ] Add support for specifying START_FRAME and END_FRAME for a video instead of NUM_FRAMES.
### 8. Acknowledgements
We thank the authors of TSN for their [codebase](https://github.com/yjxiong/tsn-pytorch), from which we took VideoFrameDataset and adapted it
for general use and compatibility.
```
@InProceedings{wang2016_TemporalSegmentNetworks,
title={Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
author={Wang, Limin and Xiong, Yuanjun and Wang, Zhe and Qiao, Yu and Lin, Dahua and Tang, Xiaoou and Van Gool, Luc},
booktitle={European Conference on Computer Vision (ECCV)},
year={2016}
}
```