This repository was archived by the owner on Dec 14, 2023. It is now read-only.

Commit 98177c5

Update README.md
1 parent 5caab03 commit 98177c5

1 file changed: +10 -13 lines changed

README.md

Lines changed: 10 additions & 13 deletions
@@ -9,10 +9,15 @@
 ## Getting Started
 ### Requirements
 
+#### Installation
 ```bash
 git clone https://github.com/ExponentialML/Text-To-Video-Finetuning.git
+cd Text-To-Video-Finetuning
+git lfs install
+git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b ./models/model_scope_diffusers/
 ```
 
+#### Python Requirements
 ```bash
 pip install -r requirements.txt
 ```
@@ -23,20 +28,12 @@ You could potentially save memory by installing xformers and enabling it in your
 
 https://github.com/facebookresearch/xformers
 
-### Models
-The models were downloaded from here https://huggingface.co/damo-vilab/text-to-video-ms-1.7b/tree/main.
-
-This repository was only tested with **FP16** safetensors. Other files (bin, FP32) should work fine, but if you have any trouble, refer to this.
-
-If you wish to download all of the models, you can use this command:
-
-```bash
-git lfs install
-git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b
-```
-
 ## Hardware
-Minimum RTX 3090. You're free to open a PR for optimization (please do!), but this is heavy without gradient checkpointing support.
+Recommended to use a RTX 3090, but you should be able to train on GPUs with <= 16GB ram with:
+- Validation turned off
+- Xformers or Torch 2.0 Scaled Dot-Product Attention
+- gradient checkpointing enabled.
+- Resolution of 256.
 
 ## Usage
 
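
The install steps in this diff run `git lfs install` before cloning the model repository. If that step is skipped, Git typically leaves small text "pointer" stubs in place of the real weight files. The sketch below is one possible way to check for that after cloning. It is not part of the repository: `find_lfs_pointers` is a hypothetical helper name, and the only fact it relies on is that Git LFS pointer files start with the line `version https://git-lfs.github.com/spec/v1`.

```bash
# Hypothetical helper (an illustration, not part of the repository).
# Lists files under the given directory that are still Git LFS pointer stubs.
# Returns 1 if any stub is found, 0 otherwise.
find_lfs_pointers() {
  local dir="$1" matches
  # Pointer stubs are tiny text files, so only inspect files smaller than
  # four 512-byte blocks, and skip Git's own metadata directory.
  matches=$(find "$dir" -type f ! -path '*/.git/*' -size -4 \
    -exec grep -l '^version https://git-lfs\.github\.com/spec/v1' {} + 2>/dev/null) || true
  if [ -n "$matches" ]; then
    printf 'Git LFS pointer stubs found (weights not downloaded):\n%s\n' "$matches"
    return 1
  fi
  return 0
}
```

Assuming the clone target from the diff, it could be called as `find_lfs_pointers ./models/model_scope_diffusers/`. If it reports any stubs, running `git lfs pull` inside that directory is the usual way to fetch the real files.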
