Commit 474e539

docs: Add findings from exploration into model tuning performance degradation (#315)
* docs: Add findings from exploration into model tuning performance degradation
* fix: More specifically refer to COS instead of just PVC
* docs: Change section name and remove numbers from README.md

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Anh Uong <anh.uong@ibm.com>
1 parent 2c56c30 commit 474e539

1 file changed: +7 -0 lines changed

README.md

Lines changed: 7 additions & 0 deletions
@@ -270,6 +270,13 @@ generation_config.json model-00005-of-00006.safetensors tokenizer.model
</details>

#### Optimizing writing checkpoints
Writing models to Cloud Object Storage (COS) is an expensive operation. Saving model checkpoints to a local directory results in much faster training times than writing them to COS. Use `output_dir` and `save_model_dir` to control which type of storage your checkpoints and final model are written to.

To save time on write operations while still saving checkpoints, set `output_dir` to a local directory and `save_model_dir` to COS.
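
The split described above can be sketched as a small configuration helper. This is only an illustration: the mount point `/cos`, both example paths, and the helper names `is_on_cos` and `build_storage_config` are hypothetical, and how the resulting settings are passed to a tuning job is not shown here.

```python
import json
from pathlib import PurePosixPath

# Assumption: the COS bucket is mounted at this path. Hypothetical value,
# adjust it to match your environment.
COS_MOUNT = PurePosixPath("/cos")


def is_on_cos(path: str) -> bool:
    """Return True if `path` is the assumed COS mount or lies beneath it."""
    candidate = PurePosixPath(path)
    return candidate == COS_MOUNT or COS_MOUNT in candidate.parents


def build_storage_config(checkpoint_dir: str, final_model_dir: str) -> dict:
    """Pair a local checkpoint directory with a COS directory for the final model."""
    if is_on_cos(checkpoint_dir):
        raise ValueError("output_dir should be local storage, not COS")
    if not is_on_cos(final_model_dir):
        raise ValueError("save_model_dir is expected to be on COS in this sketch")
    return {
        "output_dir": checkpoint_dir,       # frequent writes go to local storage
        "save_model_dir": final_model_dir,  # a single write to COS
    }


# Hypothetical example paths.
config = build_storage_config("/tmp/checkpoints", "/cos/models/final")
print(json.dumps(config, indent=2))
```

The check on `output_dir` is there only to catch the slow case early; it is a convenience of this sketch and not something the tuning code is known to enforce.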

For the fastest training time, set `save_strategy="no"`. With this setting only the final model is saved, which removes intermediate write operations altogether.
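
To see why this helps, here is a rough count of intermediate checkpoint writes under different `save_strategy` values. It is a back-of-the-envelope sketch, not the trainer's actual logic: the function name `intermediate_checkpoint_writes` is hypothetical, the step counts are made-up example numbers, and it assumes an integer `save_steps` interval for the step-based strategy.

```python
def intermediate_checkpoint_writes(
    save_strategy: str,
    total_steps: int,
    steps_per_epoch: int,
    save_steps: int = 500,
) -> int:
    """Estimate how many checkpoints are written before the final model.

    Assumption: "epoch" saves once per completed epoch and "steps" saves
    once every `save_steps` optimizer steps.
    """
    if save_strategy == "no":
        return 0  # no intermediate writes at all
    if save_strategy == "epoch":
        return total_steps // steps_per_epoch
    if save_strategy == "steps":
        return total_steps // save_steps
    raise ValueError(f"unsupported save_strategy in this sketch: {save_strategy!r}")


# Example numbers only: 3 epochs of 1000 steps each.
for strategy in ("no", "epoch", "steps"):
    writes = intermediate_checkpoint_writes(strategy, total_steps=3000, steps_per_epoch=1000)
    print(f"{strategy}: {writes} intermediate checkpoint writes")
```

Each intermediate write that lands on COS adds to the total training time, so dropping the count to zero removes that cost entirely.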
## Tuning Techniques:
### LoRA Tuning Example
