docs: Add findings from exploration into model tuning performance degradation (#315)

willmj · web-flow · commit 474e539afa31 · 2024-08-27T17:54:07.000-06:00
* docs: Add findings from exploration into model tuning performance degradation

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* fix: More specifically refer to COS instead of just PVC

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* docs: Change section name and remove numbers from README.md

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

---------

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Anh Uong <anh.uong@ibm.com>
diff --git a/README.md b/README.md
@@ -270,6 +270,13 @@ generation_config.json	model-00005-of-00006.safetensors  tokenizer.model
 
 </details>
 
+#### Optimizing writing checkpoints
+Writing models to Cloud Object Storage (COS) is an expensive operation. Saving model checkpoints to a local directory results in much faster training than writing them to COS. You can use `output_dir` and `save_model_dir` to control which type of storage your checkpoints and final model are written to.
+
+You can set `output_dir` to a local directory and set `save_model_dir` to COS to save time on write operations while ensuring checkpoints are saved.
+
+To achieve the fastest training time, set `save_strategy="no"`. Saving no checkpoints except for the final model removes intermediate write operations altogether.
+
 ## Tuning Techniques:
 
 ### LoRA Tuning Example