Conversation

@daviswer
Collaborator
The current code prints multiple warnings from each GPU at the start of training, cluttering the log. This PR updates the dataloader and process group constructors to eliminate the following warnings, respectively:

/app/fms-fsdp/fms_fsdp/utils/dataloader_utils.py:27: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  data_seq = torch.tensor(data_seq, dtype=torch.int)

and

[rank1]:[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)

There is no other practical effect on code behavior.
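A minimal sketch of the two kinds of fixes, assuming standard PyTorch APIs (the function names here are illustrative and may not match the actual helpers in `dataloader_utils.py` or the process group setup):

```python
import os

import torch


def to_int_tensor(data_seq):
    """Convert data_seq to an int tensor without triggering the
    copy-construct UserWarning (illustrative sketch only)."""
    if torch.is_tensor(data_seq):
        # torch.tensor(existing_tensor) emits the UserWarning above;
        # clone().detach() is the recommended way to copy a tensor.
        return data_seq.clone().detach().to(torch.int)
    # Plain Python sequences can be passed to torch.tensor directly.
    return torch.tensor(data_seq, dtype=torch.int)


def set_nccl_error_handling():
    """Set the TORCH_-prefixed variable before init_process_group;
    the bare NCCL_ASYNC_ERROR_HANDLING name is deprecated."""
    os.environ.setdefault("TORCH_NCCL_ASYNC_ERROR_HANDLING", "1")
    os.environ.pop("NCCL_ASYNC_ERROR_HANDLING", None)
```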

Signed-off-by: Davis Wertheimer <davis.wertheimer@ibm.com>