Conversation

@daviswer
Collaborator
The current code prints multiple warnings from each GPU at the start of training, cluttering the log. This PR updates the dataloader and process group constructors to eliminate the following warnings, respectively:

/app/fms-fsdp/fms_fsdp/utils/dataloader_utils.py:27: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  data_seq = torch.tensor(data_seq, dtype=torch.int)

and

[rank1]:[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)

There is no other practical effect on code behavior.
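A minimal sketch of the two kinds of fixes, assuming standard PyTorch APIs (the function names here are illustrative and may not match the actual helpers in `dataloader_utils.py` or the process group setup):

```python
import os

import torch


def to_int_tensor(data_seq):
    """Convert data_seq to an int tensor without triggering the
    copy-construct UserWarning (illustrative sketch only)."""
    if torch.is_tensor(data_seq):
        # torch.tensor(existing_tensor) emits the UserWarning above;
        # clone().detach() is the recommended way to copy a tensor.
        return data_seq.clone().detach().to(torch.int)
    # Plain Python sequences can be passed to torch.tensor directly.
    return torch.tensor(data_seq, dtype=torch.int)


def set_nccl_error_handling():
    """Set the TORCH_-prefixed variable before init_process_group;
    the bare NCCL_ASYNC_ERROR_HANDLING name is deprecated."""
    os.environ.setdefault("TORCH_NCCL_ASYNC_ERROR_HANDLING", "1")
    os.environ.pop("NCCL_ASYNC_ERROR_HANDLING", None)
```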

Signed-off-by: Davis Wertheimer <davis.wertheimer@ibm.com>