Commit f7fda2b

fix: make dataset robust to empty samples
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
1 parent b03c9c7 commit f7fda2b

1 file changed: +5 -0 lines changed

tuning/utils/data_loaders.py

Lines changed: 5 additions & 0 deletions
@@ -109,6 +109,11 @@ def __iter__(self):
         sample[self.tokens_field] = self.tokenizer.encode(
             sample[self.text_field]
         )
+        if not sample[self.tokens_field]:
+            logger.warning(
+                f"skipping an empty sample : {sample[self.tokens_field]}"
+            )
+            continue
     except Exception as e:  # pylint: disable=broad-exception-caught
         logger.warning(
             "failed to tokenize the data {} of type {}.".format(
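
The hunk above is truncated, so the following is only a minimal, self-contained sketch of the skip-empty-sample pattern it introduces, not the code in tuning/utils/data_loaders.py. The names `WhitespaceTokenizer` and `TokenizingDataset`, the constructor arguments, and the `continue` in the `except` branch are hypothetical and chosen for illustration. Unlike the commit, the sketch logs the original text rather than the (always empty) token list.

```python
import logging

logger = logging.getLogger(__name__)


class WhitespaceTokenizer:
    """Hypothetical stand-in tokenizer: one integer (the word length) per word."""

    def encode(self, text):
        if not isinstance(text, str):
            raise TypeError("expected str, got {}".format(type(text).__name__))
        return [len(word) for word in text.split()]


class TokenizingDataset:
    """Hypothetical iterable that tokenizes lazily and skips unusable samples."""

    def __init__(self, samples, tokenizer, text_field="text", tokens_field="tokens"):
        self.samples = samples
        self.tokenizer = tokenizer
        self.text_field = text_field
        self.tokens_field = tokens_field

    def __iter__(self):
        for sample in self.samples:
            sample = dict(sample)  # copy so the caller's data is not mutated
            try:
                sample[self.tokens_field] = self.tokenizer.encode(
                    sample[self.text_field]
                )
                if not sample[self.tokens_field]:
                    # Tokenization succeeded but produced no tokens: skip it.
                    logger.warning(
                        "skipping an empty sample: %r", sample[self.text_field]
                    )
                    continue
            except Exception as e:  # pylint: disable=broad-exception-caught
                # Assumption: failed samples are also skipped; the diff does
                # not show what the original code does after logging.
                logger.warning(
                    "failed to tokenize the data %r of type %s: %s",
                    sample.get(self.text_field),
                    type(sample.get(self.text_field)).__name__,
                    e,
                )
                continue
            yield sample


data = [{"text": "hello world"}, {"text": ""}, {"text": None}, {"text": "ok"}]
kept = list(TokenizingDataset(data, WhitespaceTokenizer()))
```

In this sketch, the empty string yields an empty token list and is dropped by the new check, while `None` raises inside `encode` and is dropped by the `except` branch, so only the two non-empty samples remain in `kept`.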
