
Conversation


@oleksost oleksost commented Jan 19, 2026

✨ Description

  • Tests no longer import from the apriel2 modelling file, since it has CUDA imports.
  • Fix an int16 addition overflow: when a Python int is added to a numpy int16, the addition is performed in int16, which overflows.
  • Several tests from the external module were not marked with @requires_cuda; they now are.
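A minimal numpy sketch of the overflow described in the second bullet (variable names and values are illustrative, not taken from the code under test):

```python
import numpy as np

begin_ = np.int16(100)  # illustrative offset stored as an int16 scalar
begin = 44345           # a Python int, as in the failing test

# Under NumPy >= 2.0 (NEP 50), the Python int operand is cast to the other
# operand's dtype, so an out-of-range value raises immediately:
#   OverflowError: Python integer 44345 out of bounds for int16
# (older NumPy versions instead promote or silently wrap).
try:
    _ = begin_ + begin
except OverflowError as exc:
    print(f"overflow: {exc}")

# Converting to a Python int first keeps arbitrary-precision arithmetic.
print(int(begin_) + begin)  # 44445
```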

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

List the key changes introduced in this PR:

  1. Stop importing from the apriel2 modelling file in tests (it pulls in CUDA imports).
  2. Cast numpy integer offsets to Python int before arithmetic to avoid int16 overflow.
  3. Mark the external-module tests that need a GPU with @requires_cuda.

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

  • 📜 I have read and followed the contributing guidelines.
  • 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
  • 🎉 The functionality is complete, and I have tested the changes.
  • 📝 I have updated the documentation if needed.
  • ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
  • 🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

  • 🐋 I have updated the Docker configuration or dependencies, if applicable.
  • 🔄 I have ensured compatibility with the existing setup after dependency changes.

Testing

  • 🧪 I have added or updated tests to cover my changes.
  • ✔️ New and existing tests pass locally with my changes.
  • 🚦 I have tested these changes on GPUs and verified training stability.
  • 🏋️ I have tested the changes on realistic training workloads, if applicable.

Performance Impact

  • 📊 I have run benchmarks where applicable to evaluate the performance impact.
  • ✅ The benchmarks show no performance regression.
  • 🚀 The benchmarks indicate a potential performance improvement.
  • ⚠️ The benchmarks indicate a potential performance degradation.
  • 📈 I have provided benchmark results and detailed any performance impact below, if applicable.

📊 Performance Impact Details

If there is any impact on performance, describe it and provide benchmark results, if applicable:


🗒️ Additional Notes

Include any additional context, information, or considerations here, such as known issues, follow-up tasks, or backward compatibility concerns.

```diff
 # Torch doesn't support type promotion between signed and unsigned types, so we convert here to avoid issues.
+# Convert begin and end to int to avoid numpy dtype overflow when adding to begin_
-return TokenSample(self._tokens[begin_ + begin : begin_ + end].to(torch.int64), [end - begin])
+return TokenSample(self._tokens[begin_ + int(begin) : begin_ + int(end)].to(torch.int64), [end - begin])
```
Collaborator:

These should already be integers?

Contributor Author:

Without the cast, test_checkpoint_and_eval[apriel2_text_all_hybrid] fails with `OverflowError: Python integer 44345 out of bounds for int16`.

@jlamypoirier (Collaborator) commented Jan 19, 2026:

That points to another issue upstream, given the method signature.

Contributor Author:

```python
def get_unsigned_integer_type(max_size: int) -> DataType:
    # TODO: Use uint types (recently added for torch, not enough methods supported yet)
    if max_size < 2**8:
        return DataType.uint8
    elif max_size < 2**15:
        return DataType.int16
    elif max_size < 2**31:
        return DataType.int32
    else:
        return DataType.int64
```

Here only uint8 is actually unsigned; the others are signed, despite the function's name.
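To make the point concrete, here is a standalone sketch of the same branch structure, with plain strings standing in for the project's DataType enum:

```python
def storage_dtype(max_size: int) -> str:
    # Mirrors get_unsigned_integer_type above; note that only the
    # first branch actually returns an unsigned type.
    if max_size < 2**8:
        return "uint8"
    elif max_size < 2**15:
        return "int16"
    elif max_size < 2**31:
        return "int32"
    else:
        return "int64"

print(storage_dtype(200))    # uint8
print(storage_dtype(30000))  # int16 -- signed, despite the function's name
print(storage_dtype(2**20))  # int32
```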

@oleksost (Contributor, Author) commented Jan 19, 2026:

We probably need signed ints everywhere, as suggested by the `token_start_index_in_document = max(token_start - token_count, 0)` operation here, which underflows in tests with unsigned uint8.

Collaborator:

The unsigned integer should be fine; the problem is that we should convert to a Python int before doing this kind of operation. In this line, token_start is a Python int and token_count is a tensor, so the operation isn't really safe with signed ints either.

@oleksost oleksost requested a review from jlamypoirier January 19, 2026 22:36