Commit cc00104
ytian218
server: validate n_batch == n_ubatch for embeddings (#6263)
Fixes #6263, where the server accepts mismatched n_batch/n_ubatch values
when embeddings are enabled, leading to suboptimal or incorrect behavior.
Problem: Embeddings and reranking use non-causal attention, which requires
all tokens to be processed within a single ubatch. When n_batch != n_ubatch,
the configuration is incoherent. The default values differ (n_batch=2048,
n_ubatch=512), so users encounter this frequently.
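To make the mismatch concrete, here is a minimal sketch of the arithmetic, assuming a logical batch is split into micro-batches of at most n_ubatch tokens. The helper names count_ubatches and fits_single_ubatch are hypothetical and are not part of llama.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not llama.cpp API): number of micro-batches needed to
// process n_tokens when each micro-batch holds at most n_ubatch tokens.
inline int32_t count_ubatches(int32_t n_tokens, int32_t n_ubatch) {
    assert(n_tokens >= 0 && n_ubatch > 0);
    return (n_tokens + n_ubatch - 1) / n_ubatch; // ceiling division
}

// Under the single-ubatch requirement described above, an input is only
// safe for non-causal attention if it fits in one micro-batch.
inline bool fits_single_ubatch(int32_t n_tokens, int32_t n_ubatch) {
    return count_ubatches(n_tokens, n_ubatch) <= 1;
}
```

With the defaults, a full batch of 2048 tokens would need four micro-batches of 512, which is the case the single-ubatch requirement rules out.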
Solution:
- Add parameter validation in main() after common_params_parse()
- When embeddings are enabled and n_batch != n_ubatch:
* Log warnings explaining the requirement
* Automatically set both to min(n_batch, n_ubatch)
* Ensure coherent configuration
This follows the auto-correction approach suggested by @mirekphd
and provides better UX than strict rejection.
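The auto-correction described above could look roughly like the following sketch. It is not the code from this commit: the struct server_params_sketch, the function normalize_embedding_batch, and the warning text are all hypothetical stand-ins, and only the min(n_batch, n_ubatch) rule and the default values are taken from the message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the relevant fields of the parsed parameters.
struct server_params_sketch {
    int32_t n_batch   = 2048;  // logical batch size (-b)
    int32_t n_ubatch  = 512;   // physical micro-batch size (-ub)
    bool    embedding = false; // --embedding
};

// Returns true if the parameters had to be adjusted.
inline bool normalize_embedding_batch(server_params_sketch & p) {
    if (!p.embedding || p.n_batch == p.n_ubatch) {
        return false; // nothing to validate, or already coherent
    }
    const int32_t n = std::min(p.n_batch, p.n_ubatch);
    // Assumed wording; the real log messages may differ.
    std::fprintf(stderr,
                 "warning: embeddings require n_batch == n_ubatch (got %d and %d)\n",
                 static_cast<int>(p.n_batch), static_cast<int>(p.n_ubatch));
    std::fprintf(stderr, "warning: setting both to %d\n", static_cast<int>(n));
    p.n_batch  = n;
    p.n_ubatch = n;
    assert(p.n_batch == p.n_ubatch); // coherent configuration from here on
    return true;
}
```

Taking the minimum rather than the maximum keeps both values within what the user already allowed, so the correction should not increase memory use beyond the requested micro-batch size.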
Testing:
✅ Builds successfully
✅ Validation triggers: -b 2048 -ub 512 --embedding → logs warnings, adjusts both to 512
✅ No false positives: -b 512 -ub 512 --embedding → no warnings
✅ Verified on macOS M3 Pro with embedding model
1 parent 583cb83 commit cc00104
1 file changed, +11 -0 lines changed
@@ -3657,6 +3657,17 @@ (11 lines added at new lines 3660-3670)