Commit 3827a23
ytian218
server: validate n_batch == n_ubatch for embeddings (#6263)
Fixes #6263: the server accepts mismatched n_batch/n_ubatch values when
embeddings are enabled, leading to suboptimal or incorrect behavior.
Problem: Embeddings and reranking use non-causal attention, which requires
all tokens to be processed within a single ubatch. When n_batch != n_ubatch,
the two limits disagree: a batch larger than n_ubatch is split across
several ubatches, which breaks that requirement. The default values differ
(n_batch=2048, n_ubatch=512), so users who enable embeddings without
setting both values hit this by default.
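To make the problem concrete, here is a minimal sketch of how a batch might be cut into micro-batches. It assumes the split is a simple sequential chunking by n_ubatch; `split_into_ubatches` is a hypothetical helper written for illustration, not a function from the server code.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the server): returns the [begin, end)
// token ranges of each micro-batch, assuming plain sequential chunking.
inline std::vector<std::pair<int, int>> split_into_ubatches(int n_tokens, int n_ubatch) {
    assert(n_ubatch > 0); // a non-positive chunk size would never advance the loop

    std::vector<std::pair<int, int>> ranges;
    for (int begin = 0; begin < n_tokens; begin += n_ubatch) {
        // The last chunk may be shorter than n_ubatch.
        ranges.emplace_back(begin, std::min(begin + n_ubatch, n_tokens));
    }
    return ranges;
}
```

With the defaults quoted above, `split_into_ubatches(2048, 512)` yields four ranges, so a 2048-token input would not fit in one ubatch. With n_batch == n_ubatch == 512, any batch that fits the limit yields exactly one range.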
Solution:
- Add parameter validation in main() after common_params_parse()
- When embeddings enabled and n_batch != n_ubatch:
  * Log warnings explaining the requirement
  * Automatically set both to min(n_batch, n_ubatch)
  * Ensure coherent configuration
This follows the auto-correction approach suggested by @mirekphd
and provides better UX than strict rejection.
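The auto-correction described above could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from this commit: `batch_params_sketch` is a hypothetical stand-in for the relevant fields of the parsed parameters, `coerce_embedding_batch` is a made-up name, and the warnings go through `fprintf` rather than the project's own logging.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the parsed server parameters; only the three
// fields relevant to this check are modelled.
struct batch_params_sketch {
    int  n_batch   = 2048;  // logical batch size (default quoted in the commit message)
    int  n_ubatch  = 512;   // micro-batch size (default quoted in the commit message)
    bool embedding = false; // true when --embedding is passed
};

// Hypothetical check, meant to run once right after argument parsing.
// Returns true if the values had to be adjusted.
inline bool coerce_embedding_batch(batch_params_sketch & p) {
    if (!p.embedding || p.n_batch == p.n_ubatch) {
        return false; // nothing to fix: embeddings off, or already coherent
    }

    const int n = std::min(p.n_batch, p.n_ubatch);

    std::fprintf(stderr,
                 "warning: embeddings require n_batch == n_ubatch (got n_batch = %d, n_ubatch = %d)\n",
                 p.n_batch, p.n_ubatch);
    std::fprintf(stderr, "warning: setting both n_batch and n_ubatch to %d\n", n);

    p.n_batch  = n;
    p.n_ubatch = n;

    assert(p.n_batch == p.n_ubatch); // post-condition: configuration is coherent
    return true;
}
```

Taking the minimum rather than the maximum keeps the corrected values within whatever limit the user already accepted for both settings, which is why `-b 2048 -ub 512 --embedding` ends up with both at 512.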
Testing:
✅ Builds successfully
✅ Validation triggers: -b 2048 -ub 512 --embedding → logs warnings, adjusts both to 512
✅ No false positives: -b 512 -ub 512 --embedding → no warnings
✅ Verified on macOS M3 Pro with embedding model

1 parent 5806286 commit 3827a23
1 file changed: +15 -9 lines changed

| Original file lines | Diff lines | Change |
|---|---|---|
| 73-75 | 73-75 | unchanged |
| 76-82 | | removed |
| | 76-84 | added |
| 83-84 | 85-86 | unchanged |
| 85-86 | | removed |
| | 87-92 | added |
| 87-89 | 93-95 | unchanged |