* server/webui: add server-side WebUI config support
Add CLI arguments --webui-config (inline JSON) and --webui-config-file
(file path) to configure the WebUI's default settings from the server side.
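For illustration, here is a minimal sketch of how the two flags might be resolved into a single JSON string before parsing. The helper name and the inline-takes-precedence rule are assumptions for this sketch, not the PR's actual code:

```cpp
// Sketch only: resolve --webui-config (inline JSON) and
// --webui-config-file (path) into one JSON string.
// resolve_webui_config and its precedence rule are hypothetical.
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

static std::string resolve_webui_config(const std::string & inline_json,
                                        const std::string & file_path) {
    if (!inline_json.empty()) {
        return inline_json; // inline value assumed to win if both flags are set
    }
    if (!file_path.empty()) {
        std::ifstream f(file_path);
        if (!f) {
            throw std::runtime_error("cannot open --webui-config-file: " + file_path);
        }
        std::ostringstream ss;
        ss << f.rdbuf(); // slurp the whole file
        return ss.str();
    }
    return "{}"; // no server-side WebUI overrides configured
}
```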
Backend changes (see the sketch after this list):
- Parse JSON once in server_context::load_model() for performance
- Cache parsed config in webui_settings member (zero overhead on /props)
- Add proper error handling in router mode with try/catch
- Expose webui_settings in /props endpoint for both router and child modes
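A minimal sketch of that parse-once-and-cache flow, assuming nlohmann::json (the JSON library the server already depends on); the member and method names mirror the description above but are illustrative, not verbatim from the PR:

```cpp
// Sketch: parse the WebUI config once at load time, cache it, and
// reuse the cached object on every /props request.
#include <cstdio>
#include <string>
#include <nlohmann/json.hpp>
using json = nlohmann::json;

struct server_context {
    json webui_settings = json::object(); // parsed once, reused by every /props call

    void load_model(const std::string & webui_config_json) {
        // ... model loading elided ...
        try {
            if (!webui_config_json.empty()) {
                webui_settings = json::parse(webui_config_json);
            }
        } catch (const std::exception & e) {
            // router mode wraps the parse in try/catch so a bad config
            // is reported instead of taking the whole process down
            fprintf(stderr, "invalid --webui-config JSON: %s\n", e.what());
        }
    }

    json props_response() const {
        // zero per-request parsing: embed the cached object as-is
        return json{
            { "webui_settings", webui_settings },
            // ... other /props fields elided ...
        };
    }
};
```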
Frontend changes:
- Add 14 configurable WebUI settings via parameter sync
- Add tests for webui settings extraction
- Fix subpath support by honoring the server's base path in API calls
Addresses feedback from @ngxson and @ggerganov
* server: address review feedback from ngxson
* server: regenerate README with llama-gen-docs
@@ -82,13 +82,16 @@ For the full list of features, please refer to [server's changelog](https://githu
 |`-sm, --split-mode {none,layer,row}`| how to split the model across multiple GPUs, one of:<br/>- none: use one GPU only<br/>- layer (default): split layers and KV across GPUs<br/>- row: split rows across GPUs<br/>(env: LLAMA_ARG_SPLIT_MODE) |
 |`-ts, --tensor-split N0,N1,N2,...`| fraction of the model to offload to each GPU, comma-separated list of proportions, e.g. 3,1<br/>(env: LLAMA_ARG_TENSOR_SPLIT) |
 |`-mg, --main-gpu INDEX`| the GPU to use for the model (with split-mode = none), or for intermediate results and KV (with split-mode = row) (default: 0)<br/>(env: LLAMA_ARG_MAIN_GPU) |
+|`-fit, --fit [on\|off]`| whether to adjust unset arguments to fit in device memory ('on' or 'off', default: 'on')<br/>(env: LLAMA_ARG_FIT) |
+|`-fitt, --fit-target MiB`| target margin per device for --fit option, default: 1024<br/>(env: LLAMA_ARG_FIT_TARGET) |
+|`-fitc, --fit-ctx N`| minimum ctx size that can be set by --fit option, default: 4096<br/>(env: LLAMA_ARG_FIT_CTX) |
 |`--check-tensors`| check model tensor data for invalid values (default: false) |
-|`--override-kv KEY=TYPE:VALUE`| advanced option to override model metadata by key. may be specified multiple times.<br/>types: int, float, bool, str. example: --override-kv tokenizer.ggml.add_bos_token=bool:false |
+|`--override-kv KEY=TYPE:VALUE,...`| advanced option to override model metadata by key. to specify multiple overrides, either use comma-separated values or repeat this argument.<br/>types: int, float, bool, str. example: --override-kv tokenizer.ggml.add_bos_token=bool:false,tokenizer.ggml.add_eos_token=bool:false |
 |`--op-offload, --no-op-offload`| whether to offload host tensor operations to device (default: true) |
-|`--lora FNAME`| path to LoRA adapter (can be repeated to use multiple adapters) |
-|`--lora-scaled FNAME SCALE`| path to LoRA adapter with user defined scaling (can be repeated to use multiple adapters) |
-|`--control-vector FNAME`| add a control vector<br/>note: this argument can be repeated to add multiple control vectors |
-|`--control-vector-scaled FNAME SCALE`| add a control vector with user defined scaling SCALE<br/>note: this argument can be repeated to add multiple scaled control vectors |
+|`--lora FNAME`| path to LoRA adapter (use comma-separated values to load multiple adapters) |
+|`--lora-scaled FNAME:SCALE,...`| path to LoRA adapter with user defined scaling (format: FNAME:SCALE,...)<br/>note: use comma-separated values |
+|`--control-vector FNAME`| add a control vector<br/>note: use comma-separated values to add multiple control vectors |
+|`--control-vector-scaled FNAME:SCALE,...`| add a control vector with user defined scaling SCALE<br/>note: use comma-separated values (format: FNAME:SCALE,...) |
 |`--control-vector-layer-range START END`| layer range to apply the control vector(s) to, start and end inclusive |
 |`-m, --model FNAME`| model path to load<br/>(env: LLAMA_ARG_MODEL) |
 |`-mu, --model-url MODEL_URL`| model download url (default: unused)<br/>(env: LLAMA_ARG_MODEL_URL) |
@@ -120,7 +123,7 @@ For the full list of features, please refer to [server's changelog](https://githu
 |`--sampling-seq, --sampler-seq SEQUENCE`| simplified sequence for samplers that will be used (default: edskypmxt) |
 |`--ignore-eos`| ignore end of stream token and continue generating (implies --logit-bias EOS-inf) |
@@ ... @@
 |`--webui, --no-webui`| whether to enable the Web UI (default: enabled)<br/>(env: LLAMA_ARG_WEBUI) |
 |`--embedding, --embeddings`| restrict to only support embedding use case; use only with dedicated embedding models (default: disabled)<br/>(env: LLAMA_ARG_EMBEDDINGS) |
 |`--reranking, --rerank`| enable reranking endpoint on server (default: disabled)<br/>(env: LLAMA_ARG_RERANKING) |