
Commit 603e9a4

CLI: llama cli and completion cosmetics
1 parent: b1f3a6e

6 files changed: 10 additions, 7 deletions

.github/ISSUE_TEMPLATE/019-bug-misc.yml

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ body:
 - Documentation/Github
 - libllama (core library)
 - llama-cli
+- llama-completion
 - llama-server
 - llama-bench
 - llama-quantize

.github/copilot-instructions.md

Lines changed: 2 additions & 1 deletion
@@ -183,7 +183,8 @@ Add `ggml-ci` to commit message to trigger heavy CI workloads on the custom CI i

 ### Built Executables (in `build/bin/`)
 Primary tools:
-- **`llama-cli`**: Main inference tool
+- **`llama-cli`**: Main CLI tool
+- **`llama-completion`**: Text completion tool
 - **`llama-server`**: OpenAI-compatible HTTP server
 - **`llama-quantize`**: Model quantization utility
 - **`llama-perplexity`**: Model evaluation tool

docs/backend/SYCL.md

Lines changed: 1 addition & 1 deletion
@@ -116,7 +116,7 @@ SYCL backend supports Intel GPU Family:
 *Notes:*

 - **Memory**
-  - The device memory is a limitation when running a large model. The loaded model size, *`llm_load_tensors: buffer_size`*, is displayed in the log when running `./bin/llama-cli`.
+  - The device memory is a limitation when running a large model. The loaded model size, *`llm_load_tensors: buffer_size`*, is displayed in the log when running `./bin/llama-completion`.
   - Please make sure the GPU shared memory from the host is large enough to account for the model's size. For e.g. the *llama-2-7b.Q4_0* requires at least 8.0GB for integrated GPU and 4.0GB for discrete GPU.

 - **Execution Unit (EU)**

docs/backend/hexagon/README.md

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ To generate an installable "package" simply use cmake --install:
 ...
 -- Installing: /workspace/pkg-adb/llama.cpp/bin/llama-bench
 -- Installing: /workspace/pkg-adb/llama.cpp/bin/llama-cli
+-- Installing: /workspace/pkg-adb/llama.cpp/bin/llama-completion
 ...
 ```

docs/backend/hexagon/developer.md

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ M=gpt-oss-20b-Q4_0.gguf NDEV=4 D=HTP0,HTP1,HTP2,HTP3 P=surfing.txt scripts/snapd
 ...
 LD_LIBRARY_PATH=/data/local/tmp/llama.cpp/lib
 ADSP_LIBRARY_PATH=/data/local/tmp/llama.cpp/lib
-GGML_HEXAGON_NDEV=4 ./bin/llama-cli --no-mmap -m /data/local/tmp/llama.cpp/../gguf/gpt-oss-20b-Q4_0.gguf
+GGML_HEXAGON_NDEV=4 ./bin/llama-completion --no-mmap -m /data/local/tmp/llama.cpp/../gguf/gpt-oss-20b-Q4_0.gguf
 -t 4 --ctx-size 8192 --batch-size 128 -ctk q8_0 -ctv q8_0 -fa on -ngl 99 --device HTP0,HTP1,HTP2,HTP3 -no-cnv -f surfing.txt
 ...
 llama_model_loader: - type f32: 289 tensors

scripts/fetch_server_test_models.py

Lines changed: 4 additions & 4 deletions
@@ -74,21 +74,21 @@ def collect_hf_model_test_parameters(test_file) -> Generator[HuggingFaceModel, N
     for m in models:
         logging.info(f'  - {m.hf_repo} / {m.hf_file}')

-    cli_path = os.environ.get(
+    completion_path = os.environ.get(
         'LLAMA_CLI_BIN_PATH',
         os.path.join(
             os.path.dirname(__file__),
-            '../build/bin/Release/llama-cli.exe' if os.name == 'nt' else '../build/bin/llama-cli'))
+            '../build/bin/Release/llama-completion.exe' if os.name == 'nt' else '../build/bin/llama-completion'))

     for m in models:
         if '<' in m.hf_repo or (m.hf_file is not None and '<' in m.hf_file):
             continue
         if m.hf_file is not None and '-of-' in m.hf_file:
             logging.warning(f'Skipping model at {m.hf_repo} / {m.hf_file} because it is a split file')
             continue
-        logging.info(f'Using llama-cli to ensure model {m.hf_repo}/{m.hf_file} was fetched')
+        logging.info(f'Using llama-completion to ensure model {m.hf_repo}/{m.hf_file} was fetched')
         cmd = [
-            cli_path,
+            completion_path,
             '-hfr', m.hf_repo,
             *([] if m.hf_file is None else ['-hff', m.hf_file]),
             '-n', '1',
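For context, a minimal sketch of how the renamed `completion_path` would typically be used further down the script, assuming the `cmd` list is eventually run via `subprocess` (that call is outside this hunk); the Hugging Face repo/file names and the `-p` prompt flag below are illustrative placeholders, not part of the diff:

```python
import os
import subprocess

# Resolve the completion binary: LLAMA_CLI_BIN_PATH still takes precedence,
# and the default now points at llama-completion in the build tree
# (mirrors the hunk above).
completion_path = os.environ.get(
    'LLAMA_CLI_BIN_PATH',
    os.path.join(
        os.path.dirname(__file__),
        '../build/bin/Release/llama-completion.exe' if os.name == 'nt'
        else '../build/bin/llama-completion'))

# Hypothetical single-model prefetch: pull the model from Hugging Face and
# generate one token so it lands in the local cache.
subprocess.check_call([
    completion_path,
    '-hfr', 'ggml-org/models',               # placeholder repo
    '-hff', 'tinyllamas/stories260K.gguf',   # placeholder file
    '-n', '1',
    '-p', 'hi',                              # assumed minimal prompt, not in the hunk
])
```

Note that the `LLAMA_CLI_BIN_PATH` override itself is unchanged by this commit; only the default path now targets `llama-completion`.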
