Hi, this is my config:
```yaml
services:
  main:
    image: evilfreelancer/llama.cpp-rpc:latest
    restart: unless-stopped
    volumes:
      - ./models:/app/models
    environment:
      # Operation mode (RPC client in API server format)
      APP_MODE: server
      # Path to the model weights, preloaded inside the container
      APP_MODEL: /app/models/Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf
      # Addresses of the RPC servers the client will interact with
      APP_RPC_BACKENDS: backend-cuda01:50052,backend-cuda02:50052
    ports:
      - "8080:8080"
    networks:
      - ai-network
    depends_on:
      - backend-cuda01
      - backend-cuda02

  backend-cuda01:
    image: evilfreelancer/llama.cpp-rpc:latest-cuda
    restart: unless-stopped
    environment:
      # Operation mode (RPC server)
      APP_MODE: backend
      # Amount of system RAM available to the RPC server (in Megabytes)
      APP_MEM: 8192
    networks:
      - ai-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]

  backend-cuda02:
    image: evilfreelancer/llama.cpp-rpc:latest-cuda
    restart: unless-stopped
    environment:
      # Operation mode (RPC server)
      APP_MODE: backend
      # Amount of GPU memory available to the RPC server (in Megabytes)
      APP_MEM: 8192
    networks:
      - ai-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']
              capabilities: [gpu]

networks:
  ai-network:
    driver: bridge
    ipam:
      config:
        - subnet: 10.10.12.0/24
```
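To rule out a networking problem, a quick way to see whether the RPC backends are actually accepting connections is a plain TCP probe. This is only a minimal sketch: the hostnames `backend-cuda01`/`backend-cuda02` and port `50052` are taken from the compose file above and resolve only on the `ai-network` compose network (i.e. run this from a container attached to that network; from the host you would need published ports instead).

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False

if __name__ == "__main__":
    # Hostnames below come from the compose file and only resolve
    # on the compose network, not from the Docker host itself.
    for backend in ("backend-cuda01", "backend-cuda02"):
        print(f"{backend}:50052 reachable: {is_port_open(backend, 50052)}")
```

If a backend reports unreachable while its container is "up", the rpc-server process inside it has likely already exited.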
Logs from the backend:
```
Executing command: /app/rpc-server --host 0.0.0.0 --port 50052 --mem 8192 --threads 16
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: Host ('0.0.0.0') is != '127.0.0.1'
Never expose the RPC server to an open network!
This is an experimental feature and is not secure!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1080, compute capability 6.1, VMM: yes
```
Logs from main:
```
Executing command: /app/llama-server --host 0.0.0.0 --port 8080 --model /app/models/Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf --repeat-penalty 1.0 --gpu-layers 99 --rpc backend-cuda01:50052,backend-cuda02:50052
```
The same logs appear for both main and the backends on every restart. The containers then fail to start and keep restarting for no apparent reason. Any help is appreciated.
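One detail worth noting from the compose file above: a bare `depends_on` list only waits for the backend *containers* to be created, not for the rpc-server processes inside them to be ready to accept connections, so `main` can race ahead, fail to reach the backends, exit, and be restarted by the `unless-stopped` policy. A hedged sketch of a readiness gate, assuming the backend image ships `bash` (the `/dev/tcp` probe is a bash built-in, so this won't work with a shell-less or busybox-only image):

```yaml
  backend-cuda01:
    # ... existing settings ...
    healthcheck:
      # Succeeds once something is listening on 50052 inside the container.
      test: ["CMD-SHELL", "bash -c '</dev/tcp/127.0.0.1/50052'"]
      interval: 5s
      timeout: 3s
      retries: 10

  main:
    # ... existing settings ...
    depends_on:
      backend-cuda01:
        condition: service_healthy
      backend-cuda02:
        condition: service_healthy
```

This is only a sketch of one possible cause, not a confirmed fix for the restart loop described above.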