
All containers keep restarting #2

@smartmindkw

Description

Hi, this is my config:

services:

  main:
    image: evilfreelancer/llama.cpp-rpc:latest
    restart: unless-stopped
    volumes:
      - ./models:/app/models
    environment:
      # Operation mode (RPC client in API server format)
      APP_MODE: server
      # Path to the model weights, preloaded inside the container
      APP_MODEL: /app/models/Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf
      # Addresses of the RPC servers the client will interact with
      APP_RPC_BACKENDS: backend-cuda01:50052,backend-cuda02:50052
    ports:
      - "8080:8080"
    networks:
      - ai-network
    depends_on:
      - backend-cuda01
      - backend-cuda02

  backend-cuda01:
    image: evilfreelancer/llama.cpp-rpc:latest-cuda
    restart: unless-stopped
    environment:
      # Operation mode (RPC server)
      APP_MODE: backend
      # Amount of memory available to the RPC server (in Megabytes)
      APP_MEM: 8192
    networks:
      - ai-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [ gpu ]

  backend-cuda02:
    image: evilfreelancer/llama.cpp-rpc:latest-cuda
    restart: unless-stopped
    environment:
      # Operation mode (RPC server)
      APP_MODE: backend
      # Amount of memory available to the RPC server (in Megabytes)
      APP_MEM: 8192
    networks:
      - ai-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']
              capabilities: [ gpu ]


networks:
  ai-network:
    driver: bridge
    ipam:
      config:
        - subnet: 10.10.12.0/24
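
One thing worth ruling out (a sketch, not a confirmed fix): `restart: unless-stopped` silently restarts a crashing container, and the short-form `depends_on` above only waits for the backends to be created, not to be ready, so `main` may race the RPC servers. Adding a healthcheck and gating `main` on it would eliminate that race. The TCP probe below is an assumption — it relies on `bash` (and its `/dev/tcp` feature) existing inside the image:

```yaml
services:
  backend-cuda01:
    healthcheck:
      # Assumes bash is present in the image; probes the RPC port via /dev/tcp
      test: ["CMD", "bash", "-c", "exec 3<>/dev/tcp/127.0.0.1/50052"]
      interval: 5s
      timeout: 3s
      retries: 10

  main:
    depends_on:
      backend-cuda01:
        condition: service_healthy
```

(The same healthcheck would go on `backend-cuda02`.) Independently of this, `docker inspect -f '{{.State.ExitCode}}' <container>` on a stopped container and `docker compose logs` right after a crash would show whether the process is exiting with an error or being killed (e.g. OOM).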

Logs from one of the backends:

Executing command: /app/rpc-server --host 0.0.0.0 --port 50052 --mem 8192 --threads 16


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: Host ('0.0.0.0') is != '127.0.0.1'
         Never expose the RPC server to an open network!
         This is an experimental feature and is not secure!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1080, compute capability 6.1, VMM: yes

Logs from main:

Executing command: /app/llama-server --host 0.0.0.0 --port 8080 --model /app/models/Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf --repeat-penalty 1.0 --gpu-layers 99 --rpc backend-cuda01:50052,backend-cuda02:50052

The same logs are repeated for both main and the backends on every restart.

The containers never come up; they just keep restarting without any apparent reason.

Any help is appreciated.
