34 changes: 31 additions & 3 deletions Dockerfile
@@ -85,26 +85,54 @@ ENTRYPOINT ["/app/model-runner"]
# --- vLLM variant ---
FROM llamacpp AS vllm

ARG VLLM_VERSION=0.12.0
ARG VLLM_VERSION=0.15.1
ARG VLLM_CUDA_VERSION=cu130
ARG VLLM_PYTHON_TAG=cp38-abi3
ARG TARGETARCH
# Build vLLM from source on ARM64 for CUDA 13 compatibility (e.g., NVIDIA DGX).
# Set to "false" to use prebuilt wheels instead (faster build, but may not work on CUDA 13).
ARG VLLM_ARM64_BUILD_FROM_SOURCE=true

USER root

RUN apt update && apt install -y python3 python3-venv python3-dev curl ca-certificates build-essential && rm -rf /var/lib/apt/lists/*
# Install build dependencies including CUDA toolkit for compiling vLLM from source on ARM64
# Note: Base image already has CUDA repo configured, just install cuda-toolkit directly
RUN apt update && apt install -y \
    python3 python3-venv python3-dev \
    curl ca-certificates build-essential \
    git cmake ninja-build \
    && if [ "$(uname -m)" = "aarch64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \
Contributor comment (severity: medium):

You're using uname -m to check the architecture. While this works, it's more idiomatic and robust in Dockerfiles to use the built-in TARGETARCH build argument, which is explicitly provided by the builder for the target platform. This avoids any potential discrepancies with the build environment and improves consistency with other parts of the Dockerfile.

Suggested change:
    && if [ "$TARGETARCH" = "arm64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \

        apt install -y cuda-toolkit-13-0; \
    fi \
    && rm -rf /var/lib/apt/lists/*

# Set CUDA paths for ARM64 builds
ENV PATH=/usr/local/cuda-13.0/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH
Comment on lines +110 to +111 (Contributor, severity: medium):

The PATH and LD_LIBRARY_PATH environment variables for the CUDA toolkit are set unconditionally for this build stage. However, the CUDA toolkit is only installed for arm64 builds when VLLM_ARM64_BUILD_FROM_SOURCE is true. For other build configurations (like amd64), these paths will point to non-existent directories. This pollutes the environment and could potentially lead to subtle build issues. Consider setting these environment variables only when they are actually needed, for example by moving this logic into a script that is executed within the RUN instruction where the build from source happens.

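A rough sketch of that direction, assuming the toolkit lands under /usr/local/cuda-13.0 as in this change; the nvcc call is only a stand-in for the actual source-build commands:

    # Scope the CUDA 13 paths to the arm64 source-build branch only, so
    # amd64 and prebuilt-wheel builds never inherit paths that do not exist.
    RUN if [ "$TARGETARCH" = "arm64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \
            export PATH=/usr/local/cuda-13.0/bin:$PATH; \
            export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH; \
            nvcc --version; \
        fi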

RUN mkdir -p /opt/vllm-env && chown -R modelrunner:modelrunner /opt/vllm-env

USER modelrunner

# Install uv and vLLM as modelrunner user
# For AMD64: Use prebuilt CUDA 13 wheels (PyTorch pulled as dependency)
# For ARM64 with VLLM_ARM64_BUILD_FROM_SOURCE=true: Build from source against PyTorch nightly
# For ARM64 with VLLM_ARM64_BUILD_FROM_SOURCE=false: Use prebuilt wheel (old behavior)
RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
    && ~/.local/bin/uv venv --python /usr/bin/python3 /opt/vllm-env \
    && if [ "$TARGETARCH" = "amd64" ]; then \
        WHEEL_ARCH="manylinux_2_31_x86_64"; \
        WHEEL_ARCH="manylinux_2_35_x86_64"; \
        WHEEL_URL="https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}%2B${VLLM_CUDA_VERSION}-${VLLM_PYTHON_TAG}-${WHEEL_ARCH}.whl"; \
        ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python "$WHEEL_URL"; \
    elif [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \
Contributor comment (severity: medium):

The condition for building from source only checks VLLM_ARM64_BUILD_FROM_SOURCE. While the variable name implies it's for arm64, and the logic for installing dependencies also checks the architecture, it's safer and clearer to be explicit here as well. Adding a check for TARGETARCH makes the intent unambiguous and prevents this branch from being accidentally taken on other architectures if the build arguments are misconfigured.

Suggested change:
    elif [ "$TARGETARCH" = "arm64" ] && [ "$VLLM_ARM64_BUILD_FROM_SOURCE" = "true" ]; then \

        ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python \
            torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu130 \
        && git clone --depth 1 --branch v${VLLM_VERSION} https://github.com/vllm-project/vllm.git /tmp/vllm \
        && cd /tmp/vllm \
        && /opt/vllm-env/bin/python use_existing_torch.py \
        && ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python -r requirements/build.txt \
        && VLLM_TARGET_DEVICE=cuda ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python . --no-build-isolation \
        && rm -rf /tmp/vllm; \
    else \
        ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python "vllm==${VLLM_VERSION}"; \
    fi
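
Whichever branch is taken, one way to confirm the install is a smoke test along these lines (image tag is hypothetical; --entrypoint bypasses the model-runner entrypoint set earlier in the Dockerfile):

    docker buildx build --load --target vllm -t model-runner:vllm .
    docker run --rm --entrypoint /opt/vllm-env/bin/python model-runner:vllm \
        -c "import vllm; print(vllm.__version__)"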
5 changes: 3 additions & 2 deletions Makefile
@@ -13,6 +13,7 @@ DOCKER_TARGET ?= final-llamacpp
PORT := 8080
MODELS_PATH := $(shell pwd)/models-store
LLAMA_ARGS ?=
EXTRA_DOCKER_BUILD_ARGS ?=
DOCKER_BUILD_ARGS := \
--load \
--platform linux/$(shell docker version --format '{{.Server.Arch}}') \
@@ -84,11 +85,11 @@ lint:

# Build Docker image
docker-build:
docker buildx build $(DOCKER_BUILD_ARGS) .
docker buildx build $(DOCKER_BUILD_ARGS) $(EXTRA_DOCKER_BUILD_ARGS) .

# Build multi-platform Docker image
docker-build-multiplatform:
docker buildx build --platform linux/amd64,linux/arm64 $(DOCKER_BUILD_ARGS) .
docker buildx build --platform linux/amd64,linux/arm64 $(DOCKER_BUILD_ARGS) $(EXTRA_DOCKER_BUILD_ARGS) .

# Run in Docker container with TCP port access and mounted model storage
docker-run: docker-build
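For reference, the new EXTRA_DOCKER_BUILD_ARGS hook would presumably be exercised along these lines (hypothetical invocation; the build-arg value is just an example):

    make docker-build EXTRA_DOCKER_BUILD_ARGS="--build-arg VLLM_ARM64_BUILD_FROM_SOURCE=false"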