docs: restructure runtime configuration and deprecate legacy formats #262
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -51,40 +51,71 @@ CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192 | |
| When creating a Dockerfile for Cerebrium, there are three key requirements: | ||
|
|
||
| 1. You must expose a port using the `EXPOSE` command - this port will be referenced later in your `cerebrium.toml` configuration | ||
| 2. A `CMD` command is required to specify what runs when the container starts (typically your server process) | ||
| 2. Either a `CMD` or `ENTRYPOINT` directive must be defined in your Dockerfile, or the `entrypoint` key must be set in your `cerebrium.toml` under `[runtime.docker]`. This specifies what runs when the container starts; the TOML configuration takes precedence | ||
| 3. Set the working directory using `WORKDIR` to ensure your application runs from the correct location (defaults to root directory if not specified) | ||
|
Contributor: No WORKDIR in the dockerfile above? Worth putting it in. |
||
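A minimal Dockerfile satisfying all three requirements might look like the following sketch (the FastAPI app, file names, and port are illustrative, not prescribed by Cerebrium):

```dockerfile
FROM python:3.11-slim

# 3. Run from a known location rather than the image root
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# 1. Expose the port that `port` in cerebrium.toml will reference
EXPOSE 8192

# 2. Define what runs at container start (or set `entrypoint` in cerebrium.toml instead)
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192"]
```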
|
|
||
| Update cerebrium.toml to include a custom runtime section with the `dockerfile_path` parameter: | ||
| Update cerebrium.toml to include a docker runtime section with the `dockerfile_path` parameter: | ||
|
|
||
| ```toml | ||
| [cerebrium.runtime.custom] | ||
| [deployment] | ||
| name = "my-docker-app" | ||
|
|
||
| [runtime.docker] | ||
| dockerfile_path = "./Dockerfile" | ||
| port = 8192 | ||
| healthcheck_endpoint = "/health" | ||
| readycheck_endpoint = "/ready" | ||
| dockerfile_path = "./Dockerfile" | ||
| ``` | ||
|
|
||
| The configuration requires three key parameters: | ||
| The configuration requires the following parameters: | ||
|
|
||
| - `dockerfile_path`: The relative path to the Dockerfile used to build the app. | ||
| - `port`: The port the server listens on. | ||
| - `healthcheck_endpoint`: The endpoint used to confirm instance health. If unspecified, defaults to a TCP ping on the configured port. If the health check returns a non-200 response, the instance is considered _unhealthy_ and will be restarted if it does not recover in time. | ||
| - `readycheck_endpoint`: The endpoint used to confirm the instance is ready to receive traffic. If unspecified, defaults to a TCP ping on the configured port. If the ready check returns a non-200 response, the instance will not be a viable target for request routing. | ||
| - `dockerfile_path`: The relative path to the Dockerfile used to build the app. | ||
| - `entrypoint` (optional): The command used to start the application. Required if neither `CMD` nor `ENTRYPOINT` is defined in the given Dockerfile. | ||
|
|
||
| ### Entrypoint Precedence | ||
|
|
||
| If a Dockerfile does not contain a `CMD` clause, specifying the `entrypoint` parameter in the `cerebrium.toml` file is required. | ||
| <Info> | ||
| The `entrypoint` parameter in `cerebrium.toml` **always takes precedence** | ||
| over the `CMD` or `ENTRYPOINT` instruction in your Dockerfile. If you specify | ||
| an `entrypoint` in your TOML configuration, it will be used regardless of what | ||
| `CMD` or `ENTRYPOINT` is defined in your Dockerfile. | ||
| </Info> | ||
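The precedence rule can be sketched as a small resolver. This is illustrative only, not Cerebrium's actual implementation:

```python
def resolve_entrypoint(toml_entrypoint=None, dockerfile_entrypoint=None, dockerfile_cmd=None):
    """Pick the command a container starts with, mirroring the rule above:
    the TOML `entrypoint` always wins; otherwise fall back to the Dockerfile."""
    if toml_entrypoint:
        return toml_entrypoint
    if dockerfile_entrypoint or dockerfile_cmd:
        # Docker concatenates ENTRYPOINT and CMD when both are present
        return (dockerfile_entrypoint or []) + (dockerfile_cmd or [])
    raise ValueError(
        "No entrypoint: define CMD/ENTRYPOINT in the Dockerfile "
        "or `entrypoint` under [runtime.docker] in cerebrium.toml"
    )

# TOML entrypoint overrides the Dockerfile CMD
assert resolve_entrypoint(
    toml_entrypoint=["python", "server.py", "--port", "8192"],
    dockerfile_cmd=["uvicorn", "main:app"],
) == ["python", "server.py", "--port", "8192"]
```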
|
|
||
| If your Dockerfile does not contain a `CMD` or `ENTRYPOINT` instruction, you **must** specify the `entrypoint` parameter in your `cerebrium.toml`: | ||
|
|
||
| ```toml | ||
| [cerebrium.runtime.custom] | ||
| [deployment] | ||
| name = "my-docker-app" | ||
|
|
||
| [runtime.docker] | ||
| dockerfile_path = "./Dockerfile" | ||
| entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192"] | ||
| ... | ||
| port = 8192 | ||
| healthcheck_endpoint = "/health" | ||
| readycheck_endpoint = "/ready" | ||
| ``` | ||
|
|
||
| If you want to override your Dockerfile's `CMD` at deploy time without modifying the Dockerfile, simply add the `entrypoint` parameter to your TOML configuration: | ||
|
|
||
| ```toml | ||
| [deployment] | ||
| name = "my-docker-app" | ||
|
|
||
| [runtime.docker] | ||
| dockerfile_path = "./Dockerfile" | ||
| # This will override any CMD in your Dockerfile | ||
| entrypoint = ["python", "server.py", "--port", "8192"] | ||
| port = 8192 | ||
| ``` | ||
|
|
||
| <Warning> | ||
| When specifying a `dockerfile_path`, all dependencies and necessary commands | ||
| should be installed and executed within the Dockerfile. Dependencies listed | ||
| under `cerebrium.dependencies.*`, as well as | ||
| `cerebrium.deployment.shell_commands` and | ||
| `cerebrium.deployment.pre_build_commands`, will be ignored. | ||
| under `dependencies.*`, as well as `shell_commands` and `pre_build_commands`, | ||
| will be ignored. | ||
| </Warning> | ||
|
|
||
| ## Building Generic Dockerized Apps | ||
|
|
@@ -165,9 +196,12 @@ CMD ["dumb-init", "--", "/rs_server"] | |
| Similarly to the FastAPI webserver, the application should be configured in the `cerebrium.toml` file: | ||
|
|
||
| ```toml | ||
| [cerebrium.runtime.custom] | ||
| [deployment] | ||
| name = "rust-server" | ||
|
|
||
| [runtime.docker] | ||
| dockerfile_path = "./Dockerfile" | ||
| port = 8192 | ||
| healthcheck_endpoint = "/health" | ||
| readycheck_endpoint = "/ready" | ||
| dockerfile_path = "./Dockerfile" | ||
| ``` | ||
|
|
@@ -3,7 +3,7 @@ title: "Custom Python Web Servers" | |
| description: "Run ASGI/WSGI Python apps on Cerebrium" | ||
| --- | ||
|
|
||
| While Cerebrium's default runtime works well for most app needs, teams sometimes need more control over their web server implementation. Using ASGI or WSGI servers through Cerebrium's custom runtime feature enables capabilities like custom authentication, dynamic batching, frontend dashboards, public endpoints, and WebSocket connections. | ||
| While Cerebrium's default runtime works well for most app needs, teams sometimes need more control over their web server implementation. Using ASGI or WSGI servers through Cerebrium's Python runtime feature enables capabilities like custom authentication, dynamic batching, frontend dashboards, public endpoints, and WebSocket connections. | ||
|
|
||
| ## Setting Up Custom Servers | ||
|
|
||
|
|
@@ -26,29 +26,46 @@ def ready(): | |
| return "OK" | ||
| ``` | ||
|
|
||
| Configure this server in `cerebrium.toml` by adding a custom runtime section: | ||
| Configure this server in `cerebrium.toml` by adding a Python runtime section: | ||
|
|
||
| ```toml | ||
| [cerebrium.runtime.custom] | ||
| port = 5000 | ||
| [deployment] | ||
| name = "my-fastapi-app" | ||
|
|
||
| [runtime.python] | ||
| entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000"] | ||
| port = 5000 | ||
| healthcheck_endpoint = "/health" | ||
| readycheck_endpoint = "/ready" | ||
|
|
||
| [cerebrium.dependencies.pip] | ||
| [dependencies.pip] | ||
|
Contributor: Not worth mentioning paths here to point to requirements.txt or apt dependencies? Or a link to the other parts of the doc to see. |
||
| pydantic = "latest" | ||
| numpy = "latest" | ||
| loguru = "latest" | ||
| fastapi = "latest" | ||
| uvicorn = "latest" | ||
| ``` | ||
|
|
||
| The configuration requires three key parameters: | ||
| The configuration requires the following key parameters: | ||
|
|
||
| - `entrypoint`: The command that starts your server | ||
| - `port`: The port your server listens on | ||
| - `healthcheck_endpoint`: The endpoint used to confirm instance health. If unspecified, defaults to a TCP ping on the configured port. If the health check returns a non-200 response, the instance is considered _unhealthy_ and will be restarted if it does not recover in time. | ||
| - `readycheck_endpoint`: The endpoint used to confirm the instance is ready to receive traffic. If unspecified, defaults to a TCP ping on the configured port. If the ready check returns a non-200 response, the instance will not be a viable target for request routing. | ||
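As a framework-agnostic illustration of what these two probes expect, the sketch below serves `/health` and `/ready` using only the standard library. The endpoint names match the configuration above; everything else (class names, the readiness flag) is illustrative:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class ProbeHandler(BaseHTTPRequestHandler):
    ready = False  # flip to True once the app has finished loading

    def do_GET(self):
        if self.path == "/health":
            self.send_response(200)  # any 200 avoids a restart
        elif self.path == "/ready":
            # a non-200 keeps the instance out of request routing
            self.send_response(200 if ProbeHandler.ready else 503)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep probe traffic out of the logs

def serve(port: int = 5000) -> HTTPServer:
    server = HTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```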
|
|
||
| You can also configure build settings in the Python runtime section: | ||
|
Contributor: You mention build settings, but where can I see an exhaustive list? |
||
|
|
||
| ```toml | ||
| [runtime.python] | ||
| python_version = "3.11" | ||
| docker_base_image_url = "debian:bookworm-slim" | ||
| use_uv = true | ||
| entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] | ||
| port = 8000 | ||
| healthcheck_endpoint = "/health" | ||
| readycheck_endpoint = "/ready" | ||
| ``` | ||
|
|
||
| <Info> | ||
| For ASGI applications like FastAPI, include the appropriate server package | ||
| (like `uvicorn`) in your dependencies. After deployment, your endpoints become | ||
|
|
||
|
|
@@ -21,17 +21,17 @@ Check out the [Introductory Guide](/cerebrium/getting-started/introduction) for | |
| <Info> | ||
| It is possible to initialize an existing project by adding a `cerebrium.toml` | ||
|
Contributor: Maybe we can word this better and also make it more obvious; an info block here is not the right place to put it. Maybe we put a note in the Getting Started -> Introduction page about having an existing project. |
||
| file to the root of your codebase, defining your entrypoint (`main.py` if | ||
| using the default runtime, or adding an entrypoint to the .toml file if using | ||
| a custom runtime) and including the necessary files in the `deployment` | ||
| section of your `cerebrium.toml` file. | ||
| using the default cortex runtime, or adding an entrypoint to the runtime | ||
| section if using a python or docker runtime) and including the necessary files | ||
| in the `deployment` section of your `cerebrium.toml` file. | ||
| </Info> | ||
|
|
||
| ## Hardware Configuration | ||
|
|
||
| Cerebrium provides flexible hardware options to match app requirements. The basic configuration specifies GPU type and memory allocations. | ||
|
|
||
| ```toml | ||
| [cerebrium.hardware] | ||
| [hardware] | ||
| compute = "AMPERE_A10" # GPU selection | ||
| memory = 16.0 # Memory allocation in GB | ||
| cpu = 4 # Number of CPU cores | ||
|
|
@@ -44,11 +44,20 @@ For detailed hardware specifications and performance characteristics see the [GP | |
|
|
||
| ### Selecting a Python Version | ||
|
|
||
| The Python runtime version forms the foundation of every Cerebrium app. We currently support versions 3.10 to 3.13. Specify the Python version in the deployment section of the configuration: | ||
| The Python runtime version forms the foundation of every Cerebrium app. We currently support versions 3.10 to 3.13. Specify the Python version in the runtime section of the configuration: | ||
|
Contributor: Is it not worth noting that if they want to use a higher Python version they must look at custom Dockerfile deployments? |
||
|
|
||
| ```toml | ||
| [cerebrium.deployment] | ||
| python_version = 3.11 | ||
| [runtime.cortex] | ||
|
Contributor: Maybe explain Cortex on the opening page; this is the first time they hear about it. Maybe just say Cortex is our Cerebrium-optimized runtime, where they can define multiple configurations through this page, but also Dockerfiles and Python web servers (link to docs). |
||
| python_version = "3.11" | ||
| ``` | ||
|
|
||
| Or for custom Python ASGI/WSGI apps: | ||
|
Contributor: Link to docs. |
||
|
|
||
| ```toml | ||
| [runtime.python] | ||
| python_version = "3.11" | ||
| entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] | ||
| port = 8000 | ||
| ``` | ||
|
|
||
| The Python version affects the entire dependency chain. For instance, some packages may not support newer Python versions immediately after release. | ||
|
|
@@ -63,7 +72,7 @@ The Python version affects the entire dependency chain. For instance, some packa | |
| Python dependencies can be managed directly in TOML or through requirement files. The system caches packages to speed up builds: | ||
|
|
||
| ```toml | ||
| [cerebrium.dependencies.pip] | ||
| [dependencies.pip] | ||
| torch = "==2.0.0" | ||
| transformers = "==4.30.0" | ||
| numpy = "latest" | ||
|
|
@@ -72,8 +81,8 @@ numpy = "latest" | |
| Or using an existing requirements file: | ||
|
|
||
| ```toml | ||
| [cerebrium.dependencies.paths] | ||
| pip = "requirements.txt" | ||
| [dependencies.pip] | ||
| _file_relative_path = "requirements.txt" | ||
|
Contributor: None of our names start with an underscore; why here?
Contributor: Can I have a path and values together? |
||
| ``` | ||
|
|
||
| <Tip> | ||
|
|
@@ -85,20 +94,20 @@ The system implements an intelligent caching strategy at the node level. When an | |
|
|
||
| ### Adding APT Packages | ||
|
Contributor: Is it worth putting apt and conda before pip in the docs, since they happen before? |
||
|
|
||
| System-level packages provide the foundation for many ML apps, handling everything from image-processing libraries to audio codecs. These can be added to the `cerebrium.toml` file under the `[cerebrium.dependencies.apt]` section as follows: | ||
| System-level packages provide the foundation for many ML apps, handling everything from image-processing libraries to audio codecs. These can be added to the `cerebrium.toml` file under the `[dependencies.apt]` section as follows: | ||
|
|
||
| ```toml | ||
| [cerebrium.dependencies.apt] | ||
| [dependencies.apt] | ||
| ffmpeg = "latest" | ||
| libopenblas-base = "latest" | ||
|
Contributor: Show setting a version here for one of them. |
||
| libomp-dev = "latest" | ||
| ``` | ||
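Picking up the reviewer's suggestion, a pinned version would presumably use the same `package = "version"` form as pip dependencies. The pin below is illustrative; verify the exact accepted APT version syntax against the configuration reference:

```toml
[dependencies.apt]
ffmpeg = "latest"
libopenblas-base = "0.3.21"   # illustrative pin; confirm accepted syntax
libomp-dev = "latest"
```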
|
|
||
| For teams with standardized system dependencies, text files can be used instead by adding the following to the `[cerebrium.dependencies.paths]` section: | ||
| For teams with standardized system dependencies, text files can be used instead: | ||
|
|
||
| ```toml | ||
| [cerebrium.dependencies.paths] | ||
| apt = "deps_folder/pkglist.txt" | ||
| [dependencies.apt] | ||
| _file_relative_path = "deps_folder/pkglist.txt" | ||
| ``` | ||
|
|
||
| Since APT packages modify the system environment, any changes to these dependencies trigger a full rebuild of the container image. This ensures system-level changes are properly integrated but means builds will take longer than when modifying Python packages alone. | ||
|
|
@@ -108,7 +117,7 @@ Since APT packages modify the system environment, any changes to these dependenc | |
| Conda excels at managing complex system-level Python dependencies, particularly for GPU support and scientific computing: | ||
|
|
||
| ```toml | ||
| [cerebrium.dependencies.conda] | ||
| [dependencies.conda] | ||
| cuda = ">=11.7" | ||
| cudatoolkit = "11.7" | ||
| opencv = "latest" | ||
|
|
@@ -117,8 +126,8 @@ opencv = "latest" | |
| Teams using conda environments can specify their environment file: | ||
|
|
||
| ```toml | ||
| [cerebrium.dependencies.paths] | ||
| conda = "conda_pkglist.txt" | ||
| [dependencies.conda] | ||
| _file_relative_path = "conda_pkglist.txt" | ||
| ``` | ||
|
|
||
| Like APT packages, Conda packages often modify system-level components. Changes to Conda dependencies will trigger a full rebuild to ensure all binary dependencies and system libraries are correctly configured. Consider batching Conda dependency updates together to minimize rebuild time. | ||
|
|
@@ -132,8 +141,7 @@ Cerebrium's build process includes two specialized command types that execute at | |
| Pre-build commands execute at the start of the build process, before dependency installation begins. This early execution timing makes them essential for setting up the build environment: | ||
|
Contributor: Suggest "dependency (apt, conda, pip) installation begins". |
||
|
|
||
| ```toml | ||
|
|
||
| [cerebrium.deployment] | ||
| [runtime.cortex] | ||
| pre_build_commands = [ | ||
| # Add specialized build tools | ||
| "curl -o /usr/local/bin/pget -L 'https://github.com/replicate/pget/releases/download/v0.6.2/pget_linux_x86_64'", | ||
|
|
@@ -148,7 +156,7 @@ Pre-build commands typically handle tasks like installing build tools, configuri | |
| Shell commands execute after all dependencies install and the application code copies into the container. This later timing ensures access to the complete environment: | ||
|
|
||
| ```toml | ||
| [cerebrium.deployment] | ||
| [runtime.cortex] | ||
| shell_commands = [ | ||
| # Initialize application resources | ||
| "python -m download_models", | ||
|
|
@@ -178,7 +186,7 @@ The base image selection shapes how an app runs in Cerebrium. While the default | |
| Cerebrium supports several categories of base images to ensure system compatibility such as nvidia, ubuntu and python images. | ||
|
|
||
| ```toml | ||
| [cerebrium.deployment] | ||
| [runtime.cortex] | ||
| docker_base_image_url = "debian:bookworm-slim" # Default minimal image | ||
| #docker_base_image_url = "nvidia/cuda:12.0.1-runtime-ubuntu22.04" # CUDA-enabled images | ||
| #docker_base_image_url = "ubuntu:22.04" # debian images | ||
|
|
@@ -205,7 +213,7 @@ docker login -u your-dockerhub-username | |
| After logging in, you can use the image in your configuration: | ||
|
|
||
| ```toml | ||
| [cerebrium.deployment] | ||
| [runtime.cortex] | ||
| docker_base_image_url = "bob/infinity:latest" | ||
| ``` | ||
|
|
||
|
|
@@ -226,27 +234,30 @@ docker_base_image_url = "bob/infinity:latest" | |
| Public ECR images from the `public.ecr.aws` registry work without authentication: | ||
|
|
||
| ```toml | ||
| [cerebrium.deployment] | ||
| [runtime.cortex] | ||
| docker_base_image_url = "public.ecr.aws/lambda/python:3.11" | ||
| ``` | ||
|
|
||
| However, **private ECR images** require authentication. See [Using Private Docker Registries](/cerebrium/container-images/private-docker-registry) for setup instructions. | ||
|
|
||
| ## Custom Runtimes | ||
|
Contributor: I think it's not worth showing examples in this section; rather write a paragraph explaining, then link to the docs where it's more thorough. |
||
|
|
||
| While Cerebrium's default runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or WebSocket connections. | ||
| While Cerebrium's default cortex runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or WebSocket connections. | ||
|
|
||
| ### Basic Configuration | ||
| ### Python Runtime (ASGI/WSGI) | ||
|
|
||
| Define a custom runtime by adding the `cerebrium.runtime.custom` section to the configuration: | ||
| For custom Python web servers, use the `[runtime.python]` section: | ||
|
|
||
| ```toml | ||
| [cerebrium.runtime.custom] | ||
| [deployment] | ||
| name = "my-fastapi-app" | ||
|
|
||
| [runtime.python] | ||
| python_version = "3.11" | ||
| entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"] | ||
| port = 8080 | ||
| healthcheck_endpoint = "" # Empty string uses TCP health check | ||
| readycheck_endpoint = "" # Empty string uses TCP health check | ||
|
|
||
| ``` | ||
|
|
||
| Key parameters: | ||
|
|
@@ -259,21 +270,45 @@ Key parameters: | |
| <Info> | ||
| Check out [this | ||
| example](https://github.com/CerebriumAI/examples/tree/master/11-python-apps/1-asgi-fastapi-server) | ||
| for a detailed implementation of a FastAPI server that uses a custom runtime. | ||
| for a detailed implementation of a FastAPI server that uses a Python runtime. | ||
| </Info> | ||
|
|
||
| ### Docker Runtime | ||
|
|
||
| For complete control over your container, use the `[runtime.docker]` section with a custom Dockerfile: | ||
|
|
||
| ```toml | ||
| [deployment] | ||
| name = "my-docker-app" | ||
|
|
||
| [runtime.docker] | ||
| dockerfile_path = "./Dockerfile" | ||
| port = 8080 | ||
| healthcheck_endpoint = "/health" | ||
| readycheck_endpoint = "/ready" | ||
| ``` | ||
|
|
||
| <Warning> | ||
| When using the docker runtime, all dependencies and build commands should be | ||
| handled within the Dockerfile. The `[dependencies.*]` sections will be | ||
| ignored. | ||
| </Warning> | ||
|
|
||
| ### Self-Contained Servers | ||
|
|
||
| Custom runtimes also support apps with built-in servers. For example, deploying a VLLM server requires no Python code: | ||
|
|
||
| ```toml | ||
| [cerebrium.runtime.custom] | ||
| entrypoint = "vllm serve meta-llama/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 8000 --device cuda" | ||
| [deployment] | ||
| name = "vllm-server" | ||
|
|
||
| [runtime.python] | ||
| entrypoint = ["vllm", "serve", "meta-llama/Meta-Llama-3-8B-Instruct", "--host", "0.0.0.0", "--port", "8000", "--device", "cuda"] | ||
| port = 8000 | ||
| healthcheck_endpoint = "/health" | ||
| healthcheck_endpoint = "/ready" | ||
| readycheck_endpoint = "/ready" | ||
|
|
||
| [cerebrium.dependencies.pip] | ||
| [dependencies.pip] | ||
| torch = "latest" | ||
| vllm = "latest" | ||
| ``` | ||
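When migrating a string-form `entrypoint` from the legacy format to the list form shown above, the standard-library `shlex.split` produces the equivalent argument list:

```python
import shlex

# Legacy string-form entrypoint from the old [cerebrium.runtime.custom] example
legacy = "vllm serve meta-llama/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 8000 --device cuda"

# List form expected by the new runtime sections
entrypoint = shlex.split(legacy)
print(entrypoint[:2])  # ['vllm', 'serve']
```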
|
|
||
Contributor: The TOML configuration `entrypoint` will take precedence.