From 455c43ac5e4906d1230de7364057ac8f1b54a1fc Mon Sep 17 00:00:00 2001 From: Elijah Roussos Date: Wed, 21 Jan 2026 10:35:14 -0500 Subject: [PATCH 01/16] docs: clarify that TOML entrypoint always takes precedence over Dockerfile CMD - Add "Entrypoint Precedence" section to custom-dockerfiles.mdx explaining that cerebrium.toml entrypoint overrides Dockerfile CMD when specified - Update toml-reference.mdx to document dockerfile_path parameter and add entrypoint precedence info box - Clarify that either Dockerfile CMD or TOML entrypoint is required --- .../container-images/custom-dockerfiles.mdx | 28 ++++++++++++++++--- toml-reference/toml-reference.mdx | 7 ++++- 2 files changed, 30 insertions(+), 5 deletions(-) diff --git a/cerebrium/container-images/custom-dockerfiles.mdx b/cerebrium/container-images/custom-dockerfiles.mdx index 836ac455..1a9452c7 100644 --- a/cerebrium/container-images/custom-dockerfiles.mdx +++ b/cerebrium/container-images/custom-dockerfiles.mdx @@ -51,7 +51,7 @@ CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192 When creating a Dockerfile for Cerebrium, there are three key requirements: 1. You must expose a port using the `EXPOSE` command - this port will be referenced later in your `cerebrium.toml` configuration -2. A `CMD` command is required to specify what runs when the container starts (typically your server process) +2. Either a `CMD` command in your Dockerfile OR an `entrypoint` in your `cerebrium.toml` is required to specify what runs when the container starts 3. 
Set the working directory using `WORKDIR` to ensure your application runs from the correct location (defaults to root directory if not specified) Update cerebrium.toml to include a custom runtime section with the `dockerfile_path` parameter: @@ -64,19 +64,39 @@ readycheck_endpoint = "/ready" dockerfile_path = "./Dockerfile" ``` -The configuration requires three key parameters: +The configuration requires the following parameters: - `port`: The port the server listens on. - `healthcheck_endpoint`: The endpoint used to confirm instance health. If unspecified, defaults to a TCP ping on the configured port. If the health check registers a non-200 response, it will be considered _unhealthy_, and be restarted should it not recover timely. - `readycheck_endpoint`: The endpoint used to confirm if the instance is ready to receive. If unspecified, defaults to a TCP ping on the configured port. If the ready check registers a non-200 response, it will not be a viable target for request routing. - `dockerfile_path`: The relative path to the Dockerfile used to build the app. +- `entrypoint` (optional): The command to start the application. -If a Dockerfile does not contain a `CMD` clause, specifying the `entrypoint` parameter in the `cerebrium.toml` file is required. +### Entrypoint Precedence + + + The `entrypoint` parameter in `cerebrium.toml` **always takes precedence** over the `CMD` instruction in your Dockerfile. If you specify an `entrypoint` in your TOML configuration, it will be used regardless of what `CMD` is defined in your Dockerfile. + + +If your Dockerfile does not contain a `CMD` clause, you **must** specify the `entrypoint` parameter in your `cerebrium.toml`: ```toml [cerebrium.runtime.custom] entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192"] -... 
+dockerfile_path = "./Dockerfile" +port = 8192 +healthcheck_endpoint = "/health" +readycheck_endpoint = "/ready" +``` + +If you want to override your Dockerfile's `CMD` at deploy time without modifying the Dockerfile, simply add the `entrypoint` parameter to your TOML configuration: + +```toml +[cerebrium.runtime.custom] +# This will override any CMD in your Dockerfile +entrypoint = ["python", "server.py", "--port", "8192"] +dockerfile_path = "./Dockerfile" +port = 8192 ``` diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx index 65b024d1..a7a28045 100644 --- a/toml-reference/toml-reference.mdx +++ b/toml-reference/toml-reference.mdx @@ -96,9 +96,10 @@ The `[cerebrium.runtime.custom]` section configures custom web servers and runti | Option | Type | Default | Description | | -------------------- | -------- | -------- | ----------------------------------------------------------------------------------------------------------------- | | port | integer | required | Port the application listens on | -| entrypoint | string[] | required | Command to start the application | +| entrypoint | string[] | - | Command to start the application. Required for Python ASGI/WSGI apps or if Dockerfile has no CMD. **Always takes precedence over Dockerfile CMD when specified.** | | healthcheck_endpoint | string | "" | HTTP path for health checks (empty uses TCP). Failure causes the instance to restart | | readycheck_endpoint | string | "" | HTTP path for readiness checks (empty uses TCP). Failure ensures the load balancer does not route to the instance | +| dockerfile_path | string | - | Relative path to a custom Dockerfile. When specified, your Dockerfile is used for building the container image | The port specified in entrypoint must match the port parameter. 
All endpoints @@ -106,6 +107,10 @@ The `[cerebrium.runtime.custom]` section configures custom web servers and runti /{app - name}/your/endpoint` + + **Entrypoint Precedence:** When both `entrypoint` in `cerebrium.toml` and `CMD` in your Dockerfile are defined, the TOML `entrypoint` always takes precedence. This allows you to override your Dockerfile's default command at deploy time without modifying the Dockerfile itself. + + ## Hardware Configuration The `[cerebrium.hardware]` section defines compute resources. From ee9d1cc62f5df14e7f3ead954e366ca41120368c Mon Sep 17 00:00:00 2001 From: elijah-rou Date: Wed, 21 Jan 2026 15:38:14 +0000 Subject: [PATCH 02/16] Prettified Code! --- .../container-images/custom-dockerfiles.mdx | 7 +++++-- toml-reference/toml-reference.mdx | 19 +++++++++++-------- 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/cerebrium/container-images/custom-dockerfiles.mdx b/cerebrium/container-images/custom-dockerfiles.mdx index 1a9452c7..f6a4ea39 100644 --- a/cerebrium/container-images/custom-dockerfiles.mdx +++ b/cerebrium/container-images/custom-dockerfiles.mdx @@ -75,10 +75,13 @@ The configuration requires the following parameters: ### Entrypoint Precedence - The `entrypoint` parameter in `cerebrium.toml` **always takes precedence** over the `CMD` instruction in your Dockerfile. If you specify an `entrypoint` in your TOML configuration, it will be used regardless of what `CMD` is defined in your Dockerfile. + The `entrypoint` parameter in `cerebrium.toml` **always takes precedence** + over the `CMD` or `ENTRYPOINT` instruction in your Dockerfile. If you specify an `entrypoint` + in your TOML configuration, it will be used regardless of what `CMD` or `ENTRYPOINT` is + defined in your Dockerfile. 
-If your Dockerfile does not contain a `CMD` clause, you **must** specify the `entrypoint` parameter in your `cerebrium.toml`: +If your Dockerfile does not contain a `CMD` or `ENTRYPOINT` instruction, you **must** specify the `entrypoint` parameter in your `cerebrium.toml`: ```toml [cerebrium.runtime.custom] diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx index a7a28045..e528ad38 100644 --- a/toml-reference/toml-reference.mdx +++ b/toml-reference/toml-reference.mdx @@ -93,13 +93,13 @@ Include in your deployment: The `[cerebrium.runtime.custom]` section configures custom web servers and runtime behavior. -| Option | Type | Default | Description | -| -------------------- | -------- | -------- | ----------------------------------------------------------------------------------------------------------------- | -| port | integer | required | Port the application listens on | -| entrypoint | string[] | - | Command to start the application. Required for Python ASGI/WSGI apps or if Dockerfile has no CMD. **Always takes precedence over Dockerfile CMD when specified.** | -| healthcheck_endpoint | string | "" | HTTP path for health checks (empty uses TCP). Failure causes the instance to restart | -| readycheck_endpoint | string | "" | HTTP path for readiness checks (empty uses TCP). Failure ensures the load balancer does not route to the instance | -| dockerfile_path | string | - | Relative path to a custom Dockerfile. When specified, your Dockerfile is used for building the container image | +| Option | Type | Default | Description | +| -------------------- | -------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| port | integer | required | Port the application listens on | +| entrypoint | string[] | - | Command to start the application. 
Required for Python ASGI/WSGI apps or if Dockerfile has no CMD/ENTRYPOINT. **Always takes precedence over Dockerfile CMD/ENTRYPOINT when specified.** | +| healthcheck_endpoint | string | "" | HTTP path for health checks (empty uses TCP). Failure causes the instance to restart | +| readycheck_endpoint | string | "" | HTTP path for readiness checks (empty uses TCP). Failure ensures the load balancer does not route to the instance | +| dockerfile_path | string | - | Relative path to a custom Dockerfile. When specified, your Dockerfile is used for building the container image | The port specified in entrypoint must match the port parameter. All endpoints @@ -108,7 +108,10 @@ The `[cerebrium.runtime.custom]` section configures custom web servers and runti - **Entrypoint Precedence:** When both `entrypoint` in `cerebrium.toml` and `CMD` in your Dockerfile are defined, the TOML `entrypoint` always takes precedence. This allows you to override your Dockerfile's default command at deploy time without modifying the Dockerfile itself. + **Entrypoint Precedence:** When both `entrypoint` in `cerebrium.toml` and + `CMD`/`ENTRYPOINT` in your Dockerfile are defined, the TOML `entrypoint` always takes + precedence. This allows you to override your Dockerfile's default command at + deploy time without modifying the Dockerfile itself. ## Hardware Configuration From 518eddc72452448a7f53c54fd2d93d8770b53cba Mon Sep 17 00:00:00 2001 From: elijah-rou Date: Wed, 21 Jan 2026 15:58:56 +0000 Subject: [PATCH 03/16] Prettified Code! 
--- .../container-images/custom-dockerfiles.mdx | 6 +++--- toml-reference/toml-reference.mdx | 18 +++++++++--------- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/cerebrium/container-images/custom-dockerfiles.mdx b/cerebrium/container-images/custom-dockerfiles.mdx index f6a4ea39..488dda74 100644 --- a/cerebrium/container-images/custom-dockerfiles.mdx +++ b/cerebrium/container-images/custom-dockerfiles.mdx @@ -76,9 +76,9 @@ The configuration requires the following parameters: The `entrypoint` parameter in `cerebrium.toml` **always takes precedence** - over the `CMD` or `ENTRYPOINT` instruction in your Dockerfile. If you specify an `entrypoint` - in your TOML configuration, it will be used regardless of what `CMD` or `ENTRYPOINT` is - defined in your Dockerfile. + over the `CMD` or `ENTRYPOINT` instruction in your Dockerfile. If you specify + an `entrypoint` in your TOML configuration, it will be used regardless of what + `CMD` or `ENTRYPOINT` is defined in your Dockerfile. If your Dockerfile does not contain a `CMD` or `ENTRYPOINT` instruction, you **must** specify the `entrypoint` parameter in your `cerebrium.toml`: diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx index e528ad38..165aa40a 100644 --- a/toml-reference/toml-reference.mdx +++ b/toml-reference/toml-reference.mdx @@ -93,13 +93,13 @@ Include in your deployment: The `[cerebrium.runtime.custom]` section configures custom web servers and runtime behavior. 
-| Option | Type | Default | Description | -| -------------------- | -------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| port | integer | required | Port the application listens on | +| Option | Type | Default | Description | +| -------------------- | -------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| port | integer | required | Port the application listens on | | entrypoint | string[] | - | Command to start the application. Required for Python ASGI/WSGI apps or if Dockerfile has no CMD/ENTRYPOINT. **Always takes precedence over Dockerfile CMD/ENTRYPOINT when specified.** | -| healthcheck_endpoint | string | "" | HTTP path for health checks (empty uses TCP). Failure causes the instance to restart | -| readycheck_endpoint | string | "" | HTTP path for readiness checks (empty uses TCP). Failure ensures the load balancer does not route to the instance | -| dockerfile_path | string | - | Relative path to a custom Dockerfile. When specified, your Dockerfile is used for building the container image | +| healthcheck_endpoint | string | "" | HTTP path for health checks (empty uses TCP). Failure causes the instance to restart | +| readycheck_endpoint | string | "" | HTTP path for readiness checks (empty uses TCP). Failure ensures the load balancer does not route to the instance | +| dockerfile_path | string | - | Relative path to a custom Dockerfile. When specified, your Dockerfile is used for building the container image | The port specified in entrypoint must match the port parameter. 
All endpoints @@ -109,9 +109,9 @@ The `[cerebrium.runtime.custom]` section configures custom web servers and runti **Entrypoint Precedence:** When both `entrypoint` in `cerebrium.toml` and - `CMD`/`ENTRYPOINT` in your Dockerfile are defined, the TOML `entrypoint` always takes - precedence. This allows you to override your Dockerfile's default command at - deploy time without modifying the Dockerfile itself. + `CMD`/`ENTRYPOINT` in your Dockerfile are defined, the TOML `entrypoint` + always takes precedence. This allows you to override your Dockerfile's default + command at deploy time without modifying the Dockerfile itself. ## Hardware Configuration From 3cc0fac7a8361f44d06d49255750e8a6796a4df7 Mon Sep 17 00:00:00 2001 From: Elijah Roussos Date: Wed, 21 Jan 2026 11:49:17 -0500 Subject: [PATCH 04/16] even more explicit --- cerebrium/container-images/custom-dockerfiles.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cerebrium/container-images/custom-dockerfiles.mdx b/cerebrium/container-images/custom-dockerfiles.mdx index 488dda74..21fc77fd 100644 --- a/cerebrium/container-images/custom-dockerfiles.mdx +++ b/cerebrium/container-images/custom-dockerfiles.mdx @@ -51,7 +51,7 @@ CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192 When creating a Dockerfile for Cerebrium, there are three key requirements: 1. You must expose a port using the `EXPOSE` command - this port will be referenced later in your `cerebrium.toml` configuration -2. Either a `CMD` command in your Dockerfile OR an `entrypoint` in your `cerebrium.toml` is required to specify what runs when the container starts +2. Either a `CMD` or `ENTRYPOINT` directive must be defined in your Dockerfile OR the `entrypoint` key in your `cerebrium.toml` under `[cerebrium.runtime.custom]`. This specify what runs when the container starts. The TOML configuration will take precedence 3. 
Set the working directory using `WORKDIR` to ensure your application runs from the correct location (defaults to root directory if not specified) Update cerebrium.toml to include a custom runtime section with the `dockerfile_path` parameter: @@ -70,7 +70,7 @@ The configuration requires the following parameters: - `healthcheck_endpoint`: The endpoint used to confirm instance health. If unspecified, defaults to a TCP ping on the configured port. If the health check registers a non-200 response, it will be considered _unhealthy_, and be restarted should it not recover timely. - `readycheck_endpoint`: The endpoint used to confirm if the instance is ready to receive. If unspecified, defaults to a TCP ping on the configured port. If the ready check registers a non-200 response, it will not be a viable target for request routing. - `dockerfile_path`: The relative path to the Dockerfile used to build the app. -- `entrypoint` (optional): The command to start the application. +- `entrypoint` (optional): The command to start the application. Will be required if `CMD` or `ENTRYPOINT` is not defined in the given dockerfile. ### Entrypoint Precedence From 55f9878ecc62ae420c066da066955418360fdbd0 Mon Sep 17 00:00:00 2001 From: Elijah Roussos Date: Thu, 22 Jan 2026 15:36:57 -0500 Subject: [PATCH 05/16] docs: restructure runtime configuration documentation Split runtime documentation into separate cortex, python, and docker sections. Update all examples to use new [cerebrium.runtime.cortex], [cerebrium.runtime.python], and [cerebrium.runtime.docker] configuration patterns. Add backwards compatibility notes for deprecated [cerebrium.runtime.custom] section. 
--- .../container-images/custom-dockerfiles.mdx | 39 ++- .../container-images/custom-web-servers.mdx | 27 +- .../defining-container-images.mdx | 75 +++-- toml-reference/toml-reference.mdx | 261 ++++++++++++++---- 4 files changed, 307 insertions(+), 95 deletions(-) diff --git a/cerebrium/container-images/custom-dockerfiles.mdx b/cerebrium/container-images/custom-dockerfiles.mdx index 21fc77fd..6595eddf 100644 --- a/cerebrium/container-images/custom-dockerfiles.mdx +++ b/cerebrium/container-images/custom-dockerfiles.mdx @@ -51,25 +51,28 @@ CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192 When creating a Dockerfile for Cerebrium, there are three key requirements: 1. You must expose a port using the `EXPOSE` command - this port will be referenced later in your `cerebrium.toml` configuration -2. Either a `CMD` or `ENTRYPOINT` directive must be defined in your Dockerfile OR the `entrypoint` key in your `cerebrium.toml` under `[cerebrium.runtime.custom]`. This specify what runs when the container starts. The TOML configuration will take precedence +2. Either a `CMD` or `ENTRYPOINT` directive must be defined in your Dockerfile OR the `entrypoint` key in your `cerebrium.toml` under `[cerebrium.runtime.docker]`. This specifies what runs when the container starts. The TOML configuration will take precedence 3. 
Set the working directory using `WORKDIR` to ensure your application runs from the correct location (defaults to root directory if not specified) -Update cerebrium.toml to include a custom runtime section with the `dockerfile_path` parameter: +Update cerebrium.toml to include a docker runtime section with the `dockerfile_path` parameter: ```toml -[cerebrium.runtime.custom] +[cerebrium.deployment] +name = "my-docker-app" + +[cerebrium.runtime.docker] +dockerfile_path = "./Dockerfile" port = 8192 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -dockerfile_path = "./Dockerfile" ``` The configuration requires the following parameters: +- `dockerfile_path`: The relative path to the Dockerfile used to build the app. - `port`: The port the server listens on. - `healthcheck_endpoint`: The endpoint used to confirm instance health. If unspecified, defaults to a TCP ping on the configured port. If the health check registers a non-200 response, it will be considered _unhealthy_, and be restarted should it not recover timely. - `readycheck_endpoint`: The endpoint used to confirm if the instance is ready to receive. If unspecified, defaults to a TCP ping on the configured port. If the ready check registers a non-200 response, it will not be a viable target for request routing. -- `dockerfile_path`: The relative path to the Dockerfile used to build the app. - `entrypoint` (optional): The command to start the application. Will be required if `CMD` or `ENTRYPOINT` is not defined in the given dockerfile. 
### Entrypoint Precedence @@ -84,9 +87,12 @@ The configuration requires the following parameters: If your Dockerfile does not contain a `CMD` or `ENTRYPOINT` instruction, you **must** specify the `entrypoint` parameter in your `cerebrium.toml`: ```toml -[cerebrium.runtime.custom] -entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192"] +[cerebrium.deployment] +name = "my-docker-app" + +[cerebrium.runtime.docker] dockerfile_path = "./Dockerfile" +entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192"] port = 8192 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" @@ -95,19 +101,21 @@ readycheck_endpoint = "/ready" If you want to override your Dockerfile's `CMD` at deploy time without modifying the Dockerfile, simply add the `entrypoint` parameter to your TOML configuration: ```toml -[cerebrium.runtime.custom] +[cerebrium.deployment] +name = "my-docker-app" + +[cerebrium.runtime.docker] +dockerfile_path = "./Dockerfile" # This will override any CMD in your Dockerfile entrypoint = ["python", "server.py", "--port", "8192"] -dockerfile_path = "./Dockerfile" port = 8192 ``` When specifying a `dockerfile_path`, all dependencies and necessary commands should be installed and executed within the Dockerfile. Dependencies listed - under `cerebrium.dependencies.*`, as well as - `cerebrium.deployment.shell_commands` and - `cerebrium.deployment.pre_build_commands`, will be ignored. + under `cerebrium.dependencies.*`, as well as `shell_commands` and + `pre_build_commands`, will be ignored. 
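The three Dockerfile requirements and the precedence rule above can be collected into one minimal sketch. This is only an illustration (the base image, file names, and port are assumptions mirroring the FastAPI example earlier on this page), not part of the patch itself:

```dockerfile
# Illustrative base image; any supported Python base works
FROM python:3.12-slim

# Run the app from a known location instead of the root directory
WORKDIR /app

# With a docker runtime, dependencies must be installed here;
# [cerebrium.dependencies.*] in cerebrium.toml is ignored
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Must match the port configured in [cerebrium.runtime.docker]
EXPOSE 8192

# Default start command; a TOML entrypoint, if set, takes precedence
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192"]
```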
## Building Generic Dockerized Apps @@ -188,9 +196,12 @@ CMD ["dumb-init", "--", "/rs_server"] Similarly to the FastAPI webserver, the application should be configured in the `cerebrium.toml` file: ```toml -[cerebrium.runtime.custom] +[cerebrium.deployment] +name = "rust-server" + +[cerebrium.runtime.docker] +dockerfile_path = "./Dockerfile" port = 8192 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -dockerfile_path = "./Dockerfile" ``` diff --git a/cerebrium/container-images/custom-web-servers.mdx b/cerebrium/container-images/custom-web-servers.mdx index 50d0da68..369f42e1 100644 --- a/cerebrium/container-images/custom-web-servers.mdx +++ b/cerebrium/container-images/custom-web-servers.mdx @@ -3,7 +3,7 @@ title: "Custom Python Web Servers" description: "Run ASGI/WSGI Python apps on Cerebrium" --- -While Cerebrium's default runtime works well for most app needs, teams sometimes need more control over their web server implementation. Using ASGI or WSGI servers through Cerebrium's custom runtime feature enables capabilities like custom authentication, dynamic batching, frontend dashboards, public endpoints, and WebSocket connections. +While Cerebrium's default runtime works well for most app needs, teams sometimes need more control over their web server implementation. Using ASGI or WSGI servers through Cerebrium's Python runtime feature enables capabilities like custom authentication, dynamic batching, frontend dashboards, public endpoints, and WebSocket connections. 
## Setting Up Custom Servers @@ -26,12 +26,15 @@ def ready(): return "OK" ``` -Configure this server in `cerebrium.toml` by adding a custom runtime section: +Configure this server in `cerebrium.toml` by adding a Python runtime section: ```toml -[cerebrium.runtime.custom] -port = 5000 +[cerebrium.deployment] +name = "my-fastapi-app" + +[cerebrium.runtime.python] entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000"] +port = 5000 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" @@ -40,15 +43,29 @@ pydantic = "latest" numpy = "latest" loguru = "latest" fastapi = "latest" +uvicorn = "latest" ``` -The configuration requires three key parameters: +The configuration requires the following key parameters: - `entrypoint`: The command that starts your server - `port`: The port your server listens on - `healthcheck_endpoint`: The endpoint used to confirm instance health. If unspecified, defaults to a TCP ping on the configured port. If the health check registers a non-200 response, it will be considered _unhealthy_, and be restarted should it not recover timely. - `readycheck_endpoint`: The endpoint used to confirm if the instance is ready to receive. If unspecified, defaults to a TCP ping on the configured port. If the ready check registers a non-200 response, it will not be a viable target for request routing. +You can also configure build settings in the Python runtime section: + +```toml +[cerebrium.runtime.python] +python_version = "3.11" +docker_base_image_url = "debian:bookworm-slim" +use_uv = true +entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] +port = 8000 +healthcheck_endpoint = "/health" +readycheck_endpoint = "/ready" +``` + For ASGI applications like FastAPI, include the appropriate server package (like `uvicorn`) in your dependencies. 
After deployment, your endpoints become diff --git a/cerebrium/container-images/defining-container-images.mdx b/cerebrium/container-images/defining-container-images.mdx index c4cad161..db8395ec 100644 --- a/cerebrium/container-images/defining-container-images.mdx +++ b/cerebrium/container-images/defining-container-images.mdx @@ -21,8 +21,8 @@ Check out the [Introductory Guide](/cerebrium/getting-started/introduction) for It is possible to initialize an existing project by adding a `cerebrium.toml` file to the root of your codebase, defining your entrypoint (`main.py` if - using the default runtime, or adding an entrypoint to the .toml file if using - a custom runtime) and including the necessary files in the `deployment` + using the default cortex runtime, or adding an entrypoint to the runtime section if using + a python or docker runtime) and including the necessary files in the `deployment` section of your `cerebrium.toml` file. @@ -44,11 +44,20 @@ For detailed hardware specifications and performance characteristics see the [GP ### Selecting a Python Version -The Python runtime version forms the foundation of every Cerebrium app. We currently support versions 3.10 to 3.13. Specify the Python version in the deployment section of the configuration: +The Python runtime version forms the foundation of every Cerebrium app. We currently support versions 3.10 to 3.13. Specify the Python version in the runtime section of the configuration: ```toml -[cerebrium.deployment] -python_version = 3.11 +[cerebrium.runtime.cortex] +python_version = "3.11" +``` + +Or for custom Python ASGI/WSGI apps: + +```toml +[cerebrium.runtime.python] +python_version = "3.11" +entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] +port = 8000 ``` The Python version affects the entire dependency chain. For instance, some packages may not support newer Python versions immediately after release. 
@@ -132,8 +141,7 @@ Cerebrium's build process includes two specialized command types that execute at Pre-build commands execute at the start of the build process, before dependency installation begins. This early execution timing makes them essential for setting up the build environment: ```toml - -[cerebrium.deployment] +[cerebrium.runtime.cortex] pre_build_commands = [ # Add specialized build tools "curl -o /usr/local/bin/pget -L 'https://github.com/replicate/pget/releases/download/v0.6.2/pget_linux_x86_64'", @@ -148,7 +156,7 @@ Pre-build commands typically handle tasks like installing build tools, configuri Shell commands execute after all dependencies install and the application code copies into the container. This later timing ensures access to the complete environment: ```toml -[cerebrium.deployment] +[cerebrium.runtime.cortex] shell_commands = [ # Initialize application resources "python -m download_models", @@ -178,7 +186,7 @@ The base image selection shapes how an app runs in Cerebrium. While the default Cerebrium supports several categories of base images to ensure system compatibility such as nvidia, ubuntu and python images. 
```toml -[cerebrium.deployment] +[cerebrium.runtime.cortex] docker_base_image_url = "debian:bookworm-slim" # Default minimal image #docker_base_image_url = "nvidia/cuda:12.0.1-runtime-ubuntu22.04" # CUDA-enabled images #docker_base_image_url = "ubuntu:22.04" # debian images @@ -205,7 +213,7 @@ docker login -u your-dockerhub-username After logging in, you can use the image in your configuration: ```toml -[cerebrium.deployment] +[cerebrium.runtime.cortex] docker_base_image_url = "bob/infinity:latest" ``` @@ -226,7 +234,7 @@ docker_base_image_url = "bob/infinity:latest" Public ECR images from the `public.ecr.aws` registry work without authentication: ```toml -[cerebrium.deployment] +[cerebrium.runtime.cortex] docker_base_image_url = "public.ecr.aws/lambda/python:3.11" ``` @@ -234,19 +242,22 @@ However, **private ECR images** require authentication. See [Using Private Docke ## Custom Runtimes -While Cerebrium's default runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or WebSocket connections. +While Cerebrium's default cortex runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or WebSocket connections. 
-### Basic Configuration +### Python Runtime (ASGI/WSGI) -Define a custom runtime by adding the `cerebrium.runtime.custom` section to the configuration: +For custom Python web servers, use the `[cerebrium.runtime.python]` section: ```toml -[cerebrium.runtime.custom] +[cerebrium.deployment] +name = "my-fastapi-app" + +[cerebrium.runtime.python] +python_version = "3.11" entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"] port = 8080 healthcheck_endpoint = "" # Empty string uses TCP health check readycheck_endpoint = "" # Empty string uses TCP health check - ``` Key parameters: @@ -259,19 +270,43 @@ Key parameters: Check out [this example](https://github.com/CerebriumAI/examples/tree/master/11-python-apps/1-asgi-fastapi-server) - for a detailed implementation of a FastAPI server that uses a custom runtime. + for a detailed implementation of a FastAPI server that uses a Python runtime. +### Docker Runtime + +For complete control over your container, use the `[cerebrium.runtime.docker]` section with a custom Dockerfile: + +```toml +[cerebrium.deployment] +name = "my-docker-app" + +[cerebrium.runtime.docker] +dockerfile_path = "./Dockerfile" +port = 8080 +healthcheck_endpoint = "/health" +readycheck_endpoint = "/ready" +``` + + + When using the docker runtime, all dependencies and build commands should be + handled within the Dockerfile. The `[cerebrium.dependencies.*]` sections will + be ignored. + + ### Self-Contained Servers Custom runtimes also support apps with built-in servers. 
For example, deploying a VLLM server requires no Python code: ```toml -[cerebrium.runtime.custom] -entrypoint = "vllm serve meta-llama/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 8000 --device cuda" +[cerebrium.deployment] +name = "vllm-server" + +[cerebrium.runtime.python] +entrypoint = ["vllm", "serve", "meta-llama/Meta-Llama-3-8B-Instruct", "--host", "0.0.0.0", "--port", "8000", "--device", "cuda"] port = 8000 healthcheck_endpoint = "/health" -healthcheck_endpoint = "/ready" +readycheck_endpoint = "/ready" [cerebrium.dependencies.pip] torch = "latest" diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx index 165aa40a..c8b324cc 100644 --- a/toml-reference/toml-reference.mdx +++ b/toml-reference/toml-reference.mdx @@ -5,37 +5,139 @@ description: Complete reference for all parameters available in Cerebrium's defa The configuration is organized into the following main sections: -- **[cerebrium.deployment]** Core settings like app name, Python version, and file inclusion rules -- **[cerebrium.runtime.custom]** Custom web server settings and app startup behavior +- **[cerebrium.deployment]** Core settings like app name and file inclusion rules +- **[cerebrium.runtime.cortex]** Default Cerebrium-managed Python runtime (build settings) +- **[cerebrium.runtime.python]** Custom Python ASGI/WSGI web server settings +- **[cerebrium.runtime.docker]** Custom Dockerfile settings - **[cerebrium.hardware]** Compute resources including CPU, memory, and GPU specifications - **[cerebrium.scaling]** Auto-scaling behavior and replica management - **[cerebrium.dependencies]** Package management for Python (pip), system (apt), and Conda dependencies ## Deployment Configuration -The `[cerebrium.deployment]` section defines core deployment settings. +The `[cerebrium.deployment]` section defines core deployment settings that apply to all runtime types. 
| Option | Type | Default | Description | | --------------------------------- | -------- | ---------------------- | ------------------------------------------------------------------------------------------------------------ | | name | string | required | Desired app name | -| python_version | string | "3.12" | Python version to use (3.10, 3.11, 3.12) | | disable_auth | boolean | false | Disable default token-based authentication on app endpoints | | include | string[] | ["*"] | Files/patterns to include in deployment | | exclude | string[] | [".*"] | Files/patterns to exclude from deployment | -| shell_commands | string[] | [] | Commands to run at the end of the build | -| pre_build_commands | string[] | [] | Commands to run before dependencies install | -| docker_base_image_url | string | "debian:bookworm-slim" | Base Docker image | -| use_uv | boolean | false | Use UV for faster Python package installation | | deployment_initialization_timeout | integer | 600 (10 minutes) | The max time to wait for app initialisation during build before timing out. Value must be between 60 and 830 | +## Runtime Configuration + +Cerebrium supports three runtime types. You should only specify one runtime section in your configuration. + +### Cortex Runtime (Default) + +The `[cerebrium.runtime.cortex]` section configures the default Cerebrium-managed Python runtime. This is ideal for standard Python applications using the default Cortex framework. 
+ +| Option | Type | Default | Description | +| -------------------- | -------- | ---------------------- | ------------------------------------------------- | +| python_version | string | "3.11" | Python version to use (3.10, 3.11, 3.12) | +| docker_base_image_url| string | "debian:bookworm-slim" | Base Docker image | +| shell_commands | string[] | [] | Commands to run at the end of the build | +| pre_build_commands | string[] | [] | Commands to run before dependencies install | +| use_uv | boolean | false | Use UV for faster Python package installation | + +**Example:** + +```toml +[cerebrium.deployment] +name = "my-cortex-app" + +[cerebrium.runtime.cortex] +python_version = "3.12" +docker_base_image_url = "debian:bookworm-slim" +use_uv = true +``` + Changes to python_version or docker_base_image_url trigger full rebuilds since they affect the base environment. +### Python Runtime (Custom ASGI/WSGI) + +The `[cerebrium.runtime.python]` section configures custom Python web servers (ASGI/WSGI). Use this when you need full control over your web server implementation for features like custom authentication, dynamic batching, or WebSocket connections. 
+ +| Option | Type | Default | Description | +| -------------------- | -------- | ------------------------------------------------------------------ | -------------------------------------------------------------------------- | +| python_version | string | "3.11" | Python version to use (3.10, 3.11, 3.12) | +| docker_base_image_url| string | "debian:bookworm-slim" | Base Docker image | +| shell_commands | string[] | [] | Commands to run at the end of the build | +| pre_build_commands | string[] | [] | Commands to run before dependencies install | +| use_uv | boolean | false | Use UV for faster Python package installation | +| entrypoint | string[] | ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"] | Command to start the application | +| port | integer | 8000 | Port the application listens on | +| healthcheck_endpoint | string | "" | HTTP path for health checks (empty uses TCP). Failure causes restart | +| readycheck_endpoint | string | "" | HTTP path for readiness checks (empty uses TCP). Failure stops routing | + +**Example:** + +```toml +[cerebrium.deployment] +name = "my-fastapi-app" + +[cerebrium.runtime.python] +python_version = "3.11" +entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] +port = 8000 +healthcheck_endpoint = "/health" +readycheck_endpoint = "/ready" + +[cerebrium.dependencies.pip] +fastapi = "latest" +uvicorn = "latest" +``` + + + The port specified in entrypoint must match the port parameter. All endpoints + will be available at `https://api.aws.us-east-1.cerebrium.ai/v4/{project-id}/{app-name}/your/endpoint` + + +### Docker Runtime (Custom Dockerfile) + +The `[cerebrium.runtime.docker]` section configures deployments using custom Dockerfiles. Use this for non-Python applications or when you need complete control over the container build process. 
+ +| Option | Type | Default | Description | +| -------------------- | -------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| dockerfile_path | string | required | Relative path to a custom Dockerfile | +| entrypoint | string[] | [] | Command to start the application. Required if Dockerfile has no CMD/ENTRYPOINT. **Always takes precedence over Dockerfile CMD/ENTRYPOINT when specified.** | +| port | integer | 8000 | Port the application listens on | +| healthcheck_endpoint | string | "" | HTTP path for health checks (empty uses TCP). Failure causes the instance to restart | +| readycheck_endpoint | string | "" | HTTP path for readiness checks (empty uses TCP). Failure ensures the load balancer does not route to the instance | + +**Example:** + +```toml +[cerebrium.deployment] +name = "my-docker-app" + +[cerebrium.runtime.docker] +dockerfile_path = "./Dockerfile" +port = 8080 +healthcheck_endpoint = "/health" +readycheck_endpoint = "/ready" +``` + + + **Entrypoint Precedence:** When both `entrypoint` in `cerebrium.toml` and + `CMD`/`ENTRYPOINT` in your Dockerfile are defined, the TOML `entrypoint` + always takes precedence. This allows you to override your Dockerfile's default + command at deploy time without modifying the Dockerfile itself. + + + + When using `dockerfile_path`, all dependencies and build commands should be + handled within the Dockerfile. The `[cerebrium.dependencies.*]` sections, + `shell_commands`, and `pre_build_commands` will be ignored. + + ### UV Package Manager -UV is a fast Python package installer written in Rust that can significantly speed up deployment times. When enabled, UV will be used instead of pip for installing Python dependencies. +UV is a fast Python package installer written in Rust that can significantly speed up deployment times. 
When enabled in `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]`, UV will be used instead of pip for installing Python dependencies. UV typically installs packages 10-100x faster than pip, especially beneficial for: @@ -49,7 +151,7 @@ UV typically installs packages 10-100x faster than pip, especially beneficial fo **Example with UV enabled:** ```toml -[cerebrium.deployment] +[cerebrium.runtime.cortex] use_uv = true ``` @@ -89,31 +191,6 @@ Include in your deployment: - Ensure requirements.txt is in your project directory - Deploy with UV enabled -## Runtime Configuration - -The `[cerebrium.runtime.custom]` section configures custom web servers and runtime behavior. - -| Option | Type | Default | Description | -| -------------------- | -------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| port | integer | required | Port the application listens on | -| entrypoint | string[] | - | Command to start the application. Required for Python ASGI/WSGI apps or if Dockerfile has no CMD/ENTRYPOINT. **Always takes precedence over Dockerfile CMD/ENTRYPOINT when specified.** | -| healthcheck_endpoint | string | "" | HTTP path for health checks (empty uses TCP). Failure causes the instance to restart | -| readycheck_endpoint | string | "" | HTTP path for readiness checks (empty uses TCP). Failure ensures the load balancer does not route to the instance | -| dockerfile_path | string | - | Relative path to a custom Dockerfile. When specified, your Dockerfile is used for building the container image | - - - The port specified in entrypoint must match the port parameter. 
All endpoints - will be available at `https://api.aws.us-east-1.cerebrium.ai/v4/{project - id} - /{app - name}/your/endpoint` - - - - **Entrypoint Precedence:** When both `entrypoint` in `cerebrium.toml` and - `CMD`/`ENTRYPOINT` in your Dockerfile are defined, the TOML `entrypoint` - always takes precedence. This allows you to override your Dockerfile's default - command at deploy time without modifying the Dockerfile itself. - - ## Hardware Configuration The `[cerebrium.hardware]` section defines compute resources. @@ -214,26 +291,23 @@ apt = "pkglist.txt" conda = "conda_pkglist.txt" ``` -## Complete Example +## Complete Examples + +### Cortex Runtime (Default) ```toml [cerebrium.deployment] name = "llm-inference" -python_version = "3.12" disable_auth = false include = ["*"] exclude = [".*"] -shell_commands = [] -pre_build_commands = [] + +[cerebrium.runtime.cortex] +python_version = "3.12" docker_base_image_url = "debian:bookworm-slim" use_uv = true -# Enable fast package installation with UV (omit or set to false if you want to use pip) - -[cerebrium.runtime.custom] -port = 8000 -entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] -healthcheck_endpoint = "/health" -readycheck_endpoint = "/ready" +shell_commands = [] +pre_build_commands = [] [cerebrium.hardware] cpu = 4 @@ -252,23 +326,98 @@ cooldown = 1800 scaling_metric = "concurrency_utilization" scaling_target = 100 evaluation_interval = 30 -# load_balancing = "" # Auto-selects based on replica_concurrency roll_out_duration_seconds = 0 +[cerebrium.dependencies.pip] +torch = "latest" +transformers = "latest" +``` + +### Python Runtime (FastAPI) + +```toml +[cerebrium.deployment] +name = "fastapi-server" +disable_auth = false +include = ["*"] +exclude = [".*"] + +[cerebrium.runtime.python] +python_version = "3.11" +entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] +port = 8000 +healthcheck_endpoint = "/health" +readycheck_endpoint = "/ready" + 
+[cerebrium.hardware] +cpu = 4 +memory = 16.0 +compute = "AMPERE_A10" +gpu_count = 1 +provider = "aws" +region = "us-east-1" + +[cerebrium.scaling] +min_replicas = 0 +max_replicas = 2 +replica_concurrency = 10 + [cerebrium.dependencies.pip] torch = "latest" transformers = "latest" uvicorn = "latest" +fastapi = "latest" +``` -[cerebrium.dependencies.apt] -ffmpeg = "latest" +### Docker Runtime -[cerebrium.dependencies.conda] -# Optional conda dependencies +```toml +[cerebrium.deployment] +name = "rust-server" +include = ["*"] +exclude = [".*"] -[cerebrium.dependencies.paths] -# Optional paths to dependency files -# pip = "requirements.txt" -# apt = "pkglist.txt" -# conda = "conda_pkglist.txt" +[cerebrium.runtime.docker] +dockerfile_path = "./Dockerfile" +port = 8192 +healthcheck_endpoint = "/health" +readycheck_endpoint = "/ready" + +[cerebrium.hardware] +cpu = 4 +memory = 16.0 +compute = "CPU" +provider = "aws" +region = "us-east-1" + +[cerebrium.scaling] +min_replicas = 0 +max_replicas = 2 +replica_concurrency = 10 ``` + +## Backwards Compatibility + + +The following configuration patterns are deprecated but still supported for backwards compatibility. +We recommend migrating to the new runtime sections. + + +### Deprecated: Runtime fields in [cerebrium.deployment] + +The following fields in `[cerebrium.deployment]` are deprecated. 
Please move them to the appropriate runtime section: + +| Deprecated Field | New Location | +| ---------------------- | ----------------------------------------------------------- | +| python_version | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | +| docker_base_image_url | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | +| shell_commands | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | +| pre_build_commands | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | +| use_uv | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | + +### Deprecated: [cerebrium.runtime.custom] + +The `[cerebrium.runtime.custom]` section is deprecated. Please migrate to: + +- `[cerebrium.runtime.python]` - For custom Python ASGI/WSGI applications +- `[cerebrium.runtime.docker]` - For custom Dockerfile deployments (when using `dockerfile_path`) From b547fbf813858ab163dd22a61ce5e11e0bb6277c Mon Sep 17 00:00:00 2001 From: elijah-rou Date: Thu, 22 Jan 2026 20:37:33 +0000 Subject: [PATCH 06/16] Prettified Code! --- .../defining-container-images.mdx | 6 +- toml-reference/toml-reference.mdx | 85 ++++++++++--------- 2 files changed, 46 insertions(+), 45 deletions(-) diff --git a/cerebrium/container-images/defining-container-images.mdx b/cerebrium/container-images/defining-container-images.mdx index db8395ec..2e1b8481 100644 --- a/cerebrium/container-images/defining-container-images.mdx +++ b/cerebrium/container-images/defining-container-images.mdx @@ -21,9 +21,9 @@ Check out the [Introductory Guide](/cerebrium/getting-started/introduction) for It is possible to initialize an existing project by adding a `cerebrium.toml` file to the root of your codebase, defining your entrypoint (`main.py` if - using the default cortex runtime, or adding an entrypoint to the runtime section if using - a python or docker runtime) and including the necessary files in the `deployment` - section of your `cerebrium.toml` file. 
+ using the default cortex runtime, or adding an entrypoint to the runtime + section if using a python or docker runtime) and including the necessary files + in the `deployment` section of your `cerebrium.toml` file. ## Hardware Configuration diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx index c8b324cc..7f6d5cea 100644 --- a/toml-reference/toml-reference.mdx +++ b/toml-reference/toml-reference.mdx @@ -17,13 +17,13 @@ The configuration is organized into the following main sections: The `[cerebrium.deployment]` section defines core deployment settings that apply to all runtime types. -| Option | Type | Default | Description | -| --------------------------------- | -------- | ---------------------- | ------------------------------------------------------------------------------------------------------------ | -| name | string | required | Desired app name | -| disable_auth | boolean | false | Disable default token-based authentication on app endpoints | -| include | string[] | ["*"] | Files/patterns to include in deployment | -| exclude | string[] | [".*"] | Files/patterns to exclude from deployment | -| deployment_initialization_timeout | integer | 600 (10 minutes) | The max time to wait for app initialisation during build before timing out. Value must be between 60 and 830 | +| Option | Type | Default | Description | +| --------------------------------- | -------- | ---------------- | ------------------------------------------------------------------------------------------------------------ | +| name | string | required | Desired app name | +| disable_auth | boolean | false | Disable default token-based authentication on app endpoints | +| include | string[] | ["*"] | Files/patterns to include in deployment | +| exclude | string[] | [".*"] | Files/patterns to exclude from deployment | +| deployment_initialization_timeout | integer | 600 (10 minutes) | The max time to wait for app initialisation during build before timing out. 
Value must be between 60 and 830 | ## Runtime Configuration @@ -33,13 +33,13 @@ Cerebrium supports three runtime types. You should only specify one runtime sect The `[cerebrium.runtime.cortex]` section configures the default Cerebrium-managed Python runtime. This is ideal for standard Python applications using the default Cortex framework. -| Option | Type | Default | Description | -| -------------------- | -------- | ---------------------- | ------------------------------------------------- | -| python_version | string | "3.11" | Python version to use (3.10, 3.11, 3.12) | -| docker_base_image_url| string | "debian:bookworm-slim" | Base Docker image | -| shell_commands | string[] | [] | Commands to run at the end of the build | -| pre_build_commands | string[] | [] | Commands to run before dependencies install | -| use_uv | boolean | false | Use UV for faster Python package installation | +| Option | Type | Default | Description | +| --------------------- | -------- | ---------------------- | --------------------------------------------- | +| python_version | string | "3.11" | Python version to use (3.10, 3.11, 3.12) | +| docker_base_image_url | string | "debian:bookworm-slim" | Base Docker image | +| shell_commands | string[] | [] | Commands to run at the end of the build | +| pre_build_commands | string[] | [] | Commands to run before dependencies install | +| use_uv | boolean | false | Use UV for faster Python package installation | **Example:** @@ -62,17 +62,17 @@ use_uv = true The `[cerebrium.runtime.python]` section configures custom Python web servers (ASGI/WSGI). Use this when you need full control over your web server implementation for features like custom authentication, dynamic batching, or WebSocket connections. 
-| Option | Type | Default | Description | -| -------------------- | -------- | ------------------------------------------------------------------ | -------------------------------------------------------------------------- | -| python_version | string | "3.11" | Python version to use (3.10, 3.11, 3.12) | -| docker_base_image_url| string | "debian:bookworm-slim" | Base Docker image | -| shell_commands | string[] | [] | Commands to run at the end of the build | -| pre_build_commands | string[] | [] | Commands to run before dependencies install | -| use_uv | boolean | false | Use UV for faster Python package installation | -| entrypoint | string[] | ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"] | Command to start the application | -| port | integer | 8000 | Port the application listens on | -| healthcheck_endpoint | string | "" | HTTP path for health checks (empty uses TCP). Failure causes restart | -| readycheck_endpoint | string | "" | HTTP path for readiness checks (empty uses TCP). Failure stops routing | +| Option | Type | Default | Description | +| --------------------- | -------- | ------------------------------------------------------------------ | ---------------------------------------------------------------------- | +| python_version | string | "3.11" | Python version to use (3.10, 3.11, 3.12) | +| docker_base_image_url | string | "debian:bookworm-slim" | Base Docker image | +| shell_commands | string[] | [] | Commands to run at the end of the build | +| pre_build_commands | string[] | [] | Commands to run before dependencies install | +| use_uv | boolean | false | Use UV for faster Python package installation | +| entrypoint | string[] | ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"] | Command to start the application | +| port | integer | 8000 | Port the application listens on | +| healthcheck_endpoint | string | "" | HTTP path for health checks (empty uses TCP). 
Failure causes restart   |
+| readycheck_endpoint   | string   | ""                                                                 | HTTP path for readiness checks (empty uses TCP). Failure stops routing  |
 
 **Example:**
 
@@ -94,20 +94,20 @@ uvicorn = "latest"
 
 
   The port specified in entrypoint must match the port parameter. All endpoints
   will be available at `https://api.aws.us-east-1.cerebrium.ai/v4/{project-id}/{app-name}/your/endpoint`
 
 
 ### Docker Runtime (Custom Dockerfile)
 
 The `[cerebrium.runtime.docker]` section configures deployments using custom Dockerfiles. Use this for non-Python applications or when you need complete control over the container build process.
 
-| Option               | Type     | Default  | Description                                                                                                                                                                              |
-| -------------------- | -------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| dockerfile_path      | string   | required | Relative path to a custom Dockerfile                                                                                                                                                     |
-| entrypoint           | string[] | []       | Command to start the application. Required if Dockerfile has no CMD/ENTRYPOINT. **Always takes precedence over Dockerfile CMD/ENTRYPOINT when specified.**                               |
-| port                 | integer  | 8000     | Port the application listens on                                                                                                                                                          |
-| healthcheck_endpoint | string   | ""       | HTTP path for health checks (empty uses TCP). Failure causes the instance to restart                                                                                                     |
-| readycheck_endpoint  | string   | ""       | HTTP path for readiness checks (empty uses TCP). 
Failure ensures the load balancer does not route to the instance | +| Option | Type | Default | Description | +| -------------------- | -------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | +| dockerfile_path | string | required | Relative path to a custom Dockerfile | +| entrypoint | string[] | [] | Command to start the application. Required if Dockerfile has no CMD/ENTRYPOINT. **Always takes precedence over Dockerfile CMD/ENTRYPOINT when specified.** | +| port | integer | 8000 | Port the application listens on | +| healthcheck_endpoint | string | "" | HTTP path for health checks (empty uses TCP). Failure causes the instance to restart | +| readycheck_endpoint | string | "" | HTTP path for readiness checks (empty uses TCP). Failure ensures the load balancer does not route to the instance | **Example:** @@ -399,21 +400,21 @@ replica_concurrency = 10 ## Backwards Compatibility -The following configuration patterns are deprecated but still supported for backwards compatibility. -We recommend migrating to the new runtime sections. + The following configuration patterns are deprecated but still supported for + backwards compatibility. We recommend migrating to the new runtime sections. ### Deprecated: Runtime fields in [cerebrium.deployment] The following fields in `[cerebrium.deployment]` are deprecated. 
Please move them to the appropriate runtime section: -| Deprecated Field | New Location | -| ---------------------- | ----------------------------------------------------------- | -| python_version | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | -| docker_base_image_url | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | -| shell_commands | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | -| pre_build_commands | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | -| use_uv | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | +| Deprecated Field | New Location | +| --------------------- | ------------------------------------------------------------ | +| python_version | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | +| docker_base_image_url | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | +| shell_commands | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | +| pre_build_commands | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | +| use_uv | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` | ### Deprecated: [cerebrium.runtime.custom] From aaf86699e0c66df5fc83e7ed0b029608513e40a5 Mon Sep 17 00:00:00 2001 From: Elijah Roussos Date: Fri, 23 Jan 2026 12:33:36 -0500 Subject: [PATCH 07/16] docs: update TOML reference for runtime-specific dependencies and prefix removal - Add deprecation notice for cerebrium.* prefix (new format removes prefix) - Update Dependencies section to show runtime-specific dependencies as recommended - Add deprecation warnings for top-level [cerebrium.dependencies.*] sections - Update complete examples to use runtime-specific dependencies - Add backwards compatibility section for dependency migration paths --- toml-reference/toml-reference.mdx | 100 +++++++++++++++++++++++++----- 1 file changed, 85 insertions(+), 15 deletions(-) diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx 
index 7f6d5cea..3ee0cb3f 100644
--- a/toml-reference/toml-reference.mdx
+++ b/toml-reference/toml-reference.mdx
@@ -3,6 +3,20 @@ title: TOML Reference
 description: Complete reference for all parameters available in Cerebrium's default `cerebrium.toml` configuration file.
 ---
 
+
+**Deprecation Notice:** The `cerebrium.` prefix is being removed from all configuration sections; each `cerebrium.*` section becomes a top-level section (e.g., `[cerebrium.deployment]` → `[deployment]`). The prefixed format is still supported for backwards compatibility but will be removed in a future release. We recommend migrating to the new format.
+
+| Current (Deprecated)              | New Format            |
+| --------------------------------- | --------------------- |
+| `[cerebrium.deployment]`          | `[deployment]`        |
+| `[cerebrium.runtime.cortex]`      | `[runtime.cortex]`    |
+| `[cerebrium.runtime.python]`      | `[runtime.python]`    |
+| `[cerebrium.runtime.docker]`      | `[runtime.docker]`    |
+| `[cerebrium.hardware]`            | `[hardware]`          |
+| `[cerebrium.scaling]`             | `[scaling]`           |
+| `[cerebrium.dependencies]`        | `[dependencies]`      |
+
+
 The configuration is organized into the following main sections:
 
 - **[cerebrium.deployment]** Core settings like app name and file inclusion rules
@@ -87,7 +101,7 @@ port = 8000
 healthcheck_endpoint = "/health"
 readycheck_endpoint = "/ready"
 
-[cerebrium.dependencies.pip]
+[cerebrium.runtime.python.dependencies.pip]
 fastapi = "latest"
 uvicorn = "latest"
 ```
@@ -250,7 +264,48 @@
 
 ## Dependencies
 
-### Pip Dependencies
+Dependencies can be specified either at the runtime level or at the top level. Runtime-specific dependencies are preferred because they keep dependencies alongside their runtime configuration.
+
+### Runtime-Specific Dependencies (Recommended)
+
+Dependencies can be specified within the runtime section. 
This is the recommended approach: + +```toml +[cerebrium.runtime.cortex] +python_version = "3.12" + +[cerebrium.runtime.cortex.dependencies.pip] +torch = "==2.0.0" # Exact version +numpy = "latest" # Latest version +pandas = ">=1.5.0" # Minimum version + +[cerebrium.runtime.cortex.dependencies.apt] +ffmpeg = "latest" +libopenblas-base = "latest" + +[cerebrium.runtime.cortex.dependencies.conda] +cuda = ">=11.7" +cudatoolkit = "11.7" + +[cerebrium.runtime.cortex.dependencies.paths] +pip = "requirements.txt" # Alternative: use a file instead of inline +``` + +This approach works with any runtime type (`cortex`, `python`, or partner services). + + +**Deprecated:** Top-level `[cerebrium.dependencies.*]` sections are deprecated. +Please move your dependencies to `[cerebrium.runtime.{runtime_type}.dependencies.*]`. +Dependencies specified at the runtime level take precedence over top-level dependencies when both are present. + + +### Top-Level Dependencies (Deprecated) + + +This approach is deprecated. Please migrate to runtime-specific dependencies above. + + +#### Pip Dependencies The `[cerebrium.dependencies.pip]` section lists Python package requirements. @@ -261,7 +314,7 @@ numpy = "latest" # Latest version pandas = ">=1.5.0" # Minimum version ``` -### APT Dependencies +#### APT Dependencies The `[cerebrium.dependencies.apt]` section specifies system packages. @@ -271,7 +324,7 @@ ffmpeg = "latest" libopenblas-base = "latest" ``` -### Conda Dependencies +#### Conda Dependencies The `[cerebrium.dependencies.conda]` section manages Conda packages. @@ -281,7 +334,7 @@ cuda = ">=11.7" cudatoolkit = "11.7" ``` -### Dependency Files +#### Dependency Files The `[cerebrium.dependencies.paths]` section allows using requirement files. 
@@ -310,6 +363,10 @@ use_uv = true shell_commands = [] pre_build_commands = [] +[cerebrium.runtime.cortex.dependencies.pip] +torch = "latest" +transformers = "latest" + [cerebrium.hardware] cpu = 4 memory = 16.0 @@ -328,10 +385,6 @@ scaling_metric = "concurrency_utilization" scaling_target = 100 evaluation_interval = 30 roll_out_duration_seconds = 0 - -[cerebrium.dependencies.pip] -torch = "latest" -transformers = "latest" ``` ### Python Runtime (FastAPI) @@ -350,6 +403,12 @@ port = 8000 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" +[cerebrium.runtime.python.dependencies.pip] +torch = "latest" +transformers = "latest" +uvicorn = "latest" +fastapi = "latest" + [cerebrium.hardware] cpu = 4 memory = 16.0 @@ -362,12 +421,6 @@ region = "us-east-1" min_replicas = 0 max_replicas = 2 replica_concurrency = 10 - -[cerebrium.dependencies.pip] -torch = "latest" -transformers = "latest" -uvicorn = "latest" -fastapi = "latest" ``` ### Docker Runtime @@ -422,3 +475,20 @@ The `[cerebrium.runtime.custom]` section is deprecated. Please migrate to: - `[cerebrium.runtime.python]` - For custom Python ASGI/WSGI applications - `[cerebrium.runtime.docker]` - For custom Dockerfile deployments (when using `dockerfile_path`) + +### Deprecated: Top-level [cerebrium.dependencies.*] + +Top-level dependency sections are deprecated. Please move dependencies to runtime-specific sections: + +| Deprecated Location | New Location | +| ---------------------------------- | ------------------------------------------------------ | +| `[cerebrium.dependencies.pip]` | `[cerebrium.runtime.{type}.dependencies.pip]` | +| `[cerebrium.dependencies.apt]` | `[cerebrium.runtime.{type}.dependencies.apt]` | +| `[cerebrium.dependencies.conda]` | `[cerebrium.runtime.{type}.dependencies.conda]` | +| `[cerebrium.dependencies.paths]` | `[cerebrium.runtime.{type}.dependencies.paths]` | + +Where `{type}` is your runtime type (e.g., `cortex`, `python`). 
+ + +When both top-level and runtime-specific dependencies are present, runtime-specific dependencies take precedence on a per-package basis. This allows gradual migration. + From cf84405a1efd385a0382bada54719b05dedcf142 Mon Sep 17 00:00:00 2001 From: Elijah Roussos Date: Fri, 23 Jan 2026 12:43:02 -0500 Subject: [PATCH 08/16] docs: deprecate cerebrium.* prefix, migrate all examples to new TOML format MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add deprecation notice for cerebrium.* prefix in TOML sections - Update all documentation examples to use new format: - [cerebrium.deployment] → [deployment] - [cerebrium.runtime.*] → [runtime.*] - [cerebrium.hardware] → [hardware] - [cerebrium.scaling] → [scaling] - [cerebrium.dependencies.*] → [dependencies.*] - Update text references to match new format - The old format remains backwards compatible --- .../container-images/custom-dockerfiles.mdx | 20 +- .../container-images/custom-web-servers.mdx | 8 +- .../defining-container-images.mdx | 52 ++--- .../private-docker-registry.mdx | 4 +- cerebrium/deployments/gradual-roll-out.mdx | 6 +- .../deployments/multi-region-deployment.mdx | 4 +- cerebrium/endpoints/websockets.mdx | 4 +- cerebrium/hardware/cpu-and-memory.mdx | 4 +- cerebrium/hardware/using-cuda.mdx | 4 +- cerebrium/hardware/using-gpus.mdx | 4 +- cerebrium/other-topics/faster-cold-starts.mdx | 2 +- cerebrium/partner-services/deepgram.mdx | 8 +- cerebrium/partner-services/rime.mdx | 10 +- cerebrium/scaling/batching-concurrency.mdx | 10 +- cerebrium/scaling/graceful-termination.mdx | 8 +- cerebrium/scaling/scaling-apps.mdx | 20 +- cerebrium/storage/managing-files.mdx | 2 +- migrations/hugging-face.mdx | 23 ++- migrations/mystic.mdx | 14 +- migrations/replicate.mdx | 12 +- toml-reference/toml-reference.mdx | 187 ++++++++++-------- v4/examples/aiVoiceAgents.mdx | 16 +- v4/examples/asgi-gradio-interface.mdx | 10 +- v4/examples/comfyUI.mdx | 12 +- 
...oy-a-vision-language-model-with-sglang.mdx | 14 +- ...y-an-llm-with-tensorrtllm-tritonserver.mdx | 8 +- v4/examples/gpt-oss.mdx | 8 +- v4/examples/high-throughput-embeddings.mdx | 10 +- v4/examples/langchain-langsmith.mdx | 2 +- v4/examples/langchain.mdx | 22 +-- v4/examples/livekit-outbound-agent.mdx | 10 +- v4/examples/mistral-vllm.mdx | 18 +- .../openai-compatible-endpoint-vllm.mdx | 4 +- v4/examples/realtime-voice-agents.mdx | 18 +- v4/examples/sdxl.mdx | 12 +- v4/examples/streaming-falcon-7B.mdx | 18 +- v4/examples/transcribe-whisper.mdx | 16 +- v4/examples/twilio-voice-agent.mdx | 8 +- v4/examples/wandb-sweep.mdx | 6 +- 39 files changed, 325 insertions(+), 293 deletions(-) diff --git a/cerebrium/container-images/custom-dockerfiles.mdx b/cerebrium/container-images/custom-dockerfiles.mdx index 6595eddf..f029dc8e 100644 --- a/cerebrium/container-images/custom-dockerfiles.mdx +++ b/cerebrium/container-images/custom-dockerfiles.mdx @@ -51,16 +51,16 @@ CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192 When creating a Dockerfile for Cerebrium, there are three key requirements: 1. You must expose a port using the `EXPOSE` command - this port will be referenced later in your `cerebrium.toml` configuration -2. Either a `CMD` or `ENTRYPOINT` directive must be defined in your Dockerfile OR the `entrypoint` key in your `cerebrium.toml` under `[cerebrium.runtime.docker]`. This specifies what runs when the container starts. The TOML configuration will take precedence +2. Either a `CMD` or `ENTRYPOINT` directive must be defined in your Dockerfile OR the `entrypoint` key in your `cerebrium.toml` under `[runtime.docker]`. This specifies what runs when the container starts. The TOML configuration will take precedence 3. 
Set the working directory using `WORKDIR` to ensure your application runs from the correct location (defaults to root directory if not specified) Update cerebrium.toml to include a docker runtime section with the `dockerfile_path` parameter: ```toml -[cerebrium.deployment] +[deployment] name = "my-docker-app" -[cerebrium.runtime.docker] +[runtime.docker] dockerfile_path = "./Dockerfile" port = 8192 healthcheck_endpoint = "/health" @@ -87,10 +87,10 @@ The configuration requires the following parameters: If your Dockerfile does not contain a `CMD` or `ENTRYPOINT` instruction, you **must** specify the `entrypoint` parameter in your `cerebrium.toml`: ```toml -[cerebrium.deployment] +[deployment] name = "my-docker-app" -[cerebrium.runtime.docker] +[runtime.docker] dockerfile_path = "./Dockerfile" entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192"] port = 8192 @@ -101,10 +101,10 @@ readycheck_endpoint = "/ready" If you want to override your Dockerfile's `CMD` at deploy time without modifying the Dockerfile, simply add the `entrypoint` parameter to your TOML configuration: ```toml -[cerebrium.deployment] +[deployment] name = "my-docker-app" -[cerebrium.runtime.docker] +[runtime.docker] dockerfile_path = "./Dockerfile" # This will override any CMD in your Dockerfile entrypoint = ["python", "server.py", "--port", "8192"] @@ -114,7 +114,7 @@ port = 8192 When specifying a `dockerfile_path`, all dependencies and necessary commands should be installed and executed within the Dockerfile. Dependencies listed - under `cerebrium.dependencies.*`, as well as `shell_commands` and + under `dependencies.*`, as well as `shell_commands` and `pre_build_commands`, will be ignored. 
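The health and ready endpoints referenced in the runtime configuration above can be served by any HTTP server inside the container. Below is a minimal standard-library sketch of what the probes expect; the paths and port mirror the example configuration, and everything else (class and function names) is illustrative:

```python
# Minimal sketch of the probe endpoints a custom container might expose.
# Standard library only; /health, /ready, and port 8192 mirror the example
# cerebrium.toml above. In a real app these routes would live alongside the
# application's own endpoints (e.g. in the FastAPI server shown earlier).
from http.server import BaseHTTPRequestHandler, HTTPServer


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A non-200 health response marks the instance unhealthy; a non-200
        # ready response removes it as a target for request routing.
        if self.path in ("/health", "/ready"):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep probe traffic out of the container logs


def serve(port: int = 8192) -> None:
    # The port must match `port` in cerebrium.toml and EXPOSE in the Dockerfile.
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

The only contract that matters is that both configured endpoints return 200 once the process can take traffic; how the server is implemented is up to you.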
@@ -196,10 +196,10 @@ CMD ["dumb-init", "--", "/rs_server"] Similarly to the FastAPI webserver, the application should be configured in the `cerebrium.toml` file: ```toml -[cerebrium.deployment] +[deployment] name = "rust-server" -[cerebrium.runtime.docker] +[runtime.docker] dockerfile_path = "./Dockerfile" port = 8192 healthcheck_endpoint = "/health" diff --git a/cerebrium/container-images/custom-web-servers.mdx b/cerebrium/container-images/custom-web-servers.mdx index 369f42e1..1cb015f6 100644 --- a/cerebrium/container-images/custom-web-servers.mdx +++ b/cerebrium/container-images/custom-web-servers.mdx @@ -29,16 +29,16 @@ def ready(): Configure this server in `cerebrium.toml` by adding a Python runtime section: ```toml -[cerebrium.deployment] +[deployment] name = "my-fastapi-app" -[cerebrium.runtime.python] +[runtime.python] entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000"] port = 5000 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[cerebrium.dependencies.pip] +[dependencies.pip] pydantic = "latest" numpy = "latest" loguru = "latest" @@ -56,7 +56,7 @@ The configuration requires the following key parameters: You can also configure build settings in the Python runtime section: ```toml -[cerebrium.runtime.python] +[runtime.python] python_version = "3.11" docker_base_image_url = "debian:bookworm-slim" use_uv = true diff --git a/cerebrium/container-images/defining-container-images.mdx b/cerebrium/container-images/defining-container-images.mdx index 2e1b8481..8686d60e 100644 --- a/cerebrium/container-images/defining-container-images.mdx +++ b/cerebrium/container-images/defining-container-images.mdx @@ -31,7 +31,7 @@ Check out the [Introductory Guide](/cerebrium/getting-started/introduction) for Cerebrium provides flexible hardware options to match app requirements. The basic configuration specifies GPU type and memory allocations. 
```toml -[cerebrium.hardware] +[hardware] compute = "AMPERE_A10" # GPU selection memory = 16.0 # Memory allocation in GB cpu = 4 # Number of CPU cores @@ -47,14 +47,14 @@ For detailed hardware specifications and performance characteristics see the [GP The Python runtime version forms the foundation of every Cerebrium app. We currently support versions 3.10 to 3.13. Specify the Python version in the runtime section of the configuration: ```toml -[cerebrium.runtime.cortex] +[runtime.cortex] python_version = "3.11" ``` Or for custom Python ASGI/WSGI apps: ```toml -[cerebrium.runtime.python] +[runtime.python] python_version = "3.11" entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] port = 8000 @@ -72,7 +72,7 @@ The Python version affects the entire dependency chain. For instance, some packa Python dependencies can be managed directly in TOML or through requirement files. The system caches packages to speed up builds: ```toml -[cerebrium.dependencies.pip] +[dependencies.pip] torch = "==2.0.0" transformers = "==4.30.0" numpy = "latest" @@ -81,7 +81,7 @@ numpy = "latest" Or using an existing requirements file: ```toml -[cerebrium.dependencies.paths] +[dependencies.paths] pip = "requirements.txt" ``` @@ -94,19 +94,19 @@ The system implements an intelligent caching strategy at the node level. When an ### Adding APT Packages -System-level packages provide the foundation for many ML apps, handling everything from image-processing libraries to audio codecs. These can be added to the `cerebrium.toml` file under the `[cerebrium.dependencies.apt]` section as follows: +System-level packages provide the foundation for many ML apps, handling everything from image-processing libraries to audio codecs. 
These can be added to the `cerebrium.toml` file under the `[dependencies.apt]` section as follows: ```toml -[cerebrium.dependencies.apt] +[dependencies.apt] ffmpeg = "latest" libopenblas-base = "latest" libomp-dev = "latest" ``` -For teams with standardized system dependencies, text files can be used instead by adding the following to the `[cerebrium.dependencies.paths]` section: +For teams with standardized system dependencies, text files can be used instead by adding the following to the `[dependencies.paths]` section: ```toml -[cerebrium.dependencies.paths] +[dependencies.paths] apt = "deps_folder/pkglist.txt" ``` @@ -117,7 +117,7 @@ Since APT packages modify the system environment, any changes to these dependenc Conda excels at managing complex system-level Python dependencies, particularly for GPU support and scientific computing: ```toml -[cerebrium.dependencies.conda] +[dependencies.conda] cuda = ">=11.7" cudatoolkit = "11.7" opencv = "latest" @@ -126,7 +126,7 @@ opencv = "latest" Teams using conda environments can specify their environment file: ```toml -[cerebrium.dependencies.paths] +[dependencies.paths] conda = "conda_pkglist.txt" ``` @@ -141,7 +141,7 @@ Cerebrium's build process includes two specialized command types that execute at Pre-build commands execute at the start of the build process, before dependency installation begins. This early execution timing makes them essential for setting up the build environment: ```toml -[cerebrium.runtime.cortex] +[runtime.cortex] pre_build_commands = [ # Add specialized build tools "curl -o /usr/local/bin/pget -L 'https://github.com/replicate/pget/releases/download/v0.6.2/pget_linux_x86_64'", @@ -156,7 +156,7 @@ Pre-build commands typically handle tasks like installing build tools, configuri Shell commands execute after all dependencies install and the application code copies into the container. 
This later timing ensures access to the complete environment: ```toml -[cerebrium.runtime.cortex] +[runtime.cortex] shell_commands = [ # Initialize application resources "python -m download_models", @@ -186,7 +186,7 @@ The base image selection shapes how an app runs in Cerebrium. While the default Cerebrium supports several categories of base images to ensure system compatibility such as nvidia, ubuntu and python images. ```toml -[cerebrium.runtime.cortex] +[runtime.cortex] docker_base_image_url = "debian:bookworm-slim" # Default minimal image #docker_base_image_url = "nvidia/cuda:12.0.1-runtime-ubuntu22.04" # CUDA-enabled images #docker_base_image_url = "ubuntu:22.04" # debian images @@ -213,7 +213,7 @@ docker login -u your-dockerhub-username After logging in, you can use the image in your configuration: ```toml -[cerebrium.runtime.cortex] +[runtime.cortex] docker_base_image_url = "bob/infinity:latest" ``` @@ -234,7 +234,7 @@ docker_base_image_url = "bob/infinity:latest" Public ECR images from the `public.ecr.aws` registry work without authentication: ```toml -[cerebrium.runtime.cortex] +[runtime.cortex] docker_base_image_url = "public.ecr.aws/lambda/python:3.11" ``` @@ -246,13 +246,13 @@ While Cerebrium's default cortex runtime works well for most apps, teams often n ### Python Runtime (ASGI/WSGI) -For custom Python web servers, use the `[cerebrium.runtime.python]` section: +For custom Python web servers, use the `[runtime.python]` section: ```toml -[cerebrium.deployment] +[deployment] name = "my-fastapi-app" -[cerebrium.runtime.python] +[runtime.python] python_version = "3.11" entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"] port = 8080 @@ -275,13 +275,13 @@ Key parameters: ### Docker Runtime -For complete control over your container, use the `[cerebrium.runtime.docker]` section with a custom Dockerfile: +For complete control over your container, use the `[runtime.docker]` section with a custom Dockerfile: ```toml 
-[cerebrium.deployment] +[deployment] name = "my-docker-app" -[cerebrium.runtime.docker] +[runtime.docker] dockerfile_path = "./Dockerfile" port = 8080 healthcheck_endpoint = "/health" @@ -290,7 +290,7 @@ readycheck_endpoint = "/ready" When using the docker runtime, all dependencies and build commands should be - handled within the Dockerfile. The `[cerebrium.dependencies.*]` sections will + handled within the Dockerfile. The `[dependencies.*]` sections will be ignored. @@ -299,16 +299,16 @@ readycheck_endpoint = "/ready" Custom runtimes also support apps with built-in servers. For example, deploying a VLLM server requires no Python code: ```toml -[cerebrium.deployment] +[deployment] name = "vllm-server" -[cerebrium.runtime.python] +[runtime.python] entrypoint = ["vllm", "serve", "meta-llama/Meta-Llama-3-8B-Instruct", "--host", "0.0.0.0", "--port", "8000", "--device", "cuda"] port = 8000 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[cerebrium.dependencies.pip] +[dependencies.pip] torch = "latest" vllm = "latest" ``` diff --git a/cerebrium/container-images/private-docker-registry.mdx b/cerebrium/container-images/private-docker-registry.mdx index 3af2526e..756298ed 100644 --- a/cerebrium/container-images/private-docker-registry.mdx +++ b/cerebrium/container-images/private-docker-registry.mdx @@ -61,8 +61,10 @@ You should see your registry URL(s) listed. 
Edit your cerebrium.toml and enter the url of the registry image from the step 2: ```toml -[cerebrium.deployment] +[deployment] name = "my-app" + +[runtime.cortex] python_version = "3.11" docker_base_image_url = "your-registry.com/your-org/your-image:tag" diff --git a/cerebrium/deployments/gradual-roll-out.mdx b/cerebrium/deployments/gradual-roll-out.mdx index 56310cfa..e636f33d 100644 --- a/cerebrium/deployments/gradual-roll-out.mdx +++ b/cerebrium/deployments/gradual-roll-out.mdx @@ -5,7 +5,7 @@ description: "Control the transition between revisions during deployments" This feature is available from CLI version 1.38.2 -The `roll_out_duration_seconds` parameter in the `[cerebrium.scaling]` section of your `cerebrium.toml` file controls how quickly traffic transitions between revisions after a successful build. +The `roll_out_duration_seconds` parameter in the `[scaling]` section of your `cerebrium.toml` file controls how quickly traffic transitions between revisions after a successful build. ## Overview @@ -15,10 +15,10 @@ Traffic is shifted in 5 batches of 20% each, over the specified duration. 
This g ## Configuration -Add the `roll_out_duration_seconds` parameter to the `[cerebrium.scaling]` section of your `cerebrium.toml` file: +Add the `roll_out_duration_seconds` parameter to the `[scaling]` section of your `cerebrium.toml` file: ```toml -[cerebrium.scaling] +[scaling] roll_out_duration_seconds = 0 # Default value ``` diff --git a/cerebrium/deployments/multi-region-deployment.mdx b/cerebrium/deployments/multi-region-deployment.mdx index a7a8ae03..ce7e3d5e 100644 --- a/cerebrium/deployments/multi-region-deployment.mdx +++ b/cerebrium/deployments/multi-region-deployment.mdx @@ -70,10 +70,10 @@ cerebrium ls --region eu-west-2 ## App Deployment -Configure your app's deployment region using the `region` parameter in the `[cerebrium.hardware]` section of your `cerebrium.toml` file: +Configure your app's deployment region using the `region` parameter in the `[hardware]` section of your `cerebrium.toml` file: ```toml -[cerebrium.hardware] +[hardware] region = "us-east-1" provider = "aws" compute = "AMPERE_A10" diff --git a/cerebrium/endpoints/websockets.mdx b/cerebrium/endpoints/websockets.mdx index 87b6bd0f..0d871416 100644 --- a/cerebrium/endpoints/websockets.mdx +++ b/cerebrium/endpoints/websockets.mdx @@ -10,9 +10,9 @@ To set up a WebSocket endpoint, you need to configure your app to use a custom r Below is an example of the required changes in your `cerebrium.toml` configuration file: ```toml -[cerebrium.runtime.custom] +[runtime.python] port = 5000 -entrypoint = "uvicorn main:app --host 0.0.0.0 --port 5000" +entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000"] healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" ``` diff --git a/cerebrium/hardware/cpu-and-memory.mdx b/cerebrium/hardware/cpu-and-memory.mdx index c2db8b15..3332d184 100644 --- a/cerebrium/hardware/cpu-and-memory.mdx +++ b/cerebrium/hardware/cpu-and-memory.mdx @@ -13,7 +13,7 @@ CPU and memory resources on Cerebrium are allocated per container and billed bas 
CPU resources are specified as vCPU units (float) in the `cerebrium.toml` file: ```toml -[cerebrium.hardware] +[hardware] cpu = 4 # Number of vCPU cores ``` @@ -24,7 +24,7 @@ For most applications, starting with 4 CPU cores is recommended. Additional core Memory is specified in gigabytes as a floating-point number: ```toml -[cerebrium.hardware] +[hardware] memory = 16.0 # Memory in GB ``` diff --git a/cerebrium/hardware/using-cuda.mdx b/cerebrium/hardware/using-cuda.mdx index 2f2490a8..3f7359d9 100644 --- a/cerebrium/hardware/using-cuda.mdx +++ b/cerebrium/hardware/using-cuda.mdx @@ -15,7 +15,7 @@ CUDA connects apps to graphics cards, splitting large tasks into smaller pieces Many Python packages come with built-in CUDA support. For example, the popular machine learning package PyTorch includes CUDA in its installation: ```toml -[cerebrium.dependencies.pip] +[dependencies.pip] torch = "latest" # PyTorch with graphics card support ``` @@ -31,7 +31,7 @@ Some apps need direct access to CUDA system libraries and tools. The CUDA base i The base image can be set in the `cerebrium.toml` file: ```toml -[cerebrium.deployment] +[runtime.cortex] docker_base_image_url = "nvidia/cuda:12.1.1-runtime-ubuntu22.04" ``` diff --git a/cerebrium/hardware/using-gpus.mdx b/cerebrium/hardware/using-gpus.mdx index 9627a04c..8f11c2e8 100644 --- a/cerebrium/hardware/using-gpus.mdx +++ b/cerebrium/hardware/using-gpus.mdx @@ -8,7 +8,7 @@ Applications deployed on Cerebrium can access GPU computing power without managi ## Specifying GPUs -The GPU configuration in the `cerebrium.toml` is handled through the `[cerebrium.hardware]` section, where you can specify both the type (using the `compute` parameter) and quantity of GPUs (`gpu_count`) for your app. We address additional deployment configurations and GPU scaling considerations in more detail in the sections below. 
+The GPU configuration in the `cerebrium.toml` is handled through the `[hardware]` section, where you can specify both the type (using the `compute` parameter) and quantity of GPUs (`gpu_count`) for your app. We address additional deployment configurations and GPU scaling considerations in more detail in the sections below. ## Available GPUs @@ -54,7 +54,7 @@ Multiple GPUs become essential when: Multiple GPUs are configured in the `cerebrium.toml` file: ```toml -[cerebrium.hardware] +[hardware] compute = "AMPERE_A100_80GB" gpu_count = 4 # Number of GPUs needed cpu = 8 diff --git a/cerebrium/other-topics/faster-cold-starts.mdx b/cerebrium/other-topics/faster-cold-starts.mdx index 9f158844..02a21f3b 100644 --- a/cerebrium/other-topics/faster-cold-starts.mdx +++ b/cerebrium/other-topics/faster-cold-starts.mdx @@ -46,7 +46,7 @@ In this section below, we'll show you how to use **Tensorizer** to load your mod ### Installation -Add the following to your `[cerebrium.dependencies.pip]` in your `cerebrium.toml` file to install Tensorizer in your deployment: +Add the following to your `[dependencies.pip]` in your `cerebrium.toml` file to install Tensorizer in your deployment: ```txt tensorizer = ">=2.7.0" diff --git a/cerebrium/partner-services/deepgram.mdx b/cerebrium/partner-services/deepgram.mdx index d97ebcee..3bd25c05 100644 --- a/cerebrium/partner-services/deepgram.mdx +++ b/cerebrium/partner-services/deepgram.mdx @@ -252,21 +252,21 @@ max_response_size = 1073741824 # 1GB 6. 
Update the cerebrium.toml file with the following configuration to set hardware requirements, scaling parameters, region, and other settings: ```toml -[cerebrium.deployment] +[deployment] name = "deepgram" # Enable below in production environments disable_auth = true -[cerebrium.runtime.deepgram] +[runtime.deepgram] -[cerebrium.hardware] +[hardware] cpu = 4 region = "us-east-1" memory = 32 compute = "AMPERE_A10" gpu_count = 1 -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 2 cooldown = 120 diff --git a/cerebrium/partner-services/rime.mdx b/cerebrium/partner-services/rime.mdx index 6e519946..0cb96779 100644 --- a/cerebrium/partner-services/rime.mdx +++ b/cerebrium/partner-services/rime.mdx @@ -20,24 +20,24 @@ Cerebrium's partnership with [Rime](https://www.rime.ai/) helps teams deliver te cerebrium init rime ``` -3. Rime services use a simplified TOML configuration with the `[cerebrium.runtime.rime]` section. Create a `cerebrium.toml` file with the following: +3. Rime services use a simplified TOML configuration with the `[runtime.rime]` section. Create a `cerebrium.toml` file with the following: ```toml -[cerebrium.deployment] +[deployment] name = "rime" disable_auth = true -[cerebrium.runtime.rime] +[runtime.rime] port = 8001 -[cerebrium.hardware] +[hardware] cpu = 4 memory = 30 compute = "AMPERE_A10" gpu_count = 1 region = "us-east-1" -[cerebrium.scaling] +[scaling] min_replicas = 1 max_replicas = 2 cooldown = 120 diff --git a/cerebrium/scaling/batching-concurrency.mdx b/cerebrium/scaling/batching-concurrency.mdx index a22ddf48..0ca6371d 100644 --- a/cerebrium/scaling/batching-concurrency.mdx +++ b/cerebrium/scaling/batching-concurrency.mdx @@ -8,7 +8,7 @@ description: "Improve throughput and cost performance with batching and concurre Concurrency in Cerebrium allows each instance to process multiple requests simultaneously. 
The `replica_concurrency` setting in the `cerebrium.toml` file determines how many requests each instance handles in parallel: ```toml -[cerebrium.scaling] +[scaling] replica_concurrency = 4 # Process up to 4 requests simultaneously. ``` @@ -27,13 +27,13 @@ Cerebrium supports two approaches to request batching. Many frameworks include features for processing multiple requests efficiently. vLLM, for example, automatically handles batched model inference requests: ```toml -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 2 cooldown = 10 replica_concurrency = 4 # Each container can now handle multiple requests. -[cerebrium.dependencies.pip] +[dependencies.pip] sentencepiece = "latest" torch = "latest" vllm = "latest" @@ -57,13 +57,13 @@ Applications requiring precise control over request processing can implement cus As an example, implementation with LitServe requires additional configuration in the `cerebrium.toml` file: ```toml -[cerebrium.runtime.custom] +[runtime.python] port = 8000 entrypoint = ["python", "app/main.py"] healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[cerebrium.dependencies.pip] +[dependencies.pip] litserve = "latest" fastapi = "latest" ``` diff --git a/cerebrium/scaling/graceful-termination.mdx b/cerebrium/scaling/graceful-termination.mdx index 3ca2377f..9ec10426 100644 --- a/cerebrium/scaling/graceful-termination.mdx +++ b/cerebrium/scaling/graceful-termination.mdx @@ -125,12 +125,12 @@ def hello(): If you already have a `cerebrium.toml` file, add or update these sections. 
If you don't have one, create a new file with the following: ```toml -[cerebrium.deployment] +[deployment] name = "your-app-name" include = ["./*"] exclude = [".*"] -[cerebrium.hardware] +[hardware] cpu = 1 memory = 1.0 compute = "CPU" @@ -138,13 +138,13 @@ gpu_count = 0 provider = "aws" region = "us-east-1" -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 2 cooldown = 30 replica_concurrency = 1 # Must match max_concurrency in your AppState -[cerebrium.runtime.custom] +[runtime.docker] port = 8000 dockerfile_path = "./Dockerfile" ``` diff --git a/cerebrium/scaling/scaling-apps.mdx b/cerebrium/scaling/scaling-apps.mdx index f418e389..5a264ebd 100644 --- a/cerebrium/scaling/scaling-apps.mdx +++ b/cerebrium/scaling/scaling-apps.mdx @@ -23,7 +23,7 @@ As traffic decreases, instances enter a cooldown period at reduced concurrency. The `cerebrium.toml` file controls scaling behavior through several key parameters: ```toml -[cerebrium.scaling] +[scaling] min_replicas = 0 # Minimum running instances max_replicas = 3 # Maximum concurrent instances cooldown = 60 # Cooldown period in seconds @@ -67,7 +67,7 @@ Cerebrium ensures reliability through automatic instance health management. 
The Apps requiring maximum reliability often combine several scaling features: ```toml -[cerebrium.scaling] +[scaling] min_replicas = 2 # Maintain redundant instances cooldown = 600 # Extended warm period max_replicas = 10 # Room for traffic spikes @@ -108,10 +108,10 @@ and so Cerebrium currently provides four scaling metrics to choose from: - `cpu_utilization` - `memory_utilization` -These can be added to the `cerebrium.scaling` section as such, by specifying one of these metrics and a target: +These can be added to the `scaling` section as such, by specifying one of these metrics and a target: ```toml -[cerebrium.scaling] +[scaling] min_replicas = 0 cooldown = 600 max_replicas = 10 @@ -143,14 +143,14 @@ if `scaling_target=5`, Cerebrium will attempt to maintain a 5 requests/s average ### CPU Utilization `cpu_utilization` uses a maximum CPU percentage utilization averaged over all instances of an application to scale out, relative to the -`cerebrium.hardware.cpu` value. For example, if an application has `cpu=2` and `scaling_target=80`, Cerebrium will attempt +`hardware.cpu` value. For example, if an application has `cpu=2` and `scaling_target=80`, Cerebrium will attempt to maintain _80%_ CPU utilization (1.6 CPUs) per instance across your entire deployed service. Since there is no notion of scaling relative to 0 CPU units, it is required that `min_replicas=1` if using this metric. ### Memory Utilization `memory_utilization` uses a maximum memory percentage utilization averaged over all instances of an application to scale out, relative to the -`cerebrium.hardware.memory` value. Note this refers to RAM, **not** GPU VRAM utilization. For example, if an application has `memory=10` and `scaling_target=80`, Cerebrium will attempt +`hardware.memory` value. Note this refers to RAM, **not** GPU VRAM utilization. 
For example, if an application has `memory=10` and `scaling_target=80`, Cerebrium will attempt to maintain _80%_ Memory utilization (8GB) per instance across your entire deployed service. Since there is no notion of scaling relative to 0GB of memory, it is required that `min_replicas=1` if using this metric. @@ -165,10 +165,10 @@ is only available when using the following scaling metrics: - `concurrency_utilization` - `requests_per_second` -The buffer can be added to the `cerebrium.scaling` section as such, by specifying `scaling_buffer`: +The buffer can be added to the `scaling` section as such, by specifying `scaling_buffer`: ```toml -[cerebrium.scaling] +[scaling] min_replicas = 1 cooldown = 600 max_replicas = 10 @@ -192,7 +192,7 @@ Once this request has completed, the usual `cooldown` period will apply, and the The `evaluation_interval` parameter controls the time window (in seconds) over which the autoscaler evaluates metrics before making scaling decisions. The default is 30 seconds, with a valid range of 6-300 seconds. ```toml -[cerebrium.scaling] +[scaling] evaluation_interval = 30 # Evaluate metrics over 30-second windows ``` @@ -211,7 +211,7 @@ A shorter interval makes the autoscaler more responsive to traffic spikes but ma The `load_balancing` parameter controls how incoming requests are distributed across your replicas. When not specified, the system automatically selects the best algorithm based on your `replica_concurrency` setting. 
```toml -[cerebrium.scaling] +[scaling] load_balancing = "min-connections" # Explicitly set load balancing algorithm ``` diff --git a/cerebrium/storage/managing-files.mdx b/cerebrium/storage/managing-files.mdx index d6da4e4b..1acfd8a7 100644 --- a/cerebrium/storage/managing-files.mdx +++ b/cerebrium/storage/managing-files.mdx @@ -15,7 +15,7 @@ Cerebrium offers file management through a 50GB persistent volume that's availab The `cerebrium.toml` configuration file controls which files become part of the app: ```toml -[cerebrium.deployment] +[deployment] include = [ "src/*.py", # Python files in src. "config/*.json", # JSON files in config. diff --git a/migrations/hugging-face.mdx b/migrations/hugging-face.mdx index 64e0d53a..a8f8d7c7 100644 --- a/migrations/hugging-face.mdx +++ b/migrations/hugging-face.mdx @@ -53,24 +53,26 @@ pip install cerebrium --upgrade Scaffold your application by running `cerebrium init [PROJECT_NAME]`. During the initialization, a `cerebrium.toml` is created. This file configures the deployment, hardware, scaling, and dependencies for your Cerebrium project. Update your `cerebrium.toml` file to reflect the following: ```toml -[cerebrium.deployment] +[deployment] name = "llama-8b-vllm" -python_version = "3.11" -docker_base_image_url = "debian:bookworm-slim" include = ["./*", "main.py", "cerebrium.toml"] exclude = [".*"] -[cerebrium.hardware] +[runtime.cortex] +python_version = "3.11" +docker_base_image_url = "debian:bookworm-slim" + +[hardware] cpu = 2 memory = 12.0 compute = "AMPERE_A10" -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 5 cooldown = 30 -[cerebrium.dependencies.pip] +[dependencies.pip] sentencepiece = "latest" torch = "latest" transformers = "latest" @@ -83,10 +85,11 @@ bitsandbytes = "latest" Let's break down this configuration: -- `cerebrium.deployment`: Specifies the project name, Python version, base Docker image, and which files to include/exclude as project files. 
-- `cerebrium.hardware`: Defines the CPU, memory, and GPU requirements for your deployment. -- `cerebrium.scaling`: Configures auto-scaling behavior, including minimum and maximum replicas, and cooldown period. -- `cerebrium.dependencies.pip`: Lists the Python packages required for your project. +- `deployment`: Specifies the project name and which files to include/exclude as project files. +- `runtime.cortex`: Specifies the Python version and base Docker image. +- `hardware`: Defines the CPU, memory, and GPU requirements for your deployment. +- `scaling`: Configures auto-scaling behavior, including minimum and maximum replicas, and cooldown period. +- `dependencies.pip`: Lists the Python packages required for your project. #### 1.3 Update your code diff --git a/migrations/mystic.mdx b/migrations/mystic.mdx index 30646c69..5006d7f1 100644 --- a/migrations/mystic.mdx +++ b/migrations/mystic.mdx @@ -62,26 +62,28 @@ Transforms into Cerebrium's easy-to-understand TOML config: ```toml # cerebrium.toml -[cerebrium.deployment] +[deployment] name = "stable-diffusion" -python_version = "3.11" -docker_base_image_url = "debian:bookworm-slim" include = ["./*", "main.py", "cerebrium.toml"] exclude = [".*"] -[cerebrium.hardware] +[runtime.cortex] +python_version = "3.11" +docker_base_image_url = "debian:bookworm-slim" + +[hardware] compute = "AMPERE_A10" # Choose your GPU type cpu = 4 # Number of CPU cores memory = 16.0 # Memory in GB gpu_count = 1 # Number of GPUs -[cerebrium.scaling] +[scaling] min_replicas = 0 # Save costs when inactive and scale down your app max_replicas = 2 # Handle increased traffic and scale up where necessary cooldown = 60 # Time window at reduced concurrency before scaling down replica_concurrency = 1 # The number of requests a single container can support -[cerebrium.dependencies.pip] +[dependencies.pip] torch = ">=2.0.0" pydantic = "latest" transformers = "latest" diff --git a/migrations/replicate.mdx b/migrations/replicate.mdx index 
f93b6ebf..bed603a3 100644
--- a/migrations/replicate.mdx
+++ b/migrations/replicate.mdx
@@ -22,17 +22,19 @@ Now Cerebrium and Replicate have a common setup in that they both have a setup f
Looking at the cog.yaml, we need to add/change the following in our cerebrium.toml
-```python
+```toml
-[cerebrium.deployment]
+[deployment]
name = "cog-migration-sdxl"
-python_version = "3.11"
include = ["./*", "main.py", "cerebrium.toml"]
exclude = ["./example_exclude"]
+
+[runtime.cortex]
+python_version = "3.11"
docker_base_image_url = "nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04"
shell_commands = [
"curl -o /usr/local/bin/pget -L 'https://github.com/replicate/pget/releases/download/v0.6.2/pget_linux_x86_64' && chmod +x /usr/local/bin/pget"
]
-[cerebrium.hardware]
+[hardware]
region = "us-east-1"
provider = "aws"
compute = "AMPERE_A10"
@@ -40,14 +42,14 @@ cpu = 2
memory = 12.0
gpu_count = 1
-[cerebrium.dependencies.pip]
+[dependencies.pip]
"accelerate" = "latest"
"diffusers" = "latest"
"torch" = "==2.0.1"
"torchvision" = "==0.15.2"
"transformers" = "latest"
-[cerebrium.dependencies.apt]
+[dependencies.apt]
"curl" = "latest"
```
diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx
index 3ee0cb3f..e75fa885 100644
--- a/toml-reference/toml-reference.mdx
+++ b/toml-reference/toml-reference.mdx
@@ -9,7 +9,7 @@ description: Complete reference for all parameters available in Cerebrium's defa
| Current (Deprecated) | New Format |
| --------------------------------- | --------------------- |
| `[cerebrium.deployment]` | `[deployment]` |
-| `[cerebrium.runtime.cortex]` | `[runtime.cortex]` |
+| `[cerebrium.runtime.cortex]` | `[runtime.auto-py]` |
| `[cerebrium.hardware]` | `[hardware]` |
| `[cerebrium.scaling]` | `[scaling]` |
| `[cerebrium.dependencies]` | `[dependencies]` |
@@ -17,17 +17,17 @@ description: Complete reference for all parameters available in Cerebrium's defa
The configuration is organized into the following main sections:
-
**[cerebrium.deployment]** Core settings like app name and file inclusion rules -- **[cerebrium.runtime.cortex]** Default Cerebrium-managed Python runtime (build settings) -- **[cerebrium.runtime.python]** Custom Python ASGI/WSGI web server settings -- **[cerebrium.runtime.docker]** Custom Dockerfile settings -- **[cerebrium.hardware]** Compute resources including CPU, memory, and GPU specifications -- **[cerebrium.scaling]** Auto-scaling behavior and replica management -- **[cerebrium.dependencies]** Package management for Python (pip), system (apt), and Conda dependencies +- **[deployment]** Core settings like app name and file inclusion rules +- **[runtime.auto-py]** Default Cerebrium-managed Python runtime (build settings) +- **[runtime.python]** Custom Python ASGI/WSGI web server settings +- **[runtime.docker]** Custom Dockerfile settings +- **[hardware]** Compute resources including CPU, memory, and GPU specifications +- **[scaling]** Auto-scaling behavior and replica management +- **[dependencies]** Package management for Python (pip), system (apt), and Conda dependencies ## Deployment Configuration -The `[cerebrium.deployment]` section defines core deployment settings that apply to all runtime types. +The `[deployment]` section defines core deployment settings that apply to all runtime types. | Option | Type | Default | Description | | --------------------------------- | -------- | ---------------- | ------------------------------------------------------------------------------------------------------------ | @@ -41,9 +41,9 @@ The `[cerebrium.deployment]` section defines core deployment settings that apply Cerebrium supports three runtime types. You should only specify one runtime section in your configuration. -### Cortex Runtime (Default) +### Auto-Py Runtime (Default) -The `[cerebrium.runtime.cortex]` section configures the default Cerebrium-managed Python runtime. This is ideal for standard Python applications using the default Cortex framework. 
+The `[runtime.auto-py]` section configures the default Cerebrium-managed Python runtime. This is ideal for standard Python applications where Cerebrium automatically manages the web server. | Option | Type | Default | Description | | --------------------- | -------- | ---------------------- | --------------------------------------------- | @@ -56,10 +56,10 @@ The `[cerebrium.runtime.cortex]` section configures the default Cerebrium-manage **Example:** ```toml -[cerebrium.deployment] -name = "my-cortex-app" +[deployment] +name = "my-app" -[cerebrium.runtime.cortex] +[runtime.auto-py] python_version = "3.12" docker_base_image_url = "debian:bookworm-slim" use_uv = true @@ -72,7 +72,7 @@ use_uv = true ### Python Runtime (Custom ASGI/WSGI) -The `[cerebrium.runtime.python]` section configures custom Python web servers (ASGI/WSGI). Use this when you need full control over your web server implementation for features like custom authentication, dynamic batching, or WebSocket connections. +The `[runtime.python]` section configures custom Python ASGI/WSGI web servers. Use this when you need full control over your web server implementation for features like custom authentication, dynamic batching, or WebSocket connections. 
| Option | Type | Default | Description | | --------------------- | -------- | ------------------------------------------------------------------ | ---------------------------------------------------------------------- | @@ -89,17 +89,17 @@ The `[cerebrium.runtime.python]` section configures custom Python web servers (A **Example:** ```toml -[cerebrium.deployment] +[deployment] name = "my-fastapi-app" -[cerebrium.runtime.python] +[runtime.python] python_version = "3.11" entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] port = 8000 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[cerebrium.runtime.python.dependencies.pip] +[runtime.python.deps.pip] fastapi = "latest" uvicorn = "latest" ``` @@ -112,7 +112,7 @@ uvicorn = "latest" ### Docker Runtime (Custom Dockerfile) -The `[cerebrium.runtime.docker]` section configures deployments using custom Dockerfiles. Use this for non-Python applications or when you need complete control over the container build process. +The `[runtime.docker]` section configures deployments using custom Dockerfiles. Use this for non-Python applications or when you need complete control over the container build process. | Option | Type | Default | Description | | -------------------- | -------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | @@ -125,10 +125,10 @@ The `[cerebrium.runtime.docker]` section configures deployments using custom Doc **Example:** ```toml -[cerebrium.deployment] +[deployment] name = "my-docker-app" -[cerebrium.runtime.docker] +[runtime.docker] dockerfile_path = "./Dockerfile" port = 8080 healthcheck_endpoint = "/health" @@ -144,13 +144,13 @@ readycheck_endpoint = "/ready" When using `dockerfile_path`, all dependencies and build commands should be - handled within the Dockerfile. The `[cerebrium.dependencies.*]` sections, + handled within the Dockerfile. 
The `[dependencies.*]` sections, `shell_commands`, and `pre_build_commands` will be ignored. ### UV Package Manager -UV is a fast Python package installer written in Rust that can significantly speed up deployment times. When enabled in `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]`, UV will be used instead of pip for installing Python dependencies. +UV is a fast Python package installer written in Rust that can significantly speed up deployment times. When enabled in `[runtime.auto-py]` or `[runtime.python]`, UV will be used instead of pip for installing Python dependencies. UV typically installs packages 10-100x faster than pip, especially beneficial for: @@ -164,7 +164,7 @@ UV typically installs packages 10-100x faster than pip, especially beneficial fo **Example with UV enabled:** ```toml -[cerebrium.runtime.cortex] +[runtime.auto-py] use_uv = true ``` @@ -206,7 +206,7 @@ Include in your deployment: ## Hardware Configuration -The `[cerebrium.hardware]` section defines compute resources. +The `[hardware]` section defines compute resources. | Option | Type | Default | Description | | --------- | ------- | ----------- | ------------------------------------ | @@ -224,7 +224,7 @@ The `[cerebrium.hardware]` section defines compute resources. ## Scaling Configuration -The `[cerebrium.scaling]` section controls auto-scaling behavior. +The `[scaling]` section controls auto-scaling behavior. | Option | Type | Default | CLI Requirement | Description | | ------------------------- | ------- | ------------------------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | @@ -249,8 +249,8 @@ The `scaling_metric` options are: - **concurrency_utilization**: Maintains a percentage of your replica_concurrency across instances. 
For example, with `replica_concurrency=200` and `scaling_target=80`, maintains 160 requests per instance. - **requests_per_second**: Maintains a specific request rate across all instances. For example, `scaling_target=5` maintains 5 requests/s average across instances. -- **cpu_utilization**: Maintains CPU usage as a percentage of cerebrium.hardware.cpu. For example, with `cpu=2` and `scaling_target=80`, maintains 80% CPU utilization (1.6 CPUs) per instance. -- **memory_utilization**: Maintains RAM usage as a percentage of cerebrium.hardware.memory. For example, with `memory=10` and `scaling_target=80`, maintains 80% memory utilization (8GB) per instance. +- **cpu_utilization**: Maintains CPU usage as a percentage of hardware.cpu. For example, with `cpu=2` and `scaling_target=80`, maintains 80% CPU utilization (1.6 CPUs) per instance. +- **memory_utilization**: Maintains RAM usage as a percentage of hardware.memory. For example, with `memory=10` and `scaling_target=80`, maintains 80% memory utilization (8GB) per instance. The scaling_buffer option is only available with concurrency_utilization and requests_per_second metrics. @@ -269,31 +269,31 @@ Dependencies can be specified either at the runtime level (recommended) or at th Dependencies can be specified within the runtime section. 
This is the recommended approach: ```toml -[cerebrium.runtime.cortex] +[runtime.auto-py] python_version = "3.12" -[cerebrium.runtime.cortex.dependencies.pip] +[runtime.auto-py.deps.pip] torch = "==2.0.0" # Exact version numpy = "latest" # Latest version pandas = ">=1.5.0" # Minimum version -[cerebrium.runtime.cortex.dependencies.apt] +[runtime.auto-py.deps.apt] ffmpeg = "latest" libopenblas-base = "latest" -[cerebrium.runtime.cortex.dependencies.conda] +[runtime.auto-py.deps.conda] cuda = ">=11.7" cudatoolkit = "11.7" -[cerebrium.runtime.cortex.dependencies.paths] +[runtime.auto-py.deps.paths] pip = "requirements.txt" # Alternative: use a file instead of inline ``` -This approach works with any runtime type (`cortex`, `python`, or partner services). +This approach works with any runtime type (`auto-py`, `python`, or partner services). -**Deprecated:** Top-level `[cerebrium.dependencies.*]` sections are deprecated. -Please move your dependencies to `[cerebrium.runtime.{runtime_type}.dependencies.*]`. +**Deprecated:** Top-level `[dependencies.*]` sections are deprecated. +Please move your dependencies to `[runtime.{type}.deps.*]`. Dependencies specified at the runtime level take precedence over top-level dependencies when both are present. @@ -305,10 +305,10 @@ This approach is deprecated. Please migrate to runtime-specific dependencies abo #### Pip Dependencies -The `[cerebrium.dependencies.pip]` section lists Python package requirements. +The `[dependencies.pip]` section lists Python package requirements. ```toml -[cerebrium.dependencies.pip] +[dependencies.pip] torch = "==2.0.0" # Exact version numpy = "latest" # Latest version pandas = ">=1.5.0" # Minimum version @@ -316,30 +316,30 @@ pandas = ">=1.5.0" # Minimum version #### APT Dependencies -The `[cerebrium.dependencies.apt]` section specifies system packages. +The `[dependencies.apt]` section specifies system packages. 
```toml -[cerebrium.dependencies.apt] +[dependencies.apt] ffmpeg = "latest" libopenblas-base = "latest" ``` #### Conda Dependencies -The `[cerebrium.dependencies.conda]` section manages Conda packages. +The `[dependencies.conda]` section manages Conda packages. ```toml -[cerebrium.dependencies.conda] +[dependencies.conda] cuda = ">=11.7" cudatoolkit = "11.7" ``` #### Dependency Files -The `[cerebrium.dependencies.paths]` section allows using requirement files. +The `[dependencies.paths]` section allows using requirement files. ```toml -[cerebrium.dependencies.paths] +[dependencies.paths] pip = "requirements.txt" apt = "pkglist.txt" conda = "conda_pkglist.txt" @@ -347,27 +347,27 @@ conda = "conda_pkglist.txt" ## Complete Examples -### Cortex Runtime (Default) +### Auto-Py Runtime (Default) ```toml -[cerebrium.deployment] +[deployment] name = "llm-inference" disable_auth = false include = ["*"] exclude = [".*"] -[cerebrium.runtime.cortex] +[runtime.auto-py] python_version = "3.12" docker_base_image_url = "debian:bookworm-slim" use_uv = true shell_commands = [] pre_build_commands = [] -[cerebrium.runtime.cortex.dependencies.pip] +[runtime.auto-py.deps.pip] torch = "latest" transformers = "latest" -[cerebrium.hardware] +[hardware] cpu = 4 memory = 16.0 compute = "AMPERE_A10" @@ -375,7 +375,7 @@ gpu_count = 1 provider = "aws" region = "us-east-1" -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 2 replica_concurrency = 10 @@ -390,26 +390,26 @@ roll_out_duration_seconds = 0 ### Python Runtime (FastAPI) ```toml -[cerebrium.deployment] +[deployment] name = "fastapi-server" disable_auth = false include = ["*"] exclude = [".*"] -[cerebrium.runtime.python] +[runtime.python] python_version = "3.11" entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] port = 8000 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[cerebrium.runtime.python.dependencies.pip] +[runtime.python.deps.pip] torch = "latest" transformers = "latest" 
uvicorn = "latest" fastapi = "latest" -[cerebrium.hardware] +[hardware] cpu = 4 memory = 16.0 compute = "AMPERE_A10" @@ -417,7 +417,7 @@ gpu_count = 1 provider = "aws" region = "us-east-1" -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 2 replica_concurrency = 10 @@ -426,25 +426,25 @@ replica_concurrency = 10 ### Docker Runtime ```toml -[cerebrium.deployment] +[deployment] name = "rust-server" include = ["*"] exclude = [".*"] -[cerebrium.runtime.docker] +[runtime.docker] dockerfile_path = "./Dockerfile" port = 8192 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[cerebrium.hardware] +[hardware] cpu = 4 memory = 16.0 compute = "CPU" provider = "aws" region = "us-east-1" -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 2 replica_concurrency = 10 @@ -453,41 +453,64 @@ replica_concurrency = 10 ## Backwards Compatibility - The following configuration patterns are deprecated but still supported for - backwards compatibility. We recommend migrating to the new runtime sections. +The following configuration patterns are deprecated but still supported for backwards compatibility. +We recommend migrating to the new format. -### Deprecated: Runtime fields in [cerebrium.deployment] +### Deprecated: cerebrium.* prefix -The following fields in `[cerebrium.deployment]` are deprecated. Please move them to the appropriate runtime section: +The `cerebrium.` prefix on all section names is deprecated. 
Please migrate to the new format:

-| Deprecated Field      | New Location                                                 |
-| --------------------- | ------------------------------------------------------------ |
-| python_version        | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` |
-| docker_base_image_url | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` |
-| shell_commands        | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` |
-| pre_build_commands    | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` |
-| use_uv                | `[cerebrium.runtime.cortex]` or `[cerebrium.runtime.python]` |
+| Deprecated Format                  | New Format                          |
+| ---------------------------------- | ----------------------------------- |
+| `[cerebrium.deployment]`           | `[deployment]`                      |
+| `[cerebrium.runtime.cortex]`       | `[runtime.auto-py]`                 |
+| `[cerebrium.runtime.python]`       | `[runtime.python]`                  |
+| `[cerebrium.runtime.docker]`       | `[runtime.docker]`                  |
+| `[cerebrium.hardware]`             | `[hardware]`                        |
+| `[cerebrium.scaling]`              | `[scaling]`                         |
+| `[cerebrium.dependencies.pip]`     | `[dependencies.pip]`                |
+| `[cerebrium.dependencies.apt]`     | `[dependencies.apt]`                |

-### Deprecated: [cerebrium.runtime.custom]
+### Deprecated: Runtime fields in [deployment]

-The `[cerebrium.runtime.custom]` section is deprecated. Please migrate to:
+The following fields in `[deployment]` are deprecated.
Please move them to the appropriate runtime section:

-- `[cerebrium.runtime.python]` - For custom Python ASGI/WSGI applications
-- `[cerebrium.runtime.docker]` - For custom Dockerfile deployments (when using `dockerfile_path`)
+| Deprecated Field       | New Location                              |
+| ---------------------- | ----------------------------------------- |
+| python_version         | `[runtime.auto-py]` or `[runtime.python]` |
+| docker_base_image_url  | `[runtime.auto-py]` or `[runtime.python]` |
+| shell_commands         | `[runtime.auto-py]` or `[runtime.python]` |
+| pre_build_commands     | `[runtime.auto-py]` or `[runtime.python]` |
+| use_uv                 | `[runtime.auto-py]` or `[runtime.python]` |

-### Deprecated: Top-level [cerebrium.dependencies.*]
+### Deprecated: [runtime.custom]
+
+The `[runtime.custom]` section is deprecated. Please migrate to:
+
+- `[runtime.python]` - For custom Python ASGI/WSGI applications
+- `[runtime.docker]` - For custom Dockerfile deployments (when using `dockerfile_path`)
+
+### Deprecated: Runtime name `cortex`
+
+The runtime name `cortex` is deprecated. The old name still works for backwards compatibility but will be removed in a future release.
+
+| Deprecated Name        | New Name             |
+| ---------------------- | -------------------- |
+| `[runtime.cortex]`     | `[runtime.auto-py]`  |
+
+### Deprecated: Top-level [dependencies.*]

Top-level dependency sections are deprecated.
Please move dependencies to runtime-specific sections: -| Deprecated Location | New Location | -| ---------------------------------- | ------------------------------------------------------ | -| `[cerebrium.dependencies.pip]` | `[cerebrium.runtime.{type}.dependencies.pip]` | -| `[cerebrium.dependencies.apt]` | `[cerebrium.runtime.{type}.dependencies.apt]` | -| `[cerebrium.dependencies.conda]` | `[cerebrium.runtime.{type}.dependencies.conda]` | -| `[cerebrium.dependencies.paths]` | `[cerebrium.runtime.{type}.dependencies.paths]` | +| Deprecated Location | New Location | +| ------------------------ | ----------------------------------------- | +| `[dependencies.pip]` | `[runtime.{type}.deps.pip]` | +| `[dependencies.apt]` | `[runtime.{type}.deps.apt]` | +| `[dependencies.conda]` | `[runtime.{type}.deps.conda]` | +| `[dependencies.paths]` | `[runtime.{type}.deps.paths]` | -Where `{type}` is your runtime type (e.g., `cortex`, `python`). +Where `{type}` is your runtime type (e.g., `auto-py`, `python`). When both top-level and runtime-specific dependencies are present, runtime-specific dependencies take precedence on a per-package basis. This allows gradual migration. 
diff --git a/v4/examples/aiVoiceAgents.mdx b/v4/examples/aiVoiceAgents.mdx index f554a880..f59c4dbb 100644 --- a/v4/examples/aiVoiceAgents.mdx +++ b/v4/examples/aiVoiceAgents.mdx @@ -35,24 +35,24 @@ For our LLM we deploy a OpenAI compatible Llama-3 endpoint using the vLLM framew Run `cerebrium init llama-llm` and add the following code to your cerebrium.toml: ``` -[cerebrium.deployment] +[deployment] name = "llama-llm" python_version = "3.11" docker_base_image_url = "debian:bookworm-slim" include = ["./*", "main.py", "cerebrium.toml"] exclude = [".*"] -[cerebrium.hardware] +[hardware] cpu = 4 memory = 12.0 compute = "ADA_L40" -[cerebrium.scaling] +[scaling] min_replicas = 1 max_replicas = 5 cooldown = 60 -[cerebrium.dependencies.pip] +[dependencies.pip] vllm = "latest" pydantic = "latest" ``` @@ -169,13 +169,13 @@ In your IDE, run the following command to create our Cerebrium starter project: Add the following pip packages to your `cerebrium.toml` to create your deployment environment: ``` -[cerebrium.deployment] +[deployment] name = "pipecat-agent" python_version = "3.11" include = ["./*", "main.py", "cerebrium.toml"] exclude = ["./example_exclude"] -[cerebrium.hardware] +[hardware] region = "us-east-1" provider = "aws" compute = "CPU" @@ -183,12 +183,12 @@ cpu = 6 memory = 18.0 gpu_count = 1 -[cerebrium.scaling] +[scaling] min_replicas = 1 # Note: This incurs a constant cost since at least one instance is always running. 
max_replicas = 2 cooldown = 180 -[cerebrium.dependencies.pip] +[dependencies.pip] torch = ">=2.0.0" "pipecat-ai[silero, daily, openai, deepgram, cartesia]" = "==0.0.67" aiohttp = ">=3.9.4" diff --git a/v4/examples/asgi-gradio-interface.mdx b/v4/examples/asgi-gradio-interface.mdx index 06d40ae1..e69b884c 100644 --- a/v4/examples/asgi-gradio-interface.mdx +++ b/v4/examples/asgi-gradio-interface.mdx @@ -39,30 +39,30 @@ cerebrium init 2-gradio-interface Next, let us add the following configuration to our `cerebrium.toml` file: ```toml -[cerebrium.deployment] +[deployment] name = "2-gradio-interface" python_version = "3.12" disable_auth = true include = ['./*', 'main.py', 'cerebrium.toml'] exclude = ['.*'] -[cerebrium.runtime.custom] +[runtime.custom] entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"] port = 8080 healthcheck_endpoint = "/health" -[cerebrium.hardware] +[hardware] cpu = 2 memory = 4.0 compute = "CPU" -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 2 cooldown = 30 replica_concurrency = 10 -[cerebrium.dependencies.pip] +[dependencies.pip] gradio = "latest" fastapi = "latest" requests = "latest" diff --git a/v4/examples/comfyUI.mdx b/v4/examples/comfyUI.mdx index 31436a51..79e4a5cb 100644 --- a/v4/examples/comfyUI.mdx +++ b/v4/examples/comfyUI.mdx @@ -261,24 +261,24 @@ def shutdown_event(): Before deploying our application, we must first define the deployment configuration, which installs all our dependencies, contains our hardware and scaling configurations, as well as any scripts which must be executed before running our app. 
The following should be added to our `cerebrium.toml` file: ``` -[cerebrium.deployment] +[deployment] name = "1-comfyui" python_version = "3.11" include = ["./*", "main.py", "cerebrium.toml", "workflow.json", "workflow_api.json", "helpers.py", "model.json"] exclude = ["./example_exclude", "./ComfyUI", "./ComfyUI/models/checkpoints/sd_xl_base_1.0.safetensors", "./ComfyUI/models/controlnet/diffusion_pytorch_model.fp16.safetensors"] shell_commands = ["git clone https://github.com/comfyanonymous/ComfyUI", "pip install -r ComfyUI/requirements.txt"] -[cerebrium.runtime.custom] +[runtime.custom] port = 8765 entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8765"] healthcheck_endpoint = "/health" -[cerebrium.hardware] +[hardware] compute = "AMPERE_A10" cpu = 4 memory = 16.0 -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 2 cooldown = 30 @@ -288,7 +288,7 @@ scaling_metric = "concurrency_utilization" scaling_target = 100 scaling_buffer = 0 -[cerebrium.dependencies.pip] +[dependencies.pip] uvicorn = "latest" fastapi = "latest" requests = "latest" @@ -313,7 +313,7 @@ tqdm = "latest" psutil = "latest" kornia = ">=0.7.1" -[cerebrium.dependencies.apt] +[dependencies.apt] git = "latest" ``` diff --git a/v4/examples/deploy-a-vision-language-model-with-sglang.mdx b/v4/examples/deploy-a-vision-language-model-with-sglang.mdx index f507c1f6..d70c4d8a 100644 --- a/v4/examples/deploy-a-vision-language-model-with-sglang.mdx +++ b/v4/examples/deploy-a-vision-language-model-with-sglang.mdx @@ -72,25 +72,25 @@ While we use flashinfer as our backend here, other options like flash attention Update your cerebrium.toml with: ```toml -[cerebrium.deployment] +[deployment] name = "7-vision-language-sglang" python_version = "3.11" docker_base_image_url = "nvidia/cuda:12.8.0-devel-ubuntu22.04" deployment_initialization_timeout = 860 -[cerebrium.hardware] +[hardware] cpu = 6.0 memory = 60.0 compute = "ADA_L40" -[cerebrium.scaling] +[scaling] min_replicas = 0 
max_replicas = 2 -[cerebrium.build] +[build] use_uv = true -[cerebrium.dependencies.pip] +[dependencies.pip] transformers = "latest" huggingface_hub = "latest" pydantic = "latest" @@ -101,10 +101,10 @@ torch = "latest" "sgl-kernel" = "latest" "flashinfer-python" = "latest" -[cerebrium.dependencies.apt] +[dependencies.apt] libnuma-dev = "latest" -[cerebrium.runtime.custom] +[runtime.custom] port = 8000 entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] ``` diff --git a/v4/examples/deploy-an-llm-with-tensorrtllm-tritonserver.mdx b/v4/examples/deploy-an-llm-with-tensorrtllm-tritonserver.mdx index 6874dcf5..4cd36407 100644 --- a/v4/examples/deploy-an-llm-with-tensorrtllm-tritonserver.mdx +++ b/v4/examples/deploy-an-llm-with-tensorrtllm-tritonserver.mdx @@ -388,7 +388,7 @@ The Dockerfile uses Nvidia's official Triton container with TensorRT-LLM pre-ins We now configure our container and autoscaling environment in `cerebrium.toml`. This file defines the hardware resources and scaling behavior: ```toml -[cerebrium.deployment] +[deployment] name = "tensorrt-triton-demo" python_version = "3.12" disable_auth = true @@ -396,7 +396,7 @@ include = ['./*', 'cerebrium.toml'] exclude = ['.*'] deployment_initialization_timeout = 830 -[cerebrium.hardware] +[hardware] cpu = 4.0 memory = 40.0 compute = "AMPERE_A10" @@ -404,14 +404,14 @@ gpu_count = 1 provider = "aws" region = "us-east-1" -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 5 cooldown = 300 replica_concurrency = 128 scaling_metric = "concurrency_utilization" -[cerebrium.runtime.custom] +[runtime.custom] port = 8000 healthcheck_endpoint = "/v2/health/live" readycheck_endpoint = "/v2/health/ready" diff --git a/v4/examples/gpt-oss.mdx b/v4/examples/gpt-oss.mdx index 5b3e6b12..5168c0f5 100644 --- a/v4/examples/gpt-oss.mdx +++ b/v4/examples/gpt-oss.mdx @@ -27,7 +27,7 @@ In this tutorial, we will show the most simple variation of deploying this model 2. 
Edit your toml file with the following settings ``` -[cerebrium.deployment] +[deployment] name = "7-openai-gpt-oss" python_version = "3.12" docker_base_image_url = "nvidia/cuda:12.8.1-devel-ubuntu22.04" @@ -42,14 +42,14 @@ pre_build_commands = [ "uv pip install huggingface_hub[hf_transfer]==0.34" ] -[cerebrium.hardware] +[hardware] cpu = 8.0 memory = 18.0 compute = "HOPPER_H100" provider = "aws" region = "us-east-1" -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 5 cooldown = 30 @@ -57,7 +57,7 @@ replica_concurrency = 32 scaling_metric = "concurrency_utilization" scaling -[cerebrium.runtime.custom] +[runtime.custom] port = 8000 entrypoint = ["sh", "-c", "export HF_HUB_ENABLE_HF_TRANSFER=1 && export VLLM_USER_V1=1 && vllm serve openai/gpt-oss-20b --enforce-eager"] ``` diff --git a/v4/examples/high-throughput-embeddings.mdx b/v4/examples/high-throughput-embeddings.mdx index ffe131c3..539afc0c 100644 --- a/v4/examples/high-throughput-embeddings.mdx +++ b/v4/examples/high-throughput-embeddings.mdx @@ -36,7 +36,7 @@ docker login -u your-dockerhub-username Now that you are logged in, you can add the following to your cerebrium.toml ``` -[cerebrium.deployment] +[deployment] name = "1-high-throughput" python_version = "3.11" docker_base_image_url = "michaelf34/infinity:0.0.77" @@ -48,20 +48,20 @@ exclude = ['.*'] Depending on your hardware type and model(s) you select you will have different autoscaling criteria. 
You can define this with the following sections in your cerebrium.toml: ``` -[cerebrium.hardware] +[hardware] cpu = 6.0 memory = 12.0 compute = "AMPERE_A10" region = "us-east-1" -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 2 cooldown = 30 replica_concurrency = 500 scaling_metric = "concurrency_utilization" -[cerebrium.dependencies.pip] +[dependencies.pip] numpy = "latest" "infinity-emb[all]" = "0.0.77" optimum = ">=1.24.0,<2.0.0" @@ -193,7 +193,7 @@ async def classify(sentences: list[str] = Body(...), model_index: int = Body(3)) Now you have a multi-purpose embedding server! Let us update our cerebrium.toml to point it to our FastAPI server. Add the following section: ``` -[cerebrium.runtime.custom] +[runtime.custom] port = 5000 entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000"] healthcheck_endpoint = "/health" diff --git a/v4/examples/langchain-langsmith.mdx b/v4/examples/langchain-langsmith.mdx index 39724e05..c073d157 100644 --- a/v4/examples/langchain-langsmith.mdx +++ b/v4/examples/langchain-langsmith.mdx @@ -144,7 +144,7 @@ Set up Cerebrium: Add these pip packages to your `cerebrium.toml`: ``` -[cerebrium.dependencies.pip] +[dependencies.pip] pydantic = "latest" langchain = "latest" pytz = "latest" ##this is used for timezones diff --git a/v4/examples/langchain.mdx b/v4/examples/langchain.mdx index 6d863d59..d5498f3c 100644 --- a/v4/examples/langchain.mdx +++ b/v4/examples/langchain.mdx @@ -17,10 +17,10 @@ First, create your project: cerebrium init 1-langchain-QA ``` -Add these Python packages to the `[cerebrium.dependencies.pip]` section of your `cerebrium.toml` file: +Add these Python packages to the `[dependencies.pip]` section of your `cerebrium.toml` file: ```toml -[cerebrium.dependencies.pip] +[dependencies.pip] pytube = "latest" # For audio downloading langchain = "latest" faiss-gpu = "latest" @@ -30,10 +30,10 @@ transformers = ">=4.35.0" sentence_transformers = ">=2.2.0" ``` -Whisper requires ffmpeg and 
other Linux packages. Add them to the `[cerebrium.dependencies]` section: +Whisper requires ffmpeg and other Linux packages. Add them to the `[dependencies]` section: ```toml -[cerebrium.dependencies] +[dependencies] apt = [ "ffmpeg", "libopenblas-base", "libomp-dev"] ``` @@ -148,35 +148,35 @@ Configure your compute and environment settings in `cerebrium.toml`: ```toml -[cerebrium.build] +[build] predict_data = "{\"prompt\": \"Here is some example predict data for your cerebrium.toml which will be used to test your predict function on build.\"}" force_rebuild = false disable_animation = false log_level = "INFO" disable_confirmation = false -[cerebrium.deployment] +[deployment] name = "langchain-qa" python_version = "3.10" include = ["./*", "main.py"] exclude = ["./.*", "./__*"] -[cerebrium.hardware] +[hardware] gpu = "AMPERE_A5000" cpu = 2 memory = 16.0 gpu_count = 1 -[cerebrium.scaling] +[scaling] min_replicas = 0 cooldown = 60 -[cerebrium.dependencies.apt] +[dependencies.apt] ffmpeg = "latest" "libopenblas-base" = "latest" "libomp-dev" = "latest" -[cerebrium.dependencies.pip] +[dependencies.pip] pytube = "latest" # For audio downloading langchain = "latest" faiss-gpu = "latest" @@ -185,7 +185,7 @@ openai-whisper = "latest" transformers = ">=4.35.0" sentence_transformers = ">=2.2.0" -[cerebrium.dependencies.conda] +[dependencies.conda] ``` diff --git a/v4/examples/livekit-outbound-agent.mdx b/v4/examples/livekit-outbound-agent.mdx index d7a1c05c..3af72cf4 100644 --- a/v4/examples/livekit-outbound-agent.mdx +++ b/v4/examples/livekit-outbound-agent.mdx @@ -387,7 +387,7 @@ CMD ["python3", "main.py", "start"] In our `cerebrium.toml`, edit the values with the following contents: ``` -[cerebrium.deployment] +[deployment] name = "outbound-livekit-agent" python_version = "3.11" docker_base_image_url = "" @@ -395,21 +395,21 @@ disable_auth = false include = ['./*', 'main.py', 'cerebrium.toml'] exclude = ['.*'] -[cerebrium.hardware] +[hardware] cpu = 2 memory = 8.0 compute 
= "CPU" -[cerebrium.scaling] +[scaling] min_replicas = 1 max_replicas = 5 cooldown = 30 replica_concurrency = 1 -[cerebrium.dependencies.paths] +[dependencies.paths] pip = "requirements.txt" -[cerebrium.runtime.custom] +[runtime.custom] port = 8600 dockerfile_path = "./Dockerfile" ``` diff --git a/v4/examples/mistral-vllm.mdx b/v4/examples/mistral-vllm.mdx index 17df4c4d..b95ee62f 100644 --- a/v4/examples/mistral-vllm.mdx +++ b/v4/examples/mistral-vllm.mdx @@ -23,10 +23,10 @@ First, create your project: cerebrium init 1-faster-inference-with-vllm ``` -Add these Python packages to the `[cerebrium.dependencies.pip]` section in your `cerebrium.toml` file: +Add these Python packages to the `[dependencies.pip]` section in your `cerebrium.toml` file: ```toml -[cerebrium.dependencies.pip] +[dependencies.pip] sentencepiece = "latest" torch = ">=2.0.0" vllm = "latest" @@ -103,7 +103,7 @@ Configure your compute and environment settings in `cerebrium.toml`: ```toml -[cerebrium.build] +[build] predict_data = "{\"prompt\": \"Here is some example predict data for your config.yaml which will be used to test your predict function on build.\"}" hide_public_endpoint = false disable_animation = false @@ -113,14 +113,14 @@ disable_predict = false log_level = "INFO" disable_confirmation = false -[cerebrium.deployment] +[deployment] name = "1-faster-inference-with-vllm" python_version = "3.11" include = ["./*", "main.py", "cerebrium.toml"] exclude = ["./example_exclude"] docker_base_image_url = "nvidia/cuda:12.1.1-runtime-ubuntu22.04" -[cerebrium.hardware] +[hardware] region = "us-east-1" provider = "aws" compute = "AMPERE_A10" @@ -128,12 +128,12 @@ cpu = 2 memory = 16.0 gpu_count = 1 -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 5 cooldown = 60 -[cerebrium.dependencies.pip] +[dependencies.pip] huggingface-hub = "latest" sentencepiece = "latest" torch = ">=2.0.0" @@ -142,9 +142,9 @@ transformers = ">=4.35.0" accelerate = "latest" xformers = "latest" 
-[cerebrium.dependencies.conda] +[dependencies.conda] -[cerebrium.dependencies.apt] +[dependencies.apt] ffmpeg = "latest" ``` diff --git a/v4/examples/openai-compatible-endpoint-vllm.mdx b/v4/examples/openai-compatible-endpoint-vllm.mdx index 1451c108..c12fd736 100644 --- a/v4/examples/openai-compatible-endpoint-vllm.mdx +++ b/v4/examples/openai-compatible-endpoint-vllm.mdx @@ -19,12 +19,12 @@ In your IDE, run the following command to create our Cerebrium starter project: Add the following pip packages and hardware requirements to your `cerebrium.toml` to create your deployment environment: ```toml -[cerebrium.hardware] +[hardware] cpu = 2 memory = 12.0 compute = "AMPERE_A10" -[cerebrium.dependencies.pip] +[dependencies.pip] vllm = "latest" pydantic = "latest" ``` diff --git a/v4/examples/realtime-voice-agents.mdx b/v4/examples/realtime-voice-agents.mdx index f051a838..80fdeb82 100644 --- a/v4/examples/realtime-voice-agents.mdx +++ b/v4/examples/realtime-voice-agents.mdx @@ -38,24 +38,24 @@ For our LLM we deploy a OpenAI compatible Llama-3 endpoint using the vLLM framew Run `cerebrium init llama-llm` and add the following code to your cerebrium.toml: ``` -[cerebrium.deployment] +[deployment] name = "llama-llm" python_version = "3.11" docker_base_image_url = "debian:bookworm-slim" include = ["./*", "main.py", "cerebrium.toml"] exclude = [".*"] -[cerebrium.hardware] +[hardware] cpu = 4 memory = 12.0 compute = "ADA_L40" -[cerebrium.scaling] +[scaling] min_replicas = 1 max_replicas = 5 cooldown = 60 -[cerebrium.dependencies.pip] +[dependencies.pip] vllm = "latest" pydantic = "latest" ``` @@ -169,30 +169,30 @@ In your IDE, run the following command to create our pipecat-agent: `cerebrium i Add the following pip packages to your `cerebrium.toml` to create your deployment environment: ``` -[cerebrium.deployment] +[deployment] # This file was automatically generated by Cerebrium as a starting point for your project. # You can edit it as you wish. 
# If you would like to learn more about your Cerebrium config, please visit https://docs.cerebrium.ai/cerebrium/environments/config-files#config-file-example -[cerebrium.deployment] +[deployment] name = "pipecat-agent" python_version = "3.11" include = ["./*", "main.py", "cerebrium.toml"] exclude = ["./example_exclude"] -[cerebrium.hardware] +[hardware] region = "us-east-1" provider = "aws" compute = "CPU" cpu = 6 memory = 12.0 -[cerebrium.scaling] +[scaling] min_replicas = 1 # Note: This incurs a constant cost since at least one instance is always running. max_replicas = 2 cooldown = 180 -[cerebrium.dependencies.pip] +[dependencies.pip] torch = ">=2.0.0" "pipecat-ai[silero, daily, openai, deepgram, cartesia]" = "==0.0.67" aiohttp = ">=3.9.4" diff --git a/v4/examples/sdxl.mdx b/v4/examples/sdxl.mdx index 5ecf164b..dd8cf8cd 100644 --- a/v4/examples/sdxl.mdx +++ b/v4/examples/sdxl.mdx @@ -27,13 +27,13 @@ Configure your compute and environment settings in `cerebrium.toml`: ```toml -[cerebrium.deployment] +[deployment] name = "3-sdxl-refiner" python_version = "3.10" include = ["./*", "main.py", "cerebrium.toml"] exclude = ["./.*", "./__*"] -[cerebrium.hardware] +[hardware] region = "us-east-1" provider = "aws" compute = "AMPERE_A10" @@ -41,21 +41,21 @@ cpu = 2 memory = 16.0 gpu_count = 1 -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 5 cooldown = 60 -[cerebrium.dependencies.pip] +[dependencies.pip] accelerate = "latest" transformers = ">=4.35.0" safetensors = "latest" opencv-python = "latest" diffusers = "latest" -[cerebrium.dependencies.conda] +[dependencies.conda] -[cerebrium.dependencies.apt] +[dependencies.apt] ffmpeg = "latest" ``` diff --git a/v4/examples/streaming-falcon-7B.mdx b/v4/examples/streaming-falcon-7B.mdx index 096d2711..457f9493 100644 --- a/v4/examples/streaming-falcon-7B.mdx +++ b/v4/examples/streaming-falcon-7B.mdx @@ -23,10 +23,10 @@ First, create your project: cerebrium init 5-streaming-endpoint ``` -Add the following packages to 
the `[cerebrium.dependencies.pip]` section of your `cerebrium.toml` file: +Add the following packages to the `[dependencies.pip]` section of your `cerebrium.toml` file: ```toml -[cerebrium.dependencies.pip] +[dependencies.pip] peft = "git+https://github.com/huggingface/peft.git" transformers = "git+https://github.com/huggingface/transformers.git" accelerate = "git+https://github.com/huggingface/accelerate.git" @@ -128,7 +128,7 @@ The function receives inputs from our request object and uses `TextIteratorStrea Configure your compute and environment settings in `cerebrium.toml`: ```toml -[cerebrium.build] +[build] predict_data = "{\"prompt\": \"Here is some example predict data for your config.yaml which will be used to test your predict function on build.\"}" hide_public_endpoint = false disable_animation = false @@ -138,14 +138,14 @@ disable_predict = false log_level = "INFO" disable_confirmation = false -[cerebrium.deployment] +[deployment] name = "5-streaming-endpoint" python_version = "3.11" include = ["./*", "main.py", "cerebrium.toml"] exclude = ["./example_exclude"] docker_base_image_url = "nvidia/cuda:12.1.1-runtime-ubuntu22.04" -[cerebrium.hardware] +[hardware] region = "us-east-1" provider = "aws" compute = "AMPERE_A10" @@ -153,12 +153,12 @@ cpu = 2 memory = 16.0 gpu_count = 1 -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 5 cooldown = 60 -[cerebrium.dependencies.pip] +[dependencies.pip] peft = "git+https://github.com/huggingface/peft.git" transformers = "git+https://github.com/huggingface/transformers.git" accelerate = "git+https://github.com/huggingface/accelerate.git" @@ -167,9 +167,9 @@ sentencepiece = "latest" pydantic = "latest" torch = "2.1.0" -[cerebrium.dependencies.conda] +[dependencies.conda] -[cerebrium.dependencies.apt] +[dependencies.apt] ``` diff --git a/v4/examples/transcribe-whisper.mdx b/v4/examples/transcribe-whisper.mdx index acfaca5a..bec7c569 100644 --- a/v4/examples/transcribe-whisper.mdx +++ 
b/v4/examples/transcribe-whisper.mdx @@ -17,10 +17,10 @@ First, create your project: cerebrium init 1-whisper-transcription ``` -Add the following packages to the `[cerebrium.dependencies.pip]` section of your `cerebrium.toml` file: +Add the following packages to the `[dependencies.pip]` section of your `cerebrium.toml` file: ```toml -[cerebrium.dependencies.pip] +[dependencies.pip] accelerate = "latest" transformers = ">=4.35.0" openai-whisper = "latest" @@ -106,14 +106,14 @@ The `predict` function, which runs only on inference requests, creates an audio Configure your compute and environment settings in `cerebrium.toml`: ```toml -[cerebrium.deployment] +[deployment] name = "1-whisper-transcription" python_version = "3.11" include = ["./*", "main.py", "cerebrium.toml"] exclude = ["./example_exclude"] docker_base_image_url = "nvidia/cuda:12.1.1-runtime-ubuntu22.04" -[cerebrium.hardware] +[hardware] region = "us-east-1" provider = "aws" compute = "AMPERE_A10" @@ -121,20 +121,20 @@ cpu = 3 memory = 12.0 gpu_count = 1 -[cerebrium.scaling] +[scaling] min_replicas = 0 max_replicas = 5 cooldown = 60 -[cerebrium.dependencies.pip] +[dependencies.pip] accelerate = "latest" transformers = ">=4.35.0" openai-whisper = "latest" pydantic = "latest" -[cerebrium.dependencies.conda] +[dependencies.conda] -[cerebrium.dependencies.apt] +[dependencies.apt] "ffmpeg" = "latest" ``` diff --git a/v4/examples/twilio-voice-agent.mdx b/v4/examples/twilio-voice-agent.mdx index 7125da6a..aed73605 100644 --- a/v4/examples/twilio-voice-agent.mdx +++ b/v4/examples/twilio-voice-agent.mdx @@ -26,7 +26,7 @@ Set up Cerebrium: Add these pip packages to your `cerebrium.toml`: ``` -[cerebrium.dependencies.pip] +[dependencies.pip] torch = ">=2.0.0" "pipecat-ai[silero, daily, openai, deepgram, cartesia, twilio]" = "0.0.47" aiohttp = ">=3.9.4" @@ -99,7 +99,7 @@ Replace the stream URL with your deployment's base endpoint, using your project Configure Cerebrium to run the FastAPI server by adding this to 
`cerebrium.toml`: ``` -[cerebrium.runtime.custom] +[runtime.custom] port = 8765 entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8765"] healthcheck_endpoint = "/health" @@ -263,14 +263,14 @@ For scaling criteria, use Cerebrium's `replica_concurrency` setting to spawn new To make the two updates above you can update your cerebrium.toml to contain the following: ``` -[cerebrium.hardware] +[hardware] region = "us-east-1" provider = "aws" compute = "CPU" cpu = 10 memory = 8.0 -[cerebrium.scaling] +[scaling] min_replicas = 1 max_replicas = 3 cooldown = 30 diff --git a/v4/examples/wandb-sweep.mdx b/v4/examples/wandb-sweep.mdx index 30465087..e3814b3d 100644 --- a/v4/examples/wandb-sweep.mdx +++ b/v4/examples/wandb-sweep.mdx @@ -114,16 +114,16 @@ Add this configuration: ``` #existing configuration -[cerebrium.hardware] +[hardware] cpu = 6 memory = 30.0 compute = "ADA_L40" -[cerebrium.scaling] +[scaling] #existing configuration response_grace_period = 3600 -[cerebrium.dependencies.paths] +[dependencies.paths] pip = "requirements.txt" ``` From 5287e16d1a25577b5195f1c9366b2acd3ac1fe66 Mon Sep 17 00:00:00 2001 From: Elijah Roussos Date: Tue, 27 Jan 2026 15:49:04 -0500 Subject: [PATCH 09/16] docs: rename runtime.cortex to runtime.auto-py, dependencies to deps MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Rename runtime: cortex → auto-py, python stays as python - Rename TOML sections: [dependencies.*] → [deps.*] - Add deprecation notice for old runtime name cortex → auto-py - Update all prose references (Cortex runtime → auto-py runtime) - Old names remain backwards compatible --- .../container-images/custom-web-servers.mdx | 2 +- .../defining-container-images.mdx | 36 +++++++++---------- .../private-docker-registry.mdx | 2 +- cerebrium/endpoints/webhook.mdx | 2 +- cerebrium/hardware/using-cuda.mdx | 4 +-- cerebrium/other-topics/faster-cold-starts.mdx | 2 +- .../other-topics/request-response-logging.mdx | 6 ++-- 
cerebrium/scaling/batching-concurrency.mdx | 4 +-- cerebrium/scaling/graceful-termination.mdx | 4 +-- cerebrium/scaling/scaling-apps.mdx | 2 +- migrations/hugging-face.mdx | 6 ++-- migrations/mystic.mdx | 4 +-- migrations/replicate.mdx | 6 ++-- toml-reference/toml-reference.mdx | 4 +-- v4/examples/aiVoiceAgents.mdx | 4 +-- v4/examples/asgi-gradio-interface.mdx | 2 +- v4/examples/comfyUI.mdx | 4 +-- ...oy-a-vision-language-model-with-sglang.mdx | 4 +-- v4/examples/high-throughput-embeddings.mdx | 2 +- v4/examples/langchain-langsmith.mdx | 2 +- v4/examples/langchain.mdx | 14 ++++---- v4/examples/livekit-outbound-agent.mdx | 2 +- v4/examples/mistral-vllm.mdx | 10 +++--- .../openai-compatible-endpoint-vllm.mdx | 2 +- v4/examples/realtime-voice-agents.mdx | 4 +-- v4/examples/sdxl.mdx | 6 ++-- v4/examples/streaming-falcon-7B.mdx | 10 +++--- v4/examples/transcribe-whisper.mdx | 10 +++--- v4/examples/twilio-voice-agent.mdx | 2 +- v4/examples/wandb-sweep.mdx | 2 +- 30 files changed, 82 insertions(+), 82 deletions(-) diff --git a/cerebrium/container-images/custom-web-servers.mdx b/cerebrium/container-images/custom-web-servers.mdx index 1cb015f6..e89decf7 100644 --- a/cerebrium/container-images/custom-web-servers.mdx +++ b/cerebrium/container-images/custom-web-servers.mdx @@ -38,7 +38,7 @@ port = 5000 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[dependencies.pip] +[deps.pip] pydantic = "latest" numpy = "latest" loguru = "latest" diff --git a/cerebrium/container-images/defining-container-images.mdx b/cerebrium/container-images/defining-container-images.mdx index 8686d60e..5a893343 100644 --- a/cerebrium/container-images/defining-container-images.mdx +++ b/cerebrium/container-images/defining-container-images.mdx @@ -21,7 +21,7 @@ Check out the [Introductory Guide](/cerebrium/getting-started/introduction) for It is possible to initialize an existing project by adding a `cerebrium.toml` file to the root of your codebase, defining your entrypoint (`main.py` if 
- using the default cortex runtime, or adding an entrypoint to the runtime + using the default auto-py runtime, or adding an entrypoint to the runtime section if using a python or docker runtime) and including the necessary files in the `deployment` section of your `cerebrium.toml` file. @@ -47,7 +47,7 @@ For detailed hardware specifications and performance characteristics see the [GP The Python runtime version forms the foundation of every Cerebrium app. We currently support versions 3.10 to 3.13. Specify the Python version in the runtime section of the configuration: ```toml -[runtime.cortex] +[runtime.auto-py] python_version = "3.11" ``` @@ -72,7 +72,7 @@ The Python version affects the entire dependency chain. For instance, some packa Python dependencies can be managed directly in TOML or through requirement files. The system caches packages to speed up builds: ```toml -[dependencies.pip] +[deps.pip] torch = "==2.0.0" transformers = "==4.30.0" numpy = "latest" @@ -81,7 +81,7 @@ numpy = "latest" Or using an existing requirements file: ```toml -[dependencies.paths] +[deps.paths] pip = "requirements.txt" ``` @@ -94,19 +94,19 @@ The system implements an intelligent caching strategy at the node level. When an ### Adding APT Packages -System-level packages provide the foundation for many ML apps, handling everything from image-processing libraries to audio codecs. These can be added to the `cerebrium.toml` file under the `[dependencies.apt]` section as follows: +System-level packages provide the foundation for many ML apps, handling everything from image-processing libraries to audio codecs. 
These can be added to the `cerebrium.toml` file under the `[deps.apt]` section as follows: ```toml -[dependencies.apt] +[deps.apt] ffmpeg = "latest" libopenblas-base = "latest" libomp-dev = "latest" ``` -For teams with standardized system dependencies, text files can be used instead by adding the following to the `[dependencies.paths]` section: +For teams with standardized system dependencies, text files can be used instead by adding the following to the `[deps.paths]` section: ```toml -[dependencies.paths] +[deps.paths] apt = "deps_folder/pkglist.txt" ``` @@ -117,7 +117,7 @@ Since APT packages modify the system environment, any changes to these dependenc Conda excels at managing complex system-level Python dependencies, particularly for GPU support and scientific computing: ```toml -[dependencies.conda] +[deps.conda] cuda = ">=11.7" cudatoolkit = "11.7" opencv = "latest" @@ -126,7 +126,7 @@ opencv = "latest" Teams using conda environments can specify their environment file: ```toml -[dependencies.paths] +[deps.paths] conda = "conda_pkglist.txt" ``` @@ -141,7 +141,7 @@ Cerebrium's build process includes two specialized command types that execute at Pre-build commands execute at the start of the build process, before dependency installation begins. This early execution timing makes them essential for setting up the build environment: ```toml -[runtime.cortex] +[runtime.auto-py] pre_build_commands = [ # Add specialized build tools "curl -o /usr/local/bin/pget -L 'https://github.com/replicate/pget/releases/download/v0.6.2/pget_linux_x86_64'", @@ -156,7 +156,7 @@ Pre-build commands typically handle tasks like installing build tools, configuri Shell commands execute after all dependencies install and the application code copies into the container. 
This later timing ensures access to the complete environment: ```toml -[runtime.cortex] +[runtime.auto-py] shell_commands = [ # Initialize application resources "python -m download_models", @@ -186,7 +186,7 @@ The base image selection shapes how an app runs in Cerebrium. While the default Cerebrium supports several categories of base images to ensure system compatibility such as nvidia, ubuntu and python images. ```toml -[runtime.cortex] +[runtime.auto-py] docker_base_image_url = "debian:bookworm-slim" # Default minimal image #docker_base_image_url = "nvidia/cuda:12.0.1-runtime-ubuntu22.04" # CUDA-enabled images #docker_base_image_url = "ubuntu:22.04" # debian images @@ -213,7 +213,7 @@ docker login -u your-dockerhub-username After logging in, you can use the image in your configuration: ```toml -[runtime.cortex] +[runtime.auto-py] docker_base_image_url = "bob/infinity:latest" ``` @@ -234,7 +234,7 @@ docker_base_image_url = "bob/infinity:latest" Public ECR images from the `public.ecr.aws` registry work without authentication: ```toml -[runtime.cortex] +[runtime.auto-py] docker_base_image_url = "public.ecr.aws/lambda/python:3.11" ``` @@ -242,7 +242,7 @@ However, **private ECR images** require authentication. See [Using Private Docke ## Custom Runtimes -While Cerebrium's default cortex runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or WebSocket connections. +While Cerebrium's default auto-py runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or WebSocket connections. ### Python Runtime (ASGI/WSGI) @@ -290,7 +290,7 @@ readycheck_endpoint = "/ready" When using the docker runtime, all dependencies and build commands should be - handled within the Dockerfile. 
The `[dependencies.*]` sections will + handled within the Dockerfile. The `[deps.*]` sections will be ignored. @@ -308,7 +308,7 @@ port = 8000 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[dependencies.pip] +[deps.pip] torch = "latest" vllm = "latest" ``` diff --git a/cerebrium/container-images/private-docker-registry.mdx b/cerebrium/container-images/private-docker-registry.mdx index 756298ed..54d55962 100644 --- a/cerebrium/container-images/private-docker-registry.mdx +++ b/cerebrium/container-images/private-docker-registry.mdx @@ -64,7 +64,7 @@ Edit your cerebrium.toml and enter the url of the registry image from the step 2 [deployment] name = "my-app" -[runtime.cortex] +[runtime.auto-py] python_version = "3.11" docker_base_image_url = "your-registry.com/your-org/your-image:tag" diff --git a/cerebrium/endpoints/webhook.mdx b/cerebrium/endpoints/webhook.mdx index aa092e03..bb8e2f0f 100644 --- a/cerebrium/endpoints/webhook.mdx +++ b/cerebrium/endpoints/webhook.mdx @@ -17,7 +17,7 @@ curl -X POST https://api.aws.us-east-1.cerebrium.ai/v4// - These settings only affect the default Cortex runtime. If you are using a + These settings only affect the default auto-py runtime. If you are using a [custom runtime](/cerebrium/container-images/custom-web-servers), you will need to handle logging behavior in your own server implementation. diff --git a/cerebrium/scaling/batching-concurrency.mdx b/cerebrium/scaling/batching-concurrency.mdx index 0ca6371d..e505202a 100644 --- a/cerebrium/scaling/batching-concurrency.mdx +++ b/cerebrium/scaling/batching-concurrency.mdx @@ -33,7 +33,7 @@ max_replicas = 2 cooldown = 10 replica_concurrency = 4 # Each container can now handle multiple requests. 
-[dependencies.pip] +[deps.pip] sentencepiece = "latest" torch = "latest" vllm = "latest" @@ -63,7 +63,7 @@ entrypoint = ["python", "app/main.py"] healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[dependencies.pip] +[deps.pip] litserve = "latest" fastapi = "latest" ``` diff --git a/cerebrium/scaling/graceful-termination.mdx b/cerebrium/scaling/graceful-termination.mdx index 9ec10426..c671b6dd 100644 --- a/cerebrium/scaling/graceful-termination.mdx +++ b/cerebrium/scaling/graceful-termination.mdx @@ -9,7 +9,7 @@ Cerebrium runs in a shared, multi-tenant environment. To efficiently scale, opti ## Understanding Instance Termination -For both application autoscaling and our own internal node scaling, we will send your application a SIGTERM signal, as a warning to the application that we are intending to shut down this instance. For Cortex applications (Cerebriums default runtime), this is handled. On custom runtimes, should you wish to gracefully shut down, you will need to catch and handle this signal. Once at least `response_grace_period` has elapsed, we will send your application a SIGKILL signal, terminating the instance immediately. +For both application autoscaling and our own internal node scaling, we will send your application a SIGTERM signal as a warning that we intend to shut down this instance. For auto-py applications (Cerebrium's default runtime), this is handled automatically. On custom runtimes, should you wish to shut down gracefully, you will need to catch and handle this signal. Once at least `response_grace_period` has elapsed, we will send your application a SIGKILL signal, terminating the instance immediately.
When Cerebrium needs to terminate a container, we do the following: @@ -22,7 +22,7 @@ Below is a chart that shows it more eloquently: ```mermaid flowchart TD - A[SIGTERM sent] --> B[Cortex] + A[SIGTERM sent] --> B[auto-py] A --> C[Custom Runtime] B --> D[automatically captured] diff --git a/cerebrium/scaling/scaling-apps.mdx b/cerebrium/scaling/scaling-apps.mdx index 5a264ebd..ac993deb 100644 --- a/cerebrium/scaling/scaling-apps.mdx +++ b/cerebrium/scaling/scaling-apps.mdx @@ -79,7 +79,7 @@ During normal replica operation, this simply corresponds to a request timeout va waits for the specified grace period, issues a SIGKILL command if the instance has not stopped, and kills any active requests with a GatewayTimeout error. - When using the Cortex runtime (default), SIGTERM signals are automatically + When using the auto-py runtime (default), SIGTERM signals are automatically handled to allow graceful termination of requests. For custom runtimes, you'll need to implement SIGTERM handling yourself to ensure requests complete gracefully before termination. See our [Graceful Termination diff --git a/migrations/hugging-face.mdx b/migrations/hugging-face.mdx index a8f8d7c7..28328c2f 100644 --- a/migrations/hugging-face.mdx +++ b/migrations/hugging-face.mdx @@ -58,7 +58,7 @@ name = "llama-8b-vllm" include = ["./*", "main.py", "cerebrium.toml"] exclude = [".*"] -[runtime.cortex] +[runtime.auto-py] python_version = "3.11" docker_base_image_url = "debian:bookworm-slim" @@ -72,7 +72,7 @@ min_replicas = 0 max_replicas = 5 cooldown = 30 -[dependencies.pip] +[deps.pip] sentencepiece = "latest" torch = "latest" transformers = "latest" @@ -86,7 +86,7 @@ bitsandbytes = "latest" Let's break down this configuration: - `deployment`: Specifies the project name and which files to include/exclude as project files. -- `runtime.cortex`: Specifies the Python version and base Docker image. +- `runtime.auto-py`: Specifies the Python version and base Docker image.
- `hardware`: Defines the CPU, memory, and GPU requirements for your deployment. - `scaling`: Configures auto-scaling behavior, including minimum and maximum replicas, and cooldown period. - `dependencies.pip`: Lists the Python packages required for your project. diff --git a/migrations/mystic.mdx b/migrations/mystic.mdx index 5006d7f1..07724826 100644 --- a/migrations/mystic.mdx +++ b/migrations/mystic.mdx @@ -67,7 +67,7 @@ name = "stable-diffusion" include = ["./*", "main.py", "cerebrium.toml"] exclude = [".*"] -[runtime.cortex] +[runtime.auto-py] python_version = "3.11" docker_base_image_url = "debian:bookworm-slim" @@ -83,7 +83,7 @@ max_replicas = 2 # Handle increased traffic and scale up where necessary cooldown = 60 # Time window at reduced concurrency before scaling down replica_concurrency = 1 # The number of requests a single container can support -[dependencies.pip] +[deps.pip] torch = ">=2.0.0" pydantic = "latest" transformers = "latest" diff --git a/migrations/replicate.mdx b/migrations/replicate.mdx index bed603a3..2cf49aa6 100644 --- a/migrations/replicate.mdx +++ b/migrations/replicate.mdx @@ -27,7 +27,7 @@ name = "cog-migration-sdxl" include = ["./*", "main.py", "cerebrium.toml"] exclude = ["./example_exclude"] -[runtime.cortex] +[runtime.auto-py] python_version = "3.11" docker_base_image_url = "nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04" shell_commands = [ @@ -42,14 +42,14 @@ cpu = 2 memory = 12.0 gpu_count = 1 -[dependencies.pip] +[deps.pip] "accelerate" = "latest" "diffusers" = "latest" "torch" = "==2.0.1" "torchvision" = "==0.15.2" "transformers" = "latest" -[dependencies.apt] +[deps.apt] "curl" = "latest" ``` diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx index e75fa885..9f61e603 100644 --- a/toml-reference/toml-reference.mdx +++ b/toml-reference/toml-reference.mdx @@ -144,7 +144,7 @@ readycheck_endpoint = "/ready" When using `dockerfile_path`, all dependencies and build commands should be - handled within 
the Dockerfile. The `[dependencies.*]` sections, + handled within the Dockerfile. The `[deps.*]` sections, `shell_commands`, and `pre_build_commands` will be ignored. @@ -292,7 +292,7 @@ pip = "requirements.txt" # Alternative: use a file instead of inline This approach works with any runtime type (`auto-py`, `python`, or partner services). -**Deprecated:** Top-level `[dependencies.*]` sections are deprecated. +**Deprecated:** Top-level `[dependencies.*]` and `[deps.*]` sections are deprecated. Please move your dependencies to `[runtime.{type}.deps.*]`. Dependencies specified at the runtime level take precedence over top-level dependencies when both are present. diff --git a/v4/examples/aiVoiceAgents.mdx b/v4/examples/aiVoiceAgents.mdx index f59c4dbb..82a32744 100644 --- a/v4/examples/aiVoiceAgents.mdx +++ b/v4/examples/aiVoiceAgents.mdx @@ -52,7 +52,7 @@ min_replicas = 1 max_replicas = 5 cooldown = 60 -[dependencies.pip] +[deps.pip] vllm = "latest" pydantic = "latest" ``` @@ -188,7 +188,7 @@ min_replicas = 1 # Note: This incurs a constant cost since at least one instance max_replicas = 2 cooldown = 180 -[dependencies.pip] +[deps.pip] torch = ">=2.0.0" "pipecat-ai[silero, daily, openai, deepgram, cartesia]" = "==0.0.67" aiohttp = ">=3.9.4" diff --git a/v4/examples/asgi-gradio-interface.mdx b/v4/examples/asgi-gradio-interface.mdx index e69b884c..f3182edf 100644 --- a/v4/examples/asgi-gradio-interface.mdx +++ b/v4/examples/asgi-gradio-interface.mdx @@ -62,7 +62,7 @@ max_replicas = 2 cooldown = 30 replica_concurrency = 10 -[dependencies.pip] +[deps.pip] gradio = "latest" fastapi = "latest" requests = "latest" diff --git a/v4/examples/comfyUI.mdx b/v4/examples/comfyUI.mdx index 79e4a5cb..51334fd7 100644 --- a/v4/examples/comfyUI.mdx +++ b/v4/examples/comfyUI.mdx @@ -288,7 +288,7 @@ scaling_metric = "concurrency_utilization" scaling_target = 100 scaling_buffer = 0 -[dependencies.pip] +[deps.pip] uvicorn = "latest" fastapi = "latest" requests = "latest" @@ -313,7 +313,7 
@@ tqdm = "latest" psutil = "latest" kornia = ">=0.7.1" -[dependencies.apt] +[deps.apt] git = "latest" ``` diff --git a/v4/examples/deploy-a-vision-language-model-with-sglang.mdx b/v4/examples/deploy-a-vision-language-model-with-sglang.mdx index d70c4d8a..3dfbf22a 100644 --- a/v4/examples/deploy-a-vision-language-model-with-sglang.mdx +++ b/v4/examples/deploy-a-vision-language-model-with-sglang.mdx @@ -90,7 +90,7 @@ max_replicas = 2 [build] use_uv = true -[dependencies.pip] +[deps.pip] transformers = "latest" huggingface_hub = "latest" pydantic = "latest" @@ -101,7 +101,7 @@ torch = "latest" "sgl-kernel" = "latest" "flashinfer-python" = "latest" -[dependencies.apt] +[deps.apt] libnuma-dev = "latest" [runtime.custom] diff --git a/v4/examples/high-throughput-embeddings.mdx b/v4/examples/high-throughput-embeddings.mdx index 539afc0c..5e0ca8dd 100644 --- a/v4/examples/high-throughput-embeddings.mdx +++ b/v4/examples/high-throughput-embeddings.mdx @@ -61,7 +61,7 @@ cooldown = 30 replica_concurrency = 500 scaling_metric = "concurrency_utilization" -[dependencies.pip] +[deps.pip] numpy = "latest" "infinity-emb[all]" = "0.0.77" optimum = ">=1.24.0,<2.0.0" diff --git a/v4/examples/langchain-langsmith.mdx b/v4/examples/langchain-langsmith.mdx index c073d157..8c3d72a3 100644 --- a/v4/examples/langchain-langsmith.mdx +++ b/v4/examples/langchain-langsmith.mdx @@ -144,7 +144,7 @@ Set up Cerebrium: Add these pip packages to your `cerebrium.toml`: ``` -[dependencies.pip] +[deps.pip] pydantic = "latest" langchain = "latest" pytz = "latest" ##this is used for timezones diff --git a/v4/examples/langchain.mdx b/v4/examples/langchain.mdx index d5498f3c..1c44fe35 100644 --- a/v4/examples/langchain.mdx +++ b/v4/examples/langchain.mdx @@ -17,10 +17,10 @@ First, create your project: cerebrium init 1-langchain-QA ``` -Add these Python packages to the `[dependencies.pip]` section of your `cerebrium.toml` file: +Add these Python packages to the `[deps.pip]` section of your `cerebrium.toml` 
file: ```toml -[dependencies.pip] +[deps.pip] pytube = "latest" # For audio downloading langchain = "latest" faiss-gpu = "latest" @@ -30,10 +30,10 @@ transformers = ">=4.35.0" sentence_transformers = ">=2.2.0" ``` -Whisper requires ffmpeg and other Linux packages. Add them to the `[dependencies]` section: +Whisper requires ffmpeg and other Linux packages. Add them to the `[deps]` section: ```toml -[dependencies] +[deps] apt = [ "ffmpeg", "libopenblas-base", "libomp-dev"] ``` @@ -171,12 +171,12 @@ gpu_count = 1 min_replicas = 0 cooldown = 60 -[dependencies.apt] +[deps.apt] ffmpeg = "latest" "libopenblas-base" = "latest" "libomp-dev" = "latest" -[dependencies.pip] +[deps.pip] pytube = "latest" # For audio downloading langchain = "latest" faiss-gpu = "latest" @@ -185,7 +185,7 @@ openai-whisper = "latest" transformers = ">=4.35.0" sentence_transformers = ">=2.2.0" -[dependencies.conda] +[deps.conda] ``` diff --git a/v4/examples/livekit-outbound-agent.mdx b/v4/examples/livekit-outbound-agent.mdx index 3af72cf4..bd8a4601 100644 --- a/v4/examples/livekit-outbound-agent.mdx +++ b/v4/examples/livekit-outbound-agent.mdx @@ -406,7 +406,7 @@ max_replicas = 5 cooldown = 30 replica_concurrency = 1 -[dependencies.paths] +[deps.paths] pip = "requirements.txt" [runtime.custom] diff --git a/v4/examples/mistral-vllm.mdx b/v4/examples/mistral-vllm.mdx index b95ee62f..da188046 100644 --- a/v4/examples/mistral-vllm.mdx +++ b/v4/examples/mistral-vllm.mdx @@ -23,10 +23,10 @@ First, create your project: cerebrium init 1-faster-inference-with-vllm ``` -Add these Python packages to the `[dependencies.pip]` section in your `cerebrium.toml` file: +Add these Python packages to the `[deps.pip]` section in your `cerebrium.toml` file: ```toml -[dependencies.pip] +[deps.pip] sentencepiece = "latest" torch = ">=2.0.0" vllm = "latest" @@ -133,7 +133,7 @@ min_replicas = 0 max_replicas = 5 cooldown = 60 -[dependencies.pip] +[deps.pip] huggingface-hub = "latest" sentencepiece = "latest" torch = 
">=2.0.0" @@ -142,9 +142,9 @@ transformers = ">=4.35.0" accelerate = "latest" xformers = "latest" -[dependencies.conda] +[deps.conda] -[dependencies.apt] +[deps.apt] ffmpeg = "latest" ``` diff --git a/v4/examples/openai-compatible-endpoint-vllm.mdx b/v4/examples/openai-compatible-endpoint-vllm.mdx index c12fd736..753d770d 100644 --- a/v4/examples/openai-compatible-endpoint-vllm.mdx +++ b/v4/examples/openai-compatible-endpoint-vllm.mdx @@ -24,7 +24,7 @@ cpu = 2 memory = 12.0 compute = "AMPERE_A10" -[dependencies.pip] +[deps.pip] vllm = "latest" pydantic = "latest" ``` diff --git a/v4/examples/realtime-voice-agents.mdx b/v4/examples/realtime-voice-agents.mdx index 80fdeb82..173d9410 100644 --- a/v4/examples/realtime-voice-agents.mdx +++ b/v4/examples/realtime-voice-agents.mdx @@ -55,7 +55,7 @@ min_replicas = 1 max_replicas = 5 cooldown = 60 -[dependencies.pip] +[deps.pip] vllm = "latest" pydantic = "latest" ``` @@ -192,7 +192,7 @@ min_replicas = 1 # Note: This incurs a constant cost since at least one instance max_replicas = 2 cooldown = 180 -[dependencies.pip] +[deps.pip] torch = ">=2.0.0" "pipecat-ai[silero, daily, openai, deepgram, cartesia]" = "==0.0.67" aiohttp = ">=3.9.4" diff --git a/v4/examples/sdxl.mdx b/v4/examples/sdxl.mdx index dd8cf8cd..ea5c3832 100644 --- a/v4/examples/sdxl.mdx +++ b/v4/examples/sdxl.mdx @@ -46,16 +46,16 @@ min_replicas = 0 max_replicas = 5 cooldown = 60 -[dependencies.pip] +[deps.pip] accelerate = "latest" transformers = ">=4.35.0" safetensors = "latest" opencv-python = "latest" diffusers = "latest" -[dependencies.conda] +[deps.conda] -[dependencies.apt] +[deps.apt] ffmpeg = "latest" ``` diff --git a/v4/examples/streaming-falcon-7B.mdx b/v4/examples/streaming-falcon-7B.mdx index 457f9493..cb66153e 100644 --- a/v4/examples/streaming-falcon-7B.mdx +++ b/v4/examples/streaming-falcon-7B.mdx @@ -23,10 +23,10 @@ First, create your project: cerebrium init 5-streaming-endpoint ``` -Add the following packages to the `[dependencies.pip]` section 
of your `cerebrium.toml` file: +Add the following packages to the `[deps.pip]` section of your `cerebrium.toml` file: ```toml -[dependencies.pip] +[deps.pip] peft = "git+https://github.com/huggingface/peft.git" transformers = "git+https://github.com/huggingface/transformers.git" accelerate = "git+https://github.com/huggingface/accelerate.git" @@ -158,7 +158,7 @@ min_replicas = 0 max_replicas = 5 cooldown = 60 -[dependencies.pip] +[deps.pip] peft = "git+https://github.com/huggingface/peft.git" transformers = "git+https://github.com/huggingface/transformers.git" accelerate = "git+https://github.com/huggingface/accelerate.git" @@ -167,9 +167,9 @@ sentencepiece = "latest" pydantic = "latest" torch = "2.1.0" -[dependencies.conda] +[deps.conda] -[dependencies.apt] +[deps.apt] ``` diff --git a/v4/examples/transcribe-whisper.mdx b/v4/examples/transcribe-whisper.mdx index bec7c569..d91b9dbf 100644 --- a/v4/examples/transcribe-whisper.mdx +++ b/v4/examples/transcribe-whisper.mdx @@ -17,10 +17,10 @@ First, create your project: cerebrium init 1-whisper-transcription ``` -Add the following packages to the `[dependencies.pip]` section of your `cerebrium.toml` file: +Add the following packages to the `[deps.pip]` section of your `cerebrium.toml` file: ```toml -[dependencies.pip] +[deps.pip] accelerate = "latest" transformers = ">=4.35.0" openai-whisper = "latest" @@ -126,15 +126,15 @@ min_replicas = 0 max_replicas = 5 cooldown = 60 -[dependencies.pip] +[deps.pip] accelerate = "latest" transformers = ">=4.35.0" openai-whisper = "latest" pydantic = "latest" -[dependencies.conda] +[deps.conda] -[dependencies.apt] +[deps.apt] "ffmpeg" = "latest" ``` diff --git a/v4/examples/twilio-voice-agent.mdx b/v4/examples/twilio-voice-agent.mdx index aed73605..f31757b6 100644 --- a/v4/examples/twilio-voice-agent.mdx +++ b/v4/examples/twilio-voice-agent.mdx @@ -26,7 +26,7 @@ Set up Cerebrium: Add these pip packages to your `cerebrium.toml`: ``` -[dependencies.pip] +[deps.pip] torch = ">=2.0.0" 
"pipecat-ai[silero, daily, openai, deepgram, cartesia, twilio]" = "0.0.47" aiohttp = ">=3.9.4" diff --git a/v4/examples/wandb-sweep.mdx b/v4/examples/wandb-sweep.mdx index e3814b3d..77790444 100644 --- a/v4/examples/wandb-sweep.mdx +++ b/v4/examples/wandb-sweep.mdx @@ -123,7 +123,7 @@ compute = "ADA_L40" #existing configuration response_grace_period = 3600 -[dependencies.paths] +[deps.paths] pip = "requirements.txt" ``` From ce5424469d751c7fd5c368cdd7c2b7ab1fc8eb26 Mon Sep 17 00:00:00 2001 From: elijah-rou Date: Tue, 27 Jan 2026 20:49:27 +0000 Subject: [PATCH 10/16] Prettified Code! --- .../container-images/custom-dockerfiles.mdx | 4 +- .../defining-container-images.mdx | 3 +- toml-reference/toml-reference.mdx | 91 ++++++++++--------- 3 files changed, 51 insertions(+), 47 deletions(-) diff --git a/cerebrium/container-images/custom-dockerfiles.mdx b/cerebrium/container-images/custom-dockerfiles.mdx index f029dc8e..fb765e08 100644 --- a/cerebrium/container-images/custom-dockerfiles.mdx +++ b/cerebrium/container-images/custom-dockerfiles.mdx @@ -114,8 +114,8 @@ port = 8192 When specifying a `dockerfile_path`, all dependencies and necessary commands should be installed and executed within the Dockerfile. Dependencies listed - under `dependencies.*`, as well as `shell_commands` and - `pre_build_commands`, will be ignored. + under `dependencies.*`, as well as `shell_commands` and `pre_build_commands`, + will be ignored. ## Building Generic Dockerized Apps diff --git a/cerebrium/container-images/defining-container-images.mdx b/cerebrium/container-images/defining-container-images.mdx index 5a893343..e647c19a 100644 --- a/cerebrium/container-images/defining-container-images.mdx +++ b/cerebrium/container-images/defining-container-images.mdx @@ -290,8 +290,7 @@ readycheck_endpoint = "/ready" When using the docker runtime, all dependencies and build commands should be - handled within the Dockerfile. The `[deps.*]` sections will - be ignored. + handled within the Dockerfile. 
The `[deps.*]` sections will be ignored. ### Self-Contained Servers diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx index 9f61e603..e8fa1cc7 100644 --- a/toml-reference/toml-reference.mdx +++ b/toml-reference/toml-reference.mdx @@ -6,13 +6,14 @@ description: Complete reference for all parameters available in Cerebrium's defa **Deprecation Notice:** The `cerebrium.` prefix is being removed from all configuration sections. Sub-keys are becoming top-level keys (e.g., `[cerebrium.deployment]` → `[deployment]`). The prefixed format is still supported for backwards compatibility but will be removed in a future release. We recommend migrating to the new format. -| Current (Deprecated) | New Format | -| --------------------------------- | --------------------- | -| `[cerebrium.deployment]` | `[deployment]` | -| `[cerebrium.runtime.cortex]` | `[runtime.auto-py]` | -| `[cerebrium.hardware]` | `[hardware]` | -| `[cerebrium.scaling]` | `[scaling]` | -| `[cerebrium.dependencies]` | `[dependencies]` | +| Current (Deprecated) | New Format | +| ---------------------------- | ------------------- | +| `[cerebrium.deployment]` | `[deployment]` | +| `[cerebrium.runtime.cortex]` | `[runtime.auto-py]` | +| `[cerebrium.hardware]` | `[hardware]` | +| `[cerebrium.scaling]` | `[scaling]` | +| `[cerebrium.dependencies]` | `[dependencies]` | + The configuration is organized into the following main sections: @@ -144,8 +145,8 @@ readycheck_endpoint = "/ready" When using `dockerfile_path`, all dependencies and build commands should be - handled within the Dockerfile. The `[deps.*]` sections, - `shell_commands`, and `pre_build_commands` will be ignored. + handled within the Dockerfile. The `[deps.*]` sections, `shell_commands`, and + `pre_build_commands` will be ignored. 
### UV Package Manager @@ -292,15 +293,17 @@ pip = "requirements.txt" # Alternative: use a file instead of inline This approach works with any runtime type (`auto-py`, `python`, or partner services). -**Deprecated:** Top-level `[dependencies.*]` and `[deps.*]` sections are deprecated. -Please move your dependencies to `[runtime.{type}.deps.*]`. -Dependencies specified at the runtime level take precedence over top-level dependencies when both are present. + **Deprecated:** Top-level `[dependencies.*]` and `[deps.*]` sections are + deprecated. Please move your dependencies to `[runtime.{type}.deps.*]`. + Dependencies specified at the runtime level take precedence over top-level + dependencies when both are present. ### Top-Level Dependencies (Deprecated) -This approach is deprecated. Please migrate to runtime-specific dependencies above. + This approach is deprecated. Please migrate to runtime-specific dependencies + above. #### Pip Dependencies @@ -453,36 +456,36 @@ replica_concurrency = 10 ## Backwards Compatibility -The following configuration patterns are deprecated but still supported for backwards compatibility. -We recommend migrating to the new format. + The following configuration patterns are deprecated but still supported for + backwards compatibility. We recommend migrating to the new format. -### Deprecated: cerebrium.* prefix +### Deprecated: cerebrium.\* prefix The `cerebrium.` prefix on all section names is deprecated. 
Please migrate to the new format: -| Deprecated Format | New Format | -| ---------------------------------- | ----------------------------------- | -| `[cerebrium.deployment]` | `[deployment]` | -| `[cerebrium.runtime.cortex]` | `[runtime.auto-py]` | -| `[cerebrium.runtime.python]` | `[runtime.python]` | -| `[cerebrium.runtime.docker]` | `[runtime.docker]` | -| `[cerebrium.hardware]` | `[hardware]` | -| `[cerebrium.scaling]` | `[scaling]` | -| `[cerebrium.deps.pip]` | `[dependencies.pip]` | -| `[cerebrium.deps.apt]` | `[dependencies.apt]` | +| Deprecated Format | New Format | +| ---------------------------- | -------------------- | +| `[cerebrium.deployment]` | `[deployment]` | +| `[cerebrium.runtime.cortex]` | `[runtime.auto-py]` | +| `[cerebrium.runtime.python]` | `[runtime.python]` | +| `[cerebrium.runtime.docker]` | `[runtime.docker]` | +| `[cerebrium.hardware]` | `[hardware]` | +| `[cerebrium.scaling]` | `[scaling]` | +| `[cerebrium.deps.pip]` | `[dependencies.pip]` | +| `[cerebrium.deps.apt]` | `[dependencies.apt]` | ### Deprecated: Runtime fields in [deployment] The following fields in `[deployment]` are deprecated. 
Please move them to the appropriate runtime section: -| Deprecated Field | New Location | -| ---------------------- | --------------------------------------------- | -| python_version | `[runtime.auto-py]` or `[runtime.python]` | -| docker_base_image_url | `[runtime.auto-py]` or `[runtime.python]` | -| shell_commands | `[runtime.auto-py]` or `[runtime.python]` | -| pre_build_commands | `[runtime.auto-py]` or `[runtime.python]` | -| use_uv | `[runtime.auto-py]` or `[runtime.python]` | +| Deprecated Field | New Location | +| --------------------- | ----------------------------------------- | +| python_version | `[runtime.auto-py]` or `[runtime.python]` | +| docker_base_image_url | `[runtime.auto-py]` or `[runtime.python]` | +| shell_commands | `[runtime.auto-py]` or `[runtime.python]` | +| pre_build_commands | `[runtime.auto-py]` or `[runtime.python]` | +| use_uv | `[runtime.auto-py]` or `[runtime.python]` | ### Deprecated: [runtime.custom] @@ -495,23 +498,25 @@ The `[runtime.custom]` section is deprecated. Please migrate to: The runtime names `cortex` and `python` are deprecated. The old names still work for backwards compatibility but will be removed in a future release. -| Deprecated Name | New Name | -| ---------------------- | -------------------- | -| `[runtime.cortex]` | `[runtime.auto-py]` | +| Deprecated Name | New Name | +| ------------------ | ------------------- | +| `[runtime.cortex]` | `[runtime.auto-py]` | ### Deprecated: Top-level [dependencies.*] Top-level dependency sections are deprecated. 
Please move dependencies to runtime-specific sections: -| Deprecated Location | New Location | -| ------------------------ | ----------------------------------------- | -| `[dependencies.pip]` | `[runtime.{type}.deps.pip]` | -| `[dependencies.apt]` | `[runtime.{type}.deps.apt]` | -| `[dependencies.conda]` | `[runtime.{type}.deps.conda]` | -| `[dependencies.paths]` | `[runtime.{type}.deps.paths]` | +| Deprecated Location | New Location | +| ---------------------- | ----------------------------- | +| `[dependencies.pip]` | `[runtime.{type}.deps.pip]` | +| `[dependencies.apt]` | `[runtime.{type}.deps.apt]` | +| `[dependencies.conda]` | `[runtime.{type}.deps.conda]` | +| `[dependencies.paths]` | `[runtime.{type}.deps.paths]` | Where `{type}` is your runtime type (e.g., `auto-py`, `python`). -When both top-level and runtime-specific dependencies are present, runtime-specific dependencies take precedence on a per-package basis. This allows gradual migration. + When both top-level and runtime-specific dependencies are present, + runtime-specific dependencies take precedence on a per-package basis. This + allows gradual migration. 
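The per-package precedence described in the patch above (runtime-specific dependencies override top-level ones, key by key, when both are present) behaves like a key-wise dictionary merge. A minimal illustrative sketch — the package names and pins are made up, and this is not Cerebrium's actual resolver:

```python
# Per-package precedence sketch (hypothetical data, not Cerebrium's resolver):
# runtime-level dependency entries override top-level ones, key by key.
top_level = {"torch": ">=2.0.0", "numpy": "latest"}       # [dependencies.pip]
runtime_level = {"numpy": "1.24.0", "pandas": ">=1.5.0"}  # [runtime.auto-py.deps.pip]

# Later (runtime-level) entries win for shared keys; unique keys from both survive.
merged = {**top_level, **runtime_level}
print(merged)  # {'torch': '>=2.0.0', 'numpy': '1.24.0', 'pandas': '>=1.5.0'}
```

This key-wise merge is what makes gradual migration possible: a package left at the top level keeps working until a runtime-level entry shadows it.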
From 0b4bc63204bbfc48f29afb66df2572ae752d2f51 Mon Sep 17 00:00:00 2001 From: Elijah Roussos Date: Wed, 28 Jan 2026 09:11:28 -0500 Subject: [PATCH 11/16] docs: update dependencies to use _file_relative_path key - Rename deps to dependencies in all examples - Replace [dependencies.paths] section with _file_relative_path key - Document file + inline dependency merging behavior - Update backwards compatibility section --- toml-reference/toml-reference.mdx | 84 ++++++++++++++++++------------- 1 file changed, 49 insertions(+), 35 deletions(-) diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx index e8fa1cc7..f1829512 100644 --- a/toml-reference/toml-reference.mdx +++ b/toml-reference/toml-reference.mdx @@ -100,7 +100,7 @@ port = 8000 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[runtime.python.deps.pip] +[runtime.python.dependencies.pip] fastapi = "latest" uvicorn = "latest" ``` @@ -145,8 +145,8 @@ readycheck_endpoint = "/ready" When using `dockerfile_path`, all dependencies and build commands should be - handled within the Dockerfile. The `[deps.*]` sections, `shell_commands`, and - `pre_build_commands` will be ignored. + handled within the Dockerfile. The `[dependencies.*]` sections, + `shell_commands`, and `pre_build_commands` will be ignored. ### UV Package Manager @@ -273,30 +273,36 @@ Dependencies can be specified within the runtime section. This is the recommende [runtime.auto-py] python_version = "3.12" -[runtime.auto-py.deps.pip] +[runtime.auto-py.dependencies.pip] torch = "==2.0.0" # Exact version numpy = "latest" # Latest version pandas = ">=1.5.0" # Minimum version -[runtime.auto-py.deps.apt] +[runtime.auto-py.dependencies.apt] ffmpeg = "latest" libopenblas-base = "latest" -[runtime.auto-py.deps.conda] +[runtime.auto-py.dependencies.conda] cuda = ">=11.7" cudatoolkit = "11.7" +``` + +#### Using Requirements Files + +You can also specify a requirements file using the special `_file_relative_path` key. 
When both a file and inline packages are specified, they are merged (inline packages take precedence): -[runtime.auto-py.deps.paths] -pip = "requirements.txt" # Alternative: use a file instead of inline +```toml +[runtime.auto-py.dependencies.pip] +_file_relative_path = "requirements.txt" # Base packages from file +numpy = "1.24.0" # Override or add packages ``` This approach works with any runtime type (`auto-py`, `python`, or partner services). - **Deprecated:** Top-level `[dependencies.*]` and `[deps.*]` sections are - deprecated. Please move your dependencies to `[runtime.{type}.deps.*]`. - Dependencies specified at the runtime level take precedence over top-level - dependencies when both are present. +**Deprecated:** Top-level `[dependencies.*]` sections are deprecated. +Please move your dependencies to `[runtime.{type}.dependencies.*]`. +Dependencies specified at the runtime level take precedence over top-level dependencies when both are present. ### Top-Level Dependencies (Deprecated) @@ -339,13 +345,18 @@ cudatoolkit = "11.7" #### Dependency Files -The `[dependencies.paths]` section allows using requirement files. +Use the special `_file_relative_path` key to load dependencies from a file. 
When both a file and inline packages are specified, they are merged (inline packages take precedence): ```toml -[dependencies.paths] -pip = "requirements.txt" -apt = "pkglist.txt" -conda = "conda_pkglist.txt" +[dependencies.pip] +_file_relative_path = "requirements.txt" # Load from file +numpy = "1.24.0" # Override or add packages + +[dependencies.apt] +_file_relative_path = "pkglist.txt" + +[dependencies.conda] +_file_relative_path = "conda_pkglist.txt" ``` ## Complete Examples @@ -366,7 +377,7 @@ use_uv = true shell_commands = [] pre_build_commands = [] -[runtime.auto-py.deps.pip] +[runtime.auto-py.dependencies.pip] torch = "latest" transformers = "latest" @@ -406,7 +417,7 @@ port = 8000 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[runtime.python.deps.pip] +[runtime.python.dependencies.pip] torch = "latest" transformers = "latest" uvicorn = "latest" @@ -464,16 +475,16 @@ replica_concurrency = 10 The `cerebrium.` prefix on all section names is deprecated. Please migrate to the new format: -| Deprecated Format | New Format | -| ---------------------------- | -------------------- | -| `[cerebrium.deployment]` | `[deployment]` | -| `[cerebrium.runtime.cortex]` | `[runtime.auto-py]` | -| `[cerebrium.runtime.python]` | `[runtime.python]` | -| `[cerebrium.runtime.docker]` | `[runtime.docker]` | -| `[cerebrium.hardware]` | `[hardware]` | -| `[cerebrium.scaling]` | `[scaling]` | -| `[cerebrium.deps.pip]` | `[dependencies.pip]` | -| `[cerebrium.deps.apt]` | `[dependencies.apt]` | +| Deprecated Format | New Format | +| ---------------------------------- | ----------------------------------- | +| `[cerebrium.deployment]` | `[deployment]` | +| `[cerebrium.runtime.cortex]` | `[runtime.auto-py]` | +| `[cerebrium.runtime.python]` | `[runtime.python]` | +| `[cerebrium.runtime.docker]` | `[runtime.docker]` | +| `[cerebrium.hardware]` | `[hardware]` | +| `[cerebrium.scaling]` | `[scaling]` | +| `[cerebrium.dependencies.pip]` | `[dependencies.pip]` | +| 
`[cerebrium.dependencies.apt]` | `[dependencies.apt]` | ### Deprecated: Runtime fields in [deployment] @@ -506,15 +517,18 @@ The runtime names `cortex` and `python` are deprecated. The old names still work Top-level dependency sections are deprecated. Please move dependencies to runtime-specific sections: -| Deprecated Location | New Location | -| ---------------------- | ----------------------------- | -| `[dependencies.pip]` | `[runtime.{type}.deps.pip]` | -| `[dependencies.apt]` | `[runtime.{type}.deps.apt]` | -| `[dependencies.conda]` | `[runtime.{type}.deps.conda]` | -| `[dependencies.paths]` | `[runtime.{type}.deps.paths]` | +| Deprecated Location | New Location | +| ------------------------ | ----------------------------------------- | +| `[dependencies.pip]` | `[runtime.{type}.dependencies.pip]` | +| `[dependencies.apt]` | `[runtime.{type}.dependencies.apt]` | +| `[dependencies.conda]` | `[runtime.{type}.dependencies.conda]` | Where `{type}` is your runtime type (e.g., `auto-py`, `python`). + +The `[dependencies.paths]` section has been replaced with `_file_relative_path` key inside dependency maps. See the Dependencies section above for the new syntax. + + When both top-level and runtime-specific dependencies are present, runtime-specific dependencies take precedence on a per-package basis. This From d83178c25e4132311e9b6ca106943308b5785844 Mon Sep 17 00:00:00 2001 From: elijah-rou Date: Wed, 28 Jan 2026 14:11:44 +0000 Subject: [PATCH 12/16] Prettified Code! --- toml-reference/toml-reference.mdx | 41 +++++++++++++++++-------------- 1 file changed, 22 insertions(+), 19 deletions(-) diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx index f1829512..fbd761ed 100644 --- a/toml-reference/toml-reference.mdx +++ b/toml-reference/toml-reference.mdx @@ -300,9 +300,10 @@ numpy = "1.24.0" # Override or add packages This approach works with any runtime type (`auto-py`, `python`, or partner services). 
-**Deprecated:** Top-level `[dependencies.*]` sections are deprecated. -Please move your dependencies to `[runtime.{type}.dependencies.*]`. -Dependencies specified at the runtime level take precedence over top-level dependencies when both are present. + **Deprecated:** Top-level `[dependencies.*]` sections are deprecated. Please + move your dependencies to `[runtime.{type}.dependencies.*]`. Dependencies + specified at the runtime level take precedence over top-level dependencies + when both are present. ### Top-Level Dependencies (Deprecated) @@ -475,16 +476,16 @@ replica_concurrency = 10 The `cerebrium.` prefix on all section names is deprecated. Please migrate to the new format: -| Deprecated Format | New Format | -| ---------------------------------- | ----------------------------------- | -| `[cerebrium.deployment]` | `[deployment]` | -| `[cerebrium.runtime.cortex]` | `[runtime.auto-py]` | -| `[cerebrium.runtime.python]` | `[runtime.python]` | -| `[cerebrium.runtime.docker]` | `[runtime.docker]` | -| `[cerebrium.hardware]` | `[hardware]` | -| `[cerebrium.scaling]` | `[scaling]` | -| `[cerebrium.dependencies.pip]` | `[dependencies.pip]` | -| `[cerebrium.dependencies.apt]` | `[dependencies.apt]` | +| Deprecated Format | New Format | +| ------------------------------ | -------------------- | +| `[cerebrium.deployment]` | `[deployment]` | +| `[cerebrium.runtime.cortex]` | `[runtime.auto-py]` | +| `[cerebrium.runtime.python]` | `[runtime.python]` | +| `[cerebrium.runtime.docker]` | `[runtime.docker]` | +| `[cerebrium.hardware]` | `[hardware]` | +| `[cerebrium.scaling]` | `[scaling]` | +| `[cerebrium.dependencies.pip]` | `[dependencies.pip]` | +| `[cerebrium.dependencies.apt]` | `[dependencies.apt]` | ### Deprecated: Runtime fields in [deployment] @@ -517,16 +518,18 @@ The runtime names `cortex` and `python` are deprecated. The old names still work Top-level dependency sections are deprecated. 
Please move dependencies to runtime-specific sections: -| Deprecated Location | New Location | -| ------------------------ | ----------------------------------------- | -| `[dependencies.pip]` | `[runtime.{type}.dependencies.pip]` | -| `[dependencies.apt]` | `[runtime.{type}.dependencies.apt]` | -| `[dependencies.conda]` | `[runtime.{type}.dependencies.conda]` | +| Deprecated Location | New Location | +| ---------------------- | ------------------------------------- | +| `[dependencies.pip]` | `[runtime.{type}.dependencies.pip]` | +| `[dependencies.apt]` | `[runtime.{type}.dependencies.apt]` | +| `[dependencies.conda]` | `[runtime.{type}.dependencies.conda]` | Where `{type}` is your runtime type (e.g., `auto-py`, `python`). -The `[dependencies.paths]` section has been replaced with `_file_relative_path` key inside dependency maps. See the Dependencies section above for the new syntax. + The `[dependencies.paths]` section has been replaced with + `_file_relative_path` key inside dependency maps. See the Dependencies section + above for the new syntax. 
From 846c99e2e8ae2ab5e31109f64dae9a6f8756d092 Mon Sep 17 00:00:00 2001 From: Elijah Roussos Date: Wed, 28 Jan 2026 09:15:00 -0500 Subject: [PATCH 13/16] docs: revert deps to dependencies, use _file_relative_path for requirement files MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Revert deps → dependencies (dependencies is the correct name) - Replace [dependencies.paths] with _file_relative_path key inside dependency sections - Example: [dependencies.pip] with _file_relative_path = "requirements.txt" --- .../container-images/custom-web-servers.mdx | 2 +- .../defining-container-images.mdx | 26 +++++++++---------- cerebrium/hardware/using-cuda.mdx | 2 +- cerebrium/other-topics/faster-cold-starts.mdx | 2 +- cerebrium/scaling/batching-concurrency.mdx | 4 +-- migrations/hugging-face.mdx | 2 +- migrations/mystic.mdx | 2 +- migrations/replicate.mdx | 4 +-- v4/examples/aiVoiceAgents.mdx | 4 +-- v4/examples/asgi-gradio-interface.mdx | 2 +- v4/examples/comfyUI.mdx | 4 +-- ...oy-a-vision-language-model-with-sglang.mdx | 4 +-- v4/examples/high-throughput-embeddings.mdx | 2 +- v4/examples/langchain-langsmith.mdx | 2 +- v4/examples/langchain.mdx | 10 +++---- v4/examples/livekit-outbound-agent.mdx | 4 +-- v4/examples/mistral-vllm.mdx | 10 +++---- .../openai-compatible-endpoint-vllm.mdx | 2 +- v4/examples/realtime-voice-agents.mdx | 4 +-- v4/examples/sdxl.mdx | 6 ++--- v4/examples/streaming-falcon-7B.mdx | 10 +++---- v4/examples/transcribe-whisper.mdx | 10 +++---- v4/examples/twilio-voice-agent.mdx | 2 +- v4/examples/wandb-sweep.mdx | 4 +-- 24 files changed, 62 insertions(+), 62 deletions(-) diff --git a/cerebrium/container-images/custom-web-servers.mdx b/cerebrium/container-images/custom-web-servers.mdx index e89decf7..1cb015f6 100644 --- a/cerebrium/container-images/custom-web-servers.mdx +++ b/cerebrium/container-images/custom-web-servers.mdx @@ -38,7 +38,7 @@ port = 5000 healthcheck_endpoint = "/health" readycheck_endpoint = 
"/ready" -[deps.pip] +[dependencies.pip] pydantic = "latest" numpy = "latest" loguru = "latest" diff --git a/cerebrium/container-images/defining-container-images.mdx b/cerebrium/container-images/defining-container-images.mdx index e647c19a..f4ef52a1 100644 --- a/cerebrium/container-images/defining-container-images.mdx +++ b/cerebrium/container-images/defining-container-images.mdx @@ -72,7 +72,7 @@ The Python version affects the entire dependency chain. For instance, some packa Python dependencies can be managed directly in TOML or through requirement files. The system caches packages to speed up builds: ```toml -[deps.pip] +[dependencies.pip] torch = "==2.0.0" transformers = "==4.30.0" numpy = "latest" @@ -81,8 +81,8 @@ numpy = "latest" Or using an existing requirements file: ```toml -[deps.paths] -pip = "requirements.txt" +[dependencies.pip] +_file_relative_path = "requirements.txt" ``` @@ -94,20 +94,20 @@ The system implements an intelligent caching strategy at the node level. When an ### Adding APT Packages -System-level packages provide the foundation for many ML apps, handling everything from image-processing libraries to audio codecs. These can be added to the `cerebrium.toml` file under the `[deps.apt]` section as follows: +System-level packages provide the foundation for many ML apps, handling everything from image-processing libraries to audio codecs. 
These can be added to the `cerebrium.toml` file under the `[dependencies.apt]` section as follows: ```toml -[deps.apt] +[dependencies.apt] ffmpeg = "latest" libopenblas-base = "latest" libomp-dev = "latest" ``` -For teams with standardized system dependencies, text files can be used instead by adding the following to the `[deps.paths]` section: +For teams with standardized system dependencies, text files can be used instead: ```toml -[deps.paths] -apt = "deps_folder/pkglist.txt" +[dependencies.apt] +_file_relative_path = "deps_folder/pkglist.txt" ``` Since APT packages modify the system environment, any changes to these dependencies trigger a full rebuild of the container image. This ensures system-level changes are properly integrated but means builds will take longer than when modifying Python packages alone. @@ -117,7 +117,7 @@ Since APT packages modify the system environment, any changes to these dependenc Conda excels at managing complex system-level Python dependencies, particularly for GPU support and scientific computing: ```toml -[deps.conda] +[dependencies.conda] cuda = ">=11.7" cudatoolkit = "11.7" opencv = "latest" @@ -126,8 +126,8 @@ opencv = "latest" Teams using conda environments can specify their environment file: ```toml -[deps.paths] -conda = "conda_pkglist.txt" +[dependencies.conda] +_file_relative_path = "conda_pkglist.txt" ``` Like APT packages, Conda packages often modify system-level components. Changes to Conda dependencies will trigger a full rebuild to ensure all binary dependencies and system libraries are correctly configured. Consider batching Conda dependency updates together to minimize rebuild time. @@ -290,7 +290,7 @@ readycheck_endpoint = "/ready" When using the docker runtime, all dependencies and build commands should be - handled within the Dockerfile. The `[deps.*]` sections will be ignored. + handled within the Dockerfile. The `[dependencies.*]` sections will be ignored. 
### Self-Contained Servers @@ -307,7 +307,7 @@ port = 8000 healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[deps.pip] +[dependencies.pip] torch = "latest" vllm = "latest" ``` diff --git a/cerebrium/hardware/using-cuda.mdx b/cerebrium/hardware/using-cuda.mdx index fcccda92..72579e83 100644 --- a/cerebrium/hardware/using-cuda.mdx +++ b/cerebrium/hardware/using-cuda.mdx @@ -15,7 +15,7 @@ CUDA connects apps to graphics cards, splitting large tasks into smaller pieces Many Python packages come with built-in CUDA support. For example, the popular machine learning package PyTorch includes CUDA in its installation: ```toml -[deps.pip] +[dependencies.pip] torch = "latest" # PyTorch with graphics card support ``` diff --git a/cerebrium/other-topics/faster-cold-starts.mdx b/cerebrium/other-topics/faster-cold-starts.mdx index 2ea8cae4..02a21f3b 100644 --- a/cerebrium/other-topics/faster-cold-starts.mdx +++ b/cerebrium/other-topics/faster-cold-starts.mdx @@ -46,7 +46,7 @@ In this section below, we'll show you how to use **Tensorizer** to load your mod ### Installation -Add the following to your `[deps.pip]` in your `cerebrium.toml` file to install Tensorizer in your deployment: +Add the following to your `[dependencies.pip]` in your `cerebrium.toml` file to install Tensorizer in your deployment: ```txt tensorizer = ">=2.7.0" diff --git a/cerebrium/scaling/batching-concurrency.mdx b/cerebrium/scaling/batching-concurrency.mdx index e505202a..0ca6371d 100644 --- a/cerebrium/scaling/batching-concurrency.mdx +++ b/cerebrium/scaling/batching-concurrency.mdx @@ -33,7 +33,7 @@ max_replicas = 2 cooldown = 10 replica_concurrency = 4 # Each container can now handle multiple requests. 
-[deps.pip] +[dependencies.pip] sentencepiece = "latest" torch = "latest" vllm = "latest" @@ -63,7 +63,7 @@ entrypoint = ["python", "app/main.py"] healthcheck_endpoint = "/health" readycheck_endpoint = "/ready" -[deps.pip] +[dependencies.pip] litserve = "latest" fastapi = "latest" ``` diff --git a/migrations/hugging-face.mdx b/migrations/hugging-face.mdx index 28328c2f..089150c4 100644 --- a/migrations/hugging-face.mdx +++ b/migrations/hugging-face.mdx @@ -72,7 +72,7 @@ min_replicas = 0 max_replicas = 5 cooldown = 30 -[deps.pip] +[dependencies.pip] sentencepiece = "latest" torch = "latest" transformers = "latest" diff --git a/migrations/mystic.mdx b/migrations/mystic.mdx index 07724826..ff2e7781 100644 --- a/migrations/mystic.mdx +++ b/migrations/mystic.mdx @@ -83,7 +83,7 @@ max_replicas = 2 # Handle increased traffic and scale up where necessary cooldown = 60 # Time window at reduced concurrency before scaling down replica_concurrency = 1 # The number of requests a single container can support -[deps.pip] +[dependencies.pip] torch = ">=2.0.0" pydantic = "latest" transformers = "latest" diff --git a/migrations/replicate.mdx b/migrations/replicate.mdx index 2cf49aa6..b143e0bb 100644 --- a/migrations/replicate.mdx +++ b/migrations/replicate.mdx @@ -42,14 +42,14 @@ cpu = 2 memory = 12.0 gpu_count = 1 -[deps.pip] +[dependencies.pip] "accelerate" = "latest" "diffusers" = "latest" "torch" = "==2.0.1" "torchvision" = "==0.15.2" "transformers" = "latest" -[deps.apt] +[dependencies.apt] "curl" = "latest" ``` diff --git a/v4/examples/aiVoiceAgents.mdx b/v4/examples/aiVoiceAgents.mdx index 82a32744..f59c4dbb 100644 --- a/v4/examples/aiVoiceAgents.mdx +++ b/v4/examples/aiVoiceAgents.mdx @@ -52,7 +52,7 @@ min_replicas = 1 max_replicas = 5 cooldown = 60 -[deps.pip] +[dependencies.pip] vllm = "latest" pydantic = "latest" ``` @@ -188,7 +188,7 @@ min_replicas = 1 # Note: This incurs a constant cost since at least one instance max_replicas = 2 cooldown = 180 -[deps.pip] 
+[dependencies.pip] torch = ">=2.0.0" "pipecat-ai[silero, daily, openai, deepgram, cartesia]" = "==0.0.67" aiohttp = ">=3.9.4" diff --git a/v4/examples/asgi-gradio-interface.mdx b/v4/examples/asgi-gradio-interface.mdx index f3182edf..e69b884c 100644 --- a/v4/examples/asgi-gradio-interface.mdx +++ b/v4/examples/asgi-gradio-interface.mdx @@ -62,7 +62,7 @@ max_replicas = 2 cooldown = 30 replica_concurrency = 10 -[deps.pip] +[dependencies.pip] gradio = "latest" fastapi = "latest" requests = "latest" diff --git a/v4/examples/comfyUI.mdx b/v4/examples/comfyUI.mdx index 51334fd7..79e4a5cb 100644 --- a/v4/examples/comfyUI.mdx +++ b/v4/examples/comfyUI.mdx @@ -288,7 +288,7 @@ scaling_metric = "concurrency_utilization" scaling_target = 100 scaling_buffer = 0 -[deps.pip] +[dependencies.pip] uvicorn = "latest" fastapi = "latest" requests = "latest" @@ -313,7 +313,7 @@ tqdm = "latest" psutil = "latest" kornia = ">=0.7.1" -[deps.apt] +[dependencies.apt] git = "latest" ``` diff --git a/v4/examples/deploy-a-vision-language-model-with-sglang.mdx b/v4/examples/deploy-a-vision-language-model-with-sglang.mdx index 3dfbf22a..d70c4d8a 100644 --- a/v4/examples/deploy-a-vision-language-model-with-sglang.mdx +++ b/v4/examples/deploy-a-vision-language-model-with-sglang.mdx @@ -90,7 +90,7 @@ max_replicas = 2 [build] use_uv = true -[deps.pip] +[dependencies.pip] transformers = "latest" huggingface_hub = "latest" pydantic = "latest" @@ -101,7 +101,7 @@ torch = "latest" "sgl-kernel" = "latest" "flashinfer-python" = "latest" -[deps.apt] +[dependencies.apt] libnuma-dev = "latest" [runtime.custom] diff --git a/v4/examples/high-throughput-embeddings.mdx b/v4/examples/high-throughput-embeddings.mdx index 5e0ca8dd..539afc0c 100644 --- a/v4/examples/high-throughput-embeddings.mdx +++ b/v4/examples/high-throughput-embeddings.mdx @@ -61,7 +61,7 @@ cooldown = 30 replica_concurrency = 500 scaling_metric = "concurrency_utilization" -[deps.pip] +[dependencies.pip] numpy = "latest" "infinity-emb[all]" = 
"0.0.77" optimum = ">=1.24.0,<2.0.0" diff --git a/v4/examples/langchain-langsmith.mdx b/v4/examples/langchain-langsmith.mdx index 8c3d72a3..c073d157 100644 --- a/v4/examples/langchain-langsmith.mdx +++ b/v4/examples/langchain-langsmith.mdx @@ -144,7 +144,7 @@ Set up Cerebrium: Add these pip packages to your `cerebrium.toml`: ``` -[deps.pip] +[dependencies.pip] pydantic = "latest" langchain = "latest" pytz = "latest" ##this is used for timezones diff --git a/v4/examples/langchain.mdx b/v4/examples/langchain.mdx index 1c44fe35..5d0bf041 100644 --- a/v4/examples/langchain.mdx +++ b/v4/examples/langchain.mdx @@ -17,10 +17,10 @@ First, create your project: cerebrium init 1-langchain-QA ``` -Add these Python packages to the `[deps.pip]` section of your `cerebrium.toml` file: +Add these Python packages to the `[dependencies.pip]` section of your `cerebrium.toml` file: ```toml -[deps.pip] +[dependencies.pip] pytube = "latest" # For audio downloading langchain = "latest" faiss-gpu = "latest" @@ -171,12 +171,12 @@ gpu_count = 1 min_replicas = 0 cooldown = 60 -[deps.apt] +[dependencies.apt] ffmpeg = "latest" "libopenblas-base" = "latest" "libomp-dev" = "latest" -[deps.pip] +[dependencies.pip] pytube = "latest" # For audio downloading langchain = "latest" faiss-gpu = "latest" @@ -185,7 +185,7 @@ openai-whisper = "latest" transformers = ">=4.35.0" sentence_transformers = ">=2.2.0" -[deps.conda] +[dependencies.conda] ``` diff --git a/v4/examples/livekit-outbound-agent.mdx b/v4/examples/livekit-outbound-agent.mdx index bd8a4601..013fa15e 100644 --- a/v4/examples/livekit-outbound-agent.mdx +++ b/v4/examples/livekit-outbound-agent.mdx @@ -406,8 +406,8 @@ max_replicas = 5 cooldown = 30 replica_concurrency = 1 -[deps.paths] -pip = "requirements.txt" +[dependencies.pip] +_file_relative_path = "requirements.txt" [runtime.custom] port = 8600 diff --git a/v4/examples/mistral-vllm.mdx b/v4/examples/mistral-vllm.mdx index da188046..b95ee62f 100644 --- a/v4/examples/mistral-vllm.mdx +++ 
b/v4/examples/mistral-vllm.mdx @@ -23,10 +23,10 @@ First, create your project: cerebrium init 1-faster-inference-with-vllm ``` -Add these Python packages to the `[deps.pip]` section in your `cerebrium.toml` file: +Add these Python packages to the `[dependencies.pip]` section in your `cerebrium.toml` file: ```toml -[deps.pip] +[dependencies.pip] sentencepiece = "latest" torch = ">=2.0.0" vllm = "latest" @@ -133,7 +133,7 @@ min_replicas = 0 max_replicas = 5 cooldown = 60 -[deps.pip] +[dependencies.pip] huggingface-hub = "latest" sentencepiece = "latest" torch = ">=2.0.0" @@ -142,9 +142,9 @@ transformers = ">=4.35.0" accelerate = "latest" xformers = "latest" -[deps.conda] +[dependencies.conda] -[deps.apt] +[dependencies.apt] ffmpeg = "latest" ``` diff --git a/v4/examples/openai-compatible-endpoint-vllm.mdx b/v4/examples/openai-compatible-endpoint-vllm.mdx index 753d770d..c12fd736 100644 --- a/v4/examples/openai-compatible-endpoint-vllm.mdx +++ b/v4/examples/openai-compatible-endpoint-vllm.mdx @@ -24,7 +24,7 @@ cpu = 2 memory = 12.0 compute = "AMPERE_A10" -[deps.pip] +[dependencies.pip] vllm = "latest" pydantic = "latest" ``` diff --git a/v4/examples/realtime-voice-agents.mdx b/v4/examples/realtime-voice-agents.mdx index 173d9410..80fdeb82 100644 --- a/v4/examples/realtime-voice-agents.mdx +++ b/v4/examples/realtime-voice-agents.mdx @@ -55,7 +55,7 @@ min_replicas = 1 max_replicas = 5 cooldown = 60 -[deps.pip] +[dependencies.pip] vllm = "latest" pydantic = "latest" ``` @@ -192,7 +192,7 @@ min_replicas = 1 # Note: This incurs a constant cost since at least one instance max_replicas = 2 cooldown = 180 -[deps.pip] +[dependencies.pip] torch = ">=2.0.0" "pipecat-ai[silero, daily, openai, deepgram, cartesia]" = "==0.0.67" aiohttp = ">=3.9.4" diff --git a/v4/examples/sdxl.mdx b/v4/examples/sdxl.mdx index ea5c3832..dd8cf8cd 100644 --- a/v4/examples/sdxl.mdx +++ b/v4/examples/sdxl.mdx @@ -46,16 +46,16 @@ min_replicas = 0 max_replicas = 5 cooldown = 60 -[deps.pip] 
+[dependencies.pip] accelerate = "latest" transformers = ">=4.35.0" safetensors = "latest" opencv-python = "latest" diffusers = "latest" -[deps.conda] +[dependencies.conda] -[deps.apt] +[dependencies.apt] ffmpeg = "latest" ``` diff --git a/v4/examples/streaming-falcon-7B.mdx b/v4/examples/streaming-falcon-7B.mdx index cb66153e..457f9493 100644 --- a/v4/examples/streaming-falcon-7B.mdx +++ b/v4/examples/streaming-falcon-7B.mdx @@ -23,10 +23,10 @@ First, create your project: cerebrium init 5-streaming-endpoint ``` -Add the following packages to the `[deps.pip]` section of your `cerebrium.toml` file: +Add the following packages to the `[dependencies.pip]` section of your `cerebrium.toml` file: ```toml -[deps.pip] +[dependencies.pip] peft = "git+https://github.com/huggingface/peft.git" transformers = "git+https://github.com/huggingface/transformers.git" accelerate = "git+https://github.com/huggingface/accelerate.git" @@ -158,7 +158,7 @@ min_replicas = 0 max_replicas = 5 cooldown = 60 -[deps.pip] +[dependencies.pip] peft = "git+https://github.com/huggingface/peft.git" transformers = "git+https://github.com/huggingface/transformers.git" accelerate = "git+https://github.com/huggingface/accelerate.git" @@ -167,9 +167,9 @@ sentencepiece = "latest" pydantic = "latest" torch = "2.1.0" -[deps.conda] +[dependencies.conda] -[deps.apt] +[dependencies.apt] ``` diff --git a/v4/examples/transcribe-whisper.mdx b/v4/examples/transcribe-whisper.mdx index d91b9dbf..bec7c569 100644 --- a/v4/examples/transcribe-whisper.mdx +++ b/v4/examples/transcribe-whisper.mdx @@ -17,10 +17,10 @@ First, create your project: cerebrium init 1-whisper-transcription ``` -Add the following packages to the `[deps.pip]` section of your `cerebrium.toml` file: +Add the following packages to the `[dependencies.pip]` section of your `cerebrium.toml` file: ```toml -[deps.pip] +[dependencies.pip] accelerate = "latest" transformers = ">=4.35.0" openai-whisper = "latest" @@ -126,15 +126,15 @@ min_replicas = 0 
max_replicas = 5 cooldown = 60 -[deps.pip] +[dependencies.pip] accelerate = "latest" transformers = ">=4.35.0" openai-whisper = "latest" pydantic = "latest" -[deps.conda] +[dependencies.conda] -[deps.apt] +[dependencies.apt] "ffmpeg" = "latest" ``` diff --git a/v4/examples/twilio-voice-agent.mdx b/v4/examples/twilio-voice-agent.mdx index f31757b6..aed73605 100644 --- a/v4/examples/twilio-voice-agent.mdx +++ b/v4/examples/twilio-voice-agent.mdx @@ -26,7 +26,7 @@ Set up Cerebrium: Add these pip packages to your `cerebrium.toml`: ``` -[deps.pip] +[dependencies.pip] torch = ">=2.0.0" "pipecat-ai[silero, daily, openai, deepgram, cartesia, twilio]" = "0.0.47" aiohttp = ">=3.9.4" diff --git a/v4/examples/wandb-sweep.mdx b/v4/examples/wandb-sweep.mdx index 77790444..36c10a47 100644 --- a/v4/examples/wandb-sweep.mdx +++ b/v4/examples/wandb-sweep.mdx @@ -123,8 +123,8 @@ compute = "ADA_L40" #existing configuration response_grace_period = 3600 -[deps.paths] -pip = "requirements.txt" +[dependencies.pip] +_file_relative_path = "requirements.txt" ``` Install the dependencies locally: From df2c2f35c3832549bf6970f0f47a236526d70cf3 Mon Sep 17 00:00:00 2001 From: elijah-rou Date: Wed, 28 Jan 2026 14:15:22 +0000 Subject: [PATCH 14/16] Prettified Code! --- cerebrium/container-images/defining-container-images.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/cerebrium/container-images/defining-container-images.mdx b/cerebrium/container-images/defining-container-images.mdx index f4ef52a1..6e14a707 100644 --- a/cerebrium/container-images/defining-container-images.mdx +++ b/cerebrium/container-images/defining-container-images.mdx @@ -290,7 +290,8 @@ readycheck_endpoint = "/ready" When using the docker runtime, all dependencies and build commands should be - handled within the Dockerfile. The `[dependencies.*]` sections will be ignored. + handled within the Dockerfile. The `[dependencies.*]` sections will be + ignored. 
### Self-Contained Servers From d1066c44754d7e218f96cecef252c04b3ee4616b Mon Sep 17 00:00:00 2001 From: Elijah Roussos Date: Wed, 28 Jan 2026 13:18:12 -0500 Subject: [PATCH 15/16] docs: revert auto-py back to cortex runtime name MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Revert all auto-py → cortex - Remove unnecessary deprecation notice for runtime name change - Keep cortex as the default runtime name --- .../defining-container-images.mdx | 16 +++--- .../private-docker-registry.mdx | 2 +- cerebrium/endpoints/webhook.mdx | 2 +- cerebrium/hardware/using-cuda.mdx | 2 +- .../other-topics/request-response-logging.mdx | 6 +-- cerebrium/scaling/graceful-termination.mdx | 4 +- cerebrium/scaling/scaling-apps.mdx | 2 +- migrations/hugging-face.mdx | 4 +- migrations/mystic.mdx | 2 +- migrations/replicate.mdx | 2 +- toml-reference/toml-reference.mdx | 50 ++++++++----------- 11 files changed, 42 insertions(+), 50 deletions(-) diff --git a/cerebrium/container-images/defining-container-images.mdx b/cerebrium/container-images/defining-container-images.mdx index 6e14a707..8d3becbf 100644 --- a/cerebrium/container-images/defining-container-images.mdx +++ b/cerebrium/container-images/defining-container-images.mdx @@ -21,7 +21,7 @@ Check out the [Introductory Guide](/cerebrium/getting-started/introduction) for It is possible to initialize an existing project by adding a `cerebrium.toml` file to the root of your codebase, defining your entrypoint (`main.py` if - using the default auto-py runtime, or adding an entrypoint to the runtime + using the default cortex runtime, or adding an entrypoint to the runtime section if using a python or docker runtime) and including the necessary files in the `deployment` section of your `cerebrium.toml` file. @@ -47,7 +47,7 @@ For detailed hardware specifications and performance characteristics see the [GP The Python runtime version forms the foundation of every Cerebrium app. 
We currently support versions 3.10 to 3.13. Specify the Python version in the runtime section of the configuration: ```toml -[runtime.auto-py] +[runtime.cortex] python_version = "3.11" ``` @@ -141,7 +141,7 @@ Cerebrium's build process includes two specialized command types that execute at Pre-build commands execute at the start of the build process, before dependency installation begins. This early execution timing makes them essential for setting up the build environment: ```toml -[runtime.auto-py] +[runtime.cortex] pre_build_commands = [ # Add specialized build tools "curl -o /usr/local/bin/pget -L 'https://github.com/replicate/pget/releases/download/v0.6.2/pget_linux_x86_64'", @@ -156,7 +156,7 @@ Pre-build commands typically handle tasks like installing build tools, configuri Shell commands execute after all dependencies install and the application code copies into the container. This later timing ensures access to the complete environment: ```toml -[runtime.auto-py] +[runtime.cortex] shell_commands = [ # Initialize application resources "python -m download_models", @@ -186,7 +186,7 @@ The base image selection shapes how an app runs in Cerebrium. While the default Cerebrium supports several categories of base images to ensure system compatibility such as nvidia, ubuntu and python images. 
```toml -[runtime.auto-py] +[runtime.cortex] docker_base_image_url = "debian:bookworm-slim" # Default minimal image #docker_base_image_url = "nvidia/cuda:12.0.1-runtime-ubuntu22.04" # CUDA-enabled images #docker_base_image_url = "ubuntu:22.04" # debian images @@ -213,7 +213,7 @@ docker login -u your-dockerhub-username After logging in, you can use the image in your configuration: ```toml -[runtime.auto-py] +[runtime.cortex] docker_base_image_url = "bob/infinity:latest" ``` @@ -234,7 +234,7 @@ docker_base_image_url = "bob/infinity:latest" Public ECR images from the `public.ecr.aws` registry work without authentication: ```toml -[runtime.auto-py] +[runtime.cortex] docker_base_image_url = "public.ecr.aws/lambda/python:3.11" ``` @@ -242,7 +242,7 @@ However, **private ECR images** require authentication. See [Using Private Docke ## Custom Runtimes -While Cerebrium's default auto-py runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or WebSocket connections. +While Cerebrium's default cortex runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or WebSocket connections. 
### Python Runtime (ASGI/WSGI) diff --git a/cerebrium/container-images/private-docker-registry.mdx b/cerebrium/container-images/private-docker-registry.mdx index 54d55962..756298ed 100644 --- a/cerebrium/container-images/private-docker-registry.mdx +++ b/cerebrium/container-images/private-docker-registry.mdx @@ -64,7 +64,7 @@ Edit your cerebrium.toml and enter the url of the registry image from the step 2 [deployment] name = "my-app" -[runtime.auto-py] +[runtime.cortex] python_version = "3.11" docker_base_image_url = "your-registry.com/your-org/your-image:tag" diff --git a/cerebrium/endpoints/webhook.mdx b/cerebrium/endpoints/webhook.mdx index bb8e2f0f..55c984f7 100644 --- a/cerebrium/endpoints/webhook.mdx +++ b/cerebrium/endpoints/webhook.mdx @@ -17,7 +17,7 @@ curl -X POST https://api.aws.us-east-1.cerebrium.ai/v4// - These settings only affect the default auto-py runtime. If you are using a + These settings only affect the default cortex runtime. If you are using a [custom runtime](/cerebrium/container-images/custom-web-servers), you will need to handle logging behavior in your own server implementation. diff --git a/cerebrium/scaling/graceful-termination.mdx b/cerebrium/scaling/graceful-termination.mdx index c671b6dd..d61327ad 100644 --- a/cerebrium/scaling/graceful-termination.mdx +++ b/cerebrium/scaling/graceful-termination.mdx @@ -9,7 +9,7 @@ Cerebrium runs in a shared, multi-tenant environment. To efficiently scale, opti ## Understanding Instance Termination -For both application autoscaling and our own internal node scaling, we will send your application a SIGTERM signal, as a warning to the application that we are intending to shut down this instance. For auto-py applications (Cerebriums default runtime), this is handled. On custom runtimes, should you wish to gracefully shut down, you will need to catch and handle this signal. 
Once at least `response_grace_period` has elapsed, we will send your application a SIGKILL signal, terminating the instance immediately. +For both application autoscaling and our own internal node scaling, we will send your application a SIGTERM signal, as a warning to the application that we are intending to shut down this instance. For cortex applications (Cerebrium's default runtime), this is handled. On custom runtimes, should you wish to gracefully shut down, you will need to catch and handle this signal. Once at least `response_grace_period` has elapsed, we will send your application a SIGKILL signal, terminating the instance immediately. When Cerebrium needs to terminate a container, we do the following: @@ -22,7 +22,7 @@ Below is a chart that shows it more eloquently: ```mermaid flowchart TD - A[SIGTERM sent] --> B[auto-py] + A[SIGTERM sent] --> B[cortex] A --> C[Custom Runtime] B --> D[automatically captured] diff --git a/cerebrium/scaling/scaling-apps.mdx b/cerebrium/scaling/scaling-apps.mdx index ac993deb..2cdd06f6 100644 --- a/cerebrium/scaling/scaling-apps.mdx +++ b/cerebrium/scaling/scaling-apps.mdx @@ -79,7 +79,7 @@ During normal replica operation, this simply corresponds to a request timeout va waits for the specified grace period, issues a SIGKILL command if the instance has not stopped, and kills any active requests with a GatewayTimeout error. - When using the auto-py runtime (default), SIGTERM signals are automatically + When using the cortex runtime (default), SIGTERM signals are automatically handled to allow graceful termination of requests. For custom runtimes, you'll need to implement SIGTERM handling yourself to ensure requests complete gracefully before termination.
See our [Graceful Termination diff --git a/migrations/hugging-face.mdx b/migrations/hugging-face.mdx index 089150c4..a8f8d7c7 100644 --- a/migrations/hugging-face.mdx +++ b/migrations/hugging-face.mdx @@ -58,7 +58,7 @@ name = "llama-8b-vllm" include = ["./*", "main.py", "cerebrium.toml"] exclude = [".*"] -[runtime.auto-py] +[runtime.cortex] python_version = "3.11" docker_base_image_url = "debian:bookworm-slim" @@ -86,7 +86,7 @@ bitsandbytes = "latest" Let's break down this configuration: - `deployment`: Specifies the project name and which files to include/exclude as project files. -- `runtime.auto-py`: Specifies the Python version and base Docker image. +- `runtime.cortex`: Specifies the Python version and base Docker image. - `hardware`: Defines the CPU, memory, and GPU requirements for your deployment. - `scaling`: Configures auto-scaling behavior, including minimum and maximum replicas, and cooldown period. - `dependencies.pip`: Lists the Python packages required for your project. diff --git a/migrations/mystic.mdx b/migrations/mystic.mdx index ff2e7781..5006d7f1 100644 --- a/migrations/mystic.mdx +++ b/migrations/mystic.mdx @@ -67,7 +67,7 @@ name = "stable-diffusion" include = ["./*", "main.py", "cerebrium.toml"] exclude = [".*"] -[runtime.auto-py] +[runtime.cortex] python_version = "3.11" docker_base_image_url = "debian:bookworm-slim" diff --git a/migrations/replicate.mdx b/migrations/replicate.mdx index b143e0bb..bed603a3 100644 --- a/migrations/replicate.mdx +++ b/migrations/replicate.mdx @@ -27,7 +27,7 @@ name = "cog-migration-sdxl" include = ["./*", "main.py", "cerebrium.toml"] exclude = ["./example_exclude"] -[runtime.auto-py] +[runtime.cortex] python_version = "3.11" docker_base_image_url = "nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04" shell_commands = [ diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx index fbd761ed..dbcc5db8 100644 --- a/toml-reference/toml-reference.mdx +++ b/toml-reference/toml-reference.mdx @@ 
-9,7 +9,7 @@ description: Complete reference for all parameters available in Cerebrium's defa | Current (Deprecated) | New Format | | ---------------------------- | ------------------- | | `[cerebrium.deployment]` | `[deployment]` | -| `[cerebrium.runtime.cortex]` | `[runtime.auto-py]` | +| `[cerebrium.runtime.cortex]` | `[runtime.cortex]` | | `[cerebrium.hardware]` | `[hardware]` | | `[cerebrium.scaling]` | `[scaling]` | | `[cerebrium.dependencies]` | `[dependencies]` | @@ -19,7 +19,7 @@ description: Complete reference for all parameters available in Cerebrium's defa The configuration is organized into the following main sections: - **[deployment]** Core settings like app name and file inclusion rules -- **[runtime.auto-py]** Default Cerebrium-managed Python runtime (build settings) +- **[runtime.cortex]** Default Cerebrium-managed Python runtime (build settings) - **[runtime.python]** Custom Python ASGI/WSGI web server settings - **[runtime.docker]** Custom Dockerfile settings - **[hardware]** Compute resources including CPU, memory, and GPU specifications @@ -44,7 +44,7 @@ Cerebrium supports three runtime types. You should only specify one runtime sect -### Auto-Py Runtime (Default) +### Cortex Runtime (Default) -The `[runtime.auto-py]` section configures the default Cerebrium-managed Python runtime. This is ideal for standard Python applications where Cerebrium automatically manages the web server. +The `[runtime.cortex]` section configures the default Cerebrium-managed Python runtime. This is ideal for standard Python applications where Cerebrium automatically manages the web server.
| Option | Type | Default | Description | | --------------------- | -------- | ---------------------- | --------------------------------------------- | @@ -60,7 +60,7 @@ The `[runtime.auto-py]` section configures the default Cerebrium-managed Python [deployment] name = "my-app" -[runtime.auto-py] +[runtime.cortex] python_version = "3.12" docker_base_image_url = "debian:bookworm-slim" use_uv = true @@ -151,7 +151,7 @@ readycheck_endpoint = "/ready" ### UV Package Manager -UV is a fast Python package installer written in Rust that can significantly speed up deployment times. When enabled in `[runtime.auto-py]` or `[runtime.python]`, UV will be used instead of pip for installing Python dependencies. +UV is a fast Python package installer written in Rust that can significantly speed up deployment times. When enabled in `[runtime.cortex]` or `[runtime.python]`, UV will be used instead of pip for installing Python dependencies. UV typically installs packages 10-100x faster than pip, especially beneficial for: @@ -165,7 +165,7 @@ UV typically installs packages 10-100x faster than pip, especially beneficial fo **Example with UV enabled:** ```toml -[runtime.auto-py] +[runtime.cortex] use_uv = true ``` @@ -270,19 +270,19 @@ Dependencies can be specified either at the runtime level (recommended) or at th Dependencies can be specified within the runtime section. 
This is the recommended approach: ```toml -[runtime.auto-py] +[runtime.cortex] python_version = "3.12" -[runtime.auto-py.dependencies.pip] +[runtime.cortex.dependencies.pip] torch = "==2.0.0" # Exact version numpy = "latest" # Latest version pandas = ">=1.5.0" # Minimum version -[runtime.auto-py.dependencies.apt] +[runtime.cortex.dependencies.apt] ffmpeg = "latest" libopenblas-base = "latest" -[runtime.auto-py.dependencies.conda] +[runtime.cortex.dependencies.conda] cuda = ">=11.7" cudatoolkit = "11.7" ``` @@ -292,12 +292,12 @@ cudatoolkit = "11.7" You can also specify a requirements file using the special `_file_relative_path` key. When both a file and inline packages are specified, they are merged (inline packages take precedence): ```toml -[runtime.auto-py.dependencies.pip] +[runtime.cortex.dependencies.pip] _file_relative_path = "requirements.txt" # Base packages from file numpy = "1.24.0" # Override or add packages ``` -This approach works with any runtime type (`auto-py`, `python`, or partner services). +This approach works with any runtime type (`cortex`, `python`, or partner services). **Deprecated:** Top-level `[dependencies.*]` sections are deprecated. Please @@ -371,14 +371,14 @@ disable_auth = false include = ["*"] exclude = [".*"] -[runtime.auto-py] +[runtime.cortex] python_version = "3.12" docker_base_image_url = "debian:bookworm-slim" use_uv = true shell_commands = [] pre_build_commands = [] -[runtime.auto-py.dependencies.pip] +[runtime.cortex.dependencies.pip] torch = "latest" transformers = "latest" @@ -479,7 +479,7 @@ The `cerebrium.` prefix on all section names is deprecated. 
Please migrate to th | Deprecated Format | New Format | | ------------------------------ | -------------------- | | `[cerebrium.deployment]` | `[deployment]` | -| `[cerebrium.runtime.cortex]` | `[runtime.auto-py]` | +| `[cerebrium.runtime.cortex]` | `[runtime.cortex]` | | `[cerebrium.runtime.python]` | `[runtime.python]` | | `[cerebrium.runtime.docker]` | `[runtime.docker]` | | `[cerebrium.hardware]` | `[hardware]` | @@ -493,11 +493,11 @@ The following fields in `[deployment]` are deprecated. Please move them to the a | Deprecated Field | New Location | | --------------------- | ----------------------------------------- | -| python_version | `[runtime.auto-py]` or `[runtime.python]` | -| docker_base_image_url | `[runtime.auto-py]` or `[runtime.python]` | -| shell_commands | `[runtime.auto-py]` or `[runtime.python]` | -| pre_build_commands | `[runtime.auto-py]` or `[runtime.python]` | -| use_uv | `[runtime.auto-py]` or `[runtime.python]` | +| python_version | `[runtime.cortex]` or `[runtime.python]` | +| docker_base_image_url | `[runtime.cortex]` or `[runtime.python]` | +| shell_commands | `[runtime.cortex]` or `[runtime.python]` | +| pre_build_commands | `[runtime.cortex]` or `[runtime.python]` | +| use_uv | `[runtime.cortex]` or `[runtime.python]` | ### Deprecated: [runtime.custom] @@ -506,14 +506,6 @@ The `[runtime.custom]` section is deprecated. Please migrate to: - `[runtime.python]` - For custom Python ASGI/WSGI applications - `[runtime.docker]` - For custom Dockerfile deployments (when using `dockerfile_path`) -### Deprecated: Runtime names `cortex` and `python` - -The runtime names `cortex` and `python` are deprecated. The old names still work for backwards compatibility but will be removed in a future release. - -| Deprecated Name | New Name | -| ------------------ | ------------------- | -| `[runtime.cortex]` | `[runtime.auto-py]` | - ### Deprecated: Top-level [dependencies.*] Top-level dependency sections are deprecated. 
Please move dependencies to runtime-specific sections: @@ -524,7 +516,7 @@ Top-level dependency sections are deprecated. Please move dependencies to runtim | `[dependencies.apt]` | `[runtime.{type}.dependencies.apt]` | | `[dependencies.conda]` | `[runtime.{type}.dependencies.conda]` | -Where `{type}` is your runtime type (e.g., `auto-py`, `python`). +Where `{type}` is your runtime type (e.g., `cortex`, `python`). The `[dependencies.paths]` section has been replaced with From dd9d7b97a345aafcbfdd0d5fbb0a1bf69a8b0874 Mon Sep 17 00:00:00 2001 From: elijah-rou Date: Wed, 28 Jan 2026 18:18:33 +0000 Subject: [PATCH 16/16] Prettified Code! --- toml-reference/toml-reference.mdx | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/toml-reference/toml-reference.mdx b/toml-reference/toml-reference.mdx index dbcc5db8..2859cfc1 100644 --- a/toml-reference/toml-reference.mdx +++ b/toml-reference/toml-reference.mdx @@ -6,13 +6,13 @@ description: Complete reference for all parameters available in Cerebrium's defa **Deprecation Notice:** The `cerebrium.` prefix is being removed from all configuration sections. Sub-keys are becoming top-level keys (e.g., `[cerebrium.deployment]` → `[deployment]`). The prefixed format is still supported for backwards compatibility but will be removed in a future release. We recommend migrating to the new format. 
-| Current (Deprecated) | New Format | -| ---------------------------- | ------------------- | -| `[cerebrium.deployment]` | `[deployment]` | +| Current (Deprecated) | New Format | +| ---------------------------- | ------------------ | +| `[cerebrium.deployment]` | `[deployment]` | | `[cerebrium.runtime.cortex]` | `[runtime.cortex]` | -| `[cerebrium.hardware]` | `[hardware]` | -| `[cerebrium.scaling]` | `[scaling]` | -| `[cerebrium.dependencies]` | `[dependencies]` | +| `[cerebrium.hardware]` | `[hardware]` | +| `[cerebrium.scaling]` | `[scaling]` | +| `[cerebrium.dependencies]` | `[dependencies]` | @@ -479,7 +479,7 @@ The `cerebrium.` prefix on all section names is deprecated. Please migrate to th | Deprecated Format | New Format | | ------------------------------ | -------------------- | | `[cerebrium.deployment]` | `[deployment]` | -| `[cerebrium.runtime.cortex]` | `[runtime.cortex]` | +| `[cerebrium.runtime.cortex]` | `[runtime.cortex]` | | `[cerebrium.runtime.python]` | `[runtime.python]` | | `[cerebrium.runtime.docker]` | `[runtime.docker]` | | `[cerebrium.hardware]` | `[hardware]` | @@ -491,8 +491,8 @@ The `cerebrium.` prefix on all section names is deprecated. Please migrate to th The following fields in `[deployment]` are deprecated. Please move them to the appropriate runtime section: -| Deprecated Field | New Location | -| --------------------- | ----------------------------------------- | +| Deprecated Field | New Location | +| --------------------- | ---------------------------------------- | | python_version | `[runtime.cortex]` or `[runtime.python]` | | docker_base_image_url | `[runtime.cortex]` or `[runtime.python]` | | shell_commands | `[runtime.cortex]` or `[runtime.python]` |