docs: restructure runtime configuration and deprecate legacy formats #262
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -51,40 +51,71 @@ CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192 | |
| When creating a Dockerfile for Cerebrium, there are three key requirements: | ||
|
|
||
| 1. You must expose a port using the `EXPOSE` command - this port will be referenced later in your `cerebrium.toml` configuration | ||
| 2. A `CMD` command is required to specify what runs when the container starts (typically your server process) | ||
| 2. Either a `CMD` or `ENTRYPOINT` directive must be defined in your Dockerfile, or the `entrypoint` key must be set in your `cerebrium.toml` under `[runtime.docker]`. This specifies what runs when the container starts; the TOML configuration takes precedence | ||
| 3. Set the working directory using `WORKDIR` to ensure your application runs from the correct location (defaults to root directory if not specified) | ||
|
Contributor: No WORKDIR in the dockerfile above? Worth putting it in. |
||
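A minimal Dockerfile satisfying all three requirements might look like the following sketch (the FastAPI app, file names, and port are illustrative, not prescribed by Cerebrium):

```dockerfile
FROM python:3.11-slim

# 3. Run from a known location rather than the image root
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# 1. Expose the port that `port` in cerebrium.toml will reference
EXPOSE 8192

# 2. Define what runs at container start (or set `entrypoint` in cerebrium.toml instead)
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192"]
```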
|
|
||
| Update cerebrium.toml to include a custom runtime section with the `dockerfile_path` parameter: | ||
| Update cerebrium.toml to include a docker runtime section with the `dockerfile_path` parameter: | ||
|
|
||
| ```toml | ||
| [cerebrium.runtime.custom] | ||
| [deployment] | ||
| name = "my-docker-app" | ||
|
|
||
| [runtime.docker] | ||
| dockerfile_path = "./Dockerfile" | ||
| port = 8192 | ||
| healthcheck_endpoint = "/health" | ||
| readycheck_endpoint = "/ready" | ||
| dockerfile_path = "./Dockerfile" | ||
| ``` | ||
|
|
||
| The configuration requires three key parameters: | ||
| The configuration requires the following parameters: | ||
|
|
||
| - `dockerfile_path`: The relative path to the Dockerfile used to build the app. | ||
| - `port`: The port the server listens on. | ||
| - `healthcheck_endpoint`: The endpoint used to confirm instance health. If unspecified, defaults to a TCP ping on the configured port. If the health check returns a non-200 response, the instance is considered _unhealthy_ and will be restarted if it does not recover in time. | ||
| - `readycheck_endpoint`: The endpoint used to confirm the instance is ready to receive traffic. If unspecified, defaults to a TCP ping on the configured port. If the ready check returns a non-200 response, the instance will not be a viable target for request routing. | ||
| - `dockerfile_path`: The relative path to the Dockerfile used to build the app. | ||
| - `entrypoint` (optional): The command used to start the application. Required if neither `CMD` nor `ENTRYPOINT` is defined in the given Dockerfile. | ||
|
|
||
| ### Entrypoint Precedence | ||
|
|
||
| If a Dockerfile does not contain a `CMD` clause, specifying the `entrypoint` parameter in the `cerebrium.toml` file is required. | ||
| <Info> | ||
| The `entrypoint` parameter in `cerebrium.toml` **always takes precedence** | ||
| over the `CMD` or `ENTRYPOINT` instruction in your Dockerfile. If you specify | ||
| an `entrypoint` in your TOML configuration, it will be used regardless of what | ||
| `CMD` or `ENTRYPOINT` is defined in your Dockerfile. | ||
| </Info> | ||
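The precedence rule can be sketched as a small resolver. This is illustrative only, not Cerebrium's actual implementation:

```python
def resolve_entrypoint(toml_entrypoint=None, dockerfile_entrypoint=None, dockerfile_cmd=None):
    """Pick the command a container starts with, mirroring the rule above:
    the TOML `entrypoint` always wins; otherwise fall back to the Dockerfile."""
    if toml_entrypoint:
        return toml_entrypoint
    if dockerfile_entrypoint or dockerfile_cmd:
        # Docker concatenates ENTRYPOINT and CMD when both are present
        return (dockerfile_entrypoint or []) + (dockerfile_cmd or [])
    raise ValueError(
        "No entrypoint: define CMD/ENTRYPOINT in the Dockerfile "
        "or `entrypoint` under [runtime.docker] in cerebrium.toml"
    )

# TOML entrypoint overrides the Dockerfile CMD
assert resolve_entrypoint(
    toml_entrypoint=["python", "server.py", "--port", "8192"],
    dockerfile_cmd=["uvicorn", "main:app"],
) == ["python", "server.py", "--port", "8192"]
```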
|
|
||
| If your Dockerfile does not contain a `CMD` or `ENTRYPOINT` instruction, you **must** specify the `entrypoint` parameter in your `cerebrium.toml`: | ||
|
|
||
| ```toml | ||
| [cerebrium.runtime.custom] | ||
| [deployment] | ||
| name = "my-docker-app" | ||
|
|
||
| [runtime.docker] | ||
| dockerfile_path = "./Dockerfile" | ||
| entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192"] | ||
| ... | ||
| port = 8192 | ||
| healthcheck_endpoint = "/health" | ||
| readycheck_endpoint = "/ready" | ||
| ``` | ||
|
|
||
| If you want to override your Dockerfile's `CMD` at deploy time without modifying the Dockerfile, simply add the `entrypoint` parameter to your TOML configuration: | ||
|
|
||
| ```toml | ||
| [deployment] | ||
| name = "my-docker-app" | ||
|
|
||
| [runtime.docker] | ||
| dockerfile_path = "./Dockerfile" | ||
| # This will override any CMD in your Dockerfile | ||
| entrypoint = ["python", "server.py", "--port", "8192"] | ||
| port = 8192 | ||
| ``` | ||
|
|
||
| <Warning> | ||
| When specifying a `dockerfile_path`, all dependencies and necessary commands | ||
| should be installed and executed within the Dockerfile. Dependencies listed | ||
| under `cerebrium.dependencies.*`, as well as | ||
| `cerebrium.deployment.shell_commands` and | ||
| `cerebrium.deployment.pre_build_commands`, will be ignored. | ||
| under `dependencies.*`, as well as `shell_commands` and `pre_build_commands`, | ||
| will be ignored. | ||
| </Warning> | ||
|
|
||
| ## Building Generic Dockerized Apps | ||
|
|
@@ -165,9 +196,12 @@ CMD ["dumb-init", "--", "/rs_server"] | |
| Similarly to the FastAPI webserver, the application should be configured in the `cerebrium.toml` file: | ||
|
|
||
| ```toml | ||
| [cerebrium.runtime.custom] | ||
| [deployment] | ||
| name = "rust-server" | ||
|
|
||
| [runtime.docker] | ||
| dockerfile_path = "./Dockerfile" | ||
| port = 8192 | ||
| healthcheck_endpoint = "/health" | ||
| readycheck_endpoint = "/ready" | ||
| dockerfile_path = "./Dockerfile" | ||
| ``` | ||
|
|
@@ -3,7 +3,7 @@ title: "Custom Python Web Servers" | |
| description: "Run ASGI/WSGI Python apps on Cerebrium" | ||
| --- | ||
|
|
||
| While Cerebrium's default runtime works well for most app needs, teams sometimes need more control over their web server implementation. Using ASGI or WSGI servers through Cerebrium's custom runtime feature enables capabilities like custom authentication, dynamic batching, frontend dashboards, public endpoints, and WebSocket connections. | ||
| While Cerebrium's default runtime works well for most app needs, teams sometimes need more control over their web server implementation. Using ASGI or WSGI servers through Cerebrium's Python runtime feature enables capabilities like custom authentication, dynamic batching, frontend dashboards, public endpoints, and WebSocket connections. | ||
|
|
||
| ## Setting Up Custom Servers | ||
|
|
||
|
|
@@ -26,29 +26,46 @@ def ready(): | |
| return "OK" | ||
| ``` | ||
|
|
||
| Configure this server in `cerebrium.toml` by adding a custom runtime section: | ||
| Configure this server in `cerebrium.toml` by adding a Python runtime section: | ||
|
|
||
| ```toml | ||
| [cerebrium.runtime.custom] | ||
| port = 5000 | ||
| [deployment] | ||
| name = "my-fastapi-app" | ||
|
|
||
| [runtime.python] | ||
| entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000"] | ||
| port = 5000 | ||
| healthcheck_endpoint = "/health" | ||
| readycheck_endpoint = "/ready" | ||
|
|
||
| [cerebrium.dependencies.pip] | ||
| [dependencies.pip] | ||
|
Contributor: Not worth mentioning paths here to point to requirements.txt or apt dependencies? Or a link to the other parts of the doc to see. |
||
| pydantic = "latest" | ||
| numpy = "latest" | ||
| loguru = "latest" | ||
| fastapi = "latest" | ||
| uvicorn = "latest" | ||
| ``` | ||
|
|
||
| The configuration requires three key parameters: | ||
| The configuration requires the following key parameters: | ||
|
|
||
| - `entrypoint`: The command that starts your server | ||
| - `port`: The port your server listens on | ||
| - `healthcheck_endpoint`: The endpoint used to confirm instance health. If unspecified, defaults to a TCP ping on the configured port. If the health check returns a non-200 response, the instance is considered _unhealthy_ and will be restarted if it does not recover in time. | ||
| - `readycheck_endpoint`: The endpoint used to confirm the instance is ready to receive traffic. If unspecified, defaults to a TCP ping on the configured port. If the ready check returns a non-200 response, the instance will not be a viable target for request routing. | ||
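As a framework-agnostic illustration of what these two probes expect, the sketch below serves `/health` and `/ready` using only the standard library. The endpoint names match the configuration above; everything else (class names, the readiness flag) is illustrative:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class ProbeHandler(BaseHTTPRequestHandler):
    ready = False  # flip to True once the app has finished loading

    def do_GET(self):
        if self.path == "/health":
            self.send_response(200)  # any 200 avoids a restart
        elif self.path == "/ready":
            # a non-200 keeps the instance out of request routing
            self.send_response(200 if ProbeHandler.ready else 503)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep probe traffic out of the logs

def serve(port: int = 5000) -> HTTPServer:
    server = HTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```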
|
|
||
| You can also configure build settings in the Python runtime section: | ||
|
Contributor: You mention build settings, but where can I see an exhaustive list? |
||
|
|
||
| ```toml | ||
| [runtime.python] | ||
| python_version = "3.11" | ||
| docker_base_image_url = "debian:bookworm-slim" | ||
| use_uv = true | ||
| entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] | ||
| port = 8000 | ||
| healthcheck_endpoint = "/health" | ||
| readycheck_endpoint = "/ready" | ||
| ``` | ||
|
|
||
| <Info> | ||
| For ASGI applications like FastAPI, include the appropriate server package | ||
| (like `uvicorn`) in your dependencies. After deployment, your endpoints become | ||
|
|
||
|
|
@@ -21,17 +21,17 @@ Check out the [Introductory Guide](/cerebrium/getting-started/introduction) for | |
| <Info> | ||
| It is possible to initialize an existing project by adding a `cerebrium.toml` | ||
|
Contributor: Maybe we can word this better and also make it more obvious; an info block here is not the right place to put it. Maybe we put a note in the Getting Started -> Introduction page about having an existing project. |
||
| file to the root of your codebase, defining your entrypoint (`main.py` if | ||
| using the default runtime, or adding an entrypoint to the .toml file if using | ||
| a custom runtime) and including the necessary files in the `deployment` | ||
| section of your `cerebrium.toml` file. | ||
| using the default cortex runtime, or adding an entrypoint to the runtime | ||
| section if using a python or docker runtime) and including the necessary files | ||
| in the `deployment` section of your `cerebrium.toml` file. | ||
| </Info> | ||
|
|
||
| ## Hardware Configuration | ||
|
|
||
| Cerebrium provides flexible hardware options to match app requirements. The basic configuration specifies GPU type and memory allocations. | ||
|
|
||
| ```toml | ||
| [cerebrium.hardware] | ||
| [hardware] | ||
| compute = "AMPERE_A10" # GPU selection | ||
| memory = 16.0 # Memory allocation in GB | ||
| cpu = 4 # Number of CPU cores | ||
|
|
@@ -44,11 +44,20 @@ For detailed hardware specifications and performance characteristics see the [GP | |
|
|
||
| ### Selecting a Python Version | ||
|
|
||
| The Python runtime version forms the foundation of every Cerebrium app. We currently support versions 3.10 to 3.13. Specify the Python version in the deployment section of the configuration: | ||
| The Python runtime version forms the foundation of every Cerebrium app. We currently support versions 3.10 to 3.13. Specify the Python version in the runtime section of the configuration: | ||
|
Contributor: Is it not worth noting that if they want to use a higher Python version they must look at custom Dockerfile deployments? |
||
|
|
||
| ```toml | ||
| [cerebrium.deployment] | ||
| python_version = 3.11 | ||
| [runtime.cortex] | ||
|
Contributor: Maybe explain Cortex on the opening page; this is the first time they hear about it. Maybe just say Cortex is our Cerebrium-optimized runtime, where they can define multiple configurations through this page, but also Dockerfiles and Python web servers (link to docs). |
||
| python_version = "3.11" | ||
| ``` | ||
|
|
||
| Or for custom Python ASGI/WSGI apps: | ||
|
Contributor: Link to docs. |
||
|
|
||
| ```toml | ||
| [runtime.python] | ||
| python_version = "3.11" | ||
| entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"] | ||
| port = 8000 | ||
| ``` | ||
|
|
||
| The Python version affects the entire dependency chain. For instance, some packages may not support newer Python versions immediately after release. | ||
|
|
@@ -63,7 +72,7 @@ The Python version affects the entire dependency chain. For instance, some packa | |
| Python dependencies can be managed directly in TOML or through requirement files. The system caches packages to speed up builds: | ||
|
|
||
| ```toml | ||
| [cerebrium.dependencies.pip] | ||
| [dependencies.pip] | ||
| torch = "==2.0.0" | ||
| transformers = "==4.30.0" | ||
| numpy = "latest" | ||
|
|
@@ -72,8 +81,8 @@ numpy = "latest" | |
| Or using an existing requirements file: | ||
|
|
||
| ```toml | ||
| [cerebrium.dependencies.paths] | ||
| pip = "requirements.txt" | ||
| [dependencies.pip] | ||
| _file_relative_path = "requirements.txt" | ||
|
Contributor: None of our names start with an underscore; why here?
Contributor: Can I have a path and values together? |
||
| ``` | ||
|
|
||
| <Tip> | ||
|
|
@@ -85,20 +94,20 @@ The system implements an intelligent caching strategy at the node level. When an | |
|
|
||
| ### Adding APT Packages | ||
|
Contributor: Is it worth putting apt and conda before pip in the docs, since they happen before? |
||
|
|
||
| System-level packages provide the foundation for many ML apps, handling everything from image-processing libraries to audio codecs. These can be added to the `cerebrium.toml` file under the `[cerebrium.dependencies.apt]` section as follows: | ||
| System-level packages provide the foundation for many ML apps, handling everything from image-processing libraries to audio codecs. These can be added to the `cerebrium.toml` file under the `[dependencies.apt]` section as follows: | ||
|
|
||
| ```toml | ||
| [cerebrium.dependencies.apt] | ||
| [dependencies.apt] | ||
| ffmpeg = "latest" | ||
| libopenblas-base = "latest" | ||
|
Contributor: Show setting a version here for one of them. |
||
| libomp-dev = "latest" | ||
| ``` | ||
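Picking up the reviewer's suggestion, a pinned version would presumably use the same `package = "version"` form as pip dependencies. The pin below is illustrative; verify the exact accepted APT version syntax against the configuration reference:

```toml
[dependencies.apt]
ffmpeg = "latest"
libopenblas-base = "0.3.21"   # illustrative pin; confirm accepted syntax
libomp-dev = "latest"
```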
|
|
||
| For teams with standardized system dependencies, text files can be used instead by adding the following to the `[cerebrium.dependencies.paths]` section: | ||
| For teams with standardized system dependencies, text files can be used instead: | ||
|
|
||
| ```toml | ||
| [cerebrium.dependencies.paths] | ||
| apt = "deps_folder/pkglist.txt" | ||
| [dependencies.apt] | ||
| _file_relative_path = "deps_folder/pkglist.txt" | ||
| ``` | ||
|
|
||
| Since APT packages modify the system environment, any changes to these dependencies trigger a full rebuild of the container image. This ensures system-level changes are properly integrated but means builds will take longer than when modifying Python packages alone. | ||
|
|
@@ -108,7 +117,7 @@ Since APT packages modify the system environment, any changes to these dependenc | |
| Conda excels at managing complex system-level Python dependencies, particularly for GPU support and scientific computing: | ||
|
|
||
| ```toml | ||
| [cerebrium.dependencies.conda] | ||
| [dependencies.conda] | ||
| cuda = ">=11.7" | ||
| cudatoolkit = "11.7" | ||
| opencv = "latest" | ||
|
|
@@ -117,8 +126,8 @@ opencv = "latest" | |
| Teams using conda environments can specify their environment file: | ||
|
|
||
| ```toml | ||
| [cerebrium.dependencies.paths] | ||
| conda = "conda_pkglist.txt" | ||
| [dependencies.conda] | ||
| _file_relative_path = "conda_pkglist.txt" | ||
| ``` | ||
|
|
||
| Like APT packages, Conda packages often modify system-level components. Changes to Conda dependencies will trigger a full rebuild to ensure all binary dependencies and system libraries are correctly configured. Consider batching Conda dependency updates together to minimize rebuild time. | ||
|
|
@@ -132,8 +141,7 @@ Cerebrium's build process includes two specialized command types that execute at | |
| Pre-build commands execute at the start of the build process, before dependency installation begins. This early execution timing makes them essential for setting up the build environment: | ||
|
Contributor: Suggest "dependency (apt, conda, pip) installation begins". |
||
|
|
||
| ```toml | ||
|
|
||
| [cerebrium.deployment] | ||
| [runtime.cortex] | ||
| pre_build_commands = [ | ||
| # Add specialized build tools | ||
| "curl -o /usr/local/bin/pget -L 'https://github.com/replicate/pget/releases/download/v0.6.2/pget_linux_x86_64'", | ||
|
|
@@ -148,7 +156,7 @@ Pre-build commands typically handle tasks like installing build tools, configuri | |
| Shell commands execute after all dependencies install and the application code copies into the container. This later timing ensures access to the complete environment: | ||
|
|
||
| ```toml | ||
| [cerebrium.deployment] | ||
| [runtime.cortex] | ||
| shell_commands = [ | ||
| # Initialize application resources | ||
| "python -m download_models", | ||
|
|
@@ -178,7 +186,7 @@ The base image selection shapes how an app runs in Cerebrium. While the default | |
| Cerebrium supports several categories of base images to ensure system compatibility such as nvidia, ubuntu and python images. | ||
|
|
||
| ```toml | ||
| [cerebrium.deployment] | ||
| [runtime.cortex] | ||
| docker_base_image_url = "debian:bookworm-slim" # Default minimal image | ||
| #docker_base_image_url = "nvidia/cuda:12.0.1-runtime-ubuntu22.04" # CUDA-enabled images | ||
| #docker_base_image_url = "ubuntu:22.04" # debian images | ||
|
|
@@ -205,7 +213,7 @@ docker login -u your-dockerhub-username | |
| After logging in, you can use the image in your configuration: | ||
|
|
||
| ```toml | ||
| [cerebrium.deployment] | ||
| [runtime.cortex] | ||
| docker_base_image_url = "bob/infinity:latest" | ||
| ``` | ||
|
|
||
|
|
@@ -226,27 +234,30 @@ docker_base_image_url = "bob/infinity:latest" | |
| Public ECR images from the `public.ecr.aws` registry work without authentication: | ||
|
|
||
| ```toml | ||
| [cerebrium.deployment] | ||
| [runtime.cortex] | ||
| docker_base_image_url = "public.ecr.aws/lambda/python:3.11" | ||
| ``` | ||
|
|
||
| However, **private ECR images** require authentication. See [Using Private Docker Registries](/cerebrium/container-images/private-docker-registry) for setup instructions. | ||
|
|
||
| ## Custom Runtimes | ||
|
Contributor: I think it's not worth showing examples in this section; rather write a paragraph explaining, then link to the docs where it's more thorough. |
||
|
|
||
| While Cerebrium's default runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or WebSocket connections. | ||
| While Cerebrium's default cortex runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or WebSocket connections. | ||
|
|
||
| ### Basic Configuration | ||
| ### Python Runtime (ASGI/WSGI) | ||
|
|
||
| Define a custom runtime by adding the `cerebrium.runtime.custom` section to the configuration: | ||
| For custom Python web servers, use the `[runtime.python]` section: | ||
|
|
||
| ```toml | ||
| [cerebrium.runtime.custom] | ||
| [deployment] | ||
| name = "my-fastapi-app" | ||
|
|
||
| [runtime.python] | ||
| python_version = "3.11" | ||
| entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"] | ||
| port = 8080 | ||
| healthcheck_endpoint = "" # Empty string uses TCP health check | ||
| readycheck_endpoint = "" # Empty string uses TCP health check | ||
|
|
||
| ``` | ||
|
|
||
| Key parameters: | ||
|
|
@@ -259,21 +270,45 @@ Key parameters: | |
| <Info> | ||
| Check out [this | ||
| example](https://github.com/CerebriumAI/examples/tree/master/11-python-apps/1-asgi-fastapi-server) | ||
| for a detailed implementation of a FastAPI server that uses a custom runtime. | ||
| for a detailed implementation of a FastAPI server that uses a Python runtime. | ||
| </Info> | ||
|
|
||
| ### Docker Runtime | ||
|
|
||
| For complete control over your container, use the `[runtime.docker]` section with a custom Dockerfile: | ||
|
|
||
| ```toml | ||
| [deployment] | ||
| name = "my-docker-app" | ||
|
|
||
| [runtime.docker] | ||
| dockerfile_path = "./Dockerfile" | ||
| port = 8080 | ||
| healthcheck_endpoint = "/health" | ||
| readycheck_endpoint = "/ready" | ||
| ``` | ||
|
|
||
| <Warning> | ||
| When using the docker runtime, all dependencies and build commands should be | ||
| handled within the Dockerfile. The `[dependencies.*]` sections will be | ||
| ignored. | ||
| </Warning> | ||
|
|
||
| ### Self-Contained Servers | ||
|
|
||
| Custom runtimes also support apps with built-in servers. For example, deploying a VLLM server requires no Python code: | ||
|
|
||
| ```toml | ||
| [cerebrium.runtime.custom] | ||
| entrypoint = "vllm serve meta-llama/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 8000 --device cuda" | ||
| [deployment] | ||
| name = "vllm-server" | ||
|
|
||
| [runtime.python] | ||
| entrypoint = ["vllm", "serve", "meta-llama/Meta-Llama-3-8B-Instruct", "--host", "0.0.0.0", "--port", "8000", "--device", "cuda"] | ||
| port = 8000 | ||
| healthcheck_endpoint = "/health" | ||
| healthcheck_endpoint = "/ready" | ||
| readycheck_endpoint = "/ready" | ||
|
|
||
| [cerebrium.dependencies.pip] | ||
| [dependencies.pip] | ||
| torch = "latest" | ||
| vllm = "latest" | ||
| ``` | ||
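When migrating a string-form `entrypoint` from the legacy format to the list form shown above, the standard-library `shlex.split` produces the equivalent argument list:

```python
import shlex

# Legacy string-form entrypoint from the old [cerebrium.runtime.custom] example
legacy = "vllm serve meta-llama/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 8000 --device cuda"

# List form expected by the new runtime sections
entrypoint = shlex.split(legacy)
print(entrypoint[:2])  # ['vllm', 'serve']
```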
|
|
||
Contributor: The TOML configuration `entrypoint` will take precedence.