62 changes: 48 additions & 14 deletions cerebrium/container-images/custom-dockerfiles.mdx
```dockerfile
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192"]
```
When creating a Dockerfile for Cerebrium, there are three key requirements:

1. You must expose a port using the `EXPOSE` command - this port will be referenced later in your `cerebrium.toml` configuration
2. Either a `CMD` or `ENTRYPOINT` directive must be defined in your Dockerfile, or an `entrypoint` key must be set under `[runtime.docker]` in your `cerebrium.toml`. This specifies what runs when the container starts; the TOML `entrypoint` takes precedence
3. Set the working directory using `WORKDIR` to ensure your application runs from the correct location (defaults to root directory if not specified)
> **Review comment:** No `WORKDIR` in the Dockerfile above? Worth putting it in.

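Putting the three requirements together, a minimal Dockerfile might look like the following sketch. The base image, file names, and port here are illustrative assumptions; adapt them to your own app:

```dockerfile
FROM python:3.11-slim

# 3. Run from a known location instead of the root directory
WORKDIR /app

# Install dependencies inside the image (dependency sections in
# cerebrium.toml are ignored when a Dockerfile is used)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# 1. Expose the port referenced in cerebrium.toml
EXPOSE 8192

# 2. Define what runs on container start (a TOML `entrypoint` overrides this)
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192"]
```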

Update cerebrium.toml to include a docker runtime section with the `dockerfile_path` parameter:

```toml
[deployment]
name = "my-docker-app"

[runtime.docker]
dockerfile_path = "./Dockerfile"
port = 8192
healthcheck_endpoint = "/health"
readycheck_endpoint = "/ready"
```

The configuration requires the following parameters:

- `dockerfile_path`: The relative path to the Dockerfile used to build the app.
- `port`: The port the server listens on.
- `healthcheck_endpoint`: The endpoint used to confirm instance health. If unspecified, defaults to a TCP ping on the configured port. If the health check returns a non-200 response, the instance is considered _unhealthy_ and is restarted if it does not recover in time.
- `readycheck_endpoint`: The endpoint used to confirm the instance is ready to receive requests. If unspecified, defaults to a TCP ping on the configured port. If the ready check returns a non-200 response, the instance is not a viable target for request routing.
- `entrypoint` (optional): The command used to start the application. Required if neither `CMD` nor `ENTRYPOINT` is defined in the given Dockerfile.

### Entrypoint Precedence

<Info>
The `entrypoint` parameter in `cerebrium.toml` **always takes precedence**
over the `CMD` or `ENTRYPOINT` instruction in your Dockerfile. If you specify
an `entrypoint` in your TOML configuration, it will be used regardless of what
`CMD` or `ENTRYPOINT` is defined in your Dockerfile.
</Info>

If your Dockerfile does not contain a `CMD` or `ENTRYPOINT` instruction, you **must** specify the `entrypoint` parameter in your `cerebrium.toml`:

```toml
[deployment]
name = "my-docker-app"

[runtime.docker]
dockerfile_path = "./Dockerfile"
entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8192"]
port = 8192
healthcheck_endpoint = "/health"
readycheck_endpoint = "/ready"
```

If you want to override your Dockerfile's `CMD` at deploy time without modifying the Dockerfile, add the `entrypoint` parameter to your TOML configuration:

```toml
[deployment]
name = "my-docker-app"

[runtime.docker]
dockerfile_path = "./Dockerfile"
# This will override any CMD in your Dockerfile
entrypoint = ["python", "server.py", "--port", "8192"]
port = 8192
```
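The precedence rules above can be summarized as a small decision function. This is a hypothetical sketch of the described behaviour, not Cerebrium's actual implementation:

```python
def resolve_entrypoint(toml_entrypoint, dockerfile_cmd):
    """TOML entrypoint wins; otherwise fall back to the Dockerfile's
    CMD/ENTRYPOINT; error out if neither is defined (sketch only)."""
    if toml_entrypoint:
        return toml_entrypoint
    if dockerfile_cmd:
        return dockerfile_cmd
    raise ValueError(
        "no entrypoint: define CMD/ENTRYPOINT in the Dockerfile "
        "or set entrypoint in cerebrium.toml"
    )

# The TOML entrypoint overrides the Dockerfile CMD:
print(resolve_entrypoint(["python", "server.py"], ["uvicorn", "main:app"]))
# → ['python', 'server.py']
```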

<Warning>
When specifying a `dockerfile_path`, all dependencies and necessary commands
should be installed and executed within the Dockerfile. Dependencies listed
under `dependencies.*`, as well as `shell_commands` and `pre_build_commands`,
will be ignored.
</Warning>

## Building Generic Dockerized Apps
```dockerfile
CMD ["dumb-init", "--", "/rs_server"]
```
Similarly to the FastAPI webserver, the application should be configured in the `cerebrium.toml` file:

```toml
[deployment]
name = "rust-server"

[runtime.docker]
dockerfile_path = "./Dockerfile"
port = 8192
healthcheck_endpoint = "/health"
readycheck_endpoint = "/ready"
```
29 changes: 23 additions & 6 deletions cerebrium/container-images/custom-web-servers.mdx
---
title: "Custom Python Web Servers"
description: "Run ASGI/WSGI Python apps on Cerebrium"
---

While Cerebrium's default runtime works well for most app needs, teams sometimes need more control over their web server implementation. Using ASGI or WSGI servers through Cerebrium's Python runtime feature enables capabilities like custom authentication, dynamic batching, frontend dashboards, public endpoints, and WebSocket connections.

## Setting Up Custom Servers

```python
def ready():
    return "OK"
```

Configure this server in `cerebrium.toml` by adding a Python runtime section:

```toml
[deployment]
name = "my-fastapi-app"

[runtime.python]
entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000"]
port = 5000
healthcheck_endpoint = "/health"
readycheck_endpoint = "/ready"

[dependencies.pip]
# Review comment: not worth mentioning paths here to point to requirements.txt
# or apt dependencies; or add a link to other parts of the docs.
pydantic = "latest"
numpy = "latest"
loguru = "latest"
fastapi = "latest"
uvicorn = "latest"
```

The configuration requires the following key parameters:

- `entrypoint`: The command that starts your server.
- `port`: The port your server listens on.
- `healthcheck_endpoint`: The endpoint used to confirm instance health. If unspecified, defaults to a TCP ping on the configured port. If the health check returns a non-200 response, the instance is considered _unhealthy_ and is restarted if it does not recover in time.
- `readycheck_endpoint`: The endpoint used to confirm the instance is ready to receive requests. If unspecified, defaults to a TCP ping on the configured port. If the ready check returns a non-200 response, the instance is not a viable target for request routing.
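The TCP-ping fallback mentioned above can be illustrated with a short stdlib sketch. This mirrors the idea, not the platform's actual probe:

```python
import socket

def tcp_ping(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With no `healthcheck_endpoint` configured, an instance is treated as healthy as soon as its port accepts connections; an HTTP endpoint lets you signal health more precisely.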

You can also configure build settings in the Python runtime section:
> **Review comment:** You mention build settings, but where can I see an exhaustive list?


```toml
[runtime.python]
python_version = "3.11"
docker_base_image_url = "debian:bookworm-slim"
use_uv = true
entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
port = 8000
healthcheck_endpoint = "/health"
readycheck_endpoint = "/ready"
```

<Info>
For ASGI applications like FastAPI, include the appropriate server package
(like `uvicorn`) in your dependencies. After deployment, your endpoints become
available.
</Info>
103 changes: 69 additions & 34 deletions cerebrium/container-images/defining-container-images.mdx
Check out the [Introductory Guide](/cerebrium/getting-started/introduction).
<Info>
It is possible to initialize an existing project by adding a `cerebrium.toml`
file to the root of your codebase, defining your entrypoint (`main.py` if
using the default cortex runtime, or adding an entrypoint to the runtime
section if using a python or docker runtime) and including the necessary files
in the `deployment` section of your `cerebrium.toml` file.
</Info>

## Hardware Configuration

Cerebrium provides flexible hardware options to match app requirements. The basic configuration specifies GPU type and memory allocations.

```toml
[hardware]
compute = "AMPERE_A10" # GPU selection
memory = 16.0 # Memory allocation in GB
cpu = 4 # Number of CPU cores
```

For detailed hardware specifications and performance characteristics, see the GPU selection documentation.

### Selecting a Python Version

The Python runtime version forms the foundation of every Cerebrium app. We currently support versions 3.10 to 3.13. Specify the Python version in the runtime section of the configuration:
> **Review comment:** Is it worth noting that if they want a higher Python version they must look at custom Dockerfile deployments?


```toml
[runtime.cortex]
# Review comment: explain Cortex earlier; this is the first time it is mentioned.
# Cortex is Cerebrium's optimized default runtime, alongside Dockerfile and
# Python web server runtimes (link to docs).
python_version = "3.11"
```

Or for custom Python ASGI/WSGI apps:
> **Review comment:** Link to docs.


```toml
[runtime.python]
python_version = "3.11"
entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
port = 8000
```

The Python version affects the entire dependency chain. For instance, some packages may not support newer Python versions immediately after release.
Python dependencies can be managed directly in TOML or through requirement files. The system caches packages to speed up builds:

```toml
[dependencies.pip]
torch = "==2.0.0"
transformers = "==4.30.0"
numpy = "latest"
```
Or using an existing requirements file:

```toml
[dependencies.pip]
_file_relative_path = "requirements.txt"
# Review comment: none of our names start with an underscore; why here?
# Review comment: can I have a path and inline values together?
```
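The mapping between a `requirements.txt` file and the inline `[dependencies.pip]` table is mechanical: pinned lines keep their version specifier, and bare names map to `"latest"`. A hypothetical converter sketch (not part of Cerebrium's tooling):

```python
def requirements_to_pip_table(lines):
    """Map requirements.txt-style lines to {name: version-spec} entries
    matching the inline [dependencies.pip] style (illustrative only)."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        for op in ("==", ">=", "<=", "~="):
            if op in line:
                name, _, version = line.partition(op)
                table[name.strip()] = f"{op}{version.strip()}"
                break
        else:
            table[line] = "latest"  # no pin, so "latest"
    return table

print(requirements_to_pip_table(["torch==2.0.0", "numpy"]))
# → {'torch': '==2.0.0', 'numpy': 'latest'}
```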

The system implements an intelligent caching strategy at the node level.

### Adding APT Packages
> **Review comment:** Is it worth putting apt and conda before pip in the docs, since they happen before?


System-level packages provide the foundation for many ML apps, handling everything from image-processing libraries to audio codecs. These can be added to the `cerebrium.toml` file under the `[dependencies.apt]` section as follows:

```toml
[dependencies.apt]
ffmpeg = "latest"
libopenblas-base = "latest"
# Review comment: show setting a pinned version here for one of these.
libomp-dev = "latest"
```

For teams with standardized system dependencies, text files can be used instead:

```toml
[dependencies.apt]
_file_relative_path = "deps_folder/pkglist.txt"
```

Since APT packages modify the system environment, any changes to these dependencies trigger a full rebuild of the container image. This ensures system-level changes are properly integrated but means builds will take longer than when modifying Python packages alone.
Conda excels at managing complex system-level Python dependencies, particularly for GPU support and scientific computing:

```toml
[dependencies.conda]
cuda = ">=11.7"
cudatoolkit = "11.7"
opencv = "latest"
```
Teams using conda environments can specify their environment file:

```toml
[dependencies.conda]
_file_relative_path = "conda_pkglist.txt"
```

Like APT packages, Conda packages often modify system-level components. Changes to Conda dependencies will trigger a full rebuild to ensure all binary dependencies and system libraries are correctly configured. Consider batching Conda dependency updates together to minimize rebuild time.
Cerebrium's build process includes two specialized command types that execute at different stages of the build.
Pre-build commands execute at the start of the build process, before dependency (apt, conda, pip) installation begins. This early execution timing makes them essential for setting up the build environment:


```toml
[runtime.cortex]
pre_build_commands = [
# Add specialized build tools
"curl -o /usr/local/bin/pget -L 'https://github.com/replicate/pget/releases/download/v0.6.2/pget_linux_x86_64'",
]
```

Pre-build commands typically handle tasks like installing build tools and configuring the build environment.
Shell commands execute after all dependencies install and the application code copies into the container. This later timing ensures access to the complete environment:

```toml
[runtime.cortex]
shell_commands = [
# Initialize application resources
"python -m download_models",
]
```

The base image selection shapes how an app runs in Cerebrium.
Cerebrium supports several categories of base images to ensure system compatibility, such as NVIDIA, Ubuntu, and Python images.

```toml
[runtime.cortex]
docker_base_image_url = "debian:bookworm-slim" # Default minimal image
#docker_base_image_url = "nvidia/cuda:12.0.1-runtime-ubuntu22.04" # CUDA-enabled images
#docker_base_image_url = "ubuntu:22.04" # debian images
Expand All @@ -205,7 +213,7 @@ docker login -u your-dockerhub-username
After logging in, you can use the image in your configuration:

```toml
[runtime.cortex]
docker_base_image_url = "bob/infinity:latest"
```

Public ECR images from the `public.ecr.aws` registry work without authentication:

```toml
[runtime.cortex]
docker_base_image_url = "public.ecr.aws/lambda/python:3.11"
```

However, **private ECR images** require authentication. See [Using Private Docker Registries](/cerebrium/container-images/private-docker-registry) for setup instructions.

## Custom Runtimes
> **Review comment:** Not worth showing examples in this section; rather write a paragraph explaining and link to the more thorough docs.


While Cerebrium's default cortex runtime works well for most apps, teams often need more control over their server implementation. Custom runtimes enable features like custom authentication, dynamic batching, public endpoints, or WebSocket connections.

### Python Runtime (ASGI/WSGI)

For custom Python web servers, use the `[runtime.python]` section:

```toml
[deployment]
name = "my-fastapi-app"

[runtime.python]
python_version = "3.11"
entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
port = 8080
healthcheck_endpoint = "" # Empty string uses TCP health check
readycheck_endpoint = "" # Empty string uses TCP health check

```

Key parameters:
<Info>
Check out [this
example](https://github.com/CerebriumAI/examples/tree/master/11-python-apps/1-asgi-fastapi-server)
for a detailed implementation of a FastAPI server that uses a Python runtime.
</Info>

### Docker Runtime

For complete control over your container, use the `[runtime.docker]` section with a custom Dockerfile:

```toml
[deployment]
name = "my-docker-app"

[runtime.docker]
dockerfile_path = "./Dockerfile"
port = 8080
healthcheck_endpoint = "/health"
readycheck_endpoint = "/ready"
```

<Warning>
When using the docker runtime, all dependencies and build commands should be
handled within the Dockerfile. The `[dependencies.*]` sections will be
ignored.
</Warning>

### Self-Contained Servers

Custom runtimes also support apps with built-in servers. For example, deploying a VLLM server requires no Python code:

```toml
[deployment]
name = "vllm-server"

[runtime.python]
entrypoint = ["vllm", "serve", "meta-llama/Meta-Llama-3-8B-Instruct", "--host", "0.0.0.0", "--port", "8000", "--device", "cuda"]
port = 8000
healthcheck_endpoint = "/ready"
readycheck_endpoint = "/ready"

[dependencies.pip]
torch = "latest"
vllm = "latest"
```