diff --git a/content/manuals/ai/model-runner/_index.md b/content/manuals/ai/model-runner/_index.md
index f65a76613f7..cf109948099 100644
--- a/content/manuals/ai/model-runner/_index.md
+++ b/content/manuals/ai/model-runner/_index.md
@@ -17,8 +17,8 @@ aliases:
 Docker Model Runner (DMR) makes it easy to manage, run, and deploy AI models
 using Docker. Designed for developers, Docker Model Runner streamlines the
 process of pulling, running, and serving
-large language models (LLMs) and other AI models directly from Docker Hub or any
-OCI-compliant registry.
+large language models (LLMs) and other AI models directly from Docker Hub,
+any OCI-compliant registry, or [Hugging Face](https://huggingface.co/).
 
 With seamless integration into Docker Desktop and Docker Engine, you can
 serve models via OpenAI and Ollama-compatible APIs, package GGUF files as
@@ -32,7 +32,8 @@ with AI models locally.
 
 ## Key features
 
-- [Pull and push models to and from Docker Hub](https://hub.docker.com/u/ai)
+- [Pull and push models to and from Docker Hub or any OCI-compliant registry](https://hub.docker.com/u/ai)
+- [Pull models from Hugging Face](https://huggingface.co/)
 - Serve models on [OpenAI and Ollama-compatible APIs](api-reference.md) for easy integration with existing apps
 - Support for [llama.cpp, vLLM, and Diffusers inference engines](inference-engines.md) (vLLM and Diffusers on Linux with NVIDIA GPUs)
 - [Generate images from text prompts](inference-engines.md#diffusers) using Stable Diffusion models with the Diffusers backend
@@ -81,11 +82,12 @@ Docker Engine only:
 
 ## How Docker Model Runner works
 
-Models are pulled from Docker Hub the first time you use them and are stored
-locally. They load into memory only at runtime when a request is made, and
-unload when not in use to optimize resources. Because models can be large, the
-initial pull may take some time. After that, they're cached locally for faster
-access. You can interact with the model using
+Models are pulled from Docker Hub, an OCI-compliant registry, or
+[Hugging Face](https://huggingface.co/) the first time you use them and are
+stored locally. They load into memory only at runtime when a request is made,
+and unload when not in use to optimize resources. Because models can be large,
+the initial pull may take some time. After that, they're cached locally for
+faster access. You can interact with the model using
 [OpenAI and Ollama-compatible APIs](api-reference.md).
 
 ### Inference engines
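
To make the new Hugging Face pull path and the pull-on-first-use behavior described in this patch concrete, here is a minimal sketch. The `hf.co/...` model reference is an example only, and the `curl` call assumes TCP host access to Docker Model Runner is enabled on its default port 12434; substitute your own model and endpoint as needed.

```console
# Pull a GGUF model straight from Hugging Face using an hf.co/ reference
# (example repository; substitute any GGUF model you want to run).
$ docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF

# Run a one-off prompt. The first request loads the model into memory;
# it unloads again when idle.
$ docker model run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF "Say hello"

# Query the OpenAI-compatible endpoint (assumes TCP host access is
# enabled on the default port 12434).
$ curl http://localhost:12434/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF",
          "messages": [{"role": "user", "content": "Say hello"}]
        }'
```

Because the model is cached locally after the first pull, the second and later invocations skip the download and only pay the load-into-memory cost on the first request.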