Commit 195f2f0

depth-anything: documentation, readme, fix license for base/large models
1 parent 01cfeb0 commit 195f2f0

4 files changed (+45, -13 lines)

README.md

Lines changed: 19 additions & 6 deletions
````diff
@@ -12,14 +12,17 @@ Based on [ggml](https://github.com/ggml-org/ggml) similar to the [llama.cpp](htt
 
 ### Features
 
-| Model | Task | Backends |
-| :-------------------------- | :--------------- | :---------- |
-| [**MobileSAM**](#mobilesam) | Segmentation | CPU, Vulkan |
-| [**BiRefNet**](#birefnet) | Segmentation | CPU, Vulkan |
-| [**MI-GAN**](#mi-gan) | Inpainting | CPU, Vulkan |
-| [**ESRGAN**](#real-esrgan) | Super-resolution | CPU, Vulkan |
+| Model | Task | Backends |
+| :--------------------------------------- | :----------------------- | :---------- |
+| [**MobileSAM**](#mobilesam) | Promptable segmentation | CPU, Vulkan |
+| [**BiRefNet**](#birefnet) | Dichotomous segmentation | CPU, Vulkan |
+| [**Depth-Anything**](#depth-anything-v2) | Depth estimation | CPU, Vulkan |
+| [**MI-GAN**](#mi-gan) | Inpainting | CPU, Vulkan |
+| [**ESRGAN**](#real-esrgan) | Super-resolution | CPU, Vulkan |
 | [_Implement a model [**Guide**]_](docs/model-implementation-guide.md) | | |
 
+**Backbones:** SWIN (v1), DINO (v2), TinyViT
+
 ## Get Started
 
 Get the library and executables:
@@ -92,6 +95,16 @@ vision-cli sam -m MobileSAM-F16.gguf -i input.png -p 300 200 -o mask.png --compo
 vision-cli birefnet -m BiRefNet-lite-F16.gguf -i input.png -o mask.png --composite comp.png
 ```
 
+#### Depth-Anything V2
+
+<img width="400" height="256" alt="example-depth-anything" src="" />
+
+[Model download](https://huggingface.co/Acly/Depth-Anything-GGUF/tree/main) | [Paper (arXiv)](https://arxiv.org/abs/2406.09414) | [Repository (GitHub)](https://github.com/DepthAnything/Depth-Anything-V2) | License: Apache-2 / CC-BY-NC-4
+
+```sh
+vision-cli depth-anything -m Depth-Anything-V2-Small-F16.gguf -i input.png -o depth.png
+```
+
 #### MI-GAN
 
 <img width="400" height="256" alt="example-migan" src="https://github.com/user-attachments/assets/cadf1994-7677-4822-94e5-a2ee6c07621f" />
````

include/visp/vision.h

Lines changed: 15 additions & 6 deletions
```diff
@@ -57,8 +57,9 @@
 // 7. Run the compute graph.
 // 8. Transfer the output to the host and post-process it.
 //
-// Custom pipelines are simply functions which call the individual steps and extend them
-// where needed. The implementation of the high-level API functions is a good starting point.
+// Custom pipelines can be created simply by writing a function that calls the
+// individual steps. As a starting point, check out or copy the implementation
+// of the high-level API functions. Then adapt them as needed.
 // This allows to:
 // * load model weights from a different source
 // * control exactly when allocation happens
@@ -76,10 +77,11 @@
 
 #include <array>
 #include <span>
+#include <vector>
 
 namespace visp {
 
-// SWIN - vision transformer for feature extraction
+// SWIN v1 - vision transformer for feature extraction
 
 constexpr int swin_n_layers = 4;
 
@@ -102,7 +104,7 @@ VISP_API swin_params swin_detect_params(model_file const&);
 VISP_API swin_buffers swin_precompute(model_ref, i32x2 image_extent, swin_params const&);
 VISP_API swin_result swin_encode(model_ref, tensor image, swin_params const&);
 
-// DINO - vision transformer for feature extraction
+// DINO v2 - vision transformer for feature extraction
 
 struct dino_params {
     int patch_size = 16;
@@ -169,7 +171,9 @@ VISP_API image_data sam_process_mask(
 struct birefnet_model;
 
 // Loads a BiRefNet model from GGUF file onto the backend device.
-// * supports BiRefNet, BiRefNet_lite, BiRefNet_Matting variants at 1024px resolution
+// * supports BiRefNet, BiRefNet-lite, BiRefNet-Matting variants at 1024px resolution
+// * supports BiRefNet-HR variant at 2048px resolution
+// * supports BiRefNet-dynamic variant at arbitrary resolution
 VISP_API birefnet_model birefnet_load_model(char const* filepath, backend_device const&);
 
 // Takes RGB input and computes an alpha mask with foreground as 1.0 and background as 0.0.
@@ -203,7 +207,12 @@ VISP_API tensor birefnet_predict(model_ref, tensor image, birefnet_params const&
 
 struct depthany_model;
 
+// Loads a Depth Anything V2 model from GGUF file onto the backend device.
+// * supports Small/Base/Large variants with flexible input resolution
 VISP_API depthany_model depthany_load_model(char const* filepath, backend_device const&);
+
+// Takes RGB input and computes estimated depth (distance from camera).
+// Output is a single-channel float32 image in range [0, 1.0].
 VISP_API image_data depthany_compute(depthany_model&, image_view image);
 
 // --- Depth Anything pipeline
@@ -222,7 +231,7 @@ VISP_API i32x2 depthany_image_extent(i32x2 input_extent, depthany_params const&)
 
 VISP_API image_data depthany_process_input(image_view image, depthany_params const&);
 image_data depthany_process_output(
-    span<float const> output_data, i32x2 target_extent, depthany_params const&);
+    std::span<float const> output_data, i32x2 target_extent, depthany_params const&);
 
 VISP_API tensor depthany_predict(model_ref, tensor image, depthany_params const&);
 
```
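
The new `depthany_compute` comment promises a single-channel float32 result in the range [0, 1.0], but the header does not say how raw network predictions are mapped into that range. The Python sketch below assumes a per-image min-max normalization (a common choice for relative depth models, but an assumption here, not something this diff confirms) and then quantizes the result to 8-bit grayscale, roughly what producing a `depth.png` could involve. `normalize_depth` and `to_uint8` are hypothetical helpers, not part of visp.

```python
from __future__ import annotations

from typing import Sequence


def normalize_depth(raw: Sequence[float]) -> list[float]:
    """Min-max normalize raw depth predictions into [0.0, 1.0].

    Assumption: per-image min-max scaling. The actual visp post-processing
    (depthany_process_output) may use a different mapping.
    """
    lo, hi = min(raw), max(raw)
    value_range = hi - lo
    if value_range == 0.0:
        # Constant prediction carries no relative depth information: map to 0.
        return [0.0 for _ in raw]
    return [(v - lo) / value_range for v in raw]


def to_uint8(depth01: Sequence[float]) -> bytes:
    """Quantize [0, 1] values to 8-bit grayscale, clamping out-of-range input."""
    return bytes(round(min(max(v, 0.0), 1.0) * 255) for v in depth01)


if __name__ == "__main__":
    d = normalize_depth([2.0, 4.0, 6.0, 10.0])
    print(d)  # [0.0, 0.25, 0.5, 1.0]
    print(list(to_uint8(d)))  # [0, 64, 128, 255]
```

Normalizing per image makes the output easy to visualize but discards absolute scale, which is acceptable for relative depth models and would be the wrong choice for metric depth.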

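`depthany_image_extent` maps an input extent to the resolution the network actually runs at, which is what "flexible input resolution" in the new loader comment refers to; its implementation is not part of this diff. For orientation, the sketch below reproduces the resize rule used by the upstream Depth-Anything-V2 Python preprocessing: keep the aspect ratio, scale so that both sides are at least 518 px, and snap each side to a multiple of the ViT patch size 14. Whether visp uses the same base size and rounding is an assumption, and `model_input_extent` is a hypothetical name.

```python
from __future__ import annotations

import math


def snap_to_multiple(x: float, multiple: int, min_val: int) -> int:
    """Round x to the nearest multiple, rounding up instead if that falls below min_val."""
    y = round(x / multiple) * multiple
    if y < min_val:
        y = math.ceil(x / multiple) * multiple
    return y


def model_input_extent(width: int, height: int, base: int = 518, patch: int = 14) -> tuple[int, int]:
    """Hypothetical helper: aspect-preserving 'lower bound' resize.

    Assumption: mirrors upstream Depth-Anything-V2 (base size 518, patch 14);
    visp's depthany_image_extent may differ.
    """
    # Pick the larger scale so that neither side ends up smaller than `base`.
    scale = max(base / width, base / height)
    new_w = snap_to_multiple(scale * width, patch, min_val=base)
    new_h = snap_to_multiple(scale * height, patch, min_val=base)
    return new_w, new_h


if __name__ == "__main__":
    print(model_input_extent(640, 480))    # (686, 518)
    print(model_input_extent(1920, 1080))  # (924, 518)
```

For a 640x480 input the short side lands exactly on 518 and the long side (about 690.7 px after scaling) is rounded to the nearest multiple of 14, giving 686. Snapping to the patch size keeps the image divisible into whole 14x14 patches for the ViT encoder.
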
models/CMakeLists.txt

Lines changed: 7 additions & 0 deletions
```diff
@@ -14,6 +14,13 @@ file(DOWNLOAD
     EXPECTED_HASH "SHA256=7b5397a2c98d66677f8f74317774bbeac49dbb321b8a3dc744af913db71d4fa5"
     SHOW_PROGRESS
 )
+message(STATUS "Checking for models/Depth-Anything-V2-Small-F16.gguf")
+file(DOWNLOAD
+    "https://huggingface.co/Acly/Depth-Anything-V2-GGUF/resolve/main/Depth-Anything-V2-Small-F16.gguf"
+    ${CMAKE_CURRENT_LIST_DIR}/Depth-Anything-V2-Small-F16.gguf
+    EXPECTED_HASH "SHA256=0f83332d6a8b4375cd7fdcc168f3e3636f474f8e84b0959e903f513aace782f5"
+    SHOW_PROGRESS
+)
 message(STATUS "Checking for models/MIGAN-512-places2-F16.gguf")
 file(DOWNLOAD
     "https://huggingface.co/Acly/MIGAN-GGUF/resolve/main/MIGAN-512-places2-F16.gguf"
```
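
The `file(DOWNLOAD ... EXPECTED_HASH "SHA256=...")` entries make CMake check each downloaded model against a known SHA-256 digest. To check a model file obtained some other way (for example, downloaded manually from Hugging Face) against the same value, a small Python sketch using only `hashlib` could look like this. `sha256_of` and `verify_sha256` are hypothetical helpers, not part of the repository.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare against a digest written CMake-style, e.g. "SHA256=0f83...".

    The "SHA256=" prefix is optional; the comparison is case-insensitive.
    """
    algo, sep, digest = expected.partition("=")
    if sep:
        if algo.upper() != "SHA256":
            raise ValueError(f"unsupported hash algorithm: {algo}")
    else:
        digest = expected
    return sha256_of(path) == digest.strip().lower()
```

Usage, with the digest taken from the CMake entry above: `verify_sha256(Path("models/Depth-Anything-V2-Small-F16.gguf"), "SHA256=0f83332d6a8b4375cd7fdcc168f3e3636f474f8e84b0959e903f513aace782f5")`. Reading in chunks keeps memory use flat even for multi-hundred-megabyte GGUF files.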

scripts/convert.py

Lines changed: 4 additions & 1 deletion
```diff
@@ -354,7 +354,10 @@ def convert_birefnet(input_filepath: Path, writer: Writer):
 
 
 def convert_depth_anything(input_filepath: Path, writer: Writer):
-    writer.add_license("apache-2.0")
+    if "small" in input_filepath.name.lower():
+        writer.add_license("apache-2.0")
+    else:
+        writer.add_license("cc-by-nc-4.0")
     writer.set_tensor_layout_default(TensorLayout.nchw)
 
     model: dict[str, Tensor] = load_model(input_filepath)
```
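
The convert.py change chooses the embedded license from the input file name: the Small variant is Apache-2.0, while Base and Large are CC-BY-NC-4.0. The substring test only matches names containing "small", so if checkpoints are passed under the upstream Depth-Anything-V2 naming (`depth_anything_v2_vits.pth`, `depth_anything_v2_vitb.pth`, `depth_anything_v2_vitl.pth`), a Small checkpoint would fall into the CC-BY-NC branch. A slightly more defensive variant, sketched under that assumption, also recognizes the encoder tags and refuses names it cannot classify. `license_for_checkpoint` is a hypothetical name, and the converter's actual expected input naming is not confirmed by this diff.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed variant tags: Hugging Face style names ("Small") and upstream encoder tags ("vits").
APACHE_TAGS = {"small", "vits"}
NONCOMMERCIAL_TAGS = {"base", "large", "vitb", "vitl"}


def license_for_checkpoint(path: Path) -> str:
    """Return the license id to embed for a Depth Anything V2 checkpoint."""
    # Split the file name on non-alphanumeric characters: "V2-Small-F16" -> {"v2", "small", "f16"}.
    tokens = set(re.split(r"[^a-z0-9]+", path.name.lower()))
    if tokens & APACHE_TAGS:
        return "apache-2.0"
    if tokens & NONCOMMERCIAL_TAGS:
        return "cc-by-nc-4.0"
    # Unknown variant: fail loudly instead of silently picking a license.
    raise ValueError(f"cannot infer Depth Anything V2 variant from {path.name!r}")
```

Raising on unrecognized names trades convenience for safety: a generic file name such as `model.safetensors` stops the conversion instead of producing a GGUF with a possibly wrong license tag.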
