Commit 04d60af

docs: update readme, header, add images
1 parent 05fb3fb · commit 04d60af

File tree: 2 files changed, +27 −15 lines


README.md

Lines changed: 20 additions & 7 deletions
@@ -8,7 +8,7 @@ Computer Vision ML inference in C++
 * Growing number of supported models behind a simple API
 * Modular design for full control and implementing your own models
 
-Based on [GGML](https://github.com/ggml-org/ggml) which also powers the [llama.cpp](https://github.com/ggml-org/llama.cpp) project.
+Based on [ggml](https://github.com/ggml-org/ggml) similar to the [llama.cpp](https://github.com/ggml-org/llama.cpp) project.
 
 ### Features
 
@@ -28,12 +28,11 @@ Get the library and executables:
 
 ### Example: Select an object in an image
 
-Let's use MobileSAM to generate a segmentation mask.
+Let's use MobileSAM to generate a segmentation mask of the plushy on the right by passing in a box describing its approximate location.
 
-<img alt="Example image showing box prompt and mask output" src="docs/media/example-sam.jpg" width="400">
+<img alt="Example image showing box prompt at pixel location (420, 120) -> (650, 430), and the output mask" src="docs/media/example-sam-coords.jpg" width="400">
 
-We target the plushy on the right by passing a box at pixel position (420, 120) → (650, 430).
-Download the model [MobileSAM-F16.gguf](https://huggingface.co/Acly/MobileSAM-GGUF/resolve/main/MobileSAM-F16.gguf) and the [input image](docs/media/input.jpg).
+You can download the model and input image here: [MobileSAM-F16.gguf](https://huggingface.co/Acly/MobileSAM-GGUF/resolve/main/MobileSAM-F16.gguf) | [input.jpg](docs/media/input.jpg)
 
 
 #### CLI
@@ -43,6 +42,7 @@ Find the `vision-cli` executable in the `bin` folder and run it to generate the
 ```sh
 vision-cli -m MobileSAM-F16.gguf -i input.png -p 420 120 650 430 -o mask.png
 ```
+Pass `--composite output.png` to composite input and mask. Use `--help` for more options.
 
 #### API
 
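Taken together, the full command with the composite output would look like this (the file names follow the example above; `output.png` is just an illustrative name):

```sh
vision-cli -m MobileSAM-F16.gguf -i input.png -p 420 120 650 430 -o mask.png --composite output.png
```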
@@ -67,13 +67,14 @@ data to backend devices, post-processing output, etc.
 These can be used as building blocks for flexible functions which integrate
 with your existing data sources and infrastructure.
 
-#### UI
 
 
 ## Models
 
 ### MobileSAM
 
+<img src="docs/media/example-sam.jpg" width="400">
+
 [Model download](https://huggingface.co/Acly/MobileSAM-GGUF/tree/main) | [Paper (arXiv)](https://arxiv.org/pdf/2306.14289.pdf) | [Repository (GitHub)](https://github.com/ChaoningZhang/MobileSAM) | [Segment-Anything-Model](https://segment-anything.com/) | License: Apache-2
 
 ```sh
@@ -82,6 +83,8 @@ vision-cli sam -m MobileSAM-F16.gguf -i input.png -p 300 200 -o mask.png --compo
 
 ### BiRefNet
 
+<img src="docs/media/example-birefnet.png" width="400">
+
 [Model download](https://huggingface.co/Acly/BiRefNet-GGUF/tree/main) | [Paper (arXiv)](https://arxiv.org/pdf/2401.03407) | [Repository (GitHub)](https://github.com/ZhengPeng7/BiRefNet) | License: MIT
 
 ```sh
@@ -90,6 +93,8 @@ vision-cli birefnet -m BiRefNet-lite-F16.gguf -i input.png -o mask.png --composi
 
 ### MI-GAN
 
+<img src="docs/media/example-migan.jpg" width="400">
+
 [Model download](https://huggingface.co/Acly/MIGAN-GGUF/tree/main) | [Paper (thecvf.com)](https://openaccess.thecvf.com/content/ICCV2023/papers/Sargsyan_MI-GAN_A_Simple_Baseline_for_Image_Inpainting_on_Mobile_Devices_ICCV_2023_paper.pdf) | [Repository (GitHub)](https://github.com/Picsart-AI-Research/MI-GAN) | License: MIT
 
 ```sh
@@ -98,6 +103,8 @@ vision-cli migan -m MIGAN-512-places2-F16.gguf -i image.png mask.png -o output.p
 
 ### Real-ESRGAN
 
+<img src="docs/media/example-esrgan.jpg" width="400">
+
 [Model download](https://huggingface.co/Acly/Real-ESRGAN-GGUF) | [Paper (arXiv)](https://arxiv.org/abs/2107.10833) | [Repository (GitHub)](https://github.com/xinntao/Real-ESRGAN) | License: BSD-3-Clause
 
 ```sh
@@ -157,4 +164,10 @@ uv sync
 
 # Run python tests
 uv run pytest
-```
+```
+
+## Acknowledgements
+
+* [ggml](https://github.com/ggml-org/ggml) - ML inference library | MIT
+* [stb-image](https://github.com/nothings/stb) - Image load/save/resize | Public Domain
+* [fmt](https://github.com/fmtlib/fmt) - String formatting _(only if compiler doesn't support <format>)_ | MIT

include/visp/vision.hpp

Lines changed: 7 additions & 8 deletions
@@ -26,8 +26,8 @@
 //
 // Provides a high-level API to run inference on various vision models for
 // common tasks. These operations are built for simplicity and don't provide
-// a lot of options. Rather, you will find below each operation it is split into
-// several steps, which can be used to build more flexible pipelines.
+// a lot of options. If you need more control, you will find each operation
+// split into several steps below, which can be combined in a modular fashion.
 //
 // Basic Use
 // ---------
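To give a feel for the "simple by default" design this comment describes, a high-level call could look roughly like the sketch below. Every identifier in it is an assumed placeholder, not a declaration taken from this header:

```cpp
// Hypothetical sketch only -- all functions below are assumed placeholders,
// not the real API declared in include/visp/vision.hpp.
#include <visp/vision.hpp>

int main() {
    auto model = visp::load_model("MobileSAM-F16.gguf"); // assumed: load weights, prepare backend
    auto image = visp::load_image("input.jpg");          // assumed: decode image from disk
    auto mask  = visp::compute_mask(model, image, {420, 120, 650, 430}); // box prompt from the README
    visp::save_image(mask, "mask.png");                  // assumed: encode result to disk
}
```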
@@ -49,17 +49,16 @@
 //
 // Internally running the model is split into several steps:
 // 1. Load the model weights from a GGUF file.
-// 2. Allocate storage on the backend device and transfer the weights.
-// 3. Detect model hyperparameters and precompute required buffers.
+// 2. Detect model hyperparameters and precompute required buffers.
+// 3. Allocate storage on the backend device and transfer the weights.
 // 4. Build a compute graph for the model architecture.
 // 5. Allocate storage for input, output and intermediate tensors on the backend device.
-// 6. Pre-process the image and transfer it to the backend device.
+// 6. Pre-process the input and transfer it to the backend device.
 // 7. Run the compute graph.
 // 8. Transfer the output to the host and post-process it.
 //
-// You can run all steps individually in order to customize the pipeline. Check the
-// implementation of the high-level API functions to get started.
-//
+// Custom pipelines are simply functions which call the individual steps and extend them
+// where needed. The implementation of the high-level API functions is a good starting point.
 // This allows to:
 // * load model weights from a different source
 // * control exactly when allocation happens
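A custom pipeline in the sense of this comment is then just a function that walks steps 1-8 itself. A loose sketch, with every step and type name assumed rather than taken from the header:

```cpp
// Loose sketch of a custom pipeline following steps 1-8 above.
// All identifiers are assumed placeholders; the implementations of the
// high-level API functions show the real calls.
#include <visp/vision.hpp>

visp::image run_pipeline(char const* gguf_path, visp::image const& input) {
    auto weights = visp::load_gguf(gguf_path);           // 1. load model weights from a GGUF file
    auto params  = visp::detect_params(weights);         // 2. detect hyperparameters, precompute buffers
    auto model   = visp::upload_weights(weights);        // 3. allocate device storage, transfer weights
    auto graph   = visp::build_graph(model, params);     // 4. build the compute graph
    visp::allocate_graph(graph);                         // 5. allocate input/output/intermediate tensors
    visp::set_input(graph, visp::preprocess(input));     // 6. pre-process the input, transfer to device
    visp::compute(graph);                                // 7. run the compute graph
    return visp::postprocess(visp::read_output(graph));  // 8. transfer output to host, post-process
}
```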
