
Commit cd37164

Merge pull request #24 from m5stack/dev
Dev
2 parents 07662b8 + 9f34887 commit cd37164

2,162 files changed: +116,431 -818,916 lines


README_zh.md

Lines changed: 48 additions & 0 deletions
@@ -14,6 +14,7 @@
 
 * [Features](#特性)
 * [Demo](#demo)
+* [Model List](#模型列表)
 * [Requirements](#环境要求)
 * [Build](#编译)
 * [Install](#安装)
@@ -54,6 +55,53 @@ Main working modes of the StackFlow voice assistant:
 - [StackFlow yolo visual detection](https://github.com/Abandon-ht/ModuleLLM_Development_Guide/tree/dev/ESP32/cpp)
 - [StackFlow VLM image description](https://github.com/Abandon-ht/ModuleLLM_Development_Guide/tree/dev/ESP32/cpp)
 
+## Model List
+| Model Name | Model Type | Model Size | Capability | Model Config File | Compute Unit |
+| :----: | :----: | :----: | :----: | :----: | :----: |
+| [silero-vad](https://github.com/snakers4/silero-vad) | VAD | 3.3M | Voice activity detection | [mode_silero-vad.json](projects/llm_framework/main_vad/mode_silero-vad.json) | CPU |
+| [sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01](https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.tar.bz2) | KWS | 6.4M | Keyword spotting | [mode_sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.json](projects/llm_framework/main_kws/mode_sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.json) | CPU |
+| [sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01](https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2) | KWS | 5.7M | Keyword spotting | [mode_sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.json](projects/llm_framework/main_kws/mode_sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.json) | CPU |
+| [sherpa-ncnn-streaming-zipformer-20M-2023-02-17](https://huggingface.co/desh2608/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-small) | ASR | 40M | Speech recognition | [mode_sherpa-ncnn-streaming-zipformer-20M-2023-02-17.json](projects/llm_framework/main_asr/mode_sherpa-ncnn-streaming-zipformer-20M-2023-02-17.json) | CPU |
+| [sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming) | ASR | 24M | Speech recognition | [mode_sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23.json](projects/llm_framework/main_asr/mode_sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23.json) | CPU |
+| [whisper-tiny](https://huggingface.co/openai/whisper-tiny) | ASR | 201M | Speech recognition | [mode_whisper-tiny.json](projects/llm_framework/main_whisper/mode_whisper-tiny.json) | NPU |
+| [whisper-base](https://huggingface.co/openai/whisper-base) | ASR | 309M | Speech recognition | [mode_whisper-base.json](projects/llm_framework/main_whisper/mode_whisper-base.json) | NPU |
+| [whisper-small](https://huggingface.co/openai/whisper-small) | ASR | 725M | Speech recognition | [mode_whisper-small.json](projects/llm_framework/main_whisper/mode_whisper-small.json) | NPU |
+| [single-speaker-fast](https://github.com/huakunyang/SummerTTS) | TTS | 77M | Speech synthesis | [mode_single-speaker-fast.json](projects/llm_framework/main_tts/mode_single-speaker-fast.json) | CPU |
+| [single-speaker-english-fast](https://github.com/huakunyang/SummerTTS) | TTS | 60M | Speech synthesis | [mode_single-speaker-english-fast.json](projects/llm_framework/main_tts/mode_single-speaker-english-fast.json) | CPU |
+| [melotts-en-au](https://huggingface.co/myshell-ai/MeloTTS-English) | TTS | 102M | Speech synthesis | [mode_melotts-en-au.json](projects/llm_framework/main_melotts/mode_melotts-en-au.json) | NPU |
+| [melotts-en-br](https://huggingface.co/myshell-ai/MeloTTS-English) | TTS | 102M | Speech synthesis | [mode_melotts-en-br.json](projects/llm_framework/main_melotts/mode_melotts-en-br.json) | NPU |
+| [melotts-en-default](https://huggingface.co/myshell-ai/MeloTTS-English) | TTS | 102M | Speech synthesis | [mode_melotts-en-default.json](projects/llm_framework/main_melotts/mode_melotts-en-default.json) | NPU |
+| [melotts-en-us](https://huggingface.co/myshell-ai/MeloTTS-English) | TTS | 102M | Speech synthesis | [mode_melotts-en-us.json](projects/llm_framework/main_melotts/mode_melotts-en-us.json) | NPU |
+| [melotts-es-es](https://huggingface.co/myshell-ai/MeloTTS-Spanish) | TTS | 83M | Speech synthesis | [mode_melotts-es-es.json](projects/llm_framework/main_melotts/mode_melotts-es-es.json) | NPU |
+| [melotts-ja-jp](https://huggingface.co/myshell-ai/MeloTTS-Japanese) | TTS | 83M | Speech synthesis | [mode_melotts-ja-jp.json](projects/llm_framework/main_melotts/mode_melotts-ja-jp.json) | NPU |
+| [melotts-zh-cn](https://huggingface.co/myshell-ai/MeloTTS-Chinese) | TTS | 86M | Speech synthesis | [mode_melotts-zh-cn.json](projects/llm_framework/main_melotts/mode_melotts-zh-cn.json) | NPU |
+| [deepseek-r1-1.5B-ax630c](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) | LLM | 2.0G | Text generation | [mode_deepseek-r1-1.5B-ax630c.json](projects/llm_framework/main_llm/models/mode_deepseek-r1-1.5B-ax630c.json) | NPU |
+| [deepseek-r1-1.5B-p256-ax630c](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) | LLM | 2.0G | Text generation | [mode_deepseek-r1-1.5B-p256-ax630c.json](projects/llm_framework/main_llm/models/mode_deepseek-r1-1.5B-p256-ax630c.json) | NPU |
+| [llama3.2-1B-p256-ax630c](https://huggingface.co/meta-llama/Llama-3.2-1B) | LLM | 1.7G | Text generation | [mode_llama3.2-1B-p256-ax630c.json](projects/llm_framework/main_llm/models/mode_llama3.2-1B-p256-ax630c.json) | NPU |
+| [llama3.2-1B-prefill-ax630c](https://huggingface.co/meta-llama/Llama-3.2-1B) | LLM | 1.7G | Text generation | [mode_llama3.2-1B-prefill-ax630c.json](projects/llm_framework/main_llm/models/mode_llama3.2-1B-prefill-ax630c.json) | NPU |
+| [openbuddy-llama3.2-1B-ax630c](https://huggingface.co/OpenBuddy/openbuddy-llama3.2-1b-v23.1-131k) | LLM | 1.7G | Text generation | [mode_openbuddy-llama3.2-1B-ax630c.json](projects/llm_framework/main_llm/models/mode_openbuddy-llama3.2-1B-ax630c.json) | NPU |
+| [qwen2.5-0.5B-Int4-ax630c](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4) | LLM | 626M | Text generation | [mode_qwen2.5-0.5B-Int4-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-0.5B-Int4-ax630c.json) | NPU |
+| [qwen2.5-0.5B-p256-ax630c](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) | LLM | 760M | Text generation | [mode_qwen2.5-0.5B-p256-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-0.5B-p256-ax630c.json) | NPU |
+| [qwen2.5-0.5B-prefill-20e](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) | LLM | 758M | Text generation | [mode_qwen2.5-0.5B-prefill-20e.json](projects/llm_framework/main_llm/models/mode_qwen2.5-0.5B-prefill-20e.json) | NPU |
+| [qwen2.5-1.5B-Int4-ax630c](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4) | LLM | 1.5G | Text generation | [mode_qwen2.5-1.5B-Int4-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-1.5B-Int4-ax630c.json) | NPU |
+| [qwen2.5-1.5B-p256-ax630c](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) | LLM | 2.0G | Text generation | [mode_qwen2.5-1.5B-p256-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-1.5B-p256-ax630c.json) | NPU |
+| [qwen2.5-1.5B-ax630c](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) | LLM | 2.0G | Text generation | [mode_qwen2.5-1.5B-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-1.5B-ax630c.json) | NPU |
+| [qwen2.5-coder-0.5B-ax630c](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct) | LLM | 756M | Text generation | [mode_qwen2.5-coder-0.5B-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-coder-0.5B-ax630c.json) | NPU |
+| [qwen3-0.6B-ax630c](https://huggingface.co/Qwen/Qwen3-0.6B) | LLM | 917M | Text generation | [mode_qwen3-0.6B-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen3-0.6B-ax630c.json) | NPU |
+| [internvl2.5-1B-364-ax630c](https://huggingface.co/AXERA-TECH/InternVL2_5-1B) | VLM | 1.2G | Multimodal text generation | [mode_internvl2.5-1B-364-ax630c.json](projects/llm_framework/main_vlm/models/mode_internvl2.5-1B-364-ax630c.json) | NPU |
+| [smolvlm-256M-ax630c](https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct) | VLM | 330M | Multimodal text generation | [mode_smolvlm-256M-ax630c.json](projects/llm_framework/main_vlm/models/mode_smolvlm-256M-ax630c.json) | NPU |
+| [smolvlm-500M-ax630c](https://huggingface.co/HuggingFaceTB/SmolVLM-500M-Instruct) | VLM | 605M | Multimodal text generation | [mode_smolvlm-500M-ax630c.json](projects/llm_framework/main_vlm/models/mode_smolvlm-500M-ax630c.json) | NPU |
+| [yolo11n](https://github.com/ultralytics/ultralytics) | CV | 2.8M | Object detection | [mode_yolo11n.json](projects/llm_framework/main_yolo/mode_yolo11n.json) | NPU |
+| [yolo11n-npu1](https://github.com/ultralytics/ultralytics) | CV | 2.8M | Object detection | [mode_yolo11n-npu1.json](projects/llm_framework/main_yolo/mode_yolo11n-npu1.json) | NPU |
+| [yolo11n-seg](https://github.com/ultralytics/ultralytics) | CV | 3.0M | Instance segmentation | [mode_yolo11n-seg.json](projects/llm_framework/main_yolo/mode_yolo11n-seg.json) | NPU |
+| [yolo11n-seg-npu1](https://github.com/ultralytics/ultralytics) | CV | 3.0M | Instance segmentation | [mode_yolo11n-seg-npu1.json](projects/llm_framework/main_yolo/mode_yolo11n-seg-npu1.json) | NPU |
+| [yolo11n-pose](https://github.com/ultralytics/ultralytics) | CV | 3.1M | Pose detection | [mode_yolo11n-pose.json](projects/llm_framework/main_yolo/mode_yolo11n-pose.json) | NPU |
+| [yolo11n-pose-npu1](https://github.com/ultralytics/ultralytics) | CV | 3.1M | Pose detection | [mode_yolo11n-pose-npu1.json](projects/llm_framework/main_yolo/mode_yolo11n-pose-npu1.json) | NPU |
+| [yolo11n-hand-pose](https://github.com/ultralytics/ultralytics) | CV | 3.2M | Pose detection | [mode_yolo11n-hand-pose.json](projects/llm_framework/main_yolo/mode_yolo11n-hand-pose.json) | NPU |
+| [yolo11n-hand-pose-npu1](https://github.com/ultralytics/ultralytics) | CV | 3.2M | Pose detection | [mode_yolo11n-hand-pose-npu1.json](projects/llm_framework/main_yolo/mode_yolo11n-hand-pose-npu1.json) | NPU |
+| [depth-anything-ax630c](https://github.com/DepthAnything/Depth-Anything-V2) | CV | 29M | Monocular depth estimation | [mode_depth-anything-ax630c.json](projects/llm_framework/main_depth_anything/mode_depth-anything-ax630c.json) | NPU |
+| [depth-anything-npu1-ax630c](https://github.com/DepthAnything/Depth-Anything-V2) | CV | 29M | Monocular depth estimation | [mode_depth-anything-npu1-ax630c.json](projects/llm_framework/main_depth_anything/mode_depth-anything-npu1-ax630c.json) | NPU |
+
 ## Requirements ##
 The current StackFlow AI units are built on the AXERA acceleration platform; the main chip platforms are ax630c and ax650n. The required operating system is Ubuntu.
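
Each model in the table is selected by passing its name as the `model` field of the owning unit's setup request. Below is a minimal sketch, assuming the llm unit accepts the same setup envelope as the vlm unit documented later in this commit; the `response_format` and `input` values are carried over from that doc as placeholders, not taken from main_llm itself:

```json
{
    "request_id": "1",
    "work_id": "llm",
    "action": "setup",
    "object": "llm.setup",
    "data": {
        "model": "qwen2.5-0.5B-prefill-20e",
        "response_format": "llm.utf-8.stream",
        "input": "llm.utf-8",
        "enoutput": true
    }
}
```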

doc/projects_llm_framework_doc/llm_camera_en.md

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ Send JSON:
 - enoutput: Whether to enable user result output. If you do not need to obtain camera images, do not enable this parameter, as the video stream will increase the communication pressure on the channel.
 - enable_webstream: Whether to enable webstream output. webstream listens on TCP port 8989 and, once a client connects, pushes JPEG images over HTTP as multipart/x-mixed-replace.
 - rtsp: Whether to enable RTSP stream output. rtsp establishes an RTSP TCP server at rtsp://{DevIp}:8554/axstream0, from which the video stream can be pulled using the RTSP protocol. The stream format is 1280x720 H265. Note that this stream is only available on the AX630C MIPI camera; UVC cameras cannot use RTSP.
+- VinParam.bAiispEnable: Whether to enable AI-ISP; enabled by default. Set it to 0 to disable. Only valid when using the AX630C MIPI camera.
 
 Response JSON:
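
For illustration, a minimal sketch of a setup request that turns AI-ISP off. The envelope (`work_id: "camera"`, `object: "camera.setup"`) and the exact nesting of `VinParam` are assumptions patterned on the other units' setup requests in these docs, not confirmed by this diff:

```json
{
    "request_id": "1",
    "work_id": "camera",
    "action": "setup",
    "object": "camera.setup",
    "data": {
        "enoutput": false,
        "VinParam": {
            "bAiispEnable": 0
        }
    }
}
```

Disabling AI-ISP this way is also what the llm_vlm doc below asks for before linking the camera to the vlm unit.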

doc/projects_llm_framework_doc/llm_camera_zh.md

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@
 - enoutput: Whether to enable user result output. If you do not need to obtain camera images, do not enable this parameter, as the video stream will increase the communication pressure on the channel.
 - enable_webstream: Whether to enable webstream output. webstream listens on TCP port 8989 and, once a client connects, pushes JPEG images over HTTP as multipart/x-mixed-replace.
 - rtsp: Whether to enable RTSP stream output. rtsp establishes an RTSP TCP server at rtsp://{DevIp}:8554/axstream0, from which the video stream can be pulled using the RTSP protocol. The stream format is 1280x720 H265. Note that this stream is only available on the AX630C MIPI camera; UVC cameras cannot use RTSP.
+- VinParam.bAiispEnable: Whether to enable AI-ISP; enabled by default. Set it to 0 to disable. Only valid when using the AX630C MIPI camera.
 
 Response JSON:

Lines changed: 223 additions & 0 deletions
@@ -0,0 +1,223 @@
# llm_cosy_voice

An NPU-accelerated text-to-speech unit that provides text-to-speech services. It supports voice cloning and can provide multilingual speech synthesis.

## setup

Configure the unit.

Send JSON:

```json
{
    "request_id": "2",
    "work_id": "cosy_voice",
    "action": "setup",
    "object": "cosy_voice.setup",
    "data": {
        "model": "CosyVoice2-0.5B-ax650",
        "response_format": "file",
        "input": "tts.utf-8",
        "enoutput": false
    }
}
```

- request_id: Refer to the basic data explanation.
- work_id: `cosy_voice` when configuring the unit.
- action: The method called is `setup`.
- object: The data type transferred is `cosy_voice.setup`.
- model: The model used is `CosyVoice2-0.5B-ax650`.
- prompt_files: The audio prompt file used for voice cloning (see the sketch after this list).
- response_format: With `sys.pcm`, the result is system audio data that is sent directly to the llm-audio module for playback. With `file`, the generated audio is written to a WAV file; the path or file name can be specified with `prompt_dir`.
- input: The input is `tts.utf-8`, meaning input comes from the user.
- enoutput: Whether to enable user result output.
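
A hypothetical setup variant that combines `prompt_files` with file output, as described in the list above. The `prompt_files` value is a placeholder for illustration, not a path taken from this commit:

```json
{
    "request_id": "2",
    "work_id": "cosy_voice",
    "action": "setup",
    "object": "cosy_voice.setup",
    "data": {
        "model": "CosyVoice2-0.5B-ax650",
        "prompt_files": "my_voice_prompt",
        "response_format": "file",
        "input": "tts.utf-8",
        "enoutput": false
    }
}
```
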
Response JSON:

```json
{
    "created": 1761791627,
    "data": "None",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "None",
    "request_id": "2",
    "work_id": "cosy_voice.1000"
}
```

- created: Message creation time, Unix time.
- work_id: The work_id of the successfully created unit.

## inference

### Streaming input

```json
{
    "request_id": "2",
    "work_id": "cosy_voice.1000",
    "action": "inference",
    "object": "cosy_voice.utf-8.stream",
    "data": {
        "delta": "今天天气真好!",
        "index": 0,
        "finish": true
    }
}
```

- object: The data type transferred is `cosy_voice.utf-8.stream`, representing streaming UTF-8 input from the user.
- delta: The segment data of the streaming input.
- index: The segment index of the streaming input.
- finish: Flag indicating whether the streaming input has finished (a multi-segment sketch follows this list).
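
When text arrives in several pieces, each segment is presumably sent as its own inference message, with `finish` set to true only on the last segment. A sketch under that assumption:

```json
{
    "request_id": "2",
    "work_id": "cosy_voice.1000",
    "action": "inference",
    "object": "cosy_voice.utf-8.stream",
    "data": { "delta": "今天天气", "index": 0, "finish": false }
}
```

```json
{
    "request_id": "2",
    "work_id": "cosy_voice.1000",
    "action": "inference",
    "object": "cosy_voice.utf-8.stream",
    "data": { "delta": "真好!", "index": 1, "finish": true }
}
```
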
### Non-streaming input

```json
{
    "request_id": "2",
    "work_id": "cosy_voice.1000",
    "action": "inference",
    "object": "cosy_voice.utf-8",
    "data": "今天天气真好!"
}
```

- object: The data type transferred is `cosy_voice.utf-8`, representing non-streaming UTF-8 input from the user.
- data: The non-streaming input data.

## pause

Pause the unit.

Send JSON:

```json
{
    "request_id": "5",
    "work_id": "cosy_voice.1000",
    "action": "pause"
}
```

Response JSON:

```json
{
    "created": 1761791706,
    "data": "None",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "None",
    "request_id": "5",
    "work_id": "cosy_voice.1000"
}
```

An error::code of 0 indicates successful execution.

## exit

Exit the unit.

Send JSON:

```json
{
    "request_id": "7",
    "work_id": "cosy_voice.1000",
    "action": "exit"
}
```

Response JSON:

```json
{
    "created": 1761791854,
    "data": "None",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "None",
    "request_id": "7",
    "work_id": "cosy_voice.1000"
}
```

An error::code of 0 indicates successful execution.

## taskinfo

Get the task list.

Send JSON:

```json
{
    "request_id": "2",
    "work_id": "cosy_voice",
    "action": "taskinfo"
}
```

Response JSON:

```json
{
    "created": 1761791739,
    "data": [
        "cosy_voice.1000"
    ],
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "llm.tasklist",
    "request_id": "2",
    "work_id": "cosy_voice"
}
```

Get a task's runtime parameters.

```json
{
    "request_id": "2",
    "work_id": "cosy_voice.1000",
    "action": "taskinfo"
}
```

Response JSON:

```json
{
    "created": 1761791761,
    "data": {
        "enoutput": false,
        "inputs": [
            "tts.utf-8"
        ],
        "model": "CosyVoice2-0.5B-ax650",
        "response_format": "sys.pcm"
    },
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "cosy_voice.taskinfo",
    "request_id": "2",
    "work_id": "cosy_voice.1000"
}
```

> **Note: work_id increases according to the order in which units are initialized and registered; it is not a fixed index value.**
> **Multiple units of the same type cannot be configured to work at the same time, or unknown errors will occur. For example, tts and melotts cannot be enabled simultaneously.**

doc/projects_llm_framework_doc/llm_kws_en.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ Send JSON:
 - response_format: The result returned is in `kws.bool` format.
 - input: The input is `sys.pcm`, representing system audio.
 - enoutput: Whether to enable user result output.
-- kws: The Chinese wake-up word is `"你好你好"`.
+- kws: The English wake-up word is `"HELLO"`. It must be in capital letters.
 - enwake_audio: Whether to enable wake-up audio output. Default is true.
 
 Response JSON:
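
As context for this change, a setup request consistent with the parameters above might look like the sketch below. The envelope follows the setup pattern of the other units in these docs, and the model name is taken from the model list in README_zh.md rather than from this file, so treat both as assumptions:

```json
{
    "request_id": "2",
    "work_id": "kws",
    "action": "setup",
    "object": "kws.setup",
    "data": {
        "model": "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01",
        "response_format": "kws.bool",
        "input": "sys.pcm",
        "enoutput": true,
        "kws": "HELLO",
        "enwake_audio": true
    }
}
```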

doc/projects_llm_framework_doc/llm_vlm_en.md

Lines changed: 36 additions & 4 deletions
@@ -15,7 +15,7 @@ Send the following JSON:
     "action": "setup",
     "object": "vlm.setup",
     "data": {
-        "model": "internvl2.5-1B-ax630c",
+        "model": "internvl2.5-1B-364-ax630c",
         "response_format": "vlm.utf-8.stream",
         "input": "vlm.utf-8",
         "enoutput": true,
@@ -29,7 +29,7 @@ Send the following JSON:
 - work_id: Set to `vlm` when configuring the unit.
 - action: The method being called is `setup`.
 - object: Data type being transferred is `vlm.setup`.
-- model: The model used is `internvl2.5-1B-ax630c`, a multimodal model.
+- model: The model used is `internvl2.5-1B-364-ax630c`, a multimodal model.
 - response_format: The output is in `vlm.utf-8.stream`, a UTF-8 stream format.
 - input: The input is `vlm.utf-8`, representing user input.
 - enoutput: Specifies whether to enable user output.
@@ -250,7 +250,7 @@ Example:
     "action": "setup",
     "object": "vlm.setup",
     "data": {
-        "model": "internvl2.5-1B-ax630c",
+        "model": "internvl2.5-1B-364-ax630c",
         "response_format": "vlm.utf-8.stream",
         "input": [
             "vlm.utf-8",
@@ -264,6 +264,38 @@ Example:
 }
 ```
 
+Linking the output of the llm-camera unit.
+
+Send JSON:
+
+```json
+{
+    "request_id": "3",
+    "work_id": "vlm.1003",
+    "action": "link",
+    "object": "work_id",
+    "data": "camera.1000"
+}
+```
+
+Response JSON:
+
+```json
+{
+    "created": 1750992545,
+    "data": "None",
+    "error": {
+        "code": 0,
+        "message": ""
+    },
+    "object": "None",
+    "request_id": "3",
+    "work_id": "vlm.1003"
+}
+```
+
+> **Ensure that the camera is properly configured and ready for operation when performing the link action. If using the AX630C MIPI camera, configure it with AI-ISP disabled during llm-camera initialization.**
+
 ## unlink
 
 Unlink units.
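
No unlink payload appears in this hunk; presumably it mirrors the link request with the action swapped. A sketch, with illustrative work_id values:

```json
{
    "request_id": "4",
    "work_id": "vlm.1003",
    "action": "unlink",
    "object": "work_id",
    "data": "camera.1000"
}
```
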
@@ -447,7 +479,7 @@ Response JSON:
         "vlm.utf-8",
         "kws.1000"
     ],
-    "model": "internvl2.5-1B-ax630c",
+    "model": "internvl2.5-1B-364-ax630c",
     "response_format": "vlm.utf-8.stream"
 },
 "error": {
