
Commit cd37164

Merge pull request #24 from m5stack/dev
Dev
2 parents 07662b8 + 9f34887 commit cd37164

2,162 files changed: +116,431 -818,916 lines


README_zh.md

Lines changed: 48 additions & 0 deletions
@@ -14,6 +14,7 @@
 
 * [Features](#特性)
 * [Demo](#demo)
+* [Model List](#模型列表)
 * [Requirements](#环境要求)
 * [Build](#编译)
 * [Install](#安装)
@@ -54,6 +55,53 @@ Main working modes of the StackFlow voice assistant:
 - [StackFlow yolo visual detection](https://github.com/Abandon-ht/ModuleLLM_Development_Guide/tree/dev/ESP32/cpp)
 - [StackFlow VLM image description](https://github.com/Abandon-ht/ModuleLLM_Development_Guide/tree/dev/ESP32/cpp)
 
+## Model List
+| Model Name | Model Type | Model Size | Capability | Model Config File | Compute Unit |
+| :----: | :----: | :----: | :----: | :----: | :----: |
+| [silero-vad](https://github.com/snakers4/silero-vad) | VAD | 3.3M | Voice activity detection | [mode_silero-vad.json](projects/llm_framework/main_vad/mode_silero-vad.json) | CPU |
+| [sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01](https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.tar.bz2) | KWS | 6.4M | Keyword spotting | [mode_sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.json](projects/llm_framework/main_kws/mode_sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.json) | CPU |
+| [sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01](https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2) | KWS | 5.7M | Keyword spotting | [mode_sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.json](projects/llm_framework/main_kws/mode_sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.json) | CPU |
+| [sherpa-ncnn-streaming-zipformer-20M-2023-02-17](https://huggingface.co/desh2608/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-small) | ASR | 40M | Speech recognition | [mode_sherpa-ncnn-streaming-zipformer-20M-2023-02-17.json](projects/llm_framework/main_asr/mode_sherpa-ncnn-streaming-zipformer-20M-2023-02-17.json) | CPU |
+| [sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming) | ASR | 24M | Speech recognition | [mode_sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23.json](projects/llm_framework/main_asr/mode_sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23.json) | CPU |
+| [whisper-tiny](https://huggingface.co/openai/whisper-tiny) | ASR | 201M | Speech recognition | [mode_whisper-tiny.json](projects/llm_framework/main_whisper/mode_whisper-tiny.json) | NPU |
+| [whisper-base](https://huggingface.co/openai/whisper-base) | ASR | 309M | Speech recognition | [mode_whisper-base.json](projects/llm_framework/main_whisper/mode_whisper-base.json) | NPU |
+| [whisper-small](https://huggingface.co/openai/whisper-small) | ASR | 725M | Speech recognition | [mode_whisper-small.json](projects/llm_framework/main_whisper/mode_whisper-small.json) | NPU |
+| [single-speaker-fast](https://github.com/huakunyang/SummerTTS) | TTS | 77M | Speech synthesis | [mode_single-speaker-fast.json](projects/llm_framework/main_tts/mode_single-speaker-fast.json) | CPU |
+| [single-speaker-english-fast](https://github.com/huakunyang/SummerTTS) | TTS | 60M | Speech synthesis | [mode_single-speaker-english-fast.json](projects/llm_framework/main_tts/mode_single-speaker-english-fast.json) | CPU |
+| [melotts-en-au](https://huggingface.co/myshell-ai/MeloTTS-English) | TTS | 102M | Speech synthesis | [mode_melotts-en-au.json](projects/llm_framework/main_melotts/mode_melotts-en-au.json) | NPU |
+| [melotts-en-br](https://huggingface.co/myshell-ai/MeloTTS-English) | TTS | 102M | Speech synthesis | [mode_melotts-en-br.json](projects/llm_framework/main_melotts/mode_melotts-en-br.json) | NPU |
+| [melotts-en-default](https://huggingface.co/myshell-ai/MeloTTS-English) | TTS | 102M | Speech synthesis | [mode_melotts-en-default.json](projects/llm_framework/main_melotts/mode_melotts-en-default.json) | NPU |
+| [melotts-en-us](https://huggingface.co/myshell-ai/MeloTTS-English) | TTS | 102M | Speech synthesis | [mode_melotts-en-us.json](projects/llm_framework/main_melotts/mode_melotts-en-us.json) | NPU |
+| [melotts-es-es](https://huggingface.co/myshell-ai/MeloTTS-Spanish) | TTS | 83M | Speech synthesis | [mode_melotts-es-es.json](projects/llm_framework/main_melotts/mode_melotts-es-es.json) | NPU |
+| [melotts-ja-jp](https://huggingface.co/myshell-ai/MeloTTS-Japanese) | TTS | 83M | Speech synthesis | [mode_melotts-ja-jp.json](projects/llm_framework/main_melotts/mode_melotts-ja-jp.json) | NPU |
+| [melotts-zh-cn](https://huggingface.co/myshell-ai/MeloTTS-Chinese) | TTS | 86M | Speech synthesis | [mode_melotts-zh-cn.json](projects/llm_framework/main_melotts/mode_melotts-zh-cn.json) | NPU |
+| [deepseek-r1-1.5B-ax630c](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) | LLM | 2.0G | Text generation | [mode_deepseek-r1-1.5B-ax630c.json](projects/llm_framework/main_llm/models/mode_deepseek-r1-1.5B-ax630c.json) | NPU |
+| [deepseek-r1-1.5B-p256-ax630c](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) | LLM | 2.0G | Text generation | [mode_deepseek-r1-1.5B-p256-ax630c.json](projects/llm_framework/main_llm/models/mode_deepseek-r1-1.5B-p256-ax630c.json) | NPU |
+| [llama3.2-1B-p256-ax630c](https://huggingface.co/meta-llama/Llama-3.2-1B) | LLM | 1.7G | Text generation | [mode_llama3.2-1B-p256-ax630c.json](projects/llm_framework/main_llm/models/mode_llama3.2-1B-p256-ax630c.json) | NPU |
+| [llama3.2-1B-prefill-ax630c](https://huggingface.co/meta-llama/Llama-3.2-1B) | LLM | 1.7G | Text generation | [mode_llama3.2-1B-prefill-ax630c.json](projects/llm_framework/main_llm/models/mode_llama3.2-1B-prefill-ax630c.json) | NPU |
+| [openbuddy-llama3.2-1B-ax630c](https://huggingface.co/OpenBuddy/openbuddy-llama3.2-1b-v23.1-131k) | LLM | 1.7G | Text generation | [mode_openbuddy-llama3.2-1B-ax630c.json](projects/llm_framework/main_llm/models/mode_openbuddy-llama3.2-1B-ax630c.json) | NPU |
+| [qwen2.5-0.5B-Int4-ax630c](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4) | LLM | 626M | Text generation | [mode_qwen2.5-0.5B-Int4-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-0.5B-Int4-ax630c.json) | NPU |
+| [qwen2.5-0.5B-p256-ax630c](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) | LLM | 760M | Text generation | [mode_qwen2.5-0.5B-p256-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-0.5B-p256-ax630c.json) | NPU |
+| [qwen2.5-0.5B-prefill-20e](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) | LLM | 758M | Text generation | [mode_qwen2.5-0.5B-prefill-20e.json](projects/llm_framework/main_llm/models/mode_qwen2.5-0.5B-prefill-20e.json) | NPU |
+| [qwen2.5-1.5B-Int4-ax630c](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4) | LLM | 1.5G | Text generation | [mode_qwen2.5-1.5B-Int4-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-1.5B-Int4-ax630c.json) | NPU |
+| [qwen2.5-1.5B-p256-ax630c](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) | LLM | 2.0G | Text generation | [mode_qwen2.5-1.5B-p256-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-1.5B-p256-ax630c.json) | NPU |
+| [qwen2.5-1.5B-ax630c](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) | LLM | 2.0G | Text generation | [mode_qwen2.5-1.5B-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-1.5B-ax630c.json) | NPU |
+| [qwen2.5-coder-0.5B-ax630c](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct) | LLM | 756M | Text generation | [mode_qwen2.5-coder-0.5B-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-coder-0.5B-ax630c.json) | NPU |
+| [qwen3-0.6B-ax630c](https://huggingface.co/Qwen/Qwen3-0.6B) | LLM | 917M | Text generation | [mode_qwen3-0.6B-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen3-0.6B-ax630c.json) | NPU |
+| [internvl2.5-1B-364-ax630c](https://huggingface.co/AXERA-TECH/InternVL2_5-1B) | VLM | 1.2G | Multimodal text generation | [mode_internvl2.5-1B-364-ax630c.json](projects/llm_framework/main_vlm/models/mode_internvl2.5-1B-364-ax630c.json) | NPU |
+| [smolvlm-256M-ax630c](https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct) | VLM | 330M | Multimodal text generation | [mode_smolvlm-256M-ax630c.json](projects/llm_framework/main_vlm/models/mode_smolvlm-256M-ax630c.json) | NPU |
+| [smolvlm-500M-ax630c](https://huggingface.co/HuggingFaceTB/SmolVLM-500M-Instruct) | VLM | 605M | Multimodal text generation | [mode_smolvlm-500M-ax630c.json](projects/llm_framework/main_vlm/models/mode_smolvlm-500M-ax630c.json) | NPU |
+| [yolo11n](https://github.com/ultralytics/ultralytics) | CV | 2.8M | Object detection | [mode_yolo11n.json](projects/llm_framework/main_yolo/mode_yolo11n.json) | NPU |
+| [yolo11n-npu1](https://github.com/ultralytics/ultralytics) | CV | 2.8M | Object detection | [mode_yolo11n-npu1.json](projects/llm_framework/main_yolo/mode_yolo11n-npu1.json) | NPU |
+| [yolo11n-seg](https://github.com/ultralytics/ultralytics) | CV | 3.0M | Instance segmentation | [mode_yolo11n-seg.json](projects/llm_framework/main_yolo/mode_yolo11n-seg.json) | NPU |
+| [yolo11n-seg-npu1](https://github.com/ultralytics/ultralytics) | CV | 3.0M | Instance segmentation | [mode_yolo11n-seg-npu1.json](projects/llm_framework/main_yolo/mode_yolo11n-seg-npu1.json) | NPU |
+| [yolo11n-pose](https://github.com/ultralytics/ultralytics) | CV | 3.1M | Pose detection | [mode_yolo11n-pose.json](projects/llm_framework/main_yolo/mode_yolo11n-pose.json) | NPU |
+| [yolo11n-pose-npu1](https://github.com/ultralytics/ultralytics) | CV | 3.1M | Pose detection | [mode_yolo11n-pose-npu1.json](projects/llm_framework/main_yolo/mode_yolo11n-pose-npu1.json) | NPU |
+| [yolo11n-hand-pose](https://github.com/ultralytics/ultralytics) | CV | 3.2M | Pose detection | [mode_yolo11n-hand-pose.json](projects/llm_framework/main_yolo/mode_yolo11n-hand-pose.json) | NPU |
+| [yolo11n-hand-pose-npu1](https://github.com/ultralytics/ultralytics) | CV | 3.2M | Pose detection | [mode_yolo11n-hand-pose-npu1.json](projects/llm_framework/main_yolo/mode_yolo11n-hand-pose-npu1.json) | NPU |
+| [depth-anything-ax630c](https://github.com/DepthAnything/Depth-Anything-V2) | CV | 29M | Monocular depth estimation | [mode_depth-anything-ax630c.json](projects/llm_framework/main_depth_anything/mode_depth-anything-ax630c.json) | NPU |
+| [depth-anything-npu1-ax630c](https://github.com/DepthAnything/Depth-Anything-V2) | CV | 29M | Monocular depth estimation | [mode_depth-anything-npu1-ax630c.json](projects/llm_framework/main_depth_anything/mode_depth-anything-npu1-ax630c.json) | NPU |
+
 ## Requirements ##
 The current StackFlow AI units are built on the AXERA acceleration platform; the main chip platforms are ax630c and ax650n. The required operating system is Ubuntu.
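
Each model in the table is selected by passing its name as the `model` field of the owning unit's setup request. Below is a minimal sketch, assuming the llm unit accepts the same setup envelope as the vlm unit documented later in this commit; the `response_format` and `input` values are carried over from that doc as placeholders, not taken from main_llm itself:

```json
{
    "request_id": "1",
    "work_id": "llm",
    "action": "setup",
    "object": "llm.setup",
    "data": {
        "model": "qwen2.5-0.5B-prefill-20e",
        "response_format": "llm.utf-8.stream",
        "input": "llm.utf-8",
        "enoutput": true
    }
}
```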

doc/projects_llm_framework_doc/llm_camera_en.md

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ Send JSON:
 - enoutput: Whether to enable user result output. If you do not need to obtain camera images, do not enable this parameter, as the video stream will increase the communication pressure on the channel.
 - enable_webstream: Whether to enable webstream output. webstream listens on TCP port 8989 and, once a client connects, pushes JPEG images over HTTP as multipart/x-mixed-replace.
 - rtsp: Whether to enable RTSP stream output. rtsp establishes an RTSP TCP server at rtsp://{DevIp}:8554/axstream0, from which the video stream can be pulled using the RTSP protocol. The stream format is 1280x720 H265. Note that this stream is only available on the AX630C MIPI camera; UVC cameras cannot use RTSP.
+- VinParam.bAiispEnable: Whether to enable AI-ISP; enabled by default. Set it to 0 to disable. Only valid when using the AX630C MIPI camera.
 
 Response JSON:
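
For illustration, a minimal sketch of a setup request that turns AI-ISP off. The envelope (`work_id: "camera"`, `object: "camera.setup"`) and the exact nesting of `VinParam` are assumptions patterned on the other units' setup requests in these docs, not confirmed by this diff:

```json
{
    "request_id": "1",
    "work_id": "camera",
    "action": "setup",
    "object": "camera.setup",
    "data": {
        "enoutput": false,
        "VinParam": {
            "bAiispEnable": 0
        }
    }
}
```

Disabling AI-ISP this way is also what the llm_vlm doc below asks for before linking the camera to the vlm unit.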

doc/projects_llm_framework_doc/llm_camera_zh.md

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@
 - enoutput: Whether to enable user result output. If you do not need to obtain camera images, do not enable this parameter, as the video stream will increase the communication pressure on the channel.
 - enable_webstream: Whether to enable webstream output. webstream listens on TCP port 8989 and, once a client connects, pushes JPEG images over HTTP as multipart/x-mixed-replace.
 - rtsp: Whether to enable RTSP stream output. rtsp establishes an RTSP TCP server at rtsp://{DevIp}:8554/axstream0, from which the video stream can be pulled using the RTSP protocol. The stream format is 1280x720 H265. Note that this stream is only available on the AX630C MIPI camera; UVC cameras cannot use RTSP.
+- VinParam.bAiispEnable: Whether to enable AI-ISP; enabled by default. Set it to 0 to disable. Only valid when using the AX630C MIPI camera.
 
 Response JSON:

Lines changed: 223 additions & 0 deletions
@@ -0,0 +1,223 @@
# llm_cosy_voice

An NPU-accelerated text-to-speech unit that provides text-to-speech services. It supports voice cloning and can provide multilingual speech synthesis.

## setup

Configure the unit.

Send JSON:

```json
{
    "request_id": "2",
    "work_id": "cosy_voice",
    "action": "setup",
    "object": "cosy_voice.setup",
    "data": {
        "model": "CosyVoice2-0.5B-ax650",
        "response_format": "file",
        "input": "tts.utf-8",
        "enoutput": false
    }
}
```

- request_id: Refer to the basic data explanation.
- work_id: `cosy_voice` when configuring the unit.
- action: The method called is `setup`.
- object: The data type transferred is `cosy_voice.setup`.
- model: The model used is `CosyVoice2-0.5B-ax650`.
- prompt_files: The audio prompt file used for voice cloning (see the sketch after this list).
- response_format: With `sys.pcm`, the result is system audio data that is sent directly to the llm-audio module for playback. With `file`, the generated audio is written to a WAV file; the path or file name can be specified with `prompt_dir`.
- input: The input is `tts.utf-8`, meaning input comes from the user.
- enoutput: Whether to enable user result output.
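
A hypothetical setup variant that combines `prompt_files` with file output, as described in the list above. The `prompt_files` value is a placeholder for illustration, not a path taken from this commit:

```json
{
    "request_id": "2",
    "work_id": "cosy_voice",
    "action": "setup",
    "object": "cosy_voice.setup",
    "data": {
        "model": "CosyVoice2-0.5B-ax650",
        "prompt_files": "my_voice_prompt",
        "response_format": "file",
        "input": "tts.utf-8",
        "enoutput": false
    }
}
```
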
Response JSON:

```json
{
    "created": 1761791627,
    "data": "None",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "None",
    "request_id": "2",
    "work_id": "cosy_voice.1000"
}
```

- created: Message creation time, Unix time.
- work_id: The work_id of the successfully created unit.

## inference

### Streaming input

```json
{
    "request_id": "2",
    "work_id": "cosy_voice.1000",
    "action": "inference",
    "object": "cosy_voice.utf-8.stream",
    "data": {
        "delta": "今天天气真好!",
        "index": 0,
        "finish": true
    }
}
```

- object: The data type transferred is `cosy_voice.utf-8.stream`, representing streaming UTF-8 input from the user.
- delta: The segment data of the streaming input.
- index: The segment index of the streaming input.
- finish: Flag indicating whether the streaming input has finished (a multi-segment sketch follows this list).
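
When text arrives in several pieces, each segment is presumably sent as its own inference message, with `finish` set to true only on the last segment. A sketch under that assumption:

```json
{
    "request_id": "2",
    "work_id": "cosy_voice.1000",
    "action": "inference",
    "object": "cosy_voice.utf-8.stream",
    "data": { "delta": "今天天气", "index": 0, "finish": false }
}
```

```json
{
    "request_id": "2",
    "work_id": "cosy_voice.1000",
    "action": "inference",
    "object": "cosy_voice.utf-8.stream",
    "data": { "delta": "真好!", "index": 1, "finish": true }
}
```
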
### Non-streaming input

```json
{
    "request_id": "2",
    "work_id": "cosy_voice.1000",
    "action": "inference",
    "object": "cosy_voice.utf-8",
    "data": "今天天气真好!"
}
```

- object: The data type transferred is `cosy_voice.utf-8`, representing non-streaming UTF-8 input from the user.
- data: The non-streaming input data.

## pause

Pause the unit.

Send JSON:

```json
{
    "request_id": "5",
    "work_id": "cosy_voice.1000",
    "action": "pause"
}
```

Response JSON:

```json
{
    "created": 1761791706,
    "data": "None",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "None",
    "request_id": "5",
    "work_id": "cosy_voice.1000"
}
```

An error::code of 0 indicates successful execution.

## exit

Exit the unit.

Send JSON:

```json
{
    "request_id": "7",
    "work_id": "cosy_voice.1000",
    "action": "exit"
}
```

Response JSON:

```json
{
    "created": 1761791854,
    "data": "None",
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "None",
    "request_id": "7",
    "work_id": "cosy_voice.1000"
}
```

An error::code of 0 indicates successful execution.

## taskinfo

Get the task list.

Send JSON:

```json
{
    "request_id": "2",
    "work_id": "cosy_voice",
    "action": "taskinfo"
}
```

Response JSON:

```json
{
    "created": 1761791739,
    "data": [
        "cosy_voice.1000"
    ],
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "llm.tasklist",
    "request_id": "2",
    "work_id": "cosy_voice"
}
```

Get a task's runtime parameters.

```json
{
    "request_id": "2",
    "work_id": "cosy_voice.1000",
    "action": "taskinfo"
}
```

Response JSON:

```json
{
    "created": 1761791761,
    "data": {
        "enoutput": false,
        "inputs": [
            "tts.utf-8"
        ],
        "model": "CosyVoice2-0.5B-ax650",
        "response_format": "sys.pcm"
    },
    "error": {
        "code": 0,
        "message": ""
    },
    "object": "cosy_voice.taskinfo",
    "request_id": "2",
    "work_id": "cosy_voice.1000"
}
```

> **Note: work_id increases according to the order in which units are initialized and registered; it is not a fixed index value.**
> **Multiple units of the same type cannot be configured to work at the same time, or unknown errors will occur. For example, tts and melotts cannot be enabled simultaneously.**

doc/projects_llm_framework_doc/llm_kws_en.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ Send JSON:
 - response_format: The result returned is in `kws.bool` format.
 - input: The input is `sys.pcm`, representing system audio.
 - enoutput: Whether to enable user result output.
-- kws: The Chinese wake-up word is `"你好你好"`.
+- kws: The English wake-up word is `"HELLO"`. It must be in capital letters.
 - enwake_audio: Whether to enable wake-up audio output. Default is true.
 
 Response JSON:
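
As context for this change, a setup request consistent with the parameters above might look like the sketch below. The envelope follows the setup pattern of the other units in these docs, and the model name is taken from the model list in README_zh.md rather than from this file, so treat both as assumptions:

```json
{
    "request_id": "2",
    "work_id": "kws",
    "action": "setup",
    "object": "kws.setup",
    "data": {
        "model": "sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01",
        "response_format": "kws.bool",
        "input": "sys.pcm",
        "enoutput": true,
        "kws": "HELLO",
        "enwake_audio": true
    }
}
```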

doc/projects_llm_framework_doc/llm_vlm_en.md

Lines changed: 36 additions & 4 deletions
@@ -15,7 +15,7 @@ Send the following JSON:
     "action": "setup",
     "object": "vlm.setup",
     "data": {
-        "model": "internvl2.5-1B-ax630c",
+        "model": "internvl2.5-1B-364-ax630c",
         "response_format": "vlm.utf-8.stream",
         "input": "vlm.utf-8",
         "enoutput": true,
@@ -29,7 +29,7 @@ Send the following JSON:
 - work_id: Set to `vlm` when configuring the unit.
 - action: The method being called is `setup`.
 - object: Data type being transferred is `vlm.setup`.
-- model: The model used is `internvl2.5-1B-ax630c`, a multimodal model.
+- model: The model used is `internvl2.5-1B-364-ax630c`, a multimodal model.
 - response_format: The output is in `vlm.utf-8.stream`, a UTF-8 stream format.
 - input: The input is `vlm.utf-8`, representing user input.
 - enoutput: Specifies whether to enable user output.
@@ -250,7 +250,7 @@ Example:
     "action": "setup",
     "object": "vlm.setup",
     "data": {
-        "model": "internvl2.5-1B-ax630c",
+        "model": "internvl2.5-1B-364-ax630c",
         "response_format": "vlm.utf-8.stream",
         "input": [
             "vlm.utf-8",
@@ -264,6 +264,38 @@ Example:
 }
 ```
 
+Linking the output of the llm-camera unit.
+
+Send JSON:
+
+```json
+{
+    "request_id": "3",
+    "work_id": "vlm.1003",
+    "action": "link",
+    "object": "work_id",
+    "data": "camera.1000"
+}
+```
+
+Response JSON:
+
+```json
+{
+    "created": 1750992545,
+    "data": "None",
+    "error": {
+        "code": 0,
+        "message": ""
+    },
+    "object": "None",
+    "request_id": "3",
+    "work_id": "vlm.1003"
+}
+```
+
+> **Ensure that the camera is properly configured and ready for operation when performing the link action. If using the AX630C MIPI camera, configure it with AI-ISP disabled during llm-camera initialization.**
+
 ## unlink
 
 Unlink units.
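
No unlink payload appears in this hunk; presumably it mirrors the link request with the action swapped. A sketch, with illustrative work_id values:

```json
{
    "request_id": "4",
    "work_id": "vlm.1003",
    "action": "unlink",
    "object": "work_id",
    "data": "camera.1000"
}
```
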
@@ -447,7 +479,7 @@ Response JSON:
         "vlm.utf-8",
         "kws.1000"
     ],
-    "model": "internvl2.5-1B-ax630c",
+    "model": "internvl2.5-1B-364-ax630c",
     "response_format": "vlm.utf-8.stream"
 },
 "error": {
