
Conversation

@RC-CHN (Member) commented on Dec 28, 2025

We've learned that some users' backup files are large enough to trigger the maximum request-body limit and therefore cannot be uploaded. Backup-file management is also fairly rudimentary at the moment, lacking filename editing, restoring from the uploaded-backup list, metadata display, and similar features.

Modifications

Modified the relevant export logic in astrbot/core/backup/exporter to add an origin metadata section, added chunked upload, file renaming, and related methods in astrbot/dashboard/routes/backup.py, and updated the corresponding frontend parts and i18n. A sketch of the resulting client-side upload flow is shown below.

  • This is NOT a breaking change.
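
For context, here is a minimal client-side sketch of the flow these endpoints imply. The /upload/init body and response fields follow the Sourcery review quoted further down; the base URL, the chunk/complete paths, the form-field names, and the response envelope are illustrative guesses, not the PR's exact API.

```python
import math
import os

import requests

BASE = "http://localhost:6185/api/backup"  # assumed dashboard API prefix
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, matching the reviewer's example


def upload_backup(path: str, token: str) -> None:
    headers = {"Authorization": f"Bearer {token}"}
    total_size = os.path.getsize(path)

    # 1. Initialize an upload session.
    resp = requests.post(
        f"{BASE}/upload/init",
        json={
            "filename": os.path.basename(path),
            "total_size": total_size,
            "total_chunks": math.ceil(total_size / CHUNK_SIZE),
        },
        headers=headers,
    )
    resp.raise_for_status()
    init = resp.json()["data"]  # response envelope assumed
    upload_id = init["upload_id"]
    # Honor the server's chunk size if it differs from ours
    # (the follow-up commit below moves this math server-side).
    chunk_size = init.get("chunk_size", CHUNK_SIZE)
    total_chunks = math.ceil(total_size / chunk_size)

    # 2. Send each chunk; a resumable client would retry failed indices.
    with open(path, "rb") as f:
        for index in range(total_chunks):
            requests.post(
                f"{BASE}/upload/chunk",  # hypothetical path
                data={"upload_id": upload_id, "chunk_index": index},
                files={"chunk": ("chunk.bin", f.read(chunk_size))},
                headers=headers,
            ).raise_for_status()

    # 3. Ask the server to assemble and validate the file.
    requests.post(
        f"{BASE}/upload/complete",  # hypothetical path
        json={"upload_id": upload_id},
        headers=headers,
    ).raise_for_status()
```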

Screenshots or Test Results

(Screenshots attached; not reproduced here.)

Checklist

  • 😊 If there are new features added in the PR, I have discussed them with the authors through issues/emails, etc.
  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Add chunked backup upload support and enhance backup file management in both backend and dashboard UI.

New Features:

  • Support large backup uploads via initialized, chunked, resumable upload sessions with completion and abort endpoints.
  • Expose backup file metadata (origin, AstrBot version, export time) via manifest and display it in the backup list UI; a manifest-reading sketch follows this summary.
  • Allow restoring directly from an existing backup file in the list without re-uploading.
  • Add backup file renaming capability from the UI backed by a secure rename API endpoint.

Enhancements:

  • Refine backup filename generation to append a timestamp suffix while preserving the original base name.
  • Skip invalid or non-AstrBot ZIPs when listing backups and visually distinguish uploaded backups in the UI.
  • Improve import/upload UX with detailed progress indicators and native browser download handling for backup files.

Tests:

  • Extend backup tests to validate manifest origin metadata and updated unique filename generation behavior.
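
To make the manifest metadata concrete, here is a small sketch of how a lister might pull the metadata out of a backup zip and skip invalid or non-AstrBot files. The manifest filename and its keys are assumptions based on the fields named above, not the PR's actual schema.

```python
import json
import zipfile
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # assumed filename inside the archive


def read_backup_metadata(path: Path) -> dict | None:
    """Return origin/version/export-time metadata, or None for foreign zips."""
    try:
        with zipfile.ZipFile(path) as zf:
            with zf.open(MANIFEST_NAME) as f:
                manifest = json.load(f)
    except (zipfile.BadZipFile, KeyError, json.JSONDecodeError, OSError):
        return None  # invalid or non-AstrBot zip: skip it in the listing
    return {
        "origin": manifest.get("origin"),
        "astrbot_version": manifest.get("astrbot_version"),
        "exported_at": manifest.get("exported_at"),
    }
```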

@dosubot dosubot bot added the size:XL, area:core, and area:webui labels on Dec 28, 2025
@sourcery-ai sourcery-ai bot (Contributor) left a comment

Hey - I've found 4 issues and left some high-level feedback:

  • The background cleanup task is started with asyncio.create_task in BackupRoute.__init__, which assumes an active event loop at construction time and never gets cancelled; consider deferring startup to a clearly async lifecycle hook (or injecting the loop) and ensuring the task is cancelled on shutdown to avoid warnings and leaked tasks.
  • The CHUNK_SIZE constant is hard-coded in both the backend (backup.py) and the frontend (BackupDialog.vue); exposing the chunk size from the backend (e.g., via /upload/init response only) and deriving the frontend value from that would avoid silent desynchronization if one side is changed.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The background cleanup task is started with `asyncio.create_task` in `BackupRoute.__init__`, which assumes an active event loop at construction time and never gets cancelled; consider deferring startup to a clearly async lifecycle hook (or injecting the loop) and ensuring the task is cancelled on shutdown to avoid warnings and leaked tasks.
- The `CHUNK_SIZE` constant is hard-coded in both the backend (`backup.py`) and the frontend (`BackupDialog.vue`); exposing the chunk size from the backend (e.g., via `/upload/init` response only) and deriving the frontend value from that would avoid silent desynchronization if one side is changed.

## Individual Comments

### Comment 1
<location> `astrbot/dashboard/routes/backup.py:227` </location>
<code_context>
+    def _cleanup_upload_session(self, upload_id: str):
</code_context>

<issue_to_address>
**suggestion:** Upload sessions can expire mid-upload without being refreshed on activity.

`created_at` is only set in `upload_init`, so expiry is based purely on start time. For large/slow uploads that exceed `UPLOAD_EXPIRE_SECONDS`, the cleanup task can remove the session and chunk directory while the client is still sending data, causing `上传会话不存在或已过期` and failed completion. Please update the session timestamp (or add a `last_activity` field) in `upload_chunk` so active uploads aren’t cleaned up early.
</issue_to_address>
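
For illustration, a minimal sketch of that refresh, assuming upload sessions are plain dicts keyed by upload_id; the method names here and the expiry constant's value are assumptions, while last_activity and created_at follow the comment above.

```python
import time

UPLOAD_EXPIRE_SECONDS = 30 * 60  # assumed value; use the route's real constant


class UploadSessions:
    """Stands in for the session bookkeeping on BackupRoute (names assumed)."""

    def __init__(self) -> None:
        self.sessions: dict[str, dict] = {}

    def touch(self, upload_id: str) -> None:
        # Call at the top of upload_chunk so active uploads stay alive.
        session = self.sessions.get(upload_id)
        if session is not None:
            session["last_activity"] = time.time()

    def is_expired(self, session: dict) -> bool:
        # Fall back to created_at for sessions that never received a chunk.
        last = session.get("last_activity", session["created_at"])
        return time.time() - last > UPLOAD_EXPIRE_SECONDS
```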

### Comment 2
<location> `astrbot/dashboard/routes/backup.py:124-125` </location>
<code_context>
         }
         self.register_routes()

+        # 启动清理过期上传会话的任务
+        asyncio.create_task(self._cleanup_expired_uploads())
+
     def _init_task(self, task_id: str, task_type: str, status: str = "pending") -> None:
</code_context>

<issue_to_address>
**issue (bug_risk):** Starting an asyncio task in __init__ can fail if no running event loop is available.

`asyncio.create_task` here assumes there is an active event loop on the current thread. If `BackupRoute` is constructed at import time or from sync code, this can raise `RuntimeError: no running event loop` or attach the task to the wrong loop. Consider starting this background task from a dedicated async startup/lifecycle hook where the loop is guaranteed to be running, and keep the task handle if you need to cancel it on shutdown.
</issue_to_address>
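
One hedged way to restructure that, sketched against generic start/stop hooks rather than AstrBot's actual lifecycle API:

```python
import asyncio


class BackupRouteSketch:
    """Illustrative only; hook names and wiring are assumptions."""

    def __init__(self) -> None:
        # No create_task here: __init__ may run without a running loop.
        self._cleanup_task: asyncio.Task | None = None

    async def start(self) -> None:
        # Call from an async startup hook, where a loop is guaranteed.
        self._cleanup_task = asyncio.create_task(self._cleanup_expired_uploads())

    async def stop(self) -> None:
        # Cancel on shutdown to avoid leaked tasks and loop warnings.
        if self._cleanup_task is not None:
            self._cleanup_task.cancel()
            try:
                await self._cleanup_task
            except asyncio.CancelledError:
                pass
            self._cleanup_task = None

    async def _cleanup_expired_uploads(self) -> None:
        while True:
            await asyncio.sleep(60)  # sweep interval assumed
            # ... drop expired sessions and their chunk directories ...
```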

### Comment 3
<location> `astrbot/dashboard/routes/backup.py:64-68` </location>
<code_context>


 def generate_unique_filename(original_filename: str) -> str:
-    """生成唯一的文件名,添加时间戳前缀
+    """生成唯一的文件名,添加时间戳后缀避免重名
</code_context>

<issue_to_address>
**issue:** Docstring/comment for generate_unique_filename do not match the actual behavior.

The docstring and inline comment mention reusing filenames that already contain a timestamp, but the code always appends `_{timestamp}` to `name` without checking for an existing timestamp. Please either implement timestamp detection (to avoid duplicating timestamps) or update the docstring/comment so they match the actual behavior.
</issue_to_address>
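
If timestamp detection is the desired behavior, a sketch might look like the following; the %Y%m%d_%H%M%S suffix format is an assumption about what the exporter emits.

```python
import re
from datetime import datetime
from pathlib import Path

TIMESTAMP_RE = re.compile(r"_\d{8}_\d{6}$")  # e.g. _20251228_235959


def generate_unique_filename(original_filename: str) -> str:
    """Append a timestamp suffix unless the name already carries one."""
    path = Path(original_filename)
    name, ext = path.stem, path.suffix
    if TIMESTAMP_RE.search(name):
        return original_filename  # reuse the existing timestamp
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{name}_{stamp}{ext}"
```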

### Comment 4
<location> `astrbot/dashboard/routes/backup.py:440-449` </location>
<code_context>
             logger.error(traceback.format_exc())
             return Response().error(f"上传备份文件失败: {e!s}").__dict__

+    async def upload_init(self):
+        """初始化分片上传
+
+        创建一个上传会话,返回 upload_id 供后续分片上传使用。
+
+        JSON Body:
+        - filename: 原始文件名
+        - total_size: 文件总大小(字节)
+        - total_chunks: 分片总数
+
+        返回:
+        - upload_id: 上传会话 ID
+        - chunk_size: 建议的分片大小
+        """
+        try:
+            data = await request.json
+            filename = data.get("filename")
+            total_size = data.get("total_size", 0)
+            total_chunks = data.get("total_chunks", 0)
+
+            if not filename:
</code_context>

<issue_to_address>
**🚨 suggestion (security):** Lack of server-side validation for total_size/total_chunks could allow pathological upload parameters.

In `upload_init` you only check `total_chunks > 0`. A client can set arbitrarily large `total_size` or huge `total_chunks` (e.g., millions), then issue many tiny chunk requests, causing excessive disk/CPU usage. Consider enforcing reasonable upper bounds for both values and checking that `total_size` roughly matches `total_chunks * CHUNK_SIZE` (with tolerance for a final partial chunk) so you can reject clearly invalid or abusive sessions early.

Suggested implementation:

```python
            data = await request.json
            filename = data.get("filename")
            # Coerce to int in case the client sends strings or other non-int types
            total_size = int(data.get("total_size", 0) or 0)
            total_chunks = int(data.get("total_chunks", 0) or 0)

```

```python
            # Basic parameter validation
            if total_size <= 0:
                return Response().error("无效的文件大小").__dict__

            if total_chunks <= 0:
                return Response().error("无效的分片数量").__dict__

            # Server-side limits to guard against abuse
            MAX_TOTAL_SIZE = 2 * 1024 * 1024 * 1024  # 2GB
            MAX_TOTAL_CHUNKS = 10_000

            if total_size > MAX_TOTAL_SIZE:
                return Response().error("备份文件过大,已超过服务器允许的最大大小").__dict__

            if total_chunks > MAX_TOTAL_CHUNKS:
                return Response().error("分片数量过多,已超过服务器允许的上限").__dict__

            # Consistency check against the server's agreed chunk size.
            # If the codebase/config already defines a shared chunk-size constant, keep this value in sync with it.
            chunk_size = 5 * 1024 * 1024  # 5MB
            # total_size must fall within [(total_chunks-1)*chunk_size + 1, total_chunks*chunk_size]:
            # the final chunk may be partial (at least 1 byte).
            min_expected_size = (total_chunks - 1) * chunk_size + 1
            max_expected_size = total_chunks * chunk_size
            if not (min_expected_size <= total_size <= max_expected_size):
                return Response().error("文件大小与分片数量不匹配,请重新发起备份上传").__dict__

            # Generate the upload session ID
            upload_id = str(uuid.uuid4())

```

1. Check whether this file or the project already defines a shared chunk-size constant (e.g., `CHUNK_SIZE` or `BACKUP_CHUNK_SIZE`). If so:
   - replace `chunk_size = 5 * 1024 * 1024` in this method with that global/config constant;
   - make sure the `chunk_size` field returned to the client (if present later in the code) matches the value used here.
2. If the system should allow larger files or more chunks, extract `MAX_TOTAL_SIZE` and `MAX_TOTAL_CHUNKS` into a config file/constants section and reference them here, so the policy can be adjusted in one place.
</issue_to_address>


- Fix browser-native download authentication by allowing the token to be passed as a URL parameter
- Fix upload-session expiry checks: track last_activity so active uploads are not cleaned up mid-transfer
- Defer starting the background cleanup task to avoid asyncio event-loop issues
- Compute chunk_size and total_chunks on the backend only, avoiding frontend/backend mismatch
- Update the generate_unique_filename docstring to match the actual behavior
- Update tests to validate the origin field

Fixed issues:
- Browser downloads showing "authorization required"
- Large-file uploads failing because the session expired
- asyncio.create_task potentially failing in __init__
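
To make the chunk-size unification concrete, here is a hedged sketch of upload_init with the backend owning the chunking math. Quart's request object is inferred from the review excerpts above; Response().ok(...), the error wording, and the session bookkeeping are assumptions rather than the merged code.

```python
import math
import uuid

from quart import request  # the dashboard appears to be Quart-based

CHUNK_SIZE = 5 * 1024 * 1024  # single source of truth on the server


class UploadInitSketch:
    def __init__(self) -> None:
        self.upload_sessions: dict[str, dict] = {}

    async def upload_init(self):
        data = await request.json
        filename = data.get("filename")
        total_size = int(data.get("total_size", 0) or 0)
        if not filename or total_size <= 0:
            # Response is AstrBot's dashboard response wrapper; wording assumed.
            return Response().error("无效的上传参数").__dict__

        # The server decides the chunking; the client just follows the response.
        total_chunks = math.ceil(total_size / CHUNK_SIZE)
        upload_id = str(uuid.uuid4())
        self.upload_sessions[upload_id] = {
            "filename": filename,
            "total_size": total_size,
            "total_chunks": total_chunks,
            "received": set(),
        }
        return Response().ok({
            "upload_id": upload_id,
            "chunk_size": CHUNK_SIZE,
            "total_chunks": total_chunks,
        }).__dict__
```
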
@dosubot dosubot bot added the lgtm label on Dec 29, 2025
@Soulter Soulter merged commit 9eafd7b into AstrBotDevs:master Dec 29, 2025
6 checks passed