feat: Add chunked upload and backup file management features to the backup module #4237
Merged
Conversation
Contributor
Hey - I've found 4 issues, and left some high-level feedback:
- The background cleanup task is started with `asyncio.create_task` in `BackupRoute.__init__`, which assumes an active event loop at construction time and never gets cancelled; consider deferring startup to a clearly async lifecycle hook (or injecting the loop) and ensuring the task is cancelled on shutdown to avoid warnings and leaked tasks.
- The `CHUNK_SIZE` constant is hard-coded in both the backend (`backup.py`) and the frontend (`BackupDialog.vue`); exposing the chunk size from the backend (e.g., via the `/upload/init` response only) and deriving the frontend value from that would avoid silent desynchronization if one side is changed (see the sketch after this list).
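A minimal sketch of that single-source-of-truth approach, assuming Quart's `request`, an `upload_sessions` dict, and a `Response().ok(...)` helper mirroring the `Response().error(...)` seen in the diff (all names are taken from the review context, not verified against the actual code):
```python
import math
import uuid

CHUNK_SIZE = 5 * 1024 * 1024  # the single authoritative definition, backend-only

async def upload_init(self):
    data = await request.json
    total_size = int(data.get("total_size", 0) or 0)
    if total_size <= 0:
        return Response().error("无效的文件大小").__dict__
    # The client sends only the file size; the server decides the chunking.
    total_chunks = math.ceil(total_size / CHUNK_SIZE)
    upload_id = str(uuid.uuid4())
    self.upload_sessions[upload_id] = {
        "total_size": total_size,
        "total_chunks": total_chunks,
    }
    # BackupDialog.vue slices the file using chunk_size from this response
    # instead of keeping its own hard-coded copy.
    return Response().ok(
        {"upload_id": upload_id, "chunk_size": CHUNK_SIZE, "total_chunks": total_chunks}
    ).__dict__
```
With this shape, the frontend never defines its own chunk size; it only slices the file after `/upload/init` responds.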
## Individual Comments
### Comment 1
<location> `astrbot/dashboard/routes/backup.py:227` </location>
<code_context>
+ def _cleanup_upload_session(self, upload_id: str):
</code_context>
<issue_to_address>
**suggestion:** Upload sessions can expire mid-upload without being refreshed on activity.
`created_at` is only set in `upload_init`, so expiry is based purely on start time. For large/slow uploads that exceed `UPLOAD_EXPIRE_SECONDS`, the cleanup task can remove the session and chunk directory while the client is still sending data, causing `上传会话不存在或已过期` and failed completion. Please update the session timestamp (or add a `last_activity` field) in `upload_chunk` so active uploads aren’t cleaned up early.
</issue_to_address>
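A hedged sketch of the suggested fix, assuming the session-dict shape implied by the review (`created_at`, `UPLOAD_EXPIRE_SECONDS`, an `upload_sessions` mapping); helper names are illustrative:
```python
import time

UPLOAD_EXPIRE_SECONDS = 30 * 60  # assumed value; use the project's constant

class BackupRoute:
    upload_sessions: dict[str, dict] = {}

    def _touch_session(self, upload_id: str) -> None:
        """Refresh the activity timestamp so an active upload is never expired."""
        session = self.upload_sessions.get(upload_id)
        if session is not None:
            session["last_activity"] = time.time()

    def _is_expired(self, session: dict) -> bool:
        # Fall back to created_at for sessions created before this change.
        last = session.get("last_activity", session.get("created_at", 0))
        return time.time() - last > UPLOAD_EXPIRE_SECONDS

# upload_chunk would call self._touch_session(upload_id) right after
# validating the session, and the cleanup loop would use self._is_expired().
```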
### Comment 2
<location> `astrbot/dashboard/routes/backup.py:124-125` </location>
<code_context>
}
self.register_routes()
+ # 启动清理过期上传会话的任务
+ asyncio.create_task(self._cleanup_expired_uploads())
+
def _init_task(self, task_id: str, task_type: str, status: str = "pending") -> None:
</code_context>
<issue_to_address>
**issue (bug_risk):** Starting an asyncio task in __init__ can fail if no running event loop is available.
`asyncio.create_task` here assumes there is an active event loop on the current thread. If `BackupRoute` is constructed at import time or from sync code, this can raise `RuntimeError: no running event loop` or attach the task to the wrong loop. Consider starting this background task from a dedicated async startup/lifecycle hook where the loop is guaranteed to be running, and keep the task handle if you need to cancel it on shutdown.
</issue_to_address>
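One possible shape for that lifecycle hook, sketched below; `start`/`shutdown` are illustrative names, not AstrBot's actual hooks:
```python
import asyncio
from typing import Optional

class BackupRoute:
    def __init__(self) -> None:
        # No create_task here: __init__ may run without a live event loop.
        self._cleanup_task: Optional[asyncio.Task] = None

    async def start(self) -> None:
        # Called from async startup code, so a running loop is guaranteed.
        if self._cleanup_task is None:
            self._cleanup_task = asyncio.create_task(self._cleanup_expired_uploads())

    async def shutdown(self) -> None:
        # Cancel and await the task so it can't outlive the route or warn at exit.
        if self._cleanup_task is not None:
            self._cleanup_task.cancel()
            try:
                await self._cleanup_task
            except asyncio.CancelledError:
                pass  # expected on cancellation
            self._cleanup_task = None

    async def _cleanup_expired_uploads(self) -> None:
        while True:
            await asyncio.sleep(60)  # sweep interval; the real value may differ
            # ... drop expired sessions and their chunk directories here ...
```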
### Comment 3
<location> `astrbot/dashboard/routes/backup.py:64-68` </location>
<code_context>
def generate_unique_filename(original_filename: str) -> str:
- """生成唯一的文件名,添加时间戳前缀
+ """生成唯一的文件名,添加时间戳后缀避免重名
</code_context>
<issue_to_address>
**issue:** Docstring/comment for generate_unique_filename do not match the actual behavior.
The docstring and inline comment mention reusing filenames that already contain a timestamp, but the code always appends `_{timestamp}` to `name` without checking for an existing timestamp. Please either implement timestamp detection (to avoid duplicating timestamps) or update the docstring/comment so they match the actual behavior.
</issue_to_address>
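If the docstring's behavior is what's wanted, a sketch of the timestamp-detection variant could look like this (the `_YYYYMMDD_HHMMSS` suffix format is an assumption; match the regex to whatever the function actually emits):
```python
import re
from datetime import datetime
from pathlib import Path

TIMESTAMP_RE = re.compile(r"_\d{8}_\d{6}$")  # e.g. backup_20251229_153000

def generate_unique_filename(original_filename: str) -> str:
    """Append a timestamp suffix, unless the name already carries one."""
    path = Path(original_filename)
    name, suffix = path.stem, path.suffix
    if TIMESTAMP_RE.search(name):
        return original_filename  # already timestamped; reuse as-is
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{name}_{timestamp}{suffix}"
```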
### Comment 4
<location> `astrbot/dashboard/routes/backup.py:440-449` </location>
<code_context>
logger.error(traceback.format_exc())
return Response().error(f"上传备份文件失败: {e!s}").__dict__
+ async def upload_init(self):
+ """初始化分片上传
+
+ 创建一个上传会话,返回 upload_id 供后续分片上传使用。
+
+ JSON Body:
+ - filename: 原始文件名
+ - total_size: 文件总大小(字节)
+ - total_chunks: 分片总数
+
+ 返回:
+ - upload_id: 上传会话 ID
+ - chunk_size: 建议的分片大小
+ """
+ try:
+ data = await request.json
+ filename = data.get("filename")
+ total_size = data.get("total_size", 0)
+ total_chunks = data.get("total_chunks", 0)
+
+ if not filename:
</code_context>
<issue_to_address>
**🚨 suggestion (security):** Lack of server-side validation for total_size/total_chunks could allow pathological upload parameters.
In `upload_init` you only check `total_chunks > 0`. A client can set arbitrarily large `total_size` or huge `total_chunks` (e.g., millions), then issue many tiny chunk requests, causing excessive disk/CPU usage. Consider enforcing reasonable upper bounds for both values and checking that `total_size` roughly matches `total_chunks * CHUNK_SIZE` (with tolerance for a final partial chunk) so you can reject clearly invalid or abusive sessions early.
Suggested implementation:
```python
data = await request.json
filename = data.get("filename")
# Coerce to int to guard against clients sending strings or other non-int types
total_size = int(data.get("total_size", 0) or 0)
total_chunks = int(data.get("total_chunks", 0) or 0)
```
```python
# Basic parameter validation
if total_size <= 0:
return Response().error("无效的文件大小").__dict__
if total_chunks <= 0:
return Response().error("无效的分片数量").__dict__
# Server-side limits to prevent abuse
MAX_TOTAL_SIZE = 2 * 1024 * 1024 * 1024 # 2GB
MAX_TOTAL_CHUNKS = 10_000
if total_size > MAX_TOTAL_SIZE:
return Response().error("备份文件过大,已超过服务器允许的最大大小").__dict__
if total_chunks > MAX_TOTAL_CHUNKS:
return Response().error("分片数量过多,已超过服务器允许的上限").__dict__
# Consistency check against the server's agreed chunk size
# If a shared chunk-size constant already exists in code/config, keep this value in sync with it
chunk_size = 5 * 1024 * 1024 # 5MB
# total_size should fall within [(total_chunks-1)*chunk_size + 1, total_chunks*chunk_size]
# The final chunk may be partial (at least 1 byte)
min_expected_size = (total_chunks - 1) * chunk_size + 1
max_expected_size = total_chunks * chunk_size
if not (min_expected_size <= total_size <= max_expected_size):
return Response().error("文件大小与分片数量不匹配,请重新发起备份上传").__dict__
# Generate the upload ID
upload_id = str(uuid.uuid4())
```
1. Check whether this file or the project already defines a shared chunk-size constant (e.g., `CHUNK_SIZE`, `BACKUP_CHUNK_SIZE`). If so:
- replace `chunk_size = 5 * 1024 * 1024` in this method with that global/config constant;
- make sure the `chunk_size` field returned to the client (if present later in the code) matches the value used here.
2. If the system should allow larger files or more chunks, extract `MAX_TOTAL_SIZE` and `MAX_TOTAL_CHUNKS` into a config file or constants section and reference them here, so the policy can be adjusted in one place.
</issue_to_address>
- Fix browser-native download authorization by allowing the token to be passed as a URL parameter
- Fix upload-session expiry checks to use last_activity, so active uploads are not cleaned up
- Defer starting the background cleanup task to avoid asyncio event-loop issues
- Compute chunk_size and total_chunks on the backend to keep frontend and backend consistent
- Update the generate_unique_filename docstring to match its actual behavior
- Update test cases to verify the origin field

Fixed issues:
- Browser downloads showing “需要授权” (authorization required)
- Large file uploads possibly failing due to session expiry
- asyncio.create_task in __init__ possibly failing
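For the first item, a minimal sketch of token-in-URL authorization for a download route, assuming a Quart app and a hypothetical `verify_token` helper (the real dashboard's auth check will differ):
```python
from quart import Quart, request, send_from_directory, abort

app = Quart(__name__)

def verify_token(token: str) -> bool:
    # Hypothetical check; the real dashboard would validate its JWT/session token.
    return token == "example-token"

@app.route("/api/backup/download/<filename>")
async def download_backup(filename: str):
    # A browser-native download cannot attach an Authorization header,
    # so fall back to a token passed as a URL query parameter.
    token = request.args.get("token") or request.headers.get(
        "Authorization", ""
    ).removeprefix("Bearer ")
    if not verify_token(token):
        abort(401)  # the "需要授权" case
    # send_from_directory also guards against path traversal in filename.
    return await send_from_directory("backups", filename)
```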
Soulter approved these changes on Dec 29, 2025
Labels
area:core
The bug / feature is about astrbot's core, backend
area:webui
The bug / feature is about the webui (dashboard) of astrbot.
lgtm
This PR has been approved by a maintainer
size:XL
This PR changes 500-999 lines, ignoring generated files.
Some users generate backup files large enough to exceed the maximum request-body limit, so they cannot be uploaded. The existing backup-file management is also fairly bare-bones, lacking filename editing, restoring from the uploaded-backup list, metadata display, and similar features.
Modifications
Modified the export logic in astrbot/core/backup/exporter to add an origin metadata segment, added chunked upload, file renaming, and related methods in astrbot/dashboard/routes/backup.py, and updated the corresponding frontend parts and internationalization. For illustration, a sketch of embedding an origin marker follows below.
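Embedding an origin marker in an archive's metadata might look like the sketch below; the `metadata.json` name and `origin` field are assumptions inferred from this description, not the exporter's actual format:
```python
import json
import time
import zipfile

def write_backup_metadata(zf: zipfile.ZipFile, origin: str = "astrbot") -> None:
    # Record where the backup came from so uploaded archives can be
    # distinguished from locally generated ones on restore.
    metadata = {
        "origin": origin,
        "created_at": int(time.time()),
        "format_version": 1,
    }
    zf.writestr("metadata.json", json.dumps(metadata, ensure_ascii=False))

# Usage sketch:
# with zipfile.ZipFile("backup.zip", "w") as zf:
#     write_backup_metadata(zf)
```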
Screenshots or Test Results
Checklist
I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
Summary by Sourcery
Add chunked backup upload support and enhance backup file management in both backend and dashboard UI.
New Features:
Enhancements:
Tests: