@ocetars (Member) commented Dec 29, 2025

Fixes #4214

Summary by Sourcery

Bug Fixes:

  • Prevent UnicodeDecodeError when decoding pip installation output by adding a more robust, platform-aware decoding strategy.
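
To illustrate the failure mode this fix targets, here is a minimal sketch. It assumes a line of pip output encoded as GBK is decoded strictly as UTF-8; the sample bytes are illustrative and not taken from the linked issue.

```python
# Illustrative sample only: not real pip output.
raw = "中文".encode("gbk")  # bytes as a GBK console might emit them

# Strict UTF-8 decoding rejects these bytes.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Decoding with the encoding that produced the bytes recovers the text.
recovered = raw.decode("gbk")

# errors="replace" never raises, but invalid bytes become U+FFFD.
lossy = raw.decode("utf-8", errors="replace")
```

A fallback chain such as the one in this PR tries the strict decodes first and uses the lossy form only as a last resort.
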

@dosubot (bot) added the size:S (This PR changes 10-29 lines, ignoring generated files.) and area:core (The bug / feature is about astrbot's core, backend) labels Dec 29, 2025
@sourcery-ai (bot, Contributor) left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • Consider caching `locale.getpreferredencoding(False)` and the `sys.platform.startswith("win")` check at module level so `_robust_decode` doesn’t repeat these relatively static lookups for every output line.
  • When falling back to `errors="replace"`, you might want to include a brief marker or debug log when replacements occur so it’s easier to diagnose unexpected encoding issues in the pip output.
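
The two points above could be combined roughly as follows. This is a sketch under assumptions, not the merged code: the names `_PREFERRED_ENCODING`, `_IS_WINDOWS`, `robust_decode_cached`, and the logger are hypothetical and do not appear in the PR.

```python
import locale
import logging
import sys

logger = logging.getLogger(__name__)  # hypothetical logger name

# Looked up once at import time instead of once per output line.
_PREFERRED_ENCODING = locale.getpreferredencoding(False)
_IS_WINDOWS = sys.platform.startswith("win")


def robust_decode_cached(line: bytes) -> str:
    """Hypothetical variant of `_robust_decode` that uses the cached lookups."""
    encodings = ["utf-8", _PREFERRED_ENCODING]
    if _IS_WINDOWS:
        encodings.append("gbk")
    for encoding in encodings:
        try:
            return line.decode(encoding).strip()
        except UnicodeDecodeError:
            continue
    # Every strict attempt failed: record that before decoding lossily.
    logger.debug("Lossy UTF-8 fallback used for pip output: %r", line)
    return line.decode("utf-8", errors="replace").strip()
```

One trade-off of caching at import time is that a later change to the process locale would not be picked up, which is probably acceptable for a short-lived pip subprocess.
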
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `astrbot/core/utils/pip_installer.py:11-24` </location>
<code_context>

+def _robust_decode(line: bytes) -> str:
+    """Decode a byte stream, tolerating different platform encodings."""
+    try:
+        return line.decode("utf-8").strip()
+    except UnicodeDecodeError:
+        pass
+    try:
+        return line.decode(locale.getpreferredencoding(False)).strip()
+    except UnicodeDecodeError:
+        pass
+    if sys.platform.startswith("win"):
+        try:
+            return line.decode("gbk").strip()
+        except UnicodeDecodeError:
+            pass
+    return line.decode("utf-8", errors="replace").strip()
+
+
</code_context>

<issue_to_address>
**suggestion:** Consider using `errors="replace"` (or similar) for the locale/Windows-specific decodes as well to retain more readable output.

Currently only the final UTF-8 decode uses `errors="replace"`; the locale and Windows GBK decodes are strict and then fall back to UTF-8 with replacement. For non-UTF-8 output this can create unnecessary mojibake. Consider allowing the locale/GBK decodes to use `errors="replace"` (or `"ignore"`) so they succeed when mostly valid, and keep the final UTF-8 fallback for cases where all other decodes fail.

```suggestion
    try:
        return line.decode("utf-8").strip()
    except UnicodeDecodeError:
        pass
    try:
        # Use the system's preferred encoding, replacing invalid bytes to keep as much readable output as possible
        return line.decode(locale.getpreferredencoding(False), errors="replace").strip()
    except UnicodeDecodeError:
        pass
    if sys.platform.startswith("win"):
        try:
            # GBK is common on Windows; use the same replacement strategy
            return line.decode("gbk", errors="replace").strip()
        except UnicodeDecodeError:
            pass
    # The final fallback still uses UTF-8 with replace, so no exception can be raised
    return line.decode("utf-8", errors="replace").strip()
```
</issue_to_address>
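
One detail worth checking in the suggestion above: with `errors="replace"`, `bytes.decode` does not raise `UnicodeDecodeError`, so the `except` branches around the locale and GBK decodes would not run and the later fallbacks would not be reached. A minimal sketch, using an illustrative byte string that is an assumption rather than real pip output:

```python
raw = b"\xff"  # illustrative byte: invalid in both UTF-8 and GBK

# Strict decoding raises, which is what lets a fallback chain advance.
try:
    raw.decode("gbk")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

# With errors="replace" the same call returns a string instead of raising,
# so an `except UnicodeDecodeError` wrapped around it can never run.
lossy = raw.decode("gbk", errors="replace")
```

Keeping the intermediate decodes strict and making only the last step lossy, as the code in `<code_context>` does, preserves the ordering of the fallbacks.
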


@ocetars changed the title from "fix(utils): 优化 pip 安装过程中的字符解码逻辑以兼容多平台编码" to "fix(#4214): 优化 pip 安装过程中的字符解码逻辑以兼容多平台编码" Dec 29, 2025
@dosubot (bot) added the lgtm (This PR has been approved by a maintainer) label Dec 29, 2025
@Soulter changed the title from "fix(#4214): 优化 pip 安装过程中的字符解码逻辑以兼容多平台编码" to "fix(#4214): optimize pip install output decoding for cross-platform encoding compatibility" Dec 29, 2025
@Soulter merged commit ef1feb6 into AstrBotDevs:master Dec 29, 2025
6 checks passed
@ocetars deleted the fix/bugs branch December 29, 2025 15:59