fix(#4214): optimize pip install output decoding for cross-platform encoding compatibility #4249

ocetars · 2025-12-29T08:57:50Z

Fixes #4214

Summary by Sourcery

Bug Fixes:

Prevent UnicodeDecodeError when decoding pip installation output by adding a more robust, platform-aware decoding strategy.

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

- Consider caching `locale.getpreferredencoding(False)` and the `sys.platform.startswith("win")` check at module level so `_robust_decode` doesn’t repeat these relatively static lookups for every output line.
- When falling back to `errors="replace"`, you might want to include a brief marker or debug log when replacements occur so it’s easier to diagnose unexpected encoding issues in the pip output.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Consider caching `locale.getpreferredencoding(False)` and the `sys.platform.startswith("win")` check at module level so `_robust_decode` doesn’t repeat these relatively static lookups for every output line.
- When falling back to `errors="replace"`, you might want to include a brief marker or debug log when replacements occur so it’s easier to diagnose unexpected encoding issues in the pip output.

## Individual Comments

### Comment 1
<location> `astrbot/core/utils/pip_installer.py:11-24` </location>
<code_context>

+def _robust_decode(line: bytes) -> str:
+    """Decode a byte stream, tolerating the encodings used on different platforms."""
+    try:
+        return line.decode("utf-8").strip()
+    except UnicodeDecodeError:
+        pass
+    try:
+        return line.decode(locale.getpreferredencoding(False)).strip()
+    except UnicodeDecodeError:
+        pass
+    if sys.platform.startswith("win"):
+        try:
+            return line.decode("gbk").strip()
+        except UnicodeDecodeError:
+            pass
+    return line.decode("utf-8", errors="replace").strip()
+
+
</code_context>

<issue_to_address>
**suggestion:** Consider using `errors="replace"` (or similar) for the locale/Windows-specific decodes as well to retain more readable output.

Currently only the final UTF-8 decode uses `errors="replace"`; the locale and Windows GBK decodes are strict and then fall back to UTF-8 with replacement. For non-UTF-8 output this can create unnecessary mojibake. Consider allowing the locale/GBK decodes to use `errors="replace"` (or `"ignore"`) so they succeed when mostly valid, and keep the final UTF-8 fallback for cases where all other decodes fail.

```suggestion
    try:
        return line.decode("utf-8").strip()
    except UnicodeDecodeError:
        pass
    try:
        # Use the system's preferred encoding, replacing invalid characters to keep as much readable output as possible
        return line.decode(locale.getpreferredencoding(False), errors="replace").strip()
    except UnicodeDecodeError:
        pass
    if sys.platform.startswith("win"):
        try:
            # GBK is common on Windows; use the same replacement strategy
            return line.decode("gbk", errors="replace").strip()
        except UnicodeDecodeError:
            pass
    # The last-resort fallback still uses UTF-8 + replace so that no exception is ever raised
    return line.decode("utf-8", errors="replace").strip()
```
</issue_to_address>
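
One point that may be worth checking before adopting the suggestion: `bytes.decode(..., errors="replace")` does not raise `UnicodeDecodeError`, so a stage that uses it always returns and any later stage is never tried. The sketch below illustrates this with a hypothetical helper `decode_chain` and sample bytes chosen for the demonstration; it is not taken from the PR.

```python
# The word for "install" in Chinese (U+5B89 U+88C5), encoded as GBK.
# Sample bytes chosen purely for illustration.
GBK_BYTES = "\u5b89\u88c5".encode("gbk")


def decode_chain(data: bytes, encodings, errors: str = "strict"):
    """Try each encoding in order and return (text, encoding_used).

    Falls back to UTF-8 with replacement if every stage raises.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding, errors=errors), encoding
        except UnicodeDecodeError:
            continue
    return data.decode("utf-8", errors="replace"), "utf-8 (replace)"


# Strict stages: ASCII rejects the bytes, so the GBK stage gets its turn.
strict_text, strict_used = decode_chain(GBK_BYTES, ["ascii", "gbk"])

# errors="replace" on every stage: the ASCII stage returns U+FFFD
# placeholders instead of raising, so the GBK stage is never reached.
replace_text, replace_used = decode_chain(
    GBK_BYTES, ["ascii", "gbk"], errors="replace"
)

# ascii() keeps the printed output safe on consoles with limited encodings.
print(strict_used, ascii(strict_text))
print(replace_used, ascii(replace_text))
```

If this matters for the installer, keeping the intermediate stages strict and reserving `errors="replace"` for the final fallback preserves the chance for each later encoding to be tried.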

astrbot/core/utils/pip_installer.py

fix(utils): optimize character decoding during pip installation for cross-platform encoding compatibility

2130b23

auto-assign bot requested review from LIghtJUNction and Raven95676 December 29, 2025 08:57

dosubot bot added the size:S (This PR changes 10-29 lines, ignoring generated files.) and area:core (The bug / feature is about astrbot's core, backend) labels Dec 29, 2025

sourcery-ai bot reviewed Dec 29, 2025

astrbot/core/utils/pip_installer.py (review thread resolved)

ocetars changed the title ~~fix(utils): 优化 pip 安装过程中的字符解码逻辑以兼容多平台编码~~ fix(#4214): 优化 pip 安装过程中的字符解码逻辑以兼容多平台编码 Dec 29, 2025

Soulter approved these changes Dec 29, 2025

dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 29, 2025

Soulter changed the title ~~fix(#4214): 优化 pip 安装过程中的字符解码逻辑以兼容多平台编码~~ fix(#4214): optimize pip install output decoding for cross-platform encoding compatibility Dec 29, 2025

Soulter merged commit ef1feb6 into AstrBotDevs:master Dec 29, 2025
6 checks passed

ocetars deleted the fix/bugs branch December 29, 2025 15:59
