-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Update satori_adapter.py - 优化XML解析逻辑,移除正则表达式依赖 #2785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
你好 - 我已经审阅了你的更改 - 这里有一些反馈:
- 自定义的
_extract_namespace_prefixes方法可能会遗漏或错误处理复杂的命名空间场景——考虑使用 ElementTree 或 lxml 内置的命名空间支持,而不是手动字符串扫描。 - 通过
content.replace()移除<quote>可能会意外删除不必要的文本;更安全的方法是从解析后的 XML 树中直接删除该元素,然后重新序列化。 - 注入
<root>和命名空间声明的包装逻辑在多个方法中重复——将其提取到一个单独的辅助函数中,以减少重复并确保一致性。
供 AI 代理使用的提示
请解决此代码审查中的评论:
## 总体评论
- 自定义的 `_extract_namespace_prefixes` 方法可能会遗漏或错误处理复杂的命名空间场景——考虑使用 ElementTree 或 lxml 内置的命名空间支持,而不是手动字符串扫描。
- 通过 `content.replace()` 移除 `<quote>` 可能会意外删除不必要的文本;更安全的方法是从解析后的 XML 树中直接删除该元素,然后重新序列化。
- 注入 `<root>` 和命名空间声明的包装逻辑在多个方法中重复——将其提取到一个单独的辅助函数中,以减少重复并确保一致性。
## 单独评论
### 评论 1
<location> `astrbot/core/platform/sources/satori/satori_adapter.py:475-476` </location>
<code_context>
+ inner_content += child.tail
+
+ # 构造移除了<quote>标签的内容
+ content_without_quote = content.replace(
+ ET.tostring(quote_element, encoding='unicode', method='xml'), "")
+
+ return {
</code_context>
<issue_to_address>
**issue (bug_risk):** 使用字符串替换来移除 <quote> 可能不适用于所有情况。
如果 XML 格式发生变化,使用字符串替换可能会失败。请使用 XML 解析器可靠地移除 <quote> 元素。
</issue_to_address>
### 评论 2
<location> `astrbot/core/platform/sources/satori/satori_adapter.py:452-459` </location>
<code_context>
+ quote_element = None
+ for elem in root.iter():
+ tag_name = elem.tag
+ if '}' in tag_name:
+ tag_name = tag_name.split('}')[1]
+ if tag_name.lower() == "quote":
</code_context>
<issue_to_address>
**suggestion:** 命名空间剥离假设命名空间标签中始终存在 '}'。
如果 XML 格式发生变化或使用不同的分隔符,此方法可能会失效。使用 xml.etree.ElementTree 的命名空间处理将使代码更健壮。
```suggestion
quote_element = None
for elem in root.iter():
# Use ElementTree's QName to get the local name robustly
try:
local_name = ET.QName(elem.tag).localname
except AttributeError:
# Fallback for older Python versions or non-namespaced tags
local_name = elem.tag.split('}')[-1] if '}' in elem.tag else elem.tag
if local_name.lower() == "quote":
quote_element = elem
break
```
</issue_to_address>
### 评论 3
<location> `astrbot/core/platform/sources/satori/satori_adapter.py:605` </location>
<code_context>
- elements.append(Record.fromBase64(src))
- elif src.startswith("http"):
- elements.append(Record.fromURL(src))
+ elements.append(Record(file=src))
+
+ elif tag_name == "quote":
</code_context>
<issue_to_address>
**issue (bug_risk):** 音频/记录处理现在仅使用文件参数,丢失了以前的格式检测。
此更新移除了特定于格式的处理,这可能会影响对 base64 和 URL 源的支持。请确认此更改是否是故意的,并审查可能存在的回归。
</issue_to_address>
### 评论 4
<location> `astrbot/core/platform/sources/satori/satori_adapter.py:382` </location>
<code_context>
def _extract_namespace_prefixes(self, content: str) -> set:
"""提取XML内容中的命名空间前缀"""
prefixes = set()
# 查找所有标签
i = 0
while i < len(content):
# 查找开始标签
if content[i] == '<' and i + 1 < len(content) and content[i + 1] != '/':
# 找到标签结束位置
tag_end = content.find('>', i)
if tag_end != -1:
# 提取标签内容
tag_content = content[i + 1:tag_end]
# 检查是否有命名空间前缀
if ':' in tag_content and 'xmlns:' not in tag_content:
# 分割标签名
parts = tag_content.split()
if parts:
tag_name = parts[0]
if ':' in tag_name:
prefix = tag_name.split(':')[0]
# 确保是有效的命名空间前缀
if prefix.isalnum() or prefix.replace('_', '').isalnum():
prefixes.add(prefix)
i = tag_end + 1
else:
i += 1
# 查找结束标签
elif content[i] == '<' and i + 1 < len(content) and content[i + 1] == '/':
# 找到标签结束位置
tag_end = content.find('>', i)
if tag_end != -1:
# 提取标签内容
tag_content = content[i + 2:tag_end]
# 检查是否有命名空间前缀
if ':' in tag_content:
prefix = tag_content.split(':')[0]
# 确保是有效的命名空间前缀
if prefix.isalnum() or prefix.replace('_', '').isalnum():
prefixes.add(prefix)
i = tag_end + 1
else:
i += 1
else:
i += 1
return prefixes
</code_context>
<issue_to_address>
**issue (code-quality):** 我们发现了这些问题:
- 使用命名表达式简化赋值和条件 ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- 移除冗余条件 ([`remove-redundant-if`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-redundant-if/))
- 在 SatoriPlatformAdapter.\_extract\_namespace\_prefixes 中发现低代码质量 - 16% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>解释</summary>
此函数的质量得分低于 25% 的质量阈值。
此得分是方法长度、认知复杂度和工作内存的组合。
如何解决这个问题?
重构此函数以使其更短、更具可读性可能是有益的。
- 通过将部分功能提取到自己的函数中来减少函数长度。这是你能做的最重要的事情——理想情况下,一个函数应该少于 10 行。
- 减少嵌套,例如通过引入守卫子句来提前返回。
- 确保变量的作用域紧密,以便使用相关概念的代码在函数内部紧密地放在一起,而不是分散开来。</details>
</issue_to_address>
### 评论 5
<location> `astrbot/core/platform/sources/satori/satori_adapter.py:498` </location>
<code_context>
async def _convert_quote_message(self, quote: dict) -> Optional[AstrBotMessage]:
"""转换引用消息"""
try:
quote_abm = AstrBotMessage()
quote_abm.message_id = quote.get("id", "")
# 解析引用消息的发送者
quote_author = quote.get("author", {})
if quote_author:
quote_abm.sender = MessageMember(
user_id=quote_author.get("id", ""),
nickname=quote_author.get("nick", quote_author.get("name", "")),
)
else:
# 如果没有作者信息,使用默认值
quote_abm.sender = MessageMember(
user_id=quote.get("user_id", ""),
nickname="内容",
)
# 解析引用消息内容
quote_content = quote.get("content", "")
quote_abm.message = await self.parse_satori_elements(quote_content)
quote_abm.message_str = ""
for comp in quote_abm.message:
if isinstance(comp, Plain):
quote_abm.message_str += comp.text
quote_abm.timestamp = int(quote.get("timestamp", time.time()))
# 如果没有任何内容,使用默认文本
if not quote_abm.message_str.strip():
quote_abm.message_str = "[引用消息]"
return quote_abm
except Exception as e:
logger.error(f"转换引用消息失败: {e}")
return None
</code_context>
<issue_to_address>
**issue (code-quality):** 使用命名表达式简化赋值和条件 ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
</issue_to_address>帮助我更有用!请点击每个评论上的 👍 或 👎,我将使用这些反馈来改进你的评论。
Original comment in English
Hey there - I've reviewed your changes - here's some feedback:
- The custom
_extract_namespace_prefixesapproach may miss or mis-handle complex namespace scenarios—consider using ElementTree’s or lxml’s built-in namespace support instead of manual string scanning. - Dropping the
<quote>viacontent.replace()risks removing unintended text; it’s safer to remove the element directly from the parsed XML tree and then re-serialize. - The wrapping logic that injects
<root>and namespace declarations is duplicated in several methods—extract it into a single helper to reduce repetition and ensure consistency.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The custom `_extract_namespace_prefixes` approach may miss or mis-handle complex namespace scenarios—consider using ElementTree’s or lxml’s built-in namespace support instead of manual string scanning.
- Dropping the `<quote>` via `content.replace()` risks removing unintended text; it’s safer to remove the element directly from the parsed XML tree and then re-serialize.
- The wrapping logic that injects `<root>` and namespace declarations is duplicated in several methods—extract it into a single helper to reduce repetition and ensure consistency.
## Individual Comments
### Comment 1
<location> `astrbot/core/platform/sources/satori/satori_adapter.py:475-476` </location>
<code_context>
+ inner_content += child.tail
+
+ # 构造移除了<quote>标签的内容
+ content_without_quote = content.replace(
+ ET.tostring(quote_element, encoding='unicode', method='xml'), "")
+
+ return {
</code_context>
<issue_to_address>
**issue (bug_risk):** Using string replace to remove <quote> may not work for all cases.
Using string replacement may fail if XML formatting changes. Use an XML parser to reliably remove the <quote> element.
</issue_to_address>
### Comment 2
<location> `astrbot/core/platform/sources/satori/satori_adapter.py:452-459` </location>
<code_context>
+ quote_element = None
+ for elem in root.iter():
+ tag_name = elem.tag
+ if '}' in tag_name:
+ tag_name = tag_name.split('}')[1]
+ if tag_name.lower() == "quote":
</code_context>
<issue_to_address>
**suggestion:** Namespace stripping assumes '}' is always present for namespaced tags.
This approach may break if the XML format changes or uses different delimiters. Using xml.etree.ElementTree's namespace handling would make the code more robust.
```suggestion
quote_element = None
for elem in root.iter():
# Use ElementTree's QName to get the local name robustly
try:
local_name = ET.QName(elem.tag).localname
except AttributeError:
# Fallback for older Python versions or non-namespaced tags
local_name = elem.tag.split('}')[-1] if '}' in elem.tag else elem.tag
if local_name.lower() == "quote":
quote_element = elem
break
```
</issue_to_address>
### Comment 3
<location> `astrbot/core/platform/sources/satori/satori_adapter.py:605` </location>
<code_context>
- elements.append(Record.fromBase64(src))
- elif src.startswith("http"):
- elements.append(Record.fromURL(src))
+ elements.append(Record(file=src))
+
+ elif tag_name == "quote":
</code_context>
<issue_to_address>
**issue (bug_risk):** Audio/record handling now only uses the file parameter, losing previous format detection.
The update removes format-specific handling, which may affect support for base64 and URL sources. Please confirm if this change is intentional and review for possible regressions.
</issue_to_address>
### Comment 4
<location> `astrbot/core/platform/sources/satori/satori_adapter.py:382` </location>
<code_context>
def _extract_namespace_prefixes(self, content: str) -> set:
"""提取XML内容中的命名空间前缀"""
prefixes = set()
# 查找所有标签
i = 0
while i < len(content):
# 查找开始标签
if content[i] == '<' and i + 1 < len(content) and content[i + 1] != '/':
# 找到标签结束位置
tag_end = content.find('>', i)
if tag_end != -1:
# 提取标签内容
tag_content = content[i + 1:tag_end]
# 检查是否有命名空间前缀
if ':' in tag_content and 'xmlns:' not in tag_content:
# 分割标签名
parts = tag_content.split()
if parts:
tag_name = parts[0]
if ':' in tag_name:
prefix = tag_name.split(':')[0]
# 确保是有效的命名空间前缀
if prefix.isalnum() or prefix.replace('_', '').isalnum():
prefixes.add(prefix)
i = tag_end + 1
else:
i += 1
# 查找结束标签
elif content[i] == '<' and i + 1 < len(content) and content[i + 1] == '/':
# 找到标签结束位置
tag_end = content.find('>', i)
if tag_end != -1:
# 提取标签内容
tag_content = content[i + 2:tag_end]
# 检查是否有命名空间前缀
if ':' in tag_content:
prefix = tag_content.split(':')[0]
# 确保是有效的命名空间前缀
if prefix.isalnum() or prefix.replace('_', '').isalnum():
prefixes.add(prefix)
i = tag_end + 1
else:
i += 1
else:
i += 1
return prefixes
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Remove redundant conditional ([`remove-redundant-if`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-redundant-if/))
- Low code quality found in SatoriPlatformAdapter.\_extract\_namespace\_prefixes - 16% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
### Comment 5
<location> `astrbot/core/platform/sources/satori/satori_adapter.py:498` </location>
<code_context>
async def _convert_quote_message(self, quote: dict) -> Optional[AstrBotMessage]:
"""转换引用消息"""
try:
quote_abm = AstrBotMessage()
quote_abm.message_id = quote.get("id", "")
# 解析引用消息的发送者
quote_author = quote.get("author", {})
if quote_author:
quote_abm.sender = MessageMember(
user_id=quote_author.get("id", ""),
nickname=quote_author.get("nick", quote_author.get("name", "")),
)
else:
# 如果没有作者信息,使用默认值
quote_abm.sender = MessageMember(
user_id=quote.get("user_id", ""),
nickname="内容",
)
# 解析引用消息内容
quote_content = quote.get("content", "")
quote_abm.message = await self.parse_satori_elements(quote_content)
quote_abm.message_str = ""
for comp in quote_abm.message:
if isinstance(comp, Plain):
quote_abm.message_str += comp.text
quote_abm.timestamp = int(quote.get("timestamp", time.time()))
# 如果没有任何内容,使用默认文本
if not quote_abm.message_str.strip():
quote_abm.message_str = "[引用消息]"
return quote_abm
except Exception as e:
logger.error(f"转换引用消息失败: {e}")
return None
</code_context>
<issue_to_address>
**issue (code-quality):** Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| content_without_quote = content.replace( | ||
| ET.tostring(quote_element, encoding='unicode', method='xml'), "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): 使用字符串替换来移除 可能不适用于所有情况。
如果 XML 格式发生变化,使用字符串替换可能会失败。请使用 XML 解析器可靠地移除 元素。
Original comment in English
issue (bug_risk): Using string replace to remove may not work for all cases.
Using string replacement may fail if XML formatting changes. Use an XML parser to reliably remove the element.
| quote_element = None | ||
| for elem in root.iter(): | ||
| tag_name = elem.tag | ||
| if '}' in tag_name: | ||
| tag_name = tag_name.split('}')[1] | ||
| if tag_name.lower() == "quote": | ||
| quote_element = elem | ||
| break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: 命名空间剥离假设命名空间标签中始终存在 '}'。
如果 XML 格式发生变化或使用不同的分隔符,此方法可能会失效。使用 xml.etree.ElementTree 的命名空间处理将使代码更健壮。
| quote_element = None | |
| for elem in root.iter(): | |
| tag_name = elem.tag | |
| if '}' in tag_name: | |
| tag_name = tag_name.split('}')[1] | |
| if tag_name.lower() == "quote": | |
| quote_element = elem | |
| break | |
| quote_element = None | |
| for elem in root.iter(): | |
| # Use ElementTree's QName to get the local name robustly | |
| try: | |
| local_name = ET.QName(elem.tag).localname | |
| except AttributeError: | |
| # Fallback for older Python versions or non-namespaced tags | |
| local_name = elem.tag.split('}')[-1] if '}' in elem.tag else elem.tag | |
| if local_name.lower() == "quote": | |
| quote_element = elem | |
| break |
Original comment in English
suggestion: Namespace stripping assumes '}' is always present for namespaced tags.
This approach may break if the XML format changes or uses different delimiters. Using xml.etree.ElementTree's namespace handling would make the code more robust.
| quote_element = None | |
| for elem in root.iter(): | |
| tag_name = elem.tag | |
| if '}' in tag_name: | |
| tag_name = tag_name.split('}')[1] | |
| if tag_name.lower() == "quote": | |
| quote_element = elem | |
| break | |
| quote_element = None | |
| for elem in root.iter(): | |
| # Use ElementTree's QName to get the local name robustly | |
| try: | |
| local_name = ET.QName(elem.tag).localname | |
| except AttributeError: | |
| # Fallback for older Python versions or non-namespaced tags | |
| local_name = elem.tag.split('}')[-1] if '}' in elem.tag else elem.tag | |
| if local_name.lower() == "quote": | |
| quote_element = elem | |
| break |
| logger.error(f"转换 Satori 消息失败: {e}") | ||
| return None | ||
|
|
||
| def _extract_namespace_prefixes(self, content: str) -> set: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): 我们发现了这些问题:
- 使用命名表达式简化赋值和条件 (
use-named-expression) - 移除冗余条件 (
remove-redundant-if) - 在 SatoriPlatformAdapter._extract_namespace_prefixes 中发现低代码质量 - 16% (
low-code-quality)
解释
此函数的质量得分低于 25% 的质量阈值。
此得分是方法长度、认知复杂度和工作内存的组合。
如何解决这个问题?
重构此函数以使其更短、更具可读性可能是有益的。
- 通过将部分功能提取到自己的函数中来减少函数长度。这是你能做的最重要的事情——理想情况下,一个函数应该少于 10 行。
- 减少嵌套,例如通过引入守卫子句来提前返回。
- 确保变量的作用域紧密,以便使用相关概念的代码在函数内部紧密地放在一起,而不是分散开来。
Original comment in English
issue (code-quality): We've found these issues:
- Use named expression to simplify assignment and conditional (
use-named-expression) - Remove redundant conditional (
remove-redundant-if) - Low code quality found in SatoriPlatformAdapter._extract_namespace_prefixes - 16% (
low-code-quality)
Explanation
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines. - Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.
| quote_abm.message_id = quote.get("id", "") | ||
|
|
||
| # 解析引用消息的发送者 | ||
| quote_author = quote.get("author", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): 使用命名表达式简化赋值和条件 (use-named-expression)
Original comment in English
issue (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)
承接自 #2686 (comment)
由于上次通过PR之后,我观察到 satori_adapter.py 文件没有得到更新,所以此前的改动内容与本次xml相关优化改动 在此次PR中一并提交
Motivation / 动机
此PR旨在优化Satori平台适配器中的XML解析逻辑,完全移除对正则表达式的依赖。
原先的代码在处理XML命名空间前缀和
<quote>标签时使用了正则表达式,不够安全和可靠。混合使用正则表达式和XML解析器导致代码不一致。
Modifications / 改动点
核心改动文件:
astrbot/core/platform/sources/satori/satori_adapter.py
主要改动:
_extract_namespace_prefixes方法,使用字符串遍历方式替代正则表达式提取XML命名空间前缀_extract_quote_element方法,使用统一的XML解析器处理quote标签Verification Steps / 验证步骤
发送包含XML标签的消息(如包含
<quote>、<at>、<img>等标签)验证消息能被正确解析和处理
特别测试包含命名空间前缀的XML内容是否能正确处理(例如音乐卡片、小程序卡片)
Screenshots or Test Results / 运行截图或测试结果
Compatibility & Breaking Changes / 兼容性与破坏性变更
此变更为内部实现优化,不改变任何对外接口和功能行为,完全向后兼容。
Checklist / 检查清单
requirements.txt和pyproject.toml文件相应位置。/ I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations inrequirements.txtandpyproject.toml.Sourcery 总结
优化 Satori 平台适配器,移除所有基于正则表达式的 XML 解析并统一消息元素处理
新功能:
<quote>元素的专用提取并转换为 Reply 组件的功能改进:
__init__中引入PlatformMetadata初始化,并在meta()中返回它杂项:
ws://localhost:5140satori_adapter中移除所有剩余的正则表达式依赖Original summary in English
Summary by Sourcery
Optimize the Satori platform adapter by removing all regex-based XML parsing and unifying message element handling
New Features:
Enhancements:
Chores: