Skip to content

Conversation

@eatvector
Copy link
Contributor

@eatvector eatvector commented Jul 13, 2025

拉取/合并请求描述:(PR description)

修复用户态信号处理函数循环调用导致的执行流中断问题,关联#10501
[

为什么提交这份PR (why to submit this PR)

当前risc-v64信号处理机制存在风险,可能导致线程陷入无法退出的信号处理循环。这一设计缺陷会直接影响线程的正常执行流程,可能造成线程无法正常终止。

问题的典型场景出现在线程取消操作中。当线程A调用pthread_cancel试图取消线程B时,会向线程B发送SIGCANCEL信号。若线程B在执行地址addr处的代码后被信号中断,内核会保存当前上下文,并将控制流重定向到cancel_handler信号处理函数。

危险的情况在于,cancel_handler内部可能会触发新的SIGCANCEL信号,系统就会陷入一个处理循环。具体表现为:线程B执行完cancel_handler后,通过lwp_sigreturn触发系统调用返回内核;内核检测到cancel_handler内部发送的等待处理的SIGCANCEL信号后,会再次将控制流指向cancel_handler,而不是恢复原先被中断的执行点addr,如此形成无法终止的循环。

这种机制缺陷会导致线程B永远只能执行cancel_handler和lwp_sigreturn的代码,无法继续执行addr之后的代码。由于线程无法执行到取消点,pthread_exit也就永远不会被触发,线程B将无法被正常终止。

你的解决方案是什么 (what is your solution)

避免在arch_signal_quit信号处理返回路径中立即处理新信号,而是延迟到下次陷入内核重新返回用户态时,使得线程有机会从第一次被信号中断的地方执行后续代码。

请提供验证的bsp和config (provide the config and bsp)

  • BSP:
  • .config:
  • action:

]

当前拉取/合并请求的状态 Intent for your PR

必须选择一项 Choose one (Mandatory):

  • 本拉取/合并请求是一个草稿版本 This PR is for a code-review and is intended to get feedback
  • 本拉取/合并请求是一个成熟版本 This PR is mature, and ready to be integrated into the repo

代码质量 Code Quality:

我在这个拉取/合并请求中已经考虑了 As part of this pull request, I've considered the following:

  • 已经仔细查看过代码改动的对比 Already check the difference between PR and old code
  • 代码风格正确,包括缩进空格,命名及其他风格 Style guide is adhered to, including spacing, naming and other styles
  • 没有垃圾代码,代码尽量精简,不包含#if 0代码,不包含已经被注释了的代码 All redundant code is removed and cleaned up
  • 所有变更均有原因及合理的,并且不会影响到其他软件组件代码或BSP All modifications are justified and not affect other components or BSP
  • 对难懂代码均提供对应的注释 I've commented appropriately where code is tricky
  • 代码是高质量的 Code in this PR is of high quality
  • 已经使用formatting 等源码格式化工具确保格式符合RT-Thread代码规范 This PR complies with RT-Thread code specification
  • 如果是新增bsp, 已经添加ci检查到.github/workflows/bsp_buildings.yml 详细请参考链接BSP自查

@eatvector eatvector requested a review from BernardXiong as a code owner July 13, 2025 06:23
@github-actions github-actions bot added RT-Smart RT-Thread Smart related PR or issues component: lwp Component labels Jul 13, 2025
@github-actions
Copy link

github-actions bot commented Jul 13, 2025

📌 Code Review Assignment

🏷️ Tag: components

Reviewers: Maihuanyi

Changed Files (Click to expand)
  • components/lwp/arch/risc-v/rv64/lwp_gcc.S

🏷️ Tag: components_lwp

Reviewers: xu18838022837

Changed Files (Click to expand)
  • components/lwp/arch/risc-v/rv64/lwp_gcc.S

📊 Current Review Status (Last Updated: 2025-07-18 17:11 CST)

  • Maihuanyi Pending Review
  • xu18838022837 Pending Review

📝 Review Instructions

  1. 维护者可以通过单击此处来刷新审查状态: 🔄 刷新状态
    Maintainers can refresh the review status by clicking here: 🔄 Refresh Status

  2. 确认审核通过后评论 LGTM/lgtm
    Comment LGTM/lgtm after confirming approval

  3. PR合并前需至少一位维护者确认
    PR must be confirmed by at least one maintainer before merging

ℹ️ 刷新CI状态操作需要具备仓库写入权限。
ℹ️ Refresh CI status operation requires repository Write permission.

@eatvector eatvector changed the title [lwp][rv64] riscv: fix potential signal handler infinite loop [lwp][rv64] fix potential signal handler infinite loop Jul 13, 2025

RESTORE_ALL
SAVE_ALL
j arch_ret_to_user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以结合下aarch64的情况,查看下这部分如何处理比较合适

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

您好,aarch64和riscv64的处理都是类似的,避免在arch_signal_quit->arch_ret_to_user->lwp_thread_signal_catch 这条调用链上使用lwp_thread_signal_catch 处理信号即可,其余部分保持不变。我尝试了三种可能的修改方案:

1 通过给arch_ret_to_user传入一个额外参数已指示是否需要调用lwp_thread_signal_catch,这种方法可以让线程通过统一的arch_ret_to_user返回用户态。但是所有调用arch_ret_to_user的地方都需要额外传入参数,实现示例链接:https://github.com/RT-Thread/rt-thread/compare/master...eatvector:rt-thread:demo0?expand=1

2 单独针对arch_signal_quit实现一个独立的返回用户态调用接口,与之前提交的方案类似,不过复用了arch_ret_to_user的代码已减少重复代码,这种方案应该是在riscv64和aarch64上最清晰,修改也最少的实现,示例链接:https://github.com/RT-Thread/rt-thread/compare/master...eatvector:rt-thread:demo1?expand=1

3 在线程结构体内部增加额外标记已指示线程是否通过arch_signal_quit返回用户态,如果该标志被设置,则在lwp_thread_signal_catch 不进行任何处理,这种方法所需修改的架构相关内容最少,但是得修改相关的c文件,以及相关结构体,实现示例:https://github.com/RT-Thread/rt-thread/compare/master...eatvector:rt-thread:demo3?expand=1

不知道您认为哪种修改方案更好,或者有其他更好的建议吗?

@eatvector eatvector force-pushed the fix/signal-handler-loop branch 2 times, most recently from c1b8586 to ee02769 Compare July 17, 2025 04:06
@kurisaW
Copy link
Member

kurisaW commented Jul 17, 2025

ci问题需要等该PR合并后即可正常:

@eatvector
Copy link
Contributor Author

ci问题需要等该PR合并后即可正常:

好的

@kurisaW kurisaW closed this Jul 18, 2025
@kurisaW kurisaW reopened this Jul 18, 2025
@eatvector eatvector force-pushed the fix/signal-handler-loop branch from ee02769 to a725673 Compare July 18, 2025 09:10
@Rbb666 Rbb666 merged commit ee1fe20 into RT-Thread:master Jul 23, 2025
61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component: lwp Component RT-Smart RT-Thread Smart related PR or issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants