
<fix>[core]: synchronize consistent hash ring to prevent dual-MN race condition #3332

Open
MatheMatrix wants to merge 1 commit into 5.5.6 from sync/ye.zou/fix/ZSTAC-77711

Conversation

@MatheMatrix
Owner

Summary

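  • ZSTAC-77711: In a dual-MN deployment the consistent hash ring becomes inconsistent, so messages are routed to the wrong MN and UI tasks stall.
  • Root cause: nodeJoin/nodeLeft/iJoin, makeDestination, and related methods are not synchronized, so the heartbeat thread and the event thread modify nodeHash and nodes concurrently.
  • Fix: every method that reads or writes nodeHash/nodes now takes a synchronized lock, and getManagementNodesInHashRing/getAllNodeInfo return defensive copies.
  • Additional fix: a bug in getNodeInfo involving the return value of nodes.put().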
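
The last bullet most likely refers to a common pitfall with java.util.Map.put: it returns the value previously associated with the key (null on a first insert), not the value just stored. The PR page does not show the original faulty line, so the following is only an illustrative sketch; the class and method names (PutReturnValueDemo, buggyLookup, fixedLookup) and the String stand-in for NodeInfo are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of the PR.
class PutReturnValueDemo {
    // Stands in for the `nodes` map; String replaces NodeInfo for brevity.
    private final Map<String, String> nodes = new HashMap<>();

    // Pitfall: Map.put returns the PREVIOUS mapping, which is null on a
    // first insert, so the caller receives null instead of the new entry.
    String buggyLookup(String uuid) {
        String info = nodes.get(uuid);
        if (info == null) {
            info = nodes.put(uuid, "info-" + uuid);
        }
        return info;
    }

    // Safer shape: build the value first, store it, then return it.
    String fixedLookup(String uuid) {
        String info = nodes.get(uuid);
        if (info == null) {
            info = "info-" + uuid;
            nodes.put(uuid, info);
        }
        return info;
    }
}
```

In this sketch the first call to buggyLookup for a given key returns null, while later calls succeed because the entry is already cached, which is the kind of intermittent behavior that is easy to miss in testing.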

Files Changed

  • ResourceDestinationMakerImpl.java — synchronized lock on all methods
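
The locking approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual ResourceDestinationMakerImpl code: GuardedHashRing, the TreeMap-based ring, and the use of String.hashCode() as the ring position are all hypothetical simplifications (a real ring would use a stronger hash and handle collisions).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simplified ring; names and hashing are assumptions.
class GuardedHashRing {
    private final Object lock = new Object();
    // Ring position -> node UUID. hashCode() is a toy hash; collisions are ignored here.
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    // Node UUID -> node info (String stands in for NodeInfo).
    private final Map<String, String> nodes = new HashMap<>();

    void nodeJoin(String uuid) {
        synchronized (lock) { // both structures change together under one lock
            ring.put(uuid.hashCode(), uuid);
            nodes.put(uuid, "info-" + uuid);
        }
    }

    void nodeLeft(String uuid) {
        synchronized (lock) {
            ring.remove(uuid.hashCode());
            nodes.remove(uuid);
        }
    }

    String makeDestination(String resourceUuid) {
        synchronized (lock) { // readers take the same lock as writers
            if (ring.isEmpty()) {
                throw new IllegalStateException("no nodes in ring");
            }
            // First node at or after the resource's position, wrapping around.
            Map.Entry<Integer, String> e = ring.ceilingEntry(resourceUuid.hashCode());
            return (e != null ? e : ring.firstEntry()).getValue();
        }
    }

    List<String> getNodesInRing() {
        synchronized (lock) {
            // Defensive copy: callers iterate a snapshot, not the live structure.
            return new ArrayList<>(ring.values());
        }
    }
}
```

Returning a copy means a caller can iterate the result without holding the lock and without risking a ConcurrentModificationException when a node joins or leaves in the meantime; the cost is that the snapshot may be slightly stale.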

Resolves: ZSTAC-77711

sync from gitlab !9154

… in dual-MN hash ring

Resolves: ZSTAC-77711

Change-Id: I000273d162bd7f129256904c14c75d4d6e60cb18
@coderabbitai

coderabbitai bot commented Feb 12, 2026

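Walkthrough

A dedicated lock object and synchronized blocks were added to protect access to shared state in ResourceDestinationMakerImpl. Several methods now synchronize when modifying or reading the node hash and the node map, which improves thread safety.

Changes

Cohort / File(s) | Summary
Thread-safety hardening: core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java | Added a private lock object and synchronized blocks to nodeJoin, nodeLeft, iAmDead, iJoin, makeDestination, getManagementNodesInHashRing, getNodeInfo, getAllNodeInfo, getManagementNodeCount, isNodeInCircle, and related methods, so that concurrent access to the shared node data structures is safe while existing exception-handling behavior is preserved.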

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

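🐰 Hey hey hey, come see the new lock,
The dance of many threads falls into step,
Shared state is no longer in disarray,
Synchronization stands guard,
And nodes stroll safely along the hash ring!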


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)
Check name | Status | Explanation | Resolution
Title check | ❌ Error | The PR title exceeds the 72-character limit (actual length: 79 characters) and does not meet the format requirement. | Shorten the title to 72 characters or fewer, for example: '[core]: synchronize hash ring to prevent MN race condition'
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name | Status | Explanation
Description check | ✅ Passed | The PR description explains the root cause, the fix, and the files involved in detail, and is closely related to the changeset.

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java (2)

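56-65: Running a database query inside the synchronized block may hurt concurrency.

The iJoin method runs the Q.New(ManagementNodeVO.class).list() database query while holding the lock. Node joins are infrequent, but if the database responds slowly, other threads waiting for the lock (such as the heartbeat thread) may be blocked.

Consider moving the DB query outside the synchronized block:

♻️ Optional refactor: move the DB query out of the synchronized block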
 @Override
 public void iJoin(ManagementNodeInventory inv) {
+    List<ManagementNodeVO> lst = Q.New(ManagementNodeVO.class).list();
     synchronized (lock) {
-        List<ManagementNodeVO> lst = Q.New(ManagementNodeVO.class).list();
         lst.forEach((ManagementNodeVO node) -> {
             nodeHash.add(node.getUuid());
             nodes.put(node.getUuid(), new NodeInfo(node));
         });
     }
 }

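92-109: Running a DB query inside the synchronized block on a cache miss may cause lock contention.

On a cache miss, the getNodeInfo method runs the dbf.findByUuid database query while holding the lock. If the method is called frequently and misses the cache, other threads may wait a long time.

Consider a double-checked locking pattern to reduce lock contention:

♻️ Optional refactor: double-checked locking pattern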
 @Override
 public NodeInfo getNodeInfo(String nodeUuid) {
+    // First check the cache while holding the lock
+    synchronized (lock) {
+        NodeInfo info = nodes.get(nodeUuid);
+        if (info != null) {
+            return info;
+        }
+    }
+
+    // Query DB outside lock
+    ManagementNodeVO vo = dbf.findByUuid(nodeUuid, ManagementNodeVO.class);
+    if (vo == null) {
+        throw new ManagementNodeNotFoundException(nodeUuid);
+    }
+
+    // Re-acquire lock and double-check before inserting
     synchronized (lock) {
         NodeInfo info = nodes.get(nodeUuid);
         if (info == null) {
-            ManagementNodeVO vo = dbf.findByUuid(nodeUuid, ManagementNodeVO.class);
-            if (vo == null) {
-                throw new ManagementNodeNotFoundException(nodeUuid);
-            }
-
             nodeHash.add(nodeUuid);
             info = new NodeInfo(vo);
             nodes.put(nodeUuid, info);
         }
-
         return info;
     }
 }
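
The suggestion above (check under the lock, load outside it, re-check before inserting) can be exercised in isolation with a self-contained sketch. The names here are assumptions for illustration: NodeInfoCache is hypothetical, a Function<String, String> loader stands in for dbf.findByUuid, and IllegalArgumentException stands in for ManagementNodeNotFoundException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache illustrating the check / load / re-check shape.
class NodeInfoCache {
    private final Object lock = new Object();
    private final Map<String, String> nodes = new HashMap<>();
    // Stands in for the database lookup; returns null when the node is unknown.
    private final Function<String, String> loader;

    NodeInfoCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String getNodeInfo(String uuid) {
        // 1. Fast path: look in the cache while holding the lock.
        synchronized (lock) {
            String cached = nodes.get(uuid);
            if (cached != null) {
                return cached;
            }
        }

        // 2. Slow path: run the (possibly slow) lookup WITHOUT holding the lock.
        String loaded = loader.apply(uuid);
        if (loaded == null) {
            throw new IllegalArgumentException("node not found: " + uuid);
        }

        // 3. Re-check: another thread may have inserted the entry meanwhile.
        synchronized (lock) {
            String existing = nodes.get(uuid);
            if (existing != null) {
                return existing;
            }
            nodes.put(uuid, loaded);
            return loaded;
        }
    }
}
```

The trade-off in this shape is that two threads missing the cache at the same time may both run the lookup; only the first result is stored and the other is discarded. That is usually acceptable when the lookup is idempotent, as a read by UUID would be.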


2 participants