<fix>[plugin-premium]: GPU/VM page keeps loading when shutting down or...#3549
<fix>[plugin-premium]: GPU/VM page keeps loading when shutting down or...#3549ZStack-Robot wants to merge 153 commits into5.5.6from
Conversation
…igration. support starting vm without nic in other clusters. Resolves/Related: ZSTAC-80468 Change-Id: I6370646f7a796265677a6a656c716f6867706d69
…isk offering Resolves: ZSTAC-74683 Change-Id: Id0339ed0221e92e506f60745cde972cc3ee6d9ae
When anti-split-brain check selects a disconnected MDS node, the HTTP call now times out after 30s instead of 5+ minutes, and automatically retries the next available MDS via tryNext mechanism. Resolves: ZSTAC-80595 Change-Id: I1be80f1b70cad1606eb38d1f0078c8f2781e6941
When MN restarts during a destroy operation, the hypervisor may report the VM as Stopped. Without this transition, the state machine throws an exception and the VM stays stuck in Destroying state forever. Resolves: ZSTAC-80620 Change-Id: I037edba70d145a44a88ce0d3573089182fedb162
…pacity Resolves: ZSTAC-79709 Change-Id: I45a2133bbb8c51c25ae3549d59e588976192a08d
Resolves: ZSTAC-78989 Change-Id: I0fe3a56ab724978944c69afadaab7ff7353e4c0f
Resolves: ZSTAC-82153 Change-Id: Ib51c2e21553277416d1a9444be55aca2aa4b2fc4
…fterAddIpAddress to prevent NPE during rollback Resolves: ZSTAC-81741 Change-Id: I53bcf20a10306afc7b6172da294d347b74e6c41f
Resolves: ZSTAC-81182 Change-Id: Id1bb642154dc66ae9995dcc4d9fc00cdce9bcaf8
<fix>[vm]: add Destroying->Stopped state transition See merge request zstackio/zstack!9156
<fix>[i18n]: improve snapshot error message for unattached volume See merge request zstackio/zstack!9192
<fix>[zbs]: reduce mds connect timeout and enable tryNext for volume clients See merge request zstackio/zstack!9153
<fix>[vm]: use max of virtual and actual size for root disk allocation See merge request zstackio/zstack!9155
…talling In dual management node scenarios, concurrent modifications to the consistent hash ring from heartbeat reconciliation and canonical event callbacks can cause NodeHash/Nodes inconsistency, leading to message routing failures and task timeouts. Fix: (1) synchronized all ResourceDestinationMakerImpl methods to ensure atomic nodeHash+nodes updates, (2) added lifecycleLock in ManagementNodeManagerImpl to serialize heartbeat reconciliation with event callbacks, (3) added two-round delayed confirmation before removing nodes from hash ring to avoid race with NodeJoin events. Resolves: ZSTAC-77711 Change-Id: I3d33d53595dd302784dff17417a5b25f2d0f3426
<fix>[network]: filter reserved IPs in getFreeIp See merge request zstackio/zstack!9170
<fix>[ceph]: apply over-provisioning ratio when releasing snapshot capacity See merge request zstackio/zstack!9162
<fix>[network]: add null check for L3 network system tags in API interceptor See merge request zstackio/zstack!9169
The mdsUrls field in ExternalPrimaryStorage config contains user:password@host format credentials. Add desensitization to mask credentials as ***@host in API/CLI output. Resolves: ZSTAC-80664 Change-Id: I94bdede5a1b52eb039de70efb5458693484405f7
Add ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT constant to CloudOperationsErrorCode for use in premium volumebackup module. Resolves: ZSTAC-82195 Change-Id: Ibc405876e1171b637cf76b91a6822574fb6e7811
<fix>[core]: synchronize consistent hash ring to prevent dual-MN race condition See merge request zstackio/zstack!9154
SyncTaskFuture constructor calls Context.current() unconditionally, triggering ServiceLoader for ContextStorageProvider even when telemetry is disabled. If sentry-opentelemetry-bootstrap jar is on classpath, ServiceLoader fails with "not a subtype" due to ClassLoader isolation in Tomcat, throwing ServiceConfigurationError (extends Error) that escapes all catch(Exception) blocks. 1. Why is this change necessary? MN startup crashes with ORG_ZSTACK_CORE_WORKFLOW_10001 because Context.current() triggers ServiceLoader unconditionally in SyncTaskFuture constructor, even when telemetry is disabled. 2. How does it address the problem? Only call Context.current() when isTelemetryEnabled() returns true, matching the existing guard pattern used in other DispatchQueueImpl code paths (lines 351, 1069). 3. Are there any side effects? None. When telemetry is disabled, parentContext was never used. # Summary of changes (by module): - core/thread: conditionalize Context.current() in SyncTaskFuture Related: ZSTAC-82275 Change-Id: I5c0e1f15769c746c630028a29df8cf1815620608
<fix>[thread]: guard Context.current() with telemetry check See merge request zstackio/zstack!9202
Resolves: ZSTAC-82195 Change-Id: I3d5e91d09d7c088d3c53e3839f8b32f4bce32dec
<fix>[loadBalancer]: block SLB deletion during grayscale upgrade See merge request zstackio/zstack!9187
<fix>[volumebackup]: add backup cancel timeout error code See merge request zstackio/zstack!9200
<fix>[core]: add @nologging to sensitive config fields See merge request zstackio/zstack!9171
…plit-brain When a management node departs, its VM skip-trace entries were immediately removed. If VMs were still being started by kvmagent, the next VM sync would falsely detect them as Stopped and trigger HA, causing split-brain. Fix: transfer departed MN skip-trace entries to an orphaned set with 10-minute TTL instead of immediate deletion. VMs in the orphaned set remain skip-traced until the TTL expires or they are explicitly continued, preventing false HA triggers during MN restart scenarios. Resolves: ZSTAC-80821 Change-Id: I3222e260b2d7b33dc43aba0431ce59a788566b34
…anup Resolves: ZSTAC-80821 Change-Id: I59284c4e69f5d2ee357b1836b7c243200e30949a
Resolves: ZSTAC-77544 Change-Id: I1f711bff9c1e87a8cbf6a2eb310ca6086f0f99ba
Resolves: ZSTAC-80821 Change-Id: Ia9a9597feceb96b3e6e22259e2d0be7bde8ae499
<fix>[conf]: use absolute path for ansible version check during upgrade See merge request zstackio/zstack!9356
…ostic Resolves: ZSTAC-82318 Change-Id: If68d507a3eeb3f48458d9fbf61c357339688d43b
setProcessedBytes/setTotalBytes/setSpeedBytesPerSecond renamed to
setProcessed/setTotal/setSpeed in unit-agnostic refactor.
Add setUnit("bytes") to preserve semantic clarity.
Related: ZSTAC-82318
Change-Id: Ied29ac01d488062a03d792378432522fbf0a523f
Co-Authored-By: Claude Opus 4.6 (1M context) <ye.zou@zstack.io>
…e_exporter Resolves: ZSTAC-82619 Change-Id: I581385808fe4d942369b98f175eb8fbbdb382a0d
…od (ZSTAC-71075) Resolves: ZSTAC-71075 Change-Id: I21ebd76ba332995252f67f4cacf78022587a215a
<feature>[longjob]: standardize progress detail fields to be unit-agnostic See merge request zstackio/zstack!9358
fix(ai): ZSTAC-68709 add eval task queuing with concurrency control See merge request zstackio/zstack!9295
…alization
CloudBusGson.httpGson does not disableHtmlEscaping(), causing single
quotes in error details to be escaped as \u0027. While valid JSON,
the SDK deserializes these literally, breaking string assertions in
tests like BatchAddBareMetal2ChassisCase that check error.details
with contains("'reboot' must be 'No'").
Restore JSONObjectUtil.toJsonString() for the sendReplyResponse error
path while keeping the centralized localization logic.
Co-Authored-By: ye.zou <ye.zou@zstack.io>
APIImpact DBImpact GlobalConfigImpact Resolves: ZSTAC-81969 Change-Id: I736e77646266696e646271766d7170627378796b
When vSwitchType is LinuxBridge, physicalInterface must not be null or empty. Add interceptor check in L2NetworkApiInterceptor and unit test in AttachL2NetworkCase. ZSTAC-83300 Change-Id: I333a5bac866354c366051fece84e447f089e2306 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ZSTAC-81797: iSCSI fencer use platform compact hostId instead of storage bdc id See merge request zstackio/zstack!9274
<fix>[conf]: retry rm -rf virtualenv to avoid race with zstack_service_exporter See merge request zstackio/zstack!9365
<fix>[network]: set nic ip out of l3 cidr scope See merge request zstackio/zstack!9374
Resolves: ZSTAC-81969 Change-Id: I7176617669636d7362616973787a68766c6d6579
…id loader init - GlobalErrorCodeI18nServiceImpl: add final fallback to error.code/empty string when both details and description are null, guaranteeing message is never null - Platform: use existing loader field instead of getComponentLoader() to avoid triggering ComponentLoader creation during early startup; move message fallback outside catch block so it always runs Co-Authored-By: ye.zou <ye.zou@zstack.io> Change-Id: I8571f05657dc2173cc232b511f505eedc68d714e
<fix>[vmScheduling]: change GET scheduling APIs from POST to GET method (ZSTAC-71075) See merge request zstackio/zstack!9373
<feature>[errorcode]: simplify i18n — guarantee message is never null See merge request zstackio/zstack!9354
Add libvirt.tls.enabled GlobalConfig and useTls field in MigrateVmCmd to support TLS-encrypted libvirt connections for migration and V2V. Resolves: ZSTAC-81343 Change-Id: I391fa36c0dd63c25c5d85d102bc3579c8eb3d685
<fix>[docs]: move to zstack/docs See merge request zstackio/zstack!9379
<fix>[l2network]: validate physicalInterface for LinuxBridge See merge request zstackio/zstack!9375
<feature>[kvm]: add libvirt TLS config See merge request zstackio/zstack!9317
…VMHost
1. CallBackNetworkChecker.stopAnsible() creates Ssh object but never
closes it, leaking JSch Session thread on every call.
2. KVMHost "check-host-is-taken-over" flow has the same pattern.
When a host continuously fails reconnect (e.g. sharedblock VG error),
each reconnect cycle leaks 1-2 SSH sessions. Over days this exhausts
heap memory (18000+ threads observed) causing MN OOM and Unknown status.
Fix: add finally { ssh.close() } to both code paths.
Resolves: ZSTAC-83305
Resolves: ZSTAC-82731
Change-Id: I786e766e79776d6b7a75786d776278696c76666f
<fix>[core,kvm]: fix SSH session leak in CallBackNetworkChecker and KVMHost See merge request zstackio/zstack!9398
when user define ip/netmask/gateway, then gateway must be in the cidr of ip/netamsl Resolves: ZSTAC-83321 Change-Id: I78616376707476666a7a72786a7062766272686b
<fix>[compute]: fix user define param error See merge request zstackio/zstack!9387
Walkthrough新增与修改大量功能与接口,覆盖外部服务配置、国际化错误码、长作业进度结构、网络 IPAM 与静态 IP 管理、KVM/Libvirt TLS 支持、数据库 schema 迁移与函数扩展、多个插件与后端行为调整,以及文档与安装脚本向 Python3.11 迁移等多处变更。 Changes
代码审查工作量评估🎯 5 (Critical) | ⏱️ ~120 分钟 兔子的诗
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
…r encountering errors in Zaku cluster. Resolves: ZSTAC-80202 Change-Id: I7778676171646874706164777869707279776172
…r encountering errors in Zaku cluster. Resolves: ZSTAC-80202 Change-Id: I7778676171646874706164777869707279776172
63007bb to
107fb32
Compare
There was a problem hiding this comment.
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
compute/src/main/java/org/zstack/compute/vm/VmNicManagerImpl.java (1)
56-64:⚠️ Potential issue | 🟠 Major空检查应在更新 UsedIpVO 之前执行
当前第 58 行在检查
VmNicVO是否存在之前就更新了UsedIpVO的vmNicUuid字段。如果vmNicUUid对应的 NIC 不存在,UsedIpVO将被设置为一个无效的引用,导致数据不一致。建议将空检查移至 SQL 更新之前:
🔧 建议修复方案
`@Override` public void afterAddIpAddress(String vmNicUUid, String usedIpUuid) { - /* update UsedIpVO */ - SQL.New(UsedIpVO.class).eq(UsedIpVO_.uuid, usedIpUuid).set(UsedIpVO_.vmNicUuid, vmNicUUid).update(); - VmNicVO nic = Q.New(VmNicVO.class).eq(VmNicVO_.uuid, vmNicUUid).find(); if (nic == null) { logger.debug(String.format("VmNic[uuid:%s] not found, skip afterAddIpAddress", vmNicUUid)); return; } + + /* update UsedIpVO */ + SQL.New(UsedIpVO.class).eq(UsedIpVO_.uuid, usedIpUuid).set(UsedIpVO_.vmNicUuid, vmNicUUid).update();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmNicManagerImpl.java` around lines 56 - 64, In afterAddIpAddress, don't update UsedIpVO before verifying the NIC exists: first fetch VmNicVO via Q.New(VmNicVO.class).eq(VmNicVO_.uuid, vmNicUUid).find() and if it is null log and return, and only after that call SQL.New(UsedIpVO.class).eq(UsedIpVO_.uuid, usedIpUuid).set(UsedIpVO_.vmNicUuid, vmNicUUid).update(); this prevents setting vmNicUuid to an invalid reference.conf/tools/install.sh (1)
143-151:⚠️ Potential issue | 🟠 Major
zstack-dashboard的升级路径还会复用旧的 Python 2 venv。这里只有“目录不存在才创建”的判断。升级场景下,如果
/var/lib/zstack/virtualenv/zstack-dashboard已经是旧的 Python 2 环境,后面的pip install仍会继续落在旧解释器里,这次 3.11 迁移对存量节点就不生效了。这里也应该复用前面新加的ensure_python3_venv。🛠️ 建议修正
- if [ ! -d "$UI_VIRENV_PATH" ]; then - python3.11 -m venv $UI_VIRENV_PATH - if [ $? -ne 0 ]; then - rm -rf $UI_VIRENV_PATH - exit 1 - fi - fi + ensure_python3_venv "$UI_VIRENV_PATH"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@conf/tools/install.sh` around lines 143 - 151, The current install logic only creates the venv at UI_VIRENV_PATH when the directory is missing, so upgrades can keep an old Python2 venv; call the shared ensure_python3_venv routine (or perform a version check on UI_VIRENV_PATH and recreate it if its python is not 3.11) before running pip installs. Specifically, replace the "if [ ! -d \"$UI_VIRENV_PATH\" ]" creation-only flow by invoking ensure_python3_venv (or checking the venv's python binary via UI_VIRENV_PATH/bin/python and recreating the venv if its version != 3.11) so the zstack-dashboard venv is guaranteed to use Python 3.11.core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java (1)
80-92:⚠️ Potential issue | 🟠 Major
getNodeInfo()在缓存未命中时返回null第 89 行使用
nodes.put()的返回值(前一个值),而非新创建的NodeInfo对象。当缓存未命中时,put()返回null,导致方法返回null。第 893 行的调用链getNodeInfo(nodeUuid).getNodeIP()会在 null 值上触发 NullPointerException,虽然被 try-catch 捕获,但导致setHostName()未被执行,主机名设置被无声忽略。应改为:
nodeHash.add(nodeUuid); - info = nodes.put(nodeUuid, new NodeInfo(vo)); + info = new NodeInfo(vo); + nodes.put(nodeUuid, info);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java` around lines 80 - 92, In getNodeInfo(String nodeUuid) the code stores new NodeInfo via nodes.put(...) but returns the put() result (the previous value), so on cache miss it returns null; fix by constructing the new NodeInfo (e.g. new NodeInfo(vo)), assign it to a local variable, put that instance into nodes (nodes.put(nodeUuid, newInfo) or putIfAbsent as appropriate), add nodeUuid to nodeHash, and return the newInfo; update the method so it always returns the actual NodeInfo instance (reference getNodeInfo) instead of the put() return value.compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java (3)
1280-1282:⚠️ Potential issue | 🟠 Major静态 DNS 写入发生在网络服务重放之后,Running VM 这次请求里拿不到新 DNS。
这两个分支都会先执行
VmApplyNetworkServiceOnChangeIPFlow,然后才在 done/success 回调里setStaticDns(...)。对运行中的 VM 来说,本次 change-IP 流程只会把 tag 存下来,DNS 不会在当前请求里立即下发。建议像changeVmNicNetwork()里的SetStaticIp.setDns()一样,把 DNS 写入前移到 flow 内、放在 apply flow 之前,并补上失败回滚。Based on learnings:
NetworkServiceManagerImpl.getVmNicDns()reads VM NIC DNS from theSTATIC_DNSsystem tag keyed byl3NetworkUuid.Also applies to: 3707-3717, 3763-3766
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines 1280 - 1282, The static DNS is being written after VmApplyNetworkServiceOnChangeIPFlow finishes so a Running VM won't get the DNS in this request; modify the change-IP path in VmInstanceBase so that setting the STATIC_DNS tag (what setStaticDns does) is done inside VmApplyNetworkServiceOnChangeIPFlow before the apply step (mirroring changeVmNicNetwork -> SetStaticIp.setDns behavior) and add a rollback that removes/restores the previous STATIC_DNS tag on flow failure; update references to NetworkServiceManagerImpl.getVmNicDns to continue reading the tag and ensure both the primary branch here and the other occurrences around the specified blocks (3707-3717, 3763-3766) apply the same change.
1088-1110:⚠️ Potential issue | 🟠 Major同 IP 更新时,
overrideInfo会被这段短路逻辑直接吞掉。现在
setIpamStaticIp()已经把netmask/gateway/ipv6Prefix/ipv6Gateway传进来了,但只要请求里的 IP 和当前 NIC IP 相同,Line 1106-1110 就会直接返回成功,Line 1147-1155 的覆盖字段根本不会落到AllocateIpMsg。这样“只改网关/前缀、不改 IP”的请求会表现为成功但实际不生效。建议只有在 IP 和这些 override 字段都未变化 时才走这个 fast path。Based on learnings:
fillIpv4Parameters()/fillIpv6Parameters()intentionally persist user-suppliednetmask/gateway/ipv6Prefix/ipv6Gatewaywhen those fields are provided.Also applies to: 1147-1155, 3747-3753
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines 1088 - 1110, The fast-path in changeVmIp currently returns success when the requested IPs equal the NIC's IPs, which discards overrideInfo (netmask/gateway/ipv6Prefix/ipv6Gateway) so updates that only change those fields never reach AllocateIpMsg; modify the check in changeVmIp (and the analogous checks around the blocks that handle AllocateIpMsg at lines referenced) to compare not only IPs (targetNic.getUsedIps() / staticIpMap.values()) but also the corresponding override fields from overrideInfo (e.g., netmask, gateway, ipv6Prefix, ipv6Gateway) against the NIC’s current values (use fillIpv4Parameters/fillIpv6Parameters semantics to detect when user supplied values are present); only short-circuit and call completion.success() when both IPs and all provided override fields are unchanged, otherwise proceed so setIpamStaticIp and AllocateIpMsg receive and persist the overrideInfo.
143-179:⚠️ Potential issue | 🟠 Major
NoState只补了 tracer 收敛路径,checkState()失败回收链路里仍会把它当异常抛掉。
stopVm()、rebootVm()、destroy()、migrate等失败后的收敛逻辑都会回到checkState(...)。那里目前只接受Running/Stopped/Paused,一旦主机返回NoState就会直接抛CloudRuntimeException,外层 completion 不会继续走完,这类请求仍然可能维持 loading。🔧 建议补齐 `checkState()` 的 `NoState` 分支
- } else { + } else if (VmInstanceState.NoState.toString().equals(state)) { + changeVmStateInDb(VmInstanceStateEvent.noState, () -> self.setHostUuid(hostUuid)); + } else { throw new CloudRuntimeException(String.format( "CheckVmStateOnHypervisorMsg should only report states[Running, Paused or Stopped]," + "but it reports %s for the vm[uuid:%s] on the host[uuid:%s]", state, self.getUuid(), hostUuid)); }Also applies to: 1655-1659
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines 143 - 179, The checkState(...) method in VmInstanceBase currently only handles Running/Stopped/Paused and throws on any other string; add a branch to handle the hypervisor "NoState" response so it doesn't escalate to a CloudRuntimeException: in VmInstanceBase.checkState, after retrieving state (String state = r.getStates().get(self.getUuid())), add an else-if for VmInstanceState.NoState.toString() that calls changeVmStateInDb(VmInstanceStateEvent.unknown) and completes the NoErrorCompletion (completion.done()), mirroring the existing null/error handling paths so stopVm/rebootVm/destroy/migrate convergence won't be interrupted.
🧹 Nitpick comments (15)
core/src/main/java/org/zstack/core/db/UpdateQueryImpl.java (1)
55-59: LGTM! 空值安全检查修复正确。此修改正确地处理了
val为null时的情况,避免了在格式化错误消息时抛出NullPointerException。可选的小优化:当前
val.getClass()会输出类似class java.lang.String的格式,如果希望更简洁可以使用val.getClass().getSimpleName()。♻️ 可选优化
if ((op == Op.IN || op == Op.NOT_IN) && !(val instanceof Collection)) { throw new CloudRuntimeException(String.format("for operation IN or NOT IN, a Collection value is expected, but %s got", - val == null ? "null" : val.getClass())); + val == null ? "null" : val.getClass().getSimpleName())); },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/db/UpdateQueryImpl.java` around lines 55 - 59, The current null-safe check in UpdateQueryImpl.condAnd correctly avoids NPE but the error message uses val.getClass() which prints full class name with "class" prefix; change the message to use val.getClass().getSimpleName() (still guarding for null) so the thrown CloudRuntimeException shows a cleaner type name; update the String.format call in the condAnd method to use getSimpleName() when val != null.header/src/main/java/org/zstack/header/storage/addon/primary/ExternalPrimaryStorageInventory.java (1)
80-89: 列表元素类型假设可能存在风险。当前代码通过
String.valueOf(url)将列表元素转换为字符串。如果mdsUrls或mdsInfos中的元素是复杂对象(如嵌套的 Map),String.valueOf会调用toString()方法,无法正确脱敏内部可能包含的凭证信息。建议增加类型检查,对非字符串类型的元素做适当处理:
♻️ 可选的类型检查改进
private static void desensitizeUrlList(Map config, String key) { Object urls = config.get(key); if (urls instanceof List) { List<String> desensitized = new ArrayList<>(); for (Object url : (List) urls) { - desensitized.add(desensitizeUrl(String.valueOf(url))); + if (url instanceof String) { + desensitized.add(desensitizeUrl((String) url)); + } else { + // Skip non-string elements or handle complex objects appropriately + desensitized.add(String.valueOf(url)); + } } config.put(key, desensitized); } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/storage/addon/primary/ExternalPrimaryStorageInventory.java` around lines 80 - 89, desensitizeUrlList currently assumes each list element can be stringified safely; if elements are nested Maps/Lists (e.g., mdsUrls/mdsInfos entries with credential fields) toString() will not properly desensitize secrets. Update desensitizeUrlList (and reuse desensitizeUrl) to type-check each element: if element is String call desensitizeUrl and add result; if element is Map iterate its entries and desensitize any String values (or recursively call a helper to handle nested Map/List), if element is List process recursively, and for other types either skip or convert safely; finally put the processed list back into config under the same key. Ensure helper names (desensitizeUrlList, desensitizeUrl) are used so changes are easy to find.core/src/main/java/org/zstack/core/db/DatabaseFacadeImpl.java (1)
909-921: 建议:clz非空但EntityInfo未找到时添加警告日志。与
installEntityLifeCycleCallback方法相比存在不一致行为:
- 安装时:若
info == null会触发断言失败- 卸载时:若
info == null静默返回虽然卸载操作采用宽容策略是合理的(避免清理时抛出异常),但建议添加警告日志以便排查问题。
♻️ 建议的改进
`@Override` public void uninstallEntityLifeCycleCallback(Class clz, EntityEvent evt, EntityLifeCycleCallback cb) { if (clz != null) { EntityInfo info = entityInfoMap.get(clz); if (info != null) { info.uninstallLifeCycleCallback(evt, cb); + } else { + logger.warn(String.format("cannot find EntityInfo for the class[%s] when uninstalling lifecycle callback", clz)); } } else { for (EntityInfo info : entityInfoMap.values()) { info.uninstallLifeCycleCallback(evt, cb); } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/db/DatabaseFacadeImpl.java` around lines 909 - 921, The uninstallEntityLifeCycleCallback currently silently returns when clz != null but entityInfoMap.get(clz) yields null; add a warning log in that branch to mirror installEntityLifeCycleCallback's assert behavior for visibility: in DatabaseFacadeImpl.uninstallEntityLifeCycleCallback, when clz != null and EntityInfo info == null, call the logger (e.g., use existing logger in this class) to log a warning that no EntityInfo was found for the given class before returning; keep the tolerant behavior (no exception) but emit the warning so missing entries are discoverable.rest/src/main/resources/scripts/GoTestTemplate.groovy (1)
31-31: 注释编号重复Line 28 已使用
// 2.注释,这里的// 2. Per-resource test files应改为// 3.。📝 建议修复
- // 2. Per-resource test files + // 3. Per-resource test files🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rest/src/main/resources/scripts/GoTestTemplate.groovy` at line 31, The comment numbering is duplicated: change the comment string "// 2. Per-resource test files" to "// 3. Per-resource test files" so it follows the earlier "// 2." comment; update the comment text near the existing "// 2. Per-resource test files" occurrence in GoTestTemplate.groovy (look for that exact comment) to use "// 3." instead.compute/src/main/java/org/zstack/compute/host/HostBase.java (1)
1443-1446: 防御性空检查:考虑"unknown"带来的去重冲突风险此处增加了对
self的空检查以避免 NPE,这是合理的防御性编程。但使用固定字符串"unknown"作为去重键可能导致问题:如果多个不同上下文的连接请求同时遇到
self == null的情况,它们将共享相同的去重签名"connect-host-unknown",导致本应独立执行的请求被错误去重。建议:
- 若
self为null是不合法状态,考虑在方法入口处提前失败并记录日志- 若为合法边界情况,可考虑使用
msg.getHostUuid()替代"unknown"♻️ 建议的改进方案
`@Override` protected String getDeduplicateString() { - return String.format("connect-host-%s", self == null ? "unknown" : self.getUuid()); + return String.format("connect-host-%s", self != null ? self.getUuid() : msg.getHostUuid()); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/host/HostBase.java` around lines 1443 - 1446, The getDeduplicateString() method currently returns "connect-host-unknown" when self is null, which can cause unrelated requests to collide; update the logic in HostBase.getDeduplicateString() to avoid a fixed "unknown" key: either fail fast with a logged error if self == null (treating it as an invalid state), or build the dedupe key from a more specific identifier such as the incoming message's host UUID (e.g., use msg.getHostUuid() when available) so concurrent null-self cases produce distinct keys; ensure to reference self and msg in the chosen branch and keep the log/context clear.header/src/main/java/org/zstack/header/vm/VmAbnormalLifeCycleStruct.java (1)
37-44: 建议:统一使用Objects.equals()以提高代码一致性和安全性。该文件中存在混合使用
.equals()和Objects.equals()的情况。虽然部分枚举值已使用Objects.equals()(如第 20、75、83 行),但大多数仍使用.equals()进行主机 UUID 比较(第 42、50、58、67、107、115、123、131、139、155 行等)。其中第 67 行的显式 null 检查表明这些 getter 方法确实可能返回 null。为了保证代码的一致性和健壮性,建议在后续迭代中统一改为
Objects.equals()。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/VmAbnormalLifeCycleStruct.java` around lines 37 - 44, Replace direct .equals() comparisons of host UUIDs with null-safe Objects.equals() in the VmAbnormalLifeCycleStruct matching logic; specifically update the match(...) implementation in the VmPausedFromUnknownStateHostNotChanged enum constant (and other enum constants inside VmAbnormalLifeCycleStruct that compare getCurrentHostUuid()/getOriginalHostUuid()) to call Objects.equals(struct.getCurrentHostUuid(), struct.getOriginalHostUuid()) instead of struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid()) so comparisons are consistent and null-safe.plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java (1)
247-248: 建议去掉硬编码999,改为具名常量。Line 247 的
999是魔法值,后续维护时很难理解其语义和边界。建议提取为具名常量(类常量或至少方法内final变量)。♻️ 建议修改
- ref = new ExternalHostIdGetter(999).getOrAllocateHostIdRef( + final int externalHostIdType = 999; + ref = new ExternalHostIdGetter(externalHostIdType).getOrAllocateHostIdRef( param.getHostUuid(), param.getPrimaryStorage().getUuid());As per coding guidelines, "避免使用魔法值(Magic Value):直接使用未经定义的数值或字符串应替换为枚举或常量。"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java` around lines 247 - 248, Replace the magic number 999 passed to ExternalHostIdGetter in the call to getOrAllocateHostIdRef with a named constant; define a descriptive constant (e.g., DEFAULT_EXTERNAL_HOST_ID or EXTERNAL_HOST_ID_BASE) as a private static final field in KvmIscsiNodeServer (or as a final local variable inside the method if scope-local) and use that constant in place of 999 when constructing ExternalHostIdGetter(param.getHostUuid(), param.getPrimaryStorage().getUuid()) so the purpose and bounds of the value are clear.core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java (1)
58-58: 建议移除硬编码队列大小。Line 58 的
1000属于魔法值,建议改为TelemetryGlobalProperty配置项或命名常量,便于环境调优和后续维护。♻️ 建议修改
- options.setMaxQueueSize(1000); + options.setMaxQueueSize(TelemetryGlobalProperty.SENTRY_MAX_QUEUE_SIZE);As per coding guidelines: “避免使用魔法值(Magic Value):直接使用未经定义的数值或字符串应替换为常量或枚举。”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java` at line 58, 在 SentryInitHelper 中的 options.setMaxQueueSize(1000) 使用了魔法值 1000;请将该硬编码值替换为可配置或有意义命名的常量,例如从 TelemetryGlobalProperty(新增属性如 sentryMaxQueueSize)读取或定义类常量(e.g. DEFAULT_SENTRY_MAX_QUEUE_SIZE),并在 SentryInitHelper 中用该常量/属性替代原有的 1000,确保有默认值、配置注入或静态常量注释以便后续调优和维护。plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java (2)
470-515: 检查 Shell 命令执行结果以确保 CA 初始化成功
ShellUtils.run()方法在命令执行失败时不会抛出异常,而是返回退出码。当前代码未检查 openssl 命令的执行结果,可能导致在 CA 生成失败时继续执行后续逻辑,产生难以排查的问题。建议的修改方案
- if (!caFile.exists() || !keyFile.exists()) { - ShellUtils.run(String.format( - "openssl genrsa -out %s/cakey.pem 4096", CA_DIR)); - ShellUtils.run(String.format( - "openssl req -new -x509 -days 3650 -key %s/cakey.pem " + - "-out %s/cacert.pem -subj '/O=ZStack/CN=ZStack Libvirt CA'", - CA_DIR, CA_DIR)); + if (!caFile.exists() || !keyFile.exists()) { + int ret = ShellUtils.run(String.format( + "openssl genrsa -out %s/cakey.pem 4096", CA_DIR)); + if (ret != 0) { + logger.warn(String.format("Failed to generate CA private key, exit code: %d", ret)); + return; + } + ret = ShellUtils.run(String.format( + "openssl req -new -x509 -days 3650 -key %s/cakey.pem " + + "-out %s/cacert.pem -subj '/O=ZStack/CN=ZStack Libvirt CA'", + CA_DIR, CA_DIR)); + if (ret != 0) { + logger.warn(String.format("Failed to generate CA certificate, exit code: %d", ret)); + return; + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java` around lines 470 - 515, The initLibvirtTlsCA method currently calls ShellUtils.run(...) for openssl and file-permission commands but does not check return values; update initLibvirtTlsCA to capture the integer return from each ShellUtils.run call (especially the openssl genrsa and openssl req calls and chown/chmod invocations), verify success (zero) before proceeding, and handle failures by logging an error and aborting the CA initialization (e.g., throw or return early) so subsequent FileUtils.readFileToString and JsonLabel.createIfAbsent calls for caFile and keyFile are not executed with missing/invalid files; ensure any cleanup or descriptive logger.warn/logger.error includes the failing command and its exit code.
498-507: 使用带 Charset 参数的 FileUtils 方法
FileUtils.readFileToString(File)和FileUtils.writeStringToFile(File, String)方法已被标记为 deprecated,建议使用带Charset参数的版本以明确指定字符编码。建议的修改
+import java.nio.charset.StandardCharsets; ... - String ca = FileUtils.readFileToString(caFile).trim(); - String key = FileUtils.readFileToString(keyFile).trim(); + String ca = FileUtils.readFileToString(caFile, StandardCharsets.UTF_8).trim(); + String key = FileUtils.readFileToString(keyFile, StandardCharsets.UTF_8).trim(); ... - FileUtils.writeStringToFile(caFile, caInv.getLabelValue()); - FileUtils.writeStringToFile(keyFile, keyInv.getLabelValue()); + FileUtils.writeStringToFile(caFile, caInv.getLabelValue(), StandardCharsets.UTF_8); + FileUtils.writeStringToFile(keyFile, keyInv.getLabelValue(), StandardCharsets.UTF_8);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java` around lines 498 - 507, The FileUtils usages around caFile/keyFile should be updated to the non-deprecated Charset-aware overloads: when reading use FileUtils.readFileToString(caFile, StandardCharsets.UTF_8) and FileUtils.readFileToString(keyFile, StandardCharsets.UTF_8), and when writing use FileUtils.writeStringToFile(caFile, caInv.getLabelValue(), StandardCharsets.UTF_8) and FileUtils.writeStringToFile(keyFile, keyInv.getLabelValue(), StandardCharsets.UTF_8); make these changes in the block that reads into ca/key, calls new JsonLabel().createIfAbsent(LIBVIRT_TLS_CA_KEY, ca) and createIfAbsent(LIBVIRT_TLS_PRIVATE_KEY, key), and then overwrites the local files so encoding is explicit and deprecation warnings are removed.core/src/main/java/org/zstack/core/Platform.java (1)
986-1007:LocaleUtils类名冲突的处理方式值得注意。Line 999 使用了完全限定名
org.zstack.core.errorcode.LocaleUtils.DEFAULT_LOCALE,这是因为 Line 4 已经导入了org.apache.commons.lang.LocaleUtils。当前的处理方式是正确的,但可以考虑为其中一个类使用别名导入以提高可读性。错误码的国际化回退逻辑设计良好:先尝试本地化,失败时优雅降级到
details或description。♻️ 可选:使用静态导入简化代码
+import static org.zstack.core.errorcode.LocaleUtils.DEFAULT_LOCALE; // 然后在使用处: -i18nService.localizeErrorCode(result, org.zstack.core.errorcode.LocaleUtils.DEFAULT_LOCALE); +i18nService.localizeErrorCode(result, DEFAULT_LOCALE);或者重命名 Apache Commons 的导入:
-import org.apache.commons.lang.LocaleUtils; +import org.apache.commons.lang.LocaleUtils as ApacheLocaleUtils;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/Platform.java` around lines 986 - 1007, The code currently imports org.apache.commons.lang.LocaleUtils which conflicts with org.zstack.core.errorcode.LocaleUtils used in Platform, making the fully-qualified reference org.zstack.core.errorcode.LocaleUtils.DEFAULT_LOCALE verbose; fix by removing the conflicting import and either add a static import for org.zstack.core.errorcode.LocaleUtils.DEFAULT_LOCALE and call DEFAULT_LOCALE directly in Platform.localizeErrorCode(result, DEFAULT_LOCALE), or keep the Apache import and instead fully-qualify the Apache Commons usages elsewhere so the Platform class can use a simple static import; update imports accordingly and ensure the call in Platform (localizeErrorCode) uses the unambiguous DEFAULT_LOCALE symbol.header/src/main/java/org/zstack/header/core/external/service/APIAddExternalServiceConfigurationMsg.java (1)
53-57:audit()方法实现可以更健壮。当前实现在
rsp.isSuccess()为 true 但evt.getInventory()为 null 时会抛出 NPE。虽然根据已有的 learnings,失败时不会设置 inventory 且isSuccess()会为 false,但添加空值检查可以提高防御性。♻️ 建议的改进
`@Override` public Result audit(APIMessage msg, APIEvent rsp) { APIAddExternalServiceConfigurationEvent evt = (APIAddExternalServiceConfigurationEvent) rsp; - return new Result(rsp.isSuccess() ? evt.getInventory().getUuid(): "", ExternalServiceConfigurationVO.class); + return new Result(rsp.isSuccess() && evt.getInventory() != null ? evt.getInventory().getUuid() : "", ExternalServiceConfigurationVO.class); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/core/external/service/APIAddExternalServiceConfigurationMsg.java` around lines 53 - 57, The audit() in APIAddExternalServiceConfigurationMsg should avoid a potential NPE when rsp.isSuccess() is true but evt or evt.getInventory() is null; update the method to cast rsp to APIAddExternalServiceConfigurationEvent (evt) then defensively check evt != null and evt.getInventory() != null before calling evt.getInventory().getUuid(), returning an empty string when either is null while still constructing the Result with ExternalServiceConfigurationVO.class.conf/db/upgrade/V5.5.12__schema.sql (1)
112-124: 外键修改操作缺少幂等性检查直接
DROP FOREIGN KEY在约束不存在时会失败。建议参考 Lines 141-179 的ModifyUsedIpVOForeignKey存储过程模式,先检查约束是否存在再执行删除操作,以确保升级脚本可以安全重复执行。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@conf/db/upgrade/V5.5.12__schema.sql` around lines 112 - 124, The ALTER TABLE drops for BareMetal2InstanceVO currently assume the foreign keys fkBareMetal2InstanceVOGatewayVO and fkBareMetal2InstanceVOGatewayVO1 exist; make the operation idempotent by first checking information_schema (or using the same pattern as the ModifyUsedIpVOForeignKey procedure) to detect and only DROP the constraint when present, and similarly only ADD each FOREIGN KEY when it does not already exist; reference the BareMetal2InstanceVO table and those constraint names (fkBareMetal2InstanceVOGatewayVO, fkBareMetal2InstanceVOGatewayVO1) and mirror the existence-check logic from ModifyUsedIpVOForeignKey to ensure safe repeated executions of the upgrade script.core/src/main/java/org/zstack/core/externalservice/ExternalServiceManagerImpl.java (2)
418-435: 错误信息缺少错误码第 429 行和第 434 行的
operr(...)调用缺少预定义的错误码,与文件中其他错误处理方式不一致。♻️ 建议添加错误码
try{ service.externalConfig(serviceType); completion.success(serviceType); } catch (Exception e) { - completion.fail(operr("failed to apply external service configuration for type [%s]: %s", serviceType, e.getMessage())); + completion.fail(operr(ORG_ZSTACK_CORE_EXTERNALSERVICE_XXXXX, "failed to apply external service configuration for type [%s]: %s", serviceType, e.getMessage())); } return; } } - completion.fail(operr("unable to find external service type [%s]", serviceType)); + completion.fail(operr(ORG_ZSTACK_CORE_EXTERNALSERVICE_XXXXX, "unable to find external service type [%s]", serviceType)); }请在
CloudOperationsErrorCode中定义相应的错误码。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/externalservice/ExternalServiceManagerImpl.java` around lines 418 - 435, The oprerr(...) calls in regenerateExternalServiceConfiguration (class ExternalServiceManagerImpl, method regenerateExternalServiceConfiguration) are missing explicit error codes; update both failure paths (the catch block after service.externalConfig(serviceType) and the final completion.fail for missing service) to call oprerr with a CloudOperationsErrorCode value (e.g., CLOUD_OPERATION_EXTERNAL_SERVICE_APPLY_FAILED and CLOUD_OPERATION_EXTERNAL_SERVICE_NOT_FOUND), then add those constants to CloudOperationsErrorCode enum/class with appropriate messages so the completion.fail carries an error code and message; ensure references to serviceType remain in the formatted message and keep the surrounding logic (services map, ExternalService.externalConfig, and completion) unchanged.
259-274: 错误信息缺少错误码第 263 行的
operr(...)调用缺少预定义的错误码,与文件中其他错误处理方式不一致(其他地方使用如ORG_ZSTACK_CORE_EXTERNALSERVICE_10000等错误码)。♻️ 建议添加错误码
if (vo == null) { - completion.fail(operr("unable to find external service configuration with uuid [%s]", msg.getUuid())); + completion.fail(operr(ORG_ZSTACK_CORE_EXTERNALSERVICE_XXXXX, "unable to find external service configuration with uuid [%s]", msg.getUuid())); return; }请在
CloudOperationsErrorCode中定义相应的错误码。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/externalservice/ExternalServiceManagerImpl.java` around lines 259 - 274, The error handling in ExternalServiceManagerImpl.updateExternalServiceConfiguration uses operr(...) without a CloudOperationsErrorCode, making it inconsistent; add a new error code constant (e.g. ORG_ZSTACK_CORE_EXTERNALSERVICE_10001) to CloudOperationsErrorCode and update the failure call in updateExternalServiceConfiguration to use operr(CloudOperationsErrorCode.ORG_ZSTACK_CORE_EXTERNALSERVICE_10001, "unable to find external service configuration with uuid [%s]", msg.getUuid()); ensure the new enum member follows the existing naming/numbering convention and compile-time imports are adjusted so ExternalServiceManagerImpl can reference CloudOperationsErrorCode.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java`:
- Around line 369-377: The code may NPE when msg.getSystemTags() is null because
addAll(resolvedTags) is outside the null check; inside VmInstanceApiInterceptor
ensure system tags are initialized before adding: check msg.getSystemTags() and
if null call msg.setSystemTags(new ArrayList<>()) (or otherwise create an empty
list), then perform removeIf and addAll using msg.getSystemTags() and
resolvedTags so addAll never runs on a null list.
- Around line 409-426: validateDnsAddresses currently validates entries without
trimming, so values like " 1.2.3.4 " fail; update validateDnsAddresses to first
map dnsAddresses to trimmed strings (e.g., dnsAddresses =
dnsAddresses.stream().map(String::trim).collect...) and remove any empty strings
produced by trimming before performing the size check and IPv4/IPv6 checks; use
the same symbols (validateDnsAddresses, dnsAddresses,
NetworkUtils.isIpv4Address, IPv6NetworkUtils.isIpv6Address,
VmInstanceConstant.MAXIMUM_NIC_DNS_NUMBER) and keep the same error behavior
(throw ApiMessageInterceptionException with existing error codes) after
trimming/filtering.
In `@core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java`:
- Around line 62-91: In the anonymous ILogger implementation passed to
options.setLogger in SentryInitHelper, guard against null and formatting
exceptions: check if args is null before accessing args.length, and wrap
String.format(...) in a try/catch to fall back to a safe message if formatting
fails; likewise, change the logging for SentryLevel.ERROR to use
logger.error(...) (not logger.warn) so error-level events are not downgraded;
apply these fixes in the three logger methods
(log(SentryLevel,String,Object...), log(SentryLevel,String,Throwable), and
log(SentryLevel,Throwable,String,Object...)) to ensure no NPEs or format
exceptions and correct severity mapping.
In `@network/src/main/java/org/zstack/network/l3/IpRangeHelper.java`:
- Around line 303-304: isIpOutsideL3NetworkCidr currently just negates
isIpInL3NetworkCidr which causes it to return true for L3s with no
NormalIpRangeVO; update isIpOutsideL3NetworkCidr to first check for any
NormalIpRangeVO for the given l3Uuid and return false when none exist, otherwise
return the negation of isIpInL3NetworkCidr(ip, l3Uuid); reference the methods
isIpOutsideL3NetworkCidr and isIpInL3NetworkCidr and the NormalIpRangeVO query
used elsewhere in this class to locate and implement the guard.
In `@network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java`:
- Around line 1126-1130: The current removal of reserved IPs from
freeIpInventorys in L3BasicNetwork (using reservedIpRanges and
NetworkUtils.isInRange) can cause the returned list to be shorter than
msg.getLimit() because you simply remove matches instead of filling the result
to the requested limit; change the logic so reserved IP filtering happens during
accumulation (skip adding any freeIp that matches reservedIpRanges) or, if you
must filter after collection, continue iterating the candidate pool to append
additional non-reserved freeIp entries until you reach msg.getLimit(); update
the code that populates freeIpInventorys (the accumulation loop that feeds
freeIpInventorys and any caller that relies on msg.getLimit()) to enforce this
behavior.
In `@network/src/main/java/org/zstack/network/service/DhcpExtension.java`:
- Around line 347-350: The current early filter in DhcpExtension (checking
ip.getIpRangeUuid() == null and continuing) incorrectly excludes NICs whose
ipRangeUuid is null but whose IP still falls inside the L3 CIDR and should be
processed by makeDhcpStruct(); change the logic to only skip when the IP is
truly outside the L3 CIDR (i.e., ip.getIpRangeUuid() == null AND the IP is not
within the L3 network/CIDR), otherwise keep the NIC so makeDhcpStruct() can
generate DHCP entries; update the condition around ip.getIpRangeUuid() in the
loop (the block that currently does continue) to verify CIDR inclusion before
skipping.
In `@plugin/eip/src/main/java/org/zstack/network/service/eip/EipManagerImpl.java`:
- Around line 421-427: The stream filter currently calls
IpRangeHelper.isIpInL3NetworkCidr(...) inside the loop (causing N+1 DB queries);
instead collect all L3 network UUIDs from ret's nics' UsedIp objects, perform a
single batch query to load each L3 network's CIDR(s) into a Map<String,String>
(keyed by l3NetworkUuid) and then replace the in-loop call with a local lookup
against that map (e.g., lookup cidr = l3CidrMap.get(ip.getL3NetworkUuid()) and
test the IP against the cached CIDR via IpRangeHelper or a local CIDR check), so
EipManagerImpl's nic filtering uses the cached CIDRs and avoids per-UsedIp DB
calls.
In `@plugin/expon/src/main/java/org/zstack/expon/ExponStorageController.java`:
- Around line 125-129: In ExponStorageController, validate and normalize the
extracted scheme before assigning to clientConfig.scheme and before any
HttpUrl.Builder.scheme(String) usage: read uri.getScheme(), convert to
lower-case, treat null/empty as "https", accept only "http" or "https" and
assign that normalized value to clientConfig.scheme; if the scheme is any other
value (e.g., "ftp"), throw a clear, user-friendly exception (or return a
validation error) instead of passing it through, so downstream
HttpUrl.Builder.scheme(String) is never given an unsupported protocol and we
avoid leaking raw IllegalArgumentException details.
In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`:
- Around line 247-257: The code may NPE when
ExternalHostIdGetter.getOrAllocateHostIdRef(...) returns null; in
KvmIscsiNodeServer where you create KvmSetupSelfFencerCmd and assign ref (call
to ExternalHostIdGetter.getOrAllocateHostIdRef), add a null/invalid check (if
ref == null || ref.getHostId() == 0) immediately after obtaining ref and call
trigger.fail(...) with an appropriate message and return to stop further
processing; mirror the null-check pattern used in ExternalPrimaryStorage and
KvmCbdNodeServer to avoid dereferencing ref when setting cmd.hostId.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Around line 5823-5831: The TLS SAN list is built from HostSystemTags.EXTRA_IPS
which may not be populated yet (refresh happens in
createCollectHostFactsFlow()/saveHostExtraIps()), causing first-time connections
to miss migration IPs; change KVMHost code that sets
deployArguments.setTlsCertIps(...) to use the IPs discovered during the current
probe/host-facts (the in-memory detected extra IP list from the host probe) or
move the host facts collection earlier so you combine
getSelf().getManagementIp() with the freshly detected extra IPs (instead of
relying on HostSystemTags.EXTRA_IPS) before calling
deployArguments.setTlsCertIps(...) so the generated cert SANs include migration
network IPs on first connect.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java`:
- Around line 463-477: The nodeLeft flow removes entries from vmApis and
vmsToSkip which causes a possible NPE in the VM_CONTINUE_TRACE_PATH callback
(lines ~150-159) where it does vmsToSkip.get(managementNodeId).remove(...); make
that callback null-safe by checking that vmsToSkip.get(managementNodeId) is
non-null before calling remove (or use a safe lookup + conditional remove), and
when you successfully obtain the vmUuid in that callback also remove any
orphaned entry from orphanedSkipVms for that vmUuid so orphan entries are
cleaned up promptly; keep all changes local to the VM_CONTINUE_TRACE_PATH
handler and reference the nodeLeft-related maps vmApis, vmsToSkip, and
orphanedSkipVms.
In
`@plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerBase.java`:
- Around line 641-645: The stream filter in LoadBalancerBase that calls
IpRangeHelper.isIpInL3NetworkCidr per NIC creates an N+1 query risk (seen where
nicVOS is filtered and again at the other site); replace per-NIC calls with a
shared helper that accepts a collection of L3 network UUIDs, does a single bulk
fetch of L3 CIDR(s) (via the appropriate DAO or batch IpRangeHelper API), builds
an in-memory map of l3Uuid -> CIDR(s), and then performs IP-in-CIDR checks
against that map to filter nicVOS; implement this new method (e.g.,
ensureIpsInL3Cidrs(Collection<NetworkNicVO> nics) or similar) in
LoadBalancerBase and call it from both places instead of calling
IpRangeHelper.isIpInL3NetworkCidr for each NIC.
In
`@plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingApiInterceptor.java`:
- Around line 153-162: Move the "IP outside L3 CIDR" validation block in
PortForwardingApiInterceptor so it runs before the early-return that handles the
attach path; specifically, place the
UsedIpVO/IpRangeHelper.isIpOutsideL3NetworkCidr check (the block using
dbf.findByUuid(nicUuid) and throwing ApiMessageInterceptionException) ahead of
the return that short-circuits handling for APIAttachPortForwardingRuleMsg (the
early return around the attach path), so the CIDR check executes for attach
requests as well.
In
`@plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingManagerImpl.java`:
- Around line 369-378: The per-NIC call to IpRangeHelper.isIpInL3NetworkCidr in
PortForwardingManagerImpl (after filterVmNicByIpVersion) causes an N+1 query;
instead, batch-fetch the L3 network CIDRs for all distinct l3NetworkUuid values
using l3Mgr (or the L3 network DAO), build a map of l3Uuid->CIDR(s) in memory,
then replace the stream filter so each VmNicInventory checks its IP against the
preloaded CIDR(s) using IpRangeHelper (or a pure in-memory helper) without
additional DB calls; update the logic around filtered =
filtered.stream().filter(...) to use the map and avoid per-nic queries.
In `@plugin/zbs/src/main/java/org/zstack/storage/zbs/ZbsStorageController.java`:
- Around line 182-185: HttpCaller.doSyncCall currently only retries when the
returned response has ret.isSuccess()==false, but RESTFacadeImpl.syncRawJson
throws OperationFailureException on timeouts/connection errors which bypass that
check; update HttpCaller.doSyncCall to catch OperationFailureException (and
related IO/connection timeout exceptions) around the syncRawJson call, treat
these exceptions as a failed call that honors setTryNext(true), log the
exception, and proceed to the next MDS endpoint before ultimately rethrowing if
all endpoints fail so timeouts/connection failures trigger the same MDS-switch
retry logic as unsuccessful responses.
In
`@portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java`:
- Around line 381-390: The nodeDie() path is calling nodeLifeCycle.nodeLeft(...)
outside the lifecycleLock, causing a race where a stale nodeLeft can run
concurrently with a later NodeJoin handling; move the local nodeLeft call and
its event dispatch inside the same synchronized(lifecycleLock) block used by the
NodeJoin/NodeLeft canonical callback so DB/hash-ring reconciliation and callback
handling are serialized. Concretely, update the nodeDie() method to acquire
lifecycleLock before invoking nodeLifeCycle.nodeLeft(...) and any related
suspectedMissingFromDb updates or event dispatch logic (mirror the pattern used
around nodeLifeCycle.nodeJoin(...) and the existing NodeLeft branch) so both
canonical callbacks and fencing-generated nodeLefts use the same lock. Ensure
the same change is applied for the other referenced region (lines around the
second occurrence noted, e.g., the block covering 877-925) so all nodeLeft paths
are synchronized on lifecycleLock.
- Around line 890-898: The heartbeat thread in ManagementNodeManagerImpl must
not call destinationMaker.getNodeInfo (which triggers
ResourceDestinationMakerImpl.getNodeInfo -> dbf.findByUuid) because this
bypasses heartBeatDBSource; change the code around nodeLifeCycle.nodeLeft(inv)
to fetch hostname only from the in-memory joinedManagementNodes cache (use the
joinedManagementNodes map/structure available in ManagementNodeManagerImpl) and,
if there's no cached entry for nodeUuid, leave inv.hostName empty instead of
calling destinationMaker; remove the try/catch that calls
destinationMaker.getNodeInfo to avoid hitting the ordinary DB path.
In `@rest/src/main/java/org/zstack/rest/RestServer.java`:
- Around line 411-416: Deep-copy the ErrorCode before localizing to avoid
mutating shared cached objects: in callWebHook, sendResponse, and
sendReplyResponse create a deep clone of the ErrorCode (clone/copy the result of
evt.getError, response.getError, reply.getError), pass that clone to
i18nService.localizeErrorCode(..., locale), then set the localized clone back
onto the event/response/reply (instead of localizing the original). Ensure the
copy is a full deep copy (message, cause, details, children) so no shared
references remain.
---
Outside diff comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java`:
- Around line 1280-1282: The static DNS is being written after
VmApplyNetworkServiceOnChangeIPFlow finishes so a Running VM won't get the DNS
in this request; modify the change-IP path in VmInstanceBase so that setting the
STATIC_DNS tag (what setStaticDns does) is done inside
VmApplyNetworkServiceOnChangeIPFlow before the apply step (mirroring
changeVmNicNetwork -> SetStaticIp.setDns behavior) and add a rollback that
removes/restores the previous STATIC_DNS tag on flow failure; update references
to NetworkServiceManagerImpl.getVmNicDns to continue reading the tag and ensure
both the primary branch here and the other occurrences around the specified
blocks (3707-3717, 3763-3766) apply the same change.
- Around line 1088-1110: The fast-path in changeVmIp currently returns success
when the requested IPs equal the NIC's IPs, which discards overrideInfo
(netmask/gateway/ipv6Prefix/ipv6Gateway) so updates that only change those
fields never reach AllocateIpMsg; modify the check in changeVmIp (and the
analogous checks around the blocks that handle AllocateIpMsg at lines
referenced) to compare not only IPs (targetNic.getUsedIps() /
staticIpMap.values()) but also the corresponding override fields from
overrideInfo (e.g., netmask, gateway, ipv6Prefix, ipv6Gateway) against the NIC’s
current values (use fillIpv4Parameters/fillIpv6Parameters semantics to detect
when user supplied values are present); only short-circuit and call
completion.success() when both IPs and all provided override fields are
unchanged, otherwise proceed so setIpamStaticIp and AllocateIpMsg receive and
persist the overrideInfo.
- Around line 143-179: The checkState(...) method in VmInstanceBase currently
only handles Running/Stopped/Paused and throws on any other string; add a branch
to handle the hypervisor "NoState" response so it doesn't escalate to a
CloudRuntimeException: in VmInstanceBase.checkState, after retrieving state
(String state = r.getStates().get(self.getUuid())), add an else-if for
VmInstanceState.NoState.toString() that calls
changeVmStateInDb(VmInstanceStateEvent.unknown) and completes the
NoErrorCompletion (completion.done()), mirroring the existing null/error
handling paths so stopVm/rebootVm/destroy/migrate convergence won't be
interrupted.
In `@compute/src/main/java/org/zstack/compute/vm/VmNicManagerImpl.java`:
- Around line 56-64: In afterAddIpAddress, don't update UsedIpVO before
verifying the NIC exists: first fetch VmNicVO via
Q.New(VmNicVO.class).eq(VmNicVO_.uuid, vmNicUUid).find() and if it is null log
and return, and only after that call SQL.New(UsedIpVO.class).eq(UsedIpVO_.uuid,
usedIpUuid).set(UsedIpVO_.vmNicUuid, vmNicUUid).update(); this prevents setting
vmNicUuid to an invalid reference.
In `@conf/tools/install.sh`:
- Around line 143-151: The current install logic only creates the venv at
UI_VIRENV_PATH when the directory is missing, so upgrades can keep an old
Python2 venv; call the shared ensure_python3_venv routine (or perform a version
check on UI_VIRENV_PATH and recreate it if its python is not 3.11) before
running pip installs. Specifically, replace the "if [ ! -d \"$UI_VIRENV_PATH\"
]" creation-only flow by invoking ensure_python3_venv (or checking the venv's
python binary via UI_VIRENV_PATH/bin/python and recreating the venv if its
version != 3.11) so the zstack-dashboard venv is guaranteed to use Python 3.11.
In
`@core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java`:
- Around line 80-92: In getNodeInfo(String nodeUuid) the code stores new
NodeInfo via nodes.put(...) but returns the put() result (the previous value),
so on cache miss it returns null; fix by constructing the new NodeInfo (e.g. new
NodeInfo(vo)), assign it to a local variable, put that instance into nodes
(nodes.put(nodeUuid, newInfo) or putIfAbsent as appropriate), add nodeUuid to
nodeHash, and return the newInfo; update the method so it always returns the
actual NodeInfo instance (reference getNodeInfo) instead of the put() return
value.
---
Nitpick comments:
In `@compute/src/main/java/org/zstack/compute/host/HostBase.java`:
- Around line 1443-1446: The getDeduplicateString() method currently returns
"connect-host-unknown" when self is null, which can cause unrelated requests to
collide; update the logic in HostBase.getDeduplicateString() to avoid a fixed
"unknown" key: either fail fast with a logged error if self == null (treating it
as an invalid state), or build the dedupe key from a more specific identifier
such as the incoming message's host UUID (e.g., use msg.getHostUuid() when
available) so concurrent null-self cases produce distinct keys; ensure to
reference self and msg in the chosen branch and keep the log/context clear.
In `@conf/db/upgrade/V5.5.12__schema.sql`:
- Around line 112-124: The ALTER TABLE drops for BareMetal2InstanceVO currently
assume the foreign keys fkBareMetal2InstanceVOGatewayVO and
fkBareMetal2InstanceVOGatewayVO1 exist; make the operation idempotent by first
checking information_schema (or using the same pattern as the
ModifyUsedIpVOForeignKey procedure) to detect and only DROP the constraint when
present, and similarly only ADD each FOREIGN KEY when it does not already exist;
reference the BareMetal2InstanceVO table and those constraint names
(fkBareMetal2InstanceVOGatewayVO, fkBareMetal2InstanceVOGatewayVO1) and mirror
the existence-check logic from ModifyUsedIpVOForeignKey to ensure safe repeated
executions of the upgrade script.
In `@core/src/main/java/org/zstack/core/db/DatabaseFacadeImpl.java`:
- Around line 909-921: The uninstallEntityLifeCycleCallback currently silently
returns when clz != null but entityInfoMap.get(clz) yields null; add a warning
log in that branch to mirror installEntityLifeCycleCallback's assert behavior
for visibility: in DatabaseFacadeImpl.uninstallEntityLifeCycleCallback, when clz
!= null and EntityInfo info == null, call the logger (e.g., use existing logger
in this class) to log a warning that no EntityInfo was found for the given class
before returning; keep the tolerant behavior (no exception) but emit the warning
so missing entries are discoverable.
In `@core/src/main/java/org/zstack/core/db/UpdateQueryImpl.java`:
- Around line 55-59: The current null-safe check in UpdateQueryImpl.condAnd
correctly avoids NPE but the error message uses val.getClass() which prints full
class name with "class" prefix; change the message to use
val.getClass().getSimpleName() (still guarding for null) so the thrown
CloudRuntimeException shows a cleaner type name; update the String.format call
in the condAnd method to use getSimpleName() when val != null.
In
`@core/src/main/java/org/zstack/core/externalservice/ExternalServiceManagerImpl.java`:
- Around line 418-435: The oprerr(...) calls in
regenerateExternalServiceConfiguration (class ExternalServiceManagerImpl, method
regenerateExternalServiceConfiguration) are missing explicit error codes; update
both failure paths (the catch block after service.externalConfig(serviceType)
and the final completion.fail for missing service) to call oprerr with a
CloudOperationsErrorCode value (e.g.,
CLOUD_OPERATION_EXTERNAL_SERVICE_APPLY_FAILED and
CLOUD_OPERATION_EXTERNAL_SERVICE_NOT_FOUND), then add those constants to
CloudOperationsErrorCode enum/class with appropriate messages so the
completion.fail carries an error code and message; ensure references to
serviceType remain in the formatted message and keep the surrounding logic
(services map, ExternalService.externalConfig, and completion) unchanged.
- Around line 259-274: The error handling in
ExternalServiceManagerImpl.updateExternalServiceConfiguration uses operr(...)
without a CloudOperationsErrorCode, making it inconsistent; add a new error code
constant (e.g. ORG_ZSTACK_CORE_EXTERNALSERVICE_10001) to
CloudOperationsErrorCode and update the failure call in
updateExternalServiceConfiguration to use
operr(CloudOperationsErrorCode.ORG_ZSTACK_CORE_EXTERNALSERVICE_10001, "unable to
find external service configuration with uuid [%s]", msg.getUuid()); ensure the
new enum member follows the existing naming/numbering convention and
compile-time imports are adjusted so ExternalServiceManagerImpl can reference
CloudOperationsErrorCode.
In `@core/src/main/java/org/zstack/core/Platform.java`:
- Around line 986-1007: The code currently imports
org.apache.commons.lang.LocaleUtils which conflicts with
org.zstack.core.errorcode.LocaleUtils used in Platform, making the
fully-qualified reference org.zstack.core.errorcode.LocaleUtils.DEFAULT_LOCALE
verbose; fix by removing the conflicting import and either add a static import
for org.zstack.core.errorcode.LocaleUtils.DEFAULT_LOCALE and call DEFAULT_LOCALE
directly in Platform.localizeErrorCode(result, DEFAULT_LOCALE), or keep the
Apache import and instead fully-qualify the Apache Commons usages elsewhere so
the Platform class can use a simple static import; update imports accordingly
and ensure the call in Platform (localizeErrorCode) uses the unambiguous
DEFAULT_LOCALE symbol.
In `@core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java`:
- Line 58: 在 SentryInitHelper 中的 options.setMaxQueueSize(1000) 使用了魔法值
1000;请将该硬编码值替换为可配置或有意义命名的常量,例如从 TelemetryGlobalProperty(新增属性如
sentryMaxQueueSize)读取或定义类常量(e.g. DEFAULT_SENTRY_MAX_QUEUE_SIZE),并在
SentryInitHelper 中用该常量/属性替代原有的 1000,确保有默认值、配置注入或静态常量注释以便后续调优和维护。
In
`@header/src/main/java/org/zstack/header/core/external/service/APIAddExternalServiceConfigurationMsg.java`:
- Around line 53-57: The audit() in APIAddExternalServiceConfigurationMsg should
avoid a potential NPE when rsp.isSuccess() is true but evt or evt.getInventory()
is null; update the method to cast rsp to
APIAddExternalServiceConfigurationEvent (evt) then defensively check evt != null
and evt.getInventory() != null before calling evt.getInventory().getUuid(),
returning an empty string when either is null while still constructing the
Result with ExternalServiceConfigurationVO.class.
In
`@header/src/main/java/org/zstack/header/storage/addon/primary/ExternalPrimaryStorageInventory.java`:
- Around line 80-89: desensitizeUrlList currently assumes each list element can
be stringified safely; if elements are nested Maps/Lists (e.g., mdsUrls/mdsInfos
entries with credential fields) toString() will not properly desensitize
secrets. Update desensitizeUrlList (and reuse desensitizeUrl) to type-check each
element: if element is String call desensitizeUrl and add result; if element is
Map iterate its entries and desensitize any String values (or recursively call a
helper to handle nested Map/List), if element is List process recursively, and
for other types either skip or convert safely; finally put the processed list
back into config under the same key. Ensure helper names (desensitizeUrlList,
desensitizeUrl) are used so changes are easy to find.
In `@header/src/main/java/org/zstack/header/vm/VmAbnormalLifeCycleStruct.java`:
- Around line 37-44: Replace direct .equals() comparisons of host UUIDs with
null-safe Objects.equals() in the VmAbnormalLifeCycleStruct matching logic;
specifically update the match(...) implementation in the
VmPausedFromUnknownStateHostNotChanged enum constant (and other enum constants
inside VmAbnormalLifeCycleStruct that compare
getCurrentHostUuid()/getOriginalHostUuid()) to call
Objects.equals(struct.getCurrentHostUuid(), struct.getOriginalHostUuid())
instead of struct.getCurrentHostUuid().equals(struct.getOriginalHostUuid()) so
comparisons are consistent and null-safe.
In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java`:
- Around line 247-248: Replace the magic number 999 passed to
ExternalHostIdGetter in the call to getOrAllocateHostIdRef with a named
constant; define a descriptive constant (e.g., DEFAULT_EXTERNAL_HOST_ID or
EXTERNAL_HOST_ID_BASE) as a private static final field in KvmIscsiNodeServer (or
as a final local variable inside the method if scope-local) and use that
constant in place of 999 when constructing
ExternalHostIdGetter(param.getHostUuid(), param.getPrimaryStorage().getUuid())
so the purpose and bounds of the value are clear.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java`:
- Around line 470-515: The initLibvirtTlsCA method currently calls
ShellUtils.run(...) for openssl and file-permission commands but does not check
return values; update initLibvirtTlsCA to capture the integer return from each
ShellUtils.run call (especially the openssl genrsa and openssl req calls and
chown/chmod invocations), verify success (zero) before proceeding, and handle
failures by logging an error and aborting the CA initialization (e.g., throw or
return early) so subsequent FileUtils.readFileToString and
JsonLabel.createIfAbsent calls for caFile and keyFile are not executed with
missing/invalid files; ensure any cleanup or descriptive
logger.warn/logger.error includes the failing command and its exit code.
- Around line 498-507: The FileUtils usages around caFile/keyFile should be
updated to the non-deprecated Charset-aware overloads: when reading use
FileUtils.readFileToString(caFile, StandardCharsets.UTF_8) and
FileUtils.readFileToString(keyFile, StandardCharsets.UTF_8), and when writing
use FileUtils.writeStringToFile(caFile, caInv.getLabelValue(),
StandardCharsets.UTF_8) and FileUtils.writeStringToFile(keyFile,
keyInv.getLabelValue(), StandardCharsets.UTF_8); make these changes in the block
that reads into ca/key, calls new JsonLabel().createIfAbsent(LIBVIRT_TLS_CA_KEY,
ca) and createIfAbsent(LIBVIRT_TLS_PRIVATE_KEY, key), and then overwrites the
local files so encoding is explicit and deprecation warnings are removed.
In `@rest/src/main/resources/scripts/GoTestTemplate.groovy`:
- Line 31: The comment numbering is duplicated: change the comment string "// 2.
Per-resource test files" to "// 3. Per-resource test files" so it follows the
earlier "// 2." comment; update the comment text near the existing "// 2.
Per-resource test files" occurrence in GoTestTemplate.groovy (look for that
exact comment) to use "// 3." instead.
| // Remove any existing netmask/gateway/prefix tags, then add resolved ones | ||
| if (msg.getSystemTags() != null) { | ||
| msg.getSystemTags().removeIf(tag -> | ||
| VmSystemTags.IPV4_NETMASK.isMatch(tag) || VmSystemTags.IPV4_GATEWAY.isMatch(tag) | ||
| || VmSystemTags.IPV6_PREFIX.isMatch(tag) || VmSystemTags.IPV6_GATEWAY.isMatch(tag)); | ||
| } | ||
| if (!resolvedTags.isEmpty()) { | ||
| msg.getSystemTags().addAll(resolvedTags); | ||
| } |
There was a problem hiding this comment.
潜在的空指针异常风险
当 msg.getSystemTags() 为 null 时,第 376 行的 msg.getSystemTags().addAll(resolvedTags) 会抛出 NPE。虽然第 370 行检查了非空,但第 376 行的调用在该检查块之外。
🐛 建议修复
// Remove any existing netmask/gateway/prefix tags, then add resolved ones
if (msg.getSystemTags() != null) {
msg.getSystemTags().removeIf(tag ->
VmSystemTags.IPV4_NETMASK.isMatch(tag) || VmSystemTags.IPV4_GATEWAY.isMatch(tag)
|| VmSystemTags.IPV6_PREFIX.isMatch(tag) || VmSystemTags.IPV6_GATEWAY.isMatch(tag));
+ if (!resolvedTags.isEmpty()) {
+ msg.getSystemTags().addAll(resolvedTags);
+ }
+ } else if (!resolvedTags.isEmpty()) {
+ msg.setSystemTags(new ArrayList<>(resolvedTags));
}
- if (!resolvedTags.isEmpty()) {
- msg.getSystemTags().addAll(resolvedTags);
- }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java`
around lines 369 - 377, The code may NPE when msg.getSystemTags() is null
because addAll(resolvedTags) is outside the null check; inside
VmInstanceApiInterceptor ensure system tags are initialized before adding: check
msg.getSystemTags() and if null call msg.setSystemTags(new ArrayList<>()) (or
otherwise create an empty list), then perform removeIf and addAll using
msg.getSystemTags() and resolvedTags so addAll never runs on a null list.
| private void validateDnsAddresses(List<String> dnsAddresses) { | ||
| if (dnsAddresses == null || dnsAddresses.isEmpty()) { | ||
| return; | ||
| } | ||
|
|
||
| final Map<String, NicIpAddressInfo> nicNetworkInfo = new StaticIpOperator().getNicNetworkInfoBySystemTag(msg.getSystemTags()); | ||
| NicIpAddressInfo nicIpAddressInfo = nicNetworkInfo.get(msg.getDestL3NetworkUuid()); | ||
| if (nicIpAddressInfo != null) { | ||
| if (!nicIpAddressInfo.ipv4Address.isEmpty() && Q.New(UsedIpVO.class).eq(UsedIpVO_.ip, nicIpAddressInfo.ipv4Address).eq(UsedIpVO_.l3NetworkUuid, msg.getDestL3NetworkUuid()).isExists()) { | ||
| throw new ApiMessageInterceptionException(argerr(ORG_ZSTACK_COMPUTE_VM_10109, "the static IP[%s] has been occupied on the L3 network[uuid:%s]", nicIpAddressInfo.ipv4Address, msg.getDestL3NetworkUuid())); | ||
| } | ||
| if (!nicIpAddressInfo.ipv6Address.isEmpty() && Q.New(UsedIpVO.class).eq(UsedIpVO_.ip, IPv6NetworkUtils.getIpv6AddressCanonicalString(nicIpAddressInfo.ipv6Address)).eq(UsedIpVO_.l3NetworkUuid, msg.getDestL3NetworkUuid()).isExists()) { | ||
| throw new ApiMessageInterceptionException(argerr(ORG_ZSTACK_COMPUTE_VM_10110, "the static IP[%s] has been occupied on the L3 network[uuid:%s]", nicIpAddressInfo.ipv6Address, msg.getDestL3NetworkUuid())); | ||
| if (dnsAddresses.size() > VmInstanceConstant.MAXIMUM_NIC_DNS_NUMBER) { | ||
| throw new ApiMessageInterceptionException(argerr( | ||
| ORG_ZSTACK_COMPUTE_VM_10321, "at most %d DNS addresses are allowed, but got %d", | ||
| VmInstanceConstant.MAXIMUM_NIC_DNS_NUMBER, dnsAddresses.size())); | ||
| } | ||
|
|
||
| for (String dns : dnsAddresses) { | ||
| if (!NetworkUtils.isIpv4Address(dns) && !IPv6NetworkUtils.isIpv6Address(dns)) { | ||
| throw new ApiMessageInterceptionException(argerr( | ||
| ORG_ZSTACK_COMPUTE_VM_10322, "invalid DNS address[%s], must be a valid IPv4 or IPv6 address", dns)); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
DNS 地址应在校验前进行 trim 处理
根据编码规范,来自 Message 的参数应进行 trim 处理,因为用户可能在浏览器复制粘贴的数据带有空格或换行符。当前实现直接校验未经 trim 的 DNS 地址,可能导致有效地址被误判为无效。
🛡️ 建议修复
private void validateDnsAddresses(List<String> dnsAddresses) {
if (dnsAddresses == null || dnsAddresses.isEmpty()) {
return;
}
if (dnsAddresses.size() > VmInstanceConstant.MAXIMUM_NIC_DNS_NUMBER) {
throw new ApiMessageInterceptionException(argerr(
ORG_ZSTACK_COMPUTE_VM_10321, "at most %d DNS addresses are allowed, but got %d",
VmInstanceConstant.MAXIMUM_NIC_DNS_NUMBER, dnsAddresses.size()));
}
for (String dns : dnsAddresses) {
- if (!NetworkUtils.isIpv4Address(dns) && !IPv6NetworkUtils.isIpv6Address(dns)) {
+ String trimmedDns = dns != null ? dns.trim() : null;
+ if (trimmedDns == null || trimmedDns.isEmpty() ||
+ (!NetworkUtils.isIpv4Address(trimmedDns) && !IPv6NetworkUtils.isIpv6Address(trimmedDns))) {
throw new ApiMessageInterceptionException(argerr(
ORG_ZSTACK_COMPUTE_VM_10322, "invalid DNS address[%s], must be a valid IPv4 or IPv6 address", dns));
}
}
}As per coding guidelines: "注意检查来自 Message 的参数是否做过 trim,用户可能在浏览器上复制粘贴的数据带有空格、换行符等"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private void validateDnsAddresses(List<String> dnsAddresses) { | |
| if (dnsAddresses == null || dnsAddresses.isEmpty()) { | |
| return; | |
| } | |
| final Map<String, NicIpAddressInfo> nicNetworkInfo = new StaticIpOperator().getNicNetworkInfoBySystemTag(msg.getSystemTags()); | |
| NicIpAddressInfo nicIpAddressInfo = nicNetworkInfo.get(msg.getDestL3NetworkUuid()); | |
| if (nicIpAddressInfo != null) { | |
| if (!nicIpAddressInfo.ipv4Address.isEmpty() && Q.New(UsedIpVO.class).eq(UsedIpVO_.ip, nicIpAddressInfo.ipv4Address).eq(UsedIpVO_.l3NetworkUuid, msg.getDestL3NetworkUuid()).isExists()) { | |
| throw new ApiMessageInterceptionException(argerr(ORG_ZSTACK_COMPUTE_VM_10109, "the static IP[%s] has been occupied on the L3 network[uuid:%s]", nicIpAddressInfo.ipv4Address, msg.getDestL3NetworkUuid())); | |
| } | |
| if (!nicIpAddressInfo.ipv6Address.isEmpty() && Q.New(UsedIpVO.class).eq(UsedIpVO_.ip, IPv6NetworkUtils.getIpv6AddressCanonicalString(nicIpAddressInfo.ipv6Address)).eq(UsedIpVO_.l3NetworkUuid, msg.getDestL3NetworkUuid()).isExists()) { | |
| throw new ApiMessageInterceptionException(argerr(ORG_ZSTACK_COMPUTE_VM_10110, "the static IP[%s] has been occupied on the L3 network[uuid:%s]", nicIpAddressInfo.ipv6Address, msg.getDestL3NetworkUuid())); | |
| if (dnsAddresses.size() > VmInstanceConstant.MAXIMUM_NIC_DNS_NUMBER) { | |
| throw new ApiMessageInterceptionException(argerr( | |
| ORG_ZSTACK_COMPUTE_VM_10321, "at most %d DNS addresses are allowed, but got %d", | |
| VmInstanceConstant.MAXIMUM_NIC_DNS_NUMBER, dnsAddresses.size())); | |
| } | |
| for (String dns : dnsAddresses) { | |
| if (!NetworkUtils.isIpv4Address(dns) && !IPv6NetworkUtils.isIpv6Address(dns)) { | |
| throw new ApiMessageInterceptionException(argerr( | |
| ORG_ZSTACK_COMPUTE_VM_10322, "invalid DNS address[%s], must be a valid IPv4 or IPv6 address", dns)); | |
| } | |
| } | |
| } | |
| private void validateDnsAddresses(List<String> dnsAddresses) { | |
| if (dnsAddresses == null || dnsAddresses.isEmpty()) { | |
| return; | |
| } | |
| if (dnsAddresses.size() > VmInstanceConstant.MAXIMUM_NIC_DNS_NUMBER) { | |
| throw new ApiMessageInterceptionException(argerr( | |
| ORG_ZSTACK_COMPUTE_VM_10321, "at most %d DNS addresses are allowed, but got %d", | |
| VmInstanceConstant.MAXIMUM_NIC_DNS_NUMBER, dnsAddresses.size())); | |
| } | |
| for (String dns : dnsAddresses) { | |
| String trimmedDns = dns != null ? dns.trim() : null; | |
| if (trimmedDns == null || trimmedDns.isEmpty() || | |
| (!NetworkUtils.isIpv4Address(trimmedDns) && !IPv6NetworkUtils.isIpv6Address(trimmedDns))) { | |
| throw new ApiMessageInterceptionException(argerr( | |
| ORG_ZSTACK_COMPUTE_VM_10322, "invalid DNS address[%s], must be a valid IPv4 or IPv6 address", dns)); | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java`
around lines 409 - 426, validateDnsAddresses currently validates entries without
trimming, so values like " 1.2.3.4 " fail; update validateDnsAddresses to first
map dnsAddresses to trimmed strings (e.g., dnsAddresses =
dnsAddresses.stream().map(String::trim).collect...) and remove any empty strings
produced by trimming before performing the size check and IPv4/IPv6 checks; use
the same symbols (validateDnsAddresses, dnsAddresses,
NetworkUtils.isIpv4Address, IPv6NetworkUtils.isIpv6Address,
VmInstanceConstant.MAXIMUM_NIC_DNS_NUMBER) and keep the same error behavior
(throw ApiMessageInterceptionException with existing error codes) after
trimming/filtering.
| options.setLogger(new io.sentry.ILogger() { | ||
| @Override | ||
| public void log(SentryLevel level, String message, Object... args) { | ||
| String formatted = args.length > 0 ? String.format(message, args) : message; | ||
| switch (level) { | ||
| case ERROR: logger.warn("[Sentry] " + formatted); break; | ||
| case WARNING: logger.warn("[Sentry] " + formatted); break; | ||
| default: logger.info("[Sentry] " + formatted); break; | ||
| } | ||
| } | ||
|
|
||
| @Override | ||
| public void log(SentryLevel level, String message, Throwable throwable) { | ||
| switch (level) { | ||
| case ERROR: logger.warn("[Sentry] " + message, throwable); break; | ||
| case WARNING: logger.warn("[Sentry] " + message, throwable); break; | ||
| default: logger.info("[Sentry] " + message, throwable); break; | ||
| } | ||
| } | ||
|
|
||
| @Override | ||
| public void log(SentryLevel level, Throwable throwable, String message, Object... args) { | ||
| String formatted = args.length > 0 ? String.format(message, args) : message; | ||
| log(level, formatted, throwable); | ||
| } | ||
|
|
||
| @Override | ||
| public boolean isEnabled(SentryLevel level) { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
fd 'pom.xml|build.gradle|build.gradle.kts|gradle.properties' -t f \
| xargs -r rg -n 'io\.sentry|sentry'Repository: MatheMatrix/zstack
Length of output: 1821
🏁 Script executed:
# Find and read the SentryInitHelper.java file
find . -name "SentryInitHelper.java" -type fRepository: MatheMatrix/zstack
Length of output: 130
🏁 Script executed:
# Read the file to understand the actual implementation
cat -n core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java | sed -n '50,100p'Repository: MatheMatrix/zstack
Length of output: 3276
🏁 Script executed:
# Check if there are any other custom ILogger implementations in the codebase for comparison
rg "ILogger" --type java -B 2 -A 10Repository: MatheMatrix/zstack
Length of output: 1755
🌐 Web query:
Sentry Java 8.16 ILogger interface contract message args null safety
💡 Result:
In Sentry Java SDK 8.16.0, the io.sentry.ILogger contract is (per its Javadoc annotations):
level: @NotNull → implementations may assume it’s non-null.message: @NotNull → implementations may assume it’s non-null.args: @nullable Object... → implementations must be null-safe for the varargs array itself (i.e., callers are allowed to passnullinstead of an empty array). The element nullability isn’t specified, so implementations should also tolerateargscontainingnullvalues.throwable(where present): @nullable → must be null-safe.
Source: ILogger Javadoc for 8.16.0 shows these nullability annotations on each overload. [1]
Sources
- Sentry Java 8.16.0 Javadoc —
io.sentry.ILogger: https://javadoc.io/static/io.sentry/sentry/8.16.0/io/sentry/ILogger.html
Sentry 自定义日志回调存在空指针和格式化异常风险,且 ERROR 被降级为 WARN。
根据 Sentry Java 8.16 ILogger 契约,args 参数为 @Nullable,但第 65、84 行直接调用 args.length 而未做空检查,会导致 NPE。此外,String.format() 也可能因格式模板与参数不匹配而抛异常。同时第 67、76 行把 ERROR 级别记录为 warn,弱化故障可观测性。
修复建议
`@Override`
public void log(SentryLevel level, String message, Object... args) {
- String formatted = args.length > 0 ? String.format(message, args) : message;
+ String formatted = safeFormat(message, args);
switch (level) {
- case ERROR: logger.warn("[Sentry] " + formatted); break;
+ case ERROR: logger.error("[Sentry] " + formatted); break;
case WARNING: logger.warn("[Sentry] " + formatted); break;
default: logger.info("[Sentry] " + formatted); break;
}
}
`@Override`
public void log(SentryLevel level, Throwable throwable, String message, Object... args) {
- String formatted = args.length > 0 ? String.format(message, args) : message;
+ String formatted = safeFormat(message, args);
log(level, formatted, throwable);
}
+
+ private String safeFormat(String message, Object... args) {
+ if (args == null || args.length == 0) {
+ return message;
+ }
+ try {
+ return String.format(message, args);
+ } catch (Exception e) {
+ return message;
+ }
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java` around
lines 62 - 91, In the anonymous ILogger implementation passed to
options.setLogger in SentryInitHelper, guard against null and formatting
exceptions: check if args is null before accessing args.length, and wrap
String.format(...) in a try/catch to fall back to a safe message if formatting
fails; likewise, change the logging for SentryLevel.ERROR to use
logger.error(...) (not logger.warn) so error-level events are not downgraded;
apply these fixes in the three logger methods
(log(SentryLevel,String,Object...), log(SentryLevel,String,Throwable), and
log(SentryLevel,Throwable,String,Object...)) to ensure no NPEs or format
exceptions and correct severity mapping.
| public static boolean isIpOutsideL3NetworkCidr(String ip, String l3Uuid) { | ||
| return !isIpInL3NetworkCidr(ip, l3Uuid); |
There was a problem hiding this comment.
isIpOutsideL3NetworkCidr() 在无 range 的 L3 上返回值反了。
这里直接对 isIpInL3NetworkCidr() 取反,所以当 L3 没有任何 NormalIpRangeVO 时会返回 true。新调用方把这个 helper 当成 “no-range => false” 来用,这会把 no-IP-range/no-IPAM 场景误判成 outside。
💡 建议修正
public static boolean isIpOutsideL3NetworkCidr(String ip, String l3Uuid) {
- return !isIpInL3NetworkCidr(ip, l3Uuid);
+ if (ip == null || l3Uuid == null) {
+ return false;
+ }
+
+ Integer ipVersion = null;
+ if (NetworkUtils.isIpv4Address(ip)) {
+ ipVersion = IPv6Constants.IPv4;
+ } else if (IPv6NetworkUtils.isIpv6Address(ip)) {
+ ipVersion = IPv6Constants.IPv6;
+ }
+
+ if (ipVersion == null) {
+ return false;
+ }
+
+ boolean hasRange = Q.New(NormalIpRangeVO.class)
+ .eq(NormalIpRangeVO_.l3NetworkUuid, l3Uuid)
+ .eq(NormalIpRangeVO_.ipVersion, ipVersion)
+ .isExists();
+ return hasRange && !isIpInL3NetworkCidr(ip, l3Uuid);
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@network/src/main/java/org/zstack/network/l3/IpRangeHelper.java` around lines
303 - 304, isIpOutsideL3NetworkCidr currently just negates isIpInL3NetworkCidr
which causes it to return true for L3s with no NormalIpRangeVO; update
isIpOutsideL3NetworkCidr to first check for any NormalIpRangeVO for the given
l3Uuid and return false when none exist, otherwise return the negation of
isIpInL3NetworkCidr(ip, l3Uuid); reference the methods isIpOutsideL3NetworkCidr
and isIpInL3NetworkCidr and the NormalIpRangeVO query used elsewhere in this
class to locate and implement the guard.
| Set<ReservedIpRangeVO> reservedIpRanges = self.getReservedIpRanges(); | ||
| if (reservedIpRanges != null && !reservedIpRanges.isEmpty()) { | ||
| freeIpInventorys.removeIf(freeIp -> reservedIpRanges.stream().anyMatch( | ||
| r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp()))); | ||
| } |
There was a problem hiding this comment.
过滤完直接返回,会漏掉后面的可用地址。
这里 removeIf 之后没有继续补齐结果;只要前面收集到的候选里包含 reserved IP,API 就可能返回少于 msg.getLimit() 的地址,甚至空列表,即使后面还有可用 IP。过滤应前移到累加阶段,或者在过滤后继续向后取满。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java` around lines
1126 - 1130, The current removal of reserved IPs from freeIpInventorys in
L3BasicNetwork (using reservedIpRanges and NetworkUtils.isInRange) can cause the
returned list to be shorter than msg.getLimit() because you simply remove
matches instead of filling the result to the requested limit; change the logic
so reserved IP filtering happens during accumulation (skip adding any freeIp
that matches reservedIpRanges) or, if you must filter after collection, continue
iterating the candidate pool to append additional non-reserved freeIp entries
until you reach msg.getLimit(); update the code that populates freeIpInventorys
(the accumulation loop that feeds freeIpInventorys and any caller that relies on
msg.getLimit()) to enforce this behavior.
| List<VmNicInventory> filtered = l3Mgr.filterVmNicByIpVersion(nicInvs, IPv6Constants.IPv4); | ||
|
|
||
| // Filter out NICs whose primary IP is outside L3 CIDR | ||
| filtered = filtered.stream().filter(nic -> { | ||
| String nicIp = nic.getIp(); | ||
| String nicL3 = nic.getL3NetworkUuid(); | ||
| return nicIp == null || IpRangeHelper.isIpInL3NetworkCidr(nicIp, nicL3); | ||
| }).collect(Collectors.toList()); | ||
|
|
||
| return filtered; |
There was a problem hiding this comment.
逐 NIC 调用 CIDR 检查会产生 N+1 查询,建议改为批量判定。
Line 372-376 里每个 NIC 都调用一次 IpRangeHelper.isIpInL3NetworkCidr,在候选 NIC 较多时会显著拉高查询延迟。
💡建议改法(先批量拉取,再内存过滤)
- filtered = filtered.stream().filter(nic -> {
- String nicIp = nic.getIp();
- String nicL3 = nic.getL3NetworkUuid();
- return nicIp == null || IpRangeHelper.isIpInL3NetworkCidr(nicIp, nicL3);
- }).collect(Collectors.toList());
+ Set<String> l3Uuids = filtered.stream()
+ .map(VmNicInventory::getL3NetworkUuid)
+ .filter(Objects::nonNull)
+ .collect(Collectors.toSet());
+
+ Map<String, List<String>> l3Cidrs = Q.New(NormalIpRangeVO.class)
+ .select(NormalIpRangeVO_.l3NetworkUuid, NormalIpRangeVO_.networkCidr)
+ .in(NormalIpRangeVO_.l3NetworkUuid, l3Uuids)
+ .eq(NormalIpRangeVO_.ipVersion, IPv6Constants.IPv4)
+ .listTuple()
+ .stream()
+ .collect(Collectors.groupingBy(
+ t -> t.get(0, String.class),
+ Collectors.mapping(t -> t.get(1, String.class), Collectors.toList())
+ ));
+
+ filtered = filtered.stream().filter(nic -> {
+ if (nic.getIp() == null) {
+ return true;
+ }
+ return l3Cidrs.getOrDefault(nic.getL3NetworkUuid(), Collections.emptyList())
+ .stream().anyMatch(cidr -> NetworkUtils.isIpv4InCidr(nic.getIp(), cidr));
+ }).collect(Collectors.toList());🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingManagerImpl.java`
around lines 369 - 378, The per-NIC call to IpRangeHelper.isIpInL3NetworkCidr in
PortForwardingManagerImpl (after filterVmNicByIpVersion) causes an N+1 query;
instead, batch-fetch the L3 network CIDRs for all distinct l3NetworkUuid values
using l3Mgr (or the L3 network DAO), build a map of l3Uuid->CIDR(s) in memory,
then replace the stream filter so each VmNicInventory checks its IP against the
preloaded CIDR(s) using IpRangeHelper (or a pure in-memory helper) without
additional DB calls; update the logic around filtered =
filtered.stream().filter(...) to use the map and avoid per-nic queries.
| GetVolumeClientsRsp rsp = new HttpCaller<>(GET_VOLUME_CLIENTS_PATH, cmd, GetVolumeClientsRsp.class, | ||
| null, TimeUnit.SECONDS, 30, true) | ||
| .setTryNext(true) | ||
| .syncCall(); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# 目的:确认 ZbsPrimaryStorageMdsBase.syncCall 在超时/连接失败时,
# 是返回失败响应,还是直接抛异常。
# 期望:返回失败响应;否则这里的 setTryNext(true) 不会真正切到下一个 MDS。
echo "===== ZbsPrimaryStorageMdsBase.syncCall definition ====="
fd 'ZbsPrimaryStorageMdsBase.java' | while read -r f; do
echo "FILE: $f"
rg -n -C8 '\bsyncCall\s*\(' "$f"
done
echo "===== HttpCaller.doSyncCall in ZbsStorageController ====="
fd 'ZbsStorageController.java' | while read -r f; do
echo "FILE: $f"
rg -n -C12 '\bdoSyncCall\s*\(' "$f"
doneRepository: MatheMatrix/zstack
Length of output: 3129
🏁 Script executed:
# Find and examine ZbsPrimaryStorageMdsBase class
fd 'ZbsPrimaryStorageMdsBase.java' -type f | head -5Repository: MatheMatrix/zstack
Length of output: 232
🏁 Script executed:
# Examine the syncCall method in ZbsPrimaryStorageMdsBase
rg -n 'syncCall' plugin/zbs/src/main/java/org/zstack/storage/zbs/ZbsPrimaryStorageMdsBase.java -A 20Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Check if there's exception handling or try-catch in the method
rg -n 'try|catch|throw' plugin/zbs/src/main/java/org/zstack/storage/zbs/ZbsPrimaryStorageMdsBase.java -B 2 -A 3Repository: MatheMatrix/zstack
Length of output: 829
🏁 Script executed:
# Find the ZbsPrimaryStorageMdsBase.java file
find . -name 'ZbsPrimaryStorageMdsBase.java' 2>/dev/nullRepository: MatheMatrix/zstack
Length of output: 141
🏁 Script executed:
# Get file size and read it
f=$(find . -name 'ZbsPrimaryStorageMdsBase.java' 2>/dev/null | head -1)
if [ -f "$f" ]; then
wc -l "$f"
echo "=== Full file content ==="
cat "$f"
fiRepository: MatheMatrix/zstack
Length of output: 18792
🏁 Script executed:
# Find ZbsMdsBase.java
find . -name 'ZbsMdsBase.java' 2>/dev/nullRepository: MatheMatrix/zstack
Length of output: 127
🏁 Script executed:
# Get ZbsMdsBase file and examine it
f=$(find . -name 'ZbsMdsBase.java' 2>/dev/null | head -1)
if [ -f "$f" ]; then
wc -l "$f"
echo "=== Full file ==="
cat "$f"
fiRepository: MatheMatrix/zstack
Length of output: 5717
🏁 Script executed:
# Find RESTFacade and examine syncJsonPost
find . -name 'RESTFacade.java' 2>/dev/null | head -3Repository: MatheMatrix/zstack
Length of output: 123
🏁 Script executed:
# Search for syncJsonPost implementation
rg -n 'syncJsonPost' --type java -A 15 | head -100Repository: MatheMatrix/zstack
Length of output: 10408
🏁 Script executed:
# Read RESTFacade.java to find syncJsonPost method
cat ./header/src/main/java/org/zstack/header/rest/RESTFacade.java | head -300Repository: MatheMatrix/zstack
Length of output: 5768
🏁 Script executed:
# Search for syncJsonPost method specifically
rg -n 'syncJsonPost' ./header/src/main/java/org/zstack/header/rest/RESTFacade.java -A 10Repository: MatheMatrix/zstack
Length of output: 2969
🏁 Script executed:
# Find the implementation of RESTFacade
find . -name '*RESTFacade*.java' -type f | grep -i implRepository: MatheMatrix/zstack
Length of output: 123
🏁 Script executed:
# Search for implementations of syncJsonPost
rg -l 'implements RESTFacade' --type javaRepository: MatheMatrix/zstack
Length of output: 121
🏁 Script executed:
# Get file size and read RESTFacadeImpl.java
f='./core/src/main/java/org/zstack/core/rest/RESTFacadeImpl.java'
wc -l "$f"Repository: MatheMatrix/zstack
Length of output: 128
🏁 Script executed:
# Read RESTFacadeImpl and search for syncJsonPost implementation
f='./core/src/main/java/org/zstack/core/rest/RESTFacadeImpl.java'
rg -n 'syncJsonPost' "$f" -A 20 | head -150Repository: MatheMatrix/zstack
Length of output: 7101
🏁 Script executed:
# Search for the actual syncJsonPost implementation that handles the HTTP call
f='./core/src/main/java/org/zstack/core/rest/RESTFacadeImpl.java'
rg -n 'public.*syncJsonPost.*String.*returnClass' "$f" -A 30 | head -100Repository: MatheMatrix/zstack
Length of output: 4472
🏁 Script executed:
# Search for syncJson method which might be the underlying implementation
f='./core/src/main/java/org/zstack/core/rest/RESTFacadeImpl.java'
rg -n 'private.*syncJson|public.*syncJson' "$f" -A 25 | head -150Repository: MatheMatrix/zstack
Length of output: 7133
🏁 Script executed:
# Find the syncJson method implementation
f='./core/src/main/java/org/zstack/core/rest/RESTFacadeImpl.java'
rg -n 'private.*<T>.*syncJson\(' "$f" -A 40 | head -150Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Also search for syncJson without the type parameter
f='./core/src/main/java/org/zstack/core/rest/RESTFacadeImpl.java'
sed -n '798,900p' "$f"Repository: MatheMatrix/zstack
Length of output: 4910
需要修复:同步重试逻辑无法处理超时/连接异常。
当前实现中,HttpCaller.doSyncCall() 只在 ret.isSuccess() == false 时才会触发 tryNext 重试。但底层 RESTFacadeImpl.syncRawJson() 在超时或连接失败时会直接抛出 OperationFailureException 异常,而非返回失败响应对象。该异常会绕过 if (!ret.isSuccess()) 检查,导致 setTryNext(true) 对超时/连接异常无效,程序会立即在第一个 MDS 上失败。
建议在 HttpCaller.doSyncCall() 中添加异常捕获逻辑,确保超时和连接异常也能触发 MDS 切换重试。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugin/zbs/src/main/java/org/zstack/storage/zbs/ZbsStorageController.java`
around lines 182 - 185, HttpCaller.doSyncCall currently only retries when the
returned response has ret.isSuccess()==false, but RESTFacadeImpl.syncRawJson
throws OperationFailureException on timeouts/connection errors which bypass that
check; update HttpCaller.doSyncCall to catch OperationFailureException (and
related IO/connection timeout exceptions) around the syncRawJson call, treat
these exceptions as a failed call that honors setTryNext(true), log the
exception, and proceed to the next MDS endpoint before ultimately rethrowing if
all endpoints fail so timeouts/connection failures trigger the same MDS-switch
retry logic as unsuccessful responses.
| synchronized (lifecycleLock) { | ||
| if (LifeCycle.NodeJoin.toString().equals(d.getLifeCycle())) { | ||
| // Clear from suspected set since the node is confirmed alive | ||
| suspectedMissingFromDb.remove(d.getInventory().getUuid()); | ||
| nodeLifeCycle.nodeJoin(d.getInventory()); | ||
| } else if (LifeCycle.NodeLeft.toString().equals(d.getLifeCycle())) { | ||
| nodeLifeCycle.nodeLeft(d.getInventory()); | ||
| } else { | ||
| throw new CloudRuntimeException(String.format("unknown lifecycle[%s]", d.getLifeCycle())); | ||
| } |
There was a problem hiding this comment.
lifecycleLock 还没覆盖 nodeDie() 这条 node-left 路径。
这里把 canonical callback 和 DB/hash-ring reconcile 都串行化了,但同文件 Line 773-781 的 nodeDie() 仍然在锁外直接执行 nodeLifeCycle.nodeLeft(...)。节点刚恢复并广播 NodeJoin 时,旧的 fencing 线程仍可能补一个 stale nodeLeft 把它再次踢出 hash ring,这个竞态还没有完全收口。建议把 nodeDie() 里的本地 nodeLeft 和事件派发也纳入同一串行机制。
Also applies to: 877-925
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java`
around lines 381 - 390, The nodeDie() path is calling
nodeLifeCycle.nodeLeft(...) outside the lifecycleLock, causing a race where a
stale nodeLeft can run concurrently with a later NodeJoin handling; move the
local nodeLeft call and its event dispatch inside the same
synchronized(lifecycleLock) block used by the NodeJoin/NodeLeft canonical
callback so DB/hash-ring reconciliation and callback handling are serialized.
Concretely, update the nodeDie() method to acquire lifecycleLock before invoking
nodeLifeCycle.nodeLeft(...) and any related suspectedMissingFromDb updates or
event dispatch logic (mirror the pattern used around nodeLifeCycle.nodeJoin(...)
and the existing NodeLeft branch) so both canonical callbacks and
fencing-generated nodeLefts use the same lock. Ensure the same change is applied
for the other referenced region (lines around the second occurrence noted, e.g.,
the block covering 877-925) so all nodeLeft paths are synchronized on
lifecycleLock.
| ManagementNodeInventory inv = new ManagementNodeInventory(); | ||
| inv.setUuid(nodeUuid); | ||
| try { | ||
| inv.setHostName(destinationMaker.getNodeInfo(nodeUuid).getNodeIP()); | ||
| } catch (Exception e) { | ||
| logger.warn(String.format("cannot get node info for node[uuid:%s], use empty hostname", nodeUuid)); | ||
| } | ||
|
|
||
| nodeLifeCycle.nodeLeft(inv); |
There was a problem hiding this comment.
不要在 heartbeat 线程里通过 destinationMaker 间接走普通数据源。
Line 739-740 已经约束这个 task 只能使用 heartBeatDBSource,但这里 cache miss 时会落到 ResourceDestinationMakerImpl.getNodeInfo() 的 dbf.findByUuid(...)。DB 抖动时,这会绕过 heartbeat 的重连逻辑,把健康检查卡在普通 dbf 上。这里最好只用 joinedManagementNodes 的缓存 hostname,拿不到就留空。
🛠️ 建议修正
- try {
- inv.setHostName(destinationMaker.getNodeInfo(nodeUuid).getNodeIP());
- } catch (Exception e) {
- logger.warn(String.format("cannot get node info for node[uuid:%s], use empty hostname", nodeUuid));
- }
+ ManagementNodeInventory knownInv = joinedManagementNodes.get(nodeUuid);
+ if (knownInv != null) {
+ inv.setHostName(knownInv.getHostName());
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java`
around lines 890 - 898, The heartbeat thread in ManagementNodeManagerImpl must
not call destinationMaker.getNodeInfo (which triggers
ResourceDestinationMakerImpl.getNodeInfo -> dbf.findByUuid) because this
bypasses heartBeatDBSource; change the code around nodeLifeCycle.nodeLeft(inv)
to fetch hostname only from the in-memory joinedManagementNodes cache (use the
joinedManagementNodes map/structure available in ManagementNodeManagerImpl) and,
if there's no cached entry for nodeUuid, leave inv.hostName empty instead of
calling destinationMaker; remove the try/catch that calls
destinationMaker.getNodeInfo to avoid hitting the ordinary DB path.
| // localize with webhook caller's locale (message already populated by Platform.err) | ||
| String locale = resolveLocale(); | ||
| if (!LocaleUtils.DEFAULT_LOCALE.equals(locale)) { | ||
| i18nService.localizeErrorCode(evt.getError(), locale); | ||
| } | ||
| response.setError(evt.getError()); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
fd 'GlobalErrorCodeI18nService.java' -x sh -c 'echo "===== {} ====="; sed -n "1,240p" "{}"'
rg -n -C3 'localizeErrorCode\(' --glob '*.java'
fd 'AsyncRestApiStore.java|AsyncRestQueryResult.java|RequestData.java' -x sh -c 'echo "===== {} ====="; sed -n "1,260p" "{}"'
rg -n -C3 'class AsyncRestApiStore|query\(|complete\(' --glob '*.java'Repository: MatheMatrix/zstack
Length of output: 50376
在调用 localizeErrorCode() 之前需要深拷贝 ErrorCode 对象。
GlobalErrorCodeI18nServiceImpl.localizeErrorCode() 会直接修改入参的 ErrorCode 对象(通过 setMessage()、setCause() 等 setter 方法)。如果这些错误对象来自缓存或被多个请求复用,不同请求的 Accept-Language 会污染错误信息,导致后续请求获得错误的本地化消息。
应该先深拷贝 ErrorCode,再对副本进行本地化处理,最后将本地化后的副本设置到响应中。
涉及的位置:
callWebHook()第 414 行:i18nService.localizeErrorCode(evt.getError(), locale);sendResponse()第 932 行:i18nService.localizeErrorCode(response.getError(), locale);sendReplyResponse()第 1443 行:i18nService.localizeErrorCode(reply.getError(), locale);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rest/src/main/java/org/zstack/rest/RestServer.java` around lines 411 - 416,
Deep-copy the ErrorCode before localizing to avoid mutating shared cached
objects: in callWebHook, sendResponse, and sendReplyResponse create a deep clone
of the ErrorCode (clone/copy the result of evt.getError, response.getError,
reply.getError), pass that clone to i18nService.localizeErrorCode(..., locale),
then set the localized clone back onto the event/response/reply (instead of
localizing the original). Ensure the copy is a full deep copy (message, cause,
details, children) so no shared references remain.
[plugin-premium]: GPU/VM page keeps loading when shutting down or encountering errors in Zaku cluster.
Resolves: ZSTAC-80202
Change-Id: I7778676171646874706164777869707279776172
sync from gitlab !9014