node_port: stale iptables DNAT rules shadow active sandbox, making port unreachable
Summary
node_port DNAT rules are never cleaned up when sandboxes are destroyed. Because iptables evaluates each chain top to bottom and DNAT stops traversal at the first matching rule, stale rules pointing to dead sandbox IPs shadow the active sandbox's rules. The node port becomes completely unreachable, even from localhost.
Root cause
firewall.go uses -A (append) to add the node_port rules to the PREROUTING, OUTPUT, and POSTROUTING chains but never deletes them when sandboxes are destroyed. After several deploy cycles, stale rules accumulate ahead of the current sandbox's rule in each chain and are matched first.
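The shadowing effect can be reproduced without touching iptables. The following is a minimal sketch that models a chain as an ordered slice and appends one rule per deploy; the `rule` type, the `firstMatch` helper, and every IP other than 10.8.32.126 are hypothetical and only illustrate the ordering problem.

```go
package main

import "fmt"

// rule is a simplified stand-in for one DNAT entry in a nat chain.
type rule struct {
	port   int
	target string // sandbox IP the rule rewrites traffic to
}

// firstMatch walks the chain top to bottom and returns the target of the
// first rule that matches the port, mirroring how a terminating target
// such as DNAT ends chain traversal.
func firstMatch(chain []rule, port int) (string, bool) {
	for _, r := range chain {
		if r.port == port {
			return r.target, true
		}
	}
	return "", false
}

func main() {
	var chain []rule
	// Each deploy appends a rule for the new sandbox (like -A);
	// nothing removes the rule of the sandbox that was destroyed.
	for _, ip := range []string{"10.8.32.126", "10.8.32.127", "10.8.32.128"} {
		chain = append(chain, rule{port: 6667, target: ip})
	}
	target, _ := firstMatch(chain, 6667)
	fmt.Println("traffic goes to:", target) // the oldest sandbox, not the newest
}
```

With append-only rules, the oldest entry always wins, so the newest sandbox is reachable only if every earlier rule has been deleted.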
Verified on miren-club
# Localhost access — CLOSED (should be open)
$ nc -z -w3 127.0.0.1 6667
# Node IP access — CLOSED (should be open)
$ nc -z -w3 10.128.0.35 6667
25 DNAT rules exist in the OUTPUT chain for port 6667. The first rule targets 10.8.32.126 (dead sandbox) and captured our test packet (1 pkt, 60 bytes). The live sandbox's rule is roughly 24th in the chain and is never reached.
Fix needed
- Delete DNAT rules when sandboxes are destroyed: Add -D (delete) calls to the sandbox teardown path in sandbox.go (around L2232-2397), mirroring the three rules created in firewall.go (PREROUTING, OUTPUT, POSTROUTING)
- Consider using -I (insert) instead of -A (append): As a defense-in-depth measure, inserting new rules at the top of the chain would ensure the latest sandbox always wins, even if cleanup is missed
- Flush stale rules on miren-club: 25 dead DNAT entries need manual cleanup
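One way the teardown cleanup could look is sketched below. It builds each rule spec once and reuses it for both creation and deletion, so the -D call always mirrors what was added. The `natRule` type, the helper names, and the exact match arguments (including MASQUERADE in POSTROUTING) are assumptions for illustration and would need to match what firewall.go actually installs.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// natRule is a hypothetical holder for one nat-table rule.
type natRule struct {
	chain string
	spec  []string
}

// args renders the rule for a given operation: "-A", "-I", or "-D".
func (r natRule) args(op string) []string {
	return append([]string{"-t", "nat", op, r.chain}, r.spec...)
}

// nodePortRules lists the three rules for one sandbox. The specs are
// assumed, not copied from firewall.go.
func nodePortRules(sandboxIP string, port int) []natRule {
	p := strconv.Itoa(port)
	dnat := []string{"-p", "tcp", "--dport", p, "-j", "DNAT", "--to-destination", sandboxIP + ":" + p}
	return []natRule{
		{"PREROUTING", dnat},
		{"OUTPUT", dnat},
		{"POSTROUTING", []string{"-p", "tcp", "-d", sandboxIP, "--dport", p, "-j", "MASQUERADE"}},
	}
}

// removeNodePortRules issues one -D per rule and keeps going on failure,
// returning the first error so teardown can log it.
func removeNodePortRules(run func(args ...string) error, sandboxIP string, port int) error {
	var firstErr error
	for _, r := range nodePortRules(sandboxIP, port) {
		if err := run(r.args("-D")...); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

// iptables runs the real binary; it requires root and is not called in main.
func iptables(args ...string) error {
	out, err := exec.Command("iptables", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("iptables %v: %w: %s", args, err, out)
	}
	return nil
}

func main() {
	// Dry run: print the commands instead of executing them.
	dryRun := func(args ...string) error {
		fmt.Println("iptables", strings.Join(args, " "))
		return nil
	}
	_ = removeNodePortRules(dryRun, "10.8.32.126", 6667)
}
```

Passing `iptables` instead of `dryRun` would execute the deletions. The same `args` helper covers the second item in the list: calling `r.args("-I")` in the creation path inserts the rule at the head of the chain. Note that -D with a rule spec removes only the first matching rule, so duplicates left by repeated deploys would need repeated calls.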
Scope clarification
node_port is not responsible for cloud firewall rules (GCP only allows 80/443/8443/8989). External access to arbitrary ports requires separate firewall configuration. This issue is strictly about making the port work on the node itself.
Code references
- runtime/controllers/sandbox/firewall.go: Rule creation (PREROUTING L40-52, OUTPUT L54-66, POSTROUTING L68-80)
- runtime/controllers/sandbox/sandbox.go: Sandbox deletion path (L2232-2397, no iptables cleanup)
Related
- MIR-751: node_port implicit HTTP port dropped