@vsoch
Contributor

@vsoch vsoch commented Feb 21, 2025

I am finding in testing that networking between hosts does not work when we are running in rootful mode. I was testing this because using NVIDIA devices does work with rootful, but once I got to the step where pods need to communicate, there was no communication.

I am not sure about the error, but this test should reproduce it in CI. Note that to enable this we use the docker-rootful template provided by lima (@AkihiroSuda you have thought of all things)! The main changes here are to add this test to the matrix, and ensure that in the different install scripts, we largely do nothing if the container runtime is docker-rootful.

Related to #365 but does not fix it, only demonstrates it.

@vsoch
Contributor Author

vsoch commented Feb 21, 2025

Note that I've seen two variants of this error - either an operation timeout (the result here):

[screenshot: operation timeout error]

Or that the address is not reachable / bad (what I've seen in production and my researchapps testing CI):

[screenshot: address not reachable error]

- lima_template: template://ubuntu-24.04
  container_engine: docker
- lima_template: template://docker-rootful
  container_engine: docker-rootful
Member

I'd prefer this form

- lima_template: template://ubuntu-24.04
  container_engine: docker
  rootful: 1

if [[ "$CONTAINER_ENGINE" == "docker-rootful" ]]
then
  CONTAINER_ENGINE="docker"
fi
Member

I prefer the variables to be immutable through the entire lifecycle of the test.
So, another variable like ROOTFUL=1 should be introduced.
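
A minimal sketch of that idea, assuming the CI matrix exports a separate ROOTFUL flag. The helper names `is_rootful` and `install_plan` are hypothetical and not part of this PR; the point is only that CONTAINER_ENGINE is never reassigned.

```shell
# Sketch only: CONTAINER_ENGINE stays fixed for the whole test run;
# rootful mode is carried by a separate ROOTFUL flag (assumed name).
: "${CONTAINER_ENGINE:=docker}"
: "${ROOTFUL:=0}"
readonly CONTAINER_ENGINE

# Hypothetical helper: succeeds when the test runs in rootful mode.
is_rootful() {
  [ "$ROOTFUL" = "1" ]
}

# Hypothetical helper: report what an install script would do.
install_plan() {
  if is_rootful; then
    echo "skip rootless setup for ${CONTAINER_ENGINE}"
  else
    echo "run rootless setup for ${CONTAINER_ENGINE}"
  fi
}
```

With this shape, the install scripts would branch on `is_rootful` instead of comparing CONTAINER_ENGINE against docker-rootful.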

@AkihiroSuda
Member

Thanks, I confirmed that this issue happens on my local machines too, but I haven't identified the cause.

Tested with Docker v28 and v27.5.1, on Ubuntu 24.04.1 (ARM64).

I think it was working in the past?

@AkihiroSuda
Member

AkihiroSuda commented Feb 24, 2025

ICMP and DNS still seem to work, but TCP across the nodes seems broken?

VXLAN packets are apparently sent and received on each of the VMs, though (run tcpdump udp to check).

The receiver VM seems to be refusing to route the VXLAN packets to the usernetes-node-1 container, where kubelet, flannel, etc. are running.
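
For anyone repeating the capture, a narrower filter may be easier to read than plain tcpdump udp. The sketch below assumes flannel is using its default VXLAN port (8472/udp) and that eth0 is the VM's uplink; neither is confirmed in this thread, and the function name `capture_vxlan` is hypothetical.

```shell
# Sketch: capture a handful of VXLAN packets on a VM to confirm they arrive.
# Assumptions: flannel's default VXLAN port (8472/udp), eth0 as the uplink.
TCPDUMP="${TCPDUMP:-tcpdump}"

capture_vxlan() {
  iface="${1:-eth0}"
  count="${2:-20}"
  # -n: no name resolution, -i: interface, -c: stop after N packets
  "$TCPDUMP" -n -i "$iface" -c "$count" udp port 8472
}

# Usage (as root, on each VM): capture_vxlan eth0 20
```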

@AkihiroSuda
Member

Found a workaround: execute ethtool --offload eth0 tx-checksum-ip-generic off in the usernetes-node-1 container
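
A sketch of applying the workaround from the host, assuming the container is reachable through the docker CLI under the name usernetes-node-1 and that ethtool is installed inside it. The function names are hypothetical. Offload settings changed this way are runtime state, so they would need to be reapplied whenever the container is recreated.

```shell
# Sketch: apply the workaround inside the node container and show the result.
# Assumptions: container name usernetes-node-1, ethtool present in the container.
DOCKER="${DOCKER:-docker}"
NODE_CONTAINER="${NODE_CONTAINER:-usernetes-node-1}"

disable_tx_checksum() {
  iface="${1:-eth0}"
  "$DOCKER" exec "$NODE_CONTAINER" ethtool --offload "$iface" tx-checksum-ip-generic off
}

show_tx_checksum() {
  iface="${1:-eth0}"
  # --show-offload lists the current offload settings for the interface
  "$DOCKER" exec "$NODE_CONTAINER" ethtool --show-offload "$iface" | grep tx-checksum-ip-generic
}

# Usage: disable_tx_checksum eth0 && show_tx_checksum eth0
```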

@thaJeztah

Any eyes needed here from the Moby networking folks? (I know they're pretty busy currently, but if it's useful I can try to ask them if they have time to spare to take a look)

@vsoch force-pushed the test-rootful branch 5 times, most recently from 0010ee9 to 0d56a3c (February 24, 2025 16:26)
This is important to run on multi-node

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@vsoch
Contributor Author

vsoch commented Feb 24, 2025

@AkihiroSuda do you remember when you last tested this and it worked? In recent memory we had updates to flannel and the underlying kind node (Kubernetes version), and (for me) at some point last year the additional make sync-external-ip step was added. If we can reproduce a previously working version, comparing against it could be a good debugging strategy.

@vsoch
Contributor Author

vsoch commented Feb 24, 2025

oh wow, this is really interesting!

Not sure if this is expected, but this looks to be a warning in the failed nerdctl setup:

Warning: [WARNING] buildkitd has access to images in "buildkit" namespace by default. If you want to give buildkitd access to the images in "default" namespace, run this command with CONTAINERD_NAMESPACE=default

@AkihiroSuda
Member

AkihiroSuda commented Feb 25, 2025

The ethtool --offload eth0 tx-checksum-ip-generic off rule can probably be appended here:

# Correct UDP checksums for VXLAN behind NAT
# https://github.com/flannel-io/flannel/issues/1279
# https://github.com/kubernetes/kops/pull/9074
# https://github.com/karmab/kcli/commit/b1a8eff658d17cf4e28162f0fa2c8b2b10e5ad00
SUBSYSTEM=="net", ACTION=="add|change|move", ENV{INTERFACE}=="flannel.1", RUN+="/usr/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off"

It is still unclear why this is needed only for rootful, though.
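
One possible shape for the appended rule, sketched as a shell helper that writes it. This is an assumption, not a tested fix: it matches on eth0 rather than flannel.1, takes the rules file path from the caller because the real file is the one quoted above, and uses the hypothetical function name `append_eth0_rule`. Whether udev fires this rule for eth0 inside the node container has not been verified here.

```shell
# Sketch: append a udev rule that turns off TX checksum offload on eth0.
# Assumptions: the caller passes the rules file path; matching on eth0
# (not flannel.1) is what the rootful workaround needs.
append_eth0_rule() {
  rules_file="$1"
  cat >> "$rules_file" <<'EOF'
# Rootful workaround: disable TX checksum offload on the node's eth0
SUBSYSTEM=="net", ACTION=="add|change|move", ENV{INTERFACE}=="eth0", RUN+="/usr/sbin/ethtool --offload eth0 tx-checksum-ip-generic off"
EOF
}

# Usage: append_eth0_rule /path/to/the/rules/file
```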

Any eyes needed here from the Moby networking folks? (I know they're pretty busy currently, but if it's useful I can try to ask them if they have time to spare to take a look)

Thanks, that would be appreciated.

@AkihiroSuda
Member

Warning: [WARNING] buildkitd has access to images in "buildkit" namespace by default. If you want to give buildkitd access to the images in "default" namespace, run this command with CONTAINERD_NAMESPACE=default

Irrelevant to the topic.
Should be fixed though.

@AkihiroSuda
Member

@vsoch Do you plan to continue this?

@vsoch
Contributor Author

vsoch commented May 1, 2025

I would like to. From this comment: #366 (comment), I thought we were waiting for feedback from the Moby networking folks. Is the next step to try adding that line ethtool --offload eth0 tx-checksum-ip-generic off to the flannel rules?

@AkihiroSuda
Member

AkihiroSuda commented May 1, 2025

Is the next step to try adding that line ethtool --offload eth0 tx-checksum-ip-generic off to the flannel rules?

Yes (when running in rootful), and let's call it a day

@thaJeztah

/cc @robmry @akerouanton

@vsoch
Contributor Author

vsoch commented May 1, 2025

Sounds good - I'll make some time in the next few days. It's after 1am here so I need to be off to sleep, but this is on my todo. Thanks for the ping @AkihiroSuda.

@robmry

robmry commented May 1, 2025

Access from outside a host to container addresses inside bridge networks was blocked in Moby 28.0. Is that the issue? https://www.docker.com/blog/docker-engine-28-hardening-container-networking-by-default/

@robmry

robmry commented May 1, 2025

If running dockerd with env var DOCKER_INSECURE_NO_IPTABLES_RAW=1 makes it work - that's the issue. Either way, I'd like to know more about what the network looks like - is it direct routing between container addresses, or do you have an overlay network in there?
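
A sketch of one way to get that variable into the daemon's environment, assuming dockerd runs as the docker.service systemd unit. The drop-in file name and the function name `write_dockerd_dropin` are arbitrary choices for illustration.

```shell
# Sketch: write a systemd drop-in that sets the variable for dockerd.
# Assumptions: dockerd is managed by systemd as docker.service;
# the drop-in file name is arbitrary.
write_dockerd_dropin() {
  dropin_dir="${1:-/etc/systemd/system/docker.service.d}"
  mkdir -p "$dropin_dir"
  cat > "$dropin_dir/no-iptables-raw.conf" <<'EOF'
[Service]
Environment="DOCKER_INSECURE_NO_IPTABLES_RAW=1"
EOF
}

# Usage (as root), then reload systemd and restart the daemon:
#   write_dockerd_dropin
#   systemctl daemon-reload && systemctl restart docker
```

An export in a shell or Makefile only reaches dockerd if that same process starts the daemon, which is why the sketch sets the variable on the service itself.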

@vsoch
Contributor Author

vsoch commented May 1, 2025

@AkihiroSuda I tried both approaches suggested above, and the issue is still there. I left both commits / changes for feedback. Let me know what I should try next.

# https://github.com/kubernetes/kops/pull/9074
# https://github.com/karmab/kcli/commit/b1a8eff658d17cf4e28162f0fa2c8b2b10e5ad00
SUBSYSTEM=="net", ACTION=="add|change|move", ENV{INTERFACE}=="flannel.1", RUN+="/usr/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off"
SUBSYSTEM=="net", ACTION=="add|change|move", ENV{INTERFACE}=="flannel.1", RUN+="/usr/sbin/ethtool --offload eth0 tx-checksum-ip-generic off"
Member

Not flannel.1

Contributor Author

@AkihiroSuda I removed that parameter, but I don't know how the rules work and suspect something else should be there. Can you take a look?

Contributor Author

Is the interface eth0 and then we remove --offload eth0 from the RUN?

Makefile Outdated

# Access from outside a host to container addresses inside bridge networks got blocked in Moby 28.0
# https://www.docker.com/blog/docker-engine-28-hardening-container-networking-by-default/
export DOCKER_INSECURE_NO_IPTABLES_RAW=1
Member

This seems irrelevant to our issue?

Disabling it didn't fix things, so yes - I guess it is!

I have no context for this issue but, when Sebastiaan cc'd me this morning, I read about problems with networking between hosts with rootful Docker. That change came to mind, so I suggested disabling it. Now I see you're using flannel, if it hadn't been ruled out, I'd be even more convinced (!) - because ...

If there's something I can help with, let me know.

turn checksum off

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@vsoch
Contributor Author

vsoch commented May 13, 2025

@AkihiroSuda do you have another suggestion for what to try here? We'd like to try rootful soon: we have some overhead running rootless and want to test whether running rootful removes it (and then we could deduce that it's something about user space).

4 participants