ci: add test for rootful docker #366
base: master
.github/workflows/main.yaml
Outdated
- lima_template: template://ubuntu-24.04
  container_engine: docker
- lima_template: template://docker-rootful
  container_engine: docker-rootful
I'd prefer this form
- lima_template: template://ubuntu-24.04
  container_engine: docker
  rootful: 1
hack/create-cluster-lima.sh
Outdated
if [[ "$CONTAINER_ENGINE" == "docker-rootful" ]]
then
  CONTAINER_ENGINE="docker"
fi
I prefer the variables to be immutable through the entire lifecycle of the test.
So, another variable like ROOTFUL=1 should be introduced.
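One way to follow this suggestion, as a minimal sketch: leave CONTAINER_ENGINE exactly as it was passed in and carry the rootful choice in a separate ROOTFUL variable. The helper names is_rootful and engine_mode are hypothetical and are not part of hack/create-cluster-lima.sh; treating ROOTFUL=0 as "rootless" is an assumption as well.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: both variables are set once (for example by the CI
# matrix) and are never reassigned afterwards.

# is_rootful VALUE: succeed only when VALUE is "1".
is_rootful() {
  [ "${1:-0}" = "1" ]
}

# engine_mode ENGINE [ROOTFUL]: print a label for logging without modifying
# either input. ROOTFUL defaults to 0 when omitted.
engine_mode() {
  local engine="$1" rootful="${2:-0}"
  if is_rootful "$rootful"; then
    printf '%s (rootful)\n' "$engine"
  else
    printf '%s (rootless)\n' "$engine"
  fi
}

# Apply defaults, then lock both values for the rest of the script.
CONTAINER_ENGINE="${CONTAINER_ENGINE:-docker}"
ROOTFUL="${ROOTFUL:-0}"
readonly CONTAINER_ENGINE ROOTFUL

echo "mode: $(engine_mode "$CONTAINER_ENGINE" "$ROOTFUL")"
```

With this shape, steps that only apply to one mode can branch on `is_rootful "$ROOTFUL"` instead of rewriting CONTAINER_ENGINE partway through the test.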
Thanks, I confirmed that this issue happens on my local machines too, but I haven't identified the cause. Tested with Docker v28 and v27.5.1, on Ubuntu 24.04.1 (ARM64). I think it was working in the past?
ICMP and DNS still seem to work, but TCP across the nodes seems broken? VXLAN packets are apparently sent and received on each of the VMs, though. (Run [...]) Apparently, the receiver VM is refusing to route the VXLAN packets to the [...]
Found a workaround: execute [...]
Any eyes needed here from the Moby networking folks? (I know they're pretty busy currently, but if it's useful I can try asking them if they have time to spare to give it eyes)
0010ee9 to
0d56a3c
This is important to run on multi-node
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@AkihiroSuda do you remember the last time you tested with it working? In recent memory we had updates to flannel, the underlying kind node (Kubernetes version), and (for me) at some point last year the additional [...]
Oh wow, this is really interesting! Not sure if this is expected, but this looks to be a warning in the failed nerdctl setup: [...]
The usernetes/Dockerfile.d/etc_udev_rules.d_90-flannel.rules Lines 1 to 5 in b259da8
It is still unclear why this is needed only for rootful, though.
Thanks, that would be appreciated.
Irrelevant to the topic.
@vsoch Do you plan to continue this?
I would like to - from this comment: #366 (comment) I thought we were waiting for feedback from the Moby networking folks. Is the next step to try adding that line [...]
Yes (when running in rootful), and let's call it a day
/cc @robmry @akerouanton
Sounds good - I'll make some time in the next few days. It's after 1am here so I need to be off to sleep, but this is on my todo. Thanks for the ping @AkihiroSuda.
Access from outside a host to container addresses inside bridge networks got blocked in moby 28.0, is that the issue? https://www.docker.com/blog/docker-engine-28-hardening-container-networking-by-default/
If running dockerd with env var [...]
@AkihiroSuda I tried both approaches suggested above, still issues. I left both commits / changes for feedback. Let me know what I should try next.
# https://github.com/kubernetes/kops/pull/9074
# https://github.com/karmab/kcli/commit/b1a8eff658d17cf4e28162f0fa2c8b2b10e5ad00
SUBSYSTEM=="net", ACTION=="add|change|move", ENV{INTERFACE}=="flannel.1", RUN+="/usr/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off"
SUBSYSTEM=="net", ACTION=="add|change|move", ENV{INTERFACE}=="flannel.1", RUN+="/usr/sbin/ethtool --offload eth0 tx-checksum-ip-generic off"
Not flannel.1
@AkihiroSuda I removed that parameter, but I don't know how the rules work and suspect something else should be there. Can you take a look?
Is the interface eth0, and do we then remove --offload eth0 from the RUN?
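To see which interface actually has the offload enabled before and after the udev rule fires, the state can be read out of `ethtool -k` output. This is only a sketch: offload_state is a hypothetical helper that is not part of this PR, and the two interface names are simply the ones mentioned in this thread.

```shell
#!/usr/bin/env bash
# offload_state FEATURE
# Reads `ethtool -k <iface>` output on stdin and prints the state ("on" or
# "off") of FEATURE. Returns non-zero if the feature is not listed.
offload_state() {
  awk -v feature="$1:" '$1 == feature { print $2; found = 1 } END { exit !found }'
}

# Report the feature for both interfaces discussed in the thread, skipping
# any that do not exist on this machine (or when ethtool is not installed).
for iface in flannel.1 eth0; do
  if command -v ethtool >/dev/null 2>&1 && [ -e "/sys/class/net/$iface" ]; then
    state="$(ethtool -k "$iface" | offload_state tx-checksum-ip-generic)" || state="unknown"
    echo "$iface tx-checksum-ip-generic: $state"
  fi
done
```

Running it on a node before and after reloading the udev rules would show whether the RUN command took effect on the interface that was intended.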
Makefile
Outdated
# Access from outside a host to container addresses inside bridge networks got blocked in Moby 28.0
# https://www.docker.com/blog/docker-engine-28-hardening-container-networking-by-default/
export DOCKER_INSECURE_NO_IPTABLES_RAW=1
This seems irrelevant to our issue?
Disabling it didn't fix things, so yes - I guess it is!
I have no context for this issue but, when Sebastiaan cc'd me this morning, I read about problems with networking between hosts with rootful Docker. That change came to mind, so I suggested disabling it. Now I see you're using flannel, if it hadn't been ruled out, I'd be even more convinced (!) - because ...
If there's something I can help with, let me know.
turn checksum off
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@AkihiroSuda do you have another suggestion for what to try here? We'd like to try rootful soon - we have some overhead running rootless and want to test if running with rootful removes it (and then we could deduce it's something about user space).


I am finding with testing that the networking between hosts does not work when we are running in rootful. I was testing this because using nvidia devices does work with rootful, but once I got to the step of needing pods to communicate, there was no communication.
I am not sure about the error, but this test should reproduce it in CI. Note that to enable this we use the docker-rootful template provided by lima (@AkihiroSuda you have thought of all things)! The main changes here are to add this test to the matrix, and ensure that in the different install scripts, we largely do nothing if the container runtime is docker-rootful.
Related to #365 but does not fix it, only demonstrates it.