Conversation
|
Skipping CI for Draft Pull Request. |
There was a problem hiding this comment.
Code Review
This pull request introduces the capability to simulate NUMA nodes in QEMU for external tests. The changes involve adding a NumaNodes option and plumbing it through various layers of the test harness and platform configuration. The core logic for generating QEMU arguments for NUMA is in mantle/platform/qemu.go.
I've identified a couple of issues in the implementation within mantle/platform/qemu.go that could lead to incorrect resource allocation for the simulated NUMA nodes. My review includes suggestions to fix these bugs.
ravanelli
left a comment
There was a problem hiding this comment.
Great job, it is looking good, let's squash the commits and follow the suggestion to simply it with using on/off with 2 NUMA nodes
mantle/platform/qemu.go
Outdated
| kvm = false | ||
| } | ||
| machineArg += "," + accel | ||
| func platformQemuArgs(arch, machineArg string, kvm bool) ([]string, error) { |
There was a problem hiding this comment.
You did a split in this function and also created baseNumaQemuArgs that shares a lot of the same code, let's try to merge the duplication
52601f4 to
774b30c
Compare
|
/test all |
774b30c to
5590322
Compare
New boolean flag "numaNodes" will simulate two NUMA nodes. They will divide the memory and cpus between them.
5590322 to
aafd091
Compare
| node0Cpus := cpus / 2 | ||
|
|
||
| ret = append(ret, "-object", fmt.Sprintf("memory-backend-memfd,id=%s,size=%dM,share=on", node0MemoryDevice, node0Memory)) | ||
| ret = append(ret, "-object", fmt.Sprintf("memory-backend-memfd,id=%s,size=%dM,share=on", node1MemoryDevice, memoryMiB-node0Memory)) |
There was a problem hiding this comment.
Here, you are using node0Memory for one node and memoryMiB - node0Memory for the second. Why is that? It seems this could result in one NUMA node having more memory than the other? If so, the statement "They will split the machine's memory/CPUs evenly between them" won't be true
There was a problem hiding this comment.
It is for cases where the memory is not exactly divisible by two. The memory of the nodes when added together, must equal exactly the total memory of the machine or there is an error.
If the memory is not divisible by 2, one node may have slightly more memory than the other one (one MiB more).
|
|
||
| ret = append(ret, "-object", fmt.Sprintf("memory-backend-memfd,id=%s,size=%dM,share=on", node0MemoryDevice, node0Memory)) | ||
| ret = append(ret, "-object", fmt.Sprintf("memory-backend-memfd,id=%s,size=%dM,share=on", node1MemoryDevice, memoryMiB-node0Memory)) | ||
| ret = append(ret, "-numa", fmt.Sprintf("node,memdev=%s,cpus=%d-%d,nodeid=0", node0MemoryDevice, 0, node0Cpus-1)) |
There was a problem hiding this comment.
What does 0, does here? That the second does not need it?
There was a problem hiding this comment.
I'm guessing you used this doc? https://www.qemu.org/docs/master/system/invocation.html#cmdoption-smp
Can you add the link for it, so we can understand what this code is doing?
Also, why you did not used the socket option? -numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1
There was a problem hiding this comment.
That is the docs I used, yes. Should I add the link as a comment in the codebase, or just to this PR ?
There was no big reason why I didnt use the socket flag. Would you prefer it be used when defining the nodes ?
The 0 is just the starting CPU for node0. The reason it isn't there for the other node is that they are both being given different CPUs. Right now, if your machine has 4 cpus, node0 would be given cpus=0-1 and node1 would be given cpus=2-3.
Maybe it would simplify the code if we hard coded it to be 2 cpus, one for each node, or even simpler if we gave node 0 one cpu, and node 1 only has memory.
| return nil, fmt.Errorf("Must have at least 2 cpus to simulate NUMA nodes") | ||
| } | ||
|
|
||
| ret, err := platformQemuArgs(arch, "") |
There was a problem hiding this comment.
In this case we have the output for with a comma for no reason:
machineArg = accelArg + "," + machineArg
Something like, machineArg=accel=tcg,
Add an option for external tests to be executed by a machine with multiple numa nodes.