✨ Shutdown VMs before Deletion #2835
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Hi @shaardie. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
lentzi90 left a comment
/ok-to-test
Thinking about unnecessary API calls, should we skip trying to shut it down altogether if the timeout is 0?
/retitle ✨ Shutdown VMs before Deletion
Which timeout do you mean exactly?
Tries to shut down the OpenStack VM before deleting it. This way, even Pods from DaemonSets are shut down more gracefully, and services like license daemons on the VMs can be shut down properly. Related to kubernetes-sigs#1973
Force-pushed from 5e56463 to 47154ea
Oh right, I got the 0 from the issue description. But the question is still relevant. I think users should be able to opt out of this, especially since this adds more API calls.
So you suggest a new configuration option via CRD?
Hmm, let me gather some second opinions. I want to have more than a gut feeling before we start modifying the CRDs 😄
I'm not sure how I feel about failing if the system doesn't shut down. I feel like it would be better if it tries to shut down for 5 minutes and, if it doesn't shut down, moves on to termination. Anyway, OpenStack will flip from a graceful to a hard shutdown after 60s by default: https://docs.openstack.org/nova/rocky/configuration/config.html#DEFAULT.graceful_shutdown_timeout So the 5 minute timeout seems like overkill as well, unless something is seriously wrong (or the cloud has that config changed).
I can also change the PR to continue with deleting the VM instead of failing after the period of time. For me personally, 60s would also be okay for a timeout, but I can think of situations where this could be a little short. For example, if there are custom mounts of NFS, CIFS, GPFS, or whatever, shutdown can easily take more than 60s. Maybe you should first decide if you want to have this value configurable via CRD?
I have checked with my downstream and they do not have any concerns with the feature (always enabled). However, it sounds like there are quite a few ways to do things and people will want different things. Some do not care about the shutdown and definitely want to force it or just straight delete. Some want to make sure everything is properly shut down, and would rather error than force. And some will want a different timeout. So how should we do this? I can see it working with either a flag or CRD field(s). Then we have one more thing to consider. We want to make use of ORC for managing the servers. See #2814 for more details. Hopefully we can get this done sooner rather than later, which means that this feature would make more sense to implement in ORC directly. Otherwise we will end up having to migrate it later.
I am not quite sure what you want me to do, honestly. I would be happy to change things in this PR if you tell me what you want. If you want to migrate to your new setup first, I would probably use my patched version for now and see if I rewrite the whole thing once your migration to ORC is done.
**Human interaction analogy**

Let's go through the scenarios where Alice (User) wants Bob (CAPO) to delete a VM in a "regular" talk-to-your-colleague kind of interaction:
In all of the above cases, Alice should provide Bob with the information he needs to proceed in an ideal way. To me, this hints in the direction of Alice (User) providing this information to Bob (CAPO) beforehand, in a way Bob understands (CRD field). If Bob tries to have one solution that applies to all possible use cases (configuration flag), he might get some cases wrong where Alice has different requirements. You might also have the case that one CAPO instance manages VMs that you want deleted immediately as well as VMs that you want to give time for an orderly shutdown. That would also make the CRD field approach more desirable.

**Which value should the feature use by default**

In my opinion, the Venn diagram of the group of people for whom one minute of additional VM runtime would be more than a minor inconvenience (which could then be fixed easily) and the group of people who would be caught unaware by such a change should have a very small intersection. Whereas with the way things currently work, the Venn diagram of the group of people for whom an immediate VM termination would be more than a minor inconvenience (which may or may not be easily remedied) and the group of people who might be bitten by this in the future probably has a larger intersection (in my opinion). So I think, under those conditions, there is no need to treat the previous default (which is unusual and can definitely cause headaches) with a lot of reverence. But this is just my opinion, just trying to give some input to give you a perspective on the choices you mentioned.
I am basically saying that I think we need an option to either turn this feature off or to allow more granular configuration of it. I do not have a strong opinion on how to do that, so I am leaving it up to you to propose what to do. The issue description already suggests a

If you don't need this urgently, I also suggest looking into ORC first so that we can get an implementation that will work with it. Otherwise we risk breaking this feature later.
As we are expecting a graceful shutdown, should we follow the following steps so that an ungraceful shutdown doesn't happen:
Tries to shut down the OpenStack VM before deleting it. This way, even Pods from DaemonSets are shut down more gracefully, and services like license daemons on the VMs can be shut down properly.
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #1973
Special notes for your reviewer:
TODOs:
/hold