Fix check for draining #458
Currently, flavors for GPUs and FPGAs, and flavors with more than 60 CPUs, are unconditionally prevented from migrating. This change adds a new boolean parameter that works as follows:
* when the workflow hypervisor.drain is used, the existing behavior remains
* when the action server.migrate is used, users can uncheck the box and allow the migration even in those cases
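A minimal sketch of the behavior described above. The `Flavor` and `Server` classes are hypothetical stand-ins, the name prefixes used to spot GPU/FPGA flavors are an assumption, and the CPU error message is illustrative; only the function signature, the 60-CPU threshold, and the GPU/FPGA message come from this PR.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the OpenStack server and flavor objects.
@dataclass
class Flavor:
    name: str
    vcpus: int


@dataclass
class Server:
    flavor: Flavor


# Assumption: GPU and FPGA flavors are recognisable by a name prefix.
RESTRICTED_PREFIXES = ("g-", "f-")
MAX_CPUS = 60  # threshold stated in the PR description


def can_be_migrated(server: Server, check_flavor: bool) -> None:
    """Raise ValueError if the check is enabled and the flavor is restricted."""
    if not check_flavor:
        # Box unticked in server.migrate: skip the safety check entirely.
        return
    if server.flavor.name.startswith(RESTRICTED_PREFIXES):
        raise ValueError(
            f"Attempted to move GPU or FPGA flavor, {server.flavor.name}, which is not allowed!"
        )
    if server.flavor.vcpus > MAX_CPUS:
        # Illustrative message, not taken from the repository.
        raise ValueError(
            f"Attempted to move flavor with more than {MAX_CPUS} CPUs, {server.flavor.name}"
        )
```

With `check_flavor=True` a restricted flavor raises, which matches what hypervisor.drain keeps doing; with `check_flavor=False` the same server passes through.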
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #458      +/-   ##
==========================================
- Coverage   99.19%   99.15%   -0.04%
==========================================
  Files         112      112
  Lines        2726     2727       +1
  Branches      338      339       +1
==========================================
  Hits         2704     2704
  Misses         19       19
- Partials        3        4       +1
  required: true
  default: true
check_flavor:
  description: "Check the Flavor of the Server to decide if migration is feasible. When set as True, GPUs, FPGAs, and Flavors with more than 60 CPUs will NOT be migrated."
Nit: I would call it live_migration.
Then the tick box (the default) would do a live migration on CPU instances with <60 cores.
Unticking it would do a cold migration, which can be done on anything but shuts the server down while it happens.
That may be a good idea for the long term, as a separate flag.
Note that the purpose here is slightly different. We are NOT doing live migrations during the outage week in July.
This change just allows the migration of VMs with GPU/FPGA flavors and/or more than 60 CPUs, which st2 currently blocks. The PR keeps the check for those flavor types in place, to avoid attempting risky migrations by mistake, while providing a mechanism to force the migration this time.
Yeah, we need to do a "cold migration" for these types of flavors.
But we'll need to do this post-outage too (e.g. HW failure), and the steps will be the same.
The fact that it skips the checks is an implementation detail. We just want a way to force-drain these types of flavors, which will incur downtime.
    raise ValueError(
        f"Attempted to move GPU or FPGA flavor, {server.flavor.name}, which is not allowed!"
    )
def can_be_migrated(server: Server, check_flavor: bool):
Put a default value here, check_flavor=False, to stay backwards compatible with other workflows that use this function.
I think it is the other way around: before this PR we always did the check, as it was always a live migration.
Now the flag makes the check optional, for cold migrations. Existing workflows will assume a live migration instead.
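The point about defaults can be shown with a short sketch. The function `migrate` and its return strings are hypothetical; only the parameter name `check_flavor` comes from the PR. With a default of `True`, callers written before the PR keep the old always-check behavior without any change:

```python
def migrate(server_id: str, check_flavor: bool = True) -> str:
    """Hypothetical stand-in that reports which path a call would take."""
    if check_flavor:
        return f"{server_id}: flavor check applied"
    return f"{server_id}: flavor check skipped"


# Existing caller, unchanged: it still gets the flavor check.
print(migrate("server-1"))
# New caller that explicitly opts out of the check.
print(migrate("server-2", check_flavor=False))
```

A default of `False` would instead silently drop the check for every existing caller, which is the concern raised in the reply above.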
    conn: Connection,
    server_id: str,
    snapshot: bool,
    check_flavor: bool,
Do the same here: check_flavor=False.
    conn=mock_connection,
    server_id=mock_server_id,
    snapshot=False,
    check_flavor=True,
You don't need this anymore if you add a default value.
    server_id=mock_server_id,
    dest_host=dest_host,
    snapshot=True,
    check_flavor=True,
You don't need this if you add a default value.
    server_id=mock_server_id,
    dest_host=dest_host,
    snapshot=True,
    check_flavor=True,
Change this to check_flavor, because you're overwriting what is set as an argument.
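A sketch of the issue raised here, using hypothetical names throughout (`migrate_server` and `do_migrate` are not from the repository). If an outer function accepts `check_flavor` but passes a literal `True` inward, the caller's choice is silently discarded. Forwarding the parameter, and asserting on it with `unittest.mock`, makes the pass-through testable:

```python
from unittest.mock import MagicMock


def migrate_server(conn, server_id: str, check_flavor: bool, do_migrate) -> None:
    """Hypothetical outer action; do_migrate stands in for the inner helper."""
    # Forward the caller's value instead of hardcoding check_flavor=True.
    do_migrate(conn=conn, server_id=server_id, check_flavor=check_flavor)


mock_conn = MagicMock()
mock_inner = MagicMock()
migrate_server(mock_conn, "server-id", check_flavor=False, do_migrate=mock_inner)

# This assertion fails if the outer function overwrote the argument with True.
mock_inner.assert_called_once_with(
    conn=mock_conn, server_id="server-id", check_flavor=False
)
```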