
Moving a Ganeti VM Within a Cluster

With significant help from Phil Regnauld.

Initial Conditions

Given a cluster

vm0.sea.rg.net:/root# gnt-node list
Node           DTotal DFree MTotal MNode MFree Pinst Sinst
vm0.sea.rg.net   3.6T  2.6T  31.5G  3.4G 29.2G     3     1
vm1.sea.rg.net   5.9T  2.0T  70.9G 43.0G 48.3G    12     8
vm2.sea.rg.net   5.5T  1.3T  63.0G 28.2G 47.4G     9    13

With a VM

Instance                     Primary_node   Secondary_Nodes  MaxMem DiskUsage
zzyzx.sigpipe.org            vm1.sea.rg.net vm2.sea.rg.net     2.0G    100.1G

We will change the VM to have its primary on vm0, leaving the secondary on vm2. If everything goes according to plan, the VM will stay up and working the whole time.
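The whole move comes down to three commands. Here is a sketch of a hypothetical bash helper, `plan_move`, which is not part of Ganeti. Given a DRBD instance, its current primary, its current secondary, and the desired new primary, it prints those commands. It only echoes them, so you can review the plan before running anything.

```shell
#!/usr/bin/env bash
# plan_move: hypothetical helper that prints (does not run) the gnt-instance
# commands for moving a DRBD instance's primary to a new node while keeping
# its current secondary.
# Usage: plan_move INSTANCE CUR_PRIMARY CUR_SECONDARY NEW_PRIMARY
plan_move() {
    local inst=$1 pri=$2 sec=$3 target=$4
    if [ "$target" = "$pri" ]; then
        echo "$inst already has its primary on $target" >&2
        return 0
    fi
    if [ "$target" = "$sec" ]; then
        # Target already holds the DRBD secondary: one swap puts the primary
        # there (the old primary then becomes the secondary).
        echo "gnt-instance migrate -f $inst"
        return 0
    fi
    # Step 1: live-migrate, so the old secondary ($sec) becomes primary
    # and the old primary ($pri) becomes secondary.
    echo "gnt-instance migrate -f $inst"
    # Step 2: move the DRBD secondary from $pri to $target (slow resync).
    echo "gnt-instance replace-disks -n $target $inst"
    # Step 3: swap again, leaving $target primary and $sec secondary.
    echo "gnt-instance migrate -f $inst"
}

# For the example on this page:
#   plan_move zzyzx.sigpipe.org vm1 vm2 vm0
# prints
#   gnt-instance migrate -f zzyzx.sigpipe.org
#   gnt-instance replace-disks -n vm0 zzyzx.sigpipe.org
#   gnt-instance migrate -f zzyzx.sigpipe.org
```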

First, move the primary to vm2

We make vm2 the primary and vm1 the secondary, so that the next step can move that secondary from vm1 to vm0.

gnt-instance migrate -f <name-of-vm-instance>

This swaps primary and secondary: vm2 becomes the primary and vm1 becomes the secondary. It is a live migration; the VM keeps running.

vm0.sea.rg.net:/root# gnt-instance migrate -f zzyzx.sigpipe.org
Sun Apr 24 00:13:07 2016 Migrating instance zzyzx.sigpipe.org
Sun Apr 24 00:13:07 2016 * checking disk consistency between source and target
Sun Apr 24 00:13:15 2016 * switching node vm2.sea.rg.net to secondary mode
Sun Apr 24 00:13:18 2016 * changing into standalone mode
Sun Apr 24 00:13:21 2016 * changing disks into dual-master mode
Sun Apr 24 00:13:30 2016 * wait until resync is done
Sun Apr 24 00:13:33 2016 * preparing vm2.sea.rg.net to accept the instance
Sun Apr 24 00:13:36 2016 * migrating instance to vm2.sea.rg.net
Sun Apr 24 00:13:38 2016 * starting memory transfer
Sun Apr 24 00:15:31 2016 * memory transfer complete
Sun Apr 24 00:15:34 2016 * switching node vm1.sea.rg.net to secondary mode
Sun Apr 24 00:15:35 2016 * wait until resync is done
Sun Apr 24 00:15:39 2016 * changing into standalone mode
Sun Apr 24 00:15:43 2016 * changing disks into single-master mode
Sun Apr 24 00:15:49 2016 * wait until resync is done
Sun Apr 24 00:15:54 2016 * done

This results in

Instance                     Primary_node   Secondary_Nodes  MaxMem DiskUsage
zzyzx.sigpipe.org            vm2.sea.rg.net vm1.sea.rg.net     2.0G    100.1G
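The instance tables on this page are `gnt-instance list` output. To check the roles mechanically after each step, you could use something like the hypothetical `check_nodes` helper below. It assumes a DRBD instance with exactly one secondary. It relies on the `--no-headers` option and the `pnode` and `snodes` output fields of `gnt-instance list`.

```shell
#!/usr/bin/env bash
# check_nodes: hypothetical helper that succeeds only if INSTANCE currently
# has the expected primary and secondary node.
# Usage: check_nodes INSTANCE EXPECTED_PRIMARY EXPECTED_SECONDARY
check_nodes() {
    local inst=$1 want_pri=$2 want_sec=$3 line pri sec
    # One line of output: primary node, then the secondary node.
    line=$(gnt-instance list --no-headers -o pnode,snodes "$inst") || return 1
    read -r pri sec <<< "$line"
    if [ "$pri" = "$want_pri" ] && [ "$sec" = "$want_sec" ]; then
        echo "ok: $inst primary=$pri secondary=$sec"
    else
        echo "unexpected: $inst primary=$pri secondary=$sec" >&2
        return 1
    fi
}

# After the first migrate above, one would expect this to succeed:
#   check_nodes zzyzx.sigpipe.org vm2.sea.rg.net vm1.sea.rg.net
```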

Now move the DRBD secondary to vm0

We want the secondary on vm0 so that the last step can swap it with the primary. So we replace the DRBD secondary on vm1 with a new one on vm0.

gnt-instance replace-disks -n <new-secondary-node> <name-of-vm-instance>

This moves only the DRBD secondary. The instance keeps running on its primary, vm2, the whole time. But it is very slow, because DRBD has to resynchronize the entire disk onto the new secondary.

vm0.sea.rg.net:/root# gnt-instance replace-disks -n vm0 zzyzx.sigpipe.org
Sun Apr 24 00:17:28 2016 Replacing disk(s) 0 for instance 'zzyzx.sigpipe.org'
Sun Apr 24 00:17:28 2016 Current primary node: vm2.sea.rg.net
Sun Apr 24 00:17:28 2016 Current seconary node: vm1.sea.rg.net
Sun Apr 24 00:17:28 2016 STEP 1/6 Check device existence
Sun Apr 24 00:17:28 2016  - INFO: Checking disk/0 on vm2.sea.rg.net
Sun Apr 24 00:17:31 2016  - INFO: Checking volume groups
Sun Apr 24 00:17:32 2016 STEP 2/6 Check peer consistency
Sun Apr 24 00:17:32 2016  - INFO: Checking disk/0 consistency on node vm2.sea.rg.net
Sun Apr 24 00:17:38 2016 STEP 3/6 Allocate new storage
Sun Apr 24 00:17:38 2016  - INFO: Adding new local storage on vm0.sea.rg.net for disk/0
Sun Apr 24 00:17:39 2016 STEP 4/6 Changing drbd configuration
Sun Apr 24 00:17:39 2016  - INFO: activating a new drbd on vm0.sea.rg.net for disk/0
Sun Apr 24 00:17:46 2016  - INFO: Shutting down drbd for disk/0 on old node
Sun Apr 24 00:17:48 2016  - INFO: Detaching primary drbds from the network (=> standalone)
Sun Apr 24 00:17:51 2016  - INFO: Updating instance configuration
Sun Apr 24 00:17:51 2016  - INFO: Attaching primary drbds to new secondary (standalone => connected)
Sun Apr 24 00:17:54 2016 STEP 5/6 Sync devices
Sun Apr 24 00:17:55 2016  - INFO: Waiting for instance zzyzx.sigpipe.org to sync disks
Sun Apr 24 00:17:58 2016  - INFO: - device disk/0:  0.10% done, 40m 25s remaining (estimated)
Sun Apr 24 00:19:02 2016  - INFO: - device disk/0:  0.70% done, 2h 35m 52s remaining (estimated)
...
Sun Apr 24 01:59:00 2016  - INFO: - device disk/0: 99.40% done, 1m 27s remaining (estimated)
Sun Apr 24 02:00:03 2016  - INFO: Instance zzyzx.sigpipe.org's disks are in sync
Sun Apr 24 02:00:05 2016 STEP 6/6 Removing old storage
Sun Apr 24 02:00:05 2016  - INFO: Remove logical volumes for 0

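We all love time-to-completion estimates. This one started at 40 minutes and climbed past two and a half hours. The sync actually took about an hour and 42 minutes. Be patient, very patient; I said it would be slow. Maybe take a nap.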
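The log also gives a rough resync rate, which helps when planning other moves on the same hardware. Here is a sketch of the arithmetic. It assumes the `G` in Ganeti's output means GiB. It also treats the whole 100.1G DiskUsage as copied data, although that figure includes a little DRBD metadata. STEP 5/6 began at 00:17:54 and the disks were in sync at 02:00:03, which is 6129 seconds. `to_seconds` and `sync_rate_mib_s` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# to_seconds: HH:MM:SS -> seconds since midnight (10# avoids octal for "08").
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# sync_rate_mib_s: average rate in MiB/s for SIZE_GIB copied between two
# same-day timestamps (does not handle a sync that crosses midnight).
# Usage: sync_rate_mib_s SIZE_GIB START_HH:MM:SS END_HH:MM:SS
sync_rate_mib_s() {
    local secs=$(( $(to_seconds "$3") - $(to_seconds "$2") ))
    awk -v gib="$1" -v secs="$secs" 'BEGIN { printf "%.1f\n", gib * 1024 / secs }'
}

# Numbers from the replace-disks log above:
#   sync_rate_mib_s 100.1 00:17:54 02:00:03    # prints 16.7
```

That is about 16.7 MiB/s, or roughly a gigabyte a minute. Your network, disks, and DRBD sync settings will likely give a different figure.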

The result is

Instance                     Primary_node   Secondary_Nodes  MaxMem DiskUsage
zzyzx.sigpipe.org            vm2.sea.rg.net vm0.sea.rg.net     2.0G    100.1G

Finally, swap primary and secondary

We use migrate to swap primary and secondary, as we did in the first step.

gnt-instance migrate -f <name-of-vm-instance>

Which should leave us with

Instance                     Primary_node   Secondary_Nodes  MaxMem DiskUsage
zzyzx.sigpipe.org            vm0.sea.rg.net vm2.sea.rg.net     2.0G    100.1G

Except sometimes it does not work out

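It turns out the cluster had version skew in KVM. Ganeti warned that the source node, vm2, ran hypervisor version 2.1.2 while the target, vm0, ran 1.1.2. The older KVM on vm0 then failed to start the incoming instance, listing only machine types up to pc-1.1.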

vm0.sea.rg.net:/root# gnt-instance migrate -f zzyzx.sigpipe.org 
Sun Apr 24 02:02:28 2016 Migrating instance zzyzx.sigpipe.org
Sun Apr 24 02:02:28 2016 * warning: hypervisor version mismatch between source ([2, 1, 2]) and target ([1, 1, 2]) node
Sun Apr 24 02:02:28 2016 * checking disk consistency between source and target
Sun Apr 24 02:02:29 2016 * switching node vm0.sea.rg.net to secondary mode
Sun Apr 24 02:02:29 2016 * changing into standalone mode
Sun Apr 24 02:02:31 2016 * changing disks into dual-master mode
Sun Apr 24 02:02:39 2016 * wait until resync is done
Sun Apr 24 02:02:42 2016 * preparing vm0.sea.rg.net to accept the instance
Sun Apr 24 02:02:43 2016 Pre-migration failed, aborting
Sun Apr 24 02:02:44 2016 * switching node vm0.sea.rg.net to secondary mode
Sun Apr 24 02:02:44 2016 * changing into standalone mode
Sun Apr 24 02:02:48 2016 * changing disks into single-master mode
Sun Apr 24 02:02:54 2016 * wait until resync is done
Failure: command execution error:
Could not pre-migrate instance zzyzx.sigpipe.org: Failed to accept instance: Failed to start instance zzyzx.sigpipe.org: exited with exit code 1 (Supported machines are:
pc                   Standard PC (alias of pc-1.1)
pc-1.1               Standard PC (default)
pc-1.0               Standard PC
pc-0.15              Standard PC
pc-0.14              Standard PC
pc-0.13              Standard PC
pc-0.12              Standard PC
pc-0.11              Standard PC, qemu 0.11
pc-0.10              Standard PC, qemu 0.10
isapc                ISA-only PC
)
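Ganeti did flag the mismatch ("hypervisor version mismatch"), but only as a warning. The migration went ahead until vm0 failed to start the instance. A pre-flight check could catch the skew before any move starts. The sketch below assumes the nodes are reachable as root over ssh. It also assumes the KVM binary is called `kvm` on each node; adjust that for your distribution. `qemu_version`, `versions_agree`, and `check_kvm_versions` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# qemu_version: extract "X.Y.Z" from `kvm --version` output on stdin.
qemu_version() {
    sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# versions_agree: reads "node version" lines; succeeds only if every node
# reported a version and all versions are identical.
versions_agree() {
    awk '$2 == "" { bad = 1 } !seen[$2]++ { c++ } END { exit (bad || c != 1) }'
}

# check_kvm_versions: print each node's KVM version, then succeed only if
# they all match (assumes root ssh to every node and a "kvm" binary).
check_kvm_versions() {
    local node report
    report=$(
        for node in $(gnt-node list --no-headers -o name); do
            printf '%s %s\n' "$node" "$(ssh -n "root@$node" kvm --version | qemu_version)"
        done
    )
    printf '%s\n' "$report"                   # show what each node runs
    printf '%s\n' "$report" | versions_agree  # non-zero exit on any skew
}
```

On this cluster, `check_kvm_versions` would presumably have reported 1.1.2 on vm0 and 2.1.2 on vm2, and failed.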

The fallback is to use failover instead of migrate. Unfortunately, failover shuts the VM down and boots it again on the other node, so there is a brief outage; but oh well.

gnt-instance failover <name-of-vm-instance>

And this worked

vm0.sea.rg.net:/root# gnt-instance failover zzyzx.sigpipe.org 
Failover will happen to image zzyzx.sigpipe.org. This requires a
shutdown of the instance. Continue?
y/[n]/?: y
Sun Apr 24 02:08:45 2016 Failover instance zzyzx.sigpipe.org
Sun Apr 24 02:08:45 2016 * checking disk consistency between source and target
Sun Apr 24 02:08:45 2016 * shutting down instance on source node
Sun Apr 24 02:08:56 2016 * deactivating the instance's disks on source node
Sun Apr 24 02:09:03 2016 * activating the instance's disks on target node vm0.sea.rg.net
Sun Apr 24 02:09:11 2016 * starting the instance on the target node vm0.sea.rg.net

Leaving us with

Instance                     Primary_node   Secondary_Nodes  MaxMem DiskUsage
zzyzx.sigpipe.org            vm0.sea.rg.net vm2.sea.rg.net     2.0G    100.1G
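In hindsight, the last step could be scripted: try the live migration first, and fall back to failover only if it fails. Here is a minimal sketch with a hypothetical `swap_primary` helper. It deliberately calls `gnt-instance failover` without `-f`, so you still get the shutdown confirmation prompt. It also assumes the failed migration was rolled back cleanly, as it was above. If it was not, clean that up first, for example with `gnt-instance migrate --cleanup`, before failing over.

```shell
#!/usr/bin/env bash
# swap_primary: hypothetical helper that swaps primary and secondary of a
# DRBD instance, preferring a live migration and falling back to failover.
# Usage: swap_primary INSTANCE
swap_primary() {
    local inst=$1
    # Live migration: no downtime if it succeeds.
    if gnt-instance migrate -f "$inst"; then
        return 0
    fi
    echo "live migration of $inst failed; trying failover (instance will restart)" >&2
    # No -f here: gnt-instance still asks before shutting the instance down.
    gnt-instance failover "$inst"
}
```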

And the cluster a little more balanced than at the start

vm0.sea.rg.net:/root# gnt-node list
Node           DTotal DFree MTotal MNode MFree Pinst Sinst
vm0.sea.rg.net   3.6T  2.5T  31.5G  4.5G 28.6G     4     1
vm1.sea.rg.net   5.9T  2.1T  70.9G 42.1G 50.2G    11     8
vm2.sea.rg.net   5.5T  1.3T  63.0G 28.5G 47.4G     9    13
Last modified on Apr 24, 2016, 2:12:52 AM