Upgrading a Cluster from Debian7 to Debian8 and Ganeti from 2.11 to 2.12

Being sure we're ready to Upgrade

the cluster before i broke it

vm1.sql1.ietf.org:/root# ./do-stat 
Node              DTotal DFree MTotal MNode  MFree Pinst Sinst
vm1.sql1.ietf.org   7.3T  6.5T 110.3G 20.0G  94.0G     8     0
vm2.sql1.ietf.org   7.3T  7.1T 126.1G  769M 125.4G     0     3
vm3.sql1.ietf.org   7.3T  6.9T 126.1G  766M 125.2G     0     4

Instance                 Primary_node      Secondary_Nodes gMaxMem DiskUsage
hackathon.sql1.ietf.org  vm1.sql1.ietf.org                    4.0G      2.1G
management.sql1.ietf.org vm1.sql1.ietf.org vm2.sql1.ietf.or  16.0G     48.1G
netdot.sql1.ietf.org     vm1.sql1.ietf.org vm3.sql1.ietf.or   4.0G     40.1G
noc.ietf.org             vm1.sql1.ietf.org vm3.sql1.ietf.or   4.0G    256.1G
permatrac.sql1.ietf.org  vm1.sql1.ietf.org vm2.sql1.ietf.or   4.0G     16.1G
services-1.sql1.ietf.org vm1.sql1.ietf.org vm3.sql1.ietf.or   4.0G      8.1G
services-2.sql1.ietf.org vm1.sql1.ietf.org vm2.sql1.ietf.or   4.0G      8.1G
trac.sql1.ietf.org       vm1.sql1.ietf.org vm3.sql1.ietf.or   4.0G      8.1G

and it verified clean

vm1.sql1.ietf.org:/root# gnt-cluster verify
Submitted jobs 95884, 95885
Waiting for job 95884 ...
Fri Jun 17 23:35:25 2016 * Verifying cluster config
Fri Jun 17 23:35:25 2016 * Verifying cluster certificate files
Fri Jun 17 23:35:25 2016 * Verifying hypervisor parameters
Fri Jun 17 23:35:25 2016 * Verifying all nodes belong to an existing group
Waiting for job 95885 ...
Fri Jun 17 23:35:25 2016 * Verifying group 'default'
Fri Jun 17 23:35:25 2016 * Gathering data (3 nodes)
Fri Jun 17 23:35:26 2016 * Gathering disk information (3 nodes)
Fri Jun 17 23:35:26 2016 * Verifying configuration file consistency
Fri Jun 17 23:35:26 2016 * Verifying node status
Fri Jun 17 23:35:26 2016 * Verifying instance status
Fri Jun 17 23:35:26 2016 * Verifying orphan volumes
Fri Jun 17 23:35:27 2016 * Verifying N+1 Memory redundancy
Fri Jun 17 23:35:27 2016 * Other Notes
Fri Jun 17 23:35:27 2016   - NOTICE: 1 non-redundant instance(s) found.
Fri Jun 17 23:35:27 2016 * Hooks Results

i did apt-get update and upgrade to get wheezy current. i did not reboot.

Prepare to Upgrade Debian 7 to Debian 8

i s/wheezy/jessie/g in /etc/apt/sources.list and distributed it to all servers with copyfile

deb http://ftp.us.debian.org/debian/ jessie main contrib non-free
deb-src http://ftp.us.debian.org/debian/ jessie main contrib non-free
deb http://security.debian.org/ jessie/updates main
deb-src http://security.debian.org/ jessie/updates main
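
the substitution can be sketched as a small shell function. this is an illustration, not a record of the exact commands used: switch_release is a hypothetical helper name, GNU sed is assumed, and reading "copyfile" as gnt-cluster copyfile is an assumption.

```shell
# switch_release FILE OLD NEW
# rewrite every occurrence of the OLD release name to NEW in an apt
# sources file, keeping the original next to it as FILE.bak (GNU sed).
switch_release() {
    sed -i.bak "s/$2/$3/g" "$1"
}

# example, as root on the master node:
#   switch_release /etc/apt/sources.list wheezy jessie
#   gnt-cluster copyfile /etc/apt/sources.list   # assumed meaning of "copyfile"
```

keeping the .bak copy makes it easy to diff the result before distributing it.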

i pushed the hack to force apt to use v4 only, as i had failures over v6 with the green cluster.

    - name: copy 99force-ipv4
      copy: src=files/linux/99force-ipv4
              dest=/etc/apt/apt.conf.d/99force-ipv4
              owner=root 
              group=root
              mode=0644
# cat files/linux/99force-ipv4
Acquire::ForceIPv4 "true";

i shut down the VMs

gnt-instance shutdown --all

Upgrade Debian 7 to Debian 8

i ran

apt-get update
apt-get upgrade

on all three nodes

followed by

apt-get dist-upgrade

i chose the new maintainer version of /etc/lvm/lvm.conf

i rebooted vm1. it worked. so i rebooted vm2 and vm3.

ganeti came back up with the vms running, so i did another

gnt-instance shutdown --all

Preparing to Upgrade Ganeti

i did another

# gnt-cluster verify
Submitted jobs 95915, 95916
Waiting for job 95915 ...
Sat Jun 18 00:50:15 2016 * Verifying cluster config
Sat Jun 18 00:50:15 2016 * Verifying cluster certificate files
Sat Jun 18 00:50:15 2016 * Verifying hypervisor parameters
Sat Jun 18 00:50:15 2016 * Verifying all nodes belong to an existing group
Waiting for job 95916 ...
Sat Jun 18 00:50:16 2016 * Verifying group 'default'
Sat Jun 18 00:50:16 2016 * Gathering data (3 nodes)
Sat Jun 18 00:50:18 2016 * Gathering disk information (3 nodes)
Sat Jun 18 00:50:18 2016 * Verifying configuration file consistency
Sat Jun 18 00:50:18 2016 * Verifying node status
Sat Jun 18 00:50:18 2016 * Verifying instance status
Sat Jun 18 00:50:19 2016 * Verifying orphan volumes
Sat Jun 18 00:50:19 2016 * Verifying N+1 Memory redundancy
Sat Jun 18 00:50:19 2016 * Other Notes
Sat Jun 18 00:50:19 2016   - NOTICE: 1 non-redundant instance(s) found.
Sat Jun 18 00:50:19 2016 * Hooks Results

i installed 2.12 on all nodes

# apt-get install ganeti

which was a no-op, as the dist-upgrade had already installed it

i checked that 2.12 was ready and waiting on all nodes

# dpkg --list |grep ganeti
ii  ganeti                         2.12.4-1+deb8u3                all          cluster virtualization manager
ii  ganeti-2.11                    2.11.6-1~bpo70+1               all          cluster virtualization manager - Python components
ii  ganeti-2.12                    2.12.4-1+deb8u3                all          cluster virtualization manager - Python components
ii  ganeti-haskell-2.11            2.11.6-1~bpo70+1               amd64        cluster virtualization manager - Haskell components
ii  ganeti-haskell-2.12            2.12.4-1+deb8u3                amd64        cluster virtualization manager - Haskell components
ii  ganeti-htools-2.11             2.11.6-1~bpo70+1               amd64        cluster virtualization manager - tools for Ganeti 2.11
ii  ganeti-htools-2.12             2.12.4-1+deb8u3                amd64        cluster virtualization manager - tools for Ganeti 2.12
ii  ganeti-instance-debootstrap    0.14-2                         all          debootstrap-based instance OS definition for ganeti
ii  ganeti-instance-image          0.6-1+grnet17                  all          image based instance OS definition for ganeti
ii  ganeti-os-noop                 0.1-3                          all          A dummy no-op OS provider for ganeti
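
to cut that listing down to the per-version packages, a filter along these lines could be used. it is a sketch only: versioned_ganeti is a hypothetical name, and it assumes the usual dpkg --list layout of state, package name, version in the first three columns.

```shell
# versioned_ganeti: read `dpkg --list` output on stdin and print
# "package version" for each fully installed (state "ii") package whose
# name is ganeti-<major>.<minor>, i.e. the per-version Python components.
versioned_ganeti() {
    awk '$1 == "ii" && $2 ~ /^ganeti-[0-9]+\.[0-9]+$/ { print $2, $3 }'
}

# usage: dpkg --list | versioned_ganeti
```

both ganeti-2.11 and ganeti-2.12 should show up on every node before the cluster upgrade is started.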

and that i was still really running 2.11

# ls -l /etc/ganeti/share
lrwxrwxrwx 1 root root 22 Jul 22  2015 /etc/ganeti/share -> /usr/share/ganeti/2.11/
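
the same check can be scripted by reading the symlink target. a minimal sketch, where active_ganeti_version is a hypothetical helper name and the only assumption is that the link points at a directory named after the version, as in the listing above:

```shell
# active_ganeti_version [LINK]
# print the last path component of the directory the share symlink
# points at. LINK defaults to /etc/ganeti/share, the path shown above.
active_ganeti_version() {
    basename "$(readlink "${1:-/etc/ganeti/share}")"
}

# usage: active_ganeti_version
```

for the link shown above it prints 2.11.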

Do the Ganeti Upgrade

and then, the moment we had all been awaiting

vm1.sql1.ietf.org:/home/randy# gnt-cluster upgrade --to 2.12
Draining queue
Pausing the watcher for one hour.
Stopping daemons on master node.
Stopping daemons everywhere.
Backing up configuration as /var/backups/ganeti1466213017.tar
Switching to version 2.12 on all nodes
Upgrading configuration
Ensuring directories everywhere.
Starting daemons everywhere.
Redistributing the configuration.
Restarting daemons everywhere.
Undraining the queue.
Running post-upgrade hooks
Unpasuing the watcher.
Verifying cluster.
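
the digits in the backup file name look like seconds since the unix epoch. assuming that is what they are, a sketch like this turns a backup name into a readable time, which helps when picking the right tarball for a rollback. backup_time is a hypothetical helper name, and GNU date is assumed.

```shell
# backup_time NAME
# print the UTC time encoded in a backup name such as ganeti1466213017.tar,
# assuming the digits are seconds since the Unix epoch (GNU date syntax).
backup_time() {
    secs=$(basename "$1" | sed 's/[^0-9]//g')
    date -u -d "@$secs" '+%Y-%m-%d %H:%M:%S'
}

# usage: backup_time /var/backups/ganeti1466213017.tar
```

for the name above it prints 2016-06-18 01:23:37, which sits between the verify runs logged at 00:50 and 01:25 if the nodes keep their clocks in UTC.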

Inspect Cluster and Restart VMs

but i wanted to see for myself

# gnt-cluster verify
Submitted jobs 95924, 95925
Waiting for job 95924 ...
Sat Jun 18 01:25:47 2016 * Verifying cluster config
Sat Jun 18 01:25:47 2016 * Verifying cluster certificate files
Sat Jun 18 01:25:47 2016 * Verifying hypervisor parameters
Sat Jun 18 01:25:47 2016 * Verifying all nodes belong to an existing group
Waiting for job 95925 ...
Sat Jun 18 01:25:49 2016 * Verifying group 'default'
Sat Jun 18 01:25:49 2016 * Gathering data (3 nodes)
Sat Jun 18 01:25:49 2016 * Gathering information about nodes (3 nodes)
Sat Jun 18 01:25:51 2016 * Gathering disk information (3 nodes)
Sat Jun 18 01:25:51 2016 * Verifying configuration file consistency
Sat Jun 18 01:25:51 2016 * Verifying node status
Sat Jun 18 01:25:51 2016 * Verifying instance status
Sat Jun 18 01:25:51 2016 * Verifying orphan volumes
Sat Jun 18 01:25:51 2016 * Verifying N+1 Memory redundancy
Sat Jun 18 01:25:51 2016 * Other Notes
Sat Jun 18 01:25:51 2016   - NOTICE: 1 non-redundant instance(s) found.
Sat Jun 18 01:25:52 2016 * Hooks Results

and confirmed which version we were running

# ls -l /etc/ganeti/share
lrwxrwxrwx 1 root root 22 Jul 22  2015 /etc/ganeti/share -> /usr/share/ganeti/2.11/

and finally

# gnt-instance start --all
Last modified on Aug 31, 2016, 8:49:44 PM