Notes on upgrading RHV 4.3 to RHV 4.4

Red Hat has recently published RHV 4.4. This release introduces major changes in the underlying operating system (a migration from RHEL7 to RHEL8 on both the hypervisors and the Engine / Self-Hosted Engine), plus a bunch of new features.

There are extensive notes on how to perform the upgrade, especially for Self-Hosted Engine deployments.

I upgraded a small 2-node lab environment and, besides the notes already mentioned in the docs above, I found the following points relevant:

Before you start

  • Understand the NIC naming differences between RHEL7 and RHEL8.
    • Your hypervisor NICs will probably be renamed.
  • Jot down your hypervisors' NIC-to-MAC-address mappings prior to attempting the upgrade.
    • This will make it easier to identify which NIC is which after installing RHEL8.
  • When using shared storage (FC), consider unmapping it while you reinstall each host, or ensure your kickstart does NOT clear the shared disks.
    • Otherwise this might lead to data loss!!
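The NIC-to-MAC inventory above can be captured with a quick loop over sysfs. A minimal sketch; run it on each hypervisor before the reinstall and keep the output file somewhere safe (the output path is just an example):

```shell
# Dump NIC-name-to-MAC mappings from sysfs so the NICs can be
# identified again after the RHEL8 reinstall renames them.
for nic in /sys/class/net/*; do
    printf '%s %s\n' "$(basename "$nic")" "$(cat "$nic/address")"
done | tee "/root/nic-to-mac-$(hostname -s).txt"
```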

Prerequisites

  • One spare hypervisor, freshly installed with RHEL8/RHVH8 and NOT added to the Manager.
  • One additional LUN / NFS share for the new SHE 4.4 deployment.

    • The installer does not upgrade the old SHE in-place, so a new LUN is required.
    • This eases the rollback, as the original SHE LUN is untouched.
  • Ensure the new hypervisor is configured to access all required networks prior to starting the upgrade.

    • IP configuration for the ovirtmgmt network (obvious).
    • IP configuration for any NFS/iSCSI networks, if required.
    • Shared FC storage, if required.
    • This is critical, as the restore process does not prompt you to configure/fix network settings when deploying the upgraded Manager.
  • Extra steps

    • Collect your RHV-M details:
      • IP address and netmask.
      • FQDN.
      • MAC address, if using DHCP.
      • Extra software and additional RPMs (e.g. AD/IdM/LDAP integration, etc.)
      • Existing /etc/hosts details, in case you use hosts instead of DNS (bad bad bad!!!).
      • The same for the hypervisors!
    • Optionally: mark your networks within the cluster as non-Required. This might be useful until BZ #1867198 is addressed.
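Most of the Manager details above can be collected in one go. A minimal sketch, assuming you run it on the current RHV-M host (the output path is just an example):

```shell
# Capture the Manager details needed again during the restore.
hostname -f                               # FQDN
ip -br addr show                          # IP addresses and netmasks
ip -br link show                          # MAC addresses (relevant if using DHCP)
cat /etc/hosts                            # static host entries, if any
rpm -qa | sort > /root/rhvm-rpm-list.txt  # baseline of installed RPMs
```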

Deploying and registering the hypervisors

The RHEL8/RHVH8 hosts can be deployed as usual with Foreman / Red Hat Satellite.

Ensure the hypervisors are registered and have access to the repositories as below:

RHEL8 Host repositories

POOLID=$(subscription-manager list --available --matches "Red Hat Virtualization" --pool-only | head -n 1)
subscription-manager attach --pool=$POOLID
subscription-manager repos \
    --disable='*' \
    --enable=rhel-8-for-x86_64-baseos-rpms \
    --enable=rhel-8-for-x86_64-appstream-rpms \
    --enable=rhv-4-mgmt-agent-for-rhel-8-x86_64-rpms \
    --enable=fast-datapath-for-rhel-8-x86_64-rpms \
    --enable=ansible-2.9-for-rhel-8-x86_64-rpms \
    --enable=advanced-virt-for-rhel-8-x86_64-rpms

yum module reset -y virt
yum module enable -y virt:8.2
systemctl enable --now firewalld
yum install -y rhevm-appliance ovirt-hosted-engine-setup

RHVH8 Host repositories

POOLID=$(subscription-manager list --available --matches "Red Hat Virtualization"  --pool-only | head -n 1)
subscription-manager attach --pool=$POOLID
subscription-manager repos \
    --disable='*' \
    --enable=rhvh-4-for-rhel-8-x86_64-rpms
systemctl enable --now firewalld
yum install -y rhevm-appliance

Powering off RHV 4.3 manager

  • Set the Manager in global maintenance mode.
  • OPTIONAL: Mark your networks within the cluster as non-Required. This might be useful until BZ #1867198 is addressed.
  • Stop the ovirt-engine service.
  • Back up the RHV 4.3 database and save the backup in a shared location.
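The backup itself is taken with the engine-backup tool shipped with the Manager. A minimal sketch; the file naming is just an example:

```shell
# Run on the RHV 4.3 Manager while the ovirt-engine service is stopped.
BACKUP="/root/engine-backup-$(hostname -s)-$(date +%Y%m%d_%H%M).tar.bz2"
engine-backup --mode=backup --scope=all --file="$BACKUP" --log="${BACKUP%.tar.bz2}.log"
# Then copy $BACKUP to shared storage, e.g.:
# scp "$BACKUP" backupserver:/srv/backups/
```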

Performing the RHV-M upgrade

  • Copy the database backup to the RHEL8 hypervisor.
  • Launch the restore process with hosted-engine --deploy --restore-from-file=backup.tar.bz2

The process has changed significantly in recent RHV releases; it now performs the new SHE rollout or restore in two phases:

  • Phase 1: the installer rolls out the new SHE on the hypervisor's local storage.

    • Gathers the Manager's FQDN and IP details.
    • Gathers other configuration.
  • Phase 2: migrate to shared storage.

    • If Phase 1 is successful, this phase gathers the shared storage details (LUN ID or NFS details).
    • Copies the bootstrap Manager onto the shared storage.
    • Configures ovirt-ha-broker and ovirt-ha-agent on the hypervisor to monitor the SHE and ensure it is started.

Phase 1 details

[root@rhevh2 rhev]# time  hosted-engine --deploy --restore-from-file=engine-backup-rhevm-20200807_1536.tar.bz2
[ INFO  ] Stage: Initializing
[ INFO  ] Stage: Environment setup
          During customization use CTRL-D to abort.
          Continuing will configure this host for serving as hypervisor and will create a local VM with a running engine.
          The provided engine backup file will be restored there,
          it's strongly recommended to run this tool on an host that wasn't part of the environment going to be restored.
          If a reference to this host is already contained in the backup file, it will be filtered out at restore time.
          The locally running engine will be used to configure a new storage domain and create a VM there.
          At the end the disk of the local VM will be moved to the shared storage.
          The old hosted-engine storage domain will be renamed, after checking that everything is correctly working you can manually remove it.
          Other hosted-engine hosts have to be reinstalled from the engine to update their hosted-engine configuration.
          Are you sure you want to continue? (Yes, No)[Yes]: yes
          It has been detected that this program is executed through an SSH connection without using tmux.
          Continuing with the installation may lead to broken installation if the network connection fails.
          It is highly recommended to abort the installation and run it inside a tmux session using command "tmux".
          Do you want to continue anyway? (Yes, No)[No]: yes
          Configuration files: 
          Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20200807155111-5blcva.log
          Version: otopi-1.9.2 (otopi-1.9.2-1.el8ev)
[ INFO  ] Stage: Environment packages setup
[ INFO  ] Stage: Programs detection
[ INFO  ] Stage: Environment setup (late)
[ INFO  ] Stage: Environment customization

          --== STORAGE CONFIGURATION ==--


          --== HOST NETWORK CONFIGURATION ==--

          Please indicate the gateway IP address [10.48.0.100]: 
[ INFO  ] TASK [ovirt.hosted_engine_setup : Execute just a specific set of steps]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Force facts gathering]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Detecting interface on existing management bridge]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Get all active network interfaces]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Filter bonds with bad naming]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Generate output list]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Collect interface types]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Check for Team devices]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Get list of Team devices]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Filter unsupported interface types]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Failed if only teaming devices are availible]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Validate selected bridge interface if management bridge does not exist]
[ INFO  ] skipping: [localhost]
         Please indicate a nic to set ovirtmgmt bridge on: (eth4.100, ens15.200) [ens15.200]: eth4.100
          Please specify which way the network connectivity should be checked (ping, dns, tcp, none) [dns]: 

          --== VM CONFIGURATION ==--

          Please enter the name of the datacenter where you want to deploy this hosted-engine host. Please note that if you are restoring a backup that contains info about other hosted-engine hosts,
          this value should exactly match the value used in the environment you are going to restore. [Default]: 
          Please enter the name of the cluster where you want to deploy this hosted-engine host. Please note that if you are restoring a backup that contains info about other hosted-engine hosts,
          this value should exactly match the value used in the environment you are going to restore. [Default]: 
          Renew engine CA on restore if needed? Please notice that if you choose Yes, all hosts will have to be later manually reinstalled from the engine. (Yes, No)[No]: 
          Pause the execution after adding this host to the engine?
          You will be able to iteratively connect to the restored engine in order to manually review and remediate its configuration before proceeding with the deployment:
          please ensure that all the datacenter hosts and storage domain are listed as up or in maintenance mode before proceeding.
          This is normally not required when restoring an up to date and coherent backup. (Yes, No)[No]: 
          If you want to deploy with a custom engine appliance image,
          please specify the path to the OVA archive you would like to use
          (leave it empty to skip, the setup will use rhvm-appliance rpm installing it if missing): 
          Please specify the number of virtual CPUs for the VM (Defaults to appliance OVF value): [4]: 
          Please specify the memory size of the VM in MB (Defaults to appliance OVF value): [16384]: 
[ INFO  ] Detecting host timezone.
          Please provide the FQDN you would like to use for the engine.
          Note: This will be the FQDN of the engine VM you are now going to launch,
          it should not point to the base host or to any other existing machine.
         Engine VM FQDN:  []: rhevm.example.org
          Please provide the domain name you would like to use for the engine appliance.
          Engine VM domain: [example.org]
          Enter root password that will be used for the engine appliance: 
          Confirm appliance root password: 
          Enter ssh public key for the root user that will be used for the engine appliance (leave it empty to skip): 
          Do you want to enable ssh access for the root user (yes, no, without-password) [yes]: 
          Do you want to apply a default OpenSCAP security profile (Yes, No) [No]: 
          You may specify a unicast MAC address for the VM or accept a randomly generated default [00:16:3e:03:ec:35]: 
          How should the engine VM network be configured (DHCP, Static)[DHCP]? static
          Please enter the IP address to be used for the engine VM []: 10.48.0.4
[ INFO  ] The engine VM will be configured to use 10.48.0.4/24
          Please provide a comma-separated list (max 3) of IP addresses of domain name servers for the engine VM
          Engine VM DNS (leave it empty to skip) [10.48.0.100]: 
          Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?
          Note: ensuring that this host could resolve the engine VM hostname is still up to you
          (Yes, No)[No] 

          --== HOSTED ENGINE CONFIGURATION ==--

          Please provide the name of the SMTP server through which we will send notifications [localhost]: 
          Please provide the TCP port number of the SMTP server [25]: 
          Please provide the email address from which notifications will be sent [root@localhost]: 
          Please provide a comma-separated list of email addresses which will get notifications [root@localhost]: 
          Enter engine admin password: 
          Confirm engine admin password: 
[ INFO  ] Stage: Setup validation
          Please provide the hostname of this host on the management network [rhevh2]: rhevh2.example.org
[ INFO  ] Stage: Transaction setup
[ INFO  ] Stage: Misc configuration (early)
[ INFO  ] Stage: Package installation
[ INFO  ] Stage: Misc configuration
[ INFO  ] Stage: Transaction commit
[ INFO  ] Stage: Closing up
[ INFO  ] Cleaning previous attempts
[ INFO  ] TASK [ovirt.hosted_engine_setup : Execute just a specific set of steps]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Force facts gathering]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Install oVirt Hosted Engine packages]
[ INFO  ] ok: [localhost]

[... snip ...]

The Manager is now deployed and, at a later stage, made available through the hypervisor:

[ INFO  ] TASK [ovirt.hosted_engine_setup : Adding new SSO_ALTERNATE_ENGINE_FQDNS line]
[ INFO  ] changed: [localhost -> rhevm.example.org]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Restart ovirt-engine service for changed OVF Update configuration and LibgfApi support]
[ INFO  ] changed: [localhost -> rhevm.example.org]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Mask cloud-init services to speed up future boot]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Wait for ovirt-engine service to start]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Open a port on firewalld]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Expose engine VM webui over a local port via ssh port forwarding]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Evaluate temporary bootstrap engine URL]
[ INFO  ] ok: [localhost]
[ INFO  ] The bootstrap engine is temporary accessible over https://rhevh2.example.org:6900/ovirt-engine/ 
[ INFO  ] TASK [ovirt.hosted_engine_setup : Detect VLAN ID]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Set Engine public key as authorized key without validating the TLS/SSL certificates]
[ INFO  ] changed: [localhost]
[...]
[ INFO  ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Always revoke the SSO token]
[ INFO  ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Obtain SSO token using username/password credentials]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Wait for the host to be up]

The bootstrap Manager is available at https://hypervisor.example.org:6900/ovirt-engine/ while the installer tries to add the current host under the Manager's management. (It waits for the host to reach the 'Up' state, which is why it is important to have all the storage and network prerequisites prepared and available.)

And to finish up:

[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Destroy local storage-pool localvm7imrhb7u]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Undefine local storage-pool localvm7imrhb7u]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Destroy local storage-pool {{ local_vm_disk_path.split('/')[5] }}]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Undefine local storage-pool {{ local_vm_disk_path.split('/')[5] }}]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20200807193709.conf'
[ INFO  ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ INFO  ] Hosted Engine successfully deployed
[ INFO  ] Other hosted-engine hosts have to be reinstalled in order to update their storage configuration. From the engine, host by host, please set maintenance mode and then click on reinstall button ensuring you choose DEPLOY in hosted engine tab.
[ INFO  ] Please note that the engine VM ssh keys have changed. Please remove the engine VM entry in ssh known_hosts on your clients.

real    45m1,768s
user    18m4,639s
sys     1m9,271s

After finishing the upgrade, it is also recommended to register the RHV-M virtual machine and upgrade it to the latest RPMs available in the Red Hat CDN.

Set the Hosted Engine in Global Maintenance mode and:

POOLID=$(subscription-manager list --available --matches "Red Hat Virtualization" --pool-only | head -n 1)
subscription-manager attach --pool=$POOLID

subscription-manager repos \
    --disable='*' \
    --enable=rhel-8-for-x86_64-baseos-rpms \
    --enable=rhel-8-for-x86_64-appstream-rpms \
    --enable=rhv-4.4-manager-for-rhel-8-x86_64-rpms \
    --enable=fast-datapath-for-rhel-8-x86_64-rpms \
    --enable=ansible-2.9-for-rhel-8-x86_64-rpms \
    --enable=jb-eap-7.3-for-rhel-8-x86_64-rpms

yum module -y enable pki-deps
yum module -y enable postgresql:12
yum module reset -y virt
yum module enable -y virt:8.2

Performing the upgrade:

systemctl stop ovirt-engine
yum upgrade -y
engine-setup --accept-defaults 

Rolling back a failed upgrade

A rollback can be performed if the following applies:

  • The deployment or upgrade to the new RHV 4.4 Manager was not successful.
  • No new instances have been created and no VMs have been altered (e.g. disks or NICs added, etc.). If a rollback occurs, those changes will be inconsistent with the old Manager DB status and potentially impossible to reconcile.

If so, the rollback can be performed by:

  • Powering off the new RHEL8/RHVH hypervisor and manager.
  • Powering the old Manager back on from the RHEL7 hosts, pointing them to the old SHE LUN and storage.

Finalising the upgrade

At this point you should have a working Manager at the regular https://FQDN/ovirt-engine/ address. Don't forget to clear cookies and the browser cache, as stale caches can lead to strange WebUI issues.

At this point you can continue reinstalling your hypervisors. I'd suggest:

  • Starting with your SHE hypervisors first. This ensures you regain SHE HA as soon as possible.
  • Then the non-SHE hypervisors.
  • Then finalise with the remaining tasks, such as upgrading Cluster and DC compatibility versions, rebooting the guest VMs, etc.

Happy hacking!