Upgrading from TrueNAS Core to TrueNAS SCALE

So, as part of my work to retire an ancient (20-year-old) server which I have been using as an NFS server, among other things, I had one major task before me… fix things so that my NAS server could be used for what I intended. I had been having some issues around 8 years ago when I first tried this, and I don’t remember all of them, since work kept me from attacking the problem at the time. With my NAS software, TrueNAS Core (back then, it was still called FreeNAS), being essentially EOL in favor of a new version with a different name, I decided to attack the upgrade to TrueNAS SCALE this week. I say “this week” because I wanted to be sure to back up all the data I had only on my NAS server and nowhere else, and when you are backing up multiple terabytes of data, even over gigabit ethernet, that takes quite a while. Had the upgrade failed, I was just going to skip the fancy NAS software with its ZFS filesystem and go for a regular Linux install, making it an NFS server initially and adding other services as I went along.

My first task was to try to reproduce some of the issues I had been having. One of the biggest was that while my ancient NFS server worked without issues, when trying to use my NAS server I was being prompted for a password, since my SSH keys were not being used. This is a huge issue for me, because I am always connecting between all my hosts (over 20 of them), and it goes from being just a nuisance to a major PITA. Back then, I suspect I had written it off as an SELinux issue, since around that time I was going all in on the idea of running it in enforcing mode rather than permissive mode. So, while reproducing it, I did a bit of digging (with far better knowledge of SELinux, sssd and idmapd), and found that back when I first had this problem, my user and group IDs were not being mapped correctly. Today, with debugging turned on, I got the critical clues. The two key lines were:

Oct 19 10:36:29 devhost kernel: NFS: v4 server nas.xyzzy.ka8zrt.com does not accept raw uid/gids. Reenabling the idmapper.
Oct 19 10:36:24 devhost nfsidmap[1840]: nss_name_to_gid: name  'cinnion@xyzzy.ka8zrt.com' does not map into domain 'ka8zrt.com'

This combination gave me the critical clue… idmapd might need tweaking. Sure enough, changing the domain name in /etc/idmapd.conf fixed that issue, and I am preparing to push that change out to all my other hosts. More on that in a bit.
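For reference, the knob in question lives in the [General] section of /etc/idmapd.conf. A minimal sketch of the gist of the change, going by the log lines above (your exact value will depend on how your NFSv4 domain is set up):

[General]
# This must agree with the NFSv4 domain the server expects; per the log
# above, mine was set to the parent domain (ka8zrt.com) instead.
Domain = xyzzy.ka8zrt.com

After editing it, clearing the client’s cached mappings with nfsidmap -c (run as root), or simply rebooting, should pick up the new value.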

So, to do the upgrade, I followed the migration guide and used the dropdown in the TrueNAS Core UI to select the latest stable TrueNAS SCALE release as the update train. I then told it to download and apply the update, and about 30 minutes later… voilà (I love autocorrect for helping me add that accented character!). And here I noticed the first of the minor gotchas… when trying to get at my home directory, I was getting an “Unknown error 521”. It turns out that the host I was using to do my testing had stale NFS information. Easy enough: just reboot to clear that mount and start fresh, and sure enough, it worked. (Sorry. No, I did not try just restarting the automounter or such; it is just a development server I use.)
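(For the curious: had a reboot been inconvenient, something along these lines would likely have cleared it. Untested on my end, and the mount point here is a guess:)

umount -l /net/nas/home     # lazy-unmount the stale NFS mount (path is a guess)
systemctl restart autofs    # then restart the automounter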

Now, a second glitch I ran into was with Ansible. I use a Redis cache for storing facts about all my hosts, and it was confused, trying to use the path to Python from the FreeBSD install of TrueNAS Core instead of the one for the Debian-based TrueNAS SCALE. Easy fix…

[root@resune ansible]# redis-cli
127.0.0.1:6379> del ansible_factsnas
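For context, the cache itself is wired up in ansible.cfg; a minimal sketch of what that looks like (the timeout value here is illustrative, not my exact setting):

[defaults]
gathering = smart
fact_caching = redis
fact_caching_connection = localhost:6379:0   # host:port:db
fact_caching_timeout = 86400                 # seconds cached facts stay valid

The key deleted above is just the plugin’s ansible_facts prefix with the hostname appended, which would explain why it reads ansible_factsnas for the host nas.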

And after that, my gather-facts playbook (full playbook can be seen here) hit a second error. While it detects the server as a libvirt host, libvirt is apparently not running. Another easy fix… I just changed

    - name: Get guest virtual machines
      virt:
        command: list_vms
      register: list_vms
      when:
        - ansible_facts['virtualization_role'] == "host"
        - inventory_hostname != 'planys'

    - name: Save guest virtual hosts as cacheable fact
      set_fact:
        cacheable: yes
        guest_vms: "{{ list_vms.list_vms }}"
      when:
        - ansible_facts['virtualization_role'] == "host"
        - inventory_hostname != 'planys'

to

    - name: Get guest virtual machines
      virt:
        command: list_vms
      register: list_vms
      when:
        - ansible_facts['virtualization_role'] == "host"
        - inventory_hostname != 'planys'
        - inventory_hostname != 'nas'

    - name: Save guest virtual hosts as cacheable fact
      set_fact:
        cacheable: yes
        guest_vms: "{{ list_vms.list_vms }}"
      when:
        - ansible_facts['virtualization_role'] == "host"
        - inventory_hostname != 'planys'
        - inventory_hostname != 'nas'

Notice the additional line at the end of each task? Easy enough, and if/when I decide to actually host virtual machines there, I just remove those lines. But I am not likely to do that, as this is not as beefy a server as the one on which I run my virtual machines, such as my web server; that one is dedicated to the task and currently has 4x the RAM and faster CPUs. And the point is driven home by the fact that the ZFS cache is taking around 75% of the RAM in this machine!
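As an aside, were the list of excluded hosts to keep growing, an inventory variable would probably be cleaner than hardcoding names in the when clauses. A sketch of that variant, where skip_vm_facts is a variable name I am making up for illustration:

    - name: Get guest virtual machines
      virt:
        command: list_vms
      register: list_vms
      when:
        - ansible_facts['virtualization_role'] == "host"
        - not (skip_vm_facts | default(false))

Hosts like nas and planys would then just set skip_vm_facts: true in their host_vars, and the playbook itself would never need touching again.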

Now, on to cleanup tasks, such as copying my home directories from the old machine. Sure to take some time…

PHP Upgrades (aka the joys of running an LTS operating system)

Well, earlier today, I got a reminder that I had not upgraded PHP. Indeed, unlike most of my installs, the virtual host running my WordPress sites was installed from a Live CD, and was running the dated PHP 5.4 version which CentOS/RHEL 7 ships as part of its base. That is one of the joys of running an operating system which comes with “long term support”, aka LTS.

When an OS such as CentOS/RHEL, or one of Ubuntu’s LTS releases, is going through the release process, the out-of-the-box repositories are pretty much set in stone as to which versions of given software packages are included, and they don’t always take into account things like how much longer software package X will be supported. So when the process started for RHEL 7.0 (from which CentOS 7 is compiled) in late 2013, ahead of the July 2014 release, they packaged things like PHP 5.4.16, Python 2.7.5 and other old packages into the release. Past a certain point in the release cycle, even a minor version upgrade (from say 5.4.15 to 5.4.16) is not picked up, because of all the testing which would need to be done to guarantee stability; they might backport certain security fixes, but no more until the next release. And then, through the entire 7.x lifecycle (or the lifecycle of, say, Ubuntu 10.04 LTS), it is pretty much a given that PHP will remain a 5.4.x release. For RHEL, the cutoff date was such that PHP 5.5 (released June 2013), much less PHP 5.6 (released August 2014), never made it into the core CentOS/RHEL repositories, so 5.4 is what you still get by default when you install the package called “php”. The result is that RHEL (and thus CentOS) were running versions which are no longer supported… indeed, 5.6 patches are no longer being released to be backported either, since support for 5.6 ended this past December. And for Python, it is much the same story, with releases through 2.7.16 now available upstream.

Why are LTS releases out there? Because sometimes even the changes going from, say, 2.7.5 to 2.7.6 can cause issues for software vendors trying to support their software, and when you start talking about going from a 2.6.x to a 2.7.x release, or worse, a 2.y.x to a 3.y.x release, the odds of that happening increase, sometimes significantly. Indeed, changes like that often force downline vendors through their own release cycles, which can be quite expensive. Ultimately, you have a battle with multiple sides trying to come to an accord which balances things like finances, security, new features and more, where the costs and risks can easily run from $100K up into the millions, depending on the application, the number of installs, etc. (When I was at Bell Labs Messaging, a simple patch for an OS bug might start with $5K or more of testing by myself before it even hit our QA team, where, bundled with other software patches, a testing cycle might easily run another $100K, all for a new release of Audix or Conversant… and until then, it was only installed manually at very select customers who had run into the problem and could not wait.)

Now, depending on the operating system, there are options to help with this for those who are willing to expend some additional effort on the upgrades, and on any testing of their environments. For PHP, this involves installing from either the IUS Repository (“IUS” = “Inline with Upstream Stable”) or Remi’s RPM Repository (run by Remi Collet, a PHP contributor who also maintains many of the RPM packages for the Fedora/RHEL/CentOS distros). The two repositories take slightly different approaches, and traditionally, different versions of PHP could not be installed side-by-side.
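To make that concrete, the IUS route on CentOS 7 boils down to enabling the repository and swapping in the versioned packages. A rough sketch from memory; the release RPM URL and exact package names are assumptions here, so check the repository documentation:

# EPEL is a prerequisite for IUS
yum install -y epel-release
yum install -y https://repo.ius.io/ius-release-el7.rpm

# Swap the stock 5.4 packages for the IUS 7.2 builds
yum remove -y php php-cli php-common
yum install -y mod_php72u php72u-cli php72u-mysqlnd php72u-gd php72u-xml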

For myself… I actually used the IUS repository… I dislike how the Remi versions of the RPMs install everything under /opt/remi/... instead of /usr/... And while IUS does not have PHP 7.3 yet, it does have PHP 7.2, and thankfully, the upgrade appears to have gone relatively smoothly. I also tend to prefer everything being installed via RPMs… why should I have to keep track of what was installed via RPM as well as via PHP’s pear/pecl or Python’s pip utilities? I am coming to use those utilities more, as so much is not available as RPMs… but it is a layer of nuisance I would rather not have to deal with. Unfortunately, the PHP ssh2 module required me to install it via pecl, which meant additional development packages needed to be installed right now. Sometime soon, I hope to start looking at re-packaging some of these into RPMs myself. I would far rather have my own repo (I actually have two per distribution/release which I use: one for packages I figure to share, another for packages which contain things I consider security sensitive and will not). But for now, things seem to be good. 🙂
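For the ssh2 module, the dance looked roughly like this (a sketch; the exact -devel packages and the version to pin are from memory, and the config filename is my own choice):

# Development packages needed to compile the extension
yum install -y gcc libssh2-devel php72u-devel php72u-pear

# The PHP 7 compatible ssh2 release was still flagged beta at the time,
# so pecl needed the version spelled out explicitly
pecl install ssh2-1.1.2
echo "extension=ssh2.so" > /etc/php.d/40-ssh2.ini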

Will I ever stop using an LTS distro/release? No… I consider Fedora’s release cycle to have been enough of a pain in my ass that, outside of my workstation and perhaps a development VM, all of my servers will be of the LTS variety. After all, with the changes which occurred in Fedora 20’s installer, my workstation remained at FC19 until just a few months ago, and in a month or so, I will likely just say “reinstall my workstation” to cobbler, reboot the workstation, and get up the next morning to find it all shiny and new. And when RHEL 8 is released and CentOS 8 comes out, I will likely do the same with many of my servers, as I am currently doing some testing of the RHEL 8 beta release they made a while back. Now, if only WinBlows were as easy…
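(For anyone who has not seen that cobbler trick: flipping the netboot flag on a system record queues up a fresh kickstart install on the machine’s next PXE boot. Roughly like so, with the system name being hypothetical:)

# Mark the system for reinstall on its next PXE boot
cobbler system edit --name=workstation --netboot-enabled=1
# ...then reboot the box and let kickstart take it from there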