Monday, 19 September 2016

Running FreeBSD in Travis-CI

Note for geospatial focused readers: this article has little to do with geo, although it is applied to GDAL, but more with software virtualization, hacks, software archeology and the value of free software. Note for virtualization experts: I'm not one, so please bear with my approximate language and inaccuracies.

Travis-CI is a popular continuous integration platform that can easily be used with software projects hosted on GitHub. Travis-CI has a free offer for software with a public repository on GitHub. Travis-CI provides cloud instances running Linux or Mac OS X. To broaden the portability testing of GDAL, I wondered if it was somehow possible to run another operating system with Travis-CI, for example FreeBSD. A search led me to this question in their bug tracker, but the outcome seems to be that it is not possible, nor is it in their medium or long term plans.

One idea that quickly came to mind was to use the QEMU machine emulator, which can simulate full machines (CPU, peripherals, memory, etc.) of several hardware architectures (Intel x86, ARM, MIPS, SPARC, PowerPC, etc.). To run QEMU, you mostly need a virtual hard drive, i.e. a file that replicates the content of the hard disk of the virtual machine you want to run. I found here a small ready-to-use x86_64 image of FreeBSD 9.2, with one nice property: the ssh server and DHCP are automatically started, making it possible to connect to it remotely.

So starting with a Travis-CI Ubuntu Trusty (14.04) image, here are the steps to launch our FreeBSD guest:

sudo apt-get install qemu
tar xJvf freebsd.9-2.x86-64.20140103.raw.img.txz
qemu-system-x86_64 -daemonize -display none \
   freebsd.9-2.x86-64.20140103.raw.img \
   -m 1536 -smp 4 -net user,hostfwd=tcp::10022-:22 -net nic

The qemu invocation starts the virtual machine as a daemon without display, turns on networking and asks for TCP port 22 (the ssh port) of the guest (i.e. FreeBSD) to be accessible from the host (Linux Trusty) as port 10022.

To ssh into the VM, there's one slight inconvenience: ssh login requires a password. The root password for this VM is "password". But ssh is secured and doesn't accept the password being provided through files or piped in with "echo". I found that the sshpass utility was designed to overcome this in situations where security isn't really what matters. However, the version of sshpass bundled with Ubuntu Trusty didn't work with the corresponding ssh version (not surprisingly, since the authors of sshpass mention that it is full of assumptions about how ssh works, which can easily break when ssh changes). I found that the latest version, 1.06, did work.

With a few extra lines, we can now log in to our FreeBSD instance:

tar xzf sshpass-1.06.tar.gz
cd sshpass-1.06 && ./configure && make -j3 && cd ..
export MYSSH="sshpass-1.06/sshpass -p password ssh \
   -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \
    root@localhost -p10022" 
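
A freshly launched guest needs some time to boot before its ssh server answers, so the first connection attempt may well fail. Here is a minimal sketch of a retry loop, assuming that is the case in your setup; the `retry` helper, the attempt count and the delay are illustrative names and values, not part of the recipe above:

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the command succeeds.
        if "$@"; then
            return 0
        fi
        # Only wait if another attempt is going to follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    # Every attempt failed.
    return 1
}
```

It could be used as `retry 30 10 $MYSSH true`, with `$MYSSH` left unquoted so that the stored command line is split back into words, as in the other commands of this article.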

So now we can configure our FreeBSD VM a bit and use the 'pkg' package manager to install a few dependencies needed to build GDAL:

$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg bootstrap'
$MYSSH 'mkdir /etc/pkg'
sshpass-1.06/sshpass -p password scp \
   -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \
   -P 10022 FreeBSD.conf root@localhost:/etc/pkg/FreeBSD.conf
$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg install gmake'
$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg install python27'
$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg install py27-numpy'
$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg install sqlite3 curl'
$MYSSH 'env ASSUME_ALWAYS_YES=YES pkg install expat'
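
The repeated prefix in the commands above can be factored out. A small sketch, where `guest_pkg` is a hypothetical helper name and `MYSSH` is assumed to be set as shown earlier:

```shell
# Hypothetical helper: run one pkg subcommand in the guest without prompts.
# Relies on the MYSSH variable defined earlier; it is expanded unquoted on
# purpose so that the stored command line is split back into words.
guest_pkg() {
    $MYSSH "env ASSUME_ALWAYS_YES=YES pkg $*"
}
```

With it, the dependencies could be installed in a single call: `guest_pkg install gmake python27 py27-numpy sqlite3 curl expat`.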

Here we go: ./configure && make! That works, but 50 minutes later (the maximum length of a Travis-CI job), our job is killed with perhaps only 10% of the GDAL code base compiled. The reason is that we used the pure software emulation mode of QEMU, which involves disassembling the code to be run on the fly and re-assembling it for the host. QEMU can for example emulate an ARM guest on an Intel host, and vice versa, and in this mode there are no real shortcuts when the guest and host architectures are the same. So your guest can typically run 10 times slower than it would on a real machine with its native architecture.

Actually, that's not quite true: with the addition of CPU instructions dedicated to virtualization (VT-x for Intel, AMD-V for AMD), a hypervisor called KVM (Kernel-based Virtual Machine) was added to the Linux kernel, and QEMU can use KVM to implement the above-mentioned shortcuts and reach near bare-metal performance. All it takes is using 'kvm' instead of 'qemu-system-x86_64'. Let's do that! Sigh, our attempt fails miserably with a "failed to initialize KVM" error message. If we display the content of /proc/cpuinfo, we get:

flags  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc
rep_good nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 sse4_1
sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm
fsgsbase bmi1 avx2 smep bmi2 erms xsaveopt

A lot of nice-to-have things, but the important thing to notice is the absence of the 'vmx' (Intel virtualization instruction set) and 'svm' (the AMD equivalent) flags. So this machine has no hardware virtualization capabilities! Or more precisely, this *virtual* machine has no such capabilities. The documentation of the Trusty Travis-CI environment mentioned that it is based on Google Compute Engine as the hypervisor, and apparently it does not allow (or is not configured to allow) nested virtualization, despite GCE being based on KVM, and KVM potentially allowing nested virtualization. GCE allows Docker to run inside VMs, but Docker only runs Linux "guests". So it seems we are really stuck.
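
The same check can be scripted. A minimal sketch, where `has_hw_virt` is a hypothetical helper name and the optional file argument only exists to make the function easy to try on a saved copy of /proc/cpuinfo:

```shell
# Hypothetical helper: succeed if the CPU flags advertise hardware
# virtualization (vmx for Intel VT-x, svm for AMD-V).
# Usage: has_hw_virt [CPUINFO_FILE]   (defaults to /proc/cpuinfo)
has_hw_virt() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Keep only the "flags" lines, then look for vmx or svm as whole words.
    grep -E '^flags[[:space:]]*:' "$cpuinfo" \
        | grep -Eq '(^|[[:space:]])(vmx|svm)([[:space:]]|$)'
}
```

A script could then pick 'kvm' when `has_hw_virt` succeeds and fall back to 'qemu-system-x86_64' otherwise. Note that the presence of the flag is only a necessary condition: KVM can still fail to initialize for other reasons.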

Now comes the time for good old memories and a bit of software archeology. QEMU was started by Fabrice Bellard. If you didn't know his name yet, F. Bellard created FFmpeg and QEMU, holds a world record for the number of decimals of Pi computed on a COTS PC, has ported QEMU to JavaScript to run the Linux kernel in your browser, devised BPG, a new image compression format based on HEVC, etc.

At the time when his interest was focused on QEMU, he created KQemu, a kernel module (for Linux, Windows and FreeBSD hosts) that could significantly enhance QEMU performance when both the guest and the host are x86/x86_64. KQemu requires QEMU to be modified to communicate with the kernel module (similarly to the way QEMU works with the KVM kernel module). KQemu started as a closed source project and was eventually released under GPL v2. One of the key features of KQemu is that it does not require (nor use) hardware virtualization instructions. KQemu software virtualization involves complicated tricks, particularly for code in the guest that runs in "Ring 0", i.e. with the highest privileges, which you must patch to run as Ring 3 (non-privileged) code in the host. You can get an idea of what is involved by reading the documentation of VirtualBox regarding software virtualization. I will not pretend that QEMU+KQemu did the exact same tricks as VirtualBox, but that should give you at least a picture of the challenges involved. This complexity is what led to KQemu eventually being abandoned when CPUs with hardware virtualization became widespread on the market, since KVM-based virtualization is much cleaner to implement. Starting with QEMU 0.12.0, KQemu support was finally dropped from the QEMU code base.

Since KQemu does not use hardware virtualization instructions, there is good hope that it can run inside a virtualized environment. So let's give it a try with QEMU 0.11.1 and KQemu 1.4.0pre. Compiling QEMU 0.11.1 on Ubuntu Trusty goes quite well, except for a linking error easily fixed with this trivial patch. Building KQemu is a bit more involved, as it is a kernel module and the (internal) Linux kernel API is prone to change from time to time. The good news is that the Linux-specific part of kqemu is a relatively small file and the API breaks were limited to 2 aspects. The way to get the memory management structure of the current task changed in Linux 2.6.23, and I found this simple patch to solve it. Another change, which occurred in a later Linux release, is the removal of kernel semaphores, replaced by mutexes. My cumulative patch to fix all compilation issues is here. I don't pretend that it is technically correct, as my knowledge of kernel internals is more than limited, but a local test seemed to confirm that adding -enable-kqemu to the qemu command line worked sufficiently well to start and do things in the FreeBSD VM, and at a very decent speed. I tried the -kernel-kqemu switch that turns on KQemu acceleration for guest kernel code, but that resulted in a crash of qemu near the end of the FreeBSD boot process. This is not surprising, as -kernel-kqemu makes some assumptions about the internal workings of the guest OS, which FreeBSD perhaps does not meet. Or perhaps this is just a bug in qemu/kqemu.

Running it on Travis-CI was successful too, with the compilation being done in 20 minutes, so probably at half the speed of bare metal, which is good enough. kqemu does not support SMP guests (this was listed in the potential "roadmap", so it is probably achievable), but if we wanted to speed up compilation, we could potentially launch 2 kqemu-enabled qemu instances (the Travis-CI VMs have 2 cores available) that would compile different parts of the software, with the build tree being hosted in an NFS share. Compilation goes fine, except that the build process (actually the qemu instance) crashes at link time (I can also reproduce that locally). This is probably because the history of qemu & kqemu wasn't long enough to go from beta quality to production quality. I've worked around this issue by only doing the compilation in -enable-kqemu mode, restarting the VM in pure software emulation to do the linking, and then restarting in -enable-kqemu mode. Unfortunately, running the GDAL Python autotest suite in kqemu mode also leads to a qemu crash (since kqemu by design only runs code in ring 3, crashes do not affect the host), and running it completely in pure emulation mode reaches the 50 minute time-out, so for the sake of this demonstration, I only run one of the test files. And now we have our first successful build given this build recipe.

I could also potentially have tried VirtualBox because, as mentioned above, it supports software virtualization with acceleration. But that is only for 32 bit guests (and I didn't find a ready-made FreeBSD 32 bit image that you can directly ssh into). For 64 bit guests, VirtualBox requires hardware virtualization to be available in the host. To the best of my knowledge, KQemu is (was) the only solution to enable acceleration of 64 bit guests without hardware requirements.

My main conclusion from this experiment is that it is a striking example of a key advantage of the open source model. If kqemu had not been released under GPL v2, I would never have been able to resurrect it and modify it to run on newer kernels (actually there was also QVM86, an attempt to develop an alternative to KQemu while KQemu was still closed source, which was abandoned when VirtualBox was open sourced).