Wednesday, April 10, 2013

Setting up a Hydra build cluster for continuous integration and testing (part 2)

In the previous blog post, I described Hydra -- a Nix-based continuous integration server -- and gave an incomplete tour of its features.

In order to use Hydra, we have to set up a build cluster consisting of one or more build machines. In this blog post, I will describe how I have set up a Hydra cluster of three machines.

Prerequisites


To set up a cluster, we need two kinds of machines:

  • We need a build coordinator with the Nix package manager installed, running the three Hydra components: the evaluator, the queue runner and the server. Hydra can be installed on any Linux distribution, but it's more convenient to use NixOS, as it provides all the configuration steps as a NixOS module. Other distributions require more manual installation and configuration steps.
  • We need one or more build slaves, or we have to use the coordinator machine as a build slave. Various types of build slaves can be used, such as machines running Linux, Mac OS X, FreeBSD or Cygwin.

I have set up three machines -- a Linux coordinator, a Linux build slave and a Mac OS X build slave -- which I will describe in the next sections.

Installing a NixOS build slave


To set up a build slave (regardless of the operating system that we want to use), we have to install two system services: the Nix package manager, and the OpenSSH server so that the machine can be remotely accessed from the build coordinator.

Setting up a Linux build slave running NixOS is straightforward. I have used the NixOS Live CD to install NixOS. After booting from the Live CD, I first had to configure the hard drive partitions:
$ fdisk /dev/sda
I have created a swap partition (/dev/sda1) and a root partition (/dev/sda2). Then I had to initialize the filesystems:
$ mkswap -L nixosswap /dev/sda1
$ mke2fs -j -L nixos /dev/sda2
Then I had to mount the root partition on /mnt:
$ mount LABEL=nixos /mnt
Then I created a NixOS configuration.nix file and stored it in /mnt/etc/nixos/configuration.nix. My NixOS configuration looks roughly as follows:
{pkgs, ...}:

{
  boot.initrd.kernelModules = [ "uhci_hcd" "ehci_hcd" "ata_piix" ];

  nix.maxJobs = 2;
    
  boot.loader.grub.enable = true;
  boot.loader.grub.version = 2;
  boot.loader.grub.device = "/dev/sda";

  networking.hostName = "i686linux";

  fileSystems."/".device = "/dev/disk/by-label/nixos";

  swapDevices =
    [ { device = "/dev/disk/by-label/nixosswap"; } ];

  services.openssh.enable = true;
}

By running the following instruction, NixOS gets installed; it takes care of downloading and installing all required packages and composing the entire system configuration:
$ nixos-install
After the installation has succeeded, we can reboot the machine and boot into our freshly installed NixOS system. The new installation has a root user account without any password, so it's smart to change the root password to something that's slightly more difficult to guess:
$ passwd
And then the installation is done :-). However, apart from the steps that I have described, it may also be convenient to add a non-privileged user account and install some administration tools, as sketched below.
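For example, on NixOS a non-privileged user account can be declared in configuration.nix. The following fragment is a minimal sketch -- the user name is made up and the exact option names may differ slightly between NixOS versions:
  users.extraUsers = [
    { name = "sander";
      description = "Sander";
      home = "/home/sander";
      createHome = true;
      group = "users";
      useDefaultShell = true;
    }
  ];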

Upgrading the NixOS installation can be done by running:
$ nixos-rebuild --upgrade switch
The above command-line instruction fetches the latest channel expressions and manifests containing the latest releases of the packages, rebuilds the entire system configuration, and finally activates it.

As a sidenote: we can also use ordinary Linux distributions as build slaves, but this requires more manual installation and configuration, especially if you want to use Nix's more advanced features, such as multi-user builds. Moreover, since NixOS is almost a "pure" system, it reduces the chance of side effects, which is a bit harder to guarantee with conventional Linux distributions.

Installing a Mac OS X build slave


Mac OS X was already pre-installed on our Mac machine, so I only had to set up a user account and perform some basic settings.

On the Mac OS X machine, I had to install the Nix package manager manually. To do this, I obtained the Nix bootstrap binaries for x86_64-darwin (the system architecture identifier for a 64-bit Mac OS X machine) and installed them by running the following commands in a terminal:
$ sudo -i
# cd /
# tar xfvj /Users/sander/Downloads/nix-1.5.1-x86_64-darwin.tar.bz2
# chown -R sander /nix
# exit
$ nix-finish-install
By running the above command-line instructions, the /nix directory has been set up, containing a Nix store with the Nix package manager and all its dependencies. The last command, nix-finish-install, takes care of initializing the Nix database. We run this command as an ordinary user, since we don't want to use the superuser account for Nix installations.
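As a quick sanity check, we can verify the consistency of the freshly initialized Nix database:
$ nix-store --verify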

If we add a Nix package to the user's profile, we also want it to be in the user's PATH, so that we can start a program without specifying its full path. Therefore, I have appended the following line to both ~/.profile and ~/.bashrc:
$ cat >> ~/.profile <<EOF
source $HOME/.nix-profile/etc/profile.d/nix.sh
EOF
$ cat >> ~/.bashrc <<EOF
source $HOME/.nix-profile/etc/profile.d/nix.sh
EOF
By adding the above code fragment to the user's shell profiles, the Nix profile is appended to PATH, allowing you to conveniently launch packages without specifying their full path including their hash codes.

You may wonder why this line needs to be added to both .profile and .bashrc. The former covers login shells, e.g. when launching a terminal from the Mac OS X desktop. The latter is needed for non-login shells: if the Hydra coordinator remotely executes a command-line instruction through SSH, the command runs in a non-login shell, and if we don't add this line to .bashrc, we're unable to run the Nix package manager because it's not in the PATH.
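Once the SSH server has been enabled on the Mac (which is described below), this can be verified from another machine by running a command through a non-login shell. Assuming the hostname macosx that we configure later:
$ ssh sander@macosx 'nix-env --version'
If the shell profile is set up correctly, this prints the Nix version instead of a 'command not found' error.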

After performing the above steps, we can run a simple sanity check to see if the Nix package manager works as expected. The following instructions add the Nixpkgs unstable channel and fetch the latest expressions and manifests:
$ nix-channel --add http://nixos.org/channels/nixpkgs-unstable
$ nix-channel --update
After updating the Nix channel, we should be able to install a Nix package into our profile, such as GNU Hello:
$ nix-env -i hello
And we should be able to run it from the command-line as the user's profile is supposed to be in our PATH:
$ hello
Hello, world!
After installing the Nix package manager, there may be some other desirable steps to perform. In order to build iOS apps or Mac OS X applications, Apple's Xcode needs to be installed, which must be done through Apple's App Store -- unfortunately, we cannot use Nix for this purpose. I have given some instructions in a previous blog post about building iOS apps with the Nix package manager.

It may also be desired to build OpenGL applications for Mac OS X. To make this possible you need to manually install XQuartz first.

Finally, to be able to use the Mac OS X machine as a build slave, we need to configure two other things. First, we must enable the SSH server so that the machine can be remotely invoked from the coordinator. Open Mac OS X's system preferences (click on the Apple logo and pick 'System preferences'), then pick the 'Sharing' icon, which allows the various services that make the machine remotely accessible to be configured. There, we can enable remote SSH access by enabling the 'Remote login' option. Furthermore, we must set the hostname to something that we can remember.
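The same settings can also be adapted from the terminal, which may be useful for headless Macs. A sketch that enables remote login and sets the hostname to macosx (the name used in the remainder of this post):
$ sudo systemsetup -setremotelogin on
$ sudo scutil --set HostName macosx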

Another issue is that we need to turn some power management settings off, because otherwise the Mac machine will go into standby after a while and can no longer be used to perform builds. Power management settings can be adapted by picking 'Energy saver' from the System preferences screen. I have set the 'Computer sleep' time to 'Never' and disabled putting the hard disks to sleep.
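These settings can also be changed from the command-line with the pmset utility. A rough equivalent of the above, in which a value of 0 means 'never':
$ sudo pmset -a sleep 0 disksleep 0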

Setting up the NixOS build coordinator machine


Then comes the most complicated part: setting up the build coordinator machine. First, I performed a basic NixOS installation, following exactly the same procedure as for the NixOS build slave described earlier. After performing the basic installation, I adapted the configuration to turn the machine into a build coordinator.

Since Hydra is not part of the standard NixOS distribution, we have to obtain it ourselves from Git and store the code somewhere on the filesystem (such as in the /root folder):
$ git clone https://github.com/NixOS/hydra.git
Then I extended the machine's configuration by adding a number of settings to the attribute set body of /etc/nixos/configuration.nix:

  • To be able to use Hydra's NixOS configuration properties, we must include the Hydra NixOS module:
    require = [ /root/hydra/hydra-module.nix ];
    
  • We must enable the Hydra server and configure some of its mandatory and optional properties:
    services.hydra = {
      enable = true;
      package = (import /root/hydra/release.nix {}).build {
        system = pkgs.stdenv.system;
      };
      logo = ./logo.png;
      dbi = "dbi:Pg:dbname=hydra;host=localhost;user=hydra;";
      hydraURL = "http://nixos";
      notificationSender = "yes@itsme.com";
    };
    

    In the above code fragment, package refers to the actual Hydra package, logo is an optional parameter that can be used to show a logo in the web front-end's header, dbi is a Perl DBI database connection string (configured to make a localhost connection to a PostgreSQL database named hydra, using the hydra user), hydraURL contains the URL of the web front-end, and notificationSender contains the administrator's e-mail address.
  • In order to be able to delegate builds to build slaves for scalability and portability, we have to enable Nix's distributed builds feature:
    nix.distributedBuilds = true;
    nix.buildMachines = [
      { hostName = "i686linux";
        maxJobs = 2;
        sshKey = "/root/.ssh/id_buildfarm";
        sshUser = "root";
        system = "i686-linux";
      }
      
      { hostName = "macosx";
        maxJobs = 2;
        sshKey = "/root/.ssh/id_buildfarm";
        sshUser = "sander";
        system = "x86_64-darwin";
      }
    ];
    
    The above code fragment allows us to delegate 32-bit Linux builds to the NixOS build slave and 64-bit Mac OS X builds to the Mac OS X machine.

  • Hydra needs to store its data, such as projects, jobsets and builds, in a database. For production use, it's recommended to use PostgreSQL, which can be enabled by adding the following line to the configuration:
    services.postgresql.enable = true;
    
  • The Hydra server runs its own small webserver on TCP port 3000. In production environments, it's better to add a proxy in front of it. We can do this by adding the following Apache HTTP server configuration settings:
    services.httpd = {
      enable = true;
      adminAddr = "yes@itsme.com";
          
      extraConfig = ''
        <Proxy *>
        Order deny,allow
        Allow from all
        </Proxy>
            
        ProxyRequests     Off
        ProxyPreserveHost On
        ProxyPass         /    http://localhost:3000/ retry=5 disablereuse=on
        ProxyPassReverse  /    http://localhost:3000/
      '';
    };
    
  • To allow e-mail notifications to be sent, we must configure a default mail server. For example, the following configuration enables direct delivery:
    networking.defaultMailServer = {
      directDelivery = true;
      hostName = "nixos";
      domain = "nixos.local";
    };
    
  • As you may know from earlier blog posts, Nix always stores versions of components next to each other, and components never get overwritten or removed automatically. At some point, we may run out of disk space. Therefore, it's a good idea to enable garbage collection:
    nix.gc = {
      automatic = true;
      dates = "15 03 * * *";
    };
    
    services.cron = {
      enable = true;
          
      systemCronJobs =
        let
          gcRemote = { machine, gbFree ? 4, df ? "df" }:
            "15 03 * * *  root  ssh -x -i /root/.ssh/id_buildfarm ${machine} " +
            ''nix-store --gc --max-freed '$((${toString gbFree} * 1024**3 - 1024 * ''+
            ''$(${df} -P -k /nix/store | tail -n 1 | awk "{ print \$4 }")))' ''+
            ''> "/var/log/gc-${machine}.log" 2>&1'';
        in
        [ (gcRemote { machine = "root@i686linux"; gbFree = 50; })
          (gcRemote { machine = "sander@macosx"; gbFree = 50; })
        ];
    };
    

  • The nix.gc config attribute generates a cron job that runs the garbage collector at 3:15 AM every night. The services.cron configuration also remotely connects to the build slave machines and runs the garbage collector there, freeing disk space until the given amount of free space (50 GiB in our case) is available again.

  • It may also be worth enabling some advanced features of Nix. For example, in our situation we have many large components that are very similar to each other, consuming a lot of disk space. It may be helpful to enable hard-link sharing, so that identical files are stored only once.

    Moreover, in our current configuration we also download substitutes from NixOS' Hydra build instance, so that we don't have to build the complete Nixpkgs collection ourselves. It may also be desirable to disable this and take full control.

    Another interesting option is to enable chroot builds, reducing the chance of side effects even more:
    nix.extraOptions = ''
      auto-optimise-store = true
      build-use-substitutes = false
    '';
    nix.useChroot = true;
    
    The nix.conf manual page has more information about these extra options.


After adapting the coordinator's configuration.nix, we must activate it by running:
$ nixos-rebuild switch
The above command-line instruction downloads and installs all the required packages and generates all the configuration files, such as the webserver configuration and the cron jobs.

After rebuilding, we don't have a working Hydra instance yet; we still have to set up its storage by creating a PostgreSQL database and a Hydra user. To do this, we must perform the following instructions as root user:
# createuser -S -D -R -P hydra
# createdb -O hydra hydra
By running the hydra-init job, we can set up Hydra's database schema or migrate it to a new version:
# start hydra-init
Then we must create a configuration file that allows the unprivileged hydra user to connect to the database:
# su hydra
$ echo "localhost:*:hydra:hydra:password" > ~/.pgpass
$ chmod 600 ~/.pgpass
The .pgpass file contains the hostname, database, username and password; the latter must be replaced by the user's real password, of course.
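As a sanity check, we can verify as the hydra user that the connection actually works, using the psql client that comes with PostgreSQL:
$ psql -h localhost -U hydra hydra -c 'SELECT 1;'
If the .pgpass file is correct, the query runs without prompting for a password.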

We also need to set up a user, as Hydra's user database is completely empty. The following command creates an administration user named 'root' with password 'foobar':

$ hydra-create-user root --password foobar --role admin
And finally, we can activate the three Hydra processes, which allows us to use Hydra and to access the web front-end:
$ exit
# start hydra-{evaluator,queue-runner,server}

Setting up connections


We now have a running Hydra instance, but there is still one detail missing. In order to allow the coordinator to connect to the build slaves, we need SSH keys without passphrases, so that connections can be made automatically. Generating an SSH keypair can be done as follows:
$ ssh-keygen -t rsa
The above command asks you a couple of questions. Keep in mind that you should not specify a passphrase.
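Alternatively, the questions can be answered in advance through command-line parameters; the following sketch generates a passphrase-less keypair in one go:
$ ssh-keygen -t rsa -N "" -f /root/.ssh/id_buildfarm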

Assuming that we have called the file id_buildfarm, we now have two files: a private key called id_buildfarm and a public key called id_buildfarm.pub. We must copy the public key to all build slaves and run the following instruction on each slave machine, under the user that performs the builds (root on the Linux machine and sander on the Mac OS X machine):
$ cat id_buildfarm.pub >> ~/.ssh/authorized_keys
The above command adds the public key to the list of authorized keys, allowing the coordinator to connect with the corresponding private key.
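On systems that provide the ssh-copy-id utility, copying and installing the public key can be done in one step from the coordinator, assuming that password authentication is still possible at this point:
$ ssh-copy-id -i /root/.ssh/id_buildfarm.pub root@i686linux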

After installing the public keys, we can try connecting to each build slave from the coordinator using the private key, by running the following command as root user:
$ ssh -i ~/.ssh/id_buildfarm root@i686linux
If we run the above command, we should be able to connect to the machine without being asked for any credentials. Moreover, the first time that you connect to a machine, its host key is added to the known_hosts list, which is necessary because otherwise the coordinator won't be able to connect to it later on.
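Alternatively, the host keys of all the slaves can be collected in advance with ssh-keyscan. Keep in mind that this blindly trusts the retrieved keys, so only use it on a network that you trust:
$ ssh-keyscan i686linux macosx >> /root/.ssh/known_hosts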

Another issue that I have encountered with the Mac OS X machine is that it may stall connections after no input has been received from the coordinator for a while. To remedy this, I added the following lines to the SSH configuration of the root user on the coordinator machine:
$ cat > /root/.ssh/config <<EOF
ServerAliveCountMax 3
ServerAliveInterval 60
EOF
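As a final check, we can verify from the coordinator that distributed builds actually work, by building a small package for one of the slave platforms. For example, the following command (assuming that a Nixpkgs channel is present on the coordinator) should delegate the build of GNU Hello to the Mac OS X slave and copy the result back:
$ nix-build '<nixpkgs>' --argstr system x86_64-darwin -A hello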

Conclusion


In this blog post, I have described the steps that I have performed to set up a cluster of three Hydra machines, consisting of two Linux machines and one Mac OS X machine. To get an impression of how Hydra can be used, I recommend reading my previous blog post.

In the past, I have also set up some more "exotic" build slaves, such as the three BSDs (FreeBSD, OpenBSD and NetBSD) and an OpenSolaris machine. To install these platforms, we can roughly repeat the procedure used for the Mac OS X build slave: first install the host OS itself, then the Nix package manager (there are bootstrap binaries available for several platforms; otherwise you must do a source install), and then set up SSH. Finally, the new slave must be registered on the coordinator, as sketched below.
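The following hypothetical nix.buildMachines entry illustrates what registering such a FreeBSD slave on the coordinator could look like -- the hostname and user are made up:
  { hostName = "freebsd";
    maxJobs = 1;
    sshKey = "/root/.ssh/id_buildfarm";
    sshUser = "root";
    system = "x86_64-freebsd";
  }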
