Tuesday, May 22, 2018

A new challenge

It has been quiet on my blog for a while, but don't worry, I'm still alive. In my absence, many things have happened. Most importantly, I have decided to embark on a new challenge.

Almost two months ago, I joined Mendix, a Rotterdam-based company (with its headquarters located in Boston) that ships a low-code application development platform. I will be working on their deployment infrastructure and I will be thinking about software modularity.

Some reflection


I vividly remember the days when I completed my PhD and left academia -- I still had to wait a couple of months for the defence ceremony, mainly because of the availability of the committee members.

Although I was quite happy with some of the work I had done during my research, I also wanted to leave my ivory tower and be more connected to the "real world" and my "research audience": developers. I joined a small startup company named Conference Compass (located in the YES!Delft incubator centre) that consisted of fewer than 10 people around the time I joined.

They had been looking into setting up a product line for their mobile conference apps, which sounded like an interesting challenge.

In the years that I was employed at Conference Compass, quite a few things happened. Most notably, the size of the company, the product and the service portfolio have grown considerably. Aside from these developments, I am particularly proud that I made quite a number of impactful open source contributions as part of my daily work.

The app building infrastructure


The biggest contribution I made by far is the mobile app building infrastructure. Most of its components have been part of Nixpkgs (the ecosystem of packages that can be deployed with the Nix package manager) for several years.


To carry out all the builds at scale and in a timely manner, I installed a Hydra cluster: a Nix-based continuous integration service. I also developed an NPM module and command-line tool that we could use to remotely control a Hydra server from our custom-built applications.

Chat functionality


Another interesting development area was enriching the app's product line with chat functionality built around the XMPP protocol/ejabberd service. I have ported the Simple XMPP library from the Node.js package ecosystem to Titanium by using a zero-forking strategy and I made a simple test application that somewhat resembles the Pidgin chat application.

I also ran into a number of practical issues while trying to keep the architecture of the test app clean. I did an in-depth study on the MVC paradigm and wrote a blog post about my findings.

Node.js


I also learned quite a lot about Node.js and its underlying concepts, such as asynchronous programming. Before joining Conference Compass, my JavaScript knowledge was limited to browser usage only, and I was not using JavaScript on a very extensive scale.

My learning experiences resulted in a number of blog posts elaborating on various related concepts.


As of today, some of these blog posts are still in my all-time top 10 of most frequently read blog posts.

As a result of having to work with Node.js and being involved with the Nix project, I became the maintainer of node2nix, a tool that can generate Nix expressions from NPM package configurations. This happened after the maintainer of npm2nix decided to hand over the project.

Building a service deployment platform


In the first two years of my employment, my chief responsibility was the app building infrastructure. Another thing I am particularly proud of is the deployment infrastructure for the service platform that I built from scratch. It grew from just a single virtual machine hosted in the Amazon EC2 cloud to a platform managing the data and configuration services for 100+ apps per year, with some events attracting tens of thousands of app users.

I used a variety of solutions, such as various Amazon web services, e.g. EC2, Route 53 and S3. Most importantly, I used NixOps for infrastructure deployment and Disnix (the tool I created as part of my research) for service deployment.

Although Disnix already supported all the features I needed before I actually started using it at Conference Compass, my company experiences helped to substantially improve Disnix from a usability perspective -- in academia, Disnix was mostly used to validate my research objectives. To make it suitable for use in a company with non-specialized deployment people, you need to iron out many additional issues. I added more helpful error messages, assistance in recovering from errors and additional utilities to make diagnosing problems and carrying out maintenance tasks more convenient.

At the end of 2015, after using Disnix for almost one year in production, I gave a talk about Disnix's deployment concepts at NixCon 2015, the first edition of a conference fully centered around Nix and its related technologies.

Conclusion


I am grateful to my previous employer, Conference Compass, which gave me the opportunity to do all the things described in this blog post.

At Mendix, there will be many new interesting challenges for me -- I will be working with different kinds of technologies, a new platform and new people. Stay tuned for more information...

Sunday, February 25, 2018

A more realistic public Disnix example

It has been almost ten years since I started developing Disnix -- February 2008 marked the start of my master's thesis internship at Philips Research that resulted in the first prototype version.

Originally, Disnix was specifically developed for one use case only -- a medical service-oriented system called the "Service Development Support System" (SDS2) that can be used for asset tracking and utilisation analysis for medical devices in a hospital environment. More information about this case study can be found in my master's thesis, some of my research papers and my PhD thesis (all of them can be found on my publications page).

Many developments have happened since the realization of the first prototype -- its feature set has been extended considerably, its architecture has been overhauled several times and the code has evolved significantly. Most notably, I have been maintaining a production system for over three years with it.

In all these years, there has been one recurring question that I regularly receive from various kinds of people:

Why should I use Disnix and why would it be useful?

The answer is that Disnix becomes useful when you have a system that can be decomposed into distributable services, such as web services, RESTful services, web applications or processes.

In addition to the fact that Disnix automates the deployment of such a system and offers a number of powerful quality properties (e.g. non-destructive upgrades for the static parts of a system), it also helps componentized systems in reaching their full potential -- for example, when services can be built, deployed, and managed individually, you can scale a system up and down (e.g. by distributing services to dedicated machines or consolidating all services on a single machine) and you can respond more flexibly to events (e.g. by redeploying services when a machine crashes).

Although the answer may sound simple, service-oriented systems are complicated -- besides facing all kinds of deployment complexities, properly dividing a system into distributable components is also quite challenging. For all the systems I have seen in the last decade, the requirements and their modularization strategies were all quite different from each other. I have also seen a number of systems for which decomposing into services did not work and unnecessary complexities were introduced.

Moreover, it is hard to find representative public examples that people can use as a reference. I was fortunate that I had access to an industrial case study during my research. Nonetheless, I was suffering from many difficulties because of the lack of any meaningful public case studies. As a countermeasure, I developed a collection of example cases in addition to SDS2, but because of their over-simplicity, proving my point often remained hard.

Roughly half a year ago, I released most parts of my ancient web framework that I used to actively develop before I started doing research in software deployment, and I created a couple of example applications for it.


Although my web framework development predates my deployment research, I was already using it to implement information systems that followed some modularity principles that are beneficial when using Disnix as a deployment system.

Recently, I have extended my web framework's example applications repository (providing a homework assistant, CMS, photo gallery and literature survey assistant) to become another public Disnix example case following the same modularity principles I used for the information systems I used to implement at that time.

Creating a componentized web information system


As mentioned earlier in this blog post, I had already implemented a (fairly simple) componentized web information system with my ancient custom-made web framework before I started working on Disnix. The "componentization process" (a term that I had not learned about yet, and something I was not consciously doing at that time) was partially driven by evolution and partially by non-functional requirements.

Originally, the system started out as just one single web application for one specific purpose and consisted of only two components -- a MySQL database responsible for storing the data and a web front-end implemented in PHP, which is quite a common separation pattern for PHP applications.

Later, I was asked to implement another PHP application with similar functionality. Initially, I wrote the application from scratch without any reuse in mind, but at some point I made two important decisions:

  • I decided to keep the databases of each application separate as opposed to integrating all the tables into one single database. My main motivating factor was that I wanted to prevent another developer's wrong decisions from messing up the other application. Moreover, I realized that other systems did not have to know about data that was specific to one application domain.
  • In addition to domain-specific data, I noticed that both databases also stored the same kind of data, namely user accounts -- both systems had a user account system to allow users to change the data. This did not motivate me to integrate both databases into one database either. Instead, I created a separate user database and authentication system (as a library API) that was shared among both applications.

After completing the two web applications, I had to implement more functionality. I decided to keep all of these new features for these new problem domains in separate applications with separate databases. The only thing they had in common was a shared user authentication system.

At some point I ended up having many sub applications. As a result, I needed a portal application that redirected users to these sub applications. Essentially, what I implemented became a system of systems.

Deployment with Disnix


The "architectural decisions" that I described earlier resulted in a system composed of several kinds of components:

  • Domain-specific web applications exposing functionality that logically belongs together.
  • Domain-specific databases storing tables that are strongly correlated.
  • A shared user database.
  • A portal application redirecting users to the domain-specific web applications.

The above listed components can be distributed over multiple machines in a network, because they connect to each other through network links (e.g. connecting to a MySQL database can be done with a TCP connection and connecting to a domain specific web application can be done through HTTP). As a result, they can also be modeled as services that can be deployed with Disnix.
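To make the idea of modeling these components as services more concrete, here is a minimal sketch of what a fragment of a Disnix services model for one application and its databases might look like. It is not taken from the actual example repository: the `customPkgs` import path and the package attribute names are hypothetical, and the use of `dependsOn` for both database links is an assumption. Only the service names and the `mysql-database` and `apache-webapplication` types come from this post.

```nix
{distribution, invDistribution, system, pkgs}:

let
  # Hypothetical package set; the real repository may organize this differently
  customPkgs = import ./pkgs { inherit system pkgs; };
in
rec {
  # The shared user database
  usersdb = {
    name = "usersdb";
    pkg = customPkgs.usersdb; # hypothetical attribute name
    type = "mysql-database";
  };

  # A domain-specific database
  cmsdb = {
    name = "cmsdb";
    pkg = customPkgs.cmsdb; # hypothetical attribute name
    type = "mysql-database";
  };

  # A domain-specific web application that connects to both databases
  cms = {
    name = "cms";
    pkg = customPkgs.cms; # hypothetical attribute name
    dependsOn = {
      inherit usersdb cmsdb;
    };
    type = "apache-webapplication";
  };
}
```

The remaining applications and the portal would follow the same pattern: one service per database, one per web application, with the network links expressed as inter-dependencies.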

To replicate the same patterns for demo purposes, I integrated my framework's example applications into a similar system of sub systems. We can deploy the corresponding example system to one single target machine with Disnix, by running:

$ disnixos-env -s services.nix \
  -n network-single.nix \
  -d distribution-single.nix --use-nixops

The entire system gets deployed to a single machine because of the distribution model (distribution-single.nix) that maps all services to one target machine:

{infrastructure}:

{
  usersdb = [ infrastructure.test1 ];
  cmsdb = [ infrastructure.test1 ];
  cmsgallerydb = [ infrastructure.test1 ];
  homeworkdb = [ infrastructure.test1 ];
  literaturedb = [ infrastructure.test1 ];
  portaldb = [ infrastructure.test1 ];

  cms = [ infrastructure.test1 ];
  cmsgallery = [ infrastructure.test1 ];
  homework = [ infrastructure.test1 ];
  literature = [ infrastructure.test1 ];
  users = [ infrastructure.test1 ];
  portal = [ infrastructure.test1 ];
}

The resulting deployment architecture looks as follows:


The above visualization of the deployment architecture shows the following aspects:

  • The surrounding light grey colored box denotes a target machine. In this particular example, we only have one single target machine to which services are deployed.
  • The dark grey colored boxes correspond to container environments. For our example system, we have two of them: mysql-database corresponding to a MySQL DBMS server and apache-webapplication corresponding to an Apache HTTP server.
  • The ovals denote services corresponding to MySQL databases and web applications.
  • The arrows denote inter-dependency links that correspond to network connections. As explained in my previous blog post, solid arrows are dependencies with a strict ordering requirement while dashed arrows are dependencies without an ordering requirement.

Some people may argue that it is not really beneficial to deploy such a system with Disnix -- with NixOps you can define a machine configuration having a MySQL DBMS server and an Apache HTTP server with the corresponding databases and web application components. With Disnix, you must first ensure that the machines and the MySQL and Apache HTTP servers are configured by other means (that could for example be done with NixOps), and then you have to deploy the system's components with Disnix.

In a single machine deployment scenario, it may indeed not be that beneficial. However, what you get in addition to automated deployment is more flexibility. Since Disnix manages the services directly, as opposed to entire machine configurations as a whole, you can respond better to events by redeploying the system.

For example, when the number of visitors keeps growing, you may run into the problem that a single server can no longer handle all the traffic. In such cases, you can easily add another machine to the network and adjust the distribution model to move (for example) the databases to another machine:

{infrastructure}:

{
  usersdb = [ infrastructure.test2 ];
  cmsdb = [ infrastructure.test2 ];
  cmsgallerydb = [ infrastructure.test2 ];
  homeworkdb = [ infrastructure.test2 ];
  literaturedb = [ infrastructure.test2 ];
  portaldb = [ infrastructure.test2 ];

  cms = [ infrastructure.test1 ];
  cmsgallery = [ infrastructure.test1 ];
  homework = [ infrastructure.test1 ];
  literature = [ infrastructure.test1 ];
  users = [ infrastructure.test1 ];
  portal = [ infrastructure.test1 ];
}
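Because a distribution model is an ordinary Nix function that returns an attribute set, the same mapping can also be computed instead of enumerated. The following is a sketch of that idea, assuming that Disnix only looks at the evaluated result; the `mapTo` helper and the two name lists are illustrative and not part of the example repository.

```nix
{infrastructure}:

let
  databases = [ "usersdb" "cmsdb" "cmsgallerydb" "homeworkdb" "literaturedb" "portaldb" ];
  webapps = [ "cms" "cmsgallery" "homework" "literature" "users" "portal" ];

  # Map every service name in the list `names` to the single machine `target`
  mapTo = target: names:
    builtins.listToAttrs (map (name: { inherit name; value = [ target ]; }) names);
in
# Databases go to test2, web applications stay on test1
mapTo infrastructure.test2 databases // mapTo infrastructure.test1 webapps
```

This evaluates to the same attribute set as the hand-written model above. Whether the shorter form is preferable is a matter of taste: the explicit version is easier to read at a glance, while the computed version makes it cheaper to move a whole group of services.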

By redeploying the system, we can take advantage of the additional system resources that the new machine provides:

$ disnixos-env -s services.nix \
  -n network-separate.nix \
  -d distribution-separate.nix --use-nixops

resulting in the following deployment architecture:


Likewise, there are countless other deployment strategies possible to meet all kinds of non-functional requirements. For example, we can also distribute bundles of domain-specific application and database pairs over two machines:

$ disnixos-env -s services.nix \
  -n network-bundles.nix \
  -d distribution-bundles.nix --use-nixops

resulting in the following deployment architecture:


This approach is even more scalable than simply offloading the databases to another server.
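The bundle distribution model itself is not shown in this post. As a rough sketch, it might look like the following, in which every web application stays on the same machine as its own database. Which bundle goes to which machine is purely an assumption made for illustration.

```nix
{infrastructure}:

{
  # Assumed first group: user management, portal and CMS on test1
  usersdb = [ infrastructure.test1 ];
  users = [ infrastructure.test1 ];
  portaldb = [ infrastructure.test1 ];
  portal = [ infrastructure.test1 ];
  cmsdb = [ infrastructure.test1 ];
  cms = [ infrastructure.test1 ];

  # Assumed second group: gallery, homework and literature on test2
  cmsgallerydb = [ infrastructure.test2 ];
  cmsgallery = [ infrastructure.test2 ];
  homeworkdb = [ infrastructure.test2 ];
  homework = [ infrastructure.test2 ];
  literaturedb = [ infrastructure.test2 ];
  literature = [ infrastructure.test2 ];
}
```

Keeping an application and its database together means that most database traffic stays local to one machine, while the load of the different applications is spread over both machines.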

In addition to scalability, there are countless other reasons to pick a certain distribution strategy. You could also, for example, distribute redundant instances of databases and applications as a failover to improve availability, or improve security by deploying the databases with privacy-sensitive data to a machine with restrictive network access.

State management


When updating the deployment of systems with Disnix (such as moving a database from one machine to another), there is a recurring limitation that you may frequently run into -- like Nix, Disnix only manages the static parts of the system, but not any state. This means that a service's deployment can be reproduced elsewhere, but data, such as the content of a database, is not migrated.

For example, the sub system of example applications stores two kinds of data -- records in the MySQL database and files, such as images uploaded in the photo gallery or PDF files uploaded to the literature application. When moving these applications around, the data is not migrated.

As a possible solution, Disnix also provides simple state management facilities. When enabled and a service is moved from one machine to another in the distribution model, Disnix takes snapshots of the databases and filesets on the source machines, transfers the snapshots to the target machines, and finally restores the snapshots.

State management can be enabled globally by passing the --deploy-state parameter to disnix-env (or by annotating individual services with deployState = true; in the services model):

$ disnixos-env -s services.nix \
  -n network-bundles.nix \
  -d distribution-bundles.nix --use-nixops --deploy-state
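For the per-service alternative, the annotation goes into the service's definition in the services model. A minimal sketch for a single database, in which the `customPkgs.cmsdb` package attribute is a hypothetical name and the fragment is assumed to live inside the attribute set that the services model returns:

```nix
cmsdb = {
  name = "cmsdb";
  pkg = customPkgs.cmsdb; # hypothetical attribute name
  type = "mysql-database";
  # Ask Disnix to snapshot, transfer and restore this service's state
  # when it is moved to another machine
  deployState = true;
};
```

Annotating individual services is useful when only some of them carry state that is worth migrating, because snapshotting and transferring large databases can take a long time.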

We can also directly use the state management system, e.g. for backup purposes. When running the following command:

$ disnix-snapshot

Disnix takes snapshots of all databases and web application state (e.g. the images in the photo gallery and uploaded PDF files) and transfers them to the coordinator machine. With the dysnomia-snapshots tool we can inspect the snapshot store:

$ dysnomia-snapshots --query-all
apache-webapplication/cms/1f9ed847885d2b3e3c67c51231122d958751eb5e2443c281e02e1d7108a505a3
apache-webapplication/cmsgallery/28d17a6941cb195a92e748aae737ccf524747477c6943436b734891d0f36fd53
apache-webapplication/literature/ed5ec4f8b9b4fcdb8b740ad1fa7ecb40b10dece03548f1d6e09a6a82c804131b
apache-webapplication/portal/5bbea499f8f8a4f708bb873ad683dbf088afa4c553f90ab287a9249a7ef02651
mysql-database/cmsdb/aa75992f780991c39a0969dcac5f69b04685c4fa764937476b816e938d6972ba
mysql-database/cmsgallerydb/31ebdaba658ca376123ff6a91a3e275731b383346a07840b1acaa1e44d921b65
mysql-database/homeworkdb/f0fda91545af0cb300afd84592d4914dcd48257053401e232438e34d83af828d
mysql-database/literaturedb/cb881c2200a5f1562f0b66f1394d0902bbb8e2361068fe096faac3bc31f76b5d
mysql-database/portaldb/5d8a5cb952f40ce76f93eb939d0b37eab33736d7b1e1426038322f8a572034ee
mysql-database/usersdb/64d11fc7f8969da5da318276a666f2e00e0a020ba619a1d82ed9b84a7f1c2ca6

and with some shell scripting, the actual contents of the snapshot store:

$ find $(dysnomia-snapshots --resolve $(dysnomia-snapshots --query-all)) -type f
/home/sander/state/snapshots/apache-webapplication/cms/1f9ed847885d2b3e3c67c51231122d958751eb5e2443c281e02e1d7108a505a3/state.tar.xz
/home/sander/state/snapshots/apache-webapplication/cmsgallery/28d17a6941cb195a92e748aae737ccf524747477c6943436b734891d0f36fd53/state.tar.xz
/home/sander/state/snapshots/apache-webapplication/literature/ed5ec4f8b9b4fcdb8b740ad1fa7ecb40b10dece03548f1d6e09a6a82c804131b/state.tar.xz
/home/sander/state/snapshots/apache-webapplication/portal/5bbea499f8f8a4f708bb873ad683dbf088afa4c553f90ab287a9249a7ef02651/state.tar.xz
/home/sander/state/snapshots/mysql-database/cmsdb/aa75992f780991c39a0969dcac5f69b04685c4fa764937476b816e938d6972ba/dump.sql.xz
/home/sander/state/snapshots/mysql-database/cmsgallerydb/31ebdaba658ca376123ff6a91a3e275731b383346a07840b1acaa1e44d921b65/dump.sql.xz
/home/sander/state/snapshots/mysql-database/homeworkdb/f0fda91545af0cb300afd84592d4914dcd48257053401e232438e34d83af828d/dump.sql.xz
/home/sander/state/snapshots/mysql-database/literaturedb/cb881c2200a5f1562f0b66f1394d0902bbb8e2361068fe096faac3bc31f76b5d/dump.sql.xz
/home/sander/state/snapshots/mysql-database/portaldb/5d8a5cb952f40ce76f93eb939d0b37eab33736d7b1e1426038322f8a572034ee/dump.sql.xz
/home/sander/state/snapshots/mysql-database/usersdb/64d11fc7f8969da5da318276a666f2e00e0a020ba619a1d82ed9b84a7f1c2ca6/dump.sql.xz

The above output shows that for each MySQL database, we store a compressed SQL dump of the database and for each stateful web application, a compressed tarball of state files.

Conclusion


In this blog post, I have described a more realistic public Disnix example that is inspired by my web framework developments of a long time ago. Aside from showing how to automate a system's deployment, the purpose of this blog post is to describe how a system can be decomposed into distributable services that can be deployed with Disnix. Implementing such a system is anything but trivial and is driven by various kinds of design decisions.

Availability


The example web application system can be obtained from my GitHub page. The Disnix deployment expressions can be found in the deployment/ sub folder.

In addition, I have created a Dysnomia module named fileset that can capture the state files of web applications in a compressed tarball.

After the recent developments, the Disnix toolset has reached a new stable point. As a result, I have decided to release Disnix 0.8. Consult the Disnix homepage for more information!

Monday, February 12, 2018

Deploying systems with circular dependencies using Disnix


Some time ago, during my PhD thesis defence, one of my committee members asked me how I would deploy systems with Disnix in which services have circular dependencies.

It was an interesting question because Disnix defines dependencies between services (that typically involve network connections) as inter-dependencies that have two properties:

  • They allow services to find services they depend on by providing their connection properties.
  • They ensure that any inter-dependency is activated before the service itself, so that no failures will occur because of missing dependencies -- in Disnix, a service is either available or unavailable, but never in a broken state due to missing inter-dependencies at runtime.

In a system with circular dependencies, the ordering property is problematic -- it is impossible to activate one dependency before another without having broken connections between them.

During the defence, I had to admit that I had never deployed such systems with Disnix before, but that there were a couple of possible solutions to cope with such constraints. For example, you can propagate properties of the distribution model directly to a service, as opposed to declaring circular inter-dependencies. Then the ordering requirement is not enforced.

I also explained that systems should not have any hard cyclic requirements on other services, but instead compose their (potentially bidirectional) communication channels at runtime. Furthermore, I explained that circular dependencies are bad from a reuse perspective -- when two services mutually depend on each other, then they should ideally be one service.

Although the answer sufficed (i.e. it showed that it was possible), the solution basically relies on unconventional usage of the deployment tool. Recently, as a personal exercise, I have decided to dig up this question again and explore the possibilities of deploying systems with circular dependencies.

Chord: a peer-to-peer distributed hash table


When thinking of an example system that has a circular dependency structure, the first thing that came to mind was Chord: a peer-to-peer distributed hash table (a copy of the research paper written by Stoica et al. can be found here). An interesting fact is that I had to implement it many years ago in the lab course of the distributed algorithms course taught by another member of my PhD thesis committee.

A Chord network has circular runtime dependencies because it has a ring structure -- in a network that has more than one node, each node has a successor and predecessor link, in which no node has the same predecessor or successor and the last successor link refers to the first node:


The Chord nodes (shown in the figure above) constitute a distributed peer-to-peer hash table. In addition to the fact that it can store key and value pairs (all kinds of objects), it also distributes the data over the nodes in the network.

Moreover, its operations are decentralized -- for example, when it is desired to search for an object or to store new objects in the hash table, it is possible to consult any node in the network. The system will redirect the caller to the appropriate node that should host the data.

Various implementations of the Chord protocol exist. The official reference implementation is a filesystem abstraction layer built on top of it. I experimented with the Java-based OpenChord implementation that is capable of storing arbitrary serializable Java objects.

More details about the implementation of Chord operations can be found in the research paper.

Deploying a Chord network


One of the challenges I faced during the lab course is that I had to deploy a test Chord network with a small collection of nodes. At that time, I had no proper deployment automation. I ended up writing a bash shell script that spawned a collection of processes in parallel.

Because deployment was complicated, I never tried more complex scenarios than running a small collection of processes on a single machine. Because the lab course did not require more than that, I never tried, for example, any real network deployments in which I had to distribute Chord nodes over multiple computer systems. The latter would have introduced even more complexity to the deployment process.

Deploying a Chord network basically works as follows:

  • First, we must deploy an initial node that has no connection to a predecessor or successor node.
  • Then for each additional node, we call the join operation to attach it to the network. As explained earlier, a Chord hash-table is decentralized and as a result, we can consult any node we want in the network for the join process. The join and stabilization procedures decide which predecessor and successor a new node actually gets.

There are various strategies to join additional nodes to the network, but what I ended up doing is using the initial node as a bootstrap node -- all successive nodes simply join the bootstrap node and the network stabilizes to become a ring.

(As a sidenote: you could argue whether this is a good process, since the introduction of a central bootstrap node during the deployment process violates the peer-to-peer constraint, but that is a different story. Obviously, you could also think of other bootstrap strategies but that is beyond the scope of this blog post).

Automating a Chord network deployment with Disnix


To experiment with a Chord network, I have decided to create a simple server process (using the OpenChord API) whose only responsibility is to store data. It can optionally join another node in the network and it has a command-line interface allowing me to conveniently specify the connection parameters.

The deployment strategy using the initial node as a bootstrap node can be easily automated with Disnix. In the Disnix services model, we can define the bootstrap node as follows:

ChordBootstrapNode = rec {
  name = "ChordBootstrapNode";
  pkg = customPkgs.ChordBootstrapNode { inherit port; };
  port = 8001;
  portAssign = "private";
  type = "process";
};

The above service configuration corresponds to a process that binds the service to a provided TCP port.

Each successive node can be defined as a service that has an inter-dependency on the bootstrap node:

ChordNode1 = rec {
  name = "ChordNode1";
  pkg = customPkgs.ChordNode { inherit port; };
  port = 8002;
  portAssign = "private";
  type = "process";
  dependsOn = {
    inherit ChordBootstrapNode;
  };
};

As can be seen in the above Nix expression, the dependsOn attribute specifies that the node has an inter-dependency on the bootstrap node. The inter-dependency declaration provides the connection settings of the bootstrap node to the command-line utility that spawns the service and ensures that the bootstrap node is deployed first.
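Since all non-bootstrap nodes share the same structure, their definitions could also be generated by a small helper function rather than written out by hand. The following is only a sketch of that idea: `mkChordNode` is a hypothetical helper, the `customPkgs` import path is assumed, and the port numbers of the second and third node are placeholder values.

```nix
{distribution, invDistribution, system, pkgs}:

let
  # Hypothetical import path for the custom package set
  customPkgs = import ../top-level/all-packages.nix { inherit system pkgs; };

  # Hypothetical helper: a node that joins the network through `bootstrapNode`
  mkChordNode = bootstrapNode: name: port: {
    inherit name port;
    pkg = customPkgs.ChordNode { inherit port; };
    portAssign = "private";
    type = "process";
    dependsOn = {
      ChordBootstrapNode = bootstrapNode;
    };
  };
in
rec {
  ChordBootstrapNode = rec {
    name = "ChordBootstrapNode";
    pkg = customPkgs.ChordBootstrapNode { inherit port; };
    port = 8001;
    portAssign = "private";
    type = "process";
  };

  ChordNode1 = mkChordNode ChordBootstrapNode "ChordNode1" 8002;
  ChordNode2 = mkChordNode ChordBootstrapNode "ChordNode2" 8003; # placeholder port
  ChordNode3 = mkChordNode ChordBootstrapNode "ChordNode3" 8004; # placeholder port
}
```

Every generated node only depends on the bootstrap node, so the dependency graph that Disnix has to order remains a simple star and contains no cycles.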

By providing an infrastructure model containing a number of machines and writing a distribution model that maps the nodes to the machines, such as:

{infrastructure}:

{
  ChordBootstrapNode = [ infrastructure.test1 ];
  ChordNode1 = [ infrastructure.test1 ];
  ChordNode2 = [ infrastructure.test2 ];
  ChordNode3 = [ infrastructure.test2 ];
}

we can deploy a Chord network consisting of 4 nodes distributed over two machines by running:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix
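The infrastructure model passed with the -i parameter is not shown in this post. A minimal sketch for the two machines might look as follows. The hostnames are taken from the console output further on, while the exact container configuration, in particular the empty `process` container, is an assumption that may need to be adapted to the actual machines.

```nix
{
  test1 = {
    properties = {
      hostname = "test1";
    };
    containers = {
      # Assumed: a container that can host plain processes
      process = {};
    };
  };

  test2 = {
    properties = {
      hostname = "test2";
    };
    containers = {
      process = {};
    };
  };
}
```

The attribute names test1 and test2 are what the distribution model refers to as infrastructure.test1 and infrastructure.test2.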

This is the resulting deployment architecture of the Chord network that gets deployed:


In the above picture, the light grey colored boxes denote machines, the dark grey colored boxes container environments, the ovals services and the arrows inter-dependency relationships.

By running the OpenChord console, we can join any of our nodes in the network, such as the third node deployed to machine test2:

$ /nix/var/nix/profiles/disnix/default/bin/openchord-console
> joinN -port 9000 -bootstrap test2:8001
Trying to join chord network with boostrap URL ocsocket://test2:8001/
URL of created chord node ocsocket://192.168.56.102:9000/.

we can check the references that the console node has:

> refsN
Node: C1 F0 42 95 , ocsocket://192.168.56.102:9000/
Finger table:
  59 E4 86 AC , ocsocket://test2:8001/ (0-159)
Successor List:
  59 E4 86 AC , ocsocket://test2:8001/
  64 F1 96 B9 , ocsocket://test1:8001/
Predecessor: 9C 51 42 1F , ocsocket://test2:8002/

As may be observed in the output above, our predecessor is the node listening on test2:8002 and our successors are the nodes listening on test2:8001 and test1:8001.

We can also insert and retrieve the data we want:

> insertN -key test -value test
> entriesN
Entries:
  key = A9 4A 8F E5 , value = [( key = A9 4A 8F E5 , value = test)]

Defining services with circular dependencies in Disnix


As shown in the previous section, the ring structure of a Chord hash table is constructed at runtime. As a result, Disnix does not need to manage any circular dependencies. Instead, it only has to know the dependencies of the bootstrap phase, which are not cyclic at all.

I was also curious whether I could modify Disnix to properly define circular dependencies, without any workarounds such as directly propagating properties from the distribution model. As explained in the introduction, inter-dependencies have two properties, of which the second is problematic: the ordering constraint.

To cope with the problematic ordering property, I have introduced a new property in the services model called connectsTo, allowing users to specify inter-dependencies for which the ordering does not matter. The connectsTo property makes it possible for services to define mutual dependencies on each other.

As an example case, I have extended the Disnix composition examples (a set of trivial examples implementing "Hello world" testcases) with a cyclic example case. In this new sub example, I have created a web application that contains both a server returning the "Hello world!" string and a client displaying the string. The result would be the following screen:


(Does it look cool? :p)

A web application instance is capable of connecting to another web service to obtain the "Hello world!" message to display. We can compose two web application instances that refer to each other to accomplish this.

The corresponding services model looks as follows:

{distribution, invDistribution, system, pkgs}:

let customPkgs = import ../top-level/all-packages.nix { 
  inherit system pkgs;
};
in
rec {
  HelloWorldCycle1 = {
    name = "HelloWorldCycle1";
    pkg = customPkgs.HelloWorldCycle;
    connectsTo = {
      # Depends on the other cyclic service
      HelloWorldCycle = HelloWorldCycle2;
    };
    type = "tomcat-webapplication";
  };

  HelloWorldCycle2 = {
    name = "HelloWorldCycle2";
    pkg = customPkgs.HelloWorldCycle;
    connectsTo = {
      # Depends on the other cyclic service
      HelloWorldCycle = HelloWorldCycle1;
    };
    type = "tomcat-webapplication";
  };
}

As may be observed in the above code fragment, the first service has a dependency on the second, while the second also has a dependency on the first. They are allowed to refer to each other because the connectsTo property disregards ordering.

By mapping the services to a network of machines that have Apache Tomcat hosted:

{infrastructure}:

{
  HelloWorldCycle1 = [ infrastructure.test1 ];
  HelloWorldCycle2 = [ infrastructure.test2 ];
}

and deploying the system:

$ disnix-env -s services-cyclic.nix \
  -i infrastructure.nix \
  -d distribution-cyclic.nix

we end up with a deployment architecture of two services having cyclic dependencies:


To produce the above visualization, I have extended the disnix-visualize tool with support for the connectsTo property: it displays these inter-dependencies as dashed arrows (as opposed to the solid arrows that denote ordinary inter-dependencies).

In addition to the option to specify circular dependencies, the connectsTo property has another interesting use case -- when services have inter-dependencies that may be broken, we can optimize the duration of an upgrade process.

Normally, when a service gets upgraded, all its inter-dependent services will be reactivated. This is an implication of Disnix's strictness -- a service is either available or unavailable, but never broken because of missing inter-dependencies.

However, all the extra reactivations in the upgrade phase can be quite expensive as a result. If a link is non-critical and it is permitted to be down for a short while, then redeployments can be made faster.

Conclusion


In this blog post, I have described two deployment experiments with Disnix involving systems that have circular dependencies -- a Chord-based distributed hash table (that constructs a ring structure at runtime) and a trivial toy example system in which two services have mutual dependencies on each other.

Availability


The newly introduced connectsTo property is part of the development version of Disnix and will become available in the next release.

The composition example and newly created Chord example can be found on my GitHub page.

Wednesday, January 31, 2018

Diagnosing problems and running maintenance tasks in a network with services deployed by Disnix

I have been maintaining a production system with Disnix for quite some time. Although deployment works quite conveniently for me (I am probably a bit biased, since I created Disnix :-) ), you cannot get around unforeseen incidents and problems, such as:

  • Crashing processes due to bugs or excessive load.
  • Database problems, such as inconsistencies in the data.

Errors in distributed systems are typically much more difficult to debug than single machine system failures. For example, tracing the origins of an error in distributed systems is generally hard -- one service's fault may be caused by a message propagated by another service residing on a different machine in the network.

But even if you know the origins of an error (e.g. you can clearly observe that a web application is crashing or a database connection is failing), you may face other kinds of challenges:

  • You have to figure out to which machine in the network a service has been deployed.
  • You have to connect to the machine, e.g. through an SSH connection, to run debugging tasks.
  • You have to know the configuration properties of a service to diagnose it -- in Disnix, as explained in earlier blog posts, services can take any form -- they can be web services, but also web applications, databases and processes.

Because of these challenges, diagnosing errors and running maintenance tasks in a system deployed by Disnix is often unnecessarily time-consuming and inconvenient.

To alleviate this burden, I have developed a small tool and extension that establishes remote shell connections with environments providing all relevant configuration properties. Furthermore, the tool gives suggestions to the end-user explaining what kinds of maintenance tasks they could carry out.

The shell activity of Dysnomia


As explained in previous Disnix-related blog posts, Disnix carries out all activities to deploy a service-oriented system to a network of machines (i.e. to bring it into a running state), such as building services from source code, distributing their intra-dependency closures to the target machines, and activating or deactivating every service.

For the build and distribution activities, Disnix uses, as its name implies, the Nix package manager because it offers a number of powerful properties, such as strong reproducibility guarantees and atomic upgrades and rollbacks.

For the remaining activities that Nix does not support, e.g. activating or deactivating services, Disnix uses a companion tool called Dysnomia. Because services in a Disnix context could take any form, there is no generic means to activate or deactivate them -- for this reason, Dysnomia provides a plugin system with modules that carry out specific activities for a specific service type.

One of the plugins that Dysnomia provides is the deployment of MySQL databases to a MySQL DBMS server. Dysnomia deployment activities are driven by two kinds of configuration specifications. A component configuration defines the properties of a deployable unit, such as a MySQL database:

create table author
( AUTHOR_ID  INTEGER       NOT NULL,
  FirstName  VARCHAR(255)  NOT NULL,
  LastName   VARCHAR(255)  NOT NULL,
  PRIMARY KEY(AUTHOR_ID)
);

create table books
( ISBN       VARCHAR(255)  NOT NULL,
  Title      VARCHAR(255)  NOT NULL,
  AUTHOR_ID  INTEGER       NOT NULL,
  PRIMARY KEY(ISBN),
  FOREIGN KEY(AUTHOR_ID) references author(AUTHOR_ID) on update cascade on delete cascade
);

The above configuration is a MySQL script (~/testdb) that creates the database schema consisting of two tables.

The container configuration captures properties of the environment in which the component should be hosted, which is in this particular case, a MySQL DBMS server:

type=mysql-database
mysqlUsername=root
mysqlPassword=verysecret

The above container configuration (~/mysql-production) defines the type, stating that the mysql-database plugin must be used, and provides the authentication credentials required to connect to the DBMS server.

The Dysnomia plugin for MySQL implements various kinds of deployment activities for MySQL databases. For example, the activation activity is implemented as follows:

...

case "$1" in
    activate)
        # Initialize the given schema if the database does not exist
        if [ "$(echo "show databases" | @mysql@ --user=$mysqlUsername --password=$mysqlPassword -N | grep -x $componentName)" = "" ]
        then
            ( echo "create database $componentName;"
              echo "use $componentName;"
              
              if [ -d $2/mysql-databases ]
              then
                  cat $2/mysql-databases/*.sql
              fi
            ) | @mysql@ $socketArg --user=$mysqlUsername --password=$mysqlPassword -N
        fi
        markComponentAsActive
    ;;

    ...
esac

The above code fragment checks whether a database with the name of the component already exists and, if it does not, creates it by running the database initialization script provided by the component configuration. As may also be observed, the above activity uses the container properties (such as the authentication credentials) as environment variables.
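The create-if-missing check can also be tried in isolation. The following is a minimal sketch, not the plugin's actual code: list_databases is a hypothetical stand-in for the output of "show databases", and the echo statements stand in for the SQL that would be piped to the client. It mainly shows why matching whole lines with grep -x matters (the extra -F flag, which treats the name as a literal string, is my own addition):

```shell
#!/bin/sh

# Hypothetical stand-in for the output of "show databases" on a DBMS server.
list_databases() {
    printf '%s\n' information_schema mysql staff_archive
}

# Succeeds only if a database with exactly this name is listed.
# -x matches whole lines, -F treats the name as a literal string.
database_exists() {
    list_databases | grep -qxF -- "$1"
}

# Create-if-missing step; the echo stands in for the SQL sent to the client.
activate_database() {
    if database_exists "$1"
    then
        echo "$1: already present, nothing to do"
    else
        echo "$1: would be created"
    fi
}

activate_database staff
activate_database staff_archive
```

Because whole lines are matched, the existing staff_archive database does not prevent a database named staff from being initialized.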

Dysnomia activities can be executed by invoking the dysnomia command-line tool. For example, the following command will activate the MySQL database in the MySQL database server:

$ dysnomia --operation activate \
  --component ~/testdb --container ~/mysql-production

To make the execution of arbitrary tasks more convenient, I have created a new Dysnomia option called shell. The shell operation is basically an activity that does not execute anything, but instead spawns a shell session that provides the container configuration properties as environment variables.

Moreover, the shell activity of a Dysnomia plugin typically displays suggestions for shell commands that the user may want to carry out.

For example, when we run the following command:

$ dysnomia --shell \
  --component ~/testdb --container ~/mysql-production

Dysnomia spawns a shell session that shows the following:

This is a shell session that can be used to control the 'staff' MySQL database.

Module specific environment variables:
mysqlUsername  Username of the account that has the privileges to administer
               the database
mysqlPassword  Password of the above account
mysqlSocket    Path to the UNIX domain socket that is used to connect to the
               server (optional)

Some useful commands:
/nix/store/h0kcf5g2ssyancr9m2i8sr09b3wq2zy0-mariadb-10.1.28/bin/mysql  --user=$mysqlUsername --password=$mysqlPassword staff Start a MySQL interactive terminal

General environment variables:
this_dysnomia_module     Path to the Dysnomia module
this_component           Path to the mutable component
this_container           Path to the container configuration file

[dysnomia-shell:~]# 

By executing the suggested command in the above shell session, we get a MySQL interactive terminal allowing us to execute arbitrary SQL commands. It saves us the burden of looking up all the MySQL configuration properties, such as the authentication credentials and the database name.
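The idea of exposing container properties as environment variables can be approximated in a few lines of plain shell. This is only a sketch of the general technique, not a description of how Dysnomia implements it: the function name load_container_properties is hypothetical, and the file contents are copied from the container configuration shown earlier.

```shell
#!/bin/sh

# Write the example container configuration to a temporary file.
container=$(mktemp)
cat > "$container" <<'EOF'
type=mysql-database
mysqlUsername=root
mysqlPassword=verysecret
EOF

# Source the key=value pairs. 'set -a' marks every variable that gets
# assigned while it is active for export to child processes.
load_container_properties() {
    set -a
    . "$1"
    set +a
}

load_container_properties "$container"

# A child process now sees the properties as environment variables.
sh -c 'echo "user=$mysqlUsername type=$type"'

rm -f "$container"
```

Any command started afterwards, including an interactive shell, inherits the same variables, which is what makes the suggested mysql command line work without typing the credentials.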

The Dysnomia shell feature is heavily inspired by nix-shell, which works in quite a similar way -- it takes the build dependencies of a package build as inputs (which typically manifest themselves as environment variables) and fetches the sources, but it does not execute the package build procedure. Instead, it spawns an interactive shell session allowing the user to execute arbitrary build tasks. This Nix feature is particularly useful for development projects.

Diagnosing services with Disnix


In addition to extending Dysnomia with the shell feature, I have also extended Disnix to make this feature available in a distributed context.

The following command can be executed to spawn a shell for a particular service of the ridiculous staff tracker example (that happens to be a MySQL database):

$ disnix-diagnose -S staff
[test2]: Connecting to service: /nix/store/yazjd3hcb9ds160cq03z66y5crbxiwq0-staff deployed to container: mysql-database
This is a shell session that can be used to control the 'staff' MySQL database.

Module specific environment variables:
mysqlUsername  Username of the account that has the privileges to administer
               the database
mysqlPassword  Password of the above account
mysqlSocket    Path to the UNIX domain socket that is used to connect to the
               server (optional)

Some useful commands:
/nix/store/h0kcf5g2ssyancr9m2i8sr09b3wq2zy0-mariadb-10.1.28/bin/mysql  --user=$mysqlUsername --password=$mysqlPassword staff Start a MySQL interactive terminal

General environment variables:
this_dysnomia_module     Path to the Dysnomia module
this_component           Path to the mutable component
this_container           Path to the container configuration file

[dysnomia-shell:~]# 

The above command-line instruction looks up the location of the staff database in the configuration of the system that is currently deployed, connects to it (typically through SSH) and spawns a Dysnomia shell for the given service type.

In addition to an interactive shell, you can also directly run shell commands. For example, the following command will query all the staff records:

$ disnix-diagnose -S staff \
  --command 'echo "select * from staff" | mysql --user=$mysqlUsername --password=$mysqlPassword staff'

In most cases, only one instance of a service exists, but Disnix can also deploy redundant instances of the same service. For example, we may want to deploy two redundant instances of the web application front end in the distribution.nix configuration file:

stafftracker = [ infrastructure.test1 infrastructure.test2 ];

When trying to spawn a Dysnomia shell, the tool returns an error because it does not know which instance to connect to:

$ disnix-diagnose -S stafftracker
Multiple mappings found! Please specify a --target and, optionally, a
--container parameter! Alternatively, you can execute commands for all possible
service mappings by providing a --command parameter.

This service has been mapped to:

container: apache-webapplication, target: test1
container: apache-webapplication, target: test2

In this case, we must refine our query with a --target parameter. For example, the following command connects to the web front-end on the test1 machine:

$ disnix-diagnose -S stafftracker --target test1

It is still possible to execute remote shell commands for redundantly deployed services. For example, the following command gets executed twice, because we have two instances deployed:

$ disnix-diagnose -S stafftracker \
  --command 'echo I will see this message two times!'

In some cases, you may want to execute other kinds of maintenance tasks or you simply want to know where a particular service resides. This can be done by running the following command:

$ disnix-diagnose -S stafftracker --show-mappings
This service has been mapped to:

container: apache-webapplication, target: test1
container: apache-webapplication, target: test2
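The lookup behaviour described in this section (one mapping is used directly, several mappings require an explicit choice) can be illustrated with a small self-contained sketch. Everything in it is an assumption made for illustration: the mappings table, the targets_of and resolve_target functions and the error message are hypothetical and do not reflect how disnix-diagnose stores or resolves its mappings.

```shell
#!/bin/sh

# Hypothetical mapping table: service, container, target (one mapping per line).
mappings='staff mysql-database test2
stafftracker apache-webapplication test1
stafftracker apache-webapplication test2'

# Print the targets that a service is mapped to, one per line.
targets_of() {
    printf '%s\n' "$mappings" | awk -v service="$1" '$1 == service { print $3 }'
}

# Print the single target of a service, or fail when the choice is ambiguous
# (or when the service is not mapped at all).
resolve_target() {
    targets=$(targets_of "$1")
    count=$(printf '%s\n' "$targets" | grep -c . || true)
    if [ "$count" -eq 1 ]
    then
        printf '%s\n' "$targets"
    else
        echo "Service $1 has $count mappings; an explicit target is needed" >&2
        return 1
    fi
}

resolve_target staff
resolve_target stafftracker || echo "stafftracker needs a --target"
```

In this sketch the staff service resolves to a single machine, whereas stafftracker is rejected until the caller narrows the query down, which mirrors the role of the --target parameter.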

Conclusion


In this blog post, I have described a new feature of Dysnomia and Disnix that spawns interactive shell sessions making problem solving and maintenance tasks more convenient.

disnix-diagnose and the shell extension are part of the development versions of Disnix and Dysnomia and will become available in the next release.

Monday, January 8, 2018

Syntax highlighting Nix expressions in mcedit

The year 2017 has passed and 2018 has now started. For quite a few people, this is a good moment for reflection (as I have done in my previous blog post) and to think about new year's resolutions. New year's resolutions are typically about adopting good new habits and rejecting old bad ones.

Orthodox file managers



One of my unconventional habits is that I like orthodox file managers and that I extensively use them. Orthodox file managers have a number of interesting properties:

  • They typically display textual lists of files, as opposed to icons or thumbnails.
  • They typically have two panels for displaying files: one source and one destination panel.
  • They may also have a third panel (typically placed underneath the source and destination panels) that serves as a command-line prompt.

The first orthodox file manager I ever used was DirectoryOpus on the Commodore Amiga. For nearly all operating systems and desktop environments that I have touched ever since, I have been using some kind of orthodox file manager, such as:


Over the years, I have received many questions from various kinds of people -- they typically ask me what is so appealing about using such a "weird program" and why I have never considered switching to a more "traditional way" of working, because "that would be more efficient".

Aside from the fact that it is probably mostly inertia, my motivating factors are the following:

  • Lists of files allow me to see more relevant and interesting details. In many traditional file managers, much of the screen space is wasted by icons and the spacing between them. Furthermore, traditional file managers often hide file properties that I want to know about, such as a file's size or modification timestamp.
  • Some file operations involve a source and destination, such as copying or moving files. In an orthodox file manager, these operations can be executed much more intuitively IMO because there is always a source and destination panel present. When I am using a traditional file manager, I typically have to interrupt my workflow to open a second destination window, and use it to browse to my target location.
  • All the orthodox file managers I have mentioned implement virtual file system support allowing me to browse compressed archives and remote network locations as if they were directories.

    Nowadays, VFS support is not exclusive to orthodox file managers anymore, but it has existed in orthodox file managers for much longer.

    Moreover, I consider the VFS properties of orthodox file managers to be much more powerful. For example, the Windows file explorer can browse Zip archives, but Total Commander also has first class support for many more kinds of archives, such as RAR, ACE, LhA, 7-zip and tarballs, and can be easily extended to support many other kinds of file systems by an add-on system.
  • They have very powerful search properties. For example, searching for a collection of files having certain kinds of text patterns can be done quite conveniently.

    As with VFS support, this feature is not exclusive to orthodox file managers, but I have noticed that their search functions are still considerably more powerful than those of most traditional file managers.

Of all the orthodox file managers listed above, Midnight Commander is the one I have been using the longest -- it was one of the first programs I used when I started using Linux (in 1999) and I have been using it ever since.

Midnight Commander also includes a text editor named mcedit that integrates nicely with the search function. Although I have experience with half a dozen editors (such as vim and various IDEs, such as Eclipse and Netbeans), I have been using mcedit, mostly for editing configuration files, shell scripts and simple programs.

Syntax highlighting in mcedit


Earlier in the introduction I mentioned "new year's resolutions", which may suggest that I intend to quit using orthodox file managers and an unconventional editor, such as mcedit. Actually, this is not something I am planning to do :-).

In addition to Midnight Commander and mcedit, I have also been using another unconventional program since late 2007: the Nix package manager.

What I noticed is that, despite being primitive, mcedit has reasonable syntax highlighting support for a variety of programming languages. Unfortunately, what I still miss is support for the Nix expression language -- the DSL that is used to specify package builds and system configurations.

For quite some time, editing Nix expressions was a primitive process for me. To improve my unconventional way of working a bit, I have decided to address this shortcoming in my Christmas break by creating a Nix syntax configuration file for mcedit.

Implementing a syntax configuration for the Nix expression language


mcedit provides syntax highlighting (the format is described in the manual page) for a number of programming languages. The syntax highlighting configurations seem to follow similar conventions, probably because of the fact that programming languages influence each other a lot.

As with many programming languages, the Nix expression language has its own influences as well, such as Haskell, C, bash, JavaScript (more specifically: the JSON subset) and Perl.

I have decided to adopt similar syntax highlighting conventions in the Nix expression syntax configuration. I started by examining Nix's lexer module (src/libexpr/lexer.l):

  • First, I took the keywords and operators, and configured the syntax highlighter to color them yellow. Coloring keywords yellow is a convention that other syntax highlighting configurations also seem to follow.
  • Then I implemented support for single line and multi-line comments. The context directive turned out to be very helpful -- it makes it possible to color all characters between a start and stop token. Comments in mcedit are typically brown.
  • The next step was the numbers. Unfortunately, the syntax highlighter does not have full support for regular expressions. For example, you cannot specify character ranges, such as [0-9]+. Instead you must enumerate all characters one by one:

    keyword whole \[0123456789\]
    

    Floating point numbers were a bit trickier to support, but fortunately I could steal them from the JavaScript syntax highlighter, since the formatting Nix uses is exactly the same.
  • Strings were also relatively simple to implement (with the exception of anti-quotations) by using the context directive. I have configured the syntax highlighter to color them green, similar to other programming languages.
  • The Nix expression language also supports objects of the URL or path type. Since there is no other language that I am aware of that has a similar property, I have decided to color them white, with the exception of system paths -- system paths look very similar to the C preprocessor's #include path arguments, so I have decided to color them red, similar to the C syntax highlighter.

    To properly support paths, I implemented an approximation of the regular expression used in Nix's lexer. Without full regular expression support, it is extremely difficult to make a direct translation, but for all my use cases it seems to work fine.

After configuring the above properties, I noticed that there were still some bits missing. The next step was opening the parser configuration (src/libexpr/parser.y) and looking for any missing characters.

I discovered that there were still separators that I needed to add (e.g. parentheses, brackets, semi-colons etc.). I have configured the syntax highlighter to color them bright cyan, with the exception of semi-colons -- I colored them purple, similar to the C and JavaScript syntax highlighters.

I also added syntax highlighting for the builtin functions (e.g. derivation, map and toString) so that they appear in cyan. This convention is similar to bash's syntax highlighting.

The implementation process of the Nix syntax configuration was generally straightforward, except for one thing -- anti-quotations. Because we only have a primitive lexer and no parser, it is impossible to have a configuration that covers all possibilities. For example, anti-quotations in strings that embed strings cannot be properly supported. I ended up with an implementation that only works for simple cases (e.g. a reference to an identifier or a file).

Results


The syntax highlighter works quite well for the majority of expressions in the Nix packages collection. For example, the expression for the Disnix package looks as follows:


The top-level expression that contains the package compositions looks as follows:


Also, most Hydra release.nix configurations seem to work well, such as the one used for node2nix:


Availability


The Nix syntax configuration can be obtained from my GitHub page. It can be used by installing it in a user's personal configuration directory, or by deploying a patched version of Midnight Commander. More details can be found in the README.