Archive for February, 2011

Why we use Cfengine: memory footprint

Wednesday, February 23rd, 2011

Here at Normation, we use Cfengine 3 extensively for configuration management across Linux and Windows servers. A question we often get is: why Cfengine?

This is phrased either as “What is so great about Cfengine?” or “What is the difference between Cfengine and Puppet or Chef?” (as a reminder of how these three projects are related, check out Relative origin of Cfengine, Puppet and Chef).

I’d like to focus this post on memory consumption. Since Configuration Management software runs an agent on each server you want to manage, you want to be careful about the extra resources you’ll need to run it…

Cfengine components

Before getting into the statistics, we need to know what processes we’re looking at.

The main Cfengine process that applies configuration to a managed node is named cf-agent. When this process is run, it reads its local configuration (called promises) and attempts to apply it to the local machine, by running various commands. Three daemons can be run to support this process:

  1. cf-execd: In charge of running cf-agent on a regular basis. By default, it fires up every 5 minutes, then reports any changes to the configuration by email. This daemon would normally be run on all managed nodes.
  2. cf-serverd: Acts as a server, accepting incoming connections from authorized machines, for two reasons: sharing files from the local machine (this is used on a policy server, less frequently on managed nodes) and allowing remote on-demand execution of cf-agent. It is often run on all managed nodes, to allow instant policy application or fetching generated reports, but its use is optional.
  3. cf-monitord: Collects system statistics, and makes them available to cf-agent so that it may apply different configuration based on a machine’s current status (for example, if a disk is getting full, run some housekeeping operations). Its use is also optional, but highly useful.

Statistics

With no further ado, here is the memory consumption we get on our servers for each component:

[Figure: Cfengine daemons memory consumption. Graph of RAM (RSS) used by cf-execd, cf-serverd and cf-monitord.]

We couldn’t get graphs for the actual cf-agent process: its runtime is just too short for the monitoring probe to pick it up regularly. Running it manually, we see its memory consumption peaking at 10 megabytes of RAM, with a total runtime of roughly 1.5 seconds.
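For anyone wanting to reproduce this kind of manual measurement, here is a minimal Python sketch. It is an illustration only, not necessarily how the figures above were obtained. It assumes a Linux host, where /proc/<pid>/status exposes a VmHWM field (the peak resident set size of the process); the helper names read_kib and measure_peak are made up for this example.

```python
import subprocess
import time


def read_kib(status_text, field):
    """Return the value in KiB of a field such as 'VmHWM' from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    return None  # field absent (e.g. the process has already exited)


def measure_peak(cmd, interval=0.01):
    """Run cmd and poll its peak RSS (VmHWM) until it exits.

    Returns (wall_seconds, peak_kib). Linux only: peak_kib is None
    if no sample could be read before the process exited.
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    peak = None
    while proc.poll() is None:
        try:
            with open(f"/proc/{proc.pid}/status") as f:
                sample = read_kib(f.read(), "VmHWM")
        except OSError:
            sample = None  # process vanished between poll() and open()
        if sample is not None:
            peak = sample if peak is None else max(peak, sample)
        time.sleep(interval)
    return time.monotonic() - start, peak
```

Calling measure_peak(["cf-agent"]) (adjust the path and arguments to your installation) returns the wall-clock runtime and the highest VmHWM value sampled. Because the process is polled, any growth during the last few milliseconds before exit can be missed.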

Analysis

I think the graphs speak for themselves: each Cfengine daemon uses around 3 megabytes of RAM, and doesn’t have any visible memory leak (valgrind does confirm this). The agent itself sees slightly higher peaks, at around 10 megabytes, for a second or two on each execution.

This is why we trust Cfengine to be run on nodes old and new alike, from physical machines with more gigabytes of RAM than you can use down to tiny virtual machines running on only 128 MB (I’m not sure why, but we have more of the latter… I’m told it’s a budget problem).
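To put those numbers in perspective on a 128 MB machine, here is a back-of-the-envelope calculation in Python. It uses only the approximate figures quoted in this post, and it assumes that all three daemons are running and that their memory figures can simply be added together, which is a simplification.

```python
# Approximate figures quoted in this post
DAEMON_RSS_MB = 3         # per daemon
DAEMON_COUNT = 3          # cf-execd, cf-serverd, cf-monitord (assumes all three run)
AGENT_PEAK_MB = 10        # cf-agent peak
AGENT_RUNTIME_S = 1.5     # one cf-agent run
RUN_INTERVAL_S = 5 * 60   # cf-agent runs every 5 minutes
VM_RAM_MB = 128           # the small virtual machines mentioned above


def footprint():
    """Return steady-state and peak memory use, plus the share of time cf-agent runs."""
    steady_mb = DAEMON_RSS_MB * DAEMON_COUNT
    peak_mb = steady_mb + AGENT_PEAK_MB
    return {
        "steady_mb": steady_mb,
        "peak_mb": peak_mb,
        "steady_pct_of_vm": 100 * steady_mb / VM_RAM_MB,
        "peak_pct_of_vm": 100 * peak_mb / VM_RAM_MB,
        "agent_duty_cycle_pct": 100 * AGENT_RUNTIME_S / RUN_INTERVAL_S,
    }
```

With these inputs, the daemons account for about 9 MB (roughly 7% of a 128 MB machine), rising to about 19 MB (roughly 15%) while cf-agent runs, which is only about 0.5% of the time.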

The CPU usage of Cfengine is also very lightweight – but much harder to graph. Various other optimizations allow it to be extremely non-intrusive… More on these topics soon!

Some details for the curious:

  • No restarts occurred over the graph period.
  • The promises running while graphed cover system basics: ensuring required packages are installed, configured and running (SSH, monitoring, everyday tools, vim, etc), creating users, checking their passwords, copying SSH keys, and the like.
  • We run cf-agent every 5 minutes.
  • The graph is of RSS (Resident Set Size), or, in other words, the non-swapped physical memory used. The server was not using any swap at the time, so this is effectively the memory consumption of each process, excluding any shared libraries. The only shared libraries used are pretty standard on current UNIX systems: PCRE (Perl Compatible Regular Expressions) and BerkeleyDB, so they’re likely to be loaded already.
  • These graphs are based on Cfengine Community 3.1.4, currently the latest version.
  • Graph generated using Munin.
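The graphs above come from Munin, but a one-off reading of the same RSS figure can be taken straight from /proc on Linux. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes the daemons show up under the process names cf-execd, cf-serverd and cf-monitord.

```python
import os

# Process names assumed to match the daemon names used in this post
DAEMONS = ("cf-execd", "cf-serverd", "cf-monitord")


def parse_status(text):
    """Parse /proc/<pid>/status text into a dict of field name -> raw string value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def daemon_rss(names=DAEMONS, proc_root="/proc"):
    """Return {name: total VmRSS in KiB} for running processes whose name is in names."""
    totals = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = parse_status(f.read())
        except OSError:
            continue  # process exited while we were scanning
        name = fields.get("Name")
        rss = fields.get("VmRSS")  # absent for kernel threads
        if name in names and rss:
            totals[name] = totals.get(name, 0) + int(rss.split()[0])
    return totals
```

Running daemon_rss() on a managed node returns one entry per daemon found, in KiB. Daemons that are not running are simply missing from the result.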

FOSDEM: Configuration Management wishlist

Saturday, February 12th, 2011

We were at FOSDEM in Brussels last weekend (OK, like every year, and like thousands of others: yes, it’s that good an event!).

Alongside a huge number of interesting talks, events and people, of particular interest to us was the Configuration Management DevRoom, organized by James and Nigel from Puppet Labs. It hosted brilliant talks all day, ranging from introductions to tools (Chef, Vagrant, Geppetto, FusionInventory, GLPI, OPSI, etc) to best practices and real-world feedback, from small companies up to the aviation industry.

We presented a session about Disaster Recovery, telling the tale of how things went massively wrong with our production systems, and how using configuration management saved us. The slides below include an introduction to Cfengine and some of the reasons why we chose it in 2009 and still love it.

A series of posts on this blog will go into more detail about these reasons in weeks to come – stay tuned!

Thank you to all those who woke up early to attend the talk. The room was completely full, so to those who couldn’t get in because of the large attendance, apologies!

Two major wishlist items have come out of our experience:

  1. Backup and Configuration Management tools need real integration. Configuration Management is often highlighted as a life-saver for disaster recovery, but most people only go as far as automating service installation and configuration. Fully rolling out a service requires restoring backups too.
    • How can we automate our backup restoration?
    • How can we contact the “backup manager” to get the latest data and put it in the right place, the right way? (copy files, reload databases then restart services, or even update firewall rules)
    • How can we check whether our current data is up-to-date or not?
  2. Virtualization provisioning should be managed too. Our production systems rely on several big servers running many small virtual machines. Restoring these was the most time-consuming aspect of our disaster recovery.
    • With abstraction layers such as libvirt now covering almost all modern virtualization systems, why do our configuration management tools not tie in to them better?
    • How can we define a list of virtual machines to set up, their parameters, the operating system to install and its installation settings, so that we can sit back and watch tools doing our work?

On this last point, I must mention that Cfengine Nova (the commercial version of Cfengine) already ties into libvirt to define virtual machines and change their settings. This is awesome, but installing the operating systems by hand is still a pain!

Any ideas or suggestions out there?