Posts Tagged ‘Configuration Management’

Cfengine tip: Keeping cf-execd alive (whatever happens)

Thursday, March 3rd, 2011

As good as your configuration management tool may be, it can only do its job if it’s running. Here are some tips for making sure it is, whatever happens.

Why just cf-execd?

With Cfengine, the “heavy lifting” is done by cf-agent, which is normally run at a regular interval by cf-execd (a daemon that runs all the time).

Frequently, servers will also run two other daemons: cf-monitord (keeps statistics on a system) and cf-serverd (allows sharing of local files, and remote on-demand execution of cf-agent). Common practice is to include in cf-agent’s configuration a promise that ensures that the desired daemons are running, and starts them if not.

This makes sense, but what happens if cf-execd gets stopped, and then cf-agent is never run again? Well, this should never happen of course. But, out there in the real world, stuff happens:

  • maybe you ran out of RAM, and OOM-killer picked cf-execd for some weird reason
  • an administrator unwisely killed cf-execd without really knowing what it does
  • possibly you messed up your configuration and had it automatically killed (after all, to err is human …)

OK, so how do you avoid that?

Enough rambling, here is what we do:

  1. Use a promise in the configuration that ensures the daemons we want running are indeed running, or start them if not
  2. Configure cron to check on cf-execd, and start it if it’s not running

The promise we use is derived from one provided with the Cfengine sources, as follows:

The above Cfengine example uses some interesting concepts:

  • A list of the daemon names to check, which is iterated over by each of the three following promises, and reused in their attributes.
  • Ordering: the first promise, a processes promise, checks if a daemon is running and defines a class if not; then, the second promise restarts the daemon if that class was set, and finally, a report is printed if the restart went OK.
  • Generic class names: whatever the daemons you want to check, these class names will automatically be set and read.
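To make the control flow of those three promises concrete, here is a rough Python model of the same logic. It is not Cfengine policy: the function names and the `start` callback are illustrative assumptions, and `canonify` only imitates what Cfengine's function of the same name does (turning a string into a legal class name).

```python
import re
from typing import Callable, Iterable, List, Set


def canonify(name: str) -> str:
    """Imitate Cfengine's canonify(): anything outside [A-Za-z0-9_] becomes '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def check_daemons(
    daemons: Iterable[str],
    running: Set[str],
    start: Callable[[str], bool],
) -> List[str]:
    """Run one check/restart/report pass and return the report lines."""
    daemons = list(daemons)  # the list is iterated once per step below
    classes: Set[str] = set()
    reports: List[str] = []

    # Step 1 (the processes promise): define a class for each missing daemon.
    for daemon in daemons:
        if daemon not in running:
            classes.add(canonify(f"start_{daemon}"))

    # Step 2 (the restart promise): only acts if the matching class is set.
    for daemon in daemons:
        if canonify(f"start_{daemon}") in classes and start(daemon):
            classes.add(canonify(f"restarted_{daemon}"))

    # Step 3 (the report): only printed if the restart went OK.
    for daemon in daemons:
        if canonify(f"restarted_{daemon}") in classes:
            reports.append(f"Restarted {daemon}")

    return reports


if __name__ == "__main__":
    # Pretend only cf-serverd is alive and that every start attempt succeeds.
    print(check_daemons(
        ["cf-execd", "cf-serverd", "cf-monitord"],
        running={"cf-serverd"},
        start=lambda daemon: True,
    ))
```

Because the class names are built from the daemon names, adding a daemon to the list is the only change needed to have it checked, restarted and reported on.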

Last, but not least, here is the line we add to /etc/crontab:

Of course, we wouldn’t add that line manually, but instead use a Cfengine promise to add it if, and only if, it’s not already in /etc/crontab… This is a subject for another post, but here’s a sneak preview:

Using the above promises, you will ensure that the Cfengine components you want will always be running (or at least restarted if they stop). Of course, it’s probably a good idea to monitor these promises, so that you don’t end up with a start/stop fight…
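The “add it if, and only if, it’s not already there” behaviour is worth spelling out, since it is what makes the change safe to apply on every run. Below is a minimal Python sketch of that idea, not the Cfengine promise itself. The `ensure_line` helper is hypothetical, and the cron entry shown is only an assumed example of a watchdog line (a 5-minute schedule, `pgrep`, and the usual /var/cfengine/bin path), not the exact line we use.

```python
from pathlib import Path

# Hypothetical watchdog entry in /etc/crontab format (schedule, user, command).
# Assumption: pgrep is available and cf-execd lives in /var/cfengine/bin.
CRON_LINE = "*/5 * * * * root pgrep -x cf-execd >/dev/null || /var/cfengine/bin/cf-execd"


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to `path` unless an identical line is already present.

    Returns True if the file was changed, False if it was already compliant.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False
    # Rewrite the file with the new line appended and a trailing newline.
    path.write_text("\n".join(existing + [line]) + "\n")
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        crontab = Path(tmp) / "crontab"  # a scratch file, not the real /etc/crontab
        print(ensure_line(crontab, CRON_LINE))  # first run changes the file
        print(ensure_line(crontab, CRON_LINE))  # second run finds nothing to do
```

Running it twice in a row changes the file only once, which is the property that lets such a rule be re-applied every 5 minutes without piling up duplicate entries.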

Why we use Cfengine: memory footprint

Wednesday, February 23rd, 2011

Here at Normation, we use Cfengine 3 extensively for configuration management across Linux and Windows servers. A question we often get is: why Cfengine?

This is phrased either as What is so great about Cfengine? or What is the difference between Cfengine and Puppet or Chef? (as a reminder of how these 3 projects are related, check out Relative origin of Cfengine, Puppet and Chef).

I’d like to focus this post on memory consumption. Since Configuration Management software runs an agent on each server you want to manage, you want to be careful about the extra resources you’ll need to run it…

Cfengine components

Before getting into the statistics, we need to know what processes we’re looking at.

The main Cfengine process that applies configuration to a managed node is named cf-agent. When this process is run, it reads its local configuration (called promises) and attempts to apply that to the local machine, by running various commands. Three daemons can be run to support this process:

  1. cf-execd: In charge of running cf-agent on a regular basis. By default, it fires up every 5 minutes, then reports any changes to the configuration by email. This daemon would normally be run on all managed nodes.
  2. cf-serverd: Acts as a server, accepting incoming connections from authorized machines, for two reasons: sharing files from the local machine (this is used on a policy server, less frequently on managed nodes) and allowing remote on-demand execution of cf-agent. It is often run on all managed nodes, to allow instant policy application or fetching generated reports, but its use is optional.
  3. cf-monitord: Collects system statistics, and makes them available to cf-agent so that it may apply different configuration based on a machine’s current status (for example, if a disk is getting full, run some housekeeping operations). Its use is also optional, but highly useful.

Statistics

With no further ado, here is the memory consumption we get on our servers for each component:

[Figure: Cfengine daemons memory consumption. Graph of RAM (RSS) used by cf-execd, cf-serverd and cf-monitord.]

We couldn’t get graphs for the actual cf-agent process: its runtime is just too short for the monitoring probe to pick it up regularly. Running it manually, we see its memory consumption peaking at 10 megabytes of RAM, with a total runtime of roughly 1.5 seconds.

Analysis

I think the graphs speak for themselves – each Cfengine daemon uses around 3 megabytes of RAM, and doesn’t have any visible memory leak (valgrind does confirm this). The agent itself sees slightly higher peaks, at around 10 megabytes, for a few seconds every execution.

This is why we trust Cfengine to be run on nodes old and new alike, from physical machines with more gigabytes of RAM than you can use down to tiny virtual machines running on only 128 MB (I’m not sure why, but we have more of the latter… I’m told it’s a budget problem).
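As a back-of-the-envelope check on that 128 MB case, the figures quoted above can simply be added up. The short Python sketch below does so; the constants are the rounded values from this post, and the assumption that all three daemons run and that cf-agent is at its peak at the same moment is deliberately pessimistic.

```python
DAEMONS = ("cf-execd", "cf-serverd", "cf-monitord")
DAEMON_RSS_MB = 3   # approximate RSS per daemon, as read from the graph
AGENT_PEAK_MB = 10  # approximate cf-agent peak, seen for a few seconds per run


def peak_footprint_mb(n_daemons: int = len(DAEMONS)) -> int:
    """Worst case: every daemon resident while cf-agent is at its peak."""
    return n_daemons * DAEMON_RSS_MB + AGENT_PEAK_MB


def share_of_ram(total_ram_mb: int) -> float:
    """Fraction of a machine's RAM taken by that worst case."""
    return peak_footprint_mb() / total_ram_mb


if __name__ == "__main__":
    print(f"{peak_footprint_mb()} MB at peak")         # 19 MB at peak
    print(f"{share_of_ram(128):.1%} of a 128 MB VM")   # 14.8% of a 128 MB VM
```

Between agent runs only the daemons remain, so the steady-state figure is closer to 9 MB.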

The CPU usage of Cfengine is also very lightweight – but much harder to graph. Various other optimizations allow it to be extremely non-intrusive… More on these topics soon!

Some details for the curious:

  • No restarts occurred over the graph period.
  • The promises running while graphed cover system basics: ensuring required packages are installed, configured and running (SSH, monitoring, everyday tools, vim, etc), creating users, checking their passwords, copying SSH keys, and the like.
  • We run cf-agent every 5 minutes.
  • The graph is of RSS (Resident Set Size), or, in other words, the non-swapped physical memory used. The server was not using any swap at the time, so this is effectively the memory consumption of each process. RSS does count the resident pages of shared libraries, but the only ones used are pretty standard on current UNIX systems: PCRE (Perl Compatible Regular Expressions) and BerkeleyDB, so they’re likely to be loaded already.
  • These graphs are based on Cfengine Community 3.1.4, currently the latest version.
  • Graph generated using Munin.
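For anyone wanting to spot-check the RSS of a single process by hand, here is a small Python sketch that reads it from the kernel. It assumes a Linux system exposing /proc/<pid>/status with a VmRSS line; it is not how the graphs above were produced, and the function names are made up for the example.

```python
import re
from pathlib import Path
from typing import Optional


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_kb(pid: int) -> Optional[int]:
    """Return the resident set size of `pid` in kB (Linux only)."""
    return parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    import os

    # Measure this script's own process as a harmless demonstration.
    print(rss_kb(os.getpid()))
```

Pointing `rss_kb` at the PID of cf-execd, cf-serverd or cf-monitord gives a one-off reading comparable to a single point on the graph.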

FOSDEM: Configuration Management wishlist

Saturday, February 12th, 2011

We were at FOSDEM in Brussels last weekend (OK, like every year, and like thousands of others: yes, it’s that good an event!).

Alongside a huge number of interesting talks, events and people, of particular interest to us was the Configuration Management DevRoom, organized by James and Nigel from Puppet Labs. It hosted brilliant talks all day, ranging from tool introductions (Chef, Vagrant, Geppetto, FusionInventory, GLPI, OPSI, etc) to best practices and real-world feedback from organizations as varied as small companies and the aviation industry.

We presented a session about Disaster Recovery, telling the tale of how things went massively wrong with our production systems, and how using configuration management saved us. The slides below include an introduction to Cfengine and some of the reasons why we chose it in 2009 and still love it.

A series of posts on this blog will go into more detail about these reasons in weeks to come – stay tuned!

Thank you to all those who managed to wake up early and attend the talk. The room was completely full, so apologies to those who couldn’t get in!

Two major wishlist items have come out of our experience:

  1. Backup and Configuration Management tools need real integration. Configuration Management is often highlighted as a life-saver for disaster recovery, but most people only go as far as automating service installation and configuration. Fully rolling out a service requires restoring backups too.
    • How can we automate our backup restoration?
    • How can we contact the “backup manager” to get the latest data and put it in the right place, the right way? (copy files, reload databases then restart services, or even update firewall rules)
    • How can we check whether our current data is up-to-date or not?
  2. Virtualization provisioning should be managed too. Our production systems rely on several big servers running many small virtual machines. Restoring these was the most time consuming aspect of our disaster recovery.
    • With abstraction layers such as libvirt now covering almost all modern virtualization systems, why do our configuration management tools not tie in to them better?
    • How can we define a list of virtual machines to set up, their parameters, the operating system to install and its installation settings, so that we can sit back and watch tools doing our work?

On this last point, I must mention that Cfengine Nova (the commercial version of Cfengine) already ties into libvirt to define virtual machines and change their settings. This is awesome, but installing the operating systems by hand is still a pain!

Any ideas or suggestions out there?

Cfengine 3 presentation @ RMLL 2010

Thursday, July 8th, 2010

From 6 to 11 July 2010, the 11th RMLL is being hosted in Bordeaux: a great meeting for anyone interested in Free Software and its uses.

Among the numerous very interesting presentations and round tables, I gave a presentation about the advantages of configuration management, and how Cfengine 3 works to help you deal with all the potential issues that might arise in the industrialization of your configurations. This presentation can be a good introduction to what Cfengine can do, and how it does it.

PS: Stay tuned for the Ldap Synchronization Connector presentation by Jonathan Clarke

Conferences: Loadays and RMLL

Saturday, June 26th, 2010

Normation was lucky enough to take part in the first edition of Loadays, or Linux Open Administration Days. Here is a quick look back at the event, and a date for your diary for the next major free software conference: RMLL in Bordeaux, from 6 to 11 July 2010.

On the weekend of 10-11 April 2010, a school in the city of Antwerp, Belgium, welcomed around a hundred people for Loadays. The programme devoted a full day in the main room to configuration management: each of the main tools (Cfengine, Chef, Puppet, Canonical Landscape, …) was presented. Nicolas CHARLES from Normation presented Cfengine 3 and its ecosystem to an audience curious to discover the latest developments in the field. The presentation is available online.

Alongside this main theme, other sessions covered more varied topics: inventory with FusionInventory, groupware with Zarafa, LDAP directory management with GOSa … I presented a session on the directory synchronization tool LSC (Ldap Synchronization Connector). In a few slides, I covered the challenges of synchronizing identities across several repositories (OpenLDAP, Active Directory, SQL databases, …) and the solution that LSC offers. That presentation is also available online.

Beyond the presentations, the weekend led to many very interesting encounters and, as always, discussions ranging from heated debate to putting the world to rights over excellent Belgian beers… Many thanks to the organizers!

Next stop: the Rencontres Mondiales du Logiciel Libre (RMLL) in Bordeaux. We will be there for several days, and will have the honour of once again presenting sessions on these topics:

Managing configuration with Cfengine 3: concepts & theories

Wednesday, June 2nd, 2010

In December 2009, Mark Burgess, the author of Cfengine, was in France. This was a great opportunity to arrange a talk with members of the French Cfengine community.

Cfengine[0] is a policy-based configuration management system written by Mark Burgess at Oslo University College. Its primary function is to provide automated configuration and maintenance of computers, from a policy specification.[1]
