Introducing programmatic editing of Hiera YAML files


If you have ever maintained a complicated, multi-team deployment of Hiera, you have probably seen data keys repeated in flagrant violation of the Don’t Repeat Yourself principle.

To an extent, this is avoidable. It is possible to declare variables in Hiera and look them up from elsewhere in Hiera by calling the hiera function from within Hiera. It is also possible to define aliases in order to look up complex data from elsewhere within Hiera.

Meanwhile, the hiera_hash function can eliminate the need to repeat Hash keys at multiple levels of the hierarchy, although Puppet 3’s automatic parameter lookup will not return merged hash lookups.

On the other hand, many Puppet users don’t know about these features, and even when they do, tight project deadlines tempt the best of us to take shortcuts.

Bulk updating of Hiera data

The problem that arises can be stated as follows: given many Hiera files, possibly in separate Git repos and maintained by separate teams, how would you update a similar block of Hiera data in all of these files?

I spent several hours on a Friday afternoon writing a simple Ruby script to double-check that I’d manually updated about 10 YAML files with changes to what were essentially the same data keys, and I wondered whether there was a better way.

Python and ruamel.yaml

To my surprise, I discovered that Ruby simply cannot programmatically update human-edited YAML files without losing their comments and formatting, because its YAML parser preserves neither.

As Mike Pastore states in a comment:

Most YAML libraries I’ve worked with don’t preserve formatting or comments. Some quick research turns up only one that does—and it’s for Python (ruamel.yaml). In my experience, YAML is great for human-friendly, machine-readable configuration files and not much else. It loses its allure the second you bring machine-writeability into the picture.

So to the Ruby community: someone needs to write a YAML parser that preserves commenting and formatting!

In the meantime, all power to Anthon van der Neut, who has forked the PyYAML project as ruamel.yaml and solved a good 80% of the problem of preserving comments and formatting. He has also proved incredibly helpful in answering questions about the parser on Stack Overflow, and in responding to bug reports.

I realised that a script that could execute snippets of arbitrary Python code on the YAML files in memory would provide a powerful and flexible interface for bulk editing of Hiera files. In the remainder of the post, I’ll show how various data editing – and viewing – problems can be solved using my new tool.
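The core idea can be sketched in a few lines. This is not the interface of the actual tool: the function name apply_snippet, the variable name hiera that the snippet sees, and the file paths and keys are all hypothetical, and plain dicts stand in for the data that a YAML parser would load.

```python
def apply_snippet(data, snippet):
    """Run a snippet of Python code against one file's data, in place."""
    # The snippet sees the loaded data under the (hypothetical) name `hiera`.
    namespace = {"hiera": data}
    exec(snippet, namespace)
    return data


# Stand-ins for the parsed contents of two Hiera files (hypothetical paths and keys).
files = {
    "team_a/common.yaml": {"ntp::servers": ["0.pool.ntp.org"], "ntp::restrict": True},
    "team_b/common.yaml": {"ntp::servers": ["1.pool.ntp.org"]},
}

# One snippet, applied identically to every file.
snippet = "hiera['ntp::servers'] = ['ntp.example.com']"

for path, data in files.items():
    apply_snippet(data, snippet)
    print(path, data)
```

Because exec runs whatever code it is given, an approach like this is only appropriate for snippets you wrote or trust.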

Installing the script

To install the script, just clone my Git repository and install the Python dependencies with pip:

$ git clone
$ cd hiera-bulk-edit
$ pip install -r requirements.txt