Configuration management tools are a big deal these days. Just as with Puppet, Chef and Salt, a lot of the material written about Ansible presumes either a lab environment or one focussed on servers, or both. Virtualisation is also considered a given (Docker and Vagrant are both popular), and the guides expect that you are implementing on a green-field estate. I thought it would be useful to write about real-world experience in the opposite scenario: desktops more than servers, physical rather than virtual machines, and a ‘mature’ environment. Oh, and learning as we go along.
I arrived to support an environment that is almost entirely Linux, both for desktops and servers. The line of business is visual effects (VFX) post-production, and some notable peculiarities compared with a standard Linux environment go with the territory:
The great majority of the estate is physical machines on-premises. This is partly because a lot of the work is hardware intensive, especially graphics intensive.
There is much less distinction between some of the servers and desktops than one might normally expect in the wider world. This is partly because a server in a render farm essentially has to be able to run all of the same applications and plugins as the desktop machines, just headless, and partly because the workstations also need to be able to participate in the render farm. Having said this, there are the typical file servers, DNS servers etc. too. Nearly all of the fleet is Linux though, mostly CentOS, and that’s what we are looking at here.
Although I am not the only IT person in the organisation, I am the only dedicated IT person on site. Having said this, some of the staff have a very high degree of technical understanding.
There are no full-time developers on-site, but there is some custom Python code, and there are extensions written and maintained in-house.
The estate I am working on is ‘brown-field’ rather than ‘green-field’. It is a going concern of largely physical machines. There isn’t the option of a staging/test/production split or of blue/green deployments, only an example production machine. Even a clean reinstall is not necessarily an option, as there may be undocumented software licenses in unexpected places. In many areas and cases the documentation is weak or non-existent. Hosts have been hand-managed, every machine is different to some extent, and any change may cause unexpected breakage, which may mean having to roll back a change on one or more machines. Enter Ansible.
I was determined that, going forward, my changes would be global, consistent and documented, even if they were being implemented over a base that shared none of these qualities. The obvious choices were Puppet, Chef, Salt and Ansible. I’m not going to go into a detailed comparison of the four, and I am presuming at least a passing understanding of what each is. I realise that my goal could be achieved with any of them. Short arguments:
Puppet and Chef are Ruby-based
I have not looked at Chef in depth, but I spent some time trying to learn Puppet via their learning VM. As a non-developer it is not easy to make headway: an understanding of Ruby is required, and you will probably need to be an experienced Ruby developer already to make the most of it. You certainly can’t expect to pick up the parts you need from scratch by studying Puppet alone.
Salt and Ansible are Python-based
Much VFX software uses Python, and the in-house bespoke code is also Python, so the choice here is obvious.
Salt appears to have a significant focus on very large-scale deployments with tiered control servers. It also requires that all of the managed machines have the salt-minion agent installed. Looking at an example set-up here, for instance, this involves, for the clients:
- Installing python-software-properties
- Adding the SaltStack PPA repository (for Ubuntu)
- Installing the salt-minion package
- Configuring the salt-minion
- Restarting the salt-minion service
Let’s set aside the fact that third-party PPAs are disabled on release upgrades in Ubuntu; we are concentrating on CentOS. How do I meet these criteria on already-deployed physical kit? Shelling into each machine by hand? A custom script? Err…
Ansible is Python-based with an emphasis on simplicity. You don’t have to have much understanding of Python to use it, however. A layperson can look at an Ansible playbook and get the gist of what it is doing (a big plus where non-technical management are involved), which makes it self-documenting. It requires nothing more than a basic Python install on the clients and ssh access (which I already had). It does not require a dedicated master server, and all the files and configuration can be managed with Git, which means it is easy to share playbooks and host management if needed. It does not have to be used as a complete solution, and it has an (IMO deserved) reputation for being a good fit for places that have previously used a lot of custom shell scripts to manage their estate. Ansible is also owned by Red Hat, which bodes very well for a CentOS shop: no third-party packages required, for instance. It is true that some have criticised the speed and scaling of Ansible. The default behaviour is to gather facts before execution, although this can be turned off, and a large number of ssh connections can take some memory. But these were not issues for me: I can wait two minutes for a play to finish installing a new version of Maya, and my workstation is a Xeon with 64GB of RAM.
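To make that agentless model concrete, here is a minimal sketch. The host and group names are hypothetical, not my actual estate; the point is that Ansible needs only an inventory file, ssh access and a basic Python install on each machine:

```ini
# inventory.ini - hypothetical hosts; no agent to install or restart
[workstations]
ws01.example.local
ws02.example.local

[renderfarm]
render01.example.local
render02.example.local
```

Running `ansible -i inventory.ini all -m ping` then confirms connectivity to every machine over plain ssh, with nothing deployed to the clients beforehand.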
So what have I actually done with Ansible?
A lot of small stuff in the beginning! Custom cron tasks; custom app shortcuts to the company wiki; ensuring that the latest versions of various proprietary packages were installed, some with paid licenses, some just proprietary, e.g. the Google Chrome browser. In a few cases I’ve been able to use Ansible Galaxy to download roles written by others, but in most cases I have had to write my own. It hasn’t taken long to work up to the bigger stuff, though. Often there are little niggles and snags; it’s not just a case of installing the rpm (if an rpm is even provided), and Ansible is great for recording and actioning the workarounds to these.

Recently I had to reconfigure clients for a file-server change from NFS to Samba. The clients are bound to an AD server (yes, AD) and, to ensure a good fit, we decided to rebind all of the client machines as well. To ensure consistency with remote offices, the configuration to match was presented to me as a Bash history and a sample machine. I was able to translate this into an Ansible role quite quickly, with the great advantage that anything that couldn’t be quickly and easily achieved within Ansible proper, e.g. long authconfig commands, could be included in the playbook as shell commands. Obviously in a situation like this you learn some new things, and there is always the unexpected, but the task was completed successfully and in a reasonable time. One tip I would give to anyone doing this sort of thing: make use of the ‘backup: yes’ option on the copy module!
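As a sketch of how those pieces fit together in a role’s task file (the package, command and file names here are made up for illustration, not taken from my real roles), you can mix proper modules, escape-hatch shell commands and a safety-net backup in one place:

```yaml
# roles/example/tasks/main.yml - illustrative sketch only
- name: Ensure the proprietary package is installed
  yum:
    name: google-chrome-stable
    state: present

- name: Run a vendor command that has no Ansible module
  shell: /usr/local/bin/vendor-fixup --apply
  args:
    creates: /var/lib/vendor/.fixup-done   # skip on re-runs, keeping the play idempotent

- name: Deploy the new config, keeping a timestamped backup of the old file
  copy:
    src: smb.conf
    dest: /etc/samba/smb.conf
    backup: yes
```

The `backup: yes` line is the tip from above in action: if the change breaks a machine, the previous file is sitting next to the new one, ready to roll back.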
I have found that Ansible works really well in the real world: in a live production environment where a new image or OS instance is not a ready option and where, until now, nobody has been using configuration management.
I’m afraid that I can’t publish the Samba/Active Directory role described above, but I do have two example roles that I have written, on my GitHub and available on Ansible Galaxy:
The first role installs djv_view, a viewer for the EXR files used so much in VFX, on CentOS 7, RHEL 7, and Ubuntu 16.04 and 16.10. There are undeclared dependencies and conflicts with the upstream package, as described in the readme, but this role deals with them for you.
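Roles published to Galaxy, including these, drop into a playbook like any hand-written role. The author and role names below are placeholders rather than the actual published names; the pattern is install once, then reference by name:

```yaml
# site.yml - hypothetical; first fetch the role with:
#   ansible-galaxy install <author>.<role_name>
- hosts: workstations
  become: yes
  roles:
    - <author>.<role_name>
```

This is part of what makes Galaxy roles so easy to reuse: the consumer’s playbook stays a few lines long regardless of how fiddly the role’s internals are.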
The second role sets up a custom cron job, originally written to work with the MATE desktop on CentOS 7. It creates a cron job that checks disk usage on the LVM root partition and presents warnings at 80% and 90% usage, using libnotify to alert the active desktop-session user. Again, the code is available for reuse on my GitHub.
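The heart of a role like that second one is Ansible’s cron module. A hedged sketch follows: the script name and schedule are assumptions for illustration, not the role’s actual values:

```yaml
# roles/disk-warn/tasks/main.yml - illustrative sketch only
- name: Install the disk-usage check script
  copy:
    src: check_root_usage.sh
    dest: /usr/local/bin/check_root_usage.sh
    mode: '0755'

- name: Schedule the check to run every 30 minutes
  cron:
    name: "root partition usage warning"
    minute: "*/30"
    job: /usr/local/bin/check_root_usage.sh
```

The `name:` field on the cron task matters: Ansible uses it to find and update the existing crontab entry on later runs instead of adding duplicates.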