A Plague on Both Your Offices – Covid-19 And The Rise of Distributed Working

This Article is also published on Medium

As I write this we are still in what I guess are the early stages of the coronavirus (Covid-19) pandemic. Whilst the situation is still changing rapidly, I am not alone in thinking that some of the social impact of this outbreak will be far-reaching.

It’s mid-March and I am in the UK. Sports and social events are being cancelled left, right and centre as people are advised to stay home. I am a ‘knowledge worker’ and able to work from home. I didn’t always though – I had a previous career in healthcare where people were very much expected to attend for work in person, at the office, and to ‘dress smartly’ even though nobody besides their immediate colleagues would see them – very old school. A great number of the people I worked with could just as well have worked from home, but the organisation simply didn’t have the interest or the will to let them do so. Separately, I have a Master’s degree in Cultural History – as I describe it, the ‘Whys’ rather than the ‘Whats’ of history. Let’s put all this together and try to take a longer view of what’s happening now, not just the ‘right now’:
read more

When Amazon Web Services overtakes one of your custom features

This post is also published on Medium

It’s kind of a standing joke in the industry – do some cool thing with AWS to implement an infrastructure feature, and if it works well, Amazon will come along a few months later with some matching in-house feature. Sometimes that feature might be a relatively simple thing, maybe something that was obviously missing; sometimes you might have had a whole project that was essentially deprecated; and sometimes it’s a feature of a larger piece of work that means you have to adjust or re-evaluate your approach. What do you do when this happens? This article covers an example of each: one that happened with a company I was working with, one that happened with a third-party project we made use of, and one that happened with my own project. read more

The Horror of Microsoft Teams

Also published on Medium, from where this story was posted to Hacker News. As at 28 September 2019 it had achieved:

On Hacker News:

544 points (for a third party’s link to my Medium article)

On Medium:

47K reads
232 fans
1.8K claps

I wrote this mostly whilst finishing a period of employment where I was obliged to use Microsoft Teams. For those who have not had the pleasure, it’s Microsoft’s counter to Slack, the instant-messaging client beloved in tech. Microsoft first introduced Teams in March 2017 and recently announced that they have ‘overtaken’ Slack, claiming more than 13 million daily active users, or 19 million weekly active users. In March 2019 Microsoft advised that 500,000 organisations were using the service. I previously worked, and am now working again, in an organisation using Slack. I’m not in love with it but it does the job reasonably well. In each case I have been working as an engineer in a fair-size organisation with a mix of channels and chats. I have been a messaging system user, not an administrator. Over the 15 months I used it, Teams was horrendous. Here are some of the reasons why: read more

How not to do alerting

This article has also been published on Medium

There has been much written about the right way to handle alarms and alerts for Sysadmins, Ops and Reliability Engineers. I take the approach that you can learn as much from looking at how not to do it. Here are some examples. I’m sure readers can think of many more. This is one small part of a big field and doesn’t begin to cover all the other areas that system monitoring, feedback etc. touch on. Neither am I attempting to cover the greater field of Ops, systems administration, Site Reliability Engineering etc. in any detail. Lastly, I am talking mostly about the politics of alerts rather than the technical aspects. The examples below are principally about out-of-hours ‘on-call’ type alerts but the principles are general: read more

The Bastion Server That Isn’t There

Deploying ssh Bastion as a stateless service on AWS with Docker and Terraform

I also have a presentation and live demonstration on the below, so far given at DevSecOps – London Gathering June 13 2018.

This article has also been published on Medium

The mantras of software as a service, stateless, cattle vs. pets, etc, are often and loudly repeated, but in many environments you often don’t have to look too far before you find some big fat pet box sprawling somewhere. Maybe it is the in-house file server, maybe something else, but if your infrastructure is in the cloud then it is most likely going to be your Bastion server (or ‘jump box’). Here I look at the problem, look at a couple of options and present a solution that I implemented providing Bastion ssh as a stateless service on AWS – the code is available on GitHub and also published on the Terraform Module Registry. Whilst the principles are applicable universally, this specific solution employs a Terraform plan to deploy to AWS. If you are not using AWS then you might find concentrating on the cloud-config user data stuff more useful as the rest would need to be ported, e.g. for DigitalOcean etc. If you’re using GCP then to be honest you probably don’t need this at all. read more

Implementing the ELK stack with microservice containers on AWS with Terraform

25 minute reading time (but the article is composed of short, numbered sections!)

1 – why is this article different to every other blog post on the ELK stack?

There are a lot of articles on Elastic Stack/ELK components out there. I found a LOT that were extremely basic, essentially school-project-level reiterations of the official elastic.co documentation, and also a few that were very high level, essentially assuming that you already know everything and want to ‘talk shop’. I really struggled to find anything that covered a full use case in any detail without hand-waving over the fiddly bits. Whilst I do give some basic info here, I don’t intend to reiterate the official documentation, and I cover a full use case including some of the blind alleys and pitfalls. This article concentrates on the technical challenges and solutions. It is not intended to be an introduction or comprehensive guide to the Elastic Stack. Bear in mind that I was starting more or less from scratch, with no previous production experience with AWS, negligible previous exposure to Docker and no prior experience with Terraform or with the Elastic Stack. read more

Setting a Proprietary Server Process to Run at Boot Using Systemd

One of the big differences between being a good hobbyist with Linux and working commercially with it is dealing with proprietary software. You can use and configure all sorts of systems indefinitely on your own account and never come up against dealing with awkward proprietary software that is supposedly officially supported for your platform that you really need to make work. Recently I had this experience. Not only was I able to get it working but I was able to extend it beyond the manufacturer’s original provision to make it more user friendly and less work to administer. read more

Ansible on the Desktop

Configuration management tools are a big deal these days. Just as with Puppet, Chef and Salt, a lot of material written about Ansible presumes either a lab environment or one focussed on servers, or both. Virtualisation is also considered ‘a given’ – Docker and Vagrant are both popular – and the guides expect that you are implementing on a green-field estate. I thought it would be useful to write about real-world experience and use in the opposite scenario: desktop more than server, physical rather than virtual machines and a ‘mature’ environment. Oh, and learning as we go along. read more