Ebooks for documentation: it’s easy and you should

Documentation. Not something to get excited about, is it? Who wants to read a manual? Or a policy document? A reference guide even? How about if we could make it easier? More accessible? More relevant? I think we can, using ebooks, and in this paper I will expand on this. In case it isn’t obvious, this paper is intended for people with a technical interest. I describe the real-world challenges and then give two real-life examples illustrating where ebook solutions answer them. The majority of the software referred to is free and/or open source. None of it is meant to constitute a commercial endorsement.

The problem:

We’ve all been there: the documentation that you need is in an inconvenient format and an inconvenient place. Examples (not exhaustive):

Scenario 1:

You work at Large.corp. There are policies for everything and it’s all on some awful Intranet site where it’s almost impossible to find anything. When you do, what you need is half a page out of a PDF that’s a hundred pages long. If you’re lucky you can find your section quickly within it using your desktop computer so that when you print your paper copy off to take to the meeting you know where to highlight it. With a pen. That uses ink.

Scenario 2:

You work at Geek.E.R. There are a lot of software developers on the crew and all the documentation is on GitHub. That’s great for the developers and the admins but nobody else has heard of it. Once you get beyond a certain size as an organisation you’re going to have documentation on things like claiming expenses, and some staff who need to see it but aren’t quite as advanced in their tech literacy and don’t know their pull from their push.

Scenario 3:

You belong to some sort of special interest group (hobby, charity, etc.) that has policies, manuals, etc. If what you need isn’t in a paper handbook then the documentation will be on some website: great at your desk, awkward on your phone, or when you’re out of signal, or when you can’t read the whole thing at once and forget where you were…

Scenario 4:

You have some sort of marketing brochure. You have a lovely website for it but as soon as your customers are out of signal or have to break away for something it’s gone, and they’ve lost where they were…

A solution in ebooks:

Paper books are great, but as the success of the Kindle shows, there are some key advantages to an electronic format: it weighs nothing, is easily transmitted and carried, and the cost of storage and reproduction is negligible. One can rapidly search and easily annotate ebooks, which makes them extremely useful for reference books and manuals. Most reading applications have a built-in dictionary so that you can easily look up any word you don’t know. Whilst the Kindle is very popular and doesn’t require any great technical knowledge, there are other devices too, many just as easy to use. To oversimplify slightly, Amazon use a format called azw for Kindle ebooks; everyone else (Google Play Books, Apple Books, me) uses ePub. Both are essentially packaged HTML (an ePub is literally a zip archive of HTML files), and it is relatively easy to convert automatically from one to the other, so once you have one you essentially have both. How would it be if you could publish your documentation this way? Perhaps you can. Here I present two real-world examples of quite different use cases with a similar solution.
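To make the "zipped HTML" point concrete, here is a minimal Python sketch that builds a tiny ePub-like archive in memory and lists what is inside it. The file names under OEBPS/ and the chapter text are made up for illustration, and the archive is deliberately incomplete (there is no package file), so treat it as a peek at the container rather than a valid book.

```python
import io
import zipfile

# Minimal container file; it points at a package document that this
# sketch does not actually create (hypothetical path, illustration only).
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Hypothetical chapter content: ordinary XHTML, nothing ebook-specific.
CHAPTER_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><h1>Chapter 1</h1><p>Hello, reader.</p></body>
</html>
"""


def build_minimal_epub():
    """Return the bytes of a tiny, incomplete ePub-like zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The 'mimetype' entry goes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_entries(data):
    """List the file names inside a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


def read_entry(data, name):
    """Read one entry from the archive as text."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.read(name).decode("utf-8")


if __name__ == "__main__":
    epub_bytes = build_minimal_epub()
    for entry in list_entries(epub_bytes):
        print(entry)
```

The same list_entries function can be pointed at the bytes of a real ePub file, which is a quick way to see the HTML chapters sitting inside it.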

The paper book conversion

I study a Japanese martial art called Iaido. It is extremely precise with regard to how techniques are performed. There are many different schools or styles, however, each with different techniques. In order to compare people from different schools on a level playing field, a standardised subset of techniques was agreed. These have to be done exactly the same way by everyone, regardless of school. They are written out in a reference manual, which is published as a paper book. The paper book is very good but it has, for example, a lot of Japanese words in it (in the English version), so you are always flicking back and forth to the glossary and trying to find the exact part you need. I wanted an electronic version that I could always have with me, with easy searching, footnote references, dictionary lookup, the ability to pick up where I left off, etc.

Obviously it would have been easier to work with whatever the original electronic copy was, but I only had access to the paper copy, so (with permission) I scanned it, page by page, for 80-odd pages. Then I had to use Optical Character Recognition (OCR) software to turn the pictures of the pages back into editable text. Then I had to turn it into an ebook. The best way I found was to use ABBYY FineReader for the OCR and to work with plain text. Theoretically you can preserve the formatting from the original scan, but I wound up with so much funny formatting that it was better to go back to bare bones and build back from there. Once I had the text proof-read (which took a while, OCR is not perfect) I was then able to reformat it (i.e. set chapters and section headings) as an ePub using Sigil, a free, open-source ebook editor. The result is extremely convenient: a manual that I always have with me and can read on my phone or my computer. I have even converted it to an Amazon format and emailed it to my Kindle, with all of the features you would expect from a regular ebook; even the illustrations are supported. The paper manual would simply be too bulky to carry with me in some scenarios, e.g. at seminars, where I can keep my mobile in my jacket. I was even able to add an additional third level of headings to make it easier for me to reference even the smallest sections. I did the conversion between formats using Calibre, but there are also free online services that can do this.
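For anyone who would rather script the format conversion than click through it, here is a hedged Python sketch. It assumes Calibre's command-line tool ebook-convert is installed and on your PATH; the file names manual.epub and manual.mobi are hypothetical. This is not necessarily how I did my own conversion, just one way it could be automated.

```python
import shutil
import subprocess


def build_convert_command(source, target):
    """Build the argument list for Calibre's ebook-convert tool.

    The output format is inferred by ebook-convert from the target
    file's extension (for example .mobi or .epub).
    """
    return ["ebook-convert", str(source), str(target)]


def convert_ebook(source, target):
    """Convert an ebook, returning False if ebook-convert is unavailable."""
    if shutil.which("ebook-convert") is None:
        # Calibre's command-line tools were not found on PATH.
        return False
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_convert_command(source, target), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    if not convert_ebook("manual.epub", "manual.mobi"):
        print("ebook-convert not found; is Calibre installed?")
```

Keeping the command construction separate from the call that runs it makes the sketch easy to check without having Calibre installed.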

The GitHub conversion

So, I found an I.T. company that maintain their playbook on GitHub. Keeping documentation on GitHub is by no means unusual, but this was a ready example of publicly available company documentation that would be useful for a non-technical audience. The playbook in question is a set of documents laying out that organisation’s way of doing things both internally and externally. As in scenario 2 above, GitHub is great for developers, and for people who are reading using a full-fat computer at a desk. Yes, you can browse GitHub using a browser on your phone, but it’s not so convenient, especially if you want to be sure that you have checked every document and not lost your place. To be fair, this company have set up a build chain to generate an HTML version of this work and put it on a (very pleasant!) website, just as GitHub themselves have described doing with their own documentation. Using Jekyll etc. to build a website in this way is also not so unusual, but unfortunately the result still suffers from scenario 4 above: you need a signal or wired network connection to access the website and it’s not easy to keep a copy with you. A website is great for ensuring that you are up to date with the latest version, and being online might not matter for technical documentation that you’re referring to whilst you’re working on your machine. Unfortunately it’s not so great if you like to read a longer document methodically, e.g. on the tube. How could I improve on this?

So the documentation on GitHub is in Markdown. This is a very simple format so it should be possible to convert it into an ePub easily. How easily? Very easily, it turns out. I used Leanpub. Leanpub is specifically set up for producing books automatically from a git repository. The short version is that you put your desired *.md files in a ‘manuscript’ folder in the root of your git repository together with a ‘Book.txt’ listing the files you want included in your desired order. Leanpub can then pull from your repo and assemble an ebook for you with each of your listed md files as a separate chapter. Section headings are preserved. There are various options on Leanpub for creating a cover, shopping cart integration for charging your readers etc., but the important bits here are that:

  • Leanpub is specifically designed around the git/agile paradigm: as they say on their homepage, ‘Publish early, publish often.’ This means that you can start publishing immediately, whilst your work is in progress, and update it as you go. Your readers will be emailed that there is an update available whenever you publish it.
  • There’s even an API if you want to automate the handling.
  • The output ebooks are available as ePub, Mobi and PDF. There is also an option for readers to have your book sent directly to their Kindle.
  • The reader end of the site is simple and pleasant to use, with no expert knowledge required.

So how did I create the ebook that I wanted from the raw git repo? Well, to be honest I was more interested in seeing what could be done quickly so I didn’t bother setting up scripts or a toolchain. I cloned the git repo, copied all of the md files to manuscript/, did ls > Book.txt and then manually changed the chapter order slightly to put the introductory documents at the beginning. I also added a copyright disclaimer since the writing is not my own work. UPDATE 4th October: I did subsequently set up a proper fork so that I could sync to the upstream git repo, and created a shell script to refresh the manuscript directory each time.
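The manual steps above are easy to script. What follows is a rough Python sketch of the idea, not the shell script I actually wrote. It assumes the Markdown files sit in the root of the cloned repo, and the function name refresh_manuscript, its parameters and the demo file names are all hypothetical. The index file name defaults to Book.txt, which is the name I understand Leanpub's manual to use; change it if your setup differs.

```python
import shutil
import tempfile
from pathlib import Path


def refresh_manuscript(repo_dir, intro_files=(), index_name="Book.txt"):
    """Copy root-level *.md files into manuscript/ and write the index file.

    Files named in intro_files are listed first, in the order given;
    everything else follows in sorted order. Returns the chapter order.
    """
    repo = Path(repo_dir)
    manuscript = repo / "manuscript"
    manuscript.mkdir(exist_ok=True)

    # Assumption: documents live in the repo root, not in subfolders.
    names = sorted(path.name for path in repo.glob("*.md"))
    for name in names:
        shutil.copy2(repo / name, manuscript / name)

    # Introductory documents first, then the rest.
    intro = [name for name in intro_files if name in names]
    rest = [name for name in names if name not in intro]
    ordered = intro + rest

    index_text = "\n".join(ordered) + "\n"
    (manuscript / index_name).write_text(index_text, encoding="utf-8")
    return ordered


if __name__ == "__main__":
    # Demonstrate on a throwaway directory with made-up file names.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("README.md", "expenses.md", "hiring.md"):
            (Path(tmp) / name).write_text("# " + name + "\n", encoding="utf-8")
        print(refresh_manuscript(tmp, intro_files=["README.md"]))
```

Writing the index from a list, rather than from a directory listing, also avoids the index file accidentally naming itself as a chapter.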

I was very pleased with the result from Leanpub. Whilst the output is not 100% perfect on all of my devices, it’s entirely usable and does what it says on the tin. I typically read ePub on a customised Android ebook reader, so the one very minor issue (chapter numbers overprinting chapter names on the contents page, which isn’t even used normally) could just be my reading setup. The PDF version looks very nice, as does the Mobi/Kindle version and the ePub on my desktop. When I revised some of the files and republished, I was notified that an updated edition was available; it was all quite quick and painless. The result is something that I can read through on the tube, in bed, wherever, without having to worry about losing my place.

A side point on PDF

PDFs are designed to render on screen as they would when printed. This is great on a large screen but not so good on a mobile, where you often wind up having to scroll left to right for each line, and interaction is limited: looking up words in a dictionary, highlighting and marking your place are all difficult or not possible. I did not construct an ebook from a PDF here but I have worked with stripping editable text out of PDFs before. As a rule it is painful for any volume of text, and unless you use an OCR-type application you will find that things like page numbers, headers and footers get pulled across and columns throw things out completely. Paragraph breaks can be lost but line breaks retained. In many respects it is a LOT like working from a paper printed document, with similar formatting headaches.
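To give a flavour of the clean-up involved, here is a small Python sketch that tackles two of those headaches in text already extracted from a PDF: it drops lines that are nothing but a page number and rejoins hard-wrapped lines into paragraphs. It assumes blank lines still mark paragraph breaks, which is not always true of real extractions, and it would also discard any genuine line consisting only of digits. The function name clean_extracted_text is hypothetical.

```python
import re


def clean_extracted_text(raw):
    """Drop bare page-number lines and rejoin hard-wrapped paragraphs."""
    paragraphs = []
    current = []
    for line in raw.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"\d+", stripped):
            # A line containing only digits is treated as a page number.
            continue
        if not stripped:
            # A blank line closes the paragraph being collected, if any.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(stripped)
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = (
        "This is the first\n"
        "line of a paragraph.\n"
        "\n"
        "12\n"
        "\n"
        "Second paragraph\n"
        "continues here.\n"
    )
    print(clean_extracted_text(sample))
```

Headers, footers and multi-column layouts need document-specific rules on top of this, which is where most of the pain comes from.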


Ebooks are established, accepted and widely supported. They have long since made the leap from the technical audience to the general public. We can think of them like the mp3 of books: sure, some people like vinyl or looking at the sleeve, but the mass market is for convenience and this is convenience. We could think of documentation on a website in turn as the ‘streaming’ solution, a bit like YouTube or Netflix or Spotify: great for watching something all at once with a good network connection but a pain if either of those conditions isn’t met. Whilst git is excellent as the primary documentation tool and source, there are clear advantages to having documentation, especially that aimed at an audience beyond the core technical staff, immediately available and portable. Documentation that is non-technical, e.g. HR policies, or written for a non-technical audience sometimes requires careful reading and reflection, which is not necessarily best done, or even possible, whilst sat in the office looking at a computer monitor.

Although it is possible to work from paper or PDF originals, they are laborious to use in this way and it is far preferable to work from editable electronic sources. Git is not only popular as a documentation platform but also, as befits an agile development platform, well supported for publishing automatically as part of continuous integration.

If you want people to be able to read through or refer to your documentation methodically, or to have it with them outside of the office, especially the less technically oriented (and don’t expect people to jump up and tell you if they have difficulty with something), then consider making ebook generation part of your workflow. It’s straightforward to do so and to keep your audience apprised of changes. You can expect your documentation to then be much more accessible to a much larger audience.