Help! We’re addicted to PDFs!

Frequently during the web migration project we have felt as though we’ve been drowning in a sea of PDFs, particularly during work on policy-heavy sites. The team has made a tremendous effort to drastically reduce the number of PDFs we have on the University website and, where possible, convert these to HTML pages.  

People sometimes ask us why we’re doing this, with many assuming that it would be easier to upload a PDF instead of creating a webpage. In the short term it will often be easier and quicker to do this – but unfortunately quick and easy doesn’t equate to a better user experience, nor does it help with SEO or with longer-term content management and tracking.

Of course, we do still have many PDFs on our site – 1,020 at the last count! We’re working to get this number down so that only those which are best suited to PDF format (for reasons I will explain later in the blog post) are published this way.

Why do we prefer HTML pages to PDFs?

In the first place, PDFs are primarily optimised for paper – usually A4 in the UK. This is fine if it is a document you expect your users to print, but in an era where we’re actively discouraging printing and moving towards a paperless office and a net zero campus, it is not ideal. A quick straw poll in our office also suggests that far fewer people have printers at home these days.

Most of us working for the University work on a computer and are used to the full real estate of a laptop or desktop display, but the experience is vastly different on a device with a smaller screen. If we publish as HTML, the page is designed to be responsive and is optimised for a touch screen – the text size, for instance, adjusts automatically for a comfortable reading experience, and the text re-flows to fit the space available. If you’ve ever tried reading a PDF on a mobile, you’ll know that it can be tricky to zoom, scan, and scroll, as you need to move the page in all directions to read the content.

Another consideration is that if you are reading a PDF on a mobile device, you will often need to use a separate app. This results in a poorer user experience as it takes you away from the website, losing both navigation and context.

Screenshot: how a PDF looks on mobile. The text in this example is far too small to read on the screen without zooming in.

Once zoomed in, you can see that the text no longer fits on the screen and it’s necessary to scroll both horizontally and vertically to read it. This makes for an extremely awkward and uncomfortable viewing experience as you are constantly moving the text back and forth to read the lines.

Screenshot: PDF content when zoomed in, with the ends of sentences cut off.

Compare this with a document that has been converted to HTML, where the text has been resized to fit the screen:

Screenshot: text flowing to fit the page in HTML format.

The text appears far clearer and it is easier to scan through and pick out the important information. It is easy to see at a glance that this provides a far superior user experience for someone who is accessing our content from a mobile device.

When we’re converting PDFs to HTML we can also take the opportunity to improve the layout of the content – simple things such as bulleting, adding in scannable headings, and sometimes editing the text so that it aligns to the University’s style guide, ensuring consistency across the website.

Additionally, HTML pages use headings on the page to form the contextual navigation – the in-page links you see at the side of the page on a computer, or at the top of the page on a mobile. In-document navigation in a PDF is less straightforward: although you can set up an index, it doesn’t follow you down through the page in the same way.

Because it’s so much easier to use the HTML version, we’re likely to get greater user engagement with the content. Research by the Nielsen Norman Group has shown that task completion is higher when using HTML.

Accessibility

Under the Public Sector Bodies (Websites and Mobile Applications) Accessibility Regulations 2018 we have a legal obligation to make our content as accessible as possible. If we don’t provide accessible content then we could be breaking the Equality Act 2010.

Although you can certainly make PDFs accessible, it can be a little tricky if you’ve never had any training or instructions. There’s actually a lot more work for the person creating the PDF, as you need to make sure you’ve correctly included navigational aids such as bookmarks, tags to provide a logical reading order, and accessible form fields (amongst other things). Tables also need to be properly tagged. Often, simply converting a Word document to PDF won’t be good enough.

Most of the PDFs we are sent have not been made accessible, and if this hasn’t been done in the source document it can be time-consuming to make the changes, and may require access to the original document. We’ve even had PDFs given to us where the entirety of the text has somehow been rendered as an image – obviously this is something that would be completely inaccessible to a screen reader!

Using HTML, however, is a lot simpler. The styles and layout that we use on the website were created with both accessibility and usability in mind. Producing an HTML document will also give the end user more control over how they consume the information, as they can modify the look and feel of the page in their own browser window – they can change the background colour, the font, and the text size, for example.

Technical reasons

PDFs can be large – extremely large in some cases. Not everyone accessing your content will have a device with abundant storage, or the data to spare on large files. If you don’t have a strong Wi-Fi signal, an HTML page, which uses less bandwidth, will be preferable.

We have also found that issues of duplication can arise more easily with PDFs. Different people have uploaded the same piece of content with slightly different file names (we recently found seven instances of one particular PDF!). We find this easier to track when we’re using HTML pages and the CMS is handling the URL structure for us, as it alerts us to duplication.
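As a rough illustration of how duplicate uploads can be spotted regardless of file name, here is a minimal Python sketch. It is not the tooling we use, and the function names `file_digest` and `find_duplicate_pdfs` are made up for this example. It assumes you have a local folder of exported PDFs, and it only catches copies that are byte-for-byte identical: a PDF that has been re-exported or had its metadata changed would not be matched.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_pdfs(root: Path) -> list[list[Path]]:
    """Group PDFs under root whose contents are identical, whatever their names.

    Note: the "*.pdf" pattern is case-sensitive on most non-Windows systems,
    so files ending in ".PDF" would need a second pattern.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*.pdf")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicate_pdfs(Path("exported-pdfs"))` (a hypothetical folder name) would return one list of paths for each set of identical files, which could then be reviewed by hand.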

It’s also difficult to see how a piece of content in PDF format performs. Whilst we can track downloads, we can’t see how long someone spends reading the content, or how far down they scroll, for example. Compared to a webpage, where we can generate a heatmap and track metrics such as time on page and where users click to go next, this is a big disadvantage.

Links in PDFs

Links in PDFs can also be problematic. With the number of PDFs on our site, there are always going to be external links which change and end up out of date and thus broken. It is much easier for our support team to update an HTML page when they find a broken link – you can’t easily edit the PDFs and often need to go back to the original source (which we do not hold) to make the changes and then re-generate the PDF.
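To show why links on an HTML page are easier to audit, here is a small Python sketch that lists the external links on a page so they can be checked. It is an illustration only, not a description of our support team’s process. The names `LinkCollector` and `external_links` and the host `www.example.ac.uk` are assumptions made for the example, and the sketch deliberately stops short of requesting each URL to see whether it is broken.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def external_links(html: str, own_host: str) -> list[str]:
    """Return absolute http(s) links that point away from own_host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    result = []
    for link in collector.links:
        parts = urlparse(link)
        # Relative links have no scheme, so they are treated as internal.
        if parts.scheme in ("http", "https") and parts.hostname != own_host:
            result.append(link)
    return result


# Hypothetical page fragment for illustration.
page = (
    '<p><a href="https://www.example.ac.uk/policies">Policies</a> '
    '<a href="https://www.gov.uk/guidance">Guidance</a> '
    '<a href="/contact">Contact</a></p>'
)
print(external_links(page, "www.example.ac.uk"))
```

Doing the same for a PDF would need a dedicated PDF library and, if a link turned out to be broken, the fix would still have to be made in the source document.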

It’s also worth considering that if your PDF has a number of links, it might actually function better as a webpage – again, this is particularly useful on a mobile where you need to open a separate application to read a PDF document and don’t want to keep switching between browser and app.

Search engine optimisation

Historically, it has been assumed that Google finds it easier to index content presented as HTML. This is not quite true, as PDFs can be searched by Google and will surface in search results too. However, unless they are properly optimised, they may not reach as high a ranking as a comparable webpage. Links within PDFs don’t tend to carry the same value for SEO as links on a webpage, and Google also tends to prioritise mobile-first content.

If someone does find one of your PDFs on Google, they can download it without ever needing to access the website at all. This means there is a danger of them reading it out of context, with no navigation to the other relevant parts of your site. Perhaps the information given in the PDF is not quite the answer they needed. If they had landed on a page of your site instead, they might have found something related which answered their question, or seen other links of interest and learnt something new.

Old PDFs can also linger in search results even when the pages linking to them have disappeared, which can lead people to out-of-date and misleading information.

Acceptable reasons for PDFs

Despite these compelling arguments for making your content HTML, there are still some instances where PDFs are acceptable:

Downloadable forms to be printed and completed offline (although you might want to think about whether an online form would be quicker and easier for users)
Detailed multi-page legal documents and reports
Leaflets or brochures to be printed and used offline – such as the University’s prospectus
Where there is a legal requirement to publish a formal, signed document, such as the University’s charter
Material that people are likely to print and annotate themselves

If you are uploading a PDF for one of these reasons, you need to ensure that you comply with our ‘Publishing PDFs on the University website’ guidelines.

I won’t reiterate these here, but in brief, you must make your documents fully accessible, and we will add them to a corporate information page to put the document into context and give it a summary. We never link directly to a PDF within the main body of a page; instead, the link goes in a separate downloads section at the bottom. Adding PDFs in this manner stops them getting lost on the site and makes them easier to audit – we can see at a glance when they were last updated. Providing the summary also helps with search visibility.

It is clear there are many reasons for moving away from PDFs online. We have been doing this gradually as part of our migration, but for practical reasons we had to leave some of the longer documents in PDF form. In due course we hope to rectify this and publish more of our documents in HTML format so that our content is as accessible as possible.