Dear Science — Let’s stop using PDF
It’s 2011, it’s the future. The Earth is doomed. I’m making a Space Ark. For Space. There’s no room for printed material on my Space Ark. “A4” is just an abstract concept for when we used dead trees to store our information. For when we collated facts like so many dead butterflies and bound them in books to sit on shelves and gather dust.
It’s 2011 and we’re still using PDF to publish academic papers. Dear God, why? While the web is evolving, constantly finding new ways to present text, information and graphics, scientific papers are stuck in a world unchanged since the 17th century.
Viewing Documents
Ignoring the added functionality that writing for the web provides, the viewing experience alone is reason enough to bury PDF.
As newspapers began their first forays into publishing on touchscreen devices, they were ripped to shreds by most interface designers for simply putting their printed content as-is on a scrollable screen. Users had to scroll back up when changing columns or moving between stories, and it was impossible to link stories to friends, reference parts of the newspaper or quickly navigate to its sections.
All these issues are present when viewing papers on PCs or mobile devices, yet it seems to be accepted as the norm. Tools have been developed for searching, mining, linking and generally enhancing academic papers, but the method of viewing papers in digital format hasn’t changed in the last 10 years.
Linking Documents
This is not hard. Linking documents is useful. Imagine the web without links. Better still, look at what you have to do to look up a referenced paper from within an academic paper in digital format:
- See reference in text, e.g. (Smith et al. 2009)
- Scroll/page to end of document
- Work out if references are in last name order, or something else
- Visually find the last name of the author (or ctrl+f Smith)
- Guess which “Smith 2009” is the right one if there’s more than one
- Highlight title of paper with cursor
- Copy to clipboard
- Alt-tab to browser
- Paste into search box
- From results, find one that looks relevant
- If PDF not available on search page, click through, find PDF on next page… etc.
Compared with the experience on the web:
- Hover over - see full paper details in popup using simple Javascript
- ???
- Click text to open referenced PDF
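A minimal sketch of the hover step, in the simple Javascript suggested above. It assumes the bibliography is available to the page as plain objects (the entries, titles and URLs below are made up) and matches an inline citation by first author and year. It returns every match rather than one, so when there are two “Smith 2009” papers the popup can list both instead of leaving the reader to guess.

```javascript
// Hypothetical bibliography; a real paper would generate this from the
// author's reference database (e.g. BibTeX).
const bibliography = [
  { authors: ["Smith", "Jones"], year: 2009, title: "First example paper", url: "https://example.org/a.pdf" },
  { authors: ["Smith"], year: 2009, title: "Second example paper", url: "https://example.org/b.pdf" },
  { authors: ["Brown"], year: 2010, title: "Third example paper", url: "https://example.org/c.pdf" },
];

// Pull the first author's surname and the year out of an inline citation
// such as "(Smith et al. 2009)" or "Smith 2009b", then return every
// bibliography entry with that first author and year.
function resolveCitation(citation, entries) {
  const match = /^\(?\s*([A-Z][\w'-]*)\b.*?(\d{4})[a-z]?\s*\)?$/.exec(citation.trim());
  if (!match) return [];
  const surname = match[1];
  const year = Number(match[2]);
  return entries.filter((e) => e.authors[0] === surname && e.year === year);
}

// Text for the hover popup: one line per candidate reference.
function popupText(citation, entries) {
  const hits = resolveCitation(citation, entries);
  if (hits.length === 0) return `No reference found for ${citation}`;
  return hits.map((e) => `${e.authors.join(", ")} (${e.year}). ${e.title}`).join("\n");
}

// In a browser, each citation element could show this on hover, e.g.
//   el.title = popupText(el.textContent, bibliography);
// and its link could point at the matching entry's url.
console.log(popupText("(Smith et al. 2009)", bibliography));
```

Parsing the visible text is only needed because that is all a PDF gives you. An HTML paper generated from the author's source could tag each citation with its entry's unique key, which would remove the guessing step entirely.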
It would even be possible to link to specific sections or paragraphs of the papers you are referring to, or even to specific parts of results tables, so your reader doesn’t have to search the entire paper to find which result you mean. Wouldn’t that be revolutionary?
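Linking to a specific section or table only needs each one to carry a stable `id`, which a URL fragment such as `paper.html#results` can then target. A small sketch, assuming ids are derived from the heading text; the function names and example URL are hypothetical, and ids made this way only stay stable as long as the headings themselves don’t change.

```javascript
// Derive a URL-safe id from heading or caption text,
// e.g. "Results: Table 2" -> "results-table-2".
function slugify(text) {
  return text
    .toLowerCase()
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop accent marks split off by NFKD, "é" -> "e"
    .replace(/[^a-z0-9]+/g, "-")      // runs of anything else become one hyphen
    .replace(/^-+|-+$/g, "");         // no leading or trailing hyphens
}

// Give each heading an id; when the same heading text appears again,
// append -2, -3, ... so repeated headings get different anchors.
function assignAnchors(headings) {
  const seen = new Map();
  return headings.map((heading) => {
    const base = slugify(heading) || "section";
    const count = (seen.get(base) || 0) + 1;
    seen.set(base, count);
    return count === 1 ? base : `${base}-${count}`;
  });
}

// A link another paper could use to point at one exact section.
function deepLink(paperUrl, anchor) {
  return `${paperUrl}#${encodeURIComponent(anchor)}`;
}

const anchors = assignAnchors(["Introduction", "Results", "Results"]);
console.log(anchors);
console.log(deepLink("https://example.org/paper.html", anchors[2]));
```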
What to do?
In short, publish papers in HTML, using Javascript and CSS. This isn’t a revolutionary idea.
The ideas I’ve put below are just off the top of my head. The web is overflowing with competing ways to display information, be it text or graphics. And that’s the point: we should be embracing new ideas. As printed media dies, we should move away from the absurd situation we’re in, where our primary medium is a print format that has been repurposed for screen use.
Technical Considerations
Much as it pains me, printed media will be around for a while yet, so an ideal solution would be able to produce both LaTeX and HTML. Pandoc (there are probably other equivalents) already does the majority of this, creating LaTeX and HTML from a single Markdown input.
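As a rough illustration of the single-source idea, the helper below (a hypothetical name, not part of Pandoc) builds the command-line arguments for two Pandoc runs over the same Markdown file. `-s` (standalone document), `-o` (output file, whose extension tells Pandoc which format to write) and `--mathjax` (render maths with MathJax in HTML output) are real Pandoc options; the file names are made up.

```javascript
// Build pandoc arguments for one Markdown source and one output target.
// Pandoc infers the output format from the output file's extension.
function pandocArgs(source, target) {
  const stem = source.replace(/\.md$/, "");
  switch (target) {
    case "latex": // for print
      return [source, "-s", "-o", `${stem}.tex`];
    case "html": // for the web, with equations rendered by MathJax
      return [source, "-s", "--mathjax", "-o", `${stem}.html`];
    default:
      throw new Error(`unsupported target: ${target}`);
  }
}

for (const target of ["latex", "html"]) {
  console.log(["pandoc", ...pandocArgs("paper.md", target)].join(" "));
}
```

With Pandoc installed, each argument list could be handed to Node's `child_process.execFileSync("pandoc", args)`. Passing an array rather than one shell string avoids quoting problems with unusual file names.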
To make HTML-based publishing a viable alternative to the existing PDF format, it must provide more features than PDF. And for it to be accepted, it has to be at least as easy as the current method, if not easier.
Text
LaTeX and HTML are already roughly equivalent for basic formatting such as headings of different sizes and bold text. Personally I prefer Markdown, which is easier to read and visually parse than HTML or LaTeX.
Formatting the raw text could be done with a combination of CSS and Javascript. There are JS libraries that can resize and reformat text. Journals could still provide or host their own stylesheets for authors, in the same way they often provide LaTeX templates.
We no longer consume information purely through A4 (or letter if you’re American) paper. PDFs are completely oblivious to this, making you scroll, zoom and strain your eyes. “Responsive” is the name given to modern websites that reformat their content depending on the size of the viewing screen. There are hundreds of examples.
I want to clarify that I am not talking about publishing academic content on a website (thanks @gneubig). The idea would be to provide the existing content of a paper, and only that content, in a web-standard, flexible-viewport format.
Equations
This is easy. MathJax is superb.
Tabular Data
Again, this is relatively easy. HTML already has good support for tabular data, and there are many JS libraries that can be used to add automatic formatting or allow reordering of the data.
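To make the reordering point concrete, here is a minimal sketch of the sort a reader might trigger by clicking a column header. The rows, column names and numbers are invented for illustration; a real page would read them from the paper’s HTML table and re-render the `<tbody>` after sorting.

```javascript
// Illustrative rows only; not results from any real paper.
const rows = [
  { method: "Baseline", accuracy: 0.71, runtimeSec: 12 },
  { method: "Method A", accuracy: 0.78, runtimeSec: 30 },
  { method: "Method B", accuracy: 0.75, runtimeSec: 9 },
];

// Return a sorted copy of the rows by one column. Numbers compare
// numerically, anything else as text; the input array is left untouched.
function sortRows(rows, column, descending = false) {
  const sign = descending ? -1 : 1;
  return [...rows].sort((a, b) => {
    const x = a[column];
    const y = b[column];
    const cmp =
      typeof x === "number" && typeof y === "number"
        ? x - y
        : String(x).localeCompare(String(y));
    return sign * cmp;
  });
}

console.log(sortRows(rows, "accuracy", true).map((r) => r.method));
```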
Charts
There are a million Javascript libraries for this, but not many of them are that great. Google Charts is decent enough, but my main worry is that learning its interface may be a barrier to users.
In my mind, good chart support is probably the hardest part of this idea. In particular, making the interface simple to use, but also easy to enhance with useful interactions.
Making charts interactive:
- Could allow for selecting different groups of data to visually compare on the same graph
- Could make more information available by hovering
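Neither of those ideas depends on a particular charting library. As a sketch, assuming each experimental group is stored as one series of `[x, y]` points (the data and group names here are hypothetical): selecting groups is a filter, a shared y-range keeps the selected groups comparable on the same axes, and hover text is a simple lookup. A charting library would then only have to draw what these functions return.

```javascript
// Hypothetical data: one series of [x, y] points per group.
const series = [
  { group: "control", points: [[0, 1.0], [1, 1.2], [2, 1.1]] },
  { group: "treatment", points: [[0, 1.0], [1, 1.8], [2, 2.4]] },
  { group: "placebo", points: [[0, 0.9], [1, 1.1], [2, 1.3]] },
];

// Keep only the series the reader has ticked, e.g. via checkboxes.
function visibleSeries(all, selectedGroups) {
  const wanted = new Set(selectedGroups);
  return all.filter((s) => wanted.has(s.group));
}

// One y-range across all visible series, so they share the same axis;
// returns null when nothing is selected.
function sharedYRange(visible) {
  const ys = visible.flatMap((s) => s.points.map(([, y]) => y));
  return ys.length === 0 ? null : [Math.min(...ys), Math.max(...ys)];
}

// Extra detail shown when hovering over point i of a series.
function hoverText(s, i) {
  const [x, y] = s.points[i];
  return `${s.group}: x = ${x}, y = ${y}`;
}

const shown = visibleSeries(series, ["control", "treatment"]);
console.log(sharedYRange(shown), hoverText(shown[1], 2));
```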
Interactive Figures, Runnable Code
You are no longer tied to static images: free your mind. It’s entirely possible to make more complex interactive figures in JavaScript to illustrate your point. Not just graphs, but scripts that take input from the reader and show how your research really works in practice.
For example, a JS implementation of your algorithm that lets viewers change values in text boxes and see the effect on the output and on a corresponding chart.
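A toy version of that idea, with exponential smoothing standing in for “your algorithm” (it is only a placeholder, picked because it has a single tunable parameter). The reader types a smoothing factor into a text box; the input is validated and the output recomputed on every change. The browser wiring appears only as a comment, and the element id and `render` function there are hypothetical.

```javascript
// Placeholder algorithm: exponential smoothing,
// s_t = alpha * x_t + (1 - alpha) * s_{t-1}, starting from the first value.
function exponentialSmoothing(values, alpha) {
  const out = [];
  let s = values[0];
  for (const x of values) {
    s = alpha * x + (1 - alpha) * s;
    out.push(s);
  }
  return out;
}

// Turn the raw text-box contents into a valid alpha, or null if it isn't one.
function parseAlpha(text) {
  const alpha = Number(text);
  return Number.isFinite(alpha) && alpha > 0 && alpha <= 1 ? alpha : null;
}

// What the page would show for the current input: either an error
// message or the recomputed output.
function update(text, values) {
  const alpha = parseAlpha(text);
  if (alpha === null) return { error: "Enter a number greater than 0 and at most 1." };
  return { output: exponentialSmoothing(values, alpha) };
}

// Browser wiring (hypothetical element id and render function):
//   const box = document.getElementById("alpha");
//   box.addEventListener("input", () => render(update(box.value, data)));
console.log(update("0.5", [0, 10, 10]));
```

Keeping the computation in plain functions, separate from the page wiring, means the same code that drives the interactive figure can also be run and checked on its own.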
Videos
Yes! You can add videos to webpages. No more being restricted to static content. Outside of computer science I think this would be much more useful, but it is good to have the ability to add other media.
Possible Problems
There are some problems that I want to think about more:
- By introducing external sources, it may be harder to archive pages so that they are still usable in 10 years’ time.
- Similarly, linking to many resources makes working offline more difficult. Ubiquitous internet access is not quite here yet.
Conclusion
After writing all this, I’m going to have to put my money where my mouth is and write some sort of template, with:
- Markdown
- Responsive layout
- Interactive charts
- Intelligent bibliography