Testing various conversions

Jonathan Godfrey

Let’s start something. I’ll put the markdown for this file at a downloadable spot.

N.B. I used a BibTeX file with just one article in it. You will need to download foo.bib if you intend to run the examples here.

If you download or open these files, you must rename them to update.md and foo.bib respectively, because the commands below use those names.

Perhaps a heading

or a heading for a subsection

Now for some formulae such as \(\mu=\frac{1}{n}\sum_{i=1}^{n}{x_i}\)

and hopefully that was inline, while \[\mu=\frac{1}{n}\sum_{i=1}^{n}{x_i}\] goes on its own line.

I still want to be sure that a reference like Godfrey (2013) is working properly across all formats, even if presented the other way (Godfrey 2013). The default reference presentation style is Chicago.
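For anyone rebuilding the test file, the two citation forms above come from two different pieces of pandoc markdown syntax: a bare `@key` gives the "Godfrey (2013)" form and a bracketed `[@key]` gives the "(Godfrey 2013)" form. The sketch below writes a two-line test file showing both. The key `godfrey2013` and the filename cite-test.md are assumptions made for illustration; the real key is whatever foo.bib defines.

```shell
# Write a small markdown file that exercises both citation forms.
# ASSUMPTION: "godfrey2013" is a hypothetical key; replace it with the
# key actually used for the article in foo.bib.
cat > cite-test.md <<'EOF'
A citation in the text: @godfrey2013 makes this point.
A citation in parentheses [@godfrey2013].
EOF

# The file can then be converted like the other examples, e.g.
#   pandoc -s cite-test.md --bibliography foo.bib -o cite-test.html
```

The quoted `'EOF'` stops the shell from touching anything inside the block, so the text reaches the file exactly as typed.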

Notes on conversions

to html

pandoc -s update.md --bibliography foo.bib -o example1a.html --mathjax
pandoc -s update.md --bibliography foo.bib -o example2a.html --mathml

Both give the reference, but the maths content displays differently in different browsers. The mathjax version does not render well in Internet Explorer, but it’s a dead browser anyway!

to pdf

OK, if we must…

pandoc update.md --bibliography foo.bib -o example3.pdf

This gives a pdf, but the headings don’t show up properly and the maths is garbage, as expected. I am aware that this process can be improved, but I care little for this format as an endpoint. Let’s move on…

to Open Office

pandoc update.md --bibliography foo.bib -o example4.odt

Well, the file is painful to open in MS Word, and the maths content fails to come through in a readable form. If I understood how good Open Office could be, then maybe I’d expend some more energy here. Moving right along…

to/from MS Word

Two experiments here.

pandoc update.md --bibliography foo.bib -o example5.docx

The MS Word document seems to use graphics for the maths, and it actually did read the math content to me. All content was as expected. It is worth doing more extensive testing though. In particular, if the endpoint must be a pdf that is fixed to a certain paper size, then proof-reading the docx file and then converting to pdf may well be a sensible solution for the blind author.

pandoc example5.docx -t markdown -o example5.md

OK, this shows me that the math content was not a graphic in MS Word after all, but real math content. The markdown generated is a practically identical rendition of the original. This probably explains the good access found in the docx version.
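One way to put a number on "practically identical" is to diff the round-tripped markdown against the original. This is only a minimal sketch: the function name `count_diff_lines` is made up for illustration, and it relies on nothing beyond the standard `diff` and `grep` tools.

```shell
# count_diff_lines A B: print how many lines diff marks as removed (<)
# or added (>) between two text files; 0 means the files are identical.
# ASSUMPTION: the function name is illustrative, not part of pandoc.
count_diff_lines() {
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the
  # function's exit status clean while the count is still printed.
  diff "$1" "$2" | grep -c '^[<>]' || true
}
```

With both files in the current directory, `count_diff_lines update.md example5.md` prints the count. A small non-zero number is to be expected, since pandoc may wrap lines or write headings differently from the hand-written source.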

to/from epub

Two more experiments.

pandoc update.md --bibliography foo.bib -o Example6.epub

So we supposedly have an epub version. I can’t test its features though. Given I don’t currently have an epub reader, I need to convert to a better format.

pandoc Example6.epub -t plain -o example6.txt

The venture into plain text stripped out all headings and fonts but left the maths in LaTeX format.

My conclusion

The above experiments show me that we have tools at our fingertips that give blind people the information we need as consumers, and the ability to produce documents in a format suitable for the sighted world.

My personal preference is to create and consume html and mathjax, but the pdf and MS Word results would be good for presenting work to others. The ability to take irritating formats and create readable text is a good backstop, but there is still room to improve this workflow.
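The conversions tried above can be collected into one script so that every output is rebuilt from the same source in a single step. What follows is a sketch rather than a finished tool: the `run` and `convert_all` functions and the `SRC`, `BIB` and `DRY_RUN` variables are names invented here, while the pandoc commands themselves are the ones used in the experiments. Recent pandoc releases may also need `--citeproc` added before the references are processed, so check that against the installed version.

```shell
# ASSUMPTION: SRC, BIB, DRY_RUN, run and convert_all are illustrative
# names; only the pandoc command lines come from the experiments above.
SRC="update.md"
BIB="foo.bib"

# run CMD...: print the command, then execute it unless DRY_RUN=1.
run() {
  printf '%s\n' "$*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    # Keep going if one conversion fails (e.g. pdf without LaTeX installed).
    "$@" || printf 'failed: %s\n' "$*" >&2
  fi
}

# convert_all: rebuild each output format tested in this document.
convert_all() {
  run pandoc -s "$SRC" --bibliography "$BIB" -o example1a.html --mathjax
  run pandoc -s "$SRC" --bibliography "$BIB" -o example2a.html --mathml
  run pandoc "$SRC" --bibliography "$BIB" -o example3.pdf
  run pandoc "$SRC" --bibliography "$BIB" -o example4.odt
  run pandoc "$SRC" --bibliography "$BIB" -o example5.docx
  run pandoc "$SRC" --bibliography "$BIB" -o Example6.epub
}
```

Calling `convert_all` runs all six conversions. Setting `DRY_RUN=1` first only lists the commands, which is a quick way to check them before anything is overwritten.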

I still need to work on slide presentation formats.

References

Godfrey, A. Jonathan R. 2013. “Statistical Software from a Blind Person’s Perspective: R Is the Best, but We Can Make It Better.” The R Journal 5 (1):73–79. http://journal.r-project.org/archive/2013-1/godfrey.pdf.