Grand Finale

So here is the Grand Finale, the last post summarising the JISC OER Rapid Innovation funded project on Improving Accessibility to Mathematical Teaching Resources.

Overview of Project

We identified two main aims for the project in our ‘nutshell’ post:

1. To turn the current research prototype of MaxTract into a robust, reliable system that is capable of handling a large corpus of available teaching material.

2. To make it usable by giving it an interface — first a web interface, then a proper user interface — and test its effectiveness with end users.

Since then a lot of progress has been made in improving the quality of output (improved layout analysis, better font identification and reproduction, and better formulae recognition) and the quantity of output (improved compatibility with different PDF files and bug fixes within the parser). We have also created a web interface to use MaxTract as a service, and offer it for download to use offline.

The robustness of MaxTract has been tested by running it over a large corpus (tens of thousands) of European mathematical articles, together with smaller samples of teaching materials and user-submitted documents. We have made significant progress on the types of PDF file that we can process; however, some PDF documents will always be incompatible, i.e. those made from images or using Type 3 fonts. In particular, we were able to process a large sample of non-encrypted online mathematical teaching materials.
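A rough pre-check for the two incompatible cases mentioned above can be sketched in shell. This is only a heuristic and not how MaxTract itself inspects files: it searches the raw bytes of the PDF for font dictionaries, so it can miss fonts stored inside compressed object streams. The function name classify_pdf and its output labels are assumptions made for illustration.

```shell
# Heuristic sketch, not MaxTract's own check: classify a PDF by scanning its raw bytes.
# classify_pdf and its labels (type3, no-fonts, ok) are hypothetical names.
classify_pdf() {
    if LC_ALL=C grep -a -q -E '/Subtype[[:space:]]*/Type3' "$1"; then
        echo "type3"        # at least one Type 3 font dictionary is visible
    elif ! LC_ALL=C grep -a -q -E '/Type[[:space:]]*/Font' "$1"; then
        echo "no-fonts"     # no font dictionaries visible; possibly image-only
    else
        echo "ok"           # fonts are visible and none is declared as Type 3
    fi
}
```

Calling `classify_pdf lecture.pdf` (a hypothetical file name) prints one of the three labels. A result of "ok" does not guarantee that a file can be processed; it only rules out the two cases the check can see.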

With regard to the quality of output, we have viewed hundreds of pages of processed documents, examining the layout of pages, the structure of formulae and the like to ensure our output is accurate. We have also listened to feedback from users and made changes when errors have been pointed out. Finally, we have listened to what users want with regard to the formats that we produce, and have tuned our system to their requirements.

We now offer a system that is capable of turning PDFs, especially those containing mathematics, into a wide range of what we believe to be truly accessible formats.

Future Work

There are certain issues we have not yet been able to deal with. Certain types of document cause us problems, such as those with lots of tables and line drawings, and unfortunately some bugs remain. However, we are continuing to work on this and will be regularly updating the downloadable application and our online service.

Also, we have struggled to get our mathematical output, other than the text version, into a suitable e-book format. For this we hope to work with Kathi Fletcher, who is working on a project for creating OERs from existing documents such as Word, Google Docs and HTML, the results of which are published to cnx.org. The general idea is that we would offer the facility of directly importing PDF too, with the mathematics already marked up. We will be meeting later next month to discuss possible future collaboration.

Lessons Learnt

Here, in the penultimate project post, I will write about the lessons we have learnt during the project.

  1. Not all PDFs are created equal. Well, really we knew this already, but there can be a massive difference in quality between two visually identical documents. Some consist only of images whilst others have fonts embedded. Some have indexes and contents, some don’t. A few may have structural tags and alternative accessible content, but mainly they don’t, which brings me onto the second point.
  2. Considering how widespread PDF has become, it is important that people realise that using PDF for dissemination does not mean others will be able to actually use or access the document properly. Almost none of the accessibility features in the PDF specification are used, meaning that screen readers are usually incompatible, especially when the documents contain any non-standard text, and searching for and copying scientific content is almost impossible.
  3. Even when the accessibility features are used in PDF (inserted by us using MaxTract) they don’t always work. This is because they are simply not supported by the majority of PDF viewers, with the exception of Adobe Reader.
  4. Standards for accessible mathematics don’t seem to be widely used or supported. From our evaluation we found that users as a whole did not prefer any one format. Academics often preferred LaTeX, those with certain browsers liked MathML, a few liked the extended PDF and most people with screen readers liked text. In short, there was no consensus on a single format.

There are obviously other things too, which can be found from reading the other posts, but these are the main points.
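Whether a given PDF advertises any structural tags at all, as discussed in the first three lessons, can be checked roughly from the shell. The sketch below is a heuristic under stated assumptions, not part of MaxTract: it looks for the StructTreeRoot and MarkInfo entries that the PDF specification defines for tagged documents, and it will miss them if the document catalog is stored in a compressed object stream. The helper name is_tagged_pdf is hypothetical.

```shell
# Heuristic sketch: succeed if the raw PDF bytes mention a structure tree
# or a MarkInfo dictionary with /Marked true. is_tagged_pdf is a hypothetical name.
is_tagged_pdf() {
    LC_ALL=C grep -a -q -E '/StructTreeRoot|/Marked[[:space:]]+true' "$1"
}
```

For example, `is_tagged_pdf notes.pdf && echo tagged` (with a hypothetical file name) prints "tagged" only when one of the markers is found. A match says nothing about how good the tagging is, nor whether a given viewer will make use of it.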

Editors and future work

After being put in touch by Phil Barker, I spoke to Kathi Fletcher on Friday about her project for creating OERs from existing documents such as Word, Google Docs and HTML, the results of which are published to cnx.org. To make the documents as accessible, open and useful as possible, semantic markup is included with the resultant HTML, and an editor for mathematics is also being built. As MaxTract can already convert PDF, and preserve the mathematics, it looks like the two would be an ideal partnership, though unfortunately it is too late for this to be developed significantly during this project.

MaxTract to E-book

As mentioned in the previous post, we are using Calibre to convert our MaxTract output for use with eReaders.

First of all, installing Calibre under Linux is easy:
$ sudo apt-get install calibre

To get up and running, just open Calibre:

$ calibre

This opens a simple and friendly GUI, which can be used to import, convert and view files.

To convert a MaxTract file, click ‘add books’ and select the desired file, then click ‘convert e-books’ for the conversion. Lots of customisable options then appear, but the default options work fine. It should only take a few seconds for a file to be converted, which can then be uploaded to your preferred e-reader.

Unfortunately, MathML is not correctly translated during the process, so we recommend using the text option from MaxTract. Whilst this results in a document that is not as visually appealing as a compiled LaTeX or XHTML file, it does mean that users will not experience the problems of incorrect translation and rendering of mathematics when using screen readers, as users found in one of our earlier evaluations.
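For more than a handful of files, the same conversion can be scripted with ebook-convert, the command-line tool that is installed alongside the Calibre GUI. The following is a minimal sketch rather than part of our workflow: it assumes the MaxTract text output has been saved as .txt files, that EPUB is the desired target format, and that the default conversion options are acceptable. The helper names epub_name and convert_all are made up for illustration.

```shell
# Sketch of a batch conversion using Calibre's ebook-convert command-line tool.
# epub_name and convert_all are hypothetical helper names.

# Derive the output name by swapping the input file's extension for .epub.
epub_name() {
    printf '%s.epub\n' "${1%.*}"
}

# Convert each file given as an argument, using Calibre's default options.
# Stops at the first file that fails to convert.
convert_all() {
    for f in "$@"; do
        ebook-convert "$f" "$(epub_name "$f")" || return 1
    done
}
```

With these functions loaded into the shell, `convert_all lecture1.txt lecture2.txt` (hypothetical file names) would write lecture1.epub and lecture2.epub next to the inputs.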

Converting MaxTract output into ebooks

We have been working on transforming the output of MaxTract into an ebook format. The tool we have decided to use for this is Calibre, an open source ebook manager and editor. Later this week we will post instructions for installing it and converting the MaxTract output. This will be after we make a few modifications to MaxTract to ensure the output is fully compatible.

We are also looking into the use of Sigil, an open source ebook editor, for fine-tuning the final result.

New version of MaxTract

Sorry for the lack of updates recently, but we have been busy finalising the latest release of MaxTract, which is now live and ready to accept your PDF documents for analysis. To summarise the latest changes:

  • Fixed bugs in processing certain commands which resulted in strange (notdef) characters being produced
  • Improved font handling, allowing more files to be processed
  • Rewrite of layout analysis for far improved reproduction of pages
  • Improved detection of structures such as formulae, page numbers, headers and footers

As usual, all feedback is welcome.

Compatibility

We are now working hard to improve the range of PDF files that MaxTract is compatible with. One may think that all PDFs are equal, but unfortunately, that is not the case! Even visually identical files from the same source can differ completely in their internal code.

One of our main difficulties will be making MaxTract work with as many different types of file as possible. Now that we have rewritten and documented our extraction code, it has been far easier to tweak the system to improve compatibility. Obviously we are never going to be able to work with all files, and some will require far more than little tweaks to our code, but we are getting there!

Alistair has provided us with various Further Education materials that we will soon be experimenting with.

Evaluation 2

We are still getting feedback from people Alistair has put us in contact with, and we have also recently released a trial version of MaxTract online, to be tested by delegates at the Systems and Projects track at the Conferences on Intelligent Computer Mathematics in Bremen. When we have collated all of the feedback, by the end of next week, we will finalise our output formats.

We are also testing MaxTract on various inputs, and next week we will be testing it on further education resources. We will also be looking at the best ways to convert source materials into PDF for improved compatibility and accessibility.