So here it is, the Grand Finale: the last post summarising our project on Improving Accessibility to Mathematical Teaching Resources, funded under the JISC OER Rapid Innovation programme.
Overview of Project
We identified two main aims for the project in our ‘nutshell’ post:
1. To turn the current research prototype of MaxTract into a robust, reliable system that is capable of handling a large corpus of available teaching material.
2. To make it usable by giving it an interface (first a web interface, then a proper user interface) and to test its effectiveness with end users.
Since then we have made a lot of progress on two fronts. On the quality of output, we have improved layout analysis, font identification and reproduction, and formula recognition. On the quantity of output, we have improved compatibility with different kinds of PDF file and fixed bugs in the parser. We have also created a web interface for using MaxTract as a service, and we offer MaxTract for download so it can be used offline.
We have tested the robustness of MaxTract by running it over a large corpus of European mathematical articles (tens of thousands of them), together with smaller samples of teaching materials and user-submitted documents. In particular, we were able to process a large sample of non-encrypted online mathematical teaching materials. We have made significant progress in the range of PDF files we can process; however, some PDF documents will always be incompatible, namely those made from images or those using Type 3 fonts.
With regard to the quality of output, we have inspected hundreds of pages of processed documents, examining page layout, formula structure and so on, to ensure our output is accurate. We have also listened to feedback from users and made changes when they pointed out errors. Finally, we have listened to what users want from our output formats, and have tuned the system to their requirements.
We now offer a system that is capable of turning PDFs, especially those containing mathematics, into a wide range of what we believe to be truly accessible formats.
Future Work
There are certain issues we have not yet been able to deal with. Some types of document still cause problems, such as those with many tables and line drawings, and unfortunately some bugs remain. However, we are continuing to work on these and will regularly update both the downloadable application and our online service.
We have also struggled to get our mathematical output, other than the text version, into a suitable e-book format. For this we hope to work with Kathi Fletcher, who is working on a project for creating OERs from existing documents such as Word, Google Docs and HTML, with the results published to cnx.org. The idea is that her project would also offer direct import of PDF files, with the mathematics already marked up. We will be meeting later next month to discuss possible future collaboration.