Ubuntu & !(EPUB support) have a pretty long history now.

Let’s change it!

Ubuntu deserves credit for shipping a handful of apps that handle all of the most popular content formats out of the box. Unfortunately, that is not the case for EPUB, a free and open e-book standard that is widely used. To support it in Ubuntu, a backend has to be implemented in Evince, the default document reader. You can also look at the bug report.

EPUB defines a means of representing, packaging and encoding structured and semantically enhanced Web content — including XHTML, CSS, SVG, images, and other resources — for distribution in a single-file format.
EPUB 2 was initially standardized in 2007.
In October 2011, EPUB 3 superseded EPUB 2 as the current version when EPUB 3.0 was approved as a final Recommended Specification. [1]

A good starting point could be the libgepub library. It provides a widget that renders EPUB using GLib and WebKit, and it is based on libgxps. libgepub is unfinished, so some work has to be done before it can be added to Evince.

Today’s post gives an overview of the first challenge in supporting EPUB: the layout, or pagination, problem.

LAYOUT

The design center of EPUB is dynamic layout: content is typically intended to be formatted on the fly rather than being typeset in a paginated manner in advance (i.e., expecting a particular-sized “page”). EPUB content is paginated with “optimal fill”: a new “page” is created only when there is no more room for text on the previous one. The user can adjust the font size, preview mode and other settings, and pages are created dynamically to match. As a result, the page numbers of the original printed book will not correspond to the e-book’s pages, but they can be displayed as a range (for example, 1-3/100). This functionality has to be implemented in the Evince backend. Different EPUB versions solve the pagination problem in different ways.
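To make “optimal fill” and the from-to label concrete, here is a minimal Python sketch. It assumes the layout engine has already split the content into blocks (for example, lines) with a known height at the current font size, and that each block can be tagged with the printed page it came from. The `Block` type, its fields and both function names are hypothetical; they are not part of Evince or libgepub.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One laid-out piece of content, e.g. a line of text (hypothetical type)."""
    height: int      # height at the current font size, in pixels
    print_page: int  # printed-book page this block belongs to (assumed known)


def paginate(blocks: list[Block], page_height: int) -> list[list[Block]]:
    """Greedy "optimal fill": open a new page only when the next block does not fit."""
    pages: list[list[Block]] = []
    current: list[Block] = []
    used = 0
    for block in blocks:
        # A block taller than a whole page still ends up on a page of its own.
        if current and used + block.height > page_height:
            pages.append(current)
            current, used = [], 0
        current.append(block)
        used += block.height
    if current:
        pages.append(current)
    return pages


def page_label(page: list[Block], total_print_pages: int) -> str:
    """Label a screen page with the printed pages it spans, e.g. '1-3/100'."""
    first, last = page[0].print_page, page[-1].print_page
    span = str(first) if first == last else f"{first}-{last}"
    return f"{span}/{total_print_pages}"


# Usage: five 20 px lines on a 60 px screen; prints 1-2/100 then 3/100.
lines = [Block(20, 1), Block(20, 1), Block(20, 2), Block(20, 3), Block(20, 3)]
for screen_page in paginate(lines, page_height=60):
    print(page_label(screen_page, total_print_pages=100))
```

Changing the font size changes the block heights, so `paginate` has to run again, while the printed-page labels stay attached to the content itself.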

For EPUB version 2.0.1, some files use DAISY’s Navigation Control File (NCX) for pagination. In the NCX, the pageList element holds the navigation information for pages in pageTarget elements: each navigable page within the book is represented by a pageTarget within the pageList.
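A minimal sketch of reading the pageList with Python’s standard `xml.etree.ElementTree`. It assumes the NCX file has already been extracted from the EPUB container into a string. `read_ncx_page_list` is a hypothetical helper name, and the sample document is trimmed down (a real NCX also has `head`, `docTitle` and `navMap` elements).

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def read_ncx_page_list(ncx_xml: str) -> list[tuple[str, str]]:
    """Return (page label, content src) for each pageTarget in the NCX pageList.

    An empty list means the NCX carries no page information.
    """
    root = ET.fromstring(ncx_xml)
    page_list = root.find("ncx:pageList", NCX_NS)
    if page_list is None:
        return []
    pages = []
    for target in page_list.findall("ncx:pageTarget", NCX_NS):
        label = target.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX_NS)
        content = target.find("ncx:content", NCX_NS)
        src = content.get("src", "") if content is not None else ""
        pages.append((label.strip(), src))
    return pages


# Trimmed-down NCX sample (hypothetical content).
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <pageList>
    <pageTarget id="page-1" type="normal" value="1" playOrder="1">
      <navLabel><text>1</text></navLabel>
      <content src="chapter1.xhtml#page-1"/>
    </pageTarget>
    <pageTarget id="page-2" type="normal" value="2" playOrder="2">
      <navLabel><text>2</text></navLabel>
      <content src="chapter1.xhtml#page-2"/>
    </pageTarget>
  </pageList>
</ncx>"""

print(read_ncx_page_list(SAMPLE_NCX))
```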

EPUB 3.0 includes its own pagination specification. The page-list nav element is a container for pagination information.
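In EPUB 3 the page list lives in the navigation document as a `nav` element whose `epub:type` is `page-list`, containing an ordered list of links to the page break locations. The sketch below, again using only the Python standard library, assumes the navigation document has already been extracted and is well-formed XHTML. `read_page_list_nav` is a hypothetical name and the sample is trimmed down.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
OPS = "http://www.idpf.org/2007/ops"  # namespace of the epub:type attribute


def read_page_list_nav(nav_xhtml: str) -> list[tuple[str, str]]:
    """Return (page label, href) pairs from the <nav epub:type="page-list"> element."""
    root = ET.fromstring(nav_xhtml)
    for nav in root.iter(f"{{{XHTML}}}nav"):
        # epub:type is a space-separated list, so test membership, not equality.
        if "page-list" in nav.get(f"{{{OPS}}}type", "").split():
            return [
                ("".join(a.itertext()).strip(), a.get("href", ""))
                for a in nav.iter(f"{{{XHTML}}}a")
            ]
    return []  # no page-list: page numbers have to be computed instead


SAMPLE_NAV = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <nav epub:type="toc"><ol><li><a href="chapter1.xhtml">Chapter 1</a></li></ol></nav>
    <nav epub:type="page-list" hidden="">
      <ol>
        <li><a href="chapter1.xhtml#page1">1</a></li>
        <li><a href="chapter1.xhtml#page2">2</a></li>
      </ol>
    </nav>
  </body>
</html>"""

print(read_page_list_nav(SAMPLE_NAV))
```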

The problem is that some EPUB 2.0.1 files don’t have an NCX file. This could be solved by implementing a pagination algorithm.
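Before choosing a pagination strategy, the backend has to find out which case it is dealing with. In an EPUB 2 package the NCX is listed in the OPF manifest with the media type `application/x-dtbncx+xml`, so a rough check could look like the sketch below. It assumes the OPF file has already been read from the container; `find_ncx` is a hypothetical name and the sample package omits the required metadata for brevity. Note that an NCX being present does not guarantee that it contains a pageList.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
NCX_MEDIA_TYPE = "application/x-dtbncx+xml"


def find_ncx(opf_xml: str) -> Tuple[str, Optional[str]]:
    """Return the package version and the manifest href of the NCX, or None."""
    root = ET.fromstring(opf_xml)
    version = root.get("version", "")
    for item in root.iterfind("opf:manifest/opf:item", OPF_NS):
        if item.get("media-type") == NCX_MEDIA_TYPE:
            return version, item.get("href")
    return version, None


SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx"><itemref idref="ch1"/></spine>
</package>"""

version, ncx_href = find_ncx(SAMPLE_OPF)
if version.startswith("2") and ncx_href is None:
    print("EPUB 2 without NCX: page numbers must be computed")
else:
    print(version, ncx_href)
```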

There are several criteria in use to divide an e-book’s content into pages:

  • Adobe Digital Editions interprets every 1024 bytes of source content as a “page”. This gives consistent page numbers across all screen and font sizes, but it handles pictures badly: they are commonly much heavier (in bytes) than plain text, so they occupy more pages.
  • On the other hand, the Sony BBeB reader counts “pages” in units of “screenfuls of text”. In this case, page numbers change with the font size.
  • A progress indicator shown as a percentage (e.g. “You’re at 80% of the book”).
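The byte-based and percentage criteria are simple enough to express directly. This sketch maps byte offsets in the source content to fixed-size pages using the 1024-byte figure mentioned above, and computes a percentage for a progress indicator. The function names are hypothetical, and how Adobe Digital Editions actually measures the bytes (for example, compressed or uncompressed sizes) is not modelled here.

```python
BYTES_PER_PAGE = 1024  # the figure quoted above for Adobe Digital Editions


def byte_page(offset: int) -> int:
    """1-based fixed-size 'page' that contains a byte offset in the source content."""
    return offset // BYTES_PER_PAGE + 1


def byte_page_count(total_bytes: int) -> int:
    """Number of fixed-size pages; it does not depend on screen or font size."""
    return max(1, -(-total_bytes // BYTES_PER_PAGE))  # ceiling division


def progress_percent(offset: int, total_bytes: int) -> int:
    """Reading progress as a whole percentage, for a "You're at 80%" indicator."""
    return offset * 100 // total_bytes


print(byte_page(2500), byte_page_count(10_000), progress_percent(8_000, 10_000))
```

With this scheme a 300 KiB picture would count as 300 pages (`byte_page_count(300 * 1024)`), which is exactly the problem with pictures noted in the first bullet.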

The first solution seems like the best one when the EPUB file is version 2.0.1 and has no NCX file. Some extra calculation is needed so that each picture is displayed on a single page: pictures could be detected and their positions calculated in advance. Since the other kinds of EPUB files include page numbers as defined by their specification, displaying the actual page number is not a problem for them.
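One possible way to do the extra calculation for pictures, sketched in Python under the assumption that the content has already been split, in reading order, into text runs and images with known byte sizes (for example, by spotting `img` elements or image media types in the manifest). Text is counted in 1024-byte pages as before, while every image closes the current partially filled page and gets exactly one page of its own. The `Item` type and `assign_pages` are hypothetical names, not an existing API.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class Item:
    """A run of content in reading order (hypothetical type)."""
    kind: Literal["text", "image"]
    size: int  # size of the item's source in bytes


def assign_pages(items: list[Item], bytes_per_page: int = 1024) -> tuple[list[int], int]:
    """Return the first page of every item, plus the total page count.

    Text is counted in fixed bytes_per_page chunks; each image gets exactly
    one page of its own, whatever its byte size.
    """
    first_pages: list[int] = []
    page = 1  # page currently being filled
    used = 0  # text bytes already placed on that page
    for item in items:
        if item.kind == "image":
            if used > 0:  # close the partially filled text page
                page += 1
                used = 0
            first_pages.append(page)
            page += 1  # the image occupies one whole page
            continue
        first_pages.append(page)
        remaining = item.size
        while remaining > 0:
            take = min(remaining, bytes_per_page - used)
            used += take
            remaining -= take
            if used == bytes_per_page:  # page full, move to the next one
                page += 1
                used = 0
    total = page if used > 0 else page - 1
    return first_pages, max(total, 1)


book = [Item("text", 1500), Item("image", 300_000), Item("text", 600)]
print(assign_pages(book))
```

For a 1500-byte text run, a 300 KB image and another 600 bytes of text, this puts the first text run on pages 1 and 2, the image on page 3 and the remaining text on page 4, instead of letting the image swallow hundreds of pages.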

This is the first in a series of posts about supporting the EPUB file format in Ubuntu. Until next time…

[1] EPUB specification
