MOBI vs EPUB

by Tina Holmboe, 11th of September 2012

I’ve done it before, and I’ll do it again: write about e–books and typography. This time it was prompted by a question from one of my favourite authors, Nicola Griffith, on Twitter:

Real question: what can ePub do that prc/mobi can’t?

combined with a discussion I had with a Norwegian publishing executive, also on Twitter, regarding the very curious market for electronic books in Norway. On this I have ranted before and will, in the near future, rant again.

The question is a sensible one. Let me do a quick study.

The MOBI format for e–books was created by Mobipocket SA, a French company, somewhere around 2000. It is based on the so–called Open eBook Publication Structure (or OEBPS) standard.

OEBPS 1.0, to simplify strongly, specifies an XML–based language with which to define an electronic book. It does not, contrary to popular belief, specify a container format (OEBPS 2 does, but that’s not relevant here).

In the case of MOBI, the OEBPS data is normally contained in a PRC file, which is a Palm OS data container. Occasionally you’ll see such files with the extension .mobi instead.
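
To make the container point a little more concrete, here is a minimal sketch in Python of how one might peek at the Palm database header of such a file. It assumes the commonly documented 78-byte header layout (name in the first 32 bytes, type and creator codes at offsets 60 and 64, record count at offset 76) and that Mobipocket files carry the codes BOOK and MOBI. The function name read_palm_header is my own, made up for illustration, and not part of any tool mentioned here.

```python
import struct

PDB_HEADER_SIZE = 78  # assumed size of the fixed Palm database header


def read_palm_header(data: bytes) -> dict:
    """Parse the fixed-size header of a Palm database (PRC/PDB) file."""
    if len(data) < PDB_HEADER_SIZE:
        raise ValueError("too short to be a Palm database")
    # The database name occupies the first 32 bytes, NUL-padded.
    name = data[:32].split(b"\x00", 1)[0].decode("latin-1")
    # Type and creator are two 4-byte codes at offsets 60 and 64.
    db_type, creator = struct.unpack_from(">4s4s", data, 60)
    # The record count is a big-endian 16-bit integer at offset 76.
    (num_records,) = struct.unpack_from(">H", data, 76)
    return {
        "name": name,
        "type": db_type.decode("ascii", "replace"),
        "creator": creator.decode("ascii", "replace"),
        "records": num_records,
        # Assumption: a Mobipocket book identifies itself as BOOK/MOBI.
        "is_mobi": db_type + creator == b"BOOKMOBI",
    }
```

Fed the first 78 bytes of a file (a hypothetical book.prc, say), this should report the type and creator codes, which is about as much as the container itself will tell you about what is inside.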

Despite what Wikipedia claims, MOBI does not use XHTML. When OEBPS was first developed (version 1 was out in 1999), XHTML 1.0 was not yet complete. As the MOBI reference documentation states:

As a consequence, the Mobipocket format is based on HTML and is reflowable

OEBPS 1.0 — often shortened to OEB — was based on HTML 4.0. The specification makes this explicit:

Any HTML construct deprecated in HTML 4.0 is either omitted from this specification or is deprecated; CSS–based equivalents are provided in most such cases. Stylesheet constructs are also used for new functionality beyond that provided in HTML 4.0.

However:

OEB was developed in parallel with and in reference to XHTML 1.0, …

and:

… is expected to conform to XHTML 1.0 when that specification is issued, and

OEBPS 1.0 was, however, designed to use CSS:

Any HTML construct deprecated in HTML 4.0 is either omitted from this specification or is deprecated; CSS–based equivalents are provided in most such cases. Stylesheet constructs are also used for new functionality beyond that provided in HTML 4.0.

Looking at the Mobipocket reference, I find that «old style» HTML formatting is recommended in lieu of CSS:

The left alignment of the text can be forced with the <p align="left"> or <div align="left"> tag.

Let’s pass over, with some trepidation, examples such as:

<div bgcolor="#CCCCCC"><span color="#FFFFFF">My Heading</span></div>

which not only illustrates very non–CSS techniques, but messes up accessibility in that the heading isn’t, in fact, a heading at all.
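
For the curious, this kind of markup can be flagged mechanically. The following is a rough sketch using Python’s standard html.parser module; the short list of presentational attributes and the names MarkupAudit and audit are my own illustrative choices, not anything taken from the Mobipocket or OEBPS documentation.

```python
from html.parser import HTMLParser

# Attributes that carry presentation rather than structure. This is an
# illustrative selection, not an exhaustive list.
PRESENTATIONAL = {"align", "bgcolor", "color"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MarkupAudit(HTMLParser):
    """Collect presentational attributes and count real heading elements."""

    def __init__(self):
        super().__init__()
        self.presentational = []  # (tag, attribute) pairs, in document order
        self.headings = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names in lower case.
        if tag in HEADINGS:
            self.headings += 1
        for name, _value in attrs:
            if name in PRESENTATIONAL:
                self.presentational.append((tag, name))


def audit(markup: str) -> MarkupAudit:
    """Run the audit over a fragment of markup and return the results."""
    parser = MarkupAudit()
    parser.feed(markup)
    parser.close()
    return parser
```

Run over the example above, the audit finds two presentational attributes and no heading element at all, which is exactly the problem: a reader, or a screen reader, has nothing to tell it that «My Heading» is a heading.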

Looking at the somewhat quaintly titled «HTML tags» table for the Mobipocket reader shows us that the style attribute and the similarly named element simply aren’t supported.

So far so un–well: the OEBPS e–book specification allows for, and recommends, use of CSS, but the Mobipocket derivative format doesn’t actually support it.

In 2005, Mobipocket was bought by Amazon. Their Kindle reading platform supported MOBI and AZW formats — the latter simply MOBI with a slightly different compression technique. Both filetypes use the Palm container format, as does the AZW1 (or Topaz) format.

In January 2012, Amazon announced the KF8 format. These e–books, supported on the Kindle Fire and 4th generation Kindle readers, are EPUB 3 files packaged in a Palm container with some additional overhead and proprietary solutions tacked on.

So, what can EPUB do that MOBI cannot?

Well, for starters, EPUB 2 is a ZIP container with XML, XHTML 1.1 and CSS 2.1 files. No more PalmOS data blobs — these are difficult to process, not fully documented, with only partial implementation on some platforms, no longer developed or supported, and hard to manage. ZIP, on the other hand, is a generic archive format, well documented, and well supported.
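
Because the container is plain ZIP, poking around inside an EPUB needs nothing beyond a standard library. Here is a minimal sketch in Python, assuming the usual container layout: a mimetype entry first in the archive, and a META-INF/container.xml that points at the package document. The function name inspect_epub and the shape of the returned dictionary are my own, chosen for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def inspect_epub(data: bytes) -> dict:
    """Return a few basic facts about an EPUB, given its raw bytes.

    Raises KeyError if META-INF/container.xml is missing.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
        mimetype = None
        if "mimetype" in names:
            mimetype = archive.read("mimetype").decode("ascii").strip()
        # container.xml names the package (OPF) document via <rootfile>.
        root = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = root.find(f".//{{{CONTAINER_NS}}}rootfile")
        return {
            "mimetype": mimetype,
            "mimetype_first": bool(names) and names[0] == "mimetype",
            "package": rootfile.get("full-path") if rootfile is not None else None,
            "stylesheets": sorted(n for n in names if n.lower().endswith(".css")),
            "documents": sorted(
                n for n in names if n.lower().endswith((".xhtml", ".html", ".htm"))
            ),
        }
```

Nothing of the sort is possible with a Palm container without first writing, or finding, a parser for it. That is rather the point.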

EPUB 3 is similar, but uses HTML 5 and CSS 3. I am loath to call it «XHTML 5», as the HTML WG did not have a mandate to publish XHTML–family specs. Enough of that.

In conclusion: EPUB, which even Amazon is approaching, however obliquely, through KF8, is based on a common container format (ZIP) and uses structurally sound(er) markup (XHTML 1.1), plus CSS for layout and typography.

That’s what it does better.

And, of course, the EPUB format is actually published — something the Amazon ones are not. It’s quite likely that, in a number of years, it will be very difficult to get a reader to handle MOBI. But that’s guesswork.

However, in the end, possession is nine tenths of the law. So let’s experiment!


The first step is to create an appropriate EPUB. I do so using Tetropy’s txt2book software, and a copy of «Silver Blaze» from Project Gutenberg. This is my own system, and I know exactly how and what it will produce.

I selected a «modern» profile, which will provide certain well–defined CSS constructs to look for, specifically in the way paragraphs are offset.

From there I used Amazon’s KindleGen to create a non–DRM’ed KF8 file. Finally, the screenshots were taken in the Kindle Previewer application, also by Amazon, in the belief that it will handle the rendering as if it were the actual reader hardware.

The results are below (screenshots are cropped):

  • Screenshot: EPUB in Calibre, 154 KB
  • Screenshot: KF8 on Fire, 36 KB
  • Screenshot: MOBI on Kindle, 53 KB

Perhaps we should conclude that the capabilities of the format are not as important as those of the reader?

Let’s look closer. I’ll use Charles M. Hannum’s MobiUnpack program to take the KindleGen–produced KF8 apart — specifically it’ll give me two structures: one Mobi 8 (KF8) and one Mobi 7 (AZW). I can then study the «raw» HTML output in detail.

What I find for the «old skool» MOBI/AZW format is this:

<p height="1em"><blockquote> “I am afraid, Watson, that I shall
have to go,” said Holmes, as we sat down together to our
breakfast one morning. </blockquote></p> <p height="1em"><blockquote>
“Go! Where to?” </blockquote></p> <p
height="1em"><blockquote> “To Dartmoor; to King’s
Pyland.” </blockquote></p>

The KF8 output is a structured EPUB complete with stylesheets, fonts and images.

I do believe I shall rest my case :)

References

Open eBook™ Publication Structure 1.0.1
http://web.archive.org/web/20101212183320/http://idpf.org/oebps/oebps1.0.1/download/oeb101-xhtml.htm
Open eBook Forum, 2001 (via the Wayback Machine)

The PRC Format
http://web.mit.edu/tytso/www/pilot/prc-format.html
Theodore Ts’o, 2000

MOBI
http://wiki.mobileread.com/wiki/Mobi
MobileRead, 2012