Demystifying EPUB Structure for Non-coders

© Christen Thomas
[email protected]
What should an editor know about
QA (Quality Assurance)
EPUB Structure
Basic coding
DRM (Digital Rights Management)
Editors know markup!
Ebooks use markup too…
(we’ll break this down later)
Consistency, clarity, & conciseness
Editors know all about these three goals.
Good code should follow the same rules.
CMOS for print = EPUBCheck for ebooks
Editors know content best,
& are best suited to check it
EPUB coding consists of the book’s content and
the styles that control what that content looks like.
Example: H1 = Heading 1.
The name of the heading is the content.
The font size and treatment is the style.
What does it mean to edit an ebook?
Approach the ebook as a reader.
Use multiple devices.
e.g. Are headers or captions appearing
isolated from text?
e.g. Are illustrations rendering correctly and scaling
with the device?
Examples of bad QA
Bad UX (User Experience)
Common QA errors
Weird f o r m a t t i n g
Missing, duplicated, or incorrect text
Misplaced or unanchored images
Broken links (Table of Contents, endnotes, URLs)
Typohs [sic]
Image quality
Unsupp☐rted char☐cters
Definitions, description, and support
EPUB: Content takes priority over design. Reflowable and accessible
across many devices.
PDF: Design takes priority, matching print. Complex content.
EPUB: in detail
Reflowable content, across devices
Best for text only with little formatting
Requires reader software
User can change font and size
Screen size irrelevant
No page references
Images lower quality, often b&w
EPUB has a bit of a poetry problem…
New standard
Supports HTML 5, audio, embedded videos
PDFs: in detail
Control design & presentation
Ideal for complex content
Maintains format of print regardless of device
Requires scrolling or a large screen
Good quality of images
Ebooks and Ereaders
Amazon’s proprietary format
Offer free conversion from ePub for publishers
Don’t offer the Mobi file in return
Small differences in formatting from ePub
Why is it better to convert to their format yourself?
Used on Kindle Fire (Amazon)
Replaces Mobi
Supports HTML 5, CSS3 (ePUB 3 features)
IBA format
iBooks Author
Proprietary Apple format
Similar to EPUB, but cannot be read or edited as an
EPUB document
Fixed-Layout EPUB (FXL)
Keeps same layout and design as print, can also be
enhanced and interactive
Styles and layouts are not reflowable
Amazon, B&N, Kobo, Apple accept them
Mostly kids books
Fixed-layout EPUB sample
When to use fixed-layout
If placement of text, tables, or illustrations is essential to maintain
text or story
For titles that must match print-book layout (e.g. contractual
obligation to replicate printed work)
For children’s books, illustrated textbooks, cookbooks, 2-page spreads, full-bleed art books
Enhanced Ebooks
Video, audio
Can embed PDFs
XHTML tables
Additional internal and external hyperlinks
Enhanced ebook sample: eBook Architects
Looking at its folders and files to demystify
EPUB structure overview:
EPUB files consist of content & style
(chapter files in HTML and a CSS file)
Additional content in an EPUB:
- cover page
- copyright page
- table of contents
EPUB structure files:
- mimetype
- META-INF and container.xml file
- OEBPS and the OPF and NCX
- XHTML content
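How an ereader finds its way through these files can be sketched in a few lines of Python. This is only an illustration, not a full reader: the sample container.xml text and the path OEBPS/content.opf are made up for the example, and find_opf_path is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample META-INF/container.xml (illustrative; the OPF path is an assumption).
SAMPLE_CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# container.xml uses this XML namespace for all of its tags.
NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(container_xml):
    """Return the location of the OPF file named in container.xml."""
    root = ET.fromstring(container_xml.encode("utf-8"))
    rootfile = root.find("c:rootfiles/c:rootfile", NS)
    if rootfile is None:
        raise ValueError("container.xml names no rootfile")
    return rootfile.attrib["full-path"]


opf_path = find_opf_path(SAMPLE_CONTAINER)
print(opf_path)
```

The point of the sketch: container.xml is a signpost. It tells the ereader where the OPF lives, and the OPF in turn lists the chapters and their order.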
Exercise: Unzip/Zip EPUB file
Download EPUBZip
Select a DRM-free EPUB file and then
UnZip it. Now you can look at the contents!
Open an XHTML file.
You can also select your EPUB folder and
Zip it: this will create a Zipped EPUB on
your Desktop.
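For the curious, the same unzip and zip steps can be sketched with Python's built-in zipfile module. This is an assumption-laden illustration, not how EPUBZip itself works: unzip_epub and zip_epub are hypothetical helper names, and the tiny "book" folder is created only for the demonstration. The one EPUB-specific rule shown is that the mimetype file goes into the archive first and uncompressed.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_epub(epub_path, out_dir):
    """Extract every file in the EPUB (a ZIP archive) into out_dir."""
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(out_dir)


def zip_epub(folder, epub_path):
    """Zip a folder back into an EPUB, with mimetype first and uncompressed."""
    folder = Path(folder)
    with zipfile.ZipFile(epub_path, "w") as archive:
        archive.write(folder / "mimetype", "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for path in sorted(folder.rglob("*")):
            relative = path.relative_to(folder).as_posix()
            if path.is_file() and relative != "mimetype":
                archive.write(path, relative,
                              compress_type=zipfile.ZIP_DEFLATED)


# Demonstration on a made-up, two-file "book" in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "book"
    (book / "OEBPS").mkdir(parents=True)
    (book / "mimetype").write_text("application/epub+zip", encoding="ascii")
    (book / "OEBPS" / "chapter1.xhtml").write_text(
        "<p>Once upon a time,</p>", encoding="utf-8")

    epub_file = Path(tmp) / "book.epub"
    zip_epub(book, epub_file)
    with zipfile.ZipFile(epub_file) as archive:
        names = archive.namelist()

    unzip_epub(epub_file, Path(tmp) / "unzipped")
    round_trip = (Path(tmp) / "unzipped" / "OEBPS" / "chapter1.xhtml").read_text(
        encoding="utf-8")

print(names)
```

The demonstration folder is not a complete, valid EPUB (it has no container.xml or OPF); it only shows the zip and unzip round trip.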
Exercise: Validate your EPUB file
Go to
Select your EPUB file
Try introducing an error into an XHTML file (like
removing some of the tags), zip the EPUB and
validate. Read errors, fix and try again.
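To see why a removed tag triggers an error, here is a small Python sketch. It checks only whether the markup is well-formed XML, which is a small part of what a real EPUB validator does; check_well_formed is a hypothetical helper name and the two snippets are made up for the example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xhtml_text):
    """Return None if the markup parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError as error:
        return str(error)
    return None


good = "<body><p>They lived happily ever after.</p></body>"
broken = "<body><p>They lived happily ever after.</body>"  # closing </p> removed

good_result = check_well_formed(good)
broken_result = check_well_formed(broken)
print(good_result)    # None means no problem was found
print(broken_result)  # the parser's description of the problem
```

Like a validator, the parser reports where it got confused, which is usually at or just after the place where the tag went missing.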
The building blocks of EPUB
EPUB = machine / human readable
What is HTML?
Markup language developed for the internet
Describes how text and media are displayed by a browser
Text-only file with tags
EPUB uses HTML to markup ebook text
Like proofreading markup, HTML is notation
inserted into text
Tells the browser and ereader how the manuscript is to be displayed
Lots of resources to find HTML tags!
Without HTML
Text like this:
Chapter 1
Once upon a time,
They lived happily ever after.
Displays like this in a browser, without any markup:
Chapter 1 Once upon a time, … They lived happily ever after.
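Why the line breaks vanish: a browser treats any run of spaces and line breaks in unmarked text as a single space. A rough Python imitation of that behaviour follows. It is a simplification (real browser whitespace rules have more cases), and collapse_whitespace is a hypothetical helper name.

```python
def collapse_whitespace(text):
    """Imitate how a browser joins unmarked text: any run of whitespace becomes one space."""
    return " ".join(text.split())


plain_text = """Chapter 1
Once upon a time,
…
They lived happily ever after."""

displayed = collapse_whitespace(plain_text)
print(displayed)
```

Without tags, the browser has no way of knowing that "Chapter 1" is a heading or where one paragraph ends and the next begins.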
With a little HTML
Add this markup:
<h1><b>Chapter 1</b></h1>
<p><i>Once upon a time,</i></p>
<p>… </p>
<p>They lived happily ever after.</p>
And it will display like this:
Chapter 1
Once upon a time,
They lived happily ever after.
What can HTML describe?
Heading levels
Paragraph breaks
Line breaks
Block quotes
Font sizes
Basic text styles (italic, bold)
Some basic HTML tags
<b>bold</b> or <strong>strong</strong>
<i>italics</i> or <em>emphasis</em>
To break a line of text
Use the break tag: <br/>. Note that it requires only
one tag, not an opening and a closing tag.
Structure of an HTML file
HTML is plain text.
Write it in a plain text editor, such as Sublime Text,
Notepad, or TextEdit.
Microsoft Word adds hidden markup that will
interfere with your HTML file and need cleaning up.
Text editors default to a .txt file, but you can save it
as an .html file or .xhtml file. Wait, WHAT IS THAT?!
So what is XHTML?
XHTML is used in EPUB. It follows stricter rules. But
everything covered so far is true of HTML and XHTML.
XHTML has more detailed declarations.
Tags are case sensitive (write them in lowercase).
All tags must be closed: paired with an ending tag, or self-closed like <br/>.
XHTML is not a different language from HTML; think of it
as a different accent.
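XHTML's stricter rules are XML's rules, so an XML parser can stand in for a strict reader. The Python sketch below tries three made-up snippets; is_well_formed is a hypothetical helper name, and passing this check does not by itself make a file valid XHTML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the snippet obeys XML rules, which XHTML inherits."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


self_closed = is_well_formed("<p>Line one<br/>line two</p>")  # <br/> closes itself
unclosed = is_well_formed("<p>Line one<br>line two</p>")      # <br> is never closed
mixed_case = is_well_formed("<P>Once upon a time,</p>")       # <P> and </p> do not match

print(self_closed, unclosed, mixed_case)
```

A forgiving browser would display all three snippets; a strict XHTML reader accepts only the first.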
HTML: created for browsers
A browser can understand this:
But that doesn’t look very good, does it?
To make it human readable, what to do…?
Whitespace makes code readable to humans, and
easy to edit. Also, indentation helps. Isn’t this better?
<!DOCTYPE html>
<title> </title>
What are those angled brackets?
Define the structure of HTML
Components of your book
Skeleton of the EPUB’s content
Usually, you need a pair of tags:
An opening tag <tag> and a closing tag with
a forward slash </tag>
<body>your content is here!</body>
Nested tags
Tags can be "nested" within other tags, like this:
(This is referred to as a "parent" node and a "child" node.)
Here’s how it works:
<title> is nested between opening and
closing <head> tags meaning that <head> is the
parent node and <title> is the child node.
<head> also happens to be a child node of <html>.
<html> is always the "root" node and
has no parent nodes.
<head> and <body> are on the same level. They
are both child nodes of <html>. How do you show this?
Indentation helps
<!DOCTYPE html>
<html>
  <head><title>Title goes here</title></head>
  <body>Content appears!</body>
</html>
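The parent and child relationships can also be printed as an outline. In this Python sketch, the PAGE text is an assumed complete version of the page described above, and outline is a hypothetical helper name; each level of nesting is indented by two spaces.

```python
import xml.etree.ElementTree as ET

# Assumed complete page: <title> inside <head>, with <head> and <body> inside <html>.
PAGE = ("<html><head><title>Title goes here</title></head>"
        "<body>Content appears!</body></html>")


def outline(node, depth=0):
    """Return one line per tag, indented two spaces per nesting level."""
    lines = ["  " * depth + node.tag]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(PAGE))
print("\n".join(tree_lines))
```

In the printed outline, html sits at the left edge as the root, head and body share one level of indentation as siblings, and title is indented once more as the child of head.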
What is CSS?
Cascading Style Sheets
The difference between content (like text) and
presentation (font, text colour and size)
“Cascading” means it should reduce redundant
HTML describes content, not how it looks.
CSS gives you control over how this content is displayed.
CSS styling not in HTML:
Leading (line height)
Paragraph indenting
Control over serif or sans serif
Finely calibrated font size
Margins and padding on elements
Element outlining
CSS: Control in one central place
You can set the style once, and have it applied
everywhere. You can tweak styles in one place to
change the entire document.
This is an efficient workflow! And maintains consistency.
EPUB structure takeaways
EPUB consists of chapters in XHTML, plus a CSS file
Typically one CSS file is used per book; it applies styles to all content
EPUB links and orders the content, includes a table of
contents and book cover, and lets an ereader understand it
Homework: As per previous exercises, get your hands on a
DRM-free EPUB file. Unzip the EPUB
Look at use of CSS and HTML with your new knowledge!
Make some coding changes and look at the results.
Resources on EPUB
Great tutorial and listing of
HTML and CSS tags
EPUB Straight to the Point: Industry go-to book
about learning EPUB
Free tutorial on EPUB Resources, tutorials,
articles on ebook production
Ladies Learning Code: HTML & CSS
Scheduling, conversion, version control, cost
Workflow suggestions
Know your ebook formats at acquisitions
Go from production final files to ebook files to avoid
parallel changes
Use templates if possible in InDesign
Consider tools like InCopy to improve on marking up
paper and having the designer input changes
To “mark up” corrections in an ebook, reference the
chapter and quote the text, instead of citing a page #
If the format is PDF, mark up as per print
Workflow/QA takeaways
Don’t let your readers be your proofreaders, this
undermines your product. – Laura Brady @LauraB7
Test on multiple ereading devices, use EPUBCheck
Treat the ebook as a relatively raw manuscript, not a
printed duplicate of the edited book
When you OCR, expect more QA
Balance your conversion house options with the time you
have to fix mistakes
What is DRM, and what does it do?
DRM: Digital Rights Management
When you are reading an ebook, do you feel that DRM:
a) Protects the publisher and author from piracy
b) Interferes with your reading experience
c) You don’t notice it
d) Forces you to read/purchase within a closed
e) You remove the DRM
f) Prevents you from owning or sharing the ebook
Digital Publishing Resources
eBOUND Canada and BookNet Canada
#eprdctn on Twitter
Mobileread forum
Digital book conferences (Digital Book World,
TechForum, eBOUND workshops, BookSummit, and
BookCamp – Free!)
Bloggers like, Mike
Thank you!