
Forum: Author Hangout

Calibre and CSS stylesheet code for an e-pub

Ernest Bywater

For some years I've been working at making a better e-pub that comes out how I want it. I've learned it's hard not to mess up the code and end up with a sub-optimal result. This is mostly due to the lack of instructional material on how to get the best out of the various software programs and options. However, I've also learned much by trial and error.

If you create an e-pub direct from the word processing file you usually end up with a very bloated e-pub, due to excessive format coding and the way the converter programs create css sheets for each chapter in the book.

For the last few years I've been working on including the css stylesheet code in the html file I create the e-pub from, and found that this gives a better finish and a smaller e-pub file. I've also found there are a number of tweaks that help this along. So, based on my experiences with the program, here are some to use the next time you make an e-pub with Calibre.

1. Create the e-pub from a clean html file with all of the unneeded format code removed.

2. Never include any in-line format code other than what is in your stylesheet code so that the in-line code is either a paragraph class command or a span command.

3. Include your stylesheet code in the head section of the html file just after the metadata.

4. Have the stylesheet code start with the character format code definitions, then the paragraph code definitions. For some reason it works better to have things like the italics, bold, and similar character span definitions at the head of the code than at the bottom. I found this out by trial and error, and it can make a serious difference in the finished file size due to how Calibre handles the code. I don't know why, but it does.
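A minimal sketch of points 2 to 4 might look like the following. The class names (italics, bold, para, first) and the property values are assumptions for illustration only; the thread does not show the actual stylesheet.

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Example Book</title>
  <meta name="author" content="Example Author">
  <style>
    /* Character (span) formats first, per point 4. Class names are illustrative. */
    .italics { font-style: italic; }
    .bold    { font-weight: bold; }

    /* Paragraph formats after the character formats. */
    p.para  { margin: 0; text-indent: 1.5em; }
    p.first { margin: 1em 0 0 0; text-indent: 0; }
  </style>
</head>
<body>
  <h1>Chapter 1</h1>
  <p class="first">The opening paragraph has no indent.</p>
  <p class="para">Later paragraphs use a <span class="italics">span</span> class for character formatting.</p>
</body>
</html>
```

Every paragraph and every run of character formatting refers to a class defined once in the head, so the body carries no other in-line format code.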

Having learned some of this recently I'm now reconverting all my e-pubs to use the latest format and I'm seeing the file sizes drop significantly. Just moving the bold and italics code to the top of the list cuts the finished file size by about 20% or more. In some cases the new files are only 25% of the size of the old ones made a year ago.

Have fun.

samuelmichaels

@Ernest Bywater

Ernest, are you using a single HTML file for your entire book, or one per chapter?

Ernest Bywater

@samuelmichaels

Ernest, are you using a single HTML file for your entire book, or one per chapter?

A single file.

I write the stories using the LibreOffice word processing program. Then I save a copy as html, run a script to clean out all of the excess coding, and run another to replace the usual html character format and paragraph codes with the ones I want to use in the stylesheet code. After that I open the html file in a text editor and paste in a copy of the css code. That way I have one large file for Calibre to convert to e-pub, and it splits the file at the h1 and h2 tags as it makes the table of contents and then creates the epub file.

Part of the file bloat is due to Calibre including its own stylesheet code for anything I didn't cover as a span command. So in the early versions where I used the < b > and < i > commands Calibre created its own span commands and included extra code in the stylesheet for each chapter where those usual html commands appeared. Thus there would be a span command for bold1 in chapter 1, then another for bold2 in chapter 2, etc. By having them in the file I created, the span commands ended up in the stylesheet as only 1 entry, thus saving space.

Playing with the files in the last few days is how I found out I needed the character commands at the head of the css code in the file instead of the bottom of the css code.

Tonight I also found out that creating a table to include listed items results in a tighter e-pub file than using an ordered list or an unordered list, and I don't know why that is. However, I had one book with a lot of lists in it, and when I converted them to tables to improve the way they displayed the new e-pub file was half the size of the old one using lists.
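As an illustration only, a numbered list could be laid out as a two-column table along these lines. The class name and cell contents are hypothetical; the thread does not show the actual table markup used.

```html
<style>
  /* Hypothetical class for a borderless table used in place of a list. */
  table.list { border-collapse: collapse; margin: 1em 0; }
  table.list td { padding: 0 0.5em 0.2em 0; vertical-align: top; }
</style>

<table class="list">
  <tr><td>1.</td><td>First item</td></tr>
  <tr><td>2.</td><td>Second item, which may wrap onto a second line without losing its alignment</td></tr>
  <tr><td>3.</td><td>Third item</td></tr>
</table>
```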

Keet

@Ernest Bywater

Why include the style code in the html file and not keep it as a separate css file? If you have multiple html files for chapters you must have the same style code in every single file so using a separate css file instead would decrease the epub size even more.

Ernest Bywater

@Keet

Why include the style code in the html file and not keep it as a separate css file?

I have one big file with all of the chapters in it, so having the stylesheet code in the html file or in a separate file makes little difference. By having it in the actual file it means I need only copy the one file to have it all there. Also, there are a few books with a bit of extra style code and that code is only in the files for the relevant book.

Keet

@Ernest Bywater

I have one big file with all of the chapters in it, so having the stylesheet code in the html file or in a separate file makes little difference. By having it in the actual file it means I need only copy the one file to have it all there. Also, there are a few books with a bit of extra style code and that code is only in the files for the relevant book.

I didn't know it was always a single file and that Calibre split the chapters based on h1 and h2 codes. Makes sense to do it your way.
Does Calibre split the single file into multiple files and/or split off the css into a separate file before converting it to an epub?

Ernest Bywater

@Keet

Does Calibre split the single file into multiple files and/or split off the css into a separate file before converting it to an epub?

I'm not sure when or how it does it, all I know is I input a single file and I get a single e-pub. However, when I use Calibre to edit the file it shows an internal html file for every h1 and h2 chapter in the story with the various overhead files for the epub. This makes me think the epub file is some sort of zip style file.

Keet

@Ernest Bywater

This makes me think the epub file is some sort of zip style file.

Yes, an epub is just a zip file; I thought that was known around here. From your answer I understand that Calibre does indeed split the single file into multiple files using the h1/h2 tags. Does it extract the css too and put it into a separate file?
I'm interested because, technically, file conversions like the one Calibre does for html>epub are what I do professionally (not epubs, mostly data files like csv). Too bad the epub format is poorly set up. It could have been much better with stricter standardization. I'm more interested in converting epubs into html, but because of the bad standardization it's almost impossible to create a conversion routine that works for all epubs.

Switch Blayde

@Keet

I'm more interested in converting epubs into html

I thought an epub was HTML.

Keet

@Switch Blayde

I thought an epub was HTML.

In a way yes but formatted in a convoluted way and packed into a zip. You can really screw up html/css and make a mess of it but it seems that's the default for epubs :D

Lazeez Jiddan (Webmaster)

@Switch Blayde

EPUB is zipped xhtml. It's basically HTML with XML rules. So it's a much more rigid format than HTML (no html errors tolerated). It's all packed into a zip archive with an opf file for index and various functions.

EPUB 3.0 is very feature rich, but hardly supported by any software.
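For illustration, a chapter file inside the zip might look roughly like this. The file names and content are hypothetical; the point is only that, because the file is parsed as XML, every element must be closed and every attribute value quoted.

```html
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical chapter file, e.g. chapter1.xhtml, as found inside the zip. -->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Chapter 1</title>
  <!-- Empty elements must be self-closed under XML rules. -->
  <link rel="stylesheet" type="text/css" href="stylesheet.css"/>
</head>
<body>
  <h1>Chapter 1</h1>
  <p class="para">Every tag is closed and every attribute value is quoted.<br/>
  A bare &lt;br&gt; or an unclosed &lt;p&gt; would be an error here.</p>
</body>
</html>
```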

Ernest Bywater

@Keet

Does it extract the css too and put into a separate file?

yes.

Switch Blayde

@Ernest Bywater

Have you ever tried inputting a LibreOffice file into Calibre rather than generating HTML from LibreOffice and then inputting that HTML output into Calibre?

People say Word (so I assume LibreOffice as well) generates bloated HTML. Input bloated HTML into Calibre and I assume it won't clean it up. But if you input a docx (or odt) file into Calibre, is Calibre as inefficient as Word/LibreOffice at generating HTML?

Ernest Bywater

@Switch Blayde

People say Word (so I assume LibreOffice as well) generates bloated HTML

Yes it does.

Have you ever tried inputting a LibreOffice file into Calibre rather than generating HTML from LibreOffice and then inputting that HTML output into Calibre?

Yes, and it doesn't give the same appearance as the method I use, as well as resulting in the creation of a huge css file with a full set of code for each chapter. It does the same thing with .docx and every other text file with built in formatting.

Ernest Bywater

@Switch Blayde

Have you ever tried inputting a LibreOffice file into Calibre

Back when I started creating e-pubs I had one document which had a lot of lists in it to display the data in a specific way. It also had a lot of underlines, subscript, and superscript. This was before I developed the css code in the html file. I created the file direct from odt to e-pub and the finished file was 2.7 MB for just under 9,000 words. I recently added text to make the file 9,700 words, created the html file with css code for underline, superscript, subscript, and the latest css code in the file. When I created a new e-pub from the new html file the finished e-pub file shrunk from 2.7 MB to 0.2 MB - however you look at the results, that's a hell of a drop in file size. It more than justifies, in my mind, the little work I go through to make the html file to create the e-pub.

I'll admit that over the last few years I've spent a lot of time and effort in learning how the css code works and playing with it, but it was worth the effort to learn it because the current system only takes a few minutes to go from odt to the e-pub while also providing good html files to lodge with SoL and for my personal use.

Keet

@Ernest Bywater

If I understood correctly what Ernest did then this is one of the standardizations that should have been implemented in the epub standards from the very beginning: a default css file that defines the classes (tags) to use in your html. Because that default definition does not exist, every program that does the conversion to epub has to generate those classes on the fly, which causes the enormous bloat Ernest mentioned.
The number of codes that html uses (like p, h1, h2, b, i, sup, etc) is limited so it should not have been much of a problem to implement such a css default. To implement Ernest's method you use a css class for every tag in your html. For example do not use [b]bold text[/b] but instead use [span class="bold"]bold text[/span]. (Replace [ with < and ] with >). Also define a class for p elements so you don't use [p][/p] but [p class="p"][/p]. This preempts the converter from creating new classes for p elements in a new chapter, the main reason for the bloat. Once you have your own css styles defined you can use them again and again and change them if needed for a new book.
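The substitution described above can be sketched as a before and after. The class names bold and p come from the post; italics and the property values are assumptions for illustration.

```html
<!-- Before: bare tags, which the converter may restyle with its own generated classes. -->
<p>This is <b>bold text</b> and this is <i>italic text</i>.</p>

<!-- After: every element carries a class defined once in your own stylesheet. -->
<style>
  .bold    { font-weight: bold; }
  .italics { font-style: italic; }
  p.p      { margin: 0; text-indent: 1.5em; }
</style>
<p class="p">This is <span class="bold">bold text</span> and this is <span class="italics">italic text</span>.</p>
```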

Ernest Bywater

@Keet

ayep, that's about the best description of what I do.

Vincent Berg

@Ernest Bywater

Back when I started creating e-pubs I had one document which had a lot of lists in it to display the data in a specific way.

I gave up on list elements a while back, as it's easier to code them yourself, they look better, and they don't leave those annoying blank lines after each list item. In html, you simply add "& bull;" (no space) at the start of your indented line. It's a tiny bit of extra work, but it looks neater and is more consistent. And, if you mix it with hanging indents, you get very neat list items neatly tucked into each other.
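A hand-coded bullet with a hanging indent might be sketched like this. The class names and indent sizes are hypothetical, and the exact values usually need tuning by eye for the font in use.

```html
<style>
  /* Hypothetical classes: the negative text-indent pulls the first line
     (with its bullet) back out to the left of the wrapped lines. */
  p.bullet  { margin: 0 0 0 2em; text-indent: -1em; }
  p.bullet2 { margin: 0 0 0 4em; text-indent: -1em; } /* nested level */
</style>

<p class="bullet">&bull; A first-level item whose wrapped lines are indented further than the bullet.</p>
<p class="bullet2">&bull; A second-level item tucked in beneath it.</p>
<p class="bullet">&bull; Another first-level item.</p>
```

Because these are ordinary paragraphs with zero margins, there is no list container to add space above or below the items.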

Ernest Bywater

@Vincent Berg

I gave up on list elements a while back,

I had only the one book with lists in it, but I was shocked when the new file with tables was 0.2 MB while the original file was 2.7 MB. But the original e-pub wasn't done with my css code while this one was. Thus I suspect a lot of the reduction was due to just using the css code. I do like the way the tables improve the look in the finished product over the way the lists looked.

helmut_meukel

@Ernest Bywater

Ernest,

I download all books/stories per chapter as html-files.
If I like a story enough to reread it, I use Calibre's e-book editor to create an edited EPUB (correct typos and wrong homonyms I find). I use a slightly edited version of xht.css.
I created my e-book versions of your first two Clan Amir books without a title picture.
Book 1 A Fighting Heritage is 135 KB (138,357 Bytes).
Book 2 Falcon Chick is 127 KB (130,968 Bytes).
How do these sizes compare to your own versions?

BTW, I found no use for 'span' or 'article'.
What's the title of your story with the table? I would like to have a look at it, because none of the downloaded stories of any author I have already converted to EPUB has used a table in its html code.

HM.

Keet

@helmut_meukel

What's the title of your story with the table? I would like to have a look on it, because none of the downloaded stories of any author I already converted to EPUB has used a table in its html code.

I don't think SOL supports tables so I doubt you will find a story with a table. That's why you see lists and in some cases an image of a table. Many character lists would be tables if it was supported. I know one story from Ernest that probably would have had a table if it was supported: Power Tool; a table with commands the MC can use in the prelude chapter. I did the conversion to a table for my own library but I don't convert to epub.

ETA: remember that tables do not scale very well, especially if the reader enlarges the font size for easier reading. So if you mainly read on a phone you would be better off leaving it as it is. For proper scaling it would be better to create an image of the table (make it a .png or .gif, definitely not .jpg).

Ernest Bywater

@Keet

The table list issue was with the Basic Math Notes book which had a lot of them due to the subject matter.

Yes, Power Tool uses lists of the people and the commands.

Keet

@Ernest Bywater

The table list issue was with the Basic Math Notes book which had a lot of them due to the subject matter.

Yes, of course, but that one is not on SOL so it's not the story Helmut was searching for and it's already available as an epub (free on Bookapy). Since Helmut was looking for an example with a table in it to convert to an epub I replied as I did.

Ernest Bywater

@Keet

that one is not on SOL

It's not on SoL as it's not a story. If anyone wants a copy of the relevant code I'm happy to send it to them if they send me a Private Message so we can exchange emails to send attachments.

Ernest Bywater

@helmut_meukel

The epub version I made back in April was before I created the latest very fine-tuned css code set. Also, for the last few years the epubs of the Clan Amir series have been a single large anthology of the whole series, which is split up for SoL. So I've not got comparable stats for you. Also, due to my current legal issues I won't be working on the series until June 2021.

BlacKnight

@Vincent Berg

I gave up on list elements a while back, as it's easier to code them yourself, they look better, and they don't leave those annoying blank lines after each list item.

You can control list item spacing with CSS, you know.

Vincent Berg

@BlacKnight

You can control list item spacing with CSS, you know.

You can, which I've done, but it's difficult to eliminate the trailing space after the final < /ul> command. For me, it's easier simply avoiding the cumbersome automated list structure entirely, as the results look cleaner without it.

But, that's my personal opinion, and I'm sure that most will disagree with it. For most, using lists is a no-brainer, regardless of how it looks.

BlacKnight

@Vincent Berg

You can, which I've done, but it's difficult to eliminate the trailing space after the final < /ul> command.

No, it's not.

ul { margin-bottom: 0; }

You also need to remove the top margin from the following element, if you haven't done so already. Top/bottom margins overlap, so if you've got, say, two successive P elements with the default 1em top and bottom margins, they have only a 1em space between them, not 2em. But if you remove the bottom margin from them, they'll still have a 1em space between them. You have to remove both the bottom margin on the preceding element and the top margin on the following element to completely collapse the space.

If you want to remove top margin on Ps that directly follow a UL but leave it on other P elements (though fuck knows why you'd want to), you can do that with:

ul + p { margin-top: 0; }

edit: Dammit, the ampersand escapes worked last time I used them.
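Put together, the two rules above can be tried with a small test page like this. The paragraph margins are assumed to be the usual 1em defaults, and the content is made up for the example.

```html
<style>
  p  { margin: 1em 0; }        /* typical default paragraph margins */
  ul { margin-bottom: 0; }     /* remove the list's own bottom margin */
  ul + p { margin-top: 0; }    /* remove the top margin of a paragraph directly after a list */
</style>

<p>Paragraph before the list.</p>
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<p>This paragraph sits directly under the last item.</p>
<p>This one keeps its normal 1em gap.</p>
```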

Vincent Berg

@BlacKnight

You can, which I've done, but it's difficult to eliminate the trailing space after the final < /ul> command.

No, it's not.

ul { margin-bottom: 0; }

Yeah, that makes sense. I'm guessing I'd customized the li commands but not the ol, despite making other changes to it.

Vincent Berg

Sorry, Ernest, but after finally giving your < span class="italics"> idea a spin, it didn't make ANY difference in the size.

Examining my ePub code, I only have a single CSS file (aside from a separate .page CSS file containing the physical page dimensions), so there's NO duplicate CSS code in ANY of my ePubs (though I haven't reviewed that many years).

I'm guessing I've always coded my html/ePubs pretty well, and the extra work required to convert simple < I> commands into < span class="italics"> commands doesn't seem worth it.

There must be some other reason why your books are generating multiple CSS files.

Ernest Bywater

@Vincent Berg

There must be some other reason why your books are generating multiple CSS files.

CW,

there's only one css file in the epub, but Calibre adds lots of extra code with css item names like Calibre 1, Calibre 2 etc. Having everything in the css code I submit, and putting the in-line format code first, reduces those added css code items. When I examine an older file (yes I still have one) where I didn't have everything in the code, the stylesheet created by Calibre has an entry in the css for each split that's exactly the same but with a name which indicates each one is for a specific split.

The old file has a css of 434 lines for a 4 chapter story of under 1,000 words and is 0.7 MB while that story is included in a new document of 42,500 words for 0.4 MB with a css of 165 lines covering 56 chapters.

So something is being changed by the inclusion of the css code and the order it's laid out in.

Vincent Berg

@Ernest Bywater

there's only one css file in the epub, but Calibre adds lots of extra code with css item names like Calibre 1, Calibre 2 etc. Having everything in the css code I submit, and putting the in-line format code first, reduces those added css code items. When I examine an older file (yes I still have one) where I didn't have everything in the code, the stylesheet created by Calibre has an entry in the css for each split that's exactly the same but with a name which indicates each one is for a specific split.

Those various Calibre settings are formally duplicates, ensuring that your code conforms to Calibre's dictates. Whenever I specify various details in my H1 commands, Calibre puts in Center1 and Center2 settings which definitely duplicate or modify my own settings.

However, I don't get ANY duplications associated with my in-line code, just the 'override' settings to prevent users from narrowing the margins and other settings too much.

Vincent Berg

The replies I entered yesterday seem to have evaporated (or simply never got posted), so:

BlacKnight:
You're right. I'm not sure what I was thinking, as I'd probably modified the list-item (< li>) but not the object list (< ol>). But still, I prefer the simple & bull; bullet point over the canned variety that Word uses. It's smaller, simpler and looks less like a bullet-point powerpoint slide presentation at a business meeting. But again, that's just one weird guy's perspective. Feel free to ignore.

@Ernest:
Those extra style definitions mostly serve to restrict what you're allowed to do in Calibre, as my centered H1 chapter headers contain several versions of my simple "centered" command, duplicating my < span class>, but the duplications are only made once in the single CSS document, so other than complicating my H1 chapter headers, they aren't really providing much processing overhead.

You'd originally sold your 'simplification' of your in-line formatting commands as reducing the overall file size, but after trying it, I noticed NO size difference at all (though that may be because the graphics in my files make them too large to see any small variations).

By the way, the changes also had no impact on Lulu.com refusing ANY new publications or updates to ANY of my books, as the ePub checks I get back seem to object to every SINGLE style definition I use that's NOT a default html usage. I have no idea why Lulu would refuse to allow style definitions, but if I'm UNABLE to post or update my stories, there's really no reason to remain on Lulu any longer.

My last two books, Building a Nest of Their Own and my Not-Quite Human Box Set haven't allowed me to complete my three-book series, and it looks like I won't be able to post anymore unless I restrict my books to standard < p class=MsoNormal> paragraph types. But there are too many publishing choices to waste time trying to dance around their ever-changing limitations.

Keet

@Vincent Berg

but not the object list (< ol>.

Just a quick correction: ol = ordered list, ul = unordered list. You can set an attribute type="x" on an ol element where x = A, a, I, or i. For a numbered ordered list you can set an attribute start="x" where x is the number to start the list with.
Examples: < ol type="a"> ..., < ol start="50"> ...
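Written out in full, the two attributes look like this (the list contents are made up for the example):

```html
<!-- Lettered list: a, b, c -->
<ol type="a">
  <li>First</li>
  <li>Second</li>
  <li>Third</li>
</ol>

<!-- Numbered list that starts counting at 50 -->
<ol start="50">
  <li>Item fifty</li>
  <li>Item fifty-one</li>
</ol>
```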

ETA concerning the top/bottom margins around lists:
w3schools says that most browsers use these default settings for a list:
ul {
  display: block;
  list-style-type: disc;
  margin-top: 1em;
  margin-bottom: 1em;
  margin-left: 0;
  margin-right: 0;
  padding-left: 40px;
}
To change the top and bottom margin you would think that you could add this to your CSS:
ul {
  margin-top: 0em;
  margin-bottom: 0em;
}
But that doesn't work if the p elements above and below the list have top/bottom margins of 1em. The space would remain the same. Set the values to -1em and the empty space at the top and bottom disappears.
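A small test page for the negative-margin approach might look like this. The class name tight is hypothetical, and the sketch assumes the list and the paragraphs use the same font size so that 1em means the same in both. Support for negative margins can vary between e-readers, so it is worth checking on the target device.

```html
<style>
  p  { margin: 1em 0; }
  /* With margins of 0 the neighbouring paragraphs' 1em margins still apply.
     A negative margin collapses against them: 1em + (-1em) = 0. */
  ul.tight { margin-top: -1em; margin-bottom: -1em; }
</style>

<p>Paragraph before the list.</p>
<ul class="tight">
  <li>First item</li>
  <li>Second item</li>
</ul>
<p>Paragraph after the list, with no blank line above it.</p>
```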

Vincent Berg

@Keet

Just a quick correction: ol = ordered list, ul = unordered list. You can set an attribute type="x" on an ol element where x = A, a, I, or i. For a numbered ordered list you can set an attribute start="x" where x is the number to start the list with.
Examples: < ol type="a"> ..., < ol start="50"> ...

Thanks! I'd completely forgotten about that option, as I'll often expand one list element over more than a single line (by referencing the previous names of my revised books), but they keep throwing off the numbering in my book list. Now I have a handy way of working around that situation, but will definitely need to redefine my ordered and unordered lists to prevent duplicate blank lines following them.

In my case, I basically started my own lists after changing them to 'hanging indents', where multiple paragraph indents all line up under the initial & bull; flag. You get much more control with the hand-coded ones, particularly as the indents vary depending on the width of the numbered lists.

Ernest Bywater

@Vincent Berg

@Ernest:
Those extra style definitions mostly serve to restrict what you're allowed to do in Calibre, as my centered H1 chapter headers contain several versions of my simple "centered" command, duplicating my < span class>, but the duplications are only made once in the single CSS document, so other than complicating my H1 chapter headers, they aren't really providing much processing overhead.

CW,

I'm not sure what you're doing different to me, or what I'm doing different to you, but I've noticed a major change in file size due to the changes, and checks show most of it comes down to the in-line formatting.

In the older files where I had no css and used the html commands of < b > < / b > < i > < / i > within text paragraphs, Calibre was converting them to span commands and inserting specifications in the stylesheet with the same info repeated with different numbers, one for each type of paragraph or text variant in each split. Thus one small file ended up with 44 definitions, with the great majority of them being the same few repeated but with a different number for the split it's in.

That 1,500 word file is now included within a 42,500 word file done with the new css code in it which includes 4 extra font type formats plus two extra font size formats as well as in-line font color commands above what was in the old file. Yet the new and larger version has a total of 24 definitions in the stylesheet and none are repeated.

In some documents I had the in-line code for bold, italics, blue, red, and green at the bottom of the css code list in the originating html document. But when I moved those to the top, the finished file size reduced some more, despite the only change being to move that section of code from the bottom of the list to the top of the list for that html document.

Old file of 1,500 words is 0.7 MB, new file with 42,500 words plus extra colours and character formats is 0.4 MB. As I said before, I don't know why the reduction happens; all I can report on is the empirical evidence that I see happening.

Since this thread started I went through and recreated all the recent e-pubs with the new code order. The only change I made was to shift the in-line code to the start of the css code, and every single e-pub I've done that to has had a reduction in size.

I should really update the writer guide with that change in code order.

Also, while I think about it: if you add a cover image in the metadata section, where you can select an image file to include, you should not include the code to insert a copy of the image within the book itself, as that will result in two copies of the image being included in the file.

Vincent Berg

@Ernest Bywater

I'm not sure what you're doing different to me, or what I'm doing different to you, but I've noticed a major change in file size due to the changes, and checks show most of it comes down to the in-line formatting.

In the older files where I had no css and used the html commands of < b > < / b > < i > < / i > within text paragraphs, Calibre was converting them to span commands and inserting specifications in the stylesheet with the same info repeated with different numbers, one for each type of paragraph or text variant in each split.

You may have identified the discrepancy. I've long used my own CSS (or at least Style Definitions), and I've NEVER witnessed my in-line formatting being converted into html span commands! That might be because I've also never used "italics" or "emphasis" commands, relying on the simpler < I> and < b> commands.

Also, while I think about it: if you add a cover image in the metadata section, where you can select an image file to include, you should not include the code to insert a copy of the image within the book itself, as that will result in two copies of the image being included in the file.

Unfortunately, Calibre doesn't give you an option of including the cover image in your file. However, it's easy enough to either delete the entry entirely (stripping the cover image from your book list profiles so they display correctly within Calibre), OR replace the image with a standard Title Page.

But as I intimated earlier, I suspect that since I make such extensive use of graphic images, I'm not detecting the smaller character level differences you're observing.

Ernest Bywater

@Vincent Berg

Unfortunately, Calibre doesn't give you an option of including the cover image in your file. However, it's easy enough to either delete the entry entirely (stripping the cover image from your book list profiles so they display correctly within Calibre), OR replace the image with a standard Title Page.

You can change how covers are handled within Calibre. Go to Preferences - Common Options - Structure detection, where there's an option 'Remove first image'; tick the box to activate it.

Another is Preferences - Output Options - EPUB output, where there are options 'No default cover' and 'No SVG cover'; tick the box to activate either of them.

Also, when you use the menu option 'Edit metadata' there are options there to include or change the cover image.

Those who input DOCX files will like that in Preferences - Input Options - DOCX input there is an option 'Do not try to autodetect a cover from images in the document'; tick the box to stop Calibre looking for a cover within the document.

Another aspect is that if I include the standard < img src = " " > commands within the text of the html file I'm using as the input file, a copy of the image called ends up in the file immediately after the Cover Image selected in the Edit metadata option.

.........................

I just did some experimenting and having the usual img src command to insert an image within html in the input file will definitely include the stated image in the file at the point it is in the text. When I have the img src command immediately after the < body ... > command and before the story title information and use the Output option of No Cover with No svg cover the epub opens with the image, but the edit option shows there is no cover image within the epub package.

I suggest you check out and play with the various options to find a set of options you like.

typo edit
