Forum: Author Hangout

Creating an E-pub from HTML code in Calibre - I had a file name issue

Ernest Bywater

First, I'm not an expert on Calibre, although I do know a lot about how to use it. Nor am I an expert on CSS stylesheets although I can get some simple ones to work. Nor am I an expert on E-pub files, but I do create a lot and they validate properly.

Background

Until recently I created my e-pub files by importing the .odt file into Calibre and then using that to create the e-pub file. To help with this I use paragraph styles within the .odt file. I recently had someone tell me that the e-pub files I was creating were exceptionally large for the size of the document within them. I investigated and found that the way I had the system set up, it was embedding the fonts into the files. I stopped that and it cut the file size a lot, but it still seemed high according to the reader checking them for me - hey, he started it, he can check 'em.

I next used the Edit function within Calibre to look at the e-pub file. I had 17 paragraph styles in Libre Office, and 7 of those were deleted prior to making the e-pub (it doesn't need the footer, header, or contents styles), yet from the 10 paragraph styles left in the text file Calibre was creating about 40 paragraph styles to use within the CSS stylesheet for the e-pub. Thus I went to look at ways to make the file smaller.

For a long time another author has been telling me to use CSS stylesheets, something I'd not used before, so I started using simple ones based on the ones used on SoL - Yes, CW, I do listen to you at times.

Problem

Anyway, I attempted a more complex CSS stylesheet and revised my Libre Office paragraph styles to better suit the CSS stylesheet codes. I set these up with all the paragraph format code except the font name - thus each had the font size, font color, whether it was bold, spacing info, etc. I ended up with 10 sets of data to cover all of the paragraph styles I use. I then used this to create the HTML variant of the story (something I do anyway) and then imported the HTML file into Calibre to create the e-pub. The result was a much smaller e-pub file, but it would not validate and it kept telling me each of the splits in the e-pub had faulty file names.

I've spent five days playing with things to see if I could get it to work because it should. I finally got everything to work and have Calibre create valid e-pub files from the HTML code with the CSS stylesheet. The fix was simple in the end.

When I add a word processing file to the Calibre library it adds my name to the file before it saves it. Thus the file Play Ball!.odt becomes Play Ball! - Ernest Bywater.odt. However, when I import the HTML file the file Play Ball!.html becomes Play Ball! - Ernest Bywater.zip. In both cases the finished e-pub is Play Ball! - Ernest Bywater.epub - but the one from the HTML fails validation due to the file name failure.

On examining both files in the Calibre Edit system I noticed a difference in the names for the sub-files. At each page break in the initial file the e-pub makes a split and creates it as another sub-file. The files start at number 000 and increase as the extra breaks are added. The sub-files from the .odt file were all named index_split_000.xhtml while the ones from the HTML were named Play Ball!_split_000.xhtml. While looking at them I remembered the old FAT16 file naming convention and tried that.

I went back to the HTML file and changed its name from Play Ball!.html to Play_Ball!.html and created the e-pub file which came out as the same name as before. However, the file validated and a check inside showed the sub files as Play_Ball_split_000.xhtml.
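For anyone who wants to make the same check without opening the Calibre editor: an e-pub is a zip archive, so the sub-file names can be listed directly. The following is only a sketch. The function name entries_with_spaces is made up for illustration, and the archive it inspects is a tiny stand-in built in memory rather than a real e-pub.

```python
import io
import zipfile


def entries_with_spaces(epub) -> list[str]:
    """List the internal file names in an e-pub that contain a blank space.

    `epub` may be a path or a binary file-like object, because an e-pub
    is an ordinary zip archive underneath.
    """
    with zipfile.ZipFile(epub) as archive:
        return [name for name in archive.namelist() if " " in name]


# Build a small stand-in archive in memory so the sketch runs without a real e-pub.
# The two entry names are the sub-file names quoted in the post.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("index_split_000.xhtml", "<html/>")
    archive.writestr("Play Ball!_split_000.xhtml", "<html/>")

print(entries_with_spaces(demo))  # ['Play Ball!_split_000.xhtml']
```

Pass the path of a finished .epub instead of the in-memory demo to check a real file.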

Answer

In short, if you want to create an e-pub from your HTML file using Calibre you need to make sure there are no blank spaces in the file name before you take it into Calibre. Simply replace each one with the underscore character (_) and the system will work for you. It will have a file name failure if you leave a blank space in it.
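If there are many HTML files to fix, the renaming can be scripted. This is a minimal sketch, not anything Calibre provides: the function names underscore_name and rename_html_files are made up for illustration, and it assumes the stories sit as .html files in one folder.

```python
from pathlib import Path


def underscore_name(filename: str) -> str:
    """Return the file name with every blank space replaced by an underscore."""
    return filename.replace(" ", "_")


def rename_html_files(folder: str) -> list[str]:
    """Rename each .html file in `folder` whose name contains a blank space.

    Returns the new names so you can see what changed. Files whose names
    have no spaces are left alone.
    """
    renamed = []
    # sorted() takes a snapshot of the folder before any renaming starts.
    for path in sorted(Path(folder).glob("*.html")):
        new_name = underscore_name(path.name)
        if new_name != path.name:
            path.rename(path.with_name(new_name))
            renamed.append(new_name)
    return renamed


print(underscore_name("Play Ball!.html"))  # Play_Ball!.html
```

Run rename_html_files on a copy of the folder first, since a rename can clash with a file that already has the underscore name.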

I've no idea why Calibre plays nice with the Word Processor files and not the HTML files, but that's the way it is.

Crumbly Writer

@Ernest Bywater

The result was a much smaller e-pub file, but it would not validate and it kept telling me each of the splits in the e-pub had faulty file names.

Calibre is not designed as an epub editor. They've only recently added the facility, and it's largely a hit-or-miss kludgy attempt. If you add large blocks of text, then Calibre will divide its existing sections, meaning that every single link in those sections will break (Calibre keeps none of your original links, instead substituting links to the Calibre-created sections & line numbers). If you're going to do extensive editing, switch to Sigil, which IS designed for editing epubs.

According to the Calibre documentation, they assume that readers will submit filenames containing nothing but your ISBN, and that you'll add the file once you get into the "Edit metadata" section (i.e. Calibre was NEVER designed for ebook authors, it was designed for readers who want to convert books into other formats). The Calibre forums (and those of the various programs which generate book files) are filled with numerous accounts of how Calibre mangles names on a regular basis. Again, it's a kludge of a program.

Finally, for more annoying suggestions, most programmers assume that you'll create epubs with separate files for each chapter. They list numerous step-by-step guides, which are incredibly hard to fathom since they're so confusing and technical, about how to split a single book into 30 separate files and have it line up properly in a completed epub with a TOC.

By the way, Ernest, since talking to you, I've been researching how to format book/library metadata into epubs, which is another hard-to-fathom process, but the results are utterly unsupported in Calibre! So all that time I spent researching it was completely wasted.

Replies:   Ernest Bywater
Switch Blayde

@Ernest Bywater

Did you get a Table of Contents doing it that way?

For my first epub, I wrote the XHTML/CSS by hand and fed that into Calibre. I remember it was a pain to create the ToC.

Replies:   Ernest Bywater
Ernest Bywater

@Switch Blayde

Did you get a Table of Contents doing it that way?


Yes I did, but two of my paragraph styles are h1 and h2 and there's a setting in the conversion where you can tell it to use the h1 and h2 to create the ToC - so I use that.
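To show the idea behind building a ToC from the h1 and h2 headings, here is a rough sketch using only the Python standard library. It is not how Calibre does it internally. The names HeadingCollector and outline, and the sample HTML, are made up for illustration.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every h1 and h2 element."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._level = None  # 1 or 2 while inside a heading, otherwise None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.entries.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(html: str) -> list[tuple[int, str]]:
    """Return the headings of an HTML document in reading order."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.entries


sample = "<h1>Play Ball!</h1><p>Intro</p><h2>Chapter 1</h2><p>Text</p>"
for level, title in outline(sample):
    # Indent h2 entries one step under their h1.
    print("  " * (level - 1) + title)
```

Each h1 becomes a top-level entry and each h2 is nested beneath it, which is the shape of ToC the conversion setting produces from those two styles.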

If you want to look at the before and after files say so, as I have your email in Thunderbird.

Ernest Bywater

@Crumbly Writer

Calibre is not designed as an epub editor.

I don't use it to edit the file, just to look at what they have within the files they create. Do you want a set of before and after files sent to you by email to look at? The way I have it now, it all works.

Replies:   Crumbly Writer
Crumbly Writer

@Ernest Bywater

I don't use it to edit the file, just to look at what they have within the files they create. Do you want a set of before and after files sent to you by email to look at? The way I have it now, it all works.

No thanks. I know why it failed. Instead, I've got to see whether Calibre has a book publishing metadata add on or I'll have to spend several days figuring out Sigil once again as an alternative.

Lazeez Jiddan (Webmaster)

@Ernest Bywater

I've no idea why Calibre plays nice with the Word Processor files and not the HTML files, but that's the way it is.


I guess it's a matter of expectation on the part of the software. HTML files aren't supposed to have a space in them. A space is supposed to be encoded as %20.

On the other hand, word processing files are allowed anything that the file system accepts, which is almost anything except the directory delimiter: for Mac it's ':', for Windows it's '\' and for Unix it's '/'.

I'm surprised that the bang '!' didn't cause issues.
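As a small illustration of the %20 encoding, Python's standard urllib.parse module can percent-encode a name the way it would appear in a URL. This sketch only shows the encoding rule for URLs. It says nothing about what Calibre does with file names on disk.

```python
from urllib.parse import quote, unquote

name = "Play Ball!.html"

# quote() percent-encodes everything except letters, digits and a few
# safe punctuation marks, so both the space and the bang are escaped.
encoded = quote(name)
print(encoded)           # Play%20Ball%21.html

# unquote() reverses the encoding.
print(unquote(encoded))  # Play Ball!.html
```

Note that the bang is escaped as %21 by this function as well as the space.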

Replies:   Ernest Bywater
Keet

There are a lot of restrictions on directory and file names, and they differ according to what file system is used. There's a nice article on Wikipedia about this: File names. To be on the safe side it's usually best practice to use only the characters a-z and the numbers 0-9, with an underscore instead of spaces. Starting a filename with an underscore is allowed, but better avoided.

Funny thing: when I do a save-as from a SOL chapter on my Linux box (Firefox) it uses colons in the generated filename. I cannot copy that file to a Windows NTFS filesystem. Since I don't have any Windows systems and don't care about portability it's no problem for me, but for someone with mixed file systems it's something to keep in mind.
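The conservative rule above is easy to check in a few lines. This is a sketch under stated assumptions: the function names are made up, the dot is allowed so that an extension such as .html passes (the rule as written lists only a-z, 0-9 and underscore), and the set of characters Windows rejects is the usual list for file names. The name "chapter: one.html" is a hypothetical example, not an actual SOL file name.

```python
import re

# Characters Windows does not allow in a file name.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def is_conservative(name: str) -> bool:
    """True if the name uses only a-z, 0-9, underscores and dots,
    and does not start with an underscore or a dot."""
    return re.fullmatch(r"[a-z0-9][a-z0-9_.]*", name) is not None


def windows_problems(name: str) -> list[str]:
    """Return the characters in `name` that Windows would reject."""
    return sorted(set(name) & WINDOWS_FORBIDDEN)


print(is_conservative("play_ball.html"))      # True
print(is_conservative("Play Ball!.html"))     # False
print(windows_problems("chapter: one.html"))  # [':']
```

A name can fail the conservative check and still be legal everywhere. The check is deliberately stricter than any one file system requires.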

Ernest Bywater

@Lazeez Jiddan (Webmaster)

I guess it's a matter of expectation on the part of the software. HTML files aren't supposed to have a space in them. A space is supposed to be encoded as %20.


I know enough to expect that behaviour with the code inside the file, but the issue is with the file name itself, where with one extension type it accepts spaces and has a routine to deal with them, and with another extension it doesn't do that. Anyway, it's not important now that I know what is happening and why, thus I can take action to ensure things happen in the correct way.

Lazeez Jiddan (Webmaster)

@Ernest Bywater

Anyway, it's not important now that I know what is happening and why, thus I can take action to ensure things happen in the correct way.


Since you use the software often, I think it would be good for you and everybody who uses it to report this bug to the makers of the software. That way you don't have to worry about it.

Replies:   Crumbly Writer
Crumbly Writer

@Lazeez Jiddan (Webmaster)

Since you use the software often, I think it would be good for you and everybody who uses it to report this bug to the makers of the software. That way you don't have to worry about it.

Users have been bitching about this 'bug' since 2014, both to Calibre and across a wide variety of author and software forums, but so far, they refuse to address the issue. That's simply how they do business. They expect single-word file names, and if you don't play ball, then you suffer the consequences. In fact, if you read their user manual, they strongly suggest that you name each file with the book's ISBN, rather than the actual name of the book.

To put it in the words of MS: "That's not a bug, that's an (intensely disliked) feature."
