I was all excited about Calibre generating an epub from a Word docx file, even the TOC, but someone told me Calibre generates "messy" HTML. "Messy" was his word.
Does anyone know if he's right?
If it works and it looks as expected in an EPUB reader then that's all that matters.
Who cares if the XHTML is "messy"? And I say this as an HTML perfectionist.
This is what he said:
The messy HTML can cause issues with various different apps and devices, and it can get an ePub rejected from different retailers, so you just have to test it on as many readers as possible to find issues. You don't want your book removed from a retailer because someone on an obscure device complained about a weird issue. It's rare, but it happens. Clean HTML is important.
If it works and it looks as expected in an EPUB reader then that's all that matters.
I ran the Calibre conversion from docx to epub on Azotea, and compared it to the Lulu conversion from docx to epub. I found four deficiencies, one fatal, in the Calibre conversion after a quick review:
--- although the TOC appears to display correctly in the Calibre conversion, i.e., it shows the separate elements (alsoby, copyright, acknowledgements, epigraph, and chapters), they do not each begin a new page, which in my view they should and in the Lulu conversion they do;
--- the Calibre conversion does not display the title correctly, and it does not display the author's name at all (this is a fatal defect as it stands, but is perhaps correctable by tweaking Calibre settings);
--- the epigraph, a couple of lines from a poem, is not centered properly, i.e., the line breaks seem to flip the text from centered to left-justified;
--- the font (chosen by my reader) is the same in each conversion, but it appears fuzzier in the Calibre conversion.
bb
I found four deficiencies,
My experience with Calibre has been very different to yours. I wonder if part of it is to do with the document preparation or the document's base code. I don't know.
I can say that when I first investigated making epubs, it was a long time before I came up with a way to make epubs I liked the look of, and Calibre was the only software that did it well without costing me over $100 Australian. In the process I had to learn how to make the most of the software and the process. Until then, just about everything within the text was done by applying formats directly to the various sections of text, the way I was first taught back in the early days of RTF. However, I learned that by using the styles within the word processing software properly, the finished epub was a lot better. For example: the Heading style for the book title on the title page, Heading 1 for the chapter titles, Heading 2 for the sub-chapter titles, Heading 3 for the section titles, Quotations for the quotes and notes, Preformatted Text for special centred text different to Quotations, and the Properties Window for the Title, Keywords, and other data needed for some of the metadata fields. I also had to learn how to select the proper options within Calibre for some settings, but now that I know what I'm doing it's easy.
I learned that by using different Heading styles, and choosing which ones the ToC picks up, you can change the way the ToC appears as well.
Some of the metadata is taken from the File - Properties Window while some has to be set in Calibre before you convert the file by using the Edit metadata option from the toolbar. Any edit entries appear to override what it draws in, should there be a conflict.
The edit section is where you can enter and amend things like the Title, Author, Tags, Publisher, Published date and the Ids like the ISBN.
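For anyone who prefers the command line, Calibre's `ebook-convert` tool accepts the same metadata fields as options. The sketch below only builds the argument list. The file names, title and author are placeholders, and if you actually run the command, it assumes `ebook-convert` is installed and on your PATH.

```python
import shlex


def metadata_args(title, authors, tags=(), publisher=None, pubdate=None, isbn=None):
    """Build ebook-convert options mirroring Calibre's Edit metadata fields."""
    args = ["--title", title,
            # ebook-convert expects multiple authors separated by ampersands
            "--authors", " & ".join(authors)]
    if tags:
        args += ["--tags", ",".join(tags)]  # comma-separated list
    if publisher:
        args += ["--publisher", publisher]
    if pubdate:
        args += ["--pubdate", pubdate]
    if isbn:
        args += ["--isbn", isbn]
    return args


if __name__ == "__main__":
    # Placeholder file names and metadata, for illustration only
    cmd = ["ebook-convert", "novel.docx", "novel.epub",
           *metadata_args("Azotea", ["A. N. Author"], tags=["fiction"])]
    print(shlex.join(cmd))
```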
I learned that by using different Heading styles, and choosing which ones the ToC picks up, you can change the way the ToC appears as well.
If you need a copy of my CSS Style Defs, I can send you a copy, along with how I define them in WORD so they'll transfer to Calibre correctly.
They work perfectly now that I know how to use the Heading styles to my advantage.
I use Heading 1 for chapter titles, Heading 2 for sub-chapter titles, Heading 3 for section titles, and Heading 4 for new story titles in books with more than one story in them; Heading 9 and Heading 10 are the headings I use for the titles in the "Other Stories by" section at the back of the book. These are all set in the word processing document with all the format code I want in them.
For the ToC in the original document I tell it only to find the Heading 1 and Heading 2 for most stories. With the multiple story books I have it set to have Heading 4 then Heading 1 and Heading 2.
With Calibre I usually set it Level 1 as //h:h1 with Level 2 as //h:h2. But for the multi story books I set it as Level 1 as //h:h4 with Level 2 as //h:h1 and Level 3 as //h:h2 - this way I get the story titles in the ToC before the first chapter title in that story. Thus it comes out how I want it to.
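The Level 1/2/3 settings map onto `ebook-convert`'s `--level1-toc`, `--level2-toc` and `--level3-toc` options. These take XPath expressions in which the `h:` prefix means the XHTML namespace. Below is a minimal sketch, assuming the multi-story heading scheme described above. The sample XHTML and the `preview` helper are hypothetical; they are only meant to check which headings each level would pick up before converting.

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}


def toc_args(levels):
    """Map heading tags (outermost first) to ebook-convert's three ToC levels."""
    if len(levels) > 3:
        raise ValueError("ebook-convert only has three ToC levels")
    args = []
    for depth, tag in enumerate(levels, start=1):
        args += [f"--level{depth}-toc", f"//h:{tag}"]
    return args


def preview(xhtml, levels):
    """List the heading text each level would match in one XHTML document."""
    root = ET.fromstring(xhtml)
    return {tag: [el.text for el in root.iterfind(f".//h:{tag}", XHTML_NS)]
            for tag in levels}


# Hypothetical two-story sample: h4 = story title, h1 = chapter, h2 = sub-chapter
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h4>Story One</h4><h1>Chapter 1</h1><h2>Part A</h2>
<h4>Story Two</h4><h1>Chapter 1</h1>
</body></html>"""

if __name__ == "__main__":
    print(toc_args(["h4", "h1", "h2"]))
    print(preview(SAMPLE, ["h4", "h1", "h2"]))
```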
If you need a copy of my CSS Style Defs, I can send you a copy, along with how I define them in WORD so they'll transfer to Calibre correctly.
They work perfectly now that I know how to use the Heading styles to my advantage.
Sorry, that offer was intended for Switch (or anyone else attempting this), especially because I've used epigraphs fairly extensively for a few books.
--- although the TOC appears to display correctly in the Calibre conversion, i.e., it shows the separate elements (alsoby, copyright, acknowledgements, epigraph, and chapters), they do not each begin a new page, which in my view they should and in the Lulu conversion they do;
That's because you have to define where the TOC finds new chapters (e.g., search for "h1" and "h2"). If it's defined correctly, it'll work, although I also manually add the "next page" commands in WORD, so the page breaks are already in the WORD document.
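On the command line, the same idea is expressed with XPath. `--chapter` tells Calibre which elements are chapter starts, and `--page-breaks-before` inserts a page break before matching elements. The sketch below builds those options from a list of heading tags. The tag list is an assumption to adapt to your own styles, and Calibre's own defaults may already cover plain `h1`/`h2` headings.

```python
def page_break_xpath(tags):
    """XPath matching any element whose tag name is in tags, e.g. h1 or h2."""
    condition = " or ".join(f"name()='{tag}'" for tag in tags)
    return f"//*[{condition}]"


def chapter_args(tags=("h1", "h2")):
    """Treat the given headings as chapter starts and break the page before each."""
    xpath = page_break_xpath(tags)
    return ["--chapter", xpath, "--page-breaks-before", xpath]


if __name__ == "__main__":
    print(chapter_args(["h1"]))
```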
--- the Calibre conversion does not display the title correctly, and it does not display the author's name at all (this is a fatal defect as it stands, but is perhaps correctable by tweaking Calibre settings)
That's because those items are entered manually in the "Edit Metadata" section. I'd never trust a conversion program to fill in everything, so I always eyeball it afterwards. (I also prepare my metadata before I ever submit anything to Calibre, so I know what to enter when needed: email me if you want to see a sample.)
--- the epigraph, a couple of lines from a poem, is not centered properly, i.e., the line breaks seem to flip the text from centered to left-justified;
Here, you can't use line breaks. Instead, create a new Epigraph style which runs the lines next to each other, and a separate EpigraphAuthor style to space the credit differently from the Epigraph.
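In the converted XHTML, that advice amounts to making each line of the poem its own paragraph with a centered class, instead of one paragraph split with `<br/>` tags. Below is a hypothetical sketch of the target markup. The class names `epigraph` and `epigraph-author` are assumptions, since Calibre generates its own class names from Word styles, and so are the CSS values.

```python
from html import escape

# Hypothetical stylesheet for the two styles described above
EPIGRAPH_CSS = """\
p.epigraph { text-align: center; text-indent: 0; margin: 0; font-style: italic; }
p.epigraph-author { text-align: center; text-indent: 0; margin-top: 1em; }
"""


def epigraph_html(lines, author):
    """One centered paragraph per poem line, then a separately spaced credit line."""
    paras = [f'<p class="epigraph">{escape(line)}</p>' for line in lines]
    paras.append(f'<p class="epigraph-author">{escape(author)}</p>')
    return "\n".join(paras)


if __name__ == "__main__":
    print(epigraph_html(["First line of the poem,", "second line of the poem."], "The Poet"))
```

The idea is that every line is its own centered block, so centering no longer depends on how a reader treats a forced line break.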
--- the font (chosen by my reader) is the same in each conversion, but it appears fuzzier in the Calibre conversion.
You need to embed the font in the Word document before you submit it. As it is, it's taking its own version of the font rather than yours, and those can vary dramatically.
Hope those help your conversion efforts.
An epub is created in XHTML, not plain HTML. I don't know about conversion from docx, but I convert from odt to epub using calibre. The software is free and it has an option to edit the file, where you can look at the XHTML code for each chapter. I just opened one of my epubs that way and the code in it looks fairly straightforward, except the CSS is a lot more complex than the ones I write. Everything in it has a class based on the original text format codes in the documents.
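An EPUB is also a zip archive, so its XHTML can be inspected without Calibre's editor. Below is a minimal sketch using only Python's standard library. The in-memory sample and its class name are hypothetical stand-ins for a real book; pass a path such as `novel.epub` instead.

```python
import io
import zipfile


def xhtml_files(epub):
    """List the (X)HTML content files inside an EPUB (path or file-like object)."""
    with zipfile.ZipFile(epub) as zf:
        return sorted(n for n in zf.namelist()
                      if n.lower().endswith((".xhtml", ".html", ".htm")))


def read_file(epub, name):
    """Return one file from the EPUB as text."""
    with zipfile.ZipFile(epub) as zf:
        return zf.read(name).decode("utf-8")


def sample_epub():
    """A tiny stand-in EPUB with one chapter and one stylesheet."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("OEBPS/ch1.xhtml", '<p class="text">Hello</p>')
        zf.writestr("OEBPS/stylesheet.css", "p.text { margin: 0 }")
    return buf


if __name__ == "__main__":
    book = sample_epub()  # or a path such as "novel.epub"
    for name in xhtml_files(book):
        print(name)
        print(read_file(book, name))
```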
I have had one reader who does XHTML coding on a regular basis tell me the code can be tightened up, but he never said it was excessive or messy.
If you want, I can send you an epub for you to look at in any suitable code examination software, but it'll be easy for you to download the latest copy and try it out.
edit to add: I agree with the post by Lazeez.
If you want, I can send you an epub for you to look at in any suitable code examination software,
I actually looked at (with jEdit) the XHTML Calibre generated from my docx file for my recent novel. I didn't really examine the code, though. I was looking to see what Calibre generated with Word's italics. It was when we were discussing the difference between using "i" and "em".
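To see how italics came through without reading the whole file, one can simply count the candidate tags. Below is a small sketch using the standard-library HTML parser. Which tags to count is an assumption: depending on the input, a converter may emit `<i>`, `<em>`, or a `<span>` carrying a CSS class.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags of interest in an (X)HTML document."""

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:  # HTMLParser passes tag names in lowercase
            self.counts[tag] += 1


def count_emphasis(xhtml, tags=("i", "em", "span")):
    """Return how often each of the given tags opens in the document."""
    parser = TagCounter(tags)
    parser.feed(xhtml)
    parser.close()
    return dict(parser.counts)


if __name__ == "__main__":
    print(count_emphasis("<p><i>one</i> and <em>two</em> and <em>three</em></p>"))
```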
This guy is trying to sell me on his publishing company. When I told him that if I don't get a contract with a Big-5 publisher I'll self-publish, he asked why I wouldn't go with a small indie. I said it's because they can't do anything for me that I can't do myself. When we got into details, such as XHTML, this came up.
I believe he's using a scare technique, but wanted to hear from people more technically qualified than I. Thanks, Lazeez.
The key with most small publishers is to read the ToS (terms of service) carefully. Generally, you're tied to them for a specific number of books, meaning that until you give them that many books, you're forbidden from publishing elsewhere. They'll also soak you with fees, require you to buy the books (usually a minimum of 200 at a time), and they 'own' the version of the story they publish (i.e. they'll charge you around $500 for a copy of the last edited version you uploaded to them). Most vanity presses are really a scam, so it's always buyer beware. Most authors choose that route because they're terrified of doing everything (editing, design, formatting, etc.) themselves, but you pay dearly for letting them do it for you.
I've yet to meet anyone who didn't regret signing up with a small vanity press.
There's no indication (YET) that it's a Vanity publisher.
No, but most 'small publishers' fit into that category. My brother was picked up by an older, 'established' small publisher, and they pulled the exact same crap on him: a 3-book contract, they own the book submitted to them, plus the edits their people did, plus the design, artwork, etc., and he does ALL the marketing, sales and promotions, while receiving a pittance ($0.50) for every $20 book sold!
they own the book submitted to them, plus the edits their people did, plus the design, artwork, etc.
That's true for any publisher. If the publisher creates the artwork, they own the cover. If you sign a contract for them to publish it, they don't own the book (copyright), but you can't publish it anywhere else. After all, they put up all the money.
A Vanity publisher isn't trying to sell your book. They charge the author for services. That's how they make their money.
The fact is, you need to sell around 100 books (from a vanity or traditional publisher) for every 1 book you sell independently. Without an extensive marketing campaign, you don't gain anything by making that deal. And even with the marketing, authors who have been through the process say it takes until the 6th or 7th traditionally published book before they actually begin earning anything that would support them, so just being "published" isn't enough.
That said, most successful book sales are tied to contacts rather than simply to being 'good books'. Either the author has a connection (say, a doctor promotes a book at trade shows because it promotes a cause they care about), or the book gets press (the New York Times reports on it, generating immediate interest).
Just as a note, the html code from WORD (doc or docX) will be messier than that from Calibre itself, as they embed WORD's internal coding (marking foreign languages, suspected syntax, punctuation errors, etc.). Part of the learning curve is learning how to turn off each of WORD's functions in their generated html for inclusion on websites and in epubs.
In all, calibre's code is pretty good, but I tend to edit it because they force Amazon specific language (such as single character spacing for any and all indents, which have to be turned off individually (i.e. the code removed on each command)).
One thing I like about the Calibre design team is they're very responsive to user requests. Some time back I was creating a lot of epubs at once and I didn't like having to go through the language dropdown list to find English every time. So I sent in a request for the last selection used to show as the default until changed. Two weeks later it was one of the new features for a couple of the dropdown selection boxes. They're almost as responsive as Lazeez is.
All the epubs I create using Calibre pass testing with
validator.idpf.org
The only times I've had them rejected by the testers that Amazon and Apple supply to Lulu (to check epubs before they'll accept my free epub books for their sites) have been when I created an epub and immediately uploaded it to Lulu. When the partnership tester ran, it sometimes rejected the file for having a creation date in the future, because my timezone is ahead of the USA. When I waited twenty hours it accepted the file, because it was no longer a post from the future. I complained about that at the time. I think they've adjusted the tester, because I've not had a rejection in over six months, despite sometimes uploading within minutes of creating the file.
I know my epub readers use a variety of devices due to the emails they've sent me. However, I did have one reader ask me to create a mobi of the story he wanted, which I did with Calibre and sent him the file. He said it worked perfectly.
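The "post from the future" rejection described above is consistent with a timestamp being written in local time and then read as UTC. We don't know how the partner tester actually reads the date, so this is only an illustration of that mechanism, assuming a UTC+10 zone. It also shows the UTC form with a trailing `Z` that EPUB 3 expects for its `dcterms:modified` date.

```python
from datetime import datetime, timedelta, timezone

# Assumed zone: eastern Australia standard time, UTC+10 (no daylight saving)
AEST = timezone(timedelta(hours=10), "AEST")


def naive_local_stamp(now_utc, local_tz):
    """Local wall-clock time with the UTC offset discarded (the suspected bug)."""
    return now_utc.astimezone(local_tz).replace(tzinfo=None)


def looks_future(stamp, now_utc):
    """A checker that wrongly treats a naive timestamp as UTC."""
    return stamp.replace(tzinfo=timezone.utc) > now_utc


def epub_modified(now_utc):
    """UTC timestamp in the CCYY-MM-DDThh:mm:ssZ form used for dcterms:modified."""
    return now_utc.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    now = datetime(2016, 5, 1, 2, 0, tzinfo=timezone.utc)
    stamp = naive_local_stamp(now, AEST)
    print(stamp, looks_future(stamp, now))  # ten hours "ahead", so flagged
    print(epub_modified(now))
```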
All the epubs I create using Calibre pass testing with
validator.idpf.org
The docx to epub conversion I tried using Calibre passed the validator. EDIT: Apparently the defective author and title display was not fatal.
EDIT AGAIN: Amusingly, the Lulu conversions fail the validator, although they are accepted by Amazon, iBook Store, et al. The validator gives this error:
Irregular DOCTYPE: found '-//W3C//DTD XHTML 1.0 Transitional//EN', expected ''.
bb
Irregular DOCTYPE: found '-//W3C//DTD XHTML 1.0 Transitional//EN', expected ''.
That means you need to declare what language the document is in, something that isn't needed on most sites, since they give that in the story posting (i.e. they only list English-language books on any given site/page).
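For reference, the validator message quoted above is about the DOCTYPE line itself. Assuming the book is checked as EPUB 3, its content documents are expected to use the bare HTML5 declaration `<!DOCTYPE html>` rather than the XHTML 1.0 Transitional one. The book's language is declared separately, with `dc:language` in the package file. Below is a sketch of rewriting that line in an XHTML file, offered as a possible post-processing step rather than anything Lulu or Calibre does.

```python
import re

# Matches an XHTML 1.0 DOCTYPE, including its system identifier
XHTML10_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML 1\.0[^>]*>',
    re.IGNORECASE,
)


def to_html5_doctype(xhtml):
    """Replace the first XHTML 1.0 DOCTYPE with the EPUB 3 style one."""
    return XHTML10_DOCTYPE.sub("<!DOCTYPE html>", xhtml, count=1)


if __name__ == "__main__":
    old = ('<?xml version="1.0" encoding="utf-8"?>\n'
           '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
           '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>')
    print(to_html5_doctype(old))
```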
Or, of course, I could continue to use Lulu's conversion engine which displays everything the way I want it (because I've defined the Word styles Lulu accepts), and not worry about manually tweaking Calibre. (!)
That it doesn't pass the validator doesn't matter to me as long as it's accepted by the big guys, starting with the one with the name that begins with A, and I don't mean Alibaba.
Thanks for your comments, and for Ernest's.
bb
One advantage I find with using Calibre is when someone emails and wants a MOBI version or one of the other options available, it's only a few seconds' work to create it for them because I've already got it set up right.
In that case, it's easier loading an existing book into calibre and then converting it, as opposed to creating the ebook from scratch. In my case, I prefer using calibre because I like the control it gives me over the final product, since I upload the basic html files rather than allow calibre's automated procedures to do it for me. That offers more ebook functionality than converting the book does.
In that case, it's easier loading an existing book into calibre and then converting it
The Input File is set at ODT with the Output File set at EPUB. I then use the Convert Books toolbar icon in Calibre to create the EPUB. When someone wants a MOBI file I open the same file in Calibre and change the Output File to MOBI and hit the Convert Books icon again. Job done. Very easy.
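The same two-step routine can be scripted with `ebook-convert`, which picks the output format from the output file's extension. This sketch assumes `ebook-convert` is on your PATH. `story.odt` is a placeholder, and nothing runs unless `run=True` is passed.

```python
import subprocess
from pathlib import Path


def convert(source, fmt, extra=(), run=False):
    """Build (and optionally run) an ebook-convert command for one output format."""
    src = Path(source)
    out = src.with_suffix("." + fmt.lower())  # e.g. story.odt -> story.epub
    cmd = ["ebook-convert", str(src), str(out), *extra]
    if run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    for fmt in ("epub", "mobi"):
        print(convert("story.odt", fmt))
```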
My point was that, if you only want the mobi file, it's easier doing the straight conversion. From my point of view, I'd rather create my own epub, since I already have the clean html files from my website.
gives me over the final product, since I upload the basic html files
That's what I did with my first novel. I spent the time manually converting my Word doc to XHTML/CSS. That meant converting italics to "i", the ellipsis and em-dash to their HTML codes, etc.
The problem was, when I was done, the master copy of my novel was XHTML/CSS. If I had to make a change to the novel I had to do it in the .html file, which wasn't as easy as using a word processor. In fact, it was a lot harder.
If I went back to my original Word doc and made the change there, I would then have to go through the manual task of converting it to HTML all over again. Not a fun task.
If I remember right, the only problem I had with inputting a docx file into Calibre was a blank page. I removed the page break before each new chapter and that problem went away.
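The character part of that manual conversion can be automated. This minimal sketch escapes the XML-special characters and then turns every non-ASCII character (ellipsis, em dash, curly quotes) into a numeric character reference. It deliberately leaves italics alone, since those need the document's formatting, not just its text. In a UTF-8 EPUB the literal characters are also valid.

```python
from html import escape


def to_char_refs(text):
    """Escape &, < and >, then replace non-ASCII characters with &#NNNN; references."""
    return escape(text, quote=False).encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_char_refs("Wait\u2026 Tom & Jerry\u2014gone."))
```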
The problem was, when I was done, the master copy of my novel was XHTML/CSS. If I had to make a change to the novel I had to do it in the .html file, which wasn't as easy as using a word processor. In fact, it was a lot harder.
If I went back to my original Word doc and made the change there, I would then have to go through the manual task of converting it to HTML all over again. Not a fun task.
Again, we all encounter that, but it's due to WORD's crappy html conversion dumping a bunch of WORD specific commands into the html. Once you train WORD to NOT dump crap into your files, you'll be set.
Also, you don't manually add <i></i> tags; instead you use the "Save as ... web page, filtered" option, which reduces the entire file to a single html file. It'll then convert all your formatting into the appropriate html tags.
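Even the filtered web page output can carry some Word-specific leftovers, typically `class=MsoNormal`-style class names and `mso-` style declarations. Below is a rough regex-based sketch for stripping those two patterns. It is not a general HTML cleaner, and the patterns are assumptions based on typical Word output, so check the result on your own files.

```python
import re

# mso-* CSS declarations inside style attributes, e.g. "mso-margin-top-alt:auto;"
MSO_DECLARATION = re.compile(r"\s*mso-[\w-]+\s*:[^;\"']*;?", re.IGNORECASE)
# Word's Mso* class names, quoted or unquoted, e.g. class=MsoNormal
MSO_CLASS = re.compile(r"\s+class=\"?Mso\w*\"?", re.IGNORECASE)
# style attributes left empty after the declarations are removed
EMPTY_STYLE = re.compile(r"\s+style=([\"'])\s*\1")


def strip_word_cruft(html):
    """Remove mso-* CSS declarations, Mso* classes, and styles left empty."""
    html = MSO_DECLARATION.sub("", html)
    html = MSO_CLASS.sub("", html)
    return EMPTY_STYLE.sub("", html)


if __name__ == "__main__":
    print(strip_word_cruft('<p class=MsoNormal style="mso-margin-top-alt:auto;margin-bottom:6pt">Hi</p>'))
```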
Also, you don't manually add <i></i> tags; instead you use the "Save as ... web page, filtered" option,
But that does create crappy HTML. I prefer to let Calibre convert it.
Again (how many times do I have to repeat this?), you've got to turn off the setup flags in WORD that 'dump WORD content' into the html files. Once you do, WORD will produce reasonable html files.
As far as Calibre doing the conversion, the conversation started when someone stated that Calibre wasn't doing a terrific job of it either.
However, we each have our ways of getting to the finish line. Since I prefer coding specifically for ebooks, I prefer creating the ebooks by hand before submitting them to Calibre, as the editing feature in Calibre stinks!
As far as Calibre doing the conversion, the conversation started when someone stated that Calibre wasn't doing a terrific job of it either.
That was me. Why I started this thread. I didn't know if the guy representing the publisher was using a scare technique on me. It was before I told him I was an old IT guy. Lazeez set my mind at peace.
I tried the .docX Calibre conversions, once, but since I like monkeying around with the code (outside of Calibre), it's easier for me to construct the pieces by hand. Because it's so difficult assembling the different components, I add each new change to any book to each source. It's extra work, but you don't miss any steps.
since I like monkeying around with the code (outside of Calibre), it's easier for me to construct the pieces by hand
But there's so little formatting done for an ebook. What kind of "monkeying" do you do?
width=xx%, changing the default indent space from 1 to, say, 5 spaces, defining H1 titles for graphic chapter titles, including Alt="audio titles". There's no limit to the things you can do with ebooks.
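Purely as illustration, here are two of those tweaks sketched as generated markup. The first is a paragraph indent expressed in CSS `em` units rather than literal spaces. The second is a graphic chapter title that scales with the available width and carries alt text for text-to-speech. The class name, file name and values are hypothetical.

```python
from html import escape


def indent_css(em=1.5):
    """CSS rule for a first-line paragraph indent, in em units."""
    return f"p {{ text-indent: {em}em; margin: 0; }}"


def graphic_title(src, alt, width_pct):
    """An h1 holding a chapter-title image sized as a percentage of the available width."""
    return (f'<h1 class="title-graphic"><img src="{escape(src)}" '
            f'alt="{escape(alt)}" style="width: {width_pct}%;" /></h1>')


if __name__ == "__main__":
    print(indent_css(2))
    print(graphic_title("images/chapter1.png", "Chapter One", 80))
```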
1st question: is the result of 'monkeying' with ebook files uniform in most standard (if there _is_ a standard) ebook readers?
2nd question: do y'all commonly apply custom CSS to your ebook files? Again, do ebook readers uniformly support that approach?
1st question: is the result of 'monkeying' with ebook files uniform in most standard (if there is a standard) ebook readers?
The ebook 'standard' is pretty consistent (i.e. not much varies between them, except that Apple requires a separate command to center lines: a separate < scan center> line). Otherwise, the ebook format adjusts for each different device.
2nd question: do y'all commonly apply custom CSS to your ebook files? Again, do ebook readers uniformly support that approach?
I pretty much just use my standard html (from my website) for my ebooks, with a few minute adjustments, though without as much content, since it's for only a single book (i.e. it doesn't apply to every story I write).
if there _is_ a standard
There is a standard, but it's kind of like the standard for people walking down the street. There's a very basic concept and everyone does just about anything they want.
I've had the e-pubs I create viewed in four different e-book readers and used three different e-book reader programs on my computer and had seven different results from viewing the same file which is according to the relevant committee's standard and passes their tests.
The problem is the standard was written by the companies that already had e-book readers on the market for a few years when the standard was written. None of them wanted to change the software or code in their readers, so the standard was written to incorporate all the existing systems, despite them being incompatible. The result was that the standard was reduced to the few things that were uniform across the readers.
The end result is obvious in how they display the stories I write. I use three levels of headers, red text, blue text, black text, two different sizes of text, bold, italics, and mix them up to show different types of textual content. There is one e-book computer reader which displays the e-pub in the same colours and sizes I wrote them in, but the others will strip out some of the formatting code so the story loses colours, font text size changes, positioning on the page (centering is stripped), etc. Not all strip the same code. One may reduce the heading sizes and leave the colours, while another will strip the colours, and another will strip them all. Yet all the readers are working within the standard.
Thanks, DE
Perhaps a new "Calibre" thread would be useful, but I'll make a brief comment here:
My 'master' document is always a text (UTF-8) file, with no _binary_ formatting. I use Markdown codes for italic, bold, blockquote, poetry lines, etc.
Calibre has an excellent conversion system that includes Markdown-coded text files which will output to any of the Calibre formats. To enable this, go to preferences/input options/TXT Input and select 'markdown' in the formatting style dropdown menu.
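The same TXT Input setting is available on the command line as `--formatting-type markdown`. Below is a sketch that builds the command for a `master.md.txt` file. It assumes `ebook-convert` is installed, and the output formats shown are just examples.

```python
from pathlib import Path


def markdown_convert(master, fmt):
    """Build an ebook-convert command that treats a .txt master as Markdown."""
    src = Path(master)
    suffix = ".md.txt"
    base = src.name[:-len(suffix)] if src.name.endswith(suffix) else src.stem
    out = src.with_name(f"{base}.{fmt}")  # master.md.txt -> master.epub
    return ["ebook-convert", str(src), str(out), "--formatting-type", "markdown"]


if __name__ == "__main__":
    for fmt in ("epub", "pdf", "htmlz"):
        print(markdown_convert("master.md.txt", fmt))
```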
The advantage, of course, is the ability to maintain one simple master document, that can output to a multitude of formats (HTML, PDF, EPUB, etc). This is also a brilliant way to input directly to SOL using the 'md.txt' suffix.
I realize many prefer WORD and tightly-formatted granular control of their manuscript, but for those who accept more generalized output options, this method is simple, easy, and fast. Calibre is a superb tool.
TIP: I keep my Calibre library in DropBox; thus it is universally available between my desktop and laptops.
I realize many prefer WORD and tightly-formatted granular control of their manuscript, but for those who accept more generalized output options, this method is simple, easy, and fast. Calibre is a superb tool.
I prefer word processors that support style definitions, as they make formatting print documents more straightforward, even though the styles are normally stripped out of most ebooks.
Style definitions: cool.
Styles lend themselves to structure; they can form a 'template' into which basic .doc or .odt content can be 'poured' and any desired layout obtained. I use a LibreOffice template to produce a particular manuscript draft output with rigid margins, header, footer, double spacing, indents, etc that duplicates a publisher-mandated manuscript layout.
If I pour that original .odt content into another LO template, I can get the 6 x 9-inch book format for print output. Styles and templates are the secret of intelligent output control; reusable and flexible.
BUT, for the manuscript basic building block: the master.md.txt file is never obsolete; it is the prime source document.
Calibre: the 'rosetta stone' & library vault.
Calibre conversions enable:
master.md.txt ==> HTML + CSS = flexible web designs
master.md.txt ==> ODT + templates = flexible page designs
master.md.txt ==> EPUB + CSS = flexible ebook design
There are numerous 'tools' for markdown text output, most commonly on Linux. Best known: pandoc, by John MacFarlane (Markdown itself was created by John Gruber). For Windows, a nice editor/converter: MarkdownPad. For Linux, 'ReText'. For OS X, the Linux tools plus TextMate, and others. And various iOS and Android apps.
What nobody seems to understand: the manuscript draft is all about 'structure'; that is the basis of an HTML document, which carries straight through to ePUB (xHTML). Flexible and universal.
Word processor content (Word, LibreOffice, etc.) has little to do with structure; it is based on binary formats for appearance and printing. Rigid and inflexible; difficult to modify for other outputs.
There is a standard, but it's kind of like the standard for people walking down the street.
Sounds like an opportunity for a story. How do "they" enforce that walking-down-the-street standard? Something sexual, I hope. Maybe like Naked in School, violators lose their clothing for a week? Or get violated? Reasonable requests? Time to bring NiS to the sidewalk.
How do "they" enforce that walking down the street standard?
Don't know what it was like in other countries, but way back when in New South Wales, Australia the three big cities covered by the NSW Metropolitan Traffic Act had sections in the law about walking down the street, and the Central Business District had a line painted down the middle of the sidewalk and you had to stay on the left side of the line or you got booked by the cops. There were also restrictions on where you could stand still on the walkway to chat. Much of it was the same as for motor traffic, but covered pedestrians.
The last time I checked, about a decade ago, those laws were still in force but it's been decades since any cop booked anyone for them for anything other than jaywalking across the roadway.
The end result is obvious in how they display the stories I write.
The fact of the matter is that the majority of the code to display ebooks (at least for epubs) is taken directly from browser code for handling HTML, so it's fairly standard, even though everyone implements it slightly differently. However, the variations Ernest notes are mainly in older-generation devices, which aren't as widely used as they used to be. Nowadays, it's mostly epub (based on HTML5) and Amazon, which is so large it dictates how everyone else (like Calibre and Sigil) formats their documents.