
Forum: Author Hangout

calibre

Switch Blayde

My wife doesn't want to beta read my novel on the computer (reading a Word doc). She wants to read it on her iPad. I don't want to spend the time converting the doc file to XHTML to input into Calibre for a not-final version.

I noticed you can input a docx file into Calibre. I don't use docx (choosing doc instead), but I can save my doc file to docx and feed it into Calibre.

Has anyone had experience doing that?

Replies:   John Demille
Ernest Bywater

I use Calibre to create EPUB and MOBI files from .odt files all the time. It accepts FB2 (whatever that is), PDF, RTF, TXT, Comic, DOCX, and ODT.

I just created an e-pub from a PDF and the text came through OK, but the line spacing and some of the heading settings weren't as per the original. I'd never tried DOCX and wasn't sure whether a LibreOffice conversion of .odt to .docx would come out the same, but I did that and the e-pub came out OK.

Replies:   Switch Blayde
John Demille

@Switch Blayde

Why don't you send her the .doc file directly? She can open it using Pages.

Replies:   Switch Blayde
Switch Blayde

@John Demille

Why don't you send her the .doc file directly? She can open it using Pages.


Her iPad is old and she's not sure it can handle a file that big (84,000 words). Her Mac is new, but she doesn't want to read it on her laptop.

Maybe I'll just try sending it to her and see what happens.

Switch Blayde

@Ernest Bywater

I just created an e-pub from a PDF and the text came through OK, but the line spacing and some of the heading settings weren't as per the original.


I'm not looking for it to be perfect. Not at this stage. Maybe I'll just try it after I finish my edits.

QM

I write all my stuff in docx, Calibre works fine with it.

Crumbly Writer

@QM

I write all my stuff in docx, Calibre works fine with it.

Every single time I open Calibre (it seems) they offer a new upgrade. I've rarely ever seen any 'new features', so I'd assume they're quick to make minor adjustments to adapt to new filetypes (rather than waiting for a 'major release' of the software). Besides, .docx has been in widespread use for over six years now. Any software product that doesn't support it is probably considered 'abandonware'.

Replies:   Ernest Bywater
Switch Blayde

@QM

I write all my stuff in docx, Calibre works fine with it.


Thanks. When the time comes I'll save my .doc as .docx and input it into Calibre to create the ebook to send to my wife's iPad.

awnlee jawking

@Switch Blayde

Why don't you try inputting your .doc file directly into Calibre?

AJ

Replies:   Switch Blayde
Ernest Bywater

@Crumbly Writer

I'd assume they're quick to make minor adjustments to adapt to new filetypes


Every Friday they put out an update fine-tuning the operation or adding some minor improvement. The number of input and output options has increased since I started, as have a few other things. They do have documents to tell you what all the changes relate to.

Switch Blayde

@awnlee jawking

Why don't you try inputting your .doc file directly into Calibre?


The documentation says "docx on Windows 7 or later."

Crumbly Writer

@Switch Blayde

When the time comes I'll save my .doc as .docx and input it into Calibre to create the ebook to send to my wife's iPad.

Unnecessary, as Calibre handles both the older .doc and the newer .docx formats, as well as many others. That was my point about their frequent updates: they don't hold off on adjustments until the next major release like certain others (cough, cough, MicroShit).

Just as a test, since I've never submitted a WORD document to Calibre before, I tried submitting one of my newer ones (which is partially formatted). I got a list of over 100 error violations, but it at least tried to process it. Also, I've got two older epub documents which are listed as being in .doc format (I'm not sure whether they were submitted in .doc format, or that's an output format.) By the way, most of the errors consisted of what wasn't included ("No author", "cover file: False", etc.).

@Ernest Bywater

They do have documents to tell you what all the changes relate to.

For the most part, the summary of changes rarely describes the actual modifications ("minor performance enhancements" being a typical explanation).

Replies:   Switch Blayde
Switch Blayde

@Crumbly Writer

Just as a test, since I've never submitted a WORD document to Calibre before, I tried submitting one of my newer ones (which is partially formatted). I got a list of over 100 error violations,


From the FAQ in the Calibre manual: https://manual.calibre-ebook.com/faq.html

Input Formats: AZW, AZW3, AZW4, CBZ, CBR, CBC, CHM, DJVU, DOCX, EPUB, FB2, HTML, HTMLZ, LIT, LRF, MOBI, ODT, PDF, PRC, PDB, PML, RB, RTF, SNB, TCR, TXT, TXTZ


DOC is not listed.

And from the Calibre User Manual: https://manual.calibre-ebook.com/conversion.html

Convert Microsoft Word documents

calibre can automatically convert .docx files created by Microsoft Word 2007 and newer. Just add the file to calibre and click convert (make sure you are running the latest version of calibre as support for .docx files is very new).


Again, it specifically says DOCX
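
For anyone who wants to check a file before adding it, here is a minimal Python sketch of the same lookup. The format list is copied from the FAQ quote above; the function name `is_listed_input` is made up for illustration and is not part of Calibre. It only compares the file extension against the list and does not look inside the file.

```python
from pathlib import Path

# Input formats copied from the calibre FAQ list quoted in this thread.
CALIBRE_INPUT_FORMATS = frozenset(
    "AZW AZW3 AZW4 CBZ CBR CBC CHM DJVU DOCX EPUB FB2 HTML HTMLZ LIT LRF "
    "MOBI ODT PDF PRC PDB PML RB RTF SNB TCR TXT TXTZ".split()
)


def is_listed_input(filename: str) -> bool:
    """Return True if the file's extension appears in the quoted FAQ list."""
    ext = Path(filename).suffix.lstrip(".").upper()
    return ext in CALIBRE_INPUT_FORMATS


if __name__ == "__main__":
    # Hypothetical file names, used only to show the lookup.
    for name in ("novel.doc", "novel.docx", "novel.odt"):
        print(name, is_listed_input(name))
```

With the list as quoted, `novel.doc` fails the check while `novel.docx` and `novel.odt` pass.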

Replies:   Crumbly Writer
Crumbly Writer

@Switch Blayde

Again, it specifically says DOCX

I stand corrected. Although Calibre doesn't flat-out reject .doc submissions, it appears they depend on various data fields which only the .docx format supplies.

Replies:   awnlee jawking
Crumbly Writer

I've confirmed it. Using an older book, before I started formatting specifically for epubs, I submitted both the original .doc and a newly created .docx of the same document. One succeeded and one failed with a long list of non-included fields.

Calibre will not accept .doc submissions as an input source.

awnlee jawking

@Crumbly Writer

it appears they depend on various data fields which only the .docx format supplies


That strikes me as very bad software design because those fields surely won't be present in .txt files. They've created an unnecessary restriction.

AJ

Replies:   Crumbly Writer
Crumbly Writer

@awnlee jawking

That strikes me as very bad software design because those fields surely won't be present in .txt files. They've created an unnecessary restriction.

In that case, you'd build an ebook the traditional way, via html. The source restriction is because they just added the feature to accept .docx files, based (I assume) on their already having the required fields. When I imported my Word document, it filled in the author name, title and several other fields. Text files, on the other hand, aren't Style Based, which ebooks are, thus they wouldn't be able to create a table of contents, a necessary component.

Still, it's questionable that all a .doc file needs is to be saved to a new format. You'd think they'd code their file processor to add the fields themselves, since WORD does just that.

Ernest Bywater

This link shows all the release, fix, and upgrade info. DOCX capability was in version 1.0.

http://calibre-ebook.com/whats-new

Considering the files with the .doc extension can have four, and some argue five, different sets of format instructions over the years it's no wonder they decided not to worry about it when they created Calibre in 2012/2013 since .docx had been in since 2007.

In case you wonder about the formats - Word for DOS, Word for Windows, Word 95, Word 97 - 2003 - are the four and in the Word for Windows group some argue Word 6 was enough different to Word 5 to be listed as a separate version with significant incompatibilities.
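
A file's extension doesn't say which of these containers it really is, but the first few bytes usually do. The sketch below is a rough check, not anything Calibre is known to do: it assumes the common case where a Word 97-2003 .doc is an OLE2 compound file and a .docx is a ZIP archive. The older Word for DOS formats have other signatures and will simply come back as "unknown". The function names are invented for this example.

```python
# Signature of an OLE2 compound file (the container used by Word 97-2003 .doc).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a ZIP archive (.docx, .odt and .epub are all ZIP-based).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes: 'ole2', 'zip' or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify its container."""
    with open(path, "rb") as fh:
        return sniff_container(fh.read(8))
```

A result of "zip" only says the file is some ZIP-based format; it takes a look inside the archive to tell a .docx from an .odt or an .epub.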

Replies:   Crumbly Writer
Crumbly Writer

@Ernest Bywater

Considering the files with the .doc extension can have four, and some argue five, different sets of format instructions over the years it's no wonder they decided not to worry about it when they created Calibre in 2012/2013 since .docx had been in since 2007.

Except, as I noted, the ability to import .doc files is already in the current version of Calibre. The issue is, they're unable to process the input file without the additional fields that .docx maintains internally. It has nothing to do with 'issues' inherent in the .doc format, at least from what I can see without examining their internal coding. However, the idea that they were avoiding problematic file structures is pure supposition.

I'm guessing they couldn't process the .doc file without duplicating WORD's file conversion mechanism, and given the complication and risk of mishandling or M$'s future changes, decided it wasn't worth the risk.

However, it's a simple workaround, which I'm sure even OpenOffice or the many word processor alternatives can easily handle.

Replies:   Ernest Bywater
Ernest Bywater

@Crumbly Writer


the ability to import .doc files is already in the current version of Calibre


I've not tried to import a .doc file, but I do note none of the help files on Conversion Import Options have .doc files listed. See this link for details of what they do list:

https://manual.calibre-ebook.com/generated/en/ebook-convert.html

At this help page it says:

https://manual.calibre-ebook.com/conversion.html#convert-microsoft-word-documents

Convert Microsoft Word documents

calibre can automatically convert .docx files created by Microsoft Word 2007 and newer. Just add the file to calibre and click convert (make sure you are running the latest version of calibre as support for .docx files is very new).

Older .doc files

For older .doc files, you can save the document as HTML with Microsoft Word and then convert the resulting HTML file with calibre. When saving as HTML, be sure to use the "Save as Web Page, Filtered" option as this will produce clean HTML that will convert well. Note that Word produces really messy HTML, converting it can take a long time, so be patient. If you have a newer version of Word available, you can directly save it as docx as well.

Another alternative is to use the free OpenOffice. Open your .doc file in OpenOffice and save it in OpenOffice's format .odt. calibre can directly convert .odt files.

.......................

Which makes it clear it does not handle or process .doc files directly.

edit to add: They clearly chose not to handle .doc files, and probably due to the problems with it having so many different internal format codes.
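
The manual's workaround (convert the .doc to .odt first, then hand the result to calibre) can be scripted. What follows is a sketch under some assumptions: it uses LibreOffice's `soffice` command-line converter rather than the OpenOffice GUI the manual describes, it assumes both `soffice` and calibre's `ebook-convert` are installed and on the PATH, and the helper names `plan_conversion` and `run_plan` are invented here. It builds the commands first so they can be checked before anything is run.

```python
import subprocess
from pathlib import Path


def plan_conversion(source: str, output_format: str = "epub") -> list[list[str]]:
    """Return the commands needed to turn `source` into an e-book.

    A .doc file gets an extra first step that converts it to .odt,
    because the calibre manual says .doc is not converted directly.
    """
    src = Path(source)
    commands = []
    if src.suffix.lower() == ".doc":
        # Assumption: LibreOffice is installed and 'soffice' is on PATH.
        commands.append([
            "soffice", "--headless", "--convert-to", "odt",
            "--outdir", str(src.parent), str(src),
        ])
        src = src.with_suffix(".odt")
    # ebook-convert picks input and output formats from the file extensions.
    commands.append(
        ["ebook-convert", str(src), str(src.with_suffix("." + output_format))]
    )
    return commands


def run_plan(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `plan_conversion("novel.doc")` returns two commands, and `plan_conversion("novel.docx")` returns only the `ebook-convert` step. Pass the result to `run_plan` once the commands look right.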

Switch Blayde

I combined my 40 files (one for each chapter) into a single Word doc, made the chapter headings "heading 1", saved it as docx, and ran it through Calibre.

It was so easy that I'm not sure why I need to spend the time creating the XHTML to input into Calibre. It even gave me a TOC.

The only thing I noticed was sometimes there was a blank page between the end of a chapter and the beginning of the next one.

But I'm impressed.

docholladay

@Switch Blayde

The empty page could be an unseen page break code.

Replies:   Switch Blayde
Switch Blayde

@docholladay

The empty page could be an unseen page break code.


Aha, I put a page break between chapters. I guess it messed up Calibre (all it needed was the chapter headings defined as Heading 1).
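
One way to test that guess before converting is to count the Heading 1 paragraphs and the manual page breaks inside the .docx. A .docx is a ZIP archive with the text in `word/document.xml`, so a rough Python sketch can read it without Word. Some assumptions: the style ID is `Heading1` (the default in English-language Word; other setups may name it differently), a plain regex search stands in for a real XML parser, and the function name `inspect_docx` is made up. It does not detect a "page break before" setting stored in the style itself.

```python
import re
import zipfile


def inspect_docx(path_or_file) -> dict:
    """Count Heading 1 paragraphs and explicit page breaks in a .docx.

    Accepts a file path or a file-like object, as zipfile.ZipFile does.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        xml = zf.read("word/document.xml").decode("utf-8")
    return {
        # Paragraphs whose style ID is Heading1 (assumed default style ID).
        "heading1": len(re.findall(r'<w:pStyle\s+w:val="Heading1"\s*/>', xml)),
        # Manual page breaks inserted with Ctrl+Enter or Insert > Page Break.
        "page_breaks": len(re.findall(r'<w:br\s+w:type="page"\s*/>', xml)),
    }
```

If the page-break count is close to the chapter count, the manual breaks are a likely suspect for the blank pages, and removing them from the copy that goes to Calibre is an easy thing to try.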

Replies:   docholladay
Crumbly Writer

@Switch Blayde

The only thing I noticed was sometimes there was a blank page between the end of a chapter and the beginning of the next one.

It might be easier editing that one page out, rather than coding the entire html document. It does sound much easier.

In my case, I like getting fancy. Thus I include chapter header graphics, which require < h1="chapter title"> commands, as well as "size=xx%" commands, which limit my approach, but it's SUCH a pain creating and maintaining the epub files, it's hardly worth bothering--especially if the epubs aren't that popular to begin with.

Replies:   graybyrd  Ernest Bywater
graybyrd

@Crumbly Writer

especially if the epubs aren't that popular to begin with.


Tell me you're joking. You are _joking,_ right?

Replies:   Crumbly Writer
docholladay

@Switch Blayde

Then again, a blank page at the end of chapters gives your proofreaders and editors a perfect place to insert their notes about the chapter. This page can probably be removed easily in the editing process for the published version of the story.

Ernest Bywater

@Crumbly Writer

especially if the epubs aren't that popular to begin with.


I don't know. In the last four years I've only sold 3,500 e-pubs as against 4 MOBI files I've been asked for.

Replies:   graybyrd
graybyrd

@Ernest Bywater

It's worth noting (as a reminder) that the ePub format is open-source, and has been universally adopted across all platforms and most publishing venues. The _notable_ exception is Amazon, and their proprietary MOBI format which is both closed source, and used in virtually no other publishing venue than Amazon.

Another point to note: "The EPUB format is the most widely supported vendor-independent XML-based (as opposed to PDF) e-book format; that is, it is supported by the largest number of e-Readers, including Amazon Kindle Fire (but not standard Kindle)." --en.wikipedia

So even Amazon finally recognizes ePub. I'd expect one day that MOBI may join the exalted ranks of LIT, PDB, and DOC in obscurity, to be converted to something useful by the likes of Calibre.

Replies:   awnlee jawking
awnlee jawking

@graybyrd

The writers' group I belong to has standardised on .doc and .txt formats for the exchange of documents. So I'm surprised .doc seems to be so deprecated.

AJ

REP

@awnlee jawking

I've found that manufacturers tend to sell the 'new' by claiming it corrects the flaws in the 'old'. People hop onto the bandwagon and start waving 'the old is bad' flag.

What people don't think of is the 'new' has its problems, and those problems are not yet apparent.

Switch Blayde

@awnlee jawking

The writers' group I belong to has standardised on .doc and .txt formats for the exchange of documents. So I'm surprised .doc seems to be so deprecated.


That's why I use .doc. At the time I went to Windows 7, the publishers wanted .doc files for manuscript submissions. Also, when I attached a .docx file to an email to a friend they often couldn't open it.

It's been years so I wonder if times have changed and .docx is the new standard?

Ernest Bywater

@awnlee jawking

So I'm surprised .doc seems to be so deprecated.


Part of the issue is the major difficulties with conversions due to the .doc format having four sets of format codes for it, and also due to it being proprietary to Microsoft.

Ernest Bywater

@Switch Blayde


It's been years so I wonder if times have changed and .docx is the new standard?


As far as Microsoft is concerned, yes, .docx is the standard for Microsoft software, and has been since 2007. According to Microsoft no one is still using any version of MS Word prior to Word 2007.

Replies:   awnlee jawking
awnlee jawking

@Ernest Bywater

According to Microsoft no one is still using any version of MS Word prior to Word 2007.


What they're trying to do is get everyone on a subscription based version of Office so they've got a secure revenue stream.

AJ

awnlee jawking

@Switch Blayde

Also, when I attached a .docx file to an email to a friend they often couldn't open it.


Microsoft produced a converter so people with Word 2003 could read .docx documents. Even if you could get the thing to work, the output was often garbled. :(

AJ

Replies:   Ernest Bywater
Ernest Bywater

@awnlee jawking


Microsoft produced a converter so people with Word 2003 could read .docx documents. Even if you could get the thing to work, the output was often garbled. :(


And was no good for older versions of Word in any way.

graybyrd

@awnlee jawking


The writers' group I belong to has standardised on .doc and .txt formats for the exchange of documents. So I'm surprised .doc seems to be so deprecated.


Better to use opendoc (.odt, odf) and/or .txt for exchange. Could also use .rtf, which has been accepted as the most universal of the MS formats (although .rtf also has been updated over its history from ver. 1 through 1.9.1). Some systems and writers' apps use .rtf as a base format, including Scrivener. As SOL editors, both TeNderLoin and I use .odt in LibreOffice as the preferred document edit/change format.

As for .docx, when someone sends a .docx document to me, I reject it and request .odt or .rtf. I'll not pay the MS tax to open a MS bloat. My last version of Word is 5.1a for Mac OS-9, which I've retired.

Replies:   Switch Blayde
Switch Blayde

@graybyrd

As for .docx, when someone sends a .docx document to me, I reject it and request .odt or .rtf. I'll not pay the MS tax to open a MS bloat.


You can open a .doc file in Pages.

Crumbly Writer

@graybyrd

Tell me you're joking. You are _joking,_ right?

When I was publishing on Smashwords, my users could download to whatever format they wanted, a wonderful feature. However, now that I'm stuck on lulu, I'm not distributing many of their epub files, so the majority of my sales are via Amazon's Kindle format.

You're right, epub is a popular format, mainly because it can be read by anyone, on any device. I'm debating forgoing using lulu on my next book--since my sales there have been marginal--and instead offering it to my readers directly, once again in their choice of formats. If I do that, I'll have a better idea of how many use epubs vs. the other formats.

@Ernest

In the last four years I've only sold 3,500 e-pubs as against 4 MOBI files I've been asked for.

That's because MOBI is an outdated and rarely used, older Kindle format. It's unpopular, because it doesn't work as well on most newer Amazon devices or in the newer Amazon reader apps, not because there are fewer Amazon readers!

All that said, before I switched from smashwords, the majority of my sales were via smashwords and NOT Amazon.

@graybyrd

So even Amazon finally recognizes ePub. I'd expect one day that MOBI may join the exalted ranks of LIT, PDB, and DOC in obscurity, to be converted to something useful by the likes of Calibre.

It's already joined those others, as Amazon now only marginally supports it (no upgrades). Unfortunately, they've never offered any MOBI to their newer format converters.

@Switch

It's been years so I wonder if times have changed and .docx is the new standard?

Not for readers, but for communicating work--especially for those who used to work in office environments (i.e. most publishers and editors), it is, since that's how most submissions arrive.

For readers, as Awnlee pointed out, .docx is a pain because you have to pay a monthly fee simply to read it (though dropbox allows you to read them via their website for free).

@Ernest

Part of the issue is the major difficulties with conversions due to the .doc format having four sets of format codes for it, and also due to it being proprietary to Microsoft.

The first part is mitigated by switching to the newer .docx, and it's seen as a 'standard' now mainly because most word processors (including Google Docs) support exporting in that format, even if users don't own a copy of WORD or OFFICE. Unfortunately, most younger authors are moving away from word processors entirely, going with the newer generation of 'stripped down' writing tools, so it'll slowly be phased out over time.

@graybyrd

Better to use opendoc (.odt, odf) and/or .txt for exchange.

Unfortunately, it's harder to edit or flag text in those formats. Text doesn't show formatting changes, or color code changes, and only a few apps allow you to edit .rtf in any form. As for OpenOffice/Opendoc, since it's not as widely used, it's still easier to export in .docx, rather than expecting everyone you communicate with to install and learn an entirely new product. Again, most word processors accept the format. The newer writing tools the 'younger kids' are all using generally don't export to other apps (except as .docx).

As for .docx, when someone sends a .docx document to me, I reject it and request .odt or .rtf. I'll not pay the MS tax to open a MS bloat. My last version of Word is 5.1a for Mac OS-9, which I've retired.

I may be mistaken, but I'm pretty sure that OpenOffice and the other tools also open .docx files, and convert them to your local formats for you, so there's no need to purchase anything (though it's a pain-in-the-neck to convert from and to different formats).

Generally, I open edited documents in a separate window, correcting my source documents an item at a time, so it's not an issue which format a file is in, as I can do that from a WORD or OpenOffice document easily enough.

Replies:   graybyrd  Capt Zapp
graybyrd

@Crumbly Writer

I s'pose my primary objection is a perceived "arrogance" that Windows is the prima facie backbone of everything "computer," thus many users never think when sending a file over their distribution list: DOCX !

It's a pain to convert, and too many times it's an imperfect conversion. I've been pretty firm when pointing out that Mac and Linux users have no obligation to be forced to submit to MS formats. So... please export that Word file as .rtf or .doc format. Just look up to the "Save As" menu... which, I regret to say, is sometimes a totally foreign experience for that person.

Replies:   Crumbly Writer
Crumbly Writer

@graybyrd

I s'pose my primary objection is a perceived "arrogance" that Windows is the prima facie backbone of everything "computer," thus many users never think when sending a file over their distribution list: DOCX !

I agree, and I DESPISE being forced to continually reinforce the MicroShit standard, but I use it because it makes communication easier.

That said, .docx is NOT a Windows-specific format. I use both WORD and OpenOffice on my Mac as well as on my PC (though they function differently on the two platforms). I'm also NOT recommending that anyone go out and purchase WORD! Far from it, I'd stay as far from the product as you can get. I'm just saying, stick with formats that everyone can access, and edit, and can send. As soon as we find a way to do that with .doc and .rtf files, I'll switch over to them. Although, I'll admit, I prefer the 'Review' function inherent in both M$ Office and OpenOffice for working with editors.

Again, my issue with the .rtf format is that, except for the use of certain word processors, many users can't edit them (with their desired software). Without WORD, I generally can't edit an rtf someone sends me, and often end up cutting and pasting the text from a read-only document into a .doc or .docx format simply to edit it.

That's also why I emphasized that I generally don't work directly with edited files, because you can't always trust the results. Working with two separate open windows allows you to move proposed changes from one document to another. It's more work, and more overhead, but you end up with fewer errors using that approach.

Note: The newer generation of 'no frills' author tools available on most smart devices don't support the reading, editing and export of .rtf and .doc formats, while they continue to support the export (at least) of .docx. That was my only reason for continually harping on using the .doc/.docx standard.

Capt Zapp

@Crumbly Writer

OpenOffice and the other tools also open .docx files,


OpenOffice can OPEN .docx files but cannot SAVE AS .docx

Replies:   Crumbly Writer
Crumbly Writer

@Capt Zapp

That's why I always work in .doc files. Docx doesn't offer any additional functionality, and the older .doc format is more universally accepted than the newer format.

Correction: The only new feature, in all of Office 2007, is the ability to add small caps for chapter titles. That's hardly a feature worthy of paying out $30 to $100 a month for (Ernest's dire warnings of coding impurities, notwithstanding)!

Replies:   Switch Blayde
Switch Blayde

@Crumbly Writer

That's why I always work in .doc files. Docx doesn't offer any additional functionality, and the older .doc format is more universally accepted than the newer format.


That's what I wanted to hear. I will continue to use .doc. The only need I have for .docx is to input to Calibre.
