
Forum: Author Hangout

Mac WORD html Generation Question

Crumbly Writer

I've just discovered that my Mac WORD subscription has been 'defaulting' to charset="windows-1252" when converting to html. It used to be that you could easily reset the default behavior when saving the file, but that option no longer seems to be available. What's more, all the 'reset default html charset for Mac WORD' instructions I've found involve incredibly complicated changes to the underlying WORD coding (something I don't feel qualified to perform myself).

Does ANYONE know how to get my various Macs to do what they've always done in the past?

Note: For now, I've been doing a 'roundabout' conversion by physically copying the html code from the windows-1252 files into my older utf-8 files, which I need for publishing, using a specialty html editing program (Adobe Dreamweaver). I know that that's an unsupported, stupid-assed approach, but for now it appears to be working (I've done it in the past, and the books all passed muster when I submitted them to everyone except lulu.com).

Lazeez Jiddan (Webmaster)

@Crumbly Writer

I know that that's an unsupported, stupid-assed approach, but for now, it appears to be working (as I've done it in the past, and the books all passed muster when I've submitted them to everyone except lulu.com.

Download the free version of BBEdit from Barebones.com and use it to convert the character encoding.

What you're doing now usually ends up with misencoded characters that show up at the most inopportune time, like © marks messed up.

It's simple to use. Just open the Word-generated html file in BBEdit, click on the file encoding at the bottom of the window, select UTF-8 and save the file.
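For anyone who would rather script the same re-encoding step, here is a minimal Python sketch. It is not what BBEdit does internally; it simply decodes the bytes as windows-1252, rewrites any charset declaration to utf-8, and encodes the result as UTF-8. The function names `reencode_html` and `convert_file` are made up for this illustration.

```python
import re
from pathlib import Path


def reencode_html(raw, source_encoding="cp1252"):
    """Decode windows-1252 bytes and return the same HTML as UTF-8 bytes."""
    # Python's cp1252 codec raises UnicodeDecodeError on the few byte
    # values that windows-1252 leaves undefined.
    text = raw.decode(source_encoding)
    # Point any charset declaration at utf-8 so it matches the new bytes.
    text = re.sub(
        r"(charset\s*=\s*[\"']?)windows-1252",
        r"\g<1>utf-8",
        text,
        flags=re.IGNORECASE,
    )
    return text.encode("utf-8")


def convert_file(src, dst):
    """Read src as windows-1252 HTML and write a UTF-8 copy to dst."""
    Path(dst).write_bytes(reencode_html(Path(src).read_bytes()))
```

Calling `convert_file("chapter1.htm", "chapter1-utf8.htm")` (hypothetical file names) leaves the original untouched and writes a converted copy, so the two can be compared before the old file is replaced.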

Crumbly Writer

@Lazeez Jiddan (Webmaster)

Thanks. I knew that I was asking for trouble, but wasn't sure how to fix it.

By the way, does anyone know WTF happened to WORD for Mac, and why it no longer allows you to select "Options" under its "Save As ..." command? It seems that a change that significant would have generated a LOT of online comments (i.e. bitchin')!

Ernest Bywater

@Crumbly Writer


By the way, does anyone know WTF happened to WORD for Mac, and why it no longer allows you to select "Options" under its "Save As ..." command? It seems that a change that significant would have generated a LOT of online comments (i.e. bitchin')!

The level of complaint is directly proportional to the number of people who use a feature and also notice the change. I'd say this is one case with a low number of people involved. I know that in the past, when MS and Apple made announcements about cutting back on anything, they got a lot of complaints, so they both stopped making such announcements and found that few people complained, as few realised what had actually happened.

Crumbly Writer

@Ernest Bywater

But, in this case, you go from creating a functional ePub (based on Word's html output) to creating entirely non-functional ebooks. That's a pretty dramatic difference, and one that's hard to miss, once the bitching about users being 'ripped off' by the author starts to accumulate.

And while it's true that not many actually create their own ePubs from scratch, like you and I do, I'd imagine the 'bug' applies whether the underlying html production is specific (i.e. "Save As ... utf-8 html file") or generic (i.e. submitting a Word document to Calibre, which uses the underlying WORD coding to generate the necessary html code).

And finally, while Lazeez's solution works, it's not an elegant 'it just works' solution. It's a hack and patch alternative, requiring an extra, unnecessary step, just to continue doing what we've always done successfully before.

Generally, you don't see that level of screw up until someone decides that they like super-thin keyboards much more than they like functioning keyboards, and then they don't do jack squat about it for five years as users continually bitch about it, demanding extensive and expensive repairs all because the designers 'know better' than everyone else! ** Flame Off! **

Ernest Bywater

@Crumbly Writer

But, in this case, you go from creating a functional ePub (based on Word's html output) to creating entirely non-functional ebooks. That's a pretty dramatic difference, and one that's hard to miss, once the bitching about users being 'ripped off' by the author starts to accumulate.

While I acknowledge your comments about how serious this is for the creation of e-pubs from MS Word, I'd be willing to bet the number of people who use MS Word to create e-pubs is so low that it doesn't show on any of the MS user database counts, and thus isn't something they'd be concerned about.

Switch Blayde

@Crumbly Writer

But, in this case, you go from creating a functional ePub (based on Word's html output) to creating entirely non-functional ebooks.

Do what I do. I don't save as HTML from my Mac Word. I input the docx into Calibre and let it generate the ebook.
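The same docx-to-ebook step can also be run without the Calibre window, since Calibre ships a command-line tool called `ebook-convert` that takes an input file and an output file. The following is a minimal Python sketch, assuming `ebook-convert` is on the PATH (on a Mac it lives inside the Calibre application bundle, so it may need to be added). The helper names `build_convert_command` and `convert_docx_to_epub` are invented for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path, epub_path=None):
    """Return the ebook-convert command line for a docx-to-epub conversion."""
    docx = Path(docx_path)
    # Default to an .epub with the same name, next to the .docx.
    epub = Path(epub_path) if epub_path else docx.with_suffix(".epub")
    return ["ebook-convert", str(docx), str(epub)]


def convert_docx_to_epub(docx_path, epub_path=None):
    """Run Calibre's converter and return the path of the resulting epub."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is Calibre installed?")
    cmd = build_convert_command(docx_path, epub_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

No conversion options are passed here, so Calibre's defaults apply, just as when the docx is added and converted through the Calibre window without changing any settings.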

oyster50

I write in MS Word. It's what I'm used to. I open the document I saved in Word into OpenOffice and save it as HTML in OpenOffice. That's what I post on SOL.

Works for me.

Switch Blayde

@oyster50

I open the document I saved in Word into OpenOffice and save it as HTML in OpenOffice. That's what I post on SOL.

When I post on SOL, I simply upload the docx file to the SOL Wizard. Why the extra step of creating HTML? At one time SOL didn't support docx files, but it does now.

oyster50

@Switch Blayde

Well, that .docx support somehow missed me. I'll give it a try next time.

Argon

@Switch Blayde

That's what I also do now; although I do not use Word, but Pages, and convert to .docx.
In the past I used the free text processor Bean for HTML conversion, but it often screwed up the italics.

Switch Blayde

@Argon

That's what I also do now; although I do not use Word, but Pages, and convert to .docx.

Occasionally the SOL Wizard doesn't convert my docx 100% correctly. When I notify Lazeez he fixes it. Sometimes he shows me why it didn't convert properly. Word does some crazy things.

Crumbly Writer

@Switch Blayde

Again, WORD dumps a shitload of 'book-keeping' crap into your files, which can easily choke many conversion routines. And in the end, does documenting every single name, date, location, foreign word and/or source really matter to your story?

Crumbly Writer

@Switch Blayde

When I post on SOL, I simply upload the docx file to the SOL Wizard. Why the extra step of creating HTML? At one time SOL didn't support docx files, but it does now.

It's because the html generated by WORD has so much garbage in it, allowing WORD to track things like names, dates and locations, among other things, that it produces incredibly kludgy (i.e. slow) code.

As far as posting directly to Calibre, it's the same problem, only with both WORD's and Calibre's kludge. My way is a more tedious process, but after doing it for years, it's second nature now and the performance boost is, IMHO, worth it.
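To give a rough idea of the kind of cleanup described above, here is a minimal Python sketch that strips some common Word-specific markup from an html file. The patterns are assumptions about typical Word output (conditional comments, namespaced tags such as `<o:p>` or smart tags like `<st1:City>`, `mso-` style declarations and `Mso...` class names), not a complete list. Regular expressions are also a fragile way to edit html, so treat this as an illustration rather than a replacement for cleaning up by hand. The function name `strip_word_markup` is made up for this example.

```python
import re


def strip_word_markup(html):
    """Remove some common Word-only markup from an html string."""
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Namespaced tags such as <o:p>, </o:p> or <st1:City>. The text between
    # them is kept. Note this also drops any other prefix:name tag.
    html = re.sub(r"</?\w+:\w+[^>]*>", "", html)
    # mso-* declarations inside style attributes.
    html = re.sub(r"""\s*mso-[^:;"']+:[^;"']*;?""", "", html)
    # style attributes that are now empty.
    html = re.sub(r"""\s+style=("\s*"|'\s*')""", "", html)
    # Word's built-in class names such as MsoNormal.
    html = re.sub(r"""\s+class=["']?Mso\w+["']?""", "", html)
    return html
```

For example, a paragraph like `<p class=MsoNormal style='mso-margin-top-alt:auto'>Met <st1:City>Paris</st1:City> friends<o:p></o:p></p>` comes out as `<p>Met Paris friends</p>`.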

Switch Blayde

@Crumbly Writer

It's because the html generated by WORD

I don't save as HTML in Word.

I was responding to someone who said he used OO to save the docx as HTML which he uploaded to the SOL Wizard. My point was that SOL now supports docx so why convert to HTML? Just post the docx to the SOL Wizard.

Now if I'm creating an ebook either for Bookapy or KDP, I do create the HTML, but not from Word. From Calibre.

Can someone hand code better HTML than Calibre which has to be general? Of course. But not me. I'll go with bloated HTML that works rather than streamlined HTML that has problems.

Crumbly Writer

@Switch Blayde

I was responding to someone who said he used OO to save the docx as HTML which he uploaded to the SOL Wizard. My point was that SOL now supports docx so why convert to HTML? Just post the docx to the SOL Wizard.

You were right then, as the SOL wizard strips out anything it doesn't recognize. Normally, you post in html in order to give you greater flexibility rather than submitting plain text files.

And I'll admit, once I've created the stripped-down html code for my documents, I then use Calibre to create the actual ePub document (and then, sometimes, I also 'clean up' all the Amazon-specific commands that Calibre dumps in its code, assuming that Amazon is the master of everything, and everyone must bow down to their moronic formatting commands, such as indented text only being indented by a single, unnoticeable space).

But Calibre actually creates HORRIBLE coding, converting each and every CSS style definition into its own internal coding (often tossing various elements). I'm not crazy about Calibre, but with so few viable ePub creation utilities, there really isn't any other choice!

Ernest Bywater

@Crumbly Writer

But Calibre actually creates HORRIBLE coding, converting each and every CSS style definition into its own internal coding (often tossing various elements). I'm not crazy about Calibre, but with so few viable ePub creation utilities, there really isn't any other choice!

That's weird, as it doesn't do that to me. But, then, I have every one of my paragraph styles set in my own css that's part of the html code of the file I use to create the epub in Calibre. Thus the only place Calibre gets to add anything is for embedded images, as I don't have a style for them and just embed them.

Switch Blayde

@Crumbly Writer


their moronic formatting commands (such as indented text only being indented by a single, unnoticeable space

You've mentioned that before. I used to indent (in Word) 0.5" and now 0.3". I see the difference on Amazon so they can't be changing it to a single space.
