Zen Master: Blog

A note on text

November 1, 2013
Posted at 4:13 pm
Updated: November 2, 2013 - 3:22 pm

Wow! People really do pay attention to blogs! I hadn't posted a single thing in several months, then I put up all these blog entries, and I already have three feedback emails today.

Um, one of them suggested that I learn to use Unicode so that I could do fancy things like the long dash and the copyright symbol. He even gave me a short tutorial on how to type them using the ALT key and the numeric keypad on the right side of my keyboard. Very helpful, really, if it were in any way a good thing to do on a public forum like SoL.
I started to write a polite "are you fucking kidding?" reply, but no, this kind of thing belongs in my shiny new blog where everyone can see the answer.
His suggestion is useful in exactly one horrible situation: where you, and everyone you know, are hopelessly mired in Windows(*) and use Word(*) to the exclusion of all else. I mean, Ford makes good vehicles, but what would the world be like if Ford were the only vehicle manufacturer on the planet? Detroit is in the fix it's in right now because its automakers sat on their fat asses and raked in the profits while the rest of the world innovated until it built better vehicles.
Two of my favorite authors argue publicly about Emacs and vi. Frankly, I think they are just doing it for the attention, but it's a data point: not everyone uses Word. Really? Emacs?
Leaving aside different word processors, editors, and publishing tools (the three are NOT the same), there are many different platforms. Neither my Linux box nor my Win7 laptop implements that old BIOS shortcut the same way my Win98-SE game machine does. The laptop _might_, if I plugged in a normal keyboard, but I don't always do that. Certainly the built-in keyboard doesn't support that BIOS feature "correctly", and it was designed with a Microsoft OS in mind.
Let's move on to non-IBM PC computers. Has anyone ever heard of Apple? They are a new and growing company that has started to build computers. One of their claims is that their computers _don't_ act just like IBM's computers. Who knows, maybe they are better. Kinda like how a lot of people would trust a Corolla from Toyota before they would trust a Lincoln from Ford. Similarly, I hear that someday someone will invent some kind of electronic book reader that you can load documents into and read just like a paper book. Who knows what hardware, OS, or reader software those things will have?
The bottom line is that there are a lot of hardware platforms, a lot of operating systems, a lot of document formats, and a lot of programs that claim to render your document the way the author intended. Hell, a lot of people even use their web browser to read documents. Every combination fails except one: any platform, any OS, any program will properly render an ASCII text file, usually identified by a ".txt" extension. No other file format can be expected to render properly across all platforms, OSes, and viewers. If the file contains anything other than a simple stream of one-byte characters with values from 1 to 127 decimal, it isn't ASCII text, and it is useless to anyone without the same platform as the author.
Isn't this public knowledge? It's _why_ SoL expects all uploaded files to be in raw .txt format. They will take simple HTML (and I think RTF), but they refuse anything else. It's just too likely that anything else won't come out right. Save your Unicode for your next PowerPoint(*) presentation.
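For anyone who wants to check a file against that definition before uploading, here is a minimal Python sketch. The function names are hypothetical, and all it does is apply the 1-to-127 rule byte by byte; it knows nothing about whatever checks SoL actually runs on uploads.

```python
def find_non_ascii(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, value) for every byte outside 1..127 decimal."""
    # Iterating over a bytes object yields ints, so b is the raw byte value.
    return [(offset, b) for offset, b in enumerate(data) if not 1 <= b <= 127]


def is_plain_ascii(data: bytes) -> bool:
    """True when every byte falls in the 1..127 range."""
    return not find_non_ascii(data)


def check_file(path: str) -> None:
    """Hypothetical helper: print any offending bytes found in a file."""
    with open(path, "rb") as f:  # binary mode: raw bytes, no decoding
        data = f.read()
    problems = find_non_ascii(data)
    if not problems:
        print(f"{path}: plain ASCII")
    for offset, value in problems:
        print(f"{path}: byte 0x{value:02X} at offset {offset}")
```

Something like `check_file("chapter01.txt")` (a made-up file name) reports what was actually saved to disk, not what your editor chooses to display, which is the whole point.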

(*) All three are trademarks of that ICBM target in Redmond, WA.

Update the next day: I have gotten quite a few emails about this particular 'rant', split about evenly between "Right on, ASCII isn't only the lowest common denominator, it is still the _highest_ common denominator" and "You moron, the whole world except you uses Unicode. What the fuck is YOUR problem?"
Apparently, mentioning "SoL" (StoriesOnline) several times wasn't enough to clue many readers in that this 'rant' is specific to stories written for, uploaded to, and read on or downloaded from this website, so advice on how to do things elsewhere really isn't applicable.
The problem is that Unicode is in nearly universal use anywhere word processing or desktop publishing is done, yet there is still considerable confusion about different character sets, and this particular website chooses to minimize the problems that causes (among several others) by saying "text only". It _does_ allow some formatting and special characters, but since no two readers will see the same thing if you go very far down that road, I choose to stay at the highest level that ensures all readers everywhere see the same thing: simple ASCII, or as close as I can get to it with modern software.
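To show what that character-set confusion looks like in practice, here is a small Python sketch using the two characters from that helpful email. It assumes the mismatch is between UTF-8 and Windows-1252 (cp1252), which is my choice of example, not the only way things go wrong; the function names are hypothetical.

```python
EM_DASH = "\u2014"    # the "long dash"
COPYRIGHT = "\u00a9"  # the copyright symbol


def utf8_hex(ch: str) -> str:
    """Show the bytes a UTF-8 editor writes for one character."""
    return ch.encode("utf-8").hex(" ")


def misread_as_cp1252(ch: str) -> str:
    """Save as UTF-8, then decode those bytes as if they were Windows-1252."""
    return ch.encode("utf-8").decode("cp1252")


def misread_as_utf8(ch: str) -> str:
    """Save as Windows-1252, then decode as UTF-8, replacing invalid bytes."""
    return ch.encode("cp1252").decode("utf-8", errors="replace")


for ch in (EM_DASH, COPYRIGHT):
    # ascii() keeps the printed output itself plain ASCII.
    print(f"U+{ord(ch):04X}: utf-8 bytes {utf8_hex(ch)}; "
          f"read as cp1252 -> {ascii(misread_as_cp1252(ch))}; "
          f"cp1252 byte read as utf-8 -> {ascii(misread_as_utf8(ch))}")
```

The long dash is three bytes in UTF-8 (E2 80 94), and a Windows-1252 reader shows them as three separate characters (U+00E2, U+20AC, U+201D). Going the other way, the single Windows-1252 byte for it (0x97) isn't valid UTF-8 at all and turns into the replacement character U+FFFD. A plain ASCII hyphen or "(c)" survives every one of these combinations unchanged.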
I have tried several different options. At the moment, I am encoding my files as "UTF-8", making the strongest possible attempt not to use any character with a value higher than 0x7F, and setting my Linux box to use "DOS/Windows"-style end-of-line markers (CR LF). UTF-8 stores every character up to 0x7F as the same single byte ASCII uses, so a UTF-8 file that stays under that limit is byte-for-byte plain ASCII. If anything I publish like this comes out looking squirrely on any reader's platform, I would really appreciate hearing about it. Some older files were published before I realized how important this was, so if you see anything odd in the older files, please tell me about that, too.
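Here is a minimal Python sketch of that setup, assuming the story is already in memory as one string. `to_sol_text` is a hypothetical name, not anything SoL provides. It normalizes every line ending to CR LF, refuses anything above 0x7F instead of silently dropping it, and then encodes as UTF-8.

```python
def to_sol_text(text: str) -> bytes:
    """Hypothetical helper: CR LF line endings, nothing above 0x7F, UTF-8 bytes."""
    # Fold CR LF and lone CR into LF first, then expand every LF to CR LF,
    # so a file with mixed line endings comes out consistent.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    crlf = unified.replace("\n", "\r\n")

    # Refuse rather than guess: the author fixes each offender by hand.
    offenders = sorted({ch for ch in crlf if ord(ch) > 0x7F})
    if offenders:
        names = ", ".join(f"U+{ord(ch):04X}" for ch in offenders)
        raise ValueError(f"characters above 0x7F: {names}")

    # With nothing above 0x7F, these UTF-8 bytes are identical to ASCII.
    return crlf.encode("utf-8")
```

Calling `to_sol_text("one\ntwo")` returns `b"one\r\ntwo"`, while a stray long dash raises an error naming U+2014 so it can be replaced with a plain hyphen or two.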