Forum: Author Hangout

Submission Engine Chewed Up My Post

KimLittle

Hola!

Reposted Chapter 15 of "Off The Deep End" and it came out like this:

hapter 15
Posted: December 26, 2017 - 09:53:36 pm
Updated: January 06, 2018 - 12:27:53 am
BZh91AY&SY-����0P��4>΀?���@ P�V=�k�]e�� 4M �*~��S$�hd U?���$�C 4ĂS����h�i�@ $��I�I�LDޢd�i�,�{�����?h|��՗���;�� �yd �-,|�j����E1��K�F'f>V�V��xT!�ђL����J�ATq�ٔ�3����X�A�F��2dLXbL����}[�$׀�)��k&�+�+l��Ŵ-��I�/���L���tR �*�BH��(�ݭ���ڌ��


Another repost waiting in the Queue, but any ideas what the heck happened?

I always create a clean text file from MS Word which I write in, then go through and manually add italics where needed.

Never had this happen before...

AmigaClone

With issues like this you would need to contact the webmaster - Lazeez.

Ernest Bywater

@KimLittle

Here are the help page guides on submissions:

http://storiesonline.net/doc/Text_Formatting_Information_Guide

http://storiesonline.net/author/posting_guidelines.php

which says

File Formats:

File formats accepted for submission of works through the site are: Plain Text (.txt, .asc) and HTML files. (All open formats, no proprietary formats are accepted -- No Word, Wordperfect, MS Works, AppleWorks, or Lotus Word) If you need to submit styled text like italics and bold, convert your document to HTML. All the popular word processors support one of these two.


I wonder if what you submitted was a .doc file because what you show above is the sort of mess I see when viewing a .doc file in a text editor.
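One way to test that guess is to look at the first few bytes of whatever ended up in the post, since most binary formats start with a fixed signature. The sketch below is only an illustration: `sniff` and its small table are hypothetical names, and the table covers just three signatures. For what it's worth, the mess quoted in the first post starts with `BZh91AY&SY`, which matches the usual start of a bzip2 stream, whereas a legacy .doc file would start with the OLE2 signature instead.

```python
import bz2

# Leading "magic" bytes of a few common binary formats (deliberately incomplete).
SIGNATURES = {
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (legacy .doc)",
    b"PK\x03\x04": "ZIP container (.docx, .odt, .epub)",
    b"BZh": "bzip2 stream",
}


def sniff(data: bytes) -> str:
    """Return a label for the first matching signature, or a fallback."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "no known signature (possibly plain text)"


if __name__ == "__main__":
    # bzip2 output at the default level begins with b"BZh91AY&SY".
    print(sniff(bz2.compress(b"Chapter 15")))
    print(sniff(b"Chapter 15"))
```

This says nothing about how the data got compressed on its way through the submission engine; it only helps identify what kind of data it is.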

typo edit

KimLittle

Shouldn't be, because I paste from Word into a text editor which strips all formatting, do a final check, then manually paste the chapter upload into the submission engine.

Hopefully just a glitch.

Ross at Play

@KimLittle

Shouldn't be, because I paste from Word into a text editor which strips all formatting, do a final check, then manually paste the chapter upload into the submission engine.

It looks like the steps you think you did were: Copy from Word, Paste into text editor, Copy from text editor, Paste in submission wizard.

You might have accidentally hit Paste when you thought you were doing the second Copy. You could easily not notice that and the Paste into the submission wizard would then have been another copy of the Word document.

Ross at Play

Kim,
Note that CW's last post was intended for your information. :-)

Crumbly Writer

@KimLittle

I'm not sure what happened, but Ernest is right, as the text displayed looks like MS WORD's header material.

I convert from WORD to html all the time, but I don't copy it to a separate word processor first, as I'm unsure what that gains you. Thus, if I ever mistakenly post my original WORD file, SOL won't allow me to continue, catching my mistake.

Rather than going through your procedure, I suggest saving (in WORD) as "Web Page, Filtered (*.htm, *.html)". That will create the web page directly. You'll still need to strip out all the internal WORD commands (like identifying foreign words, proper names and included links) but it'll also help you figure out the various settings to turn off so WORD no longer includes all that JUNK information that you don't use (at the moment, I can't remember where WORD hides those settings, as I haven't needed to adjust mine in years).

By having fewer pass-throughs, I suspect your mistakes will be caught the next time you unintentionally screw up (as opposed to all the other intentional times you do). 'D

Switch Blayde

@Crumbly Writer

You'll still need to strip out all the internal WORD commands (like identifying foreign words, proper names and included links)


What do you use to edit the file to do that?

REP

@KimLittle

You'll still need to strip out all the internal WORD commands


Kim, CW gave you the above advice.

For his purposes, it is probably very good advice.

I am not into editing HTML files. I develop my chapters using Word, and I have Word convert the files to Web Page, Filtered files. I submit those files to SOL, and so far, I have had no problems with the posted results.

If you know how to edit HTML files and have a reason for doing so, then go ahead. However, you are not required to edit the HTML to achieve a postable result.

Ernest Bywater

@KimLittle

Shouldn't be, because I paste from Word into a text editor which strips all formatting,


I suggest instead of copying and pasting from Word to the text program that you save the word file as .txt (plain text format) and then open it in the other text program to examine it there. This way all of the word fancy format code will be stripped out when it's saved as .txt

Ernest Bywater

If you decide to go the html route I suggest you read this document first:

http://storiesonline.net/article/Text-formatting-guide-for-WLPC-Sites

KimLittle

@Ernest Bywater

Thanks, Ernest. I think I will go down the filtered HTML route for future stories.

Crumbly Writer

@Switch Blayde

Any html editor will do, as they're simply non-standard html commands. I can't remember precisely what each one is, but when you review your files they're not hard to identify (based on which words are featured). I use Dreamweaver. It's a great program, but being an Adobe product, it is vastly overpriced!

Crumbly Writer

@KimLittle

From REP

I am not into editing HTML files. I develop my chapters using Word, and I have Word convert the files to Web Page, Filtered files. I submit those files to SOL, and so far, I have had no problems with the posted results.

If you know how to edit HTML files and have a reason for doing so, then go ahead. However, you are not required to edit the HTML to achieve a postable result

SOL is excellent at stripping out any and all unnecessary html commands (including ALL style definitions), so chances are, you shouldn't even need to remove the extra commands if you're uncomfortable with editing html. (I mostly do it because I also post to my webpages and WORD produces incredibly messy html.)

I think your troubles are entirely related to copying the WORD file to another word processor instead of cleaning up the WORD-produced html file directly. It was a valiant attempt, but an unnecessary complicating step.

Crumbly Writer

@Ernest Bywater

Shouldn't be, because I paste from Word into a text editor which strips all formatting,

I suggest instead of copying and pasting from Word to the text program that you save the word file as .txt (plain text format) and then open it in the other text program to examine it there. This way all of the word fancy format code will be stripped out when it's saved as .txt

Depends on how fancy she gets. It also strips out accent marks and publication marks, and you've got to recode all of your italics and bold marks, which is probably more work than she saves (depending upon how comfortable she is with html in the first place).

It's not a bad way to go, it's just not for everyone. If she's been posting html all along with no problems, I'd say she's eminently qualified to continue to do so.

Ernest Bywater

@Crumbly Writer

Any html editor will do,


any basic text editor allows you to code html as well. I use Bluefish or Pluma, and when I was using MS Products (all those many moons ago before I left the Darkside) I used Notepad for working on html code.

Crumbly Writer

@Ernest Bywater

Any html editor will do,

any basic text editor allows you to code html as well. I use Bluefish or Pluma, and when I was using MS Products (all those many moons ago before I left the Darkside) I used Notepad for working on html code.

I didn't say that you couldn't do it, just that, if she's only using the html editor to REMOVE extraneous codes, then it'll likely be MORE work to add them all back in again. That sounds like a LOT of work! If you don't use html at all, then there's no reason to submit with html formatting.

KimLittle

More than qualified to do html, CSS, etc. Thank you. I started my web experience handcoding Geoshitties sites back when Trumpet Winsock was required software, and filenames were all 8.3. With tables. Then frames when they became available. And so on.

But for those who need a detailed colophon:

1) I paste from MS Word into Notepad on a PC or BBEdit on a Mac (depending on which machine I'm using at the time).

2) Force it to remove all formatting (ie become true plaintext).

3) Manually go through and add the plaintext underscores (just like we used to indicate italics to editors on a typewriter!), etc.

4) I paste from the text editor into the "Paste Your Text" part of the submission engine.

All I can think is that somehow the file became corrupted through the submission engine, because from what I see, the engine takes your input from the form and then turns it into a file which it attaches to all the other metadata you've added (codes, chapter numbers, etc).

All's fine now. Correction went through and I have lots of desperate messages from various SOLers which tells me that people are reading my work and looking forward to it.

Which is awesome.

^_^

Switch Blayde

@Crumbly Writer

But I use Dreamweaver.


Oh. I asked you because you have a Mac.

I got CotEditor from the App Store. It was free. But I haven't had the need to edit an html file.

Crumbly Writer

@Switch Blayde

Oh. I asked you because you have a Mac.

I got CotEditor from the App Store. It was free. But I haven't had the need to edit an html file.

I have an OLD version of Dreamweaver, which I prefer because it offers simultaneous views of the source code, display page and the live feed, so you can tell when you screw something up.

I may try CotEditor, just to see what it's like. I'd rather switch to ANYTHING OTHER THAN ADOBE, but there really aren't many full-featured html editors for authors on the market at this point, as I no longer consider myself a 'coder'.

Ernest Bywater

@Crumbly Writer

I'd rather switch to ANYTHING OTHER THAN ADOBE,


CW,

If I remember right you also use Calibre. Have you tried using the edit function in Calibre? I ask, because I tried it and it opens in three panes: the left pane is the chapter index, where you select the chapter to edit; the middle pane is the code of the e-pub file, and you can work on it in the same way as in an html editor; the right-hand pane is the chapter showing how it displays. You'll have to look at it to see if it does what you want.

Ross at Play

I have a simple question.

I've used text files for submissions until now. The only feature I've wanted but been unable to produce has been the non-breaking space - I prefer the look of a space before my ellipses and dashes, but not at the start of new lines.

Would html format allow me to code for non-breaking spaces?

zellus

@Ross at Play

Yes; https://www.w3schools.com/html/html_entities.asp

Ross at Play

@zellus

Yes

Thanks. I bookmarked that link and will test it soon with a revised version of an old story. :-)

Just checking ... Can I write a doc file in OpenOffice including things like: &nbsp. When the story is ready to go, I 'Save As' my file, selecting type of html. Presumably I should then check both versions side-by-side. If the appearance of the html version is as I want, then it's ready. Right?

To make the final check easier, can I put the html entities, but only those, in bold font in the doc file?

Ernest Bywater

You may want to check with Lazeez if the code you wish to use is allowed through the submission wizard, because it strips out all but a few allowed html codes. The list has expanded a little lately, but you need to confirm if it's allowed.

Ross at Play

@Ernest Bywater

Got it. Thanks.

Ernest Bywater

@Ernest Bywater

You may want to check with Lazeez if the code you wish to use is allowed through the submission wizard,


http://storiesonline.net/doc/Text_Formatting_Information_Guide#htm

says:

HTML files
If you feel like submitting your own html files you can do so. However, keep in mind that all html code gets stripped from your files with very specific exceptions, and your files get reformatted from scratch.

The exempted html codes carried over to the site are:

italic
bold
bold-italic
emphasize
strong
superscript
strike-through
teletype
blockquote
horizontal rules
H3, H4 and H5 header tags
The site's conversion utilities try to support centered and right justified text, but it doesn't work very reliably due to the different ways this may be achieved.

..............

The web page has the actual tags, but I removed them so they wouldn't mess up this post.

at http://storiesonline.net/article/Text-formatting-guide-for-WLPC-Sites

I cover that and also mention a few others that are now allowed which include text colours, images, span commands, and the BR carriage return in some more detail. I believe a few more have been added since, but I'm not sure.
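To make the "everything is stripped except a short list" idea concrete, here is a minimal sketch of a whitelist filter. It is not SOL's actual code. The tag names in `ALLOWED` are my own guess at how the list above maps to html tags (the real tags were removed from the quoted page), and `keep_allowed` is a hypothetical helper name.

```python
from html import escape
from html.parser import HTMLParser

# Assumed mapping of the quoted list to tag names; not confirmed by the site.
ALLOWED = {"i", "b", "em", "strong", "sup", "s", "strike", "tt",
           "blockquote", "hr", "h3", "h4", "h5"}


class WhitelistFilter(HTMLParser):
    """Re-emit text and allowed tags only; drop all other tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED:
            self.out.append(f"<{tag}>")  # attributes are discarded on purpose

    def handle_endtag(self, tag):
        if tag in ALLOWED and tag != "hr":  # hr has no closing tag
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def keep_allowed(html_text: str) -> str:
    parser = WhitelistFilter()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


if __name__ == "__main__":
    word_html = '<p class="MsoNormal"><span lang="EN-US">It was <i>their</i> choice.</span></p>'
    print(keep_allowed(word_html))
```

A real converter would also have to rebuild paragraphs after dropping the paragraph tags, which this sketch does not attempt.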

Ross at Play

@Ernest Bywater

Thanks, EB.

I just got back from sending off an enquiry to Lazeez.

I scanned the links you provided. It's nice to be aware of what exists, but for my purposes, if a non-breaking space is not available then what I currently do ain't broke.

Ross at Play

UPDATE:

Lazeez advised me the submission wizard strips out non-breaking spaces.

I'll stick with the text-file posts I already know.

Crumbly Writer

@Ernest Bywater

If I remember right you also use Calibre. Have you tried using the edit function in Calibre? I ask, because I tried it and it opens in three panes: the left pane is the chapter index, where you select the chapter to edit; the middle pane is the code of the e-pub file, and you can work on it in the same way as in an html editor; the right-hand pane is the chapter showing how it displays. You'll have to look at it to see if it does what you want.

Yep, I use Calibre, but only for creating epubs exclusively for lulu.com (SW and D2D create their own epubs generated from my WORD files). It doesn't create the original html, which I use to create my webpages and the input for the epub files, so that really doesn't address the central question (what html editor I use).

Crumbly Writer

@Ross at Play

Would html format allow me to code for non-breaking spaces?

To add to Zellus's answer, you'd use "&nbsp;...&nbsp;".

Crumbly Writer

@Ross at Play

You'll need to check with Ernest about the specifics of OO (he uses LO, though). For WORD, which is similar, you'd save the file as "Web Page, Filtered (*.htm, *.html)".

If you do that, WORD (and I assume both OO and LO) will convert the entire document to html.

By the way, in WORD on Windows, you create the non-breaking space by typing Ctrl+Shift+Space (that means, press all three keys at the same time and it'll produce the character). You can also predefine AutoCorrect entries, so you can type something and it'll automatically convert it to a non-breaking space for you.

You may want to check with Lazeez if the code you wish to use is allowed through the submission wizard, because it strips out all but a few allowed html codes. The list has expanded a little lately, but you need to confirm if it's allowed.

With many of these features, Lazeez prefers using a consistent Style, rather than allowing each author to do whatever they please (like with paragraph styles).

Ernest Bywater

The point to keep in mind from the last few posts about html code is that there is a very big difference between what you can do in html code and what you're allowed to do on SoL. There is information on the site that tells you all but the approved options are stripped from what's been submitted, and Ross has checked with Lazeez and been told the non-breaking spaces are stripped out.

In the link I gave earlier I listed all the allowed html code options as of January 2017. I know there have been a few minor changes since then, because I used to have to change the curly apostrophes for speech dialogue to straight ones or the code got hashed, but the curly apostrophes are now allowed through. I don't know what else has been added to the approved list, because I work to stay with what I knew was approved as of January 2017 plus the curly apostrophes.

Dominions Son

@Ernest Bywater

I sued to have to change


That must have been complicated, not to mention expensive with you in Australia and Lazeez & SOL in Canada. :)

Crumbly Writer

@Ernest Bywater

In the link I gave earlier I listed all the allowed html code options as of January 2017. I know there have been a few minor changes since then, because I sued to have to change the curly apostrophes fro speech dialogue to straight ones or the code got hashed, but the curly apostrophes are now allowed through. I don't know what else has been added to the approved list, because I work to stay with what I knew was approved as of January 2017 plus the curly apostrophes.

As far as I know, the addition of the em-dash was the last html code added to SOL (previously it translated every em-dash command to a single dash).

Capt. Zapp

@Ernest Bywater

... but the curly apostrophes are now allowed through...


Unfortunately when you download the stories as text files, they have very strange character combinations in their place.

JohnBobMead

@Capt. Zapp


download the stories as text files


Every time I've tried to do that, I've received a document that was nothing except text; no spaces, no punctuation, just one solid mass of text crammed into one very large "word".

This doesn't happen to other people?

Actually, that's good to know.

I'd really rather it was just something weird happening with my interactions with the site, than it being the way it works for everyone.

Maybe I'll try it again, and see if I can figure out what's happening.

*****

Well glory be, it worked this time!

sandpiper

@JohnBobMead

Change the suffix of the file to ".rtf" and the story snaps into place.

Ross at Play

@Crumbly Writer

Thanks.
I need to remember that html commands must be ended with a semi-colon.
I would only use non-breaking spaces before ellipses and dashes and ordinary spaces after, but that's moot as the submission wizard does not allow non-breaking spaces through.
I'm not aware of any differences between OO and Word I need to be bothered about.
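For anyone who wants this on their own web pages (the thread says the submission wizard strips it), here is a minimal sketch of the rule described above: a non-breaking space before an ellipsis or dash, an ordinary space after. The function name `protect_leading_space` is hypothetical, and the set of characters it looks for is an assumption.

```python
import re

# A space followed by "...", the single-character ellipsis, an en-dash or an em-dash.
_SPACE_BEFORE_MARK = re.compile(r" (?=\.\.\.|\u2026|\u2013|\u2014)")


def protect_leading_space(text: str) -> str:
    """Replace the space before an ellipsis or dash with &nbsp; (note the semicolon)."""
    return _SPACE_BEFORE_MARK.sub("&nbsp;", text)


if __name__ == "__main__":
    print(protect_leading_space("Wait ... what?"))
```

The space after the ellipsis or dash is left alone, so the browser can still break the line there.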

Ross at Play

@Ernest Bywater

because I sued to ...

Who's splitting hares now? :-)

Capt. Zapp

@JohnBobMead

Every time I've tried to do that, I've received a document that was nothing except text; no spaces, no punctuation, just one solid mass of text crammed into one very large "word".


If I open the txt file with Notepad, it displays as a single paragraph. If I open it with WordPad, the paragraphs display properly, but the curly apostrophes show as â€™ and quotes display as â€œ and â€

Ross at Play

@Crumbly Writer

As far as I know, the addition of the em-dash was the last html code added to SOL

They definitely work now.
I posted a revised version of a chapter yesterday. I had single characters in my doc file for ellipses, en-dashes, and em-dashes. They all survived both saving my doc files with txt format, and the submission wizard allowed them through to the posted version.

Capt. Zapp

@Ross at Play

because I sued to ...

Who's splitting hares now? :-)


I am quite sure he meant the definition

. to make suppliant requests of

sandpiper

@Capt. Zapp

If I open it with WordPad, the paragraphs display properly, but the curly apostrophes show as â€™ and quotes display as â€œ and â€


Only some files do that. However, it's only a couple minutes to fix with a search and replace if it's too irritating to live with.

Lazeez Jiddan (Webmaster)

@sandpiper

The files generated by the site are encoded UTF-8. So if you open them as UTF-8, then they should look correct.
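A short sketch of why the encoding matters, assuming the program that shows the strange characters is reading the file as Windows-1252 (the usual "ANSI" code page on Western-European Windows installs). The function names are made up for illustration.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then decode the bytes the wrong way, as Windows-1252."""
    # Byte 0x9D has no Windows-1252 character, so errors="replace" is needed.
    return text.encode("utf-8").decode("cp1252", errors="replace")


def repair(mojibake: str) -> str:
    """Undo the misreading, provided every character survived the round trip."""
    return mojibake.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    for ch in ("\u2019", "\u201c", "\u201d"):  # curly apostrophe, opening and closing quote
        print(repr(ch), "->", repr(misread_as_cp1252(ch)))
    print(repair(misread_as_cp1252("It\u2019s fine")))
```

Each curly quote is three bytes in UTF-8, which is why one character turns into three when misread. The closing quote cannot be repaired this way, because its third byte has no Windows-1252 equivalent and is lost.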

Ross at Play

@Capt. Zapp

I am quite sure he meant the definition "to make suppliant requests of"

I'm not sure what to make of that one.

I suspect you didn't realise my post was a continuation of a running joke EB and I have been having about wrong words making it through into posts here.

Maybe you intended a joke with your use of highfalutin' language. If so, speakers of BrE may have problems getting it. My OxD does not include the word 'suppliant'; it only has 'supplicant'. Just to be sure, I tested 'suppliant' in dictionary.com. It listed it as having the meaning 'supplicating'.

Sigh! It looks like another example of Americans mis-suing a word so often it becomes accepted as a real word. (There may be hope for you yet, EB :-)

Until that point, I thought you were trying to be very clever by adjusting the spelling of "supply-ant". The meaning of that didn't quite work.

Capt. Zapp

@Ross at Play

Maybe you intended a joke with your use of highfalutin' language.


Nah. I knew what I intended to say but couldn't put it into words so I just did a copy/paste from one of the online dictionaries. You can blame the 'highfalutin' language' on them.

Ernest Bywater

@Dominions Son

thanks, typo fixed

Ernest Bywater

@Crumbly Writer

As far as I know, the addition of the em-dash was the last html code added to SOL


I wasn't aware he allowed that yet. I don't use the em-dash at all so I don't notice it.

Ernest Bywater

@Capt. Zapp


Unfortunately when you download the stories as text files, they have very strange character combinations in their place.


That may be due to the system whereby you're getting it as a text file, or it may be an older file with the older code. Looking at Rough Diamond, which was recently revised and re-posted, the html page shows the curly apostrophes, and when I view the source code it has the curly apostrophes and not the html command for them - I don't know why.

Ernest Bywater

@Ross at Play

Who's splitting hares now? :-)


the typist with the fat fingers

Dominions Son

@Ross at Play

Who's splitting hares now? :-)


The barber? :)

Ross at Play

@Dominions Son

The barber? :)

No. The butcher for hares. :-)

Capt. Zapp

@Ernest Bywater

That may be due to the system where by you're getting it as a text file or it may be on older file with the older code.


I don't know where the problem lies. I am on a Win8.1 system but use LO as my primary word processor. Since you referred to your Rough Diamond story, I used it for my testing purposes. I downloaded the story as a text file using the TXT button and got the file Rough_Diamond@Ernest_Bywater.txt, which opens by default with Notepad. In Notepad, the Table of Contents displays properly but the story becomes a single paragraph. Opening the same file in WordPad gives me properly formatted paragraphs, but I get the strange characters I noted before. Opening in LO I still get the strange characters, but I also see (with 'Formatting Marks' turned on) that the paragraphs are separated by line breaks (Shift+Enter) instead of paragraph breaks. I also get paragraph breaks in unusual places, usually in the middle of a sentence.

As sandpiper suggested, it's a fairly simple matter of me doing a find & replace to make corrections, but should I have to?

Ernest Bywater

@Capt. Zapp

I don't know where the problem lies.


I don't know what's causing the issues you have, either.

What I upload to SoL is an html file, and when you look at the story page in the browser it's an html file. When you right-click to get the sub-window and select 'view source code' you see plain html code. There is a little extra added at the start and the end for some of the SoL admin stuff, but the story text is simple text with very few html codes: paragraphs, italics, bold, and some text colour span commands. There are none of the html commands that start with & and end with ; or any other oddities, plus the apostrophes show the exact same way as a text apostrophe and not an html command.

It seems the change is taking place in the conversion to the text file or in Notepad.

May I suggest you try opening the file in Libre Office Writer and see what it looks like.

I just downloaded the story My Bloody valentine by The Scot by using both the TXT button and the EPUB button on the story list. The epub opens and reads perfectly. The .txt file opens and reads perfectly in Libre Office Writer and in the Linux Text Editor on my system, but when I open the same file in MS Notepad through WINE I have weird characters where the apostrophes for speech are. This would seem to be an issue with Notepad and not SoL or the file.

Ross at Play

@Ernest Bywater

This would seem to be an issue with Notepad and not SoL or the file.

Although not an SOL problem, I expect Lazeez has been asked about such things before and could explain what is happening.

Ernest Bywater

@Capt. Zapp

The characters you're having trouble with are the Unicode characters U+ 2019, U+ 201C, and U+ 201D (spaces to stop running as code) being misread by Notepad and saved as the wrong character. I'm not sure if these are UTF-8, UTF-16, or UTF-32 or why they aren't showing right in Notepad.

helmut_meukel

@Ernest Bywater

I'm using Opera as browser and tried to save from ASSM as text. When opening in Notepad the textfile was without linebreaks and in most cases the two words were glued together. I assumed this is a problem of Opera and as a work-around selected the whole text in Opera to copy+paste it into Notepad. This worked fine.

Now reading this thread I guess it's first the preselection of the character coding used by Notepad when opening files – not UTF-8 – and second the way the different operating systems code and interpret NewLine. (Which codes are created by the OS and passed to the word processor when the user presses the Return or Enter key on their keyboard?)

In the early days with teletypes used as cheap printers 'Carriage Return' and 'Line Feed' were necessary and in this sequence: CRLF. (print head to the left and paper one line up). If you reversed the codes (LFCR) the first character was printed somewhere in the left half of the line while the print head was still moving.

Back when I had to code protocol files created by my program, which were copied and read on Unix and Windows and Apple machines I learned that using only one of the codes (CR alone, LF alone) did work as NewLine but not with all OS's.
One of the 3 OS did LFCR without problems while DOS/Windows interpreted the 'LF' as a 'Line Feed' without change of the printing position (same column) and the following 'CR' as a 'Carriage Return' with an implied 'Line Feed', thus moving the paper two lines upwards.

HM.
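A small sketch of how one might check which line-ending convention a downloaded file uses, and convert it to CRLF for an editor that only understands that. The function names are hypothetical, and this works on raw bytes so the character encoding doesn't get in the way.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LFs and lone CRs in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by CR
        "CR": data.count(b"\r") - crlf,  # CRs not followed by LF
    }


def to_crlf(data: bytes) -> bytes:
    """Normalise every line ending to LF first, then expand to CRLF."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


if __name__ == "__main__":
    sample = b"one\ntwo\r\nthree\rfour"
    print(count_line_endings(sample))
    print(to_crlf(sample))
```

If a file shows only LF counts, that would be consistent with an editor expecting CRLF showing everything as one paragraph.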

Lazeez Jiddan (Webmaster)

@Ross at Play

I expect Lazeez has been asked about such things before and could explain what is happening.


already answered above:

http://storiesonline.net/d/s2/t3390/submission-engine-chewed-up-my-post#po65030

zellus

@helmut_meukel

Notepad


You should try using notepad++ https://notepad-plus-plus.org/

Capt. Zapp

@Ernest Bywater

The characters you're having trouble with are the Unicode characters U+ 2019, U+ 201C, and U+ 201D (spaces to stop running as code) being misread by Notepad and saved as the wrong character. I'm not sure if these are UTF-8, UTF-16, or UTF-32 or why they aren't showing right in Notepad.


That would make sense IF the strange characters were showing up in Notepad. The only issue in Notepad is that the story becomes one long paragraph. It is only when I open the file in other word processors (WordPad, OpenOffice, LibreOffice) that the strange characters appear.

When I cut/paste from SOL to Notepad or Wordpad I end up with completely unformatted text.

When I cut/paste from SOL to OpenOffice or LibreOffice, the formatting is correct but I end up with non-breaking space on either side of the formatting EXCEPT when the formatted word/sentence ends in punctuation. For example:

"It was their choice." There would be a nbsp on either end of their

"It was their choice." There would be a nbsp at the beginning of choice but not at the end.

Ernest Bywater

Capt Zapp, and Helmut Meukel,

What is strange is I'm using Chrome to view SoL on a system with Zorin Linux. When I use the SoL TXT button to download the file and save it as a .txt file I get the results I mention above where the file looks perfect in Libre Office Writer, the default text editor, and Pluma, but when viewed in MS Notepad it has weird characters in place of the three characters I mentioned above. The rest displays as it should, even with the proper line spaces.

I don't know why those characters are being messed up, and I can only guess it's because they aren't being seen properly by the software you're using. Nor do I understand why you're losing the line spaces.

I've reached the limit of what I can do to help you debug this issue, sorry I can't help you any further.

I can only suggest you download as e-pubs in future to side-step the issue.

Capt. Zapp

@Ernest Bywater

I don't know why those characters are being messed up,


I 'think' I found the issue to be with LibreOffice. For some reason LO does not allow you to specify the file encoding UNLESS you go:

File> Open> File Type> Text - Choose Encoding

ROYAL P.I.T.A.!

In any case I just recorded a quick macro to go through and fix a couple of things that I don't like doing with Find & Replace. Now if I could just automate it.
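On automating it: one option outside LO is a small script run on the downloaded file. This is only a sketch under assumptions, since the post doesn't say what the macro fixes. I'm guessing at curly quotes and line endings, and `clean_text` and `clean_file` are made-up names.

```python
from pathlib import Path

# Assumed clean-up: curly quotes and apostrophes become straight ones.
STRAIGHT = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def clean_text(text: str) -> str:
    """Unify line endings to LF and straighten curly quotes."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.translate(STRAIGHT)


def clean_file(src: str, dst: str) -> None:
    """Read src as UTF-8 and write dst with CRLF line endings for Windows editors."""
    text = Path(src).read_bytes().decode("utf-8")
    Path(dst).write_bytes(clean_text(text).replace("\n", "\r\n").encode("utf-8"))


if __name__ == "__main__":
    print(clean_text("\u201cIt\u2019s here,\u201d she said.\r\nNext line."))
```

Reading the file explicitly as UTF-8 is the step that avoids the strange characters in the first place; straightening the quotes is optional.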

Ernest Bywater

@Capt. Zapp

I 'think' I found the issue to be with LibreOffice.


That's odd, because I just downloaded, then opened the file in LO and had no issues until I opened it in Notepad. But I'm using Zorin Linux and not MS Windows, and I don't know if that makes a difference or not.

helmut_meukel

@Ernest Bywater

But I'm using Zorin Linux and not MS Windows, and I don't know if that makes a difference or not.


I bet it makes a difference. Windows and Unix/Linux use by default different character sets.
AFAIK, there is still no generally accepted standard how and where in the file to store which character set was used to create a text file.

OTOH, any HTML file contains something like this: encoding='utf-8'

With text files you have to guess.
In the 'open' dialog for Notepad you can select the encoding you believe was used when creating the file.
On my Windows system it's ANSI, Unicode, Unicode Big Endian, UTF-8. Default is ANSI.
If you get strange characters displayed, reopen the file with another selection.
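The "guess, then reopen" routine can be roughed out in code. This is a sketch, not how Notepad actually decides: it checks for a byte-order mark, then tries strict UTF-8, and falls back to Windows-1252 as an assumed "ANSI" code page. `guess_encoding` is a hypothetical helper, and it ignores UTF-32 for simplicity.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Return a plausible codec name for raw text-file bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # Python's utf-16 codec reads the byte order from the BOM
    try:
        data.decode("utf-8")  # strict: fails on bytes that are not valid UTF-8
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"  # assumed fallback; other "ANSI" code pages exist


if __name__ == "__main__":
    print(guess_encoding("\u201cHello\u201d".encode("utf-8")))
    print(guess_encoding("caf\u00e9".encode("cp1252")))
```

Pure-ASCII files pass the UTF-8 test too, which is harmless because ASCII reads the same in both encodings.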

HTH,

HM.

BTW, last time I had to deal professionally with different encodings was about 25 years ago when we tried to read data from a Siemens PLC with a Big Endian processor. All data we got was in the wrong order. It was a PITA to create a conversion subroutine to convert 4 byte floating point values into usable data.

graybyrd

@Capt. Zapp

File> Open> File Type> Text - Choose Encoding

ROYAL P.I.T.A.!


As Lazeez said earlier, text encoding matters. UTF-8 has been accepted as the web standard.

Also, the long-standing headache of line endings has been resolved by accepting the Linux convention of "LF" as a line ending, normally found only at the end of a paragraph. The Windows convention requiring a "CR LF" at the end of each window wrap point (ie, hard returns) is obsolete and non-standard, but some Windows text editors still demand it.

Problem solved if one uses a modern text editor, or a word processor that exports text that meets the UTF-8 encoding, LF line ending standards.

Mac users have fought this Windows text convention nightmare for years.

helmut_meukel

@graybyrd


The Windows convention requiring a "CR LF" at the end of each window wrap point (ie, hard returns) is obsolete and non-standard, but some Windows text editors still demand it.


M$ in its wisdom insisted on backward compatibility.

Back in the dim and distant past, long before Windows and Mac, even before DOS, when the ASCII code was created they intentionally defined two codes for 'End-of-Line': CR and LF.

Do you remember how early typewriters and printers worked?

One action to roll up the paper one line (=LF), another action to move the carriage or the print head back to the start of the line (=CR). Both commands were necessary.

Then Linux (did Unix do this too?) implied the CR code when there was just an LF. But this was only internal: when communicating with any printing device, the 'LF' was expanded to the full 'CRLF' sequence!

To create a vertical line of e.g. '*' somewhere on a page with no other text, you had to move the print head to the starting position and then simply repeat this sequence: '*BSLF' (asterisk, backspace, line feed) as many times as needed.

IIRC, the old Apple OS did it differently from Linux: it implied an 'LF' when there was only a 'CR'.

I never had to deal with files from a TRS-II, Commodore PET or Atari, so I don't know about them.

I know IBM mainframes used their own encoding, not ASCII.

HP's systems (the 3000 series, the 250 series and the 1000 real time series) used HP's Roman8 encoding (same control codes as ASCII), and their early Vectra PCs could be switched from ASCII to Roman8 for compatibility with their bigger systems. I analysed text files from an HP 1000; they had CRLF as the 'End-of-Line' sequence.

BTW, if you sent a text with only an LF at the end of each paragraph to one of the old dumb printers, it would print one line of text and then just advance the paper one line for each paragraph. :)

HM.
(typo edited)

Dominions Son

@helmut_meukel

I know IBM mainframes used their own encoding, not ASCII.


IBM mainframes use EBCDIC

https://en.wikipedia.org/wiki/EBCDIC

helmut_meukel

@Dominions Son

IBM mainframes use EBCDIC


I know, but because I never had to cope with this code I forgot the name and was too lazy to look it up.

HM.

Gauthier

@Crumbly Writer

the text displayed looks like MS WORD's header material.


No, Word's docx format is a zip archive; it starts with "PK".
Note that the OASIS OpenDocument format used by LibreOffice also specifies the use of zip.

This one starts with "BZh", and that is the header for Bzip2 compression.
https://en.wikipedia.org/wiki/Bzip2

So that can't be it...
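A quick way to check this kind of thing is to look at the first few bytes of the data. The snippet below is only an illustrative Python sketch covering the two signatures mentioned above; the table and the function name sniff are made up for the example, and the zip entry uses the full four-byte local file header "PK" followed by 0x03 0x04.

```python
# Known leading bytes ("magic numbers") mapped to a human-readable label.
SIGNATURES = {
    b"PK\x03\x04": "zip container (docx, odt, ...)",
    b"BZh": "bzip2 stream",
}

def sniff(data: bytes) -> str:
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"

if __name__ == "__main__":
    # The first readable bytes of the garbled chapter in the opening post.
    print(sniff(b"BZh91AY&SY"))
```

Run against the start of the mangled chapter, this reports a bzip2 stream, which matches the "BZh" observation.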

graybyrd

@helmut_meukel


One action to roll up the paper one line (=LF), another action to move the carriage or the print head back to the start of the line (=CR). Both commands were necessary.

Then Linux (did Unix do this too?) implied the CR code when there was just an LF. But this was only internal: when communicating with any printing device, the 'LF' was expanded to the full 'CRLF' sequence!


Historically correct, and understandable, because the earliest printers available in those first days were teletype printers. Those machines required a separate CR (carriage return) command and an LF (line feed/paper advance) command for each line of print. I used to work with radioteletype, and if one of those CR/LF commands was missed during a weak signal or static condition, the machine either overprinted the line or jammed at the carriage stop until a following CR/LF was received.

A basic toolkit item in any text-munger's repertoire is a 'translation' app to convert text file line endings from Windows-Mac, Mac-Windows, etc etc. Many 'Swiss Army Knife' type text editors included the functions in their menu choices. I still use BBEdit (mac text editor) to 'clean' a text file, removing hard line returns and other Windows cruft, to produce a cleanly-flowing Linux/Mac file for web page output.

Again, to all: a clean text file uses UTF-8 text encoding, and LF (Linux-encoded) paragraph endings. Using the other, older code outputs is just asking for headaches.
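For those without a 'Swiss Army Knife' editor, the conversion is only a few lines of Python. This is a bare-bones sketch, not what BBEdit does: the function name is invented, and the default source encoding of cp1252 is an assumption about where the file came from. It converts line endings and encoding only; it does not rejoin lines that were hard-wrapped inside a paragraph.

```python
def to_clean_text(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode raw bytes, normalise line endings to LF, re-encode as UTF-8."""
    text = raw.decode(source_encoding)
    # Handle CRLF first, then any lone CR (old Mac style),
    # so that a CRLF pair does not turn into two LFs.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")

if __name__ == "__main__":
    windows_bytes = b"First paragraph.\r\nSecond paragraph.\r\n"
    print(to_clean_text(windows_bytes))
```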
