
Forum: Author Hangout

Posting converters

Anotherp08

OK, I'm a newb. I started looking through (not thru) the posts and couldn't find where this had been discussed before, so I'm asking. I don't use any fancy text or colored fonts, just bold and italics for things like letters or telegrams. I use MS Word to write in. So does anyone have any suggestions?

Replies:   REP  datadude
Ernest Bywater

Here are the resources you need to check first.

https://storiesonline.net/author/posting_guidelines.php

Please note:

File formats accepted for submission of works through the site are: Plain Text (.txt, .asc) and HTML files. (All open formats, no proprietary formats are accepted -- No Word, Wordperfect, MS Works, AppleWorks, or Lotus Word) If you need to submit styled text like italics and bold, convert your document to HTML. All the popular word processors support one of these two.

https://storiesonline.net/doc/Text_Formatting_Information_Guide

https://storiesonline.net/sol-secure/user/help.php

https://storiesonline.net/article/Text-formatting-guide-for-WLPC-Sites

REP

@Anotherp08

Ernest's post covers almost everything you need to know from an SOL point of view.

As a fellow Word user, I would like to suggest 2 things. Once your file is complete, saved, and ready to post, use Word's Save As function to convert the story to HTML. There are alternatives, but I have found that to be the simplest. I use the Web Page option for the file type. It works for me, so I never tried the other options. The only thing about that option is it creates 2 files, which is sometimes annoying to me. There is a Single File Web Page option that I have considered using, but I haven't tried it yet.

The only problem I have ever encountered with Word is its ellipsis symbol (...). I strongly recommend that you don't use it; use 3 periods instead. The problem with Word's ellipsis symbol is the SOL text converter does not like Word's symbol. The converter will do strange things to your story's text when it encounters the symbol.

Ross at Play

@REP

The only thing about that option is it creates 2 files, which is sometimes annoying to me.

Could you add an explanation for the newbie what those 2 files are, and what they must do with them?

REP

@REP

The two files are linked; one is a folder and the other has a .html extension. When I post the story/chapter, I upload the file with the .html extension. I suspect that whatever data is in the folder is uploaded also. It works fine for me and meets my needs, so I never asked.

Vincent Berg

@REP

The only problem I have ever encountered with Word is its ellipsis symbol (...). I strongly recommend that you don't use it; use 3 periods instead. The problem with Word's ellipsis symbol is the SOL text converter does not like Word's symbol. The converter will do strange things to your story's text when it encounters the symbol.

That's old news. Lazeez changed how they get processed, and he automatically converts the ellipsis symbol into the proper HTML code …

Also, the "save as single file" is what you want, as it doesn't create a separate folder with all the additions (which you won't use anyway).

Unfortunately, WORD's conversion to HTML is messy, as it includes all of WORD's internal commands. You can turn them off one at a time, but there's no single option to switch them all off.

Lazeez Jiddan (Webmaster)

@Vincent Berg

Unfortunately, WORD's conversion to HTML is messy, as it includes all of WORD's internal commands. You can turn them off one at a time, but there's no single option to switch them all off.

The 'Web Page (filtered)' option is the one that gives cleaner code than the plain old 'Web Page'. Filtered is what should be used when exporting as HTML.

Replies:   Vincent Berg
Vincent Berg

@Lazeez Jiddan (Webmaster)

The 'Web Page (filtered)' option is the one that gives cleaner code than the plain old 'Web Page'. Filtered is what should be used when exporting as HTML.

You're correct. That's the one I meant to reference. It still creates a separate folder if there are any included files, but you can simply ignore those.

REP

@Vincent Berg

Lazeez changed how they get processed, and he automatically converts the ellipsis symbol into the proper HTML code …

I encountered the problem again last week. After reading your post, I informed Lazeez.

From his response, it sounds as if he did something to correct that issue, but evidently my copy of Word has some obscure, hard-to-identify setting that is conflicting with the text converter. Seems some of the other Word users are also encountering a similar problem.

Lazeez seems to believe he has better things to do with his time, and I agree with him. I and any others who experience this conflict can simply use 3 periods. Much simpler than the solution Lazeez said he would have to implement to fix the problem on his end.

Ernest Bywater
Updated:

@REP

any others who experience this conflict can simply use 3 periods.

You may want to check what the code MS Word is sticking in the HTML actually is. In Unicode, the ellipsis (named HORIZONTAL ELLIPSIS) uses the code point U+2026 and is displayed as three closely spaced dots. To display it in HTML you use the named entity & h e l l i p ; I've put a space between every character so the code characters will display - everything in italics is one word in html code. If MS Word doesn't convert it to that code, it isn't providing the HTML code for an ellipsis. - …

Edit to add: I ended the original post with the code, and you can see how the forum converted it to display properly after the dash.
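A quick way to check what a named entity like &hellip; stands for, without relying on how a browser or this forum renders it, is Python's standard `html` module. This is only a sketch for looking things up on your own machine; nothing here is part of Word or the SOL converter, and `to_entity` is a helper name made up for the example.

```python
import html
from html.entities import codepoint2name, name2codepoint

# The named entity "hellip" is code point U+2026, HORIZONTAL ELLIPSIS.
assert name2codepoint["hellip"] == 0x2026

# The named, decimal and hexadecimal references all decode to the same character.
for ref in ("&hellip;", "&#8230;", "&#x2026;"):
    assert html.unescape(ref) == "\u2026"


def to_entity(ch: str) -> str:
    """Return the HTML named entity for ch if one exists, else a numeric reference."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else f"&#{ord(ch)};"


print(to_entity("\u2026"))  # &hellip;
print(to_entity("\u2014"))  # &mdash;
```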

Lazeez Jiddan (Webmaster)

@Ernest Bywater

You may want to check what the code MS Word is sticking in the HTML actually is.

If MS Word used the correct code for an ellipsis, it wouldn't be a problem. The site's converter handles those perfectly.

What MS Word does is use actual characters, like a bullet or even letters like m and n, and set them in a symbols font. So when the file is read in a browser on a Windows machine, they look correct, as an actual ellipsis, but move the file to a machine that doesn't have that symbols font and the file doesn't look correct. Upload it in the wizard and the converter shows the character that Word's fucked-up system used.

To fix this issue on my end, I would need somebody with MS Word to make sure the setting to use the symbols font is on, create a document using all the problematic symbols, export it as HTML, and email it to me. I'll parse it into a replacements table and add it to the wizard's converter.

Switch Blayde

@Lazeez Jiddan (Webmaster)

I would need somebody with MS Word,

I don't know how.

I created a new Word doc that consisted simply of:
Emdash —
Ellipsis …

I saved it as HTML (filtered) and opened it in TextEdit to see what the HTML code looks like. I didn't see any HTML code, so I used Finder to change the file extension to TXT. When I opened that file in TextEdit, I still didn't see the HTML.

If you tell me how to do it, I will.

Replies:   graybyrd  Ernest Bywater
graybyrd

@Switch Blayde

I saved it as HTML (filtered) and opened it in TextEdit to see what the HTML code looks like. I didn't see any HTML code, so I used Finder to change the file extension to TXT. When I opened that file in TextEdit, I still didn't see the HTML.

Open the html file in your web browser, and then look for the menu item to show the source code. Examine that listing. (I can't give you the exact menu listing; it may be a sub-menu under something like 'developer tools.' All these damn browsers have gotten so bloated and complex, the formerly 'simple' stuff has become 'obscure.' Anyway, try that.)

Replies:   Ernest Bywater
Ernest Bywater

@graybyrd

to show the source code.

For most browsers, you open the web page, place the cursor anywhere on the page, and click the right button to get a menu, where you select View Source with the left button. This should open a new window with the source code you can read.

Otherwise, SB, send me the file you created and I'll see what I can do.

Ernest Bywater

@Switch Blayde

I didn't see any HTML code

If you open the HTML or TXT file in Notepad you should see the html code as plain text.

Replies:   Switch Blayde
Switch Blayde
Updated:

@Ernest Bywater

If you open the HTML or TXT file in Notepad you should see the html code as plain text.

I had Notepad on my Windows PC, but it's not on my Mac. When I went to a Mac class at the Apple Store, they told me TextEdit is their version. But as graybyrd says, it's not the same.

I found out how to get the Develop menu into the Safari menu bar at the top (like File and Edit), and it has Show Page Source. But the right click is a great shortcut to do the same thing.

ETA: To use TextEdit as Notepad, you do this:

If you want to make it without any text formatting, go to the menu "Format > Make Plain Text" (or use the shortcut Command+Shift+T).

But, I guess, that's if I wanted to code HTML from scratch. Doing what they said didn't show me the HTML.

Replies:   Ernest Bywater
Ernest Bywater

@Switch Blayde

Doing what they said didn't show me the HTML.

If and when you get a chance to look at the actual HTML code, the info on this page will help you:

https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Character_entity_references_in_HTML

In the left column is the entity name that gets sandwiched between the & and ; symbols so they show as a single word. The next column is what it should render as, then the other codes, etc. It's a great HTML code reference page for this stuff.

graybyrd
Updated:

@Lazeez Jiddan (Webmaster)

What MS Word does is use actual characters, like a bullet or even letters like m and n, and set them in a symbols font.

Lovely! So MS Word cannot be trusted to conform to all normal standards, and does not play well with others.

Seems there's been a rather long history of that.

Sadly, there's a multitude of methods to generate clean, standards-compliant HTML documents, but MS Word is not among them.

Replies:   REP  Vincent Berg
REP

@graybyrd

but MS Word is not among them.

Why change their attitude? It seems obvious that they do what they want and others can follow their lead, or go do their own thing. So far, it has worked for them, so I sincerely doubt they will change. At least as long as people continue to buy their products.

Replies:   graybyrd
graybyrd

@REP

Why change their attitude? It seems obvious that they do what they want and others can follow their lead, or go do their own thing. So far, it has worked for them, so I sincerely doubt they will change. At least as long as people continue to buy their products.

It's a question of respect for standards, to make life more sensible and productive for the community. The problems expressed in this thread are a good example.

Standards: everybody drives on the same side of the road; English speaking people read and write the same alphabet; railroads run on a standard gauge track; and our web browsers open and display pages using a standardized coding language.

Or we could just say 'screw it' and let chaos rule. Or let the biggest bully set the rules, and everybody can buy their product or go piss up a rope.

Aristocratic Supremacy

@graybyrd

Generally unrelated, but is 'go piss up a rope' a regional thing? I've never heard such an expression in Canada.

I assume it means 'go fuck themselves', but can you use it as a direct insult too? As in: "If you don't like it, you'll go piss up a rope." or something like that.

Replies:   graybyrd  Vincent Berg
graybyrd
Updated:

@Aristocratic Supremacy

I assume it means 'go fuck themselves', but can you use it as a direct insult too? As in: "If you don't like it, you'll go piss up a rope." or something like that.

I'll be 77 in July; I've heard this one all my life, in & around the western states. It's not an insult, per se; it's actually a vulgar variant of 'get lost' or 'go waste your time somewhere else.'

Here's the best explanation I've found:

Try pissing up a rope. It'll become very clear. Ask your grandmother--no, seriously, older people are very familiar with sayings like that :^) It's a pointless exercise, like pissing up a rope. Umm, it's like running uphill, or working against everything even though the world is against you; just a funny way to put it. Go piss up a rope: it's like trying to nail porridge to the wall... can't be done. The phrase describes an event that will never happen. It simply means to go away because you have annoyed or angered the person who said it. They say piss up a rope so you get the piss in your eyes... It comes from the phrase, "Go pound sand down a rat-hole."

Edit to add:

Generally unrelated, but is 'go piss up a rope' a regional thing? I've never heard such an expression in Canada.

I'm not surprised you haven't heard it. I live just south of the border; you're north. All my experience with folks up there is, they're generally too polite to say such a thing... at least to my face. ;-)

richardshagrin

@graybyrd

All my experience with folks up there is, they're generally too polite to say such a thing... at least to my face. ;-)

You haven't met many French-Canadians. Although if they say it in French you may not be able to translate it.

Replies:   graybyrd
graybyrd

@richardshagrin

You haven't met many French-Canadians. Although if they say it in French you may not be able to translate it.

Ahhh, but my friend, you said the 'secret' word: French!

(I'm on the Left coast. Very few of them here.)

Dominions Son

@graybyrd

Here's the best explanation I've found:

Another related idiom is pissing into the wind.

Replies:   REP  Not_a_ID
REP

@Dominions Son

wind.

Back in the '70s, a now-defunct radio station played a song by Jerry Jeff Walker titled "Pissin' in the Wind." I always got a chuckle out of that song, especially the lyric "And we're pissin' in the wind, but it's blowing on all our friends."

http://www.lyricsmania.com/pissing_in_the_wind_lyrics_papa_m.html

Not_a_ID

@Dominions Son

Another related idiom is pissing into the wind.

...or an electric fence. :)

Vincent Berg

@Aristocratic Supremacy

Generally unrelated, but is 'go piss up a rope' a regional thing? I've never heard such an expression in Canada.

That's 'cause Canadians are too polite to piss up (or down) anyone's rope. :D

Dominions Son

@graybyrd

It's a question of respect for standards, to make life more sensible and productive for the community. The problems expressed in this thread are a good example.

Microsoft has no respect for standards. They have pushed a number of standards through ISO only to immediately implement something with proprietary extensions to the standard so that third party tools implementing the standard Microsoft pushed will be incompatible with Microsoft's tools.

REP

@graybyrd

You misunderstood the question. It was intended as a rhetorical question to say:

Why should Microsoft want to voluntarily change their attitude?

Ezzy

@graybyrd

It's a question of respect for standards, to make life more sensible and productive for the community. The problems expressed in this thread are a good example.

Standards: everybody drives on the same side of the road; English speaking people read and write the same alphabet; railroads run on a standard gauge track; and our web browsers open and display pages using a standardized coding language.

Or we could just say 'screw it' and let chaos rule. Or let the biggest bully set the rules, and everybody can buy their product or go piss up a rope.

On the other hand, Word is on more PCs throughout the world than any other tool that has been mentioned here, by orders of magnitude.

Last I heard there were only two products in the world with over a billion users. Windows and Office (though possibly Android may be the third).

So what's the "standard"?

John Demille

@Ezzy

Last I heard there were only two products in the world with over a billion users. Windows and Office (though possibly Android may be the third).

A little out of date.

Android is over 2 billion now, and iOS has roughly a billion too.

If you consider Facebook a 'product', then it has 1.95 billion monthly users with over 1 billion daily users. Gmail is over a billion, and Google search is over a billion.

With mobile, the scale has changed.

Ernest Bywater
Updated:

@Ezzy

On the other hand, Word is on more PCs throughout the world than any other tool that has been mentioned here, by orders of magnitude.

Last I heard there were only two products in the world with over a billion users. Windows and Office (though possibly Android may be the third).

So what's the "standard"?

The official standard is the Open Document Format.

Now, as to the number of copies of MS Word and Windows sold, you have to remember a few things about them:

1. Most of the sales figures stated are for multiple sales of different versions to the same customer group.

2. There are a large number of versions of Word and Windows and most versions are not fully compatible with other versions, thus the figures should be split down into the various versions.

3. The great majority of the sales of the earlier versions are no longer in use, due to having been erased from the computers they were on.

I'd love to see what the numbers on actual current actively used versions of MS software are, but I doubt you'll ever see them.

Edit to add: If you include the operating systems used on embedded devices and mobiles, then some version of Unix or Linux has hit the tens of billions mark, and the number of Nix variants goes ballistic if you add in the mobile devices using the Nix derivative called Android.

Replies:   Switch Blayde
Switch Blayde

@Ernest Bywater

The official standard is the Open Document Format.

Depends how you define "standard."

If it's a group of people who say they are the open standards group, then I guess you're right because they say so.

But if it's the standard format out there, meaning when someone wants you to send it to them other than pasting it in the body of an email, it's either rtf or doc/docx. For me, that's the real-world standard. Not what someone says, but what's actually in use.

Replies:   Ernest Bywater
Ernest Bywater

@Switch Blayde

Depends how you define "standard."

It's the standard set by the official group established by the governments and the industry leaders. As to how important the Open Document Format is: Microsoft didn't provide the capability to open ODF files in their office suite until the US Dept of Administrative Services told them they could either incorporate the ability to open and save ODF files into MS Office or never sell another copy to a US federal or state department or agency. Most of the European and South American governments and businesses were using ODF, and the US governments had to be able to work with ODF files to continue to do business.

Now, there are a number of businesses that insist the files you send them be made with MS products, and most of those are US based, but there are also non-US businesses that insist on ODF files. And that's not getting into the incompatibilities between the various versions of MS products.

Vincent Berg

@graybyrd

Sadly, there's a multitude of methods to generate clean, standards-compliant HTML documents, but MS Word is not among them.

Mine get thru fine, but then I've agonized with WORD over the years to ensure it's behaving correctly. Note: I'm using Windows WORD 2013. The em-dash comes across as an actual em-dash, but as I noted, the ellipsis is converted to three dots.

Lazeez, if you want I can send you a test case, but it sounds like my files aren't what you're looking for.

@graybyrd
If your software isn't generating the proper HTML codes, then that's precisely what Lazeez is looking for, to determine what code WORD is using instead.

Switch Blayde

@Lazeez Jiddan (Webmaster)

To fix this issue on my end, I would need somebody with MS Word,

I found an app (CotEditor) in the Apple Store for HTML editing and installed it. I opened the .html file that was output from Word that I saved as "Webpage Filtered".

I expected to see what HTML code Word converted the em-dash and ellipsis to, BUT IT DIDN'T.

I didn't copy the entire HTML code here, only the part that represents the document I created. So Word put the actual special characters in the HTML code rather than something like &mdash;.

<div class=WordSection1>

<p class=MsoNormal>Emdash —</p>

<p class=MsoNormal>Ellipsis …</p>

<p class=MsoNormal> </p>

</div>
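When an editor shows the characters rather than the code, one way to see exactly what Word wrote is to list every non-ASCII character in the file along with its Unicode code point and name. A minimal Python sketch, assuming the file was saved as UTF-8 (pass `encoding="cp1252"` if it was saved as windows-1252); `special_chars` and `scan_file` are hypothetical helper names:

```python
import unicodedata
from pathlib import Path


def special_chars(text: str) -> dict[str, tuple[str, str]]:
    """Map each distinct non-ASCII character to (code point, Unicode name), in first-seen order."""
    found: dict[str, tuple[str, str]] = {}
    for ch in text:
        if ord(ch) > 127 and ch not in found:
            found[ch] = (f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
    return found


def scan_file(path: str, encoding: str = "utf-8") -> None:
    """Print the special characters found in an exported HTML file."""
    text = Path(path).read_text(encoding=encoding)
    for cp, name in special_chars(text).values():
        print(cp, name)


# The two paragraphs from the Word export above, as a string:
snippet = "<p class=MsoNormal>Emdash \u2014</p>\n<p class=MsoNormal>Ellipsis \u2026</p>"
for cp, name in special_chars(snippet).values():
    print(cp, name)  # U+2014 EM DASH, then U+2026 HORIZONTAL ELLIPSIS
```

Literal characters like these are legitimate HTML, as long as the charset declared in the file's header matches the encoding the file was actually saved in.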

Lazeez Jiddan (Webmaster)

@Switch Blayde

I opened the .html file that was output from Word that I saved as "Webpage Filtered".

The problem doesn't show in Webpage (filtered) option. It's from straight Webpage from MS Word.

I need the Word-generated file to be emailed to me to view the raw code and see the binary file. Pasting anything here destroys the actual code that needs to be converted by my converter.

Replies:   Switch Blayde
Switch Blayde
Updated:

@Lazeez Jiddan (Webmaster)

I need the Word-generated file to be emailed to me to view the raw code

I'll email it to the SOL webmaster with the em-dash and ellipsis generated in Word and saved as HTML (not filtered).

Oops. The webmaster link gives me a form I'd have to paste in. Where do you want me to email it to?

ETA: If you don't want your email public, send it to switch_blayde@hotmail.com and I'll reply with the attachment.

REP

@Ernest Bywater

is

Thanks EB. However, it really doesn't matter. Word converts the symbol to an HTML character and I don't have control over that conversion. It is what it is, and whatever character it generates, that character conflicts with the SOL text converter. Entering the 3 periods is as fast as going through the Symbols table.

There is the option of using CNTRL+Num to insert the character, which would be simpler. But I would have to expend the effort to figure out how to tell Word to output a different character code and there is also the fact that my keyboard doesn't have a Num key.

Replies:   Ernest Bywater
Ernest Bywater

@REP

Word converts the symbol to an HTML character and I don't have control over that conversion. It is what it is, and whatever character it generates, that character conflicts with the SOL text converter. Entering the 3 periods is as fast as going through the Symbols table.

The issue is there is a Standard HTML character, but Word converts it to a Special MS HTML Character which isn't recognised by everyone else. Thus it gets rendered wrong.

Replies:   REP
REP

@Ernest Bywater

The issue is there is a Standard HTML character

That is your issue, Ernest. My issue is how to get an ellipsis through the text converter in the simplest way. I solved it with 3 periods.
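For anyone who prefers REP's approach but would rather not retype, the same result can be had by normalizing the exported file before upload. A sketch in Python, assuming the ellipses arrive either as the real character or as one of the standard HTML references; it will not catch Word's symbols-font version, which isn't a real ellipsis at all. `ellipses_to_periods` is a made-up name:

```python
import re

# The literal character U+2026 plus the named, decimal and hex HTML references to it.
ELLIPSIS = re.compile("\u2026|&hellip;|&#8230;|&#[xX]2026;")


def ellipses_to_periods(text: str) -> str:
    """Replace every ellipsis character or ellipsis entity with three periods."""
    return ELLIPSIS.sub("...", text)


print(ellipses_to_periods("Wait\u2026 what&hellip; no&#8230;"))  # Wait... what... no...
```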

Vincent Berg
Updated:

@REP

I encountered the problem again last week. After reading your post, I informed Lazeez.

From his response, it sounds as if he did something to correct that issue, but evidently my copy of Word has some obscure, hard-to-identify setting that is conflicting with the text converter. Seems some of the other Word users are also encountering a similar problem.

Worried this might be a new problem, I checked my latest posts to ensure they weren't screwed up. While I submitted the files with ellipses, the SOL converter changed them to three-dots, so Lazeez automated it in reverse.

Test: This is a … test.

Strange. Entering "&amp;hellip;" in the Forum produces an actual ellipsis, while the same code submitted to SOL converts it to three periods.

awnlee jawking

@REP

The only problem I have ever encountered with Word is its ellipsis symbol (...). I strongly recommend that you don't use it; use 3 periods instead.

Ditto OpenOffice. I'm using OO as a plain text editor, so I've just tracked down and removed the ellipsis from OO's 'replacements' table.

AJ

Switch Blayde
Updated:

@REP

The only problem I have ever encountered with Word is its ellipsis symbol (...). I strongly recommend that you don't use it; use 3 periods instead. The problem with Word's ellipsis symbol is the SOL text converter does not like Word's symbol. The converter will do strange things to your story's text when it encounters the symbol.

I used Word (on my Mac) to create a file with an ellipsis and em-dash. I saved them as .html (not filtered) and emailed the file to Lazeez. He said Word created the ellipsis and em-dash properly so the SOL text converter would not have a problem.

If you're having a problem it may be your version of Word.

Replies:   Vincent Berg  REP
Vincent Berg

@Switch Blayde

If you're having a problem it may be your version of Word.

It may be one particular version of WORD, or perhaps a single setting not being set. We'll need to send more copies to Lazeez to trace the problem to a specific copy of WORD before we'll know for sure. But as I've said, I've never had a problem with using those characters on SOL before, but then I make a point of cleaning up all the crap that WORD shoves into their html files.

Replies:   Switch Blayde
Switch Blayde
Updated:

@Vincent Berg

but then I make a point of cleaning up all the crap that WORD shoves into their html files.

Word didn't convert the em-dash or ellipsis to their HTML codes, so what would there be to clean up before submitting to the SOL Wizard?

Replies:   Vincent Berg
Vincent Berg

@Switch Blayde

Word didn't convert the em-dash or ellipsis to their HTML codes, so what would there be to clean up before submitting to the SOL Wizard?

I only meant that I'd spent a LONG time, figuring out how to turn off the various WORD options so it'll produce 'clean' html code, so that may have had an impact. However, it's more likely it's a version based matter. As I said, I'm running the PC based MS WORD 2013 producing the older .doc files, though I had the same results on SOL back when I was using WORD 2010.

REP

@Switch Blayde

it may be your version of Word.

I know. Lazeez has also said that he suspects it is a setting in the Word program.

Keep in mind I am running Microsoft Office Professional 2010 in a Windows environment. That may be part of the reason I am having a problem and you aren't. I doubt I am the only person running that version in a Windows environment, so I shared my experience with others.

graybyrd

It's been mentioned in here before, but TextEdit is not primarily a 'text editor' which is quite a different animal from binary-format editors (such as .rtf, .doc, .docx, & other proprietary formats). TextEdit is superb at what it does, but plain-jane unformatted 'text' is not its prime function.

A superb free addition to your toolbox is BBEdit. (The well-regarded 'light' version of BBEdit, known as TextWrangler, has been retired.) Here's the info:

BBEdit offers a 30-day evaluation period, during which its full feature set is available. At the end of the evaluation period, you can continue to use BBEdit for free, forever, with no nag screens or unsolicited interruptions.

Get it here: http://www.barebones.com/products/textwrangler/

Vincent Berg

I've put a space between every character so the code characters will display - everything in italics is one word in html code.

I got tired of continually putting spaces in my html codes, so I experimented and figured out a solution. You type "&amp;hellip;", and it displays correctly. It's more work, but it looks better on the forum posts.
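Vincent's trick works because &amp; is the entity for the ampersand itself, so one level of decoding turns &amp;hellip; into the visible text &hellip; rather than into the symbol. Switch Blayde's &lt; for < is the same idea. Python's `html.escape` does this kind of escaping; whether a particular forum decodes once or more is up to that forum's software, so treat this as an illustration only (`show_literally` is a hypothetical helper):

```python
import html


def show_literally(code: str) -> str:
    """Escape markup so a browser displays it as text instead of interpreting it."""
    return html.escape(code, quote=False)


print(show_literally("&hellip;"))  # &amp;hellip;
print(show_literally("<p>"))  # &lt;p&gt;
# One level of decoding gives back the original text, not the symbol.
print(html.unescape(show_literally("&hellip;")))  # &hellip;
```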

Replies:   Switch Blayde
Switch Blayde
Updated:

@Vincent Berg

I got tired of continually putting spaces in my html codes

On wattpad, when I want to show the HTML, I use &lt (with the ;) instead of the <

But here it doesn't work.

Lazeez Jiddan (Webmaster)

I received a file with the problematic code. This is how Word sometimes creates an ellipsis:

<span style='mso-char-type:symbol;mso-symbol-font-family:Symbol'>¼</span>
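That would explain the stray ¼ characters some readers have spotted: in the Symbol font, code 0xBC is drawn as an ellipsis, but in an ordinary Western font the same code is ¼. A replacements table of the kind Lazeez describes could look roughly like this Python sketch. The table, function name and regular expression are illustrative assumptions, not the site's actual converter, and only the ellipsis entry is backed by the file quoted above; any other symbol would need its own test file from Word.

```python
import re

# Hypothetical replacements table: character inside a Symbol-font span -> proper HTML entity.
# U+00BC shows as ¼ in normal fonts; in the Symbol font, position 0xBC holds the ellipsis.
SYMBOL_FONT_TABLE = {
    "\u00bc": "&hellip;",
}

# Matches Word's <span style='...mso-symbol-font-family:Symbol...'>X</span> wrapper.
SYMBOL_SPAN = re.compile(
    r"<span\s+style='[^']*mso-symbol-font-family:Symbol[^']*'>(.*?)</span>",
    re.DOTALL,
)


def fix_symbol_spans(html_text: str) -> str:
    """Swap known Symbol-font characters for real entities; leave unknown spans untouched."""
    def replace(match):
        return SYMBOL_FONT_TABLE.get(match.group(1), match.group(0))
    return SYMBOL_SPAN.sub(replace, html_text)


word_html = ("Wait<span style='mso-char-type:symbol;"
             "mso-symbol-font-family:Symbol'>\u00bc</span> what")
print(fix_symbol_spans(word_html))  # Wait&hellip; what
```

Leaving unrecognized spans alone, rather than deleting them, means a missing table entry shows up as a visible oddity instead of silently losing text.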

Ernest Bywater

@Lazeez Jiddan (Webmaster)

I received a file with the problematic code. This is how Word sometimes creates an ellipsis:

<span style='mso-char-type:symbol;mso-symbol-font-family:Symbol'>¼</span>

No wonder it doesn't cross over to good HTML code.

awnlee jawking

@Lazeez Jiddan (Webmaster)

I remember a recent story littered with occurrences of spurious-looking ¼s. Did the author ever figure out the problem? If not, can anyone remember their name so we can pass on the 'good' news?

AJ

Replies:   REP
REP

@awnlee jawking

Did the author ever figure out the problem

It was an ongoing problem for me for over a year, so it may have been one of my stories. I saw it in another writer's story and let him know what Lazeez had told me. Do you recall the title or what the story was about?

Replies:   awnlee jawking
awnlee jawking

@REP

Sorry, no. All I can remember is that the story was posted in the last week or so.

AJ

Replies:   REP
REP

@awnlee jawking

No problem AJ. A recollection popped into my mind about receiving an email from you regarding my story Time Scope. Thanks for pointing out my mistake regarding numbering the farms. The 1/4 problem appeared twice in that chapter, so there is a high probability you were thinking of me. I posted an update to correct the glitch and error.

graybyrd
Updated:

I did a bit of research in Liz Castro's excellent book, "HTML, XHTML & CSS - 6th Ed." and learned these points:

The recommended encoding for web pages is UTF-8; Lazeez requests that UTF-8 be specified in the header for SOL pages.

The encoding declared in the HTML header MUST match the encoding with which your HTML page is saved.

EDIT TO ADD NEW INFORMATION:

This Microsoft support page explains (but also adds to the confusion!):

https://support.office.com/en-us/article/Choose-text-encoding-when-you-open-and-save-files-60D59C21-88B5-4006-831C-D536D42FD861

This page applies to Word versions 2007 to present.

In brief, MS says that Word is now "based on Unicode" so when you open and save Word files, Unicode will apply. Fine. So when Word converts to a (filtered) HTML file, why is that "h. ellipsis" character mangled as shown earlier?

END EDIT

If you do not EXPLICITLY choose an encoding when saving your HTML page, your text editor probably uses the DEFAULT encoding for your system. The Windows default for the U.S. and Western Europe is "windows-1252".

(See the problem? You say the HTML is UTF-8, but it's saved as windows-1252.)
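Here is that mismatch made concrete in a short Python sketch (the function names are just for illustration). The same ellipsis is three bytes in UTF-8 but a single byte, 0x85, in windows-1252, so a page saved one way and read the other way shows either garbage or an error:

```python
def misread_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then decode as windows-1252, as a mislabelled page would be."""
    return text.encode("utf-8").decode("cp1252")


def cp1252_bytes_as_utf8(text: str) -> str:
    """Encode as windows-1252, then try to read those bytes as UTF-8."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeDecodeError as err:
        return f"decode error: {err.reason}"


print("Ellipsis \u2026".encode("utf-8"))  # b'Ellipsis \xe2\x80\xa6'
print("Ellipsis \u2026".encode("cp1252"))  # b'Ellipsis \x85'
print(misread_as_cp1252("Ellipsis \u2026"))  # Ellipsis â€¦
print(cp1252_bytes_as_utf8("Ellipsis \u2026"))  # decode error: invalid start byte
```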

To save your page with the proper encoding (Windows):

In Word's "Save As" menu, look for ENCODED TEXT. Choose the desired encoding (UTF-8) and save the file.

Here's another 'gotcha': opening an encoded web page to edit. You have to make Word 'ask' you for the proper encoding. Choose Tools > Options, click the General tab, and check "Confirm Conversion at Open."

Otherwise, Word will use Windows default encoding: windows-1252.

(The 6th Ed. of HTML, XHTML & CSS is 2007; God Only Knows what has happened with Word since that time. Proceed with Caution: there be Dragons out there.)

As a side note, using Word for ePub:

Generate HTML "filtered"; much cleaner.

You MUST USE A TEXT EDITOR, not Word, to save true text files for XHTML file edits. Notepad++ for Windows is a good choice.

Word saves an HTML header; replace it with a proper XHTML header.

Word fails to use the UTF-8 character set (it uses windows-1252). Change the charset value in the header to utf-8.

When saving the XHTML file with Notepad++, be sure to specify UTF-8 as the encoding to write.
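The charset fix and the UTF-8 save can also be scripted. A minimal Python sketch, assuming the exported file really is windows-1252 as its header says; the helper names are hypothetical, the sample header is only an example of the kind of line graybyrd describes, and this does not do the XHTML header swap:

```python
import re
from pathlib import Path

# Matches charset=windows-1252, with or without surrounding quotes.
CHARSET = re.compile(r"charset=([\"']?)windows-1252\1", re.IGNORECASE)


def fix_charset(html_text: str) -> str:
    """Change the declared charset from windows-1252 to utf-8."""
    return CHARSET.sub(r"charset=\1utf-8\1", html_text)


def convert_to_utf8(src: str, dst: str) -> None:
    """Read a windows-1252 file, fix its declared charset, and save it as UTF-8."""
    text = Path(src).read_text(encoding="cp1252")
    Path(dst).write_text(fix_charset(text), encoding="utf-8")


header = '<meta http-equiv=Content-Type content="text/html; charset=windows-1252">'
print(fix_charset(header))
```

Changing only the header, or only the file's encoding, recreates the mismatch described above; the two have to change together.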

(A useful reference is Liz's other book, "EPUB Straight to the Point" pub. 2011)

I use Mac and Linux, so have my own basket of gremlins. Hope this helps a Windows Word user.

graybyrd

@graybyrd

In brief, MS says that Word is now "based on Unicode" so when you open and save Word files, Unicode will apply. Fine. So when Word converts to a (filtered) HTML file, why is that "h. ellipsis" character mangled as shown earlier?

Okaaay, Mouseketeers... Yes, Windows & Word is indeed based on Unicode.

Guess what MS calls 'unicode' ... not the unicode that thee & me call unicode. MS elected to base their system on UTF-16 which is not the same as UTF-8. The character tables 'twixt the two have differences. So there's more jiggery-pokery going on under the font table... Enjoy!
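
For what it's worth, UTF-16 and UTF-8 encode the same Unicode code points; what differs is the byte layout. A Python sketch (illustration only) using the horizontal ellipsis from earlier in the thread:

```python
# One character, one Unicode code point, two different byte encodings.
ellipsis = "\u2026"                          # HORIZONTAL ELLIPSIS

utf8_bytes = ellipsis.encode("utf-8")
utf16_bytes = ellipsis.encode("utf-16-le")   # little-endian, no byte-order mark

print(f"U+{ord(ellipsis):04X}")              # U+2026
print(utf8_bytes.hex(" "))                   # e2 80 a6
print(utf16_bytes.hex(" "))                  # 26 20

# Both decode back to exactly the same character.
print(utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le"))  # True
```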

Vincent Berg

@graybyrd

MS elected to base their system on UTF-16 which is not the same as UTF-8.

Rather than being proprietary, that's actually forward-looking, as most of the rest of the world (e.g. China, Japan and others whose languages use more complex single-character words) requires UTF-16, whereas the simpler Western countries don't.

Not that focusing on one doesn't confuse things.

Switch Blayde

@Vincent Berg

UTF-16

I really don't understand the UTF.

Why isn't UTF-8 a subset of UTF-16?

Vincent Berg

@Switch Blayde

I really don't understand the UTF.

Why isn't UTF-8 a subset of UTF-16?

UTF, or "Unicode Transformation Format", is what 'unicode' is. The only difference between the two is that UTF-8 takes up only a single byte for each character, whereas more complex languages require two bytes for each character, so they take up at least twice as much space. However, HTML codes for Unicode characters often take up much more, as they use codes such as "&hellip;" instead of a single one- or two-byte character.

What we're arguing here is that we want our software to use the more complex html coding, rather than the device specific codes (i.e. PC, Mac or Linux based codes).

Switch Blayde

@Vincent Berg

UTF-8 takes up only a single byte for each character,

Actually, I just watched some videos and UTF-8 can take up to 4 bytes for a character.

So UTF is the next evolution of ASCII (which is what I was familiar with on a mainframe years ago). ASCII couldn't handle more than Latin letters and numbers, so they came up with something that could.
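
Both points are easy to check. A Python sketch (illustration only; the sample characters are arbitrary) that counts bytes per character in UTF-8 and UTF-16. It also shows UTF-8 keeping plain ASCII byte-for-byte, which fits the "next evolution of ASCII" description, and why neither encoding is a subset of the other: they turn the same characters into different byte patterns.

```python
# (character, description) pairs chosen for illustration.
SAMPLES = [
    ("A", "Latin capital A"),
    ("\u00e9", "e with acute accent"),
    ("\u2026", "horizontal ellipsis"),
    ("\u4e2d", "CJK ideograph"),
    ("\U0001F600", "grinning-face emoji"),
]

for ch, label in SAMPLES:
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-le"))   # -le: no byte-order mark
    print(f"U+{ord(ch):04X}  {label:20}  UTF-8: {utf8_len}  UTF-16: {utf16_len}")

# ASCII text is byte-for-byte identical in UTF-8.
print("A".encode("utf-8") == "A".encode("ascii"))                      # True
# An HTML entity is plain ASCII text, so it costs more bytes than the character.
print(len("&hellip;".encode("utf-8")), len("\u2026".encode("utf-8")))  # 8 3
```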

This is what I don't understand.

I type two hyphens in Word and it changes it to an em-dash. Good, that's what I want. Lazeez says if I copy and paste that character into the SOL Wizard, it will show up as an em-dash in my story. Also good. That's what I want.

But there are cases where that's not true and the em-dash shows up as a "1/4".

If I were manually converting my Word docx to HTML, I'd find all the italics and surround them with the "<i>" tags and do a find/replace on the em-dash to change it to &mdash;.
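
That find/replace can be sketched in Python (illustration only; the mapping and function name are hypothetical and cover just a handful of common Word characters). Tags already typed into the text, like `<i>`, pass through untouched, and any other non-ASCII character becomes a numeric reference such as `&#233;`.

```python
# Hypothetical mapping of Word's typographic characters to named entities.
ENTITIES = {
    "\u2014": "&mdash;",                        # em-dash
    "\u2013": "&ndash;",                        # en-dash
    "\u2026": "&hellip;",                       # ellipsis
    "\u201c": "&ldquo;", "\u201d": "&rdquo;",   # curly double quotes
    "\u2018": "&lsquo;", "\u2019": "&rsquo;",   # curly single quotes
}

def to_entities(text: str) -> str:
    """Swap typographic characters for HTML named entities.
    Existing tags such as <i>...</i> are left untouched."""
    for ch, entity in ENTITIES.items():
        text = text.replace(ch, entity)
    # Any other non-ASCII character becomes a numeric reference like &#233;.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_entities("<i>Wait</i>\u2014what\u2026? Caf\u00e9."))
# <i>Wait</i>&mdash;what&hellip;? Caf&#233;.
```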

But if all I was doing was copying/pasting from Word to some place, such as the Wizard, how can I ensure it would display properly?

Vincent Berg

@Switch Blayde

But there are cases where that's not true and the em-dash shows up as a "1/4".

If I were manually converting my Word docx to HTML, I'd find all the italics and surround them with the "<i>" tags and do a find/replace on the em-dash to change it to &mdash;.

But if all I was doing was copying/pasting from Word to some place, such as the Wizard, how can I ensure it would display properly?

The screw-up seems tied to your version of WORD. If your text has received the all-clear from Lazeez, I wouldn't worry about it. If it was a problem, you could easily determine it by examining the specific references in your posted chapters. If the text doesn't contain any errors, then you should be safe.

Note: While my text is in the clear, and I'm currently using the PC version of WORD 2013, I'm not using the .docx format, so WORD may have changed the behavior in the newer format while leaving the older format as it was. Until we get more information, it's hard to tell which versions of WORD are problematic.

I'd suggest searching your posted text for any em-dashes. If it doesn't find any, then search for the 1/4 symbol. That will clearly indicate if you have anything to worry about.
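
A rough Python sketch of that search, assuming you've copied the posted chapter's text from the browser into a string (the sample text and names are made up). It reports the line numbers where each suspect character appears.

```python
SUSPECTS = {
    "\u2014": "em-dash",
    "\u2026": "ellipsis",
    "\u00bc": "one-quarter sign",   # the symptom reported in this thread
}

def audit(text: str) -> dict[str, list[int]]:
    """Map each suspect character's name to the line numbers where it occurs."""
    found: dict[str, list[int]] = {name: [] for name in SUSPECTS.values()}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch, name in SUSPECTS.items():
            if ch in line:
                found[name].append(lineno)
    return found

chapter = "He stopped\u2014listened.\nNothing\u00bc\nThen a voice\u2026"
for name, lines in audit(chapter).items():
    print(f"{name}: {lines if lines else 'none'}")
```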

If anyone here wants to ensure their text is clean before they post, I'd create a test sample using WORD's "Save as formatted html". You can then examine the HTML code in a text editor and see whether WORD puts anything in its "Symbol" font. If so, your version of WORD is questionable. I'd send the copy you saved to Lazeez along with which version of WORD it is, so he can figure out how to avoid this problem in the future.
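
The "look for the Symbol font" check can be automated. A Python sketch, assuming Word marks such runs with an inline style like `style='font-family:Symbol'`; the exact markup varies by Word version, so treat the pattern as a starting point rather than a reliable test.

```python
import re

# Matches spans whose style switches to the Symbol font (assumed markup).
SYMBOL_SPAN = re.compile(
    r"<span[^>]*font-family:\s*[\"']?Symbol[^>]*>(.*?)</span>",
    re.IGNORECASE | re.DOTALL,
)

def symbol_font_runs(html_text: str) -> list[str]:
    """Return the text of every span formatted in the Symbol font."""
    return SYMBOL_SPAN.findall(html_text)

sample = ("<p>She waited<span style='font-family:Symbol'>\u00bc</span> "
          "and waited.</p>")
print(symbol_font_runs(sample))   # ['¼']
```

An empty list doesn't prove the file is clean; it only means this particular pattern found nothing.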

It's more work for Lazeez, but I'm not sure how else to address it. But if we do the necessary legwork on our own before sending him anything, we're at least minimizing our impact on his time.

Switch Blayde

@Vincent Berg

The screw-up seems tied to your version of WORD. If your text has received the all-clear from Lazeez, I wouldn't worry about it.

My version of Word is fine. Lazeez checked it out. That wasn't what I was asking.

I recently copy/pasted sections of Word files into a literary agent's webpage form. I assumed all the characters would copy over intact.

My question is how one would know that.

awnlee jawking

@Switch Blayde

I recently copy/pasted sections of Word files into a literary agent's webpage form.

How exacting were the agent's submission guidelines?

AJ

Switch Blayde

@awnlee jawking

How exacting were the agent's submission guidelines?

I don't remember the exact words, but they warned you that copying from Word could be a problem. For instance, they said to put a blank line between paragraphs since the indenting won't work.

I may have screwed up the blank line thing. When I pasted it, it had two blank lines so I deleted one. I'm wondering what that looked like on their end.

awnlee jawking

@Switch Blayde

There was a time when Word looked like becoming the de facto submission standard, superseding single-sided, double-spaced etc hard copies!

Large agents usually have their own reformatting software to coerce submissions to meet the submission requirements of each dead tree publisher they submit to. I'm not quite sure how pasting into a web form would work. It strikes me that when the final format is known (take SOL, for instance), a web form is fine, but not when submitting to dead tree publishers, each of whom has their own house style.

Oh well, I've all this to come if I ever complete anything worthy of chucking at a publisher. :(

AJ

Vincent Berg

@awnlee jawking

Large agents usually have their own reformatting software to coerce submissions to meet the submission requirements of each dead tree publisher they submit to. I'm not quite sure how pasting into a web form would work. It strikes me that when the final format is known (take SOL, for instance), a web form is fine, but not when submitting to dead tree publishers, each of whom has their own house style.

Chances are, knowing large-scale publishers and the agents they work with, they're mostly looking for quick ways to thin out the crap, so they probably toss anything which doesn't display after submission, without even glancing at it. :( Generally, the chances of getting a manuscript into the hands of anyone who'll even glance at it are pretty minuscule (not that I've ever tried it myself).

Lazeez Jiddan (Webmaster)

@Switch Blayde

I recently copy/pasted sections of Word files into a literary agent's webpage form. I assumed all the characters would copy over intact.

My question is how one would know that.

The only way you (as the submitter) could make sure that everything is OK is if the creator of the form provided you with a preview right after you pasted your text.

Pasting text into a form should be safe (encoding-wise) as long as you have a modern browser.

The SOL Wizard's pasting function tells the browser that it should use UTF-8 encoding when it sends the form. In 99.99% of cases, things work exactly as expected. Trouble happens when you have a misbehaving browser, usually a really old browser that's not up to date on modern web code. Some authors are still on XP and IE6!
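
Roughly what "send the form as UTF-8" means on the wire: the field's text is turned into bytes with the chosen encoding and then percent-encoded. A Python sketch with made-up field data (in HTML this is usually governed by the page's charset or the form's `accept-charset` attribute; exactly how the Wizard sets it isn't described here).

```python
from urllib.parse import urlencode

field = {"story": "D\u00e9j\u00e0 vu\u2026 again"}   # made-up form data

# Text encoded as UTF-8 before percent-encoding.
print(urlencode(field, encoding="utf-8"))
# story=D%C3%A9j%C3%A0+vu%E2%80%A6+again

# The same text sent as windows-1252 by a misbehaving or very old client.
print(urlencode(field, encoding="cp1252"))
# story=D%E9j%E0+vu%85+again
```

A server expecting the first form and receiving the second will misread every accented letter and the ellipsis.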

The drawback to pasting into the Wizard is that you lose any formatting the copied text had, like italics and bolding.

Attaching files is a whole other ball of wax. If the attached file is plain text, then it's up to our tools and the moderators to handle things correctly. If it's HTML, then it's more complex, as the methods used to create the HTML code vary so widely: various tools, some of them screwy like Word, and many people who create their own HTML. I receive a lot of hand-crafted HTML files from many authors, and some of them are faulty. Many authors use Notepad to create HTML files and use the ISO-8859-1 encoding declaration, while Notepad uses windows-1252 encoding by default. So I receive files that need to be handled by a moderator, who overrides the Wizard's automated html->tags converter, figures things out and changes stuff manually.
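
The Notepad case in a few lines of Python, assuming Notepad saved the file as "ANSI" (windows-1252), as described above. Curly quotes and the ellipsis land on bytes in the 0x80-0x9F range, which strict ISO-8859-1 treats as invisible control codes. (Browsers following the WHATWG Encoding Standard quietly read an ISO-8859-1 label as windows-1252, which hides the mistake in a browser but not in a tool that decodes strictly.)

```python
text = "\u201cHello\u2026\u201d"        # “Hello…” typed in Notepad

raw = text.encode("cp1252")            # the bytes Notepad writes as "ANSI"
print(raw)                             # b'\x93Hello\x85\x94'

strict = raw.decode("latin-1")         # a strict ISO-8859-1 reader
c1 = [hex(ord(c)) for c in strict if 0x80 <= ord(c) < 0xA0]
print(c1)                              # ['0x93', '0x85', '0x94']: control codes, not punctuation

print(raw.decode("cp1252") == text)    # True: right table, right text
```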

Some HTML files are so screwed up that sometimes we have to send them back, as they can't be fixed.

Ernest Bywater

@Lazeez Jiddan (Webmaster)

Some HTML files are so screwed up that sometimes we have to send them back, as they can't be fixed.

Wheew. Well, since I don't get all of mine back, I figure they must be mostly OK. I do try to ensure they're using only the html code you allow.

Lazeez Jiddan (Webmaster)

@Ernest Bywater

I do try to ensure they're using only the html code you allow.

While it might be helpful, it's not really necessary. The converter finds the codes that it supports and converts them to SOL tags and simply deletes anything extra.
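
A toy version of that "keep what's supported, delete the rest" behavior, as a Python sketch. The whitelist and class names are hypothetical, and since the thread doesn't show SOL's own tag syntax, this only keeps whitelisted HTML tags rather than converting them.

```python
import html
from html.parser import HTMLParser

ALLOWED = {"p", "br", "i", "em", "b", "strong"}   # hypothetical whitelist
SKIP_CONTENT = {"head", "style", "script"}       # drop these tags and their text

class WhitelistFilter(HTMLParser):
    """Keep whitelisted tags (without attributes) and visible text; drop the rest."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skipping = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skipping += 1
        elif tag in ALLOWED and not self.skipping:
            self.out.append(f"<{tag}>")          # style=, class= etc. are discarded

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skipping = max(0, self.skipping - 1)
        elif tag in ALLOWED and tag != "br" and not self.skipping:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skipping:
            # Re-escape &, < and > so decoded text can't turn into markup.
            self.out.append(html.escape(data, quote=False))

def strip_to_allowed(html_text: str) -> str:
    parser = WhitelistFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)

word_html = ('<p class=MsoNormal><span style="font-size:12pt">'
             '<i>Go</i>\u2026 now</span></p>')
print(strip_to_allowed(word_html))   # <p><i>Go</i>… now</p>
```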

So if you want to try some files as you use them for your own purposes, it might help you save some effort.

Ernest Bywater

@Lazeez Jiddan (Webmaster)

While it might be helpful, it's not really necessary. The converter finds the codes that it supports and converts them to SOL tags and simply deletes anything extra.

So if you want to try some files as you use them for your own purposes, it might help you save some effort.

Actually, it helps me, because I also want a simple HTML file for my own records, so there's no real work between the two.

Switch Blayde

@Lazeez Jiddan (Webmaster)

The converter finds the codes that it supports and converts them to SOL tags and simply deletes anything extra.

That's why I manually surround the italic words in my story with the HTML "i" tags since the Wizard supports them, but I leave the ellipsis and em-dash characters created by my version of Word as is.

Vincent Berg

@Switch Blayde

That's why I manually surround the italic words in my story with the HTML "i" tags since the Wizard supports them, but I leave the ellipsis and em-dash characters created by my version of Word as is.

Again, I went with the assumption of 'if my code ain't broke, don't fix it'. So I cleaned up my own HTML and submitted it, and when it worked I figured it was 'satisfactory'. When something didn't work, I quickly identified the issue and resolved it within an hour (of noticing it didn't post correctly, which often takes a while).

For riding bareback with software for so long, it's amazing I never crashed and burned. My biggest problem was always forgetting to switch Windows character sets to UTF-8 for 16-bit characters.

Vincent Berg

@Lazeez Jiddan (Webmaster)

While it might be helpful, it's not really necessary. The converter finds the codes that it supports and converts them to SOL tags and simply deletes anything extra.

So if you want to try some files as you use them for your own purposes, it might help you save some effort.

I figured that out pretty quickly (without anything failing or being sent back). I'd submit the wrong character set, but the text would process without a hitch and the 'converter' would copy the 'allowable' text into the necessary character set. My only issue came when I started using 16-bit code, which is when I learned which characters needed HTML coding and which didn't. That sounds like a recipe for disaster, but I kept my code basic while still adding all sorts of bells and whistles, which SOL simply stripped out, which is fine with me.

awnlee jawking

@Lazeez Jiddan (Webmaster)

Some authors are still on XP and IE6!

How can they even log in? I mostly use Chrome because XP's version of IE and SOL aren't able to talk securely.

AJ

Lazeez Jiddan (Webmaster)

@awnlee jawking

How can they even log in? I mostly use Chrome because XP's version of IE and SOL aren't able to talk securely.

I don't know, but I see the user agent string in the wizard.

While the proportion is minuscule, there are 263 site users still on IE6!

Ernest Bywater

@Lazeez Jiddan (Webmaster)

While the proportion is minuscule, there are 263 site users still on IE6!

I don't know if it's the case with these users, but many, many moons ago when I still used Win XP and MSIE6, I downloaded and used a program called Avant Browser, which was basically an overlay on MSIE6 with better security and a few extra features. However, it also presented itself to the world as MSIE6 in all that site-stats-gathering software. I believe it's still available for a number of versions of Windows and MSIE.

Vincent Berg

@awnlee jawking

How can they even log in? I mostly use Chrome because XP's version of IE and SOL aren't able to talk securely.

I'm not a lot better, as I'm still on Win 7 and absolutely refuse to upgrade since they started instituting 'forced upgrades'. Now I'm just waiting for my desktop to croak, at which point I'll finally move everything over to a Mac, but my desktop was custom designed, and Macs, while reliable, aren't nearly as powerfully configured.

I'll be sorry to see this desktop die, but I'll never buy another PC or Linux machine again. The former is too buggy, and the latter is too difficult to 'work around'.

But, those are just the gripes of an ex-software designer. I don't like mucking with the nuts and bolts, but I want tools that allow me to do whatever the frig I want. :(

Vincent Berg

@Lazeez Jiddan (Webmaster)

Attaching files is a whole other ball of wax. If the attached file is plain text, then it's up to our tools and the moderators to handle things correctly. If it's HTML, then it's more complex, as the methods used to create the HTML code vary so widely: various tools, some of them screwy like Word, and many people who create their own HTML. I receive a lot of hand-crafted HTML files from many authors, and some of them are faulty. Many authors use Notepad to create HTML files and use the ISO-8859-1 encoding declaration, while Notepad uses windows-1252 encoding by default. So I receive files that need to be handled by a moderator, who overrides the Wizard's automated html->tags converter, figures things out and changes stuff manually.

I hate to admit it, but I learned everything I know about html from examining other people's code and then figuring out what code performed which functions. I then started searching for webpages that performed like I wanted them to, and copied the portions that did those particular functions.

But then, I learned to code using software products that were so new there were no coding manuals (other than a basic command reference), so I'm pretty experienced in figuring out how things operate and decoding/debugging code on the fly, as well as pointing out likely problems in unfamiliar code.

I figured out character sets right away.

Switch Blayde

@Vincent Berg

I learned everything I know about html from examining other people's code and then figuring out what code performed which functions.

I used w3schools for both HTML and CSS. Here's their site:
https://www.w3schools.com/html/

They also have a validator.

graybyrd

Here's a 2015 reply to a query complaining that Word substituted (TM) for a Trademark symbol that had been inserted from Word's 'Symbols Group':

Many of the symbols Word inserts are not Unicode, but "normal" character codes formatted with a different font, such as WingDings. You need to make sure the symbols giving you problems are truly Unicode characters for that symbol.

Guess that would explain the "1/4" substitution for horizontal ellipsis? WingDings font, anyone?
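
One plausible mechanism, offered as a hypothesis rather than a diagnosis: in the Symbol font (rather than WingDings), byte 0xBC draws an ellipsis, while the same byte in windows-1252 or Latin-1 is the one-quarter sign. If Word records the character as "byte 0xBC in the Symbol font" and the font information is later lost, the reader sees ¼. A Python sketch of the two readings:

```python
# One-entry excerpt of the Symbol font's layout (byte -> character it draws).
SYMBOL_FONT_EXCERPT = {0xBC: "\u2026"}   # 0xBC draws a horizontal ellipsis

raw = bytes([0xBC])

as_text = raw.decode("cp1252")              # read as ordinary windows-1252 text
as_symbol = SYMBOL_FONT_EXCERPT[raw[0]]     # read the way the Symbol font draws it

print(as_text)     # ¼
print(as_symbol)   # …
```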

So it would seem that Word now reads & writes unicode... except, when it doesn't.

Also... what version is the font? I stumbled on the fact that Times New Roman was modified between XP and Vista, revising the glyphs. So, how does one determine just what FONT version is involved? My TNR may be significantly altered from YOUR TNR... ?

Ernest Bywater

The ellipsis character has an HTML code and a Unicode code point, shown below: first with spaces between the code characters so the code itself shows, then without spaces in the hope it will display.

html & h e l l i p ; …

unicode U + 2 0 2 6 U+2026

Please note neither has the 1/4 symbol in them, nor are they the same as the windows 1252 code.
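
A quick Python check (illustration only) that the named entity, the decimal and hex numeric references, and the Unicode code point are all the same character, while windows-1252 stores it as the single byte 0x85:

```python
import html

ellipsis = html.unescape("&hellip;")                   # named entity

print(f"U+{ord(ellipsis):04X}")                        # U+2026
print(html.unescape("&#8230;") == ellipsis)            # True (decimal reference)
print(html.unescape("&#x2026;") == ellipsis)           # True (hex reference)
print(ellipsis.encode("cp1252"))                       # b'\x85' (windows-1252 byte)
```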

Vincent Berg

@Ernest Bywater

Please note neither has the 1/4 symbol in them, nor are they the same as the windows 1252 code.

Also, the same code ("&hellip;") works in Windows 1252 as well.

Ernest Bywater

I don't know if the word processors you're using can do this, but the older versions of MS Word could (I stopped using it with Word for Windows 6a). But you can insert a special character in your text. In LibreOffice you do this via the menu Insert > Special Character, which opens a sub-window where you select the character you want. Looking at that, I have the following options within the font Palatino Linotype:

U+2025 displays two dots
U+2026 displays three dots
U+2010 displays a short dash
U+2011 displays a slightly longer short dash
U+2012 displays an en-dash
U+2013 displays a slightly longer en-dash
U+2014 displays an em-dash
U+2015 displays a slightly longer em-dash
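
If you want to confirm what a code point really is before inserting it, Python's standard `unicodedata` module prints the official Unicode names. A small sketch using the code points discussed above, with the em-dash at U+2014:

```python
import unicodedata

# Code points from the special-character list, in the same order.
CODE_POINTS = [0x2025, 0x2026, 0x2010, 0x2011, 0x2012, 0x2013, 0x2014, 0x2015]

for cp in CODE_POINTS:
    ch = chr(cp)
    print(f"U+{cp:04X}  {ch}  {unicodedata.name(ch)}")
```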

You should be able to do the same to insert the character while embedding the Unicode in the word processor format code.

Switch Blayde

@Ernest Bywater

But you can insert a special character in your text.

But those characters aren't recognized by the SOL Wizard.

Ernest Bywater

@Switch Blayde

But those characters aren't recognized by the SOL Wizard.

They should be in a format the Wizard should recognize instead of the odd MS Code. Only way is to try them out and see.

Vincent Berg

@Switch Blayde

But those characters aren't recognized by the SOL Wizard.

No, only a few essential html codes are accepted, but still, it's a way of ensuring you're entering the correct code, and it's a way of verifying whether your version of WORD is screwing you over or not.

As for your worries about copying text to a third party, that has nothing to do with the code you copy, and everything to do with the system you're copying to. Most online tools aren't terribly robust! However, I'm guessing if it's an issue then they're already aware of it, but simply aren't willing to invest the time in trying to fix it.

If they're worried about the results, they can always request a new copy, at which point you'll know to remove ALL publication marks from the document.

Ernest Bywater

@Vincent Berg

at which point you'll know to remove ALL publication marks from the document.

And that I suspect is the key point. They're Publishing Marks and not writing characters.

Switch Blayde

OK, let me ask this a different way. And I'm not having a problem with the SOL Wizard. This is simply educating myself since I've been out of IT for so long.

If I write something using an HTML editor, I define the charset, say UTF-8, so when I type an "A" it knows to display it as an "A". That I get.

But what if I typed up the document somewhere else and that application did not use UTF-8. What if that application used a non-standard charset that used a different code for an "A" (I know "A" is probably the same in all of them, but I'm just using it as an example)?

UTF-8 = hex 41 = A (hex 42 = B)
my app = hex 42 = A

So when I copy the "A" from my application that doesn't use UTF-8 into an HTML file that defines the charset as UTF-8, will the browser display an "A" or a "B"?

When I paste the "A" I see on my screen, is it really pasting hex-42 which is actually a "B"?

Lazeez Jiddan (Webmaster)

@Switch Blayde

So when I copy the "A" from my application that doesn't use UTF-8 into an HTML file that defines the charset as UTF-8, will the browser display an "A" or a "B"?

Usually, any copy/paste operation is handled through the computer's system API for copy/paste. Copy/paste usually standardizes text encoding to make this operation seamless without the user having to worry about anything. So if you copy an A, you'll paste an A and it's not about hex 42 or hex 41.
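
Switch Blayde's hypothetical has a real-world cousin: in EBCDIC (IBM code page 037, used on mainframes), "A" is byte 0xC1, which Latin-1 reads as "Á". A Python sketch of the difference between handing over raw bytes and handing over the character, which is roughly what a system clipboard API does for you (a simplification, not how any particular OS implements it):

```python
letter = "A"

ebcdic = letter.encode("cp037")      # EBCDIC, IBM code page 037
print(ebcdic.hex())                  # c1

# Raw byte handed over and read with the wrong table: wrong character.
print(ebcdic.decode("latin-1"))      # Á

# Decode with the source's own table first, then hand over the character;
# the receiver can store it in whatever encoding it uses.
char = ebcdic.decode("cp037")
print(char, char.encode("utf-8").hex())   # A 41
```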

The old Mac OS (OS 9 and prior) used the 'Macintosh' text encoding scheme, and all the applications used it as the default, as they were coded against the provided text handling APIs.

Windows' default encoding is still windows-1252, with some applications handling UTF-8 on their own. But when copying and pasting, it's a system API that handles it, and again you don't need to think about the issue.

Mac OS X, or macOS as it is now known, is based on UTF-8 internally as far as I know.

Linux used ISO-8859-1 but most distributions are now based on UTF-8.

The problem usually happens when text is crossing from one system to another, not when it's being handled on the same system. So again, unless you're specifying an HTML file's encoding meta tag, there is usually only a minuscule chance that you might get the encoding screwed up.
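
Here's what a wrong meta tag does: a page really saved as UTF-8 while its meta tag claims windows-1252. A Python sketch with a made-up sentence; the three-character garbage per punctuation mark is the classic symptom.

```python
text = "He stopped\u2014then went on\u2026"

saved = text.encode("utf-8")         # the file really is UTF-8
shown = saved.decode("cp1252")       # but the page declares windows-1252
print(shown)                         # He stoppedâ€”then went onâ€¦ (the 3rd char of each trio is a curly quote / broken bar)

# Nothing was lost, so re-reading with the right table repairs it.
print(shown.encode("cp1252").decode("utf-8") == text)   # True
```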

Switch Blayde

@Lazeez Jiddan (Webmaster)

Usually, any copy/paste operation is handled through the computer's system API for copy/paste. Copy/paste usually standardizes text encoding to make this operation seamless without the user having to worry about anything. So if you copy an A, you'll paste an A and it's not about hex 42 or hex 41.

Great! Thanks.

REP

@Switch Blayde

So when I copy the "A" from my application that doesn't use UTF-8 into an HTML file that defines the charset as UTF-8, will the browser display an "A" or a "B"?

I don't know enough about the subject to have a truly valid opinion. But logic says that, assuming your oddball application saves characters as hex values that differ from the characters' corresponding UTF-8 values, copying those hex values and pasting them into an application that uses the UTF-8 character set would leave you seeing a garbled text string on your monitor.

Vincent Berg

@REP

I don't know enough about the subject to have a truly valid opinion. But logic says that, assuming your oddball application saves characters as hex values that differ from the characters' corresponding UTF-8 values, copying those hex values and pasting them into an application that uses the UTF-8 character set would leave you seeing a garbled text string on your monitor.

Again, the various character sets all treat simple text largely the same. The difference is how they treat 'special' characters, like publication marks (e.g. smart quotes, ellipses, em-dashes, etc.). In those cases, they may substitute either different characters or, more likely, characters designed for another machine (i.e. it'll only display on one particular machine, though in the case of Windows-2381?, every browser will display the page as if the computer were a Windows machine; in other words, the browser corrects the mistaken behavior).

The solution, in any case, is to use html commands rather than uploading (copying & pasting) machine-specific codes.

Switch Blayde

@Vincent Berg

The solution, in any case, is to use html commands rather than uploading (copying & pasting) machine-specific codes.

Only if what you paste into recognizes the HTML tag.

REP

@Vincent Berg

It sounds as if your post is addressing something other than SB's original post.

If I understand SB's question regarding a hypothetical application using a hypothetical character set (let's call it XYZ), then the XYZ character set defines the characters to be displayed as hex codes. Let us assume for this example that the text string ABC equates to the hypothetical character set's hex values of 21, 22, and 23.

Now, to a computer these hex values are stored as a string of 1's and 0's. It is the hypothetical application and the XYZ character set that result in the hex values of 21, 22, and 23 being translated into the text character string ABC.

So if the hex values were to be copied and pasted into a text string of an application's file that defines the file's hex values to be characters of the UTF-8 character set, the resulting character string seen on the monitor would be !"#.
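
That checks out. A one-line Python sketch (the XYZ set is hypothetical; only the bytes 0x21 to 0x23 are real):

```python
stored = bytes([0x21, 0x22, 0x23])   # "ABC" in the hypothetical XYZ set
print(stored.decode("utf-8"))        # !"#  (same bits, read with the UTF-8 table)
```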

Or am I missing something?

Minor edits to correct terms.

Switch Blayde

@REP

Or am I missing something?

You got it right. Hex wasn't important. I'm just more familiar with hex than other ways of defining it.

But according to Lazeez, it's not the binary bits that are copied and then mapped to the receiver's code table. It's your computer's API copying the actual character, not the binary bits.

What I still don't understand is why it doesn't always do it right.

REP

@Switch Blayde

I'm just more familiar with hex than other ways of defining it.

Hex, or hexadecimal, is one of several common numbering systems:

Binary - base 2 - 2 values (0 and 1),
Quaternary - base 4 - 4 values (0-3),
Octal - base 8 - 8 values (0-7),
Decimal - base 10 - 10 values (0-9),
Hexadecimal - base 16 - 16 values (0-9, A, B, C, D, E, and F).

Today's computer data buses generally carry 16-bit or 32-bit words, and those words are written in hexadecimal, where each hex digit stands for four bits (e.g., B is 1011, and B9 is one 8-bit byte). That's easier to express and remember than a string of 16 or 32 ones and zeros.
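
The relationship between the bases, sketched in Python with the value B9 (illustration only): each hex digit is four bits, so a byte is two hex digits and a 16-bit word is four.

```python
value = 0xB9                 # hex B9

print(value)                 # 185 in decimal
print(f"{value:08b}")        # 10111001: one byte is 8 bits, i.e. 2 hex digits
print(f"{value:04X}")        # 00B9: the same value written as a 16-bit word (4 hex digits)
print(f"{0xB:04b}")          # 1011: a single hex digit is exactly 4 bits
print(int("2A", 16))         # 42
```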

Switch Blayde

@REP

That's easier to express and remember than a string of 16 or 32 ones and zeros.

Which is why I used hex instead of binary.

ETA: Many, many years ago, an IBM SE (Systems Engineer) told me he gave his age in hexadecimal to sound younger. It worked until he turned 2A.

Not_a_ID

@Switch Blayde

ETA: Many, many years ago, an IBM SE (Systems Engineer) told me he gave his age in hexadecimal to sound younger. It worked until he turned 2A.

Only a 6 year inconvenience. Once he turned 48(decimal) he could go back to being 30(hexadecimal) again. By that same token, a 64YO could claim 40, and an 80YO could claim 50. :)

Vincent Berg

@Not_a_ID

Only a 6 year inconvenience. Once he turned 48(decimal) he could go back to being 30(hexadecimal) again. By that same token, a 64YO could claim 40, and an 80YO could claim 50. :)

I prefer claiming I'm 2E years old! It inspires more curiosity and opens discussion options. 'D

Not_a_ID

@Vincent Berg

I prefer claiming I'm 2E years old! It inspires more curiosity and opens discussion options. 'D

Could switch it up to base 32, in which case you're about halfway through your "teen years" as it were, from age 33 to 64. ;)

REP

@Switch Blayde

It's your computer's API copying the actual character, not the binary bits.

Computers don't work with characters; they only handle 1's and 0's. Thus it is the binary bits that are copied. The computer displays those numeric values as text string letters.

Switch Blayde

@REP

Thus it is the binary bits that are copied.

The copy/paste functions must do something other than copy binary bits.

That was the question I asked. If the binary bits for one application are different than the binary bits of the 2nd application for the same character, what character would end up in the 2nd app (e.g., the SOL Wizard).

Lazeez said it would copy/paste correctly because the computer's API takes care of it. What that's telling me is the API knows the charset of the 1st app and converts the binary bits to what the 2nd app has defined for that character.

awnlee jawking

@Switch Blayde

Some time ago, ZoneAlarm rewrote their firewall software in C++. I discovered it left binary crap on the clipboard, interfering with other programs such as M$ Excel, which also uses the clipboard for copying and pasting within Visual Basic macros.

AJ

Lazeez Jiddan (Webmaster)

@Switch Blayde

What that's telling me is the API knows the charset of the 1st app and converts the binary bits to what the 2nd app has defined for that character.

In modern systems, all applications must adhere to programming conventions provided by the host system. Programmers don't like to do extra work for nothing. So if something is provided by the system to begin with, there is no point in re-creating it on your own unless you have a very specific need that isn't fulfilled by the host system.

On Windows, Mac and Linux and every other popular system currently in use, the system provides an API to handle text functions. So the programmer doesn't need to worry about how to construct a string or how that string is encoded. So, for example, you write something like:

a = "Jake got a feeling of déjà vu";

or something similar. The program doesn't try to deal with how each character is encoded; it relies on the system to handle it. So when it's time to copy something to the system's clipboard, the application doesn't even look at the underlying encoding of a string's characters; it simply pushes the string to the clipboard using the system's command to do so.

For example, on the Mac you do something like this (using Swift):

import AppKit

let pasteboard = NSPasteboard.general
pasteboard.declareTypes([.string], owner: nil)  // announce plain text; the system handles the encoding
pasteboard.setString("Jake got a feeling of déjà vu", forType: .string)

Nowhere in that code does the programmer deal with anything related to the underlying binaries/hex codes representing this string.

Again, the only time things get screwed up is when the text file crosses from one platform to another. You will never have any problems with a file that was created on the Mac when you open it on a Mac, but email that file to a windows machine and you might need to do some acrobatics to open the file as the character encoding that originated on the Mac may differ from what the windows system expects.

On SOL/FS/SFS we receive text submissions from every platform in existence. So our tools contain an application that can translate from one character set to another. So it has a command to 'Open using Encoding' and then we can change the encoding to UTF-8 before saving it and invoking the posting tools.
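
The "Open using Encoding, then save as UTF-8" step can be sketched in Python. This is not SOL's tool; the function and file names are hypothetical. The key point is that the source encoding has to be supplied by a person, because the same byte means different things in different tables (here, byte 0xC9 is an ellipsis in Mac Roman but "É" in windows-1252).

```python
import tempfile
from pathlib import Path

def convert_to_utf8(src: Path, dst: Path, source_encoding: str) -> str:
    """Re-save a text file as UTF-8 ("open using encoding", then save).

    source_encoding has to be chosen by a person (e.g. "mac_roman",
    "cp1252", "latin-1"); the bytes alone don't reliably say which is right.
    """
    text = src.read_bytes().decode(source_encoding)   # strict: bad bytes raise an error
    dst.write_text(text, encoding="utf-8")
    return text

folder = Path(tempfile.mkdtemp())
mac_file = folder / "chapter-mac.txt"
mac_file.write_bytes("The end\u2026".encode("mac_roman"))   # classic Mac OS text

print(mac_file.read_bytes().decode("cp1252"))   # The endÉ  (wrong table)
text = convert_to_utf8(mac_file, folder / "chapter-utf8.txt", "mac_roman")
print(text)                                      # The end…
```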

Not_a_ID

@Lazeez Jiddan (Webmaster)

Nowhere in that code does the programmer deal with anything related to the underlying binaries/hex codes representing this string.

Well, unless the programmer is security conscious and is in a habit of "normalizing" their text to ensure nobody is trying to insert executable code into the data stream. ;)

Which is where many security exploits/holes happen.

Switch Blayde

@Lazeez Jiddan (Webmaster)

So our tools contain an application that can translate from one character set to another

That's the part I was missing. It's not magic. You do convert from one charset to another.

So problems will only occur if UTF-8 does not have a character used in the story or the Wizard's converter doesn't recognize the charset used in the input file (which is highly unlikely).

For SOL submissions, I never did a copy/paste. It's a long story why I did it this way, but while writing my story in Word I would type in the HTML "i" tags around words I wanted to be italics. And I used "--" for the em dash and 3 dots for the ellipsis (having removed them from the AutoCorrect table because of the next step). I then saved it as .txt and attached that .txt file to the SOL Wizard submission.

But now when I write in Word I use Command-I to italicize words and let Word's AutoCorrect convert the "--" to an em dash and the 3 dots to an ellipsis.

The next time I write a story for SOL, I plan on saving that docx file as "filtered html" and attaching that .html file to the SOL Wizard. I'm assuming the "save as" will put the HTML "i" tags around the italicized words and the SOL converter will handle the em-dash and ellipsis characters (converting from the charset Word uses in the filtered HTML file to UTF-8).
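For anyone who prefers the older plain-text route Switch Blayde described, the AutoCorrect characters can be folded back to ASCII before saving as .txt, in line with REP's earlier advice to use 3 periods instead of Word's ellipsis symbol. A minimal sketch; which characters to fold is an assumption and the function name is made up.

```swift
// Fold the typographic characters Word's AutoCorrect inserts back to the
// plain-ASCII spellings used in a hand-prepared .txt file.
// The choice of characters here is an assumption; extend as needed.
func asciiPunctuation(_ s: String) -> String {
    var out = ""
    for scalar in s.unicodeScalars {
        switch scalar.value {
        case 0x2026: out += "..."            // horizontal ellipsis
        case 0x2014: out += "--"             // em dash
        case 0x2018, 0x2019: out += "'"      // curly single quotes
        case 0x201C, 0x201D: out += "\""     // curly double quotes
        default: out.unicodeScalars.append(scalar)
        }
    }
    return out
}

print(asciiPunctuation("\u{201C}Wait\u{2026}\u{201D} she said\u{2014}then left."))
// prints: "Wait..." she said--then left.
```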

REP

@Switch Blayde

The copy/paste functions must do something other than copy binary bits.

What you see on your screen is nothing more than a series of 1's and 0's in your computer's memory. When you block a section of text on your screen for a copy operation, the computer selects the corresponding 1's and 0's in its memory. When the copy command is given, the selected binary bits are copied to "clipboard" memory. Positioning your cursor on the screen and issuing the paste command causes the computer to select the corresponding point in memory and then insert the 1's and 0's stored in "clipboard" memory.

What you are overlooking is the file the computer is using to create the screen's image contains numerous definitions, such as the character set definition. The computer uses all of these definitions to create what you see on the screen. So the copied binary bits are copied from a defined text string and pasted into a defined text string. It is the other definitions in the file that tell the computer what to do with those binary bits.

REP

@Switch Blayde

Lazeez said it would copy/paste correctly

The entire paragraph is correct. When we send a file to the SOL converter, that file contains the definition of the character set being used. The SOL converter can convert that character set's codes to recreate the proper character in the SOL converter's character set because the original file defined the character set being used.

However, that will only occur if the converter has a translation routine and database for the original file's character set. In your example, you assumed an obscure (unknown?) character set. Now if the SOL converter received a file using that character set and the converter does not have a translation routine for the obscure character set, the converter will generate an error message and abort the conversion.
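A minimal sketch of the behavior REP describes: read the charset an HTML file declares about itself and refuse to convert when it is missing or not in the converter's list. The supported-charset list, error names, and function names are all hypothetical, and the parsing is deliberately simple (it takes the first `charset=` it finds).

```swift
enum ConversionError: Error, Equatable {
    case noCharsetDeclared
    case unsupportedCharset(String)
}

// Hypothetical: the charsets this imaginary converter has translation tables for.
let supportedCharsets: Set<String> = ["utf-8", "windows-1252", "iso-8859-1"]

func isCharsetNameChar(_ c: Character) -> Bool {
    return c.isLetter || c.isNumber || c == "-" || c == "_"
}

// Find the first `charset=` declaration in an HTML document and return the
// (lowercased) name that follows it, e.g. from
// <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
func declaredCharset(inHTML html: String) -> String? {
    let chars = Array(html.lowercased())
    let key = Array("charset=")
    guard chars.count >= key.count else { return nil }
    for start in 0...(chars.count - key.count) {
        guard chars[start..<(start + key.count)].elementsEqual(key) else { continue }
        var i = start + key.count
        if i < chars.count && (chars[i] == "\"" || chars[i] == "'") { i += 1 }  // optional quote
        var name = ""
        while i < chars.count && isCharsetNameChar(chars[i]) {
            name.append(chars[i])
            i += 1
        }
        return name.isEmpty ? nil : name
    }
    return nil
}

// Abort (rather than guess) when the charset is missing or unknown.
func charsetForConversion(_ html: String) throws -> String {
    guard let name = declaredCharset(inHTML: html) else {
        throw ConversionError.noCharsetDeclared
    }
    guard supportedCharsets.contains(name) else {
        throw ConversionError.unsupportedCharset(name)
    }
    return name
}

do {
    print(try charsetForConversion("<meta charset=\"PETSCII\">"))
} catch {
    print("Conversion aborted:", error)
}
```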

Replies:   Switch Blayde
Switch Blayde

@REP

In your example, you assumed an obscure (unknown?) character set.

Only because I was lazy. I didn't want to search through all the charsets to find an example that would work. So I made one up.

Replies:   Not_a_ID  REP
Not_a_ID

@Switch Blayde

Only because I was lazy. I didn't want to search through all the charsets to find an example that would work. So I made one up.

https://en.wikipedia.org/wiki/PETSCII

I know someone who used a terminal program on a Commodore 64 that managed to hard lock (requiring a power-off reboot) some Windows 3.11 machines running BBS software back in the day by hitting certain keys while chatting to the sysop. ;)

REP

@Switch Blayde

So I made one up.

A fictitious character set was good for your example, though it makes it difficult to talk specifics.

datadude

@Anotherp08

Holy Hades Butt Crack. I had hoped for a couple suggestions, but wow.

Replies:   Vincent Berg
Vincent Berg

@datadude

Holy Hades Butt Crack. I had hoped for a couple suggestions, but wow.

There are no simple questions here, only excuses for us old-timers to reveal every minute detail we've accumulated over the years. :D

Anotherp08

This last chapter was a real pain. It seems that as soon as you start formatting, the SOL converter says, "Yippee, this is someone who has formatted everything." It stripped out almost all of my carriage returns and dumped it as one huge file. I finally used a clean Word doc and Find and Replace. I replaced all the paragraph marks with a {br} and a paragraph mark. Then I worked on the simple formatting of the letters. I have some skill with Find and Replace, so this took me less than five minutes. I used the Format Previewer, was happy with the outcome, saved it as a TXT file, and reposted it.
Thanks, everyone, for your suggestions. If this comes out reasonably well, I will keep using this method.
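The Find and Replace step above (each paragraph mark becomes {br} followed by the paragraph mark, which in Word's dialog is replacing ^p with {br}^p) is easy to script as well. A minimal sketch, assuming the marker is exactly the {br} text quoted in the post and that paragraphs in the exported .txt file are separated by line breaks; the function name is made up.

```swift
// Mimic replacing every paragraph mark with "{br}" + paragraph mark:
// put the marker in front of every line break. A "\r\n" pair is a single
// Character in Swift, so Windows line endings count as one break.
func addBreakMarkers(_ text: String, marker: String = "{br}") -> String {
    var out = ""
    for ch in text {
        if ch == "\n" || ch == "\r\n" || ch == "\r" {
            out += marker
        }
        out.append(ch)
    }
    return out
}

print(addBreakMarkers("Dear John,\nSee you Sunday.\nMary"))
```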
