Forum: Author Hangout

"DocFetcher" organizing your library

ChiMi 🚫

I have been using it for months now and it is great.

You can index folders or whole drives and search for keywords in seconds.
It works with all kinds of text files and can even index images and videos.
Results are ranked by how often the search word appears in the file, relative to the file's total word count.
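
For anyone curious what "relative to the total word count" means in practice, here is a minimal Python sketch of that kind of ranking. It is only an illustration: the function names are made up for this example, and DocFetcher's real scoring may well work differently.

```python
import re


def relative_frequency(text: str, word: str) -> float:
    """Share of the words in `text` that equal `word` (case-insensitive)."""
    words = re.findall(r"[\w'-]+", text.lower())
    if not words:
        return 0.0
    return words.count(word.lower()) / len(words)


def rank(files: dict[str, str], word: str) -> list[tuple[str, float]]:
    """Sort file names by relative frequency of `word`, highest first.

    `files` maps a file name to its text; files without the word are dropped.
    """
    scored = [(name, relative_frequency(body, word)) for name, body in files.items()]
    hits = [item for item in scored if item[1] > 0]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

A short file that mentions the word twice would outrank a long file that mentions it once, which matches the behaviour described above.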

Pro-tip: add your own specific keywords to the files to fine-tune your library.

Replies: Capt. Zapp REP

Capt. Zapp 🚫
Updated:

@ChiMi

"DocFetcher" organizing your library

Thanks. I'll give it a try.

REP 🚫

@ChiMi

Are you saying it would organize my library on SOL?

If it works on my hard disk, what type of organization does it do, and can it be limited to a specific directory (or directories)?

Replies: ChiMi

ChiMi 🚫

@REP

At its core, DocFetcher is just a simple indexing tool for the files you store on your hard drive.

After indexing part or all of your hard drive, you can search for words and get results in seconds. It doesn't just index the filenames, but the whole text inside the files. So if you search for something, but you only remember that it used the word "Goddonggit" instead of "goddamnit", you would instantly find the file where this word was used. Or, if you remember it had a cat-girl or was set in Minnesota, you can search for those words and find every file with those words in it.
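
To show why full-text search is so fast once the indexing is done, here is a rough Python sketch of an inverted index (a lookup table from each word to the files that contain it). The names `build_index` and `search` are hypothetical, and this is a toy illustration of the general technique, not DocFetcher's actual code.

```python
import re
from collections import defaultdict


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of file names whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, body in files.items():
        for word in re.findall(r"[\w'-]+", body.lower()):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], *words: str) -> list[str]:
    """Return the files that contain ALL of the given words."""
    if not words:
        return []
    matches = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*matches))
```

The slow part (reading every file) happens once in `build_index`; each later search is just a dictionary lookup, for example `search(index, "cat-girl", "minnesota")`.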

Where it makes the MAJOR difference, and where it is most powerful, is with tags or code words.
With a little (or a lot of) work on your part, you could code or tag your files in a specific way (just add those words to the file at the start or end) to fine-tune your search results.

Let's say you want to reread a romantic love story. You have hundreds of stories on your drive and know some of them would be considered love stories. If you tag your stories, you could just open DocFetcher and type in "love-story", and it would display all files that you have either manually tagged as a love story or that actually have the word "love-story" in the text.
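
If you have a lot of plain-text files to tag, the tagging itself could be scripted. Below is a small Python sketch that appends a tag line to the end of a file and then finds files carrying a tag. The "Tags:" prefix, the function names, and the restriction to .txt files are all assumptions made for this example; any indexer that reads the full text would pick up such a line.

```python
from pathlib import Path

TAG_PREFIX = "Tags:"  # hypothetical marker, pick whatever suits your library


def add_tags(path: Path, tags: list[str]) -> None:
    """Append a line such as 'Tags: love-story slow' to the end of a text file."""
    text = path.read_text(encoding="utf-8")
    if not text.endswith("\n"):
        text += "\n"
    path.write_text(text + TAG_PREFIX + " " + " ".join(tags) + "\n", encoding="utf-8")


def files_with_tag(folder: Path, tag: str) -> list[str]:
    """Names of the .txt files in `folder` whose text contains `tag` (case-insensitive)."""
    hits = []
    for path in sorted(folder.glob("*.txt")):
        if tag.lower() in path.read_text(encoding="utf-8").lower():
            hits.append(path.name)
    return hits
```

Note that, just like the search described above, `files_with_tag` also matches files where the word simply occurs in the story text.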

So DocFetcher is most useful if you manually tag or code your stories.

Replies: Michael Loucks Capt. Zapp

Michael Loucks 🚫

@ChiMi

So DocFetcher is most useful if you manually tag or code your stories.

This feature is built into Mac OS, and I find it VERY handy. Filenames are not 'all that' but tags are nearly magical!

Capt. Zapp 🚫

@ChiMi

So DocFetcher is most useful if you manually tag or code your stories.

So I should be able to find stories using the SOL tags as long as I have them in the 'file info'.

Replies: ChiMi REP

ChiMi 🚫

@Capt. Zapp

yes

REP 🚫

@Capt. Zapp

find stories using the SOL tags

I got the impression from sunkuwan's reply to my question that it works on your hard drive.

If you are thinking of using it on SOL story files, it may not work for it would have to search SOL's data storage locations.

Replies: Capt. Zapp

Capt. Zapp 🚫
Updated:

@REP

I got the impression from sunkuwan's reply to my question that it works on your hard drive.

If you are thinking of using it on SOL story files, it may not work for it would have to search SOL's data storage locations.

I save every story I read on SOL that I like to my hard drive, just in case something happens where the story is taken down. I started doing that after Young Thinker pulled all of his stories. I currently have over 3,500 stories saved although there is some duplication between txt, rtf, and odt files. I generally copy-paste as I read, but on occasion I download the text files which I rename as RTF. That forces them to open with OpenOffice instead of LO where I open them as UTF-8 format (Why LO doesn't ask for file type I have no idea). I then clean up the formatting to suit my taste and re-save it as an odt file. I put the story description and tag information in the document information section.

I have indexed all of the story files (as well as my anime collection). I did initially have a problem trying to index all the files because the standard setup uses a small (256 MB) heap size. I switched to 1024 MB and have no problems using DocFetcher.

ET correct typo

Replies: REP

REP 🚫

@Capt. Zapp

Why LO doesn't ask for file type I have no idea

I suspect it detects the file format automatically and then uses the appropriate file conversion algorithm to display the contents.

Replies: Capt. Zapp

Capt. Zapp 🚫

@REP

I suspect it detects the file format automatically and then uses the appropriate file conversion algorithm to display the contents.

If that is the case, they need to rework it as it doesn't work. I can specify the UTF-8 format if I use the 'open as' command, but it is easier to use the rtf file extension and have OO handle it. I know both programs are supposed to work the same (or at least very similarly) but I find LO easier to work with for almost everything else.

The biggest problem is if I download the txt file and open it in LO straight from the folder, the formatting is incorrect as LO does not recognize the UTF-8 format from the file. Maybe I am missing a setup option that will do that. If anyone knows the answer, please let me know.

Replies: Vincent Berg Ernest Bywater REP Ernest Bywater

Vincent Berg 🚫

@Capt. Zapp

The biggest problem is if I download the txt file and open it in LO straight from the folder, the formatting is incorrect as LO does not recognize the UTF-8 format from the file. Maybe I am missing a setup option that will do that. If anyone knows the answer, please let me know.

Check with Ernest, as he's the most knowledgeable about LO operations.

Ernest Bywater 🚫
Updated:

@Capt. Zapp

The biggest problem is if I download the txt file and open it in LO straight from the folder, the formatting is incorrect as LO does not recognize the UTF-8 format from the file.

Two issues here:

1. .txt files do not have much in the way of format code in them to start with. Thus a lot of what you'd see in the html version just isn't there to start. Bold, centre align etc. is rarely in .txt documents.

2. A lot of files saved in .txt and .rtf format use the carriage return code instead of the paragraph end code. They give a different result. In html code you use <br> to equate to a carriage return while you use </p> to equate to a paragraph end.

The default LO and OO saving is as a document using the paragraph end and not the carriage return.

Most .txt files are saved as ASCII while only some systems save them as UTF-8. I'm not sure what it is you get from SoL.

I just checked a file from SoL by downloading it as .txt and it came through as a standard plain text file, no bold, no centering, no italics, no coloured fonts, no different font sizes - - just one font size and type with every line left aligned, which is what you usually get in any .txt file.

The html has colours, bold, italics, and centering with varying fonts.

With an .rtf file I would expect bold and italics, but not much more.

typo edit

Replies: Capt. Zapp

Capt. Zapp 🚫

@Ernest Bywater

2. A lot of files saved in .txt and .rtf format use the carriage return code instead of the paragraph end code. They give a different result. In html code you use <br> to equate to a carriage return while you use </p> to equate to a paragraph end.

When I open a txt file with LO, I get a linefeed (little left pointing arrow with a tail) instead of a ¶ (paragraph end mark)

Replies: Ernest Bywater

Ernest Bywater 🚫

@Capt. Zapp

When I open a txt file with LO, I get a linefeed (little left pointing arrow with a tail) instead of a ¶ (paragraph end mark)

Ayep - some people call it a line feed. Way back when we used to have to duck the dinosaurs looking for dinner, I was taught to call that a carriage return. At that time I used to do a lot of telex messaging.

Replies: Capt. Zapp REP BlacKnight Dominions Son

Capt. Zapp 🚫

@Ernest Bywater

I was taught to call that a carriage return.

Yeah, I remember that too. Any idea if there is a setting in LO for how it is handled?

Replies: Ernest Bywater Vincent Berg

Ernest Bywater 🚫

@Capt. Zapp

Yeah, I remember that too. Any idea if there is a setting in LO for how it is handled?

Because it's a recognised encoding character LO should handle it as per the normal coding procedure for it. However, I think you could use the Find & Replace option to change it in the document you have. I don't have anything on hand I can try it with, but F&R should work.

Replies: Vincent Berg

Vincent Berg 🚫

@Ernest Bywater

However, I think you could use the Find & Replace option to change it in the document you have. I don't have anything on hand I can try it with, but F&R should work.

I use F&R (in WORD) all the time, most often to replace those annoying spaces at the beginning of lines when you paste something into a document just before quoted text, but maybe that's just a WORD trait.

If you know HOW to format, it's not difficult formatting something exactly how you want it. The key is in knowing HOW to format a file.

Vincent Berg 🚫

@Capt. Zapp

Yeah, I remember that too. Any idea if there is a setting in LO for how it is handled?

LO handles paragraphs, as well as style definitions, just fine. But what we're discussing here is someone opening up a plain text file in LO, and then complaining that LO doesn't offer to 'enrich' the file to add all the missing components he never managed to grab.

It's easy enough to capture the text in an SOL file and copy it to MS WORD, LO or even Bear. So don't bitch that LO isn't reading your mind and compensating for what you don't know how to do.

Replies: Capt. Zapp

Capt. Zapp 🚫

@Vincent Berg

LO handles paragraphs, as well as style definitions, just fine. But what we're discussing here is someone opening up a plain text file in LO, and then complaining that LO doesn't offer to 'enrich' the file to add all the missing components he never managed to grab.

What I am saying is that if I open the txt file I download from SOL in LO, it just opens the file and has Line Feeds in place of End of Paragraph. If I go to open the same document in OO, it brings up an 'ASCII Filter Options' box where I can select Unicode (UTF-8) as the character set and set the end of paragraph style. This way all of the paragraphs end with an End of Paragraph instead of Line Feed.
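
For what it's worth, the two choices that dialog offers (the character set, and treating each line end as a paragraph end) can be sketched in a few lines of Python. This is only an illustration of the idea under my own assumptions, not how OO or LO implement their import filter, and `load_paragraphs` is a made-up name.

```python
def load_paragraphs(raw: bytes, encoding: str = "utf-8") -> list[str]:
    """Decode the raw bytes of a text file and split them into paragraphs.

    Every line break (CR/LF, bare LF, or bare CR) is treated as a paragraph
    end, and empty lines are dropped.
    """
    text = raw.decode(encoding)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return [line.strip() for line in text.split("\n") if line.strip()]
```

Decoding the same bytes with the wrong character set (for example `load_paragraphs(raw, "latin-1")` on a UTF-8 file) does not fail, but curly quotes and accented letters come out as garbage, which is the kind of incorrect formatting a wrong guess produces.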

Before I found out OO did it automagically, I DID use F&R, but I find it much easier to double-click the file and have it open the way it should instead of having to open the file then run an F&R, and then check to make sure all the paragraphs are correct.

Someone was griping in another discussion that one word-processor was seconds slower than another saying they didn't have that time to waste (silly, right?) yet doing the F&R takes a LOT longer than a few seconds, even if it is set up as a macro.

For now, I will just keep opening the 'raw' text file I get from SOL in OO, save it to odt, and then work it with LO.

Replies: Ernest Bywater Vincent Berg

Ernest Bywater 🚫

@Capt. Zapp

If I go to open the same document in OO, it brings up an 'ASCII Filter Options' box where I can select Unicode (UTF-8) as the character set and set the end of paragraph style.

While I use LO all the time, I don't know everything about it. However, there are two important things to know about LO.

1. The developers who created and worked on OO up to version 3 are the same people who started LO and have worked on it since. They stopped working on OO and started LO when Oracle owned OO and was pushing to have more and more proprietary Oracle code and Java used in OO while the developers wanted the opposite. Thus they split off. Anything OO could do up to version 3 is in LO because that's what they started with and have only improved it since then.

2. LO has a damn good set of documentation on how to use it, so try searching the LO Help system to see how you set the setting you want.

Vincent Berg 🚫

@Capt. Zapp

For now, I will just keep opening the 'raw' text file I get from SOL in OO, save it to odt, and then work it with LO.

I'd suggest opening the html or epub (which is essentially a zipped bundle of html files) versions, as it would involve less juggling and manipulation.

REP 🚫

@Ernest Bywater

And I still have that carriage return reflex when a bell sounds.

BlacKnight 🚫

@Ernest Bywater

Ayep - some people call it a line feed, way back when we used to have to duck the dinosaurs looking for dinner I was taught to call that a carriage return. At that time I used to do a lot of telex messaging.

Carriage return and line feed are actually different characters - ASCII 0x0d and 0x0a, respectively. On old-school terminals, printers, teletypes, and so on, CR returns the cursor/print head to the beginning of the line, while LF advances the screen/paper one line.

Different OSes encode newlines in different manners. DOS marked newlines with a CR/LF combination, and Windows has largely inherited this convention. Unix uses just a LF. Apple used to use just a CR, but they switched to the Unix convention with MacOS X.

This is why FTP has "binary" and "text" transfer modes. The former transfers files without altering them, while the latter automatically converts newlines from the server's convention to the client's.
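
Here is a small Python sketch that counts which newline convention a file uses and converts everything to the Unix style. The function names are invented for this example; the byte values are the ASCII codes mentioned above.

```python
CR = b"\r"  # carriage return, ASCII 0x0d
LF = b"\n"  # line feed, ASCII 0x0a


def newline_counts(raw: bytes) -> dict[str, int]:
    """Count CR/LF pairs, bare LFs and bare CRs in the raw bytes of a file."""
    crlf = raw.count(CR + LF)
    return {
        "CRLF": crlf,                # DOS/Windows convention
        "LF": raw.count(LF) - crlf,  # Unix convention
        "CR": raw.count(CR) - crlf,  # classic (pre-OS X) Mac convention
    }


def to_unix(raw: bytes) -> bytes:
    """Rewrite every newline as a single LF; pairs are replaced first."""
    return raw.replace(CR + LF, LF).replace(CR, LF)
```

Replacing the CR/LF pairs before the bare CRs matters: done the other way round, every Windows line ending would turn into two line feeds.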

Replies: Michael Loucks

Michael Loucks 🚫

@BlacKnight

Different OSes encode newlines in different manners. DOS marked newlines with a CR/LF combination, and Windows has largely inherited this convention. Unix uses just a LF. Apple used to use just a CR, but they switched to the Unix convention with MacOS X.

The most common complaint I get (I write on Mac OS with BBEdit) is that the text files have 'only super long lines for each paragraph'. Yes, they do. :-) The solution is to tell your viewer to soft wrap those lines the way my editor does! :-)

The second most common is about non-Latin characters. That solution is for them to use UTF-8, which is the encoding I use so I can mix and match Russian, Swedish, Greek, Japanese, and the 'default' Latin character set for English.

Dominions Son 🚫
Updated:

@Ernest Bywater

Ayep - some people call it a line feed, way back when we used to have to duck the dinosaurs looking for dinner I was taught to call that a carriage return. At that time I used to do a lot of telex messaging.

In a plain text file on DOS/windows, it takes two characters, a carriage return and a line feed. (hitting return in a plain text editor like notepad automatically inserts both).

The carriage return takes you back to the beginning of the line and the line feed advances you down one line.

On Unix based systems like Apple's iOS, one control character serves both functions.

I work in IT, and as a professional matter I have had to deal with Windows->Unix and Unix->Windows text file issues.

ETA: In general Unix based text editors will handle Windows text files with carriage return / line feed pairs better than a Windows based text editor will handle a Unix generated text file.

Replies: Ernest Bywater

Ernest Bywater 🚫

@Dominions Son

The carriage return takes you back to the beginning of the line and the line feed advances you down one line.

Way back when I was doing telexes the carriage return took you from where you were to the start of the next line, while the line feed took you down one line exactly where you were. So a line feed in the middle of a line had the next line start below the end of the last line. When I started to work with computers back in the early 1970s a carriage return was what you used to go to the start of the next line while a line feed was what you used when you got there if you wanted a blank line. In some parts of the fledgling industry the company people teaching users used the words as if they were interchangeable terms for the same thing. At that time every manufacturer had their own program code that you had to use to work with their gear - it was well before the days of DOS and Windows.

Replies: Dominions Son

Dominions Son 🚫

@Ernest Bywater

it was well before the days of DOS and Windows.

Unix also predates DOS/Windows and as I said, Unix uses a single control character for both functions (carriage return IIRC). Why MS had to make DOS/Windows different on this I have no idea.

Replies: BlacKnight

BlacKnight 🚫

@Dominions Son

Unix also predates DOS/Windows and as I said, Unix uses a single control character for both functions (carriage return IIRC).

Unix (including MacOS since version 10, but not Apple's earlier OS) marks newlines with just a line feed character, as I said above.

Why MS had to make DOS/Windows different on this I have no idea.

It wasn't a matter of Microsoft making DOS "different". There wasn't a real standard for them to comply with in those early days. Or, rather, there were many conflicting standards for Microsoft to choose from, and the one they chose happened to be different from the ones that other systems that survived the '80s chose.

Unix used just a line feed. Apple, the TRS-80, and the early Commodore machines used just a carriage return. IBM (for whom Microsoft originally produced DOS) used on their big iron a completely different character set (EBCDIC) that had a newline character that combined both functions. Atari systems also used a different character set with a single newline character. The various CP/M systems (and there were a bunch that you don't hear anything about anymore) used the CR/LF combination, and that's the convention that DOS adopted.

Replies: Dominions Son

Dominions Son 🚫

@BlacKnight

The various CP/M systems (and there were a bunch that you don't hear anything about anymore) used the CR/LF combination, and that's the convention that DOS adopted.

I don't know if it's well confirmed, but I've read a number of accounts that say that the original MS DOS was a derivative of CP/M, so the CR/LF used by MS DOS/Windows makes sense in that context.

Replies: richardshagrin

richardshagrin 🚫

@Dominions Son

the original MS DOS

Gates and company bought some software that mutated into the DOS system that IBM agreed to use. I am not sure whether that software was the first MS DOS or not. They bought it, they didn't write it. They amended it many times over the years. Windows may have some code left from MS DOS. You may need to be a paleontologist to determine what skeletons are buried, where.

Replies: Dominions Son

Dominions Son 🚫

@richardshagrin

Gates and company bought some software that mutated into the DOS system

By some accounts, what they bought was CP/M.

Replies: Zom

Zom 🚫

@Dominions Son

By some accounts, what they bought was CP/M.

"MS-DOS was a renamed form of 86-DOS – owned by Seattle Computer Products, written by Tim Paterson. Development of 86-DOS took only six weeks, as it was basically a clone of Digital Research's CP/M, ported to run on 8086 processors."
https://en.wikipedia.org/wiki/MS-DOS#History

REP 🚫

@Capt. Zapp

as LO does not recognize the UTF-8 format from the file

That is a different situation than what you said earlier. You said "instead of LO where I open them as UTF-8 format (Why LO doesn't ask for file type I have no idea)."

I understood the files opened and you were asking why LO didn't ask you for the file type before it attempted to open the file.

We miscommunicated.

Ernest Bywater 🚫

@Capt. Zapp

LO does not recognize the UTF-8 format from the file

File format recognition is usually based on the file extension, and sometimes on the file header info. However, the standard recognition for a file extension will overrule in most cases.

Replies: REP

REP 🚫

@Ernest Bywater

The file extension defines the structure and formatting of the file. The device attempting to extract the file's content reads the file extension and then attempts to extract the contents based on that specific type of structure and formatting.

If you were to take a .txt file and edit its file extension to .avi, the data extraction would fail.
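
As a side note, the extension is only a label on the file name, and renaming a file does not change the bytes inside it. A program can therefore also look at the content itself. Here is a rough Python sketch of that idea for plain text; it is a guess at the general approach, not a description of what LO or OO actually do, and `sniff_text_encoding` is a made-up name.

```python
import codecs


def sniff_text_encoding(raw: bytes) -> str:
    """Guess a text encoding from the bytes alone, ignoring the file name."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # the file starts with a UTF-8 byte order mark
    try:
        raw.decode("ascii")
        return "ascii"      # nothing but 7-bit characters
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"      # valid UTF-8 containing non-ASCII characters
    except UnicodeDecodeError:
        return "unknown"    # some other encoding, or not text at all
```

A pure ASCII file is also valid UTF-8, so for such files the choice of character set makes no visible difference; it only matters once curly quotes, accents or other non-ASCII characters appear.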

Replies: Ernest Bywater

Ernest Bywater 🚫

@REP

The file extension defines the structure and formatting of the file.

True, but some file types have additional options in the header information that allow a few extras, though not many. This is not an option for most extensions.

sejintenej 🚫
Updated:

file with those words in it.

Has it passed your antivirus? I searched for it and from the list chose the "cnet" download option to read what they had to say; it froze my laptop. Went back to the results page and chose the DocFetcher site and it froze again.

I use Avast, AVG and McAfee, so OK, it is not a virus per se. I have used cnet in the past with no problems whatsoever, but when two subject-linked sites cause freezes, I worry.

Replies: ChiMi

ChiMi 🚫

@sejintenej

I used the http://docfetcher.sourceforge.net/ sourceforge site. No virus or malware detected.

I got the impression from sunkuwan's reply to my question that it works on your hard drive.

If you are thinking of using it on SOL story files, it may not work for it would have to search SOL's data storage locations.

Yes, only on hard drive files. But I thought he meant he had included the SOL tags in the files on his hard drive.

Vincent Berg 🚫
Updated:

Back when I was working as a software engineer, we had a sophisticated tool we used to build converter tools (like those to parse commands into computer code) which was phenomenal at this. It identified parts of speech (for writing text to speech tools), and could identify phrases under a wide variety of problems.

But when the company went belly up years after I dropped out of the industry, nothing ever happened to the tool, and I've never seen anyone even attempt to develop anything similar.

As much as we rant about how smart modern technology is, sometimes the information we drop by the wayside in our rush to success is astounding.

I'm not sure about tagging my files, since I use them for other things (i.e. submitting to self-publishing sites), but this tool sounds handy for catching all the times I refer to old protagonists. With 16 different books, it happens fairly often. Except, then I'd have to compile a list of all the names I've used throughout the years, including "Slavsin" and "Quichoq" and "Sujub-eum Uesuam".

Replies: ChiMi

ChiMi 🚫

@Vincent Berg

Yeah, it is great for authors who have many different files and just want a quick check if the name they are thinking of using was used extensively in prior works.

Just imagine, you write something with a unique name for a country or place and forgot you used that name a decade earlier in a different novel.
*quick, let's just say it is an homage* :D

Replies: Vincent Berg

Vincent Berg 🚫

@ChiMi

Yeah, it is great for authors who have many different files and just want a quick check if the name they are thinking of using was used extensively in prior works.

I almost never reuse the same name for different characters, but I frequently substitute names, so I'll go for 15 chapters with no slip up, and suddenly have two chapters where I use the wrong character name around four or five times. My editors usually catch it, but it sure pisses me off!

So far, I've NEVER reused a protagonist's name, though I do recycle my secondary characters' names (ex: Alice, Betty, John, etc.). Often, in the heat of passion during particularly exciting chapters, muscle memory comes into play and I'll use ancient names I've spent years on, rather than the one I've only been working on for a couple of months.

Vincent Berg 🚫

I save every story I read on SOL that I like to my hard drive, just in case something happens where the story is taken down. I started doing that after Young Thinker pulled all of his stories. I currently have over 3,500 stories saved although there is some duplication between txt, rtf, and odt files.

With Premiere access, you could download the complete stories in a variety of formats. If you're tagging them, you'd probably want them in .txt to .odt, but it would be considerably less work, though you might miss a few never-ending series.

richardshagrin 🚫

Doc Fetcher sounds like a character in a hospital story.

Replies: REP Vincent Berg

REP 🚫

@richardshagrin

Or a Gofer who brings you documents.

Vincent Berg 🚫

@richardshagrin

Doc Fetcher sounds like a character in a hospital story.

An English doctor. Wasn't there actually a "Doc Fletcher" English series at some point? It might actually have been a book series. (Google returns a "Doc Fletcher", but he's a kayak travel guide writer, not a novelist. There's no mention of any English dramas or stories.) There's also James Fletcher, the author of One Hundred and One Dalmatians, who actually had a PhD in Botany.
