Quick, smart searching Watchtower PDFs, 1879 thru 1949

by Fatfreek 11 Replies latest watchtower bible

  • Fatfreek
    Fatfreek

    I, like many here, have searchable PDF files of older Watchtower magazines.

    The Russell series, 1879-1916, is available free to download from Freeminds.

    The other years, the 1920 thru 1949 Watchtowers, come in three sets (available for $30 each online). For speedier performance, I've copied these to my hard drive.

    All well and good. However, searching for something on any of these is very slow and crude. Anyone who is used to Google advanced searches knows what I'm talking about.

    For example, when inside a PDF file you can enter a search string, then sit back and wait, and wait. Let's say you want to search for articles by Russell which speak of whether he taught resurrection for the ancient Sodomites. Using the PDF file, one way is to search for the string "sodom". At each hit, you visually search the context of that mention and whether or not Russell was talking about their destiny or something else. Very laborious. It takes several hours for a single string search if you proceed through the entire file this way.

    Using, however, a top notch search engine, like Google (on the internet) -- or even the Watchtower CD Library (on your local computer) -- you enter sodom* resurrect* in the search field and you don't have to wait long. [sodom* finds sodom, sodomite, sodomites, etc.; resurrect* finds resurrect, resurrects, resurrection, resurrections, resurrected, etc.] Chances are good that each hit will yield a fairly good article.
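To make the sodom* resurrect* idea concrete, here is a minimal Python sketch of that kind of query: every term must appear somewhere on a page, and a trailing * matches any word ending. It assumes the pages have already been extracted to plain text. The function names and the three sample sentences are made up for illustration and are not taken from any Watchtower or from any real search engine.

```python
import re

def wildcard_to_regex(term):
    """Turn 'sodom*' into a pattern matching sodom, sodomite, sodomites, ..."""
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    # No wildcard: match the term as a whole word only.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

def pages_matching(pages, query):
    """Return the page numbers on which every term of the query occurs."""
    patterns = [wildcard_to_regex(term) for term in query.split()]
    return [number for number, text in pages.items()
            if all(pattern.search(text) for pattern in patterns)]

# Hypothetical sample text, one entry per page.
pages = {
    1: "The Sodomites will share in the resurrection.",
    2: "Sodom was destroyed by fire.",
    3: "The resurrection of the dead.",
}

print(pages_matching(pages, "sodom* resurrect*"))
```

Only page 1 contains both a sodom* word and a resurrect* word, so it is the only hit. This is a plain scan of every page, so it would still be slow on big files; it only shows what the two-term query means.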

    Note that you're no longer limited to a single search string. In the above example I've used two strings. There are far more features than that which advanced search capabilities allow.

    What's my point? It's that I would like to do smart searches on these PDFs but don't know how to do them. If the technology (especially free) exists, I would like to know about it, and there are others who'd like that as well.

    What's available? Google Desktop, for one, is free for the download. After installation, GD goes through your hard drive, perhaps taking days to build its index file(s). When finished, GD then allows you intelligent searches that return results nearly instantly. However, the files cannot be larger than 10,000 words. It scans large files but just doesn't go deeper into them, beyond its stated limit. It also has limited success with files of PDF format.
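The reason an indexing program can answer almost instantly after a long one-time scan is that it builds a word-to-location table up front. The following Python sketch shows the general idea (an inverted index). It is not how Google Desktop itself works internally, which I don't know; the function names and sample pages are made up for illustration.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Slow step, done once: map every word to the set of pages containing it."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_number)
    return index

def lookup_prefix(index, prefix):
    """Pages containing any word that starts with the given prefix."""
    prefix = prefix.lower()
    hits = set()
    for word, page_numbers in index.items():
        if word.startswith(prefix):
            hits |= page_numbers
    return hits

def search(index, prefixes):
    """Fast step: pages where every prefix matches at least one word."""
    results = [lookup_prefix(index, prefix) for prefix in prefixes]
    return sorted(set.intersection(*results)) if results else []

# Hypothetical sample text, one entry per page.
pages = {
    1: "The Sodomites will share in the resurrection.",
    2: "Sodom was destroyed by fire.",
    3: "The resurrection of the dead.",
}
index = build_index(pages)
print(search(index, ["sodom", "resurrect"]))
```

The search step only touches the list of distinct words, not the full text, which is why it stays quick even when the original files are large.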

    There are some conversion programs, PDF to plain text. The one I downloaded yielded terrible results on the sample file I used -- most large words were bro ken in to syl la bles.

    That still doesn't solve the size problem as our Watchtower files are some 50 Mbytes or larger.
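One possible workaround for the size limit, assuming a usable plain-text conversion exists, is to cut the text into pieces that each stay under the 10,000-word figure mentioned above. This is only a sketch: the function name is made up, and whether a given indexer then reads each piece in full is an assumption that would need testing.

```python
def split_into_chunks(text, max_words=10_000):
    """Split text into pieces of at most max_words words each."""
    words = text.split()
    return [" ".join(words[start:start + max_words])
            for start in range(0, len(words), max_words)]

# Stand-in for a converted volume: 25,000 dummy words.
sample_text = "word " * 25_000
chunks = split_into_chunks(sample_text)
print(len(chunks), [len(chunk.split()) for chunk in chunks])
```

Each chunk could then be saved as its own small file. Note that this simple version discards the original line breaks and can cut an article in two at a chunk boundary.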

    There are indexing programs other than Google Desktop out there. There may be other techniques. Perhaps you've found one that can help. If so, please share what you know.

    Len Miller

    http://elfurl.com/x8byd
  • bereanbiblestudent
    bereanbiblestudent

    I used to work with FineReader Pro. It has a very high recognition rate, even on poor-quality files or photos of book pages. It can output plain text or text behind the scan, in PDF and other formats.

    Another desktop search program is Copernic Desktop Search, which works fast once it has indexed your files. I have not tried other programs like Nero desktop search, so I cannot tell you about them.

    The text behind files from research application is most of the time not so good, but you can improve that with FineReader Pro.

  • Fatfreek
    Fatfreek

    The text behind files from research application is most of the time not so good ...

    What do you mean? If it's what I think you mean, it calls into question the worth of doing a simple string search inside them -- further, doing any type of research within them at all. Let me illustrate. The following is a paragraph I copied from some 1920 Watch Tower with Adobe 8 reader, then pasted here.

    Some of the brethren ha~e held that the Watch Tower
    Bible and Tract Soc’iety is the ehamwl [channel - Fatfreek fix] used by the Lord
    for disttensing or t,’ansmitting the message of pre~nt
    truth to the household of faith. Others have taken
    exception to this statement and have insisted that the
    Society is assumin~ a position that is un-Seriptural and
    contrary to the divine arrangement. We think the dtifterence
    of opinion has been due entirely to a misund, erstan<
    ling, tIen<’e we here e<mdder the question with a
    hope of clarifying it.

    As you can see, it is virtually unreadable despite my fix of a totally unreadable string. Apparently, Adobe 8 OCR ability is not very strong.

    Suppose, however, that in using Adobe 8, I was searching for a string like "sodom". How confident can I be that it finds each instance if it depends on the OCR accuracy of the reader? It doesn't look as if I can.
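One way to partly work around OCR damage is approximate (fuzzy) matching, where a word counts as a hit if it is close enough to the search term. Below is a minimal sketch using Python's standard difflib module. The function name and the 0.8 cutoff are arbitrary choices for illustration, and the sample line is taken from the garbled paragraph above.

```python
import difflib
import re

def fuzzy_find(text, target, cutoff=0.8):
    """Return the words in text whose similarity to target is at least cutoff."""
    hits = []
    for word in re.findall(r"\S+", text):
        ratio = difflib.SequenceMatcher(None, word.lower(), target.lower()).ratio()
        if ratio >= cutoff:
            hits.append(word)
    return hits

ocr_line = "Bible and Tract Soc’iety is the ehamwl used by the Lord"

print(fuzzy_find(ocr_line, "society"))  # lightly damaged word
print(fuzzy_find(ocr_line, "channel"))  # badly damaged word
```

With these settings the lightly damaged "Soc’iety" is still found, but "ehamwl" is too far from "channel" to match. So fuzzy matching can recover some misses, not all of them, and lowering the cutoff would bring in false hits.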

    I'm supposing that FineReader Pro 9.0 is better at OCR than Adobe 8. However, I'm not about to spring for $399 just to find out.

    Len

  • fokyc
    fokyc

    I find ABBYY FineReader 5.0 Sprint Plus does a very good job, although it dates from 2002; there is now a version 9.0.

    Another well-known poster on the forum also uses one of ABBYY's products.

    Their website has some good information which may help. They are experts in the field of OCR.

    http://www.abbyy.com/

    fokyc

  • bereanbiblestudent
    bereanbiblestudent

    I mean the same as you. It is supposed to be searchable, but I find that they just don't care: they throw it into the OCR process and do not check the result.

    You can try version 9 for about a month. In that time you might be able to do a lot of work. If you received a Sprint version with a scanner, then it is about 200 dollars, as far as I know.

  • nicolaou
    nicolaou

    Not sure if this is what you're looking for but you can buy a DVD for £10 which has all Watchtowers in PDF format from 1879 - 1949 (along with Golden Age, Informant and other stuff) on a single, searchable disc. I'd imagine you could copy it to your hard drive just like a CD.

    http://cgi.ebay.co.uk/Watchtower-Jehovahs-Witnesses-Magazines-that-Motivate_W0QQitemZ200226637283QQihZ010QQcategoryZ109012QQssPageNameZWDVWQQrdZ1QQcmdZViewItem

    - edited because formatting in Opera sucks -

  • Fatfreek
    Fatfreek

    BereanBibleStudent: You can try version 9 for about a month. In that time you might be able to do a lot of work.

    I just visited their site. They may have discontinued that trial version as I could find no reference to it. That would have been a good opportunity. Let me know if I missed something there.

    Nicolaou: Not sure if this is what you're looking for but you can buy a DVD for £10 which has all Watchtowers in PDF format from 1879 - 1949 (along with Golden Age, Informant and other stuff) on a single, searchable disc.

    That is a good buy but I already have most of that. No Golden Age, or Consolation, or WT years 1917-1919. I have no reason, however, to believe that a string search of these -- while using the free Adobe reader -- would yield any difference in valid hits.

    Len

  • AuldSoul
    AuldSoul

    All the Watchtower publications exist in electronic format for use with the Watchtower Library search engine. That's why I know the Governing Body has no excuse for lying about their old doctrines; they can quickly and easily check. A friend of mine had a copy of it on a laptop when he visited me one time. It had EVERYTHING. He offered me a copy of it, but I was a good little dubbie at the time and turned him down. You can imagine my regret.

    He held a sensitive position in IT at the time, and had access to the inner library for a while (the one they keep under lock and key, where even the Bethelites are not allowed to freely tread). There is more specific information about his assignment explaining how he got the library to begin with, but I wouldn't want to out him.

  • Fatfreek
    Fatfreek

    What an experience, AuldSoul. Yes, I'll bet you're kicking yourself today for declining that offer.

    It makes total sense that they've done that with all their pubs so they (their help desk?) can defend against the malicious misquotes that have made their way onto the internet.

    Hey, their digitized publications (at least their Watchtowers) are made public to the rank and file, with copies going back to 1950. They obviously have the technology. Barbara confirms they have all publications in their library. Why not go all the way, at least for the privileged few?

    Len

  • Fatfreek
    Fatfreek

    Perhaps I can recap my problem, that is, if I understand it correctly. The following paragraph, which I copied from the April 1, 1920 Watch Tower, can serve to illustrate. As you can see, it's obviously full of errors.

    Some of the brethren ha~e held that the Watch Tower
    Bible and Tract Soc’iety is the ehamwl used by the Lord
    for disttensing or t,’ansmitting the message of pre~nt
    truth to the household of faith. Others have taken
    exception to this statement and have insisted that the
    Society is assumin~ a position that is un-Seriptural and
    contrary to the divine arrangement. We think the dtifterence
    of opinion has been due entirely to a misund, erstan<
    ling, tIen<’e we here e<mdder the question with a
    hope of clarifying it.


    In that short paragraph of some 86 words, the following 12 words (you can only distinguish many of them as you read the context of the page within your Adobe reader) are misspelled:

    have, Society, channel, dispensing, transmitting, present, assuming, un-Scriptural, difference, misunderstanding, hence, consider.

    That's a 14% failure rate.
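For anyone who wants to repeat the count on another page, the arithmetic behind that figure is simply misread words divided by total words. A tiny Python sketch, where the function name is made up for illustration:

```python
def failure_rate(misread_words, total_words):
    """Fraction of words that the OCR text layer got wrong."""
    return misread_words / total_words

rate = failure_rate(12, 86)   # 12 misread words out of some 86
print(f"{rate:.1%}")          # prints 14.0%
```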

    So what? To test its importance I searched for those same strings on that April 1, 1920 Watch Tower page using my Adobe 8 reader. Not one was found.

    That means if you have one of those Watchtower CDs covering 1920 thru 1949, then, based on my unscientific analysis, your string searches also have a potential failure rate of 14%. That doesn't give me the warm fuzzy feeling I like to enjoy.

    If you search for words like Russell, 1874, 1914, pyramid, sodom, resurrect, etc., you may also have a failure rate of some 14%. There's no guarantee they'll be any more findable than those 12 words above.

    As to the original premise of this thread -- finding some indexing software to allow for quick and smart searches of these Watchtower PDFs -- it looks as if doing that on the cheap is pretty much hopeless. It all seems dependent on the OCR capability of the software that reads the PDF file and of the software that creates the index.

    Len
