I'm banging on an Oracle database with a table holding 200,000,000 records. Just getting a count of those bad boys takes several seconds.
It got me to wondering about Google. They index 8 billion pages. How many gazillion records must exist describing all those pages? Maybe I could see sending in a request and getting the answer back in a coupla weeks. But they find the pages that contain my search terms (and exclude the ones that I put a minus sign in front of) and return a nicely formatted page complete with ads in less than a second. How are they doing that?
I understand that the technology's been around for years and Google's not doing anything especially new. But I'm still curious. I can't believe that just having more computers in your server farm somehow lets you search that much faster. When it comes down to it, you still have to go hit an index, crawl across it pulling in all the matches, join those to the matches for the other terms in the query, fetch back the title and description for the results on page 1, and come up with a reasonable guess of how many more pages there are. There's a ton of stuff happening there, and it's insanely fast.
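Here's a toy sketch of what I imagine the core lookup looks like — an inverted index mapping each term to a sorted postings list of document IDs, with intersection for the query terms and subtraction for the minus-sign exclusions. All the data and names here are made up for illustration; I have no idea what Google actually does internally.

```python
# Toy inverted index: term -> sorted postings list of doc IDs.
# The documents below are invented just to demonstrate the idea.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs are lazy",
}

index = {}
for doc_id, text in docs.items():
    for term in set(text.split()):
        index.setdefault(term, []).append(doc_id)
for postings in index.values():
    postings.sort()

def search(include, exclude=()):
    """AND together the postings for `include` terms,
    then drop docs matching any `exclude` term."""
    result = None
    for term in include:
        postings = set(index.get(term, []))
        result = postings if result is None else result & postings
    result = result or set()
    for term in exclude:
        result -= set(index.get(term, []))
    return sorted(result)

print(search(["quick", "brown"]))         # -> [1, 3]
print(search(["lazy"], exclude=["dog"]))  # -> [3]
```

Obviously the real thing must keep postings sorted and merge them without materializing sets, shard the index across machines, and rank as it goes — this is just the AND/NOT skeleton of a query.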
Does anybody know how they pull it off?
Dave