The Fragility of Digital Data & The Impact ...

by SYN - 18 Replies

  • SYN

    Didn't really know which section of the forums to post this in, so I stuck it in "Friends"!

    The article linked at the bottom of this post makes some interesting points - according to one study, only 0.003% of our data today isn't stored on some form of computer.

    Needless to say, this is a somewhat disturbing statement. Having had loads of experience with computing hardware, I've come to realize that most of the hardware out there is actually rather unreliable - hard drives, for instance. Anything that has to spin at over 3,000 RPM for a couple of years is not suited to long-term data storage. Also, the magnetic flux patterns that store most of our data today could be wiped out by a single big magnet (not counting CDs, of course).

    Even CD-R media, purportedly the saviour of mass data storage (and DVD-R/RW along with it), only last about a century at most - and that's an optimistic estimate. What will future historians find when they dig up our civilisation's remains? Nothing. All our data will have dissipated like motes of dust before the wind. Our society today is defined by information - try living for a year without your credit card or bank account, and without touching any computer. It's virtually impossible.

    This all brings me to a related question: what is being done to back up this place? This DB? I'm not being an upstart, I'm just curious to find out how the words we spend so much time writing are being backed up... and on a similar note, out of curiosity, how large is the information we've stored on this forum in total? That's a question for Simon.

    As usual, there is a rather simple solution to this problem. All we need to do is build a generalized cross-computer "fuzzy storage" system - others have written about this idea before. It would be a system in which your computer is connected to a very large number of other computers by an extremely big pipe - something like today's T1, or perhaps faster, would be required for decent latencies across such a large network. This network would specialize in abstracting the storage of data: a single large file, for instance, would be stored in a distributed fashion on a thousand other computers, thoroughly scrambled and encrypted, with only the computer that owns it having access to the data. On a corporate network the administrators might have access to all the files, but if such a system were implemented properly, using some sort of hybrid public-key scheme, it would be fairly simple to ensure anonymity, user control over files, and very good data redundancy. For fairness, each person would only be allowed to store a certain fraction of the total storage capacity THEIR computer brings to the network. (Perhaps a service would exist that sells people super-redundant disk space - such services do in fact already exist, but not in the super-redundant format proposed here...)
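
    Here's a rough sketch in Python of roughly what I'm imagining - the fragment size, peer names and the hash-based "encryption" are all just placeholders I made up to make the idea concrete, not a real design:

        import hashlib, os, random

        FRAGMENT_SIZE = 64 * 1024  # 64 KB fragments (arbitrary choice)

        def keystream(key, nonce, length):
            # Pseudo-random keystream from key+nonce via SHA-256 blocks.
            # (A stand-in for a real stream cipher - NOT secure, illustration only.)
            out = b""
            counter = 0
            while len(out) < length:
                out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
                counter += 1
            return out[:length]

        def fragment_and_encrypt(data, key):
            # Split data into fixed-size fragments and mask each with its own keystream.
            fragments = []
            for i in range(0, len(data), FRAGMENT_SIZE):
                chunk = data[i:i + FRAGMENT_SIZE]
                nonce = os.urandom(16)
                mask = keystream(key, nonce, len(chunk))
                fragments.append((nonce, bytes(a ^ b for a, b in zip(chunk, mask))))
            return fragments

        def scatter(fragments, peers, copies=3):
            # Assign each fragment to `copies` distinct peers, chosen at random.
            return {idx: random.sample(peers, copies) for idx in range(len(fragments))}

        # Example: a 1 MB file scattered across ten hypothetical peers, 3 copies each.
        data = os.urandom(1000000)
        frags = fragment_and_encrypt(data, os.urandom(32))
        plan = scatter(frags, peers=["peer%d" % i for i in range(10)], copies=3)
        print(len(frags), "fragments; fragment 0 goes to", plan[0])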

    The core of this whole concept is that each computer follows enough levels of indirection that the sheer amount of data flooding the network makes it virtually impossible to intercept transmissions and decrypt them. By the very nature of its distribution, such a network would already make tracking down all the fragments of a file very difficult - only the central algorithm on everyone's computer, primed with the correct one-way public key, would be able to integrate and maintain a virtual filesystem on the network. Mathematics like this is quite beyond my grasp - but no doubt somebody smart at MIT can figure out a workable way to accomplish all of this with minimum overhead. We already have journalled filesystems - this just distributes the journalled filesystem across a giant network of computers, leading to ultra-high redundancy data storage. Just like the Internet, if there are enough nodes, and a sufficiently high number of redundant copies of each file fragment are made, there will be virtually no way to destroy the data unless you destroy the entire network or a very large region of it.
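
    And reading the file back might look something like this - again just a toy with made-up peer names, standing in for the "central algorithm" doing proper lookup and decryption:

        # For each fragment, try its replicas in turn; give up only if every copy
        # of some fragment is unreachable.
        def fetch(fragment_id, replicas, online):
            for node in replicas:
                if node in online:
                    return ("data-for-%d" % fragment_id, node)  # stand-in for real bytes
            return None

        def reassemble(placement, online):
            parts = []
            for fragment_id in sorted(placement):
                result = fetch(fragment_id, placement[fragment_id], online)
                if result is None:
                    raise IOError("fragment %d lost: all replicas offline" % fragment_id)
                parts.append(result[0])
            return parts

        placement = {0: ["peer1", "peer4", "peer7"],
                     1: ["peer2", "peer5", "peer8"],
                     2: ["peer3", "peer6", "peer9"]}
        online = {"peer4", "peer5", "peer9"}  # most peers happen to be down
        print(reassemble(placement, online))  # still succeeds: one live copy of each fragment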

    Unfortunately, all of this piggy-backs on the technology we have today, such as magnetic hard drives and flash memory. These media are notorious for their unreliability - they are far from suitable for very long-term data storage and archival. However, the Abstracted Redundant Data Network (ARDN) (OH I love making up acronyms!) would give us an entry point to truly redundant data storage. Having abstracted and fuzzified the location of our data with ARDN, all we need to do now is archive the changes made to the data on the network, bit by bit, onto a more permanent medium, such as a holographic storage device. The technology to do this is still being developed - what it would effectively mean is that the further back you want to go in the data's history, the more holocubes you'd have to process. You'd have to unfold the history like flower petals, until eventually you reach the very first modification. Naturally this raises privacy issues - the data would be laid bare for all to access!
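
    The "unfolding" could be modelled as simply as this - a toy version history, nothing holographic about it, just to show how reading an older version means processing more archive entries:

        # The archive keeps the current state plus one reverse delta per change.
        current = {"greeting": "hello, 2003"}

        reverse_deltas = [                 # newest change first
            {"greeting": "hello, 2002"},   # step back one version
            {"greeting": None},            # step back again: the key didn't exist yet
        ]

        def version_at(steps_back):
            # Walk backward `steps_back` changes from the current state.
            state = dict(current)
            for delta in reverse_deltas[:steps_back]:
                for key, old_value in delta.items():
                    if old_value is None:
                        state.pop(key, None)
                    else:
                        state[key] = old_value
            return state

        print(version_at(0))   # {'greeting': 'hello, 2003'}
        print(version_at(1))   # {'greeting': 'hello, 2002'}
        print(version_at(2))   # {}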

    But not if the ARDN implements the fragmentation process properly. Of course, there is no such thing as an unbreakable encryption scheme, especially now that quantum processors are being theorized about and researched - but the level of protection ARDN would afford would be very high indeed, far higher than today's average unencrypted FAT32, NTFS, ext3 or ReiserFS filesystem!

    So what do you guys think about this? Am I totally OT? Am I just a smartass with too much time on his hands? Speak up!

    http://shift.com/content/web/385/1.html

  • funkyderek

    I'm making a start by printing out the entire Internet right now! When I'm done, I'm going to laminate all the pages

  • Valis

    Definitely a smartass..*L* Syn, I think one of the new forms of storage will be living memory chips that replicate themselves along w/the data that they hold. Kind of like a biological BIOS, if you will. I think one of the biggest probs w/your idea is bandwidth, and always will be. Redundancy is always good, but combine that w/the admin/control-freak mentality and this becomes quite a daunting idea to implement. Cool to think about, though.. BTW, I have it on good authority that Simon copies each post onto a big roll of toilet paper and stores them in a fireproof vault, right next to his commemorative gold-plated Russell Pyramid.

    Sincerely,

    District Overbeer

  • SYN
    I'm making a start by printing out the entire Internet right now! When I'm done, I'm going to laminate all the pages

    There aren't enough trees in the WORLD! But enjoy it!

    Syn, I think one of the new forms of storage will be living memory chips that replicate themselves along w/the data that they hold. kind of like a biological BIOS if you will

    Now that's an interesting idea I didn't think about. While I was writing the post, DNA-style data storage did occur to me, but I ditched it in favour of shiny holocubes! And yes, bandwidth IS expensive.

  • Xander
    ...leading to ultra-high redundancy data storage. Just like the Internet, if there are enough nodes, and a sufficiently high number of redundant copies of each file fragment are made, there will be virtually no way to destroy the data unless you destroy the entire network or a very large region of it

    Riiiight...

    Sorry, but I can't imagine a system large enough (with enough overhead) to make enough duplicates of EVERY BIT OF DATA on EVERY SINGLE COMPUTER to ensure that data corruption is not a problem.

    I mean, really, how many computers out of your control would you want to put a bit of data on? 100? And how much data on each? 100 KB, maybe? That means, to store a typical Word document (1 MB = 1,000 KB) for me alone, I'd need the file split into 10 fragments of 100 KB, each duplicated 100 times for data integrity - that's 10,000 KB per fragment, or 100,000 KB total. So, what now takes up 1 MB on my disk will take up 100 MB distributed? YUCK!
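
    Spelled out in Python with the same round numbers (1 MB = 1,000 KB, 100 KB fragments, 100 copies of each fragment - this is just the arithmetic above):

        file_kb = 1000       # the 1 MB Word document
        fragment_kb = 100    # fragment size
        copies = 100         # replicas of each fragment

        fragments = file_kb // fragment_kb            # 10 fragments
        total_kb = fragments * fragment_kb * copies   # stored network-wide
        print(fragments, "fragments,", total_kb, "KB =", total_kb / 1000.0, "MB")
        # -> 10 fragments, 100000 KB = 100.0 MB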

    And what of larger, critical files? You NEED every single byte of every single file in your Windows directory, or your OS won't start. How much replication do you want on THAT data?

    Sorry, I think this is just a bad idea. Kind of defeats the whole purpose of having PCs - you're basically reducing them to 'fat clients' at best.

  • Xander

    And, anyway, why are you assuming our civilization will have 'remains'? Planning on our destruction any time soon?

    Cause if not, the data isn't going anywhere. Yeah, disks fail. Big deal, the data is backed up. And everyone, everywhere, generally upgrades their disks LONG before the old ones fail. Where does the important information go?

    It's still there, just moved to the NEW disks.

    That'll keep happening forever. There is no practical limit on the amount of data humanity can store digitally - so we'll just keep accumulating all our knowledge onto ever-increasing storage media.

  • SYN

    What I meant was that the data would be redundantly backed up a few times (not once on EACH computer!), like maybe 3 or 4 redundant copies would exist. The more redundant you wanted your data to be, the more disk space it would need, is all. This system just distributes the data and makes an event like an asteroid impact less likely to nail all your data in one fell swoop if you happen to be in the asteroid's way!
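
    With those numbers the overhead is just linear in the number of copies - same toy arithmetic as the 100-copy example above:

        file_kb = 1000                    # the same 1 MB example file
        for copies in (3, 4, 100):
            total_kb = file_kb * copies   # whole-file overhead scales with the copy count
            print(copies, "copies ->", total_kb / 1000.0, "MB stored network-wide")
        # 3 copies -> 3.0 MB, 4 copies -> 4.0 MB, versus 100.0 MB at 100 copies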

    I'm thinking about the possibility of an extinction-level event like the one that got the dinosaurs (apparently), and what our survivors would find, see? Plus, this way you wouldn't have to worry about exploding hard drives (a personal "experience" I've had with a Quantum Fireball that truly lived up to its name).

  • SYN

    LMFAO @ "Fat Clients", OK, that's going into my vocab STRAIGHT AWAY!

  • Xander

    What?

    That's what they're called!

    From the 'Lycos Tech Glossary':

    "A thin client is a network computer without a hard disk drive, whereas a fat client includes a disk drive."

    like maybe 3 or 4 redundant copies would exist

    What I was saying is that 3 or 4 copies are NOT ENOUGH if I have no control over the computers in question. StorLoc1 decides to turn his computer off for the night. StorLoc2 is a server that's being taken down for an upgrade. StorLoc3 is a laptop somewhere that just crashed. StorLoc4 is behind a downed T1 connection.

    And tah-dah! Suddenly, my OS won't start (as an example).
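
    To put rough numbers on it: if you assume each storage node is independently online with probability p (generous - real outages aren't independent), the chance that every one of your r copies is unreachable at once is (1 - p)^r:

        def chance_all_replicas_down(p_online, replicas):
            # Probability that no copy is reachable, assuming independent nodes.
            return (1.0 - p_online) ** replicas

        for p_online in (0.5, 0.9, 0.99):
            for replicas in (3, 4):
                failure = chance_all_replicas_down(p_online, replicas)
                print("p=%.2f, copies=%d: %.4f%% of reads find no copy"
                      % (p_online, replicas, failure * 100))
        # With flaky home PCs (p=0.5) and 4 copies, about 6.25% of reads fail -
        # far too often for files your OS needs just to boot.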

    Distributed data storage - and distributed or 'online' applications - are just terrible, terrible, BAD ideas.

    And how long, really, do you think it would take the RIAA to get the government to use the capabilities it would now have to simply delete all MP3 files from existence? Is it an MP3? Oops, doesn't store right. Bye-bye. Plus, the government has all kinds of laws governing how much encryption is 'allowed' for data (basically, they must always be able to crack any encryption scheme that is exported) - nothing will change here.

    Talk about 'Big Brother watching'!

  • SYN

    Well, these are problems! Actually, I forgot about the strong-encryption thing... but don't you think it would be much cheaper to rent space from a service provider or something than to have to worry about your hard drive exploding all the time?

    Another thing I've realized that will make this a problem is that somebody could simply wiretap the incoming fragments on YOUR computer somehow and have access to all your files.

    Plus, OS files would have to be stored locally to start the wheel turning.

    Hey, it was just an idea
