Tuesday, March 06, 2007

So much data, relatively little space

So much data, relatively little space:
"A new study that estimates how much digital information the world is generating (hint: a lot) finds that for the first time, there's not enough storage space to hold it all. Good thing we delete some stuff.

The report, assembled by the technology research firm IDC, sought to account for all the ones and zeros that make up photos, videos, e-mails, Web pages, instant messages, phone calls and other digital content zipping around. The researchers also assumed that on average, each digital file gets replicated three times.

Add it all up and IDC determined that the world generated 161 billion gigabytes -- 161 exabytes -- of digital information last year. ..."

In 2003 the estimate was 5 exabytes (5 followed by 18 zeroes), and various groups are pushing for a law requiring ISPs to retain nearly everything that users do online.

Just to put things into perspective, in my job we are working with storage arrays of up to 50 terabytes, and the need continues to rise. And that is just in our group (support for scientific research). The email group has their own storage needs, in addition to the thousands of desktop computers and laptops with their internal and external drives. I just did a quick mental calculation of my internal and external drives, and they add up to about a terabyte of storage.

So what does this mean?

Each factor of 1,000 bytes gets its own prefix:

Kilo 1,000
Mega 1,000,000
Giga 1,000,000,000
Tera 1,000,000,000,000
Peta 1,000,000,000,000,000
Exa 1,000,000,000,000,000,000
Zetta 1,000,000,000,000,000,000,000
Yotta 1,000,000,000,000,000,000,000,000
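
To make the table concrete, here is a small Python sketch that picks the largest prefix that fits a given byte count. The function name describe_bytes and the PREFIXES list are my own inventions for illustration, and it assumes the decimal (factor of 1,000) definitions listed above rather than the binary ones.

```python
# Decimal prefixes from the table above; each step is a factor of 1,000.
PREFIXES = ["", "kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def describe_bytes(n):
    """Express n bytes using the largest decimal prefix that fits (hypothetical helper)."""
    index = 0
    # Climb the prefix list while n is at least one unit of the next prefix.
    while index < len(PREFIXES) - 1 and n >= 1000 ** (index + 1):
        index += 1
    value = n / 1000 ** index
    return f"{value:g} {PREFIXES[index]}bytes"

# IDC's figure: 161 billion gigabytes.
idc_estimate = 161 * 10**9 * 10**9
print(describe_bytes(idc_estimate))   # 161 exabytes

# The 2003 estimate and our storage arrays at work.
print(describe_bytes(5 * 10**18))     # 5 exabytes
print(describe_bytes(50 * 10**12))    # 50 terabytes
```

So "161 billion gigabytes" and "161 exabytes" really are the same number: a billion is nine zeroes, a gigabyte is another nine, and together that makes the 18 zeroes of an exabyte.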

Exa-, zetta-, and yottabyte storage on a single system is not feasible at this time, but that may change in the next few years (months? days?). According to the Wikipedia entry on "Exabyte", 64-bit computer architecture has an address space of 16 exabytes (counted in binary units, that is, 2^64 bytes).
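
The 16 exabyte figure is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch, and the variable names are mine. It assumes byte-addressable memory and a full 64-bit address, which real hardware may not actually implement.

```python
# Number of distinct byte addresses with a full 64-bit address.
address_space = 2 ** 64
print(address_space)  # 18446744073709551616

# Binary "exabytes" (2^60 bytes each), which is where the figure of 16 comes from.
binary_exabytes = address_space / 2 ** 60
print(binary_exabytes)  # 16.0

# Decimal exabytes (10^18 bytes each), matching the prefix table.
decimal_exabytes = address_space / 10 ** 18
print(round(decimal_exabytes, 1))  # 18.4
```

In other words, the address space is 16 exabytes if you count in powers of 1,024, and a bit over 18 exabytes if you count in powers of 1,000 as the table does.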

As chilling as the thought of everyone's web browsing history being open to subpoena is, more data are being generated than can feasibly be stored. Much of it goes into the bit bucket and is lost.

Now it's time to go to work and deal with storage issues, albeit more on a gigabyte scale.
