Storage: Wasted Storage Clogs Servers

Sun Microsystems reports that only 30% of enterprise storage space is being productively used. The rest consists of seldom or never accessed files and unallocated space. Redundant copies of documents such as annual reports, project reviews and spreadsheets are often scattered across multiple storage locations.

Part of the problem is that there is often no composite, global view of all data in the enterprise. Enterprise Content Management systems are one solution, and Enterprise Search can help too. A newer type of system on the horizon is the global Storage Resource Management (SRM) system. SRM can track down, identify, and remove or relocate redundant and seldom-used files.
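To make the idea of tracking down redundant and seldom-used files concrete, here is a minimal Python sketch. It is not how any particular SRM product works. The function names `file_digest` and `scan`, the one-year `stale_days` threshold, and the use of SHA-256 are all assumptions chosen for illustration. It also relies on file access times, which some filesystems are configured not to update.

```python
import hashlib
import os
import time
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root, stale_days=365):
    """Walk a directory tree and report duplicate and seldom-used files.

    Returns (duplicates, stale): duplicates is a list of path groups whose
    contents hash identically; stale lists files whose last access time is
    older than stale_days (a hypothetical threshold).
    """
    by_digest = defaultdict(list)
    stale = []
    cutoff = time.time() - stale_days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Check the access time before reading the file, because
            # hashing the contents may itself update the access time.
            if os.stat(path).st_atime < cutoff:
                stale.append(path)
            by_digest[file_digest(path)].append(path)
    duplicates = [sorted(p) for p in by_digest.values() if len(p) > 1]
    return duplicates, sorted(stale)
```

A report like this only identifies candidates. Deciding whether to remove or relocate them is a policy question that a real system would leave to an administrator.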

Deduplication is a technique often used by SRM systems that can reduce storage utilization to as little as one twentieth of its original size.
Deduplication is particularly useful for backups because only data blocks that have not already been stored need to be backed up. The deduplication process can be broken down into three steps:

  • Analysis – An image of the backup data is analyzed. The analysis may use hints from information like metadata, and file and path names.
  • Redundancy Identification – Based on the analysis, the redundant pieces of data are identified. Rather than comparing data bit by bit, a hash algorithm is typically applied to each block of data to create a signature that is, in practice, unique to that block's contents. Blocks are treated as redundant when their hash signatures match. Common hash algorithms include SHA-1 and MD5, as well as proprietary methods.
    Bit-by-bit comparisons are sometimes done too. They give the most reliable identification, but the method is very I/O and compute intensive.
  • Redundancy Elimination – Redundant data is replaced with reference pointers. Depending on the product vendor, either forward or reverse referencing is used. Reverse referencing points the new copy back to the first occurrence of the data. Forward referencing writes out the current data and updates the previous occurrence to be a pointer to the new area.
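
The identification and elimination steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's implementation: it uses fixed 4096-byte blocks, SHA-256 in place of the SHA-1 or MD5 mentioned above, and reverse referencing, where every repeated block points back to the first stored copy. The names `deduplicate`, `restore` and `BLOCK_SIZE` are made up for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


def deduplicate(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and store each unique block once.

    Returns (store, refs): store maps a hash signature to the block bytes,
    and refs holds one signature per original block, in order. Each entry
    in refs acts as a reverse reference to the first stored occurrence.
    """
    store = {}
    refs = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        signature = hashlib.sha256(block).hexdigest()
        if signature in store:
            # Optional byte-for-byte check to guard against a hash collision.
            if store[signature] != block:
                raise ValueError("hash collision detected")
        else:
            store[signature] = block
        refs.append(signature)
    return store, refs


def restore(store, refs):
    """Rebuild the original data by following the references in order."""
    return b"".join(store[signature] for signature in refs)


# Four blocks are written, but only two distinct blocks need storing.
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, refs = deduplicate(data)
print(len(refs), len(store))  # 4 2
```

Forward referencing would differ only in bookkeeping: the newest copy of a block is kept as the stored data and earlier occurrences are rewritten as pointers to it.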
