A comparison of data compression and data deduplication technologies for SMBs


Marc Staimer
03.09.2009
Data deduplication is one of the hottest data reduction technologies today. But the term "data deduplication" can be confusing because it's often used to describe technologies that aren't really deduplication at all. There are five primary types of data reduction: hardware and software compression; file-level deduplication; block/variable block deduplication; delta block optimization; and application-aware data reduction. In this article, we'll explain each type of data reduction and its pros and cons, and offer recommendations for SMB storage environments.

Hardware and software compression

Hardware and software compression is transparent to applications and storage hardware. It uses an algorithm to reduce the size of files by eliminating redundant bits. However, compression does not detect duplicate files: if the same file has been stored multiple times, no matter how good the compression algorithm is, there will be multiple copies of the compressed file.

Compression typically provides an aggregate data reduction ratio of approximately 1.2:1 to as high as 10:1, depending on the type of data. Unfortunately, compression has little effect on files that are already compressed, such as Microsoft Office files, .pdfs, .jpegs, .mpegs and .zip files. The best results come from applying compression to backup or secondary data and to data that is not already compressed.
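The following Python sketch uses the standard library's zlib module to illustrate both points: redundant data compresses well, already-compressed data barely shrinks, and compressing two stored copies of the same file still leaves two copies. Random bytes stand in for already-compressed content such as .jpeg or .zip files; that substitution is an assumption for illustration. The sample text is synthetic and far more repetitive than real data, so its ratio will be much higher than the typical range quoted above.

```python
import os
import zlib

def ratio(original: bytes, compressed: bytes) -> float:
    """Reduction ratio as original:compressed (4.0 means 4:1)."""
    return len(original) / len(compressed)

# Highly redundant (synthetic) text compresses very well.
text = b"quarterly sales report, region north, units shipped\n" * 2000
text_z = zlib.compress(text, 9)

# Random bytes approximate already-compressed data: little redundancy is left.
already_compressed = os.urandom(100_000)
already_z = zlib.compress(already_compressed, 9)

# Two stored copies of the same file become two compressed copies;
# compression alone never notices that they are duplicates.
store = [zlib.compress(text, 9) for _ in range(2)]

print(f"text: {ratio(text, text_z):.1f}:1")
print(f"already compressed: {ratio(already_compressed, already_z):.3f}:1")
print(f"copies stored: {len(store)}, identical: {store[0] == store[1]}")
```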

File-level deduplication

File-level deduplication eliminates multiple copies of the same file. Each redundant file is replaced with a pointer to the unique stored version of the file. File deduplication typically provides an aggregate deduplication ratio of approximately 2:1 to as high as 10:1 (depending on the type of data). Because file-level deduplication is coarse-grained, it does not reduce files that differ only slightly from previous versions. File deduplication fits best in content-addressable storage (CAS), where files cannot be altered, in backup or secondary data, and in many remote offices or branch offices (ROBOs).

File-level deduplication is available primarily from network-attached storage (NAS) vendors, including NetApp Inc.'s Data ONTAP and Sun Microsystems Inc.'s ZFS, and from CAS vendors, including Active Circle, Bycast Inc., Caringo Inc., EMC Corp., Hewlett-Packard (HP) Co. and Permabit Technology Corp.
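A minimal Python sketch of file-level deduplication, assuming a SHA-256 digest of the whole file is used to detect exact duplicates. The `FileDedupStore` class and its methods are hypothetical names for illustration, not any vendor's implementation. Note how a file that differs by a single byte is stored again in full, which is the coarse-grain limitation described above.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class FileDedupStore:
    """Hypothetical single-instance store: one copy per unique file content."""
    blobs: dict[str, bytes] = field(default_factory=dict)    # digest -> file content
    catalog: dict[str, str] = field(default_factory=dict)    # path -> digest (the "pointer")

    def put(self, path: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)   # store content only the first time it is seen
        self.catalog[path] = digest

    def get(self, path: str) -> bytes:
        return self.blobs[self.catalog[path]]

    def ratio(self) -> float:
        logical = sum(len(self.blobs[d]) for d in self.catalog.values())
        physical = sum(len(b) for b in self.blobs.values())
        return logical / physical

store = FileDedupStore()
report = b"x" * 1000
store.put("/alice/report.doc", report)
store.put("/bob/report.doc", report)            # exact duplicate: pointer only
store.put("/carol/report.doc", report + b"!")   # one byte changed: stored in full
print(len(store.blobs), f"{store.ratio():.2f}:1")
```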

Block/variable block deduplication

Block/variable block deduplication eliminates redundant or duplicate data by retaining just one unique instance of each block or chunk of data. Redundant data is replaced with a pointer to the unique copy of the block. Block/variable block deduplication typically provides an aggregate deduplication ratio of approximately 3:1 to as high as 80:1 (depending on data type and scalability). Block/variable block deduplication fits best with archive data and ROBOs.
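The sketch below shows fixed-size block deduplication in Python. The 4 KiB block size and the `BlockDedupStore` class are assumptions for illustration; variable-block products instead choose chunk boundaries from the data content, which this sketch does not attempt. Each stored object becomes an ordered list of block pointers, and reading it back requires reassembling ("rehydrating") the blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size

class BlockDedupStore:
    """Hypothetical block-level store: one copy of each unique block."""

    def __init__(self, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}        # digest -> unique block
        self.recipes: dict[str, list[str]] = {}   # object name -> ordered block pointers

    def write(self, name: str, data: bytes) -> None:
        recipe = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # keep only the first instance
            recipe.append(digest)
        self.recipes[name] = recipe

    def read(self, name: str) -> bytes:
        # Rehydration: rebuild the object from its block pointers.
        return b"".join(self.blocks[d] for d in self.recipes[name])

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())

store = BlockDedupStore()
v1 = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))   # four distinct blocks
v2 = v1[:3 * BLOCK_SIZE] + b"\xff" * BLOCK_SIZE            # only the last block changed
store.write("mon", v1)
store.write("tue", v2)
print(len(store.blocks), "unique blocks for", len(v1) + len(v2), "logical bytes")
```

Unlike the file-level case, the slightly changed second version costs only one new block here.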

Both types of deduplication become increasingly efficient as the same data is backed up or archived to the data repository multiple times: the more repeated data, the higher the data reduction ratio and the greater the value. Additional value comes from longer disk retention periods for backups and archives, which enable faster recovery time objectives (RTOs) and further lower costs by reducing or eliminating tape backups.
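To see why ratios climb as the same data is backed up repeatedly, here is a back-of-the-envelope Python model. The 1,000 GB dataset and 2% daily change rate are hypothetical figures chosen for illustration, not measurements, and the model assumes every changed byte is new unique data and that no compression is applied.

```python
def dedup_ratio_after(n_backups: int, dataset_gb: float, daily_change: float) -> float:
    """Logical:physical ratio after n nightly full backups of the same dataset.

    Illustrative assumptions: the first backup is stored in full, each later
    backup adds only the changed fraction as unique data, no compression.
    """
    if n_backups < 1:
        raise ValueError("need at least one backup")
    logical = n_backups * dataset_gb                                      # what the backup app wrote
    physical = dataset_gb + (n_backups - 1) * dataset_gb * daily_change  # unique data actually kept
    return logical / physical

for n in (1, 7, 30, 90):
    r = dedup_ratio_after(n, 1000, 0.02)
    print(f"{n:>2} retained full backups -> {r:.1f}:1")
```

Under these assumptions, keeping more backups on disk (longer retention) is exactly what drives the ratio up.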

A downside to both deduplication methods is the read/write performance penalty when they are used with primary data storage. On writes, the deduplication database checks incoming data against what has already been written before it completes the write, which adds noticeable write latency. On reads, the data must be reconstituted (fully rehydrated) into complete files, which adds even more noticeable read latency. This is why dedupe is best suited for secondary or backup data.

However, some deduplication implementations use post-processing, which means deduplication occurs after the data has been written. FalconStor Software's SIR, Sepaton Inc.'s virtual tape library (VTL) and NetApp's VTL all use this method. Post-processing eliminates much of the write latency penalty but does nothing to reduce the read penalty.

Block/variable block deduplication can be found in VTLs, NAS appliances and backup software:

VTLs

  • Copan
  • Data Domain
  • Dell
  • EMC
  • FalconStor
  • HP
  • Quantum
  • Sepaton

NAS

  • Copan
  • Data Domain
  • EMC
  • ExaGrid
  • Dell
  • FalconStor
  • Quantum
  • NEC HYDRAstor
  • NetApp

Backup Software

  • Asigra Televaulting
  • CommVault Simpana
  • EMC Avamar
  • Symantec NetBackup PureDisk

Delta block optimization

Delta block optimization is designed to reduce both the amount of data backed up from the source and the amount of data stored. When a file that has already been backed up is backed up again, the software determines which blocks have changed since the previous version, writes only those blocks to the backup and skips the blocks that haven't changed.

However, this technique shares a shortcoming with file deduplication and compression: if two users in the same office, or two separate servers, have identical copies of the same file or identical blocks, delta block optimization will create two identical backups instead of storing just one.
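The Python sketch below illustrates delta block optimization and its shortcoming. It assumes a fixed 4 KiB block size and that each backup source remembers only the block digests of its own previous backup; `DeltaBackupClient` is a hypothetical name. Comparing blocks by position is a simplification: inserting data near the start of a file would shift every later block and make them all look changed.

```python
import hashlib

BLOCK = 4096  # assumed block size

def block_digests(data: bytes) -> list[str]:
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest() for i in range(0, len(data), BLOCK)]

class DeltaBackupClient:
    """Hypothetical per-source client that tracks its own last backup of each file."""

    def __init__(self):
        self.last: dict[str, list[str]] = {}

    def backup(self, path: str, data: bytes) -> dict[int, bytes]:
        new = block_digests(data)
        old = self.last.get(path, [])
        changed = {
            i: data[i * BLOCK:(i + 1) * BLOCK]
            for i, d in enumerate(new)
            if i >= len(old) or old[i] != d   # new or modified block at this position
        }
        self.last[path] = new
        return changed  # only these blocks are sent and stored

doc = b"".join(bytes([i]) * BLOCK for i in range(8))  # eight distinct blocks
alice = DeltaBackupClient()
first = alice.backup("report.doc", doc)               # nothing to compare against: all blocks
edited = doc[:7 * BLOCK] + b"\xff" * BLOCK
second = alice.backup("report.doc", edited)           # only the edited block

bob = DeltaBackupClient()                             # another user with the same file
third = bob.backup("report.doc", doc)                 # all blocks again: duplicates Alice's
print(len(first), len(second), len(third))
```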

Application-aware data reduction

Application-aware data reduction eliminates duplicate storage objects both within and between different files. It's designed for, and best suited to, primary data storage. It works by reading the files (post-processing, after they are initially written) and expanding them from their compressed formats (.pdf, .jpeg, Microsoft Office, .mpeg, .zip, etc.). It then finds and eliminates storage objects common to all of the files, and optimizes and recompresses the files.

For example, if a .jpeg image is also inserted into both a Word document and a PowerPoint presentation, only one of the three copies of the image is stored. A reader on the user's PC, server or NAS head eliminates any noticeable read latency penalty. Application-aware data reduction ratios typically range from 4:1 to 10:1, usually two to five times greater than other data reduction technologies achieve on primary data storage.
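The .jpeg example can be sketched in Python by treating ZIP-based containers (such as Office Open XML .docx and .pptx files, which store embedded images as archive members) as bundles of objects and deduplicating those objects across all files. This is a simplified object-level deduplication under stated assumptions, not Ocarina's method: it handles only ZIP containers, and it does not recompress or rebuild the original files.

```python
import hashlib
import io
import zipfile

def objects_in(data: bytes) -> list[bytes]:
    """Assumption: each member of a ZIP-based container is one storage object;
    any other file is a single object."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return [zf.read(info) for info in zf.infolist()]
    return [data]

def unique_objects(files: dict[str, bytes]) -> dict[str, bytes]:
    store: dict[str, bytes] = {}
    for data in files.values():
        for obj in objects_in(data):
            store.setdefault(hashlib.sha256(obj).hexdigest(), obj)
    return store

def make_container(members: dict[str, bytes]) -> bytes:
    """Build a small in-memory ZIP to stand in for a .docx or .pptx file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for arcname, payload in members.items():
            zf.writestr(arcname, payload)
    return buf.getvalue()

photo = bytes(range(256)) * 100  # stand-in for the .jpeg's bytes
docx = make_container({"word/document.xml": b"<doc/>", "word/media/image1.jpeg": photo})
pptx = make_container({"ppt/slides/slide1.xml": b"<slide/>", "ppt/media/image1.jpeg": photo})
files = {"photo.jpeg": photo, "memo.docx": docx, "deck.pptx": pptx}

store = unique_objects(files)
print(len(store), "unique objects; image copies stored:",
      sum(1 for o in store.values() if o == photo))
```

File-level deduplication would keep all three files intact here, because the containers differ from each other and from the standalone image.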

Application-aware data reduction is currently available only from Ocarina, as an appliance (also resold by BlueArc and HP).

Data reduction recommendations for SMBs

So how does an SMB decide whether to implement any of these technologies, and which one? It depends on the data reduction solution, how it is implemented and what is already in place.

But an apples-to-apples comparison of the different data reduction technologies, implementations and vendors can be a daunting task because they come in so many different packages and implementations. Compression is available as a hardware appliance (examples include those from Hifn Inc. and Storwize) and is sometimes available as a hardware option on storage systems (NetApp).

About the author: Marc Staimer is the founder, senior analyst and CDS of Dragon Slayer Consulting in Beaverton, OR. For 11 years, the consulting practice has focused on strategic planning, product development and market development. With more than 28 years of marketing, sales and business experience in infrastructure, storage, server, software and virtualization, he's considered one of the industry's leading experts. Marc can be reached at marcstaimer@comcast.net.


Rate this Tip
To rate tips, you must be a member of SearchSMBStorage.com.
Register now to start rating these tips. Log in if you are already a member.


Submit a Tip




DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



SMB Solutions - SAN Consolidation
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2008 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts