Continuous data protection falls under the broader topic of continuous data technologies (CDT). CDT comprises continuous data protection (CDP), continuous data imaging (CDI) and continuous data replication (CDR). Almost all of these capabilities are being delivered today, but in different products, and the industry typically lumps many of them together as CDP. As I see it, the approaches differ as follows:
- CDP: Continuous data protection is geared toward capturing data changes continuously. Changes may be captured to a backup repository from which a file can be restored from any of multiple point-in-time versions. Pure-CDP approaches protect only file-level changes, and don't necessarily control whether any given point in time is application consistent versus crash consistent.
- CDI: Continuous data imaging also captures data changes continuously, but presents them as usable images. One example is BakBone Software Inc.'s NetVault FASTRecover. CDI products typically focus on making sure that an any-point-in-time image is application consistent, so that it can be used for recovery immediately. This may be done with deep application understanding, or through integration with tools like Microsoft Corp.'s Volume Shadow Copy Service (VSS), which lets the continuous data imaging product quiesce the application to ensure consistency at a particular point in time. In the case of FASTRecover, the any-point-in-time data is captured to an appliance but is immediately available to an application server; the appliance can stream requested data blocks so that the application server recovers almost instantaneously.
- CDR: Continuous data replication. Many of the replication tools on the market have evolved to capture and replicate data continuously, and the most versatile of these can recover the replicated data to any point in time. The frontrunner in this space has long been InMage Systems Inc., which brought one of the earliest products to market. With a flexible architecture, InMage can capture local changes, aggregate them and then replicate them across a WAN.
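The core idea shared by these approaches, capturing every change with a timestamp so that data can be restored as of any moment, can be sketched in a few lines. This is a toy file-level journal with hypothetical names (`CDPJournal`, `record_write`, `restore_as_of`), not any vendor's design; real products work at the block or file-system driver layer rather than on whole-file versions.

```python
import time

class CDPJournal:
    """Toy CDP journal: log every change with a timestamp, then
    restore a file's content as of any requested point in time."""

    def __init__(self):
        # path -> list of (timestamp, content), in arrival order
        self._entries = {}

    def record_write(self, path, content, ts=None):
        """Capture a change continuously as it happens."""
        ts = time.time() if ts is None else ts
        self._entries.setdefault(path, []).append((ts, content))

    def restore_as_of(self, path, ts):
        """Return the newest version written at or before `ts`,
        or None if the file did not yet exist at that time."""
        latest = None
        for entry_ts, content in self._entries.get(path, []):
            if entry_ts <= ts:
                latest = content
        return latest

j = CDPJournal()
j.record_write("report.txt", "draft 1", ts=100.0)
j.record_write("report.txt", "draft 2", ts=200.0)
j.record_write("report.txt", "corrupted", ts=300.0)
print(j.restore_as_of("report.txt", 250.0))  # rolls back to "draft 2"
```

Rolling back to a moment just before a corruption event, as in the last line, is exactly the granular recovery that distinguishes CDP from a nightly backup, which could only return to the previous night's copy.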
The pros and cons of continuous data protection vary by product. In general, though, the up-to-the-minute protection and granular recovery these tools provide can significantly enhance data protection. Moreover, they can do away with time-sensitive traditional backup windows, either by making nightly backup jobs unnecessary or by allowing those jobs to copy their data from the CDP repository instead of running against production systems. It should not be overlooked that continuous data protection can also serve desktop and notebook protection very well: because it sends a continuous stream of small changes, CDP is practical where running big, fast backup streams against thousands of systems would be impossible. Many continuous data protection products can also deliver some type of user self-service, letting users browse and restore file versions on their own.
At the same time, CDP tools are one more component the business must manage. They frequently require a host agent or filter driver that duplicates I/O, sending a copy of every write to the continuous data protection storage system. This layer can also consume network bandwidth, since the split I/O is typically sent over the LAN rather than the storage fabric, although some vendors, such as InMage, have products that can route that traffic over the SAN as well.
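The write-splitting just described can be illustrated with a minimal sketch. The names here (`SplitWriter`, `cdp_queue`) are hypothetical, and a plain list stands in for the LAN or SAN link a real agent or filter driver would use; the point is only that each write lands on the primary device and is duplicated for shipment to the CDP target.

```python
import io

class SplitWriter:
    """Toy write splitter: every write goes to the primary device,
    and a copy is queued for shipment to the CDP repository."""

    def __init__(self, primary):
        self.primary = primary     # file-like object standing in for a disk
        self.cdp_queue = []        # stands in for the link to the CDP target

    def write(self, offset, data):
        # Normal write path to the primary device.
        self.primary.seek(offset)
        self.primary.write(data)
        # Split the I/O: duplicate (offset, data) for the CDP system.
        self.cdp_queue.append((offset, bytes(data)))

disk = io.BytesIO(bytes(16))       # 16-byte "disk" of zeroes
w = SplitWriter(disk)
w.write(0, b"hello")
w.write(5, b"world")
print(disk.getvalue()[:10])        # primary holds the data
print(w.cdp_queue)                 # an identical copy awaits replication
```

The doubled work is also where the costs discussed above come from: every application write now generates a second copy that must traverse the network, which is why agent overhead and LAN bandwidth are the typical operational concerns.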
More recently, continuous data protection has become an increasingly attractive approach for protecting the virtual infrastructure. Consolidating multiple virtual machines on a single hypervisor can quickly make the simultaneous execution of traditional high-bandwidth backups impossible; there simply isn't enough I/O and bandwidth behind a single hypervisor. CDP can deliver constant data protection without significant impact on guest performance or host bandwidth; some continuous data protection solutions add only a few percentage points of processor or I/O load. Moreover, tools that replicate data can make the replicated volume immediately ready for reuse, allowing organizations to quickly reboot a system on other hardware or in another site during an outage or failure.
This was first published in June 2009