For Application Disaster Recovery

Author: Jemimah Barker

1 Using VMware vCenter Site Recovery Manager with EMC Replication Technologies for Application Disaster Recovery. Session ID: BC8089. Steve Hegarty, EMC Virtualization Solutions Engineering.

2 Objectives of this session. We will demonstrate and discuss:
- Fully automated remote Disaster Recovery: EMC RecoverPoint and vCenter SRM integration
- Fully automated local Business Continuity: enabling application-consistent local replicas with EMC Replication Manager and RecoverPoint
- Application performance and design considerations: application consistency, consistency groups, component interdependencies, best practices

We'll first look at the integration of SRM and RecoverPoint: what does RecoverPoint do exactly, and how does it do it? That is only the DR piece, and as such not a complete solution. We'll then consider taking local, application-consistent replicas and how we can achieve that with Replication Manager: what does Replication Manager do, and where does RecoverPoint fit in? The solution now includes local BC as well as DR. How do these solutions fit together? Does the application have any say in this? What are the design considerations?

3 … vCenter SRM in Theory. This slide and the next are intended to be run through very quickly, as common at-a-glance views of SRM, before we get into the detail of what needs to be configured under the covers. First off, this is what we may be used to seeing SRM presented as: the various components required, with the SRA looking after the storage side of things. Simple.

4 … vCenter SRM in Practice. And this is what the SRM recovery plan looks like in execution. Regardless of SRA, the recovery plan looks the same. Again, simple!

5 Other considerations? … that's why we're here! But as with anything that looks and feels simple, there is a lot of thought and effort required by somebody, somewhere, in order to make this happen. That's what we're looking to uncover and explain in this session.

6 Solution components:
- VMware vSphere
- VMware vCenter
- VMware vCenter Site Recovery Manager
- EMC RecoverPoint
- EMC Replication Manager
- Microsoft Exchange 2007
- Windows 2008
- EMC PowerPath/VE
- EMC CLARiiON CX4-480

Summary of the primary components of this solution. Although these may all be supported independently of each other, the key is getting all of the required components to integrate with each other.

7 EMC RecoverPoint with vCenter SRM: Adapter for VMware Site Recovery Manager.
Slide diagram: vCenter Site Recovery Manager servers (vCenter, SRM, SRA) on the production VMware infrastructure, heterogeneous storage at both sites, and heterogeneous, network-based replication to the disaster recovery site.
- Replicate VMware VMFS and RDM across heterogeneous storage
- WAN bandwidth reduction (up to 10x), data compression
- Protect and recover a single virtual machine or the entire VMware ESX Server
- Protect virtual environments with local and/or remote point-in-time recovery

Note to Presenter: On the slide, SRM stands for Site Recovery Manager, and SRA stands for storage replication adapter. SRA is not an official acronym and is used here only for convenience because of limited space.

Let's focus on one example applicable to EMC and non-EMC storage environments. EMC RecoverPoint utilizes continuous data protection (CDP) to provide a concurrent local and remote (CLR) solution with point-in-time recovery of heterogeneous storage for VMware environments. Virtual machines can be brought back online rapidly, with no data loss, when RecoverPoint is used with VMware Site Recovery Manager to orchestrate and streamline data protection, failover and failback processes. RecoverPoint is the most flexible approach to protecting virtualized data, replicating VMware VMFS to protect and recover a single virtual machine or the entire VMware ESX Server. RecoverPoint provides:
- Replication of VMware VMFS across heterogeneous storage arrays
- WAN bandwidth reduction of the data transferred between sites by up to 10 times
- Protection and recovery of a single virtual machine or the entire VMware ESX Server
- Protection of virtual environments with local and/or remote point-in-time recovery
- Dynamic synchronous and asynchronous replication with write-order consistency across standard IP-based networks

8 EMC RecoverPoint: what does it do, and how does it do it?
- Continuous remote replication (CRR) between LUNs across two SANs
- Continuous data protection (CDP) between LUNs on the same SAN
- Concurrent local and remote (CLR) data protection between LUNs across two SANs
- Synchronous and asynchronous support
- Heterogeneous operating system and storage array support
- Integrated with intelligent fabric from Brocade and Cisco
- Intercepts writes from the initiator and splits each write into two copies: one to production and one to the RecoverPoint journal
- Data can be replicated across the SAN between Fibre Channel ports on RecoverPoint appliances, or over the WAN using standard TCP/IP
- Supports failover between RecoverPoint Appliance nodes; failover does not impact performance, and ongoing operations automatically transfer to a remaining node

A brief overview of the various modes available with RecoverPoint and how it works.

9 RecoverPoint Write Splitters: host-based, fabric-based, and CLARiiON-based write splitters.
- Intercepts the write from the initiator and splits it into two copies: one copy goes to the original target, and a second copy goes to the local RecoverPoint appliance
- Used only for LUNs replicated by RecoverPoint: only write traffic is intercepted for LUNs managed by RecoverPoint; there is no read interception, and no I/O interception at all for LUNs not managed by RecoverPoint
- Managed through RecoverPoint, upgraded independently of RecoverPoint

Must differentiate on this slide that the host splitter is only supported on Windows and Linux hosts. The information below is for reference only and will not be covered in depth for this slide.

RecoverPoint provides out-of-band replication. To be considered out-of-band, the RecoverPoint appliance is not involved in the I/O process. Instead, a component of RecoverPoint, called the splitter (or Kdriver), is involved. The function of a splitter is to intercept writes destined for a volume being replicated by RecoverPoint. The write is then split ("copied"), with one copy being sent to the RecoverPoint appliance and the original being sent to the target. With RecoverPoint, three types of splitters can be used. The first splitter resides on a host server that accesses a volume being protected by RecoverPoint. This splitter resides in the I/O stack, below the file system and volume manager layer and just above the multi-path layer. It operates as a device driver, inspects each write sent down the I/O stack, and determines whether the write is destined for one of the volumes that RecoverPoint is protecting. If the write is destined for a protected LUN, the splitter sends the write downward and rewrites the address packet in the write so that a copy of the write is sent to the RecoverPoint appliance.
When the ACK (acknowledgement) from the original write is received, the splitter waits until a matching ACK is received from the RecoverPoint appliance before sending an ACK up the I/O stack. The splitter can also be part of the storage services on intelligent SAN switches from Brocade or Cisco. For Brocade, the splitter resides in a Connectrix AP-7600B or on a PB-48K-AP4-18 blade that is installed in a Connectrix ED-48000B or DCX-B director. For Cisco, the splitter resides in a Storage Services Module (SSM) blade, or in an 18/4 Multi-Services blade, both of which can be installed in a Connectrix MDS-9000 family switch. The splitter can also reside in the Connectrix MDS-9222i switch. These intelligent fabric-based write splitters operate at wire speeds and split writes, with the original being sent on to the target LUN and a copy being sent to the RecoverPoint appliance. For a CLARiiON CX4 and CX3, the CLARiiON storage processor also has a write splitter. When a write enters the CLARiiON array (either through a Gigabit Ethernet port or a Fibre Channel port), its destination is examined. If it is destined for one of the LUNs being replicated by RecoverPoint, a copy of that write is sent back out one of the Fibre Channel ports of the storage processor to the RecoverPoint appliance. Since the splitter resides in the CLARiiON array, any open systems server that is qualified for attachment to the CLARiiON array can be supported by RecoverPoint. Additionally, both Fibre Channel and iSCSI volumes that reside inside the CLARiiON CX4 or CX3 storage array can be replicated by RecoverPoint.

Splitter type, how deployed, and overhead:
- Host-based: in the I/O stack just above the multi-path software. Adds write traffic at the HBA; no other impact.
- Fabric-based: in intelligent storage services hardware on a Brocade- or Cisco-based switch. Operates at wire speeds; no impact.
- CLARiiON-based: in the FLARE operating system; active in both storage processors. No impact.
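The splitter behaviour described above can be sketched in a few lines of Python. This is illustrative only: `FakeDevice` and `WriteSplitter` are hypothetical stand-ins, not RecoverPoint APIs, and real splitters work at the driver, fabric, or array level rather than in application code.

```python
class FakeDevice:
    """Stand-in for a storage target or a RecoverPoint appliance (RPA)."""
    def __init__(self):
        self.blocks = {}

    def write(self, lun, offset, data):
        self.blocks[(lun, offset)] = data
        return True  # ACK

    def read(self, lun, offset):
        return self.blocks.get((lun, offset))


class WriteSplitter:
    """Toy model of a write splitter: writes to protected LUNs are
    duplicated to the appliance; reads and writes to unprotected LUNs
    pass straight through untouched."""
    def __init__(self, protected_luns, target, appliance):
        self.protected = set(protected_luns)
        self.target = target
        self.appliance = appliance

    def write(self, lun, offset, data):
        ack_target = self.target.write(lun, offset, data)
        if lun not in self.protected:
            return ack_target
        # Second copy goes to the local RPA; the ACK travels back up the
        # I/O stack only after both copies have been acknowledged.
        ack_rpa = self.appliance.write(lun, offset, data)
        return ack_target and ack_rpa

    def read(self, lun, offset):
        # Reads are never intercepted.
        return self.target.read(lun, offset)
```

For example, with `WriteSplitter({"LUN7"}, FakeDevice(), FakeDevice())`, a write to "LUN7" lands on both the target and the appliance, while a write to any other LUN lands on the target only.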

10 Data Flow with CLARiiON Splitter.
Operations common to both CDP and CRR:
- Data is written from the host to the CLARiiON
- Data is split by the CLARiiON array splitter and written to the production volume and to the RecoverPoint cluster
- Writes are acknowledged back from the local RP cluster to the host via the CLARiiON array splitter

Local CDP operations:
- The RP cluster writes data to the local journal volume
- Consistent data is distributed to the local CDP replica

Remote CRR operations:
- Data is sequenced, checksummed, compressed and replicated to the remote RP cluster over IP
- Data is received, uncompressed, sequenced and written to the remote journal volume
- Consistent data is distributed to the remote CRR replica

Some of the initial host write and ACK operations are common to both CDP and CRR configurations. When CLR is used, all of the above steps are executed.
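The remote (CRR) leg above can be sketched as a tiny send/receive pipeline: each split write is sequenced, checksummed and compressed before it crosses the IP link, then verified and handed to the journal on the other side. RecoverPoint's real wire format is proprietary; the packet shape here is purely illustrative.

```python
import hashlib
import zlib
from itertools import count

_sequence = count(1)

def send_crr(write: bytes) -> dict:
    """Local RPA side: sequence, checksum and compress one split write
    before transmitting it to the remote RP cluster over IP."""
    return {
        "seq": next(_sequence),
        "checksum": hashlib.sha256(write).hexdigest(),
        "payload": zlib.compress(write),
    }

def receive_crr(packet: dict) -> bytes:
    """Remote RPA side: uncompress and verify; the write can then be
    journalled in sequence order and distributed to the CRR replica."""
    data = zlib.decompress(packet["payload"])
    if hashlib.sha256(data).hexdigest() != packet["checksum"]:
        raise ValueError("corrupt packet %d" % packet["seq"])
    return data
```

The sequence numbers are what let the remote side apply writes in order, preserving write-order consistency across the consistency group.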

11 Regular VM – ESX – Storage dataflow. The normal data flow, without any RecoverPoint appliance involved, would be:
- Write from VM/host
- Written to array cache/disk
- ACK sent back to host

12 Continuous Data Protection (CDP). With CDP, the write from the host gets "split" by the CX splitter: the write is duplicated on receipt into the array and written to both the production device and the RecoverPoint journal device simultaneously. In the case of CDP, the RecoverPoint journal resides on the local array. The ACK is sent back to the host once the write has been received by the local RPA.

13 Continuous Remote Replication (CRR). With CRR, the write from the host gets "split" by the CX splitter (step 2): the write is duplicated on receipt into the array and written to both the production device and the RecoverPoint journal device simultaneously. In the case of CRR, the RecoverPoint journal resides on the remote array. The ACK is sent back to the host once the write has been received by the local RPA.

14 Continuous Local and Remote (CLR). With CLR, the write from the host gets "split" by the CX splitter: the write is duplicated on receipt into the array and written to both the production device and the RecoverPoint journal devices simultaneously. In the case of CLR, RecoverPoint has two journals, one residing on the local array and the other on the remote array. The ACK is sent back to the host once the write has been received by the local RPA.

15 RecoverPoint Management - Topology. Screenshot of the RecoverPoint Management Application showing the topology view of the "SITE-A-EX-01-SG01-05" consistency group in CLR mode. Highlight the fact that RecoverPoint is now integrated with vCenter, demonstrating how to add a vCenter server to the RecoverPoint Management application and view the protected datastores. Every write to the production devices is "split" and then replicated to the CDP and CRR journals. The writes in each of the journals are subsequently sequenced and written to the CDP and CRR replica devices.

CLR (Continuous Local and Remote) protection:
- Production device is in the center: host access to storage
- CDP copy is to the left: replication to journal, but no host access
- CRR copy is to the right: replication to journal, but no host access

16 RecoverPoint – Consistency Group. Screenshot of the RecoverPoint Management Application showing the replication sets within the "SITE-A-EX-01-SG01-05" consistency group, in CLR mode. Here we can see all of the actual devices involved in the consistency group: the friendly names given to the LUNs within Navisphere, and the actual LUN IDs. The consistency group consists of the production DB and log devices, the CDP copies of each, and the CRR copies of each. Together these make up the CLR solution.

17 RecoverPoint & SRM Integration. Each consistency group within RecoverPoint has a choice of management modes. For SRM to manage the failover of the RecoverPoint replicas, SRM must be given control of the consistency groups. We will talk later in this session about using the "Maintenance Mode" option.

18 Demonstration. Announce short demo to follow.

19 Solution Architecture. The solution architecture depicts the environment which the demo is about to walk through.

20 Short 4.5-minute demo of setting up the Array Managers in SRM, creating Protection Groups and a Recovery Plan, executing the Recovery Plan, and managing EMC RecoverPoint.

21 What about local replicas? So that's the DR setup complete, but for a complete solution for our mission-critical applications we require local replicas also. And not just crash-consistent replicas, but application-consistent replicas. For this we use EMC Replication Manager.

22 Any Point-in-Time Recovery.
Traditional recovery methods: nightly backups, snapshots and mirrored images are plagued by time gaps and corruption. RecoverPoint recovery method: recovery to any point in time; mount the image to any host in the SAN; full read/write access to the image without loss of protection; use the recovered image for a variety of purposes, such as operational recovery, backup, testing, decision support, or reporting.

Traditional recovery methods: many of you are familiar with traditional data protection technologies that take point-in-time copies at specific times of day. The data is backed up at a specific time each day, and this becomes the recovery point; a nightly backup allows recovery only once every 24 hours. If data loss occurs 12 hours into the new backup period, any new data created since the last backup is lost, since the system can only be rolled back to the last recovery point. A scheduled snapshot provides a smaller recovery-point window, usually as short as three hours, and array-to-array mirroring can ensure that the last image is always available; however, logical corruption of the production data will be mirrored as well.

RecoverPoint recovery method: RecoverPoint uses continuous data protection (CDP) technology to capture all changes to the production data and stores these changes in a journal along with time-indexed recovery information, resulting in unlimited recovery points. Users can bookmark specific points in time that might be used during a future recovery, such as the close of a quarter or a pre-patch state. Users can also create application-aware bookmarks to indicate specific points in time that are application-consistent, through application integration modules such as KVSS for Microsoft Exchange and sqlSnap for Microsoft SQL Server.
RecoverPoint also includes this CDP technology in remotely replicated data volumes, enabling you to flexibly recover your data and restore operations from a disaster recovery site during an outage or natural disaster. You now have more choices in applying the most appropriate protection technology to a particular RTO and RPO. Additionally, the source, journal, and replicas can be on different storage arrays from different vendors.

Slide graphic summary: nightly backup gives recovery once every 24 hours (recovery gap); scheduled snapshots give recovery once every 3 hours (recovery gap); synchronous mirroring between local arrays recovers the image but is susceptible to logical corruption; RecoverPoint gives instant recovery to any point in time, with unlimited recovery points and application bookmarks (e.g. checkpoint, pre-patch, patch, post-patch, cache flush, quarterly close, hot backup).
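The time-indexed journal with bookmarks can be modelled very simply. The sketch below is a toy in the spirit of the description above, not RecoverPoint's actual journal format: every write is recorded with its timestamp, so an image can be built for any point in time, and named bookmarks mark application-consistent points.

```python
import bisect

class Journal:
    """Toy CDP journal: time-indexed writes plus optional bookmarks."""

    def __init__(self):
        self.times = []      # kept sorted, since writes arrive in time order
        self.entries = []    # (timestamp, data, bookmark-or-None)

    def record(self, timestamp, data, bookmark=None):
        self.times.append(timestamp)
        self.entries.append((timestamp, data, bookmark))

    def image_at(self, timestamp):
        """All journalled writes up to and including the given time,
        i.e. recovery to any point in time, not just scheduled copies."""
        i = bisect.bisect_right(self.times, timestamp)
        return [data for _, data, _ in self.entries[:i]]

    def image_at_bookmark(self, name):
        """Roll back to a named, application-consistent bookmark."""
        for timestamp, _, bookmark in reversed(self.entries):
            if bookmark == name:
                return self.image_at(timestamp)
        raise KeyError(name)
```

For example, after recording writes at t=1, t=2 (bookmarked "pre-patch") and t=3, `image_at_bookmark("pre-patch")` returns only the first two writes, exactly the pre-patch state.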

23 EMC Replication Manager: Integration with RecoverPoint.
Slide diagram: a Replication Manager server, proxy server and backup client managing a VMware Infrastructure (WEB 1, WEB 2, APP 1, APP 2 and SQL VMs) over an FC or IP SAN, with SQL data and log LUNs and per-VM C:\ VMDKs on an EMC storage array.
- RM automation: auto-discovery of applications, storage, and VMFS; schedule replication for multiple datastores
- Intelligence: application-consistent data management
- Replication technologies: RecoverPoint CDP, CRR, CLR; CLARiiON SnapView; Symmetrix TimeFinder; Celerra SnapSure and IP Replicator (iSCSI)
- Multiprotocol support: FC, iSCSI, NFS

Note to Presenter: Slide objective: show how EMC is adding VMware-specific functionality to storage management solutions by discussing the value of Replication Manager in a VMware environment. Before starting this discussion, determine whether the customer understands the value of Replication Manager, i.e. host-based management of application-consistent copies of data using SAN (RecoverPoint, Invista) or array-based (SnapView, SnapSure, TimeFinder) replication. This is a fairly standard deployment, with one LUN containing the C: VMDK files for several (six) VMs in a VMFS. One of the VMs is running a SQL application which has two additional LUNs (SQL data and logs). Replication Manager, using VMsnap, can make consistent copies of the VMFS file system containing the C: drive content for the VMs. This copy can be mounted on the ESX server and backed up. Replication Manager can also make a consistent copy of the SQL data store by using integration with SQL (VDI) to create application-consistent copies of the log and data LUNs. These copies can be mounted to a VM and backed up. After the backups are complete, the copies remain in the array, so they are available for recovering data. Suppose an error occurred and the SQL database became corrupted. You could restore from tape, but you have a valid copy in the array.
You can recover the data from the array copies faster than you can restart the SQL database (near-instantaneous restore). Note: the data copy process may take some time, but the application gets immediate access to the data while the data is copied. If all of the C: VMDK files became corrupted, we can do the same thing, i.e. restore directly in the array. However, it is much more likely that a single C: VMDK file would become corrupted. In that case, the copy can be mounted on the ESX server, and a single VMDK file can be recovered. The same Replication Manager functionality that has been used in the physical world for years is now available for virtualized application data. Note: Replication Manager with array-based replication is a better solution than VMware VCB. However, we need to be careful not to be overly critical of VCB. VCB will work for many customers with smaller data sets, but it may not be appropriate for larger data sets and performance-sensitive applications. The performance impact is caused by using VMsnap to put the VMFS in "snap" mode and the effort to make the copy of the data. One way to carefully point out the value of Replication Manager vs. VCB is the following: Replication Manager uses VMsnap to get an application-consistent copy of the VMFS structure, and places the VMFS in "snap" mode for only a few moments, so VMware is back running at full speed in moments.

24 Replication Manager VMware Integration. EMC Replication Manager integration with VMware.

25 Mount Host Configuration. All CDP replica devices must be statically mounted to the mount VM. These must also be presented in Physical Compatibility Mode.

26 What about Replication Manager integration with vCenter SRM? This local replication is great, but is it aware of our DR solution with vCenter SRM? Is EMC Replication Manager integrated with EMC RecoverPoint and VMware SRM?

27 Integration with vCenter SRM. Back to this slide: this is the reason why we would want to place the RecoverPoint consistency group into Maintenance Mode. If SRM is given exclusive control of the consistency group, then Replication Manager cannot manipulate it in order to create and/or present the CDP replica to the mount host. Therefore, as part of a replication job or mount job, Replication Manager will automatically place the consistency group into Maintenance Mode before mounting the replica, and will automatically revert it to SRM control when the replica is unmounted.
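The hand-off described above, where control moves from SRM to Maintenance Mode for the duration of a mount and always reverts afterwards, is naturally expressed as a context manager. This is a hypothetical sketch; `ConsistencyGroup` and its `mode` attribute are stand-ins, since the real control plane is the RecoverPoint/Replication Manager API, not Python code.

```python
from contextlib import contextmanager

class ConsistencyGroup:
    """Minimal stand-in for a RecoverPoint consistency group."""
    def __init__(self):
        self.mode = "srm"   # SRM has exclusive control by default

@contextmanager
def maintenance_mode(cg: ConsistencyGroup):
    """Take the consistency group out of SRM's exclusive control for
    the duration of the mount, then hand it back, even on failure."""
    cg.mode = "maintenance"
    try:
        yield cg
    finally:
        cg.mode = "srm"     # always revert so SRM regains control

# Usage: the replica mount/unmount happens inside the with-block.
cg = ConsistencyGroup()
with maintenance_mode(cg):
    assert cg.mode == "maintenance"   # RM may now present the replica
assert cg.mode == "srm"               # control reverted on unmount
```

The `finally` clause is the important design point: if the mount fails partway through, the consistency group must still be handed back to SRM, or DR failover would be blocked.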

28 Manual Mount of CDP Replica. The CDP replica can also be mounted manually using the Replication Manager Mount Wizard. Either an existing application-consistent CDP replica can be mounted, or a crash-consistent point-in-time CDP replica can be presented to the mount host. The exact point in time for the CDP replica can be defined by simply dragging the bar along the timeline, or by entering the exact time in the Time box.

29 Application Performance

30 Application Performance. EMC Replication Manager Technical Review, September 2009. Get the basics right first:
- Backend storage design: sufficient spindles to support the I/O requirement; separate DB and log devices (FAST changes this); do not place replica storage on the same spindles as production storage
- Filesystem alignment: Windows and VMFS
- Load balancing: balance load across HBAs and storage array controllers; multipathing with EMC PowerPath/VE or VMware NMP
- High availability: redundant network and Fibre Channel switches; no single points of failure (upgrades, maintenance, etc.)

The points on this slide are not specific to any one solution. They are best practices that should be used across the board.
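The filesystem-alignment point above boils down to simple arithmetic: a partition is aligned when its starting offset is a whole multiple of the array's stripe element size. The helper below is hypothetical (the 64 KB element size is an assumption; the correct value is array-specific), but the check itself is just a modulo.

```python
def is_aligned(partition_offset_bytes, element_bytes=65536):
    """True when the partition's starting offset falls exactly on an
    array stripe element boundary (64 KB assumed here)."""
    return partition_offset_bytes % element_bytes == 0

# Classic misalignment: older Windows partitions started at sector 63,
# i.e. 63 * 512 = 32256 bytes, so every guest I/O can straddle two
# stripe elements and cost an extra backend I/O.
print(is_aligned(32256))     # misaligned
print(is_aligned(1048576))   # a 1 MB starting offset is aligned
```

This is why alignment matters for both the Windows guest partition and the VMFS volume underneath it: a misalignment at either layer multiplies backend I/O.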

31 Application Performance … The results for CDP, CRR, and CLR (a combination of CDP and CRR) show a 1 ms increase in log write response time and a 4 ms increase in database write response time. All results are well within Microsoft's guidelines of 10 ms for log devices and 20 ms for database devices.

32 Application Performance … In this case, only the array on the primary site is examined, as the CDP and baseline metrics are not relevant to the secondary array. The addition of CDP to the solution increases the array utilization from 18% to 34%, because both the journal and CDP volumes for the Exchange environment are on the same array as the production volumes. The array:
- Splits the I/Os sent to production volumes
- Services the I/O to the production volumes
- Transmits the I/Os to the RecoverPoint appliances
- Receives the I/Os back to the journal volumes
- De-stages the I/Os sent to the journal volumes to the CDP copies

Array SP utilization drops to 22% with CRR, in comparison to the 34% with CDP and CLR. This difference can be attributed to the fact that with CRR, the read and write operations to replicas and journals occur on the remote array only. The local array still splits the writes, so percent utilization remains higher than the baseline, but the local array is no longer responsible for writing to the replicas and journals.

33 Other considerations …

34 Placeholder VM Reserved Memory. If a production VM has reserved memory, be aware that this is not automatically set on the recovery-site VM. Not setting this on the recovery VM may prevent a successful recovery: the VM boot may fail with "Insufficient Disk Space", because the VM will need to create a swap file. This can be resolved manually either before or after recovery. SRM does not automatically copy memory reservation settings when creating the placeholder VMs on the recovery site; if required, this must be done manually. A missing reservation can prevent the VM from booting if insufficient space is available to create the vswap file. This is particularly relevant where large memory configurations are in place.
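The failure mode above comes down to simple sizing: ESX creates a swap file equal to the VM's configured memory minus its memory reservation, so a lost reservation inflates the swap file. The helpers below are hypothetical pre-flight checks (they ignore other datastore consumers and overhead), shown only to make the arithmetic concrete.

```python
def vswap_size_mb(vm_memory_mb, reservation_mb):
    """ESX sizes the .vswp file as configured memory minus the memory
    reservation, so a missing reservation means a much larger file."""
    return vm_memory_mb - reservation_mb

def can_create_swap(vm_memory_mb, reservation_mb, datastore_free_mb):
    """Rough check for the 'Insufficient Disk Space' boot failure:
    does the required swap file fit in the datastore's free space?"""
    return vswap_size_mb(vm_memory_mb, reservation_mb) <= datastore_free_mb

# A 32 GB VM with a 24 GB reservation needs only an 8 GB swap file;
# recovered without the reservation it needs the full 32 GB.
assert vswap_size_mb(32768, 24576) == 8192
assert can_create_swap(32768, 24576, 16384)    # fits with the reservation
assert not can_create_swap(32768, 0, 16384)    # fails without it
```

This is why the slide flags large-memory VMs in particular: the gap between "swap with reservation" and "swap without reservation" grows with configured memory.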

35 Application Consistency. What is required for application-consistent Exchange 2007 replicas? This solution uses RDMs due to the requirement of Microsoft Exchange for VSS backups, and the lack of information available from virtual disks (.vmdk files). This only applies to the Exchange database and log LUNs; the guest OS boot volume can remain on VMFS. *** Need to gather more exact information on what the data is that's unavailable from .vmdk files that RDMs provide, with regard to application-consistent VSS backups with EMC Replication Manager. The statement is accurate, but I would like more information to explain it. ***

36 Consistency Groups vs. Application Sets. RecoverPoint Consistency Groups vs. Replication Manager Application Sets:
- Solutions not including vCenter SRM would not usually require replication of the guest OS device/LUN
- EMC Replication Manager expects ONLY the application volumes to be present in the EMC RecoverPoint consistency group
- For this reason, two separate consistency groups were created in RecoverPoint for each Exchange server: one for the guest OS, and one for the application data (DB and log)
- VMware vCenter SRM protection groups are unaffected

37 Exchange Restore Granularity. Using multiple RecoverPoint consistency groups per Exchange server achieves the maximum level of granularity, but:
- Requires multiple EMC Replication Manager Application Sets
- Requires multiple EMC RecoverPoint consistency groups
- Requires creating multiple journal volumes (one journal per consistency group)

Each RM Application Set is based on a single RP consistency group; vCenter SRM is unaffected. Note: a single RM Application Set based on more than one RP consistency group is not permitted.
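The constraint above, that each Replication Manager Application Set must be based on exactly one RecoverPoint consistency group, can be expressed as a small validation sketch. The mapping structure and all names below are hypothetical; this only illustrates the rule, not any RM API.

```python
def validate_application_sets(app_sets):
    """Check the rule from the slide: an Application Set spanning more
    than one consistency group (or none) is not permitted.
    `app_sets` maps an Application Set name to the list of consistency
    groups its volumes live in."""
    for name, groups in app_sets.items():
        if len(set(groups)) != 1:
            raise ValueError(
                "Application Set %r spans %d consistency groups; "
                "exactly one is required" % (name, len(set(groups))))

# One Application Set per consistency group, one consistency group per
# Exchange storage group, gives the finest restore granularity:
validate_application_sets({
    "EX-01-SG01": ["SITE-A-EX-01-SG01"],
    "EX-01-SG02": ["SITE-A-EX-01-SG02"],
})
```

The trade-off is visible in the data shape: finer granularity means more Application Sets, more consistency groups, and one journal volume per group.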

38 In summary … what did we learn about here?
- Fully automated remote Disaster Recovery: seamless integration between EMC RecoverPoint and VMware vCenter Site Recovery Manager
- Fully automated local Business Continuity: application-consistent local replicas; fully integrated, VMware-aware local replication
- Application performance and design considerations
- Application consistency, consistency groups, component interdependencies, best practices

39 One more thing … vRPA! RecoverPoint as a Virtual Appliance. Where is EMC today with the concept of the vRPA?
- The vRPA is running in internal EMC labs for development and testing only
- Additional EMC/VMware integration work is in progress

Potential advantages of a vRPA:
- Easy to manage and deploy quickly; assists in the upgrade process
- ESX servers can run the vRPA alongside other VMs, giving flexibility and consolidation of the environment (power reduction, physical space consolidation)

Initial requirements of the "internal only" vRPA:
- ESX server with Nehalem processors (VMDirectPath)
- RecoverPoint 3.3+

Something that our customers often ask is when RecoverPoint will be available as a virtual appliance, or be able to run in a VM. For clarification, the reason many software ISVs are looking to deploy their software in a VM is that it is easier to deploy, update and maintain. Plus, with all of the management integration within VMware and its advanced functionality like VMotion, Storage VMotion, Distributed Resource Scheduling and HA, having your critical services running in a VM seems like a no-brainer. So the big question is: where is EMC today with the concept of a virtual RecoverPoint appliance? The good news is that we are thinking about it; even better, we are testing the feasibility and have developed a prototype for EMC internal use only. We are using it mainly in our own internal labs for testing and development, and it is unfortunately not for production nor customer use at this time. As an aside, before EMC is ready to release any product, it goes through a massive amount of QA, regression testing and, of course, extensive customer and market validation. The initial requirements of our prototype include use with vSphere 4, VMDirectPath and support for VT-d, which is advanced virtualization technology from Intel.
This allows VMs to have direct access to physical hardware; in the case of the vRPA, it needs access to supported, certified Fibre Channel HBAs to connect to the SAN fabric in order to see and replicate storage LUNs. So, the long and short of it is that EMC's RecoverPoint team is investigating the viability of a vRPA, but much work lies ahead and it remains a work in progress. Stay tuned…

40 Q&A
