The Ever Changing IBM Z Mainframe Disaster Recovery Requirement

With a 50+ year longevity, of course the IBM Z Mainframe Disaster Recovery (DR) requirement and associated processes have changed and evolved accordingly.  Initially, the primary focus would have been HDA (Head Disk Assembly) related, recovering data due to hardware (E.g. 23nn, 33nn DASD) failures.  It seems incredulous in the 21st Century to consider the downtime and data loss with such an event, but these failures were commonplace into the early 1980’s.  Disk drive (DASD) reliability increased with the 3380 device in the 1980’s and the introduction of the 3990-03 Dual Copy capability in the late 1980’s eradicated the potential consequences of a physical HDA failure.

The significant cost of storage and CPU resources dictated that many organizations had to rely upon 3rd party service providers for DR resource provision.  Often this dictated a classification of business applications, differentiating between Mission Critical or not, where DR backup and recovery processes would be application based.  Even the largest of organizations that could afford to duplicate CPU resource, would have to rely upon the Ford Transit Access Method (FTAM), shipping physical tape from one location to another and performing proactive or more likely reactive data restore activities.  A modicum of database log-shipping over SNA networks automated this process for Mission Critical data, but successful DR provision was still a major consideration.

Even with the Dual Copy function, this meant DASD storage resources had to be doubled for contingency purposes.  Therefore this dictated only the upper echelons of the business world (I.E. Financial Organizations, Telecommunications Suppliers, Airlines, Etc.) could afford the duplication of investment required for self-sufficient DR capability.  Put simply, a duplication of IBM Mainframe CPU, Network and Storage resources was required…

The 1990’s heralded a significant evolution in generic IT technology, including IBM Mainframe.  The adoption of RAID technology for IBM Mainframe Count Key Data (CKD) provided an affordable solution for all IBM Mainframe users, where RAID-5(+) implementations became commonplace.  The emergence of ESCON/FICON channel connectivity provided the extended distance requirement to complement the emerging Parallel SYSPLEX technology, allowing IBM Mainframe servers and related storage to be geographically dispersed.  This allowed a greater number of IBM Mainframe customers to provision their own in-house DR capability, but many still relied upon physical tape shipment to a 3rd party DR services provider.

The final significant storage technology evolution was the Virtual Tape Library (VTL) structure, introduced in the mid-1990’s.  This technology simplified capacity optimization for physical tape media, while reducing the number of physical drives required to satisfy the tape workload.  These VTL structures would also benefit from SYSPLEX implementations, but for many IBM Mainframe users, physical tape shipment might still be required.  Even though the IBM Mainframe had supported IP connectivity since the early 1990’s, using this network capability to ship significant amounts of data was dependent upon public network infrastructures becoming faster and more affordable.  In the mid-2000’s, transporting IBM Mainframe backup data via extended network carriers, beyond the limit of FICON technologies became more commonplace, once again, changing the face of DR approaches.

More recently, the need for Grid configurations of 2, 3 or more locations has become the utopia for the Global 1000 type business organization.  Numerous copies of synchronized Mission Critical if not all IBM Z Mainframe data are now maintained, reducing the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) DR criteria to several Minutes or less.

As with anything in life, learning from the lessons of history is always a good thing and for each and every high profile IBM Z Mainframe user (E.g. 5000+ MSU), there are many more smaller users, who face the same DR challenges.  Just as various technology races (E.g. Space, Motor Sport, Energy, et al) eventually deliver affordable benefit to a wider population, the same applies for the IBM Z Mainframe community.  The commonality is the challenges faced, where over the years, DR focus has either been application or entire business based, influenced by the technologies available to the IBM Mainframe user, typically dictated by cost.  However, the recent digital data explosion generates a common challenge for all IT users alike, whether large or small.  Quite simply, to remain competitive and generate new business opportunities from that priceless and unique resource, namely business data, organizations must embrace the DevOps philosophy.

Let’s consider the frequency of performing DR tests.  If you’re a smaller IBM Z Mainframe user, relying upon a 3rd party DR service provider, your DR test frequency might be 1-2 tests per year.  Conversely if you’re a large IBM z Mainframe user, deploying a Grid configuration, you might consider that your business no longer has the requirement for periodic DR tests?  This would be a dangerous thought pattern, because it was forever thus, SYSPLEX and Grid configurations only safeguard from physical hardware scenarios, whereas a logical error will proliferate throughout all data copies, whether, 2, 3 or more…

Similarly, when considering the frequency of Business Application changes, for the archetypal IBM Z Mainframe user, this might have been Monthly or Quarterly, perhaps with imposed change freezes due to significant seasonal or business peaks.  However, in an IT ecosystem where the IBM Z Mainframe is just another interconnected node on the network, the requirement for a significantly increased frequency of Business Application changes arguably becomes mandatory.  Therefore, once again, if we consider our frequency of DR tests, how many per year do we perform?  In all likelihood, this becomes the wrong question!  A better statement might be, “we perform an automated DR test as part of our Business Application changes”.  In theory, the adoption of DevOps either increases the frequency of scheduled Business Application changes, or organization embraces an “on demand” type approach…

We must then consider which IT Group performs the DR test?  In theory, it’s many groups, dictated by their technical expertise, whether Server, Storage, Network, Database, Transaction or Operations based.  Once again, if embracing DevOps, the Application Development teams need to be able to write and test code, while the Operations teams need to implement and manage the associated business services.  In such a model, there has to be a fundamental mind change, where technical Subject Matter Experts (SME) design and implement technical processes, which simplify the activities associated with DevOps.  From a DR viewpoint, this dictates that the DevOps process should facilitate a robust DR test, for each and every Business Application change.  Whether an organization is the largest or smallest of IBM Z Mainframe user is somewhat arbitrary, performing an entire system-wide DR test for an isolated Business Application change is not required.  Conversely, performing a meaningful Business Application test during the DevOps code test and acceptance process makes perfect sense.

Performing a meaningful Business Application DR test as part of the DevOps process is a consistent requirement, whether an organization is the largest or smallest IBM Z Mainframe user.  Although their hardware resource might differ significantly, where the largest IBM Z Mainframe user would typically deploy a high-end VTL (I.E. IBM TS77n0, EMC DLm 8n00, Oracle VSM, et al), the requirement to perform a seamless, agile and timely Business Application DR test remains the same.

If we recognize that the IBM Z Mainframe is typically deployed as the System Of Record (SOR) data server, today’s 21st century Business Application incorporates interoperability with Distributed Systems (E.g. Wintel, UNIX, Linux, et al) platforms.  In theory, this is a consideration, as mostly, IBM Z Mainframe data resides in proprietary 3390 DASD subsystems, while Distributed Systems data typically resides in IP (NFS, NAS) and/or FC (SAN) filesystems.  However, the IBM Z Mainframe has leveraged from Distributed Systems technology advancements, where typical VTL Grid configurations utilize proprietary IP connected disk arrays for VTL data.  Ultimately a VTL structure will contain the “just in case” copy of Business Application backup data, the very data copy required for a meaningful DR test.  Wouldn’t it be advantageous if the IBM Z Mainframe backup resided on the same IP or FC Disk Array as Distributed Systems backups?

Ultimately the high-end VTL (I.E. IBM TS77n0, EMC DLm 8n00, Oracle VSM, et al) solutions are designed for the upper echelons of the business and IBM Z Mainframe world.  Their capacity, performance and resilience capability is significant, and by definition, so is the associated cost.  How easy or difficult might it be to perform a seamless, agile and timely Business Application DR test via such a high-end VTL?  Are there alternative options that any IBM Z Mainframe user can consider, regardless of their size, whether large or small?

The advances in FICON connectivity, x86/POWER servers and Distributed Systems disk arrays has allowed for such technologies to be packaged in a cost efficient and small footprint IBM Z VTL appliance.  Their ability to connect to the IBM Z server via FICON connectivity, provide full IBM Z tape emulation and connect to ubiquitous IP and FC Distributed Systems disk arrays, positions them for strategic use by any IBM Z Mainframe user for DevOps DR testing.  Primarily one consistent copy of enterprise wide Business Application data would reside on the same disk array, simplifying the process of recovering Point-In-Time backup data for DR testing.

On the one hand, for the smaller IBM Z user, such an IBM Z VTL appliance (E.g. Optica zVT) could for the first time, allow them to simplify their DR processes with a 3rd party DR supplier.  They could electronically vault their IBM Z Mainframe backup data to their 3rd party DR supplier and activate a totally automated DR invocation, as and when required.  On the other hand, moreover for DevOps processes, the provision of an isolated LPAR, would allow the smaller IBM Z Mainframe user to perform a meaningful Business Application DR test, in-house, without impacting Production services.  Once again, simplifying the Business Application DR test process applies to the largest of IBM Z Mainframe users, and leveraging from such an IBM Z VTL appliance, would simplify things, without impacting their Grid configuration supporting their Mission critical workloads.

In conclusion, there has always been commonality in DR processes for the smallest and largest of IBM Z Mainframe users, where the only tangible difference would have been budget related, where the largest IBM Z Mainframe user could and in fact needed to invest in the latest and greatest.  As always, sometimes there are requirements that apply to all, regardless of size and budget.  Seemingly DevOps is such a requirement, and the need to perform on-demand seamless, agile and timely Business Application DR tests is mandatory for all.  From an enterprise wide viewpoint, perhaps a modicum of investment in an affordable IBM Z VTL appliance might be the last time an IBM Z Mainframe user needs to revisit their DR testing processes!

IBM Z Server: Best In Class For Availability – Does Form Factor Matter?

A recent ITIC 2017 Global Server Hardware and Server OS Reliability Survey classified the IBM Z server as delivering the highest levels of reliability/uptime, delivering ~8 Seconds or less of unplanned downtime per month.  This was the 9th consecutive year that such a statistic had been recorded for the IBM Z Mainframe platform.  This compares to ~3 Minutes of unplanned downtime per month for several other specialized server technologies, including IBM POWER, Cisco UCS and HP Integrity Superdome via the Linux Operating System.  Clearly, unplanned server downtime is undesirable and costly, impacting the bottom line of the business.  Industry Analysts state that ~80% of global business require 99.99% uptime, equating to ~52.5 Minutes downtime per year or ~8.66 Seconds per day.  In theory, only the IBM Z Mainframe platform exceeds this availability requirement, while IBM POWER, Cisco UCS and HP Integrity Superdome deliver borderline 99.99% availability capability.  The IBM Mainframe is classified as a mission-critical resource in 92 of the top 100 global banks, 23 of the top 25 USA based retailers, all 10 of the top 10 global insurance companies and 23 of the top 25 largest airlines globally…

The requirement for ever increasing amounts of corporate compute power is without doubt, satisfying the processing of ever increasing amounts of data, created from digital sources, including Cloud, Mobile and Social, requiring near real-time analytics to deliver meaningful information from these oceans of data.  Some organizations select x86 server technology to deliver this computing power requirement, either in their own Data Centre or via a 3rd party Cloud Provider.  However, with unplanned downtime characteristics that don’t meet the seeming de facto 99.99% uptime availability metric, can the growth in x86 server technology continue?  From many perspectives, Reliability, Availability & Serviceability (RAS), Data Security via Pervasive Encryption and best-in-class Performance and Scalability, you might think that the IBM Z Mainframe would be the platform of choice?  For whatever reason, this is not always the case!  Maybe we need to look at recent developments and trends in the compute power delivery market and second guess what might happen in the future…

Significant Cloud providers deliver vast amounts of computing power and associated resources, evolving their business models accordingly.  Such business models have many challenges, primarily uptime and data security related, convincing their prospective customers to migrate their workloads from traditional internal Data Centres, into these massive rack provisioned infrastructures.  Recently Google has evolved from using Intel as its primary supplier for Data Centre CPU chips, including CPU chips from IBM and other semiconductor rivals.

In April 2016, Google declared it had ported its online services to the IBM POWER CPU chip and that its toolchain could output code for Intel x86, IBM POWER and 64-bit ARM cores at the flip of a command-line switch.  As part of the OpenPOWER and Open Compute Project (OCP) initiatives, Google, IBM and Rackspace are collaborating to develop an open server specification based on the IBM POWER9 architecture.  The OCP Rack & Power Project will dictate the size and shape or form factor for housing these industry standard rack infrastructures.  What does this mean for the IBM Z server form factor?

Traditionally and over the last decade or more, IBM has utilized the 24 Inch rack form factor for the IBM Z Mainframe and Enterprise Class POWER Systems.  Of course, this is a different form factor to the industry standard 19 Inch rack, which finally became the de facto standard for the ubiquitous blade server.  Unfortunately there was no tangible standard for a 19 Inch rack, generating power, cooling and other issues.  Hence the evolution of the OCP Rack & Power Standard, codenamed Open Rack.  Google and Facebook have recently collaborated to evolve the Open Rack Standard V2.0, based upon an external 21 Inch rack Form factor, accommodating the de facto 19 Inch rack mounted equipment.

How do these recent developments influence the IBM Z platform?  If you’re the ubiquitous global CIO, knowing your organizations requires 99.99%+ uptime, delivering continuous business application change via DevOps, safeguarding corporate data with intelligent and system wide encryption, perhaps you still view the IBM Z Mainframe as a proprietary server with its own form factor?

As IBM have already demonstrated with their OpenPOWER offering, collaborating with Google and Rackspace, their 24 Inch rack approach can be evolved, becoming just another CPU chip in a Cloud (E.g. IaaS, Paas) service provider environment.  Maybe the final evolution step for the IBM Z Mainframe is evolving its form factor to a ubiquitous 19 Inch rack format?  The intelligent and clearly defined approach of the Open Rack Standard makes sense and if IBM could deliver an IBM Z Server in such a format, it just becomes another CPU chip in the ubiquitous Cloud (E.g. IaaS, Paas) service provider environment.  This might be the final piece of the jigsaw for today’s CIO as their approach to procuring compute power might be based solely upon the uptime and data security metrics.  For those organizations requiring in excess of 99.99% uptime and fully compliant security, there only seems to be one choice, the IBM Z Mainframe CPU chip technology, which has been running Linux workloads since 2000!