IBM Z Mainframe Pre-Production Testing: Spring Into Stress Testing via zBuRST

For those of us in the Northern Hemisphere it’s been another long & cold Winter & for many, a time of pandemic lockdown.  As we enter Spring, we often associate this annual season with hope & new life & perhaps opportunity.  Henry Wadsworth Longfellow once wrote “If Spring came but once in a century, instead of once a year, or burst forth with the sound of an earthquake, and not in silence, what wonder and expectation would there be in all hearts to behold the miraculous change!”  Let’s not get carried away, but I have recently worked with an IBM Z customer to finally perform a Pre-Production full workload test via the IBM Z Business Resiliency Stress Test (zBuRST) solution…

In an ideal world, zBuRST would offer a much needed solution for all IBM Z Mainframe users with limited resource or budget to perform Pre-Production full workload testing activities.  However, in reality, there are some significant qualification caveats, primarily a minimum of 10,000 MIPS workload capacity & the need for latest generation z14 or newer Mainframe servers.  As with anything in business or indeed life, if you don’t ask, you will never know & there is some flexibility from an installed MIPS viewpoint via your local IBM account team.

IBM Z Business Resiliency Stress Test (zBuRST) is a solution that enables the use of spare IBM Z server physical resources to stress test changes at Production workload scale, allowing qualitative & quantitative validation of any Production change to safeguard the performance & resilience profile of IBM Z mission critical workloads.  For the avoidance of doubt, a Pre-Production test can be verified with a minimal data subset for qualitative purposes, but only a 100%+ data quantitative stress test will verify the SLA & KPI metrics required for a mission critical workload.  zBuRST only supports Pre-Production (DevTest) environments, which could include a GDPS internal environment, or a 3rd party DR supplier.  However, zBuRST cannot be used for any DR activity, testing or real-life invocation.  Hopefully most IBM Z mainframe users are savvy & have included some flexibility in their 3rd party DR provision contracts, allowing for periodic use of such facilities, not solely DR based.  This is not an unusual requirement & if you rely upon a 3rd party provider for IBM Z resilience, work with them to evolve your IBM Z resource provision service contracts accordingly.

From a big picture viewpoint, zBuRST reduces change risk, safeguarding business resiliency by enabling the detection and resolution of abnormalities and defects in a Pre-Production environment, which would otherwise inevitably manifest as business outages, disruptions, or slowdowns:

  • For IBM Z users with matching (identical) hardware in a standalone test or DR environment, zBuRST provides the ability to perform load or stress test of new IBM Z hardware features & upgraded functions.
  • For IBM Z users whose DR sites do not match their Production environment, the zBuRST objective is to enable critical workload testing (E.g. use all available resource to test the mission critical workloads).

From an eligibility viewpoint, if your organization is currently testing with constrained IBM Z resources, prohibiting adequate Production workload sized testing, zBuRST improves workload resiliency:

  • Can your business scale reliably & conform to SLA & KPI Metrics during seasonal or ad-hoc peak processing demands (E.g. Year End, Black Friday, Cyber Monday, et al)?
  • Is your business mission critical application impacted by change aversion, with fear of disrupting Production stability?
  • Are your agile DevOps aspirations hampered by the legacy waterfall application development approach, taking too long to adequately test changes, or introduce new features & functions, for Production workloads?
  • Do elongated Production outages (I.E. Downtime) come at an excessive or prohibitive business cost?
  • Is it too complex to provision adequate local or 3rd party IBM Z resources for large scale volume or integration tests?

The zBuRST solution has a number of prerequisites & the primary considerations are:

  • zBuRST is an extension of the IBM Z Application Development and Test Solution (DevTest Solution).
  • zBuRST Tokens are discounted at 80% from the cost of On/Off CoD capacity.
  • zBuRST can be purchased for systems with a minimum of 10,000 installed MIPS, for up to 50-100% of Production capacity.  All MIPS capacity must reside in the same country.
  • zBuRST pre-paid tokens can be purchased up to 100% of the additional capacity needed to support Production scale stress testing.
  • zBuRST tokens allow for up to 15 days of testing; tokens can be activated for any 15 calendar days, whether consecutive or not (E.g. Perform n stress tests of n days duration).
  • zBuRST tokens expire 5 years from the IBM Z server LICC “Withdrawal from Marketing” date.
  • For DevTest Solutions, zBuRST capacity can be purchased to increase the size of the DevTest environment up to the equivalent number of Production MIPS.
  • For DR machine usage, zBuRST tokens can be purchased up to the equivalent number of Production MIPS.
  • zBuRST tokens can only be installed & exclusively used on IBM Z hardware owned by the IBM Z user (customer); zBuRST is not available to 3rd party IBM Z resource service providers.
  • zBuRST tokens are pre-paid On/Off CoD LIC records.  There can only be one On/Off CoD record active at a time.  Post-paid On/Off CoD LIC records & zBuRST tokens cannot be active at the same time on the same machine.  There cannot be mixing of pre-paid & post-paid On/Off CoD LIC records.
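
To make the interplay of these prerequisites a little more tangible, here is a minimal Python sketch of the rules as I read them from the list above.  It is not an IBM tool or API; every name in it (max_token_mips, token_cost, can_activate_token, TokenLedger) is hypothetical, & it assumes that “discounted at 80%” means 80% off the On/Off CoD cost & that purchasable capacity is simply capped at Production MIPS.  Your IBM account team remains the authority on what actually applies.

```python
from dataclasses import dataclass, field
from datetime import date

MIN_INSTALLED_MIPS = 10_000  # stated minimum; the account team may show flexibility
MAX_TEST_DAYS = 15           # any 15 calendar days, consecutive or not
ASSUMED_DISCOUNT = 0.80      # assumption: "discounted at 80%" read as 80% off


def max_token_mips(installed_mips: int, production_mips: int) -> int:
    """Capacity ceiling: nothing below the minimum, else capped at Production MIPS."""
    if installed_mips < MIN_INSTALLED_MIPS:
        return 0
    return production_mips


def token_cost(on_off_cod_cost: float) -> float:
    """Token cost relative to the equivalent On/Off CoD cost (assumed 80% off)."""
    return round(on_off_cod_cost * (1 - ASSUMED_DISCOUNT), 2)


def can_activate_token(active_cod_records: list) -> bool:
    """Only one On/Off CoD record may be active on a machine at a time."""
    return len(active_cod_records) == 0


@dataclass
class TokenLedger:
    """Tracks which calendar days have consumed token capacity."""
    used_days: set = field(default_factory=set)

    def activate(self, day: date) -> int:
        """Record a test day and return the number of days remaining."""
        if day not in self.used_days and len(self.used_days) >= MAX_TEST_DAYS:
            raise ValueError("all 15 test days have already been used")
        self.used_days.add(day)
        return MAX_TEST_DAYS - len(self.used_days)


# Two non-consecutive test days are fine; re-activating the same day costs nothing extra.
ledger = TokenLedger()
ledger.activate(date(2021, 4, 12))
remaining = ledger.activate(date(2021, 6, 1))
```

The ledger uses a set of dates because the rule counts calendar days, not activations, so a second test on the same day does not use up another day.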

zBuRST can deliver greater certainty & benefit for an IBM Z organization via:

  • Change risk eradication with Production workload stress testing, increasing business resiliency, customer satisfaction & operational efficiency.
  • Faster delivery of new business features & functions at reduced risk, enabling an agile DevOps application change environment.
  • Empowering IT personnel to safely test changes, at Production workload scale, in a DevTest environment, identifying problems or anomalies that typically only occur at scale.
  • Higher ROI for DR resource usage (E.g. Use for stress testing, not just for DR testing).
  • Increased & comprehensive application testing capabilities for a lower cost.

When working with my customer over the last few months, the real-life lessons learned were:

  • Collaborate with the 3rd party IBM Z resource supplier, to safeguard the use of their IBM Z server based upon a number of days, as opposed to a DR testing only usage approach.  For the avoidance of doubt, contract for n days, where those n days could be used for any combination of Pre-Production testing & DR usage.
  • Engage with all ISV organizations from an FYI viewpoint, informing them of this DevTest approach, where their software will be used for Pre-Production testing purposes, allowing them to safely generate temporary software license codes accordingly, as & if required.
  • Work really closely with your IBM account team to find a win-win situation for all, as this customer was a ~9,000 MIPS user.  That could be the provision of anticipated White Space CPU capacity by IBM or, for a committed IBM Z Mainframe user, an acknowledgement that maybe the 10,000 MIPS watermark is just too high.
  • Educate your Operations, Applications & Business units on this zBuRST option.  Some IBM Z users might have been restricted for years if not decades, not being able to perform a 100% data & CPU resource Pre-Production workload test.  The brainstorming, collaboration & good will that manifests itself is one of those few occasions in IT where the users of your IT services are happy to be an integral part of the change process!

My final observation is a reflection on the last few months of my day-to-day activities.  For 2-3 days per week, I have been combining IT work with being “Captain Clipboard” at a local UK COVID-19 vaccination centre, which in itself has been so rewarding, seeing the relief on people’s faces, especially those that are of a mature age, perhaps infirm, feeling they can be a part of the wider community again.  The parallels are obvious: zBuRST can allow those IBM Z users prohibited from performing 100% data & CPU Pre-Production testing activities the opportunity to advance their business.  However, unlike the COVID-19 vaccination, which for the fortunate developed countries is available to all citizens, zBuRST does have some usage restrictions.  Perhaps it’s up to the wider IBM Z user community to encourage IBM to revisit & modify their approach, perhaps reducing the MIPS capacity requirements to 5,000 MIPS.  Wherever you’re based globally, if you’re a member of SHARE (USA) or GSE (Europe), et al, maybe reach out to your Large Systems representatives & see if the global collective from the IBM Z user organizations can encourage IBM to evolve their offering, enabling zBuRST solution usage for a larger proportion, if not all, of IBM Z Mainframe users.

The Ever Changing IBM Z Mainframe Disaster Recovery Requirement

With a 50+ year longevity, of course the IBM Z Mainframe Disaster Recovery (DR) requirement and associated processes have changed and evolved accordingly.  Initially, the primary focus would have been HDA (Head Disk Assembly) related, recovering data due to hardware (E.g. 23nn, 33nn DASD) failures.  It seems incredible in the 21st Century to consider the downtime and data loss with such an event, but these failures were commonplace into the early 1980’s.  Disk drive (DASD) reliability increased with the 3380 device in the 1980’s and the introduction of the 3990-03 Dual Copy capability in the late 1980’s eradicated the potential consequences of a physical HDA failure.

The significant cost of storage and CPU resources dictated that many organizations had to rely upon 3rd party service providers for DR resource provision.  Often this dictated a classification of business applications, differentiating between Mission Critical or not, where DR backup and recovery processes would be application based.  Even the largest of organizations that could afford to duplicate CPU resource, would have to rely upon the Ford Transit Access Method (FTAM), shipping physical tape from one location to another and performing proactive or more likely reactive data restore activities.  A modicum of database log-shipping over SNA networks automated this process for Mission Critical data, but successful DR provision was still a major consideration.

Even with the Dual Copy function, this meant DASD storage resources had to be doubled for contingency purposes.  Therefore this dictated only the upper echelons of the business world (I.E. Financial Organizations, Telecommunications Suppliers, Airlines, Etc.) could afford the duplication of investment required for self-sufficient DR capability.  Put simply, a duplication of IBM Mainframe CPU, Network and Storage resources was required…

The 1990’s heralded a significant evolution in generic IT technology, including IBM Mainframe.  The adoption of RAID technology for IBM Mainframe Count Key Data (CKD) provided an affordable solution for all IBM Mainframe users, where RAID-5(+) implementations became commonplace.  The emergence of ESCON/FICON channel connectivity provided the extended distance requirement to complement the emerging Parallel SYSPLEX technology, allowing IBM Mainframe servers and related storage to be geographically dispersed.  This allowed a greater number of IBM Mainframe customers to provision their own in-house DR capability, but many still relied upon physical tape shipment to a 3rd party DR services provider.

The final significant storage technology evolution was the Virtual Tape Library (VTL) structure, introduced in the mid-1990’s.  This technology simplified capacity optimization for physical tape media, while reducing the number of physical drives required to satisfy the tape workload.  These VTL structures would also benefit from SYSPLEX implementations, but for many IBM Mainframe users, physical tape shipment might still be required.  Even though the IBM Mainframe had supported IP connectivity since the early 1990’s, using this network capability to ship significant amounts of data was dependent upon public network infrastructures becoming faster and more affordable.  In the mid-2000’s, transporting IBM Mainframe backup data via extended network carriers, beyond the limit of FICON technologies became more commonplace, once again, changing the face of DR approaches.

More recently, the need for Grid configurations of 2, 3 or more locations has become the utopia for the Global 1000 type business organization.  Numerous copies of synchronized Mission Critical if not all IBM Z Mainframe data are now maintained, reducing the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) DR criteria to several Minutes or less.

As with anything in life, learning from the lessons of history is always a good thing and for each and every high profile IBM Z Mainframe user (E.g. 5000+ MSU), there are many more smaller users, who face the same DR challenges.  Just as various technology races (E.g. Space, Motor Sport, Energy, et al) eventually deliver affordable benefit to a wider population, the same applies for the IBM Z Mainframe community.  The commonality is the challenges faced, where over the years, DR focus has either been application or entire business based, influenced by the technologies available to the IBM Mainframe user, typically dictated by cost.  However, the recent digital data explosion generates a common challenge for all IT users alike, whether large or small.  Quite simply, to remain competitive and generate new business opportunities from that priceless and unique resource, namely business data, organizations must embrace the DevOps philosophy.

Let’s consider the frequency of performing DR tests.  If you’re a smaller IBM Z Mainframe user, relying upon a 3rd party DR service provider, your DR test frequency might be 1-2 tests per year.  Conversely if you’re a large IBM Z Mainframe user, deploying a Grid configuration, you might consider that your business no longer has the requirement for periodic DR tests.  This would be a dangerous thought pattern, because, as it was forever thus, SYSPLEX and Grid configurations only safeguard against physical hardware scenarios, whereas a logical error will proliferate throughout all data copies, whether 2, 3 or more…
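
The difference between a physical and a logical failure can be illustrated with a toy Python sketch.  This is purely a conceptual model, not a representation of any real SYSPLEX, Grid or replication product; the MirroredStore class and its methods are hypothetical names invented for the illustration.

```python
import copy


class MirroredStore:
    """Toy model: every write is applied synchronously to all surviving copies."""

    def __init__(self, copies: int):
        self.copies = [dict() for _ in range(copies)]

    def write(self, key, value):
        for c in self.copies:
            if c is not None:
                c[key] = value

    def fail_copy(self, index: int):
        """Simulate the physical loss of one copy."""
        self.copies[index] = None

    def read(self, key):
        """Serve the read from the first surviving copy."""
        for c in self.copies:
            if c is not None:
                return c[key]
        raise RuntimeError("no surviving copy")


store = MirroredStore(copies=3)
store.write("balance", 100)

# Point-in-time backup taken before any change is applied.
snapshot = copy.deepcopy(store.copies[0])

# Physical failure: one copy is lost, but the data is still served.
store.fail_copy(0)
after_hw_failure = store.read("balance")

# Logical error (E.g. a faulty application change): faithfully mirrored everywhere.
store.write("balance", -999)
after_logical_error = store.read("balance")

# Only the point-in-time copy still holds the good value.
recovered = snapshot["balance"]
```

In this model the mirrored copies protect against the hardware loss, but every surviving copy ends up holding the bad value after the logical error, which is why a separate Point-In-Time backup and a periodic test of restoring it still matter.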

Similarly, when considering the frequency of Business Application changes, for the archetypal IBM Z Mainframe user, this might have been Monthly or Quarterly, perhaps with imposed change freezes due to significant seasonal or business peaks.  However, in an IT ecosystem where the IBM Z Mainframe is just another interconnected node on the network, the requirement for a significantly increased frequency of Business Application changes arguably becomes mandatory.  Therefore, once again, if we consider our frequency of DR tests, how many per year do we perform?  In all likelihood, this becomes the wrong question!  A better statement might be, “we perform an automated DR test as part of our Business Application changes”.  In theory, the adoption of DevOps either increases the frequency of scheduled Business Application changes, or the organization embraces an “on demand” type approach…

We must then consider which IT Group performs the DR test.  In theory, it’s many groups, dictated by their technical expertise, whether Server, Storage, Network, Database, Transaction or Operations based.  Once again, if embracing DevOps, the Application Development teams need to be able to write and test code, while the Operations teams need to implement and manage the associated business services.  In such a model, there has to be a fundamental mind change, where technical Subject Matter Experts (SME) design and implement technical processes, which simplify the activities associated with DevOps.  From a DR viewpoint, this dictates that the DevOps process should facilitate a robust DR test, for each and every Business Application change.  Whether an organization is the largest or smallest of IBM Z Mainframe users is somewhat immaterial; performing an entire system-wide DR test for an isolated Business Application change is not required.  Conversely, performing a meaningful Business Application test during the DevOps code test and acceptance process makes perfect sense.
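
As a rough illustration of what “a DR test for each and every Business Application change” could look like as a process gate, here is a minimal Python sketch.  It assumes nothing about any specific DevOps tooling; the step names and the run_change_pipeline and promote_allowed functions are hypothetical placeholders, with each step reduced to a callable that reports success or failure.

```python
from typing import Callable, List, Tuple

# A step is a name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_change_pipeline(steps: List[Step]) -> List[str]:
    """Run steps in order, stopping at the first failure; return the steps that passed."""
    passed = []
    for name, action in steps:
        if not action():
            break
        passed.append(name)
    return passed


def promote_allowed(passed: List[str], required: List[str]) -> bool:
    """A change is promoted only if every required step, including the DR test, passed."""
    return all(name in passed for name in required)


# Hypothetical steps; real ones would drive the SME-designed technical processes.
steps = [
    ("application_tests", lambda: True),
    ("restore_backup_to_isolated_lpar", lambda: True),
    ("application_dr_test", lambda: True),
]
required = ["application_tests", "application_dr_test"]

passed = run_change_pipeline(steps)
ok_to_promote = promote_allowed(passed, required)
```

The point of the design is that the application-level DR test is a required step of the change itself, so a change that cannot be recovered never reaches Production, without anybody scheduling a system-wide DR exercise.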

Performing a meaningful Business Application DR test as part of the DevOps process is a consistent requirement, whether an organization is the largest or smallest IBM Z Mainframe user.  Although their hardware resource might differ significantly, where the largest IBM Z Mainframe user would typically deploy a high-end VTL (I.E. IBM TS77n0, EMC DLm 8n00, Oracle VSM, et al), the requirement to perform a seamless, agile and timely Business Application DR test remains the same.

If we recognize that the IBM Z Mainframe is typically deployed as the System Of Record (SOR) data server, today’s 21st century Business Application incorporates interoperability with Distributed Systems (E.g. Wintel, UNIX, Linux, et al) platforms.  In theory, this is a consideration, as mostly, IBM Z Mainframe data resides in proprietary 3390 DASD subsystems, while Distributed Systems data typically resides in IP (NFS, NAS) and/or FC (SAN) filesystems.  However, the IBM Z Mainframe has benefited from Distributed Systems technology advancements, where typical VTL Grid configurations utilize proprietary IP connected disk arrays for VTL data.  Ultimately a VTL structure will contain the “just in case” copy of Business Application backup data, the very data copy required for a meaningful DR test.  Wouldn’t it be advantageous if the IBM Z Mainframe backup resided on the same IP or FC Disk Array as Distributed Systems backups?

Ultimately the high-end VTL (I.E. IBM TS77n0, EMC DLm 8n00, Oracle VSM, et al) solutions are designed for the upper echelons of the business and IBM Z Mainframe world.  Their capacity, performance and resilience capability is significant, and by definition, so is the associated cost.  How easy or difficult might it be to perform a seamless, agile and timely Business Application DR test via such a high-end VTL?  Are there alternative options that any IBM Z Mainframe user can consider, regardless of their size, whether large or small?

The advances in FICON connectivity, x86/POWER servers and Distributed Systems disk arrays have allowed for such technologies to be packaged in a cost efficient and small footprint IBM Z VTL appliance.  Their ability to connect to the IBM Z server via FICON connectivity, provide full IBM Z tape emulation and connect to ubiquitous IP and FC Distributed Systems disk arrays, positions them for strategic use by any IBM Z Mainframe user for DevOps DR testing.  Primarily one consistent copy of enterprise wide Business Application data would reside on the same disk array, simplifying the process of recovering Point-In-Time backup data for DR testing.

On the one hand, for the smaller IBM Z user, such an IBM Z VTL appliance (E.g. Optica zVT) could for the first time, allow them to simplify their DR processes with a 3rd party DR supplier.  They could electronically vault their IBM Z Mainframe backup data to their 3rd party DR supplier and activate a totally automated DR invocation, as and when required.  On the other hand, and more so for DevOps processes, the provision of an isolated LPAR would allow the smaller IBM Z Mainframe user to perform a meaningful Business Application DR test, in-house, without impacting Production services.  Once again, simplifying the Business Application DR test process applies to the largest of IBM Z Mainframe users, and leveraging such an IBM Z VTL appliance would simplify things, without impacting the Grid configuration supporting their Mission Critical workloads.

In conclusion, there has always been commonality in DR processes for the smallest and largest of IBM Z Mainframe users, where the only tangible difference would have been budget related, where the largest IBM Z Mainframe user could and in fact needed to invest in the latest and greatest.  As always, sometimes there are requirements that apply to all, regardless of size and budget.  Seemingly DevOps is such a requirement, and the need to perform on-demand seamless, agile and timely Business Application DR tests is mandatory for all.  From an enterprise wide viewpoint, perhaps a modicum of investment in an affordable IBM Z VTL appliance might be the last time an IBM Z Mainframe user needs to revisit their DR testing processes!