Simplified Business Facing IBM Z Mainframe DevOps APM Problem Determination

IBM Z Mainframe stakeholders are increasingly cognizant that traditional processes for managing Information Technology operations are becoming obsolete, hence the emergence of DevOps (DevSecOps) frameworks.  Driven by digital transformation & the perpetually increasing demand for new digital services consuming unparalleled amounts of data, Data Centres are under ever greater pressure to deliver & maintain these mission-critical services.  A major challenge is the availability of these services, where transaction & throughput workloads can be unpredictable, often ad hoc demand driven (E.g. Consumer) rather than the typical planned periodic peaks (E.g. Monthly, Annual, et al).

Today’s inward facing, dispassionate & honest CIO knows their organization can spend inordinate amounts of time reacting to business application impact incidents, often taking too long to resolve them & all too often lacking the bandwidth to be proactive & prevent the incident from occurring in the first place.  It’s widely accepted that for the majority of Global 1000 companies, the IBM Z Mainframe platform provides the de facto System Of Record (SOR) data platform, with associated Database (E.g. Db2) & Transaction (E.g. CICS, IMS) subsystems.  Because the platform plays such a central & integral part of today’s 21st century digital application infrastructure, performance issues can affect the entire business application, dictating that early detection & resolution of performance issues are business critical, with the ultimate goal of eliminating such issues altogether.

Technologies such as z/OS Connect provide a simple & intuitive API based method for the IBM Z Mainframe to become an interconnected platform with all other Distributed Platforms.  This dictates an evolution in Operations Management processes, considering the business application from a non-technical, holistic viewpoint, with end-to-end monitoring regardless of the underlying hardware & software platforms.

Today’s 21st Century digital economy dictates that central Operations teams have neither inordinate amounts of time nor the requisite Subject Matter Expert (SME) skills for problem investigation activities.  A more proactive & automated response is the deployment of simplified, lean & cost-efficient automated monitoring processes, allowing Operations teams to detect potential problems & their associated failure reasons in near real-time.

Distributed tracing provides a methodology for interpreting how applications function across processes, services & networks.  Tracing follows the activity log trail of processed requests, capturing tracing information as they move from interconnected system to system.  With Distributed tracing, organizations can monitor applications with Event Streams, helping them to understand the size & shape of the generated traffic & assisting in the identification of potential business application problems & their related causes.  It comes as no surprise that Distributed tracing has become a pivotal cornerstone of the DevOps toolbox, leveraging the pervasive Kafka Open-Source Software architecture for distributed systems.  Simply put, Kafka provides meaningful context from the messaging & logging generated by various IT platforms, delivering data flow visualizations & simplifying the identification & prioritization of business application performance anomalies.  Kafka Distributed tracing pinpoints where failures occur & what causes poor performance (I.E. X marks the spot)!
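The trace propagation described above can be sketched in a few lines of Python; the topic name & field names below are purely illustrative assumptions, not any specific product’s schema, & a real deployment would use a Kafka client library & the W3C Trace Context format:

```python
import json
import uuid

def new_trace_context():
    """Create a root trace context (IDs loosely modelled on W3C Trace Context)."""
    return {"trace_id": uuid.uuid4().hex, "span_id": uuid.uuid4().hex[:16]}

def child_span(parent):
    """Derive a child span, keeping the trace_id so systems stay correlated."""
    return {"trace_id": parent["trace_id"],
            "span_id": uuid.uuid4().hex[:16],
            "parent_span_id": parent["span_id"]}

# A payment request enters via a web front-end, then drives a CICS transaction.
root = new_trace_context()
cics_span = child_span(root)

# The event published to a Kafka topic carries the trace context in its headers,
# so downstream consumers can stitch the end-to-end trace back together.
event = {
    "topic": "payments",                      # hypothetical topic name
    "headers": cics_span,
    "value": json.dumps({"txn": "PAY123", "elapsed_ms": 42}),
}
print(event["headers"]["trace_id"] == root["trace_id"])  # → True
```

Because every hop keeps the same `trace_id`, any consumer of the stream can reassemble the full request path, which is precisely what lets an APM tool mark the spot where a request slowed down.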

From a business & therefore non-technical viewpoint, the utopia is to understand the user experiences delivered & their associated business impacts; ideally positive, therefore eliminating the negative.  Traditionally from a technical viewpoint, experts have focussed on MELT (Metrics, Events, Logs, Traces) data collection, allowing for subsequent problem determination & resolution.  Historically, when this was the only data available, manual & time consuming technical processes inevitably ensued.  As we have explored, DevOps is about simplification, optimization, automation & ultimately delivering the best business service!  If only there was a better way…

OpenTelemetry is a collection of tools, APIs & SDKs, utilized to instrument, generate, collect & export telemetry data (Metrics, Events, Logs, Traces) to assist software performance behavioural analysis.  Put simply, OpenTelemetry is an Open-Source Software vendor agnostic standard for application telemetry & supporting infrastructures & services data collection:

  • APIs: Instrument application code to generate telemetry trace data
  • SDKs: Collect the telemetry data & manage the rest of the telemetry processing pipeline
  • In-Process Exporters: Translate telemetry data into custom formats for Back-End processing, from within the application
  • Out-Of-Process Exporters (E.g. Collector): Translate telemetry data into custom formats for Back-End processing, in a separate process
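The division of labour in the list above can be illustrated with a deliberately simplified sketch; these classes are conceptual stand-ins of my own invention, not the real OpenTelemetry packages, which provide production grade equivalents of each role:

```python
import json
import time
import uuid

class Span:
    """A single timed unit of work, closed via the context manager protocol."""
    def __init__(self, name, processor):
        self.name, self.processor = name, processor
        self.span_id = uuid.uuid4().hex[:16]
    def __enter__(self):
        self.start = time.time()
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.duration_ms = (time.time() - self.start) * 1000
        self.processor.on_end(self)      # SDK role: collect the finished span

class Tracer:
    """API role: instrumented code calls this to create spans."""
    def __init__(self, processor):
        self.processor = processor
    def span(self, name):
        return Span(name, self.processor)

class ConsoleExporter:
    """In-process exporter role: translate spans into a Back-End format."""
    def export(self, span):
        print(json.dumps({"name": span.name, "span_id": span.span_id,
                          "duration_ms": round(span.duration_ms, 2)}))

class Processor:
    """SDK pipeline: hand every ended span to the configured exporter."""
    def __init__(self, exporter):
        self.exporter = exporter
    def on_end(self, span):
        self.exporter.export(span)

tracer = Tracer(Processor(ConsoleExporter()))
with tracer.span("db2-query"):
    time.sleep(0.01)                     # simulated work being instrumented
```

Swapping `ConsoleExporter` for an exporter that ships spans out-of-process is exactly the point of the pluggable exporter layer: the instrumented application code never changes.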

In conclusion, from a big picture viewpoint, the IBM Z Mainframe is just another IP node on the network, seamlessly interconnecting with Distributed Systems platforms for 21st century digital business application processing.  Regardless of technical platform, DevOps is not a technical discipline; it’s a business oriented user experience process & as such requires automated issue detection & rapid resolution.  Open-Source Software (OSS) frameworks such as OpenTelemetry & Distributed Tracing allow for the simplified, low cost collection & visualization of instrumentation data.  How can the IBM Z Mainframe organization incorporate a DevOps facing solution to aggregate this log data, providing an optimal cost, resource friendly Application Performance Management (APM) solution for simplified business application performance identification?

z/IRIS (Integrable Real-Time Information Streaming) integrates the IBM Z Mainframe platform into commonplace pervasive enterprise wide Application Performance Monitoring (APM) solutions, allowing DevOps resources to gain the insights they need to better understand Mainframe utilization & potential issues for mission critical business services.

z/IRIS incorporates OpenTelemetry observability for IBM Z Mainframe systems & applications, enriching traces (E.g. Db2 Accounting, Db2 Deadlock, z/OS Connect, JES2, OMVS, STC, TSO) with attributes to facilitate the searching, filtering & analysis of traces in today’s 3rd party enterprise wide APM tools (E.g. AppDynamics, Datadog, Dynatrace, IBM Instana, Jaeger, New Relic, Splunk, Sumo Logic).
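As a sketch of how such attribute-enriched traces might then be queried in an APM tool, consider the following; the attribute names (`subsystem`, `elapsed_ms`, et al) & sample values are illustrative assumptions, not the actual z/IRIS schema:

```python
# Hypothetical enriched spans, as an APM tool might receive them.
spans = [
    {"name": "db2-accounting",
     "attributes": {"subsystem": "Db2", "plan": "PLANA", "elapsed_ms": 950}},
    {"name": "zos-connect-api",
     "attributes": {"subsystem": "z/OS Connect", "api": "/payments", "elapsed_ms": 120}},
    {"name": "jes2-job",
     "attributes": {"subsystem": "JES2", "jobname": "NIGHTLY1", "elapsed_ms": 15000}},
]

def slow_spans(spans, subsystem, threshold_ms):
    """Filter traces by enriched attributes, as an APM search query would."""
    return [s for s in spans
            if s["attributes"]["subsystem"] == subsystem
            and s["attributes"]["elapsed_ms"] > threshold_ms]

print([s["name"] for s in slow_spans(spans, "Db2", 500)])  # → ['db2-accounting']
```

The point of the enrichment is exactly this: a DevOps engineer with no Mainframe background can slice traces by subsystem, job or API without knowing anything about SMF record types.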

Capturing metrics & creating associated charts has been an integral part of performance monitoring for several decades.  z/IRIS seamlessly integrates with APM tools such as Instana & data visualization tools such as Grafana to supply zero maintenance automated dashboards for commonplace day-to-day usage.  Of course, each & every business requires their own perspectives, hence z/IRIS incorporates easy-to-use customizable dashboards for such requirements.  Because APM & data visualization tools collect data metrics from a variety of information sources, tracing every request from cradle (E.g. Client Browser) to grave (E.g. Host Server), the z/IRIS Mainframe data combinations for your digital dashboards are virtually limitless, where the data presented is always accurate & in real time.

z/IRIS is simple to use & simple to install, incorporating many tried & tested industry standard Open-Source Software components, optimizing costs & simplifying product support.  By using Java based applications wherever possible, from an IBM Z Mainframe viewpoint CPU utilization is minimized, utilizing zIIP processing cycles whenever available.  z/IRIS delivers a lightweight, resource & cost efficient z/OS APM solution, providing end-to-end performance analysis of today’s 21st Century digital solutions.  Because z/IRIS leverages industry standard Open-Source frameworks deployed by commonplace Distributed Systems APM solutions, the instrumentation captured & interpreted by z/IRIS is enriched dynamically as APM functionality increases.  For example, Datadog Watchdog Insights can identify increased latency from a downstream z/OS Connect application, just by processing new analytics from existing telemetry data.  The data had already been captured; as APM functionality evolves, new meaningful business insights are gained.  z/IRIS can deliver the following example benefits for any typical IBM Z Mainframe DevOps environment:

  • Automated IBM Z Mainframe Observability: Automate the collection of end-to-end data tracing information.
  • Real Time Impact Notification: Intelligent data processing to present meaningful DevOps dashboard notifications of business applications service status & variances.
  • Universal Access & Ease Of Use: Facilitate end-to-end Application Performance Monitoring (APM) for all IT teams, not just IBM Z Mainframe Subject Matter Experts (SME).
  • Reduce MTTD & MTTR For Optimized User Services: Reduce Mean Time To Detect (MTTD) & Mean Time To Repair (MTTR), the typical Key Performance Indicators (KPIs), with intelligent root cause analysis.
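The MTTD & MTTR KPIs above are simply averages over incident timestamps; a minimal sketch, using illustrative sample data:

```python
from datetime import datetime

def mean_minutes(pairs):
    """Average gap in minutes between paired (earlier, later) timestamps."""
    gaps = [(later - earlier).total_seconds() / 60 for earlier, later in pairs]
    return sum(gaps) / len(gaps)

fmt = "%Y-%m-%d %H:%M"
# (occurred, detected, repaired) triples - illustrative sample incidents
incidents = [
    (datetime.strptime("2020-09-01 09:00", fmt),
     datetime.strptime("2020-09-01 09:20", fmt),
     datetime.strptime("2020-09-01 10:00", fmt)),
    (datetime.strptime("2020-09-02 14:00", fmt),
     datetime.strptime("2020-09-02 14:10", fmt),
     datetime.strptime("2020-09-02 14:40", fmt)),
]

mttd = mean_minutes([(o, d) for o, d, _ in incidents])  # occurrence → detection
mttr = mean_minutes([(d, r) for _, d, r in incidents])  # detection → repair
print(mttd, mttr)  # → 15.0 35.0
```

Automated observability attacks the first number directly: the sooner a variance surfaces on a dashboard, the sooner the detection clock stops & the repair clock starts.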

Pervasive Encryption & Compression: Why z15 Upgrade Activities Are Optimal & Strategic

A recent IBM Security sponsored Cost of a Data Breach report by the Ponemon Institute highlighted an average data breach cost of $3.86 Million.  Personally Identifiable Information (PII) accounted for ~80% of data breaches, with exposures ranging between 3,400 & 99,730 compromised records.  The term Mega Breach is becoming more commonplace, classified as the exposure of 1+ Million records, where the average cost for such events increases exponentially, ~$50 Million for exposures of up to 10 Million records, rising to ~$392 Million for exposures of 50+ Million records.  From an incident containment viewpoint, organizations typically require 207 days to identify & 73 days to contain a breach, totalling an average lifecycle of 280 days.  Seemingly the majority (I.E. 95%+) of data records breached were not encrypted.  I think we can all agree, prevention is better than cure & the costs of these data breaches are arguably immeasurable to an organization in terms of customer trust & revenue downturn…

With the launch of IBM z14 in 2017, IBM announced its core CPU hardware included the Central Processor Assist for Cryptographic Function (CPACF) encryption feature embedded in the processor chip.  The ability to encrypt data, both at rest & in flight, for a low cost, was good news for IBM Z customers concerned about data security.  Classified as Pervasive Encryption (PE), the capability was designed to universally simplify data encryption processes, eradicating potential sources of data loss due to unwanted data breach scenarios.

Encrypted data appears random & therefore does not compress, so we must consider the pros & cons of data compression accordingly.  An obvious downside of z14 data encryption is that it can render storage-level compression ineffective, because once the data is encrypted, it is not easily compressed.  A zEnterprise Data Compression (zEDC) card could be deployed to compress the data before encryption, but with added expense!  Wouldn’t it be good if data compression & encryption were both performed on the CPU core?

With the IBM z15, the industry standard compression used by the Integrated Accelerator for zEnterprise Data Compression (zEDC) is now built into the z15 core, complementing CPACF encryption.  IBM z15 customers can now have the best of both worlds, with compression followed by encryption, delivered by the processor cores.  Encryption therefore becomes even less expensive, because after data compression, there is significantly less data to encrypt!
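The importance of this ordering can be demonstrated with stock zlib; the XOR keystream below is merely an illustrative stand-in for CPACF encryption (real encryption would use AES via a crypto library), but it shares the key property that its output looks random:

```python
import random
import zlib

def pseudo_encrypt(data, key):
    """Illustrative stand-in for CPACF encryption: XOR with a keystream.
    Not real cryptography - it merely produces random-looking output."""
    rng = random.Random(key)
    return bytes(b ^ rng.randrange(256) for b in data)

plaintext = b"IBM Z Mainframe SMF record data " * 256  # repetitive, so compressible

# z14 dilemma: encrypt first & the ciphertext looks random,
# so storage-level compression achieves essentially nothing.
encrypt_then_compress = zlib.compress(pseudo_encrypt(plaintext, 42))

# z15 approach: compress on-core (zEDC), then encrypt (CPACF),
# leaving far less data to encrypt.
compress_then_encrypt = pseudo_encrypt(zlib.compress(plaintext), 42)

print(len(plaintext), len(encrypt_then_compress), len(compress_then_encrypt))
```

On a repetitive payload like the one above, compress-then-encrypt shrinks the data dramatically, while encrypt-then-compress leaves it at essentially full size, which is exactly the z14 dilemma the z15 cores resolve.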

zEDC can perform compression for the following data classification types:

  • z/OS data (SMF logstreams, BSAM & QSAM data sets, DFSMShsm & DFSMSdss processing)
  • z/OS Distributed File Service (zFS) data
  • z/OS applications, using standard Java packages or zlib APIs
  • z/OS databases (Db2 Large Objects, Db2 Archive Logs, ContentManager OnDemand)
  • Network transmission (Sterling Connect:Direct)

Arguably the increase in remote working due to COVID-19 will increase the likelihood & therefore cost of data breaches & although encryption isn’t the silver bullet for hacking remediation, it goes a long way.  The IBM Z Mainframe might be the most securable platform, but it’s as vulnerable to security breaches as any other platform, primarily due to human beings, whether the obvious external hacker or other factors such as the insider threat, social engineering, software bugs, poor security processes, et al.  If it isn’t already obvious, organizations must periodically & proactively perform Security Audit, Penetration Test & Vulnerability Assessment activities, to name but a few, to combat the aforementioned costs of a security breach.

Over the decades, IBM Z Mainframe upgrade opportunities have manifested themselves every few years & of course, high end organizations are likely to upgrade each & every time.  With a demonstrable TCO & ROI proposition, why not, but for many organizations such an approach is not practicable or financially justifiable.  Occasionally, the “stars align” & an IBM Z Mainframe upgrade activity becomes significantly strategic for all users.

The IBM z15 platform represents such a timeframe.  Very rarely do significant storage & security functions coincide, in this instance on-board CPU core data compression & encryption, eradicating host resource (I.E. Software, Hardware) usage concerns & safeguarding CPU (I.E. MSU, MIPS) usage optimization.  External factors such as global data privacy standards (E.g. EU GDPR, US PII) & associated data breach penalties increase the need for strategic, proactive security processes, with data encryption high on the list of requirements.  Add in the IBM Z Tailored Fit Pricing (TFP) option, simplifying software costs, & the need to compress & encrypt data without adding to the host CPU baseline, & the IBM z15 platform is ideally suited for these converging requirements.  Pervasive Encryption (PE) was introduced on the IBM z14 platform, but on-board CPU core compression was not; GDPR implementation was required by 25 May 2018, with associated significant financial penalties & disclosure requirements; IBM Z Tailored Fit Pricing (TFP) was announced on 14 May 2019, typically based upon an MSU consumption baseline.

Incidentally, the IBM z15 platform can transform your application & data portfolio with enterprise class data privacy, security & cyber resiliency capabilities, delivered via a hybrid cloud.  Mainframe organizations can now get products & ideas to market faster, avoiding cloud security risks & complex migration challenges.  With the Application Discovery & Delivery Intelligence (ADDI) analytical platform, cognitive technologies are deployed to quickly analyse Mainframe applications, discovering & understanding interdependencies & minimizing change risk, for rapid application modernization.

In conclusion, a year after the IBM z15 platform was announced in September 2019, field deployments are high, with the majority of promised function delivered & field tested.  With the ever-increasing cybersecurity threat of data breaches & an opportunity to simplify IBM Z software Monthly License Charges (MLC), a z15 upgrade is both strategic & savvy.  Even & maybe especially if you’re using older IBM Z server hardware (E.g. z13, zxC12, z114/z196, z10, z9, et al), your organization can easily produce a cost justified business case, based upon reduced software costs, Tailored Fit Pricing or not, & optimized compression & encryption, delivering the most securable platform for your organization & its customers.  Combined with proactive security processes to eliminate a myriad of risk register items, maybe that’s a proposition your business leaders might just willingly subscribe to…