Maximizing IBM Z System Of Record (SOR) Data Value: Is ETL Still Relevant?

A general consensus regarding the IBM Z Mainframe platform is that it’s the best transaction and database server available and, more recently with the advent of Pervasive Encryption, the best enterprise class security server.  It therefore follows that the majority of mission critical and valuable data resides in IBM Z Mainframe System Of Record (SOR) database repositories, receiving and passing data via real-time transaction services.  Traditionally, maximizing data value generally involved moving data from the IBM Mainframe to another platform for subsequent analysis, typically for Business Intelligence (BI) and Data Warehouse (DW) purposes.

ETL (Extract, Transform, Load) is an automated and bulk data movement process: data is extracted from source systems, passed through a transformation engine according to an installation defined policy, and the transformed data is then loaded into target systems, typically data warehouses or specialized data repositories, for use by business decision driven applications.  Quite simply, ETL enables an organization to make informed and hopefully intelligent data driven business decisions.  This ubiquitous IT industry TLA (Three Letter Acronym) generated a massive industry of ETL solutions, involving specialized software products running on various Distributed Systems hardware platforms, both commodity and specialized.  However, some ~30 years since the first evolution of ETL processes, is ETL still relevant in the 21st Century?
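
As a minimal sketch of this classic pattern, the Python fragment below extracts rows from an operational source, transforms them in an intermediate engine and only then loads the result into the warehouse.  The table names, connections and transformation rule are purely illustrative, not any specific product’s implementation:

```python
# Minimal ETL sketch: the transformation happens *before* the load step.
# Source/target databases, table names and business rules are hypothetical.
import sqlite3

def extract(source_conn):
    # Pull raw order rows from the operational source system
    return source_conn.execute("SELECT order_id, amount, currency FROM orders").fetchall()

def transform(rows):
    # Business rules applied in the ETL engine, outside the warehouse:
    # normalise currency codes and filter out zero-value orders
    return [(oid, amt, cur.upper()) for (oid, amt, cur) in rows if amt > 0]

def load(target_conn, rows):
    # Only the transformed result reaches the data warehouse
    target_conn.executemany(
        "INSERT INTO dw_orders (order_id, amount, currency) VALUES (?, ?, ?)", rows
    )
    target_conn.commit()

if __name__ == "__main__":
    source = sqlite3.connect(":memory:")   # stand-in for the SOR database
    target = sqlite3.connect(":memory:")   # stand-in for the data warehouse
    source.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, currency TEXT)")
    source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                       [(1, 100.0, "usd"), (2, 0.0, "eur"), (3, 42.5, "gbp")])
    target.execute("CREATE TABLE dw_orders (order_id INTEGER, amount REAL, currency TEXT)")
    load(target, transform(extract(source)))
    print(target.execute("SELECT * FROM dw_orders").fetchall())
```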

The 21st Century has witnessed a massive and arguably exponential data explosion from cloud, mobile and social media sources.  These dynamic and open data sources demand intelligent analytics to process the data in near real-time, and the notion of a time delay between the Extract and Load parts of the ETL process is becoming increasingly unacceptable for most data driven organizations.  During the last several years, there has been increased usage of Cloud BI, with a reported rise from ~25% to ~80% of public cloud users deploying Cloud BI solutions.

For cloud resident data warehouses, an evolution from ETL to ELT (Extract, Load, Transform) has taken place.  ELT is an evolutionary and savvy method of moving data from source systems to centralized data repositories without transforming the data before it’s loaded into the target systems.  The major benefit of the ELT approach is that it serves the near real-time processing requirement of today’s data driven 21st Century business.  With ELT, all extracted raw data resides in the data warehouse, where powerful and modern analytical architectures can transform the data as per the associated business decision making policies.  Put simply, the data transformation occurs when the associated analytical query activities are processed.  For those modern organizations leveraging public cloud resources, ELT and Cloud BI processes make sense and the growth of Cloud BI speaks for itself.  However, what about the traditional business, which has leveraged the IBM Z Mainframe platform for 30-50+ years?
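
To contrast with the ETL sketch above, a minimal ELT sketch (again with a hypothetical warehouse and table names) loads the raw data untouched and defers the transformation to a view that the warehouse engine evaluates at analytical query time:

```python
# Minimal ELT sketch: raw data is loaded first; the transformation is
# expressed inside the warehouse and applied when the query runs.
# The warehouse, table and view names are hypothetical stand-ins.
import sqlite3

warehouse = sqlite3.connect(":memory:")

# Load: raw, untransformed rows land directly in the warehouse
warehouse.execute("CREATE TABLE raw_orders (order_id INTEGER, amount REAL, currency TEXT)")
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
                      [(1, 100.0, "usd"), (2, 0.0, "eur"), (3, 42.5, "gbp")])

# Transform: the business rules live in the warehouse as a view,
# so they are evaluated only when the analytical query is processed
warehouse.execute("""
    CREATE VIEW dw_orders AS
    SELECT order_id, amount, UPPER(currency) AS currency
    FROM raw_orders
    WHERE amount > 0
""")

print(warehouse.execute("SELECT * FROM dw_orders").fetchall())
```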

Each and every leading Public Cloud supplier, including IBM (Watson), has its own proprietary analytical engine, integrating that technology into its mainstream offerings.  As always, the IBM Z Mainframe platform has evolved to deliver the near real-time requirements of an ELT framework, but are there any other generic solutions that might assist a Mainframe organization in its ETL to ELT evolution process?

B.O.S. Software Service und Vertrieb GmbH offers its tcVISION solution, which approaches this subject matter from a data synchronization viewpoint.  tcVISION is a powerful Change Data Capture (CDC) platform for users of IBM Mainframes and Distributed Systems servers.  tcVISION automatically identifies the changes applied to Mainframe and Distributed Systems databases and files; no programming effort is necessary to obtain the changed data.  tcVISION continuously propagates the changed data to the target systems in real-time or on a policy driven time interval, as and when required.  tcVISION offers a rich set of processing and controlling mechanisms to guarantee a fully audit-proof data exchange implementation, and contains powerful bulk processors that perform the initial load of mass data or the cyclic exchange of larger data volumes in an efficient, fast and reliable way.
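
tcVISION’s internals are proprietary, so the following is purely a conceptual sketch of the Change Data Capture pattern described above, not the product’s API: captured changes are read from a source change log and propagated to the target, either immediately or on a policy driven interval.  The Change structure, propagate function and sample values are all hypothetical:

```python
# Conceptual CDC sketch only -- it does NOT represent tcVISION's API or internals.
# Captured changes are propagated to the target in near real-time or on a
# policy-driven time interval.
import time
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Change:
    table: str          # e.g. "CUSTOMER"
    operation: str      # "INSERT", "UPDATE" or "DELETE"
    key: str            # primary key of the changed row
    payload: dict       # column values after the change

def propagate(changes: Iterable[Change],
              apply_to_target: Callable[[Change], None],
              interval_seconds: float = 0.0) -> None:
    """Apply captured changes to the target; interval_seconds=0 means near real-time."""
    for change in changes:
        apply_to_target(change)
        if interval_seconds:
            time.sleep(interval_seconds)  # policy-driven, time-interval propagation

if __name__ == "__main__":
    target_rows = {}                       # in-memory stand-in for the target system
    def apply_change(c: Change) -> None:
        if c.operation == "DELETE":
            target_rows.pop((c.table, c.key), None)
        else:
            target_rows[(c.table, c.key)] = c.payload

    captured = [Change("CUSTOMER", "INSERT", "0001", {"NAME": "ACME"}),
                Change("CUSTOMER", "UPDATE", "0001", {"NAME": "ACME Corp"})]
    propagate(captured, apply_change)
    print(target_rows)
```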

tcVISION supports several data capture methods that can be individually used as the application and associated data processing flow requires.  These methods operate on a real-time or near real-time basis, using IBM Mainframe DBMS, Logstream, Log and Snapshot (compare) data sources.  A myriad of generic database repositories are supported:

  • Adabas: Real-time/Near real-time, log processing, compare processing
  • Adabas LUW: Real-time/Near Real-time, log processing, compare processing
  • CA-Datacom: Log processing, compare processing
  • CA-IDMS: Real-time/Near real-time, log processing, compare processing
  • DB2: Real-time/Near real-time, log processing, compare processing
  • DB2/LUW: Real-time/Near real-time, log processing, compare processing
  • Exasol: Compare processing
  • IMS: Real-time/Near real-time, log processing, compare processing
  • Informix: Real-time/Near real-time, log processing, compare processing
  • Microsoft SQL Server: Real-time/Near real-time, log processing, compare processing
  • Oracle: Real-time/Near real-time, log processing, compare processing
  • PostgreSQL: Real-time/Near real-time, log processing, compare processing
  • Sequential file: Compare processing
  • Teradata: Compare processing
  • VSAM: Real-time/Near real-time, log processing, compare processing
  • VSAM/CICS: Real-time/Near real-time, log processing, compare processing

tcVISION incorporates an intelligent bulk load component that can be used to unload data from a Mainframe or Distributed Systems data source, loading the data into a target database, either directly or by using a loader file.  tcVISION comes with an integrated loop-back prevention for bidirectional data exchange, where individual criteria can be specified to detect and ignore changes that have already been applied.  tcVISION incorporates comprehensive monitoring, logging and integrated alert notification.  Optional performance data may be captured and stored into any commercially available relational database.  This performance data can be analyzed and graphically displayed using the tcVISION web component.
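
As with the CDC sketch earlier, the following is not tcVISION’s actual mechanism; it merely illustrates one common loop-back prevention technique for bidirectional exchange, in which each captured change is tagged with its originating system so the replicator can detect and ignore changes it has already applied itself.  All names and values are hypothetical:

```python
# Conceptual loop-back prevention sketch for bidirectional replication --
# not tcVISION's mechanism.  Changes originated by the replicator itself
# are detected and skipped, so they are not echoed back to the source.
from dataclasses import dataclass

REPLICATOR_ORIGIN = "SYNC-ENGINE"   # hypothetical marker written by the replicator

@dataclass
class CapturedChange:
    key: str
    payload: dict
    origin: str   # system (or process) that originated this change

def should_propagate(change: CapturedChange) -> bool:
    # Ignore changes applied by the replicator itself, breaking the
    # loop in a bidirectional A <-> B exchange.
    return change.origin != REPLICATOR_ORIGIN

if __name__ == "__main__":
    changes = [CapturedChange("0001", {"NAME": "ACME"}, origin="APPLICATION"),
               CapturedChange("0002", {"NAME": "ECHO"}, origin=REPLICATOR_ORIGIN)]
    for c in changes:
        print(c.key, "propagate" if should_propagate(c) else "skip (loop-back)")
```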

From an ETL to ELT evolution viewpoint, tcVISION delivers the following data synchronization benefits:

  • Time Optimization: Significant reduction in data exchange implementation processes and data synchronization processing.
  • Heterogeneous Support: Independent of database supplier, offering support for a myriad of source and target databases.
  • Resource Optimization: Mainframe MIPS reduction and data transfer optimization via intelligent secure compression algorithms.
  • Data Availability: Real-time data replication across application and system boundaries.
  • Implementation Simplicity: Elimination of the need for application programming and data engineering resources.
  • Security: Full accountability and auditability of all data movements.

In conclusion, the ETL process has now been superseded by the real-time data exchange requirements of 21st Century data processing, via the ELT evolution.  Whether viewed as an ELT or a data synchronization requirement, tcVISION delivers an independent, vendor agnostic solution, providing seamless data delivery for analytical purposes while maintaining synchronized data copies between environments in real-time.