Remote Journaling and its Benefits to iSeries High Availability
Author: Dale Porter
Unique and powerful data recovery capabilities are available for shops that use the remote journaling function of the OS/400 operating system. An extension of the standard journaling functions of OS/400, remote journaling duplicates journal entries created on one iSeries machine and transmits them to another iSeries connected via a high-speed line. By storing the journal entries on a separate machine, remote journaling provides data recovery abilities that go well beyond tape backups and standard journaling.
Basics of Remote Journaling
Standard journaling essentially watches designated objects, and as changes occur to those objects, information about each change is written to a record called a journal entry, which is stored in a journal receiver. If necessary, these journal entries can be replayed against data that has been restored from tape, recovering the data to a point that is much more current than a tape restore alone can provide.
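The replay idea can be sketched in a few lines of Python. This is only an illustrative model, not the OS/400 implementation: the `JournalEntry` fields, the `replay` function, and the sample customer records are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class JournalEntry:
    sequence: int            # position of the entry in the journal receiver
    key: str                 # hypothetical record identifier
    value: Optional[str]     # new contents; None models a delete


def replay(restored: Dict[str, str], receiver: List[JournalEntry],
           after_sequence: int) -> Dict[str, str]:
    """Apply every entry newer than the save point to a copy of the restored data."""
    data = dict(restored)
    for entry in sorted(receiver, key=lambda e: e.sequence):
        if entry.sequence <= after_sequence:
            continue  # this change is already contained in the tape save
        if entry.value is None:
            data.pop(entry.key, None)
        else:
            data[entry.key] = entry.value
    return data


# Hypothetical scenario: the tape save was taken after entry 2,
# and entries 3 and 4 were journaled afterwards.
tape_save = {"cust-1": "Smith", "cust-2": "Jones"}
receiver = [
    JournalEntry(1, "cust-1", "Smith"),
    JournalEntry(2, "cust-2", "Jones"),
    JournalEntry(3, "cust-2", "Jones-Lee"),
    JournalEntry(4, "cust-1", None),
]
recovered = replay(tape_save, receiver, after_sequence=2)
print(recovered)  # {'cust-2': 'Jones-Lee'}
```

The restored copy alone would reflect the state at the time of the save; replaying the later entries brings it forward to the last journaled change.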
Remote journaling takes standard journaling a step further by transmitting copies of journal entries, in real time, to journal receivers on another connected iSeries. This protects the journaled information, and in the event that data needs to be restored, the journal entries can be brought back to the production machine and replayed against data that is restored from tape.
For example, remote journaling can protect changes made to source code on a development machine between tape saves. When your source files are journaled with the remote journaling function enabled, the operating system sends a copy of each journal entry from the development machine to a journal receiver on the production machine as the change occurs. In the event of a failure, the most recent tape save can be restored, and the journal entries can be brought back to the development machine and replayed to bring the source code nearly back to its state at the time of failure.
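The development-machine scenario can be modeled roughly as follows. The `Journal` and `RemoteReceiver` classes and the sample change descriptions are hypothetical stand-ins; the real duplication happens inside the operating system, not in application code.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (sequence number, description of the change)


class RemoteReceiver:
    """Stands in for the journal receiver on the second machine."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def receive(self, entry: Entry) -> None:
        self.entries.append(entry)


class Journal:
    """Stands in for a local journal with one remote receiver attached."""

    def __init__(self, remote: RemoteReceiver) -> None:
        self.local_entries: List[Entry] = []
        self.remote = remote
        self._next_sequence = 1

    def deposit(self, change: str) -> None:
        entry = (self._next_sequence, change)
        self._next_sequence += 1
        self.local_entries.append(entry)   # standard journaling
        self.remote.receive(entry)         # remote journaling: copy sent as it occurs


remote = RemoteReceiver()
journal = Journal(remote)
journal.deposit("edit ORDERS source member")
last_saved_sequence = 1                    # the tape save happens at this point
journal.deposit("edit INVOICE source member")
journal.deposit("edit ORDERS source member again")

journal.local_entries.clear()              # simulate losing the development machine

# Entries newer than the tape save are still held on the other machine.
to_replay = [e for e in remote.entries if e[0] > last_saved_sequence]
print(to_replay)
```

Because every entry was copied as it was deposited, the two changes made after the save survive the loss of the local receiver and can be replayed after the restore.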
Remote Journaling and High Availability
In addition to saving data changes on a second machine for backup recovery purposes, remote journaling can also be an effective way to perform data warehousing and data vaulting. But where remote journaling really shines is when it is integrated into high availability (HA) software that fully mirrors entire business critical applications to a second iSeries for disaster recovery purposes. As data is mirrored in an HA solution that uses remote journaling, the data is very quickly and efficiently transported between systems.
Whether or not an HA solution uses remote journaling to transport data between systems, all high availability solutions use standard journaling to track data changes made to mirrored objects. As changes are made to data on the production system (source), the HA software harvests the data-change information from the journal entries and applies this information to a copy of the object on the backup system (target) to keep it synchronized. This data harvesting is done on either the source or the target machine, depending on the technology used by the HA vendor to transport the data. If the vendor uses its own proprietary technology to transport data, then harvesting is done on the source machine. If the vendor uses remote journaling to transport data, then harvesting is done on the target machine.
Let's look more closely at these two data transport methods:
When an HA vendor uses its own proprietary transport method (present in many HA solutions), the data must first be harvested from journal entries on the source system by the vendor's own process, which filters this information and sends it to the target system.
For discussion purposes, this proprietary transport method is referred to as harvest-and-send in Figure 1. In this process, a job within the HA software harvests information from the journal entry on the source system, and once harvested, other jobs within the HA software on the source system validate and filter the information before another HA job transmits the information to the target system. This process results in moving and mapping data many times through the machine interface on the source system, creating a significant amount of overhead on that system.
Figure 1 - Data Replication Using Harvest and Send
When the harvested information reaches the target system, it is put in a temporary storage area - typically a log file. An apply job within the HA software then extracts this information and applies it to the necessary database files to bring these files current with the same files on the source machine.
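As a rough sketch of the harvest-and-send flow described above, the following Python models the three source-side jobs and the target-side apply job as plain functions. The entry layout, the file names, and the function names are hypothetical and do not correspond to any vendor's product.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Entry = Tuple[str, str, str]  # (file name, record key, new value); hypothetical layout

MIRRORED_FILES = {"ORDERS", "CUSTOMERS"}  # hypothetical set of replicated files


def harvest(journal: List[Entry]) -> List[Entry]:
    """Source job 1: read change information out of the journal entries."""
    return list(journal)


def validate_and_filter(entries: List[Entry]) -> List[Entry]:
    """Source job 2: keep only changes to files that are being mirrored."""
    return [e for e in entries if e[0] in MIRRORED_FILES]


def send(entries: List[Entry], target_log: Deque[Entry]) -> None:
    """Source job 3: transmit to the target, where entries land in a log."""
    target_log.extend(entries)


def apply_changes(target_log: Deque[Entry],
                  target_db: Dict[Tuple[str, str], str]) -> None:
    """Target job: drain the log and bring the target files current."""
    while target_log:
        file_name, key, value = target_log.popleft()
        target_db[(file_name, key)] = value


source_journal: List[Entry] = [
    ("ORDERS", "1001", "shipped"),
    ("WORKFILE", "tmp", "x"),            # not mirrored, so it is filtered out
    ("CUSTOMERS", "C7", "new address"),
]
target_log: Deque[Entry] = deque()
target_db: Dict[Tuple[str, str], str] = {}

# Three of the four steps run on the source system.
send(validate_and_filter(harvest(source_journal)), target_log)
apply_changes(target_log, target_db)
print(target_db)
```

Note that the harvest, filter, and send steps all run on the source in this model, which is where the overhead discussed above comes from.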
Remote Journaling Transport
The remote journaling transport method differs dramatically from harvest-and-send in that nearly all of the processing required is moved to the target system, thus relieving the source system of this overhead. Among other benefits, this translates into dramatically less CPU demand on the source system (the system where your business critical applications are running).
Figure 2 illustrates an HA process that uses remote journaling. As with harvest-and-send, the remote journaling method uses standard journaling to detect changes made to data. The difference is that remote journaling automatically and nearly instantaneously puts a duplicate of each journal entry on the communication wire, sending it to a special journal receiver on the target system. This data is put on the wire so quickly because the entire process is performed within the operating system (beneath the machine interface at the licensed internal code level). According to studies conducted by IBM [1], when remote journaling is used, the journal entry is typically placed on the communication wire in less than 5 milliseconds.
Figure 2 - Data Replication Using Remote Journaling
Once the journal entry lands in the remote journal receiver on the target system, a harvest and apply process within the HA software extracts information from the remote journal receiver, filters and validates the data, and then applies the change to the database file on the target system.
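A minimal sketch of the target-side harvest and apply step might look like the following. It assumes the operating system has already delivered the entries into a list standing in for the remote journal receiver; the entry layout, the file names, and the `harvest_and_apply` function are hypothetical.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str, str]  # (file name, record key, new value); hypothetical layout

MIRRORED_FILES = {"ORDERS", "CUSTOMERS"}  # hypothetical set of replicated files


def harvest_and_apply(remote_receiver: List[Entry],
                      target_db: Dict[Tuple[str, str], str],
                      next_index: int = 0) -> int:
    """Target-side job: extract, filter, validate, and apply entries.

    Returns the index of the next unread entry so the job can resume later.
    """
    for file_name, key, value in remote_receiver[next_index:]:
        if file_name not in MIRRORED_FILES:
            continue                      # filter: file is not replicated
        if not key:
            continue                      # validate: skip malformed entries
        target_db[(file_name, key)] = value
    return len(remote_receiver)


# Entries arrive in the remote receiver without any HA job running on the source.
remote_receiver: List[Entry] = [
    ("ORDERS", "1001", "picked"),
    ("WORKFILE", "tmp", "x"),
]
target_db: Dict[Tuple[str, str], str] = {}
position = harvest_and_apply(remote_receiver, target_db)

remote_receiver.append(("ORDERS", "1001", "shipped"))
position = harvest_and_apply(remote_receiver, target_db, position)
print(target_db, position)
```

In this model the source contributes nothing beyond depositing the entry; filtering, validation, and the apply all consume target CPU instead.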
As already mentioned, the most significant difference between the remote journaling and harvest-and-send transport processes is how system performance is affected on the source system. With harvest-and-send, most of the processing must occur on the source system, while remote journaling moves nearly all of the processing to the target system. One test conducted by IBM measured 7% to 10% additional CPU consumption on the source machine when a harvest-and-send process was used to transmit data. In contrast, only about 0.5% additional source CPU consumption was measured with remote journaling. [2]
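To put the two measurements side by side, the short calculation below uses only the percentages quoted above; the variable names are illustrative.

```python
# Additional source CPU consumption reported in the IBM test, in percent.
harvest_and_send_overhead = (7.0, 10.0)   # low and high end of the measured range
remote_journal_overhead = 0.5

# How many times larger the harvest-and-send overhead is.
ratios = tuple(x / remote_journal_overhead for x in harvest_and_send_overhead)

# Source CPU handed back to applications, in percentage points.
savings = tuple(x - remote_journal_overhead for x in harvest_and_send_overhead)

print(ratios)   # (14.0, 20.0)
print(savings)  # (6.5, 9.5)
```

In other words, the measured harvest-and-send overhead was roughly 14 to 20 times that of remote journaling in that test.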
In another IBM test, the number of bytes sent from the source machine to the target under harvest-and-send dropped significantly when CPU cycles were increased; with remote journaling, throughput was not affected. [3] This means that with a harvest-and-send process there is a high potential for significant data loss should the source system suddenly fail or communications be cut, since an accumulation of unsent transactions would likely be left on the source machine at the time of failure. When remote journaling shifts the CPU burden of harvesting and validating transactions to the target machine, and because remote journaling moves data so quickly, this vulnerability is substantially reduced.
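The exposure described here can be illustrated with a toy backlog model. All of the numbers below are hypothetical and chosen only to show the mechanism; they are not taken from the IBM test.

```python
from typing import List


def unsent_backlog(produced_per_tick: List[int],
                   send_capacity_per_tick: List[int]) -> int:
    """Return how many transactions are still on the source if it fails
    at the end of the last tick."""
    backlog = 0
    for produced, capacity in zip(produced_per_tick, send_capacity_per_tick):
        backlog += produced                  # new journal entries this tick
        backlog -= min(backlog, capacity)    # entries that made it to the target
    return backlog


produced = [100, 100, 100, 100]

# Hypothetical: harvest-and-send throughput falls as the source CPU gets busier.
harvest_and_send_capacity = [100, 80, 60, 40]

# Hypothetical: remote journaling throughput stays level under the same load.
remote_journal_capacity = [100, 100, 100, 100]

print(unsent_backlog(produced, harvest_and_send_capacity))  # 120
print(unsent_backlog(produced, remote_journal_capacity))    # 0
```

Whatever remains in the backlog at the moment of failure is the data at risk, which is why a transport whose throughput holds steady under load leaves less exposed.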
Read More About It
This article has outlined only some of the benefits that are realized when remote journaling is used as the transport engine in iSeries HA solutions. These and other benefits are outlined in detail in the white paper, The Benefits of Remote Journaling for iSeries High Availability, available at www.iterainc.com. Additional information can also be found in Chapter 6 of IBM's Redbook, Striving for Optimal Journal Performance on DB2 Universal Database for iSeries. [4]
Dale Porter is Director of High Availability Development at iTera, Inc., a provider of high availability and continuous availability solutions for iSeries. Dale has been an IBM midrange developer since 1987. He can be reached at 801-799-0300 or Dale.Porter@iterainc.com.
[1] IBM: Striving for Optimal Journal Performance on DB2 Universal Database for iSeries, Chapter 6.2.1. The entire Redbook can be found at http://www.redbooks.ibm.com. Search under book ID#: SG24-6286-00
[2] Ibid., Chapter 6.5.6
[3] Ibid., Chapter 6.5.6