Computer Corporation of America
CCAPRINT: A Newsletter for Model 204® and System 1032® Users
August 10, 2003

Model 204

V5R1 RESTART Recovery: Major Enhancements, Part 1
By James Damon


In V5R1, significant enhancements were made to several aspects of RESTART recovery. These enhancements vastly improve the ease of use, performance, and flexibility of this powerful data-integrity feature. The first and most significant enhancement, from an ease-of-use perspective, is automated secondary recovery. This article focuses exclusively on that aspect of RESTART recovery and on the ROLL BACK and ROLL FORWARD processing involved.

ROLL BACK and ROLL FORWARD processing may be necessary to restore file and data integrity when the Online terminates without taking a checkpoint. This can occur for a number of reasons including:

Electrical power failures
Operating system failures
Abnormal termination of Model 204, because:
  The CHKPOINT dataset is full
  The CCAJRNL dataset is full
  The operator cancels the run
  EOJ occurs without a successful checkpoint

Reviewing the RESTART Recovery Process

Successful RESTART recovery depends on the success of two processes, ROLL BACK and ROLL FORWARD, and the following associated procedures and components.

Primary Recovery
Primary recovery is the first attempt at ROLL BACK, ROLL FORWARD processing. If ROLL BACK processing fails for any reason, the job is simply resubmitted without changes to JCL, datasets or parameters: the process is still primary recovery.

Secondary Recovery
However, if ROLL BACK processing is successful, but the subsequent ROLL FORWARD processing fails for any reason, recovery must be rerun. Each of these reruns is called secondary recovery. Prior to V5R1, this and all subsequent reruns required changes to JCL and datasets. In many cases, secondary recovery jobs require direct, human intervention.

Recovery Datasets
A Model 204 Online run, if enabled for recovery, writes recovery data to the following two datasets and, if recovery is necessary, these datasets become the input datasets for primary recovery.

CHKPOINT, used by ROLL BACK processing
CCAJRNL, used by ROLL FORWARD processing
ORIGINAL ONLINE JOB            PRIMARY RECOVERY JOB
//CHKPOINT DD DSN=O.CHKP       //RESTART DD DSN=O.CHKP
//CCAJRNL DD DSN=O.JRNL        //CCARF DD DSN=O.JRNL
                               //CHKPOINT DD DSN=R.CHKP1
                               //CCAJRNL DD DSN=R.JRNL1
Figure 1. Online and primary recovery datasets

Managing Pre-V5R1 Secondary Recovery

Prior to V5R1, if ROLL BACK processing in the primary recovery job was successful but ROLL FORWARD processing failed for any reason, secondary recovery had to be run with the RESTART and CHKPOINT datasets modified as shown in Figure 2.

SECONDARY RECOVERY JOB         NEXT SECONDARY RECOVERY JOB
//RESTART DD DSN=R.CHKP1       //RESTART DD DSN=R.CHKP2
//CCARF DD DSN=O.JRNL          //CCARF DD DSN=O.JRNL
//CHKPOINT DD DSN=R.CHKP2      //CHKPOINT DD DSN=R.CHKP3
//CCAJRNL DD DSN=R.JRNL1       //CCAJRNL DD DSN=R.JRNL1
Figure 2. Pre-V5R1 secondary recovery datasets

Constantly changing the CHKPOINT and RESTART datasets between secondary recovery runs was challenging. It was too easy to lose track of what you were trying to do, which made RESTART recovery error-prone. If the JCL updates for these datasets were skipped or done incorrectly at any point, recovery failures could easily compound. To simplify recovery and help ensure its success, CCA automated secondary recovery in V5R1.

Introducing Automated Secondary Recovery

In V5R1 the RESTART recovery feature itself determines whether to use the dataset pointed to by RESTART or the dataset pointed to by CHKPOINT for ROLL BACK processing.

If the RESTART dataset is used, the RESTART recovery feature has determined that this is a primary recovery run.
If the CHKPOINT dataset is used, the RESTART recovery feature has determined that this is a secondary recovery run.

One of the following messages is issued whenever ROLL BACK processing runs:
M204.2512: ROLL BACK WILL USE THE FOLLOWING DATASET: RESTART
or
M204.2512: ROLL BACK WILL USE THE FOLLOWING DATASET: CHKPOINT

In either case, no changes to JCL DDNAMES or datasets are required; the switch is now handled internally by Model 204 recovery.

Reusing Recovery JCL
Beginning in V5R1, the following recovery JCL may be resubmitted repeatedly until recovery failures are eliminated and recovery is successful. In some cases you might still need to make other changes to correct problems encountered in a previous failed recovery run, such as removing a file from processing and setting it aside for later attention, or changing various parameters between successive recovery attempts to eliminate the cause of the failures. Even so, V5R1 recovery processing is straightforward: the dataset definitions stay the same from run to run.

PRIMARY RECOVERY JOB           ALL SUBSEQUENT RECOVERY JOBS
//RESTART DD DSN=O.CHKP        //RESTART DD DSN=O.CHKP
//CCARF DD DSN=O.JRNL          //CCARF DD DSN=O.JRNL
//CHKPOINT DD DSN=R.CHKP       //CHKPOINT DD DSN=R.CHKP
//CCAJRNL DD DSN=R.JRNL        //CCAJRNL DD DSN=R.JRNL
Figure 3. V5R1 primary and secondary recovery datasets

Planning for Dynamically Allocated Files in Recovery
If you dynamically allocate files, your RESTART recovery job may need a larger primary space allocation for the dataset pointed to by CHKPOINT than for the dataset pointed to by RESTART. In most cases, it is sufficient for the CHKPOINT dataset to be one cylinder larger than the RESTART dataset. However, if the job being recovered issued the ALLOCATE command for many files, the CHKPOINT dataset should be larger than the RESTART dataset by 1 cylinder for each 300 datasets dynamically allocated. If the CHKPOINT allocation is too small, a message similar to the following is issued and recovery must be rerun with a larger CHKPOINT dataset:

M204.2605: CHKPOINT TOO SMALL FOR ROLL FORWARD - 1276 BLOCKS REQUIRED; 120 FOUND
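
The sizing guideline above can be worked through with a small calculation. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes that partial groups of 300 datasets round up to a whole cylinder and that the one-cylinder margin is the minimum. Neither assumption is stated by Model 204 itself.

```python
import math

# Ratio quoted in the article: 1 extra cylinder per 300 dynamically allocated datasets.
DATASETS_PER_EXTRA_CYLINDER = 300


def extra_chkpoint_cylinders(allocated_datasets: int) -> int:
    """Cylinders to add on top of the RESTART dataset size when sizing CHKPOINT."""
    if allocated_datasets < 0:
        raise ValueError("allocated_datasets must be non-negative")
    # Assumption: round partial groups up, and never drop below the
    # one-cylinder margin that the article says is enough in most cases.
    return max(1, math.ceil(allocated_datasets / DATASETS_PER_EXTRA_CYLINDER))


def chkpoint_cylinders(restart_cylinders: int, allocated_datasets: int) -> int:
    """Suggested CHKPOINT primary allocation, in cylinders."""
    return restart_cylinders + extra_chkpoint_cylinders(allocated_datasets)


if __name__ == "__main__":
    # A 10-cylinder RESTART dataset with varying numbers of ALLOCATE'd datasets.
    for count in (0, 300, 301, 1500):
        print(count, chkpoint_cylinders(10, count))
```

Under these assumptions, a job that dynamically allocated 1500 datasets against a 10-cylinder RESTART dataset would get a 15-cylinder CHKPOINT dataset. Treat the result as a starting point and adjust it if message M204.2605 still appears.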

Reviewing the RESTART Recovery Process
Primary recovery reads the dataset pointed to by RESTART to roll back files that were being updated at the time of the original failure. When ROLL BACK processing completes, all qualifying files have been rolled back to the last checkpoint. ROLL FORWARD processing then begins and reads the dataset pointed to by CCARF, reapplying file updates up to the point of the failure. As ROLL FORWARD proceeds, new file preimage pages are written to the dataset pointed to by CHKPOINT. If ROLL FORWARD processing fails, the new CHKPOINT dataset is used in the next recovery run (secondary recovery) for ROLL BACK processing.
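
The primary/secondary distinction described above can be expressed as a toy decision rule. The Python sketch below is not how Model 204 makes the decision internally (the article does not describe that mechanism); it only encodes the observable behavior: a run is secondary once an earlier run completed ROLL BACK and then failed in ROLL FORWARD. The names `Outcome` and `rollback_ddname` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    """How an earlier, unsuccessful recovery run ended (hypothetical labels)."""
    ROLL_BACK_FAILED = "roll back failed"
    ROLL_FORWARD_FAILED = "roll forward failed"


def rollback_ddname(previous_outcomes):
    """Return the DDNAME that ROLL BACK would read in the next recovery run.

    Assumption: any earlier ROLL FORWARD failure means ROLL BACK had already
    completed and new preimages were written to CHKPOINT, so the next run is
    secondary recovery. Otherwise the run is still primary recovery.
    """
    if any(o is Outcome.ROLL_FORWARD_FAILED for o in previous_outcomes):
        return "CHKPOINT"  # secondary recovery
    return "RESTART"       # primary recovery


if __name__ == "__main__":
    history = []
    print(rollback_ddname(history))   # first attempt

    history.append(Outcome.ROLL_BACK_FAILED)
    print(rollback_ddname(history))   # rerun after a ROLL BACK failure

    history.append(Outcome.ROLL_FORWARD_FAILED)
    print(rollback_ddname(history))   # rerun after a ROLL FORWARD failure
```

In this model the first two runs read RESTART and the third reads CHKPOINT, which matches the dataset named in message M204.2512 for primary and secondary runs respectively.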

Testing Advice

We recommend that all system managers test this automated secondary recovery in a test environment using the following steps:
1. Perform updates to a number of test files in the test Online with CPTIME=999, to ensure that no checkpoint is taken.
2. Cancel the Online run, while updates are in progress, from the operating system console (or for VM, IPL the Model 204 virtual machine).
3. Run RESTART recovery and cancel during ROLL BACK processing.
4. Rerun RESTART recovery (this is still primary recovery) and cancel it during ROLL FORWARD processing.
5. Rerun RESTART recovery (this is now secondary recovery) and allow the process to complete.
This should provide useful experience and a good understanding of what happens when RESTART recovery is actually required.

Summary

In future CCAPRINT articles, I will discuss the performance and flexibility enhancements that were implemented in Model 204 RESTART recovery.

System 1032

Lesser Known Features: Part 5
By Tym Stegner
My article this month, about the lesser-known features of System 1032, looks at the System 1032 debugger.

Unleashing the System 1032 Debugger

Few people are aware that System 1032 incorporates a full-featured debugger as part of the command language. The debugger is a set of diagnostic commands that lets you:

Display source code
Observe program execution and updates to variables
Suspend or step through execution of your procedures

Although the debugger can execute procedures that are not compiled in debug mode, it cannot display source code or statement numbers of the tracepoints, watchpoints, and breakpoints that it executes in those procedures.

To use the full capabilities of the PL1032 debugger, you must compile a program in debug compile mode. The System 1032 compiler in debug compile mode includes additional information in the compiled code that the debugger can use.

To compile a procedure in debug mode, use the commands shown in Figure A.

1032> DEBUG COMPILE ON
Enter debugger compile mode
1032> @procname.DMC
1032> DEBUG COMPILE OFF
Debugger compile completed successfully
1032>
Figure A. Commands to compile a procedure in debug mode

To debug an appropriately compiled procedure, give the DEBUG command:

1032> DEBUG
1032_DBG>

At the debugger prompt, you are within the debug environment. At this point you can configure the debug environment to facilitate debugging your program. Many existing PL1032 commands are available, as well as debugging-specific commands, such as DGO, EXAMINE, SET BREAK, and STEP. You can press the PF1 (Choice) key or the PF2 (Help) key to list available commands.

The configuration capabilities of the debugger include:

Tracepoints

Use the SET TRACE command to track the execution of a program at various stages without suspending execution. Each time a tracepoint is encountered, the debugger displays the cause, the statement number, and the source line before continuing execution.

A tracepoint can be extended with the DO option to execute PL1032 commands at a defined point. SHOW commands are often used at a tracepoint, and information is often logged to a tracking file.

Watchpoints

Use the SET WATCH command to monitor variables, including form variables, as the program executes. When the value of the watched variable changes, the debugger displays both old and new values, the statement causing the change, and then suspends execution. Issue a DGO command to resume execution.

You can use watchpoints to modify the value of a variable during execution. Watchpoints can be extended using the DO clause.

Breakpoints

Use the SET BREAK command to temporarily suspend program execution at a specific location or event.

Locations include labels and statement numbers.
Events are one or more of ENTRY, ERROR and RETURN. The ALL_PROC option lets you set event breaks for all procedures in the debug environment at once.

Breakpoint functionality can be extended using the DO clause.

EXAMINE

Use the EXAMINE command to list all or portions of procedure source code.

CLEAR

Use the CLEAR command to terminate all or selected tracepoints, watchpoints, or breakpoints. SET commands are cumulative within the debugger environment.

Once you have configured your debugging environment by setting appropriate tracepoints, watchpoints, and/or breakpoints, give the DGO command to return to the PL1032 prompt, where you can execute your procedure.
Note: You remain in the debug environment until you exit debug mode at the 1032_DBG prompt. The debugger retains control of all program execution.
Documentation of the debugger commands can be found in Chapter 5 of the System 1032 User’s Guide, Module 4, and in the command-specific pages of the System 1032 Programmer’s Reference, Module 1.

In Summary

Gone are the days of inserting PRINT statements for debugging. Programmers rely on debugging capabilities to develop robust applications. System 1032 serves this need as part of its core command language.

Coming Attractions

In the coming months, articles suggested by a review of lesser-known features will include:

TRANGEN – a transact-command generator
System 1032 Record Descriptors
System 1032 Security

Copyright © 2008 Computer Corporation of America.
All rights reserved. Published in the United States of America.

