Computer Corporation of America
|
Feedback
Search CCA:
   
USA CCA
CCA Products
CCA Customer Support
CCA Resources
CCA - Company
CCAPRINT: A Newsletter for Model 204® and System 1032® Users
July 10, 2004
   
System 1032: Managing Lingering Temporary Files Printer-friendly version
Model 204: Monitoring Checkpoints and the CHKPOINT Dataset Printer-friendly version

System 1032

USE OF AND ACCESS TO PRODUCTS AND FEATURES ARE IN ACCORDANCE WITH THE TERMS AND CONDITIONS OF THE USER’S SOFTWARE LICENSE. THE PRESENTATION OF MATERIAL HEREIN DOES NOT, IN ANY MANNER, MODIFY SUCH TERMS AND CONDITIONS.

Managing Lingering Temporary Files
By Tym Stegner

Tym

 


Recently a customer reported the following error while using an ODBC-enabled PC application:


  [Simba][SimbaClient ODBC
 Driver][Simbaclient][SimbaLNA][Simba][SimbaEngine ODBC Driver][DRM 
File Library] S1032SRV-E-JNCNTWRT, Error Indicator 268550178 writing 
on a scratch file processing JOIN %RMS-E-EXT, ACP file extend failed. 

Translating the codes, looking up the errors, and my knowledge of their applications told me there was an OpenVMS problem when executing a joined query. However, before I could go further the customer told me:

“We figured out the problem. The users login area was running out of disk quota.”

Another contact from the same site summed up their problem as follows:

“From my observation, when a user makes an ODBC connection, a
*.DMT file is created under their SYS$LOGIN area. After the ODBC connection terminates, the file is deleted.

Except that for some users those files are not deleted. This caused the disk quota exceeded problem.

  I have two questions:
 
1.
Why is it only some ODBC users whose temporary files were not deleted?
 
2.
Is there a way we can redirect those temporary files to a scratch area?”

Thus begins our story of what caused the lingering temporary files and how they can be eliminated.

Identifying the Lingering Temporary Files
Of course, a well-functioning application does not leave its temporary files hanging about. Indeed, under most circumstances the System 1032 ODBC Driver does not leave lingering temporary files.

The following example is an abridged version of one of the directory areas provided by the customer:

Directory DSK:[USR_DB_DATA]
AAA_0010F6.DMT_TMP;1 68/68 5-APR-2004
AAA_001482.DMT_TMP;1 68/68 9-APR-2004
...
[126 intervening files omitted]
...
EAA001526.DMS_TMP;1 51/51 6-MAY-2002
EVENT.LOG;1 579/595 7-OCT-1999
FTP_SERVER.LOG;288 3/17 9-JUN-2004
LOGIN.COM;8 6/17 8-DEC-1999
ODBC_SYSTEM113220448241246.DMS;1 102/102 24-MAY-2001
ODBC_SYSTEM123173532401735.DMS;1 119/119 15-MAR-2004
ODBC_SYSTEM124220448081169.DMS;1 119/119 13-DEC-2001
ODBC_SYSTEM170172897521714.DMS;1 119/119 15-APR-2004
ODBC_SYSTEM3481728975286786.DMS;1 119/119 29-MAR-2004
ODBC_SYSTEM392218480725670.DMS;1 102/102 24-OCT-2000
ODBC_SYSTEM617289752227.DMS;1 119/119 23-MAR-2004
SYSODBCVIEW_1169.OUT;1 22/34 13-DEC-2001
SYSODBCVIEW_1246.OUT;1 17/17 24-MAY-2001
SYSODBCVIEW_5670.OUT;1 17/17 24-OCT-2000
Total of 140 files, 9881/9979 blocks.

A close look through the directory listing shows us the EVENT.LOG file was originally created back in 1999, demonstrating that a production-level application has been using the System 1032 ODBC Driver for at least five years. Furthermore, the two different formats of the DMT_TMP file names reflect the fact that this particular site has run at least three versions of the ODBC Driver in production. Over time, the application and other since-corrected errors, combined with the lack of account cleanup, has allowed for lingering temporary files.

What Are the Temporary Files?
Like many applications, the System 1032 ODBC Driver needs occasional temporary files to perform its work. For example:

  The ODBC_SYSTEM*.DMS files are temporary datasets created for join and stored procedure operations.
 
The SYSODBCVIEW_nnnn.OUT files are also a result of join operations.
  The EVENT.LOG is created by the TCP/IP software connection.
  DMD_TMP, DMS_TMP, DMT_TMP files are other ODBC temporary files naming structures used for stored procedure operations, such as creation of datasets for result set processing.

Why Did These Files Linger?
The error message that opens this article is a file extend error, that indicates an error condition of the server which implies that the join query is not completely executed. Therefore, the server is not cleaning up the temporary files as it should.

As for the remaining files--well, four years of application use may generate the occasional application or server error and, again, the server did not have the opportunity to clean up its temporary files.

Addressing the Lingering Temporary File Situation
The default locations for temporary files are SYS$SCRATCH and SYS$LOGIN, as shown in the following ODBC.INI files data source declaration:

 [S1032_IVP]
DBQ=S1032ODBC_DEMO:IVP_DICTIONARY
description=S1032 ODBC Installation Data Source
TempFileDir=SYS$SCRATCH:
NewTableDir=SYS$LOGIN:
;DrmSpy=1
;StatFile=S1032ODBC_INIPATH:ASP_STATS.LIS

Users Clean Out Their Own Temporary Files
One way to clean up extraneous temporary files is to have users log into their OpenVMS accounts and perform cleanup. However, these days not everyone regularly uses the shell account. Indeed, many users may not know shell accounts exist. Therefore, perhaps you must enact mechanisms to clean up user directories. I recommend several approaches for these mechanisms.

Purging User Directories
User accounts added to facilitate ODBC operations may share some common attribute of user name pattern or UIC group. You can use this aspect of your naming convention to perform disk-wide purges of the uniquely identifiable ODBC temporary files. The basic command might be:

$ DELETE DSKn:[...]*.DM%_TMP;*,ODBC_SYSTEM*.DMS;*,SYSODBCVIEW*.OUT;*

You could issue such a command across multiple disks or multiple individual user directories, using a scanning mechanism similar to the DM_SCAN.COM utility, described in the April 2002 CCAPRINT article, “Locating System 1032 Files.

Redirecting Files in Bulk
As the temporary file names are system encoded, another approach is to redirect ODBC temporary files to shared scratch locations by using the destination directives in the ODBC.INI file in the data source section, for example:

TempFileDir=DISK$SCRATCH:[TEMP]         ;re-direct to scratch disk
NewTableDir=DISK$SCRATCH:[DSTEMP] ;re-direct to scratch area

By default SYS$SCRATCH is the same as SYS$LOGIN. Therefore, all the temporary files end up in the same place. By redirecting TempFileDir and NewTableDir to a common location, all the files therein could be periodically deleted at once. For efficiency, CCA recommends that if you redirect lingering temporary files, direct them to a faster disk, if one is available.

An en masse delete can have the effect of attempting to delete files that are still in use. One way to avoid this is to delete files from only the previous days by using the /BEFORE option of the OpenVMS delete command, such as the following:

$ DELETE DSKn:[TEMP]*.*;*/BEFORE=TODAY     !delete yesterday's files

If the files are currently open, they will not be deleted. However, an automated command file to do the periodic delete must be able to handle any errors from the delete command.

Removing Temporary Files for a Special Project
Under some circumstances, such as application development testing, you might want to perform cleanup for only a particular user. To access files from a user defined scratch area, delete processing must occur after the ODBC server has changed its context from the authentication account to the user account. The first opportunity is in the System 1032 initialization file. Thereafter the client application can clean up the temporary files at any time using a stored procedure.

For either time, the delete operation can be either:

  One or more calls to the DELETE_FILE tools procedure
 
A PUSH USING command, if the user account has sufficient subprocess quota

In Summary

As we move our applications and users to a PC or web-based application, we may inadvertantly start paying less attention to the infrastructure. Even though our OpenVMS applications are robust, we still need knowledge of our application systems and educated users.

 

Model 204

USE OF AND ACCESS TO PRODUCTS AND FEATURES ARE IN ACCORDANCE WITH THE TERMS AND CONDITIONS OF THE USER’S SOFTWARE LICENSE. THE PRESENTATION OF MATERIAL HEREIN DOES NOT, IN ANY MANNER, MODIFY SUCH TERMS AND CONDITIONS.

Monitoring Checkpoints and the CHKPOINT Dataset
By James Damon


A Model 204 production Online requires regular, fairly frequent checkpoints to ensure recoverability of the database environment following a power failure, system crash, or any other event that results in the abnormal termination of Model 204. The CHKPOINT dataset provides one half of the recoverability function for an Online system, the roll back function, so it must be allocated and monitored to ensure that its size is sufficient.

If checkpoints are inhibited for long periods of time, the volume of data in the CHKPOINT dataset increases if updates continue to be processed and can become quite large. In the event recovery is subsequently required, a large CHKPOINT dataset extends the time required to run recovery, compounding an already difficult situation. Worse, inhibited checkpoints increase the risk of filling the CHKPOINT dataset; this results in immediate and abnormal termination of the Online. Further, if space in the CHKPOINT dataset is exhausted due to unanticipated update activity, or insufficient space allocation, or unavailable disk space, the Online is also terminated abnormally. You can avoid these possibilities by careful monitoring. You can also automate the monitoring using various features available in Model 204.

Setting the CPMAX and CPTIME Parameters
Your first priority is to avoid filling the CHKPOINT dataset. One preemptive approach, which also reduces roll back recovery time, is to set the CCAIN parameter CPMAX to 1. This guarantees that only one checkpoint’s worth of pre-images is kept in the CHKPOINT dataset. Whenever the CPMAX+1 checkpoint is about to be taken, the CHKPOINT dataset is rewound and positioned at the beginning of the dataset. The next checkpoint record will then be written at that location and will be followed by file-page pre-images from this checkpoint interval. This process repeats each time a checkpoint is taken and helps to minimize the size of the CHKPOINT dataset. CPMAX=1 coupled with a low value of CPTIME helps ensure that frequent checkpoints are taken and the CHKPOINT dataset remains small. (Many customers set CPTIME to 5 minutes.)

Direct Monitoring of the CHKPOINT Dataset
The MONITOR CHKP command provides a number of informational messages related to checkpoints:

> MONITOR CHKP
04.182 JUN 30 15.37.43 PAGE 4
*** M204.2073: LATEST SUCCESSFUL CHECKPOINT COMPLETED AT: 04.182 15:33:11.03
*** M204.2074: 0 CHECKPOINTS HAVE SINCE TIMED OUT
*** M204.2075: 124223 RECORDS CURRENTLY IN CHKPOINT STREAM
*** M204.2076: 1 CHECKPOINTS CURRENTLY IN CHKPOINT STREAM
*** M204.2077: 0 USERS CURRENTLY HAVE CHECKPOINTS INHIBITED

The M204.2075 message provides an exact count of records currently in the CHKPOINT dataset. This same information is provided to the $CHKPINF function using request code 5.

BEGIN
PRINT $CHKPINF(5)
END
124223

For example, suppose this is your CHKPOINT DD:

//CHKPOINT DD SPACE=(CYL,(500,100)),DISP=SHR,DSN=...

Because you know that:
  These records are each 6184 bytes in length
  On a 3390 DASD, one track can accommodate eight 6184-byte blocks
 
CHKPOINT takes 500 cylinders of primary space
  CHKPOINT takes up to 15 extents of 100 cylinders

you can calculate that the dataset will hold a maximum of:

((500+(100*15))*15)*8 = 240,000 records.

In the previous example, the number of records currently in the CHKPOINT dataset is 124,223, so the dataset is 52% full. It is a good practice to start issuing warning messages to the operator and system manager when the dataset reaches 85-90% full and to implement a contingency plan. The plan might include stopping subsystems, bumping users, bumping files, issuing warning messages and other measures that stop updates.

Warning Track Feature
To provide another level of overflow protection for the CHKPOINT dataset, you can couple direct monitoring with the warning track feature available through concatenated streams. You can define the CHKPOINT dataset as a concatenated stream as follows:

//CHK25  DD SPACE=(CYL,250),DISP=SHR,DSN=...
//CHK50 DD SPACE=(CYL,250),DISP=SHR,DSN=...
//CHK75 DD SPACE=(CYL,250),DISP=SHR,DSN=...
//CHK100 DD SPACE=(CYL,250),DISP=SHR,DSN=...
//CCAIN DD *
DEFINE STREAM CHKPOINT WITH SCOPE=SYSTEM - CONCATENATE (CHK25,CHK50,CHK75,CHK100)
MSGCTL M204.1846 SAVE OPR

Although CPMAX=1 generally prevents this, if the first checkpoint dataset fills, the following message is issued:

M204.1846: MEMBER CHKP50 OF CONCATENATED STREAM CHKPOINT NOW ACTIVE

As a result of the MSGCTL command, when M204.1846 is issued it will be displayed on the operator’s terminal, written to the JESLOG, and saved in the in-memory error-save table where you can view it by issuing a VIEW ERRORS command. It is written to CCAAUDIT by default. When CHK100 becomes active, you must implement the contingency plan for avoiding a CHKPOINT full condition.

A CHKPOINT dataset can be defined as concatenated only through the DEFINE STREAM command. Multivolume CHKPOINT datasets, which may result from defining CHKPOINT as a concatenated stream, are supported only through use of the DEFINE STREAM command. OS concatenated CHKPOINT datasets are not supported.

Automated Monitoring
You can automate monitoring in a User Language program using a combination of calls to the $CHKPINF function and the USE PROC -n command to route output from a MONITOR CHKP command or a MONITOR CHKP USERLIST SL command to a temporary procedure. Once in a temporary procedure, that data can be read in by another procedure using the $RDPROC function and analyzed for critical information. Of particular interest from the MONITOR CHKP USERLIST SL command output is a list of users inhibiting checkpoints and how many checkpoints have timed out. (Review the M204.2074 message.) The number of checkpoint time-outs multiplied by CPTIME (in this Online CPTIME=5), indicates that 60 minutes have elapsed without taking a checkpoint. Generally, if checkpoints have been inhibited for longer than 30 minutes, CCA recommends notifying the system manager or the system operator.

> MONITOR CHKP USERLIST SL
04.182 JUN 30 18.59.25 PAGE 12
*** M204.2073: LATEST SUCCESSFUL CHECKPOINT COMPLETED AT: 04.182 16:55:41.07
*** M204.2074: 12 CHECKPOINTS HAVE SINCE TIMED OUT
*** M204.2075: 183528 RECORDS CURRENTLY IN CHKPOINT STREAM
*** M204.2076: 1 CHECKPOINTS CURRENTLY IN CHKPOINT STREAM
*** M204.2077: 1 USERS CURRENTLY HAVE CHECKPOINTS INHIBITED

USER SVR USERID P CUR SLICE AGE FUNC CNCT CPU SEQIO QUE WT FLGS
CPU DKRD DKWR UDD OUT SLIC FINDS RECDS PCPU RQTM
SCRN SVRD SVPAGES SUBSYSTEM PROC-FILE PROC
COMMITS BACKOUTS UPDTIME(MS) LNGUPDATES LNGUPDTIME(MS)

20 10 USER63 H 96 0.030 EVAL 1823 22.434 50 BLKI 12 65
20.546 12567 3987 1 0 1 1 444086 0.168 1823845

The following skeletal procedure shows how you might begin an application to do this:

USE PROC -2 R
MONITOR CHKP USERLIST SL
BEGIN
%CPTIME = $VIEW ('CPTIME')
%CPMAX = $VIEW ('CPMAX')
%CHKP0 = $CHKPINF(0)
%CHKP1 = $CHKPINF(1)
%CHKP2 = $CHKPINF(2)
%CHKP3 = $CHKPINF(3)
%CHKP4 = $CHKPINF(4)
%CHKP5 = $CHKPINF(5)
%CHKP6 = $CHKPINF(6)
%CHKP7 = $CHKPINF(7)
%CTL = $RDPROC('OPEN',,-2)
%NEXTLINE = $RDPROC('GET',%CTL)
. . .
END

In Summary
Data in the CHKPOINT dataset is useful for roll back recovery purposes only during the current checkpoint interval. Once the next checkpoint is taken, previous data in the CHKPOINT dataset can be discarded. For this reason, the size of the CHKPOINT dataset can be minimized by setting CPMAX=1. However, certain situations such as runaway transactions or users inhibiting checkpoints for long periods can cause the CHKPOINT dataset to expand to its maximum size allocation and result in abnormal termination of the Online.

It is this possibility that leads to the requirement to closely monitor the CHKPOINT dataset and to the implementation of automated monitoring to prevent CHKPOINT from filling. Comprehensive information about checkpoints and the CHKPOINT dataset is available by using the MONITOR CHKP command and the $CHKPINF function--allowing for both regular reporting of and automated monitoring of checkpoint activity.

Copyright © 2008 Computer Corporation of America.
All right reserved. Published in the United States of America.


Contact CCA Webmaster
Copyright 2008