Computer Corporation of America
CCAPRINT: A Newsletter for Model 204® and System 1032® Users
January 10, 2002

Model 204
A Faster Way to INITIALIZE

By James Damon

In CCAPrint December 2001, "A Faster Way to CREATE," I described how to use the NOFORMAT option of the CREATE command to reduce the elapsed time required to create a file. In this article I focus on the INITIALIZE command, which is required to initialize all bytes of certain file pages with hexadecimal zeroes. After you create a file and again when you need to reorganize it, you want to get the file back into service as quickly as possible. This article discusses one way to achieve that objective.

During file creation or reorganization, the INITIALIZE command follows the CREATE command.

Initializing a file

Two tables, TABLEA and TABLEC, in a Model 204 file are actually hashed data structures and are accessed using a Model 204 hashing algorithm. When a file is first created, these two tables must be completely initialized to hexadecimal zeroes for the hashing algorithm to work correctly.

Once initialized, these tables cannot be expanded or contracted dynamically, because doing so would change the number of hash cells in the table. Since the number of hash cells directly affects the results of the hashing algorithm, changing the number dynamically would cause subsequent hash operations to fail to locate previously allocated hash cells.
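
The effect of resizing can be illustrated with a minimal Python sketch. It is not the Model 204 hashing algorithm, which this article does not describe. It assumes a toy scheme in which the cell number is the sum of the key's bytes taken modulo the number of cells, and the names cell_for, store, and lookup are hypothetical.

```python
def cell_for(key: str, n_cells: int) -> int:
    """Toy hash: the cell number depends on the key AND on the table size."""
    return sum(key.encode("ascii")) % n_cells

def store(table: list, key: str) -> None:
    """Place the key in its home cell (collision handling omitted for brevity)."""
    table[cell_for(key, len(table))] = key

def lookup(table: list, key: str) -> bool:
    """Probe only the key's home cell, computed from the current table size."""
    return table[cell_for(key, len(table))] == key

# A freshly initialized table: None stands in for a zeroed hash cell.
table = [None] * 8
store(table, "SMITH")
found_before = lookup(table, "SMITH")

# "Expand" the table by appending empty cells without rehashing its contents.
expanded = table + [None] * 3
found_after = lookup(expanded, "SMITH")

print(found_before, found_after)
```

In this sketch the key is found in the 8-cell table but not in the 11-cell one, because its home cell number changes with the cell count while the stored entry stays where it was.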

In addition to TABLEA and TABLEC, the following data structures must also be initialized:

The FCT, or file control table, which is the first page of each file
TABLEB, in a file created with FILEORG=X'08' (a hash key file), initialized to hexadecimal zeroes
The TABLED free page bit pattern, initialized to X'FF'

 

Sizing Table A and Table C

The default size of three pages for Table A is sufficient for just about all files, and the elapsed time required to initialize these pages to hexadecimal zeroes is inconsequential. The same is true for the other data structures mentioned in "Initializing a file," with the exception of TABLEB in hashed files. However, in a large file that makes heavy use of the KEY or NUMERIC RANGE field attributes, Table C may also be large, and the elapsed time required to initialize it may be significant.

The INITIALIZE commands illustrated in Figure 1 and Figure 2 were executed against a file with CSIZE=1, the minimum value, and CSIZE=50000, respectively. The DKWR and RQTM statistics show the disk writes required and the elapsed time consumed in each case.

Caution: Since the INITIALIZE command destroys all field definitions, data, procedures, and procedure security, always use the IN FILE filename clause preceding the INITIALIZE command to ensure that the correct file is initialized.

CREATE (NOFORMAT) TESTZ
*** M204.0782: BEGIN CREATION: FILE TESTZ
*** M204.0787: READING FILE PARAMETERS
PARAMETER BSIZE=1 CSIZE=1 DSIZE=1
END
*** M204.0794: END FILE CREATION: TESTZ

OPEN TESTZ
*** M204.0620: FILE TESTZ OPENED
*** M204.1203: FILE TESTZ WAS LAST UPDATED ON 01.353 DEC 19 12.56.52
*** M204.0627: FILE TESTZ IS NOT INITIALIZED
IN FILE TESTZ INITIALIZE
/// M204.0763: BEGIN INITIALIZATION: FILE TESTZ
$$$ USERID='FMUSER1 ' ACCOUNT='FMUSER1 ' LAST='INIT' SUBSYSTEM='
PROC=' ' PDL=304 DKWR=4 RQTM=41

Figure 1. Setting CSIZE=1

 

In Figure 1, CPU time is so small that it cannot be measured, there are four disk writes, and the elapsed time is 41 milliseconds.

CREATE (NOFORMAT) TESTZ
*** M204.0782: BEGIN CREATION: FILE TESTZ
*** M204.0787: READING FILE PARAMETERS
PARAMETER BSIZE=1 CSIZE=50000 DSIZE=1
END
*** M204.0794: END FILE CREATION: TESTZ
OPEN TESTZ
*** M204.0620: FILE TESTZ OPENED
*** M204.1203: FILE TESTZ WAS LAST UPDATED ON 01.353 DEC 19 12.56.52
*** M204.0627: FILE TESTZ IS NOT INITIALIZED
IN FILE TESTZ INITIALIZE
/// M204.0763: BEGIN INITIALIZATION: FILE TESTZ
$$$ USERID='FMUSER1 ' ACCOUNT='FMUSER1 ' LAST='INIT' SUBSYSTEM='
PROC=' ' PDL=304 CNCT=166 CPU=2125
DKWR=50003 PCPU=12 RQTM=165881

Figure 2. Setting CSIZE=50000

 

In Figure 2, CPU time is 2,125 milliseconds, there are 50,003 disk writes, and the elapsed time is 165,881 milliseconds or about 2.75 minutes.
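
The two runs can be compared with a little arithmetic. The Python sketch below uses only the DKWR and RQTM values quoted in the text for Figure 1 and Figure 2. Reading the three writes beyond CSIZE as pages other than Table C is an inference from the figures, not something stated here.

```python
# Statistics quoted in the text for Figure 1 (CSIZE=1) and Figure 2 (CSIZE=50000).
runs = {
    1: {"dkwr": 4, "rqtm_ms": 41},
    50000: {"dkwr": 50003, "rqtm_ms": 165881},
}

for csize, stats in runs.items():
    other_writes = stats["dkwr"] - csize           # writes beyond one per Table C page
    ms_per_write = stats["rqtm_ms"] / stats["dkwr"]
    minutes = stats["rqtm_ms"] / 60000
    print(f"CSIZE={csize}: {other_writes} writes beyond CSIZE, "
          f"{ms_per_write:.2f} ms per write, {minutes:.2f} minutes elapsed")

# How much more work the large Table C costs relative to the minimum.
write_ratio = runs[50000]["dkwr"] / runs[1]["dkwr"]
time_ratio = runs[50000]["rqtm_ms"] / runs[1]["rqtm_ms"]
print(f"{write_ratio:.0f}x the disk writes, {time_ratio:.0f}x the elapsed time")
```

On these numbers, the CSIZE=50000 run issues roughly 12,500 times as many disk writes and takes roughly 4,000 times as long as the CSIZE=1 run.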

Another reason for using the ORDERED attribute

Figure 1 illustrates the advantage of setting CSIZE=1, at least for the INITIALIZE command. Note that this implies that the file contains no fields with the KEY or NUMERIC RANGE attributes. As I've discussed in "Living an ORDERED life," CCAPrint August 1997, there are many compelling reasons for converting fields from the KEY and NUMERIC RANGE attributes to the ORDERED attribute. Now the INITIALIZE command provides an additional performance reason for making that conversion.

In summary

The CREATE and INITIALIZE commands are used infrequently, except in the early stages of file design. When you do not use the KEY or NUMERIC RANGE attributes for fields (CSIZE=1), there are fewer reasons to change table sizes and therefore, fewer reasons to recreate and reinitialize. This is especially true during production use. With no requirement to ever expand TABLEC, file reorganization is rarely required.

File reorganization is necessary only when file parameters such as ATRPG, FILEORG, FVFPG, or MVFPG, or field attributes such as AT-MOST-ONE, BINARY, CODED, FLOAT, LENGTH, or OCCURS, need to be changed. When it is necessary and the speed of reorganization is critical, issuing an INITIALIZE command when CSIZE=1 will get the file back into production that much sooner.

 

System 1032
Automating Dataset Rebuilds: Part 1

By Tym Stegner

System 1032 datasets perform more efficiently when they receive regular maintenance. At some sites, this may mean a rebuild only once a year or less often; at other sites, those with frequently updated datasets, rebuilding is done more often.

This two-part article addresses some of the efficiency concerns of building datasets, as well as exploring issues and processes relating to the automation of rebuilding datasets.

There are two aspects to dataset creation:

Creating the dataset structure
Initializing and loading the data into the dataset

 

To optimize your datasets, these two aspects are treated as separate events, as the optimizations differ. To create a dataset, the process must represent all the data structures in memory before actually writing out the DMS file.

Begin optimizing when defining the key tables

Optimizing a dataset begins with the KEY_DEFAULTS clause of the dataset definition. If a dataset is updated frequently, it is a benefit to increase the null space in the key tables, as well as to reserve additional record space for expansion.

The KEY_DEFAULTS clause of the dataset definition lets you change the default of 15 percent free space in the key tables to your desired values.

In a similar fashion, the LOAD_DEFAULTS clause of the dataset definition enables you to declare dataset-specific settings including allocation space, input and output buffer sizing, and whether to define different chain file assignments for frequently used text varying attributes.

You cannot modify either the KEY_DEFAULTS clause or the LOAD_DEFAULTS clause after the dataset is created. You must recreate the dataset from a changed definition file.

Optimizing dataset creation

Optimizing dataset creation is best done on each individual dataset. What works for one particular dataset won't necessarily carry over to another.

If the dataset has many attributes, or if many of the attributes are keyed, or if both are true, you can achieve optimization by using a large working set, a large page-file quota, and judicious adjustment of the $BUF_NUMBER system variable. The latter is a process-definable control of the number of memory buffers System 1032 uses when creating the dataset structure.

To adjust the $BUF_NUMBER value, issue a series of CREATE commands, each time setting the value higher, until the process DIO and BIO values are flat for CREATE processing. To view these values, either press <Control/T> or issue a PRINT $DIO $BIO command before and after the CREATE command.

If sufficient memory is available, it might be possible to set the $BUF_NUMBER value high and leave it for the duration of subsequent CREATE commands.
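
The "raise it until the I/O counts flatten" procedure can be sketched as a small loop. The Python below is only an illustration of that stopping rule: tune_buffer_count and measure_io are hypothetical names, measure_io stands in for running one CREATE and noting the DIO and BIO deltas by hand, and the measurements in the example are made up.

```python
from typing import Callable, Iterable, Optional, Tuple

def tune_buffer_count(
    candidates: Iterable[int],
    measure_io: Callable[[int], Tuple[int, int]],
    tolerance: float = 0.02,
) -> Optional[int]:
    """Return the last setting that still reduced total I/O noticeably.

    measure_io(n) is a hypothetical callback that performs one CREATE with
    the buffer count set to n and returns the (DIO, BIO) deltas observed.
    """
    best_n = None
    previous_total = None
    for n in candidates:
        dio, bio = measure_io(n)
        total = dio + bio
        if previous_total is not None:
            if previous_total == 0:
                return best_n  # nothing left to improve
            improvement = (previous_total - total) / previous_total
            if improvement <= tolerance:
                return best_n  # curve is flat: the previous setting was enough
        best_n, previous_total = n, total
    return best_n

# Made-up (DIO, BIO) deltas for successive buffer counts, for illustration only.
fake_measurements = {
    50: (9000, 400),
    100: (6000, 300),
    200: (5000, 250),
    400: (4990, 250),
    800: (4990, 250),
}
chosen = tune_buffer_count(sorted(fake_measurements), fake_measurements.__getitem__)
print(chosen)
```

With these invented figures the loop stops at the setting after which the combined I/O count improves by no more than two percent. The tolerance is an arbitrary choice for the sketch.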

Optimizing loading your datasets

After you create a dataset shell, you can optimize the storage of the data contained within the dataset. The data consists of the:

Data records
Any key tables generated for keyed attributes

 

Loading records

There are two ways to initialize and then load data into your dataset:

    Issue a LOAD command, which automatically initializes the dataset, then loads the data.

    Manually initialize the dataset, allocate record space, and then append new records. You can initialize the dataset in either of two ways:

        By issuing an ADD command, followed by pressing <Ctrl/Z>

        In a program, by using what is called a mini-load. Issue the following command:
        LOAD DATA_INPUT NL:MAX 0

Allocating record space

System 1032 automatically calculates the initial allocation for the dataset based on the input file size and the record length. However, if many records are truncated or if you plan to load the dataset via dataset-to-dataset dump, this calculation may be off. When you know that a dataset will expand beyond the initial load, or if you want to ensure that the file extents for a dataset are contiguous, you can preallocate the required record space in the dataset using the ALLOCATE command.

Disk space is reserved using the Best-Try-Contiguous algorithm. You can allocate either a number of records or a number of disk blocks.
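
When deciding how many records to preallocate, a rough estimate can be worked out ahead of time. The Python sketch below is not the calculation System 1032 performs, which is not documented here. It assumes fixed-length input records and a chosen growth percentage, and the function names are hypothetical.

```python
def estimate_records(input_bytes: int, record_length: int) -> int:
    """Estimate the record count of an input file, assuming fixed-length records."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    # Integer ceiling division: a partial trailing record still needs a slot.
    return (input_bytes + record_length - 1) // record_length

def records_to_allocate(current_records: int, growth_percent: int) -> int:
    """Add headroom: allocate growth_percent more records than currently exist."""
    if growth_percent < 0:
        raise ValueError("growth_percent must not be negative")
    extra = (current_records * growth_percent + 99) // 100  # round up
    return current_records + extra

# Illustrative values only: a 1,200,000-byte input file of 100-byte records,
# with 25 percent headroom for future additions.
current = estimate_records(1_200_000, 100)
target = records_to_allocate(current, 25)
print(current, target)
```

The resulting record count is the kind of figure you might then supply when allocating by record. If many input records are truncated, the estimate will be off in the same way as the automatic calculation.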

When you allocate space by record, the dataset must first be initialized, so that System 1032 knows what the record size is. Initialization is normally part of LOAD processing, but can be done manually as described in Loading records. After you allocate records, you can post data to the dataset using either the DUMP DS command or the APPEND command. Because the dataset is already initialized, LOAD processing is no longer an option.

Reading records most efficiently

To assist System 1032 in reading the data records, adjust the $CLUSTER_LIMIT system variable, which controls the number of record buffers used during RMS operations. The $CLUSTER_LIMIT system variable is adjusted the same way as the $BUF_NUMBER system variable: by repeated trials. In addition, for the $CLUSTER_LIMIT system variable, you must also watch for increased page faulting as the value increases.

Note: Resetting the value of $CLUSTER_LIMIT affects only the reading of records from a flat file; System 1032 does not use RMS calls during data storage.

As the records are read in and written to the DMS file, System 1032 key values are spooled. When all records are stored, System 1032 begins building the key tables. Key table creation benefits from increases to the value of the $BUF_NUMBER system variable, not the $CLUSTER_LIMIT system variable.

A simple, manual rebuild script

The simple rebuild script in Figure A is a COM file that accepts a dataset name and passes it to System 1032, while assuming all necessary files are in the current directory. You can add many additional features and checks to the code.

In the following code, the original dataset file is renamed and then opened under its new name via an alias. A SHOW command extracts the dataset definition. A CREATE command recreates the dataset using the original name. Existing data records are then transferred from the old dataset to the new one using a dataset-to-dataset DUMP command.

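$! REBUILD.COM
$! P1 = name of the dataset to rebuild (without the .DMS file type)
$!
$! Require a dataset name.
$ if P1 .eqs. ""
$ then
$ write sys$output "%-E-No dataset name specified"
$ exit 44
$ endif
$! Require the dataset file to be in the current directory.
$ if f$search(P1+".DMS") .eqs. ""
$ then
$ write sys$output "%-E-Dataset not found"
$ exit 44
$ endif
$! Rename the original dataset file; symbol substitution supplies the name.
$ rename 'P1'.DMS *.OLDMS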
$ System 1032 variable dsn text varying init "''P1'"
variable newdmd text varying
let newdmd = dsn&".DMD"
open ds @=dsn as old in [] readonly
show on @=newdmd ds old definition
create ds @=dsn description @=newdmd output []
set ds old
find all
dump ds_output @=dsn
exit
$ dir/date/size 'P1'.*

Figure A. Code to rebuild a dataset with a simple COM file

 

Comments on the REBUILD.COM script

The dataset name is passed to System 1032 by substituting the value of P1 into a variable definition on the System 1032 command line.

The PL1032 code uses the @= operator to treat the value of the DSN variable as embedded text. This allows direct commands to be used instead of building command strings for use with the EXECUTE command.

Note: This methodology works well only in DMC files; it is not as easy to do this in compiled procedures.

Dataset rebuilding script design considerations

The following table lists considerations for certain steps that may be part of an automated rebuild process. If a particular feature described is not in use at your site, you may not need to incorporate the corresponding element in your process.

Consideration: An automated rebuild procedure must handle any existing DMU file for the dataset as part of backup and restore operations, as well as when renaming the dataset for rebuilding purposes.
Why important or necessary: The DMU file contains the overflow of the dataset's damage history. The DMU file must remain associated with the particular old dataset: renaming the dataset breaks the association.

Consideration: You might want to use the FORCE-CLOSE.COM procedure to list or terminate System 1032 dataset users prior to a rebuild pass. The FORCE-CLOSE.COM procedure is available from the System 1032 FTP, subdirectory CCAPRINT.
Why important or necessary: The FORCE-CLOSE procedure lists those processes having System 1032 datasets open, and has an option for the privileged user to FORCEX those processes, thus enabling exclusive access to dataset files.

Consideration: If your site makes use of the $DAMAGE_ERROR system variable, your rebuild script must check for a TRUE setting and temporarily reset the variable to FALSE for the duration of the rebuild.
Why important or necessary: When $DAMAGE_ERROR is set to TRUE, damaged datasets cannot be opened.

Consideration: Dataset rebuilds must consider the application of dataset, attribute, and record-level security, as well as the dataset ownership. You might want to use a rebuild account or request an LAF to enable the $SITE_DBA system variable.
Why important or necessary: When a dataset is recreated, you must reapply the security. Only the dataset owner or an account specified as $SITE_DBA can apply security. The $SITE_DBA account has ownership access to all catalogs, regardless of security.

Consideration: You might want to include allocation settings per dataset that specify rebuild allocation sizes, such as the same number of records, n% more than current, m% less than current, and so on. Note that you can do this on a fixed basis using the LOAD_DEFAULTS clause of the dataset definition.
Why important or necessary: Preallocating record space in a dataset reserves disk space for future records.

Consideration: To obtain no-conflict access to the datasets to be rebuilt, the LOCK method is the sure way, as it works intercluster. You must decide whether you want a no-access lock or a read-only lock.
Why important or necessary: Although System 1032 does not require exclusive access to read a dataset, you get a more consistent dataset image if no one is updating the dataset at the time of backup or copy.

 

Rebuilding datasets with external files

Datasets with any external resource files, such as keys, records, or attribute files, cannot be handled via the dataset-to-dataset-dump rebuild script. Renaming the dataset does not accommodate the dataset's internal references to the external files.

Datasets with external files should be handled by dumping the original dataset to a binary file, then recreating and loading the new dataset.

Considerations for scheduling rebuilds

You can incorporate a backup strategy into the rebuild process. To do so, copy each dataset to be rebuilt to a staging area before the build proceeds. After a night's rebuilding, you can back up and purge the entire staging area.

You can schedule your rebuilds based on the time that has elapsed since the previous rebuild. Or, you can schedule your rebuilds based on the number of deleted, added, or updated records. For example, a dataset that has had no updates probably does not need to be rebuilt often.

You can choose the datasets for rebuilding using a script that was created to test individual datasets as candidates for rebuilding. Or, as System 1032 is a database system, you could create a dataset to list dataset names, location, and other pertinent information about the dataset and its context. You could write a scanning procedure to evaluate whether the listed datasets are ready for rebuilding.
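
A scanning procedure of this kind could apply both scheduling criteria at once. The Python sketch below is one hypothetical way to express the decision. The DatasetInfo fields, the thresholds, and the sample catalog entries are all assumptions made for illustration, and at a real site the logic would more likely be written as a System 1032 procedure over a catalog dataset.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetInfo:
    name: str
    location: str
    last_rebuild: date
    records: int
    changes_since_rebuild: int  # deleted + added + updated records

def needs_rebuild(ds: DatasetInfo, today: date,
                  max_age_days: int = 365,
                  max_change_ratio: float = 0.25) -> bool:
    """Flag a dataset when it is old enough or has changed enough."""
    if ds.changes_since_rebuild == 0:
        return False  # an untouched dataset gains little from a rebuild
    age_days = (today - ds.last_rebuild).days
    change_ratio = ds.changes_since_rebuild / max(ds.records, 1)
    return age_days >= max_age_days or change_ratio >= max_change_ratio

# Hypothetical catalog entries.
catalog = [
    DatasetInfo("ORDERS", "[DATA]", date(2001, 1, 15), 100000, 40000),
    DatasetInfo("CODES", "[DATA]", date(2000, 6, 1), 500, 0),
    DatasetInfo("STAFF", "[DATA]", date(2001, 11, 1), 2000, 50),
]
today = date(2002, 1, 10)
candidates = [ds.name for ds in catalog if needs_rebuild(ds, today)]
print(candidates)
```

Treating a dataset with no changes as never needing a rebuild is a simplification. A site might instead give such datasets a much longer maximum age.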

As for the rebuilding process itself, you could write another procedure to do the rebuilds in-line. Or, you could have the scanning procedure invoke individual rebuild batch jobs, via the tools procedure SUBMIT, on a per-dataset basis. In addition to making the programming and error handling a bit easier, individual jobs provide a better historical record for tracking resources associated with the dataset rebuilds.

Coming attractions

In Part 2, we will examine the code developed at a System 1032 site that is used to semi-automatically rebuild their datasets.

 

Copyright © 2008 Computer Corporation of America.
All rights reserved. Published in the United States of America.

