Model 204: A Faster Way to INITIALIZE
By James Damon
In CCAPrint December 2001, "A Faster Way to CREATE," I described how to use the NOFORMAT option of the CREATE command to reduce the elapsed time required to create a file. In this article I focus on the INITIALIZE command, which is required to initialize all bytes of certain file pages with hexadecimal zeroes. After you create a file and again when you need to reorganize it, you want to get the file back into service as quickly as possible. This article discusses one way to achieve that objective.
During file creation or reorganization, the INITIALIZE command follows the CREATE command.
Initializing a file
Two tables, TABLEA and TABLEC, in a Model 204 file are actually hashed data structures and are accessed using a Model 204 hashing algorithm. When a file is first created, these two tables must be completely initialized to hexadecimal zeroes for the hashing algorithm to work correctly.
Once initialized, these tables cannot be expanded or contracted dynamically, because doing so would change the number of hash cells in the table. Since the number of hash cells directly affects the results of the hashing algorithm, changing the number dynamically would cause subsequent hash operations to fail to locate previously allocated hash cells.
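The effect of changing the number of hash cells can be illustrated with a minimal Python sketch. It is not Model 204's hashing algorithm: CRC32 modulo the cell count is a stand-in, and the key names and cell counts are hypothetical. The sketch fills a table of 8 cells, then "expands" it to 16 cells without rehashing and counts how many keys can still be found.

```python
import zlib

def cell_for(key: str, n_cells: int) -> int:
    """Map a key to a cell number. CRC32 is a stand-in, not Model 204's algorithm."""
    return zlib.crc32(key.encode("utf-8")) % n_cells

def build_table(keys, n_cells):
    """Store each key in the cell that its hash selects."""
    table = [[] for _ in range(n_cells)]
    for key in keys:
        table[cell_for(key, n_cells)].append(key)
    return table

def lookup(table, key) -> bool:
    """Search only the cell selected for the table's current number of cells."""
    return key in table[cell_for(key, len(table))]

keys = [f"FIELD{i}" for i in range(100)]  # hypothetical key values
table = build_table(keys, 8)

# "Expand" the table by appending empty cells, leaving existing entries in place.
expanded = table + [[] for _ in range(8)]

found_before = sum(lookup(table, k) for k in keys)
found_after = sum(lookup(expanded, k) for k in keys)
print(f"found before expansion: {found_before} of {len(keys)}")
print(f"found after expansion:  {found_after} of {len(keys)}")
```

Every key is found in the original table, but after the expansion some keys hash to a different cell from the one in which they were stored, so the lookup misses them.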
In addition to TABLEA and TABLEC, the following data structures must also be initialized:
Sizing Table A and Table C
The default size of three pages for Table A is sufficient for just about all files, and the elapsed time required to initialize these pages to hexadecimal zeroes is inconsequential. The same is true for the other data structures mentioned in "Initializing a file," with the exception of TABLEB in hashed files. However, in a large file that makes heavy use of the KEY or NUMERIC RANGE field attributes, Table C may also be large and the elapsed time required to initialize Table C may be significant.
The INITIALIZE command illustrated in Figure 1 was executed against a file with CSIZE=1, the minimum value; the command in Figure 2 was executed against a file with CSIZE=50000. We will compare the disk writes required and the elapsed time consumed in each case by comparing the DKWR and RQTM statistics.
Caution: Since the INITIALIZE command destroys all field definitions, data, procedures, and procedure security, always use the IN FILE filename clause preceding the INITIALIZE command to ensure that the correct file is initialized.
In Figure 1, CPU time is so small that it cannot be measured, there are four disk writes, and the elapsed time is 41 milliseconds.
In Figure 2, CPU time is 2,125 milliseconds, there are 50,003 disk writes, and the elapsed time is 165,881 milliseconds or about 2.75 minutes.
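The arithmetic behind these two results can be checked with a short Python sketch. The DKWR and RQTM values are the ones quoted above; treating the disk writes as one write per Table C page plus the three default Table A pages is an inference from those numbers, not a documented formula.

```python
# Statistics quoted in the text for the two INITIALIZE runs.
runs = {
    "CSIZE=1": {"csize": 1, "dkwr": 4, "rqtm_ms": 41},
    "CSIZE=50000": {"csize": 50_000, "dkwr": 50_003, "rqtm_ms": 165_881},
}
ASIZE_DEFAULT = 3  # default Table A size, in pages, as stated in the text

for name, run in runs.items():
    other_writes = run["dkwr"] - run["csize"]  # writes not explained by Table C pages
    ms_per_write = run["rqtm_ms"] / run["dkwr"]
    print(f"{name}: {other_writes} writes beyond Table C, {ms_per_write:.2f} ms per write")

slowdown = runs["CSIZE=50000"]["rqtm_ms"] / runs["CSIZE=1"]["rqtm_ms"]
minutes = runs["CSIZE=50000"]["rqtm_ms"] / 60_000
print(f"elapsed-time ratio: about {slowdown:.0f} to 1")
print(f"large run: {minutes:.2f} minutes")
```

In both runs the writes beyond Table C equal the three default Table A pages, which suggests that the number of disk writes, and with it the elapsed time, grows directly with CSIZE.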
Another reason for using the ORDERED attribute
Figure 1 illustrates the advantage of setting CSIZE=1, at least for the INITIALIZE command. Note that this implies that the file contains no fields with the KEY or NUMERIC RANGE attributes. As I discussed in "Living an ORDERED life," CCAPrint August 1997, there are many compelling reasons for converting fields from the KEY and NUMERIC RANGE attributes to the ORDERED attribute. Now the INITIALIZE command provides an additional performance reason for making that conversion.
In summary
The CREATE and INITIALIZE commands are used infrequently, except in the early stages of file design. When you do not use the KEY or NUMERIC RANGE attributes for fields (CSIZE=1), there are fewer reasons to change table sizes and, therefore, fewer reasons to recreate and reinitialize a file. This is especially true during production use. With no requirement to ever expand TABLEC, file reorganization is rarely required.
File reorganization is necessary only when file parameters such as ATRPG, FILEORG, FVFPG, or MVFPG, or field attributes such as AT-MOST-ONE, BINARY, CODED, FLOAT, LENGTH, or OCCURS, need to be changed. When reorganization is necessary and its speed is critical, issuing the INITIALIZE command against a file with CSIZE=1 will get the file back into production that much sooner.
System 1032: Automating Dataset Rebuilds, Part 1
By Tym Stegner
System 1032 datasets perform more efficiently when they receive regular maintenance. At some sites, this may mean a rebuild only once a year or less often; at other sites, those with frequently updated datasets, rebuilding is done more often.
This two-part article addresses some of the efficiency concerns of building datasets and explores issues and processes relating to the automation of rebuilding datasets.
There are two aspects to dataset creation:
To optimize your datasets, these two aspects are treated as separate events, as the optimizations differ. To create a dataset, the process must represent all the data structures in memory before actually writing out the DMS file.
Begin optimizing when defining the key tables
Optimizing a dataset begins with the KEY_DEFAULTS clause of the dataset definition. If a dataset is updated frequently, it helps to increase the null space in the key tables and to reserve additional record space for expansion.
The KEY_DEFAULTS clause of the dataset definition lets you change the default of 15 percent free space in the key tables to your desired values.
In a similar fashion, the LOAD_DEFAULTS clause of the dataset definition enables you to declare dataset-specific settings including allocation space, input and output buffer sizing, and whether to define different chain file assignments for frequently used text varying attributes.
You cannot modify either the KEY_DEFAULTS clause or the LOAD_DEFAULTS clause after the dataset is created. You must recreate the dataset from a changed definition file.
Optimizing dataset creation
Optimizing dataset creation is best done on each individual dataset. What works for one particular dataset won't necessarily carry over to another.
If the dataset has many attributes, or if many of the attributes are keyed, or if both are true, you can achieve optimization by using a large working set, a large page-file quota, and judicious adjustment of the $BUF_NUMBER system variable. The latter is a process-definable control of the number of memory buffers System 1032 uses when creating the dataset structure.
To adjust the $BUF_NUMBER value, issue a series of CREATE commands each time setting the value higher, until the process DIO and BIO values are flat for CREATE processing. To view these values, either press <Control/T> or issue a PRINT $DIO $BIO command before and after the CREATE command.
If sufficient memory is available, it might be possible to set the $BUF_NUMBER value high and leave it for the duration of subsequent CREATE commands.
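The trial-and-repeat tuning described above amounts to looking for the point where a larger $BUF_NUMBER value no longer reduces I/O. The following Python sketch shows only that decision rule. The measurements are invented sample data, the tuple layout and the 2 percent tolerance are assumptions, and in practice each row would be recorded by hand from the DIO and BIO values observed around one trial CREATE command.

```python
def pick_buf_number(measurements, tolerance=0.02):
    """Return the smallest setting after which total I/O is effectively flat.

    measurements: (buf_number, dio, bio) tuples sorted by buf_number, one per
    trial CREATE. The layout is hypothetical, chosen for this sketch.
    """
    for (setting, dio, bio), (_, next_dio, next_bio) in zip(measurements, measurements[1:]):
        total = dio + bio
        next_total = next_dio + next_bio
        # Stop when the next, larger setting saves no more than the tolerance.
        if total == 0 or (total - next_total) / total <= tolerance:
            return setting
    return measurements[-1][0]  # I/O was still falling at the largest value tried

# Invented sample data: (buf_number, DIO, BIO) consumed by each trial CREATE.
trials = [
    (20, 9000, 400),
    (40, 5200, 390),
    (80, 3100, 385),
    (160, 3080, 385),
    (320, 3075, 385),
]

best = pick_buf_number(trials)
print(f"I/O is flat beyond $BUF_NUMBER = {best}")
```

With this sample data, the I/O counts fall steeply up to a setting of 80 and change by less than 1 percent beyond it, so 80 is the value the rule selects.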
Optimizing loading your datasets
After you create a dataset shell, you can optimize the storage of the data contained within the dataset. The data consists of the:
Loading records
There are two ways to initialize and then load data into your dataset.
Allocating record space
System 1032 automatically calculates the initial allocation for the dataset based on the input file size and the record length. However, if many records are truncated or if you plan to load the dataset via dataset-to-dataset dump, this calculation may be off. When you know that a dataset will expand beyond the initial load, or if you want to ensure that the file extents for a dataset are contiguous, you can preallocate the required record space in the dataset using the ALLOCATE command.
Disk space is reserved using the Best-Try-Contiguous algorithm. You can allocate either a number of records or a number of disk blocks.
When you allocate space by record, the dataset must first be initialized, so that System 1032 knows what the record size is. Initialization is normally part of LOAD processing, but can be done manually as described in Loading records. After you allocate records, you can post data to the dataset using either the DUMP DS command or the APPEND command. Because the dataset is already initialized, LOAD processing is no longer an option.
Reading records most efficiently
To assist System 1032 in reading the data records, adjust the $CLUSTER_LIMIT system variable, which controls the number of record buffers used during RMS operations. You adjust the $CLUSTER_LIMIT system variable the same way as the $BUF_NUMBER system variable: by repeated trials. For the $CLUSTER_LIMIT system variable, you must also watch for increased page faulting as the value increases.
Note: Resetting the value of $CLUSTER_LIMIT affects only the reading of records from a flat file; System 1032 does not use RMS calls during data storage.
As the records are read in and written to the DMS file, System 1032 key values are spooled. When all records are stored, System 1032 begins building the key tables. Key table creation benefits from increases to the value of the $BUF_NUMBER system variable, not the $CLUSTER_LIMIT system variable.
A simple, manual rebuild script
The simple rebuild script in Figure A is a COM file that accepts a dataset name and passes it to System 1032, while assuming all necessary files are in the current directory. You can add many additional features and checks to the code.
In the following code the original dataset is renamed, then opened in the renamed file via an alias. A SHOW command extracts the dataset definition. A CREATE command recreates the dataset using the original name. Existing data records are then transferred from the old dataset to the new using a dataset-to-dataset DUMP command.
Comments on the REBUILD.COM script
The dataset name is passed to System 1032 by substituting the value of P1 into a variable definition on the System 1032 command line.
The PL1032 code uses the @= operator to treat the value of the DSN variable as embedded text. This lets the script use direct commands instead of building command strings for use with the EXECUTE command.
Note: This methodology works well only in DMC files; it is not as easy to do in compiled procedures.
Dataset rebuilding script design considerations
The following table lists considerations for certain steps that may be part of an automated rebuild process. Because a particular feature may not be in use at your site, you may not need to incorporate every element into your process.
You might want to use the FORCE-CLOSE.COM procedure to list or terminate System 1032 dataset users prior to a rebuild pass.
The FORCE-CLOSE.COM procedure is available from the System 1032 FTP, subdirectory CCAPRINT.
Rebuilding datasets with external files
Datasets with any external resource files, such as keys, records, or attribute files, cannot be handled via the dataset-to-dataset-dump rebuild script. Renaming the dataset does not accommodate the dataset's internal references to the external files.
Datasets with external files should be handled by dumping the original dataset to a binary file, then recreating and loading the new dataset.
Considerations for scheduling rebuilds
You can incorporate a backup strategy into the rebuild process. To do so, copy each dataset to be rebuilt to a staging area before the build proceeds. After a night's rebuilding, you can back up and purge the entire staging area.
You can schedule your rebuilds based on the time that has elapsed since the previous rebuild. Or, you can schedule your rebuilds based on the number of deleted, added, or updated records. For example, a dataset that has had no updates probably does not need to be rebuilt often.
You can choose the datasets for rebuilding using a script that was created to test individual datasets as candidates for rebuilding. Or, as System 1032 is a database system, you could create a dataset to list dataset names, location, and other pertinent information about the dataset and its context. You could write a scanning procedure to evaluate whether the listed datasets are ready for rebuilding.
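A scanning procedure of this kind reduces to a simple test applied to each listed dataset. The following Python sketch illustrates the idea outside System 1032. The record layout, the dataset names, and both thresholds are hypothetical, and skipping datasets with no changes at all is a simplifying assumption made for the sketch.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetStatus:
    """One row of a hypothetical catalog of datasets to be scanned."""
    name: str
    last_rebuild: date
    records: int
    changed: int  # records added, deleted, or updated since the last rebuild

def needs_rebuild(ds, today, max_age_days=365, max_changed_fraction=0.20):
    """Flag a dataset by elapsed time or by update volume (thresholds are assumed)."""
    if ds.changed == 0:
        return False  # assumption: an untouched dataset is skipped in this pass
    age_days = (today - ds.last_rebuild).days
    changed_fraction = ds.changed / max(ds.records, 1)
    return age_days >= max_age_days or changed_fraction >= max_changed_fraction

# Invented sample catalog.
catalog = [
    DatasetStatus("ORDERS", date(2008, 1, 15), 100_000, 35_000),
    DatasetStatus("REGIONS", date(2006, 6, 1), 500, 0),
    DatasetStatus("CUSTOMERS", date(2007, 2, 1), 40_000, 1_200),
    DatasetStatus("PRICES", date(2008, 5, 1), 2_000, 100),
]

today = date(2008, 6, 1)
candidates = [ds.name for ds in catalog if needs_rebuild(ds, today)]
print("rebuild candidates:", candidates)
```

Here ORDERS qualifies on update volume and CUSTOMERS on elapsed time, while REGIONS and PRICES are left alone. Keeping the test in one small function makes it easy to change the thresholds per site.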
As for the rebuilding process itself, you could write another procedure to do the rebuilds in-line. Or, you could have the scanning procedure invoke individual rebuild batch jobs, via the tools procedure SUBMIT, on a per-dataset basis. In addition to making the programming and error handling a bit easier, individual jobs provide a better historical record for tracking resources associated with the dataset rebuilds.
Coming attractions
In Part 2, we will examine code that was developed at a System 1032 site and is used to rebuild that site's datasets semi-automatically.
Copyright © 2008 Computer Corporation of America. All rights reserved. Published in the United States of America.