Computer Corporation of America
|
Feedback
Search CCA:
   
USA CCA
CCA Products
CCA Customer Support
CCA Resources
CCA - Company
CCAPRINT: A Newsletter for Model 204® and System 1032® Users
June 10, 2004

 

Model 204

Controlling the Sort in FILELOAD
By James Damon


The FILELOAD command, invoked from BATCH204, is a one-step job for loading a Model 204 file from a sequential input file. Given adequate resources, primarily in the form of memory for sorting deferred index updates, a FILELOAD command can provide up to a 15% reduction in elapsed time compared to loading the same data with the FLOD command. This can be a critical factor when reorganization is urgently required for a production file. Performance of the sort can be the most important part of this process.

Unlike the multistep FLOD command processing, where you can restart processing at several places, FILELOAD processing is a one-step process which must either run to successful completion or be rerun from the very beginning. So, to take advantage of the better performance potential, especially during times of urgency, you neeed a very reliable job setup that will not falter, requiring a rerun. This article covers the how-to.

Processing a FILELOAD Command
A FILELOAD command performs the following tasks:

  1. Loads Table B from the sequential input file, TAPEI.
  2.
While loading Table B, it invokes the system-defined sort utility and begins sorting deferred index updates generated by Step 1.
  3. When the sorting is complete, it invokes the Z command step to apply index updates.

Introducing the Major Sort Utilities
The sort utilities commonly used by CCA customer sites are DFSORT from IBM and SYNCSORT from Syncsort Incorporated. Both of these utilities provide a way to override sort control statements and parameters when the sort is invoked by a third-party job such as BATCH204. We suspect that other sort utilities, not mentioned here, offer the same functionality. The FILELOAD command passes a parameter list to the sort utility which controls a number of different sort functions.

The only sort parameter that can be overriden in the FILELOAD command itself is the default sort space of 50000 bytes. However, there may be other parameters that you would like to override, in addition to overriding the the default sort space. For instance you may want to change the default prefix for the sort work ddnames from CCA1 to something more consistent with standards at your site. Or, you may want to provide an estimate on the number of records you expect to be sorted; this change alone can improve performance for most sort utilities. Or, you may want to specify the amount of dataspace, virtual storage to use for the sort.

Controlling the Sort
The IBM DFSORT utility allows the parameter list provided by BATCH204/FILELOAD to be overridden by parameters and options specified in the ddname=DFSPARM. SYNCSORT provides the same capability through the ddname=$ORTPARM. The following is an example using the ddname=DFSPARM under DFSORT.

//DFSPARM
SIZE=35M
OPTION, DSPSIZE=55,DYNALLOC=(SYSDA,4),FILSZ=E1600000,
LIST,LISTX,MSGPRT=ALL,MSGDDN=SORTMSGS,SORTDD=MINE

If this ddname is present in a BATCH204 job that also includes the FILELOAD command, the options and parameters in the DFSPARM stream will override those specified in the FILELOAD command and others provided in the parameter list passed by FILELOAD. In the previous example the following options and parameters were set:

Parameter setting Description
SIZE=35M The maximum amount of virtual storage to allocate above and below-the-line for sorting
DSPSIZE=55 The amount of virtual storage to allocate in a dataspace for sorting
DYNALLOC=(SYSDA,4) Dynamically allocate a maximum of four work datasets on SYSDA
FILSZ=E1600000 An estimate of the number of records that will be sorted
LIST,LISTX List all sort messages on the ddname=SORTMSGS
MSGPRT=ALL List all messages, both error and informational
MSGDDN=SORTMSGS The ddname where messages will print
SORTDD=MINE

Sort work ddnames will have this prefix. If you want to change sort work ddnames, you must provide one dummy DD with CCA1:

//CCA1WK01 DD DUMMY


Experimenting with the Reorganization Job
Some experimentation will be required to find the optimal combination of parameters and options that you provide to the sort utility. A few examples:

  File size estimate (FILSZ). An accurate estimate on the number of deferred index update records that will be sorted is critical in helping the sort utility calculate the number and size of sort work datasets to allocate and how much sorting can be done in memory. This estimate can be made from the histogram provided by FILELOAD:
   
LONGEST DEFERRED UPDATE LRECL=34   
RANGE OF LRECL NUMBER OF RECORDS
256-269 0
240-255 0
224-239 0
208-223 0
192-207 0
176-191 0
160-175 0
144-159 0
128-143 0
112-127 0
96-111 0
80- 95 0
64- 79 0
48- 63 0
32- 47 18407
1- 31 136012

The total number of deferred index update records to be sorted is the sum of the NUMBER OF RECORDS column. In the previous histogram, you could specify an estimate with FILSZ=E160000. Or, you could specify FILSZ=154419 for an exact count.

 
The proper mix of virtual storage and dataspace virtual storage which does not result in system paging. For example, if you set the SIZE parameter to 500M and you do not have adequate real memory, the sort might take longer due to system paging than if you made the SIZE parameter smaller or did not set it.
  The proper allocation for sort work datasets. You want enough disk storage at the right size. Even though the sort utilities try to do as much sorting as possible in memory, if the amount of memory is insufficient, sort work datasets on disk will be used. Hence, the sort work datasets need to know how big to be. In your testing you can let the sort utility figure out the number and size of the sort work datasets by providing an accurate estimate of the number of records to sort (FILSZ=En) and by setting the DYNALLOC parameter.

CCA suggests that you use copies of critical production files which may need to be reorganized and test your reorganization job streams and sort parameters using those copied files. This will provide information regarding your choices for Model 204 file, sort, and FILELOAD parameters.

In Summary
Many sort options can be specified as described in this article to maximize sort performance during a FILELOAD reorganization and reduce the elapsed time required to return a file to production use. For critical files that routinely require reorganization during production periods or for emergency file reorganizations, the effort required to fine tune these reorganization jobs may be very worthwhile.

 

System 1032

Back to Basics, Part 1 – Concepts
By Tym Stegner

Tym

In this issue of CCAPRINT we begin a series of articles that review the basics of System 1032. Every other month you can expect a review of the features and functions of System 1032 with tips and recommendations. Although this series of articles is targeted to your newer System 1032 staff--those people recently hired to maintain and develop System 1032 applications and data structures--we think our long-standing readers will also welcome a refresh of your favorite database management system.

Early 21st century Americans are surrounded by databases that collect, store, and give back information about almost all aspects of our daily lives. Every time you buy a tee shirt in a store, place a phone call, enroll in a class, order an airline ticket, or fill your gas tank, that transaction is recorded and stored in a database, a computer storage and retrieval program or database management system (DBMS).

Introducing System 1032
System 1032 is a general-purpose DBMS that runs as a user-level program on the OpenVMS operating system. System 1032 was chosen for your site because it stores and retrieves data efficiently. And over the years it has proven completely reliable. System 1032 stores data from the mundane name and address data for a mailing list company to the exotic index of satellite pictures

Using System 1032, you can efficiently store data in electronic form, retrieve it rapidly, order and format it, and also share that data with multiple colleagues simultaneously. This eliminates the need for duplication of data by different members or divisions of a company, thus easing updates of crucial information.

System 1032 can automatically prevent several kinds of data errors. For example, it can detect typing errors or impossible dates in data entry, and it can constrain values to a predefined range. System 1032 can provide security for your data. System 1032 controls access at the table, record, and field levels using one or more of password, username, OpenVMS User Identification Code (UIC), or other identifier. In addition, System 1032 supports flexibility of output formats, a means to automate repetitive tasks, the ability to integrate into third-generation language applications, such as COBOL, FORTRAN, BASIC, and allows access to its data from the PC platform and/or Web server.

Elementary System 1032 Data Structures
The basic unit of operation within System 1032 is a data value. A data value can be thought of as any given fact, such as Yes or No or a Social Security number or a more complex field value such as an explanation in text or the pixels for photo ID cards. In System 1032 you identify the data value by a name and you classify the data value by assigning a data type to the field, for example, text, numeric, binary—to name a few.

Such a defined field with name and data type is called an attribute, which takes the position of a column in a table. In addition to a name and a data type, attributes can optionally have other elements such as alternate names, a format, a title, a prompt, a range (non-text fields), indexing, precision, initial value, and so on.

A collection of one or more attributes constitutes a record or row. A record is a set of related data values about a particular subject. For example, your address book entries, whether online, on a PDA, or in a pocket-sized book, are the records of the names and contact information of your friends and business relationships. A record in System 1032 can have up to 3,000 attributes defined, though the largest known record in current use is only about 300 attributes long.

Storing Records in a Dataset
In System 1032, you collect and work with records or sets of records. To store all the data on the same general subject, such as your personal address book, System 1032 maintains a collection of records with the same attribute organization--thus the same number of data items per record. System 1032 automatically assigns a unique identification number, called $ID, to each record as it is added to the dataset. A collection of records is normally limited only by available disk storage or an upper limit of over two billion records.

A dataset is stored in an OpenVMS file under a name you provide, and has the file type DMS. The largest known dataset in use is about 350 million records; datasets of 5 to 10 million are in common usage; the average dataset size is under 100,000 records. Datasets store the definitions of the attributes contained within, the data values of the defined attributes, and the indexing system for the attribute values known as key tables.

Retrieving Records from Datasets
When you want to retrieve records from your address book dataset, System 1032 uses a key table instead of reading the data values in each record in the dataset. For example, to find all addresses in the zip code area 01701, a System 1032 query looks at the zip code key table where the $IDs are grouped according to zip code. Using only the $IDs for the 01701 zip code, System 1032 quickly retrieves just the records for the 01701 zip code, thus greatly speeding a query request. Key tables are stored within the dataset file.

Dataset Details
A dataset can also optionally contain other structures, such as variables, record descriptors, forms, and procedures. In future articles, we will discuss some of the following dataset structures in greater detail.

  A record descriptor (RD) defines the external record structure of a set of attributes for reading or writing data to non-dataset files or program buffers.
 
Variables are used for temporary in-memory storage of data values, forms are used to display and accept data in an on-screen interactive arrangement, and procedures are compiled PL1032 commands.
  A dataset can optionally contain pre-compiled commands to perform trigger actions for certain System 1032 commands, such as the opening a dataset or updating commands.

Utilizing the Database and Library Structures
Databases
Datasets can be logically related or joined to one another by the data they contain. You can define a join relationship between pairs of datasets in a database on one or more sets of related attributes. This relationship can be dynamic or defined within an OpenVMS file called a database. The database has a user-defined name, and has the file type DMB. Up to 250 datasets can be associated with a single database, and up to 31 of those datasets can be used in join relationships.

The database structure is designed to store related datasets and database-level procedures. CCA recommends that you store:

  Nondynamic join information within databases
 
Database-level triggers, variables, forms, RDs, and procedures, including trigger procedures
  Some Web applications, because they are simplified when the underlying datasets are in a database structure
  Predetermined view datasets in a database with their underlying datasets, otherwise an error occurs

Although it is also physically possible to store one or more datasets within a database file, this practice is rarely exercised any more.

When a database is opened by System 1032, all its associated datasets are automatically opened, and any defined join relationships are activated. Any number of databases can be opened at one time, limited only by disk space and/or process memory.

Libraries
The library structure has the file type DML and it is used to store programming related structures such as compiled procedures, variables, forms, and RDs. You can think of a library as a tool chest. A library never stores data values; it is used for utilities and applications only. System 1032 includes several system libraries containing utility procedures and commands, which are appropriately called tools.

Coming Attractions
The next article in this series will review the System 1032 data types, their uses and options.

We are always interested in your ideas for articles or topics of interest to you, our customers. If this review series for System 1032 triggers a question that you would like discussed more fully, do not hesitate to contact us by replying to your CCAPRINT notification mail or send message.

 

Copyright © 2008 Computer Corporation of America.
All right reserved. Published in the United States of America.


Contact CCA Webmaster
Copyright 2008