Model 204
Controlling the Sort in FILELOAD By James Damon
The FILELOAD command, invoked from BATCH204, is a one-step job for loading a Model 204 file from a sequential input file. Given adequate resources, primarily in the form of memory for sorting deferred index updates, a FILELOAD command can provide up to a 15% reduction in elapsed time compared to loading the same data with the FLOD command. This can be a critical factor when reorganization is urgently required for a production file. Performance of the sort can be the most important part of this process.
Unlike the multistep FLOD command processing, where you can restart processing at several places, FILELOAD processing is a one-step process which must either run to successful completion or be rerun from the very beginning. So, to take advantage of the better performance potential, especially during times of urgency, you neeed a very reliable job setup that will not falter, requiring a rerun. This article covers the how-to.
Processing a FILELOAD Command A FILELOAD command performs the following tasks:
Introducing the Major Sort Utilities The sort utilities commonly used by CCA customer sites are DFSORT from IBM and SYNCSORT from Syncsort Incorporated. Both of these utilities provide a way to override sort control statements and parameters when the sort is invoked by a third-party job such as BATCH204. We suspect that other sort utilities, not mentioned here, offer the same functionality. The FILELOAD command passes a parameter list to the sort utility which controls a number of different sort functions.
The only sort parameter that can be overriden in the FILELOAD command itself is the default sort space of 50000 bytes. However, there may be other parameters that you would like to override, in addition to overriding the the default sort space. For instance you may want to change the default prefix for the sort work ddnames from CCA1 to something more consistent with standards at your site. Or, you may want to provide an estimate on the number of records you expect to be sorted; this change alone can improve performance for most sort utilities. Or, you may want to specify the amount of dataspace, virtual storage to use for the sort.
Controlling the Sort The IBM DFSORT utility allows the parameter list provided by BATCH204/FILELOAD to be overridden by parameters and options specified in the ddname=DFSPARM. SYNCSORT provides the same capability through the ddname=$ORTPARM. The following is an example using the ddname=DFSPARM under DFSORT.
//DFSPARM SIZE=35M OPTION, DSPSIZE=55,DYNALLOC=(SYSDA,4),FILSZ=E1600000, LIST,LISTX,MSGPRT=ALL,MSGDDN=SORTMSGS,SORTDD=MINE
If this ddname is present in a BATCH204 job that also includes the FILELOAD command, the options and parameters in the DFSPARM stream will override those specified in the FILELOAD command and others provided in the parameter list passed by FILELOAD. In the previous example the following options and parameters were set:
Sort work ddnames will have this prefix. If you want to change sort work ddnames, you must provide one dummy DD with CCA1:
//CCA1WK01 DD DUMMY
Experimenting with the Reorganization Job Some experimentation will be required to find the optimal combination of parameters and options that you provide to the sort utility. A few examples:
LONGEST DEFERRED UPDATE LRECL=34 RANGE OF LRECL NUMBER OF RECORDS 256-269 0 240-255 0 224-239 0 208-223 0 192-207 0 176-191 0 160-175 0 144-159 0 128-143 0 112-127 0 96-111 0 80- 95 0 64- 79 0 48- 63 0 32- 47 18407 1- 31 136012
The total number of deferred index update records to be sorted is the sum of the NUMBER OF RECORDS column. In the previous histogram, you could specify an estimate with FILSZ=E160000. Or, you could specify FILSZ=154419 for an exact count.
CCA suggests that you use copies of critical production files which may need to be reorganized and test your reorganization job streams and sort parameters using those copied files. This will provide information regarding your choices for Model 204 file, sort, and FILELOAD parameters.
System 1032
Back to Basics, Part 1 Concepts By Tym Stegner
In this issue of CCAPRINT we begin a series of articles that review the basics of System 1032. Every other month you can expect a review of the features and functions of System 1032 with tips and recommendations. Although this series of articles is targeted to your newer System 1032 staff--those people recently hired to maintain and develop System 1032 applications and data structures--we think our long-standing readers will also welcome a refresh of your favorite database management system.
Early 21st century Americans are surrounded by databases that collect, store, and give back information about almost all aspects of our daily lives. Every time you buy a tee shirt in a store, place a phone call, enroll in a class, order an airline ticket, or fill your gas tank, that transaction is recorded and stored in a database, a computer storage and retrieval program or database management system (DBMS).
Introducing System 1032 System 1032 is a general-purpose DBMS that runs as a user-level program on the OpenVMS operating system. System 1032 was chosen for your site because it stores and retrieves data efficiently. And over the years it has proven completely reliable. System 1032 stores data from the mundane name and address data for a mailing list company to the exotic index of satellite pictures
Using System 1032, you can efficiently store data in electronic form, retrieve it rapidly, order and format it, and also share that data with multiple colleagues simultaneously. This eliminates the need for duplication of data by different members or divisions of a company, thus easing updates of crucial information.
System 1032 can automatically prevent several kinds of data errors. For example, it can detect typing errors or impossible dates in data entry, and it can constrain values to a predefined range. System 1032 can provide security for your data. System 1032 controls access at the table, record, and field levels using one or more of password, username, OpenVMS User Identification Code (UIC), or other identifier. In addition, System 1032 supports flexibility of output formats, a means to automate repetitive tasks, the ability to integrate into third-generation language applications, such as COBOL, FORTRAN, BASIC, and allows access to its data from the PC platform and/or Web server.
Elementary System 1032 Data Structures The basic unit of operation within System 1032 is a data value. A data value can be thought of as any given fact, such as Yes or No or a Social Security number or a more complex field value such as an explanation in text or the pixels for photo ID cards. In System 1032 you identify the data value by a name and you classify the data value by assigning a data type to the field, for example, text, numeric, binaryto name a few.
Such a defined field with name and data type is called an attribute, which takes the position of a column in a table. In addition to a name and a data type, attributes can optionally have other elements such as alternate names, a format, a title, a prompt, a range (non-text fields), indexing, precision, initial value, and so on.
A collection of one or more attributes constitutes a record or row. A record is a set of related data values about a particular subject. For example, your address book entries, whether online, on a PDA, or in a pocket-sized book, are the records of the names and contact information of your friends and business relationships. A record in System 1032 can have up to 3,000 attributes defined, though the largest known record in current use is only about 300 attributes long.
Storing Records in a Dataset In System 1032, you collect and work with records or sets of records. To store all the data on the same general subject, such as your personal address book, System 1032 maintains a collection of records with the same attribute organization--thus the same number of data items per record. System 1032 automatically assigns a unique identification number, called $ID, to each record as it is added to the dataset. A collection of records is normally limited only by available disk storage or an upper limit of over two billion records.
A dataset is stored in an OpenVMS file under a name you provide, and has the file type DMS. The largest known dataset in use is about 350 million records; datasets of 5 to 10 million are in common usage; the average dataset size is under 100,000 records. Datasets store the definitions of the attributes contained within, the data values of the defined attributes, and the indexing system for the attribute values known as key tables.
Retrieving Records from Datasets When you want to retrieve records from your address book dataset, System 1032 uses a key table instead of reading the data values in each record in the dataset. For example, to find all addresses in the zip code area 01701, a System 1032 query looks at the zip code key table where the $IDs are grouped according to zip code. Using only the $IDs for the 01701 zip code, System 1032 quickly retrieves just the records for the 01701 zip code, thus greatly speeding a query request. Key tables are stored within the dataset file.
Dataset Details A dataset can also optionally contain other structures, such as variables, record descriptors, forms, and procedures. In future articles, we will discuss some of the following dataset structures in greater detail.
Utilizing the Database and Library Structures Databases Datasets can be logically related or joined to one another by the data they contain. You can define a join relationship between pairs of datasets in a database on one or more sets of related attributes. This relationship can be dynamic or defined within an OpenVMS file called a database. The database has a user-defined name, and has the file type DMB. Up to 250 datasets can be associated with a single database, and up to 31 of those datasets can be used in join relationships.
The database structure is designed to store related datasets and database-level procedures. CCA recommends that you store:
Although it is also physically possible to store one or more datasets within a database file, this practice is rarely exercised any more.
When a database is opened by System 1032, all its associated datasets are automatically opened, and any defined join relationships are activated. Any number of databases can be opened at one time, limited only by disk space and/or process memory.
Libraries The library structure has the file type DML and it is used to store programming related structures such as compiled procedures, variables, forms, and RDs. You can think of a library as a tool chest. A library never stores data values; it is used for utilities and applications only. System 1032 includes several system libraries containing utility procedures and commands, which are appropriately called tools. Coming Attractions The next article in this series will review the System 1032 data types, their uses and options.
We are always interested in your ideas for articles or topics of interest to you, our customers. If this review series for System 1032 triggers a question that you would like discussed more fully, do not hesitate to contact us by replying to your CCAPRINT notification mail or send message.
Copyright © 2008 Computer Corporation of America. All right reserved. Published in the United States of America.