Computer Corporation of America
|
Feedback
Search CCA:
   
USA CCA
CCA Products
CCA Customer Support
CCA Resources
CCA - Company
CCAPRINT: A Newsletter for Model 204® and System 1032® Users
June 10, 2006
     
System 1032: Validating Key Table Values Printer-friendly version
Model 204: SQL Performance Tuning Printer-friendly version

System 1032
USE OF AND ACCESS TO PRODUCTS AND FEATURES ARE IN ACCORDANCE WITH THE TERMS AND CONDITIONS OF THE USER’S SOFTWARE LICENSE. THE PRESENTATION OF MATERIAL HEREIN DOES NOT, IN ANY MANNER, MODIFY SUCH TERMS AND CONDITIONS.

Validating Key Table Values
By Tym Stegner

TymWhen you are about to update a key, you need to prove that the key table is valid at the outset. How does one verify that the values in an attribute’s key table are correct and up to date? This article recommends methods of validation to inspect whether key table values are valid.

Re-key the Attribute(s)

When using the V9.82-2 version of System 1032 or later, keying an attribute always results in a completely valid key table. If you are uncertain of an attribute key table, the simple, safe remedy is to re-key the suspect attribute or the entire dataset.

Using VALUES Output for a Quick Inspection – Tip 1
The System 1032 VALUES command displays a values breakdown list for a specified attribute. The data values from the low-level key tables are scanned, and the value occurrences are counted and displayed. As the low-level tables must be fully scanned to perform the operation, any inconsistencies in the tables are noted by System 1032 and reported.

The drawback to this tip is that only the key table values are displayed. To get a more complete check of the accuracy of the values and their counts, you want to compare the actual data in the attribute records against the reported values from the low-level key table. Such processing is not done by the VALUES command.

Comparing VALUES Output with PRINT or DUMP Output
The easiest way to make this check is to compare the output of the VALUES command for an attribute to a summarized list of the attribute’s values that was created by a PRINT (or a DUMP) command. (Do not use the PRINT BY command to create a summarized report, because PRINT BY optimizes its processing by using the attribute’s key tables, if available, to report the values.)

I use the following approach to create a breakdown list of the values of an attribute in a dataset.

Extracting Values to a Flat File
First, issue a PRINT (or DUMP) command to extract all the attribute values out to a flat file. If you issue a PRINT command, be sure to include the WITHOUT TITLES clause to suppress the attribute titles. When using a DUMP command, I use text formats to facilitate the next steps: sorting and aggregation.

I recommend that when sorting the values do not sort the data in System 1032 unless you have a small dataset, because the I/O load of extracting the data in sorted order is higher than using the VMS Sort utility to sort the data in the flat file. You can just specify the command as:

$ SORT inputfile outputfile

However, it is more efficient to specify the /key quality to identify and quantify the sort information. See VMS HELP on sort/key for an explanation.

Extracting Unique Values and Counts – Tip 2
Now, we have a sorted list of values. Next we need a list of unique values, including a count. For this, I use variations on a simple DCL command file, shown in Figure 1, to aggregate the multiple values down to a single value.

Figure 1. Aggregating unique values

$! SQSHR.COM
$ Open/Write Ocx 'Ff$parse(P1,,,"Name")'.sum
$ Open Icx 'P1 !P1 is input file name.type
$ Read Icx Lval
$ Xval = ""
$ Lcn = 1
$Vloop:
$ Read/End=No_More Icx Xval
$ If Xval .Nes. Lval
$ Then
$ Write Ocx F$fao("!AS|!SL",Lval,Lcn)
$ Lval = Xval
$ Lcn = 1
$ Else
$ Lcn = Lcn + 1
$ Endif
$ Goto Vloop
$No_More:
$ Write Ocx F$fao("!AS|!SL",Lval,Lcn)
$ Close Icx
$ Close Ocx
$ Exit

Using the Command File
The command file in Figure 1 opens a supplied input file, and creates an output file with the same file name, but with the new file type .SUM. A loop reads values one at a time, writing a line to the output file when the value changes. At the end of the loop, a final write takes care of the last value. The output line is value|count.

For most datasets, you can compare the VALUES key-table count and the PRINT (or DUMP) record generated count for a match by visual inspection. For larger value sets, you might make use of the VMS differences/parallel command to more easily compare the two files. While the format differences create obvious incorrect matching, there is the side benefit of automatically presenting the files side-by-side, to facilitate visual inspection.

Extraction Unique Values and Counts – Tip 3
Figure 2 illustrates another simple DCL command file you can use to do a line-by-line comparison.

Figure 2. Creating a line-by-line comparison

$ Open Vc Valmode.Sum     !xx     (##)
$ Open Pc Prtmode.Sum !xx|##
$Vloop:
$ Read/End=Nomore Vc Vrec
$ Read/End=Nomore Pc Prec
$ If F$edit(F$element(0,"|",Prec),"TRIM") .Nes. -
F$edit(F$element(0,"(",Vrec),"TRIM")
$ Then
$ Write Sys$output "Out of sync"
$ Goto Nomore
$ Endif
$ If (F$element(1,"(",Vrec)-")") .Ne. F$element(1,"|",Prec)
$ Then
$ Write Sys$output "Mismatch: ''Vrec' Vs. ''Prec'"
$ Endif
$ Goto Vloop
$Nomore:
$ Close Vc
$ Close Pc
$ Exit

Note: The code in Figure 2 excludes code necessary to handle the differing presentation of MISSING between DUMP or PRINT reports and the VALUES command.

Using a Find/Search Value Comparison – Tip 4
The code in Figures 1 and 2 validate the key tables only at the lowest levels, where the records and value information are stored. This code does not verify the higher-level key tables, used to simplify the query for specific values during FIND command processing.

To verify the higher-level tables and simultaneously check the low-level tables for complete value inclusion, we will use one of the previous files as the input to a more complete test of the key table integrity. The code in Figure 3 is specific to a particular attribute in a particular dataset, but you could write a tool to generate a similar procedure for any given keyed attribute in a dataset.

Using the summarized data file generated by PRINT or DUMP processing, the following procedure opens two instances of the parent dataset, and performs FIND and SEARCH commands against the steadily diminishing selection set to assure that both FIND and SEARCH processing locate the proper number of records from the input file. Lastly, it checks for any FIND or SEARCH records remaining after the queries.

Figure 3. Analyzing a specific attribute

!  Checks MODE attribute in DETAIL table using PMODE.SUM input file
Open Ds DETAIL As FDS Readonly,-
Ds DETAIL As SDS Readonly
Var Xrec,Xval Text Varying
Var Frn,Srn,Xrn Integer
Var Ctx Integer Init 0 !File context
Init 8 Mode.Err
Call Open_File(Ctx,"Prtmode.Sum",)
Set Message Info No Print
Set Ds SDS; Find All; Consider On
Set Ds FDS; Find All; Consider On
Repeat
Call Read_File(Xrec,Ctx)
If Xrec Ne Missing Then
Let Xval = Xrec[1:$Find("|",Xrec)],
Xrn = $Int(Xrec[($Find("|",Xrec)+1):$Len(Xrec)])
Set Ds FDS
Find Mode Xval
Let Frn = $Nrec
Find Not Last; Consider Replace
Set Ds SDS
Search Mode Eq Xval
Let Srn = $Nrec
Find Not Last; Consider Replace
If Xrn Ne Frn OR Xrn Ne Srn Then
Write On 8 Xval Xrn Frn Srn
Write Xval Xrn Frn Srn
End_If
End_If
Until (Xrec Eq Missing)
Set Ds FDS; Find All
If $Nrec Ne 0 Then
Write On 8 $Nrec Fmt( "***" 2x i5 " Leftover records after FIND")
Write $Nrec Fmt( "***" 2x i5 " Leftover records after FIND")
Print On 8 Mode
End_If
Set Ds SDS; Find All
If $Nrec Ne 0 Then
Write On 8 $Nrec Fmt("***" 2x i5 " Leftover records after SEARCH")
Write $Nrec Fmt("***" 2x i5 " Leftover records after SEARCH")
Print On 8 Mode
End_If
Call Close_File(Ctx)
Exit

Notating the Code in Figure 3

  1. Consider sets are used to cause the next FIND or SEARCH command to disregard already-located records. In each value cycle, the consider set is reduced to unqueried records.
  2. An error file is produced that records any instances where FIND or SEARCH processing located a number of records different from the value read from the summary file.
  3. The code at the bottom of Figure 3 checks to see if records remain after the summary files values are queried.

You can use this approach with bulk data commands within HLI programs as well, though you may want to access the logical declaration utility (LIB$SET_LOGICAL) specifically.

Figure 4 is a sample run using the PRTMODE.SUM file from Figure 3.

Figure 4. Using the PRTMODE SUM file

$ S1032 Use Check_Mode
Current Dataset Is Now FDS
Initializing Channel 8
%RMS-W-EOF, End Of File Detected
*** 45 Leftover Records After FIND
*** 45 Leftover Records After SEARCH

These results are as expected. My PRTMODE.SUM file had the missing value line removed. The 45-record discrepancy determined by the CHECK_MODE command file represents the records where MODE is MISSING.

In Summary
The previous tips are useful for checking to see if a key table in a dataset is inconsistent with itself or inconsistent with the data values in the dataset. Employing these tips does not eliminate the need for regular dataset maintenance, but can serve as a cautionary check on an already opened dataset, if necessary.

Re-keying an attribute necessarily requires exclusive access to the dataset, so these processes might be used to check for key table validity without shutting down an active subsystem.

Model 204
USE OF AND ACCESS TO PRODUCTS AND FEATURES ARE IN ACCORDANCE WITH THE TERMS AND CONDITIONS OF THE USER’S SOFTWARE LICENSE. THE PRESENTATION OF MATERIAL HEREIN DOES NOT, IN ANY MANNER, MODIFY SUCH TERMS AND CONDITIONS.

SQL Performance Tuning
By Mark LaRocca



Model 204 provides many tuning parameters for SQL performance. Performance should be monitored and these parameters adjusted as required. So, assume we intend to use SQL fairly heavily and want to assure maximum performance. In the Figure A example, I am running an SQL request that retrieves 1,820 records from a seven million record file.

SQL Parameters on an IODEV 19 Thread Definition
Here are the initial SQL tuning parameters:

Test Query Against a Seven Million Record File
The query I am running tests performance for an actual Model 204 customer. Of course, I changed the file and field definitions, but the statistics are real.

Figure A. Querying a 7-million record file

SELECT * 
FROM TEST.INVENTORY
WHERE RECRDTYPE = 'AAA' AND
CREATDDATE BETWEEN '20051101' AND '20051102' AND
CHARGTO < '7000'

Coding an Efficient Query
Before we attempt to run this query, we should look at two things in advance to enhance performance.

Based on the previous two points, I have determined that the columns in the WHERE clause are ordered, and I have found that I can change our query to retrieve specific columns as shown in Figure B.


Figure B. Querying specific columns

SELECT Col1, Col2, Col3, Col4, Col5 
FROM TEST.INVENTORY
WHERE RECRDTYPE = 'AAA' AND
CREATDDATE BETWEEN '20051101' AND '20051102' AND
CHARGTO < '7000'

Now, let’s run our query.

Identifying Performance Problems
To our surprise, despite our tuning efforts, the query takes about 30 seconds to retrieve 1820 rows of data. This is far below expected Model 204 performance. Why? Let’s look at a short checklist of possible performance issues.

We know that each of the possible issues can affect performance. But, how do we determine which, if any, of the possible issues are the problem. The answer is LAUDIT.

Set LAUDIT=255 on your IODEV 19 threads. LAUDIT processing displays on the audit trail all information concerning the query you are executing. Also, to audit RK lines, set the X'20' bit on for SYSOPT.

Now that we have these set in our Online, try the query again and take a look at some of the performance statistics in CCAAUDIT.

Remember, an SQL query is translated to IFAM code and, as such, is really a User Language request. In CCAAUDIT you will see the SELECT statement and the User Language code into which it was translated, followed by the FIND statistics and finally a long list of S2 lines.

These lines might say S3 or S4, but usually, on a first request, they are S2. These are the actual fetches for each row; in our example there are 1,820 of these S2 lines.

Now, how do we spot the problem? First, look for server swapping. Does the request begin and end running in the same server? Look for the IODEV 19 login to spot the beginning. Figure C is the request output--edited for space--showing the points of interest.

Figure C. Request output

06097110549  2  5  17 AD ///  M204.0352: IODEV=19, OK MARK  MARK  LOG
06097110549 3 5 17 MS *** M204.0353: MARK MARK LOGIN 06 A
06097110549 4 5 17 MS *** M204.2566: LINK=LINKTCP, LOCALID=192.207.28.
...
06097110549 0 5 17 QT PREPARE statement SQL_STMT_1_ (SELECT Col1, Col2, Col3,
06097110549 1 5 17 XX Col4, Col5 FROM TEST.INVENTORY WHERE ...
...
06097110550 6 5 17 LI IN INVPOCS FD 'CREATED.DATE' IS ALPHA IN RANGE FROM '
06097110550 7 5 17 XX I' AND 'CHARGE.TO' IS ALPHA < '7000'
...
06097110550 29 5 17 LI S0
06097110616 0 5 17 RK ... M204.0880: IFFIND COMPLETE
06097110616 1 5 17 RK ... M204.0881: IFCOUNT = 1820
06097110616 2 5 17 LI S1
06097110616 3 5 17 ST $$$ USERID='CCA06 ' ACCOUNT='CCA06 ' LAST='EX
06097110616 4 5 17 XX PROC='SQL_STMT_1_' QTBL=158 STBL=60 TTBL=6 VTBL=111 P
06097110616 5 5 17 XX 6 DKRD=4845 IN=2 FINDS=1 PCPU=18 RQTM=26466 BXNEXT=11
06097110616 6 5 17 LI S2
...
06097110616 7 5 17 LI S2
06097110616 8 5 17 LI S2
06097110616 9 5 17 LI S2
06097110616 10 5 17 LI S2
06097110616 11 5 17 LI S2
...
06097110620106 5 17 LI S2
06097110620107 5 17 LI S2
06097110620108 5 17 LI S2
06097110620109 5 17 LI S2
06097110620110 5 17 LI S1
06097110620111 5 17 LI IN S0
06097110620112 5 17 QT CLOSE cursor SQL_CUR_1_ on statement SQL_STMT_1_
06097110620113 5 17 QT COMMIT -- all cursors closed
06097110620114 5 17 QT DROP statement SQL_STMT_1_
06097110620115 5 17 RK ... M204.0889: IFFLUSH
06097110620116 5 17 LI EU2,S0,S1,S2

 

Analyzing the Audit Trail
A quick look at the audit shows that the entire transaction took about 30 seconds, starting at 11:05:49 and ending (close cursor) around 11:06:20. We can see that the whole request ran in server 5. Examining the entire audit, we see that no other user took over server 5 in the middle of our request. So, there was no server swapping. Server swapping might mean that we have only one 700,000 size server and users are competing for it. We would then need to provide more servers of this size to reduce the possibility of server swapping.

Our next concern is the FIND processing. We can see from the previous code that the FIND started at 11:05:50 (see S0 line after Prepare lines) and ended (IFFIND COMPLETE) at 11:06:16, covering a total of 26 seconds. This is an indicator that, perhaps, our file needs to be reorganized. Take a look at the FIND statistics.

06097110616  5  5  17 XX 6 DKRD=4845 IN=2 FINDS=1 PCPU=18 RQTM=26466 BXNEXT=11

DKRD=4845 is the number of disk reads and RQTM=26466 is the total elapsed time in milliseconds for the request to this point. This means we are taking about five milliseconds per read. This is high because the first time we read, the records are not in disk cache and also not in Model 204 internal buffers.

Let's run the request again to test our disk cache theory. Running the same request again shows that the total transaction time was reduced from 30 seconds to 11 seconds. I have not provided the audit here, but you can run such I/O tests for any User Language request or SQL query and see the same result. Our speed increases dramatically. However, it is still too slow to meet our high Model 204 standards. The solution is: reorganize the file.

Here is the same DKRD and RQTM for an initial request not in disk cache once the file has been reorganized.

XX DKRD=1592 IN=2 FINDS=1 PCPU=50 RQTM=6137 BXNEXT=935 ...

As you can see, after the reorganization, DKRD dropped from 4845 to 1592. This is because the reorganization resulted in a vast improvement in Table B and Table D utilization, ordered index organization, reduced extension records, and so on. The data required for this request now resides on significantly fewer pages resulting in a 67% decrease in database I/O.

Secondly, notice that the RQTM dropped from 26 to six seconds. Certainly that file needed to be reorganized. (If fact, this file had not been reorganized for years.) Thus, CCA recommends that you reorganize frequently updated files once a week or minimally once a month.

Now, run the request again. At this point all rows are already in disk cache. We see the following.

XX DKRD=1592 IN=2 FINDS=1 PCPU=50 RQTM=3137 BXNEXT=935 ...

DKRDs remained the same, but actual request time decreased to 3137 milliseconds (or three seconds). However, should we expect even faster results on a FIND? Should the records already be in Model 204 storage? Yes, but they were not, because we had set MAXBUF=1500 for our Online and they were probably flushed out. So, set MAXBUF=30000 and rerun the same request. We see the following.

06097120827101  5   17 XX  SLIC=1 IN=2 FINDS=1 PCPU=708 RQTM=264

Request time (RQTM=264ms) is virtually non existent at 0.2 seconds. There is no DKRD statistic listed, because all records requested were already in Model 204 storage. The lesson of the example is:

But what about our final overall transaction time? It dropped from an initial 30 seconds to 11 seconds for a reorganized file to two to three seconds for rows in disk cache or Model 204 cache.

This brings us to our final action: tuning the client/server application.

Tuning the Client/Server Applicationl
This particular test used a Java application using Connect
*’s JDBC for Model 204. Model 204 fetch size defaults to whatever fits in a 4K buffer. In our test, this was 22 rows. To fetch 1820 rows, we must ask the server, Model 204, to send us a new buffer about 90 times. This conversation overhead is very high. I did not show it in the previous statistics, but increasing the fetch size speeds things up four to five times. To do this in Java requires a statement such as:

state.setFetchSize(25000);

This would fetch all rows of our request without requiring the JDBC driver to ask for more. In the previous test, all requests were run with this setting.

Also, if we run an initial request and stay connected for more requests, this eliminates LOGON/LOGOFF processing on Model 204--about one second--and also minimizes the chance that data is flushed from Model 204 buffers.

Smart client/server applications:

In Summary
To optimize SQL performance, tuning is the key, so:

 


Contact CCA Webmaster
Copyright 2007